I checked out the code from trunk after Sami committed the change. I started
a new crawl db and ran several crawl cycles sequentially on one Linux
server. See below for the real numbers from my test. The performance is
still poor because the crawler still spends too much time in reduce and
What kind of hardware are you running on? Your pages per sec ratio seems
very low to me.
How big was your crawldb when you started, and how big was it at the end?
What kind of filters and normalizers are you using?
--
Sami Siren
AJ Chen wrote:
I checked out the code from trunk after Sami
Hi
NUTCH-61 (http://issues.apache.org/jira/browse/NUTCH-61) is about an
adaptive re-fetch plugin, and Jerome Charron commented: "Why not
make FetchSchedule a new ExtensionPoint, and then make
DefaultFetchSchedule and AdaptiveFetchSchedule fetch schedule
plugins?" I am for it. Maintaining
Scott:
Would you be kind enough to upload your Nutch-Gui patch that works
with the current trunk? I would like to give it a try.
Regards
On 11/22/06, scott green [EMAIL PROTECTED] wrote:
On 11/22/06, Sami Siren [EMAIL PROTECTED] wrote:
scott green wrote:
Hi
I am now porting Stefan to my