Hi Doug,
Thank you for the prompt reply.
Well, things got much, much faster (I'd guess about 40% faster), but it
seems that something got really corrupted. Everything gets stuck after
40K records.
[EMAIL PROTECTED] nutch]# bin/nutch mergesegs -dir index/segments/ -i -ds
050917 043331 parsing file:/nutch/conf/nutch-default.xml
050917 043331 parsing file:/nutch/conf/nutch-site.xml
050917 043331 No FS indicated, using default:local
050917 043331 * Opening 2 segments:
050917 043332 - segment 20050916013342: 42287 records.
050917 043332 - data in segment index/segments/20050916014401 is corrupt, using only 128115 entries.
050917 043332 - segment 20050916014401: 128116 records.
050917 043332 * TOTAL 170403 input records in 2 segments.
050917 043332 * Creating master index...
050917 043345 Processed 20000 records (1613.5538 rec/s)
050917 043354 Processed 40000 records (2113.9414 rec/s)
And that is it. Memory is still being consumed, but there is no apparent
activity.
Since I'm really new to Nutch, could you give me a tip on how to
rescue the already fetched data and remove the corruption from the
segment? I already tried the -fix option, but it didn't help.
Regards,
Gal
Doug Cutting wrote:
The default for indexer.maxMergeDocs was mistakenly set to 50, which
can make indexing really slow. Try putting the following in your
nutch-site.xml:
<property>
<name>indexer.maxMergeDocs</name>
<value>2147483647</value>
</property>
Does that help?
I just fixed this in trunk. We should fix this in the 0.7 release
branch.
Doug
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general