Hi Doug,

Thank you for the prompt reply.

Well, things got much faster (I'd guess about 40% faster), but it seems that something got badly corrupted: everything gets stuck after 40K records.

[EMAIL PROTECTED] nutch]# bin/nutch mergesegs -dir index/segments/ -i -ds
050917 043331 parsing file:/nutch/conf/nutch-default.xml
050917 043331 parsing file:/nutch/conf/nutch-site.xml
050917 043331 No FS indicated, using default:local
050917 043331 * Opening 2 segments:
050917 043332  - segment 20050916013342: 42287 records.
050917 043332 - data in segment index/segments/20050916014401 is corrupt, using only 128115 entries.
050917 043332  - segment 20050916014401: 128116 records.
050917 043332 * TOTAL 170403 input records in 2 segments.
050917 043332 * Creating master index...
050917 043345  Processed 20000 records (1613.5538 rec/s)
050917 043354  Processed 40000 records (2113.9414 rec/s)

And that is it. Memory is still being consumed, but there is no apparent activity.

Since I'm really new to Nutch, could you give me a tip on how to rescue the already-fetched data and remove the corruption from the segment? I already tried the -fix option, but it didn't help.

Regards,

Gal


Doug Cutting wrote:
The default for indexer.maxMergeDocs was mistakenly set to 50, which can make indexing really slow. Try putting the following in your nutch-site.xml:

<property>
  <name>indexer.maxMergeDocs</name>
  <value>2147483647</value>
</property>
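For context, here is a sketch of a complete nutch-site.xml carrying this override. It assumes the 0.7-era template, whose root element is <nutch-conf>. Check that the root element matches the one in the nutch-default.xml shipped with your installation. The value 2147483647 is the largest Java int (Integer.MAX_VALUE, 2^31 - 1), so in practice merges are no longer capped.

```xml
<?xml version="1.0"?>
<!-- Site-specific overrides; values here take precedence over nutch-default.xml. -->
<!-- Assumption: 0.7-style root element; match it to your nutch-default.xml. -->
<nutch-conf>

  <!-- Lift the mistaken default of 50 so index segments can be merged freely. -->
  <property>
    <name>indexer.maxMergeDocs</name>
    <value>2147483647</value>
  </property>

</nutch-conf>
```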

Does that help?

I just fixed this in trunk. We should fix this in the 0.7 release branch.

Doug

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general