Hi Andrzej,
Thank you for your reply.
I have tried twice, but the segment is still reported as corrupt:
[EMAIL PROTECTED] nutch]# find index/segments/20050919092227/ -name index -print
index/segments/20050919092227/fetcher/index
index/segments/20050919092227/parse_text/index
index/segments/20050919092227/content/index
index/segments/20050919092227/parse_data/index
[EMAIL PROTECTED] nutch]# rm -rf index/segments/20050919092227/fetcher/index
[EMAIL PROTECTED] nutch]# rm -rf index/segments/20050919092227/parse_text/index
[EMAIL PROTECTED] nutch]# rm -rf index/segments/20050919092227/content/index
[EMAIL PROTECTED] nutch]# rm -rf index/segments/20050919092227/parse_data/index
[EMAIL PROTECTED] nutch]# bin/nutch segread index/segments/20050919092227 -fix
050920 031844 parsing file:/nutch/conf/nutch-default.xml
050920 031844 parsing file:/nutch/conf/nutch-site.xml
050920 031845 No FS indicated, using default:local
050920 031849 - fixed fetcher
050920 031932 - fixed content
050920 031952 - fixed parse_data
050920 032006 - fixed parse_text
050920 032006 Finished fixing 20050919092227
050920 032006 - data in segment index/segments/20050919092227 is corrupt, using only 91212 entries.
Thanks,
Gal
Andrzej Bialecki wrote:
Gal Nitzan wrote:
Hi,
Well, mergesegs is still very slow for me:
[EMAIL PROTECTED] nutch]# tail -f nutch-mergesegs-kunzon.com.log
050919 171351 Processed 120000 records (1146.5918 rec/s)
050919 171408 Processed 140000 records (1158.2788 rec/s)
050919 171428 Processed 160000 records (1019.8358 rec/s)
050919 171451 Processed 180000 records (879.2368 rec/s)
050919 171510 Processed 200000 records (1054.9636 rec/s)
050919 171528 Processed 220000 records (1069.2328 rec/s)
050919 171547 Processed 240000 records (1099.868 rec/s)
050919 171832 - creating next subindex...
050919 174512 Processed 260000 records (11.328647 rec/s)
050919 200315 Processed 280000 records (2.4145627 rec/s)
It is falling to 2.4 rec/s ...
Can somebody help me, please? 400K records is only the beginning; what
will happen when it is 4M?
> 050917 043332 - data in segment index/segments/20050916014401 is corrupt, using only 128115 entries.
This is the real reason for the slowdown. Technically speaking, a
partially corrupted MapFile is readable and usable. However, random
access is orders of magnitude slower...
The fix is simple: delete the "index" files in each subdirectory of
the 20050916014401 segment. Then run "nutch segread -fix
20050916014401". Then re-run mergesegs - it will now work at full speed.
NB: if any other segments give you this warning, do the same to them
before you run mergesegs.
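
The two steps described above (delete each subdirectory's "index" file, then run segread with -fix) could be wrapped in a small shell function. This is only a rough sketch: the function name fix_segment and the NUTCH variable are made up for illustration, and the segread invocation simply copies the form used in the session at the top of this message.

```shell
#!/bin/sh
# Sketch only: rebuild the MapFile indexes of a single segment.
# NUTCH is a hypothetical override for the launcher path; it defaults
# to bin/nutch, as used in the session above.
NUTCH="${NUTCH:-bin/nutch}"

# fix_segment is a hypothetical helper name, not part of Nutch.
fix_segment() {
    seg="$1"
    # Remove the "index" entry from every subdirectory of the segment
    # (fetcher, content, parse_data, parse_text, ...).
    for idx in "$seg"/*/index; do
        if [ -e "$idx" ]; then
            rm -rf "$idx"
        fi
    done
    # Same form as the command in the session above:
    #   bin/nutch segread <segment> -fix
    "$NUTCH" segread "$seg" -fix
}
```

It would be called once for each segment that produced the "is corrupt" warning, for example fix_segment index/segments/20050916014401, before re-running mergesegs.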