Hi,
I am crawling the web...
My machine:
CPU: 2 x Xeon 2.8 GHz
RAM: 2 GB
HD: RAID, 2 x 160 GB
After fetching (I stopped the fetcher after it had stalled, fetching nothing, for
a few hours), I did the following:
1. s1=`ls -d index/segments/2* | tail -1`
2. bin/nutch updatedb index/db/ $s1
The following are the last few lines of the updatedb output:
--------------------------------------------------------------------
050916 135308 Processing document 127000
050916 135316 Processing document 128000
050916 135317 Unexpected EOF in: index/segments/20050916014401/fetcher at entry #128116. Ignoring.
050916 135317 Finishing update
050916 135456 Processing pagesByURL: Sorted 3083939 instructions in 99.536 seconds.
050916 135456 Processing pagesByURL: Sorted 30983.15182446552 instructions/second
050916 135559 Processing pagesByURL: Merged to new DB containing 774610 records in 35.355 seconds
050916 135559 Processing pagesByURL: Merged 21909.489464007922 records/second
050916 135611 Processing pagesByMD5: Sorted 803182 instructions in 11.654 seconds.
050916 135611 Processing pagesByMD5: Sorted 68918.99776900635 instructions/second
050916 135627 Processing pagesByMD5: Merged to new DB containing 774610 records in 14.216 seconds
050916 135627 Processing pagesByMD5: Merged 54488.604389420376 records/second
050916 135633 Processing linksByMD5: Sorted 689997 instructions in 6.038 seconds.
050916 135633 Processing linksByMD5: Sorted 114275.75356078171 instructions/second
050916 135648 Processing linksByMD5: Merged to new DB containing 776849 records in 13.624 seconds
050916 135648 Processing linksByMD5: Merged 57020.62536699941 records/second
050916 135655 Processing linksByURL: Sorted 584963 instructions in 7.056 seconds.
050916 135655 Processing linksByURL: Sorted 82902.91950113379 instructions/second
050916 135711 Processing linksByURL: Merged to new DB containing 776849 records in 14.533 seconds
050916 135711 Processing linksByURL: Merged 53454.13885639579 records/second
050916 135718 Processing linksByMD5: Sorted 671867 instructions in 6.732 seconds.
050916 135718 Processing linksByMD5: Sorted 99801.99049316696 instructions/second
050916 135729 Processing linksByMD5: Merged to new DB containing 776849 records in 9.999 seconds
050916 135729 Processing linksByMD5: Merged 77692.66926692669 records/second
050916 135744 Update finished
--------------------------------------------------------------------
As you can see, updatedb completed successfully, even though it hit an
unexpected EOF in the fetcher data at the point where I stopped the fetcher.
3. bin/nutch mergesegs -dir index/segments/ -i -ds
This is where the problem starts:
--------------------------------------------------------------------
050916 141720 parsing file:/nutch/conf/nutch-default.xml
050916 141720 parsing file:/nutch/conf/nutch-site.xml
050916 141720 No FS indicated, using default:local
050916 141720 * Opening 2 segments:
050916 141720 - segment 20050916013342: 42287 records.
050916 141721 - data in segment index/segments/20050916014401 is corrupt, using only 128115 entries.
050916 141722 - segment 20050916014401: 128116 records.
050916 141722 * TOTAL 170403 input records in 2 segments.
050916 141722 * Creating master index...
050916 141737 Processed 20000 records (1311.9916 rec/s)
050916 141751 Processed 40000 records (1394.0197 rec/s)
050916 154424 Processed 60000 records (3.851173 rec/s)
--------------------------------------------------------------------
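As you can see in the last line, mergesegs is now processing only about 3.85
records per second (down from roughly 1,400 records per second for the first
40,000 records), so the merge will take far too long.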
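To put the slowdown in numbers, the timestamps in the mergesegs log can be turned into per-interval rates and a rough time estimate. This is a minimal shell/awk sketch, assuming the log layout shown above (YYMMDD HHMMSS message, all lines on the same day); `merge_rates` is a hypothetical helper, not a Nutch tool:

```shell
#!/bin/sh
# Hypothetical helper: reads a mergesegs log on stdin, prints the throughput
# between consecutive "Processed N records" lines, and a rough ETA based on
# the "* TOTAL N input records" line and the last measured rate.
# Assumes "YYMMDD HHMMSS message" lines that all fall on the same day.
merge_rates() {
  awk '
  # "050916 141722 * TOTAL 170403 input records ..." -> field 5 is the total
  / \* TOTAL [0-9]+ input records/ { total = $5 }
  /Processed [0-9]+ records/ {
    t = $2                                  # HHMMSS
    secs = substr(t, 1, 2) * 3600 + substr(t, 3, 2) * 60 + substr(t, 5, 2)
    n = $4                                  # records processed so far
    if (have_prev) {
      rate = (n - prev_n) / (secs - prev_secs)
      printf "%d -> %d records: %d s, %.2f rec/s\n", prev_n, n, secs - prev_secs, rate
    }
    prev_secs = secs; prev_n = n; have_prev = 1
  }
  END {
    if (total > 0 && rate > 0)
      printf "ETA for remaining %d records at last rate: %.1f h\n", total - prev_n, (total - prev_n) / rate / 3600
  }'
}

# Usage with the lines from the log above
merge_rates <<'EOF'
050916 141722 * TOTAL 170403 input records in 2 segments.
050916 141737 Processed 20000 records (1311.9916 rec/s)
050916 141751 Processed 40000 records (1394.0197 rec/s)
050916 154424 Processed 60000 records (3.851173 rec/s)
EOF
```

Because the log timestamps have one-second resolution, the first interval comes out at about 1,429 rec/s rather than the 1394 rec/s that Nutch reports. The second interval (5193 s for 20,000 records) matches the reported 3.85 rec/s, and at that rate the remaining 110,403 records would need roughly 8 more hours.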
Does anybody have a clue or a hint? Any help is much appreciated!
Regards,
Gal
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general