Hi,

I am crawling the web with Nutch and have run into a problem when merging segments.

My machine:
CPU: 2 x Xeon 2.8
RAM: 2 GB
HD: RAID, 2 x 160 GB

After fetching (I stopped the fetcher after it had hung, fetching nothing, for a few hours), I did the following:

1. s1=`ls -d index/segments/2* | tail -1`

2. bin/nutch updatedb index/db/ $s1
   These are the last few lines of the updatedb output:

--------------------------------------------------------------------
050916 135308 Processing document 127000
050916 135316 Processing document 128000
050916 135317 Unexpected EOF in: index/segments/20050916014401/fetcher at entry #128116. Ignoring.
050916 135317 Finishing update
050916 135456 Processing pagesByURL: Sorted 3083939 instructions in 99.536 seconds.
050916 135456 Processing pagesByURL: Sorted 30983.15182446552 instructions/second
050916 135559 Processing pagesByURL: Merged to new DB containing 774610 records in 35.355 seconds
050916 135559 Processing pagesByURL: Merged 21909.489464007922 records/second
050916 135611 Processing pagesByMD5: Sorted 803182 instructions in 11.654 seconds.
050916 135611 Processing pagesByMD5: Sorted 68918.99776900635 instructions/second
050916 135627 Processing pagesByMD5: Merged to new DB containing 774610 records in 14.216 seconds
050916 135627 Processing pagesByMD5: Merged 54488.604389420376 records/second
050916 135633 Processing linksByMD5: Sorted 689997 instructions in 6.038 seconds.
050916 135633 Processing linksByMD5: Sorted 114275.75356078171 instructions/second
050916 135648 Processing linksByMD5: Merged to new DB containing 776849 records in 13.624 seconds
050916 135648 Processing linksByMD5: Merged 57020.62536699941 records/second
050916 135655 Processing linksByURL: Sorted 584963 instructions in 7.056 seconds.
050916 135655 Processing linksByURL: Sorted 82902.91950113379 instructions/second
050916 135711 Processing linksByURL: Merged to new DB containing 776849 records in 14.533 seconds
050916 135711 Processing linksByURL: Merged 53454.13885639579 records/second
050916 135718 Processing linksByMD5: Sorted 671867 instructions in 6.732 seconds.
050916 135718 Processing linksByMD5: Sorted 99801.99049316696 instructions/second
050916 135729 Processing linksByMD5: Merged to new DB containing 776849 records in 9.999 seconds
050916 135729 Processing linksByMD5: Merged 77692.66926692669 records/second
050916 135744 Update finished
--------------------------------------------------------------------

As you can see, updatedb completed fine, even though it hit the truncated fetcher output left by stopping the fetcher.
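To double-check which segment was cut short, the warning can be pulled out of the log. This is only a minimal sketch: the function name `truncated_segments` is made up for illustration, and it assumes the log lines have exactly the format shown above. It reads a saved log file if one is given as an argument, otherwise standard input.

```shell
# truncated_segments [FILE...]: print "<segment path> <entry number>" for
# every "Unexpected EOF" warning found in the given log (or on stdin).
# Hypothetical helper; it only matches the log format quoted in this mail.
truncated_segments() {
  sed -n 's|.*Unexpected EOF in: \(.*\)/fetcher at entry #\([0-9]*\).*|\1 \2|p' "$@"
}

# Example with the warning line from the updatedb output:
truncated_segments <<'EOF'
050916 135316 Processing document 128000
050916 135317 Unexpected EOF in: index/segments/20050916014401/fetcher at entry #128116. Ignoring.
050916 135317 Finishing update
EOF
# prints: index/segments/20050916014401 128116
```

The segment it reports here is the same one that mergesegs later flags as corrupt.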

3. bin/nutch mergesegs -dir index/segments/ -i -ds

This is where the problem starts:

--------------------------------------------------------------------
050916 141720 parsing file:/nutch/conf/nutch-default.xml
050916 141720 parsing file:/nutch/conf/nutch-site.xml
050916 141720 No FS indicated, using default:local
050916 141720 * Opening 2 segments:
050916 141720  - segment 20050916013342: 42287 records.
050916 141721 - data in segment index/segments/20050916014401 is corrupt, using only 128115 entries.
050916 141722  - segment 20050916014401: 128116 records.
050916 141722 * TOTAL 170403 input records in 2 segments.
050916 141722 * Creating master index...
050916 141737  Processed 20000 records (1311.9916 rec/s)
050916 141751  Processed 40000 records (1394.0197 rec/s)
050916 154424  Processed 60000 records (3.851173 rec/s)
--------------------------------------------------------------------
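As you can see in the last line, the indexer slowed to about 3.8 records per second: the third batch of 20,000 records took almost an hour and a half (14:17:51 to 15:44:24), so the merge will take far too long.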
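The slowdown can be checked by hand from the timestamps. A rough sketch, assuming the second log field is an HHMMSS time on the same day (the `rate` function is just an illustrative helper, and whole-second timestamps make the result approximate):

```shell
# rate T1 N1 T2 N2: elapsed seconds and records per second between two
# "Processed N records" lines, where T is the HHMMSS field and N the count.
rate() {
  awk -v t1="$1" -v n1="$2" -v t2="$3" -v n2="$4" 'BEGIN {
    # Convert HHMMSS to seconds since midnight.
    s1 = substr(t1, 1, 2) * 3600 + substr(t1, 3, 2) * 60 + substr(t1, 5, 2)
    s2 = substr(t2, 1, 2) * 3600 + substr(t2, 3, 2) * 60 + substr(t2, 5, 2)
    printf "%d s, %.2f rec/s\n", s2 - s1, (n2 - n1) / (s2 - s1)
  }'
}

# Last two progress lines from the mergesegs output:
rate 141751 40000 154424 60000
# prints: 5193 s, 3.85 rec/s
```

That agrees with the 3.851173 rec/s reported by mergesegs, so the figure in the log is real and not a reporting glitch.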

Does anybody have a clue or a hint, please?

Regards,

Gal


