Hello All,

I was running an intranet crawl, and it seems it did not finish properly.
It is a pretty much default setup, but the crawl depth was 15, and I had
turned on query URLs by commenting out this rule:
# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]


Other than a bunch of fetch messages and a bunch of Exceeding max.delays
messages, I am seeing the following:

The crawl starts normally...
050228 064335 status: segment 20050228044354, 6300 pages, 91 errors,
140194211 bytes, 7163124 ms
050228 064335 status: 0.8795045 pages/s, 152.90356 kb/s, 22253.049 bytes/page
.......
050228 064551 status: segment 20050228044354, 6400 pages, 97 errors,
142348797 bytes, 7298549 ms
050228 064551 status: 0.87688667 pages/s, 152.37276 kb/s, 22242.0 bytes/page
.....
050228 064759 status: segment 20050228044354, 6500 pages, 102 errors,
144522915 bytes, 7427113 ms
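
As a sanity check on the status lines above, the reported rates can be recomputed from the raw counters. This is only a minimal Python sketch, not anything from Nutch itself: the function name crawl_rates is hypothetical, and it assumes "kb/s" means kilobits per second (bytes * 8 / 1024), which is the interpretation that reproduces the logged figures.

```python
def crawl_rates(pages, total_bytes, elapsed_ms):
    """Recompute the status-line rates from the raw counters (hypothetical helper)."""
    seconds = elapsed_ms / 1000.0
    pages_per_s = pages / seconds
    # Assumption: "kb/s" is kilobits per second, i.e. bytes * 8 / 1024 per second.
    kbits_per_s = total_bytes * 8 / 1024.0 / seconds
    bytes_per_page = total_bytes / pages
    return pages_per_s, kbits_per_s, bytes_per_page


# Counters copied from the 6300-page status line above.
pps, kbps, bpp = crawl_rates(6300, 140194211, 7163124)
print(f"{pps:.4f} pages/s, {kbps:.2f} kb/s, {bpp:.1f} bytes/page")
```

The recomputed values closely match the logged ones, so the fetcher was holding a steady rate of a little under one page per second up to the last status line shown.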

The result of all this was a nutch-searcher.dir that looked like this:
du -h nutch-searcher.dir/
5.3M    nutch-searcher.dir/db/webdb/pagesByURL
3.4M    nutch-searcher.dir/db/webdb/pagesByMD5
14M     nutch-searcher.dir/db/webdb/linksByMD5
14M     nutch-searcher.dir/db/webdb/linksByURL
36M     nutch-searcher.dir/db/webdb
36M     nutch-searcher.dir/db
12K     nutch-searcher.dir/segments/20050228020140/fetchlist
12K     nutch-searcher.dir/segments/20050228020140/fetcher
20K     nutch-searcher.dir/segments/20050228020140/content
12K     nutch-searcher.dir/segments/20050228020140/parse_text
16K     nutch-searcher.dir/segments/20050228020140/parse_data
76K     nutch-searcher.dir/segments/20050228020140
16K     nutch-searcher.dir/segments/20050228020146/fetchlist
16K     nutch-searcher.dir/segments/20050228020146/fetcher
316K    nutch-searcher.dir/segments/20050228020146/content
52K     nutch-searcher.dir/segments/20050228020146/parse_text
144K    nutch-searcher.dir/segments/20050228020146/parse_data
548K    nutch-searcher.dir/segments/20050228020146
56K     nutch-searcher.dir/segments/20050228020257/fetchlist
68K     nutch-searcher.dir/segments/20050228020257/fetcher
2.2M    nutch-searcher.dir/segments/20050228020257/content
260K    nutch-searcher.dir/segments/20050228020257/parse_text
912K    nutch-searcher.dir/segments/20050228020257/parse_data
3.5M    nutch-searcher.dir/segments/20050228020257
232K    nutch-searcher.dir/segments/20050228020931/fetchlist
276K    nutch-searcher.dir/segments/20050228020931/fetcher
9.4M    nutch-searcher.dir/segments/20050228020931/content
1.1M    nutch-searcher.dir/segments/20050228020931/parse_text
4.1M    nutch-searcher.dir/segments/20050228020931/parse_data
15M     nutch-searcher.dir/segments/20050228020931
900K    nutch-searcher.dir/segments/20050228024012/fetchlist
1.1M    nutch-searcher.dir/segments/20050228024012/fetcher
37M     nutch-searcher.dir/segments/20050228024012/content
3.9M    nutch-searcher.dir/segments/20050228024012/parse_text
16M     nutch-searcher.dir/segments/20050228024012/parse_data
58M     nutch-searcher.dir/segments/20050228024012
3.2M    nutch-searcher.dir/segments/20050228044354/fetchlist
1.1M    nutch-searcher.dir/segments/20050228044354/fetcher
39M     nutch-searcher.dir/segments/20050228044354/content
3.6M    nutch-searcher.dir/segments/20050228044354/parse_text
16M     nutch-searcher.dir/segments/20050228044354/parse_data
62M     nutch-searcher.dir/segments/20050228044354
139M    nutch-searcher.dir/segments
175M    nutch-searcher.dir

The crawl ran for about 2 hours and 43 minutes.

When I search, it looks at the right searcher.dir, but it's not
returning anything for me:
050228 085819 10 query request from 64.171.1.207
050228 085819 10 query: bhangra
050228 085819 10 searching for 20 raw hits
050228 085819 10 total hits: 0

What am I doing wrong? Thanks in advance for the help.

Regards,
Paul


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
