Crawl process seems to complete but all output files seem to be empty

arul velusamy Tue, 03 Feb 2009 12:35:15 -0800

Dear All,

I downloaded Nutch 0.8.1 and am running it from Eclipse 3.4.1 (OS: Windows
Vista).



(1) In crawl-urlfilter.txt I have set the following:

+^http://([a-z0-9]*\.)*cricinfo.com/

(2) I have created NUTCH_HOME/urls/nutch with the following content:

http://www.cricinfo.com

(3) My command-line parameters are:

urls -dir crawl -depth 3 -topN 50
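
To rule out the filter in (1) rejecting the seed in (2), here is a minimal standalone sketch that applies the same regular expression to the seed URL. It does not use any Nutch classes; the class name UrlFilterCheck and the method accepted are made up for illustration, and I am assuming the filter does a "find"-style match against the URL string. Whether Nutch adds the trailing slash to the seed before filtering is something I have not verified.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: checks a URL against the include rule.
class UrlFilterCheck {
    // The rule from crawl-urlfilter.txt, without the leading '+'.
    static final Pattern RULE =
        Pattern.compile("^http://([a-z0-9]*\\.)*cricinfo.com/");

    // True if the rule matches from the start of the URL (the rule is anchored by '^').
    static boolean accepted(String url) {
        return RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        // Seed exactly as written in the urls file: no trailing slash.
        System.out.println(accepted("http://www.cricinfo.com"));
        // Same seed with a trailing slash.
        System.out.println(accepted("http://www.cricinfo.com/"));
    }
}
```

The rule ends in "/", so on its own it only matches the seed once a trailing slash is present. If the seed is not normalized before filtering, writing it as http://www.cricinfo.com/ in the urls file might be worth trying.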

When I run Crawl from Eclipse, all the output directories and files are
created, but I don't see any useful crawled content in them.
In fact, running SegmentReader with the command-line parameters "-list -dir
crawl/segments/" gives the following output:

NAME            GENERATED  FETCHER START             FETCHER END               FETCHED  PARSED
20090203201844  0          292278994-08-17T07:12:55  292269055-12-02T16:47:04  0        0
20090203201851  0          292278994-08-17T07:12:55  292269055-12-02T16:47:04  0        0
20090203201857  0          292278994-08-17T07:12:55  292269055-12-02T16:47:04  0        0
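
The FETCHER START and FETCHER END values do not look like real fetch times. As a small sketch (plain Java, no Nutch classes; the class name SentinelDates is made up for illustration), formatting Long.MAX_VALUE and Long.MIN_VALUE as epoch milliseconds in UTC reproduces the same two strings. My guess, which I have not confirmed in the Nutch source, is that these are initial placeholder values that were never updated because no fetch records were found in the segments.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of Nutch: formats extreme epoch values as dates.
class SentinelDates {
    // Formats epoch milliseconds in UTC using the same layout as the listing.
    static String format(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(Long.MAX_VALUE)); // compare with FETCHER START
        System.out.println(format(Long.MIN_VALUE)); // compare with FETCHER END
    }
}
```

Together with GENERATED, FETCHED and PARSED all being 0, this would be consistent with nothing having been generated or fetched at all, rather than with a fetch that ran and failed.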

What is going wrong? Any help would be appreciated.

Thanks,

Arul.
