date:20100414

readlinkdb does not work on nutch 1.0 installation

2010-04-14 Thread Norman Birke

Hi,

I am trying to dump my linkdb content for analysis using the following command:

bin/nutch readlinkdb crawl/linkdb -dump readlinkdb_dump

I receive the following output in my shell:

LinkDb dump: starting
LinkDb db: crawl/linkdb/

After that the readlinkdb_dump folder exists and in it the 2 file

Re: how to parse html files while crawling

2010-04-14 Thread xiao yang

The parsed HTML files are saved in "segments".

On Fri, Apr 9, 2010 at 3:40 AM, cefurkan0 cefurkan0 wrote:
> i can successfully crawl web sites with
>
> bin/nutch crawl command
>
> but i also want to save parsed html files
>
> how can i do that
>
> ty
>
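For anyone wanting to look inside those segments, here is a rough sketch of one way to do it, not something taken from the thread. It assumes the crawl output directory is called crawl, that segment directories carry timestamp names (so the newest sorts last), and that it is run from the Nutch install directory. The latest_segment helper and the segment_dump output name are made up for illustration.

```shell
# Sketch only: find the newest segment under a crawl directory and dump it as text.
# Assumptions: crawl dir defaults to "crawl"; segment names are timestamps.

# Hypothetical helper: print the path of the newest segment, or nothing if none exist.
latest_segment() {
  # Timestamp names sort lexically, so the last entry is the most recent segment.
  ls -d "$1"/segments/*/ 2>/dev/null | sort | tail -n 1 | sed 's:/*$::'
}

crawl_dir="${1:-crawl}"
segment="$(latest_segment "$crawl_dir")"

if [ -n "$segment" ] && [ -x bin/nutch ]; then
  # readseg -dump writes a text dump of the segment into the given output directory.
  bin/nutch readseg -dump "$segment" segment_dump
else
  echo "no segment found under $crawl_dir/segments, or bin/nutch is not in the current directory"
fi
```

The dump includes the raw content as well as the parsed text and parse data, so it can get large for a big crawl.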