That is the SegmentReader tool. 

You can use the crawldbscanner tool in Nutch 1.4 to get a dump of crawldb 
records by status. In Nutch trunk you can also use the readdb tool to dump 
records by status or regex pattern and write them as CSV, which is easier to 
use than the output of crawldbscanner.
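
Once you have a CSV dump, splitting the URLs by status takes only a few lines. 
The following is a rough sketch, not a tested recipe: the column names "Url" 
and "Status name", the ";" delimiter, and the sample rows are all assumptions, 
so check the header line of your own dump and adjust the parameters to match.

```python
import csv
import io
from collections import defaultdict


def urls_by_status(csv_text, url_col="Url", status_col="Status name",
                   delimiter=";"):
    """Group the URLs of a CSV crawldb dump by their status name.

    The default column names and delimiter are assumptions; pass the
    values that match the header line of your actual dump.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        groups[row[status_col]].append(row[url_col])
    return dict(groups)


# Hypothetical sample data, only here to show the expected shape.
SAMPLE = (
    "Url;Status name\n"
    '"http://example.com/a";db_fetched\n'
    '"http://example.com/b";db_unfetched\n'
    '"http://example.com/c";db_gone\n'
    '"http://example.com/d";db_fetched\n'
)

if __name__ == "__main__":
    for status, urls in sorted(urls_by_status(SAMPLE).items()):
        print(status)
        for url in urls:
            print("  " + url)
```

For a real dump, read the file contents into a string and pass it as csv_text.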

> This command dumps the fetched and unfetched, but not gone, URLs:
> http://wiki.apache.org/nutch/bin/nutch_readseg
> 
> Remi
> 
> On Monday, January 23, 2012, Nutch Begineeer <sachinyadav0...@gmail.com> wrote:
> > What is the command to get a list of all unfetched, gone, and fetched URLs?
> > I am only able to get their count using the crawl_stats command. I want the
> > exact URL list.
> > 
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Dump-unfetched-fetched-gone-URLS-tp3681769p3681769.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
