Not quite what I want - that will show me every URL that's ever been crawled, not just the ones fetched this time, and it isn't real-time either.

On Fri, Aug 7, 2009 at 3:23 AM, Sebastian Nagel <[email protected]> wrote:
> Hi Paul,
>
> you can use
>
> $NUTCH_HOME/bin/nutch readdb my_crawl/crawldb/ -dump dump_crawldb/ -format csv
>
> then in dump_crawldb you'll find a CSV file with all URLs in your crawlDb.
> One column indicates the status. Select only those records with "db_fetched"
> and you'll have your list.
>
> Sebastian

-- 
http://www.linkedin.com/in/paultomblin
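
For reference, the filtering step suggested in the quoted message could be sketched roughly like this in Python. This is only an illustration under assumptions: the function name fetched_urls is made up, the URL is assumed to be the first column, and the delimiter is left as a parameter because the exact layout of the dump may differ between Nutch versions. It also shares the limitation noted at the top: it lists every URL whose status is db_fetched, not just the ones fetched in the current run, and it is not real-time.

```python
import csv
import io


def fetched_urls(lines, delimiter=","):
    """Yield the first field of every row that has a field equal to db_fetched.

    Assumption: the URL is the first column of the dump. The status column
    is not addressed by position; any field matching "db_fetched" counts.
    """
    for row in csv.reader(lines, delimiter=delimiter):
        fields = [field.strip() for field in row]
        if "db_fetched" in fields:
            yield fields[0]


# Made-up sample rows, not real readdb output.
sample = io.StringIO(
    '"Url","Status name"\n'
    '"http://example.com/a","db_fetched"\n'
    '"http://example.com/b","db_unfetched"\n'
)
print(list(fetched_urls(sample)))
```

To run it against a real dump, pass an open file object for the CSV file found in dump_crawldb instead of the sample.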
