Re: Print out a list of every URL fetched?

Paul Tomblin Fri, 07 Aug 2009 04:03:48 -0700

Not quite what I want: that will show me every URL that's ever been
crawled, not just the ones fetched this time, and it isn't real-time.



On Fri, Aug 7, 2009 at 3:23 AM, Sebastian Nagel <[email protected]> wrote:
> Hi Paul,
>
> you can use
>
>  $NUTCH_HOME/bin/nutch readdb my_crawl/crawldb/ -dump dump_crawldb/ -format csv
>
> then in dump_crawldb you'll find a CSV file with all URLs in your crawlDb.
> One column indicates the status. Select only those records with "db_fetched"
> and you'll have your list.
>
> Sebastian
>
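
For reference, the filtering step Sebastian describes could be sketched
roughly as below. This is only an illustration: the function name
fetched_urls is made up, the sample rows are invented, and it assumes the
URL is the first column of the dump. It checks every field for the status
rather than relying on a particular column layout. It also still lists
everything in the CrawlDb marked fetched, not just this run's URLs.

```python
import csv
import io


def fetched_urls(csv_text, status="db_fetched"):
    """Return the first column of each row that has a field equal to `status`.

    Assumption: the URL is the first column of the dumped CSV.
    """
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Scan all fields so the exact position of the status column does not matter.
        if any(field.strip() == status for field in row):
            urls.append(row[0])
    return urls


# Invented sample rows, not real readdb output.
sample = (
    '"http://example.com/a",2,"db_fetched"\n'
    '"http://example.com/b",1,"db_unfetched"\n'
)
print(fetched_urls(sample))
```

To run it on a real dump, read the CSV file from dump_crawldb/ into a
string and pass that string to fetched_urls.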



-- 
http://www.linkedin.com/in/paultomblin
