You can dump the segment info to a directory, say "tmps":
$NUTCH_HOME/bin/nutch readseg -dump $segment tmps -nocontent

Then go to that directory, where you should see a file named "dump", and pull the outlink URLs out of it:
grep outlink: dump | cut -f5 -d" " > outlinks
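
If you want each URL only once, something like the following might work as a variation on the grep/cut line. It is only a sketch: it assumes the outlink lines in the dump look like "  outlink: toUrl: <url> anchor: <text>" (which is what the cut -f5 above implies), and the function name extract_outlinks is made up for illustration, not part of Nutch.

```shell
# Hypothetical helper, not a Nutch tool: print the unique outlink URLs
# found in a readseg dump file, one per line, sorted.
# Assumes lines of the form "  outlink: toUrl: <url> anchor: <text>".
extract_outlinks() {
    # awk splits on any run of whitespace, so the leading indentation
    # does not matter; $3 is the URL following "outlink: toUrl:".
    awk '$1 == "outlink:" && $2 == "toUrl:" { print $3 }' "$1" | sort -u
}
```

From inside tmps you would then run: extract_outlinks dump > outlinks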

On Fri, 2009-07-17 at 18:43 +0200, reinhard schwab wrote:
> is any tool available to dump all outlinks (filtered outlinks included)?
> (i know the tools to dump crawldb, linkdb and segments)
> or do i have to implement such a tool and if, how?
> i want to know them to adapt/manage the url filters.
> parse the contents with urlfilters disabled?
> 
> reinhard
