Or you can modify the code from the crawldb reader and get it to dump only
the keys. If your crawldb is large, running a regex over the full dump will
take forever.
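For what it's worth, a rough sketch of the dump-and-filter approach (the file names and the exact dump layout are assumptions here; the readdb dump I've seen puts each record's URL at the start of a line, followed by a tab and metadata such as "Version:"):

```shell
# Sketch only: after running
#   bin/nutch readdb crawldb -dump dumpdir
# extract just the URL keys from the resulting plain-text dump.
# "dump.txt" below stands in for a part file from dumpdir; the record
# layout (URL in column 1, tab, then metadata) is an assumption.
cat > dump.txt <<'EOF'
http://example.com/	Version: 7
Status: 2 (db_fetched)
http://example.org/page	Version: 7
Status: 1 (db_unfetched)
EOF

# Anchoring the match on the first tab-separated field avoids scanning
# every metadata line with a regex.
awk -F'\t' '$1 ~ /^https?:\/\//{print $1}' dump.txt > urls.txt
cat urls.txt
```

The resulting urls.txt can then be fed straight back in as a seed list for a fresh crawldb.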

On 7 June 2011 22:31, Markus Jelsma <[email protected]> wrote:

> Well, you can dump the crawldb using the bin/nutch readdb command. You'd
> still need to parse the output yourself to get a decent list of URLs.
>
> > Hi guys,
> >
> > I was wondering if there is a quick method to dump all URLs of a merged
> > index (i.e. a production index).
> > I want to use them for a 'fresh' seeding of a new crawldb.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
