Or you can modify the code from the crawldb reader and get it to dump only the keys. If your crawldb is large, parsing the full dump with regexes will take forever.
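For the reader-modification route, here is roughly what I mean -- a minimal, untested sketch that skips the readdb text output entirely and reads the crawldb data files directly with the plain Hadoop API. The class name is mine, and it assumes a Nutch 1.x crawldb laid out as MapFiles under <crawldb>/current/part-*/data; check that against your version:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    // Prints only the keys (URLs) of a crawldb, one per line.
    public class CrawlDbKeyDumper {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Assumption: the crawldb keeps its data under <crawldb>/current/part-*/data
        Path current = new Path(args[0], "current");
        for (FileStatus part : fs.listStatus(current)) {
          if (!part.isDir()) continue; // skip stray files like _SUCCESS markers
          Path data = new Path(part.getPath(), "data");
          SequenceFile.Reader reader = new SequenceFile.Reader(fs, data, conf);
          Text key = new Text();
          // Instantiate the value class via reflection so we never deserialize
          // more than we have to and don't need CrawlDatum on the classpath
          Writable value =
              (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
          while (reader.next(key, value)) {
            System.out.println(key); // the key is the URL
          }
          reader.close();
        }
      }
    }

Run it with the crawldb directory as the only argument and redirect stdout to get your seed list.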
On 7 June 2011 22:31, Markus Jelsma <[email protected]> wrote:

> Well, you can dump the crawldb using the bin/nutch readdb command. You'd
> still need to parse the output yourself to get a decent list of URLs.
>
>> Hi guys,
>>
>> I was wondering if there is a quick method to dump all urls of a merged
>> index (ie a production index).
>> I want to use them for a 'fresh' seeding of a new crawldb

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

