Re: How to read content of a particular url from the crawldb?

shjiang Sun, 15 Oct 2006 21:58:30 -0700

Since version 0.8, "bin/nutch segread" has been replaced by "bin/nutch readseg". I tried the command:

bin/nutch readseg -get ./crawl/segments/20061013144233/ http://www.nokia.com.cn/

I can get the entire content of the url.

The problem is that there are several segment directories under ./crawl/segments/. How can I know which segment holds the content of a specified URL?
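
One possible workaround, as a rough sketch only: run the same "readseg -get" command against every segment directory in turn and inspect the output for each one. The function name read_url_from_segments and the NUTCH variable are made up for illustration and are not part of Nutch. What readseg prints for a segment that does not contain the URL is not confirmed here, so the script only labels each segment's output rather than deciding on a match.

```shell
#!/bin/sh
# Sketch: try "readseg -get" on every segment under a segments directory.
# NUTCH and read_url_from_segments are hypothetical names, not Nutch features.
NUTCH="${NUTCH:-bin/nutch}"

read_url_from_segments() {
    segments_dir="$1"   # e.g. ./crawl/segments
    url="$2"            # e.g. http://www.nokia.com.cn/
    for seg in "$segments_dir"/*/; do
        # Skip the unexpanded glob if the directory has no subdirectories.
        [ -d "$seg" ] || continue
        # Label the output so it is clear which segment it came from.
        echo "=== $seg"
        "$NUTCH" readseg -get "$seg" "$url"
    done
}

# Example call (paths taken from the command above):
# read_url_from_segments ./crawl/segments http://www.nokia.com.cn/
```

This starts one Nutch process per segment, so it may be slow when there are many segments.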

You could do it from the command line using bin/nutch segread, or you
could do it in Java by opening map file readers on the directories
called "content" found in each segment.


On 10/15/06, shjiang <[EMAIL PROTECTED]> wrote:

I cannot find any API that supports reading the content of
a specified URL from the crawldb.
