Since version 0.8, "bin/nutch segread" has been replaced by "bin/nutch readseg". I tried the command:
bin/nutch readseg -get ./crawl/segments/20061013144233/ http://www.nokia.com.cn/
and I can get the entire content of the URL.
The problem is that there are several segment directories under ./crawl/segments/. How can I find out which segment holds the content of a specified URL?
You could do it from the command line using bin/nutch segread (renamed
bin/nutch readseg as of version 0.8), or you could do it in Java by
opening map file readers on the directories called "content" found in
each segment.
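
For the command-line route, one possible approach is simply to run the same "readseg -get" command against every segment directory in turn and look at which one returns the page. The sketch below assumes the ./crawl/segments layout and the "readseg -get <segment> <url>" form quoted in this thread; the function name find_url_in_segments and the NUTCH and SEGMENTS_DIR variables are made up for illustration. It does not rely on any particular exit code or output format from readseg, so the output of each segment still has to be inspected by eye.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: try one URL against every segment.
# NUTCH and SEGMENTS_DIR are illustrative variables with assumed defaults.
NUTCH="${NUTCH:-bin/nutch}"
SEGMENTS_DIR="${SEGMENTS_DIR:-./crawl/segments}"

find_url_in_segments() {
  url="$1"
  for seg in "$SEGMENTS_DIR"/*/; do
    # Skip the unexpanded pattern when there are no segment directories.
    [ -d "$seg" ] || continue
    # Print a header so the output that follows can be tied to its segment.
    echo "=== segment: $seg"
    "$NUTCH" readseg -get "$seg" "$url"
  done
}

# Example call (run from the Nutch installation directory):
#   find_url_in_segments "http://www.nokia.com.cn/"
```

This starts one Nutch process per segment, so it may be slow on a crawl with many segments; the Java route mentioned above avoids that cost.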

On 10/15/06, shjiang <[EMAIL PROTECTED]> wrote:
I cannot find any API that supports reading the content of
a specified URL from the crawldb.



