Since version 0.8, "bin/nutch segread" has been replaced by
"bin/nutch readseg".
I tried the command:
bin/nutch readseg -get ./crawl/segments/20061013144233/ http://www.nokia.com.cn/
and I can get the entire content of the URL.
The problem is that there are several segment directories under
./crawl/segments/. How can I find out which segment holds the content
of a specified URL?
You could do it from the command line using bin/nutch segread, or you
could do it in Java by opening map file readers on the directories
called "content" found in each segment.
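For the "which segment" question, one command-line approach is to run the readseg command from above once per segment directory. The program below is a minimal sketch of that, not part of Nutch: the class SegmentCommands and its methods are hypothetical names. It assumes segment directories are named with fixed-width timestamps such as 20061013144233, and that the printed commands are run from the Nutch install directory so that "bin/nutch" resolves. It only builds and prints the per-segment commands, newest segment first. It does not open the segment data itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): builds one
 * "bin/nutch readseg -get <segment> <url>" command per segment directory.
 */
public class SegmentCommands {

    /** Returns the subdirectories of segmentsDir, newest segment first. */
    static List<Path> listSegments(Path segmentsDir) throws IOException {
        List<Path> segments = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentsDir)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) { // skip stray files
                    segments.add(p);
                }
            }
        }
        // Assumption: names are fixed-width timestamps (e.g. 20061013144233),
        // so reverse lexicographic order is newest-first chronological order.
        segments.sort((a, b) -> b.getFileName().toString()
                .compareTo(a.getFileName().toString()));
        return segments;
    }

    /** Builds the argument list of one readseg call per segment. */
    static List<List<String>> readsegCommands(Path segmentsDir, String url) throws IOException {
        List<List<String>> commands = new ArrayList<>();
        for (Path segment : listSegments(segmentsDir)) {
            commands.add(Arrays.asList("bin/nutch", "readseg", "-get", segment.toString(), url));
        }
        return commands;
    }

    public static void main(String[] args) throws IOException {
        Path segmentsDir = Paths.get(args.length > 0 ? args[0] : "./crawl/segments");
        String url = args.length > 1 ? args[1] : "http://www.nokia.com.cn/";
        // Print one shell command per line; these can be run or piped to sh.
        for (List<String> cmd : readsegCommands(segmentsDir, url)) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Newest-first ordering is a design choice. If a URL was fetched more than once, it can appear in several segments, and the most recent fetch is listed first. Doing the lookup inside Java with map file readers on each segment's "content" directory would avoid starting one process per segment, but that depends on the Hadoop/Nutch classes of your installed version.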
On 10/15/06, shjiang <[EMAIL PROTECTED]> wrote:
I cannot find any API that supports reading the content of a
specified URL from the crawldb.