That's a good point. I think the Lucene index might be the only place where that information is stored. If you really needed it, I guess you could build your own mapping of URLs to segments. However, I am not that familiar with Nutch, so I will let someone with more experience answer this.
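Something along these lines might work as a starting point, though it is completely untested and just a sketch of the idea: each segment keeps the fetched pages in MapFiles under its "content" directory, keyed by URL, so you can probe every segment for the URL and see which one has it. I'm assuming the Nutch 0.8-era Hadoop APIs here, a crawl directory on local disk, and that the keys are stored as Text (older versions used UTF8), so treat the class and method names as things to verify against your version, not a recipe.

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.util.NutchConfiguration;

/** Probe every segment's content MapFiles for a given URL (local crawl dir assumed). */
public class FindUrlSegment {
  public static void main(String[] args) throws Exception {
    String url = args[0];                       // e.g. http://www.nokia.com.cn/
    File segmentsDir = new File("crawl/segments");

    Configuration conf = NutchConfiguration.create();
    FileSystem fs = FileSystem.getLocal(conf);  // assuming the crawl lives on local disk

    for (File segment : segmentsDir.listFiles()) {
      File contentDir = new File(segment, "content");
      if (!contentDir.isDirectory()) continue;  // skip anything that isn't a segment
      // The content data is split into part-NNNNN MapFiles, each keyed by URL.
      for (File part : contentDir.listFiles()) {
        MapFile.Reader reader = new MapFile.Reader(fs, part.getPath(), conf);
        Content content = new Content();
        if (reader.get(new Text(url), content) != null) {
          System.out.println(url + " is in segment " + segment.getName());
          System.out.println(content);          // the stored Content record
        }
        reader.close();
      }
    }
  }
}

If you only need the answer once, simply running bin/nutch readseg -get against each segment directory in turn and seeing which one returns the page amounts to the same thing.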
On 10/15/06, shjiang <[EMAIL PROTECTED]> wrote:
> From version 0.8, "bin/nutch segread" has been replaced by
> "bin/nutch readseg".
> I tried the command: bin/nutch readseg -get
> ./crawl/segments/20061013144233/ http://www.nokia.com.cn/
> and I can get the entire content of the URL.
> The problem is that there are several segment directories under
> ./crawl/segments/, so how can I know which segment holds the content
> of a specified URL?
>
> > You could do it from the command line using bin/nutch segread, or you
> > could do it in Java by opening map file readers on the directories
> > called "content" found in each segment.
> >
> > On 10/15/06, shjiang <[EMAIL PROTECTED]> wrote:
> >> I cannot find any API that supports reading the content of a
> >> specified URL from the crawldb.
