Re: Format of the Nutch Results
Thank you very much! I tried the command as you suggested, but I still have some problems. As far as I understand, it has something to do with JAVA_HOME, which I have already defined, and I have checked the integrity of the file. I'm attaching a screenshot of the problem; maybe someone knows what I'm doing wrong. Thanks in advance.

http://n3.nabble.com/forum/FileDownload.jtp?type=n&id=742376&name=untitled1.bmp

--
View this message in context: http://lucene.472066.n3.nabble.com/Format-of-the-Nutch-Results-tp729918p742376.html
Sent from the Nutch - User mailing list archive at Nabble.com.
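The actual error is only visible in the screenshot, so the following is just a sketch for ruling out a JAVA_HOME problem before running bin/nutch. It assumes a Bash shell (for example Cygwin on Windows); the function name check_java_home is made up for illustration and is not part of Nutch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch): checks that JAVA_HOME is set
# and that a java launcher exists underneath it.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  # Accept both a Unix-style launcher and a Windows-style java.exe.
  if [ -x "$JAVA_HOME/bin/java" ] || [ -x "$JAVA_HOME/bin/java.exe" ]; then
    echo "JAVA_HOME looks usable: $JAVA_HOME"
    return 0
  fi
  echo "No java launcher found under $JAVA_HOME/bin"
  return 1
}
```

Calling check_java_home in the same shell session where bin/nutch is started shows whether the variable is really visible there; a value set only in the Windows environment dialog may not be what the Bash session sees.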
Re: Format of the Nutch Results
I think you need to specify the individual segment:

    bin/nutch readseg -dump crawl-20100420112025/segments/20100422092816 dumpSegmentDirectory

On Wed, Apr 21, 2010 at 9:38 PM, nachonieto3 wrote:
> Thanks a lot! I'm working on that now, but I have some more questions. I'm
> not able to run the readseg command. I've been consulting some help forums,
> and the basic syntax is
>     readseg
> I have the segments in this path:
>     D:\nutch-0.9\crawl-20100420112025\segments
> The directory named crawl-20100420112025 is the one where the segments are
> stored. I've tried executing the command in these ways, but none of them
> works:
>     readseg d/nutch-0.9/crawl-20100420112025/segments
>     readseg crawl-20100420112025/segments
>     readseg crawl-20100420112025
>
> What am I doing wrong? When I try to execute it, I get
>     bash: readseg: command not found
> Any ideas? Thank you in advance.
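To avoid typing the segment name by hand, something like the sketch below could pick the newest segment under a crawl directory. It assumes a Bash shell (for example Cygwin, where D:\nutch-0.9 is reachable as /cygdrive/d/nutch-0.9); the function name latest_segment is hypothetical and not a Nutch command. Note that readseg is a subcommand of the bin/nutch script, which is why calling readseg on its own gives "command not found".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch): prints the newest segment
# directory under <crawl_dir>/segments.
# Segment names are timestamps such as 20100422092816, so the sorted glob
# order is also chronological and the last match is the newest one.
latest_segment() {
  local segments_dir="$1/segments"
  local newest="" d
  for d in "$segments_dir"/*/; do
    [ -d "$d" ] || continue   # skips the unexpanded pattern when nothing matches
    newest="${d%/}"           # drop the trailing slash left by the glob
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Usage, run from the Nutch installation directory:
#   segment=$(latest_segment crawl-20100420112025)
#   bin/nutch readseg -dump "$segment" dumpSegmentDirectory
```

The dump is written in plain text into the directory given as the last argument (dumpSegmentDirectory here).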
Re: Format of the Nutch Results
Thanks a lot! I'm working on that now, but I have some more questions. I'm not able to run the readseg command. I've been consulting some help forums, and the basic syntax is

    readseg

I have the segments in this path:

    D:\nutch-0.9\crawl-20100420112025\segments

The directory named crawl-20100420112025 is the one where the segments are stored. I've tried executing the command in these ways, but none of them works:

    readseg d/nutch-0.9/crawl-20100420112025/segments
    readseg crawl-20100420112025/segments
    readseg crawl-20100420112025

What am I doing wrong? When I try to execute it, I get

    bash: readseg: command not found

Any ideas? Thank you in advance.
Re: Format of the Nutch Results
Try running bin/nutch on the console. It will give you a list of commands. You can use them to read segments, e.g. bin/nutch readdb ..

On Mon, Apr 19, 2010 at 11:36 PM, nachonieto3 wrote:
> I have a question: how are the final results of Nutch stored? I mean, in
> which format is the information from the analyzed links stored?
>
> I understand that Nutch needs the information in plain text to parse it,
> but in which format is it finally stored? I know it is stored in
> "segments", but how can I access this information in order to convert it
> to plain text? Is it possible?
>
> Thank you in advance