Re: Format of the Nutch Results

2010-04-22 Thread nachonieto3

Thank you very much! I've tried the command as you suggested, but I still
have some problems. As far as I understand, it has something to do with
JAVA_HOME, which I've already defined, and I've checked the integrity of the
file. I'm attaching a screenshot of the problem; maybe someone knows what I'm
doing wrong.

Thanks in advance
http://n3.nabble.com/forum/FileDownload.jtp?type=n&id=742376&name=untitled1.bmp 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Format-of-the-Nutch-Results-tp729918p742376.html
Sent from the Nutch - User mailing list archive at Nabble.com.
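
Since the problem above seems related to JAVA_HOME, a quick sanity check from
the same shell may help narrow it down. This is only a sketch, assuming a
POSIX shell such as the bash used in this thread. check_java_home is a
hypothetical helper name, not part of Nutch; it only tests that the variable
is set and that a java executable sits under its bin directory.

```shell
#!/bin/sh
# check_java_home: report whether JAVA_HOME looks usable from this shell.
# (Hypothetical helper for illustration; not shipped with Nutch.)
check_java_home() {
    # The variable must be set and non-empty.
    if [ -z "$JAVA_HOME" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    # There must be an executable java launcher under $JAVA_HOME/bin.
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java under $JAVA_HOME/bin"
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
}
```

After defining the function in the shell that will run bin/nutch, calling
check_java_home prints one of the three messages. It does not prove that Nutch
will start, only that the variable points somewhere plausible.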


Re: Format of the Nutch Results

2010-04-21 Thread Harry Nutch
I think you need to specify the individual segment:
bin/nutch readseg -dump crawl-20100420112025/segments/20100422092816 dumpSegmentDirectory
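
The suggestion above can be wrapped in a small script so the segment name
does not have to be typed by hand. This is a rough sketch, not an official
Nutch tool: latest_segment and dump_latest are hypothetical helper names, and
it assumes the script is run from the Nutch installation directory and that
segment directories are named by timestamp, as in the path above.

```shell
#!/bin/sh
# latest_segment CRAWL_DIR: print the path of the newest segment
# under CRAWL_DIR/segments, or fail if there is none.
latest_segment() {
    crawl_dir=$1
    # Segment names are timestamps (e.g. 20100422092816), so the
    # lexicographically last name is also the most recent one.
    name=$(ls -1 "$crawl_dir/segments" | sort | tail -n 1)
    [ -n "$name" ] || return 1
    echo "$crawl_dir/segments/$name"
}

# dump_latest CRAWL_DIR OUT_DIR: dump the newest segment as text into OUT_DIR,
# using the same readseg -dump form quoted in this thread.
dump_latest() {
    segment=$(latest_segment "$1") || return 1
    bin/nutch readseg -dump "$segment" "$2"
}
```

For example, dump_latest crawl-20100420112025 dumpSegmentDirectory would run
the readseg command shown above against whichever segment is newest. As far
as I know, the text ends up in a file named dump inside the output directory.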

On Wed, Apr 21, 2010 at 9:38 PM, nachonieto3 wrote:

>
> Thanks a lot! I'm working on that now, but I have a few more questions. I'm
> not able to run the readseg command. I've been consulting some help forums,
> and the basic syntax is
> readseg
> I have the segments in this path:
> D:\nutch-0.9\crawl-20100420112025\segments
> The directory named crawl-20100420112025 is the one where the segments are
> stored. I'm trying to execute the command in the following ways, but none of
> them works:
> readseg d/nutch-0.9/crawl-20100420112025/segments
> readseg crawl-20100420112025/segments
> readseg crawl-20100420112025
>
> What am I doing wrong? When I try to execute it I get "bash: readseg: command
> not found".
> Any ideas? Thank you in advance.


Re: Format of the Nutch Results

2010-04-21 Thread nachonieto3

Thanks a lot! I'm working on that now, but I have a few more questions. I'm
not able to run the readseg command. I've been consulting some help forums,
and the basic syntax is
readseg
I have the segments in this path: D:\nutch-0.9\crawl-20100420112025\segments
The directory named crawl-20100420112025 is the one where the segments are
stored. I'm trying to execute the command in the following ways, but none of
them works:
readseg d/nutch-0.9/crawl-20100420112025/segments
readseg crawl-20100420112025/segments
readseg crawl-20100420112025

What am I doing wrong? When I try to execute it I get "bash: readseg: command
not found".
Any ideas? Thank you in advance.


Re: Format of the Nutch Results

2010-04-20 Thread Harry Nutch
Try running bin/nutch on the console.

It will give you a list of commands. You can use them to read the segments, e.g.
bin/nutch readdb ..
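
To illustrate the advice above: readseg and readdb are subcommands of the
bin/nutch launcher script, not standalone programs, which would explain the
"command not found" error when readseg is typed on its own. The wrapper below
is a minimal sketch under that assumption. run_nutch and NUTCH_DIR are
hypothetical names introduced here, not part of Nutch.

```shell
#!/bin/sh
# run_nutch SUBCOMMAND [ARGS...]: run a Nutch subcommand through bin/nutch.
# NUTCH_DIR (hypothetical variable) names the Nutch installation directory;
# it defaults to the current directory.
run_nutch() {
    launcher="${NUTCH_DIR:-.}/bin/nutch"
    # Fail early with a hint instead of a bare "command not found".
    if [ ! -x "$launcher" ]; then
        echo "cannot run $launcher; cd into the Nutch directory or set NUTCH_DIR" >&2
        return 1
    fi
    # Pass the subcommand and its arguments through unchanged.
    "$launcher" "$@"
}
```

Called with no arguments, run_nutch should print the list of commands
mentioned above. Something like
run_nutch readdb crawl-20100420112025/crawldb -stats
would then summarise the crawl database, assuming the crawl directory
contains the usual crawldb subdirectory.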

On Mon, Apr 19, 2010 at 11:36 PM, nachonieto3 wrote:

>
> I have a question: how are the final results of Nutch stored? I mean, in
> which format is the information from the analyzed links stored?
>
> I understand that Nutch needs the information in plain text to parse it, but
> in which format is it finally stored? I know it is stored in "segments", but
> how can I access this information in order to convert it to plain text? Is
> that possible?
>
> Thank you in advance