To convert Nutch's crawled data, which is stored in segments, into a
human-readable form, look at the 'readseg' command (called 'segread' in
earlier releases). It reads the segment data and exports it.

Details at Nutch Wiki:
http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_segread
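
As a rough illustration, here is a small Python sketch that builds the
command line for a segment dump. It assumes the segment reader is invoked
as `bin/nutch readseg` (if your release names it `segread`, substitute
that). The install path, segment name and output directory are made-up
placeholders, and the helper function is hypothetical, not part of Nutch.

```python
import subprocess


def readseg_dump_command(nutch_home, segment_dir, output_dir, text_only=False):
    """Build the argv list for `bin/nutch readseg -dump <segment> <output>`."""
    cmd = [
        f"{nutch_home}/bin/nutch",
        "readseg",
        "-dump",
        str(segment_dir),
        str(output_dir),
    ]
    if text_only:
        # Skip every part of the segment except the parsed text.
        cmd += ["-nocontent", "-nofetch", "-nogenerate", "-noparse", "-noparsedata"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths: point these at your own install and segment.
    cmd = readseg_dump_command(
        "/opt/nutch", "crawl/segments/20100419211500", "segdump", text_only=True
    )
    print(" ".join(cmd))
    # Uncomment to actually run it against a real Nutch install:
    # subprocess.run(cmd, check=True)
```

Once the command has run, the plain-text dump should appear under the
output directory. Drop the text_only flag to keep the fetch and parse
metadata in the dump as well.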

- Ankit Dangi


On Mon, Apr 19, 2010 at 9:15 PM, nachonieto3 <jinietosanc...@gmail.com> wrote:

>
> I have a question related to this topic (I guess)... How are the final
> results of Nutch stored? I mean, in which format is the information
> contained in the analyzed links stored?
>
> I understand that Nutch needs the information in plain text to parse it...
> but in which format is it finally stored? I know it is stored in
> "segments", but how can I access this information in order to convert it
> to plain text? Is it possible?
>
> Thank you in advance
> --
> View this message in context:
> http://n3.nabble.com/how-to-parse-html-files-while-crawling-tp706816p729943.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Ankit Dangi
