To convert Nutch's crawled data, which is stored in segments, into a human-readable form, look at the 'readseg' command (called 'segread' in earlier releases). It reads the segment data and exports it.
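As a rough sketch of what that looks like in practice, the Python snippet below builds and runs the segment dump command. It assumes a Nutch 1.x install where the command is spelled `readseg` and accepts `-dump <segment_dir> <output_dir>` plus the `-no...` switches; check the usage text printed by your own `bin/nutch` first. The helper names, the install path and the segment path are made up for illustration.

```python
import subprocess
from pathlib import Path

# Switches that leave one part of the segment out of the dump.
# Assumption: these match the usage text of the Nutch 1.x segment reader.
SKIP_FLAGS = {
    "content": "-nocontent",
    "fetch": "-nofetch",
    "generate": "-nogenerate",
    "parse": "-noparse",
    "parsedata": "-noparsedata",
    "parsetext": "-noparsetext",
}


def build_readseg_command(nutch_home, segment_dir, output_dir, skip=()):
    """Return the argument list for dumping one segment (hypothetical helper)."""
    cmd = [
        str(Path(nutch_home) / "bin" / "nutch"),
        "readseg",
        "-dump",
        str(segment_dir),
        str(output_dir),
    ]
    # Append one switch per part the caller wants to leave out.
    for part in skip:
        cmd.append(SKIP_FLAGS[part])
    return cmd


def dump_segment(nutch_home, segment_dir, output_dir, skip=()):
    """Run the dump and return the path where the text file is expected."""
    subprocess.run(
        build_readseg_command(nutch_home, segment_dir, output_dir, skip),
        check=True,  # raise if the nutch script exits with an error
    )
    # Assumption: the dump is written to a file named "dump" in output_dir.
    return Path(output_dir) / "dump"


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own install and segment.
    print(dump_segment("/opt/nutch", "crawl/segments/20100419211500",
                       "segdump", skip=("content",)))
```

Skipping the raw content, as in the usage line at the bottom, keeps the dump smaller and leaves mainly the parsed text and metadata.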
Details are on the Nutch Wiki: http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch_segread

- Ankit Dangi

On Mon, Apr 19, 2010 at 9:15 PM, nachonieto3 <jinietosanc...@gmail.com> wrote:

> I have a question related to this topic (I guess)... How are the final
> results of Nutch stored? I mean, in which format is the information
> contained in the analyzed links stored?
>
> I understand that Nutch needs the information in plain text to parse it,
> but in which format is it finally stored? I know it is stored in
> "segments", but how can I access this information in order to convert it
> to plain text? Is it possible?
>
> Thank you in advance
> --
> View this message in context:
> http://n3.nabble.com/how-to-parse-html-files-while-crawling-tp706816p729943.html
> Sent from the Nutch - User mailing list archive at Nabble.com.