I have a question related to this topic (I think). How are the final results of Nutch stored? I mean, in which format is the information from the analyzed links stored?
I understand that Nutch needs the information as plain text in order to parse it, but in which format is it finally stored? I know it is stored in "segments", but how can I access that information in order to convert it to plain text? Is it possible?

Thank you in advance

--
View this message in context: http://n3.nabble.com/how-to-parse-html-files-while-crawling-tp706816p729943.html
Sent from the Nutch - User mailing list archive at Nabble.com.