I have a question related to this topic (I guess)... How are the final results
of Nutch stored? I mean, in which format is the information contained in the
analyzed links stored?

I understand that Nutch needs the information in plain text to parse it... but
in which format is it finally stored? I know it is stored in "segments", but
how can I access this information in order to convert it to plain text? Is it
possible?

Thank you in advance 
-- 
View this message in context: 
http://n3.nabble.com/how-to-parse-html-files-while-crawling-tp706816p729943.html
Sent from the Nutch - User mailing list archive at Nabble.com.
