I have solved my problem with the other formats, and now I'm trying to solve another one. I can parse .html files, but the ParseText comes back as a single line. I would like to preserve at least the paragraph breaks of the original document. Does anyone know how to do this?

Thank you in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Parsing-html-tp776487p776487.html
Sent from the Nutch - User mailing list archive at Nabble.com.
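
For illustration, the behavior being asked for can be sketched roughly as follows in plain Java. This is only a sketch of the desired output, not Nutch's parser API: the class name ParagraphText, the method toText, and the list of block-level tags are all assumptions made for the example, and a regex-based approach like this will not cope with every real-world HTML page.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): turns block-level tag
// boundaries into line breaks before stripping the remaining markup.
final class ParagraphText {
    // Assumed, incomplete list of tags that should start a new line.
    private static final Pattern BLOCK =
        Pattern.compile("(?i)</?(p|div|br|h[1-6]|li|tr)\\b[^>]*>");
    // Any other tag is simply removed.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String toText(String html) {
        // 1. Replace block-level tags with a newline.
        String s = BLOCK.matcher(html).replaceAll("\n");
        // 2. Drop all remaining tags (inline markup, html, body, ...).
        s = TAG.matcher(s).replaceAll("");
        // 3. Collapse runs of spaces and tabs into one space.
        s = s.replaceAll("[ \\t]+", " ");
        // 4. Collapse consecutive line breaks into a single one.
        s = s.replaceAll(" ?\\n[ \\n]*", "\n");
        return s.trim();
    }
}
```

With this sketch, an input such as `<p>First paragraph.</p><p>Second <b>one</b>.</p>` would come out as two lines, one per paragraph, instead of a single line.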