Thanks, Markus! After some testing and walking the DocumentFragment, I see that all I get is a single node: <html> some content here and here </html>
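A walk like this can be sketched roughly as follows. This is only an illustration, not the exact code in use: the class name `FragmentWalker`, its methods, and the hand-built sample fragment are made up for the example (a real fragment would come from the parser), and only standard `org.w3c.dom` and `javax.xml.parsers` calls are used.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: dumps the node tree of a DocumentFragment.
public class FragmentWalker {

    // Recursively append one line per node, indented two spaces per level.
    public static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Text nodes carry their content in the node value.
            out.append(": ").append(node.getNodeValue().trim());
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Return the whole tree as a string, starting at the fragment itself.
    public static String describe(DocumentFragment fragment) {
        StringBuilder out = new StringBuilder();
        walk(fragment, 0, out);
        return out.toString();
    }

    // Assumption: a hand-built stand-in for the parser output described above,
    // i.e. a single <html> element that holds only text.
    public static DocumentFragment sampleFragment() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            DocumentFragment fragment = doc.createDocumentFragment();
            Element html = doc.createElement("html");
            html.appendChild(doc.createTextNode(" some content here and here "));
            fragment.appendChild(html);
            return fragment;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describe(sampleFragment()));
    }
}
```

Run against the sample fragment, the dump shows the fragment node, one `html` element, and one text node under it, with no heading or paragraph elements in between.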
I expected to see more structure from a PDF/Word document (H1 tags, etc.) that would make the XHTML output more readable. Am I missing something? Do I have to do anything special to the DocumentFragment to format it? Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Cached-page-like-google-with-hits-highlighted-tp4001374p4001434.html
Sent from the Nutch - User mailing list archive at Nabble.com.