Thanks Markus!

So after some testing and walking the DocumentFragment, I see that all I get
is one node:
<html>
some content here and here
</html>

I expected to see more structure from a PDF/Word document (H1 tags and so on),
which would make the XHTML output more readable.

Am I missing something? Do I have to do anything special to the
DocumentFragment to format it?
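
In case it helps, here is a minimal sketch of the kind of walk I mean, using only
the standard org.w3c.dom API. The class name FragmentDump and the method outline
are made up for illustration, and main builds a stand-in fragment by hand rather
than taking one from a real Nutch parse:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch: prints the node tree of a fragment.
final class FragmentDump {

    /** Returns an indented outline of the subtree rooted at node, one line per node. */
    static String outline(Node node) {
        StringBuilder out = new StringBuilder();
        append(node, 0, out);
        return out.toString();
    }

    private static void append(Node node, int depth, StringBuilder out) {
        // Indent two spaces per tree level.
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        // For text nodes, also show the (trimmed) text itself.
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(" \"").append(node.getNodeValue().trim()).append('"');
        }
        out.append('\n');
        // Recurse into children in document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            append(child, depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the fragment I get back: a single <html> element with text.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        DocumentFragment fragment = doc.createDocumentFragment();
        Element html = doc.createElement("html");
        html.appendChild(doc.createTextNode("some content here and here"));
        fragment.appendChild(html);
        System.out.print(outline(fragment));
    }
}
```

If the parser had produced headings or paragraphs, they would appear as extra
indented element lines under html in this outline.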

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cached-page-like-google-with-hits-highlighted-tp4001374p4001434.html
Sent from the Nutch - User mailing list archive at Nabble.com.
