PDF2XHTML is already being loaded by the PDF parser. However, something is not adding it to the DocumentFragment, and I can't seem to find out where. Any other ideas? I don't want to run Tika separately during the parse step just to get the XHTML (that seems silly), but I will if I absolutely have to.
--
View this message in context: http://lucene.472066.n3.nabble.com/Cached-page-like-google-with-hits-highlighted-tp4001374p4003801.html
Sent from the Nutch - User mailing list archive at Nabble.com.