We are using the parse-pdf plugin to parse PDF documents. We modified
cached.jsp to display the parsed content instead of the link to the
cached document, and we call bean.getParseText(details) to get the
parsed text of the cached PDF. However, the output on cached.jsp is
hard to read because the parsed text carries no formatting
information. Is there anything in Nutch that can display cached PDF
documents with proper formatting, or at least basic structure such as
headers and paragraphs?
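
To illustrate the "at least some formatting" case, here is a minimal
sketch of the kind of post-processing that could be applied to the
plain parsed text before it is written to the page. It is not part of
Nutch: the class PlainTextToHtml and its methods are hypothetical
names, and it assumes the parsed text still contains blank lines
between paragraphs, which the PDF parser may not guarantee. It cannot
recover headers, since that information is not in the plain text.

```java
// Hypothetical helper, not a Nutch class: turns plain text into simple HTML.
class PlainTextToHtml {

    // Escape characters that have a special meaning in HTML.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: paragraphs are separated by one or more blank lines.
    // Each paragraph becomes a <p> element; single newlines become <br/>.
    static String toHtmlParagraphs(String text) {
        if (text == null) {
            return "";
        }
        // Normalize Windows and old Mac line endings to '\n'.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        StringBuilder html = new StringBuilder();
        for (String block : normalized.split("\\n\\s*\\n")) {
            String trimmed = block.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty fragments from leading/trailing blank lines
            }
            html.append("<p>")
                .append(escape(trimmed).replace("\n", "<br/>"))
                .append("</p>\n");
        }
        return html.toString();
    }
}
```

In cached.jsp, the string obtained from bean.getParseText(details)
would be passed through toHtmlParagraphs before being printed. If the
parser collapses all whitespace, this produces a single paragraph and
does not help.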