On Wed, 15 Jul 2015, Nazar Hussain wrote:
The problem I am facing is with pages. I can extract total pages from document metadata. But I can't find any way to extract content per page from the document.

What file formats is this for? And how are you calling Tika?

If the file format is page-based, e.g. PDF or PPT, then the HTML you get back should have each page separated, IIRC by a div per page.

If the file format isn't a page-based one, and no page information is available in the file, then there won't be page information in the HTML as Tika isn't able to render the document to spot page breaks.
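
For the page-based case, here is a rough sketch of splitting the XHTML by those per-page divs. It only uses the Python standard library and runs on a hand-written sample rather than real Tika output. The class name "page" and the function name split_pages are assumptions for illustration; check the markup you actually get back, since other parsers may label their divs differently.

```python
import xml.etree.ElementTree as ET

# Tika's XHTML output is normally in the XHTML namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def split_pages(xhtml, page_class="page"):
    """Return the text of each per-page <div> found in the XHTML string.

    page_class is an assumption: adjust it to whatever class your
    parser output really uses for its per-page divs.
    """
    root = ET.fromstring(xhtml)
    pages = []
    for elem in root.iter():
        # Accept <div> with or without the XHTML namespace.
        is_div = elem.tag in ("div", XHTML_NS + "div")
        if is_div and page_class in elem.get("class", "").split():
            # Join all text nodes, then collapse runs of whitespace.
            text = " ".join(" ".join(elem.itertext()).split())
            pages.append(text)
    return pages


# Hand-written sample standing in for the XHTML Tika would return.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>demo</title></head>
<body>
<div class="page"><p>First page text.</p></div>
<div class="page"><p>Second page</p><p>text.</p></div>
</body></html>"""

pages = split_pages(sample)
for number, text in enumerate(pages, start=1):
    print(number, text)
```

If the list comes back empty, the document probably has no per-page divs, which is what you would expect for a format that isn't page-based.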

Nick
