On Wed, 15 Jul 2015, Nazar Hussain wrote:
> The problem I am facing is with pages. I can extract total pages from
> document metadata. But I can't find any way to extract content per page
> from the document.

What file formats is this for? And how are you calling Tika?

If the file format is page-based, e.g. PDF or PPT, then the HTML you get
back should have each page separated, IIRC by a div per page.
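
A rough sketch of how you might split that output on the consuming side,
assuming you already have Tika's XHTML as a string. The class name "page"
is my recollection of what the PDF parser emits and should be checked
against your actual output (other formats may use a different class);
split_pages and the SAMPLE markup are made up for illustration and are
not part of Tika.

```python
import xml.etree.ElementTree as ET
from typing import List

# Tika's XHTML output is namespaced, so element tags carry this prefix.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def split_pages(xhtml: str, page_class: str = "page") -> List[str]:
    """Return the text of each <div class="..."> page wrapper, in order.

    page_class is an assumption: inspect your own output to confirm
    which class the parser for your format actually uses.
    """
    root = ET.fromstring(xhtml)
    pages = []
    for div in root.iter(XHTML_NS + "div"):
        if page_class in div.get("class", "").split():
            # Join all text nodes, then collapse runs of whitespace.
            text = " ".join(" ".join(div.itertext()).split())
            pages.append(text)
    return pages


# Hand-written stand-in for Tika output, not captured from a real run.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>sample</title></head>
<body>
<div class="page"><p>First page text.</p></div>
<div class="page"><p>Second page,</p><p>two paragraphs.</p></div>
</body>
</html>"""

if __name__ == "__main__":
    for number, text in enumerate(split_pages(SAMPLE), start=1):
        print(number, text)
```

If the list comes back empty, the output has no per-page wrappers with
that class, which is what you would see for a format with no page
information.
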
If the file format isn't a page-based one, and no page information is
available in the file, then there won't be page information in the HTML,
because Tika doesn't render the document and so can't spot page breaks.

Nick