Re: Per Page Document Content

Nick Burch Wed, 15 Jul 2015 02:00:58 -0700

On Wed, 15 Jul 2015, Nazar Hussain wrote:

The problem I am facing is with pages. I can extract total pages from document metadata. But I can't find any way to extract content per page from the document.


What file formats is this for? And how are you calling Tika?

If the file format is page-based, e.g. PDF or PPT, then the HTML you get back should have each page separated, IIRC by a div per page.
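
As a rough sketch of what that could look like on the consuming side: assuming the XHTML output wraps each page in <div class="page"> (this is what I'd expect for PDF, but the class name is an assumption here and other formats may mark pages or slides differently, so check your actual output), something like the following would split the text per page. It uses only the Python standard library and works on the XHTML string, however you obtained it from Tika. The names PageSplitter and split_pages are made up for this example.

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Collect the text found inside each <div class="page"> element.

    The "page" class name is an assumption about the XHTML being parsed.
    """

    def __init__(self):
        super().__init__()
        self.pages = []   # text of each completed page, in document order
        self._depth = 0   # div nesting depth inside the current page (0 = outside)
        self._buf = []    # text fragments of the page being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # A nested div inside a page: track it so we find the right close.
            self._depth += 1
        elif "page" in (dict(attrs).get("class") or "").split():
            # A new page starts here.
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # The page div itself has closed: store its text.
                self.pages.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def split_pages(xhtml):
    """Return a list with one text string per page div found in xhtml."""
    parser = PageSplitter()
    parser.feed(xhtml)
    parser.close()
    return parser.pages


# Hand-written sample input, not real Tika output.
sample = (
    '<html><body>'
    '<div class="page"><p>First page text</p></div>'
    '<div class="page"><p>Second page text</p></div>'
    '</body></html>'
)

for number, text in enumerate(split_pages(sample), start=1):
    print(number, text)
```

If the input has no such divs, split_pages returns an empty list, which is a cheap way to detect that no page information came through.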

If the file format isn't a page-based one, and no page information is available in the file, then there won't be page information in the HTML as Tika isn't able to render the document to spot page breaks.


Nick
