On Wed, 15 Jul 2015, Nazar Hussain wrote:
@Matt. I am looking for plain text extraction, no CSS or XPath. I just want to extract text per page, so that I end up with an array of plain-text strings in which each index holds the content of a single page.
You won't be able to do it in the plain-text space. You'll need to extract as XHTML, split into pages based on the page divs, then down-convert the XHTML for each page into plain text.
Once you only have the plain text, you've lost the page-break information. That's only there in the XHTML.
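Roughly, the split-then-down-convert step could look like the sketch below. It is only an illustration, written in Python with the standard library, and it assumes the XHTML marks each page as <div class="page">...</div>; check your own output for the actual markup. The names PageTextSplitter and split_pages are made up for this example.

```python
from html.parser import HTMLParser


class PageTextSplitter(HTMLParser):
    """Collect the text inside each <div class="page"> as one string.

    Assumption: pages are marked with <div class="page">; adjust the
    class name if your XHTML uses something else.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pages = []      # finished pages, one plain-text string each
        self._chunks = None  # text pieces of the current page, or None
        self._depth = 0      # <div> nesting depth inside the current page

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._chunks is None:
            classes = (dict(attrs).get("class") or "").split()
            if "page" in classes:
                # A new page starts here.
                self._chunks = []
                self._depth = 1
        else:
            # Nested div inside a page: track it so we find the right close.
            self._depth += 1

    def handle_endtag(self, tag):
        if self._chunks is None:
            return
        if tag in ("p", "div"):
            # Keep block boundaries as line breaks in the plain text.
            self._chunks.append("\n")
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:
                raw = "".join(self._chunks)
                lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
                self.pages.append("\n".join(lines))
                self._chunks = None

    def handle_data(self, data):
        # Text outside any page div is ignored.
        if self._chunks is not None:
            self._chunks.append(data)


def split_pages(xhtml):
    """Return a list of plain-text strings, one per page div."""
    parser = PageTextSplitter()
    parser.feed(xhtml)
    parser.close()
    return parser.pages
```

Calling split_pages(xhtml) on the extracted XHTML string then gives the array you describe, with index 0 holding the text of the first page div. Anything outside a page div (metadata, for instance) is dropped by this sketch.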
Nick
