Re: Per Page Document Content

Nick Burch Wed, 15 Jul 2015 07:14:04 -0700

On Wed, 15 Jul 2015, Nazar Hussain wrote:

@Matt. I am looking for plain text extraction, no css or xpath. I just want to extract text per page. So I would have an array of plain text content in which each index holds the content of a single page.

You won't be able to do it in the plain-text space. You'll need to extract as XHTML, split into pages based on the page divs, then down-convert the XHTML for each page into plain text.

If you have the plain text, then you've lost the page-break information. That's only there in the XHTML.
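
As a rough illustration of the split-then-down-convert step, here is a minimal sketch in Python. It assumes the XHTML is well-formed and wraps each page in a <div class="page"> element (which is how the PDF output is usually structured; other formats may carry no page markers at all). The function name split_pages and the sample document are made up for this example and are not part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree puts on every XHTML tag name.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def split_pages(xhtml):
    """Return one plain-text string per <div class="page"> in the XHTML.

    Hypothetical helper: assumes each page is wrapped in a div whose
    class attribute is exactly "page".
    """
    root = ET.fromstring(xhtml)
    pages = []
    for div in root.iter(XHTML_NS + "div"):
        if div.get("class") == "page":
            # itertext() yields every text node under the div, without tags.
            # Joining with a space keeps adjacent paragraphs from running
            # together (it can also split a word broken by inline markup).
            text = " ".join(div.itertext())
            # Collapse the whitespace left over from the markup layout.
            pages.append(" ".join(text.split()))
    return pages


# Made-up two-page sample standing in for real extractor output.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>demo</title></head>
<body>
<div class="page"><p>First page text.</p><p>More text.</p></div>
<div class="page"><p>Second page text.</p></div>
</body>
</html>"""

pages = split_pages(sample)
for number, text in enumerate(pages, start=1):
    print(number, text)
```

The result is the array asked for above: index 0 holds the text of the first page, index 1 the second, and so on. If the document has no page divs, the list simply comes back empty.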


Nick
