Hi, thanks for the suggestion.
Is there a simple example of this? Please share.

With regards,
Karthik

On 2021/10/21 18:26:58, Nick Burch <[email protected]> wrote:
> On Thu, 21 Oct 2021, nskarthik wrote:
> > Question : Need to extract Text / images at page level using java.
> > Did not find any example on www or Tika website.
>
> For PDF, you should fetch the contents as XHTML rather than plain text.
> You can then split on the page divs. This isn't available for formats
> which aren't page-based, but luckily PDF is
>
> Depending on what you want to do, it might make sense to write a custom
> ContentHandler which works a lot like the ToTextContentHandler in Tika,
> but which starts writing to a new text buffer each time it hits the event
> for a new page
>
> Nick
>
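For reference, the custom ContentHandler idea described in the quoted reply might look roughly like the sketch below. It is not an official Tika example. It assumes each page arrives in the XHTML output as a `<div class="page">` element, and the names `PageTextHandler` and `splitPages` are made up for illustration. To keep it runnable with the JDK alone, it parses an XHTML string with the built-in SAX parser; text extraction only, images are not covered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of each <div class="page"> separately.
class PageTextHandler extends DefaultHandler {
    private final List<String> pages = new ArrayList<>();
    private StringBuilder current; // null while outside any page div
    private int divDepth;          // div nesting depth inside the current page

    // SAX reports an empty localName when the parser is not namespace-aware.
    private static String name(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!"div".equals(name(localName, qName))) {
            return;
        }
        if (current == null && "page".equals(atts.getValue("class"))) {
            current = new StringBuilder(); // new page: start a fresh buffer
            divDepth = 1;
        } else if (current != null) {
            divDepth++;                    // nested div inside a page
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;
        }
        String n = name(localName, qName);
        if ("div".equals(n)) {
            if (--divDepth == 0) {         // the page div itself has closed
                pages.add(current.toString());
                current = null;
            }
        } else if ("p".equals(n)) {
            current.append('\n');          // keep paragraphs on separate lines
        }
    }

    List<String> getPages() {
        return pages;
    }

    // Demo helper: runs the handler over an XHTML string using the JDK SAX parser.
    static List<String> splitPages(String xhtml) throws Exception {
        PageTextHandler handler = new PageTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getPages();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body>"
                + "<div class=\"page\"><p>First page</p></div>"
                + "<div class=\"page\"><p>Second page</p></div>"
                + "</body></html>";
        List<String> pages = splitPages(xhtml);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + (i + 1) + ": " + pages.get(i).trim());
        }
    }
}
```

With Tika on the classpath, the same handler could presumably be passed straight to a parser instead of going through `splitPages`, for example `new AutoDetectParser().parse(inputStream, handler, new Metadata(), new ParseContext())`, and the per-page text read back from `getPages()` afterwards.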
