Hi,

I plan to extract text and images from PDF, DOCX, XLSX, HTML, CSV, MHT, and so on.
Instead of using POI, PDFBox, and similar libraries separately, I thought Tika could serve as a single source for data extraction, hence I wanted to use it.

With regards,
Karthik

On 2021/10/22 14:41:38, AJ Weber <[email protected]> wrote:
> >>> Question : Need to extract Text / images at page level using java.
> >>> Did not find any example on www or Tika website.
>
> Why not use a library specifically suited to the job like Apache PDFBox
> (directly)?
