Re: AW: Determine binary pdf?

Nick Burch Tue, 22 Jul 2014 04:36:13 -0700

On Tue, 22 Jul 2014, Clemens Wyss DEV wrote:

I have thousands of PDFs that are extracted using Tika and then indexed/analyzed in Lucene. And there seems to be "cryptic" text (binary data?) in some of the PDFs.

Are you able to identify a small PDF (ideally sub 100kb) which shows the problem? If so, please open a new JIRA, and upload the problematic file.
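
If it helps with hunting down a suitable sample, here is a rough sketch (not part of Tika or PDFBox) that scores already-extracted text by how many characters look like extraction debris. The function names and the 0.1 threshold are assumptions chosen for illustration, and the check only catches control, private-use, unassigned and U+FFFD characters; text that is garbled but still made of ordinary printable letters (e.g. from a bad font mapping) will slip through.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = 0
    for c in chars:
        category = unicodedata.category(c)
        # Cc = control, Co = private use, Cn = unassigned; U+FFFD is the
        # replacement character emitted when decoding fails.
        if category in ("Cc", "Co", "Cn") or c == "\ufffd":
            bad += 1
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.1) -> bool:
    """Flag text whose debris ratio exceeds the (arbitrary) threshold."""
    return suspicious_ratio(text) > threshold
```

Running this over the text extracted from each file and then sorting the flagged files by size would be one way to pick the smallest one to attach.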

It might be a Tika bug, or it might be one in the upstream Apache PDFBox, but we'll need a sample file to work it out!


Nick
