Re: Fwd: Trouble reading IEEE pdf

Tilman Hausherr Thu, 02 Feb 2017 08:02:36 -0800

On 02.02.2017 at 16:10, Pulkit Kapur wrote:

Hi


I have uploaded the pdf here:
https://www.scribd.com/document/338221804/0024-iros-2016


Hello Pulkit,

This site requires registration. This is a "don't" from the list:
https://pdfbox.apache.org/support.html

I don't want to register.

Please find a sharehoster that doesn't require registration to download.

If the XObject that Karl Heinz Kremer mentioned is a form, then text extraction should work, especially if it was possible to extract with Adobe Reader. If it is an image, then it won't. Apache Tika might help.
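A rough way to check which case applies without any PDF library is to scan the raw file for XObject subtype entries. The following is only a sketch under assumptions: the class name XObjectSubtypeScan and the method countSubtypes are made up for illustration and are not part of PDFBox, and the scan only sees dictionaries stored uncompressed. Files that keep their objects in compressed object streams will report zero hits even when form or image XObjects exist, so a zero count proves nothing.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: counts "/Subtype /Form" and
// "/Subtype /Image" entries that appear uncompressed in a PDF file.
class XObjectSubtypeScan {

    // Whitespace between the key and the name is optional in PDF syntax.
    private static final Pattern SUBTYPE =
            Pattern.compile("/Subtype\\s*/(Form|Image)\\b");

    /** Returns a two-element array: {form count, image count}. */
    static int[] countSubtypes(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding
        // binary stream data cannot fail or shift offsets.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = SUBTYPE.matcher(raw);
        int forms = 0;
        int images = 0;
        while (m.find()) {
            if (m.group(1).equals("Form")) {
                forms++;
            } else {
                images++;
            }
        }
        return new int[] {forms, images};
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the PDF to inspect.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int[] counts = countSubtypes(bytes);
        System.out.println("Form XObjects: " + counts[0]
                + ", Image XObjects: " + counts[1]);
    }
}
```

If the page content turns out to be image XObjects only, the visible text is a picture and would need OCR rather than text extraction.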

Please mention what you did to get the text with PDFBox, and what version you were using.

You wrote "using readText function from the pdfbox library". There is no "readText" method in PDFBox. Could it be that you used a different product?


Tilman


I did some more diagnosis last night, and it seems that there are two layers
in the PDF: one with the content and the other with the headers and
footers. PDFBox is only reading the headers and footers.
I suspect this is common to all conference proceedings.

Thanks,

Pulkit

On Thu, Feb 2, 2017 at 1:21 AM, Tilman Hausherr <[email protected]>
wrote:

On 02.02.2017 at 05:55, Pulkit Kapur wrote:

Hi

I am trying to read some past years' IEEE conference proceedings that I have.
I can read the PDF using Acrobat and select the text.

But when I try to read the text using readText function from the pdfbox
library, I only get the headers and footers in the PDF.

I did check that the document is not encrypted.
Also, my code works on other PDF documents, but all IEEE proceedings that
are downloaded from IEEE fail to work.

I have attached the pdf document with this message.

Please upload the pdf somewhere, PDF attachments are not allowed here.



Tilman



