Hi Kulbhushan,
is it possible to extract the text using Adobe Reader or Adobe Acrobat without
the junk characters? If not, PDFBox can't help either. If so, could you open an
issue in Jira (https://issues.apache.org/jira/browse/PDFBOX) and attach a sample
PDF that lets us reproduce the issue?
I'm pretty sure I had to build from source.
If that's not going to be easy for you, I can provide the jar I built offline,
but there's probably a better source.
Cheers,
Eliot
On 2/5/13 2:51 PM, "Alain" wrote:
> Eliot, thanks for the reply!
>
> I am currently running 1.7.1, where did you find r
On Tue, Feb 5, 2013 at 6:36 PM, Andreas Lehmkuehler wrote:
> Hi,
>
> On 05.02.2013 15:01, kulbhushan singh wrote:
>
> Hi,
>>
>> I am trying to extract text from a pdf file with custom fonts but it is
>> giving me junk characters. The fonts used are ArialMT (embedded subset) &
>> Arial-BoldMT (e
Eliot, thanks for the reply!
I am currently running 1.7.1. Where did you find release 1.8?
Alain
From: Eliot Kimber
To: "users@pdfbox.apache.org" ; Alain
Sent: Tuesday, February 5, 2013 3:46 PM
Subject: Re: Printing a PDF doc with images
Not sure if it'
Not sure if it's the same issue, but I ran into a problem with scanned
images that used an overlay mask. Those images are not handled correctly by
PDFBox 1.7.1, but the handling has been corrected in 1.8 since sometime last
November. So you might try using the latest 1.8 build and see if it resolves
your
Hi,
I am trying to extract text from a PDF file with custom fonts, but it is
giving me junk characters. The fonts used are ArialMT (embedded subset) &
Arial-BoldMT (embedded subset). The producer of the PDF file is GPL
Ghostscript 8.15. I am using PDFTextStripper to extract the text. How can I do it
for
If you set the system property
"org.apache.pdfbox.pdfparser.nonSequentialPDFParser.parseMinimal" to true, page
content is only parsed when you request the page. In addition, the default
behavior also creates fewer objects. Although this is not the behavior you
asked for, i.e. streaming like StAX
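
A minimal sketch of the setting described above, assuming the property is read when the parser is created, so it has to be set before the document is loaded. The class and method names here are only for illustration; the property name is the one quoted in the message.

```java
class ParseMinimalExample {
    // Property name as quoted in the message above.
    public static final String PARSE_MINIMAL =
            "org.apache.pdfbox.pdfparser.nonSequentialPDFParser.parseMinimal";

    /** Turns on minimal parsing; call this before loading any document. */
    public static void enableParseMinimal() {
        System.setProperty(PARSE_MINIMAL, "true");
    }

    /** Reports whether the property is currently set to "true". */
    public static boolean isParseMinimalEnabled() {
        return Boolean.getBoolean(PARSE_MINIMAL);
    }

    public static void main(String[] args) {
        enableParseMinimal();
        System.out.println(PARSE_MINIMAL + " = " + isParseMinimalEnabled());
        // Assumption: with PDFBox 1.8 on the classpath, the document would now
        // be opened through the non-sequential parser, e.g.
        //   PDDocument doc = PDDocument.loadNonSeq(new File("in.pdf"), null);
        // That call is left as a comment so the sketch runs without PDFBox.
    }
}
```

The same effect can be had without code changes by passing the property on the command line with -D when starting the JVM.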
Hi,
On 05.02.2013 15:20, VIGNESH S wrote:
I think the non-sequential PDF parser also loads every object into the object pool.
The difference, I think, is that the non-sequential parser reads the xref table
in trailer to know the PDF structure instead of linearly traversing
the document.
Yes, it works different
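
To illustrate the point about the xref table, here is a rough sketch of the first step a non-sequential reader takes: finding the "startxref" keyword near the end of the file and reading the byte offset that follows it. This is not PDFBox's actual code; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class StartXrefLocator {
    /**
     * Returns the byte offset written after the last "startxref" keyword,
     * or -1 if the keyword or the number after it is missing.
     */
    public static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        // Skip the whitespace between the keyword and the offset.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return i == start ? -1 : Long.parseLong(text.substring(start, i));
    }

    public static void main(String[] args) {
        // Tiny hand-made file tail; the xref section starts at byte 9.
        String sample = "%PDF-1.4\nxref\n0 1\n0000000000 65535 f \n"
                + "trailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n";
        long offset = findStartXref(sample.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("xref section starts at byte " + offset);
    }
}
```

From that offset a reader can jump straight to the xref section and look up individual objects, rather than scanning the file from the first byte.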
Hi,
On 05.02.2013 15:01, kulbhushan singh wrote:
Hi,
I am trying to extract text from a PDF file with custom fonts, but it is
giving me junk characters. The fonts used are ArialMT (embedded subset) &
Arial-BoldMT (embedded subset). The producer of the PDF file is GPL
Ghostscript 8.15. I am using
I think the non-sequential PDF parser also loads every object into the object pool.
The difference, I think, is that the non-sequential parser reads the xref table
in the trailer to learn the PDF structure instead of linearly traversing
the document.
Correct me if I am wrong.
On Sat, Feb 2, 2013 at 11:58 AM, Maruan Sah