Re: PDFBox 1.6.0 can't extract text correctly

Thomas Fischer Wed, 25 Apr 2012 08:03:43 -0700

Hello Christian,

You might gain a little by using the additional "-sort" parameter. In general,
though, I think you would have to tweak PDFBox's code to make it more sensitive
to narrow spaces, which I suppose is your problem. I'm really no expert, but
words are recognised by the gaps between characters, and if you set the gap
threshold too small, words might break apart. It is possible, though, that
settings other than the standard ones work better for particular texts.
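
To illustrate the trade-off, here is a minimal sketch of gap-based word
splitting. It is not PDFBox's actual code: the class GapWordSplitter, the
Glyph type, the minGap parameter and the sample coordinates are all made up
for illustration. It only shows why a threshold that is too large misses a
narrow space, while one that is too small breaks words apart.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from PDFBox.
final class GapWordSplitter {

    /** One glyph on a text line: character, left edge and width (same units). */
    static final class Glyph {
        final char ch;
        final float x;
        final float width;

        Glyph(char ch, float x, float width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /** Starts a new word whenever the gap to the previous glyph exceeds minGap. */
    static List<String> split(List<Glyph> line, float minGap) {
        List<String> words = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            if (prev != null) {
                // Horizontal distance between the end of prev and the start of g.
                float gap = g.x - (prev.x + prev.width);
                if (gap > minGap) {
                    words.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(g.ch);
            prev = g;
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    /** "to be" set with a narrow word space (1.5) and letter spacing of 0.5. */
    static List<Glyph> sampleLine() {
        List<Glyph> line = new ArrayList<Glyph>();
        line.add(new Glyph('t', 0.0f, 5.0f));
        line.add(new Glyph('o', 5.5f, 5.0f));
        line.add(new Glyph('b', 12.0f, 5.0f));
        line.add(new Glyph('e', 17.5f, 5.0f));
        return line;
    }

    public static void main(String[] args) {
        List<Glyph> line = sampleLine();
        System.out.println(split(line, 2.0f));  // threshold too large
        System.out.println(split(line, 1.0f));  // threshold between the two gaps
        System.out.println(split(line, 0.25f)); // threshold too small
    }
}
```

With these made-up numbers, only a threshold between the letter spacing and
the word space separates the two words; a real extractor would presumably
have to derive such a threshold from the font metrics rather than hard-code it.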


Best
Thomas

On 25.04.2012 at 13:01, Czech, Christian wrote:

> Hello,
>  
> PDFBox 1.6.0 can’t extract text correctly (see attachment)!
> Operating System: Windows XP
> Java: build 1.6.0_31
> PDFBox: 1.6.0
>  
> Kind regards
> 
> Christian Czech 
> Software Development
>  
> ELO Digital Office GmbH  
> Heilbronner Str. 150, D-70191 Stuttgart  
> Tel.:      +49 (0) 711 806089-0  
> Fax:       +49 (0) 711 806089-39
> E-Mail:  [email protected]
> Web:     www.elo.com
>  
>  
> 
> <M-SOFT-33120120221-171234.txt>
