Hello Christian, you might gain a little by using the additional "-sort" parameter. In general, though, I think you would have to tweak PDFBox's code to make it more sensitive to narrow spaces, which I suppose is your problem. I'm really no expert, but words are recognised by the gaps between them, and if you set the gap threshold too small, the words themselves might break apart. It is possible, though, that settings better than the standard ones exist for particular texts.
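To make the gap idea concrete, here is a minimal sketch in plain Java. It is not PDFBox's actual code: the class `GapWordSplitter`, the method `extract` and the `tolerance` parameter are names made up for illustration, and the rule (insert a space when the gap exceeds a fraction of the average glyph width) is only a simplified stand-in for whatever heuristic the library really uses.

```java
// Hypothetical illustration only, not part of PDFBox.
final class GapWordSplitter {

    /**
     * Joins the glyphs of one text line, inserting a space wherever the
     * horizontal gap between two neighbouring glyphs exceeds
     * tolerance * (average glyph width).
     *
     * glyphs:    the characters of the line, in reading order
     * x:         left edge of each glyph
     * width:     advance width of each glyph
     * tolerance: assumed tuning knob; smaller values detect narrower spaces
     */
    static String extract(String glyphs, float[] x, float[] width, float tolerance) {
        if (glyphs.length() != x.length || x.length != width.length) {
            throw new IllegalArgumentException("glyphs, x and width must have the same length");
        }
        if (glyphs.isEmpty()) {
            return "";
        }

        // Average glyph width of the line, used to scale the threshold.
        float total = 0f;
        for (float w : width) {
            total += w;
        }
        float threshold = tolerance * (total / width.length);

        StringBuilder out = new StringBuilder();
        out.append(glyphs.charAt(0));
        for (int i = 1; i < glyphs.length(); i++) {
            // Gap between the right edge of the previous glyph and the left edge of this one.
            float gap = x[i] - (x[i - 1] + width[i - 1]);
            if (gap > threshold) {
                out.append(' ');
            }
            out.append(glyphs.charAt(i));
        }
        return out.toString();
    }
}
```

With six glyphs of width 10, a gap of 3 units between "F" and "b" and a gap of 1 unit (kerning) between "o" and "x", a tolerance of 0.5 misses the narrow space, 0.2 finds it, and 0.05 also splits at the kerning gap. That last case is the word-breaking effect I meant.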
Best,
Thomas

On 25.04.2012 at 13:01, Czech, Christian wrote:
> Hello,
>
> PDFBox 1.6.0 can’t extract text correctly (see attachment)!
>
> Operating System: Windows XP
> Java: build 1.6.0_31
> PDFBox: 1.6.0
>
> Kind regards
>
> Christian Czech
> Software Development
>
> ELO Digital Office GmbH
> Heilbronner Str. 150, D-70191 Stuttgart
> Tel.: +49 (0) 711 806089-0
> Fax: +49 (0) 711 806089-39
> E-Mail: [email protected]
> Web: www.elo.com
>
> <M-SOFT-33120120221-171234.txt>

