Hi Jukka,

> On Tue, Oct 27, 2009 at 4:32 PM, Thomas Fischer <fischer...@aon.at> wrote:
>> I am puzzled about the "org.pdfbox.util.operator.ShowTextGlyph" appearing here (no apache!) and wondering if org.pdfbox.util.operator.ShowTextGlyph
>> should be cast to org.apache.pdfbox.util.operator.OperatorProcessor.
>
> I believe you have both versions (0.7.3 and 0.8.0) of PDFBox in your
> classpath. Remove the old jar, and everything should work correctly.

Well, conceptually I would think that PDFBox 0.8 shouldn't trespass into the other namespace, so both versions should be able to coexist. But I removed the 0.7.x versions from the classpath, added FontBox to it, and then PDFBox 0.8 worked! So thanks for that! (Does anybody have a good hint on how to arrange packages and the classpath on Mac OS X?)
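
For anyone hitting the same mix-up, here is a minimal sketch of one way to see which classpath entries provide the old and the new classes. It uses only the standard ClassLoader API and is not part of PDFBox; the class name ClasspathCheck and the method locate are made up for illustration, and the two resource paths are simply the class names from the error above.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathCheck {

    // Hypothetical helper: returns every classpath location that provides
    // the given resource (an empty list if no entry provides it).
    static List<URL> locate(String resourcePath) throws IOException {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        return Collections.list(loader.getResources(resourcePath));
    }

    public static void main(String[] args) throws IOException {
        // Class names taken from the error message, written as resource paths.
        String[] candidates = {
            "org/pdfbox/util/operator/ShowTextGlyph.class",
            "org/apache/pdfbox/util/operator/OperatorProcessor.class"
        };
        for (String path : candidates) {
            List<URL> hits = locate(path);
            System.out.println(path + " -> " + (hits.isEmpty() ? "not found" : hits.toString()));
        }
    }
}
```

Run with the same classpath as the failing program: if the first path is found at all, an old jar is presumably still being picked up from somewhere.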

I find some improvements in PDFBox 0.8 over 0.7.3 in the separation of words, although it is still not perfect. For example:
0.7.3: Withoutrestrictionofgeneralitywecanassumethat
0.8.0: Without restriction of generalitywe can assume that

I'm also not quite sure what the "-sort" parameter does. In my mathematical texts it usually didn't matter to me, because only some formulas were inverted (like putting a superscript before or after the actual letter), and I'm looking into text search only for now. But I found one case where it caused actual damage (line breaks removed):

PDFBox 0.8.0 w/o -sort:
hence Jn(X) ∼ = G Gamma1( ˜X,Omega11 ˜X (log D)), from which it follows that
(0.7.3 is similar)

PDFBox 0.8.0 using -sort:
hence Jn(X ∼) G hich it follows that = ˜ Gamma1(X, 1Omega1 (log D , from w)) X˜

Here the "from w" is split off from the "hich".
So I wonder when the "-sort" is actually useful.
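
A toy sketch of what might be going on, under the assumption that "-sort" orders text by its position on the page instead of by its order in the PDF content stream. This is not PDFBox's actual algorithm; Fragment, SortSketch and the coordinates are invented for illustration. It only shows how a subscript that sits slightly lower than its baseline can end up after the rest of the line once everything is sorted top to bottom, then left to right.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortSketch {

    // Hypothetical stand-in for a positioned piece of text; not a PDFBox class.
    static final class Fragment {
        final String text;
        final double x; // distance from the left edge
        final double y; // distance from the top edge
        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Joins the fragments in the order given (the "content stream" order).
    static String inStreamOrder(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(f.text);
        }
        return sb.toString();
    }

    // Sorts a copy top to bottom, then left to right, and joins the result.
    static String sortedByPosition(List<Fragment> fragments) {
        List<Fragment> copy = new ArrayList<>(fragments);
        copy.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                            .thenComparingDouble(f -> f.x));
        return inStreamOrder(copy);
    }

    // Invented sample: "J", a subscript "n" set 3 units lower, then "(X)".
    static List<Fragment> sample() {
        return Arrays.asList(
            new Fragment("J", 10, 100),
            new Fragment("n", 16, 103),
            new Fragment("(X)", 22, 100));
    }

    public static void main(String[] args) {
        System.out.println("stream order: " + inStreamOrder(sample()));
        System.out.println("sorted:       " + sortedByPosition(sample()));
    }
}
```

With these made-up coordinates the stream order gives "J n (X)" while the position sort gives "J (X) n". If the assumption holds, sorting would mainly help with PDFs whose content stream order differs from the visual reading order, and would hurt wherever glyphs of one line sit at slightly different heights, as in formulas.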

Thomas

