Fixed Page rotation
-------------------

                 Key: PDFBOX-363
                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
             Project: PDFBox
          Issue Type: Improvement
            Reporter: Jukka Zitting


[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834

Hi all,

Daniel asked me for my patch for the rotation-issue described in
https://sourceforge.net/forum/message.php?msg_id=4992032

Attention, I didn't apply the newest patches to the classes PDFStreamEngine
and PageDrawer.

There are 4 more probably affected classes calling the page.findRotation
method which I didn't change, because I'm didn't have to use them (until
now).

org.pdfbox.util.operator.pagedrawer.Invoke
org.pdfbox.util.TextPositionComparator
org.pdfbox.examples.pdmodel.PrintURLs
org.pdfbox.examples.util.PrintImageLocations


I've attached a pdf in DINA4-landscape. The text is missplaced whenever I
try to print or display (using the pdfbox-PDFReader and convertToImage
within my application) it with pdfbox. The acrobat reader has no problems
with my documents.
After my patch everything works fine. Perhaps it is a point of discussion,
if the convertToImage method has to rotate the image or if the user has to
do it. The PDFPagePanel didn't do it (yet).

Andreas

http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429

[Comment from SourceForge]
Date: 2008-05-29 12:42
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

I've just tried your sample PDF w/ the latest code -- prior to application
of your patch.  It doesn't work.

I'll work on incorporating your change for a full regression test in the
next hour or so.

[Comment from SourceForge]
Date: 2008-05-29 15:16
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

Hi Daniel,

I've just added my patch to the newest sources you send me earlier this
day. I guess it works. During testing I've found another problem concernign
graphics within landscape-docs. I found the solution in patching the class
org.pdfbox.util.operator.pagedrawer.Invoke in the same way I've patched the
others. And consequently to be strict I've also patched the new methods in
org.pdfbox.pdfviewer.PageDrawer

For my everthings works fine inlc. the 4PP-pdf.

I've attached the patched files and another testpdf with a embedded
graphic.

Andreas
File Added: pdfbox_rotation_patch_2.zip
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429

[Comment from SourceForge]
Date: 2008-05-29 18:12
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

Your code works w/ the 4PP test ... and with the other rendering stuff
I've tried so far.

However ... the text extraction test fails with it.  I can't figure that
one out ... ideas?

[Comment from SourceForge]
Date: 2008-05-29 18:19
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

Can you give me some more details? I never do any textextractions with
pdfbox. Perhaps you'll provide with the code for test program, or is it
part of pdfbox, so that I can find it in the cvs?

However, it has to wait until tomorrow

[Comment from SourceForge]
Date: 2008-05-29 18:39
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

If you've got the whole project set up, try
ant testextract

I'll see if I can narrow it down some.

[Comment from SourceForge]
Date: 2008-05-29 21:00
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

The extraction problem seems to have to do w/ the changes to
PDFStreamEngine.

If I revert that file, extraction succeeds.  Unfortunately ... with that
reverted but your other changes in place, image rendering hangs.

Will work on it more ... probably tomorrow.

[Comment from SourceForge]
Date: 2008-05-29 21:12
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

Correction ... it doesn't hang ... it's just slow on the first PDF to
render ... maybe just due to the first one I'm sending it.

Will look more tomorrow.

[Comment from SourceForge]
Date: 2008-05-30 07:11
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

I've found one bug. While deleting the if rules for the rotation, I've
deleted line 394 which is still needed.

I've attached the corrected file


File Added: PDFStreamEngine.java
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429

[Comment from SourceForge]
Date: 2008-05-30 07:43
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

I forgot to mention that I can't run the test suite. When I try to get the
whole project, I realized that I'm behind a firewall here in my office.
Consequently my cvs-client doesn't work. I've to do it from home. :-(

I've only tested one file: 601501018.pdf

There are additional blanks and they disapper after adding the missing
line. But starting at page 21, when the document orientation changes from
portrait to landscape, there are additional cr or lf. Hmmmm ??

[Comment from SourceForge]
Date: 2008-05-30 08:25
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

I've continued testing and I guess the problem is somewhere starting in
org.pdfbox.util.PDFTextStripper.showCharacter(..). Obviously it handles the
coordinates for rotated pages somehow in an other way than the
implementation of the showCharacter() in org.pdfbox.pdfviewer.PageDrawer.
But for the moment I don't understand what's happening in the
TextStripper, perhaps I'll find out later. 
I hope this hint helps ...

[Comment from SourceForge]
Date: 2008-05-30 16:20
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

I've put a couple more hours into this, and I don't know the answer.

I do know the text extraction is the more mature side of this library.

For the moment, I'll be skipping over your changes to PDFStreamEngine.

Thanks for the other changes!

[Comment from SourceForge]
Date: 2008-06-02 09:21
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

Hi Daniel,

I guess I've solved the problem. The textposition-handling has to be
adjusted within the method PDFTextStripper.flushText(). Of course my former
changes to the class PDFStreamEngine are needed. During debugging I found a
bug in the class TextPositionComparator (line 82). I solved it by removing
the rotation if-clauses. Whenever you compare two Textpositions, it is
needless to look at the rotation because they are on the same page so that
the comparison is independent of the rotation.

Furthermore my PDFTextStripper-patch seems to correct some minor problems,
which are described in
https://sourceforge.net/forum/message.php?msg_id=4976730.

I've tested the following cases:
Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%
test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in
lines 251-257, 278/279, 502/503, 574/575 and the other differences are some
kind of special-character-issues. I guess you have to correct the input at
first.

I've attached my changes based on the newest versions of both classes.

[Comment from SourceForge]
Date: 2008-06-02 09:22
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

File Added: pdfbox_rotation_patch_3.zip
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to