[ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-363: -------------------------------------- Attachment: PageRotation-Patch3.diff There was still one unanswered question. First of all I thought that the text has to be rotated by 180 degrees, if the text-direction unequals 0. But this works only for rotation-angles which are multiplies of 90 degrees. Any other angle doesn't work. This problem can be demonstrated by using the test-document Flyer2.pdf attached to PDFBOX-358. The document has a landscape-orientation and most of the text has a direction of 90. But there are some pieces of text with a rotation of approx. 40 degrees. After some testing I realized that the text doesn't have to be rotated by 180 degrees, but the shearing has to be corrected by multiplying the values with -1. This operation compensates the moving of the 0,0-reference from lower left to upper left before drawing. I've attached my patch. > Fixed Page rotation > ------------------- > > Key: PDFBOX-363 > URL: https://issues.apache.org/jira/browse/PDFBOX-363 > Project: PDFBox > Issue Type: Improvement > Reporter: Jukka Zitting > Assignee: Jukka Zitting > Fix For: 0.8.0-incubator > > Attachments: landscape_rot90.pdf, PageRotation-Patch.diff, > PageRotation-Patch2.diff, PageRotation-Patch3.diff, test-landscape2.pdf > > > [Issue from SourceForge] > http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834 > Hi all, > Daniel asked me for my patch for the rotation-issue described in > https://sourceforge.net/forum/message.php?msg_id=4992032 > Attention, I didn't apply the newest patches to the classes PDFStreamEngine > and PageDrawer. > There are 4 more probably affected classes calling the page.findRotation > method which I didn't change, because I'm didn't have to use them (until > now). > org.pdfbox.util.operator.pagedrawer.Invoke > org.pdfbox.util.TextPositionComparator > org.pdfbox.examples.pdmodel.PrintURLs > org.pdfbox.examples.util.PrintImageLocations > I've attached a pdf in DINA4-landscape. The text is missplaced whenever I > try to print or display (using the pdfbox-PDFReader and convertToImage > within my application) it with pdfbox. The acrobat reader has no problems > with my documents. > After my patch everything works fine. Perhaps it is a point of discussion, > if the convertToImage method has to rotate the image or if the user has to > do it. The PDFPagePanel didn't do it (yet). > Andreas > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429 > [Comment from SourceForge] > Date: 2008-05-29 12:42 > Sender: danielwilson > Logged In: YES > user_id=1737686 > Originator: NO > I've just tried your sample PDF w/ the latest code -- prior to application > of your patch. It doesn't work. > I'll work on incorporating your change for a full regression test in the > next hour or so. > [Comment from SourceForge] > Date: 2008-05-29 15:16 > Sender: lehmialk > Logged In: YES > user_id=2069622 > Originator: YES > Hi Daniel, > I've just added my patch to the newest sources you send me earlier this > day. I guess it works. During testing I've found another problem concernign > graphics within landscape-docs. I found the solution in patching the class > org.pdfbox.util.operator.pagedrawer.Invoke in the same way I've patched the > others. And consequently to be strict I've also patched the new methods in > org.pdfbox.pdfviewer.PageDrawer > For my everthings works fine inlc. the 4PP-pdf. > I've attached the patched files and another testpdf with a embedded > graphic. > Andreas > File Added: pdfbox_rotation_patch_2.zip > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429 > [Comment from SourceForge] > Date: 2008-05-29 18:12 > Sender: danielwilson > Logged In: YES > user_id=1737686 > Originator: NO > Your code works w/ the 4PP test ... and with the other rendering stuff > I've tried so far. > However ... the text extraction test fails with it. I can't figure that > one out ... ideas? > [Comment from SourceForge] > Date: 2008-05-29 18:19 > Sender: lehmialk > Logged In: YES > user_id=2069622 > Originator: YES > Can you give me some more details? I never do any textextractions with > pdfbox. Perhaps you'll provide with the code for test program, or is it > part of pdfbox, so that I can find it in the cvs? > However, it has to wait until tomorrow > [Comment from SourceForge] > Date: 2008-05-29 18:39 > Sender: danielwilson > Logged In: YES > user_id=1737686 > Originator: NO > If you've got the whole project set up, try > ant testextract > I'll see if I can narrow it down some. > [Comment from SourceForge] > Date: 2008-05-29 21:00 > Sender: danielwilson > Logged In: YES > user_id=1737686 > Originator: NO > The extraction problem seems to have to do w/ the changes to > PDFStreamEngine. > If I revert that file, extraction succeeds. Unfortunately ... with that > reverted but your other changes in place, image rendering hangs. > Will work on it more ... probably tomorrow. > [Comment from SourceForge] > Date: 2008-05-29 21:12 > Sender: danielwilson > Logged In: YES > user_id=1737686 > Originator: NO > Correction ... it doesn't hang ... it's just slow on the first PDF to > render ... maybe just due to the first one I'm sending it. > Will look more tomorrow. > [Comment from SourceForge] > Date: 2008-05-30 07:11 > Sender: lehmialk > Logged In: YES > user_id=2069622 > Originator: YES > I've found one bug. While deleting the if rules for the rotation, I've > deleted line 394 which is still needed. > I've attached the corrected file > File Added: PDFStreamEngine.java > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429 > [Comment from SourceForge] > Date: 2008-05-30 07:43 > Sender: lehmialk > Logged In: YES > user_id=2069622 > Originator: YES > I forgot to mention that I can't run the test suite. When I try to get the > whole project, I realized that I'm behind a firewall here in my office. > Consequently my cvs-client doesn't work. I've to do it from home. :-( > I've only tested one file: 601501018.pdf > There are additional blanks and they disapper after adding the missing > line. But starting at page 21, when the document orientation changes from > portrait to landscape, there are additional cr or lf. Hmmmm ?? > [Comment from SourceForge] > Date: 2008-05-30 08:25 > Sender: lehmialk > Logged In: YES > user_id=2069622 > Originator: YES > I've continued testing and I guess the problem is somewhere starting in > org.pdfbox.util.PDFTextStripper.showCharacter(..). Obviously it handles the > coordinates for rotated pages somehow in an other way than the > implementation of the showCharacter() in org.pdfbox.pdfviewer.PageDrawer. > But for the moment I don't understand what's happening in the > TextStripper, perhaps I'll find out later. > I hope this hint helps ... > [Comment from SourceForge] > Date: 2008-05-30 16:20 > Sender: danielwilson > Logged In: YES > user_id=1737686 > Originator: NO > I've put a couple more hours into this, and I don't know the answer. > I do know the text extraction is the more mature side of this library. > For the moment, I'll be skipping over your changes to PDFStreamEngine. > Thanks for the other changes! > [Comment from SourceForge] > Date: 2008-06-02 09:21 > Sender: lehmialk > Logged In: YES > user_id=2069622 > Originator: YES > Hi Daniel, > I guess I've solved the problem. The textposition-handling has to be > adjusted within the method PDFTextStripper.flushText(). Of course my former > changes to the class PDFStreamEngine are needed. During debugging I found a > bug in the class TextPositionComparator (line 82). I solved it by removing > the rotation if-clauses. Whenever you compare two Textpositions, it is > needless to look at the rotation because they are on the same page so that > the comparison is independent of the rotation. > Furthermore my PDFTextStripper-patch seems to correct some minor problems, > which are described in > https://sourceforge.net/forum/message.php?msg_id=4976730. > I've tested the following cases: > Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100% > test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in > lines 251-257, 278/279, 502/503, 574/575 and the other differences are some > kind of special-character-issues. I guess you have to correct the input at > first. > I've attached my changes based on the newest versions of both classes. > [Comment from SourceForge] > Date: 2008-06-02 09:22 > Sender: lehmialk > Logged In: YES > user_id=2069622 > Originator: YES > File Added: pdfbox_rotation_patch_3.zip > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.