I would say that the approach that fixes the most items w/o breaking those that work is the better approach.
What I have done is run the regression tests before committing. If, though I've fixed something, I've broken a regression test, I haven't committed that change. We do have a regression test for the image rendering now (have since May, IIRC) and 1 or 2 of those tests involve rotation. Daniel Wilson On 10/31/08, Brian Carrier (JIRA) <[EMAIL PROTECTED]> wrote: > > > [ > https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644414#action_12644414] > > > Brian Carrier commented on PDFBOX-363: > -------------------------------------- > > > What additional information is needed for a commiter to make a decision on > which approach will be used? There are some additional text rotation > problems that are not addressed by either this patch or the one associated > with PDFBOX-374 and the fix requires taking the text matrix into account. > The current fix that I am working on will be impacted by which approach is > used. > > FWIW, the approach of not rotating the text in PDFTextStripper and taking > the rotation into account in TextPositionComparator will be easiest for the > next fix as well (which will allow page rotation as well as having text on a > page going in a different direction - such as on the y-axis of a graph). > > > > > Fixed Page rotation > > ------------------- > > > > Key: PDFBOX-363 > > URL: https://issues.apache.org/jira/browse/PDFBOX-363 > > Project: PDFBox > > Issue Type: Improvement > > Reporter: Jukka Zitting > > > > [Issue from SourceForge] > > > http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834 > > Hi all, > > Daniel asked me for my patch for the rotation-issue described in > > https://sourceforge.net/forum/message.php?msg_id=4992032 > > Attention, I didn't apply the newest patches to the classes > PDFStreamEngine > > and PageDrawer. > > There are 4 more probably affected classes calling the page.findRotation > > method which I didn't change, because I'm didn't have to use them (until > > now). > > org.pdfbox.util.operator.pagedrawer.Invoke > > org.pdfbox.util.TextPositionComparator > > org.pdfbox.examples.pdmodel.PrintURLs > > org.pdfbox.examples.util.PrintImageLocations > > I've attached a pdf in DINA4-landscape. The text is missplaced whenever I > > try to print or display (using the pdfbox-PDFReader and convertToImage > > within my application) it with pdfbox. The acrobat reader has no problems > > with my documents. > > After my patch everything works fine. Perhaps it is a point of > discussion, > > if the convertToImage method has to rotate the image or if the user has > to > > do it. The PDFPagePanel didn't do it (yet). > > Andreas > > > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429 > > [Comment from SourceForge] > > Date: 2008-05-29 12:42 > > Sender: danielwilson > > Logged In: YES > > user_id=1737686 > > Originator: NO > > I've just tried your sample PDF w/ the latest code -- prior to > application > > of your patch. It doesn't work. > > I'll work on incorporating your change for a full regression test in the > > next hour or so. > > [Comment from SourceForge] > > Date: 2008-05-29 15:16 > > Sender: lehmialk > > Logged In: YES > > user_id=2069622 > > Originator: YES > > Hi Daniel, > > I've just added my patch to the newest sources you send me earlier this > > day. I guess it works. During testing I've found another problem > concernign > > graphics within landscape-docs. I found the solution in patching the > class > > org.pdfbox.util.operator.pagedrawer.Invoke in the same way I've patched > the > > others. And consequently to be strict I've also patched the new methods > in > > org.pdfbox.pdfviewer.PageDrawer > > For my everthings works fine inlc. the 4PP-pdf. > > I've attached the patched files and another testpdf with a embedded > > graphic. > > Andreas > > File Added: pdfbox_rotation_patch_2.zip > > > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429 > > [Comment from SourceForge] > > Date: 2008-05-29 18:12 > > Sender: danielwilson > > Logged In: YES > > user_id=1737686 > > Originator: NO > > Your code works w/ the 4PP test ... and with the other rendering stuff > > I've tried so far. > > However ... the text extraction test fails with it. I can't figure that > > one out ... ideas? > > [Comment from SourceForge] > > Date: 2008-05-29 18:19 > > Sender: lehmialk > > Logged In: YES > > user_id=2069622 > > Originator: YES > > Can you give me some more details? I never do any textextractions with > > pdfbox. Perhaps you'll provide with the code for test program, or is it > > part of pdfbox, so that I can find it in the cvs? > > However, it has to wait until tomorrow > > [Comment from SourceForge] > > Date: 2008-05-29 18:39 > > Sender: danielwilson > > Logged In: YES > > user_id=1737686 > > Originator: NO > > If you've got the whole project set up, try > > ant testextract > > I'll see if I can narrow it down some. > > [Comment from SourceForge] > > Date: 2008-05-29 21:00 > > Sender: danielwilson > > Logged In: YES > > user_id=1737686 > > Originator: NO > > The extraction problem seems to have to do w/ the changes to > > PDFStreamEngine. > > If I revert that file, extraction succeeds. Unfortunately ... with that > > reverted but your other changes in place, image rendering hangs. > > Will work on it more ... probably tomorrow. > > [Comment from SourceForge] > > Date: 2008-05-29 21:12 > > Sender: danielwilson > > Logged In: YES > > user_id=1737686 > > Originator: NO > > Correction ... it doesn't hang ... it's just slow on the first PDF to > > render ... maybe just due to the first one I'm sending it. > > Will look more tomorrow. > > [Comment from SourceForge] > > Date: 2008-05-30 07:11 > > Sender: lehmialk > > Logged In: YES > > user_id=2069622 > > Originator: YES > > I've found one bug. While deleting the if rules for the rotation, I've > > deleted line 394 which is still needed. > > I've attached the corrected file > > File Added: PDFStreamEngine.java > > > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429 > > [Comment from SourceForge] > > Date: 2008-05-30 07:43 > > Sender: lehmialk > > Logged In: YES > > user_id=2069622 > > Originator: YES > > I forgot to mention that I can't run the test suite. When I try to get > the > > whole project, I realized that I'm behind a firewall here in my office. > > Consequently my cvs-client doesn't work. I've to do it from home. :-( > > I've only tested one file: 601501018.pdf > > There are additional blanks and they disapper after adding the missing > > line. But starting at page 21, when the document orientation changes from > > portrait to landscape, there are additional cr or lf. Hmmmm ?? > > [Comment from SourceForge] > > Date: 2008-05-30 08:25 > > Sender: lehmialk > > Logged In: YES > > user_id=2069622 > > Originator: YES > > I've continued testing and I guess the problem is somewhere starting in > > org.pdfbox.util.PDFTextStripper.showCharacter(..). Obviously it handles > the > > coordinates for rotated pages somehow in an other way than the > > implementation of the showCharacter() in org.pdfbox.pdfviewer.PageDrawer. > > But for the moment I don't understand what's happening in the > > TextStripper, perhaps I'll find out later. > > I hope this hint helps ... > > [Comment from SourceForge] > > Date: 2008-05-30 16:20 > > Sender: danielwilson > > Logged In: YES > > user_id=1737686 > > Originator: NO > > I've put a couple more hours into this, and I don't know the answer. > > I do know the text extraction is the more mature side of this library. > > For the moment, I'll be skipping over your changes to PDFStreamEngine. > > Thanks for the other changes! > > [Comment from SourceForge] > > Date: 2008-06-02 09:21 > > Sender: lehmialk > > Logged In: YES > > user_id=2069622 > > Originator: YES > > Hi Daniel, > > I guess I've solved the problem. The textposition-handling has to be > > adjusted within the method PDFTextStripper.flushText(). Of course my > former > > changes to the class PDFStreamEngine are needed. During debugging I found > a > > bug in the class TextPositionComparator (line 82). I solved it by > removing > > the rotation if-clauses. Whenever you compare two Textpositions, it is > > needless to look at the rotation because they are on the same page so > that > > the comparison is independent of the rotation. > > Furthermore my PDFTextStripper-patch seems to correct some minor > problems, > > which are described in > > https://sourceforge.net/forum/message.php?msg_id=4976730. > > I've tested the following cases: > > Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100% > > test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in > > lines 251-257, 278/279, 502/503, 574/575 and the other differences are > some > > kind of special-character-issues. I guess you have to correct the input > at > > first. > > I've attached my changes based on the newest versions of both classes. > > [Comment from SourceForge] > > Date: 2008-06-02 09:22 > > Sender: lehmialk > > Logged In: YES > > user_id=2069622 > > Originator: YES > > File Added: pdfbox_rotation_patch_3.zip > > > http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429 > > -- > This message is automatically generated by JIRA. > - > You can reply to this email to add a comment to the issue online. > >