[ https://issues.apache.org/jira/browse/PDFBOX-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Carrier updated PDFBOX-374:
---------------------------------

    Attachment:     (was: TextPositionComparator.diff)

> text areas not properly being sorted because of page rotation
> -------------------------------------------------------------
>
>                 Key: PDFBOX-374
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-374
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Brian Carrier
>         Attachments: rotation.pdf, text-rotation-081117.zip
>
>
> When PDFTextStripper is set to sort the text before outputting it, the
> sorting is incorrect if the page has a rotation.  This happens because both
> TextPositionComparator and PDFStreamEngine take the rotation into account,
> so the rotation has been applied twice by the time the comparison is done
> in TextPositionComparator.
> Also, it seems that the rotation code in PDFStreamEngine is not consistent. I 
> verified the code for 0 and 90 degrees works, but the 180 and 270 situations 
> do not seem consistent with the goal of adjusting the X and Y values so that 
> 0,0 is in the upper left, which is what the 0 and 90 code does.  I do not 
> have examples of 180 and 270 to test with. There are no comments in this 
> section, so I have been guessing about its purpose.
> The attached patches:
> - Remove the rotation from TextPositionComparator
> - Add comments and change the 180 and 270 cases to make them consistent
> with 0 and 90.
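
The double rotation described above can be illustrated with a small,
self-contained sketch. It is not the PDFBox code and not the attached patch:
RotationSketch, toUpperLeft, adjustAndSort, and the 600 x 800 page size are
all hypothetical, and the mapping assumes the rotation is clockwise and that
the goal is coordinates with 0,0 in the upper left, as stated in the
description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names are PDFBox classes or methods.
class RotationSketch {

    // Map a point from PDF user space (origin lower left, y up) to display
    // coordinates (origin upper left, y down) for a page rotated clockwise.
    // pageWidth and pageHeight describe the unrotated page.
    static double[] toUpperLeft(double x, double y, int rotation,
                                double pageWidth, double pageHeight) {
        switch (rotation) {
            case 0:   return new double[] { x, pageHeight - y };
            case 90:  return new double[] { y, x };
            case 180: return new double[] { pageWidth - x, y };
            case 270: return new double[] { pageHeight - y, pageWidth - x };
            default:  throw new IllegalArgumentException("rotation: " + rotation);
        }
    }

    // Reading order: top to bottom, then left to right.
    static final Comparator<double[]> TOP_TO_BOTTOM =
        Comparator.comparingDouble((double[] p) -> p[1])
                  .thenComparingDouble(p -> p[0]);

    // Apply the 90 degree adjustment `passes` times to each point, then sort.
    // passes == 1 models adjusting once; passes == 2 models the bug, where
    // two separate components each adjust for the same rotation.
    static List<double[]> adjustAndSort(List<double[]> points, int passes) {
        List<double[]> out = new ArrayList<>();
        for (double[] p : points) {
            double[] q = p;
            for (int i = 0; i < passes; i++) {
                q = toUpperLeft(q[0], q[1], 90, 600, 800);
            }
            out.add(q);
        }
        out.sort(TOP_TO_BOTTOM);
        return out;
    }

    public static void main(String[] args) {
        // Two text positions in lower-left coordinates: A, then B.
        List<double[]> points = Arrays.asList(
            new double[] { 100, 300 },   // A
            new double[] { 200, 100 });  // B
        System.out.println("once:  " + Arrays.toString(adjustAndSort(points, 1).get(0)));
        System.out.println("twice: " + Arrays.toString(adjustAndSort(points, 2).get(0)));
    }
}
```

With a single adjustment, A maps to (300, 100) and sorts first. With two
adjustments the 90 degree swap undoes itself, B ends up at (200, 100), and
the order flips. This is the kind of mis-sort that results when the rotation
is accounted for in two places.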

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
