Hello Tilman,

Thanks for your reply. That's exactly what I want: to extract the text the way you did. You rotated the page 90° clockwise because you saw that the text was rotated, right?
What I get from the page is that it has 0° rotation and the TextPosition has 90° (unlike the page rotation, this is counterclockwise, I assume). So the idea would be: rotate the page until the text appears without rotation, so that PDFTextStripper can do its best to get the text, right? I mention this because I have been trying to get the text from the same PDF with all possible rotations (90, 180, 270). The PDF files I receive in the system can have any rotation, both on their pages and on their text.

Thanks.
Jorge Eduardo Flórez

On Mon, 5 Nov 2018 at 2:08, <users-digest-h...@pdfbox.apache.org> wrote:
>
> users Digest 5 Nov 2018 07:08:47 -0000 Issue 1772
>
> ---------- Forwarded message ----------
> From: Tilman Hausherr <thaush...@t-online.de>
> To: users@pdfbox.apache.org
> Date: Sat, 3 Nov 2018 10:35:30 +0100
> Subject: Re: Extracting page "correctly"
>
> On 02.11.2018 at 23:37, jorgeeflorez wrote:
> >
> > The text I get is better than the first one, but it mixes the text
> > from the left and right "columns" (please see the bold text).
> > My question is: is it possible to get the text as one would naturally
> > read it? i.e. the text of the left column and then the text of the
> > right column?
>
> Is this what you'd like to have?
>
> All I did was to rotate 90° and then extract without sorting. It works
> because many (but not all) PDFs with columns have the operators in the
> column sequence.
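A minimal sketch of the "rotate the page until the text is upright" idea, assuming the text directions have already been collected, for example from `TextPosition.getDir()` in a `PDFTextStripper` subclass (in PDFBox 2.x it returns a float, 0, 90, 180 or 270, where 90 means text running bottom-to-top, i.e. counterclockwise). The class `UprightRotation` and its methods are hypothetical helpers, not PDFBox API. The reasoning: a page's `/Rotate` turns the page clockwise for display, while the text direction is measured counterclockwise, so text at direction d shows left-to-right when `/Rotate` equals d (mod 360). For the page described here (`/Rotate` 0, text at 90) that gives a 90° clockwise turn.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: suggests a page /Rotate value under which the
 * dominant text direction on a page would read left-to-right.
 * Directions use the convention of PDFBox's TextPosition.getDir():
 * 0, 90, 180, 270 degrees, counterclockwise.
 */
public final class UprightRotation {

    private UprightRotation() {
    }

    /** Most frequent direction (ties go to the one that reached the top count first); 0 if empty. */
    public static int dominantDirection(List<Integer> directions) {
        Map<Integer, Integer> counts = new HashMap<>();
        int best = 0;
        int bestCount = 0;
        for (int dir : directions) {
            int normalized = normalize(dir);
            int count = counts.merge(normalized, 1, Integer::sum);
            if (count > bestCount) {
                best = normalized;
                bestCount = count;
            }
        }
        return best;
    }

    /** /Rotate is clockwise, text direction is counterclockwise: upright when /Rotate == dir. */
    public static int uprightPageRotation(int textDirection) {
        return normalize(textDirection);
    }

    /** Extra clockwise rotation to add on top of the page's current /Rotate value. */
    public static int additionalClockwiseRotation(int currentPageRotation, int textDirection) {
        return normalize(textDirection - currentPageRotation);
    }

    /** Maps any angle in degrees into the range 0..359. */
    private static int normalize(int degrees) {
        return ((degrees % 360) + 360) % 360;
    }

    public static void main(String[] args) {
        // Page with /Rotate 0 whose glyphs mostly report direction 90.
        int dir = dominantDirection(List.of(90, 90, 90, 0));
        System.out.println(uprightPageRotation(dir));             // 90
        System.out.println(additionalClockwiseRotation(0, dir));  // 90
        // A page already shown at /Rotate 90 needs no further turn.
        System.out.println(additionalClockwiseRotation(90, dir)); // 0
    }
}
```

The result could then be applied with `PDPage.setRotation(int)` before extracting with `setSortByPosition(false)`, which matches "extract without sorting" in the quoted reply. Whether changing `/Rotate` alone actually changes what `PDFTextStripper` produces for a given file is an assumption worth checking on your own PDFs; pages that mix several text directions will not be handled by picking a single dominant one.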