Hi,

Do you have some code and file to reproduce this effect?

protected void writeString(String text, List<TextPosition> textPositions) throws IOException

is not overridden in PDFTextStripperByArea, so this is weird.
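
In case it helps when putting the test case together: one way to cross-check a reported rotation is to derive the angle directly from the text matrix [a b c d e f] of a glyph. The sketch below is plain Java and does not call PDFBox; the class and method names (TextRotation, angleDegrees, nearestQuadrant) are made up for illustration. It assumes you read a and b from the matrix yourself (in PDFBox 2.x I believe TextPosition.getTextMatrix() exposes them as getScaleX() and getShearY(), but please check the Javadoc). The result follows the PDF convention of counterclockwise angles and is not guaranteed to match what TextPosition.getDir() reports.

```java
// Hypothetical helper, not part of PDFBox: derives a rotation angle from
// the first two entries (a, b) of a PDF text matrix [a b c d e f].
final class TextRotation {
    private TextRotation() {}

    /**
     * Angle of the text baseline in degrees, normalised to [0, 360).
     * (a, b) is where the matrix sends the unit x vector (1, 0), so a pure
     * rotation by t has a = cos(t) and b = sin(t).
     */
    static double angleDegrees(double a, double b) {
        double deg = Math.toDegrees(Math.atan2(b, a));
        if (deg < 0) {
            deg += 360.0;
        }
        // Guard against a tiny negative angle rounding up to exactly 360.
        return deg >= 360.0 ? 0.0 : deg;
    }

    /** Rounds an angle to the nearest of 0, 90, 180 or 270 degrees. */
    static int nearestQuadrant(double degrees) {
        int q = (int) Math.round(degrees / 90.0) * 90;
        return ((q % 360) + 360) % 360;
    }
}
```

Inside a writeString override you could log nearestQuadrant(angleDegrees(a, b)) for each TextPosition from both strippers and compare the two dumps for the same page.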

Tilman

On 03.03.2020 at 20:31, PDF Developer wrote:
Hello,
I am trying to understand these two classes, PDFTextStripper and PDFTextStripperByArea. I 
am using them to obtain the properties of the text in a PDF. For what it is worth, I have 
some PDFs that are marked up with "regions" which I can reliably detect. 
Since I know the area in question, I thought it would be enough to use 
PDFTextStripperByArea and get the text within the bounded area. That works quite well. 
However, there is now a requirement to get the rotation of the text, as there are use 
cases where the text has been rotated as part of an upstream process.
So I tried to get the TextPosition properties via an override of writeString, and I 
thought it was all working, but a colleague pointed out that the rotation was always 
"0".

Going back to basics, for test purposes, I used PDFTextStripper (again with an 
override of the writeString method) basically to dump the properties of the 
TextPositions. That appears to give me the results I am looking for. However, 
if I use a similar override for PDFTextStripperByArea, I never see a rotation 
other than 0.
Since there can be a lot of text on a page and the pages are very large, I 
would prefer to use PDFTextStripperByArea (mainly because I know exactly where 
the text will be and the overhead will be less).

Have I misunderstood something along the way? Made a naive assumption? Any 
suggestions on how to get the PDFTextStripperByArea to return the string 
contained within an area/region and the rotation (or other properties) of the 
text?

PDFDev



