Hello,

I think I solved this myself. In case anyone else is interested: I had to create my own TextStripperByArea and make getregionCharacterList public. With that I could access the information about the individual characters and determine the direction, rotation, etc.

PDFDev
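For anyone following along, here is a minimal sketch of how a rotation angle can be derived once the per-character data is accessible. It assumes the usual PDF rotation matrix [cos t, sin t, -sin t, cos t, 0, 0], so the angle is atan2(b, a) of the first two matrix entries. It deliberately does not call PDFBox: the class and method names (TextRotationSketch, rotationDegrees, nearestQuadrant) are made up for illustration, the matrix values are hand-written samples, and page rotation is ignored. In PDFBox itself, TextPosition offers getDir() and getTextMatrix(), which may already be enough.

```java
// Hypothetical helper, not part of PDFBox.
final class TextRotationSketch {

    // Angle in degrees, in [0, 360), of a text matrix whose first two
    // entries are a and b. Assumes a pure rotation (plus uniform scale).
    static double rotationDegrees(double a, double b) {
        double deg = Math.toDegrees(Math.atan2(b, a));
        return deg < 0 ? deg + 360.0 : deg;
    }

    // Snap an angle to the nearest of 0, 90, 180, 270.
    static int nearestQuadrant(double degrees) {
        return (int) ((Math.round(degrees / 90.0) * 90) % 360);
    }

    public static void main(String[] args) {
        // Sample (a, b) pairs: upright, quarter turn, half turn, three-quarter turn.
        double[][] samples = { {1, 0}, {0, 1}, {-1, 0}, {0, -1} };
        for (double[] s : samples) {
            double deg = rotationDegrees(s[0], s[1]);
            System.out.println("a=" + s[0] + " b=" + s[1]
                    + " -> " + deg + " deg, quadrant " + nearestQuadrant(deg));
        }
    }
}
```

In a custom stripper, the same calculation could be applied to each character collected for a region, so that the text and its rotation are reported together.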
On Tuesday, March 3, 2020, 7:31:39 PM GMT, PDF Developer <pdf...@yahoo.com.invalid> wrote:

Hello,

I am trying to understand these two classes, PDFTextStripper and PDFTextStripperByArea. I am using them to obtain the properties of the text in a PDF.

For what it is worth, I have some PDFs that are marked up with "regions" which I can reliably detect. Since I know the area in question, I thought it would be enough to use PDFTextStripperByArea and get the text within the bounded area. That works quite well.

However, there is now a requirement to get the rotation of the text, as there are use cases where the text has been rotated as part of an upstream process. So I tried to get the TextPosition properties via an override of writeString, and I thought it was all working, but a colleague pointed out that the rotation was always "0".

Going back to basics, for test purposes, I used PDFTextStripper (again with an override of the writeString method), basically to dump the properties of the TextPositions. That appears to give me the results I am looking for. However, if I use a similar override for PDFTextStripperByArea, I never see a rotation other than 0.

There can be a lot of text on a page and the pages are very large, so I would prefer to use PDFTextStripperByArea (mainly because I know exactly where the text will be and the overhead will be less).

Have I misunderstood something along the way? Made a naive assumption? Any suggestions on how to get PDFTextStripperByArea to return the string contained within an area/region together with the rotation (or other properties) of the text?

PDFDev