Hello,
I think I solved this myself. If anyone else is interested: I had to create my 
own TextStripperByArea and make the getregionCharacterList public. Then I could 
access the information about the individual characters and determine the 
direction/rotation etc.
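
For anyone who wants the "determine the rotation" step spelled out, here is a
rough sketch of how an angle can be derived from a character's matrix entries.
It is not my production code. RotationSketch and angleDegrees are made-up
names, and the assumption is that the two inputs come from the character's text
matrix (in PDFBox, presumably getShearY() and getScaleY() on the Matrix
returned by TextPosition.getTextMatrix()). The code itself only uses plain
floats so it runs without PDFBox on the classpath.

```java
// Hypothetical helper, not part of PDFBox: derives a rotation angle from
// two entries of a 2D affine matrix [a b c d e f], where b is the "shear Y"
// entry and d is the "scale Y" entry.
final class RotationSketch {

    // For a pure rotation by theta the matrix is
    // [cos(theta), sin(theta), -sin(theta), cos(theta)],
    // so atan2(b, d) = atan2(sin(theta), cos(theta)) = theta.
    static int angleDegrees(float shearY, float scaleY) {
        int angle = (int) Math.round(Math.toDegrees(Math.atan2(shearY, scaleY)));
        // atan2 yields values in (-180, 180]; normalise to [0, 360).
        return (angle + 360) % 360;
    }

    public static void main(String[] args) {
        // Unrotated text: b = 0, d = 1.
        System.out.println(angleDegrees(0f, 1f));
        // Text rotated by a quarter turn: b = 1, d = 0.
        System.out.println(angleDegrees(1f, 0f));
    }
}
```

In the custom stripper this would be called once per character in the region's
character list. Scaling does not affect the result as long as it is uniform,
since atan2 only depends on the ratio of the two entries.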
PDFDev

    On Tuesday, March 3, 2020, 7:31:39 PM GMT, PDF Developer 
<pdf...@yahoo.com.invalid> wrote:  
 
 Hello,
I am trying to understand these two classes, PDFTextStripper and 
PDFTextStripperByArea. I am using them to obtain the properties of the text in a 
PDF. For what it is worth, I have some PDFs that are marked up with "regions" 
which I can, reliably, detect. Since I know the area in question, I thought it 
would be enough to use the PDFTextStripperByArea and get the text within the 
bounded area. That works quite well.  However, now there is a requirement to 
get the rotation of the text, as there are use cases where the text has been 
rotated as part of an upstream process.
So I tried to get the TextPosition properties via an override of 
writeString, and I thought it was all working, but a colleague pointed out that 
the rotation was always "0". 

Going back to basics, for test purposes, I used PDFTextStripper (again with an 
override of the writeString method) basically to dump the properties of the 
TextPositions. That appears to give me the results I am looking for. However, 
if I use a similar override for PDFTextStripperByArea I never see a rotation 
other than 0.
Since there can be a lot of text on a page and the pages are very large, I 
would prefer to use PDFTextStripperByArea (mainly because I know exactly where 
the text will be and the overhead will be less). 

Have I misunderstood something along the way? Made a naive assumption? Any 
suggestions on how to get the PDFTextStripperByArea to return the string 
contained within an area/region and the rotation (or other properties) of the 
text?

PDFDev  
