Re: Problem with processTextPosition

2014-05-22 Thread John Hewson
That’s great! -- John On 22 May 2014, at 10:12, DImuthu Upeksha wrote: > Yes, I double-checked it by debugging the processTextPosition method in > normal operation. Thanks for the information. Now text position > details from the OCR plugin are successfully fed into processTextPosition. > Output text al

Re: Problem with processTextPosition

2014-05-22 Thread DImuthu Upeksha
Yes, I double-checked it by debugging the processTextPosition method in normal operation. Thanks for the information. Now text position details from the OCR plugin are successfully fed into processTextPosition. The output text is also pretty good for the first sample PDFs. On Thu, May 22, 2014 at 10:31 PM, John Hewso

Re: Problem with processTextPosition

2014-05-22 Thread John Hewson
Yes, as Alin says, the y-axis in PDF uses y=0 as the bottom of the page, instead of the top as is usually the case in Java. PDFBox uses both styles of coordinates internally at various points. -- John On 17 May 2014, at 11:45, DImuthu Upeksha wrote: > Hi Alin, > Thank you. It helped me a lot.
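
For anyone following the thread, here is a minimal sketch of the y-axis flip described above. It is plain Java and does not call PDFBox; the class name PdfYAxis, the method flipY, and the 792-point US Letter page height are assumptions chosen for illustration, not anything taken from PDFBox itself.

```java
// Illustrative sketch only: PdfYAxis and flipY are hypothetical names, not PDFBox API.
final class PdfYAxis {
    private PdfYAxis() {}

    // Assumed page height: US Letter, 792 points in PDF user space.
    static final float LETTER_HEIGHT = 792f;

    // Converts a y value between a bottom-left origin (PDF style) and a
    // top-left origin (the usual Java style). The same formula works in
    // both directions, because flipping twice returns the original value.
    static float flipY(float y, float pageHeight) {
        return pageHeight - y;
    }

    public static void main(String[] args) {
        // The top edge of the page is y=0 with a top-left origin,
        // which corresponds to y=792 with a bottom-left origin.
        System.out.println(flipY(0f, LETTER_HEIGHT));
        // The bottom edge is y=792 with a top-left origin and y=0 in PDF terms.
        System.out.println(flipY(792f, LETTER_HEIGHT));
    }
}
```

Which convention a given value uses depends on where it comes from, so it is worth checking a known position (for example, text near the top of the page) before applying the flip.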

Re: Problem with processTextPosition

2014-05-17 Thread DImuthu Upeksha
Hi Alin, Thank you. It helped me a lot. I'll look into that further. About OCR: I use the Tesseract C library to do OCR, and I have written some native calls to communicate with the Tesseract API. [2] [2] https://github.com/DImuthuUpe/Tesseract-API On Sat, May 17, 2014 at 10:43 PM, Alin Mazilu wrote: >

Re: Problem with processTextPosition

2014-05-17 Thread Alin Mazilu
Hello, I commented on the gist. You have to use setSortByPosition(true) in the constructor right after super(). Be careful with your coordinate system. When you do textPosition1.getY() you get 792 not 0. I don't remember exactly where, but there is a class that uses the lower left corner of the pa

Re: Problem with processTextPosition

2014-05-17 Thread DImuthu Upeksha
Hi Alin, You can find my source code here: https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649 As you can see, in the text matrices I set X-offset: 0 and Y-offset: 0 for "H", and X-offset: 32 and Y-offset: 0 for "W". Is that enough? Is there another way to set the X,Y coordinates? On Sat, May 17, 2

Re: Problem with processTextPosition

2014-05-17 Thread Alin Mazilu
What are the x and y coordinates of H and W? Alin Mazilu SKE GlobalTech, LLC 3250 West Market St. Suite 307D Fairlawn, OH 44333 Sent from my Galaxy S3 On May 17, 2014 2:42 AM, "DImuthu Upeksha" wrote: > Hi all, > > I was trying to manually feed text position objects to > processTextPosition meth