Hello,

I am new to Tesseract and could use some guidance on how an experienced person 
would tackle this issue.  I have a PHP website where I can get the data out 
of a PDF without any issues, but the order of the data that I am pulling is 
a mess.  The issue is that the return is only one long string without any 
return characters or other way to break it down into parts.  I was going to 
slice the PDF into several chunks and run each one through OCR at a time, but 
I expect that Tesseract itself has the power to do what I need it to do. Also, 
with the 1000s of times users will be uploading a new PDF, the crops might not 
line up exactly the way I need them to. 

My end goal is to be able to write all these values to my database in the 
order they are related.  For a 4th generation pedigree that would be 31 
different areas to scoop up the data I need.  If the results came back in 
order with an X coordinate, I could always use that to pick a column and 
work my Y values down.  

Even if all I had to work with was a \n character for each line, I might be 
able to make that work.  
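
A rough sketch of the X/Y idea, not a tested solution: recent Tesseract versions can write per-word bounding boxes as tab-separated text when the `tsv` output config is given (for example `tesseract test4.png out tsv`). Assuming that output is available, the Python below groups the word rows into lines, then buckets the lines into columns by their X position and sorts each column by Y. The function names and the `tolerance` value are placeholders for illustration, and the same logic could be ported to PHP.

```python
import csv
import io
from collections import defaultdict


def lines_from_tsv(tsv_text):
    """Group Tesseract TSV word rows into text lines with an x/y origin."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    groups = defaultdict(list)
    for row in reader:
        # Level 5 rows are single words; lower levels describe the page,
        # block, paragraph, and line boxes and carry no text.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        key = (row["page_num"], row["block_num"], row["par_num"], row["line_num"])
        groups[key].append(row)

    lines = []
    for words in groups.values():
        words.sort(key=lambda w: int(w["word_num"]))
        lines.append({
            "x": min(int(w["left"]) for w in words),
            "y": min(int(w["top"]) for w in words),
            "text": " ".join(w["text"] for w in words),
        })
    return lines


def group_into_columns(lines, tolerance=40):
    """Bucket lines into columns by x, then sort each column top to bottom.

    `tolerance` (pixels) is a guess; it would need tuning per scan size.
    """
    columns = []
    for line in sorted(lines, key=lambda l: l["x"]):
        # A line joins the current column if it starts close to that
        # column's left-most line; otherwise it opens a new column.
        if columns and line["x"] - columns[-1][0]["x"] <= tolerance:
            columns[-1].append(line)
        else:
            columns.append([line])
    return [sorted(col, key=lambda l: l["y"]) for col in columns]
```

Each returned column would then correspond to one generation (left to right), with entries ordered top to bottom, provided the scan is reasonably straight and the pedigree layout really is column-aligned.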

On the 4th generation pedigree I tried cropping out the entire last (4th) 
generation.  If I go that route, that would only be 6 crops I need to 
make on this (1 for the dog, two for each of those parents, and then each 
generation).  My users will have 3 or 4 generation pedigrees.  

Any advice would be greatly appreciated. 
Thanks
Daron

<https://lh3.googleusercontent.com/-EUDy1RXhwNI/WwtIj87cJpI/AAAAAAAAAv4/YxrTRX4IDUU6fx5GlJTweEhUff6OgXzCgCLcBGAs/s1600/test4.png>

<https://lh3.googleusercontent.com/-Z4Jqh3ibhC0/WwtI760Pl_I/AAAAAAAAAwA/mgbcQyCfk5smwKyzzfhIaNutRCplfvlNACLcBGAs/s1600/test2.jpg>
