On Tuesday, June 6, 2017 at 6:08:32 PM UTC-4, John Muccigrosso wrote:
>
> The wiki suggests making sure that the x-height of text is at least 20 px. 
> Is there a fairly straightforward way to estimate this without manually
> examining the image? Getting average or median from hocr or something?
>

Months later...

It looks like what I want is a box file. After checking the wiki, I adapted
its instructions into this command, which seems to do what I want:

tesseract text_image_file output_file_name makebox

Output looks like this:

C 261 2453 285 2480 0
A 287 2454 312 2480 0
P 315 2454 334 2479 0
I 337 2454 347 2480 0
T 349 2454 372 2481 0
O 374 2454 402 2480 0
L 406 2454 426 2480 0
I 429 2454 439 2480 0
N 442 2454 471 2480 0
E 473 2454 494 2480 0
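
For reference, each line appears to follow the usual box file layout: the character, then the left, bottom, right and top pixel coordinates (measured from the bottom-left corner of the image), then a page number. Here is a small Python sketch that splits one line into named fields. The `Box` and `parse_box_line` names are my own, just for illustration, and are not part of Tesseract:

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One line of a box file, assuming the layout described above."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def height(self) -> int:
        # Coordinates grow upward, so top is the larger value.
        return self.top - self.bottom


def parse_box_line(line: str) -> Box:
    # Split from the right so the five numeric fields are peeled off
    # and whatever remains on the left is kept as the symbol.
    symbol, left, bottom, right, top, page = line.strip().rsplit(maxsplit=5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


box = parse_box_line("C 261 2453 285 2480 0")
print(box.symbol, box.height)  # C 27
```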


So now I need to process this output to get the letter heights (top minus
bottom, i.e. the fifth field minus the third on each line, counting the
character as the first) and then grab the median.
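
A rough Python sketch of that processing step, using the sample output above as input. The `glyph_heights` helper and the `X_HEIGHT_LETTERS` set are my own assumptions, not anything from the wiki. Note that the sample is all capitals, so the median here is really a cap height; for an x-height estimate it probably makes sense to keep only lowercase letters without ascenders or descenders.

```python
import statistics

# Sample box output copied from the message above.
SAMPLE = """\
C 261 2453 285 2480 0
A 287 2454 312 2480 0
P 315 2454 334 2479 0
I 337 2454 347 2480 0
T 349 2454 372 2481 0
O 374 2454 402 2480 0
L 406 2454 426 2480 0
I 429 2454 439 2480 0
N 442 2454 471 2480 0
E 473 2454 494 2480 0
"""

# Assumption: letters whose height roughly equals the x-height.
X_HEIGHT_LETTERS = set("acemnorsuvwxz")


def glyph_heights(box_text, symbols=None):
    """Return top - bottom for each box line, optionally filtered by symbol."""
    heights = []
    for line in box_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        if symbols is not None and fields[0] not in symbols:
            continue
        bottom, top = int(fields[2]), int(fields[4])
        heights.append(top - bottom)
    return heights


heights = glyph_heights(SAMPLE)
print(statistics.median(heights))  # 26.0
```

For a real page, something like `glyph_heights(open("output_file_name.box").read(), X_HEIGHT_LETTERS)` should give the lowercase heights, assuming the box file ends up next to the output name with a `.box` extension. `statistics.median` raises an error on an empty list, so check that the filter matched something first.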
