Ok, then it's like what I always do:

   $ tesseract isis_0153.png isis_0153 -l deu-frak+deu makebox hocr tessedit_write_images


This way I can extract blocks, lines, words and character images from the 
clean page tiff.
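
For the spacing question further down in this thread, a minimal Python sketch of reading the makebox output could look like the following. It assumes the usual box file layout of one character per line, written as "symbol left bottom right top page", with the origin at the bottom-left corner of the image. The names CharBox, parse_box_text and horizontal_gaps, and the sample data, are made up for illustration and are not part of tesseract.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    """One line of a box file (hypothetical helper type)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_text(text: str) -> List[CharBox]:
    """Parse box-file text, assuming 'symbol left bottom right top page' per line."""
    boxes = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        # Split from the right so the five numeric fields are always last.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip lines that do not match the assumed layout
        char, *nums = parts
        left, bottom, right, top, page = (int(n) for n in nums)
        boxes.append(CharBox(char, left, bottom, right, top, page))
    return boxes


def horizontal_gaps(boxes: List[CharBox]) -> List[Tuple[str, str, int]]:
    """Gap in pixels between each character and the next one in file order."""
    return [(a.char, b.char, b.left - a.right) for a, b in zip(boxes, boxes[1:])]


# Made-up sample data standing in for the contents of a real box file.
sample = "H 10 20 30 60 0\ni 34 20 40 58 0\nx 70 20 90 50 0\n"
boxes = parse_box_text(sample)
gaps = horizontal_gaps(boxes)
for first, second, gap in gaps:
    print(first, second, gap)
```

On a real page the text would come from the box file written next to the other outputs (presumably isis_0153.box for the command above). Gaps computed across a line break will be negative or very large, so they would need to be filtered out before looking at word spacing.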

On Friday, 13 November 2015 05:15:18 UTC+1, Chang Alden wrote:
>
> Hi,
> With "makebox" you get the box coordinates for each character of the image 
> you ran through tesseract. It is not in HTML format (sorry about the 
> confusion). I haven't worked on training, so I didn't know such an option 
> existed.
>
>
> On Friday, 13 November 2015 at 02:13:11 UTC+8, Helmut Wollmersdorfer wrote:
>>
>> Sorry, where do you write that option? It sounds like the shell console, 
>> and you get a box file. Or have you found a way to get single-character 
>> boxes in hOCR?
>>
>> On Thursday, 12 November 2015 16:18:06 UTC+1, Chang Alden wrote:
>>>
>>> Alright, I got it: just type makebox as an option. It seems everything 
>>> else in the configs folder can be accessed this way as well.
>>>
>>> On Thursday, 12 November 2015 at 21:55:12 UTC+8, Chang Alden wrote:
>>>>
>>>> It seems to have to do with enabling the api.GetBoxText option. Does 
>>>> anyone know how to get it to work?
>>>>
>>>>
>>>> On Thursday, 12 November 2015 at 09:43:42 UTC+8, Chang Alden wrote:
>>>>>
>>>>> This is an extension of my spacing problem, in case someone skipped 
>>>>> the title. I want to analyze the spacing problem using hOCR, but hOCR 
>>>>> only gives bounding boxes for words. Is there a file in 
>>>>> tessdata/configs that I can modify to get character bounding boxes in 
>>>>> the hOCR output? So far I have not found a post about this through 
>>>>> Google Search, so I am not sure whether such a technique exists. I am 
>>>>> ignoring the API route for now.
>>>>>
>>>>
