Hi, with "makebox" you get the bounding-box coordinates for each character of the input image that Tesseract scanned. The output is a box file, not HTML/hOCR (sorry about the confusion). I haven't worked on training, so I didn't know such an option existed.
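For anyone who wants to use that output for the spacing analysis, here is a minimal sketch of reading it in Python. It assumes the usual box-file layout of one character per line in the form `symbol left bottom right top page`, with the origin at the bottom-left of the image, and that the file came from a call along the lines of `tesseract image.png out makebox`. The names `CharBox`, `parse_box_text` and `horizontal_gaps` are made up for this example and are not part of Tesseract.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    """One line of a box file (hypothetical helper type)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_text(text: str) -> List[CharBox]:
    """Parse box-file text, assuming 'symbol left bottom right top page' per line."""
    boxes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so the symbol is whatever precedes the 5 numbers.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip lines that do not match the assumed layout
        char, *nums = parts
        try:
            left, bottom, right, top, page = (int(n) for n in nums)
        except ValueError:
            continue  # skip lines with non-numeric coordinates
        boxes.append(CharBox(char, left, bottom, right, top, page))
    return boxes


def horizontal_gaps(boxes: List[CharBox]) -> List[Tuple[str, str, int]]:
    """Gap in pixels between each character and the next one in file order."""
    return [(a.char, b.char, b.left - a.right) for a, b in zip(boxes, boxes[1:])]


if __name__ == "__main__":
    sample = "H 10 20 30 60 0\ni 34 20 40 60 0\n"
    for prev, nxt, gap in horizontal_gaps(parse_box_text(sample)):
        print(prev, nxt, gap)
```

The gaps are computed in file order only, so a jump to a new text line shows up as a large negative value; filter those out before looking at spacing within a line.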
On Friday, 13 November 2015 at 2:13:11 AM UTC+8, Helmut Wollmersdorfer wrote:
> Sorry, in which option do you write it? Sounds like the shell console, and you get a box file. Or have you found how to get single-character boxes in hOCR?
>
> On Thursday, 12 November 2015 at 16:18:06 UTC+1, Chang Alden wrote:
>> Alright, I got it: just type makebox as an option. It seems everything else in the configs folder can be accessed this way as well.
>>
>> On Thursday, 12 November 2015 at 9:55:12 PM UTC+8, Chang Alden wrote:
>>> It seems it has to do with enabling the api.GetBoxText option. Does anyone know how to get it to work?
>>>
>>> On Thursday, 12 November 2015 at 9:43:42 AM UTC+8, Chang Alden wrote:
>>>> So, this is an extension to my problem, in case someone skipped the title for the spacing problem. Basically, I want to analyze the spacing problem using hOCR, but hOCR only gives bounding boxes for words. So I would like to know if there is a file in tessdata/configs that I can modify to get character bounding boxes in the hOCR output. So far I have not found a post about this through Google Search, so I am not sure whether such a technique exists. Ignoring the API route for now.

