Ok, then it's like what I always do:

$ tesseract isis_0153.png isis_0153 -l deu-frak+deu makebox hocr tessedit_write_images
This way I can extract blocks, lines, words and character images from the clean page tiff.

On Friday, 13 November 2015 at 05:15:18 UTC+1, Chang Alden wrote:
>
> Hi,
> With "makebox" you get the coordinates of the box for each character of
> the image you input and scanned with tesseract; it is not in HTML format
> (sorry about the confusion). I didn't work on training, so I didn't know
> such an option exists.
>
>
> On Friday, 13 November 2015 at 02:13:11 UTC+8, Helmut Wollmersdorfer wrote:
>>
>> Sorry, in which option do you write it? Sounds like the shell console, and
>> you get a box file. Or have you found how to get single character boxes in
>> hOCR?
>>
>> On Thursday, 12 November 2015 at 16:18:06 UTC+1, Chang Alden wrote:
>>>
>>> Alright, I got it: just type makebox in the options. It seems everything
>>> else in the configs folder can be accessed this way as well.
>>>
>>> On Thursday, 12 November 2015 at 21:55:12 UTC+8, Chang Alden wrote:
>>>>
>>>> It seems it has to do with enabling the api.GetBoxText option. Does
>>>> anyone know how to get it to work?
>>>>
>>>>
>>>> On Thursday, 12 November 2015 at 09:43:42 UTC+8, Chang Alden wrote:
>>>>>
>>>>> So, this is an extension to my problem, in case someone skipped the
>>>>> title for the spacing problem. Pretty much, I want to analyze the
>>>>> spacing problem using hocr, but hocr only gives bounding boxes for
>>>>> word output. So I would like to know if there is a file in
>>>>> tessdata/configs that I can modify to get the character bounding box
>>>>> output from hocr. So far I have not found a post through Google
>>>>> Search, so I am not sure if such a technique exists.
>>>>> Ignoring the API way for now.
>>>>>
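For the spacing question further down in the thread, a rough sketch of how the box file written by makebox could be read back in. It assumes the usual box-file layout of one character per line as "glyph left bottom right top page", with the origin at the bottom-left corner of the image. The names CharBox, parse_box_text and horizontal_gaps are made up for this example, and the sample lines are invented, not taken from isis_0153.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One character box, converted to a top-left image origin."""
    glyph: str
    left: int
    top: int
    right: int
    bottom: int
    page: int


def parse_box_text(text: str, image_height: int) -> List[CharBox]:
    """Parse box-file text (assumed layout: glyph left bottom right top page).

    Box files count y from the bottom edge, so y values are flipped
    using the image height to match the top-left origin used by hOCR.
    """
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so the glyph field may hold any characters.
        glyph, left, bottom, right, top, page = line.rsplit(" ", 5)
        boxes.append(CharBox(
            glyph=glyph,
            left=int(left),
            top=image_height - int(top),
            right=int(right),
            bottom=image_height - int(bottom),
            page=int(page),
        ))
    return boxes


def horizontal_gaps(boxes: List[CharBox]) -> List[int]:
    """Pixel gap between each character and the next one on the same line."""
    return [b.left - a.right for a, b in zip(boxes, boxes[1:])]


# Invented sample data, only to show the layout.
SAMPLE = "D 10 80 30 110 0\ni 34 80 40 110 0\ne 60 80 78 100 0\n"

boxes = parse_box_text(SAMPLE, image_height=120)
print(horizontal_gaps(boxes))
```

A gap that is much larger than its neighbours would be a candidate for a missing or spurious space. This only makes sense for characters on one text line, so real input would first need to be grouped by line.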

