When you say that I will need to "map the font returned by Tesseract to some font available on your system that has similar glyph characteristics", you have restated my original question.
So maybe this rephrasing will help you understand my question: How can I map the font returned by Tesseract to some font available on Windows that has similar glyph characteristics?

On Saturday, September 21, 2013 9:07:11 AM UTC-6, Quan Nguyen wrote:
>
> I don't think Tesseract has any knowledge about system fonts. It gets the
> font info from the .traineddata file which includes information defined in
> the font_properties file used during training. So it means the fonts used
> in training may not exist on the machine it's being run on. Moreover, the
> font name specified in font_properties may not reflect the actual font
> name; e.g., "Times New Roman" may be shortened to "times".
>
> As such, you will need to map the font returned by Tesseract to some font
> available on your system that has similar glyph characteristics.
>
> On Saturday, September 21, 2013 7:39:04 AM UTC-5, [email protected] wrote:
>>
>> Thanks for the quick response, but I already know about those APIs - let
>> me try to explain with an example.
>>
>> Let's say that ResultIterator says that it found the word "hello" in the
>> image at position (100, 100), and TessResultIteratorWordFontAttributes says
>> it's in font "Arial" with a height of 16. In my Windows application, I can
>> construct a 16-high Arial font and draw the word "hello" at (100, 100) and
>> I am doing a good job of showing the user the OCR output.
>>
>> But now let's say that ResultIterator continues and says that it found
>> the word "goodbye" in the image at position (100, 300), and
>> TessResultIteratorWordFontAttributes says it's in font "DejaVu Sans" with a
>> height of 16. If I tell Windows to construct a font named "DejaVu Sans",
>> Windows won't have any idea what that is, and it will pick some random font
>> from its list.
>> When I then have my Windows application draw the word
>> "goodbye" at (100, 300), it's highly likely that the character widths in
>> the font that Windows is using are very different from the character widths
>> in the actual DejaVu Sans font, so the word "goodbye" will take up the
>> wrong amount of space and I'll either end up with lots of white space or
>> (more often) the words all run over each other.
>>
>> Does that make more sense?
>>
>> Thanks,
>> Chris
>>
>>
>> On Friday, September 20, 2013 5:39:07 PM UTC-6, Quan Nguyen wrote:
>>>
>>> You'll need to access the Tesseract API for such information, specifically,
>>> ResultIterator and ResultIteratorWordFontAttributes. Check out the API
>>> Example <http://code.google.com/p/tesseract-ocr/wiki/APIExample> page.
>>>
>>> Quan
>>>
>>>
>>> On Friday, September 20, 2013 3:42:14 PM UTC-5, [email protected] wrote:
>>>>
>>>> I would like to show the user the OCR output in my Windows application
>>>> in a graphical form (the OCR'd characters, in the specified font, in the
>>>> right location). To do that I need to pick a font to draw the OCR
>>>> output text in, and it seems like I have two choices -
>>>> 1) Map the Tesseract font to something Windows can understand
>>>> 2) Use the actual Tesseract font
>>>>
>>>> For #1, Tesseract uses a lot of fonts that I've got on my Windows box
>>>> (Times New Roman, Arial, etc.) but then it also comes up with some I don't
>>>> have (Century Schoolbook). Is there a way to enumerate all the names of
>>>> the fonts that Tesseract might return? I can then decide whether it's
>>>> easier to find Windows equivalents for all the fonts, or to download fonts
>>>> (if they are free and have nice licensing).
>>>>
>>>> For #2, it's not enough to just display the selected portion of the
>>>> source image; that doesn't tell the user anything. I would need a way to
>>>> ask Tesseract, "what is the glyph for an uppercase G in an Arial font of
>>>> height 34". Does that exist?
>>>>
>>>> Thanks,
>>>> Chris
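
For what it's worth, the kind of mapping discussed above could be sketched roughly like this. It is only an illustration in Python, not the C API used in the thread. The function name `pick_windows_font`, the `ALIASES` table, and every substitution in that table are assumptions made up for the example; Tesseract does not provide any of them. The `is_serif` and `is_monospace` flags are meant to be the ones reported together with the font name by the word font attributes call.

```python
from typing import Iterable

# Hypothetical alias table: lowercased names as they might come out of
# font_properties, mapped to fonts that normally ship with Windows.
# The substitutions are illustrative guesses, not verified metric matches.
ALIASES = {
    "times": "Times New Roman",
    "dejavu sans": "Verdana",
    "century schoolbook": "Georgia",
}


def pick_windows_font(reported_name: str,
                      is_serif: bool,
                      is_monospace: bool,
                      installed: Iterable[str]) -> str:
    """Choose a font to draw with, given the name Tesseract reported."""
    # Case-insensitive lookup of the fonts actually present on the machine.
    installed_by_key = {name.lower(): name for name in installed}
    key = reported_name.strip().lower().replace("_", " ")

    # 1. The reported name is installed as-is (e.g. "Arial").
    if key in installed_by_key:
        return installed_by_key[key]

    # 2. A known alias exists and its target is installed.
    alias = ALIASES.get(key)
    if alias is not None and alias.lower() in installed_by_key:
        return installed_by_key[alias.lower()]

    # 3. Fall back on the broad glyph class reported with the font.
    if is_monospace:
        return "Courier New"
    if is_serif:
        return "Times New Roman"
    return "Arial"


if __name__ == "__main__":
    fonts = ["Arial", "Verdana", "Times New Roman", "Courier New"]
    print(pick_windows_font("DejaVu Sans", False, False, fonts))
    print(pick_windows_font("Some Unknown Font", True, False, fonts))
```

The `installed` list would have to come from the application itself, for example by enumerating fonts through GDI. Even with a reasonable substitute, character widths will differ from the training font, so the drawn word may still need to be stretched or squeezed to the bounding box that ResultIterator reports.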

