I'm training a Tesseract model to recognize mechanical engineering 
drawings that may contain GDT symbols, such as the symbols for depth, 
counterbore, countersink, and diameter. I saw that eng.traineddata already 
includes some of these GDT symbols, but not all of them. I'm using the 
Legacy OEM.

I extract two types of images from these mechanical drawings: images that 
contain Notes, which are typically English paragraphs or sentences of 
text, and images that contain dimensions and GDT symbols.

For the Notes regions of the drawing (in general, recognition of all 
letters, numbers, and punctuation), I'm satisfied with the results that 
eng.traineddata produces.
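
A minimal sketch of this kind of per-region routing, assuming the cropped 
regions are already labelled as "notes" or "dimension". The language name 
`gdt` is a placeholder for a custom traineddata file, and the page 
segmentation modes are assumptions rather than tuned values:

```python
from typing import List

# Hypothetical mapping: "gdt" stands in for a custom traineddata file.
LANG_FOR_REGION = {"notes": "eng", "dimension": "gdt"}

# Assumed page segmentation modes: 6 = single uniform block of text,
# 7 = single text line. These are illustrative, not tuned.
PSM_FOR_REGION = {"notes": 6, "dimension": 7}


def tesseract_command(image_path: str, region_kind: str) -> List[str]:
    """Build a tesseract CLI call for one cropped region (legacy engine)."""
    if region_kind not in LANG_FOR_REGION:
        raise ValueError(f"unknown region kind: {region_kind!r}")
    return [
        "tesseract", image_path, "stdout",
        "-l", LANG_FOR_REGION[region_kind],
        "--oem", "0",  # 0 selects the legacy engine only
        "--psm", str(PSM_FOR_REGION[region_kind]),
    ]
```

The returned list can be passed to `subprocess.run` once both traineddata 
files are available in the tessdata directory.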

For images obtained from the drawing that contain dimension text such as 
"⌀1.05 + .05 - .03 TYP", I have trained a separate model on the letters 
A-Z (uppercase only, which is typical on these drawings; dimensions can 
have English text before or after them), a limited set of punctuation 
characters, and all the GDT symbols I need. It works OK on some fonts, but 
it is not as good as the eng.traineddata model at recognizing letters, 
numbers, and punctuation. I assume the main reason is that I haven't 
trained it on nearly as many fonts as the eng.traineddata model was 
trained on.

So my question is: what is the best way to develop the language I need, 
which is just the eng model plus a few additional characters? Does it make 
sense to try to re-create the eng training data on my own? That seems like 
a daunting task, and I would like to avoid it. Do I have to re-create the 
eng language just to add a few symbols?

Thanks for any advice,
Boot
