Hello, I am attempting to have Tesseract find a number on a parts diagram. I will then use the number found and its coordinates in further processing.
I have no problem getting the engine to recognize strings of characters, and from what I can see it returns the correct text for everything it finds. I have a folder structure of .\tessdata\configs, into which I copied one of the config files from the configs folder of the Tesseract install. I edited the file to contain the following two lines:

tessedit_char_whitelist 0123456789-.
tessedit_char_blacklist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

I then reference the config file, and it is opened without complaint. However, the configuration seems to have no effect.

<https://lh3.googleusercontent.com/-Kmgl6SuZG9c/VXhqGUiyCCI/AAAAAAAAAKA/KG1zlBWXU5s/s1600/Tesseract%2BOutput.PNG>
<https://lh3.googleusercontent.com/-x7Z-lF0CoEg/VXhqJRfg6MI/AAAAAAAAAKI/WRNo2T24BJ0/s1600/img1.TIF>

I say this because, as you can see, I am still getting all of the characters in my blacklist returned. Once I get the configuration to take effect, I need to know what configuration is needed to find the single numbers on a parts list, as in this example:

<https://lh3.googleusercontent.com/-pPt4DndW9MA/VXhpIXPryLI/AAAAAAAAAJw/GFFE4oxRHVg/s1600/95bca-1.tif>

Thanks in advance for any assistance that can be provided on this.

V/R
James
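To make the setup easier to reproduce, here is a rough Python sketch of the config file described above and the command line that would reference it. It is only an illustration and not a confirmed fix: the config name `parts_digits` and the helpers `write_config` and `build_command` are made up for this example, and it assumes the `tesseract` executable is invoked as `tesseract <image> <output_base> <config>`, with the config looked up by name under the tessdata `configs` folder.

```python
from pathlib import Path

# Parameter lines copied from the post. The trailing "." in the whitelist is
# assumed to be an intended character, not sentence punctuation.
CONFIG_LINES = [
    "tessedit_char_whitelist 0123456789-.",
    "tessedit_char_blacklist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
]


def write_config(tessdata_dir, name="parts_digits"):
    """Write the parameter lines to <tessdata_dir>/configs/<name>; return the path."""
    config_path = Path(tessdata_dir) / "configs" / name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text("\n".join(CONFIG_LINES) + "\n", encoding="ascii")
    return config_path


def build_command(image, output_base, config):
    """Build the argument list: tesseract <image> <output_base> <config>."""
    return ["tesseract", str(image), str(output_base), str(config)]


if __name__ == "__main__":
    # Hypothetical usage: create the config, then print the command to run.
    path = write_config("tessdata")
    print(" ".join(build_command("img1.TIF", "out", path.name)))
```

The command is only printed here, not executed, so the sketch runs without Tesseract installed. Comparing the printed command with what is actually being run may help confirm whether the config is passed as the last argument.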

