Wow! Awesome. That file definitely helps. It fixed a few issues but introduced a few of its own, so I am currently running "eng+asc", which gives great output and runs faster than "eng+deu".
Attached is an example image and the output using asc. Note that asc reads the 'ü' as a 'ū' and makes a few other errors that the "deu" data handles. Still, it is a huge help. A BIG improvement is that it recognized '=' correctly, whereas every other traineddata I tried, including math symbols, returns ':' or worse. Thanks!

A couple of questions, to help me learn to fish, so to speak:

1. How do I find/get the unicharset file? I checked the English and German tessdata downloads and there is nothing.
2. How did you go about making the asc traineddata? I think I need to dive into this aspect of Tesseract. Do I follow these steps? https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

I am not interested in new languages, just in making one that covers extended ASCII, and then hopefully one day the Unicode BMP (0x0000 - 0xFFFF). But I am not sure how to go about that without a huge time sink.

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send email to tesseract-ocr@googlegroups.com. Visit this group at http://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/01a3b8e3-51af-47a1-90f8-a5c884d3ffa9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
% 93% <61, 6, 179, 118>
( 85% <230, 1, 273, 135>
) 84% <319, 1, 362, 135>
· 75% <411, 93, 442, 131>
- 98% <492, 65, 532, 83>
. 100% <586, 93, 608, 115>
/ 93% <656, 7, 708, 117>
0 89% <758, 8, 831, 116>
¶ 70% <888, 10, 937, 114>
2 90% <988, 8, 1058, 114>
3 86% <1110, 8, 1177, 116>
4 91% <1225, 10, 1303, 114>
5 88% <1359, 10, 1424, 116>
6 88% <1476, 8, 1546, 116>
8 89% <1596, 8, 1669, 116>
9 87% <1721, 8, 1791, 116>
IJ 67% <1846, 37, 1947, 131>
A 89% <1993, 10, 2099, 114>
B 89% <2156, 10, 2230, 114>
D 93% <2287, 10, 2376, 114>
E 96% <2433, 10, 2495, 114>
G 89% <2547, 8, 2638, 116>
H 90% <2696, 10, 2780, 114>
| 97% <2834, 9, 2856, 115>
K 94% <2913, 10, 2999, 114>
L 97% <3056, 10, 3118, 114>
N 92% <3174, 10, 3260, 114>
P 89% <3315, 10, 3386, 114>
R 91% <3441, 10, 3519, 114>
S 88% <3573, 8, 3640, 116>
V 96% <3687, 10, 3784, 114>
W 99% <3830, 10, 3977, 114>
Z 96% <4028, 10, 4100, 114>
a 93% <4150, 35, 4220, 116>
b 90% <4275, 2, 4351, 116>
c 84% <4402, 35, 4460, 116>
d 89% <4510, 2, 4586, 116>
e 91% <4636, 35, 4709, 116>
f 84% <4757, 0, 4811, 114>
g 87% <4861, 35, 4937, 147>
h 92% <4992, 2, 5063, 114>
¡ 89% <5119, 3, 5139, 115>
i 86% <5188, 4, 5221, 147>
k 94% <5277, 2, 5349, 114>
| 97% <5404, 1, 5425, 115>
m 91% <5480, 35, 5593, 114>
n 93% <5648, 35, 5719, 114>
o 87% <5769, 35, 5850, 116>
p 91% <5905, 35, 5981, 146>
r 92% <6037, 35, 6083, 114>
s 85% <6133, 35, 6189, 116>
t 85% <6236, 15, 6290, 116>
u 92% <6345, 37, 6416, 116>
v 96% <6463, 37, 6543, 114>
w 98% <6589, 37, 6719, 114>
z 92% <6770, 37, 6834, 114>
¨ 97% <6883, 5, 6932, 23>
ß 87% <6987, 0, 7063, 115>
ä 88% <7114, 6, 7183, 116>
ū 95% <7238, 6, 7309, 116>
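Each entry in the listing above seems to give the recognized symbol, a confidence percentage, and four box coordinates. Judging by the descender glyphs 'g' and 'p', which have the largest fourth value, the coordinates are probably left, top, right, bottom in pixels, but that is an assumption. Below is a minimal Python sketch that parses this text format and flags entries under a confidence threshold, which can help when comparing traineddata files such as "asc" and "deu". The names `parse_entries`, `low_confidence`, and the default threshold of 80 are hypothetical and made up for illustration. This is not a Tesseract API.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical pattern for one entry, e.g. "IJ 67% <1846, 37, 1947, 131>".
# The label may be more than one character ("IJ"), so it is matched as \S+.
ENTRY_RE = re.compile(r"(\S+) (\d+)% <(\d+), (\d+), (\d+), (\d+)>")


class Entry(NamedTuple):
    text: str                         # recognized symbol(s)
    confidence: int                   # percentage, 0-100
    box: Tuple[int, int, int, int]    # assumed (left, top, right, bottom)


def parse_entries(output: str) -> List[Entry]:
    """Parse every entry, whether one per line or run together on one line."""
    entries = []
    for m in ENTRY_RE.finditer(output):
        text, conf, *coords = m.groups()
        entries.append(Entry(text, int(conf), tuple(int(c) for c in coords)))
    return entries


def low_confidence(entries: List[Entry], threshold: int = 80) -> List[Entry]:
    """Return entries whose confidence is strictly below the threshold."""
    return [e for e in entries if e.confidence < threshold]


if __name__ == "__main__":
    sample = (
        "0 89% <758, 8, 831, 116>\n"
        "¶ 70% <888, 10, 937, 114>\n"
        "2 90% <988, 8, 1058, 114>\n"
        "IJ 67% <1846, 37, 1947, 131>\n"
        "ū 95% <7238, 6, 7309, 116>\n"
    )
    for e in low_confidence(parse_entries(sample)):
        print(e.text, e.confidence, e.box)
    # prints:
    # ¶ 70 (888, 10, 937, 114)
    # IJ 67 (1846, 37, 1947, 131)
```

A confidence threshold only catches the uncertain guesses. In the output above, the 'ü' misread as 'ū' scores 95%, so it would not be flagged. Checking against the known text of the test image is still needed to catch confident errors like that one.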