Wow! Awesome.

That file definitely helps. It fixed a few issues but introduced a few of 
its own, so I am currently running "eng+asc", which gives great output 
and runs faster than "eng+deu".

Attached is an example image and the output using asc. Note that asc reads 
the 'ü' as a 'ū' and makes a few other errors that "deu" handles correctly. 
But it is still a huge help.

A BIG improvement is that it got '=' correct, whereas all the other trained 
data I tried, including math symbols, returns ':' or worse. Thanks!

A couple of questions, to help me learn to fish, so to speak:
1. How do I find/get the unicharset file? I checked the English and German 
tessdata downloads and it is not in either of them.
2. How did you go about making the asc traineddata? I think I need to dive 
into this aspect of Tesseract. Do I follow these steps? 
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
I am not interested in new languages, just in making one that covers 
extended ASCII, and then hopefully one day the Unicode BMP (0x0000 - 0xFFFF). 
But I am not sure how to go about that without a huge time sink.

% 93% <61, 6, 179, 118>
( 85% <230, 1, 273, 135>
) 84% <319, 1, 362, 135>
· 75% <411, 93, 442, 131>
- 98% <492, 65, 532, 83>
. 100% <586, 93, 608, 115>
/ 93% <656, 7, 708, 117>
0 89% <758, 8, 831, 116>
¶ 70% <888, 10, 937, 114>
2 90% <988, 8, 1058, 114>
3 86% <1110, 8, 1177, 116>
4 91% <1225, 10, 1303, 114>
5 88% <1359, 10, 1424, 116>
6 88% <1476, 8, 1546, 116>
8 89% <1596, 8, 1669, 116>
9 87% <1721, 8, 1791, 116>
IJ 67% <1846, 37, 1947, 131>
A 89% <1993, 10, 2099, 114>
B 89% <2156, 10, 2230, 114>
D 93% <2287, 10, 2376, 114>
E 96% <2433, 10, 2495, 114>
G 89% <2547, 8, 2638, 116>
H 90% <2696, 10, 2780, 114>
| 97% <2834, 9, 2856, 115>
K 94% <2913, 10, 2999, 114>
L 97% <3056, 10, 3118, 114>
N 92% <3174, 10, 3260, 114>
P 89% <3315, 10, 3386, 114>
R 91% <3441, 10, 3519, 114>
S 88% <3573, 8, 3640, 116>
V 96% <3687, 10, 3784, 114>
W 99% <3830, 10, 3977, 114>
Z 96% <4028, 10, 4100, 114>
a 93% <4150, 35, 4220, 116>
b 90% <4275, 2, 4351, 116>
c 84% <4402, 35, 4460, 116>
d 89% <4510, 2, 4586, 116>
e 91% <4636, 35, 4709, 116>
f 84% <4757, 0, 4811, 114>
g 87% <4861, 35, 4937, 147>
h 92% <4992, 2, 5063, 114>
¡ 89% <5119, 3, 5139, 115>
i 86% <5188, 4, 5221, 147>
k 94% <5277, 2, 5349, 114>
| 97% <5404, 1, 5425, 115>
m 91% <5480, 35, 5593, 114>
n 93% <5648, 35, 5719, 114>
o 87% <5769, 35, 5850, 116>
p 91% <5905, 35, 5981, 146>
r 92% <6037, 35, 6083, 114>
s 85% <6133, 35, 6189, 116>
t 85% <6236, 15, 6290, 116>
u 92% <6345, 37, 6416, 116>
v 96% <6463, 37, 6543, 114>
w 98% <6589, 37, 6719, 114>
z 92% <6770, 37, 6834, 114>
¨ 97% <6883, 5, 6932, 23>
ß 87% <6987, 0, 7063, 115>
ä 88% <7114, 6, 7183, 116>
ū 95% <7238, 6, 7309, 116>
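
For anyone who wants to pick the weak characters out of a listing like the one above, here is a minimal sketch in Python. It assumes each line is a recognized glyph, a confidence percentage, and a bounding box written as <left, top, right, bottom>; that reading of the box order is my assumption from the numbers, not something the tool documents here. The names Glyph, parse_listing and low_confidence, and the 80% threshold, are hypothetical and only for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Glyph(NamedTuple):
    text: str                        # recognized character(s), e.g. "ū" or "IJ"
    confidence: int                  # percentage as printed in the listing
    box: Tuple[int, int, int, int]   # assumed order: left, top, right, bottom


# One listing line looks like: ū 95% <7238, 6, 7309, 116>
LINE_RE = re.compile(r"^(\S+) (\d+)% <(\d+), (\d+), (\d+), (\d+)>$")


def parse_listing(listing: str) -> List[Glyph]:
    """Parse every line that matches the listing format; skip the rest."""
    glyphs = []
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        text, conf, *coords = match.groups()
        glyphs.append(Glyph(text, int(conf), tuple(int(c) for c in coords)))
    return glyphs


def low_confidence(glyphs: List[Glyph], threshold: int = 80) -> List[Glyph]:
    """Return the glyphs whose confidence is strictly below the threshold."""
    return [g for g in glyphs if g.confidence < threshold]


# A few lines copied from the attached output.
SAMPLE = """\
· 75% <411, 93, 442, 131>
. 100% <586, 93, 608, 115>
¶ 70% <888, 10, 937, 114>
IJ 67% <1846, 37, 1947, 131>
ū 95% <7238, 6, 7309, 116>
"""

if __name__ == "__main__":
    for glyph in low_confidence(parse_listing(SAMPLE)):
        print(glyph.text, glyph.confidence, glyph.box)
```

Run against the full listing above with the same threshold, this flags three entries: '·', '¶' and 'IJ'. Note that the 'ū' misread scores 95%, so a confidence filter alone will not catch it.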
