Hello,

I have extracted all the characters and their ID numbers from
chi_sim.traineddata and stored them in a .txt file. Here is an excerpt:

0     
1    Joined
2    |Broken|0|1
3    S
4    D
5    F
6    8
7    7
8    0
9    K
10    O
11    U
12    H
13    E
14    I
15    4
16    5
17    1
18    9
19    &
20    C
21    W
22    N
23    _
24    P
25    M
26    T
27    V
28    R
29    L
30    A
31    Y
32    2
33    J
34    B
35    G
36    3
37    6
38    Z
39    X
40    Q
41    '
42    +
43    -
44    .
45    #
46    e
47    v
48    a
49    m
50    i
51    z
52    o
53    l
54    s
55    h
56    n
57    d
58    g
59    y
60    u
61    王
62    汝
63    敏
64    邹
65    立
66    健
67    熊
...
...
4013    扔
4014    嗨
4015    髋
4016    「
4017    [
4018    』
4019    瀵
4020    〕
4021    掺
4022    |"|0|2
4023    |"|1|2
4024    rn
4025    |m|0|2
4026    |m|1|2
4027    in
4028    cl
4029    |d|0|2
4030    |d|1|2
4031    rm
4032    |rm|0|2
4033    |rm|1|2
4034    nn
4035    |nn|0|2
4036    |nn|1|2
4037    ri
4038    |n|0|2
4039    |n|1|2
4040    |h|0|2
4041    |h|1|2
4042    |u|0|2
4043    |u|1|2
4044    |m|0|3
4045    |m|1|3
4046    |m|2|3
4047    |H|0|2
4048    |H|1|2
4049    |H|0|3
4050    |H|1|3
4051    |H|2|3
4052    |w|0|2
4053    |w|1|2
4054    |W|0|2
4055    |W|1|2
4056    fi
4057    |k|0|2
4058    |k|1|2
4059    ki
4060    |ki|0|2
4061    |ki|1|2
4062    |in|0|2
4063    |in|1|2
4064    tl
4065    th
...
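
To sort the file mechanically, here is a minimal Python sketch. It assumes
each line of the .txt file holds an ID, some whitespace, and then the entry,
as in the excerpt above. load_entries and classify are hypothetical helper
names and are not part of Tesseract. A whitespace-only entry, like ID 0 above,
comes back as an empty string.

```python
import re

# Assumption: each line looks like "<id><whitespace><entry>", as in the excerpt.
# Lines that do not start with digits (e.g. "...") are skipped.
LINE_RE = re.compile(r"^(\d+)\s+(.*?)\s*$")


def load_entries(lines):
    """Map integer ID -> entry string for every line matching LINE_RE."""
    entries = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            entries[int(m.group(1))] = m.group(2)
    return entries


def classify(entry):
    """Rough bucket for an entry, based only on its visible shape."""
    if entry == "":
        return "blank"  # whitespace-only entry, e.g. ID 0
    if entry.startswith("|"):
        return "pipe-delimited"  # e.g. |Broken|0|1, |m|0|2
    if len(entry) > 1:
        return "multi-character"  # e.g. Joined, rn, ki
    return "single character"  # e.g. 王, A, 7
```

With open("chi_sim_chars.txt", encoding="utf-8") (a hypothetical file name),
load_entries(f) gives the ID-to-entry map. Counting classify() results then
shows how many entries fall outside the single-character case.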


I can recognize most of the characters, such as the Han characters and the
Latin alphabet. However, I do not recognize some entries, such as 'Joined' and
'|Broken|0|1' at the beginning of the file, or |"|0|2 and |m|0|2 near the end.

Can you explain what these entries mean? For example:
4059    ki
4060    |ki|0|2
4061    |ki|1|2
4062    |in|0|2
4063    |in|1|2
and so on.
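
Judging only by the pattern in the excerpt, and not by any Tesseract
documentation, the pipe-delimited entries seem to have the shape
|text|index|total. For example, H appears with indices 0-1 for total 2 and
indices 0-2 for total 3. The sketch below splits such entries and groups the
indices by (text, total) so the sets can be inspected. PipeEntry,
parse_pipe_entry and group_pieces are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class PipeEntry(NamedTuple):
    text: str   # e.g. "m", "ki", "Broken"
    index: int  # first number after the text
    total: int  # second number after the text


def parse_pipe_entry(entry: str) -> Optional[PipeEntry]:
    """Split '|text|index|total' into parts; return None for other entries."""
    if not entry.startswith("|"):
        return None
    # Split from the right so the two trailing numbers are separated cleanly.
    parts = entry[1:].rsplit("|", 2)
    if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
        return None
    return PipeEntry(parts[0], int(parts[1]), int(parts[2]))


def group_pieces(entries):
    """Group the indices of pipe-delimited entries by (text, total)."""
    groups = defaultdict(list)
    for e in entries:
        p = parse_pipe_entry(e)
        if p is not None:
            groups[(p.text, p.total)].append(p.index)
    return {key: sorted(idx) for key, idx in groups.items()}
```

Given the H and ki lines from the excerpt, group_pieces returns
{('H', 2): [0, 1], ('H', 3): [0, 1, 2], ('ki', 2): [0, 1]}. Plain entries such
as ki are ignored.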


Thanks a lot.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b042f6e0-7fc9-487b-bcc6-0acf22c343fd%40googlegroups.com.