In some recent posts, I've seen people with problems similar to mine, but 
no answer on how to fix them.  I'm trying to train Tesseract to be more 
accurate with a new font.  When I create the unicharset by running 
unicharset_extractor on my box file:

```
a 32 692 165 958 0 
b 221 734 354 958 0 
c 32 446 165 628 0 
d 221 488 354 628 0 
e 32 275 165 373 0 
f 221 317 277 373 0
```

I get the following output:

```
9 
NULL 0 NULL 0 
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # Joined [4a 6f 69 6e 65 64 ] 
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0        # Broken 
a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # a [61 ] 
b 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # b [62 ] 
c 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # c [63 ] 
d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # d [64 ] 
e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # e [65 ] 
f 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # f [66 ]
```

and when I run shapeclustering, the first few lines of its output are:

```
Bad properties for index 3, char a: 0,255 0,255 0,0 0,0 0,0 
Bad properties for index 4, char b: 0,255 0,
```

It seems that unicharset_extractor isn't parsing the box file properly. 
Some obvious problems with the unicharset file include: the "properties" bit 
mask is 0, the "glyph_metrics" field appears invalid 
(0,255,0,255,0,0,0,0,0,0), and the "script" field is NULL when it should be 
either "Latin" or "Common".
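
To separate a malformed box file from a problem in the unicharset itself, 
the checks described above could be scripted. The following is only a rough 
sketch, not part of Tesseract: the helper names are made up, the box line 
layout is assumed to be "char left bottom right top page", and the 
unicharset field order (character, properties, glyph_metrics, script) is 
assumed from the output shown above. It also assumes the character itself 
is not "#".

```python
# Hypothetical helpers for sanity-checking training files; not part of Tesseract.
DEFAULT_METRICS = "0,255,0,255,0,0,0,0,0,0"
SPECIAL_ENTRIES = {"NULL", "Joined", "|Broken|0|1"}  # entries seen before the real characters


def check_box_line(line):
    """Return a list of problems found in one box-file line."""
    parts = line.split()
    if len(parts) != 6:
        return ["expected 6 fields, got %d" % len(parts)]
    try:
        left, bottom, right, top, _page = (int(p) for p in parts[1:])
    except ValueError:
        return ["coordinates and page must be integers"]
    problems = []
    if left >= right:
        problems.append("left >= right")
    if bottom >= top:
        problems.append("bottom >= top")
    return problems


def check_unicharset_line(line):
    """Return a list of suspicious values found in one unicharset line."""
    # Drop the trailing "# ..." comment, then split on whitespace.
    fields = line.split("#", 1)[0].split()
    if len(fields) < 4 or fields[0] in SPECIAL_ENTRIES:
        return []  # count line or special entry: nothing to check
    _char, props, metrics, script = fields[:4]
    problems = []
    if props == "0":
        problems.append("properties mask is 0")
    if metrics == DEFAULT_METRICS:
        problems.append("glyph_metrics is " + DEFAULT_METRICS)
    if script == "NULL":
        problems.append("script is NULL")
    return problems


if __name__ == "__main__":
    for box_line in ["a 32 692 165 958 0", "f 221 317 277 373 0"]:
        print(box_line, "->", check_box_line(box_line))
    entry = "a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # a [61 ]"
    print(entry, "->", check_unicharset_line(entry))
```

If the box lines pass and every character entry in the unicharset is still 
flagged, that would suggest the box file itself is not the culprit.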

Does anyone have an idea why this is happening?

O/S: Ubuntu 15.10
Tesseract Ver: 3.04

Posts with no simple resolution:
https://github.com/tesseract-ocr/tesseract/issues/139
