Hi, I'm trying to train Tesseract as well and I have the same problem. I got this 
result:

"P 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 54 0 0 # # P [50 ]A
A 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 38 0 0 # # A [41 ]A
S 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 53 0 0 # # S [53 ]A"

The problem is in the glyph_metrics and script fields. Does anyone have any 
ideas?
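
In case it helps anyone compare outputs, here is a rough Python sketch for listing which entries in a unicharset file still look unset. It is not part of Tesseract. The function names are made up, the field order (char, properties, glyph_metrics, script) is just what the lines in this thread show, and I am assuming the properties field is hexadecimal as described in the unicharset(5) man page. Treating a bottom/top range of 0,255,0,255 as "never measured" is my own guess based on the samples here.

```python
def parse_unicharset_line(line):
    """Split one unicharset entry into named fields; return None if it is not an entry."""
    fields = line.split()
    if len(fields) < 4:
        return None  # e.g. the leading count line
    char, props, metrics, script = fields[:4]
    try:
        properties = int(props, 16)  # assumption: hex bit mask
        glyph_metrics = [int(v) for v in metrics.split(",")]
    except ValueError:
        return None  # e.g. the short "NULL 0 NULL 0" line
    if len(glyph_metrics) != 10:
        return None
    return {
        "char": char,
        "properties": properties,
        "glyph_metrics": glyph_metrics,
        "script": script,
    }


def unset_fields(entry):
    """Return the names of fields that still look like placeholders."""
    problems = []
    if entry["properties"] == 0:
        problems.append("properties")
    # Assumption: a full 0..255 range for bottom and top means nothing was measured.
    if entry["glyph_metrics"][:4] == [0, 255, 0, 255]:
        problems.append("glyph_metrics")
    if entry["script"] == "NULL":
        problems.append("script")
    return problems


def report(lines):
    """Map each parsed character to its list of suspicious fields."""
    result = {}
    for line in lines:
        entry = parse_unicharset_line(line)
        if entry is not None:
            result[entry["char"]] = unset_fields(entry)
    return result
```

Calling `report(open("unicharset", encoding="utf-8"))` on the file should give a dictionary from each character to the fields that look unset; the file name here is only an example.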



On Tuesday, 1 December 2015 00:42:23 UTC+2, Gustavo Polledri wrote:
>
> In some recent posts, I've seen people with problems similar to mine, but 
> no answer on how to fix them. I'm trying to train Tesseract to be more 
> accurate with a new font. When I create the unicharset by running 
> unicharset_extractor on my box file:
>
> ```
> a 32 692 165 958 0 
> b 221 734 354 958 0 
> c 32 446 165 628 0 
> d 221 488 354 628 0 
> e 32 275 165 373 0 
> f 221 317 277 373 0
> ```
>
> I get the following output:
>
> ```
> 9 
> NULL 0 NULL 0 
> Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # Joined [4a 6f 69 6e 65 64 ] 
> |Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0        # Broken 
> a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # a [61 ] 
> b 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # b [62 ] 
> c 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # c [63 ] 
> d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # d [64 ] 
> e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # e [65 ] 
> f 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # f [66 ]
> ```
>
> and when I run shapeclustering, the first few lines of its output are:
>
> ```
> Bad properties for index 3, char a: 0,255 0,255 0,0 0,0 0,0 
> Bad properties for index 4, char b: 0,255 0,
> ```
>
> It seems that unicharset_extractor isn't parsing the box file properly. 
> Some obvious problems with the unicharset file: the "properties" bit mask 
> is 0, the "glyph_metrics" field appears invalid 
> (0,255,0,255,0,0,0,0,0,0), and the "script" field is NULL when it should be 
> either "Latin" or "Common".
>
> Does anyone have an idea why this is happening?
>
> O/S: Ubuntu 15.10
> Tesseract Ver: 3.04
>
> Posts with no simple resolution:
> https://github.com/tesseract-ocr/tesseract/issues/139
>
>
