I'm training Tesseract on Windows for a new font and everything went pretty
well until the set_unicharset_properties step:
set_unicharset_properties -U .\unicharset -O .\unicharset2 -F
"C:\Windows\Fonts\Roman.tff" --script_dir='C:\Program Files
> Loaded unicharset of size 7 from file .\unicharset
> Setting unichar properties
> Other case c of C is not in unicharset
> Other case f of F is not in unicharset
> Setting script properties
> Failed to load script unicharset from:C:\Program Files
> Warning: properties incomplete for index 3 = C
> Warning: properties incomplete for index 4 = 0
> Warning: properties incomplete for index 5 = 1
> Warning: properties incomplete for index 6 = F
> Writing unicharset to file .\unicharset2
I've verified that Latin.unicharset is in the right directory.
The problem (I'm pretty sure) is at the end of this line:
> Failed to load script unicharset from:C:\Program Files
The training software appends a "/" instead of a "\" to the script directory.
I looked at unicharset_training_utils.cpp: at line 166, the "/" is added
regardless of whether the command runs on Windows or Linux.
Is there a way on Windows to load Latin.unicharset despite this "/"?
If not, what is the easiest workaround?
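To make the question concrete, here is a minimal Python sketch of the path I believe is being built. The helper name and the directory are hypothetical, and I am assuming the tool simply concatenates the directory, a "/", the script name and ".unicharset"; this is not the actual Tesseract code. Python's ntpath module applies Windows path rules on any platform, so it shows how the mixed-separator path compares to the all-backslash one:

```python
import ntpath  # Windows path rules, importable on any OS


def script_unicharset_path(script_dir: str, script: str) -> str:
    """Hypothetical helper: join with a hard-coded "/" as described above."""
    return script_dir + "/" + script + ".unicharset"


# Hypothetical directory, used only for illustration.
mixed = script_unicharset_path(r"C:\training\langdata", "Latin")
print(mixed)                   # C:\training\langdata/Latin.unicharset
print(ntpath.normpath(mixed))  # C:\training\langdata\Latin.unicharset
```

Under Windows path rules the two forms normalize to the same file, so the "/" by itself may not be the whole explanation, but I have not confirmed that for this tool.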
For reference, my unicharset2 file looks like this:
> NULL 0 Common 0
> Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
> |Broken|0|1 f 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
> C 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 C # C [43 ]A
> 0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0
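For anyone reading the entries above, this is a small sketch of how I understand the first fields of each line: the character, a hex properties mask, the glyph metrics, and the script name. The bit order in PROPERTY_BITS and the function name are my assumptions about the format, not something taken from the Tesseract source:

```python
# Bit names for the hex "properties" mask, least significant bit first.
# Assumption: this ordering is my reading of the unicharset format.
PROPERTY_BITS = ["isalpha", "islower", "isupper", "isdigit", "ispunctuation"]


def parse_unicharset_line(line: str) -> dict:
    """Split one full unicharset entry into character, flags and script."""
    fields = line.split()
    mask = int(fields[1], 16)  # properties are written in hex, e.g. "f"
    flags = [name for bit, name in enumerate(PROPERTY_BITS) if mask & (1 << bit)]
    return {"unichar": fields[0], "flags": flags, "script": fields[3]}


for entry in (
    "C 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 C",
    "0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0",
):
    print(parse_unicharset_line(entry))
```

With this reading, "C" comes out as an uppercase letter in the Latin script and "0" as a digit in Common. The shorter NULL line does not follow this layout and is not handled by the sketch.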