---------- Forwarded message ----------
From: Juan Pablo Aveggio <[email protected]>
To: tesseract-ocr <[email protected]>
Cc:
Date: Mon, 21 Sep 2015 16:17:45 -0700 (PDT)
Subject: Train Tesseract 3.04 to recognize six patterns that do not exist in UTF-8
Hello,

I'm trying to train Tesseract to recognize patterns printed on tickets.
Each ticket has a unique pattern at a predetermined position, and that
pattern determines the ticket's value. Since these patterns have no Unicode
equivalents, I assigned them the characters 'a' to 'f'.
I created a .tif image with six patterns:
bil.pat.exp0.tif
<https://drive.google.com/file/d/0B7CfYFzWHQDAYWU4M3hIQXUyOWs/view?usp=sharing>
and the corresponding box file:
bil.pat.exp0.box
<https://drive.google.com/file/d/0B7CfYFzWHQDAVkJlZ3lreEdpaXc/view?usp=sharing>
a 32 692 165 958 0
b 221 734 354 958 0
c 32 446 165 628 0
d 221 488 354 628 0
e 32 275 165 373 0
f 221 317 277 373 0
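For reference, each box-file line is <char> <left> <bottom> <right> <top> <page>, with the origin at the bottom-left corner of the image. Most image viewers measure from the top-left instead. The sketch below is minimal and uses hypothetical helper names. It parses such lines, checks that each box lies inside the image, and converts each box to top-left coordinates for comparison in a viewer. The image size has to be supplied by hand because it is not stated here.

```python
from __future__ import annotations

import sys
from typing import NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse '<char> <left> <bottom> <right> <top> <page>' (origin bottom-left)."""
    char, *nums = line.split()
    left, bottom, right, top, page = map(int, nums)
    return Box(char, left, bottom, right, top, page)


def check_box(box: Box, width: int, height: int) -> list[str]:
    """Return a list of problems; an empty list means the box looks sane."""
    problems = []
    if not (0 <= box.left < box.right <= width):
        problems.append(f"{box.char}: x range {box.left}..{box.right} outside 0..{width}")
    if not (0 <= box.bottom < box.top <= height):
        problems.append(f"{box.char}: y range {box.bottom}..{box.top} outside 0..{height}")
    return problems


def to_top_left(box: Box, height: int) -> tuple[int, int, int, int]:
    """Convert to (left, top, right, bottom) measured from the top-left corner."""
    return box.left, height - box.top, box.right, height - box.bottom


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python check_boxes.py bil.pat.exp0.box WIDTH HEIGHT
    path, width, height = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                box = parse_box_line(line)
                print(box, to_top_left(box, height), check_box(box, width, height) or "ok")
```

Here `check_boxes.py` is just a placeholder file name. WIDTH and HEIGHT would be the real pixel size of bil.pat.exp0.tif.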

Then I ran:
tesseract bil.pat.exp0.tif bil.pat.exp0 box.train
which produced this output:
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
APPLY_BOXES:
   Boxes read from boxfile:       6
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,221)->(-734,277)
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,221)->(-488,277)
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,32)->(-734,88)
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,32)->(-488,88)
APPLY_BOXES: Unlabelled word at :Bounding box=(-373,32)->(-317,88)
   Found 6 good blobs.
   5 remaining unlabelled words deleted.
Generated training data for 6 words
I don't see how the coordinates can be negative. Despite this, I kept going.
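Box-file coordinates are non-negative pixel offsets, so the boxes in that log are presumably expressed in some other frame. The sketch below pulls the reported boxes out of the log so they can be compared with the box file. The helper names are made up. Checking the numbers by hand, the magnitudes of the x values (958, 734, 628, 488, 373, 317) all appear as y values in the box file. The y1 values (221, 32) appear as left x values. That pattern might mean the axes are swapped, but this is a guess, not a diagnosis.

```python
from __future__ import annotations

import re

# Matches e.g. "Bounding box=(-958,221)->(-734,277)" in APPLY_BOXES output.
BBOX_RE = re.compile(r"Bounding box=\((-?\d+),(-?\d+)\)->\((-?\d+),(-?\d+)\)")


def unlabelled_boxes(log: str) -> list[tuple[int, ...]]:
    """Extract every (x1, y1, x2, y2) reported as an 'Unlabelled word'."""
    return [
        tuple(int(g) for g in m.groups())
        for line in log.splitlines()
        if "Unlabelled word" in line
        for m in BBOX_RE.finditer(line)
    ]


def negative_boxes(boxes):
    """Boxes with any negative coordinate, which a box file cannot contain."""
    return [b for b in boxes if min(b) < 0]


SAMPLE = """\
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,221)->(-734,277)
APPLY_BOXES: Unlabelled word at :Bounding box=(-373,32)->(-317,88)
"""

for box in negative_boxes(unlabelled_boxes(SAMPLE)):
    print("negative coordinates:", box)
```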
My font_properties file is:
bil.pat.box 0 0 1 0 0
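For comparison, the Tesseract 3.x training documentation describes each font_properties line as <fontname> <italic> <bold> <fixed> <serif> <fraktur>. The <fontname> field is expected to match the font part of the training file name (the "pat" in bil.pat.exp0.tr), not the whole file name. The sketch below only derives such a line from a file name, and the helper names are hypothetical. Whether this is related to the warnings further down is not established here.

```python
import os


def font_name_from_training_file(path: str) -> str:
    """bil.pat.exp0.tr -> 'pat' (names follow <lang>.<fontname>.exp<N>.<ext>)."""
    parts = os.path.basename(path).split(".")
    if len(parts) < 4 or not parts[2].startswith("exp"):
        raise ValueError(f"not a <lang>.<font>.exp<N>.<ext> name: {path}")
    return parts[1]


def font_properties_line(path: str, italic: bool = False, bold: bool = False,
                         fixed: bool = False, serif: bool = False,
                         fraktur: bool = False) -> str:
    """Build '<fontname> <italic> <bold> <fixed> <serif> <fraktur>'."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([font_name_from_training_file(path)] + [str(int(f)) for f in flags])


print(font_properties_line("bil.pat.exp0.tr", fixed=True))  # pat 0 0 1 0 0
```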
and bil.words_list is:
a
b
c
d
e
f
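A word list is presumably only useful if every character in it is also a labelled character in the training data, since the dictionary is built against the unicharset. Here the labelled characters are the box labels 'a' to 'f'. The sketch below cross-checks the two files, and its helper names are hypothetical.

```python
from __future__ import annotations

import sys


def box_labels(box_text: str) -> list[str]:
    """Distinct labels (first field of each box line), in order of appearance."""
    labels: list[str] = []
    for line in box_text.splitlines():
        if line.strip():
            label = line.split()[0]
            if label not in labels:
                labels.append(label)
    return labels


def uncovered_words(words, labels):
    """Words containing at least one character never used as a box label.

    Multi-character labels are split into single characters, which is only
    an approximation for ligature-style labels.
    """
    known = set("".join(labels))
    return [w for w in words if not set(w) <= known]


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_words.py bil.pat.exp0.box bil.words_list
    with open(sys.argv[1], encoding="utf-8") as f:
        labels = box_labels(f.read())
    with open(sys.argv[2], encoding="utf-8") as f:
        words = [w.strip() for w in f if w.strip()]
    print("labels:", labels, "uncovered words:", uncovered_words(words, labels))
```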

Then I ran:
$ unicharset_extractor bil.pat.exp0.box
Extracting unicharset from bil.pat.exp0.box
Wrote unicharset file ./unicharset.
but the resulting unicharset file contains:
9
NULL 0 NULL 0
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # Joined [4a 6f 69 6e 65 64 ]
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0        # Broken
a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # a [61 ]
b 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # b [62 ]
c 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # c [63 ]
d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # d [64 ]
e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # e [65 ]
f 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # f [66 ]
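As far as I know, the count of 9 is expected. unicharset_extractor writes three reserved entries before the six real characters: the space (written as NULL), Joined, and |Broken|0|1. More notable is that every entry still carries the default metrics 0,255,0,255,0,0,0,0,0,0. These are the same 0,255 0,255 0,0 0,0 0,0 values that mftraining prints in its "Bad properties" lines. The sketch below parses a 3.04-style unicharset and lists those entries. The field layout in the comments is my reading of the format, and the helper names are hypothetical.

```python
from __future__ import annotations

import sys

# Reserved entries written before the real characters (the space is written as "NULL").
RESERVED = ("NULL", "Joined", "|Broken|0|1")
# Assumed layout: min_bottom,max_bottom,min_top,max_top,width,width_sd,
# bearing,bearing_sd,advance,advance_sd
DEFAULT_METRICS = [0, 255, 0, 255, 0, 0, 0, 0, 0, 0]


def parse_unicharset(text: str) -> tuple[int, list[tuple[str, list[float] | None]]]:
    """Return (declared count, [(unichar, metrics or None), ...])."""
    lines = [line for line in text.splitlines() if line.strip()]
    declared = int(lines[0])
    entries = []
    for line in lines[1:]:
        # Drop the trailing "# ..." comment (a literal '#' unichar would need more care).
        fields = line.split("#", 1)[0].split()
        metrics = None
        if len(fields) > 2 and "," in fields[2]:
            metrics = [float(x) for x in fields[2].split(",")]
        entries.append((fields[0], metrics))
    return declared, entries


def default_metric_chars(entries):
    """Unichars whose size/position metrics were never filled in."""
    return [char for char, metrics in entries if metrics == DEFAULT_METRICS]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        declared, entries = parse_unicharset(f.read())
    print(declared, "entries; default metrics for:", default_metric_chars(entries))
```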
Then I ran:
$ mftraining -F font_properties -U unicharset -O bil.unicharset bil.pat.exp0.tr
Read shape table shapetable of 0 shapes
Reading bil.pat.exp0.tr ...
Bad properties for index 3, char a: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char b: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char c: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char d: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char e: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char f: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Warning: no protos/configs for a in CreateIntTemplates()
Warning: no protos/configs for b in CreateIntTemplates()
Warning: no protos/configs for c in CreateIntTemplates()
Warning: no protos/configs for d in CreateIntTemplates()
Warning: no protos/configs for e in CreateIntTemplates()
Warning: no protos/configs for f in CreateIntTemplates()
Done!
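In that log, every class gets "no protos/configs", including the reserved Joined and |Broken|0|1 as well as a to f. I read this as no usable prototypes being built at all. "Bad properties", on the other hand, is reported only for a to f. The hypothetical helper below groups the characters by warning type, which may help when re-running with a changed setup. It expects the mftraining output saved to a file.

```python
from __future__ import annotations

import re
import sys

PATTERNS = {
    "bad_properties": re.compile(r"Bad properties for index \d+, char (\S+):"),
    "no_protos": re.compile(r"Warning: no protos/configs for (\S+) in CreateIntTemplates"),
}


def summarize_mftraining_log(log: str) -> dict[str, list[str]]:
    """Group the characters named in each kind of mftraining warning."""
    found: dict[str, list[str]] = {kind: [] for kind in PATTERNS}
    for line in log.splitlines():
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                found[kind].append(match.group(1))
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        for kind, chars in summarize_mftraining_log(f.read()).items():
            print(f"{kind}: {' '.join(chars) or '(none)'}")
```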
What am I doing wrong?
I am on Debian.
tesseract 3.04.00
 leptonica-1.72
  libgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.4.0) : libpng 1.2.50 : libtiff 4.0.5 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
Thank you very much in advance!

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8yB6MVF8Aptw0NUEtH_%3D_pk8kGzgxBDBPSRPvZJqEuP9w%40mail.gmail.com.