I have images of pre-printed forms that have been filled out by hand.  I am 
not trying to recognize the handwriting, just the "date" and "room name" 
that are printed on the form.  With that data I can link the image file to 
the database row, and a human can view the image and do whatever they need 
to do.

I am hoping to tell tesseract that anything not found in a 20-line text 
file should be treated as nothing. 

Here is what I have so far:

carl@twist:~/Documents/scans/tests$ tesseract s1-0.png test
Tesseract Open Source OCR Engine v3.02.01 with Leptonica

carl@twist:~/Documents/scans/tests$ grep -E "(Feb 02|H.1301)" test.txt 
[ ] Equipment problems [ ] Notes on back Feb 02 5""
H.1301 (cornil)

The second line should really be 
H.1301 (Cornil)


carl@twist:~/Documents/scans/tests$ head tessdata/foo.user-words 
Feb 01
Feb 02
Janson
H.1301 (Cornil)
K.1.105 (La Fontaine)
H.2215 (Ferrer)
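One thing I'm now wondering about: as far as I can tell, the dictionaries 
(user-words and the dawg files alike) are matched one word at a time, so 
multi-word entries like "Feb 02" or "H.1301 (Cornil)" may never match as 
whole phrases. This is an untested guess on my part, but splitting the 
list into single tokens seemed worth a try (the heredoc just recreates the 
same list as tessdata/foo.user-words above, so the snippet is 
self-contained):

```shell
# Recreate the phrase list, then split it into single tokens, on the
# (unverified) assumption that the dictionaries match word by word.
mkdir -p tessdata
cat > tessdata/foo.user-words <<'EOF'
Feb 01
Feb 02
Janson
H.1301 (Cornil)
K.1.105 (La Fontaine)
H.2215 (Ferrer)
EOF
# tr turns every space into a newline; sort -u removes duplicate tokens.
# The parentheses may need stripping too if tesseract treats punctuation
# separately from the word itself.
tr ' ' '\n' < tessdata/foo.user-words | sort -u > tessdata/eng.user-words
```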

> Put the wordlist in <lang>.user-words or recreate <lang>.word-dawg using 
wordlist2dawg.

I haven't been able to figure out how to do either of those, but I get the 
feeling that this is the wrong direction.
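For the record, here is the setup I was about to try for the user-words 
route. It is completely untested, and the <lang>.user-words naming 
convention and the load_system_dawg / load_freq_dawg config variables are 
just what I pieced together from the docs, so corrections are welcome:

```shell
# Untested sketch for 3.02. Assumptions: the word list must be named
# <lang>.user-words (eng.user-words here) and live next to
# eng.traineddata, and a config file can switch off the stock
# dictionaries so the short custom list carries more weight.
mkdir -p tessdata/configs
cat > tessdata/configs/forms <<'EOF'
load_system_dawg F
load_freq_dawg F
EOF
# Run with TESSDATA_PREFIX pointing at the directory that CONTAINS
# tessdata/, and name the config on the command line:
#   TESSDATA_PREFIX=$PWD/ tesseract s1-0.png test forms
# The wordlist2dawg route would instead be something like:
#   wordlist2dawg tessdata/eng.user-words eng.word-dawg eng.unicharset
# but that needs the exact unicharset the traineddata was built with,
# which is why the user-words route looks easier to me.
```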



-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
