I have images of pre-printed forms that have been filled out by hand. I am not trying to recognize the handwriting, just the "date" and "room name" that are printed on the form. With that data I can link the image file to the database row, and a human can view the image and do whatever they need to do.
I am hoping to tell Tesseract that anything not found in a 20-line text file is nothing. Here is what I have so far:

carl@twist:~/Documents/scans/tests$ tesseract s1-0.png test
Tesseract Open Source OCR Engine v3.02.01 with Leptonica
carl@twist:~/Documents/scans/tests$ grep -E "(Feb 02|H.1301)" test.txt
[ ] Equipment problems [ ] Notes on back
Feb 02 5"" H.1301 (cornil)

The second line should really be "H.1301 (Cornil)".

carl@twist:~/Documents/scans/tests$ head tessdata/foo.user-words
Feb 01
Feb 02
Janson
H.1301 (Cornil)
K.1.105 (La Fontaine)
H.2215 (Ferrer)

> Put the wordlist in <lang>.user-words or recreate <lang>.word-dawg using wordlist2dawg.

I haven't been able to figure out how to do either of those, but I get the feeling that is the wrong direction.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
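For what it's worth, here is a sketch of what I understand those two suggestions to mean, assuming Tesseract 3.02 with the English ("eng") data and that TESSDATA_PREFIX points at the directory containing tessdata/; the config file name onlywords.config is just something I made up:

```shell
# A sketch, assuming Tesseract 3.02 and the "eng" language pack.
# TESSDATA_PREFIX should point at the directory that contains tessdata/.

# Option 1: install the list as eng.user-words and disable the built-in
# dictionaries, so only the custom words get the dictionary bonus.
cp tessdata/foo.user-words "$TESSDATA_PREFIX/tessdata/eng.user-words"
printf 'load_system_dawg F\nload_freq_dawg F\nuser_words_suffix user-words\n' \
    > onlywords.config
tesseract s1-0.png test onlywords.config

# Option 2: compile the list into a DAWG with the training tool
# (wordlist2dawg needs the language's unicharset):
wordlist2dawg tessdata/foo.user-words eng.word-dawg \
    "$TESSDATA_PREFIX/tessdata/eng.unicharset"
```

Note that in 3.02 the dawgs live packed inside eng.traineddata, so option 2 would also mean unpacking and repacking with combine_tessdata, which is why option 1 looks like the simpler route. I believe Tesseract also ships a "bazaar" config that sets roughly the same variables as onlywords.config above.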

