Do you think training one character per file is affecting my results? I did it that way because I have thousands of samples, and makebox always produces too many wrong guesses. If all the digits were on the same image, manually fixing the resulting box file of 10,000 characters would take forever. Fixing a single-digit box file, on the other hand, only takes a simple regexp replace on the generated box file (one replace for digit 1, another for digit 2, and so on).
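The per-digit regexp fix described above can be sketched in Python. This is a minimal sketch, assuming the Tesseract 3.x box-file layout of one box per line, `<char> <left> <bottom> <right> <top> <page>` (the trailing page field is treated as optional). The function names `relabel_box_lines` and `relabel_box_file` are hypothetical. Because every sample in a single-digit image is the same digit, whatever character makebox guessed can simply be overwritten. Note that this only fixes labels: boxes that makebox split or merged incorrectly still need manual correction.

```python
import re

# Assumed Tesseract 3.x box line: "<char> <left> <bottom> <right> <top> <page>".
# The page field is optional so 5-field lines from older versions also match.
BOX_LINE = re.compile(r"^\S+((?:\s+\d+){4}(?:\s+\d+)?)\s*$")


def relabel_box_lines(lines, digit):
    """Replace the character field of every box line with `digit`.

    Coordinates (and the page field, if present) are kept unchanged.
    """
    if len(digit) != 1:
        raise ValueError(f"expected a single character, got {digit!r}")
    fixed = []
    for line in lines:
        match = BOX_LINE.match(line)
        if match is None:
            raise ValueError(f"unexpected box line: {line!r}")
        fixed.append(digit + match.group(1))
    return fixed


def relabel_box_file(path, digit):
    """Rewrite a single-digit box file in place (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    fixed = relabel_box_lines(lines, digit)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(fixed) + "\n")
```

For example, `relabel_box_file("digit7.box", "7")` (file name hypothetical) would relabel every box in the digit-7 sample file, replacing one regexp run per digit.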
Also, the goal of my application is online OCR: recognizing single lines of handwritten digits as the user draws them. Would this affect the format of my sample image(s) as well?

Thanks,
Fred

On Friday, February 28, 2014 10:58:05 PM UTC+8, Quan Nguyen wrote:
>
> I'm not sure having only samples of one character in a file is a good
> idea. I normally train with all the characters in the same image(s).
>
> Check
> http://code.google.com/p/tesseract-ocr/downloads/detail?name=boxtiff-2.01.eng.tar.gz
> for an example.
>
> On Tuesday, February 25, 2014 10:51:39 AM UTC-6, Frederico Ferro Schuh
> wrote:
>>
>> Hello all,
>>
>> I'm training Tesseract to recognize handwritten digits, and I have
>> provided it about 6000 samples of each digit, in 10 different box files,
>> one for each digit. Each box file is a 2152x2152 TIF file. However, the
>> resulting traineddata file I get after completing the training procedure
>> is only 137 kb.
>> I went through the process again, providing smaller sample files (1000
>> samples of each digit), and ended up with the same traineddata size of
>> 137 kb.
>> Is this size reasonable or am I doing something wrong?
>> I assume something is wrong because my results are pretty bad so far.
>>
>> I've attached the sample image I am using for the digit 0.
>>
>> Thanks in advance,
>> Fred
>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

