I think your training data should be more than one line. Create a page of text and see if that works.
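In case it helps, here is a minimal sketch of one way to build a page of multi-line training text before rendering it to an image. Everything in it is an assumption for illustration: the function name `make_training_page`, the sample word list, and the 40-line by 60-character page size are hypothetical and are not part of Tesseract or TesseractTrainer. It only produces the text; how that text is turned into a .tif is left to whatever tool you already use.

```python
import random
import textwrap


def make_training_page(words, lines=40, width=60, seed=0):
    """Return a multi-line page of text built from shuffled copies of `words`.

    Hypothetical helper: the line count and width are illustrative defaults,
    not values required by Tesseract.
    """
    if not words:
        raise ValueError("words must not be empty")
    rng = random.Random(seed)  # fixed seed so the page is reproducible
    pool = []
    # Keep appending shuffled copies of the word list until there is
    # enough text to fill at least `lines` wrapped lines.
    while sum(len(w) + 1 for w in pool) < (lines + 1) * (width + 1):
        batch = list(words)
        rng.shuffle(batch)
        pool.extend(batch)
    # Wrap the text into lines no longer than `width` characters.
    wrapped = textwrap.wrap(" ".join(pool), width=width)
    return "\n".join(wrapped[:lines])


# Sample words chosen only for illustration.
sample_words = ["The", "quick", "brown", "fox", "jumps", "over",
                "the", "lazy", "dog", "0123456789"]
page = make_training_page(sample_words)
```

The result is a single string with many lines rather than one, which can then be saved and used as the input training text.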
Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Mon, Oct 28, 2013 at 7:59 AM, Jonathan Nikkel <[email protected]> wrote:

> Hey there,
>
> I am a Tesseract novice, and would like to solicit some help/advice from
> you smart folks. I will preface by saying that I have read the FAQ,
> searched this forum extensively, read all of the topics, and tried all the
> suggestions/advice I found, with no luck so far. This is probably not a
> difficult one, I assume I must be missing something stupid, but hey, that
> is why we have forums like these =).
>
> What I am using:
> Windows 7 box
> Tesseract v3.02
> TesseractTrainer (auto-generated .tif's based on input training text,
> automates the training process)
>
> I am able to successfully train the off-the-shelf arial training data
> included with the Tesseract dev files.
>
> I am now trying to train a custom data set with the Arial font (no mods,
> standard installed with Windows) using this setup to make sure I understand
> this training process/code, and am setting things up correctly, before
> moving on to more complex fonts.
>
> I am getting 100% failures in blob recognition/box resegmentation, and am
> puzzled as to why. I have tried numerous combinations of character
> spacing, line spacing, font size, image bit depth (I am now using a binary
> image), and DPI (using 300 dpi, 3600x3600 now, to be consistent with the
> example trainings), and am trying to home in on a font size that
> achieves an xheight of 25 pixels. I have checked the box file accuracy
> using cowboxer, and the boxes appear to be accurate.
>
> Attached are some example files; I have tried alternative character
> spacings from nearly touching, up to about double what you see here. I
> have tried all of the pageseg modes, using the {prefix}.tif {prefix} nobatch
> box.train parameters. Pageseg mode 4 crashes, the rest generate 100%
> resegmentation errors.
>
> Where am I going wrong? Anyone have a working example setup with
> TesseractTraining they can share?
>
> Regards,
>
> -Jon
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

