Images that look readable to human eyes may not be readable to computers. Therefore, image processing is most likely required prior to the OCR step.
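As a rough illustration of what such preprocessing might look like, here is a minimal sketch that uses only the standard JDK imaging classes. The class name `OcrPreprocess`, the method names, the scale factor, and the threshold are all hypothetical choices for this example, not anything prescribed by Tesseract or Tess4J. Sensible values depend on your scans.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper class: names and parameter values are assumptions for illustration.
public class OcrPreprocess {

    /** Scales the image by the given factor and converts it to 8-bit grayscale. */
    static BufferedImage toScaledGray(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        // Bicubic interpolation keeps glyph edges smoother when enlarging small text.
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return gray;
    }

    /**
     * Simple global threshold: samples below the threshold become black,
     * all others become white. Expects a TYPE_BYTE_GRAY image (band 0 = gray level).
     */
    static BufferedImage binarize(BufferedImage gray, int threshold) {
        int w = gray.getWidth();
        int h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int level = gray.getRaster().getSample(x, y, 0);
                out.setRGB(x, y, level < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

One might call `OcrPreprocess.binarize(OcrPreprocess.toScaledGray(image, 2.0), 128)` on an image loaded with `ImageIO.read` and then pass the result to the OCR engine instead of the raw file. A fixed global threshold is the simplest option. Unevenly lit scans usually need something adaptive, so treat this only as a starting point.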
Sure, you can use jTessBoxEditor to train Tesseract for your language. The generated .traineddata file will be placed in a tessdata folder, and you can use the *Validate* function to verify the resulting data.

On Thursday, June 9, 2016 at 4:23:07 AM UTC-5, Rafał Błaczkowski wrote:
>
> Hello All!!
>
> I have a big problem with tesseract-ocr.
> I downloaded the example of use tesseract from the official page
> (net.sourceforge.tess4j.example) just for test how it works.
> I downloaded too, almost all tessdata files (dunno what is the difference
> between these files) and run the java script (using net.sourceforge.tess4j).
> I put very simple and easy tiff file for test, and results have not been
> so well. Some words have been recognized correctly, but the rest've been
> recognized like: BEST instead of DEST, DEF instead of DEP, etc.
>
> I understand, that I should train my script how to recognize my picture
> (font, size, etc). But I dunno how to deal with it! Is there any
> documentation about these problem?
> I know that some files should be put in tessdata directory, but how to
> create them?
>
> I downloaded also jTessBoxEditor, put some demo image with my text,
> trained something in Trainer tab, but after training nothing have been
> done...
>
> Can somebody help me or tell me how to solve my problems??
>
> Many thanks for considering my request!

