Take a look at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 for an overview of training for 4.0. Follow the tutorials to get a feel for the training process. You can try it for English as well as Malayalam.
As for a trainer GUI, I think it will probably work for `fine tune` training. One area where you could contribute to 4.0 training is creating box files in the 4.0 format from scanned images. Also look at jTessBoxEditor, which offers a Tesseract training GUI, though not for 4.0.

ShreeDevi

On Sat, Jun 24, 2017 at 7:25 PM, Nalin Linux <[email protected]> wrote:
> On Saturday, June 24, 2017 at 7:07:32 PM UTC+5:30, shree wrote:
>> You can update it for 3.05.01
>
> I am quite impressed with Tesseract 4.0, and it is working fine for my
> language (Malayalam). Is the trained data for version 4.0 listed in
> https://github.com/tesseract-ocr/tessdata created from the old language
> data itself (https://github.com/tesseract-ocr/langdata)? What about
> creating a training GUI for version 4.0? I have two months at my
> disposal for developing such a GUI. Please let me know whether this
> project is relevant, or else let me switch to another relevant free and
> open source project.
>
> Thanking you,
> Nalin.

