Please try VietOCR, a GUI frontend for Tesseract OCR, available from http://vietocr.sourceforge.net/. It uses a newer version of Tesseract.
You can also try the Bengali traineddata available on the Tesseract site:
https://code.google.com/p/tesseract-ocr/source/browse/ben.traineddata?repo=tessdata
or
https://github.com/tesseract-ocr/tessdata/blob/master/ben.traineddata

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Sat, May 16, 2015 at 1:37 AM, Tawfiq Chowdhury <[email protected]> wrote:

> I am developing a traineddata file for the Bengali language. The problem is
> that Tesseract does not recognize most spaces in the input and runs almost
> all the characters of the input image together into one long word instead
> of several words and sentences. This is with a large traineddata, where it
> detects some spaces; with a small traineddata, it detects none. I made an
> English traineddata with only the 26 letters of the English alphabet to
> test whether Tesseract detects spacing for it, and it does for English but
> not for Bengali. I am using the 3.02.02 Windows installer. Please tell me
> where to edit the configuration to make it work. Here are some Bengali
> characters as an example:
>
> আ মা দে র দে শে র না ম বা লা দে শ
>
> The input text in an image file can look like this: আমাদের দেশের নাম বালাদেশ
>
> However, Tesseract generates output like this: আমাদেরদেশেরনামবালাদেশ
>
> I am doing my thesis on this and need help urgently. Thanks in advance.
> Is there any version of 3.03 or 3.04 for Windows? I heard there is a
> 3.03 beta version.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/520ee839-2152-47be-a9b0-7e651db9a2a0%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVFuEpFTz9vrJgsAwT-i1VMg8-Y-MnTGukNgP%2BD7wP34w%40mail.gmail.com
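
To try the suggested ben.traineddata and check whether word spacing improves, something like the following Python sketch may help. It is not from the thread and rests on several assumptions: the `tesseract` executable is on the PATH, `ben.traineddata` has been copied into the tessdata directory, and the helper names (`build_tesseract_command`, `run_ocr`, `count_words`) and the file names in the comments are hypothetical. It builds the standard `tesseract imagename outputbase -l lang` command and compares word counts between the expected text and the OCR output, using the example strings from the quoted message.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="ben"):
    """Build the command line: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="ben"):
    """Run Tesseract and return the recognized text.

    Assumes the tesseract executable is installed and on the PATH.
    Tesseract writes its result to <output_base>.txt.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()


def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


# Example strings taken from the quoted message.
expected_text = "আমাদের দেশের নাম বালাদেশ"
observed_text = "আমাদেরদেশেরনামবালাদেশ"

print(count_words(expected_text))  # 4 words in the input text
print(count_words(observed_text))  # 1 word: every space was lost

# Hypothetical usage once Tesseract and ben.traineddata are installed:
# text = run_ocr("sample.png", "sample_out")
# print(count_words(text))
```

If the word count from `run_ocr` with the stock ben.traineddata matches the expected count while a custom traineddata still yields one long word, that would suggest the spacing problem lies in the custom training data rather than in the Tesseract configuration.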

