Please try VietOCR, a GUI frontend for Tesseract OCR, available from
http://vietocr.sourceforge.net/
It uses a newer version of Tesseract than the 3.02.02 installer.

You can also try the Bengali traineddata available on the Tesseract site:
https://code.google.com/p/tesseract-ocr/source/browse/ben.traineddata?repo=tessdata

or

https://github.com/tesseract-ocr/tessdata/blob/master/ben.traineddata
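
Once ben.traineddata is in your tessdata folder, a quick way to check
whether the stock Bengali data keeps the spaces is to run the tesseract
command on your image with "-l ben". Below is a minimal Python sketch of
that. The helper names and the file names page.png and out are only
placeholders of mine, and it assumes the tesseract executable is on your
PATH and can find its tessdata folder.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="ben"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="ben"):
    """Run tesseract and return the recognized text.

    Assumes tesseract is on PATH and <lang>.traineddata is installed.
    Tesseract writes its result to <output_base>.txt.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # page.png is a placeholder for your own test image.
    text = run_ocr("page.png", "out")
    # Counting whitespace-separated tokens shows whether spaces survived.
    print(len(text.split()), "words recognized")
```

If the word count is close to what you expect for the page, the spacing
problem is more likely in your own training data than in the configuration.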

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Sat, May 16, 2015 at 1:37 AM, Tawfiq Chowdhury <[email protected]>
wrote:

> I am developing traineddata for the Bengali language. The problem is that
> Tesseract does not recognize most spaces in the input file and keeps almost
> all the characters of the input image together, producing one long word
> instead of several words and sentences. This happens with a large
> traineddata, where it detects some spaces; with a small traineddata, it
> detects none. I made an English traineddata with only the 26 letters of the
> English alphabet to test whether Tesseract detects spacing for it, and it
> does for English but not for Bengali. I am using the 3.02.02 Windows
> installer. Please tell me where to edit the configuration to make it work.
> Here are some Bengali characters as an example:
>
> আ মা দে র দে শে র না ম বা লা দে শ
>
> The input text in an image file can look like this: আমাদের দেশের নাম বালাদেশ
>
> However, Tesseract generates output like this: আমাদেরদেশেরনামবালাদেশ
>
> I am doing my thesis on this and need help urgently. Thanks in advance.
> Is there a 3.03 or 3.04 build for Windows? I heard there is a 3.03 beta
> version.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/520ee839-2152-47be-a9b0-7e651db9a2a0%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/520ee839-2152-47be-a9b0-7e651db9a2a0%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

