Hi,
I only started playing around with Tesseract an hour ago, and I tried 
Bengali first too. I have not actually got it working yet, 
but here is what I think I know:
1. By default tesseract recognizes English/Latin text. Run 
`tesseract --list-langs` to see which languages are installed.
I get three on a fresh install from apt-get on Ubuntu 14.04:
    $ tesseract --list-langs
    List of available languages (3):
    eng
    osd
    equ

This makes sense, because the default `tessdata` directory contains exactly 
those traineddata files:
    $ ls /usr/share/tesseract-ocr/tessdata/ | grep traineddata$
    eng.traineddata
    equ.traineddata
    osd.traineddata
2. Clone the tessdata repository from GitHub 
(https://github.com/tesseract-ocr/tessdata).
3. Run tesseract with "-l ben" against the cloned tessdata directory 
($NEWTESSDATA here). As a first check I only listed the languages:
    $ tesseract --list-langs --tessdata-dir $NEWTESSDATA

But even this crashes with the message:
    actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53
    Segmentation fault (core dumped)

I also tried keeping only the one file ben.traineddata in the 
$NEWTESSDATA folder, but so far I have not figured out how the arguments 
are meant to be used.
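
In case it helps, here is a minimal sketch for checking which languages a 
tessdata folder offers without invoking tesseract at all, so it cannot hit 
the crash above. It only looks at file names, the same way the `ls | grep` 
in step 1 does. The name `list_traineddata_langs` is made up for this 
example and is not a tesseract command, and the result says nothing about 
whether the installed tesseract version can actually load those files.

```shell
# Hypothetical helper (not part of tesseract): print one language code per
# *.traineddata file found directly inside the given directory.
list_traineddata_langs() {
    tessdata_dir="$1"
    for traineddata_file in "$tessdata_dir"/*.traineddata; do
        # When nothing matches, the glob stays unexpanded; skip that case.
        [ -e "$traineddata_file" ] || continue
        # Strip the directory and the .traineddata suffix: ben.traineddata -> ben
        basename "$traineddata_file" .traineddata
    done
}
```

Calling `list_traineddata_langs "$NEWTESSDATA"` should include `ben` if 
ben.traineddata is in that folder. The recognition call would then 
presumably be `tesseract --tessdata-dir "$NEWTESSDATA" ben.png out -l ben` 
(`out` is just a placeholder output name), but I have not seen that succeed yet.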


On Tuesday, 8 November 2016 07:38:20 UTC+1, rkvsraman wrote:
>
> Hello, 
>
>
> I tried to detect the script of the above Bengali image with the command
>
> tesseract ben.png bensc -  -psm 0
>
>
> and I get the following output in bensc.osd, which detects the script as 
> Latin.
>
>
> Page number: 0
> Orientation in degrees: 90
> Rotate: 270
> Orientation confidence: 1.48
> Script: Latin
> Script confidence: 2.35
>
>
> What do I need to do to make it detect the script as Bengali?
>
> Thanks.
>
> -Raman
>
