Hi,
I just started playing around with Tesseract an hour ago, and I also tried
Bengali first. I have not actually got it working yet,
but here is what I think I know:
1. By default, Tesseract looks for English/Latin characters. Use
`tesseract --list-langs` to list the languages that are available by default.
I get 3 on a fresh install from apt-get on Ubuntu 14.04:
$ tesseract --list-langs
List of available languages (3):
eng
osd
equ
This makes sense, because the default `tessdata` directory contains exactly
those traineddata files:
$ ls /usr/share/tesseract-ocr/tessdata/ | grep traineddata$
eng.traineddata
equ.traineddata
osd.traineddata
2. Clone the tessdata repository from GitHub
(https://github.com/tesseract-ocr/tessdata).
3. Run tesseract with "-l ben", pointing it at the cloned tessdata directory.
I first tried just listing the languages there:
$ tesseract --list-langs --tessdata-dir $NEWTESSDATA
but even this crashes with the message
actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53
Segmentation fault (core dumped)
I also tried keeping only the one file ben.traineddata in the
$NEWTESSDATA folder, but so far I have not figured out how the arguments are
meant to be used.
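For what it is worth, here is a small sketch of the check I was doing by hand with ls and grep above: it prints which language codes a given tessdata directory offers, judging only by the *.traineddata file names. The names `langs_in_dir` and `has_lang` are made up for this example and are not part of tesseract. It only looks at file names, so it cannot tell whether the installed tesseract binary is actually able to load a given file.

```shell
#!/bin/sh
# Hypothetical helpers (not part of tesseract): inspect a tessdata directory
# by file name only, without invoking tesseract at all.

# Print one language code per line for every *.traineddata file in $1.
langs_in_dir() {
    dir=$1
    for f in "$dir"/*.traineddata; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -e "$f" ] || continue
        basename "$f" .traineddata
    done
}

# Succeed if directory $1 contains a traineddata file for language $2.
has_lang() {
    [ -f "$1/$2.traineddata" ]
}
```

Assuming $NEWTESSDATA is the folder holding the cloned (or copied) files, `langs_in_dir "$NEWTESSDATA"` should list `ben` among the codes, and `has_lang "$NEWTESSDATA" ben && tesseract --list-langs --tessdata-dir "$NEWTESSDATA"` only calls tesseract when the file is really there.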
On Tuesday, 8 November 2016 07:38:20 UTC+1, rkvsraman wrote:
>
> Hello,
>
>
> I tried to detect the script of the above Bengali image with the command
>
> tesseract ben.png bensc - -psm 0
>
>
> and I get the following output in bensc.osd, which detects the script as
> Latin.
>
>
> Page number: 0
> Orientation in degrees: 90
> Rotate: 270
> Orientation confidence: 1.48
> Script: Latin
> Script confidence: 2.35
>
>
> What do I need to do to make it detect the script as Bengali?
>
> Thanks.
>
> -Raman
>