Re: [tesseract-ocr] add new characters

shree Tue, 27 Oct 2020 19:06:24 -0700

Did you copy the traineddata file to /usr/share/tesseract-ocr/4.00/tessdata?
What's the value of TESSDATA_PREFIX in your 'env' output?
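Both questions matter because Tesseract looks for `<lang>.traineddata` in a single resolved tessdata directory. The sketch below approximates that lookup for Tesseract 4.x on Linux. The lookup order is an assumption based on my reading of the 4.x sources, not something stated in this thread: `--tessdata-dir` wins, then `TESSDATA_PREFIX`, then the compiled-in default, which is assumed here to be the Ubuntu package path from the error message. `resolve_tessdata_dir` and `check_traineddata` are hypothetical helpers, not Tesseract tools.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Tesseract). As an assumption, they
# approximate how Tesseract 4.x on Linux picks its tessdata directory:
#   1. --tessdata-dir, passed here as an optional argument
#   2. the TESSDATA_PREFIX environment variable
#   3. the compiled-in default, assumed to be the Ubuntu package path below
DEFAULT_TESSDATA=/usr/share/tesseract-ocr/4.00/tessdata

# resolve_tessdata_dir [DIR]: print the single directory that would be searched.
resolve_tessdata_dir() {
  if [ -n "${1-}" ]; then
    printf '%s\n' "$1"
  elif [ -n "${TESSDATA_PREFIX-}" ]; then
    printf '%s\n' "$TESSDATA_PREFIX"
  else
    printf '%s\n' "$DEFAULT_TESSDATA"
  fi
}

# check_traineddata LANG [DIR]: print the model path and return 0 if it is
# readable there; otherwise report the path that was tried and return 1.
check_traineddata() {
  lang=$1
  dir=$(resolve_tessdata_dir "${2-}")
  path="${dir%/}/$lang.traineddata"   # strip one trailing slash before joining
  if [ -r "$path" ]; then
    printf '%s\n' "$path"
    return 0
  fi
  printf 'not readable: %s\n' "$path" >&2
  return 1
}
```

For example, `check_traineddata Sanskrit-1017-fast` prints the full path if the file is readable where Tesseract should look, and otherwise reports the path it tried. If this assumed order is right, a wrong `TESSDATA_PREFIX` is not rescued by the default directory. The language name passed to `-l` is case-sensitive and must match the file name without the `.traineddata` extension.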


What's the output of the following commands?

ls -l /usr/share/tesseract-ocr/4.00/tessdata/Sanskrit-1017-fast.traineddata

combine_tessdata -d /usr/share/tesseract-ocr/4.00/tessdata/Sanskrit-1017-fast.traineddata

tesseract --list-langs --tessdata-dir /usr/share/tesseract-ocr/4.00/tessdata

tesseract --list-langs

tesseract -v
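To collect all of the above in one go, a small wrapper can run each command, label it, and keep going when one fails, so the whole transcript can be pasted into a reply. This is only a sketch: `run_step` and `run_diagnostics` are hypothetical names, and the directory and model name are simply the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical helpers: run the diagnostic commands requested above and
# label each one so the whole transcript can be pasted into a reply.

# run_step CMD [ARGS...]: print the command, run it with stderr merged into
# stdout, and report a non-zero exit status instead of aborting.
run_step() {
  printf '\n$ %s\n' "$*"
  "$@" 2>&1 || printf '[exit status %s]\n' "$?"
}

# run_diagnostics TESSDATA_DIR MODEL: MODEL is the language name without
# the .traineddata extension, e.g. Sanskrit-1017-fast.
run_diagnostics() {
  dir=$1
  model=$2
  run_step ls -l "$dir/$model.traineddata"
  run_step combine_tessdata -d "$dir/$model.traineddata"
  run_step tesseract --list-langs --tessdata-dir "$dir"
  run_step tesseract --list-langs
  run_step tesseract -v
  # Also show the environment variable asked about above.
  printf '\nTESSDATA_PREFIX=%s\n' "${TESSDATA_PREFIX-<unset>}"
}
```

Usage: `run_diagnostics /usr/share/tesseract-ocr/4.00/tessdata Sanskrit-1017-fast > tesseract-report.txt`. A failing command is reported rather than stopping the script, because the failure itself (for example a missing file) is usually the useful part of the report.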


On Wednesday, October 28, 2020 at 3:04:01 AM UTC+5:30 Timo Struppi wrote:

> Help! I get the following error. What am I doing wrong?
>
> Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/Sanskrit-1017-fast.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
> Failed loading language 'Sanskrit-1017-fast'
> Tesseract couldn't load any languages!
> Could not initialize tesseract.
>
> On Saturday, October 24, 2020 at 5:53:55 PM UTC+2 Timo Struppi wrote:
>
>> *Perfect!* Thank you very much <3 That's what I was looking for:
>> International Alphabet of Sanskrit Transliteration (IAST) characters.
>>
>> Can you tell me which folder I must place the .traineddata file in?
>>
>> My configuration:
>> tesseract 4.1.1
>>  leptonica-1.79.0
>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
>>  Found AVX
>>  Found SSE
>>  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
>>
>> Many thanks again for your quick help.
>>
>> On Saturday, October 24, 2020 at 3:12:15 PM UTC+2 shree wrote:
>>
>>> Ray has suggested using plus-minus training to add a couple of
>>> characters to the traineddata. Did you try that?
>>>
>>> Please share the training data you used (box/tiff pairs or lstmf files).
>>>
>>> I have done replace-a-layer training for Sanskrit. It adds the two
>>> characters you want (in addition to many others required for Sanskrit
>>> transliteration). See the sample image and attached output. The file is
>>> available at
>>> https://github.com/Shreeshrii/tess5training-sanskrit-iast/tree/main/tessdata/fast
>>>
>>>  
>>>
>>> On Sat, Oct 24, 2020 at 5:31 PM Timo Struppi <[email protected]> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> I don't want to reinvent the wheel by creating a new language, but how
>>>> do I add the letters ṛ and ī to the OCR?
>>>>
>>>> I have tried a lot for several days (VietOCR, Linux Intelligent OCR
>>>> Solution, following the few available tutorials, etc.), but I still
>>>> haven't managed to add a single letter.
>>>>
>>>>
>>>> Many thanks in advance
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/f23a9be3-dea4-46a6-8e21-dbe9c120d993n%40googlegroups.com
>>>>
>>>
>>>
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/16ae9c7d-74e9-4d76-b998-e004d3540312n%40googlegroups.com.
