From readme.md of langdata:

To re-create the training of a single language, lang, you need the
following:

All the data in the lang directory.
san/*.*

The corresponding unicharset/xheights files for the script(s) used by lang.
Devanagari.*

All the remaining non-lang-specific files in the top-level directory, such
as font_properties.

You also need to obtain the fonts needed to train the language. Some
languages were trained with commercially available fonts, so you will need
to buy them in order to reproduce the training exactly, or use substitutes.
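
A quick way to check a local langdata copy against the list above is a small
script along these lines. This is only a sketch: the function name
missing_training_inputs is made up for illustration, it assumes the layout
described in the readme (a lang directory, <Script>.unicharset and
<Script>.xheights, and font_properties at the top level), and it only tests
that the entries exist. It says nothing about fonts or file contents.

```python
from pathlib import Path


def missing_training_inputs(langdata_dir, lang, script):
    """Return the expected langdata entries that are absent or empty.

    Hypothetical helper: the layout checked here is the one described in
    the langdata readme, not something Tesseract itself enforces.
    """
    root = Path(langdata_dir)
    expected = [
        root / lang,                    # e.g. hin/ or san/
        root / f"{script}.unicharset",  # e.g. Devanagari.unicharset
        root / f"{script}.xheights",    # e.g. Devanagari.xheights
        root / "font_properties",       # shared, non-lang-specific file
    ]
    missing = [str(p.relative_to(root)) for p in expected if not p.exists()]

    # A language directory with nothing in it is reported as well.
    lang_dir = root / lang
    if lang_dir.is_dir() and not any(lang_dir.iterdir()):
        missing.append(f"{lang}/ (empty)")
    return missing
```

For the setup in this thread the call would look something like
missing_training_inputs("/home/raman/Desktop/lang", "hin", "Devanagari");
an empty list means all four entries were found.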

On Sep 22, 2016 6:30 PM, "rkvsraman" <rkvsra...@gmail.com> wrote:

> Let me try with Devanagari.* files
>
> Thanks
>
> -Raman
>
> On Thursday, September 22, 2016 at 6:19:13 PM UTC+5:30, shree wrote:
>>
>> Warning: properties incomplete for index 93 = प्र
>> Warning: properties incomplete for index 94 = क्रि
>> Warning: properties incomplete for index 95 = २
>> Warning: properties incomplete for index 96 = ५
>>
>>
>> These warnings should be eliminated or reduced if your langdata has
>> Devanagari.unicharset and Devanagari.xheights from
>> https://github.com/tesseract-ocr/langdata
>>
>>
>> I might have run the training without shapeclustering - please see the zip 
>> file with training log.
>>
>>
>> Also see 
>> http://stackoverflow.com/questions/34389159/tesseract-index-0-index-size-used-errorassert-failed-error
>>
>>
>>
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Thu, Sep 22, 2016 at 5:14 PM, rkvsraman <rkvs...@gmail.com> wrote:
>>
>>>
>>> Hello,
>>>
>>> I am running the shapeclustering command and it crashes with the
>>> following message.
>>>
>>> /usr/local/bin/shapeclustering -D /tmp/tmp.0fGj1mVg2C/hin/ \
>>>   -U /tmp/tmp.0fGj1mVg2C/hin/hin.unicharset \
>>>   -O /tmp/tmp.0fGj1mVg2C/hin/hin.mfunicharset \
>>>   -F /home/raman/Desktop/lang/font_properties \
>>>   /tmp/tmp.0fGj1mVg2C/hin/hin.Noto_Sans_Devanagari.exp0.tr
>>> Reading /tmp/tmp.0fGj1mVg2C/hin/hin.Noto_Sans_Devanagari.exp0.tr ...
>>> Reading spacing from /tmp/tmp.0fGj1mVg2C/hin/hin.Noto_Sans_Devanagari.exp0.fontinfo for font 4293...
>>> shapeclustering: ../ccutil/genericvector.h:663: T& GenericVector<T>::operator[](int) const [with T = int]: Assertion `index >= 0 && index < size_used_' failed.
>>> Aborted (core dumped)
>>>
>>> Attached is tesstrain.log
>>>
>>> Any idea why?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/c1ef13db-90e8-4a13-8ade-2986e7a8df10%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>