Re: [tesseract-ocr] Re: Adding new language to Tesseract?

ShreeDevi Kumar Mon, 03 Nov 2014 18:06:07 -0800

Thanks for clarifying and giving more details.

I am cc:ing this email to the tesseract developers group and Ray for an answer
to your question, "how to submit this file to Tesseract's repository?"


Meanwhile, I suggest that you add an 'issue' and attach the traineddata.

Thanks!

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 4, 2014 at 1:08 AM, Puramoca021 <[email protected]> wrote:

> Hi Devi,
>
> Unfortunately, you are slightly misinformed as well.
>
> The file with trained data for the Serbian language that is currently in
> Tesseract's repository contains LATIN characters.
> What I made is trained data that recognizes *Serbian Cyrillic*
> characters.
>
> A good summary and explanation of what *Serbian Cyrillic* is can be found
> here <http://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet> (Wikipedia
> article). Please pay attention to the section *"Modern alphabet"* in that
> article.
> What the current version of Tesseract's *srp.traineddata* can recognize are
> the letters in the column labelled "*Latin*" (see the Wikipedia article).
> I would like to submit a file with trained data that will make Tesseract
> recognize the letters in the column labelled "*Cyrillic*" (again, see the
> Wikipedia article).
>
> Again, I did not get a clear answer to my question - how to submit this
> file to Tesseract's repository?
>
> Shall I *assume* that I need to open an issue and submit trained data
> there? Please clarify.
>
>
> Regards,
> Zoltan
>
>
> On Monday, November 3, 2014 at 19:45:38 UTC+1, shree wrote:
>>
>> There already is language data for srp - please see
>>
>> https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata
>>
>> and
>>
>> https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata
>>
>> Ray Smith, the lead developer of Tesseract at Google, is planning to
>> release updated versions of the traineddata soon as part of the 3.04 release.
>>
>> If your traineddata has something additional that is not in the
>> existing set, then please attach it to an issue so that it can be
>> tested.
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Tue, Nov 4, 2014 at 12:02 AM, Puramoca021 <[email protected]> wrote:
>>
>>>
>>> On Sunday, November 2, 2014 4:45:32 PM UTC+1, Vladimir Radnovic wrote:
>>>>
>>>> Hi Zoltan,
>>>> What do you need new traineddata for? There are several ways to do the
>>>> training, so if you need help, get in touch.
>>>>
>>>> You have several ways to train data... what do you need it for?
>>>> Regards,
>>>> Vladimir
>>>>
>>>>
>>> Hi Vladimir,
>>>
>>> I am afraid you did not understand me ... I think I was not clear enough:
>>>
>>> - I *do not need* new traindata. I *made new traindata for Serbian
>>> Cyrillic myself* and I would like to offer this train data to all
>>> Tesseract users that need to OCR text printed in Serbian Cyrillic.
>>>
>>> My question is: How do I send this file (srp.traineddata) to you,
>>> Tesseract developers and maintainers?
>>>
>>> By zipping it and sending via email?
>>> By uploading to a file sharing service? If so, which one?
>>> By making a torrent out of it?
>>>
>>> Please advise
>>>
>>> Regards,
>>> Zoltan
>>>
>>>
>>>
>>>> On Saturday, 1 November 2014 21:12:04 UTC+1, Puramoca021 wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have trained the unreleased Tesseract 3.04 (available only in the
>>>>> Subversion repository) to recognize Serbian Cyrillic. I strictly
>>>>> followed the instructions for training Tesseract 3
>>>>> <https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>:
>>>>> I used the script *tesstrain.sh* and provided the required files.
>>>>>
>>>>> My question is: what is the procedure for submitting new trained data
>>>>> so that it is available in the upcoming version of Tesseract?
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Zoltan
>>>>>
>>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/0362254d-260d-49fa-af8b-c098b50811f0%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

