[tesseract-ocr] Re: Contribution : Serbian Cyrillic traineddata file

ShreeDevi Kumar Mon, 03 Nov 2014 19:55:45 -0800

* Changed subject to *Serbian Cyrillic*

* Please note that issues allow attachments only up to 10 MB. So, if the
zipped traineddata file is larger than that, please host it elsewhere
(e.g. GitHub) and provide a link. Ray/Jeff/Zdenko, please correct me if that
is not the case.



ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 4, 2014 at 7:35 AM, ShreeDevi Kumar <[email protected]>
wrote:

> Thanks for clarifying and giving more details.
>
> I am cc'ing this email to the tesseract developers group and Ray for an
> answer to your question: "how to submit this file to Tesseract's
> repository?"
>
> Meanwhile, I suggest that you add an 'issue' and attach the traineddata.
>
> Thanks!
>
> ShreeDevi
>
> On Tue, Nov 4, 2014 at 1:08 AM, Puramoca021 <[email protected]> wrote:
>
>> Hi Devi,
>>
>> Unfortunately, you are slightly misinformed as well.
>>
>> The trained data file for the Serbian language that is currently in
>> Tesseract's repository contains LATIN characters.
>> What I made is a set of trained data that recognizes *Serbian Cyrillic*
>> characters.
>>
>> A good summary and explanation of what *Serbian Cyrillic* is can be found
>> here <http://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet> (Wikipedia
>> article). Please pay attention to the section *"Modern alphabet"* in the
>> Wikipedia article.
>> What the current version of Tesseract's *srp.traineddata* can recognize are
>> the letters in the column labelled "*Latin*" (see the Wikipedia article).
>> I would like to submit a trained data file that will make Tesseract
>> recognize the letters in the column "*Cyrillic*" (again, see the Wikipedia
>> article).
>>
>> Again, I did not get a clear answer to my question - how to submit this
>> file to Tesseract's repository?
>>
>> Shall I *assume* that I need to open an issue and submit trained data
>> there? Please clarify.
>>
>>
>> Regards,
>> Zoltan
>>
>>
>> On Monday, 03 November 2014 19:45:38 UTC+1, shree wrote:
>>>
>>> There already is language data for srp - please see
>>>
>>> https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata
>>>
>>> and
>>>
>>> https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata
>>>
>>> Ray Smith, the lead developer of tesseract at Google, is planning to
>>> release updated versions of traineddata soon as part of the 3.04 release.
>>>
>>> If your traineddata has something additional that is not in the
>>> existing set, then please add it as an attachment to an issue so that it
>>> can be tested.
>>>
>>> ShreeDevi
>>>
>>> On Tue, Nov 4, 2014 at 12:02 AM, Puramoca021 <[email protected]> wrote:
>>>
>>>>
>>>> On Sunday, November 2, 2014 4:45:32 PM UTC+1, Vladimir Radnovic wrote:
>>>>>
>>>>> Hi Zoltan,
>>>>> What do you need new traineddata for? There are several ways to do the
>>>>> training, so if you need help, get in touch.
>>>>>
>>>>> Regards,
>>>>> Vladimir
>>>>>
>>>>>
>>>> Hi Vladimir,
>>>>
>>>> I am afraid you did not understand me ... I think I was not clear
>>>> enough:
>>>>
>>>> - I *do not need* new traineddata. I *made new traineddata for Serbian
>>>> Cyrillic myself* and I would like to offer this trained data to all
>>>> Tesseract users who need to OCR text printed in Serbian Cyrillic.
>>>>
>>>> My question is: How do I send this file (srp.traineddata) to you,
>>>> Tesseract developers and maintainers?
>>>>
>>>> By zipping it and sending via email?
>>>> By uploading to a file sharing service? If so, which one?
>>>> By making a torrent out of it?
>>>>
>>>> Please advise.
>>>>
>>>> Regards,
>>>> Zoltan
>>>>
>>>>
>>>>
>>>>> On Saturday, 1 November 2014 21:12:04 UTC+1, Puramoca021 wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have trained the unreleased Tesseract 3.04 (available only in the
>>>>>> Subversion repository) to recognize Serbian Cyrillic. The instructions
>>>>>> for training Tesseract 3
>>>>>> <https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> were
>>>>>> strictly followed: I used the script *tesstrain.sh* and provided the
>>>>>> required files.
>>>>>>
>>>>>> My question is: what is the procedure for submitting new trained data
>>>>>> so that it is available for the upcoming version of Tesseract?
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Zoltan
>>>>>>
>>>>>  --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/0362254d-260d-49fa-af8b-c098b50811f0%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>

