Re: [tesseract-ocr] Re: Adding new language to Tesseract?

Puramoca021 Mon, 03 Nov 2014 11:39:13 -0800

Hi Devi,

Unfortunately, you are slightly misinformed as well.


The trained data file for the Serbian language that is currently in 
Tesseract's repository contains LATIN characters.
What I made is a set of trained data that recognizes *Serbian Cyrillic* 
characters.

A good summary and explanation of what *Serbian Cyrillic* is can be found in 
the Wikipedia article <http://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet>. 
Please pay attention to the section *"Modern alphabet"* in that article.
What the current version of Tesseract's *srp.traineddata* can recognize are 
the letters in the column labelled "*Latin*".
I would like to submit a trained data file that will make Tesseract 
recognize the letters in the column labelled "*Cyrillic*".
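
To make the Latin/Cyrillic distinction concrete, here is a minimal sketch in 
Python (standard library only). The function name script_counts is purely 
illustrative and is not part of Tesseract; it just counts letters per script 
by looking at their Unicode character names:

```python
import unicodedata


def script_counts(text):
    """Count Cyrillic and Latin letters in text using Unicode character names."""
    counts = {"CYRILLIC": 0, "LATIN": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces and punctuation
        name = unicodedata.name(ch, "")
        for script in counts:
            if name.startswith(script):
                counts[script] += 1
    return counts


# The same word written in the two Serbian alphabets.
print(script_counts("Београд"))
print(script_counts("Beograd"))
```

A page whose letters fall almost entirely under "CYRILLIC" is the kind of 
input the existing Latin-only data is not meant for.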

Again, I did not get a clear answer to my question: how do I submit this 
file to Tesseract's repository?

Should I *assume* that I need to open an issue and attach the trained data 
there? Please clarify.


Regards,
Zoltan


On Monday, 3 November 2014 at 19:45:38 UTC+1, shree wrote:
>
> There already is language data for srp - please see 
>
> https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata
>
> and
>
>
> https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata
>
> Ray Smith, the lead developer of Tesseract at Google, is planning to 
> release updated versions of the traineddata soon as part of the 3.04 release.
>
> If your traineddata has something additional that is not in the 
> existing set, then please add it as an attachment to an issue so that it can be 
> tested.
>
> ShreeDevi
> ____________________________________________________________
> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>
> On Tue, Nov 4, 2014 at 12:02 AM, Puramoca021 <[email protected]> wrote:
>
>>
>> On Sunday, November 2, 2014 4:45:32 PM UTC+1, Vladimir Radnovic wrote:
>>>
>>> Hi Zoltan,
>>> What do you need new traineddata for? There are several ways to do the 
>>> training, so if you need help, let me know.
>>>
>>> Regards,
>>> Vladimir
>>>
>>>
>> Hi Vladimir,
>>
>> I am afraid you did not understand me; I think I was not clear enough:
>>
>> - I *do not need* new trained data. I *made new trained data for Serbian 
>> Cyrillic myself* and I would like to offer it to all 
>> Tesseract users who need to OCR text printed in Serbian Cyrillic.
>>
>> My question is: how do I send this file (srp.traineddata) to you, the 
>> Tesseract developers and maintainers?
>>
>> By zipping it and sending via email?
>> By uploading to a file sharing service? If so, which one?
>> By making a torrent out of it?
>>
>> Please advise.
>>
>> Regards,
>> Zoltan
>>
>>  
>>
>>> On Saturday, 1 November 2014 21:12:04 UTC+1, Puramoca021 wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have trained the unreleased Tesseract 3.04 (available only in the 
>>>> Subversion repository) to recognize Serbian Cyrillic. The instructions for 
>>>> training Tesseract 3 
>>>> <https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> were 
>>>> strictly followed: I used the script *tesstrain.sh* and provided the 
>>>> required files.
>>>>
>>>> My question is: what is the procedure for submitting new trained data 
>>>> so that it is available for the new, upcoming version of Tesseract?
>>>>
>>>>
>>>> Best regards,
>>>> Zoltan
>>>>
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/29a8e468-3f2d-4350-b48b-e925791086e2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
