Thanks for clarifying and giving more details. I am cc'ing this email to the tesseract developers group and to Ray for an answer to your question "how to submit this file to Tesseract's repository?"
Meanwhile, I suggest that you open an 'issue' and attach the traineddata.

Thanks!
ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Tue, Nov 4, 2014 at 1:08 AM, Puramoca021 <[email protected]> wrote:

> Hi Devi,
>
> Unfortunately, you are slightly misinformed as well.
>
> The file with trained data for the Serbian language that is currently in
> Tesseract's repository contains LATIN characters.
> What I made is a corpus of trained data that recognizes *Serbian Cyrillic*
> characters.
>
> A good summary and explanation of what *Serbian Cyrillic* is can be found
> here <http://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet> (Wikipedia
> article). Please pay attention to the section *"Modern alphabet"* in the
> Wikipedia article.
> What the current version of Tesseract's *srp.traineddata* can recognize are
> the letters in the column labelled "*Latin*" (see the Wikipedia article).
> I would like to submit a file with trained data that will make Tesseract
> recognize the letters in the column "*Cyrillic*" (again, see the Wikipedia article).
>
> Again, I did not get a clear answer to my question: how do I submit this
> file to Tesseract's repository?
>
> Shall I *assume* that I need to open an issue and submit the trained data
> there? Please clarify.
>
>
> Regards,
> Zoltan
>
>
> On Monday, 3 November 2014 19:45:38 UTC+1, shree wrote:
>>
>> There already is language data for srp, please see
>>
>> https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata
>>
>> and
>>
>> https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata
>>
>> Ray Smith, the lead developer of Tesseract at Google, is planning to
>> release updated versions of traineddata soon as part of the 3.04 release.
>>
>> If your traineddata has something additional that is not in the
>> existing set, then please add it as an attachment to an issue so that it
>> can be tested.
>>
>> ShreeDevi
>>
>> On Tue, Nov 4, 2014 at 12:02 AM, Puramoca021 <[email protected]> wrote:
>>
>>>
>>> On Sunday, November 2, 2014 4:45:32 PM UTC+1, Vladimir Radnovic wrote:
>>>>
>>>> Hi Zoltan,
>>>> What do you need new traindata for? There are several ways to do the
>>>> training, so if you need help, get in touch.
>>>>
>>>> Regards,
>>>> Vladimir
>>>>
>>>>
>>> Hi Vladimir,
>>>
>>> I am afraid you did not understand me... I think I was not clear enough:
>>>
>>> - I *do not need* new traindata. I *made new traindata for Serbian
>>> Cyrillic myself*, and I would like to offer this trained data to all
>>> Tesseract users who need to OCR text printed in Serbian Cyrillic.
>>>
>>> My question is: how do I send this file (srp.traineddata) to you,
>>> the Tesseract developers and maintainers?
>>>
>>> By zipping it and sending it via email?
>>> By uploading it to a file-sharing service? If so, which one?
>>> By making a torrent out of it?
>>>
>>> Please advise.
>>>
>>> Regards,
>>> Zoltan
>>>
>>>
>>>
>>>> On Saturday, 1 November 2014 21:12:04 UTC+1, Puramoca021 wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have trained the unreleased Tesseract 3.04 (available only in the
>>>>> Subversion repository) to recognize Serbian Cyrillic. The instructions
>>>>> for training Tesseract 3
>>>>> <https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> were
>>>>> strictly followed: I used the script *tesstrain.sh* and provided the
>>>>> required files.
>>>>>
>>>>> My question is: what is the procedure for submitting new trained data
>>>>> so that it is available for the new, upcoming version of Tesseract?
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Zoltan
>>>>>
>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/0362254d-260d-49fa-af8b-c098b50811f0%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

