Changed subject to *Serbian Cyrillic*.

Please note that issues allow attachments only up to 10 MB. If the zipped traineddata is larger than that, please host it elsewhere (e.g. GitHub) and provide a link.

Ray/Jeff/Zdenko, please correct me if that is not the case.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 4, 2014 at 7:35 AM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

> Thanks for clarifying and giving more details.
>
> I am cc:ing this email to the tesseract developers group and Ray for an
> answer to your question, "how to submit this file to Tesseract's
> repository?"
>
> Meanwhile, I suggest that you open an 'issue' and attach the traineddata.
>
> Thanks!
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Nov 4, 2014 at 1:08 AM, Puramoca021 <puramoca...@gmail.com> wrote:
>
>> Hi Devi,
>>
>> Unfortunately, you are slightly misinformed as well.
>>
>> The trained data file for the Serbian language that is currently in
>> Tesseract's repository contains LATIN characters.
>> What I made is a corpus of trained data that recognizes *Serbian Cyrillic*
>> characters.
>>
>> A good summary and explanation of what *Serbian Cyrillic* is can be found
>> here <http://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet> (Wikipedia
>> article). Please pay attention to the section *"Modern alphabet"* in that
>> article.
>> What the current version of Tesseract's *srp.traineddata* can recognize are
>> the letters in the column labelled "*Latin*" (see the Wikipedia article).
>> I would like to submit a trained data file that will make Tesseract
>> recognize the letters in the column "*Cyrillic*" (again, see the Wikipedia
>> article).
>>
>> Again, I did not get a clear answer to my question: how do I submit this
>> file to Tesseract's repository?
>>
>> Shall I *assume* that I need to open an issue and submit the trained data
>> there? Please clarify.
>>
>>
>> Regards,
>> Zoltan
>>
>>
>> On Monday, November 3, 2014 at 19:45:38 UTC+1, shree wrote:
>>>
>>> There already is language data for srp - please see
>>>
>>> https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata
>>>
>>> and
>>>
>>> https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata
>>>
>>> Ray Smith, the lead developer of tesseract at Google, is planning to
>>> release updated versions of traineddata soon as part of the 3.04 release.
>>>
>>> If your traineddata has something additional that is not in the
>>> existing set, then please add it as an attachment to an issue so that it
>>> can be tested.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Tue, Nov 4, 2014 at 12:02 AM, Puramoca021 <puram...@gmail.com> wrote:
>>>
>>>>
>>>> On Sunday, November 2, 2014 4:45:32 PM UTC+1, Vladimir Radnovic wrote:
>>>>>
>>>>> Hi, hello Zoltan,
>>>>> what do you need new traineddata for? There are several ways to do
>>>>> the training, so if you need help, get in touch.
>>>>>
>>>>> You have several ways to train data... what do you need it for?
>>>>> Regards,
>>>>> Vladimir
>>>>>
>>>>>
>>>> Hi Vladimir,
>>>>
>>>> I am afraid you did not understand me ... I think I was not clear
>>>> enough:
>>>>
>>>> - I *do not need* new traineddata. I *made new traineddata for Serbian
>>>> Cyrillic myself* and I would like to offer it to all
>>>> Tesseract users who need to OCR text printed in Serbian Cyrillic.
>>>>
>>>> My question is: How do I send this file (srp.traineddata) to you,
>>>> Tesseract developers and maintainers?
>>>>
>>>> By zipping it and sending via email?
>>>> By uploading to a file sharing service? If so, which one?
>>>> By making a torrent out of it?
>>>>
>>>> Please advise.
>>>>
>>>> Regards,
>>>> Zoltan
>>>>
>>>>
>>>>
>>>>> On Saturday, 1 November 2014 21:12:04 UTC+1, Puramoca021 wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have trained unreleased Tesseract 3.04 (available only in
>>>>>> Subversion repository) to recognize Serbian Cyrillic. Instructions for
>>>>>> training Tesseract 3
>>>>>> <https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> were
>>>>>> strictly followed - I used script *tesstrain.sh* and provided
>>>>>> required files.
>>>>>>
>>>>>> My question is: what is the procedure for submitting new trained data
>>>>>> so that they are available for new, upcoming version of Tesseract?
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Zoltan
>>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/0362254d-260d-49fa-af8b-c098b50811f0%40googlegroups.com
>>>> For more options, visit https://groups.google.com/d/optout.