I am running a test training for Coptic with Tesseract 4 and will let you know where to access the traineddata.

You can train using UTF-8 text and Unicode Coptic fonts.

1. Collect UTF-8 text in Coptic.
2. Find Coptic Unicode fonts. If you can find one similar to the typewriter font used in the books, it will make training easier.
3. Train a model with these, and then fine-tune it with line images and matching ground truth.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, May 30, 2018 at 4:09 PM, Ramast Magdy <ramast....@gmail.com> wrote:

> Thank you ShreeDevi for both moheb's link and the one below.
> The current one uses Tesseract 3 and according to the author:
> "Recognition quality of Coptic texts containing old fonts will be very
> poor, depending on the trained data."
>
> I will get in contact with him to see if we can use the other link you
> provided,
> https://github.com/OCR-D/ocrd-train
> to train Tesseract 4.00.
>
> Thank you very much
>
> On 05/30/2018 06:31 AM, ShreeDevi Kumar wrote:
>
> See http://www.moheb.de/ocr.html
>
> It provides a traineddata file for Coptic for use with Tesseract version 3.
>
> ShreeDevi
>
> On Tue, May 29, 2018 at 9:57 PM, <ramast....@gmail.com> wrote:
>
>> Hi,
>> I belong to a group that studies an old Egyptian writing system called
>> "Coptic".
>> It's based mostly on Greek (with some variation).
>>
>> The large majority of books written in Coptic date from the last century
>> and mostly use the same [typewriter] font.
>> Here is a sample picture:
>> https://imgur.com/a/ILRw6vm
>> And a sample book:
>> https://archive.org/download/pistissophiaopu00petegoog
>>
>> We need to add Coptic to the languages supported by Tesseract but are not
>> sure how.
>> I tried following this document,
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>> but it's very difficult to understand.
>>
>> We need someone to help us with the initial setup so that we can dedicate
>> our manpower to training the system.
>> We are a nonprofit group, so we are hoping for free help, but we would
>> also consider paid help, since the alternative is hundreds of hours of
>> labor to digitize just a few books.
>>
>> Thanks everyone for contributing to this awesome project
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/08869d08-8b3a-4390-be79-fa811c78c0ca%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
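
For step 1 of the outline above (collecting UTF-8 text in Coptic), here is a minimal sketch of one way to clean a raw corpus before using it as training text. It is not part of the Tesseract tooling. The function names and the 0.9 threshold are my own assumptions for illustration. It relies only on the fact that Coptic letters are encoded in the Unicode ranges U+2C80 to U+2CFF and U+03E2 to U+03EF, and keeps lines whose letters are mostly in those ranges.

```python
import unicodedata

# Coptic letters live in two Unicode ranges: the Coptic block and the
# Demotic-derived letters kept in the Greek and Coptic block.
COPTIC_RANGES = ((0x2C80, 0x2CFF), (0x03E2, 0x03EF))


def is_coptic_char(ch):
    """Return True if the character falls in one of the Coptic ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COPTIC_RANGES)


def coptic_ratio(line):
    """Fraction of the letters in `line` that are Coptic (0.0 if no letters)."""
    letters = [ch for ch in line if unicodedata.category(ch).startswith("L")]
    if not letters:
        return 0.0
    return sum(is_coptic_char(ch) for ch in letters) / len(letters)


def keep_coptic_lines(lines, min_ratio=0.9):
    """Yield stripped, NFC-normalized lines that are mostly Coptic.

    min_ratio is an arbitrary cut-off chosen for this sketch; tune it
    for your own corpus.
    """
    for raw in lines:
        line = unicodedata.normalize("NFC", raw.strip())
        if line and coptic_ratio(line) >= min_ratio:
            yield line
```

To use it, open your collected text file with encoding="utf-8", pass the file object to keep_coptic_lines, and write the lines it yields to a new file. Lines that are mostly Latin or Greek (page headers, editorial notes, and so on) are dropped, so the training text stays close to what the model should actually learn.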