I am running a test training for Coptic with Tesseract 4 and will let you know
where to access the traineddata.

You can train using UTF-8 text and Unicode Coptic fonts.

1. Collect UTF-8 text in Coptic.
2. Find Coptic Unicode fonts. If you can find one similar to the typewriter
font used in the books, it will make training easier.
3. Train a model with these, and then fine-tune it with line images and
matching ground truth.
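
For step 1, a rough sketch of how the collected text could be filtered before
training, so that only lines made up mostly of Coptic letters are kept. This is
not part of the Tesseract tooling; the function names and the 0.9 threshold are
my own assumptions. The code point ranges are the Unicode Coptic block
(U+2C80 to U+2CFF) plus the Coptic letters in the Greek and Coptic block
(U+03E2 to U+03EF).

```python
import unicodedata

# Unicode ranges that hold Coptic letters.
COPTIC_RANGES = (
    (0x2C80, 0x2CFF),  # Coptic block
    (0x03E2, 0x03EF),  # Coptic letters inside the Greek and Coptic block
)


def is_coptic(ch):
    """Return True if the character falls in one of the Coptic ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COPTIC_RANGES)


def coptic_ratio(line):
    """Fraction of the letters in a line that are Coptic (0.0 if no letters)."""
    # Only count letters; spaces, digits, punctuation and combining marks
    # (such as the overline) are ignored.
    letters = [ch for ch in line if unicodedata.category(ch).startswith("L")]
    if not letters:
        return 0.0
    return sum(is_coptic(ch) for ch in letters) / len(letters)


def filter_coptic_lines(lines, min_ratio=0.9):
    """Keep non-empty lines whose letters are mostly Coptic.

    min_ratio is an arbitrary threshold chosen for illustration.
    """
    kept = []
    for line in lines:
        text = unicodedata.normalize("NFC", line.strip())
        if text and coptic_ratio(text) >= min_ratio:
            kept.append(text)
    return kept


# Small in-memory example: one Coptic line and one Latin line.
sample = ["\u2ca1\u2c93\u2ca5\u2ca7\u2c93\u2ca5", "Pistis Sophia"]
print(filter_coptic_lines(sample))
```

The filtered lines could then be written out as a UTF-8 file and used as the
training text for rendering with the Coptic fonts from step 2.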


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, May 30, 2018 at 4:09 PM, Ramast Magdy <ramast....@gmail.com> wrote:

> Thank you ShreeDevi for both moheb's link and the one below.
> The current one uses Tesseract 3 and according to the author:
> "Recognition quality of Coptic texts containing old fonts will be very
> poor, depending on the trained data."
>
> I will get in contact with him to see if we can use the other link you
> provided, https://github.com/OCR-D/ocrd-train, to train Tesseract 4.00.
>
> Thank you very much
>
>
> On 05/30/2018 06:31 AM, ShreeDevi Kumar wrote:
>
> See http://www.moheb.de/ocr.html
>
> It provides a traineddata file for Coptic for use with tesseract version 3.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, May 29, 2018 at 9:57 PM, <ramast....@gmail.com> wrote:
>
>> Hi,
>> I belong to a group that studies an old Egyptian writing system called
>> "Coptic".
>> It's based mostly on Greek (with some variation).
>>
>> The great majority of books written in Coptic were produced during the last
>> century and mostly use the same [typewriter] font.
>> Here is a sample picture:
>> https://imgur.com/a/ILRw6vm
>> And sample book:
>> https://archive.org/download/pistissophiaopu00petegoog
>>
>> We need to add Coptic to the languages supported by Tesseract but are not
>> sure how.
>> I tried following this document,
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00,
>> but it's very difficult to understand.
>>
>> We need someone to help us with the initial setup so that we can dedicate
>> our manpower to training the system.
>> We are a non-profit group, so we are hoping for free help, but we would also
>> consider paid help, since the alternative is hundreds of hours of manual
>> labor to digitize just a few books.
>>
>> Thanks everyone for contributing to this awesome project
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/08869d08-8b3a-4390-be79-fa811c78c0ca%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
