Please see https://github.com/Shreeshrii/tesstrain-ckb. It uses a modified
training text based on what you sent, plus earlier text that I had from
Pewan and other corpora.

Currently the training data includes:
* AWN 0-9
* AEN - Arabic numbers
* No Persian numbers, since some of their shapes are similar to Arabic numbers

The fonts used do not include those that convert 0-9 to either Arabic or
Persian numbers.
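
To check which digit sets a training text actually contains, something like
the following sketch could be used. It assumes that "AWN" above means the
Western digits 0-9 (U+0030-U+0039), "AEN" the Arabic-Indic digits
(U+0660-U+0669), and "Persian numbers" the Extended Arabic-Indic digits
(U+06F0-U+06F9). The function name and labels are made up for illustration
and are not part of tesstrain.

```python
from collections import Counter

# Unicode ranges for the three digit sets discussed above.
# The labels are illustrative only.
DIGIT_RANGES = {
    "western": (0x0030, 0x0039),       # 0-9
    "arabic_indic": (0x0660, 0x0669),  # Arabic numbers
    "persian": (0x06F0, 0x06F9),       # Extended Arabic-Indic
}

def count_digit_sets(text):
    """Count how many characters of each digit set occur in text."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in DIGIT_RANGES.items():
            if lo <= cp <= hi:
                counts[name] += 1
    return counts

if __name__ == "__main__":
    sample = "12 \u0661\u0662 \u06F4"
    counts = count_digit_sets(sample)
    for name in DIGIT_RANGES:
        print(name, counts[name])
```

A non-zero "persian" count would mean the text still contains digits that
the current training data deliberately leaves out.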

The replace layer training is still ongoing. The eval results look much
better than the official ara or script/Arabic; however, I do not have any
real-world images for testing.

Model                         Metric            Arial  Arial Bold  Tahoma  Tahoma Bold
tessdata_fast/ara             Accuracy          62.74       63.49   61.56        61.71
tessdata_fast/ara             Basic Arabic      95.68       95.22   95.76        94.10
tessdata_fast/ara             Arabic Extended    0.31        1.13    0.41         1.32
tessdata_fast/script/Arabic   Accuracy          80.99       80.83   83.02        77.17
tessdata_fast/script/Arabic   Basic Arabic      96.68       96.34   96.05        93.87
tessdata_fast/script/Arabic   Arabic Extended   57.20       58.23   63.76        54.72
ckbLayer_1.661_152089_296500
ckbLayer_fast                 Accuracy          98.20       97.78   98.06        96.13
ckbLayer_fast                 Basic Arabic      99.10       99.15   98.54        98.44
ckbLayer_fast                 Arabic Extended   98.30       98.70   99.10        96.27


On Mon, Jan 13, 2020 at 7:17 PM Ayub Rauf wrote:

> Hi,
> I have attached the full training text with forbidden_characters in it.
> In practice both number types are used, and I see both types written in
> books, but the Kurdish institute verified that Arabic numbers will be
> used from now on. Persian numbers are written by Iranian Kurds and Arabic
> numbers by Iraqi Kurds. As I said, numbers in ckb should be written in
> the Arabic type, but we have to recognize both types in OCR, just like
> the two forms "ك" and "ک" that are written in books although we now only
> use "ک".
> I think these similarities won't be a problem, since afterwards we can
> correct the letters in a spell checker.
> As I said before, Arial and Tahoma are the fonts most used for books.
>
>
>
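
The post-OCR correction mentioned in the quoted message could be sketched
roughly as below. This is only an illustration under the assumptions stated
there: "ك" (U+0643) is replaced by "ک" (U+06A9), and Persian digits
(U+06F0-U+06F9) are mapped to the Arabic digits (U+0660-U+0669). The function
name is hypothetical, and a real spell checker would do considerably more.

```python
# Mapping of code points to their preferred ckb form (assumed from the
# quoted message, not taken from any existing tool).
CKB_NORMALIZE = {0x0643: 0x06A9}  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH

# Persian (Extended Arabic-Indic) digits -> Arabic-Indic digits.
CKB_NORMALIZE.update({0x06F0 + i: 0x0660 + i for i in range(10)})

def normalize_ckb(text):
    """Replace the variant kaf and Persian digits in OCR output."""
    return text.translate(CKB_NORMALIZE)

if __name__ == "__main__":
    ocr_output = "\u0643\u06F4\u06F5"
    print(normalize_ckb(ocr_output))
```

Western digits and all other characters pass through unchanged.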
