Can you try removing it from the list of punctuation marks?

To do that, you need to extract the components of the traineddata file,
edit the ara.punc file, and then recombine them.

To extract the components: *combine_tessdata -u ara.traineddata ara.*
(*combine_tessdata -d ara.traineddata* only lists them.)
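
The full round trip might look like the sketch below. It is not a tested recipe. It assumes an LSTM model whose punctuation list is stored as a DAWG named ara.lstm-punc-dawg alongside ara.lstm-unicharset; legacy models use ara.punc-dawg and ara.unicharset instead, so check the names against the component listing first. The helpers run, unpack_punc and repack_punc are hypothetical names introduced only for this example, and the text file name ara.punc is taken from the description above. Right-to-left languages may also need a reverse policy option when rebuilding the DAWG; that is not shown here and should be checked against the wordlist2dawg documentation.

```shell
#!/bin/sh
# Sketch only: component names below are assumptions, confirm them with
# `combine_tessdata -d ara.traineddata` before running anything.

# Hypothetical helper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Unpack every component, then convert the punctuation DAWG to an editable text list.
unpack_punc() {
  run combine_tessdata -u "$1.traineddata" "$1."
  run dawg2wordlist "$1.lstm-unicharset" "$1.lstm-punc-dawg" "$1.punc"
}

# Rebuild the DAWG from the edited text list and write it back into the traineddata.
repack_punc() {
  run wordlist2dawg "$1.punc" "$1.lstm-punc-dawg" "$1.lstm-unicharset"
  run combine_tessdata -o "$1.traineddata" "$1.lstm-punc-dawg"
}

# Usage (work on a copy of the traineddata file):
#   unpack_punc ara
#   ... edit ara.punc and delete the patterns containing the extender ...
#   repack_punc ara
```

Setting DRY_RUN=1 before calling either function prints the commands without touching any files, which is a cheap way to check the file names first.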


On 20 Nov 2023 at 4:39:29 PM, Sifdin Nahhas <[email protected]> wrote:

> Hey guys,
> I have a problem where Tesseract removes the Arabic extender letter "ـ"
> because it recognizes it as an underline, as in the images below.
> I think it is caused by some configuration variable, but I could not find
> the responsible one.
>
> Appreciate the help.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/911e8ef4-68f3-4e9d-b40b-e7a715ab912cn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/911e8ef4-68f3-4e9d-b40b-e7a715ab912cn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
