[tesseract-ocr] Re: use of unicharambigs

Andrea Rossato Mon, 20 Mar 2023 11:53:00 -0700

Hi,

No, unicharambigs is not used by LSTM models. It was only used in legacy mode.


I'm having a similar problem with the Ancient Greek traineddata from tessdata_best: 
unfortunately it has been trained with some non standard characters (ά έ ή 
ί ό ύ ώ, instead of  ά έ ή ί ό ύ ώ). I tried fine tuning the 
grc.traineddata, but without much success, so for the time being I 
produce hOCR files, post-process them, and then use hocr-pdf to create a 
searchable PDF.
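
A minimal sketch of what such a post-processing step might look like, in 
Python. Everything here is an assumption for illustration, not the script 
actually used: the function name postprocess_hocr, the REPLACEMENTS table, 
and in particular the direction of the mapping (Greek Extended "oxia" code 
points to basic Greek "tonos" code points; swap keys and values if the model 
emits the other set). Tags are skipped with a simple regular expression 
rather than a real XML parser, so bbox and confidence attributes are left 
untouched.

```python
import re

# Hypothetical mapping, for illustration only: Greek Extended "oxia" vowels
# to the visually identical basic Greek "tonos" vowels. Invert it if the
# substitution is needed in the other direction.
REPLACEMENTS = {
    "\u1f71": "\u03ac",  # alpha
    "\u1f73": "\u03ad",  # epsilon
    "\u1f75": "\u03ae",  # eta
    "\u1f77": "\u03af",  # iota
    "\u1f79": "\u03cc",  # omicron
    "\u1f7b": "\u03cd",  # upsilon
    "\u1f7d": "\u03ce",  # omega
}

# Matches one markup tag; the capturing group keeps tags in the split result.
TAG_RE = re.compile(r"(<[^>]*>)")


def postprocess_hocr(hocr, replacements):
    """Apply single-character replacements to the text of an hOCR document,
    leaving the markup (tags and their attributes) unchanged."""
    table = str.maketrans(replacements)
    parts = TAG_RE.split(hocr)
    # After splitting on a capturing group, odd indexes hold the tags and
    # even indexes hold the text between them.
    return "".join(
        part if index % 2 else part.translate(table)
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python fix_hocr.py page.hocr > fixed.hocr
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.stdout.write(postprocess_hocr(handle.read(), REPLACEMENTS))
```

Note that this only handles one-character keys. A recognized ">" appears in 
hOCR output as the entity "&gt;", so replacing it with "ck" would need a 
plain string replacement on the text parts instead of a translation table.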


best,
andrea
On Monday, March 13, 2023 at 5:13:33 PM UTC+1 Isidore Paris wrote:

> Hi,
> I'm doing some frk text recognition, and in my results, I have a great 
> number of " > ". Each one should be replaced by " ck ".
> I updated my frk.traineddata file (from tessdata_best repository) with a 
> frk.unicharambigs file (I tried both formats v1 and v2) but absolutely 
> nothing changed.
> I also tried the parameter " -c use_ambigs_for_adaption=1 " to see if 
> maybe it was needed, but still nothing changed, not a single character (> 
> and = and / are all still there).
>
> Here is the content of my v2 frk.unicharambigs file:
> v2
> > ck 1
> = - 1
> / - 1
>
> Does unicharambigs not work with LSTM files? Or did I miss some particular 
> or special step?
>

