[tesseract-ocr] LSTM training: high Tesseract OCR error rate

2024-03-12 Thread Mridul Davesar
Hey everyone, I am training my own LSTM model using some specific images that I want Tesseract to work well on. I have used the command *$ lstmtraining --model_output=my_output.lstm --traineddata="C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata" --old_traineddata="C:\Program
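One way to check whether training is actually reducing the error is to extract the character error figures from the lstmtraining console output. This sketch assumes Tesseract 5 progress lines containing text such as `BCER train=12.345%`; the function names are hypothetical and the regex may need adjusting for other versions.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed format: Tesseract 5 lstmtraining progress lines containing
# "BCER train=<percent>%". Adjust the pattern if your build prints otherwise.
BCER_TRAIN = re.compile(r"BCER train=(\d+(?:\.\d+)?)%")


def bcer_values(lines: Iterable[str]) -> List[float]:
    """Return every 'BCER train' percentage found, in log order."""
    values: List[float] = []
    for line in lines:
        match = BCER_TRAIN.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def summarize(values: List[float]) -> Optional[Dict[str, float]]:
    """First, last and best (lowest) BCER, or None if nothing was found."""
    if not values:
        return None
    return {"first": values[0], "last": values[-1], "best": min(values)}
```

After saving the training output to a file (for example `train.log`, a hypothetical name), `summarize(bcer_values(open("train.log", encoding="utf-8")))` gives a quick view of the trend. If `last` stays close to `first`, the run is not making progress on this data.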

[tesseract-ocr] Training Tesseract 5 for a New Font in Thai not working

2024-03-12 Thread Panumeth Khongsawatkiat
I tried to train Tesseract 5 with a new Thai font, but the BCER value keeps increasing. Details: Font: TH Sarabun New (200 samples); Base model: tha.traineddata (downloaded from tessdata_best); Command: (base) Unknown tesstrain % TESSDATA_PREFIX=../tesseract/tessdata
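BCER, as reported by lstmtraining, appears to be a bag-of-characters error rate: it compares how often each character occurs in the ground truth and in the OCR output, ignoring order. A simplified sketch of that idea follows. It counts Unicode code points, which may not match how Tesseract's unicharset splits Thai text with combining vowels and tone marks, so treat it as an approximation rather than Tesseract's exact computation.

```python
from collections import Counter


def bag_of_chars_error(truth: str, ocr: str) -> float:
    """Simplified bag-of-characters error rate, clipped to [0, 1].

    Character order is ignored: only how many times each code point
    occurs in the ground truth vs. the OCR output is compared.
    """
    diff = Counter(truth)
    diff.subtract(Counter(ocr))  # negative counts = extra OCR characters
    errors = sum(abs(n) for n in diff.values())
    if len(truth) <= errors:
        # Avoid division by zero and cap the rate at 1.0.
        return 0.0 if errors == 0 else 1.0
    return errors / len(truth)
```

For example, `bag_of_chars_error("abc", "abd")` is 2/3: one missing `c` and one extra `d`, divided by three ground-truth characters.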

[tesseract-ocr] Does training on new images increase the size of the traineddata file?

2024-03-12 Thread Cain Pian
I've trained on thousands of images, but the traineddata file size didn't change at all. Did I do something wrong?
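A likely explanation is that the LSTM part of a traineddata file stores a fixed number of weights set by the network spec: training on more images changes the weight values, not how many there are, so the size should stay the same unless the network or character set changes. The sketch below illustrates this with the textbook parameter count for a single LSTM layer (four gates, each with input, recurrent and bias weights). The layer sizes are hypothetical, and Tesseract's internal layout and storage precision may differ.

```python
import random
from array import array


def lstm_weight_count(n_in: int, n_out: int) -> int:
    """Textbook LSTM layer: 4 gates x (inputs + recurrent outputs + bias) x outputs."""
    return 4 * (n_in + n_out + 1) * n_out


# Hypothetical layer dimensions, for illustration only.
n_weights = lstm_weight_count(n_in=48, n_out=96)
weights = array("f", [0.0] * n_weights)  # float32 storage in this sketch
size_before = len(weights.tobytes())

# 1000 simulated update steps: each nudges an existing weight,
# but none adds or removes weights.
rng = random.Random(0)
for _ in range(1000):
    i = rng.randrange(n_weights)
    weights[i] += rng.uniform(-0.01, 0.01)

size_after = len(weights.tobytes())
print(n_weights, size_before, size_after)
```

The serialized size depends only on `n_in` and `n_out`; the number of training steps never enters the calculation.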

[tesseract-ocr] Leptonica directory

2024-03-12 Thread Ravil R
Windows, MSVC 2022, Win32. I have some questions regarding compilation: 1) How do I specify the directory where Leptonica is installed? No matter what I try, the .sln file always contains *c:\Program Files(x86)\Leptonica*. 2) Leptonica is definitely compiled with libtiff support: *-- Used TIFF
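Assuming the .sln is generated by CMake and Tesseract's CMake script locates Leptonica through `find_package`, the install location can usually be passed at configure time through the standard CMake variable `CMAKE_PREFIX_PATH` (or `Leptonica_DIR`, pointing at the directory that holds Leptonica's CMake config file). CMake caches the paths it finds, so a fresh build directory avoids reusing an old `c:\Program Files(x86)\Leptonica` entry. A small Python helper that assembles such a configure command, with hypothetical function names and paths:

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List


def cmake_configure_args(source_dir: str, build_dir: str, leptonica_prefix: str) -> List[str]:
    """Assemble a CMake configure command for a 32-bit Visual Studio 2022 build."""
    # CMake accepts forward slashes on Windows; they avoid backslash-escaping surprises.
    prefix = PureWindowsPath(leptonica_prefix).as_posix()
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        "-G", "Visual Studio 17 2022",
        "-A", "Win32",
        f"-DCMAKE_PREFIX_PATH={prefix}",
    ]


def configure(source_dir: str, build_dir: str, leptonica_prefix: str) -> None:
    """Run the configure step; raises CalledProcessError if CMake fails."""
    subprocess.run(cmake_configure_args(source_dir, build_dir, leptonica_prefix), check=True)
```

For example, `configure(r"C:\src\tesseract", r"C:\src\tesseract\build-win32", r"D:\libs\leptonica")`, with all three paths standing in for your own.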

Re: [tesseract-ocr] user patterns with tesserocr Python API

2024-03-12 Thread Zdenko Podobny
One correction: I checked the example at the URL mentioned below with the Tesseract executable and the tessdata repository. The result is that user patterns also affect LSTM. This can easily be tested by generating output without user patterns (Arial.txt): tesseract Arial.png Arial And with
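Comparing the two outputs can be automated with Python's standard `difflib`. In this sketch the first file is `Arial.txt` from the command above; the name of the second file (the run with user patterns) and the helper names are hypothetical placeholders.

```python
import difflib
from pathlib import Path
from typing import List


def output_diff(without_patterns: str, with_patterns: str) -> List[str]:
    """Unified diff of two OCR outputs; an empty list means they are identical."""
    return list(
        difflib.unified_diff(
            without_patterns.splitlines(),
            with_patterns.splitlines(),
            fromfile="without user patterns",
            tofile="with user patterns",
            lineterm="",
        )
    )


def compare_files(path_without: str, path_with: str) -> List[str]:
    """Read both tesseract .txt outputs as UTF-8 and diff them."""
    return output_diff(
        Path(path_without).read_text(encoding="utf-8"),
        Path(path_with).read_text(encoding="utf-8"),
    )
```

For example, `compare_files("Arial.txt", "Arial_patterns.txt")`; a non-empty result means the user patterns changed the recognized text.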