[tesseract-ocr] Lack of accuracy on reading numbers

2024-03-27 Thread Ajay Pandya
Hello everyone, I am using Tesseract 5.2 with C# and am having a problem reading this number. PSM: 8, OEM: 3, trained data: eng (best). Data: 3, reading: 3111. We have many similar images with different numbers. Sometimes it adds extra digits and sometimes it drops digits. Kindly help with this.

Re: [tesseract-ocr] Lack of accuracy on reading numbers

2024-03-27 Thread Zdenko Podobny
Always test the command line if there is an issue with the wrapper.

    tesseract -v
    tesseract 5.3.4-44-g2b07
     leptonica-1.84.0 (Dec 31 2023, 23:36:37) [MSC v.1929 LIB Release x64]
      libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.1.90) : libpng 1.6.40 : libtiff 4.6.0 : zlib 1.2.13.zlib-ng : libwebp 1.3.2
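
For illustration, a minimal sketch of such a command-line check, using the settings quoted in the question (PSM 8, OEM 3, language eng). The image name number.png and the helper names tess_cmd and run_tess are hypothetical, and the sketch assumes the command-line build loads the same eng traineddata as the C# wrapper.

```shell
#!/bin/sh
# Print the tesseract command line matching the settings from the question.
# The image path is passed in by the caller; number.png is only a placeholder.
tess_cmd() {
    image="$1"
    printf 'tesseract %s stdout --psm 8 --oem 3 -l eng\n' "$image"
}

# Run the same command, but only when tesseract and the image both exist.
run_tess() {
    image="$1"
    if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
        tesseract "$image" stdout --psm 8 --oem 3 -l eng
    else
        echo "skipped: tesseract or $image not found" >&2
        return 1
    fi
}

# Example: tess_cmd number.png; run_tess number.png
```

If the command line reads the number correctly while the wrapper does not, the difference is likely in how the wrapper passes the image or parameters rather than in the model.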

Re: [tesseract-ocr] fine tuning on images

2024-03-27 Thread Zdenko Podobny
You can easily test your hypothesis by modifying Makefile[1] lines from

    tesseract "$<" $* --psm $(PSM) lstm.train

to

    tesseract "$<" $* --psm $(PSM) -l $(START_MODEL) lstm.train

[1] https://github.com/tesseract-ocr/tesstrain/blob/19f79e2d38dfeada41a96c8d87426c85a7eaa454/Makefile#L242-L255

Re: [tesseract-ocr] Getting Error: No such file or directory: 'data/foo/all-lstmf'

2024-03-27 Thread Zdenko Podobny
You can try custom images - see the example ocrd-testset.zip. And follow the example from https://github.com/tesseract-ocr/tesstrain/blob/main/README.md :

    unzip ocrd-testset.zip -d data/ocrd-ground-truth
    make training
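
Before running the training, it may help to confirm that the ground-truth directory actually contains image/transcription pairs. The following is a rough sketch, not part of tesstrain: the helper name count_missing_gt is hypothetical, and it assumes the layout described in the tesstrain README, where each line image sits next to a transcription named <basename>.gt.txt. Only plain .tif and .png images are checked.

```shell
#!/bin/sh
# Count line images in a ground-truth directory that lack a matching .gt.txt.
# Hypothetical helper; checks only *.tif and *.png (not .bin.png / .nrm.png).
count_missing_gt() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    missing=0
    for img in "$dir"/*.tif "$dir"/*.png; do
        [ -e "$img" ] || continue      # skip glob patterns that matched nothing
        base="${img%.*}"               # strip the image extension
        [ -f "$base.gt.txt" ] || missing=$((missing + 1))
    done
    echo "$missing"
}

# Example: count_missing_gt data/ocrd-ground-truth
```

A result of 0 means every checked image has a transcription next to it; a missing directory is reported as an error instead of a count.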