Image processing can improve the result, but this typeface is very particular: the digits are "unconnected". I tried morphological transformations to reconnect the digits. The result is better (*tesseract ocr_inv-8.png ocr-8 --psm 6*) but still far from correct. In my opinion, training *tesseract* on this typeface is one way forward.
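As an illustration of what "reconnecting" a broken stroke means, here is a minimal sketch of morphological closing (dilation followed by erosion) in plain Python. The tiny binary grid, the function names and the 3x3 neighbourhood are assumptions made for the example; this is not the processing that produced ocr_inv-8.png, and on a real image one would use an image library instead.

```python
from collections import deque


def dilate(img, k=1):
    """Set a pixel if any pixel in its (2k+1)x(2k+1) neighbourhood is set."""
    h, w = len(img), len(img[0])
    return [[int(any(img[yy][xx]
                     for yy in range(max(0, y - k), min(h, y + k + 1))
                     for xx in range(max(0, x - k), min(w, x + k + 1))))
             for x in range(w)] for y in range(h)]


def erode(img, k=1):
    """Keep a pixel only if every in-bounds pixel in its neighbourhood is set."""
    h, w = len(img), len(img[0])
    return [[int(all(img[yy][xx]
                     for yy in range(max(0, y - k), min(h, y + k + 1))
                     for xx in range(max(0, x - k), min(w, x + k + 1))))
             for x in range(w)] for y in range(h)]


def close_gaps(img, k=1):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, k), k)


def count_components(img):
    """Count 4-connected groups of set pixels with a breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Hypothetical example: a horizontal stroke broken by a one-pixel gap.
stroke = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = close_gaps(stroke)
print(count_components(stroke), count_components(closed))
```

With a 3x3 neighbourhood only gaps of one or two pixels are bridged. A larger `k` bridges wider gaps but can also merge neighbouring characters, which may be one reason closing alone does not give a proper result here.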
@*JB*Δ <http://jbigdata.fr/jbweb2/dev/miscs/image/index.html>

On Fri, 26 Apr 2019 at 09:53, <[email protected]> wrote:

> Hi,
>
> I created a bazaar file as attached with load_system_dawg, load_freq_dawg
> set to F. I also want to use user-patterns so I set it as well.
> load_system_dawg F
> load_freq_dawg F
> user_patterns_suffix user-patterns
>
> In the same directory, I also have the user-pattern file:
> \d\d\d\d\c
>
> So the structure looks like:
>
> ./bazaar
> ./eng.user-patterns
> ./ocr_inv.png
>
>
> But these settings still fail to recognise the image correctly as "1880A".
> If I just run tesseract without any bells and whistles, the outputs are
> still the same.
>
> Commands used:
> tesseract ocr_inv.png stdout
> tesseract ocr_inv.png stdout bazaar
> tesseract ocr_inv.png stdout --user-patterns eng.user-patterns bazaar
>
> Output:
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> Estimating resolution as 1128
> ISON
>
> Can anyone tell me if this is expected behaviour?
> Tesseract version:
> tesseract 4.0.0-beta.1
> leptonica-1.75.3
> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 :
> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>
> Found AVX2
> Found AVX
> Found SSE
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/10d1ed21-d30e-4eaf-8f2a-6fdf74a6a7d1%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/10d1ed21-d30e-4eaf-8f2a-6fdf74a6a7d1%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
IGOOA

