[tesseract-ocr] Re: train tess4j on a specific font?

Quan Nguyen Sun, 30 Jan 2022 15:53:22 -0800

Not about training, but you should use the latest version of tess4j that 
corresponds to the latest Tesseract releases.


https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j
https://github.com/nguyenq/tess4j

Hope it will produce better results for you.
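
Independent of the Tess4J version, small digits in screenshots are sometimes recognized more reliably after the image is enlarged and converted to grayscale before OCR. The following is only a minimal sketch using the standard Java imaging classes; the class name `UpscaleForOcr`, the method name `upscale`, and the choice of scale factor are illustrative assumptions, not part of Tess4J or of the original advice.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Tess4J.
final class UpscaleForOcr {

    /** Returns a grayscale copy of src, enlarged by an integer factor with bicubic interpolation. */
    static BufferedImage upscale(BufferedImage src, int factor) {
        int w = src.getWidth() * factor;
        int h = src.getHeight() * factor;
        // TYPE_BYTE_GRAY converts colors to gray while the image is drawn.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null); // scales src to fill the w x h canvas
        g.dispose();
        return out;
    }
}
```

The returned image could then be passed to Tess4J's `doOCR(BufferedImage)` in place of the original file. Whether a factor of 2, 3, or more helps for a given screenshot has to be found by trial.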

On Saturday, January 29, 2022 at 1:52:48 AM UTC-6 Bernd Angelo wrote:

> Hello, I am having trouble getting numbers recognized.
> I am using Tess4J from http://tess4j.sourceforge.net/ 
> which, if I am not wrong, is using Tesseract 3.05 in the background.
>
> I followed the instructions outlined here:
> http://tess4j.sourceforge.net/tutorial/
> (using the command line version, no eclipse, maven or other sh't)
>
> I can modify the TesseractExample.java file without an issue and, by running the 
> 2 command line commands mentioned on the site above, can run a Tesseract 
> OCR scan on any png or jpg I want.
>
> What I ultimately want to do is use OCR to make my program 
> "read" the balance of an online casino; with that balance available as a 
> string variable, I will do all kinds of actions based on it,
> so reading the numbers properly is important.
>
> Now for test purposes I took 2 screenshots that together include all the 
> different digits that can appear, so 0-9.
>
> when I do the normal ocr as instructed in the page above, (from my 
> knowledge, it then uses the pre-trained standard eng.traineddata file)
> sadly both the digits 4 and 6 in the image are read as 5.
> the euro sign € is also read as the pound sign instead, but that is of minor 
> importance to me.
> the ocr not being able to distinguish between 4 and 6 really sucks.
>
> The pictures used are these ones:
> https://ibb.co/ZTRFqVg
> https://ibb.co/p23w7nj
>
> As said, they are basically screenshots of the casino site and so I can't 
> influence the font or size or anything.
>
> as said, the ocr reads the "4,6" part as "5,5".
>
> which is bad.
> So I thought, why not use the 2 images to train tesseract, as obviously 
> tesseract having seen all the possible digits should give it 100% accuracy, 
> right?
> well, I got myself jTessBoxEditor, got myself Serak Tesseract Trainer, 
> did a ton of stuff and created the traineddata from the image,
> and made the OCR file use it to try to OCR the image again.
> well, I wrote a line in my code to System.out.print the string and also 
> print its length.
> I don't know what the OCR does, but the result written to the 
> command line window is an empty line (where the result string should be) 
> and the string length is reported as 6 (it should be 11 with all the digits 
> and the comma).
> so I don't know what the OCR is doing; it does way worse than with the 
> standard eng language.
>
> so I did a bit of googling; apparently the font "Alte DIN 1451 
> Mittelschrift" is VERY similar to my numbers; the casino (for the balance 
> display at least) uses this font or a very similar one.
> so while I know of a font worth training with (I also already 
> downloaded its ttf file), I haven't the slightest idea how to train with the 
> font.
>
> Can someone please help me, explain to me why the ocr result can be that 
> bad after training with the actual image to ocr?
> (was a pain to perfectly fit the rectangles to the digits!)
> or how to train tess4j with the given font?
> google even tells me about such a one click service but sadly it is 
> apparently gone by now.
>
> can someone help me please? :-)
>

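A result that prints as an empty line but reports a length of 6 may consist only of whitespace or control characters; that is one possibility, not a confirmed diagnosis. The sketch below makes such characters visible and, once digits are recognized, turns a balance string into a number. The class and method names are hypothetical, and the number format (comma as decimal separator, dot as grouping separator) is an assumption based on the "4,6" example in the quoted post.

```java
import java.math.BigDecimal;

// Hypothetical helper for inspecting and parsing an OCR result string.
final class OcrResultInspector {

    /** Lists every code point of s as U+XXXX so invisible characters show up. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /**
     * Parses a balance, assuming a comma decimal separator.
     * Everything except digits and commas (currency sign, spaces, newlines,
     * dot grouping) is dropped first. Throws NumberFormatException if no
     * valid number remains.
     */
    static BigDecimal parseBalance(String ocrText) {
        String cleaned = ocrText.replaceAll("[^0-9,]", "").replace(',', '.');
        return new BigDecimal(cleaned);
    }
}
```

Calling `OcrResultInspector.describe(result)` on the string returned by the OCR call and printing the outcome shows exactly which six characters were produced, which helps decide whether the custom traineddata yields no glyphs at all or only unexpected ones.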