I'm using tesseract-ocr from emgucv, with the following options:

"tessedit_char_whitelist", "0123456789."
"load_system_dawg", "false"
"load_freq_dawg", "false"

When I'm using Tesseract only (without the cube engine), it sometimes gives me 
bad results.

An example of how I'm currently preparing the image:

I take the original image, then:
1. add a border to it
2. enlarge it 3x using Lanczos4 (in my experience 3x works best, probably 
because each pixel becomes a full 3x3 block of pixels)
3. binarize it at the midpoint between the darkest and lightest values, and 
invert it if the foreground color is white.
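
For reference, steps 1 and 3 can be sketched in plain Python roughly as 
follows. This is only an illustration, not my actual Emgu CV code: the 
function names are made up, the image is assumed to be a grayscale list of 
rows with values 0-255, and the "foreground is white" test is assumed to mean 
"light pixels are the minority". The Lanczos4 resize (step 2) is left out 
because it needs an imaging library.

```python
def add_border(img, width, value):
    """Pad a grayscale image (list of rows of ints) with a constant border."""
    cols = len(img[0]) + 2 * width
    pad_row = [value] * cols
    out = [pad_row[:] for _ in range(width)]
    for row in img:
        out.append([value] * width + list(row) + [value] * width)
    out.extend(pad_row[:] for _ in range(width))
    return out


def binarize_midpoint(img):
    """Threshold halfway between the darkest and lightest pixel.

    Always returns dark text (0) on a white background (255).
    """
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    threshold = (lo + hi) / 2

    # Assumption: text covers fewer pixels than the background, so if the
    # light pixels are the minority, the foreground is white and we invert.
    light = sum(1 for p in flat if p > threshold)
    invert = light < len(flat) - light

    return [[255 if (p > threshold) != invert else 0 for p in row]
            for row in img]


if __name__ == "__main__":
    # Light digits on a dark background: the output is inverted.
    sample = [[10, 10, 10, 10],
              [10, 200, 200, 10],
              [10, 10, 10, 10]]
    bordered = add_border(sample, 1, 10)
    for row in binarize_midpoint(bordered):
        print(row)
```

The border value passed to add_border is assumed to be the background level, 
so the padding does not shift the darkest/lightest range used for the 
threshold.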

It reads 1080 instead of 480:

Original: 
<https://lh3.googleusercontent.com/-neVcLLrrVg8/V3P2Bm4A37I/AAAAAAAADgU/c1Uzttmc_TgwkTg6e8Cghjq8LPHEhG3rwCLcB/s1600/480%2Bas%2B1080%2Boriginal.png>

<https://lh3.googleusercontent.com/-2G1FF2XvAzs/V3P2ISQsECI/AAAAAAAADgc/DbcN1hKQogAr-9-d9Y1jlKirFp4tv3EJwCLcB/s1600/480%2Bas%2B1080%2Boriginal.png>
 ->  
<https://lh3.googleusercontent.com/-AphnpLfJE2w/V3P2NzWXuRI/AAAAAAAADgk/_dHy7BgTbiYCpQo68gXUJqGilzEOrRlJgCLcB/s1600/480%2Bas%2B1080.png>
 ->  
<https://lh3.googleusercontent.com/-LUzVWjG__Rg/V3P2R2H6ruI/AAAAAAAADgs/wvVxUG1goAQjIRR2rjEVItgbygLuntOHwCLcB/s1600/480%2Bas%2B1080%2Botline.png>


Or it reads 1000 instead of 400:


Original:  
<https://lh3.googleusercontent.com/--kiuKxkd9ws/V3P2oIK9RNI/AAAAAAAADg8/Nqpb3dfyBwIGEy4N1hM8g7QIWg2teD-IACLcB/s1600/480%2Bas%2B1000%2Boriginal.png>

<https://lh3.googleusercontent.com/-arfGj_XGqJc/V3P2_NIwf9I/AAAAAAAADhQ/IN8f0XNOOcsbRAjPcYffsP20nSQHrGgtwCLcB/s1600/480%2Bas%2B1000%2Boriginal.png>
->  
<https://lh3.googleusercontent.com/-8B-TgUlhFsA/V3P2t3JzG5I/AAAAAAAADhE/6hlsUjyDY1A40WeB6wUAstWUMxmtybVewCLcB/s1600/480%2Bas%2B1000.png>
 ->  
<https://lh3.googleusercontent.com/-Rfq9WDGY4AA/V3P3ENeTqjI/AAAAAAAADhY/HJ7aiEQWYRAuP-YtJV7sj3_TBYUET518ACLcB/s1600/480%2Bas%2B1000%2Boutline.jpg>


Are there any Tesseract options that might improve this? Or have I done 
something wrong with the original image?

I also tested the cube engine. I'm not looking for suggestions related to 
that, but here are a few words about my experience.
If I use only the cube engine, it gives me garbage most of the time.
If I use both combined, it gives me better results, but sometimes very 
unexpected ones (I know that parameters like tessedit_char_whitelist are not 
in effect in cube-engine mode).

For example, it reads "M)" instead of 40:

Original:  
<https://lh3.googleusercontent.com/-_hncD3yvYQY/V3P3OQw0aBI/AAAAAAAADhg/3zNwJR4G-GkhPJBeT_bSfxTSK2Crm_0OwCLcB/s1600/40%2Bas%2BM%2529%2Boriginal.png>

<https://lh3.googleusercontent.com/-JpTPhHU9t7E/V3P3SfwXr3I/AAAAAAAADho/upZyqHvUYBMrdPpXnTWpzk9r744dkRWqACLcB/s1600/40%2Bas%2BM%2529%2Boriginal.png>
 ->  
<https://lh3.googleusercontent.com/-IVX3S3YwEPs/V3P3duFb_RI/AAAAAAAADh4/hf-rxUq1nzIXtUoLNXBsrJSUTBcQaAaIwCLcB/s1600/40%2Bas%2BM%2529.png>
 ->  
<https://lh3.googleusercontent.com/-1o9Nlz11zaU/V3P3jn3ctVI/AAAAAAAADiA/SxUSErsf0aM_nstWgUrOll-Ajsi_0fnSQCLcB/s1600/40%2Bas%2BM%2529.%2Boutline.png>
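
Since the whitelist is not applied in cube mode, a result like "M)" can at 
least be detected after the fact. A minimal sketch of such a check, assuming 
the expected output is a plain number (digits with an optional decimal part); 
the function name and the pattern are my own illustration and not a Tesseract 
feature:

```python
import re

# Assumed shape of a valid reading: digits, optionally followed by
# a dot and more digits (stricter than the whitelist "0123456789.").
NUMERIC = re.compile(r"\d+(?:\.\d+)?")


def accept_numeric(text):
    """Return the stripped OCR text if it is a plain number, else None."""
    candidate = text.strip()
    return candidate if NUMERIC.fullmatch(candidate) else None


if __name__ == "__main__":
    for raw in ["40\n", "M)", "4.5", "4..5"]:
        print(repr(raw), "->", accept_numeric(raw))
```

This only flags results that cannot be numbers; it does not help with 
misreads such as 1080 for 480, which are still valid numbers.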




I also tried the following options, but I got the same results as above:

"chop_enable", "true"
"enable_new_segsearch", "0"
"language_model_ngram_on", "0"


Any suggestions are welcome.
