I'm using tesseract-ocr through Emgu CV, with the following options:

"tessedit_char_whitelist", "0123456789."
"load_system_dawg", "false"
"load_freq_dawg", "false"
When I use Tesseract alone (without the cube engine), it sometimes gives me bad results. Here is how I currently prepare the image before recognition. I take the original image, then:

1. add a border to it
2. enlarge it 3x using Lanczos4 (in my experience 3x works best, probably because each pixel becomes a full 3x3 block of pixels)
3. binarize it at the midpoint between the darkest and lightest values, and invert it if the foreground color is white

It reads 1080 instead of 480:
Original: <https://lh3.googleusercontent.com/-neVcLLrrVg8/V3P2Bm4A37I/AAAAAAAADgU/c1Uzttmc_TgwkTg6e8Cghjq8LPHEhG3rwCLcB/s1600/480%2Bas%2B1080%2Boriginal.png> <https://lh3.googleusercontent.com/-2G1FF2XvAzs/V3P2ISQsECI/AAAAAAAADgc/DbcN1hKQogAr-9-d9Y1jlKirFp4tv3EJwCLcB/s1600/480%2Bas%2B1080%2Boriginal.png> -> <https://lh3.googleusercontent.com/-AphnpLfJE2w/V3P2NzWXuRI/AAAAAAAADgk/_dHy7BgTbiYCpQo68gXUJqGilzEOrRlJgCLcB/s1600/480%2Bas%2B1080.png> -> <https://lh3.googleusercontent.com/-LUzVWjG__Rg/V3P2R2H6ruI/AAAAAAAADgs/wvVxUG1goAQjIRR2rjEVItgbygLuntOHwCLcB/s1600/480%2Bas%2B1080%2Botline.png>

Or it reads 1000 instead of 400:
Original: <https://lh3.googleusercontent.com/--kiuKxkd9ws/V3P2oIK9RNI/AAAAAAAADg8/Nqpb3dfyBwIGEy4N1hM8g7QIWg2teD-IACLcB/s1600/480%2Bas%2B1000%2Boriginal.png> <https://lh3.googleusercontent.com/-arfGj_XGqJc/V3P2_NIwf9I/AAAAAAAADhQ/IN8f0XNOOcsbRAjPcYffsP20nSQHrGgtwCLcB/s1600/480%2Bas%2B1000%2Boriginal.png> -> <https://lh3.googleusercontent.com/-8B-TgUlhFsA/V3P2t3JzG5I/AAAAAAAADhE/6hlsUjyDY1A40WeB6wUAstWUMxmtybVewCLcB/s1600/480%2Bas%2B1000.png> -> <https://lh3.googleusercontent.com/-Rfq9WDGY4AA/V3P3ENeTqjI/AAAAAAAADhY/HJ7aiEQWYRAuP-YtJV7sj3_TBYUET518ACLcB/s1600/480%2Bas%2B1000%2Boutline.jpg>

Are there any Tesseract options that might improve this? Or have I done something wrong with the original image?

I also tested the cube engine. I'm not looking for suggestions related to it, but here are a few words about my experience. If I use only the cube engine, it gives me garbage most of the time.
If I use both engines combined, I get better results, but sometimes very unexpected ones (I know that parameters like tessedit_char_whitelist have no effect in cube-engine mode). For example, it reads "M)" instead of 40:
Original: <https://lh3.googleusercontent.com/-_hncD3yvYQY/V3P3OQw0aBI/AAAAAAAADhg/3zNwJR4G-GkhPJBeT_bSfxTSK2Crm_0OwCLcB/s1600/40%2Bas%2BM%2529%2Boriginal.png> <https://lh3.googleusercontent.com/-JpTPhHU9t7E/V3P3SfwXr3I/AAAAAAAADho/upZyqHvUYBMrdPpXnTWpzk9r744dkRWqACLcB/s1600/40%2Bas%2BM%2529%2Boriginal.png> -> <https://lh3.googleusercontent.com/-IVX3S3YwEPs/V3P3duFb_RI/AAAAAAAADh4/hf-rxUq1nzIXtUoLNXBsrJSUTBcQaAaIwCLcB/s1600/40%2Bas%2BM%2529.png> -> <https://lh3.googleusercontent.com/-1o9Nlz11zaU/V3P3jn3ctVI/AAAAAAAADiA/SxUSErsf0aM_nstWgUrOll-Ajsi_0fnSQCLcB/s1600/40%2Bas%2BM%2529.%2Boutline.png>

I also tried the following options, but got the same results as above:

"chop_enable", "true"
"enable_new_segsearch", "0"
"language_model_ngram_on", "0"

Any suggestions are welcome.
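
For reference, steps 1 and 3 of the preprocessing described above (border, midpoint binarization, conditional inversion) could be sketched roughly as follows. This is a minimal illustration in plain Python on a grayscale image stored as a list of rows, not the actual Emgu CV code. The function names are hypothetical, the 3x Lanczos4 enlargement is left out, and two details are assumptions not stated in the post: the border is filled with the top-left pixel value (taken to be background), and "foreground is white" is guessed from the image being mostly black after thresholding.

```python
def add_border(img, width, value):
    """Surround a grayscale image (list of rows) with a constant-valued border."""
    cols = len(img[0]) + 2 * width
    top = [[value] * cols for _ in range(width)]
    bottom = [[value] * cols for _ in range(width)]
    body = [[value] * width + list(row) + [value] * width for row in img]
    return top + body + bottom


def binarize_midpoint(img):
    """Threshold halfway between the darkest and lightest pixel values."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    threshold = (lo + hi) / 2
    return [[255 if p > threshold else 0 for p in row] for row in img]


def ensure_dark_on_light(binary):
    """Invert when most pixels are black (assumed: light text on dark background)."""
    total = sum(len(row) for row in binary)
    white = sum(1 for row in binary for p in row if p == 255)
    if white * 2 < total:
        return [[255 - p for p in row] for row in binary]
    return binary


def preprocess(img, border=2):
    """Border, then binarize, then normalize polarity (upscaling omitted)."""
    # Assumption: the top-left pixel belongs to the background.
    bordered = add_border(img, border, img[0][0])
    return ensure_dark_on_light(binarize_midpoint(bordered))
```

The output always has black glyphs on a white background, whichever polarity the input had. In a real pipeline the enlargement would sit between the border and the threshold, as in the numbered steps.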

