Hi, 
I would like to hear others' opinions on your questions too. 
In my case, when I use Tesseract on Japanese train tickets, I have to 
do a lot of preprocessing (removing background colors, noise and line 
removal, increasing contrast, etc.) to get satisfactory results. 
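
For what it's worth, my pipeline looks roughly like the sketch below 
(OpenCV + pytesseract; the file name and all parameter values are just 
illustrative placeholders, not tuned settings):

    import cv2
    import pytesseract

    # Rough preprocessing sketch: grayscale -> contrast boost ->
    # binarization -> horizontal line removal.
    img = cv2.imread("ticket.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Increase contrast with CLAHE before thresholding.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    gray = clahe.apply(gray)

    # Otsu binarization strips most of the background color.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Find long horizontal rules by morphological opening on the
    # inverted image, then erase them from the binary image.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    lines = cv2.morphologyEx(255 - binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.bitwise_or(binary, lines)

    print(pytesseract.image_to_string(binary, lang="jpn"))
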
I am sure what you are doing (locating text boxes, extracting them, and 
feeding them one by one to Tesseract) can improve accuracy. However, as 
the number of text boxes grows, it will undoubtedly hurt your throughput. 
Could you share the PSM mode you use to get those text boxes' locations? I 
usually use AUTO_OSD (--psm 1) to get the boxes and expand them a bit at 
the edges before passing them to Tesseract. 
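
In pytesseract that step looks roughly like this (a minimal sketch; the 
4-pixel padding and the file name are arbitrary placeholders):

    import cv2
    import pytesseract
    from pytesseract import Output

    img = cv2.imread("ticket.png")
    h, w = img.shape[:2]

    # PSM 1 = automatic page segmentation with OSD (AUTO_OSD).
    data = pytesseract.image_to_data(img, config="--psm 1",
                                     output_type=Output.DICT)

    pad = 4  # expand each box a few pixels at the edges
    for i, conf in enumerate(data["conf"]):
        if float(conf) < 0:  # skip non-word rows of the TSV output
            continue
        x1 = max(data["left"][i] - pad, 0)
        y1 = max(data["top"][i] - pad, 0)
        x2 = min(data["left"][i] + data["width"][i] + pad, w)
        y2 = min(data["top"][i] + data["height"][i] + pad, h)
        crop = img[y1:y2, x1:x2]
        # Recognize the padded crop as a single text line.
        text = pytesseract.image_to_string(crop, config="--psm 7")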

Regards
Hai
 
On Saturday, September 2, 2023 at 7:03:49 AM UTC+9 apism...@gmail.com wrote:

> I'm looking into OCR for ID cards and driver's licenses, and I found out 
> that tesseract performs relatively poorly on ID cards compared to other 
> OCR solutions. For this original image: 
> https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png 
> the results are: 
>
> tesseract: "4d DL 999 as = Ne allo) 2NICK © , q 12 RESTR oe } lick: 5 DD 
> 8888888888 1234 SZ"
> easyocr:  '''9 , ARKANSAS DRIVER'S LICENSE CLAss D 4d DLN 999999999 3 DOB 
> 03/05/1960 ] 2 SCKPLE 123 NORTH STREET CITY AR 12345 ISS 4b EXP 03/05/2018 
> 03/05/2026 15 SEX 16 HGT 18 EYES 5'-10" BRO 9a END NONE 12 RESTR NONE Ylck 
> Sorble DD 8888888888 1234 THE'''
> google cloud vision: """SARKANSAS\nSAMPLE\nSTATE O\n9 CLASS D\n4d DLN 
> 9999999993 DOB 03/05/1960\nNick Sample\nDRIVER'S LICENSE\n1 SAMPLE\n2 
> NICK\n8 123 NORTH STREET\nCITY, AR 12345\n4a ISS\n03/05/2018\n15 SEX 16 
> HGT\nM\n5'-10\"\nGREAT SE\n9a END NONE\n12 RESTR NONE\n5 DD 8888888888 
> 1234\n4b EXP\n03/05/2026 MS60\n18 EYES\nBRO\nRKANSAS\n0"""
>
> and word accuracy is:
>
>              tesseract  |  easyocr  |  google
> words         10.34%    |  68.97%   |  82.76%
>
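> (Word accuracy here is, roughly, the fraction of ground-truth words found 
> verbatim in the OCR output; a hypothetical sketch of a metric of that 
> kind:)
>
>     def word_accuracy(ocr_text: str, truth_words: list[str]) -> float:
>         # Fraction of ground-truth words present in the OCR output.
>         ocr_words = set(ocr_text.split())
>         found = sum(1 for w in truth_words if w in ocr_words)
>         return found / len(truth_words)
>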
> This is "out of the box" performance, without any preprocessing. I'm not 
> surprised that google vision is that good compared to the others, but 
> easyocr, which is another open-source solution, performs much better than 
> tesseract in this case. I have a whole project dedicated to this, and all 
> the other results are also much better for easyocr: 
> https://github.com/apismensky/ocr_id/blob/main/result.json; all input 
> files are in 
> https://github.com/apismensky/ocr_id/tree/main/images/sources
> After digging into it a little, I suspect that bounding box detection is 
> much better in google (
> https://github.com/apismensky/ocr_id/blob/main/images/boxes_google/AR.png) 
> and easyocr (
> https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png) 
> than in tesseract (
> https://github.com/apismensky/ocr_id/blob/main/images/boxes_tesseract/AR.png).
>
> I'm pretty sure about this, because when I manually cut out the text boxes 
> and feed them to tesseract, it works much better. 
>
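> Rather than cutting by hand, the same idea can be scripted by borrowing 
> easyocr's detector for the boxes and using tesseract only for recognition 
> (a rough sketch; the file path and PSM choice are illustrative):
>
>     import cv2
>     import easyocr
>     import pytesseract
>
>     img = cv2.imread("images/sources/AR.png")
>     reader = easyocr.Reader(["en"])
>
>     # detect() returns horizontal boxes as [x_min, x_max, y_min, y_max].
>     horizontal, _free = reader.detect(img)
>     for x_min, x_max, y_min, y_max in horizontal[0]:
>         crop = img[int(y_min):int(y_max), int(x_min):int(x_max)]
>         # PSM 7: treat each crop as a single line of text.
>         print(pytesseract.image_to_string(crop, config="--psm 7").strip())
>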
>
> Now, questions: 
>
> - What part of the tesseract codebase is responsible for text detection, 
> and which algorithm does it use? 
> - What impacts bounding box detection in tesseract so that it fails on 
> these types of images (complex layouts, background noise, etc.)? 
> - Is it possible to use the same text detection procedure as easyocr, or 
> to improve the existing one? 
> - Maybe it's possible to switch the text detection algorithm based on the 
> image type, or make it pluggable so the user can configure one of several 
> options A, B, C...
>
>
> Thanks. 
>
