You could use OpenCV to define a template of regions of interest (ROIs)
and then run Tesseract on each region separately.
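A minimal sketch of that idea in Python. It assumes the card has already been cropped and deskewed, so every field sits at roughly the same relative position. The ROI fractions in FIELD_ROIS and the helper names are hypothetical placeholders that you would measure on your own samples. The OCR step uses the opencv-python and pytesseract packages, which need to be installed along with Tesseract itself.

```python
# Hypothetical ROI template: (x0, y0, x1, y1) as fractions of the image
# width/height. These numbers are placeholders, NOT measured PAN card values.
FIELD_ROIS = {
    "name":        (0.05, 0.25, 0.70, 0.35),
    "father_name": (0.05, 0.38, 0.70, 0.48),
    "dob":         (0.05, 0.50, 0.40, 0.60),
    "pan":         (0.05, 0.65, 0.50, 0.75),
}

def roi_to_pixels(roi, width, height):
    """Convert a fractional (x0, y0, x1, y1) box to integer pixel coordinates."""
    x0, y0, x1, y1 = roi
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))

def ocr_fields(image_path, width=1000, height=630):
    """Resize the card to a fixed size, crop each ROI and OCR it."""
    # Imported here so the template helpers above work without OpenCV installed.
    import cv2           # opencv-python
    import pytesseract   # needs the Tesseract binary on the PATH

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    img = cv2.resize(img, (width, height))       # dsize is (width, height)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    results = {}
    for field, roi in FIELD_ROIS.items():
        x0, y0, x1, y1 = roi_to_pixels(roi, width, height)
        crop = gray[y0:y1, x0:x1]                # NumPy slicing: rows (y), then columns (x)
        # --psm 7 tells Tesseract to treat the crop as a single line of text
        results[field] = pytesseract.image_to_string(crop, config="--psm 7").strip()
    return results
```

Calling ocr_fields("pan_card.jpg") (hypothetical file name) would return one string per field. Keeping the template in fractions means only the width and height passed to ocr_fields need to change if you pick a different working resolution.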

On Tuesday, January 25, 2022 at 3:52:57 AM UTC-7 [email protected] wrote:

> I'm also facing the same issue. Can someone shed some light on this,
> please?
>
>
>
> On Saturday, January 11, 2020 at 12:51:09 PM UTC+5:30 
> [email protected] wrote:
>
>> Yes, that's what I am doing right now, but there's an issue.
>>
>> While I am getting all the names, I am not able to reliably pick out the
>> father's name even though it is in the output, and some unwanted garbage
>> text sometimes gets in the way.
>>
>> I am attaching a sample image for your reference, along with the text
>> generated from it (as a list).
>>
>>
>> ['*MONIKA MAHADEV SHINDE*', '*ARa Gar*', 'GOVT. OF INDIA', '*MAHADEV 
>> SHINDE*', '*31/10/1992*', 'Permanent Account Number', '*EJAPS0276M *~', 
>> 'MONIKA 1 SHIN OE :', '- 8', 'Signature']
>>
>>
>> This is the list I get, and the father's name comes out as "ARa Gar"
>> instead of MAHADEV SHINDE.
>> Since the unwanted text does not appear every time, I need a way to
>> figure out which entry is the actual name.
>>
>> Can you please look into this?
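One way to separate the real names from garbage like "ARa Gar" in that list is to keep only entries that look like printed names: uppercase letters and spaces, with no digits, after stripping punctuation. This is a heuristic sketch, not a tested rule. KNOWN_LABELS is a hypothetical starting set, and the sketch assumes, as in the sample list, that the OCR output runs top to bottom with the holder's name before the father's name.

```python
import re

# Hypothetical set of printed labels to ignore (compared in uppercase).
KNOWN_LABELS = {"INCOME TAX DEPARTMENT", "GOVT OF INDIA",
                "PERMANENT ACCOUNT NUMBER", "SIGNATURE"}

# One or more runs of uppercase letters separated by single spaces.
NAME_RE = re.compile(r"[A-Z]+(?: [A-Z]+)*")

# The OCR output quoted above ('*' marks may be email bold formatting).
SAMPLE = ['*MONIKA MAHADEV SHINDE*', '*ARa Gar*', 'GOVT. OF INDIA',
          '*MAHADEV SHINDE*', '*31/10/1992*', 'Permanent Account Number',
          '*EJAPS0276M *~', 'MONIKA 1 SHIN OE :', '- 8', 'Signature']

def clean(line):
    """Replace anything but letters, digits and '/' with spaces, then collapse spaces."""
    return " ".join(re.sub(r"[^A-Za-z0-9/]", " ", line).split())

def name_candidates(lines):
    """Keep lines that look like printed names, preserving their order."""
    out = []
    for raw in lines:
        text = clean(raw)
        if text.upper() in KNOWN_LABELS:
            continue  # a known printed label, not a name
        if len(text) >= 3 and NAME_RE.fullmatch(text):
            out.append(text)
    return out

candidates = name_candidates(SAMPLE)
holder = candidates[0] if candidates else None
father = candidates[1] if len(candidates) > 1 else None
```

With the list above this keeps "MONIKA MAHADEV SHINDE" and "MAHADEV SHINDE" and drops "ARa Gar" (lowercase letters) and "MONIKA 1 SHIN OE" (contains a digit). It will also reject a genuine name that Tesseract misreads with a digit or a lowercase letter, so it is worth checking against more samples.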
>>
>>
>> On Wednesday, 8 January 2020 01:02:56 UTC+5:30, Saurabh Pal wrote:
>>>
>>> Try using template matching for your use case (I am assuming that the
>>> PAN card format is the same all over India). At least the DOB and PAN
>>> number can be found easily using regex. For the names, you can reject all
>>> the other text boxes, like 'INCOME TAX DEPARTMENT' and 'GOVT. OF INDIA';
>>> then you will be left with only the father's name and the holder's name.
>>> Just compare the y coordinates of those two text boxes.
>>>
>>>
>>> On Monday, January 6, 2020 at 7:37:13 PM UTC+5:30, Shubhranshu Panda 
>>> wrote:
>>>>
>>>> I don't know how to extract specific text from an image with a standard
>>>> layout. I want to extract the name, DOB and PAN number from a PAN card.
>>>> I have attached a sample image for reference.
>>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/0176ded0-d9d3-4b23-b102-60a39e1ccc92n%40googlegroups.com.
