Re: [tesseract-ocr] Help extracting text from images.

newbie Wed, 07 Jan 2015 15:02:29 -0800

Sorry for the barrage here.
The interesting thing is that you mentioned feature matching with OpenCV (I don't 
know anything at all about it). One point in my favor is that I can keep a 
repository of these images and would need to match a user-generated image 
against one of them.


A little background might help. I have (or can put together) a repository of 
all the equipment images. A tech might head to the field, take a 
picture on his mobile device, and I need to match the tech's picture 
against my repository and come up with the model number.

Is this easier with OCR or with feature matching in OpenCV?
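
To make the OCR route concrete, here is a minimal sketch of matching noisy OCR output against a list of known model numbers. Everything in it is an assumption for illustration: the look-alike character table, the function names, the 0.7 cutoff, and the repository entries other than VIP2500 are placeholders, not anything tesseract or tess4j provides.

```python
import difflib

# Hypothetical table of letters that OCR tends to return in place of digits.
# This is an illustrative assumption, not something tesseract provides.
CONFUSABLE = str.maketrans({"O": "0", "Z": "2", "S": "5", "I": "1", "L": "1", "B": "8"})


def canonical(token):
    """Upper-case a token and fold look-alike letters into digits."""
    return token.upper().translate(CONFUSABLE)


def match_model(ocr_text, known_models, cutoff=0.7):
    """Return the known model number closest to any OCR token, or None."""
    # Fold the repository entries the same way, so VIP2500 and VIPZSOO agree.
    by_canonical = {canonical(model): model for model in known_models}
    best, best_score = None, 0.0
    for token in ocr_text.split():
        folded = canonical(token)
        for canon, model in by_canonical.items():
            score = difflib.SequenceMatcher(None, folded, canon).ratio()
            if score >= cutoff and score > best_score:
                best, best_score = model, score
    return best


# Placeholder repository; only VIP2500 comes from this thread.
KNOWN_MODELS = ["VIP2500", "VIP1200", "DCX3200"]

print(match_model("rowsn Q\nVIPZSOO 'e'", KNOWN_MODELS))
print(match_model("AT&T U verse", KNOWN_MODELS))
```

Folding both sides into the same canonical form means the repository can keep the real model numbers while still tolerating misreads such as Z for 2 or O for 0. The cutoff would need tuning against real photos.
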

Thanks

On Wednesday, January 7, 2015 5:35:47 PM UTC-5, newbie wrote:
>
> Thanks Allistair, my lucky day as you have responded to both my queries. 
> Let me try to address your questions below and then go ahead with a few of 
> my own :-)
>
> *I also meant to ask whether your use case allows for cropping. If you 
> know you will have a certain format of image, cropping an area and 
> resampling should be easy.*
> Basically the image will be a user-generated image, more like the first 
> PNG file, but we could ask the user to zoom in on the model number if that 
> would help us identify it. We could do anything with the 
> image (cropping, resampling, etc.). The problem is that the model number 
> probably will not be in the same place on every piece of equipment.
>
> 2. Preprocessing - since it has to be done programmatically, would I be using 
> OpenCV in conjunction with tesseract? I did not see much in tesseract for 
> image processing (I could be totally off).
> 3. *I also use psm 6 for these types of image with various text 
> locations.*
>    What is this?
>
> Another thing I can probably come up with is a list of all the model numbers, 
> or images of all the potential equipment, so I have a repository to match 
> against. Would that help in any way?
>
> Thanks again for taking the time to respond. Appreciate it.
>
>
>
> On Wednesday, January 7, 2015 4:44:47 PM UTC-5, Allistair C wrote:
>>
>> I also meant to ask whether your use case allows for cropping. If you 
>> know you will have a certain format of image, cropping an area and 
>> resampling should be easy. You could also do some preprocessing that looks 
>> for certain icons in your image to get some context as to where the model 
>> number is likely to be (see feature matching in OpenCV). However, I would 
>> need to know more about your use case.
>>
>> That said, resampling your full image to 3000px wide yielded a result 
>> with a full model number but the more you can crop the area the better the 
>> result:
>>
>> AT&T U verse ‘ §
>> LINK HD nzc ,
>> rowzn Q I ‘ .» . ‘ nsuu 4 0|: > I
>> / sj J \
>> VIP2500 °%' 7 A R R I s
>>
>>
>> On 7 January 2015 at 21:39, Allistair <[email protected]> wrote:
>>
>>> A common technique is to pre-process your input image. 
>>>
>>> Resizing produced good results. I also use psm 6 for these types of image 
>>> with various text locations.
>>>
>>> In this case I first used your cropped image:
>>>
>>> tesseract ArrisVIP2500_cropped.png out -l eng -psm 6 config
>>>
>>> and got:
>>>
>>> AT&T U verse
>>> rowsn
>>> O F3.
>>> vrrzsoo ’e'
>>>
>>> Then I resampled your image to 2000px wide:
>>>
>>> tesseract ArrisVIP2500_cropped_2000.png out2000 -l eng -psm 6 config 
>>>
>>> and got:
>>>
>>> AT&T U verse
>>> POWER © " ‘|
>>> / ‘j""'j"’..
>>> VIP2500 '%’
>>>
>>> Cheers
>>>
>>>
>>>
>>> On 7 January 2015 at 19:26, newbie <[email protected]> wrote:
>>>
>>>> I am using tess4j, a Java wrapper around tesseract, and here are the 
>>>> images and results. The intent is to extract VIP2500 (the model number) 
>>>> from the image. Any help is appreciated.
>>>>
>>>> Attached are the original PNG file (ArrisVIP2500.png), the binarized 
>>>> file (ArrisVIP2500_bin.TIF), and a zoomed and cropped 
>>>> file (ArrisVIP2500_cropped.png).
>>>>
>>>> *ArrisVIP2500.png*
>>>>
>>>>  é ATE-T U-verse
>>>>
>>>> rowan 0
>>>> / 
>>>>
>>>> *ArrisVIP2500_bin.TIF*
>>>>
>>>> AT&T U-verse
>>>>
>>>> rowan <3 3
>>>> / --
>>>>
>>>> vxvzsoo ‘Q’ 
>>>>
>>>> *ArrisVIP2500_cropped.png*
>>>>
>>>> ATE-T U-verse
>>>>
>>>> rowsn Q 
>>>>
>>>> VIPZSOO ‘e’
>>>>
>>>> This looks the closest to VIP2500; I need to get tess4j to recognize 
>>>> digits. That said, this might not be a realistic scenario, as someone 
>>>> or something needs to zoom and crop the image beforehand 
>>>> (preprocessing).
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/009ffbc7-90cc-417a-90c8-b4ac9b5bb203%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/009ffbc7-90cc-417a-90c8-b4ac9b5bb203%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>

