Thanks Allistair, it's my lucky day as you have responded to both of my queries. 
Let me try to address your questions below and then go ahead with a few of 
my own :-)

*I also meant to ask whether your use case allows for cropping. If you know 
you will have a certain format of image, cropping an area and resampling 
should be easy.*
Basically the image will be a user-generated image, more like the first 
PNG file, but we could ask the user to zoom in on the model number if that 
would help us identify it. We could do anything with the image (cropping, 
resampling, etc.). The problem is that the model number probably will not 
be located in the same place on all equipment.

2. Preprocessing - as it should be done programmatically, would I be using 
OpenCV in conjunction with Tesseract? I did not see much in Tesseract for 
image processing (I could be totally off).
3. *I also use psm 6 for these types of image with various text locations.*
   What is this?
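
On item 2, this is roughly the flow I picture, as a sketch only: work out an aspect-preserving target size (you used 2000px and 3000px widths), resize with whatever image library ends up doing that step, then call tesseract with the flags from your commands. The helper names `scaled_size` and `tesseract_command` are my own placeholders, the `-psm` spelling is copied from your command line, and I left out the trailing `config` argument since I am not sure what it refers to.

```python
def scaled_size(width, height, target_width):
    """Return (w, h) for an aspect-preserving resample to target_width."""
    scale = target_width / width
    return target_width, round(height * scale)


def tesseract_command(image_path, out_base, psm=6, lang="eng"):
    """Build the argument list for the CLI call shown in this thread.

    Nothing is executed here; the list could be handed to subprocess.run.
    """
    return ["tesseract", image_path, out_base, "-l", lang, "-psm", str(psm)]


# Hypothetical input size of 800x600, resampled to 2000px wide.
print(scaled_size(800, 600, 2000))
print(tesseract_command("ArrisVIP2500_cropped_2000.png", "out2000"))
```

The resize itself is deliberately missing, since that is the part I am asking about.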

Another thing I can probably come up with is a list of all the model numbers, 
or images of all potential equipment, so I would have a repository to match 
against. Would that help in any way?
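
To make the repository idea concrete, here is a rough Python sketch of what I have in mind. It assumes the misreads in my results (VIPZSOO for VIP2500) come from the usual look-alike confusions (2/Z, 5/S, 0/O, 1/I, 8/B), folds both the OCR tokens and the known model numbers to the same letters, and then picks the closest known model using difflib from the standard library. The names `canon` and `best_model`, the 0.6 cutoff, and the extra model numbers DCX3400 and HR44 are placeholders I made up for illustration.

```python
from difflib import SequenceMatcher

# Assumed look-alike pairs: digits are folded to the letters OCR tends to
# confuse them with, so "VIP2500" and "VIPZSOO" end up identical.
LOOKALIKES = str.maketrans({"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B"})


def canon(text):
    """Upper-case the text and fold look-alike digits to letters."""
    return text.upper().translate(LOOKALIKES)


def best_model(ocr_text, known_models, cutoff=0.6):
    """Return the known model closest to any OCR token, or None below cutoff."""
    best, best_score = None, 0.0
    for token in ocr_text.split():
        folded = canon(token)
        for model in known_models:
            score = SequenceMatcher(None, folded, canon(model)).ratio()
            if score > best_score:
                best, best_score = model, score
    return best if best_score >= cutoff else None


models = ["VIP2500", "DCX3400", "HR44"]  # hypothetical repository
print(best_model("VIPZSOO ‘e’", models))
print(best_model("AT&T U verse", models))
```

With this folding, the cropped-image output should map back to VIP2500, while a line with no model number in it should fall below the cutoff. The obvious weakness is that two real model numbers differing only in a look-alike character would become indistinguishable.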

Thanks again for taking the time to respond. Appreciate it.



On Wednesday, January 7, 2015 4:44:47 PM UTC-5, Allistair C wrote:
>
> I also meant to ask whether your use case allows for cropping. If you know 
> you will have a certain format of image, cropping an area and resampling 
> should be easy. You could also do some preprocessing that looks for certain 
> icons in your image to get some context as to where the model number is 
> likely to be (see feature matching on Open CV). However, I would need to 
> know more about your use case.
>
> That said, resampling your full image to 3000px wide yielded a result with 
> a full model number but the more you can crop the area the better the 
> result:
>
> AT&T U verse ‘ §
> LINK HD nzc ,
> rowzn Q I ‘ .» . ‘ nsuu 4 0|: > I
> / sj J \
> VIP2500 °%' 7 A R R I s
>
>
> On 7 January 2015 at 21:39, Allistair <[email protected]> wrote:
>
>> A common technique is to pre-process your input image. 
>>
>> Resizing produced good results. I also use psm 6 for these types of image 
>> with various text locations.
>>
>> In this case I first used your cropped image:
>>
>> tesseract ArrisVIP2500_cropped.png out -l eng -psm 6 config
>>
>> and got:
>>
>> AT&T U verse
>> rowsn
>> O F3.
>> vrrzsoo ’e'
>>
>> Then I resampled your image to 2000px wide:
>>
>> tesseract ArrisVIP2500_cropped_2000.png out2000 -l eng -psm 6 config 
>>
>> and got:
>>
>> AT&T U verse
>> POWER © " ‘|
>> / ‘j""'j"’..
>> VIP2500 '%’
>>
>> Cheers
>>
>>
>>
>> On 7 January 2015 at 19:26, newbie <[email protected]> wrote:
>>
>>> I am using tess4j, a Java wrapper around Tesseract, and here are the 
>>> images and results. The intent is to extract VIP2500 (the model number) 
>>> from the image. Any help is appreciated.
>>>
>>> Attached are the original PNG file (ArrisVIP2500.png), the binarized 
>>> file (ArrisVIP2500_bin.TIF), and a zoomed and cropped 
>>> file (ArrisVIP2500_cropped.png).
>>>
>>> *ArrisVIP2500.png*
>>>
>>>  é ATE-T U-verse
>>>
>>> rowan 0
>>> / 
>>>
>>> *ArrisVIP2500_bin.TIF*
>>>
>>> AT&T U-verse
>>>
>>> rowan <3 3
>>> / --
>>>
>>> vxvzsoo ‘Q’ 
>>>
>>> *ArrisVIP2500_cropped.png*
>>>
>>> ATE-T U-verse
>>>
>>> rowsn Q 
>>>
>>> VIPZSOO ‘e’
>>>
>>> This looks the closest to VIP2500. I need to get tess4j to recognize 
>>> digits. That said, this might not be a realistic scenario, as 
>>> someone/something needs to zoom and crop the image beforehand 
>>> (preprocessing).
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/009ffbc7-90cc-417a-90c8-b4ac9b5bb203%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>
