OK good. I got it working by resampling (upscaling) both the cropped version and the full image.
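As a rough illustration of the upscaling step, here is a minimal sketch in plain Java using only the standard Java 2D classes. It is not the OpenCV route mentioned below. The class name `Upscaler`, the method name `upscaleToWidth` and the choice of bicubic interpolation are assumptions made for this example, not anything taken from tess4j or OpenCV.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper (not part of tess4j or OpenCV): upscales an image before OCR. */
final class Upscaler {

    /** Returns a copy of src resampled to targetWidth pixels wide, keeping the aspect ratio. */
    static BufferedImage upscaleToWidth(BufferedImage src, int targetWidth) {
        // Scale the height by the same factor as the width.
        int targetHeight = (int) Math.round(src.getHeight() * (double) targetWidth / src.getWidth());
        BufferedImage out = new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Bicubic interpolation is an assumption; other interpolation hints may suit some images better.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

To use it, something like `ImageIO.read(new File("ArrisVIP2500_cropped.png"))` followed by `Upscaler.upscaleToWidth(img, 3000)` and `ImageIO.write(...)` should produce a file that can be passed to Tesseract. If the crop rectangle is known, `img.getSubimage(x, y, w, h)` can be applied before upscaling.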
If you are using the "white box" approach, so that you already have a crop area (the best method), then you just need to upscale that crop. There are many ways to resize an image up - you can find them easily with Google. I used OpenCV for Android and the cvResize function, for example. There are libraries for doing this in Java, .NET, Python, etc. - just look around.

Cheers

On 8 January 2015 at 16:24, newbie <[email protected]> wrote:

> It worked YAY! You have all my gratitude! OK, now I need to know how you
> did the resampling. I thought you said you took the cropped image and
> resampled it, but this seems like the original PNG file (Arris2500.png)
> resampled. Let me know how you went about resampling and how I can achieve
> it programmatically.
>
> Thanks
>
> On Thursday, January 8, 2015 11:06:33 AM UTC-5, Allistair C wrote:
>>
>> Hi,
>>
>> I've not used tess4j but the JavaDocs show that it should be possible to
>> set TessAPI.TessPageSegMode:
>>
>> http://tess4j.sourceforge.net/docs/docs-1.0/net/sourceforge/tess4j/TessAPI.html
>>
>> http://tess4j.sourceforge.net/docs/docs-1.2/net/sourceforge/tess4j/TessAPI1.TessPageSegMode.html
>>
>> The 3000px resampled image was:
>>
>> https://dl.dropboxusercontent.com/u/523401/ArrisVIP2500_3000.png
>>
>> Cheers
>>
>> On 8 January 2015 at 15:35, newbie <[email protected]> wrote:
>>
>>> Allistair,
>>> Thanks for taking the time to respond. Do you know how to use psm 6 in
>>> tess4j (it's probably an argument to the instantiator, need to look up
>>> the source code)? I have not seen any examples of it being used by
>>> googling. I tried to resample the cropped image to 3000px (horizontally,
>>> using Paint) like you suggested and ran it through tess4j and it still
>>> did not recognize my model number. It gave me an output of "VIPZSOO". So
>>> I guess piping it through psm 6 is the key. Also, can you send me the
>>> image that was produced after you resampled it to 3000px, so that I know
>>> my resampling is right?
>>>
>>> I also like your idea of providing the white box in the camera view to
>>> use it as my input to cropping. Sure can do that.
>>> I am glad we discussed the feature matching - that seems more like
>>> object recognition than text recognition, so it is probably far-fetched.
>>> I had used camFlow (an app) to see if it would recognize my equipment
>>> images and it always came back with "Black media player". So they
>>> probably are using feature matching from OpenCV.
>>>
>>> Thanks again and appreciate your taking time to respond.
>>>
>>>
>>> On Wednesday, January 7, 2015 6:12:05 PM UTC-5, Allistair C wrote:
>>>>
>>>> It sort of depends on your hardware and how similar or different the
>>>> items are. Reliable feature matching works on distinct features (so
>>>> there need to be enough points of interest (edges usually) that cover
>>>> text, buttons, other bits and pieces). If, for example, all your
>>>> hardware was the same as the example you originally posted and only the
>>>> model number was changing then this would most likely be an issue, as
>>>> the feature matching may match several targets.
>>>>
>>>> Also you mention the tech takes a picture on mobile. Does that need to
>>>> be looked up immediately? The issue is that feature matching is CPU
>>>> heavy and can take time on mobile, and the cost is a function of the
>>>> photo resolution. Luckily, feature matching appears to work better on
>>>> lower resolution images and most of the time works in black and white.
>>>> Then there is the potential number of hardware items you are trying to
>>>> match. The most advanced mobile augmented reality products (Metaio,
>>>> Vuforia) that use feature matching only allow up to 100 targets to be
>>>> "tracked" or "looked for" at a time - every piece of hardware you are
>>>> looking for needs to be compared to the live input camera view (or
>>>> photo) and this is the part that hits the CPU hard.
>>>> If however there was an option to offload the image(s) to a backend
>>>> cloud server for feature matching, or if the tech did not need an
>>>> instant or any kind of result in the field, then you are in a better
>>>> situation as you can stand up serious computing power.
>>>>
>>>> It's not easy to recommend one or the other without all the facts - as
>>>> you begin to mention new things like mobile and techs in the field,
>>>> this changes things :) For instance I also used mobile - an Android
>>>> tablet, with OpenCV and Tesseract OCR - the combination worked in the
>>>> field - the tech can position the camera face-on to the model number
>>>> and take a close photo. You could even provide a mini app for your
>>>> techs that has a basic cropping tool. The technique I used was to show
>>>> the camera view in my app with a little white transparent box over the
>>>> camera view that allowed the user to position the text to fit that
>>>> white box. Then, when the photo was taken I simply cropped that white
>>>> box coordinate rectangle and I had a perfect match. This was easy vs.
>>>> feature matching :)
>>>>
>>>> On Wednesday, 7 January 2015 23:02:09 UTC, newbie wrote:
>>>>>
>>>>> Sorry for the barrage here.
>>>>> The interesting thing is you mentioned feature matching with OpenCV (I
>>>>> don't know anything at all about it). But the one thing is I can have
>>>>> a repository of these images with me and I need to match one of them
>>>>> to the user-generated image.
>>>>>
>>>>> A little background might help. I can have (or come up with) a
>>>>> repository of all the equipment images. A tech might head to the
>>>>> field, take a picture on his mobile device and I need to match it (the
>>>>> tech's picture) against my repository and come up with the model
>>>>> number.
>>>>>
>>>>> Is this easier with OCR or feature matching with OpenCV?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wednesday, January 7, 2015 5:35:47 PM UTC-5, newbie wrote:
>>>>>>
>>>>>> Thanks Allistair, my lucky day as you have responded to both my
>>>>>> queries. Let me try to address your questions below and then go ahead
>>>>>> with a few of my own :-)
>>>>>>
>>>>>> *I also meant to ask whether your use case allows for cropping. If
>>>>>> you know you will have a certain format of image, cropping an area
>>>>>> and resampling should be easy.*
>>>>>> Basically the image will be a user-generated image, more like the
>>>>>> first png file, but we could ask the user to zoom in to the model
>>>>>> number, if that would help us identify the model number. We could do
>>>>>> anything with the image (cropping, resampling etc). But the problem
>>>>>> is the model number probably will not be located at the same place
>>>>>> for all equipment.
>>>>>>
>>>>>> 2. Preprocessing - as it should be done programmatically, would I be
>>>>>> using OpenCV in conjunction with tesseract? I did not see much in
>>>>>> tesseract for image processing (I could be totally off).
>>>>>> 3. *I also use psm 6 for these types of image with various text
>>>>>> locations.*
>>>>>> What is this?
>>>>>>
>>>>>> Another thing I probably can come up with is all the model #s or
>>>>>> images of all potential equipment, so I have a repository to match
>>>>>> against. Would that help in any way?
>>>>>>
>>>>>> Thanks again for taking the time to respond. Appreciate it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wednesday, January 7, 2015 4:44:47 PM UTC-5, Allistair C wrote:
>>>>>>>
>>>>>>> I also meant to ask whether your use case allows for cropping. If
>>>>>>> you know you will have a certain format of image, cropping an area
>>>>>>> and resampling should be easy. You could also do some preprocessing
>>>>>>> that looks for certain icons in your image to get some context as to
>>>>>>> where the model number is likely to be (see feature matching on
>>>>>>> OpenCV).
>>>>>>> However, I would need to know more about your use case.
>>>>>>>
>>>>>>> That said, resampling your full image to 3000px wide yielded a
>>>>>>> result with a full model number but the more you can crop the area
>>>>>>> the better the result:
>>>>>>>
>>>>>>> AT&T U verse ‘ §
>>>>>>> LINK HD nzc ,
>>>>>>> rowzn Q I ‘ .» . ‘ nsuu 4 0|: > I
>>>>>>> / sj J \
>>>>>>> VIP2500 °%' 7 A R R I s
>>>>>>>
>>>>>>>
>>>>>>> On 7 January 2015 at 21:39, Allistair <[email protected]> wrote:
>>>>>>>
>>>>>>>> A common technique is to pre-process your input image.
>>>>>>>>
>>>>>>>> Resizing produced good results. I also use psm 6 for these types of
>>>>>>>> image with various text locations.
>>>>>>>>
>>>>>>>> In this case I first used your cropped image:
>>>>>>>>
>>>>>>>> tesseract ArrisVIP2500_cropped.png out -l eng -psm 6 config
>>>>>>>>
>>>>>>>> and got:
>>>>>>>>
>>>>>>>> AT&T U verse
>>>>>>>> rowsn
>>>>>>>> O F3.
>>>>>>>> vrrzsoo ’e'
>>>>>>>>
>>>>>>>> Then I resampled your image to 2000px wide:
>>>>>>>>
>>>>>>>> tesseract ArrisVIP2500_cropped_2000.png out2000 -l eng -psm 6 config
>>>>>>>>
>>>>>>>> and got:
>>>>>>>>
>>>>>>>> AT&T U verse
>>>>>>>> POWER © " ‘|
>>>>>>>> / ‘j""'j"’..
>>>>>>>> VIP2500 '%’
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7 January 2015 at 19:26, newbie <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I am using tess4j, a Java wrapper around tesseract, and here are
>>>>>>>>> the images and results. The intent is to extract VIP2500 (the
>>>>>>>>> model number) from the image. Any help is appreciated.
>>>>>>>>>
>>>>>>>>> Attached are the original png file (ArrisVIP2500.png), a binarized
>>>>>>>>> file (ArrisVIP2500_bin.TIF) and then a zoomed and cropped file
>>>>>>>>> (ArrisVIP2500_cropped.png).
>>>>>>>>>
>>>>>>>>> *ArrisVIP2500.png*
>>>>>>>>>
>>>>>>>>> é ATE-T U-verse
>>>>>>>>>
>>>>>>>>> rowan 0
>>>>>>>>> /
>>>>>>>>>
>>>>>>>>> *ArrisVIP2500_bin.TIF*
>>>>>>>>>
>>>>>>>>> AT&T U-verse
>>>>>>>>>
>>>>>>>>> rowan <3 3
>>>>>>>>> / --
>>>>>>>>>
>>>>>>>>> vxvzsoo ‘Q’
>>>>>>>>>
>>>>>>>>> *ArrisVIP2500_cropped.png*
>>>>>>>>>
>>>>>>>>> ATE-T U-verse
>>>>>>>>>
>>>>>>>>> rowsn Q
>>>>>>>>>
>>>>>>>>> VIPZSOO ‘e’
>>>>>>>>>
>>>>>>>>> This looks the closest to VIP2500. I need to get tess4j to
>>>>>>>>> recognize digits. That said, this might not be a realistic
>>>>>>>>> scenario, as someone/something needs to zoom and crop the image
>>>>>>>>> beforehand (preprocessing).
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/009ffbc7-90cc-417a-90c8-b4ac9b5bb203%40googlegroups.com.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.

