What did the non-upscaled version look like? This one looks far too blurred, which is why it's struggling. It might be that you are upscaling too much: the scale factor should be a ratio of the cropped image's original size, chosen to bring it to 300dpi, rather than always 3000px.
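A minimal sketch of that ratio idea, assuming you know or can estimate the effective resolution of the crop (`source_dpi`). The function name `upscale_size` and its parameters are hypothetical, made up for illustration; the actual resize would be done with whatever imaging library you already use.

```python
def upscale_size(width_px, height_px, source_dpi, target_dpi=300):
    """Return the (width, height) that brings an image to target_dpi.

    source_dpi is an assumption: the effective resolution of the crop,
    which you have to know or estimate yourself.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    # Scale by a ratio of the original size instead of a fixed 3000px width.
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# A 400x120 crop at roughly 96 dpi needs a 3.125x upscale.
print(upscale_size(400, 120, 96))  # prints (1250, 375)
```

Sized this way, a small crop gets a proportionate upscale and an image that is already near 300dpi is left almost untouched, which may avoid the blur that comes from stretching every crop to 3000px.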
Cheers

On 8 January 2015 at 21:34, newbie <[email protected]> wrote:

> Thanks Allistair. I have it working. The problem is, if I use the same "mantra" of resampling for other images, it's not working. I have this cropped image (attached, which is also upscaled to 3000 pixels widthwise); it's coming out VIPZZSO. I probably need to sharpen this. I have set it to very sharp in the preprocessing program I am using, but in vain.
>
> Any directions for general preprocessing?
>
> On Thursday, January 8, 2015 at 11:39:55 AM UTC-5, Allistair C wrote:
>>
>> OK good.
>>
>> I got it working by resampling (upscaling) both the cropped version and the full image.
>>
>> If you are using the "white box" approach so that you have a crop area (best method) then you just need to upscale that.
>>
>> There are many ways to resize an image up; you can find them easily with Google. I used OpenCV for Android and the cvResize function, for example. There are libraries for doing this in Java, .NET, Python etc. Just look around.
>>
>> Cheers
>>
>> On 8 January 2015 at 16:24, newbie <[email protected]> wrote:
>>
>>> It worked, YAY! You have all my gratitude! OK, now I need to know how you did the resampling. I thought you said you took the cropped image and resampled it. But this seems like the original png file (Arris2500.png) resampled. Let me know how you went about resampling and how I can achieve it programmatically.
>>>
>>> Thanks
>>>
>>> On Thursday, January 8, 2015 11:06:33 AM UTC-5, Allistair C wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've not used tess4j but the JavaDocs show that it should be possible to set TessAPI.TessPageSegMode:
>>>>
>>>> http://tess4j.sourceforge.net/docs/docs-1.0/net/sourceforge/tess4j/TessAPI.html
>>>>
>>>> http://tess4j.sourceforge.net/docs/docs-1.2/net/sourceforge/tess4j/TessAPI1.TessPageSegMode.html
>>>>
>>>> The 3000 resampled image was:
>>>>
>>>> https://dl.dropboxusercontent.com/u/523401/ArrisVIP2500_3000.png
>>>>
>>>> Cheers
>>>>
>>>> On 8 January 2015 at 15:35, newbie <[email protected]> wrote:
>>>>
>>>>> Allistair,
>>>>> Thanks for taking the time to respond. Do you know how to use psm 6 in tess4j (it's probably an argument to the constructor; I need to look up the source code)? I have not seen any examples of it being used by googling. I tried to resample the cropped image to 3000 px (horizontally, using Paint) like you suggested and ran it through tess4j, and it still did not recognize my model number. It gave me an output of "VIPZSOO". So I guess piping it through psm 6 is the key. Also, can you send me the image that was produced after you resampled it to 3000px, so that I know my resampling is right?
>>>>>
>>>>> I also like your idea of providing the white box in the camera view to use as my input to cropping. Sure, I can do that.
>>>>> I think I am glad we discussed the feature matching; that seems more like object recognition than text recognition, so it is probably far-fetched. I had used camFlow (an app) to see if it would recognize my equipment images and it always came back with "Black media player". So they are probably using OpenCV's feature matching.
>>>>>
>>>>> Thanks again and appreciate your taking time to respond.
>>>>>
>>>>> On Wednesday, January 7, 2015 6:12:05 PM UTC-5, Allistair C wrote:
>>>>>>
>>>>>> It sort of depends on your hardware and how similar or different the items are. Reliable feature matching works on distinct features, so there need to be enough points of interest (usually edges) covering text, buttons, and other bits and pieces. If, for example, all your hardware looked the same as the example you originally posted and only the model number changed, this would most likely be an issue, as the feature matching may match several targets.
>>>>>>
>>>>>> Also you mention the tech takes a picture on mobile. Does that need to be looked up immediately? The issue is that feature matching is CPU heavy, can take time on mobile, and its cost is a function of the photo resolution. Luckily, feature matching appears to work better on lower resolution images and most of the time works in black and white. Then there is the potential number of hardware items you are trying to match. The most advanced mobile augmented reality products (Metaio, Vuforia) that use feature matching only allow up to 100 targets to be "tracked" or "looked for" at a time. Every piece of hardware you are looking for needs to be compared to the live input camera view (or photo), and this is the part that hits the CPU hard. If, however, there was an option to offload the image(s) to a backend cloud server for feature matching, or if the tech did not need an instant result (or any result) in the field, then you are in a better situation as you can stand up serious computing power.
>>>>>>
>>>>>> It's not easy to recommend one or the other without all the facts. As you begin to mention new things like mobile and techs in the field, this changes things :) For instance, I also used mobile: an Android tablet with OpenCV and Tesseract OCR. The combination worked in the field; the tech can position the camera face-on to the model number and take a close photo. You could even provide a mini app for your techs that has a basic cropping tool. The technique I used was to show the camera view in my app with a little white transparent box over it, which allowed the user to position the text to fit that white box. Then, when the photo was taken, I simply cropped to that white box's coordinate rectangle and I had a perfect match. This was easy vs. feature matching :)
>>>>>>
>>>>>> On Wednesday, 7 January 2015 23:02:09 UTC, newbie wrote:
>>>>>>>
>>>>>>> Sorry for the barrage here.
>>>>>>> The interesting thing is you mentioned feature matching with OpenCV (I don't know anything at all about it). But the one thing is I can have a repository of these images with me and I need to match one of them to the user-generated image.
>>>>>>>
>>>>>>> A little background might help. I can have (or come up with) a repository of all the equipment images. A tech might head to the field, take a picture on his mobile device, and I need to match it (the tech's picture) against my repository and come up with the model number.
>>>>>>>
>>>>>>> Is this easier with OCR or with feature matching in OpenCV?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wednesday, January 7, 2015 5:35:47 PM UTC-5, newbie wrote:
>>>>>>>>
>>>>>>>> Thanks Allistair, my lucky day as you have responded to both my queries.
>>>>>>>> Let me try to address your questions below and then go ahead with a few of my own :-)
>>>>>>>>
>>>>>>>> *I also meant to ask whether your use case allows for cropping. If you know you will have a certain format of image, cropping an area and resampling should be easy.*
>>>>>>>> Basically the image will be a user-generated image, more like the first png file, but we could ask the user to zoom in on the model number, if that would help us identify it. We could do anything with the image (cropping, resampling etc.). But the problem is the model number probably will not be located at the same place on all equipment.
>>>>>>>>
>>>>>>>> 2. Preprocessing: as it should be done programmatically, would I be using OpenCV in conjunction with Tesseract? I did not see much in Tesseract for image processing (I could be totally off).
>>>>>>>> 3. *I also use psm 6 for these types of image with various text locations.*
>>>>>>>> What is this?
>>>>>>>>
>>>>>>>> Another thing I can probably come up with is all the model numbers or images of all potential equipment, so I have a repository to match against. Would that help in any way?
>>>>>>>>
>>>>>>>> Thanks again for taking the time to respond. Appreciate it.
>>>>>>>>
>>>>>>>> On Wednesday, January 7, 2015 4:44:47 PM UTC-5, Allistair C wrote:
>>>>>>>>>
>>>>>>>>> I also meant to ask whether your use case allows for cropping. If you know you will have a certain format of image, cropping an area and resampling should be easy. You could also do some preprocessing that looks for certain icons in your image to get some context as to where the model number is likely to be (see feature matching in OpenCV). However, I would need to know more about your use case.
>>>>>>>>>
>>>>>>>>> That said, resampling your full image to 3000px wide yielded a result with a full model number, but the more you can crop the area the better the result:
>>>>>>>>>
>>>>>>>>> AT&T U verse ‘ §
>>>>>>>>> LINK HD nzc ,
>>>>>>>>> rowzn Q I ‘ .» . ‘ nsuu 4 0|: > I
>>>>>>>>> / sj J \
>>>>>>>>> VIP2500 °%' 7 A R R I s
>>>>>>>>>
>>>>>>>>> On 7 January 2015 at 21:39, Allistair <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> A common technique is to pre-process your input image.
>>>>>>>>>>
>>>>>>>>>> Resizing produced good results. I also use psm 6 for these types of image with various text locations.
>>>>>>>>>>
>>>>>>>>>> In this case I first used your cropped image:
>>>>>>>>>>
>>>>>>>>>> tesseract ArrisVIP2500_cropped.png out -l eng -psm 6 config
>>>>>>>>>>
>>>>>>>>>> and got:
>>>>>>>>>>
>>>>>>>>>> AT&T U verse
>>>>>>>>>> rowsn
>>>>>>>>>> O F3.
>>>>>>>>>> vrrzsoo ’e'
>>>>>>>>>>
>>>>>>>>>> Then I resampled your image to 2000px wide:
>>>>>>>>>>
>>>>>>>>>> tesseract ArrisVIP2500_cropped_2000.png out2000 -l eng -psm 6 config
>>>>>>>>>>
>>>>>>>>>> and got:
>>>>>>>>>>
>>>>>>>>>> AT&T U verse
>>>>>>>>>> POWER © " ‘|
>>>>>>>>>> / ‘j""'j"’..
>>>>>>>>>> VIP2500 '%’
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> On 7 January 2015 at 19:26, newbie <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am using tess4j, a Java wrapper around Tesseract, and here are the images and results. The intent is to extract VIP2500 (the model number) from the image. Any help is appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Attached are the original png file (ArrisVIP2500.png), the binarized file (ArrisVIP2500_bin.TIF) and then a zoomed and cropped file (ArrisVIP2500_cropped.png).
>>>>>>>>>>>
>>>>>>>>>>> *ArrisVIP2500.png*
>>>>>>>>>>>
>>>>>>>>>>> é ATE-T U-verse
>>>>>>>>>>>
>>>>>>>>>>> rowan 0
>>>>>>>>>>> /
>>>>>>>>>>>
>>>>>>>>>>> *ArrisVIP2500_bin.TIF*
>>>>>>>>>>>
>>>>>>>>>>> AT&T U-verse
>>>>>>>>>>>
>>>>>>>>>>> rowan <3 3
>>>>>>>>>>> / --
>>>>>>>>>>>
>>>>>>>>>>> vxvzsoo ‘Q’
>>>>>>>>>>>
>>>>>>>>>>> *ArrisVIP2500_cropped.png*
>>>>>>>>>>>
>>>>>>>>>>> ATE-T U-verse
>>>>>>>>>>>
>>>>>>>>>>> rowsn Q
>>>>>>>>>>>
>>>>>>>>>>> VIPZSOO ‘e’
>>>>>>>>>>>
>>>>>>>>>>> This looks the closest to VIP2500. I need to get tess4j to recognize digits. That said, this might not be a realistic scenario, as someone or something needs to zoom and crop the image beforehand (preprocessing).
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/009ffbc7-90cc-417a-90c8-b4ac9b5bb203%40googlegroups.com.
>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.

