Unfortunately, I have never used hOCR, so I can't help you with that.

On Tuesday, February 4, 2014 5:57:10 PM UTC+1, Nick Porter wrote:
> Thanks, I'll give those a shot. Do you know of any way to use hOCR to solve
> this problem?
>
> On Tuesday, February 4, 2014 3:46:56 AM UTC-7, Aleksander Grzyb wrote:
>> Finding text in color images is generally very hard, but in your case the
>> business card image is binary, so I have two possible solutions in mind:
>>
>> 1. Use the OpenCV function findContours(), which detects the contours of
>>    the letters, then filter those contours (using the hierarchy). After
>>    that, put a bounding rect around each filtered contour. At that point
>>    you have the coordinates of the letters and shapes on your business
>>    card. Based on that information you can group the bounding rects into
>>    your blocks of text.
>> 2. Your binary image contains only black and white, so there are two
>>    cases: the letters and shapes are white on a black background, or
>>    black on a white background. You can use a histogram to detect the
>>    color of the letters. Next, compute a histogram over every row and
>>    every column of the image to see where the blocks of text are.
>>
>> On Monday, February 3, 2014 11:41:30 PM UTC+1, Nick Porter wrote:
>>> Thanks for the reply, Aleksander. These improved the accuracy of my
>>> scans; however, they do not provide a solution for detecting paragraphs
>>> and blocks of text. Any idea how to do this?
>>>
>>> On Monday, February 3, 2014 2:16:15 AM UTC-7, Aleksander Grzyb wrote:
>>>> To improve results you should try to:
>>>>
>>>> 1. Convert the image to a binary image.
>>>> 2. Crop the image to get rid of the surroundings.
>>>> 3. Detect the skew of the image and apply a perspective transform.
>>>>
>>>> I recommend using OpenCV for these operations. There is a pod for
>>>> OpenCV:
>>>>
>>>> https://github.com/Fl0p/OpenCV-iOS
>>>>
>>>> Here are some links that should help with the image processing part:
>>>>
>>>> http://stackoverflow.com/questions/8667818/opencv-c-obj-c-detecting-a-sheet-of-paper-square-detection
>>>>
>>>> http://stackoverflow.com/questions/6555629/algorithm-to-detect-corners-of-paper-sheet-in-photo
>>>>
>>>> http://stackoverflow.com/questions/8637867/skew-detection-and-reduction-in-opencv
>>>>
>>>> http://stackoverflow.com/questions/7838487/executing-cvwarpperspective-for-a-fake-deskewing-on-a-set-of-cvpoint
>>>>
>>>> On Friday, January 31, 2014 8:44:55 PM UTC+1, Nick Porter wrote:
>>>>> I am trying to scan a business card using Tesseract OCR. All I am
>>>>> doing is sending the image in with no pre-processing. Here's the code
>>>>> I am using:
>>>>>
>>>>> Tesseract* tesseract = [[Tesseract alloc] initWithLanguage:@"eng+ita"];
>>>>> tesseract.delegate = self;
>>>>> [tesseract setVariableValue:@"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@.-()" forKey:@"tessedit_char_whitelist"];
>>>>> [tesseract setImage:[UIImage imageNamed:@"card.jpg"]]; // image to check
>>>>> [tesseract recognize];
>>>>> NSLog(@"Here is the text %@", [tesseract recognizedText]);
>>>>>
>>>>> Picture of card <http://imgur.com/nQPG6iq>
>>>>>
>>>>> This is the output <http://imgur.com/poikzBn>
>>>>>
>>>>> As you can see, the accuracy is not 100%, which is not what I am
>>>>> concerned about; I figure I can fix that with some simple
>>>>> pre-processing. However, notice that it mixes the two text blocks at
>>>>> the bottom, which splits up the address, and possibly other
>>>>> information on other cards.
>>>>>
>>>>> How can I use Leptonica (or something else) to group the text?
>>>>> Possibly by sending regions of text in the image individually to
>>>>> Tesseract to scan? I've been stuck on this problem for a while; any
>>>>> possible solutions are welcome!
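
For reference, the second suggestion in the thread above (detect the letter color, then use row and column histograms to locate blocks of text) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the image is assumed to be already binarized and given as a list of rows of 0/1 values, the text is assumed to be the minority color, and the names find_runs, find_text_blocks, row_gap and col_gap are hypothetical. Plain Python is used to keep the sketch self-contained; in an iOS project the same logic would operate on an OpenCV matrix.

```python
def find_runs(profile, max_gap=0):
    """Return (start, end) pairs, end exclusive, of stretches where
    profile > 0. Stretches separated by at most max_gap empty entries
    are merged into one."""
    runs = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                # The run ended 'gap' entries ago.
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def find_text_blocks(image, row_gap=1, col_gap=3):
    """Return bounding boxes (left, top, right, bottom), right/bottom
    exclusive, of ink regions in a binary image (list of rows of 0/1).

    Assumption: text covers less than half of the image, so the less
    frequent pixel value is treated as ink."""
    height = len(image)
    width = len(image[0])

    # Two-bin histogram: if 1s are the majority, they are background.
    ones = sum(sum(row) for row in image)
    if ones * 2 > height * width:
        image = [[1 - p for p in row] for row in image]

    # Row histogram: horizontal bands that contain any ink.
    row_profile = [sum(row) for row in image]
    blocks = []
    for top, bottom in find_runs(row_profile, row_gap):
        band = image[top:bottom]
        # Column histogram inside the band: split side-by-side blocks.
        col_profile = [sum(row[x] for row in band) for x in range(width)]
        for left, right in find_runs(col_profile, col_gap):
            blocks.append((left, top, right, bottom))
    return blocks
```

Each returned box could then be cropped out and passed to Tesseract on its own, which is the approach asked about in the original question. The gap tolerances have to be tuned to the font size: row_gap should be larger than the line spacing inside a paragraph, and col_gap larger than the space between words, otherwise lines or words are reported as separate blocks. Because rows are split before columns, side-by-side blocks in the same band share one top and bottom edge.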

