Sorry for bumping the post like this, but I really need some help ASAP! Any guesses? At least something about the parameters?
Thanks a lot!
- Romeo

On Friday, February 15, 2013 at 10:07:40 UTC-8, Romeo Jihara wrote:
>
> Hi all,
>
> I am trying to detect text that is overlaid on top of images. A common
> example is memes like the ones here: http://www.quickmeme.com/memes/
> The goal is to produce a high quality bounding box prediction and, if
> possible, generate OCR. Please note that I'm much more interested in the
> former!
> I am trying to use Tesseract for that.
>
> What makes the problem challenging is that the background can be anything.
> In addition the text can have a stroke and a fill of arbitrary color.
> My questions are:
> 1) Tesseract has tons of different parameters. What is a set of important
> parameters to tune for this case and what are good values for them?
> 2) How do I preprocess the image? I was a bit surprised to find out that
> converting the image to grayscale before passing it to Tesseract results in
> different (and generally better) accuracy. Why? Also, inverting the image
> works better for some text. What is the set of important transformations
> to play with?
> 3) I noticed that Tesseract is often able to detect sequences of words but
> not combine them together. What parameter affects the probability of
> combining adjacent words together?
> 4) Is it worth doing morphological transformations, such as trying to get
> rid of the text stroke, or does Tesseract handle text strokes?
> 5) When I call getRegions, does it also perform OCR to give me better
> confidence predictions of the text boxes?
> 6) Does Tesseract use the OCR output in determining the confidence of a
> region being true text? Looking at the results I get, it seems like it is
> possible to improve the text confidence by building an n-gram model. Also,
> some characters (like punctuation marks) are highly indicative of false
> positive text regions. Is there such built-in functionality or should I
> build one?
> Similarly, the size and relative locations of text can also be used to
> refine the confidence. It appears from my tests that small and
> disjoint text areas (and ones that are not horizontally aligned with
> others) are often false positives. Again, is there such a built-in
> heuristic or should I build one?
>
> I am attaching a couple of examples that show the text localization
> results with different preprocessing applied to the image. The number
> inside each box is the confidence for that region; blue boxes mean
> confidence > 75 and red boxes mean confidence <= 75. I'm also sending the
> parameters used in all these detections.
>
> Thanks for your time and for building such an awesome free OCR engine!
>
> --

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
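
For reference, the size/alignment heuristic described in the quoted message could be sketched roughly as below. This is only an illustration and not Tesseract functionality: the `Box` type, the `filter_boxes` and `vertically_overlaps` helpers, the `min_area` value, and the 50% vertical-overlap rule are all assumptions made up for this example. Only the 75 confidence cut-off comes from the message itself. Requiring the aligned neighbour to be a high-confidence box is also just one possible design choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical text-region box: top-left corner, size, confidence 0..100."""
    x: int
    y: int
    w: int
    h: int
    conf: float


def vertically_overlaps(a, b, min_ratio=0.5):
    """True if a and b share at least min_ratio of the shorter box's height,
    i.e. they sit roughly on the same horizontal text line."""
    overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return overlap >= min_ratio * min(a.h, b.h)


def filter_boxes(boxes, conf_threshold=75.0, min_area=200):
    """Keep confident boxes; keep a low-confidence box only if it is large
    enough and horizontally aligned with some confident box."""
    confident = [b for b in boxes if b.conf > conf_threshold]
    kept = []
    for box in boxes:
        if box.conf > conf_threshold:
            kept.append(box)
            continue
        big_enough = box.w * box.h >= min_area
        aligned = any(
            other is not box and vertically_overlaps(box, other)
            for other in confident
        )
        if big_enough and aligned:
            kept.append(box)
    return kept
```

With boxes converted from the detector output into `Box` values, `filter_boxes(boxes)` returns the subset that survives the heuristic, in the original order. The thresholds would need tuning on real images.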

