That looks like a plain sans-serif font, something like Helvetica, so I think you just need to resize the image so the text is taller in pixels before running OCR. ImageMagick is a common choice for that (see PerlMagick for the Perl interface).

Sven
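P.S. A rough sketch of the resize step, in case it helps. It only builds the ImageMagick command line and picks the 8-digit number out of whatever text the OCR returns. The function names, the file names, and the 30-pixel target text height are assumptions made for illustration, not values taken from the scans, and `convert` is the ImageMagick 6 command name (ImageMagick 7 calls it `magick`).

```python
import math
import re


def resize_percent(text_height_px, target_height_px=30):
    """Return the resize percentage that brings the text up to the target height.

    target_height_px=30 is an assumed starting point, not a measured optimum.
    """
    if text_height_px <= 0:
        raise ValueError("text_height_px must be positive")
    # Only ever upscale: text already taller than the target is left at 100%.
    return max(100, math.ceil(100 * target_height_px / text_height_px))


def resize_command(src, dst, percent):
    """Build the argument list for ImageMagick: convert SRC -resize N% DST."""
    return ["convert", src, "-resize", f"{percent}%", dst]


# A run of exactly eight digits that is not part of a longer digit string.
EIGHT_DIGITS = re.compile(r"(?<!\d)\d{8}(?!\d)")


def find_card_number(ocr_text):
    """Return the first standalone 8-digit number in the OCR text, or None."""
    match = EIGHT_DIGITS.search(ocr_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical example: the text on the scan is about 10 pixels tall.
    percent = resize_percent(10)
    print(" ".join(resize_command("card.png", "card_big.png", percent)))
```

The list from `resize_command` can be passed to `subprocess.run(..., check=True)` on a machine with ImageMagick installed. The enlarged image then goes to tesseract as before, and `find_card_number` is applied to the recognized text.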
On Thursday, May 16, 2013, Mike Masinick wrote:

> So, I have several hundred thousand scans of sports cards that look
> similar to the attached. I want to scan the text at the top of the page
> and extract at least the 8 digit number. Ideally more of the text as well,
> but the 8 digit number is the most important. Before I spend a ton of time
> researching the best way to train tesseract for this font, is there a
> suggested way to preprocess an image like this to get the best results?
> It seems to only grab the 8 digit number correctly about 1/10th of the
> time. It gets the numbers wrong a lot.
>
> I'm using tesseract on Amazon EC2 with the Image::OCR::Tesseract Perl
> module. Any suggestions much appreciated. Might also be willing to pay
> for somebody to create training data for me if anybody is well versed in
> this and can save me the time of having to figure it out....
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

