That looks like a plain sans-serif font such as Helvetica, so I think you just
need to resize the image so the text is taller in pixels before running OCR.
ImageMagick is a common choice for that (see PerlMagick for the Perl bindings).
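
As a rough illustration of the idea, here is a minimal Python sketch that
upscales a scan with ImageMagick, runs tesseract on the result, and pulls out
any 8-digit numbers. It is not a tested recipe for these cards. The function
names, the 300% scale factor, the grayscale step, and the file names are all
assumptions for illustration, and it assumes the `convert` and `tesseract`
command-line tools are on the PATH (newer ImageMagick releases may call the
tool `magick` instead).

```python
import re
import subprocess


def upscale_command(src, dst, percent=300):
    """Build an ImageMagick command that converts src to grayscale,
    enlarges it by the given percentage, and writes dst.
    The 300% default is a guess; tune it for your scans."""
    return ["convert", src, "-colorspace", "Gray",
            "-resize", f"{percent}%", dst]


def find_card_numbers(ocr_text):
    """Return every standalone run of exactly 8 digits in the OCR text.
    Longer digit runs are skipped rather than truncated."""
    return re.findall(r"(?<!\d)\d{8}(?!\d)", ocr_text)


def ocr_card(src, work="upscaled.png", out_base="ocr_out"):
    """Upscale src, OCR it with tesseract, and return the 8-digit numbers.
    Requires ImageMagick and tesseract to be installed."""
    subprocess.run(upscale_command(src, work), check=True)
    # tesseract writes its text to <out_base>.txt
    subprocess.run(["tesseract", work, out_base], check=True)
    with open(out_base + ".txt", encoding="utf-8") as handle:
        return find_card_numbers(handle.read())
```

Calling `ocr_card("card.jpg")` (a hypothetical file name) would return a list
of candidate numbers. Comparing the hit rate at a few scale factors on a small
sample should show quickly whether resizing alone is enough.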
Sven

On Thursday, May 16, 2013, Mike Masinick wrote:

> So, I have several hundred thousand scans of sports cards that look
> similar to the attached.  I want to scan the text at the top of the page
> and extract at least the 8 digit number.  Ideally more of the text as well,
> but the 8 digit number is the most important.  Before I spend a ton of time
> researching the best way to train tesseract for this font, is there a
> suggested way to preprocess an image like this to get the best results?
> It seems to only grab the 8 digit number correctly about 1/10th of the
> time.  It gets the numbers wrong a lot.
>
> I'm using tesseract on Amazon EC2 with the Image::OCR::Tesseract perl
> module.  Any suggestions much appreciated.  Might also be willing to pay
> for somebody to create training data for me if anybody is well versed in
> this and can save me the time of having to figure it out....
>
> Thanks!
>


-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

--- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

