Hi Andre,
Tesseract needs a minimum resolution to be effective, and overly high
resolution will also throw the results off. Internally the JPG is
converted to a bitmap format. The images are currently only 72 dpi, so
you'll need to at least resize (scale) them. The low contrast (black on
gray) will also be a problem, so ImageMagick or a similar tool could help.
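
As a rough illustration of the two steps mentioned here (scaling up from
72 dpi and stretching the contrast so the gray background moves toward
white), here is a minimal sketch in plain Python. The function names and the
nested-list grayscale image are assumptions made for illustration only; this
is not how ImageMagick or Tesseract do it internally, and in practice a tool
such as ImageMagick would do the work.

```python
def scale_factor(current_dpi, target_dpi=300):
    """Return how much to enlarge an image so it behaves like target_dpi.

    target_dpi=300 is an assumed default taken from the 200-300 dpi advice.
    """
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return target_dpi / current_dpi


def stretch_contrast(pixels):
    """Linearly remap grayscale values so the darkest pixel becomes 0
    and the brightest becomes 255.

    `pixels` is a hypothetical image given as a list of rows of 0-255 ints.
    """
    flat = [value for row in pixels for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        # A completely flat image has no contrast to stretch.
        return [list(row) for row in pixels]
    span = high - low
    return [[(value - low) * 255 // span for value in row] for row in pixels]


# A 72 dpi image needs to be enlarged a little over four times.
print(round(scale_factor(72), 2))  # prints 4.17

# Dark-gray text (100) on a mid-gray background (140) becomes black on white.
print(stretch_contrast([[100, 140], [120, 140]]))
```

A linear stretch like this only helps when the text and background are
already distinct shades; uneven lighting from a web-cam may need more than
that.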
--Sven


On Thu, Aug 18, 2011 at 9:39 AM, Andres <[email protected]> wrote:
> Sven:
> What exactly do you mean by 200-300 dpi? Are the dpi attributes in the
> jpg files being evaluated, or are you referring to some scaling
> of the images?
>
> Andriy:
> If you continue having problems with this and if the camera is in a
> fixed position with respect to your display and the font is always the
> same, it should be very easy for you to avoid using tesseract and just
> recognizing the characters by evaluating some pixels after
> thresholding. (I would threshold just the evaluated pixels).
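
The pixel-sampling idea above could be sketched roughly as follows. This is
a minimal sketch that assumes a seven-segment style display and a 3x5 pixel
character cell; the probe positions, the threshold value, and all names
(THRESHOLD, PROBES, SIGNATURES, read_char) are hypothetical and would have to
be measured from the real camera images.

```python
THRESHOLD = 128  # assumed cutoff: values below this count as "dark"

# Hypothetical probe points (row, col) inside one character cell, in the
# order: top, top-left, top-right, middle, bottom-left, bottom-right, bottom.
PROBES = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 0), (3, 2), (4, 1)]

# Which probes are dark (1) for each digit on a seven-segment display.
SIGNATURES = {
    (1, 1, 1, 0, 1, 1, 1): "0",
    (0, 0, 1, 0, 0, 1, 0): "1",
    (1, 0, 1, 1, 1, 0, 1): "2",
    (1, 0, 1, 1, 0, 1, 1): "3",
    (0, 1, 1, 1, 0, 1, 0): "4",
    (1, 1, 0, 1, 0, 1, 1): "5",
    (1, 1, 0, 1, 1, 1, 1): "6",
    (1, 0, 1, 0, 0, 1, 0): "7",
    (1, 1, 1, 1, 1, 1, 1): "8",
    (1, 1, 1, 1, 0, 1, 1): "9",
}


def read_char(image, top, left):
    """Threshold only the probed pixels of one character cell and look up
    the resulting pattern; return "?" when the pattern is unknown.

    `image` is a list of rows of 0-255 grayscale ints; (top, left) is the
    cell's upper-left corner.
    """
    bits = tuple(int(image[top + r][left + c] < THRESHOLD) for r, c in PROBES)
    return SIGNATURES.get(bits, "?")


# Build a light 5x3 cell and darken the three segments that form a "7".
cell = [[200] * 3 for _ in range(5)]
for r, c in [(0, 1), (1, 2), (3, 2)]:
    cell[r][c] = 30
print(read_char(cell, 0, 0))  # prints 7
```

With a fixed camera, the cell origins for each character position would be
constants too, so reading a whole number is just a loop over those origins.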
>
> Regards,
>
> Andres
>
>
>
> 2011/8/18 Sven Pedersen <[email protected]>:
>> You should not need to retrain. You need to convert the images to
>> grayscale or B&W at 200-300 dpi and get the background (which seems to be
>> gray) closer to white. You can do that kind of cleanup
>> transformation with ImageMagick.
>> --Sven
>>
>>
>> On Wed, Aug 17, 2011 at 3:09 PM, Andriy Malovanyy <[email protected]> wrote:
>>> Hi,
>>>
>>> I am trying to write a simple program that takes pictures captured from a
>>> web-cam every 10 sec. by another program, recognises the text with OCR, and
>>> logs the data to a text file. Everything seems to work fine except
>>> that tesseract fails to recognize the pictures that are
>>> taken. If I "feed" tesseract pictures created with Photoshop, it works
>>> better, but sometimes it still cannot recognize very simple and obvious text
>>> (numbers).
>>>
>>> I attach 3 files taken by a web-cam and 1 created with Photoshop. None
>>> of them is recognized well. The first two web-cam pictures return garbage text;
>>> the third one (the best quality, I think) returns "Empty page message".
>>> The Photoshop picture returns "1234.018" instead of "1234.0.18".
>>>
>>> I use Tesseract-OCR 3.0 with the language files that came with the package
>>> (English only). Do I need to train Tesseract to recognise the pictures? If
>>> so, what is the best way to do it? Take several pictures with the web-cam and
>>> make a training file from them with the numbers 0 to 9 and points? I have
>>> started to read how to do that, and it seems very complicated.
>>>
>>> Any advice appreciated.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>
>>
>>
>> --
>> ``All that is gold does not glitter,
>>   not all those who wander are lost;
>> the old that is strong does not wither,
>>   deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>>   a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>>   the crownless again shall be king.”
>>
>


