Yes, I think you have covered the tweaks I thought of suggesting.
Sven
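
For readers of the archive, here is a minimal sketch of the two image pre-enhancement tweaks mentioned in the quoted message below: contrast enhancement for low-contrast fonts, and a skew (shear) transformation to straighten italic characters. It is an illustration only, not part of Tesseract or its training tools. The function names `stretch_contrast` and `shear_rows`, the list-of-rows grayscale image representation, and the `slant` parameter are all assumptions made for the example; a real workflow would more likely use an imaging library.

```python
# Hypothetical helpers for illustration; images are lists of rows of
# grayscale values in the range 0 (black) to 255 (white).

def stretch_contrast(img):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in img]
    scale = 255.0 / (hi - lo)
    return [[int(round((p - lo) * scale)) for p in row] for row in img]


def shear_rows(img, slant, fill=255):
    """Shift each row left in proportion to its height above the bottom row.

    slant is the horizontal shift in pixels per row of height (an assumed
    parameter; it would have to be estimated or tuned per font). A positive
    value moves upper rows left, which straightens right-leaning italics.
    Pixels shifted outside the image are dropped; gaps are filled with `fill`.
    """
    height = len(img)
    width = len(img[0])
    out = []
    for y, row in enumerate(img):
        shift = int(round(slant * (height - 1 - y)))
        new_row = [fill] * width
        for x, p in enumerate(row):
            nx = x - shift
            if 0 <= nx < width:
                new_row[nx] = p
        out.append(new_row)
    return out
```

Applied to a right-leaning diagonal stroke, `shear_rows(img, 1.0)` lines the dark pixels up in a single column. Whether either step actually improves recognition depends on the font and the scan, so it is worth comparing OCR output with and without it.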

On Friday, September 30, 2011, Calomer <[email protected]> wrote:
> Sven,
>
> Now I'm curious. What kind of tweaks are you talking about?
>
> Extending the old language training data with new fonts?
> Pre-enhancement of the image (a skew transformation on italic
> characters, contrast enhancement on low-contrast fonts, etc.)?
>
> I'd love to know of any other tweaks there are.
>
> Thanks
>
> On Sep 29, 10:39 pm, Sven Pedersen <[email protected]> wrote:
>> Thanks Calomer.
>>
>> Bonny, is the language you're trying to improve using a different set
>> of characters (alphabet)? If so, you'll need to do a lot of training
>> as Calomer described. Otherwise you'll just need some tweaks. The font
>> may be an issue.
>> --Sven
>>
>> On Thu, Sep 29, 2011 at 12:39 PM, Calomer <[email protected]> wrote:
>> > I'll try my best to answer, though I'm hardly qualified.
>>
>> > According to the training instructions (at
>> > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
>> > and general OCR knowledge, you cannot train on new characters alone.
>> > You need training images, and you need to create boxes (with any box
>> > editor; I have only used Qt Box Editor). Once you have created boxes
>> > around the characters in your new TIFF image and labeled them
>> > accordingly, you should be ready for training.
>>
>> > Keep in mind that the text needs an x-height of at least 12 pixels
>> > (preferably around 20 pixels). Variety in the images would also
>> > help performance.
>>
>> > Follow the training instructions, train your own language file, and
>> > try OCR again. If it fails again, I'm sure someone with wider
>> > knowledge than mine will be able to answer your further questions.
>>
>> > On Sep 29, 2:44 pm, Bonny <[email protected]> wrote:
>> >> Does nobody know, or is the question too silly?
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> --
>> ``All that is gold does not glitter,
>>   not all those who wander are lost;
>> the old that is strong does not wither,
>>   deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>>   a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>>   the crownless again shall be king.”
>


