Hi Nick,

Thanks for the reply.
I do convert the subtitle images to black text on a white background
before passing them to tesseract.
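
A minimal sketch of this kind of preprocessing, assuming the subtitle image
is available as rows of 0-255 grayscale values. The function name
to_black_on_white, the fixed threshold of 128, and the rule that the majority
of pixels is background are illustrative assumptions, not part of tesseract:

```python
def to_black_on_white(gray, threshold=128):
    """Binarize a grayscale image (rows of 0-255 ints) to black text on white.

    Assumption: text covers fewer pixels than the background, so the
    majority class after thresholding is treated as background.
    """
    # Threshold each pixel: True means "bright".
    bright = [[px >= threshold for px in row] for row in gray]
    total = sum(len(row) for row in bright)
    bright_count = sum(sum(row) for row in bright)
    # If at least half of the pixels are bright, the background is already light.
    background_is_bright = bright_count * 2 >= total
    # Background pixels become 255 (white), text pixels become 0 (black).
    return [
        [255 if b == background_is_bright else 0 for b in row]
        for row in bright
    ]


# Light text on a dark background, as is typical for DVD subtitles.
subtitle = [
    [0, 0, 0, 0],
    [0, 250, 240, 0],
    [0, 0, 0, 0],
]
converted = to_black_on_white(subtitle)
```

A fixed threshold is the simplest choice; subtitles with outlines or
anti-aliased edges may need a different threshold per image.
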
I tried training on the subtitle text and am getting fairly good results.
The fonts used in DVD subtitles differ from one DVD to the next, so I
trained with multiple fonts.
Currently, the major issue is with spacing, especially in italics.
I can see several config settings in textord.h and have been modifying
some of them on a trial-and-error basis.
Some fonts have more kerning and some have less.
Also, in italics, the spacing between character boxes becomes narrower.
Can you tell me which settings influence spacing,
especially in the case of DVD fonts?

Thanks,
Kiran

On Monday, 9 July 2012 10:12:01 UTC+5:30, Kiran Babu G wrote:
>
> Hi,
>
> I am trying to OCR DVD subtitle images using tesseract.
> It gives very good results with the built-in trained data.
> However, I would like to improve the output quality.
> Typical errors are missing spaces, I vs l, / vs italic l, etc.
> What is the best way to achieve this?
> I tried training with subtitle images and using the resulting trained
> data, but it didn't help. In fact, the output was worse.
>
> Regards
>
>
>
