Hi Syed,

There is no way to recognize it as is. It's a connected script, and currently
Tesseract can't work with such scripts. In theory, you would have to do special
preprocessing yourself to determine every character's bounds and then
pass the characters to Tess one by one.
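
A rough sketch of what such preprocessing could look like, in Python. It
is only one possible heuristic (a vertical projection profile), not
something Tesseract provides: it assumes the word is already binarized
into rows of 0/1 pixels, and the names column_ink, split_columns and
max_ink are made up for this illustration. Columns with very little ink
are treated as the thin joining strokes between letters.

```python
from typing import List, Tuple


def column_ink(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in every column of a binary image."""
    return [sum(column) for column in zip(*image)]


def split_columns(image: List[List[int]], max_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (left, right) column ranges, right exclusive, for candidate characters.

    A cut is placed in the middle of every run of columns whose ink count
    is <= max_ink (hypothetical threshold; tune it per font and image size).
    """
    profile = column_ink(image)
    cuts = []
    run_start = None
    # The extra value at the end is a sentinel that closes a trailing run.
    for x, ink in enumerate(profile + [max_ink + 1]):
        if ink <= max_ink:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            cuts.append((run_start + x - 1) // 2)
            run_start = None

    bounds = [0] + cuts + [len(profile)]
    segments = []
    for left, right in zip(bounds, bounds[1:]):
        # Drop empty margins that contain no ink at all.
        if sum(profile[left:right]) > 0:
            segments.append((left, right))
    return segments


if __name__ == "__main__":
    # Two blobs joined by a one-pixel-thick stroke, with blank margins.
    word = [
        [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
        [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ]
    print(split_columns(word))
```

Each returned range would then be cropped out and passed to Tess as a
separate image. A real script font will need something smarter than a
fixed threshold, since joining strokes vary in thickness and letters
often overlap horizontally.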

However, in practice I think you can achieve somewhat satisfactory
accuracy for *short* words with this script/font. To do this you have
to prepare a number of training sheets with characters of this font
properly spaced out. See
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Training_Procedure
for details. Once trained this way, Tess *might* handle the connected
text by treating the characters as merged and finding proper chop points.
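
If you end up writing the box files for those training sheets by hand or
from your own layout data, a small helper like the following may save
some time. It is a sketch under the assumption that your Tesseract 3
version uses the "symbol left bottom right top page" box line format
with the origin at the bottom-left corner of the image; box_line and its
parameters are names invented for this example, so check the result
against a box file generated by your own installation.

```python
def box_line(symbol: str, left: int, top: int, width: int, height: int,
             image_height: int, page: int = 0) -> str:
    """Format one box-file line from a top-left-origin bounding box.

    Image tools usually report boxes with y growing downwards, while the
    box file is assumed here to measure y upwards from the bottom edge.
    """
    box_bottom = image_height - (top + height)
    box_top = image_height - top
    right = left + width
    return f"{symbol} {left} {box_bottom} {right} {box_top} {page}"


if __name__ == "__main__":
    # A glyph "M" at x=10, y=20 (from the top), 30x40 pixels, on a 100-pixel-high sheet.
    print(box_line("M", 10, 20, 30, 40, image_height=100))
```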

I admit both methods require a good deal of effort and R&D, though.

HTH

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Wed, Jul 27, 2011 at 6:27 PM, syed arifullah badsha s
<[email protected]> wrote:
> Hi,
>
> Please find attached the file that I am trying to read.
> I need help with this.
>
> Regards,
> Syed A B.
>
>
> On Wed, Jul 27, 2011 at 3:21 PM, zdenko podobny <[email protected]> wrote:
>>
>> If you are really interested in help, then provide an example image ;-)
>> Zdenko
>>
>> On Wed, Jul 27, 2011 at 11:45 AM, <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> When I run the command tesseract fsmt.tif output
>>> it gives me junk data, "ȉY`I'I/2,", for an image containing "Mentally"
>>> as the text in this font.
>>>
>>> Any ideas? Please help.
>>>
>>>
>>> On Jul 27, 2011 11:02am, sreekanth reddy <[email protected]> wrote:
>>> > Hi, I am also working on training French Script MT. If I get any
>>> > positive results, I will share them with you.
>>> >
>>> >
>>> > --sreekanth
>>> >
>>> > On Wed, Jul 27, 2011 at 10:35 AM, syed arifullah badsha s
>>> > [email protected]> wrote:
>>> >
>>> >
>>> > The box files are not getting created properly. I have been trying to
>>> > train it, so far in vain, but I will try again. If you have any box files
>>> > or trained data, kindly share them with me.
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Jul 26, 2011 at 6:51 PM, Sven Pedersen [email protected]>
>>> > wrote:
>>> >
>>> >
>>> >
>>> > Hi Syed, how are you trying to OCR the image? What kind of failure
>>> > message are you getting? Is it a problem with the font, or with the image
>>> > format?
>>> >
>>> >
>>> >
>>> > --Sven
>>> >
>>> >
>>> >
>>> > On Tue, Jul 26, 2011 at 2:20 AM, [email protected]
>>> > [email protected]> wrote:
>>> >
>>> >
>>> >
>>> >
>>> > Hi All,
>>> >
>>> >
>>> >
>>> > Kindly help me in recognizing the French Script MT font in a
>>> > TIF image.
>>> >
>>> > Has anyone tried it?
>>> >
>>> > I have a sample TIF file, but I have no way to attach it
>>> > here.
>>> >
>>> > Any info will help.
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > You received this message because you are subscribed to the Google
>>> >
>>> > Groups "tesseract-ocr" group.
>>> >
>>> > To post to this group, send email to [email protected]
>>> >
>>> > To unsubscribe from this group, send email to
>>> >
>>> > [email protected]
>>> >
>>> > For more options, visit this group at
>>> >
>>> > http://groups.google.com/group/tesseract-ocr?hl=en
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > ``All that is gold does not glitter,
>>> >   not all those who wander are lost;
>>> > the old that is strong does not wither,
>>> >   deep roots are not reached by the frost.
>>> >
>>> >
>>> >
>>> >
>>> > From the ashes a fire shall be woken,
>>> >   a light from the shadows shall spring;
>>> > renewed shall be blade that was broken,
>>> >   the crownless again shall be king.”
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>
>
>

