Hi Syed,

There's no way to recognize it as is. It's a connected script, and Tesseract currently can't handle such scripts. You would have to do special preprocessing yourself to determine each character's bounds and then pass the characters to Tess one by one. That's the theory, at least.
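A minimal sketch of that kind of preprocessing, assuming the page has already been binarized into a list of 0/1 rows (1 = ink). It uses a vertical projection profile, a common heuristic swapped in here for illustration rather than a method described in this thread: columns with little or no ink are treated as cut points. The names `column_ink`, `find_cuts`, `crop` and the `max_ink` threshold are hypothetical; `max_ink` would need tuning so that thin joining strokes count as cuts. Each crop could then be saved as an image and passed to Tesseract on its own (recent Tesseract releases accept `--psm 10` to treat an image as a single character).

```python
from typing import List, Optional, Tuple

# Hypothetical representation: rows of 0/1 pixels, 1 = ink.
Image = List[List[int]]


def column_ink(img: Image) -> List[int]:
    """Count ink pixels in each column (the vertical projection profile)."""
    if not img:
        return []
    width = len(img[0])
    return [sum(row[x] for row in img) for x in range(width)]


def find_cuts(img: Image, max_ink: int = 0, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of candidate characters.

    Any column whose ink count is <= max_ink is treated as a cut point.
    With max_ink = 0 only fully blank columns separate characters; for a
    connected script, max_ink > 0 lets thin joining strokes act as cuts.
    Cut columns themselves are dropped from every segment.
    """
    profile = column_ink(img)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, ink in enumerate(profile):
        if ink > max_ink:
            if start is None:
                start = x  # entering an inked run
        elif start is not None:
            if x - start >= min_width:
                segments.append((start, x))  # leaving an inked run
            start = None
    # Close a run that extends to the right edge of the image.
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))
    return segments


def crop(img: Image, start: int, end: int) -> Image:
    """Cut out columns [start, end) as a separate glyph image."""
    return [row[start:end] for row in img]


if __name__ == "__main__":
    # Two 2-pixel-wide "letters" joined by a 1-pixel-thick connecting stroke.
    word = [
        [1, 1, 0, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 1],
        [1, 1, 0, 0, 0, 1, 1],
    ]
    print(find_cuts(word, max_ink=0))  # [(0, 7)]: the joined word stays one blob
    print(find_cuts(word, max_ink=1))  # [(0, 2), (5, 7)]: the thin join becomes a cut
```

The joining stroke is simply discarded here. Real script glyphs overlap and vary in stroke width, so a fixed threshold will both over- and under-segment, which is part of why this route takes real R&D.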
However, in practice I think you can achieve somewhat satisfactory accuracy for *short* words with this script/font. To do this, you have to prepare a number of training sheets with the characters of this font properly spaced out. See http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Training_Procedure for details. Tess *might* work in this case, treating the characters as merged and finding the proper chop points.

I admit both methods require a good deal of effort and R&D, though.

HTH

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Jul 27, 2011 at 6:27 PM, syed arifullah badsha s <[email protected]> wrote:
> Hi,
>
> Please find attached the file that I am trying to read.
> I need help to work on this.
>
> Regards,
> Syed A B.
>
>
> On Wed, Jul 27, 2011 at 3:21 PM, zdenko podobny <[email protected]> wrote:
>>
>> If you are really interested in help, then provide an example image ;-)
>> Zdenko
>>
>> On Wed, Jul 27, 2011 at 11:45 AM, <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> When I run the command tesseract fsmt.tif output
>>> it shows me some junk data "»âY`I'I/2," for an image containing "Mentally"
>>> as the text in this font.
>>>
>>> Any idea? Please help.
>>>
>>>
>>> On Jul 27, 2011 11:02am, sreekanth reddy <[email protected]> wrote:
>>> > Hi, I am also working on training French Script MT. If I get any positive
>>> > results, I'll share them with you.
>>> >
>>> > --sreekanth
>>> >
>>> > On Wed, Jul 27, 2011 at 10:35 AM, syed arifullah badsha s
>>> > <[email protected]> wrote:
>>> >
>>> > The box files are not getting created properly. I am trying to train
>>> > it, in vain so far, but I will try again. If you have any box files or
>>> > trained data, kindly share them with me.
>>> >
>>> > On Tue, Jul 26, 2011 at 6:51 PM, Sven Pedersen <[email protected]>
>>> > wrote:
>>> >
>>> > Hi Syed, how are you trying to OCR the image? What kind of failure
>>> > message are you getting? Is it a problem with the font, or with the image
>>> > format?
>>> >
>>> > --Sven
>>> >
>>> > On Tue, Jul 26, 2011 at 2:20 AM, [email protected]
>>> > <[email protected]> wrote:
>>> >
>>> > Hi All,
>>> >
>>> > Kindly help me in recognizing the French Script MT font that is in a
>>> > TIF image.
>>> > Has anyone tried it?
>>> >
>>> > I have a sample TIF file, but I don't have a provision to attach it
>>> > here....
>>> >
>>> > Any info will help.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups "tesseract-ocr" group.
>>> > To post to this group, send email to [email protected]
>>> > To unsubscribe from this group, send email to
>>> > [email protected]
>>> > For more options, visit this group at
>>> > http://groups.google.com/group/tesseract-ocr?hl=en

