Yes, this font is called Tamalten. The problem is that I need to use a different font (Balaram, in the font list I sent). This is part of a project on Vedic scriptures; you can see the online version here: http://vedabase.com/en The texts I need to extract from those PDFs are for the offline version, which uses the Balaram font, so the two are not compatible. A find-and-replace method to get the proper symbols is fine, since there is not much material to get from those PDFs.
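For the find-and-replace step, a minimal sketch in Python might look like the following. The character pairs in it are placeholders I made up for illustration, not the real Tamalten or Balaram tables; the actual pairs would have to be read off the two fonts' glyph tables. The function name `remap` and the dictionary name are likewise only illustrative.

```python
# Sketch of a one-pass character remap between two legacy font encodings.
# HYPOTHETICAL placeholder pairs: source-font character -> target-font character.
# These are NOT the real Tamalten/Balaram mappings; fill in the real ones
# after comparing the glyph tables of both fonts.
PLACEHOLDER_MAP = {
    "\u00e2": "\u00e4",  # placeholder pair, not verified
    "\u00ee": "\u00e9",  # placeholder pair, not verified
    "\u00fb": "\u00fc",  # placeholder pair, not verified
}


def remap(text, mapping=PLACEHOLDER_MAP):
    """Replace every mapped character in a single pass.

    Characters that are not in the mapping are left unchanged.
    """
    return text.translate(str.maketrans(mapping))


if __name__ == "__main__":
    # Extracted PDF text would go here instead of this short sample.
    print(remap("k\u00e2la"))
```

Using `str.translate` rather than a chain of `str.replace` calls means each character is converted exactly once, so a pair like A to B followed by B to C cannot accidentally turn an original A into C.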
In a broader sense, there are people traveling to libraries throughout India to photograph old manuscripts in order to preserve and digitize them. For that purpose a working OCR is badly needed. I think I will contact one of them, because if he actually needs help in this regard, it will definitely be worth trying to train Tesseract to recognize those images properly. But that is native Sanskrit, Bengali and other languages. There are also others looking for a way to recognize Sanskrit transliteration. What do you think, can it be done in Tesseract? No, FineReader and other commercial OCR programs cannot do that.

On Thu, Nov 28, 2013 at 4:29 PM, V S Rawat <[email protected]> wrote:

> On 11/27/2013 9:50 PM, Shree Devi Kumar wrote:
>
>> Rawatji,
>>
>> I was going by the assumption that the text can be easily extracted from
>
> It is good that we have found two methods for replacing these letters.
>
> However, the fundamental solution is that there has to be font in which
> these same ASCII codes must already be showing the correct letters.
>
> So, if anyone gets time to do some research or somehow figures out which
> font it is, it will be very helpful for handling such text in future. Then
> replacement would not be required.
>
> To begin with, the font has to be one of the dozen listed in pdf file's
> properties-fonts.
>
> Thanks.
> --
> Rawat
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-ocr" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-ocr/6uG7HUxLY7w/unsubscribe.

