Ok, thank you!
On Thu, Nov 28, 2013 at 6:01 PM, Shree Devi Kumar <[email protected]> wrote:

> You may want to look at a software called SANSKRITOCR. The old version was
> free. There is also a new commercial version. Please see
> http://www.sanskritreader.de/
>
> Shree Devi Kumar
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Thu, Nov 28, 2013 at 3:58 PM, Jaanus Henno <[email protected]> wrote:
>
>> Yes, this font is called Tamalten. But the problem is that I need to use
>> another font (that is Balaram in the font list I sent). This is part of a
>> project of Vedic scriptures; you can see the online version here:
>> http://vedabase.com/en
>> The texts I need to get from those PDFs are for the offline version,
>> which uses the Balaram font. So these two are not compatible. A find&replace
>> method to get the proper symbols is OK, since there is not much material to
>> get from those PDFs.
>>
>> In a broader sense, there are people who are traveling throughout Indian
>> libraries to photograph old manuscripts to preserve and digitize them. For
>> that purpose a working OCR is much needed. I think I will contact
>> one person, because if he actually needs help in this regard, it will
>> definitely be worth trying to train Tesseract to properly recognize those
>> images. But that is native Sanskrit, Bengali and other languages. And there
>> are others who are looking for a solution to recognize Sanskrit
>> transliteration as well. What do you think, can it be done in Tesseract?
>> Neither FineReader nor other commercial OCR programs can do that.
>>
>>
>> On Thu, Nov 28, 2013 at 4:29 PM, V S Rawat <[email protected]> wrote:
>>
>>> On 11/27/2013 9:50 PM, Shree Devi Kumar wrote:
>>>
>>>> Rawatji,
>>>>
>>>> I was going by the assumption that the text can be easily extracted from
>>>>
>>>
>>> It is good that we have found two methods for replacing these letters.
>>>
>>> However, the fundamental solution is that there has to be a font in which
>>> these same ASCII codes already show the correct letters.
>>>
>>> So, if anyone gets time to do some research or somehow figures out which
>>> font it is, it will be very helpful for handling such text in future. Then
>>> replacement would not be required.
>>>
>>> To begin with, the font has to be one of the dozen listed in the PDF file's
>>> properties-fonts.
>>>
>>> Thanks.
>>> --
>>> Rawat
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> --- You received this message because you are subscribed to a topic in
>>> the Google Groups "tesseract-ocr" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/tesseract-ocr/6uG7HUxLY7w/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> [email protected].
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.

