Ok, you can try page 11. There is a glossary and lots of words with diacritics. Thanks.
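On the font-encoding question further down the thread (same diacritical marks, but a different encoding in the target font), a character-by-character remap is one possible approach. The sketch below is only an illustration: the table `LEGACY_TO_UNICODE` and the function `remap` are hypothetical names, and the code points on the left-hand side are placeholders, not the real layout of the Tamal fonts. The real slots would have to come from the font author.

```python
# Minimal sketch of remapping text from a legacy 8-bit font encoding to
# standard Unicode transliteration characters.
# ASSUMPTION: the keys below are placeholder slots chosen for illustration;
# they are NOT taken from the actual font and must be replaced with the
# real mapping supplied by the font author.
LEGACY_TO_UNICODE = {
    "\u00e4": "\u0101",  # placeholder slot -> a with macron
    "\u00e9": "\u012b",  # placeholder slot -> i with macron
    "\u00fc": "\u016b",  # placeholder slot -> u with macron
    "\u00e5": "\u1e5b",  # placeholder slot -> r with dot below
    "\u00e7": "\u015b",  # placeholder slot -> s with acute
}


def remap(text, table):
    """Return text with every character found in table replaced by its target."""
    # str.maketrans accepts a dict whose keys are single characters.
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    # Text copied out of the PDF would go here; this is a made-up sample.
    print(remap("g\u00e9t\u00e4", LEGACY_TO_UNICODE))
```

Characters that are not in the table pass through unchanged, so plain English text is left alone.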

On Wed, Nov 27, 2013 at 4:41 PM, V S Rawat <[email protected]> wrote:

> "words with sanskrit transliteration marks are used"
>
> Could you please point out the exact pages where to look for it? I will
> try to OCR it and see the results.
>
> Also,
> http://www.omkarananda-ashram.org/Sanskrit/itranslator99.htm#downloads
>
> The above page and several links from that page also have a lot of
> Sanskrit fonts. Maybe some of them might be of use to you.
>
> Thanks.
> --
> Rawat
>
>
> On 11/27/2013 9:16 AM, Srivas wrote:
>
>> Hi Rawat!
>>
>> I'm really sorry, I didn't know that this is a mailing list type of
>> forum ;-(
>>
>> Second, if you look carefully, you will see that the text is not
>> entirely English. In many places words with Sanskrit transliteration
>> marks are used. But as you said, it can actually be copy/pasted and it
>> didn't even come to my mind! So this part is actually working and that
>> is great! So I am almost there. The remaining problem is of another
>> type. The provided tamalten font will display the marks, but I need to
>> use another font to display the final document. It also contains the
>> same diacritical marks but uses another encoding. But this might be a
>> question for another person. I know the author of the fonts, I will ask
>> him. Thanks for the help!
>>
>> Btw. if anyone needs to use Sanskrit transliterated fonts, here are the
>> resources: http://www.krishna-das.com/ksyberspace/fonts/
>>
>> On Tuesday, November 26, 2013 4:47:11 PM UTC+7, V S Rawat wrote:
>>
>> Dear Sir Srivas ji,
>>
>> Firstly, you should not have sent a 2.2 MB 68-page pdf file and a
>> 181 KB zip to all the list members unasked. You could have loaded it
>> somewhere and sent the link so that only those download it who can
>> contribute to it. It is a wastage of time and bandwidth to get such
>> huge messages.
>>
>> Secondly, I couldn't really understand your issue. I saw your pdf
>> file. It is pure English. You can open it in any pdf reader and just
>> copy the entire text from there and paste it in a text or word file.
>> So, what else exactly are you looking for? Please elaborate.
>>
>> You don't even need to OCR it. These are already ASCII text.
>>
>> Thanks.
>> --
>> Rawat
>>
>>
>> On 11/26/2013 12:40 PM, Srivas wrote:
>> > Hi!
>> > I have a bunch of PDF journal files and I need to get the text out
>> > of them. They contain a lot of romanized Sanskrit diacritical marks
>> > and that creates a difficulty. I tried Finereader and OmniPage but
>> > they cannot be trained to recognize those symbols. I just need an
>> > OCR program I can train to show any symbol required, and the above
>> > programs cannot do that.
>> >
>> > Where should I start from? I feel like this program can do the job,
>> > but can you help me to get started? I downloaded tesseract and
>> > installed it (windows). There are different GUIs available and I
>> > think it will make it easier to work. Can you suggest a good one? I
>> > tried gimagereader but it's too primitive and leaves a lot of work
>> > to be done afterwards with the overall text.
>> >
>> > I don't think this kind of language pack is available, so how do I
>> > create it?
>> >
>> > I will add one pdf and the fonts that were used to create it. Maybe
>> > someone would like to try and let me know how to do it?
>> >
>> > Thank you for any help!
>> >
>> > Regards,
>> > Srivas
>> >
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-ocr" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-ocr/6uG7HUxLY7w/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

