Ok, you can try page 11. There is a glossary and lots of words with
diacritics. Thanks.


On Wed, Nov 27, 2013 at 4:41 PM, V S Rawat <[email protected]> wrote:

>
> "words with sanskrit transliteration marks are used"
>
> Could you please point out the exact pages where I should look? I will try to
> OCR them and see the results.
>
> Also,
> http://www.omkarananda-ashram.org/Sanskrit/itranslator99.htm#downloads
>
> The above page and several links from that page also have a lot of
> Sanskrit fonts. Maybe one of them will be useful to you.
>
> Thanks.
> --
> Rawat
>
>
> On 11/27/2013 9:16 AM, Srivas wrote:
>
>> Hi Rawat!
>>
>> I'm really sorry, I didn't know that this is a mailing list type of
>> forum ;-(
>>
>> Second, if you look carefully, you will see that the text is not
>> entirely English. In many places words with Sanskrit transliteration
>> marks are used. But as you said, it can actually be copy/pasted, and that
>> didn't even come to my mind! So this part is actually working and that
>> is great! So I am almost there. The remaining problem is of another type.
>> The provided tamalten font will display the marks, but I need to use
>> another font to display the final document. It also contains the same
>> diacritical marks but uses another encoding. But this might be a
>> question for another person. I know the author of the fonts, so I will ask
>> him. Thanks for the help!
>>
>> Btw, if anyone needs Sanskrit transliteration fonts, here are the
>> resources: http://www.krishna-das.com/ksyberspace/fonts/
>>
>> On Tuesday, November 26, 2013 4:47:11 PM UTC+7, V S Rawat wrote:
>>
>>     Dear Sir Srivas ji,
>>
>>     Firstly, you should not have sent a 2.2 MB, 68-page pdf file and a 181 KB
>>     zip to all the list members unasked. You could have uploaded them somewhere
>>     and sent the link, so that only those who can contribute download them.
>>     It is a waste of time and bandwidth to get such huge messages.
>>
>>     Secondly, I couldn't really understand your issue. I saw your pdf file.
>>     It is pure English. You can open it in any pdf reader, copy the
>>     entire text from there, and paste it into a text or Word file. So what else
>>     exactly are you looking for? Please elaborate.
>>
>>     You don't even need to OCR it. It is already ASCII text.
>>
>>     Thanks.
>>     --
>>     Rawat
>>
>>
>>     On 11/26/2013 12:40 PM, Srivas wrote:
>>      > Hi!
>>      > I have a bunch of PDF journal files and I need to get the text out of
>>      > them. They contain a lot of romanized Sanskrit diacritical marks, and that
>>      > creates a difficulty. I tried FineReader and OmniPage, but they cannot be
>>      > trained to recognize those symbols. I just need an OCR program I can
>>      > train to output any symbol required, and the above programs cannot do that.
>>      >
>>      > Where should I start? I feel like this program can do the job, but
>>      > can you help me get started? I downloaded tesseract and installed it
>>      > (Windows). There are different GUIs available and I think one will make
>>      > it easier to work. Can you suggest a good one? I tried gimagereader, but
>>      > it's too primitive and leaves a lot of work to be done afterwards with
>>      > the overall text.
>>      >
>>      > I don't think this kind of language pack is available. How do I create one?
>>      >
>>      > I will add one pdf and the fonts that were used to create it. Maybe someone
>>      > would like to try and let me know how to do it?
>>      >
>>      > Thank you for any help!
>>      >
>>      > Regards,
>>      > Srivas
>>
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> --- You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-ocr" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-ocr/6uG7HUxLY7w/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
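
On the encoding question raised in the thread (two fonts that show the same
diacritics but at different code points): once the text has been copied out of
the PDF, one possible approach is to remap each legacy code point to its proper
Unicode character. Below is only a rough sketch. The mapping table is an
assumption made up for illustration and is NOT the real table of the tamalten
font or of any other font; the real pairs would have to come from the font
author. The names LEGACY_TO_UNICODE and remap_legacy_text are hypothetical too.

```python
import unicodedata

# HYPOTHETICAL mapping, for illustration only: the code point the legacy
# font uses on the left, the Unicode character it should become on the right.
# Replace these pairs with the real table for the font in question.
LEGACY_TO_UNICODE = {
    0x00E4: "\u0101",  # a-umlaut slot  -> a with macron
    0x00E9: "\u012b",  # e-acute slot   -> i with macron
    0x00FC: "\u016b",  # u-umlaut slot  -> u with macron
}


def remap_legacy_text(text, table=LEGACY_TO_UNICODE):
    """Remap legacy-font code points to Unicode and normalize to NFC.

    Characters that are not in the table are passed through unchanged.
    """
    return unicodedata.normalize("NFC", text.translate(table))


if __name__ == "__main__":
    # Text as it might look when pasted from a PDF set in the legacy font.
    print(remap_legacy_text("Bhagavad-g\u00e9t\u00e4"))
```

After such a remap the text no longer depends on the legacy font, so any
Unicode font that contains the macron and dot-below letters should be able to
display it.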
