Those Ā and á characters are defined in the Garamond font, but the character codes used for them in this document are not the ones Garamond assigns to them.

So there must be some other font in which these codes have been assigned to these characters.

The document lists a dozen fonts, and one of them might be it. You will need to figure out which font it is by trial and error.
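The trial-and-error step can also be done programmatically instead of by eye. Decode the copied text through each candidate font's code-to-character table and count how many known, correctly spelled glossary words come out. Below is a minimal sketch. The font names, byte codes and tables are hypothetical placeholders. The real tables would have to be read from each font's character map.

```python
# Hypothetical sketch: choose the candidate font whose encoding table best
# decodes text copied from the PDF. All tables are placeholders, NOT the
# real encodings of any of the fonts listed in the document.

def decode(text: str, table: dict) -> str:
    """Replace each legacy glyph code with the Unicode character it stands for."""
    return text.translate(str.maketrans(table))

def score(text: str, table: dict, known_words: list) -> int:
    """Count occurrences of known correctly-spelled words after decoding."""
    decoded = decode(text, table)
    return sum(decoded.count(word) for word in known_words)

def best_font(text: str, candidates: dict, known_words: list) -> str:
    """Return the name of the candidate font with the highest score."""
    return max(candidates, key=lambda name: score(text, candidates[name], known_words))

# Placeholder data: the copied text carries "\xc0" where Ā (U+0100) is meant
# and "\xe2" where ā (U+0101) is meant.
extracted = "\xc0tm\xe2 and \xc0nanda"
candidates = {
    "FontA": {"\xc0": "\u0100", "\xe2": "\u0101"},  # hypothetical: \xc0 -> Ā, \xe2 -> ā
    "FontB": {"\xc0": "\u00e1"},                   # hypothetical: \xc0 -> á
}
glossary = ["\u0100tm\u0101", "\u0100nanda"]       # "Ātmā", "Ānanda"

print(best_font(extracted, candidates, glossary))
```

With these placeholder tables, only "FontA" turns both glossary words into correct spellings, so it is the one selected. The glossary on page 11 would be a natural source of known words for a real run.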

Thanks.
--
Rawat

On 11/27/2013 3:17 PM, Jaanus Henno wrote:
OK, you can try page 11. There is a glossary there with lots of words with
diacritics. Thanks.


On Wed, Nov 27, 2013 at 4:41 PM, V S Rawat <[email protected]> wrote:


    "words with sanskrit transliteration marks are used"

    Could you please point out the exact pages where I should look for it?
    I will try to OCR it and see the results.

    Also, see:
    http://www.omkarananda-ashram.org/Sanskrit/itranslator99.htm#downloads

    That page, and several pages linked from it, also have a lot of
    Sanskrit fonts. Maybe one of them is the one you used.

    Thanks.
    --
    Rawat


    On 11/27/2013 9:16 AM, Srivas wrote:

        Hi Rawat!

        I'm really sorry, I didn't know that this is a mailing list type of
        forum ;-(

        Second, if you look carefully, you will see that the text is not
        entirely English. In many places, words with Sanskrit transliteration
        marks are used. But as you said, the text can actually be copied and
        pasted, and that didn't even occur to me! So that part actually works,
        which is great, and I am almost there. The remaining problem is of a
        different kind. The provided tamalten font displays the marks, but I
        need to use another font to display the final document. That font
        contains the same diacritical marks but uses a different encoding.
        This might be a question for someone else, though; I know the author
        of the fonts and will ask him. Thanks for the help!
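When two fonts contain the same glyphs under different codes, the text can be re-encoded with a character translation table. Another option is to convert it to Unicode once, so that any Unicode font that has these glyphs can display it. Below is a minimal sketch of both. The code values in the tables are hypothetical placeholders, not the real encodings of the tamalten font or the target font. Converting to Unicode is a suggested alternative and was not part of the original plan.

```python
# Hypothetical sketch: re-encode text between two legacy Sanskrit fonts.
# Each table maps a glyph (named by its Unicode character) to the code that
# font uses for it. All code values below are placeholders.
SOURCE_FONT = {"\u0101": "\xe2", "\u1e63": "\xa7", "\u1e43": "\xb5"}  # ā, ṣ, ṃ
TARGET_FONT = {"\u0101": "\xe4", "\u1e63": "\xfa", "\u1e43": "\xf1"}

def build_table(source: dict, target: dict) -> dict:
    """Map each source-font code to the target-font code for the same glyph."""
    missing = set(source) - set(target)
    if missing:
        raise ValueError(f"target font lacks glyphs: {sorted(missing)}")
    return str.maketrans({source[g]: target[g] for g in source})

def reencode(text: str, source: dict, target: dict) -> str:
    """Swap codes in one pass; str.translate never re-maps a replaced char."""
    return text.translate(build_table(source, target))

def to_unicode(text: str, source: dict) -> str:
    """Replace source-font codes with the real Unicode characters."""
    return text.translate(str.maketrans({code: g for g, code in source.items()}))

print(repr(reencode("k\xe2ma", SOURCE_FONT, TARGET_FONT)))  # 'k\xe4ma'
print(to_unicode("k\xe2ma", SOURCE_FONT))                  # kāma
```

Because `str.translate` replaces each character exactly once, overlapping code ranges in the two fonts cannot cause chained substitutions.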

        By the way, if anyone needs Sanskrit transliteration fonts, here are
        the resources: http://www.krishna-das.com/ksyberspace/fonts/

        On Tuesday, November 26, 2013 4:47:11 PM UTC+7, V S Rawat wrote:

             Dear Sir Srivas ji,

             Firstly, you should not have sent a 2.2 MB, 68-page PDF file
             and a 181 KB zip to all the list members unasked. You could
             have uploaded them somewhere and sent the link, so that only
             those who can contribute would download them. Receiving such
             huge messages is a waste of time and bandwidth.

             Secondly, I couldn't really understand your issue. I looked at
             your PDF file. It is pure English. You can open it in any PDF
             reader, copy the entire text, and paste it into a text or Word
             file. So what else exactly are you looking for? Please
             elaborate.

             You don't even need to OCR it. It is already ASCII text.

             Thanks.
             --
             Rawat


             On 11/26/2013 12:40 PM, Srivas wrote:
              > Hi!
              > I have a bunch of PDF journal files and I need to get the text out
              > of them. They contain a lot of romanized Sanskrit diacritical marks,
              > and that creates a difficulty. I tried FineReader and OmniPage, but
              > they cannot be trained to recognize those symbols. I just need an
              > OCR program that I can train to recognize any symbol required, and
              > the above programs cannot do that.
              >
              > Where should I start? I feel like this program can do the job, but
              > can you help me get started? I downloaded Tesseract and installed
              > it (Windows). There are different GUIs available, and I think one
              > would make it easier to work with. Can you suggest a good one? I
              > tried gImageReader, but it's too primitive and leaves a lot of work
              > to be done on the overall text afterwards.
              >
              > I don't think this kind of language pack is available, so how do I
              > create one?
              >
              > I will attach one PDF and the fonts that were used to create it.
              > Maybe someone would like to try and let me know how to do it?
              >
              > Thank you for any help!
              >
              > Regards,
              > Srivas


--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
