Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

V S Rawat Wed, 27 Nov 2013 01:42:05 -0800


"words with sanskrit transliteration marks are used"

Could you please point out the exact pages where I should look? I will try to OCR them and see the results.


Also,
http://www.omkarananda-ashram.org/Sanskrit/itranslator99.htm#downloads

The above page and several pages linked from it also have a lot of Sanskrit fonts. Maybe some of them will be of use to you.


Thanks.
--
Rawat

On 11/27/2013 9:16 AM, Srivas wrote:

Hi Rawat!

I'm really sorry, I didn't know that this is a mailing-list type of
forum ;-(

Second, if you look carefully, you will see that the text is not
entirely English. In many places, words with Sanskrit transliteration
marks are used. But as you said, the text can actually be copied and
pasted; that didn't even come to my mind! So this part is working, and
that is great. I am almost there. The remaining problem is of another
type. The provided tamalten font displays the marks, but I need to use
another font for the final document. That font contains the same
diacritical marks but uses a different encoding. But this might be a
question for another person; I know the author of the fonts, so I will
ask him. Thanks for the help!

By the way, if anyone needs Sanskrit transliteration fonts, here are the
resources: http://www.krishna-das.com/ksyberspace/fonts/
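
On the encoding mismatch described above (the same diacritical marks
sitting at different character codes in two fonts), one common approach
is to convert the text through a lookup table, ideally via Unicode. The
Python sketch below is only an illustration: the table entries are
hypothetical placeholders, not the actual layout of tamalten or of any
other font, and the function names are invented for this example. The
real values would have to come from the font author's character chart.

```python
import unicodedata

# HYPOTHETICAL table: legacy-font character -> Unicode transliteration letter.
# These pairs are placeholders for illustration only; they are NOT taken from
# tamalten or any other real font. Fill them in from the font's own chart.
LEGACY_TO_UNICODE = {
    "\u00e4": "\u0101",  # placeholder slot -> a with macron
    "\u00e9": "\u012b",  # placeholder slot -> i with macron
    "\u00fc": "\u016b",  # placeholder slot -> u with macron
    "\u00e5": "\u1e5b",  # placeholder slot -> r with dot below
    "\u00f1": "\u1e63",  # placeholder slot -> s with dot below
    "\u00eb": "\u1e47",  # placeholder slot -> n with dot below
}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Replace each legacy-font character with its Unicode equivalent."""
    converted = text.translate(str.maketrans(table))
    # NFC keeps every accented letter as a single precomposed code point.
    return unicodedata.normalize("NFC", converted)


def unicode_to_legacy(text, table=LEGACY_TO_UNICODE):
    """Reverse direction: Unicode text back to the legacy font's characters."""
    reverse = {uni: legacy for legacy, uni in table.items()}
    # Normalize first so decomposed input (letter + combining mark) matches.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(str.maketrans(reverse))
```

With a second table for the target font, text could be converted in two
steps (source font to Unicode, then Unicode to target font). If the
final document can use a Unicode font, the second step is not needed.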

On Tuesday, November 26, 2013 4:47:11 PM UTC+7, V S Rawat wrote:

    Dear Sir Srivas ji,

    Firstly, you should not have sent a 2.2 MB, 68-page PDF file and a
    181 KB zip to all the list members unasked. You could have uploaded
    them somewhere and sent the link, so that only those who can
    contribute download them. Getting such huge messages is a waste of
    time and bandwidth.

    Secondly, I couldn't really understand your issue. I saw your PDF file;
    it is pure English. You can open it in any PDF reader, copy the
    entire text from there, and paste it into a text or Word file. So what
    else exactly are you looking for? Please elaborate.

    You don't even need to OCR it. The text in it is already ASCII.

    Thanks.
    --
    Rawat


    On 11/26/2013 12:40 PM, Srivas wrote:
     > Hi!
     > I have a bunch of PDF journal files and I need to get the text out of
     > them. They contain a lot of romanized Sanskrit diacritical marks, and
     > that creates a difficulty. I tried FineReader and OmniPage, but they
     > cannot be trained to recognize those symbols. I just need an OCR
     > program I can train to recognize any symbol required, and the above
     > programs cannot do that.
     >
     > Where should I start? I feel like this program can do the job, but
     > can you help me get started? I downloaded Tesseract and installed it
     > (Windows). There are different GUIs available, and I think one will
     > make it easier to work. Can you suggest a good one? I tried
     > gImageReader, but it's too primitive and leaves a lot of work to be
     > done afterwards with the overall text.
     >
     > I don't think this kind of language pack is available. How do I
     > create one?
     >
     > I will attach one PDF and the fonts that were used to create it. Maybe
     > someone would like to try and let me know how to do it?
     >
     > Thank you for any help!
     >
     > Regards,
     > Srivas


--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
