Ok, thank you!

On Thu, Nov 28, 2013 at 6:01 PM, Shree Devi Kumar <[email protected]> wrote:

> You may want to look at a program called SANSKRITOCR. The old version was
> free; there is also a newer commercial version. Please see
> http://www.sanskritreader.de/
>
> Shree Devi Kumar
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Thu, Nov 28, 2013 at 3:58 PM, Jaanus Henno <[email protected]> wrote:
>
>> Yes, this font is called Tamalten. But the problem is that I need to use
>> another font (Balaram, in the font list I sent). This is part of a
>> project on Vedic scriptures; you can see the online version here:
>> http://vedabase.com/en
>> The texts I need to get from those PDFs are for the offline version,
>> which uses the Balaram font, so the two fonts are not compatible. A
>> find-and-replace method to get the proper symbols is fine, since there is
>> not much material to extract from those PDFs.
>>
>> More broadly, there are people traveling to libraries throughout India
>> to photograph old manuscripts in order to preserve and digitize them. For
>> that purpose a working OCR engine is badly needed. I think I will contact
>> one of them, because if he actually needs help in this regard, it will
>> definitely be worth trying to train Tesseract to recognize those images
>> properly. But that material is in native-script Sanskrit, Bengali and other
>> languages. Others are also looking for a way to recognize Sanskrit
>> transliteration. What do you think, can this be done in Tesseract?
>> FineReader and other commercial OCR programs cannot do it.
>>
>>
>> On Thu, Nov 28, 2013 at 4:29 PM, V S Rawat <[email protected]> wrote:
>>
>>> On 11/27/2013 9:50 PM, Shree Devi Kumar wrote:
>>>
>>>> Rawatji,
>>>>
>>>> I was going by the assumption that the text can be easily extracted from
>>>>
>>>
>>> It is good that we have found two methods for replacing these letters.
>>>
>>> The fundamental solution, however, is a font in which these same ASCII
>>> codes already display the correct letters.
>>>
>>> So if anyone has time to do some research, or somehow figures out which
>>> font it is, that will be very helpful for handling such text in future;
>>> then no replacement would be required.
>>>
>>> To begin with, the font has to be one of the dozen listed under the PDF
>>> file's Properties > Fonts.
>>>
>>> Thanks.
>>> --
>>> Rawat
>>>
>>>
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> ---
>>> You received this message because you are subscribed to a topic in
>>> the Google Groups "tesseract-ocr" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/tesseract-ocr/6uG7HUxLY7w/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> [email protected].
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>

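The find-and-replace approach discussed in the quoted messages (converting text extracted from the PDFs into the codes the Balaram font expects) can be sketched in Python as below. This is a minimal sketch, assuming the extracted text is a plain string and that each source character or character sequence maps to exactly one target sequence. `FONT_MAP` and `build_replacer` are hypothetical names, and the mapping entries are placeholders, not the real glyph assignments of Tamalten or Balaram.

```python
import re
from typing import Callable, Dict

# HYPOTHETICAL mapping table. These pairs are placeholders only; they are
# NOT the real codes of the PDFs' font or of Balaram. Build the real table
# by comparing each character in the extracted text with the glyph shown
# in the PDF and the code the target font uses for that same glyph.
FONT_MAP: Dict[str, str] = {
    "x": "y",   # placeholder: source code 'x' should become target code 'y'
    "y": "x",   # placeholder: a swap, to show why a single pass matters
    "ab": "Z",  # placeholder: a two-character source sequence
}


def build_replacer(mapping: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that applies every replacement in a single pass.

    Longer keys are tried first, so a multi-character sequence wins over a
    single character that starts it. Because the text is scanned once,
    output produced by one replacement is never re-matched by another
    (chained str.replace calls would turn 'x' -> 'y' -> 'x').
    """
    if not mapping:
        # An empty pattern would match everywhere; just return the text.
        return lambda text: text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    convert = build_replacer(FONT_MAP)
    print(convert("xyab"))  # -> yxZ
```

A single regular-expression pass is used instead of a series of `str.replace` calls because font remapping tables often contain pairs whose targets are also sources; replacing sequentially would corrupt those characters.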