I'm a little new to all this. How do you run sed under Windows 7? I have 
read that it can also be run under Windows, but I cannot work out how 
to do that.
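
For reference, a minimal sketch of how the command quoted below is typically run on Windows. It assumes a Windows build of GNU sed is installed (for example the one bundled with Git for Windows or Cygwin) and that the commands are typed in that package's Bash shell. The sample word and the two-line script are made up for illustration; they reuse two of the substitutions from this thread.

```shell
# Create a small UTF-8 sed script with two substitutions from the thread.
printf '%s\n' 's/å/ā/g' 's/î/ī/g' > roman.sed

# Hypothetical one-line input file in the old font encoding.
printf '%s\n' 'Bhagavad-gîtå' > inputfile.txt

# The command from the thread: apply every rule in roman.sed to the input
# and write the converted text to a new file.
sed -f roman.sed inputfile.txt > outputfile.txt

# Show the converted text.
cat outputfile.txt
```

In the plain Windows command prompt (cmd.exe) only the sed line itself is needed, provided the folder containing sed.exe is on PATH and both roman.sed and the input file are saved as UTF-8 (without a byte order mark, which some Windows editors add).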

On Wednesday, November 27, 2013 9:11:01 PM UTC+7, shree wrote:
>
> sed -f roman.sed inputfile.txt > outputfile.txt
>
> You will have to add other substitutions to the file roman.sed - it only 
> has the first few substitutions that I encountered.
>
> Shree Devi Kumar
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>  
>
> On Wed, Nov 27, 2013 at 7:08 PM, Jaanus Henno wrote:
>
>> Thank you both for your help. This letter replacement is a good idea! 
>> Looks like this sed script will do the work. I will just have to see how to 
>> use sed... Tomorrow I will check it out.
>>
>>
>> On Wed, Nov 27, 2013 at 8:20 PM, Shree Devi Kumar wrote:
>>
>>> I think that, rather than trying OCR, you should extract the text and 
>>> then run a conversion script to replace the letters with diacritical marks.
>>>
>>> E.g., you would make the following substitutions using sed for the 
>>> sample text from page 11:
>>>
>>> s/Å/Ā/g
>>> s/å/ā/g
>>> s/®/ṛ/g
>>> s/ß/ṣ/g
>>> s/∫/ṇ/g
>>> s/î/ī/g
>>> s/Ê/Ī/g
>>> s/¸/Ś/g
>>> s/Ω/ś/g
>>> s/ü/ū/g
>>>
>>> Also attaching sed script as a utf-8 text file.
>>>
>>> Shree Devi Kumar
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>  
>>>
>>> On Wed, Nov 27, 2013 at 3:45 PM, V S Rawat wrote:
>>>
>>>> Those Ā á characters are defined in the Garamond font, but the ASCII 
>>>> codes used in this document are not the same as those defined in Garamond.
>>>>
>>>> So it must be some other font in which these ASCII codes have been 
>>>> defined for these characters.
>>>>
>>>> The document lists a dozen fonts; it might be one of them. You need to 
>>>> figure out which font it could be by trial and error.
>>>>
>>>> Thanks.
>>>> -- 
>>>> Rawat
>>>>
>>>>
>>>> On 11/27/2013 3:17 PM, Jaanus Henno wrote:
>>>>
>>>>> OK, you can try page 11. There is a glossary and lots of words with
>>>>> diacritics. Thanks.
>>>>>
>>>>>
>>>>> On Wed, Nov 27, 2013 at 4:41 PM, V S Rawat wrote:
>>>>>
>>>>>
>>>>>     "words with sanskrit transliteration marks are used"
>>>>>
>>>>>     Could you please point out the exact pages where to look for it? I
>>>>>     will try to OCR it and see the results.
>>>>>
>>>>>     Also,
>>>>>     http://www.omkarananda-ashram.org/Sanskrit/itranslator99.htm#downloads
>>>>>
>>>>>     The above page and several links from that page also have a lot of
>>>>>     Sanskrit fonts. Maybe one of them might be useful to you.
>>>>>
>>>>>     Thanks.
>>>>>     --
>>>>>     Rawat
>>>>>
>>>>>
>>>>>     On 11/27/2013 9:16 AM, Srivas wrote:
>>>>>
>>>>>         Hi Rawat!
>>>>>
>>>>>         I'm really sorry, I didn't know that this is a mailing-list
>>>>>         type of forum ;-(
>>>>>
>>>>>         Second, if you look carefully, you will see that the text is
>>>>>         not entirely English. In many places words with Sanskrit
>>>>>         transliteration marks are used. But as you said, it can
>>>>>         actually be copied and pasted, and that didn't even occur to
>>>>>         me! So this part is actually working, and that is great! So I
>>>>>         am almost there. The remaining problem is of another type.
>>>>>         The provided tamalten font will display the marks, but I need
>>>>>         to use another font to display the final document. It also
>>>>>         contains the same diacritical marks but uses another encoding.
>>>>>         But this might be a question for another person; I know the
>>>>>         author of the fonts, and I will ask him. Thanks for the help!
>>>>>
>>>>>         Btw. If anyone needs to use Sanskrit transliterated fonts, here
>>>>>         are the resources:
>>>>>         http://www.krishna-das.com/ksyberspace/fonts/
>>>>>
>>>>>         On Tuesday, November 26, 2013 4:47:11 PM UTC+7, V S Rawat wrote:
>>>>>
>>>>>              Dear Sir Srivas ji,
>>>>>
>>>>>              Firstly, you should not have sent a 2.2 MB, 68-page PDF
>>>>>              file and a 181 KB zip to all the list members unasked. You
>>>>>              could have uploaded it somewhere and sent the link, so that
>>>>>              only those who can contribute download it. It is a waste of
>>>>>              time and bandwidth to receive such huge messages.
>>>>>
>>>>>              Secondly, I couldn't really understand your issue. I saw
>>>>>              your PDF file. It is pure English. You can open it in any
>>>>>              PDF reader, copy the entire text from there, and paste it
>>>>>              into a text or Word file. So what else exactly are you
>>>>>              looking for? Please elaborate.
>>>>>
>>>>>              You don't even need to OCR it. This is already ASCII text.
>>>>>
>>>>>              Thanks.
>>>>>              --
>>>>>              Rawat
>>>>>
>>>>>
>>>>>              On 11/26/2013 12:40 PM, Srivas wrote:
>>>>>               > Hi!
>>>>>               > I have a bunch of PDF journal files and I need to get the
>>>>>               > text out of them. They contain a lot of romanized Sanskrit
>>>>>               > diacritical marks, and that creates a difficulty. I tried
>>>>>               > FineReader and OmniPage, but they cannot be trained to
>>>>>               > recognize those symbols. I just need an OCR program that I
>>>>>               > can train to recognize any symbol required, and the above
>>>>>               > programs cannot do that.
>>>>>               >
>>>>>               > Where should I start? I feel that this program can do the
>>>>>               > job, but can you help me get started? I downloaded
>>>>>               > tesseract and installed it (Windows). There are different
>>>>>               > GUIs available, and I think one will make it easier to
>>>>>               > work. Can you suggest a good one? I tried gimagereader, but
>>>>>               > it's too primitive and leaves a lot of work to be done
>>>>>               > afterwards on the overall text.
>>>>>               >
>>>>>               > I don't think this kind of language pack is available, so
>>>>>               > how can I create it?
>>>>>               >
>>>>>               > I will add one PDF and the fonts that were used to create
>>>>>               > it. Maybe someone would like to try and let me know how to
>>>>>               > do it?
>>>>>               >
>>>>>               > Thank you for any help!
>>>>>               >
>>>>>               > Regards,
>>>>>               > Srivas
>>>>>
>>>>>  
>>>> -- 
>>>> -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to 
>>>> [email protected]<javascript:>
>>>> To unsubscribe from this group, send email to
>>>> [email protected] <javascript:>
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>> --- You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected] <javascript:>.
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>
>
>
