Please see
https://github.com/Shreeshrii/tesstrain-xsa/blob/master/langdata/latin2unicode.sh

It contains sed substitution commands that convert Latin transliteration to
Unicode for xsa, based on the mappings shown on Wikipedia and other web pages.
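
For anyone who prefers Python over sed, here is a minimal sketch of the same substitution idea. It is not the content of latin2unicode.sh: the function name and the table are my own illustration, the table is deliberately partial, and the code points are taken from the Old South Arabian Unicode block (U+10A60..U+10A7F), so please check them against the script before using this for training text.

```python
import re
import unicodedata

# Partial, illustrative table (an assumption, not copied from latin2unicode.sh).
# Keys are transliteration units, values are Old South Arabian letters.
TRANSLIT_TO_SARB = {
    "s\u00b9": "\U00010A6A",  # s¹ -> SAT
    "s\u00b2": "\U00010A66",  # s² -> SHIN
    "s\u00b3": "\U00010A6F",  # s³ -> SAMEKH
    "\u02be": "\U00010A71",   # ʾ  -> ALEF
    "\u02bf": "\U00010A72",   # ʿ  -> AYN
    "\u1e0f": "\U00010A79",   # ḏ  -> DHALETH
    "\u0121": "\U00010A76",   # ġ  -> GHAYN
    "h": "\U00010A60",
    "l": "\U00010A61",
    "m": "\U00010A63",
    "w": "\U00010A65",
    "r": "\U00010A67",
    "b": "\U00010A68",
    "t": "\U00010A69",
    "n": "\U00010A6C",
    "d": "\U00010A75",
    "y": "\U00010A7A",
}

# Try longer keys first so that "s¹" wins over any shorter key.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(TRANSLIT_TO_SARB, key=len, reverse=True))
)


def transliteration_to_unicode(text: str) -> str:
    """Replace known transliteration units; leave everything else unchanged."""
    # NFC composes sequences such as g + combining dot into a single character.
    # The script has no letter case, so capitals in names are folded to lowercase.
    text = unicodedata.normalize("NFC", text).lower()
    return _PATTERN.sub(lambda m: TRANSLIT_TO_SARB[m.group(0)], text)


if __name__ == "__main__":
    print(transliteration_to_unicode("mr\u02be-s\u00b9m"))
```

Brackets, hyphens, and any letter missing from the table pass through untouched, so editorial marks in the transcriptions would still need to be stripped separately.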


On Mon, Mar 23, 2020, 01:58 Wincent Balin <[email protected]> wrote:

> Hi Shree,
>
> I will add a tool to create random text within Unicode range soon.
>
> @aby tesh: Do you know anything about a converter from transliterated text
> to [xsa] Unicode text?
>
> On Mon, 16 Mar 2020 at 03:12, Shree Devi Kumar <[email protected]>
> wrote:
>
>> Hi Wincent,
>> Thanks for the link.
>>
>> I had checked that site earlier. It has text transcriptions in Latin
>> transliteration, e.g.
>> http://dasi.cnr.it/index.php?id=79&prjId=1&corId=5&colId=0&navId=522207406&recId=2149
>> I haven't found any tool to convert them to Unicode.
>>
>>    1  Yʿly w-ʾḏmr bny Whbʾl[ ... ...] ʾḏmr[ ... ... by]—
>>    2  t-(s¹m) Yġl b-rdʾ mrʾ-s¹[m ... ...]
>>    3  [... ... ]w-(b)-(rd)ʾ mrʾ-s¹m [... ...]
>>    4  [... ...] ʾḏ(mr) w-b-rd(ʾ)[ ... ...]
>>
>> Maybe you can add a tool to https://github.com/wincentbalin/pytesstrain
>> that creates randomly generated training text from a range of characters
>> or a word list, similar to this one:
>>
>>    The tool language_metrics runs Tesseract OCR over images of random word
>>    sequences, which are created out of the supplied wordlist,
>>
>>
>> On Mon, Mar 16, 2020 at 2:32 AM Wincent Balin <[email protected]>
>> wrote:
>>
>>> Maybe http://dasi.cnr.it does have something usable?
>>>
>>> Shree Devi Kumar <[email protected]> schrieb am So., 15. März 2020,
>>> 16:55:
>>>
>>>> There is no online corpus for xsa that I could find.
>>>>
>>>> Two of the fonts you sent are legacy fonts, that is, they map English
>>>> letters to ancient Arabic characters.
>>>>
>>>> Are there any converters that convert from the legacy mapping to
>>>> Unicode?
>>>>
>>>> If there is existing text in legacy fonts, it can be converted to
>>>> Unicode and that can be used for training.
>>>>
>>>> On Sun, Mar 15, 2020, 17:57 aby tesh <[email protected]> wrote:
>>>>
>>>>> Where can I get the training text, or can I create a new one? I have a
>>>>> problem writing with the fonts, some of which are included in the
>>>>> attachment I sent you.
>>>>>
>>>>> On Sunday, March 15, 2020 at 4:32:08 AM UTC+3, shree wrote:
>>>>>>
>>>>>> I had used the findfonts feature of text2image and found only two
>>>>>> fonts that rendered the xsa text. I will check the fonts that you sent.
>>>>>> What about training text? Unless you have some more text, it will be
>>>>>> difficult to do training.
>>>>>>
>>>>>> Quivira
>>>>>> Segoe UI Historic
>>>>>>
>>>>>> On Sun, Mar 15, 2020, 04:01 aby tesh <[email protected]> wrote:
>>>>>>
>>>>>>> That is what I am not sure about. I don't think they are all Unicode
>>>>>>> fonts; I couldn't find one. Some render on my machine (Linux), some don't.
>>>>>>>
>>>>>>> On Saturday, March 14, 2020 at 8:45:46 PM UTC+3, shree wrote:
>>>>>>>>
>>>>>>>> Are all these Unicode fonts?
>>>>>>>>
>>>>>>>> What about training text in utf-8 Unicode encoding?
>>>>>>>>
>>>>>>>> On Sat, Mar 14, 2020, 22:37 aby tesh <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey shree, I have compiled all relevant fonts and attached them
>>>>>>>>> below. I am not sure how I can generate text data with them.
>>>>>>>>>
>>>>>>>>> On Tuesday, March 10, 2020 at 5:35:26 AM UTC+3, shree wrote:
>>>>>>>>>>
>>>>>>>>>> If you can share a large enough training text and fonts, I can
>>>>>>>>>> rerun the training.
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 10, 2020, 03:41 aby tesh <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey,
>>>>>>>>>>>
>>>>>>>>>>> I followed the steps in the readme file and started
>>>>>>>>>>> lstmtraining, but it seems my current computer's processor can't
>>>>>>>>>>> handle the training for a longer period of time.
>>>>>>>>>>>
>>>>>>>>>>> What can I do about it? When should I abort the training to get
>>>>>>>>>>> a good traineddata file? Or is there an accurate one that you
>>>>>>>>>>> can share?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e727f106-d668-44b5-9bba-8fad29fc1587%40googlegroups.com
>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e727f106-d668-44b5-9bba-8fad29fc1587%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>>
>

