Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-03 Thread Richard Wordingham via Unicode
On Tue, 3 Dec 2019 17:35:14 +0530
विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode 
wrote:

> On Tue, Dec 3, 2019 at 3:48 PM Richard Wordingham via Unicode <
> unicode@unicode.org> wrote:  

> > On Tue, 3 Dec 2019 02:05:35 +
> > Richard Wordingham via Unicode  wrote:  

> > The text in IAST that I encounter seems not to have anusvara before
> > stop consonants.  

> That's typical.
> Whatever the source script (if there is one), IAST tends to be used by
> people who follow the Sanskrit devanAgarI conventions pretty strictly
> (so it ends up being transcription rather than transliteration).
 
> > I believe 'sa' would naturally expand (are there
> > non-void prescribed rules on this?) as sa-Deva-IN, so perhaps the
> > sa-Latn I usually see is unusual as sa-t-m0-iast and the description
> > should be expanded to at least sa-t-m0-sa-150-iast if sa-Latn is not
> > precise enough.

> Not sure what 150 is doing there.

I read, albeit in an old book, that when Sanskrit was printed in
Devanagari in Europe, clusters phonetically composed of nasal plus
plosive were written using the nasal consonant, whereas in India they
were printed using anusvara.  The Sanskrit version of the UN Declaration
of Human Rights at Unicode (https://unicode.org/udhr/d/udhr_san.html)
conforms to the Indian pattern by using anusvara instead of clusters,
but I don't know where the translation actually came from.

Accordingly, I thought that to get clusters instead of anusvara before
plosives, I should select Sanskrit as used in Europe, as opposed to
Sanskrit as used in India.  '150' is the UN M.49 region code for Europe,
which BCP 47 accepts as a region subtag.

Richard.




Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-03 Thread Vishvas Vasuki
On Tue, Dec 3, 2019 at 5:07 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

>
> However, as a locale for generated text, I feel it is inadequate.
> Wouldn't the expansion rules generate saṃti from संति rather than santi
> from सन्ति for 'they are'?


True. I suppose that someone wanting to replicate the "anusvAra instead of
nasal" shorthand in IAST would use a Dravidian source script or a
non-Sanskrit source language, or ask for a modifier to be added after
"iast", like t-sa-m0-iast-anusvarashorthand.
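The "anusvara instead of nasal" shorthand can be illustrated on the Devanagari side. The sketch below is a hypothetical helper, not part of CLDR or of any published IAST transform: it rewrites a class nasal + virama + homorganic stop as anusvara + stop, so सन्ति becomes संति, while other clusters such as न्म in जन्म are left alone.

```python
import re

VIRAMA = "\u094D"    # ्
ANUSVARA = "\u0902"  # ं

# Hypothetical table: each Devanagari class nasal and the stops it is homorganic with.
HOMORGANIC_STOPS = {
    "\u0919": "\u0915\u0916\u0917\u0918",  # ङ : क ख ग घ (velar)
    "\u091E": "\u091A\u091B\u091C\u091D",  # ञ : च छ ज झ (palatal)
    "\u0923": "\u091F\u0920\u0921\u0922",  # ण : ट ठ ड ढ (retroflex)
    "\u0928": "\u0924\u0925\u0926\u0927",  # न : त थ द ध (dental)
    "\u092E": "\u092A\u092B\u092C\u092D",  # म : प फ ब भ (labial)
}


def clusters_to_anusvara(text: str) -> str:
    """Replace nasal + virama + homorganic stop with anusvara + stop."""
    for nasal, stops in HOMORGANIC_STOPS.items():
        pattern = nasal + VIRAMA + "([" + stops + "])"
        text = re.sub(pattern, ANUSVARA + r"\1", text)
    return text


print(clusters_to_anusvara("\u0938\u0928\u094D\u0924\u093F"))  # सन्ति -> संति
```

Running an IAST transform after this step would then yield saṃti rather than santi.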

-- 
--
Vishvas /विश्वासः


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-03 Thread Vishvas Vasuki
On Tue, Dec 3, 2019 at 3:48 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> On Tue, 3 Dec 2019 02:05:35 +
> Richard Wordingham via Unicode  wrote:

> The text in IAST that I encounter seems not to have anusvara before
> stop consonants.

That's typical.
Whatever the source script (if there is one), IAST tends to be used by
people who follow the Sanskrit devanAgarI conventions pretty strictly (so
it ends up being transcription rather than transliteration).



> I believe 'sa' would naturally expand (are there
> non-void prescribed rules on this?) as sa-Deva-IN, so perhaps the
> sa-Latn I usually see is unusual as sa-t-m0-iast and the description
> should be expanded to at least sa-t-m0-sa-150-iast if sa-Latn is not
> precise enough.
>

Not sure what 150 is doing there.

-- 
--
Vishvas /विश्वासः


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-03 Thread Richard Wordingham via Unicode
I think the 'Latn' in sa-Latn-t-sa-m0-iast is unnecessary, though it
partly depends on the range of the IAST transform.  If the
transformation can only convert to the Roman script, then 'Latn' is
superfluous; I'm not sure whether the extension is formally specified
tightly enough to rule out Devanagari.  On the other hand, some people
seem to think that there is an IAST transformation to Cyrillic.

However, as a locale for generated text, I feel it is inadequate.
Wouldn't the expansion rules generate saṃti from संति rather than santi
from सन्ति for 'they are'? Or have better fonts changed Indian practice?
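Going the other way, a generator that wants santi rather than saṃti would first have to normalise anusvara before a stop into the homorganic nasal cluster. A minimal sketch of that normalisation follows; the function and table names are hypothetical, and it covers only the five stop classes, leaving anusvara before sibilants, semivowels, h, or at the end of a word untouched.

```python
import re

VIRAMA = "\u094D"    # ्
ANUSVARA = "\u0902"  # ं

# Hypothetical table of the five stop classes: (nasal, stops of that class).
STOP_CLASSES = [
    ("\u0919", "\u0915\u0916\u0917\u0918"),  # velar: ङ before क ख ग घ
    ("\u091E", "\u091A\u091B\u091C\u091D"),  # palatal: ञ before च छ ज झ
    ("\u0923", "\u091F\u0920\u0921\u0922"),  # retroflex: ण before ट ठ ड ढ
    ("\u0928", "\u0924\u0925\u0926\u0927"),  # dental: न before त थ द ध
    ("\u092E", "\u092A\u092B\u092C\u092D"),  # labial: म before प फ ब भ
]
NASAL_FOR_STOP = {stop: nasal for nasal, stops in STOP_CLASSES for stop in stops}

# Anusvara immediately followed by any of the stops above.
ANUSVARA_BEFORE_STOP = re.compile(ANUSVARA + "([" + "".join(NASAL_FOR_STOP) + "])")


def anusvara_to_clusters(text: str) -> str:
    """Rewrite anusvara + stop as homorganic nasal + virama + stop."""
    return ANUSVARA_BEFORE_STOP.sub(
        lambda m: NASAL_FOR_STOP[m.group(1)] + VIRAMA + m.group(1), text
    )


print(anusvara_to_clusters("\u0938\u0902\u0924\u093F"))  # संति -> सन्ति
```

Because anusvara before स is not touched, a word such as संस्कृत comes through unchanged, which matches the usual practice of keeping ṃ before sibilants.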

Richard.



Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-03 Thread Richard Wordingham via Unicode
On Tue, 3 Dec 2019 02:05:35 +
Richard Wordingham via Unicode  wrote:


> I'm still trying to work out what to do for IAST.  Is it just:
> 
> sa-t-m0-iast
> 
> if one finds that
> 
> sa-Latn
> 
> allows too much latitude?

For material that is a transcription rather than a transliteration, are
there regional preferences for the homorganic nasals when writing in
the writing systems generated by IAST?

> How does one choose between anusvara and specific consonants
> for homorganic nasals? Is it sa-150-t-m0-iast v. sa-IN-t-m0-iast?

As these tags, strictly speaking, define locales, I think I put the
region in the wrong place.  Perhaps they should be:

sa-t-m0-sa-150-Deva-iast v. sa-t-m0-sa-IN-Deva-iast

As a locale, is the latter the same as sa-t-m0-sa-IN-Mlym?  I'm not
sure how the preference for writing homorganic nasals varies by region
and by script.  What is the scope of IAST?  Does sa-t-m0-sa-Thai
exist?  sa-Thai seems to prefer the nasal stops to anusvara before
oral stops.
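In RFC 6497 the source language tag comes straight after the 't' singleton and the m0 field follows it; within the source tag the usual BCP 47 order (language, script, region) applies. On that reading, a source region such as 150 or IN would sit inside the source tag, giving something like sa-t-sa-deva-150-m0-iast. The helper below is a hypothetical sketch that only assembles the pieces in that order; it does not validate them.

```python
from typing import Optional


def build_t_tag(result_tag: str, source_tag: Optional[str] = None,
                m0: Optional[str] = None) -> str:
    """Assemble result_tag-t-[source_tag]-[m0-value] (hypothetical helper).

    The t extension is lowercased, as in the RFC 6497 examples; the
    result tag is left as given.
    """
    ext = ["t"]
    if source_tag:
        ext.append(source_tag)  # e.g. "sa-Deva-150": language, script, region
    if m0:
        ext += ["m0", m0]       # the transform mechanism follows the source tag
    if len(ext) == 1:
        raise ValueError("a t extension needs a source tag or at least one field")
    return result_tag + "-" + "-".join(ext).lower()


print(build_t_tag("sa", "sa-Deva-150", "iast"))  # sa-t-sa-deva-150-m0-iast
print(build_t_tag("sa", "sa-Deva-IN", "iast"))   # sa-t-sa-deva-in-m0-iast
```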

The text in IAST that I encounter seems not to have anusvara before
stop consonants.  I believe 'sa' would naturally expand (are there
non-void prescribed rules on this?) as sa-Deva-IN, so perhaps the
sa-Latn I usually see is unusual as sa-t-m0-iast and the description
should be expanded to at least sa-t-m0-sa-150-iast if sa-Latn is not
precise enough.

Can someone advise?

Richard.


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Mark Davis ☕️ via Unicode
Filed the following, thanks Richard.
CLDR-13445: Release link for "latest" goes to zip file

On Tue, Dec 3, 2019 at 2:31 AM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> On Mon, 2 Dec 2019 09:09:02 -0800
> Markus Scherer via Unicode  wrote:
>
> > On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode <
> > unicode@unicode.org> wrote:
> >
> > > You don't need an ISO 15924 script code. You need to think in terms
> > > of BCP 47. Sanskrit in Latin would be sa-Latn.
> > >
> >
> > Right!
> >
> > Now, if you want to distinguish the different transcription systems
> > for
> > > writing Sanskrit in Latin, you can apply to registry a BCP 47
> > > variant. There are also BCP 47 extension T, which may also be
> > > useful to you:
> > >
> > > https://tools.ietf.org/html/rfc6497
> > >
> >
> > And that extension is administered by Unicode, with documentation and
> > data here:
> > http://www.unicode.org/reports/tr35/tr35.html#t_Extension
>
> But that says that the definitions are at
>
> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml
> ,
> but all one currently gets from that is an error message 'XML Parsing
> Error: no element found'.
>


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Vishvas Vasuki
On Tue, Dec 3, 2019 at 7:28 AM Markus Scherer  wrote:

>
> The subtag I would use for IAST seems to be:
>> sa-Latn-t-sa-m0-iast (https://r12a.github.io/app-subtags/ is unable to
>> confirm that the extension
>> 
>> t-sa-m0-iast  is all right though.. Could someone confirm?)
>>
>
> I assume that the second "sa" is unnecessary, but I am not very familiar
> with the -t- extension.
>

The example und-Cyrl-t-und-latn-m0-ungegn-2007 in
https://tools.ietf.org/rfc/rfc6497.txt led me to use:
sa-Latn-t-sa-Zyyy-m0-iast for my case.
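To see how the RFC's example divides up, here is a hypothetical splitter: everything before the 't' singleton is the tag of the result, the subtags up to the first field key (a letter followed by a digit, such as m0) form the source tag, and each key owns the value subtags that follow it. It assumes the tag carries no other extensions or private-use subtags, and it does no validation.

```python
from typing import Dict, Tuple


def is_tkey(subtag: str) -> bool:
    """A t-extension field key is one letter followed by one digit, e.g. 'm0'."""
    return len(subtag) == 2 and subtag[0].isalpha() and subtag[1].isdigit()


def split_t_extension(tag: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (result tag, source tag, {key: value}) for a tag with a -t- part."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "t" not in lowered:
        return tag, "", {}
    t_index = lowered.index("t")
    result = "-".join(subtags[:t_index])
    rest = lowered[t_index + 1:]

    # The source tag runs until the first field key.
    i = 0
    while i < len(rest) and not is_tkey(rest[i]):
        i += 1
    source = "-".join(rest[:i])

    # Each key collects the value subtags that follow it.
    fields: Dict[str, list] = {}
    key = ""
    for subtag in rest[i:]:
        if is_tkey(subtag):
            key = subtag
            fields[key] = []
        else:
            fields[key].append(subtag)
    return result, source, {k: "-".join(v) for k, v in fields.items()}


print(split_t_extension("und-Cyrl-t-und-latn-m0-ungegn-2007"))
# ('und-Cyrl', 'und-latn', {'m0': 'ungegn-2007'})
print(split_t_extension("sa-Latn-t-sa-Zyyy-m0-iast"))
# ('sa-Latn', 'sa-zyyy', {'m0': 'iast'})
```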


>
> Then, the next step seems to be to propose to add the below to
>> https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml
>> :
>> ISO 15919, Kyoto-Harvard, ITRANS, Velthuis, SLP1, WX, National Library at
>> Kolkata romanisation
>> How to proceed with that?
>>
>
> I would start with filing a CLDR ticket:
> http://cldr.unicode.org/index/bug-reports
>

Thanks! I've filed https://unicode-org.atlassian.net/browse/CLDR-13444 .



>
> Best regards,
> markus
>


-- 
--
Vishvas /विश्वासः


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Richard Wordingham via Unicode
On Tue, 3 Dec 2019 01:27:39 +
Richard Wordingham  wrote:

> On Mon, 2 Dec 2019 09:09:02 -0800
> Markus Scherer via Unicode  wrote:
> 
> > On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode <  
> > unicode@unicode.org> wrote:
> >   
> > > You don't need an ISO 15924 script code. You need to think in
> > > terms of BCP 47. Sanskrit in Latin would be sa-Latn.
> > >
> > 
> > Right!
> > 
> > Now, if you want to distinguish the different transcription systems
> > for  
> > > writing Sanskrit in Latin, you can apply to registry a BCP 47
> > > variant. There are also BCP 47 extension T, which may also be
> > > useful to you:
> > >
> > > https://tools.ietf.org/html/rfc6497
> > >
> > 
> > And that extension is administered by Unicode, with documentation
> > and data here:
> > http://www.unicode.org/reports/tr35/tr35.html#t_Extension  
> 
> But that says that the definitions are at
> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml
>  ,
> but all one currently gets from that is an error message 'XML Parsing
> Error: no element found'.

A working URI is
https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml .

I'm still trying to work out what to do for IAST.  Is it just:

sa-t-m0-iast

if one finds that

sa-Latn

allows too much latitude?

How does one choose between anusvara and specific consonants
for homorganic nasals? Is it sa-150-t-m0-iast v. sa-IN-t-m0-iast?

Richard.


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Markus Scherer via Unicode
On Mon, Dec 2, 2019 at 5:47 PM विश्वासो वासुकिजः (Vishvas Vasuki) via
Unicode  wrote:

> But that says that the definitions are at
>>
>
>> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml
>> ,
>> but all one currently gets from that is an error message 'XML Parsing
>> Error: no element found'.
>>
>
> Yes - that needs to be fixed (+markda...@google.com - could you please? )
>
> https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml
> shows iast!
>

FYI A working link to the version in the latest release is
https://github.com/unicode-org/cldr/blob/latest/common/bcp47/transform.xml
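Once the file itself is reachable, checking which m0 values it defines takes only a few lines of XML handling. The sketch below assumes that common/bcp47/transform.xml holds a key element with extension="t" and name="m0" containing one type element per mechanism; the SAMPLE string is an abbreviated stand-in for the real file, not a copy of it. To check the actual list, pass the downloaded file's contents (bytes or str) instead of SAMPLE.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for common/bcp47/transform.xml (assumed layout).
SAMPLE = """<ldmlBCP47>
  <keyword>
    <key extension="t" name="m0">
      <type name="bgn"/>
      <type name="iast"/>
      <type name="ungegn"/>
    </key>
  </keyword>
</ldmlBCP47>"""


def m0_mechanisms(xml_text) -> list:
    """List the type names declared under the t extension's m0 key."""
    root = ET.fromstring(xml_text)
    return [
        type_el.get("name")
        for key in root.iter("key")
        if key.get("extension") == "t" and key.get("name") == "m0"
        for type_el in key.iter("type")
    ]


print(m0_mechanisms(SAMPLE))            # ['bgn', 'iast', 'ungegn']
print("iast" in m0_mechanisms(SAMPLE))  # True
```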

The subtag I would use for IAST seems to be:
> sa-Latn-t-sa-m0-iast (https://r12a.github.io/app-subtags/ is unable to
> confirm that the extension
> 
> t-sa-m0-iast  is all right though.. Could someone confirm?)
>

I assume that the second "sa" is unnecessary, but I am not very familiar
with the -t- extension.

Then, the next step seems to be to propose to add the below to
> https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml
> :
> ISO 15919, Kyoto-Harvard, ITRANS, Velthuis, SLP1, WX, National Library at
> Kolkata romanisation
> How to proceed with that?
>

I would start with filing a CLDR ticket:
http://cldr.unicode.org/index/bug-reports

Best regards,
markus


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Vishvas Vasuki
On Tue, Dec 3, 2019 at 6:59 AM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> > > You don't need an ISO 15924 script code. You need to think in terms
> > > of BCP 47. Sanskrit in Latin would be sa-Latn.
> > >
> >
> > Right!
> >
> > Now, if you want to distinguish the different transcription systems
> > for
> > > writing Sanskrit in Latin, you can apply to registry a BCP 47
> > > variant. There are also BCP 47 extension T, which may also be
> > > useful to you:
> > >
> > > https://tools.ietf.org/html/rfc6497
> > >
> >
> > And that extension is administered by Unicode, with documentation and
> > data here:
> > http://www.unicode.org/reports/tr35/tr35.html#t_Extension
>
> Thanks for the pointers!



> But that says that the definitions are at
>
> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml
> ,
> but all one currently gets from that is an error message 'XML Parsing
> Error: no element found'.
>

Yes, that needs to be fixed (+markda...@google.com, could you please?)

https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml
shows iast!

The subtag I would use for IAST seems to be:
sa-Latn-t-sa-m0-iast (https://r12a.github.io/app-subtags/ is unable to
confirm that the extension t-sa-m0-iast is all right, though. Could
someone confirm?)
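One way to answer the "is it all right" question mechanically is to check the -t- portion against the RFC 6497 grammar: an optional source language tag, then fields made of a key (letter + digit) and one or more value subtags of 3 to 8 alphanumerics. The regular expression below is a simplified, hypothetical transcription of that grammar. It ignores extlang and grandfathered tags and assumes nothing follows the t extension. It accepts t-sa-m0-iast, and it rejects a field value that contains a two-letter language code or a subtag longer than eight characters, such as a long anusvara-shorthand modifier.

```python
import re

# Simplified source language tag: language[-script][-region]*(-variant)
TLANG = (r"[a-z]{2,8}"
         r"(?:-[a-z]{4})?"
         r"(?:-(?:[a-z]{2}|[0-9]{3}))?"
         r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*")
# Field: key (letter + digit) followed by one or more 3-8 character subtags
TFIELD = r"[a-z][0-9](?:-[a-z0-9]{3,8})+"
T_EXTENSION = re.compile(
    "t(?:-" + TLANG + "(?:-" + TFIELD + ")*|(?:-" + TFIELD + ")+)"
)


def has_valid_t_extension(tag: str) -> bool:
    """True if the part of tag from the 't' singleton onward fits the grammar."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return False
    ext = "-".join(subtags[subtags.index("t"):])
    return T_EXTENSION.fullmatch(ext) is not None


for candidate in ["sa-Latn-t-sa-m0-iast", "sa-t-m0-sa-150-iast",
                  "sa-t-sa-deva-150-m0-iast"]:
    print(candidate, has_valid_t_extension(candidate))
```

The loop prints True, False and True: in sa-t-m0-sa-150-iast the two-letter "sa" falls inside the m0 value, where only 3 to 8 character subtags are allowed.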

Then the next step seems to be to propose adding the following schemes to
https://github.com/unicode-org/cldr/blob/master/common/bcp47/transform.xml

ISO 15919, Kyoto-Harvard, ITRANS, Velthuis, SLP1, WX, National Library at
Kolkata romanisation.

How should I proceed with that?


-- 
--
Vishvas /विश्वासः


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Richard Wordingham via Unicode
On Mon, 2 Dec 2019 09:09:02 -0800
Markus Scherer via Unicode  wrote:

> On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode <
> unicode@unicode.org> wrote:  
> 
> > You don't need an ISO 15924 script code. You need to think in terms
> > of BCP 47. Sanskrit in Latin would be sa-Latn.
> >  
> 
> Right!
> 
> Now, if you want to distinguish the different transcription systems
> for
> > writing Sanskrit in Latin, you can apply to registry a BCP 47
> > variant. There are also BCP 47 extension T, which may also be
> > useful to you:
> >
> > https://tools.ietf.org/html/rfc6497
> >  
> 
> And that extension is administered by Unicode, with documentation and
> data here:
> http://www.unicode.org/reports/tr35/tr35.html#t_Extension

But that says that the definitions are at
https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml
 ,
but all one currently gets from that is an error message 'XML Parsing
Error: no element found'.


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Markus Scherer via Unicode
On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode <
unicode@unicode.org> wrote:

> You don't need an ISO 15924 script code. You need to think in terms of BCP
> 47. Sanskrit in Latin would be sa-Latn.
>

Right!

Now, if you want to distinguish the different transcription systems for
> writing Sanskrit in Latin, you can apply to registry a BCP 47 variant.
> There are also BCP 47 extension T, which may also be useful to you:
>
> https://tools.ietf.org/html/rfc6497
>

And that extension is administered by Unicode, with documentation and data
here:
http://www.unicode.org/reports/tr35/tr35.html#t_Extension

Best regards,
markus


Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Roozbeh Pournader via Unicode
You don't need an ISO 15924 script code. You need to think in terms of
BCP 47. Sanskrit in Latin would be sa-Latn. Now, if you want to distinguish
the different transcription systems for writing Sanskrit in Latin, you can
apply to register a BCP 47 variant. There is also the BCP 47 extension T,
which may also be useful to you:

https://tools.ietf.org/html/rfc6497

On Mon, Dec 2, 2019, 7:48 AM विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode
 wrote:

> bcc:   as an FYI - plz respond on
> the unicode mailing list as needed.
>
> namaste!
>
> Sanskrit has traditionally been written in a variety of scripts ranging
> from Sharada to Grantha. In the past two centuries, it has also been
> written in Latin-based scripts (please see
> https://en.wikipedia.org/wiki/Devanagari_transliteration ). We
> would like these Latin-based scripts (IAST, ISO 15919, Kyoto-Harvard,
> ITRANS, Velthuis, SLP1, WX, National Library at Kolkata romanisation) to be
> included in the https://unicode.org/iso15924/iso15924-codes.html list.
>
> The reason is that we would like to be able to present Sanskrit text in a
> variety of scripts and representations (see related thread), and search
> engines like Google recommend using ISO 15924 to specify the script.
> Please guide us as to how to proceed.
>
> --
> --
> Vishvas /विश्वासः
>
>