from:"Shriramana Sharma via Unicode"

Re: emojis for mouse buttons?

2019-12-31 Thread Shriramana Sharma via Unicode

Why are these called "emojis" for mouse buttons rather than just
"characters" for them?

On Tue, 31 Dec, 2019, 18:45 Philippe Verdy via Unicode, 
wrote:

> A lot of applications need to document their keymaps and want to display
> keys.
>
> For now there are emojis for mice (several variants: 1, 2 or 3 buttons),
> independently of the button actually pressed.
>
> However there's no simple emoji to represent the very common mouse click
> buttons used in a lot of UIs.
>
> But it would be good to have emojis for the left, center, and right click
> (showing a mouse with the correct button filled in black), instead of
> writing "left click" in plain text.
>
> Has it been proposed?
>
> See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts
>
>

Re: Not accepted by UTC but in ISO ballot?

2019-12-27 Thread Shriramana Sharma via Unicode

Hello Ken and thanks for the reply. So I understand that the need for this
category is rare but occurs nevertheless.

Now I'm wondering about the similar category "not accepted by UTC, and not
in ISO ballot" – why such a character would be mentioned on the pipeline at
all…

On Fri, 27 Dec, 2019, 07:19 Ken Whistler,  wrote:

> Shriramana,
>
> On 12/20/2019 6:29 PM, Shriramana Sharma via Unicode wrote:
> > I was looking at the pipeline for something else, and for the first
> > time I see a character category: “not accepted by the UTC but in ISO
> > ballot” and two characters in it.
> Those two characters changed status as of December 4, when the
> disposition of comments for CD3 was posted. They will not be part of the
> DIS ballot. The pipeline has now been updated to reflect that change of
> status.
> >
> > So IIUC while technically people are free to submit a document to the
> > ISO separately without submitting to UTC, it has always been the
> > practice to my knowledge to get a character approved by the UTC first.
>
> That is a preferred process, but doesn't always occur. The most obvious
> exception is that large new CJK repertoire additions are developed by
> the IRG and often go into ballot in ISO before the UTC takes a formal
> decision to approve them. CJK Extension G has now been approved for 13.0
> by the UTC, but the entire block was listed in the pipeline for some
> time as "not accepted by UTC, but in active ISO technical ballot" once
> Extension G went into CD balloting.
>
> --Ken
>
>

Re: Re: NBSP supposed to stretch, right?

2019-12-22 Thread Shriramana Sharma via Unicode

So I was wondering whether TeX only does this to the ~ input character or
the actual NBSP Unicode character too?

Re: NBSP supposed to stretch, right?

2019-12-21 Thread Shriramana Sharma via Unicode

On 12/19/19, James Kass via Unicode  wrote:
>
> There's a bug report for the LibreOffice application here...
> https://bugs.documentfoundation.org/show_bug.cgi?id=41652
> ...which shows an interesting history of the situation.

LOL two years ago almost to the date Shriramana Sharma seems to have
already *quoted* the Unicode Standard on this
(https://bugs.documentfoundation.org/show_bug.cgi?id=41652#c30):

The Unicode standard document http://unicode.org/reports/tr14/ clearly
states that:

When expanding or compressing interword space according to common
typographical practice, only the spaces marked by U+0020 SPACE and
U+00A0 NO-BREAK SPACE are subject to compression, and only spaces
marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces
marked by U+2009 THIN SPACE are subject to expansion. All other space
characters normally have fixed width.
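
The quoted sentence can be read as a small lookup rule. The following is only a sketch of that reading: the set names and the `justification_role` function are made up for illustration, and "occasionally" for THIN SPACE is simplified to "expand-only".

```python
# Sketch only: encodes the quoted UAX #14 sentence as two lookup sets.
# The names below are illustrative, not taken from any real layout engine.
COMPRESSIBLE = {"\u0020", "\u00A0"}           # SPACE, NO-BREAK SPACE
EXPANDABLE = {"\u0020", "\u00A0", "\u2009"}   # the same two, plus THIN SPACE

def justification_role(ch: str) -> str:
    """Return how a space character may change width during justification."""
    if ch in COMPRESSIBLE:
        return "compress+expand"
    if ch in EXPANDABLE:
        return "expand-only"
    return "fixed"

# U+2007 FIGURE SPACE stands in for "all other space characters".
for ch in ("\u0020", "\u00A0", "\u2009", "\u2007"):
    print(f"U+{ord(ch):04X} -> {justification_role(ch)}")
```

Under this reading NBSP lands in the same bucket as the ordinary space, which is the behaviour asked for in this thread.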

But we have some people there on that bug saying that:

While Unicode is an important standard, it's only of secondary
importance to an office suite. Its primary goal is *not* creating a
reference conformant implementation of the standard; rather, it should
use the standard to the extent it needs to serve its users most.

which is a 😒 approach in my eyes but well, that's how the real world
is on many things. Anyhow the above comment is continued as:

And if legacy requires that some statements of standard be violated to
keep existing documents intact, that should be that way, until a
better design is invented and implemented, which would make possible
to please both sides.

This means option #1 I mentioned earlier and which seems to already
have been discussed in the bug discussion: provide a per-document
option or at least a Word-compatibility option as to how to treat
NBSP.

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Long standing problem with Vedic tone markers and post-base visarga/anusvara

2019-12-20 Thread Shriramana Sharma via Unicode

https://github.com/harfbuzz/harfbuzz/issues/2017 should provide the
context for this.

Ever since the early days of Devanagari Unicode, scholars like me
dealing with Vedic Sanskrit orthography have been experiencing this
problem, but chalked it up to the early days and consequent insufficient
support for Vedic sequences. Even now, Vedic support even on the font
side is quite limited, and we also find limitations on the software
side. So I hope it's time to fix them one by one.

The issue I would like to discuss now is as follows:

# SEMANTIC DISSOCIATION OF THE VISARGA FROM THE SYLLABLE

In Vedic, syllables that carry tone markers – which are mostly
above-base or below-base – often have to take a visarga, which is
always post-base. In this case, the sequence intuitive to native
scholars like me is:

<syllable> + <tone marker> + <visarga>

This is because the tone marker indicates the tone of the syllable (or
its vowel) and the visarga is a separate aspirated sound *after* the
syllable to which the tone marker doesn't apply.

In fact, the only reason the visarga sign is analysed as a combining
mark rather than a separate letter is that it is not used in isolation
without a preceding syllable. Otherwise, i.e. linguistically, it doesn't
modify the preceding syllable in any way.

Anyhow, the point is that the tone marker should come before the
visarga because it semantically applies to the preceding syllable and
not the visarga.

This is all the more so since in some Vedic contexts (Sama Gana) the
visarga is far separated from the syllable by other syllables like
digits (themselves carrying combining marks) or spacing anusvara, as
seen in examples from my Grantha proposal L2/09-372 p 40.

So the visarga is semantically quite dissociated from the preceding
syllable, unlike the tone marker, which is intimately associated with
it.

# SAME APPLICABLE TO THE ANUSVARA

The same argument is also applicable to the anusvara as it also
represents a nasal sound separate from the preceding syllable. (The
candrabindu OTOH nasalises the preceding syllable itself.)

The above Grantha proposal page also shows an example where an
anusvara is orthographically separated from the preceding syllable by
three characters: a tone marker + avagraha + digit. L2/15-178 shows
that in equivalent contexts of Devanagari the digit 0 is used as a
substitute since the Devanagari anusvara is non-spacing.

All this goes to the dissociation from the syllable of the anusvara –
just like the visarga – compared to tone markers. So to be consistent,
even in case of Devanagari (or such script) where the anusvara is
non-spacing, the sequence when a tone marker is also involved puts the
tone marker first, as mentioned before:

<syllable> + <tone marker> + <anusvara>

# CURRENT SITUATION INCOMPATIBLE WITH ABOVE

However, even the simplest Vedic sequence (not involving Sama Vedic or
multiple tone marker combinations) like दे॒वेभ्य॑ः throws up a dotted
circle, and one is expected (see developer feedback in that bug
report) to input the visarga before tone markers, hoping the software
is intelligent enough to skip over the visarga (or anusvara) and place the
tone marker over the preceding syllable correctly. Why it is necessary
to put the visarga first in input only to have to skip over it in
shaping is beyond me.
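
To make the two input orders concrete, here is a small sketch using Python's standard `unicodedata` module. The word is spelled out by code point (as I read the sequence quoted above), and the variable and function names are only illustrative.

```python
import unicodedata

# The word from the message, spelled out by code point so the order is explicit.
# Order used here: ..., YA, tone marker (U+0951), then visarga (U+0903).
tone_first = "\u0926\u0947\u0952\u0935\u0947\u092D\u094D\u092F\u0951\u0903"
# The order the shaping engines are said to expect: visarga before the tone marker.
visarga_first = tone_first[:-2] + "\u0903\u0951"

def describe(text):
    """List (code point, name, general category, combining class) per character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c),
         unicodedata.category(c), unicodedata.combining(c))
        for c in text
    ]

# Show the last three characters: YA, the tone marker, the visarga.
for row in describe(tone_first[-3:]):
    print(row)

# The visarga is a spacing mark with combining class 0, so normalization
# does not reorder it relative to the tone marker.
nfc_a = unicodedata.normalize("NFC", tone_first)
nfc_b = unicodedata.normalize("NFC", visarga_first)
print("same after NFC:", nfc_a == nfc_b)
```

Since normalization keeps the two spellings apart, whichever order is settled on has to be handled by the shaping engines (or by input methods); it will not be reconciled automatically at the encoding level.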

So it makes sense neither from a linguistic nor a technological perspective
to push the tone markers to the end of the syllable. Even the
developers acknowledge that non-spacing marks are normally (ie outside
Indic) input before spacing ones.

However, they say “we can't support that in this particular case
because this is how Microsoft does it and we have to follow suit to
ensure people get the same shaping for the same input”,
notwithstanding the fact that the expectation to put the
visarga/anusvara first is non-sensical as explained above.

So everyone is looking to Microsoft Uniscribe (or whatever its
successor is) to fix things first before they can follow. I figured
that if this is discussed and decided here, everyone can fix it at the
same time.

--
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: NBSP supposed to stretch, right?

2019-12-20 Thread Shriramana Sharma via Unicode

On 12/21/19, Richard Wordingham via Unicode  wrote:
> On Fri, 20 Dec 2019 17:25:17 +0530
> Shriramana Sharma via Unicode  wrote:
>
>> I don't expect NBSP to ever disappear, because spaces disappear only
>> at linebreaks, and NBSP simply doesn't stand at linebreaks.
>
> I can certainly imagine someone writing "  ".

You don't need to go so far. Even the Unicode characters can be
entered: A0 0A (which makes for a nice smiley like pattern, two ears
besides two eyes 😉).

Obviously we are talking about *automatic* linebreaks. IIUC the point
about NBSP is that *it itself* doesn't break, whereas SP breaks up and
is *replaced* by a linebreak.

Nobody said anything about manual linebreak characters *following* a
space character, whether SP or NBSP or anything else.

I also just tested and noticed something related: in my wordprocessor
(LibreOffice Writer) when the cursor is near the end of a line and the
horizontal space remaining on that line is less than the nominal
advance width of the space, pressing space doesn't advance the cursor
(or maybe it does and I don't see it) irrespective of whether the
paragraph is left-aligned or justified, whereas inputting NBSP goes to
the next line, pulling the word before it along with it. This is
consistent with the current fixed-width NBSP behaviour of these
wordprocessors.

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: [EXTERNAL] Re: NBSP supposed to stretch, right?

2019-12-20 Thread Shriramana Sharma via Unicode

On 12/21/19, Shriramana Sharma  wrote:
> 1)
>
> With the existing single NBSP character, provide a software option to
> either make it flexible or inflexible, but this preference should be
> stored as part of the document and not the application settings, else
> shared documents would not preserve the layout intended by the
> creator.

One thing I forgot: are there any possibilities that *both* behaviours
would be required in the same document?

As far as I can imagine, I, who expect NBSP to be flexible, won't use it
between text and punctuation like those Word users do, and they probably
won't use it the way I do.

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: [EXTERNAL] Re: NBSP supposed to stretch, right?

2019-12-20 Thread Shriramana Sharma via Unicode

On 12/21/19, Murray Sargent  wrote:
> I checked with the Word team and they actually tried out stretching NBSP
> back in 2015 in the "good client" mode. But customer feedback was negative.
> The problem is that NBSP is used sometimes when stretching isn't wanted such
> as between the end of a question and the question mark or in multi-word
> trademarks or in italic expressions such as ad infinitum. Another example is
> Text«quotation»moretext. One doesn't want the « and » to
> be spaced apart from "quotation" for justification purposes.
>
> Conceivably Word should offer a special justification option to stretch
> NBSP, but user feedback has revealed that it's not a good default option.

Ohkay and that's very nice meaningful feedback from actual
developer+user interaction. So the way I look at this going forward is
that we have four options:

1)

With the existing single NBSP character, provide a software option to
either make it flexible or inflexible, but this preference should be
stored as part of the document and not the application settings, else
shared documents would not preserve the layout intended by the
creator.

2)

Consider that the non-stretching behaviour of wordprocessors (probably
following MS Word) is correct, and encode a new NBFSP non-breaking
flexible space. [I'm looking at that convenient hole at 2065.]

DTP software like InDesign/TeX (and browsers like Firefox, though web
content is assumed to be more fluid typographically) should then
ideally conform to this and potentially break their users' documents
(esp in the case of DTP).

3)

Consider that the stretching behaviour of DTP software like InDesign
is correct, and encode a new FWNBSP fixed-width non-breaking space [at
2065].

Wordprocessors should then ideally conform to this and potentially
break their users' documents.

4)

Leave alone the existing ambiguous behaviour of NBSP, and encode two
new characters [Supplemental Punctuation has space at 2E50…] for NBFSP
and FW-NBSP. Like the existing 2028 and 2029 Line and Paragraph
Separators with the annotation: “may be used to represent this
semantic unambiguously”.
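
Option 1 could look roughly like the sketch below. Everything in it is hypothetical: `DocumentSettings`, `nbsp_stretches` and `extra_per_space` are invented names, and real justification also involves font metrics and compression limits that are ignored here.

```python
from dataclasses import dataclass

SP, NBSP = "\u0020", "\u00A0"

@dataclass
class DocumentSettings:
    # Hypothetical per-document preference (option 1): saved with the file,
    # not in the application settings, so shared documents keep their layout.
    nbsp_stretches: bool = False

def extra_per_space(line: str, slack: float, settings: DocumentSettings) -> float:
    """Split the leftover width `slack` evenly over the spaces allowed to stretch."""
    stretchable = line.count(SP)
    if settings.nbsp_stretches:
        stretchable += line.count(NBSP)
    return slack / stretchable if stretchable else 0.0

# Two ordinary spaces and one NBSP on the line, 12 units of slack to distribute.
line = "aa bb\u00A0cc dd"
word_style = extra_per_space(line, 12.0, DocumentSettings(nbsp_stretches=False))
dtp_style = extra_per_space(line, 12.0, DocumentSettings(nbsp_stretches=True))
print(word_style, dtp_style)
```

With the flag off, the two ordinary spaces absorb all the slack and the NBSP keeps its nominal width; with it on, all three spaces share the slack equally.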

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Not accepted by UTC but in ISO ballot?

2019-12-20 Thread Shriramana Sharma via Unicode

I was looking at the pipeline for something else, and for the first
time I see a character category: “not accepted by the UTC but in ISO
ballot” and two characters in it.

So IIUC while technically people are free to submit a document to the
ISO separately without submitting to UTC, it has always been the
practice to my knowledge to get a character approved by the UTC first.

Can anyone throw some light on these particular cases?

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: NBSP supposed to stretch, right?

2019-12-20 Thread Shriramana Sharma via Unicode

On 12/17/19, Asmus Freytag via Unicode  wrote:
> On 12/17/2019 2:41 AM, Shriramana Sharma via Unicode wrote:
>>
>> On Tue 17 Dec, 2019, 16:09 QSJN 4 UKR via Unicode, 
>> wrote:
>>>
>>> «The no-break space is not the same character as the figure space. The
>>> figure space is not a character defined in most computer system's
>>> current code pages. In some fonts this character's width has been
>>> defined as equal to the figure width. This is an incorrect usage of
>>> the character no-break space.»
>>
>>
>> Sorry but I don't understand how this addresses the issue I raised.
>
> You don't?
>
> In principle it may be true that NBSP is not fixed width, but show me
> software that doesn't treat it that way.
>
> In HTML, NBSP isn't subject to space collapse, therefore it's the go-to
> space character when you need some extra spacing that doesn't disappear.

So I never asked for NBSP to disappear. I said I want it to *stretch*.
And to my mind "stretch" means to become wider than one's normal
width. It doesn't include decreasing or disappearing width.

I don't expect NBSP to ever disappear, because spaces disappear only
at linebreaks, and NBSP simply doesn't stand at linebreaks.

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: NBSP supposed to stretch, right?

2019-12-17 Thread Shriramana Sharma via Unicode

On Tue 17 Dec, 2019, 16:09 QSJN 4 UKR via Unicode, 
wrote:

> Agree.
> By the way, it is common practice to use multiple nbsp in a row to
> create a larger span. In my opinion, it is wrong to replace fixed
> width spaces with non-breaking spaces.
> Quote from Microsoft Typography Character design standards:
> «The no-break space is not the same character as the figure space. The
> figure space is not a character defined in most computer system's
> current code pages. In some fonts this character's width has been
> defined as equal to the figure width. This is an incorrect usage of
> the character no-break space.»
>

Sorry but I don't understand how this addresses the issue I raised.

NBSP supposed to stretch, right?

2019-12-16 Thread Shriramana Sharma via Unicode

Hello. I've just tested LibreOffice, Google Docs and MS Office on
Linux, Android and Windows, and it seems that NBSP doesn't get
stretched like the normal space character when justified alignment
requires it.

Let me explain. I'm creating a document with the following text
typeset in 12 pt Lohit Tamil with justified alignment on an A5 page
with 0.5" margin all around:

ஶ்ரீமத் மஹாபாரதம் என்பது நமது தேசத்தின் பெரும் இதிஹாஸமாகும். இதனை
இயற்றியவர் ஶ்ரீ வேத வ்யாஸர். அவரால் அனுக்ரஹிக்கப்பட்டவையான நூல்கள் பல.

The screenshot
https://sites.google.com/site/jamadagni/files/temp/nbsp-not-expanding.png
may be useful to illustrate the situation. Readers may try such
similar sentences in any software/platform of their choice and report
as to what happens.

Here the problem arises with the phrase ஶ்ரீ வேத வ்யாஸர். The word
ஶ்ரீ is an honorific applying to the following name of the sage வேத
வ்யாஸர், so it would seem unsightly to the reader if it goes to the
previous line, so I insert an NBSP between it and the name. (Isn't
there such a stylistic convention in English where Mr doesn't stand at
the end of a line? I don't know.)

However, the phrase is shortly followed by a long word
அனுக்ரஹிக்கப்பட்டவையான, which is too long to fit on the same line and
hence goes to the next line, thereby increasing the inter-word spacing
on its previous line significantly. But the NBSP after the honorific
doesn't stretch, making the word layout unsightly.

IIUC, no-break space is just that: a space that doesn't permit a line
break. This says nothing about it being fixed width.

Unicode 12.0 §2.3 on p 27 (55 of PDF) says:

“Other compatibility decomposable characters are widely used
characters serving essential functions. U+00A0 no-break space is one
example. In these and similar cases, such as fixed-width space
characters,….”

To my understanding this itself says that NBSP isn't fixed-width.
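
The compatibility decomposition mentioned in that passage can be inspected with Python's standard `unicodedata` module. This is just a quick check; note that the data only records a `<noBreak>` relation to U+0020 and says nothing about width or stretching.

```python
import unicodedata

# Print the compatibility decomposition of NBSP and two other space characters.
for ch in ("\u00A0", "\u2007", "\u2009"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", unicodedata.decomposition(ch))

# Compatibility normalization folds NBSP into an ordinary space,
# which also discards its no-break behaviour.
folded = unicodedata.normalize("NFKC", "a\u00A0b")
print(repr(folded))
```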

ibid §6.2 on p 265 (293 of PDF) specifically talking about spacing
characters says:

“No-Break Space. U+00A0 no-break space (NBSP) is the nonbreaking counterpart of
U+0020 space. It has the same width, but behaves differently for line
breaking. For more information, see Unicode Standard Annex #14,
“Unicode Line Breaking Algorithm.””

The wording “but behaves differently for line breaking” seems to
vindicate what I understood that the only difference is in line
breaking behaviour but the wording “has the same width” doesn't
clearly say anything about the stretching behaviour, only about the
nominal advance width given as part of font data.

I would have gone and filed this as a LibreOffice bug since that's the
software I use most, but when I found this is a cross-software
problem, I thought it would be best to have this discussed and
documented here (and in a future version of the standard).

My expectation is that since NBSP is not intended to be a fixed-width
space, and the only intended difference between it and the normal
U+0020 SP is in line breaking, NBSP should be treated the same as
U+0020 for the purpose of stretching for justified alignment.

Only then can text such as the above be formatted naturally and easily.

--
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Emoji boom?

2019-05-01 Thread Shriramana Sharma via Unicode

http://www.unicode.org/L2/L-curdoc.htm

The number of emoji-related proposals seems to be increasing compared
to the number of script-related ones.

Have we reached a plateau re scripts encoding?

Somehow this seems sad to me considering the great role Unicode played
in bringing Indic scripts (from my POV as an Indian) to mainstream
digital devices.

--
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: Fw: Latin Script Danda

2019-04-19 Thread Shriramana Sharma via Unicode

I don't know many modern fonts that display 007C as a broken glyph. In fact
I haven't seen a broken line pipe glyph since the MS-DOS days. Nowadays we
have 00A6 for that.

Re: Script_extension Property of U+0310 Combining Candrabindu

2019-04-19 Thread Shriramana Sharma via Unicode

On 4/19/19, Richard Wordingham via Unicode  wrote:
> That reminds me - what if anything is happening about Tamil script
> candrabindu? You reported that U+0310 was being used in that rôle.

I think that there was an idea to add Taml to U+0310's script extensions.

Or maybe the Grantha candrabindu can be used, since there is already
evidence for mixed usage of the scripts and nukta characters have been
encoded for Tamil usage in the Grantha block for this same reason
despite Grantha users objecting to it as unattested! 😉

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: Script_extension Property of U+0310 Combining Candrabindu

2019-04-18 Thread Shriramana Sharma via Unicode

On 4/19/19, Richard Wordingham via Unicode  wrote:
> That's a fair point.  My problem is that someone is claiming of
> U+0310 that "Somewhere in the Unicode specifications is a footnote
> saying it is to be used with Devanagari".

Why would anyone want to use 0310 with any Indic script that already
has a candrabindu?

> However, some people get rather upset with the idea of using the
> general combining diacritics in Indic scripts.

Many Vedic svara characters have lookalikes among the Combining
Diacritics but they were encoded anyway since IIUC the UTC felt that
separate characters would help preserve sanity in implementing text
shaping engines and the like.

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: Latin Script Danda

2019-04-18 Thread Shriramana Sharma via Unicode

We are using the pipe character as it is readily available in our
favourite Latin script fonts. See for example:
https://twitter.com/ShriramanaS/status/793480884116529152

It would be ideal for Sanskrit/Indic text in IAST/ISO to be
displayable/printable using any common Latin font which is found
typographically pleasant. For instance the font I have used in that
Twitter post is Gentium Basic. I use this font for most of my Latin
script publication purposes (including Unicode documents) and it
contains the pipe character but it does not contain Devanagari
characters.

It would be difficult to canvass Latin font vendors to include the
Devanagari characters 0964/0965 on a small technicality of character
property.

Is there a particular reason it's *really* necessary to include Latn
in the script extension property of 0964/0965?


-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Excessive emoji usage and TTS (was Re: A last missing link)

2019-01-10 Thread Shriramana Sharma via Unicode

On Thu 10 Jan, 2019, 20:49 Arthur Reutenauer via Unicode <
unicode@unicode.org wrote:

>
>   On this topic, I was just pointed to
>
> https://twitter.com/kentcdodds/status/1083073242330361856
>
>   “You 𝘵𝘩𝘪𝘯𝘬 it's 𝒸𝓊𝓉ℯ to 𝘄𝗿𝗶𝘁𝗲 your tweets and usernames
> 𝖙𝖍𝖎𝖘 𝖜𝖆𝖞. But
> have you 𝙡𝙞𝙨𝙩𝙚𝙣𝙚𝙙 to what it 𝘴𝘰𝘶𝘯𝘥𝘴 𝘭𝘪𝘬𝘦 with assistive
> technologies
> like 𝓥𝓸𝓲𝓬𝓮𝓞𝓿𝓮𝓻?”


Something similar:

https://twitter.com/aaronreynolds/status/1083098920132071424?s=20

"This is what it’s like to get texts from my fourteen year old while
driving."

https://t.co/s8949bmgZI

Shortcuts question

2018-09-06 Thread Shriramana Sharma via Unicode

Hello. This may be slightly OT for this list but I'm asking it here as it
concerns computer usage with multiple scripts and i18n:

1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for
"tout" instead of Ctrl+A for "all"?

2) How about when the shortcuts are the Alt+ combinations referring to
underlined letters in actual user visible strings?

3) In a QWERTZ layout for Undo should one still press the (dislocated wrt
the other XCV shortcuts) Z key or the Y key which is in the physical
position of the QWERTY Z key (and close to the other XCV shortcuts)?

4) How are shortcuts handled in the case of non Latin keyboards like
Cyrillic or Japanese?

4a) I mean how are they displayed on screen?

4b) Like #1 above, are they changed per language?

4c) Like #2 above, how about for user visible shortcuts?

(In India since English is an associate official language, most computer
users are at least conversant with basic English so we use the
English/QWERTY shortcuts even if the keyboard physically shows an Indic
script.)

Thanks!

Usage of emoji in coding contexts!

2018-08-08 Thread Shriramana Sharma via Unicode

First time I'm seeing this (maybe others have seen this already):

https://github.com/wei/pull

Emoji being used in commit messages for classifying the nature of the
commit – bug fixes, feature additions etc

Now *that*'s a nice creative usage of emoji IMO…

I see they haven't used them always as the actual emoji characters but
sometimes as :coloned-tags: (or what do you call it) but I presume the
GitHub system will convert it to the actual characters before
displaying…
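
A conversion of that kind might be sketched as follows. The two-entry table is a hypothetical stand-in (real services use much larger lists, and I have not checked how GitHub actually does it); unknown tags are simply left alone.

```python
import re

# Hypothetical, deliberately tiny table of shortcode -> emoji character.
SHORTCODES = {"bug": "\U0001F41B", "sparkles": "\u2728"}

def expand_shortcodes(message: str) -> str:
    """Replace :name: tags found in the table; leave unknown tags untouched."""
    return re.sub(
        r":([a-z0-9_+-]+):",
        lambda m: SHORTCODES.get(m.group(1), m.group(0)),
        message,
    )

print(expand_shortcodes(":bug: Fix crash on empty input"))
print(expand_shortcodes(":unknown: left as typed"))
```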

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: Tamil Brahmi Short Mid Vowels

2018-07-20 Thread Shriramana Sharma via Unicode

This is a unique problem because this is probably the only case where the
same script produces conjuncts for one language and not for another. I had
asked for a separate Tamil Brahmi virama to be encoded which would obviate
this problem but that was shot down. Maybe that case should be reopened?

On Sat 21 Jul, 2018, 06:33 Richard Wordingham via Unicode, <
unicode@unicode.org> wrote:

> A problem has been spotted with the rendering of Tamil Brahmi vowels -
> in particular the sequence  VOWEL SIGN O, U+11046 BRAHMI VIRAMA> does not conform to the grammar
> of the Universal Shaping Engine (USE); a dotted circle may be inserted
> between the vowel and the pulli.
>
> When considering font-level remedies, I realised that there may be a
> problem with a following consonant - is  U+11022 BRAHMI LETTER TA> a correct encoding of what may be
> transliterated as _kŏta_?
>
> The nearest to a convincing justification I can find for it to require
> U+200C ZWNJ after the virama is the text in TUS Section 12.1 for
> *Explicit Virama*, but that merely says that ZWNJ is required to
> produce explicit virama rather than a _conjunct_.  As I understand
> it, a subscript final consonant would be encoded as consonant+virama
> rather than virama+consonant, so there is no ambiguity in Brahmi text.
> (If we try to make a rule out of two conflicting mechanisms, the
> difference might be that one is used for viramas and the other is used
> for invisible stackers, though that would require changing U+10A3F
> KHAROSHTHI VIRAMA back to being a virama.) The problem is that a font
> that tries to recover the situation might interpret  U+11044, U+25CC DOTTED CIRCLE, U+11046, U+11022> as having TA
> subscripted to the dotted circle.  If ZWNJ is required for _kŏta_, what
> text if any in TUS requires it?
>
> Richard.
>
>

Re: UNICODE vehicle vanity registration?

2018-02-14 Thread Shriramana Sharma via Unicode

On 14-Feb-2018 22:45, "Alastair Houghton" 
wrote:


I’d hope that Mark Davis has “UNICODE” on his car.  However, I’m not sure
how relevant it really is to this mailing list.


You're right. My apologies. It *is* somewhat OT to the actual purpose of
this list. But I figured if anyone knew the answer to my question they'd be
here.

Re: UNICODE vehicle vanity registration?

2018-02-14 Thread Shriramana Sharma via Unicode

Sorry but "UNICODE" does fit within those rules doesn't it?

On 14-Feb-2018 21:54, "Stephane Bortzmeyer"  wrote:

On Wed, Feb 14, 2018 at 09:44:06PM +0530,
 Shriramana Sharma via Unicode  wrote
 a message of 6 lines which said:

> Given that in the US vanity vehicle registrations with arbitrary
> alphanumeric sequences up to 7 characters are permitted (I am correct
> I hope?), I wonder who (here?) owns the UNICODE registration?

Won't work in New York, unfortunately

https://dmv.ny.gov/learn-about-personalized-plates

"A character is a letter (A-Z), number (0-9) or space. Each space
counts as one character."

UNICODE vehicle vanity registration?

2018-02-14 Thread Shriramana Sharma via Unicode

Given that in the US vanity vehicle registrations with arbitrary
alphanumeric sequences up to 7 characters are permitted (I am correct I
hope?), I wonder who (here?) owns the UNICODE registration?

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: Why so much emoji nonsense?

2018-02-14 Thread Shriramana Sharma via Unicode

From a mail which I had sent to two other Unicode contributors just a
few days ago:

Frankly I agree that this whole emoji thing is a Pandora's box. It
should have been restricted to emoticons to express facial or physical
gestures which are insufficiently representable by words. When it
starts representing objects like 🍇🍎 then it becomes a problem as to
where to draw the line.

I mean I can see the argument for 💐 representing gratitude, but which
fruits are valid and which not... And which food items are valid and
which not, else you would get proposals for idli and dosa emojis as
well! (Those who don't know what those are see
https://en.wikipedia.org/wiki/Idli and
https://en.wikipedia.org/wiki/Dosa)

It seems to me that graphical items previously rejected as such are
now being encoded. I mean, once other things like bat, ball, etc. are
encoded, then "why not this one" cannot be refused, but the question is
whether encoding bat and ball in the first place was in keeping with the
original intention or spirit of Unicode.

Anyhow, what is done is done and Pandora's box is now open and I
don't envy the ESC their job. I don't know, maybe sometimes they may
just feel like hitting "ESC" too!

--
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: Emoji blooper

2018-02-13 Thread Shriramana Sharma via Unicode

To illustrate…

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Emoji blooper

2018-02-13 Thread Shriramana Sharma via Unicode

Recently sent this message to a friends list:

🎺🎶🎺🎶🎺🎶

Apparently one font has the trumpet facing left and one has it facing
right! So before hitting Send in GMail's web interface, the text
appeared fine but after doing so, in my browser it is showing as if
the music is emanating from the back of the trumpet!

LOL.

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: 0027, 02BC, 2019, or a new character?

2018-01-26 Thread Shriramana Sharma via Unicode

But your outgoing "From" address doesn't seem to have an accent!?

On 26-Jan-2018 13:58, "Andre Schappo via Unicode" 
wrote:

>
> Talking of typing names correctly. Few people bother to type the acute
> accent in André.
>
> This academic year, for the first time ever, I gave the following
> challenges to my web programming class of 143 students. I gave these
> challenges in the first lecture.
>
> ①  learn how to write my name correctly on your desktop computers and
> mobile phones
> ② whenever you email me, ensure you write my name correctly
>
> I am pleased to report that the majority of this class now do type my name
> correctly when emailing me 😀
>
> André Schappo
>
> On 25 Jan 2018, at 18:48, Mark Davis ☕️ via Unicode 
> wrote:
>
> My apologies for the typo. There's no excuse for misspelling someone's
> name (especially since I live in Switzerland, and type German every day).
>
> Thanks for calling my attention to it: the doc has been updated.
>
> Mark
>
> Mark
>
> On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode <
> unicode@unicode.org> wrote:
>
>> On 23 January 2018 at 00:55, James Kass via Unicode 
>> wrote:
>> >
>> > Regular American users simply don't type umlauts, period.
>>
>> Not even the president of the Unicode Consortium when referring to
>> Christoph Päper:
>>
>> http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf
>>
>> Andrew
>>
>>
>
> 🌏 🌍 🌎
> André Schappo
> schappo.blogspot.co.uk
> twitter.com/andreschappo
> weibo.com/andreschappo
> groups.google.com/forum/#!forum/computer-science-curriculum-
> internationalization
>
>
>
>
>
>

Re: 0027, 02BC, 2019, or a new character?

2018-01-24 Thread Shriramana Sharma via Unicode

On 24-Jan-2018 00:25, "Doug Ewell via Unicode"  wrote:

I think it's so cute that some of us think we can advise Nazarbayev on
whether to use straight or curly apostrophes or accents or x's or
whatever. Like he would listen to a bunch of Western technocrats.


Sir, why this assumption that everyone here is "western"? I'm situated at an
even more eastern longitude than Kazakhstan.

An explicitly stated goal of the new orthography was to enable typing
Kazakh on a "standard keyboard," meaning an English-language one.


IMO it's hardly clear that that is or in fact *what* is meant by a standard
keyboard. It merely seems to me loose political speak to make it appear as
if they are trying to make things simpler for the people.

Nazarbayev may ultimately be persuaded to embrace ASCII digraphs, which
also meet this goal, but this talk about U+2019 and U+02BC will make
exactly zero difference in Kazakh policy.


It shouldn't. At least the technical advisors should be monitoring this
discussion if not participate in it. I know that Govt of India people do,
at least on UnicoRe.

Re: 0027, 02BC, 2019, or a new character?

2018-01-24 Thread Shriramana Sharma via Unicode

On 23-Jan-2018 10:03, "James Kass via Unicode"  wrote:

(bottle, east, skier, crucial, cherry)
s'i's'a, s'yg'ys, s'an'g'ys'y, s'es'u's'i, s'i'i'e
sxixsxa, sxygxys, sxanxgxysxy, sxesxuxsxi, sxixixe
s̈ïs̈a, s̈yg̈ys, s̈an̈g̈ys̈y, s̈es̈üs̈i, s̈ïïe
śíśa, śyǵys, śańǵyśy, śeśúśi, śííe

Last one most readable of the lot IMO and it's close enough to the
apostrophe option. IIANM the apostrophe is used as a dead key for the acute
accent in some common international keyboard layouts already?

I retract my earlier statement about digraphs probably being the best
option. It was made without looking at the actual requirement. For such
heavy usage, it would simply make things horrible.

Acute accent for the win! 🙄
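
To make the comparison between the first and last rows concrete, here is a minimal Python sketch. It is an illustration only, not something proposed in the thread: it assumes every apostrophe modifies the letter immediately before it, and the function name `apostrophe_to_acute` is made up for this example.

```python
import unicodedata

def apostrophe_to_acute(word):
    """Replace each ASCII apostrophe with U+0301 COMBINING ACUTE ACCENT,
    then compose to precomposed letters where they exist (NFC)."""
    return unicodedata.normalize("NFC", word.replace("'", "\u0301"))

# The five sample words from the apostrophe row above.
for w in ["s'i's'a", "s'yg'ys", "s'an'g'ys'y", "s'es'u's'i", "s'i'i'e"]:
    print(w, "->", apostrophe_to_acute(w))
```

The mapping is mechanical in this direction, which is one way to see why the acute row is "close enough to the apostrophe option".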

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Shriramana Sharma via Unicode

Announcing:

Much ado about apostrophes

A Play

By

William Codesphere

Coming soon to a theatre near you...

😀

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Shriramana Sharma via Unicode

You can just mail him or Skype-call him, no? 😀

On 19-Jan-2018 18:53, "Michael Everson via Unicode" 
wrote:

> I’d go talk with him :-) I published Alice in Kazakh. He might like that.
>
> Michael
>
> > On 19 Jan 2018, at 09:39, Andrew West via Unicode 
> wrote:
> >
> > On 19 January 2018 at 09:16, Shriramana Sharma via Unicode
> >  wrote:
> >> Wow. Somebody really needs to convey this to the Kazakhs. Else a
> >> short-sighted decision would ruin their chances at native IDNs. Any
> >> Kazakhs on this list?
> >
> > There's only one Kazakh who counts, and I'm pretty sure he's not on this
> > list.
> >
> > Andrew
>
>
>

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Shriramana Sharma via Unicode

Wow. Somebody really needs to convey this to the Kazakhs. Else a
short-sighted decision would ruin their chances at native IDNs. Any Kazakhs
on this list?

On 19-Jan-2018 00:23, "Asmus Freytag via Unicode" 
wrote:

> Top level IDN domain names can not contain 02BC, nor 0027 or 2019.
>
> (RFC 6912 gives the rationale and RZ-LGR the implementation, see MSR-3
> )
>
> A./
>
> On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote:
>
>
>
> On 18 Jan 2018, at 08:21, Andre Schappo via Unicode 
> wrote:
>
>
>
> On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode <
> unicode@unicode.org> wrote:
>
> On Mon, 15 Jan 2018 20:16:21 -0800
> James Kass via Unicode  wrote:
>
> It will probably be the ASCII apostrophe.  The stated intent favors
> the apostrophe over diacritics or special characters to ensure that
> the language can be input to computers with standard keyboards.
>
>
> Typing U+0027 into a word processor takes planning.  Of the three, it
> should obviously be the modifier letter U+02BC, but I think what gets
> stored will be U+0027 or the single quotation mark U+2019.
>
> However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA
> ABOVE RIGHT.
>
> Richard.
>
>
> I have just tested twitter hashtags and as one would expect, U+02BC does
> not break hashtags. See twitter.com/andreschappo/status/953903964722024448
>
>
> ...and, just in case twitter.com/andreschappo/status/953944089896083456
> 
>
> André Schappo
>
>
>
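
For reference, the three candidate characters in the subject line differ in their Unicode General_Category, which can be checked with Python's standard `unicodedata` module. This is only a sketch. Whether Twitter's hashtag rules are based on this property is an assumption, not something stated in the thread, and the helper name `describe` is made up.

```python
import unicodedata

# U+0027, U+02BC and U+2019, as named in the thread subject.
CANDIDATES = ["\u0027", "\u02bc", "\u2019"]

def describe(ch):
    """Return (code point label, character name, general category)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in CANDIDATES:
    print(*describe(ch))
```

Only U+02BC is a letter (category Lm); the other two are punctuation (Po and Pf), which would be consistent with U+02BC not breaking a hashtag.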

Emoji for major planets at least?

2018-01-18 Thread Shriramana Sharma via Unicode

Hello people.

We have sun, earth and moon emoji (3 for the earth and more for the
moon's phases). But we don't have emoji for the rest of the planets.

We have astrological symbols for all the planets and a few
non-existent imaginary "planets" as well.

Given this, would it be impractical to encode proper emoji characters
for the rest of the planets, at least the major ones whose physical
characteristics are well known and identifiable?

I mean for example identifying Sedna and Quaoar
(https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not
going to be practical for all those other than astronomy buffs but the
physical shapes of the major planets are known to all high school
students…

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: 0027, 02BC, 2019, or a new character?

2018-01-16 Thread Shriramana Sharma via Unicode

Rejecting the digraph method (which is probably the simplest) doesn't have
much meaning because they have different sounds in different languages all
the time like ch in English and German.

Anyhow, it certainly can be difficult convincing non technical political
people.

Modifier letters are more legible than modifier punctuation IMO, so that
may be an option.

And the labels on keycaps don't mean anything at all. We in India use the
plain QWERTY keyboard all the time for our scripts.

In any case, the linguistic committee should present their recommendation
along with a new set of actual keycaps and an MSKLC or such input method to
just show the president that what is recommended can be input using "a
standard keyboard".

Popular wordprocessors treating U+00A0 as fixed-width

2017-12-31 Thread Shriramana Sharma via Unicode

While http://unicode.org/reports/tr14/ clearly states that:


When expanding or compressing interword space according to common
typographical practice, only the spaces marked by U+0020 SPACE and
U+00A0 NO-BREAK SPACE are subject to compression, and only spaces
marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces
marked by U+2009 THIN SPACE are subject to expansion. All other space
characters normally have fixed width.


… really sad to see the misunderstanding around U+00A0:

https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2016/nonbreakable-space-justification-in-word-2016/4fa1ad30-004c-454f-9775-a3beaa91c88b?auth=1

https://bugs.documentfoundation.org/show_bug.cgi?id=41652
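
The rule quoted from TR14 above can be written down as two small sets. The following Python sketch is an illustration under that reading of the quoted passage only; the names `COMPRESSIBLE`, `EXPANDABLE` and `count_adjustable` are made up, and no word processor is claimed to work this way.

```python
# Spaces that may shrink during justification, per the quoted passage.
COMPRESSIBLE = {"\u0020", "\u00a0"}
# Spaces that may grow; U+2009 THIN SPACE only "occasionally".
EXPANDABLE = COMPRESSIBLE | {"\u2009"}

def count_adjustable(line):
    """Return (compressible, expandable) space counts for one line."""
    comp = sum(ch in COMPRESSIBLE for ch in line)
    exp = sum(ch in EXPANDABLE for ch in line)
    return comp, exp

# U+0020 and U+00A0 are both adjustable; U+2003 EM SPACE is fixed-width.
print(count_adjustable("a b\u00a0c\u2003d"))
```

The point is that U+00A0 sits in both sets alongside U+0020, so treating it as fixed-width is not what the quoted text describes.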

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

\b and Indic word boundaries?

2017-12-02 Thread Shriramana Sharma via Unicode

Hello. Yesterday I reported https://bugs.python.org/issue32198 but
then was pointed to already existing
https://bugs.python.org/issue1693050 and friends.

From reading these I came to find \b under
https://unicode.org/reports/tr18/#Compatibility_Properties.

I confess I don't entirely grok all the intricacies. So my question:
isn't \b the Unicode-recommended way of identifying full Unicode-aware
word boundaries in regexes? If not, what is?
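
A small Python sketch of what the linked issues are about. The assumption here, based on the bug reports above, is that the standard `re` module defines \b as a boundary between \w and non-\w, and that \w does not cover combining marks. The sample word and the helper name `categories` are chosen only for illustration.

```python
import re
import unicodedata

# The Devanagari word "Hindi": letters interleaved with vowel signs and a virama.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

def categories(text):
    """General category of each code point in text."""
    return [unicodedata.category(ch) for ch in text]

# Letters are Lo; vowel signs and virama are marks (Mc/Mn).
print(categories(word))

# If \w excludes marks, \b sees boundaries inside the word and this
# may return fragments rather than the single word.
print(re.findall(r"\b\w+\b", word))
```

If the second line prints fragments, that is the behaviour the issues describe, and it does not match the word boundaries a reader of the script would expect.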

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा 𑀰𑁆𑀭𑀻𑀭𑀫𑀡𑀰𑀭𑁆𑀫𑀸

Re: Assamese and Unicode.

2017-09-05 Thread Shriramana Sharma via Unicode

On 9/5/17, Martin J. Dürst via Unicode  wrote:
> The best thing to do is to have lots of content in Assamese in Unicode.
> This will show that things just work.

IIUC the problem is with Assamese not accepting the label "Bengali" to
"their" script. AFAICS they do not deny that the encoding "just
works".

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा

RE: Unicode education in Schools

2017-08-24 Thread Shriramana Sharma via Unicode

IIUC the limitation seems to be only that functions such as "charAt" do not
recognize that surrogates aren't valid characters:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charAt
via https://stackoverflow.com/a/8716157/1503120.

This is a problem of many 16-bit char based toolkits too and doesn't
(can't?) have an efficient solution for SMP without counting the surrogates
(and checking them). Right?
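
What "counting the surrogates" means in practice can be sketched in Python, which can expose the UTF-16 code units of a string. This is an illustration of the general technique, not of how JavaScript or any toolkit implements it; the helper names `to_utf16_units` and `code_point_at` are made up.

```python
import struct

def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def code_point_at(units, n):
    """Return the n-th code point (0-based) by scanning from the start,
    treating a high+low surrogate pair as one code point."""
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if n == 0:
            return cp
        n -= 1
        i += step
    raise IndexError("code point index out of range")

units = to_utf16_units("a\U00011030b")  # U+11030 is an SMP code point
print(len(units), hex(code_point_at(units, 1)))
```

The scan is linear in the index, which is the inefficiency the question above refers to: random access by code point needs either this walk or an auxiliary index.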

Re: Unicode education in Schools

2017-08-24 Thread Shriramana Sharma via Unicode

So how do you think it matters if the characters are in the BMP or SMP?

Ah the power of emoji! To encompass even science and mythology!

2017-08-23 Thread Shriramana Sharma via Unicode

🌓🌎🌞 <-- lunar eclipse

🌎🌓🌞 <-- solar eclipse

🌎🌞🌗 <-- apocalypse

https://twitter.com/AstroKatie/status/518697246305439745

☺

Not new (2014) but I hadn't seen this till today and felt it a propos re
the recent pair of eclipses.

Shriramana Sharma.

Re: Version linking?

2017-08-17 Thread Shriramana Sharma via Unicode

Thanks for your reply, but how can characters be used portably if they
are not part of the published standard yet? Or is it that hereafter
both Unicode Standard + Unicode Emoji Standard will be parallelly
portable or something like that?

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा

Version linking?

2017-08-17 Thread Shriramana Sharma via Unicode

A propos 
http://blog.unicode.org/2017/08/unicode-emoji-60-initial-drafts-draft.html
I would like to know whether it is intended that Emoji version N will
be always targeted at Unicode version N + 5 and published in year N +
2012.

I did not find the question or answer at
http://unicode.org/faq/emoji_dingbats.html – hence asking here. I hope
I didn't miss something.

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा

Re: Turtle Graphics Emoji

2017-07-28 Thread Shriramana Sharma via Unicode

for animal in animalKingdom:
  createEmojiProposal(animal)

☺

Emoji are a veritable Pandora's box.

Wagging finger emoji?

2017-07-10 Thread Shriramana Sharma via Unicode

Hello. Searching UnicodeData.txt for emoji-s with the word "finger" I
am getting:

1F590;RAISED HAND WITH FINGERS SPLAYED;So;0;ON;N;
1F591;REVERSED RAISED HAND WITH FINGERS SPLAYED;So;0;ON;N;
1F595;REVERSED HAND WITH MIDDLE FINGER EXTENDED;So;0;ON;N;
1F596;RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS;So;0;ON;N;
1F834;LEFTWARDS FINGER-POST ARROW;So;0;ON;N;
1F835;UPWARDS FINGER-POST ARROW;So;0;ON;N;
1F836;RIGHTWARDS FINGER-POST ARROW;So;0;ON;N;
1F837;DOWNWARDS FINGER-POST ARROW;So;0;ON;N;
1F91E;HAND WITH INDEX AND MIDDLE FINGERS CROSSED;So;0;ON;N;
1F92B;FACE WITH FINGER COVERING CLOSED LIPS;So;0;ON;N;
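
A search like the one above can be approximated without the data file by using Python's standard `unicodedata` module. This is only a sketch: results depend on the Unicode version bundled with the Python build, the output prints just the first two fields, and the function name `search_names` is made up.

```python
import unicodedata

def search_names(word, start=0x0, stop=0x110000):
    """Return (code point, name) pairs whose character name contains word."""
    word = word.upper()
    hits = []
    for cp in range(start, stop):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if word in name:
            hits.append((cp, name))
    return hits

# Restrict to plane 1, where the emoji blocks live.
for cp, name in search_names("FINGER", 0x1F000, 0x20000):
    print(f"{cp:04X};{name}")
```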

Doesn't seem to be something that is equivalent to https://goo.gl/images/dWMpQd.

Is there a wagging finger emoji I missed or one in the pipeline?

☝ doesn't seem to cut it. It tells me "Look up!" And sure enough:

261D;WHITE UP POINTING INDEX;So;0;ON;N;

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा

Re: Counting Devanagari Aksharas

2017-04-20 Thread Shriramana Sharma via Unicode

Hello Richard. Yes my earlier reply wasn't intended to be offlist. I
have near-zero knowledge about non-Indic languages.

All I can say is that Tamil script has eschewed most consonant cluster
ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I
used ZWNJ) i.o. श्रीमान्को is quite possible with existing technology.
The latter would be Sanskrit orthography and former perhaps Hindi,
although I wouldn't know why anyone would want to run in the को with
the preceding श्रीमान् even in Hindi. And IMO it would be better to
clearly define at the outset what you meant by "akshara" in your
question to avoid confusions by people replying having a different
idea of the meaning of that term.
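
The two spellings above differ by a single code point, which a short Python sketch can show. The code points are spelled out so the example does not depend on font rendering; the helper name `join_with_visible_virama` is made up for illustration.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# sha, virama, ra, vowel sign ii, ma, vowel sign aa, na, virama
shriman = "\u0936\u094d\u0930\u0940\u092e\u093e\u0928\u094d"
# ka, vowel sign o
ko = "\u0915\u094b"

def join_with_visible_virama(left, right):
    """Join two pieces with ZWNJ so that a final virama on the left piece
    is shown explicitly instead of forming a conjunct with the right piece."""
    return left + ZWNJ + right

conjunct = shriman + ko                            # virama + ka may form a conjunct
separate = join_with_visible_virama(shriman, ko)   # ZWNJ keeps the virama visible

print(len(conjunct), len(separate))
print([f"U+{ord(ch):04X}" for ch in separate])
```

Any count of "aksharas" over these two strings depends on whether the final consonant with virama is grouped with what follows, which is why the definition needs to be fixed first.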



-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा
