On Mon, 27 Mar 2017 13:34:09 -0700
Ken Whistler wrote:
> And if a flag of
> California (or Pomerania or ...) then gets added to the list of emoji
> tag sequences in a future version of the data, there is a good chance
> that the "users" will then see the difference, because
On Tue, 14 Mar 2017 08:51:18 +
Alastair Houghton <alast...@alastairs-place.net> wrote:
> On 14 Mar 2017, at 02:03, Richard Wordingham
> <richard.wording...@ntlworld.com> wrote:
> >
> > On Mon, 13 Mar 2017 19:18:00 +
> > Alastair Houghton
On Tue, 4 Apr 2017 00:30:30 +0100
Michael Everson wrote:
> On 3 Apr 2017, at 23:07, Asmus Freytag (c)
> wrote:
> You want WHITE CHESS KNIGHT, and WHITE CHESS KNIGHT ON SQUARE, and
> use a VS that changes the colour of the square? That is less legible
On Mon, 3 Apr 2017 23:35:52 +0100
Michael Everson <ever...@evertype.com> wrote:
> On 3 Apr 2017, at 22:03, Richard Wordingham
> <richard.wording...@ntlworld.com> wrote:
The relevant text before was,
"I'm talking about looking for a U+2654 glyph for ordinary text when
a
onally, SARA UE is SARA I plus NIKHAHIT, and I
suspect this is the origin of the etymologically odd form of ลึงค์
'lingam'.
> If, as Richard Wordingham wrote: "Unicode combining classes cannot be
> changed. All that can be done is to enforce the order of characters
> in normal
On Wed, 5 Apr 2017 10:45:43 +0700
"Gerriet M. Denkmann" <gerri...@icloud.com> wrote:
> > On 4 Apr 2017, at 23:51, Richard Wordingham
> > <richard.wording...@ntlworld.com> wrote:
> > The order of MAITAIKHU and tone mark is significant - it should
> >
On Wed, 5 Apr 2017 01:02:32 +0100
Michael Everson <ever...@evertype.com> wrote:
> On 4 Apr 2017, at 18:54, Richard Wordingham
> <richard.wording...@ntlworld.com> wrote:
> >
> > On Tue, 4 Apr 2017 01:30:05 +0100
> > Michael Everson <ever...@evertype.com>
On Mon, 3 Apr 2017 20:33:55 +0100
Richard Wordingham <richard.wording...@ntlworld.com> wrote:
> On Sun, 2 Apr 2017 10:43:39 -0700
> Asmus Freytag <asm...@ix.netcom.com> wrote:
> The basic text elements in the scheme other than boundary markers will
> be:
>
> em
On Sun, 2 Apr 2017 10:43:39 -0700
Asmus Freytag wrote:
> In these cases, explicit encoding would better cover what is desired:
> a reliable way to mark a distinction between different symbols (the
> two bishops are separate symbols, that also happen to express
> distinct,
On Tue, 4 Apr 2017 09:39:57 +0700
"Gerriet M. Denkmann" wrote:
> So the rule should be:
>
> A consonant may have zero or one tone/other marks and also zero or
> one top/bottom vowels. Exceptions:
> NIKHAHIT + tone mark (no top/bottom vowel)
> MAITAIKHU + tone
On Tue, 4 Apr 2017 01:30:05 +0100
Michael Everson wrote:
> > I'm trying to work out whether we need a variation sequence for
> > "chesspiece in a sentence”.
>
> Of course! Haven’t you ever seen chess problem texts? Check out the
> Fairy Chess proposal for encoding
On Mon, 3 Apr 2017 14:12:52 +0200
Michael Everson <ever...@evertype.com> wrote:
> On 2 Apr 2017, at 18:27, Richard Wordingham
> <richard.wording...@ntlworld.com> wrote:
> I think you are seriously going the wrong way with this thinking. The
> immediate parallel that comes
Where in the draft databases for Unicode 10.0 is Unicode 9.0 variation
sequence declared legitimate? Without such a
declaration, a font that had a special glyph for or
a substitution specific to would not be
On Sun, 2 Apr 2017 08:19:00 +0200
Michael Everson wrote:
> > On 1 Apr 2017, at 23:49, Philippe Verdy wrote:
> >
> > I think it's all about sizing so that white or black cells will
> > align, independently of the piece that may be within it.
>
> A
On Sun, 2 Apr 2017 11:57:49 +0200
Michael Everson wrote:
> That’s why in the proposal says things like "set in Ludus in
> 24 points with 26-point leading” where relevant.
You forgot the most important setting though - that the higher-order
protocols allow symbols to be
On Sun, 2 Apr 2017 11:57:49 +0200
Michael Everson <ever...@evertype.com> wrote:
> On 2 Apr 2017, at 10:53, Richard Wordingham
> <richard.wording...@ntlworld.com> wrote:
> > while <U+2657, U+FE01> would be the white bishop on a black square.
> > Perha
On Thu, 6 Apr 2017 11:34:47 +0100 (BST)
William_J_G Overington wrote:
> The following post may be of interest.
>
> http://www.unicode.org/mail-arch/unicode-ml/y2002-m06/0337.html
>
> It is part of a thread from 2002 about the possibility of chromatic
> fonts.
>
> I
On Thu, 6 Apr 2017 01:19:42 -0400
Rebecca T <637...@gmail.com> wrote:
> ... and
> aside from usage I see
> no difference between U+1F989 OWL 🦉 and U+13153 EGYPTIAN HIEROGLYPH
> G017 𓅓.
OWL does not have a prescribed attitude. On the other hand, if G017
were not body side on and head face on, I
On Thu, 6 Apr 2017 13:17:36 -0700
Asmus Freytag wrote:
> While it appears possible, after Khaled's demonstration, I still
> think that the use of "white ink" instead of the "white" parts of a
> character being treated "transparent" is far from standard text
> presentation.
On Thu, 6 Apr 2017 01:11:09 +0100
Michael Everson <ever...@evertype.com> wrote:
> On 5 Apr 2017, at 22:48, Richard Wordingham
> <richard.wording...@ntlworld.com> wrote:
>
> > I tried to read it from UTS#51 ‘Unicode Emoji', which is not part
> > of TUS,
On Sun, 12 Mar 2017 20:02:28 +0100
"Janusz S. Bien" wrote:
> If the basic notion has to be referred in a cumbersome way as
> "extended grapheme cluster" then it is easier to talk about "Unicode
> characters" despite the fact that they have a rather loose relation
> to
On Mon, 13 Mar 2017 15:26:00 -0700
Manish Goregaokar wrote:
> Do you have examples of AA being split that way (and further reading)?
> I think I'm aware of what you're talking about, but would love to read
> more about it.
Just googling for the three words 'Sanskrit',
On Mon, 13 Mar 2017 20:20:25 -0400
"Mark E. Shoulson" wrote:
> Sanskrit external vowel sandhi is comparatively
> straightforward (compared to consonant sandhi), and it frequently
> loses information. A *or* AA plus I is E; A *or* AA plus U is O (you
> need A + O to get AU).
On Mon, 13 Mar 2017 19:18:00 +
Alastair Houghton wrote:
> IMO, returning code points by index is a mistake. It over-emphasises
> the importance of the code point, which helps to continue the notion
> in some developers’ minds that code points are somehow
On Mon, 13 Mar 2017 23:10:11 +0200
Khaled Hosny wrote:
> But there are many text operations that require access to Unicode code
> points. Take for example text layout, as mapping characters to glyphs
> and back has to operate on code points. The idea that you never need
>
On Wed, 1 Mar 2017 12:56:23 -0800
Jean Aurambault wrote:
> I'm wondering if there is any standard that defines a universal city
> id (similar to country codes).
ISO 3166-2 defines codes for some cities, but it's uneven. However,
what's a city? Does Constantinople
On Mon, 3 Apr 2017 15:07:38 -0700
"Asmus Freytag (c)" wrote:
> Having the system use specific character codes for the empties and
> variation selectors for the pieces is a needless complication; just
> duplicate the few pieces with a hatched background. (The precise
>
On Mon, 3 Apr 2017 22:48:31 +0100
Michael Everson wrote:
> Yes, this is what I’ve proposed.
I was explaining it to Asmus and others with similar misunderstandings.
Richard.
On Mon, 3 Apr 2017 14:12:51 +0700
"Gerriet M. Denkmann" wrote:
> The Combining Class is used for normalisation of strings.
> Normalisation of strings is important for filenames in filesystems.
>
> As far as I know, a Thai consonant (Lo, Other_Letter) can have
> several
In topic 'Proposal to add standardized variation sequences for chess
notation', on Wed, 5 Apr 2017 03:05:16 -0700
Asmus Freytag <asm...@ix.netcom.com> wrote:
> On 4/5/2017 1:10 AM, Richard Wordingham wrote:
> > A piece with a *white* background is different to a piece tha
On Wed, 5 Apr 2017 14:08:03 +0100
Michael Everson <ever...@evertype.com> wrote:
> On 5 Apr 2017, at 04:50, Richard Wordingham
> <richard.wording...@ntlworld.com> wrote:
>
> >> Why would anyone make a font that supports the variants for
> >> drawing
On Wed, 5 Apr 2017 20:32:44 +0100
Michael Everson wrote:
> On 5 Apr 2017, at 20:13, Philippe Verdy wrote:
> Chess characters aren’t emojis.
That doesn't mean that solutions applicable to emojis might not be
applicable elsewhere.
> The logic of the
On Thu, 06 Apr 2017 18:26:39 +0200
Kent Karlsson wrote:
> All the characters in the "chess board lines" (apart from spaces, if
> any), are of bidi category ON or NSM. So there is no character that
> "sets" a bidi direction of the lines ("paragraphs"). So if the bidi
>
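The bidi classes mentioned here can be checked against whichever UCD version ships with a given runtime. A minimal sketch in Python, assuming the standard `unicodedata` module; the two sample characters are taken from the <U+2657, U+FE01> sequence quoted elsewhere in this thread, and the variable names are purely illustrative:

```python
import unicodedata

# Sample characters: a chess piece and a variation selector,
# as in the sequence <U+2657, U+FE01> discussed in the thread.
samples = {
    "U+2657": "\u2657",  # WHITE CHESS BISHOP
    "U+FE01": "\uFE01",  # VARIATION SELECTOR-2
}

# Look up the Bidi_Class of each character in the runtime's copy of the UCD.
bidi_classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in bidi_classes.items():
    print(label, bidi_class)
```

Neither class is a strong type (L, R or AL), so a line made only of such characters gives the bidi algorithm nothing from which to derive a paragraph direction.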
On Thu, 6 Apr 2017 11:08:43 +0200 (CEST)
Christoph Päper <christoph.pae...@crissov.de> wrote:
> Richard Wordingham <richard.wording...@ntlworld.com>:
> > If the variation selectors are ignored, these simplify to:
> >
> > white square
> > hatched square
>
On Wed, 12 Apr 2017 06:58:22 +0200
Philippe Verdy via Unicode wrote:
> This is the same problem. The same problem as crossword grids where
> we need also empty cells (and "black" cells which are equivalent to
> an empty cell with a black square symbol instead of letters).
I have doubts about the Indic_Positional_Category (InPC) values proposed
for four new dependent vowels being added in Unicode 10.0.0.
On examining the vowel chart (p1265
of http://www.unicode.org/Public/10.0.0/charts/CodeCharts.pdf) one may
feel quite comfortable with assigning the property
On Thu, 20 Apr 2017 11:17:05 -0700
Manish Goregaokar via Unicode wrote:
> When given a rendered representation people seem to uniformly count
> conjuncts as multiple aksharas if rendered with visible halant, and as
> a single akshara if they are rendered conjoined.
Now,
On Thu, 20 Apr 2017 14:14:00 -0700
Manish Goregaokar via Unicode <unicode@unicode.org> wrote:
> On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode
> <unicode@unicode.org> wrote:
> > On Thu, 20 Apr 2017 11:17:05 -0700
> > Manish Goregaokar via Unico
On Fri, 21 Apr 2017 00:08:24 -0500
Anshuman Pandey via Unicode <unicode@unicode.org> wrote:
> > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode
> > <unicode@unicode.org> wrote:
> > Now imagine you're
> > typing Vedic Sanskrit, with its clusters
On Thu, 20 Apr 2017 15:33:37 +0530
Shriramana Sharma via Unicode wrote:
> All I can say is that Tamil script has eschewed most consonant cluster
> ligatures/conjoining forms. As for Devanagari, writing श्रीमान्को (I
> used ZWNJ) i.o. श्रीमान्को is quite possible with
I was offered the following reply:
> To my knowledge except in Tamil script vowel less consonants in
> written form aren't considered as separate "akshara"s in native
> terminology.
Word-finally they seem to be being treated as such. To be more
precise, a final cluster of one or more consonants
Is there consensus on how to count aksharas in the Devanagari script?
The doubts I have relate to a visible halant in orthographic syllables
other than the first.
For example, according to 'Devanagari VIP Team Issues Report'
http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a
On Fri, 21 Apr 2017 16:27:43 -0700
Manish Goregaokar via Unicode wrote:
> > Do Hindi speakers really think of orthographic syllables as
> > characters?
>
> When rendered as a cluster, yes? I've asked around, and folks seem to
> insist on coupling it to the rendering.
On Sun, 23 Apr 2017 05:40:29 +0300
Eli Zaretskii via Unicode wrote:
> > The cursor moves to the cluster boundary, so there is much less of a
> > problem with Emacs.
>
> But you wanted to highlight only part of the cluster, AFAIU.
If I search for CGJ, highlighting it is
On Mon, 24 Apr 2017 00:36:26 +0530
Naena Guru via Unicode wrote:
> The Unicode approach to Sanskrit and all Indic is flawed. Indic
> should not be letter-assembly systems.
>
> Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of
> the speech. Each writing
On Sat, 22 Apr 2017 13:34:32 +0300
Eli Zaretskii via Unicode wrote:
> AFAIR, Emacs allows one to _delete_ individual characters,
> i.e. Backspace and C-d delete character-by-character, so the problem
> shouldn't be so grave for imperfect typists.
Deleting forwards by one
On Sat, 22 Apr 2017 21:39:42 +0100 (BST)
Julian Bradfield via Unicode wrote:
> On 2017-04-22, Eli Zaretskii via Unicode wrote:
> > I could imagine Emacs decomposing characters temporarily when only
> > part of a cluster matches the search string.
On Sat, 8 Jul 2017 09:04:39 -0700
Asmus Freytag via Unicode wrote:
> But some handling
> of combining mark (and also the new emoji sequences) would equally
> constitute "basic" knowledge, with the Unicode algorithms like
> sorting,
Which major applications actually use the
On Sat, 01 Jul 2017 09:51:00 +0300
"a.lukyanov via Unicode" wrote:
> Is it possible to design fonts that will render ẞ as SS?
>
> So we could choose between ẞ and SS by just selecting the proper
> font, without changing the text itself.
>
> Or perhaps there will be a "font
On Wed, 26 Apr 2017 08:48:13 +0300
Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > Date: Sun, 23 Apr 2017 22:59:49 +0100
> > From: Richard Wordingham <richard.wording...@ntlworld.com>
> > Cc: Eli Zaretskii <e...@gnu.org>
> >
> > If
On Thu, 20 Apr 2017 11:17:05 -0700
Manish Goregaokar via Unicode <unicode@unicode.org> wrote:
> On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode
> <unicode@unicode.org> wrote:
> > Is there consensus on how to count aksharas in the Devanagari
> > sc
On Mon, 24 Apr 2017 20:53:12 +0530
Naena Guru via Unicode wrote:
> Quote by Richard:
> Unless this implies a spelling reform for many languages, I'd like to
> see how this works for the Tai Tham script. I'm not happy with the
> Romanisation I use to work round hostile
On Thu, 27 Apr 2017 13:57:55 +0530
Srinidhi A via Unicode wrote:
> The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for
> avagraha. However it seems this character denotes pluta instead of
> avagraha. Pluta is used for indicating elongation of vowel.
> Similar
On Fri, 28 Jul 2017 13:22:22 +0100 (BST)
William_J_G Overington via Unicode wrote:
> I have been thinking about having Turtle Graphics Emoji as an
> educational and fun idea.
I trust you are aware of the widespread feeling that there is already
an excessive number of turtle
On Thu, 17 Aug 2017 18:34:56 +0530
Shriramana Sharma via Unicode wrote:
> Thanks for your reply, but how can characters be used portably if they
> are not part of the published standard yet? Or is it that hereafter
> both Unicode Standard + Unicode Emoji Standard will be
On Mon, 1 May 2017 23:03:53 +
Michael Bear via Unicode wrote:
> “Rather than using "unused code positions", I would always recommend
> to use some of the Private Use code points.” Consider it done.
>
> “What is the intended usage of your font? Music
On Mon, 15 May 2017 16:14:23 +
Peter Constable via Unicode wrote:
> So, your helpful person was, indeed, helpful, giving you correct
> information: ZWJ sequences are not _characters_ and have no
> implications for ISO/IEC 10646.
Except in so far as the claimed ligature
On Wed, 17 May 2017 13:37:51 -0700
Doug Ewell via Unicode <unicode@unicode.org> wrote:
> Richard Wordingham wrote:
>
> >> It is not at all clear what the intent of the encoder was - or even
> >> if it's not just a problem with the data stream. E0 80 80 is no
On Wed, 17 May 2017 15:31:56 -0700
Doug Ewell via Unicode <unicode@unicode.org> wrote:
> Richard Wordingham wrote:
>
> > So it was still a legal way for a non-UTF-8-compliant process!
>
> Anything is possible if you are non-compliant. You can encode U+263A
> w
On Wed, 17 May 2017 13:41:56 -0700
Doug Ewell via Unicode wrote:
> Perhaps surprisingly, it's already too late. UTC approved this change
> the day after the proposal was written.
>
> http://www.unicode.org/L2/L2017/17103.htm#151-C19
Approved for Unicode 11.0. Unicode 10.0
On Thu, 18 May 2017 02:04:55 +0200
Philippe Verdy via Unicode wrote:
> I find intriguing that the update intends to enforce the decoding
> of the **shortest** sequences, but now wants to treat **maximal
> sequences** as a single unit with arbitrary length. UTF-8 was
>
On Tue, 16 May 2017 20:08:52 +0900
"Martin J. Dürst via Unicode" wrote:
> I agree with others that ICU should not be considered to have a
> special status, it should be just one implementation among others.
> [The next point is a side issue, please don't spend too much time
On Tue, 16 May 2017 17:30:01 +
Shawn Steele via Unicode wrote:
> > Would you advocate replacing
>
> > e0 80 80
>
> > with
>
> > U+FFFD U+FFFD U+FFFD (1)
>
> > rather than
>
> > U+FFFD (2)
>
> > It’s pretty clear what the
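To see which of the two options a particular decoder picks, one can simply feed it the bytes. A minimal sketch in Python, assuming CPython 3's built-in UTF-8 decoder with `errors="replace"`; other decoders may legitimately behave differently, which is the point under discussion:

```python
# The ill-formed byte sequence quoted above.
data = bytes([0xE0, 0x80, 0x80])

# Decode leniently: each undecodable piece becomes U+FFFD.
decoded = data.decode("utf-8", errors="replace")
replacement_count = decoded.count("\uFFFD")

# CPython treats E0 followed by 80 as an error of length one,
# and then each stray 80 as an invalid start byte.
print(replacement_count)
```

With this decoder the result corresponds to option (1), one U+FFFD per byte.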
One of the early problems encountered with Unicode was that there can
be multiple ways of representing the same text. For many scripts, the
solution was canonical equivalence - the multiple ways were declared to
be equivalent, and anything that thought they had different meanings
and should
On Tue, 16 May 2017 10:01:03 +0300
Henri Sivonen via Unicode wrote:
> Even so, I think even changing a recommendation of "best practice"
> needs way better rationale than "feels right" or "ICU already does it"
> when a) major browsers (which operate in the most prominent
>
Given two raw values of the Age property, defined in UCD file
DerivedAge.txt, how is a computer program supposed to compare them?
Apart from special handling for the value "Unassigned" and its short
alias "NA", one used to be able to compare short values against short
values and long values
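A plain string comparison puts "10.0" before "9.0", so one workaround is to compare the values numerically, field by field. A minimal sketch in Python; the function name `age_key` is hypothetical, and placing "Unassigned" after every real version is an assumption, not something the UCD prescribes:

```python
def age_key(value: str) -> tuple:
    """Map an Age value (short form "10.0" or long form "V10_0") to a sortable tuple."""
    if value in ("NA", "Unassigned"):
        # Assumption: unassigned code points sort after every real version.
        return (float("inf"),)
    # Long values look like "V10_0"; strip the "V" and use "." as the separator.
    text = value[1:].replace("_", ".") if value.startswith("V") else value
    return tuple(int(part) for part in text.split("."))


# Lexical comparison goes wrong once the major version has two digits.
print("10.0" < "9.0")                    # string comparison
print(age_key("10.0") < age_key("9.0"))  # numeric comparison
```

Because both forms map to the same tuple, short and long values can also be compared with each other.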
On Mon, 22 May 2017 15:10:02 -0700
Markus Scherer via Unicode <unicode@unicode.org> wrote:
> On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode <
> unicode@unicode.org> wrote:
>
> > Given two raw values of the Age property, defined in UCD file
> >
On Mon, 22 May 2017 17:19:08 -0500
Anshuman Pandey wrote:
> I performed several operations on DerivedAge.txt a few months ago.
> One basic example here:
>
> https://pandey.github.io/posts/unicode-growth-UCD-python.html
So what happens if you apply it to Unicode Version 10.0?
On Tue, 23 May 2017 05:29:33 -0700
Asmus Freytag via Unicode wrote:
> On 5/23/2017 4:04 AM, Janusz S. Bien via Unicode wrote:
> > Quote/Cytat - Manuel Strehl via Unicode (Tue
> > 23 May 2017 11:33:24 AM CEST):
> >
> >> The rising standard in the world
On Thu, 18 May 2017 09:58:43 +0100
Alastair Houghton via Unicode wrote:
> On 18 May 2017, at 07:18, Henri Sivonen via Unicode
> wrote:
> >
> > the decision complicates U+FFFD generation when validating UTF-8 by
> > state machine.
>
> It *really*
On Tue, 23 May 2017 17:44:49 -0700
Ken Whistler via Unicode wrote:
> Ah, but keep in mind, if projecting out to Version 23.0 (in the year
> 2030, by our current schedule), there is a significant chance that
> particular UCD data files may have morphed into something
On Tue, 16 May 2017 14:44:44 +0200
Hans Åberg via Unicode wrote:
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> > wrote:
> ...
> > I think Unicode should not adopt the proposed change.
>
> It would be useful, for use with filesystems, to
On Tue, 16 May 2017 11:36:39 -0700
Markus Scherer via Unicode wrote:
> Why do we care how we carve up an illegal sequence into subsequences?
> Only for debugging and visual inspection. Maybe some process is using
> illegal, overlong sequences to encode something special (à
On Fri, 26 May 2017 11:22:37 -0700
Ken Whistler via Unicode wrote:
> On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote:
> > The link provided about the PRI doesn't lead to the comments.
> >
>
> PRI #121 (August, 2008) pre-dated the practice of keeping all the
>
On Fri, 26 May 2017 21:41:49 +
Shawn Steele via Unicode wrote:
> I totally get the forward/backward scanning in sync without decoding
> reasoning for some implementations, however I do not think that the
> practices that benefit those should extend to other applications
On Tue, 30 May 2017 16:38:45 -0600
Karl Williamson via Unicode wrote:
> Under Best Practices, how many REPLACEMENT CHARACTERs should the
> sequence generate? 0, 1, 2, 3, 4 ?
>
> In practice, how many do parsers generate?
See Markus Kuhn's test page
On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode wrote:
> Well, working from the *current* specification:
>
> FC 80 80 80 80 80
> and
> FF FF FF FF FF FF
>
> are equal trash, uninterpretable as *anything* in UTF-8.
>
> By definition D39b, either sequence of
On Thu, 1 Jun 2017 19:19:51 -0700
Ken Whistler via Unicode wrote:
> > and therefore should start a
> > sequence of 6 characters.
>
> That is completely false, and has nothing to do with the current
> definition of UTF-8.
>
> The current, normative definition of UTF-8,
On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode <unicode@unicode.org> wrote:
> On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote:
> > You were implicitly invited to argue that there was no need to
> > handle 5 and 6 byte invalid sequences.
> >
On Mon, 5 Jun 2017 13:08:06 +0900
"Martin J. Dürst via Unicode" <unicode@unicode.org> wrote:
> On 2017/06/02 04:54, Doug Ewell via Unicode wrote:
> > Richard Wordingham wrote:
> >
> >> even supporting 6-byte patterns just in case 20.1 bits e
On Wed, 31 May 2017 19:24:04 +
Shawn Steele via Unicode wrote:
> It seems to me that being able to use a data stream of ambiguous
> quality in another application with predictable results, then that
> stream should be “repaired” prior to being handed over. Then both
>
On Thu, 1 Jun 2017 12:32:08 +0300
Henri Sivonen via Unicode <unicode@unicode.org> wrote:
> On Wed, May 31, 2017 at 8:11 PM, Richard Wordingham via Unicode
> <unicode@unicode.org> wrote:
> > On Wed, 31 May 2017 15:12:12 +0300
> > Henri Sivonen via Unicode <unicod
On Thu, 01 Jun 2017 12:54:45 -0700
Doug Ewell via Unicode <unicode@unicode.org> wrote:
> Richard Wordingham wrote:
>
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,
>
> Oh, gosh, here we go with this.
You we
On Thu, 4 May 2017 05:01:17 +0200
Philippe Verdy via Unicode wrote:
> Rendering Devanagari with OpenType does not require any PUA
> assignment in that font for variants. The sequences are mapped
> directly using subtables and the rules defined in OpenType for that
> script.
On Sat, 6 May 2017 12:54:07 +
Michael Bear via Unicode wrote:
> If I open the Sutton SignWriting code chart in Mozilla Firefox, the
> glyphs in the tables are blank. I have no idea why. If I open it in
> Microsoft Edge, however, it works fine. Do you know why this is?
On Fri, 5 May 2017 18:46:17 +
Michael Bear via Unicode wrote:
> But
> if the cry for space gets REALLY desperate, I’ll merge identical
> glyphs into one glyph. Obviously, I won’t do this for more
> problematic merges, only glyphs in similar scripts with similar
>
On Mon, 1 May 2017 19:49:27 +0530
Naena Guru via Unicode wrote:
> The purpose of writing is to represent speech. It is not some secret
> that demi-gods created
Sarasvati and Thoth would be offended at being called mere demi-gods.
> sound => letter that is the basis for
On Mon, 1 May 2017 07:17:05 +0200
Philippe Verdy via Unicode wrote:
> 2017-04-29 21:21 GMT+02:00 Naena Guru via Unicode
> :
> > Anyway, Unicode is only about DISPLAYING a script: There's a shape
> > here; Let's find how to get it by assembling other
On Tue, 2 May 2017 05:08:27 +0200
Philippe Verdy via Unicode wrote:
> Consider also that the BMP is almost full, the remaining few holes
> are kept for isolated characters that may be added to existing
> scripts, or permanently reserved to avoid clashes with legacy
>
On Thu, 4 May 2017 23:13:08 +
Michael Bear via Unicode wrote:
> I plan to do everything in the plane EXCEPT for the surrogates, which
> you’re not supposed to encode in fonts anyway, which leaves room for
> about 2,048 more glyphs for OpenType features.
There are, if I
On Wed, 31 May 2017 15:12:12 +0300
Henri Sivonen via Unicode wrote:
> The write-up mentions
> https://bugs.chromium.org/p/chromium/issues/detail?id=662822#c13 . I'd
> like to draw everyone's attention to that bug, which is real-world
> evidence of a bug arising from two
In philological work, one encounters the problem that two or more
abstract characters have only the same 'natural' transliteration; the same
problem can apply to reconstructed phonemes, where there is no sound
indication of the actual pronunciation. A common solution is to use a
subscript or
I'm preparing to share a spell-checker for Northern Thai in the Tai
Tham script, and I'm having difficulty deciding whether to
offer corrections in NFC/NFD or unnormalised.
The problem arises in closed syllables with tone marks. For example,
ᨠᩥ᩠᩵ᨶ /kin/ 'smell', has two canonically equivalent
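The kind of canonical equivalence at issue can be reproduced with the standard normalisation routines. A minimal sketch in Python, assuming the `unicodedata` module; the code points below are illustrative choices of mine (a consonant, SIGN I, SAKOT, TONE-1, a final consonant) and are not claimed to be the exact spelling of the word above:

```python
import unicodedata

# Illustrative Tai Tham code points (assumed, not taken from the message).
KA, SIGN_I, SAKOT, TONE_1, NA = "\u1A20", "\u1A65", "\u1A60", "\u1A75", "\u1A36"

# Two orders in which the tone mark and SAKOT might be typed.
tone_first = KA + SIGN_I + TONE_1 + SAKOT + NA
sakot_first = KA + SIGN_I + SAKOT + TONE_1 + NA

# Canonical combining classes decide the normalised order.
ccc = {f"U+{ord(c):04X}": unicodedata.combining(c) for c in (SAKOT, TONE_1)}

nfc_tone_first = unicodedata.normalize("NFC", tone_first)
nfc_sakot_first = unicodedata.normalize("NFC", sakot_first)

print(ccc)
print(nfc_tone_first == nfc_sakot_first)
```

The two strings differ as typed but normalise to the same sequence, so a spell-checker has to decide which of them it offers.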
On Tue, 10 Oct 2017 22:46:20 +0300
Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > Date: Tue, 10 Oct 2017 20:00:12 +0100
> > From: Richard Wordingham via Unicode <unicode@unicode.org>
> >
> > 4) The pressure on search tools to respect canonical e
On Thu, 24 Aug 2017 17:17:10 +
Andre Schappo via Unicode wrote:
> So, I consider it important to familiarise students with SMP
> characters as well as BMP characters. Then when they develop software
> they will, at the start, be thinking beyond ASCII and Unicode BMP
>
On Sat, 26 Aug 2017 18:55:25 +0300
Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > Date: Sat, 26 Aug 2017 16:09:33 +0100
> > From: Richard Wordingham via Unicode <unicode@unicode.org>
> > It shouldn't. UTF-16 works just like UTF-8, except that
On Fri, 25 Aug 2017 12:57:37 +0100 (BST)
William_J_G Overington via Unicode wrote:
> UTF-16 is very useful. I use it in my research project.
> If the byte content of a UTF-16 file is displayed in a hexadecimal
> display then for all plane 0 characters the byte content of
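The quoted remark is cut off, but the relationship between code points and UTF-16 bytes is easy to inspect. A minimal sketch in Python, assuming big-endian UTF-16 without a byte order mark; the helper name `utf16_be_hex` is hypothetical and the two characters are ones mentioned elsewhere on this page:

```python
def utf16_be_hex(ch: str) -> str:
    """Return the big-endian UTF-16 bytes of a string as hexadecimal digits."""
    return ch.encode("utf-16-be").hex()


# A plane 0 (BMP) character: the bytes are the code point itself.
bmp_hex = utf16_be_hex("\u2654")         # U+2654 WHITE CHESS KING

# A supplementary-plane character: encoded as a surrogate pair.
astral_hex = utf16_be_hex("\U0001F989")  # U+1F989 OWL

print(bmp_hex, astral_hex)
```

For plane 0 the hex dump reads directly as the code point; outside plane 0 it shows a high and a low surrogate instead.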
On Fri, 25 Aug 2017 09:36:00 +0300
Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > Date: Fri, 25 Aug 2017 00:23:40 +0100
> > From: Richard Wordingham via Unicode <unicode@unicode.org>
> >
> > On Thu, 24 Aug 2017 17:17:10 +
> > Andre Schap
On Fri, 25 Aug 2017 01:24:36 +0200
Philippe Verdy via Unicode <unicode@unicode.org> wrote:
> 2017-08-17 22:37 GMT+02:00 Richard Wordingham via Unicode <
> unicode@unicode.org>:
>
> > Fortunately, there is no good evidence that the occurrence
> > of multiple
On Sat, 26 Aug 2017 21:20:45 +0300
Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
> > Date: Sat, 26 Aug 2017 18:52:03 +0100
> > From: Richard Wordingham via Unicode <unicode@unicode.org>
> We are miscommunicating. My point was that programming for MS-Windows
&