On Wed, 12 Apr 2017 06:58:22 +0200
Philippe Verdy via Unicode wrote:
> This is the same problem. The same problem as crossword grids where
> we need also empty cells (and "black" cells which are equivalent to
> an empty cell with a black square symbol instead of letters).
And black cells with no
I have doubts about the Indic_Positional_Category (InPC) values proposed
for four new dependent vowels being added in Unicode 10.0.0.
On examining the vowel chart (p1265
of http://www.unicode.org/Public/10.0.0/charts/CodeCharts.pdf) one may
feel quite comfortable with assigning the property values
Is there consensus on how to count aksharas in the Devanagari script?
The doubts I have relate to a visible halant in orthographic syllables
other than the first.
For example, according to 'Devanagari VIP Team Issues Report'
http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a
derive
I was offered the following reply:
> To my knowledge, except in Tamil script, vowelless consonants in
> written form aren't considered as separate "akshara"s in native
> terminology.
Word-finally they seem to be being treated as such. To be more
precise, a final cluster of one or more consonants
On Thu, 20 Apr 2017 15:33:37 +0530
Shriramana Sharma via Unicode wrote:
> All I can say is that Tamil script has eschewed most consonant cluster
> ligatures/conjoining forms. As for Devanagari, writing श्रीमान्को (I
> used ZWNJ) i.o. श्रीमान्को is quite possible with existing technology.
> The
On Thu, 20 Apr 2017 11:17:05 -0700
Manish Goregaokar via Unicode wrote:
> When given a rendered representation people seem to uniformly count
> conjuncts as multiple aksharas if rendered with visible halant, and as
> a single akshara if they are rendered conjoined.
Now, that's what I expected.
On Thu, 20 Apr 2017 14:14:00 -0700
Manish Goregaokar via Unicode wrote:
> On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode
> wrote:
> > On Thu, 20 Apr 2017 11:17:05 -0700
> > Manish Goregaokar via Unicode wrote:
> >> I'm of the opinion that
On Fri, 21 Apr 2017 00:08:24 -0500
Anshuman Pandey via Unicode wrote:
> > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode
> > wrote:
> > Now imagine you're
> > typing Vedic Sanskrit, with its clusters and pitch indicators.
> I tried typing Vedi
On Thu, 20 Apr 2017 11:17:05 -0700
Manish Goregaokar via Unicode wrote:
> On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode
> wrote:
> > Is there consensus on how to count aksharas in the Devanagari
> > script? The doubts I have relate to a visible halant i
On Fri, 21 Apr 2017 16:27:43 -0700
Manish Goregaokar via Unicode wrote:
> > Do Hindi speakers really think of orthographic syllables as
> > characters?
>
> When rendered as a cluster, yes? I've asked around, and folks seem to
> insist on coupling it to the rendering.
That argues that it's a u
On Sat, 22 Apr 2017 13:34:32 +0300
Eli Zaretskii via Unicode wrote:
> AFAIR, Emacs allows one to _delete_ individual characters,
> i.e. Backspace and C-d delete character-by-character, so the problem
> shouldn't be so grave for imperfect typists.
Deleting forwards by one _character_ certainly ma
On Sat, 22 Apr 2017 21:39:42 +0100 (BST)
Julian Bradfield via Unicode wrote:
> On 2017-04-22, Eli Zaretskii via Unicode wrote:
> > I could imagine Emacs decomposing characters temporarily when only
> > part of a cluster matches the search string. Assuming this would
> > make sense to users of
On Sun, 23 Apr 2017 05:40:29 +0300
Eli Zaretskii via Unicode wrote:
> > The cursor moves to the cluster boundary, so there is much less of a
> > problem with Emacs.
>
> But you wanted to highlight only part of the cluster, AFAIU.
If I search for CGJ, highlighting it is frequently supremely us
On Mon, 24 Apr 2017 00:36:26 +0530
Naena Guru via Unicode wrote:
> The Unicode approach to Sanskrit and all Indic is flawed. Indic
> should not be letter-assembly systems.
>
> Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of
> the speech. Each writing system then assigns a sha
On Mon, 24 Apr 2017 20:53:12 +0530
Naena Guru via Unicode wrote:
> Quote by Richard:
> Unless this implies a spelling reform for many languages, I'd like to
> see how this works for the Tai Tham script. I'm not happy with the
> Romanisation I use to work round hostile rendering engines. (My
> s
On Wed, 26 Apr 2017 08:48:13 +0300
Eli Zaretskii via Unicode wrote:
> > Date: Sun, 23 Apr 2017 22:59:49 +0100
> > From: Richard Wordingham
> > Cc: Eli Zaretskii
> >
> > If I search for CGJ, highlighting it is frequently supremely
> > useless. I want to know where it is; highlighting is merely
On Thu, 27 Apr 2017 13:57:55 +0530
Srinidhi A via Unicode wrote:
> The annotation of 0F85 ྅ TIBETAN MARK PALUTA says it is used for
> avagraha. However it seems this character denotes pluta instead of
> avagraha. Pluta is used for indicating elongation of vowel.
> Similar character with identical
On Mon, 1 May 2017 07:17:05 +0200
Philippe Verdy via Unicode wrote:
> 2017-04-29 21:21 GMT+02:00 Naena Guru via Unicode
> :
> > Anyway, Unicode is only about DISPLAYING a script: There's a shape
> > here; Let's find how to get it by assembling other shapes or by
> > creating a code point for it.
On Mon, 1 May 2017 19:49:27 +0530
Naena Guru via Unicode wrote:
> The purpose of writing is to represent speech. It is not some secret
> that demi-gods created
Sarasvati and Thoth would be offended at being called mere demi-gods.
> sound => letter that is the basis for writing.
"=>" is not a
On Mon, 1 May 2017 23:03:53 +
Michael Bear via Unicode wrote:
> “Rather than using "unused code positions", I would always recommend
> to use some of the Private Use code points.” Consider it done.
>
> “What is the intended usage of your font? Music score
> applications? othe
On Tue, 2 May 2017 05:08:27 +0200
Philippe Verdy via Unicode wrote:
> Consider also that the BMP is almost full, the remaining few holes
> are kept for isolated characters that may be added to existing
> scripts, or permanently reserved to avoid clashes with legacy
> software using simple code r
On Thu, 4 May 2017 05:01:17 +0200
Philippe Verdy via Unicode wrote:
> Rendering Devanagari with OpenType does not require any PUA
> assignment in that font for variants. The sequences are mapped
> directly using subtables and the rules defined in OpenType for that
> script. Fonts just use their o
On Thu, 4 May 2017 23:13:08 +
Michael Bear via Unicode wrote:
> I plan to do everything in the plane EXCEPT for the surrogates, which
> you’re not supposed to encode in fonts anyway, which leaves room for
> about 2,048 more glyphs for OpenType features.
There are, if I avoided double countin
On Sat, 6 May 2017 12:54:07 +
Michael Bear via Unicode wrote:
> If I open the Sutton SignWriting code chart in Mozilla Firefox, the
> glyphs in the tables are blank. I have no idea why. If I open it in
> Microsoft Edge, however, it works fine. Do you know why this is?
It smacks of being a fa
On Fri, 5 May 2017 18:46:17 +
Michael Bear via Unicode wrote:
> But
> if the cry for space gets REALLY desperate, I’ll merge identical
> glyphs into one glyph. Obviously, I won’t do this for more
> problematic merges, only glyphs in similar scripts with similar
> features. (e.g. I would repre
One of the early problems encountered with Unicode was that there can
be multiple ways of representing the same text. For many scripts, the
solution was canonical equivalence - the multiple ways were declared to
be equivalent, and anything that thought they had different meanings
and should *there
On Mon, 15 May 2017 16:14:23 +
Peter Constable via Unicode wrote:
> So, your helpful person was, indeed, helpful, giving you correct
> information: ZWJ sequences are not _characters_ and have no
> implications for ISO/IEC 10646.
Except in so far as the claimed ligature changes the meaning of
On Mon, 15 May 2017 21:38:26 +
David Starner via Unicode wrote:
> > and the fact is that handling surrogates (which is what proponents
> > of UTF-8 or UCS-4 usually focus on) is no more complicated than
> > handling combining characters, which you have to do anyway.
> Not necessarily; you ca
On Tue, 16 May 2017 10:01:03 +0300
Henri Sivonen via Unicode wrote:
> Even so, I think even changing a recommendation of "best practice"
> needs way better rationale than "feels right" or "ICU already does it"
> when a) major browsers (which operate in the most prominent
> environment of broken a
On Tue, 16 May 2017 20:08:52 +0900
"Martin J. Dürst via Unicode" wrote:
> I agree with others that ICU should not be considered to have a
> special status, it should be just one implementation among others.
> [The next point is a side issue, please don't spend too much time on
> it.] I find it
On Tue, 16 May 2017 14:44:44 +0200
Hans Åberg via Unicode wrote:
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> > wrote:
> ...
> > I think Unicode should not adopt the proposed change.
>
> It would be useful, for use with filesystems, to have Unicode
> codepoint markers that indi
On Tue, 16 May 2017 17:30:01 +
Shawn Steele via Unicode wrote:
> > Would you advocate replacing
>
> > e0 80 80
>
> > with
>
> > U+FFFD U+FFFD U+FFFD (1)
>
> > rather than
>
> > U+FFFD (2)
>
> > It’s pretty clear what the intent of the encoder was
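The two options in the quoted question can be counted mechanically. The following is only a sketch, not anything taken from the thread: option (1) is obtained from CPython's built-in decoder with errors="replace", which as far as I know emits one U+FFFD per maximal ill-formed subpart, and option (2) from a hypothetical helper that treats a lead byte plus the continuation bytes it announces as a single unit. Both function names are invented for illustration.

```python
def fffd_count_builtin(data: bytes) -> int:
    """Option (1): count the U+FFFD produced by CPython's own decoder.

    Assumes the input contains no genuine U+FFFD, since those would be
    counted as well.
    """
    return data.decode("utf-8", errors="replace").count("\ufffd")


def announced_length(lead: int) -> int:
    """Length announced by a lead byte; 1 for anything that cannot lead."""
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    return 1  # ASCII, stray continuation byte, or F8..FF


def fffd_count_structural(data: bytes) -> int:
    """Option (2): one U+FFFD per structurally complete but invalid unit."""
    count = 0
    i = 0
    while i < len(data):
        end = i + announced_length(data[i])
        j = i + 1
        # Swallow the continuation bytes (80..BF) that the lead byte announces.
        while j < end and j < len(data) and 0x80 <= data[j] < 0xC0:
            j += 1
        try:
            data[i:j].decode("utf-8")  # strict: rejects overlongs, surrogates
        except UnicodeDecodeError:
            count += 1
        i = j
    return count


overlong = bytes([0xE0, 0x80, 0x80])
print(fffd_count_builtin(overlong), fffd_count_structural(overlong))
```

The difference arises because 80 is not a permitted second byte after E0, so under the first policy E0 is already ill-formed on its own and each following 80 is a stray continuation byte.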
On Tue, 16 May 2017 11:36:39 -0700
Markus Scherer via Unicode wrote:
> Why do we care how we carve up an illegal sequence into subsequences?
> Only for debugging and visual inspection. Maybe some process is using
> illegal, overlong sequences to encode something special (à la Java
> string serial
On Wed, 17 May 2017 13:41:56 -0700
Doug Ewell via Unicode wrote:
> Perhaps surprisingly, it's already too late. UTC approved this change
> the day after the proposal was written.
>
> http://www.unicode.org/L2/L2017/17103.htm#151-C19
Approved for Unicode 11.0. Unicode 10.0 has yet to be release
On Wed, 17 May 2017 13:37:51 -0700
Doug Ewell via Unicode wrote:
> Richard Wordingham wrote:
>
> >> It is not at all clear what the intent of the encoder was - or even
> >> if it's not just a problem with the data stream. E0 80 80 is not
> >> permitted, it's garbage. An encoder can't "intend" it
On Wed, 17 May 2017 15:31:56 -0700
Doug Ewell via Unicode wrote:
> Richard Wordingham wrote:
>
> > So it was still a legal way for a non-UTF-8-compliant process!
>
> Anything is possible if you are non-compliant. You can encode U+263A
> with 9,786 FF bytes followed by a terminating FE byte an
On Thu, 18 May 2017 02:04:55 +0200
Philippe Verdy via Unicode wrote:
> I find it intriguing that the update intends to enforce the decoding
> of the **shortest** sequences, but now wants to treat **maximal
> sequences** as a single unit with arbitrary length. UTF-8 was
> designed to work only with
On Thu, 18 May 2017 09:58:43 +0100
Alastair Houghton via Unicode wrote:
> On 18 May 2017, at 07:18, Henri Sivonen via Unicode
> wrote:
> >
> > the decision complicates U+FFFD generation when validating UTF-8 by
> > state machine.
>
> It *really* doesn’t. Even if you’re hell bent on using a
Given two raw values of the Age property, defined in UCD file
DerivedAge.txt, how is a computer program supposed to compare them?
Apart from special handling for the value "Unassigned" and its short
alias "NA", one used to be able to compare short values against short
values and long values against
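To make the comparison problem concrete: a plain string comparison puts "10.0" before "9.0". One possible workaround, sketched here in Python, is to parse each value into a tuple of integers first. The function name age_key is hypothetical, the long form is assumed to look like "V10_0", and sorting "Unassigned"/"NA" after every real version is an arbitrary choice made only for this sketch.

```python
import math


def age_key(value: str):
    """Turn an Age value ("9.0", "V10_0", "NA", "Unassigned") into a sort key."""
    v = value.strip()
    if v in ("NA", "Unassigned"):
        # Arbitrary choice for this sketch: unassigned sorts last.
        return (math.inf, 0)
    if v.startswith("V"):
        # Assumed long form, e.g. "V10_0" for version 10.0.
        v = v[1:].replace("_", ".")
    major, minor = v.split(".")
    return (int(major), int(minor))


print("10.0" < "9.0")                    # string comparison gets it wrong
print(age_key("10.0") < age_key("9.0"))  # numeric comparison
print(sorted(["10.0", "9.0", "NA", "6.1"], key=age_key))
```

With such a key, short and long values can also be compared with each other, since both forms reduce to the same tuple.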
On Mon, 22 May 2017 15:10:02 -0700
Markus Scherer via Unicode wrote:
> On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode <
> unicode@unicode.org> wrote:
>
> > Given two raw values of the Age property, defined in UCD file
> > DerivedAge.txt, how is a c
On Mon, 22 May 2017 17:19:08 -0500
Anshuman Pandey wrote:
> I performed several operations on DerivedAge.txt a few months ago.
> One basic example here:
>
> https://pandey.github.io/posts/unicode-growth-UCD-python.html
So what happens if you apply it to Unicode Version 10.0? Are the
versions s
On Tue, 23 May 2017 05:29:33 -0700
Asmus Freytag via Unicode wrote:
> On 5/23/2017 4:04 AM, Janusz S. Bien via Unicode wrote:
> > Quote/Cytat - Manuel Strehl via Unicode (Tue
> > 23 May 2017 11:33:24 AM CEST):
> >
> >> The rising standard in the world of web development (and others)
> >> is ca
On Tue, 23 May 2017 17:44:49 -0700
Ken Whistler via Unicode wrote:
> Ah, but keep in mind, if projecting out to Version 23.0 (in the year
> 2030, by our current schedule), there is a significant chance that
> particular UCD data files may have morphed into something entirely
> different. Reca
On Fri, 26 May 2017 11:22:37 -0700
Ken Whistler via Unicode wrote:
> On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote:
> > The link provided about the PRI doesn't lead to the comments.
> >
>
> PRI #121 (August, 2008) pre-dated the practice of keeping all the
> feedback comments togeth
On Tue, 30 May 2017 16:38:45 -0600
Karl Williamson via Unicode wrote:
> Under Best Practices, how many REPLACEMENT CHARACTERs should the
> sequence generate? 0, 1, 2, 3, 4 ?
>
> In practice, how many do parsers generate?
See Markus Kuhn's test page
http://www.cl.cam.ac.uk/~mgk25/ucs/examples
On Fri, 26 May 2017 21:41:49 +
Shawn Steele via Unicode wrote:
> I totally get the forward/backward scanning in sync without decoding
> reasoning for some implementations, however I do not think that the
> practices that benefit those should extend to other applications that
> are happy with
On Wed, 31 May 2017 15:12:12 +0300
Henri Sivonen via Unicode wrote:
> The write-up mentions
> https://bugs.chromium.org/p/chromium/issues/detail?id=662822#c13 . I'd
> like to draw everyone's attention to that bug, which is real-world
> evidence of a bug arising from two UTF-8 decoders within one
On Wed, 31 May 2017 17:43:08 +
Shawn Steele via Unicode wrote:
> There also appears to be a special weight given to
> non-minimally-encoded sequences. It would seem to me that none of
> these illegal sequences should appear in practice, so we have either:
> I do not understand the energy
On Wed, 31 May 2017 19:24:04 +
Shawn Steele via Unicode wrote:
> It seems to me that, to be able to use a data stream of ambiguous
> quality in another application with predictable results, then that
> stream should be “repaired” prior to being handed over. Then both
> endpoints would be usin
On Thu, 1 Jun 2017 12:32:08 +0300
Henri Sivonen via Unicode wrote:
> On Wed, May 31, 2017 at 8:11 PM, Richard Wordingham via Unicode
> wrote:
> > On Wed, 31 May 2017 15:12:12 +0300
> > Henri Sivonen via Unicode wrote:
> >> I am not claiming it's too di
On Thu, 01 Jun 2017 12:54:45 -0700
Doug Ewell via Unicode wrote:
> Richard Wordingham wrote:
>
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,
>
> Oh, gosh, here we go with this.
You were implicitly invited to argue that there was no need
On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode wrote:
> On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote:
> > You were implicitly invited to argue that there was no need to
> > handle 5 and 6 byte invalid sequences.
> >
>
> Well, working from
On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode wrote:
> Well, working from the *current* specification:
>
> FC 80 80 80 80 80
> and
> FF FF FF FF FF FF
>
> are equal trash, uninterpretable as *anything* in UTF-8.
>
> By definition D39b, either sequence of bytes, if encountered by a
On Thu, 1 Jun 2017 19:19:51 -0700
Ken Whistler via Unicode wrote:
> > and therefore should start a
> > sequence of 6 characters.
>
> That is completely false, and has nothing to do with the current
> definition of UTF-8.
>
> The current, normative definition of UTF-8, in the Unicode Standa
On Mon, 5 Jun 2017 13:08:06 +0900
"Martin J. Dürst via Unicode" wrote:
> On 2017/06/02 04:54, Doug Ewell via Unicode wrote:
> > Richard Wordingham wrote:
> >
> >> even supporting 6-byte patterns just in case 20.1 bits eventually
> >> turn out not to be enough,
>
> Sorry to be late with this
On Sat, 01 Jul 2017 09:51:00 +0300
"a.lukyanov via Unicode" wrote:
> Is it possible to design fonts that will render ẞ as SS?
>
> So we could choose between ẞ and SS by just selecting the proper
> font, without changing the text itself.
>
> Or perhaps there will be a "font feature" to select th
On Sat, 8 Jul 2017 09:04:39 -0700
Asmus Freytag via Unicode wrote:
> But some handling
> of combining mark (and also the new emoji sequences) would equally
> constitute "basic" knowledge, with the Unicode algorithms like
> sorting,
Which major applications actually use the Unicode Collation Algo
On Fri, 28 Jul 2017 13:22:22 +0100 (BST)
William_J_G Overington via Unicode wrote:
> I have been thinking about having Turtle Graphics Emoji as an
> educational and fun idea.
I trust you are aware of the widespread feeling that there is already
an excessive number of turtle characters in Unicode
On Thu, 17 Aug 2017 18:34:56 +0530
Shriramana Sharma via Unicode wrote:
> Thanks for your reply, but how can characters be used portably if they
> are not part of the published standard yet? Or is it that hereafter
> both Unicode Standard + Unicode Emoji Standard will be parallelly
> portable or
On Thu, 24 Aug 2017 17:17:10 +
Andre Schappo via Unicode wrote:
> So, I consider it important to familiarise students with SMP
> characters as well as BMP characters. Then when they develop software
> they will, at the start, be thinking beyond ASCII and Unicode BMP
> characters.
Just steer
On Fri, 25 Aug 2017 12:57:37 +0100 (BST)
William_J_G Overington via Unicode wrote:
> UTF-16 is very useful. I use it in my research project.
> If the byte content of a UTF-16 file is displayed in a hexadecimal
> display then for all plane 0 characters the byte content of the
> character codes ar
On Fri, 25 Aug 2017 09:36:00 +0300
Eli Zaretskii via Unicode wrote:
> > Date: Fri, 25 Aug 2017 00:23:40 +0100
> > From: Richard Wordingham via Unicode
> >
> > On Thu, 24 Aug 2017 17:17:10 +
> > Andre Schappo via Unicode wrote:
> >
> > >
On Sat, 26 Aug 2017 18:55:25 +0300
Eli Zaretskii via Unicode wrote:
> > Date: Sat, 26 Aug 2017 16:09:33 +0100
> > From: Richard Wordingham via Unicode
> > It shouldn't. UTF-16 works just like UTF-8, except that the code
> > units are bigger.
> Not
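As an illustration of the claim that UTF-16 behaves like UTF-8 with bigger code units, here is a small Python sketch that splits a code point into UTF-16 code units by hand and compares the result with the built-in codec. The function name utf16_code_units and the choice of U+1F600 as the sample character are mine, not from the thread.

```python
def utf16_code_units(ch: str):
    """Return the UTF-16 code units for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP: one code unit (lone surrogates are passed through unchecked).
        return [cp]
    v = cp - 0x10000
    # Supplementary planes: high surrogate carries the top 10 bits,
    # low surrogate the bottom 10 bits.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


sample = "\U0001F600"
print([hex(u) for u in utf16_code_units(sample)])
print(sample.encode("utf-16-be").hex())  # two 16-bit units
print(sample.encode("utf-8").hex())      # four 8-bit units
```

In both encodings a supplementary-plane character occupies more than one code unit, so code that indexes by code unit has to cope with multi-unit sequences either way.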
On Fri, 25 Aug 2017 01:24:36 +0200
Philippe Verdy via Unicode wrote:
> 2017-08-17 22:37 GMT+02:00 Richard Wordingham via Unicode <
> unicode@unicode.org>:
>
> > Fortunately, there is no good evidence that the occurrence
> > of multiple distinct left matras is
On Sat, 26 Aug 2017 21:20:45 +0300
Eli Zaretskii via Unicode wrote:
> > Date: Sat, 26 Aug 2017 18:52:03 +0100
> > From: Richard Wordingham via Unicode
> We are miscommunicating. My point was that programming for MS-Windows
> needs a good understanding of what the UTF-16 surr
On Sat, 26 Aug 2017 21:52:19 +0200
Philippe Verdy via Unicode wrote:
> 2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode <
> unicode@unicode.org>:
> Of course SHY in this use is not suitable, but who knows if one will
> not need this to split in two parts what wo
On Fri, 25 Aug 2017 09:36:44 -0400
John W Kennedy wrote:
> Just a reminder that in Apple’s Swift a “Character” is anything that
> looks like a character, including a letter with any theoretically
> unlimited stack of diacritics, a flag, or a skin-toned emoji, and all
> Swift functions working wit
On Sun, 27 Aug 2017 19:55:31 +0200
Philippe Verdy via Unicode wrote:
> 2017-08-27 6:06 GMT+02:00 Richard Wordingham via Unicode <
> unicode@unicode.org>:
> Canonical reordering is unambiguously referring to the canonical
> equivalences in TUS. These are automated and can
In philological work, one encounters the problem that two or more
abstract characters have only the same 'natural' transliteration; the same
problem can apply to reconstructed phonemes, where there is no sound
indication of the actual pronunciation. A common solution is to use a
subscript or superscrip
I'm preparing to share a spell-checker for Northern Thai in the Tai
Tham script, and I'm having difficulty deciding whether to
offer corrections in NFC/NFD or unnormalised.
The problem arises in closed syllables with tone marks. For example,
ᨠᩥ᩠᩵ᨶ /kin/ 'smell', has two canonically equivalent enc
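The equivalence can be checked with Python's unicodedata module. This is a sketch under the assumption that the word shown above is the sequence U+1A20, U+1A65, U+1A60, U+1A75, U+1A36, and that the alternative spelling simply swaps the sakot (U+1A60) and the tone mark (U+1A75); the variable and function names are made up for illustration.

```python
import unicodedata

KA, VOWEL_I, SAKOT, TONE_1, NA = "\u1a20", "\u1a65", "\u1a60", "\u1a75", "\u1a36"

sakot_first = KA + VOWEL_I + SAKOT + TONE_1 + NA  # order as displayed above
tone_first = KA + VOWEL_I + TONE_1 + SAKOT + NA   # assumed alternative order


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Canonical combining classes decide the normalised order of the two marks.
print(unicodedata.combining(SAKOT), unicodedata.combining(TONE_1))
print(sakot_first == tone_first)
print(canonically_equivalent(sakot_first, tone_first))
print(unicodedata.normalize("NFC", tone_first) == sakot_first)
```

Normalisation moves the mark with the lower combining class first, so a spell-checker that emits one fixed order will rewrite the other spelling even when the user's text was not wrong.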
On Tue, 10 Oct 2017 22:46:20 +0300
Eli Zaretskii via Unicode wrote:
> > Date: Tue, 10 Oct 2017 20:00:12 +0100
> > From: Richard Wordingham via Unicode
> >
> > 4) The pressure on search tools to respect canonical equivalence is
> > now relatively low. Some editors
On Wed, 11 Oct 2017 13:10:26 +0300
Eli Zaretskii via Unicode wrote:
> > Date: Tue, 10 Oct 2017 21:51:55 +0100
> > From: Richard Wordingham via Unicode
> >
> > > Emacs lately introduced character-folding in searches, but it's
> > > turned off by defa
On Fri, 3 Nov 2017 02:36:43 -0700
Asmus Freytag via Unicode wrote:
> On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote:
>
> You may
> find https://twitter.com/andreschappo/status/926163719331176450 amusing
> 😀
>
> André Schappo
>
> You're wildly off in your page count.
>
> The "book" part
May a collation algorithm that always compares all strings as equal be a
compliant implementation of the Unicode Collation Algorithm (UTS #10)?
If not, by which clause is it not compliant? Formally, this algorithm
would require that all weights be zero.
Would an implementation that supported no c
On Mon, 4 Dec 2017 12:48:11 -0800
Markus Scherer via Unicode wrote:
> On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode <
> unicode@unicode.org> wrote:
> > Would an implementation that supported no characters be compliant?
> I guess so. I assume that wo
Apart from the likely but unmandated consequence of making editing
Indic text more difficult (possibly contrary to the UK's Equality Act
2010), there is another difficulty that will follow directly from the
currently proposed expansion of grapheme clusters
(https://www.unicode.org/reports/tr29/prop
Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
implies that it might be considered desirable to have a word boundary
in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C,
U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which
should be <006C, U+0310 COM
On Sat, 9 Dec 2017 16:08:22 +0100
Philippe Verdy wrote:
> 2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode <
> unicode@unicode.org>:
>
> > Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
> > implies that it might be considered des
On Sat, 9 Dec 2017 16:16:44 +0100
Mark Davis ☕️ via Unicode wrote:
> 1. You make a good point about the GB9c. It should probably instead be
> something like:
>
> GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
>
> Extend is broader than necessary, and there are a few items that
> have c
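To see what the suggested rule would do in practice, here is a toy Python check of "(Virama | ZWJ) × Extend* LinkingConsonant". It is only a sketch: the property sets below are tiny hand-picked Devanagari stand-ins that I chose for illustration, not the real property data, and the function name is hypothetical.

```python
ZWJ = 0x200D
VIRAMA = {0x094D}          # stand-in: DEVANAGARI SIGN VIRAMA only
EXTEND = {0x200C, 0x093C}  # stand-in: ZWNJ and DEVANAGARI SIGN NUKTA


def is_linking_consonant(cp: int) -> bool:
    # Stand-in: Devanagari consonants KA..HA.
    return 0x0915 <= cp <= 0x0939


def rule_forbids_break(text: str, pos: int) -> bool:
    """True if the rule forbids a break immediately before text[pos]."""
    if pos <= 0 or pos >= len(text):
        return False
    prev = ord(text[pos - 1])
    if prev not in VIRAMA and prev != ZWJ:
        return False
    j = pos
    # Skip any run of Extend characters after the virama or ZWJ.
    while j < len(text) and ord(text[j]) in EXTEND:
        j += 1
    return j < len(text) and is_linking_consonant(ord(text[j]))


conjunct = "\u0915\u094d\u0937"          # KA, VIRAMA, SSA
with_zwnj = "\u0915\u094d\u200c\u0937"   # KA, VIRAMA, ZWNJ, SSA
print(rule_forbids_break(conjunct, 2))   # after the virama
print(rule_forbids_break(with_zwnj, 2))  # ZWNJ is skipped as Extend
```

Because ZWNJ counts as Extend, this formulation keeps the cluster together even where ZWNJ forces a visible virama, which bears on the point raised earlier about people counting such forms as separate aksharas.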
On Sun, 10 Dec 2017 21:14:18 -0800
Manish Goregaokar via Unicode wrote:
> > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
> You can also explicitly request ligatureification with a ZWJ, so
> perhaps this rule should be something like
>
> (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant
>
On Mon, 11 Dec 2017 08:59:20 +0100
Mark Davis ☕️ via Unicode wrote:
> The proposed rules do not distinguish the different visual forms that
> a sequence of characters surrounding a virama can have, such as
>
>    1. an explicit virama, or
>    2. a half-form is visible, or
>    3. a ligature is
don't work (or need
> more research before #29 is finalized in May), it is fairly
> straightforward to restrict the rule changes by modifying
> http://www.unicode.org/reports/tr29/proposed.html#Virama to either
> exclude particular scripts or include only particular scripts.
>
> Ma
On Mon, 11 Dec 2017 21:45:23 +
Cibu Johny (സിബു) wrote:
> I am assuming the purpose of the grapheme cluster definition is to be
> used for line spacing, vertical writing or cursor movement. Without
> defining the purpose, it is hard for me to say if a ruleset is valid
> or not.
That is a very fa
I have been reviewing UAX#29 Unicode Text Segmentation because I have a
feeling we will be trying to do too much with the concept of grapheme
clusters, even with tailoring, when we extend it to include whole
aksharas.
What is the meaning of "Word boundaries, line boundaries, and sentence
boundarie
Is there any valid reason for Egyptian hieroglyphs to have
Word_Break=ALetter rather than Complex_Context? So far as I am aware,
hieroglyphs lack visible word breaks in both inscriptions and in modern
transcriptions.
Richard.
On Thu, 14 Dec 2017 18:11:33 +
Andrew Glass via Unicode wrote:
> We had some discussion on the sidelines of
> the August UTC meeting at which time it became clear that more work
> is needed as current property values are not entirely correct.
> Currently, my Hieroglyphic energies are focused
On Mon, 11 Dec 2017 21:45:23 +
Cibu Johny (സിബു) wrote:
> I am assuming the purpose of the grapheme cluster definition is to be
> used for line spacing, vertical writing or cursor movement. Without
> defining the purpose, it is hard for me to say if a ruleset is valid
> or not. Assuming that pur
On Thu, 14 Dec 2017 15:53:13 +0100
Mark Davis ☕️ via Unicode wrote:
> On Thu, Dec 14, 2017 at 3:22 PM, Michael Everson
> wrote:
> > NO. Clusters cannot be broken up just anywhere.
> Does that mean that ancient inscriptions would leave gaps at the end
> of lines in order to not break a cluster
On Mon, 18 Dec 2017 15:15:11 +0100
Serge Rosmorduc via Unicode wrote:
> Hence, you have things like (line 5-6): the word ẖsy « small »,
> is cut between the two lines. The phonetic part is line 5, and the
> bird determinative is alone on line 5, above the preposition « m »,
> which is itself
On Thu, 21 Dec 2017 17:55:33 +0900
"Martin J. Dürst via Unicode" wrote:
> On 2017/12/15 07:40, Richard Wordingham via Unicode wrote:
> > On Mon, 11 Dec 2017 21:45:23 +
> > Cibu Johny (സിബു) wrote:
> >> For example see the poster with word ഉസ്താദ് broke
On Fri, 22 Dec 2017 09:27:15 +0200
Eli Zaretskii via Unicode wrote:
> > Date: Thu, 21 Dec 2017 22:04:37 -0800
> > Cc: Unicode Public
> > From: Manish Goregaokar via Unicode
> >
> > However, Firefox deletes by code point.
>
> As does Emacs, btw.
And deleting in that fashion from the right i
On Thu, 21 Dec 2017 22:04:37 -0800
Manish Goregaokar via Unicode wrote:
> > When deleting by backspace, the usual practice is to delete one
> > Unicode
> character for each key press.
>
> This seems to depend on the operating system and program involved. For
> example, on OSX any native text i
On Fri, 22 Dec 2017 17:44:39 +0200
Eli Zaretskii via Unicode wrote:
> You can always delete a codepoint at a given position in Emacs,
> specifying the position by its number, but there are no user-level
> commands to conveniently allow doing that in the middle of a grapheme
> cluster.
>
> It was
On Fri, 22 Dec 2017 17:44:39 +0200
Eli Zaretskii via Unicode wrote:
> > Date: Fri, 22 Dec 2017 15:36:35 +
> > From: Richard Wordingham via Unicode
> > However, it seems
> > that one has to modify the source code of Emacs to be able to edit
> > in the middle of
On Mon, 1 Jan 2018 13:24:29 +0530
Manish Goregaokar via Unicode wrote:
> sounds very much like a
> degenerate case to me.
Generally yes, but I'm not sure that they'd be inappropriate for
Egyptian hieroglyphs showing human beings. The choice of determinative
can convey unpronounceable semantic
On Tue, 2 Jan 2018 01:21:37 -0800
Asmus Freytag via Unicode wrote:
> On 1/1/2018 6:52 AM, Richard Wordingham via Unicode wrote:
> > Generally yes, but I'm not sure that they'd be inappropriate for
> > Egyptian hieroglyphs showing human beings. The choice of
>
On Mon, 15 Jan 2018 20:16:21 -0800
James Kass via Unicode wrote:
> It will probably be the ASCII apostrophe. The stated intent favors
> the apostrophe over diacritics or special characters to ensure that
> the language can be input to computers with standard keyboards.
Typing U+0027 into a word
On Mon, 15 Jan 2018 23:40:15 -0800
James Kass via Unicode wrote:
> On a side note, wouldn't most of the "standard keyboards" currently in
> Kazakhstan be labelled in Cyrillic anyway?
They're probably already labelled in Cyrillic *and* printable ASCII
(US QWERTY). Using the Cyrillic labels for no
On Fri, 19 Jan 2018 02:12:04 +0100
Pierpaolo Bernardi via Unicode wrote:
> On Fri, Jan 19, 2018 at 1:19 AM, Aleksey Tulinov via Unicode
> wrote:
> > Perhaps we all shall stop being ironical to each other, calm down,
> > sit and discuss how to encode 3D animated emojies (animojies) in
> > Unicode
On Sun, 21 Jan 2018 13:49:46 +0100
Philippe Verdy via Unicode wrote:
> But there's NO standard keyboard in Kazakhstan with the Latin
> alphabet. Those you'll find are cyrillic keyboards with a way to type
> basic Latin. Or keyboards made for other countries.
I believe we're talking about physica