On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode
wrote:
> I thought the whole premise of GB18030 was that it was Unicode mapped into
> a GB2312 framework. What characters exist in GB18030 that don't exist in
> Unicode, and have they been proposed for Unicode yet, and why was none of
> the
On Wed, Feb 12, 2020 at 11:37 AM Marius Spix via Unicode <
unicode@unicode.org> wrote:
> In my opinion, this is an invalid character, which should not be
> included in Unicode.
>
Please remember that feedback that you want the committee to look at needs
to go through
documents are published on
unicode-org.github.io/icu-docs/ – follow the “Dev” links there.
Best regards,
Markus Scherer for the ICU Project
On Mon, Dec 2, 2019 at 5:47 PM विश्वासो वासुकिजः (Vishvas Vasuki) via
Unicode wrote:
> But that says that the definitions are at
>>
>
>> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml
>> ,
>> but all one currently gets from that is an error message 'XML
On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode <
unicode@unicode.org> wrote:
> You don't need an ISO 15924 script code. You need to think in terms of BCP
> 47. Sanskrit in Latin would be sa-Latn.
>
Right!
Now, if you want to distinguish the different transcription systems for
>
On Mon, Nov 11, 2019 at 4:03 AM Philippe Verdy via Unicode <
unicode@unicode.org> wrote:
> But first there's still no code in ISO 15924 (first step easy to complete
> before encoding in the UCS).
>
That's not first; it's nearly last.
The script code standard says "In general, script codes shall
On Fri, Oct 11, 2019 at 12:05 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> On Thu, 10 Oct 2019 15:23:00 -0700
> Markus Scherer via Unicode wrote:
>
> > [c \q{ch}]h should work like (ch|c)h. Note that the order matters in
> > the alternation --
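A minimal sketch of why the order in the alternation matters, using Python's re module as a stand-in. Python's re does not support the UTS #18 \q{...} syntax, so only the (ch|c)h form is shown; the names longest_first, shortest_first and matched_prefix are made up for this illustration.

```python
import re

# Ordered alternation: the engine tries "ch" first and falls back to "c".
longest_first = re.compile(r"(?:ch|c)h")
# Reversed order: "c" is tried first, so "ch" is only reached on backtracking.
shortest_first = re.compile(r"(?:c|ch)h")


def matched_prefix(pattern, text):
    """Return the prefix of text that pattern matches, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None
```

On the input "chh" the two patterns match different prefixes, while on "ch" both fall back to matching "c" followed by "h".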
On Fri, Oct 11, 2019 at 4:37 AM Fred Brennan via Unicode <
unicode@unicode.org> wrote:
> Many users are asking me and I'm not sure of the answer (nor how to find
> it
> out).
>
You can find out by looking at the data files that are being developed for
Unicode 13.
Look at the latest
On Tue, Oct 8, 2019 at 7:28 AM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> An example UTS#18 gives for matching a literal cluster can be simplified
> to, in its notation:
>
> [c \q{ch}]
>
> This is interpreted as 'match against "ch" if possible, otherwise
> against "c". Thus
On Fri, Oct 4, 2019 at 2:05 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> > >> Is the use of the Meitei script aspirational or customary?
> > >> Which script is being used for major newspapers, popular books,
> > >> and video captions?
> > >
> > > This may give you some more
Dear Unicoders,
Is Manipuri/Meitei customarily written in Bangla/Bengali script or
in Meitei script?
I am looking at
https://en.wikipedia.org/wiki/Meitei_language#Writing_systems which seems
to describe writing practice in transition, and I can't quite tell where it
stands.
Is the use of the
There are lots of ways to implement the UCA.
When you want fast string comparison, the zero weights are useful for
processing -- and you don't actually assemble a sort key.
People who want sort keys usually want them to be short, so you spend time
on compression. You probably also build sort
On Tue, Oct 2, 2018 at 12:50 AM Martin J. Dürst via Unicode <
unicode@unicode.org> wrote:
> ... The only
> operation that can cause problems is 'capitalize'.
>
> When I say "cause problems", I mean producing mixed-case output. I
> originally thought that 'capitalize' would be fine. It is fine for
I would not expect for Ä+combining () above = Ä᪻ to look right except with
specialized fonts.
http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%84%5Cu1ABB==0
Even if it worked widely, I think it would be confusing.
I think you are best off writing Arzt/Ärztin.
Viele Grüße,
markus
On Tue, May 15, 2018 at 10:47 AM, Johnny Farraj via Unicode <
unicode@unicode.org> wrote:
> Dear Unicode list members,
>
> I wish to get feedback about a new symbol submission proposal.
>
Just to clarify, this is a discussion list where you may get some useful
feedback. This is not where you
On Mon, Mar 5, 2018 at 9:03 AM, suzuki toshiya via Unicode <
unicode@unicode.org> wrote:
> I have a question; if some people try to make a
> translated version of Unicode, they should contact
> all font contributors and ask for the license?
> Unicode Consortium cannot give any sublicense?
>
If
On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode <
unicode@unicode.org> wrote:
> Greetings. Is there a way to know which font and font size have been used
> in the Unicode charts (for various writing systems)? Many thanks!
>
What are you trying to do?
Many of the fonts are unique to the
On my machine (Chromebox+Gmail), the trumpets point down to the lower left.
If you want to convey precise images, then send images...
markus
If your presentation is accepted for the conference, you should get a hotel
discount.
markus
On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> May a collation algorithm that always compares all strings as equal be a
> compliant implementation of the Unicode Collation Algorithm (UTS #10)?
> If not, by which clause is it not compliant?
On Wed, Sep 27, 2017 at 4:07 PM, James Tauber wrote:
> Ah yes, I was just going by membership in the CJK Unified Ideographs
> Extension E block, not actual assignment.
>
> So the lack of assignment means it should fail the Unified_Ideograph
> membership in
On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode <
unicode@unicode.org> wrote:
> I recently updated pyuca[1], my pure Python implementation of the Unicode
> Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0 but to get all
> the tests to work, I had to special case the implicit
FYI, I changed the ICU behavior for the upcoming ICU 60 release (pending
code review).
Proposal & description:
https://sourceforge.net/p/icu/mailman/message/35990833/
Code changes: http://bugs.icu-project.org/trac/review/13311
Best regards,
markus
On Thu, Aug 3, 2017 at 5:34 PM, Mark Davis ☕️
On Mon, Jul 17, 2017 at 5:25 AM, Christoph Päper via Unicode <
unicode@unicode.org> wrote:
> As you may know, the combined original Japanese emoji set included three
> whitespace characters: one was the full width of a (square) emoji, one was
> half that and the last one was a quarter blank.
On Wed, May 31, 2017 at 5:12 AM, Henri Sivonen wrote:
> On Sun, May 21, 2017 at 7:37 PM, Mark Davis ☕️ via Unicode
> wrote:
> > There is plenty of time for public comment, since it was targeted at
> Unicode
> > 11, the release for about a year from
On Fri, May 26, 2017 at 3:28 AM, Martin J. Dürst
wrote:
> But there's plenty in the text that makes it absolutely clear that some
> things cannot be included. In particular, it says
>
>
> The term “maximal subpart of an ill-formed subsequence” refers to the code
>
On Wed, May 24, 2017 at 3:56 PM, Karl Williamson
wrote:
> On 05/24/2017 12:46 AM, Martin J. Dürst wrote:
>
>> That's wrong. There was a public review issue with various options and
>> with feedback, and the recommendation has been implemented and in use
>> widely (among
On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode <
unicode@unicode.org> wrote:
> So, if the proposal for Unicode really was more of a "feels right" and not
> a "deviate at your peril" situation (or necessary escape hatch), then we
> are better off not making a RECOMMENDATION that goes
On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> Given two raw values of the Age property, defined in UCD file
> DerivedAge.txt, how is a computer program supposed to compare them?
> Apart from special handling for the value "Unassigned" and its
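One possible answer, sketched in Python: compare the values as tuples of integers rather than as strings. This assumes the raw values look like "1.1" or "10.0"; the special value "Unassigned" mentioned above is not handled here, and the helper names parse_age and age_at_most are hypothetical.

```python
def parse_age(value):
    """Turn an Age value such as "10.0" into a tuple of ints, e.g. (10, 0)."""
    return tuple(int(part) for part in value.split("."))


def age_at_most(a, b):
    """True if version a is the same as or earlier than version b."""
    return parse_age(a) <= parse_age(b)
```

A plain string comparison would order "10.0" before "2.0", which is why the numeric parse is needed.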
Let me try to address some of the issues raised here.
The proposal changes a recommendation, not a requirement. Conformance
applies to finding and interpreting valid sequences properly. This includes
not consuming parts of valid sequences when dealing with illegal ones, as
explained in the
There were some symbols, mostly proprietary logos, that we did not propose
for encoding in Unicode. See pages 83-89 of
http://www.unicode.org/L2/L2010/10132-emojidata.pdf
You could also mine the defunct symbols subcommittee page for more
information:
On Mon, Apr 3, 2017 at 2:33 PM, Michael Everson <ever...@evertype.com>
wrote:
> On 3 Apr 2017, at 18:51, Markus Scherer <markus@gmail.com> wrote:
>
>
> It seems to me that higher-level layout (e.g., HTML+CSS) is appropriate for
> the board layout (e.g., via
It seems to me that higher-level layout (e.g., HTML+CSS) is appropriate for
the board layout (e.g., via a table), board frame style, and cell/field
shading.
In each field, the existing characters should suffice.
markus
I think "recommended" could be renamed to "(expected to be) widely
implemented".
markus
On Tue, Mar 28, 2017 at 11:41 AM, Doug Ewell wrote:
> Mark Davis wrote:
>
> > 3. Valid, but not recommended: "usca". Corresponds to the valid
> > Unicode subdivision code for California according to
> > http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
> >
On Mon, Mar 27, 2017 at 5:09 PM, Philippe Verdy wrote:
> I followed the links. Check your links, you are referencing the proposal,
> and this contradicts the published version 4.0 of TR51. Where is stability ?
>
Of course I am pointing to the proposal. The version of TR 51
On Mon, Mar 27, 2017 at 4:58 PM, Philippe Verdy wrote:
> This only describes the sequences encoded with 2 characters, not the newer
> longer sequences for flags of subnational regions. the
> unicode_region_subtag data does not contain anything about the flags for
> the first
On Mon, Mar 27, 2017 at 1:34 PM, Ken Whistler wrote:
> Anybody could *attempt* to convey a flag of Pomerania (a rather handsome
> black gryphon on a yellow background, btw) with an emoji tag sequence right
> now, I suppose.
I suppose not. Since it's bound to ISO 3166
On Mon, Mar 27, 2017 at 1:39 PM, Philippe Verdy wrote:
> Note also that ISO3166-2 is far from being stable, and this could
> contradict Unicode encoding stability: it would then be required to ensure
> this stability by only allowing sequences that are effectively registered
I think the interest has been low because very few documents survive in
these encodings, and even fewer documents using not-already-encoded symbols.
In my opinion, this is a good use of the Private Use Area among a very
small group of people.
See also
On Wed, Jan 25, 2017 at 12:00 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> > > 2) Claims that logical_order_exception is relevant for searching
> > > (TUS, as above)
>
> > It informs the construction of the DUCET and could be used to
> > suppress_contractions in a search
On Wed, Jan 25, 2017 at 11:10 AM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> I now have a clutch of errors to report on Unicode's use of the term
> 'logical order' and references to logical_order_exception:
>
> 1) Claims that Thai is not encoded in logical order in
>
On Wed, Jan 4, 2017 at 2:28 AM, Alastair Houghton <
alast...@alastairs-place.net> wrote:
> RFC 5893 seems pretty clear to me, and the problem really is that the test
> vectors (which come from unicode.org) seem (to me) to be incorrect.
https://tools.ietf.org/html/rfc5893#section-2 says "*The
On Sun, Dec 25, 2016 at 8:33 AM, Yifán Wáng <747.neut...@gmail.com> wrote:
> I'm curious about the reason why U+270C VICTORY HAND ✌ has
> standardized text and emoji styles defined but not with U+270A RAISED
> FIST ✊ and U+270B RAISED HAND ✋.
>
On Tue, Dec 20, 2016 at 8:59 AM, Ken Whistler wrote:
> You found the resulting text in TUS 9.0, p. 126 - 129. The origin of the
> text there about best practices for using U+FFFD was the discussion and
> resolution of PRI #121 in August, 2008:
>
>
On Mon, Dec 19, 2016 at 3:04 PM, Karl Williamson
wrote:
> It seems counterintuitive to me that the two byte sequence C0 80 should be
> replaced by 2 replacement characters under best practices, or that E0 80 80
> should also be replaced by 2. Each sequence was legal in
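For the C0 80 case, here is a small Python sketch that counts the U+FFFD characters one particular decoder produces. It only shows what CPython's built-in UTF-8 decoder does with errors="replace"; it is not a statement about what every implementation does, and count_replacements is a name invented for this example.

```python
def count_replacements(data):
    """Decode bytes as UTF-8 and count the U+FFFD characters produced."""
    decoded = data.decode("utf-8", errors="replace")
    return decoded.count("\ufffd")
```

Neither C0 nor 80 can start a well-formed UTF-8 sequence, so each byte is reported on its own.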
On Sun, Dec 4, 2016 at 3:09 AM, Reini Urban wrote:
> Is anybody aware of any other language implementation, which does
> confusable or mixed-script protection?
> I think R has something, because it has this header:
> https://cran.r-project.org/bin/windows/extsoft/3.4/
>
On Sat, Dec 3, 2016 at 2:37 PM, Christoph Päper wrote:
> If an existing character encoding forms the (sole) base of an addition to
> Unicode, shouldn’t it be part of the UTC’s job to document these sources?
> This was obviously done in the case of Japanese emoji,
On Fri, Dec 2, 2016 at 4:35 AM, Christoph Päper wrote:
> Could and should custom vendor extensions like the ones documented in
>
> - http://unicode.org/Public/UCD/latest/ucd/EmojiSources.txt
>
> be included in these mappings?
>
They could, but it would be best for
On Wed, Sep 28, 2016 at 9:16 AM, Philippe Verdy wrote:
> My opinion is to put an accent on each letter and join them with a joiner
>
I don't see a reason for the joiner.
markus
On Fri, Aug 26, 2016 at 10:26 AM, Ken Whistler wrote:
> On 8/26/2016 10:01 AM, John O'Conner wrote:
>
>> What I find more interesting is how emoji (a small digital image or icon)
>> was ever interpreted as encodable text for the Unicode Standard. If our
>> German newspaper
On Fri, Aug 5, 2016 at 8:52 AM, Sean Leonard
wrote:
> What makes a character a "whitespace" in Unicode, e.g., why are ZWSP and
> ZWNBSP not "whitespace" even though they clearly say "SPACE" in them?
>
I think "white space" basically wants to have an advance width
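A quick way to see the distinction, sketched in Python. Note the assumption: str.isspace() is Python's own whitespace test (based on General_Category and Bidi_Class), which is close to but not the same thing as the Unicode White_Space property. The helper name describe is made up.

```python
import unicodedata


def describe(ch):
    """Return (character name, General_Category, Python's isspace() verdict)."""
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isspace())
```

ZWSP (U+200B) and ZWNBSP (U+FEFF) are format characters (Cf) and are not reported as whitespace, while U+0020 and U+2003 EM SPACE are space separators (Zs).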
Interesting discussion!
ICU does not support "is" nor "in" prefixes. I wasn't even aware that UAX
#44 loose matching prescribes "is". ICU just implements what
Property[Value]Aliases.txt say:
# Loose matching should be applied to all property names and property
values, with
# the exception of
Note that the Block property is an artifact of how the committee organizes
the encoding of characters. It is not very useful for processing. For that,
the Script property, Script_Extensions, and others are normally much better.
markus
FYI
It seems like 08xx is reserved for RTL scripts.
http://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedBidiClass.txt
# The unassigned code points that default to R are in the ranges:
# [\u0590-\u05FF *\u07C0-\u089F* \uFB1D-\uFB4F
\U00010800-\U00010FFF \U0001E800-\U0001EDFF
On Tue, Feb 9, 2016 at 7:58 AM, Michael Everson
wrote:
> On 9 Feb 2016, at 11:18, ACJ Unicode wrote:
>
> > This is taught in writing in primary school in the Netherlands (or at
> least it was 30 years ago), but this practice is often abandoned soon
>
On Mon, Feb 8, 2016 at 10:47 AM, James Tauber wrote:
> Even with all this, though, my own work includes accentuation and
> syllabification algorithms, all of which are made more cumbersome by the
> lack of precomposed characters indicating vowel length. I'm currently
>
I would specify that UTF-8 must be used, without mapping.
US-ASCII is a proper subset, so need not be mentioned explicitly, nor
distinguished in the protocol.
Mappings would require that all implementations carry relevant data, and
are up to date to recent versions of Unicode, or else
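To illustrate the subset relationship, a short Python sketch: text that can be encoded as US-ASCII produces exactly the same bytes when encoded as UTF-8, so a protocol that requires UTF-8 gets ASCII for free. The function name same_bytes_as_ascii is hypothetical.

```python
def same_bytes_as_ascii(text):
    """True if text is encodable as US-ASCII and yields identical UTF-8 bytes."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains non-ASCII characters, so only UTF-8 can carry it.
        return False
```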
Dear Mr. Tranter,
I can't tell whether you intend to start a discussion on this discussion
mailing list, or intend to submit feedback on a proposal. Maybe you are
looking for discussion before you formalize your feedback.
If you do intend to submit feedback, then, once you have formulated a
On Thu, Nov 5, 2015 at 9:25 AM, Philippe Verdy wrote:
> (0xFF was reserved only in the old RFC version of UTF-8 when it allowed
> code points up to 31 bits, but even this RFC is obsolete and should no
> longer be used and it has never been approved by Unicode).
>
No, even in
About http://www.unicode.org/L2/L2015/15299-ucd-emoji-props.pdf
which has
Emoji_Presentation (EP)
● Non_Emoji (NE)
● Default_Text (DT)
● Default_Emoji (DE)
● NA
Why do we need both Non_Emoji and NA? Can't Non_Emoji be the default for
all code points that are not mentioned in the data?
markus
On Mon, Oct 19, 2015 at 1:32 PM, Doug Ewell wrote:
> > ICU (but perhaps it's actually Java) seems to have a culture of
> > tolerating lone surrogates, and rules for handling lone surrogates are
> > strewn across the Unicode standards and annexes.
>
> I suspect you have an
I would not spend any time specifying intricate rules for unpaired
surrogates in 16-bit strings, or out-of range values in 32-bit strings.
Most processing will treat them like unassigned characters, like U+50005,
with only default behaviors.
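As an illustration only, this is what Python's unicodedata module reports for such code points; general_category is a helper name invented here. An unpaired surrogate can sit in a Python string like any other code point, and both it and U+50005 carry only default property values.

```python
import unicodedata


def general_category(code_point):
    """Return the General_Category that Python's unicodedata reports."""
    return unicodedata.category(chr(code_point))
```

U+50005 comes back as Cn (unassigned) and U+D800 as Cs (surrogate), with no further special handling needed to pass them through.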
markus
On Mon, Jul 27, 2015 at 4:46 PM, Garth Wallace gwa...@gmail.com wrote:
where
does that leave the Kana Supplement block? That block contains only
two encoded characters, but was allocated 256 code points, presumably
for the future encoding of hentaigana. With hentaigana handled by
SVSes, it
On Thu, Jul 9, 2015 at 8:53 AM, Doug Ewell d...@ewellic.org wrote:
From http://www.unicode.org/L2/L2015/15169-montenegro-cyrillic.pdf,
Addition of two letters from Montenegrin language, CYRILLIC script:
9. Can any of the proposed characters be encoded using a composed
character sequence
Thanks!
markus
If the chart does not reflect the data, then please submit a bug ticket.
http://unicode.org/cldr/trac/newticket
The data is what counts.
markus
Looks all wrong to me.
don’t is a contraction of two words, it is not one word.
English is taught as that squiggle being punctuation, not a letter.
(Unlike, say, the Hawaiʻian ʻOkina
http://en.wikipedia.org/wiki/%CA%BBOkina.)
You can't use simple regular expressions to find word boundaries.
On Mon, May 18, 2015 at 11:19 AM, Doug Ewell d...@ewellic.org wrote:
Is the new mechanism intended to allow flag tags that include either
subtype values or contains values?
As far as I can tell from your quotes, CLDR will say what's valid (plus
containment info), and Unicode permits you to
On Fri, May 8, 2015 at 9:13 PM, Philippe Verdy verd...@wanadoo.fr wrote:
2015-05-09 5:13 GMT+02:00 Richard Wordingham
richard.wording...@ntlworld.com:
I can't think of a practical use for the specific concepts of Unicode
8-bit, 16-bit and 32-bit strings. Unicode 16-bit strings are
I assume that the JSON spec deliberately allows anything that Java and
JavaScript allow. In particular, there is no requirement for a Java String
or JavaScript string to contain text, or well-formed UTF-16, or only
assigned characters. Some code stores binary data (sequence of arbitrary
16-bit
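A small sketch of that permissiveness, using Python's json module as one example implementation (other parsers may behave differently). A lone surrogate escape is accepted and survives a round trip, even though the resulting string cannot be encoded as UTF-8. The names lone and can_encode_utf8 are made up for this example.

```python
import json

# "\ud800" is an unpaired high surrogate written as a JSON escape.
lone = json.loads('"\\ud800"')


def can_encode_utf8(text):
    """True if text is well-formed enough to be encoded as UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False
```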
On Fri, Mar 27, 2015 at 1:27 PM, Michael Norton
michaelanortons...@gmail.com wrote:
Easy example: what's the code for [blank space] U+020 across all language
sets of Unicode? Is it the same ie: 100%?
I don't understand what you are asking, and I have a hunch you haven't said
it in a way
On Tue, Feb 24, 2015 at 9:38 AM, Stephen E Slevinski Jr
sle...@signpuddle.net wrote:
Hi Unicode list,
This is a useful place for discussion, but once the discussion peters out
please submit formal feedback: http://www.unicode.org/review/pri285/
I am concerned that the SignWriting symbols as
On Thu, Feb 19, 2015 at 11:51 PM, Eli Zaretskii e...@gnu.org wrote:
I think decomposition to NFKD solves these issues, doesn't it?
Not completely. Judging from your question, you expected more mappings than
NFKD has. You might want to try the mappings that are used as input for
deriving the
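A short Python sketch of what NFKD does and does not map, using the standard unicodedata module; the helper name nfkd is just for this example. Superscript two and the fi ligature decompose to plain characters, but a letter such as ø has no decomposition at all, which is one reason NFKD alone may not be enough for search.

```python
import unicodedata


def nfkd(text):
    """Return the NFKD normalization of text."""
    return unicodedata.normalize("NFKD", text)
```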
On Thu, Feb 19, 2015 at 12:17 PM, Eli Zaretskii e...@gnu.org wrote:
Sorry, I disagree. First, collation data is overkill for search,
since the order information is not required, so the weights are simply
wasting storage. Second, people do want to find, e.g., ² when they
search for 2 etc.
On Mon, Feb 9, 2015 at 9:54 AM, Andrea Giammarchi
andrea.giammar...@gmail.com wrote:
if a cultural/language TLD is typed with Unicode RIS, then show the flag
for these culture/language:
This does not work. The Unicode RIS are defined to be used in pairs, with
semantics according to
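For reference, a minimal Python sketch of how a pair of regional indicator symbols is built from a two-letter region code. The function name flag_sequence and the input validation are assumptions of this example; whether a given pair is valid or displayed as a flag is a separate question that this code does not answer.

```python
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_sequence(region):
    """Map a two-letter ASCII region code to a pair of regional indicators."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError("expected a two-letter ASCII region code")
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in region.upper())
```

The result is always exactly two code points, which is the pairing the encoding expects.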
On Mon, Feb 9, 2015 at 1:11 PM, Joan Montané j...@montane.cat wrote:
AFAIK, this is done in font side. Emoji flags are just ligatures, so a
font can provide a ligature for 4 RIS characters.
Technically true, but a font that violates the encoding standard would
cause large problems. Imagine a
These are not block boundaries. These lines are for book chart production,
where we don't need to print every unassigned code point.
markus
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
Dear Unicoders, which is the proper second character in N'Ko?
See below for details.
Thanks,
markus
-- Forwarded message --
From: Doug Ewell d...@ewellic.org
Date: Sat, Jan 31, 2015 at 9:16 AM
Subject: Apostrophes (was: Re: ISO 639-3 changes)
To: Philip Newton
On Fri, Oct 31, 2014 at 6:20 AM, Jörg Knappen jknap...@web.de wrote:
Is someone here aware of a standard or a de facto standard for names
or codes of historical countries? For the requirement I have in mind, all
countries where there was a printing press would be optimal coverage,
On Mon, Oct 13, 2014 at 2:23 PM, Jean-François Colson j...@colson.eu wrote:
I’ve found a 16-year-old proposal for Blissymbolics (
http://www.evertype.com/standards/iso10646/pdf/bliss.pdf ) but nothing more
recent. Was that script rejected? Was it forgotten? Are there any technical
difficulties
As Michael said, I don't have information. But I found this which might help:
http://en.wikipedia.org/wiki/Blissymbols#Towards_the_international_standardization_of_the_script
markus
Some of the data is available in the Unicode CLDR script metadata:
http://unicode.org/cldr/trac/browser/trunk/common/properties/scriptMetadata.txt
http://cldr.unicode.org/development/updating-codes/updating-script-metadata
markus
--
Google Internationalization Engineering
The context-sensitive and/or language-sensitive mappings are here:
http://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt
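As an example of a context-sensitive mapping, here is a small Python sketch of the final-sigma rule. It relies on CPython's str.lower(), which as far as I know applies that condition; the example words and the helper name lower_last_char are chosen for this illustration.

```python
def lower_last_char(word):
    """Lowercase word and return the code point of its last character."""
    return ord(word.lower()[-1])
```

A capital sigma at the end of a word lowercases to U+03C2 (final sigma), while an isolated capital sigma lowercases to U+03C3.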
Best regards,
markus
On Tue, Jun 24, 2014 at 4:56 PM, Daniel Bünzli daniel.buen...@erratique.ch
wrote:
Does an algorithm that simply applies R1 *regardless of context*
constitute a default case algorithm or not ? I.e. does simply mapping each
character C in a string using Uppercase_Mapping (C) (e.g. as exposed by
On Tue, Jun 24, 2014 at 6:46 PM, Daniel Bünzli daniel.buen...@erratique.ch
wrote:
Having a look at the data it seems that the Uppercase_Mapping property of
UCD includes (using the terminology of SpecialCasing.txt):
* All the unconditional mappings of SpecialCasing.txt (context independent)
*
On Wed, Jun 11, 2014 at 9:29 PM, Karl Williamson pub...@khwilliamson.com
wrote:
I have something like a library that was written a long time ago (not by
me) assuming that noncharacters were illegal in open interchange. Programs
that use the library were guaranteed that they would not receive
On Mon, Jun 2, 2014 at 8:27 AM, Doug Ewell d...@ewellic.org wrote:
I suspect everyone can agree on the edge cases, that noncharacters are
harmless in internal processing, but probably should not appear in
random text shipped around on the web.
Right, in principle. However, it should be ok to
On Mon, Jun 2, 2014 at 10:00 AM, Shawn Steele shawn.ste...@microsoft.com
wrote:
To further my understanding, can someone provide examples of how these are
used in actual practice?
CLDR collation data defines special contraction mappings that start with a
noncharacter, for
On Mon, Jun 2, 2014 at 1:32 PM, David Starner prosfil...@gmail.com wrote:
I would especially discourage any web browser from handling
these; they're noncharacters used for unknown purposes that are
undisplayable and if used carelessly for their stated purpose, can
probably trigger serious
On Sun, Jun 1, 2014 at 1:49 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
D80: Unicode string:
A code unit sequence containing code units of a particular Unicode
encoding form...
Right -- in a Unicode 16-bit string, you have a sequence of any 16-bit
value in any order.
On Sun, Jun 1, 2014 at 7:49 AM, Karl Williamson pub...@khwilliamson.com
wrote:
Thanks, I had not thought about that. I'm thinking wording something like
this is more appropriate
Noncharacters may be openly interchanged, but it is inadvisable to do so
without prior agreement, since at each
On Sat, May 31, 2014 at 6:41 AM, Mark Davis ☕️ m...@macchiato.com wrote:
I think you have a point here. We should probably change to:
To meet this requirement, an implementation shall supply a mechanism for
specifying any Unicode scalar value (from U+0000 to U+D7FF and U+E000 to
U+10FFFF),
On Sat, May 31, 2014 at 1:59 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Bear in mind that a pattern \uD808 shall not match anything in a
well-formed Unicode string.
Depends. See the definitions of Unicode strings vs. UTF strings.
\uD808\uDF45 specifies a sequence of two
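To make the code unit versus code point distinction concrete, a Python sketch follows. Python strings are sequences of code points, so this shows the "well-formed string" side of the argument; the helper utf16_code_units is hypothetical and uses the surrogatepass error handler so that lone surrogates can be shown as well.

```python
import re


def utf16_code_units(text):
    """Return the UTF-16 code units of text, letting lone surrogates through."""
    raw = text.encode("utf-16-le", errors="surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# U+12345 is stored as one code point, so a lone-surrogate pattern finds nothing.
lone_surrogate_match = re.search("\ud808", "\U00012345")
```

In a 16-bit string the same character is the two code units D808 DF45, and there a search for the single unit D808 could hit the first half of the pair.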
In addition, the Block property is not particularly useful even in regular
expressions or other processing. It is almost always more useful to use
Script, Alphabetic, Unified_Ideograph, etc.
Blocks help with planning and allocation but little else.
markus
If you use Unicode 16-bit strings, it's easy to pass through unpaired
surrogates and treat them like code points; it's often not productive or
necessary to check for them all the time, that is, to be strict about
UTF-16.
On the other hand, I don't think anyone expects you to support invalid
If there is a Gmail bug, then please report it.
Either way, I suggest you go into Gmail Settings and set it to Use Unicode
(UTF-8) encoding for outgoing messages
markus
On Fri, Apr 25, 2014 at 11:06 PM, Mathias Bynens math...@qiwi.be wrote:
My initial question can be rephrased as the following remark/change
request:
http://unicode.org/reports/tr31/#Default_Identifier_Syntax could make it
more clear that “stability extensions” means `Other_ID_Start` and
On Fri, Apr 25, 2014 at 6:05 AM, Steffen Nurpmeso sdao...@yandex.com wrote:
|What I tried to say is, if you need ID_Start, then parse ID_Start from
|DerivedCoreProperties.txt. That's more stable (and easier than parsing
the
|pieces and deriving
|
|# Lu + Ll + Lt + Lm + Lo + Nl
|#
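A minimal sketch of reading one property out of data in the DerivedCoreProperties.txt layout, in Python. The three sample lines are written out by hand to imitate the file's format and are not a copy of any particular version; parse_property_ranges is a made-up name. Real code would read the actual file instead of SAMPLE.

```python
SAMPLE = """\
# Hand-written lines in the style of DerivedCoreProperties.txt
0030..0039    ; ID_Continue # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; ID_Start # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; ID_Start # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; ID_Start # Lo       FEMININE ORDINAL INDICATOR
"""


def parse_property_ranges(lines, wanted):
    """Collect (first, last) code point ranges listed for one property."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != wanted:
            continue
        first, _, last = fields[0].partition("..")
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


id_start_ranges = parse_property_ranges(SAMPLE.splitlines(), "ID_Start")
```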
On Fri, Apr 25, 2014 at 1:54 AM, Eli Zaretskii e...@gnu.org wrote:
I also have a couple of questions about matching the canonical
equivalents of the opening bracket:
Please take a look at the date of the tech note.
I suggest you start a new thread with a new subject for serious discussion.