Sadly, time is not that simple. Most people using calendars consider the
duration between January 15 and March 15 to be exactly 2 months. But such
intervals span different numbers of days, and hence of milliseconds.
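To make the point concrete, here is a minimal sketch that counts the days in a few "two month" intervals. The helper name `daysBetween` is made up for illustration, and the arithmetic is done in UTC so that daylight-saving shifts do not interfere.

```javascript
// Milliseconds in one UTC day.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: whole days between two calendar dates, computed in UTC.
// Months are 0-based in Date.UTC (0 = January).
function daysBetween(y1, m1, d1, y2, m2, d2) {
  return (Date.UTC(y2, m2, d2) - Date.UTC(y1, m1, d1)) / MS_PER_DAY;
}

const janToMar2019 = daysBetween(2019, 0, 15, 2019, 2, 15); // 59
const janToMar2020 = daysBetween(2020, 0, 15, 2020, 2, 15); // 60 (leap year)
const marToMay2019 = daysBetween(2019, 2, 15, 2019, 4, 15); // 61
```

All three intervals are "exactly 2 months" on a calendar, yet each has a different length in milliseconds.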
Mark
On Mon, Mar 4, 2019 at 11:21 AM Naveen Chawla wrote:
> I don't like it.
.reverse would only be reasonable for a subset of characters supported by
Unicode. Its primary cited use case is for a particular educational
example, when there are probably thousands of similar examples of educational
snippets that would be rarely used in a production environment. Given that,
it
I think the cleanest mental model is where UTF-16 or UTF-8 strings are
interpreted as if they were transformed into UTF-32.
While that is generally feasible, it often represents a cost in performance
which is not acceptable in practice. So you see various approaches that
involve some deviation
Good, that sounds right.
Mark https://google.com/+MarkDavis
*— Il meglio è l’inimico del bene —*
On Wed, Jan 28, 2015 at 12:57 PM, André Bargull andre.barg...@udo.edu
wrote:
On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä marja at chromium.org
https://mail.mozilla.org/listinfo/es-discuss
On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä ma...@chromium.org wrote:
The ES6 unicode regexp spec is not very clear regarding what should happen
if the regexp or the matched string contains lonely surrogates (a lead
surrogate without a trail, or a trail without a lead). For example, for the
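As an illustration of the kind of case in question, the sketch below shows how engines that implement the `u` flag behave today. This reflects behavior settled after this message was written, so it should not be read as what the draft said at the time.

```javascript
const lead = '\uD83D';        // a lone lead surrogate
const pair = '\uD83D\uDC38';  // U+1F438 encoded as a surrogate pair

// Without the u flag the pattern works on code units, so the lead
// surrogate is found inside the pair.
const matchesInsidePairLegacy = /\uD83D/.test(pair);   // true

// With the u flag the input is read as code points, so the pair is a
// single unit and a lone lead in the pattern does not match half of it.
const matchesInsidePairUnicode = /\uD83D/u.test(pair); // false

// A lone surrogate in the input still counts as one code point for ".".
const dotMatchesLoneLead = /^.$/u.test(lead);          // true
```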
On Wed, Nov 13, 2013 at 3:51 PM, Joe Hildebrand (jhildebr)
jhild...@cisco.com wrote:
that all software implementations
which receive the un-prefixed text will not generate parse errors.
perhaps:
...all conformant software ...
Mark https://google.com/+MarkDavis
Lindenberg
ecmascr...@lindenbergsoftware.com wrote:
The identifier issues first:
On Mar 1, 2013, at 7:40 , Mark Davis ☕ wrote:
These names are canonicalized to the corresponding Zone name in the
casing used
Because the Zone names are unstable, in CLDR we adopted the same
convention
don't
think UTC offset is a good idea because that doesn't really represent a
Timezone very well. (If a meeting gets moved to a following week, that
offset might change or be wrong)
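The concern about offsets can be sketched with `Intl.DateTimeFormat`. The zone `America/Los_Angeles`, the dates, and the helper names are arbitrary choices for illustration; the example assumes a runtime with time zone data available.

```javascript
// Hypothetical helper: hour of day (0-23) in a given IANA zone for an instant.
function localHour(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour: 'numeric',
    hourCycle: 'h23',
  }).formatToParts(date);
  return Number(parts.find((p) => p.type === 'hour').value);
}

const zone = 'America/Los_Angeles';
const meeting = new Date(Date.UTC(2013, 2, 5, 20, 0)); // 20:00 UTC
const oneWeekLater = new Date(meeting.getTime() + 7 * 24 * 60 * 60 * 1000);

// Simple subtraction is enough here because both local times fall on the
// same calendar day as the UTC time; it is not a general offset routine.
const offsetHours = (d) => localHour(d, zone) - d.getUTCHours();

const offsetBefore = offsetHours(meeting);      // -8 (standard time)
const offsetAfter = offsetHours(oneWeekLater);  // -7 (daylight time began Mar 10, 2013)
```

A stored offset of -8 would be wrong for the rescheduled meeting, whereas the zone identifier stays correct.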
On Mar 1, 2013, at 7:40 , Mark Davis ☕ wrote:
This is problematic. The canonicalized names are very ugly. What
On Sat, Mar 2, 2013 at 5:11 PM, Shawn Steele shawn.ste...@microsoft.com wrote:
I’m uncomfortable using the CLDR names, although perhaps they could be
aliases, because other standards use the tzdb names and we have to be able
to look up the tzdb names. It might be nice to get more stability for
These names are canonicalized to the corresponding Zone name in the
casing used
Because the Zone names are unstable, in CLDR we adopted the same convention
as in BCP47. That is, our canonical form never changes, no matter what
happens to Zone names. I'd strongly recommend using those as the
The main complication for compatibility is indexing.
See
http://macchiati.blogspot.com/2012/07/unicode-string-models-many-programming.html
If you look back about a year in this list's archive you'll find a long
discussion.
On Dec 21, 2012 9:34 PM, Chris Angelico ros...@gmail.com
clearly what's needed and then
approach the relevant W3C working group(s), possibly through the
Internationalization working group.
Norbert
On Oct 15, 2012, at 18:14 , Mark Davis ☕ wrote:
I added the following for discussion:
https://bugs.ecmascript.org/show_bug.cgi?id=798
https
I added the following for discussion:
https://bugs.ecmascript.org/show_bug.cgi?id=798
https://bugs.ecmascript.org/show_bug.cgi?id=797
Mark https://plus.google.com/114199149796022210033
On Mon, Oct 15, 2012 at 5:51 PM, Gillam, Richard
In ICU, we are using Gregorian eras (AD/BC) as customarily interpreted, and
there is no year zero. There isn't a simple way to get non-era years—and
that form is mostly interesting to techies, not normal people, which is why
we support the era form.
(If someone wanted to do it, you could probably
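For readers unfamiliar with the two numbering schemes, here is a minimal sketch of the mapping between "non-era" (astronomical) years and era years. The function name `toEraYear` is hypothetical and is not an ICU API.

```javascript
// Convert a proleptic "astronomical" year (..., -1, 0, 1, ...) to an
// era-based year. Astronomical year 0 corresponds to 1 BC, because the
// era form has no year zero.
function toEraYear(astronomicalYear) {
  return astronomicalYear > 0
    ? { era: 'AD', year: astronomicalYear }
    : { era: 'BC', year: 1 - astronomicalYear };
}

const yearOne = toEraYear(1);    // { era: 'AD', year: 1 }
const yearZero = toEraYear(0);   // { era: 'BC', year: 1 }
const yearMinus = toEraYear(-43); // { era: 'BC', year: 44 }
```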
for them?
Norbert
On Sep 13, 2012, at 13:31 , Mark Davis ☕ wrote:
In ICU, we are using Gregorian eras (AD/BC) as customarily interpreted,
and there is no year zero. There isn't a simple way to get non-era
years—and that form is mostly interesting to techies, not normal people,
which is why we
Mark https://plus.google.com/114199149796022210033
On Wed, Sep 12, 2012 at 8:43 PM, Norbert Lindenberg
ecmascr...@norbertlindenberg.com wrote:
ES5 section 15.9.1 specifies a number of operations to map time values
(measured in milliseconds from
+Peter, since he has an interest in these issues.
Mark https://plus.google.com/114199149796022210033
On Wed, Sep 12, 2012 at 9:37 PM, Mark Davis ☕ m...@macchiato.com wrote:
Mark https://plus.google.com/114199149796022210033
Can you reformulate the table attached to
http://unicode.org/cldr/trac/ticket/5302?
In particular, if a currency is not in the LDML table, it gets the default
values (see below). So you need to compare on that basis.
It is much better for comparison if you attach a tab- or comma-delimited
file,
on by default.
Norbert
On Sep 4, 2012, at 13:23 , Mark Davis ☕ wrote:
In view of the schedule, I suggest that we make your first, minimal
change right now, and plan to correct it along one of the other lines in
the next edition.
#1 is much weaker than we want, so we should correct
normalization is primarily an optimization and so should be under
application control.
Comments?
Norbert
On Sep 1, 2012, at 16:19 , Mark Davis ☕ wrote:
Support for the normalization property in options and the kk key would
become mandatory.
The options that ICU offers are to observe
other thoughts?
Mark https://plus.google.com/114199149796022210033
On Sun, Sep 2, 2012 at 8:15 AM, Markus Scherer markus@gmail.com wrote:
On Sat, Sep 1, 2012 at 4:19 PM, Mark Davis ☕ m...@macchiato.com wrote:
Your proposal looks reasonable, except
off normalization (i.e., normalization
is on by default, independent of locale). Support for the normalization
property in options and the kk key would become mandatory.
Norbert
On Aug 31, 2012, at 10:12 , Mark Davis ☕ wrote:
I think we could go either way. It depends on the usage mode
I think we could go either way. It depends on the usage mode.
1. The case where performance is crucial is where you are comparing
gazillions of strings, such as records in a database.
2. If the number of strings to be compared is relatively small, and/or
there is enough overhead
ICU *is* always able to compare them as being equal, just by setting the
parameter.
Even if the parameter isn't set, it uses an FCD sort (see
http://unicode.org/notes/tn5/) and canonical closure, which handles most
cases of canonical equivalence. The default is turned on for languages
where the
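A small sketch of what canonical equivalence means for comparison, using the `Intl.Collator` API that eventually shipped (it postdates this thread). The sample strings are chosen only for illustration.

```javascript
const precomposed = '\u00E9';  // é as a single code point
const decomposed = 'e\u0301';  // e followed by a combining acute accent

// Code-unit comparison sees two different strings.
const identical = precomposed === decomposed; // false

// Explicit normalization makes them identical.
const sameAfterNfc =
  precomposed.normalize('NFC') === decomposed.normalize('NFC'); // true

// A collator treats canonically equivalent strings as equal.
const collatorResult = new Intl.Collator('en').compare(precomposed, decomposed); // 0
```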
17, 2012, at 14:49 , Brendan Eich wrote:
Allen Wirfs-Brock wrote:
On Jul 16, 2012, at 2:57 PM, Mark Davis ☕ wrote:
In order to support backwards iteration (which is sometimes used), we
should have codePointBefore.
or we can provide a backwards iterator that knows how to parse
In order to support backwards iteration (which is sometimes used), we
should have codePointBefore.
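One possible shape for such a function, written here as a free-standing helper rather than a string method. The name follows the message above, but the signature and the return value for out-of-range indexes are assumptions.

```javascript
// Hypothetical helper: the code point that ends just before `index`.
// Returns undefined when there is nothing before `index`.
function codePointBefore(s, index) {
  if (index <= 0 || index > s.length) return undefined;
  const trail = s.charCodeAt(index - 1);
  if (trail >= 0xDC00 && trail <= 0xDFFF && index >= 2) {
    const lead = s.charCodeAt(index - 2);
    if (lead >= 0xD800 && lead <= 0xDBFF) {
      // Combine a well-formed surrogate pair into one code point.
      return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
    }
  }
  // BMP character or unpaired surrogate: return the code unit itself.
  return trail;
}

// Walk a string from the end, one code point at a time.
function codePointsBackwards(s) {
  const out = [];
  for (let i = s.length; i > 0; ) {
    const cp = codePointBefore(s, i);
    out.push(cp);
    i -= cp > 0xFFFF ? 2 : 1;
  }
  return out;
}

const reversed = codePointsBackwards('a\u{1F600}b'); // [0x62, 0x1F600, 0x61]
```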
--
Mark https://plus.google.com/114199149796022210033
On Mon, Jul 16, 2012 at 2:54 PM, Gillam, Richard gil...@lab126.com
, that was very much
in flux. I chatted with Nebojša Ćirić, Mark Davis and others last
March and they were planning on contributing to another proposal so I
just focused on explaining how a message extraction - localization -
message reintegration pipeline could work with messages in quasis, and
showing
I tend to agree with your proposal.
Some caveats below.
--
Mark https://plus.google.com/114199149796022210033
On Tue, Jun 26, 2012 at 3:22 PM, Norbert Lindenberg
ecmascr...@norbertlindenberg.com wrote:
The TC 39
This is for v2, right?
--
Mark https://plus.google.com/114199149796022210033
On Tue, May 29, 2012 at 5:34 PM, Norbert Lindenberg
ecmascr...@norbertlindenberg.com wrote:
The ECMAScript Language Specification 5.1 makes
Lgtm
On Mar 26, 2012 4:59 PM, Norbert Lindenberg
ecmascr...@norbertlindenberg.com wrote:
While everybody is reviewing the draft specification of the ECMAScript
Internationalization API [1] in preparation for this week's TC 39 meeting,
here are a few issues that have come up, with proposed
On Tue, Mar 27, 2012 at 08:56, Glenn Adams gl...@skynav.com wrote:
This begs the question of what is the point of C1.
On Tue, Mar 27, 2012 at 9:13 AM, Mark Davis ☕ m...@macchiato.com wrote:
That would not be practical, nor predictable. And note that the 700K
reserved
Whew, a lot of work, Norbert. Looks quite good. My one question is whether
it is worth having a mechanism for iteration.
OLD CODE
for (var i = 0; i < s.length; ++i) {
  var x = s.charAt(i);
  // do something with x
}
Using your mechanism, one would write:
NEW CODE
for (var i = 0; i < s.length;
First, it would be great to get full Unicode support in JS. I know that's
been a problem for us at Google.
Secondly, while I agree with Addison that the approach that Java took is
workable, it does cause problems. Ideally someone would be able to loop (a
very common construct) with:
for
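A loop of the kind being asked for, visiting whole code points instead of code units, could be sketched as follows with `String.prototype.codePointAt`, which was standardized after this thread. The wrapper name `forEachCodePoint` is made up for illustration.

```javascript
// Hypothetical helper: call `callback(codePoint, index)` once per code point,
// where `index` is the code-unit offset at which that code point starts.
function forEachCodePoint(s, callback) {
  for (let i = 0; i < s.length; ) {
    const cp = s.codePointAt(i);
    callback(cp, i);
    // Supplementary code points occupy two code units.
    i += cp > 0xFFFF ? 2 : 1;
  }
}

const visited = [];
forEachCodePoint('a\u{1F600}b', (cp, i) => visited.push([cp, i]));
// visited: [[0x61, 0], [0x1F600, 1], [0x62, 3]]
```

Where the index is not needed, `for (const ch of s)` iterates by code point as well.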
You can't use \u10FFFF as syntax, because that could be \u10FF followed by
literal FF. A better syntax is \u{...}, with 1 to 6 digits, values from 0
.. 10FFFF.
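The ambiguity is easy to see in an engine that supports the braced escape (added in ES2015, after this message):

```javascript
// Four-digit escape followed by two literal characters: U+10FF, "F", "F".
const ambiguous = '\u10FFFF';

// Braced escape: a single code point, U+10FFFF, stored as a surrogate pair.
const braced = '\u{10FFFF}';

const ambiguousLength = ambiguous.length;       // 3
const bracedLength = braced.length;             // 2
const bracedCodePoint = braced.codePointAt(0);  // 0x10FFFF
```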
Mark
On Wed, Jan 25, 2012 at 10:59,
(oh, and I agree with your other points)
Mark
On Wed, Jan 25, 2012 at 11:11, Mark Davis ☕ m...@macchiato.com wrote:
You can't use \u10FFFF as syntax, because that could be \u10FF followed by
literal
Here's the problem.
The very same collator for de is valid for de-DE, de-AT, and de-CH.
In ICU you actually get a functionally-equivalent object back, no matter
which of these you ask for.
However, that collator is *also* valid for other countries where 'de' is
official: de-LU, de-BE, de-LI.
Regex has not been part of scope of the Globalization API work. I wanted to
find out whether any improvements from an internationalization point of
view are being planned, separately.
Some of the problems include:
- Regexes fail on supplementary characters (above U+FFFF). Most of these
are
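The first problem can be demonstrated as follows. The `u` flag used for contrast was added in ES2015, well after this message, so the second half shows the later fix and not anything available at the time.

```javascript
const grinning = '\u{1F600}'; // a supplementary character, two code units

// Without the u flag, "." matches one code unit, so a single
// supplementary character does not match ^.$
const legacyDot = /^.$/.test(grinning);   // false

// With the u flag, "." matches one code point.
const unicodeDot = /^.$/u.test(grinning); // true

// Ranges over supplementary characters also need the u flag.
const inRange = /^[\u{1F600}-\u{1F64F}]$/u.test(grinning); // true
```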
Works for me. (I would need to be out from 11:00-12:30.)
Mark
On Mon, Aug 1, 2011 at 09:29, Nebojša Ćirić c...@google.com wrote:
So far we have Monday and Tuesday off the table, and some people hinting
that Wednesday would work best for them. Anybody has
FYI
About the new BCP47 support in Java:
http://download.oracle.com/javase/tutorial/i18n/locale/extensions.html
The following, comparing Unicode support in programming languages, including
ES.
-- Forwarded message --
From: Karl Williamson pub...@khwilliamson.com
Date: Sat, Jul
, NumberFormat, or DateTimeFormat. E.g., if I request
ar-MA-u-ca-islamic, did I get exactly what I requested, or
ar-MA-u-ca-islamicc, ar-MA-u-ca-gregory, ar-u-ca-gregory, or yet something
else?
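In the API as it later shipped, this question is answered by `resolvedOptions()`. A minimal sketch follows; the wrapper `describeResolution` is hypothetical, and the values returned depend on the locale data the implementation carries, so no particular output is claimed.

```javascript
// Hypothetical helper: report what a formatter actually resolved to
// for a requested language tag.
function describeResolution(requestedTag) {
  const resolved = new Intl.DateTimeFormat(requestedTag).resolvedOptions();
  return {
    requested: requestedTag,
    locale: resolved.locale,     // the tag the implementation settled on
    calendar: resolved.calendar, // the calendar actually in use
  };
}

const resolution = describeResolution('ar-MA-u-ca-islamic');
```

Comparing `resolution.locale` and `resolution.calendar` with the request shows whether a fallback took place.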
Best regards,
Norbert
On Jul 20, 2011, at 9:46 , Mark Davis ☕ wrote:
I have comments on some
I have comments on some of these.
Mark
On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg
ecmascr...@norbertlindenberg.com wrote:
Hi all,
I'm sorry for not having been able to contribute to the
internationalization API earlier. I finally have reviewed
Markus isn't on es-discuss, so forwarding
-- Forwarded message --
From: Markus Scherer markus@gmail.com
Date: Wed, May 18, 2011 at 22:18
Subject: Re: Full Unicode strings strawman
To: Allen Wirfs-Brock al...@wirfs-brock.com
Cc: Shawn Steele shawn.ste...@microsoft.com, Mark
charsets ;-)
On 17 May 2011 21:55, Mark Davis ☕ m...@macchiato.com wrote:
In the past,
I have read it thus, pseudo BNF:
UnicodeString = CodeUnitSequence // D80
CodeUnitSequence = CodeUnit | CodeUnitSequence CodeUnit // D78
CodeUnit = anything in the current encoding form // D77
So far, so
Yes, one of the options for the internal storage of the string class is to
use different arrays depending on the contents.
1. uint8's if all the code point values are <= FF
2. uint16's if all the code point values are <= FFFF
3. uint32's otherwise
That way the internal storage always corresponds
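The selection rule in the list above can be sketched with typed arrays. This is only an illustration of the idea; the function name `chooseStorage` is made up and no engine's actual string representation is implied.

```javascript
// Hypothetical helper: store a string's code points in the narrowest
// typed array that can hold every value.
function chooseStorage(s) {
  const codePoints = Array.from(s, (ch) => ch.codePointAt(0));
  const max = codePoints.reduce((a, b) => Math.max(a, b), 0);
  if (max <= 0xFF) return Uint8Array.from(codePoints);
  if (max <= 0xFFFF) return Uint16Array.from(codePoints);
  return Uint32Array.from(codePoints);
}

const latin = chooseStorage('abc');          // Uint8Array, length 3
const bmp = chooseStorage('a\u0416');        // Uint16Array, length 2
const astral = chooseStorage('a\u{1F600}');  // Uint32Array, length 2
```

In every case one array element holds one code point, so indexing is by code point regardless of the element width.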
The wrong conclusion is being drawn. I can say definitively that for the
string a\uD800b.
- It is a valid Unicode string, according to the Unicode Standard.
- It cannot be encoded as well-formed in any UTF-x (it is not
'well-formed' in any UTF).
- When it comes to conversion, the bad
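The distinction between a valid string and a well-formed one can be sketched like this. The helpers `isWellFormedUtf16` and `canEncodeAsUtf8` are written for illustration; the only built-in behavior relied on is that `encodeURIComponent` throws a `URIError` on an unpaired surrogate.

```javascript
// Hypothetical helper: true when every surrogate in `s` is part of a pair.
function isWellFormedUtf16(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      const next = s.charCodeAt(i + 1); // NaN at end of string
      if (!(next >= 0xDC00 && next <= 0xDFFF)) return false;
      i++; // skip the trail surrogate
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return false; // trail surrogate with no lead
    }
  }
  return true;
}

// Hypothetical helper: conversion to UTF-8 fails for an unpaired surrogate.
function canEncodeAsUtf8(s) {
  try {
    encodeURIComponent(s);
    return true;
  } catch (e) {
    return false;
  }
}

const sample = 'a\uD800b';
const sampleLength = sample.length;                // 3: usable as a string
const sampleWellFormed = isWellFormedUtf16(sample); // false
const sampleEncodable = canEncodeAsUtf8(sample);    // false
```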
That is incorrect. See below.
Mark
On Tue, May 17, 2011 at 18:33, Wes Garland w...@page.ca wrote:
On 17 May 2011 20:09, Boris Zbarsky bzbar...@mit.edu wrote:
On 5/17/11 5:24 PM, Wes Garland wrote:
Okay, I think we have to agree to disagree here. I
I'm quite sympathetic to the goal, but the proposal does represent a
significant breaking change. The problem, as Shawn points out, is with
indexing. Before, the strings were defined as UTF16.
Take a sample string \ud800\udc00\u0061 = \u{10000}\u{61}. Right now,
the 'a' (the \u{61}) is at offset
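The indexing difference for this sample string can be checked directly. The code-point view below uses `Array.from`, which is only a stand-in for what indexing would look like if strings were sequences of code points.

```javascript
const s = '\ud800\udc00\u0061';

// Code-unit indexing, as strings work today.
const unitLength = s.length;         // 3
const unitOffset = s.indexOf('a');   // 2

// Code-point view of the same string.
const codePoints = Array.from(s);
const pointLength = codePoints.length;        // 2
const pointOffset = codePoints.indexOf('a');  // 1
```

Any existing code that stored the offset 2 would point past the 'a' under code-point indexing.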
-discuss-boun...@mozilla.org] *On Behalf Of *Jungshik Shin (???, ???)
*Sent:* Monday, May 16, 2011 2:24 PM
*To:* Mark Davis ☕
*Cc:* Markus Scherer; es-discuss@mozilla.org
*Subject:* Re: Full Unicode strings strawman
On Mon, May 16, 2011 at 2:19 PM, Mark Davis ☕ m...@macchiato.com wrote
A correction.
U+D800 is indeed a code point: http://www.unicode.org/glossary/#Code_Point. It
is defined for usage in Unicode Strings (see
http://www.unicode.org/glossary/#Unicode_String) because often it is useful
for implementations to be able to allow it in processing.
It does, however, have a
In practice, the supplemental code points don't really cause problems in
Unicode strings. Most implementations just treat them as if they were
unassigned. The only important issue is that *when* they are converted to
UTF-xx for storage or transmission, they need to be handled; typically by
, if one is using Turkish I, we expect all of them to do so.
- Shawn
*From:* Nebojša Ćirić [mailto:c...@google.com]
*Sent:* Monday, March 28, 2011 1:36 PM
*To:* Mark Davis ☕
*Cc:* es-discuss@mozilla.org; Shawn Steele; Phillips, Addison
*Subject:* Re: Collation API not complete for search
I think an iterator is a cleaner interface; we were just trying to minimize
new API.
In general, collation is context sensitive, so searching on substrings isn't
a good idea. You want to search from a location, but have the rest of the
text available to you.
For the iterator, you would need to
There are really 5 cases at issue:
1. Code point breaks
2. Grapheme-Cluster breaks (with three possible variants: legacy,
extended, and aksara http://www.unicode.org/glossary/#aksara)
3. Word breaks
4. Line breaks
5. Sentence breaks
Notes:
- #1 is pretty trivial to do
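Several of these break types can be tried out with `Intl.Segmenter`, an API standardized long after this message; it covers grapheme, word, and sentence granularity but not line breaks. The helper name `segments` and the sample text are assumptions for illustration.

```javascript
// Hypothetical helper: split `text` at the given granularity
// ('grapheme', 'word', or 'sentence') and return the pieces.
function segments(text, granularity, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const accented = 'e\u0301'; // e + combining acute accent

// Code point breaks: two code points.
const codePointCount = Array.from(accented).length;            // 2

// Grapheme-cluster breaks: one user-perceived character.
const graphemeCount = segments(accented, 'grapheme').length;   // 1

// Word and sentence breaks over a longer sample.
const text = 'Cafe\u0301 au lait. Tre\u0300s bon!';
const words = segments(text, 'word');
const sentences = segments(text, 'sentence');
```

Word segmentation returns spaces and punctuation as segments too, so the pieces always join back into the original text.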
As stated before, I think that this approach is more error prone; that it
would be better to explicitly call the other function. Here would be the
difference between the two alternatives for the API: A and B, under the two
common scenarios:
*Scenario 1 I don't care*
A.
x = myLocaleInfo.region;
I would actually rather not have it be a construction argument, because it
is easier for people to make mistakes that way.
When I look this over, there are relatively few fields that need this. So
what about having API like:
// get an explicitly-set region, or null if there was no region
to infer, then it
should be easy to avoid inferring for LocaleInfo.
-Shawn
*From:* es-discuss-boun...@mozilla.org [mailto:
es-discuss-boun...@mozilla.org] *On Behalf Of *Mark Davis ☕
*Sent:* Friday, January 21, 2011 12:44 PM
*To:* Axel Hecht
*Cc:* Derek Murman; es-discuss@mozilla.org
[mailto:mark.edward.da...@gmail.com] *On
Behalf Of *Mark Davis ☕
*Sent:* Thursday, January 20, 2011 4:05 hours
*To:* Shawn Steele
*Cc:* es-discuss@mozilla.org; Peter Constable; Derek Murman
*Subject:* Re: i18n collator options
(BTW I haven't gotten added to es-discuss yet, so one of you might forward
these 3
*Re the following message:*
*
*
It is clearly expected that the number of locales available on any
particular device may be limited; a smartphone, for example, might have very
few installed, or have limited services for those it does have installed. With
the locale model, implementations are