Re: Proposal: Duration

2019-03-04 Thread Mark Davis ☕️
Sadly, time is not that simple. Most people using calendars consider the duration between January 15 and March 15 to be exactly 2 months. But such intervals are a different number of days, hence milliseconds. Mark On Mon, Mar 4, 2019 at 11:21 AM Naveen Chawla wrote: > I don't like it.
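The point about months not mapping to a fixed number of milliseconds can be checked directly. The sketch below is illustrative only: the helper name `daysBetweenUtc` is made up for this example and is not part of the Duration proposal. It counts whole days between calendar dates in UTC so that daylight-saving shifts do not interfere.

```javascript
// Milliseconds in one UTC day (UTC has no daylight-saving shifts).
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: whole days between two UTC calendar dates.
// Months are 1-based here, unlike the 0-based months of Date.UTC.
function daysBetweenUtc(y1, m1, d1, y2, m2, d2) {
  return (Date.UTC(y2, m2 - 1, d2) - Date.UTC(y1, m1 - 1, d1)) / MS_PER_DAY;
}

// Each interval below is "exactly 2 months" on a calendar.
const janToMar2019 = daysBetweenUtc(2019, 1, 15, 2019, 3, 15); // 59 days
const janToMar2020 = daysBetweenUtc(2020, 1, 15, 2020, 3, 15); // 60 days (leap year)
const marToMay2019 = daysBetweenUtc(2019, 3, 15, 2019, 5, 15); // 61 days

console.log(janToMar2019, janToMar2020, marToMay2019);
```

Three intervals that a calendar user would all call "2 months" give three different millisecond counts, which is why a duration cannot be reduced to a single number of milliseconds.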

Re: add reverse() method to strings

2018-03-18 Thread Mark Davis ☕️
.reverse would only be reasonable for a subset of characters supported by Unicode. Its primary cited use case is for a particular educational example, when there are probably thousands of similar examples of educational snippets that would be rarely used in a production environment. Given that, it
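A short sketch of why a string reverse is only reasonable for a subset of characters. Both function names are invented for this illustration; neither is a proposed API.

```javascript
// Reverses UTF-16 code units; splits surrogate pairs apart.
function naiveReverse(s) {
  return s.split('').reverse().join('');
}

// Reverses code points; keeps surrogate pairs intact,
// but still detaches combining marks from their base letters.
function codePointReverse(s) {
  return Array.from(s).reverse().join('');
}

const withEmoji = 'a\u{1F600}b';
console.log(naiveReverse(withEmoji) === 'b\uDE00\uD83Da');  // true: pair is now reversed and ill-formed
console.log(codePointReverse(withEmoji) === 'b\u{1F600}a'); // true

// 'e' + COMBINING ACUTE ACCENT + 'x': the accent ends up on the 'x'.
const withAccent = 'e\u0301x';
console.log(codePointReverse(withAccent) === 'x\u0301e');   // true
```

Even the code-point version changes the meaning of the text once combining sequences are involved, so a correct reverse would have to work on grapheme clusters or larger units.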

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
I think the cleanest mental model is where UTF-16 or UTF-8 strings are interpreted as if they were transformed into UTF-32. While that is generally feasible, it often represents a cost in performance which is not acceptable in practice. So you see various approaches that involve some deviation

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
Good, that sounds right. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Wed, Jan 28, 2015 at 12:57 PM, André Bargull andre.barg...@udo.edu wrote: On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä marja at chromium.org https://mail.mozilla.org/listinfo/es-discuss

Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mark Davis ☕️
On Wed, Jan 28, 2015 at 11:36 AM, Marja Hölttä ma...@chromium.org wrote: The ES6 unicode regexp spec is not very clear regarding what should happen if the regexp or the matched string contains lonely surrogates (a lead surrogate without a trail, or a trail without a lead). For example, for the
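For readers following the lone-surrogate question, the sketch below shows the behaviour of the `u` flag as it was standardized in ES2015, which is later than parts of this discussion. It is an illustration of the outcome, not a quote from the thread.

```javascript
const pair = '\uD83D\uDE00';     // U+1F600 as a surrogate pair
const loneLead = 'a\uD83Db';     // lead surrogate with no trail

// Without the u flag the pattern and string are matched code unit by code unit.
const legacyMatchesHalf = /\uD83D/.test(pair);       // true
const legacyDotMatchesPair = /^.$/.test(pair);       // false: two code units

// With the u flag both are interpreted as code points.
const unicodeMatchesHalf = /\uD83D/u.test(pair);     // false: cannot match half a pair
const unicodeDotMatchesPair = /^.$/u.test(pair);     // true: one code point
const unicodeMatchesLone = /\uD83D/u.test(loneLead); // true: a lone surrogate is its own code point

console.log(legacyMatchesHalf, legacyDotMatchesPair,
            unicodeMatchesHalf, unicodeDotMatchesPair, unicodeMatchesLone);
```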

Re: [Json] JSON: remove gap between Ecma-404 and IETF draft

2013-11-13 Thread Mark Davis
On Wed, Nov 13, 2013 at 3:51 PM, Joe Hildebrand (jhildebr) jhild...@cisco.com wrote: that all software implementations which receive the un-prefixed text will not generate parse errors. perhaps: ...​all conformant software ...​ Mark https://google.com/+MarkDavis *— Il meglio è l’inimico

Re: Internationalization: Support for IANA time zones

2013-03-02 Thread Mark Davis
Lindenberg ecmascr...@lindenbergsoftware.com wrote: The identifier issues first: On Mar 1, 2013, at 7:40 , Mark Davis ☕ wrote: These names are canonicalized to the corresponding Zone name in the casing used Because the Zone names are unstable, in CLDR we adopted the same convention

Re: Internationalization: Support for IANA time zones

2013-03-02 Thread Mark Davis
don't think UTC offset is a good idea because that doesn't really represent a Timezone very well. (If a meeting gets moved to a following week, that offset might change or be wrong) On Mar 1, 2013, at 7:40 , Mark Davis ☕ wrote: This is problematic. The canonicalized names are very ugly. What
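The remark about a meeting moving to the following week can be made concrete. This is a sketch using the `Intl.DateTimeFormat` API as it exists today, which postdates this thread; the helper name `zoneAbbreviation` and the choice of `America/Los_Angeles` are assumptions made for the example.

```javascript
// Hypothetical helper: short zone name in effect at a given instant.
function zoneAbbreviation(utcMillis, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    timeZoneName: 'short',
  }).formatToParts(new Date(utcMillis));
  return parts.find((p) => p.type === 'timeZoneName').value;
}

// A meeting on 2013-03-04 at 20:00 UTC, then moved exactly one week later.
// US daylight saving time began on 2013-03-10, between the two dates.
const meeting = Date.UTC(2013, 2, 4, 20, 0);
const movedOneWeek = meeting + 7 * 24 * 60 * 60 * 1000;

const before = zoneAbbreviation(meeting, 'America/Los_Angeles');      // 'PST' (UTC-8)
const after = zoneAbbreviation(movedOneWeek, 'America/Los_Angeles');  // 'PDT' (UTC-7)
console.log(before, after);
```

A stored offset of -08:00 would be wrong for the rescheduled meeting, whereas the zone identifier stays correct.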

Re: Internationalization: Support for IANA time zones

2013-03-02 Thread Mark Davis
On Sat, Mar 2, 2013 at 5:11 PM, Shawn Steele shawn.ste...@microsoft.com wrote: I’m uncomfortable using the CLDR names, although perhaps they could be aliases, because other standards use the tzdb names and we have to be able to look up the tzdb names. It might be nice to get more stability for

Re: Internationalization: Support for IANA time zones

2013-03-01 Thread Mark Davis
These names are canonicalized to the corresponding Zone name in the casing used Because the Zone names are unstable, in CLDR we adopted the same convention as in BCP47. That is, our canonical form never changes, no matter what happens to Zone names. I'd strongly recommend using those as the
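As a small illustration of the casing rule quoted above, the sketch below uses the API in the form that later shipped in ECMA-402. The helper name `canonicalTimeZone` is made up here. Only the case canonicalization is shown with an expected value, because what an alias such as `Asia/Calcutta` resolves to has differed between implementations and versions.

```javascript
// Hypothetical helper: ask the implementation for its canonical form of a zone id.
function canonicalTimeZone(id) {
  return new Intl.DateTimeFormat('en-US', { timeZone: id })
    .resolvedOptions().timeZone;
}

// Identifiers are matched case-insensitively and returned in tzdb casing.
const fromLower = canonicalTimeZone('america/new_york'); // 'America/New_York'
const fromUpper = canonicalTimeZone('AMERICA/NEW_YORK'); // 'America/New_York'
console.log(fromLower, fromUpper);

// Alias handling is implementation-dependent, so no expected value is given.
console.log(canonicalTimeZone('Asia/Calcutta'));
```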

Re: Flexible String Representation - full Unicode for ES6?

2012-12-21 Thread Mark Davis
The main complication for compatibility is indexing. See http://macchiati.blogspot.com/2012/07/unicode-string-models-many-programming.html If you look back about a year in this list's archive you'll find a long discussion. {phone} On Dec 21, 2012 9:34 PM, Chris Angelico ros...@gmail.com

Re: API for text editing

2012-10-18 Thread Mark Davis
clearly what's needed and then approach the relevant W3C working group(s), possibly through the Internationalization working group. Norbert On Oct 15, 2012, at 18:14 , Mark Davis ☕ wrote: I added the following for discussion: https://bugs.ecmascript.org/show_bug.cgi?id=798 https

Re: Minutes from 10/5 internationalization ad-hoc meeting

2012-10-15 Thread Mark Davis
I added the following for discussion: https://bugs.ecmascript.org/show_bug.cgi?id=798 https://bugs.ecmascript.org/show_bug.cgi?id=797 Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Mon, Oct 15, 2012 at 5:51 PM, Gillam, Richard

Re: Calendar issues

2012-09-13 Thread Mark Davis
In ICU, we are using Gregorian eras (AD/BC) as customarily interpreted, and there is no year zero. There isn't a simple way to get non-era years—and that form is mostly interesting to techies, not normal people, which is why we support the era form. (If someone wanted to do it, you could probably
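The "no year zero" convention can be observed through era formatting. This sketch uses `Intl.DateTimeFormat` as it exists today rather than ICU directly, and the helper name `eraYear` is invented for the example. ECMAScript `Date` uses astronomical year numbering, where year 0 exists and corresponds to 1 BC.

```javascript
const eraFormat = new Intl.DateTimeFormat('en-US', {
  year: 'numeric',
  era: 'short',
  timeZone: 'UTC',
});

// Hypothetical helper: era-style label for an astronomical (Date-style) year.
function eraYear(astronomicalYear) {
  const d = new Date(Date.UTC(2000, 0, 15));
  d.setUTCFullYear(astronomicalYear); // Date.UTC would remap years 0..99 to 1900..1999
  const parts = eraFormat.formatToParts(d);
  const get = (type) => parts.find((p) => p.type === type).value;
  return get('year') + ' ' + get('era');
}

console.log(eraYear(1));  // '1 AD'
console.log(eraYear(0));  // '1 BC' (there is no era year zero)
console.log(eraYear(-1)); // '2 BC'
```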

Re: Calendar issues

2012-09-13 Thread Mark Davis
for them? Norbert On Sep 13, 2012, at 13:31 , Mark Davis ☕ wrote: In ICU, we are using Gregorian eras (AD/BC) as customarily interpreted, and there is no year zero. There isn't a simple way to get non-era years—and that form is mostly interesting to techies, not normal people, which is why we

Re: Calendar issues

2012-09-12 Thread Mark Davis
Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Wed, Sep 12, 2012 at 8:43 PM, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote: ES5 section 15.9.1 specifies a number of operations to map time values (measured in milliseconds from

Re: Calendar issues

2012-09-12 Thread Mark Davis
+Peter, since he has an interest in these issues. Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Wed, Sep 12, 2012 at 9:37 PM, Mark Davis ☕ m...@macchiato.com wrote: Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è

Re: General comment on ES 402 test suite (i18n)

2012-09-11 Thread Mark Davis
Can you reformulate the table attached to http://unicode.org/cldr/trac/ticket/5302? In particular, if a currency is not in the LDML table, it gets the default values (see below). So you need to compare on that basis. It is much better for comparison if you attach a tab- or comma-delimited file,

Re: ECMAScript collation question

2012-09-05 Thread Mark Davis
on by default. Norbert On Sep 4, 2012, at 13:23 , Mark Davis ☕ wrote: In view of the schedule, I suggest that we make your first, minimal change right now, and plan to correct it along one of the other lines in the next edition. #1 is much weaker than we want, so we should correct

Re: ECMAScript collation question

2012-09-04 Thread Mark Davis
normalization is primarily an optimization and so should be under application control. Comments? Norbert On Sep 1, 2012, at 16:19 , Mark Davis ☕ wrote: Support for the normalization property in options and the kk key would become mandatory. The options that ICU offers are to observe

Re: ECMAScript collation question

2012-09-02 Thread Mark Davis
other thoughts? Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Sun, Sep 2, 2012 at 8:15 AM, Markus Scherer markus@gmail.com wrote: On Sat, Sep 1, 2012 at 4:19 PM, Mark Davis ☕ m...@macchiato.com wrote: Your proposal looks reasonable, except

Re: ECMAScript collation question

2012-09-01 Thread Mark Davis
off normalization (i.e., normalization is on by default, independent of locale). Support for the normalization property in options and the kk key would become mandatory. Norbert On Aug 31, 2012, at 10:12 , Mark Davis ☕ wrote: I think we could go either way. It depends on the usage mode

Re: ECMAScript collation question

2012-08-31 Thread Mark Davis
I think we could go either way. It depends on the usage mode. 1. The case where performance is crucial is where you are comparing gazillions of strings, such as records in a database. 2. If the number of strings to be compared is relatively small, and/or there is enough overhead

Re: ECMAScript collation question

2012-08-30 Thread Mark Davis
ICU *is* always able to compare them as being equal, just by setting the parameter. Even if the parameter isn't set, it uses an FCD sort (see http://unicode.org/notes/tn5/) and canonical closure, which handles most cases of canonical equivalence. The default is turned on for languages where the
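To show what "comparing them as being equal" means for canonically equivalent strings, here is a minimal sketch using `Intl.Collator` and `String.prototype.normalize` as they were later standardized. It does not exercise ICU's normalization parameter itself.

```javascript
const composed = '\u00E9';    // é as a single code point
const decomposed = 'e\u0301'; // e followed by COMBINING ACUTE ACCENT

// Code-unit comparison sees two different strings.
const sameCodeUnits = composed === decomposed;                 // false

// Normalizing both sides to one form makes them identical.
const sameAfterNfd = composed.normalize('NFD') === decomposed; // true

// A collator is expected to treat canonical equivalents as equal.
const collatorResult = new Intl.Collator('en').compare(composed, decomposed); // 0

console.log(sameCodeUnits, sameAfterNfd, collatorResult);
```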

Re: Unicode support in new ES6 spec draft

2012-07-17 Thread Mark Davis
17, 2012, at 14:49 , Brendan Eich wrote: Allen Wirfs-Brock wrote: On Jul 16, 2012, at 2:57 PM, Mark Davis ☕ wrote: In order to support backwards iteration (which is sometimes used), we should have codePointBefore. or we can provide a backwards iterator that knows how to parse

Re: Unicode support in new ES6 spec draft

2012-07-16 Thread Mark Davis
In order to support backwards iteration (which is sometimes used), we should have codePointBefore. -- Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Mon, Jul 16, 2012 at 2:54 PM, Gillam, Richard gil...@lab126.com
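A minimal sketch of what a `codePointBefore` could look like, written as a free function because no such method exists on `String`. The name follows the suggestion in the message; the signature and the `undefined` return for out-of-range indexes are assumptions of this example.

```javascript
// Returns the code point that ends just before `index`, or undefined
// if index is out of range. Unpaired surrogates are returned as-is.
function codePointBefore(s, index) {
  if (index <= 0 || index > s.length) return undefined;
  const trail = s.charCodeAt(index - 1);
  if (trail >= 0xDC00 && trail <= 0xDFFF && index >= 2) {
    const lead = s.charCodeAt(index - 2);
    if (lead >= 0xD800 && lead <= 0xDBFF) {
      // Combine the surrogate pair into one supplementary code point.
      return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
    }
  }
  return trail;
}

// Backwards iteration built on top of it.
function codePointsBackward(s) {
  const out = [];
  for (let i = s.length; i > 0; ) {
    const cp = codePointBefore(s, i);
    out.push(cp);
    i -= cp > 0xFFFF ? 2 : 1; // supplementary code points occupy two code units
  }
  return out;
}

console.log(codePointsBackward('a\u{1F600}b')); // [0x62, 0x1F600, 0x61]
```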

Re: Quasi-literals and localization

2012-07-12 Thread Mark Davis
, that was very much in flux. I chatted with Nebosja Ciric, Mark Davis and others last March and they were planning on contributing to another proposal so I just focused on explaining how a message extraction - localization - message reintegration pipeline could work with messages in quasis, and showing

Re: Internationalization: Additional values in API

2012-06-26 Thread Mark Davis
I tend to agree with your proposal. Some caveats below. -- Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Tue, Jun 26, 2012 at 3:22 PM, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote: The TC 39

Re: Unicode normalization

2012-05-29 Thread Mark Davis
This is for v2, right? -- Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Tue, May 29, 2012 at 5:34 PM, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote: The ECMAScript Language Specification 5.1 makes

Re: Internationalization API issues and updates

2012-04-16 Thread Mark Davis
Lgtm On Mar 26, 2012 4:59 PM, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote: While everybody is reviewing the draft specification of the ECMAScript Internationalization API [1] in preparation for this week's TC 39 meeting, here are a few issues that have come up, with proposed

Re: Full Unicode based on UTF-16 proposal

2012-03-27 Thread Mark Davis
l’inimico del bene —* ** On Tue, Mar 27, 2012 at 08:56, Glenn Adams gl...@skynav.com wrote: This begs the question of what is the point of C1. On Tue, Mar 27, 2012 at 9:13 AM, Mark Davis ☕ m...@macchiato.com wrote: That would not be practical, nor predictable. And note that the 700K reserved

Re: Full Unicode based on UTF-16 proposal

2012-03-16 Thread Mark Davis
Whew, a lot of work, Norbert. Looks quite good. My one question is whether it is worth having a mechanism for iteration. OLD CODE for (int i = 0; i < s.length(); ++i) { var x = s.charAt(i); // do something with x } Using your mechanism, one would write: NEW CODE for (int i = 0; i < s.length();

Re: New full Unicode for ES6 idea

2012-02-19 Thread Mark Davis
First, it would be great to get full Unicode support in JS. I know that's been a problem for us at Google. Secondly, while I agree with Addison that the approach that Java took is workable, it does cause problems. Ideally someone would be able to loop (a very common construct) with: for
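For comparison with the loop being asked for, this is how code point iteration looks with the string iterator that ES2015 eventually provided. The two helper names are illustrative only.

```javascript
// Index-based loop: visits UTF-16 code units.
function codeUnits(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) out.push(s.charCodeAt(i));
  return out;
}

// for...of loop: the string iterator yields one code point per step.
function codePoints(s) {
  const out = [];
  for (const ch of s) out.push(ch.codePointAt(0));
  return out;
}

const sample = 'a\u{1F600}';
console.log(codeUnits(sample));  // [0x61, 0xD83D, 0xDE00]
console.log(codePoints(sample)); // [0x61, 0x1F600]
```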

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Mark Davis
You can't use \u10FFFF as syntax, because that could be \u10FF followed by literal FF. A better syntax is \u{...}, with 1 to 6 digits, values from 0 .. 10FFFF. Mark *— Il meglio è l’inimico del bene —* * * * [https://plus.google.com/114199149796022210033] * On Wed, Jan 25, 2012 at 10:59,
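The ambiguity described here is visible in the escape syntax that ES2015 adopted. The sketch takes \u10FFFF as the six-digit form under discussion, which is an assumption of this example.

```javascript
// Four-digit escape: parsed as \u10FF followed by the two letters "FF".
const ambiguous = '\u10FFFF';
console.log(ambiguous.length);                      // 3
console.log(ambiguous.charCodeAt(0).toString(16));  // '10ff'
console.log(ambiguous.slice(1));                    // 'FF'

// Braced escape: 1 to 6 hex digits, unambiguous.
const braced = '\u{10FFFF}';
console.log(braced.length);                         // 2 (one surrogate pair)
console.log(braced.codePointAt(0).toString(16));    // '10ffff'
```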

Re: Question about the “full Unicode in strings” strawman

2012-01-25 Thread Mark Davis
(oh, and I agree with your other points) Mark *— Il meglio è l’inimico del bene —* * * * [https://plus.google.com/114199149796022210033] * On Wed, Jan 25, 2012 at 11:11, Mark Davis ☕ m...@macchiato.com wrote: You can't use \u10FFFF as syntax, because that could be \u10FF followed by literal

Re: Globalization API: supportedLocalesOf vs. getSupportedLocales

2011-11-28 Thread Mark Davis
Here's the problem. The very same collator for de is valid for de-DE, de-AT, and de-CH. In ICU you actually get a functionally-equivalent object back, no matter which of these you ask for. However, that collator is *also* valid for other countries where 'de' is official: de-LU, de-BE, de-LI.
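The question of which locales a collator is "valid" for can be explored with the API as it later shipped in ECMA-402. The results depend on the locale data of the implementation, so the sketch prints them without claiming particular values.

```javascript
const requested = ['de-DE', 'de-AT', 'de-CH', 'de-LU'];

// Which of the requested tags the implementation says it supports.
const supported = Intl.Collator.supportedLocalesOf(requested);

// Which locale each collator actually resolved to; several requests
// may resolve to the same underlying data.
const resolved = requested.map(
  (tag) => new Intl.Collator(tag).resolvedOptions().locale
);

console.log(supported);
console.log(resolved);
```

Comparing the two arrays on a given engine shows whether regional tags are reported as distinct locales or fall back to a shared `de` collator.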

Regex

2011-11-17 Thread Mark Davis
Regex has not been part of scope of the Globalization API work. I wanted to find out whether any improvements from an internationalization point of view are being planned, separately. Some of the problems include: - Regexes fail on supplementary characters (above U+FFFF). Most of these are

Re: i18n meeting mid August @ Google

2011-08-01 Thread Mark Davis
Works for me. (I would need to be out from 11:00-12:30.) Mark *— Il meglio è l’inimico del bene —* On Mon, Aug 1, 2011 at 09:29, Nebojša Ćirić c...@google.com wrote: So far we have Monday and Tuesday off the table, and some people hinting that Wednesday would work best for them. Anybody has

Fwd: Slide show: Survey of current programming language support for Unicode

2011-08-01 Thread Mark Davis
FYI About the new BCP47 support in Java: http://download.oracle.com/javase/tutorial/i18n/locale/extensions.html The following, comparing Unicode support in programming languages, including ES. -- Forwarded message -- From: Karl Williamson pub...@khwilliamson.com Date: Sat, Jul

Re: Comments on internationalization API

2011-07-22 Thread Mark Davis
, NumberFormat, or DateTimeFormat. E.g., if I request ar-MA-u-ca-islamic, did I get exactly what I requested, or ar-MA-u-ca-islamicc, ar-MA-u-ca-gregory, ar-u-ca-gregory, or yet something else? Best regards, Norbert On Jul 20, 2011, at 9:46 , Mark Davis ☕ wrote: I have comments on some

Re: Comments on internationalization API

2011-07-20 Thread Mark Davis
I have comments on some of these. Mark *— Il meglio è l’inimico del bene —* On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote: Hi all, I'm sorry for not having been able to contribute to the internationalization API earlier. I finally have reviewed

Fwd: Full Unicode strings strawman

2011-05-19 Thread Mark Davis
Markus isn't on es-discuss, so forwarding -- Forwarded message -- From: Markus Scherer markus@gmail.com Date: Wed, May 18, 2011 at 22:18 Subject: Re: Full Unicode strings strawman To: Allen Wirfs-Brock al...@wirfs-brock.com Cc: Shawn Steele shawn.ste...@microsoft.com, Mark

Re: Full Unicode strings strawman

2011-05-18 Thread Mark Davis
charsets ;-) On 17 May 2011 21:55, Mark Davis ☕ m...@macchiato.com wrote:In the past, I have read it thus, pseudo BNF: UnicodeString = CodeUnitSequence // D80 CodeUnitSequence = CodeUnit | CodeUnitSequence CodeUnit // D78 CodeUnit = anything in the current encoding form // D77 So far, so

Re: Full Unicode strings strawman

2011-05-18 Thread Mark Davis
Yes, one of the options for the internal storage of the string class is to use different arrays depending on the contents. 1. uint8's if all the code points are <= FF 2. uint16's if all the code point values are <= FFFF 3. uint32's otherwise That way the internal storage always corresponds
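A rough sketch of the storage idea, using typed arrays to stand in for an engine's internal buffers. The function name `packString` is invented, and the thresholds 0xFF and 0xFFFF are this example's reading of the three cases listed in the message.

```javascript
// Picks the narrowest element width that can hold every code point,
// so that one array element always corresponds to one code point.
function packString(s) {
  const cps = Array.from(s, (ch) => ch.codePointAt(0));
  const max = cps.reduce((m, cp) => Math.max(m, cp), 0);
  if (max <= 0xFF) return Uint8Array.from(cps);
  if (max <= 0xFFFF) return Uint16Array.from(cps);
  return Uint32Array.from(cps);
}

console.log(packString('abc').constructor.name);        // 'Uint8Array'
console.log(packString('a\u0416').constructor.name);    // 'Uint16Array'
console.log(packString('a\u{1F600}').constructor.name); // 'Uint32Array'
console.log(packString('a\u{1F600}').length);           // 2 elements, one per code point
```

With this layout, indexing by element is indexing by code point regardless of which width was chosen.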

Re: Full Unicode strings strawman

2011-05-17 Thread Mark Davis
The wrong conclusion is being drawn. I can say definitively that for the string a\uD800b. - It is a valid Unicode string, according to the Unicode Standard. - It cannot be encoded as well-formed in any UTF-x (it is not 'well-formed' in any UTF). - When it comes to conversion, the bad
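The distinction between a valid Unicode string and a well-formed UTF sequence can be demonstrated with the same example string. The helper names are made up for this sketch; `TextEncoder` is used only to show what one standard conversion path does with the unpaired surrogate.

```javascript
const s = 'a\uD800b';

// True if the string contains a surrogate that is not part of a pair.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const u = str.charCodeAt(i);
    if (u >= 0xD800 && u <= 0xDBFF) {
      const next = str.charCodeAt(i + 1); // NaN at end of string
      if (next >= 0xDC00 && next <= 0xDFFF) { i++; continue; }
      return true;
    }
    if (u >= 0xDC00 && u <= 0xDFFF) return true;
  }
  return false;
}

// Converting to UTF-8 cannot represent the lone surrogate, so the
// encoder substitutes U+FFFD (bytes EF BF BD).
function utf8Bytes(str) {
  return Array.from(new TextEncoder().encode(str));
}

console.log(s.length);            // 3: the string itself is perfectly usable
console.log(hasLoneSurrogate(s)); // true: it is not well-formed UTF-16
console.log(utf8Bytes(s));        // [0x61, 0xEF, 0xBF, 0xBD, 0x62]
```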

Re: Full Unicode strings strawman

2011-05-17 Thread Mark Davis
That is incorrect. See below. Mark *— Il meglio è l’inimico del bene —* On Tue, May 17, 2011 at 18:33, Wes Garland w...@page.ca wrote: On 17 May 2011 20:09, Boris Zbarsky bzbar...@mit.edu wrote: On 5/17/11 5:24 PM, Wes Garland wrote: Okay, I think we have to agree to disagree here. I

Re: Full Unicode strings strawman

2011-05-16 Thread Mark Davis
I'm quite sympathetic to the goal, but the proposal does represent a significant breaking change. The problem, as Shawn points out, is with indexing. Before, the strings were defined as UTF16. Take a sample string \ud800\udc00\u0061 = \u{10000}\u{61}. Right now, the 'a' (the \u{61}) is at offset
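The indexing difference for the sample string can be checked with the string semantics that shipped, where offsets count UTF-16 code units. The code-point view is simulated with `Array.from`, purely for illustration.

```javascript
const sample = '\ud800\udc00\u0061'; // U+10000 followed by 'a'

// Current semantics: offsets count UTF-16 code units.
const codeUnitOffset = sample.indexOf('a');              // 2
const codeUnitLength = sample.length;                    // 3

// Under code point indexing the same 'a' would move to offset 1.
const codePointOffset = Array.from(sample).indexOf('a'); // 1
const codePointLength = Array.from(sample).length;       // 2

console.log(codeUnitOffset, codeUnitLength, codePointOffset, codePointLength);
```

Any existing code that stored or computed the offset 2 would silently point at the wrong place if strings switched to code point indexing, which is the breaking change at issue.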

Re: Full Unicode strings strawman

2011-05-16 Thread Mark Davis
-discuss-boun...@mozilla.org] *On Behalf Of *Jungshik Shin (???, ???) *Sent:* Monday, May 16, 2011 2:24 PM *To:* Mark Davis ☕ *Cc:* Markus Scherer; es-discuss@mozilla.org *Subject:* Re: Full Unicode strings strawman On Mon, May 16, 2011 at 2:19 PM, Mark Davis ☕ m...@macchiato.com wrote

Re: Full Unicode strings strawman

2011-05-16 Thread Mark Davis
A correction. U+D800 is indeed a code point: http://www.unicode.org/glossary/#Code_Point. It is defined for usage in Unicode Strings (see http://www.unicode.org/glossary/#Unicode_String) because often it is useful for implementations to be able to allow it in processing. It does, however, have a

Re: Full Unicode strings strawman

2011-05-16 Thread Mark Davis
In practice, the supplemental code points don't really cause problems in Unicode strings. Most implementations just treat them as if they were unassigned. The only important issue is that *when* they are converted to UTF-xx for storage or transmission, they need to be handled; typically by

Re: Collation API not complete for search

2011-03-28 Thread Mark Davis
, if one is using Turkish I, we expect all of them to do so. - Shawn *From:* Nebojša Ćirić [mailto:c...@google.com] *Sent:* Monday, March 28, 2011 1:36 PM *To:* Mark Davis ☕ *Cc:* es-discuss@mozilla.org; Shawn Steele; Phillips, Addison *Subject:* Re: Collation API not complete for search

Re: Collation API not complete for search

2011-03-25 Thread Mark Davis
I think an iterator is a cleaner interface; we were just trying to minimize new API. In general, collation is context sensitive, so searching on substrings isn't a good idea. You want to search from a location, but have the rest of the text available to you. For the iterator, you would need to

Re: Stupid i18n use cases question

2011-01-29 Thread Mark Davis
There are really 5 cases at issue: 1. Code point breaks 2. Grapheme-Cluster breaks (with three possible variants: 'legacy', extended, and aksara http://www.unicode.org/glossary/#aksara) 3. Word breaks 4. Line breaks 5. Sentence breaks Notes: - #1 is pretty trivial to do
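Several of the five break types can be tried with `Intl.Segmenter`, an API added to ECMA-402 long after this message. The sketch assumes an engine that ships it. Line breaking is not covered by that API, and code point breaks need only the string iterator.

```javascript
// 1. Code point breaks: the built-in string iterator.
function codePointSegments(s) {
  return Array.from(s);
}

// 2, 3, 5. Grapheme, word and sentence breaks via Intl.Segmenter.
function segments(s, granularity) {
  const segmenter = new Intl.Segmenter('en', { granularity });
  return Array.from(segmenter.segment(s), (seg) => seg.segment);
}

// Words only, skipping spaces and punctuation.
function words(s) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'word' });
  return Array.from(segmenter.segment(s))
    .filter((seg) => seg.isWordLike)
    .map((seg) => seg.segment);
}

const text = 'e\u0301\u{1F600}'; // e + combining acute, then an emoji
console.log(codePointSegments(text).length);    // 3 code points
console.log(segments(text, 'grapheme').length); // 2 grapheme clusters
console.log(words('Hello world.'));             // ['Hello', 'world']
console.log(segments('Hi there. Bye.', 'sentence'));
```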

Re: i18n objects

2011-01-24 Thread Mark Davis
As stated before, I think that this approach is more error prone; that it would be better to explicitly call the other function. Here would be the difference between the two alternatives for the API: A and B, under the two common scenarios: *Scenario 1 I don't care* A. x = myLocaleInfo.region;

Re: 2nd day meeting comments on the latest i18n API proposal

2011-01-21 Thread Mark Davis
I would actually rather not have it be a construction argument, because it is easier for people to make mistakes that way. When I look this over, there are relatively few fields that need this. So what about having API like: // get an explicitly-set region, or null if there was no region

Re: 2nd day meeting comments on the latest i18n API proposal

2011-01-21 Thread Mark Davis
to infer, then it should be easy to avoid inferring for LocaleInfo. -Shawn *From:* es-discuss-boun...@mozilla.org [mailto: es-discuss-boun...@mozilla.org] *On Behalf Of *Mark Davis ? *Sent:* Friday, January 21, 2011 12:44 PM *To:* Axel Hecht *Cc:* Derek Murman; es-discuss@mozilla.org

Re: i18n collator options

2011-01-20 Thread Mark Davis
[mailto:mark.edward.da...@gmail.com] *On Behalf Of *Mark Davis ? *Sent:* Poʻahā, Ianuali 20, 2011 4:05 hours *To:* Shawn Steele *Cc:* es-discuss@mozilla.org; Peter Constable; Derek Murman *Subject:* Re: i18n collator options (BTW I haven't gotten added to es-discuss yet, so one of you might forward these 3

Re: EcmaScript i18n API proposal

2010-06-10 Thread Mark Davis
*Re the following message:* * * It is clearly expected that the number of locales available on any particular device may be limited; a smartphone, for example, might have very few installed, or have limited services for those it does have installed. With the locale model, implementations are