Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Ken Whistler via Unicode
On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote: capitalize: uppercase (or title-case) the first character of the string, lowercase the rest When I say "cause problems", I mean producing mixed-case output. I originally thought that 'capitalize' would be fine. It is fine for

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Philippe Verdy via Unicode
I see no easy way to convert ALL UPPERCASE text with consistent casing as there's no rule, except by using dictionary lookups. In reality data should be input using default casing (as in dictionary entries), independently of their position in sentences, paragraphs or titles, and the contextual

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Markus Scherer via Unicode
On Tue, Oct 2, 2018 at 12:50 AM Martin J. Dürst via Unicode < unicode@unicode.org> wrote: > ... The only > operation that can cause problems is 'capitalize'. > > When I say "cause problems", I mean producing mixed-case output. I > originally thought that 'capitalize' would be fine. It is fine for

Re: Unicode String Models

2018-10-02 Thread Daniel Bünzli via Unicode
On 2 October 2018 at 14:03:48, Mark Davis ☕️ via Unicode (unicode@unicode.org) wrote: > Because of performance and storage consideration, you need to consider the > possible internal data structures when you are looking at something as > low-level as strings. But most of the 'model's in the

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Whether or not it is well suited, that's probably water under the bridge at this point. Think of it as a jargon at this point; after all, there are lots of cases like that: a "near miss" wasn't nearly a miss, it was nearly a hit. Mark On Sun, Sep 9, 2018 at 10:56 AM Janusz S. Bień wrote: > On

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Mark On Tue, Sep 11, 2018 at 12:17 PM Henri Sivonen via Unicode < unicode@unicode.org> wrote: > On Sat, Sep 8, 2018 at 7:36 PM Mark Davis ☕️ via Unicode > wrote: > > > > I recently did some extensive revisions of a paper on Unicode string > models (APIs). Comments are welcome. > > > > >

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Mark On Sun, Sep 9, 2018 at 3:42 PM Daniel Bünzli wrote: > Hello, > > I find your notion of "model" and presentation a bit confusing since it > conflates what I would call the internal representation and the API. > > The internal representation defines how the Unicode text is stored and >

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Mark On Sun, Sep 9, 2018 at 10:03 AM Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Sat, 8 Sep 2018 18:36:00 +0200 > Mark Davis ☕️ via Unicode wrote: > > > I recently did some extensive revisions of a paper on Unicode string > > models (APIs). Comments are welcome. > > > > >

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Thanks, added a quote from you on that; see if it looks ok. Mark On Sat, Sep 8, 2018 at 9:20 PM John Cowan wrote: > This paper makes the default assumption that the internal storage of a > string is a featureless array. If this assumption is abandoned, it is > possible to get O(1) indexes

Re: Unicode String Models

2018-10-02 Thread Mark Davis ☕️ via Unicode
Thanks to all for comments. Just revised the text in https://goo.gl/neguxb. Mark On Sat, Sep 8, 2018 at 6:36 PM Mark Davis ☕️ wrote: > I recently did some extensive revisions of a paper on Unicode string > models (APIs). Comments are welcome. > > >

Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Martin J. Dürst via Unicode
Since the last discussion on Georgian (Mtavruli) on this mailing list, I have been looking into how to implement it in the programming language Ruby. Ruby has four case-conversion operations for its class String: upcase: convert all characters to upper case downcase: convert all characters
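
For readers unfamiliar with the operations being discussed, here is a minimal sketch of the four case-conversion methods on Ruby's String class, using a plain ASCII string. The sample string and the `results` hash are illustrative assumptions, not taken from the thread. The names `capitalize` and `swapcase` do not appear in the truncated snippet above; `capitalize` is quoted elsewhere in this thread, and `swapcase` is filled in from Ruby's standard String API. Behaviour on Georgian Mtavruli text depends on the Ruby and Unicode versions in use and is deliberately not shown.

```ruby
# Illustrative ASCII input; Georgian input is omitted on purpose because
# its results vary with the Ruby/Unicode version.
s = "hello World"

results = {
  upcase:     s.upcase,      # every character converted to upper case
  downcase:   s.downcase,    # every character converted to lower case
  capitalize: s.capitalize,  # first character upper (title) case, rest lower case
  swapcase:   s.swapcase     # case of each character inverted
}

results.each { |name, value| puts "#{name}: #{value}" }
```

Of these, `capitalize` is the only one that deliberately produces mixed-case output, which is why the thread singles it out for scripts where mixed case is not wanted.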