Re: G_UTF8String: Boxed Type Proposal

2016-03-23 Thread Behdad Esfahbod
On Mon, Mar 21, 2016 at 3:30 PM, Randall Sawyer wrote: > Frankly, the use of the term "character" when referring to a "UTF-8 > encoded Unicode code point" was for me a source of confusion A character means a "Unicode character". That's independent of encoding, so,

Re: G_UTF8String: Boxed Type Proposal

2016-03-21 Thread Randall Sawyer
Frankly, the use of the term "character" when referring to a "UTF-8 encoded Unicode code point" was for me a source of confusion when I leapt to the conclusion of the unmet need of a UTF-8-length-aware wrapped string type - be it called "G_UTF8String" or "GUString". I recommend that all Glib

Re: G_UTF8String: Boxed Type Proposal

2016-03-21 Thread Randall Sawyer
Thank you once again to all who have responded. I have changed my mind. I DO grasp the nature of responders' objections. My understanding has now reached a "tipping point". What is the tipping point? On 03/21/2016 04:30 PM, Behdad Esfahbod wrote: I like to voice my opinion as well: -

Re: G_UTF8String: Boxed Type Proposal

2016-03-21 Thread Behdad Esfahbod
I like to voice my opinion as well:
- Bundling data and its length in a boxed type is useful, but that's gblob,
- Bundling number-of-Unicode-character is rarely useful indeed,
- A string API that would require any changes to the string content to go through editing function calls is

Re: G_UTF8String: Boxed Type Proposal

2016-03-21 Thread Matthias Clasen
On Fri, Mar 18, 2016 at 9:57 AM, Randall Sawyer wrote: > 2) If the former is true - which it is - then the developer will need to > call g_utf8_strlen() to determine if there are multi-byte sequences to > navigate - and if there are - g_utf8_offset_to_pointer() to
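
For readers following the g_utf8_strlen() / g_utf8_offset_to_pointer() discussion, here is a minimal sketch of what those two operations amount to. It assumes valid, NUL-terminated UTF-8 input. The sketch_ names are hypothetical; this is not GLib's implementation, only an illustration of why both operations are linear in the byte length.

```c
#include <stddef.h>

/* Hypothetical helper (not GLib code): count code points in a valid,
 * NUL-terminated UTF-8 string. Every code point contributes exactly one
 * byte that is NOT a continuation byte (continuation bytes look like
 * 10xxxxxx), so counting those bytes counts code points. */
static size_t sketch_utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper (not GLib code): return a pointer to the code point
 * at index `offset`, or to the terminating NUL if the string is shorter.
 * Assumes valid UTF-8. */
static const char *sketch_utf8_offset_to_pointer(const char *s, size_t offset)
{
    while (offset > 0 && *s != '\0') {
        s++;                                      /* step past the lead byte */
        while (((unsigned char)*s & 0xC0) == 0x80)
            s++;                                  /* skip continuation bytes */
        offset--;
    }
    return s;
}
```

Both functions walk the string from the start, which is the cost a cached code point count would only partly avoid: the count becomes O(1), but locating the Nth code point still requires a scan.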

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Matthias Clasen
Sure, code point works too. Anyway, enough with the ontology, we're not a standards body. I still don't think that we need a utf8-string datatype.

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Christian Hergert
On 03/19/2016 02:04 PM, Randall Sawyer wrote: >> It's possible you are focusing on implementation before measuring the >> problem. DRY alone is not a sufficient argument. > > "DRY" is not a term I know - or at least in the way you are using it > here.

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Randall Sawyer
On 03/19/2016 04:09 PM, Christian Hergert wrote: It's possible you are focusing on implementation before measuring the problem. DRY alone is not a sufficient argument. "DRY" is not a term I know - or at least in the way you are using it here. One topic I'm interested in covering at the

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Christian Hergert
On 03/19/2016 12:25 PM, Randall Sawyer wrote: > > If there already were such a structure, then it could already have been > employed by existing objects and structures such as GtkEntryBuffer and > PangoLayout - to name two - eliminating the need for extra lines of > redundant code. In fact - as I

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Randall Sawyer
length-aware string object. That's all.
---- Forwarded Message ----
Subject: Re: G_UTF8String: Boxed Type Proposal
Date: Sat, 19 Mar 2016 15:11:23 -0400
From: Randall Sawyer <srandallsaw...@hushmail.me>
To: Emmanuele Bassi <eba...@gmail.com>
On 03/19/2016 02:57

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Florian Müllner
On Fri, Mar 18, 2016 at 2:57 PM Randall Sawyer wrote:
> how about the following modifications?
> Change "gstring.h":
> ...
> struct _GString
> {
>   gchar *str;
>   gsize len;
>   gsize allocated_len;
>   gsize utf8_len;
> };
> ...
Changing the size of a
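
The reply above is cut off, but it concerns the size of the struct. As a hedged illustration only: the sketch below compares a struct with the three existing fields against one with the proposed extra field. The sketch_ type and function names are hypothetical stand-ins using plain C types, not GLib's headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the layout with the three existing fields. */
struct sketch_string_old {
    char  *str;
    size_t len;
    size_t allocated_len;
};

/* Hypothetical stand-in for the layout with the proposed utf8_len added. */
struct sketch_string_new {
    char  *str;
    size_t len;
    size_t allocated_len;
    size_t utf8_len;
};

/* How many bytes the proposed field adds to each instance. */
static size_t sketch_size_growth(void)
{
    return sizeof(struct sketch_string_new) - sizeof(struct sketch_string_old);
}
```

The existing fields keep their offsets, but the object itself grows. Code compiled against the old layout that embeds the struct by value or places it on the stack would reserve too little space for a library that writes the new field.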

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Emmanuele Bassi
Hi; On 19 March 2016 at 18:03, Randall Sawyer wrote: > The concision of "GUString" over "G_UTF8String" reflects the concision of my > thoughts over what they were at the beginning of this thread. Since you've brought it up multiple times, I wanted to ensure you

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Randall Sawyer
On 03/19/2016 01:38 PM, Christian Hergert wrote: On 03/19/2016 06:57 AM, Randall Sawyer wrote: Some object classes - such as GtkEntryBuffer - store this value and update it as text is inserted or deleted. That is efficient. The fact that developers need to write equivalent code for each such

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Matthias Clasen
On Thu, Mar 17, 2016 at 4:09 PM, Jasper St. Pierre wrote: > The major issue is that "Unicode character" doesn't have a good > definition. The most likely definition is a "Unicode code point", > however, Windows uses "Unicode character" to mean a UTF-16 byte > sequence,

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Christian Hergert
On 03/19/2016 06:57 AM, Randall Sawyer wrote: > > Some object classes - such as GtkEntryBuffer - store this value and > update it as text is inserted or deleted. That is efficient. The fact > that developers need to write equivalent code for each such class is > inefficient. A string abstraction
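
To make the quoted idea concrete, here is a minimal sketch of a buffer that keeps a running code point count as text is appended, in the spirit of what the quote attributes to GtkEntryBuffer. It is not GtkEntryBuffer's code: SketchBuffer, its fixed capacity, and the sketch_ functions are all hypothetical, input is assumed to be valid UTF-8, and deletion is left out for brevity.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_CAPACITY 256

/* Hypothetical length-caching text buffer (illustration only). */
typedef struct {
    char   bytes[SKETCH_CAPACITY]; /* NUL-terminated UTF-8 */
    size_t n_bytes;                /* length in bytes, excluding the NUL */
    size_t n_chars;                /* cached length in code points */
} SketchBuffer;

static void sketch_buffer_init(SketchBuffer *b)
{
    b->bytes[0] = '\0';
    b->n_bytes = 0;
    b->n_chars = 0;
}

/* Count code points in the first n bytes of valid UTF-8 by counting
 * bytes that are not continuation bytes (10xxxxxx). */
static size_t sketch_count_code_points(const char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Append valid UTF-8 text. Only the newly added bytes are scanned, so
 * the cached count stays correct without rescanning the whole buffer.
 * Returns 0 on success, -1 if the text does not fit. */
static int sketch_buffer_append(SketchBuffer *b, const char *text)
{
    size_t n = strlen(text);
    if (n >= SKETCH_CAPACITY - b->n_bytes)
        return -1;
    memcpy(b->bytes + b->n_bytes, text, n + 1); /* copies the NUL too */
    b->n_bytes += n;
    b->n_chars += sketch_count_code_points(text, n);
    return 0;
}
```

The cached count is only trustworthy if every modification goes through functions like sketch_buffer_append(); writing to bytes directly would silently invalidate it.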

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Nicolas George
[ Replying a little randomly to this message. ] Randall Sawyer: > 3) Wouldn't it be helpful to keep track of how many code points > ("characters") are stored in the GString - a number which may be less than > the value of GString.len - without needing to call g_utf8_strlen() each time > to find

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Simon McVittie
On 17/03/16 20:29, Matthias Clasen wrote: > Terminology can certainly be confusing at times, but I think that a > Unicode character is a perfectly well-defined entity, notwithstanding > the fact that it can be represented in various encodings (a utf8 > sequence, a ucs4 word, a utf-16 surrogate

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Randall Sawyer
On 03/19/2016 03:41 AM, Errol van de l'Isle wrote: Just to add my two cents worth as a user of glibmm. Glib::ustring uses g_utf8_pointer_to_offset() to obtain the length of the string in characters in the method Glib::ustring::length. The method Glib::ustring::bytes returns the length in bytes;

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Randall Sawyer
On 03/17/2016 09:30 AM, Matthias Clasen wrote:
> Hi Randall, thanks for contributing!
Pleased to be of service! Looking forward to learning how folks work together in this community.
> I believe that you haven't found such a proposal because most people don't see much use in a separate boxed

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Matthias Clasen
On Wed, Mar 16, 2016 at 6:58 PM, Randall Sawyer wrote: > I have a question at the end of this! Please answer if you think it will > help. Hi Randall, thanks for contributing! > > I propose the development of a new boxed type for the Glib API named > "G_UTF8String".

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Errol van de l'Isle
Just to add my two cents worth as a user of glibmm. Glib::ustring uses g_utf8_pointer_to_offset() to obtain the length of the string in characters in the method Glib::ustring::length. The method Glib::ustring::bytes returns the length in bytes. At no point does it store the number of UTF-8

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Jasper St. Pierre
The major issue is that "Unicode character" doesn't have a good definition. The most likely definition is a "Unicode code point", however, Windows uses "Unicode character" to mean a UTF-16 byte sequence, which means that any code point above the Basic Multilingual Plane is really composed of two
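
As an illustration of the UTF-16 point, the sketch below encodes a single code point into 16-bit code units. The function name sketch_utf16_encode is hypothetical; the arithmetic is the standard surrogate-pair scheme. Code points in the Basic Multilingual Plane take one unit, and anything above it takes two.

```c
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit code units written to `out` (1 or 2),
 * or 0 if `cp` is not encodable (a surrogate value or above U+10FFFF). */
static int sketch_utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                    /* BMP: a single unit */
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+00E9 fits in one unit, while U+1F600 becomes the pair 0xD83D 0xDE00. An API that counts "characters" in 16-bit units therefore reports 2 for a single code point above the BMP.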

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Matthias Clasen
On Thu, Mar 17, 2016 at 2:26 PM, Jasper St. Pierre wrote: > I'll also ask what "character" means in this case, even though I know > glib also has the same confusion. Are you talking about the number of > Unicode code points in the string, or the number of grapheme

Re: G_UTF8String: Boxed Type Proposal

2016-03-19 Thread Jasper St. Pierre
I'll also ask what "character" means in this case, even though I know glib also has the same confusion. Are you talking about the number of Unicode code points in the string, or the number of grapheme clusters, as defined by Unicode TR29 [0]? The number of code points isn't useful for editing in

Re: G_UTF8String: Boxed Type Proposal

2016-03-18 Thread Randall Sawyer
On 03/18/2016 10:10 AM, Florian Müllner wrote: On Fri, Mar 18, 2016 at 2:57 PM Randall Sawyer wrote: how about the following modifications? Change "gstring.h": ... struct _GString { gchar *str;

Re: G_UTF8String: Boxed Type Proposal

2016-03-18 Thread Randall Sawyer
On 03/17/2016 02:26 PM, Jasper St. Pierre wrote: I'll also ask what "character" means in this case, even though I know glib also has the same confusion. Are you talking about the number of Unicode code points in the string, or the number of grapheme clusters, as defined by Unicode TR29 [0]? The

Re: G_UTF8String: Boxed Type Proposal

2016-03-18 Thread Randall Sawyer
On 03/17/2016 07:23 PM, Matthias Clasen wrote: Sure, code point works too. Anyway, enough with the ontology, we're not a standards body. I still don't think that we need a utf8-string datatype. I have questions, then. Here are excerpts from the current master files: "gstring.h" ... struct

Re: G_UTF8String: Boxed Type Proposal

2016-03-18 Thread Chris Vine
On Fri, 18 Mar 2016 10:19:08 -0400 Randall Sawyer wrote: > Also - I just discovered that glibmm has a class Glib::ustring > (https://developer.gnome.org/glibmm/stable/classGlib_1_1ustring.html). > I am going to take a look through its source to see what they have >

Re: G_UTF8String: Boxed Type Proposal

2016-03-18 Thread Randall Sawyer
On 03/17/2016 10:39 AM, Randall Sawyer wrote: On 03/17/2016 09:30 AM, Matthias Clasen wrote: I believe that you haven't found such a proposal because most people don't see much use in a separate boxed type for utf8 strings. Every string we pass around in GLib and GTK+, and every char * in