Re: [Pharo-dev] Unicode Support

2015-12-13 Thread stepharo
I am pretty sure that this whole discussion does more harm than good for most people's understanding of Unicode. It is best and (mostly) correct to think of a Unicode string as a sequence of Unicode characters, each defined/identified by a code point (out of tens of thousands covering all languages).

Re: [Pharo-dev] Unicode Support

2015-12-11 Thread Richard Sargent
EuanM wrote > ... > all ISO-8859-1 maps 1:1 to Unicode UTF-8 > ... I am late coming in to this conversation. If it hasn't already been said, please do not conflate Unicode and UTF-8. I think that would be a recipe for a high P.I.T.A. factor. Unicode defines the meaning of the code

Re: [Pharo-dev] Unicode Support

2015-12-11 Thread Eliot Miranda
Hi Todd, > On Dec 11, 2015, at 12:57 PM, Todd Blanchard wrote: > > >> On Dec 11, 2015, at 12:19, EuanM wrote: >> >> "If it hasn't already been said, please do not conflate Unicode and >> UTF-8. I think that would be a recipe for >> a high P.I.T.A.

Re: [Pharo-dev] Unicode Support // e acute example --> decomposition in Pharo?

2015-12-10 Thread H. Hirzel
Hello Sven On 12/9/15, Sven Van Caekenberghe wrote: > The simplest example in a common language is (the French letter é) is > > LATIN SMALL LETTER E WITH ACUTE [U+00E9] > > which can also be written as > > LATIN SMALL LETTER E [U+0065] followed by COMBINING ACUTE ACCENT [U+0301] >
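The two spellings of é quoted above can be compared directly. The following is a minimal sketch in Python rather than Pharo (an assumption made for illustration only; it says nothing about what Pharo offered at the time), using the standard `unicodedata` module to show that the precomposed and decomposed forms differ as code point sequences but become equal after normalization.

```python
import unicodedata

# LATIN SMALL LETTER E WITH ACUTE, a single code point
precomposed = "\u00e9"
# LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT, two code points
decomposed = "e\u0301"

# Same visible letter, different code point sequences
same_without_normalizing = precomposed == decomposed
print(same_without_normalizing)
print(len(precomposed), len(decomposed))

# NFC composes, NFD decomposes; after normalizing, the two forms compare equal
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)
```

Comparing strings only after bringing both to the same normalization form is the usual way to avoid treating these two spellings as different text.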

Re: [Pharo-dev] Unicode Support

2015-12-10 Thread Ben Coman
On Wed, Dec 9, 2015 at 5:35 PM, Guillermo Polito wrote: > >> On 8 dic 2015, at 10:07 p.m., EuanM wrote: >> >> "No. a codepoint is the numerical value assigned to a character. An >> "encoded character" is the way a codepoint is represented in bytes >>

Re: [Pharo-dev] Unicode Support

2015-12-09 Thread Sven Van Caekenberghe
> On 09 Dec 2015, at 10:35, Guillermo Polito wrote: > > >> On 8 dic 2015, at 10:07 p.m., EuanM wrote: >> >> "No. a codepoint is the numerical value assigned to a character. An >> "encoded character" is the way a codepoint is represented in bytes

Re: [Pharo-dev] Unicode Support

2015-12-09 Thread Sven Van Caekenberghe
> On 09 Dec 2015, at 14:16, EuanM wrote: > > "To encode Unicode for external representation as bytes, we use UTF-8 > like the rest of the modern world. > > So far, so good. > > Why all the confusion ?" That was a rhetorical question. I know that we lack normalization, we

Re: [Pharo-dev] Unicode Support

2015-12-08 Thread EuanM
"No. a codepoint is the numerical value assigned to a character. An "encoded character" is the way a codepoint is represented in bytes using a given encoding." No. A codepoint may represent a component part of an abstract character, or may represent an abstract character, or it may do both (but

Re: [Pharo-dev] Unicode Support

2015-12-07 Thread Sven Van Caekenberghe
I am sorry but one of your basic assumptions is completely wrong: 'Les élèves Français' encodeWith: #iso88591. => #[76 101 115 32 233 108 232 118 101 115 32 70 114 97 110 231 97 105 115] 'Les élèves Français' utf8Encoded. => #[76 101 115 32 195 169 108 195 168 118 101 115 32 70 114 97 110
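The same comparison can be reproduced outside Pharo. This is a small sketch in Python (chosen here only for illustration, not taken from the thread) that encodes the same string as ISO-8859-1 and as UTF-8; the variable names are hypothetical.

```python
text = "Les élèves Français"

# ISO-8859-1: every character in this string fits in exactly one byte
latin1_bytes = text.encode("iso-8859-1")
# UTF-8: ASCII characters take one byte, the accented letters take two
utf8_bytes = text.encode("utf-8")

print(list(latin1_bytes))
print(list(utf8_bytes))

# The byte sequences differ as soon as a non-ASCII character appears
print(len(latin1_bytes), len(utf8_bytes))
```

The ISO-8859-1 byte for é (233) equals its Unicode code point, while UTF-8 represents that code point as the two bytes 195 169, so the two encodings agree only on the ASCII range.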

Re: [Pharo-dev] Unicode Support

2015-12-07 Thread Henrik Johansen
> On 07 Dec 2015, at 11:51 , EuanM wrote: > > And indeed, in principle. > > On 7 December 2015 at 10:51, EuanM wrote: >> Verifying assumptions is the key reason why you should send documents like >> this out for review. >> >> Sven - >> >> I'm confident I

Re: [Pharo-dev] Unicode Support

2015-12-07 Thread EuanM
And indeed, in principle. On 7 December 2015 at 10:51, EuanM wrote: > Verifying assumptions is the key reason why you should send documents like > this out for review. > > Sven - > > Cuis is encoded with ISO 8859-15 (aka ISO Latin 9) > > Sven, this is *NOT* as you state, ISO

Re: [Pharo-dev] Unicode Support

2015-12-07 Thread Henrik Johansen
> On 07 Dec 2015, at 1:05 , EuanM wrote: > > Hi Henry, > > To be honest, at some point I'm going to long for the much > more succinct semantics of healthcare systems and sports scoring and > administration systems again. :-) > > codepoints are any of *either* > -

Re: [Pharo-dev] Unicode Support

2015-12-06 Thread Sven Van Caekenberghe
> On 05 Dec 2015, at 17:35, Todd Blanchard wrote: > > would suggest that the only worthwhile encoding is UTF8 - the rest are > distractions except for being able to read and convert from other encodings > to UTF8. UTF16 is a complete waste of time. > > Read

Re: [Pharo-dev] Unicode Support

2015-12-06 Thread EuanM
Thanks for those pointers, Steph. I'll make sure they are on my reading list. (I have a limited weekly time-budget for Unicode work, but I expect this is a long-term project). I'll keep in touch with Steph, so any new facilities can be immediately useful to Pharo, and someone can guide them to

Re: [Pharo-dev] Unicode Support

2015-12-06 Thread EuanM
Todd, As long as others are using it, it's useful to be able to send UTF16, and to successfully import it. I like systems that play well with others. :-) On 5 December 2015 at 16:35, Todd Blanchard wrote: > would suggest that the only worthwhile encoding is UTF8 - the rest

Re: [Pharo-dev] Unicode Support

2015-12-06 Thread EuanM
Steph - I'll dig out the Fr phone book ordering from wherever it was I read about it! I thought I had it to hand, but I haven't found it tonight. It can't be far away. On 5 December 2015 at 13:08, stepharo wrote: > Hi EuanM > > Le 4/12/15 12:42, EuanM a écrit : >> >> I'm

Re: [Pharo-dev] Unicode Support

2015-12-06 Thread Max Leske
> On 06 Dec 2015, at 18:44, Sven Van Caekenberghe wrote: > > >> On 05 Dec 2015, at 17:35, Todd Blanchard wrote: >> >> would suggest that the only worthwhile encoding is UTF8 - the rest are >> distractions except for being able to read and convert from other

Re: [Pharo-dev] Unicode Support

2015-12-05 Thread stepharo
Hi EuanM Le 4/12/15 12:42, EuanM a écrit : I'm currently groping my way to seeing how feature-complete our Unicode support is. I am doing this to establish what still needs to be done to provide full Unicode support. this is great. Thanks for pushing this. I wrote and collected some roadmap

Re: [Pharo-dev] Unicode Support

2015-12-05 Thread stepharo
Hi Todd, thanks for the link. It looks really interesting. Stef Le 5/12/15 17:35, Todd Blanchard a écrit : would suggest that the only worthwhile encoding is UTF8 - the rest are distractions except for being able to read and convert from other encodings to UTF8. UTF16 is a complete waste of

Re: [Pharo-dev] Unicode Support

2015-12-05 Thread Todd Blanchard
Sent from the road > On Dec 5, 2015, at 05:08, stepharo wrote: > > Hi EuanM > > Le 4/12/15 12:42, EuanM a écrit : >> I'm currently groping my way to seeing how feature-complete our >> Unicode support is. I am doing this to establish what still needs to >> be done to

Re: [Pharo-dev] Unicode Support

2015-12-05 Thread Todd Blanchard
would suggest that the only worthwhile encoding is UTF8 - the rest are distractions except for being able to read and convert from other encodings to UTF8. UTF16 is a complete waste of time. Read http://utf8everywhere.org/ I have extensive Unicode chops from around 1999 to 2004 and my

Re: [Pharo-dev] Unicode Support

2015-12-04 Thread Max Leske
Hi Euan I think it’s great that you’re trying this. I hope you know what you’re getting yourself into :) I’m no Unicode expert but I want to add two points to your list (although you’ve probably already thought of them): - Normalisation and conversion

Re: [Pharo-dev] Unicode Support

2015-12-04 Thread Sven Van Caekenberghe
> On 04 Dec 2015, at 17:00, Max Leske wrote: > > Hi Euan > > I think it’s great that you’re trying this. I hope you know what you’re > getting yourself into :) > > > I’m no Unicode expert but I want to add two points to your list (although > you’ve probably already