Re: How to improve autodecoding? (Was: The Case Against Autodecode)

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/30/2016 03:00 PM, Jack Stouffer wrote: On Monday, 30 May 2016 at 18:24:23 UTC, Andrei Alexandrescu wrote: That kind of makes this thread less productive than "How to improve autodecoding?" -- Andrei Please don't misunderstand, I'm for fixing string behavior. Surely the

Re: The Case Against Autodecode

2016-05-30 Thread Timon Gehr via Digitalmars-d
On 30.05.2016 18:01, Andrei Alexandrescu wrote: On 05/28/2016 03:04 PM, Walter Bright wrote: On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote: So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. An array of code units provides consistency,

Re: How to improve autodecoding? (Was: The Case Against Autodecode)

2016-05-30 Thread Jack Stouffer via Digitalmars-d
On Monday, 30 May 2016 at 18:24:23 UTC, Andrei Alexandrescu wrote: That kind of makes this thread less productive than "How to improve autodecoding?" -- Andrei Please don't misunderstand, I'm for fixing string behavior. But, let's not pretend that this wouldn't be one of the (if not the)

Re: The Case Against Autodecode

2016-05-30 Thread Adam D. Ruppe via Digitalmars-d
On Monday, 30 May 2016 at 16:03:03 UTC, Marco Leise wrote: When on the other hand you work with real world international text, you'll want to work with graphemes. Actually, my main rule of thumb is: don't mess with strings. Get them from the user, store them without modification, spit them

Re: The Case Against Autodecode

2016-05-30 Thread Adam D. Ruppe via Digitalmars-d
On Monday, 30 May 2016 at 17:14:47 UTC, Andrew Godfrey wrote: I like "make string iteration explicit" but I wonder about other constructs. E.g. What about "sort an array of strings"? How would you tell a generic sort function whether you want it to interpret strings by code unit vs code point

Re: How to improve autodecoding? (Was: The Case Against Autodecode)

2016-05-30 Thread Dmitry Olshansky via Digitalmars-d
On 30-May-2016 21:24, Andrei Alexandrescu wrote: On 05/30/2016 12:34 PM, Jack Stouffer wrote: On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote: D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be. Don't be so sure. All string handling code

Re: The Case Against Autodecode

2016-05-30 Thread Adam D. Ruppe via Digitalmars-d
On Monday, 30 May 2016 at 14:35:03 UTC, Seb wrote: That's a great idea - the compiler should also issue deprecation warnings when I try to do things like: I don't agree on changing those. Indexing and slicing a char[] is really useful and actually not hard to do correctly (at least with

How to improve autodecoding? (Was: The Case Against Autodecode)

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/30/2016 12:34 PM, Jack Stouffer wrote: On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote: D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be. Don't be so sure. All string handling code would become broken, even if it appears to work at

Re: The Case Against Autodecode

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/30/2016 12:25 PM, Nick Sabalausky wrote: On 05/29/2016 09:58 PM, Jack Stouffer wrote: The problem is not active users. The problem is companies who have > 10K LOC and libraries that are no longer maintained. E.g. It took Sociomantic eight years after D2's release to switch only a few

Re: The Case Against Autodecode

2016-05-30 Thread Chris via Digitalmars-d
On Monday, 30 May 2016 at 16:03:03 UTC, Marco Leise wrote: *** http://site.icu-project.org/home#TOC-What-is-ICU- I was actually talking about ICU with a colleague today. Could it be that Unicode itself is broken? I've often heard criticism of Unicode but never looked into it.

Re: The Case Against Autodecode

2016-05-30 Thread Andrew Godfrey via Digitalmars-d
I like "make string iteration explicit" but I wonder about other constructs. E.g. What about "sort an array of strings"? How would you tell a generic sort function whether you want it to interpret strings by code unit vs code point vs grapheme?
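One way to see the choice Andrew describes: in D, the default `sort` predicate, `a < b`, compares strings by UTF-8 code unit, and a caller picks a different interpretation by passing a predicate. The sketch below assumes a current Phobos. The word list is illustrative, and the NFC-based predicate is one possible choice, not a proposal made in the thread.

```d
import std.algorithm.sorting : sort;
import std.uni : normalize, NFC;

void main()
{
    // Two spellings of "éclair": precomposed U+00E9, and e + U+0301
    string[] words = ["zebra", "\u00E9clair", "apple", "e\u0301clair", "eclair"];

    // Default predicate (a < b): lexicographic by UTF-8 code unit, which for
    // valid UTF-8 matches code point order. The two spellings end up far apart.
    sort(words);
    assert(words == ["apple", "eclair", "e\u0301clair", "zebra", "\u00E9clair"]);

    // The caller chooses the interpretation via the predicate. Comparing
    // NFC-normalized forms makes canonically equivalent spellings sort together.
    // This is still code point order, not locale-aware collation.
    sort!((a, b) => normalize!NFC(a) < normalize!NFC(b))(words);
    assert(words[0 .. 3] == ["apple", "eclair", "zebra"]);
    assert(normalize!NFC(words[3]) == normalize!NFC(words[4]));
}
```

Neither ordering is what a dictionary would use. Human-oriented collation would need yet another predicate.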

Re: The Case Against Autodecode

2016-05-30 Thread Jack Stouffer via Digitalmars-d
On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote: D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be. Don't be so sure. All string handling code would become broken, even if it appears to work at first.

Re: The Case Against Autodecode

2016-05-30 Thread Marco Leise via Digitalmars-d
On Thu, 26 May 2016 16:23:16 -0700, "H. S. Teoh via Digitalmars-d" wrote: > On Thu, May 26, 2016 at 12:00:54PM -0400, Andrei Alexandrescu via > Digitalmars-d wrote: > [...] > > s.walkLength > > s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation > >

Re: The Case Against Autodecode

2016-05-30 Thread Nick Sabalausky via Digitalmars-d
On 05/29/2016 09:58 PM, Jack Stouffer wrote: The problem is not active users. The problem is companies who have > 10K LOC and libraries that are no longer maintained. E.g. It took Sociomantic eight years after D2's release to switch only a few parts of their projects to D2. With the loss of old

Re: The Case Against Autodecode

2016-05-30 Thread Marco Leise via Digitalmars-d
On Mon, 30 May 2016 09:26:09 +, Chris wrote: > If it's true that auto decode is unnecessary in many cases, then > it shouldn't affect the whole code base. But I might be mistaken > here. Maybe we should make a list of the functions where auto > decode does make a

Re: The Case Against Autodecode

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/28/2016 03:04 PM, Walter Bright wrote: On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote: So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. An array of code units provides consistency, predictability, flexibility, and performance.

Re: The Case Against Autodecode

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/29/2016 04:47 PM, H. S. Teoh via Digitalmars-d wrote: It depends on what you're trying to accomplish. That's the point we're trying to get at. For some operations, working with code points makes the most sense. But for other operations, it does not. There is no one representation that is

Re: The Case Against Autodecode

2016-05-30 Thread Marc Schütz via Digitalmars-d
On Monday, 30 May 2016 at 14:56:36 UTC, ag0aep6g wrote: All this is only sensible when we move to a dedicated string type that's not just an alias of `immutable(char)[]`. `immutable(char)[]` explicitly is an array of code units. It would not be acceptable, in my opinion, if the normal array

Re: The Case Against Autodecode

2016-05-30 Thread ag0aep6g via Digitalmars-d
On 05/30/2016 04:35 PM, Seb wrote: That's a great idea - the compiler should also issue deprecation warnings when I try to do things like: string a = "你好"; a[1]; // deprecation: direct access to a Unicode string is highly error-prone. Please specify the type of access. More details
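In D today, indexing a `string` returns a single UTF-8 code unit, while the range primitive `front` auto-decodes a full code point. The sketch below uses Seb's example string and assumes a current Phobos (`std.range.primitives`, `std.utf.byCodeUnit`). The byte value checked follows from the UTF-8 encoding of U+4F60, which is E4 BD A0.

```d
import std.range.primitives : front, walkLength;
import std.utf : byCodeUnit;

void main()
{
    string a = "你好"; // U+4F60 U+597D, three UTF-8 code units each

    // Indexing yields a code unit: a[1] is the second byte of 你 (E4 BD A0)
    static assert(is(typeof(a[1]) == immutable(char)));
    assert(a[1] == 0xBD);

    assert(a.length == 6);                // UTF-8 code units
    assert(a.byCodeUnit.walkLength == 6); // explicit code-unit iteration
    assert(a.walkLength == 2);            // autodecoded: code points

    // The range primitive front autodecodes and returns a dchar
    static assert(is(typeof(a.front) == dchar));
    assert(a.front == '\u4F60');

    // Slicing at a code point boundary is cheap and correct
    assert(a[0 .. 3] == "你");
    assert(a[3 .. $] == "好");
}
```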

Re: The Case Against Autodecode

2016-05-30 Thread Seb via Digitalmars-d
On Monday, 30 May 2016 at 12:59:08 UTC, Adam D. Ruppe wrote: On Monday, 30 May 2016 at 12:45:27 UTC, Andrei Alexandrescu wrote: That's... what I said. -- Andrei You said "not arrays", he said "not ranges". So that just means making the std.range.primitives.popFront and front add a

Re: The Case Against Autodecode

2016-05-30 Thread Adam D. Ruppe via Digitalmars-d
On Monday, 30 May 2016 at 12:45:27 UTC, Andrei Alexandrescu wrote: That's... what I said. -- Andrei You said "not arrays", he said "not ranges". So that just means making the std.range.primitives.popFront and front add a constraint if(!isSomeString()). Language built-ins still work, but

Re: The Case Against Autodecode

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 05/30/2016 07:58 AM, Marc Schütz wrote: On Saturday, 28 May 2016 at 12:04:20 UTC, Andrei Alexandrescu wrote: On 5/28/16 6:59 AM, Marc Schütz wrote: The fundamental problem is choosing one of those possibilities over the others without knowing what the user actually wants, which is what both

Re: The Case Against Autodecode

2016-05-30 Thread Marc Schütz via Digitalmars-d
On Saturday, 28 May 2016 at 12:04:20 UTC, Andrei Alexandrescu wrote: On 5/28/16 6:59 AM, Marc Schütz wrote: The fundamental problem is choosing one of those possibilities over the others without knowing what the user actually wants, which is what both BEFORE and AFTER do. OK, that's a fair

Re: The Case Against Autodecode

2016-05-30 Thread Chris via Digitalmars-d
On Sunday, 29 May 2016 at 17:35:35 UTC, Nick Sabalausky wrote: On 05/12/2016 08:47 PM, Jack Stouffer wrote: As much as I agree on the importance of a good smooth migration path, I don't think the "Python 2 vs 3" situation is really all that comparable here. Unlike Python, we wouldn't be

Re: The Case Against Autodecode

2016-05-29 Thread Walter Bright via Digitalmars-d
On 5/29/2016 5:56 PM, H. S. Teoh via Digitalmars-d wrote: As far as Unicode is concerned, it is a standard for representing *written* text, not spoken language, so concepts like phonemes aren't even relevant in the first place. Let's not get derailed from the present discussion by confusing the

Re: The Case Against Autodecode

2016-05-29 Thread Jack Stouffer via Digitalmars-d
On Sunday, 29 May 2016 at 17:35:35 UTC, Nick Sabalausky wrote: Unlike Python, we wouldn't be maintaining a "with auto-decoding" fork for years and years and years, ensuring nobody ever had a pressing reason to bother migrating. If it happens, they better. The D1 fork was maintained for almost

Re: The Case Against Autodecode

2016-05-29 Thread H. S. Teoh via Digitalmars-d
On Sun, May 29, 2016 at 01:13:36PM +, Tobias M via Digitalmars-d wrote: > On Sunday, 29 May 2016 at 12:41:50 UTC, Chris wrote: > > Ok, you have a point there, to be precise is a multigraph (a > > digraph)(cf. [1]). In French you can have multigraphs consisting of > > three or more characters

Re: The Case Against Autodecode

2016-05-29 Thread Walter Bright via Digitalmars-d
On 5/29/2016 4:47 AM, Tobias Müller wrote: No, this is well established terminology, you are confusing several things here: For D, we should stick with the terminology as defined by Unicode.

Re: The Case Against Autodecode

2016-05-29 Thread Martin Nowak via Digitalmars-d
On 05/12/2016 10:15 PM, Walter Bright wrote: > On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: >> I am as unclear about the problems of autodecoding as I am about the > necessity >> to remove curl. Whenever I ask I hear some arguments that work well > emotionally >> but are scant on reason and

Re: The Case Against Autodecode

2016-05-29 Thread H. S. Teoh via Digitalmars-d
On Sun, May 29, 2016 at 03:55:22PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 05/29/2016 09:42 AM, Tobias M wrote: > > On Friday, 27 May 2016 at 19:43:16 UTC, H. S. Teoh wrote: > > > On Fri, May 27, 2016 at 03:30:53PM -0400, Andrei Alexandrescu via > > > Digitalmars-d wrote: > > > >

Re: The Case Against Autodecode

2016-05-29 Thread Andrei Alexandrescu via Digitalmars-d
On 05/29/2016 09:42 AM, Tobias M wrote: On Friday, 27 May 2016 at 19:43:16 UTC, H. S. Teoh wrote: On Fri, May 27, 2016 at 03:30:53PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: On 5/27/16 3:10 PM, ag0aep6g wrote: > I don't think there is value in distinguishing by language. > The point

Re: The Case Against Autodecode

2016-05-29 Thread Nick Sabalausky via Digitalmars-d
On 05/12/2016 08:47 PM, Jack Stouffer wrote: If you're serious about removing auto-decoding, which I think you and others have shown has merits, you have to have THE SIMPLEST migration path ever, or you will kill D. I'm talking a simple press of a button. I'm not exaggerating here. Python, a

Re: The Case Against Autodecode

2016-05-29 Thread Chris via Digitalmars-d
On Sunday, 29 May 2016 at 13:04:18 UTC, Tobias M wrote: On Sunday, 29 May 2016 at 12:08:52 UTC, default0 wrote: I am pretty sure that a single grapheme in unicode does not correspond to your notion of "character". I am pretty sure that what you think of as a "character" is officially called

Re: The Case Against Autodecode

2016-05-29 Thread Tobias M via Digitalmars-d
On Friday, 27 May 2016 at 19:43:16 UTC, H. S. Teoh wrote: On Fri, May 27, 2016 at 03:30:53PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: On 5/27/16 3:10 PM, ag0aep6g wrote: > I don't think there is value in distinguishing by language. > The point of Unicode is that you shouldn't need

Re: The Case Against Autodecode

2016-05-29 Thread Tobias M via Digitalmars-d
On Sunday, 29 May 2016 at 12:41:50 UTC, Chris wrote: Ok, you have a point there, to be precise is a multigraph (a digraph)(cf. [1]). In French you can have multigraphs consisting of three or more characters /o/, as in Irish => /i:/. However, a phoneme is not necessarily a spoken

Re: The Case Against Autodecode

2016-05-29 Thread Tobias M via Digitalmars-d
On Sunday, 29 May 2016 at 12:08:52 UTC, default0 wrote: I am pretty sure that a single grapheme in unicode does not correspond to your notion of "character". I am pretty sure that what you think of as a "character" is officially called "Grapheme Cluster" not "Grapheme". Grapheme is a

Re: The Case Against Autodecode

2016-05-29 Thread Chris via Digitalmars-d
On Sunday, 29 May 2016 at 11:47:30 UTC, Tobias Müller wrote: On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote: Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two

Re: The Case Against Autodecode

2016-05-29 Thread default0 via Digitalmars-d
On Sunday, 29 May 2016 at 11:47:30 UTC, Tobias Müller wrote: On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote: Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two

Re: The Case Against Autodecode

2016-05-29 Thread Tobias Müller via Digitalmars-d
On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote: Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two distinct characters. However, in natural languages two characters

Re: The Case Against Autodecode

2016-05-29 Thread Chris via Digitalmars-d
On Saturday, 28 May 2016 at 22:29:12 UTC, Andrew Godfrey wrote: [snip] From all the detail in this thread, I wonder now if "a grapheme" is even an unambiguous concept across different environments. Unicode graphemes are not always the same as graphemes in natural (written) languages. If

Re: The Case Against Autodecode

2016-05-29 Thread Dicebot via Digitalmars-d
On 05/28/2016 03:04 PM, Andrei Alexandrescu wrote: > On 5/28/16 6:59 AM, Marc Schütz wrote: >> The fundamental problem is choosing one of those possibilities over the >> others without knowing what the user actually wants, which is what both >> BEFORE and AFTER do. > > OK, that's a fair argument,

Re: The Case Against Autodecode

2016-05-28 Thread Jack Stouffer via Digitalmars-d
On Saturday, 28 May 2016 at 12:04:20 UTC, Andrei Alexandrescu wrote: OK, that's a fair argument, thanks. So it seems there should be no "default" way to iterate a string Yes! So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. If you're

Re: The Case Against Autodecode

2016-05-28 Thread Andrew Godfrey via Digitalmars-d
On Saturday, 28 May 2016 at 19:04:14 UTC, Walter Bright wrote: On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote: So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. An array of code units provides consistency, predictability, flexibility,

Re: The Case Against Autodecode

2016-05-28 Thread Walter Bright via Digitalmars-d
On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote: So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. An array of code units provides consistency, predictability, flexibility, and performance. It's a solid base upon which the programmer can

Re: The Case Against Autodecode

2016-05-28 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 18:11:22 UTC, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei No, I've tried it. I think dchar[] returns one or you check by grapheme.

Re: The Case Against Autodecode

2016-05-28 Thread Andrei Alexandrescu via Digitalmars-d
On 5/28/16 6:59 AM, Marc Schütz wrote: The fundamental problem is choosing one of those possibilities over the others without knowing what the user actually wants, which is what both BEFORE and AFTER do. OK, that's a fair argument, thanks. So it seems there should be no "default" way to

Re: The Case Against Autodecode

2016-05-28 Thread Marc Schütz via Digitalmars-d
On Friday, 27 May 2016 at 13:34:33 UTC, Andrei Alexandrescu wrote: On 5/27/16 6:56 AM, Marc Schütz wrote: It is not, which has been shown by various posts in this thread. Couldn't quite find strong arguments. Could you please be more explicit on which you found most convincing? -- Andrei

Re: The Case Against Autodecode

2016-05-28 Thread Dmitry Olshansky via Digitalmars-d
On 28-May-2016 01:04, tsbockman wrote: On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: No, this is not the point of normalization. What is? -- Andrei 1) A grapheme may include several combining characters (such as

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 04:41:09PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 05/27/2016 03:43 PM, H. S. Teoh via Digitalmars-d wrote: > > That's what we've been trying to say all along! > > If that's the case things are pretty dire, autodecoding or not. -- > Andrei Like it or

Re: The Case Against Autodecode

2016-05-27 Thread Walter Bright via Digitalmars-d
On 5/27/2016 11:27 AM, Andrei Alexandrescu wrote: On 5/27/16 1:11 PM, Walter Bright wrote: They mean code units. Always valid or potentially invalid as well? -- Andrei Some years ago I would have said always valid. Experience, however, says that Unicode is often dirty and code should be

Re: The Case Against Autodecode

2016-05-27 Thread David Nadlinger via Digitalmars-d
On Friday, 27 May 2016 at 22:12:57 UTC, Minas Mina wrote: Those should be the same though, i.e. compare the same. In order to do that, there is normalization. What it does is to _expand_ the single codepoint Ä into A + ¨ Unless I'm mistaken, this depends on the form used. For example, in NFKC
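The sketch below applies the two canonical forms to Ä. It assumes `std.uni.normalize` with the `NFC` and `NFD` forms. `NFKC` and `NFKD` also exist, for compatibility decomposition. `canonicallyEqual` is a hypothetical helper. Checking two strings for canonical equivalence means normalizing both sides to the same form first.

```d
import std.uni : normalize, NFC, NFD;

/// Hypothetical helper: compare after normalizing both sides to NFD.
bool canonicallyEqual(string a, string b)
{
    return normalize!NFD(a) == normalize!NFD(b);
}

void main()
{
    string precomposed = "\u00C4"; // Ä, U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
    string combining = "A\u0308";  // A + U+0308 COMBINING DIAERESIS

    assert(precomposed != combining);                // different code units, same text
    assert(normalize!NFD(precomposed) == combining); // NFD expands
    assert(normalize!NFC(combining) == precomposed); // NFC composes
    assert(canonicallyEqual(precomposed, combining));
}
```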

Re: The Case Against Autodecode

2016-05-27 Thread Minas Mina via Digitalmars-d
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make

Re: The Case Against Autodecode

2016-05-27 Thread tsbockman via Digitalmars-d
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: No, this is not the point of normalization. What is? -- Andrei 1) A grapheme may include several combining characters (such as diacritics) whose order is not supposed to be

Re: The Case Against Autodecode

2016-05-27 Thread Minas Mina via Digitalmars-d
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei No, this is not the point of normalization.

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 05/27/2016 03:43 PM, H. S. Teoh via Digitalmars-d wrote: That's what we've been trying to say all along! If that's the case things are pretty dire, autodecoding or not. -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 07:53:30PM +, Adam D. Ruppe via Digitalmars-d wrote: > On Friday, 27 May 2016 at 19:30:53 UTC, Andrei Alexandrescu wrote: > > It seems code points are kind of useless because they don't really > > mean anything, would that be accurate? -- Andrei > > It might help to

Re: The Case Against Autodecode

2016-05-27 Thread Steven Schveighoffer via Digitalmars-d
On 5/27/16 3:30 PM, Andrei Alexandrescu wrote: On 5/27/16 3:10 PM, ag0aep6g wrote: I don't think there is value in distinguishing by language. The point of Unicode is that you shouldn't need to do that. It seems code points are kind of useless because they don't really mean anything, would

Re: The Case Against Autodecode

2016-05-27 Thread Adam D. Ruppe via Digitalmars-d
On Friday, 27 May 2016 at 19:30:53 UTC, Andrei Alexandrescu wrote: It seems code points are kind of useless because they don't really mean anything, would that be accurate? -- Andrei It might help to think of code points as being a kind of byte code for a text-representing VM. It's not

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 03:30:53PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 5/27/16 3:10 PM, ag0aep6g wrote: > > I don't think there is value in distinguishing by language. The > > point of Unicode is that you shouldn't need to do that. > > It seems code points are kind of

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 02:42:27PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 5/27/16 12:40 PM, H. S. Teoh via Digitalmars-d wrote: > > Exactly. And we just keep getting stuck on this point. It seems that > > the message just isn't getting through. The unfounded assumption > >

Re: The Case Against Autodecode

2016-05-27 Thread ag0aep6g via Digitalmars-d
On 05/27/2016 09:30 PM, Andrei Alexandrescu wrote: It seems code points are kind of useless because they don't really mean anything, would that be accurate? -- Andrei I think so, yeah. Due to combining characters, code points are similar to code units: a Unicode thing that you need to know

Re: The Case Against Autodecode

2016-05-27 Thread Dmitry Olshansky via Digitalmars-d
On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei No, this is not the point of normalization. -- Dmitry Olshansky

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 1:11 PM, Walter Bright wrote: The std.string algorithms I wrote all work much better (i.e. faster) without autodecoding, while maintaining proper Unicode support. Violent agreement is occurring here. We have plenty of those and need more. -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 3:10 PM, ag0aep6g wrote: I don't think there is value in distinguishing by language. The point of Unicode is that you shouldn't need to do that. It seems code points are kind of useless because they don't really mean anything, would that be accurate? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread ag0aep6g via Digitalmars-d
On 05/27/2016 08:42 PM, Andrei Alexandrescu wrote: Which languages are covered by code points, and which languages require graphemes consisting of multiple code points? How does normalization play into this? -- Andrei I don't think there is value in distinguishing by language. The point of

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 12:40 PM, H. S. Teoh via Digitalmars-d wrote: Exactly. And we just keep getting stuck on this point. It seems that the message just isn't getting through. The unfounded assumption continues to be made that iterating by code point is somehow "correct" by definition and nobody can

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 1:11 PM, Walter Bright wrote: They mean code units. Always valid or potentially invalid as well? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Adam D. Ruppe via Digitalmars-d
On Friday, 27 May 2016 at 18:11:22 UTC, Andrei Alexandrescu wrote: Would normalization make length 1? -- Andrei In some, but not all cases.
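The length question has three possible answers. `.length` counts UTF-8 code units, `walkLength` on a `string` counts code points (because of autodecoding), and `std.uni.byGrapheme` counts graphemes. The sketch below assumes a current Phobos. The helper names `codeUnits`, `codePoints`, and `graphemes` are illustrative, not library functions. It shows why NFC normalization reduces the code point count to 1 in some cases but not all, and why it never brings the UTF-8 length of "é" below 2.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme, normalize, NFC;

// Illustrative helpers, not Phobos functions
size_t codeUnits(string s) { return s.length; }
size_t codePoints(string s) { return s.walkLength; }               // autodecodes
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    string composed = "\u00E9";    // é as the single code point U+00E9
    string decomposed = "e\u0301"; // e + U+0301 COMBINING ACUTE ACCENT

    assert(codeUnits(composed) == 2 && codePoints(composed) == 1 && graphemes(composed) == 1);
    assert(codeUnits(decomposed) == 3 && codePoints(decomposed) == 2 && graphemes(decomposed) == 1);

    // NFC composes e + U+0301 into U+00E9: one code point, still two code units
    string nfc = normalize!NFC(decomposed);
    assert(nfc == composed);
    assert(codePoints(nfc) == 1 && codeUnits(nfc) == 2);

    // g + U+0308 has no precomposed form, so NFC leaves two code points
    string gDiaeresis = "g\u0308";
    assert(codePoints(normalize!NFC(gDiaeresis)) == 2);
    assert(graphemes(gDiaeresis) == 1);
}
```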

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Walter Bright via Digitalmars-d
On 5/26/2016 9:00 AM, Andrei Alexandrescu wrote: My thesis: the D1 design decision to represent strings as char[] was disastrous and probably one of the largest weaknesses of D1. The decision in D2 to use immutable(char)[] for strings is a vast improvement but still has a number of issues. The

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 03:47:32PM +0200, ag0aep6g via Digitalmars-d wrote: > On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote: > > > > However the following do require autodecoding: > > > > > > > > s.walkLength > > > > s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation > > > > s.count!(c

Re: The Case Against Autodecode

2016-05-27 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 13:47:32 UTC, ag0aep6g wrote: Misunderstanding. All examples work properly today because of autodecoding. -- Andrei They only work "properly" if you define "properly" as "in terms of code points". But working in terms of code points is usually wrong. If you want to

Re: The Case Against Autodecode

2016-05-27 Thread ag0aep6g via Digitalmars-d
On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote: However the following do require autodecoding:

    s.walkLength
    s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
    s.count!(c => c >= 32) // non-control characters

Currently the standard library operates at code point level even though
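The sketch below compares code-unit, code-point, and grapheme counting for the `walkLength` and `count!(c => c >= 32)` idioms quoted above. It assumes a current Phobos, and the sample words are illustrative. With a precomposed "ï", the autodecoded count matches what a reader sees. With a combining diaeresis, it does not.

```d
import std.algorithm.searching : count;
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    string precomposed = "na\u00EFve"; // ï as U+00EF (two UTF-8 code units)
    string combining = "nai\u0308ve";  // i + U+0308 COMBINING DIAERESIS

    // Autodecoded (code point) counts give 5 for the precomposed spelling
    assert(precomposed.walkLength == 5);
    assert(precomposed.count!(c => c >= 32) == 5);

    // The same predicate over raw code units counts both bytes of ï
    assert(precomposed.byCodeUnit.count!(c => c >= 32) == 6);

    // Code points are not characters either: the combining spelling gives 6
    assert(combining.walkLength == 6);
    assert(combining.count!(c => c >= 32) == 6);
    assert(combining.byGrapheme.walkLength == 5);
}
```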

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 6:26 AM, Kagamin wrote: As I understand, design rationale behind strings being plain arrays of code units is that it's impractical for the string to be smarter than an array of code units - it just won't cut it, while plain array provides simple and easy to understand implementation of

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 6:56 AM, Marc Schütz wrote: It is not, which has been shown by various posts in this thread. Couldn't quite find strong arguments. Could you please be more explicit on which you found most convincing? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 7:19 AM, Chris wrote: On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: [snip] I would agree only with the amendment "...if used naively", which is important. Knowledge of how autodecoding works is a prerequisite for writing fast string code in D. Also, little

Re: The Case Against Autodecode

2016-05-27 Thread Chris via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: [snip] I would agree only with the amendment "...if used naively", which is important. Knowledge of how autodecoding works is a prerequisite for writing fast string code in D. Also, little code should deal with one code

Re: The Case Against Autodecode

2016-05-27 Thread Marc Schütz via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: This might be a good time to discuss this a tad further. I'd appreciate if the debate stayed on point going forward. Thanks! My thesis: the D1 design decision to represent strings as char[] was disastrous and probably one of

Re: The Case Against Autodecode

2016-05-27 Thread Kagamin via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: 11. Indexing an array produces different results than autodecoding, another glaring special case. This is a direct consequence of the fact that string is immutable(char)[] and not a specific type. That error predates

Re: The Case Against Autodecode

2016-05-26 Thread Vladimir Panteleev via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: 4. Autodecoding is slow and has no place in high speed string processing. I would agree only with the amendment "...if used naively", which is important. Knowledge of how autodecoding works is a prerequisite for writing

Re: The Case Against Autodecode

2016-05-26 Thread Andrei Alexandrescu via Digitalmars-d
On 05/26/2016 07:23 PM, H. S. Teoh via Digitalmars-d wrote: Therefore, instead of:

    myString.splitter!"abc".joiner!"def".count;

we have to write:

    myString.representation
        .splitter!("abc".representation)
        .joiner!("def".representation)
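`std.string.representation` reinterprets a `string` as `immutable(ubyte)[]`, so algorithms see raw code units and skip decoding. The sketch below assumes a current Phobos. The haystack and needle are illustrative, and `count` is `std.algorithm.searching.count`, which counts non-overlapping occurrences.

```d
import std.algorithm.searching : count;
import std.range.primitives : ElementType;
import std.string : representation;

void main()
{
    string s = "abcXabcYabc";

    // Element types: an autodecoded dchar versus a raw code unit
    static assert(is(ElementType!string == dchar));
    static assert(is(ElementType!(typeof(s.representation)) == immutable(ubyte)));

    // Same answer; the representation version compares bytes without decoding
    assert(s.count("abc") == 3);
    assert(s.representation.count("abc".representation) == 3);
}
```

The result of working on `representation` is a byte range. Getting text back means reinterpreting it as a string again, which is the extra ceremony the quoted post complains about.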

Re: The Case Against Autodecode

2016-05-26 Thread H. S. Teoh via Digitalmars-d
On Thu, May 26, 2016 at 12:00:54PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...] > On 05/12/2016 04:15 PM, Walter Bright wrote: [...] > > 4. Autodecoding is slow and has no place in high speed string processing. > > I would agree only with the amendment "...if used naively", which is

Re: The Case Against Autodecode

2016-05-26 Thread Jack Stouffer via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: instead, it should use standard library algorithms for searching, matching etc. When needed, iterating every code unit is trivially done through indexing. For an example where the std.algorithm/range functions don't cut

Re: The Case Against Autodecode

2016-05-26 Thread Andrei Alexandrescu via Digitalmars-d
This might be a good time to discuss this a tad further. I'd appreciate if the debate stayed on point going forward. Thanks! My thesis: the D1 design decision to represent strings as char[] was disastrous and probably one of the largest weaknesses of D1. The decision in D2 to use

Re: The Case Against Autodecode

2016-05-17 Thread sarn via Digitalmars-d
On Tuesday, 17 May 2016 at 09:53:17 UTC, Kagamin wrote: With UTF-8 problems happened on a massive scale in LAMP setups: mysql used latin1 as a default encoding and almost everything worked fine. ^ latin-1 with Swedish collation rules. And even if you set the encoding to "utf8", almost

Re: The Case Against Autodecode

2016-05-17 Thread Kagamin via Digitalmars-d
On Friday, 13 May 2016 at 21:46:28 UTC, Jonathan M Davis wrote: The history of why UTF-16 was chosen isn't really relevant to my point (Win32 has the same problem as Java and for similar reasons). My point was that if you use UTF-8, then it's obvious _really_ fast when you screwed up

Re: The Case Against Autodecode

2016-05-16 Thread jmh530 via Digitalmars-d
On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote: Runs for each combination were done five times and the median times used. The median times and the char[] to ubyte[] ratio are below:

    |          |           | char[]    | ubyte[]   |       |
    | Compiler | Text type | time (ms) | time (ms) | ratio |

Re: The Case Against Autodecode

2016-05-15 Thread H. S. Teoh via Digitalmars-d
On Mon, May 16, 2016 at 12:31:04AM +0000, Jack Stouffer via Digitalmars-d wrote: > On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote: > >Given the importance of performance in the auto-decoding topic, it > >seems reasonable to quantify it. I took a stab at this. It would of > >course be prudent

Re: The Case Against Autodecode

2016-05-15 Thread Jack Stouffer via Digitalmars-d
On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote: Given the importance of performance in the auto-decoding topic, it seems reasonable to quantify it. I took a stab at this. It would of course be prudent to have others conduct similar analysis rather than rely on my numbers alone. Here is

Re: The Case Against Autodecode

2016-05-15 Thread Jon D via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > I am as unclear about the problems of autodecoding as I am about the necessity > to remove curl. Whenever I ask I hear some arguments that work well emotionally > but are scant on

Re: The Case Against Autodecode

2016-05-15 Thread Ola Fosheim Grøstad via Digitalmars-d
On Sunday, 15 May 2016 at 01:45:25 UTC, Bill Hicks wrote: From a technical point, D is not successful, for the most part. C/C++ at least can use the excuse that they were created during a time when we didn't have the experience and the knowledge that we do now. Not really. The dominating

Re: The Case Against Autodecode

2016-05-14 Thread Bill Hicks via Digitalmars-d
On Friday, 13 May 2016 at 09:28:45 UTC, Chris wrote: PS I wonder does Bill Hicks know you're using his name? But I guess he's lost interest in this planet and happily lives on Mars now. Maybe I'm using the name to avoid being harassed. Or maybe, there are thousands of people in the world

Re: The Case Against Autodecode

2016-05-14 Thread Bill Hicks via Digitalmars-d
On Friday, 13 May 2016 at 07:26:53 UTC, poliklosio wrote: Also, you are missing the point by claiming that a technical problem is sure to kill D. Note that very successful languages like C++, python and so on also have undergone heated discussions about various features, and often live

Re: The Case Against Autodecode

2016-05-13 Thread Steven Schveighoffer via Digitalmars-d
On 5/12/16 4:15 PM, Walter Bright wrote: 10. Autodecoded arrays cannot be RandomAccessRanges, losing a key benefit of being arrays in the first place. I'll repeat what I said in the other thread. The problem isn't auto-decoding. The problem is hijacking the char[] and wchar[] (and variants)

Re: The Case Against Autodecode

2016-05-13 Thread Chris via Digitalmars-d
On Friday, 13 May 2016 at 14:06:28 UTC, Vladimir Panteleev wrote: On Friday, 13 May 2016 at 13:41:30 UTC, Chris wrote: PS Why do I get a "StopForumSpam error" every time I post today? Has anyone else experienced the same problem: "StopForumSpam error: Socket error: Lookup error:

Re: The Case Against Autodecode

2016-05-13 Thread Vladimir Panteleev via Digitalmars-d
On Friday, 13 May 2016 at 13:41:30 UTC, Chris wrote: PS Why do I get a "StopForumSpam error" every time I post today? Has anyone else experienced the same problem: "StopForumSpam error: Socket error: Lookup error: getaddrinfo error: Name or service not known. Please solve a CAPTCHA to

Re: The Case Against Autodecode

2016-05-13 Thread Chris via Digitalmars-d
On Friday, 13 May 2016 at 13:17:44 UTC, Walter Bright wrote: On 5/13/2016 2:12 AM, Chris wrote: If autodecode is killed, could we have a test version asap? I'd be willing to test my programs with autodecode turned off and see what happens. Others should do likewise and we could come up with a

Re: The Case Against Autodecode

2016-05-13 Thread Walter Bright via Digitalmars-d
On 5/13/2016 3:43 AM, Marc Schütz wrote: On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: 7. Autodecode cannot be used with unicode path/filenames, because it is legal (at least on Linux) to have invalid UTF-8 as filenames. It turns out in the wild that pure Unicode is not

Re: The Case Against Autodecode

2016-05-13 Thread Walter Bright via Digitalmars-d
On 5/12/2016 11:50 PM, Bill Hicks wrote: And I get called a troll and other names when I list half a dozen things wrong with D, my posts get removed/censored, etc, all because I try to inform people not to waste time with D because it's a broken and failed language. Posts that engage in
