Re: The Case Against Autodecode

2016-06-01 Thread Adam D. Ruppe via Digitalmars-d
On Wednesday, 1 June 2016 at 17:57:15 UTC, Andrei Alexandrescu wrote: Try typing the iteration variable with "dchar". -- Andrei Or you can type it as wchar... But important to note: that's opt in, not automatic.
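
To make the "opt in, not automatic" point concrete, here is a minimal D sketch. The helper name `iterations` is made up for illustration and is not from the thread. It assumes the built-in foreach over a string only decodes when the loop variable is explicitly typed wchar or dchar, while the Phobos range primitives present a string as a range of dchar.

```d
import std.range.primitives : ElementType;

// Hypothetical helper: count loop iterations when the foreach
// variable is declared with the element type T.
size_t iterations(T)(string s)
{
    size_t n = 0;
    foreach (T c; s)
        ++n;
    return n;
}

void main()
{
    string s = "\U0001F600"; // a single code point outside the BMP

    assert(iterations!char(s) == 4);  // UTF-8 code units
    assert(iterations!wchar(s) == 2); // UTF-16 code units (surrogate pair)
    assert(iterations!dchar(s) == 1); // one decoded code point

    // With no type given, foreach infers a one-byte char: no decoding.
    foreach (c; s)
    {
        static assert(typeof(c).sizeof == 1);
    }

    // The range primitives, by contrast, treat a string as a range of dchar.
    static assert(is(ElementType!string == dchar));
}
```

So the language-level foreach leaves the choice to the programmer, whereas range-based algorithms get dchar by default.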

Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d
On 06/01/2016 01:35 PM, ZombineDev wrote: On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu wrote: On 05/31/2016 02:46 PM, Timon Gehr wrote: On 31.05.2016 20:30, Andrei Alexandrescu wrote: D's Phobos' foreach, too. -- Andrei Incorrect. https://dpaste.dzfl.pl/ba7a65d59534 Try

Re: The Case Against Autodecode

2016-06-01 Thread ZombineDev via Digitalmars-d
On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu wrote: On 05/31/2016 02:46 PM, Timon Gehr wrote: On 31.05.2016 20:30, Andrei Alexandrescu wrote: D's Phobos' foreach, too. -- Andrei Incorrect. https://dpaste.dzfl.pl/ba7a65d59534

Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d
On 06/01/2016 12:41 PM, Nick Sabalausky wrote: As has been explained countless times already, code points are a non-1:1 internal representation of graphemes. Code points don't exist for their own sake, their entire existence is purely as a way to encode graphemes. Of course, thank you. Whethe

Re: The Case Against Autodecode

2016-06-01 Thread Nick Sabalausky via Digitalmars-d
On 06/01/2016 10:29 AM, Andrei Alexandrescu wrote: On 06/01/2016 06:25 AM, Marc Schütz wrote: On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote: The point is to operate on representation-independent entities (Unicode code points) instead of low-level representation-specific ar

Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d
On 06/01/2016 06:25 AM, Marc Schütz wrote: On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote: On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote: Wasn't the whole point of operating at the code point level by default to make it so that code would be operating on f

Re: The Case Against Autodecode

2016-06-01 Thread Joakim via Digitalmars-d
On Wednesday, 1 June 2016 at 10:04:42 UTC, Marc Schütz wrote: On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote: UTF-8 is an antiquated hack that needs to be eradicated. It forces all other languages than English to be twice as long, for no good reason, have fun with that when you're downl

Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d
On Wednesday, 1 June 2016 at 01:13:17 UTC, Steven Schveighoffer wrote: On 5/31/16 4:38 PM, Timon Gehr wrote: What about e.g. joiner? Compiler error. Better than what it does now. I believe everything that does only concatenation will work correctly. That's why joiner() is one of those algor

Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote: On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote: Wasn't the whole point of operating at the code point level by default to make it so that code would be operating on full characters by default instead of choppi

Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d
On Tuesday, 31 May 2016 at 20:56:43 UTC, Andrei Alexandrescu wrote: On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote: In the vast majority of cases what folks care about is full character How are you so sure? -- Andrei He doesn't need to be sure. You are the one advocating fo

Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d
On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote: UTF-8 is an antiquated hack that needs to be eradicated. It forces all other languages than English to be twice as long, for no good reason, have fun with that when you're downloading text on a 2G connection in the developing world. I as

Re: The Case Against Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d
On 5/31/2016 4:00 PM, ag0aep6g wrote: Wikipedia says [1] that UCS-2 is essentially UTF-16 without surrogate pairs. I suppose you mean UTF-32/UCS-4. [1] https://en.wikipedia.org/wiki/UTF-16 Thanks for the correction.

Re: The Case Against Autodecode

2016-05-31 Thread Jack Stouffer via Digitalmars-d
On Wednesday, 1 June 2016 at 02:17:21 UTC, Jonathan M Davis wrote: ... This thread is going in circles; the against crowd has stated each of their arguments very clearly at least five times in different ways. The cost/benefit problems with auto decoding are as clear as day. If the evidence

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 23:36:20 Marco Leise via Digitalmars-d wrote: > On Tue, 31 May 2016 16:56:43 -0400, Andrei Alexandrescu wrote: > > On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote: > > > In the vast majority of cases what folks care about is full character > > > > How

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 20:38:14 Nick Sabalausky via Digitalmars-d wrote: > On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote: > > On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote: > >> On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote: > >>> Let's put the question this way. Given the fo

Re: The Case Against Autodecode

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d
On 5/31/16 4:38 PM, Timon Gehr wrote: On 31.05.2016 21:51, Steven Schveighoffer wrote: On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote: On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...] Does walkLength yield the same number for all representati

Re: The Case Against Autodecode

2016-05-31 Thread Nick Sabalausky via Digitalmars-d
On 05/31/2016 01:23 PM, Andrei Alexandrescu wrote: On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote: The standard library has to fight against itself because of autodecoding! The vast majority of the algorithms in Phobos are special-cased on strings in an attempt to get around au

Re: The Case Against Autodecode

2016-05-31 Thread Nick Sabalausky via Digitalmars-d
On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote: On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote: On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote: Let's put the question this way. Given the following string, what do *you* think walkLength should return? şŭt̥ḛ́k̠ The nu

Re: The Case Against Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d
On 5/31/2016 1:57 AM, Chris wrote: 1. Given your experience with Warp, how hard would it be to clean Phobos up? It's not hard, it's just a bit tedious. 2. After recoding a number of Phobos functions, how much code did actually break (yours or someone else's)? It's been a while so I don't re

Re: The Case Against Autodecode

2016-05-31 Thread ag0aep6g via Digitalmars-d
On 06/01/2016 12:47 AM, Walter Bright wrote: But I didn't know which encoding would win - UTF-8, UTF-16, or UCS-2, so D bet on all three. If I had a do-over, I'd just support UTF-8. UTF-16 is useful pretty much only as a transitional encoding to talk with Windows APIs. Nobody uses UCS-2 (it consu

Re: The Case Against Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d
On 5/31/2016 1:20 PM, Marco Leise wrote: [...] I agree. I dealt with the madness of code pages, Shift-JIS, EBCDIC, locales, etc., in the pre-Unicode days. Despite its problems, Unicode (and UTF-8) is a major improvement, and I mean major. 16 years ago, I bet that Unicode was the future, and even

Re: The Case Against Autodecode

2016-05-31 Thread Marco Leise via Digitalmars-d
On Tue, 31 May 2016 16:56:43 -0400, Andrei Alexandrescu wrote: > On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote: > > In the vast majority of cases what folks care about is full character > > How are you so sure? -- Andrei Because a full character is the typical unit of a wr

Re: The Case Against Autodecode

2016-05-31 Thread Max Samukha via Digitalmars-d
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote: If user code needs to go upper at the grapheme level, they can If anything this thread strengthens my opinion that autodecoding is a sweet spot. -- Andrei Unicode FAQ disagrees (http://unicode.org/faq/utf_bom.html): "Q: How

Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 05:01:17PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote: > > Wasn't the whole point of operating at the code point level by > > default to make it so that code would be operating on full > > character

Re: The Case Against Autodecode

2016-05-31 Thread Marco Leise via Digitalmars-d
On Tue, 31 May 2016 13:06:16 -0400, Andrei Alexandrescu wrote: > On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote: > > Equality does not require decoding. Similarly, functions like find don't > > either. Something like filter generally would, but it's also not > > particularly no

Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d
On 31.05.2016 22:20, Marco Leise wrote: On Tue, 31 May 2016 16:29:33 +0000, Joakim wrote: >Part of it is the complexity of written language, part of it is >bad technical decisions. Building the default string type in D >around the horrible UTF-8 encoding was a fundamental mistake, >both in te

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote: Wasn't the whole point of operating at the code point level by default to make it so that code would be operating on full characters by default instead of chopping them up as is so easy to do when operating at the code unit level?

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote: In the vast majority of cases what folks care about is full character How are you so sure? -- Andrei

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 03:34 PM, ag0aep6g wrote: On 05/31/2016 07:21 PM, Andrei Alexandrescu wrote: Could you please substantiate that? My understanding is that code unit is a higher-level Unicode notion independent of encoding, whereas code point is an encoding-dependent representation detail. -- Andrei

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote: On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote: Let's put the question this way. Given the following string, what do *you* think walkLength should return? şŭt̥ḛ́k̠ The number of code units in the string. That's the contrac

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote: Let's put the question this way. Given the following string, what do *you* think walkLength should return? şŭt̥ḛ́k̠ The number of code units in the string. That's the contract promised and honored by Phobos. -- Andrei

Re: The Case Against Autodecode

2016-05-31 Thread Joakim via Digitalmars-d
On Tuesday, 31 May 2016 at 20:28:32 UTC, ag0aep6g wrote: On 05/31/2016 06:29 PM, Joakim wrote: D devs should lead the way in getting rid of the UTF-8 encoding, not bickering about how to make it more palatable. I suggested a single-byte encoding for most languages, with double-byte for the on

Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 10:38:03PM +0200, Timon Gehr via Digitalmars-d wrote: > On 31.05.2016 21:51, Steven Schveighoffer wrote: > > On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote: > > > On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via > > > Digitalmars-d wrote: > > > [...]

Re: The Case Against Autodecode

2016-05-31 Thread Joakim via Digitalmars-d
On Tuesday, 31 May 2016 at 20:20:46 UTC, Marco Leise wrote: On Tue, 31 May 2016 16:29:33 +0000, Joakim wrote: Part of it is the complexity of written language, part of it is bad technical decisions. Building the default string type in D around the horrible UTF-8 encoding was a fundamental

Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 10:47:56PM +0300, Dmitry Olshansky via Digitalmars-d wrote: > On 31-May-2016 01:00, Walter Bright wrote: > > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote: > > > I don't agree on changing those. Indexing and slicing a char[] is > > > really useful and actually not hard to do c

Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d
On 31.05.2016 21:51, Steven Schveighoffer wrote: On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote: On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...] Does walkLength yield the same number for all representations? Let's put the question this way.

Re: The Case Against Autodecode

2016-05-31 Thread ag0aep6g via Digitalmars-d
On 05/31/2016 06:29 PM, Joakim wrote: D devs should lead the way in getting rid of the UTF-8 encoding, not bickering about how to make it more palatable. I suggested a single-byte encoding for most languages, with double-byte for the ones which wouldn't fit in a byte. Use some kind of header or

Re: The Case Against Autodecode

2016-05-31 Thread Joakim via Digitalmars-d
On Tuesday, 31 May 2016 at 18:34:54 UTC, Jonathan M Davis wrote: On Tuesday, May 31, 2016 16:29:33 Joakim via Digitalmars-d wrote: UTF-8 is an antiquated hack that needs to be eradicated. It forces all other languages than English to be twice as long, for no good reason, have fun with that whe

Re: The Case Against Autodecode

2016-05-31 Thread Marco Leise via Digitalmars-d
On Tue, 31 May 2016 16:29:33 +0000, Joakim wrote: > Part of it is the complexity of written language, part of it is > bad technical decisions. Building the default string type in D > around the horrible UTF-8 encoding was a fundamental mistake, > both in terms of efficiency and complexity.

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 22:47:56 Dmitry Olshansky via Digitalmars-d wrote: > On 31-May-2016 01:00, Walter Bright wrote: > > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote: > >> I don't agree on changing those. Indexing and slicing a char[] is > >> really useful > >> and actually not hard to do correct

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 21:48:36 Timon Gehr via Digitalmars-d wrote: > On 31.05.2016 21:40, Wyatt wrote: > > On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote: > >> The 'length' of a character is not one in all contexts. > >> The following text takes six columns in my terminal: > >> > >> 日

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 15:33:38 Andrei Alexandrescu via Digitalmars-d wrote: > On 05/31/2016 02:53 PM, Jonathan M Davis via Digitalmars-d wrote: > > walkLength treats a code point like it's a character. > > No, it treats a code point like it's a code point. -- Andrei Wasn't the whole point of op

Re: The Case Against Autodecode

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d
On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote: On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...] Does walkLength yield the same number for all representations? Let's put the question this way. Given the following string, what do *you* think

Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 07:40:13PM +0000, Wyatt via Digitalmars-d wrote: > On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote: > > > > The 'length' of a character is not one in all contexts. > > The following text takes six columns in my terminal: > > > > 日本語 > > 123456 > > That's a prope

Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d
On 31.05.2016 21:40, Wyatt wrote: On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote: The 'length' of a character is not one in all contexts. The following text takes six columns in my terminal: 日本語 123456 That's a property of your font and font rendering engine, not Unicode. Sure.

Re: The Case Against Autodecode

2016-05-31 Thread Dmitry Olshansky via Digitalmars-d
On 31-May-2016 01:00, Walter Bright wrote: On 5/30/2016 11:25 AM, Adam D. Ruppe wrote: I don't agree on changing those. Indexing and slicing a char[] is really useful and actually not hard to do correctly (at least with regard to handling code units). Yup. It isn't hard at all to use arrays of

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 21:20:19 Timon Gehr via Digitalmars-d wrote: > On 31.05.2016 20:53, Jonathan M Davis via Digitalmars-d wrote: > > On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote: > >> >On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote: > >>> > >On

Re: The Case Against Autodecode

2016-05-31 Thread Wyatt via Digitalmars-d
On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote: The 'length' of a character is not one in all contexts. The following text takes six columns in my terminal: 日本語 123456 That's a property of your font and font rendering engine, not Unicode. (Also, it's probably not quite six columns

Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...] > Does walkLength yield the same number for all representations? Let's put the question this way. Given the following string, what do *you* think walkLength should return? şŭt̥ḛ́k̠ I think an
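
The two questions above (is walkLength the same for all representations, and what should it return?) can be checked on a small case. The following is only a sketch. The helper names `codePoints` and `graphemes` are hypothetical, and it assumes Phobos with autodecoding enabled, where walkLength on a narrow string counts code points. A short combining sequence stands in for the string quoted in the post.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, named for illustration only.
size_t codePoints(S)(S s) { return s.walkLength; }
size_t graphemes(S)(S s)  { return s.byGrapheme.walkLength; }

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT:
    // two code points, one user-perceived character.
    string  s8  = "e\u0301";
    wstring s16 = "e\u0301"w;
    dstring s32 = "e\u0301"d;

    // .length counts code units, so it depends on the encoding.
    assert(s8.length == 3);
    assert(s16.length == 2);
    assert(s32.length == 2);

    // walkLength counts code points: the same for every encoding,
    // but still not the number of characters a reader would see.
    assert(codePoints(s8) == 2);
    assert(codePoints(s16) == 2);
    assert(codePoints(s32) == 2);

    // Counting graphemes has to be requested explicitly.
    assert(graphemes(s8) == 1);
}
```

In this sketch the answer to the first question is yes, and the disagreement in the thread is about whether the code point count is the number anyone actually wants.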

Re: The Case Against Autodecode

2016-05-31 Thread ag0aep6g via Digitalmars-d
On 05/31/2016 07:21 PM, Andrei Alexandrescu wrote: Could you please substantiate that? My understanding is that code unit is a higher-level Unicode notion independent of encoding, whereas code point is an encoding-dependent representation detail. -- Andrei You got the terms mixed up. Code unit

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 02:57 PM, Jonathan M Davis via Digitalmars-d wrote: In addition, as soon as you have ubyte[], none of the string-related functions work. That's fixable, but as it stands, operating on ubyte[] instead of char[] is a royal pain. That'd be nice to fix indeed. Please break the ground?

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 02:53 PM, Jonathan M Davis via Digitalmars-d wrote: walkLength treats a code point like it's a character. No, it treats a code point like it's a code point. -- Andrei

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 02:46 PM, Timon Gehr wrote: On 31.05.2016 20:30, Andrei Alexandrescu wrote: D's Phobos' foreach, too. -- Andrei

Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d
On 31.05.2016 20:53, Jonathan M Davis via Digitalmars-d wrote: On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote: >On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote: > >On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote: > >>On

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Friday, May 27, 2016 04:31:49 Vladimir Panteleev via Digitalmars-d wrote: > On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu > >> 9. Autodecode cannot be turned off, i.e. it isn't practical to > >> avoid > >> importing std.array one way or another, and then autodecode is > >> there.

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote: > On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote: > > On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote: > >> On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote: >

Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d
On 31.05.2016 20:30, Andrei Alexandrescu wrote: D's Phobos' handling of UTF is at the code unit code point level (like all of Unicode is portably defined).

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 16:29:33 Joakim via Digitalmars-d wrote: > UTF-8 is an antiquated hack that needs to be eradicated. It > forces all other languages than English to be twice as long, for > no good reason, have fun with that when you're downloading text > on a 2G connection in the developin

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote: On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote: On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote: Saying that operating at the code point level - UTF-32 - is correct is like saying that

Re: How to improve autodecoding? (Was: The Case Against Autodecode)

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 5/31/16 2:21 PM, Jonathan M Davis via Digitalmars-d wrote: I think that the first step is getting Phobos to work with all ranges of character types - be they char, wchar, dchar, or graphemes. Then the algorithms themselves will work whether we have auto-decoding or not. With that done, we can

Re: How to improve autodecoding? (Was: The Case Against Autodecode)

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Monday, May 30, 2016 14:24:23 Andrei Alexandrescu via Digitalmars-d wrote: > On 05/30/2016 12:34 PM, Jack Stouffer wrote: > > On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote: > >> D1 -> D2 was a vastly more disruptive change than getting rid of > >> auto-decoding would be. > > > >

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote: > On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote: > > Saying that operating at the code point level - UTF-32 - is correct > > is like saying that operating at UTF-16 instead of UTF-8 is correct. > > Cou

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Friday, May 27, 2016 16:41:09 Andrei Alexandrescu via Digitalmars-d wrote: > On 05/27/2016 03:43 PM, H. S. Teoh via Digitalmars-d wrote: > > That's what we've been trying to say all along! > > If that's the case things are pretty dire, autodecoding or not. -- Andrei True enough. Correctly handl

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote: The standard library has to fight against itself because of autodecoding! The vast majority of the algorithms in Phobos are special-cased on strings in an attempt to get around autodecoding. That alone should highlight the fact tha

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote: Saying that operating at the code point level - UTF-32 - is correct is like saying that operating at UTF-16 instead of UTF-8 is correct. Could you please substantiate that? My understanding is that code unit is a higher-level Un

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 13:01:11 Andrei Alexandrescu via Digitalmars-d wrote: > On 05/31/2016 12:45 PM, Jonathan M Davis via Digitalmars-d wrote: > > On Tuesday, May 31, 2016 11:07:09 Andrei Alexandrescu via Digitalmars-d wrote: > >> On 5/31/16 3:56 AM, Walter Bright wrote: > >>> If there is an a

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Friday, May 27, 2016 09:40:21 H. S. Teoh via Digitalmars-d wrote: > On Fri, May 27, 2016 at 03:47:32PM +0200, ag0aep6g via Digitalmars-d wrote: > > On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote: > > > > > However the following do require autodecoding: > > > > > > > > > > s.walkLength > > > >

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote: Equality does not require decoding. Similarly, functions like find don't either. Something like filter generally would, but it's also not particularly normal to filter a string on a by-character basis. You'd probably want to get to
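
As a rough illustration of the claim that equality and find need no decoding, the sketch below compares and searches the raw UTF-8 code units. The helper names `sameBytes` and `containsBytes` are invented for this example. It relies on UTF-8 being self-synchronizing, so a valid needle cannot match in the middle of another character's encoding.

```d
import std.algorithm.searching : canFind;
import std.string : representation;

// Hypothetical helper: equality on the underlying ubyte arrays.
bool sameBytes(string a, string b)
{
    return a.representation == b.representation;
}

// Hypothetical helper: substring search on code units, no decoding.
bool containsBytes(string haystack, string needle)
{
    return haystack.representation.canFind(needle.representation);
}

void main()
{
    assert(sameBytes("na\u00EFve", "na\u00EFve"));
    assert(containsBytes("caf\u00E9 au lait", "\u00E9 au"));

    // Caveat: this is code unit equality, not canonical equivalence.
    // Precomposed and decomposed forms of the same character differ.
    assert(!sameBytes("\u00E9", "e\u0301"));
}
```

Decoding to code points would not change the last result either. Only normalization or grapheme-level comparison would.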

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 05/31/2016 12:45 PM, Jonathan M Davis via Digitalmars-d wrote: On Tuesday, May 31, 2016 11:07:09 Andrei Alexandrescu via Digitalmars-d wrote: On 5/31/16 3:56 AM, Walter Bright wrote: If there is an abstraction for strings that is efficient, consistent, useful, and hides the fact that it is U

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Friday, May 27, 2016 23:16:58 David Nadlinger via Digitalmars-d wrote: > On Friday, 27 May 2016 at 22:12:57 UTC, Minas Mina wrote: > > Those should be the same though, i.e. compare the same. In order > > to do that, there is normalization. What it does is to _expand_ > > the single codepoint Ä in
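
The normalization step mentioned in the quoted text can be sketched with std.uni.normalize. The helper `canonicallyEqual` is a made-up name, and comparing NFC forms is only one possible way to test canonical equivalence.

```d
import std.uni;

// Hypothetical helper: compare two strings after normalizing both to NFC.
bool canonicallyEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00C4";  // Ä as a single code point
    string decomposed  = "A\u0308"; // A + COMBINING DIAERESIS

    // The code unit sequences differ...
    assert(precomposed != decomposed);

    // ...but NFD expands the single code point and NFC recombines it.
    assert(normalize!NFD(precomposed) == decomposed);
    assert(normalize!NFC(decomposed) == precomposed);

    assert(canonicallyEqual(precomposed, decomposed));
}
```

Iterating by code point alone does not make the two spellings compare equal, which is part of what the thread is arguing about.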

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 07:17:03 default0 via Digitalmars-d wrote: > Thinking about this a bit more - what algorithms are actually > correct when implemented on the level of code units? > Off the top of my head I can only really think of copying and > hashing, since you want to do that on the byte

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 11:07:09 Andrei Alexandrescu via Digitalmars-d wrote: > On 5/31/16 3:56 AM, Walter Bright wrote: > > If there is an abstraction for strings that is efficient, consistent, > > useful, and hides the fact that it is UTF, I am not aware of it. > > It's been mentioned several ti

Re: The Case Against Autodecode

2016-05-31 Thread Joakim via Digitalmars-d
On Monday, 30 May 2016 at 17:35:36 UTC, Chris wrote: On Monday, 30 May 2016 at 16:03:03 UTC, Marco Leise wrote: *** http://site.icu-project.org/home#TOC-What-is-ICU- I was actually talking about ICU with a colleague today. Could it be that Unicode itself is broken? I've often heard criticism

Re: The Case Against Autodecode

2016-05-31 Thread deadalnix via Digitalmars-d
On Tuesday, 31 May 2016 at 15:07:09 UTC, Andrei Alexandrescu wrote: Consistency with what? Consistent with what? It is a slice type. It should work as a slice type. Every other design stinks.

Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Sunday, May 29, 2016 13:47:32 H. S. Teoh via Digitalmars-d wrote: > On Sun, May 29, 2016 at 03:55:22PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > > So now code points are good? -- Andrei > > It depends on what you're trying to accomplish. That's the point we're > trying to get at. F

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 5/31/16 10:33 AM, Seb wrote: Explicitly stating the type of iteration in the 132 places with auto-decoding in Phobos doesn't sound that terrible. It is terrible, no two ways about it. We've been very very careful with changes that caused a handful of breakages in Phobos. It really means ev

Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d
On 5/31/16 3:56 AM, Walter Bright wrote: On 5/30/2016 9:16 PM, Andrei Alexandrescu wrote: On 5/30/16 5:51 PM, Walter Bright wrote: On 5/30/2016 8:34 AM, Marc Schütz wrote: In an ideal world, we'd also want to change the way `length` and `opIndex` work, Why? strings are arrays of code units.

Re: The Case Against Autodecode

2016-05-31 Thread Kagamin via Digitalmars-d
On Tuesday, 31 May 2016 at 13:33:14 UTC, Marc Schütz wrote: In an ideal world, the programs someone intuitively writes will do the right thing, and if they can't, they at least refuse to compile. If we agree that it's up to the user whether to iterate over a string by code unit or code points o

Re: The Case Against Autodecode

2016-05-31 Thread ag0aep6g via Digitalmars-d
On 05/31/2016 04:33 PM, Seb wrote: https://github.com/dlang/phobos/pull/4384 Explicitly stating the type of iteration in the 132 places with auto-decoding in Phobos doesn't sound that terrible. After checking some of those 132 places, they are in generic functions that take ranges. std.algori

Re: The Case Against Autodecode

2016-05-31 Thread Seb via Digitalmars-d
On Tuesday, 31 May 2016 at 13:33:14 UTC, Marc Schütz wrote: On Monday, 30 May 2016 at 21:51:36 UTC, Walter Bright wrote: [...] So, strings are _implemented_ as arrays of code units. But indiscriminately treating them as such in all situations leads to wrong results (just like arrays of code

Re: The Case Against Autodecode

2016-05-31 Thread Marc Schütz via Digitalmars-d
On Monday, 30 May 2016 at 21:51:36 UTC, Walter Bright wrote: On 5/30/2016 8:34 AM, Marc Schütz wrote: In an ideal world, we'd also want to change the way `length` and `opIndex` work, Why? strings are arrays of code units. So, strings are _implemented_ as arrays of code units. But indiscrimi

Re: The Case Against Autodecode

2016-05-31 Thread Marco Leise via Digitalmars-d
On Tue, 31 May 2016 07:17:03 +0000, default0 wrote: > Thinking about this a bit more - what algorithms are actually > correct when implemented on the level of code units? Calculating the buffer size of a string, validation and fast versions of general algorithms that can be defined in terms of

Re: The Case Against Autodecode

2016-05-31 Thread deadalnix via Digitalmars-d
On Tuesday, 31 May 2016 at 07:56:54 UTC, Walter Bright wrote: On 5/30/2016 9:16 PM, Andrei Alexandrescu wrote: On 5/30/16 5:51 PM, Walter Bright wrote: On 5/30/2016 8:34 AM, Marc Schütz wrote: In an ideal world, we'd also want to change the way `length` and `opIndex` work, Why? strings are

Re: The Case Against Autodecode

2016-05-31 Thread Chris via Digitalmars-d
On Monday, 30 May 2016 at 21:39:00 UTC, Walter Bright wrote: On 5/30/2016 12:52 PM, H. S. Teoh via Digitalmars-d wrote: If I ever had to write string-heavy code, I'd probably fork Phobos just so I can get decent performance. Just sayin'. When I wrote Warp, the only point of which was speed, I

Re: The Case Against Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d
On 5/30/2016 9:16 PM, Andrei Alexandrescu wrote: On 5/30/16 5:51 PM, Walter Bright wrote: On 5/30/2016 8:34 AM, Marc Schütz wrote: In an ideal world, we'd also want to change the way `length` and `opIndex` work, Why? strings are arrays of code units. All the trouble comes from erratically pre

Re: The Case Against Autodecode

2016-05-31 Thread default0 via Digitalmars-d
On Tuesday, 31 May 2016 at 06:45:56 UTC, H. S. Teoh wrote: On Tue, May 31, 2016 at 12:13:57AM -0400, Andrei Alexandrescu via Digitalmars-d wrote: On 5/30/16 6:00 PM, Walter Bright wrote: > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote: > > I don't agree on changing those. Indexing and slicing a >

Re: The Case Against Autodecode

2016-05-30 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 12:13:57AM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 5/30/16 6:00 PM, Walter Bright wrote: > > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote: > > > I don't agree on changing those. Indexing and slicing a char[] is > > > really useful and actually not hard to do

Re: The Case Against Autodecode

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 5/30/16 7:52 PM, Seb wrote: On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote: On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote: On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote: D1 -> D2 was a vastly more disruptive change than getting rid of auto-dec

Re: The Case Against Autodecode

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 5/30/16 5:51 PM, Walter Bright wrote: On 5/30/2016 8:34 AM, Marc Schütz wrote: In an ideal world, we'd also want to change the way `length` and `opIndex` work, Why? strings are arrays of code units. All the trouble comes from erratically pretending otherwise. That's not an argument. Objec

Re: The Case Against Autodecode

2016-05-30 Thread Andrei Alexandrescu via Digitalmars-d
On 5/30/16 6:00 PM, Walter Bright wrote: On 5/30/2016 11:25 AM, Adam D. Ruppe wrote: I don't agree on changing those. Indexing and slicing a char[] is really useful and actually not hard to do correctly (at least with regard to handling code units). Yup. It isn't hard at all to use arrays of c

Re: The Case Against Autodecode

2016-05-30 Thread Jack Stouffer via Digitalmars-d
On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote: Perhaps it would be worth trying to silently remove autodecoding and seeing how much of Phobos breaks, as an experiment. Has this been tried before? Did it, the results are a large number of phobos modules fail to compile becau

Re: The Case Against Autodecode

2016-05-30 Thread Nick Sabalausky via Digitalmars-d
On 05/30/2016 04:30 PM, Timon Gehr wrote: In D, enum does not mean enumeration, const does not mean constant, pure is not pure, lazy is not lazy, and char does not mean character. My new favorite quote :)

Re: The Case Against Autodecode

2016-05-30 Thread Seb via Digitalmars-d
On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote: On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote: On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote: D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be. Don't be so su

Re: The Case Against Autodecode

2016-05-30 Thread Marco Leise via Digitalmars-d
A relevant thread in the Rust bug tracker I remember from three years ago: https://github.com/rust-lang/rust/issues/7043 May it be of inspiration. -- Marco

Re: The Case Against Autodecode

2016-05-30 Thread Marco Leise via Digitalmars-d
> 4: Indonesians* shall be converted to a sane alphabet *Correction: Koreans (2-4 Hangul syllables (code points) form each letter) -- Marco

Re: The Case Against Autodecode

2016-05-30 Thread Marco Leise via Digitalmars-d
On Fri, 27 May 2016 15:47:32 +0200, ag0aep6g wrote: > On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote: > >>> However the following do require autodecoding: > >>> > >>> s.walkLength > >>> s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation > >>> s.count!(c => c >= 32) // non-control chara

Re: The Case Against Autodecode

2016-05-30 Thread Walter Bright via Digitalmars-d
On 5/30/2016 11:25 AM, Adam D. Ruppe wrote: I don't agree on changing those. Indexing and slicing a char[] is really useful and actually not hard to do correctly (at least with regard to handling code units). Yup. It isn't hard at all to use arrays of codeunits correctly.
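For readers following along, here is a minimal sketch of the kind of code-unit slicing being discussed. It is not taken from any post in the thread: the helper name `keyOf` and the sample strings are made up for illustration. It assumes the delimiter is ASCII, which is what makes the slice safe, since in UTF-8 an ASCII byte never appears inside a multi-byte sequence.

```d
import std.string : indexOf;

/// Hypothetical helper: returns the part of `line` before the first ':',
/// or all of `line` when there is no ':'.
string keyOf(string line)
{
    // indexOf reports the position in code units, so it can be used
    // directly as a slice bound on the underlying array.
    immutable i = line.indexOf(':');
    return i < 0 ? line : line[0 .. i];
}

unittest
{
    // The non-ASCII characters before the delimiter are left intact.
    assert(keyOf("gr\u00FC\u00DFe:hello") == "gr\u00FC\u00DFe");
    assert(keyOf("no-colon") == "no-colon");
}
```

The slice never lands in the middle of a multi-byte character because the bound comes from a search for an ASCII code unit, not from arithmetic on character counts.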

Re: The Case Against Autodecode

2016-05-30 Thread Walter Bright via Digitalmars-d
On 5/30/2016 8:34 AM, Marc Schütz wrote: In an ideal world, we'd also want to change the way `length` and `opIndex` work, Why? strings are arrays of code units. All the trouble comes from erratically pretending otherwise.
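To make the `length` question concrete, a small sketch (not from the thread; the function names `codeUnitCount` and `codePointCount` are hypothetical) contrasting what `.length` reports for a `string` with what the auto-decoding range interface reports. It assumes a Phobos version in which narrow strings are decoded to `dchar` when used as ranges, as was the case when this thread was written.

```d
import std.range : walkLength;
import std.utf : byCodeUnit;

/// Number of UTF-8 code units: what `.length` reports for a `string`.
size_t codeUnitCount(string s)
{
    return s.length;
}

/// Number of code points: walking the string as a range decodes it.
size_t codePointCount(string s)
{
    return s.walkLength;
}

unittest
{
    string s = "Sch\u00FCtz"; // U+00FC takes two UTF-8 code units
    assert(codeUnitCount(s) == 7);
    assert(codePointCount(s) == 6);
    // Opting out of decoding: walk the raw code units instead.
    assert(s.byCodeUnit.walkLength == 7);
}
```

The two counts differ as soon as the string contains a non-ASCII character, which is the inconsistency between array operations and range operations that the posts here are arguing about.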

Re: The Case Against Autodecode

2016-05-30 Thread Vladimir Panteleev via Digitalmars-d
On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote: On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote: D1 -> D2 was a vastly more disruptive change than getting rid of auto-decoding would be. Don't be so sure. All string handling code would become broken, even if it appea

Re: The Case Against Autodecode

2016-05-30 Thread Walter Bright via Digitalmars-d
On 5/30/2016 12:52 PM, H. S. Teoh via Digitalmars-d wrote: If I ever had to write string-heavy code, I'd probably fork Phobos just so I can get decent performance. Just sayin'. When I wrote Warp, the only point of which was speed, I couldn't use phobos because of autodecoding. I have since rec
