On Wednesday, 1 June 2016 at 17:57:15 UTC, Andrei Alexandrescu
wrote:
Try typing the iteration variable with "dchar". -- Andrei
Or you can type it as wchar...
But important to note: that's opt in, not automatic.
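For reference, here is a minimal D sketch of the opt-in being discussed. When `foreach` infers the loop variable's type over a `string`, it yields `immutable(char)` code units. It decodes to code points only when the variable is explicitly typed `dchar`, and it transcodes to UTF-16 code units when the variable is typed `wchar`. The helper names and the sample string are illustrative only.

```d
/// Hypothetical helper: iterations when foreach infers the element type.
size_t countDefault(string s)
{
    size_t n;
    foreach (c; s)          // c is immutable(char): one UTF-8 code unit per step
        ++n;
    return n;
}

/// Hypothetical helper: iterations with the loop variable typed wchar.
size_t countWchar(string s)
{
    size_t n;
    foreach (wchar c; s)    // foreach transcodes UTF-8 into UTF-16 code units
        ++n;
    return n;
}

/// Hypothetical helper: iterations with the loop variable typed dchar.
size_t countDchar(string s)
{
    size_t n;
    foreach (dchar c; s)    // foreach decodes UTF-8 into code points
        ++n;
    return n;
}

unittest
{
    // "é" written as the single precomposed code point U+00E9,
    // which UTF-8 stores in two code units.
    enum s = "caf\u00E9";
    assert(countDefault(s) == 5);
    assert(countWchar(s) == 4);
    assert(countDchar(s) == 4);
}
```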
On 06/01/2016 01:35 PM, ZombineDev wrote:
On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu wrote:
On 05/31/2016 02:46 PM, Timon Gehr wrote:
On 31.05.2016 20:30, Andrei Alexandrescu wrote:
D's
Phobos'
foreach, too. -- Andrei
Incorrect. https://dpaste.dzfl.pl/ba7a65d59534
Try
On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu
wrote:
On 05/31/2016 02:46 PM, Timon Gehr wrote:
On 31.05.2016 20:30, Andrei Alexandrescu wrote:
D's
Phobos'
foreach, too. -- Andrei
Incorrect. https://dpaste.dzfl.pl/ba7a65d59534
On 06/01/2016 12:41 PM, Nick Sabalausky wrote:
As has been explained countless times already, code points are a non-1:1
internal representation of graphemes. Code points don't exist for their
own sake, their entire existence is purely as a way to encode graphemes.
Of course, thank you.
Whethe
On 06/01/2016 10:29 AM, Andrei Alexandrescu wrote:
On 06/01/2016 06:25 AM, Marc Schütz wrote:
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote:
The point is to operate on representation-independent entities
(Unicode code points) instead of low-level representation-specific
ar
On 06/01/2016 06:25 AM, Marc Schütz wrote:
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote:
On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote:
Wasn't the whole point of operating at the code point level by default to
make it so that code would be operating on f
On Wednesday, 1 June 2016 at 10:04:42 UTC, Marc Schütz wrote:
On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
UTF-8 is an antiquated hack that needs to be eradicated. It
forces all other languages than English to be twice as long,
for no good reason, have fun with that when you're downl
On Wednesday, 1 June 2016 at 01:13:17 UTC, Steven Schveighoffer
wrote:
On 5/31/16 4:38 PM, Timon Gehr wrote:
What about e.g. joiner?
Compiler error. Better than what it does now.
I believe everything that does only concatenation will work
correctly. That's why joiner() is one of those algor
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu
wrote:
On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d
wrote:
Wasn't the whole point of operating at the code point level by default to
make it so that code would be operating on full characters by default
instead of choppi
On Tuesday, 31 May 2016 at 20:56:43 UTC, Andrei Alexandrescu
wrote:
On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d
wrote:
In the vast majority of cases what folks care about is full character
How are you so sure? -- Andrei
He doesn't need to be sure. You are the one advocating fo
On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
UTF-8 is an antiquated hack that needs to be eradicated. It
forces all other languages than English to be twice as long,
for no good reason, have fun with that when you're downloading
text on a 2G connection in the developing world.
I as
On 5/31/2016 4:00 PM, ag0aep6g wrote:
Wikipedia says [1] that UCS-2 is essentially UTF-16 without surrogate pairs. I
suppose you mean UTF-32/UCS-4.
[1] https://en.wikipedia.org/wiki/UTF-16
Thanks for the correction.
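The following short D sketch shows why the UCS-2/UTF-16 distinction matters. A code point outside the Basic Multilingual Plane needs a different number of code units in each encoding. This example uses U+1F600, chosen only for illustration. It takes four UTF-8 code units, a two-unit surrogate pair in UTF-16, and one UTF-32 code unit. UCS-2 cannot encode it at all. `.length` counts code units, while `walkLength` on the autodecoded range counts code points.

```d
import std.range.primitives : walkLength;

// U+1F600 lies outside the Basic Multilingual Plane.
enum string  utf8  = "\U0001F600";
enum wstring utf16 = "\U0001F600"w;
enum dstring utf32 = "\U0001F600"d;

// .length counts code units, so it differs per encoding...
static assert(utf8.length == 4);   // four UTF-8 code units
static assert(utf16.length == 2);  // one UTF-16 surrogate pair
static assert(utf32.length == 1);  // one UTF-32 code unit

unittest
{
    // ...while walkLength over the autodecoded range counts code points.
    assert(utf8.walkLength == 1);
    assert(utf16.walkLength == 1);
    assert(utf32.walkLength == 1);
}
```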
On Wednesday, 1 June 2016 at 02:17:21 UTC, Jonathan M Davis wrote:
...
This thread is going in circles; the against crowd has stated
each of their arguments very clearly at least five times in
different ways.
The cost/benefit problems with auto decoding are as clear as day.
If the evidence
On Tuesday, May 31, 2016 23:36:20 Marco Leise via Digitalmars-d wrote:
> On Tue, 31 May 2016 16:56:43 -0400, Andrei Alexandrescu wrote:
> > On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote:
> > > In the vast majority of cases what folks care about is full character
> >
> > How
On Tuesday, May 31, 2016 20:38:14 Nick Sabalausky via Digitalmars-d wrote:
> On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
> > On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
> >> On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:
> >>> Let's put the question this way. Given the fo
On 5/31/16 4:38 PM, Timon Gehr wrote:
On 31.05.2016 21:51, Steven Schveighoffer wrote:
On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via
Digitalmars-d wrote:
[...]
Does walkLength yield the same number for all representati
On 05/31/2016 01:23 PM, Andrei Alexandrescu wrote:
On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
The standard library has to fight against itself because of autodecoding!
The vast majority of the algorithms in Phobos are special-cased on
strings
in an attempt to get around au
On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:
Let's put the question this way. Given the following string, what do
*you* think walkLength should return?
şŭt̥ḛ́k̠
The nu
On 5/31/2016 1:57 AM, Chris wrote:
1. Given your experience with Warp, how hard would it be to clean Phobos up?
It's not hard, it's just a bit tedious.
2. After recoding a number of Phobos functions, how much code did actually break
(yours or someone else's)?
It's been a while so I don't re
On 06/01/2016 12:47 AM, Walter Bright wrote:
But I didn't know which encoding would win - UTF-8, UTF-16, or UCS-2, so
D bet on all three. If I had a do-over, I'd just support UTF-8. UTF-16
is useful pretty much only as a transitional encoding to talk with
Windows APIs. Nobody uses UCS-2 (it consu
On 5/31/2016 1:20 PM, Marco Leise wrote:
[...]
I agree. I dealt with the madness of code pages, Shift-JIS, EBCDIC, locales, etc., in
the pre-Unicode days. Despite its problems, Unicode (and UTF-8) is a major
improvement, and I mean major.
16 years ago, I bet that Unicode was the future, and even
On Tue, 31 May 2016 16:56:43 -0400, Andrei Alexandrescu wrote:
> On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote:
> > In the vast majority of cases what folks care about is full character
>
> How are you so sure? -- Andrei
Because a full character is the typical unit of a wr
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu
wrote:
If user code needs to go upper at the grapheme level, they can
If anything this thread strengthens my opinion that
autodecoding is a sweet spot. -- Andrei
Unicode FAQ disagrees (http://unicode.org/faq/utf_bom.html):
"Q: How
On Tue, May 31, 2016 at 05:01:17PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
> On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Wasn't the whole point of operating at the code point level by
> > default to make it so that code would be operating on full
> > character
On Tue, 31 May 2016 13:06:16 -0400, Andrei Alexandrescu wrote:
> On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Equality does not require decoding. Similarly, functions like find don't
> > either. Something like filter generally would, but it's also not
> > particularly no
On 31.05.2016 22:20, Marco Leise wrote:
On Tue, 31 May 2016 16:29:33 +0000, Joakim wrote:
>Part of it is the complexity of written language, part of it is
>bad technical decisions. Building the default string type in D
>around the horrible UTF-8 encoding was a fundamental mistake,
>both in te
On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote:
Wasn't the whole point of operating at the code point level by default to
make it so that code would be operating on full characters by default
instead of chopping them up as is so easy to do when operating at the code
unit level?
On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote:
In the vast majority of cases what folks care about is full character
How are you so sure? -- Andrei
On 05/31/2016 03:34 PM, ag0aep6g wrote:
On 05/31/2016 07:21 PM, Andrei Alexandrescu wrote:
Could you please substantiate that? My understanding is that code unit
is a higher-level Unicode notion independent of encoding, whereas code
point is an encoding-dependent representation detail. -- Andrei
On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:
Let's put the question this way. Given the following string, what do
*you* think walkLength should return?
şŭt̥ḛ́k̠
The number of code units in the string. That's the contrac
On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:
Let's put the question this way. Given the following string, what do
*you* think walkLength should return?
şŭt̥ḛ́k̠
The number of code units in the string. That's the contract promised and
honored by Phobos. -- Andrei
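The exact bytes of that string may not have survived in this archive. The D sketch below therefore assumes a fully decomposed spelling, in which each base letter is followed by combining marks: U+0327, U+0306, U+0325, U+0330 plus U+0301, and U+0320. Under that assumption, the three levels give three different counts:

* `byCodeUnit` exposes UTF-8 code units.
* `walkLength` on the autodecoded `string` counts code points.
* `std.uni.byGrapheme` counts user-perceived characters.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Assumed decomposed spelling of "şŭt̥ḛ́k̠": five ASCII letters plus six combining marks.
enum sample = "s\u0327u\u0306t\u0325e\u0330\u0301k\u0320";

unittest
{
    assert(sample.byCodeUnit.walkLength == 17); // 5 one-byte letters + 6 two-byte marks
    assert(sample.walkLength == 11);            // autodecoded: code points
    assert(sample.byGrapheme.walkLength == 5);  // graphemes: what a reader would count
}
```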
On Tuesday, 31 May 2016 at 20:28:32 UTC, ag0aep6g wrote:
On 05/31/2016 06:29 PM, Joakim wrote:
D devs should lead the way in getting rid of the UTF-8
encoding, not
bickering about how to make it more palatable. I suggested a
single-byte encoding for most languages, with double-byte for
the on
On Tue, May 31, 2016 at 10:38:03PM +0200, Timon Gehr via Digitalmars-d wrote:
> On 31.05.2016 21:51, Steven Schveighoffer wrote:
> > On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:
> > > On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via
> > > Digitalmars-d wrote:
> > > [...]
On Tuesday, 31 May 2016 at 20:20:46 UTC, Marco Leise wrote:
On Tue, 31 May 2016 16:29:33 +0000, Joakim wrote:
Part of it is the complexity of written language, part of it
is bad technical decisions. Building the default string type
in D around the horrible UTF-8 encoding was a fundamental
On Tue, May 31, 2016 at 10:47:56PM +0300, Dmitry Olshansky via Digitalmars-d
wrote:
> On 31-May-2016 01:00, Walter Bright wrote:
> > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> > > I don't agree on changing those. Indexing and slicing a char[] is
> > > really useful and actually not hard to do c
On 31.05.2016 21:51, Steven Schveighoffer wrote:
On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via
Digitalmars-d wrote:
[...]
Does walkLength yield the same number for all representations?
Let's put the question this way.
On 05/31/2016 06:29 PM, Joakim wrote:
D devs should lead the way in getting rid of the UTF-8 encoding, not
bickering about how to make it more palatable. I suggested a
single-byte encoding for most languages, with double-byte for the ones
which wouldn't fit in a byte. Use some kind of header or
On Tuesday, 31 May 2016 at 18:34:54 UTC, Jonathan M Davis wrote:
On Tuesday, May 31, 2016 16:29:33 Joakim via Digitalmars-d
wrote:
UTF-8 is an antiquated hack that needs to be eradicated. It
forces all other languages than English to be twice as long,
for no good reason, have fun with that whe
On Tue, 31 May 2016 16:29:33 +0000, Joakim wrote:
> Part of it is the complexity of written language, part of it is
> bad technical decisions. Building the default string type in D
> around the horrible UTF-8 encoding was a fundamental mistake,
> both in terms of efficiency and complexity.
On Tuesday, May 31, 2016 22:47:56 Dmitry Olshansky via Digitalmars-d wrote:
> On 31-May-2016 01:00, Walter Bright wrote:
> > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> >> I don't agree on changing those. Indexing and slicing a char[] is
> >> really useful
> >> and actually not hard to do correct
On Tuesday, May 31, 2016 21:48:36 Timon Gehr via Digitalmars-d wrote:
> On 31.05.2016 21:40, Wyatt wrote:
> > On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
> >> The 'length' of a character is not one in all contexts.
> >> The following text takes six columns in my terminal:
> >>
> >> 日
On Tuesday, May 31, 2016 15:33:38 Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/31/2016 02:53 PM, Jonathan M Davis via Digitalmars-d wrote:
> > walkLength treats a code point like it's a character.
>
> No, it treats a code point like it's a code point. -- Andrei
Wasn't the whole point of op
On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
Does walkLength yield the same number for all representations?
Let's put the question this way. Given the following string, what do
*you* think
On Tue, May 31, 2016 at 07:40:13PM +, Wyatt via Digitalmars-d wrote:
> On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
> >
> > The 'length' of a character is not one in all contexts.
> > The following text takes six columns in my terminal:
> >
> > 日本語
> > 123456
>
> That's a prope
On 31.05.2016 21:40, Wyatt wrote:
On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
The 'length' of a character is not one in all contexts.
The following text takes six columns in my terminal:
日本語
123456
That's a property of your font and font rendering engine, not Unicode.
Sure.
On 31-May-2016 01:00, Walter Bright wrote:
On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
I don't agree on changing those. Indexing and slicing a char[] is
really useful
and actually not hard to do correctly (at least with regard to
handling code
units).
Yup. It isn't hard at all to use arrays of
On Tuesday, May 31, 2016 21:20:19 Timon Gehr via Digitalmars-d wrote:
> On 31.05.2016 20:53, Jonathan M Davis via Digitalmars-d wrote:
> > On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote:
> >> >On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote:
> >>> > >On
On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
The 'length' of a character is not one in all contexts.
The following text takes six columns in my terminal:
日本語
123456
That's a property of your font and font rendering engine, not
Unicode. (Also, it's probably not quite six columns
On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
> Does walkLength yield the same number for all representations?
Let's put the question this way. Given the following string, what do
*you* think walkLength should return?
şŭt̥ḛ́k̠
I think an
On 05/31/2016 07:21 PM, Andrei Alexandrescu wrote:
Could you please substantiate that? My understanding is that code unit
is a higher-level Unicode notion independent of encoding, whereas code
point is an encoding-dependent representation detail. -- Andrei
You got the terms mixed up. Code unit
On 05/31/2016 02:57 PM, Jonathan M Davis via Digitalmars-d wrote:
In addition, as soon as you have ubyte[], none of the string-related
functions work. That's fixable, but as it stands, operating on ubyte[]
instead of char[] is a royal pain.
That'd be nice to fix indeed. Please break the ground?
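Until the string functions accept `ubyte[]` directly, one workaround is to convert at the boundary. This is a sketch, not something proposed in the thread:

* `std.string.representation` gives the byte view of a string.
* `std.string.assumeUTF` reinterprets bytes as a string without copying.
* `std.utf.validate` rejects malformed input.

The helper `stripUtf8Bytes` is hypothetical.

```d
import std.string : assumeUTF, representation, strip;
import std.utf : validate;

/// Hypothetical helper: strip whitespace from raw bytes that should be UTF-8.
string stripUtf8Bytes(immutable(ubyte)[] raw)
{
    string s = raw.assumeUTF;  // reinterpret the bytes as UTF-8 code units, no copy
    validate(s);               // throws std.utf.UTFException on malformed UTF-8
    return s.strip;            // from here on, the ordinary string functions apply
}

unittest
{
    immutable(ubyte)[] bytes = "  caf\u00E9  ".representation;
    assert(stripUtf8Bytes(bytes) == "caf\u00E9");
}
```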
On 05/31/2016 02:53 PM, Jonathan M Davis via Digitalmars-d wrote:
walkLength treats a code point like it's a character.
No, it treats a code point like it's a code point. -- Andrei
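The disagreement is easiest to see with an algorithm that works one code point at a time. In the D sketch below, reversing by code point moves a combining diaeresis onto the wrong letter. Reversing by grapheme, using `std.uni.byGrapheme` and `byCodePoint`, keeps each letter together with its marks. The sample word and function names are illustrative.

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

/// Hypothetical helper: reverses code point by code point (autodecoded retro).
string reverseByCodePoint(string s)
{
    return s.retro.text;
}

/// Hypothetical helper: reverses grapheme by grapheme, keeping marks with their base.
string reverseByGrapheme(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

unittest
{
    enum word = "noe\u0308l";  // "noël" spelled as e + U+0308 COMBINING DIAERESIS
    assert(reverseByCodePoint(word) == "l\u0308eon"); // diaeresis now sits on the "l"
    assert(reverseByGrapheme(word) == "le\u0308on");  // "lëon"
}
```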
On 05/31/2016 02:46 PM, Timon Gehr wrote:
On 31.05.2016 20:30, Andrei Alexandrescu wrote:
D's
Phobos'
foreach, too. -- Andrei
On 31.05.2016 20:53, Jonathan M Davis via Digitalmars-d wrote:
On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote:
>On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote:
> >On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote:
> >>On
On Friday, May 27, 2016 04:31:49 Vladimir Panteleev via Digitalmars-d wrote:
> On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu
> >> 9. Autodecode cannot be turned off, i.e. it isn't practical to
> >> avoid
> >> importing std.array one way or another, and then autodecode is
> >> there.
On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote:
> On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote:
> > On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote:
> >> On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
>
On 31.05.2016 20:30, Andrei Alexandrescu wrote:
D's
Phobos'
handling of UTF is at the code unit
code point
level (like all of Unicode is portably defined).
On Tuesday, May 31, 2016 16:29:33 Joakim via Digitalmars-d wrote:
> UTF-8 is an antiquated hack that needs to be eradicated. It
> forces all other languages than English to be twice as long, for
> no good reason, have fun with that when you're downloading text
> on a 2G connection in the developin
On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote:
On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote:
On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
Saying that operating at the code point level - UTF-32 - is correct
is like saying that
On 5/31/16 2:21 PM, Jonathan M Davis via Digitalmars-d wrote:
I think that the first step is getting Phobos to work with all ranges of
character types - be they char, wchar, dchar, or graphemes. Then the
algorithms themselves will work whether we have auto-decoding or not. With
that done, we can
On Monday, May 30, 2016 14:24:23 Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/30/2016 12:34 PM, Jack Stouffer wrote:
> > On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
> >> D1 -> D2 was a vastly more disruptive change than getting rid of
> >> auto-decoding would be.
> >
> >
On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Saying that operating at the code point level - UTF-32 - is correct
> > is like saying that operating at UTF-16 instead of UTF-8 is correct.
>
> Cou
On Friday, May 27, 2016 16:41:09 Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/27/2016 03:43 PM, H. S. Teoh via Digitalmars-d wrote:
> > That's what we've been trying to say all along!
>
> If that's the case things are pretty dire, autodecoding or not. -- Andrei
True enough. Correctly handl
On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
The standard library has to fight against itself because of autodecoding!
The vast majority of the algorithms in Phobos are special-cased on strings
in an attempt to get around autodecoding. That alone should highlight the
fact tha
On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
Saying that operating at the code point level - UTF-32 - is correct
is like saying that operating at UTF-16 instead of UTF-8 is correct.
Could you please substantiate that? My understanding is that code unit
is a higher-level Un
On Tuesday, May 31, 2016 13:01:11 Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/31/2016 12:45 PM, Jonathan M Davis via Digitalmars-d wrote:
> > On Tuesday, May 31, 2016 11:07:09 Andrei Alexandrescu via Digitalmars-d wrote:
> >> On 5/31/16 3:56 AM, Walter Bright wrote:
> >>> If there is an a
On Friday, May 27, 2016 09:40:21 H. S. Teoh via Digitalmars-d wrote:
> On Fri, May 27, 2016 at 03:47:32PM +0200, ag0aep6g via Digitalmars-d wrote:
> > On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote:
> > > > > However the following do require autodecoding:
> > > > >
> > > > > s.walkLength
> > > >
On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote:
Equality does not require decoding. Similarly, functions like find don't
either. Something like filter generally would, but it's also not
particularly normal to filter a string on a by-character basis. You'd
probably want to get to
On 05/31/2016 12:45 PM, Jonathan M Davis via Digitalmars-d wrote:
On Tuesday, May 31, 2016 11:07:09 Andrei Alexandrescu via Digitalmars-d wrote:
On 5/31/16 3:56 AM, Walter Bright wrote:
If there is an abstraction for strings that is efficient, consistent,
useful, and hides the fact that it is U
On Friday, May 27, 2016 23:16:58 David Nadlinger via Digitalmars-d wrote:
> On Friday, 27 May 2016 at 22:12:57 UTC, Minas Mina wrote:
> > Those should be the same though, i.e. compare the same. In order
> > to do that, there is normalization. What it does is to _expand_
> > the single codepoint Ä in
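Here is a minimal D illustration of the normalization point, using `std.uni.normalize`. "Ä" as the single code point U+00C4 and "A" followed by U+0308 COMBINING DIAERESIS render the same, but they compare unequal as code units. NFD expands the precomposed form, and NFC composes it back.

```d
import std.uni : NFC, NFD, normalize;

enum precomposed = "\u00C4";   // Ä as one code point
enum decomposed  = "A\u0308";  // A followed by U+0308 COMBINING DIAERESIS

unittest
{
    assert(precomposed != decomposed);                 // different code units
    assert(normalize!NFD(precomposed) == decomposed);  // NFD expands Ä
    assert(normalize!NFC(decomposed) == precomposed);  // NFC composes it back
}
```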
On Tuesday, May 31, 2016 07:17:03 default0 via Digitalmars-d wrote:
> Thinking about this a bit more - what algorithms are actually
> correct when implemented on the level of code units?
> Off the top of my head I can only really think of copying and
> hashing, since you want to do that on the byte
On Tuesday, May 31, 2016 11:07:09 Andrei Alexandrescu via Digitalmars-d wrote:
> On 5/31/16 3:56 AM, Walter Bright wrote:
> > If there is an abstraction for strings that is efficient, consistent,
> > useful, and hides the fact that it is UTF, I am not aware of it.
>
> It's been mentioned several ti
On Monday, 30 May 2016 at 17:35:36 UTC, Chris wrote:
On Monday, 30 May 2016 at 16:03:03 UTC, Marco Leise wrote:
*** http://site.icu-project.org/home#TOC-What-is-ICU-
I was actually talking about ICU with a colleague today. Could
it be that Unicode itself is broken? I've often heard criticism
On Tuesday, 31 May 2016 at 15:07:09 UTC, Andrei Alexandrescu
wrote:
Consistency with what? Consistent with what?
It is a slice type. It should work as a slice type. Every other
design stinks.
On Sunday, May 29, 2016 13:47:32 H. S. Teoh via Digitalmars-d wrote:
> On Sun, May 29, 2016 at 03:55:22PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> > So now code points are good? -- Andrei
>
> It depends on what you're trying to accomplish. That's the point we're
> trying to get at. F
On 5/31/16 10:33 AM, Seb wrote:
Explicitly stating the type of iteration in the 132 places with
auto-decoding in Phobos doesn't sound that terrible.
It is terrible, no two ways about it. We've been very very careful with
changes that caused a handful of breakages in Phobos. It really means
ev
On 5/31/16 3:56 AM, Walter Bright wrote:
On 5/30/2016 9:16 PM, Andrei Alexandrescu wrote:
On 5/30/16 5:51 PM, Walter Bright wrote:
On 5/30/2016 8:34 AM, Marc Schütz wrote:
In an ideal world, we'd also want to change the way `length` and
`opIndex` work,
Why? strings are arrays of code units.
On Tuesday, 31 May 2016 at 13:33:14 UTC, Marc Schütz wrote:
In an ideal world, the programs someone intuitively writes will
do the right thing, and if they can't, they at least refuse to
compile. If we agree that it's up to the user whether to
iterate over a string by code unit or code points o
On 05/31/2016 04:33 PM, Seb wrote:
https://github.com/dlang/phobos/pull/4384
Explicitly stating the type of iteration in the 132 places with
auto-decoding in Phobos doesn't sound that terrible.
After checking some of those 132 places, they are in generic functions
that take ranges. std.algori
On Tuesday, 31 May 2016 at 13:33:14 UTC, Marc Schütz wrote:
On Monday, 30 May 2016 at 21:51:36 UTC, Walter Bright wrote:
[...]
So, strings are _implemented_ as arrays of code units. But
indiscriminately treating them as such in all situations leads
to wrong results (just like arrays of code
On Monday, 30 May 2016 at 21:51:36 UTC, Walter Bright wrote:
On 5/30/2016 8:34 AM, Marc Schütz wrote:
In an ideal world, we'd also want to change the way `length`
and `opIndex` work,
Why? strings are arrays of code units.
So, strings are _implemented_ as arrays of code units. But
indiscrimi
On Tue, 31 May 2016 07:17:03 +0000, default0 wrote:
> Thinking about this a bit more - what algorithms are actually
> correct when implemented on the level of code units?
Calculating the buffer size of a string, validation and
fast versions of general algorithms that can be defined in
terms of
On Tuesday, 31 May 2016 at 07:56:54 UTC, Walter Bright wrote:
On 5/30/2016 9:16 PM, Andrei Alexandrescu wrote:
On 5/30/16 5:51 PM, Walter Bright wrote:
On 5/30/2016 8:34 AM, Marc Schütz wrote:
In an ideal world, we'd also want to change the way `length`
and
`opIndex` work,
Why? strings are
On Monday, 30 May 2016 at 21:39:00 UTC, Walter Bright wrote:
On 5/30/2016 12:52 PM, H. S. Teoh via Digitalmars-d wrote:
If I ever had to write string-heavy code, I'd probably fork
Phobos just
so I can get decent performance. Just sayin'.
When I wrote Warp, the only point of which was speed, I
On 5/30/2016 9:16 PM, Andrei Alexandrescu wrote:
On 5/30/16 5:51 PM, Walter Bright wrote:
On 5/30/2016 8:34 AM, Marc Schütz wrote:
In an ideal world, we'd also want to change the way `length` and
`opIndex` work,
Why? strings are arrays of code units. All the trouble comes from
erratically pre
On Tuesday, 31 May 2016 at 06:45:56 UTC, H. S. Teoh wrote:
On Tue, May 31, 2016 at 12:13:57AM -0400, Andrei Alexandrescu
via Digitalmars-d wrote:
On 5/30/16 6:00 PM, Walter Bright wrote:
> On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> > I don't agree on changing those. Indexing and slicing a
>
On Tue, May 31, 2016 at 12:13:57AM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
> On 5/30/16 6:00 PM, Walter Bright wrote:
> > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> > > I don't agree on changing those. Indexing and slicing a char[] is
> > > really useful and actually not hard to do
On 5/30/16 7:52 PM, Seb wrote:
On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote:
On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote:
On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
D1 -> D2 was a vastly more disruptive change than getting rid of
auto-dec
On 5/30/16 5:51 PM, Walter Bright wrote:
On 5/30/2016 8:34 AM, Marc Schütz wrote:
In an ideal world, we'd also want to change the way `length` and
`opIndex` work,
Why? strings are arrays of code units. All the trouble comes from
erratically pretending otherwise.
That's not an argument. Objec
On 5/30/16 6:00 PM, Walter Bright wrote:
On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
I don't agree on changing those. Indexing and slicing a char[] is
really useful
and actually not hard to do correctly (at least with regard to
handling code
units).
Yup. It isn't hard at all to use arrays of c
On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote:
Perhaps it would be worth trying to silently remove
autodecoding and seeing how much of Phobos breaks, as an
experiment. Has this been tried before?
Did it, the results are a large number of phobos modules fail to
compile becau
On 05/30/2016 04:30 PM, Timon Gehr wrote:
In D, enum does not mean enumeration, const does not mean constant, pure
is not pure, lazy is not lazy, and char does not mean character.
My new favorite quote :)
On Monday, 30 May 2016 at 21:39:14 UTC, Vladimir Panteleev wrote:
On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote:
On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
D1 -> D2 was a vastly more disruptive change than getting rid
of auto-decoding would be.
Don't be so su
A relevant thread in the Rust bug tracker I remember from
three years ago: https://github.com/rust-lang/rust/issues/7043
May it be of inspiration.
--
Marco
> 4: Indonesians* shall be converted to a sane alphabet
*Correction: Koreans
(2-4 Hangul syllables (code points) form each letter)
--
Marco
On Fri, 27 May 2016 15:47:32 +0200, ag0aep6g wrote:
> On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote:
> >>> However the following do require autodecoding:
> >>>
> >>> s.walkLength
> >>> s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
> >>> s.count!(c => c >= 32) // non-control chara
On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
I don't agree on changing those. Indexing and slicing a char[] is really useful
and actually not hard to do correctly (at least with regard to handling code
units).
Yup. It isn't hard at all to use arrays of codeunits correctly.
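Here is a sketch of how code-unit slicing stays correct in practice. `std.string.indexOf` returns a code-unit index. ASCII bytes never occur inside a multi-byte UTF-8 sequence, so slicing at the position of an ASCII delimiter cannot split a code point. The `splitPair` helper is hypothetical.

```d
import std.string : indexOf;

/// Hypothetical helper: split "key=value" at the first '='.
string[2] splitPair(string s)
{
    immutable i = s.indexOf('=');  // code-unit index, or -1 if absent
    if (i < 0)
        return [s, ""];
    // '=' is ASCII, and ASCII bytes never appear inside a multi-byte UTF-8
    // sequence, so both slices begin and end on code point boundaries.
    return [s[0 .. i], s[i + 1 .. $]];
}

unittest
{
    assert(splitPair("na\u00EFve=caf\u00E9") == ["na\u00EFve", "caf\u00E9"]);
    assert(splitPair("plain") == ["plain", ""]);
}
```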
On 5/30/2016 8:34 AM, Marc Schütz wrote:
In an ideal world, we'd also want to change the way `length` and `opIndex` work,
Why? strings are arrays of code units. All the trouble comes from erratically
pretending otherwise.
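The inconsistency Walter describes shows up directly in the types. In this brief D sketch, array operations on a `string` see `immutable(char)` code units. The range primitives that Phobos algorithms use (`front`, `ElementType`) see autodecoded `dchar` code points.

```d
import std.range.primitives : ElementEncodingType, ElementType, front;

// Array view: indexing yields UTF-8 code units.
static assert(is(typeof(string.init[0]) == immutable(char)));
static assert(is(ElementEncodingType!string == immutable(char)));

// Range view: autodecoding makes the element type a code point.
static assert(is(ElementType!string == dchar));

unittest
{
    string s = "\u00E9t\u00E9";   // "été": three code points, five code units
    assert(s.length == 5);        // length counts code units
    assert(s[0] == 0xC3);         // first byte of the UTF-8 encoding of U+00E9
    assert(s.front == '\u00E9');  // front decodes the whole first code point
}
```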
On Monday, 30 May 2016 at 16:34:49 UTC, Jack Stouffer wrote:
On Monday, 30 May 2016 at 16:25:20 UTC, Nick Sabalausky wrote:
D1 -> D2 was a vastly more disruptive change than getting rid
of auto-decoding would be.
Don't be so sure. All string handling code would become broken,
even if it appea
On 5/30/2016 12:52 PM, H. S. Teoh via Digitalmars-d wrote:
If I ever had to write string-heavy code, I'd probably fork Phobos just
so I can get decent performance. Just sayin'.
When I wrote Warp, the only point of which was speed, I couldn't use phobos
because of autodecoding. I have since rec