Re: The Case Against Autodecode

2016-06-05 Thread Walter Bright via Digitalmars-d
On 6/5/2016 1:05 AM, deadalnix wrote: TIL: books are read by computers. I should introduce you to a fabulous technology called OCR. :-)

Re: The Case Against Autodecode

2016-06-05 Thread Walter Bright via Digitalmars-d
On 6/5/2016 1:07 AM, deadalnix wrote: On Saturday, 4 June 2016 at 03:03:16 UTC, Walter Bright wrote: Oh rubbish. Let go of the idea that choosing bad fonts should drive Unicode codepoint decisions. Interestingly enough, I've mentioned earlier here that only people from the US would believe

Re: The Case Against Autodecode

2016-06-05 Thread docandrew via Digitalmars-d
On Saturday, 4 June 2016 at 08:12:47 UTC, Walter Bright wrote: On 6/3/2016 11:17 PM, H. S. Teoh via Digitalmars-d wrote: On Fri, Jun 03, 2016 at 08:03:16PM -0700, Walter Bright via Digitalmars-d wrote: It works for books. Because books don't allow their readers to change the font. Unicode

Re: The Case Against Autodecode

2016-06-05 Thread Jonathan M Davis via Digitalmars-d
On Friday, June 03, 2016 15:38:38 Walter Bright via Digitalmars-d wrote: > On 6/3/2016 2:10 PM, Jonathan M Davis via Digitalmars-d wrote: > > Actually, I would argue that the moment that Unicode is concerned with > > what > > the character actually looks like rather than what character it

Re: The Case Against Autodecode

2016-06-05 Thread deadalnix via Digitalmars-d
On Saturday, 4 June 2016 at 03:03:16 UTC, Walter Bright wrote: Oh rubbish. Let go of the idea that choosing bad fonts should drive Unicode codepoint decisions. Interestingly enough, I've mentioned earlier here that only people from the US would believe that documents with mixed languages

Re: The Case Against Autodecode

2016-06-05 Thread deadalnix via Digitalmars-d
On Friday, 3 June 2016 at 18:43:07 UTC, Walter Bright wrote: On 6/3/2016 9:28 AM, H. S. Teoh via Digitalmars-d wrote: Eventually you have no choice but to encode by logical meaning rather than by appearance, since there are many lookalikes between different languages that actually mean

Re: The Case Against Autodecode

2016-06-05 Thread deadalnix via Digitalmars-d
On Friday, 3 June 2016 at 12:04:39 UTC, Chris wrote: I do exactly this. Validate and normalize. And once you've done this, auto decoding is useless because the same character has the same representation anyway.
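
The "validate and normalize" step mentioned above can be sketched roughly as follows. This is only an illustration in Python using the standard `unicodedata` module, not D or Phobos code, and the variable names are made up for the example. It shows that the precomposed and decomposed spellings of 'ö' differ as raw code points but become byte-for-byte identical once both are normalized to NFC.

```python
import unicodedata

precomposed = "\u00f6"    # 'ö' as a single code point
decomposed = "o\u0308"    # 'o' followed by COMBINING DIAERESIS

# Before normalization the two spellings are different sequences.
equal_raw = precomposed == decomposed

# After NFC normalization both collapse to the same code point,
# so even a plain comparison of the UTF-8 bytes succeeds.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
equal_bytes = nfc_a.encode("utf-8") == nfc_b.encode("utf-8")
```

After this step a byte-level comparison gives the expected answer, which seems to be the point being made about decoding no longer adding anything.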

Re: The Case Against Autodecode

2016-06-04 Thread Alix Pexton via Digitalmars-d
On 03/06/2016 20:12, Dmitry Olshansky wrote: On 02-Jun-2016 23:27, Walter Bright wrote: I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness. Yeah, Unicode was not meant to be easy it seems. Or this is whatever

Re: The Case Against Autodecode

2016-06-04 Thread Walter Bright via Digitalmars-d
On 6/3/2016 11:17 PM, H. S. Teoh via Digitalmars-d wrote: On Fri, Jun 03, 2016 at 08:03:16PM -0700, Walter Bright via Digitalmars-d wrote: It works for books. Because books don't allow their readers to change the font. Unicode is not the font. This madness already exists *without*

Re: The Case Against Autodecode

2016-06-04 Thread Patrick Schluter via Digitalmars-d
One also has to take into consideration that Unicode is the way it is because it was not invented in a vacuum. It had to take what already existed into account and find compromises allowing its adoption. Even if they had invented the perfect encoding, NO ONE WOULD HAVE USED IT, as it would

Re: The Case Against Autodecode

2016-06-04 Thread Patrick Schluter via Digitalmars-d
On Friday, 3 June 2016 at 20:53:32 UTC, H. S. Teoh wrote: Even the Greek sigma has two forms depending on whether it's at the end of a word or not -- so should it be two code points or one? If you say two, then you'd have a problem with how to search for sigma in Greek text, and you'd have

Re: The Case Against Autodecode

2016-06-04 Thread H. S. Teoh via Digitalmars-d
On Fri, Jun 03, 2016 at 08:03:16PM -0700, Walter Bright via Digitalmars-d wrote: > On 6/3/2016 6:08 PM, H. S. Teoh via Digitalmars-d wrote: > > It's not a hard concept, except that these different letters have > > lookalike forms with completely unrelated letters. Again: > > > > - Lowercase Latin

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 6:08 PM, H. S. Teoh via Digitalmars-d wrote: It's not a hard concept, except that these different letters have lookalike forms with completely unrelated letters. Again: - Lowercase Latin m looks visually the same as lowercase Cyrillic Т in cursive form. In some font renderings the

Re: The Case Against Autodecode

2016-06-03 Thread ketmar via Digitalmars-d
On Saturday, 4 June 2016 at 02:46:31 UTC, Walter Bright wrote: On 6/3/2016 5:42 PM, ketmar wrote: sometimes used Cyrillic font to represent English. Nobody here suggested using the wrong font, it's completely irrelevant. you suggested that unicode designers should make similar-looking

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 5:42 PM, ketmar wrote: sometimes used Cyrillic font to represent English. Nobody here suggested using the wrong font, it's completely irrelevant.

Re: The Case Against Autodecode

2016-06-03 Thread H. S. Teoh via Digitalmars-d
On Fri, Jun 03, 2016 at 03:35:18PM -0700, Walter Bright via Digitalmars-d wrote: > On 6/3/2016 1:53 PM, H. S. Teoh via Digitalmars-d wrote: [...] > > 'Cos by that argument, serif and sans serif letters should have > > different encodings, because in languages like Hebrew, a tiny little > > serif

Re: The Case Against Autodecode

2016-06-03 Thread ketmar via Digitalmars-d
On Friday, 3 June 2016 at 18:43:07 UTC, Walter Bright wrote: It's almost as if printed documents and books have never existed! some old xUSSR books which had some English text sometimes used Cyrillic font to represent English. it was awful, and barely readable. this was done to ease the work

Re: The Case Against Autodecode

2016-06-03 Thread Adam D. Ruppe via Digitalmars-d
On Friday, 3 June 2016 at 22:38:38 UTC, Walter Bright wrote: If a font choice changes the meaning then it is not a font. Nah, then it is an Awesome Font that is totally Web Scale! i wish i was making that up http://fontawesome.io/ i hate that thing But, it is kinda legal: gotta love the

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 2:10 PM, Jonathan M Davis via Digitalmars-d wrote: Actually, I would argue that the moment that Unicode is concerned with what the character actually looks like rather than what character it logically is that it's gone outside of its charter. The way that characters actually look is

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 1:53 PM, H. S. Teoh via Digitalmars-d wrote: But if we were to encode appearance instead of logical meaning, that would mean the *same* lowercase Cyrillic ь would have multiple, different encodings depending on which font was in use. I don't see that consequence at all. That

Re: The Case Against Autodecode

2016-06-03 Thread Jonathan M Davis via Digitalmars-d
On Friday, June 03, 2016 03:08:43 Walter Bright via Digitalmars-d wrote: > On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: > > At the time > > Unicode also had to grapple with tricky issues like what to do with > > lookalike characters that served different purposes or had different > >

Re: The Case Against Autodecode

2016-06-03 Thread H. S. Teoh via Digitalmars-d
On Fri, Jun 03, 2016 at 11:43:07AM -0700, Walter Bright via Digitalmars-d wrote: > On 6/3/2016 9:28 AM, H. S. Teoh via Digitalmars-d wrote: > > Eventually you have no choice but to encode by logical meaning > > rather than by appearance, since there are many lookalikes between > > different

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 11:54 AM, Timon Gehr wrote: On 03.06.2016 20:41, Walter Bright wrote: How did people ever get by with printed books and documents? They can disambiguate the letters based on context well enough. Characters do not have semantic meaning. Their meaning is always inferred from the

Re: The Case Against Autodecode

2016-06-03 Thread Dmitry Olshansky via Digitalmars-d
On 02-Jun-2016 23:27, Walter Bright wrote: On 6/2/2016 12:34 PM, deadalnix wrote: On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote: Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16). * s.all!(c => c == 'ö') works only

Re: The Case Against Autodecode

2016-06-03 Thread Adam D. Ruppe via Digitalmars-d
On Friday, 3 June 2016 at 18:41:36 UTC, Walter Bright wrote: How did people ever get by with printed books and documents? Printed books pick one font and one layout, then are read by people. It doesn't have to be represented in some format where end users can change the font and size etc.

Re: The Case Against Autodecode

2016-06-03 Thread Timon Gehr via Digitalmars-d
On 03.06.2016 20:41, Walter Bright wrote: On 6/3/2016 3:14 AM, Vladimir Panteleev wrote: That's not right either. Cyrillic letters can look slightly different from their latin lookalikes in some circumstances. I'm sure there are extremely good reasons for not using the latin lookalikes in the

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 9:28 AM, H. S. Teoh via Digitalmars-d wrote: Eventually you have no choice but to encode by logical meaning rather than by appearance, since there are many lookalikes between different languages that actually mean something completely different, and often behave completely

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 3:14 AM, Vladimir Panteleev wrote: That's not right either. Cyrillic letters can look slightly different from their latin lookalikes in some circumstances. I'm sure there are extremely good reasons for not using the latin lookalikes in the Cyrillic alphabets, because most (all?)

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 3:10 AM, Vladimir Panteleev wrote: I don't think it would work (or at least, the analogy doesn't hold). It would mean that you can't add new precomposited characters, because that means that previously valid sequences are now invalid. So don't add new precomposited characters when

Re: The Case Against Autodecode

2016-06-03 Thread H. S. Teoh via Digitalmars-d
On Fri, Jun 03, 2016 at 10:14:15AM +0000, Vladimir Panteleev via Digitalmars-d wrote: > On Friday, 3 June 2016 at 10:08:43 UTC, Walter Bright wrote: > > On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: > > > At the time Unicode also had to grapple with tricky issues like > > > what to do

Re: The Case Against Autodecode

2016-06-03 Thread Nick Sabalausky via Digitalmars-d
On 06/02/2016 05:37 PM, Andrei Alexandrescu wrote: On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed.

Re: The Case Against Autodecode

2016-06-03 Thread Chris via Digitalmars-d
On Friday, 3 June 2016 at 11:46:50 UTC, Jonathan M Davis wrote: On Friday, June 03, 2016 10:10:18 Vladimir Panteleev via Digitalmars-d wrote: On Friday, 3 June 2016 at 10:05:11 UTC, Walter Bright wrote: > On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: >> However, this >> meant that

Re: The Case Against Autodecode

2016-06-03 Thread Jonathan M Davis via Digitalmars-d
On Friday, June 03, 2016 10:10:18 Vladimir Panteleev via Digitalmars-d wrote: > On Friday, 3 June 2016 at 10:05:11 UTC, Walter Bright wrote: > > On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: > >> However, this > >> meant that some precomposed characters were "redundant": they > >>

Re: The Case Against Autodecode

2016-06-03 Thread Vladimir Panteleev via Digitalmars-d
On Friday, 3 June 2016 at 10:08:43 UTC, Walter Bright wrote: On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: At the time Unicode also had to grapple with tricky issues like what to do with lookalike characters that served different purposes or had different meanings, e.g., the mu

Re: The Case Against Autodecode

2016-06-03 Thread Vladimir Panteleev via Digitalmars-d
On Friday, 3 June 2016 at 10:05:11 UTC, Walter Bright wrote: On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: However, this meant that some precomposed characters were "redundant": they represented character + diacritic combinations that could equally well be expressed separately.

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: At the time Unicode also had to grapple with tricky issues like what to do with lookalike characters that served different purposes or had different meanings, e.g., the mu sign in the math block vs. the real letter mu in the Greek block,

Re: The Case Against Autodecode

2016-06-03 Thread Walter Bright via Digitalmars-d
On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote: However, this meant that some precomposed characters were "redundant": they represented character + diacritic combinations that could equally well be expressed separately. Normalization was the inevitable consequence. It is not

Re: The Case Against Autodecode

2016-06-03 Thread Jonathan M Davis via Digitalmars-d
On Thursday, June 02, 2016 15:05:44 Andrei Alexandrescu via Digitalmars-d wrote: > The intent of autodecoding was to make std.algorithm work meaningfully > with strings. As it's easy to see I just went through > std.algorithm.searching alphabetically and found issues literally with > every

Re: The Case Against Autodecode

2016-06-03 Thread H. S. Teoh via Digitalmars-d
On Thu, Jun 02, 2016 at 05:19:48PM -0700, Walter Bright via Digitalmars-d wrote: > On 6/2/2016 3:27 PM, John Colvin wrote: > > > I wonder what rationale there is for Unicode to have two different > > > sequences of codepoints be treated as the same. It's madness. > > > > There are languages that

Re: The Case Against Autodecode

2016-06-03 Thread Marco Leise via Digitalmars-d
On Thu, 2 Jun 2016 18:54:21 -0400, Andrei Alexandrescu wrote: > On 06/02/2016 06:10 PM, Marco Leise wrote: > > On Thu, 2 Jun 2016 15:05:44 -0400, > > Andrei Alexandrescu wrote: > > > >> On 06/02/2016 01:54 PM, Marc Schütz wrote:

Re: The Case Against Autodecode

2016-06-03 Thread H. S. Teoh via Digitalmars-d
On Thu, Jun 02, 2016 at 04:29:48PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 06/02/2016 04:22 PM, cym13 wrote: > > > > A:“We should decode to code points” > > B:“No, decoding to code points is a stupid idea.” > > A:“No it's not!” > > B:“Can you show a concrete example where it

Re: The Case Against Autodecode

2016-06-03 Thread H. S. Teoh via Digitalmars-d
On Thu, Jun 02, 2016 at 04:28:45PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 06/02/2016 04:17 PM, Timon Gehr wrote: > > I.e. you are saying that 'works' means 'operates on code points'. > > Affirmative. -- Andrei Again, a ridiculous position. I can use exactly the same line of

Re: The Case Against Autodecode

2016-06-03 Thread H. S. Teoh via Digitalmars-d
On Thu, Jun 02, 2016 at 04:38:28PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 06/02/2016 04:36 PM, tsbockman wrote: > > Your examples will pass or fail depending on how (and whether) the > > 'ö' grapheme is normalized. > > And that's fine. Want graphemes, .byGrapheme wags its tail

Re: The Case Against Autodecode

2016-06-03 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 21:00:17 UTC, tsbockman wrote: However, this document is very old - from Unicode 3.0 and the year 2000: While there are no surrogate characters in Unicode 3.0 (outside of private use characters), future versions of Unicode will contain them... Perhaps level 1

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 3:27 PM, John Colvin wrote: I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness. There are languages that make heavy use of diacritics, often several on a single "character". Hebrew is a good example.

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 2:25 PM, deadalnix wrote: On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote: I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness. To be able to convert back and forth from/to unicode in a

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote: How do you suggest that we handle the normalization issue? Started a new thread for that one.

Re: The Case Against Autodecode

2016-06-02 Thread Jonathan M Davis via Digitalmars-d
On Thursday, June 02, 2016 15:48:03 Walter Bright via Digitalmars-d wrote: > On 6/2/2016 3:23 PM, Andrei Alexandrescu wrote: > > On 06/02/2016 05:58 PM, Walter Bright wrote: > >> > * s.balancedParens('〈', '〉') works only with autodecoding. > >> > * s.canFind('ö') works only with autodecoding. It

Re: The Case Against Autodecode

2016-06-02 Thread Jonathan M Davis via Digitalmars-d
On Thursday, June 02, 2016 22:27:16 John Colvin via Digitalmars-d wrote: > On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote: > > I wonder what rationale there is for Unicode to have two > > different sequences of codepoints be treated as the same. It's > > madness. > > There are

Re: The Case Against Autodecode

2016-06-02 Thread Jonathan M Davis via Digitalmars-d
On Thursday, June 02, 2016 18:23:19 Andrei Alexandrescu via Digitalmars-d wrote: > On 06/02/2016 05:58 PM, Walter Bright wrote: > > On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: > >> The lambda returns bool. -- Andrei > > > > Yes, I was wrong about that. But the point still stands with: > > >

Re: The Case Against Autodecode

2016-06-02 Thread Vladimir Panteleev via Digitalmars-d
On Thursday, 2 June 2016 at 21:56:10 UTC, Walter Bright wrote: Yes, you have a good point. But we do allow things like: byte b; if (b == 1) ... Why allowing char/wchar/dchar comparisons is wrong: void main() { string s = "Привет"; foreach (c; s) assert(c != 'Ñ'); }
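
The false match in the example above comes from UTF-8 itself rather than from anything D-specific, so a minimal sketch in Python (not D; the helper name `code_units` is hypothetical) can show where it comes from. The byte 0xD1 that starts the UTF-8 encoding of several Cyrillic letters has the same numeric value as the code point of 'Ñ' (U+00D1).

```python
def code_units(text: str) -> list[int]:
    """Return the UTF-8 code units (bytes) of text as plain integers."""
    return list(text.encode("utf-8"))

greeting = "Привет"
units = code_units(greeting)

# 'Ñ' is code point U+00D1. Compared as a bare number it collides with
# the lead byte 0xD1 of 'р' (D1 80) and 'т' (D1 82).
false_match = ord("Ñ") in units

# At the code point level there is no such collision.
real_match = "Ñ" in greeting
```

Comparing a single code unit against a code point literal is therefore only safe in the ASCII range, where the two numberings coincide.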

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 03.06.2016 00:23, Andrei Alexandrescu wrote: On 06/02/2016 05:58 PM, Walter Bright wrote: On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: The lambda returns bool. -- Andrei Yes, I was wrong about that. But the point still stands with: > * s.balancedParens('〈', '〉') works only with

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 03.06.2016 00:26, Walter Bright wrote: On 6/2/2016 3:11 PM, Timon Gehr wrote: Well, this is a somewhat different case, because 1 is just not representable as a byte. Every value that fits in a byte fits in an int though. It's different for code units. They are incompatible both ways.

Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 22:20:49 UTC, Walter Bright wrote: On 6/2/2016 2:05 PM, tsbockman wrote: Presumably if someone marks their own PR as "do not merge", it means they're planning to either close it themselves after it has served its purpose, or they plan to fix/finish it and then

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 3:10 PM, Marco Leise wrote: we haven't looked into borrowing/scoped enough That's my fault. As for scoped, the idea is to make scope work analogously to DIP25's 'return ref'. I don't believe we need borrowing, we've worked out another solution that will work for ref counting.

Re: The Case Against Autodecode

2016-06-02 Thread John Colvin via Digitalmars-d
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote: On 6/2/2016 12:34 PM, deadalnix wrote: On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote: Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16). * s.all!(c =>

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 3:11 PM, Timon Gehr wrote: Well, this is a somewhat different case, because 1 is just not representable as a byte. Every value that fits in a byte fits in an int though. It's different for code units. They are incompatible both ways. Not exactly. (c == 'ö') is always false for

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 06/02/2016 05:58 PM, Walter Bright wrote: On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: The lambda returns bool. -- Andrei Yes, I was wrong about that. But the point still stands with: > * s.balancedParens('〈', '〉') works only with autodecoding. > * s.canFind('ö') works only with

Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 22:03:01 UTC, default0 wrote: *sigh* reading comprehension. ... Please do not take what I say out of context, thank you. Earlier you said: The level 2 support description noted that it should be opt-in because it's slow. My main point is simply that you

Re: The Case Against Autodecode

2016-06-02 Thread Marco Leise via Digitalmars-d
On Thu, 2 Jun 2016 15:05:44 -0400, Andrei Alexandrescu wrote: > On 06/02/2016 01:54 PM, Marc Schütz wrote: > > Which practical tasks are made possible (and work _correctly_) if you > > decode to code points, that don't already work with code units? > > Pretty

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 02.06.2016 23:56, Walter Bright wrote: On 6/2/2016 1:12 PM, Timon Gehr wrote: ... It is not meaningful to compare utf-8 and utf-16 code units directly. Yes, you have a good point. But we do allow things like: byte b; if (b == 1) ... Well, this is a somewhat different case,

Re: The Case Against Autodecode

2016-06-02 Thread default0 via Digitalmars-d
On Thursday, 2 June 2016 at 21:51:51 UTC, tsbockman wrote: On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote: On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote: 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able. Nowhere does it say

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 02.06.2016 23:46, Andrei Alexandrescu wrote: On 6/2/16 5:43 PM, Timon Gehr wrote: .̂ ̪.̂ (Copy-paste it somewhere else, I think it might not be rendered correctly on the forum.) The point is that if I do: ".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")]) no

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: The lambda returns bool. -- Andrei Yes, I was wrong about that. But the point still stands with: > * s.balancedParens('〈', '〉') works only with autodecoding. > * s.canFind('ö') works only with autodecoding. It returns always false without. Can

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 1:12 PM, Timon Gehr wrote: On 02.06.2016 22:07, Walter Bright wrote: On 6/2/2016 12:05 PM, Andrei Alexandrescu wrote: * s.all!(c => c == 'ö') works only with autodecoding. It returns always false without. The o is inferred as a wchar. The lambda then is inferred to return a wchar.

Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote: On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote: 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able. Nowhere does it say which of level 1 and 2 should be the default. 2) It says that

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:43 PM, Timon Gehr wrote: .̂ ̪.̂ (Copy-paste it somewhere else, I think it might not be rendered correctly on the forum.) The point is that if I do: ".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")]) no match is returned. If I use your method with dchars,

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:38 PM, cym13 wrote: Allow me to try another angle: - There are different levels of unicode support and you don't want to support them all transparently. That's understandable. Cool. - The level you choose to support is the code point level. There are many good arguments about

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 02.06.2016 23:23, Andrei Alexandrescu wrote: On 6/2/16 5:19 PM, Timon Gehr wrote: On 02.06.2016 23:16, Timon Gehr wrote: On 02.06.2016 23:06, Andrei Alexandrescu wrote: As the examples show, the examples would be entirely meaningless at code unit level. So far, I needed to count the

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:38 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:37 PM, Andrei Alexandrescu wrote: On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. --

Re: The Case Against Autodecode

2016-06-02 Thread cym13 via Digitalmars-d
On Thursday, 2 June 2016 at 20:29:48 UTC, Andrei Alexandrescu wrote: On 06/02/2016 04:22 PM, cym13 wrote: A:“We should decode to code points” B:“No, decoding to code points is a stupid idea.” A:“No it's not!” B:“Can you show a concrete example where it does something useful?” A:“Sure, look

Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d
On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works

Re: The Case Against Autodecode

2016-06-02 Thread default0 via Digitalmars-d
On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote: On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote: The level 2 support description noted that it should be opt-in because it's slow. 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able.

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei Nobody says it doesn't. Everybody says

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:35 PM, ag0aep6g wrote: On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote: On 6/2/16 5:24 PM, ag0aep6g wrote: On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit

Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei Nobody says it doesn't. Everybody says the design is crap.

Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d
On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote: On 6/2/16 5:24 PM, ag0aep6g wrote: On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. They're simply not

Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote: The level 2 support description noted that it should be opt-in because it's slow. 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able. Nowhere does it say which of level 1 and 2 should be the

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:27 PM, Andrei Alexandrescu wrote: On 6/2/16 5:24 PM, ag0aep6g wrote: Just like there is no single code point for 'a⃗' so you can't search for it in a range of code points. Of course you can. Correx, indeed you can't. -- Andrei

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 02.06.2016 22:51, Andrei Alexandrescu wrote: On 06/02/2016 04:50 PM, Timon Gehr wrote: On 02.06.2016 22:28, Andrei Alexandrescu wrote: On 06/02/2016 04:12 PM, Timon Gehr wrote: It is not meaningful to compare utf-8 and utf-16 code units directly. But it is meaningful to compare Unicode

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:23 PM, Timon Gehr wrote: On 02.06.2016 22:51, Andrei Alexandrescu wrote: On 06/02/2016 04:50 PM, Timon Gehr wrote: On 02.06.2016 22:28, Andrei Alexandrescu wrote: On 06/02/2016 04:12 PM, Timon Gehr wrote: It is not meaningful to compare utf-8 and utf-16 code units directly. But

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 02.06.2016 23:20, deadalnix wrote: The sample code won't count all instances of the grapheme 'ö', as some of its encodings won't be counted, which definitely counts as "doesn't work". It also has false positives (you can combine 'ö' with some combining character in order to get some strange

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:24 PM, ag0aep6g wrote: On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. They're simply not possible. Won't compile. They do compile. There is no

Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d
On 06/02/2016 11:24 PM, ag0aep6g wrote: They're simply not possible. Won't compile. There is no single UTF-8 code unit for 'ö', so you can't (easily) search for it in a range of code units. Just like there is no single code point for 'a⃗' so you can't search for it in a range of code points.
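
The parallel drawn above between the two levels can be checked with a short sketch, again in Python rather than D and with made-up variable names. It assumes the arrow in 'a⃗' is U+20D7 (COMBINING RIGHT ARROW ABOVE), which is how the character is usually written.

```python
import unicodedata

o_umlaut = "\u00f6"     # 'ö', precomposed
a_vector = "a\u20d7"    # 'a' + COMBINING RIGHT ARROW ABOVE (assumed spelling)

# 'ö' is one code point but needs two UTF-8 code units (C3 B6),
# so no single code unit can stand for it.
units_for_o = len(o_umlaut.encode("utf-8"))

# 'a⃗' needs two code points, and NFC has no precomposed form to fall
# back on, so no single code point can stand for it either.
points_for_a = len(unicodedata.normalize("NFC", a_vector))
```

In both cases a search for one element of the sequence type cannot find the character, which is the symmetry the message points out.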

Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote: On 6/2/2016 12:34 PM, deadalnix wrote: On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote: Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16). * s.all!(c =>

Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d
On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. They're simply not possible. Won't compile. There is no single UTF-8 code unit for 'ö', so you can't (easily)

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:19 PM, Timon Gehr wrote: On 02.06.2016 23:16, Timon Gehr wrote: On 02.06.2016 23:06, Andrei Alexandrescu wrote: As the examples show, the examples would be entirely meaningless at code unit level. So far, I needed to count the number of characters 'ö' inside some string exactly

Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d
On Thursday, 2 June 2016 at 20:13:52 UTC, Andrei Alexandrescu wrote: On 06/02/2016 03:34 PM, deadalnix wrote: On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote: Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16). *

Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d
On 02.06.2016 23:06, Andrei Alexandrescu wrote: As the examples show, the examples would be entirely meaningless at code unit level. So far, I needed to count the number of characters 'ö' inside some string exactly zero times, but I wanted to chain or join strings relatively often.

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:05 PM, tsbockman wrote: On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote: What is supposed to be done with "do not merge" PRs other than close them? Occasionally people need to try something on the auto tester (not sure if that's relevant to that particular PR,

Re: The Case Against Autodecode

2016-06-02 Thread default0 via Digitalmars-d
On Thursday, 2 June 2016 at 20:52:29 UTC, ag0aep6g wrote: On 06/02/2016 10:36 PM, Andrei Alexandrescu wrote: By whom? The "support level 1" folks yonder at the Unicode standard? :o) -- Andrei Do they say that level 1 should be the default, and do they give a rationale for that? Would you

Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote: What is supposed to be done with "do not merge" PRs other than close them? Occasionally people need to try something on the auto tester (not sure if that's relevant to that particular PR, though). Presumably if someone marks

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 6/2/16 5:01 PM, ag0aep6g wrote: On 06/02/2016 10:50 PM, Andrei Alexandrescu wrote: It does not fall apart for code points. Yes it does. You've been given plenty examples where it falls apart. There weren't any. Your answer to that was that it operates on code points, not graphemes.

Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d
On 06/02/2016 10:50 PM, Andrei Alexandrescu wrote: It does not fall apart for code points. Yes it does. You've been given plenty examples where it falls apart. Your answer to that was that it operates on code points, not graphemes. Well, duh. Comparing UTF-8 code units against each other

Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 20:49:52 UTC, Andrei Alexandrescu wrote: On 06/02/2016 04:47 PM, tsbockman wrote: That doesn't sound like much of an endorsement for defaulting to only level 1 support to me - "it does not handle more complex languages or extensions to the Unicode Standard very

Re: The Case Against Autodecode

2016-06-02 Thread Jack Stouffer via Digitalmars-d
On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote: What is supposed to be done with "do not merge" PRs other than close them? Experimentally iterate until something workable comes about. This way it's done publicly and people can collaborate.

Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d
On 6/2/2016 1:46 PM, Adam D. Ruppe wrote: The compiler can help you with that. That's the point of the do not merge PR: it got an actionable list out of the compiler and proved the way forward was viable. What is supposed to be done with "do not merge" PRs other than close them?

Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d
On 06/02/2016 04:52 PM, ag0aep6g wrote: On 06/02/2016 10:36 PM, Andrei Alexandrescu wrote: By whom? The "support level 1" folks yonder at the Unicode standard? :o) -- Andrei Do they say that level 1 should be the default, and do they give a rationale for that? Would you kindly link or quote
