Re: [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals
On Friday, 14 September 2012 at 15:00:29 UTC, Don Clugston wrote:
> On 14/09/12 14:50, monarch_dodra wrote:
>> On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:
>>> --- Comment #0 from Don 2012-09-14 04:28:17 PDT ---
>>>
>>> Array literals of char type have completely different semantics from string literals. In module scope:
>>>
>>>     char[] x = ['a']; // OK -- array literals can have an implicit .dup
>>>     char[] y = "b";   // illegal
>>>
>>> A second difference is that string literals have a trailing \0. It's important for compatibility with C, but is barely mentioned in the spec. The spec does not state if the trailing \0 is still present after operations like concatenation.
>>
>> I think this is the normal behavior actually. When you write "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any data. You are just letting x point to a stack allocated array of chars.
>
> I don't think you've looked at the compiler source code... The dup is in e2ir.c:4820.
>
>> So the assignment is legal (but kind of unsafe actually, if you ever leak x).
>
> Yes it's legal. In my view it is a design mistake in the language. The issue now is how to minimize the damage from it.

Thank you for taking the time to educate me. I still have a bit of trouble with static vs dynamic array initializations: things don't work quite as in C++, which is confusing me. I'll need to study a bit harder how array initializations work. Good news is I'm learning.

I think ALL my comments were wrong.

In that case, you are right, since:

    char[] x = "a".dup;

is legal.

>> Good point. For anybody reading though, the actual code example should be
>>
>>     enum char[] x = foo(true);  // ok
>>     enum char[] y = foo(false); // rejected!
>
> No it should not. The code example was correct. These are static variables.

I hadn't thought of static variables: I placed your code in a main, and both produced a compilation error. The enums reproduced the issue for me however.

>> I think this would work with my "m" suggestion
>
> Not necessary. This is only a question about what happens with the compiler internals.

Yes.
Re: [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals
On 14/09/12 14:50, monarch_dodra wrote:
> On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:
>> --- Comment #0 from Don 2012-09-14 04:28:17 PDT ---
>>
>> Array literals of char type have completely different semantics from string literals. In module scope:
>>
>>     char[] x = ['a']; // OK -- array literals can have an implicit .dup
>>     char[] y = "b";   // illegal
>>
>> A second difference is that string literals have a trailing \0. It's important for compatibility with C, but is barely mentioned in the spec. The spec does not state if the trailing \0 is still present after operations like concatenation.
>
> I think this is the normal behavior actually. When you write "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any data. You are just letting x point to a stack allocated array of chars.

I don't think you've looked at the compiler source code... The dup is in e2ir.c:4820.

> So the assignment is legal (but kind of unsafe actually, if you ever leak x).

Yes it's legal. In my view it is a design mistake in the language. The issue now is how to minimize the damage from it.

> On the other hand, you can't bind y to an array of immutable chars, as that would subvert the type system. This, on the other hand, is legal.
>
>     char[] y = "b".dup;
>
> I do not know how to initialize a char[] on the stack though (apart from writing ['h', 'e', 'l', ... ]). If utf8 also gets involved, then I don't know of any workaround.
>
> I think a good solution would be to request the "m" prefix for literals, which would initialize them as "mutable":
>
>     x = m"some mutable string";
>
>> A second difference is that string literals have a trailing \0. It's important for compatibility with C, but is barely mentioned in the spec. The spec does not state if the trailing \0 is still present after operations like concatenation.
>>
>> CTFE can use either, but it has to choose one. This leads to odd effects:
>>
>>     string foo(bool b) {
>>         string c = ['a'];
>>         string d = "a";
>>         if (b)
>>             return c ~ c;
>>         else
>>             return c ~ d;
>>     }
>>
>>     char[] x = foo(true);  // ok
>>     char[] y = foo(false); // rejected!
>>
>> This is really bizarre because at run time, there is no difference between foo(true) and foo(false). They both return a slice of something allocated on the heap.
>>
>> I think x = foo(true) should be rejected as well, it has an implicit cast from immutable to mutable.
>
> Good point. For anybody reading though, the actual code example should be
>
>     enum char[] x = foo(true);  // ok
>     enum char[] y = foo(false); // rejected!

No it should not. The code example was correct. These are static variables.

>> I think the best way to clean up this mess would be to convert char[] array literals into string literals whenever possible. This would mean that string literals may occasionally be of *mutable* type! This would mean that whenever they are assigned to a mutable variable, an implicit .dup gets added (just as happens now with array literals). The trailing zero would not be duped.
>>
>> i.e.:
>> A string literal of mutable type should behave the way a char[] array literal behaves now.
>> A char[] array literal of immutable type should behave the way a string literal does now.
>
> I think this would work with my "m" suggestion

Not necessary. This is only a question about what happens with the compiler internals.
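
Editor's illustration of the run-time claim above, as a minimal sketch: foo is copied from the bug report, while main, the static assert and the explicit .dup calls are additions made here for illustration. It deliberately does not reproduce the module-scope `char[] x = foo(true);` initialization, since whether that is accepted depends on the compiler version being discussed.

```d
// foo is taken from the thread; everything else is illustrative.
string foo(bool b)
{
    string c = ['a']; // array literal of char, bound to an immutable string
    string d = "a";   // string literal
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

// Both branches can be evaluated through CTFE and compare equal.
static assert(foo(true) == foo(false));

void main()
{
    // At run time both calls produce the same two-character string.
    assert(foo(true) == "aa");
    assert(foo(false) == "aa");

    // An explicit .dup yields a mutable copy whichever branch ran,
    // without relying on any implicit immutable-to-mutable conversion.
    char[] x = foo(true).dup;
    char[] y = foo(false).dup;
    x[0] = 'b';
    assert(x == "ba" && y == "aa");
}
```

The explicit .dup sidesteps the question of which internal representation CTFE picked, which is the inconsistency the report is about.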
Re: [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals
On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:
> --- Comment #0 from Don 2012-09-14 04:28:17 PDT ---
>
> Array literals of char type have completely different semantics from string literals. In module scope:
>
>     char[] x = ['a']; // OK -- array literals can have an implicit .dup
>     char[] y = "b";   // illegal
>
> A second difference is that string literals have a trailing \0. It's important for compatibility with C, but is barely mentioned in the spec. The spec does not state if the trailing \0 is still present after operations like concatenation.

I think this is the normal behavior actually. When you write "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any data. You are just letting x point to a stack allocated array of chars.

So the assignment is legal (but kind of unsafe actually, if you ever leak x).

On the other hand, you can't bind y to an array of immutable chars, as that would subvert the type system. This, on the other hand, is legal.

    char[] y = "b".dup;

I do not know how to initialize a char[] on the stack though (apart from writing ['h', 'e', 'l', ... ]). If utf8 also gets involved, then I don't know of any workaround.

I think a good solution would be to request the "m" prefix for literals, which would initialize them as "mutable":

    x = m"some mutable string";

> A second difference is that string literals have a trailing \0. It's important for compatibility with C, but is barely mentioned in the spec. The spec does not state if the trailing \0 is still present after operations like concatenation.
>
> CTFE can use either, but it has to choose one. This leads to odd effects:
>
>     string foo(bool b) {
>         string c = ['a'];
>         string d = "a";
>         if (b)
>             return c ~ c;
>         else
>             return c ~ d;
>     }
>
>     char[] x = foo(true);  // ok
>     char[] y = foo(false); // rejected!
>
> This is really bizarre because at run time, there is no difference between foo(true) and foo(false). They both return a slice of something allocated on the heap.
>
> I think x = foo(true) should be rejected as well, it has an implicit cast from immutable to mutable.

Good point. For anybody reading though, the actual code example should be

    enum char[] x = foo(true);  // ok
    enum char[] y = foo(false); // rejected!

> I think the best way to clean up this mess would be to convert char[] array literals into string literals whenever possible. This would mean that string literals may occasionally be of *mutable* type! This would mean that whenever they are assigned to a mutable variable, an implicit .dup gets added (just as happens now with array literals). The trailing zero would not be duped.
>
> i.e.:
> A string literal of mutable type should behave the way a char[] array literal behaves now.
> A char[] array literal of immutable type should behave the way a string literal does now.

I think this would work with my "m" suggestion
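
Editor's illustration of the points raised in this message, as a hedged sketch rather than anything posted in the thread: it shows the explicit .dup, one possible way to get a mutable char buffer on the stack (a fixed-size array initialized from a literal, which is an assumption about what the poster was looking for), and the trailing \0 of a string literal. The variable names and the use of std.string.toStringz for the concatenated case are choices made here.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    // A string literal has immutable characters, so binding it to
    // char[] needs an explicit mutable copy.
    char[] y = "b".dup;
    y[0] = 'c';
    assert(y == "c");

    // A fixed-size array initialized from a literal is stored on the
    // stack; slicing it gives a char[] view with no heap allocation.
    char[5] buf = "hello";
    char[] x = buf[];
    x[0] = 'j';
    assert(buf[] == "jello");

    // .length excludes the terminator, but a '\0' follows the last
    // character of a literal, so its .ptr can be passed to C code.
    string s = "hello";
    assert(s.length == 5);
    assert(strlen(s.ptr) == 5);

    // The thread notes the spec is silent about the terminator after
    // concatenation, so it is added explicitly here before calling C.
    string t = s ~ " world";
    assert(strlen(toStringz(t)) == 11);
}
```

The fixed-size array length has to match the literal's length in UTF-8 code units, so this approach is less convenient once multi-byte characters are involved.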