Re: What exactly are the String literrals in D and how they work?
On 8/15/21 2:10 AM, rempas wrote:

> So when I'm doing something like the following: `string name = "John";` Then what's the actual type of the literal `"John"`? In the chapter [Calling C functions](https://dlang.org/spec/interfaceToC.html#calling_c_functions) in the "Interfacing with C" page, the following is said:
>
> > Strings are not 0 terminated in D. See "Data Type Compatibility" for more information about this. However, string literals in D are 0 terminated.
>
> Which is really interesting and makes me suppose that `"John"` is a string literal, right? However, when I'm writing something like the following: `char *name = "John";`, then D will complain with the following message:
>
> > Error: cannot implicitly convert expression `"John"` of type `string` to `char*`
>
> Which is interesting because this works in C. If I use `const char*` instead, it will work. I suppose that this has to do with the fact that `string` is an alias for `immutable(char[])` but still this has to mean that the actual type of a LITERAL string is of type `string` (aka `immutable(char[])`). Another thing I can do is cast the literal to a `char*` but I'm wondering what's going on under the hood in this case. Is casting executed at compile time or at runtime? So am I going to have an extra runtime cost having to first construct a `string` and then ALSO cast it to a string literal? I hope all that makes sense and that someone can answer, lol

Lots of great responses in this thread! I wanted to stress that a string literal is sort of magic. It has extra type information inside the compiler that is not available in the normal type system. Namely that "this is a literal, and so can morph into other things". To give you some examples:

```d
string s = "John";
immutable(char)* cs = s;       // nope
immutable(char)* cs2 = "John"; // OK!
wstring ws = s;                // nope
wstring ws2 = "John";          // OK!
```

What is going on? Because the compiler knows this is a string *literal*, it can modify the type (and possibly the data itself) at will to match what you are assigning it to. In the case of zero-terminated C strings, it allows usage as a pointer instead of a D array. In the case of different width strings (wstring uses 16-bit code units), it can actually transform the underlying data to what you wanted.

Note that even when you do lose that "literal" magic by assigning to a variable, you can still rely on D always putting a terminating zero in the data segment for a string literal. So it's valid to just do:

```d
import core.stdc.stdio : printf;

string s = "John";
printf(s.ptr);
```

As long as you *know* the string came from a literal.

-Steve
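To make the "literal magic" above checkable, here is a minimal sketch. The helper names `cLength` and `wideLength` are made up for illustration and are not from the thread; the `static assert` lines only confirm that the same conversions are rejected once the literal has been stored in a `string` variable.

```d
import core.stdc.string : strlen;

// Hypothetical helper: measures the literal the C way, through a pointer.
size_t cLength()
{
    // A literal converts implicitly to a pointer to immutable chars.
    immutable(char)* p = "John";
    return strlen(p); // relies on the terminating zero stored after the literal
}

// Hypothetical helper: the same literal retyped as UTF-16.
size_t wideLength()
{
    wstring ws = "John"; // the compiler re-encodes the data for us
    return ws.length;    // counted in 16-bit code units
}

void main()
{
    assert(cLength() == 4);
    assert(wideLength() == 4);

    string s = "John";
    // A variable of type string no longer morphs like a literal does:
    static assert(!__traits(compiles, { wstring w = s; }));
    static assert(!__traits(compiles, { immutable(char)* p = s; }));
}
```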
Re: What exactly are the String literrals in D and how they work?
Lots of great information and pointers already. I will try from another angle. :)

On 8/14/21 11:10 PM, rempas wrote:

> So when I'm doing something like the following: `string name = "John";`
> Then what's the actual type of the literal `"John"`?

As you say and as the code shows, there are two constructs in that line. The right-hand side is a string literal. The left-hand side is a 'string'.

>> Strings are not 0 terminated in D. See "Data Type Compatibility" for
>> more information about this. However, string literals in D are 0
>> terminated.

The string literal is embedded into the compiled program as 5 bytes in this case: 'J', 'o', 'h', 'n', '\0'. That's the right-hand side of your code above.

'string' is an array in D and arrays are stored as the following pair:

    size_t length;  // The number of elements
    T * ptr;        // The pointer to the first element

(This is called a "fat pointer".)

So, if we assume that the literal 'John' was placed at memory location 0x1000, then the left-hand side of your code will satisfy the following conditions:

    assert(name.length == 4);  // <-- NOT 5
    assert(cast(size_t) name.ptr == 0x1000);

The important part to note is that even though the string literal is stored as 5 bytes, the string's length is 4.

As others said, when we add a character to a string, there is no '\0' involved. Only the newly added char will be added.

Functions in D do not need the '\0' sentinel to know where the string ends. The end is already known from the 'length' property.

Ali
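The length/pointer pair described above can be inspected directly. The following is only a sketch: `hasTrailingNul` is a hypothetical helper, and reading one element past the end of a slice is only meaningful when the data is known to come straight from a literal.

```d
// Hypothetical helper: true when the byte just past the slice is a NUL.
// Only valid for data that came directly from a string literal; for any
// other string this reads out of bounds.
bool hasTrailingNul(string s)
{
    return s.ptr[s.length] == '\0';
}

void main()
{
    string name = "John";
    assert(name.length == 4);     // the terminator is not counted
    assert(name.ptr[0] == 'J');   // ptr points at the first character
    assert(hasTrailingNul(name)); // the fifth byte stored with the literal

    // A slice is exactly one length plus one pointer.
    static assert(string.sizeof == size_t.sizeof + (void*).sizeof);
}
```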
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 09:06:14 UTC, Mike Parker wrote:

> The D `string` is an alias for `immutable(char)[]`, immutable contents of a mutable array reference (`immutable(char[])` would mean the array reference is also immutable). You don't want to assign that to a `char*`, because then you'd be able to mutate the contents of the string, thereby violating the contract of immutable. (`immutable` means the data to which it's applied, in this case the contents of an array, will not be mutated through any reference anywhere in the program.) [...]

Thanks a lot for the info!
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 08:11:39 UTC, rempas wrote:

> I mean that in C, we can assign a string literal to a `char*` and also to a `const char*` without getting a compilation error, while in D we can only assign it to a `const char*`. I suppose that's because of C doing explicit conversion. I didn't talk about mutating a string literal.

The D `string` is an alias for `immutable(char)[]`, immutable contents of a mutable array reference (`immutable(char[])` would mean the array reference is also immutable). You don't want to assign that to a `char*`, because then you'd be able to mutate the contents of the string, thereby violating the contract of immutable. (`immutable` means the data to which it's applied, in this case the contents of an array, will not be mutated through any reference anywhere in the program.)

Assigning it to `const(char)*` is fine, because `const` means the data can't be mutated through that particular reference (pointer in this case). And because strings in C are quite frequently represented as `const(char)*`, especially in function parameter lists, D string literals are implicitly convertible to `const(char)*` and also NUL-terminated. So you can do something like `puts("Something")` without worry.

This blog post may be helpful: https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/
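As a small sketch of the `const(char)*` point above: the literal can be handed to C functions declared in `core.stdc` without any cast, while the mutable `char*` assignment is rejected. `sameAsJohn` is a hypothetical helper used only for this illustration.

```d
import core.stdc.stdio : puts;
import core.stdc.string : strcmp;

// Hypothetical helper: compares a C string against a D literal.
bool sameAsJohn(const(char)* cstr)
{
    // The literal is passed directly where const(char)* is expected.
    return strcmp(cstr, "John") == 0;
}

void main()
{
    const(char)* p = "John"; // allowed: a read-only view of the literal
    assert(sameAsJohn(p));

    puts("Something");       // the literal is NUL-terminated, so C can read it

    // The mutable pointer type is refused at compile time.
    static assert(!__traits(compiles, { char* q = "John"; }));
}
```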
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 09:01:17 UTC, jfondren wrote:

> They don't do the same thing. toStringz always copies, always GC-allocates, and always NUL-terminates. `cast(char*)` only does what you want in the case that you're applying it to a string literal. But in that case you shouldn't cast, you should just
>
> ```d
> const char* s = "John";
> ```
>
> If you need to cast the const away to work with a C API, doing that separately, at the point of the call to the C function, makes it clearer what you're doing and what the risks are there (does the C function modify the string? If so this will segfault).

Yeah, I won't cast when I have a `const char*`. I already mentioned that it works without a cast with `const` variables ;)
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 08:56:07 UTC, rempas wrote:

> On Sunday, 15 August 2021 at 08:53:50 UTC, Tejas wrote:
>
> > External C libraries expect strings to be null terminated, so if you do use `.dup`, use `.toStringz` as well.
>
> Yeah, yeah I got that. My question is whether I should avoid `cast(char*)` and use `.toStringz` if both do the exact same thing?

They don't do the same thing. toStringz always copies, always GC-allocates, and always NUL-terminates. `cast(char*)` only does what you want in the case that you're applying it to a string literal. But in that case you shouldn't cast, you should just

```d
const char* s = "John";
```

If you need to cast the const away to work with a C API, doing that separately, at the point of the call to the C function, makes it clearer what you're doing and what the risks are there (does the C function modify the string? If so this will segfault).
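For a string that is built at run time, where no terminator can be assumed, a minimal sketch of the `toStringz` route could look like this. `buildName` is a made-up helper that stands in for any run-time string source.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: produces "John" by appending at run time,
// so nothing guarantees a NUL after the data.
string buildName()
{
    string s = "Jo";
    s ~= "hn";
    return s;
}

void main()
{
    string name = buildName();

    // toStringz hands back a pointer to NUL-terminated data.
    immutable(char)* cname = name.toStringz;

    assert(strlen(cname) == 4);
    assert(cname[0 .. 4] == "John");
}
```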
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 08:53:50 UTC, Tejas wrote:

> External C libraries expect strings to be null terminated, so if you do use `.dup`, use `.toStringz` as well.

Yeah, yeah I got that. My question is whether I should avoid `cast(char*)` and use `.toStringz` if both do the exact same thing?
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 08:51:19 UTC, rempas wrote:

> On Sunday, 15 August 2021 at 08:47:39 UTC, jfondren wrote:
>
> > dup() isn't aware of the NUL since that's outside the slice of the string. It only copies the chars in "John". You can use toStringz to ensure NUL termination: https://dlang.org/phobos/std_string.html#.toStringz
>
> Is there anything bad about just casting it to `char*` that I should be aware of?

External C libraries expect strings to be null terminated, so if you do use `.dup`, use `.toStringz` as well.
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 08:47:39 UTC, jfondren wrote:

> dup() isn't aware of the NUL since that's outside the slice of the string. It only copies the chars in "John". You can use toStringz to ensure NUL termination: https://dlang.org/phobos/std_string.html#.toStringz

Is there anything bad about just casting it to `char*` that I should be aware of?
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 08:11:39 UTC, rempas wrote:

> On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:
>
> > ```d
> > unittest {
> >     char* s = "John".dup.ptr;
> >     s[0] = 'X'; // no segfaults
> >     assert(s[0..4] == "Xohn"); // ok
> > }
> > ```
>
> Well, that one didn't work out really well for me. Using `.dup.ptr` didn't add a null terminator.

dup() isn't aware of the NUL since that's outside the slice of the string. It only copies the chars in "John". You can use toStringz to ensure NUL termination: https://dlang.org/phobos/std_string.html#.toStringz
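When a C API needs a buffer that is both writable and NUL-terminated, one option is to copy the characters and append the terminator by hand. This is a sketch under that assumption; `mutableCString` is a hypothetical helper, not a Phobos function.

```d
import core.stdc.string : strlen;

// Hypothetical helper: returns a writable, NUL-terminated copy of s.
// The terminator is written explicitly because a plain .dup copies
// only the characters inside the slice.
char* mutableCString(const(char)[] s)
{
    auto buf = new char[s.length + 1];
    buf[0 .. s.length] = s[];
    buf[s.length] = '\0';
    return buf.ptr;
}

void main()
{
    char* p = mutableCString("John");
    p[0] = 'X';               // fine: the copy lives in GC memory
    assert(strlen(p) == 4);   // the terminator is present
    assert(p[0 .. 4] == "Xohn");
}
```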
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 07:47:27 UTC, jfondren wrote:

> On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:
> On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:
>
> ```d
> unittest {
>     char* s = "John".dup.ptr;
>     s[0] = 'X'; // no segfaults
>     assert(s[0..4] == "Xohn"); // ok
> }
> ```
>
> So am I going to have an extra runtime cost having to first construct a `string` and then ALSO cast it to a string literal?
>
> In the above case, "John" is a string that's compiled into the resulting executable and loaded into read-only memory, and when this code is reached that string is duplicated, at runtime, to create a copy in writable memory.

Probably a more useful way to think about this is to consider what happens in a loop:

```d
void static_lifetime() @nogc {
    foreach (i; 0 .. 100) {
        string s = "John";
        // some code
    }
}
```

^^ At runtime a slice is created on the stack 100 times, with a pointer to the 'J' of the literal, a length of 4, etc. The cost of this doesn't change with the length of the literal, and the bytes of the literal aren't copied, so this code would be just as fast if the string were megabytes in length.

```d
void dynamically_allocated() { // no @nogc
    foreach (i; 0 .. 100) {
        char[] s = "John".dup;
        // some code
    }
}
```

^^ Here, the literal is copied into freshly GC-allocated memory a hundred times, and a slice is made from that.

And for completeness:

```d
void stack_allocated() @nogc {
    foreach (i; 0 .. 100) {
        char[4] raw = "John";
        char[] s = raw[0..$];
        // some code
    }
}
```

^^ Here, a static array is constructed on the stack a hundred times, and the literal is copied into the array, and then a slice is constructed on the stack with a pointer into the array on the stack, a length of 4, etc. This doesn't use the GC, but the stack is limited in size and now you have to worry about the slice getting copied elsewhere and outliving the data on the stack:

```d
char[] stack_allocated() @nogc {
    char[] ret;
    foreach (i; 0 .. 100) {
        char[4] raw = "John";
        char[] s = raw[0 .. $];
        ret = s;
    }
    return ret; // errors with -preview=dip1000
}

void main() {
    import std.stdio : writeln;
    char[] s = stack_allocated();
    writeln(s); // prints garbage
}
```
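One way to avoid the dangling slice in the last example is to copy the stack buffer to the heap before it leaves the function. This is a sketch of that choice, with `heapCopy` as a hypothetical name; it gives up `@nogc` in exchange for a slice that stays valid.

```d
// Hypothetical helper: fills a stack buffer, then returns a GC-allocated
// copy so the result does not point into a dead stack frame.
char[] heapCopy()
{
    char[4] raw = "John";
    return raw[].dup; // .dup allocates, so this function cannot be @nogc
}

void main()
{
    char[] s = heapCopy();
    s[0] = 'X';          // the copy is writable and still alive here
    assert(s == "Xohn");
}
```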
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 08:17:47 UTC, rikki cattermole wrote:

> pragma is a set of commands to the compiler that may be compiler specific. In the case of the msg command, it tells the compiler to output a message to stdout during compilation.

Thanks man!
Re: What exactly are the String literrals in D and how they work?
On 15/08/2021 8:11 PM, rempas wrote:

> Still don't know what "pragma" does but thank you.

pragma is a set of commands to the compiler that may be compiler specific. In the case of the msg command, it tells the compiler to output a message to stdout during compilation.
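A short sketch of `pragma(msg, ...)` in use follows. The messages appear in the compiler's output while building, not when the program runs; `isPlainString` is a made-up name for this illustration.

```d
// Emitted by the compiler during the build.
pragma(msg, "literal type: ", typeof("John"));

// An enum is evaluated at compile time, so it can feed pragma(msg) too.
enum bool isPlainString = is(typeof("John") == immutable(char)[]);
pragma(msg, "is immutable(char)[]: ", isPlainString);

void main()
{
    // The same fact, checked at compile time instead of printed.
    static assert(isPlainString);
}
```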
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

> ```d
> unittest {
>     pragma(msg, typeof("John")); // string
>     pragma(msg, is(typeof("John") == immutable(char)[])); // true
> }
> ```

Still don't know what "pragma" does but thank you.

> ```d
> void zerort(string s) {
>     assert(s.ptr[s.length] == '\0');
> }
>
> unittest {
>     zerort("John"); // assertion success
>     string s = "Jo";
>     s ~= "hn";
>     zerort(s); // assertion failure
> }
> ```
>
> If a function takes a string as a runtime parameter, it might not be NUL terminated. This might be more obvious with substrings:
>
> ```d
> unittest {
>     string j = "John";
>     string s = j[0..2];
>     assert(s == "Jo");
>     assert(s.ptr == j.ptr);
>     assert(s.ptr[s.length] == 'h'); // it's h-terminated
> }
> ```

That's interesting!

> ```c
> void mutate(char *s) {
>     s[0] = 'X';
> }
>
> int main() {
>     char *s = "John";
>     mutate(s); // segmentation fault
> }
> ```
>
> `char*` is just the wrong type, it suggests mutability where mutability ain't.

I mean that in C, we can assign a string literal to a `char*` and also to a `const char*` without getting a compilation error, while in D we can only assign it to a `const char*`. I suppose that's because of C doing explicit conversion. I didn't talk about mutating a string literal.

> Compile-time. std.conv.to is what you'd use at runtime. Here though, what you want is `dup` to get a `char[]`, which you can then take the pointer of if you want:
>
> ```d
> unittest {
>     char* s = "John".dup.ptr;
>     s[0] = 'X'; // no segfaults
>     assert(s[0..4] == "Xohn"); // ok
> }
> ```

Well, that one didn't work out really well for me. Using `.dup.ptr` didn't add a null terminator, while `cast(char*)` did. So I suppose the first way is better when you want a C-like `char*` and not a D-like `char[]`.
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

> On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:
>
> ```d
> unittest {
>     char* s = "John".dup.ptr;
>     s[0] = 'X'; // no segfaults
>     assert(s[0..4] == "Xohn"); // ok
> }
> ```
>
> So am I going to have an extra runtime cost having to first construct a `string` and then ALSO cast it to a string literal?

In the above case, "John" is a string that's compiled into the resulting executable and loaded into read-only memory, and when this code is reached that string is duplicated, at runtime, to create a copy in writable memory.
Re: What exactly are the String literrals in D and how they work?
On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:

> So when I'm doing something like the following: `string name = "John";` Then what's the actual type of the literal `"John"`?

```d
unittest {
    pragma(msg, typeof("John")); // string
    pragma(msg, is(typeof("John") == immutable(char)[])); // true
}
```

> In the chapter [Calling C functions](https://dlang.org/spec/interfaceToC.html#calling_c_functions) in the "Interfacing with C" page, the following is said:
>
> > Strings are not 0 terminated in D. See "Data Type Compatibility" for more information about this. However, string literals in D are 0 terminated.

```d
void zerort(string s) {
    assert(s.ptr[s.length] == '\0');
}

unittest {
    zerort("John"); // assertion success
    string s = "Jo";
    s ~= "hn";
    zerort(s); // assertion failure
}
```

If a function takes a string as a runtime parameter, it might not be NUL terminated. This might be more obvious with substrings:

```d
unittest {
    string j = "John";
    string s = j[0..2];
    assert(s == "Jo");
    assert(s.ptr == j.ptr);
    assert(s.ptr[s.length] == 'h'); // it's h-terminated
}
```

> Which is really interesting and makes me suppose that `"John"` is a string literal, right? However, when I'm writing something like the following: `char *name = "John";`, then D will complain with the following message:
>
> > Error: cannot implicitly convert expression `"John"` of type `string` to `char*`
>
> Which is interesting because this works in C.

Well, kinda:

```c
void mutate(char *s) {
    s[0] = 'X';
}

int main() {
    char *s = "John";
    mutate(s); // segmentation fault
}
```

`char*` is just the wrong type, it suggests mutability where mutability ain't.

> If I use `const char*` instead, it will work. I suppose that this has to do with the fact that `string` is an alias for `immutable(char[])` but still this has to mean that the actual type of a LITERAL string is of type `string` (aka `immutable(char[])`). Another thing I can do is cast the literal to a `char*` but I'm wondering what's going on under the hood in this case.

The same thing as in C:

```d
void mutate(char *s) {
    s[0] = 'X';
}

void main() {
    char* s = cast(char*) "John";
    mutate(s); // program killed by signal 11
}
```

> Is casting executed at compile time or at runtime?

Compile-time. std.conv.to is what you'd use at runtime. Here though, what you want is `dup` to get a `char[]`, which you can then take the pointer of if you want:

```d
unittest {
    char* s = "John".dup.ptr;
    s[0] = 'X'; // no segfaults
    assert(s[0..4] == "Xohn"); // ok
}
```

> So am I going to have an extra runtime cost having to first construct a `string` and then ALSO cast it to a string literal? I hope all that makes sense and that someone can answer, lol
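To illustrate the remark that `std.conv.to` is the run-time tool: a string held in a variable has to be converted explicitly, which allocates and re-encodes. The following is a minimal sketch, with `widen` as a hypothetical wrapper name.

```d
import std.conv : to;

// Hypothetical wrapper: converts a run-time string to UTF-16.
// Unlike a literal, a variable does not change type implicitly.
wstring widen(string s)
{
    return s.to!wstring;
}

void main()
{
    string s = "John";
    wstring ws = widen(s);

    assert(ws == "John"w);  // same text, now in 16-bit code units
    assert(ws.length == 4);
}
```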