Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/15/21 2:10 AM, rempas wrote:

So when I'm doing something like the following: `string name = "John";`
Then what's the actual type of the literal `"John"`?
In the chapter [Calling C 
functions](https://dlang.org/spec/interfaceToC.html#calling_c_functions) 
in the "Interfacing with C" page, the following is said:
Strings are not 0 terminated in D. See "Data Type Compatibility" for 
more information about this. However, string literals in D are 0 
terminated.


Which is really interesting and makes me suppose that `"John"` is a 
string literal right?
However, when I'm writing something like the following: `char *name = 
"John";`,

then D will complain with the following message:
Error: cannot implicitly convert expression `"John"` of type `string` 
to `char*`


Which is interesting because this works in C. If I use `const char*` 
instead, it will work. I suppose that this has to do with the fact that 
`string` is an alias for `immutable(char[])` but still this has to mean 
that the actual type of a LITERAL string is of type `string` (aka 
`immutable(char[])`).


Another thing I can do is cast the literal to a `char*` but I'm 
wondering what's going on under the hood in this case. Is casting 
executed at compile time or at runtime? So am I going to have an extra 
runtime cost having to first construct a `string` and then ALSO cast it 
to a string literal?


I hope all that makes sense and that someone can answer, lol


Lots of great responses in this thread!

I wanted to stress that a string literal is sort of magic. It has extra 
type information inside the compiler that is not available in the normal 
type system. Namely that "this is a literal, and so can morph into other 
things".


To give you some examples:

```d
string s = "John";
immutable(char)* cs = s; // nope
immutable(char)* cs2 = "John"; // OK!
wstring ws = s; // nope
wstring ws2 = "John"; // OK!
```

What is going on? Because the compiler knows this is a string *literal*, 
it can modify the type (and possibly the data itself) at will to match 
what you are assigning it to. In the case of zero-terminated C strings, 
it allows usage as a pointer instead of a D array. In the case of 
different width strings (wstring uses 16-bit code-units), it can 
actually transform the underlying data to what you wanted.
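A small sketch of that morphing, assuming nothing beyond the language itself (the variable names are made up for illustration). The same literal text is accepted as UTF-8, UTF-16, UTF-32, a static array, or a C-style pointer, while a `string` variable gets no such treatment:

```d
void main()
{
    // One literal, several target types chosen by the compiler.
    string  s = "\u00E9";          // UTF-8: two code units
    wstring w = "\u00E9";          // UTF-16: one code unit
    dstring d = "\u00E9";          // UTF-32: one code unit
    assert(s.length == 2 && w.length == 1 && d.length == 1);

    char[4] fixed = "John";        // copied into a static array
    immutable(char)* p = "John";   // pointer to NUL-terminated data
    assert(fixed[] == "John" && p[4] == '\0');

    // Once the literal is stored in a variable, the magic is gone.
    static assert(!__traits(compiles, { wstring bad = s; }));
}
```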


Note that even when you do lose that "literal" magic by assigning to a 
variable, you can still rely on D always putting a terminating zero in 
the data segment for a string literal. So it's valid to just do:


```d
import core.stdc.stdio : printf;

string s = "John";
printf(s.ptr); // s still points at the NUL-terminated literal data
```


As long as you *know* the string came from a literal.
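When you do *not* know where the string came from, one common idiom (a sketch, not the only option) is to hand `printf` the length explicitly with `%.*s`, so no terminator is needed. `std.string.toStringz` is the alternative when the C side insists on a NUL-terminated pointer:

```d
import core.stdc.stdio : printf;
import std.string : toStringz;

void main()
{
    string s = "Jo";
    s ~= "hn"; // built at runtime: no guaranteed '\0' after the data

    // Pass the length explicitly; printf reads at most s.length bytes.
    printf("%.*s\n", cast(int) s.length, s.ptr);

    // Or make a NUL-terminated copy first.
    printf("%s\n", s.toStringz);
}
```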

-Steve


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread Ali Çehreli via Digitalmars-d-learn
Lots of great information and pointers already. I will try from another 
angle. :)


On 8/14/21 11:10 PM, rempas wrote:

> So when I'm doing something like the following: `string name = "John";`
> Then what's the actual type of the literal `"John"`?

As you say and as the code shows, there are two constructs in that line. 
The right-hand side is a string literal. The left-hand side is a 'string'.


>> Strings are not 0 terminated in D. See "Data Type Compatibility" for
>> more information about this. However, string literals in D are 0
>> terminated.

The string literal is embedded into the compiled program as 5 bytes in 
this case: 'J', 'o', 'h', 'n', '\0'. That's the right-hand side of your 
code above.


'string' is an array in D and arrays are stored as the following pair:

  size_t length;  // The number of elements
  T* ptr;         // The pointer to the first element

(This is called a "fat pointer".)

So, if we assume that the literal 'John' was placed at memory location 
0x1000, then the left-hand side of your code will satisfy the following 
conditions:


  assert(name.length == 4);  // <-- NOT 5
  assert(cast(size_t) name.ptr == 0x1000);  // the assumed address

The important part to note is that even though the string literal was 
stored as 5 bytes, the string's length is 4.
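Here is a minimal sketch that checks the same layout on a real program instead of an assumed address. It only relies on the `length` and `ptr` properties; peeking one element past the slice is safe here only because `name` comes straight from a literal:

```d
void main()
{
    string name = "John";

    // A slice is two machine words: a length and a pointer.
    static assert(string.sizeof == size_t.sizeof + (void*).sizeof);

    assert(name.length == 4);            // the terminator is not counted
    assert(name.ptr[0 .. 4] == "John");  // the slice covers only these bytes
    assert(name.ptr[4] == '\0');         // one past the end: true only for literals
}
```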


As others said, when we add a character to a string, there is no '\0' 
involved. Only the newly added char will be added.


Functions in D do not need the '\0' sentinel to know where the string 
ends. The end is already known from the 'length' property.
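To make the difference concrete, here is a small sketch comparing D's stored length with C's sentinel search. It uses `strlen` from `core.stdc.string`; the string contents are just an illustration:

```d
import core.stdc.string : strlen;

void main()
{
    string s = "ab\0cd";          // an embedded NUL is just another char in D
    assert(s.length == 5);        // D uses the stored length
    assert(strlen(s.ptr) == 2);   // C stops at the first '\0'

    s ~= '!';                     // appending adds exactly one char
    assert(s.length == 6);
}
```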


Ali



Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread rempas via Digitalmars-d-learn

On Sunday, 15 August 2021 at 09:06:14 UTC, Mike Parker wrote:


The D `string` is an alias for `immutable(char)[]`, immutable 
contents of a mutable array reference (`immutable(char[])` 
would mean the array reference is also immutable). You don't 
want to assign that to a `char*`, because then you'd be able to 
mutate the contents of the string, thereby violating the 
contract of immutable. (`immutable` means the data to which 
it's applied, in this case the contents of an array, will not 
be mutated through any reference anywhere in the program.)


[...]


Thanks a lot for the info!


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread Mike Parker via Digitalmars-d-learn

On Sunday, 15 August 2021 at 08:11:39 UTC, rempas wrote:



I mean that in C, we can assign a string literal to a `char*` 
and also to a `const char*` type without getting a compilation 
error, while in D we can only assign it to a `const char*` 
type. I suppose that's because of C doing explicit conversion. 
I didn't talk about mutating a string literal.


The D `string` is an alias for `immutable(char)[]`, immutable 
contents of a mutable array reference (`immutable(char[])` would 
mean the array reference is also immutable). You don't want to 
assign that to a `char*`, because then you'd be able to mutate 
the contents of the string, thereby violating the contract of 
immutable. (`immutable` means the data to which it's applied, in 
this case the contents of an array, will not be mutated through 
any reference anywhere in the program.)


Assigning it to `const(char)*` is fine, because `const` means the 
data can't be mutated through that particular reference (pointer 
in this case). And because strings in C are quite frequently 
represented as `const(char)*`, especially in function parameter 
lists, D string literals are implicitly convertible to 
`const(char)*` and also NUL-terminated. So you can do something 
like `puts("Something")` without worry.
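A short sketch of those rules, using `puts` from `core.stdc.stdio`. The `static assert` lines only check that the rejected assignments really fail to compile; the variable names are illustrative:

```d
import core.stdc.stdio : puts;

void main()
{
    const(char)* greeting = "Something"; // literal converts implicitly
    puts(greeting);
    puts("Something else");              // also fine directly at the call

    // A mutable pointer is rejected at compile time.
    static assert(!__traits(compiles, { char* p = "John"; }));

    // So is a string *variable*, which has lost the literal's special status.
    string s = "John";
    static assert(!__traits(compiles, { const(char)* q = s; }));
}
```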


This blog post may be helpful:

https://dlang.org/blog/2021/05/24/interfacing-d-with-c-strings-part-one/



Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread rempas via Digitalmars-d-learn

On Sunday, 15 August 2021 at 09:01:17 UTC, jfondren wrote:
They don't do the same thing. toStringz always copies, always 
GC-allocates, and always NUL-terminates. `cast(char*)` only 
does what you want in the case that you're applying it to a string 
literal. But in that case you shouldn't cast, you should just


```d
const char* s = "John";
```

If you need to cast the const away to work with a C API, 
doing that separately, at the point of the call to the C 
function, makes it clearer what you're doing and what the risks 
are there (does the C function modify the string? If so this 
will segfault).


Yeah I won't cast when having a `const char*`. I already 
mentioned that it works without cast with `const` variables ;)





Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread jfondren via Digitalmars-d-learn

On Sunday, 15 August 2021 at 08:56:07 UTC, rempas wrote:

On Sunday, 15 August 2021 at 08:53:50 UTC, Tejas wrote:
External C libraries expect strings to be null terminated, so 
if you do use `.dup`, use `.toStringz` as well.


Yeah, yeah I got that. My question is, if I should avoid 
`cast(char*)` and use `.toStringz` while both do the exact same 
thing?


They don't do the same thing. toStringz always copies, always 
GC-allocates, and always NUL-terminates. `cast(char*)` only does 
what you want in the case that you're applying it to a string 
literal. But in that case you shouldn't cast, you should just


```d
const char* s = "John";
```

If you need to cast the const away to work with a C API, doing 
that separately, at the point of the call to the C function, 
makes it clearer what you're doing and what the risks are there 
(does the C function modify the string? If so this will segfault).
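As a sketch of casting at the call site: `legacy_length` below is a hypothetical stand-in for a C API that is declared with `char*` but only reads its argument. It is defined in D here so the example is self-contained; a real binding would just be an `extern (C)` declaration.

```d
import core.stdc.string : strlen;

// Hypothetical stand-in for a legacy C function that takes `char*`
// but does not modify the data.
extern (C) size_t legacy_length(char* s)
{
    return strlen(s);
}

void main()
{
    const(char)* name = "John"; // keep the honest type everywhere else

    // The cast appears exactly where the risk is taken.
    auto n = legacy_length(cast(char*) name);
    assert(n == 4);
}
```

If the C function did write through the pointer, this would crash just like the `mutate` examples elsewhere in the thread, which is why keeping the cast visible at the call helps.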


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread rempas via Digitalmars-d-learn

On Sunday, 15 August 2021 at 08:53:50 UTC, Tejas wrote:
External C libraries expect strings to be null terminated, so 
if you do use `.dup`, use `.toStringz` as well.


Yeah, yeah I got that. My question is, if I should avoid 
`cast(char*)` and use `.toStringz` while both do the exact same 
thing?


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread Tejas via Digitalmars-d-learn

On Sunday, 15 August 2021 at 08:51:19 UTC, rempas wrote:

On Sunday, 15 August 2021 at 08:47:39 UTC, jfondren wrote:


dup() isn't aware of the NUL since that's outside the slice of 
the string. It only copies the chars in "John". You can use 
toStringz to ensure NUL termination:

https://dlang.org/phobos/std_string.html#.toStringz


Is there anything wrong with just casting it to `char*` that I 
should be aware of?


External C libraries expect strings to be null terminated, so if 
you do use `.dup`, use `.toStringz` as well.


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread rempas via Digitalmars-d-learn

On Sunday, 15 August 2021 at 08:47:39 UTC, jfondren wrote:


dup() isn't aware of the NUL since that's outside the slice of 
the string. It only copies the chars in "John". You can use 
toStringz to ensure NUL termination:

https://dlang.org/phobos/std_string.html#.toStringz


Is there anything wrong with just casting it to `char*` that I 
should be aware of?


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread jfondren via Digitalmars-d-learn

On Sunday, 15 August 2021 at 08:11:39 UTC, rempas wrote:

On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

```d
unittest {
char* s = "John".dup.ptr;
s[0] = 'X'; // no segfaults
assert(s[0..4] == "Xohn"); // ok
}
```



Well, that one didn't work out really well for me. Using 
`.dup.ptr` didn't add a null terminator.


dup() isn't aware of the NUL since that's outside the slice of 
the string. It only copies the chars in "John". You can use 
toStringz to ensure NUL termination:

https://dlang.org/phobos/std_string.html#.toStringz
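A minimal sketch of the options, assuming you want either a read-only or a writable NUL-terminated buffer. Appending `'\0'` by hand is one way (not the only one) to get a mutable copy, since `toStringz` returns `immutable(char)*`:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    char[] copy = "John".dup;             // 4 chars, terminator not copied
    assert(copy.length == 4);

    immutable(char)* z = copy.toStringz;  // new allocation ending in '\0'
    assert(strlen(z) == 4);

    char[] writable = "John".dup ~ '\0';  // mutable, explicit terminator
    writable[0] = 'X';
    assert(strlen(writable.ptr) == 4);
    assert(writable[0 .. 4] == "Xohn");
}
```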


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread jfondren via Digitalmars-d-learn

On Sunday, 15 August 2021 at 07:47:27 UTC, jfondren wrote:

On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:
```d
unittest {
char* s = "John".dup.ptr;
s[0] = 'X'; // no segfaults
assert(s[0..4] == "Xohn"); // ok
}
```

So am I going to have an extra runtime cost having to first 
construct a `string` and then ALSO cast it to a string 
literal?


In the above case, "John" is a string that's compiled into the 
resulting executable and loaded into read-only memory, and when this 
code is reached that string is duplicated, at runtime, to 
create a copy in writable memory.


Probably a more useful way to think about this is to consider 
what happens in a loop:


```d
void static_lifetime() @nogc {
foreach (i; 0 .. 100) {
string s = "John";
// some code
}
}
```

^^ At runtime a slice is created on the stack 100 times, with a 
pointer to the 'J' of the literal, a length of 4, etc. The cost 
of this doesn't change with the length of the literal, and the 
bytes of the literal aren't copied, so this code would be just as 
fast if the string were megabytes in length.


```d
void dynamically_allocated() { // no @nogc
foreach (i; 0 .. 100) {
char[] s = "John".dup;
// some code
}
}
```

^^ Here, the literal is copied into freshly GC-allocated memory a 
hundred times, and a slice is made from that.
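The difference between the two loops can be checked directly by comparing pointers. This is only a sketch with illustrative variable names: assigning a slice copies the pointer and length, while `.dup` produces new storage.

```d
void main()
{
    string a = "John";
    string b = a;               // copies pointer and length only
    assert(a.ptr is b.ptr);     // both slices view the same bytes

    char[] c = a.dup;           // allocates and copies the four chars
    assert(cast(const(char)*) c.ptr !is a.ptr);
    assert(c == a);             // same contents, different memory
}
```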


And for completeness:

```d
void stack_allocated() @nogc {
foreach (i; 0 .. 100) {
char[4] raw = "John";
char[] s = raw[0..$];
// some code
}
}
```

^^ Here, a static array is constructed on the stack a hundred 
times, and the literal is copied into the array, and then a slice 
is constructed on the stack with a pointer into the array on the 
stack, a length of 4, etc. This doesn't use the GC but the stack 
is limited in size and now you have to worry about the slice getting 
copied elsewhere and outliving the data on the stack:


```d
char[] stack_allocated() @nogc {
char[] ret;
foreach (i; 0 .. 100) {
char[4] raw = "John";
char[] s = raw[0 .. $];
ret = s;
}
return ret; // errors with -preview=dip1000
}

void main() {
import std.stdio : writeln;

char[] s = stack_allocated();
writeln(s); // prints garbage
}
```
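One way to avoid the dangling slice, sketched here under the assumption that a GC allocation is acceptable, is to return a heap copy of the stack array. The function name `heapAllocated` is made up for this example:

```d
char[] heapAllocated()
{
    char[4] raw = "John";
    return raw[].dup; // the GC copy stays valid after this frame is gone
}

void main()
{
    import std.stdio : writeln;

    char[] s = heapAllocated();
    assert(s == "John");
    writeln(s); // prints John
}
```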


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread rempas via Digitalmars-d-learn

On Sunday, 15 August 2021 at 08:17:47 UTC, rikki cattermole wrote:


pragma is a set of commands to the compiler that may be 
compiler specific.


In the case of the msg command, it tells the compiler to output 
a message to stdout during compilation.


Thanks man!


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread rikki cattermole via Digitalmars-d-learn



On 15/08/2021 8:11 PM, rempas wrote:

Still don't know what "pragma" does but thank you.


pragma is a set of commands to the compiler that may be compiler specific.

In the case of the msg command, it tells the compiler to output a 
message to stdout during compilation.
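A tiny sketch of `pragma(msg, ...)` in use. Nothing here runs at program start; the messages appear while the compiler processes the file, which makes it handy for inspecting types:

```d
// Each message is printed while the compiler runs, not when the program runs.
pragma(msg, "compiling the example");
pragma(msg, typeof("John"));      // string
pragma(msg, typeof("John"w));     // wstring
pragma(msg, typeof("John".ptr));  // immutable(char)*

void main() {}
```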


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread rempas via Digitalmars-d-learn

On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

```d
unittest {
pragma(msg, typeof("John"));  // string
pragma(msg, is(typeof("John") == immutable(char)[]));  // true

}
```


Still don't know what "pragma" does but thank you.


```d
void zerort(string s) {
assert(s.ptr[s.length] == '\0');
}

unittest {
zerort("John"); // assertion success
string s = "Jo";
s ~= "hn";
zerort(s); // assertion failure
}
```

If a function takes a string as a runtime parameter, it might 
not be NUL terminated. This might be more obvious with 
substrings:


```d
unittest {
string j = "John";
string s = j[0..2];
assert(s == "Jo");
assert(s.ptr == j.ptr);
assert(s.ptr[s.length] == 'h'); // it's h-terminated
}
```


That's interesting!


```c
void mutate(char *s) {
s[0] = 'X';
}

int main() {
char *s = "John";
mutate(s); // segmentation fault
}
```

`char*` is just the wrong type, it suggests mutability where 
mutability ain't.


I mean that in C, we can assign a string literal to a `char*` 
and also to a `const char*` type without getting a compilation error, 
while in D we can only assign it to a `const char*` type. I 
suppose that's because of C doing explicit conversion. I didn't 
talk about mutating a string literal.


Compile-time. std.conv.to is what you'd use at runtime. Here 
though, what you want is `dup` to get a `char[]`, which you can 
then take the pointer of if you want:


```d
unittest {
char* s = "John".dup.ptr;
s[0] = 'X'; // no segfaults
assert(s[0..4] == "Xohn"); // ok
}
```



Well, that one didn't work out really well for me. Using 
`.dup.ptr` didn't add a null terminator while 
`cast(char*)` did. So I suppose the first way is better when 
you want a C-like `char*` and not a D-like `char[]`.


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread jfondren via Digitalmars-d-learn

On Sunday, 15 August 2021 at 07:43:59 UTC, jfondren wrote:

On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:
```d
unittest {
char* s = "John".dup.ptr;
s[0] = 'X'; // no segfaults
assert(s[0..4] == "Xohn"); // ok
}
```

So am I going to have an extra runtime cost having to first 
construct a `string` and then ALSO cast it to a string literal?


In the above case, "John" is a string that's compiled into the 
resulting executable and loaded into read-only memory, and when this 
code is reached that string is duplicated, at runtime, to create 
a copy in writable memory.


Re: What exactly are the String literrals in D and how they work?

2021-08-15 Thread jfondren via Digitalmars-d-learn

On Sunday, 15 August 2021 at 06:10:53 UTC, rempas wrote:
So when I'm doing something like the following: `string name = 
"John";`

Then what's the actual type of the literal `"John"`?


```d
unittest {
pragma(msg, typeof("John"));  // string
pragma(msg, is(typeof("John") == immutable(char)[]));  // true
}
```

In the chapter [Calling C 
functions](https://dlang.org/spec/interfaceToC.html#calling_c_functions) in the "Interfacing with C" page, the following is said:
Strings are not 0 terminated in D. See "Data Type 
Compatibility" for more information about this. However, 
string literals in D are 0 terminated.


```d
void zerort(string s) {
assert(s.ptr[s.length] == '\0');
}

unittest {
zerort("John"); // assertion success
string s = "Jo";
s ~= "hn";
zerort(s); // assertion failure
}
```

If a function takes a string as a runtime parameter, it might not 
be NUL terminated. This might be more obvious with substrings:


```d
unittest {
string j = "John";
string s = j[0..2];
assert(s == "Jo");
assert(s.ptr == j.ptr);
assert(s.ptr[s.length] == 'h'); // it's h-terminated
}
```
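To see what a C function would make of that substring, here is a sketch using `strlen` from `core.stdc.string` and `toStringz` from `std.string`. Reading through `s.ptr` is only safe here because the slice points into a literal that has a terminator further along:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    string j = "John";
    string s = j[0 .. 2];

    assert(s.length == 2);             // D sees "Jo"
    assert(strlen(s.ptr) == 4);        // C reads on to the literal's '\0': "John"
    assert(strlen(s.toStringz) == 2);  // the copy is terminated after "Jo"
}
```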



Which is really interesting and makes me suppose that `"John"` 
is a string literal right?
However, when I'm writing something like the following: `char 
*name = "John";`,

then D will complain with the following message:
Error: cannot implicitly convert expression `"John"` of type 
`string` to `char*`


Which is interesting because this works in C.


Well, kinda:

```c
void mutate(char *s) {
s[0] = 'X';
}

int main() {
char *s = "John";
mutate(s); // segmentation fault
}
```

`char*` is just the wrong type, it suggests mutability where 
mutability ain't.


If I use `const char*` instead, it will work. I suppose that 
this has to do with the fact that `string` is an alias for 
`immutable(char[])` but still this has to mean that the actual 
type of a LITERAL string is of type `string` (aka 
`immutable(char[])`).


Another thing I can do is cast the literal to a `char*` but I'm 
wondering what's going on under the hood in this case.


The same thing as in C:

```d
void mutate(char *s) {
s[0] = 'X';
}

void main() {
char* s = cast(char*) "John";
mutate(s); // program killed by signal 11
}
```


Is casting executed at compile time or at runtime?


Compile-time. std.conv.to is what you'd use at runtime. Here 
though, what you want is `dup` to get a `char[]`, which you can 
then take the pointer of if you want:


```d
unittest {
char* s = "John".dup.ptr;
s[0] = 'X'; // no segfaults
assert(s[0..4] == "Xohn"); // ok
}
```

So am I going to have an extra runtime cost having to first 
construct a `string` and then ALSO cast it to a string literal?


I hope all that makes sense and that someone can answer, lol