[Issue 8229] string literals are not zero-terminated during CTFE

2015-01-20 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=8229

--- Comment #6 from Kenji Hara  ---
I'd like to add some sample code,
taken from a comment in issue 7570:

bool not_end(const char *s, const int n) {
    return s && s[n];
}
bool str_prefix(const char *s, const char *t, const int ns, const int nt) {
    return (s == t) || !*(t + nt) ||
        (*(s + ns) == *(t + nt) && (str_prefix(s, t, ns+1, nt+1)));
}
bool contains(const char *s, const char *needle, const int n=0) {
    return not_end(s, n) &&
        (str_prefix(s, needle, n, 0) || contains(s, needle, n+1));
}
enum int x = contains("froogler", "oogle");

Currently this code fails to evaluate in CTFE because it reads the zero
terminator of the string literal. Supporting that in CTFE may be useful for
C string operations.
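For comparison, here is a minimal sketch (not from the thread) of the same kind of search written with D slices instead of C-style pointers. Every access is checked against the slice length, so the code never reads a trailing '\0' and should evaluate in CTFE as-is. The names `startsWithSlice` and `containsSlice` are hypothetical, and this is a plain substring search rather than an exact port of `str_prefix`.

```d
// Hypothetical helper: true if `prefix` is a prefix of `s`.
// The length check comes first, so the slice below is always in bounds.
bool startsWithSlice(const(char)[] s, const(char)[] prefix)
{
    return s.length >= prefix.length && s[0 .. prefix.length] == prefix;
}

// Hypothetical helper: plain substring search over slices, with no
// reliance on a '\0' sentinel after the end of either argument.
bool containsSlice(const(char)[] haystack, const(char)[] needle)
{
    // The empty needle matches at position 0.
    if (needle.length == 0)
        return true;
    foreach (i; 0 .. haystack.length)
    {
        if (startsWithSlice(haystack[i .. $], needle))
            return true;
    }
    return false;
}

// An enum initializer must be a compile-time constant, so this runs in CTFE.
enum bool x = containsSlice("froogler", "oogle");
static assert(x);
static assert(!containsSlice("froogler", "google"));

void main() {}
```

The trade-off is that callers must pass D slices (which carry their length) rather than bare `const char*` values, which is exactly what lets CTFE bounds-check every read.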

--


2013-09-28 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8229



--- Comment #5 from Martin Nowak  2013-09-28 04:20:53 PDT ---
It is also a huge performance issue to use ArrayLiteralExp instead of
StringLiteralExp during object emission, because the compiler creates a list
of 1-byte elements. If, for example, you generate a 5 kB string in CTFE, this
induces a huge overhead.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
--- You are receiving this mail because: ---


2013-09-27 Thread d-bugmail


Martin Nowak  changed:

       What|Removed |Added
-----------+--------+------------
         CC|        |c...@dawg.eu
   Severity|normal  |major


--- Comment #4 from Martin Nowak  2013-09-27 15:58:28 PDT ---
---
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string)buf[];
}

static const var = bug("foo");
---

I have a much bigger problem related to this.
String literals resulting from CTFE are missing the terminating zero in the
data segment. Whether or not the bug bites depends on the object layout and the
virtual memory mapping, which makes it pretty annoying: it works most of the
time, so it is easy to miss.
The underlying issue is that var is emitted to the object file by
ArrayLiteralExp::toDt, which doesn't add the zero terminator.
I'm not sure whether, or at which stage, this should be converted to a
StringLiteralExp.
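One possible workaround, sketched here as an assumption rather than anything proposed in the thread: append the terminator explicitly inside CTFE. The '\0' then becomes part of the array data itself, so it should be emitted regardless of whether the value reaches the object file as an ArrayLiteralExp or a StringLiteralExp. The name `varZ` is hypothetical; note that its `.length` now counts the terminator.

```d
// Copy of the bug() helper from the report above.
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string) buf[];
}

// Workaround sketch: the terminator is an ordinary element of the array,
// not an implicit byte the compiler may or may not place after it.
static immutable varZ = bug("foo") ~ "\0";

// Both checks are evaluated at compile time.
static assert(varZ.length == 4);     // "foo" plus the explicit '\0'
static assert(varZ[$ - 1] == '\0');

void main()
{
    import core.stdc.stdio : printf;

    // Safe to hand to C, because the zero is part of the data.
    printf("%s\n", varZ.ptr);        // prints foo
}
```

The cost is that D code must slice off the last element (`varZ[0 .. $ - 1]`) when it wants the string without the terminator.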

-- 


2012-06-13 Thread d-bugmail



--- Comment #3 from Don  2012-06-13 01:44:42 PDT ---
(In reply to comment #2)
> (In reply to comment #1)
> > This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> > It's the same as doing 
> > 
> > int n = 0;
> > char c = ""[n];
> > 
> > which generates an array bounds error at runtime.
> > 
> 
> I think that would be stretching it too far. It is more like:
> 
> auto s = ['\0'];
> auto q = s[0..0];
> char c = *q.ptr;

That's an interesting interpretation. It can't be true for D1, where string
literals are fixed-length arrays, but it could work for D2.

In D1 it's more like:
struct S
{
  static char[3] s = ['a', 'b', 'c'];
  static char terminator = '\0';
}
And every mention of it in the spec dates from D1.

> > Is the terminating null character still in the spec? A long time ago it was in
> > there, but now I can only find two references to it in the current spec (in
> > 'arrays' and in 'interfacing to C'), and they both relate to printf. 
> > 
> > The most detailed is in 'interface to C', which states:
> > "string literals, when they are not part of an initializer to a larger data
> > structure, have a '\0' character helpfully stored after the end of them."
> > 
> > which is pretty weird. These funky semantics would be difficult to implement in
> > CTFE,
> 
> I guess this is from D1 times, when string literals were static arrays, and
> doesn't apply anymore.

Could be. In that case, the few parts of the spec that mention it are horribly
out of date.
It also applies to assigning to fixed-length arrays, though:

immutable(char)[3] s = "abc";
// Does this have a trailing zero?

> > and I doubt they are desirable. Here's an example:
> > 
> > const(char)[] foo(char[] s) { return "abc" ~ s; }
> > 
> > immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
> > 
> 
> Well, this is not specified afaics.

Hmm, maybe it isn't. The spec says almost nothing about the whole thing. What I
do know is that a lot of existing code relies on this behaviour
(especially on "abc" ~ "def" having a trailing zero).
Pretty much the only thing the spec says is that you can use string literals
with printf.

Does TDPL mention it?

The spec definitely needs to be improved.

-- 


2012-06-12 Thread d-bugmail



--- Comment #2 from timon.g...@gmx.ch 2012-06-12 10:55:45 PDT ---
(In reply to comment #1)
> This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> It's the same as doing 
> 
> int n = 0;
> char c = ""[n];
> 
> which generates an array bounds error at runtime.
> 

I think that would be stretching it too far. It is more like:

auto s = ['\0'];
auto q = s[0..0];
char c = *q.ptr;

Which works fine at runtime and during CTFE.
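A small sketch of that claim, using a hypothetical function name: the element just past the end of an empty slice can be read through `.ptr` because it still lies inside the original one-element array, so the same read is accepted in CTFE and at runtime.

```d
// Reads the element that sits just past the end of an empty slice.
char readThroughEmptySlice()
{
    auto s = ['\0'];    // char[] of length 1
    auto q = s[0 .. 0]; // empty slice into the same storage
    return *q.ptr;      // q.ptr == s.ptr, so this reads s[0], which is in bounds
}

// Checked during CTFE.
static assert(readThroughEmptySlice() == '\0');

void main()
{
    // The same call at runtime gives the same result.
    assert(readThroughEmptySlice() == '\0');
}
```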

> Is the terminating null character still in the spec? A long time ago it was in
> there, but now I can only find two references to it in the current spec (in
> 'arrays' and in 'interfacing to C'), and they both relate to printf. 
> 
> The most detailed is in 'interface to C', which states:
> "string literals, when they are not part of an initializer to a larger data
> structure, have a '\0' character helpfully stored after the end of them."
> 
> which is pretty weird. These funky semantics would be difficult to implement in
> CTFE,

I guess this is from D1 times, when string literals were static arrays, and
doesn't apply anymore.

> and I doubt they are desirable. Here's an example:
> 
> const(char)[] foo(char[] s) { return "abc" ~ s; }
> 
> immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
> 

Well, this is not specified afaics.

> bool baz()
> {
> immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.
> 
> return true;
> }
> static assert(baz());
> 
> ---> bar is zero-terminated, bar2 is not, even though they had the same
> assignment. When does this magical trailing zero get added?
> 

This is exactly the behavior that is observed at runtime. If it is undesirable,
then that is a distinct issue that should be investigated.

It would certainly be desirable to have consistent behavior at compile time and
at runtime, but this is not a top-priority issue.

> I think you could reasonably interpret the spec as meaning that a trailing zero
> is added to the end of string literals by the linker, not by the compiler. It's
> only in CTFE that you can tell the difference.

In this case, the spec should definitely be fixed.

-- 


2012-06-12 Thread d-bugmail


Don  changed:

   What|Removed |Added
-------+--------+---------------------
     CC|        |clugd...@yahoo.com.au


--- Comment #1 from Don  2012-06-12 09:48:41 PDT ---
This behaviour is intentional. Pointer operations are strictly checked in CTFE.
It's the same as doing 

int n = 0;
char c = ""[n];

which generates an array bounds error at runtime.

Is the terminating null character still in the spec? A long time ago it was in
there, but now I can only find two references to it in the current spec (in
'arrays' and in 'interfacing to C'), and they both relate to printf. 

The most detailed is in 'interface to C', which states:
"string literals, when they are not part of an initializer to a larger data
structure, have a '\0' character helpfully stored after the end of them."

which is pretty weird. These funky semantics would be difficult to implement in
CTFE, and I doubt they are desirable. Here's an example:

const(char)[] foo(char[] s) { return "abc" ~ s; }

immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE

bool baz()
{
immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.

return true;
}
static assert(baz());

---> bar is zero-terminated, bar2 is not, even though they had the same
assignment. When does this magical trailing zero get added?

I think you could reasonably interpret the spec as meaning that a trailing zero
is added to the end of string literals by the linker, not by the compiler. It's
only in CTFE that you can tell the difference.

-- 