[Issue 8185] Pure functions and pointers

2012-07-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #56 from github-bugzi...@puremagic.com 2012-07-01 23:12:33 PDT ---
Commits pushed to master at
https://github.com/D-Programming-Language/d-programming-language.org

https://github.com/D-Programming-Language/d-programming-language.org/commit/59670a7823d066f5146e276bdf5aac7bd93a3f45
Fix for issue# 8185.

This clarifies the definition of pure, since so many people seem to have
a hard time understanding that _all_ that pure means is that the
function cannot access global or static, mutable state or call impure
functions. Everything else with regards to pure is a matter of
implementation-specific optimizations - which does in some cases relate
to full, functional purity, but pure itself does not indicate anything
of the sort.

https://github.com/D-Programming-Language/d-programming-language.org/commit/8cc3ba694bc07ec684f2d1c5a088728aa18e7d93
Merge pull request #128 from jmdavis/pure

Fix for issue# 8185.
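
A minimal sketch of the rule stated in the commit message, not taken from the
pull request itself. The names counter, twice, bump and next are hypothetical
and exist only for illustration; the block can be run with something like
dmd -unittest -main -run example.d.

```d
int counter; // mutable module-level state (hypothetical name)

// Accepted: works only on its parameter and locals.
int twice(int x) pure
{
    return 2 * x;
}

// Also accepted: writing through an argument is not "global state".
void bump(int* p) pure
{
    *p += 1;
}

// Rejected by the compiler if uncommented, because a pure function
// may not read or write the mutable variable `counter`:
// int next() pure { return ++counter; }

unittest
{
    int local = 1;
    bump(&local);
    assert(local == 2);
    assert(twice(local) == 4);
}
```

Note that bump is pure in this sense even though it has a visible effect on
the caller's data; whether calls to it can be optimized is a separate question.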

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
--- You are receiving this mail because: ---


[Issue 8185] Pure functions and pointers

2012-07-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Walter Bright bugzi...@digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzi...@digitalmars.com
         Resolution|                            |FIXED




[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #20 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 11:54:40 MSD ---
(In reply to comment #19)
 I honestly don't understand why much in the way of examples are needed.

OK. I have written some examples. Are they too obvious to be in the docs?
Honestly, I'll be amazed if most D programmers have thought about most of
these cases.

Examples:

Pure functions (not sure if @system only or @safe too) in D are guaranteed to
be pure only if used according to their documentation. There are no guarantees
otherwise.
---
/// The b argument has to be true, or the result will depend on global state.
size_t f(size_t i, bool b) pure; // strongly pure

void main()
{
    size_t i1 = f(1, false); // can depend on global state
    size_t i2 = f(1, false); // f is free to produce a different result here
    // And if the second f call is optimized out using i2 = i1
    // (because f is strongly pure), the program will behave
    // differently in release mode, so be careful.
}
---

For @system pure functions, it's your responsibility to pass correct arguments
to functions. These functions (even strongly pure ones) can be impure for
incorrect arguments and can even result in undefined behavior.
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

/// cstr must be zero-ended
size_t myStrlen(in char[] cstr) pure // strongly pure
{
    return strlen(cstr.ptr);
}

void main()
{
    char[3] str = "abc";
    // str isn't zero-ended, so the myStrlen call
    // results in undefined behavior.
    size_t l1 = myStrlen(str);
    size_t l2 = myStrlen(str); // can give a different result
}
---

@system strongly pure functions often can't be optimized out:
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

void f(in char* cstr, int* n) pure
{
    // strlen has to be executed on every iteration,
    // because the compiler doesn't know whether n is
    // connected with cstr in some way
    for (size_t i = 0; i < strlen(cstr); ++i)
    {
        *n += cstr[i];
    }
}
---

The same applies even if these functions have no pointers/arrays in their
signatures:
---
size_t f(size_t) nothrow pure; // strongly pure

void g(size_t i1, ref size_t i2) pure
{
    // f has to be executed on every iteration,
    // because the compiler doesn't know whether i1 is
    // connected with i2 in some way (f can expect
    // that its argument is the address of i2)
    for (size_t i = 0; i < f(i1); ++i)
    {
        i2 *= 3;
    }
}
---

One has to check carefully whether a function is strongly pure by its signature
(the compiler is guaranteed to determine a function's purity type from its
signature only, to prevent different behavior between cases with/without a
signature):
---
void f(size_t x) pure // strongly pure, can't have side effects
{
    *cast(int*) x = 5; // undefined behavior
}


__gshared int tmp;
void g(size_t x, ref int dummy = tmp) pure // weakly pure, can have side effects
{
    *cast(int*) x = 5; // correct
}
---



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #22 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 13:07:21 MSD ---
(In reply to comment #21)
 Why would you be marking a function as pure if it can access global state? The
 compiler would flag that unless you cheated through casts or the use of
 extern(C) functions where you marked the declaration as pure but not the
 definition (since pure isn't part of the name mangling for extern(C)
 functions).

From your comment before:
 As for stuff like strlen, in that case, you're doing the @system thing of
 saying that yes, I know what I'm doing. I know that this function isn't marked
 as pure, because it's a C function, but I also know that it _is_ actually 
 pure.

`strlen` is now pure (marked by Andrei Alexandrescu) and it can access global
state when used with a non-zero-ended string. I just made the situation more
evident.

 Also, none of your examples using in are strongly pure. At present, the
 parameters must be _immutable_ or implicitly convertible to immutable for the
 function to be strongly pure. The only way that const or in would work is if
 they were passed immutable arguments, but the compiler doesn't treat that as
 strongly pure right now.

From your comment before:
 When the compiler can guarantee that all of a pure function's arguments
 _cannot_ be altered by that function, _then_ it is strongly pure.

So I just don't know how strlen can change its argument...

 @system has _nothing_ to do with purity. There's no need to bring it up.

IMHO, it does, because @safe and @system pure functions look very different
to me. And yes, I can be wrong.

 It's just that @system will let you do dirty tricks (such as casting) to get
 around pure. Certainly, an @system pure function isn't pure based on its
 arguments unless it's doing something very wrong. The function would have to
 be specifically trying to break purity to do that, and then it's the same as
 when you're dealing with const and the like. There's no need to even bring it
 up. It's a given with _anything_ where you can cast to do nasty @system stuff.

Is strlen doing something very wrong or specifically trying to break purity
when it accesses random memory?

 Adding a description of weakly pure vs strongly pure to the documentation may
 be valuable, but adding any examples like these would be pointless without it.
 Also, if you'll notice, the documentation in general is very light on
 unnecessary examples. It explains exactly what the feature does and gives
 minimal examples on it. Any that are added should add real value.
 
 pure functions cannot access global mutable state or call any other functions
 which aren't pure. The compiler will give an error if a function marked as
 pure does either of those things. What the compiler does in terms of
 optimizations is up to its implementation. I don't see how going into great
 detail on whether this particular function signature or that particular
 function signature can be optimized is going to help much.

Yes, it will help, because as I wrote:
 Once it will have examples showing what asserts have to/may/shouldn't pass
 and/or (I prefer and) what optimizations can be done.

optimizations = what asserts pure functions should conform to = what a pure
function is

 It seems to me that the core problem is that many programmers are having a
 hard time understanding that all that pure means is that pure functions cannot
 access global mutable state or call any other functions which aren't pure.
 They keep thinking that it means more than that, and it doesn't. The compiler
 will use that information to do optimizations where it can (which aren't even
 always related to strongly pure - e.g. combining const and weakly pure enable
 optimizations, just not the kind which elide function calls). If programmers
 would just believe what the description says about what pure means and stop
 trying to insist that it must mean more than that, I think that they would be
 a lot less confused. In some respects, discussing stuff like weakly pure and
 strongly pure just confuses matters. They're effectively implementation
 details of how some pure-related optimizations are triggered.

strlen and other system functions do access global state in some cases, yet
strlen is marked pure. And I'm confused when there is no explanation of _how
exactly pure functions can access global state_.

 It's so very simple and understandable if you leave it at something like pure
 functions cannot access global or static variables which are at all mutable -
 either by the pure function or anything else - and they cannot call other
 functions which are not pure.

No. They call everything they want and do everything they want (see druntime
pull 198). They just should behave like pure functions for a user. And I
don't clearly understand what it means to behave like a pure function.
That's why this issue was created. That's why I want to see what asserts should
pure 

[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #23 from timon.g...@gmx.ch 2012-06-04 02:22:54 PDT ---
(In reply to comment #14)
 (In reply to comment #13)
  (In reply to comment #12)
   (In reply to comment #11)
    Pointers may only access their own memory blocks, therefore exactly
    those blocks participate in argument value and return value.
   
   What does 'their own memory block' mean?
  
  The allocated memory block it points into.
 
 But, as the bounds are unknown to the compiler, it does not have this
 information; it has to assume everything is reachable via the pointer.

1. It does not need the information. Dereferencing a pointer outside the valid
   bounds results in undefined behavior. Therefore the compiler can just ignore
   the possibility.
2. It can gain some information at the call site. Eg:

int foo(const(int)* y)pure;

void main(){
int* x = new int;
int* y = new int;
auto a = foo(x);
auto b = foo(y);
auto c = foo(x);

assert(a == c);
}

3. Aliasing is the classic optimization killer even without 'pure'.
4. Invalid use of pointers can break every other aspect of the type system.
   Why single out 'pure' ?
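
To make point 2 concrete, here is a hedged, runnable variant of the snippet
above. The body given to foo is an assumption made purely for illustration
(the original only declares it). The assertion holds because x and y are
distinct allocations and nothing writes to *x between the two calls.

```d
// Hypothetical body: reads exactly the int that y points to.
int foo(const(int)* y) pure
{
    return *y + 1;
}

unittest
{
    int* x = new int;
    int* y = new int;
    *x = 1;
    *y = 2;

    auto a = foo(x);
    auto b = foo(y);
    auto c = foo(x); // *x is unchanged since the first call

    assert(a == c);
    assert(b == 3);
}
```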

 This is
 why i suggested above that only dereferencing a pointer should be allowed in
 pure functions.
 

This is too restrictive.


 And one way to make it work is to forbid dereferencing pointers and require
 fat ones. Then the bounds would be known.

The bounds are usually known only at runtime.
The compiler does not have more to work with.
From the compiler's point of view, an array access out of bounds
and an invalid pointer dereference are very similar.

   and, if the access isn't restricted somehow, makes the
   function dependent on global memory state.
  
  ? A function independent of memory state is useless.
 
 int n(int i) {return i+42;}
 

Where do you store the parameter 'i' if not in some memory location?


  f4 _is_ 'pure' (it does not access non-immutable free variables). The
  compiler is not allowed to perform optimizations that change defined program
  behavior.
 
 f4 isn't pure, by any definition - it depends on (or in this example modifies)
 state, which the caller may not even consider reachable.

Then it is the caller's fault. What is considered reachable is well-defined,
and f4 must document its valid inputs.

 The compiler can
 assume that a pure function does not access any mutable state other than what
 can be directly or indirectly reached via the arguments -- that is what
 function purity is all about. If the compiler has to assume that a pure
 function that takes a pointer argument can read or modify everything, the
 pure tag becomes worthless.

No pointer _argument_ necessary.

int foo()pure{
enum int* everything = cast(int*)...;
return *everything;
}

As I already pointed out, unsafe language features can be used to subvert the
type system. If pure functions should be restricted to the safe subset, they
can be marked @safe, or compiled with the -safe compiler switch.

 And what's worse, it allows other truly pure
 function to call our immoral one. 
 

Nothing wrong with that.

 Hmm, another way out of this could be to require all pointer args in a pure
 function to target 'immutable' - but that, again, seems too limiting; bool
 f(in Struct* s) could not be pure.

This is why the restriction was dropped.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #24 from timon.g...@gmx.ch 2012-06-04 02:41:16 PDT ---
(In reply to comment #22)
 
 `strlen` is now pure (marked by Andrei Alexandrescu) and it can access global
 state once used with non-zero-ended string. I just made situation more 
 evident.
 

It may not be used with a non-zero-ended string.
See eg. http://www.cplusplus.com/reference/clibrary/cstring/strlen/



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #25 from klickverbot c...@klickverbot.at 2012-06-04 03:07:52 PDT 
---
I am partly playing Devil's advocate here, but:

(In reply to comment #23)
  This is
  why i suggested above that only dereferencing a pointer should be allowed in
  pure functions.
  
 This is too restrictive.

Why?

  And one way to make it work is to forbid dereferencing pointers and require 
  fat
  ones. Then the bounds would be known.
 
 The bounds are usually known only at runtime.
 The compiler does not have more to work with.
 From the compiler's point of view, an array access out of bounds
 and an invalid pointer dereference are very similar.

There is an important semantic difference between these two – a slice is a
bounded region of memory, whereas a pointer per se just represents a reference
to a single value.
---
int foo(int* p) pure {
  return *(p - 1); // Is this legal?
}

auto a = new int[10];
foo(a.ptr + 1);
---

   ? A function independent of memory state is useless.
  
  int n(int i) {return i+42;}
 Where do you store the parameter 'i' if not in some memory location?

In a register, but that's beside the point – which is that the type of i, int,
makes it clear that n depends on exactly four bytes of memory. In »struct Node
{ Node* next; } void foo(Node* n) pure;«, on the other hand, following your
interpretation foo() might depend on an almost arbitrarily large amount of
memory (consider e.g. uninitialized memory in the area between a heap-allocated
Node instance and the end of the block where it resides, which, if interpreted
as Node instance(s), might have »false pointers« to other memory blocks, etc.).

   f4 _is_ 'pure' (it does not access non-immutable free variables). The 
   compiler
   is not allowed to perform optimizations that change defined program 
   behavior.
  
  f4 isn't pure, by any definition - it depends on (or in this example 
  modifies)
  state, which the caller may not even consider reachable.
 
 Then it is the caller's fault. What is considered reachable is well-defined 
 […]

Is it? Could you please repeat the definition then, and point out how this is
clear from the definition of purity according to the spec, »Pure functions are
functions that produce the same result for the same arguments«.

 and f4 must document its valid inputs.
---
/// Passing anything other than `false` is illegal.
int g_state;
void foo(bool neverTrue) pure {
   if (neverTrue) g_state = 42;
}
---

Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but
isn't this too permissive an interpretation, as the type system can't
actually guarantee it? Shouldn't a cast to pure at the _call site_ rather be
required if called with known good values, just as in other cases where the
type system can't prove a certain invariant, but the programmer can? Purity by
convention works just fine without the pure keyword as well…



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #26 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-04 03:19:22 
PDT ---
I'd actually argue that the line Pure functions are functions that produce the
same result for the same arguments should be removed from the spec.
Ostensibly, yes. The same arguments will result in the same result, but that
doesn't really have anything to do with how pure is defined. It's more like
it's a side effect of the fact that you can't access global mutable state. It's
true that the compiler will elide additional function calls within an
expression in cases where the same function is called multiple times with the
same arguments and the compiler can guarantee that the result will be the same,
but that's arguably an implementation detail of the optimizer.

While the origin and original motivation for pure in D was to enable
optimizations based on functional purity (multiple calls to the same function
with the same arguments are guaranteed to have the same results), that's not
really what pure in D does now, and talking about that clouds the issue
something awful, as this bug report demonstrates.

Pure means solely that the function cannot access any global or static
variables which can be mutated either directly or indirectly once instantiated
and that the function cannot call any other functions which are not pure. That
enables the whole same result for the same arguments thing, but it does _not_
mean that in and of itself. The simple fact that an argument could have a
function on it which returns the value of a mutable global variable without
that variable being part of its state at all negates that.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #27 from klickverbot c...@klickverbot.at 2012-06-04 03:38:12 PDT 
---
(In reply to comment #26)
 While the origin and original motivation for pure in D was to enable
 optimizations based on functional purity (multiple calls to the same function
 with the same arguments are guaranteed to have the same results), that's not
 really what pure in D does now, and talking about that clouds the issue
 something awful, as this bug report demonstrates.

I think you've provided a good explanation of the high-level design of the pure
keyword, more than once, but it seems that you are missing that this issue, at
least as stated in comment 3, is actually about a very specific detail: The
extent to which memory reachable by manipulating passed-in pointers is still
considered »local«, i.e. accessible by pure functions.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #29 from timon.g...@gmx.ch 2012-06-04 03:39:52 PDT ---
(In reply to comment #25)
 I am partly playing Devil's advocate here, but:
 
 (In reply to comment #23)
   This is
   why i suggested above that only dereferencing a pointer should be allowed 
   in
   pure functions.
   
  This is too restrictive.
 
 Why?

Because safety is an orthogonal concern. eg. strlen is a pure function.
By the same way of reasoning, all unsafe features could be banned in all parts
of the code, not just in pure functions.

 
   And one way to make it work is to forbid dereferencing pointers and 
   require fat
   ones. Then the bounds would be known.
  
  The bounds are usually known only at runtime.
  The compiler does not have more to work with.
  From the compiler's point of view, an array access out of bounds
  and an invalid pointer dereference are very similar.
 
 There is an important semantic difference between these two – a slice is a
 bounded region of memory, whereas a pointer per se just represents a reference
 to a single value.

Yes, 'per se'. Effectively, it references all memory in the same allocated
memory block. (This is also the view taken by the GC.)

 ---
 int foo(int* p) pure {
   return *(p - 1); // Is this legal?
 }
 

Whether it is legal depends on whether or not *(p-1) is part of the same memory
block. A conservative analysis (as is done in @safe code) would have to flag
the access as illegal.

 auto a = new int[10];
 foo(a.ptr + 1);
 ---

a.ptr is a pointer. The arithmetic is flagged as illegal in @safe code even
though it is safe here. What do the examples show?


 
? A function independent of memory state is useless.
   
   int n(int i) {return i+42;}
  Where do you store the parameter 'i' if not in some memory location?
 
 In a register, but that's besides the point

Indeed, because a register is just memory after all.

 – which is that the type of i, int,
 makes it clear that n depends on exactly four bytes of memory. In »struct Node
 { Node* next; } void foo(Node* n) pure;«, on the other hand, following your
 interpretation foo() might depend on an almost arbitrarily large amount of
 memory (consider e.g. uninitialized memory in the area between a 
 heap-allocated
 Node instance and the end of the block where it resides,
 which, if interpreted as Node instance(s), might have »false pointers« to 
 other memory blocks, etc.).
 

The language does not define such a thing. Accessing this area therefore
results in undefined behavior.

f4 _is_ 'pure' (it does not access non-immutable free variables). The 
compiler
is not allowed to perform optimizations that change defined program 
behavior.
   
   f4 isn't pure, by any definition - it depends on (or in this example 
   modifies)
   state, which the caller may not even consider reachable.
  
  Then it is the caller's fault. What is considered reachable is well-defined 
  […]
 
 Is it? Could you please repeat the definition then,

It is written down in the C standard. There is no formal specification for D.

 and point out how this is
 clear from the definition of purity according to the spec,

This would not be defined in the pages about purity, but rather in the pages
about pointer arithmetics, which are missing, presumably because they would be
the same as in C.

 »Pure functions are
 functions that produce the same result for the same arguments«.
 

This is not a definition of the 'pure' keyword. It relies on informal terms
such as 'the same' and does not require annotation of a function. Therefore the
sentence should be dropped from the documentation.

If a function is marked with 'pure', then it may not reference mutable free
variables.

  and f4 must document its valid inputs.
 ---
 /// Passing anything other than `false` is illegal.
 int g_state;
 void foo(bool neverTrue) pure {
if (neverTrue) g_state = 42;
 }
 ---
 
 Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, 
 but

No, because it is trivial to devise an equivalent implementation that does not
require the compiler to read documentation comments:

int g_state;
void foo(bool neverTrue) pure in{assert(!neverTrue);} body { }

The same does not hold for 'strlen', therefore the analogy immediately breaks
down.

 isn't this too permissive of an interpretation, as the type system can't
 actually guarantee it? Shouldn't rather a cast to pure at the _call site_ be
 required if called with know good values, just as in other cases where the 
 type
 system can't prove a certain invariant, but the programmer can?

The type system of an unsafe language cannot prove _any_ invariants, because
unsafe operations may result in undefined behavior. This does not imply we'd
better have to drop the entire type system.

 Purity by convention works just fine without the pure keyword as well…

This is not only about purity by convention, it is about memory safety by
convention. In @safe code, all the 

[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #28 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-04 03:39:33 
PDT ---
https://github.com/D-Programming-Language/d-programming-language.org/pull/128



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #30 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-04 03:42:40 
PDT ---
 I think you've provided a good explanation of the high-level design of the
 pure keyword, more than once, but it seems that you are missing that this
 issue, at least as stated in comment 3, is actually about a very specific
 detail: The extent to which memory reachable by manipulating passed-in
 pointers is still considered »local«, i.e. accessible by pure functions.

pure doesn't restrict pointers in any way shape or form. That's an
@safe/@trusted/@system issue, and is completely orthogonal to pure.
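
A small sketch of that orthogonality, with hypothetical function names
(before, first) chosen only for this example. Both functions are pure; what
differs is the safety attribute, and it is @safe rather than pure that rules
out the pointer arithmetic.

```d
// @system pure: pointer arithmetic is permitted. Staying inside the
// allocation is the caller's responsibility.
int before(const(int)* p) pure @system
{
    return *(p - 1);
}

// @safe pure: the slice carries its bounds, so no pointer arithmetic
// is needed. Writing `*(s.ptr - 1)` here would be rejected because
// the function is @safe, not because it is pure.
int first(const(int)[] s) pure @safe
{
    return s[0];
}

unittest
{
    auto a = [10, 20, 30];
    assert(before(a.ptr + 1) == 10); // reads a[0], same memory block
    assert(first(a) == 10);
}
```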



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #31 from klickverbot c...@klickverbot.at 2012-06-04 06:18:02 PDT 
---
(In reply to comment #30)
 pure doesn't restrict pointers in any way shape or form. That's an
 @safe/@trusted/@system issue, and is completely orthogonal to pure.

I guess I _might_ have understood what purity entails and what it doesn't… To
quote myself, the question here is the extent to which memory reachable by
manipulating passed in pointers is still considered local, i.e. accessible by
pure functions. This, conceptually, has nothing to do with
@safe/@trusted/@system, even though @safe code cannot manipulate pointers for
other reasons.

There are two options: Either, allow pure functions taking pointers to read
other memory locations in the same block of allocated values, or restrict
access to just the data directly pointed at (which incidentally is also what
@safe does, but, again, that's not relevant). Both options are equally valid,
and I think the current »spec« is not clear on which one should apply.

The first option, which is currently implemented in DMD, allows functions like
strlen() to be pure. On the other hand, it also makes the
semantics/implications of `pure` a lot more complex, because it links it to
something which is fundamentally not expressible by the type system, namely
that for any level of indirection, surrounding parts of the memory might be
accessible or not, depending on how it was originally allocated. This is
assuming C semantics, because, as Timon mentioned as well, OTOH the D docs
don't have a formal definition for this at all.

For example, consider »struct Node { int val; Node* next; } int foo(in Node*
head) pure;«. Using the first rule, it is almost impossible to figure out
statically what parts of the program state »foo(someHead)« depends on, because
if any of the Node instances in the chain was allocated as part of a contiguous
block (i.e. array), it would be legal for foo() to read them as well, even
though the function calling foo() might not even have been involved in the
construction of the list. Thus, the compiler is forced to always assume the
worst case in terms of optimization (at least without elaborate DFA), which, in
most D programs, is needlessly conservative.

The second option avoids such complications, and allows function calls with
parameters on the heap (and thus pointers) to receive the same kind of
optimizations as if the parameters were passed on the stack, which might be
impractical. It is also the expected behavior if you are thinking of a pointer
literally just as an indirection to a single value stored somewhere else.

Personally, I am not sure what is the better choice; the second option seems
like the cleaner design, but I can see the merits of the first one as well. But
that's not my point – I am just trying to convince you that the »spec« (or
whatever it should really be called) needs improvement in this area, because it
frequently confuses people. Your revised version (#128) doesn't define »through
their arguments« either, yet this is the crucial point.
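
The Node case can be sketched as follows. This is an illustration under the
first option only (the one the comment says DMD currently implements); the
function names sumList and peekNeighbour are hypothetical. The point is that
peekNeighbour reads a node the list never links to, yet it stays inside the
allocation that head points into.

```d
struct Node { int val; Node* next; }

// Follows only the `next` links.
int sumList(const(Node)* head) pure
{
    int s = 0;
    for (auto n = head; n !is null; n = n.next)
        s += n.val;
    return s;
}

// Reads the element stored right after `head` in the same block.
// Only valid if `head` points into an array with a following element.
int peekNeighbour(const(Node)* head) pure
{
    return (head + 1).val;
}

unittest
{
    auto nodes = new Node[3];
    nodes[0] = Node(1, &nodes[2]); // list is nodes[0] -> nodes[2]
    nodes[1] = Node(100, null);    // not part of the list
    nodes[2] = Node(2, null);

    assert(sumList(&nodes[0]) == 3);
    assert(peekNeighbour(&nodes[0]) == 100);
}
```

A caller that only sees the signatures cannot tell these two functions apart,
which is why the compiler would have to assume the wider dependency.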



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Don clugd...@yahoo.com.au changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugd...@yahoo.com.au


--- Comment #32 from Don clugd...@yahoo.com.au 2012-06-04 06:31:41 PDT ---
(In reply to comment #31)
 (In reply to comment #30)
  pure doesn't restrict pointers in any way shape or form. That's an
  @safe/@trusted/@system issue, and is completely orthogonal to pure.
 
 I guess I _might_ have understood what purity entails and what it doesn't… To
 quote myself, the question here is the extent to which memory reachable by
 manipulating passed in pointers is still considered local, i.e. accessible by
 pure functions. This, conceptually, has nothing to do with
 @safe/@trusted/@system, even though @safe code cannot manipulate pointers for
 other reasons.

I
 
 There are two options: Either, allow pure functions taking pointers to read
 other memory locations in the same block of allocated values, or restrict
 access to just the data directly pointed at (which incidentally is also what
 @safe does, but, again, that's not relevant). Both options are equally valid,
 and I think the current »spec« is not clear on which one should apply.
 
 The first option, which is currently implemented in DMD, allows functions like
 strlen() to be pure. On the other hand, it also makes the
 semantics/implications of `pure` a lot more complex, because it links it to
 something which is fundamentally not expressible by the type system, namely
 that for any level of indirection, surrounding parts of the memory might be
 accessible or not, depending on how it was originally allocated. This is
 assuming C semantics, because, as Timon mentioned as well, OTOH the D docs
 don't have a formal definition for this at all.
 
 For example, consider »struct Node { int val; Node* next; } int foo(in Node*
 head) pure;«. Using the first rule, it is almost impossible to figure out
 statically what parts of the program state »foo(someHead)« depends on, because
 if any of the Node instances in the chain was allocated as part of a 
 contiguous
 block (i.e. array), it would be legal for foo() to read them as well, even
 though the function calling foo() might not even have been involved in the
 construction of the list. Thus, the compiler is forced to always assume the
 worst case in terms of optimization (at least without elaborate DFA), which, in
 most D programs, is needlessly conservative.

That's correct. You should not expect *any* optimizations from weakly pure
functions. The ONLY purpose of weakly pure functions is to increase the number
of strongly pure functions. In all other respects, they are no different from
an impure function.
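
To make the weakly/strongly pure relationship concrete, here is a minimal
sketch. The names `accumulate` and `sumOf` are made up for illustration and do
not come from this thread. `accumulate` is weakly pure because it mutates its
`ref` argument; `sumOf` takes only immutable data, so it is strongly pure, yet
it is free to call the weakly pure helper:

```d
// Weakly pure: touches no global or static mutable state, but it may
// mutate data reachable through its parameters (here, `acc`).
void accumulate(ref int acc, int x) pure nothrow @safe
{
    acc += x;
}

// Strongly pure: the only parameter is an array of immutable elements,
// so the result depends on nothing but the argument's contents.
int sumOf(immutable(int)[] values) pure nothrow @safe
{
    int total = 0;
    foreach (v; values)
        accumulate(total, v); // calling a weakly pure function is allowed
    return total;
}

unittest
{
    immutable(int)[] data = [1, 2, 3];
    assert(sumOf(data) == 6);
    assert(sumOf(data) == sumOf(data));
}
```

Without weak purity, `sumOf` would have to inline the mutation itself or lose
its `pure` attribute.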



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #33 from klickverbot c...@klickverbot.at 2012-06-04 06:43:32 PDT 
---
(In reply to comment #32)
 That's correct. You should not expect *any* optimizations from weakly pure
 functions. The ONLY purpose of weakly pure functions is to increase the number
 of strongly pure functions. In all other respects, they are no different from
 an impure function.

Const-pure functions invoked with immutable _arguments_ (even though parameters
might only be const) can receive exactly the same amount of optimizations. Even
if not implemented in DMD today (as are many other possible purity-related
optimizations), this is very useful, because otherwise functions would have to
accept immutable values just for the sake of optimization even though they
could work with const values just as well otherwise.
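
A minimal sketch of the situation described here (the function `total` and the
data are hypothetical): the parameter is only `const`, but when the caller
passes immutable data, nothing reachable from the argument can change between
two calls. Whether a given compiler actually exploits that is a separate
matter.

```d
// const-pure: accepts mutable, const, or immutable arrays alike.
int total(const(int)[] values) pure nothrow @safe
{
    int sum = 0;
    foreach (v; values)
        sum += v;
    return sum;
}

unittest
{
    // Immutable argument, const parameter: repeated calls must agree.
    immutable(int)[] fixed = [4, 5, 6];
    assert(total(fixed) == total(fixed));

    // Mutable argument: the data can change between calls, so the
    // same reasoning does not apply.
    int[] scratch = [4, 5, 6];
    immutable before = total(scratch);
    scratch[0] = 40;
    assert(total(scratch) != before);
}
```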



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #34 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 19:08:08 MSD ---
(In reply to comment #33)
 (In reply to comment #32)
  That's correct. You should not expect *any* optimizations from weakly pure
  functions. The ONLY purpose of weakly pure functions is to increase the number
  of strongly pure functions. In all other respects, they are no different from
  an impure function.
 
 Const-pure functions invoked with immutable _arguments_ (even though parameters
 might only be const) can receive exactly the same amount of optimizations. Even
 if not implemented in DMD today (as are many other possible purity-related
 optimizations), this is very useful, because otherwise functions would have to
 accept immutable values just for the sake of optimization even though they
 could work with const values just as well otherwise.

Have you noticed that, as I wrote in comment 20, strongly pure but unsafe
functions like
---
size_t f(size_t) nothrow pure;
---
almost always can't be optimized out either?



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #35 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 19:18:33 MSD ---
For Jonathan M Davis: here (as before), when I say "optimization" I mean
"doesn't behave in a way that can be optimized", which in turn means "doesn't
behave in the way that is expected/desired" (IMHO), etc.

Example (for everybody):
---
int f(size_t) pure;

__gshared int tmp;
void g(size_t, ref int dummy = tmp) pure;

void h(size_t a, size_t b) pure
{
    int res = f(a);
    g(b);
    assert(res == f(a)); // may fail, no guarantees by language!
}
---

So pure looks to me more than just useless. It looks dangerous because it
confuses people and forces them to think that the second `assert` will pass. At
least, with existing docs (or with pull 128).



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #36 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-04 08:45:00 
PDT ---
 int f(size_t) pure;

 __gshared int tmp;
 void g(size_t, ref int dummy = tmp) pure;

 void h(size_t a, size_t b) pure
 {
int res = f(a);
g(b);
assert(res == f(a)); // may fail, no guaranties by language!
}

Your g(b) causes h to be impure, because it accesses tmp, which is __gshared.
Also, as far as eliding additional calls to pure functions, at present, they
only occur within the same line, and I think that may only ever occur within
the same expression (it's either expression or statement, I'm not sure which).
So, the eliding of additional pure function calls is going to be quite rare.
The _primary_ benefit of pure is how it enables you to reason about your code.
You _know_ that f doesn't mess with anything other than the argument that you
passed to it without having to look at its body at all.

Oh, and the assertion _is_ guaranteed to pass. a and res are both value types.
Neither res nor a are passed to anything or accessed in any way other than in
the lines with the calls to f, and even if g were impure, and it screwed
with whatever argument was passed as the first argument to the h call, it
wouldn't be able to mess with the value of a, because it was already copied.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #37 from klickverbot c...@klickverbot.at 2012-06-04 09:03:18 PDT 
---
(In reply to comment #34)
 […] strong unsafe pure functions […]

Please note that @safe-ty of a function has nothing to do with purity. Yes, in a
@system/@trusted pure function, it's easy to do impure things, but if you do,
it's your fault, not that of the language/type system.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #38 from art.08...@gmail.com 2012-06-04 09:08:38 PDT ---
(In reply to comment #23)
 (In reply to comment #14)
  (In reply to comment #13)
   (In reply to comment #12)
(In reply to comment #11)
 Pointers may only access their own memory blocks, therefore exactly those
 blocks participate in argument value and return value.

What does 'their own memory block' mean?
   
   The allocated memory block it points into.
  
  But, as the bounds are unknown to the compiler, it does not have this
  information, it has to assume everything is reachable via the pointer.
 
 1. It does not need the information. Dereferencing a pointer outside the valid
 bounds results in undefined behavior. Therefore the compiler can just ignore
 the possibility.

The problem is there are no valid bounds. Unless you'd like to declare
   (char* p) {return p[1];}
as invalid, which as you yourself say is restrictive (but IMO acceptable for
pure functions, at least the ones that are automatically inferred as pure).

 2. It can gain some information at the call site. Eg:
 
 int foo(const(int)* y)pure;
 
 void main(){
 int* x = new int;
 int* y = new int;
 auto a = foo(x);
 auto b = foo(y);
 auto c = foo(x);
 
 assert(a == c);
 }

According to certain replies in this report, that assertion could fail. :) 

But i get what you're saying - now consider this foo() definition instead:

   int foo()(const(int)* y) {
  int r;
  foreach (i; 0..size_t.max)
 r += y[i];
  return r;
   }

   /* same main () */

The compiler will treat foo() as pure, so if it would be able to act on the
a==c assumption above, it could also do the same here. And now it would be
completely wrong - the function doesn't even try to pretend that it's pure, yet
it will be inferred as if it were and there's no (clean) way to prevent that.
If the compiler optimizes based on a==c, it will miscompile the program.
This is why the restrictions on what is accessed via a pointer in a pure
function are necessary. Note it only matters for templates/literals/lambdas, ie
the cases where purity is inferred; the programmer can always add the purity
tag when he knows it is (logically) safe (eg most C string functions).

And yes, my example code doesn't make sense as-is, but it only serves to
illustrate the problem, there are sane implementations of foo(T*p) which under
the right conditions will have the same issues.

BTW, is my foo() above @safe? According to the compiler here - it is.


 3. Aliasing is the classic optimization killer even without 'pure'.

Yes. Maybe it's a good thing that D doesn't attempt to define it, given the
amount of confusion something like pure causes...


 4. Invalid use of pointers can break every other aspect of the type system.
Why single out 'pure' ?

It has nothing to do with invalid use of pointers, unless, again, p[1] is
deemed invalid.


  This is
  why i suggested above that only dereferencing a pointer should be allowed in
  pure functions.
  
 
 This is too restrictive.

What else do you want to be able to do with a pointer in a pure function?
Dereferencing it and working with the value itself should work, anything else?
Note that you should be able to explicitly tell the compiler to assume
something is pure even when the code accesses more than just the pointed-to
element.


  And one way to make it work is to forbid dereferencing pointers and require fat
  ones. Then the bounds would be known.
 
 The bounds are usually known only at runtime.
 The compiler does not have more to work with.
 From the compiler's point of view, an array access out of bounds
 and an invalid pointer dereference are very similar.

Having well defined aliasing rules would help, yes, but I think that's beyond
the scope of this bug.


and, if the access isn't restricted somehow, makes the
function dependent on global memory state.
   
   ? A function independent of memory state is useless.
  
  int n(int i) {return i+42;}
  
 
 Where do you store the parameter 'i' if not in some memory location?

I said global memory state. The parameters are *local* state, just like
variables - they can not escape (you can't return their address) and the values
depend only on function inputs. Arguments containing references can be seen as
part of the global state, but those are explicitly defined as inputs that the
function depends on. And that definition wrt to pointers is exactly what this
bug is about.


   f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
   is not allowed to perform optimizations that change defined program behavior.
  
  f4 isn't pure, by any definition - it depends on (or in this example modifies)
  state, which the caller may not even consider reachable.
 
 Then it is the caller's fault. What is considered reachable is well-defined,
 and f4 must document its valid inputs.


[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #39 from klickverbot c...@klickverbot.at 2012-06-04 09:13:14 PDT 
---
(In reply to comment #38)
 BTW, is my foo() above @safe? According to the compiler here - it is.

If so, please open a new issue – this is clearly a bug.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #40 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 20:27:24 MSD ---
(In reply to comment #36)
  int f(size_t) pure;
 
  __gshared int tmp;
  void g(size_t, ref int dummy = tmp) pure;
 
  void h(size_t a, size_t b) pure
  {
 int res = f(a);
 g(b);
 assert(res == f(a)); // may fail, no guaranties by language!
 }
 
 Your g(b) causes h to be impure, because it accesses tmp, which is __gshared.

Yes, my mistake. Let's call g(b, b).

 Also, as far as eliding additional calls to pure functions, at present, they
 only occur within the same line, and I think that may only ever occur within
 the same expression (it's either expression or statement, I'm not sure which).
 So, the eliding of additional pure function calls is going to be quite rare.
 The _primary_ benefit of pure is how it enables you to reason about your code.
 You _know_ that f doesn't mess with anything other than the argument that you
 passed to it without having to look at its body at all.

No, because the assert may not pass. See below.

 Oh, and the assertion _is_ guaranteed to pass. a and res are both value types.
 Neither res nor a are passed to anything or accessed in any way other than in
 the the lines with the calls to f, and even if g were impure, and it screwed
 with whatever argument was passed as the first argument to the h call, it
 wouldn't be able to mess with the value of a, because it was already copied.

Again, the assert may not pass. If it were guaranteed to pass, I would not be
asking this question.
Example:
---
int f(size_t p) pure
{
    return *cast(int*) p;
}

void g(size_t p, ref size_t) pure
{
    ++*cast(int*) p;
}

void h(size_t a, size_t b) pure
{
    int res = f(a);
    g(b, b);
    assert(res == f(a)); // may fail, no guarantees by language!
}

void main()
{
    int a;
    // pass the address of `a` (not its value) disguised as a size_t
    h(cast(size_t) &a, cast(size_t) &a);
}
---



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #41 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-04 09:35:33 
PDT ---
 void g(size_t p, ref size_t) pure
{
++*cast(int*) p;
}

You're casting a size_t to a pointer. That's breaking the type system. The
assertion is guaranteed to pass as long as you don't break the type system.
That's exactly the same as occurs when casting away const. When you subvert the
type system, the compiler can't guarantee anything. It's the _programmer's_ job
at that point to maintain the compiler's guarantees. The compiler is free to
assume that the programmer did not violate those guarantees. If you do, you've
created a bug. This is precisely the sort of thing that comes up when someone
is crazy enough to cast away const on something and try and mutate it. Such an
example is ultimately irrelevant, precisely because it violates the type
system.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #42 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 20:52:56 MSD ---
(In reply to comment #41)
  void g(size_t p, ref size_t) pure
 {
 ++*cast(int*) p;
 }
 
 You're casting a size_t to a pointer. That's breaking the type system. The
 assertion is guaranteed to pass as long as you don't break the type system.
 That's exactly the same as occurs when casting away const.

It isn't, and that is the point! It's explicitly stated that when I cast away
const and then modify the data, the result is undefined. I will be happy if I
have missed that this cast results in undefined behavior too.

 When you subvert the
 type system, the compiler can't guarantee anything. It's the _programmer's_ job
 at that point to maintain the compiler's guarantees. The compiler is free to
 assume that the programmer did not violate those guarantees.

No, it's not. Otherwise every such break of the rules would result in undefined
behavior. E.g. C++ has strict aliasing and can narrow down what function
arguments can refer to, and if a C++ program has the `strlen` source, it can
inline it and move it out of a loop if, e.g., in the loop we only modify an
`int*`, but in D this can't be done because every `int*` can refer to every
`char*`. So C++ supports pure functions better than D. :)

 If you do, you've
 created a bug. This is precisely the sort of thing that comes up when someone
 is crazy enough to cast away const on something and try and mutate it. Such an
 example is ultimately irrelevant, precisely because it violates the type
 system.

Every @system function can do it. It can even be written in assembly language.
I'm just saying here that it doesn't violate the definition of a `pure`
function, and that is the problem. I will be happy once it does violate the
definition.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #43 from Steven Schveighoffer schvei...@yahoo.com 2012-06-04 
10:30:19 PDT ---
(In reply to comment #42)
 It isn't and here is the point! It's explicitly stated that when I'm casting
 away const and then modify data the result is undefined. I will be happy if I'm
 missing that this casting results in undefined result too.

I believe it is undefined to cast a size_t to a pointer and use it as a
pointer.  But I could be wrong.

In any case, pure function optimizations do not conservatively assume you will
be doing that -- the compiler will optimize assuming you do *not* use it as a
pointer.

Whenever you cast, you are telling the compiler "I know what I'm doing." At
that point, you are on your own as far as guaranteeing type safety and that pure
functions are actually pure.

 No it's not. Otherwise every such break of the rules will result in undefined
 behavior. E.g. C++ has strict aliasing and can shrink what function arguments
 can refer to and if C++ program has `strlen` source it can inline and move it
 out of loop if, e.g. in loop we only modify an `int*`, but in D it can't be
 done because every `int*` can refer to every `char*`. So C++ supports pure
 functions better than D. :)

If you don't want the compiler to make bad optimization decisions, then don't
use casting.  At best, this will be implementation defined.

I think you are way overthinking this.  D's compiler and optimizer are based on
a C++ compiler, written by the same person.  Most of the same rules from C++
apply to D.

The compiler does not assume the worst, it assumes the reasonable, until
you tell it otherwise.  In other words, no reasonable developer will write code
like you have, so the compiler assumes you are reasonable.  Using toy examples
to show how the compiler *must* behave does not work.

Yes, maybe this isn't spelled out fully in the spec, and it should be.  But you
are coming at this problem from the wrong end, start with what the compiler
actually *does*, not what you *think it should do* based on the spec.  The
spec, like most software products, is usually the last to be updated when it
comes to additional features, and the new pure rules are quite recent.

The priority of who is right goes like this:

1. TDPL (the book)
2. The reference implementation (DMD)
3. dlang.org



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #44 from Steven Schveighoffer schvei...@yahoo.com 2012-06-04 
10:45:33 PDT ---
(In reply to comment #7)
 In general response to this bug, I'm unsure how pointers should be treated by
 the optimizer.  My gut feeling is the compiler/optimizer should trust the code
 "knows what it's doing," and so should expect that the code implicitly knows
 how much data it can access after the pointer.

After thinking about this for a couple days (and watching the emails pour in
with differing opinions), here is what I think pure functions with pointers
should mean:

For @system or @trusted functions, the definition of what data the pointer has
access to is defined by the programmer, and not expressed in any way to
the type system or the compiler.  In other words, if I have a pointer to
something, the actual data referenced includes any number of bytes before or
after the memory pointed at.  The scope of that data is defined by the
programmer of the function/type, and should be clearly documented to the user
of the function.

For @safe functions, the compiler should allow access only to the specific item
pointed to as defined by the pointed-at type, and nothing else (pointer math is
disallowed, pointer indexing is disallowed, and casting is disallowed).
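
As a rough illustration of the @safe rule described above, the sketch below
checks at compile time that pointer math, pointer indexing, and
integer-to-pointer casts are rejected in @safe code. This assumes a reasonably
recent DMD (older compilers discussed in this thread accepted some of these),
and `readTarget` is a made-up name:

```d
// Reading exactly the pointed-to item is fine in @safe code.
int readTarget(const(int)* p) pure nothrow @safe
{
    return *p;
}

// Assumption: a recent DMD. Each of these function literals is expected to
// be rejected because of its @safe attribute, so `compiles` yields false.
static assert(!__traits(compiles, (int* p) @safe => *(p + 1)));       // pointer math
static assert(!__traits(compiles, (int* p) @safe => p[1]));           // pointer indexing
static assert(!__traits(compiles, (size_t n) @safe => cast(int*) n)); // int-to-pointer cast

unittest
{
    auto p = new int;
    *p = 7;
    assert(readTarget(p) == 7);
}
```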

For pure functions, no conservative assumptions should be made or acted upon
during optimizations that expect the function has access to global data.  In
other words, a @system pure function that accepts a pointer should rightly
assume that the function does *not* access global data, and that whatever data
the function accesses via its pointer was passed via its parameter as expected
by the caller.  If the function incorrectly accesses global data via its
pointer, then it results in undefined behavior.

These expectations and behaviors should be spelled out in the spec.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #45 from klickverbot c...@klickverbot.at 2012-06-04 10:51:45 PDT 
---
(In reply to comment #44)
Still thinking about the rest of the proposal, but:

 […] or @trusted functions […]
If a @trusted function accepts a pointer, it must _under no circumstances_
access anything except for the pointer target, because it can be called from
@safe code.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #46 from Steven Schveighoffer schvei...@yahoo.com 2012-06-04 
10:59:49 PDT ---
(In reply to comment #45)
 (In reply to comment #44)
 Still thinking about the rest of the proposal, but:
 
  […] or @trusted functions […]
 If a @trusted function accepts a pointer, it must _under no circumstances_
 access anything except for the pointer target, because it can be called from
 @safe code.

The point of @trusted is that it is treated as @safe, but can do unsafe things.
 At that point, you are telling the compiler that you know better than it does
that the code is safe.

The compiler is going to assume you did not access anything else beyond the
target, so you have to keep that in mind when writing a @trusted function that
accepts a pointer parameter.

Off the top of my head, I can't think of any valid usage of this, but it
doesn't mean we should necessarily put a restriction on @trusted functions. 
This is a systems language, and @trusted is a tool used to circumvent @safe-ty
when you know it is actually @safe.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #47 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 22:13:05 MSD ---
(In reply to comment #43)
 The compiler does not assume the worst, it assumes the reasonable, until
 you tell it otherwise.  In other words, no reasonable developer will write code
 like you have, so the compiler assumes you are reasonable.  Using toy examples
 to show how the compiler *must* behave does not work.

Come on! A systems language must have strict rules. You have just said that D is
JavaScript.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #48 from klickverbot c...@klickverbot.at 2012-06-04 11:24:10 PDT 
---
(In reply to comment #46)
 (In reply to comment #45)
  If a @trusted function accepts a pointer, it must _under no circumstances_
  access anything except for the pointer target, because it can be called from
  @safe code.
 
  The point of @trusted is that it is treated as @safe, but can do unsafe things.
  At that point, you are telling the compiler that you know better than it does
 that the code is safe.
 
 The compiler is going to assume you did not access anything else beyond the
 target, so you have to keep that in mind when writing a @trusted function that
 accepts a pointer parameter.
 
 Off the top of my head, I can't think of any valid usage of this, but it
 doesn't mean we should necessarily put a restriction on @trusted functions. 
 This is a systems language, and @trusted is a tool used to circumvent @safe-ty
 when you know it is actually @safe.

Sorry, but I think you got this wrong. Consider this example:

---
void gun(int* a) @trusted;

int fun() @safe {
  auto val = new int;
  gun(val);
  return *val;
}
---

Here, calling gun needs to be safe under _any_ circumstances. Thus, the only
memory location which gun is allowed to access is val. If it does so by
evaluating *(a + k), where k = (catalanNumber(5) - meaningOfLife()), that's
fine, it's @trusted, but ultimately k must always be zero. Otherwise, it might
violate the memory safety guarantees that need to hold for fun(). This is
definitely not »defined by the programmer, and not expressed in possible way to
the type system or the compiler«.

Makes sense?



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #49 from art.08...@gmail.com 2012-06-04 11:29:39 PDT ---
As this discussion was mostly about what *should* be happening, I decided to
see what actually *is* happening right now.
It seems that the compiler will only optimize based on pureness if a function
takes an 'immutable T*' argument, even 'immutable(T)*' is enough to turn the
optimization off.
So, right now, it is extremely conservative - and there is no bug in the
implementation. (accessing mutable data via an immutable pointer can be done,
but would be clearly illegal, just as using a cast)

But that also means that a lot of valid optimizations aren't done, making
purity significantly less useful than it could be. Basically, only functions
that don't take any (non-immutable) references as arguments can benefit from
pure. But it also means D can still be incrementally fixed, as long as a sane
definition of function purity is used.

But this bug is a spec issue, hence probably INVALID, as there is no
specification. Sorry for the noise.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #50 from Steven Schveighoffer schvei...@yahoo.com 2012-06-04 
11:35:27 PDT ---
(In reply to comment #47)
 Come on! A systems language must have strict rules. You have just said that D is
 JavaScript.

A systems language is very strict as long as you play within the type system.

Once you use casts, all bets are off.  The compiler can make *wrong
assumptions* and your code may not do what you think it should.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #51 from Steven Schveighoffer schvei...@yahoo.com 2012-06-04 
11:48:22 PDT ---
(In reply to comment #48)
 (In reply to comment #46)
  (In reply to comment #45)
   If a @trusted function accepts a pointer, it must _under no circumstances_
   access anything except for the pointer target, because it can be called from
   @safe code.
  
  The point of @trusted is that it is treated as @safe, but can do unsafe things.
  At that point, you are telling the compiler that you know better than it does
  that the code is safe.
  
  The compiler is going to assume you did not access anything else beyond the
  target, so you have to keep that in mind when writing a @trusted function that
  accepts a pointer parameter.
  
  Off the top of my head, I can't think of any valid usage of this, but it
  doesn't mean we should necessarily put a restriction on @trusted functions. 
  This is a systems language, and @trusted is a tool used to circumvent @safe-ty
  when you know it is actually @safe.
 
 Sorry, but I think you got this wrong. Consider this example:
 
 ---
 void gun(int* a) @trusted;
 
 int fun() @safe {
   auto val = new int;
   gun(val);
   return *val;
 }
 ---
 
 Here, calling gun needs to be safe under _any_ circumstances.

No, it does not.  Once you use @trusted, the compiler stops checking that it's
@safe.

 Thus, the only
 memory location which gun is allowed to access is val. If it does so by
 evaluating *(a + k), where k = (catalanNumber(5) - meaningOfLife()), that's
 fine, it's @trusted, but ultimately k must always be zero. Otherwise, it might
 violate the memory safety guarantees that need to hold for fun(). This is
 definitely not »defined by the programmer, and not expressed in possible way to
 the type system or the compiler«.

Yeah, that's a hard one to spell out in docs.  I'd recommend not writing that
function :)

But there's no way to specify this to the compiler, it must assume you have
communicated it properly.

Here is an interesting example (I pointed it out before in terms of sockaddr):

struct PacketHeader
{
   int nBytes;
   int packetType;
}

struct DataPacket
{
   PacketHeader header = {packetType:5};
   ubyte[1] data; // extends through length of packet
}

How to specify to the compiler that a PacketHeader * with packetType of 5 is
really a DataPacket, and its data member has nBytes bytes in it?

Such a well-described data structure system can be perfectly @safe, as long as
you follow the rules of construction.

Now, in order to ensure any function that receives a PacketHeader * is
@trusted, you will have to control construction of the PacketHeader somehow. 
Perhaps you make PacketHeader an opaque type, and @safe functions can therefore
never muck with the header information, or maybe you mark nBytes and packetType
as private, so it can never be changed outside the module that knows how to
build PacketHeaders.  In any case, it is wrong to assume that there isn't a
valid way to make a @trusted call that is free to go beyond the target.
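
One way to picture such a scheme is sketched below. Everything beyond the two
struct layouts above is an assumption made for illustration: the helper names
`makeDataPacket` and `payload`, the use of a GC-allocated buffer, and the
choice to return an empty slice for other packet types. The idea is that
construction happens only through the helper, so the @trusted accessor can
rely on nBytes describing the bytes that really follow the header.

```d
struct PacketHeader
{
    int nBytes;
    int packetType;
}

struct DataPacket
{
    PacketHeader header = PacketHeader(0, 5);
    ubyte[1] data; // extends through length of packet
}

// Hypothetical constructor: the only place where type-5 headers are built,
// so nBytes always matches the storage allocated behind the header.
PacketHeader* makeDataPacket(const(ubyte)[] bytes) @trusted
{
    auto raw = new ubyte[](DataPacket.data.offsetof + bytes.length);
    auto pkt = cast(DataPacket*) raw.ptr;
    pkt.header = PacketHeader(cast(int) bytes.length, 5);
    pkt.data.ptr[0 .. bytes.length] = bytes[];
    return &pkt.header;
}

// Hypothetical accessor: reads beyond the pointed-to PacketHeader, which is
// only sound if every type-5 header came from makeDataPacket.
ubyte[] payload(PacketHeader* h) @trusted pure
{
    if (h is null || h.packetType != 5 || h.nBytes < 0)
        return null;
    auto pkt = cast(DataPacket*) h;
    return pkt.data.ptr[0 .. cast(size_t) h.nBytes];
}

unittest
{
    ubyte[3] src = [10, 20, 30];
    auto h = makeDataPacket(src[]);
    assert(h.nBytes == 3 && h.packetType == 5);
    assert(payload(h) == src[]);
}
```

In this sketch the fields are still public, so the guarantee only holds once
they are made private to the module, as suggested above.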



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #52 from Steven Schveighoffer schvei...@yahoo.com 2012-06-04 
11:51:14 PDT ---
(In reply to comment #49)
 It seems that the compiler will only optimize based on pureness if a function
 takes an 'immutable T*' argument, even 'immutable(T)*' is enough to turn the
 optimization off.

This is a bug, both should be optimized equally:

void foo(immutable int * _param) pure
{
   immutable(int)* param = _param; // legal
   ... // same code as if you had written void foo(immutable(int)* param)
}



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #53 from klickverbot c...@klickverbot.at 2012-06-04 12:12:16 PDT 
---
(In reply to comment #51)
 (In reply to comment #48)
  Here, calling gun needs to be safe under _any_ circumstances.
 
 No, it does not.  Once you use @trusted, the compiler stops checking that it's
 @safe.

Yes, it does. As you noted correctly, you as the one implementing gun() must
take care of that, the compiler doesn't help you here. But still, you must
ensure that gun() never violates memory safety, regardless of what is passed
in, because otherwise it might cause @safe code to be no longer memory safe.
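
As a minimal illustration of that obligation (elementAt is a hypothetical name,
not the gun() from the earlier comment): the @trusted function does its own
validation, so no argument a @safe caller can construct leads to an
out-of-bounds access.

```d
// The compiler does not check this body, so the bounds check is on the author.
int elementAt(const(int)[] arr, size_t i) @trusted pure nothrow @nogc
{
    if (i >= arr.length)
        return 0;           // refuse instead of reading past the slice
    return *(arr.ptr + i);  // pointer arithmetic, unchecked by the compiler
}

void main() @safe
{
    int[] a = [10, 20, 30];
    assert(elementAt(a, 1) == 20);
    assert(elementAt(a, 99) == 0);
}
```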

 Now, in order to ensure any function that receives a PacketHeader * is
 @trusted, you will have to control construction of the PacketHeader somehow. 
 […]

Okay, iff you are using a pointer more or less exclusively as an opaque handle,
then I guess you are right – I thought only about pointers that are directly
obtainable in @safe code.

But then, please be careful with including something along the lines of »For
@safe functions, the compiler should allow access only to the specific item
pointed to as defined by the pointed-at type, and nothing else« in the docs,
because it is quite misleading (or even technically wrong, although I know what
you are trying to say): A @safe function _can_ in effect access other memory,
if only with the help from a @trusted function.

On a related note, the distinction between @safe and @trusted (especially the
difference in mangling) is a horrible abomination and should die in a fire.
@safe and @system are contracts, @trusted is an implementation detail – mixing
them makes no sense.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #54 from klickverbot c...@klickverbot.at 2012-06-04 12:14:40 PDT 
---
(In reply to comment #52)
 This is a bug, both should be optimized equally:
 
 void foo(immutable int * _param) pure
 {
    immutable(int)* param = _param; // legal
    ... // same code as if you had written void foo(immutable(int)* param)
 }

Yep, both should be recognized PUREstrong in DMD – if not, please open a new
bug report for that.



[Issue 8185] Pure functions and pointers

2012-06-04 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #55 from Steven Schveighoffer schvei...@yahoo.com 2012-06-04 
13:34:50 PDT ---
(In reply to comment #53)
 (In reply to comment #51)
  (In reply to comment #48)
   Here, calling gun needs to be safe under _any_ circumstances.
  
  No, it does not.  Once you use @trusted, the compiler stops checking that it's
  @safe.
 
 Yes, it does. As you noted correctly, you as the one implementing gun() must
 take care of that, the compiler doesn't help you here. But still, you must
 ensure that gun() never violates memory safety, regardless of what is passed
 in, because otherwise it might cause @safe code to be no longer memory safe.

I think I misunderstood your original point.  I thought you were saying that
gun must be *prevented from* modifying other memory relative to its parameter. 
Were you simply saying that gun is not stopped by the compiler, but must avoid
it in order to maintain safety?  If so, I agree, for your example.

I can also see that my response was misleading.  I did not mean it should not
be safe, I meant it's not enforced as safe.  Obviously something that is
@trusted needs to maintain safety.

 On a related note, the distinction between @safe and @trusted (especially the
 difference in mangling) is a horrible abomination and should die in a fire.
 @safe and @system are contracts, @trusted is an implementation detail – mixing
 them makes no sense.

I'm not sure what you're saying here, but @trusted is *definitely* needed.



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #9 from Denis Shelomovskij verylonglogin@gmail.com 2012-06-03 
10:23:09 MSD ---
Such a mess! The more people write here the more different opinions I see.
IMHO, Walter and Andrei must also participate here to help with conclusion (or
to finally mix everything up).



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #10 from art.08...@gmail.com 2012-06-03 06:28:09 PDT ---
(In reply to comment #7)

 argument value is all the data reachable via the parameters.  Argument result
 is all the data reachable via the result.
[...]
 the optimizer.  My gut feeling is the compiler/optimizer should trust the code
 knows what it's doing. and so should expect that the code implicitly knows
 how much data it can access after the pointer.

Having pure as a user-provided attribute, with the compiler completely trusting
the programmer and only checking/enforcing certain assumptions when it is easy
to do, is a reasonable solution. Anybody who understands the purity concept
will have no problem determining whether some function is pure or not; this is
how it is in C, in dialects supporting pure.

Unfortunately, D has purity inference.

   uint f()(immutable ubyte* p) {
      uint r;
      foreach (i; 0..size_t.max)
         r += p[i];
      return r;
   }

Can this still be considered pure?
What about uint f2()(Struct* p) {/*same body*/}?
Or

   uint f3()(ubyte* p) {
      uint r;
      foreach (i; 0..size_t.max)
         r += p[i]++;
      return r;
   }

?

All three functions are tagged as pure by the compiler...
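
One way to observe the inference, sketched with bounded variants of the
functions above (the explicit count n and all names are assumptions, added so
the example can actually run): a function marked pure may only call pure
functions, so the wrappers compile only if the templates were inferred pure.

```d
// Template functions: attributes such as pure are inferred from the body.
uint readAhead()(immutable(ubyte)* p, size_t n)
{
    uint r;
    foreach (i; 0 .. n)
        r += p[i];
    return r;
}

uint bumpAhead()(ubyte* p, size_t n)
{
    uint r;
    foreach (i; 0 .. n)
        r += p[i]++;   // also mutates memory behind the pointer
    return r;
}

// These compile only because both instantiations are inferred pure.
uint callRead(immutable(ubyte)* p, size_t n) pure { return readAhead(p, n); }
uint callBump(ubyte* p, size_t n) pure { return bumpAhead(p, n); }

void main()
{
    immutable ubyte[] ro = [1, 2, 3];
    assert(callRead(ro.ptr, ro.length) == 6);

    ubyte[] rw = [1, 2, 3];
    assert(callBump(rw.ptr, rw.length) == 6);
    assert(rw[0] == 2 && rw[2] == 4);
}
```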



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


timon.g...@gmx.ch changed:

   What|Removed |Added

 CC||timon.g...@gmx.ch


--- Comment #11 from timon.g...@gmx.ch 2012-06-03 12:18:33 PDT ---
(In reply to comment #0)
 The Question: What exactly does these pure functions consider as `argument
 value` and as `returned value`? Looks like this is neither documented nor
 obvious.
 

Pointers may only access their own memory blocks, therefore exactly those
blocks participate in argument value and return value.
But why does it even matter? Isn't this discussion mostly philosophical?



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #12 from art.08...@gmail.com 2012-06-03 12:46:28 PDT ---
(In reply to comment #11)
 Pointers may only access their own memory blocks, therefore exactly those
 blocks participate in argument value and return value.

What does 'their own memory block' mean? The problem is a pointer is basically
an unbounded array, and, if the access isn't restricted somehow, makes the
function dependent on global memory state.

 But why does it even matter? Isn't this discussion mostly philosophical?

The compiler will happily assume that template functions are pure even when
they clearly are not, and there isn't even a way to mark such functions as
impure (w/o using hacks like calling dummy functions etc).
Example - a function that is designed to operate on arrays, will always be
called with a pointer to inside an array, and can assume that the previous and
next element is always valid: 

   void f4(T)(T* p) {
      p[-1] += p[0];
   }

The compiler thinks f4() is pure, when it clearly is not; optimizations based
on that assumption are likely to result in corrupted data.
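
The "dummy function" hack mentioned above could look roughly like this. It is
only a sketch: impureMarker is a made-up name, and the trick relies on ordinary
(non-template) functions not getting attribute inference.

```d
// An ordinary function without the pure attribute; no inference happens here.
void impureMarker() nothrow @nogc @safe {}

// Without the marker call, this template is inferred pure.
void f4Plain(T)(T* p) { p[-1] += p[0]; }

// Calling an impure function prevents pure from being inferred.
void f4(T)(T* p)
{
    impureMarker();
    p[-1] += p[0];
}

// f4Plain is usable from pure code, f4 is not (but still compiles elsewhere).
static assert( __traits(compiles, () pure { int[2] a; f4Plain(&a[1]); }));
static assert(!__traits(compiles, () pure { int[2] a; f4(&a[1]); }));
static assert( __traits(compiles, ()      { int[2] a; f4(&a[1]); }));

void main()
{
    int[2] a = [1, 2];
    f4(&a[1]);
    assert(a[0] == 3);
}
```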



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #14 from art.08...@gmail.com 2012-06-03 13:52:53 PDT ---
(In reply to comment #13)
 (In reply to comment #12)
  (In reply to comment #11)
   Pointers may only access their own memory blocks, therefore exactly those
   blocks participate in argument value and return value.
  
  What does 'their own memory block' mean?
 
 The allocated memory block it points into.

But, as the bounds are unknown to the compiler, it does not have this
information, it has to assume everything is reachable via the pointer. This is
why i suggested above that only dereferencing a pointer should be allowed in
pure functions.

  The problem is a pointer is basically an unbounded array,
 
 That is wrong. The pointer is bounded, but it is generally impossible to devise
 the exact bounds from the pointer alone. This is why D has dynamic arrays.

And one way to make it work is to forbid dereferencing pointers and require fat
ones. Then the bounds would be known. But i don't think anybody would want to
write f(pointer_to_some_struct[0..1])...
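
For illustration, this is roughly what the "fat pointer" style would mean at a
call site (sumAll is a hypothetical function): the callee gets explicit bounds
and can be @safe, while the caller has to slice the raw pointer, which is
itself an unchecked operation.

```d
// The slice carries its length, so every access here is bounds-checked.
int sumAll(const(int)[] xs) pure nothrow @nogc @safe
{
    int r;
    foreach (x; xs)
        r += x;
    return r;
}

void main()
{
    int v = 7;
    int* p = &v;
    // Slicing a raw pointer is not allowed in @safe code; the caller vouches
    // for the bounds 0 .. 1.
    assert(sumAll(p[0 .. 1]) == 7);
}
```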

  and, if the access isn't restricted somehow, makes the
  function dependent on global memory state.
 
 ? A function independent of memory state is useless.

int n(int i) {return i+42;}

  
   But why does it even matter? Isn't this discussion mostly philosophical?
  
  The compiler will happily assume that template functions are pure even when
  they clearly are not, and there isn't even a way to mark such functions as
  impure (w/o using hacks like calling dummy functions etc).
  Example - a function that is designed to operate on arrays, will always be
  called with a pointer to inside an array, and can assume that the previous and
  next element is always valid: 
  
     void f4(T)(T* p) {
        p[-1] += p[0];
     }
  
  The compiler thinks f4() is pure, when it clearly is not; optimizations based
  on that assumption are likely to result in corrupted data.
 
 f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
 is not allowed to perform optimizations that change defined program behavior.

f4 isn't pure, by any definition - it depends on (or in this example modifies)
state, which the caller may not even consider reachable. The compiler can
assume that a pure function does not access any mutable state other than what
can be directly or indirectly reached via the arguments -- that is what
function purity is all about. If the compiler has to assume that a pure
function that takes a pointer argument can read or modify everything, the
pure tag becomes worthless. And what's worse, it allows other truly pure
functions to call our immoral one. 

Hmm, another way out of this could be to require all pointer args in a pure
function to target 'immutable' - but that, again, seems too limiting; bool f(in
Struct* s) could not be pure.



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #15 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-03 14:40:12 
PDT ---
The _only_ thing that the pure attribute means by itself is that that function
cannot directly access any mutable global or static variables. That is _all_.
It means _nothing_ else. It can mess with pointers. It can mess with in, ref,
out, and lazy parameters. It can mess with the elements in a slice (thereby
alterining external state). It can mess with mutable global or static variables
_indirectly_ via the arguments that it's passed (e.g if a pointer or ref is
passed to a global variable). It just cannot _directly_ access any mutable
global or static variables.
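
A tiny sketch of the "indirect" case, with made-up names: the pure function
never names the global, yet an impure caller can hand it a pointer to one.

```d
int counter; // mutable module-level variable

// Weakly pure: may mutate whatever its argument points to.
void bump(int* p) pure nothrow @nogc { ++*p; }

void main() // not pure, so it may take the address of counter
{
    bump(&counter);
    assert(counter == 1);

    int local;
    bump(&local);
    assert(local == 1);
}
```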

pure by itself indicates a weakly pure function. That function enables _zero_
optimizations. It is _not_ pure in the sense that the functional or
mathematical community would consider pure. It is not even _trying_ to be pure
in that sense. What weak purity does is enable _strong_ purity to actually be
useful.

When the compiler can guarantee that all of a pure function's arguments
_cannot_ be altered by that function, _then_ it is strongly pure. Currently,
that guarantee is in effect only when all of the parameters of the function are
immutable or implicitly convertible to immutable. It could be extended to const
parameters in the case when they're passed immutable arguments, but that isn't
currently done.

A strongly pure function cannot alter its arguments at all, but it _can_
allocate memory, and it _can_ mutate any of its local state. _weakly_ pure
functions can therefore be called from within a strongly pure function, because
the only state that they can alter is the state of what's passed to them
(because the fact that they're marked with pure means that they cannot access
mutable global or mutable static state except via their arguments), and the
only state that the strongly pure function _can_ pass to them is local to it,
because it can't access global or static mutable state any more than they can,
and it can't even access it via its arguments, because it's strongly pure.

This is all very clear and well-defined.
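
To make the weak/strong interplay concrete, here is a minimal sketch (function
names are invented for the example): a weakly pure helper mutates through a
ref parameter, and a strongly pure function uses it on purely local state.

```d
// Weakly pure: mutates its argument, but touches no global state.
void addTo(ref int acc, int x) pure nothrow @nogc @safe
{
    acc += x;
}

// All parameters are immutable, so this is a candidate for strong purity.
// It can still call addTo, because the only state it passes is local.
int sumSquares(immutable(int)[] xs) pure nothrow @nogc @safe
{
    int total;
    foreach (x; xs)
        addTo(total, x * x);
    return total;
}

void main() @safe
{
    immutable int[] xs = [1, 2, 3];
    assert(sumSquares(xs) == 14);
}
```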

Having pointers sent off into la-la land doing unsafe @system stuff is a
_completely_ separate issue. You can break pretty much _anything_ with @system
code. You could even cast a function which called writeln so that the
signature was pure and then call it from a pure function. All bets are off when
you're in @system land. It's _your_ job to make sure that your code isn't doing
something completely screwy at that point. Any function or operation which the
compiler doesn't consider pure would still make a templated function be
considered impure in such cases, but because it's @system, you can trick it if
you want to (e.g. by casting a function's signature). But it's @system code -
unsafe code - so it's your fault at that point, not the compiler's.
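
A sketch of the cast being described, using a global counter instead of writeln
so the effect is easy to inspect (names are hypothetical). It compiles only
because the code is @system, and it breaks the guarantees pure is supposed to
give, so it is shown purely as an illustration of what not to rely on:

```d
int calls; // mutable global

void countCall() { ++calls; } // impure: writes to a global

void sneaky() pure @system
{
    // Lie to the type system: pretend countCall is pure.
    auto fp = cast(void function() pure) &countCall;
    fp();
}

void main()
{
    sneaky();
}
```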

I really don't know how the documentation could be much clearer. ref and
pointer arguments aren't returned. Only the return value is returned. And
arguments are clearly the arguments to the function. And as long as the
compiler can determine that nothing has been done to an argument to alter it,
it's going to consider it to be the same value (and it's going to be _extremely_
conservative about that - even altering a reference or pointer of the same type
would make its value be considered different, because they both might point to
the same thing).

As for stuff like strlen, in that case, you're doing the @system thing of
saying that yes, I know what I'm doing. I know that this function isn't marked
as pure, because it's a C function, but I also know that it _is_ actually pure.
I know that it won't access global mutable state. So, I will mark it as pure so
that it can be used in pure code. I'm telling the compiler that I know better
than it does. And in this case, I do. If I didn't, then you'd have a bug, and
it would be my fault, because I told the compiler what was best, and I was
wrong. At that point, it's up to me to make sure that the compiler's
guarantees aren't being violated. That's @system for you. D is a systems
programming language. You can do that sort of thing.
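
In code, that promise is just an attribute on the prototype. A minimal sketch,
assuming the program links against the C runtime as usual (the wrapper name
cLength is made up):

```d
// Hand-written prototype: the author vouches that the C function is pure.
extern (C) size_t strlen(const(char)* s) pure nothrow @nogc;

// Usable from pure code because of the annotation above.
size_t cLength(const(char)* s) pure nothrow @nogc
{
    return strlen(s);
}

void main()
{
    // D string literals are zero-terminated, so .ptr is a valid C string.
    assert(cLength("hello".ptr) == 5);
}
```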



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #16 from art.08...@gmail.com 2012-06-03 15:50:29 PDT ---
(In reply to comment #15)
 pure by itself indicates a weakly pure function. That function enables _zero_

Inventing terminology doesn't help, especially when the result is so confusing.


 optimizations. It is _not_ pure in the sense that the functional or
 mathematical community would consider pure. It is not even _trying_ to be pure
 in that sense. What weak purity does is enable _strong_ purity to actually be
 useful.
 
 When the compiler can guarantee that all of a pure function's arguments
 _cannot_ be altered by that function, _then_ it is strongly pure. Currently,
 that guarantee is in effect only when all of the parameters of the function are
 immutable or implicitly convertible to immutable. It could be extended to const
 parameters in the case when they're passed immutable arguments, but that isn't
 currently done.
[...]

tl;dr.

The bugtracker is probably not the right place for this discussion; we could
move it to the ML, but talking about it only makes sense if D can be fixed;
otherwise we would be wasting our time...

Limiting pure to just immutable data would work indeed, but it's much too
limiting.

   struct S {int a,b; int[64] c; bool f() const pure {return a||b;}}

   int g(S* p) {
      int r;
      foreach (i; 0..64)
         if (p.f())
            r |= p.c[i];
      return r;
   }


Using your weak pure definition, f's pure would be a NOOP - that is not
what most people would expect, and is not a sane purity implementation. 
It's not a problem for trivial examples such as this one because inlining
should take care of it, but would make pure almost useless in real code, as
it would almost never be, to use your terminology again, strongly pure (and
couldn't be moved out of the loop).
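
For reference, this is the transformation being asked for, written out by hand
(gHoisted is a hypothetical name). It is valid here only because nothing in the
loop modifies *p; whether a compiler performs it is a separate question:

```d
struct S
{
    int a, b;
    int[64] c;
    bool f() const pure { return a || b; }
}

// Original shape: p.f() is evaluated on every iteration.
int g(S* p)
{
    int r;
    foreach (i; 0 .. 64)
        if (p.f())
            r |= p.c[i];
    return r;
}

// Hand-hoisted shape: p.f() is evaluated once.
int gHoisted(S* p)
{
    if (!p.f())
        return 0;
    int r;
    foreach (i; 0 .. 64)
        r |= p.c[i];
    return r;
}

void main()
{
    S s;
    s.a = 1;
    s.c[3] = 8;
    assert(g(&s) == 8 && gHoisted(&s) == 8);
}
```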

Note that, even when using your strong purity definition, the compiler still
does the wrong thing - some of the examples I gave previously in this bug are
(and others can be trivially modified to be) inferred as strongly pure
functions, when they are not pure at all.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
--- You are receiving this mail because: ---


[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #17 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-03 16:02:19 
PDT ---
They aren't _my_ definitions. They're official. They've been discussed in the
newsgroup. They've even been used by folks like Walter Bright in talks at
conferences. How purity is implemented in D has been discussed and was decided
a while ago. It works well and is not going to change. Weak purity solved a
real need. All we had before was strong purity, and it was almost useless,
because it was so limited. It is _far_ more useful now than it was before.

A pure function is clearly defined as a function which cannot access global or
static state which is mutable. It doesn't matter how other languages use the
term pure. That's how D uses it. And in cases where a function is strongly
pure, you _do_ get the optimizations based on passing the same arguments to the
same pure function multiple times that you'd expect from a more functional
language.

If you don't like how D's pure works, that's fine - you're free to have your
own opinion, be it dissenting or otherwise - but how pure works in D is _not_
going to change. If bugs are found in the compiler's implementation of it, they
will be addressed, but at this point, the design is what it is.



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #18 from Denis Shelomovskij verylonglogin@gmail.com 
2012-06-04 09:38:21 MSD ---
(In reply to comment #15)
 I really don't know how the documentation could be much clearer.

It will be clearer once it has examples showing which asserts must/may/shouldn't
pass and/or (I prefer "and") which optimizations can be done. Even the "Setting
Dynamic Array Length" section has such examples, though its topic is far
simpler.

 As for stuff like strlen, in that case, you're doing the @system thing of
saying that yes, I know what I'm doing.

And what is still missing is the wording asked for in my original question,
"What exactly do these pure functions consider as `argument value` and as
`returned value`?", because some people treat it as only pointer dereferencing
and others as access to any logically accessible address.

Again, all misunderstanding of pure functions in D can be easily solved by just
adding (lots of) examples with difficult cases into docs.

IMHO, Jonathan M Davis, for example, would save a lot of his own time (and ours
too) by just adding such examples with minimal comments to the docs instead of
writing such long answers.



[Issue 8185] Pure functions and pointers

2012-06-03 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #19 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-03 22:58:33 
PDT ---
I honestly don't understand why much in the way of examples are needed. The
documentation explains what pure is. When the compiler is able to optimize out
calls to pure functions is an implementation detail - just like optimizations
with const or immutable are. You use pure wherever you can, and the compiler
will optimize where it can.

The documentation could go into more detail on weakly pure vs strongly pure
(since it doesn't mention either), but that's pretty much the only relevant
improvement that I can think of, and I know that Don would be annoyed by that,
since he wants the terms strongly pure and weakly pure to die and just leave
them as implementation details (though I think that he's the only one who
really feels that way).

I think that there's a lot of overthinking of this going on here. The
documentation quite clearly states what a pure function is and what it can and
can't do. I don't see how more examples would really help much with that. But
if anyone has an idea that they think will improve the documentation, then feel
free to create a pull request with the changes.



[Issue 8185] Pure functions and pointers

2012-06-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


klickverbot c...@klickverbot.at changed:

   What|Removed |Added

 CC||c...@klickverbot.at
   Severity|major   |enhancement


--- Comment #1 from klickverbot c...@klickverbot.at 2012-06-02 01:44:18 PDT 
---
The current behavior is by design, and perfectly fine – note that `pure` in D
just means that a function doesn't access global (mutable) state. A pointer
somewhere isn't a problem either, since the caller must have obtained the
address from somewhere, and if it was indeed from global state, the calling
code couldn't be pure.
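
A short sketch of that argument, with hypothetical names: the pure function
happily reads through any pointer it is given, but a caller that reaches for
the global itself cannot be pure.

```d
int global = 3; // mutable module-level state

// Pure: it only looks at what the argument points to.
int readThrough(const(int)* p) pure nothrow @nogc { return *p; }

// A pure function literal that accesses the mutable global does not compile.
static assert(!__traits(compiles, () pure { return global; }));

void main() // impure, so it may obtain the address from global state
{
    assert(readThrough(&global) == 3);
}
```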

Do you have any suggestions on how to make this clearer in the spec? I admit
that the design can take some time to wrap one's head around, but I'm not sure
what's the best way to make the concept easier to grasp.



[Issue 8185] Pure functions and pointers

2012-06-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #3 from Denis Shelomovskij verylonglogin@gmail.com 2012-06-02 
14:29:01 MSD ---
(In reply to comment #1)
 The current behavior is by design, and perfectly fine – note that `pure` in D
 just means that a function doesn't access global (mutable) state. A pointer
 somewhere isn't a problem either, since the caller must have obtained the
 address from somewhere, and if it was indeed from global state, the calling
 code couldn't be pure.

OK. Looks like everything works but I don't understand how. So could you please
answer the question (read this to the end).

According to http://dlang.org/function.html#pure-functions
 Pure functions are functions that produce the same result for the same arguments.

And my original question is
 The Question: What exactly does these pure functions consider as `argument
value` and as `returned value`?

Illustration:
---
int f(in int* p) pure;

void g()
{
auto arr = new int[5];
auto res = f(arr.ptr);

assert(res == f(arr.ptr));

assert(res == f(arr.ptr + 1)); // *p isn't changed

arr[1] = 7;
assert(res == f(arr.ptr)); // neither p nor *p is changed

arr[0] = 7;
assert(res == f(arr.ptr)); // p isn't changed
}
---
Which asserts must pass?

The second assert is here according to
http://klickverbot.at/blog/2012/05/purity-in-d/ (yes, it's from the
"Indirections in the Return Type?" section, but the sentences look general and I
think they can be read this way):
 The first essential point are addresses, respectively the definition of
 equality applied when considering referential transparency. In functional
 languages, the actual memory address that some value resides at is usually of
 little to no importance. D being a system programming language, however,
 exposes this concept.



[Issue 8185] Pure functions and pointers

2012-06-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


klickverbot c...@klickverbot.at changed:

   What|Removed |Added

   Severity|enhancement |normal


--- Comment #4 from klickverbot c...@klickverbot.at 2012-06-02 07:50:05 PDT 
---
(In reply to comment #3)
 And my original question is
  The Question: What exactly does these pure functions consider as `argument
 value` and as `returned value`?
 
 Illustration:
 ---
 int f(in int* p) pure;

Thanks for the example, this certainly makes your concerns easier to see. You
are right, the spec is really not clear in this regard – but in my opinion,
only a single interpretation makes sense, in that it is actually enforceable by
the compiler:

---
 auto res = f(arr.ptr);
 assert(res == f(arr.ptr));
This one obviously has to pass.

 assert(res == f(arr.ptr + 1)); // *p isn't changed
Might fail, f is allowed to return cast(int)p.

 arr[1] = 7;
 assert(res == f(arr.ptr)); // neither p nor *p is changed
Must pass, reading/modifying random bits of memory inside pure functions is
obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in
@safe code anyway, and in @system code, you as the programmer are responsible
for not violating the type system guarantees – for example, you can just call
any impure function in a pure context using a cast. This also means that e.g. C
string functions cannot be pure in D.

 arr[0] = 7;
 assert(res == f(arr.ptr)); // p isn't changed
Might fail, as discussed in the »What about Referential Transparency« section
of the article – only if the parameters are _transitively_ equal (as defined by
their type), then pure functions are guaranteed to return the same value.

 The second assert is here according to
 http://klickverbot.at/blog/2012/05/purity-in-d/.
Then this aspect of the article is apparently not as clear as it could be –
thanks for the feedback, I'll incorporate it in the next revision.
---
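
To make the first, third and fourth cases concrete, here is one possible f,
chosen only for illustration (the original question leaves the body
unspecified): a function that returns what the pointer targets. The second
case is left out because its outcome depends entirely on what f does with the
pointer value.

```d
// One possible implementation of f: depends only on *p.
int deref(in int* p) pure nothrow @nogc { return *p; }

void main()
{
    auto arr = new int[5];      // all elements start at 0
    auto res = deref(arr.ptr);

    assert(res == deref(arr.ptr)); // same argument, nothing changed

    arr[1] = 7;
    assert(res == deref(arr.ptr)); // *p untouched, so still equal

    arr[0] = 7;
    assert(res != deref(arr.ptr)); // *p changed, so the result changes
}
```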

Do you disagree with any of these points? If so, I'd be happy to provide a more
in-depth explanation of my view, so we can clarify the spec afterwards.



[Issue 8185] Pure functions and pointers

2012-06-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


art.08...@gmail.com changed:

   What|Removed |Added

 CC||art.08...@gmail.com


--- Comment #5 from art.08...@gmail.com 2012-06-02 08:22:14 PDT ---
(In reply to comment #0)

 I see the only two ways to document it properly (yes, the main problem is with
 `h` function):

  * once pure function accepts a pointer it is considered depending on all
 process memory;

That would work, but would probably be too limiting.


 * Allow only dereferencing the pointer, disallow any kind of indexing. Note
it's not trivial, as pointer arithmetic should still work. But probably doable,
by disallowing dereferencing at all, and making a special exception for
accessing via an unmodified argument. This would also have to work recursively,
so it basically comes down to introducing a special kind of pointer, that
behaves a bit more like a reference. The alternatives are the ones you listed,
either banning pointers or assuming the function depends on everything -
neither is really acceptable. A pure function shouldn't deal with unbounded
arrays, so this kind of restriction should be fine (the alternative is to have
to slice everything, which is not a sane solution, eg when working with
pointers to structs)



[Issue 8185] Pure functions and pointers

2012-06-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #6 from Denis Shelomovskij verylonglogin@gmail.com 2012-06-02 
19:59:12 MSD ---
(In reply to comment #4)
 (In reply to comment #3)
  assert(res == f(arr.ptr + 1)); // *p isn't changed
 Might fail, f is allowed to return cast(int)p.

Am I understanding correctly that:
---
int[] f() pure;
int g(in int[] a) pure;
int gs(in int[] a) @safe pure;

void h()
{
assert(g(f()) == g(f()));   // May or may not pass
assert(gs(f()) == gs(f())); // Should pass
}
---
?

  arr[1] = 7;
  assert(res == f(arr.ptr)); // neither p nor *p is changed
 Must pass,...

So this code is invalid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{ ++i[1]; }
---
and this is invalid too:
---
struct MyArray {
int* p;
size_t len;

...

int opIndex(size_t i) pure @safe // or unsafe, doesn't matter
in { assert(i < len); }
body {
return p[i];
}
}
---
?

And this is valid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{ ++*i; }
---
?
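
As far as the compiler's checks go (which is a narrower question than what the
spec ought to promise), the split can be sketched like this: dereferencing a
pointer is accepted in @safe code, indexing past it is not, and without @safe
both compile. The function names are invented for the example.

```d
// Accepted as @safe: only the pointed-at int is touched.
void bumpTarget(int* i) pure nothrow @nogc @safe { ++*i; }

// Rejected as @safe: indexing a pointer beyond element 0.
static assert(!__traits(compiles, (int* i) pure @safe { ++i[1]; }));

// Accepted as @system: the caller must guarantee i[1] exists.
void bumpNext(int* i) pure nothrow @nogc @system { ++i[1]; }

void main()
{
    int[2] a = [1, 2];
    bumpTarget(&a[0]);
    bumpNext(&a[0]);
    assert(a[0] == 2 && a[1] == 3);
}
```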

 reading/modifying random bits of memory inside pure functions is
 obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in
 @safe code anyway, and in @system code, you as the programmer are responsible
 for not violating the type system guarantees – for example, you can just call
 any impure function in a pure context using a cast. This also means that e.g. C
 string functions cannot be pure in D.

I'm a bit confused because I didn't mention the @safe attribute. If you have
time, I'd like to see the differences between @safe and unsafe pure functions
covered in your article, because it looks like these things are really different.

  The second assert is here according to
  http://klickverbot.at/blog/2012/05/purity-in-d/.
 Then this aspect of the article is apparently not as clear as it could be –
 thanks for the feedback, I'll incorporate it in the next revision.

Not sure, my English is rather bad so I could just misunderstand something.

 Do you disagree with any of these points? If so, I'd be happy to provide a more
 in-depth explanation of my view, so we can clarify the spec afterwards.


`void f(void*) pure;` is still unclear for me. What can it do? What can it do
if it's @safe?

And I completely fail to understand why pure functions can't be optimized out, as
Steven Schveighoffer said in a druntime pull 198 comment:
 The fact that it returns mutable makes it weak pure (the optimizer cannot 
 remove any calls to gc_malloc)
(yes, this is a general question, not pointers only)



[Issue 8185] Pure functions and pointers

2012-06-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Steven Schveighoffer schvei...@yahoo.com changed:

   What|Removed |Added

 CC||schvei...@yahoo.com


--- Comment #7 from Steven Schveighoffer schvei...@yahoo.com 2012-06-02 
17:48:23 PDT ---
(In reply to comment #3)
 
 According to http://dlang.org/function.html#pure-functions
  Pure functions are functions that produce the same result for the same 
  arguments.

This is certainly true. However, it's neither practical nor always possible for
the compiler to determine whether a call can be optimized out. Consider that on
any call to a pure function that takes mutable data, the function could modify
the data, so even calling it again with the exact same pointer may amount to
passing a new effective parameter.

However, if a function has only immutable or implicitly convertible to
immutable parameters and return values, the function *can* be optimized out,
because it's guaranteed nothing ever changes.

This situation is what has been called strong pure. It's the equivalent of
functional language purity.
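
As a minimal sketch of the weak/strong distinction described above (the names `bump` and `sum` are made up for illustration, and nothing here is claimed about what the compiler actually optimizes):

```d
// Weakly pure: may modify data reachable through its mutable parameter,
// but cannot touch global or static mutable state.
void bump(int[] data) pure
{
    foreach (ref x; data)
        x += 1;
}

// Strongly pure: the parameter is immutable, so the result depends only
// on the argument's value.
int sum(immutable(int)[] data) pure
{
    int total = 0;
    foreach (x; data)
        total += x;
    return total;
}

void main()
{
    auto a = [1, 2, 3];
    bump(a);                      // same pointer, but the data changed
    assert(a == [2, 3, 4]);

    immutable(int)[] b = [1, 2, 3];
    assert(sum(b) == 6);
    assert(sum(b) == sum(b));     // a compiler would be free to merge these calls
}
```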

It's possible in certain situations for a weak pure function to be considered
strong pure. For example, consider a function which takes a const parameter
and returns a const. Pass an immutable into it, and nothing could possibly
have changed before the next call, so that call can be optimized out. The
compiler does not take advantage of these cases yet.
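
A small sketch of that const case, assuming a made-up function `first`; it only illustrates when the argument data can or cannot change between calls:

```d
// Weakly pure by signature: a const parameter may refer to mutable data.
int first(const(int)[] data) pure
{
    return data.length ? data[0] : 0;
}

void main()
{
    immutable(int)[] fixed = [4, 5, 6];
    // Nothing can change `fixed` between these calls, so in principle they
    // could be merged into one, as if `first` were strongly pure.
    assert(first(fixed) == first(fixed));

    int[] scratch = [4, 5, 6];
    auto before = first(scratch);
    scratch[0] = 9;               // the caller still holds a mutable view
    assert(first(scratch) != before);
}
```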

 And my original question is
  The Question: What exactly do these pure functions consider as `argument
 value` and as `returned value`?

The argument value is all the data reachable via the parameters. The returned
value is all the data reachable via the result.
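
To illustrate "reachable" with a sketch (the `Node` type and `total` function are hypothetical, not taken from the bug report):

```d
struct Node
{
    int value;
    Node* next;
}

// The "argument value" here is the whole chain reachable from `head`,
// not just the pointer itself.
int total(const(Node)* head) pure
{
    int sum = 0;
    for (auto n = head; n !is null; n = n.next)
        sum += n.value;
    return sum;
}

void main()
{
    auto tail = new Node(2, null);
    auto head = new Node(1, tail);
    assert(total(head) == 3);

    tail.value = 10;              // `head` is unchanged, the reachable data is not
    assert(total(head) == 11);
}
```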

For pointers, you are under the same rules as normal functions -- @safe
functions cannot use pointers, unsafe ones can.  If an unsafe pure function is
called, a certain degree of freedom to screw up is available, just like any
other unsafe function.

 int f(in int* p) pure;
 
 void g()
 {
 auto arr = new int[5];
 auto res = f(arr.ptr);
 
 assert(res == f(arr.ptr));

Obviously this passes: all the parameters are identical, and nothing could have
changed between the two calls. The call is not currently optimized out,
because the compiler isn't smart enough yet.

 
 assert(res == f(arr.ptr + 1)); // *p isn't changed

May or may not pass; the parameter is different.

 
 arr[1] = 7;
 assert(res == f(arr.ptr)); // neither p nor *p is changed

May or may not pass. f is not @safe, so it could possibly access arr[1].

 
 arr[0] = 7;
 assert(res == f(arr.ptr)); // p isn't changed

May or may not pass; the data reachable through the parameter is different.
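
For what it's worth, here is one hypothetical body for f (the original only declares it) showing how the arr[1] case can fail in unsafe code:

```d
// Hypothetical body: unsafe code may index past *p.
int f(in int* p) pure
{
    return p[0] + p[1];           // pointer indexing, not allowed in @safe code
}

void main()
{
    auto arr = new int[5];
    auto res = f(arr.ptr);
    assert(res == f(arr.ptr));    // nothing changed between the calls

    arr[1] = 7;
    assert(res != f(arr.ptr));    // f read arr[1], although *p is unchanged
}
```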

 And I completely fail to understand why pure functions can't be optimized out, as
 Steven Schveighoffer said in a druntime pull 198 comment:

I hope I have helped to further your understanding with this post. Don just
looked up the original thread which outlined the weak-pure proposal, which was
submitted to digitalmars.D in August 2010. You may want to read that entire
thread.

In general response to this bug, I'm unsure how pointers should be treated by
the optimizer. My gut feeling is that the compiler/optimizer should trust that
the code knows what it's doing, and so should expect that the code implicitly
knows how much data it can access after the pointer.

Consider an interesting case, using BSD sockets:

int f(immutable sockaddr *addr) pure;

sockaddr has a specific size, yet it acts as a base type for different kinds of
address structures. Typically, one casts the sockaddr to the correct struct
based on the sa_family member.

But this may technically mean f accesses more data than it is given, based on a
rigid interpretation of the type system.  Should the compiler enforce this
given it makes this kind of function practically useless?  I think not.
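
A rough sketch of that pattern. `SockAddr`, `SockAddrIn`, `familyInet` and `portOf` are simplified stand-ins invented for this example, not the real BSD socket layouts or constants:

```d
// Hypothetical stand-ins for sockaddr / sockaddr_in (not the real layouts).
struct SockAddr   { ushort family; ubyte[14] data; }
struct SockAddrIn { ushort family; ushort port; uint addr; ubyte[8] zero; }

enum ushort familyInet = 2;       // assumed tag value for this sketch

// Reads a field that exists only in the "derived" struct, chosen by `family`.
int portOf(immutable(SockAddr)* addr) pure
{
    if (addr.family == familyInet)
        return (cast(immutable(SockAddrIn)*) addr).port;
    return -1;
}

void main()
{
    immutable in4 = SockAddrIn(familyInet, 8080);
    assert(portOf(cast(immutable(SockAddr)*) &in4) == 8080);
}
```

Under a rigid reading, `portOf` looks at memory typed as something other than what it was given, yet the data is immutable and the result still depends only on it.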



[Issue 8185] Pure functions and pointers

2012-06-02 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Jonathan M Davis jmdavisp...@gmx.com changed:

   What|Removed |Added

 CC||jmdavisp...@gmx.com


--- Comment #8 from Jonathan M Davis jmdavisp...@gmx.com 2012-06-02 21:29:24 PDT ---
This isn't true:

 @safe functions cannot use pointers, unsafe ones can.

@safe functions can use pointers just fine. Pointers themselves are considered
@safe (e.g. the `in` operator on AAs, which returns a pointer, works just fine
in @safe code). It's unsafe pointer operations such as pointer arithmetic which
are not @safe.
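
A short sketch of that point (`lookup` is a made-up helper):

```d
@safe int lookup(int[string] table, string key)
{
    int* p = key in table;        // `in` yields a pointer; fine in @safe code
    return p is null ? -1 : *p;   // dereferencing is fine too
    // Something like `p + 1` or `p[1]` is what @safe code is not allowed to do.
}

void main() @safe
{
    auto table = ["one": 1, "two": 2];
    assert(lookup(table, "two") == 2);
    assert(lookup(table, "three") == -1);
}
```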
