Re: Nested functions [was Re: valgrind]

2022-03-28 Thread Chris Hanson
On Mar 24, 2022, at 8:16 PM, Mouse  wrote:

> Nested functions are not closures, or at least not what I know as
> closures.  A nested function pointer (conceptually) goes invalid as
> soon as anything it refers to goes out of scope, or at the latest as
> soon as its smallest enclosing block exits (or possibly when its
> smallest enclosing function returns).  The thing I know as a closure
> would preserve the referred-to objects as long as the closure is
> potentially callable.  This requires more reference tracking than C
> typically makes possible.

Closures can easily be used as nested functions, and the blocks feature that
Apple created explicitly addresses the lifetime problem by including runtime
functions for managing a block's storage.

1. You can just declare and use a block if it's not going to go out of scope, 
as if it were a nested function, with just slightly different syntax than 
nested functions. (You do this by declaring a block-typed variable and 
immediately assigning the block to it, then calling through the variable. It 
behaves like a function pointer.)

2. If you have a variable in the outer scope that you want to modify in the 
block, you have to annotate the variable with __block, so the block context 
(activation record) created for the block can find it. Otherwise everything in
the enclosing scope is effectively immutable (though one must of course be 
aware that an immutable pointer is not the same as a pointer to immutable data).

3. If you're passing the block to another function or block, whatever you pass 
it to should *either* annotate its block argument as non-escaping via the right 
__attribute__(()) to indicate it doesn't need any memory management, or it 
should Block_copy() the block it's handed and work only with the copy. That 
ensures the block context is hoisted to the heap. Following Apple's general 
patterns, a Block_copy() of a heap block is identical to a Block_retain() so
the latter should really never be called.

4. If something calls Block_copy() because it wants to hold onto a block, it 
needs to call Block_release() to eventually dispose of the block.
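
The four points above could be sketched roughly as follows. This is only an
illustration: it assumes clang with -fblocks (plus -lBlocksRuntime on non-Apple
systems), and the names int_block, keep, drop and demo are made up for the
example. Only __block, Block_copy() and Block_release() come from the blocks
feature itself.

```c
/* Build with: clang -fblocks (add -lBlocksRuntime on non-Apple systems).
 * Without -fblocks, __BLOCKS__ is undefined and this compiles to nothing. */
#ifdef __BLOCKS__
#include <Block.h>
#include <stddef.h>

typedef int (^int_block)(int);      /* hypothetical block type */

static int_block saved;             /* outlives the caller's stack frame */

/* Points 3 and 4: code that holds onto a block copies it ... */
void keep(int_block b) { saved = Block_copy(b); }

/* ... and eventually releases the copy. */
void drop(void) { Block_release(saved); saved = NULL; }

int demo(void)
{
    __block int calls = 0;          /* point 2: writable inside the block */
    int base = 10;                  /* captured by value, read-only inside */

    /* Point 1: declare, assign, then call like a function pointer. */
    int_block add = ^(int x) { calls++; return base + x; };
    int r = add(1);

    keep(add);                      /* escapes: context hoisted to the heap */
    r += saved(2);
    drop();
    return r + calls;
}
#endif
```

A callee that only calls the block before returning needs neither keep() nor
drop(); that is the non-escaping case from point 3.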

Here are the language and runtime specs. Blocks are a great language addition
and are used pervasively on Apple platforms as a result.

Language: https://clang.llvm.org/docs/BlockLanguageSpec.html 

Runtime/ABI: https://clang.llvm.org/docs/Block-ABI-Apple.html 


  -- Chris



Re: Nested functions [was Re: valgrind]

2022-03-25 Thread David Holland
On Thu, Mar 24, 2022 at 11:16:58PM -0400, Mouse wrote:
 > > The conclusion over the past ~25 years has been that there isn't;
 > > qsort and things like it work "well enough" and real uses for
 > > closures that really motivate the feature come up rarely enough that
 > > it doesn't happen.
 > 
 > Nested functions are not closures, or at least not what I know as
 > closures.  A nested function pointer (conceptually) goes invalid as
 > soon as anything it refers to goes out of scope, or at the latest as
 > soon as its smallest enclosing block exits (or possibly when its
 > smallest enclosing function returns).  The thing I know as a closure
 > would preserve the referred-to objects as long as the closure is
 > potentially callable.  This requires more reference tracking than C
 > typically makes possible.

There's a term for scope-restricted closures (that become invalid this
way) that I'm currently forgetting. They're still closures in most of
the important ways, and it's a useful concept for a language where
copying memory isn't free.

 > > There is no solution based on trampolines that'll pass security (or
 > > at least security-theatre) muster.  Unless maybe by doing something
 > > that's horrifying in other ways.
 > 
 > The safest alternative that comes to my mind is to have two stacks, one
 > for trampolines and one for everything else.  But that requires
 > something much like two stack pointers, including assist from the
 > setjmp/longjmp implementation and, if applicable, threads.

That's not s3kure! You can still get arbitrary code execution by
scribbling in it.

 > > (For example: you could declare a static limit on how many instances
 > > of the closure you'll ever produce, make a global array to stuff the
 > > data pointer in, and statically generate N trampoline entry points
 > > that read from that array and call the primary function.  But there
 > > are many other ways in which this is horrible.)
 > 
 > But there are use cases for which it is not a stupid implementation;
 > [...]

I think to keep it safe you need more code analysis than you're likely
to get with C code. But IDK.

-- 
David A. Holland
dholl...@netbsd.org


Re: Nested functions [was Re: valgrind]

2022-03-24 Thread Mouse
> [...] said, moving to fat function pointers on machines that don't
> already use them is a real ABI change and therefore a big deal; but
> it could be done if there were a compelling argument to justify going
> through all the associated dark rituals.

Or as a private experiment, in which compatibility with other ABIs is
irrelevant.  That's what I would, putatively, be doing.

> The conclusion over the past ~25 years has been that there isn't;
> qsort and things like it work "well enough" and real uses for
> closures that really motivate the feature come up rarely enough that
> it doesn't happen.

Nested functions are not closures, or at least not what I know as
closures.  A nested function pointer (conceptually) goes invalid as
soon as anything it refers to goes out of scope, or at the latest as
soon as its smallest enclosing block exits (or possibly when its
smallest enclosing function returns).  The thing I know as a closure
would preserve the referred-to objects as long as the closure is
potentially callable.  This requires more reference tracking than C
typically makes possible.

> There is no solution based on trampolines that'll pass security (or
> at least security-theatre) muster.  Unless maybe by doing something
> that's horrifying in other ways.

The safest alternative that comes to my mind is to have two stacks, one
for trampolines and one for everything else.  But that requires
something much like two stack pointers, including assist from the
setjmp/longjmp implementation and, if applicable, threads.

> (For example: you could declare a static limit on how many instances
> of the closure you'll ever produce, make a global array to stuff the
> data pointer in, and statically generate N trampoline entry points
> that read from that array and call the primary function.  But there
> are many other ways in which this is horrible.)

But there are use cases for which it is not a stupid implementation;
for example, if no containing function is ever called recursively or
reentrantly, and of course if the limit is high enough, it is obviously
safe and quite possibly one of the fastest techniques available (since
"creating" a trampoline can be made very fast, not needing even D$
pushes and I$ flushes).  But, under those conditions, the autos in the
enclosing function can be promoted to static storage duration and the
nested function turned, essentially, into an ordinary function.
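
A minimal sketch of that promotion in standard C, assuming the enclosing
function really is never active twice at once. The names target,
cmp_by_distance and sort_by_distance are hypothetical, chosen only for the
illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* What would have been an auto variable of the enclosing function,
 * promoted to static storage duration.  Safe only while
 * sort_by_distance() is never entered recursively or reentrantly. */
static int target;

/* What would have been the nested function, now an ordinary function
 * that reads the promoted variable instead of the enclosing frame. */
static int cmp_by_distance(const void *a, const void *b)
{
    int da = abs(*(const int *)a - target);
    int db = abs(*(const int *)b - target);
    return (da > db) - (da < db);
}

/* Sort v[0..n-1] by distance from t. */
void sort_by_distance(int *v, size_t n, int t)
{
    assert(v != NULL || n == 0);
    target = t;
    qsort(v, n, sizeof v[0], cmp_by_distance);
}
```

No trampoline is generated at all, so nothing on the stack has to be
executable; the price is that the function is no longer reentrant.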

Are the cases where the compiler can prove that's true a big enough
fraction of the use cases to be useful?  I don't know, not even for my
own code.  I _think_ one of the heavier uses of the technique - my ssh
implementation - could work just fine with this approach, but I haven't
looked at every nested function closely enough to be sure.  (It needs
more than nested functions, though; one of its uses of nested functions
is to combine them with what gcc calls nonlocal gotos as a throw-out
mechanism better in many respects than setjmp/longjmp.)
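
For comparison, the plain setjmp/longjmp form of a throw-out might look
roughly like this in standard C. It is a sketch, not the ssh code; walk, visit
and find_first_negative are hypothetical names. Without nested functions the
jmp_buf and the result have to live at file scope (or travel through a context
pointer), which is one of the things the nested-function version avoids.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* File-scope state; a nested function could instead refer directly to
 * locals of the enclosing function. */
static jmp_buf bail_out;
static size_t found_at;

/* Generic iterator that gives the callback no way to stop it. */
static void walk(const int *v, size_t n, void (*visit_fn)(size_t, int))
{
    for (size_t i = 0; i < n; i++)
        visit_fn(i, v[i]);
}

/* Callback: throw out of walk() as soon as a negative value is seen. */
static void visit(size_t i, int x)
{
    if (x < 0) {
        found_at = i;
        longjmp(bail_out, 1);
    }
}

/* Returns the index of the first negative element, or n if none. */
size_t find_first_negative(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (setjmp(bail_out) != 0)
        return found_at;            /* arrived here via longjmp() */
    walk(v, n, visit);
    return n;
}
```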

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Nested functions [was Re: valgrind]

2022-03-24 Thread David Holland
On Tue, Mar 22, 2022 at 09:16:36PM -0400, Mouse wrote:
 > Indeed, you can have different sizes for pointers to different object
 > types, too.  I _think_ pointers to different function types can have
 > different sizes, but I'm less certain of that.  (There would be little
 > point, since all function pointer types have to have the same
 > information content; see below.)

I don't think they can, but I wouldn't swear to it and, as you note,
it's a moot point.

 > > However there are, I fear, too many programs that somewhere convert a
 > > function pointer to (void *) and at that point things break.
 > 
 > That is a bug.  It always has been.  Such code is broken and it
 > deserves to be rendered _obviously_ broken so it can get fixed.

...and it already doesn't work on some platforms. I don't think
there's actually that much of this and finding it is a fairly
straightforward static analysis problem.

That said, moving to fat function pointers on machines that don't
already use them is a real ABI change and therefore a big deal; but it
could be done if there were a compelling argument to justify going
through all the associated dark rituals.

The conclusion over the past ~25 years has been that there isn't;
qsort and things like it work "well enough" and real uses for closures
that really motivate the feature come up rarely enough that it doesn't
happen. And that's why we are where we are.

(More than ~25 years ago people probably cared, or purported to care,
about an extra 4 bytes per function pointer enough to consider it not
worthwhile on those grounds.)

There is no solution based on trampolines that'll pass security (or at
least security-theatre) muster. Unless maybe by doing something that's
horrifying in other ways.

(For example: you could declare a static limit on how many instances
of the closure you'll ever produce, make a global array to stuff the
data pointer in, and statically generate N trampoline entry points
that read from that array and call the primary function. But there are
many other ways in which this is horrible.)
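
Roughly, the scheme in that parenthetical could look like the following in
standard C. This is a sketch under assumptions: the limit of 4 and all the
names (cmp_with_data, slot_data, closure_make, closure_free, sort_descending)
are invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CLOSURES 4              /* the static limit (hypothetical value) */

/* The primary function: a comparator taking an extra data pointer. */
static int cmp_with_data(const void *a, const void *b, void *data)
{
    int dir = *(const int *)data;   /* +1 ascending, -1 descending */
    int x = *(const int *)a, y = *(const int *)b;
    return dir * ((x > y) - (x < y));
}

/* Global array that the entry points read their data pointer from. */
static void *slot_data[MAX_CLOSURES];
static int slot_used[MAX_CLOSURES];

/* Statically generated entry points, one per slot. */
#define ENTRY(n) \
    static int entry_##n(const void *a, const void *b) \
    { return cmp_with_data(a, b, slot_data[n]); }
ENTRY(0) ENTRY(1) ENTRY(2) ENTRY(3)

typedef int (*cmp_fn)(const void *, const void *);
static const cmp_fn entries[MAX_CLOSURES] =
    { entry_0, entry_1, entry_2, entry_3 };

/* "Create" a closure: claim a free slot and return its entry point,
 * or NULL once the static limit is exhausted. */
cmp_fn closure_make(void *data)
{
    for (int i = 0; i < MAX_CLOSURES; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            slot_data[i] = data;
            return entries[i];
        }
    }
    return NULL;
}

/* Give the slot back so it can be reused. */
void closure_free(cmp_fn fn)
{
    assert(fn != NULL);
    for (int i = 0; i < MAX_CLOSURES; i++)
        if (entries[i] == fn) {
            slot_used[i] = 0;
            slot_data[i] = NULL;
        }
}

/* Demo: hand a data-carrying comparator to plain qsort(). */
void sort_descending(int *v, size_t n)
{
    int dir = -1;
    cmp_fn cmp = closure_make(&dir);
    assert(cmp != NULL);
    qsort(v, n, sizeof v[0], cmp);
    closure_free(cmp);
}
```

The slot bookkeeping here is not thread-safe and fails outright when the limit
is hit, which is some of the horribleness.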

-- 
David A. Holland
dholl...@netbsd.org


Nested functions [was Re: valgrind]

2022-03-22 Thread Mouse
>> Can't you?  Does C require function pointers to have the same type,
>> or compatible structure, as data pointers?
> No, I don't think that it does.

Correct.

> You could have different sizes for those.

Indeed, you can have different sizes for pointers to different object
types, too.  I _think_ pointers to different function types can have
different sizes, but I'm less certain of that.  (There would be little
point, since all function pointer types have to have the same
information content; see below.)

> However there are, I fear, too many programs that somewhere convert a
> function pointer to (void *) and at that point things break.

That is a bug.  It always has been.  Such code is broken and it
deserves to be rendered _obviously_ broken so it can get fixed.  I can
understand commercial compiler vendors' reluctance to "break customer
code" even if the reality is more "pointing out where customer code was
already broken and just historically getting away with it".  But I have
more trouble understanding it for things like gcc.

But then, there is a lot of confusion around code portability.  There's
at least one relatively popular open-source project - SQLite, fuzzy
memory says - that has a FAQ list entry that goes something like
"$PROJECT provokes these warnings!" with a response that used to read
like "we test it heavily, the code is fine", not understanding that the
warnings are not so much about the code generated today on today's
machines, which they are correct that testing can address; they are about
getting the code correct so that it will also work on tomorrow's
architecture, with tomorrow's new compiler release, or with a new compiler.

> And there is not really a "generic function pointer type" that you
> could sensibly use instead, I think.

There is, actually.  Any function pointer type will do.  Any function
pointer can be cast to any other function pointer type and back without
change - at least as of C99; "A pointer to a function of one type may
be converted to a pointer to a function of another type and back again;
the result shall compare equal to the original pointer.".  What you
can't do is call through the pointer when it points to the wrong
function type: "If a converted pointer is used to call a function whose
type is not compatible with the pointed-to type, the behavior is
undefined.".
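
A small sketch of what that permits, in standard C. The typedef name
generic_fn and the helper names are hypothetical; the choice of
void (*)(void) as the "generic" type is just a convention, since any function
pointer type would do.

```c
#include <assert.h>
#include <stddef.h>

/* Any function pointer type can serve as the generic one. */
typedef void (*generic_fn)(void);

static int add(int a, int b) { return a + b; }

/* Store a function pointer under the generic type. */
generic_fn erase_type(int (*fn)(int, int))
{
    return (generic_fn)fn;
}

/* Convert back to the real type before calling.  Calling through
 * generic_fn directly would be undefined behaviour. */
int call_binary(generic_fn g, int a, int b)
{
    int (*fn)(int, int) = (int (*)(int, int))g;
    assert(fn != NULL);
    return fn(a, b);
}

/* Round trip: convert, convert back, call. */
int demo_round_trip(void)
{
    generic_fn g = erase_type(add);
    return call_binary(g, 2, 3);
}
```

The caller has to remember the real type by some other means; the generic
pointer carries no record of it.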

> Possibly, a trampoline could be created on the heap, and then made
> executable and un-writable. Maybe that's considered too complicated /
> system dependent / expensive by gcc?

(1) it is ugly for the runtime to be mallocing behind the scenes, (2)
this breaks in the presence of longjmp (it is hard to stop it from
leaking trampolines when longjmping through a stack frame that created
a trampoline), and (3) it is difficult to use safely in the presence of
signal pointers or threads.  Oh, and (4) yes, changing memory
protection is system-dependent and expensive, and there are even some
Harvard(ish) architectures on which generating new executable code at
runtime is architecturally impossible.  (Few to none of them are
targeted by gcc, I suspect.)

(2) could be addressed if someone were to design an unwind-protect for
longjmp (which arguably should have existed all along), but that brings
it back to the "gcc wants to be compatible with existing systems"
issue; remember, back when gcc arose it usually wasn't the system
compiler, so it had to be compatible.  Arguably, now that gcc at least
sometimes _is_ the system compiler, it would make sense for it to grow
a way to handle nested functions better.  But I suspect there is
comparatively little will to do that in the gcc crowd, or it would have
happened long since.  And the heap techniques still run into (3); I
think fat function pointers is the rightest way to do it when
compatibility with a preexisting ABI is not a concern.
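
What a fat function pointer carries can be imitated by hand in standard C,
which may make the idea more concrete. This is only an emulation with
hypothetical names (int_closure, adder_env, make_adder, map_ints); a compiler
doing it natively would hide the struct behind the ordinary function pointer
syntax.

```c
#include <assert.h>
#include <stddef.h>

/* A hand-rolled fat function pointer: code address plus environment. */
struct int_closure {
    int (*fn)(void *env, int arg);  /* code address */
    void *env;                      /* environment (static link) */
};

/* The variables a nested function would have reached in its
 * enclosing scope. */
struct adder_env { int base; int calls; };

static int add_base(void *env, int arg)
{
    struct adder_env *e = env;
    e->calls++;
    return e->base + arg;
}

/* Pair the code with its environment. */
struct int_closure make_adder(struct adder_env *e)
{
    struct int_closure c = { add_base, e };
    return c;
}

/* A callee taking the fat pointer: no trampoline, no executable stack. */
void map_ints(int *v, size_t n, struct int_closure c)
{
    assert(c.fn != NULL);
    for (size_t i = 0; i < n; i++)
        v[i] = c.fn(c.env, v[i]);
}
```

The catch is visible in map_ints(): every callee has to take the two-word
form, which is exactly the ABI incompatibility with existing interfaces such
as qsort().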

To bring this back to NetBSD, I ran into (4) myself back sometime in
2000-2005ish, I think it was.  There were two issues I ran into; I
can't recall which one was first.  One was that some new NetBSD release
brought in totally non-executable stack on at least one of the
architectures I ran, making it impossible to use gcc nested functions
at all.  That I had to fix in the kernel.  The other was that gcc's
configuration made at least one syscall per trampoline generated.  The
results were correct, but it was intolerably slow for at least one
program I cared about (I think it was a search program with a nested
function pointer being generated in a relatively deeply nested
routine).  I forget what I did there - just made the whole stack
executable, maybe?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: nested functions [was Re: valgrind]

2022-03-22 Thread Koning, Paul



> On Mar 22, 2022, at 2:23 PM, Mouse  wrote:
> 
>>> I found an interesting article about why they're bad...
>>> https://thephd.dev/lambdas-nested-functions-block-expressions-oh-my
>> That's a good argument for why GCC's implementation of nested functions is b$
> 
> What security blunder is that?  Based on your next line, I'm going to
> assume it's "implementing them via stack trampolines".  (I would have
> to go to a work machine to tell, because thephd.dev has apparently
> drunk the "it's good to ram HTTPS down everyone's throat" koolaid.
> Even the stack trampoline mechanism, I would say, is not a security
> blunder per se; I see it as a security issue only in that it
> exacerbates the effects of certain other security issues.  Also don't
> forget that early gcc arose in a very different environment from
> today's.)

Yes, I did mean executable stacks.  True, the world was different
once, though executable stacks are also problematic on some older
architectures for entirely different reasons.  For example, you can't
necessarily use them on PDP-11s.

>> I don't believe ALGOL implementations needed executable stacks to implement $
> 
> Neither would gcc...IF it can set the ABI.  There really was very
> little choice for gcc when it started.  It had to be ABI-compatible
> with existing procedure calling sequences.  It also had to be
> compatible with existing longjmps. That eliminates pretty close to
> everything _but_ stack trampolines.

True.  Do general ABIs like the VAX one have this issue or is it 
specific to just some of the ABIs?

> Other ways of doing nested functions is one of the things I want to
> experiment with.

I know the classic answer in ALGOL is "displays", which go all the
way back to the first ALGOL compilers, but I don't have the details
in my head.
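
As a rough reminder of the idea, a display is an array of frame pointers
indexed by lexical nesting level. Here is a hand-written emulation in C, not
taken from any particular ALGOL compiler; all names are hypothetical, and it
only covers direct calls, not procedures passed as parameters.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8                 /* hypothetical limit on nesting depth */

/* display[k] points at the frame of the currently active procedure at
 * lexical level k. */
static void *display[MAX_DEPTH];

/* Frame of the level-1 procedure sum_to(): the variables its nested
 * procedure needs to reach. */
struct sum_to_frame { int total; };

/* Level-2 procedure, textually nested inside sum_to() in the ALGOL
 * original.  It finds the enclosing frame through the display rather
 * than through an extra argument or a trampoline. */
static void add(int x)
{
    struct sum_to_frame *outer = display[1];
    outer->total += x;
}

/* Level-1 procedure: install its frame in the display on entry and
 * restore the previous entry on exit, so recursion at this level
 * still finds the right frame. */
int sum_to(int n)
{
    assert(n >= 0);
    struct sum_to_frame frame = { 0 };
    void *saved = display[1];
    display[1] = &frame;
    for (int i = 1; i <= n; i++)
        add(i);
    display[1] = saved;
    return frame.total;
}
```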

paul



nested functions [was Re: valgrind]

2022-03-22 Thread Mouse
>> I found an interesting article about why they're bad...
>> https://thephd.dev/lambdas-nested-functions-block-expressions-oh-my
> That's a good argument for why GCC's implementation of nested functions is b$

What security blunder is that?  Based on your next line, I'm going to
assume it's "implementing them via stack trampolines".  (I would have
to go to a work machine to tell, because thephd.dev has apparently
drunk the "it's good to ram HTTPS down everyone's throat" koolaid.
Even the stack trampoline mechanism, I would say, is not a security
blunder per se; I see it as a security issue only in that it
exacerbates the effects of certain other security issues.  Also don't
forget that early gcc arose in a very different environment from
today's.)

> I don't believe ALGOL implementations needed executable stacks to implement $

Neither would gcc...IF it can set the ABI.  There really was very
little choice for gcc when it started.  It had to be ABI-compatible
with existing procedure calling sequences.  It also had to be
compatible with existing longjmps. That eliminates pretty close to
everything _but_ stack trampolines.

Other ways of doing nested functions is one of the things I want to
experiment with.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B