Re: JIT emulator needs

2007-06-23 Thread William Lee Irwin III
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>>> c. open() flag to unlink a file before returning the fd

On Jun 19, 2007, at 11:08:24, William Lee Irwin III wrote:
>> You probably want a tmpfile(3) -like affair which never has a  
>> pathname to begin with. It could be useful for security purposes  
>> more generally.

On Fri, Jun 22, 2007 at 11:52:12PM -0400, Kyle Moffett wrote:
> maybe this: open("/some/dir", O_TMPFILE);
> and this? open("/some/dir", O_TMPFILE|O_DIRECTORY);
> The former would return a filehandle to a new anonymous file  
> somewhere on whatever filesystem backs the specified path.  The  
> latter would do the same, except create an anonymous directory where  
> you could use "openat()" or something.  Presumably "lsof" and "/proc"  
> should show either type of handle as referring to either
> "/some/filesystem/" or "/some/filesystem/ (anonymous temp file)" or something.

This is plausible (and I did indeed consider the file variant),
though it may require more infrastructure than a tmpfs-only version would.
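
For the tmpfs-only variant, Linux later gained memfd_create(2) (Linux 3.17; glibc wrapper since 2.27). It returns a descriptor for an anonymous, swap-backed file that never has a pathname. What follows is a minimal sketch that assumes a current Linux and glibc. The helper make_anon_file and the label "jit-cache" are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>   /* memfd_create(), MFD_CLOEXEC (glibc >= 2.27) */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate(), close() */

/* Hypothetical helper: return an fd for an anonymous tmpfs-backed file of
 * `size` bytes that never had a pathname, or -1 on error. There is no name
 * to unlink, so nothing is left behind if the process dies. */
static int make_anon_file(const char *name, off_t size)
{
    assert(size > 0);
    int fd = memfd_create(name, MFD_CLOEXEC);  /* name is a debugging label only */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0) {             /* give the file its length */
        close(fd);
        return -1;
    }
    return fd;
}
```

The name only serves as a debugging label. Such a descriptor typically appears in /proc/PID/fd as something like /memfd:jit-cache (deleted), which is roughly the lsof behaviour suggested in the quoted proposal.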

It may be worth clarifying that I have no concrete plans to work on
the JIT emulator issues myself. I'm only disseminating ideas I think
will pass review. I expect others to take up the issue(s) perhaps with
some inspiration from what I described. I may review some, but I have
a large review backlog as things now stand.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: JIT emulator needs

2007-06-22 Thread Kyle Moffett

On Jun 19, 2007, at 11:08:24, William Lee Irwin III wrote:
> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> c. open() flag to unlink a file before returning the fd
>
> You probably want a tmpfile(3) -like affair which never has a
> pathname to begin with. It could be useful for security purposes
> more generally.


maybe this: open("/some/dir", O_TMPFILE);
and this? open("/some/dir", O_TMPFILE|O_DIRECTORY);

The former would return a filehandle to a new anonymous file  
somewhere on whatever filesystem backs the specified path.  The  
latter would do the same, except create an anonymous directory where  
you could use "openat()" or something.  Presumably "lsof" and "/proc"  
should show either type of handle as referring to either
"/some/filesystem/" or "/some/filesystem/ (anonymous temp file)" or something.
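
A flag with this name was eventually merged (Linux 3.11). The merged flag takes a directory path. It must be combined with O_RDWR or O_WRONLY, and not every filesystem supports it. The sketch below covers only the file variant. It assumes Linux and a glibc recent enough to define O_TMPFILE. open_anon_in is a hypothetical helper. When the flag is unavailable, it falls back to the older create-then-unlink pattern, which item c asks open() to perform in one step:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* open(), O_TMPFILE, O_CLOEXEC */
#include <stdio.h>      /* snprintf() */
#include <stdlib.h>     /* mkstemp() */
#include <unistd.h>     /* unlink(), close() */

/* Hypothetical helper: return a read/write fd for an anonymous file on the
 * filesystem that backs `dir`, or -1 on error. */
static int open_anon_in(const char *dir)
{
    assert(dir != NULL);
#ifdef O_TMPFILE
    /* Preferred: the inode is created without ever having a name. */
    int fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);
    if (fd >= 0)
        return fd;
    /* Old kernel or unsupported filesystem: use the fallback below. */
#endif
    /* Fallback: create a uniquely named file, then drop the name at once.
     * Unlike O_TMPFILE, the name is briefly visible, and a crash between
     * the two calls leaves the file behind. */
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/anon-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    int tmp = mkstemp(path);
    if (tmp < 0)
        return -1;
    unlink(path);   /* the open fd keeps the file alive */
    return tmp;
}
```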


Cheers,
Kyle Moffett




Re: JIT emulator needs

2007-06-22 Thread Albert Cahalan

On 6/22/07, Arjan van de Ven <[EMAIL PROTECTED]> wrote:


> > > > > and these methods also destroy yourself on any machine with a looser
> > > > > cache coherency between I and D-cache
> > > > >
> > > > > for all but x86 you pretty much have to do the mprotect() between the
> > > > > two states to deal with the cache flushing properly...
> > > >
> > > > If the instructions to force data write-back and/or to
> > > > invalidate the instruction cache are privileged, yes.
> > > > AFAIK, only ARM is that lame.
> > >
> > > and your program executes this on all the cpus in the system?
>
> no I meant that you had to call your userspace instruction on all cpus,
> so on all-but-arm (from the Intel side I know IA64 needs such a flush,
> but I'm pretty sure PPC does too)


I understood.

AFAIK, it is common to propagate this via a special
bus cycle. Section 5.1.5.2.1 of the PowerPC manual
states that this is so. Section 5.1.5.2 lists the requirements
for both uniprocessor and multiprocessor. Note that
Linux uses the coherent memory model for PowerPC SMP.
See also the "icbi" instruction description, where the use
of an address-only broadcast is mentioned.
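
On PowerPC, the user-mode sequence in question looks roughly like this: write back the data-cache blocks (dcbst), sync, invalidate the instruction-cache blocks (icbi), then sync and isync. This is a sketch under stated assumptions. sync_icache is a hypothetical helper. The 16-byte block size is a placeholder that must not exceed the real block size, and real code should query that size instead. On other architectures the helper simply defers to GCC/Clang's __builtin___clear_cache:

```c
#include <assert.h>
#include <stdint.h>     /* uintptr_t */

/* Placeholder block size: must not exceed the CPU's real cache-block size
 * (a smaller step is merely redundant). Real code should query it. */
#define ASSUMED_BLOCK_SIZE 16u

/* Hypothetical helper: make instructions just written to [begin, end)
 * visible to instruction fetch. */
static void sync_icache(void *begin, void *end)
{
    assert((char *)begin <= (char *)end);
#if defined(__powerpc__) || defined(__powerpc64__)
    uintptr_t start = (uintptr_t)begin & ~(uintptr_t)(ASSUMED_BLOCK_SIZE - 1);
    uintptr_t stop = (uintptr_t)end;
    uintptr_t p;
    for (p = start; p < stop; p += ASSUMED_BLOCK_SIZE)
        __asm__ volatile("dcbst 0,%0" : : "r"(p) : "memory"); /* write data block back */
    __asm__ volatile("sync" : : : "memory");                 /* wait for the write-backs */
    for (p = start; p < stop; p += ASSUMED_BLOCK_SIZE)
        __asm__ volatile("icbi 0,%0" : : "r"(p) : "memory");  /* invalidate instruction block */
    __asm__ volatile("sync\n\tisync" : : : "memory");         /* drop prefetched instructions */
#else
    /* Other targets: let the compiler emit whatever the platform needs
     * (nothing at all on x86). */
    __builtin___clear_cache((char *)begin, (char *)end);
#endif
}
```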


> > I don't recall seeing such code in the libgcc trampoline
> > setup for PowerPC. Either it's not required, or this is
> > a rather popular bug.
>
> I suspect it'll be playing under the assumption that going from "no
> code" to "code" is fine since the icache is cold.


A previous trampoline at the same address would ruin that.

Fortunately, PowerPC is not as brain-dead as ARM and IA64.
(not that I'm writing code for any of these)


Re: JIT emulator needs

2007-06-22 Thread Arjan van de Ven

> > > > and these methods also destroy yourself on any machine with a looser
> > > > cache coherency between I and D-cache
> > > >
> > > > for all but x86 you pretty much have to do the mprotect() between the
> > > > two states to deal with the cache flushing properly...
> > >
> > > If the instructions to force data write-back and/or to
> > > invalidate the instruction cache are privileged, yes.
> > > AFAIK, only ARM is that lame.
> >
> > and your program executes this on all the cpus in the system?

no I meant that you had to call your userspace instruction on all cpus,
so on all-but-arm (from the Intel side I know IA64 needs such a flush,
but I'm pretty sure PPC does too)
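
Long after this thread, Linux 4.16 added a membarrier(2) command aimed at this cross-CPU problem. MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE makes every CPU currently running a thread of the calling process execute a core-serializing instruction before the call returns. What follows is a minimal sketch that assumes kernel headers new enough to define the command. The wrapper names are hypothetical. The sketch supplies only the cross-CPU step and is meant to complement the per-range cache maintenance, not replace it:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>  /* MEMBARRIER_CMD_* constants */
#include <sys/syscall.h>       /* SYS_membarrier */
#include <unistd.h>            /* syscall() */

/* Hypothetical thin wrapper that invokes the system call directly. */
static int sys_membarrier(int cmd, unsigned int flags)
{
    assert(flags == 0);  /* the commands used below take no flags */
    return (int)syscall(SYS_membarrier, cmd, flags, 0);
}

/* Hypothetical helper: returns 0 once every CPU running a thread of this
 * process has executed a core-serializing instruction, or -1 if the
 * kernel does not offer the SYNC_CORE command. */
static int jit_sync_all_cpus(void)
{
    static int registered;  /* not thread-safe; good enough for a sketch */
    if (!registered) {
        int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
        if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE))
            return -1;
        /* A process must register once before using the expedited command. */
        if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0) != 0)
            return -1;
        registered = 1;
    }
    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0) == 0 ? 0 : -1;
}
```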


> I don't recall seeing such code in the libgcc trampoline
> setup for PowerPC. Either it's not required, or this is
> a rather popular bug.

I suspect it'll be playing under the assumption that going from "no
code" to "code" is fine since the icache is cold.


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: JIT emulator needs

2007-06-22 Thread Albert Cahalan

On 6/22/07, Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-06-22 at 01:56 -0400, Albert Cahalan wrote:
> > On 6/21/07, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > > On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote:
> > > > Right now, Linux isn't all that friendly to JIT emulators.
> > > > Here are the problems and suggestions to improve the situation.
> > > >
> > > > There is an SE Linux execmem restriction that enforces W^X.
> > > > Assuming you don't wish to just disable SE Linux, there are
> > > > two ugly ways around the problem. You can mmap a file twice,
> > > > or you can abuse SysV shared memory. The mmap method requires
> > > > that you know of a filesystem mounted rw,exec where you can
> > > > write a very large temporary file. This arbitrary filesystem,
> > > > rather than swap space, will be the backing store. The SysV
> > > > shared memory method requires an undocumented flag and is
> > > > subject to some annoying size limits. Both methods create
> > > > objects that will fail to be deleted if the program dies
> > > > before marking the objects for deletion.
> > >
> > > and these methods also destroy yourself on any machine with a looser
> > > cache coherency between I and D-cache
> > >
> > > for all but x86 you pretty much have to do the mprotect() between the
> > > two states to deal with the cache flushing properly...
> >
> > If the instructions to force data write-back and/or to
> > invalidate the instruction cache are privileged, yes.
> > AFAIK, only ARM is that lame.
>
> and your program executes this on all the cpus in the system?


I'll remember that if I ever run a JIT on the SMP ARM box.
(there's like one, at the manufacturer, right?)

I don't recall seeing such code in the libgcc trampoline
setup for PowerPC. Either it's not required, or this is
a rather popular bug.

Perhaps ARM needs syscalls for this, or emulation for
the privileged instructions. This may already exist; if not,
it is certainly required. So this would be another need for
properly supporting JIT emulators.
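
The "mmap a file twice" workaround from the quoted text can be sketched as follows. In place of a temporary file on a rw,exec filesystem, the sketch uses memfd_create(2), a later addition (Linux 3.17). Cache maintenance is handled by GCC/Clang's __builtin___clear_cache. As far as I know, on ARM Linux that builtin is implemented with the kernel's private cacheflush system call. The jit_region names are hypothetical. The sketch also assumes the two views do not alias in a virtually indexed cache:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>     /* memcpy() */
#include <sys/mman.h>   /* memfd_create(), mmap(), munmap() */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate(), close() */

/* Two views of the same pages: one writable, one executable. No single
 * mapping is ever writable and executable at once. */
struct jit_region {
    unsigned char *rw;  /* where the JIT writes code */
    unsigned char *rx;  /* where the CPU executes it */
    size_t size;
};

static int jit_region_init(struct jit_region *r, size_t size)
{
    int fd = memfd_create("jit", MFD_CLOEXEC);  /* swap-backed, no pathname */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    r->rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    r->rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);  /* the mappings keep the file alive */
    if (r->rw == MAP_FAILED || r->rx == MAP_FAILED) {
        if (r->rw != MAP_FAILED) munmap(r->rw, size);
        if (r->rx != MAP_FAILED) munmap(r->rx, size);
        return -1;
    }
    r->size = size;
    return 0;
}

/* Copy machine code in through the writable view, synchronize the caches
 * for the executable view, and return the executable address. */
static void *jit_emit(struct jit_region *r, size_t off, const void *code, size_t len)
{
    assert(off + len <= r->size);
    memcpy(r->rw + off, code, len);
    __builtin___clear_cache((char *)r->rx + off, (char *)r->rx + off + len);
    return r->rx + off;
}

static void jit_region_free(struct jit_region *r)
{
    munmap(r->rw, r->size);
    munmap(r->rx, r->size);
}
```

A caller would cast the pointer returned by jit_emit() to a function pointer of the appropriate type. Nothing is left to clean up if the process dies, because the backing file never had a name.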


Re: JIT emulator needs

2007-06-22 Thread Arjan van de Ven
On Fri, 2007-06-22 at 01:56 -0400, Albert Cahalan wrote:
> On 6/21/07, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote:
> > > Right now, Linux isn't all that friendly to JIT emulators.
> > > Here are the problems and suggestions to improve the situation.
> > >
> > > There is an SE Linux execmem restriction that enforces W^X.
> > > Assuming you don't wish to just disable SE Linux, there are
> > > two ugly ways around the problem. You can mmap a file twice,
> > > or you can abuse SysV shared memory. The mmap method requires
> > > that you know of a filesystem mounted rw,exec where you can
> > > write a very large temporary file. This arbitrary filesystem,
> > > rather than swap space, will be the backing store. The SysV
> > > shared memory method requires an undocumented flag and is
> > > subject to some annoying size limits. Both methods create
> > > objects that will fail to be deleted if the program dies
> > > before marking the objects for deletion.
> >
> > and these methods also destroy yourself on any machine with a looser
> > cache coherency between I and D-cache
> >
> > for all but x86 you pretty much have to do the mprotect() between the
> > two states to deal with the cache flushing properly...
> 
> If the instructions to force data write-back and/or to
> invalidate the instruction cache are privileged, yes.
> AFAIK, only ARM is that lame.

and your program executes this on all the cpus in the system?


Re: JIT emulator needs

2007-06-21 Thread Albert Cahalan

On 6/21/07, Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote:
> > Right now, Linux isn't all that friendly to JIT emulators.
> > Here are the problems and suggestions to improve the situation.
> >
> > There is an SE Linux execmem restriction that enforces W^X.
> > Assuming you don't wish to just disable SE Linux, there are
> > two ugly ways around the problem. You can mmap a file twice,
> > or you can abuse SysV shared memory. The mmap method requires
> > that you know of a filesystem mounted rw,exec where you can
> > write a very large temporary file. This arbitrary filesystem,
> > rather than swap space, will be the backing store. The SysV
> > shared memory method requires an undocumented flag and is
> > subject to some annoying size limits. Both methods create
> > objects that will fail to be deleted if the program dies
> > before marking the objects for deletion.
>
> and these methods also destroy yourself on any machine with a looser
> cache coherency between I and D-cache
>
> for all but x86 you pretty much have to do the mprotect() between the
> two states to deal with the cache flushing properly...


If the instructions to force data write-back and/or to
invalidate the instruction cache are privileged, yes.
AFAIK, only ARM is that lame.

For example, PowerPC lets unprivileged code run
the required instructions.


Re: JIT emulator needs

2007-06-21 Thread Arjan van de Ven
On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote:
> Right now, Linux isn't all that friendly to JIT emulators.
> Here are the problems and suggestions to improve the situation.
> 
> There is an SE Linux execmem restriction that enforces W^X.
> Assuming you don't wish to just disable SE Linux, there are
> two ugly ways around the problem. You can mmap a file twice,
> or you can abuse SysV shared memory. The mmap method requires
> that you know of a filesystem mounted rw,exec where you can
> write a very large temporary file. This arbitrary filesystem,
> rather than swap space, will be the backing store. The SysV
> shared memory method requires an undocumented flag and is
> subject to some annoying size limits. Both methods create
> objects that will fail to be deleted if the program dies
> before marking the objects for deletion.


and these methods also destroy yourself on any machine with a looser
cache coherency between I and D-cache

for all but x86 you pretty much have to do the mprotect() between the
two states to deal with the cache flushing properly...
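
For comparison, here is how the single-mapping approach described above might look. The code is written while the pages are writable, and mprotect() then makes them executable before they run. This is a minimal sketch with a hypothetical helper, jit_load. Under an SELinux policy that denies execmem, the attempt to make anonymous memory executable is typically refused. That refusal is what pushed Albert toward the double-mapping workarounds:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>     /* memcpy() */
#include <sys/mman.h>   /* mmap(), mprotect(), munmap() */

/* Hypothetical helper: copy `len` bytes of machine code into a fresh
 * private anonymous mapping, switch it from RW to RX, and return the
 * executable address (NULL on failure). */
static void *jit_load(const void *code, size_t len, size_t map_size)
{
    assert(len <= map_size);
    void *p = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);                                 /* state 1: writable */
    if (mprotect(p, map_size, PROT_READ | PROT_EXEC) != 0) { /* state 2: executable */
        munmap(p, map_size);                              /* e.g. denied by policy */
        return NULL;
    }
    /* Keep the instruction cache coherent where the CPU needs it;
     * this compiles to nothing on x86. */
    __builtin___clear_cache((char *)p, (char *)p + len);
    return p;
}
```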





Re: JIT emulator needs

2007-06-21 Thread Bodo Eggert
Albert Cahalan <[EMAIL PROTECTED]> wrote:
> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:

>>> Right now, Linux isn't all that friendly to JIT emulators.
>>> Here are the problems and suggestions to improve the situation.
>>> There is an SE Linux execmem restriction that enforces W^X.
>>> Assuming you don't wish to just disable SE Linux, there are
>>> two ugly ways around the problem. You can mmap a file twice,
>>> or you can abuse SysV shared memory. The mmap method requires
>>> that you know of a filesystem mounted rw,exec where you can
>>> write a very large temporary file. This arbitrary filesystem,
>>> rather than swap space, will be the backing store. The SysV
>>> shared memory method requires an undocumented flag and is
>>> subject to some annoying size limits. Both methods create
>>> objects that will fail to be deleted if the program dies
>>> before marking the objects for deletion.
>>
>> If the policy forbidding self-modifying code lacks a method of
>> exempting programs such as JIT interpreters (which I doubt) then
>> it's a problem. I'm with Alan on this one.
> 
> It does and it doesn't. There is not a reasonable way for a
> user to mark an app as needing full self-modifying ability.
> It's not like the executable stack, which can be set via the
> ELF note markings on the executable. (ELF note markings are
> ideal because they can not be used via a ret-to-libc attack)
> 
> With admin privs, one can change SE Linux settings. Mark the
> executable, disable the protection system-wide, generate a
> completely new SE Linux policy, or just turn SE Linux off.
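
For reference, the executable-stack marking mentioned above is carried by the PT_GNU_STACK program header. The linker derives that header from the .note.GNU-stack sections in the object files. What follows is a minimal sketch that reads the header for the running program through glibc's dl_iterate_phdr(3). The helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>        /* PT_GNU_STACK, PF_X */
#include <link.h>       /* dl_iterate_phdr(), struct dl_phdr_info, ElfW() */
#include <stddef.h>

/* Callback: record 1 if the object asks for an executable stack, 0 if it
 * asks for a non-executable one, -1 if it carries no PT_GNU_STACK header. */
static int check_first_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *result = data;
    assert(data != NULL);
    (void)size;
    *result = -1;                      /* assume "no marking" until found */
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_STACK) {
            *result = (info->dlpi_phdr[i].p_flags & PF_X) ? 1 : 0;
            break;
        }
    }
    return 1;  /* non-zero stops the walk: the first object is the main program */
}

static int main_program_wants_exec_stack(void)
{
    int result = -1;
    dl_iterate_phdr(check_first_object, &result);
    return result;
}
```

The quoted argument is that no comparable per-executable marking exists for requesting memory that is both writable and executable.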

According to the documents I found about SELinux, you can also
 - create a this-app-needs-selfmodification type
 - allow users to change the context type of their files to this type
 - configure a domain to allow self-modification
 - configure the domain transition

Brave words from someone who did not yet successfully find the magic in
order to install the refpolicy on debilian (after finding their refpolicy-foo
to be incomplete and their refpolicy-src to not compile).
-- 
Why do women have smaller feet than men?
It's one of those "evolutionary things" that allows them to stand
closer to the kitchen sink.
Eat this, spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]


Re: JIT emulator needs

2007-06-21 Thread Albert Cahalan

On 6/20/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:

> Albert Cahalan wrote:
>
> > Look, let's back up a bit here. At a high level, what exactly do
> > you imagine that this behavior was intended for? I suggest you
> > list some examples of the attacks that are blocked.
> >
> > Can you come up with a reasonable argument that the current behavior
> > is the least painful restriction required to block those attacks?
> > Does the current behavior block any attack that the proposed behavior
> > would not? (list the attacks please)
>
> See above.


Nope. I asked you to justify the existing behavior. Apparently you
are unable to do so. This should be a hint.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: JIT emulator needs

2007-06-21 Thread Albert Cahalan

On 6/20/07, H. Peter Anvin [EMAIL PROTECTED] wrote:

Albert Cahalan wrote:



 Look, let's back up a bit here. At a high level, what exactly do
 you imagine that this behavior was intended for? I suggest you
 list some examples of the attacks that are blocked.

 Can you come up with a reasonable argument that the current behavior
 is the least painful restriction required to block those attacks?
 Does the current behavior block any attack that the proposed behavior
 would not? (list the attacks please)

See above.


Nope. I asked you to justify the existing behavior. Apparently you
are unable to do so. This should be a hint.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: JIT emulator needs

2007-06-21 Thread Bodo Eggert
Albert Cahalan [EMAIL PROTECTED] wrote:
 On 6/19/07, William Lee Irwin III [EMAIL PROTECTED] wrote:
 On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:

 Right now, Linux isn't all that friendly to JIT emulators.
 Here are the problems and suggestions to improve the situation.
 There is an SE Linux execmem restriction that enforces W^X.
 Assuming you don't wish to just disable SE Linux, there are
 two ugly ways around the problem. You can mmap a file twice,
 or you can abuse SysV shared memory. The mmap method requires
 that you know of a filesystem mounted rw,exec where you can
 write a very large temporary file. This arbitrary filesystem,
 rather than swap space, will be the backing store. The SysV
 shared memory method requires an undocumented flag and is
 subject to some annoying size limits. Both methods create
 objects that will fail to be deleted if the program dies
 before marking the objects for deletion.

 If the policy forbidding self-modifying code lacks a method of
 exempting programs such as JIT interpreters (which I doubt) then
 it's a problem. I'm with Alan on this one.
 
 It does and it doesn't. There is not a reasonable way for a
 user to mark an app as needing full self-modifying ability.
 It's not like the executable stack, which can be set via the
 ELF note markings on the executable. (ELF note markings are
 ideal because they can not be used via a ret-to-libc attack)
 
 With admin privs, one can change SE Linux settings. Mark the
 executable, disable the protection system-wide, generate a
 completely new SE Linux policy, or just turn SE Linux off.

According to the documents I found about SELinux, you can also
 - create a this-app-needs-selfmodification type
 - allow users to change the context type of their files to this type
 - configure a domain to allow self-modification
 - configure the domain transition

Brave words from someone who did not yet successfully find the magic in
order to install the refpolicy on debilian (after finding their refpolicy-foo
to be incomplete and their refpolicy-src to not compile).
-- 
Why do women have smaller feet than men?
It's one of those evolutionary things that allows them to stand
closer to the kitchen sink.
Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: JIT emulator needs

2007-06-21 Thread Arjan van de Ven
On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote:
 Right now, Linux isn't all that friendly to JIT emulators.
 Here are the problems and suggestions to improve the situation.
 
 There is an SE Linux execmem restriction that enforces W^X.
 Assuming you don't wish to just disable SE Linux, there are
 two ugly ways around the problem. You can mmap a file twice,
 or you can abuse SysV shared memory. The mmap method requires
 that you know of a filesystem mounted rw,exec where you can
 write a very large temporary file. This arbitrary filesystem,
 rather than swap space, will be the backing store. The SysV
 shared memory method requires an undocumented flag and is
 subject to some annoying size limits. Both methods create
 objects that will fail to be deleted if the program dies
 before marking the objects for deletion.


and these methods also destroy yourself on any machine with a looser
cache coherency between I and D-cache

for all but x86 you pretty much have to do the mprotect() between the
two states to deal with the cache flushing properly...



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: JIT emulator needs

2007-06-21 Thread Albert Cahalan

On 6/21/07, Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote:
>> Right now, Linux isn't all that friendly to JIT emulators.
>> Here are the problems and suggestions to improve the situation.
>>
>> There is an SE Linux execmem restriction that enforces W^X.
>> Assuming you don't wish to just disable SE Linux, there are
>> two ugly ways around the problem. You can mmap a file twice,
>> or you can abuse SysV shared memory. The mmap method requires
>> that you know of a filesystem mounted rw,exec where you can
>> write a very large temporary file. This arbitrary filesystem,
>> rather than swap space, will be the backing store. The SysV
>> shared memory method requires an undocumented flag and is
>> subject to some annoying size limits. Both methods create
>> objects that will fail to be deleted if the program dies
>> before marking the objects for deletion.
>
> and these methods also break on any machine with looser cache
> coherency between the I-cache and D-cache
>
> for all but x86 you pretty much have to do the mprotect() between the
> two states to deal with the cache flushing properly...


If the instructions to force data write-back and/or to
invalidate the instruction cache are privileged, yes.
AFAIK, only ARM is that lame.

For example, PowerPC lets unprivileged code run
the required instructions.
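The single-mapping alternative Arjan describes might look roughly like the sketch below: write the code while the pages are PROT_READ|PROT_WRITE, perform the cache maintenance, then flip them to PROT_READ|PROT_EXEC with mprotect(). `__builtin___clear_cache` is the GCC/Clang builtin that emits whatever the target needs; as I understand it, that is nothing on x86, the unprivileged instructions Albert mentions on PowerPC, and a system call on 32-bit ARM Linux. `jit_install()` and `jit_release()` are hypothetical names. Under an SELinux policy that denies execmem, the final mprotect() is typically the call that fails.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
static size_t page_round(size_t len)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    return (len + pg - 1) / pg * pg;
}

/*
 * Copy machine code into a fresh private mapping and make it executable.
 * The pages are never writable and executable at the same time:
 *   1. map them PROT_READ|PROT_WRITE and copy the code in,
 *   2. flush/invalidate caches for the range (a no-op on x86),
 *   3. flip them to PROT_READ|PROT_EXEC.
 * Returns NULL on failure, e.g. when a security policy denies step 3.
 */
void *jit_install(const unsigned char *code, size_t len)
{
    size_t size = page_round(len);
    unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, code, len);
    __builtin___clear_cache((char *)p, (char *)p + len);

    if (mprotect(p, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, size);
        return NULL;
    }
    return p;
}

void jit_release(void *p, size_t len)
{
    munmap(p, page_round(len));
}
```

On x86 the bytes `B8 2A 00 00 00 C3` (`mov eax, 42; ret`) make a minimal test payload.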


Re: JIT emulator needs

2007-06-20 Thread H. Peter Anvin
Albert Cahalan wrote:
>>
>> That's fine.  That's a policy decision.  That's what a security policy
>> *is*.  The owner of the system has decided, by security policy, that
>> that is not allowed.  Bypassing that is not acceptable.
> 
> Fixing a bug should be acceptable.
> 

That's not what you're trying to do, though.  You're trying to change
the behaviour underneath the security policy.  If there is a bug, it's
in the security policy and that's where it needs to be changed.

> Look, let's back up a bit here. At a high level, what exactly do
> you imagine that this behavior was intended for? I suggest you
> list some examples of the attacks that are blocked.
> 
> Can you come up with a reasonable argument that the current behavior
> is the least painful restriction required to block those attacks?
> Does the current behavior block any attack that the proposed behavior
> would not? (list the attacks please)

See above.

-hpa


Re: JIT emulator needs

2007-06-20 Thread Albert Cahalan

On 6/20/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:

> Albert Cahalan wrote:
>> Putting this into the security policy was an error born of
>> laziness to begin with. Abuse of the security mechanism
>> was easier than hacking the toolchain, ELF loader, etc.
>>
>> Either a binary needs self-modification, or it doesn't. This is
>> determined by the author of the code. If you don't trust an
>> executable that needs this ability, then you simply can not
>> run it in a useful way.
>
> That's fine.  That's a policy decision.  That's what a security policy
> *is*.  The owner of the system has decided, by security policy, that
> that is not allowed.  Bypassing that is not acceptable.


Fixing a bug should be acceptable.

Look, let's back up a bit here. At a high level, what exactly do
you imagine that this behavior was intended for? I suggest you
list some examples of the attacks that are blocked.

Can you come up with a reasonable argument that the current behavior
is the least painful restriction required to block those attacks?
Does the current behavior block any attack that the proposed behavior
would not? (list the attacks please)


Re: JIT emulator needs

2007-06-20 Thread H. Peter Anvin
Albert Cahalan wrote:
> Putting this into the security policy was an error born of
> laziness to begin with. Abuse of the security mechanism
> was easier than hacking the toolchain, ELF loader, etc.
> 
> Either a binary needs self-modification, or it doesn't. This is
> determined by the author of the code. If you don't trust an
> executable that needs this ability, then you simply can not
> run it in a useful way.

That's fine.  That's a policy decision.  That's what a security policy
*is*.  The owner of the system has decided, by security policy, that
that is not allowed.  Bypassing that is not acceptable.

-hpa


Re: JIT emulator needs

2007-06-20 Thread Albert Cahalan

On 6/20/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> If the policy forbidding self-modifying code lacks a method of
>>> exempting programs such as JIT interpreters (which I doubt) then
>>> it's a problem. I'm with Alan on this one.
>
> On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
>> It does and it doesn't. There is not a reasonable way for a
>> user to mark an app as needing full self-modifying ability.
>> It's not like the executable stack, which can be set via the
>> ELF note markings on the executable. (ELF note markings are
>> ideal because they can not be used via a ret-to-libc attack)
>> With admin privs, one can change SE Linux settings. Mark the
>> executable, disable the protection system-wide, generate a
>> completely new SE Linux policy, or just turn SE Linux off.
>> Normally we don't expect/require admin privs to install an
>> executable in one's own ~/bin directory. This is broken.
>> It ought to be easier to get a JIT working well without
>> enabling arbitrary mprotect. This would allow a JIT to
>> partially benefit from the recent security enhancements.
>> (think of all the buggy browser-based JIT things!)
>
> I presumed an ELF note or extended filesystem attributes were already
> in place for this sort of affair. It may be that the model implemented
> is so restrictive that users are forbidden to create new executables,
> in which case using a different model is certainly in order. Otherwise
> the ELF note or attributes need to be implemented.


Users can create executables. Some will be non-functional
unless specially marked by an admin.

What is the goal here? I see no reasonable goal that would
result in such a policy.


> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> This sort of logic might be appropriate for a sort of parametrized
>>> and specialized vma allocator setting the policy in /proc/ along
>>> with various sorts of limits. There are limits to such and at some
>>> point things will have to manually manage their own process address
>>> spaces in a platform-specific fashion. If kernel assistance here is
>>> rejected they may have to do so in all cases.
>
> On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
>> I prefer ELF notes (for start-up allocations) and prctl,
>> plus a mmap flag for per-allocation behavior.
>
> Beware that the kernel (upstream of me) will likely refuse to support
> exotic mmap() placement policies. At that point userspace will have
> to implement them itself with a front-end to mmap().
>
> Userspace can actually live without kernel placement support for
> everything but the executable itself, which is already implemented via
> ELF loading standards. This is not to downplay the tremendous amounts
> of pain involved for moving the stack, getting ld.so to land in the
> right place, and so on. Actually I'm less sure about .interp placement.
> In any event, exotic virtualspace allocation policies are largely yet
> another "simple matter of programming" implementable entirely in
> userspace.


When you go that route, you may need to abandon libc. I've done exactly
that for one emulator. It was not easy. Nearly nobody will want to go
down that path.

Things improve a bit if MAP_ANONYMOUS and SysV shared mem allocations
can be made to ignore the available memory checking. If I could allocate
a 2 GB chunk on a system with 1 GB total swap+RAM, then I could use
that as an area in which to perform MAP_FIXED allocations. As of now
this would require either adding the swap space or disabling the
available memory checking system-wide via sysctl.
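A possible way to get the large reservation without adding swap or touching the overcommit sysctl, assuming current Linux accounting behavior: map the region PROT_NONE, which as far as I know is not charged against the commit limit, then replace pieces of it with MAP_FIXED read/write mappings, which are charged only as they are created. `reserve_region()` and `commit_at()` are hypothetical helpers; offsets must be page-aligned.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve size bytes of address space without backing memory.
 * A private PROT_NONE mapping is not counted against the commit limit. */
void *reserve_region(size_t size)
{
    void *base = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Carve a usable read/write piece out of the reservation at a fixed,
 * page-aligned offset. MAP_FIXED is safe here only because the caller
 * owns the whole reserved range. */
void *commit_at(void *base, size_t offset, size_t len)
{
    void *want = (char *)base + offset;
    void *got = mmap(want, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return got == MAP_FAILED ? NULL : got;
}
```

Because the reservation owns the whole range, MAP_FIXED cannot clobber an unrelated mapping such as the heap or a shared library.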


> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> This is a bad idea. The standard semantics are needed for programs
>>> relying upon them.
>
> On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
>> I didn't mean that the default default :-) setting would change.
>> I meant that people could change the behavior from a boot script.
>> Things that break are really foul and nasty anyway, probably with
>> serious problems that ought to get fixed.
>
> It's actually not a good idea to make it the default even via sysctl.
> People won't realize something will break until it does, and what will
> break is likely to be a database responsible for data integrity. The
> IPC_RMID creation flag should suffice.


It's highly unlikely that such breakage would cause corruption.
Most likely it would cause the database to exit with an error
about failing to attach to a SysV shared memory segment.

I believe that a major cause of reboots is that admins are
unaware of SysV shared memory cruft left behind by apps that
crashed at the wrong moment or had other bugs. If something
is eating memory and you don't know what it is, you reboot.
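For comparison, a sketch of the SysV route: attach one segment twice, once writable and once with the Linux-specific `SHM_EXEC` shmat() flag (presumably the undocumented flag mentioned earlier), then mark it IPC_RMID immediately so the kernel destroys it when the last attachment goes away, even after a crash. Attaching after IPC_RMID is not portable, so both shmat() calls come first; the short window before shmctl() remains. `shm_dual_create()` is a hypothetical helper, and the segment is still subject to the SHMMAX/SHMALL size limits.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_EXEC
#define SHM_EXEC 0100000  /* value from the Linux UAPI <linux/shm.h> */
#endif

struct shm_dual {
    unsigned char *rw;  /* writable attachment */
    unsigned char *rx;  /* read+execute attachment of the same segment */
};

/* Returns 0 on success, -1 on failure. */
int shm_dual_create(struct shm_dual *d, size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    void *rw = shmat(id, NULL, 0);
    void *rx = shmat(id, NULL, SHM_RDONLY | SHM_EXEC);

    /* Mark for destruction right away: the segment now lives only as
     * long as it stays attached, so no cruft survives a crash. */
    shmctl(id, IPC_RMID, NULL);

    if (rw == (void *)-1 || rx == (void *)-1) {
        if (rw != (void *)-1)
            shmdt(rw);
        if (rx != (void *)-1)
            shmdt(rx);
        return -1;
    }
    d->rw = rw;
    d->rx = rx;
    return 0;
}

void shm_dual_destroy(struct shm_dual *d)
{
    shmdt(d->rw);
    shmdt(d->rx);
}
```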


> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> This is MADV_REMOVE, though most filesystems don't support it. Do you
>>> need it for more than tmpfs?
>
> On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
>> Yes and no. It's painful to be restricted to one backing store.
>> Covering 

Re: JIT emulator needs

2007-06-20 Thread Albert Cahalan

On 6/20/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:

> William Lee Irwin III wrote:
>> I presumed an ELF note or extended filesystem attributes were already
>> in place for this sort of affair. It may be that the model implemented
>> is so restrictive that users are forbidden to create new executables,
>> in which case using a different model is certainly in order. Otherwise
>> the ELF note or attributes need to be implemented.
>
> Another thing to keep in mind, since we're talking about security
> policies in the first place, is that anything like this *MUST* be
> "opt-in" on the part of the security policy, because what we're talking
> about is circumventing an explicit security policy just based on a
> user-provided binary saying, in effect, "don't worry, I know what I'm
> doing."
>
> Changing the meaning of an established explicit security policy is not
> acceptable.


Not in this case. If an attacker can CHANGE THE BINARY then
it's already game over.

Putting this into the security policy was an error born of
laziness to begin with. Abuse of the security mechanism
was easier than hacking the toolchain, ELF loader, etc.

Either a binary needs self-modification, or it doesn't. This is
determined by the author of the code. If you don't trust an
executable that needs this ability, then you simply can not
run it in a useful way.


Re: JIT emulator needs

2007-06-20 Thread H. Peter Anvin
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>>> I presumed an ELF note or extended filesystem attributes were already
>>> in place for this sort of affair. It may be that the model implemented
>>> is so restrictive that users are forbidden to create new executables,
>>> in which case using a different model is certainly in order. Otherwise
>>> the ELF note or attributes need to be implemented.
> 
> On Wed, Jun 20, 2007 at 09:37:31AM -0700, H. Peter Anvin wrote:
>> Another thing to keep in mind, since we're talking about security
>> policies in the first place, is that anything like this *MUST* be
>> "opt-in" on the part of the security policy, because what we're talking
>> about is circumventing an explicit security policy just based on a
>> user-provided binary saying, in effect, "don't worry, I know what I'm
>> doing."
>> Changing the meaning of an established explicit security policy is not
>> acceptable.
> 
> This is what I had in mind with the commentary on the intentions of the
> policy. Thank you for correcting my hamhanded attempt to describe it.
> 

Right.  It's important to notice that it's actually more of an issue if
the user can create executables, but the policy doesn't want to allow
them to run bypassing the policy.

-hpa


Re: JIT emulator needs

2007-06-20 Thread William Lee Irwin III
William Lee Irwin III wrote:
>> I presumed an ELF note or extended filesystem attributes were already
>> in place for this sort of affair. It may be that the model implemented
>> is so restrictive that users are forbidden to create new executables,
>> in which case using a different model is certainly in order. Otherwise
>> the ELF note or attributes need to be implemented.

On Wed, Jun 20, 2007 at 09:37:31AM -0700, H. Peter Anvin wrote:
> Another thing to keep in mind, since we're talking about security
> policies in the first place, is that anything like this *MUST* be
> "opt-in" on the part of the security policy, because what we're talking
> about is circumventing an explicit security policy just based on a
> user-provided binary saying, in effect, "don't worry, I know what I'm
> doing."
> Changing the meaning of an established explicit security policy is not
> acceptable.

This is what I had in mind with the commentary on the intentions of the
policy. Thank you for correcting my hamhanded attempt to describe it.


-- wli


Re: JIT emulator needs

2007-06-20 Thread H. Peter Anvin
William Lee Irwin III wrote:
> 
> I presumed an ELF note or extended filesystem attributes were already
> in place for this sort of affair. It may be that the model implemented
> is so restrictive that users are forbidden to create new executables,
> in which case using a different model is certainly in order. Otherwise
> the ELF note or attributes need to be implemented.
> 

Another thing to keep in mind, since we're talking about security
policies in the first place, is that anything like this *MUST* be
"opt-in" on the part of the security policy, because what we're talking
about is circumventing an explicit security policy just based on a
user-provided binary saying, in effect, "don't worry, I know what I'm
doing."

Changing the meaning of an established explicit security policy is not
acceptable.

-hpa


Re: JIT emulator needs

2007-06-20 Thread William Lee Irwin III
On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> If the policy forbidding self-modifying code lacks a method of
>> exempting programs such as JIT interpreters (which I doubt) then
>> it's a problem. I'm with Alan on this one.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> It does and it doesn't. There is not a reasonable way for a
> user to mark an app as needing full self-modifying ability.
> It's not like the executable stack, which can be set via the
> ELF note markings on the executable. (ELF note markings are
> ideal because they can not be used via a ret-to-libc attack)
> With admin privs, one can change SE Linux settings. Mark the
> executable, disable the protection system-wide, generate a
> completely new SE Linux policy, or just turn SE Linux off.
> Normally we don't expect/require admin privs to install an
> executable in one's own ~/bin directory. This is broken.
> It ought to be easier to get a JIT working well without
> enabling arbitrary mprotect. This would allow a JIT to
> partially benefit from the recent security enhancements.
> (think of all the buggy browser-based JIT things!)

I presumed an ELF note or extended filesystem attributes were already
in place for this sort of affair. It may be that the model implemented
is so restrictive that users are forbidden to create new executables,
in which case using a different model is certainly in order. Otherwise
the ELF note or attributes need to be implemented.
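The existing per-binary marking Albert contrasts this with, the executable-stack flag, is carried in the PT_GNU_STACK program header, which the GNU linker derives from the `.note.GNU-stack` sections of the input objects. A sketch of reading it for the running program with glibc's dl_iterate_phdr(), which visits the main program first; `main_program_wants_exec_stack()` is a hypothetical name. An analogous "needs W+X" marking would be a new header or note, read the same way.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>

/* dl_iterate_phdr() callback: inspect only the first object, which
 * glibc reports as the main program. */
static int check_stack(struct dl_phdr_info *info, size_t size, void *data)
{
    int *result = data;
    (void)size;
    *result = -1;  /* no PT_GNU_STACK header found */
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_STACK) {
            *result = (info->dlpi_phdr[i].p_flags & PF_X) ? 1 : 0;
            break;
        }
    }
    return 1;  /* non-zero stops the iteration after the main program */
}

/* Returns 1 if the executable asks for an executable stack, 0 if it asks
 * for a non-executable one, -1 if it carries no marking at all. */
int main_program_wants_exec_stack(void)
{
    int result = -1;
    dl_iterate_phdr(check_stack, &result);
    return result;
}
```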


On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> This sort of logic might be appropriate for a sort of parametrized
>> and specialized vma allocator setting the policy in /proc/ along
>> with various sorts of limits. There are limits to such and at some
>> point things will have to manually manage their own process address
>> spaces in a platform-specific fashion. If kernel assistance here is
>> rejected they may have to do so in all cases.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> I prefer ELF notes (for start-up allocations) and prctl,
> plus a mmap flag for per-allocation behavior.

Beware that the kernel (upstream of me) will likely refuse to support
exotic mmap() placement policies. At that point userspace will have
to implement them itself with a front-end to mmap().

Userspace can actually live without kernel placement support for
everything but the executable itself, which is already implemented via
ELF loading standards. This is not to downplay the tremendous amounts
of pain involved for moving the stack, getting ld.so to land in the
right place, and so on. Actually I'm less sure about .interp placement.
In any event, exotic virtualspace allocation policies are largely yet
another "simple matter of programming" implementable entirely in
userspace.


On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> This is a bad idea. The standard semantics are needed for programs
>> relying upon them.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> I didn't mean that the default default :-) setting would change.
> I meant that people could change the behavior from a boot script.
> Things that break are really foul and nasty anyway, probably with
> serious problems that ought to get fixed.

It's actually not a good idea to make it the default even via sysctl.
People won't realize something will break until it does, and what will
break is likely to be a database responsible for data integrity. The
IPC_RMID creation flag should suffice.


On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> You probably want a tmpfile(3) -like affair which never has a pathname
>> to begin with. It could be useful for security purposes more generally.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> Yes, exactly. I think there are some possible optimizations
> available too, particularly with the cifs filesystem.

I doubt this will be controversial, but it's not clear to me that there
is any convenient way to obtain an anonymous inode on anything but tmpfs,
in which case it's not really anonymous, but not visible to userspace on
account of the default kern_mount(). Essentially it's possible to hoist
the tmpfile name generation in-kernel to where it's in a disconnected
namespace not visible to any userspace whatsoever, and kernel threads
can cooperatively ensure safety via access discipline. Alternatively,
one could kern_mount() a fresh tmpfs filesystem for some concurrency
domain, e.g. per-uid, per-process, or per-thread.
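Linux later gained an interface along these lines: open(dir, O_TMPFILE | O_RDWR, mode), available since Linux 3.11 and only on filesystems that implement it, returns an fd for an inode that never had a name. A sketch that uses it when it works and otherwise falls back to mkstemp() followed by an immediate unlink(); `open_anonymous_file()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return an fd for a file on the filesystem backing dir that has no
 * pathname (or loses it immediately); -1 on failure. */
int open_anonymous_file(const char *dir)
{
#ifdef O_TMPFILE
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;
    /* Older kernel or a filesystem without O_TMPFILE support: fall back. */
#endif
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/anon-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int tfd = mkstemp(path);
    if (tfd >= 0)
        unlink(path);  /* the name exists only briefly */
    return tfd;
}
```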


On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> This sounds vaguely like another syscall, like mdup(). This is
>> particularly meaningful in the context of anonymous memory, for
>> which there is no method of replicating mappings within a single
>> process address space.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> Yes, mdup() and probably mdup2(). It could be mremap flags or not.
> JIT emulators generally need a second mapping so that they can
> have both read/write and execute for 
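For shared mappings, Linux already has something close to the mdup() idea: mremap() with `old_size` set to 0 creates a second mapping of the same pages. mremap(2) documents this only for shareable mappings, so it does not help with private anonymous memory. A sketch that builds a writable/executable pair from MAP_SHARED|MAP_ANONYMOUS memory; `jit_alias_create()` is a hypothetical helper, and I would expect an execmem-denying SELinux policy to refuse the executable alias as well.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

struct jit_alias {
    unsigned char *rw;  /* writable view */
    unsigned char *rx;  /* read+execute view of the same pages */
    size_t size;
};

/* size should be a multiple of the page size. Returns 0 on success. */
int jit_alias_create(struct jit_alias *a, size_t size)
{
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (rw == MAP_FAILED)
        return -1;

    /* old_size == 0 on a shared mapping: duplicate it instead of moving. */
    void *alias = mremap(rw, 0, size, MREMAP_MAYMOVE);
    if (alias == MAP_FAILED) {
        munmap(rw, size);
        return -1;
    }

    /* The alias inherits read/write; drop write and add exec. */
    if (mprotect(alias, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(alias, size);
        munmap(rw, size);
        return -1;
    }
    a->rw = rw;
    a->rx = alias;
    a->size = size;
    return 0;
}

void jit_alias_destroy(struct jit_alias *a)
{
    munmap(a->rw, a->size);
    munmap(a->rx, a->size);
}
```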

Re: JIT emulator needs

2007-06-20 Thread Albert Cahalan

On 6/20/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> If the policy forbidding self-modifying code lacks a method of
>>> exempting programs such as JIT interpreters (which I doubt) then
>>> it's a problem. I'm with Alan on this one.
>
> On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
>> It does and it doesn't. There is not a reasonable way for a
>> user to mark an app as needing full self-modifying ability.
>> It's not like the executable stack, which can be set via the
>> ELF note markings on the executable. (ELF note markings are
>> ideal because they can not be used via a ret-to-libc attack)
>> With admin privs, one can change SE Linux settings. Mark the
>> executable, disable the protection system-wide, generate a
>> completely new SE Linux policy, or just turn SE Linux off.
>> Normally we don't expect/require admin privs to install an
>> executable in one's own ~/bin directory. This is broken.
>> It ought to be easier to get a JIT working well without
>> enabling arbitrary mprotect. This would allow a JIT to
>> partially benefit from the recent security enhancements.
>> (think of all the buggy browser-based JIT things!)
>
> I presumed an ELF note or extended filesystem attributes were already
> in place for this sort of affair. It may be that the model implemented
> is so restrictive that users are forbidden to create new executables,
> in which case using a different model is certainly in order. Otherwise
> the ELF note or attributes need to be implemented.


Users can create executables. Some will be non-functional
unless specially marked by an admin.

What is the goal here? I see no reasonable goal that would
result in such a policy.


On 6/19/07, William Lee Irwin III [EMAIL PROTECTED] wrote:

This sort of logic might be appropriate for a sort of parametrized
and specialized vma allocator setting the policy in /proc/ along
with various sorts of limits. There are limits to such and at some
point things will have to manually manage their own process address
spaces in a platform-specific fashion. If kernel assistance here is
rejected they may have to do so in all cases.


On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:

I prefer ELF notes (for start-up allocations) and prctl,
plus a mmap flag for per-allocation behavior.


Beware that the kernel (upstream of me) will likely refuse to support
to exotic mmap() placement policies. At that point userspace will have
to implement them itself with a front-end to mmap().

Userspace can actually live without kernel placement support for
everything but the executable itself, which is already implemented via
ELF loading standards. This is not to downplay the tremendous amounts
of pain involved for moving the stack, getting ld.so to land in the
right place, and so on. Actually I'm less sure about .interp placement.
In any event, exotic virtualspace allocation policies are largely yet
another simple matter of programming implementable entirely in
userspace.


When you go that route, you may need to abandon libc. I've done exactly
that for one emulator. It was not easy. Nearly nobody will want to go
down that path.

Things improve a bit if MAP_ANONYMOUS and SysV shared mem allocations
can be made to ignore the available memory checking. If I could allocate
a 2 GB chunk on a system with 1 GB total swap+RAM, then I could use
that as an area in which to perform MAP_FIXED allocations. As of now
this would require either adding the swap space or disabling the
available memory checking system-wide via sysctl.
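A minimal sketch of the reserve-then-MAP_FIXED idea, assuming current Linux behaviour: a private PROT_NONE anonymous mapping is not charged against the overcommit limit (only writable private mappings are), so a large window can be reserved up front and pieces of it committed later with MAP_FIXED. MAP_NORESERVE is added too, although it is ignored when vm.overcommit_memory is 2. The helper names arena_reserve() and arena_map() are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve len bytes of address space without committing memory:
 * PROT_NONE private anonymous mappings are not overcommit-accounted. */
static void *arena_reserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Commit a read/write piece at base+off. MAP_FIXED is only safe here
 * because the caller owns the whole reserved range. */
static void *arena_map(void *base, size_t arena_len, size_t off, size_t len)
{
    if (off > arena_len || len > arena_len - off)
        return NULL;                /* would escape the reservation */
    void *p = mmap((char *)base + off, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Only the committed pieces count against available memory, so reserving, say, 2 GB on a smaller machine should not require extra swap.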


> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> This is a bad idea. The standard semantics are needed for programs
>>> relying upon them.
>
> On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
>> I didn't mean that the default default :-) setting would change.
>> I meant that people could change the behavior from a boot script.
>> Things that break are really foul and nasty anyway, probably with
>> serious problems that ought to get fixed.
>
> It's actually not a good idea to make it the default even via sysctl.
> People won't realize something will break until it does, and what will
> break is likely to be a database responsible for data integrity. The
> IPC_RMID creation flag should suffice.

It's highly unlikely that such breakage would cause corruption.
Most likely it would cause the database to exit with an error
about failing to attach to a SysV shared memory segment.

I believe that a major cause of reboots is that admins are
unaware of SysV shared memory cruft left behind by apps that
crashed at the wrong moment or had other bugs. If something
is eating memory and you don't know what it is, you reboot.

> On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>>> This is MADV_REMOVE, though most filesystems don't support it. Do you
>>> need it for more than tmpfs?
>
> On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
>> Yes and no. It's painful to be restricted to one backing store.
>> Covering MAP_ANONYMOUS and 

Re: JIT emulator needs

2007-06-20 Thread H. Peter Anvin
Albert Cahalan wrote:
> Putting this into the security policy was an error born of
> laziness to begin with. Abuse of the security mechanism
> was easier than hacking the toolchain, ELF loader, etc.
>
> Either a binary needs self-modification, or it doesn't. This is
> determined by the author of the code. If you don't trust an
> executable that needs this ability, then you simply can not
> run it in a useful way.

That's fine.  That's a policy decision.  That's what a security policy
*is*.  The owner of the system has decided, by security policy, that
that is not allowed.  Bypassing that is not acceptable.

-hpa


Re: JIT emulator needs

2007-06-20 Thread Albert Cahalan

On 6/20/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
> Albert Cahalan wrote:
>> Putting this into the security policy was an error born of
>> laziness to begin with. Abuse of the security mechanism
>> was easier than hacking the toolchain, ELF loader, etc.
>>
>> Either a binary needs self-modification, or it doesn't. This is
>> determined by the author of the code. If you don't trust an
>> executable that needs this ability, then you simply can not
>> run it in a useful way.
>
> That's fine.  That's a policy decision.  That's what a security policy
> *is*.  The owner of the system has decided, by security policy, that
> that is not allowed.  Bypassing that is not acceptable.


Fixing a bug should be acceptable.

Look, let's back up a bit here. At a high level, what exactly do
you imagine that this behavior was intended for? I suggest you
list some examples of the attacks that are blocked.

Can you come up with a reasonable argument that the current behavior
is the least painful restriction required to block those attacks?
Does the current behavior block any attack that the proposed behavior
would not? (list the attacks please)


Re: JIT emulator needs

2007-06-20 Thread H. Peter Anvin
Albert Cahalan wrote:
>> That's fine.  That's a policy decision.  That's what a security policy
>> *is*.  The owner of the system has decided, by security policy, that
>> that is not allowed.  Bypassing that is not acceptable.
>
> Fixing a bug should be acceptable.

That's not what you're trying to do, though.  You're trying to change
the behaviour underneath the security policy.  If there is a bug, it's
in the security policy and that's where it needs to be changed.

> Look, let's back up a bit here. At a high level, what exactly do
> you imagine that this behavior was intended for? I suggest you
> list some examples of the attacks that are blocked.
>
> Can you come up with a reasonable argument that the current behavior
> is the least painful restriction required to block those attacks?
> Does the current behavior block any attack that the proposed behavior
> would not? (list the attacks please)

See above.

-hpa


Re: JIT emulator needs

2007-06-19 Thread Albert Cahalan

On 6/19/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> Right now, Linux isn't all that friendly to JIT emulators.
>> Here are the problems and suggestions to improve the situation.
>> There is an SE Linux execmem restriction that enforces W^X.
>> Assuming you don't wish to just disable SE Linux, there are
>> two ugly ways around the problem. You can mmap a file twice,
>> or you can abuse SysV shared memory. The mmap method requires
>> that you know of a filesystem mounted rw,exec where you can
>> write a very large temporary file. This arbitrary filesystem,
>> rather than swap space, will be the backing store. The SysV
>> shared memory method requires an undocumented flag and is
>> subject to some annoying size limits. Both methods create
>> objects that will fail to be deleted if the program dies
>> before marking the objects for deletion.
>
> If the policy forbidding self-modifying code lacks a method of
> exempting programs such as JIT interpreters (which I doubt) then
> it's a problem. I'm with Alan on this one.


It does and it doesn't. There is not a reasonable way for a
user to mark an app as needing full self-modifying ability.
It's not like the executable stack, which can be set via the
ELF note markings on the executable. (ELF note markings are
ideal because they can not be used via a ret-to-libc attack)

With admin privs, one can change SE Linux settings. Mark the
executable, disable the protection system-wide, generate a
completely new SE Linux policy, or just turn SE Linux off.

Normally we don't expect/require admin privs to install an
executable in one's own ~/bin directory. This is broken.

It ought to be easier to get a JIT working well without
enabling arbitrary mprotect. This would allow a JIT to
partially benefit from the recent security enhancements.
(think of all the buggy browser-based JIT things!)


> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> Processors often have annoying limits on the immediate values
>> in instructions. An x86 or x86_64 JIT can go a bit faster if
>> all allocations are kept to the low 2 GB of address space.
>> There are also reasons for a 32bit-to-x86_64 JIT to choose
>> a nearly arbitrary 2 GB region that lies above 4 GB.
>> Other archs have other limits, such as 32 MB or 256 MB.
>
> This sort of logic might be appropriate for a sort of parametrized
> and specialized vma allocator setting the policy in /proc/ along
> with various sorts of limits. There are limits to such and at some
> point things will have to manually manage their own process address
> spaces in a platform-specific fashion. If kernel assistance here is
> rejected they may have to do so in all cases.

I prefer ELF notes (for start-up allocations) and prctl,
plus a mmap flag for per-allocation behavior.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> Additions to better support JIT emulators:
>> a. sysctl to set IPC_RMID by default
>
> This is a bad idea. The standard semantics are needed for programs
> relying upon them.

I didn't mean that the default default :-) setting would change.
I meant that people could change the behavior from a boot script.
Things that break are really foul and nasty anyway, probably with
serious problems that ought to get fixed.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> c. open() flag to unlink a file before returning the fd
>
> You probably want a tmpfile(3) -like affair which never has a pathname
> to begin with. It could be useful for security purposes more generally.

Yes, exactly. I think there are some possible optimizations
available too, particularly with the cifs filesystem.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> d. mremap() flag to always keep the old mapping
>
> This sounds vaguely like another syscall, like mdup(). This is
> particularly meaningful in the context of anonymous memory, for
> which there is no method of replicating mappings within a single
> process address space.

Yes, mdup() and probably mdup2(). It could be mremap flags or not.

JIT emulators generally need a second mapping so that they can
have both read/write and execute for the same physical memory.

It is somewhat tolerable to have SE Linux enforce that the second
mapping be randomized. (it helps security greatly, but slows the
emulator by a tiny bit)

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> e. mremap() flag to get a read/write mapping of a read/exec one
>> f. mremap() flag to get a read/exec mapping of a read/write one
>
> Presumably to be used in conjunction with keeping the old mapping.
> A composite mdup()/mremap() and mprotect(), presumably saving a TLB
> flush or other sorts of overhead, may make some sort of sense here.
> Odds are it'll get rejected as the sequence of syscalls is a rather
> precise equivalent, though it would optimize things (as would other
> composite syscalls, e.g. ones combining fork() and execve() etc.).

A few mremap flags ought to do the job I think.

> On Fri, 

Re: JIT emulator needs

2007-06-19 Thread William Lee Irwin III
On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Right now, Linux isn't all that friendly to JIT emulators.
> Here are the problems and suggestions to improve the situation.
> There is an SE Linux execmem restriction that enforces W^X.
> Assuming you don't wish to just disable SE Linux, there are
> two ugly ways around the problem. You can mmap a file twice,
> or you can abuse SysV shared memory. The mmap method requires
> that you know of a filesystem mounted rw,exec where you can
> write a very large temporary file. This arbitrary filesystem,
> rather than swap space, will be the backing store. The SysV
> shared memory method requires an undocumented flag and is
> subject to some annoying size limits. Both methods create
> objects that will fail to be deleted if the program dies
> before marking the objects for deletion.

If the policy forbidding self-modifying code lacks a method of
exempting programs such as JIT interpreters (which I doubt) then
it's a problem. I'm with Alan on this one.
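The "mmap a file twice" workaround Albert describes can be sketched as follows, assuming dir names a directory on a filesystem mounted rw,exec. The file is unlinked right after creation, which narrows (but does not close) the window in which a crash leaves debris behind. file_dual_map() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Back size bytes with an unlinked temp file in dir and map it twice:
 * read/write at *rw and read/exec at *rx. Returns 0 on success. */
static int file_dual_map(const char *dir, size_t size, char **rw, char **rx)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/jit-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* no name left to clean up after a crash */
    char *a = MAP_FAILED, *b = MAP_FAILED;
    if (ftruncate(fd, (off_t)size) == 0) {
        a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        b = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    }
    close(fd);                      /* the mappings keep the inode alive */
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, size);
        if (b != MAP_FAILED)
            munmap(b, size);
        return -1;
    }
    *rw = a;
    *rx = b;
    return 0;
}
```

The filesystem holding dir, not swap, ends up as the backing store, which is the drawback noted above.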


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Processors often have annoying limits on the immediate values
> in instructions. An x86 or x86_64 JIT can go a bit faster if
> all allocations are kept to the low 2 GB of address space.
> There are also reasons for a 32bit-to-x86_64 JIT to choose
> a nearly arbitrary 2 GB region that lies above 4 GB.
> Other archs have other limits, such as 32 MB or 256 MB.

This sort of logic might be appropriate for a sort of parametrized
and specialized vma allocator setting the policy in /proc/ along
with various sorts of limits. There are limits to such and at some
point things will have to manually manage their own process address
spaces in a platform-specific fashion. If kernel assistance here is
rejected they may have to do so in all cases.
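Without kernel help, a userspace front-end to mmap() can approximate a range-constrained allocator by passing hint addresses and rejecting results that land outside the window, since a hint without MAP_FIXED is only advisory. A sketch of that approach; mmap_in_range() and the 256-page probe stride are illustrative choices, not an existing interface. On x86-64 specifically, Linux also offers MAP_32BIT, which places a mapping in the first 2 GB of the address space.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try to place an anonymous mapping of len bytes entirely inside [lo, hi).
 * Every result is checked, because the kernel may ignore a hint.
 * Returns NULL if no probed hint was honoured. */
static void *mmap_in_range(uintptr_t lo, uintptr_t hi, size_t len, int prot)
{
    uintptr_t stride = (uintptr_t)sysconf(_SC_PAGESIZE) * 256;
    for (uintptr_t hint = lo; hint < hi && hi - hint >= len; ) {
        void *p = mmap((void *)hint, len, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        uintptr_t a = (uintptr_t)p;
        if (a >= lo && a < hi && hi - a >= len)
            return p;               /* landed inside the window */
        munmap(p, len);             /* placed elsewhere: try the next hint */
        if (hi - hint <= stride)
            break;                  /* next hint would pass hi */
        hint += stride;
    }
    return NULL;
}
```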


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Sometimes it is very helpful to have the read/write mapping
> be a fixed offset from the read/exec mapping. A power of 2
> can be especially desirable.

As far as the kernel is concerned they're unrelated, so this will
likely need MAP_FIXED barring a staggering array of fresh system
calls to act on tuples of memory ranges in lockstep.
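One way to get a fixed RW-to-RX offset with plain MAP_FIXED is to reserve the whole span first, so that MAP_FIXED can only overwrite address space the caller already owns. A sketch under stated assumptions: the backing object comes from memfd_create(), which appeared much later (Linux 3.17, glibc 2.27) and stands in here for the unlinked temporary file discussed in this thread; map_rx_rw_at_offset() is a hypothetical name, and off must be a page-size multiple no smaller than size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map size bytes of one object twice: read/exec at *rx and read/write at
 * *rx + off. Returns 0 on success. */
static int map_rx_rw_at_offset(size_t size, size_t off, char **rx, char **rw)
{
    if (off < size)
        return -1;                            /* the views would overlap */
    int fd = memfd_create("jit-code", 0);     /* assumed backing object */
    if (fd < 0)
        return -1;
    char *base = MAP_FAILED;
    if (ftruncate(fd, (off_t)size) == 0) {
        /* PROT_NONE placeholder for the whole span. */
        base = mmap(NULL, off + size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    }
    if (base != MAP_FAILED &&
        (mmap(base, size, PROT_READ | PROT_EXEC,
              MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
         mmap(base + off, size, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED)) {
        munmap(base, off + size);
        base = MAP_FAILED;
    }
    close(fd);                                /* mappings keep it alive */
    if (base == MAP_FAILED)
        return -1;
    *rx = base;
    *rw = base + off;
    return 0;
}
```

With a power-of-two off, generated code can find the writable alias of any code address by adding a constant.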


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Emulators often need a cheap way to change page permissions.
> One VMA per page is no good. Besides taking up space and making
> many things generally slower, having one VMA per page causes
> a huge performance loss for snapshot roll-back operations.
> Just tearing down all those VMAs takes a good while.

remap_file_pages_prot() is reputedly waiting in the wings somewhere
for this.


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Additions to better support JIT emulators:
> a. sysctl to set IPC_RMID by default

This is a bad idea. The standard semantics are needed for programs
relying upon them.


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> b. shmget() flag to set IPC_RMID by default

This is relatively innocuous.
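The SysV workaround can be sketched as follows, assuming the "undocumented flag" mentioned earlier is SHM_EXEC (now listed in the Linux shmat(2) man page; the fallback #define uses the Linux value and only matters for headers that lack it). The segment is created with mode 0700 because Linux checks execute permission for SHM_EXEC attachments. IPC_RMID is issued right after attaching, so the segment goes away with its last attachment even if the process crashes later; a crash between shmget() and shmctl() can still leak it, which is the gap the proposed flag would close. shm_dual_attach() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_EXEC
#define SHM_EXEC 0100000   /* Linux value, assumed for headers lacking it */
#endif

/* Attach one private segment twice: read/write at *rw, read/exec at *rx. */
static int shm_dual_attach(size_t size, char **rw, char **rx)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0700);
    if (id < 0)
        return -1;
    void *a = shmat(id, NULL, 0);                      /* read/write view */
    void *b = shmat(id, NULL, SHM_RDONLY | SHM_EXEC);  /* read/exec view */
    shmctl(id, IPC_RMID, NULL);   /* destroyed once both views are detached */
    if (a == (void *)-1 || b == (void *)-1) {
        if (a != (void *)-1)
            shmdt(a);
        if (b != (void *)-1)
            shmdt(b);
        return -1;
    }
    *rw = a;
    *rx = b;
    return 0;
}
```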


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> c. open() flag to unlink a file before returning the fd

You probably want a tmpfile(3) -like affair which never has a pathname
to begin with. It could be useful for security purposes more generally.
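Linux later gained essentially this as O_TMPFILE (Linux 3.11), which creates an inode on the filesystem holding a given directory without ever giving it a name. A sketch that prefers O_TMPFILE and falls back to the create-then-unlink sequence when the kernel or filesystem rejects it; open_anon_file() and is_unnamed() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open an anonymous read/write file on the filesystem holding dir. */
static int open_anon_file(const char *dir)
{
#ifdef O_TMPFILE
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);   /* never has a name */
    if (fd >= 0)
        return fd;
#endif
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/jit-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    int tfd = mkstemp(path);
    if (tfd >= 0)
        unlink(path);   /* named only between mkstemp() and unlink() */
    return tfd;
}

/* True if no directory entry refers to the file any more. */
static int is_unnamed(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 && st.st_nlink == 0;
}
```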


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> d. mremap() flag to always keep the old mapping

This sounds vaguely like another syscall, like mdup(). This is
particularly meaningful in the context of anonymous memory, for
which there is no method of replicating mappings within a single
process address space.
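For shared anonymous memory, something close to mdup() is reachable through mremap(): the Linux mremap(2) man page documents that an old_size of zero on a MAP_SHARED mapping creates a new mapping of the same pages while leaving the old one in place (recent kernels reject this for private mappings). A sketch that uses it to get a read/write and a read/exec view of the same anonymous pages; anon_dual_view() is a hypothetical name, and the follow-up mprotect() is exactly the kind of transition an execmem-style policy may deny.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Two views of the same anonymous pages: read/write at *rw, read/exec at *rx. */
static int anon_dual_view(size_t len, char **rw, char **rx)
{
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return -1;
    char *b = mremap(a, 0, len, MREMAP_MAYMOVE);   /* duplicate, keep a */
    if (b == MAP_FAILED) {
        munmap(a, len);
        return -1;
    }
    if (mprotect(b, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(b, len);
        munmap(a, len);
        return -1;
    }
    *rw = a;
    *rx = b;
    return 0;
}
```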


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> e. mremap() flag to get a read/write mapping of a read/exec one
> f. mremap() flag to get a read/exec mapping of a read/write one

Presumably to be used in conjunction with keeping the old mapping.
A composite mdup()/mremap() and mprotect(), presumably saving a TLB
flush or other sorts of overhead, may make some sort of sense here.
Odds are it'll get rejected as the sequence of syscalls is a rather
precise equivalent, though it would optimize things (as would other
composite syscalls, e.g. ones combining fork() and execve() etc.).


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> g. mremap() flag to make the 5th arg (new addr) be the upper limit
> h. 6-bit wide mremap() "flag" to set the upper limit above given base

Essentially more placement support for mremap()/mdup(). It's not clear
to me those particular semantics are the ideal ones. A target range
for placement should do, if not manual address space management.


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> i. support the prot argument to remap_file_pages

This is probably going to happen anyway.


On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> j. a documented way (madvise?) to punch same-VMA zero-page holes

This is MADV_REMOVE, though most filesystems don't support it. Do you
need it for more than tmpfs?

Re: JIT emulator needs

2007-06-09 Thread H. Peter Anvin
Albert Cahalan wrote:
> There is an SE Linux execmem restriction that enforces W^X.
> Assuming you don't wish to just disable SE Linux, there are
> two ugly ways around the problem.

This should be fixed in SELinux, or more accurately the SELinux profile.

There is absolutely no other sane option.

Of course, you generally don't need a page to be writable and executable
at the same time, but the overhead of switching can be enormous.
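To make that switching cost concrete, here is a sketch of the single-mapping W^X cycle: generated code is written into a read/write page, and the page is flipped to read/exec with mprotect() before being called. Each flip costs at least one system call, plus TLB invalidation whenever write permission is removed. The byte sequences are assumptions for x86-64 and AArch64 only (mov eax,42; ret and mov w0,#42; ret); run_generated_42() is a hypothetical name and returns -1 on other architectures.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#if defined(__x86_64__)
static const unsigned char code42[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov eax,42 */
                                        0xc3 };                       /* ret */
#elif defined(__aarch64__)
static const unsigned char code42[] = { 0x40, 0x05, 0x80, 0x52,       /* mov w0,#42 */
                                        0xc0, 0x03, 0x5f, 0xd6 };     /* ret */
#endif

/* Write code into a RW page, flip it to RX, call it, return its result. */
static int run_generated_42(void)
{
#if defined(__x86_64__) || defined(__aarch64__)
    size_t len = 4096;              /* mmap rounds up to the page size */
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, code42, sizeof code42);                   /* write phase */
    if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) { /* W -> X switch */
        munmap(page, len);
        return -1;                  /* e.g. refused by an execmem policy */
    }
    /* Make the new instructions visible to instruction fetch. */
    __builtin___clear_cache((char *)page, (char *)page + sizeof code42);
    int (*fn)(void) = (int (*)(void))page;
    int result = fn();
    munmap(page, len);
    return result;
#else
    return -1;                      /* no machine code for this architecture */
#endif
}
```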

-hpa


Re: JIT emulator needs

2007-06-08 Thread Albert Cahalan

On 6/8/07, Alan Cox <[EMAIL PROTECTED]> wrote:
>> There is an SE Linux execmem restriction that enforces W^X.
>
> This depends on whatever SELinux rulesets you are running. It's just a
> good rule to have present that most programs shouldn't be self patching,
> and then label those that do differently.

A marking in the executable would have made more sense.
It is really broken having an unprivileged user being able to
create whole new executables but unable to lift this restriction
on those executables.

In any case, the restriction is common and troublesome.

>> Sometimes it is very helpful to have the read/write mapping
>> be a fixed offset from the read/exec mapping. A power of 2
>> can be especially desirable.
>
> mmap MAP_FIXED can do this but you need to know a lot about the memory
> layout of the system so it gets a bit platform specific.

Yes. There are unportable programs, and UNPORTABLE ones.
Memory layout can vary between vendor kernels, between normal
and 32-on-64 situations, between two different C libraries...

>> Emulators often need a cheap way to change page permissions.
>
> mprotect(, range) rather than a page at a time. The kernel will do
> merging.


Nope. This can happen rapidly and repeatedly to pages
that are essentially random. The median length of a range
will be a page or two. Merging won't do very much at all.
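The point about merging can be checked directly: neighbouring pages with different protections cannot share a VMA. A sketch that makes every other page of a mapping read-only and then counts, via /proc/self/maps, the VMAs overlapping it; count_vmas() and alternate_protections() are hypothetical helper names. Each such split also counts against the per-process vm.max_map_count limit.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count lines of /proc/self/maps whose range overlaps [lo, hi). */
static int count_vmas(uintptr_t lo, uintptr_t hi)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    char line[8192];
    int n = 0;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 && start < hi && end > lo)
            n++;
    }
    fclose(f);
    return n;
}

/* Map npages read/write, make every other page read-only, and return the
 * number of VMAs the region now occupies. */
static int alternate_protections(size_t npages)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, npages * pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    for (size_t i = 1; i < npages; i += 2) {
        if (mprotect(p + i * pg, pg, PROT_READ) != 0) {
            munmap(p, npages * pg);
            return -1;
        }
    }
    int n = count_vmas((uintptr_t)p, (uintptr_t)(p + npages * pg));
    munmap(p, npages * pg);
    return n;
}
```

With alternating protections, an 8-page region ends up as 8 VMAs, one per page.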


>> a. sysctl to set IPC_RMID by default
>> b. shmget() flag to set IPC_RMID by default
>
> Use POSIX shared memory

That appears to have the exact same problem.

>> c. open() flag to unlink a file before returning the fd
>
> Is it really that costly to create a blank file, why do you need to do it
> a lot in a JIT?

This part isn't about cost. It's about not leaving around
debris when the JIT crashes.

>> e. mremap() flag to get a read/write mapping of a read/exec one
>> f. mremap() flag to get a read/exec mapping of a read/write one
>> g. mremap() flag to make the 5th arg (new addr) be the upper limit
>
> This is all mprotect and munmap.

That won't get me a second mapping. Supposing that I had
a second mapping, SE Linux would deny the mprotect.
I'm looking for a mapping that is born executable or a mapping
that is born writable, as needed, so that no transition is needed.

>> h. 6-bit wide mremap() "flag" to set the upper limit above given base
>> i. support the prot argument to remap_file_pages
>> j. a documented way (madvise?) to punch same-VMA zero-page holes
>
> mmap (although you get more VMAs from that) so memset() is probably
> genuinely cheaper if the permissions are not changing.


Well, cost is the problem here. I sure can find some way to
get the operation done, but it isn't cheap. For some usages,
the current setup is costly enough that one must consider
abandoning the hardware MMU in favor of a software one
emitted as part of the JIT. :-(
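Item j. can be sketched with what Linux provides today. On a private anonymous mapping, madvise(MADV_DONTNEED) discards the pages, and later reads see zero-filled pages. The protection is left alone and the VMA is not split. This does not address the cost concern above, since it is still a system call plus TLB work. It also does not zero MAP_SHARED mappings such as file-backed JIT views, where the old contents come back. `punch_zero_hole` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <unistd.h>
#include <sys/mman.h>

/* Zero pages [first, first + count) of a PRIVATE ANONYMOUS mapping without
 * changing its protection or splitting its VMA: the pages are dropped and
 * the next access faults in zero-filled pages. Returns madvise()'s result. */
static int punch_zero_hole(unsigned char *region, size_t first, size_t count)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    return madvise(region + first * pg, count * pg, MADV_DONTNEED);
}
```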


Re: JIT emulator needs

2007-06-08 Thread Albert Cahalan

On 6/8/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:

> Albert Cahalan wrote:



>> Additions to better support JIT emulators:
>>
>> a. sysctl to set IPC_RMID by default
>
> Not very good, this will break some apps.


As a sysctl, the admin gets to choose between
compatibility and sanity.

I can see such a sysctl also being really helpful for a
shared computer used for an Operating Systems or
System Programming course.


>> b. shmget() flag to set IPC_RMID by default
>
> This is better :)


Both are good. This one requires that all apps using
SysV shared memory be modified to use the flag.
The other requires that a very few apps be modified
to tolerate a behavior change.
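Here is a sketch of the idiom both proposals are meant to make unnecessary: create the SysV segment, attach it, and immediately mark it with IPC_RMID. The name `shm_alloc_autoremove` is illustrative. The segment survives until its last detach. A crash between shmget() and shmctl() still leaves it behind, and that is the window under discussion.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create and attach a private SysV segment, then mark it for deletion at
 * once. It is destroyed after the last detach, including the implicit
 * detach at process exit. Returns the address, or NULL on failure. */
static void *shm_alloc_autoremove(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return NULL;
    /* Crash window: the segment already exists but is not yet marked. */
    void *p = shmat(id, NULL, 0);
    /* Mark for removal whether or not the attach worked. */
    shmctl(id, IPC_RMID, NULL);
    return p == (void *)-1 ? NULL : p;
}
```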


>> c. open() flag to unlink a file before returning the fd
>
> Well, I assume you would like fd = open("/path/somefile", O_RDWR | O_CREAT |
> O_UNLINK, 0644)
>
> (i.e. allocate a file handle but no name?)


Yes.


> Quite difficult to implement this atomically with current vfs, maybe a new
> syscall would be better. (Linus will kill me for that :) )
>
> (We don't need to insert "somefile" in one directory, then unlink it, we only
> need to allocate an unnamed inode to get some backing store)


I suspect that SMB/CIFS has a native call for this. There is
some sort of tmpfile flag defined over in that world.


Re: JIT emulator needs

2007-06-08 Thread Nicholas Miell
On Fri, 2007-06-08 at 12:10 +0100, Alan Cox wrote:
> > e. mremap() flag to get a read/write mapping of a read/exec one
> > f. mremap() flag to get a read/exec mapping of a read/write one
> > g. mremap() flag to make the 5th arg (new addr) be the upper limit
> 
> This is all mprotect and munmap.

I think he's asking for a way to copy an existing mapping, which does
sound genuinely useful. (i.e. mremap(ptr, size, size, MREMAP_COPY), with
no need to mess with files to get multiple mappings of the same region)

-- 
Nicholas Miell <[EMAIL PROTECTED]>
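Something close to this is documented for current Linux kernels, though only for shared mappings. Calling mremap() with an old_size of 0 creates a second mapping of the same pages, provided MREMAP_MAYMOVE is given. The alias starts with the same protection as the original. So by itself it does not give a mapping that is born with different permissions. A minimal sketch follows; `alias_shared_mapping` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Create a second view of a MAP_SHARED mapping's pages. old_size == 0 is
 * what requests the duplicate rather than a move. Returns NULL on failure. */
static void *alias_shared_mapping(void *old, size_t size)
{
    void *alias = mremap(old, 0, size, MREMAP_MAYMOVE);
    return alias == MAP_FAILED ? NULL : alias;
}
```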



Re: JIT emulator needs

2007-06-08 Thread Alan Cox
> There is an SE Linux execmem restriction that enforces W^X.

This depends on whatever SELinux rulesets you are running. It's just a
good rule to have present that most programs shouldn't be self patching,
and then label those that do differently.

> Sometimes it is very helpful to have the read/write mapping
> be a fixed offset from the read/exec mapping. A power of 2
> can be especially desirable.

mmap MAP_FIXED can do this but you need to know a lot about the memory
layout of the system so it gets a bit platform specific.

> Emulators often need a cheap way to change page permissions.

mprotect(, range) rather than a page at a time. The kernel will do
merging. 
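Here is a sketch of the range suggestion. Given a sorted list of pages that need the same protection, it issues one mprotect() per run of consecutive pages. `mprotect_runs` is an illustrative helper, not an existing API. As Albert's reply in this thread points out, when the pages are essentially random the runs are only a page or two long and this saves little.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <unistd.h>
#include <sys/mman.h>

/* `pages` holds n page indices relative to `base`, sorted ascending.
 * Returns the number of mprotect() calls made, or -1 on error. */
static int mprotect_runs(char *base, const size_t *pages, size_t n, int prot)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    int calls = 0;
    size_t i = 0;
    while (i < n) {
        /* Extend the run while the next index is exactly one page further. */
        size_t start = pages[i], len = 1;
        while (i + len < n && pages[i + len] == start + len)
            len++;
        if (mprotect(base + start * pg, len * pg, prot) != 0)
            return -1;
        calls++;
        i += len;
    }
    return calls;
}
```

For example, pages {0, 1, 2, 5, 6} take two calls, while {0, 2, 4, 6} still take four.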

> a. sysctl to set IPC_RMID by default
> b. shmget() flag to set IPC_RMID by default

Use POSIX shared memory

> c. open() flag to unlink a file before returning the fd

Is it really that costly to create a blank file? Why do you need to do it
a lot in a JIT?

> e. mremap() flag to get a read/write mapping of a read/exec one
> f. mremap() flag to get a read/exec mapping of a read/write one
> g. mremap() flag to make the 5th arg (new addr) be the upper limit

This is all mprotect and munmap.

> h. 6-bit wide mremap() "flag" to set the upper limit above given base
> i. support the prot argument to remap_file_pages
> j. a documented way (madvise?) to punch same-VMA zero-page holes

mmap (although you get more VMAs from that) so memset() is probably
genuinely cheaper if the permissions are not changing.
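Here are the two options Alan mentions for zeroing memory whose permissions are not changing, side by side. The helper names are hypothetical. The mmap() variant replaces the pages in one call but may add VMAs. memset() writes every byte but leaves the mapping layout alone.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Option 1: touch every byte; the VMA layout is unchanged. */
static void zero_with_memset(void *p, size_t len)
{
    memset(p, 0, len);
}

/* Option 2: map fresh zero-fill-on-demand pages over a page-aligned range
 * of private anonymous read/write memory. Returns 0 on success. */
static int zero_with_mmap(void *p, size_t len)
{
    void *q = mmap(p, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return q == p ? 0 : -1;
}
```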



Re: JIT emulator needs

2007-06-08 Thread Eric Dumazet

Albert Cahalan wrote:
>
> Right now, Linux isn't all that friendly to JIT emulators.
> Here are the problems and suggestions to improve the situation.
>
> There is an SE Linux execmem restriction that enforces W^X.
> Assuming you don't wish to just disable SE Linux, there are
> two ugly ways around the problem. You can mmap a file twice,
> or you can abuse SysV shared memory. The mmap method requires
> that you know of a filesystem mounted rw,exec where you can
> write a very large temporary file. This arbitrary filesystem,
> rather than swap space, will be the backing store. The SysV
> shared memory method requires an undocumented flag and is
> subject to some annoying size limits. Both methods create
> objects that will fail to be deleted if the program dies
> before marking the objects for deletion.
>
> Processors often have annoying limits on the immediate values
> in instructions. An x86 or x86_64 JIT can go a bit faster if
> all allocations are kept to the low 2 GB of address space.
> There are also reasons for a 32bit-to-x86_64 JIT to choose
> a nearly arbitrary 2 GB region that lies above 4 GB.
> Other archs have other limits, such as 32 MB or 256 MB.
>
> Sometimes it is very helpful to have the read/write mapping
> be a fixed offset from the read/exec mapping. A power of 2
> can be especially desirable.
>
> Emulators often need a cheap way to change page permissions.
> One VMA per page is no good. Besides taking up space and making
> many things generally slower, having one VMA per page causes
> a huge performance loss for snapshot roll-back operations.
> Just tearing down all those VMAs takes a good while.
>
> Additions to better support JIT emulators:
>
> a. sysctl to set IPC_RMID by default


Not very good, this will break some apps.


> b. shmget() flag to set IPC_RMID by default


This is better :)


> c. open() flag to unlink a file before returning the fd



Well, I assume you would like fd = open("/path/somefile", O_RDWR | O_CREAT | 
O_UNLINK, 0644)


(i.e. allocate a file handle but no name?)

Quite difficult to implement this atomically with current vfs, maybe a new 
syscall would be better. (Linus will kill me for that :) )


(We don't need to insert "somefile" in one directory, then unlink it, we only 
need to allocate an unnamed inode to get some backing store)


This is a generalization of anonymous inodes ( fs/anon_inodes.c  )
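On the low-2 GB point in the quoted proposal: on x86-64, Linux offers the MAP_32BIT mmap() flag, which places a mapping within the first 2 GB of the address space. Below is a sketch under that assumption; the helper name is hypothetical. The flag covers only that one case. It does not cover the 2 GB window above 4 GB or other architectures' limits.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Allocate read/write memory below 2 GB, so generated x86-64 code can reach
 * it with sign-extended 32-bit immediates. MAP_32BIT applies only to 64-bit
 * programs on x86-64; elsewhere this sketch just reports failure (NULL). */
static void *map_low_2gb(size_t size)
{
#if defined(MAP_32BIT) && defined(__x86_64__)
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)size;
    return NULL;
#endif
}
```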



