), but for socketcall-based architectures this really
should be a socketcall.
Don't you also need fconnect()? Or is that simply handled by allowing
open() without O_PATH?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf
On 08/15/2012 09:52 AM, Ben Pfaff wrote:
Stanislav Kinsbursky skinsbur...@parallels.com writes:
This system call is especially required for UNIX sockets, which have a name
length limitation.
The worst of the name length limitations can be worked around by
opening the directory where the
On 08/15/2012 12:49 PM, Eric W. Biederman wrote:
There is also the trick of getting a shorter directory name using
/proc/self/fd if you are threaded and can't change the directory.
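A minimal sketch of the /proc/self/fd trick described above, assuming Linux with /proc mounted. The helper names fill_short_addr() and bind_in_long_dir() are hypothetical and are not taken from the thread; the idea is only that the long directory is opened once and the socket is then named through the short /proc/self/fd/<n>/ prefix, without calling chdir().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Build "/proc/self/fd/<dirfd>/<name>" in addr->sun_path.
 * Returns 0 on success, -1 if the result would not fit in sun_path. */
int fill_short_addr(struct sockaddr_un *addr, int dirfd, const char *name)
{
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    int n = snprintf(addr->sun_path, sizeof(addr->sun_path),
                     "/proc/self/fd/%d/%s", dirfd, name);
    return (n < 0 || (size_t)n >= sizeof(addr->sun_path)) ? -1 : 0;
}

/* Bind a stream socket called `name` inside `dir`, even when the full
 * path of `dir` is too long for sun_path. Returns the bound socket fd,
 * or -1 on error. The process working directory is never changed. */
int bind_in_long_dir(const char *dir, const char *name)
{
    struct sockaddr_un addr;
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock >= 0 &&
        (fill_short_addr(&addr, dirfd, name) < 0 ||
         bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)) {
        close(sock);
        sock = -1;
    }
    close(dirfd);
    return sock;
}
```

The socket name itself still has to fit after the prefix, so this only works around long directory names, not long file names.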
The obvious choices at this point are
- Teach bind and connect and af_unix sockets to take longer AF_UNIX
On 08/10/2012 05:57 AM, Stanislav Kinsbursky wrote:
Today, there is a problem in connecting local SUNRPC transports. These
transports use UNIX sockets and the connection itself is done by the rpciod
workqueue.
But UNIX socket lookup is done in the context of the process's file system root, i.e.
all local
On 08/10/2012 11:26 AM, Alan Cox wrote:
On that whole subject...
Do we need a Unix domain socket equivalent to openat()?
I don't think so. The name is just a file system indexing trick, it's not
really the socket proper. It's little more than an ASCII string with
permissions attached - indeed
On 08/10/2012 11:40 AM, Alan Cox wrote:
Agreed on open() for sockets.. the lack of open is a Berklix derived
peculiarity of the interface. It would equally be useful to be able to
open /dev/socket/ipv4/1.2.3.4/1135 and the like for scripts and stuff
That needs VFS changes however so you can
On 08/10/2012 12:28 PM, Alan Cox wrote:
Explicitly for Linux yes - this is not generally true of the AF_UNIX
socket domain and even the permissions aspect isn't guaranteed to be
supported on some BSD environments !
Yes, but let's worry about what the Linux behavior should be.
The name is
On 07/06/2010 08:12 AM, Oren Laadan wrote:
The child returns from vfork, via the same return address that
the parent will later use. (on the stack for many architectures)
The child then calls a function which might not have the same
stack layout as vfork, scrambling whatever may be on the
it with
sys_clone, and it turned out to be a mess. Doing it in a separate
system call -- even though the internals are largely the same -- is cleaner.
-hpa
On 10/22/2009 07:26 PM, Michael Kerrisk wrote:
3 is number of arguments.
sys_clone3(struct clone_struct __user *ucs, pid_t __user *pids)
It appears to me that the number of arguments is 2.
It was 3 at one point... I'm not sure when that changed last :-/
It's better than extended or
On 10/22/2009 09:14 PM, Michael Kerrisk wrote:
So, sometimes, a number in a system call should be the bit width of
some argument(s), sometimes it should be the number of arguments, and
sometimes (well, just occasionally, as in mmap2() and clone()) -- it
should be a version number? Does the
On 10/21/2009 01:26 PM, Michael Kerrisk wrote:
My question here is: what does 3 actually mean? In general, system
calls have not followed any convention of numbering to indicate
successive versions -- clone2() being the one possible exception that
I know of.
3 is number of arguments. It's
On 10/22/2009 04:44 AM, Sukadev Bhattiprolu wrote:
3 is number of arguments.
To me, it is a version number.
mmap() and mmap2() both have 6 parameters.
You keep bringing this up. mmap2() is (a) a non-user-visible call; (b)
an exception (a mistake, if you want.)
Besides if wait4() were
On 10/20/2009 02:44 AM, Matt Helsley wrote:
|
| I know I'm late to this discussion, but why the name clone3()? It's
| not consistent with any other convention used for syscall naming,
This assumption, of course, is just plain wrong. Look at the wait
system calls, for example. However, when a
On 10/14/2009 03:36 PM, Sukadev Bhattiprolu wrote:
H. Peter Anvin [...@zytor.com] wrote:
|
| Overall it seems sane to:
|
| a) make it an actual 3-argument call;
| b) make the existing flags a u32 forever, and make it a separate
| argument;
| c) any new expansion can be via the struct
to:
a) make it an actual 3-argument call;
b) make the existing flags a u32 forever, and make it a separate
argument;
c) any new expansion can be via the struct, which may want to have
a c3_flags field first in the structure.
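One way to picture point (c) is the sketch below. Everything in it is an assumption made for illustration: the struct name clone3_ext, the reserved field, the C3_KNOWN_FLAGS mask and the checker are hypothetical and do not describe any merged kernel ABI. It only shows why putting c3_flags first is convenient: the flags can be checked before any later field is interpreted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension block, invented for illustration only. */
struct clone3_ext {
    uint32_t c3_flags;  /* first field: read and checked before the rest */
    uint32_t reserved;  /* must stay zero until a c3_flags bit defines it */
};

/* No extension flags are defined in this sketch. */
#define C3_KNOWN_FLAGS 0u

/* Returns 1 if the block is acceptable, 0 otherwise.
 * A NULL block means "no extensions requested" and is accepted. */
int clone3_ext_valid(const struct clone3_ext *ext)
{
    if (ext == NULL)
        return 1;
    if (ext->c3_flags & ~C3_KNOWN_FLAGS)
        return 0;               /* unknown flag bit: reject */
    return ext->reserved == 0;  /* reserved space must be zero */
}
```

Rejecting unknown bits up front is what would let later versions assign meaning to them without silently changing the behavior of old callers.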
-hpa
On 10/13/2009 06:39 PM, Matt Helsley wrote:
On Tue, Oct 13, 2009 at 04:49:05PM -0700, H. Peter Anvin wrote:
On 10/12/2009 09:49 PM, Sukadev Bhattiprolu wrote:
This patchset implements a new system call, clone3() that lets a process
specify the pids of the child process.
A system call named
On 10/13/2009 09:40 PM, Sukadev Bhattiprolu wrote:
H. Peter Anvin [...@zytor.com] wrote:
|
| Except we can't use clone2() because it conflicts on ia64. Care to propose
| a name you would prefer?
Yes, I am running out of ideas :-)
How about clone64_with_pids() ? - hope we don't
On 09/29/2009 11:40 AM, Roland McGrath wrote:
Why add a new syscall at all instead of just using a new CLONE_* flag to
indicate that the argument layout is different?
What an absolutely atrociously bad idea.
We already have a syscall layer which is painful to thunk in places, and
this would
On 09/29/2009 12:02 PM, Arjan van de Ven wrote:
On Tue, 29 Sep 2009 11:44:52 -0700
H. Peter Anvin h...@zytor.com wrote:
On 09/29/2009 11:40 AM, Roland McGrath wrote:
Why add a new syscall at all instead of just using a new CLONE_*
flag to indicate that the argument layout is different
On 09/29/2009 12:10 PM, Linus Torvalds wrote:
On Tue, 29 Sep 2009, Arjan van de Ven wrote:
We already have a syscall layer which is painful to thunk in places,
and this would make it much worse.
syscalls are cheap as well.
cheaper than decades of dealing with such multiplexer mess ;/
On 09/29/2009 03:11 PM, Linus Torvalds wrote:
Ok, I agree with that. The kernel side is easy (we have magic calling
conventions there and need to turn registers into arguments anyway before
you get to the shared code), but your point about the user side prototype
is valid.
I think it
length:
struct pid_set {
        int num_pids;
        pid_t pids[1];
};
In C99, this is spelt:
struct pid_set {
        int num_pids;
        pid_t pids[];
};
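As a user-space illustration of the C99 form, the sketch below allocates a pid_set with room for n entries. The helper name pid_set_new() is hypothetical and not part of the proposed patch; only the struct layout comes from the message above.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct pid_set {
    int num_pids;
    pid_t pids[];   /* C99 flexible array member: contributes no size */
};

/* Allocate a pid_set holding a copy of the n pids in src.
 * Returns NULL on a negative count or allocation failure.
 * The caller releases the result with free(). */
struct pid_set *pid_set_new(const pid_t *src, int n)
{
    if (n < 0)
        return NULL;

    /* Header plus exactly n trailing elements. */
    struct pid_set *ps = malloc(sizeof(*ps) + (size_t)n * sizeof(ps->pids[0]));
    if (ps == NULL)
        return NULL;

    ps->num_pids = n;
    if (n > 0)
        memcpy(ps->pids, src, (size_t)n * sizeof(ps->pids[0]));
    return ps;
}
```

With the older pids[1] spelling, sizeof(struct pid_set) already includes one element, so the same allocation would have to add n - 1 elements instead of n.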
-hpa
On 09/09/2009 11:03 AM, Sukadev Bhattiprolu wrote:
C90 or C99 below should work. Is it ok to use a data structure that is
not in C89 ?
C89 is the same as C90 (C89 refers to the ANSI standard, C90 to the ISO
standard, but they're functionally identical.)
BTW, would it work if we defined
it's the latter piece which causes problems.
-hpa
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux
case which would require scaling below
the PAGE_SIZE level... in which case it would be nice for it to
gracefully decay to a single kmalloc allocation + some metadata.
-hpa
Serge E. Hallyn wrote:
If you want security and permission arguments get with Serge and finish
the uid namespace. Then you will have a user that looks like root but
does not have permissions to do most things.
Right, and in particular the way it would partially solve this issue is
that the
Daniel Lezcano wrote:
Yep, I changed my mind, I think Eric and HPA are right. devpts is a
file system and not a namespace even if the result is the same. That
makes sense to keep a global sysctl for the root container and handle
security problem with user namespace and mount option.
kernel memory used on 32-bit systems
especially, which is somewhat inherently global.
Resource limit partitioning is a much bigger and orthogonal problem.
-hpa
Daniel Lezcano wrote:
Resource limit partitioning is a much bigger and orthogonal problem.
In this case we don't have the pty allocated independently, no ?
I mean one container can allocate 4095 ptys, causing pty starvation for
other containers. Or imagine I am a villain and I want to
.
Containers may very well want resource control, but it's a separate
issue from naming.
-hpa
Serge E. Hallyn wrote:
Looks good. In the very last part, you might say just a little more to
make sure it's clear: You want to mount -o newinstance before sshd
or gnome is started in the root container, so that a child container
can't reach your devpts by doing a mount -t devpts without -o
have only reviewed it,
not actually tested it.
Acked-by: H. Peter Anvin [EMAIL PROTECTED]
-hpa
[EMAIL PROTECTED] wrote:
Agree in general. Not sure if you are implying remount is necessary just
to change permissions of pts/ptmx. Why not chmod 0666 /dev/pts/ptmx ?
The remount changes the 'ptmxmode' setting, but since the node exists,
the 'ptmxmode' setting is never used again and we need
[EMAIL PROTECTED] wrote:
2. To effectively use the multi-instance mode, applications/libraries
should open /dev/pts/ptmx instead of /dev/ptmx, but obviously
this would fail in the legacy mode.
NOT SO!
/dev/ptmx is required by Unix98 (which is arguably
[EMAIL PROTECTED] wrote:
Ah, ok. Well, I will remove that para from the patch description.
If the -o newinstance is NOT followed by the bind mount, ptys won't
work, and it would be nice if we could print a useful message when opening
/dev/ptmx.
We can't, really, because it will open the
Alan Cox wrote:
We can't, really, because it will open the global ptmx. This is an
unfortunate side effect of the backwards-compatibility code.
This is also why I don't like the bind mount; the symlink option has the
nice property that f*ckups are more obvious.
It's asking for trouble
[EMAIL PROTECTED] wrote:
But that node will not be accessible if there is a newinstance mount
without the bind mount ? IOW
1. mount -t devpts -o newinstance lxcpts /dev/pts
2. mount -o bind /dev/pts/ptmx /dev/ptmx
If both #1 and #2 or neither happen there is no problem.
[EMAIL PROTECTED] wrote:
When both modes are used simultaneously, we have following options:
1. Let container-startup deal with it, i.e. use the above bind-mount approach
or, as Serge mentioned, have containers chroot and make ptmx-pts/ptmx
symlink or another option ?
2. Have the
Alan Cox wrote:
In the case of the initial open you don't yet know the tty pointer and
may be creating it. So the tty isn't a reference because it doesn't exist.
Got it. I was under the (apparently mistaken) notion that only pty tty
structures were created dynamically.
-hpa
Alan Cox wrote:
On Tue, 26 Aug 2008 09:40:20 -0700
H. Peter Anvin [EMAIL PROTECTED] wrote:
Alan Cox wrote:
In the case of the initial open you don't yet know the tty pointer and
may be creating it. So the tty isn't a reference because it doesn't exist.
Got it. I was under the (apparently
[EMAIL PROTECTED] wrote:
By extension, maybe the tty layer would need another interface to determine
the instance:
instance = driver->ops->get_instance(driver, inode, other_stuff)
using this we find the tty
tty = driver->ops->something(driver, instance, idx);
This seems
Alan Cox wrote:
This seems more than a bit redundant. The instance, IMO, *is* the tty
structure; so the interface should be:
Only for a re-open - which is very different to an initial open,
and /dev/tty is deep magic in this situation.
I guess I fail to understand something here, perhaps
[EMAIL PROTECTED] wrote:
tty = driver->ops->get_tty(driver, inode [, other_stuff?]);
Can the inode be used to identify the driver too ? (but inode to driver
mapping is not trivial atm).
It can, but it's an O(n) operation in the number of registered drivers.
However, we can only call
[EMAIL PROTECTED] wrote:
Yes, we know the driver, but do we need to pass it into ->get_tty() ?
Passing it in (or having the operation compute from inode) has advantage
of allowing drivers to share code if necessary.
Yes, and it gets access to its own data. It's how you implement an
Alan Cox wrote:
This patch has the kernel internally create the [ptmx, c, 5:2] device
when mounting devpts filesystem. The permissions for the device node
can be specified by the '-o ptmx_mode=0666' option. The default mode
is 0666.
NAK
Hopefully, presence of the 'ptmx' node in
Cedric Le Goater wrote:
I suggest newinstance, but newns works, too.
Could we also use this mount option to 'unshare' a new posix message queue
namespace ?
Sorry, I fail to see the connection with devpts here? Are you
suggesting using the same option for another filesystem (if so,
Alan Cox wrote:
auto-created, than supporting mknod(2) inside the devpts filesystem.
It's not a matter of changing the user space; it's a matter of what
makes most sense inside the kernel.
Having an extra node with different permissions suddenly appear without
warning isn't I think good
Cedric Le Goater wrote:
H. Peter Anvin wrote:
Cedric Le Goater wrote:
I suggest newinstance, but newns works, too.
Could we also use this mount option to 'unshare' a new posix message
queue namespace ?
Sorry, I fail to see the connection with devpts here? Are you
suggesting using the same
[EMAIL PROTECTED] wrote:
I had the new ptmx node only in 'multi-mount' mode initially. But if users
want the multi-mount semantics, /dev/ptmx must be a symlink. If it's a symlink,
we break in the single-mount case (which does not have the ptmx node and
we don't support mknod in pts).
True,
Eric W. Biederman wrote:
I had the new ptmx node only in 'multi-mount' mode initially. But if users
want the multi-mount semantics, /dev/ptmx must be a symlink. If it's a
symlink,
we break in the single-mount case (which does not have the ptmx node and
we don't support mknod in pts).
Then
[EMAIL PROTECTED] wrote:
Hmm, so, single and multi-mount don't coexist ? i.e. some are multi-mounts
while others are single-mounts.
The way I looked at is that even if a distro has not yet updated the
startup script (fstab), we could use the multi-mount. Maybe a container
startup script
Eric W. Biederman wrote:
The point of making it a bind is to address the concerns about
backwards compatibility in user space. In particular security
conscious applications and applications that perform sanity checks
are known to ignore things if they are the wrong type in the filesystem.
[EMAIL PROTECTED] wrote:
TODO:
- Remove even initial kernel mount of devpts ? (If we do, how
do we preserve single-mount semantics) ?
Doesn't make sense unless we decide to drop single-mount semantics in
the (far) future. As long as we have an instance that services
[EMAIL PROTECTED] wrote:
I don't like the name newmnt for the option; it is not just another
mount, but a whole new instance of the pty space.
I agree. It's mostly a place-holder for now. How about newns or newptsns ?
I suggest newinstance, but newns works, too.
I observe you didn't
Kyle Moffett wrote:
On Tue, Aug 5, 2008 at 2:15 AM, Eric W. Biederman [EMAIL PROTECTED] wrote:
There definitely needs to be a mount option (and possibly a config
option to forcibly enable the mount option). I personally have 5 or 6
different custom scripts that depend on being able to unmount
[EMAIL PROTECTED] wrote:
If devpts is mounted more than once, then '/dev/ptmx' must be a symlink
to '/dev/pts/ptmx' and in each new devpts mount we must create the
device node '/dev/pts/ptmx' [c, 5:2] by hand.
This should be auto-created. That also eliminates any need to support
the
[EMAIL PROTECTED] wrote:
Appreciate comments on overall approach of my mapping from the inode
to sb->s_fs_info to allocated_ptys and the hacky use of get_sb_nodev(),
and also on the tweak to init_dev() (patch 6).
First of all, thanks for taking this on :) It's always delightful to
spout
[EMAIL PROTECTED] wrote:
Ok. But was wondering if we can pass the ptmx symlink burden to the
'container-startup scripts' since they are the ones that need the second
or subsequent mount of devpts.
So, initially and for systems that don't need multiple mounts of devpts,
existing behavior
[EMAIL PROTECTED] wrote:
IIRC, /dev/tty also needs a similar symlink.
Why? I do not believe that is correct.
-hpa
Kyle Moffett wrote:
Here's my suggestion:
By default, without any mount options, use the current legacy
behavior. The devpts filesystem would point to a global instance on
the whole box, controlled by the traditional /dev/ptmx device node.
There would *NOT* be a /dev/pts/ptmx node.
If
Since the issue of PTY namespaces came up (and was rejected) back in
April, I have thought a little bit about changing ptys to be tied
directly into a devpts instance. devpts would then be a normal
filesystem, which can be mounted multiple times (or not at all). pty's
would then become
Dave Hansen wrote:
On Fri, 2008-08-01 at 11:12 -0700, H. Peter Anvin wrote:
1. /dev/ptmx would have to change to a symlink, ptmx -> pts/ptmx.
...
I worry #1 would have substantial user-space impact, but I don't see a
way around it, since there would be no obvious way to associate
/dev/ptmx
Serge E. Hallyn wrote:
Subrata,
pty namespaces as such are not going to happen. We'll be pursuing
full-scale device namespaces instead.
Again, either that, or tie Unix98 pty's closer into devpts (which would
have other advantages, in particular avoiding the double lookup) which
would
).
Signed-off-by: Sukadev Bhattiprolu [EMAIL PROTECTED]
Seems nice and non-contentious.
Acked-by: Serge Hallyn [EMAIL PROTECTED]
Acked-by: H. Peter Anvin [EMAIL PROTECTED]
Hallyn [EMAIL PROTECTED]
Signed-off-by: Matt Helsley [EMAIL PROTECTED]
No traces of devpts namespaces here, so I assume this should be
non-offensive and fine for inclusion.
Acked-by: H. Peter Anvin [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:
Some simple helper patches to enable implementation of multiple PTY
(or device) namespaces.
[PATCH 1/4]: Propagate error code from devpts_pty_new
[PATCH 2/4]: Factor out PTY index allocation
[PATCH 3/4]: Move devpts globals into init_pts_ns
Eric W. Biederman wrote:
/dev/ptmx can be a symlink ptmx -> pts/ptmx, and we add a ptmx instance
inside the devpts filesystem. Each devpts filesystem is responsible for
its own pool of ptys, with own numbering, etc.
This does mean that entries in /dev/pts are more than just plain device
H. Peter Anvin wrote:
Thinking about it further, allowing this restriction would also allow a
whole lot of cleanups inside the pty setup, since it would eliminate the
need to do a separate lookup to find the corresponding devpts entry in
pty_open(). The benefit here comes from the closer
Cedric Le Goater wrote:
OK. I didn't know that. I took sys_llseek() as an example of an interface
to follow when coding clone64().
llseek() was the first system call that took a doublewidth argument.
It's not the one you want to mimic.
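For context, _llseek() on 32-bit Linux receives its 64-bit offset as two separate arguments, offset_high and offset_low. The sketch below shows that style of splitting and rejoining; the helper names are hypothetical and it is meant only to illustrate the pattern being advised against, not any code from the clone64() patch.

```c
#include <stdint.h>

/* Upper 32 bits of a 64-bit value (what _llseek calls offset_high). */
uint32_t split_high(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits of a 64-bit value (what _llseek calls offset_low). */
uint32_t split_low(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffu);
}

/* Rebuild the 64-bit value on the receiving side. */
uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

Every caller and every compat layer has to agree on the order of the two halves, which is part of why a hand-split argument is awkward to carry forward.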
-hpa
[EMAIL PROTECTED] wrote:
We want to provide isolation between containers, meaning PTYs in container
C1 should not be accessible to processes in C2 (unless C2 is an ancestor).
Yes, I certainly can understand the desire for isolation. That wasn't
what my question was about.
The other reason
[EMAIL PROTECTED] wrote:
This is a resend of the patch set Cedric had sent earlier. I ported
the patch set to 2.6.25-rc8-mm1 and tested on x86 and x86_64.
---
We have run out of the 32 bits in clone_flags !
This patchset introduces 2 new system calls which support 64bit clone-flags.
[EMAIL PROTECTED] wrote:
If you're going to make it a 64-bit pass it in as a 64-bit number, instead
of breaking it into two numbers.
Maybe I am missing your point. The glibc interface could take a 64bit
parameter, but don't we need to pass 32-bit values into the system call
on 32 bit
[EMAIL PROTECTED] wrote:
Jakub Jelinek [EMAIL PROTECTED] wrote:
| On Wed, Apr 09, 2008 at 03:34:59PM -0700, [EMAIL PROTECTED] wrote:
| From: Cedric Le Goater [EMAIL PROTECTED]
| Subject: [PATCH 3/3] add the clone64() and unshare64() syscalls
|
| This patch adds 2 new syscalls :
|
|
[EMAIL PROTECTED] wrote:
Devpts namespace patchset
In continuation of the implementation of containers in mainline, we need to
support multiple PTY namespaces so that the PTY index (ie the tty names) in
one container is independent of the PTY indices of other containers. For
instance this
Jonathan Corbet wrote:
Heh, indeed. But we do seem to have a recurring problem of people
wanting to extend sys_foo() beyond the confines of its original API.
I've observed a few ways of doing that:
- create sys_foo2() (or sys_foo64(), or sys_fooat(), or sys_pfoo(),
or...) and add the new
Nick Piggin wrote:
This should work because the result gets used before reading again:
read_cr3(a);
write_cr3(a | 1);
read_cr3(a);
But this might be reordered so that b gets read before the write:
read_cr3(a);
write_cr3(a | 1);
read_cr3(b);
?
I don't see how, as write_cr3 clobbers memory.
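The effect of a "memory" clobber can be shown without touching control registers. The sketch below assumes GCC or Clang extended asm and uses an empty asm statement; compiler_barrier() and reload_after_barrier() are hypothetical names, and this is not the kernel's read_crX()/write_crX() implementation.

```c
/* An empty asm statement with a "memory" clobber. It emits no
 * instructions, but tells the compiler that memory may have been read
 * or written, so memory accesses may not be moved across it and values
 * loaded earlier may not be reused after it. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Store v through p, then read it back after the barrier. */
int reload_after_barrier(int *p, int v)
{
    *p = v;
    compiler_barrier();  /* the store above must be completed first */
    return *p;           /* and this must be a fresh load from memory */
}
```

This constrains only the compiler's handling of memory; whether an asm that reads a register and has no memory operand can still be cached or reordered is exactly the question raised in the thread.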
Arjan van de Ven wrote:
On Tue, 02 Oct 2007 18:08:32 +0400
Kirill Korotaev [EMAIL PROTECTED] wrote:
Some gcc versions (I checked at least 4.1.1 from RHEL5, 4.1.2 from
gentoo) can generate incorrect code with read_crX()/write_crX()
functions mix up, due to cached results of read_crX().
I'm
Miklos Szeredi wrote:
Andrew, please skip this patch, for now.
Serge found a problem with the fsuid approach: setfsuid(nonzero) will
remove filesystem related capabilities. So even if root is trying to
set the user=UID flag on a mount, access to the target (and in case
of bind, the
Ram Pai wrote:
It is in FC6. I don't know the status of upstream util-linux. I did
submit the patch many times to Adrian Bunk (the then util-linux
maintainer) and got no response. I have not pushed the patches to the
new maintainer (Karel Zak?) though.
Well, do that, then :)
Seriously.
Andi Kleen wrote:
On Monday 02 April 2007 13:38, Alexey Dobriyan wrote:
They will be used by cpuid driver and powernow-k8 cpufreq driver.
With these changes powernow-k8 driver could run correctly on OpenVZ kernels
with virtual cpus enabled (SCHED_VCPU).
This means openvz has multiple virtual
Alexey Dobriyan wrote:
Now that cpuid_on_cpu() is in core, the cpuid driver can be shrunk.
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
Hi Alexey,
This, and your other changes in this area, do conflict with the work
that I've been doing on extending the usability of the CPUID and MSR
Alexey Dobriyan wrote:
+asmlinkage long sys_lutimesat(int dfd, char __user *filename, struct timeval __user *utimes)
Could we get these to take struct timespec instead of struct timeval?
Right now we have a real problem in that the interfaces that *set* times
take struct timeval
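The two structures differ only in the resolution of the sub-second field: struct timeval carries microseconds, struct timespec nanoseconds. A minimal conversion sketch follows, assuming the concern is that precision mismatch; the function name is hypothetical.

```c
#include <sys/time.h>
#include <time.h>

/* Convert a microsecond-resolution timeval into a nanosecond-resolution
 * timespec. Returns 0 on success, -1 if tv_usec is out of range. */
int timeval_to_timespec_checked(const struct timeval *tv, struct timespec *ts)
{
    if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
        return -1;

    ts->tv_sec = tv->tv_sec;
    ts->tv_nsec = (long)tv->tv_usec * 1000L;  /* 1 us = 1000 ns */
    return 0;
}
```

The conversion in this direction is lossless, while going from timespec back to timeval has to discard the last three decimal digits of the nanosecond field.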