Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()
On 09/29/2015 12:37 AM, Eric W. Biederman wrote: Oleg Nesterov writes: On 09/25, Konstantin Khlebnikov wrote: +struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref) { - struct file *file; + struct ns_common *ns; + struct fd f; - file = fget(fd); - if (!file) + f = fdget(fd); + if (!f.file) return ERR_PTR(-EBADF); - if (file->f_op != &ns_file_operations) + if (f.file->f_op != &ns_file_operations) + goto out_invalid; + + ns = get_proc_ns(file_inode(f.file)); + if (nstype && (ns->ops->type != nstype)) goto out_invalid; - return file; + *fd_ref = f; + return ns; out_invalid: - fput(file); + fdput(f); return ERR_PTR(-EINVAL); } Well yes, fdget() makes sense but this is minor. Honestly, I do not really like the new helper... I understand this is subjective, so I won't insist. But how about 1/1? We do not need fd/file at all. With this patch your sys_getvpid() can just use proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns(). Eric, what do you think? At some level I don't care this is not exposed to userspace. However since we are going several rounds with this. Of the existing uses several of them sleep, which unfortunately means we can not use rcu locking for everything. The network namespace ones wind up taking a reference to struct net because the have the legacy pid case to deal with. Which makes we can not use fdget for all callers either. For this translate_pid rcu locking is sufficient, rcu locking is easy and doing any more than rcu locking just seems silly. So let me respectfully suggest. struct ns_common *ns_by_fd_rcu(int fd, int type) { struct files_struct *files = current->files; struct file *file; struct ns_common *ns; void *ret; pointer files seems no more useful. we can use fcheck(fd) instead. Thanks, file = fcheck_files(files, fd); if (!file) return ERR_PTR(-EBADF); if (file->f_mode & FMODE_PATH) return ERR_PTR(-EINVAL); if (file->f_op != &ns_file_operations) return ERR_PTR(-EINVAL); ns = get_proc_ns(file_inode(file)); if (ns->ops->type != type) return ERR_PTR(-EINVAL); return ns; } struct pid_namespace *pidns_by_fd_rcu(int fd) { struct ns_common *ns = ns_by_fd_rcu(fd, CLONE_NEWPID); if (IS_ERR(ns)) return ERR_CAST(ns); return container_of(ns, struct pid_namespace, ns); } SYSCALL_DEFINE3(translate_pid, pid_t, pid_nr, int, sourcefd, int, targetfd) { struct pid_namespace *source, *target; struct pid *pid; pid_t result; rcu_read_lock(); if (sourcefd >= 0) source = pidns_by_fd_rcu(sourcefd); else source = task_active_pid_ns(current); if (targetfd >= 0) target = pidns_by_fd_rcu(targetfd); else target = task_active_pid_ns(current); pid = find_pid_ns(pid_nr, source); result = pid_nr_ns(pid, target); if (result == 0) result = -ESRCH; rcu_read_unlock(); return result; } Eric . -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()
Oleg Nesterov writes: > On 09/29, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> >> > OK, I won't insist, this too looks better to me than >> > proc_ns_fdget(&fd_ref). >> > >> > And in any case fcheck_files() makes more sense than fdget(), somehow I did >> > not think about this when I sent 1/1. >> > >> > Hmm. and after the quick look at cleanup_net() I can't understand whether >> > get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or >> > not... Can it? >> >> Some of those places need a reference that allows them to sleep, and the >> code is shared with the legacy pid case so with an addition of get_net >> we can use ns_by_fd_rcu(). There are cases like setns that could >> use ns_by_fd_rcu() with code reording. >> >> We can implement get_net_ns_by_fd as: >> struct net *get_net_ns_by_fd(int fd) >> { >> struct net *net; >> >> rcu_read_lock(); >> net = net_ns_by_fd_rcu(fd); >> if (!IS_ERR(net)) >> get_net(net); >> rcu_read_unlock(); >> >> return net; >> } >> >> Which means we can achieve code sharing with the pure rcu version >> as a base. > > Yes, this is what I meant... but don't we need maybe_get_net() ? Let me see. The call chain is a little interesting so let's trace it. thread1 thread2 net = net_ns_by_fd_rcu(fd); fput(file) get_net(net); fput __fput dput dentry_kill __dentry_kill dentry_iput iput iput_final evict ->evict_inode (nsfs_evict) netns_put put_net __put_net queue_work(net_cleanup_work) ... cleanup_net synchronize_rcu. Given that someone in another thread can put the file descriptor and I don't see anything explicitly waiting until after the rcu critical section ends to put the network namespace I guess we do need maybe_get_net. In practice I think the delay logic in fput will act as an rcu barrier, but that is not guaranteed so we had better not count on it. Sigh. Anyway this is a long ways from a syscall to translate pids. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()
On 09/29, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > OK, I won't insist, this too looks better to me than proc_ns_fdget(&fd_ref). > > > > And in any case fcheck_files() makes more sense than fdget(), somehow I did > > not think about this when I sent 1/1. > > > > Hmm. and after the quick look at cleanup_net() I can't understand whether > > get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or > > not... Can it? > > Some of those places need a reference that allows them to sleep, and the > code is shared with the legacy pid case so with an addition of get_net > we can use ns_by_fd_rcu(). There are cases like setns that could > use ns_by_fd_rcu() with code reording. > > We can implement get_net_ns_by_fd as: > struct net *get_net_ns_by_fd(int fd) > { > struct net *net; > > rcu_read_lock(); > net = net_ns_by_fd_rcu(fd); > if (!IS_ERR(net)) > get_net(net); > rcu_read_unlock(); > > return net; > } > > Which means we can achieve code sharing with the pure rcu version > as a base. Yes, this is what I meant... but don't we need maybe_get_net() ? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()
Oleg Nesterov writes: > On 09/28, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> >> > Honestly, I do not really like the new helper... I understand this >> > is subjective, so I won't insist. But how about 1/1? We do not need >> > fd/file at all. With this patch your sys_getvpid() can just use >> > proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns(). >> > >> > Eric, what do you think? >> >> At some level I don't care this is not exposed to userspace. > > I agree, this is minor. But you know, the kernel is already complicated > too much, we should try to simplify/cleanup everything we can ;) > >> Of the existing uses several of them sleep, which unfortunately means we >> can not use rcu locking for everything. The network namespace ones wind >> up taking a reference to struct net because the have the legacy pid case >> to deal with. Which makes we can not use fdget for all callers either. > > And that is why 1/1 adds proc_get_ns_by_fd/get_type which can also be used > by the network namespace. > >> For this translate_pid rcu locking is sufficient, rcu locking is easy >> and doing any more than rcu locking just seems silly. So let me >> respectfully suggest. >> >> struct ns_common *ns_by_fd_rcu(int fd, int type) >> { >> struct files_struct *files = current->files; >> struct file *file; >> struct ns_common *ns; >> void *ret; >> >> file = fcheck_files(files, fd); >> if (!file) >> return ERR_PTR(-EBADF); >> >> if (file->f_mode & FMODE_PATH) >> return ERR_PTR(-EINVAL); >> >> if (file->f_op != &ns_file_operations) >> return ERR_PTR(-EINVAL); >> >> ns = get_proc_ns(file_inode(file)); >> if (ns->ops->type != type) >> return ERR_PTR(-EINVAL); >> >> return ns; >> } > > OK, I won't insist, this too looks better to me than proc_ns_fdget(&fd_ref). > > And in any case fcheck_files() makes more sense than fdget(), somehow I did > not think about this when I sent 1/1. > > Hmm. and after the quick look at cleanup_net() I can't understand whether > get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or > not... Can it? Some of those places need a reference that allows them to sleep, and the code is shared with the legacy pid case so with an addition of get_net we can use ns_by_fd_rcu(). There are cases like setns that could use ns_by_fd_rcu() with code reording. We can implement get_net_ns_by_fd as: struct net *get_net_ns_by_fd(int fd) { struct net *net; rcu_read_lock(); net = net_ns_by_fd_rcu(fd); if (!IS_ERR(net)) get_net(net); rcu_read_unlock(); return net; } Which means we can achieve code sharing with the pure rcu version as a base. If the networking code did not have the legacy pid case to handle I would want to do something with struct fd, as the file descriptor already keeps the struct net reference alive and struct fd allows for sleeping. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()
On 09/28, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Honestly, I do not really like the new helper... I understand this > > is subjective, so I won't insist. But how about 1/1? We do not need > > fd/file at all. With this patch your sys_getvpid() can just use > > proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns(). > > > > Eric, what do you think? > > At some level I don't care this is not exposed to userspace. I agree, this is minor. But you know, the kernel is already complicated too much, we should try to simplify/cleanup everything we can ;) > Of the existing uses several of them sleep, which unfortunately means we > can not use rcu locking for everything. The network namespace ones wind > up taking a reference to struct net because the have the legacy pid case > to deal with. Which makes we can not use fdget for all callers either. And that is why 1/1 adds proc_get_ns_by_fd/get_type which can also be used by the network namespace. > For this translate_pid rcu locking is sufficient, rcu locking is easy > and doing any more than rcu locking just seems silly. So let me > respectfully suggest. > > struct ns_common *ns_by_fd_rcu(int fd, int type) > { > struct files_struct *files = current->files; > struct file *file; > struct ns_common *ns; > void *ret; > > file = fcheck_files(files, fd); > if (!file) > return ERR_PTR(-EBADF); > > if (file->f_mode & FMODE_PATH) > return ERR_PTR(-EINVAL); > > if (file->f_op != &ns_file_operations) > return ERR_PTR(-EINVAL); > > ns = get_proc_ns(file_inode(file)); > if (ns->ops->type != type) > return ERR_PTR(-EINVAL); > > return ns; > } OK, I won't insist, this too looks better to me than proc_ns_fdget(&fd_ref). And in any case fcheck_files() makes more sense than fdget(), somehow I did not think about this when I sent 1/1. Hmm. and after the quick look at cleanup_net() I can't understand whether get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or not... Can it? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()
Oleg Nesterov writes: > On 09/25, Konstantin Khlebnikov wrote: >> >> +struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref) >> { >> -struct file *file; >> +struct ns_common *ns; >> +struct fd f; >> >> -file = fget(fd); >> -if (!file) >> +f = fdget(fd); >> +if (!f.file) >> return ERR_PTR(-EBADF); >> >> -if (file->f_op != &ns_file_operations) >> +if (f.file->f_op != &ns_file_operations) >> +goto out_invalid; >> + >> +ns = get_proc_ns(file_inode(f.file)); >> +if (nstype && (ns->ops->type != nstype)) >> goto out_invalid; >> >> -return file; >> +*fd_ref = f; >> +return ns; >> >> out_invalid: >> -fput(file); >> +fdput(f); >> return ERR_PTR(-EINVAL); >> } > > Well yes, fdget() makes sense but this is minor. > > Honestly, I do not really like the new helper... I understand this > is subjective, so I won't insist. But how about 1/1? We do not need > fd/file at all. With this patch your sys_getvpid() can just use > proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns(). > > Eric, what do you think? At some level I don't care this is not exposed to userspace. However since we are going several rounds with this. Of the existing uses several of them sleep, which unfortunately means we can not use rcu locking for everything. The network namespace ones wind up taking a reference to struct net because the have the legacy pid case to deal with. Which makes we can not use fdget for all callers either. For this translate_pid rcu locking is sufficient, rcu locking is easy and doing any more than rcu locking just seems silly. So let me respectfully suggest. struct ns_common *ns_by_fd_rcu(int fd, int type) { struct files_struct *files = current->files; struct file *file; struct ns_common *ns; void *ret; file = fcheck_files(files, fd); if (!file) return ERR_PTR(-EBADF); if (file->f_mode & FMODE_PATH) return ERR_PTR(-EINVAL); if (file->f_op != &ns_file_operations) return ERR_PTR(-EINVAL); ns = get_proc_ns(file_inode(file)); if (ns->ops->type != type) return ERR_PTR(-EINVAL); return ns; } struct pid_namespace *pidns_by_fd_rcu(int fd) { struct ns_common *ns = ns_by_fd_rcu(fd, CLONE_NEWPID); if (IS_ERR(ns)) return ERR_CAST(ns); return container_of(ns, struct pid_namespace, ns); } SYSCALL_DEFINE3(translate_pid, pid_t, pid_nr, int, sourcefd, int, targetfd) { struct pid_namespace *source, *target; struct pid *pid; pid_t result; rcu_read_lock(); if (sourcefd >= 0) source = pidns_by_fd_rcu(sourcefd); else source = task_active_pid_ns(current); if (targetfd >= 0) target = pidns_by_fd_rcu(targetfd); else target = task_active_pid_ns(current); pid = find_pid_ns(pid_nr, source); result = pid_nr_ns(pid, target); if (result == 0) result = -ESRCH; rcu_read_unlock(); return result; } Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()
On 25.09.2015 20:56, Oleg Nesterov wrote: On 09/25, Konstantin Khlebnikov wrote: +struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref) { - struct file *file; + struct ns_common *ns; + struct fd f; - file = fget(fd); - if (!file) + f = fdget(fd); + if (!f.file) return ERR_PTR(-EBADF); - if (file->f_op != &ns_file_operations) + if (f.file->f_op != &ns_file_operations) + goto out_invalid; + + ns = get_proc_ns(file_inode(f.file)); + if (nstype && (ns->ops->type != nstype)) goto out_invalid; - return file; + *fd_ref = f; + return ns; out_invalid: - fput(file); + fdput(f); return ERR_PTR(-EINVAL); } Well yes, fdget() makes sense but this is minor. Honestly, I do not really like the new helper... I understand this is subjective, so I won't insist. But how about 1/1? We do not need fd/file at all. With this patch your sys_getvpid() can just use proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns(). Hmm. My version has 0 or 2 atomic ops per get-put sequence. Your version: 2 or 4 atomic ops. Plus even in worst case pinning by struct fd theoretically scales better because it touches refcount at struct file: there're might be many of them for one namespace. Eric, what do you think? See also "TODO" in the changelog. Oleg. -- Konstantin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/1] ns: introduce proc_get_ns_by_fd()
On 09/25, Konstantin Khlebnikov wrote: > > +struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref) > { > - struct file *file; > + struct ns_common *ns; > + struct fd f; > > - file = fget(fd); > - if (!file) > + f = fdget(fd); > + if (!f.file) > return ERR_PTR(-EBADF); > > - if (file->f_op != &ns_file_operations) > + if (f.file->f_op != &ns_file_operations) > + goto out_invalid; > + > + ns = get_proc_ns(file_inode(f.file)); > + if (nstype && (ns->ops->type != nstype)) > goto out_invalid; > > - return file; > + *fd_ref = f; > + return ns; > > out_invalid: > - fput(file); > + fdput(f); > return ERR_PTR(-EINVAL); > } Well yes, fdget() makes sense but this is minor. Honestly, I do not really like the new helper... I understand this is subjective, so I won't insist. But how about 1/1? We do not need fd/file at all. With this patch your sys_getvpid() can just use proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns(). Eric, what do you think? See also "TODO" in the changelog. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/