Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-29 Thread Chen Fan


On 09/29/2015 12:37 AM, Eric W. Biederman wrote:

Oleg Nesterov  writes:


On 09/25, Konstantin Khlebnikov wrote:

+struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref)
  {
-   struct file *file;
+   struct ns_common *ns;
+   struct fd f;
  
-	file = fget(fd);

-   if (!file)
+   f = fdget(fd);
+   if (!f.file)
return ERR_PTR(-EBADF);
  
-	if (file->f_op != &ns_file_operations)

+   if (f.file->f_op != &ns_file_operations)
+   goto out_invalid;
+
+   ns = get_proc_ns(file_inode(f.file));
+   if (nstype && (ns->ops->type != nstype))
goto out_invalid;
  
-	return file;

+   *fd_ref = f;
+   return ns;
  
  out_invalid:

-   fput(file);
+   fdput(f);
return ERR_PTR(-EINVAL);
  }

Well yes, fdget() makes sense but this is minor.

Honestly, I do not really like the new helper... I understand this
is subjective, so I won't insist. But how about 1/1? We do not need
fd/file at all. With this patch your sys_getvpid() can just use
proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns().

Eric, what do you think?

At some level I don't care this is not exposed to userspace.

However since we are going several rounds with this.

Of the existing uses several of them sleep, which unfortunately means we
can not use rcu locking for everything.  The network namespace ones wind
up taking a reference to struct net because the have the legacy pid case
to deal with.  Which makes we can not use fdget for all callers either.

For this translate_pid rcu locking is sufficient, rcu locking is easy
and doing any more than rcu locking just seems silly.  So let me
respectfully suggest.

struct ns_common *ns_by_fd_rcu(int fd, int type)
{
struct files_struct *files = current->files;
struct file *file;
struct ns_common *ns;
void *ret;

pointer files seems no more useful. we can
use fcheck(fd) instead.

Thanks,



file = fcheck_files(files, fd);
 if (!file)
return ERR_PTR(-EBADF);

if (file->f_mode & FMODE_PATH)
return ERR_PTR(-EINVAL);

if (file->f_op != &ns_file_operations)
return ERR_PTR(-EINVAL);

ns = get_proc_ns(file_inode(file));
if (ns->ops->type != type)
return ERR_PTR(-EINVAL);

return ns;
}

struct pid_namespace *pidns_by_fd_rcu(int fd)
{
struct ns_common *ns = ns_by_fd_rcu(fd, CLONE_NEWPID);
 if (IS_ERR(ns))
return ERR_CAST(ns);
return container_of(ns, struct pid_namespace, ns);
}

SYSCALL_DEFINE3(translate_pid, pid_t, pid_nr, int, sourcefd, int, targetfd)
{
struct pid_namespace *source, *target;
struct pid *pid;
pid_t result;

rcu_read_lock();

if (sourcefd >= 0)
source = pidns_by_fd_rcu(sourcefd);
else
source = task_active_pid_ns(current);

if (targetfd >= 0)
target = pidns_by_fd_rcu(targetfd);
else
target = task_active_pid_ns(current);

pid = find_pid_ns(pid_nr, source);
 result = pid_nr_ns(pid, target);
 if (result == 0)
result = -ESRCH;

rcu_read_unlock();

return result;
}

Eric
.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-29 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 09/29, Eric W. Biederman wrote:
>>
>> Oleg Nesterov  writes:
>>
>> > OK, I won't insist, this too looks better to me than 
>> > proc_ns_fdget(&fd_ref).
>> >
>> > And in any case fcheck_files() makes more sense than fdget(), somehow I did
>> > not think about this when I sent 1/1.
>> >
>> > Hmm. and after the quick look at cleanup_net() I can't understand whether
>> > get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or
>> > not... Can it?
>>
>> Some of those places need a reference that allows them to sleep, and the
>> code is shared with the legacy pid case so with an addition of get_net
>> we can use ns_by_fd_rcu().   There are cases like setns that could
>> use ns_by_fd_rcu() with code reording.
>>
>> We can implement get_net_ns_by_fd as:
>> struct net *get_net_ns_by_fd(int fd)
>> {
>> struct net *net;
>>
>>  rcu_read_lock();
>>  net = net_ns_by_fd_rcu(fd);
>> if (!IS_ERR(net))
>>  get_net(net);
>> rcu_read_unlock();
>>
>>  return net;
>> }
>>
>> Which means we can achieve code sharing with the pure rcu version
>> as a base.
>
> Yes, this is what I meant... but don't we need maybe_get_net() ?

Let me see.

The call chain is a little interesting so let's trace it.

thread1   thread2
  net = net_ns_by_fd_rcu(fd);   fput(file)
  get_net(net);   fput
__fput
  dput
dentry_kill
  __dentry_kill
dentry_iput
  iput
iput_final
  evict
->evict_inode 
(nsfs_evict)
  netns_put
 put_net
   __put_net
  
queue_work(net_cleanup_work)
  ...
  
cleanup_net

synchronize_rcu.


Given that someone in another thread can put the file descriptor and I
don't see anything explicitly waiting until after the rcu critical
section ends to put the network namespace I guess we do need
maybe_get_net.

In practice I think the delay logic in fput will act as an rcu barrier,
but that is not guaranteed so we had better not count on it.  Sigh.

Anyway this is a long ways from a syscall to translate pids.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-29 Thread Oleg Nesterov
On 09/29, Eric W. Biederman wrote:
>
> Oleg Nesterov  writes:
>
> > OK, I won't insist, this too looks better to me than proc_ns_fdget(&fd_ref).
> >
> > And in any case fcheck_files() makes more sense than fdget(), somehow I did
> > not think about this when I sent 1/1.
> >
> > Hmm. and after the quick look at cleanup_net() I can't understand whether
> > get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or
> > not... Can it?
>
> Some of those places need a reference that allows them to sleep, and the
> code is shared with the legacy pid case so with an addition of get_net
> we can use ns_by_fd_rcu().   There are cases like setns that could
> use ns_by_fd_rcu() with code reording.
>
> We can implement get_net_ns_by_fd as:
> struct net *get_net_ns_by_fd(int fd)
> {
> struct net *net;
>
>   rcu_read_lock();
>   net = net_ns_by_fd_rcu(fd);
> if (!IS_ERR(net))
>   get_net(net);
> rcu_read_unlock();
>
>   return net;
> }
>
> Which means we can achieve code sharing with the pure rcu version
> as a base.

Yes, this is what I meant... but don't we need maybe_get_net() ?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-29 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 09/28, Eric W. Biederman wrote:
>>
>> Oleg Nesterov  writes:
>>
>> > Honestly, I do not really like the new helper... I understand this
>> > is subjective, so I won't insist. But how about 1/1? We do not need
>> > fd/file at all. With this patch your sys_getvpid() can just use
>> > proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns().
>> >
>> > Eric, what do you think?
>>
>> At some level I don't care this is not exposed to userspace.
>
> I agree, this is minor. But you know, the kernel is already complicated
> too much, we should try to simplify/cleanup everything we can ;)
>
>> Of the existing uses several of them sleep, which unfortunately means we
>> can not use rcu locking for everything.  The network namespace ones wind
>> up taking a reference to struct net because the have the legacy pid case
>> to deal with. Which makes we can not use fdget for all callers either.
>
> And that is why 1/1 adds proc_get_ns_by_fd/get_type which can also be used
> by the network namespace.
>
>> For this translate_pid rcu locking is sufficient, rcu locking is easy
>> and doing any more than rcu locking just seems silly.  So let me
>> respectfully suggest.
>>
>> struct ns_common *ns_by_fd_rcu(int fd, int type)
>> {
>>  struct files_struct *files = current->files;
>>  struct file *file;
>>  struct ns_common *ns;
>>  void *ret;
>>
>>  file = fcheck_files(files, fd);
>> if (!file)
>>  return ERR_PTR(-EBADF);
>>
>>  if (file->f_mode & FMODE_PATH)
>>  return ERR_PTR(-EINVAL);
>>
>>  if (file->f_op != &ns_file_operations)
>>  return ERR_PTR(-EINVAL);
>>
>>  ns = get_proc_ns(file_inode(file));
>>  if (ns->ops->type != type)
>>  return ERR_PTR(-EINVAL);
>>
>>  return ns;
>> }
>
> OK, I won't insist, this too looks better to me than proc_ns_fdget(&fd_ref).
>
> And in any case fcheck_files() makes more sense than fdget(), somehow I did
> not think about this when I sent 1/1.
>
> Hmm. and after the quick look at cleanup_net() I can't understand whether
> get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or
> not... Can it?

Some of those places need a reference that allows them to sleep, and the
code is shared with the legacy pid case so with an addition of get_net
we can use ns_by_fd_rcu().   There are cases like setns that could
use ns_by_fd_rcu() with code reording.

We can implement get_net_ns_by_fd as:
struct net *get_net_ns_by_fd(int fd)
{
struct net *net;

rcu_read_lock();
net = net_ns_by_fd_rcu(fd);
if (!IS_ERR(net))
get_net(net);
rcu_read_unlock();

return net;
}

Which means we can achieve code sharing with the pure rcu version
as a base.

If the networking code did not have the legacy pid case to handle I
would want to do something with struct fd, as the file descriptor
already keeps the struct net reference alive and struct fd allows
for sleeping.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-29 Thread Oleg Nesterov
On 09/28, Eric W. Biederman wrote:
>
> Oleg Nesterov  writes:
>
> > Honestly, I do not really like the new helper... I understand this
> > is subjective, so I won't insist. But how about 1/1? We do not need
> > fd/file at all. With this patch your sys_getvpid() can just use
> > proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns().
> >
> > Eric, what do you think?
>
> At some level I don't care this is not exposed to userspace.

I agree, this is minor. But you know, the kernel is already complicated
too much, we should try to simplify/cleanup everything we can ;)

> Of the existing uses several of them sleep, which unfortunately means we
> can not use rcu locking for everything.  The network namespace ones wind
> up taking a reference to struct net because the have the legacy pid case
> to deal with. Which makes we can not use fdget for all callers either.

And that is why 1/1 adds proc_get_ns_by_fd/get_type which can also be used
by the network namespace.

> For this translate_pid rcu locking is sufficient, rcu locking is easy
> and doing any more than rcu locking just seems silly.  So let me
> respectfully suggest.
>
> struct ns_common *ns_by_fd_rcu(int fd, int type)
> {
>   struct files_struct *files = current->files;
>   struct file *file;
>   struct ns_common *ns;
>   void *ret;
>
>   file = fcheck_files(files, fd);
> if (!file)
>   return ERR_PTR(-EBADF);
>
>   if (file->f_mode & FMODE_PATH)
>   return ERR_PTR(-EINVAL);
>
>   if (file->f_op != &ns_file_operations)
>   return ERR_PTR(-EINVAL);
>
>   ns = get_proc_ns(file_inode(file));
>   if (ns->ops->type != type)
>   return ERR_PTR(-EINVAL);
>
>   return ns;
> }

OK, I won't insist, this too looks better to me than proc_ns_fdget(&fd_ref).

And in any case fcheck_files() makes more sense than fdget(), somehow I did
not think about this when I sent 1/1.

Hmm. and after the quick look at cleanup_net() I can't understand whether
get_net_ns_by_fd() can use ns_by_fd_rcu() + maybe_get_net(to_net_ns()) or
not... Can it?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-28 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 09/25, Konstantin Khlebnikov wrote:
>>
>> +struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref)
>>  {
>> -struct file *file;
>> +struct ns_common *ns;
>> +struct fd f;
>>  
>> -file = fget(fd);
>> -if (!file)
>> +f = fdget(fd);
>> +if (!f.file)
>>  return ERR_PTR(-EBADF);
>>  
>> -if (file->f_op != &ns_file_operations)
>> +if (f.file->f_op != &ns_file_operations)
>> +goto out_invalid;
>> +
>> +ns = get_proc_ns(file_inode(f.file));
>> +if (nstype && (ns->ops->type != nstype))
>>  goto out_invalid;
>>  
>> -return file;
>> +*fd_ref = f;
>> +return ns;
>>  
>>  out_invalid:
>> -fput(file);
>> +fdput(f);
>>  return ERR_PTR(-EINVAL);
>>  }
>
> Well yes, fdget() makes sense but this is minor.
>
> Honestly, I do not really like the new helper... I understand this
> is subjective, so I won't insist. But how about 1/1? We do not need
> fd/file at all. With this patch your sys_getvpid() can just use
> proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns().
>
> Eric, what do you think?

At some level I don't care this is not exposed to userspace.

However since we are going several rounds with this.

Of the existing uses several of them sleep, which unfortunately means we
can not use rcu locking for everything.  The network namespace ones wind
up taking a reference to struct net because the have the legacy pid case
to deal with.  Which makes we can not use fdget for all callers either.

For this translate_pid rcu locking is sufficient, rcu locking is easy
and doing any more than rcu locking just seems silly.  So let me
respectfully suggest.

struct ns_common *ns_by_fd_rcu(int fd, int type)
{
struct files_struct *files = current->files;
struct file *file;
struct ns_common *ns;
void *ret;

file = fcheck_files(files, fd);
if (!file)
return ERR_PTR(-EBADF);

if (file->f_mode & FMODE_PATH)
return ERR_PTR(-EINVAL);

if (file->f_op != &ns_file_operations)
return ERR_PTR(-EINVAL);

ns = get_proc_ns(file_inode(file));
if (ns->ops->type != type)
return ERR_PTR(-EINVAL);

return ns;
}

struct pid_namespace *pidns_by_fd_rcu(int fd)
{
struct ns_common *ns = ns_by_fd_rcu(fd, CLONE_NEWPID);
if (IS_ERR(ns))
return ERR_CAST(ns);
return container_of(ns, struct pid_namespace, ns);
}

SYSCALL_DEFINE3(translate_pid, pid_t, pid_nr, int, sourcefd, int, targetfd)
{
struct pid_namespace *source, *target;
struct pid *pid;
pid_t result;

rcu_read_lock();

if (sourcefd >= 0)
source = pidns_by_fd_rcu(sourcefd);
else
source = task_active_pid_ns(current);

if (targetfd >= 0)
target = pidns_by_fd_rcu(targetfd);
else
target = task_active_pid_ns(current);

pid = find_pid_ns(pid_nr, source);
result = pid_nr_ns(pid, target);
if (result == 0)
result = -ESRCH;  

rcu_read_unlock();

return result;
}

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-28 Thread Konstantin Khlebnikov

On 25.09.2015 20:56, Oleg Nesterov wrote:

On 09/25, Konstantin Khlebnikov wrote:


+struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref)
  {
-   struct file *file;
+   struct ns_common *ns;
+   struct fd f;

-   file = fget(fd);
-   if (!file)
+   f = fdget(fd);
+   if (!f.file)
return ERR_PTR(-EBADF);

-   if (file->f_op != &ns_file_operations)
+   if (f.file->f_op != &ns_file_operations)
+   goto out_invalid;
+
+   ns = get_proc_ns(file_inode(f.file));
+   if (nstype && (ns->ops->type != nstype))
goto out_invalid;

-   return file;
+   *fd_ref = f;
+   return ns;

  out_invalid:
-   fput(file);
+   fdput(f);
return ERR_PTR(-EINVAL);
  }


Well yes, fdget() makes sense but this is minor.

Honestly, I do not really like the new helper... I understand this
is subjective, so I won't insist. But how about 1/1? We do not need
fd/file at all. With this patch your sys_getvpid() can just use
proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns().


Hmm. My version has 0 or 2 atomic ops per get-put sequence.
Your version: 2 or 4 atomic ops. Plus even in worst case pinning by
struct fd theoretically scales better because it touches refcount at
struct file: there're might be many of them for one namespace.



Eric, what do you think?

See also "TODO" in the changelog.

Oleg.




--
Konstantin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/1] ns: introduce proc_get_ns_by_fd()

2015-09-25 Thread Oleg Nesterov
On 09/25, Konstantin Khlebnikov wrote:
>
> +struct ns_common *proc_ns_fdget(int fd, int nstype, struct fd *fd_ref)
>  {
> - struct file *file;
> + struct ns_common *ns;
> + struct fd f;
>  
> - file = fget(fd);
> - if (!file)
> + f = fdget(fd);
> + if (!f.file)
>   return ERR_PTR(-EBADF);
>  
> - if (file->f_op != &ns_file_operations)
> + if (f.file->f_op != &ns_file_operations)
> + goto out_invalid;
> +
> + ns = get_proc_ns(file_inode(f.file));
> + if (nstype && (ns->ops->type != nstype))
>   goto out_invalid;
>  
> - return file;
> + *fd_ref = f;
> + return ns;
>  
>  out_invalid:
> - fput(file);
> + fdput(f);
>   return ERR_PTR(-EINVAL);
>  }

Well yes, fdget() makes sense but this is minor.

Honestly, I do not really like the new helper... I understand this
is subjective, so I won't insist. But how about 1/1? We do not need
fd/file at all. With this patch your sys_getvpid() can just use
proc_get_ns_by_fd(fd, CLONE_NEWPID) and put_pid_ns().

Eric, what do you think?

See also "TODO" in the changelog.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/