Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-16 Thread Eric W. Biederman
Russell King - ARM Linux  writes:

> On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote:
>> Hi Russell,
>> 
>> On 10 May 2018 at 02:40, Russell King - ARM Linux  
>> wrote:
>> > This does not leak information from other namespaces because of the
>> > uniqueness of the global PID.  However, what it does leak is the value
>> > of the global PID which is meaningless in the namespace.  So, before
>> > the event stream is delivered to userspace, this value needs to be
>> > re-written to the namespace's PID value.
>> 
>> Unfortunately that can't be done.  The trace stream is compressed and
>> needs to be decompressed using an external library.  I think the only
>> option is to return an error if a user is trying to use this feature
>> from a namespace.
>
> That sounds like a sensible approach, and that should get rid of the
> vpid stuff too.
>
> Eric, would this solve all your concerns?

It does, and I have given my ack to the respin.

I am moderately concerned about using the global pid with hardware.  But
pids are a core abstraction that aren't going anywhere.  As long as
hardware does not impose constraints on pids that software already does
not we should be fine.

Eric


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-16 Thread Eric W. Biederman
Russell King - ARM Linux  writes:

> On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote:
>> Hi Russell,
>> 
>> On 10 May 2018 at 02:40, Russell King - ARM Linux  
>> wrote:
>> > This does not leak information from other namespaces because of the
>> > uniqueness of the global PID.  However, what it does leak is the value
>> > of the global PID which is meaningless in the namespace.  So, before
>> > the event stream is delivered to userspace, this value needs to be
>> > re-written to the namespace's PID value.
>> 
>> Unfortunately that can't be done.  The trace stream is compressed and
>> needs to be decompressed using an external library.  I think the only
>> option is to return an error if a user is trying to use this feature
>> from a namespace.
>
> That sounds like a sensible approach, and that should get rid of the
> vpid stuff too.
>
> Eric, would this solve all your concerns?

It does, and I have given my ack to the respin.

I am moderately concerned about using the global pid with hardware.  But
pids are a core abstraction that aren't going anywhere.  As long as
hardware does not impose constraints on pids that software already does
not we should be fine.

Eric


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-10 Thread Russell King - ARM Linux
On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote:
> Hi Russell,
> 
> On 10 May 2018 at 02:40, Russell King - ARM Linux  
> wrote:
> > This does not leak information from other namespaces because of the
> > uniqueness of the global PID.  However, what it does leak is the value
> > of the global PID which is meaningless in the namespace.  So, before
> > the event stream is delivered to userspace, this value needs to be
> > re-written to the namespace's PID value.
> 
> Unfortunately that can't be done.  The trace stream is compressed and
> needs to be decompressed using an external library.  I think the only
> option is to return an error if a user is trying to use this feature
> from a namespace.

That sounds like a sensible approach, and that should get rid of the
vpid stuff too.

Eric, would this solve all your concerns?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-10 Thread Russell King - ARM Linux
On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote:
> Hi Russell,
> 
> On 10 May 2018 at 02:40, Russell King - ARM Linux  
> wrote:
> > This does not leak information from other namespaces because of the
> > uniqueness of the global PID.  However, what it does leak is the value
> > of the global PID which is meaningless in the namespace.  So, before
> > the event stream is delivered to userspace, this value needs to be
> > re-written to the namespace's PID value.
> 
> Unfortunately that can't be done.  The trace stream is compressed and
> needs to be decompressed using an external library.  I think the only
> option is to return an error if a user is trying to use this feature
> from a namespace.

That sounds like a sensible approach, and that should get rid of the
vpid stuff too.

Eric, would this solve all your concerns?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-10 Thread Mathieu Poirier
On 10 May 2018 at 02:40, Russell King - ARM Linux  wrote:
> On Wed, May 09, 2018 at 09:35:07PM -0500, Eric W. Biederman wrote:
>> Mathieu Poirier  writes:
>>
>> > On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
>> >> Kim Phillips  writes:
>> >>
>> >> > This patch is in the context of allowing the Coresight h/w
>> >> > trace driver suite to be loaded as modules.  Coresight uses
>> >> > find_task_by_vpid when running in direct capture mode (via sysfs)
>> >> > when getting/setting the context ID comparator to trigger on
>> >> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
>> >>
>> >> Aside from my objection about how bad an interface a pid in sysfs is.
>> >> The implementation of coresight_vpid_to_pid is horrible.
>> >>
>> >> The code should be just:
>> >>
>> >> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
>> >> {
>> >>rcu_read_lock();
>> >> pid = pid_nr(find_vpid(vpid));
>> >>rcu_read_unlock();
>> >>
>> >>return pid;
>> >> }
>> >> Which takes find_task_by_vpid out of the picture.
>> >
>> > Many thanks for pointing out the right way to do this.  When Chunyan added
>> > this feature she broadly published her work and find_task_by_vpid() is the
>> > function she was asked to used.
>>
>> Clearly no one was thinking through the implications of a sysfs file
>> which does not have pid namespace support on namespacing.  I am quite
>> upset at this mess of an API.  It is not a maintainable way to do things.
>>
>> >> But reading further I am seeing code writing a pid to hardware.  That is
>> >> broken.  That is a layering violation of the first order.  Giving
>> >> implementation details like that to hardware.
>> >
>> > This is how the feature works - as Robin pointed out tracers are designed 
>> > to
>> > match pid values with the CPU's contextID register.  The input value has no
>> > other effect than triggering trace collection, which has absolutely no 
>> > baring on
>> > the CPU.
>>
>> So please tell me how we make the tracer pid namespace aware.  Or is it
>> guaranteed that only the global root user will use this functionality?
>>
>> As you are taking a vpid it looks like users with lesser privileges are
>> able to request this.   From the other reply it appears this is the
>> value the tracer returns to put in logs.  Perhaps I missed it but I
>> didn't see anything that translated from the global pid to something
>> else.  Which would make using this feature in a pid namespace confusing
>> and a problematic information leak if I have understood what has been
>> said so far.
>

Hi Russell,

> Let's look to see what's placed into the context ID register - this is
> done by arch/arm/mm/context.c::contextidr_notifier():
>
> pid = task_pid_nr(thread->task) << ASID_BITS;
>
> This is documented in linux/sched.h as:
>
>  * task_xid_nr() : global id, i.e. the id seen from the init namespace;
>
> So, what ends up in the context ID register is the _global_ PID, not a
> namespace specific PID.  This means the hardware deals with global PID
> values.

Correct.

>
> It seems quite logical to use the global PID value for the hardware,
> because that is a globally unique value - especially as the hardware
> uses this for filtering events.  So asking for a namespace's pid 1
> gets mapped to the global pid value, which won't match some other
> namespace's pid 1.
>
> The problem comes _if_ the event stream delivered to userspace contains
> the global PID values and the event stream is being looked at from
> within a namespace.

Correct.

>
> This does not leak information from other namespaces because of the
> uniqueness of the global PID.  However, what it does leak is the value
> of the global PID which is meaningless in the namespace.  So, before
> the event stream is delivered to userspace, this value needs to be
> re-written to the namespace's PID value.

Unfortunately that can't be done.  The trace stream is compressed and
needs to be decompressed using an external library.  I think the only
option is to return an error if a user is trying to use this feature
from a namespace.

>
> Things get more yucky with this when you look at the ctxid_masks stuff
>  - which looks to me like it implements a mask on the PID value.  Masks
> on the pid value are irrelevant from within a namespace, because the
> mask is applied to the global PID value, not the namespace's PID value.
> You can't really define how a set of namespace PIDs will map to global
> PIDs, so masking the context ID PID value in the presence of namespaces
> is pretty useless - and potentially ends up being an information leak.

Also correct.  Since there is no point in trying to use contextID
tracing from a namespace (see above) there is also no point in trying
to apply a mask to the contextIDs.  Once again I think it is best to
return an error when using from a namespace.

Thanks,
Mathieu

>
> As for the sysfs file thing, I think the 

Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-10 Thread Mathieu Poirier
On 10 May 2018 at 02:40, Russell King - ARM Linux  wrote:
> On Wed, May 09, 2018 at 09:35:07PM -0500, Eric W. Biederman wrote:
>> Mathieu Poirier  writes:
>>
>> > On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
>> >> Kim Phillips  writes:
>> >>
>> >> > This patch is in the context of allowing the Coresight h/w
>> >> > trace driver suite to be loaded as modules.  Coresight uses
>> >> > find_task_by_vpid when running in direct capture mode (via sysfs)
>> >> > when getting/setting the context ID comparator to trigger on
>> >> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
>> >>
>> >> Aside from my objection about how bad an interface a pid in sysfs is.
>> >> The implementation of coresight_vpid_to_pid is horrible.
>> >>
>> >> The code should be just:
>> >>
>> >> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
>> >> {
>> >>rcu_read_lock();
>> >> pid = pid_nr(find_vpid(vpid));
>> >>rcu_read_unlock();
>> >>
>> >>return pid;
>> >> }
>> >> Which takes find_task_by_vpid out of the picture.
>> >
>> > Many thanks for pointing out the right way to do this.  When Chunyan added
>> > this feature she broadly published her work and find_task_by_vpid() is the
>> > function she was asked to used.
>>
>> Clearly no one was thinking through the implications of a sysfs file
>> which does not have pid namespace support on namespacing.  I am quite
>> upset at this mess of an API.  It is not a maintainable way to do things.
>>
>> >> But reading further I am seeing code writing a pid to hardware.  That is
>> >> broken.  That is a layering violation of the first order.  Giving
>> >> implementation details like that to hardware.
>> >
>> > This is how the feature works - as Robin pointed out tracers are designed 
>> > to
>> > match pid values with the CPU's contextID register.  The input value has no
>> > other effect than triggering trace collection, which has absolutely no 
>> > baring on
>> > the CPU.
>>
>> So please tell me how we make the tracer pid namespace aware.  Or is it
>> guaranteed that only the global root user will use this functionality?
>>
>> As you are taking a vpid it looks like users with lesser privileges are
>> able to request this.   From the other reply it appears this is the
>> value the tracer returns to put in logs.  Perhaps I missed it but I
>> didn't see anything that translated from the global pid to something
>> else.  Which would make using this feature in a pid namespace confusing
>> and a problematic information leak if I have understood what has been
>> said so far.
>

Hi Russell,

> Let's look to see what's placed into the context ID register - this is
> done by arch/arm/mm/context.c::contextidr_notifier():
>
> pid = task_pid_nr(thread->task) << ASID_BITS;
>
> This is documented in linux/sched.h as:
>
>  * task_xid_nr() : global id, i.e. the id seen from the init namespace;
>
> So, what ends up in the context ID register is the _global_ PID, not a
> namespace specific PID.  This means the hardware deals with global PID
> values.

Correct.

>
> It seems quite logical to use the global PID value for the hardware,
> because that is a globally unique value - especially as the hardware
> uses this for filtering events.  So asking for a namespace's pid 1
> gets mapped to the global pid value, which won't match some other
> namespace's pid 1.
>
> The problem comes _if_ the event stream delivered to userspace contains
> the global PID values and the event stream is being looked at from
> within a namespace.

Correct.

>
> This does not leak information from other namespaces because of the
> uniqueness of the global PID.  However, what it does leak is the value
> of the global PID which is meaningless in the namespace.  So, before
> the event stream is delivered to userspace, this value needs to be
> re-written to the namespace's PID value.

Unfortunately that can't be done.  The trace stream is compressed and
needs to be decompressed using an external library.  I think the only
option is to return an error if a user is trying to use this feature
from a namespace.

>
> Things get more yucky with this when you look at the ctxid_masks stuff
>  - which looks to me like it implements a mask on the PID value.  Masks
> on the pid value are irrelevant from within a namespace, because the
> mask is applied to the global PID value, not the namespace's PID value.
> You can't really define how a set of namespace PIDs will map to global
> PIDs, so masking the context ID PID value in the presence of namespaces
> is pretty useless - and potentially ends up being an information leak.

Also correct.  Since there is no point in trying to use contextID
tracing from a namespace (see above) there is also no point in trying
to apply a mask to the contextIDs.  Once again I think it is best to
return an error when using from a namespace.

Thanks,
Mathieu

>
> As for the sysfs file thing, I think the simple solution to that is
> the sysfs file should accept a PID value in 

Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-10 Thread Russell King - ARM Linux
On Wed, May 09, 2018 at 09:35:07PM -0500, Eric W. Biederman wrote:
> Mathieu Poirier  writes:
> 
> > On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
> >> Kim Phillips  writes:
> >> 
> >> > This patch is in the context of allowing the Coresight h/w
> >> > trace driver suite to be loaded as modules.  Coresight uses
> >> > find_task_by_vpid when running in direct capture mode (via sysfs)
> >> > when getting/setting the context ID comparator to trigger on
> >> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
> >> 
> >> Aside from my objection about how bad an interface a pid in sysfs is.
> >> The implementation of coresight_vpid_to_pid is horrible.
> >> 
> >> The code should be just:
> >> 
> >> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
> >> {
> >>rcu_read_lock();
> >> pid = pid_nr(find_vpid(vpid));
> >>rcu_read_unlock();
> >> 
> >>return pid;
> >> }
> >> Which takes find_task_by_vpid out of the picture.
> >
> > Many thanks for pointing out the right way to do this.  When Chunyan added
> > this feature she broadly published her work and find_task_by_vpid() is the
> > function she was asked to used.
> 
> Clearly no one was thinking through the implications of a sysfs file
> which does not have pid namespace support on namespacing.  I am quite
> upset at this mess of an API.  It is not a maintainable way to do things.
> 
> >> But reading further I am seeing code writing a pid to hardware.  That is
> >> broken.  That is a layering violation of the first order.  Giving
> >> implementation details like that to hardware.
> >
> > This is how the feature works - as Robin pointed out tracers are designed to
> > match pid values with the CPU's contextID register.  The input value has no
> > other effect than triggering trace collection, which has absolutely no 
> > baring on
> > the CPU.
> 
> So please tell me how we make the tracer pid namespace aware.  Or is it
> guaranteed that only the global root user will use this functionality?
> 
> As you are taking a vpid it looks like users with lesser privileges are
> able to request this.   From the other reply it appears this is the
> value the tracer returns to put in logs.  Perhaps I missed it but I
> didn't see anything that translated from the global pid to something
> else.  Which would make using this feature in a pid namespace confusing
> and a problematic information leak if I have understood what has been
> said so far.

Let's look to see what's placed into the context ID register - this is
done by arch/arm/mm/context.c::contextidr_notifier():

pid = task_pid_nr(thread->task) << ASID_BITS;

This is documented in linux/sched.h as:

 * task_xid_nr() : global id, i.e. the id seen from the init namespace;

So, what ends up in the context ID register is the _global_ PID, not a
namespace specific PID.  This means the hardware deals with global PID
values.

It seems quite logical to use the global PID value for the hardware,
because that is a globally unique value - especially as the hardware
uses this for filtering events.  So asking for a namespace's pid 1
gets mapped to the global pid value, which won't match some other
namespace's pid 1.

The problem comes _if_ the event stream delivered to userspace contains
the global PID values and the event stream is being looked at from
within a namespace.

This does not leak information from other namespaces because of the
uniqueness of the global PID.  However, what it does leak is the value
of the global PID which is meaningless in the namespace.  So, before
the event stream is delivered to userspace, this value needs to be
re-written to the namespace's PID value.

Things get more yucky with this when you look at the ctxid_masks stuff
 - which looks to me like it implements a mask on the PID value.  Masks
on the pid value are irrelevant from within a namespace, because the
mask is applied to the global PID value, not the namespace's PID value.
You can't really define how a set of namespace PIDs will map to global
PIDs, so masking the context ID PID value in the presence of namespaces
is pretty useless - and potentially ends up being an information leak.

As for the sysfs file thing, I think the simple solution to that is
the sysfs file should accept a PID value in the current namespace,
and translate that to the global namespace - and the global PID value
should be stored.  When reading, the global PID value should be
translated back to the current namespace, or an error/empty given if
the PID doesn't exist in that namespace.  The current solution to
store the vpid and simply return it irrespective of the namespace is
just nonsense.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-10 Thread Russell King - ARM Linux
On Wed, May 09, 2018 at 09:35:07PM -0500, Eric W. Biederman wrote:
> Mathieu Poirier  writes:
> 
> > On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
> >> Kim Phillips  writes:
> >> 
> >> > This patch is in the context of allowing the Coresight h/w
> >> > trace driver suite to be loaded as modules.  Coresight uses
> >> > find_task_by_vpid when running in direct capture mode (via sysfs)
> >> > when getting/setting the context ID comparator to trigger on
> >> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
> >> 
> >> Aside from my objection about how bad an interface a pid in sysfs is.
> >> The implementation of coresight_vpid_to_pid is horrible.
> >> 
> >> The code should be just:
> >> 
> >> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
> >> {
> >>rcu_read_lock();
> >> pid = pid_nr(find_vpid(vpid));
> >>rcu_read_unlock();
> >> 
> >>return pid;
> >> }
> >> Which takes find_task_by_vpid out of the picture.
> >
> > Many thanks for pointing out the right way to do this.  When Chunyan added
> > this feature she broadly published her work and find_task_by_vpid() is the
> > function she was asked to used.
> 
> Clearly no one was thinking through the implications of a sysfs file
> which does not have pid namespace support on namespacing.  I am quite
> upset at this mess of an API.  It is not a maintainable way to do things.
> 
> >> But reading further I am seeing code writing a pid to hardware.  That is
> >> broken.  That is a layering violation of the first order.  Giving
> >> implementation details like that to hardware.
> >
> > This is how the feature works - as Robin pointed out tracers are designed to
> > match pid values with the CPU's contextID register.  The input value has no
> > other effect than triggering trace collection, which has absolutely no 
> > baring on
> > the CPU.
> 
> So please tell me how we make the tracer pid namespace aware.  Or is it
> guaranteed that only the global root user will use this functionality?
> 
> As you are taking a vpid it looks like users with lesser privileges are
> able to request this.   From the other reply it appears this is the
> value the tracer returns to put in logs.  Perhaps I missed it but I
> didn't see anything that translated from the global pid to something
> else.  Which would make using this feature in a pid namespace confusing
> and a problematic information leak if I have understood what has been
> said so far.

Let's look to see what's placed into the context ID register - this is
done by arch/arm/mm/context.c::contextidr_notifier():

pid = task_pid_nr(thread->task) << ASID_BITS;

This is documented in linux/sched.h as:

 * task_xid_nr() : global id, i.e. the id seen from the init namespace;

So, what ends up in the context ID register is the _global_ PID, not a
namespace specific PID.  This means the hardware deals with global PID
values.

It seems quite logical to use the global PID value for the hardware,
because that is a globally unique value - especially as the hardware
uses this for filtering events.  So asking for a namespace's pid 1
gets mapped to the global pid value, which won't match some other
namespace's pid 1.

The problem comes _if_ the event stream delivered to userspace contains
the global PID values and the event stream is being looked at from
within a namespace.

This does not leak information from other namespaces because of the
uniqueness of the global PID.  However, what it does leak is the value
of the global PID which is meaningless in the namespace.  So, before
the event stream is delivered to userspace, this value needs to be
re-written to the namespace's PID value.

Things get more yucky with this when you look at the ctxid_masks stuff
 - which looks to me like it implements a mask on the PID value.  Masks
on the pid value are irrelevant from within a namespace, because the
mask is applied to the global PID value, not the namespace's PID value.
You can't really define how a set of namespace PIDs will map to global
PIDs, so masking the context ID PID value in the presence of namespaces
is pretty useless - and potentially ends up being an information leak.

As for the sysfs file thing, I think the simple solution to that is
the sysfs file should accept a PID value in the current namespace,
and translate that to the global namespace - and the global PID value
should be stored.  When reading, the global PID value should be
translated back to the current namespace, or an error/empty given if
the PID doesn't exist in that namespace.  The current solution to
store the vpid and simply return it irrespective of the namespace is
just nonsense.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-09 Thread Eric W. Biederman
Mathieu Poirier  writes:

> On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
>> Kim Phillips  writes:
>> 
>> > This patch is in the context of allowing the Coresight h/w
>> > trace driver suite to be loaded as modules.  Coresight uses
>> > find_task_by_vpid when running in direct capture mode (via sysfs)
>> > when getting/setting the context ID comparator to trigger on
>> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
>> 
>> Aside from my objection about how bad an interface a pid in sysfs is.
>> The implementation of coresight_vpid_to_pid is horrible.
>> 
>> The code should be just:
>> 
>> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
>> {
>>  rcu_read_lock();
>> pid = pid_nr(find_vpid(vpid));
>>  rcu_read_unlock();
>> 
>>  return pid;
>> }
>> Which takes find_task_by_vpid out of the picture.
>
> Many thanks for pointing out the right way to do this.  When Chunyan added
> this feature she broadly published her work and find_task_by_vpid() is the
> function she was asked to used.

Clearly no one was thinking through the implications of a sysfs file
which does not have pid namespace support on namespacing.  I am quite
upset at this mess of an API.  It is not a maintainable way to do things.

>> But reading further I am seeing code writing a pid to hardware.  That is
>> broken.  That is a layering violation of the first order.  Giving
>> implementation details like that to hardware.
>
> This is how the feature works - as Robin pointed out tracers are designed to
> match pid values with the CPU's contextID register.  The input value has no
> other effect than triggering trace collection, which has absolutely no baring 
> on
> the CPU.

So please tell me how we make the tracer pid namespace aware.  Or is it
guaranteed that only the global root user will use this functionality?

As you are taking a vpid it looks like users with lesser privileges are
able to request this.   From the other reply it appears this is the
value the tracer returns to put in logs.  Perhaps I missed it but I
didn't see anything that translated from the global pid to something
else.  Which would make using this feature in a pid namespace confusing
and a problematic information leak if I have understood what has been
said so far.

Eric



Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-09 Thread Eric W. Biederman
Mathieu Poirier  writes:

> On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
>> Kim Phillips  writes:
>> 
>> > This patch is in the context of allowing the Coresight h/w
>> > trace driver suite to be loaded as modules.  Coresight uses
>> > find_task_by_vpid when running in direct capture mode (via sysfs)
>> > when getting/setting the context ID comparator to trigger on
>> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
>> 
>> Aside from my objection about how bad an interface a pid in sysfs is.
>> The implementation of coresight_vpid_to_pid is horrible.
>> 
>> The code should be just:
>> 
>> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
>> {
>>  rcu_read_lock();
>> pid = pid_nr(find_vpid(vpid));
>>  rcu_read_unlock();
>> 
>>  return pid;
>> }
>> Which takes find_task_by_vpid out of the picture.
>
> Many thanks for pointing out the right way to do this.  When Chunyan added
> this feature she broadly published her work and find_task_by_vpid() is the
> function she was asked to used.

Clearly no one was thinking through the implications of a sysfs file
which does not have pid namespace support on namespacing.  I am quite
upset at this mess of an API.  It is not a maintainable way to do things.

>> But reading further I am seeing code writing a pid to hardware.  That is
>> broken.  That is a layering violation of the first order.  Giving
>> implementation details like that to hardware.
>
> This is how the feature works - as Robin pointed out tracers are designed to
> match pid values with the CPU's contextID register.  The input value has no
> other effect than triggering trace collection, which has absolutely no baring 
> on
> the CPU.

So please tell me how we make the tracer pid namespace aware.  Or is it
guaranteed that only the global root user will use this functionality?

As you are taking a vpid it looks like users with lesser privileges are
able to request this.   From the other reply it appears this is the
value the tracer returns to put in logs.  Perhaps I missed it but I
didn't see anything that translated from the global pid to something
else.  Which would make using this feature in a pid namespace confusing
and a problematic information leak if I have understood what has been
said so far.

Eric



Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-09 Thread Mathieu Poirier
On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
> Kim Phillips  writes:
> 
> > This patch is in the context of allowing the Coresight h/w
> > trace driver suite to be loaded as modules.  Coresight uses
> > find_task_by_vpid when running in direct capture mode (via sysfs)
> > when getting/setting the context ID comparator to trigger on
> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
> 
> Aside from my objection about how bad an interface a pid in sysfs is.
> The implementation of coresight_vpid_to_pid is horrible.
> 
> The code should be just:
> 
> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
> {
>   rcu_read_lock();
> pid = pid_nr(find_vpid(vpid));
>   rcu_read_unlock();
> 
>   return pid;
> }
> Which takes find_task_by_vpid out of the picture.

Many thanks for pointing out the right way to do this.  When Chunyan added
this feature she broadly published her work and find_task_by_vpid() is the
function she was asked to used.

> 
> But reading further I am seeing code writing a pid to hardware.  That is
> broken.  That is a layering violation of the first order.  Giving
> implementation details like that to hardware.

This is how the feature works - as Robin pointed out tracers are designed to
match pid values with the CPU's contextID register.  The input value has no
other effect than triggering trace collection, which has absolutely no baring on
the CPU.

> 
> Any chance while you are working on this you can modify this code so
> that it does something sensible and defensible instead of every line of
> code I read be wrong in at least one detail?
> 
> Thank you,
> Eric
> 


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-09 Thread Mathieu Poirier
On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote:
> Kim Phillips  writes:
> 
> > This patch is in the context of allowing the Coresight h/w
> > trace driver suite to be loaded as modules.  Coresight uses
> > find_task_by_vpid when running in direct capture mode (via sysfs)
> > when getting/setting the context ID comparator to trigger on
> > (/sys/bus/coresight/devices/.etm/ctxid_pid).
> 
> Aside from my objection about how bad an interface a pid in sysfs is.
> The implementation of coresight_vpid_to_pid is horrible.
> 
> The code should be just:
> 
> static inline pid_t coresight_vpid_to_pid(pid_t vpid)
> {
>   rcu_read_lock();
> pid = pid_nr(find_vpid(vpid));
>   rcu_read_unlock();
> 
>   return pid;
> }
> Which takes find_task_by_vpid out of the picture.

Many thanks for pointing out the right way to do this.  When Chunyan added
this feature she broadly published her work and find_task_by_vpid() is the
function she was asked to used.

> 
> But reading further I am seeing code writing a pid to hardware.  That is
> broken.  That is a layering violation of the first order.  Giving
> implementation details like that to hardware.

This is how the feature works - as Robin pointed out tracers are designed to
match pid values with the CPU's contextID register.  The input value has no
other effect than triggering trace collection, which has absolutely no baring on
the CPU.

> 
> Any chance while you are working on this you can modify this code so
> that it does something sensible and defensible instead of every line of
> code I read be wrong in at least one detail?
> 
> Thank you,
> Eric
> 


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-09 Thread Robin Murphy

Hi Eric,

On 09/05/18 05:59, Eric W. Biederman wrote:

Kim Phillips  writes:


This patch is in the context of allowing the Coresight h/w
trace driver suite to be loaded as modules.  Coresight uses
find_task_by_vpid when running in direct capture mode (via sysfs)
when getting/setting the context ID comparator to trigger on
(/sys/bus/coresight/devices/.etm/ctxid_pid).


Aside from my objection about how bad an interface a pid in sysfs is.
The implementation of coresight_vpid_to_pid is horrible.

The code should be just:

static inline pid_t coresight_vpid_to_pid(pid_t vpid)
{
rcu_read_lock();
 pid = pid_nr(find_vpid(vpid));
rcu_read_unlock();

return pid;
}
Which takes find_task_by_vpid out of the picture.

But reading further I am seeing code writing a pid to hardware.  That is
broken.  That is a layering violation of the first order.  Giving
implementation details like that to hardware.


Note that the value here is nothing more than a token - the CoreSight 
hardware doesn't actually *do* anything with it other than match it 
against the same value which we also stash in the CPU in much the same 
fashion - see CONFIG_PID_IN_CONTEXTIDR for, if you'll pardon the pun, 
context.


TL;DR: the CPU has a special register whose only purpose is to allow the 
OS help external debug tools identify the currently executing process, 
by writing some arbitrary identifier in there. The trace hardware can 
spit that identifier out into the trace stream whenever it changes, such 
that the user can see context switches easily. Newer trace hardware can 
also use it to actively filter what the capture at source, such that 
only the portions of interest are traced at all. We could in theory make 
up any old value, but as I understand it the PID is/was the most 
user-friendly and easily correlatable thing to hand, and it's now 
probably too well-established to reasonably change.


Robin.


Any chance while you are working on this you can modify this code so
that it does something sensible and defensible instead of every line of
code I read be wrong in at least one detail?

Thank you,
Eric



Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-09 Thread Robin Murphy

Hi Eric,

On 09/05/18 05:59, Eric W. Biederman wrote:

Kim Phillips  writes:


This patch is in the context of allowing the Coresight h/w
trace driver suite to be loaded as modules.  Coresight uses
find_task_by_vpid when running in direct capture mode (via sysfs)
when getting/setting the context ID comparator to trigger on
(/sys/bus/coresight/devices/.etm/ctxid_pid).


Aside from my objection about how bad an interface a pid in sysfs is.
The implementation of coresight_vpid_to_pid is horrible.

The code should be just:

static inline pid_t coresight_vpid_to_pid(pid_t vpid)
{
rcu_read_lock();
 pid = pid_nr(find_vpid(vpid));
rcu_read_unlock();

return pid;
}
Which takes find_task_by_vpid out of the picture.

But reading further I am seeing code writing a pid to hardware.  That is
broken.  That is a layering violation of the first order.  Giving
implementation details like that to hardware.


Note that the value here is nothing more than a token - the CoreSight 
hardware doesn't actually *do* anything with it other than match it 
against the same value which we also stash in the CPU in much the same 
fashion - see CONFIG_PID_IN_CONTEXTIDR for, if you'll pardon the pun, 
context.


TL;DR: the CPU has a special register whose only purpose is to allow the 
OS help external debug tools identify the currently executing process, 
by writing some arbitrary identifier in there. The trace hardware can 
spit that identifier out into the trace stream whenever it changes, such 
that the user can see context switches easily. Newer trace hardware can 
also use it to actively filter what the capture at source, such that 
only the portions of interest are traced at all. We could in theory make 
up any old value, but as I understand it the PID is/was the most 
user-friendly and easily correlatable thing to hand, and it's now 
probably too well-established to reasonably change.


Robin.


Any chance while you are working on this you can modify this code so
that it does something sensible and defensible instead of every line of
code I read be wrong in at least one detail?

Thank you,
Eric



Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-08 Thread Eric W. Biederman
Kim Phillips  writes:

> This patch is in the context of allowing the Coresight h/w
> trace driver suite to be loaded as modules.  Coresight uses
> find_task_by_vpid when running in direct capture mode (via sysfs)
> when getting/setting the context ID comparator to trigger on
> (/sys/bus/coresight/devices/.etm/ctxid_pid).

Aside from my objection about how bad an interface a pid in sysfs is.
The implementation of coresight_vpid_to_pid is horrible.

The code should be just:

static inline pid_t coresight_vpid_to_pid(pid_t vpid)
{
rcu_read_lock();
pid = pid_nr(find_vpid(vpid));
rcu_read_unlock();

return pid;
}
Which takes find_task_by_vpid out of the picture.

But reading further I am seeing code writing a pid to hardware.  That is
broken.  That is a layering violation of the first order.  Giving
implementation details like that to hardware.

Any chance while you are working on this you can modify this code so
that it does something sensible and defensible instead of every line of
code I read be wrong in at least one detail?

Thank you,
Eric



Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-08 Thread Eric W. Biederman
Kim Phillips  writes:

> This patch is in the context of allowing the Coresight h/w
> trace driver suite to be loaded as modules.  Coresight uses
> find_task_by_vpid when running in direct capture mode (via sysfs)
> when getting/setting the context ID comparator to trigger on
> (/sys/bus/coresight/devices/.etm/ctxid_pid).

Aside from my objection about how bad an interface a pid in sysfs is.
The implementation of coresight_vpid_to_pid is horrible.

The code should be just:

static inline pid_t coresight_vpid_to_pid(pid_t vpid)
{
rcu_read_lock();
pid = pid_nr(find_vpid(vpid));
rcu_read_unlock();

return pid;
}
Which takes find_task_by_vpid out of the picture.

But reading further I am seeing code writing a pid to hardware.  That is
broken.  That is a layering violation of the first order.  Giving
implementation details like that to hardware.

Any chance while you are working on this you can modify this code so
that it does something sensible and defensible instead of every line of
code I read be wrong in at least one detail?

Thank you,
Eric



Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-08 Thread Eric W. Biederman
Kim Phillips  writes:

> This patch is in the context of allowing the Coresight h/w
> trace driver suite to be loaded as modules.  Coresight uses
> find_task_by_vpid when running in direct capture mode (via sysfs)
> when getting/setting the context ID comparator to trigger on
> (/sys/bus/coresight/devices/.etm/ctxid_pid).

Nacked-by: "Eric W. Biederman" 

There is no way to implement a sysfs file that takes a pid correctly.
Don't do it.

Pids are tied to pid namespaces and sysfs deliberately does not do
anything remotely resembly a pid namespace.

Eric


> Cc: Mathieu Poirier 
> Cc: Andrew Morton 
> Cc: Oleg Nesterov 
> Cc: Gargi Sharma 
> Cc: Rik van Riel 
> Cc: Pavel Tatashin 
> Cc: Kefeng Wang 
> Cc: Kirill Tkhai 
> Cc: Mike Rapoport 
> Cc: David Howells 
> Cc: "Eric W. Biederman" 
> Signed-off-by: Kim Phillips 
> ---
> Current CoreSight callsite:
>
> https://lxr.missinglinkelectronics.com/linux/include/linux/coresight.h#L285
>
> A quick look didn't find anything, but if Coresight needs to do
> something differently, please comment.
>
>  kernel/pid.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 157fe4b19971..92b1b623f3e0 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -342,6 +342,7 @@ struct task_struct *find_task_by_vpid(pid_t vnr)
>  {
>   return find_task_by_pid_ns(vnr, task_active_pid_ns(current));
>  }
> +EXPORT_SYMBOL_GPL(find_task_by_vpid);
>  
>  struct task_struct *find_get_task_by_vpid(pid_t nr)
>  {


Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-08 Thread Eric W. Biederman
Kim Phillips  writes:

> This patch is in the context of allowing the Coresight h/w
> trace driver suite to be loaded as modules.  Coresight uses
> find_task_by_vpid when running in direct capture mode (via sysfs)
> when getting/setting the context ID comparator to trigger on
> (/sys/bus/coresight/devices/.etm/ctxid_pid).

Nacked-by: "Eric W. Biederman" 

There is no way to implement a sysfs file that takes a pid correctly.
Don't do it.

Pids are tied to pid namespaces and sysfs deliberately does not do
anything remotely resembly a pid namespace.

Eric


> Cc: Mathieu Poirier 
> Cc: Andrew Morton 
> Cc: Oleg Nesterov 
> Cc: Gargi Sharma 
> Cc: Rik van Riel 
> Cc: Pavel Tatashin 
> Cc: Kefeng Wang 
> Cc: Kirill Tkhai 
> Cc: Mike Rapoport 
> Cc: David Howells 
> Cc: "Eric W. Biederman" 
> Signed-off-by: Kim Phillips 
> ---
> Current CoreSight callsite:
>
> https://lxr.missinglinkelectronics.com/linux/include/linux/coresight.h#L285
>
> A quick look didn't find anything, but if Coresight needs to do
> something differently, please comment.
>
>  kernel/pid.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 157fe4b19971..92b1b623f3e0 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -342,6 +342,7 @@ struct task_struct *find_task_by_vpid(pid_t vnr)
>  {
>   return find_task_by_pid_ns(vnr, task_active_pid_ns(current));
>  }
> +EXPORT_SYMBOL_GPL(find_task_by_vpid);
>  
>  struct task_struct *find_get_task_by_vpid(pid_t nr)
>  {