Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-03 Thread Ingo Molnar

* Brown, Len  wrote:

> > This driver only knows how to handle counters, though.  I'm not sure
> > whether all of the MSRs that turbostat needs are counters.
> 
> turbostat --debug
> dumps a lot of configuration MSRs that are not counters.
>
> "--debug" is not an obscure option, it is the only way that the turbostat is 
> used by advanced users, since it shows not just how fast a system is running, 
> but explains why.

I see no problems there: the main property for perf is read-only access - so it 
will work fine with non-counting MSRs as well.

Really, for system-wide statistics perf is the way to go, it has rich kernel 
side 
and user side support. We already kind of expose MSRs via the RAPL energy PMU 
bits. All we need is to extend it to an enumerated list of MSRs.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-03 Thread Ingo Molnar

* Brown, Len len.br...@intel.com wrote:

  This driver only knows how to handle counters, though.  I'm not sure
  whether all of the MSRs that turbostat needs are counters.
 
 turbostat --debug
 dumps a lot of configuration MSRs that are not counters.

 --debug is not an obscure option, it is the only way that the turbostat is 
 used by advanced users, since it shows not just how fast a system is running, 
 but explains why.

I see no problems there: the main property for perf is read-only access - so it 
will work fine with non-counting MSRs as well.

Really, for system-wide statistics perf is the way to go, it has rich kernel 
side 
and user side support. We already kind of expose MSRs via the RAPL energy PMU 
bits. All we need is to extend it to an enumerated list of MSRs.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-02 Thread Andy Lutomirski
On Thu, Jul 2, 2015 at 12:22 PM, H. Peter Anvin  wrote:
> On 07/01/2015 09:38 AM, Brown, Len wrote:
>>
>> BTW. I've had a discussion w/ LLNL about their needs,
>> both for security and performance.  For security, as concluded
>> by this thread, a white list is the only way to go.
>> I'm thinking a bit-vector of allowed MSR offsets...
>> For performance, they absolutely can not afford a system call
>> for every single MSR access.  Here an ioctl to have the
>> msr driver perform a vector of accesses in a single system
>> call seems the way to go.  I can prototype both of these
>> using turbostat as the customer.
>>
>
> Every time I have heard about people having issues with performance for
> MSR access, it is because they are doing cross-CPU accesses which means
> a neverending stream of IPIs.  You get immensely better performance by
> tying a thread to a CPU and only accessing the local CPU from that
> thread.  This has addressed any performance problems anyone has ever
> come to me with.  As Andy and Ingo have already pointed out, the MSR
> access itself is pretty much as expensive as the system call overhead.

To be fair, before we had opportunistic sysret,
CONFIG_CONTEXT_TRACKING was *extremely* expensive.  Even now, it's
still pretty bad.

Len, do you know what configuration and kernel version this was on or
what the apparent syscall overhead was?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-02 Thread H. Peter Anvin
On 07/01/2015 09:38 AM, Brown, Len wrote:
> 
> BTW. I've had a discussion w/ LLNL about their needs,
> both for security and performance.  For security, as concluded
> by this thread, a white list is the only way to go.
> I'm thinking a bit-vector of allowed MSR offsets...
> For performance, they absolutely can not afford a system call
> for every single MSR access.  Here an ioctl to have the
> msr driver perform a vector of accesses in a single system
> call seems the way to go.  I can prototype both of these
> using turbostat as the customer.
> 

Every time I have heard about people having issues with performance for
MSR access, it is because they are doing cross-CPU accesses which means
a neverending stream of IPIs.  You get immensely better performance by
tying a thread to a CPU and only accessing the local CPU from that
thread.  This has addressed any performance problems anyone has ever
come to me with.  As Andy and Ingo have already pointed out, the MSR
access itself is pretty much as expensive as the system call overhead.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-02 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Wed, Jul 1, 2015 at 9:38 AM, Brown, Len  wrote:
>
> > BTW. I've had a discussion w/ LLNL about their needs, both for security and 
> > performance.  For security, as concluded by this thread, a white list is 
> > the 
> > only way to go. I'm thinking a bit-vector of allowed MSR offsets... For 
> > performance, they absolutely can not afford a system call for every single 
> > MSR 
> > access.
> 
> I'm surprised.  On a sane kernel, a syscall is about 120 cycles.  Just rdmsr 
> to 
> an unoptimized MSR is probably fifty cycles, I'd guess.

RDMSR to a non-fastpath MSR is more like a hundred cycles:

[  104.151166] x86/bench: ---
[  104.155350] x86/bench: | Running x86 benchmarks: |
[  104.159530] x86/bench: 
---
[  104.167604] x86/bench: | RDTSC-cycles:hot  (±noise) /   
cold  (±noise)
[  104.175870] x86/bench: 
---

Ancient box (10 years old):

   x86/bench: rdmsr : 36   /
 17  (±29.4%)
   x86/bench: wrmsr :198   /
245

AMD box (2 years old):
...
[  173.208130] x86/bench: rdmsr :121   /
169  (±18.9%)
[  174.633653] x86/bench: wrmsr :365   /
422  (± 9.2%)

Intel box (1 year old):
...
[  130.185195] x86/bench: rdmsr :100   /
112
[  131.263560] x86/bench: wrmsr :492   /
728  (±15.3%)

so the RDMSR cost got progressively worse as MSRs got farther and farther away 
from the core and microcode execution got progressively worse as well.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-02 Thread Ingo Molnar

* Andy Lutomirski l...@amacapital.net wrote:

 On Wed, Jul 1, 2015 at 9:38 AM, Brown, Len len.br...@intel.com wrote:

  BTW. I've had a discussion w/ LLNL about their needs, both for security and 
  performance.  For security, as concluded by this thread, a white list is 
  the 
  only way to go. I'm thinking a bit-vector of allowed MSR offsets... For 
  performance, they absolutely can not afford a system call for every single 
  MSR 
  access.
 
 I'm surprised.  On a sane kernel, a syscall is about 120 cycles.  Just rdmsr 
 to 
 an unoptimized MSR is probably fifty cycles, I'd guess.

RDMSR to a non-fastpath MSR is more like a hundred cycles:

[  104.151166] x86/bench: ---
[  104.155350] x86/bench: | Running x86 benchmarks: |
[  104.159530] x86/bench: 
---
[  104.167604] x86/bench: | RDTSC-cycles:hot  (±noise) /   
cold  (±noise)
[  104.175870] x86/bench: 
---

Ancient box (10 years old):

   x86/bench: rdmsr : 36   /
 17  (±29.4%)
   x86/bench: wrmsr :198   /
245

AMD box (2 years old):
...
[  173.208130] x86/bench: rdmsr :121   /
169  (±18.9%)
[  174.633653] x86/bench: wrmsr :365   /
422  (± 9.2%)

Intel box (1 year old):
...
[  130.185195] x86/bench: rdmsr :100   /
112
[  131.263560] x86/bench: wrmsr :492   /
728  (±15.3%)

so the RDMSR cost got progressively worse as MSRs got farther and farther away 
from the core and microcode execution got progressively worse as well.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-02 Thread H. Peter Anvin
On 07/01/2015 09:38 AM, Brown, Len wrote:
 
 BTW. I've had a discussion w/ LLNL about their needs,
 both for security and performance.  For security, as concluded
 by this thread, a white list is the only way to go.
 I'm thinking a bit-vector of allowed MSR offsets...
 For performance, they absolutely can not afford a system call
 for every single MSR access.  Here an ioctl to have the
 msr driver perform a vector of accesses in a single system
 call seems the way to go.  I can prototype both of these
 using turbostat as the customer.
 

Every time I have heard about people having issues with performance for
MSR access, it is because they are doing cross-CPU accesses which means
a neverending stream of IPIs.  You get immensely better performance by
tying a thread to a CPU and only accessing the local CPU from that
thread.  This has addressed any performance problems anyone has ever
come to me with.  As Andy and Ingo have already pointed out, the MSR
access itself is pretty much as expensive as the system call overhead.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-02 Thread Andy Lutomirski
On Thu, Jul 2, 2015 at 12:22 PM, H. Peter Anvin h...@zytor.com wrote:
 On 07/01/2015 09:38 AM, Brown, Len wrote:

 BTW. I've had a discussion w/ LLNL about their needs,
 both for security and performance.  For security, as concluded
 by this thread, a white list is the only way to go.
 I'm thinking a bit-vector of allowed MSR offsets...
 For performance, they absolutely can not afford a system call
 for every single MSR access.  Here an ioctl to have the
 msr driver perform a vector of accesses in a single system
 call seems the way to go.  I can prototype both of these
 using turbostat as the customer.


 Every time I have heard about people having issues with performance for
 MSR access, it is because they are doing cross-CPU accesses which means
 a neverending stream of IPIs.  You get immensely better performance by
 tying a thread to a CPU and only accessing the local CPU from that
 thread.  This has addressed any performance problems anyone has ever
 come to me with.  As Andy and Ingo have already pointed out, the MSR
 access itself is pretty much as expensive as the system call overhead.

To be fair, before we had opportunistic sysret,
CONFIG_CONTEXT_TRACKING was *extremely* expensive.  Even now, it's
still pretty bad.

Len, do you know what configuration and kernel version this was on or
what the apparent syscall overhead was?

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 9:38 AM, Brown, Len  wrote:
> BTW. I've had a discussion w/ LLNL about their needs,
> both for security and performance.  For security, as concluded
> by this thread, a white list is the only way to go.
> I'm thinking a bit-vector of allowed MSR offsets...
> For performance, they absolutely can not afford a system call
> for every single MSR access.

I'm surprised.  On a sane kernel, a syscall is about 120 cycles.  Just
rdmsr to an unoptimized MSR is probably fifty cycles, I'd guess.

Of course, LLNL is probably using NOHZ_FULL, which is currently very,
very slow.  Work is afoot to fix that.

>  Here an ioctl to have the
> msr driver perform a vector of accesses in a single system
> call seems the way to go.  I can prototype both of these
> using turbostat as the customer.

How about preadv?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-01 Thread Brown, Len
> This driver only knows how to handle counters, though.  I'm not sure
> whether all of the MSRs that turbostat needs are counters.

turbostat --debug
dumps a lot of configuration MSRs that are not counters.

"--debug" is not an obscure option, it is the only way that
the turbostat is used by advanced users, since it shows not just
how fast a system is running, but explains why.

turbostat -M or -C 'address'
will dump an arbitrary MSR at offset 'address' in hex or counter format.

This allows somebody to use yesterday's turbostat application
to dump an MSR that didn't exist when the application was written.
(and since the MSR driver doesn't have specific MSR addresses
 hard-coded into it, that is permitted)

> I knew that turbostat only did MSR reads

FYI, There have been proposals for turbostat to do MSR writes,
and I've resisted them because I like that multiple turbostats
can run at different intervals and even different users,
and not interfere with each other.  One of the proposals
was due to a short counter that tends to over-flow.  That,
I think would be better done in a central place in the kernel,
though it shouldn't poll unless it is actually being used.
The other is to be able to fix configuration bits that
are recognized to be incorrect or non-optimal. 

BTW. I've had a discussion w/ LLNL about their needs,
both for security and performance.  For security, as concluded
by this thread, a white list is the only way to go.
I'm thinking a bit-vector of allowed MSR offsets...
For performance, they absolutely can not afford a system call
for every single MSR access.  Here an ioctl to have the
msr driver perform a vector of accesses in a single system
call seems the way to go.  I can prototype both of these
using turbostat as the customer.

-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-01 Thread Brown, Len
 This driver only knows how to handle counters, though.  I'm not sure
 whether all of the MSRs that turbostat needs are counters.

turbostat --debug
dumps a lot of configuration MSRs that are not counters.

--debug is not an obscure option, it is the only way that
the turbostat is used by advanced users, since it shows not just
how fast a system is running, but explains why.

turbostat -M or -C 'address'
will dump an arbitrary MSR at offset 'address' in hex or counter format.

This allows somebody to use yesterday's turbostat application
to dump an MSR that didn't exist when the application was written.
(and since the MSR driver doesn't have specific MSR addresses
 hard-coded into it, that is permitted)

 I knew that turbostat only did MSR reads

FYI, There have been proposals for turbostat to do MSR writes,
and I've resisted them because I like that multiple turbostats
can run at different intervals and even different users,
and not interfere with each other.  One of the proposals
was due to a short counter that tends to over-flow.  That,
I think would be better done in a central place in the kernel,
though it shouldn't poll unless it is actually being used.
The other is to be able to fix configuration bits that
are recognized to be incorrect or non-optimal. 

BTW. I've had a discussion w/ LLNL about their needs,
both for security and performance.  For security, as concluded
by this thread, a white list is the only way to go.
I'm thinking a bit-vector of allowed MSR offsets...
For performance, they absolutely can not afford a system call
for every single MSR access.  Here an ioctl to have the
msr driver perform a vector of accesses in a single system
call seems the way to go.  I can prototype both of these
using turbostat as the customer.

-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 9:38 AM, Brown, Len len.br...@intel.com wrote:
 BTW. I've had a discussion w/ LLNL about their needs,
 both for security and performance.  For security, as concluded
 by this thread, a white list is the only way to go.
 I'm thinking a bit-vector of allowed MSR offsets...
 For performance, they absolutely can not afford a system call
 for every single MSR access.

I'm surprised.  On a sane kernel, a syscall is about 120 cycles.  Just
rdmsr to an unoptimized MSR is probably fifty cycles, I'd guess.

Of course, LLNL is probably using NOHZ_FULL, which is currently very,
very slow.  Work is afoot to fix that.

  Here an ioctl to have the
 msr driver perform a vector of accesses in a single system
 call seems the way to go.  I can prototype both of these
 using turbostat as the customer.

How about preadv?

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Prarit Bhargava


On 06/30/2015 08:44 AM, Peter Zijlstra wrote:
> On Tue, Jun 30, 2015 at 08:20:55AM -0400, Prarit Bhargava wrote:
>> it seems like visiting changes on each of these packages (and the other
>> packages that I'm sure I've missed) will be moderately difficult.
>>
>> Thoughts?
> 
> Start by changing the ones users want to run most and leave the rest
> requiring root privs until someone has the time to convert them.
> 
> Its not like we can remove the msr driver any time soon anyway.
> 

Yes, I agree.

> So I would suggest starting with the perf MSR driver thingy for all
> those MSRs that count things and see if you can convert say
> turbostat/cpufrequtils/powertop over to that.

Yep, that was my plan -- I'm going to do the in-tree utilities first then look
at the external programs.

> 
> I suspect there's MSR that are useful to expose but are not counting,
> I'm not sure perf is the right interface for those.
> 
> Making an inventory on which MSRs are required by these tools and what
> kind of data they provide might give a good idea on how to continue.
> 
> If most of these tools only use counting MSRs that can be serviced with
> the perf-msr driver then that would be great.
> 

Thanks Peter!

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Tue, Jun 30, 2015 at 08:20:55AM -0400, Prarit Bhargava wrote:
> > it seems like visiting changes on each of these packages (and the other
> > packages that I'm sure I've missed) will be moderately difficult.
> > 
> > Thoughts?
> 
> Start by changing the ones users want to run most and leave the rest
> requiring root privs until someone has the time to convert them.

Yes, start by just converting the ones used by a single tool, say turbostat, 
and 
then convert turbostat (while keeping /dev/msr fall-back code as well) just to 
see 
what the tooling fallout is.

> Its not like we can remove the msr driver any time soon anyway.

Yes.

> So I would suggest starting with the perf MSR driver thingy for all
> those MSRs that count things and see if you can convert say
> turbostat/cpufrequtils/powertop over to that.

One of those tools would be enough, to keep the complexity of the initial 
submission low - and to allow a change of plans if necessary.

> I suspect there's MSR that are useful to expose but are not counting, I'm not 
> sure perf is the right interface for those.
>
> Making an inventory on which MSRs are required by these tools and what kind 
> of 
> data they provide might give a good idea on how to continue.
> 
> If most of these tools only use counting MSRs that can be serviced with the 
> perf-msr driver then that would be great.

Fully agreed!

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Peter Zijlstra
On Tue, Jun 30, 2015 at 08:20:55AM -0400, Prarit Bhargava wrote:
> it seems like visiting changes on each of these packages (and the other
> packages that I'm sure I've missed) will be moderately difficult.
> 
> Thoughts?

Start by changing the ones users want to run most and leave the rest
requiring root privs until someone has the time to convert them.

Its not like we can remove the msr driver any time soon anyway.

So I would suggest starting with the perf MSR driver thingy for all
those MSRs that count things and see if you can convert say
turbostat/cpufrequtils/powertop over to that.

I suspect there's MSR that are useful to expose but are not counting,
I'm not sure perf is the right interface for those.

Making an inventory on which MSRs are required by these tools and what
kind of data they provide might give a good idea on how to continue.

If most of these tools only use counting MSRs that can be serviced with
the perf-msr driver then that would be great.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Prarit Bhargava

The MSR exposure seems to be okay with the following statements:

- complete read of /dev/cpu/X/msr is bad, whitelist instead
- needs to be dependent on either CPU version or reading MSRs for support.
  IIRC the Intel documentaton on the MSRs indicated that there are ways to
  check to see if a particular MSR is supported by a processor.  That doesn't
  seem difficult to do IMO.

Here are the options that have been discussed in this thread, as well as a
third that I'm proposing:

1.  Andy's PMU driver suggestion (eventually exposed via sysfs)
2.  Standalone driver (LLNL) which exposes values in sysfs (one value per
sysfs file)
3.  Make /dev/cpu/X/msr readable for those addresses in the whitelist.
ie) Allow read access to address for IA32_MPERF, etc.

I find exposing MSRs via sysfs difficult to maintain as we move forward.
I suppose we could name them according to their MSR names in the Intel
documentation, however from a userspace point of view I still find it
cumbersome to code that way.  Doing this in the existing /dev/ space has the
benefit that we wouldn't have to change any userspace code. For
those users who did (in some crazy situation) want full read access, they can
still do 'setcap' on that particular executable.

Using Henrique's list of packages that use /dev/cpu/X/msr,

* cpufrequtils
* flashrom
* i7z
* intel-gpu-tools
* inteltool
* mcelog
* msrtool, msr-tools
* PAPI (can use msr_safe)
* powertop
* qemu
* slurm-llnl  (maybe this can also use msr_safe?)
* stressapptest
* turbostat
* wmlongrun, longrun
* x86info
* xserver-xorg-video-geode

it seems like visiting changes on each of these packages (and the other
packages that I'm sure I've missed) will be moderately difficult.

Thoughts?

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Prarit Bhargava

The MSR exposure seems to be okay with the following statements:

- complete read of /dev/cpu/X/msr is bad, whitelist instead
- needs to be dependent on either CPU version or reading MSRs for support.
  IIRC the Intel documentaton on the MSRs indicated that there are ways to
  check to see if a particular MSR is supported by a processor.  That doesn't
  seem difficult to do IMO.

Here are the options that have been discussed in this thread, as well as a
third that I'm proposing:

1.  Andy's PMU driver suggestion (eventually exposed via sysfs)
2.  Standalone driver (LLNL) which exposes values in sysfs (one value per
sysfs file)
3.  Make /dev/cpu/X/msr readable for those addresses in the whitelist.
ie) Allow read access to address for IA32_MPERF, etc.

I find exposing MSRs via sysfs difficult to maintain as we move forward.
I suppose we could name them according to their MSR names in the Intel
documentation, however from a userspace point of view I still find it
cumbersome to code that way.  Doing this in the existing /dev/ space has the
benefit that we wouldn't have to change any userspace code. For
those users who did (in some crazy situation) want full read access, they can
still do 'setcap' on that particular executable.

Using Henrique's list of packages that use /dev/cpu/X/msr,

* cpufrequtils
* flashrom
* i7z
* intel-gpu-tools
* inteltool
* mcelog
* msrtool, msr-tools
* PAPI (can use msr_safe)
* powertop
* qemu
* slurm-llnl  (maybe this can also use msr_safe?)
* stressapptest
* turbostat
* wmlongrun, longrun
* x86info
* xserver-xorg-video-geode

it seems like visiting changes on each of these packages (and the other
packages that I'm sure I've missed) will be moderately difficult.

Thoughts?

P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Ingo Molnar

* Peter Zijlstra pet...@infradead.org wrote:

 On Tue, Jun 30, 2015 at 08:20:55AM -0400, Prarit Bhargava wrote:
  it seems like visiting changes on each of these packages (and the other
  packages that I'm sure I've missed) will be moderately difficult.
  
  Thoughts?
 
 Start by changing the ones users want to run most and leave the rest
 requiring root privs until someone has the time to convert them.

Yes, start by just converting the ones used by a single tool, say turbostat, 
and 
then convert turbostat (while keeping /dev/msr fall-back code as well) just to 
see 
what the tooling fallout is.

 Its not like we can remove the msr driver any time soon anyway.

Yes.

 So I would suggest starting with the perf MSR driver thingy for all
 those MSRs that count things and see if you can convert say
 turbostat/cpufrequtils/powertop over to that.

One of those tools would be enough, to keep the complexity of the initial 
submission low - and to allow a change of plans if necessary.

 I suspect there's MSR that are useful to expose but are not counting, I'm not 
 sure perf is the right interface for those.

 Making an inventory on which MSRs are required by these tools and what kind 
 of 
 data they provide might give a good idea on how to continue.
 
 If most of these tools only use counting MSRs that can be serviced with the 
 perf-msr driver then that would be great.

Fully agreed!

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Peter Zijlstra
On Tue, Jun 30, 2015 at 08:20:55AM -0400, Prarit Bhargava wrote:
 it seems like visiting changes on each of these packages (and the other
 packages that I'm sure I've missed) will be moderately difficult.
 
 Thoughts?

Start by changing the ones users want to run most and leave the rest
requiring root privs until someone has the time to convert them.

Its not like we can remove the msr driver any time soon anyway.

So I would suggest starting with the perf MSR driver thingy for all
those MSRs that count things and see if you can convert say
turbostat/cpufrequtils/powertop over to that.

I suspect there's MSR that are useful to expose but are not counting,
I'm not sure perf is the right interface for those.

Making an inventory on which MSRs are required by these tools and what
kind of data they provide might give a good idea on how to continue.

If most of these tools only use counting MSRs that can be serviced with
the perf-msr driver then that would be great.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-30 Thread Prarit Bhargava


On 06/30/2015 08:44 AM, Peter Zijlstra wrote:
 On Tue, Jun 30, 2015 at 08:20:55AM -0400, Prarit Bhargava wrote:
 it seems like visiting changes on each of these packages (and the other
 packages that I'm sure I've missed) will be moderately difficult.

 Thoughts?
 
 Start by changing the ones users want to run most and leave the rest
 requiring root privs until someone has the time to convert them.
 
 Its not like we can remove the msr driver any time soon anyway.
 

Yes, I agree.

 So I would suggest starting with the perf MSR driver thingy for all
 those MSRs that count things and see if you can convert say
 turbostat/cpufrequtils/powertop over to that.

Yep, that was my plan -- I'm going to do the in-tree utilities first then look
at the external programs.

 
 I suspect there's MSR that are useful to expose but are not counting,
 I'm not sure perf is the right interface for those.
 
 Making an inventory on which MSRs are required by these tools and what
 kind of data they provide might give a good idea on how to continue.
 
 If most of these tools only use counting MSRs that can be serviced with
 the perf-msr driver then that would be great.
 

Thanks Peter!

P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-29 Thread H. Peter Anvin
On 06/28/2015 07:34 AM, Prarit Bhargava wrote:
>>
>> Seriously, though, it would be straightforward to make it handle a
>> more general list, complete with non-architectural stuff (such as the
>> upcoming PPERF in Skylake).
> 
> Is it easier to blacklist MSRs we don't want generally exposed, or only expose
> the ones that we think are safe?  That's sort of a devil's advocate sort of
> question ;) and I'm wondering what the shorter list is.
> 

The second is the only option.  Blacklisting MSRs is not safe, as you
have no idea what new MSRs might be introduced.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-29 Thread Matt Fleming
On Sun, 28 Jun, at 12:10:49PM, Henrique de Moraes Holschuh wrote:
> On Sun, 28 Jun 2015, Prarit Bhargava wrote:
> > Is it easier to blacklist MSRs we don't want generally exposed, or only 
> > expose
> > the ones that we think are safe?  That's sort of a devil's advocate sort of
> > question ;) and I'm wondering what the shorter list is.
> 
> The only way to make MSR access safe is to allow it only by whitelisting.
> The x86 platform restricts all MSR access to ring 0 for a damn good reason.

Blacklisting also breaks horribly if you run old kernels on new
hardware.

We need to "fail-closed" if someone tries to access an MSR the kernel
doesn't know about.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-29 Thread Ingo Molnar

* Henrique de Moraes Holschuh  wrote:

> On Sun, 28 Jun 2015, Prarit Bhargava wrote:
>
> > Is it easier to blacklist MSRs we don't want generally exposed, or only 
> > expose 
> > the ones that we think are safe?  That's sort of a devil's advocate sort of 
> > question ;) and I'm wondering what the shorter list is.
> 
> The only way to make MSR access safe is to allow it only by whitelisting. The 
> x86 platform restricts all MSR access to ring 0 for a damn good reason.

Exactly.

We also want to document them along the way: just exposing all doesn't achieve 
that.

> Also, such a whitelist would most likely need to be vendor and model-aware, 
> and 
> to differentiate "allow reads" from "allow writes"...

Initially it should only allow reads - which I believe fully meets turbostat's 
needs.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-29 Thread Matt Fleming
On Sun, 28 Jun, at 12:10:49PM, Henrique de Moraes Holschuh wrote:
 On Sun, 28 Jun 2015, Prarit Bhargava wrote:
  Is it easier to blacklist MSRs we don't want generally exposed, or only 
  expose
  the ones that we think are safe?  That's sort of a devil's advocate sort of
  question ;) and I'm wondering what the shorter list is.
 
 The only way to make MSR access safe is to allow it only by whitelisting.
 The x86 platform restricts all MSR access to ring 0 for a damn good reason.

Blacklisting also breaks horribly if you run old kernels on new
hardware.

We need to fail-closed if someone tries to access an MSR the kernel
doesn't know about.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-29 Thread H. Peter Anvin
On 06/28/2015 07:34 AM, Prarit Bhargava wrote:

 Seriously, though, it would be straightforward to make it handle a
 more general list, complete with non-architectural stuff (such as the
 upcoming PPERF in Skylake).
 
 Is it easier to blacklist MSRs we don't want generally exposed, or only expose
 the ones that we think are safe?  That's sort of a devil's advocate sort of
 question ;) and I'm wondering what the shorter list is.
 

The second is the only option.  Blacklisting MSRs is not safe, as you
have no idea what new MSRs might be introduced.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-29 Thread Ingo Molnar

* Henrique de Moraes Holschuh h...@hmh.eng.br wrote:

 On Sun, 28 Jun 2015, Prarit Bhargava wrote:

  Is it easier to blacklist MSRs we don't want generally exposed, or only 
  expose 
  the ones that we think are safe?  That's sort of a devil's advocate sort of 
  question ;) and I'm wondering what the shorter list is.
 
 The only way to make MSR access safe is to allow it only by whitelisting. The 
 x86 platform restricts all MSR access to ring 0 for a damn good reason.

Exactly.

We also want to document them along the way: just exposing all doesn't achieve 
that.

 Also, such a whitelist would most likely need to be vendor and model-aware, 
 and 
 to differentiate allow reads from allow writes...

Initially it should only allow reads - which I believe fully meets turbostat's 
needs.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-28 Thread Henrique de Moraes Holschuh
On Fri, 26 Jun 2015, Prarit Bhargava wrote:
> > The proper way to do this is to write a driver to only expose the MSRs
> > that the user tools need, and nothing else.
> 
> Will do -- At least I got everyone's attention with this :).

IMHO we need both a new driver that exposes semanthic functionality instead
of raw MSRs, and also to lock down the current MSR driver using processor
vendor, family and model-aware whitelists.

We have absolutely no idea of the real security impact of the CONFIG_X86_MSR
/dev/cpu/cpu#/msr driver, as that driver allows CAP_SYS_RAWIO userspace to
have complete access to all documented *and undocumented* MSRs.

Maybe we could build on the ideas and data already colleced by the msr-safe
LLNS code?

https://github.com/scalability-llnl/msr-safe
http://www.vi-hps.org/upload/program/espt-sc14/vi-hps-ESPT14-Shoga.pdf

At the very least, their work on building a list of safe MSRs should help...
Their code seems to be licensed under the GPLv3, which is a rather strange
choice of license for a kernel module.


A quick look using Debian's codesearch found these users of
/dev/cpu/cpu#/msr:

* cpufrequtils
* flashrom
* i7z
* intel-gpu-tools
* inteltool
* mcelog
* msrtool, msr-tools
* PAPI (can use msr_safe)
* powertop
* qemu
* slurm-llnl  (maybe this can also use msr_safe?)
* stressapptest
* turbostat
* wmlongrun, longrun
* x86info
* xserver-xorg-video-geode

As well as the other stuff that ships from the Linux kernel tree.

Looking at what they need the MSR access for (well, except for msrtools,
which is just a way for shell scripts to easily talk to the MSR driver)
might help define the problem space better.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-28 Thread Henrique de Moraes Holschuh
On Sun, 28 Jun 2015, Prarit Bhargava wrote:
> Is it easier to blacklist MSRs we don't want generally exposed, or only expose
> the ones that we think are safe?  That's sort of a devil's advocate sort of
> question ;) and I'm wondering what the shorter list is.

The only way to make MSR access safe is to allow it only by whitelisting.
The x86 platform restricts all MSR access to ring 0 for a damn good reason.

Also, such a whitelist would most likely need to be vendor and model-aware,
and to differentiate "allow reads" from "allow writes"...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-28 Thread Prarit Bhargava


On 06/27/2015 11:52 AM, Andy Lutomirski wrote:
> On Sat, Jun 27, 2015 at 1:39 AM, Ingo Molnar  wrote:
>>
>> * Ingo Molnar  wrote:
>>
>>> So what's wrong with exposing them as a simplified PMU driver?
>>>
>>> That way we only expose the ones we want to - plus tooling can use all the 
>>> rich
>>> perf features that can be used around this. (sampling, counting, call 
>>> chains,
>>> etc.)
>>
>> See below code from Andy that exposes a single MSR via perf. At the core of 
>> the
>> PMU driver is a single rdmsrl():
>>
>> +static void aperfmperf_event_start(struct perf_event *event, int flags)
>> +{
>> +   u64 now;
>> +
>> +   rdmsrl(event->hw.event_base, now);
>> +   local64_set(>hw.prev_count, now);
>> +}
>>

I just sat down to do something similar to what Andy had proposed here :).

>> Now I think what we really want is to expose not a single MSR but multiple 
>> MSRs in
>> a single driver, i.e. don't have one PMU driver per MSR, but have a driver 
>> that
>> allows the exposure of select MSRs as counters.
> 
> I'm way ahead of you: this driver can expose *two* MSRs as counters :)


> 
> Seriously, though, it would be straightforward to make it handle a
> more general list, complete with non-architectural stuff (such as the
> upcoming PPERF in Skylake).

Is it easier to blacklist MSRs we don't want generally exposed, or only expose
the ones that we think are safe?  That's sort of a devil's advocate sort of
question ;) and I'm wondering what the shorter list is.

> 
> This driver only knows how to handle counters, though.  I'm not sure
> whether all of the MSRs that turbostat needs are counters.

I knew that turbostat only did MSR reads and that's why turbostat's code was
changed in this patch.  TBH I'm more concerned for software that monitors system
power consumption, performance, and load.

I'll take what Andy has proposed and expand on it.

P.

> 
> --Andy
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-28 Thread Henrique de Moraes Holschuh
On Sun, 28 Jun 2015, Prarit Bhargava wrote:
 Is it easier to blacklist MSRs we don't want generally exposed, or only expose
 the ones that we think are safe?  That's sort of a devil's advocate sort of
 question ;) and I'm wondering what the shorter list is.

The only way to make MSR access safe is to allow it only by whitelisting.
The x86 platform restricts all MSR access to ring 0 for a damn good reason.

Also, such a whitelist would most likely need to be vendor and model-aware,
and to differentiate allow reads from allow writes...

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-28 Thread Prarit Bhargava


On 06/27/2015 11:52 AM, Andy Lutomirski wrote:
 On Sat, Jun 27, 2015 at 1:39 AM, Ingo Molnar mi...@kernel.org wrote:

 * Ingo Molnar mi...@kernel.org wrote:

 So what's wrong with exposing them as a simplified PMU driver?

 That way we only expose the ones we want to - plus tooling can use all the 
 rich
 perf features that can be used around this. (sampling, counting, call 
 chains,
 etc.)

 See below code from Andy that exposes a single MSR via perf. At the core of 
 the
 PMU driver is a single rdmsrl():

 +static void aperfmperf_event_start(struct perf_event *event, int flags)
 +{
 +   u64 now;
 +
 +   rdmsrl(event-hw.event_base, now);
 +   local64_set(event-hw.prev_count, now);
 +}


I just sat down to do something similar to what Andy had proposed here :).

 Now I think what we really want is to expose not a single MSR but multiple 
 MSRs in
 a single driver, i.e. don't have one PMU driver per MSR, but have a driver 
 that
 allows the exposure of select MSRs as counters.
 
 I'm way ahead of you: this driver can expose *two* MSRs as counters :)


 
 Seriously, though, it would be straightforward to make it handle a
 more general list, complete with non-architectural stuff (such as the
 upcoming PPERF in Skylake).

Is it easier to blacklist MSRs we don't want generally exposed, or only expose
the ones that we think are safe?  That's sort of a devil's advocate sort of
question ;) and I'm wondering what the shorter list is.

 
 This driver only knows how to handle counters, though.  I'm not sure
 whether all of the MSRs that turbostat needs are counters.

I knew that turbostat only did MSR reads and that's why turbostat's code was
changed in this patch.  TBH I'm more concerned for software that monitors system
power consumption, performance, and load.

I'll take what Andy has proposed and expand on it.

P.

 
 --Andy
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-28 Thread Henrique de Moraes Holschuh
On Fri, 26 Jun 2015, Prarit Bhargava wrote:
  The proper way to do this is to write a driver to only expose the MSRs
  that the user tools need, and nothing else.
 
 Will do -- At least I got everyone's attention with this :).

IMHO we need both a new driver that exposes semanthic functionality instead
of raw MSRs, and also to lock down the current MSR driver using processor
vendor, family and model-aware whitelists.

We have absolutely no idea of the real security impact of the CONFIG_X86_MSR
/dev/cpu/cpu#/msr driver, as that driver allows CAP_SYS_RAWIO userspace to
have complete access to all documented *and undocumented* MSRs.

Maybe we could build on the ideas and data already colleced by the msr-safe
LLNS code?

https://github.com/scalability-llnl/msr-safe
http://www.vi-hps.org/upload/program/espt-sc14/vi-hps-ESPT14-Shoga.pdf

At the very least, their work on building a list of safe MSRs should help...
Their code seems to be licensed under the GPLv3, which is a rather strange
choice of license for a kernel module.


A quick look using Debian's codesearch found these users of
/dev/cpu/cpu#/msr:

* cpufrequtils
* flashrom
* i7z
* intel-gpu-tools
* inteltool
* mcelog
* msrtool, msr-tools
* PAPI (can use msr_safe)
* powertop
* qemu
* slurm-llnl  (maybe this can also use msr_safe?)
* stressapptest
* turbostat
* wmlongrun, longrun
* x86info
* xserver-xorg-video-geode

As well as the other stuff that ships from the Linux kernel tree.

Looking at what they need the MSR access for (well, except for msrtools,
which is just a way for shell scripts to easily talk to the MSR driver)
might help define the problem space better.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-27 Thread Andy Lutomirski
On Sat, Jun 27, 2015 at 1:39 AM, Ingo Molnar  wrote:
>
> * Ingo Molnar  wrote:
>
>> So what's wrong with exposing them as a simplified PMU driver?
>>
>> That way we only expose the ones we want to - plus tooling can use all the 
>> rich
>> perf features that can be used around this. (sampling, counting, call chains,
>> etc.)
>
> See below code from Andy that exposes a single MSR via perf. At the core of 
> the
> PMU driver is a single rdmsrl():
>
> +static void aperfmperf_event_start(struct perf_event *event, int flags)
> +{
> +   u64 now;
> +
> +   rdmsrl(event->hw.event_base, now);
> +   local64_set(>hw.prev_count, now);
> +}
>
> Now I think what we really want is to expose not a single MSR but multiple 
> MSRs in
> a single driver, i.e. don't have one PMU driver per MSR, but have a driver 
> that
> allows the exposure of select MSRs as counters.

I'm way ahead of you: this driver can expose *two* MSRs as counters :)

Seriously, though, it would be straightforward to make it handle a
more general list, complete with non-architectural stuff (such as the
upcoming PPERF in Skylake).

This driver only knows how to handle counters, though.  I'm not sure
whether all of the MSRs that turbostat needs are counters.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-27 Thread Ingo Molnar

* Ingo Molnar  wrote:

> So what's wrong with exposing them as a simplified PMU driver?
> 
> That way we only expose the ones we want to - plus tooling can use all the 
> rich 
> perf features that can be used around this. (sampling, counting, call chains, 
> etc.)

See below code from Andy that exposes a single MSR via perf. At the core of the 
PMU driver is a single rdmsrl():

+static void aperfmperf_event_start(struct perf_event *event, int flags)
+{
+   u64 now;
+
+   rdmsrl(event->hw.event_base, now);
+   local64_set(>hw.prev_count, now);
+}

Now I think what we really want is to expose not a single MSR but multiple MSRs 
in 
a single driver, i.e. don't have one PMU driver per MSR, but have a driver that 
allows the exposure of select MSRs as counters.

There should also be a maker/family/model filter mechanism, so that certain 
MSRs 
are only exposed on models that are known to support them, etc.

Thanks,

Ingo

- Forwarded message from Andy Lutomirski  -

Date: Tue, 28 Apr 2015 14:25:37 -0700
From: Andy Lutomirski 
To: Len Brown , Peter Zijlstra , 
"linux-kernel@vger.kernel.org" 
Cc: Paul Mackerras , Ingo Molnar , Arnaldo 
Carvalho de Melo , Andy Lutomirski 
Subject: [RFC] x86, perf: Add an aperfmperf driver

Signed-off-by: Andy Lutomirski 
---

This driver seems a little bit silly, but I can imagine it being useful.  For
example, I think that turbostat could do some of its work without being
root if we had a driver like this.

Thoughts?  Would it make sense at all?  Did I wire it up right?  This is
the only PMU driver I've ever written, and it could have any number of
issues.

 arch/x86/kernel/cpu/Makefile|   2 +
 arch/x86/kernel/cpu/perf_event_aperfmperf.c | 119 
 2 files changed, 121 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/perf_event_aperfmperf.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 80091ae54c2b..fadc822efc90 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -45,6 +45,8 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)+= 
perf_event_intel_uncore.o \
   perf_event_intel_uncore_snb.o \
   perf_event_intel_uncore_snbep.o \
   perf_event_intel_uncore_nhmex.o
+obj-$(CONFIG_CPU_SUP_INTEL)+= perf_event_aperf_mperf.o
+obj-$(CONFIG_CPU_SUP_AMD)  += perf_event_aperf_mperf.o
 endif
 
 
diff --git a/arch/x86/kernel/cpu/perf_event_aperfmperf.c 
b/arch/x86/kernel/cpu/perf_event_aperfmperf.c
new file mode 100644
index ..6e6d113bd9ce
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_aperfmperf.c
@@ -0,0 +1,119 @@
+#include 
+
+#define APERFMPERF_EVENT_APERF 0
+#define APERFMPERF_EVENT_MPERF 1
+
+PMU_EVENT_ATTR_STRING(aperf, evattr_aperf, "event=0x00");
+PMU_EVENT_ATTR_STRING(mperf, evattr_mperf, "event=0x01");
+static struct attribute *events_attrs[] = {
+   _aperf.attr.attr,
+   _mperf.attr.attr,
+   NULL,
+};
+static struct attribute_group events_attr_group = {
+   .name = "events",
+   .attrs = events_attrs,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-63");
+static struct attribute *format_attrs[] = {
+   _attr_event.attr,
+   NULL,
+};
+static struct attribute_group format_attr_group = {
+   .name = "format",
+   .attrs = format_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+   _attr_group,
+   _attr_group,
+   NULL,
+};
+
+static int aperfmperf_event_init(struct perf_event *event)
+{
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   if (event->attr.config != APERFMPERF_EVENT_APERF &&
+   event->attr.config != APERFMPERF_EVENT_MPERF)
+   return -ENOENT;
+
+   if (event->attr.config1 != 0)
+   return -ENOENT;
+
+   /* no sampling */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest  ||
+   event->attr.freq   ||
+   event->attr.sample_period) /* no sampling */
+   return -EINVAL;
+
+   event->hw.idx = -1;
+   event->hw.event_base = (event->attr.config == APERFMPERF_EVENT_APERF ?
+   MSR_IA32_APERF : MSR_IA32_MPERF);
+
+   return 0;
+}
+
+static void aperfmperf_event_update(struct perf_event *event)
+{
+   u64 prev;
+   u64 now;
+
+   rdmsrl(event->hw.event_base, now);
+   prev = local64_xchg(>hw.prev_count, now);
+   local64_add(now - prev, >count);
+}
+
+static void aperfmperf_event_start(struct perf_event *event, int flags)
+{
+   u64 now;
+
+   

Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-27 Thread Ingo Molnar

* Prarit Bhargava  wrote:

> Customers write system monitoring software for single systems as well as
> clusters.  In load-balancing software it is useful to know how "busy" a
> core is.  Unfortunately the only way to get this data is to run as root,
> or use setcap to allow userspace access for particular programs.  Both of
> these options are clunky at best.
> 
> This patch allows read access to the msr dev files which should be okay.
> No damage can be done by reading the MSR values and it allows non-root
> users to run system monitoring software.
> 
> The turbostat code specifically checks for CAP_SYS_RAWIO, which it
> shouldn't have to and I've removed that code.  Additionally I've modified
> the turbostat man page to remove documentation about configuring
> CAP_SYS_RAW_IO.
> 
> Note: Write access to msr is still restricted with this patch.
> 
> Cc: "H. Peter Anvin" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: x...@kernel.org
> Cc: Len Brown 
> Cc: Prarit Bhargava 
> Cc: Dasaratharaman Chandramouli 
> Signed-off-by: Prarit Bhargava 
> ---
>  arch/x86/kernel/msr.c |   11 ---
>  tools/power/x86/turbostat/turbostat.8 |8 
>  tools/power/x86/turbostat/turbostat.c |   17 -
>  3 files changed, 8 insertions(+), 28 deletions(-)

So what's wrong with exposing them as a simplified PMU driver?

That way we only expose the ones we want to - plus tooling can use all the rich 
perf features that can be used around this. (sampling, counting, call chains, 
etc.)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-27 Thread Ingo Molnar

* Prarit Bhargava pra...@redhat.com wrote:

 Customers write system monitoring software for single systems as well as
 clusters.  In load-balancing software it is useful to know how busy a
 core is.  Unfortunately the only way to get this data is to run as root,
 or use setcap to allow userspace access for particular programs.  Both of
 these options are clunky at best.
 
 This patch allows read access to the msr dev files which should be okay.
 No damage can be done by reading the MSR values and it allows non-root
 users to run system monitoring software.
 
 The turbostat code specifically checks for CAP_SYS_RAWIO, which it
 shouldn't have to and I've removed that code.  Additionally I've modified
 the turbostat man page to remove documentation about configuring
 CAP_SYS_RAW_IO.
 
 Note: Write access to msr is still restricted with this patch.
 
 Cc: H. Peter Anvin h...@zytor.com
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@redhat.com
 Cc: x...@kernel.org
 Cc: Len Brown len.br...@intel.com
 Cc: Prarit Bhargava pra...@redhat.com
 Cc: Dasaratharaman Chandramouli dasaratharaman.chandramo...@intel.com
 Signed-off-by: Prarit Bhargava pra...@redhat.com
 ---
  arch/x86/kernel/msr.c |   11 ---
  tools/power/x86/turbostat/turbostat.8 |8 
  tools/power/x86/turbostat/turbostat.c |   17 -
  3 files changed, 8 insertions(+), 28 deletions(-)

So what's wrong with exposing them as a simplified PMU driver?

That way we only expose the ones we want to - plus tooling can use all the rich 
perf features that can be used around this. (sampling, counting, call chains, 
etc.)

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-27 Thread Ingo Molnar

* Ingo Molnar mi...@kernel.org wrote:

 So what's wrong with exposing them as a simplified PMU driver?
 
 That way we only expose the ones we want to - plus tooling can use all the 
 rich 
 perf features that can be used around this. (sampling, counting, call chains, 
 etc.)

See below code from Andy that exposes a single MSR via perf. At the core of the 
PMU driver is a single rdmsrl():

+static void aperfmperf_event_start(struct perf_event *event, int flags)
+{
+   u64 now;
+
+   rdmsrl(event-hw.event_base, now);
+   local64_set(event-hw.prev_count, now);
+}

Now I think what we really want is to expose not a single MSR but multiple MSRs 
in 
a single driver, i.e. don't have one PMU driver per MSR, but have a driver that 
allows the exposure of select MSRs as counters.

There should also be a maker/family/model filter mechanism, so that certain 
MSRs 
are only exposed on models that are known to support them, etc.

Thanks,

Ingo

- Forwarded message from Andy Lutomirski l...@kernel.org -

Date: Tue, 28 Apr 2015 14:25:37 -0700
From: Andy Lutomirski l...@kernel.org
To: Len Brown len.br...@intel.com, Peter Zijlstra pet...@infradead.org, 
linux-kernel@vger.kernel.org linux-kernel@vger.kernel.org
Cc: Paul Mackerras pau...@samba.org, Ingo Molnar mi...@redhat.com, Arnaldo 
Carvalho de Melo a...@kernel.org, Andy Lutomirski l...@kernel.org
Subject: [RFC] x86, perf: Add an aperfmperf driver

Signed-off-by: Andy Lutomirski l...@kernel.org
---

This driver seems a little bit silly, but I can imagine it being useful.  For
example, I think that turbostat could do some of its work without being
root if we had a driver like this.

Thoughts?  Would it make sense at all?  Did I wire it up right?  This is
the only PMU driver I've ever written, and it could have any number of
issues.

 arch/x86/kernel/cpu/Makefile|   2 +
 arch/x86/kernel/cpu/perf_event_aperfmperf.c | 119 
 2 files changed, 121 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/perf_event_aperfmperf.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 80091ae54c2b..fadc822efc90 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -45,6 +45,8 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)+= 
perf_event_intel_uncore.o \
   perf_event_intel_uncore_snb.o \
   perf_event_intel_uncore_snbep.o \
   perf_event_intel_uncore_nhmex.o
+obj-$(CONFIG_CPU_SUP_INTEL)+= perf_event_aperf_mperf.o
+obj-$(CONFIG_CPU_SUP_AMD)  += perf_event_aperf_mperf.o
 endif
 
 
diff --git a/arch/x86/kernel/cpu/perf_event_aperfmperf.c 
b/arch/x86/kernel/cpu/perf_event_aperfmperf.c
new file mode 100644
index ..6e6d113bd9ce
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_aperfmperf.c
@@ -0,0 +1,119 @@
+#include linux/perf_event.h
+
+#define APERFMPERF_EVENT_APERF 0
+#define APERFMPERF_EVENT_MPERF 1
+
+PMU_EVENT_ATTR_STRING(aperf, evattr_aperf, event=0x00);
+PMU_EVENT_ATTR_STRING(mperf, evattr_mperf, event=0x01);
+static struct attribute *events_attrs[] = {
+   evattr_aperf.attr.attr,
+   evattr_mperf.attr.attr,
+   NULL,
+};
+static struct attribute_group events_attr_group = {
+   .name = events,
+   .attrs = events_attrs,
+};
+
+PMU_FORMAT_ATTR(event, config:0-63);
+static struct attribute *format_attrs[] = {
+   format_attr_event.attr,
+   NULL,
+};
+static struct attribute_group format_attr_group = {
+   .name = format,
+   .attrs = format_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+   events_attr_group,
+   format_attr_group,
+   NULL,
+};
+
+static int aperfmperf_event_init(struct perf_event *event)
+{
+   if (event-attr.type != event-pmu-type)
+   return -ENOENT;
+
+   if (event-attr.config != APERFMPERF_EVENT_APERF 
+   event-attr.config != APERFMPERF_EVENT_MPERF)
+   return -ENOENT;
+
+   if (event-attr.config1 != 0)
+   return -ENOENT;
+
+   /* no sampling */
+   if (event-hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event-attr.exclude_user   ||
+   event-attr.exclude_kernel ||
+   event-attr.exclude_hv ||
+   event-attr.exclude_idle   ||
+   event-attr.exclude_host   ||
+   event-attr.exclude_guest  ||
+   event-attr.freq   ||
+   event-attr.sample_period) /* no sampling */
+   return -EINVAL;
+
+   event-hw.idx = -1;
+   event-hw.event_base = (event-attr.config == APERFMPERF_EVENT_APERF ?
+   MSR_IA32_APERF : MSR_IA32_MPERF);
+
+   return 0;
+}
+
+static void aperfmperf_event_update(struct perf_event *event)
+{
+   u64 prev;
+   u64 now;
+
+   

Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-27 Thread Andy Lutomirski
On Sat, Jun 27, 2015 at 1:39 AM, Ingo Molnar mi...@kernel.org wrote:

 * Ingo Molnar mi...@kernel.org wrote:

 So what's wrong with exposing them as a simplified PMU driver?

 That way we only expose the ones we want to - plus tooling can use all the 
 rich
 perf features that can be used around this. (sampling, counting, call chains,
 etc.)

 See below code from Andy that exposes a single MSR via perf. At the core of 
 the
 PMU driver is a single rdmsrl():

 +static void aperfmperf_event_start(struct perf_event *event, int flags)
 +{
 +   u64 now;
 +
 +   rdmsrl(event-hw.event_base, now);
 +   local64_set(event-hw.prev_count, now);
 +}

 Now I think what we really want is to expose not a single MSR but multiple 
 MSRs in
 a single driver, i.e. don't have one PMU driver per MSR, but have a driver 
 that
 allows the exposure of select MSRs as counters.

I'm way ahead of you: this driver can expose *two* MSRs as counters :)

Seriously, though, it would be straightforward to make it handle a
more general list, complete with non-architectural stuff (such as the
upcoming PPERF in Skylake).

This driver only knows how to handle counters, though.  I'm not sure
whether all of the MSRs that turbostat needs are counters.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread Prarit Bhargava


On 06/26/2015 03:23 PM, Brian Gerst wrote:
> On Fri, Jun 26, 2015 at 1:52 PM, Prarit Bhargava  wrote:
>> Customers write system monitoring software for single systems as well as
>> clusters.  In load-balancing software it is useful to know how "busy" a
>> core is.  Unfortunately the only way to get this data is to run as root,
>> or use setcap to allow userspace access for particular programs.  Both of
>> these options are clunky at best.
>>
>> This patch allows read access to the msr dev files which should be okay.
>> No damage can be done by reading the MSR values and it allows non-root
>> users to run system monitoring software.
>>
>> The turbostat code specifically checks for CAP_SYS_RAWIO, which it
>> shouldn't have to and I've removed that code.  Additionally I've modified
>> the turbostat man page to remove documentation about configuring
>> CAP_SYS_RAW_IO.
>>
>> Note: Write access to msr is still restricted with this patch.
> 
> Allowing unrestricted read access to all MSRs is wrong.  Some MSRs
> contain addresses of kernel data structures, which can be used in
> security exploits.
> 
> The proper way to do this is to write a driver to only expose the MSRs
> that the user tools need, and nothing else.

Will do -- At least I got everyone's attention with this :).

P.

> 
> --
> Brian Gerst
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread Brian Gerst
On Fri, Jun 26, 2015 at 1:52 PM, Prarit Bhargava  wrote:
> Customers write system monitoring software for single systems as well as
> clusters.  In load-balancing software it is useful to know how "busy" a
> core is.  Unfortunately the only way to get this data is to run as root,
> or use setcap to allow userspace access for particular programs.  Both of
> these options are clunky at best.
>
> This patch allows read access to the msr dev files which should be okay.
> No damage can be done by reading the MSR values and it allows non-root
> users to run system monitoring software.
>
> The turbostat code specifically checks for CAP_SYS_RAWIO, which it
> shouldn't have to and I've removed that code.  Additionally I've modified
> the turbostat man page to remove documentation about configuring
> CAP_SYS_RAW_IO.
>
> Note: Write access to msr is still restricted with this patch.

Allowing unrestricted read access to all MSRs is wrong.  Some MSRs
contain addresses of kernel data structures, which can be used in
security exploits.

The proper way to do this is to write a driver to only expose the MSRs
that the user tools need, and nothing else.

--
Brian Gerst
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread H. Peter Anvin
On 06/26/2015 10:52 AM, Prarit Bhargava wrote:
> 
> This patch allows read access to the msr dev files which should be okay.
> No damage can be done by reading the MSR values and it allows non-root
> users to run system monitoring software.
> 

I don't believe that is true.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread Prarit Bhargava
Customers write system monitoring software for single systems as well as
clusters.  In load-balancing software it is useful to know how "busy" a
core is.  Unfortunately the only way to get this data is to run as root,
or use setcap to allow userspace access for particular programs.  Both of
these options are clunky at best.

This patch allows read access to the msr dev files which should be okay.
No damage can be done by reading the MSR values and it allows non-root
users to run system monitoring software.

The turbostat code specifically checks for CAP_SYS_RAWIO, which it
shouldn't have to and I've removed that code.  Additionally I've modified
the turbostat man page to remove documentation about configuring
CAP_SYS_RAW_IO.

Note: Write access to msr is still restricted with this patch.

Cc: "H. Peter Anvin" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: x...@kernel.org
Cc: Len Brown 
Cc: Prarit Bhargava 
Cc: Dasaratharaman Chandramouli 
Signed-off-by: Prarit Bhargava 
---
 arch/x86/kernel/msr.c |   11 ---
 tools/power/x86/turbostat/turbostat.8 |8 
 tools/power/x86/turbostat/turbostat.c |   17 -
 3 files changed, 8 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 113e707..380d2ac 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -105,6 +105,9 @@ static ssize_t msr_write(struct file *file, const char 
__user *buf,
int err = 0;
ssize_t bytes = 0;
 
+   if (!capable(CAP_SYS_RAWIO))
+   return -EPERM;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */
 
@@ -148,6 +151,10 @@ static long msr_ioctl(struct file *file, unsigned int ioc, 
unsigned long arg)
break;
 
case X86_IOC_WRMSR_REGS:
+   if (!capable(CAP_SYS_RAWIO)) {
+   err = -EPERM;
+   break;
+   }
if (!(file->f_mode & FMODE_WRITE)) {
err = -EBADF;
break;
@@ -176,9 +183,6 @@ static int msr_open(struct inode *inode, struct file *file)
unsigned int cpu = iminor(file_inode(file));
struct cpuinfo_x86 *c;
 
-   if (!capable(CAP_SYS_RAWIO))
-   return -EPERM;
-
if (cpu >= nr_cpu_ids || !cpu_online(cpu))
return -ENXIO;  /* No such CPU */
 
@@ -241,6 +245,7 @@ static struct notifier_block __refdata 
msr_class_cpu_notifier = {
 
 static char *msr_devnode(struct device *dev, umode_t *mode)
 {
+   *mode = (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
return kasprintf(GFP_KERNEL, "cpu/%u/msr", MINOR(dev->devt));
 }
 
diff --git a/tools/power/x86/turbostat/turbostat.8 
b/tools/power/x86/turbostat/turbostat.8
index 05b8fc3..7b3fce4 100644
--- a/tools/power/x86/turbostat/turbostat.8
+++ b/tools/power/x86/turbostat/turbostat.8
@@ -204,14 +204,6 @@ not including any non-busy idle time.
 .SH NOTES
 
 .B "turbostat "
-must be run as root.
-Alternatively, non-root users can be enabled to run turbostat this way:
-
-# setcap cap_sys_rawio=ep ./turbostat
-
-# chmod +r /dev/cpu/*/msr
-
-.B "turbostat "
 reads hardware counters, but doesn't write them.
 So it will not interfere with the OS or other programs, including
 multiple invocations of itself.
diff --git a/tools/power/x86/turbostat/turbostat.c 
b/tools/power/x86/turbostat/turbostat.c
index 323b65e..d043ae8 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -1705,26 +1705,9 @@ void check_dev_msr()
 
 void check_permissions()
 {
-   struct __user_cap_header_struct cap_header_data;
-   cap_user_header_t cap_header = _header_data;
-   struct __user_cap_data_struct cap_data_data;
-   cap_user_data_t cap_data = _data_data;
-   extern int capget(cap_user_header_t hdrp, cap_user_data_t datap);
int do_exit = 0;
char pathname[32];
 
-   /* check for CAP_SYS_RAWIO */
-   cap_header->pid = getpid();
-   cap_header->version = _LINUX_CAPABILITY_VERSION;
-   if (capget(cap_header, cap_data) < 0)
-   err(-6, "capget(2) failed");
-
-   if ((cap_data->effective & (1 << CAP_SYS_RAWIO)) == 0) {
-   do_exit++;
-   warnx("capget(CAP_SYS_RAWIO) failed,"
-   " try \"# setcap cap_sys_rawio=ep %s\"", progname);
-   }
-
/* test file permissions */
sprintf(pathname, "/dev/cpu/%d/msr", base_cpu);
if (euidaccess(pathname, R_OK)) {
-- 
1.7.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread Prarit Bhargava
Customers write system monitoring software for single systems as well as
clusters.  In load-balancing software it is useful to know how busy a
core is.  Unfortunately the only way to get this data is to run as root,
or use setcap to allow userspace access for particular programs.  Both of
these options are clunky at best.

This patch allows read access to the msr dev files which should be okay.
No damage can be done by reading the MSR values and it allows non-root
users to run system monitoring software.

The turbostat code specifically checks for CAP_SYS_RAWIO, which it
shouldn't have to and I've removed that code.  Additionally I've modified
the turbostat man page to remove documentation about configuring
CAP_SYS_RAW_IO.

Note: Write access to msr is still restricted with this patch.

Cc: H. Peter Anvin h...@zytor.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: x...@kernel.org
Cc: Len Brown len.br...@intel.com
Cc: Prarit Bhargava pra...@redhat.com
Cc: Dasaratharaman Chandramouli dasaratharaman.chandramo...@intel.com
Signed-off-by: Prarit Bhargava pra...@redhat.com
---
 arch/x86/kernel/msr.c |   11 ---
 tools/power/x86/turbostat/turbostat.8 |8 
 tools/power/x86/turbostat/turbostat.c |   17 -
 3 files changed, 8 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 113e707..380d2ac 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -105,6 +105,9 @@ static ssize_t msr_write(struct file *file, const char 
__user *buf,
int err = 0;
ssize_t bytes = 0;
 
+   if (!capable(CAP_SYS_RAWIO))
+   return -EPERM;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */
 
@@ -148,6 +151,10 @@ static long msr_ioctl(struct file *file, unsigned int ioc, 
unsigned long arg)
break;
 
case X86_IOC_WRMSR_REGS:
+   if (!capable(CAP_SYS_RAWIO)) {
+   err = -EPERM;
+   break;
+   }
if (!(file-f_mode  FMODE_WRITE)) {
err = -EBADF;
break;
@@ -176,9 +183,6 @@ static int msr_open(struct inode *inode, struct file *file)
unsigned int cpu = iminor(file_inode(file));
struct cpuinfo_x86 *c;
 
-   if (!capable(CAP_SYS_RAWIO))
-   return -EPERM;
-
if (cpu = nr_cpu_ids || !cpu_online(cpu))
return -ENXIO;  /* No such CPU */
 
@@ -241,6 +245,7 @@ static struct notifier_block __refdata 
msr_class_cpu_notifier = {
 
 static char *msr_devnode(struct device *dev, umode_t *mode)
 {
+   *mode = (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
return kasprintf(GFP_KERNEL, cpu/%u/msr, MINOR(dev-devt));
 }
 
diff --git a/tools/power/x86/turbostat/turbostat.8 
b/tools/power/x86/turbostat/turbostat.8
index 05b8fc3..7b3fce4 100644
--- a/tools/power/x86/turbostat/turbostat.8
+++ b/tools/power/x86/turbostat/turbostat.8
@@ -204,14 +204,6 @@ not including any non-busy idle time.
 .SH NOTES
 
 .B turbostat 
-must be run as root.
-Alternatively, non-root users can be enabled to run turbostat this way:
-
-# setcap cap_sys_rawio=ep ./turbostat
-
-# chmod +r /dev/cpu/*/msr
-
-.B turbostat 
 reads hardware counters, but doesn't write them.
 So it will not interfere with the OS or other programs, including
 multiple invocations of itself.
diff --git a/tools/power/x86/turbostat/turbostat.c 
b/tools/power/x86/turbostat/turbostat.c
index 323b65e..d043ae8 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -1705,26 +1705,9 @@ void check_dev_msr()
 
 void check_permissions()
 {
-   struct __user_cap_header_struct cap_header_data;
-   cap_user_header_t cap_header = cap_header_data;
-   struct __user_cap_data_struct cap_data_data;
-   cap_user_data_t cap_data = cap_data_data;
-   extern int capget(cap_user_header_t hdrp, cap_user_data_t datap);
int do_exit = 0;
char pathname[32];
 
-   /* check for CAP_SYS_RAWIO */
-   cap_header-pid = getpid();
-   cap_header-version = _LINUX_CAPABILITY_VERSION;
-   if (capget(cap_header, cap_data)  0)
-   err(-6, capget(2) failed);
-
-   if ((cap_data-effective  (1  CAP_SYS_RAWIO)) == 0) {
-   do_exit++;
-   warnx(capget(CAP_SYS_RAWIO) failed,
-try \# setcap cap_sys_rawio=ep %s\, progname);
-   }
-
/* test file permissions */
sprintf(pathname, /dev/cpu/%d/msr, base_cpu);
if (euidaccess(pathname, R_OK)) {
-- 
1.7.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread H. Peter Anvin
On 06/26/2015 10:52 AM, Prarit Bhargava wrote:
 
 This patch allows read access to the msr dev files which should be okay.
 No damage can be done by reading the MSR values and it allows non-root
 users to run system monitoring software.
 

I don't believe that is true.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread Prarit Bhargava


On 06/26/2015 03:23 PM, Brian Gerst wrote:
 On Fri, Jun 26, 2015 at 1:52 PM, Prarit Bhargava pra...@redhat.com wrote:
 Customers write system monitoring software for single systems as well as
 clusters.  In load-balancing software it is useful to know how busy a
 core is.  Unfortunately the only way to get this data is to run as root,
 or use setcap to allow userspace access for particular programs.  Both of
 these options are clunky at best.

 This patch allows read access to the msr dev files which should be okay.
 No damage can be done by reading the MSR values and it allows non-root
 users to run system monitoring software.

 The turbostat code specifically checks for CAP_SYS_RAWIO, which it
 shouldn't have to and I've removed that code.  Additionally I've modified
 the turbostat man page to remove documentation about configuring
 CAP_SYS_RAW_IO.

 Note: Write access to msr is still restricted with this patch.
 
 Allowing unrestricted read access to all MSRs is wrong.  Some MSRs
 contain addresses of kernel data structures, which can be used in
 security exploits.
 
 The proper way to do this is to write a driver to only expose the MSRs
 that the user tools need, and nothing else.

Will do -- At least I got everyone's attention with this :).

P.

 
 --
 Brian Gerst
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-06-26 Thread Brian Gerst
On Fri, Jun 26, 2015 at 1:52 PM, Prarit Bhargava pra...@redhat.com wrote:
 Customers write system monitoring software for single systems as well as
 clusters.  In load-balancing software it is useful to know how busy a
 core is.  Unfortunately the only way to get this data is to run as root,
 or use setcap to allow userspace access for particular programs.  Both of
 these options are clunky at best.

 This patch allows read access to the msr dev files which should be okay.
 No damage can be done by reading the MSR values and it allows non-root
 users to run system monitoring software.

 The turbostat code specifically checks for CAP_SYS_RAWIO, which it
 shouldn't have to and I've removed that code.  Additionally I've modified
 the turbostat man page to remove documentation about configuring
 CAP_SYS_RAW_IO.

 Note: Write access to msr is still restricted with this patch.

Allowing unrestricted read access to all MSRs is wrong.  Some MSRs
contain addresses of kernel data structures, which can be used in
security exploits.

The proper way to do this is to write a driver to only expose the MSRs
that the user tools need, and nothing else.

--
Brian Gerst
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/