Re: [PATCH] net: appletalk: remove cops support

2023-09-27 Thread Prarit Bhargava

On 9/27/23 05:26, Christoph Hellwig wrote:

On Wed, Sep 27, 2023 at 11:00:30AM +0200, Greg Kroah-Hartman wrote:

The COPS Appletalk support is very old, never said to actually work
properly, and the firmware code for the devices are under a very suspect
license.  Remove it all to clear up the license issue, if it is still
needed and actually used by anyone, we can add it back later once the
license is cleared up.


Looks good:

Acked-by: Christoph Hellwig 



Ditto.

Acked-by: Prarit Bhargava 

P.



Re: SPDX: Appletalk FW license in the kernel

2023-09-26 Thread Prarit Bhargava

On 9/26/23 04:02, Greg KH wrote:

On Tue, Sep 26, 2023 at 12:34:03AM -0700, Christoph Hellwig wrote:

On Fri, Sep 15, 2023 at 09:39:05AM -0400, Prarit Bhargava wrote:

To be clear, I am not asking for their removal, however, I do think a better
license should be issued for these files.  The files were trivially modified
in 2006. It could be that the code in question is now unused and it is just
easier to remove them.

Is there anyone you know of that we could approach to determine a proper
SPDX License for these files?


The code contains firmware that is downloaded to the device.  The proper
thing would be to convert them to separate binary files in the
linux-firmware packages.  But given that the driver has seen nothing
but tree wide cleanups since the dawn of git I suspect there is no
maintainer and probably no user left.  The best might be to indeed just
remove it and see if anyone screams, in which case we could bring it
back after doing the above.



We should just remove them for now, I have no objection to that at all.

Want me to send the patch?


Yes, that would be appreciated.  Thanks :)

P.



thanks,

greg k-h





Re: [PATCH] x86/apic/vector: Fix ordering in vector assignment

2020-12-11 Thread Prarit Bhargava



On 12/10/20 3:18 PM, Thomas Gleixner wrote:
> Prarit reported that depending on the affinity setting the
> 
>  ' irq $N: Affinity broken due to vector space exhaustion.'
> 
> message is showing up in dmesg, but the vector space on the CPUs in the
> affinity mask is definitely not exhausted.
> 
> Shung-Hsi provided traces and analysis which pinpoints the problem:
> 
> The ordering of trying to assign an interrupt vector in
> assign_irq_vector_any_locked() is simply wrong if the interrupt data has a
> valid node assigned. It does:
> 
>  1) Try the intersection of affinity mask and node mask
>  2) Try the node mask
>  3) Try the full affinity mask
>  4) Try the full online mask
> 
> Obviously #2 and #3 are in the wrong order as the requested affinity
> mask has to take precedence.
> 
> In the observed cases #1 failed because the affinity mask did not contain
> CPUs from node 0. That made it allocate a vector from node 0, thereby
> breaking affinity and emitting the misleading message.
> 
> Revert the order of #2 and #3 so the full affinity mask without the node
> intersection is tried before actually affinity is broken.
> 
> If no node is assigned then only the full affinity mask and if that fails
> the full online mask is tried.
> 
> Fixes: d6ffc6ac83b1 ("x86/vector: Respect affinity mask in irq descriptor")
> Reported-by: Shung-Hsi Yu 
> Reported-by: Prarit Bhargava 
> Signed-off-by: Thomas Gleixner 
> Tested-by: Shung-Hsi Yu 
> Cc: sta...@vger.kernel.org
> ---
>  arch/x86/kernel/apic/vector.c |   24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -273,20 +273,24 @@ static int assign_irq_vector_any_locked(
>   const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd);
>   int node = irq_data_get_node(irqd);
>  
> - if (node == NUMA_NO_NODE)
> - goto all;
> - /* Try the intersection of @affmsk and node mask */
> - cpumask_and(vector_searchmask, cpumask_of_node(node), affmsk);
> - if (!assign_vector_locked(irqd, vector_searchmask))
> - return 0;
> - /* Try the node mask */
> - if (!assign_vector_locked(irqd, cpumask_of_node(node)))
> - return 0;
> -all:
> + if (node != NUMA_NO_NODE) {
> + /* Try the intersection of @affmsk and node mask */
> + cpumask_and(vector_searchmask, cpumask_of_node(node), affmsk);
> + if (!assign_vector_locked(irqd, vector_searchmask))
> + return 0;
> + }
> +
>   /* Try the full affinity mask */
>   cpumask_and(vector_searchmask, affmsk, cpu_online_mask);
>   if (!assign_vector_locked(irqd, vector_searchmask))
>   return 0;
> +
> + if (node != NUMA_NO_NODE) {
> + /* Try the node mask */
> + if (!assign_vector_locked(irqd, cpumask_of_node(node)))
> + return 0;
> + }
> +
>   /* Try the full online mask */
>   return assign_vector_locked(irqd, cpu_online_mask);
>  }
> 

Tested-and-Reviewed-by: Prarit Bhargava 

P.



Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console

2020-11-11 Thread Prarit Bhargava



On 11/10/20 3:56 PM, Thomas Gleixner wrote:
> Prarit,
> 
> On Tue, Nov 10 2020 at 14:24, Prarit Bhargava wrote:
>> Occasionally when logging out of the ttyS0 aka serial console I see that
>>
>>  irq 4: Affinity broken due to vector space exhaustion.
>>
>> *** console shutdown, IRQ released for cpu on socket 1
>> *** console starts back up again, IRQ assigned to on cpu on socket 0
>>
>> In this process, however, the smp_affinity is not cleared for IRQ4.  That 
>> is, it
>> remains as
>>
>> /proc/irq/4/smp_affinity:ff00,,ff00
>>
>> so that the check in activate_reserved() fails and
>>
>> "irq 4: Affinity broken due to vector space exhaustion."
>>
>> is output to the screen.
>>
>> I am not sure of correct fix here.  It looks like the smp_affinity should be
>> reset to default at irq shutdown, however, I cannot determine if that should 
>> be
>> done for every IRQ, or (hopefully not) per driver.
> 
> This has been that way forever and there was a discussion about this
> at least 15 years ago. I can't find it at the moment because I can't
> access my archives and I failed to find the conversation on lore.
> 
> But here is the gist:
> 
> At some point I actually wanted to reset the affinity mask to the
> default affinity mask in free_irq() when the last action was
> removed.
> 
> That broke setups where the affinity of the serial console interrupt,
> was set to some specific CPU by the admin and then it was moved to some
> other place due to logout -> shutdown -> startup.
> 
> So we left the historic behaviour untouched.
> 
> So now you are complaining about it the other way round and I have to
> tell you that there is no correct fix neither in the core nor in a
> driver.
> 
> The real problem is irqbalanced aggressively exhausting the vector space
> of a _whole_ socket to the point that there is not a single vector left
> for serial. That's the problem you want to fix.

Thanks tglx.  I'll go figure this out with the irqbalanced team.

P.

> 
> Thanks,
> 
> tglx
> 



"irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console

2020-11-10 Thread Prarit Bhargava
Occasionally when logging out of the ttyS0 aka serial console I see that

irq 4: Affinity broken due to vector space exhaustion.

is output to the console.

At boot the default smp_affinity is

/proc/irq/4/smp_affinity:00ff,,00ff

The irqbalance service runs and can change this value.  Depending on system load
and behaviour, the IRQ can be assigned to a cpu on socket 1 and the smp_affinity
is changed to socket 1.  In that case,

/proc/irq/4/smp_affinity:ff00,,ff00

When the user logs out of the serial console, the console is shut down and IRQ
is free'd.  The IRQ is immediately reacquired by the serial console when it
starts up again.

For example,

Red Hat Enterprise Linux 8.4 Beta (Ootpa)
Kernel 5.10.0-rc2+ on an x86_64

HOSTNAME: intel-whitley-07.khw1.lab.eng.bos.redhat.com
Activate the web console with: systemctl enable --now cockpit.socket

intel-whitley-07 login: root
Password:

Last login: Mon Nov  9 19:13:33 on ttyS0
[07:25 PM root@intel-whitley-07 ~]# exit

*** console shutdown, IRQ released for cpu on socket 1
*** console starts back up again, IRQ assigned to on cpu on socket 0

In this process, however, the smp_affinity is not cleared for IRQ4.  That is, it
remains as

/proc/irq/4/smp_affinity:ff00,,ff00

so that the check in activate_reserved() fails and

"irq 4: Affinity broken due to vector space exhaustion."

is output to the screen.

I am not sure of correct fix here.  It looks like the smp_affinity should be
reset to default at irq shutdown, however, I cannot determine if that should be
done for every IRQ, or (hopefully not) per driver.

Can anyone offer guidance on a fix?

P.



Re: [PATCH] module: Add more error message for failed kernel module loading

2020-09-01 Thread Prarit Bhargava



On 9/1/20 4:17 PM, Lucas De Marchi wrote:
> On Tue, Sep 1, 2020 at 12:56 PM Prarit Bhargava  wrote:
>>
>>
>>
>> On 9/1/20 2:50 PM, Lucas De Marchi wrote:
>>> On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo  wrote:
>>>>
>>>> When kernel module loading failed, user space only get one of the
>>>> following error messages:
>>>> - -ENOEXEC
>>>>   This is the most confusing one. From corrupted ELF header to bad
>>>>   WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
>>>>   all returns this error number.
>>>>
>>>> - -EPERM
>>>>   This is for blacklisted modules. But mod doesn't do extra explain
>>>>   on this error either.
>>>>
>>>> - -ENOMEM
>>>>   The only error which needs no explain.
>>>>
>>>> This means, if a user got "Exec format error" from modprobe, it provides
>>>> no meaningful way for the user to debug, and will take extra time
>>>> communicating to get extra info.
>>>>
>>>> So this patch will add extra error messages for -ENOEXEC and -EPERM
>>>> errors, allowing user to do better debugging and reporting.
>>>>
>>>> Signed-off-by: Qu Wenruo 
>>>> ---
>>>>  kernel/module.c | 11 +--
>>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/module.c b/kernel/module.c
>>>> index 1c5cff34d9f2..9f748c6eeb48 100644
>>>> --- a/kernel/module.c
>>>> +++ b/kernel/module.c
>>>> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr 
>>>> *hdr, Elf_Shdr *sechdrs,
>>>> int i;
>>>>
>>>> for (i = 0; i < hdr->e_shnum; i++) {
>>>> -   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
>>>> +   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
>>>> +   pr_err(
>>>> +   "Module %s section %d has invalid WRITE|EXEC 
>>>> flags\n",
>>>> +   mod->name, i);
>>>> return -ENOEXEC;
>>>> +   }
>>>> }
>>>>
>>>> return 0;
>>>> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, 
>>>> const char __user *uargs,
>>>> char *after_dashes;
>>>>
>>>> err = elf_header_check(info);
>>>> -   if (err)
>>>> +   if (err) {
>>>> +   pr_err("Module has invalid ELF header\n");
>>>> goto free_copy;
>>>> +   }
>>>>
>>>> err = setup_load_info(info, flags);
>>>> if (err)
>>>> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const 
>>>> char __user *uargs,
>>>>
>>>> if (blacklisted(info->name)) {
>>>> err = -EPERM;
>>>> +   pr_err("Module %s is blacklisted\n", info->name);
>>>
>>> I wonder why would anyone actually add the blacklist to the command
>>> line like this and have no
>>> way to revert that back. This was introduced in
>>
>> Debug.  Debug.  Debug. ;)  The parameter was added to debug broken 
>> installations
>> and kernel boots.
>>
>>> be7de5f91fdc modules: Add kernel parameter to blacklist modules
>>> as a way to overcome broken initrd generation afaics.
>>
>> Not the generation of the initramfs, but blocking a module loaded during the
>> boot.  The installation may have failed and there's no easy way to then debug
>> what module was responsible for the failure.
> 
> if you are using initrd, then *inside* the initrd you should have the
> /etc/modprobe.d/* file
> you want. I said "broken initrd generation" because the tool should
> put the file there, and
> apparently for you it isn't.
> 
> Even if you don't have the file, you could use modprobe.blacklist= and
> let the blacklist happen
> in the userspace library rather than in the kernel. Module loading is
> not like firmware loading
> that happens without help from userspace.
> 
>>
>>  Either kernel
>>> command line (using modprobe.blacklist)
>>> or /etc/modprobe.d options are honoured by libkmod and allow a
>>> sufficiently privileged user to bypass it.

Re: [PATCH] module: Add more error message for failed kernel module loading

2020-09-01 Thread Prarit Bhargava



On 9/1/20 2:50 PM, Lucas De Marchi wrote:
> On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo  wrote:
>>
>> When kernel module loading failed, user space only get one of the
>> following error messages:
>> - -ENOEXEC
>>   This is the most confusing one. From corrupted ELF header to bad
>>   WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
>>   all returns this error number.
>>
>> - -EPERM
>>   This is for blacklisted modules. But mod doesn't do extra explain
>>   on this error either.
>>
>> - -ENOMEM
>>   The only error which needs no explain.
>>
>> This means, if a user got "Exec format error" from modprobe, it provides
>> no meaningful way for the user to debug, and will take extra time
>> communicating to get extra info.
>>
>> So this patch will add extra error messages for -ENOEXEC and -EPERM
>> errors, allowing user to do better debugging and reporting.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  kernel/module.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 1c5cff34d9f2..9f748c6eeb48 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, 
>> Elf_Shdr *sechdrs,
>> int i;
>>
>> for (i = 0; i < hdr->e_shnum; i++) {
>> -   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
>> +   if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
>> +   pr_err(
>> +   "Module %s section %d has invalid WRITE|EXEC 
>> flags\n",
>> +   mod->name, i);
>> return -ENOEXEC;
>> +   }
>> }
>>
>> return 0;
>> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const 
>> char __user *uargs,
>> char *after_dashes;
>>
>> err = elf_header_check(info);
>> -   if (err)
>> +   if (err) {
>> +   pr_err("Module has invalid ELF header\n");
>> goto free_copy;
>> +   }
>>
>> err = setup_load_info(info, flags);
>> if (err)
>> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const 
>> char __user *uargs,
>>
>> if (blacklisted(info->name)) {
>> err = -EPERM;
>> +   pr_err("Module %s is blacklisted\n", info->name);
> 
> I wonder why would anyone actually add the blacklist to the command
> line like this and have no
> way to revert that back. This was introduced in

Debug.  Debug.  Debug. ;)  The parameter was added to debug broken installations
and kernel boots.

> be7de5f91fdc modules: Add kernel parameter to blacklist modules
> as a way to overcome broken initrd generation afaics.

Not the generation of the initramfs, but blocking a module loaded during the
boot.  The installation may have failed and there's no easy way to then debug
what module was responsible for the failure.

 Either kernel
> command line (using modprobe.blacklist)
> or /etc/modprobe.d options are honoured by libkmod and allow a
> sufficiently privileged user to bypass it.
> 

Both of those options only work if the filesystem is mounted IIRC.  It could be
the case that modprobe.blacklist now works earlier in the boot, however, I've
never used it because module_blacklist is the biggest and best hammer that I can
use to get through a broken installation or boot.

In any case you're incorrectly assuming that the system has a filesystem on it.
That's not necessarily the case as I'll explain below.

> +Rusty, +Prarit: is there anything this module parameter is covering
> that I'm missing?

This is the situation I have repeatedly come across :  A system at a remote site
will not boot any flavor of Linux.  Since the system would not install the only
way to debug was to provide install images to workaround a module load failure.
As you can imagine that is a time consuming process as well as a bad end user
experience.

I got so sick of it that I wrote the code above (well similar to it anyway --
thanks for the review Randy ;)) to add the module_blacklist parameter to make
debugging an uninstallable "bricked" system easier.

It's come in handy in some unexpected ways.  We've had situations where modules
have loaded and corrupted memory and blacklisting them revealed that the modules
were responsible for the memory corruption.

P.

> 
> For the changes here,
> 
> Reviewed-by: Lucas De Marchi 
> 
> thanks
> Lucas De Marchi
> 
>> goto free_copy;
>> }
>>
>> --
>> 2.27.0
>>
> 



Re: [RFC PATCH] printk: Change timestamp to triplet as mono, boot and real

2020-08-11 Thread Prarit Bhargava



On 8/11/20 9:02 AM, Petr Mladek wrote:
> On Tue 2020-08-11 14:05:12, Thomas Gleixner wrote:
>> Petr Mladek  writes:
>>> At least "crash" tool would need an update anyway. AFAIK, it checks
>>> the size of struct printk_log and refuses to read it when it changes.
>>>
>>> It means that the hack with VMCOREINFO_FIELD_OFFSET probably is not
>>> needed because we would need to update the crashdump-related tools anyway.
>>>
>>> Well, the timing is good. We are about to switch the printk ring
>>> buffer into a lockless one. It requires updating the crashdump tools
>>> as well. We could do this at the same time. The lockless ring buffer
>>> already is in linux-next. It is aimed for 5.10 or 5.11.
>> ...
>>> It would be great to synchronize all these changes changes of the
>>> printk log buffer structures.
>>
>> I agree that having one update is a good thing, but pretty please can we
>> finally make progress with this and not create yet another dependency?
> 
> To make it clear. I definitely do not want to block lockless printk by
> this.

Thanks for clarifying that.  I had the same concern that tglx had.

> 
> BTW: I am not 100% convinced that storing all three timestamps is
> worth it. It increases the code complexity, metadata size. It needs
> an interface with the userspace that has to stay backward compatible.
> 
> Also it still will be racy because the timestamp is taken when the message
> is printed. It might be "long" before or after the event that
> it talks about.

That scenario plays out today with the current timestamp.  Printing debug with a
better timestamp doesn't resolve that problem nor is it intended to.

> 
> There is still the alternative to print all three timestamps regularly
> for those interested. It is less user convenient but much easier
> to maintain.

While I agree, in general, your alternative is useful for in-person debugging it
is not helpful in cases where a user only has an image of a panic where the
printk time-synch message has scrolled off the screen.

Consider this real debug case where I hacked the boottime stamp into printk:  We
had some systems with flaky HW that hit panics at 10 hours [1] of uptime.  Since
these systems came from different vendors with different HW the clocks were
out-of-sync.  I had a suspicion that it was some human-coded event causing the
HW to die but wasn't sure until I did a boottime-stamped printk to prove that
all the systems died after 10 hours.

I could have set a stopwatch or timer to somehow catch this kind of event.  I
could have set up video cameras to watch the consoles, etc.  There are a lot of
other ways I could have debugged this situation but ultimately the fastest thing
to do was code and provide kernels to the various HW companies and see if
everything lined up as I thought it would.

This problem happens far more often then I'd like to admit and we still see this
type of problem with new HW and FW.  I also recall that other kernel groups
(storage, networking, etc.) were interested in the timestamps as it would make
their debugging easier to have a true synchronized timestamp available.

One other thing -- IIRC I had to modify the dmesg code by providing a sysfs (or
proc?) interface for dmesg to identify the stamp.  It's something that should be
investigated with this code too.

P.

[1] It was 3+ years ago.  I can't remember if it was 10 or 100 but you get the
point.

> 
> Best Regards,
> Petr
> 



Re: [PATCH V13] printk: Add monotonic, boottime, and realtime timestamps

2020-08-05 Thread Prarit Bhargava



On 8/5/20 10:04 AM, Petr Mladek wrote:
> On Wed 2020-07-29 08:22:36, Prarit Bhargava wrote:
>>   Chunyan Zhang  wrote:
>>> From: Prarit Bhargava 
>>>
>>> printk.time=1/CONFIG_PRINTK_TIME=1 adds a unmodified local hardware clock
>>> timestamp to printk messages.  The local hardware clock loses time each
>>> day making it difficult to determine exactly when an issue has occurred in
>>> the kernel log, and making it difficult to determine how kernel and
>>> hardware issues relate to each other in real time.
>>>
>>> Make printk output different timestamps by adding options for no
>>> timestamp, the local hardware clock, the monotonic clock, the boottime
>>> clock, and the real clock.  Allow a user to pick one of the clocks by
>>> using the printk.time kernel parameter.  Output the type of clock in
>>> /sys/module/printk/parameters/time so userspace programs can interpret the
>>> timestamp.
>>>
>> ISTR the reason that this was dropped was because of the a problem with
>> the way systemd read the kernel's timestamps.  It got the attention of
>> Linus, and it was then pulled from the tree.
>>
>> I need to go back and review the entire thread as it's been several years
>> since we had the discussion although ISTR someone mentioning that doing two
>> timestamps would not be a problem for systemd.
> 
> I guess that you talk about this thread
> https://lore.kernel.org/lkml/ca+55afwufa__6mgmgjennx+_ryy2zoolisx2ea1avyhszn+...@mail.gmail.com/
> 
>> For example,
>>
>> [48551.015086]
>>
>> would be
>>
>> [48551.015086] m[.]
>>
>> for the monotonic clock timestamp, and
>>
>> [48551.015086] b[.]
>>
>> for the boottime clock, etc.
> 
> This approach has several drawbacks:
> 
>   + Too long prefix might make it hard to see the real messages
> because of shrunken/wrapped lines.
> 
>   + Too long lines are problem with slow consoles.
> 
>   + More space will be necessary to store all the time stamps.
> 
>   + Userspace tools would need/want to parse the format. We would
> need to maintain it forever.
> 
> 
> Linus had an interesting idea to print all timestamps regularly.
> The frequency might be configurable. It might print, for example,
> the following line every 10 minutes or once a day:
> 
>   [48551.015086] System alive: b[.] m[.]
> 
> It might be useful in general to see when the system was still alive
> before it froze.
> 
> Would it be enough to sort messages printed with different clock
> sources?

Hey Petr,

After reviewing the thread (it has been three years after all), I have asked
Orson and Lyra to look at the suggested changes by tglx and Linus and submit
those instead of "this" patch.

I should have updated this thread with that info.  Sorry 'bout that.

P.



Re: [PATCH V13] printk: Add monotonic, boottime, and realtime timestamps

2020-07-29 Thread Prarit Bhargava
  Chunyan Zhang  wrote:
> From: Prarit Bhargava 
> 
> printk.time=1/CONFIG_PRINTK_TIME=1 adds a unmodified local hardware clock
> timestamp to printk messages.  The local hardware clock loses time each
> day making it difficult to determine exactly when an issue has occurred in
> the kernel log, and making it difficult to determine how kernel and
> hardware issues relate to each other in real time.
> 
> Make printk output different timestamps by adding options for no
> timestamp, the local hardware clock, the monotonic clock, the boottime
> clock, and the real clock.  Allow a user to pick one of the clocks by
> using the printk.time kernel parameter.  Output the type of clock in
> /sys/module/printk/parameters/time so userspace programs can interpret the
> timestamp.
> 
> v13: This patch seems have being forgotten for 3 years. Rebase it on
> the latest kernel v5.8, reolve conflicts and fix compiling errors.
> Change code to adapt new printk_time usage.
> Petr's concern on printk_time is addressed by current version of kernel, too.

Lyra,

Copying a reply I sent to Orson who sent me this patch privately this
morning with some additional information.

ISTR the reason that this was dropped was because of the a problem with
the way systemd read the kernel's timestamps.  It got the attention of
Linus, and it was then pulled from the tree.

I need to go back and review the entire thread as it's been several years
since we had the discussion although ISTR someone mentioning that doing two
timestamps would not be a problem for systemd.

For example,

[48551.015086]

would be

[48551.015086] m[.]

for the monotonic clock timestamp, and

[48551.015086] b[.]

for the boottime clock, etc.

P.



Re: [PATCH v5] x86/split_lock: Sanitize userspace and guest error output

2020-06-30 Thread Prarit Bhargava
Just re-pinging on this.

P.

On 6/16/20 7:32 AM, Prarit Bhargava wrote:
> There are two problems with kernel messages in fatal mode that were found
> during testing of guests and userspace programs.
> 
> The first is that no kernel message is output when the split lock detector
> is triggered with a userspace program.  As a result the userspace process
> dies from receiving SIGBUS with no indication to the user of what caused
> the process to die.
> 
> The second problem is that only the first triggering guest causes a kernel
> message to be output because the message is output with pr_warn_once().
> This also results in a loss of information to the user.
> 
> While fixing these I noticed that the same message was being output
> three times so I'm cleaning that up too.
> 
> Fix fatal mode output, and use consistent messages for fatal and
> warn modes for both userspace and guests.
> 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Signed-off-by: Prarit Bhargava 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Cc: "H. Peter Anvin" 
> Cc: Tony Luck 
> Cc: "Peter Zijlstra (Intel)" 
> Cc: Sean Christopherson 
> Cc: Rahul Tanwar 
> Cc: Xiaoyao Li 
> Cc: Ricardo Neri 
> Cc: Dave Hansen 
> ---
> v2: Do not output a message if CPL 3 Alignment Check is turned on (xiaoyao.li)
> v3: refactor code (sean.j.christopherson)
> v4: Fix Sign off (sean.j.christopherson)
> v5: Fix Sign off (sean.j.christopherson)
> 
>  arch/x86/kernel/cpu/intel.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 63926c94eb5f..3a373f0be674 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1074,11 +1074,14 @@ static void split_lock_init(void)
>   split_lock_verify_msr(sld_state != sld_off);
>  }
>  
> -static void split_lock_warn(unsigned long ip)
> +static bool handle_split_lock(unsigned long ip)
>  {
> - pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
> 0x%lx\n",
> + pr_warn("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
>   current->comm, current->pid, ip);
>  
> + if (sld_state != sld_warn)
> + return false;
> +
>   /*
>* Disable the split lock detection for this task so it can make
>* progress and set TIF_SLD so the detection is re-enabled via
> @@ -1086,18 +1089,13 @@ static void split_lock_warn(unsigned long ip)
>*/
>   sld_update_msr(false);
>   set_tsk_thread_flag(current, TIF_SLD);
> + return true;
>  }
>  
>  bool handle_guest_split_lock(unsigned long ip)
>  {
> - if (sld_state == sld_warn) {
> - split_lock_warn(ip);
> + if (handle_split_lock(ip))
>   return true;
> - }
> -
> - pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
> -  current->comm, current->pid,
> -  sld_state == sld_fatal ? "fatal" : "bogus", ip);
>  
>   current->thread.error_code = 0;
>   current->thread.trap_nr = X86_TRAP_AC;
> @@ -1108,10 +1106,10 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
>  
>  bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>  {
> - if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> + if (regs->flags & X86_EFLAGS_AC)
>   return false;
> - split_lock_warn(regs->ip);
> - return true;
> +
> + return handle_split_lock(regs->ip);
>  }
>  
>  /*
> 



[PATCH] turbostat: Use sched_getcpu() instead of hardcoded cpu 0

2020-06-29 Thread Prarit Bhargava
Disabling cpu 0 results in an error

turbostat: /sys/devices/system/cpu/cpu0/topology/thread_siblings: open failed: 
No such file or directory

Use sched_getcpu() instead of a hardcoded cpu 0 to get the max cpu number.

Signed-off-by: Prarit Bhargava 
Cc: Len Brown 
Cc: linux...@vger.kernel.org
Cc: Srinivas Pandruvada 
---
 tools/power/x86/turbostat/turbostat.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/power/x86/turbostat/turbostat.c 
b/tools/power/x86/turbostat/turbostat.c
index 33b370865d16..2d3a3012f2f8 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -2769,12 +2769,19 @@ void re_initialize(void)
 void set_max_cpu_num(void)
 {
FILE *filep;
+   int base_cpu;
unsigned long dummy;
+   char pathname[64];
 
+   base_cpu = sched_getcpu();
+   if (base_cpu < 0)
+   err(1, "cannot find calling cpu ID");
+   sprintf(pathname,
+   "/sys/devices/system/cpu/cpu%d/topology/thread_siblings",
+   base_cpu);
+
+   filep = fopen_or_die(pathname, "r");
topo.max_cpu_num = 0;
-   filep = fopen_or_die(
-   "/sys/devices/system/cpu/cpu0/topology/thread_siblings",
-   "r");
while (fscanf(filep, "%lx,", ) == 1)
topo.max_cpu_num += BITMASK_SIZE;
fclose(filep);
-- 
2.21.3



[PATCH v5] x86/split_lock: Sanitize userspace and guest error output

2020-06-16 Thread Prarit Bhargava
There are two problems with kernel messages in fatal mode that were found
during testing of guests and userspace programs.

The first is that no kernel message is output when the split lock detector
is triggered with a userspace program.  As a result the userspace process
dies from receiving SIGBUS with no indication to the user of what caused
the process to die.

The second problem is that only the first triggering guest causes a kernel
message to be output because the message is output with pr_warn_once().
This also results in a loss of information to the user.

While fixing these I noticed that the same message was being output
three times so I'm cleaning that up too.

Fix fatal mode output, and use consistent messages for fatal and
warn modes for both userspace and guests.

Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: "Peter Zijlstra (Intel)" 
Cc: Sean Christopherson 
Cc: Rahul Tanwar 
Cc: Xiaoyao Li 
Cc: Ricardo Neri 
Cc: Dave Hansen 
---
v2: Do not output a message if CPL 3 Alignment Check is turned on (xiaoyao.li)
v3: refactor code (sean.j.christopherson)
v4: Fix Sign off (sean.j.christopherson)
v5: Fix Sign off (sean.j.christopherson)

 arch/x86/kernel/cpu/intel.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 63926c94eb5f..3a373f0be674 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1074,11 +1074,14 @@ static void split_lock_init(void)
split_lock_verify_msr(sld_state != sld_off);
 }
 
-static void split_lock_warn(unsigned long ip)
+static bool handle_split_lock(unsigned long ip)
 {
-   pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
0x%lx\n",
+   pr_warn("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
current->comm, current->pid, ip);
 
+   if (sld_state != sld_warn)
+   return false;
+
/*
 * Disable the split lock detection for this task so it can make
 * progress and set TIF_SLD so the detection is re-enabled via
@@ -1086,18 +1089,13 @@ static void split_lock_warn(unsigned long ip)
 */
sld_update_msr(false);
set_tsk_thread_flag(current, TIF_SLD);
+   return true;
 }
 
 bool handle_guest_split_lock(unsigned long ip)
 {
-   if (sld_state == sld_warn) {
-   split_lock_warn(ip);
+   if (handle_split_lock(ip))
return true;
-   }
-
-   pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
-current->comm, current->pid,
-sld_state == sld_fatal ? "fatal" : "bogus", ip);
 
current->thread.error_code = 0;
current->thread.trap_nr = X86_TRAP_AC;
@@ -1108,10 +1106,10 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
 
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-   if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+   if (regs->flags & X86_EFLAGS_AC)
return false;
-   split_lock_warn(regs->ip);
-   return true;
+
+   return handle_split_lock(regs->ip);
 }
 
 /*
-- 
2.21.3



Re: [PATCH v4] x86/split_lock: Sanitize userspace and guest error output

2020-06-16 Thread Prarit Bhargava
This was sent in error.

Sorry,

P.


On 6/16/20 7:27 AM, Prarit Bhargava wrote:
> There are two problems with kernel messages in fatal mode that were found
> during testing of guests and userspace programs.
> 
> The first is that no kernel message is output when the split lock detector
> is triggered with a userspace program.  As a result the userspace process
> dies from receiving SIGBUS with no indication to the user of what caused
> the process to die.
> 
> The second problem is that only the first triggering guest causes a kernel
> message to be output because the message is output with pr_warn_once().
> This also results in a loss of information to the user.
> 
> While fixing these I noticed that the same message was being output
> three times so I'm cleaning that up too.
> 
> Fix fatal mode output, and use consistent messages for fatal and
> warn modes for both userspace and guests.
> 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Cc: "H. Peter Anvin" 
> Cc: Tony Luck 
> Cc: "Peter Zijlstra (Intel)" 
> Cc: Sean Christopherson 
> Cc: Rahul Tanwar 
> Cc: Xiaoyao Li 
> Cc: Ricardo Neri 
> Cc: Dave Hansen 
> ---
> v2: Do not output a message if CPL 3 Alignment Check is turned on (xiaoyao.li)
> v3: refactor code (sean.j.christopherson)
> v4: Fix Sign off (sean.j.christopherson)
> 
>  arch/x86/kernel/cpu/intel.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 63926c94eb5f..3a373f0be674 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1074,11 +1074,14 @@ static void split_lock_init(void)
>   split_lock_verify_msr(sld_state != sld_off);
>  }
>  
> -static void split_lock_warn(unsigned long ip)
> +static bool handle_split_lock(unsigned long ip)
>  {
> - pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
> 0x%lx\n",
> + pr_warn("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
>   current->comm, current->pid, ip);
>  
> + if (sld_state != sld_warn)
> + return false;
> +
>   /*
>* Disable the split lock detection for this task so it can make
>* progress and set TIF_SLD so the detection is re-enabled via
> @@ -1086,18 +1089,13 @@ static void split_lock_warn(unsigned long ip)
>*/
>   sld_update_msr(false);
>   set_tsk_thread_flag(current, TIF_SLD);
> + return true;
>  }
>  
>  bool handle_guest_split_lock(unsigned long ip)
>  {
> - if (sld_state == sld_warn) {
> - split_lock_warn(ip);
> + if (handle_split_lock(ip))
>   return true;
> - }
> -
> - pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
> -  current->comm, current->pid,
> -  sld_state == sld_fatal ? "fatal" : "bogus", ip);
>  
>   current->thread.error_code = 0;
>   current->thread.trap_nr = X86_TRAP_AC;
> @@ -1108,10 +1106,10 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
>  
>  bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>  {
> - if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> + if (regs->flags & X86_EFLAGS_AC)
>   return false;
> - split_lock_warn(regs->ip);
> - return true;
> +
> + return handle_split_lock(regs->ip);
>  }
>  
>  /*
> 



[PATCH v4] x86/split_lock: Sanitize userspace and guest error output

2020-06-16 Thread Prarit Bhargava
There are two problems with kernel messages in fatal mode that were found
during testing of guests and userspace programs.

The first is that no kernel message is output when the split lock detector
is triggered with a userspace program.  As a result the userspace process
dies from receiving SIGBUS with no indication to the user of what caused
the process to die.

The second problem is that only the first triggering guest causes a kernel
message to be output because the message is output with pr_warn_once().
This also results in a loss of information to the user.

While fixing these I noticed that the same message was being output
three times so I'm cleaning that up too.

Fix fatal mode output, and use consistent messages for fatal and
warn modes for both userspace and guests.

Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: "Peter Zijlstra (Intel)" 
Cc: Sean Christopherson 
Cc: Rahul Tanwar 
Cc: Xiaoyao Li 
Cc: Ricardo Neri 
Cc: Dave Hansen 
---
v2: Do not output a message if CPL 3 Alignment Check is turned on (xiaoyao.li)
v3: refactor code (sean.j.christopherson)
v4: Fix Sign off (sean.j.christopherson)

 arch/x86/kernel/cpu/intel.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 63926c94eb5f..3a373f0be674 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1074,11 +1074,14 @@ static void split_lock_init(void)
split_lock_verify_msr(sld_state != sld_off);
 }
 
-static void split_lock_warn(unsigned long ip)
+static bool handle_split_lock(unsigned long ip)
 {
-   pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
0x%lx\n",
+   pr_warn("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
current->comm, current->pid, ip);
 
+   if (sld_state != sld_warn)
+   return false;
+
/*
 * Disable the split lock detection for this task so it can make
 * progress and set TIF_SLD so the detection is re-enabled via
@@ -1086,18 +1089,13 @@ static void split_lock_warn(unsigned long ip)
 */
sld_update_msr(false);
set_tsk_thread_flag(current, TIF_SLD);
+   return true;
 }
 
 bool handle_guest_split_lock(unsigned long ip)
 {
-   if (sld_state == sld_warn) {
-   split_lock_warn(ip);
+   if (handle_split_lock(ip))
return true;
-   }
-
-   pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
-current->comm, current->pid,
-sld_state == sld_fatal ? "fatal" : "bogus", ip);
 
current->thread.error_code = 0;
current->thread.trap_nr = X86_TRAP_AC;
@@ -1108,10 +1106,10 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
 
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-   if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+   if (regs->flags & X86_EFLAGS_AC)
return false;
-   split_lock_warn(regs->ip);
-   return true;
+
+   return handle_split_lock(regs->ip);
 }
 
 /*
-- 
2.21.3



[PATCH v3] x86/split_lock: Sanitize userspace and guest error output

2020-06-15 Thread Prarit Bhargava
There are two problems with kernel messages in fatal mode that were found
during testing of guests and userspace programs.

The first is that no kernel message is output when the split lock detector
is triggered with a userspace program.  As a result the userspace process
dies from receiving SIGBUS with no indication to the user of what caused
the process to die.

The second problem is that only the first triggering guest causes a kernel
message to be output because the message is output with pr_warn_once().
This also results in a loss of information to the user.

While fixing these I noticed that the same message was being output
three times so I'm cleaning that up too.

Fix fatal mode output, and use consistent messages for fatal and
warn modes for both userspace and guests.

Signed-off-by: Prarit Bhargava 
Signed-off-by: Sean Christopherson 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: "Peter Zijlstra (Intel)" 
Cc: Sean Christopherson 
Cc: Rahul Tanwar 
Cc: Xiaoyao Li 
Cc: Ricardo Neri 
Cc: Dave Hansen 
---
v2: Do not output a message if CPL 3 Alignment Check is turned on (xiaoyao.li)
v3: refactor code (sean.j.christopherson)

 arch/x86/kernel/cpu/intel.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 63926c94eb5f..3a373f0be674 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1074,11 +1074,14 @@ static void split_lock_init(void)
split_lock_verify_msr(sld_state != sld_off);
 }
 
-static void split_lock_warn(unsigned long ip)
+static bool handle_split_lock(unsigned long ip)
 {
-   pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
0x%lx\n",
+   pr_warn("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
current->comm, current->pid, ip);
 
+   if (sld_state != sld_warn)
+   return false;
+
/*
 * Disable the split lock detection for this task so it can make
 * progress and set TIF_SLD so the detection is re-enabled via
@@ -1086,18 +1089,13 @@ static void split_lock_warn(unsigned long ip)
 */
sld_update_msr(false);
set_tsk_thread_flag(current, TIF_SLD);
+   return true;
 }
 
 bool handle_guest_split_lock(unsigned long ip)
 {
-   if (sld_state == sld_warn) {
-   split_lock_warn(ip);
+   if (handle_split_lock(ip))
return true;
-   }
-
-   pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
-current->comm, current->pid,
-sld_state == sld_fatal ? "fatal" : "bogus", ip);
 
current->thread.error_code = 0;
current->thread.trap_nr = X86_TRAP_AC;
@@ -1108,10 +1106,10 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
 
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-   if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+   if (regs->flags & X86_EFLAGS_AC)
return false;
-   split_lock_warn(regs->ip);
-   return true;
+
+   return handle_split_lock(regs->ip);
 }
 
 /*
-- 
2.21.3



Re: [PATCH v2] x86/split_lock: Sanitize userspace and guest error output

2020-06-11 Thread Prarit Bhargava



On 6/8/20 1:15 PM, Sean Christopherson wrote:
> On Mon, Jun 08, 2020 at 08:21:14AM -0400, Prarit Bhargava wrote:
>> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>> index 166d7c355896..e02ec81fe1eb 100644
>> --- a/arch/x86/kernel/cpu/intel.c
>> +++ b/arch/x86/kernel/cpu/intel.c
>> @@ -1074,10 +1074,17 @@ static void split_lock_init(void)
>>  split_lock_verify_msr(sld_state != sld_off);
>>  }
>>  
>> -static void split_lock_warn(unsigned long ip)
>> +static bool split_lock_warn(unsigned long ip, int fatal_no_warn)
>>  {
>> -pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
>> 0x%lx\n",
>> -current->comm, current->pid, ip);
>> +if (fatal_no_warn)
>> +return false;
> 
> This misses the point Xiaoyao was making.  If EFLAGS.AC=1 then the #AC is a
> legacy alignment check fault and should not be treated as a split-lock #AC.
> The basic premise of the patch makes sense, but the end result is confusing
> because incorporating "fatal" and the EFLAGS.AC state into split_lock_warn()
> bastardizes both the "split_lock" and "warn" aspects of the function.
> 
> E.g. something like this yields the same net effect, it's just organized
> differently.  If so desired, the "bogus" message could be dropped via
> Xiaoyao's prep patch[*] so that this change would only affect the sld_fatal
> messages.
> 
> [*] https://lkml.kernel.org/r/20200509110542.8159-3-xiaoyao...@intel.com
> 
> 

Sean, I will just take your patch to make things easy.  I will add you as a
Signed-off-by.

/me is testing the patch right now

P.

> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 23fd5f319908..1aad0b8e394c 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1071,11 +1071,14 @@ static void split_lock_init(void)
> split_lock_verify_msr(sld_state != sld_off);
>  }
> 
> -static void split_lock_warn(unsigned long ip)
> +static bool handle_split_lock(unsigned long ip)
>  {
> pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
> 0x%lx\n",
> current->comm, current->pid, ip);
> 
> +   if (sld_state != sld_warn)
> +   return false;
> +
> /*
>  * Disable the split lock detection for this task so it can make
>  * progress and set TIF_SLD so the detection is re-enabled via
> @@ -1083,18 +1086,13 @@ static void split_lock_warn(unsigned long ip)
>  */
> sld_update_msr(false);
> set_tsk_thread_flag(current, TIF_SLD);
> +   return true;
>  }
> 
>  bool handle_guest_split_lock(unsigned long ip)
>  {
> -   if (sld_state == sld_warn) {
> -   split_lock_warn(ip);
> +   if (handle_split_lock(ip))
> return true;
> -   }
> -
> -   pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
> -current->comm, current->pid,
> -sld_state == sld_fatal ? "fatal" : "bogus", ip);
> 
> current->thread.error_code = 0;
> current->thread.trap_nr = X86_TRAP_AC;
> @@ -1105,10 +1103,10 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
> 
>  bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>  {
> -   if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +   if (regs->flags & X86_EFLAGS_AC)
> return false;
> -   split_lock_warn(regs->ip);
> -   return true;
> +
> +   return handle_split_lock(regs->ip);
>  }
> 
>  /*
> 
> 
>> +
>> +pr_warn_ratelimited("#AC: %s/%d %ssplit_lock trap at address: 0x%lx\n",
>> +current->comm, current->pid,
>> +sld_state == sld_fatal ? "fatal " : "", ip);
>> +
>> +if (sld_state == sld_fatal)
>> +return false;
>>  
>>  /*
>>   * Disable the split lock detection for this task so it can make
>> @@ -1086,18 +1093,13 @@ static void split_lock_warn(unsigned long ip)
>>   */
>>  sld_update_msr(false);
>>  set_tsk_thread_flag(current, TIF_SLD);
>> +return true;
>>  }
>>  
>>  bool handle_guest_split_lock(unsigned long ip)
>>  {
>> -if (sld_state == sld_warn) {
>> -split_lock_warn(ip);
>> +if (split_lock_warn(ip, 0))
>>  return true;
>> -}
>> -
>> -pr_warn_o

Re: [PATCH v2] x86/split_lock: Sanitize userspace and guest error output

2020-06-11 Thread Prarit Bhargava



On 6/8/20 1:15 PM, Sean Christopherson wrote:
> On Mon, Jun 08, 2020 at 08:21:14AM -0400, Prarit Bhargava wrote:
>> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>> index 166d7c355896..e02ec81fe1eb 100644
>> --- a/arch/x86/kernel/cpu/intel.c
>> +++ b/arch/x86/kernel/cpu/intel.c
>> @@ -1074,10 +1074,17 @@ static void split_lock_init(void)
>>  split_lock_verify_msr(sld_state != sld_off);
>>  }
>>  
>> -static void split_lock_warn(unsigned long ip)
>> +static bool split_lock_warn(unsigned long ip, int fatal_no_warn)
>>  {
>> -pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
>> 0x%lx\n",
>> -current->comm, current->pid, ip);
>> +if (fatal_no_warn)
>> +return false;
> 
> This misses the point Xiaoyao was making.  If EFLAGS.AC=1 then the #AC is a
> legacy alignment check fault and should not be treated as a split-lock #AC.
> The basic premise of the patch makes sense, but the end result is confusing
> because incorporating "fatal" and the EFLAGS.AC state into split_lock_warn()
> bastardizes both the "split_lock" and "warn" aspects of the function.
> 
> E.g. something like this yields the same net effect, it's just organized
> differently.  If so desired, the "bogus" message could be dropped via
> Xiaoyao's prep patch[*] so that this change would only affect the sld_fatal
> messages.
> 
> [*] https://lkml.kernel.org/r/20200509110542.8159-3-xiaoyao...@intel.com
> 


Sean, I'll just go with your patch below.  It's good enough.  I'll add a
Signed-off-by from you as well.

P.

> 
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 23fd5f319908..1aad0b8e394c 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1071,11 +1071,14 @@ static void split_lock_init(void)
> split_lock_verify_msr(sld_state != sld_off);
>  }
> 
> -static void split_lock_warn(unsigned long ip)
> +static bool handle_split_lock(unsigned long ip)
>  {
> pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
> 0x%lx\n",
> current->comm, current->pid, ip);
> 
> +   if (sld_state != sld_warn)
> +   return false;
> +
> /*
>  * Disable the split lock detection for this task so it can make
>  * progress and set TIF_SLD so the detection is re-enabled via
> @@ -1083,18 +1086,13 @@ static void split_lock_warn(unsigned long ip)
>  */
> sld_update_msr(false);
> set_tsk_thread_flag(current, TIF_SLD);
> +   return true;
>  }
> 
>  bool handle_guest_split_lock(unsigned long ip)
>  {
> -   if (sld_state == sld_warn) {
> -   split_lock_warn(ip);
> +   if (handle_split_lock(ip))
> return true;
> -   }
> -
> -   pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
> -current->comm, current->pid,
> -sld_state == sld_fatal ? "fatal" : "bogus", ip);
> 
> current->thread.error_code = 0;
> current->thread.trap_nr = X86_TRAP_AC;
> @@ -1105,10 +1103,10 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
> 
>  bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>  {
> -   if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +   if (regs->flags & X86_EFLAGS_AC)
> return false;
> -   split_lock_warn(regs->ip);
> -   return true;
> +
> +   return handle_split_lock(regs->ip);
>  }
> 
>  /*
> 
> 
>> +
>> +pr_warn_ratelimited("#AC: %s/%d %ssplit_lock trap at address: 0x%lx\n",
>> +current->comm, current->pid,
>> +sld_state == sld_fatal ? "fatal " : "", ip);
>> +
>> +if (sld_state == sld_fatal)
>> +return false;
>>  
>>  /*
>>   * Disable the split lock detection for this task so it can make
>> @@ -1086,18 +1093,13 @@ static void split_lock_warn(unsigned long ip)
>>   */
>>  sld_update_msr(false);
>>  set_tsk_thread_flag(current, TIF_SLD);
>> +return true;
>>  }
>>  
>>  bool handle_guest_split_lock(unsigned long ip)
>>  {
>> -if (sld_state == sld_warn) {
>> -split_lock_warn(ip);
>> +if (split_lock_warn(ip, 0))
>>  return true;
>> -}
>> -
>> -pr_warn_o

[PATCH v2] x86/split_lock: Sanitize userspace and guest error output

2020-06-08 Thread Prarit Bhargava
There are two problems with kernel messages in fatal mode that
were found during testing of guests and userspace programs.

The first is that no kernel message is output when the split lock detector
is triggered with a userspace program.  As a result the userspace process
dies from receiving SIGBUS with no indication to the user of what caused
the process to die.

The second problem is that only the first triggering guest causes a kernel
message to be output because the message is output with pr_warn_once().
This also results in a loss of information to the user.

While fixing these I noticed that the same message was being output
three times so I'm cleaning that up too.

Fix fatal mode output, and use consistent messages for fatal and
warn modes for both userspace and guests.

Signed-off-by: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: "Peter Zijlstra (Intel)" 
Cc: Sean Christopherson 
Cc: Rahul Tanwar 
Cc: Xiaoyao Li 
Cc: Ricardo Neri 
Cc: Dave Hansen 
---
v2: Do not output a message if CPL 3 Alignment Check is turned on (xiaoyao.li)

 arch/x86/kernel/cpu/intel.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 166d7c355896..e02ec81fe1eb 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1074,10 +1074,17 @@ static void split_lock_init(void)
split_lock_verify_msr(sld_state != sld_off);
 }
 
-static void split_lock_warn(unsigned long ip)
+static bool split_lock_warn(unsigned long ip, int fatal_no_warn)
 {
-   pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
0x%lx\n",
-   current->comm, current->pid, ip);
+   if (fatal_no_warn)
+   return false;
+
+   pr_warn_ratelimited("#AC: %s/%d %ssplit_lock trap at address: 0x%lx\n",
+   current->comm, current->pid,
+   sld_state == sld_fatal ? "fatal " : "", ip);
+
+   if (sld_state == sld_fatal)
+   return false;
 
/*
 * Disable the split lock detection for this task so it can make
@@ -1086,18 +1093,13 @@ static void split_lock_warn(unsigned long ip)
 */
sld_update_msr(false);
set_tsk_thread_flag(current, TIF_SLD);
+   return true;
 }
 
 bool handle_guest_split_lock(unsigned long ip)
 {
-   if (sld_state == sld_warn) {
-   split_lock_warn(ip);
+   if (split_lock_warn(ip, 0))
return true;
-   }
-
-   pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
-current->comm, current->pid,
-sld_state == sld_fatal ? "fatal" : "bogus", ip);
 
current->thread.error_code = 0;
current->thread.trap_nr = X86_TRAP_AC;
@@ -1108,10 +1110,7 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
 
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-   if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
-   return false;
-   split_lock_warn(regs->ip);
-   return true;
+   return split_lock_warn(regs->ip, regs->flags & X86_EFLAGS_AC);
 }
 
 /*
-- 
2.21.3



Re: [PATCH] x86/split_lock: Sanitize userspace and guest error output

2020-06-05 Thread Prarit Bhargava



On 6/5/20 11:29 AM, Xiaoyao Li wrote:
> On 6/5/2020 7:44 PM, Prarit Bhargava wrote:
>> There are two problems with kernel messages in fatal mode that
>> were found during testing of guests and userspace programs.
>>
>> The first is that no kernel message is output when the split lock detector
>> is triggered with a userspace program.  As a result the userspace process
>> dies from receiving SIGBUS with no indication to the user of what caused
>> the process to die.
>>
>> The second problem is that only the first triggering guest causes a kernel
>> message to be output because the message is output with pr_warn_once().
>> This also results in a loss of information to the user.
>>
>> While fixing these I noticed that the same message was being output
>> three times so I'm cleaning that up too.
>>
>> Fix fatal mode output, and use consistent messages for fatal and
>> warn modes for both userspace and guests.
>>
>> Signed-off-by: Prarit Bhargava 
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: x...@kernel.org
>> Cc: "H. Peter Anvin" 
>> Cc: Tony Luck 
>> Cc: "Peter Zijlstra (Intel)" 
>> Cc: Sean Christopherson 
>> Cc: Rahul Tanwar 
>> Cc: Xiaoyao Li 
>> Cc: Ricardo Neri 
>> Cc: Dave Hansen 
>> ---
>>   arch/x86/kernel/cpu/intel.c | 24 ++--
>>   1 file changed, 10 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>> index 166d7c355896..463022aa9b7a 100644
>> --- a/arch/x86/kernel/cpu/intel.c
>> +++ b/arch/x86/kernel/cpu/intel.c
>> @@ -1074,10 +1074,14 @@ static void split_lock_init(void)
>>   split_lock_verify_msr(sld_state != sld_off);
>>   }
>>   -static void split_lock_warn(unsigned long ip)
>> +static bool split_lock_warn(unsigned long ip, int fatal)
>>   {
>> -    pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
>> 0x%lx\n",
>> -    current->comm, current->pid, ip);
>> +    pr_warn_ratelimited("#AC: %s/%d %ssplit_lock trap at address: 0x%lx\n",
>> +    current->comm, current->pid,
>> +    sld_state == sld_fatal ? "fatal " : "", ip);
>> +
>> +    if (sld_state == sld_fatal || fatal)
>> +    return false;
>>     /*
>>    * Disable the split lock detection for this task so it can make
>> @@ -1086,18 +1090,13 @@ static void split_lock_warn(unsigned long ip)
>>    */
>>   sld_update_msr(false);
>>   set_tsk_thread_flag(current, TIF_SLD);
>> +    return true;
>>   }
>>     bool handle_guest_split_lock(unsigned long ip)
>>   {
>> -    if (sld_state == sld_warn) {
>> -    split_lock_warn(ip);
>> +    if (split_lock_warn(ip, 0))
>>   return true;
>> -    }
>> -
>> -    pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
>> - current->comm, current->pid,
>> - sld_state == sld_fatal ? "fatal" : "bogus", ip);
>>     current->thread.error_code = 0;
>>   current->thread.trap_nr = X86_TRAP_AC;
>> @@ -1108,10 +1107,7 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
>>     bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>>   {
>> -    if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
>> -    return false;
>> -    split_lock_warn(regs->ip);
>> -    return true;
>> +    return split_lock_warn(regs->ip, regs->flags & X86_EFLAGS_AC);
> 
> It's incorrect. You change the behavior that it will print the split lock
> warning even when CPL 3 Alignment Check is turned on.

Do you want the message to be displayed in the fatal case of CPL 3 Alignment 
check?

P.



[PATCH] x86/split_lock: Sanitize userspace and guest error output

2020-06-05 Thread Prarit Bhargava
There are two problems with kernel messages in fatal mode that
were found during testing of guests and userspace programs.

The first is that no kernel message is output when the split lock detector
is triggered with a userspace program.  As a result the userspace process
dies from receiving SIGBUS with no indication to the user of what caused
the process to die.

The second problem is that only the first triggering guest causes a kernel
message to be output because the message is output with pr_warn_once().
This also results in a loss of information to the user.

While fixing these I noticed that the same message was being output
three times so I'm cleaning that up too.

Fix fatal mode output, and use consistent messages for fatal and
warn modes for both userspace and guests.

Signed-off-by: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: "Peter Zijlstra (Intel)" 
Cc: Sean Christopherson 
Cc: Rahul Tanwar 
Cc: Xiaoyao Li 
Cc: Ricardo Neri 
Cc: Dave Hansen 
---
 arch/x86/kernel/cpu/intel.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 166d7c355896..463022aa9b7a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1074,10 +1074,14 @@ static void split_lock_init(void)
split_lock_verify_msr(sld_state != sld_off);
 }
 
-static void split_lock_warn(unsigned long ip)
+static bool split_lock_warn(unsigned long ip, int fatal)
 {
-   pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 
0x%lx\n",
-   current->comm, current->pid, ip);
+   pr_warn_ratelimited("#AC: %s/%d %ssplit_lock trap at address: 0x%lx\n",
+   current->comm, current->pid,
+   sld_state == sld_fatal ? "fatal " : "", ip);
+
+   if (sld_state == sld_fatal || fatal)
+   return false;
 
/*
 * Disable the split lock detection for this task so it can make
@@ -1086,18 +1090,13 @@ static void split_lock_warn(unsigned long ip)
 */
sld_update_msr(false);
set_tsk_thread_flag(current, TIF_SLD);
+   return true;
 }
 
 bool handle_guest_split_lock(unsigned long ip)
 {
-   if (sld_state == sld_warn) {
-   split_lock_warn(ip);
+   if (split_lock_warn(ip, 0))
return true;
-   }
-
-   pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
-current->comm, current->pid,
-sld_state == sld_fatal ? "fatal" : "bogus", ip);
 
current->thread.error_code = 0;
current->thread.trap_nr = X86_TRAP_AC;
@@ -1108,10 +1107,7 @@ EXPORT_SYMBOL_GPL(handle_guest_split_lock);
 
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-   if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
-   return false;
-   split_lock_warn(regs->ip);
-   return true;
+   return split_lock_warn(regs->ip, regs->flags & X86_EFLAGS_AC);
 }
 
 /*
-- 
2.21.3



Re: [PATCH v2 0/3] printk: replace ringbuffer

2020-05-13 Thread Prarit Bhargava



On 5/1/20 5:40 AM, John Ogness wrote:
> Hello,
> 
> Here is a v2 for the first series to rework the printk subsystem. The
> v1 and history are here [0]. This first series only replaces the
> existing ringbuffer implementation. No locking is removed. No
> semantics/behavior of printk are changed.
> 
> The VMCOREINFO is updated. RFC patches for the external tools
> crash(8) [1] and makedumpfile(8) [2] have been submitted that allow
> the new ringbuffer to be correctly read.
> 
> This series is in line with the agreements [3] made at the meeting
> during LPC2019 in Lisbon, with 1 exception: support for dictionaries
> will not be discontinued [4]. Dictionaries are stored in a separate
> buffer so that they cannot interfere with the human-readable buffer.
> 

Hey John,

I tested this on two 128-cpu ppc64le power8 boxes, and a 64 core x86_64 box
using your prb-test testsuite with the latest upstream kernel.  The tests ran
successfully for ~16 hours without any problems.

P.



[PATCH] intel-speed-select: Fix json perf-profile output output

2020-05-11 Thread Prarit Bhargava
The 'intel-speed-select -f json perf-profile get-lock-status' command
outputs the package, die, and cpu data as separate fields.

ex)

  "package-0": {
"die-0": {
  "cpu-0": {

Commit 74062363f855 ("tools/power/x86/intel-speed-select: Avoid duplicate 
Package strings for json") prettied this output so that it is a single line for
some json output commands and the same should be done for other commands.

Output package, die, and cpu info in a single line when using json output.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: platform-driver-...@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c | 26 +--
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index f6e2ce181123..e105fece47b6 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -316,21 +316,31 @@ void isst_ctdp_display_core_info(int cpu, FILE *outf, 
char *prefix,
 {
char header[256];
char value[256];
+   int level = 1;
+
+   if (out_format_is_json()) {
+   snprintf(header, sizeof(header), "package-%d:die-%d:cpu-%d",
+get_physical_package_id(cpu), get_physical_die_id(cpu),
+cpu);
+   format_and_print(outf, level++, header, NULL);
+   } else {
+   snprintf(header, sizeof(header), "package-%d",
+get_physical_package_id(cpu));
+   format_and_print(outf, level++, header, NULL);
+   snprintf(header, sizeof(header), "die-%d",
+get_physical_die_id(cpu));
+   format_and_print(outf, level++, header, NULL);
+   snprintf(header, sizeof(header), "cpu-%d", cpu);
+   format_and_print(outf, level++, header, NULL);
+   }
 
-   snprintf(header, sizeof(header), "package-%d",
-get_physical_package_id(cpu));
-   format_and_print(outf, 1, header, NULL);
-   snprintf(header, sizeof(header), "die-%d", get_physical_die_id(cpu));
-   format_and_print(outf, 2, header, NULL);
-   snprintf(header, sizeof(header), "cpu-%d", cpu);
-   format_and_print(outf, 3, header, NULL);
if (str0 && !val)
snprintf(value, sizeof(value), "%s", str0);
else if (str1 && val)
snprintf(value, sizeof(value), "%s", str1);
else
snprintf(value, sizeof(value), "%u", val);
-   format_and_print(outf, 4, prefix, value);
+   format_and_print(outf, level, prefix, value);
 
format_and_print(outf, 1, NULL, NULL);
 }
-- 
2.18.4



Re: [PATCH v3 0/6] Add CascadeLake-N Support

2019-10-14 Thread Prarit Bhargava



On 10/7/19 3:30 PM, Srinivas Pandruvada wrote:
> Add support for SST-BF on CascadeLake-N support.  The CascadeLake-N
> processor only support SST-BF and not other SST functionality.
>

Sorry Srinivas, was away from keyboard all last week :(

> v3:
> Fix crash due to geline

^^^ curious how you hit this?  I was repeatedly testing and couldn't
get it to happen.

> Fix display to perf-profile info and base-freq info command
> Fix output for coremask
> Fix base frequency CPU list. This should be displayed for a package
> Auto mode support for base-freq enable/disable
> One of the patch for config only change folded to next one where it is
> used.
>
> The patch 1 has nothing to do with the CLX-N. It saves some bytes in the
> size.
>

Reviewed-by: Prarit Bhargava 

P.


Re: [PATCH v2 3/7] intel-speed-select: Add check for CascadeLake-N models

2019-10-04 Thread Prarit Bhargava



On 10/4/19 1:15 PM, Srinivas Pandruvada wrote:
> On Thu, 2019-10-03 at 08:11 -0400, Prarit Bhargava wrote:
>> Three CascadeLake-N models (6252N, 6230N, and 5218N) have SST-PBF
>> support.
>>
>> Return an error if the CascadeLake processor is not one of these
>> specific
>> models.
>>
> This patch sigfaults immediately on CLX.>
>> v2: Add is_clx_n_platform()
>>
>> Signed-off-by: Prarit Bhargava 
>> ---
>>  .../x86/intel-speed-select/isst-config.c  | 44
>> ++-
>>  1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/power/x86/intel-speed-select/isst-config.c
>> b/tools/power/x86/intel-speed-select/isst-config.c
>> index f4a23678416e..734a7960458c 100644
>> --- a/tools/power/x86/intel-speed-select/isst-config.c
>> +++ b/tools/power/x86/intel-speed-select/isst-config.c
>> @@ -23,6 +23,7 @@ static int debug_flag;
>>  static FILE *outf;
>>  
>>  static int cpu_model;
>> +static int cpu_stepping;
>>  
>>  #define MAX_CPUS_IN_ONE_REQ 64
>>  static short max_target_cpus;
>> @@ -72,7 +73,16 @@ void debug_printf(const char *format, ...)
>>  va_end(args);
>>  }
>>  
>> -static void update_cpu_model(void)
>> +
>> +int is_clx_n_platform(void)
>> +{
>> +if (cpu_model == 0x55)
>> +if (cpu_stepping == 0x6 || cpu_stepping == 0x7)
>> +return 1;
>> +return 0;
>> +}
>> +
>> +static int update_cpu_model(void)
>>  {
>>  unsigned int ebx, ecx, edx;
>>  unsigned int fms, family;
>> @@ -82,6 +92,34 @@ static void update_cpu_model(void)
>>  cpu_model = (fms >> 4) & 0xf;
>>  if (family == 6 || family == 0xf)
>>  cpu_model += ((fms >> 16) & 0xf) << 4;
>> +
>> +cpu_stepping = fms & 0xf;
>> +
>> +/* only three CascadeLake-N models are supported */
>> +if (is_clx_n_platform()) {
>> +FILE *fp;
>> +size_t n;
>> +char *line;
> Need n = 0 and *line = NULL here as getline() will require if it has to
> allocate.

Odd, I ran multiple tests and never hit the segfault :(.  Thanks for fixing.

P.

> 
> Anyway I will update the patchset and post after test.
> 

> Thanks,
> Srinivas
>> +int ret = 1;
>> +
>> +fp = fopen("/proc/cpuinfo", "r");
>> +if (!fp)
>> +err(-1, "cannot open /proc/cpuinfo\n");
>> +
>> +while (getline(, , fp) > 0) {
>> +if (strstr(line, "model name")) {
>> +if (strstr(line, "6252N") ||
>> +strstr(line, "6230N") ||
>> +strstr(line, "5218N"))
>> +ret = 0;
>> +break;
>> +}
>> +}
>> +free(line);
>> +fclose(fp);
>> +return ret;
>> +}
>> +return 0;
>>  }
>>  
>>  /* Open a file, and exit on failure */
>> @@ -1889,7 +1927,9 @@ static void cmdline(int argc, char **argv)
>>  fprintf(stderr, "Feature name and|or command not
>> specified\n");
>>  exit(0);
>>  }
>> -update_cpu_model();
>> +ret = update_cpu_model();
>> +if (ret)
>> +err(-1, "Invalid CPU model (%d)\n", cpu_model);
>>  printf("Intel(R) Speed Select Technology\n");
>>  printf("Executing on CPU model:%d[0x%x]\n", cpu_model,
>> cpu_model);
>>  set_max_cpu_num();
> 


[PATCH v2 6/7] intel-speed-select: Implement 'perf-profile info' on CascadeLake-N

2019-10-03 Thread Prarit Bhargava
Add functionality for perf-profile info on CascadeLake-N.

Signed-off-by: Prarit Bhargava 
---
 .../x86/intel-speed-select/isst-config.c  | 20 +++
 .../x86/intel-speed-select/isst-display.c | 12 +++
 tools/power/x86/intel-speed-select/isst.h |  1 +
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 3ab0edade5ec..d9b580139a07 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -836,6 +836,12 @@ static void clx_n_config(void)
ctdp_level->fact_enabled = 0;
 }
 
+static void dump_cn_config_for_cpu(int cpu, void *arg1, void *arg2,
+  void *arg3, void *arg4)
+{
+   isst_ctdp_display_information(cpu, outf, tdp_level, _n_pkg_dev);
+}
+
 static void dump_isst_config_for_cpu(int cpu, void *arg1, void *arg2,
 void *arg3, void *arg4)
 {
@@ -854,6 +860,8 @@ static void dump_isst_config_for_cpu(int cpu, void *arg1, 
void *arg2,
 
 static void dump_isst_config(int arg)
 {
+   void *fn;
+
if (cmd_help) {
fprintf(stderr,
"Print Intel(R) Speed Select Technology Performance 
profile configuration\n");
@@ -865,14 +873,17 @@ static void dump_isst_config(int arg)
exit(0);
}
 
+   if (!is_clx_n_platform())
+   fn = dump_isst_config_for_cpu;
+   else
+   fn = dump_cn_config_for_cpu;
+
isst_ctdp_display_information_start(outf);
 
if (max_target_cpus)
-   for_each_online_target_cpu_in_set(dump_isst_config_for_cpu,
- NULL, NULL, NULL, NULL);
+   for_each_online_target_cpu_in_set(fn, NULL, NULL, NULL, NULL);
else
-   for_each_online_package_in_set(dump_isst_config_for_cpu, NULL,
-  NULL, NULL, NULL);
+   for_each_online_package_in_set(fn, NULL, NULL, NULL, NULL);
 
isst_ctdp_display_information_end(outf);
 }
@@ -1633,6 +1644,7 @@ static void get_clos_assoc(int arg)
 }
 
 static struct process_cmd_struct clx_n_cmds[] = {
+   { "perf-profile", "info", dump_isst_config, 0 },
{ NULL, NULL, NULL, 0 }
 };
 
diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 1caa7ae25245..8309810e7425 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -202,6 +202,9 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
 pbf_info->p1_low * DISP_FREQ_MULTIPLIER);
format_and_print(outf, disp_level + 1, header, value);
 
+   if (is_clx_n_platform())
+   return;
+
snprintf(header, sizeof(header), "tjunction-temperature(C)");
snprintf(value, sizeof(value), "%d", pbf_info->t_prochot);
format_and_print(outf, disp_level + 1, header, value);
@@ -375,6 +378,15 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "unsupported");
format_and_print(outf, base_level + 4, header, value);
 
+   if (is_clx_n_platform()) {
+   if (ctdp_level->pbf_support)
+   _isst_pbf_display_information(cpu, outf,
+ tdp_level,
+ _level->pbf_info,
+ base_level + 4);
+   continue;
+   }
+
snprintf(header, sizeof(header), "thermal-design-power(W)");
snprintf(value, sizeof(value), "%d", ctdp_level->pkg_tdp);
format_and_print(outf, base_level + 4, header, value);
diff --git a/tools/power/x86/intel-speed-select/isst.h 
b/tools/power/x86/intel-speed-select/isst.h
index 0dcae17b3945..bef27bd6138e 100644
--- a/tools/power/x86/intel-speed-select/isst.h
+++ b/tools/power/x86/intel-speed-select/isst.h
@@ -239,4 +239,5 @@ extern void isst_display_result(int cpu, FILE *outf, char 
*feature, char *cmd,
 extern int isst_clos_get_clos_information(int cpu, int *enable, int *type);
 extern void isst_clos_display_clos_information(int cpu, FILE *outf,
   int clos_enable, int type);
+extern int is_clx_n_platform(void);
 #endif
-- 
2.18.1



[PATCH v2 7/7] intel-speed-select: Implement base-freq commands on CascadeLake-N

2019-10-03 Thread Prarit Bhargava
Add functionality for base-freq info|enable|disable info on CascadeLake-N.

The enable command always returns success, and the disable command always
returns failed because SST-BF cannot be enabled or disabled from the OS on
CascadeLake-N.

Signed-off-by: Prarit Bhargava 
---
 .../x86/intel-speed-select/isst-config.c  | 68 +--
 1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index d9b580139a07..7c7f0551eda1 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -836,7 +836,7 @@ static void clx_n_config(void)
ctdp_level->fact_enabled = 0;
 }
 
-static void dump_cn_config_for_cpu(int cpu, void *arg1, void *arg2,
+static void dump_clx_n_config_for_cpu(int cpu, void *arg1, void *arg2,
   void *arg3, void *arg4)
 {
isst_ctdp_display_information(cpu, outf, tdp_level, _n_pkg_dev);
@@ -876,7 +876,7 @@ static void dump_isst_config(int arg)
if (!is_clx_n_platform())
fn = dump_isst_config_for_cpu;
else
-   fn = dump_cn_config_for_cpu;
+   fn = dump_clx_n_config_for_cpu;
 
isst_ctdp_display_information_start(outf);
 
@@ -951,6 +951,18 @@ static void set_tdp_level(int arg)
isst_ctdp_display_information_end(outf);
 }
 
+static void clx_n_dump_pbf_config_for_cpu(int cpu, void *arg1, void *arg2,
+  void *arg3, void *arg4)
+{
+   struct isst_pbf_info *pbf_info;
+   struct isst_pkg_ctdp_level_info *ctdp_level;
+
+   ctdp_level = _n_pkg_dev.ctdp_level[0];
+   pbf_info = _level->pbf_info;
+
+   isst_pbf_display_information(cpu, outf, tdp_level, pbf_info);
+}
+
 static void dump_pbf_config_for_cpu(int cpu, void *arg1, void *arg2, void 
*arg3,
void *arg4)
 {
@@ -968,6 +980,8 @@ static void dump_pbf_config_for_cpu(int cpu, void *arg1, 
void *arg2, void *arg3,
 
 static void dump_pbf_config(int arg)
 {
+   void *fn;
+
if (cmd_help) {
fprintf(stderr,
"Print Intel(R) Speed Select Technology base frequency 
configuration for a TDP level\n");
@@ -981,13 +995,18 @@ static void dump_pbf_config(int arg)
exit(1);
}
 
+   if (!is_clx_n_platform())
+   fn = dump_pbf_config_for_cpu;
+   else
+   fn = clx_n_dump_pbf_config_for_cpu;
+
isst_ctdp_display_information_start(outf);
+
if (max_target_cpus)
-   for_each_online_target_cpu_in_set(dump_pbf_config_for_cpu, NULL,
- NULL, NULL, NULL);
+   for_each_online_target_cpu_in_set(fn, NULL, NULL, NULL, NULL);
else
-   for_each_online_package_in_set(dump_pbf_config_for_cpu, NULL,
-  NULL, NULL, NULL);
+   for_each_online_package_in_set(fn, NULL, NULL, NULL, NULL);
+
isst_ctdp_display_information_end(outf);
 }
 
@@ -1165,21 +1184,27 @@ static void set_pbf_for_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
int ret;
int status = *(int *)arg4;
 
-   ret = isst_set_pbf_fact_status(cpu, 1, status);
-   if (ret) {
-   perror("isst_set_pbf");
-   } else {
-   if (status) {
-   if (auto_mode)
-   ret = set_pbf_core_power(cpu);
-   isst_display_result(cpu, outf, "base-freq", "enable",
-   ret);
-   } else {
-   if (auto_mode)
-   isst_pm_qos_config(cpu, 0, 0);
-   isst_display_result(cpu, outf, "base-freq", "disable",
-   ret);
+   if (!is_clx_n_platform()) {
+   ret = isst_set_pbf_fact_status(cpu, 1, status);
+   if (ret) {
+   perror("isst_set_pbf");
+   return;
}
+   } else {
+   if (status == 0)
+   ret = -1;
+   else
+   ret = 0;
+   }
+
+   if (status) {
+   if (auto_mode)
+   ret = set_pbf_core_power(cpu);
+   isst_display_result(cpu, outf, "base-freq", "enable", ret);
+   } else {
+   if (auto_mode)
+   isst_pm_qos_config(cpu, 0, 0);
+   isst_display_result(cpu, outf, "base-freq", "disable", ret);
}
 }
 
@@ -1645,6 +1670,9 @@ static void get_clos_assoc(int arg)
 
 static struct process_cmd_struct clx_n_cmds[] = {
{ "perf-profile", "in

[PATCH v2 3/7] intel-speed-select: Add check for CascadeLake-N models

2019-10-03 Thread Prarit Bhargava
Three CascadeLake-N models (6252N, 6230N, and 5218N) have SST-PBF support.

Return an error if the CascadeLake processor is not one of these specific
models.

v2: Add is_clx_n_platform()

Signed-off-by: Prarit Bhargava 
---
 .../x86/intel-speed-select/isst-config.c  | 44 ++-
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index f4a23678416e..734a7960458c 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -23,6 +23,7 @@ static int debug_flag;
 static FILE *outf;
 
 static int cpu_model;
+static int cpu_stepping;
 
 #define MAX_CPUS_IN_ONE_REQ 64
 static short max_target_cpus;
@@ -72,7 +73,16 @@ void debug_printf(const char *format, ...)
va_end(args);
 }
 
-static void update_cpu_model(void)
+
+int is_clx_n_platform(void)
+{
+   if (cpu_model == 0x55)
+   if (cpu_stepping == 0x6 || cpu_stepping == 0x7)
+   return 1;
+   return 0;
+}
+
+static int update_cpu_model(void)
 {
unsigned int ebx, ecx, edx;
unsigned int fms, family;
@@ -82,6 +92,34 @@ static void update_cpu_model(void)
cpu_model = (fms >> 4) & 0xf;
if (family == 6 || family == 0xf)
cpu_model += ((fms >> 16) & 0xf) << 4;
+
+   cpu_stepping = fms & 0xf;
+
+   /* only three CascadeLake-N models are supported */
+   if (is_clx_n_platform()) {
+   FILE *fp;
+   size_t n;
+   char *line;
+   int ret = 1;
+
+   fp = fopen("/proc/cpuinfo", "r");
+   if (!fp)
+   err(-1, "cannot open /proc/cpuinfo\n");
+
+   while (getline(, , fp) > 0) {
+   if (strstr(line, "model name")) {
+   if (strstr(line, "6252N") ||
+   strstr(line, "6230N") ||
+   strstr(line, "5218N"))
+   ret = 0;
+   break;
+   }
+   }
+   free(line);
+   fclose(fp);
+   return ret;
+   }
+   return 0;
 }
 
 /* Open a file, and exit on failure */
@@ -1889,7 +1927,9 @@ static void cmdline(int argc, char **argv)
fprintf(stderr, "Feature name and|or command not specified\n");
exit(0);
}
-   update_cpu_model();
+   ret = update_cpu_model();
+   if (ret)
+   err(-1, "Invalid CPU model (%d)\n", cpu_model);
printf("Intel(R) Speed Select Technology\n");
printf("Executing on CPU model:%d[0x%x]\n", cpu_model, cpu_model);
set_max_cpu_num();
-- 
2.18.1



[PATCH v2 0/7] Add CascadeLake-N Support

2019-10-03 Thread Prarit Bhargava
Add support for SST-BF on CascadeLake-N support.  The CascadeLake-N
processor only support SST-BF and not other SST functionality.

v2: Updated with comments from Srinivas (use common clx_n_* function names,
common is_clx_n_platform() function call to identify CascadeLake-N)

Signed-off-by: Prarit Bhargava 

Prarit Bhargava (7):
  intel-speed-select: Add int argument to command functions
  intel-speed-select: Make process_command generic
  intel-speed-select: Add check for CascadeLake-N models
  intel-speed-select: Add configuration for CascadeLake-N
  intel-speed-select: Implement CascadeLake-N help and command functions
structures
  intel-speed-select: Implement 'perf-profile info' on CascadeLake-N
  intel-speed-select: Implement base-freq commands on CascadeLake-N

 .../x86/intel-speed-select/isst-config.c  | 493 --
 .../x86/intel-speed-select/isst-display.c |  14 +-
 tools/power/x86/intel-speed-select/isst.h |   3 +
 3 files changed, 339 insertions(+), 171 deletions(-)

-- 
2.18.1



[PATCH v2 1/7] intel-speed-select: Add int argument to command functions

2019-10-03 Thread Prarit Bhargava
The current code structure has similar but separate command functions for
the enable and disable operations.  This can be improved by adding an int
argument to the command function structure, and interpreting 1 as enable
and 0 as disable.  This change results in the removal of the disable
command functions.

Add int argument to the command function structure.

Signed-off-by: Prarit Bhargava 
---
 .../x86/intel-speed-select/isst-config.c  | 216 +++---
 1 file changed, 88 insertions(+), 128 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index edc4c50853bd..1b2899706857 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -11,7 +11,8 @@
 struct process_cmd_struct {
char *feature;
char *command;
-   void (*process_fn)(void);
+   void (*process_fn)(int arg);
+   int arg;
 };
 
 static const char *version_str = "v1.0";
@@ -679,7 +680,7 @@ static void exec_on_get_ctdp_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
 }
 
 #define _get_tdp_level(desc, suffix, object, help) 
   \
-   static void get_tdp_##object(void)  
  \
+   static void get_tdp_##object(int arg)   
 \
{   
  \
struct isst_pkg_ctdp ctdp;  
  \
 \
@@ -725,7 +726,7 @@ static void dump_isst_config_for_cpu(int cpu, void *arg1, 
void *arg2,
}
 }
 
-static void dump_isst_config(void)
+static void dump_isst_config(int arg)
 {
if (cmd_help) {
fprintf(stderr,
@@ -788,7 +789,7 @@ static void set_tdp_level_for_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
}
 }
 
-static void set_tdp_level(void)
+static void set_tdp_level(int arg)
 {
if (cmd_help) {
fprintf(stderr, "Set Config TDP level\n");
@@ -828,7 +829,7 @@ static void dump_pbf_config_for_cpu(int cpu, void *arg1, 
void *arg2, void *arg3,
}
 }
 
-static void dump_pbf_config(void)
+static void dump_pbf_config(int arg)
 {
if (cmd_help) {
fprintf(stderr,
@@ -1045,51 +1046,36 @@ static void set_pbf_for_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
}
 }
 
-static void set_pbf_enable(void)
+static void set_pbf_enable(int arg)
 {
-   int status = 1;
+   int enable = arg;
 
if (cmd_help) {
-   fprintf(stderr,
-   "Enable Intel Speed Select Technology base frequency 
feature\n");
-   fprintf(stderr,
-   "\tOptional Arguments: -a|--auto : Use priority of 
cores to set core-power associations\n");
-
-   exit(0);
-   }
-
-   isst_ctdp_display_information_start(outf);
-   if (max_target_cpus)
-   for_each_online_target_cpu_in_set(set_pbf_for_cpu, NULL, NULL,
- NULL, );
-   else
-   for_each_online_package_in_set(set_pbf_for_cpu, NULL, NULL,
-  NULL, );
-   isst_ctdp_display_information_end(outf);
-}
-
-static void set_pbf_disable(void)
-{
-   int status = 0;
+   if (enable) {
+   fprintf(stderr,
+   "Enable Intel Speed Select Technology base 
frequency feature\n");
+   fprintf(stderr,
+   "\tOptional Arguments: -a|--auto : Use priority 
of cores to set core-power associations\n");
+   } else {
 
-   if (cmd_help) {
-   fprintf(stderr,
-   "Disable Intel Speed Select Technology base frequency 
feature\n");
-   fprintf(stderr,
-   "\tOptional Arguments: -a|--auto : Also disable 
core-power associations\n");
+   fprintf(stderr,
+   "Disable Intel Speed Select Technology base 
frequency feature\n");
+   fprintf(stderr,
+   "\tOptional Arguments: -a|--auto : Also disable 
core-power associations\n");
+   }
exit(0);
}
 
isst_ctdp_display_information_start(outf);
if (max_target_cpus)
for_each_online_target_cpu_in_set(set_pbf_for_cpu, NULL, NULL,
- NULL, );
+ NULL, );
else
for_each_online_package_in_set(set_pbf_for_cpu, NULL, NULL,
-  NULL, );
+  NULL, );
isst_ctdp_display_information_end(outf);
 
-   if (auto_

[PATCH v2 4/7] intel-speed-select: Add configuration for CascadeLake-N

2019-10-03 Thread Prarit Bhargava
Create a 'dummy' pkg_dev struct for use by CascadeLake-N processors.  This
struct will be used in later patches to implement info and status calls
for CascadeLake-N SST-BF.

Signed-off-by: Prarit Bhargava 
---
 .../x86/intel-speed-select/isst-config.c  | 98 ++-
 .../x86/intel-speed-select/isst-display.c |  2 -
 tools/power/x86/intel-speed-select/isst.h |  2 +
 3 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 734a7960458c..164c4e5e6ccb 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -748,6 +748,94 @@ _get_tdp_level("get-config-current_level", levels, 
current_level,
   "Current TDP Level");
 _get_tdp_level("get-lock-status", levels, locked, "TDP lock status");
 
+struct isst_pkg_ctdp clx_n_pkg_dev;
+
+static int clx_n_get_base_ratio(void)
+{
+   FILE *fp;
+   char *begin, *end, *line;
+   char number[5];
+   float value = 0;
+   size_t n;
+
+   fp = fopen("/proc/cpuinfo", "r");
+   if (!fp)
+   err(-1, "cannot open /proc/cpuinfo\n");
+
+   while (getline(, , fp) > 0) {
+   if (strstr(line, "model name")) {
+   /* this is true for CascadeLake-N */
+   begin = strstr(line, "@ ") + 2;
+   end = strstr(line, "GHz");
+   strncpy(number, begin, end - begin);
+   value = atof(number) * 10;
+   break;
+   }
+   }
+   free(line);
+   fclose(fp);
+
+   return (int)(value);
+}
+
+static void clx_n_config(void)
+{
+   int i;
+   unsigned long cpu_bf;
+   struct isst_pkg_ctdp_level_info *ctdp_level;
+   struct isst_pbf_info *pbf_info;
+
+   ctdp_level = _n_pkg_dev.ctdp_level[0];
+   pbf_info = _level->pbf_info;
+
+   /* find the frequency base ratio */
+   ctdp_level->tdp_ratio = clx_n_get_base_ratio();
+   if (ctdp_level->tdp_ratio == 0)
+   err(-1, "cn base ratio is zero\n");
+
+   /* find the high and low priority frequencies */
+   pbf_info->p1_high = 0;
+   pbf_info->p1_low = ~0;
+   for (i = 0; i < topo_max_cpus; i++) {
+   cpu_bf = parse_int_file(1,
+   "/sys/devices/system/cpu/cpu%d/cpufreq/base_frequency",
+   i);
+   if (cpu_bf > pbf_info->p1_high)
+   pbf_info->p1_high = cpu_bf;
+   if (cpu_bf < pbf_info->p1_low)
+   pbf_info->p1_low = cpu_bf;
+   }
+
+   if (pbf_info->p1_high == ~0UL)
+   err(-1, "maximum base frequency not set\n");
+
+   if (pbf_info->p1_low == 0)
+   err(-1, "minimum base frequency not set\n");
+
+   /* convert frequencies back to ratios */
+   pbf_info->p1_high = pbf_info->p1_high / DISP_FREQ_MULTIPLIER;
+   pbf_info->p1_low = pbf_info->p1_low / DISP_FREQ_MULTIPLIER;
+
+   /* create high priority cpu mask */
+   pbf_info->core_cpumask_size = alloc_cpu_set(_info->core_cpumask);
+   for (i = 0; i < topo_max_cpus; i++) {
+   cpu_bf = parse_int_file(1,
+   "/sys/devices/system/cpu/cpu%d/cpufreq/base_frequency",
+   i);
+   cpu_bf = cpu_bf / DISP_FREQ_MULTIPLIER;
+   if (cpu_bf == pbf_info->p1_high)
+   CPU_SET_S(i, pbf_info->core_cpumask_size,
+ pbf_info->core_cpumask);
+   }
+
+   /* extra ctdp & pbf struct parameters */
+   ctdp_level->processed = 1;
+   ctdp_level->pbf_support = 1; /* PBF is always supported and enabled */
+   ctdp_level->pbf_enabled = 1;
+   ctdp_level->fact_support = 0; /* FACT is never supported */
+   ctdp_level->fact_enabled = 0;
+}
+
 static void dump_isst_config_for_cpu(int cpu, void *arg1, void *arg2,
 void *arg3, void *arg4)
 {
@@ -1809,7 +1897,10 @@ void process_command(int argc, char **argv,
}
}
 
-   create_cpu_map();
+   if (!is_clx_n_platform())
+   create_cpu_map();
+   else
+   clx_n_config();
 
i = 0;
while (cmds[i].feature) {
@@ -1825,6 +1916,11 @@ void process_command(int argc, char **argv,
 
if (!matched)
fprintf(stderr, "Invalid command\n");
+
+   if (!is_clx_n_platform()) {
+   /* this was allocated in clx_n_config() */
+   free_cpu_set(clx_n_pkg_dev.ctdp_level[0].pbf_info.core_cpumask);
+  

[PATCH v2 5/7] intel-speed-select: Implement CascadeLake-N help and command functions structures

2019-10-03 Thread Prarit Bhargava
CascadeLake-N only supports SST-BF and needs some of the perf-profile
commands, and the base-freq commands.

Add help functions, and create an empty command structures (the functions
will be implemented later in this patchset).  Call these functions
when running on CascadeLake-N.

Signed-off-by: Prarit Bhargava 
---
 .../x86/intel-speed-select/isst-config.c  | 37 ++-
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 164c4e5e6ccb..3ab0edade5ec 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -1632,6 +1632,10 @@ static void get_clos_assoc(int arg)
isst_ctdp_display_information_end(outf);
 }
 
+static struct process_cmd_struct clx_n_cmds[] = {
+   { NULL, NULL, NULL, 0 }
+};
+
 static struct process_cmd_struct isst_cmds[] = {
{ "perf-profile", "get-lock-status", get_tdp_locked, 0 },
{ "perf-profile", "get-config-levels", get_tdp_levels, 0 },
@@ -1820,12 +1824,15 @@ static void isst_help(void)
TDP, etc.\n");
printf("\nCommands : For feature=perf-profile\n");
printf("\tinfo\n");
-   printf("\tget-lock-status\n");
-   printf("\tget-config-levels\n");
-   printf("\tget-config-version\n");
-   printf("\tget-config-enabled\n");
-   printf("\tget-config-current-level\n");
-   printf("\tset-config-level\n");
+
+   if (!is_clx_n_platform()) {
+   printf("\tget-lock-status\n");
+   printf("\tget-config-levels\n");
+   printf("\tget-config-version\n");
+   printf("\tget-config-enabled\n");
+   printf("\tget-config-current-level\n");
+   printf("\tset-config-level\n");
+   }
 }
 
 static void pbf_help(void)
@@ -1875,6 +1882,12 @@ static struct process_cmd_help_struct isst_help_cmds[] = 
{
{ NULL, NULL }
 };
 
+static struct process_cmd_help_struct clx_n_help_cmds[] = {
+   { "perf-profile", isst_help },
+   { "base-freq", pbf_help },
+   { NULL, NULL }
+};
+
 void process_command(int argc, char **argv,
 struct process_cmd_help_struct *help_cmds,
 struct process_cmd_struct *cmds)
@@ -2031,11 +2044,15 @@ static void cmdline(int argc, char **argv)
set_max_cpu_num();
set_cpu_present_cpu_mask();
set_cpu_target_cpu_mask();
-   ret = isst_fill_platform_info();
-   if (ret)
-   goto out;
 
-   process_command(argc, argv, isst_help_cmds, isst_cmds);
+   if (!is_clx_n_platform()) {
+   ret = isst_fill_platform_info();
+   if (ret)
+   goto out;
+   process_command(argc, argv, isst_help_cmds, isst_cmds);
+   } else {
+   process_command(argc, argv, clx_n_help_cmds, clx_n_cmds);
+   }
 out:
free_cpu_set(present_cpumask);
free_cpu_set(target_cpumask);
-- 
2.18.1



[PATCH v2 2/7] intel-speed-select: Make process_command generic

2019-10-03 Thread Prarit Bhargava
Make the process_command take any help command and command list.  This
will make it easier to help commands and a command list for CascadeLake-N.

Signed-off-by: Prarit Bhargava 
---
 .../x86/intel-speed-select/isst-config.c  | 20 ++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 1b2899706857..f4a23678416e 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -1749,7 +1749,9 @@ static struct process_cmd_help_struct isst_help_cmds[] = {
{ NULL, NULL }
 };
 
-void process_command(int argc, char **argv)
+void process_command(int argc, char **argv,
+struct process_cmd_help_struct *help_cmds,
+struct process_cmd_struct *cmds)
 {
int i = 0, matched = 0;
char *feature = argv[optind];
@@ -1760,9 +1762,9 @@ void process_command(int argc, char **argv)
 
debug_printf("feature name [%s] command [%s]\n", feature, cmd);
if (!strcmp(cmd, "-h") || !strcmp(cmd, "--help")) {
-   while (isst_help_cmds[i].feature) {
-   if (!strcmp(isst_help_cmds[i].feature, feature)) {
-   isst_help_cmds[i].process_fn();
+   while (help_cmds[i].feature) {
+   if (!strcmp(help_cmds[i].feature, feature)) {
+   help_cmds[i].process_fn();
exit(0);
}
++i;
@@ -1772,11 +1774,11 @@ void process_command(int argc, char **argv)
create_cpu_map();
 
i = 0;
-   while (isst_cmds[i].feature) {
-   if (!strcmp(isst_cmds[i].feature, feature) &&
-   !strcmp(isst_cmds[i].command, cmd)) {
+   while (cmds[i].feature) {
+   if (!strcmp(cmds[i].feature, feature) &&
+   !strcmp(cmds[i].command, cmd)) {
parse_cmd_args(argc, optind + 1, argv);
-   isst_cmds[i].process_fn(isst_cmds[i].arg);
+   cmds[i].process_fn(cmds[i].arg);
matched = 1;
break;
}
@@ -1897,7 +1899,7 @@ static void cmdline(int argc, char **argv)
if (ret)
goto out;
 
-   process_command(argc, argv);
+   process_command(argc, argv, isst_help_cmds, isst_cmds);
 out:
free_cpu_set(present_cpumask);
free_cpu_set(target_cpumask);
-- 
2.18.1



Re: [PATCH 1/7] intel-speed-select: Add int argument to command functions

2019-09-30 Thread Prarit Bhargava



On 9/26/19 4:21 PM, Srinivas Pandruvada wrote:
> On Thu, 2019-09-26 at 08:54 -0400, Prarit Bhargava wrote:
>> The current code structure has similar but separate command functions
>> for
>> the enable and disable operations.  This can be improved by adding an
>> int
>> argument to the command function structure, and interpreting 1 as
>> enable
>> and 0 as disable.  This change results in the removal of the disable
>> command functions.
>>
>> Add int argument to the command function structure.
> Does this brings in any significant reduction in data or code size?

It reduces code size.  Right now you have separate functions for enable &
disable.  These are unified into single functions.

P.

> My focus is to add features first which helps users.
> 
> Thanks,
> Srinivas
> 
> 
>>
>> Signed-off-by: Prarit Bhargava 
>> Cc: Srinivas Pandruvada 
>> ---
>>  .../x86/intel-speed-select/isst-config.c  | 184 +++-
>> --
>>  1 file changed, 69 insertions(+), 115 deletions(-)
>>
>> diff --git a/tools/power/x86/intel-speed-select/isst-config.c
>> b/tools/power/x86/intel-speed-select/isst-config.c
>> index 2a9890c8395a..9f2798caead9 100644
>> --- a/tools/power/x86/intel-speed-select/isst-config.c
>> +++ b/tools/power/x86/intel-speed-select/isst-config.c
>> @@ -11,7 +11,8 @@
>>  struct process_cmd_struct {
>>  char *feature;
>>  char *command;
>> -void (*process_fn)(void);
>> +void (*process_fn)(int arg);
>> +int arg;
>>  };
>>  
>>  static const char *version_str = "v1.0";
>> @@ -678,7 +679,7 @@ static void exec_on_get_ctdp_cpu(int cpu, void
>> *arg1, void *arg2, void *arg3,
>>  }
>>  
>>  #define _get_tdp_level(desc, suffix, object,
>> help)\
>> -static void
>> get_tdp_##object(void)\
>> +static void get_tdp_##object(int
>> arg)\
>>  {  
>>\
>>  struct isst_pkg_ctdp
>> ctdp;\
>>  \
>> @@ -724,7 +725,7 @@ static void dump_isst_config_for_cpu(int cpu,
>> void *arg1, void *arg2,
>>  }
>>  }
>>  
>> -static void dump_isst_config(void)
>> +static void dump_isst_config(int arg)
>>  {
>>  if (cmd_help) {
>>  fprintf(stderr,
>> @@ -787,7 +788,7 @@ static void set_tdp_level_for_cpu(int cpu, void
>> *arg1, void *arg2, void *arg3,
>>  }
>>  }
>>  
>> -static void set_tdp_level(void)
>> +static void set_tdp_level(int arg)
>>  {
>>  if (cmd_help) {
>>  fprintf(stderr, "Set Config TDP level\n");
>> @@ -827,7 +828,7 @@ static void dump_pbf_config_for_cpu(int cpu, void
>> *arg1, void *arg2, void *arg3,
>>  }
>>  }
>>  
>> -static void dump_pbf_config(void)
>> +static void dump_pbf_config(int arg)
>>  {
>>  if (cmd_help) {
>>  fprintf(stderr,
>> @@ -871,43 +872,27 @@ static void set_pbf_for_cpu(int cpu, void
>> *arg1, void *arg2, void *arg3,
>>  }
>>  }
>>  
>> -static void set_pbf_enable(void)
>> -{
>> -int status = 1;
>> -
>> -if (cmd_help) {
>> -fprintf(stderr,
>> -"Enable Intel Speed Select Technology base
>> frequency feature [No command arguments are required]\n");
>> -exit(0);
>> -}
>> -
>> -isst_ctdp_display_information_start(outf);
>> -if (max_target_cpus)
>> -for_each_online_target_cpu_in_set(set_pbf_for_cpu,
>> NULL, NULL,
>> -  NULL, );
>> -else
>> -for_each_online_package_in_set(set_pbf_for_cpu, NULL,
>> NULL,
>> -   NULL, );
>> -isst_ctdp_display_information_end(outf);
>> -}
>> -
>> -static void set_pbf_disable(void)
>> +static void set_pbf_enable(int arg)
>>  {
>> -int status = 0;
>> +int enable = arg;
>>  
>>  if (cmd_help) {
>> -fprintf(stderr,
>> -"Disable Intel Speed Select Technology base
>> frequency feature [No command arguments are required]\n");
>> +if (enable)
>> +fprintf(stderr,
>> +"Enable Intel Speed Select Technolog

[PATCH 0/7] intel-speed-select: Add CascadeLake-N support

2019-09-26 Thread Prarit Bhargava
Add support for SST-BF on CascadeLake-N support.  The CascadeLake-N
processor only support SST-BF and not other SST functionality.

Prarit Bhargava (7):
  intel-speed-select: Add int argument to command functions
  intel-speed-select: Make process_command generic
  intel-speed-select: Add check for CascadeLake-N models
  intel-speed-select: Add configuration for CascadeLake-N
  intel-speed-select: Implement CascadeLake-N help and command functions
structures
  intel-speed-select: Implement 'perf-profile info' on CascadeLake-N
  intel-speed-select: Implement base-freq commands on CascadeLake-N

 .../x86/intel-speed-select/isst-config.c  | 443 --
 .../x86/intel-speed-select/isst-display.c |  28 +-
 tools/power/x86/intel-speed-select/isst.h |   8 +-
 3 files changed, 319 insertions(+), 160 deletions(-)

-- 
2.21.0



[PATCH 2/7] intel-speed-select: Make process_command generic

2019-09-26 Thread Prarit Bhargava
Make the process_command take any help command and command list.  This
will make it easier to help commands and a command list for CascadeLake-N.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-config.c  | 20 ++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 9f2798caead9..bb6f8f5986c9 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -1498,7 +1498,9 @@ static struct process_cmd_help_struct isst_help_cmds[] = {
{ NULL, NULL }
 };
 
-void process_command(int argc, char **argv)
+void process_command(int argc, char **argv,
+struct process_cmd_help_struct *help_cmds,
+struct process_cmd_struct *cmds)
 {
int i = 0, matched = 0;
char *feature = argv[optind];
@@ -1509,9 +1511,9 @@ void process_command(int argc, char **argv)
 
debug_printf("feature name [%s] command [%s]\n", feature, cmd);
if (!strcmp(cmd, "-h") || !strcmp(cmd, "--help")) {
-   while (isst_help_cmds[i].feature) {
-   if (!strcmp(isst_help_cmds[i].feature, feature)) {
-   isst_help_cmds[i].process_fn();
+   while (help_cmds[i].feature) {
+   if (!strcmp(help_cmds[i].feature, feature)) {
+   help_cmds[i].process_fn();
exit(0);
}
++i;
@@ -1521,11 +1523,11 @@ void process_command(int argc, char **argv)
create_cpu_map();
 
i = 0;
-   while (isst_cmds[i].feature) {
-   if (!strcmp(isst_cmds[i].feature, feature) &&
-   !strcmp(isst_cmds[i].command, cmd)) {
+   while (cmds[i].feature) {
+   if (!strcmp(cmds[i].feature, feature) &&
+   !strcmp(cmds[i].command, cmd)) {
parse_cmd_args(argc, optind + 1, argv);
-   isst_cmds[i].process_fn(isst_cmds[i].arg);
+   cmds[i].process_fn(cmds[i].arg);
matched = 1;
break;
}
@@ -1646,7 +1648,7 @@ static void cmdline(int argc, char **argv)
if (ret)
goto out;
 
-   process_command(argc, argv);
+   process_command(argc, argv, isst_help_cmds, isst_cmds);
 out:
free_cpu_set(present_cpumask);
free_cpu_set(target_cpumask);
-- 
2.21.0



[PATCH 5/7] intel-speed-select: Implement CascadeLake-N help and command functions structures

2019-09-26 Thread Prarit Bhargava
CascadeLake-N only supports SST-BF and needs some of the perf-profile
commands, and the base-freq commands.

Add help functions, and create an empty command structures (the functions
will be implemented later in this patchset).  Call these functions
when running on CascadeLake-N.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-config.c  | 37 ++-
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 33f328d971d3..a82005825990 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -1373,6 +1373,10 @@ static void get_clos_assoc(int arg)
isst_ctdp_display_information_end(outf);
 }
 
+static struct process_cmd_struct cn_cmds[] = {
+   { NULL, NULL, NULL, 0 }
+};
+
 static struct process_cmd_struct isst_cmds[] = {
{ "perf-profile", "get-lock-status", get_tdp_locked, 0 },
{ "perf-profile", "get-config-levels", get_tdp_levels, 0 },
@@ -1557,12 +1561,15 @@ static void isst_help(void)
TDP, etc.\n");
printf("\nCommands : For feature=perf-profile\n");
printf("\tinfo\n");
-   printf("\tget-lock-status\n");
-   printf("\tget-config-levels\n");
-   printf("\tget-config-version\n");
-   printf("\tget-config-enabled\n");
-   printf("\tget-config-current-level\n");
-   printf("\tset-config-level\n");
+
+   if (cpu_model != 0x55) {
+   printf("\tget-lock-status\n");
+   printf("\tget-config-levels\n");
+   printf("\tget-config-version\n");
+   printf("\tget-config-enabled\n");
+   printf("\tget-config-current-level\n");
+   printf("\tset-config-level\n");
+   }
 }
 
 static void pbf_help(void)
@@ -1612,6 +1619,12 @@ static struct process_cmd_help_struct isst_help_cmds[] = 
{
{ NULL, NULL }
 };
 
+static struct process_cmd_help_struct cn_help_cmds[] = {
+   { "perf-profile", isst_help },
+   { "base-freq", pbf_help },
+   { NULL, NULL }
+};
+
 void process_command(int argc, char **argv,
 struct process_cmd_help_struct *help_cmds,
 struct process_cmd_struct *cmds)
@@ -1768,11 +1781,15 @@ static void cmdline(int argc, char **argv)
set_max_cpu_num();
set_cpu_present_cpu_mask();
set_cpu_target_cpu_mask();
-   ret = isst_fill_platform_info();
-   if (ret)
-   goto out;
 
-   process_command(argc, argv, isst_help_cmds, isst_cmds);
+   if (cpu_model != 0x55) {
+   ret = isst_fill_platform_info();
+   if (ret)
+   goto out;
+   process_command(argc, argv, isst_help_cmds, isst_cmds);
+   } else {
+   process_command(argc, argv, cn_help_cmds, cn_cmds);
+   }
 out:
free_cpu_set(present_cpumask);
free_cpu_set(target_cpumask);
-- 
2.21.0



[PATCH 1/7] intel-speed-select: Add int argument to command functions

2019-09-26 Thread Prarit Bhargava
The current code structure has similar but separate command functions for
the enable and disable operations.  This can be improved by adding an int
argument to the command function structure, and interpreting 1 as enable
and 0 as disable.  This change results in the removal of the disable
command functions.

Add int argument to the command function structure.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-config.c  | 184 +++---
 1 file changed, 69 insertions(+), 115 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 2a9890c8395a..9f2798caead9 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -11,7 +11,8 @@
 struct process_cmd_struct {
char *feature;
char *command;
-   void (*process_fn)(void);
+   void (*process_fn)(int arg);
+   int arg;
 };
 
 static const char *version_str = "v1.0";
@@ -678,7 +679,7 @@ static void exec_on_get_ctdp_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
 }
 
 #define _get_tdp_level(desc, suffix, object, help) 
   \
-   static void get_tdp_##object(void)  
  \
+   static void get_tdp_##object(int arg)   
 \
{   
  \
struct isst_pkg_ctdp ctdp;  
  \
 \
@@ -724,7 +725,7 @@ static void dump_isst_config_for_cpu(int cpu, void *arg1, 
void *arg2,
}
 }
 
-static void dump_isst_config(void)
+static void dump_isst_config(int arg)
 {
if (cmd_help) {
fprintf(stderr,
@@ -787,7 +788,7 @@ static void set_tdp_level_for_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
}
 }
 
-static void set_tdp_level(void)
+static void set_tdp_level(int arg)
 {
if (cmd_help) {
fprintf(stderr, "Set Config TDP level\n");
@@ -827,7 +828,7 @@ static void dump_pbf_config_for_cpu(int cpu, void *arg1, 
void *arg2, void *arg3,
}
 }
 
-static void dump_pbf_config(void)
+static void dump_pbf_config(int arg)
 {
if (cmd_help) {
fprintf(stderr,
@@ -871,43 +872,27 @@ static void set_pbf_for_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
}
 }
 
-static void set_pbf_enable(void)
-{
-   int status = 1;
-
-   if (cmd_help) {
-   fprintf(stderr,
-   "Enable Intel Speed Select Technology base frequency 
feature [No command arguments are required]\n");
-   exit(0);
-   }
-
-   isst_ctdp_display_information_start(outf);
-   if (max_target_cpus)
-   for_each_online_target_cpu_in_set(set_pbf_for_cpu, NULL, NULL,
- NULL, );
-   else
-   for_each_online_package_in_set(set_pbf_for_cpu, NULL, NULL,
-  NULL, );
-   isst_ctdp_display_information_end(outf);
-}
-
-static void set_pbf_disable(void)
+static void set_pbf_enable(int arg)
 {
-   int status = 0;
+   int enable = arg;
 
if (cmd_help) {
-   fprintf(stderr,
-   "Disable Intel Speed Select Technology base frequency 
feature [No command arguments are required]\n");
+   if (enable)
+   fprintf(stderr,
+   "Enable Intel Speed Select Technology base 
frequency feature [No command arguments are required]\n");
+   else
+   fprintf(stderr,
+   "Disable Intel Speed Select Technology base 
frequency feature [No command arguments are required]\n");
exit(0);
}
 
isst_ctdp_display_information_start(outf);
if (max_target_cpus)
for_each_online_target_cpu_in_set(set_pbf_for_cpu, NULL, NULL,
- NULL, );
+ NULL, );
else
for_each_online_package_in_set(set_pbf_for_cpu, NULL, NULL,
-  NULL, );
+  NULL, );
isst_ctdp_display_information_end(outf);
 }
 
@@ -925,7 +910,7 @@ static void dump_fact_config_for_cpu(int cpu, void *arg1, 
void *arg2,
  fact_avx, _info);
 }
 
-static void dump_fact_config(void)
+static void dump_fact_config(int arg)
 {
if (cmd_help) {
fprintf(stderr,
@@ -985,35 +970,17 @@ static void set_fact_for_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
}
 }
 
-static void set_fact_enable(void)
+static void set_fact_enable(int arg)
 {
-   int 

[PATCH 4/7] intel-speed-select: Add configuration for CascadeLake-N

2019-09-26 Thread Prarit Bhargava
Create a 'dummy' pkg_dev struct for use by CascadeLake-N processors.  This
struct will be used in later patches to implement info and status calls
for CascadeLake-N SST-BF.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-config.c  | 98 ++-
 .../x86/intel-speed-select/isst-display.c |  2 -
 tools/power/x86/intel-speed-select/isst.h |  2 +
 3 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index ae8e3b5153ad..33f328d971d3 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -735,6 +735,94 @@ _get_tdp_level("get-config-current_level", levels, 
current_level,
   "Current TDP Level");
 _get_tdp_level("get-lock-status", levels, locked, "TDP lock status");
 
+struct isst_pkg_ctdp cn_pkg_dev;
+
+static int cn_get_base_ratio(void)
+{
+   FILE *fp;
+   char *begin, *end, *line;
+   char number[5];
+   float value = 0;
+   size_t n;
+
+   fp = fopen("/proc/cpuinfo", "r");
+   if (!fp)
+   err(-1, "cannot open /proc/cpuinfo\n");
+
+   while (getline(, , fp) > 0) {
+   if (strstr(line, "model name")) {
+   /* this is true for CascadeLake-N */
+   begin = strstr(line, "@ ") + 2;
+   end = strstr(line, "GHz");
+   strncpy(number, begin, end - begin);
+   value = atof(number) * 10;
+   break;
+   }
+   }
+   free(line);
+   fclose(fp);
+
+   return (int)(value);
+}
+
+static void cascadelaken_config(void)
+{
+   int i;
+   unsigned long cpu_bf;
+   struct isst_pkg_ctdp_level_info *ctdp_level;
+   struct isst_pbf_info *pbf_info;
+
+   ctdp_level = _pkg_dev.ctdp_level[0];
+   pbf_info = _level->pbf_info;
+
+   /* find the frequency base ratio */
+   ctdp_level->tdp_ratio = cn_get_base_ratio();
+   if (ctdp_level->tdp_ratio == 0)
+   err(-1, "cn base ratio is zero\n");
+
+   /* find the high and low priority frequencies */
+   pbf_info->p1_high = 0;
+   pbf_info->p1_low = ~0;
+   for (i = 0; i < topo_max_cpus; i++) {
+   cpu_bf = parse_int_file(1,
+   "/sys/devices/system/cpu/cpu%d/cpufreq/base_frequency",
+   i);
+   if (cpu_bf > pbf_info->p1_high)
+   pbf_info->p1_high = cpu_bf;
+   if (cpu_bf < pbf_info->p1_low)
+   pbf_info->p1_low = cpu_bf;
+   }
+
+   if (pbf_info->p1_high == ~0UL)
+   err(-1, "maximum base frequency not set\n");
+
+   if (pbf_info->p1_low == 0)
+   err(-1, "minimum base frequency not set\n");
+
+   /* convert frequencies back to ratios */
+   pbf_info->p1_high = pbf_info->p1_high / DISP_FREQ_MULTIPLIER;
+   pbf_info->p1_low = pbf_info->p1_low / DISP_FREQ_MULTIPLIER;
+
+   /* create high priority cpu mask */
+   pbf_info->core_cpumask_size = alloc_cpu_set(_info->core_cpumask);
+   for (i = 0; i < topo_max_cpus; i++) {
+   cpu_bf = parse_int_file(1,
+   "/sys/devices/system/cpu/cpu%d/cpufreq/base_frequency",
+   i);
+   cpu_bf = cpu_bf / DISP_FREQ_MULTIPLIER;
+   if (cpu_bf == pbf_info->p1_high)
+   CPU_SET_S(i, pbf_info->core_cpumask_size,
+ pbf_info->core_cpumask);
+   }
+
+   /* extra ctdp & pbf struct parameters */
+   ctdp_level->processed = 1;
+   ctdp_level->pbf_support = 1; /* PBF is always supported and enabled */
+   ctdp_level->pbf_enabled = 1;
+   ctdp_level->fact_support = 0; /* FACT is never supported */
+   ctdp_level->fact_enabled = 0;
+}
+
 static void dump_isst_config_for_cpu(int cpu, void *arg1, void *arg2,
 void *arg3, void *arg4)
 {
@@ -1546,7 +1634,10 @@ void process_command(int argc, char **argv,
}
}
 
-   create_cpu_map();
+   if (cpu_model != 0x55)
+   create_cpu_map();
+   else
+   cascadelaken_config();
 
i = 0;
while (cmds[i].feature) {
@@ -1562,6 +1653,11 @@ void process_command(int argc, char **argv,
 
if (!matched)
fprintf(stderr, "Invalid command\n");
+
+   if (cpu_model != 0x55) {
+   /* this was allocated in cascadelaken_config() */
+   free_cpu_set(cn_pkg_dev.ctdp_level[0]

[PATCH 3/7] intel-speed-select: Add check for CascadeLake-N models

2019-09-26 Thread Prarit Bhargava
Three CascadeLake-N models (6252N, 6230N, and 5218N) have SST-PBF support.

Return an error if the CascadeLake processor is not one of these specific
models.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-config.c  | 32 +--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index bb6f8f5986c9..ae8e3b5153ad 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -71,7 +71,7 @@ void debug_printf(const char *format, ...)
va_end(args);
 }
 
-static void update_cpu_model(void)
+static int update_cpu_model(void)
 {
unsigned int ebx, ecx, edx;
unsigned int fms, family;
@@ -81,6 +81,32 @@ static void update_cpu_model(void)
cpu_model = (fms >> 4) & 0xf;
if (family == 6 || family == 0xf)
cpu_model += ((fms >> 16) & 0xf) << 4;
+
+   /* only three CascadeLake-N models are supported */
+   if (cpu_model == 0x55) {
+   FILE *fp;
+   size_t n;
+   char *line;
+   int ret = 1;
+
+   fp = fopen("/proc/cpuinfo", "r");
+   if (!fp)
+   err(-1, "cannot open /proc/cpuinfo\n");
+
+   while (getline(, , fp) > 0) {
+   if (strstr(line, "model name")) {
+   if (strstr(line, "6252N") ||
+   strstr(line, "6230N") ||
+   strstr(line, "5218N"))
+   ret = 0;
+   break;
+   }
+   }
+   free(line);
+   fclose(fp);
+   return ret;
+   }
+   return 0;
 }
 
 /* Open a file, and exit on failure */
@@ -1638,7 +1664,9 @@ static void cmdline(int argc, char **argv)
fprintf(stderr, "Feature name and|or command not specified\n");
exit(0);
}
-   update_cpu_model();
+   ret = update_cpu_model();
+   if (ret)
+   err(-1, "Invalid CPU model (%d)\n", cpu_model);
printf("Intel(R) Speed Select Technology\n");
printf("Executing on CPU model:%d[0x%x]\n", cpu_model, cpu_model);
set_max_cpu_num();
-- 
2.21.0



[PATCH 6/7] intel-speed-select: Implement 'perf-profile info' on CascadeLake-N

2019-09-26 Thread Prarit Bhargava
Add functionality for perf-profile info on CascadeLake-N.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-config.c  | 24 +++
 .../x86/intel-speed-select/isst-display.c | 22 +
 tools/power/x86/intel-speed-select/isst.h |  3 ++-
 3 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index a82005825990..0c66012fc5a1 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -823,6 +823,13 @@ static void cascadelaken_config(void)
ctdp_level->fact_enabled = 0;
 }
 
+static void dump_cn_config_for_cpu(int cpu, void *arg1, void *arg2,
+  void *arg3, void *arg4)
+{
+   isst_ctdp_display_information(cpu, outf, tdp_level, _pkg_dev,
+ cpu_model);
+}
+
 static void dump_isst_config_for_cpu(int cpu, void *arg1, void *arg2,
 void *arg3, void *arg4)
 {
@@ -834,13 +841,16 @@ static void dump_isst_config_for_cpu(int cpu, void *arg1, 
void *arg2,
if (ret) {
perror("isst_get_process_ctdp");
} else {
-   isst_ctdp_display_information(cpu, outf, tdp_level, _dev);
+   isst_ctdp_display_information(cpu, outf, tdp_level, _dev,
+ cpu_model);
isst_get_process_ctdp_complete(cpu, _dev);
}
 }
 
 static void dump_isst_config(int arg)
 {
+   void *fn;
+
if (cmd_help) {
fprintf(stderr,
"Print Intel(R) Speed Select Technology Performance 
profile configuration\n");
@@ -852,14 +862,17 @@ static void dump_isst_config(int arg)
exit(0);
}
 
+   if (cpu_model != 0x55)
+   fn = dump_isst_config_for_cpu;
+   else
+   fn = dump_cn_config_for_cpu;
+
isst_ctdp_display_information_start(outf);
 
if (max_target_cpus)
-   for_each_online_target_cpu_in_set(dump_isst_config_for_cpu,
- NULL, NULL, NULL, NULL);
+   for_each_online_target_cpu_in_set(fn, NULL, NULL, NULL, NULL);
else
-   for_each_online_package_in_set(dump_isst_config_for_cpu, NULL,
-  NULL, NULL, NULL);
+   for_each_online_package_in_set(fn, NULL, NULL, NULL, NULL);
 
isst_ctdp_display_information_end(outf);
 }
@@ -1374,6 +1387,7 @@ static void get_clos_assoc(int arg)
 }
 
 static struct process_cmd_struct cn_cmds[] = {
+   { "perf-profile", "info", dump_isst_config, 0 },
{ NULL, NULL, NULL, 0 }
 };
 
diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 1caa7ae25245..7d91a5de1b85 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -173,7 +173,7 @@ static void print_package_info(int cpu, FILE *outf)
 
 static void _isst_pbf_display_information(int cpu, FILE *outf, int level,
  struct isst_pbf_info *pbf_info,
- int disp_level)
+ int disp_level, int cpu_model)
 {
char header[256];
char value[256];
@@ -202,6 +202,9 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
 pbf_info->p1_low * DISP_FREQ_MULTIPLIER);
format_and_print(outf, disp_level + 1, header, value);
 
+   if (cpu_model == 0x55)
+   return;
+
snprintf(header, sizeof(header), "tjunction-temperature(C)");
snprintf(value, sizeof(value), "%d", pbf_info->t_prochot);
format_and_print(outf, disp_level + 1, header, value);
@@ -306,7 +309,7 @@ void isst_ctdp_display_core_info(int cpu, FILE *outf, char 
*prefix,
 }
 
 void isst_ctdp_display_information(int cpu, FILE *outf, int tdp_level,
-  struct isst_pkg_ctdp *pkg_dev)
+  struct isst_pkg_ctdp *pkg_dev, int cpu_model)
 {
char header[256];
char value[256];
@@ -375,6 +378,16 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "unsupported");
format_and_print(outf, base_level + 4, header, value);
 
+   if (cpu_model == 0x55) {
+   if (ctdp_level->pbf_support)
+   _isst_pbf_display_information(cpu, outf,
+ tdp_level,
+ _level->p

[PATCH 7/7] intel-speed-select: Implement base-freq commands on CascadeLake-N

2019-09-26 Thread Prarit Bhargava
Add functionality for base-freq info|enable|disable info on CascadeLake-N.

The enable command always returns success, and the disable command always
returns failed because SST-BF cannot be enabled or disabled from the OS on
CascadeLake-N.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-config.c  | 56 ++-
 .../x86/intel-speed-select/isst-display.c |  6 +-
 tools/power/x86/intel-speed-select/isst.h |  3 +-
 3 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 0c66012fc5a1..24e6084a36b4 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -940,6 +940,18 @@ static void set_tdp_level(int arg)
isst_ctdp_display_information_end(outf);
 }
 
+static void cn_dump_pbf_config_for_cpu(int cpu, void *arg1, void *arg2,
+  void *arg3, void *arg4)
+{
+   struct isst_pbf_info *pbf_info;
+   struct isst_pkg_ctdp_level_info *ctdp_level;
+
+   ctdp_level = _pkg_dev.ctdp_level[0];
+   pbf_info = _level->pbf_info;
+
+   isst_pbf_display_information(cpu, outf, tdp_level, pbf_info, cpu_model);
+}
+
 static void dump_pbf_config_for_cpu(int cpu, void *arg1, void *arg2, void 
*arg3,
void *arg4)
 {
@@ -950,13 +962,16 @@ static void dump_pbf_config_for_cpu(int cpu, void *arg1, 
void *arg2, void *arg3,
if (ret) {
perror("isst_get_pbf_info");
} else {
-   isst_pbf_display_information(cpu, outf, tdp_level, _info);
+   isst_pbf_display_information(cpu, outf, tdp_level, _info,
+cpu_model);
isst_get_pbf_info_complete(_info);
}
 }
 
 static void dump_pbf_config(int arg)
 {
+   void *fn;
+
if (cmd_help) {
fprintf(stderr,
"Print Intel(R) Speed Select Technology base frequency 
configuration for a TDP level\n");
@@ -970,13 +985,18 @@ static void dump_pbf_config(int arg)
exit(1);
}
 
+   if (cpu_model != 0x55)
+   fn = dump_pbf_config_for_cpu;
+   else
+   fn = cn_dump_pbf_config_for_cpu;
+
isst_ctdp_display_information_start(outf);
+
if (max_target_cpus)
-   for_each_online_target_cpu_in_set(dump_pbf_config_for_cpu, NULL,
- NULL, NULL, NULL);
+   for_each_online_target_cpu_in_set(fn, NULL, NULL, NULL, NULL);
else
-   for_each_online_package_in_set(dump_pbf_config_for_cpu, NULL,
-  NULL, NULL, NULL);
+   for_each_online_package_in_set(fn, NULL, NULL, NULL, NULL);
+
isst_ctdp_display_information_end(outf);
 }
 
@@ -986,17 +1006,24 @@ static void set_pbf_for_cpu(int cpu, void *arg1, void 
*arg2, void *arg3,
int ret;
int status = *(int *)arg4;
 
-   ret = isst_set_pbf_fact_status(cpu, 1, status);
-   if (ret) {
-   perror("isst_set_pbf");
+   if (cpu_model != 0x55) {
+   ret = isst_set_pbf_fact_status(cpu, 1, status);
+   if (ret) {
+   perror("isst_set_pbf");
+   return;
+   }
+
} else {
-   if (status)
-   isst_display_result(cpu, outf, "base-freq", "enable",
-   ret);
+   if (status == 0)
+   ret = -1;
else
-   isst_display_result(cpu, outf, "base-freq", "disable",
-   ret);
+   ret = 0;
}
+
+   if (status)
+   isst_display_result(cpu, outf, "base-freq", "enable", ret);
+   else
+   isst_display_result(cpu, outf, "base-freq", "disable", ret);
 }
 
 static void set_pbf_enable(int arg)
@@ -1388,6 +1415,9 @@ static void get_clos_assoc(int arg)
 
 static struct process_cmd_struct cn_cmds[] = {
{ "perf-profile", "info", dump_isst_config, 0 },
+   { "base-freq", "info", dump_pbf_config, 0 },
+   { "base-freq", "enable", set_pbf_enable, 1 },
+   { "base-freq", "disable", set_pbf_enable, 0 },
{ NULL, NULL, NULL, 0 }
 };
 
diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 7d91a5de1b85..8cc8d963a55e 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -47

[PATCH 1/2] intel-speed-select: Display turbo-ratio-limits as a table

2019-09-23 Thread Prarit Bhargava
The output of the Turbo Ratio Limits is 75 lines long (each bucket has
3 lines and the headers).  This can be shrunk down into a table that is
easier to consume for both scripts and humans.

Display Turbo Ratio Limits in a table.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-display.c | 86 ---
 1 file changed, 34 insertions(+), 52 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index df4aa99c4e92..9b0ae0831a60 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -287,6 +287,28 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
format_and_print(outf, base_level + 2, header, value);
 }
 
+static void isst_turbo_ratio_limits(FILE *outf, char *header_name,
+   int *active_cores,
+   unsigned long long buckets_info,
+   int base_level)
+{
+   char header[256];
+   int i;
+
+   snprintf(header, sizeof(header), header_name);
+   format_and_print(outf, base_level, header, NULL);
+   snprintf(header, sizeof(header),"%11s %8s %8s",
+"" , "core-count", "max_freq(MHz)");
+   format_and_print(outf, base_level + 1, header, NULL);
+
+   for (i = 0; i < 8; ++i) {
+   snprintf(header, sizeof (header), "bucket-%d %8lld %12d", i,
+(buckets_info >> (i * 8)) & 0xff,
+active_cores[i] * DISP_FREQ_MULTIPLIER);
+   format_and_print(outf, base_level + 1, header, NULL);
+   }
+}
+
 void isst_ctdp_display_information(int cpu, FILE *outf, int tdp_level,
   struct isst_pkg_ctdp *pkg_dev)
 {
@@ -365,58 +387,18 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "%d", ctdp_level->t_proc_hot);
format_and_print(outf, base_level + 4, header, value);
 
-   snprintf(header, sizeof(header), "turbo-ratio-limits-sse");
-   format_and_print(outf, base_level + 4, header, NULL);
-   for (j = 0; j < 8; ++j) {
-   snprintf(header, sizeof(header), "bucket-%d", j);
-   format_and_print(outf, base_level + 5, header, NULL);
-
-   snprintf(header, sizeof(header), "core-count");
-   snprintf(value, sizeof(value), "%llu", 
(ctdp_level->buckets_info >> (j * 8)) & 0xff);
-   format_and_print(outf, base_level + 6, header, value);
-
-   snprintf(header, sizeof(header),
-   "max-turbo-frequency(MHz)");
-   snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_sse_active_cores[j] *
- DISP_FREQ_MULTIPLIER);
-   format_and_print(outf, base_level + 6, header, value);
-   }
-   snprintf(header, sizeof(header), "turbo-ratio-limits-avx");
-   format_and_print(outf, base_level + 4, header, NULL);
-   for (j = 0; j < 8; ++j) {
-   snprintf(header, sizeof(header), "bucket-%d", j);
-   format_and_print(outf, base_level + 5, header, NULL);
-
-   snprintf(header, sizeof(header), "core-count");
-   snprintf(value, sizeof(value), "%llu", 
(ctdp_level->buckets_info >> (j * 8)) & 0xff);
-   format_and_print(outf, base_level + 6, header, value);
-
-   snprintf(header, sizeof(header),
-   "max-turbo-frequency(MHz)");
-   snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_avx_active_cores[j] *
- DISP_FREQ_MULTIPLIER);
-   format_and_print(outf, base_level + 6, header, value);
-   }
-
-   snprintf(header, sizeof(header), "turbo-ratio-limits-avx512");
-   format_and_print(outf, base_level + 4, header, NULL);
-   for (j = 0; j < 8; ++j) {
-   snprintf(header, sizeof(header), "bucket-%d", j);
-   format_and_print(outf, base_level + 5, header, NULL);
-
-   snprintf(header, sizeof(header), "core-count");
-   snprintf(value, sizeof(value), "%llu", 
(ctdp_level->buckets_info >> (j * 8)) & 0xff);
-

[PATCH 0/2] intel-speed-select: Convert output to tables

2019-09-23 Thread Prarit Bhargava
The turbo ratio limits and turbo frequencies add a large amount of
lines to the output.  The output can be truncated into human and
machine readable tables to reduce the number of lines of output.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 

Prarit Bhargava (2):
  intel-speed-select: Display turbo-ratio-limits as a table
  intel-speed-select: Display turbo frequencies in a table

 .../x86/intel-speed-select/isst-display.c | 196 +-
 1 file changed, 94 insertions(+), 102 deletions(-)

-- 
2.21.0



[PATCH 2/2] intel-speed-select: Display turbo frequencies in a table

2019-09-23 Thread Prarit Bhargava
The output of turbo frequencies is also long (each bucket has
3 lines and the headers).  This can be shrunk down into a table that is
easier to consume for both scripts and humans.

Display the turbo and clip frequencies in a table.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
---
 .../x86/intel-speed-select/isst-display.c | 116 ++
 1 file changed, 63 insertions(+), 53 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 9b0ae0831a60..16843e0f78f0 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -213,6 +213,52 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
format_and_print(outf, disp_level + 1, header, value);
 }
 
+static void _isst_fact_display_frequencies(FILE *outf, int fact_avx, int level,
+  char *description, int bucket,
+  int core_count, int sse,
+  int avx, int avx512)
+{
+   char header[256];
+   int ret;
+   void *header_ptr;
+   int header_size;
+
+   header_ptr = header;
+   if (core_count > 0)
+   ret = snprintf(header_ptr, sizeof(header), "%s%d %6d ",
+  description, bucket, core_count);
+   else
+   ret = snprintf(header_ptr, sizeof(header), "%s%6s ",
+  description, "-");
+
+
+   header_ptr += ret;
+   header_size = header_ptr - (void *) header;
+   if (fact_avx & 0x01)
+   ret = snprintf(header_ptr, header_size, "%12d ",
+  sse * DISP_FREQ_MULTIPLIER);
+   else
+   ret = snprintf(header_ptr, header_size, "%12s ", "- ");
+
+   header_ptr += ret;
+   header_size = header_ptr - (void *) header;
+   if (fact_avx & 0x02)
+   ret = snprintf(header_ptr, header_size, "%12d ",
+  avx * DISP_FREQ_MULTIPLIER);
+   else
+   ret = snprintf(header_ptr, header_size, "%12s ", "- ");
+
+   header_ptr += ret;
+   header_size = header_ptr - (void *) header;
+   if (fact_avx & 0x04)
+   ret = snprintf(header_ptr, header_size, "%12d",
+  avx512 * DISP_FREQ_MULTIPLIER);
+   else
+   ret = snprintf(header_ptr, header_size, "%12s", "-");
+
+   format_and_print(outf, level + 1, header, NULL);
+}
+
 static void _isst_fact_display_information(int cpu, FILE *outf, int level,
   int fact_bucket, int fact_avx,
   struct isst_fact_info *fact_info,
@@ -220,11 +266,14 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
 {
struct isst_fact_bucket_info *bucket_info = fact_info->bucket_info;
char header[256];
-   char value[256];
int j;
 
snprintf(header, sizeof(header), "speed-select-turbo-freq");
format_and_print(outf, base_level, header, NULL);
+   snprintf(header, sizeof(header),"%11s %s %s %s %s",
+"", "core-count", "max-sse(MHz)" , "max-avx2(MHz)", 
"max-avx512(MHz)");
+   format_and_print(outf, base_level + 1, header, NULL);
+
for (j = 0; j < ISST_FACT_MAX_BUCKETS; ++j) {
if (fact_bucket != 0xff && fact_bucket != j)
continue;
@@ -232,59 +281,20 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
if (!bucket_info[j].high_priority_cores_count)
break;
 
-   snprintf(header, sizeof(header), "bucket-%d", j);
-   format_and_print(outf, base_level + 1, header, NULL);
-
-   snprintf(header, sizeof(header), "high-priority-cores-count");
-   snprintf(value, sizeof(value), "%d",
-bucket_info[j].high_priority_cores_count);
-   format_and_print(outf, base_level + 2, header, value);
-
-   if (fact_avx & 0x01) {
-   snprintf(header, sizeof(header),
-"high-priority-max-frequency(MHz)");
-   snprintf(value, sizeof(value), "%d",
-bucket_info[j].sse_trl * DISP_FREQ_MULTIPLIER);
-   format_and_print(outf, base_level + 2, header, value);
-   }
-
-   if (fact_avx & 0x02) {
-   snprintf(header, sizeof(header),
-   

Re: [PATCH 2/2] tools/power/x86/intel-speed-select: Display core count for bucket

2019-09-08 Thread Prarit Bhargava



On 9/7/19 2:18 PM, Andy Shevchenko wrote:
> On Fri, Sep 6, 2019 at 10:47 PM Srinivas Pandruvada
>  wrote:
>>
>> On Fri, 2019-09-06 at 07:50 -0700, Srinivas Pandruvada wrote:
>>> On Fri, 2019-09-06 at 16:46 +0300, Andy Shevchenko wrote:
>>>> On Fri, Sep 06, 2019 at 05:39:54AM -0400, Prarit Bhargava wrote:
>>>>> On 9/5/19 7:37 PM, Srinivas Pandruvada wrote:
>>>>>> Read the bucket and core count relationship via MSR and display
>>>>>> when displaying turbo ratio limits.
>>>>>> +   ret = isst_send_msr_command(cpu, 0x1ae, 0,
>>>>>> buckets_info);
>>>>>
>>>>> ^^^ you can get rid of the magic number 0x1ae by doing (sorry for
>>>>> the cut-and-paste)
>>>>>
>>>>> diff --git a/tools/power/x86/intel-speed-select/Makefile
>>>>> b/tools/power/x86/intel
>>>>> index 12c6939dca2a..087d802ad844 100644
>>>>> --- a/tools/power/x86/intel-speed-select/Makefile
>>>>> +++ b/tools/power/x86/intel-speed-select/Makefile
>>>>> @@ -15,6 +15,8 @@ endif
>>>>>  MAKEFLAGS += -r
>>>>>
>>>>>  override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
>>>>> +override CFLAGS += -I../../../include
>>>>> +override CFLAGS +=
>>>>> -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"'
>>>
>>> No, we can't use msr_index.
>> This comment was meant for use of /dev/cpu/X/msr not msr_index.
>> I didn't want to bring in dependency on msr-index.h for couple of 2
>> MSRs and the names in msr-index.h doesn't really reflect the actual
>> processing, they are doing. For example MSR_TURBO_RATIO_LIMIT1 for
>> 0x1ae. The definition of 0x1AE is different on cpu model 0x55 and
>> beyond.
>>
>>>
> 
> It seems not applicable on top of tools patch series I had applied before.
> 
>>>>
>>>> I guess it can be done in more neat way.
>>>>
>>>>> As I've been looking at this code I have been wondering why
>>>>> didn't
>>>>> you just use
>>>>> the standard /dev/cpu/X/msr interface that other x86 power
>>>>> utilities (turbostat,
>>>>> x86_energy_perf_policy) use?  Implementing msr_read() is trivial
>>>>> (warning
>>>>> untested and uncompiled code)
>>>
>>> No. We can't. The MSR interface is disabled on several distribution
>>> and
>>> platforms with secured boot. So some special MSRs are only allowed
>>> via
>>> this IOCTL interface.
>>>

Which distros don't have /dev/cpu/X/msr ?

None of other Intel tools have this restriction (or requirement depending on
your point of view).  Why is intel-speed-select special that we have to
jump through hoops?

P.

>>> Thanks,
>>> Srinivas
>>>
>>>
>>>>
>>>> Actually good point!
>>>>
>>
> 
> 


Re: [PATCH 2/2] tools/power/x86/intel-speed-select: Display core count for bucket

2019-09-06 Thread Prarit Bhargava



On 9/5/19 7:37 PM, Srinivas Pandruvada wrote:
> Read the bucket and core count relationship via MSR and display
> when displaying turbo ratio limits.
> 
> Signed-off-by: Srinivas Pandruvada 
> ---
>  .../power/x86/intel-speed-select/isst-core.c  | 22 +++
>  .../x86/intel-speed-select/isst-display.c |  6 ++---
>  tools/power/x86/intel-speed-select/isst.h |  1 +
>  3 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/power/x86/intel-speed-select/isst-core.c 
> b/tools/power/x86/intel-speed-select/isst-core.c
> index 8de4ac39a008..2f864c4b994d 100644
> --- a/tools/power/x86/intel-speed-select/isst-core.c
> +++ b/tools/power/x86/intel-speed-select/isst-core.c
> @@ -188,6 +188,24 @@ int isst_get_get_trl(int cpu, int level, int avx_level, 
> int *trl)
>   return 0;
>  }
>  
> +int isst_get_trl_bucket_info(int cpu, unsigned long long *buckets_info)
> +{
> + int ret;
> +
> + debug_printf("cpu:%d bucket info via MSR\n", cpu);
> +
> + *buckets_info = 0;
> +
> + ret = isst_send_msr_command(cpu, 0x1ae, 0, buckets_info);

^^^ you can get rid of the magic number 0x1ae by doing (sorry for the 
cut-and-paste)

diff --git a/tools/power/x86/intel-speed-select/Makefile b/tools/power/x86/intel
index 12c6939dca2a..087d802ad844 100644
--- a/tools/power/x86/intel-speed-select/Makefile
+++ b/tools/power/x86/intel-speed-select/Makefile
@@ -15,6 +15,8 @@ endif
 MAKEFLAGS += -r

 override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
+override CFLAGS += -I../../../include
+override CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"'

 ALL_TARGETS := intel-speed-select
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
diff --git a/tools/power/x86/intel-speed-select/isst.h b/tools/power/x86/intel-s
index 2f7f62765eb6..00d159dc12a6 100644
--- a/tools/power/x86/intel-speed-select/isst.h
+++ b/tools/power/x86/intel-speed-select/isst.h
@@ -7,6 +7,7 @@
 #ifndef _ISST_H_
 #define _ISST_H_

+#include MSRHEADER
 #include 
 #include 
 #include 

and replacing the MSR addresses with the names of the MSRs.

> + if (ret)
> + return ret;
> +

As I've been looking at this code I have been wondering why didn't you just use
the standard /dev/cpu/X/msr interface that other x86 power utilities (turbostat,
x86_energy_perf_policy) use?  Implementing msr_read() is trivial (warning
untested and uncompiled code)

static void read_msr(int cpu, int offset, unsigned long long *msr)
{
ssize_t retval;
char pathname[32];
int fd;

sprintf(pathname, "/dev/cpu/%d/msr", cpu);
fd = open(pathname, O_RDONLY);
if (fd < 0)
err(-1, "%s open failed", pathname);

retval = pread(fd, msr, sizeof(*msr), offset);
if (retval != (sizeof *msr))
err(-1, "%s failed: cpu %d msr offset 0x%llx\n", __func__, cpu,
(unsigned long long)offset);
close(fd);
}

and would result in a significant reduction in code in the driver and the tool
IMO.  write_msr() is equally trivial.

P.

> + debug_printf("cpu:%d bucket info via MSR successful 0x%llx\n", cpu,
> +  *buckets_info);
> +
> + return 0;
> +}
> +
>  int isst_set_tdp_level_msr(int cpu, int tdp_level)
>  {
>   int ret;
> @@ -563,6 +581,10 @@ int isst_get_process_ctdp(int cpu, int tdp_level, struct 
> isst_pkg_ctdp *pkg_dev)
>   if (ret)
>   return ret;
>  
> + ret = isst_get_trl_bucket_info(cpu, _level->buckets_info);
> + if (ret)
> + return ret;
> +
>   ret = isst_get_get_trl(cpu, i, 0,
>  ctdp_level->trl_sse_active_cores);
>   if (ret)
> diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
> b/tools/power/x86/intel-speed-select/isst-display.c
> index 8500cf2997a6..df4aa99c4e92 100644
> --- a/tools/power/x86/intel-speed-select/isst-display.c
> +++ b/tools/power/x86/intel-speed-select/isst-display.c
> @@ -372,7 +372,7 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
> int tdp_level,
>   format_and_print(outf, base_level + 5, header, NULL);
>  
>   snprintf(header, sizeof(header), "core-count");
> - snprintf(value, sizeof(value), "%d", j);
> + snprintf(value, sizeof(value), "%llu", 
> (ctdp_level->buckets_info >> (j * 8)) & 0xff);
>   format_and_print(outf, base_level + 6, header, value);
>  
>   snprintf(header, sizeof(header),
> @@ -389,7 +389,7 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
> int tdp_level,
>   format_and_print(outf, base_level + 5, header, NULL);
>  
>   snprintf(header, sizeof(header), "core-count");
> - snprintf(value, sizeof(value), "%d", j);
> + snprintf(value, sizeof(value), "%llu", 
> 

Re: [PATCH v2 9/9] tools/power/x86/intel-speed-select: Fix memory leak

2019-09-05 Thread Prarit Bhargava



On 9/5/19 3:42 PM, Srinivas Pandruvada wrote:
> On Thu, 2019-09-05 at 08:03 -0400, Prarit Bhargava wrote:
>> cpumasks are allocated by calling the alloc_cpu_mask() function and
>> are
>> never free'd.  They should be free'd after the commands have run.
>>
>> Fix the memory leaks by calling free_cpu_set().
> Good to fix this. But after one command execution the process will
> exit.
> 

I'm sorry, I misunderstood your comment.  Yes, the process will exit after one
command :)

P.

> Thanks,
> Srinivas
> 
>>
>> Signed-off-by: Prarit Bhargava 
>> Cc: Srinivas Pandruvada 
>> Cc: David Arcari 
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  tools/power/x86/intel-speed-select/isst-config.c | 16 +++---
>> --
>>  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/power/x86/intel-speed-select/isst-config.c
>> b/tools/power/x86/intel-speed-select/isst-config.c
>> index 78f0cebda1da..59753b3917bb 100644
>> --- a/tools/power/x86/intel-speed-select/isst-config.c
>> +++ b/tools/power/x86/intel-speed-select/isst-config.c
>> @@ -603,6 +603,10 @@ static int isst_fill_platform_info(void)
>>  
>>  close(fd);
>>  
>> +if (isst_platform_info.api_version > supported_api_ver) {
>> +printf("Incompatible API versions; Upgrade of tool is
>> required\n");
>> +return -1;
>> +}
>>  return 0;
>>  }
>>  
>> @@ -1528,6 +1532,7 @@ static void cmdline(int argc, char **argv)
>>  {
>>  int opt;
>>  int option_index = 0;
>> +int ret;
>>  
>>  static struct option long_options[] = {
>>  { "cpu", required_argument, 0, 'c' },
>> @@ -1589,13 +1594,14 @@ static void cmdline(int argc, char **argv)
>>  set_max_cpu_num();
>>  set_cpu_present_cpu_mask();
>>  set_cpu_target_cpu_mask();
>> -isst_fill_platform_info();
>> -if (isst_platform_info.api_version > supported_api_ver) {
>> -printf("Incompatible API versions; Upgrade of tool is
>> required\n");
>> -exit(0);
>> -}
>> +ret = isst_fill_platform_info();
>> +if (ret)
>> +goto out;
>>  
>>  process_command(argc, argv);
>> +out:
>> +free_cpu_set(present_cpumask);
>> +free_cpu_set(target_cpumask);
>>  }
>>  
>>  int main(int argc, char **argv)
> 


Re: [PATCH v2 9/9] tools/power/x86/intel-speed-select: Fix memory leak

2019-09-05 Thread Prarit Bhargava



On 9/5/19 3:42 PM, Srinivas Pandruvada wrote:
> On Thu, 2019-09-05 at 08:03 -0400, Prarit Bhargava wrote:
>> cpumasks are allocated by calling the alloc_cpu_mask() function and
>> are
>> never free'd.  They should be free'd after the commands have run.
>>
>> Fix the memory leaks by calling free_cpu_set().
> Good to fix this. But after one command execution the process will
> exit.

Oh ... I didn't realize it was possible to execute multiple commands in one
call.  I'll go off and fix that and send a v3.

P.

> 
> Thanks,
> Srinivas
> 
>>
>> Signed-off-by: Prarit Bhargava 
>> Cc: Srinivas Pandruvada 
>> Cc: David Arcari 
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  tools/power/x86/intel-speed-select/isst-config.c | 16 +++---
>> --
>>  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/power/x86/intel-speed-select/isst-config.c
>> b/tools/power/x86/intel-speed-select/isst-config.c
>> index 78f0cebda1da..59753b3917bb 100644
>> --- a/tools/power/x86/intel-speed-select/isst-config.c
>> +++ b/tools/power/x86/intel-speed-select/isst-config.c
>> @@ -603,6 +603,10 @@ static int isst_fill_platform_info(void)
>>  
>>  close(fd);
>>  
>> +if (isst_platform_info.api_version > supported_api_ver) {
>> +printf("Incompatible API versions; Upgrade of tool is
>> required\n");
>> +return -1;
>> +}
>>  return 0;
>>  }
>>  
>> @@ -1528,6 +1532,7 @@ static void cmdline(int argc, char **argv)
>>  {
>>  int opt;
>>  int option_index = 0;
>> +int ret;
>>  
>>  static struct option long_options[] = {
>>  { "cpu", required_argument, 0, 'c' },
>> @@ -1589,13 +1594,14 @@ static void cmdline(int argc, char **argv)
>>  set_max_cpu_num();
>>  set_cpu_present_cpu_mask();
>>  set_cpu_target_cpu_mask();
>> -isst_fill_platform_info();
>> -if (isst_platform_info.api_version > supported_api_ver) {
>> -printf("Incompatible API versions; Upgrade of tool is
>> required\n");
>> -exit(0);
>> -}
>> +ret = isst_fill_platform_info();
>> +if (ret)
>> +goto out;
>>  
>>  process_command(argc, argv);
>> +out:
>> +free_cpu_set(present_cpumask);
>> +free_cpu_set(target_cpumask);
>>  }
>>  
>>  int main(int argc, char **argv)
> 


[PATCH v2 9/9] tools/power/x86/intel-speed-select: Fix memory leak

2019-09-05 Thread Prarit Bhargava
cpumasks are allocated by calling the alloc_cpu_mask() function and are
never free'd.  They should be free'd after the commands have run.

Fix the memory leaks by calling free_cpu_set().

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-config.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 78f0cebda1da..59753b3917bb 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -603,6 +603,10 @@ static int isst_fill_platform_info(void)
 
close(fd);
 
+   if (isst_platform_info.api_version > supported_api_ver) {
+   printf("Incompatible API versions; Upgrade of tool is 
required\n");
+   return -1;
+   }
return 0;
 }
 
@@ -1528,6 +1532,7 @@ static void cmdline(int argc, char **argv)
 {
int opt;
int option_index = 0;
+   int ret;
 
static struct option long_options[] = {
{ "cpu", required_argument, 0, 'c' },
@@ -1589,13 +1594,14 @@ static void cmdline(int argc, char **argv)
set_max_cpu_num();
set_cpu_present_cpu_mask();
set_cpu_target_cpu_mask();
-   isst_fill_platform_info();
-   if (isst_platform_info.api_version > supported_api_ver) {
-   printf("Incompatible API versions; Upgrade of tool is 
required\n");
-   exit(0);
-   }
+   ret = isst_fill_platform_info();
+   if (ret)
+   goto out;
 
process_command(argc, argv);
+out:
+   free_cpu_set(present_cpumask);
+   free_cpu_set(target_cpumask);
 }
 
 int main(int argc, char **argv)
-- 
2.21.0



[PATCH v2 6/9] tools/power/x86/intel-speed-select: Change turbo ratio output to maximum turbo frequency

2019-09-05 Thread Prarit Bhargava
The intel-speed-select tool currently outputs the turbo ratio for every
bucket.  Make the output more user-friendly by changing the output to the
maximum turbo frequency.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c  | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 4ec1924ff7a9..cfeee0beb78d 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -336,9 +336,11 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "%d", j);
format_and_print(outf, base_level + 6, header, value);
 
-   snprintf(header, sizeof(header), "turbo-ratio");
+   snprintf(header, sizeof(header),
+   "max-turbo-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_sse_active_cores[j]);
+ctdp_level->trl_sse_active_cores[j] *
+ DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 6, header, value);
}
snprintf(header, sizeof(header), "turbo-ratio-limits-avx");
@@ -351,9 +353,11 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "%d", j);
format_and_print(outf, base_level + 6, header, value);
 
-   snprintf(header, sizeof(header), "turbo-ratio");
+   snprintf(header, sizeof(header),
+   "max-turbo-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_avx_active_cores[j]);
+ctdp_level->trl_avx_active_cores[j] *
+ DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 6, header, value);
}
 
@@ -367,9 +371,11 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "%d", j);
format_and_print(outf, base_level + 6, header, value);
 
-   snprintf(header, sizeof(header), "turbo-ratio");
+   snprintf(header, sizeof(header),
+   "max-turbo-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_avx_512_active_cores[j]);
+ctdp_level->trl_avx_512_active_cores[j] *
+ DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 6, header, value);
}
if (ctdp_level->pbf_support)
-- 
2.21.0



[PATCH v2 4/9] tools/power/x86/intel-speed-select: Simplify output for turbo-freq and base-freq

2019-09-05 Thread Prarit Bhargava
The current output of 'intel-speed-select -c 53 perf-profile info -l 0'
shows

speed-select-turbo-freq-support:1
speed-select-base-freq-support:1
speed-select-base-freq-enabled:0
speed-select-turbo-freq-enabled:0

Simplify the output to single lines displaying status of disabled,
enabled, and unsupported.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c | 30 ++-
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 0d9a53a5da2d..3ae693efceaa 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -297,23 +297,25 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
format_and_print(outf, base_level + 4, header, value);
 
snprintf(header, sizeof(header),
-"speed-select-turbo-freq-support");
-   snprintf(value, sizeof(value), "%d", ctdp_level->fact_support);
+"speed-select-turbo-freq");
+   if (ctdp_level->fact_support) {
+   if (ctdp_level->fact_enabled)
+   snprintf(value, sizeof(value), "enabled");
+   else
+   snprintf(value, sizeof(value), "disabled");
+   } else
+   snprintf(value, sizeof(value), "unsupported");
format_and_print(outf, base_level + 4, header, value);
 
snprintf(header, sizeof(header),
-"speed-select-base-freq-support");
-   snprintf(value, sizeof(value), "%d", ctdp_level->pbf_support);
-   format_and_print(outf, base_level + 4, header, value);
-
-   snprintf(header, sizeof(header),
-"speed-select-base-freq-enabled");
-   snprintf(value, sizeof(value), "%d", ctdp_level->pbf_enabled);
-   format_and_print(outf, base_level + 4, header, value);
-
-   snprintf(header, sizeof(header),
-"speed-select-turbo-freq-enabled");
-   snprintf(value, sizeof(value), "%d", ctdp_level->fact_enabled);
+"speed-select-base-freq");
+   if (ctdp_level->pbf_support) {
+   if (ctdp_level->pbf_enabled)
+   snprintf(value, sizeof(value), "enabled");
+   else
+   snprintf(value, sizeof(value), "disabled");
+   } else
+   snprintf(value, sizeof(value), "unsupported");
format_and_print(outf, base_level + 4, header, value);
 
snprintf(header, sizeof(header), "thermal-design-power(W)");
-- 
2.21.0



[PATCH v2 3/9] tools/power/x86/intel-speed-select: Fix cpu-count output

2019-09-05 Thread Prarit Bhargava
I have a system with 28 threads/socket but intel-speed-select reports
a cpu-count of 29.

Fix an off-by-one error in the cpu_count() function.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index d32af8210427..f81a28c6b586 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -304,7 +304,7 @@ static void set_cpu_present_cpu_mask(void)
 int get_cpu_count(int pkg_id, int die_id)
 {
if (pkg_id < MAX_PACKAGE_COUNT && die_id < MAX_DIE_PER_PACKAGE)
-   return cpu_cnt[pkg_id][die_id] + 1;
+   return cpu_cnt[pkg_id][die_id];
 
return 0;
 }
-- 
2.21.0



[PATCH v2 5/9] tools/power/x86/intel-speed-select: Switch output to MHz

2019-09-05 Thread Prarit Bhargava
These features are introduced on new processors that will never operate
in the KHz range.

Save some zeros and switch the output to MHz.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c | 20 +--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 3ae693efceaa..4ec1924ff7a9 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -6,7 +6,7 @@
 
 #include "isst.h"
 
-#define DISP_FREQ_MULTIPLIER 10
+#define DISP_FREQ_MULTIPLIER 100
 
 static void printcpumask(int str_len, char *str, int mask_size,
 cpu_set_t *cpu_mask)
@@ -156,7 +156,7 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
snprintf(header, sizeof(header), "speed-select-base-freq");
format_and_print(outf, disp_level, header, NULL);
 
-   snprintf(header, sizeof(header), "high-priority-base-frequency(KHz)");
+   snprintf(header, sizeof(header), "high-priority-base-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 pbf_info->p1_high * DISP_FREQ_MULTIPLIER);
format_and_print(outf, disp_level + 1, header, value);
@@ -166,7 +166,7 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
 pbf_info->core_cpumask);
format_and_print(outf, disp_level + 1, header, value);
 
-   snprintf(header, sizeof(header), "low-priority-base-frequency(KHz)");
+   snprintf(header, sizeof(header), "low-priority-base-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 pbf_info->p1_low * DISP_FREQ_MULTIPLIER);
format_and_print(outf, disp_level + 1, header, value);
@@ -209,7 +209,7 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
 
if (fact_avx & 0x01) {
snprintf(header, sizeof(header),
-"high-priority-max-frequency(KHz)");
+"high-priority-max-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 bucket_info[j].sse_trl * DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
@@ -217,7 +217,7 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
 
if (fact_avx & 0x02) {
snprintf(header, sizeof(header),
-"high-priority-max-avx2-frequency(KHz)");
+"high-priority-max-avx2-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 bucket_info[j].avx_trl * DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
@@ -225,7 +225,7 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
 
if (fact_avx & 0x04) {
snprintf(header, sizeof(header),
-"high-priority-max-avx512-frequency(KHz)");
+"high-priority-max-avx512-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 bucket_info[j].avx512_trl *
 DISP_FREQ_MULTIPLIER);
@@ -235,19 +235,19 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
snprintf(header, sizeof(header),
 "speed-select-turbo-freq-clip-frequencies");
format_and_print(outf, base_level + 1, header, NULL);
-   snprintf(header, sizeof(header), "low-priority-max-frequency(KHz)");
+   snprintf(header, sizeof(header), "low-priority-max-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 fact_info->lp_clipping_ratio_license_sse *
 DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
snprintf(header, sizeof(header),
-"low-priority-max-avx2-frequency(KHz)");
+"low-priority-max-avx2-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 fact_info->lp_clipping_ratio_license_avx2 *
 DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
snprintf(header, sizeof(header),
-"low-priority-max-avx512-frequency(KHz)");
+

[PATCH v2 2/9] tools/power/x86/intel-speed-select: Fix help option typo

2019-09-05 Thread Prarit Bhargava
Help is -h, not --h.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 91c5ad1685a1..d32af8210427 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -1491,7 +1491,7 @@ static void usage(void)
printf("intel-speed-select [OPTIONS] FEATURE COMMAND 
COMMAND_ARGUMENTS\n");
printf("\nUse this tool to enumerate and control the Intel Speed Select 
Technology features,\n");
printf("\nFEATURE : [perf-profile|base-freq|turbo-freq|core-power]\n");
-   printf("\nFor help on each feature, use --h|--help\n");
+   printf("\nFor help on each feature, use -h|--help\n");
printf("\tFor example:  intel-speed-select perf-profile -h\n");
 
printf("\nFor additional help on each command for a feature, use 
--h|--help\n");
-- 
2.21.0



[PATCH v2 0/9] tools-power-x86-intel-speed-select: Fixes and updates for output

2019-09-05 Thread Prarit Bhargava
Some general fixes and updates for intel-speed-select.  Fixes include some
typos as well as an off-by-one cpu count reporting error.  Updates for the
output are

- switching to MHz as a standard
- reporting CPU frequencies instead of ratios as a standard
- viewing a human-readable CPU list.
- avoiding reporting "0|1" as success|fail as these can be confusing for a
  user.

v2: Add additional patch to fix memory leak and remove help text in 8/9.

Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org

Prarit Bhargava (9):
  tools/power/x86/intel-speed-select: Fix package typo
  tools/power/x86/intel-speed-select: Fix help option typo
  tools/power/x86/intel-speed-select: Fix cpu-count output
  tools/power/x86/intel-speed-select: Simplify output for turbo-freq and
base-freq
  tools/power/x86/intel-speed-select: Switch output to MHz
  tools/power/x86/intel-speed-select: Change turbo ratio output to
maximum turbo frequency
  tools/power/x86/intel-speed-select: Output human readable CPU list
  tools/power/x86/intel-speed-select: Output success/failed for command
output
  tools/power/x86/intel-speed-select: Fix memory leak

 .../x86/intel-speed-select/isst-config.c  |  21 +--
 .../x86/intel-speed-select/isst-display.c | 120 +-
 2 files changed, 98 insertions(+), 43 deletions(-)

-- 
2.21.0



[PATCH v2 8/9] tools/power/x86/intel-speed-select: Output success/failed for command output

2019-09-05 Thread Prarit Bhargava
Command output has confusing data, returning "0" on success.  For example

|# ./intel-speed-select -c 14 turbo-freq enable
Intel(R) Speed Select Technology
Executing on CPU model:106[0x6a]
 package-1
   die-0
 cpu-14
   turbo-freq
 enable:0

To avoid confusion change the command output to 'success' or 'failed'.

v2: Remove help output line.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-config.c  | 1 -
 tools/power/x86/intel-speed-select/isst-display.c | 5 -
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index f81a28c6b586..78f0cebda1da 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -1514,7 +1514,6 @@ static void usage(void)
printf("\tResult display uses a common format for each command:\n");
printf("\tResults are formatted in text/JSON with\n");
printf("\t\tPackage, Die, CPU, and command specific results.\n");
-   printf("\t\t\tFor Set commands, status is 0 for success and rest for 
failures\n");
exit(1);
 }
 
diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 890a01bfee4b..8500cf2997a6 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -519,7 +519,10 @@ void isst_display_result(int cpu, FILE *outf, char 
*feature, char *cmd,
snprintf(header, sizeof(header), "%s", feature);
format_and_print(outf, 4, header, NULL);
snprintf(header, sizeof(header), "%s", cmd);
-   snprintf(value, sizeof(value), "%d", result);
+   if (!result)
+   snprintf(value, sizeof(value), "success");
+   else
+   snprintf(value, sizeof(value), "failed(error %d)", result);
format_and_print(outf, 5, header, value);
 
format_and_print(outf, 1, NULL, NULL);
-- 
2.21.0



[PATCH v2 7/9] tools/power/x86/intel-speed-select: Output human readable CPU list

2019-09-05 Thread Prarit Bhargava
The intel-speed-select tool currently only outputs a hexidecimal CPU mask,
which requires translation for use with kernel parameters such as
isolcpus.

Along with the CPU mask, output a human readable CPU list.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index cfeee0beb78d..890a01bfee4b 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -8,6 +8,33 @@
 
 #define DISP_FREQ_MULTIPLIER 100
 
+static void printcpulist(int str_len, char *str, int mask_size,
+cpu_set_t *cpu_mask)
+{
+   int i, first, curr_index, index;
+
+   if (!CPU_COUNT_S(mask_size, cpu_mask)) {
+   snprintf(str, str_len, "none");
+   return;
+   }
+
+   curr_index = 0;
+   first = 1;
+   for (i = 0; i < get_topo_max_cpus(); ++i) {
+   if (!CPU_ISSET_S(i, mask_size, cpu_mask))
+   continue;
+   if (!first) {
+   index = snprintf([curr_index],
+str_len - curr_index, ",");
+   curr_index += index;
+   }
+   index = snprintf([curr_index], str_len - curr_index, "%d",
+i);
+   curr_index += index;
+   first = 0;
+   }
+}
+
 static void printcpumask(int str_len, char *str, int mask_size,
 cpu_set_t *cpu_mask)
 {
@@ -166,6 +193,12 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
 pbf_info->core_cpumask);
format_and_print(outf, disp_level + 1, header, value);
 
+   snprintf(header, sizeof(header), "high-priority-cpu-list");
+   printcpulist(sizeof(value), value,
+pbf_info->core_cpumask_size,
+pbf_info->core_cpumask);
+   format_and_print(outf, disp_level + 1, header, value);
+
snprintf(header, sizeof(header), "low-priority-base-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 pbf_info->p1_low * DISP_FREQ_MULTIPLIER);
@@ -287,6 +320,12 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
 ctdp_level->core_cpumask);
format_and_print(outf, base_level + 4, header, value);
 
+   snprintf(header, sizeof(header), "enable-cpu-list");
+   printcpulist(sizeof(value), value,
+ctdp_level->core_cpumask_size,
+ctdp_level->core_cpumask);
+   format_and_print(outf, base_level + 4, header, value);
+
snprintf(header, sizeof(header), "thermal-design-power-ratio");
snprintf(value, sizeof(value), "%d", ctdp_level->tdp_ratio);
format_and_print(outf, base_level + 4, header, value);
-- 
2.21.0



[PATCH v2 1/9] tools/power/x86/intel-speed-select: Fix package typo

2019-09-05 Thread Prarit Bhargava
packag_ should be package_.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-display.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index f368b8323742..0d9a53a5da2d 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -133,7 +133,7 @@ static void format_and_print(FILE *outf, int level, char 
*header, char *value)
last_level = level;
 }
 
-static void print_packag_info(int cpu, FILE *outf)
+static void print_package_info(int cpu, FILE *outf)
 {
char header[256];
 
@@ -261,7 +261,7 @@ void isst_ctdp_display_information(int cpu, FILE *outf, int 
tdp_level,
char value[256];
int i, base_level = 1;
 
-   print_packag_info(cpu, outf);
+   print_package_info(cpu, outf);
 
for (i = 0; i <= pkg_dev->levels; ++i) {
struct isst_pkg_ctdp_level_info *ctdp_level;
@@ -397,7 +397,7 @@ void isst_ctdp_display_information_end(FILE *outf)
 void isst_pbf_display_information(int cpu, FILE *outf, int level,
  struct isst_pbf_info *pbf_info)
 {
-   print_packag_info(cpu, outf);
+   print_package_info(cpu, outf);
_isst_pbf_display_information(cpu, outf, level, pbf_info, 4);
format_and_print(outf, 1, NULL, NULL);
 }
@@ -406,7 +406,7 @@ void isst_fact_display_information(int cpu, FILE *outf, int 
level,
   int fact_bucket, int fact_avx,
   struct isst_fact_info *fact_info)
 {
-   print_packag_info(cpu, outf);
+   print_package_info(cpu, outf);
_isst_fact_display_information(cpu, outf, level, fact_bucket, fact_avx,
   fact_info, 4);
format_and_print(outf, 1, NULL, NULL);
-- 
2.21.0



Re: [PATCH 0/8] tools-power-x86-intel-speed-select: Fixes and updates for output

2019-09-04 Thread Prarit Bhargava



On 9/4/19 4:06 PM, Srinivas Pandruvada wrote:
> On Tue, 2019-09-03 at 11:37 -0400, Prarit Bhargava wrote:
>> Some general fixes and updates for intel-speed-select.  Fixes include
>> some
>> typos as well as an off-by-one cpu count reporting error.  Updates
>> for the
>> output are
>>
>> - switching to MHz as a standard
>> - reporting CPU frequencies instead of ratios as a standard
>> - viewing a human-readable CPU list.
>> - avoiding reporting "0|1" as success|fail as these can be confusing
>> for a
>>   user.
> Series looks fine, except 8/8.
> So please submit v2. Better to resubmit as a series as v2, unless Andy
> has other preference.

Thanks Srinivas.

I have an additional patch.  It looks like there's a memory leak.  Sorry for the
cut-and-paste but if okay I'll submit this as part of v2.  Reworking the code
this way makes it easier to introduce CascadeLake-N support too.

diff --git a/tools/power/x86/intel-speed-select/isst-config.c b/tools/power/x86/
intel-speed-select/isst-config.c
index 78f0cebda1da..59753b3917bb 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -603,6 +603,10 @@ static int isst_fill_platform_info(void)

close(fd);

+   if (isst_platform_info.api_version > supported_api_ver) {
+   printf("Incompatible API versions; Upgrade of tool is 
required\n");
+   return -1;
+   }
return 0;
 }

@@ -1528,6 +1532,7 @@ static void cmdline(int argc, char **argv)
 {
int opt;
int option_index = 0;
+   int ret;

static struct option long_options[] = {
{ "cpu", required_argument, 0, 'c' },
@@ -1589,13 +1594,14 @@ static void cmdline(int argc, char **argv)
set_max_cpu_num();
set_cpu_present_cpu_mask();
set_cpu_target_cpu_mask();
-   isst_fill_platform_info();
-   if (isst_platform_info.api_version > supported_api_ver) {
-   printf("Incompatible API versions; Upgrade of tool is 
required\n");
-   exit(0);
-   }
+   ret = isst_fill_platform_info();
+   if (ret)
+   goto out;

process_command(argc, argv);
+out:
+   free_cpu_set(present_cpumask);
+   free_cpu_set(target_cpumask);
 }

 int main(int argc, char **argv)

P.
> 
> Thanks,
> Srinivas
> 
>>
>> Signed-off-by: Prarit Bhargava 
>> Cc: Srinivas Pandruvada 
>> Cc: David Arcari 
>> Cc: linux-kernel@vger.kernel.org
>>
>> Prarit Bhargava (8):
>>   tools/power/x86/intel-speed-select: Fix package typo
>>   tools/power/x86/intel-speed-select: Fix help option typo
>>   tools/power/x86/intel-speed-select: Fix cpu-count output
>>   tools/power/x86/intel-speed-select: Simplify output for turbo-freq
>> and
>> base-freq
>>   tools/power/x86/intel-speed-select: Switch output to MHz
>>   tools/power/x86/intel-speed-select: Change turbo ratio output to
>> maximum turbo frequency
>>   tools/power/x86/intel-speed-select: Output human readable CPU list
>>   tools/power/x86/intel-speed-select: Output success/failed for
>> command
>> output
>>
>>  .../x86/intel-speed-select/isst-config.c  |   4 +-
>>  .../x86/intel-speed-select/isst-display.c | 116 --
>> 
>>  2 files changed, 83 insertions(+), 37 deletions(-)
>>
> 


[PATCH 3/8] tools/power/x86/intel-speed-select: Fix cpu-count output

2019-09-03 Thread Prarit Bhargava
I have a system with 28 threads/socket but intel-speed-select reports
a cpu-count of 29.

Fix an off-by-one error in the cpu_count() function.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index d32af8210427..f81a28c6b586 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -304,7 +304,7 @@ static void set_cpu_present_cpu_mask(void)
 int get_cpu_count(int pkg_id, int die_id)
 {
if (pkg_id < MAX_PACKAGE_COUNT && die_id < MAX_DIE_PER_PACKAGE)
-   return cpu_cnt[pkg_id][die_id] + 1;
+   return cpu_cnt[pkg_id][die_id];
 
return 0;
 }
-- 
2.21.0



[PATCH 1/8] tools/power/x86/intel-speed-select: Fix package typo

2019-09-03 Thread Prarit Bhargava
packag_ should be package_.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-display.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index f368b8323742..0d9a53a5da2d 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -133,7 +133,7 @@ static void format_and_print(FILE *outf, int level, char 
*header, char *value)
last_level = level;
 }
 
-static void print_packag_info(int cpu, FILE *outf)
+static void print_package_info(int cpu, FILE *outf)
 {
char header[256];
 
@@ -261,7 +261,7 @@ void isst_ctdp_display_information(int cpu, FILE *outf, int 
tdp_level,
char value[256];
int i, base_level = 1;
 
-   print_packag_info(cpu, outf);
+   print_package_info(cpu, outf);
 
for (i = 0; i <= pkg_dev->levels; ++i) {
struct isst_pkg_ctdp_level_info *ctdp_level;
@@ -397,7 +397,7 @@ void isst_ctdp_display_information_end(FILE *outf)
 void isst_pbf_display_information(int cpu, FILE *outf, int level,
  struct isst_pbf_info *pbf_info)
 {
-   print_packag_info(cpu, outf);
+   print_package_info(cpu, outf);
_isst_pbf_display_information(cpu, outf, level, pbf_info, 4);
format_and_print(outf, 1, NULL, NULL);
 }
@@ -406,7 +406,7 @@ void isst_fact_display_information(int cpu, FILE *outf, int 
level,
   int fact_bucket, int fact_avx,
   struct isst_fact_info *fact_info)
 {
-   print_packag_info(cpu, outf);
+   print_package_info(cpu, outf);
_isst_fact_display_information(cpu, outf, level, fact_bucket, fact_avx,
   fact_info, 4);
format_and_print(outf, 1, NULL, NULL);
-- 
2.21.0



[PATCH 6/8] tools/power/x86/intel-speed-select: Change turbo ratio output to maximum turbo frequency

2019-09-03 Thread Prarit Bhargava
The intel-speed-select tool currently outputs the turbo ratio for every
bucket.  Make the output more user-friendly by changing the output to the
maximum turbo frequency.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c  | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 4ec1924ff7a9..cfeee0beb78d 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -336,9 +336,11 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "%d", j);
format_and_print(outf, base_level + 6, header, value);
 
-   snprintf(header, sizeof(header), "turbo-ratio");
+   snprintf(header, sizeof(header),
+   "max-turbo-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_sse_active_cores[j]);
+ctdp_level->trl_sse_active_cores[j] *
+ DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 6, header, value);
}
snprintf(header, sizeof(header), "turbo-ratio-limits-avx");
@@ -351,9 +353,11 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "%d", j);
format_and_print(outf, base_level + 6, header, value);
 
-   snprintf(header, sizeof(header), "turbo-ratio");
+   snprintf(header, sizeof(header),
+   "max-turbo-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_avx_active_cores[j]);
+ctdp_level->trl_avx_active_cores[j] *
+ DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 6, header, value);
}
 
@@ -367,9 +371,11 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
snprintf(value, sizeof(value), "%d", j);
format_and_print(outf, base_level + 6, header, value);
 
-   snprintf(header, sizeof(header), "turbo-ratio");
+   snprintf(header, sizeof(header),
+   "max-turbo-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
-ctdp_level->trl_avx_512_active_cores[j]);
+ctdp_level->trl_avx_512_active_cores[j] *
+ DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 6, header, value);
}
if (ctdp_level->pbf_support)
-- 
2.21.0



[PATCH 4/8] tools/power/x86/intel-speed-select: Simplify output for turbo-freq and base-freq

2019-09-03 Thread Prarit Bhargava
The current output of 'intel-speed-select -c 53 perf-profile info -l 0'
shows

speed-select-turbo-freq-support:1
speed-select-base-freq-support:1
speed-select-base-freq-enabled:0
speed-select-turbo-freq-enabled:0

Simplify the output to single lines displaying status of disabled,
enabled, and unsupported.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c | 30 ++-
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 0d9a53a5da2d..3ae693efceaa 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -297,23 +297,25 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
format_and_print(outf, base_level + 4, header, value);
 
snprintf(header, sizeof(header),
-"speed-select-turbo-freq-support");
-   snprintf(value, sizeof(value), "%d", ctdp_level->fact_support);
+"speed-select-turbo-freq");
+   if (ctdp_level->fact_support) {
+   if (ctdp_level->fact_enabled)
+   snprintf(value, sizeof(value), "enabled");
+   else
+   snprintf(value, sizeof(value), "disabled");
+   } else
+   snprintf(value, sizeof(value), "unsupported");
format_and_print(outf, base_level + 4, header, value);
 
snprintf(header, sizeof(header),
-"speed-select-base-freq-support");
-   snprintf(value, sizeof(value), "%d", ctdp_level->pbf_support);
-   format_and_print(outf, base_level + 4, header, value);
-
-   snprintf(header, sizeof(header),
-"speed-select-base-freq-enabled");
-   snprintf(value, sizeof(value), "%d", ctdp_level->pbf_enabled);
-   format_and_print(outf, base_level + 4, header, value);
-
-   snprintf(header, sizeof(header),
-"speed-select-turbo-freq-enabled");
-   snprintf(value, sizeof(value), "%d", ctdp_level->fact_enabled);
+"speed-select-base-freq");
+   if (ctdp_level->pbf_support) {
+   if (ctdp_level->pbf_enabled)
+   snprintf(value, sizeof(value), "enabled");
+   else
+   snprintf(value, sizeof(value), "disabled");
+   } else
+   snprintf(value, sizeof(value), "unsupported");
format_and_print(outf, base_level + 4, header, value);
 
snprintf(header, sizeof(header), "thermal-design-power(W)");
-- 
2.21.0



[PATCH 5/8] tools/power/x86/intel-speed-select: Switch output to MHz

2019-09-03 Thread Prarit Bhargava
These features are introduced on new processors that will never operate
in the KHz range.

Save some zeros and switch the output to MHz.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c | 20 +--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 3ae693efceaa..4ec1924ff7a9 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -6,7 +6,7 @@
 
 #include "isst.h"
 
-#define DISP_FREQ_MULTIPLIER 10
+#define DISP_FREQ_MULTIPLIER 100
 
 static void printcpumask(int str_len, char *str, int mask_size,
 cpu_set_t *cpu_mask)
@@ -156,7 +156,7 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
snprintf(header, sizeof(header), "speed-select-base-freq");
format_and_print(outf, disp_level, header, NULL);
 
-   snprintf(header, sizeof(header), "high-priority-base-frequency(KHz)");
+   snprintf(header, sizeof(header), "high-priority-base-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 pbf_info->p1_high * DISP_FREQ_MULTIPLIER);
format_and_print(outf, disp_level + 1, header, value);
@@ -166,7 +166,7 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
 pbf_info->core_cpumask);
format_and_print(outf, disp_level + 1, header, value);
 
-   snprintf(header, sizeof(header), "low-priority-base-frequency(KHz)");
+   snprintf(header, sizeof(header), "low-priority-base-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 pbf_info->p1_low * DISP_FREQ_MULTIPLIER);
format_and_print(outf, disp_level + 1, header, value);
@@ -209,7 +209,7 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
 
if (fact_avx & 0x01) {
snprintf(header, sizeof(header),
-"high-priority-max-frequency(KHz)");
+"high-priority-max-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 bucket_info[j].sse_trl * DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
@@ -217,7 +217,7 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
 
if (fact_avx & 0x02) {
snprintf(header, sizeof(header),
-"high-priority-max-avx2-frequency(KHz)");
+"high-priority-max-avx2-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 bucket_info[j].avx_trl * DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
@@ -225,7 +225,7 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
 
if (fact_avx & 0x04) {
snprintf(header, sizeof(header),
-"high-priority-max-avx512-frequency(KHz)");
+"high-priority-max-avx512-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 bucket_info[j].avx512_trl *
 DISP_FREQ_MULTIPLIER);
@@ -235,19 +235,19 @@ static void _isst_fact_display_information(int cpu, FILE 
*outf, int level,
snprintf(header, sizeof(header),
 "speed-select-turbo-freq-clip-frequencies");
format_and_print(outf, base_level + 1, header, NULL);
-   snprintf(header, sizeof(header), "low-priority-max-frequency(KHz)");
+   snprintf(header, sizeof(header), "low-priority-max-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 fact_info->lp_clipping_ratio_license_sse *
 DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
snprintf(header, sizeof(header),
-"low-priority-max-avx2-frequency(KHz)");
+"low-priority-max-avx2-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 fact_info->lp_clipping_ratio_license_avx2 *
 DISP_FREQ_MULTIPLIER);
format_and_print(outf, base_level + 2, header, value);
snprintf(header, sizeof(header),
-"low-priority-max-avx512-frequency(KHz)");
+

[PATCH 8/8] tools/power/x86/intel-speed-select: Output success/failed for command output

2019-09-03 Thread Prarit Bhargava
Command output has confusing data, returning "0" on success.  For example

|# ./intel-speed-select -c 14 turbo-freq enable
Intel(R) Speed Select Technology
Executing on CPU model:106[0x6a]
 package-1
   die-0
 cpu-14
   turbo-freq
 enable:0

To avoid confusion change the command output to 'success' or 'failed'.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-display.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index 890a01bfee4b..8500cf2997a6 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -519,7 +519,10 @@ void isst_display_result(int cpu, FILE *outf, char 
*feature, char *cmd,
snprintf(header, sizeof(header), "%s", feature);
format_and_print(outf, 4, header, NULL);
snprintf(header, sizeof(header), "%s", cmd);
-   snprintf(value, sizeof(value), "%d", result);
+   if (!result)
+   snprintf(value, sizeof(value), "success");
+   else
+   snprintf(value, sizeof(value), "failed(error %d)", result);
format_and_print(outf, 5, header, value);
 
format_and_print(outf, 1, NULL, NULL);
-- 
2.21.0



[PATCH 7/8] tools/power/x86/intel-speed-select: Output human readable CPU list

2019-09-03 Thread Prarit Bhargava
The intel-speed-select tool currently only outputs a hexidecimal CPU mask,
which requires translation for use with kernel parameters such as
isolcpus.

Along with the CPU mask, output a human readable CPU list.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 .../x86/intel-speed-select/isst-display.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/tools/power/x86/intel-speed-select/isst-display.c 
b/tools/power/x86/intel-speed-select/isst-display.c
index cfeee0beb78d..890a01bfee4b 100644
--- a/tools/power/x86/intel-speed-select/isst-display.c
+++ b/tools/power/x86/intel-speed-select/isst-display.c
@@ -8,6 +8,33 @@
 
 #define DISP_FREQ_MULTIPLIER 100
 
+static void printcpulist(int str_len, char *str, int mask_size,
+cpu_set_t *cpu_mask)
+{
+   int i, first, curr_index, index;
+
+   if (!CPU_COUNT_S(mask_size, cpu_mask)) {
+   snprintf(str, str_len, "none");
+   return;
+   }
+
+   curr_index = 0;
+   first = 1;
+   for (i = 0; i < get_topo_max_cpus(); ++i) {
+   if (!CPU_ISSET_S(i, mask_size, cpu_mask))
+   continue;
+   if (!first) {
+   index = snprintf([curr_index],
+str_len - curr_index, ",");
+   curr_index += index;
+   }
+   index = snprintf([curr_index], str_len - curr_index, "%d",
+i);
+   curr_index += index;
+   first = 0;
+   }
+}
+
 static void printcpumask(int str_len, char *str, int mask_size,
 cpu_set_t *cpu_mask)
 {
@@ -166,6 +193,12 @@ static void _isst_pbf_display_information(int cpu, FILE 
*outf, int level,
 pbf_info->core_cpumask);
format_and_print(outf, disp_level + 1, header, value);
 
+   snprintf(header, sizeof(header), "high-priority-cpu-list");
+   printcpulist(sizeof(value), value,
+pbf_info->core_cpumask_size,
+pbf_info->core_cpumask);
+   format_and_print(outf, disp_level + 1, header, value);
+
snprintf(header, sizeof(header), "low-priority-base-frequency(MHz)");
snprintf(value, sizeof(value), "%d",
 pbf_info->p1_low * DISP_FREQ_MULTIPLIER);
@@ -287,6 +320,12 @@ void isst_ctdp_display_information(int cpu, FILE *outf, 
int tdp_level,
 ctdp_level->core_cpumask);
format_and_print(outf, base_level + 4, header, value);
 
+   snprintf(header, sizeof(header), "enable-cpu-list");
+   printcpulist(sizeof(value), value,
+ctdp_level->core_cpumask_size,
+ctdp_level->core_cpumask);
+   format_and_print(outf, base_level + 4, header, value);
+
snprintf(header, sizeof(header), "thermal-design-power-ratio");
snprintf(value, sizeof(value), "%d", ctdp_level->tdp_ratio);
format_and_print(outf, base_level + 4, header, value);
-- 
2.21.0



[PATCH 2/8] tools/power/x86/intel-speed-select: Fix help option typo

2019-09-03 Thread Prarit Bhargava
Help is -h, not --h.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org
---
 tools/power/x86/intel-speed-select/isst-config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/power/x86/intel-speed-select/isst-config.c 
b/tools/power/x86/intel-speed-select/isst-config.c
index 91c5ad1685a1..d32af8210427 100644
--- a/tools/power/x86/intel-speed-select/isst-config.c
+++ b/tools/power/x86/intel-speed-select/isst-config.c
@@ -1491,7 +1491,7 @@ static void usage(void)
printf("intel-speed-select [OPTIONS] FEATURE COMMAND 
COMMAND_ARGUMENTS\n");
printf("\nUse this tool to enumerate and control the Intel Speed Select 
Technology features,\n");
printf("\nFEATURE : [perf-profile|base-freq|turbo-freq|core-power]\n");
-   printf("\nFor help on each feature, use --h|--help\n");
+   printf("\nFor help on each feature, use -h|--help\n");
printf("\tFor example:  intel-speed-select perf-profile -h\n");
 
printf("\nFor additional help on each command for a feature, use 
--h|--help\n");
-- 
2.21.0



[PATCH 0/8] tools-power-x86-intel-speed-select: Fixes and updates for output

2019-09-03 Thread Prarit Bhargava
Some general fixes and updates for intel-speed-select.  Fixes include some
typos as well as an off-by-one cpu count reporting error.  Updates for the
output are

- switching to MHz as a standard
- reporting CPU frequencies instead of ratios as a standard
- viewing a human-readable CPU list.
- avoiding reporting "0|1" as success|fail as these can be confusing for a
  user.

Signed-off-by: Prarit Bhargava 
Cc: Srinivas Pandruvada 
Cc: David Arcari 
Cc: linux-kernel@vger.kernel.org

Prarit Bhargava (8):
  tools/power/x86/intel-speed-select: Fix package typo
  tools/power/x86/intel-speed-select: Fix help option typo
  tools/power/x86/intel-speed-select: Fix cpu-count output
  tools/power/x86/intel-speed-select: Simplify output for turbo-freq and
base-freq
  tools/power/x86/intel-speed-select: Switch output to MHz
  tools/power/x86/intel-speed-select: Change turbo ratio output to
maximum turbo frequency
  tools/power/x86/intel-speed-select: Output human readable CPU list
  tools/power/x86/intel-speed-select: Output success/failed for command
output

 .../x86/intel-speed-select/isst-config.c  |   4 +-
 .../x86/intel-speed-select/isst-display.c | 116 --
 2 files changed, 83 insertions(+), 37 deletions(-)

-- 
2.21.0



Re: [PATCH AUTOSEL 5.2 13/85] kernel/module.c: Only return -EEXIST for modules that have finished loading

2019-07-26 Thread Prarit Bhargava



On 7/26/19 9:38 AM, Sasha Levin wrote:
> From: Prarit Bhargava 
> 
> [ Upstream commit 6e6de3dee51a439f76eb73c22ae2ffd2c9384712 ]
> 
> Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
> linux guests boot with repeated errors:
> 

Hey Sasha, I'd prefer to leave this out of stable branches for now.  Linus is a
bit nervous about it and I like to get see more soak time before the patch is
backported to stable.

https://lkml.org/lkml/2019/7/18/680

FWIW we've been running this in RHEL for some time now without any issues.

P.

> amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
> amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
> 
> The warnings occur because the module code erroneously returns -EEXIST
> for modules that have failed to load and are in the process of being
> removed from the module list.
> 
> module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
> modules.dep, systemd will load edac_mce_amd for every request of
> amd64_edac_mod.  When the edac_mce_amd module loads, the module has
> state MODULE_STATE_UNFORMED and once the module load fails and the state
> becomes MODULE_STATE_GOING.  Another request for edac_mce_amd module
> executes and add_unformed_module() will erroneously return -EEXIST even
> though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
> Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
> fails because of unknown symbols from edac_mce_amd.
> 
> add_unformed_module() must wait to return for any case other than
> MODULE_STATE_LIVE to prevent a race between multiple loads of
> dependent modules.
> 
> Signed-off-by: Prarit Bhargava 
> Signed-off-by: Barret Rhoden 
> Cc: David Arcari 
> Cc: Jessica Yu 
> Cc: Heiko Carstens 
> Signed-off-by: Jessica Yu 
> Signed-off-by: Sasha Levin 
> ---
>  kernel/module.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 80c7c09584cf..8431c3d47c97 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3385,8 +3385,7 @@ static bool finished_loading(const char *name)
>   sched_annotate_sleep();
>   mutex_lock(_mutex);
>   mod = find_module_all(name, strlen(name), true);
> - ret = !mod || mod->state == MODULE_STATE_LIVE
> - || mod->state == MODULE_STATE_GOING;
> + ret = !mod || mod->state == MODULE_STATE_LIVE;
>   mutex_unlock(_mutex);
>  
>   return ret;
> @@ -3576,8 +3575,7 @@ static int add_unformed_module(struct module *mod)
>   mutex_lock(_mutex);
>   old = find_module_all(mod->name, strlen(mod->name), true);
>   if (old != NULL) {
> - if (old->state == MODULE_STATE_COMING
> - || old->state == MODULE_STATE_UNFORMED) {
> + if (old->state != MODULE_STATE_LIVE) {
>   /* Wait in case it fails to load. */
>   mutex_unlock(_mutex);
>   err = wait_event_interruptible(module_wq,
> 


Re: [GIT PULL] Modules updates for v5.3

2019-07-19 Thread Prarit Bhargava



On 7/18/19 3:20 PM, Linus Torvalds wrote:
> On Thu, Jul 18, 2019 at 4:33 AM Jessica Yu  wrote:
>>
>> Modules updates for v5.3
>>
>> - Fix bug where -EEXIST was being returned for going modules
> 
> Hmm.
> 
> I have pulled this, but this change makes me a bit nervous.
> 
> I have this dim memory of us having deadlocks in module loading
> because module loading triggered other modules to be loaded, and we
> had circular dependencies..
> 

Interesting.  I'll try to search around to see if I can find the thread.

P.

> This is from years and years ago (that whole state check is not
> recent), so my memory may simply be wrong, but I thought I'd mention
> it...
> 
>Linus
> 


[tip:x86/urgent] x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled

2019-06-12 Thread tip-bot for Prarit Bhargava
Commit-ID:  c7563e62a6d720aa3b068e26ddffab5f0df29263
Gitweb: https://git.kernel.org/tip/c7563e62a6d720aa3b068e26ddffab5f0df29263
Author: Prarit Bhargava 
AuthorDate: Mon, 10 Jun 2019 13:15:44 -0400
Committer:  Thomas Gleixner 
CommitDate: Wed, 12 Jun 2019 10:31:50 +0200

x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled

Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and
executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl" results in
a NULL pointer dereference on systems which do not have local MBM support
enabled..

BUG: kernel NULL pointer dereference, address: 0020
PGD 0 P4D 0
Oops:  [#1] SMP PTI
CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 
5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 #2
Workqueue: events mbm_handle_overflow
RIP: 0010:mbm_handle_overflow+0x150/0x2b0

Only enter the bandwith update loop if the system has local MBM enabled.

Fixes: de73f38f7680 ("x86/intel_rdt/mba_sc: Feedback loop to dynamically update 
mem bandwidth")
Signed-off-by: Prarit Bhargava 
Signed-off-by: Thomas Gleixner 
Cc: Fenghua Yu 
Cc: Reinette Chatre 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20190610171544.13474-1-pra...@redhat.com

---
 arch/x86/kernel/cpu/resctrl/monitor.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 1573a0a6b525..ff6e8e561405 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -368,6 +368,9 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct 
rdt_domain *dom_mbm)
struct list_head *head;
struct rdtgroup *entry;
 
+   if (!is_mbm_local_enabled())
+   return;
+
r_mba = _resources_all[RDT_RESOURCE_MBA];
closid = rgrp->closid;
rmid = rgrp->mon.rmid;


[PATCH] x86/resctrl: Fix panic on systems that do not enable local MBM

2019-06-10 Thread Prarit Bhargava
Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and
executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl"
results in a panic on systems without local MBM support enabled in
firmware.

BUG: kernel NULL pointer dereference, address: 0020
PGD 0 P4D 0
Oops:  [#1] SMP PTI
CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 
5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 #2
Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.1.8 04/30/2019
Workqueue: events mbm_handle_overflow
RIP: 0010:mbm_handle_overflow+0x150/0x2b0

Only call the bandwith update loop if the system has local MBM enabled.

Signed-off-by: Prarit Bhargava 
Cc: Fenghua Yu 
Cc: Reinette Chatre 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7ee93125a211..397206f23d14 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -360,6 +360,9 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct 
rdt_domain *dom_mbm)
struct list_head *head;
struct rdtgroup *entry;
 
+   if (!is_mbm_local_enabled())
+   return;
+
r_mba = _resources_all[RDT_RESOURCE_MBA];
closid = rgrp->closid;
rmid = rgrp->mon.rmid;
-- 
2.21.0



[PATCH v3] kernel/module.c: Only return -EEXIST for modules that have finished loading

2019-05-29 Thread Prarit Bhargava
Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
linux guests boot with repeated errors:

amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)

The warnings occur because the module code erroneously returns -EEXIST
for modules that have failed to load and are in the process of being
removed from the module list.

module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
modules.dep, systemd will load edac_mce_amd for every request of
amd64_edac_mod.  When the edac_mce_amd module loads, the module has
state MODULE_STATE_UNFORMED and once the module load fails and the state
becomes MODULE_STATE_GOING.  Another request for edac_mce_amd module
executes and add_unformed_module() will erroneously return -EEXIST even
though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
fails because of unknown symbols from edac_mce_amd.

add_unformed_module() must wait to return for any case other than
MODULE_STATE_LIVE to prevent a race between multiple loads of
dependent modules.

Signed-off-by: Prarit Bhargava 
Signed-off-by: Barret Rhoden 
Cc: David Arcari 
Cc: Jessica Yu 
Cc: Heiko Carstens 
---
 kernel/module.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 6e6712b3aaf5..1e7dcbe527af 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3397,8 +3397,7 @@ static bool finished_loading(const char *name)
sched_annotate_sleep();
mutex_lock(_mutex);
mod = find_module_all(name, strlen(name), true);
-   ret = !mod || mod->state == MODULE_STATE_LIVE
-   || mod->state == MODULE_STATE_GOING;
+   ret = !mod || mod->state == MODULE_STATE_LIVE;
mutex_unlock(_mutex);
 
return ret;
@@ -3588,8 +3587,7 @@ static int add_unformed_module(struct module *mod)
mutex_lock(_mutex);
old = find_module_all(mod->name, strlen(mod->name), true);
if (old != NULL) {
-   if (old->state == MODULE_STATE_COMING
-   || old->state == MODULE_STATE_UNFORMED) {
+   if (old->state != MODULE_STATE_LIVE) {
/* Wait in case it fails to load. */
mutex_unlock(_mutex);
err = wait_event_interruptible(module_wq,
-- 
2.21.0



Re: [PATCH] modules: fix livelock in add_unformed_module()

2019-05-28 Thread Prarit Bhargava



On 5/22/19 1:08 PM, Prarit Bhargava wrote:
> 
> 
> On 5/13/19 10:37 AM, Barret Rhoden wrote:
>> Hi -
>>
> 
> Hey Barret, my apologies for not getting back to you earlier.  I got caught up
> in something that took me away from this issue.
> 
>> On 5/13/19 7:23 AM, Prarit Bhargava wrote:
>> [snip]
>>> A module is loaded once for each cpu.
>>
>> Does one CPU succeed in loading the module, and the others fail with EEXIST?
>>
>>> My follow-up patch changes from wait_event_interruptible() to
>>> wait_event_interruptible_timeout() so the CPUs are no longer sleeping and 
>>> can
>>> make progress on other tasks, which changes the return values from
>>> wait_event_interruptible().
>>>
>>> https://marc.info/?l=linux-kernel=155724085927589=2
>>>
>>> I believe this also takes your concern into account?
>>
>> That patch might work for me, but I think it papers over the bug where the 
>> check
>> on old->state that you make before sleeping (was COMING || UNFORMED, now 
>> !LIVE)
>> doesn't match the check to wake up in finished_loading().
>>
>> The reason the issue might not show up in practice is that your patch 
>> basically
>> polls, so the condition checks in finished_loading() are only a quicker exit.
>>
>> If you squash my patch into yours, I think it will cover that case. Though if
>> polling is the right answer here, it also raises the question of whether or 
>> not
>> we even need finished_loading().
>>
> 
> The more I look at this I think you're right.  Let me do some additional 
> testing
> with your patch + my original patch.
> 

I have done testing on arm64, s390x, ppc64le, ppc64, and x86 and have not seen
any issues.

Jessica, how would you like me to proceed?  Would you like an updated patch with
Signed-off's from both Barret & myself?

P.

> P.
> 
> 
>> Barret


Re: [PATCH] modules: fix livelock in add_unformed_module()

2019-05-22 Thread Prarit Bhargava



On 5/13/19 10:37 AM, Barret Rhoden wrote:
> Hi -
> 

Hey Barret, my apologies for not getting back to you earlier.  I got caught up
in something that took me away from this issue.

> On 5/13/19 7:23 AM, Prarit Bhargava wrote:
> [snip]
>> A module is loaded once for each cpu.
> 
> Does one CPU succeed in loading the module, and the others fail with EEXIST?
> 
>> My follow-up patch changes from wait_event_interruptible() to
>> wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
>> make progress on other tasks, which changes the return values from
>> wait_event_interruptible().
>>
>> https://marc.info/?l=linux-kernel=155724085927589=2
>>
>> I believe this also takes your concern into account?
> 
> That patch might work for me, but I think it papers over the bug where the 
> check
> on old->state that you make before sleeping (was COMING || UNFORMED, now 
> !LIVE)
> doesn't match the check to wake up in finished_loading().
> 
> The reason the issue might not show up in practice is that your patch 
> basically
> polls, so the condition checks in finished_loading() are only a quicker exit.
> 
> If you squash my patch into yours, I think it will cover that case. Though if
> polling is the right answer here, it also raises the question of whether or 
> not
> we even need finished_loading().
> 

The more I look at this I think you're right.  Let me do some additional testing
with your patch + my original patch.

P.


> Barret


Re: [PATCH] modules: fix livelock in add_unformed_module()

2019-05-13 Thread Prarit Bhargava



On 5/10/19 2:42 PM, Barret Rhoden wrote:
> When add_unformed_module() finds an existing module with the same name,
> it waits until the preexisting module finished loading.  Prior to commit
> f9a75c1d717f, this meant if the module was either UNFORMED or COMING,
> we'd wait.  That commit changed the check to be !LIVE, so that we'd wait
> for UNFORMED, COMING, or GOING.

Hi Barret, thanks for the fix but I dropped that patch for other reasons.
Please see below.

> 
> The problem with that commit was that we wait on finished_loading(), and
> that function's state checks were not changed.  If a module was
> GOING, finished_loading() was returning true, meaning to recheck the
> state and presumably break out of the loop.  This mismatch between the
> state checked by add_unformed_module() and the state checked in
> finished_loading() could result in a hang.
> 
> Specifically, when a module was GOING, wait_event_interruptible() would
> immediately return with no error, we'd goto 'again,' see the state !=
> LIVE, and try again.
> 
> This commit changes finished_loading() such that we only consider a
> module 'finished' when it doesn't exist or is LIVE, which are the cases
> that break from the wait-loop in add_unformed_module().
> 
> Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have 
> finished loading")
> Signed-off-by: Barret Rhoden 
> ---
>  kernel/module.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..0744eea7bb90 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3407,8 +3407,7 @@ static bool finished_loading(const char *name)
>   sched_annotate_sleep();
>   mutex_lock(_mutex);
>   mod = find_module_all(name, strlen(name), true);
> - ret = !mod || mod->state == MODULE_STATE_LIVE
> - || mod->state == MODULE_STATE_GOING;
> + ret = !mod || mod->state == MODULE_STATE_LIVE;
>   mutex_unlock(_mutex);

With your change the above my x86 system still locks up during boot and adds
minutes to boot time because the many CPUs are sleeping waiting for the
module_wq to be run.

A module is loaded once for each cpu.

CPU 0 loads the module, adds the module to the modules list, and sets the module
state to MODULE_STATE_UNFORMED.

Simultaneously all the other CPUs also load the module and sleep in
add_unformed_module().

If CPU 0 is interrupted/rescheduled for any reason that means the other CPUs are
stuck waiting until the module thread on CPU0 is executed again.  I mentioned
earlier that on some systems systemd is launching greater than the number of CPU
module loads so this occurs quite often on large CPU systems.  This is an odd
bug that also needs to be resolved but I suspect that it is in systemd & not the
kernel.

My follow-up patch changes from wait_event_interruptible() to
wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
make progress on other tasks, which changes the return values from
wait_event_interruptible().

https://marc.info/?l=linux-kernel=155724085927589=2

I believe this also takes your concern into account?

Please correct me if I'm wrong and am not understanding your issue.

P.

>  
>   return ret;
> 


[PATCH v2] modules: Only return -EEXIST for modules that have finished loading

2019-05-07 Thread Prarit Bhargava
Heiko, it would still be good to get a test of this patch from you.  I
tested this here at Red Hat on some System Z machines.  Without the
modification made here in v2, the systems failed to boot ~10% of the time.
After the modification I do not see any boot failures.  I also was
able to reproduce the boot issue with the acpi_cpufreq driver on a very
large & fast x86 system which had closer to 100% failure rate without
the changes in v2.  After the modification in v2 the system has rebooted
all weekend without any issues.

P.

---8<---

Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
linux guests boot with repeated errors:

amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)

The warnings occur because the module code erroneously returns -EEXIST
for modules that have failed to load and are in the process of being
removed from the module list.

module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
modules.dep, systemd will load edac_mce_amd for every request of
amd64_edac_mod.  When the edac_mce_amd module loads, the module has
state MODULE_STATE_UNFORMED and once the module load fails and the state
becomes MODULE_STATE_GOING.  Another request for edac_mce_amd module
executes and add_unformed_module() will erroneously return -EEXIST even
though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
fails because of unknown symbols from edac_mce_amd.

add_unformed_module() must wait to return for any case other than
MODULE_STATE_LIVE to prevent a race between multiple loads of
dependent modules.

v2: The initial (old->state != MODULE_STATE_LIVE) change exposed an
additional issue in the code.  wait_event_interruptible() puts each thread
to sleep until the a module finishes loading an executes the module_wq
workqueue.  The result is a long delay during the boot.  Switching to
wait_event_interruptible_timeout() resolves the sleep problem.

Signed-off-by: Prarit Bhargava 
Cc: Jessica Yu 
Cc: Heiko Carstens 
Cc: David Arcari 
---
 kernel/module.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 1c429d8d2d74..6c868aabaf37 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
mutex_lock(_mutex);
old = find_module_all(mod->name, strlen(mod->name), true);
if (old != NULL) {
-   if (old->state == MODULE_STATE_COMING
-   || old->state == MODULE_STATE_UNFORMED) {
+   if (old->state != MODULE_STATE_LIVE) {
/* Wait in case it fails to load. */
mutex_unlock(_mutex);
-   err = wait_event_interruptible(module_wq,
-  finished_loading(mod->name));
+   err = wait_event_interruptible_timeout(module_wq,
+  finished_loading(mod->name),
+  HZ/1000);
if (err)
goto out_unlocked;
goto again;
-- 
2.18.1



Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading

2019-05-02 Thread Prarit Bhargava



On 5/2/19 8:41 AM, Prarit Bhargava wrote:
> 
> 
> On 5/2/19 5:48 AM, Jessica Yu wrote:
>> +++ Prarit Bhargava [01/05/19 17:26 -0400]:
>>>
>>>
>>> On 4/30/19 6:22 PM, Prarit Bhargava wrote:
>>>> On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
>>>> loading the s390_trng.ko module.
>>>>
>>>> Add a reschedule point to the loop that waits for modules to complete
>>>> loading.
>>>>
>>>> v3: cleanup Fixes line.
>>>
>>> Jessica, even with this additional patch there appears to be some other 
>>> issues
>>> in the module code that are causing significant delays in boot up on large
>>> systems.
>>
>> Is this limited to only s390? Or are you seeing this on other arches
>> as well? And is it limited to specific modules (like s390_trng)?
> 
> Other arches.  We're seeing a hang on a new 192 CPU x86_64 box & the
> acpi_cpufreq driver.  The system is MUCH faster than any other x86_64 box I've
> seen and that's likely why I'm seeing a problem.
> 
>>
>>> FWIW, the logic in the original patch is correct.  It's just that there's, 
>>> as
>>> Heiko discovered, some poor scheduling, etc., that is impacting the module
>>> loading code after these changes.
>>
>> I am really curious to see what these performance regressions look
>> like :/ Please update us when you find out more.
>>
> 
> I sent Heiko a private v4 RFC last night with this patch (sorry for the
> cut-and-paste)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 1c429d8d2d74..a4ef8628f26f 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
>   mutex_lock(_mutex);
>   old = find_module_all(mod->name, strlen(mod->name), true);
>   if (old != NULL) {
> - if (old->state == MODULE_STATE_COMING
> - || old->state == MODULE_STATE_UNFORMED) {
> + if (old->state != MODULE_STATE_LIVE) {
>   /* Wait in case it fails to load. */
>   mutex_unlock(_mutex);
> - err = wait_event_interruptible(module_wq,
> -finished_loading(mod->name));
> + err = wait_event_interruptible_timeout(module_wq,
> +finished_loading(mod->name),
> +HZ / 1);
>   if (err)
>   goto out_unlocked;
>   goto again;
> 
> The original module dependency race issue is fixed simply by changing the
> conditional to checking !MODULE_STATE_LIVE.  This, unfortunately, exposed some
> other problems within the code.
> 
> The module_wq is only run when a module fails to load.  It's possible that
> the time between the module's failed init() call and running module_wq
> (kernel/module.c:3455) takes a while.  Any thread entering the
> add_unformed_module() code while the old module is unloading is put to sleep
> waiting for the module_wq to execute.
> 
> On the 192 thread box I have noticed that the acpi_cpufreq module attempts
> to load 392 times (that is not a typo and I am going to try to figure that
> problem out after this one).  This means 191 cpus are put to sleep, and one
> cpu is executing the acpi_cpufreq module unload which is executing
> do_init_module() and is now at
> 
> fail_free_freeinit:
> kfree(freeinit);
> fail:
> /* Try to protect us from buggy refcounters. */
> mod->state = MODULE_STATE_GOING;
> synchronize_rcu();
> module_put(mod);
> blocking_notifier_call_chain(_notify_list,
>  MODULE_STATE_GOING, mod);
> klp_module_going(mod);
> ftrace_release_mod(mod);
> free_module(mod);
> wake_up_all(_wq);
> return ret;
> }
> 
> The 191 threads cannot schedule and the system is effectively stuck.  It 
> *does*
> eventually free itself but in some cases it takes minutes to do so.
> 
> A simple fix for this is to, as I've done above, to add a timeout so that
> the threads can be scheduled which allows other processes to run.  

After taking a much closer look the above patch appears to be correct.  I am not
seeing any boot failures associated with it anywhere.  I would like to hear from
Heiko as to whether or not this works for him though.

P.


Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading

2019-05-02 Thread Prarit Bhargava



On 5/2/19 5:48 AM, Jessica Yu wrote:
> +++ Prarit Bhargava [01/05/19 17:26 -0400]:
>>
>>
>> On 4/30/19 6:22 PM, Prarit Bhargava wrote:
>>> On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
>>> loading the s390_trng.ko module.
>>>
>>> Add a reschedule point to the loop that waits for modules to complete
>>> loading.
>>>
>>> v3: cleanup Fixes line.
>>
>> Jessica, even with this additional patch there appears to be some other 
>> issues
>> in the module code that are causing significant delays in boot up on large
>> systems.
> 
> Is this limited to only s390? Or are you seeing this on other arches
> as well? And is it limited to specific modules (like s390_trng)?

Other arches.  We're seeing a hang on a new 192 CPU x86_64 box & the
acpi_cpufreq driver.  The system is MUCH faster than any other x86_64 box I've
seen and that's likely why I'm seeing a problem.

> 
>> FWIW, the logic in the original patch is correct.  It's just that there's, as
>> Heiko discovered, some poor scheduling, etc., that is impacting the module
>> loading code after these changes.
> 
> I am really curious to see what these performance regressions look
> like :/ Please update us when you find out more.
> 

I sent Heiko a private v4 RFC last night with this patch (sorry for the
cut-and-paste)

diff --git a/kernel/module.c b/kernel/module.c
index 1c429d8d2d74..a4ef8628f26f 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
mutex_lock(_mutex);
old = find_module_all(mod->name, strlen(mod->name), true);
if (old != NULL) {
-   if (old->state == MODULE_STATE_COMING
-   || old->state == MODULE_STATE_UNFORMED) {
+   if (old->state != MODULE_STATE_LIVE) {
/* Wait in case it fails to load. */
mutex_unlock(_mutex);
-   err = wait_event_interruptible(module_wq,
-  finished_loading(mod->name));
+   err = wait_event_interruptible_timeout(module_wq,
+  finished_loading(mod->name),
+  HZ / 1);
if (err)
goto out_unlocked;
goto again;

The original module dependency race issue is fixed simply by changing the
conditional to checking !MODULE_STATE_LIVE.  This, unfortunately, exposed some
other problems within the code.

The module_wq is only run when a module fails to load.  It's possible that
the time between the module's failed init() call and running module_wq
(kernel/module.c:3455) takes a while.  Any thread entering the
add_unformed_module() code while the old module is unloading is put to sleep
waiting for the module_wq to execute.

On the 192 thread box I have noticed that the acpi_cpufreq module attempts
to load 392 times (that is not a typo and I am going to try to figure that
problem out after this one).  This means 191 cpus are put to sleep, and one
cpu is executing the acpi_cpufreq module unload which is executing
do_init_module() and is now at

fail_free_freeinit:
kfree(freeinit);
fail:
/* Try to protect us from buggy refcounters. */
mod->state = MODULE_STATE_GOING;
synchronize_rcu();
module_put(mod);
blocking_notifier_call_chain(_notify_list,
 MODULE_STATE_GOING, mod);
klp_module_going(mod);
ftrace_release_mod(mod);
free_module(mod);
wake_up_all(_wq);
return ret;
}

The 191 threads cannot schedule and the system is effectively stuck.  It *does*
eventually free itself but in some cases it takes minutes to do so.

A simple fix for this is to, as I've done above, to add a timeout so that
the threads can be scheduled which allows other processes to run.  After
thinking about it a bit more, however, I wonder if a better approach is to
change the mod->state to MODULE_FAILED & running the module_wq immediately so
that the threads can return an error.

I'm experimenting with that now.

P.


Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading

2019-05-01 Thread Prarit Bhargava



On 4/30/19 6:22 PM, Prarit Bhargava wrote:
> On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
> loading the s390_trng.ko module.
> 
> Add a reschedule point to the loop that waits for modules to complete
> loading.
> 
> v3: cleanup Fixes line.

Jessica, even with this additional patch there appears to be some other issues
in the module code that are causing significant delays in boot up on large
systems.

Please revert these fixes from linux-next & modules-next.  I apologize for the
extra work but I think it is for the best until I come up with a more complete &
better tested patch.

FWIW, the logic in the original patch is correct.  It's just that there's, as
Heiko discovered, some poor scheduling, etc., that is impacting the module
loading code after these changes.

Again, my apologies,

P.

> 
> Reported-by: Heiko Carstens 
> Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have 
> finished loading")
> Signed-off-by: Prarit Bhargava 
> Cc: Jessica Yu 
> Cc: Heiko Carstens 
> ---
>  kernel/module.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..48748cfec991 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
>  finished_loading(mod->name));
>   if (err)
>   goto out_unlocked;
> + cond_resched();
>   goto again;
>   }
>   err = -EEXIST;
> 


[PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading

2019-04-30 Thread Prarit Bhargava
On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
loading the s390_trng.ko module.

Add a reschedule point to the loop that waits for modules to complete
loading.

v3: cleanup Fixes line.

Reported-by: Heiko Carstens 
Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have 
finished loading")
Signed-off-by: Prarit Bhargava 
Cc: Jessica Yu 
Cc: Heiko Carstens 
---
 kernel/module.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module.c b/kernel/module.c
index 410eeb7e4f1d..48748cfec991 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
   finished_loading(mod->name));
if (err)
goto out_unlocked;
+   cond_resched();
goto again;
}
err = -EEXIST;
-- 
2.18.1



Re: [PATCH v2] kernel/module: Reschedule while waiting for modules to finish loading

2019-04-30 Thread Prarit Bhargava



On 4/30/19 6:18 PM, Prarit Bhargava wrote:
> On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
> loading the s390_trng.ko module.
> 
> Add a reschedule point to the loop that waits for modules to complete
> loading.
> 

Sorry, sent in error.

P.

> v2: cleanup Fixes line.
> 
> Reported-by: Heiko Carstens 
> Fixes: commit f9a75c1d717f ("modules: Only return -EEXIST for modules that 
> have finished loading")
> Signed-off-by: Prarit Bhargava 
> Cc: Jessica Yu 
> Cc: Heiko Carstens 
> ---
>  kernel/module.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..48748cfec991 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
>  finished_loading(mod->name));
>   if (err)
>   goto out_unlocked;
> + cond_resched();
>   goto again;
>   }
>   err = -EEXIST;
> 


[PATCH v2] kernel/module: Reschedule while waiting for modules to finish loading

2019-04-30 Thread Prarit Bhargava
On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
loading the s390_trng.ko module.

Add a reschedule point to the loop that waits for modules to complete
loading.

v2: cleanup Fixes line.

Reported-by: Heiko Carstens 
Fixes: commit f9a75c1d717f ("modules: Only return -EEXIST for modules that have 
finished loading")
Signed-off-by: Prarit Bhargava 
Cc: Jessica Yu 
Cc: Heiko Carstens 
---
 kernel/module.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module.c b/kernel/module.c
index 410eeb7e4f1d..48748cfec991 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
   finished_loading(mod->name));
if (err)
goto out_unlocked;
+   cond_resched();
goto again;
}
err = -EEXIST;
-- 
2.18.1



Re: linux-next: Fixes tag needs some work in the modules tree

2019-04-30 Thread Prarit Bhargava



On 4/30/19 6:10 PM, Stephen Rothwell wrote:
> Hi all,
> 
> In commit
> 
>   7e470ea99bcd ("kernel/module: Reschedule while waiting for modules to 
> finish loading")
> 
> Fixes tag
> 
>   Fixes: linux-next commit f9a75c1d717f ("modules: Only return -EEXIST for 
> modules that have finished loading")
> 
> has these problem(s):
> 
>   - the "linux-next commit" is unexpected (and not really meaningful
> once this is merged into Linus' tree)
> 

Stephen, I will send out an updated patch now.

P.


Re: [PATCH] kernel/module: Reschedule while waiting for modules to finish loading

2019-04-30 Thread Prarit Bhargava



On 4/30/19 3:51 AM, Jessica Yu wrote:
> +++ Prarit Bhargava [29/04/19 11:17 -0400]:
>> Heiko, do you want a Signed-off-by or a Reported-by?  Either one works
>> for me.
>>
>> P.
> 
> I think you forgot to CC Heiko :)

#oops.

I forgot that git-send-email doesn't pick up "Reported-by".

P.

> 
>> 8<
>>
>> On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
>> loading the s390_trng.ko module.
>>
>> Add a reschedule point to the loop that waits for modules to complete
>> loading.
>>
>> Reported-by: Heiko Carstens 
>> Fixes: linux-next commit f9a75c1d717f ("modules: Only return -EEXIST for
>> modules that have finished loading")
>> Signed-off-by: Prarit Bhargava 
>> Cc: Jessica Yu 
>> ---
>> kernel/module.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 410eeb7e4f1d..48748cfec991 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
>>    finished_loading(mod->name));
>>     if (err)
>>     goto out_unlocked;
>> +    cond_resched();
>>     goto again;
>>     }
>>     err = -EEXIST;
>> -- 
>> 2.18.1
>>


[PATCH] kernel/module: Reschedule while waiting for modules to finish loading

2019-04-29 Thread Prarit Bhargava
Heiko, do you want a Signed-off-by or a Reported-by?  Either one works
for me.

P.

8<

On a s390 z14 LAR with 2 cpus about stalls about 3% of the time while
loading the s390_trng.ko module.

Add a reschedule point to the loop that waits for modules to complete
loading.

Reported-by: Heiko Carstens 
Fixes: linux-next commit f9a75c1d717f ("modules: Only return -EEXIST for 
modules that have finished loading")
Signed-off-by: Prarit Bhargava 
Cc: Jessica Yu 
---
 kernel/module.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module.c b/kernel/module.c
index 410eeb7e4f1d..48748cfec991 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
   finished_loading(mod->name));
if (err)
goto out_unlocked;
+   cond_resched();
goto again;
}
err = -EEXIST;
-- 
2.18.1



Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"

2019-04-27 Thread Prarit Bhargava



On 4/27/19 6:24 AM, Heiko Carstens wrote:

> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..48748cfec991 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3585,6 +3585,7 @@ again:
>  finished_loading(mod->name));
>   if (err)
>   goto out_unlocked;
> + cond_resched();
>   goto again;
>   }
>   err = -EEXIST;
> 

Heiko, I'm testing on 2-cpu systems which appear to show the problem ~10% of the
time.  On another system I backed out my original patch to set a baseline, and
noticed that occasionally the time to boot the system doubles from ~4 seconds to
9 seconds.  Is this something you're also concerned with?

P.


Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"

2019-04-27 Thread Prarit Bhargava



On 4/27/19 6:24 AM, Heiko Carstens wrote:
> On Fri, Apr 26, 2019 at 08:20:52PM -0400, Prarit Bhargava wrote:
>> Heiko and Jessica,
>>
>> The issue doesn't appear to be with my patch AFAICT.  The s390_trng fails to
>> load and then the kernel occasionally hangs (as Heiko mentioned) calling
>> synchronize_rcu().
>>
>> The call sequence is
>>
>> module_load()
>>  do_init_module()
>>  do_one_initcall(mod->init)
>>
>> which fails.
>>
>> The failure path in do_one_initcall() is entered and we start executing code 
>> at
>> kernel/module.c:3541
>>
>> fail_free_freeinit:
>> kfree(freeinit);
>> fail:
>> /* Try to protect us from buggy refcounters. */
>> mod->state = MODULE_STATE_GOING;
>> synchronize_rcu();
>>
>> ^^^ the kernel hangs here.  Sometimes it's very short and other times it 
>> seems
>> to hang.   I've left systems that appear to be hung and come back after 10
>> minutes to find that they've somehow made it through this call.
>>
>> Is there a known issue with RCU on s390 that is making this occur?
> 
> No there is no known issue with RCU on s390. The reason that
> synchronize_rcu() doesn't finish is because a different cpu is within
> an endless loop in add_unformed_module() just like Jessica suspected.
> 
> Note: the kernel is compiled with CONFIG_PREEMPT off - there is no
> kernel preemption that will make the looping cpu ever go over schedule
> and subsequently let synchronize_rcu() finish.
> 
> To confirm Jessicas theory - looking into the dump we have:
> 
> crash> bt 742
> PID: 742TASK: 1efa6c000 CPU: 7   COMMAND: "systemd-udevd"
>  #0 [3e0043aba30] __schedule at abb25e
>  #1 [3e0043abaa0] schedule at abb6a2
>  #2 [3e0043abac8] schedule_timeout at abf49a
>  #3 [3e0043abb60] wait_for_common at abc396
>  #4 [3e0043abbf0] __wait_rcu_gp at 1c0136
>  #5 [3e0043abc48] synchronize_rcu at 1c72ea
>  #6 [3e0043abc98] do_init_module at 1f10be
>  #7 [3e0043abcf0] load_module at 1f3594
>  #8 [3e0043abdd0] __se_sys_init_module at 1f3af0
>  #9 [3e0043abea8] system_call at ac0766
> 
> Which is the process waiting for synchronize_rcu to finish. Wading
> through the stack frames gives me this struct module:
> 
> struct module {
>   state = MODULE_STATE_GOING,
>   list = {
> next = 0x3ff80394508,
> prev = 0xe25090 
>   },
>   name = "s390_trng\000...
> ...
> 
> Then we have the looping task/cpu:
> 
> PID: 731TASK: 1e79ba000 CPU: 7   COMMAND: "systemd-udevd"
>  LOWCORE INFO:
>   -psw  : 0x0704c0018000 0x00ab666a
>   -function : memcmp at ab666a
> ...
>   -general registers:
>  0x0009 0x0009
>  0x03ff80347321 00
>  0x03ff8034f321 00
>  0x001e 0x03ff8c592708
>  0x03e0047da900 0x03ff8034f318
>  0x0001 0x0009
>  0x03ff80347300 0x00ad81b8
>  0x001ee062 0x03e004357cb0
>  #0 [3e004357cf0] load_module at 1f1eb0
>  #1 [3e004357dd0] __se_sys_init_module at 1f3af0
>  #2 [3e004357ea8] system_call at ac0766
> 
> which is find_module_all() calling memcmp with this string:
> 
> 3ff80347318:  79305f74726e 6700   s390_trng...
> 
> So it all seems to fit. A simple cond_resched() call, which enforces
> an RCU quiescent state for the calliung cpu, fixes the problem for me
> (patch on top of linux-next 20190424 -- c392798a85ab):
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..48748cfec991 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3585,6 +3585,7 @@ again:
>  finished_loading(mod->name));
>   if (err)
>   goto out_unlocked;
> + cond_resched();

You just beat me to posting this :).  I am testing out this same patch in a loop
on a few systems here in the lab.  Will let you know results on Monday.

P.

>   goto again;
>   }
>   err = -EEXIST;
> 


Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"

2019-04-26 Thread Prarit Bhargava



On 4/26/19 3:45 PM, Prarit Bhargava wrote:
> 
> 
> On 4/26/19 2:10 PM, Prarit Bhargava wrote:
>>
>>
>> On 4/26/19 12:09 PM, Jessica Yu wrote:
>>> +++ Heiko Carstens [26/04/19 17:07 +0200]:
>>>> On Fri, Apr 26, 2019 at 09:22:34AM -0400, Prarit Bhargava wrote:
>>>>> On 4/26/19 9:07 AM, Heiko Carstens wrote:
>>>>>> Hello Prarit,
>>>>>>
>>>>>> it looks like your commit f9a75c1d717f ("modules: Only return -EEXIST
>>>>>> for modules that have finished loading") _sometimes_ causes hangs on
>>>>>> s390. This is unfortunately not 100% reproducible, however the
>>>>>> mentioned commit seems to be the only relevant one in modules.c.
>>>>>>
>>>>>> What I see is a hanging system with messages like this on the console:
>>>>>>
>>>>>> [   65.876040] rcu: INFO: rcu_sched self-detected stall on CPU
>>>>>> [   65.876049] rcu: 7-: (5999 ticks this GP)
>>>>> idle=eae/1/0x4002 softirq=1181/1181 fqs=2729
>>>>>> [   65.876078]  (t=6000 jiffies g=-471 q=17196)
>>>>>> [   65.876084] Task dump for CPU 7:
>>>>>> [   65.876088] systemd-udevd   R  running task0   731721
>>>>> 0x0604
>>>>>> [   65.876097] Call Trace:
>>>>>> [   65.876113] ([<00abb264>] __schedule+0x2e4/0x6e0)
>>>>>> [   65.876122]  [<001ee486>] finished_loading+0x4e/0xb0
>>>>>> [   65.876128]  [<001f1ed6>] load_module+0xcce/0x27a0
>>>>>> [   65.876134]  [<001f3af0>] __s390x_sys_init_module+0x148/0x178
>>>>>> [   65.876142]  [<00ac0766>] system_call+0x2aa/0x2c8
>>>>>> I did not look any further into the dump, however since the commit
>>>>>> touches exactly the code path which seems to be looping... ;)
>>>>>>
>>>>>
>>>>> Ouch :(  I wonder if I exposed a further race or another bug.  Heiko, can 
>>>>> you
>>>>> determine which module is stuck?  Warning: I have not compiled this code.
>>>>
>>>> Here we go:
>>>>
>>>> [   11.716866] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716867] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716868] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716870] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716871] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716872] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716874] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716875] PRARIT: waiting for module s390_trng to load.
>>>> [   11.716876] PRARIT: waiting for module s390_trng to load.
>>>> [   16.726850] add_unformed_module: 31403529 callbacks suppressed
>>>> [   16.726853] PRARIT: waiting for module s390_trng to load.
>>>> [   16.726862] PRARIT: waiting for module s390_trng to load.
>>>> [   16.726865] PRARIT: waiting for module s390_trng to load.
>>>> [   16.726867] PRARIT: waiting for module s390_trng to load.
>>>> [   16.726869] PRARIT: waiting for module s390_trng to load.
>>>>
>>>> If I'm not mistaken then there was _no_ corresponding message on the
>>>> console stating that the module already exists.
>>
>> Heiko,
>>
>> Where is the s390_trng module?  I can't seem to find it in the tree.
>>
>> [02:06 PM root@ibm-z-108 linux-next]# find ./ -name *s390_trng*
>> [02:06 PM root@ibm-z-108 linux-next]# git grep s390_trng
>> [02:07 PM root@ibm-z-108 linux-next]#
>>
> 
> Never mind, I found it: s390-trng.c
> 
> P.
> 
>> P.
>>


Heiko and Jessica,

The issue doesn't appear to be with my patch AFAICT.  The s390_trng fails to
load and then the kernel occasionally hangs (as Heiko mentioned) calling
synchronize_rcu().

The call sequence is

module_load()
do_init_module()
do_one_initcall(mod->init)

which fails.

The failure path in do_one_initcall() is entered and we start executing code at
kernel/module.c:3541

fail_free_freeinit:
kfree(freeinit);
fail:
/* Try to protect us from buggy refcounters. */
mod->state = MODULE_STATE_GOING;
synchronize_rcu();

^^^ the kernel hangs here.  Sometimes it's very short and other times it seems
to hang.   I've left systems that appear to be hung and come back after 10
minutes to find that they've somehow made it through this call.

Is there a known issue with RCU on s390 that is making this occur?

P.


Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"

2019-04-26 Thread Prarit Bhargava



On 4/26/19 2:10 PM, Prarit Bhargava wrote:
> 
> 
> On 4/26/19 12:09 PM, Jessica Yu wrote:
>> +++ Heiko Carstens [26/04/19 17:07 +0200]:
>>> On Fri, Apr 26, 2019 at 09:22:34AM -0400, Prarit Bhargava wrote:
>>>> On 4/26/19 9:07 AM, Heiko Carstens wrote:
>>>>> Hello Prarit,
>>>>>
>>>>> it looks like your commit f9a75c1d717f ("modules: Only return -EEXIST
>>>>> for modules that have finished loading") _sometimes_ causes hangs on
>>>>> s390. This is unfortunately not 100% reproducible, however the
>>>>> mentioned commit seems to be the only relevant one in modules.c.
>>>>>
>>>>> What I see is a hanging system with messages like this on the console:
>>>>>
>>>>> [   65.876040] rcu: INFO: rcu_sched self-detected stall on CPU
>>>>> [   65.876049] rcu: 7-: (5999 ticks this GP)
>>>> idle=eae/1/0x4002 softirq=1181/1181 fqs=2729
>>>>> [   65.876078]  (t=6000 jiffies g=-471 q=17196)
>>>>> [   65.876084] Task dump for CPU 7:
>>>>> [   65.876088] systemd-udevd   R  running task0   731721
>>>> 0x0604
>>>>> [   65.876097] Call Trace:
>>>>> [   65.876113] ([<00abb264>] __schedule+0x2e4/0x6e0)
>>>>> [   65.876122]  [<001ee486>] finished_loading+0x4e/0xb0
>>>>> [   65.876128]  [<001f1ed6>] load_module+0xcce/0x27a0
>>>>> [   65.876134]  [<001f3af0>] __s390x_sys_init_module+0x148/0x178
>>>>> [   65.876142]  [<00ac0766>] system_call+0x2aa/0x2c8
>>>>> I did not look any further into the dump, however since the commit
>>>>> touches exactly the code path which seems to be looping... ;)
>>>>>
>>>>
>>>> Ouch :(  I wonder if I exposed a further race or another bug.  Heiko, can 
>>>> you
>>>> determine which module is stuck?  Warning: I have not compiled this code.
>>>
>>> Here we go:
>>>
>>> [   11.716866] PRARIT: waiting for module s390_trng to load.
>>> [   11.716867] PRARIT: waiting for module s390_trng to load.
>>> [   11.716868] PRARIT: waiting for module s390_trng to load.
>>> [   11.716870] PRARIT: waiting for module s390_trng to load.
>>> [   11.716871] PRARIT: waiting for module s390_trng to load.
>>> [   11.716872] PRARIT: waiting for module s390_trng to load.
>>> [   11.716874] PRARIT: waiting for module s390_trng to load.
>>> [   11.716875] PRARIT: waiting for module s390_trng to load.
>>> [   11.716876] PRARIT: waiting for module s390_trng to load.
>>> [   16.726850] add_unformed_module: 31403529 callbacks suppressed
>>> [   16.726853] PRARIT: waiting for module s390_trng to load.
>>> [   16.726862] PRARIT: waiting for module s390_trng to load.
>>> [   16.726865] PRARIT: waiting for module s390_trng to load.
>>> [   16.726867] PRARIT: waiting for module s390_trng to load.
>>> [   16.726869] PRARIT: waiting for module s390_trng to load.
>>>
>>> If I'm not mistaken then there was _no_ corresponding message on the
>>> console stating that the module already exists.
> 
> Heiko,
> 
> Where is the s390_trng module?  I can't seem to find it in the tree.
> 
> [02:06 PM root@ibm-z-108 linux-next]# find ./ -name *s390_trng*
> [02:06 PM root@ibm-z-108 linux-next]# git grep s390_trng
> [02:07 PM root@ibm-z-108 linux-next]#
> 

Never mind, I found it: s390-trng.c

P.

> P.
> 
>>
>> Hm, my current theory is that we have a module whose exit() function
>> is taking a while to run to completion. While it is doing so, the
>> module's state is already set to MODULE_STATE_GOING.
>>
>> With Prarit's patch, since this module is probably still in GOING,
>> add_unformed_module() will wait until the module is finally gone. If
>> this takes too long, we will keep trying to add ourselves to the
>> module list and hence stay in the loop in add_unformed_module().
>> According to Documentation/RCU/stallwarn.txt, this looping in the
>> kernel may trigger an rcu stall warning (see bullet point stating "a
>> CPU looping anywhere in the kernel without invoking schedule()".
>>
>> Heiko, could you modify the patch to print the module's state to
>> confirm?
>>
>> Thanks,
>>
>> Jessica


Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"

2019-04-26 Thread Prarit Bhargava



On 4/26/19 12:09 PM, Jessica Yu wrote:
> +++ Heiko Carstens [26/04/19 17:07 +0200]:
>> On Fri, Apr 26, 2019 at 09:22:34AM -0400, Prarit Bhargava wrote:
>>> On 4/26/19 9:07 AM, Heiko Carstens wrote:
>>> > Hello Prarit,
>>> >
>>> > it looks like your commit f9a75c1d717f ("modules: Only return -EEXIST
>>> > for modules that have finished loading") _sometimes_ causes hangs on
>>> > s390. This is unfortunately not 100% reproducible, however the
>>> > mentioned commit seems to be the only relevant one in modules.c.
>>> >
>>> > What I see is a hanging system with messages like this on the console:
>>> >
>>> > [   65.876040] rcu: INFO: rcu_sched self-detected stall on CPU
>>> > [   65.876049] rcu: 7-: (5999 ticks this GP)
>>> idle=eae/1/0x4002 softirq=1181/1181 fqs=2729
>>> > [   65.876078]  (t=6000 jiffies g=-471 q=17196)
>>> > [   65.876084] Task dump for CPU 7:
>>> > [   65.876088] systemd-udevd   R  running task    0   731    721
>>> 0x0604
>>> > [   65.876097] Call Trace:
>>> > [   65.876113] ([<00abb264>] __schedule+0x2e4/0x6e0)
>>> > [   65.876122]  [<001ee486>] finished_loading+0x4e/0xb0
>>> > [   65.876128]  [<001f1ed6>] load_module+0xcce/0x27a0
>>> > [   65.876134]  [<001f3af0>] __s390x_sys_init_module+0x148/0x178
>>> > [   65.876142]  [<00ac0766>] system_call+0x2aa/0x2c8
>>> > I did not look any further into the dump, however since the commit
>>> > touches exactly the code path which seems to be looping... ;)
>>> >
>>>
>>> Ouch :(  I wonder if I exposed a further race or another bug.  Heiko, can 
>>> you
>>> determine which module is stuck?  Warning: I have not compiled this code.
>>
>> Here we go:
>>
>> [   11.716866] PRARIT: waiting for module s390_trng to load.
>> [   11.716867] PRARIT: waiting for module s390_trng to load.
>> [   11.716868] PRARIT: waiting for module s390_trng to load.
>> [   11.716870] PRARIT: waiting for module s390_trng to load.
>> [   11.716871] PRARIT: waiting for module s390_trng to load.
>> [   11.716872] PRARIT: waiting for module s390_trng to load.
>> [   11.716874] PRARIT: waiting for module s390_trng to load.
>> [   11.716875] PRARIT: waiting for module s390_trng to load.
>> [   11.716876] PRARIT: waiting for module s390_trng to load.
>> [   16.726850] add_unformed_module: 31403529 callbacks suppressed
>> [   16.726853] PRARIT: waiting for module s390_trng to load.
>> [   16.726862] PRARIT: waiting for module s390_trng to load.
>> [   16.726865] PRARIT: waiting for module s390_trng to load.
>> [   16.726867] PRARIT: waiting for module s390_trng to load.
>> [   16.726869] PRARIT: waiting for module s390_trng to load.
>>
>> If I'm not mistaken then there was _no_ corresponding message on the
>> console stating that the module already exists.

Heiko,

Where is the s390_trng module?  I can't seem to find it in the tree.

[02:06 PM root@ibm-z-108 linux-next]# find ./ -name *s390_trng*
[02:06 PM root@ibm-z-108 linux-next]# git grep s390_trng
[02:07 PM root@ibm-z-108 linux-next]#

P.

> 
> Hm, my current theory is that we have a module whose exit() function
> is taking a while to run to completion. While it is doing so, the
> module's state is already set to MODULE_STATE_GOING.
> 
> With Prarit's patch, since this module is probably still in GOING,
> add_unformed_module() will wait until the module is finally gone. If
> this takes too long, we will keep trying to add ourselves to the
> module list and hence stay in the loop in add_unformed_module().
> According to Documentation/RCU/stallwarn.txt, this looping in the
> kernel may trigger an rcu stall warning (see bullet point stating "a
> CPU looping anywhere in the kernel without invoking schedule()".
> 
> Heiko, could you modify the patch to print the module's state to
> confirm?
> 
> Thanks,
> 
> Jessica


Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"

2019-04-26 Thread Prarit Bhargava



On 4/26/19 12:09 PM, Jessica Yu wrote:
> +++ Heiko Carstens [26/04/19 17:07 +0200]:
>> On Fri, Apr 26, 2019 at 09:22:34AM -0400, Prarit Bhargava wrote:
>>> On 4/26/19 9:07 AM, Heiko Carstens wrote:
>>> > Hello Prarit,
>>> >
>>> > it looks like your commit f9a75c1d717f ("modules: Only return -EEXIST
>>> > for modules that have finished loading") _sometimes_ causes hangs on
>>> > s390. This is unfortunately not 100% reproducible, however the
>>> > mentioned commit seems to be the only relevant one in modules.c.
>>> >
>>> > What I see is a hanging system with messages like this on the console:
>>> >
>>> > [   65.876040] rcu: INFO: rcu_sched self-detected stall on CPU
>>> > [   65.876049] rcu: 7-: (5999 ticks this GP)
>>> idle=eae/1/0x4002 softirq=1181/1181 fqs=2729
>>> > [   65.876078]  (t=6000 jiffies g=-471 q=17196)
>>> > [   65.876084] Task dump for CPU 7:
>>> > [   65.876088] systemd-udevd   R  running task    0   731    721
>>> 0x0604
>>> > [   65.876097] Call Trace:
>>> > [   65.876113] ([<00abb264>] __schedule+0x2e4/0x6e0)
>>> > [   65.876122]  [<001ee486>] finished_loading+0x4e/0xb0
>>> > [   65.876128]  [<001f1ed6>] load_module+0xcce/0x27a0
>>> > [   65.876134]  [<001f3af0>] __s390x_sys_init_module+0x148/0x178
>>> > [   65.876142]  [<00ac0766>] system_call+0x2aa/0x2c8
>>> > I did not look any further into the dump, however since the commit
>>> > touches exactly the code path which seems to be looping... ;)
>>> >
>>>
>>> Ouch :(  I wonder if I exposed a further race or another bug.  Heiko, can 
>>> you
>>> determine which module is stuck?  Warning: I have not compiled this code.
>>
>> Here we go:
>>
>> [   11.716866] PRARIT: waiting for module s390_trng to load.
>> [   11.716867] PRARIT: waiting for module s390_trng to load.
>> [   11.716868] PRARIT: waiting for module s390_trng to load.
>> [   11.716870] PRARIT: waiting for module s390_trng to load.
>> [   11.716871] PRARIT: waiting for module s390_trng to load.
>> [   11.716872] PRARIT: waiting for module s390_trng to load.
>> [   11.716874] PRARIT: waiting for module s390_trng to load.
>> [   11.716875] PRARIT: waiting for module s390_trng to load.
>> [   11.716876] PRARIT: waiting for module s390_trng to load.
>> [   16.726850] add_unformed_module: 31403529 callbacks suppressed
>> [   16.726853] PRARIT: waiting for module s390_trng to load.
>> [   16.726862] PRARIT: waiting for module s390_trng to load.
>> [   16.726865] PRARIT: waiting for module s390_trng to load.
>> [   16.726867] PRARIT: waiting for module s390_trng to load.
>> [   16.726869] PRARIT: waiting for module s390_trng to load.
>>
>> If I'm not mistaken then there was _no_ corresponding message on the
>> console stating that the module already exists.
> 
> Hm, my current theory is that we have a module whose exit() function
> is taking a while to run to completion. While it is doing so, the
> module's state is already set to MODULE_STATE_GOING.
> 
> With Prarit's patch, since this module is probably still in GOING,
> add_unformed_module() will wait until the module is finally gone. If
> this takes too long, we will keep trying to add ourselves to the
> module list and hence stay in the loop in add_unformed_module().
> According to Documentation/RCU/stallwarn.txt, this looping in the
> kernel may trigger an rcu stall warning (see bullet point stating "a
> CPU looping anywhere in the kernel without invoking schedule()".
> 

Yeah, that's what I'm thinking too.  The question is, however, why that module
is taking so long that it stalls the system.  If the module state is going then
the probe of the module has already failed.  Could it be some weird bug in the
notifier chains?  And :) why aren't I seeing this all the time?

P.

> Heiko, could you modify the patch to print the module's state to
> confirm?
> 
> Thanks,
> 
> Jessica


Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"

2019-04-26 Thread Prarit Bhargava



On 4/26/19 9:07 AM, Heiko Carstens wrote:
> Hello Prarit,
> 
> it looks like your commit f9a75c1d717f ("modules: Only return -EEXIST
> for modules that have finished loading") _sometimes_ causes hangs on
> s390. This is unfortunately not 100% reproducible, however the
> mentioned commit seems to be the only relevant one in modules.c.
> 
> What I see is a hanging system with messages like this on the console:
> 
> [   65.876040] rcu: INFO: rcu_sched self-detected stall on CPU
> [   65.876049] rcu: 7-: (5999 ticks this GP) 
> idle=eae/1/0x4002 softirq=1181/1181 fqs=2729
> [   65.876078]  (t=6000 jiffies g=-471 q=17196)
> [   65.876084] Task dump for CPU 7:
> [   65.876088] systemd-udevd   R  running task0   731721 
> 0x0604
> [   65.876097] Call Trace:
> [   65.876113] ([<00abb264>] __schedule+0x2e4/0x6e0)
> [   65.876122]  [<001ee486>] finished_loading+0x4e/0xb0
> [   65.876128]  [<001f1ed6>] load_module+0xcce/0x27a0
> [   65.876134]  [<001f3af0>] __s390x_sys_init_module+0x148/0x178
> [   65.876142]  [<00ac0766>] system_call+0x2aa/0x2c8
> 
> crash's backtrace of this task looks like this:
> 
> PID: 731TASK: 1e79ba000 CPU: 7   COMMAND: "systemd-udevd"
>  LOWCORE INFO:
>   -psw  : 0x0704c0018000 0x00ab666a
>   -function : memcmp at ab666a
>   -prefix   : 0x7fe32000
>   -cpu timer: 0x7f5784d0f048
>   -clock cmp: 0xd60757081cd70d00
>   -general registers:
>  0x0009 0x0009
>  0x03ff80347321 00
>  0x03ff8034f321 00
>  0x001e 0x03ff8c592708
>  0x03e0047da900 0x03ff8034f318
>  0x0001 0x0009
>  0x03ff80347300 0x00ad81b8
>  0x001ee062 0x03e004357cb0
> [...]
>  #0 [3e004357cf0] load_module at 1f1eb0
>  #1 [3e004357dd0] __se_sys_init_module at 1f3af0
>  #2 [3e004357ea8] system_call at ac0766
>  PSW:  070500018000 03ff8c27ad3e (user space)
>  GPRS: 02aa280d2fb0 03ff8c27ad3c 0080 35fd
>03ff8c592708 02aa280c8770  02aa280d7130
>0002 02aa28915760 03ff8c592708 02aa280c00b0
>03ff8c8fff80 02aa280c00b0 03ff8c58a456 03ffe9cfde90
> 
> I did not look any further into the dump, however since the commit
> touches exactly the code path which seems to be looping... ;)
> 

Ouch :(  I wonder if I exposed a further race or another bug.  Heiko, can you
determine which module is stuck?  Warning: I have not compiled this code.

diff --git a/kernel/module.c b/kernel/module.c
index e8c1de2ab4e1..bd5eebc95049 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3567,6 +3567,8 @@ static int add_unformed_module(struct module *mod)
old = find_module_all(mod->name, strlen(mod->name), true);
if (old != NULL) {
if (old->state != MODULE_STATE_LIVE) {
+   printk_ratelimited("PRARIT: waiting for module %s to 
load.\n",
+  mod->name);
/* Wait in case it fails to load. */
mutex_unlock(_mutex);
err = wait_event_interruptible(module_wq,
@@ -3575,6 +3577,8 @@ static int add_unformed_module(struct module *mod)
goto out_unlocked;
goto again;
}
+   printk("PRARIT: module %s already exists.\n",
+  mod->name);
err = -EEXIST;
goto out;
}

P.


Re: [PATCH] modules: Only return -EEXIST for modules that have finished loading

2019-04-15 Thread Prarit Bhargava



On 4/15/19 7:23 AM, Jessica Yu wrote:
> +++ Prarit Bhargava [02/04/19 09:39 -0400]:
>> Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
>> linux guests boot with repeated errors:
>>
>> amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
>> amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
>> amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
>> amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
>> amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
>> amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
>>
>> The warnings occur because the module code erroneously returns -EEXIST
>> for modules that have failed to load and are in the process of being
>> removed from the module list.
>>
>> module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
>> modules.dep, systemd will load edac_mce_amd for every request of
>> amd64_edac_mod.  When the edac_mce_amd module loads, the module has
>> state MODULE_STATE_UNFORMED and once the module load fails and the state
>> becomes MODULE_STATE_GOING.  Another request for edac_mce_amd module
>> executes and add_unformed_module() will erroneously return -EEXIST even
>> though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
>> Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
>> fails because of unknown symbols from edac_mce_amd.
>>
>> add_unformed_module() must wait to return for any case other than
>> MODULE_STATE_LIVE to prevent a race between multiple loads of
>> dependent modules.
>>
>> Signed-off-by: Prarit Bhargava 
>> Reported-by: Cathy Avery 
>> Cc: Jessica Yu 
> 
> Applied to modules-next. Thanks Prarit!

Jessica, could I have the URL of the git tree?

Thanks,

P.

> 
> Jessica
> 


Re: [PATCH] tools/power turbostat: fix file descriptor leaks

2019-04-08 Thread Prarit Bhargava



On 4/8/19 12:12 PM, Gustavo A. R. Silva wrote:
> Fix file descriptor leaks by closing fp before return.
> 
> Addresses-Coverity-ID: 1444591 ("Resource leak")
> Addresses-Coverity-ID: 1444592 ("Resource leak")
> Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Prarit Bhargava 

P.

> ---
>  tools/power/x86/turbostat/turbostat.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/power/x86/turbostat/turbostat.c 
> b/tools/power/x86/turbostat/turbostat.c
> index c7727be9719f..78d9acba8e9b 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -2924,6 +2924,7 @@ int snapshot_cpu_lpi_us(void)
>   if (retval != 1) {
>   fprintf(stderr, "Disabling Low Power Idle CPU output\n");
>   BIC_NOT_PRESENT(BIC_CPU_LPI);
> + fclose(fp);
>   return -1;
>   }
>  
> @@ -2950,6 +2951,7 @@ int snapshot_sys_lpi_us(void)
>   if (retval != 1) {
>   fprintf(stderr, "Disabling Low Power Idle System output\n");
>   BIC_NOT_PRESENT(BIC_SYS_LPI);
> + fclose(fp);
>   return -1;
>   }
>   fclose(fp);
> 


Re: [PATCH] modules: Only return -EEXIST for modules that have finished loading

2019-04-08 Thread Prarit Bhargava


Jessica?  ping?

P.

On 4/2/19 9:39 AM, Prarit Bhargava wrote:
> Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
> linux guests boot with repeated errors:
> 
> amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
> amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
> amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
> 
> The warnings occur because the module code erroneously returns -EEXIST
> for modules that have failed to load and are in the process of being
> removed from the module list.
> 
> module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
> modules.dep, systemd will load edac_mce_amd for every request of
> amd64_edac_mod.  When the edac_mce_amd module loads, the module has
> state MODULE_STATE_UNFORMED and once the module load fails and the state
> becomes MODULE_STATE_GOING.  Another request for edac_mce_amd module
> executes and add_unformed_module() will erroneously return -EEXIST even
> though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
> Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
> fails because of unknown symbols from edac_mce_amd.
> 
> add_unformed_module() must wait to return for any case other than
> MODULE_STATE_LIVE to prevent a race between multiple loads of
> dependent modules.
> 
> Signed-off-by: Prarit Bhargava 
> Reported-by: Cathy Avery 
> Cc: Jessica Yu 
> ---
>  kernel/module.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 0b9aa8ab89f0..e8c1de2ab4e1 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3566,8 +3566,7 @@ static int add_unformed_module(struct module *mod)
>   mutex_lock(_mutex);
>   old = find_module_all(mod->name, strlen(mod->name), true);
>   if (old != NULL) {
> - if (old->state == MODULE_STATE_COMING
> - || old->state == MODULE_STATE_UNFORMED) {
> + if (old->state != MODULE_STATE_LIVE) {
>   /* Wait in case it fails to load. */
>   mutex_unlock(_mutex);
>   err = wait_event_interruptible(module_wq,
> 


Re: [PATCH v1] tools/power: turbostat: fix buffer overrun

2019-04-03 Thread Prarit Bhargava



On 4/3/19 3:02 AM, Naoya Horiguchi wrote:
> turbostat could be terminated by general protection fault on some latest
> hardwares which (for example) support 9 levels of C-states and show 18
> "tADDED" lines. That bloats the total output and finally causes buffer
> overrun.  So let's extend the buffer to avoid this.
> 
> This patch also removes duplicated "pc10:" line to reduce buffer usage.
> 
> Signed-off-by: Naoya Horiguchi 
> ---
>  tools/power/x86/turbostat/turbostat.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git 
> v5.1-rc3-mmotm-2019-04-02-17-16/tools/power/x86/turbostat/turbostat.c 
> v5.1-rc3-mmotm-2019-04-02-17-16_patched/tools/power/x86/turbostat/turbostat.c
> index c7727be..17b1f544 100644
> --- v5.1-rc3-mmotm-2019-04-02-17-16/tools/power/x86/turbostat/turbostat.c
> +++ 
> v5.1-rc3-mmotm-2019-04-02-17-16_patched/tools/power/x86/turbostat/turbostat.c
> @@ -861,7 +861,6 @@ int dump_counters(struct thread_data *t, struct core_data 
> *c,
>   outp += sprintf(outp, "pc8: %016llX\n", p->pc8);
>   outp += sprintf(outp, "pc9: %016llX\n", p->pc9);
>   outp += sprintf(outp, "pc10: %016llX\n", p->pc10);
> - outp += sprintf(outp, "pc10: %016llX\n", p->pc10);
>   outp += sprintf(outp, "cpu_lpi: %016llX\n", p->cpu_lpi);
>   outp += sprintf(outp, "sys_lpi: %016llX\n", p->sys_lpi);
>   outp += sprintf(outp, "Joules PKG: %0X\n", p->energy_pkg);
> @@ -5135,7 +5134,7 @@ int initialize_counters(int cpu_id)
>  
>  void allocate_output_buffer()
>  {
> - output_buffer = calloc(1, (1 + topo.num_cpus) * 1024);
> + output_buffer = calloc(1, (1 + topo.num_cpus) * 2048);

Is there a better way to calculate the size of that buffer other than a magic
number?

P.

>   outp = output_buffer;
>   if (outp == NULL)
>   err(-1, "calloc output buffer");
> 


[PATCH] modules: Only return -EEXIST for modules that have finished loading

2019-04-02 Thread Prarit Bhargava
Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
linux guests boot with repeated errors:

amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)

The warnings occur because the module code erroneously returns -EEXIST
for modules that have failed to load and are in the process of being
removed from the module list.

module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
modules.dep, systemd will load edac_mce_amd for every request of
amd64_edac_mod.  When the edac_mce_amd module loads, the module has
state MODULE_STATE_UNFORMED and once the module load fails and the state
becomes MODULE_STATE_GOING.  Another request for edac_mce_amd module
executes and add_unformed_module() will erroneously return -EEXIST even
though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
fails because of unknown symbols from edac_mce_amd.

add_unformed_module() must wait to return for any case other than
MODULE_STATE_LIVE to prevent a race between multiple loads of
dependent modules.

Signed-off-by: Prarit Bhargava 
Reported-by: Cathy Avery 
Cc: Jessica Yu 
---
 kernel/module.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 0b9aa8ab89f0..e8c1de2ab4e1 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3566,8 +3566,7 @@ static int add_unformed_module(struct module *mod)
mutex_lock(_mutex);
old = find_module_all(mod->name, strlen(mod->name), true);
if (old != NULL) {
-   if (old->state == MODULE_STATE_COMING
-   || old->state == MODULE_STATE_UNFORMED) {
+   if (old->state != MODULE_STATE_LIVE) {
/* Wait in case it fails to load. */
mutex_unlock(_mutex);
err = wait_event_interruptible(module_wq,
-- 
2.17.2



x86: Complete data loss during NMI lockup

2019-03-18 Thread Prarit Bhargava
Kernel 4.18.0 has a partial data loss when an NMI occurs but 5.0.0 has
complete data loss.

I tested with this simple kernel lockup code on kernels 4.18.0 through
5.0.0.  On 4.18.0 (and after) the expected lockup messages

 Uhhuh. NMI received for unknown reason 25 on CPU 0.
 Do you have a strange power saving mode enabled?
 Kernel panic - not syncing: NMI: Not continuing

and other debug messages are no longer printed to any console after commit
719f6a7040f1 ("printk: Use the main logbuf in NMI when logbuf_lock is 
available").

Things get worse with commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when
accessing the main log buffer in NMI") after which *NO* messages (including the
important stack trace) are output to the console at all.  There is a 100% data
loss with no clue as to what happened.

You can reproduce with a very simple spinlock lockup module:

static DEFINE_SPINLOCK(prarit_lock);

static void take_lock(void *data)
{
   unsigned long flags;

   pr_emerg("Taking lock on cpu %d\n", smp_processor_id());
   spin_lock_irqsave(_lock, flags);
}

static int init_dummy(void)
{
   on_each_cpu(take_lock, NULL, 0);
   return 0;
}

and the simple patch (sorry for the cut-and-paste)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 18bc9b51ac9b..d1800ecc16fc 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -312,6 +312,7 @@ static void default_do_nmi(struct pt_regs *regs)
int handled;
bool b2b = false;

+   pr_emerg("NMI on CPU %d.\n", smp_processor_id());
/*
 * CPU-specific NMI must be processed before non-CPU-specific
 * NMI, otherwise we may lose it, because the CPU-specific

During run-time a few NMIs happen, which result in

[  683.127105] NMI on CPU 0.
[  689.502992] NMI on CPU 91.
[  713.586459] NMI on CPU 92.
[  719.708041] NMI on CPU 3.
[ 1428.664558] NMI on CPU 95.
[ 2646.795310] NMI on CPU 87.

but nothing is output during the actual kernel lockup itself.  If I revert back
to the pre-719f6a7040f1 behaviour I do see the messages
on the console as well as the stack trace.

The call sequence is 5.0.0 is

nmi_enter()
-> printk_nmi_enter()
-> this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
-> printk()
-> vprintk_func()
-> vprintk_nmi()
-> printk_safe_log_store()

P.


  1   2   3   4   5   6   7   8   9   10   >