RE: [RFC PATCH 22/22] x86/fpu/xstate: Introduce boot-parameters for control some state component support

2020-10-13 Thread Brown, Len
> From: Andy Lutomirski  

> On Thu, Oct 1, 2020 at 1:43 PM Chang S. Bae  wrote:

> > "xstate.disable=0x6000" will disable AMX on a system that has AMX 
> > compiled into XFEATURE_MASK_USER_SUPPORTED.

> Can we please use words for this?  Perhaps:

> xstate.disable=amx,zmm

Yes, I think it is reasonable to add support for keywords for the features that
are both supported by the hardware and known by the kernel.

However, we need to continue to support numerical state-component bitmap format.
Otherwise, it would not be possible to coerce a legacy kernel (eg. a distro 
kernel)
to enable a feature on a new machine until it has been updated to know that 
keyword.

> and maybe add a list in /proc/cpuinfo or somewhere in /proc or /sys that 
> shows all the currently enabled xsave states.

Agreed, if we invent keywords, the list of supported+known keywords should be 
available on the system.

I do not advocate /proc/cpuinfo -- which is already an out of control parsing 
mess.

We could add the keywords to dmesg, since we already print the supported XSAVE 
BV:

[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating  point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'

Or maybe a sysfs attribute or a modparam that simply lists them all.

We wouldn't be able to dynamically  _write_ to that attribute, since the 
cmdline is boot-time only.

> > "xstate.enable=0x6000" will enable AMX on a system that does NOT have 
> > AMX compiled into XFEATURE_MASK_USER_SUPPORTED (assuming the kernel is 
> > new enough to support this feature).
>
> This sounds like it will be quite confusing to anyone reading the kernel code 
> to discover that a feature that is not "SUPPORTED" is nonetheless enabled.

Right now, this cmdline will only allow a new kernel to enable/disable kernel 
support for AMX
on hardware that also supports AMX.  But we hope to re-use this XSTATE code -- 
unchanged --
for future features that require just this simple state management support from 
the kernel.

We anticipate a future hardware enumeration mechanism to identify such 
qualified features
to assist the kernel in deciding whether to support a feature or not, by 
default.
The kernel can use the combination of its build-time config with advice from 
this boot-time
enumeration to decide if it wants to enable a particular feature or not.
And at the end, a user is empowered to override that default using this cmdline.

Thanks,
-Len



RE: [RFC PATCH 22/22] x86/fpu/xstate: Introduce boot-parameters for control some state component support

2020-10-13 Thread Brown, Len

> From: Randy Dunlap  

> What do these bitmasks look like?  what do the bits mean?
> Where does a user find this info?

The XSAVE state component bitmaps are detailed in
the Intel Software Developer's Manual, volume 1, Chapter 13:
"Managing State using the XSAVE Feature Set".

http://intel.com/sdm

In the kernel source, they are enumerated in xstate.c
and you can observe them in dmesg:

[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'

Thanks,
-Len



RE: [RFC PATCH 13/22] x86/fpu/xstate: Expand dynamic user state area on first use

2020-10-13 Thread Brown, Len
> From: Andy Lutomirski  

> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > @@ -518,3 +518,40 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
..
> > +bool xfirstuse_event_handler(struct fpu *fpu)
> > +{
> > +   bool handled = false;
> > +   u64 event_mask;
> > +
> > +   /* Check whether the first-use detection is running. */
> > +   if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
> > +   return handled;
> > +

> MSR_IA32_XFD_ERR needs to be wired up in the exception handler, not in
> some helper called farther down the stack

xfirstuse_event_handler() is called directly from the IDTENTRY 
exc_device_not_available():

> > @@ -1028,6 +1028,9 @@ DEFINE_IDTENTRY(exc_device_not_available)
> >  {
> > unsigned long cr0 = read_cr0();
> >  
> > +   if (xfirstuse_event_handler(>thread.fpu))
> > +   return;

Are you suggesting we should instead open code it inside that routine?

> But this raises an interesting point -- what happens if allocation
> fails?  I think that, from kernel code, we simply cannot support this
> exception mechanism.  If kernel code wants to use AMX (and that would
> be very strange indeed), it should call x86_i_am_crazy_amx_begin() and
> handle errors, not rely on exceptions.  From user code, I assume we
> send a signal if allocation fails.

The XFD feature allows us to transparently expand the kernel context switch 
buffer
for a user task, when that task first touches this associated hardware.
It allows applications to operate as if the kernel had allocated the backing AMX
context switch buffer at initialization time.  However, since we expect only
a sub-set of tasks to actually use AMX, we instead defer allocation until
we actually see first use for that task, rather than allocating for all tasks.

While we currently don't propose AMX use inside the kernel, it is conceivable
that could be done in the future, just like AVX is used by the RAID code;
and it would be done the same way -- kernel_fpu_begin()/kernel_fpu_end().
Such future Kernel AMX use would _not_ arm XFD, and would _not_ provoke this 
fault.
(note that we context switch the XFD-armed state per task)

vmalloc() does not fail, and does not return an error, and so there is no 
concept
of returning a signal.  If we got to the point where vmalloc() sleeps, then the 
system
has bigger OOM issues, and the OOM killer would be on the prowl.

If we were concerned about using vmalloc for a couple of pages in the task 
structure,
Then we could implement a routine to harvest unused buffers and free them --
but that didn't seem worth the complexity.  Note that this feature is 64-bit 
only.

Thanks,
-Len




RE: [RFC PATCH 07/22] x86/fpu/xstate: Introduce helpers to manage an xstate area dynamically

2020-10-13 Thread Brown, Len


>
> From: Andy Lutomirski  
>
> > +   /*
> > +* The caller may be under interrupt disabled condition. Ensure 
> > interrupt
> > +* allowance before the memory allocation, which may involve with 
> > page faults.
> > +*/
> > +   local_irq_enable();

> ... you can't just enable IRQs here.  If IRQs are off, they're off for a 
> reason.  Secondly, if they're *on*, you just forgot that fact.

Good catch.  This is a trap handler from user-space and should never be called 
with irqs disabled,
So the local_irq_enable() should be dead code.  Chang suggested that he 
erroneously left it in
from a previous implementation.

> > +   /* We need 64B aligned pointer, but vmalloc() returns a 
> > page-aligned address */
> > +   state_ptr = vmalloc(newsz);

> And allocating this state from vmalloc() seems questionable.  Why are you 
> doing this?

While the buffer needs to be virtually contiguous, it doesn't need to be 
physically contiguous,
And so vmalloc() is less overhead than kmalloc() for this.

Thanks,
-Len



RE: [PATCH AUTOSEL 5.2 82/94] tools/power turbostat: fix file descriptor leaks

2019-09-05 Thread Brown, Len
FWIW,

The latest turbostat and x86_energy_perf_policy utilities in the upstream 
kernel tree should always be backward compatible with all old kernels.  If that 
is EVER not the case, I want to know about it.

Yes, I know that some distros ship old versions of these utilities built out of 
their matching kernel tree snapshots.
Yes, applying upstream fixes to .stable for such distros is a good thing.

However, the better solution for these particular utilities, is that they 
simply always use upstream utilities -- even with old kernels.

When somebody reports a problem and I need them to run these tools, 100% of the 
time, I start by sending them the latest upstream version to replace the old 
version shipped by the distro.

Cheers,
-Len



-Original Message-
From: Sasha Levin [mailto:sas...@kernel.org] 
Sent: Wednesday, September 04, 2019 11:57 AM
To: linux-kernel@vger.kernel.org; sta...@vger.kernel.org
Cc: Gustavo A. R. Silva ; Prarit Bhargava 
; Brown, Len ; Sasha Levin 
; linux...@vger.kernel.org
Subject: [PATCH AUTOSEL 5.2 82/94] tools/power turbostat: fix file descriptor 
leaks

From: "Gustavo A. R. Silva" 

[ Upstream commit 605736c6929d541c78a85dffae4d33a23b6b2149 ]

Fix file descriptor leaks by closing fp before return.

Addresses-Coverity-ID: 1444591 ("Resource leak")
Addresses-Coverity-ID: 1444592 ("Resource leak")
Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Gustavo A. R. Silva 
Reviewed-by: Prarit Bhargava 
Signed-off-by: Len Brown 
Signed-off-by: Sasha Levin 
---
 tools/power/x86/turbostat/turbostat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/power/x86/turbostat/turbostat.c 
b/tools/power/x86/turbostat/turbostat.c
index 71a931813de00..066bd43ed6c9f 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -2912,6 +2912,7 @@ int snapshot_cpu_lpi_us(void)
if (retval != 1) {
fprintf(stderr, "Disabling Low Power Idle CPU output\n");
BIC_NOT_PRESENT(BIC_CPU_LPI);
+   fclose(fp);
return -1;
}
 
-- 
2.20.1



RE: [PATCH 09/14] thermal/x86_pkg_temp_thermal: Support multi-die/package

2019-04-05 Thread Brown, Len


> please add a new helper function topology_max_dies() and retrieve it from 
> that.

I'll do this for Rui -- since his patch is in my x86 branch,
and it is already (very early) Saturday morning for him:-)

FYI, there were a couple of other small changes I made to that branch
in response to list feedback that I pushed, but didn't re-email out the series.

Thomas,
Let me know if you prefer me to re-email the series, or just send a git pull 
request.

Thanks,
-Len



RE: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

2019-02-25 Thread Brown, Len
Hi Brice,

Yes, you re-discovered the bug that Kan Liang pointed out last week.

I have updated this patch set, and the latest for testing is in my git tree on 
kernel.org or

https://github.com/lenb/linux.git x86

Note that I took your advice and left the core_siblings with its original 
definition,
And created package_threads as a synonym.  I will e-mail out the patch set again
When I do 2 more things:

1. add core_threads map to sysfs
2. replace unique_die_id with logical_die_id -- turns out it is useful for same 
reason as logical_package_id.

Thanks,
-Len




-Original Message-
From: Brice Goglin [mailto:brice.gog...@inria.fr] 
Sent: Sunday, February 24, 2019 5:04 AM
To: Len Brown ; x...@kernel.org
Cc: linux-kernel@vger.kernel.org; Brown, Len ; 
linux-...@vger.kernel.org; Like Xu 
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

Le 19/02/2019 à 04:40, Len Brown a écrit :
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c 
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct 
> cpuinfo_x86 *o)
>   int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>  
>   if (c->phys_proc_id == o->phys_proc_id &&
> + c->cpu_die_id == o->cpu_die_id &&
>   per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
>   if (c->cpu_core_id == o->cpu_core_id)
>   return topology_sane(c, o, "smt"); @@ -404,6 
> +405,7 @@ static 
> bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   }
>  
>   } else if (c->phys_proc_id == o->phys_proc_id &&
> +c->cpu_die_id == o->cpu_die_id &&
>  c->cpu_core_id == o->cpu_core_id) {
>   return topology_sane(c, o, "smt");
>   }
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct 
> cpuinfo_x86 *o)
>   */
>  static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)  
> {
> - if (c->phys_proc_id == o->phys_proc_id)
> + if (c->cpu_die_id == o->cpu_die_id)
>   return true;
>   return false;
>  }

Hello Len,

I am testing your patches in a VM with CPUID.1f QEMU patches (author Like Xu is 
CC'ed), booted to get 2 packages with 1 NUMA node each and 2 dies each:

-smp cpus=16,threads=2,cores=2,dies=2,sockets=2 -numa node,cpus=0-7,nodeid=0 
-numa node,cpus=8-15,nodeid=1

sysfs files expose wrong information:
core_siblings contains threads of the local die AND of die with same number in 
other processors, eg cpu 0-3 and 8-11 instead of 0-3 only.

The issue is that you seem to assume that die ids will always be unique across 
multiple packages.
Is this a valid assumption? If so, QEMU patches should be fixed.
If not, I fixed the issue by changing match_die() to check both phys_proc_id 
and cpu_die_id:

static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) {
if (c->phys_proc_id == o->phys_proc_id && c->cpu_die_id == 
o->cpu_die_id)
return true;
return false;
}

Thanks
Brice




RE: [PATCH] MAINTAINERS: update simple firmware interface (SFI) section entry

2019-02-19 Thread Brown, Len
Yes, this can be scheduled to be deleted.



RE: [linux-drivers-review] [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording

2019-02-19 Thread Brown, Len
>> +if CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
>>  
>>  CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, 
>> where

>fwiw:CONFIG_SCHED_DRAWER

Heh.  Seems that every line of update to a .txt files uncovers two lines of 
existing bugs.

Hopefully we are better at writing software, than documentation!

Indeed, maybe if we had a compiler or checkpatch.pl file for documentation 
files,
They would approach the quality of the source code?

I updated the patch to fix this too.

Thanks,
-Len




RE: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

2019-02-19 Thread Brown, Len
>> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct 
>> cpuinfo_x86 *o)
>>*/
>>   static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>   {
>> -if (c->>phys_proc_id == o->>phys_proc_id)
>> +if (c->>cpu_die_id == o->>cpu_die_id)
>>  return true;
>>  return false;
>>   }

> Shouldn't we check the unique_die_id here?
> Die from different package can have the same cpu_die_id.

Good catch.
Updated this hunk as below.

Thanks!
-Len

@@ -461,7 +463,8 @@ static bool match_llc(struct cpuinfo_x86 *c, struct 
cpuinfo_x86 *o)
  */
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-   if (c->phys_proc_id == o->phys_proc_id)
+   if ((c->phys_proc_id == o->phys_proc_id) &&
+   (c->cpu_die_id == o->cpu_die_id))
return true;
return false;
 }



RE: [PATCH 05/11] x86 topology: export die_siblings

2019-02-19 Thread Brown, Len
Thanks for the comments, Kan,

>> diff --git a/Documentation/cputopology.txt 
>> b/Documentation/cputopology.txt index 287213b4517b..7dd2ae3df233 
>> 100644
>> --- a/Documentation/cputopology.txt
>> +++ b/Documentation/cputopology.txt
>> @@ -56,6 +56,16 @@ core_siblings_list:
>>  human-readable list of cpuX's hardware threads within the same
>>  die_id.
>>   
>> +die_siblings:
>> +
>> +internal kernel map of cpuX's hardware threads within the same
>> +physical_package_id.
>> +
>> +die_siblings_list:
>> +
>> +human-readable list of cpuX's hardware threads within the same
>> +physical_package_id.
>> +
>>   book_siblings:
>>   
>>  internal kernel map of cpuX's hardware threads within the same diff 

> Could you please update the document regarding to topology_die_cpumask and 
> topology_core_cpumask in Documentation/x86/topology.txt
 
I agree that the top part of this file, as updated above, should document the 
external sysfs interface...

I'm less excited about the center of the file trying to document the internal 
implementation -- as the source code
is actually more clear than the document, but here is an update, consistent 
with the existing file.
Let me know if it does not fully address your comment.

Thanks,
-Len
---

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 7dd2ae3..f39c759 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -102,6 +102,7 @@ these macros in include/asm-XXX/topology.h::
#define topology_drawer_id(cpu)
#define topology_sibling_cpumask(cpu)
#define topology_core_cpumask(cpu)
+   #define topology_die_cpumask(cpu)
#define topology_book_cpumask(cpu)
#define topology_drawer_cpumask(cpu)

@@ -114,10 +115,11 @@ To be consistent on all architectures, 
include/linux/topology.h
 provides default definitions for any of the above macros that are
 not defined by include/asm-XXX/topology.h:

-1) physical_package_id: -1
-2) core_id: 0
-3) sibling_cpumask: just the given CPU
-4) core_cpumask: just the given CPU
+1) topology_physical_package_id: -1
+2) topology_core_id: 0
+3) topology_sibling_cpumask: just the given CPU
+4) topology_core_cpumask: just the given CPU
+5) topology_die_cpumask: just the given CPU

 For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
 default definitions for topology_book_id() and topology_book_cpumask().




RE: [RFC] x86, tsc: Add kcmdline args for skipping tsc calibration sequences

2018-07-13 Thread Brown, Len
We disabled CPUID-based TSC calibration on SKX in December for several reasons.
If you still have it enabled, you need this patch:

commit b511203093489eb1829cb4de86e8214752205ac6
x86/tsc: Fix erroneous TSC rate on Skylake Xeon

If you are referring to another platform that has CPUID-TSC calibration...
it should still work on an over-clocked system.  Over-clocked platforms should
use exactly the same reference crystal as non-overclocked platforms, but should
modify the crystal/core multiplier.   If you are changing the reference
crystal, then I believe you are using an un-supported hardware configuration,
and my ability to support you is limited.

-Len



RE: [RFC] x86, tsc: Add kcmdline args for skipping tsc calibration sequences

2018-07-13 Thread Brown, Len
We disabled CPUID-based TSC calibration on SKX in December for several reasons.
If you still have it enabled, you need this patch:

commit b511203093489eb1829cb4de86e8214752205ac6
x86/tsc: Fix erroneous TSC rate on Skylake Xeon

If you are referring to another platform that has CPUID-TSC calibration...
it should still work on an over-clocked system.  Over-clocked platforms should
use exactly the same reference crystal as non-overclocked platforms, but should
modify the crystal/core multiplier.   If you are changing the reference
crystal, then I believe you are using an un-supported hardware configuration,
and my ability to support you is limited.

-Len



RE: [PATCH 1/2] tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines

2018-04-07 Thread Brown, Len
You just did it:-)

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Saturday, April 07, 2018 6:25 PM
To: Brown, Len <len.br...@intel.com>
Cc: Thomas Gleixner <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H. 
Peter Anvin <h...@zytor.com>; Borislav Petkov <b...@suse.de>; LKML 
<linux-kernel@vger.kernel.org>; x...@kernel.org; Matt Turner 
<matts...@gmail.com>
Subject: Re: [PATCH 1/2] tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE 
defines

On Tue, Feb 13, 2018 at 11:12 AM, Matt Turner <matts...@gmail.com> wrote:
> According to the Intel Software Developers' Manual, Vol. 4, Order No.
> 335592, these macros have been reversed since they were added.
>
> Fixes: 889facbee3e6 ("tools/power turbostat: v3.0: monitor Watts and 
> Temperature")
> Signed-off-by: Matt Turner <matts...@gmail.com>

Is there something I need to do to ensure these two trivial patches get picked 
up?


RE: [PATCH 1/2] tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines

2018-04-07 Thread Brown, Len
You just did it:-)

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Saturday, April 07, 2018 6:25 PM
To: Brown, Len 
Cc: Thomas Gleixner ; Ingo Molnar ; H. 
Peter Anvin ; Borislav Petkov ; LKML 
; x...@kernel.org; Matt Turner 

Subject: Re: [PATCH 1/2] tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE 
defines

On Tue, Feb 13, 2018 at 11:12 AM, Matt Turner  wrote:
> According to the Intel Software Developers' Manual, Vol. 4, Order No.
> 335592, these macros have been reversed since they were added.
>
> Fixes: 889facbee3e6 ("tools/power turbostat: v3.0: monitor Watts and 
> Temperature")
> Signed-off-by: Matt Turner 

Is there something I need to do to ensure these two trivial patches get picked 
up?


RE: [PATCH] tools/power x86_energy_perf_policy: fix resource leak on file descriptor

2017-05-31 Thread Brown, Len
Acked-by: Len Brown <len.br...@intel.com>


-Original Message-
From: Colin King [mailto:colin.k...@canonical.com] 
Sent: Wednesday, May 17, 2017 8:52 AM
To: Brown, Len <len.br...@intel.com>
Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH] tools/power x86_energy_perf_policy: fix resource leak on file 
descriptor

From: Colin Ian King <colin.k...@canonical.com>

Function get_pkg_num is leaking an open file, fix this with a fclose().

Detected with static analysis by cppcheck:
[tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c:1115]:
   (error) Resource leak: fp

Fixes: 4beec1d7519691 ("tools/power x86_energy_perf_policy: support HWP.EPP")
Signed-off-by: Colin Ian King <colin.k...@canonical.com>
---
 tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c 
b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
index 65bbe627a425..26a5b5265290 100644
--- a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
@@ -1110,6 +1110,7 @@ unsigned int get_pkg_num(int cpu)
 
fp = fopen_or_die(pathname, "r");
retval = fscanf(fp, "%d\n", );
+   fclose(fp);
if (retval != 1)
errx(1, "%s: failed to parse", pathname);
return pkg;
-- 
2.11.0



RE: [PATCH] tools/power x86_energy_perf_policy: fix resource leak on file descriptor

2017-05-31 Thread Brown, Len
Acked-by: Len Brown 


-Original Message-
From: Colin King [mailto:colin.k...@canonical.com] 
Sent: Wednesday, May 17, 2017 8:52 AM
To: Brown, Len 
Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH] tools/power x86_energy_perf_policy: fix resource leak on file 
descriptor

From: Colin Ian King 

Function get_pkg_num is leaking an open file, fix this with a fclose().

Detected with static analysis by cppcheck:
[tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c:1115]:
   (error) Resource leak: fp

Fixes: 4beec1d7519691 ("tools/power x86_energy_perf_policy: support HWP.EPP")
Signed-off-by: Colin Ian King 
---
 tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c 
b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
index 65bbe627a425..26a5b5265290 100644
--- a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
@@ -1110,6 +1110,7 @@ unsigned int get_pkg_num(int cpu)
 
fp = fopen_or_die(pathname, "r");
retval = fscanf(fp, "%d\n", );
+   fclose(fp);
if (retval != 1)
errx(1, "%s: failed to parse", pathname);
return pkg;
-- 
2.11.0



RE: [PATCH] tools/power turbostat: turbostat.8 add missing column definitions

2017-03-05 Thread Brown, Len
> On Saturday, March 04, 2017 06:36:29 PM Len Brown wrote:
> > Applied -- Thanks!
> 
> I guess I should include this into fixes too?
> 
> Or are you going to send a pull request with it to me?

When I apply it, you don't have to.
I have a couple of other fixes being tested on my turbostat branch now,
and this is in w/ those.

thanks,
-Len



RE: [PATCH] tools/power turbostat: turbostat.8 add missing column definitions

2017-03-05 Thread Brown, Len
> On Saturday, March 04, 2017 06:36:29 PM Len Brown wrote:
> > Applied -- Thanks!
> 
> I guess I should include this into fixes too?
> 
> Or are you going to send a pull request with it to me?

When I apply it, you don't have to.
I have a couple of other fixes being tested on my turbostat branch now,
and this is in w/ those.

thanks,
-Len



RE: [PATCH 2/2] x86/tsc: Add additional Intel CPU models to crystal_khz whitelist

2016-09-15 Thread Brown, Len
> + crystal_khz = 24000;/* 25.0 MHz */

I guess I prefer no comment over an incorrect comment.



RE: [PATCH 2/2] x86/tsc: Add additional Intel CPU models to crystal_khz whitelist

2016-09-15 Thread Brown, Len
> + crystal_khz = 24000;/* 25.0 MHz */

I guess I prefer no comment over an incorrect comment.



RE: [PATCH] drivers/idle: make intel_idle.c driver more explicitly non-modular

2016-04-04 Thread Brown, Len


> -Original Message-
> From: rcoch...@linutronix.de [mailto:rcoch...@linutronix.de]
> Sent: Tuesday, April 05, 2016 12:30 AM
> To: Brown, Len
> Cc: Gortmaker, Paul (Wind River); linux-kernel@vger.kernel.org; Len Brown;
> linux...@vger.kernel.org
> Subject: Re: [PATCH] drivers/idle: make intel_idle.c driver more
> explicitly non-modular
> 
> On Tue, Apr 05, 2016 at 04:20:47AM +, Brown, Len wrote:
> > The first idle driver to register with cpuidle wins.
> >
> > intel_idle should always get the opportunity
> > to probe and register before acpi_idle (processor_idle.c)
> >
> > When intel_idle was allowed to be modular,
> > some distros build their kernel and loaded modules
> > such that acpi_idle could probe first.  In such
> > a kernel, intel_idle became dead code.
> >
> > As intel_idle is a small driver, the q  uick fix
> > was to make it Y/N so that it would always probe
> > before acpi_idle, no matter how acpi_idle
> > is build and loaded.
> >
> > Yes, even though intel_idle is a tiny driver, I think
> > it would be good to be able to unload it when it doesn't probe.
> 
> And that means fixing the race with acpi_idle, right?
> 
> > Today, it appears that acpi_idle (acpi/processor_idle.c)
> > is compiled Y/N.
> 
> So it, too, needs work?

"needs" is somewhat subjective.
Some may argue that this driver is so small,
that an effort to save memory might be more effectively
directed elsewhere.  But if the goal is to save the memory
consumed by this driver when the driver doesn't probe,
then yes, it would have to be made modular.

I don't remember what ACPI dependency made it non-modular.
ACPI has some tricky initialization ordering issues
that are BIOS dependent, and sometimes out of our control.

> > No, I do not believe that cpuidle should bother
> > supporting changing idle drivers at run-time.
> 
> Huh?  But you just said, "it would be good to be able to unload it
> when it doesn't probe."

being able to switch the registered driver at run-time
does not require the driver to be modular.

cheers,
-Len




RE: [PATCH] drivers/idle: make intel_idle.c driver more explicitly non-modular

2016-04-04 Thread Brown, Len


> -Original Message-
> From: rcoch...@linutronix.de [mailto:rcoch...@linutronix.de]
> Sent: Tuesday, April 05, 2016 12:30 AM
> To: Brown, Len
> Cc: Gortmaker, Paul (Wind River); linux-kernel@vger.kernel.org; Len Brown;
> linux...@vger.kernel.org
> Subject: Re: [PATCH] drivers/idle: make intel_idle.c driver more
> explicitly non-modular
> 
> On Tue, Apr 05, 2016 at 04:20:47AM +, Brown, Len wrote:
> > The first idle driver to register with cpuidle wins.
> >
> > intel_idle should always get the opportunity
> > to probe and register before acpi_idle (processor_idle.c)
> >
> > When intel_idle was allowed to be modular,
> > some distros build their kernel and loaded modules
> > such that acpi_idle could probe first.  In such
> > a kernel, intel_idle became dead code.
> >
> > As intel_idle is a small driver, the q  uick fix
> > was to make it Y/N so that it would always probe
> > before acpi_idle, no matter how acpi_idle
> > is build and loaded.
> >
> > Yes, even though intel_idle is a tiny driver, I think
> > it would be good to be able to unload it when it doesn't probe.
> 
> And that means fixing the race with acpi_idle, right?
> 
> > Today, it appears that acpi_idle (acpi/processor_idle.c)
> > is compiled Y/N.
> 
> So it, too, needs work?

"needs" is somewhat subjective.
Some may argue that this driver is so small,
that an effort to save memory might be more effectively
directed elsewhere.  But if the goal is to save the memory
consumed by this driver when the driver doesn't probe,
then yes, it would have to be made modular.

I don't remember what ACPI dependency made it non-modular.
ACPI has some tricky initialization ordering issues
that are BIOS dependent, and sometimes out of our control.

> > No, I do not believe that cpuidle should bother
> > supporting changing idle drivers at run-time.
> 
> Huh?  But you just said, "it would be good to be able to unload it
> when it doesn't probe."

being able to switch the registered driver at run-time
does not require the driver to be modular.

cheers,
-Len




RE: [PATCH] drivers/idle: make intel_idle.c driver more explicitly non-modular

2016-04-04 Thread Brown, Len
The first idle driver to register with cpuidle wins.

intel_idle should always get the opportunity
to probe and register before acpi_idle (processor_idle.c)

When intel_idle was allowed to be modular,
some distros build their kernel and loaded modules
such that acpi_idle could probe first.  In such
a kernel, intel_idle became dead code.

As intel_idle is a small driver, the q  uick fix
was to make it Y/N so that it would always probe
before acpi_idle, no matter how acpi_idle
is build and loaded.

Yes, even though intel_idle is a tiny driver, I think
it would be good to be able to unload it when it doesn't probe.

Today, it appears that acpi_idle (acpi/processor_idle.c)
is compiled Y/N.

thanks,
-Len

ps.

No, I do not believe that cpuidle should bother
supporting changing idle drivers at run-time.



RE: [PATCH] drivers/idle: make intel_idle.c driver more explicitly non-modular

2016-04-04 Thread Brown, Len
The first idle driver to register with cpuidle wins.

intel_idle should always get the opportunity
to probe and register before acpi_idle (processor_idle.c)

When intel_idle was allowed to be modular,
some distros build their kernel and loaded modules
such that acpi_idle could probe first.  In such
a kernel, intel_idle became dead code.

As intel_idle is a small driver, the q  uick fix
was to make it Y/N so that it would always probe
before acpi_idle, no matter how acpi_idle
is build and loaded.

Yes, even though intel_idle is a tiny driver, I think
it would be good to be able to unload it when it doesn't probe.

Today, it appears that acpi_idle (acpi/processor_idle.c)
is compiled Y/N.

thanks,
-Len

ps.

No, I do not believe that cpuidle should bother
supporting changing idle drivers at run-time.



RE: [PATCH 01/30] ACPICA: Linuxize: reduce divergences for 20160212 release

2016-03-24 Thread Brown, Len
> > >
> > > -acpi_status acpi_hw_read(u32 *value, struct acpi_generic_address
> *reg)
> > > +acpi_status acpi_hw_read(u32 *value, struct acpi_generic_address *
> reg)
> >
> > The second argument * style appears the opposite of normal style
> > and a different style than the first argument * style.
> [Lv Zheng]
> The file is drivers/acpi/acpica/hwregs.c, which is coming from ACPICA
> upstream.
> So this is a result of "ACPICA release".
> In other words, this is a result of a "process".
> In order to fix this, things need to be done in "ACPICA release scripts".
> Which should be done in
> https://github.com/acpica/acpica/blob/master/generate/linux/libacpica.sh.
> Otherwise, "ACPICA release" process will require human intervention.
> 
> So please leave this patch fragment as is.
> It will be automatically fixed if the "ACPICA release" process is fixed.
> And if you don't leave this fragment as is, the "ACPICA release" process
> will get hurt.

I disagree.

The patch should be correct when it hits the Linux kernel tree.

If the process is broken, then fix the process and re-send
a fixed patch.  Linux doesn't care if the process is a program
that runs with the click of a button, or the result of 1000
engineering laboring day and night. Only the result matters,
the result should be correct, and this patch is not correct.

-Len



RE: [PATCH 01/30] ACPICA: Linuxize: reduce divergences for 20160212 release

2016-03-24 Thread Brown, Len
> > >
> > > -acpi_status acpi_hw_read(u32 *value, struct acpi_generic_address
> *reg)
> > > +acpi_status acpi_hw_read(u32 *value, struct acpi_generic_address *
> reg)
> >
> > The second argument * style appears the opposite of normal style
> > and a different style than the first argument * style.
> [Lv Zheng]
> The file is drivers/acpi/acpica/hwregs.c, which is coming from ACPICA
> upstream.
> So this is a result of "ACPICA release".
> In other words, this is a result of a "process".
> In order to fix this, things need to be done in "ACPICA release scripts".
> Which should be done in
> https://github.com/acpica/acpica/blob/master/generate/linux/libacpica.sh.
> Otherwise, "ACPICA release" process will require human intervention.
> 
> So please leave this patch fragment as is.
> It will be automatically fixed if the "ACPICA release" process is fixed.
> And if you don't leave this fragment as is, the "ACPICA release" process
> will get hurt.

I disagree.

The patch should be correct when it hits the Linux kernel tree.

If the process is broken, then fix the process and re-send
a fixed patch.  Linux doesn't care if the process is a program
that runs with the click of a button, or the result of 1000
engineering laboring day and night. Only the result matters,
the result should be correct, and this patch is not correct.

-Len



RE: C1E auto-promotion suspend/resume

2016-03-13 Thread Brown, Len
> By BIOS (1.2.3 on a Dell XPS 13 9350) seems to want to enable C1E
> auto-promotion (ugh!), which results in this difference across
> suspend/resume according to turbostat:
> 
> -cpu3: MSR_IA32_POWER_CTL: 0x0024005d (C1E auto-promotion: DISabled)
> +cpu3: MSR_IA32_POWER_CTL: 0x0024005f (C1E auto-promotion: ENabled)
> 
> Should intel_idle learn to re-disable idle promotion on resume?

Yes, it seems that way.

Go ahead and send a patch, or file a bug at bugzilla.kernel.org
and we'll get to it.

thanks,
-len



RE: C1E auto-promotion suspend/resume

2016-03-13 Thread Brown, Len
> By BIOS (1.2.3 on a Dell XPS 13 9350) seems to want to enable C1E
> auto-promotion (ugh!), which results in this difference across
> suspend/resume according to turbostat:
> 
> -cpu3: MSR_IA32_POWER_CTL: 0x0024005d (C1E auto-promotion: DISabled)
> +cpu3: MSR_IA32_POWER_CTL: 0x0024005f (C1E auto-promotion: ENabled)
> 
> Should intel_idle learn to re-disable idle promotion on resume?

Yes, it seems that way.

Go ahead and send a patch, or file a bug at bugzilla.kernel.org
and we'll get to it.

thanks,
-len



RE: Skylake (XPS 13 9350) TSC is way off

2015-12-02 Thread Brown, Len
> >> [0.00] tsc: PIT calibration matches HPET. 2 loops
> >> [0.00] tsc: Detected 2399.975 MHz processor
> >> [0.090897] TSC deadline timer enabled
> >> [1.960034] tsc: Refined TSC clocksource calibration: 2400.007 MHz

> turbostat version 4.10 10 Dec, 2015 - Len Brown 
> CPUID(0): GenuineIntel 22 CPUID levels; family:model:stepping 0x6:4e:3 
> (6:78:3)
> CPUID(1): SSE3 MONITOR EIST TM2 TSC MSR ACPI-TM TM
> CPUID(6): APERF, DTS, PTM, HWP, HWPnotify, HWPwindow, HWPepp, No-HWPpkg, EPB
> cpu1: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MONITOR)
> CPUID(0x15): eax_crystal: 2 ebx_tsc: 200 ecx_crystal_hz: 0
> TSC: 2400 MHz (2400 Hz * 200 / 2 / 100)
> CPUID(0x16): base_mhz: 2400 max_mhz: 2800 bus_mhz: 100


Both the initial and refined TSC calibration are right on the money -- 2400 MHz.
Further, this system happens to also have a base frequency of 2400 MHz,
so tsc_khz = cpu_khz = 2,400,000, exactly.  It would stray from that value
only based on the ppm error of the base 24 MHz crystal.

Anything other than that value is error.

-Len

> RAPL: 17476 sec. Joule Counter Range, at 15 Watts
> cpu1: MSR_PLATFORM_INFO: 0x4043df1011800
> 4 * 100 = 400 MHz max efficiency frequency
> 24 * 100 = 2400 MHz base frequency
> cpu1: MSR_IA32_POWER_CTL: 0x0024005d (C1E auto-promotion: DISabled)
> cpu1: MSR_TURBO_RATIO_LIMIT: 0x1b1b1b1c
> 27 * 100 = 2700 MHz max turbo 4 active cores
> 27 * 100 = 2700 MHz max turbo 3 active cores
> 27 * 100 = 2700 MHz max turbo 2 active cores
> 28 * 100 = 2800 MHz max turbo 1 active cores
> cpu1: MSR_CONFIG_TDP_NOMINAL: 0x0017 (base_ratio=7)
> cpu1: MSR_CONFIG_TDP_LEVEL_1: 0x0008003c (PKG_MIN_PWR_LVL1=0
> PKG_MAX_PWR_LVL1=0 LVL1_RATIO=8 PKG_TDP_LVL1=60)
> cpu1: MSR_CONFIG_TDP_LEVEL_2: 0x001800c8 (PKG_MIN_PWR_LVL2=0
> PKG_MAX_PWR_LVL2=0 LVL2_RATIO=8 PKG_TDP_LVL2=200)
> cpu1: MSR_CONFIG_TDP_CONTROL: 0x ( lock=0)
> cpu1: MSR_TURBO_ACTIVATION_RATIO: 0x (MAX_NON_TURBO_RATIO=0
> lock=0)
> cpu1: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008006 (UNdemote-C3,
> UNdemote-C1, demote-C3, demote-C1, locked: pkg-cstate-limit=6: pc8)
> cpu0: MSR_PM_ENABLE: 0x0001 (HWP)
> cpu0: MSR_HWP_CAPABILITIES: 0x0105171c (high 0x1c guar 0x17 eff 0x5 low
> 0x1)
> cpu0: MSR_HWP_REQUEST: 0x80001c04 (min 0x4 max 0x1c des 0x0 epp 0x80
> window 0x0 pkg 0x0)
> cpu0: MSR_HWP_INTERRUPT: 0x0001 (EN_Guaranteed_Perf_Change,
> Dis_Excursion_Min)
> cpu0: MSR_HWP_STATUS: 0x (No-Guaranteed_Perf_Change, No-
> Excursion_Min)
> cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0006 (balanced)
> cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.61
> Joules, 0.000977 sec.)
> cpu0: MSR_PKG_POWER_INFO: 0x0078 (15 W TDP, RAPL 0 - 0 W, 0.00
> sec.)
> cpu0: MSR_PKG_POWER_LIMIT: 0x4280c800dd8078 (UNlocked)
> cpu0: PKG Limit #1: ENabled (15.00 Watts, 28.00 sec, clamp
> ENabled)
> cpu0: PKG Limit #2: ENabled (25.00 Watts, 0.002441* sec, clamp
> DISabled)
> cpu0: MSR_DRAM_POWER_LIMIT: 0x5400de (UNlocked)
> cpu0: DRAM Limit: DISabled (0.00 Watts, 0.000977 sec, clamp DISabled)
> cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x0064 (100 C)
> cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88340800 (48 C)
> cpu0: MSR_IA32_THERM_STATUS: 0x88350800 (47 C +/- 1)
> cpu1: MSR_IA32_THERM_STATUS: 0x88340800 (48 C +/- 1)
> Core CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz SMI  CPU%c1
> CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Totl%C0  Any%C0  GFX%C0 CPUGFX%
> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt
> RAMWatt   PKG_%   RAM_%
>-   -  162.84 5492399   04.13
> 0.020.63   92.39  46  46   12.21   10.772.480.80
> 14.33   71.530.000.000.000.000.001.950.54
>   0.000.00
>0   0  111.92 5632399   02.48
> 0.020.80   94.78  46  46   12.21   10.782.480.80
> 14.34   71.550.000.000.000.000.001.950.54
>   0.000.00
>0   2   81.39 5502399   03.04
>1   1  132.04 6222400   07.48
> 0.020.45   90.01  45
>1   3  316.01 5202400   03.51
> 1.002561 sec
> 
> In case it's at all useful, adjtimex -p says:
> 
>  mode: 0
>offset: 0
> frequency: 135641
>  maxerror: 37498
>  esterror: 1532
>status: 8192
> time_constant: 2
> precision: 1
> tolerance: 32768000
>  tick: 1
>  raw time:  1449098317s 671243180us = 1449098317.671243180
> 
> this suggests a rather small correction, so I really have no idea what
> "Adjusting tsc more than 11% (8039115 vs 7759462)" means.
> 
> John, you wrote this code.  What does the error message mean?
> 
> --Andy
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Skylake (XPS 13 9350) TSC is way off

2015-12-02 Thread Brown, Len
+adrian

> [0.00] tsc: PIT calibration matches HPET. 2 loops
> [0.00] tsc: Detected 2399.975 MHz processor
> [0.090897] TSC deadline timer enabled
> [1.960034] tsc: Refined TSC clocksource calibration: 2400.007 MHz
> [1.960039] clocksource: tsc: mask: 0x max_cycles:
> 0x22983e30402, max_idle_ns: 440795260848 ns
> [2.959936] clocksource: Switched to clocksource tsc
> [   87.168211] Adjusting tsc more than 11% (5941981 vs 7759439)
> 
> This is more or less Linus' latest tree (v4.4-rc3 plus some unrelated
> platform driver patches).

Hi Andy,
Thanks for the note.
I'll send you a debug version of turbostat, off list,
since the list will just block mail with that attachment.

thanks,
-Len



RE: Skylake (XPS 13 9350) TSC is way off

2015-12-02 Thread Brown, Len
+adrian

> [0.00] tsc: PIT calibration matches HPET. 2 loops
> [0.00] tsc: Detected 2399.975 MHz processor
> [0.090897] TSC deadline timer enabled
> [1.960034] tsc: Refined TSC clocksource calibration: 2400.007 MHz
> [1.960039] clocksource: tsc: mask: 0x max_cycles:
> 0x22983e30402, max_idle_ns: 440795260848 ns
> [2.959936] clocksource: Switched to clocksource tsc
> [   87.168211] Adjusting tsc more than 11% (5941981 vs 7759439)
> 
> This is more or less Linus' latest tree (v4.4-rc3 plus some unrelated
> platform driver patches).

Hi Andy,
Thanks for the note.
I'll send you a debug version of turbostat, off list,
since the list will just block mail with that attachment.

thanks,
-Len



RE: Skylake (XPS 13 9350) TSC is way off

2015-12-02 Thread Brown, Len
> >> [0.00] tsc: PIT calibration matches HPET. 2 loops
> >> [0.00] tsc: Detected 2399.975 MHz processor
> >> [0.090897] TSC deadline timer enabled
> >> [1.960034] tsc: Refined TSC clocksource calibration: 2400.007 MHz

> turbostat version 4.10 10 Dec, 2015 - Len Brown 
> CPUID(0): GenuineIntel 22 CPUID levels; family:model:stepping 0x6:4e:3 
> (6:78:3)
> CPUID(1): SSE3 MONITOR EIST TM2 TSC MSR ACPI-TM TM
> CPUID(6): APERF, DTS, PTM, HWP, HWPnotify, HWPwindow, HWPepp, No-HWPpkg, EPB
> cpu1: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MONITOR)
> CPUID(0x15): eax_crystal: 2 ebx_tsc: 200 ecx_crystal_hz: 0
> TSC: 2400 MHz (2400 Hz * 200 / 2 / 100)
> CPUID(0x16): base_mhz: 2400 max_mhz: 2800 bus_mhz: 100


Both the initial and refined TSC calibration are right on the money -- 2400 MHz.
Further, this system happens to also have a base frequency of 2400 MHz,
so tsc_khz = cpu_khz = 2,400,000, exactly.  It would stray from that value
only based on the ppm error of the base 24 MHz crystal.

Anything other than that value is error.

-Len

> RAPL: 17476 sec. Joule Counter Range, at 15 Watts
> cpu1: MSR_PLATFORM_INFO: 0x4043df1011800
> 4 * 100 = 400 MHz max efficiency frequency
> 24 * 100 = 2400 MHz base frequency
> cpu1: MSR_IA32_POWER_CTL: 0x0024005d (C1E auto-promotion: DISabled)
> cpu1: MSR_TURBO_RATIO_LIMIT: 0x1b1b1b1c
> 27 * 100 = 2700 MHz max turbo 4 active cores
> 27 * 100 = 2700 MHz max turbo 3 active cores
> 27 * 100 = 2700 MHz max turbo 2 active cores
> 28 * 100 = 2800 MHz max turbo 1 active cores
> cpu1: MSR_CONFIG_TDP_NOMINAL: 0x0017 (base_ratio=7)
> cpu1: MSR_CONFIG_TDP_LEVEL_1: 0x0008003c (PKG_MIN_PWR_LVL1=0
> PKG_MAX_PWR_LVL1=0 LVL1_RATIO=8 PKG_TDP_LVL1=60)
> cpu1: MSR_CONFIG_TDP_LEVEL_2: 0x001800c8 (PKG_MIN_PWR_LVL2=0
> PKG_MAX_PWR_LVL2=0 LVL2_RATIO=8 PKG_TDP_LVL2=200)
> cpu1: MSR_CONFIG_TDP_CONTROL: 0x ( lock=0)
> cpu1: MSR_TURBO_ACTIVATION_RATIO: 0x (MAX_NON_TURBO_RATIO=0
> lock=0)
> cpu1: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008006 (UNdemote-C3,
> UNdemote-C1, demote-C3, demote-C1, locked: pkg-cstate-limit=6: pc8)
> cpu0: MSR_PM_ENABLE: 0x0001 (HWP)
> cpu0: MSR_HWP_CAPABILITIES: 0x0105171c (high 0x1c guar 0x17 eff 0x5 low
> 0x1)
> cpu0: MSR_HWP_REQUEST: 0x80001c04 (min 0x4 max 0x1c des 0x0 epp 0x80
> window 0x0 pkg 0x0)
> cpu0: MSR_HWP_INTERRUPT: 0x0001 (EN_Guaranteed_Perf_Change,
> Dis_Excursion_Min)
> cpu0: MSR_HWP_STATUS: 0x (No-Guaranteed_Perf_Change, No-
> Excursion_Min)
> cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0006 (balanced)
> cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.61
> Joules, 0.000977 sec.)
> cpu0: MSR_PKG_POWER_INFO: 0x0078 (15 W TDP, RAPL 0 - 0 W, 0.00
> sec.)
> cpu0: MSR_PKG_POWER_LIMIT: 0x4280c800dd8078 (UNlocked)
> cpu0: PKG Limit #1: ENabled (15.00 Watts, 28.00 sec, clamp
> ENabled)
> cpu0: PKG Limit #2: ENabled (25.00 Watts, 0.002441* sec, clamp
> DISabled)
> cpu0: MSR_DRAM_POWER_LIMIT: 0x5400de (UNlocked)
> cpu0: DRAM Limit: DISabled (0.00 Watts, 0.000977 sec, clamp DISabled)
> cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x0064 (100 C)
> cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88340800 (48 C)
> cpu0: MSR_IA32_THERM_STATUS: 0x88350800 (47 C +/- 1)
> cpu1: MSR_IA32_THERM_STATUS: 0x88340800 (48 C +/- 1)
> Core CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz SMI  CPU%c1
> CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Totl%C0  Any%C0  GFX%C0 CPUGFX%
> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt
> RAMWatt   PKG_%   RAM_%
>-   -  162.84 5492399   04.13
> 0.020.63   92.39  46  46   12.21   10.772.480.80
> 14.33   71.530.000.000.000.000.001.950.54
>   0.000.00
>0   0  111.92 5632399   02.48
> 0.020.80   94.78  46  46   12.21   10.782.480.80
> 14.34   71.550.000.000.000.000.001.950.54
>   0.000.00
>0   2   81.39 5502399   03.04
>1   1  132.04 6222400   07.48
> 0.020.45   90.01  45
>1   3  316.01 5202400   03.51
> 1.002561 sec
> 
> In case it's at all useful, adjtimex -p says:
> 
>  mode: 0
>offset: 0
> frequency: 135641
>  maxerror: 37498
>  esterror: 1532
>status: 8192
> time_constant: 2
> precision: 1
> tolerance: 32768000
>  tick: 1
>  raw time:  1449098317s 671243180us = 1449098317.671243180
> 
> this suggests a rather small correction, so I really have no idea what
> "Adjusting tsc more than 11% (8039115 vs 7759462)" means.
> 
> John, you wrote this code.  What does the error message mean?
> 
> --Andy
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] intel_idle: Don't use on Lenovo Ideapad S10-3t

2015-11-02 Thread Brown, Len
> Lenovo Ideapad S10-3t hangs coming out of S3 with intel_idle.
> The two workaround that seem to help are "intel_idle.max_cstate=0"
> or "nohz=off highres=off".
> 
> At a first glance quirk_tigerpoint_bm_sts() seemed promising, but
> even when moved to early_resume it didn't do anything.
> 
> I have no idea what's wrong here, so let's just disable intel_idle
> for these machines using a DMI match.

Ville, 

It is great that several workarounds have been discovered.

But it would be better to get a good idea of the root-cause
before permanently ignoring the problem via a new
black-list in the upstream kernel.

Is it possible for you to file a bug at bugzilla.kernel.org
against Product: power-management; component: intel_idle?

In it, please put the following information.

If this is a regression, the oldest kernel that broke.

When booted with intel_idle, and then without:

dmesg | grep idle
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
# turbostat --debug sleep 10 2> turbostat.out

# cd /sys/devices/system/clocksource/clocksource0
grep . available_clocksource current_clocksource


Other boot options to test:

maxcpus=1
nohpet

intel_idle.max_cstate=3
and if that fails
intel_idle.max_cstate=2
and if that fails
intel_idle.max_cstate=1

thanks,
-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] intel_idle: Don't use on Lenovo Ideapad S10-3t

2015-11-02 Thread Brown, Len
> Lenovo Ideapad S10-3t hangs coming out of S3 with intel_idle.
> The two workaround that seem to help are "intel_idle.max_cstate=0"
> or "nohz=off highres=off".
> 
> At a first glance quirk_tigerpoint_bm_sts() seemed promising, but
> even when moved to early_resume it didn't do anything.
> 
> I have no idea what's wrong here, so let's just disable intel_idle
> for these machines using a DMI match.

Ville, 

It is great that several workarounds have been discovered.

But it would be better to get a good idea of the root-cause
before permanently ignoring the problem via a new
black-list in the upstream kernel.

Is it possible for you to file a bug at bugzilla.kernel.org
against Product: power-management; component: intel_idle?

In it, please put the following information.

If this is a regression, the oldest kernel that broke.

When booted with intel_idle, and then without:

dmesg | grep idle
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
# turbostat --debug sleep 10 2> turbostat.out

# cd /sys/devices/system/clocksource/clocksource0
grep . available_clocksource current_clocksource


Other boot options to test:

maxcpus=1
nohpet

intel_idle.max_cstate=3
and if that fails
intel_idle.max_cstate=2
and if that fails
intel_idle.max_cstate=1

thanks,
-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re[6]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
try booting upstream with "cpu_init_udelay=1".
If it works, then it actually implicates this commit:

a9bcaa02a5104ace6a9d9e4a9cd9192a9e7744d6
("x86/smpboot: Remove SIPI delays from cpu_up()")

Unfortunately the commit message for that on is erroneous --
"cpu_init_udelay=1" is actually a NO-OP,
because that matches the compiled-in default.
Indeed, any non-zero value bug 1 should work.

thanks,
-Len



RE: Re[2]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
> I did the revert in linux-stable (last tag being v4.3-rc4) gave a revert
> description so it would be applied.
> 
> built and tested.  Result:  did not help, still missing the second core.

Same result here.

upstream failed to bring up CPU #1 on 5/5 boots

Revert "x86/smpboot: Remove APIC.wait_for_init_deassert and atomic 
init_deasserted"

This reverts commit 656bba306827a44ed73b3f93f75bb3147de17fae.

Still fails the same way.

Adding "cpu_init_udelay=1"

does not help.

commence bisect...

cheers,
-Len


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Re[4]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
> > > You have similar hardware:
> > >
> > > Shane:
> > >
> > > smpboot: CPU0: Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz (fam:
> 06,
> > model: 0f, stepping: 06)
> > >
> > > Donald:
> > >
> > > CPU : Intel Core 2 CPU 6600 @ 2.4GHz
> > >
> > > I think I can get ahold of a core2 6xxx box tomorrow.
> 
> Intel(R) Core(TM)2 CPU E6800  @ 2.93GHz
> 
> is working for me with latest upstream.
> (It is on an Intel D975 XBX motherboard)

Good news - I reproduced the failure on a similar box, an Intel D975xbx2:

[0.00] Linux version 4.3.0-rc5+ (lenb@z87) (gcc version 4.9.2 20150212 
(Red Hat 4.9.2-6) (GCC) ) #375 SMP Thu Oct 15 18:17:04 EDT 2015
...
[0.084000] smpboot: CPU0: Intel(R) Core(TM)2 Quad CPU   @ 2.66GHz 
(family: 0x6, model: 0xf, stepping: 0x7)
[0.084000] Performance Events: PEBS fmt0-, 4-deep LBR, Core2 events, Intel 
PMU driver.
[0.084000] perf_event_intel: PEBS disabled due to CPU errata
[0.084000] ... version:2
[0.084000] ... bit width:  40
[0.084000] ... generic registers:  2
[0.084000] ... value mask: 00ff
[0.084000] ... max period: 7fff
[0.084000] ... fixed-purpose events:   3
[0.084000] ... event mask: 00070003
[0.084000] x86: Booting SMP configuration:
[0.084000]  node  #0, CPUs:  #1
[   10.080003] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[   10.080175] NMI watchdog: enabled on all CPUs, permanently consumes one 
hw-PMU counter.
[   10.080334]  #2 #3
[   10.084017] x86: Booted up 1 node, 3 CPUs
[   10.084120] smpboot: Total of 3 processors activated (16001.58 BogoMIPS)



RE: Re[4]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
> > You have similar hardware:
> >
> > Shane:
> >
> > smpboot: CPU0: Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz (fam: 06,
> model: 0f, stepping: 06)
> >
> > Donald:
> >
> > CPU : Intel Core 2 CPU 6600 @ 2.4GHz
> >
> > I think I can get ahold of a core2 6xxx box tomorrow.

Intel(R) Core(TM)2 CPU E6800  @ 2.93GHz

is working for me with latest upstream.
(It is on an Intel D975 XBX motherboard)

please send me the .config you are using
that fails and I'll try that.

thanks,
-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re[4]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
> > > You have similar hardware:
> > >
> > > Shane:
> > >
> > > smpboot: CPU0: Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz (fam:
> 06,
> > model: 0f, stepping: 06)
> > >
> > > Donald:
> > >
> > > CPU : Intel Core 2 CPU 6600 @ 2.4GHz
> > >
> > > I think I can get ahold of a core2 6xxx box tomorrow.
> 
> Intel(R) Core(TM)2 CPU E6800  @ 2.93GHz
> 
> is working for me with latest upstream.
> (It is on an Intel D975 XBX motherboard)

Good news - I reproduced the failure on a similar box, an Intel D975xbx2:

[0.00] Linux version 4.3.0-rc5+ (lenb@z87) (gcc version 4.9.2 20150212 
(Red Hat 4.9.2-6) (GCC) ) #375 SMP Thu Oct 15 18:17:04 EDT 2015
...
[0.084000] smpboot: CPU0: Intel(R) Core(TM)2 Quad CPU   @ 2.66GHz 
(family: 0x6, model: 0xf, stepping: 0x7)
[0.084000] Performance Events: PEBS fmt0-, 4-deep LBR, Core2 events, Intel 
PMU driver.
[0.084000] perf_event_intel: PEBS disabled due to CPU errata
[0.084000] ... version:2
[0.084000] ... bit width:  40
[0.084000] ... generic registers:  2
[0.084000] ... value mask: 00ff
[0.084000] ... max period: 7fff
[0.084000] ... fixed-purpose events:   3
[0.084000] ... event mask: 00070003
[0.084000] x86: Booting SMP configuration:
[0.084000]  node  #0, CPUs:  #1
[   10.080003] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[   10.080175] NMI watchdog: enabled on all CPUs, permanently consumes one 
hw-PMU counter.
[   10.080334]  #2 #3
[   10.084017] x86: Booted up 1 node, 3 CPUs
[   10.084120] smpboot: Total of 3 processors activated (16001.58 BogoMIPS)



RE: Re[6]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
try booting upstream with "cpu_init_udelay=1".
If it works, then it actually implicates this commit:

a9bcaa02a5104ace6a9d9e4a9cd9192a9e7744d6
("x86/smpboot: Remove SIPI delays from cpu_up()")

Unfortunately the commit message for that on is erroneous --
"cpu_init_udelay=1" is actually a NO-OP,
because that matches the compiled-in default.
Indeed, any non-zero value bug 1 should work.

thanks,
-Len



RE: Re[4]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
> > You have similar hardware:
> >
> > Shane:
> >
> > smpboot: CPU0: Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz (fam: 06,
> model: 0f, stepping: 06)
> >
> > Donald:
> >
> > CPU : Intel Core 2 CPU 6600 @ 2.4GHz
> >
> > I think I can get ahold of a core2 6xxx box tomorrow.

Intel(R) Core(TM)2 CPU E6800  @ 2.93GHz

is working for me with latest upstream.
(It is on an Intel D975 XBX motherboard)

please send me the .config you are using
that fails and I'll try that.

thanks,
-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re[2]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-15 Thread Brown, Len
> I did the revert in linux-stable (last tag being v4.3-rc4) gave a revert
> description so it would be applied.
> 
> built and tested.  Result:  did not help, still missing the second core.

Same result here.

upstream failed to bring up CPU #1 on 5/5 boots

Revert "x86/smpboot: Remove APIC.wait_for_init_deassert and atomic 
init_deasserted"

This reverts commit 656bba306827a44ed73b3f93f75bb3147de17fae.

Still fails the same way.

Adding "cpu_init_udelay=1"

does not help.

commence bisect...

cheers,
-Len


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Re[2]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-14 Thread Brown, Len
> > > Did you try reverting the "x86/smpboot: Remove
> APIC.wait_for_init_deassert
> > > and atomic init_deasserted"  patch?
> >
> > Yes, please let me know if reverting that patch helps you too.
> 
> How?  Please send a patch or git cmd(s).I have the
> git/stable/linux-stable.git  on my PC.   Thanks.

git log calls it this:

commit 656bba306827a44ed73b3f93f75bb3147de17fae
Author: Len Brown 
Date:   Sun Aug 16 11:45:48 2015 -0400

x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted

So you want to simply do this:

$ git revert 656bba306827a44ed73b3f93f75bb3147de17fae

build and test.

cheers,
-Len


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Re[2]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-14 Thread Brown, Len
Donald, Shane,
Thanks for reporting this.

> > I also tried suggested /vmlinuz-4.3.0-rc3 parameter in grub:
> >         "cpu_init_udelay=1"
> > which did not help getting missing CPU back online.

right, if the issue is caused by the patch below,
that cmdline will not help.

> Did you try reverting the "x86/smpboot: Remove APIC.wait_for_init_deassert
> and atomic init_deasserted"  patch?

Yes, please let me know if reverting that patch helps you too.

You have similar hardware:

Shane:

smpboot: CPU0: Intel(R) Core(TM)2 CPU  6400  @ 2.13GHz (fam: 06, model: 
0f, stepping: 06)

Donald:

CPU : Intel Core 2 CPU 6600 @ 2.4GHz

I think I can get ahold of a core2 6xxx box tomorrow.
Please send me your .config directly, and I'll see if I can reproduce the issue.

thanks,
-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re[2]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-14 Thread Brown, Len
Donald, Shane,
Thanks for reporting this.

> > I also tried suggested /vmlinuz-4.3.0-rc3 parameter in grub:
> >         "cpu_init_udelay=1"
> > which did not help getting missing CPU back online.

right, if the issue is caused by the patch below,
that cmdline will not help.

> Did you try reverting the "x86/smpboot: Remove APIC.wait_for_init_deassert
> and atomic init_deasserted"  patch?

Yes, please let me know if reverting that patch helps you too.

You have similar hardware:

Shane:

smpboot: CPU0: Intel(R) Core(TM)2 CPU  6400  @ 2.13GHz (fam: 06, model: 
0f, stepping: 06)

Donald:

CPU : Intel Core 2 CPU 6600 @ 2.4GHz

I think I can get ahold of a core2 6xxx box tomorrow.
Please send me your .config directly, and I'll see if I can reproduce the issue.

thanks,
-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re[2]: 3.4-rc smpboot: do_boot_cpu failed(-1) to wakeup CPU#1

2015-10-14 Thread Brown, Len
> > > Did you try reverting the "x86/smpboot: Remove
> APIC.wait_for_init_deassert
> > > and atomic init_deasserted"  patch?
> >
> > Yes, please let me know if reverting that patch helps you too.
> 
> How?  Please send a patch or git cmd(s).I have the
> git/stable/linux-stable.git  on my PC.   Thanks.

git log calls it this:

commit 656bba306827a44ed73b3f93f75bb3147de17fae
Author: Len Brown 
Date:   Sun Aug 16 11:45:48 2015 -0400

x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted

So you want to simply do this:

$ git revert 656bba306827a44ed73b3f93f75bb3147de17fae

build and test.

cheers,
-Len


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] power: print function name of callbacks

2015-09-22 Thread Brown, Len
have you used analyze_suspend?
It used to parse this output, but that was abandoned
when it cut over to using ftrace directly.

https://01.org/suspendresume

cheers,
-Len


> -Original Message-
> From: Douglas Anderson [mailto:diand...@chromium.org]
> Sent: Tuesday, September 22, 2015 1:27 PM
> To: r...@rjwysocki.net
> Cc: Dmitry Torokhov; Douglas Anderson; pa...@ucw.cz; Brown, Len;
> gre...@linuxfoundation.org; linux...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] power: print function name of callbacks
> 
> The printouts writen to the logs by suspend can be a bit opaque: it can
> be hard to track them down to the actual function called.  You might
> see:
> 
>   calling  rfkill1+ @ 19473, parent: phy0
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 19473, parent: mmc2:0001:1
>   call phy0+ returned 0 after 19 usecs
> 
> It's a bit hard to know what's actually happening.  Instead, it's nice
> to see:
> 
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_suspend
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_suspend
> [cfg80211]
>   call phy0+ returned 0 after 7 usecs
> 
> That makes it very obvious what's going on.  It also has the nice side
> effect of making the suspend/resume spew a little more obvious, since
> many resume functions have the word "resume" in the name:
> 
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_resume [cfg80211]
>   call phy0+ returned 0 after 12 usecs
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_resume
>   call rfkill1+ returned 0 after 1 usecs
> 
> Signed-off-by: Douglas Anderson 
> ---
>  drivers/base/power/main.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 1710c26..e2b1f14 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -188,14 +188,14 @@ void device_pm_move_last(struct device *dev)
>   list_move_tail(>power.entry, _list);
>  }
> 
> -static ktime_t initcall_debug_start(struct device *dev)
> +static ktime_t initcall_debug_start(struct device *dev, void *cb)
>  {
>   ktime_t calltime = ktime_set(0, 0);
> 
>   if (pm_print_times_enabled) {
> - pr_info("calling  %s+ @ %i, parent: %s\n",
> + pr_info("calling  %s+ @ %i, parent: %s, cb: %pf\n",
>   dev_name(dev), task_pid_nr(current),
> - dev->parent ? dev_name(dev->parent) : "none");
> + dev->parent ? dev_name(dev->parent) : "none", cb);
>   calltime = ktime_get();
>   }
> 
> @@ -382,7 +382,7 @@ static int dpm_run_callback(pm_callback_t cb, struct
> device *dev,
>   if (!cb)
>   return 0;
> 
> - calltime = initcall_debug_start(dev);
> + calltime = initcall_debug_start(dev, cb);
> 
>   pm_dev_dbg(dev, state, info);
>   trace_device_pm_callback_start(dev, info, state.event);
> @@ -1324,7 +1324,7 @@ static int legacy_suspend(struct device *dev,
> pm_message_t state,
>   int error;
>   ktime_t calltime;
> 
> - calltime = initcall_debug_start(dev);
> + calltime = initcall_debug_start(dev, cb);
> 
>   trace_device_pm_callback_start(dev, info, state.event);
>   error = cb(dev, state);
> --
> 2.6.0.rc0.131.gf624c3d

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] power: print function name of callbacks

2015-09-22 Thread Brown, Len
have you used analyze_suspend?
It used to parse this output, but that was abandoned
when it cut over to using ftrace directly.

https://01.org/suspendresume

cheers,
-Len


> -Original Message-
> From: Douglas Anderson [mailto:diand...@chromium.org]
> Sent: Tuesday, September 22, 2015 1:27 PM
> To: r...@rjwysocki.net
> Cc: Dmitry Torokhov; Douglas Anderson; pa...@ucw.cz; Brown, Len;
> gre...@linuxfoundation.org; linux...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] power: print function name of callbacks
> 
> The printouts writen to the logs by suspend can be a bit opaque: it can
> be hard to track them down to the actual function called.  You might
> see:
> 
>   calling  rfkill1+ @ 19473, parent: phy0
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 19473, parent: mmc2:0001:1
>   call phy0+ returned 0 after 19 usecs
> 
> It's a bit hard to know what's actually happening.  Instead, it's nice
> to see:
> 
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_suspend
>   call rfkill1+ returned 0 after 1 usecs
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_suspend
> [cfg80211]
>   call phy0+ returned 0 after 7 usecs
> 
> That makes it very obvious what's going on.  It also has the nice side
> effect of making the suspend/resume spew a little more obvious, since
> many resume functions have the word "resume" in the name:
> 
>   calling  phy0+ @ 15793, parent: mmc2:0001:1, cb: wiphy_resume [cfg80211]
>   call phy0+ returned 0 after 12 usecs
>   calling  rfkill1+ @ 15793, parent: phy0, cb: rfkill_resume
>   call rfkill1+ returned 0 after 1 usecs
> 
> Signed-off-by: Douglas Anderson <diand...@chromium.org>
> ---
>  drivers/base/power/main.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 1710c26..e2b1f14 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -188,14 +188,14 @@ void device_pm_move_last(struct device *dev)
>   list_move_tail(>power.entry, _list);
>  }
> 
> -static ktime_t initcall_debug_start(struct device *dev)
> +static ktime_t initcall_debug_start(struct device *dev, void *cb)
>  {
>   ktime_t calltime = ktime_set(0, 0);
> 
>   if (pm_print_times_enabled) {
> - pr_info("calling  %s+ @ %i, parent: %s\n",
> + pr_info("calling  %s+ @ %i, parent: %s, cb: %pf\n",
>   dev_name(dev), task_pid_nr(current),
> - dev->parent ? dev_name(dev->parent) : "none");
> + dev->parent ? dev_name(dev->parent) : "none", cb);
>   calltime = ktime_get();
>   }
> 
> @@ -382,7 +382,7 @@ static int dpm_run_callback(pm_callback_t cb, struct
> device *dev,
>   if (!cb)
>   return 0;
> 
> - calltime = initcall_debug_start(dev);
> + calltime = initcall_debug_start(dev, cb);
> 
>   pm_dev_dbg(dev, state, info);
>   trace_device_pm_callback_start(dev, info, state.event);
> @@ -1324,7 +1324,7 @@ static int legacy_suspend(struct device *dev,
> pm_message_t state,
>   int error;
>   ktime_t calltime;
> 
> - calltime = initcall_debug_start(dev);
> + calltime = initcall_debug_start(dev, cb);
> 
>   trace_device_pm_callback_start(dev, info, state.event);
>   error = cb(dev, state);
> --
> 2.6.0.rc0.131.gf624c3d

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2015-07-21 Thread Brown, Len
> Who does enter suspend to ram multiple times a second? Only android

One significant customer for fast suspend/resume
is sufficient to justify Linux supporting fast suspend/resume.

> Can you run android on mainline kernel? No

This is something we need to work together to help fix.

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2015-07-21 Thread Brown, Len
 Who does enter suspend to ram multiple times a second? Only android

One significant customer for fast suspend/resume
is sufficient to justify Linux supporting fast suspend/resume.

 Can you run android on mainline kernel? No

This is something we need to work together to help fix.

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2015-07-15 Thread Brown, Len


> -Original Message-
> From: Austin S Hemmelgarn [mailto:ahferro...@gmail.com]
> Sent: Wednesday, July 15, 2015 10:07 AM
> To: Pavel Machek; Len Brown
> Cc: r...@rjwysocki.net; linux...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Brown, Len
> Subject: Re: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional
> 
> On 2015-07-15 02:43, Pavel Machek wrote:
> > On Tue 2015-07-14 22:24:51, Len Brown wrote:
> >> From: Len Brown 
> >>
> >> The Linux kernel suspend path has traditionally invoked sys_sync().
> >>
> >> But sys_sync() can be expensive, and some systems do not want
> >> to pay the cost of sys_sync() on every suspend.
> >
> > Have you measured how expensive it can be, and why it is expensive?

> How expensive it is can vary widely, but it pretty much boils down to
> how much dirty data still needs written out, and how slow the storage it
> needs written to is.  There's not really much that can be done in the
> kernel to change this, and most userspace suspend systems call sync
> themselves during the suspend sequence.

Right.
And now, user-space gets is no longer forced to incur that
delay on every suspend if they do not want it.

Yes, have measured this under many conditions.
The bottom line is that sys_sync() is rarely as fast as 1ms,
and is sometimes as slow as hundreds of ms.

>> Why do you need CONFIG parameter?

So that an OS that doesn't want to change their user-space,
can build a kernel that does what they want by default.

Originally I had the config parameter remove this code entirely,
which would achieve the same goal.
But Rafael prefers the sysfs attribute always exist
and the config simply set the default.

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2015-07-15 Thread Brown, Len


 -Original Message-
 From: Austin S Hemmelgarn [mailto:ahferro...@gmail.com]
 Sent: Wednesday, July 15, 2015 10:07 AM
 To: Pavel Machek; Len Brown
 Cc: r...@rjwysocki.net; linux...@vger.kernel.org; linux-
 ker...@vger.kernel.org; Brown, Len
 Subject: Re: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional
 
 On 2015-07-15 02:43, Pavel Machek wrote:
  On Tue 2015-07-14 22:24:51, Len Brown wrote:
  From: Len Brown len.br...@intel.com
 
  The Linux kernel suspend path has traditionally invoked sys_sync().
 
  But sys_sync() can be expensive, and some systems do not want
  to pay the cost of sys_sync() on every suspend.
 
  Have you measured how expensive it can be, and why it is expensive?

 How expensive it is can vary widely, but it pretty much boils down to
 how much dirty data still needs written out, and how slow the storage it
 needs written to is.  There's not really much that can be done in the
 kernel to change this, and most userspace suspend systems call sync
 themselves during the suspend sequence.

Right.
And now, user-space gets is no longer forced to incur that
delay on every suspend if they do not want it.

Yes, have measured this under many conditions.
The bottom line is that sys_sync() is rarely as fast as 1ms,
and is sometimes as slow as hundreds of ms.

 Why do you need CONFIG parameter?

So that an OS that doesn't want to change their user-space,
can build a kernel that does what they want by default.

Originally I had the config parameter remove this code entirely,
which would achieve the same goal.
But Rafael prefers the sysfs attribute always exist
and the config simply set the default.

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-01 Thread Brown, Len
> This driver only knows how to handle counters, though.  I'm not sure
> whether all of the MSRs that turbostat needs are counters.

turbostat --debug
dumps a lot of configuration MSRs that are not counters.

"--debug" is not an obscure option, it is the only way that
the turbostat is used by advanced users, since it shows not just
how fast a system is running, but explains why.

turbostat -M or -C 'address'
will dump an arbitrary MSR at offset 'address' in hex or counter format.

This allows somebody to use yesterday's turbostat application
to dump an MSR that didn't exist when the application was written.
(and since the MSR driver doesn't have specific MSR addresses
 hard-coded into it, that is permitted)

> I knew that turbostat only did MSR reads

FYI, There have been proposals for turbostat to do MSR writes,
and I've resisted them because I like that multiple turbostats
can run at different intervals and even different users,
and not interfere with each other.  One of the proposals
was due to a short counter that tends to over-flow.  That,
I think would be better done in a central place in the kernel,
though it shouldn't poll unless it is actually being used.
The other is to be able to fix configuration bits that
are recognized to be incorrect or non-optimal. 

BTW. I've had a discussion w/ LLNL about their needs,
both for security and performance.  For security, as concluded
by this thread, a white list is the only way to go.
I'm thinking a bit-vector of allowed MSR offsets...
For performance, they absolutely can not afford a system call
for every single MSR access.  Here an ioctl to have the
msr driver perform a vector of accesses in a single system
call seems the way to go.  I can prototype both of these
using turbostat as the customer.

-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] x86, msr: Allow read access to /dev/cpu/X/msr

2015-07-01 Thread Brown, Len
 This driver only knows how to handle counters, though.  I'm not sure
 whether all of the MSRs that turbostat needs are counters.

turbostat --debug
dumps a lot of configuration MSRs that are not counters.

--debug is not an obscure option, it is the only way that
the turbostat is used by advanced users, since it shows not just
how fast a system is running, but explains why.

turbostat -M or -C 'address'
will dump an arbitrary MSR at offset 'address' in hex or counter format.

This allows somebody to use yesterday's turbostat application
to dump an MSR that didn't exist when the application was written.
(and since the MSR driver doesn't have specific MSR addresses
 hard-coded into it, that is permitted)

 I knew that turbostat only did MSR reads

FYI, There have been proposals for turbostat to do MSR writes,
and I've resisted them because I like that multiple turbostats
can run at different intervals and even different users,
and not interfere with each other.  One of the proposals
was due to a short counter that tends to over-flow.  That,
I think would be better done in a central place in the kernel,
though it shouldn't poll unless it is actually being used.
The other is to be able to fix configuration bits that
are recognized to be incorrect or non-optimal. 

BTW. I've had a discussion w/ LLNL about their needs,
both for security and performance.  For security, as concluded
by this thread, a white list is the only way to go.
I'm thinking a bit-vector of allowed MSR offsets...
For performance, they absolutely can not afford a system call
for every single MSR access.  Here an ioctl to have the
msr driver perform a vector of accesses in a single system
call seems the way to go.  I can prototype both of these
using turbostat as the customer.

-Len

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 2/2] turbostat, add set_base_cpu()

2015-05-25 Thread Brown, Len


> -Original Message-
> From: Prarit Bhargava [mailto:pra...@redhat.com]
> Sent: Friday, May 22, 2015 6:30 PM
> To: Brown, Len
> Cc: linux-kernel@vger.kernel.org; linux...@vger.kernel.org; Semin, Andrey
> Subject: Re: [PATCH 2/2] turbostat, add set_base_cpu()
> 
> 
> 
> On 05/22/2015 11:55 AM, Brown, Len wrote:
> >> +void set_base_cpu(void)
> >> +{
> >> +  int cpu;
> >> +
> >> +  for (cpu = 0; cpu <= topo.max_cpu_num; ++cpu) {
> >> +  if (cpu_is_not_present(cpu))
> >> +  continue;
> >> +  base_cpu = cpu;
> >> +  break;
> >> +  }
> >> +
> >> +  if (base_cpu == -1)
> >> +  err(-ENODEV, "No valid cpus found");
> >> +}
> >
> >
> > cpu0 hard-coding is indeed arbitrary.
> > However, so is this proposed replacement, base_cpu.
> > Either may not match where turbostat is currently running,
> > and thus could provoke unnecessary cross-calls to get there.
> >
> > I think it would be better to ask getcpu(2) where we are already
> running,
> > and simply use that one.  I think we can call it once and cache it,
> > as you proposed, rather than multiple system calls.
> 
> Any objection to sched_getcpu()?  That way the code is simply
> 
>   base_cpu = sched_getcpu();
> 
>   if (base_cpu == -1)
>   err(-ENODEV, "No valid cpus found");

Agreed, that is better than invoking the syscall directly,
as we already are using the sched.h interface in this code.

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] turbostat, add set_base_cpu()

2015-05-25 Thread Brown, Len


 -Original Message-
 From: Prarit Bhargava [mailto:pra...@redhat.com]
 Sent: Friday, May 22, 2015 6:30 PM
 To: Brown, Len
 Cc: linux-kernel@vger.kernel.org; linux...@vger.kernel.org; Semin, Andrey
 Subject: Re: [PATCH 2/2] turbostat, add set_base_cpu()
 
 
 
 On 05/22/2015 11:55 AM, Brown, Len wrote:
  +void set_base_cpu(void)
  +{
  +  int cpu;
  +
  +  for (cpu = 0; cpu = topo.max_cpu_num; ++cpu) {
  +  if (cpu_is_not_present(cpu))
  +  continue;
  +  base_cpu = cpu;
  +  break;
  +  }
  +
  +  if (base_cpu == -1)
  +  err(-ENODEV, No valid cpus found);
  +}
 
 
  cpu0 hard-coding is indeed arbitrary.
  However, so is this proposed replacement, base_cpu.
  Either may not match where turbostat is currently running,
  and thus could provoke unnecessary cross-calls to get there.
 
  I think it would be better to ask getcpu(2) where we are already
 running,
  and simply use that one.  I think we can call it once and cache it,
  as you proposed, rather than multiple system calls.
 
 Any objection to sched_getcpu()?  That way the code is simply
 
   base_cpu = sched_getcpu();
 
   if (base_cpu == -1)
   err(-ENODEV, No valid cpus found);

Agreed, that is better than invoking the syscall directly,
as we already are using the sched.h interface in this code.

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] turbostat, add set_base_cpu()

2015-05-22 Thread Brown, Len
> +void set_base_cpu(void)
> +{
> + int cpu;
> +
> + for (cpu = 0; cpu <= topo.max_cpu_num; ++cpu) {
> + if (cpu_is_not_present(cpu))
> + continue;
> + base_cpu = cpu;
> + break;
> + }
> +
> + if (base_cpu == -1)
> + err(-ENODEV, "No valid cpus found");
> +}


cpu0 hard-coding is indeed arbitrary.
However, so is this proposed replacement, base_cpu.
Either may not match where turbostat is currently running,
and thus could provoke unnecessary cross-calls to get there.

I think it would be better to ask getcpu(2) where we are already running,
and simply use that one.  I think we can call it once and cache it,
as you proposed, rather than multiple system calls.

thanks,
-Len

ps. patches to turbostat should go to linux...@vger.kernel.org


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] turbostat, add set_base_cpu()

2015-05-22 Thread Brown, Len
 +void set_base_cpu(void)
 +{
 + int cpu;
 +
 + for (cpu = 0; cpu = topo.max_cpu_num; ++cpu) {
 + if (cpu_is_not_present(cpu))
 + continue;
 + base_cpu = cpu;
 + break;
 + }
 +
 + if (base_cpu == -1)
 + err(-ENODEV, No valid cpus found);
 +}


cpu0 hard-coding is indeed arbitrary.
However, so is this proposed replacement, base_cpu.
Either may not match where turbostat is currently running,
and thus could provoke unnecessary cross-calls to get there.

I think it would be better to ask getcpu(2) where we are already running,
and simply use that one.  I think we can call it once and cache it,
as you proposed, rather than multiple system calls.

thanks,
-Len

ps. patches to turbostat should go to linux...@vger.kernel.org


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] x86, perf: Add an aperfmperf driver

2015-04-28 Thread Brown, Len
> I think that turbostat could do some of its work without being
> root if we had a driver like this.

Note that turbostat can be run as non-root this way:

# setcap cap_sys_rawio=ep ./turbostat
# chmod +r /dev/cpu/*/msr

For the debug case, there are a number of MSRs that turbostat must access,
so would still need permission for that case (which is the only case I use:-)

> Thoughts?  Would it make sense at all?  Did I wire it up right?  This is
> the only PMU driver I've ever written, and it could have any number of
> issues.

APERF/MPERF, as with all per-thread MSRs, must be accessed
from the local processor.  I didn't see where this driver
distinguishes the CPU.  Also, I assume the intent is to return
a snapshot, rather than sampling, yes?

Note that turbostat binds itself to a remote CPU so that MSR reads
are all local, then it binds to the next CPU etc.  In the old days,
we read everything without this binding, and the kernel overhead
of the remote reads was too high, making it difficult to measure
profoundly idle systems.

cheers,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] x86, perf: Add an aperfmperf driver

2015-04-28 Thread Brown, Len
 I think that turbostat could do some of its work without being
 root if we had a driver like this.

Note that turbostat can be run as non-root this way:

# setcap cap_sys_rawio=ep ./turbostat
# chmod +r /dev/cpu/*/msr

For the debug case, there are a number of MSRs that turbostat must access,
so would still need permission for that case (which is the only case I use:-)

 Thoughts?  Would it make sense at all?  Did I wire it up right?  This is
 the only PMU driver I've ever written, and it could have any number of
 issues.

APERF/MPERF, as with all per-thread MSRs, must be accessed
from the local processor.  I didn't see where this driver
distinguishes the CPU.  Also, I assume the intent is to return
a snapshot, rather than sampling, yes?

Note that turbostat binds itself to a remote CPU so that MSR reads
are all local, then it binds to the next CPU etc.  In the old days,
we read everything without this binding, and the kernel overhead
of the remote reads was too high, making it difficult to measure
profoundly idle systems.

cheers,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] x86: replace cpu_up hard-coded mdelay with variable

2015-04-20 Thread Brown, Len
> What's the cutoff for 'modern hardware' - which CPUs stopped requiring
> the delay?

This is the topic of ongoing research, and I'm not ready to send
the patch setting a new default until I've heard back from a few more HW people.

Every system I've tested appears to work with delay 0.
Were I to guess, I'd venture that every
system that runs an X86_64 kernel might count as "modern" -- even
the 2005 AMD Turion laptop I've got in the bone pile.

> Also, is there any public document where the no-delay is specified for
> 'modern hardware'?

Unfortunately only the converse is true.  There is an ancient document 
specifying
that the delay may be necessary.

cheers,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] x86: replace cpu_up hard-coded mdelay with variable

2015-04-20 Thread Brown, Len
 What's the cutoff for 'modern hardware' - which CPUs stopped requiring
 the delay?

This is the topic of ongoing research, and I'm not ready to send
the patch setting a new default until I've heard back from a few more HW people.

Every system I've tested appears to work with delay 0.
Were I to guess, I'd venture that every
system that runs an X86_64 kernel might count as modern -- even
the 2005 AMD Turion laptop I've got in the bone pile.

 Also, is there any public document where the no-delay is specified for
 'modern hardware'?

Unfortunately only the converse is true.  There is an ancient document 
specifying
that the delay may be necessary.

cheers,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Kernel bug 60770

2014-08-28 Thread Brown, Len
ISTR that I got interrupted when working on that one
and never got back to it.  Also, I think I had a puzzling
power measurement result -- like I had measured a power benefit,
and then i couldn't find a benefit.  (No matter, should be a perf
benefit even if ISO power)

Next week I should have access to a core2 DT and Xeon
and can take another look at this.

thanks,
-Len



RE: Kernel bug 60770

2014-08-28 Thread Brown, Len
ISTR that I got interrupted when working on that one
and never got back to it.  Also, I think I had a puzzling
power measurement result -- like I had measured a power benefit,
and then i couldn't find a benefit.  (No matter, should be a perf
benefit even if ISO power)

Next week I should have access to a core2 DT and Xeon
and can take another look at this.

thanks,
-Len



RE: [tip:x86/urgent] x86 idle: Repair large-server 50-watt idle-power regression

2014-04-08 Thread Brown, Len
Davidlohr,

Thanks for the note.

Ideally (on Linux in general, and on servers, in particular) we strive
for the performance impact of power saving features to be small enough
to be considered in "measurement noise".

Your report for 160 core Westmere AIM numbers being hit at 10-25%
shows 15% measurement noise?  But even if true, this looks bad.

Any chance you can re-run, with the following two tweaks,
one at a time?

I'd be curious if you can wrap the invocation in turbostat -v
and capture that output to how what states we are seeing
during the benchmark run.

thanks,
-Len



#1: skip flush for C1

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f80b700..6027d06 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -377,7 +377,7 @@ static int intel_idle(struct cpuidle_device *dev,
 
if (!current_set_polling_and_test()) {
 
-   if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+   if ((eax > 0) && this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)_thread_info()->flags);
 
__monitor((void *)_thread_info()->flags, 0, 0);


#2: skip flush for C1 and C1E

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f80b700..6027d06 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -377,7 +377,7 @@ static int intel_idle(struct cpuidle_device *dev,
 
if (!current_set_polling_and_test()) {
 
-   if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+   if ((eax > 1) && this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)_thread_info()->flags);
 
__monitor((void *)_thread_info()->flags, 0, 0);




RE: [tip:x86/urgent] x86 idle: Repair large-server 50-watt idle-power regression

2014-04-08 Thread Brown, Len
Davidlohr,

Thanks for the note.

Ideally (on Linux in general, and on servers, in particular) we strive
for the performance impact of power saving features to be small enough
to be considered in measurement noise.

Your report for 160 core Westmere AIM numbers being hit at 10-25%
shows 15% measurement noise?  But even if true, this looks bad.

Any chance you can re-run, with the following two tweaks,
one at a time?

I'd be curious if you can wrap the invocation in turbostat -v
and capture that output to how what states we are seeing
during the benchmark run.

thanks,
-Len



#1: skip flush for C1

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f80b700..6027d06 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -377,7 +377,7 @@ static int intel_idle(struct cpuidle_device *dev,
 
if (!current_set_polling_and_test()) {
 
-   if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+   if ((eax  0)  this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)current_thread_info()-flags);
 
__monitor((void *)current_thread_info()-flags, 0, 0);


#2: skip flush for C1 and C1E

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f80b700..6027d06 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -377,7 +377,7 @@ static int intel_idle(struct cpuidle_device *dev,
 
if (!current_set_polling_and_test()) {
 
-   if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+   if ((eax  1)  this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)current_thread_info()-flags);
 
__monitor((void *)current_thread_info()-flags, 0, 0);




RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

2014-04-02 Thread Brown, Len
> [0.840668] intel_idle: lapic_timer_reliable_states 0x2
vs.
> [0.877528] intel_idle: lapic_timer_reliable_states 0x

This means CPUID.ARAT is set for the new board, and not set
for the old board.  You can observe that also in /proc/cpuinfo flags
where you will likely also find a visible stepping difference between
these two boards.

Bring-up Avoton had ARAT disabled, and also had broken deep C-states.
It looks like that is what your old board has.  Throw it away and
use only production steppings.  If you are stuck w/ pre-production hardware,
then you need to manually modify upstream Linux to make it happy --
since upstream Linux only cares about production hardware.

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

2014-04-02 Thread Brown, Len
> > I'd be interested in the acpi_idle output above for both the
> > new and old boards to see if they are exporting different states
> > on the two boards.
> 
> Could be ; I can probably get access to the newer one again too, if
> that will be useful.

yes, please.

> >
> > dmidecode isn't useful in this case.  The CPUID in /proc/cpuinfo
> > may be useful if the problem turns out to be associated with
> > some stepping.
> 
> The dmidecode info I'd posted indicated that the steppings were
> unnchanged.  I can get the /proc/cpuinfo tomorrow, but I figured
> the dmidecode stepping info was accurate.  Is it not reliable?


DMI is useful to get the BIOS version string, not much else.
Here you want the output from the CPUID instruction,
which is displayed as family/model/stepping in /proc/cpuinfo

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

2014-04-02 Thread Brown, Len
  I'd be interested in the acpi_idle output above for both the
  new and old boards to see if they are exporting different states
  on the two boards.
 
 Could be ; I can probably get access to the newer one again too, if
 that will be useful.

yes, please.

 
  dmidecode isn't useful in this case.  The CPUID in /proc/cpuinfo
  may be useful if the problem turns out to be associated with
  some stepping.
 
 The dmidecode info I'd posted indicated that the steppings were
 unnchanged.  I can get the /proc/cpuinfo tomorrow, but I figured
 the dmidecode stepping info was accurate.  Is it not reliable?


DMI is useful to get the BIOS version string, not much else.
Here you want the output from the CPUID instruction,
which is displayed as family/model/stepping in /proc/cpuinfo

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

2014-04-02 Thread Brown, Len
 [0.840668] intel_idle: lapic_timer_reliable_states 0x2
vs.
 [0.877528] intel_idle: lapic_timer_reliable_states 0x

This means CPUID.ARAT is set for the new board, and not set
for the old board.  You can observe that also in /proc/cpuinfo flags
where you will likely also find a visible stepping difference between
these two boards.

Bring-up Avoton had ARAT disabled, and also had broken deep C-states.
It looks like that is what your old board has.  Throw it away and
use only production steppings.  If you are stuck w/ pre-production hardware,
then you need to manually modify upstream Linux to make it happy --
since upstream Linux only cares about production hardware.

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

2014-04-01 Thread Brown, Len
> I've got an eval board with a 1.7GHz Avaton/C2000 that hangs at boot
> shortly after the idle driver registration -- typically 1/2 dozen
> dmesg lines later, around rtc init, or net stack init.

Paul,
Please boot the failing board with "intel_idle.max_cstate=0"
to disable intel_idle entirely, and then show the C-states
exported by acpi_idle, that predumably, are stable on both boards:

dmesg | grep idle
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

Then go back and boot with "intel_idle.max_cstate=N"
where N is incremented by 1 until when the system fails
and note the largest N that still works.

> The interesting part is that a nearly identical board, but with
> different (newer/faster) CPU and newer BIOS doesn't have the hang.

Possibly an electrical bug in the earlier board.
Maybe they worked around it by disabling a C-state in ACPI
and didn't test upstream Linux?

I'd be interested in the acpi_idle output above for both the
new and old boards to see if they are exporting different states
on the two boards.

dmidecode isn't useful in this case.  The CPUID in /proc/cpuinfo
may be useful if the problem turns out to be associated with
some stepping.

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

2014-04-01 Thread Brown, Len
 I've got an eval board with a 1.7GHz Avaton/C2000 that hangs at boot
 shortly after the idle driver registration -- typically 1/2 dozen
 dmesg lines later, around rtc init, or net stack init.

Paul,
Please boot the failing board with intel_idle.max_cstate=0
to disable intel_idle entirely, and then show the C-states
exported by acpi_idle, that predumably, are stable on both boards:

dmesg | grep idle
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

Then go back and boot with intel_idle.max_cstate=N
where N is incremented by 1 until when the system fails
and note the largest N that still works.

 The interesting part is that a nearly identical board, but with
 different (newer/faster) CPU and newer BIOS doesn't have the hang.

Possibly an electrical bug in the earlier board.
Maybe they worked around it by disabling a C-state in ACPI
and didn't test upstream Linux?

I'd be interested in the acpi_idle output above for both the
new and old boards to see if they are exporting different states
on the two boards.

dmidecode isn't useful in this case.  The CPUID in /proc/cpuinfo
may be useful if the problem turns out to be associated with
some stepping.

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-02-01 Thread Brown, Len
> And your point is?

It is a bad idea for an individual CPU to track the C-state
of another CPU, which can change the cycle after it was checked.
We know it is a bad idea because we used to do it,
until we realized code here can easily impact the
performance critical path.

In general, it is the OS's job to communicate constraints
to the HW, and the HW to act on them.  Not all HW is smart,
so sometimes the OS has to do more hand-holding -- but
less is more.

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-02-01 Thread Brown, Len
 And your point is?

It is a bad idea for an individual CPU to track the C-state
of another CPU, which can change the cycle after it was checked.
We know it is a bad idea because we used to do it,
until we realized code here can easily impact the
performance critical path.

In general, it is the OS's job to communicate constraints
to the HW, and the HW to act on them.  Not all HW is smart,
so sometimes the OS has to do more hand-holding -- but
less is more.

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-01-31 Thread Brown, Len
> Right now (on ARM at least but I imagine this is pretty universal), the
> biggest impact on information accuracy for a CPU depends on what the
> other CPUs are doing.  The most obvious example is cluster power down.
> For a cluster to be powered down, all the CPUs sharing this cluster must
> also be powered down.  And all those CPUs must have agreed to a possible
> cluster power down in advance as well.  But it is not because an idle
> CPU has agreed to the extra latency imposed by a cluster power down that
> the cluster has actually powered down since another CPU in that cluster
> might still be running, in which case the recorded latency information
> for that idle CPU would be higher than it would be in practice at that
> moment.

That will not work.

When a CPU goes idle, it uses the CURRENT criteria for entering that state.
If the criteria change after it has entered the state, are you going
to wake it up so it can re-evaluate?  No.

That is why the state must describe the worst case latency
that CPU may see when waking from the state on THAT entry.

That is why we use the package C-state numbers to describe
core C-states on IA.

-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-01-31 Thread Brown, Len
 Right now (on ARM at least but I imagine this is pretty universal), the
 biggest impact on information accuracy for a CPU depends on what the
 other CPUs are doing.  The most obvious example is cluster power down.
 For a cluster to be powered down, all the CPUs sharing this cluster must
 also be powered down.  And all those CPUs must have agreed to a possible
 cluster power down in advance as well.  But it is not because an idle
 CPU has agreed to the extra latency imposed by a cluster power down that
 the cluster has actually powered down since another CPU in that cluster
 might still be running, in which case the recorded latency information
 for that idle CPU would be higher than it would be in practice at that
 moment.

That will not work.

When a CPU goes idle, it uses the CURRENT criteria for entering that state.
If the criteria change after it has entered the state, are you going
to wake it up so it can re-evaluate?  No.

That is why the state must describe the worst case latency
that CPU may see when waking from the state on THAT entry.

That is why we use the package C-state numbers to describe
core C-states on IA.

-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2014-01-22 Thread Brown, Len
> How about naming it as PM_SLEEP_FS_SYNC (and similarly in the sysfs
> files
> and variable names as well). Just to avoid confusion with
> "synchronous/async".


good point -- thanks!


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2014-01-22 Thread Brown, Len
> > +   depends on PM_SLEEP
>
> this is actually a suspend specific feature, and it should depends on
> SUSPEND instead?

yup, will update.

thanks,
-Len



RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2014-01-22 Thread Brown, Len
  +   depends on PM_SLEEP

 this is actually a suspend specific feature, and it should depends on
 SUSPEND instead?

yup, will update.

thanks,
-Len



RE: [PATCH 1/1] suspend: make sync() on suspend-to-RAM optional

2014-01-22 Thread Brown, Len
 How about naming it as PM_SLEEP_FS_SYNC (and similarly in the sysfs
 files
 and variable names as well). Just to avoid confusion with
 synchronous/async.


good point -- thanks!


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] x86, acpi: Fix wrong checking condition in acpi_register_lapic().

2013-07-22 Thread Brown, Len
Reviewed-by: Len Brown 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/1] x86, acpi: Fix wrong checking condition in acpi_register_lapic().

2013-07-22 Thread Brown, Len
Reviewed-by: Len Brown len.br...@intel.com

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [GIT PULL] Some error injection fixes to queue for 3.11

2013-06-19 Thread Brown, Len
> >> Pulled, thanks Tony!
> >>
> >> Len, are you fine with this route [tip:x86/ras tree] for the
> >> drivers/acpi/apei/einj.c changes?
> >
> > Yes, the RAS guys basically own that code.
> 
> These patches also got picked up by Rafael and are in his ACPI tree
> too.  I think the patches were applied identically, so there should not
> be any merge conflicts when this all comes back together in the 3.11
> merge window.
> 
> Rafael already had a chat about who will take future apei changes
> so that we won't have this happen again.

I think the only real problem with being in two places at once
is if the patches _change_.   When that happens, you need to
get them _both_ changed, else merge conflict happens at the worst time.

-L
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [GIT PULL] Some error injection fixes to queue for 3.11

2013-06-19 Thread Brown, Len
+rafael

> Pulled, thanks Tony!
> 
> Len, are you fine with this route [tip:x86/ras tree] for the
> drivers/acpi/apei/einj.c changes?

Yes, the RAS guys basically own that code.

thanks,
-Len


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [GIT PULL] Some error injection fixes to queue for 3.11

2013-06-19 Thread Brown, Len
+rafael

 Pulled, thanks Tony!
 
 Len, are you fine with this route [tip:x86/ras tree] for the
 drivers/acpi/apei/einj.c changes?

Yes, the RAS guys basically own that code.

thanks,
-Len


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [GIT PULL] Some error injection fixes to queue for 3.11

2013-06-19 Thread Brown, Len
  Pulled, thanks Tony!
 
  Len, are you fine with this route [tip:x86/ras tree] for the
  drivers/acpi/apei/einj.c changes?
 
  Yes, the RAS guys basically own that code.
 
 These patches also got picked up by Rafael and are in his ACPI tree
 too.  I think the patches were applied identically, so there should not
 be any merge conflicts when this all comes back together in the 3.11
 merge window.
 
 Rafael already had a chat about who will take future apei changes
 so that we won't have this happen again.

I think the only real problem with being in two places at once
is if the patches _change_.   When that happens, you need to
get them _both_ changed, else merge conflict happens at the worst time.

-L
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 09/16] ia64 idle: delete pm_idle

2013-03-25 Thread Brown, Len
> > -   idle = pm_idle;
> > if (!idle)
> 
> Hm, if I'm not mistaken idle will uninitialized at this point, so this
> could quite likely lead to a crash.


thanks,
will fix.

-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 09/16] ia64 idle: delete pm_idle

2013-03-25 Thread Brown, Len
  -   idle = pm_idle;
  if (!idle)
 
 Hm, if I'm not mistaken idle will uninitialized at this point, so this
 could quite likely lead to a crash.


thanks,
will fix.

-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [patch] Add IVB model 3e support to /drivers/idle/intel_idle.c

2012-11-26 Thread Brown, Len
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 
> e872617..b0f6b4c 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -413,6 +413,7 @@ static const struct x86_cpu_id intel_idle_ids[] = {
> ICPU(0x2a, idle_cpu_snb),
> ICPU(0x2d, idle_cpu_snb),
> ICPU(0x3a, idle_cpu_ivb),
> +   ICPU(0x3e, idle_cpu_ivb),
> {}

already in Linux-3.7

thanks,
-Len

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [patch] Add IVB model 3e support to /drivers/idle/intel_idle.c

2012-11-26 Thread Brown, Len
 diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 
 e872617..b0f6b4c 100644
 --- a/drivers/idle/intel_idle.c
 +++ b/drivers/idle/intel_idle.c
 @@ -413,6 +413,7 @@ static const struct x86_cpu_id intel_idle_ids[] = {
 ICPU(0x2a, idle_cpu_snb),
 ICPU(0x2d, idle_cpu_snb),
 ICPU(0x3a, idle_cpu_ivb),
 +   ICPU(0x3e, idle_cpu_ivb),
 {}

already in Linux-3.7

thanks,
-Len

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: 2.6.25-rc2-mm1: WARNING at arch/x86/mm/ioremap.c:129

2008-02-17 Thread Brown, Len
>> evxfevnt-0091 [00] enable: Transition to 
>ACPI mode successful
>> khelper used greatest stack depth: 3144 bytes left
>> net_namespace: 304 bytes
>> NET: Registered protocol family 16
>> ACPI: bus type pci registered
>> khelper used greatest stack depth: 3032 bytes left
>> PCI: PCI BIOS revision 2.10 entry at 0xf1180, last bus=1
>> PCI: Using configuration type 1
>> Setting up standard PCI resources
>> evgpeblk-0956 [00] ev_create_gpe_block   : GPE 00 to 0F 
>[_GPE] 2 regs on int 0x9
>> evgpeblk-1052 [00] ev_initialize_gpe_bloc: Found 4 Wake, 
>Enabled 0 Runtime GPEs in this block
>> ACPI: EC: Look up EC in DSDT
>> [ cut here ]
>> WARNING: at arch/x86/mm/ioremap.c:129 __ioremap+0xc7/0x182()
>> Modules linked in:
>> Pid: 1, comm: swapper Not tainted 2.6.25-rc2-mm1 #40
>>  [] warn_on_slowpath+0x41/0x6d
>>  [] ? trace_hardirqs_on+0xb/0xd
>>  [] ? acpi_os_release_object+0x8/0xc
>>  [] ? acpi_ut_delete_object_desc+0x39/0x3e
>>  [] ? acpi_ut_delete_internal_obj+0x2c1/0x2c9
>>  [] ? trace_hardirqs_on+0xb/0xd
>>  [] ? trace_hardirqs_on_caller+0xdf/0x100
>>  [] ? trace_hardirqs_on+0xb/0xd
>>  [] __ioremap+0xc7/0x182
>>  [] ioremap_nocache+0xa/0xc
>>  [] acpi_os_map_memory+0x11/0x1a
>>  [] acpi_ex_system_memory_space_handler+0xd3/0x228
>>  [] ? acpi_ev_address_space_dispatch+0x142/0x1a8
>>  [] ? acpi_ex_system_memory_space_handler+0x0/0x228
>>  [] acpi_ev_address_space_dispatch+0x167/0x1a8
>>  [] acpi_ex_access_region+0x1e4/0x270
>>  [] acpi_ex_field_datum_io+0x153/0x2a1
>>  [] ? cache_alloc_debugcheck_after+0xe9/0x165
>>  [] acpi_ex_extract_from_field+0x91/0x224
>>  [] ? acpi_ex_read_data_from_field+0x163/0x1b0
>>  [] acpi_ex_read_data_from_field+0x180/0x1b0
>>  [] acpi_ex_resolve_node_to_value+0x1aa/0x230
>>  [] acpi_ex_resolve_to_value+0x270/0x2aa
>>  [] acpi_ex_resolve_operands+0x24e/0x52f
>
>Len: This WARN_ON says that ACPI is trying to call ioremap() 
>on memory that the e820_table
>lists as "kernel owned". Do you know why ACPI would do this? 
>Would ACPI get upset if
>the kernel would tell it to take a hike?

Depends on the BIOS -- as it is the BIOS AML that is making this
request.

-Len
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: 2.6.25-rc2-mm1: WARNING at arch/x86/mm/ioremap.c:129

2008-02-17 Thread Brown, Len
 evxfevnt-0091 [00] enable: Transition to 
ACPI mode successful
 khelper used greatest stack depth: 3144 bytes left
 net_namespace: 304 bytes
 NET: Registered protocol family 16
 ACPI: bus type pci registered
 khelper used greatest stack depth: 3032 bytes left
 PCI: PCI BIOS revision 2.10 entry at 0xf1180, last bus=1
 PCI: Using configuration type 1
 Setting up standard PCI resources
 evgpeblk-0956 [00] ev_create_gpe_block   : GPE 00 to 0F 
[_GPE] 2 regs on int 0x9
 evgpeblk-1052 [00] ev_initialize_gpe_bloc: Found 4 Wake, 
Enabled 0 Runtime GPEs in this block
 ACPI: EC: Look up EC in DSDT
 [ cut here ]
 WARNING: at arch/x86/mm/ioremap.c:129 __ioremap+0xc7/0x182()
 Modules linked in:
 Pid: 1, comm: swapper Not tainted 2.6.25-rc2-mm1 #40
  [c0118955] warn_on_slowpath+0x41/0x6d
  [c0131d0a] ? trace_hardirqs_on+0xb/0xd
  [c01fdd89] ? acpi_os_release_object+0x8/0xc
  [c02188d4] ? acpi_ut_delete_object_desc+0x39/0x3e
  [c0217f05] ? acpi_ut_delete_internal_obj+0x2c1/0x2c9
  [c0131d0a] ? trace_hardirqs_on+0xb/0xd
  [c0131cde] ? trace_hardirqs_on_caller+0xdf/0x100
  [c0131d0a] ? trace_hardirqs_on+0xb/0xd
  [c01129c5] __ioremap+0xc7/0x182
  [c0112a99] ioremap_nocache+0xa/0xc
  [c02a6877] acpi_os_map_memory+0x11/0x1a
  [c020b697] acpi_ex_system_memory_space_handler+0xd3/0x228
  [c0203bd8] ? acpi_ev_address_space_dispatch+0x142/0x1a8
  [c020b5c4] ? acpi_ex_system_memory_space_handler+0x0/0x228
  [c0203bfd] acpi_ev_address_space_dispatch+0x167/0x1a8
  [c02083dd] acpi_ex_access_region+0x1e4/0x270
  [c02085bc] acpi_ex_field_datum_io+0x153/0x2a1
  [c0158ac0] ? cache_alloc_debugcheck_after+0xe9/0x165
  [c020879b] acpi_ex_extract_from_field+0x91/0x224
  [c0206bcf] ? acpi_ex_read_data_from_field+0x163/0x1b0
  [c0206bec] acpi_ex_read_data_from_field+0x180/0x1b0
  [c020d256] acpi_ex_resolve_node_to_value+0x1aa/0x230
  [c0207a32] acpi_ex_resolve_to_value+0x270/0x2aa
  [c0209e47] acpi_ex_resolve_operands+0x24e/0x52f

Len: This WARN_ON says that ACPI is trying to call ioremap() 
on memory that the e820_table
lists as kernel owned. Do you know why ACPI would do this? 
Would ACPI get upset if
the kernel would tell it to take a hike?

Depends on the BIOS -- as it is the BIOS AML that is making this
request.

-Len
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] reverse CONFIG_ACPI_PROC_EVENT default

2007-08-27 Thread Brown, Len
Thanks Hugh.

-Len 

>-Original Message-
>From: Hugh Dickins [mailto:[EMAIL PROTECTED] 
>Sent: Monday, August 27, 2007 11:05 AM
>To: Linus Torvalds
>Cc: Andrew Morton; Brown, Len; linux-kernel@vger.kernel.org
>Subject: [PATCH] reverse CONFIG_ACPI_PROC_EVENT default
>
>Sigh.  Again an ACPI assault on the Thinkpad's Fn+F4 to suspend to RAM.
>The default and text for CONFIG_THINKPAD_ACPI_INPUT_ENABLED were fixed
>in -rc3, but now 14e04fb34ffa82ee61ae69f98d8fca12d2e8e31c introduces
>CONFIG_ACPI_PROC_EVENT default n to disable it again.  Change default
>to y, and add comment to make it clearer that n is for future distros.
>
>Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
>
>--- 2.6.23-rc3-git10/drivers/acpi/Kconfig  2007-08-26 
>18:10:15.0 +0100
>+++ linux/drivers/acpi/Kconfig 2007-08-26 18:59:16.0 +0100
>@@ -71,6 +71,7 @@ config ACPI_PROCFS
> config ACPI_PROC_EVENT
>   bool "Deprecated /proc/acpi/event support"
>   depends on PROC_FS
>+  default y
>   ---help---
> A user-space daemon, acpi, typically read /proc/acpi/event
> and handled all ACPI sub-system generated events.
>@@ -78,10 +79,13 @@ config ACPI_PROC_EVENT
> These events are now delivered to user-space via
> either the input layer, or as netlink events.
> 
>-This build option enables the old code for for legacy
>+This build option enables the old code for legacy
> user-space implementation.  After some time, this will
> be moved under CONFIG_ACPI_PROCFS, and then deleted.
> 
>+Say Y here to retain the old behaviour.  Say N if your
>+user-space is newer than kernel 2.6.23 (September 2007).
>+
> config ACPI_AC
>   tristate "AC Adapter"
>   depends on X86
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] reverse CONFIG_ACPI_PROC_EVENT default

2007-08-27 Thread Brown, Len
Thanks Hugh.

-Len 

-Original Message-
From: Hugh Dickins [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 27, 2007 11:05 AM
To: Linus Torvalds
Cc: Andrew Morton; Brown, Len; linux-kernel@vger.kernel.org
Subject: [PATCH] reverse CONFIG_ACPI_PROC_EVENT default

Sigh.  Again an ACPI assault on the Thinkpad's Fn+F4 to suspend to RAM.
The default and text for CONFIG_THINKPAD_ACPI_INPUT_ENABLED were fixed
in -rc3, but now 14e04fb34ffa82ee61ae69f98d8fca12d2e8e31c introduces
CONFIG_ACPI_PROC_EVENT default n to disable it again.  Change default
to y, and add comment to make it clearer that n is for future distros.

Signed-off-by: Hugh Dickins [EMAIL PROTECTED]

--- 2.6.23-rc3-git10/drivers/acpi/Kconfig  2007-08-26 
18:10:15.0 +0100
+++ linux/drivers/acpi/Kconfig 2007-08-26 18:59:16.0 +0100
@@ -71,6 +71,7 @@ config ACPI_PROCFS
 config ACPI_PROC_EVENT
   bool Deprecated /proc/acpi/event support
   depends on PROC_FS
+  default y
   ---help---
 A user-space daemon, acpi, typically read /proc/acpi/event
 and handled all ACPI sub-system generated events.
@@ -78,10 +79,13 @@ config ACPI_PROC_EVENT
 These events are now delivered to user-space via
 either the input layer, or as netlink events.
 
-This build option enables the old code for for legacy
+This build option enables the old code for legacy
 user-space implementation.  After some time, this will
 be moved under CONFIG_ACPI_PROCFS, and then deleted.
 
+Say Y here to retain the old behaviour.  Say N if your
+user-space is newer than kernel 2.6.23 (September 2007).
+
 config ACPI_AC
   tristate AC Adapter
   depends on X86

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Git Patch] ACPI: Fix a warning of discarding qualifiers from pointer target type

2007-08-22 Thread Brown, Len

>On Tue, Aug 21, 2007 at 07:22:51PM +0400, Alexey Starikovskiy wrote:
>> Yes, that is the proper solution with a single drawback:
>> it touches ACPICA dual-licensed code and would take ages to commit,
>> and Len would probably ask you to give permission to 
>re-license it under
>> BSD.
>
>The latter is certainly not a problem (assuming that it's non-trivial
>enough to be copyrightable, in the first place).

The dual license is at the top of the file.
I just need to know when you touch one of these dual-license files
that Intel has your permission to ship your change to folks we
send the same file to under the BSD-style non-GPL license.
eg. BSD, Solaris and HPUX uses this same ACPICA files -- though in a
different format.

I've actually never had anybody say no when I asked.
And I've had some people who play laywers on TV tell me
this is not strictly necessary, but other people who play laywers
on TV tell me it is best to ask, so I do.

thanks,
-Len
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Git Patch] ACPI: Fix a warning of discarding qualifiers from pointer target type

2007-08-22 Thread Brown, Len

On Tue, Aug 21, 2007 at 07:22:51PM +0400, Alexey Starikovskiy wrote:
 Yes, that is the proper solution with a single drawback:
 it touches ACPICA dual-licensed code and would take ages to commit,
 and Len would probably ask you to give permission to 
re-license it under
 BSD.

The latter is certainly not a problem (assuming that it's non-trivial
enough to be copyrightable, in the first place).

The dual license is at the top of the file.
I just need to know when you touch one of these dual-license files
that Intel has your permission to ship your change to folks we
send the same file to under the BSD-style non-GPL license.
eg. BSD, Solaris and HPUX uses this same ACPICA files -- though in a
different format.

I've actually never had anybody say no when I asked.
And I've had some people who play laywers on TV tell me
this is not strictly necessary, but other people who play laywers
on TV tell me it is best to ask, so I do.

thanks,
-Len
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH][ACPI][BUTTON] remove procfs-interface

2007-07-12 Thread Brown, Len
The last time I tried to remove this code,
we discovered that people were using it to query
the lid (button) status.

-Len 

>-Original Message-
>From: Richard Hughes [mailto:[EMAIL PROTECTED] 
>Sent: Thursday, July 12, 2007 6:26 AM
>To: Zhang, Rui
>Cc: Henne; Arjan van de Ven; Brown, Len; 
>[EMAIL PROTECTED]; [EMAIL PROTECTED]; 
>linux-kernel@vger.kernel.org
>Subject: RE: [PATCH][ACPI][BUTTON] remove procfs-interface
>
>On Thu, 2007-07-12 at 17:46 +0800, Zhang, Rui wrote:
>> 
>> I'm not sure if the button sysfs I/F is already finished.
>> We'd better make a double check. :)
>
>We need a button sysfs interface? What's wrong with just using input?
>
>Richard.
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH][ACPI][BUTTON] remove procfs-interface

2007-07-12 Thread Brown, Len
The last time I tried to remove this code,
we discovered that people were using it to query
the lid (button) status.

-Len 

-Original Message-
From: Richard Hughes [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 12, 2007 6:26 AM
To: Zhang, Rui
Cc: Henne; Arjan van de Ven; Brown, Len; 
[EMAIL PROTECTED]; [EMAIL PROTECTED]; 
linux-kernel@vger.kernel.org
Subject: RE: [PATCH][ACPI][BUTTON] remove procfs-interface

On Thu, 2007-07-12 at 17:46 +0800, Zhang, Rui wrote:
 
 I'm not sure if the button sysfs I/F is already finished.
 We'd better make a double check. :)

We need a button sysfs interface? What's wrong with just using input?

Richard.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: drivers/video/output.c

2007-04-17 Thread Brown, Len
>Asides from git-bisect failing me again[1], what gives with this file?

it supports output switching, which didn't make it into 2.6.21 --
probably will be 2.6.22.

-Len
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >