from:"Borislav Petkov"

Re: [FIRMWARE BUG] Lenovo x120e

2012-11-28 Thread Borislav Petkov

On Wed, Nov 28, 2012 at 05:27:45AM -0800, Denis Lotarev wrote:
> Im using kernel (Linux initbox 3.6.7-1-ARCH #1 SMP PREEMPT Sun Nov 18 
> 10:11:22 CET 2012 x86_64 GNU/Linux) and i see some problems at booting... So 
> dmesg output with BUG lines:
> 
> $ dmesg |grep Bug
> [1.363154] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
> [1.373784] [Firmware Bug]: ACPI: No _BQC method, cannot determine initial brightness
> [1.380827] [Firmware Bug]: ACPI: No _BQC method, cannot determine initial brightness
> [1.433354] pnp 00:09: [Firmware Bug]: [mem 0x-0x disabled] covers only part of AMD MMCONFIG area [mem 0xf800-0xf9ff]; adding more reservations
> [3.349658] [Firmware Bug]: ACPI: No _BQC method, cannot determine initial brightness

Nothing to worry about: the ACPI implementation in your BIOS simply does
not provide the _BQC method, which is used to query the initial
brightness level of the display device. Frankly, the commit message
doesn't make me any wiser as to why the warning was even added:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=98758faffc86ee6fe9504eeab75481ee7c1aa860

"kinda firmware bug"?? :-)

Maybe Zhang/Len can shed more light on the matter.

Thanks.

-- 
Regards/Gruss,
Boris.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/8] x86, 386 removal: Remove CONFIG_XADD

2012-11-28 Thread Borislav Petkov

On Wed, Nov 28, 2012 at 11:50:25AM -0800, H. Peter Anvin wrote:
> From: "H. Peter Anvin" 
> 
> All 486+ CPUs support CMPXCHG, so remove the fallback 386 support
> code.

This commit message should say something about XADD and not about
CMPXCHG, right?

-- 
Regards/Gruss,
Boris.

Re: [PATCH 8/8] x86, cleanups: Simplify sync_core() in the case of no CPUID

2012-11-28 Thread Borislav Petkov

On Wed, Nov 28, 2012 at 11:50:30AM -0800, H. Peter Anvin wrote:
> From: "H. Peter Anvin" 
> 
> Simplify the implementation of sync_core() for the case where we may
> not have the CPUID instruction available.
> 
> Signed-off-by: H. Peter Anvin 
> ---
>  arch/x86/include/asm/processor.h | 27 +--
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h 
> b/arch/x86/include/asm/processor.h
> index 9a4ee46..b381df7 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -673,17 +673,24 @@ static inline void sync_core(void)
>   int tmp;
>  
>  #ifdef CONFIG_M486
> - if (boot_cpu_data.x86 < 5)
> - /* There is no speculative execution.
> -  * jmp is a barrier to prefetching. */
> - asm volatile("jmp 1f\n1:\n" ::: "memory");
> - else
> + /*
> +  * Do a CPUID if available, otherwise do a jump.  The jump
> +  * can conveniently enough be the jump around CPUID.
> +  */
> + asm volatile("cmpl %2,%1\n\t"
> +  "jl 1f\n\t"
> +  "cpuid\n"
> +  "1:"
> +  : "=a" (tmp)
> +  : "rm" (boot_cpu_data.cpuid_level), "ri" (0), "0" (1)
> +  : "ebx", "ecx", "edx", "memory");
> +#else
> + /* cpuid is a barrier to speculative execution.
> +  * Prefetched instructions are automatically
> +  * invalidated when modified. */

While at it, you could correct this comment to adhere to kernel coding
style:

/*
 * cpuid is a barrier...
 * ...
 */

> + asm volatile("cpuid" : "=a" (tmp) : "0" (1)
> +  : "ebx", "ecx", "edx", "memory");

... and then write this in its shorter form:

tmp = cpuid_eax(1);

to have it a bit easier on the eyes.

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [PATCH 8/8] x86, cleanups: Simplify sync_core() in the case of no CPUID

2012-11-29 Thread Borislav Petkov

On Wed, Nov 28, 2012 at 04:14:43PM -0800, H. Peter Anvin wrote:
> Wrong barrier semantics.

Let me try to understand what you mean by that :)

Now, your version's asm output looks like this:


.loc 2 95 0
movl    $139, %esi      #, tmp140
xorl    %eax, %eax      # tmp141
movl    %esi, %ecx      # tmp140,
movl    %eax, %edx      # tmp141,
#APP
# 95 "/home/boris/kernel/linux-2.6/arch/x86/include/asm/msr.h" 1
wrmsr
# 0 "" 2
#NO_APP
.loc 3 694 0
movb    $1, %al #,
#APP
# 694 "/home/boris/kernel/linux-2.6/arch/x86/include/asm/processor.h" 1
cpuid
# 0 "" 2
.LVL15:
#NO_APP


This is the sync_core() call from early_init_intel(). Now it looks like
139 remains in %ecx after the WRMSR, so CPUID actually gets called with
RAX=1 and RCX=139 (btw, 139 is MSR_IA32_UCODE_REV). Even if this works,
I'd say we don't want to have any stray values in RCX when doing CPUID,
no?

Now here's the version with the change I proposed:


.loc 2 95 0
xorl    %esi, %esi      # tmp144
movl    $139, %edi      #, tmp143
movl    %edi, %ecx      # tmp143,
movl    %esi, %eax      # tmp144,
movl    %esi, %edx      # tmp144,
#APP
# 95 "/home/boris/kernel/linux-2.6/arch/x86/include/asm/msr.h" 1
wrmsr
# 0 "" 2
.LVL15:
#NO_APP
.loc 3 199 0
movb    $1, %al #,
movl    %esi, %ecx      # tmp144, ecx
#APP
# 199 "/home/boris/kernel/linux-2.6/arch/x86/include/asm/processor.h" 1
cpuid
# 0 "" 2
.LVL16:
#NO_APP


Here RCX gets zeroed first and *then* we call CPUID. Apart from that,
the asm output is the same.

So what am I missing?

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [BUG] NULL pointer dereference in 3.7-rc7 (syscall_trace_enter)

2012-11-29 Thread Borislav Petkov

On Thu, Nov 29, 2012 at 12:20:02AM +0100, Ian Kumlien wrote:
> Hi, 
> 
> Due to unexplained dns problems, I'll be using google plus to post the
> photo of the bug output.
> 
> https://plus.google.com/photos/110698868656495230656/albums/5816005854482735041
> 
> I'm sorry but my knowledge is limited and current caffeine level is low,
> so I'm offloading to someone who has these things handled ;)

Hmm, this looks strange. Were you doing any system tracing or similar?
How reproducible is this? If you can reproduce it reliably, can you
take a much more readable photo of the screen, so that all register
values and the complete "Code:" section of the backtrace are legible?

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [PATCH 3/4 v7] AMD64 EDAC: Fix PCI function lookup

2012-11-29 Thread Borislav Petkov

On Tue, Nov 27, 2012 at 02:32:11PM +0800, Daniel J Blueman wrote:
> Fix locating sibling memory controller PCI functions by using the
> correct PCI domain.
> 
> v7: Refactor patches grouping changes
> 
> Signed-off-by: Daniel J Blueman 
> ---
>  drivers/edac/amd64_edac.c |   40 +---
>  1 file changed, 21 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 60e93fa..62b7b17 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -983,6 +983,24 @@ static u64 get_error_address(struct mce *m)
>   return addr;
>  }
>  
> +static struct pci_dev *pci_get_related_function(unsigned int vendor,
> + unsigned int device,
> + struct pci_dev *related)
> +{
> + struct pci_dev *dev = NULL;
> +
> + dev = pci_get_device(vendor, device, dev);
> + while (dev) {
> + if (pci_domain_nr(dev->bus) == pci_domain_nr(related->bus) &&
> + (dev->bus->number == related->bus->number) &&
> + (PCI_SLOT(dev->devfn) == PCI_SLOT(related->devfn)))
> + break;
> + dev = pci_get_device(vendor, device, dev);
> + }

This loop looks strange and I'm wondering, wouldn't it be more
straightforward to simply do:

	while ((dev = pci_get_device(vendor, device, dev))) {
		if (...)
			break;
	}

	return dev;

I realize the original code does this pci_get_device twice but it is
crap IMO.
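
To make the suggested loop shape concrete, here is a minimal userspace
sketch. Everything in it is made up for illustration: struct fake_dev,
fake_get_device() and find_related() merely stand in for struct pci_dev,
pci_get_device() and the patch's pci_get_related_function(), and the
reference counting which the real iterator does is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only the fields compared here. */
struct fake_dev {
	int domain;
	int bus;
	int slot;
};

static struct fake_dev fake_devs[] = {
	{ 0, 0, 24 },
	{ 0, 0, 25 },
	{ 1, 0, 24 },
};

/*
 * Hypothetical iterator, modelled on how pci_get_device() is used in the
 * patch: pass NULL to start, pass the previous result to continue, and
 * get NULL back once the list is exhausted.
 */
struct fake_dev *fake_get_device(struct fake_dev *from)
{
	size_t n = sizeof(fake_devs) / sizeof(fake_devs[0]);
	size_t next = from ? (size_t)(from - fake_devs) + 1 : 0;

	return next < n ? &fake_devs[next] : NULL;
}

/* One iterator call in the loop condition instead of two in the body. */
struct fake_dev *find_related(const struct fake_dev *related)
{
	struct fake_dev *dev = NULL;

	while ((dev = fake_get_device(dev))) {
		if (dev->domain == related->domain &&
		    dev->bus == related->bus &&
		    dev->slot == related->slot)
			break;
	}

	/* Either the matching device, or NULL if the iterator ran out. */
	return dev;
}
```

The extra pair of parentheses around the assignment is only there to
tell the compiler that the assignment in the condition is intentional.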

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [BUG] NULL pointer dereference in 3.7-rc7 (syscall_trace_enter)

2012-11-29 Thread Borislav Petkov

On Thu, Nov 29, 2012 at 01:27:08PM +0100, Ian Kumlien wrote:
> I think that chrome does traceing all the time as a part of it's
> sandbox - this is most likely chrome monitoring flash...

Ah, ok.

> BUG: unable to handle kernel NULL pointer dereference at
> 0063
> IP: [] syscall_trace_enter+0x15e/0x191
> PGD 0
> Oops:  [#1] PREEMPT SMP
> Modules linked in: snd_usb_audio snd_usbmidi_lib nouveau mxm_wmi wmi
> i2x_algo_bit ttm drm_kms_helper drm
> CPU 0
> Pid: 24590, comm: chrome Not tainted 3.7.0-rc7 #50 System manufacturer
> System Product Name/A8N32-SLI-Deluxe
> RIP: 0010:[] []
> syscall_trace_enter+0x15e/0x191
> RSP: 0018:8800058e3f38 EFLAGS: 00010206
> RAX: 0081 RBX: 8800058e3f58 RCX: 0063
> RDX: 7fe1f2fbde18 RSI: 00ca RDI: 7fe1f2fbde18
> RBP:  R08: 0001 R09: 7fe23f9fcb10
> R10: 7fe23f9fcb10 R11: 0206 R12: 0032
> R13: 7fe23f9fd9c0 R14: 7fe25d743710 R15: 0007
> FS:  7fe23f9fd700() GS:88013fc0()
> knlGS:f5c88740
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0063 CR3: 83a84000 CR4: 07f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process chrome (pid: 24590, threadinfo 8800058e2000, task
> 88003dacd3b0)
> Stack:
>  7fe1f2fbde10 0001 0032 8160646c
>  0007 7fe25d743710 7fe23f9fd9c0 0032
>  0001 7fe1f2fbde10 0206 7fe23f9fcb10
> Call Trace:
>  [] ? tracesys+0x7e/0xe2
> Code: 53 28 48 85 ff 74 29 83 3f 00 75 24 eb 37 65 48 8b 0c 25 80 b8 00
> 00 48 8b 89 c0 04 00 00 4c 8b 4b 38 48 8b 53 70 48 85 c9 74 08 <83> 39
> 00 74 1f 48 83 ca ff 48 85 ed 75 04 48 8b 53 78 5b 5d 48
> RIP  [] syscall_trace_enter+0x15e/0x191
>  RSP 
> CR2: 0063

Right, so now I can see the code around the faulting instruction, but
mapping it onto what my compiler generates here is pretty unreliable
(different compilers and hardware, of course):

Code: 53 28 48 85 ff 74 29 83 3f 00 75 24 eb 37 65 48 8b 0c 25 80 b8 00 00 48 
8b 89 c0 04 00 00 4c 8b 4b 38 48 8b 53 70 48 85 c9 74 08 <83> 39 00 74 1f 48 83 
ca ff 48 85 ed 75 04 48 8b 53 78 5b 5d 48
All code

   0:   53                      push   %rbx
   1:   28 48 85                sub    %cl,-0x7b(%rax)
   4:   ff 74 29 83             pushq  -0x7d(%rcx,%rbp,1)
   8:   3f                      (bad)
   9:   00 75 24                add    %dh,0x24(%rbp)
   c:   eb 37                   jmp    0x45
   e:   65 48 8b 0c 25 80 b8    mov    %gs:0xb880,%rcx
  15:   00 00
  17:   48 8b 89 c0 04 00 00    mov    0x4c0(%rcx),%rcx
  1e:   4c 8b 4b 38             mov    0x38(%rbx),%r9
  22:   48 8b 53 70             mov    0x70(%rbx),%rdx
  26:   48 85 c9                test   %rcx,%rcx
  29:   74 08                   je     0x33
  2b:*  83 39 00                cmpl   $0x0,(%rcx)      <-- trapping instruction
  2e:   74 1f                   je     0x4f
  30:   48 83 ca ff             or     $0xffffffffffffffff,%rdx
  34:   48 85 ed                test   %rbp,%rbp
  37:   75 04                   jne    0x3d
  39:   48 8b 53 78             mov    0x78(%rbx),%rdx
  3d:   5b                      pop    %rbx
  3e:   5d                      pop    %rbp
  3f:   48                      rex.W

So we oops when we try to deref 0x63 which is, of course, not a valid
pointer. The question is, what exactly is that thing in rcx. It looks
like a percpu variable to me but I'm not sure.

Can you do:

make arch/x86/kernel/ptrace.lst

and send me that file, privately is fine too.

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [PATCH 4/4 v7] AMD64 EDAC: Fix type usage in NB IDs and memory ranges

2012-11-29 Thread Borislav Petkov

On Tue, Nov 27, 2012 at 02:32:12PM +0800, Daniel J Blueman wrote:
> Use appropriate types for northbridge IDs and memory ranges.
> 
> v7: Refactor patches grouping changes
> 
> Signed-off-by: Daniel J Blueman 
> ---
>  arch/x86/include/asm/amd_nb.h |2 +-
>  drivers/edac/amd64_edac.c |   20 ++--
>  drivers/edac/amd64_edac.h |6 +++---
>  3 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 417eb24..d2e703b 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -76,7 +76,7 @@ static inline bool amd_nb_has_feature(unsigned feature)
>   return ((amd_northbridges.flags & feature) == feature);
>  }
>  
> -static inline struct amd_northbridge *node_to_amd_nb(int node)
> +static inline struct amd_northbridge *node_to_amd_nb(u16 node)
>  {
>   return (node < amd_northbridges.num) ? &amd_northbridges.nb[node] : 
> NULL;
>  }
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 62b7b17..b27412a 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -239,7 +239,7 @@ static int amd64_get_scrub_rate(struct mem_ctl_info *mci)
>   * DRAM base/limit associated with node_id
>   */
>  static bool amd64_base_limit_match(struct amd64_pvt *pvt, u64 sys_addr,
> -unsigned nid)
> +u8 nid)
>  {
>   u64 addr;
>  
> @@ -265,7 +265,7 @@ static struct mem_ctl_info *find_mc_by_sys_addr(struct 
> mem_ctl_info *mci,
>   u64 sys_addr)
>  {
>   struct amd64_pvt *pvt;
> - unsigned node_id;
> + u8 node_id;
>   u32 intlv_en, bits;
>  
>   /*
> @@ -1348,7 +1348,7 @@ static u8 f1x_determine_channel(struct amd64_pvt *pvt, 
> u64 sys_addr,
>  }
>  
>  /* Convert the sys_addr to the normalized DCT address */
> -static u64 f1x_get_norm_dct_addr(struct amd64_pvt *pvt, unsigned range,
> +static u64 f1x_get_norm_dct_addr(struct amd64_pvt *pvt, u8 range,
>u64 sys_addr, bool hi_rng,
>u32 dct_sel_base_addr)
>  {
> @@ -1399,7 +1399,7 @@ static u64 f1x_get_norm_dct_addr(struct amd64_pvt *pvt, 
> unsigned range,
>   * checks if the csrow passed in is marked as SPARED, if so returns the new
>   * spare row
>   */
> -static int f10_process_possible_spare(struct amd64_pvt *pvt, u8 dct, int 
> csrow)
> +static int f10_process_possible_spare(struct amd64_pvt *pvt, u16 dct, int 
> csrow)

This one can stay u8 since it comes from dram_dst_node() through
f1x_lookup_addr_in_dct() and it is u8 already there.

But, in general, the patches look much more straightforward and easy to
review, so please add those minor changes and I'll pick them up.

Also, I'm assuming you're testing them on both your Numascale systems
and on a normal AMD multisocket box, correct?

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [BUG] NULL pointer dereference in 3.7-rc7 (syscall_trace_enter)

2012-11-29 Thread Borislav Petkov

On Thu, Nov 29, 2012 at 05:24:03PM +0100, Ian Kumlien wrote:
> > Can you do:
> > 
> > make arch/x86/kernel/ptrace.lst
> > 
> > and send me that file, privately is fine too.
> 
> Done, =)

Ok, thanks. Here it is:

8100b627:   83 3f 00                cmpl   $0x0,(%rdi)
8100b62a:   75 24                   jne    8100b650
8100b62c:   eb 37                   jmp    8100b665
8100b62e:   65 48 8b 0c 25 00 00    mov    %gs:0x0,%rcx
8100b635:   00 00
                        8100b633: R_X86_64_32S  current_task
extern void __audit_seccomp(unsigned long syscall, long signr, int code);
extern void __audit_ptrace(struct task_struct *t);

static inline int audit_dummy_context(void)
{
        void *p = current->audit_context;
8100b637:   48 8b 89 c0 04 00 00    mov    0x4c0(%rcx),%rcx
                                    regs->orig_ax,
                                    regs->bx, regs->cx,
                                    regs->dx, regs->si);
#ifdef CONFIG_X86_64
        else
                audit_syscall_entry(AUDIT_ARCH_X86_64,
8100b63e:   4c 8b 4b 38             mov    0x38(%rbx),%r9
8100b642:   48 8b 53 70             mov    0x70(%rbx),%rdx
        return !p || *(int *)p;
8100b646:   48 85 c9                test   %rcx,%rcx
8100b649:   74 05                   je     8100b650
8100b64b:   83 39 00                cmpl   $0x0,(%rcx)
8100b64e:   74 1f                   je     8100b66f
                                    regs->di, regs->si,
                                    regs->dx, regs->r10);
#endif

out:
        return ret ?: regs->orig_ax;
8100b650:   48 83 ca ff             or     $0xffffffffffffffff,%rdx
8100b654:   48 85 ed                test   %rbp,%rbp
8100b657:   75 04                   jne    8100b65d
8100b659:   48 8b 53 78             mov    0x78(%rbx),%rdx
}
8100b65d:   5b                      pop    %rbx
8100b65e:   5d                      pop    %rbp
8100b65f:   48 89 d0                mov    %rdx,%rax
8100b662:   41 5c                   pop    %r12
8100b664:   c3                      retq

We're about to call audit_syscall_entry() for a 64-bit task (chrome),
and before that we check whether the task's audit context is a dummy
one.

We fault in the second half of the check in

return !p || *(int *)p;

when we dereference current's ->audit_context pointer in order to test
the int it points to against 0. The pointer turns out to be some random
crap, as we saw already: RCX=0x63.
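
A small userspace sketch of that check may help to see why a garbage
pointer ends up being dereferenced. The names fake_audit_context and
fake_audit_dummy_context() are invented for this example; only the
return expression is taken from the listing above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct audit_context. The check only ever
 * reads the first int of whatever the pointer refers to.
 */
struct fake_audit_context {
	int dummy;
};

/*
 * Same expression as in the listing. A NULL pointer short-circuits the
 * || and is never dereferenced. Any non-NULL value is dereferenced,
 * whether it points to a real context or not.
 */
int fake_audit_dummy_context(void *p)
{
	return !p || *(int *)p;
}
```

So a value like 0x63 passes the !p test, because it is not NULL, and the
subsequent read through it is what faults. This sketch deliberately does
not try that case since it would simply crash in userspace too.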

From looking at the code, task audit contexts normally get allocated
at fork time and freed at task exit time, so your process should
actually have a valid audit context.

The only explanation I have is that it could be some random corruption
which f*cked up the ->audit_context pointer, but I might be wrong. Btw,
do you have CONFIG_AUDITSYSCALL enabled in your kernel?

I'd say right now we watch this and, if it is reproducible, we throw
some more brain power and skills at it. If it turns out to be a single
occurrence, we'll chalk it up to random corruption.

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [PATCH 8/8] x86, cleanups: Simplify sync_core() in the case of no CPUID

2012-11-29 Thread Borislav Petkov

On Thu, Nov 29, 2012 at 01:06:20PM -0800, H. Peter Anvin wrote:
> It doesn't matter in that context, as the surrounding MSR references
> have barriers, but what I'm refering to is the "memory" barrier.

Ok, but the only difference between the two versions is this line:

movl    %esi, %ecx      # tmp144, ecx

coming from the cpuid_eax() function. So the memory barrier is the same
and in the right place in both cases.

> The value of ECX doesn't matter when EAX=1.

Ok.

-- 
Regards/Gruss,
Boris.

Re: [PATCH 8/8] x86, cleanups: Simplify sync_core() in the case of no CPUID

2012-11-29 Thread Borislav Petkov

On Thu, Nov 29, 2012 at 01:20:00PM -0800, H. Peter Anvin wrote:
> On 11/29/2012 01:18 PM, Borislav Petkov wrote:
> > On Thu, Nov 29, 2012 at 01:06:20PM -0800, H. Peter Anvin wrote:
> >> It doesn't matter in that context, as the surrounding MSR references
> >> have barriers, but what I'm refering to is the "memory" barrier.
> > 
> > Ok, but the only difference between the two versions is this line:
> > 
> > movl%esi, %ecx  # tmp144, ecx
> > 
> > coming from the cpuid_eax() function. So the memory barrier is the same
> > and in the right place in both cases.
> > 
> 
> In the case of that one call site, yes, because the MSR references
> include the barrier.  Other sites, current or future, may not have the
> same property.

Sorry, but I think we're misunderstanding each other in some way, so let
me restart. Here's the version I'm suggesting:

static inline void sync_core(void)
{
	int tmp;

#ifdef CONFIG_M486
	/*
	 * Do a CPUID if available, otherwise do a jump.  The jump
	 * can conveniently enough be the jump around CPUID.
	 */
	asm volatile("cmpl %2,%1\n\t"
		     "jl 1f\n\t"
		     "cpuid\n"
		     "1:"
		     : "=a" (tmp)
		     : "rm" (boot_cpu_data.cpuid_level), "ri" (0), "0" (1)
		     : "ebx", "ecx", "edx", "memory");
#else
	/*
	 * CPUID is a barrier to speculative execution. Prefetched instructions
	 * are automatically invalidated when modified.
	 */
	tmp = cpuid_eax(1);

	/*
	asm volatile("cpuid" : "=a" (tmp) : "0" (1)
		     : "ebx", "ecx", "edx", "memory");
	*/
#endif
}

with the last asm volatile("cpuid"...) commented out. The only
non-trivial difference between the two is the zeroing out of %ecx when
looking at the resulting asm.

Now, cpuid_eax() ends up in native_cpuid(), which has memory barrier
character too: it also contains an "asm volatile" and it also clobbers
memory.

Are you saying that there's a semantic difference between the naked
"asm volatile" and the compiler inlining a couple of inline functions
which result in the same "asm volatile" memory barrier?

Hmm, strange.

-- 
Regards/Gruss,
Boris.

Re: [PATCH 8/8] x86, cleanups: Simplify sync_core() in the case of no CPUID

2012-11-29 Thread Borislav Petkov

On Thu, Nov 29, 2012 at 01:24:26PM -0800, H. Peter Anvin wrote:
> Thinking about it some more, there is another reason to not do
> this, which is that we don't want this particular CPUID to be
> paravirtualized; we're after the synchronizing side effect, not the
> CPUID return value itself.

Ok, you're right, this can be a PVOP call.

> So let's leave it as a primitive; it gets too confusing otherwise.

Ok.

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [BUG] NULL pointer dereference in 3.7-rc7 (syscall_trace_enter)

2012-11-29 Thread Borislav Petkov

On Fri, Nov 30, 2012 at 12:56:08AM +0100, Ian Kumlien wrote:
> > From looking at the code, task audit contexts get normally allocated
> > at fork time and dealloc'd at task exit time so your process should
> > actually have a valid task context.
> 
> Weird, and this should be allocated automatically?

Yes, during task creation in copy_process we do audit_alloc and in
do_exit() we do audit_free.

> > The only explanation I have is that it could be some random corruption
> > which f*cked up the ->audit_context pointer but I might be wrong. Btw,
> > do you have CONFIG_AUDITSYSCALL enabled in your kernel?
> 
> grep CONFIG_AUDITSYSCALL .config
> CONFIG_AUDITSYSCALL=y

Ok.

> > I'd say right now we could watch this and if it is reproducible, then
> > we can involve some more brain power and skills into it. If it has been
> > only a single occurrence, then we'll write it on the random corruption's
> > tab.
> 
> Uhmmm oki

Right, so thinking purely hypothetically, I can imagine a small window
where we're in the process of freeing the audit context during task
exit, and we issue a syscall which gets traced and ends up in
audit_syscall_entry. But AFAICT, when we free the context we also set
tsk->audit_context = NULL, and that cannot explain the funny RCX value.
Hmm, strange.

But there's not much point in us conjecturing about what happened if
this cannot be reliably reproduced, so please watch your box, be on the
alert for similar oopses and note the steps which led to them.

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [RFC PATCH 2/3] perf: Add persistent event facilities

2013-03-18 Thread Borislav Petkov

Hi,

On Mon, Mar 18, 2013 at 06:13:58PM +0900, Namhyung Kim wrote:
> > -static void ring_buffer_put(struct ring_buffer *rb)
> > +void perf_ring_buffer_put(struct ring_buffer *rb)
> 
> Why did you rename this function?

Yeah, actually that's not needed.

However, the perf ring buffer function naming is kinda inconsistent:

ring_buffer_*
rb_*

Since we're keeping those internal to perf, they maybe should have the
same prefix, no?

I vote for "rb_" because it is shorter. :)

> > + err_event_file:
> > +   perf_event_release_kernel(event);
> 
> It needs to reset event to ERR_PTR(-ENOMEM) ?

Yeah, the error path in this function is kinda clumsy, I'll do some more
staring at how to make it simpler/better.
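
For reference, the point about resetting the pointer can be sketched in
userspace roughly like this. The sketch_* helpers mimic the kernel's
ERR_PTR()/PTR_ERR()/IS_ERR() convention as I understand it (error codes
live in the top 4095 values of the address space); struct fake_event and
sketch_setup() are hypothetical and are not the code from the patch. It
also assumes that long and pointers have the same size, as on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *sketch_err_ptr(long error)
{
	return (void *)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-SKETCH_MAX_ERRNO;
}

/* Hypothetical event object, just enough to show the error path. */
struct fake_event {
	int fd;
};

/*
 * Hypothetical setup function. If a late step fails, the event is
 * released and the pointer is replaced by an error pointer, so that the
 * caller never sees a pointer to an already released object.
 */
struct fake_event *sketch_setup(struct fake_event *event, int file_ok)
{
	if (!file_ok)
		goto err_event_file;

	return event;

err_event_file:
	event->fd = -1;	/* stands in for perf_event_release_kernel(event) */
	event = sketch_err_ptr(-ENOMEM);
	return event;
}
```

Without the reassignment in the error path, the function would hand
back the stale pointer and the caller's error check would not trigger.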

[ … ]

> > +   __list_del(desc->plist.prev, desc->plist.next);
> 
> Why not using list_del(&desc->plist) ?

Will do.
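
The difference between the two calls, as a userspace sketch: this is a
stripped-down imitation of the kernel list helpers, not the real
<linux/list.h>. The sketch_* names and the poison values are made up
for the example (the kernel uses LIST_POISON1 and LIST_POISON2).

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Placeholder poison values, chosen arbitrarily for this sketch. */
#define SKETCH_POISON1 ((struct list_head *)0x100)
#define SKETCH_POISON2 ((struct list_head *)0x200)

void sketch_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void sketch_list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Imitates __list_del(): joins prev and next. The removed entry itself
 * is not touched and keeps its old pointers into the list.
 */
void sketch_unlink(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

/*
 * Imitates list_del(): unlinks the entry and then poisons its pointers,
 * so that a later use of the stale entry is easier to spot.
 */
void sketch_list_del(struct list_head *entry)
{
	sketch_unlink(entry->prev, entry->next);
	entry->next = SKETCH_POISON1;
	entry->prev = SKETCH_POISON2;
}
```

In other words, the list itself ends up the same either way. What the
full helper adds is that the removed entry no longer points into it.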

[ … ]

> > + err_event_file:
> > +   put_unused_fd(event_fd);
>
> Isn't it safe to have event_fd of -1 in case not found? Anyway, if
> it's returned to the user space directly, it's better having more
> meaningful error code IMHO.

Yeah, I rewrote that one in the meantime to use a helper and it is
cleaner now.

Thanks for the review!

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [RFC PATCH 0/3] Perf persistent events

2013-03-18 Thread Borislav Petkov

On Mon, Mar 18, 2013 at 09:40:08AM +0100, Ingo Molnar wrote:
> That definitely looks interesting and desirable. It would be nice to
> have more generic/flexible semantics by using the VFS for tracing
> context discovery.
>
> That would allow 'stateful tracing', and not just in a kernel
> initiated fashion: we could basically do ftrace-alike tracing, into
> persistent, VFS-named buffers.
>
> The question is, how are the individual buffers identified when and
> after they have been created? An option would be to use cgroups for
> that - cgroups already has its own VFS and syscall interfaces. But
> maybe some other, explicit interface is needed (eventfs).

Last I heard, Steve needs an events filesystem too because he wants to
do mkdir in debugfs. So maybe we should finally do an eventfs for real;
it has been asked for so many times now. :)

> All the usecases we talked about in the past would work fine that way:
>
>  - the MCE events would show up as an already created set of buffers,
>  discoverable via the VFS interface.

Right, in the MCE case though, we want to enable them as early as
possible, i.e. long before we can even manipulate something through the
VFS. So my current thinking is that we need both - a way to enable a
persistent event from within the kernel and then the eventfs thing.

>  - user-space could generate more 'tracing/profiling contexts' runtime.

Sure.

>  - a boot tracer would activate via a boot option, and it would create
>  a tracing context - visible via the VFS interface.

Right, and this one you still want to enable as early as possible, long
before userspace can access something through VFS. Btw, how are we going
to boottrace stuff which happens before perf initialization? Cache it
into buffers somewhere?

>  - modern RAS daemon replacing mcelog
>
> If you make that work, via a new perf tool side as well that allows
> the creation of a tracing context (and a separate extraction as well),
> via modified 'perf trace' or a new subcommand, that would be an major,
> upstream-worthy perf feature IMO which would go way beyond the RAS
> usecase.

Right, ok, so we could start working towards that too but I'd like to
not delay the persistent events stuff so that we can finally do the RAS
daemon, and do it properly.

Besides, having the eventfs and persistency would be a different aspect
to manipulating perf events and can be developed independently and in
parallel.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: nouveau lockdep splat

2013-03-19 Thread Borislav Petkov

On Tue, Mar 05, 2013 at 05:30:52PM +0100, Lucas Stach wrote:
> Dropping Tegra ML, it's not the place where Nouveau mails should go.
> Adding Nouveau ML and Maarten, who probably knows Lockdep+Nouveau best.

Ok,

with the hope of having the right people on CC now (finally, thanks
Lucas :-)), here's the same splat on -rc3. Someone better take a look
soonish, please:

[0.541078] [drm] No driver support for vblank timestamp query.
[0.541272] nouveau  [ DRM] 3 available performance level(s)
[0.541276] nouveau  [ DRM] 0: core 135MHz shader 270MHz memory 135MHz voltage 900mV
[0.541280] nouveau  [ DRM] 1: core 405MHz shader 810MHz memory 405MHz voltage 900mV
[0.541284] nouveau  [ DRM] 3: core 520MHz shader 1230MHz memory 790MHz voltage 900mV
[0.541287] nouveau  [ DRM] c: core 405MHz shader 810MHz memory 405MHz voltage 900mV
[0.559846] nouveau  [ DRM] MM: using COPY for buffer copies
[0.625371] nouveau  [ DRM] allocated 1920x1080 fb: 0x7, bo 88043b54f000
[0.625441] fbcon: nouveaufb (fb0) is primary device
[0.62] 
[0.625556] =
[0.625556] [ INFO: possible recursive locking detected ]
[0.625557] 3.9.0-rc3+ #25 Not tainted
[0.625557] -
[0.625558] swapper/0/1 is trying to acquire lock:
[0.625562]  (&dmac->lock){+.+...}, at: [] evo_wait+0x43/0xf0
[0.625562] 
[0.625562] but task is already holding lock:
[0.625564]  (&dmac->lock){+.+...}, at: [] evo_wait+0x43/0xf0
[0.625565] 
[0.625565] other info that might help us debug this:
[0.625565]  Possible unsafe locking scenario:
[0.625565] 
[0.625565]CPU0
[0.625565]
[0.625566]   lock(&dmac->lock);
[0.625567]   lock(&dmac->lock);
[0.625567] 
[0.625567]  *** DEADLOCK ***
[0.625567] 
[0.625567]  May be due to missing lock nesting notation
[0.625567] 
[0.625568] 10 locks held by swapper/0/1:
[0.625570]  #0:  (&__lockdep_no_validate__){..}, at: [] __driver_attach+0x5b/0xb0
[0.625572]  #1:  (&__lockdep_no_validate__){..}, at: [] __driver_attach+0x69/0xb0
[0.625575]  #2:  (drm_global_mutex){+.+.+.}, at: [] drm_get_pci_dev+0xc6/0x2d0
[0.625578]  #3:  (registration_lock){+.+.+.}, at: [] register_framebuffer+0x25/0x310
[0.625581]  #4:  (&fb_info->lock){+.+.+.}, at: [] lock_fb_info+0x26/0x60
[0.625583]  #5:  (console_lock){+.+.+.}, at: [] register_framebuffer+0x1ba/0x310
[0.625585]  #6:  ((fb_notifier_list).rwsem){.+.+.+}, at: [] __blocking_notifier_call_chain+0x42/0x80
[0.625587]  #7:  (&dev->mode_config.mutex){+.+.+.}, at: [] drm_modeset_lock_all+0x2a/0x70
[0.625589]  #8:  (&crtc->mutex){+.+.+.}, at: [] drm_modeset_lock_all+0x54/0x70
[0.625591]  #9:  (&dmac->lock){+.+...}, at: [] evo_wait+0x43/0xf0
[0.625591] 
[0.625591] stack backtrace:
[0.625592] Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc3+ #25
[0.625593] Call Trace:
[0.625595]  [] __lock_acquire+0x76b/0x1c20
[0.625597]  [] ? dcb_table+0x1ac/0x2a0
[0.625599]  [] lock_acquire+0x8a/0x120
[0.625600]  [] ? evo_wait+0x43/0xf0
[0.625602]  [] ? mutex_lock_nested+0x292/0x330
[0.625603]  [] mutex_lock_nested+0x6e/0x330
[0.625605]  [] ? evo_wait+0x43/0xf0
[0.625606]  [] ? mark_held_locks+0x9b/0x100
[0.625607]  [] evo_wait+0x43/0xf0
[0.625609]  [] nv50_display_flip_next+0x713/0x7a0
[0.625611]  [] ? mutex_unlock+0xe/0x10
[0.625612]  [] ? evo_kick+0x37/0x40
[0.625613]  [] nv50_crtc_commit+0x10e/0x230
[0.625615]  [] drm_crtc_helper_set_mode+0x365/0x510
[0.625617]  [] drm_crtc_helper_set_config+0xa4e/0xb70
[0.625618]  [] drm_mode_set_config_internal+0x31/0x70
[0.625619]  [] drm_fb_helper_set_par+0x71/0xf0
[0.625621]  [] fbcon_init+0x514/0x5a0
[0.625623]  [] visual_init+0xbc/0x120
[0.625624]  [] do_bind_con_driver+0x163/0x320
[0.625625]  [] do_take_over_console+0x61/0x70
[0.625627]  [] do_fbcon_takeover+0x63/0xc0
[0.625628]  [] fbcon_event_notify+0x5fd/0x700
[0.625629]  [] notifier_call_chain+0x4d/0x70
[0.625630]  [] __blocking_notifier_call_chain+0x58/0x80
[0.625631]  [] blocking_notifier_call_chain+0x16/0x20
[0.625633]  [] fb_notifier_call_chain+0x1b/0x20
[0.625634]  [] register_framebuffer+0x1c8/0x310
[0.625635]  [] drm_fb_helper_initial_config+0x371/0x520
[0.625637]  [] ? 
drm_fb_helper_single_add_all_connectors+0x47/0xf0
[0.625639]  [] ? kmem_cache_alloc_trace+0xee/0x150
[0.625641]  [] nouveau_fbcon_init+0x10e/0x160
[0.625643]  [] nouveau_drm_load+0x40a/0x5d0
[0.625644]  [] ? device_register+0x1e/0x30
[0.625645]  [] ? drm_sysfs_device_add+0x86/0xb0
[0.625647]  [] drm_get_pci_dev+0x186/0x2d0
[0.625649]  [] ? __pci_set_master+0x2b/0x90
[0.625650]  [] nouveau_drm_probe+0x26a/0x2c0
[0.625652]  [] ? pci_match_device+0xd5/0xe0
[0.625654]

Re: [PATCH] x86/mce: Rework cmci_rediscover() to play well with CPU hotplug

2013-03-20 Thread Borislav Petkov

+ Tony.

On Wed, Mar 20, 2013 at 03:31:29PM +0530, Srivatsa S. Bhat wrote:
> On 03/20/2013 08:46 AM, Chen Gong wrote:
> > On Tue, Mar 19, 2013 at 06:44:08PM -0400, Dave Jones wrote:
> >>
> >> offlining a CPU in 3.9-rc3 gets me this trace..
> >>
> >> numa_remove_cpu cpu 1 node 0: mask now 0,2-3
> >> smpboot: CPU 1 is now offline
> >> BUG: using smp_processor_id() in preemptible [] code: 
> >> cpu-offline.sh/10591
> >> caller is cmci_rediscover+0x6a/0xe0
> >> Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
> >> Call Trace:
> >>  [] debug_smp_processor_id+0xdd/0x100
> >>  [] cmci_rediscover+0x6a/0xe0
> >>  [] mce_cpu_callback+0x19d/0x1ae
> >>  [] notifier_call_chain+0x66/0x150
> >>  [] __raw_notifier_call_chain+0xe/0x10
> >>  [] cpu_notify+0x23/0x50
> >>  [] cpu_notify_nofail+0xe/0x20
> >>  [] _cpu_down+0x302/0x350
> >>  [] cpu_down+0x36/0x50
> >>  [] store_online+0x8d/0xd0
> >>  [] dev_attr_store+0x18/0x30
> >>  [] sysfs_write_file+0xdb/0x150
> >>  [] vfs_write+0xa2/0x170
> >>  [] sys_write+0x4c/0xa0
> >>  [] system_call_fastpath+0x16/0x1b
> >>
> > Try this patch:
> > 
> > diff a/arch/x86/kernel/cpu/mcheck/mce_intel.c 
> > b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > index 402c454..692c91e 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > @@ -311,10 +311,12 @@ void cmci_rediscover(int dying)
> > if (cpu == dying)
> > continue;
> > 
> > -   if (cpu == smp_processor_id()) {
> > +   if (cpu == get_cpu()) {
> > +   put_cpu();
> > cmci_rediscover_work_func(NULL);
> > continue;
> > -   }
> > +   } else
> > +   put_cpu();
> > 
> > work_on_cpu(cpu, cmci_rediscover_work_func, NULL);
> > }
> > 
> 
> That doesn't really look right to me. In fact, the function cmci_rediscover()
> looks like it needs some attention. Let me quote the function here, so
> that it's easier to discuss what's wrong with it..
> 
> 
> /*
>  * After a CPU went down cycle through all the others and rediscover
>  * Must run in process context.
>  */
> void cmci_rediscover(int dying)
> {
> int cpu, banks;
> 
> if (!cmci_supported(&banks))
> return;
> 
> for_each_online_cpu(cpu) {
> if (cpu == dying)
> continue;
> 
> if (cpu == smp_processor_id()) {
> cmci_rediscover_work_func(NULL);
> continue;
> }
> 
> work_on_cpu(cpu, cmci_rediscover_work_func, NULL);
>   }
> }
> 
> First of all, I think the comment that says that it must run in process
> context, is stale. I think it's a remnant of the code which used to do
> GFP_KERNEL allocations for a temporary cpumask (looking at git logs).
> The function cmci_discover() takes a spin lock with irqs disabled. So
> obviously this whole thing can run in atomic context.
> 
> And cmci_rediscover() is called from CPU_POST_DEAD handler. So the CPU
> which was supposed to go offline would have already gone offline and
> out of the cpu_online_mask. So there is no point checking
> 'if (cpu == dying)' in that for-loop.
> 
> So, how about something like this:
> 
> >
> 
> From: Srivatsa S. Bhat 
> Subject: [PATCH] x86/mce: Rework cmci_rediscover() to play well with CPU 
> hotplug
> 
> Dave Jones reports that offlining a CPU leads to this trace:
> 
> numa_remove_cpu cpu 1 node 0: mask now 0,2-3
> smpboot: CPU 1 is now offline
> BUG: using smp_processor_id() in preemptible [] code:
> cpu-offline.sh/10591
> caller is cmci_rediscover+0x6a/0xe0
> Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
> Call Trace:
>  [] debug_smp_processor_id+0xdd/0x100
>  [] cmci_rediscover+0x6a/0xe0
>  [] mce_cpu_callback+0x19d/0x1ae
>  [] notifier_call_chain+0x66/0x150
>  [] __raw_notifier_call_chain+0xe/0x10
>  [] cpu_notify+0x23/0x50
>  [] cpu_notify_nofail+0xe/0x20
>  [] _cpu_down+0x302/0x350
>  [] cpu_down+0x36/0x50
>  [] store_online+0x8d/0xd0
>  [] dev_attr_store+0x18/0x30
>  [] sysfs_write_file+0xdb/0x150
>  [] vfs_write+0xa2/0x170
>  [] sys_write+0x4c/0xa0
>  [] system_call_fastpath+0x16/0x1b
> 
> 
> However, a look at cmci_rediscover shows that it can be simplified quite
> a bit, apart from solving the above issue. It invokes functions that
> take spin locks with interrupts disabled, and hence it can run in atomic
> context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
> is already dead and out of the cpu_online_mask. So take these points into
> account and simplify the code, and thereby also fix the above issue.
> 
> Reported-by: Dave Jones 
> Signed-off-by: Srivatsa S. Bhat 
> ---
> 
>  arch/x86/include/asm/mce.h |4 ++--
>  arch/x86/kernel/cpu/mcheck/mce.c   |2 +-
>  arch/

3.9-rc3 forcedeth lockdep splat with netconsole

2013-03-20 Thread Borislav Petkov

Hi,

when trying to log stuff with netconsole, I get the following:

[ … ]

[2.036977] netpoll: netconsole: device eth0 not up yet, forcing it
[2.037632] forcedeth :00:08.0: irq 42 for MSI/MSI-X
[2.037763] forcedeth :00:08.0 eth0: MSI enabled
[2.038074] forcedeth :00:08.0 eth0: no link during initialization
[2.780061] forcedeth :00:08.0 eth0: link up
[2.780709] 
[2.780710] =
[2.780710] [ INFO: inconsistent lock state ]
[2.780712] 3.9.0-rc3+ #2 Not tainted
[2.780712] -
[2.780713] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[2.780715] swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
[2.780723]  (&(&napi->poll_lock)->rlock){+.?...}, at: [] 
netpoll_poll_dev+0xe5/0xc80
[2.780723] {IN-SOFTIRQ-W} state was registered at:
[2.780728]   [] __lock_acquire+0x608/0x1de0
[2.780730]   [] lock_acquire+0x9e/0x1f0
[2.780733]   [] _raw_spin_lock+0x41/0x80
[2.780736]   [] net_rx_action+0xbe/0x330
[2.780739]   [] __do_softirq+0xe9/0x3d0
[2.780740]   [] run_ksoftirqd+0x35/0x50
[2.780744]   [] smpboot_thread_fn+0x1d4/0x2e0
[2.780749]   [] kthread+0xdb/0xe0
[2.780751]   [] ret_from_fork+0x7c/0xb0
[2.780752] irq event stamp: 576121
[2.780754] hardirqs last  enabled at (576121): [] 
restore_args+0x0/0x30
[2.780756] hardirqs last disabled at (576120): [] 
common_interrupt+0x6a/0x6f
[2.780758] softirqs last  enabled at (576090): [] 
__do_softirq+0x175/0x3d0
[2.780759] softirqs last disabled at (576085): [] 
irq_exit+0x96/0xb0
[2.780760] 
[2.780760] other info that might help us debug this:
[2.780760]  Possible unsafe locking scenario:
[2.780760] 
[2.780761]CPU0
[2.780761]
[2.780762]   lock(&(&napi->poll_lock)->rlock);
[2.780763]   
[2.780764] lock(&(&napi->poll_lock)->rlock);
[2.780764] 
[2.780764]  *** DEADLOCK ***
[2.780764] 
[2.780765] 3 locks held by swapper/0/1:
[2.780768]  #0:  (console_lock){+.+.+.}, at: [] 
register_console+0xfe/0x390
[2.780772]  #1:  (target_list_lock){+.+...}, at: [] 
write_msg+0x53/0x110
[2.780775]  #2:  (&npinfo->dev_lock){+.+...}, at: [] 
netpoll_poll_dev+0x46/0xc80
[2.780776] 
[2.780776] stack backtrace:
[2.780777] Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc3+ #2
[2.780778] Call Trace:
[2.780781]  [] print_usage_bug.part.34+0x270/0x27f
[2.780785]  [] ? save_stack_trace+0x2f/0x50
[2.780787]  [] ? local_clock+0x43/0x50
[2.780789]  [] ? 
print_shortest_lock_dependencies+0x1d0/0x1d0
[2.780791]  [] mark_lock+0x27b/0x600
[2.780793]  [] __lock_acquire+0x676/0x1de0
[2.780795]  [] ? retint_restore_args+0xe/0xe
[2.780797]  [] ? _raw_spin_unlock_irqrestore+0x67/0x80
[2.780799]  [] lock_acquire+0x9e/0x1f0
[2.780801]  [] ? netpoll_poll_dev+0xe5/0xc80
[2.780803]  [] _raw_spin_trylock+0x73/0x90
[2.780804]  [] ? netpoll_poll_dev+0xe5/0xc80
[2.780806]  [] netpoll_poll_dev+0xe5/0xc80
[2.780808]  [] ? put_lock_stats.isra.17+0xe/0x40
[2.780810]  [] ? netpoll_send_skb_on_dev+0x304/0x400
[2.780812]  [] netpoll_send_skb_on_dev+0x315/0x400
[2.780814]  [] netpoll_send_udp+0x287/0x3a0
[2.780815]  [] ? write_msg+0x53/0x110
[2.780817]  [] write_msg+0xbf/0x110
[2.780818]  [] ? console_unlock+0x3d7/0x450
[2.780820]  [] 
call_console_drivers.constprop.15+0x9a/0x1d0
[2.780822]  [] console_unlock+0x3e7/0x450
[2.780823]  [] register_console+0x137/0x390
[2.780826]  [] init_netconsole+0x1a6/0x214
[2.780828]  [] ? 
driver_deferred_probe_trigger.part.9+0x80/0x80
[2.780830]  [] ? option_setup+0x1f/0x1f
[2.780832]  [] do_one_initcall+0x122/0x170
[2.780836]  [] kernel_init_freeable+0x103/0x192
[2.780837]  [] ? do_early_param+0x8c/0x8c
[2.780840]  [] ? rest_init+0x140/0x140
[2.780842]  [] kernel_init+0xe/0xe0
[2.780844]  [] ret_from_fork+0x7c/0xb0
[2.780845]  [] ? rest_init+0x140/0x140
[2.780899] [ cut here ]
[2.780901] WARNING: at net/core/netpoll.c:412 
netpoll_send_skb_on_dev+0x3e9/0x400()
[2.780903] Hardware name:  
[2.780905] netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll 
(nv_start_xmit_optimized+0x0/0x720)
[2.780907] Modules linked in:
[2.780908] Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc3+ #2
[2.780908] Call Trace:
[2.780910]  [] ? netpoll_send_skb_on_dev+0x3e9/0x400
[2.780913]  [] warn_slowpath_common+0x7f/0xc0
[2.780915]  [] warn_slowpath_fmt+0x46/0x50
[2.780916]  [] ? nv_get_drvinfo+0x80/0x80
[2.780918]  [] netpoll_send_skb_on_dev+0x3e9/0x400
[2.780920]  [] netpoll_send_udp+0x287/0x3a0
[2.780922]  [] ? write_msg+0x53/0x110
[2.780924]  [] write_msg+0xbf/0x110
[2.780925]  [] ? console_unlock+0x3d7/0x450
[2.780927]  [] 
call_console_drivers.constprop.15+0x9a/0x1d0
[2.780928]  [] console_unlock+0x3e7/0x45

Re: 3.9-rc3 forcedeth lockdep splat with netconsole

2013-03-20 Thread Borislav Petkov

On Wed, Mar 20, 2013 at 12:01:51PM +0100, Borislav Petkov wrote:
> Hi,
> 
> when trying to log stuff with netconsole, I get the following:
> 
> [ … ]
> 
> [2.036977] netpoll: netconsole: device eth0 not up yet, forcing it
> [2.037632] forcedeth :00:08.0: irq 42 for MSI/MSI-X
> [2.037763] forcedeth :00:08.0 eth0: MSI enabled
> [2.038074] forcedeth :00:08.0 eth0: no link during initialization
> [2.780061] forcedeth :00:08.0 eth0: link up

Hehe,

so it is actually funny to see the things lockdep is warning about,
in action: I was just able to hang the box with the last line above
appearing on the screen.

Fun.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/6] x86, cpu: Expand ->x86_capability flags with bugs bitvector, v2

2013-03-20 Thread Borislav Petkov

From: Borislav Petkov 

Hi,

this is v2, with some added functionality in patches 5 and 6. It all
looks pretty straightforward, and testing it in kvm (even when lying to
the guest that it is running on a Cyrix) works, as well as on bare metal.

Comments and suggestions appreciated,

Thanks.

Borislav Petkov (6):
  x86, cpu: Expand cpufeature facility to include cpu bugs
  x86, cpu: Convert F00F bug detection
  x86, cpu: Convert FDIV bug detection
  x86, cpu: Convert Cyrix coma bug detection
  x86, cpu: Convert AMD Erratum 383
  x86, cpu: Convert AMD Erratum 400

 arch/x86/include/asm/cpufeature.h | 19 +++
 arch/x86/include/asm/processor.h  | 25 +
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/cpu/amd.c | 31 ++-
 arch/x86/kernel/cpu/bugs.c|  7 ---
 arch/x86/kernel/cpu/common.c  |  4 
 arch/x86/kernel/cpu/cyrix.c   |  5 +++--
 arch/x86/kernel/cpu/intel.c   |  4 ++--
 arch/x86/kernel/cpu/proc.c|  6 +++---
 arch/x86/kernel/process.c |  2 +-
 arch/x86/kernel/setup.c   |  2 --
 arch/x86/kvm/svm.c|  2 +-
 arch/x86/mm/fault.c   |  2 +-
 13 files changed, 62 insertions(+), 49 deletions(-)

-- 
1.8.1.3.535.ga923c31


[PATCH 2/6] x86, cpu: Convert F00F bug detection

2013-03-20 Thread Borislav Petkov

From: Borislav Petkov 

... to using the new facility and drop the cpuinfo_x86 member.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 2 ++
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/cpu/intel.c   | 4 ++--
 arch/x86/kernel/cpu/proc.c| 2 +-
 arch/x86/mm/fault.c   | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 16190abd8905..c0ad7e75815c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -222,6 +222,8 @@
  */
 #define X86_BUG(x) (NCAPINTS*32 + (x))
 
+#define X86_BUG_F00F   X86_BUG(0) /* Intel F00F */
+
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
 #include 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 23c8081d3870..1e55e2d543b5 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -92,7 +92,6 @@ struct cpuinfo_x86 {
charhard_math;
charrfu;
charfdiv_bug;
-   charf00f_bug;
charcoma_bug;
charpad0;
 #else
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 1905ce98bee0..1acdd42d86d1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -209,11 +209,11 @@ static void __cpuinit intel_workarounds(struct 
cpuinfo_x86 *c)
 * system.
 * Note that the workaround only should be initialized once...
 */
-   c->f00f_bug = 0;
+   clear_cpu_bug(c, X86_BUG_F00F);
if (!paravirt_enabled() && c->x86 == 5) {
static int f00f_workaround_enabled;
 
-   c->f00f_bug = 1;
+   set_cpu_bug(c, X86_BUG_F00F);
if (!f00f_workaround_enabled) {
trap_init_f00f_bug();
printk(KERN_NOTICE "Intel Pentium with F0 0F bug - 
workaround enabled.\n");
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index e280253f6f94..2d60b2bec01c 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -35,7 +35,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct 
cpuinfo_x86 *c)
   "cpuid level\t: %d\n"
   "wp\t\t: %s\n",
   c->fdiv_bug ? "yes" : "no",
-  c->f00f_bug ? "yes" : "no",
+  static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
   c->coma_bug ? "yes" : "no",
   c->hard_math ? "yes" : "no",
   c->hard_math ? "yes" : "no",
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fa8c02de0d25..3da69dbc0185 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -555,7 +555,7 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long 
address)
/*
 * Pentium F0 0F C7 C8 bug workaround:
 */
-   if (boot_cpu_data.f00f_bug) {
+   if (boot_cpu_has_bug(X86_BUG_F00F)) {
nr = (address - idt_descr.address) >> 3;
 
if (nr == 6) {
-- 
1.8.1.3.535.ga923c31


[PATCH 6/6] x86, cpu: Convert AMD Erratum 400

2013-03-20 Thread Borislav Petkov

From: Borislav Petkov 

Convert AMD erratum 400 to the bug infrastructure. Then, retract all
exports for modules since they're not needed now and make the AMD
erratum checking machinery local to amd.c. Use forward declarations to
avoid shuffling too much code around needlessly.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/include/asm/processor.h  | 19 ---
 arch/x86/kernel/cpu/amd.c | 23 ---
 arch/x86/kernel/process.c |  2 +-
 4 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 4cfb5c9f6029..272fdccda5d5 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -226,6 +226,7 @@
 #define X86_BUG_FDIV   X86_BUG(1) /* FPU FDIV */
 #define X86_BUG_COMA   X86_BUG(2) /* Cyrix 6x86 coma */
 #define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* AMD Erratum 383 */
+#define X86_BUG_AMD_APIC_C1E   X86_BUG(4) /* AMD Erratum 400 */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index e044ef35f91f..4b3b43bb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -970,25 +970,6 @@ unsigned long calc_aperfmperf_ratio(struct aperfmperf *old,
return ratio;
 }
 
-/*
- * AMD errata checking
- */
-#ifdef CONFIG_CPU_SUP_AMD
-extern const int amd_erratum_400[];
-extern bool cpu_has_amd_erratum(const int *);
-
-#define AMD_LEGACY_ERRATUM(...){ -1, __VA_ARGS__, 0 }
-#define AMD_OSVW_ERRATUM(osvw_id, ...) { osvw_id, __VA_ARGS__, 0 }
-#define AMD_MODEL_RANGE(f, m_start, s_start, m_end, s_end) \
-   ((f << 24) | (m_start << 16) | (s_start << 12) | (m_end << 4) | (s_end))
-#define AMD_MODEL_RANGE_FAMILY(range)  (((range) >> 24) & 0xff)
-#define AMD_MODEL_RANGE_START(range)   (((range) >> 12) & 0xfff)
-#define AMD_MODEL_RANGE_END(range) ((range) & 0xfff)
-
-#else
-#define cpu_has_amd_erratum(x) (false)
-#endif /* CONFIG_CPU_SUP_AMD */
-
 extern unsigned long arch_align_stack(unsigned long sp);
 extern void free_init_pages(char *what, unsigned long begin, unsigned long 
end);
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 85f84e13d2dd..9a2a71669c5d 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -514,6 +514,8 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 }
 
 static const int amd_erratum_383[];
+static const int amd_erratum_400[];
+static bool cpu_has_amd_erratum(const int *erratum);
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 {
@@ -734,6 +736,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
}
 
+   if (cpu_has_amd_erratum(amd_erratum_400))
+   set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
+
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
 }
 
@@ -852,8 +857,7 @@ cpu_dev_register(amd_cpu_dev);
  * AMD_OSVW_ERRATUM() macros. The latter is intended for newer errata that
  * have an OSVW id assigned, which it takes as first argument. Both take a
  * variable number of family-specific model-stepping ranges created by
- * AMD_MODEL_RANGE(). Each erratum also has to be declared as extern const
- * int[] in arch/x86/include/asm/processor.h.
+ * AMD_MODEL_RANGE().
  *
  * Example:
  *
@@ -863,15 +867,22 @@ cpu_dev_register(amd_cpu_dev);
  *AMD_MODEL_RANGE(0x10, 0x9, 0x0, 0x9, 0x0));
  */
 
-const int amd_erratum_400[] =
+#define AMD_LEGACY_ERRATUM(...){ -1, __VA_ARGS__, 0 }
+#define AMD_OSVW_ERRATUM(osvw_id, ...) { osvw_id, __VA_ARGS__, 0 }
+#define AMD_MODEL_RANGE(f, m_start, s_start, m_end, s_end) \
+   ((f << 24) | (m_start << 16) | (s_start << 12) | (m_end << 4) | (s_end))
+#define AMD_MODEL_RANGE_FAMILY(range)  (((range) >> 24) & 0xff)
+#define AMD_MODEL_RANGE_START(range)   (((range) >> 12) & 0xfff)
+#define AMD_MODEL_RANGE_END(range) ((range) & 0xfff)
+
+static const int amd_erratum_400[] =
AMD_OSVW_ERRATUM(1, AMD_MODEL_RANGE(0xf, 0x41, 0x2, 0xff, 0xf),
AMD_MODEL_RANGE(0x10, 0x2, 0x1, 0xff, 0xf));
-EXPORT_SYMBOL_GPL(amd_erratum_400);
 
 static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
 
-bool cpu_has_amd_erratum(const int *erratum)
+static bool cpu_has_amd_erratum(const int *erratum)
 {
struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
int osvw_id = *erratum++;
@@ -912,5 +923,3 @@ bool cpu_has_amd_erratum(const int *erratum)
 
return false;
 }
-
-EXPORT_SYMBOL_GPL(cpu_has_amd_erratum);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 14ae10031ff0..e718f150c880 1

[PATCH 3/6] x86, cpu: Convert FDIV bug detection

2013-03-20 Thread Borislav Petkov

From: Borislav Petkov 

... to the new facility. Add a reference to the Wikipedia article
explaining the FDIV test we're doing here.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/cpu/bugs.c| 7 ---
 arch/x86/kernel/cpu/proc.c| 2 +-
 arch/x86/kernel/setup.c   | 2 --
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index c0ad7e75815c..25eb9488a9a5 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -223,6 +223,7 @@
 #define X86_BUG(x) (NCAPINTS*32 + (x))
 
 #define X86_BUG_F00F   X86_BUG(0) /* Intel F00F */
+#define X86_BUG_FDIV   X86_BUG(1) /* FPU FDIV */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1e55e2d543b5..ea22dfaf6c5e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -91,7 +91,6 @@ struct cpuinfo_x86 {
/* Problems on some 486Dx4's and old 386's: */
charhard_math;
charrfu;
-   charfdiv_bug;
charcoma_bug;
charpad0;
 #else
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index af6455e3fcc9..c59635ecbbb8 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -59,7 +59,7 @@ static void __init check_fpu(void)
 * trap_init() enabled FXSR and company _before_ testing for FP
 * problems here.
 *
-* Test for the divl bug..
+* Test for the divl bug: http://en.wikipedia.org/wiki/Fdiv_bug
 */
__asm__("fninit\n\t"
"fldl %1\n\t"
@@ -75,9 +75,10 @@ static void __init check_fpu(void)
 
kernel_fpu_end();
 
-   boot_cpu_data.fdiv_bug = fdiv_bug;
-   if (boot_cpu_data.fdiv_bug)
+   if (fdiv_bug) {
+   set_cpu_bug(&boot_cpu_data, X86_BUG_FDIV);
pr_warn("Hmm, FPU with FDIV bug\n");
+   }
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 2d60b2bec01c..5dfb6c65138f 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -34,7 +34,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct 
cpuinfo_x86 *c)
   "fpu_exception\t: %s\n"
   "cpuid level\t: %d\n"
   "wp\t\t: %s\n",
-  c->fdiv_bug ? "yes" : "no",
+  static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
   static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
   c->coma_bug ? "yes" : "no",
   c->hard_math ? "yes" : "no",
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 90d8cc930f5e..29258c75a2f3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -173,12 +173,10 @@ static struct resource bss_resource = {
 /* cpu data as detected by the assembly code in head.S */
 struct cpuinfo_x86 new_cpu_data __cpuinitdata = {
.wp_works_ok = -1,
-   .fdiv_bug = -1,
 };
 /* common cpu data for all cpus */
 struct cpuinfo_x86 boot_cpu_data __read_mostly = {
.wp_works_ok = -1,
-   .fdiv_bug = -1,
 };
 EXPORT_SYMBOL(boot_cpu_data);
 
-- 
1.8.1.3.535.ga923c31


[PATCH 5/6] x86, cpu: Convert AMD Erratum 383

2013-03-20 Thread Borislav Petkov

From: Borislav Petkov 

Convert the AMD erratum 383 testing code to the bug infrastructure. This
allows keeping the AMD-specific erratum testing machinery private to
amd.c and avoids exporting symbols to modules needlessly.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/cpu/amd.c | 8 ++--
 arch/x86/kvm/svm.c| 2 +-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index a2b65c11081e..4cfb5c9f6029 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -225,6 +225,7 @@
 #define X86_BUG_F00F   X86_BUG(0) /* Intel F00F */
 #define X86_BUG_FDIV   X86_BUG(1) /* FPU FDIV */
 #define X86_BUG_COMA   X86_BUG(2) /* Cyrix 6x86 coma */
+#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* AMD Erratum 383 */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4e2fa2859e39..e044ef35f91f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -974,7 +974,6 @@ unsigned long calc_aperfmperf_ratio(struct aperfmperf *old,
  * AMD errata checking
  */
 #ifdef CONFIG_CPU_SUP_AMD
-extern const int amd_erratum_383[];
 extern const int amd_erratum_400[];
 extern bool cpu_has_amd_erratum(const int *);
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index fa96eb0d02fb..85f84e13d2dd 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -513,6 +513,8 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 #endif
 }
 
+static const int amd_erratum_383[];
+
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 {
u32 dummy;
@@ -727,6 +729,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
rdmsrl_safe(MSR_AMD64_BU_CFG2, &value);
value &= ~(1ULL << 24);
wrmsrl_safe(MSR_AMD64_BU_CFG2, value);
+
+   if (cpu_has_amd_erratum(amd_erratum_383))
+   set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
}
 
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -863,9 +868,8 @@ const int amd_erratum_400[] =
AMD_MODEL_RANGE(0x10, 0x2, 0x1, 0xff, 0xf));
 EXPORT_SYMBOL_GPL(amd_erratum_400);
 
-const int amd_erratum_383[] =
+static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
-EXPORT_SYMBOL_GPL(amd_erratum_383);
 
 bool cpu_has_amd_erratum(const int *erratum)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e1b1ce21bc00..7d39d70647e3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -555,7 +555,7 @@ static void svm_init_erratum_383(void)
int err;
u64 val;
 
-   if (!cpu_has_amd_erratum(amd_erratum_383))
+   if (!static_cpu_has_bug(X86_BUG_AMD_TLB_MMATCH))
return;
 
/* Use _safe variants to not break nested virtualization */
-- 
1.8.1.3.535.ga923c31
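
For readers following the erratum tables, here is a minimal userspace
sketch of how a family/model/stepping range packs into one int and how a
CPU could be matched against it. The macros are copied from the series;
the arrays reuse the values shown for errata 383 and 400. Everything
prefixed sketch_ is hypothetical, and the matching loop is an assumption
about what cpu_has_amd_erratum() does with the ranges: the OSVW id in
the first slot is skipped here rather than checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Macros as they appear in the patches of this series. */
#define AMD_OSVW_ERRATUM(osvw_id, ...) { osvw_id, __VA_ARGS__, 0 }
#define AMD_MODEL_RANGE(f, m_start, s_start, m_end, s_end) \
	((f << 24) | (m_start << 16) | (s_start << 12) | (m_end << 4) | (s_end))
#define AMD_MODEL_RANGE_FAMILY(range)	(((range) >> 24) & 0xff)
#define AMD_MODEL_RANGE_START(range)	(((range) >> 12) & 0xfff)
#define AMD_MODEL_RANGE_END(range)	((range) & 0xfff)

/* Same initializers as amd_erratum_383[] and amd_erratum_400[]. */
static const int sketch_erratum_383[] =
	AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));
static const int sketch_erratum_400[] =
	AMD_OSVW_ERRATUM(1, AMD_MODEL_RANGE(0xf, 0x41, 0x2, 0xff, 0xf),
			    AMD_MODEL_RANGE(0x10, 0x2, 0x1, 0xff, 0xf));

/*
 * Hypothetical helper: walk the zero-terminated range list and report
 * whether (family, model, stepping) falls inside any range. Model and
 * stepping are combined into a 12-bit value, matching the START/END
 * fields of the packed range.
 */
static bool sketch_erratum_applies(const int *erratum, int family,
				   int model, int stepping)
{
	int ms = (model << 4) | stepping;
	int range;

	assert(erratum);
	erratum++;			/* skip the OSVW id, not modelled here */

	while ((range = *erratum++)) {
		if (family == AMD_MODEL_RANGE_FAMILY(range) &&
		    ms >= AMD_MODEL_RANGE_START(range) &&
		    ms <= AMD_MODEL_RANGE_END(range))
			return true;
	}
	return false;
}
```

With these tables, sketch_erratum_applies(sketch_erratum_400, 0x10, 0x2,
0x1) is true while stepping 0x0 of the same model is not, because the
second range starts at model 0x2, stepping 0x1.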


[PATCH 4/6] x86, cpu: Convert Cyrix coma bug detection

2013-03-20 Thread Borislav Petkov

From: Borislav Petkov 

... to the new facility.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/cpu/cyrix.c   | 5 +++--
 arch/x86/kernel/cpu/proc.c| 2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 25eb9488a9a5..a2b65c11081e 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -224,6 +224,7 @@
 
 #define X86_BUG_F00F   X86_BUG(0) /* Intel F00F */
 #define X86_BUG_FDIV   X86_BUG(1) /* FPU FDIV */
+#define X86_BUG_COMA   X86_BUG(2) /* Cyrix 6x86 coma */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ea22dfaf6c5e..4e2fa2859e39 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -91,7 +91,6 @@ struct cpuinfo_x86 {
/* Problems on some 486Dx4's and old 386's: */
charhard_math;
charrfu;
-   charcoma_bug;
charpad0;
 #else
/* Number of 4K pages in DTLB/ITLB combined(in pages): */
diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index 4fbd384fb645..d048d5ca43c1 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -249,7 +249,7 @@ static void __cpuinit init_cyrix(struct cpuinfo_x86 *c)
/* Emulate MTRRs using Cyrix's ARRs. */
set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
/* 6x86's contain this bug */
-   c->coma_bug = 1;
+   set_cpu_bug(c, X86_BUG_COMA);
break;
 
case 4: /* MediaGX/GXm or Geode GXM/GXLV/GX1 */
@@ -317,7 +317,8 @@ static void __cpuinit init_cyrix(struct cpuinfo_x86 *c)
/* Enable MMX extensions (App note 108) */
setCx86_old(CX86_CCR7, getCx86_old(CX86_CCR7)|1);
} else {
-   c->coma_bug = 1;  /* 6x86MX, it has the bug. */
+   /* A 6x86MX - it has the bug. */
+   set_cpu_bug(c, X86_BUG_COMA);
}
tmp = (!(dir0_lsn & 7) || dir0_lsn & 1) ? 2 : 0;
Cx86_cb[tmp] = cyrix_model_mult2[dir0_lsn & 7];
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 5dfb6c65138f..37a198bd48c8 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -36,7 +36,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct 
cpuinfo_x86 *c)
   "wp\t\t: %s\n",
   static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
   static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
-  c->coma_bug ? "yes" : "no",
+  static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
   c->hard_math ? "yes" : "no",
   c->hard_math ? "yes" : "no",
   c->cpuid_level,
-- 
1.8.1.3.535.ga923c31


[PATCH 1/6] x86, cpu: Expand cpufeature facility to include cpu bugs

2013-03-20 Thread Borislav Petkov

From: Borislav Petkov 

We add another 32-bit vector at the end of the ->x86_capability
bitvector which collects bugs present in CPUs. After all, a CPU bug is a
kind of a capability, albeit a strange one.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 13 +
 arch/x86/include/asm/processor.h  |  2 +-
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/cpu/common.c  |  4 
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 93fe929d1cee..16190abd8905 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -9,6 +9,7 @@
 #endif
 
 #define NCAPINTS   10  /* N 32-bit words worth of info */
+#define NBUGINTS   1   /* N 32-bit bug flags */
 
 /*
  * Note: If the comment begins with a quoted string, that string is used
@@ -216,6 +217,11 @@
 #define X86_FEATURE_ADX(9*32+19) /* The ADCX and ADOX 
instructions */
 #define X86_FEATURE_SMAP   (9*32+20) /* Supervisor Mode Access Prevention 
*/
 
+/*
+ * BUG word(s)
+ */
+#define X86_BUG(x) (NCAPINTS*32 + (x))
+
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
 #include 
@@ -401,6 +407,13 @@ static __always_inline __pure bool __static_cpu_has(u16 
bit)
 #define static_cpu_has(bit) boot_cpu_has(bit)
 #endif
 
+#define cpu_has_bug(c, bit)cpu_has(c, (bit))
+#define set_cpu_bug(c, bit)set_cpu_cap(c, (bit))
+#define clear_cpu_bug(c, bit)  clear_cpu_cap(c, (bit))
+
+#define static_cpu_has_bug(bit)static_cpu_has((bit))
+#define boot_cpu_has_bug(bit)  cpu_has_bug(&boot_cpu_data, (bit))
+
 #endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */
 
 #endif /* _ASM_X86_CPUFEATURE_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3270116b1488..23c8081d3870 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -107,7 +107,7 @@ struct cpuinfo_x86 {
__u32   extended_cpuid_level;
/* Maximum supported CPUID level, -1=no CPUID: */
int cpuid_level;
-   __u32   x86_capability[NCAPINTS];
+   __u32   x86_capability[NCAPINTS + NBUGINTS];
char x86_vendor_id[16];
char x86_model_id[64];
/* in KB - valid for CPUS which support this call: */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ef5ccca79a6c..c15cf9a25e27 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -271,7 +271,7 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
replacement = (u8 *)&a->repl_offset + a->repl_offset;
BUG_ON(a->replacementlen > a->instrlen);
BUG_ON(a->instrlen > sizeof(insnbuf));
-   BUG_ON(a->cpuid >= NCAPINTS*32);
+   BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
if (!boot_cpu_has(a->cpuid))
continue;
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d814772c5bed..22018f70a671 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -920,6 +920,10 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
/* AND the already accumulated flags with these */
for (i = 0; i < NCAPINTS; i++)
boot_cpu_data.x86_capability[i] &= c->x86_capability[i];
+
+   /* OR, i.e. replicate the bug flags */
+   for (i = NCAPINTS; i < NCAPINTS + NBUGINTS; i++)
+   c->x86_capability[i] |= boot_cpu_data.x86_capability[i];
}
 
/* Init Machine Check Exception if available. */
-- 
1.8.1.3.535.ga923c31
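
A minimal user-space sketch of the layout this patch introduces, assuming
nothing beyond NCAPINTS, NBUGINTS and X86_BUG() as defined above. The names
struct cpu_sketch, cap_test(), cap_set(), merge_with_boot() and
X86_BUG_EXAMPLE are hypothetical and exist only for illustration; the kernel
uses struct cpuinfo_x86 and its own bitops instead.

```c
#include <assert.h>
#include <stdint.h>

#define NCAPINTS 10                      /* feature words, as in the patch */
#define NBUGINTS 1                       /* bug words appended after them */
#define X86_BUG(x) (NCAPINTS * 32 + (x)) /* bit index of bug number x */

/* Hypothetical bug number, used only for this illustration. */
#define X86_BUG_EXAMPLE X86_BUG(0)

struct cpu_sketch {
	uint32_t x86_capability[NCAPINTS + NBUGINTS];
};

/* Test one bit of the combined feature/bug vector. */
static int cap_test(const struct cpu_sketch *c, unsigned int bit)
{
	return (c->x86_capability[bit / 32] >> (bit % 32)) & 1;
}

/* Set one bit of the combined feature/bug vector. */
static void cap_set(struct cpu_sketch *c, unsigned int bit)
{
	c->x86_capability[bit / 32] |= UINT32_C(1) << (bit % 32);
}

/*
 * Mirrors the identify_cpu() hunk: feature words are ANDed into the boot
 * CPU's set, bug words of the boot CPU are ORed into the CPU being
 * identified.
 */
static void merge_with_boot(struct cpu_sketch *boot, struct cpu_sketch *c)
{
	int i;

	for (i = 0; i < NCAPINTS; i++)
		boot->x86_capability[i] &= c->x86_capability[i];

	for (i = NCAPINTS; i < NCAPINTS + NBUGINTS; i++)
		c->x86_capability[i] |= boot->x86_capability[i];
}
```

With NCAPINTS at 10, X86_BUG(0) lands on bit 320, the first bit of word 10.
Adding a feature word later only bumps NCAPINTS and the bug bits shift along
with it, which is why the same accessors can serve both kinds of flags.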


Re: [Nouveau] nouveau lockdep splat

2013-03-20 Thread Borislav Petkov

On Wed, Mar 20, 2013 at 07:23:19PM +0400, Lijo Antony wrote:
> # bad: [fffddfd6c8e0c10c42c6e2cc54ba880fcc36ebbb] Merge branch
> 'drm-next' of git://people.freedesktop.org/~airlied/linux
> git bisect bad fffddfd6c8e0c10c42c6e2cc54ba880fcc36ebbb

This is a merge commit which means something went wrong along the way of
the bisection.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: AMD A6 3650 getting Kernel Panics in Ubuntu 12.10

2013-03-20 Thread Borislav Petkov

On Wed, Mar 20, 2013 at 11:10:58AM -0400, Rob Edwards wrote:
> I posted a question at
> http://www.dslreports.com/forum/r28112391-AMD-A6-3650-getting-Kernel-Panics-in-Ubuntu-12.10-
> looking for help with my Ubuntu machine and some strange kernel panics I
> have been getting and was directed here.  I can't run them through mcelog
> as it says the CPU is unknown.  (Too new, I guess?)  I won't paste the
> contents of the logs in the email (it's at the URL I provided) unless
> asked to.  Does anyone know what I should be looking at for the problem?

Can you modprobe edac_mce_amd and try to catch dmesg when those errors
happen again? You don't need mcelog on AMD - the decoded MCEs are
already in the kernel log.

Btw, what kernel is Ubuntu 12.10?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/9] Add blockconsole version 1.1 (try 2)

2013-03-20 Thread Borislav Petkov

Hi Andrew,

On Thu, Feb 28, 2013 at 04:39:53PM -0500, Joern Engel wrote:
> Blockconsole is a console driver very roughly similar to netconsole.
> Instead of sending messages out via UDP, they are written to a block
> device.  Typically a USB stick is chosen, although in principle any
> block device will do.
> 
> In most cases blockconsole is useful where netconsole is not, i.e.
> single machines without network access or without an accessible
> netconsole capture server.  When using both blockconsole and
> netconsole, I have found netconsole to sometimes create a mess under
> high message load (sysrq-t, etc.) while blockconsole does not.
> 
> Most importantly, a number of bugs were identified and fixed that
> would have been unexplained machine reboots without blockconsole.
> 
> More highlights:
> * reasonably small and self-contained code,
> * some 100+ machine years of runtime,
> * nice tutorial with a 30-sec guide for the impatient.

any thoughts on this?

Blockconsole is very useful in certain situations and a bunch of people
are using it already and it would be nice if we could get it moving
towards upstream.

So I'd appreciate it if you could take a look and maybe even pick it up
if there are no serious issues with it.

Thanks a bunch.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] nohz1: Documentation

2013-03-21 Thread Borislav Petkov

On Wed, Mar 20, 2013 at 07:22:59PM -0700, Paul E. McKenney wrote:
> > > > > The "full_nohz=" boot parameter specifies which CPUs are to be
> > > > > adaptive-ticks CPUs.  For example, "full_nohz=1,6-8" says that CPUs 1,
> > > > 
> > > > This is the first time you mention "adaptive-ticks". Probably should
> > > > define it before just using it, even though one should be able to figure
> > > > out what adaptive-ticks are, it does throw in a wrench when reading this
> > > > if you have no idea what an "adaptive-tick" is.
> > > 
> > > Good point, changed the first sentence of this paragraph to read:
> > > 
> > >   The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to
> > >   avoid sending scheduling-clock interrupts to CPUs with a single
> > >   runnable task, and such CPUs are said to be "adaptive-ticks CPUs".
> > 
> > Sounds good.

Yeah,

so I read this last night too and I have to say, very clearly written,
even for dummies like me.

But this "adaptive-ticks CPUs" reads kinda strange throughout the whole
text, it feels a bit weird. And since the cmdline option is called
"full_nohz", you might just as well call them the "full_nohz CPUs" or
the "full_nohz subset of CPUs" for simplicity and so that you don't have
yet another new term in the text denoting the same idea. I mean, all
those names kinda suck and need the full definition of what adaptive
ticking actually means anyway. :)

Btw, congrats on coining a new noun: "Adaptive-tick mode may prevent
this round-robining from happening."
     ^^^^^^^^^^^^^^

Funny. :-)

I spose now one can say: "The kids in the garden are round-robining on
the carousel."

or

"The kernel developers are round-robined for pull requests."

Or maybe it wasn't you who coined it after /me doing a little search. It
looks like technical people are pushing hard for it to be committed in
the upstream English language repository. :-)

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] nohz1: Documentation

2013-03-21 Thread Borislav Petkov

On Thu, Mar 21, 2013 at 08:18:11AM -0700, Paul E. McKenney wrote:
> Actually, this is a generic transformation. Given an English verb,
> you almost always add "ing" to create a noun. Since "round-robin" is
> used as a verb,

... which sounds, in this case, weird IMHO. :-)

> as in "The scheduler will round-robin between the two SCHED_RR
> tasks",

I think the "correct" way to say it is "The scheduler will select tasks
in a round-robin fashion..." But while it is correct (for some accepted
definition of correct), this is slow, has too many words and we don't
want that - we want fast! We want a lot fewer instructions in the pipe!
This way, we burn a lot less energy when talking. :-)

> "round-robining" may be used as a noun denoting the action
> corresponding to the verb "round-robin". There is no doubt an
> argument as to whether this should be spelled "round-robining" or
> "round-robinning", but I will leave this to those who care enough to
> argue about it. ;-)

Hey sir, you're preaching to the choir - I'm all for doing all kinds of
weird/funny experiments with language...

> The thing about English is that it is an open-source language, and
> always has been. English is defined by its usage, and the wise
> dictionary-makers try their best to keep up.

... yes, and then there are the English language Nazis who wouldn't
allow that - their rules are stricter than software APIs and breaking
userspace compatibility.

Technical people, OTOH, are much more willing and not afraid to take the
language and mold it into a form that works for them instead of adhering
to ancient rules. Which is cool. That's why I was pointing out the
"round-robining" - nice and cool. And look how much shorter it is:

round-robining = iterate over the items on a list by periodically
switching from one to the next in a circular order.

Now imagine the pressure on I$ the two versions create. And compare. :-)

> (The unwise ones attempt to stop the evolution of the English
> language.) Everything good and everything bad about English stems from
> this property. ;-)

Yeah, I've had to deal with enough of those evolution-stopping idiots
during my days at the university. Well, I've got three words for them:
"Resistance is futile!"

:-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: Uhhuh. NMI received for unknown reason 2c on CPU 0.

2013-02-03 Thread Borislav Petkov

On Sun, Feb 03, 2013 at 12:04:46AM +0100, Rafael J. Wysocki wrote:
> The [2/5] is at: https://patchwork.kernel.org/patch/2001211/
> 
> The other two are attached.  I suppose the ordering doesn't matter.

Ok, the eth link cable hotplugging issue seems fixed, plugging and
unplugging the cable works as expected.

The issue I triggered earlier:

> BUT(!), if I start powertop and set all tunables in the "Tunables" tab
> to "Good", then suspend to disk, when I resume I get the NMI and this
> time the unknown reason is 0x3c.

... still happens:

[  123.250870] PM: Creating hibernation image:
[  123.504940] PM: Need to copy 95667 pages    <--- suspend to disk
[  123.252841] Enabling non-boot CPUs ...      <--- resume
[  123.254021] SMP alternatives: lockdep: fixing up alternatives
[  123.254026] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  123.275566] CPU1 is up
[  123.275697] SMP alternatives: lockdep: fixing up alternatives
[  123.275699] smpboot: Booting Node 0 Processor 2 APIC 0x2
[  123.297581] CPU2 is up
[  123.297699] SMP alternatives: lockdep: fixing up alternatives
[  123.297701] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  123.319358] CPU3 is up
[  123.321928] i915 0000:00:02.0: power state changed by ACPI to D0
[  123.321992] xhci_hcd 0000:00:14.0: power state changed by ACPI to D0
[  123.333256] ehci-pci 0000:00:1a.0: power state changed by ACPI to D0
[  123.344541] ehci-pci 0000:00:1d.0: power state changed by ACPI to D0
[  123.345012] sdhci-pci 0000:02:00.0: MMC controller base frequency changed to 50Mhz.
[  123.345744] PM: noirq restore of devices complete after 24.061 msecs
[  123.346684] PM: early restore of devices complete after 0.836 msecs
[  123.389863] i915 0000:00:02.0: setting latency timer to 64
[  123.389870] xhci_hcd 0000:00:14.0: setting latency timer to 64
[  123.389887] ehci-pci 0000:00:1a.0: setting latency timer to 64
[  123.389907] usb usb3: root hub lost power or was reset
[  123.389908] usb usb1: root hub lost power or was reset
[  123.389909] usb usb2: root hub lost power or was reset
[  123.390034] e1000e 0000:00:19.0: irq 44 for MSI/MSI-X
[  123.390171] xhci_hcd 0000:00:14.0: irq 45 for MSI/MSI-X
[  123.390308] snd_hda_intel 0000:00:1b.0: irq 47 for MSI/MSI-X
[  123.391013] ehci-pci 0000:00:1d.0: setting latency timer to 64
[  123.391038] usb usb4: root hub lost power or was reset
[  123.393798] ehci-pci 0000:00:1a.0: cache line size of 64 is not supported
[  123.394115] ahci 0000:00:1f.2: setting latency timer to 64
[  123.394229] iwlwifi 0000:03:00.0: RF_KILL bit toggled to disable radio.
[  123.394923] ehci-pci 0000:00:1d.0: cache line size of 64 is not supported
[  123.697314] usb 3-1: reset high-speed USB device number 2 using ehci-pci
[  123.698252] ata2: SATA link down (SStatus 0 SControl 300)
[  123.699286] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  123.701259] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  123.701287] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[  123.701291] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[  123.702699] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[  123.702703] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[  123.703222] ata5: SATA link down (SStatus 0 SControl 300)
[  123.704603] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[  123.704606] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[  123.705938] ata3.00: configured for UDMA/100
[  123.706033] sd 2:0:0:0: [sdb] Starting disk
[  123.706041] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[  123.706045] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[  123.735336] ata1.00: configured for UDMA/133
[  123.735662] sd 0:0:0:0: [sda] Starting disk
[  123.912740] usb 4-1: reset high-speed USB device number 2 using ehci-pci
[  124.129520] PM: restore of devices complete after 741.589 msecs
[  124.174684] Restarting tasks ... done.
[  124.177521] video LNXVIDEO:00: Restoring backlight state
[  124.186033] xhci_hcd 0000:00:14.0: power state changed by ACPI to D3cold
[  124.214931] ehci-pci 0000:00:1a.0: power state changed by ACPI to D3cold
[  124.214970] ehci-pci 0000:00:1d.0: power state changed by ACPI to D3cold
[  124.394882] Uhhuh. NMI received for unknown reason 3c on CPU 0.    <--- FUN.
[  124.394890] Do you have a strange power saving mode enabled?
[  124.394892] Dazed and confused, but trying to continue
[  124.407438] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[  127.035581] ehci-pci 0000:00:1a.0: power state changed by ACPI to D0
[  127.135668] ehci-pci 0000:00:1a.0: setting latency timer to 64
[  127.135910] ehci-pci 0000:00:1d.0: power state changed by ACPI to D0
[  127.146500] ehci-pci 0000:00:1a.0: power state changed by ACPI to D3cold

[PATCH 0/4] x86, head_32: Some cleanups

2013-02-03 Thread Borislav Petkov

From: Borislav Petkov 

Hi,

here are some initial low-hanging fruits wrt head_32.S cleanup. I've
made them as easily digestible as possible; after all, this is boot asm
and meddling with it tends to upset kernels.

Also, I've made the assumption that having boot_cpu_data.cpuid_level
contain the CPUID level for the boot cpu means that the APs have the
same CPUID level. This should be the case on X86.

They boot fine on 486 and 486SX in qemu but I'd like to hear whether
the direction I'm going is ok before I continue testing them on real
hardware.

Thanks.


Borislav Petkov (4):
  x86, head_32: Remove i386 pieces
  x86: Detect CPUID support early at boot
  x86, head_32: Remove CPUID detection from default_entry
  x86, 32-bit: Drop new_cpu_data

 arch/x86/include/asm/processor.h |   1 -
 arch/x86/kernel/head_32.S        | 105 ---
 arch/x86/kernel/setup.c  |   3 --
 arch/x86/lguest/boot.c   |   6 +--
 arch/x86/xen/enlighten.c |   8 +--
 5 files changed, 51 insertions(+), 72 deletions(-)

-- 
1.8.1.2.422.g08c0e7f


[PATCH 1/4] x86, head_32: Remove i386 pieces

2013-02-03 Thread Borislav Petkov

From: Borislav Petkov 

Remove code fragments detecting a 386 CPU since we don't support those
anymore. Also, do not do alignment checks because they're done only at
CPL3. Also, no need to preserve EFLAGS.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index c8932c79e78b..a9c5cc851285 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -394,30 +394,21 @@ default_entry:
jz 1f   # Did we do this already?
call *%eax
 1:
-   
-/* check if it is 486 or 386. */
+
 /*
- * XXX - this does a lot of unnecessary setup.  Alignment checks don't
- * apply at our cpl of 0 and the stack ought to be aligned already, and
- * we don't need to preserve eflags.
+ * Check if it is 486
  */
movl $-1,X86_CPUID  # -1 for no CPUID initially
-   movb $3,X86 # at least 386
+   movb $4,X86 # at least 486
pushfl  # push EFLAGS
popl %eax   # get EFLAGS
movl %eax,%ecx  # save original EFLAGS
-   xorl $0x240000,%eax # flip AC and ID bits in EFLAGS
+   xorl $0x200000,%eax # flip ID bit in EFLAGS
pushl %eax  # copy to EFLAGS
popfl   # set EFLAGS
pushfl  # get new EFLAGS
popl %eax   # put it in eax
xorl %ecx,%eax  # change in flags
-   pushl %ecx  # restore original EFLAGS
-   popfl
-   testl $0x40000,%eax # check if AC bit changed
-   je is386
-
-   movb $4,X86 # at least 486
testl $0x200000,%eax # check if ID bit changed
je is486
 
@@ -445,10 +436,7 @@ default_entry:
movl %edx,X86_CAPABILITY
 
 is486: movl $0x50022,%ecx  # set AM, WP, NE and MP
-   jmp 2f
-
-is386: movl $2,%ecx # set MP
-2: movl %cr0,%eax
+   movl %cr0,%eax
andl $0x80000011,%eax   # Save PG,PE,ET
orl %ecx,%eax
movl %eax,%cr0
-- 
1.8.1.2.422.g08c0e7f
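
The probe this patch simplifies can be modelled in plain C. The following is
only a sketch under stated assumptions: struct fake_cpu and its 'writable'
mask are a hypothetical stand-in for real hardware, and write_eflags()
replaces the pushl/popfl pair. The two bit values are the architectural ones:
AC is EFLAGS bit 18 (0x40000) and ID is bit 21 (0x200000).

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_AC UINT32_C(0x00040000) /* bit 18: alignment check */
#define EFLAGS_ID UINT32_C(0x00200000) /* bit 21: CPUID available */

/* Hypothetical model: software can change only the bits in 'writable'. */
struct fake_cpu {
	uint32_t eflags;
	uint32_t writable;
};

/* Stand-in for the pushl/popfl pair. */
static void write_eflags(struct fake_cpu *cpu, uint32_t val)
{
	cpu->eflags = (cpu->eflags & ~cpu->writable) | (val & cpu->writable);
}

/* Flip 'mask', read the flags back, and report whether all of it changed. */
static int can_toggle(struct fake_cpu *cpu, uint32_t mask)
{
	uint32_t orig = cpu->eflags;

	write_eflags(cpu, orig ^ mask);
	return ((cpu->eflags ^ orig) & mask) == mask;
}

/* Pre-patch classification: a stuck AC bit meant a 386. */
static int family_before_patch(struct fake_cpu *cpu)
{
	return can_toggle(cpu, EFLAGS_AC) ? 4 : 3;
}

/* Post-patch: the family is at least 4 and only the ID bit is probed. */
static int has_cpuid(struct fake_cpu *cpu)
{
	return can_toggle(cpu, EFLAGS_ID);
}
```

Like the patched assembly, can_toggle() does not bother restoring the
original flags; in the model a second probe simply flips the bit back.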


[PATCH 3/4] x86, head_32: Remove CPUID detection from default_entry

2013-02-03 Thread Borislav Petkov

From: Borislav Petkov 

We do that once earlier now and cache it into boot_cpu_data.cpuid_level
so no need for the EFLAGS.ID toggling dance anymore.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 40 +++-
 1 file changed, 7 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index ce6b557017f4..0dba3598cf02 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -334,25 +334,14 @@ default_entry:
movl %eax,%cr0
 
 /*
- * New page tables may be in 4Mbyte page mode and may
- * be using the global pages. 
+ * New page tables may be in 4Mbyte page mode and may be using the global pages.
  *
- * NOTE! If we are on a 486 we may have no cr4 at all!
- * Specifically, cr4 exists if and only if CPUID exists
- * and has flags other than the FPU flag set.
+ * NOTE! If we are on a 486 we may have no cr4 at all! Specifically, cr4 exists
+ * if and only if CPUID exists (which has been checked above) and has flags
+ * other than the FPU flag set.
  */
-   movl $X86_EFLAGS_ID,%ecx
-   pushl %ecx
-   popfl
-   pushfl
-   popl %eax
-   pushl $0
-   popfl
-   pushfl
-   popl %edx
-   xorl %edx,%eax
-   testl %ecx,%eax
-   jz 6f   # No ID flag = no CPUID = no CR4
+   cmpl $-1, pa(X86_CPUID)
+   je 6f   # No CPUID = no CR4
 
movl $1,%eax
cpuid
@@ -389,7 +378,6 @@ default_entry:
btsl $_EFER_NX, %eax
/* Make changes effective */
wrmsr
-
 6:
 
 /*
@@ -417,21 +405,7 @@ default_entry:
call *%eax
 1:
 
-/*
- * Check if it is 486
- */
-   movl $-1,X86_CPUID  # -1 for no CPUID initially
-   movb $4,X86 # at least 486
-   pushfl  # push EFLAGS
-   popl %eax   # get EFLAGS
-   movl %eax,%ecx  # save original EFLAGS
-   xorl $0x200000,%eax # flip ID bit in EFLAGS
-   pushl %eax  # copy to EFLAGS
-   popfl   # set EFLAGS
-   pushfl  # get new EFLAGS
-   popl %eax   # put it in eax
-   xorl %ecx,%eax  # change in flags
-   testl $0x200000,%eax # check if ID bit changed
+   cmpl $-1,X86_CPUID
je is486
 
/* get vendor info */
-- 
1.8.1.2.422.g08c0e7f
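
The rule from the comment in this patch ("cr4 exists if and only if CPUID
exists and has flags other than the FPU flag set") can be written down as a
small predicate. This is a sketch, not kernel code: cr4_exists(), NO_CPUID
and FEATURE_FPU are hypothetical names, and the caller is assumed to pass
the cached CPUID level plus the EDX value of CPUID leaf 1.

```c
#include <assert.h>
#include <stdint.h>

#define NO_CPUID    (-1)          /* sentinel cached when CPUID is absent */
#define FEATURE_FPU UINT32_C(0x1) /* CPUID leaf 1, EDX bit 0 */

/* edx_leaf1 is only meaningful when cpuid_level != NO_CPUID. */
static int cr4_exists(int cpuid_level, uint32_t edx_leaf1)
{
	if (cpuid_level == NO_CPUID)
		return 0;	/* no CPUID = no CR4 */

	/* No flags at all, or only the FPU flag, also means no CR4. */
	return (edx_leaf1 & ~FEATURE_FPU) != 0;
}
```

The first test corresponds to the new "cmpl $-1, pa(X86_CPUID)" check, which
replaces the second EFLAGS.ID toggling sequence.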


[PATCH 4/4] x86, 32-bit: Drop new_cpu_data

2013-02-03 Thread Borislav Petkov

From: Borislav Petkov 

We copy it to boot_cpu_data anyway so use boot_cpu_data from the get-go.

Cc: Rusty Russell 
Cc: Konrad Rzeszutek Wilk 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/processor.h |  1 -
 arch/x86/kernel/head_32.S        | 17 -
 arch/x86/kernel/setup.c  |  3 ---
 arch/x86/lguest/boot.c   |  6 +++---
 arch/x86/xen/enlighten.c |  8 
 5 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cf500543f6ff..984223f93293 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -149,7 +149,6 @@ struct cpuinfo_x86 {
  * capabilities of CPUs
  */
 extern struct cpuinfo_x86  boot_cpu_data;
-extern struct cpuinfo_x86  new_cpu_data;
 
 extern struct tss_struct   doublefault_tss;
 extern __u32   cpu_caps_cleared[NCAPINTS];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 0dba3598cf02..fe77ec1202a5 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -27,17 +27,16 @@
 #define pa(X) ((X) - __PAGE_OFFSET)
 
 /*
- * References to members of the new_cpu_data structure.
+ * References to members of the boot_cpu_data structure.
  */
-#define X86 new_cpu_data+CPUINFO_x86
-#define X86_VENDOR new_cpu_data+CPUINFO_x86_vendor
-#define X86_MODEL  new_cpu_data+CPUINFO_x86_model
-#define X86_MASK   new_cpu_data+CPUINFO_x86_mask
-#define X86_HARD_MATH  new_cpu_data+CPUINFO_hard_math
-#define X86_CAPABILITY new_cpu_data+CPUINFO_x86_capability
-#define X86_VENDOR_ID  new_cpu_data+CPUINFO_x86_vendor_id
-
+#define X86 boot_cpu_data+CPUINFO_x86
+#define X86_VENDOR boot_cpu_data+CPUINFO_x86_vendor
+#define X86_MODEL  boot_cpu_data+CPUINFO_x86_model
+#define X86_MASK   boot_cpu_data+CPUINFO_x86_mask
+#define X86_HARD_MATH  boot_cpu_data+CPUINFO_hard_math
 #define X86_CPUID  boot_cpu_data+CPUINFO_cpuid_level
+#define X86_CAPABILITY boot_cpu_data+CPUINFO_x86_capability
+#define X86_VENDOR_ID  boot_cpu_data+CPUINFO_x86_vendor_id
 
 /*
  * This is how much memory in addition to the memory covered up to
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4f322b3eb078..548044f751fc 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -171,8 +171,6 @@ static struct resource bss_resource = {
 
 
 #ifdef CONFIG_X86_32
-/* cpu data as detected by the assembly code in head.S */
-struct cpuinfo_x86 new_cpu_data __cpuinitdata = {0, 0, 0, 0, -1, 1, 0, 0, -1};
 /* common cpu data for all cpus */
 struct cpuinfo_x86 boot_cpu_data __read_mostly = {0, 0, 0, 0, -1, 1, 0, 0, -1};
 EXPORT_SYMBOL(boot_cpu_data);
@@ -749,7 +747,6 @@ early_param("reservelow", parse_reservelow);
 void __init setup_arch(char **cmdline_p)
 {
 #ifdef CONFIG_X86_32
-   memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
visws_early_detect();
 
/*
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 1cbd89ca5569..bd222f2495f4 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1404,12 +1404,12 @@ __init void lguest_init(void)
 * This is messy CPU setup stuff which the native boot code does before
 * start_kernel, so we have to do, too:
 */
-   cpu_detect(&new_cpu_data);
+   cpu_detect(&boot_cpu_data);
/* head.S usually sets up the first capability word, so do it here. */
-   new_cpu_data.x86_capability[0] = cpuid_edx(1);
+   boot_cpu_data.x86_capability[0] = cpuid_edx(1);
 
/* Math is always hard! */
-   new_cpu_data.hard_math = 1;
+   boot_cpu_data.hard_math = 1;
 
/* We don't have features.  We have puppies!  Puppies! */
 #ifdef CONFIG_X86_MCE
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 8b4c56d85ca0..85871df3cc68 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1454,10 +1454,10 @@ asmlinkage void __init xen_start_kernel(void)
 
 #ifdef CONFIG_X86_32
/* set up basic CPUID stuff */
-   cpu_detect(&new_cpu_data);
-   new_cpu_data.hard_math = 1;
-   new_cpu_data.wp_works_ok = 1;
-   new_cpu_data.x86_capability[0] = cpuid_edx(1);
+   cpu_detect(&boot_cpu_data);
+   boot_cpu_data.hard_math = 1;
+   boot_cpu_data.wp_works_ok = 1;
+   boot_cpu_data.x86_capability[0] = cpuid_edx(1);
 #endif
 
/* Poke various useful things into boot_params */
-- 
1.8.1.2.422.g08c0e7f


[PATCH 2/4] x86: Detect CPUID support early at boot

2013-02-03 Thread Borislav Petkov

From: Borislav Petkov 

We detect CPUID function support on the boot CPU and save it for later
use, obviating the need to play the toggle EFLAGS.ID game every time. C
code is looking at ->cpuid_level anyway.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index a9c5cc851285..ce6b557017f4 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -29,16 +29,16 @@
 /*
  * References to members of the new_cpu_data structure.
  */
-
 #define X86 new_cpu_data+CPUINFO_x86
 #define X86_VENDOR new_cpu_data+CPUINFO_x86_vendor
 #define X86_MODEL  new_cpu_data+CPUINFO_x86_model
 #define X86_MASK   new_cpu_data+CPUINFO_x86_mask
 #define X86_HARD_MATH  new_cpu_data+CPUINFO_hard_math
-#define X86_CPUID  new_cpu_data+CPUINFO_cpuid_level
 #define X86_CAPABILITY new_cpu_data+CPUINFO_x86_capability
 #define X86_VENDOR_ID  new_cpu_data+CPUINFO_x86_vendor_id
 
+#define X86_CPUID  boot_cpu_data+CPUINFO_cpuid_level
+
 /*
  * This is how much memory in addition to the memory covered up to
  * and including _end we need mapped initially.
@@ -263,6 +263,33 @@ subarch_entries:
 num_subarch_entries = (. - subarch_entries) / 4
 .previous
 #else
+
+/*
+ * Initialize EFLAGS. Some BIOS's leave bits like NT set. This would confuse the
+ * debugger if this code is traced.
+ */
+   pushl $0
+   popfl
+
+/*
+ * Check whether this CPU supports CPUID, and, if so, save the highest standard
+ * CPUID function number for later.
+ */
+   movl $X86_EFLAGS_ID,%ecx /* EFLAGS.ID */
+   pushl %ecx
+   popfl   /* set EFLAGS=ID */
+   pushfl  /* get EFLAGS */
+   popl %eax
+   testl %ecx,%eax /* XOR would never be 0: EFLAGS bit 1 always reads 1 */
+   jz 1f   /* hw disallowed setting of ID bit */
+
+   xorl %eax,%eax
+   cpuid
+   movl %eax,pa(X86_CPUID) /* save largest std CPUID function */
+   jmp default_entry
+
+1:
+   movl $-1,pa(X86_CPUID)
jmp default_entry
 #endif /* CONFIG_PARAVIRT */
 
@@ -377,11 +404,6 @@ default_entry:
/* Shift the stack pointer to a virtual address */
addl $__PAGE_OFFSET, %esp
 
-/*
- * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
- */
pushl $0
popfl
 
-- 
1.8.1.2.422.g08c0e7f
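
The "detect once, cache, use -1 for no CPUID" pattern of this patch looks
roughly like this in user space. It is a sketch under assumptions:
cpuid_level(), cached_level and level_valid are hypothetical names, and
__get_cpuid_max() is the helper from the <cpuid.h> header shipped with GCC
and Clang, which returns 0 when CPUID is not usable. On non-x86 builds the
sketch simply reports -1.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

static int cached_level;	/* -1 means "no CPUID", as in the patch */
static int level_valid;		/* set once the probe has run */

/* Probe on first use only; later callers read the cached value. */
static int cpuid_level(void)
{
	if (!level_valid) {
#if defined(__i386__) || defined(__x86_64__)
		/* Highest standard CPUID function, or 0 if unsupported. */
		unsigned int max = __get_cpuid_max(0, 0);

		cached_level = max ? (int)max : -1;
#else
		cached_level = -1;
#endif
		level_valid = 1;
	}
	return cached_level;
}
```

Callers then compare against -1 instead of repeating the EFLAGS.ID toggling,
which is what the C code looking at ->cpuid_level already does.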


Re: Uhhuh. NMI received for unknown reason 2c on CPU 0.

2013-02-03 Thread Borislav Petkov

On Sun, Feb 03, 2013 at 09:15:12PM +0100, Rafael J. Wysocki wrote:
> Is suspend-to-RAM triggering that too?

Nope, not really. But, just to confirm: s2r is

echo "shutdown" > /sys/power/disk
echo "mem" > /sys/power/state

right?

Btw, this bug is very strange. So I did a couple more s2disk runs, i.e.

echo "shutdown" > /sys/power/disk
echo "disk" > /sys/power/state

and it seemed to me that when the eth cable is plugged in, it would
suspend and resume fine. When I then boot, unplug the cable, set all
tunables to "Good", suspend to disk and resume, no NMI message. When I
plug the cable back, only *then* the message triggered.

I need to play with this a bit more to get a better sense of when
exactly it happens.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: Uhhuh. NMI received for unknown reason 2c on CPU 0.

2013-02-03 Thread Borislav Petkov

On Sun, Feb 03, 2013 at 09:58:57PM +0100, Borislav Petkov wrote:
> and it seemed to me that when the eth cable is plugged in, it would
> suspend and resume fine. When I then boot, unplug the cable, set all
> tunables to "Good", suspend to disk and resume, no NMI message. When I
> plug the cable back, only *then* the message triggered.
> 
> I need to play with this a bit more to get a better sense of when
> exactly it happens.

Ok, not really.

It is not influenced by the cable being plugged - it happens when I plug
in the cable or simply shortly after resume, without the cable.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: Uhhuh. NMI received for unknown reason 2c on CPU 0.

2013-02-03 Thread Borislav Petkov

On Sun, Feb 03, 2013 at 10:06:45PM +0100, Borislav Petkov wrote:
> On Sun, Feb 03, 2013 at 09:58:57PM +0100, Borislav Petkov wrote:
> > and it seemed to me that when the eth cable is plugged in, it would
> > suspend and resume fine. When I then boot, unplug the cable, set all
> > tunables to "Good", suspend to disk and resume, no NMI message. When I
> > plug the cable back, only *then* the message triggered.
> > 
> > I need to play with this a bit more to get a better sense of when
> > exactly it happens.
> 
> Ok, not really.
> 
> It is not influenced by the cable being plugged - it happens when I plug
> in the cable or simply shortly after resume, without the cable.

Ok, just did 10 s2ram cycles back-to-back - no issue whatsoever, no
matter when I (un-)plug the cable. Changed the suspend script to

echo "disk" > /sys/power/state

and did an 11th suspend-resume run. It triggered right after resuming
from disk. So I'd guess the image kernel might be the required condition
for the triggering of the issue.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 4/4] x86, 32-bit: Drop new_cpu_data

2013-02-03 Thread Borislav Petkov

On Sun, Feb 03, 2013 at 03:44:29PM -0800, H. Peter Anvin wrote:
> On 02/03/2013 08:14 AM, Borislav Petkov wrote:
> >From: Borislav Petkov 
> >
> >We copy it to boot_cpu_data anyway so use boot_cpu_data from the get-go.
> >
> 
> Hmm... this is the only part of this patchset I feel skeptical
> towards.  Overall, a lot of the early SMP code went way out of its
> way to have zero impact on the !CONFIG_SMP case, but that was a long
> time ago. Nowadays what we really should have is cpu_data being a
> percpu variable separate from boot_cpu_data (which is really
> "all_cpu_data") even on UP.

Hmmkay.

My thought vector here was to use boot_cpu_data to cache stuff which
is universally valid on the current system, i.e. like all_cpu_data.
IOW, cache here family (model and stepping could differ, as we've come
to realize over the years :)), vendor (btw, X86_VENDOR is unused),
CPUID_EAX(0) level, capability, etc. and use them later instead of
querying them again.

So, so early and in this case, we're saving CPU data which is valid for
all CPUs on the system and thus it belongs into boot_cpu_data, right?

And then, btw, that data could've been used in verify_cpu.S only if the
damn thing wasn't being used in arch/x86/boot/...

> Another cleanup desperately needed in this area is a bitvector for
> bugs in addition to features.

Yeah, c->x86_unfeatures! :-)

> In fact, I kind of suspect we should make it the *same* bitvector
> (different words) so we cpu_has(X) works on both without confusion
> (just put the BUGS at the end; it means that if we add feature words
> the bug numbers will shift but that is okay.)
>
> I actually mean to do this when I did the CPU feature vector stuff
> over 10 years ago, but never got around to it... and it still has
> never gotten done.
>
> The difference between bugs and features, of course, is that the
> former should be combined across CPUs with an OR whereas the latter
> get combined with an AND.

Yeah, that should be pretty easy to do with the current machinery
already in place. I'll take a look.
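
A minimal userspace sketch of the AND/OR rule described above, assuming a
made-up layout with two feature words followed by one bug word. The names
struct cpu_caps, caps_init_common(), caps_merge() and caps_has() are
hypothetical and are not the kernel's actual machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: feature words first, bug words at the end. */
#define NFEATURE_WORDS 2
#define NBUG_WORDS     1
#define NCAP_WORDS     (NFEATURE_WORDS + NBUG_WORDS)

struct cpu_caps {
	uint32_t word[NCAP_WORDS];
};

/* Start value: every feature assumed present, no bug assumed. */
static void caps_init_common(struct cpu_caps *common)
{
	size_t i;

	for (i = 0; i < NFEATURE_WORDS; i++)
		common->word[i] = UINT32_MAX;
	for (; i < NCAP_WORDS; i++)
		common->word[i] = 0;
}

/* Fold one CPU's bits into the system-wide view. */
static void caps_merge(struct cpu_caps *common, const struct cpu_caps *cpu)
{
	size_t i;

	/* A feature is usable only if every CPU has it: AND. */
	for (i = 0; i < NFEATURE_WORDS; i++)
		common->word[i] &= cpu->word[i];
	/* A bug matters as soon as any CPU has it: OR. */
	for (; i < NCAP_WORDS; i++)
		common->word[i] |= cpu->word[i];
}

/* One test for both kinds of bits, in the spirit of cpu_has(X). */
static int caps_has(const struct cpu_caps *c, unsigned int bit)
{
	assert(bit < NCAP_WORDS * 32);
	return (c->word[bit / 32] >> (bit % 32)) & 1;
}
```

Because the bug words sit at the end of the same array, a single accessor
covers features and bugs alike; only the merge step has to know where the
boundary is.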

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 4/4] x86, 32-bit: Drop new_cpu_data

2013-02-04 Thread Borislav Petkov

On Sun, Feb 03, 2013 at 09:44:02PM -0800, H. Peter Anvin wrote:
> boot_cpu_data is ok for things that are indeed universally valid
> across. That does not include CPUID level, for one.

Wait a minute, hold the phone! :-)

Are you saying that CPUID_EAX(0) could return different values in %eax
on the *same* system with mixed-silicon steppings? Or even on a uniform
system?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH] x86, intel_cacheinfo: Shut up annoying warning

2013-02-04 Thread Borislav Petkov

From: Borislav Petkov 

I've been getting the following warning when doing randbuilds since
forever. Now it finally pissed me off just the perfect amount so that I
can fix it.

arch/x86/kernel/cpu/intel_cacheinfo.c:489:27: warning: ‘cache_disable_0’ defined but not used [-Wunused-variable]
arch/x86/kernel/cpu/intel_cacheinfo.c:491:27: warning: ‘cache_disable_1’ defined but not used [-Wunused-variable]
arch/x86/kernel/cpu/intel_cacheinfo.c:524:27: warning: ‘subcaches’ defined but not used [-Wunused-variable]

It happens because in randconfigs where CONFIG_SYSFS is not set, the
whole sysfs-interface to L3 cache index disabling is remaining unused
and gcc correctly warns about it. Make it optional, depending on
CONFIG_SYSFS too, as is the case with other sysfs-related machinery in
this file.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/cpu/intel_cacheinfo.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index 0e462404d6f1..7c6f7d548c0f 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -298,8 +298,7 @@ struct _cache_attr {
 unsigned int);
 };
 
-#ifdef CONFIG_AMD_NB
-
+#if defined(CONFIG_AMD_NB) && defined(CONFIG_SYSFS)
 /*
  * L3 cache descriptors
  */
@@ -524,9 +523,9 @@ store_subcaches(struct _cpuid4_info *this_leaf, const char *buf, size_t count,
 static struct _cache_attr subcaches =
    __ATTR(subcaches, 0644, show_subcaches, store_subcaches);
 
-#else  /* CONFIG_AMD_NB */
+#else
 #define amd_init_l3_cache(x, y)
-#endif /* CONFIG_AMD_NB */
+#endif  /* CONFIG_AMD_NB && CONFIG_SYSFS */
 
 static int
 __cpuinit cpuid4_cache_lookup_regs(int index,
-- 
1.8.1.2.422.g08c0e7f


[PATCH] perf: Check for flex and bison before continuing building

2013-02-04 Thread Borislav Petkov

From: Borislav Petkov 

Check whether both executables are present on the system before
continuing with the build instead of failing halfway if either is
missing.

Signed-off-by: Borislav Petkov 
---
 tools/perf/Makefile | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 4b1044cbd84c..a158309a65ef 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -149,6 +149,8 @@ RM = rm -f
 MKDIR = mkdir
 FIND = find
 INSTALL = install
+FLEX = flex
+BISON= bison
 
 # sparse is architecture-neutral, which means that we need to tell it
 # explicitly what architecture to check for. Fix this up for yours..
@@ -158,6 +160,14 @@ ifneq ($(MAKECMDGOALS),clean)
 ifneq ($(MAKECMDGOALS),tags)
 -include config/feature-tests.mak
 
+ifeq ($(call get-executable,$(FLEX)),)
+   dummy := $(error Error: $(FLEX) is missing on this system, please install it)
+endif
+
+ifeq ($(call get-executable,$(BISON)),)
+   dummy := $(error Error: $(BISON) is missing on this system, please install it)
+endif
+
 ifeq ($(call try-cc,$(SOURCE_HELLO),$(CFLAGS) -Werror -fstack-protector-all,-fstack-protector-all),y)
    CFLAGS := $(CFLAGS) -fstack-protector-all
 endif
@@ -282,9 +292,6 @@ endif
 
 export PERL_PATH
 
-FLEX = flex
-BISON= bison
-
 $(OUTPUT)util/parse-events-flex.c: util/parse-events.l $(OUTPUT)util/parse-events-bison.c
    $(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/parse-events-flex.h $(PARSER_DEBUG_FLEX) -t util/parse-events.l > $(OUTPUT)util/parse-events-flex.c
 
-- 
1.8.1.2.422.g08c0e7f


Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 05:54:18PM +0530, Viresh Kumar wrote:
> One important point i would like to highlight is: governors directory
> would be present in cpu/cpu*/cpufreq/ now instead of cpu/cpufreq/.

Uh, hold on, isn't this breaking a bunch of userspace with this move?
Also, on all those other systems which don't need per-policy governors,
we probably don't need this. So maybe this should be made optional, to
be enabled by a config option IMO...

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 06:24:19PM +0530, Viresh Kumar wrote:
> That's why i am highlighting it again and again. :)

Ah, see, someone caught up with it :).

> What i believe is, the place where this directory was present earlier
> (cpu/cpufreq/) wasn't the right place. Everything else was in
> cpu/cpu*/cpufreq, then why this in cpu/cpufreq/ ?

For the simple reason that the "cpu*" stuff is per-cpu - the
"cpu/cpufreq" is per system, i.e. one governor for the whole system.

> I don't know how much of a pain it would be to fix userspace for it,
> but i know it wouldn't be that small.

I wouldn't fix userspace but simply not touch it. You can add your
per-policy stuff in "cpu/cpu*" as new sysfs nodes and no need to
change anything. And, also, as I suggested earlier, you should make it
configurable since this code wouldn't make sense on x86, for example,
where one system-wide governor should suffice.

> I had another idea of doing this only for platforms where we have
> multiple struct policy alive at the same time. But didn't wanted to
> implement it before discussing this further.

Simply put it behind a config option like
CONFIG_CPU_IDLE_MULTIPLE_DRIVERS, call the whole menu
"Multi-power-domain-policy" something and that should be modular
enough.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 06:55:25PM +0530, Viresh Kumar wrote:
> That's not completely true. There lies cpufreq directory in cpu/cpu*/
> too, where we have per policy stuff in cpu/cpu*/, like policy tunables
> and stats. And the same is true for governor too.

$ tree /sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu0/cpufreq/
├── affected_cpus
├── bios_limit
├── cpb
├── cpuinfo_cur_freq
├── cpuinfo_max_freq
├── cpuinfo_min_freq
├── cpuinfo_transition_latency
├── related_cpus
├── scaling_available_frequencies
├── scaling_available_governors
├── scaling_cur_freq
├── scaling_driver
├── scaling_governor
├── scaling_max_freq
├── scaling_min_freq
├── scaling_setspeed
└── stats
    ├── time_in_state
    ├── total_trans
    └── trans_table

1 directory, 19 files

$ grep -r . /sys/devices/system/cpu/cpu0/cpufreq/*
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
/sys/devices/system/cpu/cpu0/cpufreq/bios_limit:400
/sys/devices/system/cpu/cpu0/cpufreq/cpb:1
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:140
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:400
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:140
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:4000
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0 1
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:400 340 280 210 140
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:powersave userspace conservative ondemand perform
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:140
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:acpi-cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:ondemand
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:400
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:140
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:400 3089328
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:340 47448
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:280 67185
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:210 92731
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:140 11416914
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:   From  :To
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table: :   400   340   280   210   14
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  400: 0 34756 46388 5317921824
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  340: 12938 0  3755  3555 1450
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  280: 19940 0 0  4547 2565
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  210: 18523 0 0 0 4275
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  140:301168 0 0 0
/sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans:799918

Show me the policy tunables here.

> That was slightly confusing to me :( The whole governor directory
> is per policy, i have to keep that in cpu/cpu*/cpufreq instead of
> cpu/cpufreq.

So make a /sys/devices/system/cpu/cpufreq/policies/ and add
functionality to assign cpus to policies or whatever the design of this
thing will be.

> Its not only for multicluster system, but a system where multiple cpus
> have separate clock control and hence multiple policy structures.

What are those systems? Examples?

> Problem with this is it would fail for single image solutions on which
> everybody is working on. So, with multiple platforms compiled into a
> single image, this wouldn't work.

Single-image solutions will enable that config option and get built with
it - no problem at all.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH -v4 5/5] x86,smp: limit spinlock delay on virtual machines

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 08:50:33AM -0500, Rik van Riel wrote:
> We need to know whether we are actually running on top of a
> hypervisor, not whether we have the code compiled in to do so.

Oh ok, I see.

The thing is, if CONFIG_PARAVIRT_GUEST is disabled, x86_hyper won't
exist, see: http://marc.info/?l=linux-kernel&m=135936817627848&w=2

So maybe the hypervisor guest should itself take care of this and upon
init it should set the max_spinlock_delay in init_hypervisor() instead?
Seems only fair to me...
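
A rough userspace sketch of that idea, assuming the guest code lowers the
cap itself once a hypervisor has actually been detected. Only the names
max_spinlock_delay and init_hypervisor() come from this thread; struct
hypervisor, the detect/init callbacks and both numeric caps are
hypothetical placeholders and do not mirror the real kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values, not taken from any patch. */
#define DELAY_CAP_BARE_METAL 16000
#define DELAY_CAP_GUEST      16

static int max_spinlock_delay = DELAY_CAP_BARE_METAL;

/* Hypothetical stand-in for a hypervisor description. */
struct hypervisor {
	const char *name;
	int (*detect)(void);		/* nonzero if we run on top of it */
	void (*init_platform)(void);	/* guest-specific setup */
};

static int detect_never(void)  { return 0; }
static int detect_always(void) { return 1; }

/* The guest code lowers the cap; the spinlock code never has to ask. */
static void guest_init(void)
{
	max_spinlock_delay = DELAY_CAP_GUEST;
}

/* Walk the known hypervisors and set up the first one that is detected. */
static const struct hypervisor *init_hypervisor(const struct hypervisor *h,
						size_t n)
{
	size_t i;

	assert(h || n == 0);
	for (i = 0; i < n; i++) {
		if (h[i].detect()) {
			h[i].init_platform();
			return &h[i];
		}
	}
	return NULL;	/* bare metal: the default cap stays */
}
```

With this shape the decision depends on what is detected at runtime, not on
whether guest support was compiled in, which is the distinction Rik points
out.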

> After all, the majority of distribution kernels will have
> CONFIG_PARAVIRT_GUEST set, but the majority of those kernels
> will be running bare metal...

Yeah :-)

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 07:28:16PM +0530, Viresh Kumar wrote:
> All files which are directly present in cpu/cpu*/cpufreq/ folder. I am
> not talking about governor tunables but policy tunables. Things like
> scaling_[min]max_freq are policy tunables.

No, on x86 those are the P-states frequencies. They're defined by the
hardware.

> Policies don't have a name associated with them and so
> cpu/cpufreq/policies doesn't make any sense. Rather one policy is
> related to multiple cpus and its tunables are linked in all the cpus
> that belong to it, like scaling_[min]max_freq.

Then do the following:

cpu/cpufreq/policies/
|-> policy0
    |-> min_freq
    |-> max_freq
    |-> affected_cpus
    ...

or whatever needs to be a flexible interface for multi-policy cpufreq
support.

Remember: once you do those, they're more or less cast in stone so take
your time and do the design right, do not hurry those.

> Don't have examples of these, but there can be few. Over that it is a
> must for multicluster systems as clusters normally have separate clock
> control.

Yeah, nice try. We only support real hardware in the kernel, not what
there could be.

> But then we will get governors tunables in cpu/cpu*/cpufreq/ instead
> of cpu/cpufreq/ . Will that not break userspace for other systems?

What's wrong with having both? The cpu/cpufreq/ governor will set the
system-wide governor and the cpu/cpu*/cpufreq/ will add the different
policies.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 07:51:33PM +0530, Viresh Kumar wrote:
> We correlate things with cpus rather than policies and so the current
> directory structure of cpu/cpu*/cpufreq/*** is the best suited ones.

Ok, show me the details of that layout. How is that going to look?

One thing I've come to realize with the current interface is that if
you want to change stuff, you need to iterate over all cpus instead of
writing to a system-wide node.

And, in this case, if you can and need to change the policy per
clock-domain, I wouldn't make it needlessly granular, i.e. per-cpu.

That's why I'm advocating the cpu/cpufreq/ path.

> Yes, and that's why cpu/cpu*/cpufreq/ondemand/*** suits the best, with
> exactly the same logic that went for P-states or cpufreq-stats.

See above.

> Hmm.. confused..
> Consider two systems:
> - A dual core system, with cores sharing clocks.
> - A dual cluster system (dual core per cluster), with separate clocks
> per cluster.
> 
> Where will you keep governor directories for both of these configurations?

Easy: as said above, make the policy granularity per clock-domain. On
systems which have only one set of P-states - like it is the case with
the overwhelming majority of systems running linux now - nothing should
change.

> We need to select only one... cpu/cpufreq doesn't suit the second case
> at all as we need to use ondemand governor for both the clusters but
> with separate tunables. And so a single cpu/cpufreq/ondemand directory
> wouldn't solve the issue.

Think of it this way: what is the highest granularity you need per
clock-domain? If you want to control the policy per clock-domain, then
cpu/cpufreq/ is what you want. If you want finer-grained control -
and you need to think hard of what use cases are sensible for that
finer-grained solution - then you're better off with cpu/cpu*/ layout.

In both cases though, having clear examples of why you've come up with
the layout you're advocating would help reviewers a lot. If you simply
come and say we need this because there might be systems out there which
could use it, then that probably is not going to get you that far.

HTH.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

drivers/gpu/drm/nouveau/nouveau_acpi.c:420: undefined reference to `acpi_video_get_edid'

2013-02-04 Thread Borislav Petkov

Hi,

I'm guessing someone has already triggered this on latest Linus' tree
and has a fix?

drivers/built-in.o: In function `nouveau_acpi_edid':
/w/kernel/linux/drivers/gpu/drm/nouveau/nouveau_acpi.c:420: undefined reference to `acpi_video_get_edid'
make: *** [vmlinux] Error 1

Btw, I got CONFIG_ACPI_VIDEO=m while CONFIG_DRM_NOUVEAU=y and this is
probably the reason for the vmlinux link error.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 09:07:11PM +0530, Viresh Kumar wrote:
> I don't have board right now to take the snapshot, but it would be
> like:
> 
> $ tree /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu0/cpufreq/
> ├── affected_cpus
> ├── bios_limit
> ├── cpb
> ├── cpuinfo_cur_freq
> ├── cpuinfo_max_freq
> ├── cpuinfo_min_freq
> ├── cpuinfo_transition_latency
> ├── related_cpus
> ├── scaling_available_frequencies
> ├── scaling_available_governors
> ├── scaling_cur_freq
> ├── scaling_driver
> ├── scaling_governor
> ├── scaling_max_freq
> ├── scaling_min_freq
> ├── scaling_setspeed
> ├── stats
> │   ├── time_in_state
> │   ├── total_trans
> │   └── trans_table
> └── ondemand
>     ├── sampling_rate
>     ├── up_threshold
>     └── ignore_nice

So this is adding the current governor as a per-cpu thing.

> > One thing I've come to realize with the current interface is that if
> > you want to change stuff, you need to iterate over all cpus instead of
> > writing to a system-wide node.
> 
> Not really. Following is the way by which cpu/cpu*/cpufreq directories
> are created:

That's not what I meant - I meant from userspace:

for i in $(grep processor /proc/cpuinfo | awk '{ print $3 }');
do
    echo "performance" > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor;
done

Instead of

echo "performance" > /sys/devices/system/cpu/cpufreq/scaling_governor

which is hypothetical but sets it for the whole system without fuss.

[ … ]

> I want to control it over clock-domain, but can't get that in cpu/cpufreq/.
> Policies don't have numbers assigned to them.

So, give them names.

> So, i am working on ARM's big.LITTLE system where we have two
> clusters. One of A15s and other of A7s. Because of their different
> power ratings or performance figures, we need to have separate set of
> ondemand tunables for them. And hence this patch. Though this patch is
> required for any multi-cluster system.

So you want this (values after "="):

cpu/cpufreq/
|-> policy0
    |-> name            = A15
    |-> min_freq        = ...
    |-> max_freq        = ...
    |-> affected_cpus   = 0,1,2,...
    |-> ondemand
        |-> sampling_rate
        |-> up_threshold
        |-> ignore_nice
    ...
|-> policy1
    |-> name            = A7
    |-> min_freq        = ...
    |-> max_freq        = ...
    |-> affected_cpus   = n,n+1,n+2,...
    |-> performance
        |-> sampling_rate
        |-> up_threshold
        |-> ignore_nice
    ...

Other arches create other policies and that's it. If you need another
policy added to the set, you simply add 'policyN++' and that's it.
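
As a data-structure sketch of the tree above, one policy object per clock
domain could look roughly like this. struct freq_policy, policy_for_cpu()
and policy_set_max() are hypothetical names for illustration only, and any
frequency values used with them are made up:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical per-clock-domain policy, mirroring the proposed tree. */
struct freq_policy {
	const char *name;		/* e.g. "A15" or "A7" */
	unsigned int min_freq;		/* kHz */
	unsigned int max_freq;		/* kHz */
	unsigned long affected_cpus;	/* bit n set: cpu n belongs here */
	const char *governor;		/* e.g. "ondemand" */
};

/* Return the policy that owns @cpu, or NULL if none does. */
static struct freq_policy *policy_for_cpu(struct freq_policy *policies,
					  size_t npolicies, unsigned int cpu)
{
	size_t i;

	assert(cpu < MAX_CPUS);
	for (i = 0; i < npolicies; i++)
		if (policies[i].affected_cpus & (1UL << cpu))
			return &policies[i];
	return NULL;
}

/* One write changes the limit for every CPU in the clock domain. */
static int policy_set_max(struct freq_policy *p, unsigned int khz)
{
	if (khz < p->min_freq)
		return -1;
	p->max_freq = khz;
	return 0;
}
```

The point of the layout is that a tunable lives once per clock domain, so
userspace changes it in one place and does not have to iterate over every
cpu that shares the clock.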

I think this is cleaner but whatever - I don't care that much. My
only strong concern is that this thing should be a Kconfig option and
optional for arches where it doesn't apply.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 4/4] x86, 32-bit: Drop new_cpu_data

2013-02-04 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 08:55:17AM -0800, H. Peter Anvin wrote:
> Yes and yes (in the latter case due to inconsistent MSR programming.)

Ok, I'll drop the last one, redo 2/4 and run them on the hardware I have
here.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 12:50:31PM +0530, Viresh Kumar wrote:
> > I think this is cleaner but whatever - I don't care that much. My
> > only strong concern is that this thing should be a Kconfig option and
> > optional for arches where it doesn't apply.
> 
> Your concern is: we don't want to fix userspace for existing platforms
> where we have just a single cluster and so struct policy in the system.

No, as I said so many times already and you're unwilling to understand
it: multiple policies support in cpufreq should be optional and
selectable in Kconfig so that systems which don't need that, don't
have to see or use it. It is yet another feature which doesn't apply
universally so we make such features optional. Like the rest of the
gazillion things in the kernel already.

The existing sysfs layout cannot be changed because you're breaking
userspace and we don't do that. It is that simple.

Concerning adding new sysfs entries, I told you to make it as easy as
possible and as sensible as possible, dictated by the use cases. If you
can't come up with some, then talk to the people who are going to use
your design and ask them what makes sense the most.

*Then* write the code.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 03:17:21PM +0530, Viresh Kumar wrote:
> > multiple policies support in cpufreq should be optional and
> > selectable in Kconfig so that systems which don't need that, don't
> > have to see or use it. It is yet another feature which doesn't apply
> > universally so we make such features optional. Like the rest of the
> > gazillion things in the kernel already.
> 
> I understand what Kconfig options are for, but i am not able to understand
> what's the benefit of this option here.

Are you kidding me? You're simply not reading what I'm saying to you:
"... should be optional and selectable in Kconfig so that systems which
don't need that, don't have to see or use it." Because on those systems
it doesn't apply.

How about we add an x86-specific extension which is a big wad of code
and is needlessly run on ARM just because it is easier?

That's why we do config options, so that code which doesn't apply on a
specific system, doesn't run on it.

Goddammit, how hard is it to understand that??!

> For example: for single image solutions we need to keep it enabled.

So keep it enabled!

> And so, would need some sort of logic in cpufreq core & platform
> driver to decide where to create the governors directory.
>
> The code without Kconfig option would be as simple as:

> 
> platform_driver:
> init(struct cpufreq_policy *policy)
> {
>         ..
>         policy->have_multiple_policies = true;
>         ..
> }
> 
> cpufreq-core:
> 
> add_dev()
> {
>         if (policy->have_multiple_policies)
>                 create-folder-in-cpu/cpu*/cpufreq;
>         else
>                 create-folder-in-cpu/cpufreq;

Yes, this is how it could be done.

And this is what I mean by making it optional:

You go and abstract away the "create-folder-in-cpu/cpu*/cpufreq"
functionality. If this is a function called
add_additional_sysfs_entries(), for example, it should do nothing when
CONFIG_CPUFREQ_MULTIPLE_POLICIES is disabled. Otherwise, it will do your
dance:

#ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
static int add_additional_sysfs_entries(...)
{
        /* do all stuff required for multiple policies */
}
#else /* !CONFIG_CPUFREQ_MULTIPLE_POLICIES */
static int add_additional_sysfs_entries(...)
{
        return 0;
}
#endif /* CONFIG_CPUFREQ_MULTIPLE_POLICIES */

and all the rest of the stuff which is needed for multiple system
policies, should be abstracted that way, more functions added, whatever.
If it is starting to become more, you can create your own compilation
unit. And so on and so on... the kernel is full of examples how to do
stuff like that.
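
For reference, a compilable userspace sketch of that stub pattern.
CONFIG_CPUFREQ_MULTIPLE_POLICIES is the option name proposed in this
thread, treated here as an ordinary macro (e.g. passed with -D); struct
policy_stub and policy_init() are hypothetical, and the counter merely
stands in for real sysfs work:

```c
#include <assert.h>

/* Hypothetical stand-in for a policy object. */
struct policy_stub {
	int extra_entries;	/* counts entries a real version would create */
};

#ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
static int add_additional_sysfs_entries(struct policy_stub *p)
{
	assert(p);
	p->extra_entries++;	/* stands in for the real sysfs work */
	return 0;
}
#else /* !CONFIG_CPUFREQ_MULTIPLE_POLICIES */
static int add_additional_sysfs_entries(struct policy_stub *p)
{
	(void)p;		/* option off: nothing to do, nothing changes */
	return 0;
}
#endif /* CONFIG_CPUFREQ_MULTIPLE_POLICIES */

/* The caller carries no #ifdef and reads the same in both configurations. */
static int policy_init(struct policy_stub *p)
{
	p->extra_entries = 0;
	return add_additional_sysfs_entries(p);
}
```

Built without the macro, the stub is empty and the compiler can drop the
call entirely; built with -DCONFIG_CPUFREQ_MULTIPLE_POLICIES, the same
caller picks up the extra work.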

> And so, platforms like Krait or big.LITTLE can set it to true from their
> cpufreq-drivers. And this wouldn't break any of the current platforms.
> 
> > The existing sysfs layout cannot be changed because you're breaking
> > userspace and we don't do that. It is that simple.
> 
> That's fine. I understood it already. :)

Not really, you obviously didn't.

> The problem i see is:
> - both governor tunables, cpufreq-stats & policy tunables (P-states) have the
> same requirement. They are all per policy or clock-domain, instead of per cpu.
> - I want to keep all of these at the same place, as they should be
> present in the
> same hierarchy.
> - If we move everything to cpu/cpufreq/policy-names/ then also we would break
> existing userspace stuff for stats and P-states.
> - If we move everything to cpu/cpu*/cpufreq/ then also we would break
> existing userspace stuff for governors.

No, you're not allowed to change existing sysfs layout. FULLSTOP.

Simply add the new stuff to cpu/cpu*/cpufreq/ with code which
is enabled when CONFIG_CPUFREQ_MULTIPLE_POLICIES is set. If
CONFIG_CPUFREQ_MULTIPLE_POLICIES is not enabled, nothing changes.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 04:13:23PM +0530, Viresh Kumar wrote:
> There isn't lot of code that we have to keep inside the macro you
> suggest. Its just an if else (with single line block), which would
> give the parent kobject. Nothing else.
>
> I didn't wanted to create a macro for just that. For me an if/else is
> not that big code.

Yeah, I imagine for you it isn't, no.

> Anyway, if nobody else comes on my side i can create that macro for you.
> But, personally i would prefer code without such macros.

Here's an even cleaner way:

platform_driver:
init(struct cpufreq_policy *policy)
{
        ...

        add_additional_sysfs_entries(policy);

        ...
}

...

static void add_additional_sysfs_entries(struct cpufreq_policy *policy)
{
#ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
        create-folder-in-cpu/cpu*/cpufreq;
        ...
#endif
}

and the platform driver will have in its Kconfig section:

config CPUFREQ_PLATFORM_DRIVER_X
        ...
        select CPUFREQ_MULTIPLE_POLICIES


You don't need the policy->have_multiple_policies member even.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 04:42:23PM +0530, Viresh Kumar wrote:
> Tricky part is the name of this routine: add_additional_sysfs_entries().

Now you're just being silly - this is just an example how to do it. If
you want me to do it for ya, you need to send me your monthly salary.

> And so, keeping that additional variable looks a better solution.

Yeah, I don't think you understand me at all. :(

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 04:56:03PM +0530, Viresh Kumar wrote:
> Just some kind of indication from platform driver is required about
> how/where it wants its governor directory to be present.

The indication is this:

config CPUFREQ_PLATFORM_DRIVER_X
...
select CPUFREQ_MULTIPLE_POLICIES

You really need to slow down and really look at what I'm proposing.

> And the variable suits more here.

And this variable means nothing on other systems so why add it in the
first place?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 11:29:04AM +, Charles Garcia-Tobin wrote:
> Actually shooting myself in the foot here, Krait is not such a great
> example because although you can use difference between frequencies
> you are less likely to use different tunables (not inconceivable
> but unlikely). The best examples systems are multi cluster and
> heterogeneous systems, like the recently announced Samsung Exynos 5
> octa http://en.wikipedia.org/wiki/Exynos_(system_on_chip). We will see
> more systems like this appearing, sporting low power cores combined
> with high performance ones, all running at the same time. I appreciate
> this is all very new, but more will come, and the requirement to have
> different tunables per cluster is very real. In ARM on our own multi
> cluster test chip, using an experimental version of this approach, we
> have seen good improvements in power consumption without compromising
> performance.

Ok, thanks for giving this insight, this is useful.

Question: do you need the granularity of that control to be per cpu
(with that I mean what linux understands under "cpu," i.e. logical or
physical core) or does one governor suffice per a set of cores, or as
you call it, a cluster?
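
To make the question concrete, here is a small user-space sketch of the per-cluster variant, where every CPU in a set shares one policy and one set of tunables. All names (cluster_policy, governor_tunables, the two tunable fields, NR_CPUS of 8) are assumptions for the sake of the example, not existing kernel structures.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8

/* Hypothetical tunables; the field names are made up for this sketch. */
struct governor_tunables {
	unsigned int sampling_rate_us;
	unsigned int up_threshold;
};

/* One policy, and thus one set of tunables, per cluster. */
struct cluster_policy {
	unsigned long cpus;                /* bitmask of CPUs in the cluster */
	struct governor_tunables tunables;
};

static struct cluster_policy *cpu_to_policy[NR_CPUS];

/* Attach every CPU set in policy->cpus to that policy. */
static void policy_attach(struct cluster_policy *policy)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (policy->cpus & (1UL << cpu))
			cpu_to_policy[cpu] = policy;
}

/* Look up the tunables that apply to a given logical CPU. */
static const struct governor_tunables *tunables_for_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return cpu_to_policy[cpu] ? &cpu_to_policy[cpu]->tunables : NULL;
}
```

Per-cpu granularity would be the degenerate case of this layout where every bitmask has exactly one bit set.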

> (Apologies ahead for any bit my mail server appends, not much I can do
> about it)

Yeah, my condolences :-)

> -- IMPORTANT NOTICE: The contents of this email and any attachments
> are confidential and may also be privileged. If you are not the
> intended recipient, please notify the sender immediately and do not
> disclose the contents to any other person, use it for any purpose, or
> store or copy the information in any medium. Thank you.

Leaving it in, in case you haven't seen what it looks like :-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 05:54:57PM +0530, Viresh Kumar wrote:
> This indication isn't enough. On a single image solution, we need to
> identify the system which needs support for multiple policies and i
> still feel we need that variable type indication :)

If the image is going to run also on systems which support only a
single policy, then I guess you can make it a bool, stuff it in struct
cpufreq_policy and ifdef around it.
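
Something along these lines, as a user-space sketch only. The member name have_multiple_policies comes from the earlier discussion; the reduced struct and the accessor policy_has_multiple_policies() are assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Set when at least one driver in the image selects the option. */
#define CONFIG_CPUFREQ_MULTIPLE_POLICIES 1

/* Hypothetical, heavily reduced struct cpufreq_policy. */
struct cpufreq_policy {
	unsigned int cpu;
#ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
	bool have_multiple_policies; /* set by the platform driver at init */
#endif
};

/* Single accessor so that callers stay free of #ifdefs. */
static bool policy_has_multiple_policies(const struct cpufreq_policy *policy)
{
	assert(policy != NULL);
#ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
	return policy->have_multiple_policies;
#else
	return false;
#endif
}
```

Kernels that never enable the option pay nothing for the member, while a single image can still tell its platforms apart at runtime.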

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] drm/nouveau: always select ACPI_VIDEO if ACPI is enabled.

2013-02-05 Thread Borislav Petkov

On Mon, Feb 04, 2013 at 04:41:22PM +0100, Maarten Lankhorst wrote:
> Hey,
> 
> Op 04-02-13 16:23, Borislav Petkov schreef:
> > Hi,
> >
> > I'm guessing someone has already triggered this on latest Linus' tree
> > and has a fix?
> >
> > drivers/built-in.o: In function `nouveau_acpi_edid':
> > /w/kernel/linux/drivers/gpu/drm/nouveau/nouveau_acpi.c:420: undefined reference to `acpi_video_get_edid'
> > make: *** [vmlinux] Error 1
> >
> > Btw, I got CONFIG_ACPI_VIDEO=m while CONFIG_DRM_NOUVEAU=y and this is
> > probably the reason for the vmlinux link error.
> >
> > Thanks.
> >
> Does this fix things?
> 
> -->8
> Having nouveau builtin would still allow ACPI_VIDEO to be used as external
> module if some of the deps for acpi_video have not been met, which would
> result in a linking failure. Solve this by only requiring ACPI && X86 to
> select ACPI_VIDEO.
> 
> Signed-off-by: Maarten Lankhorst 
> 
> ---
> diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
> index 8a55bee..f08b9b6 100644
> --- a/drivers/gpu/drm/nouveau/Kconfig
> +++ b/drivers/gpu/drm/nouveau/Kconfig
> @@ -10,7 +10,7 @@ config DRM_NOUVEAU
>   select FB
>   select FRAMEBUFFER_CONSOLE if !EXPERT
>   select FB_BACKLIGHT if DRM_NOUVEAU_BACKLIGHT
> - select ACPI_VIDEO if ACPI && X86 && BACKLIGHT_CLASS_DEVICE && VIDEO_OUTPUT_CONTROL && INPUT
> + select ACPI_VIDEO if ACPI && X86
>   select ACPI_WMI if ACPI
>   select MXM_WMI if ACPI
>   select POWER_SUPPLY

Not really.

drivers/built-in.o: In function `acpi_video_bus_put_one_device':
/root/kernel/linux/drivers/acpi/video.c:1407: undefined reference to `thermal_cooling_device_unregister'
drivers/built-in.o: In function `acpi_video_device_find_cap':
/root/kernel/linux/drivers/acpi/video.c:842: undefined reference to `thermal_cooling_device_register'
make: *** [vmlinux] Error 1

It is CONFIG_THERMAL=m this time.
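
A toy model of why the failure just moves one level down, assuming the documented Kconfig behaviour that "select" only sets a lower limit on the selected symbol and does not visit that symbol's own dependencies. The helper names (apply_select, link_ok) are made up for this sketch; nothing here is real Kconfig code.

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_min(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

static enum tristate tri_max(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/*
 * "select TARGET if COND" inside symbol SELECTOR: TARGET is raised to at
 * least min(SELECTOR, COND). Whatever TARGET itself depends on is not
 * raised along with it.
 */
static enum tristate apply_select(enum tristate target,
				  enum tristate selector,
				  enum tristate cond)
{
	return tri_max(target, tri_min(selector, cond));
}

/* Built-in code can only resolve symbols that are built in as well. */
static int link_ok(enum tristate user, enum tristate provider)
{
	assert(provider >= TRI_N && provider <= TRI_Y);
	if (user == TRI_N)
		return 1;                 /* nothing to link */
	if (user == TRI_Y)
		return provider == TRI_Y;
	return provider != TRI_N;         /* a module can use either */
}
```

With DRM_NOUVEAU=y the select lifts ACPI_VIDEO to y, but THERMAL stays at m, so the now built-in video.c is the one left with unresolved references.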

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH v2] drm/nouveau: always select ACPI_VIDEO if ACPI is enabled.

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 04:38:35PM +0100, Maarten Lankhorst wrote:
> Argh, next attempt, based on i915's Kconfig.
> 
> It seems that not only I have to select ACPI_VIDEO, I also have to select
> all the dependencies. Is this a Kconfig bug or working as intended? i915
> seems to have a workaround, so I copied it from there. Except it's currently
> missing select THERMAL, so I guess it didn't get updated when that got added.
> 
> >8
> Having nouveau builtin would still allow ACPI_VIDEO to be used as external
> module if some of the deps for acpi_video have not been met, which would
> result in a linking failure. Solve this by selecting all dependencies as well.
> 
> Signed-off-by: Maarten Lankhorst 

Yep, this takes care of all deps,

Tested-by: Borislav Petkov 

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors

2013-02-05 Thread Borislav Petkov

On Tue, Feb 05, 2013 at 06:38:07PM +, Charles Garcia-Tobin wrote:
> Later. Whatever you'd like to call it, but essentially a set of cpus,
> as linux understands them, that are logically related by the fact that
> you'd like to be able to use the same tuning policy and same governor
> across all of them.

Right, policy will be applied to the whole set, yes, but can you imagine
that per-core settings could also make sense at some point?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] x86: Lock down MSR writing in secure boot

2013-02-08 Thread Borislav Petkov

On Fri, Feb 08, 2013 at 02:30:52PM -0800, H. Peter Anvin wrote:
> Also, keep in mind that there is a very simple way to deny MSR access
> completely, which is to not include the driver in your kernel (and not
> allow module loading, but if you can load modules you can just load a
> module to muck with whatever MSR you want.)

I was contemplating that too. What is the use case of having
msr.ko in a secure boot environment? Isn't that an all-no-tools,
you-can't-do-sh*t-except-what-you're-explicitly-allowed-to environment which
simply doesn't need to write MSRs in the first place?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] x86: Lock down MSR writing in secure boot

2013-02-09 Thread Borislav Petkov

On Fri, Feb 08, 2013 at 10:45:35PM -0800, Kees Cook wrote:
> Also, _reading_ MSRs from userspace arguably has utility that doesn't
> compromise ring-0.

And to come back to the original question: what is that utility, who
would need it on a secure boot system and why?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 1/2] add helper for highmem checks

2013-02-09 Thread Borislav Petkov

On Fri, Feb 08, 2013 at 12:28:13PM -0800, Dave Hansen wrote:
> 
> Boris, could you check that this series also fixes the /dev/mem
> problem you were seeing?
> 
> --
> 
> We have a new debugging check on x86 that has caught a number
> of long-standing bugs.  However, there is a _bit_ of collateral
> damage with things that call __pa(high_memory).
> 
> We are now checking that any addresses passed to __pa() are
> *valid* and can be dereferenced.
> 
> "high_memory", however, is not valid.  It marks the start of
> highmem, and isn't itself a valid pointer.  But, those users
> are really just asking "is this vaddr mapped"?  So, give them
> a helper that does that, plus is also kind to our new
> debugging check.
> 
> 
> Signed-off-by: Dave Hansen 
> ---
> 
>  linux-2.6.git-dave/arch/x86/mm/pat.c |   11 ++-
>  linux-2.6.git-dave/drivers/char/mem.c|4 ++--
>  linux-2.6.git-dave/drivers/mtd/mtdchar.c |2 +-
>  linux-2.6.git-dave/include/linux/mm.h|   13 +
>  4 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff -puN drivers/char/mem.c~clean-up-highmem-checks drivers/char/mem.c
> --- linux-2.6.git/drivers/char/mem.c~clean-up-highmem-checks  2013-02-08 08:42:37.291222110 -0800
> +++ linux-2.6.git-dave/drivers/char/mem.c 2013-02-08 12:27:27.837477867 -0800
> @@ -51,7 +51,7 @@ static inline unsigned long size_inside_
>  #ifndef ARCH_HAS_VALID_PHYS_ADDR_RANGE
>  static inline int valid_phys_addr_range(phys_addr_t addr, size_t count)
>  {
> - return addr + count <= __pa(high_memory);
> + return !phys_addr_is_highmem(addr + count);
>  }
>  
>  static inline int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
> @@ -250,7 +250,7 @@ static int uncached_access(struct file *
>*/
>   if (file->f_flags & O_DSYNC)
>   return 1;
> - return addr >= __pa(high_memory);
> + return phys_addr_is_highmem(addr);
>  #endif
>  }
>  #endif
> diff -puN include/linux/mm.h~clean-up-highmem-checks include/linux/mm.h
> --- linux-2.6.git/include/linux/mm.h~clean-up-highmem-checks  2013-02-08 08:42:37.295222148 -0800
> +++ linux-2.6.git-dave/include/linux/mm.h 2013-02-08 09:01:49.758254468 -0800
> @@ -1771,5 +1771,18 @@ static inline unsigned int debug_guardpa
>  static inline bool page_is_guard(struct page *page) { return false; }
>  #endif /* CONFIG_DEBUG_PAGEALLOC */
>  
> +static inline phys_addr_t last_lowmem_phys_addr(void)
> +{
> + /*
> +  * 'high_memory' is not a pointer that can be dereferenced, so
> +  * avoid calling __pa() on it directly.
> +  */
> + return __pa(high_memory - 1);
> +}
> +static inline bool phys_addr_is_highmem(phys_addr_t addr)
> +{
> + return addr > last_lowmem_paddr();

I think you mean last_lowmem_phys_addr() here:

include/linux/mm.h: In function ‘phys_addr_is_highmem’:
include/linux/mm.h:1764:2: error: implicit declaration of function ‘last_lowmem_paddr’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1

Changed.
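
For reference, a user-space model of the two helpers with the name fixed. The helper names are the ones from the patch; the 896 MiB lowmem size, the 0xC0000000 offset and the asserting pa_model() are assumptions standing in for the real __pa() and its new debug check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Made-up numbers for the sketch: linear mapping, 896 MiB of lowmem. */
#define PAGE_OFFSET_MODEL 0xC0000000ULL
#define LOWMEM_SIZE_MODEL 0x38000000ULL

/* First virtual address past lowmem; not itself a valid pointer. */
static const uint64_t high_memory_model =
	PAGE_OFFSET_MODEL + LOWMEM_SIZE_MODEL;

/* Model of __pa() with the debug check: only mapped lowmem is accepted. */
static phys_addr_t pa_model(uint64_t vaddr)
{
	assert(vaddr >= PAGE_OFFSET_MODEL && vaddr < high_memory_model);
	return vaddr - PAGE_OFFSET_MODEL;
}

static phys_addr_t last_lowmem_phys_addr(void)
{
	/* Translate the last valid byte instead of high_memory itself. */
	return pa_model(high_memory_model - 1);
}

static bool phys_addr_is_highmem(phys_addr_t addr)
{
	return addr > last_lowmem_phys_addr();
}
```

In this linear model, addr > last_lowmem_phys_addr() gives the same answer as the old addr >= __pa(high_memory). The range check is not quite the same, though: !phys_addr_is_highmem(addr + count) rejects addr + count == __pa(high_memory), which the old <= comparison accepted, so that boundary may deserve a second look.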

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] x86: Add support for 64bit get_user() on x86-32

2013-02-09 Thread Borislav Petkov

On Fri, Feb 08, 2013 at 11:08:52AM -0800, H. Peter Anvin wrote:
> Yes, or anything else getting a pointer in memory from user space.

Here are some more from a 32-bit build here:

fs/exec.c: In function ‘get_user_arg_ptr’:
fs/exec.c:414:6: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/splice.c: In function ‘vmsplice_to_user’:
fs/splice.c:1556:11: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
ipc/syscall.c: In function ‘sys_ipc’:
ipc/syscall.c:39:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
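
The warning shows up whenever a 64-bit integer is cast straight to a pointer that is only 32 bits wide. A minimal user-space sketch of the usual way to make the narrowing explicit follows; u64_to_ptr() is a made-up helper for illustration, and kernel code would more likely go through unsigned long than uintptr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a 64-bit value fetched from user space into a pointer.
 * A direct (void *)val cast triggers -Wint-to-pointer-cast when pointers
 * are 32 bits wide; going through uintptr_t states the intent.
 */
static void *u64_to_ptr(uint64_t val)
{
	/* Refuse values that do not fit into a pointer on this build. */
	assert(val <= UINTPTR_MAX);
	return (void *)(uintptr_t)val;
}
```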

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 1/2] add helper for highmem checks

2013-02-09 Thread Borislav Petkov

On Sat, Feb 09, 2013 at 10:41:21AM +0100, Borislav Petkov wrote:
> > +static inline bool phys_addr_is_highmem(phys_addr_t addr)
> > +{
> > +   return addr > last_lowmem_paddr();
> 
> I think you mean last_lowmem_phys_addr() here:
> 
> include/linux/mm.h: In function ‘phys_addr_is_highmem’:
> include/linux/mm.h:1764:2: error: implicit declaration of function ‘last_lowmem_paddr’ [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1
> 
> Changed.

With this change, they definitely fix something because X now even
starts on the box. Previously, it would spit out the warning and wouldn't
start X with the login window. My suspicion is that wdm (WINGs display
manager), which I'm using, does /dev/mem accesses when it starts and
those obviously failed. Now not so much :-)

Tested-by: Borislav Petkov 

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH 0/5] x86, head_32: Some cleanups, -v2

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 


Ok,

here's the next version with new_cpu_data left put and two minor fixlets
added at the end. The patchset was boot-tested on a bunch of baremetal
boxes and all QEMU cpu models - no issues.

Boot tests:

* baremetal:
- P4
- Atom n270
- 32-bit kernel on an AMD64 (F10h Phenom and Intel SNB)

* qemu, with cpu models:
 - qemu64
 - phenom
 - core2duo
 - kvm64
 - qemu32
 - kvm32
 - coreduo
 - 486{,SX}
 - pentium{,2,3}
 - athlon
 - n270,+movbe
 - Conroe
 - Penryn
 - Nehalem
 - Westmere
 - SandyBridge
 - Haswell
 - Opteron_G{1,2,3,4,5}

Why am I testing all those, you ask? Because I'm a sadistic mofo :-)

Changelog:

v1:

here are some initial low-hanging fruits wrt head_32.S cleanup. I've
made them as easily digestible as possible; after all, this is boot asm
and meddling with it tends to upset kernels.

Also, I've made the assumption that having boot_cpu_data.cpuid_level
contain the CPUID level for the boot cpu means that the APs have the
same CPUID level. This should be the case on X86.

They boot fine on 486 and 486SX in qemu but I'd like to hear whether
the direction I'm going is ok before I continue testing them on real
hardware.


Borislav Petkov (5):
  x86, head_32: Remove i386 pieces
  x86: Detect CPUID support early at boot
  x86, head_32: Remove second CPUID detection from default_entry
  x86, head_32: Give the 6 label a real name
  x86, head_32: Remove an old gcc2 fix

 arch/x86/kernel/head_32.S | 92 ++-
 1 file changed, 35 insertions(+), 57 deletions(-)

-- 
1.8.1.3.535.ga923c31


[PATCH 3/5] x86, head_32: Remove second CPUID detection from default_entry

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 

We do that once earlier now and cache it into new_cpu_data.cpuid_level
so no need for the EFLAGS.ID toggling dance anymore.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index df0b324d2854..46aa51467c0e 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -409,18 +409,7 @@ default_entry:
 /*
  * Check if it is 486
  */
-   movl $-1,X86_CPUID  # -1 for no CPUID initially
-   movb $4,X86 # at least 486
-   pushfl  # push EFLAGS
-   popl %eax   # get EFLAGS
-   movl %eax,%ecx  # save original EFLAGS
-   xorl $0x20,%eax # flip ID bit in EFLAGS
-   pushl %eax  # copy to EFLAGS
-   popfl   # set EFLAGS
-   pushfl  # get new EFLAGS
-   popl %eax   # put it in eax
-   xorl %ecx,%eax  # change in flags
-   testl $0x20,%eax# check if ID bit changed
+   cmpl $-1,X86_CPUID
je is486
 
/* get vendor info */
@@ -446,7 +435,9 @@ default_entry:
movb %cl,X86_MASK
movl %edx,X86_CAPABILITY
 
-is486: movl $0x50022,%ecx  # set AM, WP, NE and MP
+is486:
+   movb $4,X86
+   movl $0x50022,%ecx  # set AM, WP, NE and MP
movl %cr0,%eax
andl $0x80000011,%eax   # Save PG,PE,ET
orl %ecx,%eax
-- 
1.8.1.3.535.ga923c31


[PATCH 4/5] x86, head_32: Give the 6 label a real name

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 

When we jump here, we are about to enable paging, so rename the label
accordingly.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 46aa51467c0e..75e96d7e4e5f 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -339,7 +339,7 @@ default_entry:
pushfl
popl %eax   # get EFLAGS
testl $X86_EFLAGS_ID,%eax   # did EFLAGS.ID remained set?
-   jz 6f   # hw disallowed setting of ID bit
+   jz enable_paging# hw disallowed setting of ID bit
# which means no CPUID and no CR4
 
xorl %eax,%eax
@@ -349,13 +349,13 @@ default_entry:
movl $1,%eax
cpuid
andl $~1,%edx   # Ignore CPUID.FPU
-   jz 6f   # No flags or only CPUID.FPU = no CR4
+   jz enable_paging# No flags or only CPUID.FPU = no CR4
 
movl pa(mmu_cr4_features),%eax
movl %eax,%cr4
 
testb $X86_CR4_PAE, %al # check if PAE is enabled
-   jz 6f
+   jz enable_paging
 
/* Check if extended functions are implemented */
movl $0x80000000, %eax
@@ -363,7 +363,7 @@ default_entry:
/* Value must be in the range 0x80000001 to 0x8000ffff */
subl $0x80000001, %eax
cmpl $(0x8000ffff-0x80000001), %eax
-   ja 6f
+   ja enable_paging
 
/* Clear bogus XD_DISABLE bits */
call verify_cpu
@@ -372,7 +372,7 @@ default_entry:
cpuid
/* Execute Disable bit supported? */
btl $(X86_FEATURE_NX & 31), %edx
-   jnc 6f
+   jnc enable_paging
 
/* Setup EFER (Extended Feature Enable Register) */
movl $MSR_EFER, %ecx
@@ -382,7 +382,7 @@ default_entry:
/* Make changes effective */
wrmsr
 
-6:
+enable_paging:
 
 /*
  * Enable paging
-- 
1.8.1.3.535.ga923c31
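
A side note on the subl/cmpl/ja sequence in the context lines of this patch: it is a single-compare range check. Below is a C rendering of the same trick, assuming the usual extended-CPUID bounds 0x80000001 and 0x8000ffff; the function and macro names are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_CPUID_MIN 0x80000001u
#define EXT_CPUID_MAX 0x8000ffffu

/*
 * After subtracting the lower bound, any value below it wraps around to
 * a huge unsigned number, so one unsigned "above" comparison (ja in the
 * asm) rejects both ends of the range.
 */
static bool ext_cpuid_level_valid(uint32_t max_ext_leaf)
{
	return (uint32_t)(max_ext_leaf - EXT_CPUID_MIN) <=
	       (EXT_CPUID_MAX - EXT_CPUID_MIN);
}
```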


[PATCH 5/5] x86, head_32: Remove an old gcc2 fix

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 

gcc2 wants direction flag cleared but we don't support gcc2 anymore. So
drop it. Original patch adding this was:

commit 57d40092c375d2b6d34f814f5fb306967e22c4f5
Author: linus1 
Date:   Mon Nov 9 12:00:00 1992 -0600

[PATCH] Linux-0.98.4 (November 9, 1992)
...

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 75e96d7e4e5f..fc56613224c3 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -463,7 +463,6 @@ is486:
xorl %eax,%eax  # Clear LDT
lldt %ax
 
-   cld # gcc2 wants the direction flag cleared at all times
pushl $0# fake return address for unwinder
jmp *(initial_code)
 
-- 
1.8.1.3.535.ga923c31


[PATCH 2/5] x86: Detect CPUID support early at boot

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 

We detect CPUID function support on each CPU and save it for later use,
obviating the need to play the toggle EFLAGS.ID game every time. C code
is looking at ->cpuid_level anyway.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 48 +++
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f4d919e2cd2b..df0b324d2854 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -318,30 +318,38 @@ default_entry:
movl %eax,%cr0
 
 /*
- * New page tables may be in 4Mbyte page mode and may
- * be using the global pages. 
+ * Initialize EFLAGS. Some BIOSes leave bits like NT set. This would confuse the
+ * debugger if this code is traced. Best to initialize before switching to
+ * protected mode.
+ */
+
+   pushl $0
+   popfl
+
+/*
+ * New page tables may be in 4Mbyte page mode and may be using the global pages.
  *
- * NOTE! If we are on a 486 we may have no cr4 at all!
- * Specifically, cr4 exists if and only if CPUID exists
- * and has flags other than the FPU flag set.
+ * NOTE! If we are on a 486 we may have no cr4 at all! Specifically, cr4 exists
+ * if and only if CPUID exists and has flags other than the FPU flag set.
  */
+   movl $-1,pa(X86_CPUID)  # preset CPUID level
movl $X86_EFLAGS_ID,%ecx
pushl %ecx
-   popfl
-   pushfl
-   popl %eax
-   pushl $0
-   popfl
+   popfl   # set EFLAGS=ID
pushfl
-   popl %edx
-   xorl %edx,%eax
-   testl %ecx,%eax
-   jz 6f   # No ID flag = no CPUID = no CR4
+   popl %eax   # get EFLAGS
+   testl $X86_EFLAGS_ID,%eax   # did EFLAGS.ID remained set?
+   jz 6f   # hw disallowed setting of ID bit
+   # which means no CPUID and no CR4
+
+   xorl %eax,%eax
+   cpuid
+   movl %eax,pa(X86_CPUID) # save largest std CPUID function
 
movl $1,%eax
cpuid
-   andl $~1,%edx   # Ignore CPUID.FPU
-   jz 6f   # No flags or only CPUID.FPU = no CR4
+   andl $~1,%edx   # Ignore CPUID.FPU
+   jz 6f   # No flags or only CPUID.FPU = no CR4
 
movl pa(mmu_cr4_features),%eax
movl %eax,%cr4
@@ -389,14 +397,6 @@ default_entry:
addl $__PAGE_OFFSET, %esp
 
 /*
- * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
- */
-   pushl $0
-   popfl
-
-/*
  * start system 32-bit setup. We need to re-do some of the things done
  * in 16-bit mode for the "real" operations.
  */
-- 
1.8.1.3.535.ga923c31
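
For readers who do not speak boot asm, the detection logic of this patch can be modelled in plain C. This is a simulation only: toy_cpu, toy_popfl() and the writable mask are assumptions invented for the sketch, and no real flags register is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED (1u << 1)   /* reserved bit, always reads as 1 */
#define X86_EFLAGS_ID    (1u << 21)  /* writable only if CPUID exists */

/* Toy CPU: which EFLAGS bits stick when written, and what CPUID.0 says. */
struct toy_cpu {
	uint32_t eflags;
	uint32_t writable;      /* mask of EFLAGS bits software can change */
	uint32_t max_std_leaf;  /* value CPUID leaf 0 would return in EAX */
};

/* Model of "pushl val; popfl": bits outside the writable mask are lost. */
static void toy_popfl(struct toy_cpu *cpu, uint32_t val)
{
	cpu->eflags = (val & cpu->writable) | X86_EFLAGS_FIXED;
}

/* Returns the largest standard CPUID leaf, or -1 without CPUID. */
static int32_t detect_cpuid_level(struct toy_cpu *cpu)
{
	int32_t level = -1;                 /* preset: no CPUID */

	assert(cpu != NULL);
	toy_popfl(cpu, X86_EFLAGS_ID);      /* try to set EFLAGS.ID */
	if (cpu->eflags & X86_EFLAGS_ID)    /* did it remain set? */
		level = (int32_t)cpu->max_std_leaf;
	return level;
}
```

Later code then only has to compare the cached value against -1 instead of repeating the toggling.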


[PATCH 1/5] x86, head_32: Remove i386 pieces

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 

Remove code fragments detecting a 386 CPU since we don't support those
anymore. Also, do not do alignment checks because they're done only at
CPL3. Also, no need to preserve EFLAGS.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 0b8c825fc264..f4d919e2cd2b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -405,30 +405,21 @@ default_entry:
jz 1f   # Did we do this already?
call *%eax
 1:
-   
-/* check if it is 486 or 386. */
+
 /*
- * XXX - this does a lot of unnecessary setup.  Alignment checks don't
- * apply at our cpl of 0 and the stack ought to be aligned already, and
- * we don't need to preserve eflags.
+ * Check if it is 486
  */
movl $-1,X86_CPUID  # -1 for no CPUID initially
-   movb $3,X86 # at least 386
+   movb $4,X86 # at least 486
pushfl  # push EFLAGS
popl %eax   # get EFLAGS
movl %eax,%ecx  # save original EFLAGS
-   xorl $0x24,%eax # flip AC and ID bits in EFLAGS
+   xorl $0x20,%eax # flip ID bit in EFLAGS
pushl %eax  # copy to EFLAGS
popfl   # set EFLAGS
pushfl  # get new EFLAGS
popl %eax   # put it in eax
xorl %ecx,%eax  # change in flags
-   pushl %ecx  # restore original EFLAGS
-   popfl
-   testl $0x4,%eax # check if AC bit changed
-   je is386
-
-   movb $4,X86 # at least 486
testl $0x20,%eax# check if ID bit changed
je is486
 
@@ -456,10 +447,7 @@ default_entry:
movl %edx,X86_CAPABILITY
 
 is486: movl $0x50022,%ecx  # set AM, WP, NE and MP
-   jmp 2f
-
-is386: movl $2,%ecx# set MP
-2: movl %cr0,%eax
+   movl %cr0,%eax
andl $0x80000011,%eax   # Save PG,PE,ET
orl %ecx,%eax
movl %eax,%cr0
-- 
1.8.1.3.535.ga923c31
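
In case the magic numbers in the is486 path look opaque: 0x50022 is AM, WP, NE and MP, and the mask that keeps PG, PE and ET works out to 0x80000011. A user-space sketch that spells this out with named bits; cr0_for_486() and the CR0_KEEP/CR0_SET macros are made-up names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bit positions as documented for x86. */
#define X86_CR0_PE (1u << 0)   /* Protection Enable */
#define X86_CR0_MP (1u << 1)   /* Monitor Coprocessor */
#define X86_CR0_ET (1u << 4)   /* Extension Type */
#define X86_CR0_NE (1u << 5)   /* Numeric Error */
#define X86_CR0_WP (1u << 16)  /* Write Protect */
#define X86_CR0_AM (1u << 18)  /* Alignment Mask */
#define X86_CR0_PG (1u << 31)  /* Paging */

#define CR0_KEEP (X86_CR0_PG | X86_CR0_PE | X86_CR0_ET)
#define CR0_SET  (X86_CR0_AM | X86_CR0_WP | X86_CR0_NE | X86_CR0_MP)

/* Same computation as the movl/andl/orl sequence after the is486 label. */
static uint32_t cr0_for_486(uint32_t cr0)
{
	return (cr0 & CR0_KEEP) | CR0_SET;
}
```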


Re: [PATCH 5/5] x86, head_32: Remove an old gcc2 fix

2013-02-09 Thread Borislav Petkov

On Sat, Feb 09, 2013 at 12:52:01PM -0800, H. Peter Anvin wrote:
> However... DF should have been cleared long before this...

How about we do this at the beginning of default_entry where we clear
EFLAGS too:

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index fc56613224c3..8b2a8a824fc6 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -322,10 +322,11 @@ default_entry:
  * debugger if this code is traced. Best to initialize before switching to
  * protected mode.
  */
-
pushl $0
popfl
 
+   cld # GCC wants DF=0 at all times
+
 /*
  * New page tables may be in 4Mbyte page mode and may be using the global pages.
  *
--

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH 5/5 -v2] x86, head_32: Clear DF much earlier

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 

All GCC versions expect the direction flag to be cleared (DF=0) so move
this to the default entry point for each core.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 75e96d7e4e5f..8b2a8a824fc6 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -322,10 +322,11 @@ default_entry:
  * debugger if this code is traced. Best to initialize before switching to
  * protected mode.
  */
-
pushl $0
popfl
 
+   cld # GCC wants DF=0 at all times
+
 /*
  * New page tables may be in 4Mbyte page mode and may be using the global pages.
  *
@@ -463,7 +464,6 @@ is486:
xorl %eax,%eax  # Clear LDT
lldt %ax
 
-   cld # gcc2 wants the direction flag cleared at all times
pushl $0# fake return address for unwinder
jmp *(initial_code)
 
-- 
1.8.1.3.535.ga923c31


Re: [PATCH 5/5] x86, head_32: Remove an old gcc2 fix

2013-02-09 Thread Borislav Petkov

On Sat, Feb 09, 2013 at 02:23:36PM -0800, H. Peter Anvin wrote:
> The pushfl/popfl sequence clears DF too...

Yes, indeed, good realization!

Ok, I'll fold that fact as a comment into the 2/5 patch and resend it
only as a reply to this mail so as not to spam unnecessarily.
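
Spelled out as a toy model, under the assumption that at CPL0 popfl simply replaces the flags we care about; toy_popfl() and toy_cld() are invented helpers, not real instructions being executed.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED (1u << 1)   /* reserved, always 1 */
#define X86_EFLAGS_DF    (1u << 10)  /* direction flag */
#define X86_EFLAGS_NT    (1u << 14)  /* nested task */

/* Model of "pushl $val; popfl" at CPL0 for the flags of interest. */
static uint32_t toy_popfl(uint32_t val)
{
	return val | X86_EFLAGS_FIXED;
}

/* Model of "cld": clears DF and leaves everything else alone. */
static uint32_t toy_cld(uint32_t eflags)
{
	return eflags & ~X86_EFLAGS_DF;
}
```

Loading 0 leaves DF and NT both clear, so a cld right after it changes nothing, whereas cld alone would leave a stale NT untouched.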

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH 2/5 -v2.1] x86: Detect CPUID support early at boot

2013-02-09 Thread Borislav Petkov

From: Borislav Petkov 

We detect CPUID function support on each CPU and save it for later use,
obviating the need to play the toggle EFLAGS.ID game every time. C code
is looking at ->cpuid_level anyway.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 48 +++
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f4d919e2cd2b..534397ba226c 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -318,30 +318,37 @@ default_entry:
movl %eax,%cr0
 
 /*
- * New page tables may be in 4Mbyte page mode and may
- * be using the global pages. 
+ * Initialize EFLAGS. Some BIOSes leave bits like NT set. This would confuse the
+ * debugger if this code is traced. Best to initialize before switching to
+ * protected mode. As a side effect, we clear DF too because GCC expects it so.
+ */
+   pushl $0
+   popfl
+
+/*
+ * New page tables may be in 4Mbyte page mode and may be using the global pages.
  *
- * NOTE! If we are on a 486 we may have no cr4 at all!
- * Specifically, cr4 exists if and only if CPUID exists
- * and has flags other than the FPU flag set.
+ * NOTE! If we are on a 486 we may have no cr4 at all! Specifically, cr4 exists
+ * if and only if CPUID exists and has flags other than the FPU flag set.
  */
+   movl $-1,pa(X86_CPUID)  # preset CPUID level
movl $X86_EFLAGS_ID,%ecx
pushl %ecx
-   popfl
-   pushfl
-   popl %eax
-   pushl $0
-   popfl
+   popfl   # set EFLAGS=ID
pushfl
-   popl %edx
-   xorl %edx,%eax
-   testl %ecx,%eax
-   jz 6f   # No ID flag = no CPUID = no CR4
+   popl %eax   # get EFLAGS
+   testl $X86_EFLAGS_ID,%eax   # did EFLAGS.ID remained set?
+   jz 6f   # hw disallowed setting of ID bit
+   # which means no CPUID and no CR4
+
+   xorl %eax,%eax
+   cpuid
+   movl %eax,pa(X86_CPUID) # save largest std CPUID function
 
movl $1,%eax
cpuid
-   andl $~1,%edx   # Ignore CPUID.FPU
-   jz 6f   # No flags or only CPUID.FPU = no CR4
+   andl $~1,%edx   # Ignore CPUID.FPU
+   jz 6f   # No flags or only CPUID.FPU = no CR4
 
movl pa(mmu_cr4_features),%eax
movl %eax,%cr4
@@ -389,14 +396,6 @@ default_entry:
addl $__PAGE_OFFSET, %esp
 
 /*
- * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
- */
-   pushl $0
-   popfl
-
-/*
  * start system 32-bit setup. We need to re-do some of the things done
  * in 16-bit mode for the "real" operations.
  */
@@ -472,7 +471,6 @@ is486:  movl $0x50022,%ecx  # set AM, WP, NE and MP
xorl %eax,%eax  # Clear LDT
lldt %ax
 
-   cld # gcc2 wants the direction flag cleared at all times
pushl $0 # fake return address for unwinder
jmp *(initial_code)
 
-- 
1.8.1.3.535.ga923c31
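
The detect-once-and-cache idea in this patch can be modelled in plain C. This is only a userspace sketch, not kernel code: struct model_cpu, model_write_eflags() and detect_cpuid_level() are made-up names, and the "hardware" is a struct that either lets EFLAGS.ID (bit 21) stick or silently drops it.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_ID (UINT32_C(1) << 21)   /* EFLAGS.ID is bit 21 */

/* Hypothetical model of a CPU; not a kernel structure. */
struct model_cpu {
	bool     id_bit_writable;   /* false on CPUs without CPUID */
	uint32_t max_std_leaf;      /* what CPUID(0) would return in EAX */
	uint32_t eflags;
};

/* Model of popfl: a bit the hardware does not implement stays clear. */
static void model_write_eflags(struct model_cpu *cpu, uint32_t val)
{
	if (!cpu->id_bit_writable)
		val &= ~X86_EFLAGS_ID;
	cpu->eflags = val;
}

/* Returns the level to cache, or -1 when CPUID is unavailable. */
static int detect_cpuid_level(struct model_cpu *cpu)
{
	int level = -1;                          /* preset: no CPUID */

	model_write_eflags(cpu, X86_EFLAGS_ID);  /* try to set EFLAGS.ID */
	if (!(cpu->eflags & X86_EFLAGS_ID))      /* bit did not stick */
		return level;

	level = (int)cpu->max_std_leaf;          /* CPUID(0).EAX */
	return level;
}
```

Later code then only has to compare the cached value against -1 instead of toggling EFLAGS.ID again.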


Re: [PATCH 2/5 -v2.1] x86: Detect CPUID support early at boot

2013-02-10 Thread Borislav Petkov

On Sat, Feb 09, 2013 at 08:34:53PM -0800, H. Peter Anvin wrote:
> I wouldn't really call it a "side effect". Perhaps the right thing
> here is to say something like "we want to start out with %eflags
> unambiguously clear".
>
> (Note also we have had to CLD earlier because we have already copied
> the command line.)

Ok, let's make it even more verbose so that people know in the future:

"... we want to start out with EFLAGS unambiguously clear. That means DF
in particular (even though we have cleared it earlier after copying the
command line) because GCC expects it."

How does that sound?

Also, I was wondering about the whole reasoning behind that: do you know
why DF=0 is a GCC requirement? I mean, nothing prevents GCC from issuing
a CLD each time?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 2/2 v2] x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag

2013-02-10 Thread Borislav Petkov

On Sun, Feb 10, 2013 at 04:37:27PM -0500, Len Brown wrote:
> From: Len Brown 
> 
> Remove 32-bit x86 a cmdline param "no-hlt",
> and the cpuinfo_x86.hlt_works_ok that it sets.
> 
> If a user wants to avoid HLT, then "idle=poll"
> is much more useful, as it avoids invocation of HLT
> in idle, while "no-hlt" failed to do so.
> 
> Indeed, hlt_works_ok was consulted in only 3 places.
> 
> First, in /proc/cpuinfo where "hlt_bug yes"
> would be printed if and only if the user booted
> the system with "no-hlt" -- as there was no other code
> to set that flag.
> 
> Second, check_hlt() would not invoke halt() if "no-hlt"
> were on the cmdline.
> 
> Third, it was consulted in stop_this_cpu(), which is invoked
> by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
> all cases where the machine is being shutdown/reset.
> The flag was not consulted in the more frequently invoked
> play_dead()/hlt_play_dead() used in processor offline and suspend.
> 
> Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
> indicating that it would be removed in 2012.
> 
> Signed-off-by: Len Brown 
> Cc: x...@kernel.org
> ---
> v2: remove also check_hlt()

[ … ]

> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> index 3286a92..e280253 100644
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -28,7 +28,6 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
>  {
>   seq_printf(m,
>  "fdiv_bug\t: %s\n"
> -"hlt_bug\t\t: %s\n"

Are we fine with changing /proc/cpuinfo output? We tend to consider it
an API to userspace, judging by past experience...
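
To make the concern concrete, here is a rough sketch of the kind of userspace code that keys on a /proc/cpuinfo field name. cpuinfo_lookup() is a made-up helper, not something from an existing tool; it just shows that a consumer looking for "hlt_bug" starts failing once the line is gone.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up "key<tabs>: value" in a /proc/cpuinfo-style buffer and copy the
 * value into out. Returns 0 on success, -1 if the key is absent.
 * Hypothetical helper, for illustration only.
 */
static int cpuinfo_lookup(const char *text, const char *key,
			  char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *line = text;

	if (outlen == 0)
		return -1;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *colon = memchr(line, ':', len);

		if (colon && len >= klen && strncmp(line, key, klen) == 0) {
			/* only tabs/spaces may sit between key and ':' */
			const char *p = line + klen;

			while (p < colon && (*p == '\t' || *p == ' '))
				p++;
			if (p == colon) {
				const char *v = colon + 1;
				const char *end = line + len;
				size_t vlen;

				while (v < end && *v == ' ')
					v++;
				vlen = (size_t)(end - v);
				if (vlen >= outlen)
					vlen = outlen - 1;   /* truncate */
				memcpy(out, v, vlen);
				out[vlen] = '\0';
				return 0;
			}
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

A caller that treats the -1 case as a fatal error rather than "field not present" is exactly what would break.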

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[RFC PATCH 0/4] x86, cpu: Expand ->x86_capability flags with bugs bitvector

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

Hi,

so this is a rough first version to collect some initial comments. It is
lightly tested in kvm.

Thanks.

Borislav Petkov (4):
  x86, cpu: Expand cpufeature facility to include cpu bugs
  x86, cpu: Convert F00F bug detection
  x86, cpu: Convert FDIV bug detection
  x86, cpu: Convert Cyrix coma bug detection

 arch/x86/include/asm/cpufeature.h | 38 ++
 arch/x86/include/asm/processor.h  |  6 +-
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/cpu/bugs.c|  5 +++--
 arch/x86/kernel/cpu/common.c  |  4 
 arch/x86/kernel/cpu/cyrix.c   |  5 +++--
 arch/x86/kernel/cpu/intel.c   |  4 ++--
 arch/x86/kernel/cpu/proc.c|  6 +++---
 arch/x86/mm/fault.c   |  2 +-
 9 files changed, 56 insertions(+), 16 deletions(-)

-- 
1.8.1.3.535.ga923c31


[RFC PATCH 2/4] x86, cpu: Convert F00F bug detection

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

... to using the new facility and drop the cpuinfo_x86 member.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 2 ++
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/cpu/intel.c   | 4 ++--
 arch/x86/kernel/cpu/proc.c| 2 +-
 arch/x86/mm/fault.c   | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 68daf877dad5..22107c57c0e4 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -216,6 +216,8 @@
 #define X86_FEATURE_ADX    (9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP   (9*32+20) /* Supervisor Mode Access Prevention */
 
+#define X86_BUG_F00F   (NCAPINTS*32+ 0) /* Intel F00F bug */
+
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
 #include 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67721c634bf0..60b21f132ae4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,6 @@ struct cpuinfo_x86 {
char hard_math;
char rfu;
char fdiv_bug;
-   char f00f_bug;
char coma_bug;
char pad0;
 #else
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 1905ce98bee0..1acdd42d86d1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -209,11 +209,11 @@ static void __cpuinit intel_workarounds(struct cpuinfo_x86 *c)
 * system.
 * Note that the workaround only should be initialized once...
 */
-   c->f00f_bug = 0;
+   clear_cpu_bug(c, X86_BUG_F00F);
if (!paravirt_enabled() && c->x86 == 5) {
static int f00f_workaround_enabled;
 
-   c->f00f_bug = 1;
+   set_cpu_bug(c, X86_BUG_F00F);
if (!f00f_workaround_enabled) {
trap_init_f00f_bug();
printk(KERN_NOTICE "Intel Pentium with F0 0F bug - workaround enabled.\n");
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 3286a92e662a..debb8826589b 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -37,7 +37,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
   "wp\t\t: %s\n",
   c->fdiv_bug ? "yes" : "no",
   c->hlt_works_ok ? "no" : "yes",
-  c->f00f_bug ? "yes" : "no",
+  static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
   c->coma_bug ? "yes" : "no",
   c->hard_math ? "yes" : "no",
   c->hard_math ? "yes" : "no",
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fb674fd3fc22..aaaf6931ff0b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -555,7 +555,7 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
/*
 * Pentium F0 0F C7 C8 bug workaround:
 */
-   if (boot_cpu_data.f00f_bug) {
+   if (boot_cpu_has_bug(X86_BUG_F00F)) {
nr = (address - idt_descr.address) >> 3;
 
if (nr == 6) {
-- 
1.8.1.3.535.ga923c31
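
For readers who want to see where X86_BUG_F00F ends up, here is a small userspace model of the indexing. It reuses NCAPINTS, NBUGINTS and the X86_BUG_F00F definition from the series, but struct model_cpuinfo and the model_*() helpers are stand-ins invented for this sketch; they are not the kernel's set_cpu_cap()/cpu_has() implementations.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCAPINTS 10   /* feature words, as in the series */
#define NBUGINTS 1    /* bug words, as in the series */

#define X86_BUG_F00F (NCAPINTS * 32 + 0)

/* Userspace stand-in for the relevant part of struct cpuinfo_x86. */
struct model_cpuinfo {
	uint32_t x86_capability[NCAPINTS + NBUGINTS];
};

/* A bit is a bug bit if its word index lies past the feature words. */
static bool model_is_bug_bit(unsigned int bit)
{
	unsigned int word = bit >> 5;

	return word >= NCAPINTS && word < NCAPINTS + NBUGINTS;
}

static void model_set_bit(struct model_cpuinfo *c, unsigned int bit)
{
	c->x86_capability[bit >> 5] |= UINT32_C(1) << (bit & 31);
}

static void model_clear_bit(struct model_cpuinfo *c, unsigned int bit)
{
	c->x86_capability[bit >> 5] &= ~(UINT32_C(1) << (bit & 31));
}

static bool model_test_bit(const struct model_cpuinfo *c, unsigned int bit)
{
	return (c->x86_capability[bit >> 5] >> (bit & 31)) & 1;
}
```

With NCAPINTS at 10, X86_BUG_F00F is bit 320, i.e. bit 0 of word 10, the first word after the feature flags.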


[RFC PATCH 3/4] x86, cpu: Convert FDIV bug detection

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

... to the new facility.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/cpu/bugs.c| 5 +++--
 arch/x86/kernel/cpu/proc.c| 2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 22107c57c0e4..6be6fab3dced 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -217,6 +217,7 @@
 #define X86_FEATURE_SMAP   (9*32+20) /* Supervisor Mode Access Prevention */
 
 #define X86_BUG_F00F   (NCAPINTS*32+ 0) /* Intel F00F bug */
+#define X86_BUG_FDIV   (NCAPINTS*32+ 1) /* FPU FDIV bug */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 60b21f132ae4..d18dedf333aa 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -92,7 +92,6 @@ struct cpuinfo_x86 {
char hlt_works_ok;
char hard_math;
char rfu;
-   char fdiv_bug;
char coma_bug;
char pad0;
 #else
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 92dfec986a48..3ca8ab0001bc 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -84,9 +84,10 @@ static void __init check_fpu(void)
 
kernel_fpu_end();
 
-   boot_cpu_data.fdiv_bug = fdiv_bug;
-   if (boot_cpu_data.fdiv_bug)
+   if (fdiv_bug) {
+   set_cpu_bug(&boot_cpu_data, X86_BUG_FDIV);
pr_warn("Hmm, FPU with FDIV bug\n");
+   }
 }
 
 static void __init check_hlt(void)
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index debb8826589b..de41600664da 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -35,7 +35,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
   "fpu_exception\t: %s\n"
   "cpuid level\t: %d\n"
   "wp\t\t: %s\n",
-  c->fdiv_bug ? "yes" : "no",
+  static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
   c->hlt_works_ok ? "no" : "yes",
   static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
   c->coma_bug ? "yes" : "no",
-- 
1.8.1.3.535.ga923c31


[RFC PATCH 1/4] x86, cpu: Expand cpufeature facility to include cpu bugs

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

We add another 32-bit vector at the end of the ->x86_capability
bitvector which collects bugs present in CPUs. After all, a CPU bug is a
kind of a capability, albeit a strange one.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 34 ++
 arch/x86/include/asm/processor.h  |  2 +-
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/cpu/common.c  |  4 
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 2d9075e863a0..68daf877dad5 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -9,6 +9,7 @@
 #endif
 
 #define NCAPINTS   10  /* N 32-bit words worth of info */
+#define NBUGINTS   1   /* N 32-bit bug flags */
 
 /*
  * Note: If the comment begins with a quoted string, that string is used
@@ -399,6 +400,39 @@ static __always_inline __pure bool __static_cpu_has(u16 bit)
 #define static_cpu_has(bit) boot_cpu_has(bit)
 #endif
 
+#define __BUG_CHECK_BIT(bit)   \
+({ \
+   WARN_ON(bit >> 5 < NCAPINTS);   \
+   bit;\
+})
+
+#define cpu_has_bug(c, bit)\
+({ \
+   unsigned __bit = __BUG_CHECK_BIT((bit));\
+   cpu_has(c, __bit);  \
+})
+
+#define boot_cpu_has_bug(bit) \
+   cpu_has_bug(&boot_cpu_data, (bit))
+
+#define static_cpu_has_bug(bit)\
+({ \
+   unsigned __bit = __BUG_CHECK_BIT((bit));\
+   static_cpu_has(__bit);  \
+})
+
+#define set_cpu_bug(c, bit)\
+({ \
+   unsigned __bit = __BUG_CHECK_BIT((bit));\
+   set_cpu_cap(c, __bit);  \
+})
+
+#define clear_cpu_bug(c, bit)  \
+({ \
+   unsigned __bit = __BUG_CHECK_BIT((bit));\
+   clear_cpu_cap(c, __bit);\
+})
+
 #endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */
 
 #endif /* _ASM_X86_CPUFEATURE_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 439f27a41ee8..67721c634bf0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -108,7 +108,7 @@ struct cpuinfo_x86 {
__u32   extended_cpuid_level;
/* Maximum supported CPUID level, -1=no CPUID: */
int cpuid_level;
-   __u32   x86_capability[NCAPINTS];
+   __u32   x86_capability[NCAPINTS + NBUGINTS];
char x86_vendor_id[16];
char x86_model_id[64];
/* in KB - valid for CPUS which support this call: */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ef5ccca79a6c..c15cf9a25e27 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -271,7 +271,7 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
replacement = (u8 *)&a->repl_offset + a->repl_offset;
BUG_ON(a->replacementlen > a->instrlen);
BUG_ON(a->instrlen > sizeof(insnbuf));
-   BUG_ON(a->cpuid >= NCAPINTS*32);
+   BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
if (!boot_cpu_has(a->cpuid))
continue;
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d814772c5bed..22018f70a671 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -920,6 +920,10 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
/* AND the already accumulated flags with these */
for (i = 0; i < NCAPINTS; i++)
boot_cpu_data.x86_capability[i] &= c->x86_capability[i];
+
+   /* OR, i.e. replicate the bug flags */
+   for (i = NCAPINTS; i < NCAPINTS + NBUGINTS; i++)
+   c->x86_capability[i] |= boot_cpu_data.x86_capability[i];
}
 
/* Init Machine Check Exception if available. */
-- 
1.8.1.3.535.ga923c31
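
The identify_cpu() hunk treats the two halves of the vector differently: feature words are ANDed into the boot CPU's view, bug words are ORed from the boot CPU into the CPU being identified. A userspace sketch of just that loop pair follows; struct model_cpuinfo and model_merge_secondary() are hypothetical names used only here.

```c
#include <stdint.h>

#define NCAPINTS 10   /* feature words, as in the patch */
#define NBUGINTS 1    /* bug words, as in the patch */

/* Userspace stand-in for the relevant part of struct cpuinfo_x86. */
struct model_cpuinfo {
	uint32_t x86_capability[NCAPINTS + NBUGINTS];
};

/* Mirrors the two loops in the identify_cpu() hunk for a non-boot CPU. */
static void model_merge_secondary(struct model_cpuinfo *boot,
				  struct model_cpuinfo *c)
{
	int i;

	/* Features: the boot view keeps only what every CPU supports. */
	for (i = 0; i < NCAPINTS; i++)
		boot->x86_capability[i] &= c->x86_capability[i];

	/* Bugs: replicate the boot CPU's bug flags into this CPU. */
	for (i = NCAPINTS; i < NCAPINTS + NBUGINTS; i++)
		c->x86_capability[i] |= boot->x86_capability[i];
}
```

Note the direction of the second loop: bugs found only on the secondary CPU stay local to it and are not propagated back to the boot CPU's flags.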


[RFC PATCH 4/4] x86, cpu: Convert Cyrix coma bug detection

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

... to the new facility. Drop the padding too since it becomes
unnecessary now.

Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/include/asm/processor.h  | 2 --
 arch/x86/kernel/cpu/cyrix.c   | 5 +++--
 arch/x86/kernel/cpu/proc.c| 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 6be6fab3dced..62b9affc0948 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -218,6 +218,7 @@
 
 #define X86_BUG_F00F   (NCAPINTS*32+ 0) /* Intel F00F bug */
 #define X86_BUG_FDIV   (NCAPINTS*32+ 1) /* FPU FDIV bug */
+#define X86_BUG_COMA   (NCAPINTS*32+ 2) /* Cyrix 6x86 coma */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index d18dedf333aa..c7f1066bd93a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -92,8 +92,6 @@ struct cpuinfo_x86 {
char hlt_works_ok;
char hard_math;
char rfu;
-   char coma_bug;
-   char pad0;
 #else
/* Number of 4K pages in DTLB/ITLB combined(in pages): */
int x86_tlbsize;
diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index 4fbd384fb645..ef060edeb68e 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -249,7 +249,7 @@ static void __cpuinit init_cyrix(struct cpuinfo_x86 *c)
/* Emulate MTRRs using Cyrix's ARRs. */
set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
/* 6x86's contain this bug */
-   c->coma_bug = 1;
+   set_cpu_bug(c, X86_BUG_COMA);
break;
 
case 4: /* MediaGX/GXm or Geode GXM/GXLV/GX1 */
@@ -317,7 +317,8 @@ static void __cpuinit init_cyrix(struct cpuinfo_x86 *c)
/* Enable MMX extensions (App note 108) */
setCx86_old(CX86_CCR7, getCx86_old(CX86_CCR7)|1);
} else {
-   c->coma_bug = 1;  /* 6x86MX, it has the bug. */
+   /* 6x86MX, it has the bug. */
+   set_cpu_bug(c, X86_BUG_COMA);
}
tmp = (!(dir0_lsn & 7) || dir0_lsn & 1) ? 2 : 0;
Cx86_cb[tmp] = cyrix_model_mult2[dir0_lsn & 7];
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index de41600664da..7497d1eb6053 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -38,7 +38,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
   static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
   c->hlt_works_ok ? "no" : "yes",
   static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
-  c->coma_bug ? "yes" : "no",
+  static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
   c->hard_math ? "yes" : "no",
   c->hard_math ? "yes" : "no",
   c->cpuid_level,
-- 
1.8.1.3.535.ga923c31


[PATCH 0/4 -v3] x86, head_32: Some cleanups

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

Hi,

this is the new version, with patch 5 from the old series transformed
into an expanded, more verbose comment at the beginning of default_entry.
No changes otherwise.


Changelog:

v2:

Here's the next version with new_cpu_data left in place and two minor fixlets
added at the end. The patchset was boot-tested on a bunch of baremetal
boxes and all QEMU cpu models - no issues.

Boot tests:

* baremetal:
- P4
- Atom n270
- 32-bit kernel on an AMD64 (F10h Phenom and Intel SNB)

* qemu, with cpu models:
 - qemu64
 - phenom
 - core2duo
 - kvm64
 - qemu32
 - kvm32
 - coreduo
 - 486{,SX}
 - pentium{,2,3}
 - athlon
 - n270,+movbe
 - Conroe
 - Penryn
 - Nehalem
 - Westmere
 - SandyBridge
 - Haswell
 - Opteron_G{1,2,3,4,5}

Why am I testing all those, you ask? Because I'm a sadistic mofo :-)

v1:

here are some initial low-hanging fruits wrt head_32.S cleanup. I've
made them as easily digestible as possible; after all, this is boot asm
and meddling with it tends to upset kernels.

Also, I've made the assumption that having boot_cpu_data.cpuid_level
contain the CPUID level for the boot cpu means that the APs have the
same CPUID level. This should be the case on X86.

They boot fine on 486 and 486SX in qemu but I'd like to hear whether
the direction I'm going is ok before I continue testing them on real
hardware.

Borislav Petkov (4):
  x86, head_32: Remove i386 pieces
  x86: Detect CPUID support early at boot
  x86, head_32: Remove second CPUID detection from default_entry
  x86, head_32: Give the 6 label a real name

 arch/x86/kernel/head_32.S | 93 ++-
 1 file changed, 36 insertions(+), 57 deletions(-)

-- 
1.8.1.3.535.ga923c31


[PATCH 2/4] x86: Detect CPUID support early at boot

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

We detect CPUID function support on each CPU and save it for later use,
obviating the need to play the toggle EFLAGS.ID game every time. C code
is looking at ->cpuid_level anyway.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 50 +++
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f4d919e2cd2b..73e084a6d2c5 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -318,30 +318,39 @@ default_entry:
movl %eax,%cr0
 
 /*
- * New page tables may be in 4Mbyte page mode and may
- * be using the global pages. 
+ * We want to start out with EFLAGS unambiguously cleared. Some BIOSes leave
+ * bits like NT set. This would confuse the debugger if this code is traced. So
+ * initialize them properly now before switching to protected mode. That means
+ * DF in particular (even though we have cleared it earlier after copying the
+ * command line) because GCC expects it.
+ */
+   pushl $0
+   popfl
+
+/*
+ * New page tables may be in 4Mbyte page mode and may be using the global pages.
  *
- * NOTE! If we are on a 486 we may have no cr4 at all!
- * Specifically, cr4 exists if and only if CPUID exists
- * and has flags other than the FPU flag set.
+ * NOTE! If we are on a 486 we may have no cr4 at all! Specifically, cr4 exists
+ * if and only if CPUID exists and has flags other than the FPU flag set.
  */
+   movl $-1,pa(X86_CPUID)  # preset CPUID level
movl $X86_EFLAGS_ID,%ecx
pushl %ecx
-   popfl
-   pushfl
-   popl %eax
-   pushl $0
-   popfl
+   popfl   # set EFLAGS=ID
pushfl
-   popl %edx
-   xorl %edx,%eax
-   testl %ecx,%eax
-   jz 6f   # No ID flag = no CPUID = no CR4
+   popl %eax   # get EFLAGS
+   testl $X86_EFLAGS_ID,%eax   # did EFLAGS.ID remain set?
+   jz 6f   # hw disallowed setting of ID bit
+   # which means no CPUID and no CR4
+
+   xorl %eax,%eax
+   cpuid
+   movl %eax,pa(X86_CPUID) # save largest std CPUID function
 
movl $1,%eax
cpuid
-   andl $~1,%edx   # Ignore CPUID.FPU
-   jz 6f   # No flags or only CPUID.FPU = no CR4
+   andl $~1,%edx   # Ignore CPUID.FPU
+   jz 6f   # No flags or only CPUID.FPU = no CR4
 
movl pa(mmu_cr4_features),%eax
movl %eax,%cr4
@@ -389,14 +398,6 @@ default_entry:
addl $__PAGE_OFFSET, %esp
 
 /*
- * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
- */
-   pushl $0
-   popfl
-
-/*
  * start system 32-bit setup. We need to re-do some of the things done
  * in 16-bit mode for the "real" operations.
  */
@@ -472,7 +473,6 @@ is486:  movl $0x50022,%ecx  # set AM, WP, NE and MP
xorl %eax,%eax  # Clear LDT
lldt %ax
 
-   cld # gcc2 wants the direction flag cleared at all times
pushl $0 # fake return address for unwinder
jmp *(initial_code)
 
-- 
1.8.1.3.535.ga923c31


[PATCH 4/4] x86, head_32: Give the 6 label a real name

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

When we jump here, we are about to enable paging, so rename the label
accordingly.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index e893ac09ca03..73afd11799ca 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -340,7 +340,7 @@ default_entry:
pushfl
popl %eax   # get EFLAGS
testl $X86_EFLAGS_ID,%eax   # did EFLAGS.ID remain set?
-   jz 6f   # hw disallowed setting of ID bit
+   jz enable_paging # hw disallowed setting of ID bit
# which means no CPUID and no CR4
 
xorl %eax,%eax
@@ -350,13 +350,13 @@ default_entry:
movl $1,%eax
cpuid
andl $~1,%edx   # Ignore CPUID.FPU
-   jz 6f   # No flags or only CPUID.FPU = no CR4
+   jz enable_paging # No flags or only CPUID.FPU = no CR4
 
movl pa(mmu_cr4_features),%eax
movl %eax,%cr4
 
testb $X86_CR4_PAE, %al # check if PAE is enabled
-   jz 6f
+   jz enable_paging
 
/* Check if extended functions are implemented */
movl $0x80000000, %eax
@@ -364,7 +364,7 @@ default_entry:
/* Value must be in the range 0x80000001 to 0x8000ffff */
subl $0x80000001, %eax
cmpl $(0x8000ffff-0x80000001), %eax
-   ja 6f
+   ja enable_paging
 
/* Clear bogus XD_DISABLE bits */
call verify_cpu
@@ -373,7 +373,7 @@ default_entry:
cpuid
/* Execute Disable bit supported? */
btl $(X86_FEATURE_NX & 31), %edx
-   jnc 6f
+   jnc enable_paging
 
/* Setup EFER (Extended Feature Enable Register) */
movl $MSR_EFER, %ecx
@@ -383,7 +383,7 @@ default_entry:
/* Make changes effective */
wrmsr
 
-6:
+enable_paging:
 
 /*
  * Enable paging
-- 
1.8.1.3.535.ga923c31
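
The subl/cmpl/ja sequence in the context lines of this patch is the usual single-compare unsigned range test. Written out in C it might look like the sketch below, assuming the intended range for the largest extended CPUID function is 0x80000001 through 0x8000ffff; model_ext_cpuid_valid() is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if max_ext_leaf lies in [0x80000001, 0x8000ffff].
 * Subtracting the lower bound first makes every value below it wrap
 * around to a huge unsigned number, so one comparison covers both ends.
 */
static bool model_ext_cpuid_valid(uint32_t max_ext_leaf)
{
	const uint32_t lo = UINT32_C(0x80000001);
	const uint32_t hi = UINT32_C(0x8000ffff);

	return (uint32_t)(max_ext_leaf - lo) <= hi - lo;
}
```

The asm's "ja" branch is the inverse of this predicate: it skips the NX setup when the value is out of range.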


[PATCH 3/4] x86, head_32: Remove second CPUID detection from default_entry

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

We do that once earlier now and cache it into new_cpu_data.cpuid_level
so no need for the EFLAGS.ID toggling dance anymore.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 73e084a6d2c5..e893ac09ca03 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -410,18 +410,7 @@ default_entry:
 /*
  * Check if it is 486
  */
-   movl $-1,X86_CPUID  # -1 for no CPUID initially
-   movb $4,X86 # at least 486
-   pushfl  # push EFLAGS
-   popl %eax   # get EFLAGS
-   movl %eax,%ecx  # save original EFLAGS
-   xorl $0x200000,%eax # flip ID bit in EFLAGS
-   pushl %eax  # copy to EFLAGS
-   popfl   # set EFLAGS
-   pushfl  # get new EFLAGS
-   popl %eax   # put it in eax
-   xorl %ecx,%eax  # change in flags
-   testl $0x200000,%eax # check if ID bit changed
+   cmpl $-1,X86_CPUID
je is486
 
/* get vendor info */
@@ -447,7 +436,9 @@ default_entry:
movb %cl,X86_MASK
movl %edx,X86_CAPABILITY
 
-is486: movl $0x50022,%ecx  # set AM, WP, NE and MP
+is486:
+   movb $4,X86
+   movl $0x50022,%ecx  # set AM, WP, NE and MP
movl %cr0,%eax
andl $0x80000011,%eax   # Save PG,PE,ET
orl %ecx,%eax
-- 
1.8.1.3.535.ga923c31


[PATCH 1/4] x86, head_32: Remove i386 pieces

2013-02-11 Thread Borislav Petkov

From: Borislav Petkov 

Remove code fragments detecting a 386 CPU since we don't support those
anymore. Also, do not do alignment checks because they're done only at
CPL3. Also, no need to preserve EFLAGS.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/head_32.S | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 0b8c825fc264..f4d919e2cd2b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -405,30 +405,21 @@ default_entry:
jz 1f   # Did we do this already?
call *%eax
 1:
-   
-/* check if it is 486 or 386. */
+
 /*
- * XXX - this does a lot of unnecessary setup.  Alignment checks don't
- * apply at our cpl of 0 and the stack ought to be aligned already, and
- * we don't need to preserve eflags.
+ * Check if it is 486
  */
movl $-1,X86_CPUID  # -1 for no CPUID initially
-   movb $3,X86 # at least 386
+   movb $4,X86 # at least 486
pushfl  # push EFLAGS
popl %eax   # get EFLAGS
movl %eax,%ecx  # save original EFLAGS
-   xorl $0x240000,%eax # flip AC and ID bits in EFLAGS
+   xorl $0x200000,%eax # flip ID bit in EFLAGS
pushl %eax  # copy to EFLAGS
popfl   # set EFLAGS
pushfl  # get new EFLAGS
popl %eax   # put it in eax
xorl %ecx,%eax  # change in flags
-   pushl %ecx  # restore original EFLAGS
-   popfl
-   testl $0x40000,%eax # check if AC bit changed
-   je is386
-
-   movb $4,X86 # at least 486
testl $0x200000,%eax # check if ID bit changed
je is486
 
@@ -456,10 +447,7 @@ default_entry:
movl %edx,X86_CAPABILITY
 
 is486: movl $0x50022,%ecx  # set AM, WP, NE and MP
-   jmp 2f
-
-is386: movl $2,%ecx# set MP
-2: movl %cr0,%eax
+   movl %cr0,%eax
andl $0x80000011,%eax   # Save PG,PE,ET
orl %ecx,%eax
movl %eax,%cr0
-- 
1.8.1.3.535.ga923c31
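
As a cross-check of the CR0 constants at is486, the two masks can be composed from the individual bits. The bit positions are the architectural ones (PE 0, MP 1, ET 4, NE 5, WP 16, AM 18, PG 31); model_setup_cr0() is a made-up name for what the movl/andl/orl sequence computes, not a kernel function.

```c
#include <stdint.h>

#define X86_CR0_PE UINT32_C(0x00000001)  /* bit 0:  protection enable */
#define X86_CR0_MP UINT32_C(0x00000002)  /* bit 1:  monitor coprocessor */
#define X86_CR0_ET UINT32_C(0x00000010)  /* bit 4:  extension type */
#define X86_CR0_NE UINT32_C(0x00000020)  /* bit 5:  numeric error */
#define X86_CR0_WP UINT32_C(0x00010000)  /* bit 16: write protect */
#define X86_CR0_AM UINT32_C(0x00040000)  /* bit 18: alignment mask */
#define X86_CR0_PG UINT32_C(0x80000000)  /* bit 31: paging */

/* "set AM, WP, NE and MP" */
#define MODEL_CR0_SET  (X86_CR0_AM | X86_CR0_WP | X86_CR0_NE | X86_CR0_MP)
/* "Save PG,PE,ET"; PG is bit 31, so this is a full 32-bit constant */
#define MODEL_CR0_KEEP (X86_CR0_PG | X86_CR0_PE | X86_CR0_ET)

/* Keep PG/PE/ET from the current value, then force AM/WP/NE/MP on. */
static uint32_t model_setup_cr0(uint32_t cr0)
{
	return (cr0 & MODEL_CR0_KEEP) | MODEL_CR0_SET;
}
```

MODEL_CR0_SET works out to 0x50022, matching the immediate in the asm, and MODEL_CR0_KEEP to 0x80000011.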


Re: [PATCH 07/33] nohz: Basic full dynticks interface

2013-02-11 Thread Borislav Petkov

On Tue, Jan 08, 2013 at 03:08:07AM +0100, Frederic Weisbecker wrote:

[ … ]

> diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
> index 8601f0d..dc6381d 100644
> --- a/kernel/time/Kconfig
> +++ b/kernel/time/Kconfig
> @@ -70,6 +70,15 @@ config NO_HZ
> only trigger on an as-needed basis both when the system is
> busy and when the system is idle.
>  
> +config NO_HZ_FULL
> +   bool "Full tickless system"

I think you want to say here "Almost-completely tickless system".
"Almost" because of that one CPU outside of the range :-)

> +   depends on NO_HZ && RCU_USER_QS && VIRT_CPU_ACCOUNTING_GEN && RCU_NOCB_CPU && SMP
> +   select CONTEXT_TRACKING_FORCE
> +   help
> + Try to be tickless everywhere, not just in idle. (You need
> +  to fill up the full_nohz_mask boot parameter).
> +
> +
>  config HIGH_RES_TIMERS
>   bool "High Resolution Timer Support"
>   depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 314b9ee..494a2aa 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -142,6 +142,29 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
>   profile_tick(CPU_PROFILING);
>  }
>  
> +#ifdef CONFIG_NO_HZ_FULL
> +static cpumask_var_t full_nohz_mask;
> +bool have_full_nohz_mask;
> +
> +int tick_nohz_full_cpu(int cpu)
> +{
> + if (!have_full_nohz_mask)
> + return 0;
> +
> + return cpumask_test_cpu(cpu, full_nohz_mask);
> +}
> +
> +/* Parse the boot-time nohz CPU list from the kernel parameters. */
> +static int __init tick_nohz_full_setup(char *str)
> +{
> + alloc_bootmem_cpumask_var(&full_nohz_mask);
> + have_full_nohz_mask = true;
> + cpulist_parse(str, full_nohz_mask);

Don't you want to check retval of cpulist_parse first here before
assigning have_full_nohz_mask and allocating cpumask var?

We don't trust userspace, you know.

> + return 1;
> +}
> +__setup("full_nohz=", tick_nohz_full_setup);

I'd guess this kernel parameter needs to go into
Documentation/kernel-parameters.txt along with a referral to
Documentation/cputopology.txt which explains how to specify cpulists for
n00bs like me :-)
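
To illustrate the cpulist_parse() retval point from above, here is a userspace sketch of "parse first, commit only on success". Everything in it is a stand-in: model_cpulist_parse() is a simplified parser written for this example (it is not the kernel's cpulist_parse()), the mask is a plain 64-bit word, and the return values of model_full_nohz_setup() are only meant to signal accepted/rejected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 64   /* assumption: mask fits in one 64-bit word */

/* Parse lists like "1-3,5" into a bitmask; 0 on success, -EINVAL on error. */
static int model_cpulist_parse(const char *str, uint64_t *mask)
{
	uint64_t m = 0;
	const char *p = str;

	for (;;) {
		char *end;
		unsigned long lo, hi;

		if (*p < '0' || *p > '9')
			return -EINVAL;
		lo = strtoul(p, &end, 10);
		hi = lo;
		p = end;
		if (*p == '-') {                 /* range "lo-hi" */
			p++;
			if (*p < '0' || *p > '9')
				return -EINVAL;
			hi = strtoul(p, &end, 10);
			p = end;
		}
		if (lo > hi || hi >= MODEL_NR_CPUS)
			return -EINVAL;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		if (*p == '\0')
			break;
		if (*p != ',')
			return -EINVAL;
		p++;
	}
	*mask = m;
	return 0;
}

static uint64_t model_full_nohz_mask;
static bool model_have_full_nohz_mask;

/* Only publish the mask and set the flag once the string parsed cleanly. */
static int model_full_nohz_setup(const char *str)
{
	uint64_t mask;

	if (model_cpulist_parse(str, &mask) < 0)
		return 0;                        /* bad input: leave it off */

	model_full_nohz_mask = mask;
	model_have_full_nohz_mask = true;
	return 1;
}
```

With this ordering a malformed full_nohz= string leaves have_full_nohz_mask false, so tick_nohz_full_cpu()-style lookups never see a half-initialized mask.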

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 3/4] x86, head_32: Remove second CPUID detection from default_entry

2013-02-11 Thread Borislav Petkov

On Mon, Feb 11, 2013 at 07:49:14AM -0800, H. Peter Anvin wrote:
> What about CPUs with inconsistent cpuid levels? Yes, they can and do
> happen, as we discussed on IRC.

Yes, this should still work. We're doing the EFLAGS.ID dance right at
the beginning of default_entry on each cpu and cache cpuid level in
new_cpu_data for the time we're in this code.
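
To make the "probe once, cache the level" idea concrete, here is a userspace sketch, not the head_32.S code. It leans on __get_cpuid_max() from the GCC/clang <cpuid.h> header, which on 32-bit does the EFLAGS.ID toggle itself and returns 0 when CPUID is unavailable. struct cpu_probe and probe_cpuid_level() are names I made up for illustration:

```c
#include <stddef.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical per-CPU cache, loosely in the role of new_cpu_data. */
struct cpu_probe {
	int probed;			/* non-zero once the probe has run */
	unsigned int cpuid_level;	/* highest basic leaf, 0 if no CPUID */
};

/* Run the detection at most once and hand back the cached level after that. */
static unsigned int probe_cpuid_level(struct cpu_probe *c)
{
	if (!c->probed) {
#if defined(__i386__) || defined(__x86_64__)
		c->cpuid_level = __get_cpuid_max(0, NULL);
#else
		c->cpuid_level = 0;	/* not x86: nothing to probe */
#endif
		c->probed = 1;
	}
	return c->cpuid_level;
}
```

Keeping one such struct per CPU is what lets inconsistent cpuid levels across CPUs still be handled: each CPU caches its own answer and nothing later needs to redo the probe.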

What this particular patch removes is the yet-another EFLAGS.ID dance
which we IMHO unnecessarily did after enabling paging.

So basically nothing changes wrt handling inconsistent cpuid levels and
MSR mis-programming - we still should be taking care of those cases.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 1/2] add helper for highmem checks

2013-02-11 Thread Borislav Petkov

On Mon, Feb 11, 2013 at 09:32:41AM -0800, Dave Hansen wrote:
> That's crazy. Didn't expect that at all.
>
> I guess X is happier getting an error than getting random pages back.

Yeah, I think this is something special only this window manager wdm
does. The line below has appeared repeatedly in the logs earlier:

Feb  5 23:02:02 a1 wdm: Cannot read randomFile "/dev/mem", errno = 14

This happens when wdm starts so I'm going to guess it uses it for
something funny, "randomFile" it calls it??

With the WARN_ON check added and booting 3.8-rc6, it would choke wdm
somehow and it wouldn't start properly so that even the error out above
doesn't happen. Oh well ...

> I'm working on a set of patches now that should get it _working_
> instead of just returning an error.

Yeah, send them on and I'll run them.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 1/2] add helper for highmem checks

2013-02-11 Thread Borislav Petkov

On Mon, Feb 11, 2013 at 11:44:12AM -0800, H. Peter Anvin wrote:
> Oh, craptastic. X used to hash /dev/mem to get a random seed. It
> should have stopped that long ago, and used /dev/[u]random.

That's because debian still has this WINGs window manager which hasn't
seen any new releases since 2005: http://voins.program.ru/wdm/ and I'm
using it because I don't want the pompous crap of the other display
managers.

But this one uses /dev/mem as a randomFile only by default - there's a
configuration variable DisplayManager.randomFile which can be pointed
away from /dev/mem so that's easily fixable.
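
For comparison, pulling a seed out of /dev/urandom instead of hashing /dev/mem is only a few lines. This is a generic sketch, not wdm's genauth.c; read_seed() is a made-up helper name:

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Fill buf with len bytes read from path.
 * Returns 0 on success, -1 if the file cannot be opened or read() hits
 * an error or end of file before len bytes have arrived.
 */
static int read_seed(const char *path, unsigned char *buf, size_t len)
{
	size_t got = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	while (got < len) {
		ssize_t n = read(fd, buf + got, len - got);

		if (n <= 0) {
			close(fd);
			return -1;
		}
		got += (size_t)n;
	}

	close(fd);
	return 0;
}
```

Called as read_seed("/dev/urandom", seed, sizeof(seed)), this needs no access to physical memory at all, which is the whole argument for moving randomFile away from /dev/mem.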

Mind you, I wouldn't've caught the issue if I wasn't using this ancient
thing in its default settings :o).

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 1/2] add helper for highmem checks

2013-02-11 Thread Borislav Petkov

On Mon, Feb 11, 2013 at 02:46:43PM -0800, H. Peter Anvin wrote:
> The X server itself used to do that. Are you saying that wdm is a
> *privileged process*?

Nah, it is a simple display manager you start with /etc/init.d/wdm init
script. Like the other display managers gdm, kdm, etc.

But it looks like wdm has copied stuff from xdm (from the README):

"Wdm is a modification of XFree86's xdm package for graphically handling
authentication and system login. Most of xdm has been preserved (XFree86
4.2.1.1) with the Login interface based on a WINGs implementation using
Tom Rothamel's "external greet" interface (see AUTHORS)."

And from looking at the part in the source which does the /dev/mem
accesses, it comes from XFree86's source apparently, this is at the
beginning of src/wdm/genauth.c:

/* $Xorg: genauth.c,v 1.5 2001/02/09 02:05:40 xorgcvs Exp $ */
/*

   Copyright 1988, 1998  The Open Group
...

so this explains why it behaves like the X server in that respect.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [tip:x86/cpu] x86, AMD: Enable WC+ memory type on family 10 processors

2013-02-12 Thread Borislav Petkov

Two issues I got with this one, see below.

On Thu, Jan 31, 2013 at 02:45:06PM -0800, tip-bot for Boris Ostrovsky wrote:
> Commit-ID:  f0322bd341fd63261527bf84afd3272bcc2e8dd3
> Gitweb: http://git.kernel.org/tip/f0322bd341fd63261527bf84afd3272bcc2e8dd3
> Author: Boris Ostrovsky 
> AuthorDate: Tue, 29 Jan 2013 16:32:49 -0500
> Committer:  H. Peter Anvin 
> CommitDate: Thu, 31 Jan 2013 13:35:38 -0800
> 
> x86, AMD: Enable WC+ memory type on family 10 processors
> 
> In some cases BIOS may not enable WC+ memory type on family 10
> processors, instead converting what would be WC+ memory to CD type.
> On guests using nested pages this could result in performance
> degradation. This patch enables WC+.
> 
> Signed-off-by: Boris Ostrovsky 
> Link: 
> http://lkml.kernel.org/r/1359495169-23278-1-git-send-email-o...@amd64.org
> Signed-off-by: H. Peter Anvin 
> ---
>  arch/x86/include/uapi/asm/msr-index.h |  1 +
>  arch/x86/kernel/cpu/amd.c | 21 -
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/msr-index.h 
> b/arch/x86/include/uapi/asm/msr-index.h
> index 433a59f..158cde9 100644
> --- a/arch/x86/include/uapi/asm/msr-index.h
> +++ b/arch/x86/include/uapi/asm/msr-index.h
> @@ -173,6 +173,7 @@
>  #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
>  #define MSR_AMD64_OSVW_STATUS 0xc0010141
>  #define MSR_AMD64_DC_CFG 0xc0011022
> +#define MSR_AMD64_BU_CFG2 0xc001102a
>  #define MSR_AMD64_IBSFETCHCTL 0xc0011030
>  #define MSR_AMD64_IBSFETCHLINAD  0xc0011031
>  #define MSR_AMD64_IBSFETCHPHYSAD 0xc0011032
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index dd4a5b6..721ef32 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -698,13 +698,11 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
>   if (c->x86 > 0x11)
>   set_cpu_cap(c, X86_FEATURE_ARAT);
>  
> - /*
> -  * Disable GART TLB Walk Errors on Fam10h. We do this here
> -  * because this is always needed when GART is enabled, even in a
> -  * kernel which has no MCE support built in.
> -  */
>   if (c->x86 == 0x10) {
>   /*
> +  * Disable GART TLB Walk Errors on Fam10h. We do this here
> +  * because this is always needed when GART is enabled, even in a
> +  * kernel which has no MCE support built in.
>* BIOS should disable GartTlbWlk Errors themself. If
>* it doesn't do it here as suggested by the BKDG.
>*
> @@ -718,6 +716,19 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
>   mask |= (1 << 10);
>   wrmsrl_safe(MSR_AMD64_MCx_MASK(4), mask);
>   }
> +
> + /*
> +  * On family 10h BIOS may not have properly enabled WC+ support,
> +  * causing it to be converted to CD memtype. This may result in
> +  * performance degradation for certain nested-paging guests.
> +  * Prevent this conversion by clearing bit 24 in
> +  * MSR_AMD64_BU_CFG2.
> +  */
> + if (c->x86 == 0x10) {

This family check is redundant, we're already in a 0x10 if-branch
above. Boris had sent a second version which doesn't have that check:
http://marc.info/?l=linux-kernel&m=135949774114910 but I don't know how this
other version has gotten in.

@hpa: maybe replace - patch is still at the top of tip:x86/cpu?

> + rdmsrl(MSR_AMD64_BU_CFG2, value);
> + value &= ~(1ULL << 24);
> + wrmsrl(MSR_AMD64_BU_CFG2, value);
> + }
>   }
>  
>   rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);

However, the more serious issue is that that same kernel #GPs when
booted in kvm. It seems it cannot stomach that specific MSR, see the
second "<-- trapping instruction" below and that BU_CFG2 MSR landing in
%ecx in the line before that.

Oh, and this happens only with the kvm executable (/usr/bin/kvm) in
debian testing. If I use qemu from git, it passes over init_amd just
fine.

Hmmm..

[0.018000] general protection fault:  [#1] PREEMPT SMP 
[0.018000] Modules linked in:
[0.018000] CPU 0 
[0.018000] Pid: 0, comm: swapper/0 Not tainted 3.8.0-rc6+ #3 Bochs Bochs
[0.018000] RIP: 0010:[]  [] 
init_amd+0x4d6/0x50d
[0.018000] RSP: :81813ed8  EFLAGS: 00010246
[0.018000] RAX:  RBX: 00726f73 RCX: c001102a
[0.018000] RDX: 8268b021 RSI: fffb RDI: 0005
[0.018000] RBP: 81813f28 R08:  R09: 
[0.018000] R10: 0001 R11:  R12: 8189e140
[0.018000] R13: 81af82e0 R14: 88007ffd0300 R15: 
[0.018000] FS:  () GS:88007fc00

Re: [tip:x86/cpu] x86, AMD: Enable WC+ memory type on family 10 processors

2013-02-12 Thread Borislav Petkov

On Tue, Feb 12, 2013 at 04:21:13PM -0800, H. Peter Anvin wrote:
> On 02/12/2013 04:16 PM, Borislav Petkov wrote:
> >
> >This family check is redundant, we're already in a 0x10 if-branch
> >above. Boris had sent a second version which doesn't have that check:
> >http://marc.info/?l=linux-kernel&m=135949774114910 but I don't know how this
> >other version has gotten in.
> >
> >@hpa: maybe replace - patch is still at the top of tip:x86/cpu?
> >
> 
> I'll check with Ingo if that is doable.
> 
> >>+   rdmsrl(MSR_AMD64_BU_CFG2, value);
> >>+   value &= ~(1ULL << 24);
> >>+   wrmsrl(MSR_AMD64_BU_CFG2, value);
> >>+   }
> >>}
> >>
> >>rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
> >
> >However, the more serious issue is that that same kernel #GPs when
> >booted in kvm. It seems it cannot stomach that specific MSR, see the
> >second "<-- trapping instruction" below and that BU_CFG2 MSR landing in
> >%ecx in the line before that.
> >
> >Oh, and this happens only with the kvm executable (/usr/bin/kvm) in
> >debian testing. If I use qemu from git, it passes over init_amd just
> >fine.
> >
> >Hmmm..
> >
> 
> It #GPs on an MSR, which tends to be a bug in the VMM; RDMSR/WRMSR
> generally kick out to the VMM.  There isn't a huge lot of work we
> can do about that...

Yeah, kvm.ko which runs on the host says that it ignores this MSR:

[160716.170333] kvm [29093]: vcpu0 unhandled rdmsr: 0xc001102a
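
One way the guest side could tolerate a VMM like that, sketched in userspace. The *_safe naming follows the rdmsr_safe()/wrmsrl_safe() calls visible in the patch context, but everything prefixed fake_ is invented for the illustration, and nothing in this thread settles whether the real fix should look like this:

```c
#include <errno.h>
#include <stdint.h>

#define MSR_AMD64_BU_CFG2	0xc001102a

/* Invented model of a hypervisor that may or may not implement the MSR. */
struct fake_vmm {
	int implements_bu_cfg2;	/* 0: every access to the MSR faults */
	uint64_t bu_cfg2;
};

/* Like a *_safe MSR read: report the fault as an error instead of oopsing. */
static int fake_rdmsrl_safe(struct fake_vmm *vmm, uint32_t msr, uint64_t *val)
{
	if (msr != MSR_AMD64_BU_CFG2 || !vmm->implements_bu_cfg2)
		return -EIO;
	*val = vmm->bu_cfg2;
	return 0;
}

static int fake_wrmsrl_safe(struct fake_vmm *vmm, uint32_t msr, uint64_t val)
{
	if (msr != MSR_AMD64_BU_CFG2 || !vmm->implements_bu_cfg2)
		return -EIO;
	vmm->bu_cfg2 = val;
	return 0;
}

/* Clear bit 24 only if the read succeeded; otherwise leave things alone. */
static int enable_wc_plus(struct fake_vmm *vmm)
{
	uint64_t value;
	int err = fake_rdmsrl_safe(vmm, MSR_AMD64_BU_CFG2, &value);

	if (err)
		return err;

	value &= ~(1ULL << 24);
	return fake_wrmsrl_safe(vmm, MSR_AMD64_BU_CFG2, value);
}
```

On a VMM that does not implement the MSR, enable_wc_plus() just returns an error and boot continues, which is the behaviour one would want from init_amd() in a guest.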

> I think Qemu defaults to ignoring unknown-to-it MSRs whereas maybe
> kvmtool croaks?  Pekka?

Actually that's the qemu kvm thing you get from http://www.linux-kvm.org
not the kvmtool.

Let me add the kvm ML to CC.

Guys, when I start the guest in kvm, it #GPs early when
it tries to RDMSR 0xc001102a. Here's the oops message:
http://marc.info/?l=linux-kernel&m=136071460803452

qemu-kvm is qemu-kvm (1.1.2+dfsg-5) from debian testing. Command line is:

kvm -snapshot -gdb tcp::1234 -cpu phenom -hda 
/home/boris/kvm/debian/sid-x86_64.img -name "Debian x86_64:1235" -boot 
menu=off,order=c -m 2048 -localtime -net nic -net user,hostfwd=tcp::1235-:22 
-usbdevice tablet -kernel /w/kernel/linux-2.6/arch/x86/boot/bzImage -append 
"vga=0 root=/dev/sda1 debug ignore_loglevel console=ttyS0,115200 console=tty0" 
-serial file:/home/boris/kvm/test-x86_64.log

Any ideas?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH] cpufreq: Correct header guards typo

2013-04-02 Thread Borislav Petkov

From: Borislav Petkov 

It should be "governor".

Signed-off-by: Borislav Petkov 
---
 drivers/cpufreq/cpufreq_governor.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.h 
b/drivers/cpufreq/cpufreq_governor.h
index 513cc8234e5e..65937697cab3 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -14,8 +14,8 @@
  * published by the Free Software Foundation.
  */
 
-#ifndef _CPUFREQ_GOVERNER_H
-#define _CPUFREQ_GOVERNER_H
+#ifndef _CPUFREQ_GOVERNOR_H
+#define _CPUFREQ_GOVERNOR_H
 
 #include 
 #include 
@@ -263,4 +263,4 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
struct common_dbs_data *cdata, unsigned int event);
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
unsigned int delay, bool all_cpus);
-#endif /* _CPUFREQ_GOVERNER_H */
+#endif /* _CPUFREQ_GOVERNOR_H */
-- 
1.8.2.135.g7b592fa


Re: [PATCH V2 1/2] cpufreq: ondemand: allow custom powersave_bias_target function to be registered

2013-04-02 Thread Borislav Petkov

On Thu, Mar 28, 2013 at 01:24:16PM -0500, Jacob Shin wrote:
> This allows for another [arch specific] driver to hook into existing
> powersave bias function of the ondemand governor. i.e. This allows AMD
> specific powersave bias function (in a separate AMD specific driver)
> to aid ondemand governor's frequency transition decisions.
> 
> Signed-off-by: Jacob Shin 
> ---
>  drivers/cpufreq/cpufreq_governor.h |3 +++
>  drivers/cpufreq/cpufreq_ondemand.c |   22 +++---
>  2 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.h 
> b/drivers/cpufreq/cpufreq_governor.h
> index c83cabf..4b6808f 100644
> --- a/drivers/cpufreq/cpufreq_governor.h
> +++ b/drivers/cpufreq/cpufreq_governor.h
> @@ -262,4 +262,7 @@ bool need_load_eval(struct cpu_dbs_common_info *cdbs,
>   unsigned int sampling_rate);
>  int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>   struct common_dbs_data *cdata, unsigned int event);
> +void od_register_powersave_bias_function(unsigned int (*f)
> + (struct cpufreq_policy *, unsigned int, unsigned int));
> +void od_unregister_powersave_bias_function(void);

We generally call those a "callback" or a "handler". I.e.,
od_register_powersave_bias_handler or something.

>  #endif /* _CPUFREQ_GOVERNER_H */
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c 
> b/drivers/cpufreq/cpufreq_ondemand.c
> index 15e80ee..36f0798 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -40,6 +40,8 @@
>  
>  static DEFINE_PER_CPU(struct od_cpu_dbs_info_s, od_cpu_dbs_info);
>  
> +static struct od_ops od_ops;
> +
>  #ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
>  static struct cpufreq_governor cpufreq_gov_ondemand;
>  #endif
> @@ -145,7 +147,8 @@ static void dbs_freq_increase(struct cpufreq_policy *p, 
> unsigned int freq)
>   struct od_dbs_tuners *od_tuners = dbs_data->tuners;
>  
>   if (od_tuners->powersave_bias)
> - freq = powersave_bias_target(p, freq, CPUFREQ_RELATION_H);
> + freq = od_ops.powersave_bias_target(p, freq,
> + CPUFREQ_RELATION_H);
>   else if (p->cur == p->max)
>   return;
>  
> @@ -206,8 +209,8 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
>   __cpufreq_driver_target(policy, freq_next,
>   CPUFREQ_RELATION_L);
>   } else {
> - int freq = powersave_bias_target(policy, freq_next,
> - CPUFREQ_RELATION_L);
> + int freq = od_ops.powersave_bias_target(policy,
> + freq_next, CPUFREQ_RELATION_L);
>   __cpufreq_driver_target(policy, freq,
>   CPUFREQ_RELATION_L);
>   }
> @@ -565,6 +568,19 @@ static struct common_dbs_data od_dbs_cdata = {
>   .exit = od_exit,
>  };
>  
> +void od_register_powersave_bias_function(unsigned int (*f)
> + (struct cpufreq_policy *, unsigned int, unsigned int))
> +{
> + od_ops.powersave_bias_target = f;
> +}
> +EXPORT_SYMBOL_GPL(od_register_powersave_bias_function);
> +
> +void od_unregister_powersave_bias_function(void)
> +{
> + od_ops.powersave_bias_target = powersave_bias_target;

This is very confusing: we have ->powersave_bias_target and the default
powersave_bias_target in the ondemand governor. Can we call the default
one generic_powersave_bias_target or default_* or whatever.
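
To show both naming suggestions together, a standalone sketch. The callback signature is cut down to a single argument and the arithmetic in the two targets is a placeholder, both purely my assumptions; amd_powersave_bias_target() is a made-up example of an arch driver's handler:

```c
#include <stddef.h>

/* Simplified callback type; the real one also takes a policy and a relation. */
typedef unsigned int (*powersave_bias_fn)(unsigned int freq_next);

/* The governor's built-in default, renamed so it cannot be confused with
 * the ->powersave_bias_target pointer. Placeholder math: 10% lower. */
static unsigned int generic_powersave_bias_target(unsigned int freq_next)
{
	return freq_next - freq_next / 10;
}

static struct od_ops {
	powersave_bias_fn powersave_bias_target;
} od_ops = {
	.powersave_bias_target = generic_powersave_bias_target,
};

/* "handler" instead of "function" in the registration API. */
static void od_register_powersave_bias_handler(powersave_bias_fn f)
{
	od_ops.powersave_bias_target = f ? f : generic_powersave_bias_target;
}

/* Unregistering falls back to the generic default. */
static void od_unregister_powersave_bias_handler(void)
{
	od_ops.powersave_bias_target = generic_powersave_bias_target;
}

/* Hypothetical arch-specific handler. Placeholder math: halve the target. */
static unsigned int amd_powersave_bias_target(unsigned int freq_next)
{
	return freq_next / 2;
}
```

With the default renamed, the unregister path reads "pointer = generic default" instead of "powersave_bias_target = powersave_bias_target".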

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH V2 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

2013-04-02 Thread Borislav Petkov

On Tue, Apr 02, 2013 at 01:40:13PM +0200, Thomas Renninger wrote:
> On Thursday, March 28, 2013 01:24:17 PM Jacob Shin wrote:
> > Future AMD processors, starting with Family 16h, can provide software
> > with feedback on how the workload may respond to frequency change --
> > memory-bound workloads will not benefit from higher frequency, whereas
> > compute-bound workloads will. This patch enables this "frequency
> > sensitivity feedback" to aid the ondemand governor to make better
> > frequency change decisions by hooking into the powersave bias.
> If I read this correctly, nothing changes even if the driver is loaded,
> unless user modifies:
> /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
> is this correct?
> 
> I wonder who should modify:
> /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias
> Even cpupower is not aware of this very specific tunable.
> 
> Also, are you sure cpufreq subsystem will be the only user
> of this one?
> Or could cpuidle or others also make use of this somewhen in the future?

Yeah, I don't think this is supposed to work like that - more likely,
you want to use the freq sensitivity thing by default if the hardware
supports it.

So I think the od_tuners->powersave_bias check needs to be augmented
with a freq_sensitivity cpuid bit check...
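
One possible reading of that, as a userspace sketch: take the bias path if either the tunable is set or the hardware advertises the feedback interface. I am assuming the bit lives in EDX bit 11 of CPUID leaf 0x80000007 (the AMD power management leaf) and that the two conditions are simply ORed; the function names are made up, and __get_cpuid() comes from the GCC/clang <cpuid.h> header:

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Assumed position of the frequency sensitivity feedback bit. */
#define PROC_FEEDBACK_BIT	(1u << 11)

/* Returns 1 if the CPU reports the feedback interface, 0 otherwise. */
static int cpu_has_freq_sensitivity(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when the leaf is not supported. */
	if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
		return 0;
	return !!(edx & PROC_FEEDBACK_BIT);
#else
	return 0;	/* not x86 */
#endif
}

/* Augmented check: user tunable OR hardware capability. */
static int use_powersave_bias_path(unsigned int powersave_bias,
				   int has_freq_sensitivity)
{
	return powersave_bias != 0 || has_freq_sensitivity;
}
```

In the governor this would replace the bare od_tuners->powersave_bias test, so capable hardware gets the feature without anyone having to poke the sysfs file first.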

> Then this could more be done like:
> drivers/cpufreq/mperf.c
> And scheduler, cpuidle, cpufreq or whatever could use this as well.
> 
> Just some thinking:
> I wonder how one could check/verify that the right thing is done
> (by CPU and kernel). Ideally it would be nice to have the CPU register
> appended to a cpufreq or cpuidle event trace.
> But this very (AMD or X86 only?) specific data would not look nice there.
> An arch placeholder value would be needed or similar?

I actually wonder whether this should be a separate module but I
guess this is maybe the most agreeable way for adding vendor-specific
functionality to cpufreq.

> ...
> > +}
> > +
> > +static int __init amd_freq_sensitivity_init(void)
> > +{
> > +   int i;
> > +   u32 eax, edx, dummy;
> > +
> > +   if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > +   return -ENODEV;
> > +
> > +   cpuid(0x80000007, &eax, &dummy, &dummy, &edx);
> If this really should be a separate module:
> Does/will Intel have the same (feature/cpuid bit)?
> Anyway, this should get a general AMD or X86 CPU capability flag.
> 
> Then you can also autoload this driver similar to how it's done in acpi-
> cpufreq:
> static const struct x86_cpu_id acpi_cpufreq_ids[] = {
> X86_FEATURE_MATCH(X86_FEATURE_ACPI),
> X86_FEATURE_MATCH(X86_FEATURE_HW_PSTATE),
> {}
> };
> MODULE_DEVICE_TABLE(x86cpu, acpi_cpufreq_ids);

Yes, this needs to be a cpu feature bit in cpufeature.h and be loaded
automatically.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
