Re: [Xen-devel] [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-17 Thread Thomas Gleixner
On Fri, 17 Nov 2017, Quan Xu wrote:
> On 2017-11-16 17:53, Thomas Gleixner wrote:
> > That's just plain wrong. We don't want to see any of this PARAVIRT crap in
> > anything outside the architecture/hypervisor interfacing code which really
> > needs it.
> > 
> > The problem can and must be solved at the generic level in the first place
> > to gather the data which can be used to make such decisions.
> > 
> > How that information is used might be either completely generic or requires
> > system specific variants. But as long as we don't have any information at
> > all we cannot discuss that.
> > 
> > Please sit down and write up which data needs to be considered to make
> > decisions about probabilistic polling. Then we need to compare and contrast
> > that with the data which is necessary to make power/idle state decisions.
> > 
> > I would be very surprised if this data would not overlap by at least 90%.
> > 
> 1. which data needs to be considered to make decisions about probabilistic polling
> 
> I really do need to write up which data needs to be considered to make
> decisions about probabilistic polling. For the last several months,
> I have focused on the data _from idle to reschedule_, in order to bypass
> the idle loops. Unfortunately, this inevitably makes me touch the
> scheduler/idle/nohz code.
> 
> With tglx's suggestion, the data which is necessary to make power/idle
> state decisions is the last idle state's residency time. IIUC this data
> is the duration from idle to wakeup, which may be triggered by a reschedule
> IRQ or another IRQ.

That's part of the picture, but not complete.

> I have also tested that the reschedule IRQs overlap by more than 90% (tracing
> the need_resched status after cpuidle_idle_call) when I run ctxsw/netperf for
> one minute.
> 
> Given that overlap, I think I can feed the last idle state's residency time
> into the decisions about probabilistic polling, as @dev->last_residency does.
> That data is much easier to obtain.

That's only true for your particular use case.

> 
> 2. do a HV specific idle driver (function)
> 
> So far, power management is not exposed to the guest. Idle is simple for a
> KVM guest: calling "sti" / "hlt" (cpuidle_idle_call() --> default_idle_call()).
> Thanks to the Xen guys, who have implemented the paravirt framework, I can
> implement it as easily as the following:
> 
>  --- a/arch/x86/kernel/kvm.c

Your email client is using very strange formatting.

>  +++ b/arch/x86/kernel/kvm.c
>  @@ -465,6 +465,12 @@ static void __init kvm_apf_trap_init(void)
>  update_intr_gate(X86_TRAP_PF, async_page_fault);
>   }
> 
>  +static __cpuidle void kvm_safe_halt(void)
>  +{
>  +    /* 1. POLL, if need_resched() --> return */
>  +
>  +    asm volatile("sti; hlt": : :"memory"); /* 2. halt */
>  +
>  +    /* 3. get the last idle state's residency time */
>  +
>  +    /* 4. update poll duration based on last idle state's residency time */
>  +}
>  +
>   void __init kvm_guest_init(void)
>   {
>  int i;
>  @@ -490,6 +496,8 @@ void __init kvm_guest_init(void)
>  if (kvmclock_vsyscall)
>  kvm_setup_vsyscall_timeinfo();
> 
>  +   pv_irq_ops.safe_halt = kvm_safe_halt;
>  +
>   #ifdef CONFIG_SMP
> 
> 
> Then I have no need to introduce a new pvops, and never need to modify the
> scheduler/idle/nohz code again.
> I can also keep all of the code contained in arch/x86/kernel/kvm.c.
> 
> If this is the right direction, I will send a new patch set next week.

This is definitely better than what you proposed so far, and implementing it
as a proof of concept seems to be worthwhile.

But I doubt that this is the final solution. It's not generic and not
necessarily suitable for all use case scenarios.

Thanks,

tglx
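For illustration, one way the numbered steps in the kvm_safe_halt() sketch above could be fleshed out, in the context of arch/x86/kernel/kvm.c. This is only a sketch: the per-CPU poll_ns variable, the 500us cap, the grow/shrink policy and the kvm_last_residency_ns() helper are assumptions made here for the example, not code from the posted patch.

static DEFINE_PER_CPU(u64, poll_ns) = 10000;	/* start with 10us (assumed) */

static __cpuidle void kvm_safe_halt(void)
{
	u64 limit = __this_cpu_read(poll_ns);
	u64 start = ktime_get_ns();
	u64 residency;

	/* 1. Poll: if a reschedule is already pending, skip the halt */
	while (ktime_get_ns() - start < limit) {
		if (need_resched()) {
			local_irq_enable();
			return;
		}
		cpu_relax();
	}

	/* 2. Halt with interrupts enabled, like native_safe_halt() */
	asm volatile("sti; hlt" : : : "memory");

	/* 3. Residency of the idle period that just ended (assumed helper) */
	residency = kvm_last_residency_ns();

	/*
	 * 4. Short sleeps suggest growing the poll window, long sleeps
	 *    suggest shrinking it again.
	 */
	if (residency < 2 * limit)
		__this_cpu_write(poll_ns, min_t(u64, limit * 2, 500000));
	else
		__this_cpu_write(poll_ns, limit / 2);
}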


Re: [Xen-devel] [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-16 Thread Thomas Gleixner
On Thu, 16 Nov 2017, Quan Xu wrote:
> On 2017-11-16 06:03, Thomas Gleixner wrote:
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev,
> struct cpuidle_driver *drv,
>     target_state = &drv->states[index];
>     }
> 
> +#ifdef CONFIG_PARAVIRT
> +   paravirt_idle_poll();
> +
> +   if (need_resched())
> +   return -EBUSY;
> +#endif

That's just plain wrong. We don't want to see any of this PARAVIRT crap in
anything outside the architecture/hypervisor interfacing code which really
needs it.

The problem can and must be solved at the generic level in the first place
to gather the data which can be used to make such decisions.

How that information is used might be either completely generic or requires
system specific variants. But as long as we don't have any information at
all we cannot discuss that.

Please sit down and write up which data needs to be considered to make
decisions about probabilistic polling. Then we need to compare and contrast
that with the data which is necessary to make power/idle state decisions.

I would be very surprised if this data would not overlap by at least 90%.

Thanks,

tglx


Re: [Xen-devel] [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-16 Thread Thomas Gleixner
On Thu, 16 Nov 2017, Quan Xu wrote:
> On 2017-11-16 16:45, Peter Zijlstra wrote:
> 
> I have really considered this factor and tried my best not to interfere with
> the scheduler/idle code.
>
> If the irq_timings code is ready, I can use it directly. I think irq_timings
> is not an easy task; I'd like to help as much as I can.

It's not a question whether irq_timings code is ready or not.

The infrastructure is there. I said that before and I'm happy to repeat:

 All parties who need this kind of prediction should:

 1) Talk to each other

 2) Work together to make it usable for _ALL_ use cases

 AFAICT, that never happened. Otherwise there would be either progress on
 that or at least a reasonable explanation WHY it cannot be done.

Peter and myself made it entirely clear several times in the past that this
must be solved at the generic level without any magic hackery in random
places. But the only thing we've seen so far is the magic hackery coming
around in a different flavour some time after we rejected the last one.

We can play that game forever. The outcome is extremely predictable.

Thanks,

tglx





Re: [Xen-devel] [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-16 Thread Thomas Gleixner
On Thu, 16 Nov 2017, Peter Zijlstra wrote:

> On Wed, Nov 15, 2017 at 11:03:08PM +0100, Thomas Gleixner wrote:
> > If I understand the problem correctly then he wants to avoid the heavy
> > lifting in tick_nohz_idle_enter() in the first place, but there is already
> > an interesting quirk there which makes it exit early. 
> 
> Sure. And there are people who want to do the same for native.
> 
> Adding more ugly and special cases just isn't the way to go about doing
> that.
> 
> I'm fairly sure I've told the various groups that want to tinker with
> this to work together on this. I've also in fairly significant detail
> sketched how to rework the idle code and idle predictors.
> 
> At this point I'm too tired to dig any of that up, so I'll just keep
> saying no to patches that don't even attempt to go in the right
> direction.

That's why I said: But lets not proliferate that. I'd rather see that go
away.

And yes, the VM folks should talk to those who are trying to solve similar
problems for native (embedded/mobile).

Thanks,

tglx



Re: [Xen-devel] [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-15 Thread Thomas Gleixner
On Wed, 15 Nov 2017, Peter Zijlstra wrote:

> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
> > From: Yang Zhang 
> > 
> > Implement a generic idle poll which resembles the functionality
> > found in arch/. Provide a weak arch_cpu_idle_poll() function which
> > can be overridden by the architecture code if needed.
> 
> No, we want less of those magic hooks, not more.
> 
> > Interrupts may arrive which do not cause a reschedule in idle loops.
> > In a KVM guest, this costs several VM-exit/VM-entry cycles: a VM-entry
> > for the interrupt and a VM-exit immediately afterwards. This makes it
> > more expensive than on bare metal. Add a generic idle poll before
> > entering the real idle path. When a reschedule event is pending, we can
> > bypass the real idle path.
> 
> Why not do a HV specific idle driver?

If I understand the problem correctly then he wants to avoid the heavy
lifting in tick_nohz_idle_enter() in the first place, but there is already
an interesting quirk there which makes it exit early.  See commit
3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
looks similar. But lets not proliferate that. I'd rather see that go away.

But the irq_timings stuff is heading in the same direction, with a more
complex prediction logic which should tell you pretty well how long that
idle period is going to be. In case of an interrupt heavy workload this
would skip the extra work of stopping and restarting the tick and provide a
very good input into a polling decision.

This can be handled either in a HV specific idle driver or even in the
generic core code. If the interrupt does not arrive within the predicted
time, then you can assume that the flood stopped and invoke halt or
whatever.

That avoids all of that 'tunable and tweakable' x86 specific hackery and
utilizes common functionality which is mostly there already.

Thanks,

tglx






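For illustration, a minimal sketch of the decision described above, built on the existing irq timings infrastructure (CONFIG_IRQ_TIMINGS). The predicted_idle() name, the poll_threshold_ns parameter and the surrounding policy are assumptions made for this sketch, not an existing driver; irq_timings_next_event() is the in-kernel prediction interface.

#include <linux/interrupt.h>	/* irq_timings_next_event() */
#include <linux/sched.h>	/* need_resched() */
#include <linux/sched/clock.h>	/* local_clock() */

static void predicted_idle(u64 poll_threshold_ns)
{
	u64 now = local_clock();
	u64 next = irq_timings_next_event(now);

	if (next != U64_MAX && next - now < poll_threshold_ns) {
		/*
		 * An interrupt is predicted shortly: poll instead of
		 * stopping the tick and halting.
		 */
		while (!need_resched() && local_clock() < next)
			cpu_relax();
		return;
	}

	/* Nothing predicted soon: take the real idle path */
	default_idle();
}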


Re: [Xen-devel] [PATCH v7 2/5] x86/pvclock: add setter for pvclock_pvti_cpu0_va

2017-11-08 Thread Thomas Gleixner
On Wed, 8 Nov 2017, Joao Martins wrote:
> On 11/08/2017 11:06 AM, Thomas Gleixner wrote:
> > The only nit-pick I have are the convoluted function names:
> > 
> > pvclock_set_pvti_cpu0_va() pvclock_pvti_cpu0_va()
> > 
> > What on earth does that mean?
> >
> Those two functions respectively set and get, in pvclock common code, the
> address of a page for vCPU 0 containing time info (pvti, which is periodically
> updated by the hypervisor). This region is guest memory, registered with the
> hypervisor by the guest PV clocksource and set in pvclock if certain conditions
> are met (i.e. PVCLOCK_TSC_STABLE_BIT is supported by the hypervisor); the
> getter is afterwards used by the vdso and ptp_kvm.
> 
> FWIW I merely followed the current style/code of the existing function, but
> there could be a better name, like "pvclock_set_data() / pvclock_get_data()".
> Albeit the current names are more explicit about what we should expect to set
> or return from the functions.

Fair enough.

> 
> > Aside from that, can you please make it at least symmetric, i.e. _set_ and
> > _get_ ?
> > 
> OK - provided this changes an exported symbol (pvclock_pvti_cpu0_va, in use
> by ptp_kvm) and is a non-functional change, would you want me to address it in
> a separate patch, or is it OK to have it in this one?

Just fixup the ptp_kvm call site in the very same patch.

Thanks,

tglx

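For reference, the symmetric pair being asked for amounts to little more than this; the names follow the _set_/_get_ pattern requested above and are illustrative, not a claim about the final patch:

static struct pvclock_vsyscall_time_info *pvti_cpu0_va __read_mostly;

void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
{
	pvti_cpu0_va = pvti;
}

struct pvclock_vsyscall_time_info *pvclock_get_pvti_cpu0_va(void)
{
	return pvti_cpu0_va;
}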


Re: [Xen-devel] [PATCH v7 2/5] x86/pvclock: add setter for pvclock_pvti_cpu0_va

2017-11-08 Thread Thomas Gleixner
On Tue, 7 Nov 2017, Joao Martins wrote:
> On 11/06/2017 04:09 PM, Paolo Bonzini wrote:
> > On 19/10/2017 15:39, Joao Martins wrote:
> >> Right now there is only a pvclock_pvti_cpu0_va() which is defined
> >> on kvmclock since:
> >>
> >> commit dac16fba6fc5
> >> ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")
> >>
> >> The only user of this interface so far is kvm. This commit adds a
> >> setter function for the pvti page and moves pvclock_pvti_cpu0_va
> >> to pvclock, which is a more generic place to have it; and would
> >> allow other PV clocksources to use it, such as Xen.
> >>
> >> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com>
> >> Acked-by: Andy Lutomirski <l...@kernel.org>
> > 
> > Acked-by: Paolo Bonzini <pbonz...@redhat.com>
> > 
> > IOW, the Xen folks are free to pick up the whole series. :)
> > 
> Thank you!
> 
> I guess only x86 maintainers Ack is left - any comments?

The only nit-pick I have are the convoluted function names:

pvclock_set_pvti_cpu0_va() pvclock_pvti_cpu0_va()

What on earth does that mean?

Aside from that, can you please make it at least symmetric, i.e. _set_ and
_get_ ?

Other than that:

  Acked-by: Thomas Gleixner <t...@linutronix.de>



Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-08-31 Thread Thomas Gleixner
On Thu, 31 Aug 2017, Boris Ostrovsky wrote:

> On 08/31/2017 08:00 AM, Thomas Gleixner wrote:
> > On Thu, 31 Aug 2017, Juergen Gross wrote:
> >>> I've applied it on top of tip:x86/apic and fixed up the merge conflicts
> >>> mindlessly. Patch below.
> >>>
> >>> Juergen, can you please check the result?
> >> You missed the updates to arch/x86/xen/xen-asm_64.S and the declarations
> >> of the xen specific trap entries in arch/x86/include/asm/traps.h
> > I'll try that again later today, unless you beat me to it.
> >
> 
> https://marc.info/?l=linux-kernel&m=150296063131595&w=2 should also be
> picked up by the tip tree then since it applies on top of the
> adjust_exception_frame patch. I will revert it from Xen tree as well.

Juergen folded that fix in and posted a version against tip:x86/apic. It's
applied and pushed out now.

Thanks to everyone helping with this.

   tglx



Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-08-31 Thread Thomas Gleixner
On Thu, 31 Aug 2017, Juergen Gross wrote:
> > I've applied it on top of tip:x86/apic and fixed up the merge conflicts
> > mindlessly. Patch below.
> > 
> > Juergen, can you please check the result?
> 
> You missed the updates to arch/x86/xen/xen-asm_64.S and the declarations
> of the xen specific trap entries in arch/x86/include/asm/traps.h

I'll try that again later today, unless you beat me to it.

Thanks,

tglx



Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-08-31 Thread Thomas Gleixner
On Thu, 31 Aug 2017, Thomas Gleixner wrote:
> Hrm. For some reason I failed to remove these defines after getting rid of
> the tracing idt.
> 
> I'll remove that now in tip and pull in the XEN stuff to see what needs to
> be done.

I pushed out the removal of the trace leftovers. Talked to Juergen on IRC
and he suggested to revert the XEN patch in the xen tree and merge it
through tip.

I've applied it on top of tip:x86/apic and fixed up the merge conflicts
mindlessly. Patch below.

Juergen, can you please check the result?

Thanks,

tglx

8<-
Subject: xen: Get rid of paravirt op adjust_exception_frame
From: Juergen Gross <jgr...@suse.com>
Date: Fri, 11 Aug 2017 16:54:48 +0200

When running as Xen pv-guest the exception frame on the stack contains
%r11 and %rcx additional to the other data pushed by the processor.

Instead of having a paravirt op being called for each exception type
prepend the Xen specific code to each exception entry. When running as
Xen pv-guest just use the exception entry with prepended instructions,
otherwise use the entry without the Xen specific code.

Signed-off-by: Juergen Gross <jgr...@suse.com>
Cc: xen-de...@lists.xenproject.org
Cc: boris.ostrov...@oracle.com
Cc: l...@amacapital.net
Link: http://lkml.kernel.org/r/20170811145448.5679-1-jgr...@suse.com

---
 arch/x86/entry/entry_64.S             |   11 +--
 arch/x86/entry/entry_64_compat.S      |    1 
 arch/x86/include/asm/paravirt.h       |    5 -
 arch/x86/include/asm/paravirt_types.h |    4 -
 arch/x86/include/asm/proto.h          |    3 +
 arch/x86/include/asm/traps.h          |    3 -
 arch/x86/kernel/asm-offsets_64.c      |    1 
 arch/x86/kernel/paravirt.c            |    3 -
 arch/x86/xen/enlighten_pv.c           |   96 ++
 arch/x86/xen/irq.c                    |    3 -
 arch/x86/xen/xen-ops.h                |    1 
 11 files changed, 70 insertions(+), 61 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -816,7 +816,6 @@ ENTRY(\sym)
.endif
 
ASM_CLAC
-   PARAVIRT_ADJUST_EXCEPTION_FRAME
 
.ifeq \has_error_code
	pushq   $-1 /* ORIG_RAX: no syscall to restart */
@@ -962,7 +961,7 @@ ENTRY(do_softirq_own_stack)
 ENDPROC(do_softirq_own_stack)
 
 #ifdef CONFIG_XEN
-idtentry xen_hypervisor_callback xen_do_hypervisor_callback has_error_code=0
+idtentry hypervisor_callback xen_do_hypervisor_callback has_error_code=0
 
 /*
  * A note on the "critical region" in our callback handler.
@@ -1029,8 +1028,6 @@ ENTRY(xen_failsafe_callback)
movq8(%rsp), %r11
addq$0x30, %rsp
pushq   $0  /* RIP */
-   pushq   %r11
-   pushq   %rcx
UNWIND_HINT_IRET_REGS offset=8
jmp general_protection
 1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
@@ -1061,9 +1058,8 @@ idtentry int3 do_int3 has_error_code
 idtentry stack_segment         do_stack_segment        has_error_code=1
 
 #ifdef CONFIG_XEN
-idtentry xen_debug             do_debug                has_error_code=0
-idtentry xen_int3              do_int3                 has_error_code=0
-idtentry xen_stack_segment     do_stack_segment        has_error_code=1
+idtentry xendebug              do_debug                has_error_code=0
+idtentry xenint3               do_int3                 has_error_code=0
 #endif
 
 idtentry general_protection    do_general_protection   has_error_code=1
@@ -1227,6 +1223,7 @@ ENTRY(error_exit)
 END(error_exit)
 
 /* Runs on exception stack */
+/* XXX: broken on Xen PV */
 ENTRY(nmi)
UNWIND_HINT_IRET_REGS
/*
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -293,7 +293,6 @@ ENTRY(entry_INT80_compat)
/*
 * Interrupts are off on entry.
 */
-   PARAVIRT_ADJUST_EXCEPTION_FRAME
ASM_CLAC/* Do this early to minimize exposure */
SWAPGS
 
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -960,11 +960,6 @@ extern void default_banner(void);
 #define GET_CR2_INTO_RAX   \
call PARA_INDIRECT(pv_mmu_ops+PV_MMU_read_cr2)
 
-#define PARAVIRT_ADJUST_EXCEPTION_FRAME                                 \
-       PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_adjust_exception_frame), \
-                 CLBR_NONE,                                             \
-                 call PARA_INDIRECT(pv_irq_ops+PV_IRQ_adjust_exception_frame))
-
 #define USERGS_SYSRET64                                                 \
        PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64),        \
                  CLBR_NONE,                                             \
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -195,10 +195,6 @@ struct pv

Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-08-31 Thread Thomas Gleixner
On Thu, 31 Aug 2017, Stephen Rothwell wrote:
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc arch/x86/include/asm/traps.h
> index b4f322d6c95f,935709829a4e..
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@@ -38,7 -35,34 +35,33 @@@ asmlinkage void machine_check(void)
>   #endif /* CONFIG_X86_MCE */
>   asmlinkage void simd_coprocessor_error(void);
>   
> + #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
> + asmlinkage void xen_divide_error(void);
> + asmlinkage void xen_xendebug(void);
> + asmlinkage void xen_xenint3(void);
> + asmlinkage void xen_nmi(void);
> + asmlinkage void xen_overflow(void);
> + asmlinkage void xen_bounds(void);
> + asmlinkage void xen_invalid_op(void);
> + asmlinkage void xen_device_not_available(void);
> + asmlinkage void xen_double_fault(void);
> + asmlinkage void xen_coprocessor_segment_overrun(void);
> + asmlinkage void xen_invalid_TSS(void);
> + asmlinkage void xen_segment_not_present(void);
> + asmlinkage void xen_stack_segment(void);
> + asmlinkage void xen_general_protection(void);
> + asmlinkage void xen_page_fault(void);
> + asmlinkage void xen_async_page_fault(void);
> + asmlinkage void xen_spurious_interrupt_bug(void);
> + asmlinkage void xen_coprocessor_error(void);
> + asmlinkage void xen_alignment_check(void);
> + #ifdef CONFIG_X86_MCE
> + asmlinkage void xen_machine_check(void);
> + #endif /* CONFIG_X86_MCE */
> + asmlinkage void xen_simd_coprocessor_error(void);
> + #endif
> + 
>   #ifdef CONFIG_TRACING
>  -asmlinkage void trace_page_fault(void);
>   #define trace_stack_segment stack_segment
>   #define trace_divide_error divide_error
>   #define trace_bounds bounds
> @@@ -53,7 -77,10 +76,11 @@@
>   #define trace_alignment_check alignment_check
>   #define trace_simd_coprocessor_error simd_coprocessor_error
>   #define trace_async_page_fault async_page_fault
>  +#define trace_page_fault page_fault

Hrm. For some reason I failed to remove these defines after getting rid of
the tracing idt.

I'll remove that now in tip and pull in the XEN stuff to see what needs to
be done.

Thanks,

tglx




Re: [Xen-devel] [PATCH v10 00/38] x86: Secure Memory Encryption (AMD)

2017-07-18 Thread Thomas Gleixner
On Mon, 17 Jul 2017, Tom Lendacky wrote:
> This patch series provides support for AMD's new Secure Memory Encryption 
> (SME)
> feature.
> 
> SME can be used to mark individual pages of memory as encrypted through the
> page tables. A page of memory that is marked encrypted will be automatically
> decrypted when read from DRAM and will be automatically encrypted when
> written to DRAM. Details on SME can found in the links below.
> 
> The SME feature is identified through a CPUID function and enabled through
> the SYSCFG MSR. Once enabled, page table entries will determine how the
> memory is accessed. If a page table entry has the memory encryption mask set,
> then that memory will be accessed as encrypted memory. The memory encryption
> mask (as well as other related information) is determined from settings
> returned through the same CPUID function that identifies the presence of the
> feature.
> 
> The approach that this patch series takes is to encrypt everything possible
> starting early in the boot where the kernel is encrypted. Using the page
> table macros the encryption mask can be incorporated into all page table
> entries and page allocations. By updating the protection map, userspace
> allocations are also marked encrypted. Certain data must be accounted for
> as having been placed in memory before SME was enabled (EFI, initrd, etc.)
> and accessed accordingly.
> 
> This patch series is a pre-cursor to another AMD processor feature called
> Secure Encrypted Virtualization (SEV). The support for SEV will build upon
> the SME support and will be submitted later. Details on SEV can be found
> in the links below.

Well done series. Thanks to all people involved, especially Tom and Boris!
It was a pleasure to review that.

Reviewed-by: Thomas Gleixner <t...@linutronix.de>



Re: [Xen-devel] [PATCH 2/2] xen: dont fiddle with event channel masking in suspend/resume

2017-07-18 Thread Thomas Gleixner
On Mon, 17 Jul 2017, Juergen Gross wrote:

> Instead of fiddling with masking the event channels during suspend
> and resume handling let do the irq subsystem do its job. It will do
> the mask and unmask operations as needed.
> 
> Signed-off-by: Juergen Gross <jgr...@suse.com>

Acked-by: Thomas Gleixner <t...@linutronix.de>



Re: [Xen-devel] Problem with commit bf22ff45bed664aefb5c4e43029057a199b7070c

2017-07-12 Thread Thomas Gleixner
On Wed, 12 Jul 2017, Thomas Gleixner wrote:
> On Mon, 10 Jul 2017, Juergen Gross wrote:
> > It is based on suspend/resume framework. The main work to be done
> > additionally is to disconnect from the pv-backends at save time and
> > connect to the pv-backends again at restore time.
> > 
> > The main function triggering all that is xen_suspend() (as seen in
> > above backtrace).
> 
> The untested patch below should give you hooks to do what you need to do.
> 
> Add the irq_suspend/resume callbacks and set the IRQCHIP_GENERIC_SUSPEND
> flag on your xen irqchip, so it actually gets invoked.
> 
> I have to make that opt in right now because the callbacks are used in the
> generic irqchip implementation already. We can revisit that when you can
> confirm that this is actually solving the problem.

There might be an even simpler solution.

As this is using the regular suspend_device_irqs() call, you just might get
away with setting IRQCHIP_MASK_ON_SUSPEND for your irq chip. That does not
use the lazy disable approach, it also masks all interrupts which are not
marked as wakeup irqs. I assume none of them is when you do that
save/restore dance.

That said, you still might make the whole mechanism cleaner by using the
irq chip callbacks so you can avoid traversing all the interrupts another
time. But I can't say for sure as I got lost in that xen event channel code.

Thanks,

tglx


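For illustration, opting into IRQCHIP_MASK_ON_SUSPEND is a one-line change on the event channel chip; the sketch below assumes the xen_dynamic_chip structure in drivers/xen/events/events_base.c and elides its existing callbacks. With the flag set, suspend_device_irqs() masks every interrupt on this chip that is not marked as a wakeup source, so the manual masking in the Xen suspend path could go away.

static struct irq_chip xen_dynamic_chip __read_mostly = {
	.name		= "xen-dyn",
	/* ... existing .irq_mask/.irq_unmask/.irq_ack callbacks ... */
	.flags		= IRQCHIP_MASK_ON_SUSPEND,
};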


Re: [Xen-devel] Problem with commit bf22ff45bed664aefb5c4e43029057a199b7070c

2017-07-12 Thread Thomas Gleixner
On Mon, 10 Jul 2017, Juergen Gross wrote:
> On 07/07/17 19:11, Thomas Gleixner wrote:
> > On Fri, 7 Jul 2017, Thomas Gleixner wrote:
> > 
> >> On Fri, 7 Jul 2017, Juergen Gross wrote:
> >>
> >>> Commit bf22ff45bed664aefb5c4e43029057a199b7070c ("genirq: Avoid
> >>> unnecessary low level irq function calls") breaks Xen guest
> >>> save/restore handling.
> >>>
> >>> The main problem are the PV devices using Xen event channels as
> >>> interrupt sources which are represented as an "irq chip" in the kernel.
> >>> When saving the guest the event channels are masked internally. At
> >>> restore time event channels are re-established and unmasked via
> >>> irq_startup().
> > 
> > And how exactly gets irq_startup() invoked on those event channels?
> 
> [   30.791879] Call Trace:
> [   30.791883]  ? irq_get_irq_data+0xe/0x20
> [   30.791886]  enable_dynirq+0x23/0x30
> [   30.791888]  unmask_irq.part.33+0x26/0x40
> [   30.791890]  irq_enable+0x65/0x70
> [   30.791891]  irq_startup+0x3c/0x110
> [   30.791893]  __enable_irq+0x37/0x60
> [   30.791895]  resume_irqs+0xbe/0xe0
> [   30.791897]  irq_pm_syscore_resume+0x13/0x20
> [   30.791900]  syscore_resume+0x50/0x1b0
> [   30.791902]  xen_suspend+0x76/0x140
> 
> > 
> >>> I have a patch repairing the issue, but I'm not sure if this way to do
> >>> it would be accepted. I have exported mask_irq() and I'm doing the
> >>> masking now through this function. Would the attached patch be
> >>> acceptable? Or is there a better way to solve the problem?
> >>
> >> Without looking at the patch (too lazy to fiddle with attachments right
> >> now), this is definitely wrong. I'll have a look later tonight.
> > 
> > Not that I'm surprised, but that patch is exactly what I expected. Export a
> > random function, which helps to paper over the real problem and run away.
> > These functions are internal for a reason and we worked hard on making
> > people understand that fiddling with the internals of interrupts is a
> > NONO. If there are special requirements for a good reason, then we create
> > proper interfaces and infrastructure, if there is no good reason, then the
> > problematic code needs to be fixed. There is no exception for XEN.
> 
> I'm absolutely on your side here. That was the reason I didn't send
> the patch right away, but asked how to solve my issue in a way which
> isn't "quick and dirty". The patch was just the easiest way to explain
> what should be the result of the proper solution.

Fair enough!

> > Can you please explain how that save/restore stuff works and which
> > functions are involved?
> 
> It is based on suspend/resume framework. The main work to be done
> additionally is to disconnect from the pv-backends at save time and
> connect to the pv-backends again at restore time.
> 
> The main function triggering all that is xen_suspend() (as seen in
> above backtrace).

The untested patch below should give you hooks to do what you need to do.

Add the irq_suspend/resume callbacks and set the IRQCHIP_GENERIC_SUSPEND
flag on your xen irqchip, so it actually gets invoked.

I have to make that opt in right now because the callbacks are used in the
generic irqchip implementation already. We can revisit that when you can
confirm that this is actually solving the problem.

Thanks,

tglx

8<--
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -476,6 +476,8 @@ struct irq_chip {
  * IRQCHIP_SKIP_SET_WAKE:  Skip chip.irq_set_wake(), for this irq chip
  * IRQCHIP_ONESHOT_SAFE:   One shot does not require mask/unmask
  * IRQCHIP_EOI_THREADED:   Chip requires eoi() on unmask in threaded mode
+ * IRQCHIP_GENERIC_SUSPEND:Use the suspend/resume callbacks in
+ * device_irq_suspend/resume
  */
 enum {
IRQCHIP_SET_TYPE_MASKED = (1 <<  0),
@@ -485,6 +487,7 @@ enum {
IRQCHIP_SKIP_SET_WAKE   = (1 <<  4),
IRQCHIP_ONESHOT_SAFE= (1 <<  5),
IRQCHIP_EOI_THREADED= (1 <<  6),
+   IRQCHIP_GENERIC_SUSPEND = (1 <<  7),
 };
 
 #include 
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -70,6 +70,8 @@ void irq_pm_remove_action(struct irq_des
 
 static bool suspend_device_irq(struct irq_desc *desc)
 {
+   struct irq_chip *chip;
+
if (!desc->action || irq_desc_is_chained(desc) ||
desc->no_suspend_depth)
return false;
@@ -94,8 +96,13 @@ static bool suspend_device_irq(struct ir
 * chip level. The chip implementation indicates that with
 * IRQCHIP_MASK_ON_
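To show how the proposed flag would be consumed, a sketch of a chip opting in. The xen_evtchn_irq_suspend/resume helpers are assumptions made for the example; only the IRQCHIP_GENERIC_SUSPEND flag and the irq_suspend/irq_resume members come from the patch above and from struct irq_chip.

static void xen_evtchn_irq_suspend(struct irq_data *data)
{
	/* Tear down the event channel binding behind this irq */
}

static void xen_evtchn_irq_resume(struct irq_data *data)
{
	/* Re-establish the event channel binding and unmask it */
}

static struct irq_chip xen_dynamic_chip __read_mostly = {
	.name		= "xen-dyn",
	/* ... existing callbacks ... */
	.irq_suspend	= xen_evtchn_irq_suspend,
	.irq_resume	= xen_evtchn_irq_resume,
	.flags		= IRQCHIP_GENERIC_SUSPEND,
};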

Re: [Xen-devel] Problem with commit bf22ff45bed664aefb5c4e43029057a199b7070c

2017-07-07 Thread Thomas Gleixner
On Fri, 7 Jul 2017, Thomas Gleixner wrote:

> On Fri, 7 Jul 2017, Juergen Gross wrote:
> 
> > Commit bf22ff45bed664aefb5c4e43029057a199b7070c ("genirq: Avoid
> > unnecessary low level irq function calls") breaks Xen guest
> > save/restore handling.
> > 
> > The main problem are the PV devices using Xen event channels as
> > interrupt sources which are represented as an "irq chip" in the kernel.
> > When saving the guest the event channels are masked internally. At
> > restore time event channels are re-established and unmasked via
> > irq_startup().

And how exactly gets irq_startup() invoked on those event channels?

> > I have a patch repairing the issue, but I'm not sure if this way to do
> > it would be accepted. I have exported mask_irq() and I'm doing the
> > masking now through this function. Would the attached patch be
> > acceptable? Or is there a better way to solve the problem?
> 
> Without looking at the patch (too lazy to fiddle with attachments right
> now), this is definitely wrong. I'll have a look later tonight.

Not that I'm surprised, but that patch is exactly what I expected. Export a
random function, which helps to paper over the real problem and run away.
These functions are internal for a reason and we worked hard on making
people understand that fiddling with the internals of interrupts is a
NONO. If there are special requirements for a good reason, then we create
proper interfaces and infrastructure, if there is no good reason, then the
problematic code needs to be fixed. There is no exception for XEN.

Can you please explain how that save/restore stuff works and which
functions are involved?

Thanks,

tglx





Re: [Xen-devel] Problem with commit bf22ff45bed664aefb5c4e43029057a199b7070c

2017-07-07 Thread Thomas Gleixner
On Fri, 7 Jul 2017, Juergen Gross wrote:

> Commit bf22ff45bed664aefb5c4e43029057a199b7070c ("genirq: Avoid
> unnecessary low level irq function calls") breaks Xen guest
> save/restore handling.
> 
> The main problem are the PV devices using Xen event channels as
> interrupt sources which are represented as an "irq chip" in the kernel.
> When saving the guest the event channels are masked internally. At
> restore time event channels are re-established and unmasked via
> irq_startup(). Unfortunately above commit will let the unmask operation
> be a nop as the irq handling doesn't know about the masking done before.

Rightfully so. Making assumptions about the inner workings of core code is
always wrong.

> I have a patch repairing the issue, but I'm not sure if this way to do
> it would be accepted. I have exported mask_irq() and I'm doing the
> masking now through this function. Would the attached patch be
> acceptable? Or is there a better way to solve the problem?

Without looking at the patch (too lazy to fiddle with attachments right
now), this is definitely wrong. I'll have a look later tonight.

Thanks,

tglx



Re: [Xen-devel] [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

2017-07-06 Thread Thomas Gleixner
On Thu, 6 Jul 2017, kernel test robot wrote:

> commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize
> interrupt mode behind timer init")

> +----------------+------------+------------+
> |                | 43436935b7 | 03fa63cc96 |
> +----------------+------------+------------+
> | boot_successes |          0 |          4 |
> +----------------+------------+------------+

So 03fa63cc96 makes the box boot again. I'm confused as usual by the
output of this tool.
 
> kern  :info  : [0.005000] tsc: Fast TSC calibration using PIT
> kern  :info  : [0.006000] tsc: Detected 2195.020 MHz processor
> kern  :info  : [0.007000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
> kern  :info  : [0.008001] pid_max: default: 90112 minimum: 704
> kern  :info  : [0.009037] ACPI: Core revision 20170303
> kern  :err   : [0.010002] ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170303/tbxface-193)

Sure, we have an error message here, but compared to what? Compared to
something which does not boot at all?

Thanks,

tglx




Re: [Xen-devel] [PATCH v5 10/12] x86/xen: Bypass intr mode setup in enlighten_pv system

2017-07-03 Thread Thomas Gleixner
On Mon, 3 Jul 2017, Dou Liyang wrote:
> At 07/03/2017 03:18 AM, Thomas Gleixner wrote:
> > On Fri, 30 Jun 2017, Dou Liyang wrote:
> > 
> > > xen_smp_ops overwrites smp_prepare_cpus to xen_pv_smp_prepare_cpus
> > > which initializes interrupt itself.
> > > 
> > > Touching the intr_mode_init causes unexpected results on the system.
> > > 
> > > Bypass it in enlighten_pv system.
> > 
> > So that's the wrong patch order then. You broke XEN at some point with your
> > changes. You need to prevent that breakage upfront not after the fact.
> 
> Yes, I have considered preventing that breakage in the patchset.
> 
> Actually, it is not until the 11th patch that we put intr_mode_init ahead
> of time, which would break XEN.
> 
> Before the 11th patch, we just unify the code and do the preparation;
> the kernel does the intr_mode_init as before, which has no
> influence on XEN.
> 
> So we put this patch here, before the 11th patch.

Ok. That's good, but please explain it in the changelog. I had the
impression that this is fixing breakage you introduced earlier, but now
with your explanation it makes sense. So the patch order is correct.

Something like this wants to be in the changelog:

  XEN PV overrides smp_prepare_cpus(). xen_pv_smp_prepare_cpus()
  initializes interrupts in the XEN PV specific way and does not invoke
  native_smp_prepare_cpus(). As a consequence, x86_init.intr_mode_init() is
  not invoked either.

  The invocation of x86_init.intr_mode_init() will be moved from
  native_smp_prepare_cpus() in a follow up patch to solve .

  That move would cause the invocation of x86_init.intr_mode_init() for XEN
  PV platforms. To prevent that, override the default x86_init.intr_mode_init()
  callback with a noop().

Thanks,

tglx



Re: [Xen-devel] [PATCH v5 02/12] x86/apic: Prepare for unifying the interrupt delivery modes setup

2017-07-03 Thread Thomas Gleixner
On Mon, 3 Jul 2017, Dou Liyang wrote:
> At 07/03/2017 01:47 AM, Thomas Gleixner wrote:
> > On Fri, 30 Jun 2017, Dou Liyang wrote:
> > > +/* Init the interrupt delivery mode for the BSP */
> > > +void __init apic_intr_mode_init(void)
> > > +{
> > > + switch (apic_intr_mode_select()) {
> > > + case APIC_PIC:
> > > + apic_printk(APIC_VERBOSE, KERN_INFO
> > > + "Keep in PIC mode(8259)\n");
> > 
> > Please do not proliferate that APIC_VERBOSE, KERN_INFO mess. Clean up the
> > apic_printk() macro first. Either change printk() to pr_info() or make the
> > printk level dependent on the APIC verbosity.
> 
> Oops, I understand. How about the following:
> 
> pr_info("APIC: keep in PIC mode(8259)\n");

As this is once per boot, it's ok to have that information unconditionally
printed.

Thanks,

tglx



Re: [Xen-devel] [PATCH v5 10/12] x86/xen: Bypass intr mode setup in enlighten_pv system

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:

> xen_smp_ops overwrites smp_prepare_cpus to xen_pv_smp_prepare_cpus
> which initializes interrupt itself.
> 
> Touching the intr_mode_init causes unexpected results on the system.
> 
> Bypass it in enlighten_pv system.

So that's the wrong patch order then. You broke XEN at some point with your
changes. You need to prevent that breakage upfront not after the fact.

Thanks,

tglx



Re: [Xen-devel] [PATCH v5 09/12] x86/init: add intr_mode_init to x86_init_ops

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:
> Add an unconditional x86_init_ops function which defaults to the
> standard function and can be overridden by the early platform code.

That changelog describes WHAT the patch does, but not WHY. That's useless
as we can see WHAT the patch does from the patch itself. The WHY is the
really important information.

Thanks,

tglx



Re: [Xen-devel] [PATCH v5 08/12] x86/ioapic: Refactor the delay logic in timer_irq_works()

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:
> +static void __init delay_with_tsc(void)
> +{
> + unsigned long long start, now;
> + unsigned long ticks = jiffies;

Please make that

   unsigned long end = jiffies + 4;

ticks really means: number of ticks. But that variable is doing something
different.

> + start = rdtsc();
> +
> + /*
> +  * We don't know the TSC frequency yet, but waiting for
> +  * 400/HZ TSC cycles is safe:
> +  * 4 GHz == 10 jiffies
> +  * 1 GHz == 40 jiffies
> +  */
> + do {
> + rep_nop();
> + now = rdtsc();
> + } while ((now - start) < 400UL / HZ &&
> + time_before_eq(jiffies, ticks + 4));
> +}
> +
> +static void __init delay_without_tsc(void)
> +{
> + int band = 1;
> + unsigned long ticks = jiffies;

Please sort variables in reverse fir tree order

unsigned long end = jiffies + 4;
int band = 1;

> +
> + /*
> +  * We don't know any frequency yet, but waiting for
> +  * 4094000/HZ cycles is safe:
> +  * 4 GHz == 10 jiffies
> +  * 1 GHz == 40 jiffies
> +  * 1 << 1 + 1 << 2 +...+ 1 << 11 = 4094
> +  */
> + do {
> + __delay(((1 << band++) * 1000UL) / HZ);

  s/1/1U/

> + } while (band < 12 && time_before_eq(jiffies, ticks + 4));
> +}

Thanks,

tglx

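Folding the review comments above back into the quoted helpers gives roughly the following. This is a sketch, not the final commit; the large constants are reconstructed here so that the "4 GHz == 10 jiffies / 1 GHz == 40 jiffies" comments actually hold.

static void __init delay_with_tsc(void)
{
	unsigned long long start, now;
	unsigned long end = jiffies + 4;

	start = rdtsc();

	/*
	 * We don't know the TSC frequency yet, but waiting for
	 * 40000000000/HZ TSC cycles is safe:
	 * 4 GHz == 10 jiffies
	 * 1 GHz == 40 jiffies
	 */
	do {
		rep_nop();
		now = rdtsc();
	} while ((now - start) < 40000000000ULL / HZ &&
		 time_before_eq(jiffies, end));
}

static void __init delay_without_tsc(void)
{
	unsigned long end = jiffies + 4;
	int band = 1;

	/*
	 * We don't know any frequency yet, but waiting for
	 * 40940000000/HZ cycles is safe:
	 * 4 GHz == 10 jiffies
	 * 1 GHz == 40 jiffies
	 * 1 << 1 + 1 << 2 +...+ 1 << 11 = 4094
	 */
	do {
		__delay(((1U << band++) * 10000000UL) / HZ);
	} while (band < 12 && time_before_eq(jiffies, end));
}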


Re: [Xen-devel] [PATCH v5 07/12] x86/apic: Unify interrupt mode setup for UP system

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:
>  static inline int apic_force_enable(unsigned long addr)
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 0601054..9bf7e95 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1198,6 +1198,10 @@ static int __init apic_intr_mode_select(int *upmode)
>   }
>  #endif
>  
> +#ifdef CONFIG_UP_LATE_INIT
> + *upmode = true;
> +#endif

This is really wrong. The upmode decision, which is required for calling
apic_bsp_setup() should not happen here, really. As I told you in the
previous patch, use the return code and then you can make further decisions
in apic_intr_mode_init().

And you do it there w/o any ifdeffery:

static void apic_intr_mode_init(void)
{
bool upmode = IS_ENABLED(CONFIG_UP_LATE_INIT);

switch () {
case :
upmode = true;

}
apic_bsp_setup(upmode);
}

Thanks,

tglx

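Filling in the elided switch above with the mode constants used elsewhere in this series gives something like the following sketch. The case list and pr_info strings follow the patches under review and are placeholders, not necessarily the final code.

void __init apic_intr_mode_init(void)
{
	bool upmode = IS_ENABLED(CONFIG_UP_LATE_INIT);

	switch (apic_intr_mode_select()) {
	case APIC_PIC:
		pr_info("APIC: Keep in PIC mode(8259)\n");
		return;
	case APIC_VIRTUAL_WIRE:
		pr_info("APIC: Switch to virtual wire mode setup\n");
		default_setup_apic_routing();
		break;
	case APIC_VIRTUAL_WIRE_NO_CONFIG:
		pr_info("APIC: Switch to virtual wire mode setup with no configuration\n");
		upmode = true;
		default_setup_apic_routing();
		break;
	case APIC_SYMMETRIC_IO:
		pr_info("APIC: Switch to symmetric I/O mode setup\n");
		default_setup_apic_routing();
		break;
	}

	apic_bsp_setup(upmode);
}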


Re: [Xen-devel] [PATCH v5 05/12] x86/apic: Unify interrupt mode setup for SMP-capable system

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:
> -static int __init apic_intr_mode_select(void)
> +static int __init apic_intr_mode_select(int *upmode)
>  {
>   /* Check kernel option */
>   if (disable_apic) {
> @@ -1206,12 +1208,30 @@ static int __init apic_intr_mode_select(void)
>   if (!smp_found_config) {
>   disable_ioapic_support();
>  
> - if (!acpi_lapic)
> + if (!acpi_lapic) {
>   pr_info("APIC: ACPI MADT or MP tables are not detected\n");
> + *upmode = true;

That store and extra argument is pointless. 

> +
> + return APIC_VIRTUAL_WIRE_NO_CONFIG;

You added an extra return code, which you can use exactly for that purpose
at the callsite.


Aside from that, if you use int * then use numbers; if you use bool then use
true/false. But mixing the two is horrible.

> + }

Thanks,

tglx



Re: [Xen-devel] [PATCH v5 04/12] x86/apic: Move logical APIC ID away from apic_bsp_setup()

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:
>  /*
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 93f0cda..d6721f0 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1347,8 +1347,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>   }
>  
>   default_setup_apic_routing();
> - cpu0_logical_apicid = apic_bsp_setup(false);
> -
> + apic_bsp_setup(false);
> + if (x2apic_mode)
> + cpu0_logical_apicid = apic_read(APIC_LDR);
> + else
> + cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));

Can you please move that into a seperate helper function?

>   /* Setup local timer */
>   x86_init.timers.setup_percpu_clockev();
>  
> 

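A sketch of the helper asked for above; the function name is an assumption:

static int __init smp_get_logical_apicid(void)
{
	if (x2apic_mode)
		return apic_read(APIC_LDR);

	return GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
}

native_smp_prepare_cpus() would then simply do:

	apic_bsp_setup(false);
	cpu0_logical_apicid = smp_get_logical_apicid();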


Re: [Xen-devel] [PATCH v5 02/12] x86/apic: Prepare for unifying the interrupt delivery modes setup

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:
> +/* Init the interrupt delivery mode for the BSP */
> +void __init apic_intr_mode_init(void)
> +{
> + switch (apic_intr_mode_select()) {
> + case APIC_PIC:
> + apic_printk(APIC_VERBOSE, KERN_INFO
> + "Keep in PIC mode(8259)\n");

Please do not proliferate that APIC_VERBOSE, KERN_INFO mess. Clean up the
apic_printk() macro first. Either change printk() to pr_info() or make the
printk level dependent on the APIC verbosity.

Thanks,

tglx



Re: [Xen-devel] [PATCH v5 01/12] x86/apic: Construct a selector for the interrupt delivery mode

2017-07-02 Thread Thomas Gleixner
On Fri, 30 Jun 2017, Dou Liyang wrote:
> +static int __init apic_intr_mode_select(void)
> +{
> + /* Check kernel option */
> + if (disable_apic) {
> + pr_info("APIC disabled via kernel command line\n");
> + return APIC_PIC;
> + }
> +
> + /* Check BIOS */
> +#ifdef CONFIG_X86_64
> + /* On 64-bit, the APIC must be integrated, Check local APIC only */
> + if (!boot_cpu_has(X86_FEATURE_APIC)) {
> + disable_apic = 1;
> + pr_info("APIC disabled by BIOS\n");
> + return APIC_PIC;
> + }
> +#else
> + /*
> +  * On 32-bit, check whether there is a separate chip or integrated
> +  * APIC
> +  */
> +
> + /* Has a local APIC ? */
> + if (!boot_cpu_has(X86_FEATURE_APIC) &&
> + APIC_INTEGRATED(boot_cpu_apic_version)) {

This looks wrong. The existing logic is:

if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config)
return -1;

if (!boot_cpu_has(X86_FEATURE_APIC) &&
APIC_INTEGRATED(boot_cpu_apic_version)) {
pr_err();

I know that this is magically the same because boot_cpu_apic_version is 0
in the !boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config case, so you
don't fall into that conditional, but it's completely non obvious and does
not really make the code more understandable. Quite the contrary.

> + disable_apic = 1;
> + pr_err(FW_BUG "Local APIC %d not detected, force emulation\n",
> +boot_cpu_physical_apicid);
> + return APIC_PIC;
> + }
> +
> + /* Has a separate chip ? */
> + if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
> + disable_apic = 1;
> +
> + return APIC_PIC;
> + }

So if you move exactly that check above the other then it's clear what's
going on.

> +#endif
> +
> + /* Check MP table or ACPI MADT configuration */
> + if (!smp_found_config) {
> + disable_ioapic_support();
> +
> + if (!acpi_lapic)
> + pr_info("APIC: ACPI MADT or MP tables are not detected\n");
> +
> + return APIC_VIRTUAL_WIRE;
> + }
> +
> + /* Other checks of APIC options will be done in each setup function */
> +

Please remove the extra new line. It's not helping readability.

> + return APIC_SYMMETRIC_IO;
> +}

Thanks,

tglx

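Concretely, swapping the two checks as suggested makes the 32-bit branch read like this (a sketch of the reordering only; the code itself is the quoted patch):

	/* No local APIC and no MP table: stay in PIC mode */
	if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
		disable_apic = 1;

		return APIC_PIC;
	}

	/* CPUID bit missing although the APIC is integrated: firmware bug */
	if (!boot_cpu_has(X86_FEATURE_APIC) &&
	    APIC_INTEGRATED(boot_cpu_apic_version)) {
		disable_apic = 1;
		pr_err(FW_BUG "Local APIC %d not detected, force emulation\n",
		       boot_cpu_physical_apicid);

		return APIC_PIC;
	}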


Re: [Xen-devel] [PATCH v7 08/36] x86/mm: Add support to enable SME in early boot processing

2017-06-21 Thread Thomas Gleixner
On Wed, 21 Jun 2017, Tom Lendacky wrote:
> On 6/21/2017 10:38 AM, Thomas Gleixner wrote:
> > /*
> >  * Sanitize CPU configuration and retrieve the modifier
> >  * for the initial pgdir entry which will be programmed
> >  * into CR3. Depends on enabled SME encryption, normally 0.
> >  */
> > call __startup_secondary_64
> > 
> >  addq$(init_top_pgt - __START_KERNEL_map), %rax
> > 
> > You can hide that stuff in C-code nicely without adding any cruft to the
> > ASM code.
> > 
> 
> Moving the call to verify_cpu into the C-code might be quite a bit of
> change.  Currently, the verify_cpu code is included code and not a
> global function.

Ah. Ok. I missed that.

> I can still do the __startup_secondary_64() function and then look to
> incorporate verify_cpu into both __startup_64() and
> __startup_secondary_64() as a post-patch to this series.

Yes, just having __startup_secondary_64() for now and there the extra bits
for that encryption stuff is fine.

> At least the secondary path will have a base C routine to which
> modifications can be made in the future if needed.  How does that sound?

Sounds like a plan.



Re: [Xen-devel] [PATCH v7 08/36] x86/mm: Add support to enable SME in early boot processing

2017-06-21 Thread Thomas Gleixner
On Wed, 21 Jun 2017, Tom Lendacky wrote:
> On 6/21/2017 2:16 AM, Thomas Gleixner wrote:
> > Why is this an unconditional function? Isn't the mask simply 0 when the MEM
> > ENCRYPT support is disabled?
> 
> I made it unconditional because of the call from head_64.S. I can't make
> use of the C level static inline function and since the mask is not a
> variable if CONFIG_AMD_MEM_ENCRYPT is not configured (#defined to 0) I
> can't reference the variable directly.
> 
> I could create a #define in head_64.S that changes this to load rax with
> the variable if CONFIG_AMD_MEM_ENCRYPT is configured, or a zero if it's
> not, or add an #ifdef at that point in the code directly. Thoughts on
> that?

See below.

> > That does not make any sense. Neither the call to sme_encrypt_kernel() nor
> > the following call to sme_get_me_mask().
> > 
> > __startup_64() is already C code, so why can't you simply call that from
> > __startup_64() in C and return the mask from there?
> 
> I was trying to keep it explicit as to what was happening, but I can
> move those calls into __startup_64().

That's much preferred. And the return value wants to be documented in both
C and ASM code.

> I'll still need the call to sme_get_me_mask() in the secondary_startup_64
> path, though (depending on your thoughts to the above response).

call verify_cpu

movq    $(init_top_pgt - __START_KERNEL_map), %rax

So if you make that:

/*
 * Sanitize CPU configuration and retrieve the modifier
 * for the initial pgdir entry which will be programmed
 * into CR3. Depends on enabled SME encryption, normally 0.
 */
call __startup_secondary_64

addq    $(init_top_pgt - __START_KERNEL_map), %rax

You can hide that stuff in C-code nicely without adding any cruft to the
ASM code.

Thanks,

tglx

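The C side of that suggestion can be as small as this; a sketch only, with __head and sme_get_me_mask() taken from the series under review:

unsigned long __head __startup_secondary_64(void)
{
	/*
	 * Return the SME encryption mask (0 when SME is not active) as the
	 * modifier for the initial pgdir entry that head_64.S adds to the
	 * value programmed into CR3.
	 */
	return sme_get_me_mask();
}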


Re: [Xen-devel] [PATCH v7 07/36] x86/mm: Don't use phys_to_virt in ioremap() if SME is active

2017-06-21 Thread Thomas Gleixner
On Fri, 16 Jun 2017, Tom Lendacky wrote:
> Currently there is a check if the address being mapped is in the ISA
> range (is_ISA_range()), and if it is then phys_to_virt() is used to
> perform the mapping.  When SME is active, however, this will result
> in the mapping having the encryption bit set when it is expected that
> an ioremap() should not have the encryption bit set. So only use the
> phys_to_virt() function if SME is not active
> 
> Reviewed-by: Borislav Petkov 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/mm/ioremap.c |7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 4c1b5fd..a382ba9 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -106,9 +107,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>   }
>  
>   /*
> -  * Don't remap the low PCI/ISA area, it's always mapped..
> +  * Don't remap the low PCI/ISA area, it's always mapped.
> +  *   But if SME is active, skip this so that the encryption bit
> +  *   doesn't get set.
>*/
> - if (is_ISA_range(phys_addr, last_addr))
> + if (is_ISA_range(phys_addr, last_addr) && !sme_active())
>   return (__force void __iomem *)phys_to_virt(phys_addr);

More thoughts about that.

Making this conditional on !sme_active() is not the best idea. I'd rather
remove that whole thing and make it unconditional so the code paths always
get exercised and any subtle wreckage is detected on a broader base and
not only on that hard to access and debug SME capable machine owned by Joe
User.

Thanks,

tglx



Re: [Xen-devel] [PATCH v7 10/36] x86/mm: Provide general kernel support for memory encryption

2017-06-21 Thread Thomas Gleixner
On Fri, 16 Jun 2017, Tom Lendacky wrote:
>  
> +#ifndef pgprot_encrypted
> +#define pgprot_encrypted(prot)   (prot)
> +#endif
> +
> +#ifndef pgprot_decrypted

That looks wrong. It's not decrypted, it's rather unencrypted, right?

Thanks,

tglx



Re: [Xen-devel] [PATCH v7 08/36] x86/mm: Add support to enable SME in early boot processing

2017-06-21 Thread Thomas Gleixner
On Fri, 16 Jun 2017, Tom Lendacky wrote:
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index a105796..988b336 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -15,16 +15,24 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +#include 
> +
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  
>  extern unsigned long sme_me_mask;
>  
> +void __init sme_enable(void);
> +
>  #else/* !CONFIG_AMD_MEM_ENCRYPT */
>  
>  #define sme_me_mask  0UL
>  
> +static inline void __init sme_enable(void) { }
> +
>  #endif   /* CONFIG_AMD_MEM_ENCRYPT */
>  
> +unsigned long sme_get_me_mask(void);

Why is this an unconditional function? Isn't the mask simply 0 when the MEM
ENCRYPT support is disabled?

> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 6225550..ef12729 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -78,7 +78,29 @@ startup_64:
>   call__startup_64
>   popq%rsi
>  
> - movq$(early_top_pgt - __START_KERNEL_map), %rax
> + /*
> +  * Encrypt the kernel if SME is active.
> +  * The real_mode_data address is in %rsi and that register can be
> +  * clobbered by the called function so be sure to save it.
> +  */
> + push%rsi
> + callsme_encrypt_kernel
> + pop %rsi

That does not make any sense. Neither the call to sme_encrypt_kernel() nor
the following call to sme_get_me_mask().

__startup_64() is already C code, so why can't you simply call that from
__startup_64() in C and return the mask from there?

> @@ -98,7 +120,20 @@ ENTRY(secondary_startup_64)
>   /* Sanitize CPU configuration */
>   call verify_cpu
>  
> - movq$(init_top_pgt - __START_KERNEL_map), %rax
> + /*
> +  * Get the SME encryption mask.
> +  *  The encryption mask will be returned in %rax so we do an ADD
> +  *  below to be sure that the encryption mask is part of the
> +  *  value that will stored in %cr3.
> +  *
> +  * The real_mode_data address is in %rsi and that register can be
> +  * clobbered by the called function so be sure to save it.
> +  */
> + push%rsi
> + callsme_get_me_mask
> + pop %rsi

Do we really need a call here? The mask is established at this point, so
it's either 0 when the encryption stuff is not compiled in or it can be
retrieved from a variable which is accessible at this point.

> +
> + addq$(init_top_pgt - __START_KERNEL_map), %rax
>  1:
>  
>   /* Enable PAE mode, PGE and LA57 */

Thanks,

tglx



Re: [Xen-devel] [PATCH v7 07/36] x86/mm: Don't use phys_to_virt in ioremap() if SME is active

2017-06-20 Thread Thomas Gleixner
On Fri, 16 Jun 2017, Tom Lendacky wrote:

> Currently there is a check if the address being mapped is in the ISA
> range (is_ISA_range()), and if it is then phys_to_virt() is used to
> perform the mapping.  When SME is active, however, this will result
> in the mapping having the encryption bit set when it is expected that
> an ioremap() should not have the encryption bit set. So only use the
> phys_to_virt() function if SME is not active

This does not make sense to me. What the heck has phys_to_virt() to do with
the encryption bit? Especially, why would the encryption bit be set on that
mapping in the first place?

I'm probably missing something, but this want's some coherent explanation
understandable by mere mortals both in the changelog and the code comment.

Thanks,

tglx



Re: [Xen-devel] [PATCH v7 06/36] x86/mm: Add Secure Memory Encryption (SME) support

2017-06-20 Thread Thomas Gleixner
On Fri, 16 Jun 2017, Tom Lendacky wrote:
>  
> +config ARCH_HAS_MEM_ENCRYPT
> + def_bool y
> + depends on X86

That one is silly. The config switch is in the x86 KConfig file, so X86 is
on. If you intended to move this to some generic place outside of
x86/Kconfig then this should be

config ARCH_HAS_MEM_ENCRYPT
bool

and x86/Kconfig should have

select ARCH_HAS_MEM_ENCRYPT

and that should be selected by AMD_MEM_ENCRYPT

> +config AMD_MEM_ENCRYPT
> + bool "AMD Secure Memory Encryption (SME) support"
> + depends on X86_64 && CPU_SUP_AMD
> + ---help---
> +   Say yes to enable support for the encryption of system memory.
> +   This requires an AMD processor that supports Secure Memory
> +   Encryption (SME).

Thanks,

tglx

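One reading of the above, as a Kconfig sketch; whether config X86 or AMD_MEM_ENCRYPT ends up doing the select is a detail for the series to settle, and ARCH_HAS_MEM_ENCRYPT would live in a generic Kconfig file rather than arch/x86/Kconfig:

config ARCH_HAS_MEM_ENCRYPT
	bool

config AMD_MEM_ENCRYPT
	bool "AMD Secure Memory Encryption (SME) support"
	depends on X86_64 && CPU_SUP_AMD
	select ARCH_HAS_MEM_ENCRYPT
	---help---
	  Say yes to enable support for the encryption of system memory.
	  This requires an AMD processor that supports Secure Memory
	  Encryption (SME).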


Re: [Xen-devel] [PATCH v3] x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen

2017-05-11 Thread Thomas Gleixner
On Thu, 11 May 2017, Juergen Gross wrote:

> On 27/04/17 07:01, Juergen Gross wrote:
> > When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set
> > on AMD cpus.
> > 
> > This bug/feature bit is kind of special as it will be used very early
> > when switching threads. Setting the bit and clearing it a little bit
> > later leaves a critical window where things can go wrong. This time
> > window has been enlarged a little bit by using setup_clear_cpu_cap() instead
> > of the hypervisor's set_cpu_features callback. It seems this larger
> > window now makes it rather easy to hit the problem.
> > 
> > The proper solution is to never set the bit in case of Xen.
> > 
> > Signed-off-by: Juergen Gross <jgr...@suse.com>
> 
> Any objections for carrying this through the Xen tree?


Acked-by: Thomas Gleixner <t...@linutronix.de>
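For reference, a minimal sketch of the approach the changelog describes, i.e. never
setting the bit when running as a Xen PV guest; where exactly this lands in the AMD
init path is an assumption:

        static void bsp_init_amd_sketch(struct cpuinfo_x86 *c)
        {
                /* Only mark the bug when not running as a Xen PV guest. */
                if (!cpu_has(c, X86_FEATURE_XENPV))
                        set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
        }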



Re: [Xen-devel] [PATCH linux v3 2/9] x86/acpi: store ACPI ids from MADT for future usage

2017-03-01 Thread Thomas Gleixner
On Tue, 26 Jul 2016, Vitaly Kuznetsov wrote:

So this patch made its way into Linus' tree via XEN w/o an ack or reviewed-by
from the x86 maintainers.

Yes, we were on CC, but it's not that hard to ping the maintainers when
they do not respond on a particular patch.

The whole series ran under the cover letter subject:

 xen: pvhvm: support bootup on secondary vCPU

which suggests that this is a XEN internal affair. And I really have enough
stuff to look after so I don't dive into XEN internals if it's not
obviously required.

Let's look at this after the fact:

> Currently we don't save ACPI ids (unlike LAPIC ids which go to
> x86_cpu_to_apicid) from MADT and we may need this information later.

may need? Maybe, or maybe not.

> Particularly, ACPI ids is the only existent way for a PVHVM Xen guest
> to figure out Xen's idea of its vCPUs ids before these CPUs boot and
> in some cases these ids diverge from Linux's cpu ids.

I have no idea what this sentence means and what kind of divergence this is
talking about.

Dammit, if stuff gets slammed into the x86 tree w/o a proper notice, then
the minimum requirement is at least an understandable changelog which
allows non XEN experts to figure out WHY this is necessary and WHAT this is
about.

> @@ -714,7 +722,7 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, 
> int *pcpu)
>  {
>   int cpu;
>  
> - cpu = acpi_register_lapic(physid, ACPI_MADT_ENABLED);
> + cpu = acpi_register_lapic(physid, U32_MAX, ACPI_MADT_ENABLED);

What the heck is this? ACPIID is U32_MAX? Sure, that's obvious as it can
get and the well thought out comment above this call explains it nicely.
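For readers outside the thread, a hedged sketch of what the call site is apparently
doing: the series adds an ACPI id argument to acpi_register_lapic() and stores it per
CPU, with U32_MAX serving as an "id not known" placeholder on this hotplug path. The
names follow the series; this is not the merged code:

        static void store_acpi_id_sketch(int cpu, u32 acpi_id)
        {
                /* U32_MAX: ACPI id unknown (e.g. physical hotplug), skip */
                if (cpu >= 0 && acpi_id != U32_MAX)
                        early_per_cpu(x86_cpu_to_acpiid, cpu) = acpi_id;
        }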

Yes, I know it has been fixed later, but this crap should not have been
merged in the first place.

Yours grumpy

  tglx



Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data

2017-03-01 Thread Thomas Gleixner
On Wed, 1 Mar 2017, Ingo Molnar wrote:
> 
> * Jiri Slaby  wrote:
> 
> > This is a start of series to unify use of ENTRY, ENDPROC, GLOBAL, END,
> > and other macros across x86. When we have all this sorted out, this will
> > help to inject DWARF unwinding info by objtool later.
> > 
> > So, let us use the macros this way:
> > * ENTRY -- start of a global function
> > * ENDPROC -- end of a local/global function
> > * GLOBAL -- start of a globally visible data symbol
> > * END -- end of local/global data symbol
> 
> So how about using macro names that actually show the purpose, instead of 
> importing all the crappy, historic, essentially randomly chosen debug symbol 
> macro 
> names from the binutils and older kernels?
> 
> Something sane, like:
> 
>   SYM__FUNCTION_START

Sane would be:

SYM_FUNCTION_START

The double underscore is just not giving any value.

Thanks,

tglx



Re: [Xen-devel] [PATCH 09/35] x86: Convert remaining uses of pr_warning to pr_warn

2017-02-17 Thread Thomas Gleixner
On Thu, 16 Feb 2017, Joe Perches wrote:

> To enable eventual removal of pr_warning
> 
> This makes pr_warn use consistent for arch/x86
> 
> Prior to this patch, there were 46 uses of pr_warning and
> 122 uses of pr_warn in arch/x86
> 
> Miscellanea:
> 
> o Coalesce a few formats and realign arguments
> o Convert a couple of multiple line printks to single line
> 
> Signed-off-by: Joe Perches <j...@perches.com>

Acked-by: Thomas Gleixner <t...@linutronix.de>



Re: [Xen-devel] [PATCH 2/2] cpufreq: Remove cpu hotplug callbacks only if they were initialized

2016-12-15 Thread Thomas Gleixner
On Thu, 15 Dec 2016, Boris Ostrovsky wrote:

> Since cpu hotplug callbacks are requested for CPUHP_AP_ONLINE_DYN state,
> successful callback initialization will result in cpuhp_setup_state()
> returning a positive value. Therefore acpi_cpufreq_online being zero
> indicates that callbacks have not been installed.
> 
> This means that acpi_cpufreq_boost_exit() should only remove them if
> acpi_cpufreq_online is positive. Trying to call
> cpuhp_remove_state_nocalls(0) will cause a BUG().
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>

> ---
>  drivers/cpufreq/acpi-cpufreq.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index 3a98702..3a2ca0f 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -930,7 +930,7 @@ static void __init acpi_cpufreq_boost_init(void)
>  
>  static void acpi_cpufreq_boost_exit(void)
>  {
> - if (acpi_cpufreq_online >= 0)
> + if (acpi_cpufreq_online > 0)
>   cpuhp_remove_state_nocalls(acpi_cpufreq_online);
>  }
>  
> -- 
> 1.7.1
> 
> 
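For context, a sketch of the setup/teardown pairing this fix relies on: with
CPUHP_AP_ONLINE_DYN, cpuhp_setup_state() returns the dynamically allocated state
number (always > 0) on success, so zero can safely mean "callbacks were never
installed". The callback names below are placeholders:

        static int boost_online_cb(unsigned int cpu);   /* placeholder */
        static int boost_offline_cb(unsigned int cpu);  /* placeholder */
        static int acpi_cpufreq_online;

        static void boost_init_sketch(void)
        {
                int ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cpufreq/acpi:online",
                                            boost_online_cb, boost_offline_cb);
                if (ret < 0)
                        return;                 /* nothing installed */
                acpi_cpufreq_online = ret;      /* > 0 in the DYN range */
        }

        static void boost_exit_sketch(void)
        {
                if (acpi_cpufreq_online > 0)
                        cpuhp_remove_state_nocalls(acpi_cpufreq_online);
        }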



[Xen-devel] [tip:x86/urgent] x86/smpboot: Make logical package management more robust

2016-12-13 Thread tip-bot for Thomas Gleixner
Commit-ID:  9d85eb9119f4eeeb48e87adfcd71f752655700e9
Gitweb: http://git.kernel.org/tip/9d85eb9119f4eeeb48e87adfcd71f752655700e9
Author: Thomas Gleixner <t...@linutronix.de>
AuthorDate: Mon, 12 Dec 2016 11:04:53 +0100
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 13 Dec 2016 10:22:39 +0100

x86/smpboot: Make logical package management more robust

The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg. These are obviously wrong.

 - Virtual machines have their own interesting ways of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is no longer
required.

Fixes: d49597fd3bc7 ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov <b...@suse.de>
Tested-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: M. Vefa Bicakci <m@runbox.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Cc: Charles (Chas) Williams <ciwil...@brocade.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Alok Kataria <akata...@vmware.com>
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
Signed-off-by: Thomas Gleixner <t...@linutronix.de>

---
 arch/x86/kernel/apic/apic.c  | 15 -
 arch/x86/kernel/cpu/common.c | 24 +++--
 arch/x86/kernel/smpboot.c| 51 +---
 arch/x86/xen/smp.c   |  6 --
 4 files changed, 27 insertions(+), 69 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index bb47e5e..5b7e43e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2160,21 +2160,6 @@ int __generic_processor_info(int apicid, int version, 
bool enabled)
}
 
/*
-* This can happen on physical hotplug. The sanity check at boot time
-* is done from native_smp_prepare_cpus() after num_possible_cpus() is
-* established.
-*/
-   if (topology_update_package_map(apicid, cpu) < 0) {
-   int thiscpu = max + disabled_cpus;
-
-   pr_warning("APIC: Package limit reached. Processor %d/0x%x ignored.\n",
-  thiscpu, apicid);
-
-   disabled_cpus++;
-   return -ENOSPC;
-   }
-
-   /*
 * Validate version
 */
if (version == 0x0) {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 729f92b..1f6b50a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,29 +979,21 @@ static void x86_init_cache_qos(struct cpuinfo_x86 *c)
 }
 
 /*
- * The physical to logical package id mapping is initialized from the
- * acpi/mptables information. Make sure that CPUID actually agrees with
- * that.
+ * Validate that ACPI/mptables have the same information about the
+ * effective APIC id and update the package map.
  */
-static void sanitize_package_id(struct cpuinfo_x86 *c)
+static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-   unsigned int pkg, apicid, cpu = smp_processor_id();
+   unsigned int apicid, cpu = smp_processor_id();
 
apicid = apic->cpu_present_to_apicid(cpu);
-   pkg = apicid >> boot_cpu_data.x86_coreid_bits;
 
-   if (apicid != c->initial_apicid) {
-   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
+   if (apicid != c->apicid) {
+   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
   cpu, apicid, c->initial_apicid);
-   c->initial_apicid

[Xen-devel] [PATCH v2] x86/smpboot: Make logical package management more robust

2016-12-12 Thread Thomas Gleixner
The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg. These are obviously wrong.

 - Virtual machines have their own interesting ways of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is no longer
required.

Fixes: d49597fd3bc7 ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov <b...@suse.de>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: sta...@vger.kernel.org
---
 arch/x86/kernel/apic/apic.c  |   15 
 arch/x86/kernel/cpu/common.c |   24 ++--
 arch/x86/kernel/smpboot.c|   51 ---
 arch/x86/xen/smp.c   |6 -
 4 files changed, 27 insertions(+), 69 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2159,21 +2159,6 @@ int __generic_processor_info(int apicid,
}
 
/*
-* This can happen on physical hotplug. The sanity check at boot time
-* is done from native_smp_prepare_cpus() after num_possible_cpus() is
-* established.
-*/
-   if (topology_update_package_map(apicid, cpu) < 0) {
-   int thiscpu = max + disabled_cpus;
-
-   pr_warning("APIC: Package limit reached. Processor %d/0x%x ignored.\n",
-  thiscpu, apicid);
-
-   disabled_cpus++;
-   return -ENOSPC;
-   }
-
-   /*
 * Validate version
 */
if (version == 0x0) {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,29 +979,21 @@ static void x86_init_cache_qos(struct cp
 }
 
 /*
- * The physical to logical package id mapping is initialized from the
- * acpi/mptables information. Make sure that CPUID actually agrees with
- * that.
+ * Validate that ACPI/mptables have the same information about the
+ * effective APIC id and update the package map.
  */
-static void sanitize_package_id(struct cpuinfo_x86 *c)
+static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-   unsigned int pkg, apicid, cpu = smp_processor_id();
+   unsigned int apicid, cpu = smp_processor_id();
 
apicid = apic->cpu_present_to_apicid(cpu);
-   pkg = apicid >> boot_cpu_data.x86_coreid_bits;
 
-   if (apicid != c->initial_apicid) {
-   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
+   if (apicid != c->apicid) {
+   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
   cpu, apicid, c->initial_apicid);
-   c->initial_apicid = apicid;
}
-   if (pkg != c->phys_proc_id) {
-   pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
-  cpu, pkg, c->phys_proc_id);
-   c->phys_proc_id = pkg;
-   }
-   c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
+   BUG_ON(topology_update_package_map(c->phys_proc_id, cpu));
 #else
c->logical_proc_id = 0;
 #endif
@@ -1132,7 +1124,6 @@ static void identify_cpu(struct cpuinfo_
 #ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
 #endif
-   sanitize_package_id(c);
 }
 
 /*
@@ -1188,6 +1179,7 @@ void identify_secondary_cpu(struct cpuin
enable_sep_cpu();
 #endif
mtrr_ap_init();
+   validate_apic_and_package_id(c);
 }
 
 struct msr_range {
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -104,7 +104,6 @@ static unsigned int max_physical_pkg_id
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
 static unsigned int logical_packages __read_mostly;
-static bool logical_packages_frozen __read_mostly;
 
 /* Maximum number of SMT thre

Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Sat, 10 Dec 2016, Thomas Gleixner wrote:
> On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > 
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete ?
> > > 
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> > 
> > Ignore my statement about earlier testing --- it was all on single-node
> > machines.
> > 
> > Something is broken with multi-node on Intel, but failure modes are 
> > different.
> > Prior to this patch build_sched_domain() reports an error and pretty soon we
> > crash in scheduler (don't remember off the top of my head). With patch 
> > applied
> > I crash much later, when one of the drivers does kmalloc_node(..,
> > cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> > ("x86: Booted up 1 node, 32 CPUs" is reported, for example).
> 
> Hmm. But the cpu_to_node() association is unrelated to the logical package
> management.

Just came to my mind after hitting send. We had the whole persistent cpuid
to nodeid association work merged in 4.9. So that might be related.

Thanks,

tglx



Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > 
> > > Boris, can you please verify if that makes the
> > > topology_update_package_map() call which you placed into the Xen cpu
> > > starting code obsolete ?
> > 
> > Will do. I did test your patch but without removing
> > topology_update_package_map() call. It complained about package IDs
> > being wrong, but that's expected until I fix Xen part.
> 
> Ignore my statement about earlier testing --- it was all on single-node
> machines.
> 
> Something is broken with multi-node on Intel, but failure modes are different.
> Prior to this patch build_sched_domain() reports an error and pretty soon we
> crash in scheduler (don't remember off the top of my head). With patch applied
> I crash much later, when one of the drivers does kmalloc_node(..,
> cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> ("x86: Booted up 1 node, 32 CPUs" is reported, for example).

Hmm. But the cpu_to_node() association is unrelated to the logical package
management.

> 2-node AMD box doesn't have these problems.
> 
> I haven't upgraded the Intel machine for about a month but this all must have
> happened in 4.9 timeframe.
> 
> So I can't answer your question since we clearly have other problems on Xen. I
> will be looking into this.

Fair enough. What you could do though with this patch applied and the extra
XEN call to topology_update_package_map() removed is to watchout for the
following messages:

  pr_info("Max logical packages: %u\n", __max_logical_packages);

and

  pr_warn(CPU %u Converting physical %u to logical package %u\n", ...)

Ideally the latter wont show.

Thanks,

tglx



Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-10 Thread Thomas Gleixner
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:00 PM, Thomas Gleixner wrote:
> > On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> > > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > > > 
> > > > Boris, can you please verify if that makes the
> > > > topology_update_package_map() call which you placed into the Xen cpu
> > > > starting code obsolete ?
> > > 
> > > Will do. I did test your patch but without removing
> > > topology_update_package_map() call. It complained about package IDs
> > > being wrong, but that's expected until I fix Xen part.
> > 
> > That should no longer be the case as I changed the approach to that
> > management thing.
> 
> 
> I didn't notice this email before I sent the earlier message.
> 
> Is there anything else besides this patch that I should use? I applied it to
> Linus tree and it didn't apply cleanly (there was some fuzz and such) so I
> wonder whether I am missing something.

No. I did it against tip, but there is nothing which it depends on.

Thanks,

tglx



Re: [Xen-devel] [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-18 Thread Thomas Gleixner
On Mon, 14 Nov 2016, Boris Ostrovsky wrote:
> On 11/13/2016 06:42 PM, M. Vefa Bicakci wrote:
> 
> > I found out that my domU kernels invoke the 'apic_disable' function
> > because CONFIG_X86_MPPARSE was not enabled in my kernel configuration,
> > which would cause the 'smp_found_config' bit to be unset at boot-up.
> 
> smp_found_config is not the problem, it is usually zero for Xen PV guests.
> 
> What is the problem is that because of your particular config selection
> acpi_mps_check() fails (with the error message that you mention below) and
> that leads to X86_FEATURE_APIC being cleared. And then we indeed switch to
> APIC noop and things go south after that.

Indeed. And what really puzzles me is that Xen manages to bring up a
secondary CPU despite APIC being disabled. 

There are quite some assumptions about no APIC == no SMP in all of x86. Can
we please make Xen behave like anything else?

Thanks,

tglx



Re: [Xen-devel] [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-10 Thread Thomas Gleixner
On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
> On 11/10/2016 10:12 AM, Thomas Gleixner wrote:
> > On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
> >> By firmware you mean ACPI? It is most likely not available to PV guests.
> > You either have to provide ACPI or MP tables. And either of those has to
> > provide the intial APIC ids for the CPUs. They are supposed to match the
> > IDs which are in the CPUID leafs.
> >
> >> How about returning cpu_data(cpu).initial_apicid?
> > For a not yet brought up CPU. That's what the initial ACPI/MP table
> > enumeration is for.
> 
> Unfortunately PV guests have neither. So we may need to emulate
> something in xen_cpu_present_to_apicid().

SFI does the same thing and according to the dmesg which was posted, this
is using SFI. We also have devicetree based boot concept which provides the
APICids in the CPU enumeration at boot time in a way which the whole x86
machinery is expecting.

So what kind of APICid is XEN handing in via SFI? None, or just an invalid
one?

Thanks,

tglx



Re: [Xen-devel] [PATCH v3 2/3] x86 Test and expose CPUID faulting capabilities in /proc/cpuinfo

2016-09-16 Thread Thomas Gleixner
On Thu, 15 Sep 2016, Kyle Huey wrote:

Please use proper prefixes for your patch:

git-log arch/x86/kernel/cpu/scattered.c will give you the hint:

x86/cpufeature: Move some of the scattered feature bits to x86_capability
x86/cpufeature: Correct spelling of the HWP_NOTIFY flag

and make the subject line short. Long sentences belong into the body of the
changelog.

And again this changelog does not tell anything.

What is this CPUID faulting support?
Which CPUs do support this?
Why do we want this?

>  #define X86_FEATURE_CPB  ( 7*32+ 2) /* AMD Core Performance 
> Boost */
>  #define X86_FEATURE_EPB  ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS 
> support */
> +#define X86_FEATURE_CPUID_FAULT ( 7*32+ 4) /* Intel CPUID faulting */

Boris?

>  #define X86_FEATURE_HW_PSTATE( 7*32+ 8) /* AMD HW-PState */
>  #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
> diff --git a/arch/x86/include/asm/msr-index.h 
> b/arch/x86/include/asm/msr-index.h
> index 56f4c66..83908d5 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -41,6 +41,7 @@
>  #define MSR_IA32_PERFCTR10x00c2
>  #define MSR_FSB_FREQ 0x00cd
>  #define MSR_PLATFORM_INFO0x00ce
> +#define CPUID_FAULTING_SUPPORT   (1UL << 31)

If you look at the other MSR bit defines then they always have a prefix
which links them to the MSR

What's the name of this bit in the Documentation?

> +static int supports_cpuid_faulting(void)

bool please

> +{
> + unsigned int lo, hi;
> +
> + if (rdmsr_safe(MSR_PLATFORM_INFO, &lo, &hi) == 0 &&
> + (lo & CPUID_FAULTING_SUPPORT))
> + return 1;
> + else
> + return 0;

if (rdmsr_safe(MSR_PLATFORM_INFO, &lo, &hi))
return false;

return lo & PLATINFO_CPUID_FAULT;

would make it readable without linebreaks.
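Putting the remarks above together, a hedged sketch; the bit name is an assumption
pending the answer on the documented name:

        #define MSR_PLATFORM_INFO_CPUID_FAULT   BIT(31)         /* assumed name */

        static bool supports_cpuid_faulting(void)
        {
                unsigned int lo, hi;

                if (rdmsr_safe(MSR_PLATFORM_INFO, &lo, &hi))
                        return false;

                return lo & MSR_PLATFORM_INFO_CPUID_FAULT;
        }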

Thanks,

tglx
 



Re: [Xen-devel] [PATCH v3 1/3] syscalls, x86 Expose arch_prctl on x86-32.

2016-09-16 Thread Thomas Gleixner
On Thu, 15 Sep 2016, Kyle Huey wrote:

First of all, please add a cover letter [PATCH 0/N] to your patch series
and send it with something which provides proper mail threading. 
See: git-send-email, quilt 

> arch_prctl is currently 64-bit only. Wire it up for 32-bits, as a no-op for
> now. Rename the second arg to a more generic name.

This changelog is useless.

- it does not provide any rationale for this change, i.e. why this is
  required. Just because its 64bit only is not a reason.

- "Rename the second arg to a more generic name" does not give
  any useful information.

Misleading information is worse than no information.

Further your patch does 5 things at once. It wants to be split into parts:

1) Rename do_arch_prctl() and change the argument name,

> -long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
> +long do_arch_prctl_64(struct task_struct *task, int code, unsigned long arg2)

2) Provide do_arch_prctl_common() and hook it up to the arch_prctl syscall

> -long sys_arch_prctl(int code, unsigned long addr)
> +SYSCALL_DEFINE2(arch_prctl, int, code, unsigned long, arg2)
>  {
> - return do_arch_prctl(current, code, addr);
> + long ret;
> +
> + ret = do_arch_prctl_64(current, code, arg2);
> + if (ret == -EINVAL)
> + ret = do_arch_prctl_common(current, code, arg2);
> +
> + return ret;
>  }

3) Implement the compat version
  
Thanks,

tglx



Re: [Xen-devel] [tip:x86/asm] x86/mm/xen: Suppress hugetlbfs in PV guests

2016-04-25 Thread Thomas Gleixner
On Mon, 25 Apr 2016, Jan Beulich wrote:
> >>> On 22.04.16 at 20:03,  wrote:
> >> +#define hugepages_supported() cpu_has_pse
> >>  
> > 
> > Please don't use the cpu_has_* macros anymore, they are going away soon.
> > 
> > In this case it should be static_cpu_has(X86_FEATURE_PSE).
> 
> I can certainly do that, but this
> - will (mildly) harm backportability
> - imo should have been requested much earlier (when the patch was
>   still under discussion)

It's requested now as cpu_has_* is going away. So instead of making silly
arguments you should have sent a delta patch fixing this.
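The requested delta is a one-liner; a sketch, assuming the macro form in the Xen
header stays:

        #define hugepages_supported()   static_cpu_has(X86_FEATURE_PSE)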

Thanks,

tglx



Re: [Xen-devel] [PATCH v2 1/6] x86/mm/pat: Change PAT to support non-default PAT MSR

2016-03-22 Thread Thomas Gleixner
On Tue, 22 Mar 2016, Borislav Petkov wrote:
> > +static void pat_keep_handoff_state(void)
> 
> Static function, no need for "pat_" prefix. Also, no need for the
> kernel-doc comment.

I have to disagree. kernel-doc comments are not limited to global
functions.

I really prefer kernel-doc style for static functions over randomly formatted
ones for consistency reasons.
 
Thanks,

tglx



Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down

2016-03-12 Thread Thomas Gleixner
Boris,

On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> On 07/14/2015 04:15 PM, Thomas Gleixner wrote:
> > > > The issue here is that all architectures need that protection and just
> > > > Xen does irq allocations in cpu_up.
> > > > 
> > > > So moving that protection into architecture code is not really an
> > > > option.
> > > > 
> > > > > > > Otherwise we will need to have something like arch_post_cpu_up()
> > > > > > > after the lock is released.
> > > > I'm not sure, that this will work. You probably want to do this in the
> > > > cpu prepare stage, i.e. before calling __cpu_up().
> > > For PV guests (the ones that use xen_cpu_up()) it will work either before
> > > or
> > > after __cpu_up(). At least my (somewhat limited) testing didn't show any
> > > problems so far.
> > > 
> > > However, HVM CPUs use xen_hvm_cpu_up() and if you read comments there you
> > > will
> > > see that xen_smp_intr_init() needs to be called before native_cpu_up() but
> > > xen_init_lock_cpu() (which eventually calls irq_alloc_descs()) needs to be
> > > called after.
> > > 
> > > I think I can split xen_init_lock_cpu() so that the part that needs to be
> > > called after will avoid going into irq core code. And then the rest will
> > > go
> > > into arch_cpu_prepare().
> > I think we should revisit this for 4.3. For 4.2 we can do the trivial
> > variant and move the locking in native_cpu_up() and x86 only. x86 was
> > the only arch on which such wreckage has been seen in the wild, but we
> > should have that protection for all archs in the long run.
> > 
> > Patch below should fix the issue.
> 
> Thanks! Most of my tests passed, I had a couple of failures but I will need to
> see whether they are related to this patch.

Did you ever get around to addressing that irq allocation from within cpu_up()?

I really want to generalize the protection instead of carrying that x86 only
hack forever.

Thanks,

tglx



Re: [Xen-devel] [PATCH v3 13/41] x86: reuse asm-generic/barrier.h

2016-01-12 Thread Thomas Gleixner
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:

> As on most architectures, on x86 read_barrier_depends and
> smp_read_barrier_depends are empty.  Drop the local definitions and pull
> the generic ones from asm-generic/barrier.h instead: they are identical.
> 
> This is in preparation to refactoring this code area.
> 
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> Acked-by: Arnd Bergmann <a...@arndb.de>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>



Re: [Xen-devel] [PATCH v3 27/41] x86: define __smp_xxx

2016-01-12 Thread Thomas Gleixner
On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:

> This defines __smp_xxx barriers for x86,
> for use by virtualization.
> 
> smp_xxx barriers are removed as they are
> defined correctly by asm-generic/barriers.h
> 
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> Acked-by: Arnd Bergmann <a...@arndb.de>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>



Re: [Xen-devel] [PATCHv1] x86: rtc_cmos platform device requires legacy irqs

2015-12-08 Thread Thomas Gleixner
On Fri, 4 Dec 2015, David Vrabel wrote:
> On 04/12/15 14:06, David Vrabel wrote:
> > On 03/12/15 10:43, David Vrabel wrote:
> >> Adding the rtc platform device when there are no legacy irqs (no
> >> legacy PIC) causes a conflict with other devices that end up using the
> >> same irq number.
> > 
> > An alternative is to remove the rtc_cmos platform device in Xen PV
> > guests.
> > 
> > Any preference on how this regression should be fixed?
> > 
> > David
> > 
> > 8<--
> > x86: Xen PV guests don't have the rtc_cmos platform device
> > 
> [...]
> > --- a/arch/x86/kernel/rtc.c
> > +++ b/arch/x86/kernel/rtc.c
> > @@ -200,6 +200,9 @@ static __init int add_rtc_cmos(void)
> > }
> >  #endif
> >  
> > +   if (xen_pv_domain())
> > +   return -ENODEV;
> > +
> 
> Note there's a missing include that breaks !XEN builds.

What's the state of this?



Re: [Xen-devel] [PATCHv1] x86: rtc_cmos platform device requires legacy irqs

2015-12-08 Thread Thomas Gleixner
On Tue, 8 Dec 2015, Boris Ostrovsky wrote:
> On 12/08/2015 04:02 PM, Thomas Gleixner wrote:
> > > > --- a/arch/x86/kernel/rtc.c
> > > > +++ b/arch/x86/kernel/rtc.c
> > > > @@ -200,6 +200,9 @@ static __init int add_rtc_cmos(void)
> > > > }
> > > >   #endif
> > > >   + if (xen_pv_domain())
> > > > +   return -ENODEV;
> > > > +
> > > Note there's a missing include that breaks !XEN builds.
> > What's the state of this?
> 
> I think we are waiting for x86 maintainers to express their preference. There
> were 3 proposals to add in add_rtc_cmos()
> 
> 1. if (!nr_legacy_irqs())
> return -ENODEV;
> 
> 2. #ifdef XEN
> if (xen_pv_domain())
> return -ENODEV;
> #endif
> 
> 3. if (paravirt_enabled())
> return -ENODEV;

Either #2 (but with the #ifdefs removed, include the proper header
instead) or #3.
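A sketch of option #2 in that spirit; the header providing xen_pv_domain() is
assumed here:

        #include <xen/xen.h>    /* xen_pv_domain(); exact header is an assumption */

        static __init int add_rtc_cmos(void)
        {
                /* Xen PV guests have no CMOS RTC to register. */
                if (xen_pv_domain())
                        return -ENODEV;

                /* ... existing rtc_cmos platform device registration ... */
                return 0;
        }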

Thanks,

tglx



Re: [Xen-devel] [PATCH v2] x86/irq: Probe for PIC presence before allocating descs for legacy IRQs

2015-11-16 Thread Thomas Gleixner
On Mon, 16 Nov 2015, Boris Ostrovsky wrote:
> Xen expects legacy interrupts to be there (pretty much for the same reason as
> Hyper-V does) and with this change arch_probe_nr_irqs() returns zero and no
> descriptors are allocated.

Right, because everything which has a PIT gets them and everything
which does not have a PIT does not.

> We can allocate those descriptors as needed in xen_irq_init() (if we know that
> IRQs are legacy), although that would look somewhat ugly and out of place.

Why preallocate them in xen_irq_init()? You can simply remove the
NR_IRQS_LEGACY checks in xen_allocate_irq_gsi()/xen_free_irq(), right?
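In other words, something along these lines; heavily simplified, the real function
has more cases (PV domains, error handling):

        static int xen_allocate_irq_gsi_sketch(unsigned int gsi)
        {
                /* Always ask the irq core for a descriptor at the GSI instead
                 * of assuming the legacy range was pre-allocated. */
                return irq_alloc_desc_at(gsi, -1);
        }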

Thanks,

tglx



Re: [Xen-devel] [PATCH v2 1/7] timekeeping: introduce __current_kernel_time64

2015-11-10 Thread Thomas Gleixner
On Tue, 10 Nov 2015, John Stultz wrote:
> On Tue, Nov 10, 2015 at 7:10 AM, Stefano Stabellini
>  wrote:
> > On Tue, 10 Nov 2015, Arnd Bergmann wrote:
> >> On Tuesday 10 November 2015 11:57:49 Stefano Stabellini wrote:
> >> > __current_kernel_time64 returns a struct timespec64, without taking the
> >> > xtime lock. Mirrors __current_kernel_time/current_kernel_time.
> >> >
> >>
> >> Actually it doesn't mirror __current_kernel_time/current_kernel_time
> >>
> >> > diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> >> > index ec89d84..b5802bf 100644
> >> > --- a/include/linux/timekeeping.h
> >> > +++ b/include/linux/timekeeping.h
> >> > @@ -19,7 +19,8 @@ extern int do_sys_settimeofday(const struct timespec 
> >> > *tv,
> >> >   */
> >> >  unsigned long get_seconds(void);
> >> >  struct timespec64 current_kernel_time64(void);
> >> > -/* does not take xtime_lock */
> >> > +/* do not take xtime_lock */
> >> > +struct timespec64 __current_kernel_time64(void);
> >> >  struct timespec __current_kernel_time(void);
> >>
> >> Please change __current_kernel_time into a static inline function
> >> while you are introducing the new one, to match the patch description ;-)
> >
> > The implementation is:
> >
> > struct timekeeper *tk = &tk_core.timekeeper;
> >
> > return timespec64_to_timespec(tk_xtime(tk));
> >
> > which cannot be easily made into a static inline, unless we start
> > exporting tk_core.
> 
> So the timekeeper is passed to the notifier. So you probably want something 
> like
> 
> struct timespec64 __current_kernel_time64(struct timekeeper *tk)
> {
>  return timespec64_to_timespec(tk_xtime(tk));
> }
> 
> Then you can cast the priv pointer in the notifier to a timekeeper and
> use it that way?

Err no. Look at commit 8758a240e2d74c5932ab51a73377e6507b7fd441

i.e. Add the new 64bit function and make the existing one a static
inline which does the timespec64 to timespec conversion.
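I.e. the same shape as in that commit, roughly (sketch):

        extern struct timespec64 __current_kernel_time64(void);

        static inline struct timespec __current_kernel_time(void)
        {
                return timespec64_to_timespec(__current_kernel_time64());
        }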

Thanks,

tglx



Re: [Xen-devel] [PATCH v2 1/7] timekeeping: introduce __current_kernel_time64

2015-11-10 Thread Thomas Gleixner
On Tue, 10 Nov 2015, John Stultz wrote:
> I'm sort of objecting to a different issue, where the
> __current_kernel_time() implementation probably shouldn't be grabbing
> the tk_core.timekeeper directly, and instead should take a passed
> pointer to a timekeeper. The vdso/pv_clock usage should have a
> timekeeper passed to them that they could use.

That usage of __current_kernel_time() in that xen notifier is silly to
begin with. The notifier is already called with a pointer to the
timekeeper. That xen implementation just does not use it.

We extract exactly that information in the vdso updates without
calling back into the core code. So for solving that xen thing we do
not need a 64 bit variant of __current_kernel_time() at all. The
notifier has the pointer to the timekeeper and can just grab data from
there.

> There's one usage in kdb that's maybe problematic, so maybe this will
> need a deeper cleanup.

That one is silly as well. It only wants to know the seconds portion.

Thanks,

tglx



Re: [Xen-devel] Linux 4.2-rc6 regression: RIP: e030:[ffffffff8110fb18] [ffffffff8110fb18] detach_if_pending+0x18/0x80

2015-08-17 Thread Thomas Gleixner
On Mon, 17 Aug 2015, Eric Dumazet wrote:
 [PATCH] timer: fix a race in __mod_timer()
 
 lock_timer_base() can not catch following :
 
 CPU1 ( in __mod_timer() )
 timer->flags |= TIMER_MIGRATING;
 spin_unlock(&base->lock);
 base = new_base;
 spin_lock(&base->lock);
 timer->flags &= ~TIMER_BASEMASK;
                                   CPU2 (in lock_timer_base())
                                   see timer base is cpu0 base
                                   spin_lock_irqsave(&base->lock, *flags);
                                   if (timer->flags == tf)
                                        return base; // oops, wrong base
 timer->flags |= base->cpu // too late
 
 We must write timer->flags in one go, otherwise we can fool other cpus.
 
 Fixes: bc7a34b8b9eb ("timer: Reduce timer migration overhead if disabled")
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Thomas Gleixner t...@linutronix.de
 ---
  kernel/time/timer.c |    4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/kernel/time/timer.c b/kernel/time/timer.c
 index 5e097fa9faf7..84190f02b521 100644
 --- a/kernel/time/timer.c
 +++ b/kernel/time/timer.c
 @@ -807,8 +807,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 		spin_unlock(&base->lock);
 		base = new_base;
 		spin_lock(&base->lock);
 -		timer->flags &= ~TIMER_BASEMASK;
 -		timer->flags |= base->cpu;
 +		WRITE_ONCE(timer->flags,
 +			   (timer->flags & ~TIMER_BASEMASK) | base->cpu);

Duh, yes. Picking it up for timers/urgent.

Thanks for spotting it.

   tglx



Re: [Xen-devel] [PATCH] xen/events: Support event channel rebind on ARM

2015-07-26 Thread Thomas Gleixner
On Sat, 25 Jul 2015, Julien Grall wrote:
 +/*
 + * Events delivered via platform PCI interrupts are always
 + * routed to vcpu 0 and hence cannot be rebound.
 + */
 +#define xen_support_evtchn_rebind()  \
 + (!xen_hvm_domain() || xen_have_vector_callback)

Make this an inline please.
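I.e. a sketch of the inline form:

        static inline bool xen_support_evtchn_rebind(void)
        {
                return !xen_hvm_domain() || xen_have_vector_callback;
        }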

Thanks,

tglx



[Xen-devel] [tip:irq/urgent] genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now

2015-07-15 Thread tip-bot for Thomas Gleixner
Commit-ID:  ce0d3c0a6fb1422101498ef378c0851dabbbf67f
Gitweb: http://git.kernel.org/tip/ce0d3c0a6fb1422101498ef378c0851dabbbf67f
Author: Thomas Gleixner t...@linutronix.de
AuthorDate: Tue, 14 Jul 2015 22:03:57 +0200
Committer:  Thomas Gleixner t...@linutronix.de
CommitDate: Wed, 15 Jul 2015 10:39:17 +0200

genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now

Boris reported that the sparse_irq protection around __cpu_up() in the
generic code causes a regression on Xen. Xen allocates interrupts and
some more in the xen_cpu_up() function, so it deadlocks on the
sparse_irq_lock.

There is no simple fix for this and we really should have the
protection for all architectures, but for now the only solution is to
move it to x86 where actual wreckage due to the lack of protection has
been observed.

Reported-and-tested-by: Boris Ostrovsky boris.ostrov...@oracle.com
Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during cpu 
up/down'
Signed-off-by: Thomas Gleixner t...@linutronix.de
Cc: Peter Zijlstra pet...@infradead.org
Cc: xiao jin jin.x...@intel.com
Cc: Joerg Roedel jroe...@suse.de
Cc: Borislav Petkov b...@suse.de
Cc: Yanmin Zhang yanmin_zh...@linux.intel.com
Cc: xen-devel xen-de...@lists.xenproject.org
---
 arch/x86/kernel/smpboot.c | 11 +++
 kernel/cpu.c  |  9 -
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d3010aa..b1f3ed9c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -992,8 +992,17 @@ int native_cpu_up(unsigned int cpu, struct task_struct 
*tidle)
 
common_cpu_up(cpu, tidle);
 
+   /*
+* We have to walk the irq descriptors to setup the vector
+* space for the cpu which comes online.  Prevent irq
+* alloc/free across the bringup.
+*/
+   irq_lock_sparse();
+
err = do_boot_cpu(apicid, cpu, tidle);
+
if (err) {
+   irq_unlock_sparse();
pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
return -EIO;
}
@@ -1011,6 +1020,8 @@ int native_cpu_up(unsigned int cpu, struct task_struct 
*tidle)
touch_nmi_watchdog();
}
 
+   irq_unlock_sparse();
+
return 0;
 }
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6a37454..5644ec5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -527,18 +527,9 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen)
goto out_notify;
}
 
-   /*
-* Some architectures have to walk the irq descriptors to
-* setup the vector space for the cpu which comes online.
-* Prevent irq alloc/free across the bringup.
-*/
-   irq_lock_sparse();
-
/* Arch-specific enabling code. */
ret = __cpu_up(cpu, idle);
 
-   irq_unlock_sparse();
-
if (ret != 0)
goto out_notify;
BUG_ON(!cpu_online(cpu));



Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down

2015-07-14 Thread Thomas Gleixner
On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
  Prevent allocation and freeing of interrupt descriptors across cpu
  hotplug.
 
 
 This breaks Xen guests that allocate interrupt descriptors in .cpu_up().

And where exactly does XEN allocate those descriptors?
 
 Any chance this locking can be moved into arch code?

No.

 (The patch doesn't appear to have any side effects for the down path since Xen
 guests deallocate descriptors in __cpu_die()).
 
Exact place please.

Thanks,

tglx



Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down

2015-07-14 Thread Thomas Gleixner
On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
 On 07/14/2015 01:32 PM, Thomas Gleixner wrote:
  On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
   On 07/14/2015 11:44 AM, Thomas Gleixner wrote:
On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
  Prevent allocation and freeing of interrupt descriptors across cpu
  hotplug.
 This breaks Xen guests that allocate interrupt descriptors in
 .cpu_up().
And where exactly does XEN allocate those descriptors?
   xen_cpu_up()
xen_setup_timer()
bind_virq_to_irqhandler()
bind_virq_to_irq()
xen_allocate_irq_dynamic()
xen_allocate_irqs_dynamic()
irq_alloc_descs()
   
   
    There is also a similar pass via xen_cpu_up() -> xen_smp_intr_init()
  Sigh.

   
 Any chance this locking can be moved into arch code?
No.
  The issue here is that all architectures need that protection and just
  Xen does irq allocations in cpu_up.
  
  So moving that protection into architecture code is not really an
  option.
  
 Otherwise we will need to have something like arch_post_cpu_up()
 after the lock is released.
  I'm not sure, that this will work. You probably want to do this in the
  cpu prepare stage, i.e. before calling __cpu_up().
 
 For PV guests (the ones that use xen_cpu_up()) it will work either before or
 after __cpu_up(). At least my (somewhat limited) testing didn't show any
 problems so far.
 
 However, HVM CPUs use xen_hvm_cpu_up() and if you read comments there you will
 see that xen_smp_intr_init() needs to be called before native_cpu_up() but
 xen_init_lock_cpu() (which eventually calls irq_alloc_descs()) needs to be
 called after.
 
 I think I can split xen_init_lock_cpu() so that the part that needs to be
 called after will avoid going into irq core code. And then the rest will go
 into arch_cpu_prepare().

I think we should revisit this for 4.3. For 4.2 we can do the trivial
variant and move the locking in native_cpu_up() and x86 only. x86 was
the only arch on which such wreckage has been seen in the wild, but we
should have that protection for all archs in the long run.

Patch below should fix the issue.

Thanks,

tglx
---
commit d4a969314077914a623f3e2c5120cd2ef31aba30
Author: Thomas Gleixner t...@linutronix.de
Date:   Tue Jul 14 22:03:57 2015 +0200

genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for 
now

Boris reported that the sparse_irq protection around __cpu_up() in the
generic code causes a regression on Xen. Xen allocates interrupts and
some more in the xen_cpu_up() function, so it deadlocks on the
sparse_irq_lock.

There is no simple fix for this and we really should have the
protection for all architectures, but for now the only solution is to
move it to x86 where actual wreckage due to the lack of protection has
been observed.

Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com
Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during 
cpu up/down'
Signed-off-by: Thomas Gleixner t...@linutronix.de
Cc: Peter Zijlstra pet...@infradead.org
Cc: xiao jin jin.x...@intel.com
Cc: Joerg Roedel jroe...@suse.de
Cc: Borislav Petkov b...@suse.de
Cc: Yanmin Zhang yanmin_zh...@linux.intel.com
Cc: xen-devel xen-de...@lists.xenproject.org

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d3010aa79daf..b1f3ed9c7a9e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -992,8 +992,17 @@ int native_cpu_up(unsigned int cpu, struct task_struct 
*tidle)
 
common_cpu_up(cpu, tidle);
 
+   /*
+* We have to walk the irq descriptors to setup the vector
+* space for the cpu which comes online.  Prevent irq
+* alloc/free across the bringup.
+*/
+   irq_lock_sparse();
+
err = do_boot_cpu(apicid, cpu, tidle);
+
if (err) {
+   irq_unlock_sparse();
pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
return -EIO;
}
@@ -1011,6 +1020,8 @@ int native_cpu_up(unsigned int cpu, struct task_struct 
*tidle)
touch_nmi_watchdog();
}
 
+   irq_unlock_sparse();
+
return 0;
 }
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6a374544d495..5644ec5582b9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -527,18 +527,9 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen)
goto out_notify;
}
 
-   /*
-* Some architectures have to walk the irq descriptors to
-* setup the vector space for the cpu which comes online.
-* Prevent irq alloc/free across the bringup.
-*/
-   irq_lock_sparse();
-
/* Arch-specific enabling code. */
ret = __cpu_up(cpu, idle);
 
-   irq_unlock_sparse();
-
if (ret != 0)
goto out_notify

Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down

2015-07-14 Thread Thomas Gleixner
On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
 On 07/14/2015 11:44 AM, Thomas Gleixner wrote:
  On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
  Prevent allocation and freeing of interrupt descriptors across cpu
hotplug.
   
   This breaks Xen guests that allocate interrupt descriptors in .cpu_up().
  And where exactly does XEN allocate those descriptors?
 
 xen_cpu_up()
 xen_setup_timer()
 bind_virq_to_irqhandler()
 bind_virq_to_irq()
 xen_allocate_irq_dynamic()
 xen_allocate_irqs_dynamic()
 irq_alloc_descs()
 
 
  There is also a similar pass via xen_cpu_up() -> xen_smp_intr_init()

Sigh.
 
 

   Any chance this locking can be moved into arch code?
  No.

The issue here is that all architectures need that protection and just
Xen does irq allocations in cpu_up.

So moving that protection into architecture code is not really an
option.

   Otherwise we will need to have something like arch_post_cpu_up()
   after the lock is released.

I'm not sure, that this will work. You probably want to do this in the
cpu prepare stage, i.e. before calling __cpu_up().

I have to walk the dogs now. Will look into it later tonight.

   (The patch doesn't appear to have any side effects for the down path since
   Xen
   guests deallocate descriptors in __cpu_die()).
Exact place please.
 
 Whose place? Where descriptors are deallocated?
 
 __cpu_die()
 xen_cpu_die()
 xen_teardown_timer()
 unbind_from_irqhandler()
 unbind_from_irq()
 __unbind_from_irq()
 xen_free_irq()
 irq_free_descs()
 free_desc()

Right, that's outside the lock held region.

Thanks,

tglx



Re: [Xen-devel] [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-06-16 Thread Thomas Gleixner
On Tue, 16 Jun 2015, Juergen Gross wrote:

 AFAIK there are no outstanding questions for more than one month now.
 I'd appreciate some feedback or accepting these patches.

They are against dead code, which will be gone soon. We switched over
to queued locks.

Thanks,

tglx




Re: [Xen-devel] [Patch v3 27/36] x86, irq: Use access helper irq_data_get_affinity_mask()

2015-06-02 Thread Thomas Gleixner
On Mon, 1 Jun 2015, Jiang Liu wrote:

 diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
 index 9b62f690b0ff..dfa3a5f5b3d3 100644
 --- a/arch/x86/kernel/apic/vector.c
 +++ b/arch/x86/kernel/apic/vector.c
 @@ -494,9 +494,8 @@ static int apic_set_affinity(struct irq_data *irq_data,
  
   err = assign_irq_vector(irq, data, dest);
   if (err) {
 - struct irq_data *top = irq_get_irq_data(irq);
 -
  - if (assign_irq_vector(irq, data, top->affinity))
 + if (assign_irq_vector(irq, data,
 +   irq_data_get_affinity_mask(irq_data)))

Does this patch work w/o moving the affinity mask to common data? I
doubt so, as you remove the retrieval of 'top'.

Thanks,

tglx



Re: [Xen-devel] [Patch v2 08/14] genirq: Introduce helper function irq_data_get_affinity_mask()

2015-05-20 Thread Thomas Gleixner
On Wed, 20 May 2015, Jiang Liu wrote:

 Introduce helper function irq_data_get_affinity_mask() and
 irq_get_affinity_mask() to hide implementation details,

That patch does way more than introducing the functions. Again:

Patch 1: Introduce helpers

Patch 2-n: Convert users subsystem wise

Thanks,

tglx



Re: [Xen-devel] [PATCH V6 01/18] x86: Make page cache mode a real type

2015-01-22 Thread Thomas Gleixner
On Thu, 22 Jan 2015, Juergen Gross wrote:
 On 01/22/2015 08:11 AM, Steven Noonan wrote:
  I notice these two symbols are exported GPL-only. This breaks builds
  of several out-of-tree non-GPL modules such as the NVIDIA driver, and
  VMware modules, etc. What is the appropriate code path for proprietary
  modules to use when setting page cache mode flags? Alternatively, is
  it possible for these EXPORT_SYMBOL_GPLs to be changed to
  EXPORT_SYMBOL?
 
 I don't mind you sending a patch to change this. I won't object such a
 patch. OTOH this is more kind of a political question and I don't want
 to spend my time on arguing.

It's rather simple. If the new cache stuff replaces code which was
available under EXPORT_SYMBOL, we either need a compat export or make
the new exports non GPL.

Not that I like it, but that has been the policy so far. We only break
out of tree stuff if the export is broken or security relevant, like
we did with the init_mm export a few years ago.

Thanks,

tglx



Re: [Xen-devel] [PATCH] treewide: Convert clockevents_notify to use int cpu

2015-01-22 Thread Thomas Gleixner
On Wed, 10 Dec 2014, Joe Perches wrote:
 As far as I can tell, there's no value indirecting
 the cpu passed to this function via a void *.
 
 Update all the callers and called functions from within
 clockevents_notify.

Aside from that there is no value for this 'notification' function at
all. This should be separate explicit calls. The notify function is a
leftover from the original implementation which used actual notifier
chains. I'll send out a cleanup series later today.

Thanks,

tglx





Re: [Xen-devel] [PATCH v2] [Bugfix] x86/apic: Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-13 Thread Thomas Gleixner
On Tue, 13 Jan 2015, Sander Eikelenboom wrote:

 
 Monday, January 12, 2015, 4:01:00 PM, you wrote:
 
  On 12/01/15 13:39, Jiang Liu wrote:
  Commit b81975eade8c (x86, irq: Clean up irqdomain transition code)
  breaks xen IRQ allocation because xen_smp_prepare_cpus() doesn't invoke
  setup_IO_APIC(), so no irqdomains created for IOAPICs and
  mp_map_pin_to_irq() fails at the very beginning.
  
  Enhance xen_smp_prepare_cpus() to call setup_IO_APIC() to initialize
  irqdomain for IOAPICs.
 
  Having Xen call setup_IO_APIC() to initialize the irq domains then having to
  add special cases to it is just wrong.
 
  The bits of init deferred by mp_register_apic() are also deferred to
  two different places which looks odd.
 
  What about something like the following (untested) patch?
 
 Hi David / Gerry,
 
 David's patch (after fixing a few compile issues) fixes the problem.
 
 The power button now works for me on:
 - intel baremetal
 - intel xen
 - amd baremetal (no issues with the override anymore)
 - amd xen   (no freeze issues anymore)

Can someone please send a proper patch with changelog?

Thanks,

tglx



Re: [Xen-devel] [PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM

2014-12-02 Thread Thomas Gleixner
On Wed, 29 Oct 2014, Waiman Long wrote:
 AIM7 XFS Disk Test (no overcommit)
   kernel          JPM       Real Time   Sys Time   Usr Time
   ------          ---       ---------   --------   --------
   PV ticketlock   2542373   7.08        98.95      5.44
   PV qspinlock    2549575   7.06        98.63      5.40
   unfairlock      2616279   6.91        97.05      5.42
 
 AIM7 XFS Disk Test (200% overcommit)
   kernel          JPM       Real Time   Sys Time   Usr Time
   ------          ---       ---------   --------   --------
   PV ticketlock   644468    27.93       415.22     6.33
   PV qspinlock    645624    27.88       419.84     0.39

That number is made up by what? 

   unfairlock      695518    25.88       377.40     4.09
 
 AIM7 EXT4 Disk Test (no overcommit)
   kernel          JPM       Real Time   Sys Time   Usr Time
   ------          ---       ---------   --------   --------
   PV ticketlock   1995565   9.02        103.67     5.76
   PV qspinlock    2011173   8.95        102.15     5.40
   unfairlock      2066590   8.71        98.13      5.46
 
 AIM7 EXT4 Disk Test (200% overcommit)
   kernel          JPM       Real Time   Sys Time   Usr Time
   ------          ---       ---------   --------   --------
   PV ticketlock   478341    37.63       495.81     30.78
   PV qspinlock    474058    37.97       475.74     30.95
   unfairlock      560224    32.13       398.43     26.27
 
 For the AIM7 disk workload, both PV ticketlock and qspinlock have
 about the same performance. The unfairlock performs slightly better
 than the PV lock.

Slightly?

Taking the PV locks (which are basically the same for the existing
ticket locks and your new-fangled qlocks) as a reference, the so-called
'unfair locks', which are just the native locks w/o the PV nonsense,
are fundamentally better, up to a whopping 18% in the ext4/200%
overcommit case. See below.
 
 EBIZZY-m Test (no overcommit)
   kernel          Rec/s   Real Time   Sys Time   Usr Time
   ------          -----   ---------   --------   --------
   PV ticketlock   3255    10.00       60.65      3.62
   PV qspinlock    3318    10.00       54.27      3.60
   unfairlock      2833    10.00       26.66      3.09
 
 EBIZZY-m Test (200% overcommit)
   kernel          Rec/s   Real Time   Sys Time   Usr Time
   ------          -----   ---------   --------   --------
   PV ticketlock   841     10.00       71.03      2.37
   PV qspinlock    834     10.00       68.27      2.39
   unfairlock      865     10.00       27.08      1.51
 
   futextest (no overcommit)
   kernel          kops/s
   ------          ------
   PV ticketlock   11523
   PV qspinlock    12328
   unfairlock      9478
 
   futextest (200% overcommit)
   kernel          kops/s
   ------          ------
   PV ticketlock   7276
   PV qspinlock    7095
   unfairlock      5614
 
 The ebizzy and futextest have much higher spinlock contention than
 the AIM7 disk workload. In this case, the unfairlock performs worse
 than both the PV ticketlock and qspinlock. The performance of the 2
 PV locks are comparable.

While I can see that the PV lock stuff performs 13% better for the
ebizzy no overcommit case, what about the very interesting numbers
for the same test with 200% overcommit?

The regular lock has a slightly better performance, but significantly
less sys/usr time. How do you explain that?

'Lies, damned lies and statistics' comes to my mind.

Thanks,

tglx



Re: [Xen-devel] frequent lockups in 3.18rc4

2014-11-21 Thread Thomas Gleixner
On Fri, 21 Nov 2014, Konrad Rzeszutek Wilk wrote:
 On Fri, Nov 21, 2014 at 08:51:43PM +0100, Thomas Gleixner wrote:
   On Fri, 21 Nov 2014, Linus Torvalds wrote:
   Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
   compiled it and verified that the code generation looks like what I'd
   have expected, but that's literally it.
   
  static noinline int vmalloc_fault(unsigned long address)
  {
  	pgd_t *pgd_dst;
  	pgdval_t pgd_entry;
  	unsigned index = pgd_index(address);
  
  	if (index < KERNEL_PGD_BOUNDARY)
  		return -1;
  
  	pgd_entry = init_mm.pgd[index].pgd;
  	if (!pgd_entry)
  		return -1;
  
  	pgd_dst = __va(PAGE_MASK & read_cr3());
  	pgd_dst += index;
  
  	if (pgd_dst->pgd)
  		return -1;
  
  	ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;
  
  This will break paravirt. set_pgd/set_pmd are paravirt functions.
  
  But I'm fine with breaking it, then you just need to change
  CONFIG_PARAVIRT to 'def_bool n'
 
 That is not very nice.

Maybe not nice, but sensible.

Thanks,

tglx

