RE:[PATCH] sched: Add trace for task wake up latency and leave running time

2020-09-02 Thread gengdongjiu
Hi Peter,
Sorry for the late response.

> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h index
> > 93ecd930efd3..edb622c40a90 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1324,6 +1324,13 @@ struct task_struct {
> > /* CPU-specific state of this task: */
> > struct thread_structthread;
> >
> > +   /* Task wake up time stamp */
> > +   u64 ts_wakeup;
> > +   /* Previous task switch out time stamp */
> > +   u64 pre_ts_end;
> > +   /* Next task switch in time stamp */
> > +   u64 next_ts_start;
> > +   boolwakeup_state;
> > /*
> >  * WARNING: on x86, 'thread_struct' contains a variable-sized
> >  * structure.  It *MUST* be at the end of 'task_struct'.
> 
> ^^^ did you read that comment?
   Sorry for my carelessness.

> 
> > diff --git a/include/trace/events/sched.h
> > b/include/trace/events/sched.h index fec25b9cfbaf..e99c6d573a42 100644
> > --- a/include/trace/events/sched.h
> > +++ b/include/trace/events/sched.h
> > @@ -183,6 +183,72 @@ TRACE_EVENT(sched_switch,
> > __entry->next_comm, __entry->next_pid, __entry->next_prio)  );
> >
> > +DECLARE_EVENT_CLASS(sched_latency_template,
> > +
> > +   TP_PROTO(bool preempt,
> > +struct task_struct *prev,
> > +struct task_struct *next,
> > +u64 time),
> > +
> > +   TP_ARGS(preempt, prev, next, time),
> > +
> > +   TP_STRUCT__entry(
> > +   __array(char,   prev_comm,  TASK_COMM_LEN   )
> > +   __field(pid_t,  prev_pid)
> > +   __field(int,prev_prio   )
> > +   __field(long,   prev_state  )
> > +   __array(char,   next_comm,  TASK_COMM_LEN   )
> > +   __field(pid_t,  next_pid)
> > +   __field(int,next_prio   )
> > +   __field(u64,time)
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> > +   __entry->prev_pid   = prev->pid;
> > +   __entry->prev_prio  = prev->prio;
> > +   __entry->prev_state = __trace_sched_switch_state(preempt, 
> > prev);
> > +   memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
> > +   __entry->next_pid   = next->pid;
> > +   __entry->next_prio  = next->prio;
> > +   __entry->time   = time;
> > +   /* XXX SCHED_DEADLINE */
> > +   ),
> > +
> > +   TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> 
> > next_comm=%s next_pid=%d next_prio=%d passed
> time=%llu (ns)",
> > +   __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
> > +
> > +   (__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
> > + __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), 
> > "|",
> > +   { TASK_INTERRUPTIBLE, "S" },
> > +   { TASK_UNINTERRUPTIBLE, "D" },
> > +   { __TASK_STOPPED, "T" },
> > +   { __TASK_TRACED, "t" },
> > +   { EXIT_DEAD, "X" },
> > +   { EXIT_ZOMBIE, "Z" },
> > +   { TASK_PARKED, "P" },
> > +   { TASK_DEAD, "I" }) :
> > + "R",
> > +
> > +   __entry->prev_state & TASK_REPORT_MAX ? "+" : "",
> > +   __entry->next_comm, __entry->next_pid, __entry->next_prio,
> > +   __entry->time)
> > +);
> 
> NAK, that tracepoint is already broken, we don't want to proliferate the 
> broken.
  
Sorry, What the meaning that tracepoint is already broken? 
Maybe I need to explain the reason that why I add two trace point. 
when using perf tool or Ftrace sysfs to capture the task wake-up latency and 
the task leaving running queue time, usually the trace data is too large and 
the CPU utilization rate is too high in the process due to a lot of disk write. 
Sometimes even the disk is full, the issue still does not reproduced that above 
two time exceed a certain threshold.  So I added two trace points, using filter 
we can only record the abnormal trace that includes wakeup latency and leaving 
running time larger than an threshold. 
Or do you have better solution?

> 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c index
> > 8471a0f7eb32..b5a1928dc948 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2464,6 +2464,8 @@ static void ttwu_do_wakeup(struct rq *rq, struct
> > task_struct *p, int wake_flags,  {
> > check_preempt_curr(rq, p, wake_flags);
> > p->state = TASK_RUNNING;
> > +   p->ts_wakeup = local_clock();
> > +   p->wakeup_state = true;
> > trace_sched_wakeup(p);
> 

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
> 
> On Thu, 10 Jan 2019 at 12:09, gengdongjiu  wrote:
> > Peter, I summarize James's main idea, James think QEMU does not 
> > needs to check *something* if Qemu support firmware-first.
> > What do we do for your comments?
> 
> Unless I'm missing something, the code in your most recent patchset 
> attempts to update an ACPI table when it gets the SIGBUS from the host 
> kernel without doing anything to check whether it has ever created the ACPI 
> table (and set up the QEMU global variable that tells the code where it is in 
> the guest memory) in the first place.

when QEMU version is greater than some version, it will default create the APEI 
table. But only when the guest is booted by UEFI, it will support to record the 
CPER to guest memory. 
In my test, I boot guest using UEFI, so it is no problem, I will check whether 
this booting uses UEFI before update the ACPI table.

> I don't see how that can work.
> 
> > >> I think one question here which it would be good to answer is:
> > >> if we are modelling a guest and we haven't specifically provided 
> > >> it an ACPI table to tell it about memory errors, what do we do 
> > >> when we get a sigbus from the host? We have basically two choices:
> > >>  (1) send the guest an SError (aka asynchronous external abort)
> > >>  anyway (with no further info about what the memory error is)
> > >
> > > For an AR signal an external abort is valid. Its up to the 
> > > implementation whether these are synchronous or asynchronous. Qemu 
> > > can only take a signal for something that was synchronous, so you can 
> > > choose between the two.
> > > Synchronous external abort is marginally better as an unaware OS 
> > > knows its affects this thread, and may be able to kill it.
> > > SError with an imp-def ESR is indistinguishable from 'part of the 
> > > soc fell out', and should always result in a panic().
> > >
> > >
> > >>  (2) just stop QEMU (as we would for a memory error in QEMU's
> > >>  own memory)
> > >
> > > This is also valid. A machine may take external-abort to EL3 and 
> > > then reboot/crash/burn.
> 
> We should decide which of these we want to do, and have a comment 
> explaining what we're doing. If I'm reading your current patchset correctly, 
> it does neither -- if it can't record the fault in the ACPI table it just 
> ignores it without either stopping QEMU or delivering an SError.

James may not know my detailed implementation in the QEMU. In my patch, I only 
handle the BUS_MCEERR_AR SIGBUS signal(synchronous signal). when the SIGBUS is 
BUS_MCEERR_AR, it will deliver a synchronous exception abort.
James said it needs to deliver an SError when the BUS_MCEERR_OR SIGBUS 
signal(asynchronous signal), but I do not handle the this case because QEMU 
main thread will mask this asynchronous signal.

If the memory error is belong to QEMU itself, I just print an error log[2]. If 
you think, it should stop QEMU for this case, I will change it.

void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) {
..
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
if (ghes_record_errors(ACPI_HEST_NOTIFY_SEA, paddr)) {
kvm_inject_arm_sea(c);
} else {
fprintf(stderr, "failed to record the error\n");
}
}
[2] fprintf(stderr, "Hardware memory error for memory used by "
"QEMU itself instead of guest system!\n"); 
}

> 
> I think I favour option (2) here.
> 
> thanks
> -- PMM


撤回: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
gengdongjiu 将撤回邮件“[RFC RESEND PATCH] kvm: arm64: export memory error recovery 
capability to user space”。

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
> 
> On Thu, 10 Jan 2019 at 12:09, gengdongjiu  wrote:
> > Peter, I summarize James's main idea, James think QEMU does not needs
> > to check *something* if Qemu support firmware-first.
> > What do we do for your comments?
> 
> Unless I'm missing something, the code in your most recent patchset attempts 
> to update an ACPI table when it gets the SIGBUS from the
> host kernel without doing anything to check whether it has ever created the 
> ACPI table (and set up the QEMU global variable that tells the
> code where it is in the guest memory) in the first place.

when QEMU version is greater than some version, it will default create the APEI 
table. But only when the guest is booted by UEFI, it will support to record the 
CPER to guest memory. 
In my test, I boot guest using UEFI, so it is no problem, I will check whether 
this booting uses UEFI before update the ACPI table.

> I don't see how that can work.
> 
> > >> I think one question here which it would be good to answer is:
> > >> if we are modelling a guest and we haven't specifically provided it
> > >> an ACPI table to tell it about memory errors, what do we do when we
> > >> get a sigbus from the host? We have basically two choices:
> > >>  (1) send the guest an SError (aka asynchronous external abort)
> > >>  anyway (with no further info about what the memory error is)
> > >
> > > For an AR signal an external abort is valid. Its up to the
> > > implementation whether these are synchronous or asynchronous. Qemu
> > > can only take a signal for something that was synchronous, so you can 
> > > choose between the two.
> > > Synchronous external abort is marginally better as an unaware OS
> > > knows its affects this thread, and may be able to kill it.
> > > SError with an imp-def ESR is indistinguishable from 'part of the
> > > soc fell out', and should always result in a panic().
> > >
> > >
> > >>  (2) just stop QEMU (as we would for a memory error in QEMU's
> > >>  own memory)
> > >
> > > This is also valid. A machine may take external-abort to EL3 and
> > > then reboot/crash/burn.
> 
> We should decide which of these we want to do, and have a comment explaining 
> what we're doing. If I'm reading your current patchset
> correctly, it does neither -- if it can't record the fault in the ACPI table 
> it just ignores it without either stopping QEMU or delivering an SError.

James may not know my detailed implementation in the QEMU. In my patch, I only 
handle the BUS_MCEERR_AR SIGBUS signal(synchronous signal). when the SIGBUS is 
BUS_MCEERR_AR, it will deliver a synchronous exception abort.
James said it needs to deliver an SError when the BUS_MCEERR_OR SIGBUS 
signal(synchronous signal), but I do not handle the this case because QEMU main 
thread will mask this asynchronous signal.

If the memory error is belong to QEMU itself, I just print an error log[2]. If 
you think, it should stop QEMU for this case, I will change it.

void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) {
..
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
if (ghes_record_errors(ACPI_HEST_NOTIFY_SEA, paddr)) {
kvm_inject_arm_sea(c);
} else {
fprintf(stderr, "failed to record the error\n");
}
}
[2] fprintf(stderr, "Hardware memory error for memory used by "
"QEMU itself instead of guest system!\n");
}
> 
> I think I favour option (2) here.
> 
> thanks
> -- PMM


Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
Hi James/Peter,
   thanks for this discussion, and sorry for my late response due to vacation.

On 2018/12/22 2:17, James Morse wrote:
> Hi Peter,
> 
> On 19/12/2018 19:02, Peter Maydell wrote:
>> On Mon, 17 Dec 2018 at 15:56, James Morse  wrote:
>>> I don't think this really matters. Its only the NMIlike notifications that 
>>> the
>>> guest doesn't have to register or poll. The ones we support today extend the
>>> architectures existing behaviour: you would have taken an external-abort on 
>>> a
>>> real system, whether you know about the additional metadata doesn't matter 
>>> to Qemu.
>>
>> Consider the case where we booted the guest using a DTB and no ACPI
>> table at all -- we certainly can't just call QEMU code that tries to
>> add entries to a nonexistent table.
> 
> Sure, because you know which of the two sets of firmware-table you're 
> providing.
> 
> I'm taking the behaviour of physical machines as the template for what we 
> should
> do here. I can boot a DT-only kernel on Seattle. Firmware has no idea I did
> this, it will still take DRAM uncorrected-error IRQs in firmware, and generate
> CPER records in the POLLed areas. But the kernel will never look, because it
> booted with DT.
> What happens if the kernel goes on to access the corrupt location? It either
> gets corrupt values back, or an external abort, depending on the design of the
> memory-controller.
> 
> X-gene uses an IRQ for its firmware-first notification. Booted with DT that
> interrupt can be asserted, but as the OS has didn't know to register it, its
> never taken. We eventually get the same corrupt-values/external-abort 
> behaviour.
> 
> KVM/Linux is acting as the memory controller using stage2. When an error is
> detected by the host it unmaps the page from stage2, and refuses to map it 
> again
> until its fixed up in Qemu's memory map (which can happen automatically). If 
> the
> kernel can't fix it itself, the AO signal is like the DRAM-IRQ above, and the 
> AR
> like the external abort.
> We don't have a parallel to the 'gets corrupt values back' behaviour as Linux
> will always unmap hwpoison pages from user-space/guests.
> 
> If the host-kernel wasn't build with CONFIG_MEMORY_FAILURE, its like the 
> memory
> controller doesn't support any of the above. I think knowing this is the 
> closest
> to what you want.
> 
> 
>> My main point is that there
>> needs to be logic in Dongjiu's QEMU patches that checks more than
>> just "does this KVM feature exist". I'm not sufficiently familiar
>> with all this RAS stuff to be certain what those checks should
>> be and what the right choices are; I just know we need to check
>> *something*...
> 
> I think this is the crux of where we don't see this the same way.
> The v8.2 RAS stuff is new, RAS support on arm64 is not. Kernel support arrived
> at roughly the same time, but not CPU support. There are v8.0 systems that
> support RAS. There are DT systems that can do the same with edac drivers.
> The physical v8.0 systems that do this, are doing it without any extra CPU 
> support.
> 
> I think x86's behaviour here includes some history, which we don't have.
>>From the order of the HEST entries, it looks like the machine-check stuff came
> first, then firmware-first using a 'GHES' entry in that table.
> I think Qemu on x86 only supports the emulated machine check stuff, so it 
> needs
> to check KVM has the widget to do this.
> If Qemu on x86 supported firmware-first, I don't think there would be anything
> to check. (details below)

Peter, I summarize James's main idea, James think QEMU does not needs to check 
*something* if Qemu support firmware-first.
What do we do for your comments?

> 
> 
 Let us see the X86's QEMU logic:
 1. Before the vCPU created, it will set a default env->mcg_cap value with
>>>
 MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports
 RAS error recovery.[1] 2. when the vCPU initialize, it will check whether 
 host
 kernel support this feature[2]. Only when host kernel and default 
 env->mcg_cap
 value all expected this feature, then it will setup vCPU support RAS error
 recovery[3].
>>>
>>> This looks like KVM exposing a CPU capability to Qemu, which then 
>>> configures the
>>> behaviour KVM gives to the guest. This doesn't tell you anything about what 
>>> the
>>> guest supports.
>>
>> It tells you what the *guest CPU* supports, which for x86 is a combination
>> of (a) what did the user/machine model ask for and (b) what can KVM
>> actually implement. I don't much care whether the guest OS supports
>> anything or not, that's its business... but it does seem odd to me
>> that the equivalent Arm code is not similarly saying "what were we
>> asked for, and what can we do?".
> 
> The flow is something like:
> For AO, generate CPER records, and notify the OS via NOTIFY_POLL (which isn't
> really a notification) or some flavour of IRQ.
> To do this, Qemu needs to be able to write to its reserved area of 

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
> 
> On Fri, 14 Dec 2018 at 13:56, James Morse  wrote:
> >
> > Hi Dongjiu Geng,
> >
> > On 14/12/2018 10:15, Dongjiu Geng wrote:
> > > When user space do memory recovery, it will check whether KVM and 
> > > guest support the error recovery, only when both of them support, 
> > > user space will do the error recovery. This patch exports this 
> > > capability of KVM to user space.
> >
> > I can understand user-space only wanting to do the work if host and 
> > guest support the feature. But 'error recovery' isn't a KVM feature, 
> > its a Linux kernel feature.
> >
> > KVM will send it's user-space a SIGBUS with MCEERR code whenever its 
> > trying to map a page at stage2 that the kernel-mm code refuses this because 
> > its poisoned.
> > (e.g. check_user_page_hwpoison(), get_user_pages() returns 
> > -EHWPOISON)
> >
> > This is exactly the same as happens to a normal user-space process.
> >
> > I think you really want to know if the host kernel was built with 
> > CONFIG_MEMORY_FAILURE.
> 
> Does userspace need to care about that? Presumably if the host kernel 
> wasn't built with that support then it will simply never deliver any memory 
> failure events to QEMU, which is fine.
> 
> The point I was trying to make in the email Dongjiu references
> (https://patchwork.codeaurora.org/patch/652261/) is simply that "QEMU gets 
> memory-failure notifications from the host kernel"
> does not imply "the guest is prepared to receive memory failure 
> notifications", and so the code path which handles the SIGBUS must do 
> some kind of check for whether the guest CPU is a type which expects them and 
> that the board code set up the ACPI tables that it wants to fill in.

Thanks Peter's explanation. Frankly speaking, I agree Peter's suggestion.

To James, I explain more to you, as peter said QEMU needs to check whether the 
guest CPU is a type which can handle the error though guest ACPI table. Let us 
see the X86's QEMU logic:
1. Before the vCPU created, it will set a default env->mcg_cap value with 
MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports RAS 
error recovery.[1] 2. when the vCPU initialize, it will check whether host 
kernel support this feature[2]. Only when host kernel and default env->mcg_cap 
value all expected this feature, then it will setup vCPU support RAS error 
recovery[3].
So I add this IOCTL "KVM_CAP_ARM_MEMORY_ERROR_RECOVERY" to Let QEMU check 
whether host/KVM support RAS error detection and recovery, only when this 
supports, QEMU will do the error recovery for the guest memory. 

[1]
#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF |
(cpu->enable_lmce ? MCG_LMCE_P : 0);

[2] ret = kvm_get_mce_cap_supported(cs->kvm_state, _cap, );

[3]
env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, >mcg_cap);

-For James's 
comments-
> KVM doesn't detect these errors.
> The hardware detects them and notifies the OS via one of a number of 
> mechanisms.
> This gets plumbed into memory_failure(), which sets a flag that the mm 
> code uses to prevent the page being used again.

> KVM is only involved when it tries to map a page at stage2 and the mm 
> code rejects it with -EHWPOISON. This is the same as the architectures
> do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of 
> handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.
--
James, for your above comments, I completed understand, but KVM also delivered 
the SIGBUS, which means KVM supports guest memory RAS error recovery, so maybe 
we need to tell user space this capability.

-- For James's comments 
---
> The CPU RAS Extensions are not at all relevant here. It is perfectly 
> possible to support memory-failure without them, AMD-Seattle and 
> APM-X-Gene do this. These systems would report not-supported here, but the 
> kernel does support this stuff.
> Just because the CPU supports this, doesn't mean the kernel was built 
> with CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to 
> SIGKILL.
--
James, for your above comments, if you think we should not check the 
"cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", which do you prefer we should check?
In the X86 KVM code, it uses hardcode to tell use space the host/KVM support 
RAS error software recovery[4]. If KVM does not check the " 
cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", we have to check the hardcode as 
X86's method.

[4]:
u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | 

撤回: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
gengdongjiu 将撤回邮件“[RFC RESEND PATCH] kvm: arm64: export memory error recovery 
capability to user space”。

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
> 
> On Fri, 14 Dec 2018 at 13:56, James Morse  wrote:
> >
> > Hi Dongjiu Geng,
> >
> > On 14/12/2018 10:15, Dongjiu Geng wrote:
> > > When user space do memory recovery, it will check whether KVM and
> > > guest support the error recovery, only when both of them support,
> > > user space will do the error recovery. This patch exports this
> > > capability of KVM to user space.
> >
> > I can understand user-space only wanting to do the work if host and
> > guest support the feature. But 'error recovery' isn't a KVM feature,
> > its a Linux kernel feature.
> >
> > KVM will send it's user-space a SIGBUS with MCEERR code whenever its
> > trying to map a page at stage2 that the kernel-mm code refuses this because 
> > its poisoned.
> > (e.g. check_user_page_hwpoison(), get_user_pages() returns -EHWPOISON)
> >
> > This is exactly the same as happens to a normal user-space process.
> >
> > I think you really want to know if the host kernel was built with
> > CONFIG_MEMORY_FAILURE.
> 
> Does userspace need to care about that? Presumably if the host kernel wasn't 
> built with that support then it will simply never deliver any
> memory failure events to QEMU, which is fine.
> 
> The point I was trying to make in the email Dongjiu references
> (https://patchwork.codeaurora.org/patch/652261/) is simply that "QEMU gets 
> memory-failure notifications from the host kernel"
> does not imply "the guest is prepared to receive memory failure 
> notifications", and so the code path which handles the SIGBUS must do
> some kind of check for whether the guest CPU is a type which expects them and 
> that the board code set up the ACPI tables that it wants to
> fill in.

Thanks Peter's explanation. Frankly speaking, I agree Peter's suggestion.

To James, I explain more to you, as peter said QEMU needs to check whether the 
guest CPU is a type which can handle the error though guest ACPI table. Let us 
see the X86's QEMU logic:
1. Before the vCPU created, it will set a default env->mcg_cap value with 
MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports RAS 
error recovery.[1]
2. when the vCPU initialize, it will check whether host kernel support this 
feature[2]. Only when host kernel and default env->mcg_cap value all expected 
this feature, then it will setup vCPU support RAS error recovery[3].
So I add this IOCTL "KVM_CAP_ARM_MEMORY_ERROR_RECOVERY" to Let QEMU check 
whether host/KVM support RAS error detection and recovery, only when this 
supports, QEMU will do the error recovery for the guest memory. 

[1]
#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF |
(cpu->enable_lmce ? MCG_LMCE_P : 0);

[2] ret = kvm_get_mce_cap_supported(cs->kvm_state, _cap, );

[3]
env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, >mcg_cap);

-For James's 
comments-
> KVM doesn't detect these errors.
> The hardware detects them and notifies the OS via one of a number of 
> mechanisms.
> This gets plumbed into memory_failure(), which sets a flag that the mm code 
> uses
> to prevent the page being used again.

> KVM is only involved when it tries to map a page at stage2 and the mm code
> rejects it with -EHWPOISON. This is the same as the architectures
> do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of
> handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.
--
James, for your above comments, I completed understand, but KVM also delivered 
the SIGBUS, which means KVM supports guest memory RAS error recovery, so maybe 
we need to tell user space this capability.

-- For James's comments 
---
> The CPU RAS Extensions are not at all relevant here. It is perfectly possible 
> to
> support memory-failure without them, AMD-Seattle and APM-X-Gene do this. These
> systems would report not-supported here, but the kernel does support this 
> stuff.
> Just because the CPU supports this, doesn't mean the kernel was built with
> CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to SIGKILL.
--
James, for your above comments[4], if you think we should not check the 
"cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", which do you prefer we should check?
In the X86 KVM code, it uses hardcode to tell use space the host/KVM support 
RAS error software recovery. If KVM does not check the " 
cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", we have to check the hardcode as 
X86's method.

[4]:
u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;

long 

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
HI James,

  Thanks for the mail and comments, I will reply to you in the next mail.

2018-12-14 21:55 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 14/12/2018 10:15, Dongjiu Geng wrote:
>> When user space do memory recovery, it will check whether KVM and
>> guest support the error recovery, only when both of them support,
>> user space will do the error recovery. This patch exports this
>> capability of KVM to user space.
>
> I can understand user-space only wanting to do the work if host and guest
> support the feature. But 'error recovery' isn't a KVM feature, its a Linux
> kernel feature.
>
> KVM will send it's user-space a SIGBUS with MCEERR code whenever its trying
> to
> map a page at stage2 that the kernel-mm code refuses this because its
> poisoned.
> (e.g. check_user_page_hwpoison(), get_user_pages() returns -EHWPOISON)
>
> This is exactly the same as happens to a normal user-space process.
>
> I think you really want to know if the host kernel was built with
> CONFIG_MEMORY_FAILURE. The not-at-all-portable way to tell this from
> user-space
> is the presence of /proc/sys/vm/memory_failure_* files.
> (It looks like the prctl():PR_MCE_KILL/PR_MCE_KILL_GET options silently
> update
> an ignored policy if the kernel isn't built with CONFIG_MEMORY_FAILURE, so
> they
> aren't helpful)
>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index cd209f7..241e2e2 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4895,3 +4895,12 @@ Architectures: x86
>>  This capability indicates that KVM supports paravirtualized Hyper-V IPI
>> send
>>  hypercalls:
>>  HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
>> +
>> +8.21 KVM_CAP_ARM_MEMORY_ERROR_RECOVERY
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that guest memory error can be detected by the
>> KVM which
>> +supports the error recovery.
>
> KVM doesn't detect these errors.
> The hardware detects them and notifies the OS via one of a number of
> mechanisms.
> This gets plumbed into memory_failure(), which sets a flag that the mm code
> uses
> to prevent the page being used again.
>
> KVM is only involved when it tries to map a page at stage2 and the mm code
> rejects it with -EHWPOISON. This is the same as the architectures
> do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of
> handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.
>
>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index b72a3dd..90d1d9a 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -82,6 +82,7 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm,
>> long ext)
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>>  case KVM_CAP_ARM_INJECT_SERROR_ESR:
>> +case KVM_CAP_ARM_MEMORY_ERROR_RECOVERY:
>>  r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>>  break;
>
> The CPU RAS Extensions are not at all relevant here. It is perfectly
> possible to
> support memory-failure without them, AMD-Seattle and APM-X-Gene do this.
> These
> systems would report not-supported here, but the kernel does support this
> stuff.
> Just because the CPU supports this, doesn't mean the kernel was built with
> CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to
> SIGKILL.
>
>
>
> Thanks,
>
> James
>


Re: Disable RT_GROUP_SCHED in PREEMPT_RT_FULL

2018-12-09 Thread gengdongjiu
Loop RT kernel Mailing lists.


On 2018/12/10 1:04, zhang jianwei wrote:
> hi Thomas and Mike??
> i say you disabled the RT_GROUP_SCHED in RT kernel because of strange cpu 
> stall was observed. currently, in my projects, I need 
> enable?0?2RT_GROUP_SCHED in RT kernel. I??d like to know how to reproduce the 
> strange cpu stall in order to fix this bug. can you guys give me some advice 
> ?0?2about how to fix this problem.
> Thanks in advance!
> 
> 
> patched mentioned?0?2
> https://lore.kernel.org/patchwork/patch/313215/
> 
> 



Re: Resend: How to handle the SMMU RAS Error in the kernel

2018-11-21 Thread gengdongjiu
Hi James,
   Thanks for the mail.

On 2018/11/20 2:05, James Morse wrote:
> Hi gengdongjiu,
> 
> On 17/11/2018 15:41, gengdongjiu wrote:
>> In the current kernel, it only handles three kinds of error, which is
>> memory error, PCIE device and ARM process. But now the SMMU already
>> support the RAS, how to handle the SMMU RAS error in the kernel?
> 
> What errors are being detected here?
> 
> I don't know much about the SMMU, but I think we should start with a list of
> errors that we want to handle.

In our platform, the SMMU RAS error mainly include below which flow the SMMU 
spec:
1. one bit ECC error, reported as CE.
2. two bits ECC error, reported as UEU.
3. fetch error in the SMMUv3 spec, reported as UER.

The 2 and 3 should be handled, but I do not know how do recovery to it.

> 
> 
> Is this a v8.2 fault handling interrupt from the SMMU taken to EL3?
> Or a cpu-access that was returned as external-abort? or a device access that 
> was
> told external-abort?
it flows v8.2 RAS spec, it is a v8.2 fault handling interrupt from the SMMU 
taken to EL3.

> 
> What do we intend to do with this error information? Does the DMA layer have
> error handling we can hook this into?

we can get the DMA layer error from the RAS registers, such as DMA read errors.
may be the handle method is disabling this SMMU to avoid propagation.

> 
> Is this just another interface for memory-errors? (e.g the SMMU provides a
> device/address pair and the kernel works out the physical page to run
> memory_failure() on)
I need to check more.

> 
> 
>> I check the UEFI_SPEC_2.7, the ACPI's CPER have the IOMMU type, but it
>> seems the IOMMU type only are specific to AMD’s IOMMU specification,
> 
> ... and Intel VT-d. It looks like UEFI generalises all these as types of 
> 'DMAr'.
yes, it is.

> 
> 
>> not have the ARM’s IOMMU section type, can we reuse this IOMMU section
>> type for the ARM SMMU?
> 
> The architecture specific records for AMD? No. Even if the information was the
> same, the presence of this record tells you its an AMD IOMMU, which its not.
> 
> The generic error section? Maybe.
> 
> Assuming the 'fault reason' list in Table 285 is sufficient to cover our list 
> or
> errors, we can use the 'DMAr Generic Errors' section, (N.2.11.1), to describe
> the generic bits of the error ... but SMMU doesn't have an 'Architecture 
> Type',
> so we at least need to get one allocated.
> 
> We will probably need an architecture specific 'DMAr Error Section'.
> 
> 
> I think adding the UEFI bits is probably the 'easy' bit. We should start with 
> a
> list of errors, and the error handling code. This way we know what we need to
> add to the spec.

The list of SMMU RAS error is shown below, but I still do not know how to 
handle it.
1. one bit ECC error
2. two bits ECC error
3. fetch error in the SMMUv3 spec

> 
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: Resend: How to handle the SMMU RAS Error in the kernel

2018-11-21 Thread gengdongjiu
Hi James,
   Thanks for the mail.

On 2018/11/20 2:05, James Morse wrote:
> Hi gengdongjiu,
> 
> On 17/11/2018 15:41, gengdongjiu wrote:
>> In the current kernel, it only handles three kinds of error, which is
>> memory error, PCIE device and ARM process. But now the SMMU already
>> support the RAS, how to handle the SMMU RAS error in the kernel?
> 
> What errors are being detected here?
> 
> I don't know much about the SMMU, but I think we should start with a list of
> errors that we want to handle.

In our platform, the SMMU RAS error mainly include below which flow the SMMU 
spec:
1. one bit ECC error, reported as CE.
2. two bits ECC error, reported as UEU.
3. fetch error in the SMMUv3 spec, reported as UER.

The 2 and 3 should be handled, but I do not know how do recovery to it.

> 
> 
> Is this a v8.2 fault handling interrupt from the SMMU taken to EL3?
> Or a cpu-access that was returned as external-abort? or a device access that 
> was
> told external-abort?
it flows v8.2 RAS spec, it is a v8.2 fault handling interrupt from the SMMU 
taken to EL3.

> 
> What do we intend to do with this error information? Does the DMA layer have
> error handling we can hook this into?

we can get the DMA layer error from the RAS registers, such as DMA read errors.
may be the handle method is disabling this SMMU to avoid propagation.

> 
> Is this just another interface for memory-errors? (e.g the SMMU provides a
> device/address pair and the kernel works out the physical page to run
> memory_failure() on)
I need to check more.

> 
> 
>> I check the UEFI_SPEC_2.7, the ACPI's CPER have the IOMMU type, but it
>> seems the IOMMU type only are specific to AMD’s IOMMU specification,
> 
> ... and Intel VT-d. It looks like UEFI generalises all these as types of 
> 'DMAr'.
yes, it is.

> 
> 
>> not have the ARM’s IOMMU section type, can we reuse this IOMMU section
>> type for the ARM SMMU?
> 
> The architecture specific records for AMD? No. Even if the information was the
> same, the presence of this record tells you its an AMD IOMMU, which its not.
> 
> The generic error section? Maybe.
> 
> Assuming the 'fault reason' list in Table 285 is sufficient to cover our list 
> or
> errors, we can use the 'DMAr Generic Errors' section, (N.2.11.1), to describe
> the generic bits of the error ... but SMMU doesn't have an 'Architecture 
> Type',
> so we at least need to get one allocated.
> 
> We will probably need an architecture specific 'DMAr Error Section'.
> 
> 
> I think adding the UEFI bits is probably the 'easy' bit. We should start with 
> a
> list of errors, and the error handling code. This way we know what we need to
> add to the spec.

The list of SMMU RAS error is shown below, but I still do not know how to 
handle it.
1. one bit ECC error
2. two bits ECC error
3. fetch error in the SMMUv3 spec

> 
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Resend: How to handle the SMMU RAS Error in the kernel

2018-11-17 Thread gengdongjiu
Hi robin/will/James


In the current kernel, it only handles three kinds of error, which is
memory error, PCIE device and ARM process. But now the SMMU already
support the RAS, how to handle the SMMU RAS error in the kernel?

I check the UEFI_SPEC_2.7, the ACPI's CPER have the IOMMU type, but it
seems the IOMMU type only are specific to AMD’s IOMMU specification,
not have the ARM’s IOMMU section type, can we reuse this IOMMU section
type for the ARM SMMU?



N.2.11.3 IOMMU specific DMAr Error Section

Type: {0x036F84E1, 0x7F37, 0x428c, {0xA7, 0x9E, 0x57, 0x5F, 0xDF,
0xAA, 0x84, 0xEC}}

All fields in this error record are specific to AMD’s IOMMU
specification. This error section has a fixed size.


Resend: How to handle the SMMU RAS Error in the kernel

2018-11-17 Thread gengdongjiu
Hi robin/will/James


In the current kernel, it only handles three kinds of error, which is
memory error, PCIE device and ARM process. But now the SMMU already
support the RAS, how to handle the SMMU RAS error in the kernel?

I check the UEFI_SPEC_2.7, the ACPI's CPER have the IOMMU type, but it
seems the IOMMU type only are specific to AMD’s IOMMU specification,
not have the ARM’s IOMMU section type, can we reuse this IOMMU section
type for the ARM SMMU?



N.2.11.3 IOMMU specific DMAr Error Section

Type: {0x036F84E1, 0x7F37, 0x428c, {0xA7, 0x9E, 0x57, 0x5F, 0xDF,
0xAA, 0x84, 0xEC}}

All fields in this error record are specific to AMD’s IOMMU
specification. This error section has a fixed size.


Re: [PATCH v2 2/2] arm/arm64: KVM: enable 32 bits kvm vcpu events support

2018-10-09 Thread gengdongjiu
Hi Suzuki

On 2018/10/10 1:22, Suzuki K Poulose wrote:
> 
> 
> On 08/10/18 13:34, Dongjiu Geng wrote:
>> The commit 539aee0edb9f ("KVM: arm64: Share the parts of
>> get/set events useful to 32bit") shares the get/set events
>> helper for arm64 and arm32, it is better also share the check
>> for vcpu events capability to enable 32 bit kvm vcpu events
>> support.
>>
>> User space will check whether KVM supports vcpu events through
>> KVM_CAP_VCPU_EVENTS IOCTL.
> 
> nit: User space will check whether KVM supports vcpu events by checking
> the KVM_CAP_VCPU_EVENTS extension.

  Thanks for the pointing out, I will update it.

> 
>>
>> Cc: James Morse 
>> Signed-off-by: Dongjiu Geng 
> 
> Reviewed-by : Suzuki K Poulose 
> 
>> ---
>> For the 32 bits kvm migration, it needs to enable the vcpu events,
>> this patch will enable it. The user space QEMU patch is here:
>> https://patchwork.ozlabs.org/patch/975615/
>> ---
>>   arch/arm64/kvm/reset.c | 1 -
>>   virt/kvm/arm/arm.c | 1 +
>>   2 files changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index fd37c53..e50245e 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -82,7 +82,6 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>   break;
>>   case KVM_CAP_SET_GUEST_DEBUG:
>>   case KVM_CAP_VCPU_ATTRIBUTES:
>> -    case KVM_CAP_VCPU_EVENTS:
>>   r = 1;
>>   break;
>>   default:
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 40e79ea..64e5d97 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -212,6 +212,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
>> ext)
>>   case KVM_CAP_READONLY_MEM:
>>   case KVM_CAP_MP_STATE:
>>   case KVM_CAP_IMMEDIATE_EXIT:
>> +    case KVM_CAP_VCPU_EVENTS:
>>   r = 1;
>>   break;
>>   case KVM_CAP_ARM_SET_DEVICE_ADDR:
>>
> 
> .
> 



Re: [PATCH v2 2/2] arm/arm64: KVM: enable 32 bits kvm vcpu events support

2018-10-09 Thread gengdongjiu
Hi Suzuki

On 2018/10/10 1:22, Suzuki K Poulose wrote:
> 
> 
> On 08/10/18 13:34, Dongjiu Geng wrote:
>> The commit 539aee0edb9f ("KVM: arm64: Share the parts of
>> get/set events useful to 32bit") shares the get/set events
>> helper for arm64 and arm32, it is better also share the check
>> for vcpu events capability to enable 32 bit kvm vcpu events
>> support.
>>
>> User space will check whether KVM supports vcpu events through
>> KVM_CAP_VCPU_EVENTS IOCTL.
> 
> nit: User space will check whether KVM supports vcpu events by checking
> the KVM_CAP_VCPU_EVENTS extension.

  Thanks for the pointing out, I will update it.

> 
>>
>> Cc: James Morse 
>> Signed-off-by: Dongjiu Geng 
> 
> Reviewed-by : Suzuki K Poulose 
> 
>> ---
>> For the 32 bits kvm migration, it needs to enable the vcpu events,
>> this patch will enable it. The user space QEMU patch is here:
>> https://patchwork.ozlabs.org/patch/975615/
>> ---
>>   arch/arm64/kvm/reset.c | 1 -
>>   virt/kvm/arm/arm.c | 1 +
>>   2 files changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index fd37c53..e50245e 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -82,7 +82,6 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>   break;
>>   case KVM_CAP_SET_GUEST_DEBUG:
>>   case KVM_CAP_VCPU_ATTRIBUTES:
>> -    case KVM_CAP_VCPU_EVENTS:
>>   r = 1;
>>   break;
>>   default:
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 40e79ea..64e5d97 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -212,6 +212,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
>> ext)
>>   case KVM_CAP_READONLY_MEM:
>>   case KVM_CAP_MP_STATE:
>>   case KVM_CAP_IMMEDIATE_EXIT:
>> +    case KVM_CAP_VCPU_EVENTS:
>>   r = 1;
>>   break;
>>   case KVM_CAP_ARM_SET_DEVICE_ADDR:
>>
> 
> .
> 



Re: [PATCH 2/2] arm/arm64: KVM: share the check for vcpu events capability

2018-09-27 Thread gengdongjiu
Hi christoffer/marc,
   Could you review this simple patch to enable the 32 bit KVM vcpu event 
supports?  Because below user space patch depended on it. thanks
   https://patchwork.kernel.org/patch/10617601/


> subject: [PATCH 2/2] arm/arm64: KVM: share the check for vcpu events 
> capability
> 
> The commit 539aee0edb9f ("KVM: arm64: Share the parts of get/set events 
> useful to 32bit") shares the get/set events helper for arm64
> and arm32, it is better also share the check for vcpu events capability.
> 
> User space will check whether KVM supports vcpu events through 
> KVM_CAP_VCPU_EVENTS IOCTL.
> 
> Signed-off-by: Dongjiu Geng 
> ---
> For the 32 bit guest migration, it needs to enable the vcpu events
> 
> ---
>  arch/arm64/kvm/reset.c | 1 -
>  virt/kvm/arm/arm.c | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 
> fd37c53..e50245e 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -82,7 +82,6 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   break;
>   case KVM_CAP_SET_GUEST_DEBUG:
>   case KVM_CAP_VCPU_ATTRIBUTES:
> - case KVM_CAP_VCPU_EVENTS:
>   r = 1;
>   break;
>   default:
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 40e79ea..64e5d97 
> 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -212,6 +212,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_READONLY_MEM:
>   case KVM_CAP_MP_STATE:
>   case KVM_CAP_IMMEDIATE_EXIT:
> + case KVM_CAP_VCPU_EVENTS:
>   r = 1;
>   break;
>   case KVM_CAP_ARM_SET_DEVICE_ADDR:
> --
> 1.9.1
> 



Re: [PATCH 2/2] arm/arm64: KVM: share the check for vcpu events capability

2018-09-27 Thread gengdongjiu
Hi christoffer/marc,
   Could you review this simple patch to enable the 32 bit KVM vcpu event 
supports?  Because below user space patch depended on it. thanks
   https://patchwork.kernel.org/patch/10617601/


> subject: [PATCH 2/2] arm/arm64: KVM: share the check for vcpu events 
> capability
> 
> The commit 539aee0edb9f ("KVM: arm64: Share the parts of get/set events 
> useful to 32bit") shares the get/set events helper for arm64
> and arm32, it is better also share the check for vcpu events capability.
> 
> User space will check whether KVM supports vcpu events through 
> KVM_CAP_VCPU_EVENTS IOCTL.
> 
> Signed-off-by: Dongjiu Geng 
> ---
> For the 32 bit guest migration, it needs to enable the vcpu events
> 
> ---
>  arch/arm64/kvm/reset.c | 1 -
>  virt/kvm/arm/arm.c | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 
> fd37c53..e50245e 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -82,7 +82,6 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   break;
>   case KVM_CAP_SET_GUEST_DEBUG:
>   case KVM_CAP_VCPU_ATTRIBUTES:
> - case KVM_CAP_VCPU_EVENTS:
>   r = 1;
>   break;
>   default:
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 40e79ea..64e5d97 
> 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -212,6 +212,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_READONLY_MEM:
>   case KVM_CAP_MP_STATE:
>   case KVM_CAP_IMMEDIATE_EXIT:
> + case KVM_CAP_VCPU_EVENTS:
>   r = 1;
>   break;
>   case KVM_CAP_ARM_SET_DEVICE_ADDR:
> --
> 1.9.1
> 



Re: [PATCH RESEND v2] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-09 Thread gengdongjiu
2018-08-10 5:05 GMT+08:00 Tyler Baicar :
> On Thu, Aug 9, 2018 at 8:32 AM, gengdongjiu  wrote:
>>
>> 2018-08-08 0:26 GMT+08:00 Dongjiu Geng :
>> > In order to remove the additional check before calling the
>> > ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>> >
>> > After this cleanup, we can simply call the ghes_notify_sea() to let
>> > APEI driver handle the SEA notification.
>> >
>> > CC: Tyler Baicar 
>> > CC: James Morse 
>> > Signed-off-by: Dongjiu Geng 
>> > Acked-by: Will Deacon 
>> > ---
>> > This cleanup is ever mentioned by Mark Rutland in [1]
>> >
>> > [1]:
>> > https://lkml.org/lkml/2018/5/31/289
>> >
>
> FWIW - Looks good to me!

Thanks very much for this comments and review,
whether I can get your "Acked-by" or "Reviewed-by"? thanks

>
> Thanks,
> Tyler


Re: [PATCH RESEND v2] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-09 Thread gengdongjiu
2018-08-10 5:05 GMT+08:00 Tyler Baicar :
> On Thu, Aug 9, 2018 at 8:32 AM, gengdongjiu  wrote:
>>
>> 2018-08-08 0:26 GMT+08:00 Dongjiu Geng :
>> > In order to remove the additional check before calling the
>> > ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>> >
>> > After this cleanup, we can simply call the ghes_notify_sea() to let
>> > APEI driver handle the SEA notification.
>> >
>> > CC: Tyler Baicar 
>> > CC: James Morse 
>> > Signed-off-by: Dongjiu Geng 
>> > Acked-by: Will Deacon 
>> > ---
>> > This cleanup is ever mentioned by Mark Rutland in [1]
>> >
>> > [1]:
>> > https://lkml.org/lkml/2018/5/31/289
>> >
>
> FWIW - Looks good to me!

Thanks very much for this comments and review,
whether I can get your "Acked-by" or "Reviewed-by"? thanks

>
> Thanks,
> Tyler


Re: [PATCH RESEND v2] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-09 Thread gengdongjiu
CC Borislav


2018-08-08 0:26 GMT+08:00 Dongjiu Geng :
> In order to remove the additional check before calling the
> ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>
> After this cleanup, we can simply call the ghes_notify_sea() to let
> APEI driver handle the SEA notification.
>
> CC: Tyler Baicar 
> CC: James Morse 
> Signed-off-by: Dongjiu Geng 
> Acked-by: Will Deacon 
> ---
> This cleanup is ever mentioned by Mark Rutland in [1]
>
> [1]:
> https://lkml.org/lkml/2018/5/31/289
>
> Change since v1:
> 1. Update the commit messages
> 2. CC Tyler and James
> 3. Add Acked-by of Will
> ---
>  arch/arm64/mm/fault.c | 7 +--
>  include/acpi/ghes.h   | 4 
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index b8eecc7..9ffe01d 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -727,12 +727,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
>
>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>  {
> -   int ret = -ENOENT;
> -
> -   if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> -   ret = ghes_notify_sea();
> -
> -   return ret;
> +   return ghes_notify_sea();
>  }
>
>  asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int 
> esr,
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 1624e2b..82cb4eb 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -118,6 +118,10 @@ static inline void *acpi_hest_get_next(struct 
> acpi_hest_generic_data *gdata)
>  (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>  section = acpi_hest_get_next(section))
>
> +#ifdef CONFIG_ACPI_APEI_SEA
>  int ghes_notify_sea(void);
> +#else
> +static inline int ghes_notify_sea(void) { return -ENOENT; }
> +#endif
>
>  #endif /* GHES_H */
> --
> 1.9.1
>


Re: [PATCH RESEND v2] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-09 Thread gengdongjiu
CC Borislav


2018-08-08 0:26 GMT+08:00 Dongjiu Geng :
> In order to remove the additional check before calling the
> ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>
> After this cleanup, we can simply call the ghes_notify_sea() to let
> APEI driver handle the SEA notification.
>
> CC: Tyler Baicar 
> CC: James Morse 
> Signed-off-by: Dongjiu Geng 
> Acked-by: Will Deacon 
> ---
> This cleanup is ever mentioned by Mark Rutland in [1]
>
> [1]:
> https://lkml.org/lkml/2018/5/31/289
>
> Change since v1:
> 1. Update the commit messages
> 2. CC Tyler and James
> 3. Add Acked-by of Will
> ---
>  arch/arm64/mm/fault.c | 7 +--
>  include/acpi/ghes.h   | 4 
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index b8eecc7..9ffe01d 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -727,12 +727,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
>
>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>  {
> -   int ret = -ENOENT;
> -
> -   if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> -   ret = ghes_notify_sea();
> -
> -   return ret;
> +   return ghes_notify_sea();
>  }
>
>  asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int 
> esr,
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 1624e2b..82cb4eb 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -118,6 +118,10 @@ static inline void *acpi_hest_get_next(struct 
> acpi_hest_generic_data *gdata)
>  (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>  section = acpi_hest_get_next(section))
>
> +#ifdef CONFIG_ACPI_APEI_SEA
>  int ghes_notify_sea(void);
> +#else
> +static inline int ghes_notify_sea(void) { return -ENOENT; }
> +#endif
>
>  #endif /* GHES_H */
> --
> 1.9.1
>


Re: [PATCH] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-07 Thread gengdongjiu



On 2018/8/6 22:26, Will Deacon wrote:
>> Will,
>>  This patch will be applied, right? thanks
> I haven't queued it in the arm64 tree, since it touches include/acpi/ghes.h
> and you don't have an ack from the acpi folks. I acked it so that you could
> route it via the acpi tree without me holding you up.

Thanks the explanation.
yes, this patch touches the "include/acpi/ghes.h", I will repost this patch to 
let acpi
folks review it.


> 
> Will
> 
 This cleanup is ever mentioned by Mark Rutland in [1]



Re: [PATCH] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-07 Thread gengdongjiu



On 2018/8/6 22:26, Will Deacon wrote:
>> Will,
>>  This patch will be applied, right? thanks
> I haven't queued it in the arm64 tree, since it touches include/acpi/ghes.h
> and you don't have an ack from the acpi folks. I acked it so that you could
> route it via the acpi tree without me holding you up.

Thanks the explanation.
yes, this patch touches the "include/acpi/ghes.h", I will repost this patch to 
let acpi
folks review it.


> 
> Will
> 
 This cleanup is ever mentioned by Mark Rutland in [1]



Re: [PATCH] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-04 Thread gengdongjiu
2018-07-27 18:06 GMT+08:00 Will Deacon :
> On Thu, Jul 26, 2018 at 05:01:47PM -0400, Dongjiu Geng wrote:
>> In order to remove the additional check before calling the
>> ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>
> Acked-by: Will Deacon 

Will,
 This patch will be applied, right? thanks

>
> Will
>
>> This cleanup is ever mentioned by Mark Rutland in [1]
>>
>> [1]:
>> https://lkml.org/lkml/2018/5/31/289
>> ---
>>  arch/arm64/mm/fault.c | 7 +--
>>  include/acpi/ghes.h   | 4 
>>  2 files changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index b8eecc7..9ffe01d 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -727,12 +727,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>
>>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>>  {
>> - int ret = -ENOENT;
>> -
>> - if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
>> - ret = ghes_notify_sea();
>> -
>> - return ret;
>> + return ghes_notify_sea();
>>  }
>>
>>  asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int 
>> esr,
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 1624e2b..82cb4eb 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -118,6 +118,10 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>(void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>section = acpi_hest_get_next(section))
>>
>> +#ifdef CONFIG_ACPI_APEI_SEA
>>  int ghes_notify_sea(void);
>> +#else
>> +static inline int ghes_notify_sea(void) { return -ENOENT; }
>> +#endif
>>
>>  #endif /* GHES_H */
>> --
>> 1.9.1
>>


Re: [PATCH] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-04 Thread gengdongjiu
2018-07-27 18:06 GMT+08:00 Will Deacon :
> On Thu, Jul 26, 2018 at 05:01:47PM -0400, Dongjiu Geng wrote:
>> In order to remove the additional check before calling the
>> ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>
> Acked-by: Will Deacon 

Will,
 This patch will be applied, right? thanks

>
> Will
>
>> This cleanup is ever mentioned by Mark Rutland in [1]
>>
>> [1]:
>> https://lkml.org/lkml/2018/5/31/289
>> ---
>>  arch/arm64/mm/fault.c | 7 +--
>>  include/acpi/ghes.h   | 4 
>>  2 files changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index b8eecc7..9ffe01d 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -727,12 +727,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>
>>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>>  {
>> - int ret = -ENOENT;
>> -
>> - if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
>> - ret = ghes_notify_sea();
>> -
>> - return ret;
>> + return ghes_notify_sea();
>>  }
>>
>>  asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int 
>> esr,
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 1624e2b..82cb4eb 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -118,6 +118,10 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>(void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>section = acpi_hest_get_next(section))
>>
>> +#ifdef CONFIG_ACPI_APEI_SEA
>>  int ghes_notify_sea(void);
>> +#else
>> +static inline int ghes_notify_sea(void) { return -ENOENT; }
>> +#endif
>>
>>  #endif /* GHES_H */
>> --
>> 1.9.1
>>


Re: hrtimer become inaccurate with RT patch

2018-07-02 Thread gengdongjiu
Hi Sebastian ,
   Thanks for the answer.

On 2018/7/2 18:14, Sebastian Andrzej Siewior wrote:
> On 2018-07-02 17:34:34 [+0800], gengdongjiu wrote:
>>   The Linux kernel version is v4.1.46, and the preempt_rt patch is
>> patch-4.1.46-rt52.patch.
> 
> the 4.1 series is no longer supported (neither RT wise nor non-RT,
> https://www.kernel.org/category/releases.html). I suggest to move away.
> If you notice this problem now it is hardly a long running project.
yes, I Know, but we found the latest RT 4.14 series also has the same problem,
so this is common issue.

> 
>> process will not be interrupt. But if the hrtimer is also runs in
>> process context the timer is useless when it's inaccurate. so I want to
>> consult you whether this is expected behavior? whether is reasonable to move 
>> the timer IRQ
>> handling to a thread?
> 
> This depends on your expectations. The timer is defined not to fire
> before the programmed time. So it fires as soon as possible _after_ the
> programmed time.
It is reasonable that the timer is defined not to fire before the programmed 
time.
but we found it fires long _after_ the programmed time. For example, we define 
it to
fire after 2s, but it will fire after 5s, so it is very later than the 
expectations. I think the reason may be
that the timer handler thread is preempted by another higher priority thread. 
so from for this issue,
the timer handler should be in IRQ context instead of the process context or 
increase the timer handler thread priority, right?

> 
>> I think this is why hrtimer is run as a thread, I tried to set
>> hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
>> give some advise? Thanks.
> 
> By setting irqsafe to 1 you ensure taht the timer will fire from the
> timer interrupt and before doing so you should ensure that the timer is
> indeed IRQ safe.
> Depending on what you do it is possible that the timer fires early but
> the application notices it later (the scheduler will first handle RT
> tasks according to their priorities followed by non RT tasks).

(I will continue check this comments, thanks)

> 
> Sebastian
> 
> .
> 



Re: hrtimer become inaccurate with RT patch

2018-07-02 Thread gengdongjiu
Hi Sebastian ,
   Thanks for the answer.

On 2018/7/2 18:14, Sebastian Andrzej Siewior wrote:
> On 2018-07-02 17:34:34 [+0800], gengdongjiu wrote:
>>   The Linux kernel version is v4.1.46, and the preempt_rt patch is
>> patch-4.1.46-rt52.patch.
> 
> the 4.1 series is no longer supported (neither RT wise nor non-RT,
> https://www.kernel.org/category/releases.html). I suggest to move away.
> If you notice this problem now it is hardly a long running project.
yes, I Know, but we found the latest RT 4.14 series also has the same problem,
so this is common issue.

> 
>> process will not be interrupt. But if the hrtimer is also runs in
>> process context the timer is useless when it's inaccurate. so I want to
>> consult you whether this is expected behavior? whether is reasonable to move 
>> the timer IRQ
>> handling to a thread?
> 
> This depends on your expectations. The timer is defined not to fire
> before the programmed time. So it fires as soon as possible _after_ the
> programmed time.
It is reasonable that the timer is defined not to fire before the programmed 
time.
but we found it fires long _after_ the programmed time. For example, we define 
it to
fire after 2s, but it will fire after 5s, so it is very later than the 
expectations. I think the reason may be
that the timer handler thread is preempted by another higher priority thread. 
so from for this issue,
the timer handler should be in IRQ context instead of the process context or 
increase the timer handler thread priority, right?

> 
>> I think this is why hrtimer is run as a thread, I tried to set
>> hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
>> give some advise? Thanks.
> 
> By setting irqsafe to 1 you ensure taht the timer will fire from the
> timer interrupt and before doing so you should ensure that the timer is
> indeed IRQ safe.
> Depending on what you do it is possible that the timer fires early but
> the application notices it later (the scheduler will first handle RT
> tasks according to their priorities followed by non RT tasks).

(I will continue check this comments, thanks)

> 
> Sebastian
> 
> .
> 



hrtimer become inaccurate with RT patch

2018-07-02 Thread gengdongjiu
Hi Thomas/Anna/John,

  Recently I found that the hrtimer become inaccurate when there is a RT
process runs on the same cpu core, and the kernel has applied preempt_rt
patch.
  The Linux kernel version is v4.1.46, and the preempt_rt patch is
patch-4.1.46-rt52.patch.
  I know that in the preempt_rt environment the interrupt handlers no
longer run in interrupt context but in process context, so that RT
process will not be interrupt. But if the hrtimer is also runs in
process context the timer is useless when it's inaccurate. so I want to
consult you whether this is expected behavior? whether is reasonable to move 
the timer IRQ
handling to a thread?

  Check the patch-4.1.46-rt52.patch will found in function
'hrtimer_interrupt' the modify below:

@@ -1296,7 +1539,10 @@ retry:
if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
break;

-   __run_hrtimer(timer, );
+   if (!hrtimer_rt_defer(timer))
+   __run_hrtimer(timer, );
+   else
+   raise = 1;
}
}
/* Reevaluate the clock bases for the next expiry */

@@ -1357,6 +1603,9 @@ retry:
tick_program_event(expires_next, 1);
printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
ktime_to_ns(delta));
+out:
+   if (raise)
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
 }

I think this is why hrtimer is run as a thread, I tried to set
hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
give some advise? Thanks.



hrtimer become inaccurate with RT patch

2018-07-02 Thread gengdongjiu
Hi Thomas/Anna/John,

  Recently I found that the hrtimer become inaccurate when there is a RT
process runs on the same cpu core, and the kernel has applied preempt_rt
patch.
  The Linux kernel version is v4.1.46, and the preempt_rt patch is
patch-4.1.46-rt52.patch.
  I know that in the preempt_rt environment the interrupt handlers no
longer run in interrupt context but in process context, so that RT
process will not be interrupt. But if the hrtimer is also runs in
process context the timer is useless when it's inaccurate. so I want to
consult you whether this is expected behavior? whether is reasonable to move 
the timer IRQ
handling to a thread?

  Check the patch-4.1.46-rt52.patch will found in function
'hrtimer_interrupt' the modify below:

@@ -1296,7 +1539,10 @@ retry:
if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
break;

-   __run_hrtimer(timer, );
+   if (!hrtimer_rt_defer(timer))
+   __run_hrtimer(timer, );
+   else
+   raise = 1;
}
}
/* Reevaluate the clock bases for the next expiry */

@@ -1357,6 +1603,9 @@ retry:
tick_program_event(expires_next, 1);
printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
ktime_to_ns(delta));
+out:
+   if (raise)
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
 }

I think this is why hrtimer is run as a thread, I tried to set
hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
give some advise? Thanks.



Re: [PATCH v5 2/2] arm64: KVM: export the capability to set guest SError syndrome

2018-07-01 Thread gengdongjiu
Hi James,

On 2018/6/29 23:58, James Morse wrote:
> Hi Dongjiu Geng,
> 
> This patch doesn't apply on v4.18-rc2.
> 
> Documentation/virtual/kvm/api.txt already has a 8.18 section. I guess you 
> based
> this on v4.17.

Yes, indeed I based on v4.17.

> 
> For posting patches, please use the latest 'rc' from Linus' tree, (or the
> maintainer's tree listed in MAINTAINERS for the tree you are targeting if the
> maintainer has started to pick up patches).
Ok, I will rebase it using the latest 'rc' from Linus' tree. thanks for the 
reminder.


> 
> 
> Thanks,
> 
> James
> 
> 
> On 25/06/18 21:58, Dongjiu Geng wrote:
>> For the arm64 RAS Extension, user space can inject a virtual-SError
>> with specified ESR. So user space needs to know whether KVM support
>> to inject such SError, this interface adds this query for this capability.
>>
>> KVM will check whether system support RAS Extension, if supported, KVM
>> returns true to user space, otherwise returns false.
> 
> 
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index 3732097..86b3808 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4628,3 +4628,14 @@ Architectures: s390
>>  This capability indicates that kvm will implement the interfaces to handle
>>  reset, migration and nested KVM for branch prediction blocking. The stfle
>>  facility 82 should not be provided to the guest without this capability.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify the syndrome value 
>> reported
>> +to the guest OS when guest takes a virtual SError interrupt exception.
>> +If KVM has this capability, userspace can only specify the ISS field for 
>> the ESR
>> +syndrome, it can not specify the EC field which is not under control by KVM.
>> +If this virtual SError is taken to EL1 using AArch64, this value will be 
>> reported
>> +in ISS filed of ESR_EL1.
> 
> 
> 
> 
> .
> 



Re: [PATCH v5 2/2] arm64: KVM: export the capability to set guest SError syndrome

2018-07-01 Thread gengdongjiu
Hi James,

On 2018/6/29 23:58, James Morse wrote:
> Hi Dongjiu Geng,
> 
> This patch doesn't apply on v4.18-rc2.
> 
> Documentation/virtual/kvm/api.txt already has a 8.18 section. I guess you 
> based
> this on v4.17.

Yes, indeed I based on v4.17.

> 
> For posting patches, please use the latest 'rc' from Linus' tree, (or the
> maintainer's tree listed in MAINTAINERS for the tree you are targeting if the
> maintainer has started to pick up patches).
Ok, I will rebase it using the latest 'rc' from Linus' tree. thanks for the 
reminder.


> 
> 
> Thanks,
> 
> James
> 
> 
> On 25/06/18 21:58, Dongjiu Geng wrote:
>> For the arm64 RAS Extension, user space can inject a virtual-SError
>> with specified ESR. So user space needs to know whether KVM support
>> to inject such SError, this interface adds this query for this capability.
>>
>> KVM will check whether system support RAS Extension, if supported, KVM
>> returns true to user space, otherwise returns false.
> 
> 
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index 3732097..86b3808 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4628,3 +4628,14 @@ Architectures: s390
>>  This capability indicates that kvm will implement the interfaces to handle
>>  reset, migration and nested KVM for branch prediction blocking. The stfle
>>  facility 82 should not be provided to the guest without this capability.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify the syndrome value 
>> reported
>> +to the guest OS when guest takes a virtual SError interrupt exception.
>> +If KVM has this capability, userspace can only specify the ISS field for 
>> the ESR
>> +syndrome, it can not specify the EC field which is not under control by KVM.
>> +If this virtual SError is taken to EL1 using AArch64, this value will be 
>> reported
>> +in ISS filed of ESR_EL1.
> 
> 
> 
> 
> .
> 



Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-18 Thread gengdongjiu
> On 12/06/18 15:50, gengdongjiu wrote:
> > On 2018/6/11 21:36, James Morse wrote:
> >> On 08/06/18 20:48, Dongjiu Geng wrote:
> >>> For the migrating VMs, user space may need to know the exception
> >>> state. For example, in the machine A, KVM make an SError pending,
> >>> when migrate to B, KVM also needs to pend an SError.
> >>>
> >>> This new IOCTL exports user-invisible states related to SError.
> >>> Together with appropriate user space changes, user space can get/set
> >>> the SError exception state to do migrate/snapshot/suspend.
> 
> 
> >>> diff --git a/arch/arm/include/uapi/asm/kvm.h
> >>> b/arch/arm/include/uapi/asm/kvm.h index caae484..c3e6975 100644
> >>> --- a/arch/arm/include/uapi/asm/kvm.h
> >>> +++ b/arch/arm/include/uapi/asm/kvm.h
> >>> @@ -124,6 +124,18 @@ struct kvm_sync_regs {  struct
> >>> kvm_arch_memory_slot {  };
> >>>
> >>> +/* for KVM_GET/SET_VCPU_EVENTS */
> >>> +struct kvm_vcpu_events {
> >>> + struct {
> >>> + __u8 serror_pending;
> >>> + __u8 serror_has_esr;
> >>> + /* Align it to 8 bytes */
> >>> + __u8 pad[6];
> >>> + __u64 serror_esr;
> >>> + } exception;
> >>> + __u32 reserved[12];
> >>> +};
> >>> +
> >>
> >> You haven't defined __KVM_HAVE_VCPU_EVENTS for 32bit, so presumably
> >> this struct will never be used. Why is it here?
> 
> >   if not add it for 32 bits. the 32 arm platform will build Fail, whether 
> > you have good
> >idea to avoid this Failure if not add this struct for the 32 bit?
> 
> How does this 32bit code build without this patch?
> If do you provide the struct, how will that code build with older headers?
> 
> As far as I can see, this is what the __KVM_HAVE_VCPU_EVENTS define is for.
> 
> This should be both, or neither. Having just the struct is useless.
> 
> 
> >>> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >>> + struct kvm_vcpu_events *events)
> >>> +{
> >>> + bool serror_pending = events->exception.serror_pending;
> >>> + bool has_esr = events->exception.serror_has_esr;
> >>> +
> >>> + if (serror_pending && has_esr) {
> >>> + if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> >>> + return -EINVAL;
> >>> +
> >>> + kvm_set_sei_esr(vcpu, events->exception.serror_esr);
> >>
> >> kvm_set_sei_esr() will silently discard the top 40 bits of
> >> serror_esr, (which is correct, we shouldn't copy them into hardware 
> >> without know what they do).
> >>
> >> Could we please force user-space to zero these bits, we can advertise
> >> extra CAPs if new features turn up in that space, instead of
> >> user-space passing  and relying on the kernel to remove it.
> >
> >   yes, I can zero these bits in the  user-space and not depend on kernel to 
> > remove it.
> 
> But the kernel must check that user-space did zero those bits. Otherwise 
> user-space may start using them when a future version of the

For this comments, how about add below kernel check that user-space did zero 
those bits? Thanks.

+   if (!((events->exception.serror_esr) & ~ESR_ELx_ISS_MASK))
+   kvm_set_sei_esr(vcpu, events->exception.serror_esr);
+   else
+   return -EINVAL;


> architecture gives them a meaning, but an older kernel version doesn't know 
> it has to do extra work, but still lets the bits from user-space
> through into the hardware.
> 
> If new bits do turn up, we can advertise a CAP that says that KVM supports 
> whatever that feature is.
> 
> 
> >> (Background: VSESR is a 64bit register that holds the value to go in
> >> a 32bit register. I suspect the top-half could get re-used for
> >> control values or something we don't want to give user-space)
> 
> >   do you mean when user-space get the VSESR value through
> > KVM_GET_VCPU_EVENTS it only return the low-half 32 bits?
> 
> No, the kernel will only ever set a 24bit value here. If we force user-space 
> to only provide a 24bit value then we don't need to check it on
> read. We never read the value back from hardware.
> 
> These high bits are RES0 at the moment, they may get used for something in 
> the future. As we are exposing this via a user-space ABI we
> need to make sure we only expose the bits we understand today.

Ok

> 
> 
> Thanks,
> 
> James


Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-18 Thread gengdongjiu
> On 12/06/18 15:50, gengdongjiu wrote:
> > On 2018/6/11 21:36, James Morse wrote:
> >> On 08/06/18 20:48, Dongjiu Geng wrote:
> >>> For the migrating VMs, user space may need to know the exception
> >>> state. For example, in the machine A, KVM make an SError pending,
> >>> when migrate to B, KVM also needs to pend an SError.
> >>>
> >>> This new IOCTL exports user-invisible states related to SError.
> >>> Together with appropriate user space changes, user space can get/set
> >>> the SError exception state to do migrate/snapshot/suspend.
> 
> 
> >>> diff --git a/arch/arm/include/uapi/asm/kvm.h
> >>> b/arch/arm/include/uapi/asm/kvm.h index caae484..c3e6975 100644
> >>> --- a/arch/arm/include/uapi/asm/kvm.h
> >>> +++ b/arch/arm/include/uapi/asm/kvm.h
> >>> @@ -124,6 +124,18 @@ struct kvm_sync_regs {  struct
> >>> kvm_arch_memory_slot {  };
> >>>
> >>> +/* for KVM_GET/SET_VCPU_EVENTS */
> >>> +struct kvm_vcpu_events {
> >>> + struct {
> >>> + __u8 serror_pending;
> >>> + __u8 serror_has_esr;
> >>> + /* Align it to 8 bytes */
> >>> + __u8 pad[6];
> >>> + __u64 serror_esr;
> >>> + } exception;
> >>> + __u32 reserved[12];
> >>> +};
> >>> +
> >>
> >> You haven't defined __KVM_HAVE_VCPU_EVENTS for 32bit, so presumably
> >> this struct will never be used. Why is it here?
> 
> >   if not add it for 32 bits. the 32 arm platform will build Fail, whether 
> > you have good
> >idea to avoid this Failure if not add this struct for the 32 bit?
> 
> How does this 32bit code build without this patch?
> If do you provide the struct, how will that code build with older headers?
> 
> As far as I can see, this is what the __KVM_HAVE_VCPU_EVENTS define is for.
> 
> This should be both, or neither. Having just the struct is useless.
> 
> 
> >>> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >>> + struct kvm_vcpu_events *events)
> >>> +{
> >>> + bool serror_pending = events->exception.serror_pending;
> >>> + bool has_esr = events->exception.serror_has_esr;
> >>> +
> >>> + if (serror_pending && has_esr) {
> >>> + if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> >>> + return -EINVAL;
> >>> +
> >>> + kvm_set_sei_esr(vcpu, events->exception.serror_esr);
> >>
> >> kvm_set_sei_esr() will silently discard the top 40 bits of
> >> serror_esr, (which is correct, we shouldn't copy them into hardware 
> >> without know what they do).
> >>
> >> Could we please force user-space to zero these bits, we can advertise
> >> extra CAPs if new features turn up in that space, instead of
> >> user-space passing  and relying on the kernel to remove it.
> >
> >   yes, I can zero these bits in the  user-space and not depend on kernel to 
> > remove it.
> 
> But the kernel must check that user-space did zero those bits. Otherwise 
> user-space may start using them when a future version of the

For this comments, how about add below kernel check that user-space did zero 
those bits? Thanks.

+   if (!((events->exception.serror_esr) & ~ESR_ELx_ISS_MASK))
+   kvm_set_sei_esr(vcpu, events->exception.serror_esr);
+   else
+   return -EINVAL;


> architecture gives them a meaning, but an older kernel version doesn't know 
> it has to do extra work, but still lets the bits from user-space
> through into the hardware.
> 
> If new bits do turn up, we can advertise a CAP that says that KVM supports 
> whatever that feature is.
> 
> 
> >> (Background: VSESR is a 64bit register that holds the value to go in
> >> a 32bit register. I suspect the top-half could get re-used for
> >> control values or something we don't want to give user-space)
> 
> >   do you mean when user-space get the VSESR value through
> > KVM_GET_VCPU_EVENTS it only return the low-half 32 bits?
> 
> No, the kernel will only ever set a 24bit value here. If we force user-space 
> to only provide a 24bit value then we don't need to check it on
> read. We never read the value back from hardware.
> 
> These high bits are RES0 at the moment, they may get used for something in 
> the future. As we are exposing this via a user-space ABI we
> need to make sure we only expose the bits we understand today.

Ok

> 
> 
> Thanks,
> 
> James


Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Hi James,
  thanks for the review.

On 2018/6/11 21:36, James Morse wrote:
> Hi Dongjiu Geng,
> 
> Please only put 'RESEND' in the subject if the patch content is identical.
> This patch is not the same as v4.

Yes, it should

> 
> On 08/06/18 20:48, Dongjiu Geng wrote:
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
> 
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
> 
> Isn't this actually a per-vcpu ioctl? Can we fix the documentation?
  I will modify the original documentation
> 
> 
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
> 
> (please fix this)
   Ok, will fix this
> 
> 
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
> 
> (this is also a vcpu ioctl)
  will fix

> 
> 
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
> 
> There are some deliberate choices here I think we should document:
> | This API can't be used to clear a pending SError.
> 
> If there already was an SError pending, this API just overwrites it with the 
> new
> one. The architecture has some rules about merging multiple SError. (details 
> in
> 2.5.3 Multiple SError interrupts of [0])
> 
> I don't think KVM needs to enforce these, as they are implementation-defined 
> if
> one of the ESR is implementation-defined... the part that matters is reporting
> the 'most severe' RAS ESR if there are multiple pending. As only user-space 
> ever
> sets these, let's make it user-spaces problem to do.
> 
> I think we should recommend user-space always reads the pending values and
> applies its merging-multiple-SError logic. (I assume your Qemu patches do 
> this).


 I will check whether QEMU can be possible to do such things, anyway this patch
 not need to do such merging.

> Something like:
> | User-space should first use KVM_GET_VCPU_EVENTS in case KVM has made an 
> SError
> | pending as part of its device emulation. When both values are architected 
> RAS
> | SError ESR values, the new ESR should reflect the combined effect of both
> | errors.>
> 
>> diff --git a/arch/arm/include/uapi/asm/kvm.h 
>> b/arch/arm/include/uapi/asm/kvm.h
>> index caae484..c3e6975 100644
>> --- a/arch/arm/include/uapi/asm/kvm.h
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -124,6 +124,18 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>  
>> +/* for KVM_GET/SET_VCPU_EVENTS */
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
> 
> You haven't defined __KVM_HAVE_VCPU_EVENTS for 32bit, so presumably this 
> struct
> will never be used. Why is it here?
  if not add it for 32 bits. the 32 arm platform will build Fail, whether you 
have good
   idea to avoid this Failure if not add this struct for the 32 bit?
> (I agree if we ever provide it on 32bit, the struct layout should be the same.
> Is this only here to force that to 

Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Hi James,
  thanks for the review.

On 2018/6/11 21:36, James Morse wrote:
> Hi Dongjiu Geng,
> 
> Please only put 'RESEND' in the subject if the patch content is identical.
> This patch is not the same as v4.

Yes, it should

> 
> On 08/06/18 20:48, Dongjiu Geng wrote:
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
> 
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
> 
> Isn't this actually a per-vcpu ioctl? Can we fix the documentation?
  I will modify the original documentation
> 
> 
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
> 
> (please fix this)
   Ok, will fix this
> 
> 
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
> 
> (this is also a vcpu ioctl)
  will fix

> 
> 
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
> 
> There are some deliberate choices here I think we should document:
> | This API can't be used to clear a pending SError.
> 
> If there already was an SError pending, this API just overwrites it with the 
> new
> one. The architecture has some rules about merging multiple SError. (details 
> in
> 2.5.3 Multiple SError interrupts of [0])
> 
> I don't think KVM needs to enforce these, as they are implementation-defined 
> if
> one of the ESR is implementation-defined... the part that matters is reporting
> the 'most severe' RAS ESR if there are multiple pending. As only user-space 
> ever
> sets these, let's make it user-spaces problem to do.
> 
> I think we should recommend user-space always reads the pending values and
> applies its merging-multiple-SError logic. (I assume your Qemu patches do 
> this).


 I will check whether QEMU can be possible to do such things, anyway this patch
 not need to do such merging.

> Something like:
> | User-space should first use KVM_GET_VCPU_EVENTS in case KVM has made an 
> SError
> | pending as part of its device emulation. When both values are architected 
> RAS
> | SError ESR values, the new ESR should reflect the combined effect of both
> | errors.>
> 
>> diff --git a/arch/arm/include/uapi/asm/kvm.h 
>> b/arch/arm/include/uapi/asm/kvm.h
>> index caae484..c3e6975 100644
>> --- a/arch/arm/include/uapi/asm/kvm.h
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -124,6 +124,18 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>  
>> +/* for KVM_GET/SET_VCPU_EVENTS */
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
> 
> You haven't defined __KVM_HAVE_VCPU_EVENTS for 32bit, so presumably this 
> struct
> will never be used. Why is it here?
  if not add it for 32 bits. the 32 arm platform will build Fail, whether you 
have good
   idea to avoid this Failure if not add this struct for the 32 bit?
> (I agree if we ever provide it on 32bit, the struct layout should be the same.
> Is this only here to force that to 

Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Hi Marc
   thanks for the review.

On 2018/6/9 20:40, Marc Zyngier wrote:
> On Fri, 08 Jun 2018 20:48:40 +0100,
> Dongjiu Geng wrote:
>>
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>> change since v3:
>> 1. Fix the memset() issue in the kvm_arm_vcpu_get_events()
>>
>> change since v2:
>> 1. Add kvm_vcpu_events structure definition for arm platform to avoid the 
>> build errors.
>>
>> change since v1:
>> Address Marc's comments, thanks Marc's review
>> 1. serror_has_esr always true when ARM64_HAS_RAS_EXTN is set
>> 2. remove Spurious blank line in kvm_arm_vcpu_set_events()
>> 3. rename pend_guest_serror() to kvm_set_sei_esr()
>> 4. Make kvm_arm_vcpu_get_events() did all the work rather than having this 
>> split responsibility.
>> 5.  using sizeof(events) instead of sizeof(struct kvm_vcpu_events)
>>
>> this series patch is separated from 
>> https://www.spinics.net/lists/kvm/msg168917.html
>> The user space patch is here: 
>> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html
>>
>> change since V12:
>> 1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) 
>> in kvm_arm_vcpu_get_events()
>>
>> Change since V11:
>> Address James's comments, thanks James
>> 1. Align the struct of kvm_vcpu_events to 64 bytes
>> 2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
>> 3. Change variables 'injected' name to 'serror_pending' in the 
>> kvm_arm_vcpu_set_events()
>> 4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in 
>> kvm_arch_vcpu_ioctl()
>>
>> Change since V10:
>> Address James's comments, thanks James
>> 1. Merge the helper function with the user.
>> 2. Move the ISS_MASK into pend_guest_serror() to clear top bits
>> 3. Make kvm_vcpu_events struct align to 4 bytes
>> 4. Add something check in the kvm_arm_vcpu_set_events()
>> 5. Check kvm_arm_vcpu_get/set_events()'s return value.
>> 6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space 
>> doesn't
>>contain kernel stack.
>> ---
>>  Documentation/virtual/kvm/api.txt| 31 ---
>>  arch/arm/include/asm/kvm_host.h  |  6 ++
>>  arch/arm/include/uapi/asm/kvm.h  | 12 
>>  arch/arm/kvm/guest.c | 12 
>>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>>  arch/arm64/include/asm/kvm_host.h|  7 +++
>>  arch/arm64/include/uapi/asm/kvm.h| 13 +
>>  arch/arm64/kvm/guest.c   | 36 
>> 
>>  arch/arm64/kvm/inject_fault.c|  6 +++---
>>  arch/arm64/kvm/reset.c   |  1 +
>>  virt/kvm/arm/arm.c   | 19 +++
>>  11 files changed, 142 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.

Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Hi Marc
   thanks for the review.

On 2018/6/9 20:40, Marc Zyngier wrote:
> On Fri, 08 Jun 2018 20:48:40 +0100,
> Dongjiu Geng wrote:
>>
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>> change since v3:
>> 1. Fix the memset() issue in the kvm_arm_vcpu_get_events()
>>
>> change since v2:
>> 1. Add kvm_vcpu_events structure definition for arm platform to avoid the 
>> build errors.
>>
>> change since v1:
>> Address Marc's comments, thanks Marc's review
>> 1. serror_has_esr always true when ARM64_HAS_RAS_EXTN is set
>> 2. remove Spurious blank line in kvm_arm_vcpu_set_events()
>> 3. rename pend_guest_serror() to kvm_set_sei_esr()
>> 4. Make kvm_arm_vcpu_get_events() did all the work rather than having this 
>> split responsibility.
>> 5.  using sizeof(events) instead of sizeof(struct kvm_vcpu_events)
>>
>> this series patch is separated from 
>> https://www.spinics.net/lists/kvm/msg168917.html
>> The user space patch is here: 
>> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html
>>
>> change since V12:
>> 1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) 
>> in kvm_arm_vcpu_get_events()
>>
>> Change since V11:
>> Address James's comments, thanks James
>> 1. Align the struct of kvm_vcpu_events to 64 bytes
>> 2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
>> 3. Change variables 'injected' name to 'serror_pending' in the 
>> kvm_arm_vcpu_set_events()
>> 4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in 
>> kvm_arch_vcpu_ioctl()
>>
>> Change since V10:
>> Address James's comments, thanks James
>> 1. Merge the helper function with the user.
>> 2. Move the ISS_MASK into pend_guest_serror() to clear top bits
>> 3. Make kvm_vcpu_events struct align to 4 bytes
>> 4. Add something check in the kvm_arm_vcpu_set_events()
>> 5. Check kvm_arm_vcpu_get/set_events()'s return value.
>> 6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space 
>> doesn't
>>contain kernel stack.
>> ---
>>  Documentation/virtual/kvm/api.txt| 31 ---
>>  arch/arm/include/asm/kvm_host.h  |  6 ++
>>  arch/arm/include/uapi/asm/kvm.h  | 12 
>>  arch/arm/kvm/guest.c | 12 
>>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>>  arch/arm64/include/asm/kvm_host.h|  7 +++
>>  arch/arm64/include/uapi/asm/kvm.h| 13 +
>>  arch/arm64/kvm/guest.c   | 36 
>> 
>>  arch/arm64/kvm/inject_fault.c|  6 +++---
>>  arch/arm64/kvm/reset.c   |  1 +
>>  virt/kvm/arm/arm.c   | 19 +++
>>  11 files changed, 142 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.

Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Christoffer,
   Thanks for the review.

On 2018/6/9 19:17, Christoffer Dall wrote:
> On Sat, Jun 09, 2018 at 03:48:40AM +0800, Dongjiu Geng wrote:
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>> change since v3:
>> 1. Fix the memset() issue in the kvm_arm_vcpu_get_events()
>>
>> change since v2:
>> 1. Add kvm_vcpu_events structure definition for arm platform to avoid the 
>> build errors.
>>
>> change since v1:
>> Address Marc's comments, thanks Marc's review
>> 1. serror_has_esr always true when ARM64_HAS_RAS_EXTN is set
>> 2. remove Spurious blank line in kvm_arm_vcpu_set_events()
>> 3. rename pend_guest_serror() to kvm_set_sei_esr()
>> 4. Make kvm_arm_vcpu_get_events() did all the work rather than having this 
>> split responsibility.
>> 5.  using sizeof(events) instead of sizeof(struct kvm_vcpu_events)
>>
>> this series patch is separated from 
>> https://www.spinics.net/lists/kvm/msg168917.html
>> The user space patch is here: 
>> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html
>>
>> change since V12:
>> 1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) 
>> in kvm_arm_vcpu_get_events()
>>
>> Change since V11:
>> Address James's comments, thanks James
>> 1. Align the struct of kvm_vcpu_events to 64 bytes
>> 2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
>> 3. Change variables 'injected' name to 'serror_pending' in the 
>> kvm_arm_vcpu_set_events()
>> 4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in 
>> kvm_arch_vcpu_ioctl()
>>
>> Change since V10:
>> Address James's comments, thanks James
>> 1. Merge the helper function with the user.
>> 2. Move the ISS_MASK into pend_guest_serror() to clear top bits
>> 3. Make kvm_vcpu_events struct align to 4 bytes
>> 4. Add something check in the kvm_arm_vcpu_set_events()
>> 5. Check kvm_arm_vcpu_get/set_events()'s return value.
>> 6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space 
>> doesn't
>>contain kernel stack.
>> ---
>>  Documentation/virtual/kvm/api.txt| 31 ---
>>  arch/arm/include/asm/kvm_host.h  |  6 ++
>>  arch/arm/include/uapi/asm/kvm.h  | 12 
>>  arch/arm/kvm/guest.c | 12 
>>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>>  arch/arm64/include/asm/kvm_host.h|  7 +++
>>  arch/arm64/include/uapi/asm/kvm.h| 13 +
>>  arch/arm64/kvm/guest.c   | 36 
>> 
>>  arch/arm64/kvm/inject_fault.c|  6 +++---
>>  arch/arm64/kvm/reset.c   |  1 +
>>  virt/kvm/arm/arm.c   | 19 +++
>>  11 files changed, 142 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
> 
> nit: unintended change?
> 
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError 

Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Christoffer,
   Thanks for the review.

On 2018/6/9 19:17, Christoffer Dall wrote:
> On Sat, Jun 09, 2018 at 03:48:40AM +0800, Dongjiu Geng wrote:
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>> change since v3:
>> 1. Fix the memset() issue in the kvm_arm_vcpu_get_events()
>>
>> change since v2:
>> 1. Add kvm_vcpu_events structure definition for arm platform to avoid the 
>> build errors.
>>
>> change since v1:
>> Address Marc's comments, thanks Marc's review
>> 1. serror_has_esr always true when ARM64_HAS_RAS_EXTN is set
>> 2. remove Spurious blank line in kvm_arm_vcpu_set_events()
>> 3. rename pend_guest_serror() to kvm_set_sei_esr()
>> 4. Make kvm_arm_vcpu_get_events() did all the work rather than having this 
>> split responsibility.
>> 5.  using sizeof(events) instead of sizeof(struct kvm_vcpu_events)
>>
>> this series patch is separated from 
>> https://www.spinics.net/lists/kvm/msg168917.html
>> The user space patch is here: 
>> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html
>>
>> change since V12:
>> 1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) 
>> in kvm_arm_vcpu_get_events()
>>
>> Change since V11:
>> Address James's comments, thanks James
>> 1. Align the struct of kvm_vcpu_events to 64 bytes
>> 2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
>> 3. Change variables 'injected' name to 'serror_pending' in the 
>> kvm_arm_vcpu_set_events()
>> 4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in 
>> kvm_arch_vcpu_ioctl()
>>
>> Change since V10:
>> Address James's comments, thanks James
>> 1. Merge the helper function with the user.
>> 2. Move the ISS_MASK into pend_guest_serror() to clear top bits
>> 3. Make kvm_vcpu_events struct align to 4 bytes
>> 4. Add something check in the kvm_arm_vcpu_set_events()
>> 5. Check kvm_arm_vcpu_get/set_events()'s return value.
>> 6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space 
>> doesn't
>>contain kernel stack.
>> ---
>>  Documentation/virtual/kvm/api.txt| 31 ---
>>  arch/arm/include/asm/kvm_host.h  |  6 ++
>>  arch/arm/include/uapi/asm/kvm.h  | 12 
>>  arch/arm/kvm/guest.c | 12 
>>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>>  arch/arm64/include/asm/kvm_host.h|  7 +++
>>  arch/arm64/include/uapi/asm/kvm.h| 13 +
>>  arch/arm64/kvm/guest.c   | 36 
>> 
>>  arch/arm64/kvm/inject_fault.c|  6 +++---
>>  arch/arm64/kvm/reset.c   |  1 +
>>  virt/kvm/arm/arm.c   | 19 +++
>>  11 files changed, 142 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
> 
> nit: unintended change?
> 
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError 

Re: [PATCH v2] usb: xhci: remove the code build warning

2018-06-11 Thread gengdongjiu
Hi Mathias,
   This patch is ready to apply, It is great that this patch can be applied to 
4.18. thanks a lot.

On 2018/6/6 22:36, Dongjiu Geng wrote:
> Initialize the 'err' variate to remove the build warning,
> the warning is shown as below:
> 
> drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread':
> drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized 
> in this function [-Wuninitialized]
> drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here
> 
> Signed-off-by: Dongjiu Geng 
> Acked-by: Thierry Reding 
> ---
> How to reproduce:
> 1. make defconfig ARCH=arm
> 2. make -j100 CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm
> 
> Then you can see below warnings:
> drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread':
> drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized 
> in this function [-Wuninitialized]
> drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here
> ---
>  drivers/usb/host/xhci-tegra.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-tegra.c b/drivers/usb/host/xhci-tegra.c
> index a8c1d07..d50549f 100644
> --- a/drivers/usb/host/xhci-tegra.c
> +++ b/drivers/usb/host/xhci-tegra.c
> @@ -481,7 +481,7 @@ static void tegra_xusb_mbox_handle(struct tegra_xusb 
> *tegra,
>   unsigned long mask;
>   unsigned int port;
>   bool idle, enable;
> - int err;
> + int err = 0;
>  
>   memset(, 0, sizeof(rsp));
>  
> 



Re: [PATCH v2] usb: xhci: remove the code build warning

2018-06-11 Thread gengdongjiu
Hi Mathias,
   This patch is ready to apply, It is great that this patch can be applied to 
4.18. thanks a lot.

On 2018/6/6 22:36, Dongjiu Geng wrote:
> Initialize the 'err' variate to remove the build warning,
> the warning is shown as below:
> 
> drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread':
> drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized 
> in this function [-Wuninitialized]
> drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here
> 
> Signed-off-by: Dongjiu Geng 
> Acked-by: Thierry Reding 
> ---
> How to reproduce:
> 1. make defconfig ARCH=arm
> 2. make -j100 CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm
> 
> Then you can see below warnings:
> drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread':
> drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized 
> in this function [-Wuninitialized]
> drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here
> ---
>  drivers/usb/host/xhci-tegra.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-tegra.c b/drivers/usb/host/xhci-tegra.c
> index a8c1d07..d50549f 100644
> --- a/drivers/usb/host/xhci-tegra.c
> +++ b/drivers/usb/host/xhci-tegra.c
> @@ -481,7 +481,7 @@ static void tegra_xusb_mbox_handle(struct tegra_xusb 
> *tegra,
>   unsigned long mask;
>   unsigned int port;
>   bool idle, enable;
> - int err;
> + int err = 0;
>  
>   memset(, 0, sizeof(rsp));
>  
> 



Re: [PATCH] usb: xhci: remove the code build warning

2018-06-05 Thread gengdongjiu



On 2018/6/5 16:40, Greg KH wrote:
> On Wed, Jun 06, 2018 at 12:35:00AM +0800, Dongjiu Geng wrote:
>> Initialize the 'err' variate to remove the build warning,
>> the warning is shown as below:
>>
>> drivers/usb/host/xhci-tegra.c: In function ‘tegra_xusb_mbox_thread’:
>> drivers/usb/host/xhci-tegra.c:552:6: warning: ‘err’ may be used 
>> uninitialized in this function [-Wuninitialized]
>> drivers/usb/host/xhci-tegra.c:482:6: note: ‘err’ was declared here
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
> 
> Any hint as to what commit caused this warning to show up?

It seems below commit:

commit e84fce0f8837496a48d11086829bdbe170358b7a
Author: Thierry Reding 
Date:   Thu Feb 11 18:10:48 2016 +0100

usb: xhci: Add NVIDIA Tegra XUSB controller driver

Add support for the on-chip XUSB controller present on Tegra SoCs. This
controller, when loaded with external firmware, exposes an interface
compliant with xHCI. This driver loads the firmware, starts the
controller, and is able to service host-specific messages sent by the
controller's firmware.

The controller also supports USB device mode as well as powergating
of the SuperSpeed and host-controller logic when not in use, but
support for these is not yet implemented.

Based on work by:
  Ajay Gupta 
  Bharath Yadav 
  Andrew Bresticker 

Cc: Mathias Nyman 
Cc: Greg Kroah-Hartman 
Acked-by: Mathias Nyman 
Signed-off-by: Thierry Reding 


> 
> thanks,
> 
> greg k-h
> 
> .
> 



Re: [PATCH] usb: xhci: remove the code build warning

2018-06-05 Thread gengdongjiu



On 2018/6/5 16:40, Greg KH wrote:
> On Wed, Jun 06, 2018 at 12:35:00AM +0800, Dongjiu Geng wrote:
>> Initialize the 'err' variate to remove the build warning,
>> the warning is shown as below:
>>
>> drivers/usb/host/xhci-tegra.c: In function ‘tegra_xusb_mbox_thread’:
>> drivers/usb/host/xhci-tegra.c:552:6: warning: ‘err’ may be used 
>> uninitialized in this function [-Wuninitialized]
>> drivers/usb/host/xhci-tegra.c:482:6: note: ‘err’ was declared here
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
> 
> Any hint as to what commit caused this warning to show up?

It seems below commit:

commit e84fce0f8837496a48d11086829bdbe170358b7a
Author: Thierry Reding 
Date:   Thu Feb 11 18:10:48 2016 +0100

usb: xhci: Add NVIDIA Tegra XUSB controller driver

Add support for the on-chip XUSB controller present on Tegra SoCs. This
controller, when loaded with external firmware, exposes an interface
compliant with xHCI. This driver loads the firmware, starts the
controller, and is able to service host-specific messages sent by the
controller's firmware.

The controller also supports USB device mode as well as powergating
of the SuperSpeed and host-controller logic when not in use, but
support for these is not yet implemented.

Based on work by:
  Ajay Gupta 
  Bharath Yadav 
  Andrew Bresticker 

Cc: Mathias Nyman 
Cc: Greg Kroah-Hartman 
Acked-by: Mathias Nyman 
Signed-off-by: Thierry Reding 


> 
> thanks,
> 
> greg k-h
> 
> .
> 



Re: [PATCH v1 1/2] ACPI / APEI: Add SEI notification type support for ARMv8

2018-06-01 Thread gengdongjiu
Hi Mark,
   Thanks for the comments.

On 2018/5/31 18:52, Mark Rutland wrote:
> On Thu, May 31, 2018 at 08:41:45PM +0800, Dongjiu Geng wrote:
>> +#ifdef CONFIG_ACPI_APEI_SEI
>> +static LIST_HEAD(ghes_sei);
>> +
>> +/*
>> + * Return 0 only if one of the SEI error sources successfully reported an 
>> error
>> + * record sent from the firmware.
>> + */
>> +int ghes_notify_sei(void)
>> +{
>> +struct ghes *ghes;
>> +int ret = -ENOENT;
>> +
>> +rcu_read_lock();
>> +list_for_each_entry_rcu(ghes, _sei, list) {
>> +if (!ghes_proc(ghes))
>> +ret = 0;
>> +}
>> +rcu_read_unlock();
>> +return ret;
>> +}
>> +
>> +static void ghes_sei_add(struct ghes *ghes)
>> +{
>> +mutex_lock(_list_mutex);
>> +list_add_rcu(>list, _sei);
>> +mutex_unlock(_list_mutex);
>> +}
>> +
>> +static void ghes_sei_remove(struct ghes *ghes)
>> +{
>> +mutex_lock(_list_mutex);
>> +list_del_rcu(>list);
>> +mutex_unlock(_list_mutex);
>> +synchronize_rcu();
>> +}
>> +#else /* CONFIG_ACPI_APEI_SEI */
>> +static inline void ghes_sei_add(struct ghes *ghes) { }
>> +static inline void ghes_sei_remove(struct ghes *ghes) { }
>> +#endif /* CONFIG_ACPI_APEI_SEI */
>> +
>>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>  /*
> 
>> index 8feb0c8..9ba59e2 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -120,5 +120,6 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>   section = acpi_hest_get_next(section))
>>  
>>  int ghes_notify_sea(void);
>> +int ghes_notify_sei(void);
> 
> It would be nice to have a stub definition when !CONFIG_ACPI_APEI_SEI,
> e.g.
> 
> #ifdef CONFIG_ACPI_APEI_SEI
> int ghes_notify_sei(void);
> #else
> static in int ghes_notify_sei(void) { return -ENOENT; }
> #endif
> 
> ... as callers could simply call this without additional checks.
> 
> As a cleanup, similar would be nice for ghes_notify_sea() with
> CONFIG_ACPI_APEI_SEA.

I think your suggestion is better, I will do the cleanup include the 
ghes_notify_sea().
thanks again.

> 
> Thanks,
> Mark.
> 
> .
> 



Re: [PATCH v1 1/2] ACPI / APEI: Add SEI notification type support for ARMv8

2018-06-01 Thread gengdongjiu
Hi Mark,
   Thanks for the comments.

On 2018/5/31 18:52, Mark Rutland wrote:
> On Thu, May 31, 2018 at 08:41:45PM +0800, Dongjiu Geng wrote:
>> +#ifdef CONFIG_ACPI_APEI_SEI
>> +static LIST_HEAD(ghes_sei);
>> +
>> +/*
>> + * Return 0 only if one of the SEI error sources successfully reported an 
>> error
>> + * record sent from the firmware.
>> + */
>> +int ghes_notify_sei(void)
>> +{
>> +struct ghes *ghes;
>> +int ret = -ENOENT;
>> +
>> +rcu_read_lock();
>> +list_for_each_entry_rcu(ghes, _sei, list) {
>> +if (!ghes_proc(ghes))
>> +ret = 0;
>> +}
>> +rcu_read_unlock();
>> +return ret;
>> +}
>> +
>> +static void ghes_sei_add(struct ghes *ghes)
>> +{
>> +mutex_lock(_list_mutex);
>> +list_add_rcu(>list, _sei);
>> +mutex_unlock(_list_mutex);
>> +}
>> +
>> +static void ghes_sei_remove(struct ghes *ghes)
>> +{
>> +mutex_lock(_list_mutex);
>> +list_del_rcu(>list);
>> +mutex_unlock(_list_mutex);
>> +synchronize_rcu();
>> +}
>> +#else /* CONFIG_ACPI_APEI_SEI */
>> +static inline void ghes_sei_add(struct ghes *ghes) { }
>> +static inline void ghes_sei_remove(struct ghes *ghes) { }
>> +#endif /* CONFIG_ACPI_APEI_SEI */
>> +
>>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>  /*
> 
>> index 8feb0c8..9ba59e2 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -120,5 +120,6 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>   section = acpi_hest_get_next(section))
>>  
>>  int ghes_notify_sea(void);
>> +int ghes_notify_sei(void);
> 
> It would be nice to have a stub definition when !CONFIG_ACPI_APEI_SEI,
> e.g.
> 
> #ifdef CONFIG_ACPI_APEI_SEI
> int ghes_notify_sei(void);
> #else
> static in int ghes_notify_sei(void) { return -ENOENT; }
> #endif
> 
> ... as callers could simply call this without additional checks.
> 
> As a cleanup, similar would be nice for ghes_notify_sea() with
> CONFIG_ACPI_APEI_SEA.

I think your suggestion is better, I will do the cleanup include the 
ghes_notify_sea().
thanks again.

> 
> Thanks,
> Mark.
> 
> .
> 



Re: [PATCH v1 2/2] arm64: handle NOTIFY_SEI notification by the APEI driver

2018-06-01 Thread gengdongjiu
Hi Mark, James,
   Thanks the review.

On 2018/6/1 0:51, James Morse wrote:
> Hi Mark, Dongjiu Geng,
> 
> On 31/05/18 12:01, Mark Rutland wrote:
>> In do_serror() we already handle nmi_{enter,exit}(), so there's no need
>> for that here.
> 
> Even better: nmi_enter() has a BUG_ON(in_nmi()).

There are two places call the arm64_is_fatal_ras_serror():
1. do_serror()
2. kvm_handle_guest_serror()

Yes, the do_serror() already handle nmi_{enter,exit}(), so the 
arm64_is_fatal_ras_serror() does not need to handle it again.
For the kvm_handle_guest_serror(), it does not handle the nmi_{enter,exit}(), 
so should we add nmi_{enter,exit}() in  kvm_handle_guest_serror() before 
calling arm64_is_fatal_ras_serror()?

For the NOTIFY_SEA, I do not know why handle_guest_sea() does not handle the 
nmi_{enter,exit}(). James, does we miss it in handle_guest_sea()?

> 
> 
>> TBH, I don't understand why do_sea() does that conditionally today.
>> Unless there's some constraint I'm missing, 
> 
> APEI uses a different fixmap entry and locks when in_nmi(). This was because 
> we
> may interrupt the irq-masked region in APEI that was using the regular memory.
> (e.g. the 'polled' notification, or something backed by an interrupt.) But,
> Borislav has spotted other things in here that are broken[0]. I'm working on
> rolling all that into 'v5' of the in_nmi() rework stuff.
> 
> We currently get away with this on arm because 'SEA' is the only NMI-like 
> thing,
> and it occurs synchronously. The problem cases are all also cases where the
> kernel text is corrupt, which we can't possibly hope to handle.
> 
> For NOTIFY_SDEI and NOTIFY_SEI this is the wrong pattern as these are
> asynchronous. do_serror() has already done most of the work for NOTIFY_SEI, 
> but
> we need to use the estatus queue in APEI, which is currently x86 only.
I think this patch can based on you 'v5' of the in_nmi() rework stuff

> 
> 
>> I think it would make more
>> sense to do that regardless of whether the interrupted context had
>> interrupts enabled. James -- does that make sense to you?
>>
>> If you update the prior patch with a stub for !CONFIG_ACPI_APEI_SEI, you
>> can simplify all of the above additions down to:
>>
>>  if (!ghes_notify_sei())
>>  return;
>>
>> ... which would look a lot nicer.
> 
> The code that calls ghes_notify_sei() may need calling by KVM too, but its
> default action to an 'unclaimed' SError will be different.
> Because of the race between memory_failure() and return-to-userspace, we may
> need to kick the irq work queue (if we can), as we return from do_serror(). 
> [1]
> and [2] provide an example for NOTIFY_SEA. SDEI does this by returning to the
> kernel through the IRQ handler, (which handles the KVM case too).
I can kick the IRQ work queue as you do for the NOTIFY_SEA and NOTIFY_SDEI.

> 
> 
> I think this series is unsafe until we can use the estatus queue in APEI. Its
> also missing the handling for an SError interrupting a KVM guest.

how about this series is based on your patches that uses the estatus queue in 
APEI to make it safe?

when an SError interrupting a KVM guest, it will trap to hypervisor, hypervisor 
will call
below software stack to handle it:
kvm_handle_guest_serror()->arm64_is_fatal_ras_serror()->ghes_notify_sei()

so it already handles the case that an SError interrupting a KVM guest.

> 
> 
> Thanks,
> 
> James
> 
> [0] https://www.spinics.net/lists/arm-kernel/msg653332.html
> [1] https://www.spinics.net/lists/arm-kernel/msg649237.html
> [2] https://www.spinics.net/lists/arm-kernel/msg649239.html
> 
> .
> 



Re: [PATCH v1 2/2] arm64: handle NOTIFY_SEI notification by the APEI driver

2018-06-01 Thread gengdongjiu
Hi Mark, James,
   Thanks the review.

On 2018/6/1 0:51, James Morse wrote:
> Hi Mark, Dongjiu Geng,
> 
> On 31/05/18 12:01, Mark Rutland wrote:
>> In do_serror() we already handle nmi_{enter,exit}(), so there's no need
>> for that here.
> 
> Even better: nmi_enter() has a BUG_ON(in_nmi()).

There are two places call the arm64_is_fatal_ras_serror():
1. do_serror()
2. kvm_handle_guest_serror()

Yes, the do_serror() already handle nmi_{enter,exit}(), so the 
arm64_is_fatal_ras_serror() does not need to handle it again.
For the kvm_handle_guest_serror(), it does not handle the nmi_{enter,exit}(), 
so should we add nmi_{enter,exit}() in  kvm_handle_guest_serror() before 
calling arm64_is_fatal_ras_serror()?

For the NOTIFY_SEA, I do not know why handle_guest_sea() does not handle the 
nmi_{enter,exit}(). James, does we miss it in handle_guest_sea()?

> 
> 
>> TBH, I don't understand why do_sea() does that conditionally today.
>> Unless there's some constraint I'm missing, 
> 
> APEI uses a different fixmap entry and locks when in_nmi(). This was because 
> we
> may interrupt the irq-masked region in APEI that was using the regular memory.
> (e.g. the 'polled' notification, or something backed by an interrupt.) But,
> Borislav has spotted other things in here that are broken[0]. I'm working on
> rolling all that into 'v5' of the in_nmi() rework stuff.
> 
> We currently get away with this on arm because 'SEA' is the only NMI-like 
> thing,
> and it occurs synchronously. The problem cases are all also cases where the
> kernel text is corrupt, which we can't possibly hope to handle.
> 
> For NOTIFY_SDEI and NOTIFY_SEI this is the wrong pattern as these are
> asynchronous. do_serror() has already done most of the work for NOTIFY_SEI, 
> but
> we need to use the estatus queue in APEI, which is currently x86 only.
I think this patch can based on you 'v5' of the in_nmi() rework stuff

> 
> 
>> I think it would make more
>> sense to do that regardless of whether the interrupted context had
>> interrupts enabled. James -- does that make sense to you?
>>
>> If you update the prior patch with a stub for !CONFIG_ACPI_APEI_SEI, you
>> can simplify all of the above additions down to:
>>
>>  if (!ghes_notify_sei())
>>  return;
>>
>> ... which would look a lot nicer.
> 
> The code that calls ghes_notify_sei() may need calling by KVM too, but its
> default action to an 'unclaimed' SError will be different.
> Because of the race between memory_failure() and return-to-userspace, we may
> need to kick the irq work queue (if we can), as we return from do_serror(). 
> [1]
> and [2] provide an example for NOTIFY_SEA. SDEI does this by returning to the
> kernel through the IRQ handler, (which handles the KVM case too).
I can kick the IRQ work queue as you do for the NOTIFY_SEA and NOTIFY_SDEI.

> 
> 
> I think this series is unsafe until we can use the estatus queue in APEI. Its
> also missing the handling for an SError interrupting a KVM guest.

how about this series is based on your patches that uses the estatus queue in 
APEI to make it safe?

when an SError interrupting a KVM guest, it will trap to hypervisor, hypervisor 
will call
below software stack to handle it:
kvm_handle_guest_serror()->arm64_is_fatal_ras_serror()->ghes_notify_sei()

so it already handles the case that an SError interrupting a KVM guest.

> 
> 
> Thanks,
> 
> James
> 
> [0] https://www.spinics.net/lists/arm-kernel/msg653332.html
> [1] https://www.spinics.net/lists/arm-kernel/msg649237.html
> [2] https://www.spinics.net/lists/arm-kernel/msg649239.html
> 
> .
> 



Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-18 Thread gengdongjiu
James,

> 
>> I do not know when it is merge-window. About the apply version, it does not 
>> have limited.
> 
> 'git fetch' Linus' tree and look at the tags. 'v4.16' lost its '-rc' suffixes,
> and there isn't a 'v4.17-rc1' yet, so we are still in the merge window.
> 
> Linus sends a message to LKML. eg:
> https://lkml.org/lkml/2018/4/1/175
> 
> net-next closes shortly before the merge window, and re-opens afterwards. 
> There
> is a handy web page:
> http://vger.kernel.org/~davem/net-next.html

 Thanks for this information, got it.



Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-18 Thread gengdongjiu
James,

> 
>> I do not know when it is merge-window. About the apply version, it does not 
>> have limited.
> 
> 'git fetch' Linus' tree and look at the tags. 'v4.16' lost its '-rc' suffixes,
> and there isn't a 'v4.17-rc1' yet, so we are still in the merge window.
> 
> Linus sends a message to LKML. eg:
> https://lkml.org/lkml/2018/4/1/175
> 
> net-next closes shortly before the merge window, and re-opens afterwards. 
> There
> is a handy web page:
> http://vger.kernel.org/~davem/net-next.html

 Thanks for this information, got it.



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-13 Thread gengdongjiu
James,
   Thanks for this mail.

On 2018/4/13 0:14, James Morse wrote:
> Hi gengdongjiu,
> 
> On 12/04/18 06:00, gengdongjiu wrote:
>> 2018-02-16 1:55 GMT+08:00 James Morse <james.mo...@arm.com>:
>>> On 05/02/18 11:24, gengdongjiu wrote:
>>>>> Is the emulated SError routed following the routing rules for 
>>>>> HCR_EL2.{AMO,
>>>>> TGE}?
>>>>
>>>> Yes, it is.
>>>
>>> ... and yet ...
>>>
>>>
>>>>> What does your firmware do when it wants to emulate SError but its masked?
>>>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>>>> PSTATE.A  set.
>>>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>>>> emulated  SError should go to EL1. This effectively masks SError.)
>>>>
>>>> Currently we does not consider much about the mask status(SPSR).
>>>
>>> .. this is a problem.
>>>
>>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>>> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't 
>>> eret to
>>> EL2. This should never happen, SError is effectively masked if you are 
>>> running
>>> at an EL higher than the one its routed to.
>>>
>>> More obviously: if the exception came from the EL that SError should be 
>>> routed
>>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the 
>>> only
> 
>> James, I  summarized the masking and routing rules for SError to
>> confirm with you for the firmware first solution,
> 
> You also said "Currently we does not consider much about the mask 
> status(SPSR)."
Yes, we currently do not consider much it. After clarification with you, we 
want to modify the EL3 firmware to follow this rule.

> 
> 
>> 1. If the HCR_EL2.{AMO,TGE} is set,
> 
> If one or the other of these bits is set: (AMO==1 || TGE==1)
> 
>> which means the SError should route to EL2,
>> When system happens SError and trap to EL3,   If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
>> and find this SError come from EL2, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; but if
>> it find that this SError come from EL1 or EL0, it also need to deliver
>> an SError, right?
> 
> Yes.
> 
> 
>> 2. If the HCR_EL2.{AMO,TGE} is not set,
> 
> If neither of these bits is set: (AMO==0 && TGE == 0)
> 
>> which means the SError should route to EL1,
>> When system happens SError and trap to EL3, If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
> 
> (I'm reading this as all three of these bits are clear)
sorry, it is a typo issue.
it should be HCR_EL2.AMO and HCR_EL2.TGE are both clear, but SPSR_EL3.A is set.

> 
>> and find this SError come from EL1, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; 
> 
> No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
> interrupted EL1 and the A bit was clear, so EL1 can take an SError.

Agree.

> 
> The two cases here are:
> AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
> exception interrupted EL1 and the A bit was set, you need to do the BERT 
> trick.

> 
> If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick
"BERT trick" is storing the RAS error in the BERT and 'reboot, right?

> regardless of the A bit, as SError is implicitly masked by running at a higher
> exception level than it was routed to.


> 
> 
>>From your v11 reply:
>> 2. The exception came from the EL that SError should not be routed
>> to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3
>> firmware still deliver SError
> 
> (this is re-iterating the two-cases above:)
> 'not be routed to' is one of two things: Route-to-EL2+interruted-EL1, or
> Route-to-EL1+interrupted-EL2.
> 
> Route-to-EL2+interrupted-EL1 is fine, regardless of SPSR_EL3.A the emulated
> SError can be delivered to EL2, as EL2 can't mask SError when executing at a
> lower EL.
Agree.

> 
> Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
> running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError can not 
> be
> delivered.
"can not be delivered" means storing the RAS error in the BERT and 'reboot, 
right?
In the Table D1-15 in "D1.14.2 Asynchronous exception masking", for the case, 
it is "C"
"C"means SError is not taken regardless of the value of

Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-13 Thread gengdongjiu
James,
   Thanks for this mail.

On 2018/4/13 0:14, James Morse wrote:
> Hi gengdongjiu,
> 
> On 12/04/18 06:00, gengdongjiu wrote:
>> 2018-02-16 1:55 GMT+08:00 James Morse :
>>> On 05/02/18 11:24, gengdongjiu wrote:
>>>>> Is the emulated SError routed following the routing rules for 
>>>>> HCR_EL2.{AMO,
>>>>> TGE}?
>>>>
>>>> Yes, it is.
>>>
>>> ... and yet ...
>>>
>>>
>>>>> What does your firmware do when it wants to emulate SError but its masked?
>>>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>>>> PSTATE.A  set.
>>>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>>>> emulated  SError should go to EL1. This effectively masks SError.)
>>>>
>>>> Currently we does not consider much about the mask status(SPSR).
>>>
>>> .. this is a problem.
>>>
>>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>>> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't 
>>> eret to
>>> EL2. This should never happen, SError is effectively masked if you are 
>>> running
>>> at an EL higher than the one its routed to.
>>>
>>> More obviously: if the exception came from the EL that SError should be 
>>> routed
>>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the 
>>> only
> 
>> James, I  summarized the masking and routing rules for SError to
>> confirm with you for the firmware first solution,
> 
> You also said "Currently we does not consider much about the mask 
> status(SPSR)."
Yes, we currently do not consider much it. After clarification with you, we 
want to modify the EL3 firmware to follow this rule.

> 
> 
>> 1. If the HCR_EL2.{AMO,TGE} is set,
> 
> If one or the other of these bits is set: (AMO==1 || TGE==1)
> 
>> which means the SError should route to EL2,
>> When system happens SError and trap to EL3,   If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
>> and find this SError come from EL2, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; but if
>> it find that this SError come from EL1 or EL0, it also need to deliver
>> an SError, right?
> 
> Yes.
> 
> 
>> 2. If the HCR_EL2.{AMO,TGE} is not set,
> 
> If neither of these bits is set: (AMO==0 && TGE == 0)
> 
>> which means the SError should route to EL1,
>> When system happens SError and trap to EL3, If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
> 
> (I'm reading this as all three of these bits are clear)
sorry, it is a typo issue.
it should be HCR_EL2.AMO and HCR_EL2.TGE are both clear, but SPSR_EL3.A is set.

> 
>> and find this SError come from EL1, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; 
> 
> No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
> interrupted EL1 and the A bit was clear, so EL1 can take an SError.

Agree.

> 
> The two cases here are:
> AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
> exception interrupted EL1 and the A bit was set, you need to do the BERT 
> trick.

> 
> If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick
"BERT trick" is storing the RAS error in the BERT and 'reboot, right?

> regardless of the A bit, as SError is implicitly masked by running at a higher
> exception level than it was routed to.


> 
> 
>>From your v11 reply:
>> 2. The exception came from the EL that SError should not be routed
>> to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3
>> firmware still deliver SError
> 
> (this is re-iterating the two-cases above:)
> 'not be routed to' is one of two things: Route-to-EL2+interruted-EL1, or
> Route-to-EL1+interrupted-EL2.
> 
> Route-to-EL2+interrupted-EL1 is fine, regardless of SPSR_EL3.A the emulated
> SError can be delivered to EL2, as EL2 can't mask SError when executing at a
> lower EL.
Agree.

> 
> Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
> running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError can not 
> be
> delivered.
"can not be delivered" means storing the RAS error in the BERT and 'reboot, 
right?
In the Table D1-15 in "D1.14.2 Asynchronous exception masking", for the case, 
it is "C"
"C"means SError is not taken regardless of the value of the Process state 
inte

Re: [PATCH v11 2/4] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-04-12 Thread gengdongjiu
Hi James,
  Thanks for the comments.

2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, it can inject
>> SError with specified syndrome to guest by setup kvm_vcpu_events
>> value.
>
>> Also it can support live migration.
>
> Could you explain what user-space is expected to do for this?
> (this is also relevant for snapshot-ing/suspending VMs)
Ok.

>
> It's probably worth noting that this solves an existing problem: KVM may
> make an
> SError pending, but user-space has no way to discover/migrate this.

if KVM make an SError pending, when user-space do migration, it get the
kvm_vcpu_events through KVM_GET_VCPU_EVENTS, then can find that pending status.
What are the things you're worried about?

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index 8a3d708..45719b4 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -819,11 +819,13 @@ struct kvm_clock_data {
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as
>> related
>>  states of the vcpu.
>>
>> @@ -865,15 +867,31 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>
> I'm not convinced we should change this struct from the layout/size x86 has.
> Its
> confusing for the documentation, is this API call really the same on all
> architectures?
>
> What if we want to add some future interrupt, NMI or related state? We've
> found
> ourselves needing to add this API, it seems odd to remove its other uses on
> x86.
> We can't put them back in the future.
>
> Having a different layout would force user-space to ifdef/duplicate any
> code
> that accesses this between architectures.
 In x86 and arm64 user space code, the handling logic of
KVM_GET/SET_VCPU_EVENTS is in different ARCH folder,  maybe it is not
necessary to share the handling code in the user space.

>
>
>
> The compiler will want that __u64 to be naturally aligned to 8-bytes, so
> your
> 4-byte padding still causes some secret compiler-padding to be inserted.
> Different versions of the compiler may put it in different places.
>
>
>>  4.32 KVM_SET_VCPU_EVENTS
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of
>> the
>>  vcpu.
>>
>> @@ -894,6 +912,12 @@ shall be written into the VCPU.
>>
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
>> +
>> +See KVM_GET_VCPU_EVENTS for the data structure.
>> +
>>
>>  4.33 KVM_GET_DEBUGREGS
>>
>
>
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h
>> b/arch/arm64/include/uapi/asm/kvm.h
>> index 9abbf30..855cc9a 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -39,6 +39,7 @@
>>  #define __KVM_HAVE_GUEST_DEBUG
>>  #define __KVM_HAVE_IRQ_LINE
>>  #define __KVM_HAVE_READONLY_MEM
>> +#define __KVM_HAVE_VCPU_EVENTS
>>
>>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>>
>> @@ -153,6 +154,17 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>
>> +/* for KVM_GET/SET_VCPU_EVENTS */
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>
> (padding noted above)
>
>
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>>  /* If you need to interpret the index values, here is the key: */
>>  #define KVM_REG_ARM_COPROC_MASK 0x0FFF
>>  #define KVM_REG_ARM_COPROC_SHIFT16
>
>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 5c7f657..42e1222 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -277,6 +277,37 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu
>> *vcpu,
>>  

Re: [PATCH v11 2/4] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-04-12 Thread gengdongjiu
Hi James,
  Thanks for the comments.

2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, it can inject
>> SError with specified syndrome to guest by setup kvm_vcpu_events
>> value.
>
>> Also it can support live migration.
>
> Could you explain what user-space is expected to do for this?
> (this is also relevant for snapshot-ing/suspending VMs)
Ok.

>
> It's probably worth noting that this solves an existing problem: KVM may
> make an
> SError pending, but user-space has no way to discover/migrate this.

if KVM make an SError pending, when user-space do migration, it get the
kvm_vcpu_events through KVM_GET_VCPU_EVENTS, then can find that pending status.
What are the things you're worried about?

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index 8a3d708..45719b4 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -819,11 +819,13 @@ struct kvm_clock_data {
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as
>> related
>>  states of the vcpu.
>>
>> @@ -865,15 +867,31 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>
> I'm not convinced we should change this struct from the layout/size x86 has.
> Its
> confusing for the documentation, is this API call really the same on all
> architectures?
>
> What if we want to add some future interrupt, NMI or related state? We've
> found
> ourselves needing to add this API, it seems odd to remove its other uses on
> x86.
> We can't put them back in the future.
>
> Having a different layout would force user-space to ifdef/duplicate any
> code
> that accesses this between architectures.
 In x86 and arm64 user space code, the handling logic of
KVM_GET/SET_VCPU_EVENTS is in different ARCH folder,  maybe it is not
necessary to share the handling code in the user space.

>
>
>
> The compiler will want that __u64 to be naturally aligned to 8-bytes, so
> your
> 4-byte padding still causes some secret compiler-padding to be inserted.
> Different versions of the compiler may put it in different places.
>
>
>>  4.32 KVM_SET_VCPU_EVENTS
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of
>> the
>>  vcpu.
>>
>> @@ -894,6 +912,12 @@ shall be written into the VCPU.
>>
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
>> +
>> +See KVM_GET_VCPU_EVENTS for the data structure.
>> +
>>
>>  4.33 KVM_GET_DEBUGREGS
>>
>
>
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h
>> b/arch/arm64/include/uapi/asm/kvm.h
>> index 9abbf30..855cc9a 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -39,6 +39,7 @@
>>  #define __KVM_HAVE_GUEST_DEBUG
>>  #define __KVM_HAVE_IRQ_LINE
>>  #define __KVM_HAVE_READONLY_MEM
>> +#define __KVM_HAVE_VCPU_EVENTS
>>
>>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>>
>> @@ -153,6 +154,17 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>
>> +/* for KVM_GET/SET_VCPU_EVENTS */
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>
> (padding noted above)
>
>
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>>  /* If you need to interpret the index values, here is the key: */
>>  #define KVM_REG_ARM_COPROC_MASK 0x0FFF
>>  #define KVM_REG_ARM_COPROC_SHIFT16
>
>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 5c7f657..42e1222 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -277,6 +277,37 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu
>> *vcpu,
>>  return -EINVAL;
>>  }
>>

Re: [PATCH v11 1/4] arm64: KVM: export the capability to set guest SError syndrome

2018-04-12 Thread gengdongjiu
HI James,
  Thanks for the review.


2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> Before user space injects a SError, it needs to know whether it can
>> specify the guest Exception Syndrome, so KVM should tell user space
>> whether it has such capability.
>
> (you could improve the commit message by briefly explaining how/why
> user-space
> would want to do this. As this is patch 1, you don't have the context of
> the
> previous patch to say that some systems can provide an ESR with
> virtual-SError)
Exactly, thanks for the good comments.

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index fc3ae95..8a3d708 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4415,3 +4415,14 @@ Parameters: none
>>  This capability indicates if the flic device will be able to get/set the
>>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and
>> allows
>>  to discover this without having to create a flic device.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify syndrome value
>> reported to
>
> (Nit: 'the syndrome value')
will fix it.

>
>> +guest OS when guest takes a virtual SError interrupt exception.
>
> (Nit: 'the guest')
will fix it.

>
>> +If KVM has this capability, userspace can only specify the ISS field for
>> the ESR
>> +syndrome, can not specify the EC field which is not under control by
>> KVM.
>
> (Nit: 'it can not specify...')
will fix it.

>
>> +If this virtual SError is taken to EL1 using AArch64, this value will be
>> reported
>> +into ISS filed of ESR_EL1.
>
> (Nit: 'in the ISS field')
will fix it.

>
>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 3256b92..38c8a64 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm,
>> long ext)
>>  case KVM_CAP_ARM_PMU_V3:
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>> +case KVM_CAP_ARM_INJECT_SERROR_ESR:
>> +r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>> +break;
>>  case KVM_CAP_SET_GUEST_DEBUG:
>>  case KVM_CAP_VCPU_ATTRIBUTES:
>>  r = 1;
>
> 'dev_ioctl' feels a bit weird, but we already have cpu_has_32bit_el1() in
> here.

Yes, although the name is "dev_ioctl", it does not have relationship
with the device.
here it mainly check vcpu capability, such as PMU, 32bit EL1 etc.

>
>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 8fb90a0..3587b33 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_S390_AIS_MIGRATION 150
>>  #define KVM_CAP_PPC_GET_CPU_CHAR 151
>>  #define KVM_CAP_S390_BPB 152
>> +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>
> (patch 1&2 should probably be swapped around, as on its own this does
> thing).
ok, I will do it.

>
> Reviewed-by: James Morse 
thanks this Reviewed-by

>
>
> Thanks,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>


Re: [PATCH v11 1/4] arm64: KVM: export the capability to set guest SError syndrome

2018-04-12 Thread gengdongjiu
HI James,
  Thanks for the review.


2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> Before user space injects a SError, it needs to know whether it can
>> specify the guest Exception Syndrome, so KVM should tell user space
>> whether it has such capability.
>
> (you could improve the commit message by briefly explaining how/why
> user-space
> would want to do this. As this is patch 1, you don't have the context of
> the
> previous patch to say that some systems can provide an ESR with
> virtual-SError)
Exactly, thanks for the good comments.

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index fc3ae95..8a3d708 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4415,3 +4415,14 @@ Parameters: none
>>  This capability indicates if the flic device will be able to get/set the
>>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and
>> allows
>>  to discover this without having to create a flic device.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify syndrome value
>> reported to
>
> (Nit: 'the syndrome value')
will fix it.

>
>> +guest OS when guest takes a virtual SError interrupt exception.
>
> (Nit: 'the guest')
will fix it.

>
>> +If KVM has this capability, userspace can only specify the ISS field for
>> the ESR
>> +syndrome, can not specify the EC field which is not under control by
>> KVM.
>
> (Nit: 'it can not specify...')
will fix it.

>
>> +If this virtual SError is taken to EL1 using AArch64, this value will be
>> reported
>> +into ISS filed of ESR_EL1.
>
> (Nit: 'in the ISS field')
will fix it.

>
>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 3256b92..38c8a64 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm,
>> long ext)
>>  case KVM_CAP_ARM_PMU_V3:
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>> +case KVM_CAP_ARM_INJECT_SERROR_ESR:
>> +r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>> +break;
>>  case KVM_CAP_SET_GUEST_DEBUG:
>>  case KVM_CAP_VCPU_ATTRIBUTES:
>>  r = 1;
>
> 'dev_ioctl' feels a bit weird, but we already have cpu_has_32bit_el1() in
> here.

Yes, although the name is "dev_ioctl", it does not have relationship
with the device.
here it mainly check vcpu capability, such as PMU, 32bit EL1 etc.

>
>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 8fb90a0..3587b33 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_S390_AIS_MIGRATION 150
>>  #define KVM_CAP_PPC_GET_CPU_CHAR 151
>>  #define KVM_CAP_S390_BPB 152
>> +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>
> (patch 1&2 should probably be swapped around, as on its own this does
> thing).
ok, I will do it.

>
> Reviewed-by: James Morse 
thanks this Reviewed-by

>
>
> Thanks,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>


Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-12 Thread gengdongjiu
Hi James,
   thanks for this mail.

On 2018/4/10 22:15, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> 1. Detect whether KVM can set set guest SError syndrome
>> 2. Support to Set VSESR_EL2 and inject SError by user space.
>> 3. Support live migration to keep SError pending state and VSESR_EL2 value.
>> 4. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so 
>> support this
>>notification in software, KVM or kernel ARCH code call handle_guest_sei() 
>> to let ACP driver
>>to handle this notification.
> 
> Please don't post code during the merge-window, will this apply to v4.17-rc1? 
> We
> can't know until its tagged.
I do not know when it is merge-window. About the apply version, it does not 
have limited.

> 
> 
> This series is doing two separate things, please split it into two series.
OK, thanks!

> 
> But on the ACPI front: I don't see how any OS can support your NOTIFY_SEI when
> firmware is ignoring the normal world's PSTATE.A.
> 
> The latest lobe of that discussion was on the list here:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html
I have replied the mail.
I still have some questions that need to clarify with you.
After clarification, we will follow that.
The question is in the reply of this mail 
"https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html;

> 
> 
> As it is, we would need to spot SError being delivered while SError is masked,
> spray nasty messages about firmware being horrifically buggy, then panic(). 
> For
> a corrected error, this looks bad, but its preferable to letting firmware
> silently overwrite the exception registers, causing linux to spin through the
> vectors 'eret' with all exceptions masked.
> I still think its best to wait for firmware that does the right thing.
Let us  discuss that in another mail.
In a summary, I think firmware follow below rule can be OK, right?
1. The exception came from the EL that SError should be routed to(according to 
hcr_EL2.{AMO, TGE}),but PSTATE.A was set, EL3 firmware can't deliver SError;
2. The exception came from the EL that SError should not be routed to(according 
to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3 firmware still 
deliver SError

> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-12 Thread gengdongjiu
Hi James,
   thanks for this mail.

On 2018/4/10 22:15, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> 1. Detect whether KVM can set set guest SError syndrome
>> 2. Support to Set VSESR_EL2 and inject SError by user space.
>> 3. Support live migration to keep SError pending state and VSESR_EL2 value.
>> 4. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so 
>> support this
>>notification in software, KVM or kernel ARCH code call handle_guest_sei() 
>> to let ACP driver
>>to handle this notification.
> 
> Please don't post code during the merge-window, will this apply to v4.17-rc1? 
> We
> can't know until its tagged.
I do not know when it is merge-window. About the apply version, it does not 
have limited.

> 
> 
> This series is doing two separate things, please split it into two series.
OK, thanks!

> 
> But on the ACPI front: I don't see how any OS can support your NOTIFY_SEI when
> firmware is ignoring the normal world's PSTATE.A.
> 
> The latest lobe of that discussion was on the list here:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html
I have replied the mail.
I still have some questions that need to clarify with you.
After clarification, we will follow that.
The question is in the reply of this mail 
"https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html;

> 
> 
> As it is, we would need to spot SError being delivered while SError is masked,
> spray nasty messages about firmware being horrifically buggy, then panic(). 
> For
> a corrected error, this looks bad, but its preferable to letting firmware
> silently overwrite the exception registers, causing linux to spin through the
> vectors 'eret' with all exceptions masked.
> I still think its best to wait for firmware that does the right thing.
Let us  discuss that in another mail.
In a summary, I think firmware follow below rule can be OK, right?
1. The exception came from the EL that SError should be routed to(according to 
hcr_EL2.{AMO, TGE}),but PSTATE.A was set, EL3 firmware can't deliver SError;
2. The exception came from the EL that SError should not be routed to(according 
to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3 firmware still 
deliver SError

> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-11 Thread gengdongjiu
Dear James,
Thanks for this mail and sorry for my late response.


2018-02-16 1:55 GMT+08:00 James Morse <james.mo...@arm.com>:
> Hi gengdongjiu, liu jun
>
> On 05/02/18 11:24, gengdongjiu wrote:
[]
>>
>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>> TGE}?
>>
>> Yes, it is.
>
> ... and yet ...
>
>
>>> What does your firmware do when it wants to emulate SError but its masked?
>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>> PSTATE.A  set.
>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>> emulated  SError should go to EL1. This effectively masks SError.)
>>
>> Currently we does not consider much about the mask status(SPSR).
>
> .. this is a problem.
>
> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret 
> to
> EL2. This should never happen, SError is effectively masked if you are running
> at an EL higher than the one its routed to.
>
> More obviously: if the exception came from the EL that SError should be routed
> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only

James, I  summarized the masking and routing rules for SError to
confirm with you for the firmware first solution,

1. If the HCR_EL2.{AMO,TGE} is set, which means the SError should route to EL2,
When system happens SError and trap to EL3,   If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
and find this SError come from EL2, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL1 or EL0, it also need to deliver
an SError, right?


2. If the HCR_EL2.{AMO,TGE} is not set, which means the SError should
route to EL1,
When system happens SError and trap to EL3, If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
and find this SError come from EL1, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL0, it also need to deliver an
SError, right?


> way the OS has to indicate it can't take an exception right now. VBAR_EL1 may 
> be
> 'wrong' if we're doing some power-management, the FAR/ELR/ESR/SPSR registers 
> may
> contain live values that the OS would lose if you deliver another exception 
> over
> the top.
>
> If you deliver an emulated-SError as the OS eret's, your new ELR will point at
> the eret instruction and the CPU will spin on this instruction forever.
>
> You have to honour the masking and routing rules for SError, otherwise no OS 
> can
> run safely with this firmware.
>
>
>> I remember that you ever suggested firmware should reboot if the mask status
>> is set(SPSR), right?
>
> Yes, this is my suggestion of what to do if you can't deliver an SError: store
> the RAS error in the BERT and 'reboot'.
>
>
>> I CC "liu jun" <liuju...@hisilicon.com> who is our UEFI firmware Architect,
>> if you have firmware requirements, you can raise again.
>
> (UEFI? I didn't think there was any of that at EL3, but I'm not familiar with
> all the 'PI' bits).
>
> The requirement is your emulated-SError from EL3 looks exactly like a
> physical-SError as if EL3 wasn't implemented.
> Your CPU has to handle cases where it can't deliver an SError, your emulation
> has to do the same.
>
> This is not something any OS can work around.
>
>
>>> Answers to these let us determine whether a bug is in the firmware or the
>>> kernel. If firmware is expecting the OS to do something special, I'd like 
>>> to know
>>> about it from the beginning!
>>
>> I know your meaning, thanks for raising it again.
>
>
> Happy new year,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-11 Thread gengdongjiu
Dear James,
Thanks for this mail and sorry for my late response.


2018-02-16 1:55 GMT+08:00 James Morse :
> Hi gengdongjiu, liu jun
>
> On 05/02/18 11:24, gengdongjiu wrote:
[]
>>
>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>> TGE}?
>>
>> Yes, it is.
>
> ... and yet ...
>
>
>>> What does your firmware do when it wants to emulate SError but its masked?
>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>> PSTATE.A  set.
>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>> emulated  SError should go to EL1. This effectively masks SError.)
>>
>> Currently we does not consider much about the mask status(SPSR).
>
> .. this is a problem.
>
> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret 
> to
> EL2. This should never happen, SError is effectively masked if you are running
> at an EL higher than the one its routed to.
>
> More obviously: if the exception came from the EL that SError should be routed
> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only

James, I  summarized the masking and routing rules for SError to
confirm with you for the firmware first solution,

1. If the HCR_EL2.{AMO,TGE} is set, which means the SError should route to EL2,
When system happens SError and trap to EL3,   If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
and find this SError come from EL2, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL1 or EL0, it also need to deliver
an SError, right?


2. If the HCR_EL2.{AMO,TGE} is not set, which means the SError should
route to EL1,
When system happens SError and trap to EL3, If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
and find this SError come from EL1, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL0, it also need to deliver an
SError, right?


> way the OS has to indicate it can't take an exception right now. VBAR_EL1 may 
> be
> 'wrong' if we're doing some power-management, the FAR/ELR/ESR/SPSR registers 
> may
> contain live values that the OS would lose if you deliver another exception 
> over
> the top.
>
> If you deliver an emulated-SError as the OS eret's, your new ELR will point at
> the eret instruction and the CPU will spin on this instruction forever.
>
> You have to honour the masking and routing rules for SError, otherwise no OS 
> can
> run safely with this firmware.
>
>
>> I remember that you ever suggested firmware should reboot if the mask status
>> is set(SPSR), right?
>
> Yes, this is my suggestion of what to do if you can't deliver an SError: store
> the RAS error in the BERT and 'reboot'.
>
>
>> I CC "liu jun"  who is our UEFI firmware Architect,
>> if you have firmware requirements, you can raise again.
>
> (UEFI? I didn't think there was any of that at EL3, but I'm not familiar with
> all the 'PI' bits).
>
> The requirement is your emulated-SError from EL3 looks exactly like a
> physical-SError as if EL3 wasn't implemented.
> Your CPU has to handle cases where it can't deliver an SError, your emulation
> has to do the same.
>
> This is not something any OS can work around.
>
>
>>> Answers to these let us determine whether a bug is in the firmware or the
>>> kernel. If firmware is expecting the OS to do something special, I'd like 
>>> to know
>>> about it from the beginning!
>>
>> I know your meaning, thanks for raising it again.
>
>
> Happy new year,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH]arm64:defconfig:enable ACPI_APEI_SEA

2018-03-20 Thread gengdongjiu
Hi Shiju,
   The configuration "CONFIG_ACPI_APEI_SEA" is needed to manually enable?
In the "drivers/acpi/apei/Kconfig" file, the default value of ACPI_APEI_SEA is 
"y"

config ACPI_APEI_SEA
bool "APEI Synchronous External Abort logging/recovering support"
depends on ARM64 && ACPI_APEI_GHES
default y
help
  This option should be enabled if the system supports
  firmware first handling of SEA (Synchronous External Abort).
  SEA happens with certain faults of data abort or instruction
  abort synchronous exceptions on ARMv8 systems. If a system
  supports firmware first handling of SEA, the platform analyzes
  and handles hardware error notifications from SEA, and it may then
  form a HW error record for the OS to parse and handle. This
  option allows the OS to look for such hardware error record, and
  take appropriate action.


On 2018/3/20 20:23, Shiju Jose wrote:
> Enable ACPI APEI SEA option for arm64, to handle
> ARMv8 SEA(Synchronous External Abort).
> 
> Signed-off-by: Shiju Jose 
> Cc: Tyler Baicar 
> Cc: James Morse 
> Cc: Dongjiu Geng 
> Cc: Xie XiuQi 
> Cc: Qiang Zheng 
> ---
>  arch/arm64/configs/defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 634b373..5ddf25c 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -574,6 +574,7 @@ CONFIG_ACPI_APEI_GHES=y
>  CONFIG_ACPI_APEI_PCIEAER=y
>  CONFIG_ACPI_APEI_MEMORY_FAILURE=y
>  CONFIG_ACPI_APEI_EINJ=y
> +CONFIG_ACPI_APEI_SEA=y
>  CONFIG_EXT2_FS=y
>  CONFIG_EXT3_FS=y
>  CONFIG_EXT4_FS_POSIX_ACL=y
> 



Re: [PATCH]arm64:defconfig:enable ACPI_APEI_SEA

2018-03-20 Thread gengdongjiu
Hi Shiju,
   The configuration "CONFIG_ACPI_APEI_SEA" is needed to manually enable?
In the "drivers/acpi/apei/Kconfig" file, the default value of ACPI_APEI_SEA is 
"y"

config ACPI_APEI_SEA
bool "APEI Synchronous External Abort logging/recovering support"
depends on ARM64 && ACPI_APEI_GHES
default y
help
  This option should be enabled if the system supports
  firmware first handling of SEA (Synchronous External Abort).
  SEA happens with certain faults of data abort or instruction
  abort synchronous exceptions on ARMv8 systems. If a system
  supports firmware first handling of SEA, the platform analyzes
  and handles hardware error notifications from SEA, and it may then
  form a HW error record for the OS to parse and handle. This
  option allows the OS to look for such hardware error record, and
  take appropriate action.


On 2018/3/20 20:23, Shiju Jose wrote:
> Enable ACPI APEI SEA option for arm64, to handle
> ARMv8 SEA(Synchronous External Abort).
> 
> Signed-off-by: Shiju Jose 
> Cc: Tyler Baicar 
> Cc: James Morse 
> Cc: Dongjiu Geng 
> Cc: Xie XiuQi 
> Cc: Qiang Zheng 
> ---
>  arch/arm64/configs/defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 634b373..5ddf25c 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -574,6 +574,7 @@ CONFIG_ACPI_APEI_GHES=y
>  CONFIG_ACPI_APEI_PCIEAER=y
>  CONFIG_ACPI_APEI_MEMORY_FAILURE=y
>  CONFIG_ACPI_APEI_EINJ=y
> +CONFIG_ACPI_APEI_SEA=y
>  CONFIG_EXT2_FS=y
>  CONFIG_EXT3_FS=y
>  CONFIG_EXT4_FS_POSIX_ACL=y
> 



Re: [PATCH v10 2/5] arm64: KVM: export the capability to set guest SError syndrome

2018-03-18 Thread gengdongjiu
Hi James,
   Thanks for your time to review and give comments.

[...]
> > +
> > +8.14 KVM_CAP_ARM_SET_SERROR_ESR
> > +
> > +Architectures: arm, arm64
> > +
> > +This capability indicates that userspace can specify syndrome value
> > +reported to guest OS when guest takes a virtual SError interrupt exception.
> 
> "when userspace triggers a virtual SError"... how?

In the user space(QEMU), it will call kvm_arch_put_registers() or 
kvm_arch_get_registers() to set or get KVM registers through KVM_SET_ONE_REG/ 
KVM_GET_ONE_REG IOCTL, at the same time the two functions will separately call 
kvm_arm_vcpu_get_events() and kvm_arm_vcpu_set_events() to get/set vcpu events. 
If user space want to trigger a virtual SError with specified ESR, it only need 
to setup the kvm_vcpu_events struct(exception.serror_pending = 1; 
exception.serror_has_esr=1; serror_esr=x), then KVM will trigger this 
virtual SError.

userspace can trigger it at any time, for example, for debug purpose.  Or 
simulate a SError after recording a CPER for guest. But before triggering a 
virtual SError, it needs to know whether KVM has such capability, so KVM needs 
to export this capability to user space. If has this capability, User space 
will call kvm_arm_vcpu_set_events() to trigger a virtual SError.

> 
> 
> > +If KVM has this capability, userspace can only specify the ISS field
> > +for the ESR syndrome, can not specify the EC field which is not under 
> > control by KVM.
> 
> Where do I put the ESR?
> If you re-order this after the patch that adds the API, you can describe how 
> this can be used.

Ok, thank a lot for your suggestion.

> 
> 
> Thanks,
> 
> James
> 
> 
> 
> > +If this virtual SError is taken to EL1 using AArch64, this value will
> > +be reported into ISS filed of ESR_EL1.
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index
> > 3256b92..38c8a64 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
> > long ext)
> > case KVM_CAP_ARM_PMU_V3:
> > r = kvm_arm_support_pmu_v3();
> > break;
> > +   case KVM_CAP_ARM_INJECT_SERROR_ESR:
> > +   r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
> > +   break;
> > case KVM_CAP_SET_GUEST_DEBUG:
> > case KVM_CAP_VCPU_ATTRIBUTES:
> > r = 1;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index
> > 8fb90a0..3587b33 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {  #define
> > KVM_CAP_S390_AIS_MIGRATION 150  #define KVM_CAP_PPC_GET_CPU_CHAR 151
> > #define KVM_CAP_S390_BPB 152
> > +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> >



Re: [PATCH v10 2/5] arm64: KVM: export the capability to set guest SError syndrome

2018-03-18 Thread gengdongjiu
Hi James,
   Thanks for your time to review and give comments.

[...]
> > +
> > +8.14 KVM_CAP_ARM_SET_SERROR_ESR
> > +
> > +Architectures: arm, arm64
> > +
> > +This capability indicates that userspace can specify syndrome value
> > +reported to guest OS when guest takes a virtual SError interrupt exception.
> 
> "when userspace triggers a virtual SError"... how?

In the user space(QEMU), it will call kvm_arch_put_registers() or 
kvm_arch_get_registers() to set or get KVM registers through KVM_SET_ONE_REG/ 
KVM_GET_ONE_REG IOCTL, at the same time the two functions will separately call 
kvm_arm_vcpu_get_events() and kvm_arm_vcpu_set_events() to get/set vcpu events. 
If user space want to trigger a virtual SError with specified ESR, it only need 
to setup the kvm_vcpu_events struct(exception.serror_pending = 1; 
exception.serror_has_esr=1; serror_esr=x), then KVM will trigger this 
virtual SError.

userspace can trigger it at any time, for example, for debug purpose.  Or 
simulate a SError after recording a CPER for guest. But before triggering a 
virtual SError, it needs to know whether KVM has such capability, so KVM needs 
to export this capability to user space. If has this capability, User space 
will call kvm_arm_vcpu_set_events() to trigger a virtual SError.

> 
> 
> > +If KVM has this capability, userspace can only specify the ISS field
> > +for the ESR syndrome, can not specify the EC field which is not under 
> > control by KVM.
> 
> Where do I put the ESR?
> If you re-order this after the patch that adds the API, you can describe how 
> this can be used.

Ok, thank a lot for your suggestion.

> 
> 
> Thanks,
> 
> James
> 
> 
> 
> > +If this virtual SError is taken to EL1 using AArch64, this value will
> > +be reported into ISS filed of ESR_EL1.
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index
> > 3256b92..38c8a64 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
> > long ext)
> > case KVM_CAP_ARM_PMU_V3:
> > r = kvm_arm_support_pmu_v3();
> > break;
> > +   case KVM_CAP_ARM_INJECT_SERROR_ESR:
> > +   r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
> > +   break;
> > case KVM_CAP_SET_GUEST_DEBUG:
> > case KVM_CAP_VCPU_ATTRIBUTES:
> > r = 1;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index
> > 8fb90a0..3587b33 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {  #define
> > KVM_CAP_S390_AIS_MIGRATION 150  #define KVM_CAP_PPC_GET_CPU_CHAR 151
> > #define KVM_CAP_S390_BPB 152
> > +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> >



Re: [PATCH v10 3/5] arm/arm64: KVM: Introduce set and get per-vcpu event

2018-03-18 Thread gengdongjiu
Hi James,
   Thanks for your review and good suggestion.

> 
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
> > RAS Extension provides VSESR_EL2 register to specify virtual SError
> > syndrome value, this patch adds a new IOCTL to export user-invisible
> > states related to SError exceptions. User space can setup the
> > kvm_vcpu_events to inject specified SError, also it can support live
> > migration.
> 
> > diff --git a/Documentation/virtual/kvm/api.txt
> > b/Documentation/virtual/kvm/api.txt
> > index 8a3d708..26ae151 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -819,11 +819,13 @@ struct kvm_clock_data {
> >
> >  Capability: KVM_CAP_VCPU_EVENTS
> >  Extended by: KVM_CAP_INTR_SHADOW
> > -Architectures: x86
> > +Architectures: x86, arm, arm64
> >  Type: vm ioctl
> >  Parameters: struct kvm_vcpu_event (out)
> >  Returns: 0 on success, -1 on error
> >
> > +X86:
> > +
> >  Gets currently pending exceptions, interrupts, and NMIs as well as
> > related  states of the vcpu.
> >
> > @@ -865,15 +867,29 @@ Only two fields are defined in the flags field:
> >  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
> >smi contains a valid state.
> >
> > +ARM, ARM64:
> > +
> > +Gets currently pending SError exceptions as well as related states of the 
> > vcpu.
> > +
> > +struct kvm_vcpu_events {
> > +   struct {
> > +   bool serror_pending;
> > +   bool serror_has_esr;
> > +   u64 serror_esr;
> > +   } exception;
> > +};
> 
> Don't put bool in an ABI struct. The encoding is up to the compiler.
> The compiler will insert padding in this struct to make serror_esr naturally 
> aligned. Different compilers may do it differently. You'll see that
> the existing struct kvm_vcpu_events has 'pad' fields to ensure each element 
> in the struct is naturally aligned.

I checked the exited x86 strut kvm_vcpu_events definition, it aligned to 32 
bits, so how about using below kvm_vcpu_events struct definition for arm64?
struct kvm_vcpu_events {
   struct {
   __u8_8 serror_pending;
   __u8 serror_has_esr;
   __u8 pad[2];
   __u64 serror_esr;
   } exception;
};

> 
> serror_pending and serror_has_esr need to be in a flags field.
How about this definition?
struct kvm_vcpu_events {
   struct {
   __u8_8 serror_pending;
   __u8 serror_has_esr;
   __u8 pad[2];
   __u64 serror_esr;
   } exception;
};

> 
> I thought the logic for re-using the CAP was so user-space could re-use 
> save/restore code to transfer whatever we put in here during
> migration. If the struct is a different size the code has to be different 
> anyway.
> My understanding of Drew and Christoffer's comments was that we should re-use 
> the existing struct. (but now that I look at it, its not so
> clear).
> 
> (If we reuse the struct, we can put the esr in exception.error_code, if we 
> can get away with it: It would be good to union exception up with
> a u64, then use that. This would let us transfer anything we need in those 
> RES0 bits of the 64bit VSESR_EL2).

It seems Drew and Christoffer's comments suggested to use the 
KVM_GET/SET_VCPU_EVENTS ABI, not suggested arm64 must use the same struct
kvm_vcpu_events definition with x86. 
> 
> 
> >  4.32 KVM_SET_VCPU_EVENTS
> >
> >  Capability: KVM_CAP_VCPU_EVENTS
> >  Extended by: KVM_CAP_INTR_SHADOW
> > -Architectures: x86
> > +Architectures: x86, arm, arm64
> >  Type: vm ioctl
> >  Parameters: struct kvm_vcpu_event (in)
> >  Returns: 0 on success, -1 on error
> >
> > +X86:
> > +
> >  Set pending exceptions, interrupts, and NMIs as well as related
> > states of the  vcpu.
> >
> > @@ -894,6 +910,12 @@ shall be written into the VCPU.
> >
> >  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
> >
> > +ARM, ARM64:
> > +
> > +Set pending SError exceptions as well as related states of the vcpu.
> > +
> > +See KVM_GET_VCPU_EVENTS for the data structure.
> > +
> >
> >  4.33 KVM_GET_DEBUGREGS
> >
> 
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 9abbf30..32c0eae 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -39,6 +39,7 @@
> >  #define __KVM_HAVE_GUEST_DEBUG
> >  #define __KVM_HAVE_IRQ_LINE
> >  #define __KVM_HAVE_READONLY_MEM
> > +#define __KVM_HAVE_VCPU_EVENTS
> >
> >  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> >
> > @@ -153,6 +154,15 @@ struct kvm_sync_regs {  struct
> > kvm_arch_memory_slot {  };
> >
> > +/* for KVM_GET/SET_VCPU_EVENTS */
> > +struct kvm_vcpu_events {
> > +   struct {
> > +   bool serror_pending;
> > +   bool serror_has_esr;
> > +   u64 serror_esr;
> > +   } exception;
> > +};
> > +
> 
> >  /* If you need to interpret the index values, here is the key: */
> >  #define KVM_REG_ARM_COPROC_MASK

Re: [PATCH v10 3/5] arm/arm64: KVM: Introduce set and get per-vcpu event

2018-03-18 Thread gengdongjiu
Hi James,
   Thanks for your review and good suggestion.

> 
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
> > RAS Extension provides VSESR_EL2 register to specify virtual SError
> > syndrome value, this patch adds a new IOCTL to export user-invisible
> > states related to SError exceptions. User space can setup the
> > kvm_vcpu_events to inject specified SError, also it can support live
> > migration.
> 
> > diff --git a/Documentation/virtual/kvm/api.txt
> > b/Documentation/virtual/kvm/api.txt
> > index 8a3d708..26ae151 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -819,11 +819,13 @@ struct kvm_clock_data {
> >
> >  Capability: KVM_CAP_VCPU_EVENTS
> >  Extended by: KVM_CAP_INTR_SHADOW
> > -Architectures: x86
> > +Architectures: x86, arm, arm64
> >  Type: vm ioctl
> >  Parameters: struct kvm_vcpu_event (out)
> >  Returns: 0 on success, -1 on error
> >
> > +X86:
> > +
> >  Gets currently pending exceptions, interrupts, and NMIs as well as
> > related  states of the vcpu.
> >
> > @@ -865,15 +867,29 @@ Only two fields are defined in the flags field:
> >  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
> >smi contains a valid state.
> >
> > +ARM, ARM64:
> > +
> > +Gets currently pending SError exceptions as well as related states of the 
> > vcpu.
> > +
> > +struct kvm_vcpu_events {
> > +   struct {
> > +   bool serror_pending;
> > +   bool serror_has_esr;
> > +   u64 serror_esr;
> > +   } exception;
> > +};
> 
> Don't put bool in an ABI struct. The encoding is up to the compiler.
> The compiler will insert padding in this struct to make serror_esr naturally 
> aligned. Different compilers may do it differently. You'll see that
> the existing struct kvm_vcpu_events has 'pad' fields to ensure each element 
> in the struct is naturally aligned.

I checked the exited x86 strut kvm_vcpu_events definition, it aligned to 32 
bits, so how about using below kvm_vcpu_events struct definition for arm64?
struct kvm_vcpu_events {
   struct {
   __u8_8 serror_pending;
   __u8 serror_has_esr;
   __u8 pad[2];
   __u64 serror_esr;
   } exception;
};

> 
> serror_pending and serror_has_esr need to be in a flags field.
How about this definition?
struct kvm_vcpu_events {
   struct {
   __u8_8 serror_pending;
   __u8 serror_has_esr;
   __u8 pad[2];
   __u64 serror_esr;
   } exception;
};

> 
> I thought the logic for re-using the CAP was so user-space could re-use 
> save/restore code to transfer whatever we put in here during
> migration. If the struct is a different size the code has to be different 
> anyway.
> My understanding of Drew and Christoffer's comments was that we should re-use 
> the existing struct. (but now that I look at it, its not so
> clear).
> 
> (If we reuse the struct, we can put the esr in exception.error_code, if we 
> can get away with it: It would be good to union exception up with
> a u64, then use that. This would let us transfer anything we need in those 
> RES0 bits of the 64bit VSESR_EL2).

It seems Drew and Christoffer's comments suggested to use the 
KVM_GET/SET_VCPU_EVENTS ABI, not suggested arm64 must use the same struct
kvm_vcpu_events definition with x86. 
> 
> 
> >  4.32 KVM_SET_VCPU_EVENTS
> >
> >  Capability: KVM_CAP_VCPU_EVENTS
> >  Extended by: KVM_CAP_INTR_SHADOW
> > -Architectures: x86
> > +Architectures: x86, arm, arm64
> >  Type: vm ioctl
> >  Parameters: struct kvm_vcpu_event (in)
> >  Returns: 0 on success, -1 on error
> >
> > +X86:
> > +
> >  Set pending exceptions, interrupts, and NMIs as well as related
> > states of the  vcpu.
> >
> > @@ -894,6 +910,12 @@ shall be written into the VCPU.
> >
> >  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
> >
> > +ARM, ARM64:
> > +
> > +Set pending SError exceptions as well as related states of the vcpu.
> > +
> > +See KVM_GET_VCPU_EVENTS for the data structure.
> > +
> >
> >  4.33 KVM_GET_DEBUGREGS
> >
> 
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 9abbf30..32c0eae 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -39,6 +39,7 @@
> >  #define __KVM_HAVE_GUEST_DEBUG
> >  #define __KVM_HAVE_IRQ_LINE
> >  #define __KVM_HAVE_READONLY_MEM
> > +#define __KVM_HAVE_VCPU_EVENTS
> >
> >  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> >
> > @@ -153,6 +154,15 @@ struct kvm_sync_regs {  struct
> > kvm_arch_memory_slot {  };
> >
> > +/* for KVM_GET/SET_VCPU_EVENTS */
> > +struct kvm_vcpu_events {
> > +   struct {
> > +   bool serror_pending;
> > +   bool serror_has_esr;
> > +   u64 serror_esr;
> > +   } exception;
> > +};
> > +
> 
> >  /* If you need to interpret the index values, here is the key: */
> >  #define KVM_REG_ARM_COPROC_MASK

Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value

2018-03-17 Thread gengdongjiu
Hi James,

> 
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
> > Export one API to specify virtual SEI syndrome value for guest, and
> > add a helper to get the VSESR_EL2 value.
> 
> This patch adds two helpers that nothing calls... its not big, please merge 
> it with the patch that uses these.

OK, thanks, I will merge them.

> 
> 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index 413dc82..3294885 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, 
> > unsigned long hcr)
> > vcpu->arch.hcr_el2 = hcr;
> >  }
> >
> > +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu) {
> > +   return vcpu->arch.vsesr_el2;
> > +}
> > +
> >  static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
> > {
> > vcpu->arch.vsesr_el2 = vsesr;
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> > b/arch/arm64/include/asm/kvm_host.h
> > index a73f63a..3dc49b7 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu,
> > struct kvm_run *run,  int kvm_perf_init(void);  int
> > kvm_perf_teardown(void);
> >
> > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
> > +
> >  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long
> > mpidr);
> >
> >  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, diff
> > --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > index 60666a0..78ecb28 100644
> > --- a/arch/arm64/kvm/inject_fault.c
> > +++ b/arch/arm64/kvm/inject_fault.c
> > @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)  {
> > pend_guest_serror(vcpu, ESR_ELx_ISV);  }
> > +
> > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome) {
> > +   pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK);
> 
> If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at 
> all.
> 
> It would be better if any validation were in the user-space helpers so we can 
> check user-space hasn't put something funny in the top bits.

Yes, I agree your idea, thanks for the good suggestion.

> 
> > +}
> >
> 
> 
> Thanks,
> 
> James


Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value

2018-03-17 Thread gengdongjiu
Hi James,

> 
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
> > Export one API to specify virtual SEI syndrome value for guest, and
> > add a helper to get the VSESR_EL2 value.
> 
> This patch adds two helpers that nothing calls... its not big, please merge 
> it with the patch that uses these.

OK, thanks, I will merge them.

> 
> 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index 413dc82..3294885 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, 
> > unsigned long hcr)
> > vcpu->arch.hcr_el2 = hcr;
> >  }
> >
> > +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu) {
> > +   return vcpu->arch.vsesr_el2;
> > +}
> > +
> >  static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
> > {
> > vcpu->arch.vsesr_el2 = vsesr;
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> > b/arch/arm64/include/asm/kvm_host.h
> > index a73f63a..3dc49b7 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu,
> > struct kvm_run *run,  int kvm_perf_init(void);  int
> > kvm_perf_teardown(void);
> >
> > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
> > +
> >  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long
> > mpidr);
> >
> >  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, diff
> > --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > index 60666a0..78ecb28 100644
> > --- a/arch/arm64/kvm/inject_fault.c
> > +++ b/arch/arm64/kvm/inject_fault.c
> > @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)  {
> > pend_guest_serror(vcpu, ESR_ELx_ISV);  }
> > +
> > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome) {
> > +   pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK);
> 
> If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at 
> all.
> 
> It would be better if any validation were in the user-space helpers so we can 
> check user-space hasn't put something funny in the top bits.

Yes, I agree your idea, thanks for the good suggestion.

> 
> > +}
> >
> 
> 
> Thanks,
> 
> James


Re: [PATCH] arm64: rename the function arm64_is_ras_serror() to avoid confusion

2018-03-17 Thread gengdongjiu
Hi James,

> Hi gengdongjiu,
> 
> On 26/02/18 16:13, gengdongjiu wrote:
> > 2018-02-24 1:58 GMT+08:00 James Morse <james.mo...@arm.com>:
> >> On 22/02/18 18:02, Dongjiu Geng wrote:
> >>> The RAS SError Syndrome can be Implementation-Defined,
> >>> arm64_is_ras_serror() is used to judge whether it is RAS SError, but
> >>> arm64_is_ras_serror() does not include this judgement. In order to
> >>> avoid function name confusion, we rename the arm64_is_ras_serror()
> >>> to arm64_is_categorized_ras_serror(), this function is used to judge
> >>> whether it is categorized RAS Serror.
> >>
> >> I don't see how 'categorized' is relevant. The most significant ISS
> >> bit is used to determine if this is an IMP-DEF ESR, or one that uses the 
> >> architected layout.
> >
> > From the name arm64_is_ras_serror(), it used to judge whether this is
> > RAS Serror, but arm64_is_ras_serror() think the IMP-DEF SError is not
> > RAS SError, as shown the code note and code in[1].
> 
> > In fact the IMP-DEF SError is also RAS SError, so when I read the
> > code, it looks like
> 
> This is just you then. No-one else has your imp-def:RAS error ESR values.
> 
> This would be like me adding some impdef branch instruction, then claiming
> aarch64_insn_is_branch() doesn't take account of my private additions.
> 
> I agree the name is assuming all architected ESR are RAS-errors, and that 
> impdef ESR are just that: impdef, that's all we know about them.
> Unless this causes us to do the wrong thing, I don't think it matters.
> Obviously we would need to change it if a new architected ESR is added.

Ok, let us keep the current code and not change it until a new architected ESR 
is added


Re: [PATCH] arm64: rename the function arm64_is_ras_serror() to avoid confusion

2018-03-17 Thread gengdongjiu
Hi James,

> Hi gengdongjiu,
> 
> On 26/02/18 16:13, gengdongjiu wrote:
> > 2018-02-24 1:58 GMT+08:00 James Morse :
> >> On 22/02/18 18:02, Dongjiu Geng wrote:
> >>> The RAS SError Syndrome can be Implementation-Defined,
> >>> arm64_is_ras_serror() is used to judge whether it is RAS SError, but
> >>> arm64_is_ras_serror() does not include this judgement. In order to
> >>> avoid function name confusion, we rename the arm64_is_ras_serror()
> >>> to arm64_is_categorized_ras_serror(), this function is used to judge
> >>> whether it is categorized RAS Serror.
> >>
> >> I don't see how 'categorized' is relevant. The most significant ISS
> >> bit is used to determine if this is an IMP-DEF ESR, or one that uses the 
> >> architected layout.
> >
> > From the name arm64_is_ras_serror(), it used to judge whether this is
> > RAS Serror, but arm64_is_ras_serror() think the IMP-DEF SError is not
> > RAS SError, as shown the code note and code in[1].
> 
> > In fact the IMP-DEF SError is also RAS SError, so when I read the
> > code, it looks like
> 
> This is just you then. No-one else has your imp-def:RAS error ESR values.
> 
> This would be like me adding some impdef branch instruction, then claiming
> aarch64_insn_is_branch() doesn't take account of my private additions.
> 
> I agree the name is assuming all architected ESR are RAS-errors, and that 
> impdef ESR are just that: impdef, that's all we know about them.
> Unless this causes us to do the wrong thing, I don't think it matters.
> Obviously we would need to change it if a new architected ESR is added.

Ok, let us keep the current code and not change it until a new architected ESR 
is added


Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value

2018-03-16 Thread gengdongjiu
Hi James,
   Thank you very much for this mail and your time to review this patch. 
Appreciate that.
I will check it and reply.


On 2018/3/16 4:37, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
>> Export one API to specify virtual SEI syndrome value
>> for guest, and add a helper to get the VSESR_EL2 value.
> 
> This patch adds two helpers that nothing calls... its not big, please merge it
> with the patch that uses these.
> 
> 
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index 413dc82..3294885 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, 
>> unsigned long hcr)
>>  vcpu->arch.hcr_el2 = hcr;
>>  }
>>  
>> +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.vsesr_el2;
>> +}
>> +
>>  static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
>>  {
>>  vcpu->arch.vsesr_el2 = vsesr;
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index a73f63a..3dc49b7 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct 
>> kvm_run *run,
>>  int kvm_perf_init(void);
>>  int kvm_perf_teardown(void);
>>  
>> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>> +
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>  
>>  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>> index 60666a0..78ecb28 100644
>> --- a/arch/arm64/kvm/inject_fault.c
>> +++ b/arch/arm64/kvm/inject_fault.c
>> @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)
>>  {
>>  pend_guest_serror(vcpu, ESR_ELx_ISV);
>>  }
>> +
>> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome)
>> +{
>> +pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK);
> 
> If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at 
> all.
> 
> It would be better if any validation were in the user-space helpers so we can
> check user-space hasn't put something funny in the top bits.
> 
>> +}
>>
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value

2018-03-16 Thread gengdongjiu
Hi James,
   Thank you very much for this mail and your time to review this patch. 
Appreciate that.
I will check it and reply.


On 2018/3/16 4:37, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
>> Export one API to specify virtual SEI syndrome value
>> for guest, and add a helper to get the VSESR_EL2 value.
> 
> This patch adds two helpers that nothing calls... its not big, please merge it
> with the patch that uses these.
> 
> 
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index 413dc82..3294885 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, 
>> unsigned long hcr)
>>  vcpu->arch.hcr_el2 = hcr;
>>  }
>>  
>> +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.vsesr_el2;
>> +}
>> +
>>  static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
>>  {
>>  vcpu->arch.vsesr_el2 = vsesr;
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index a73f63a..3dc49b7 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct 
>> kvm_run *run,
>>  int kvm_perf_init(void);
>>  int kvm_perf_teardown(void);
>>  
>> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>> +
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>  
>>  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>> index 60666a0..78ecb28 100644
>> --- a/arch/arm64/kvm/inject_fault.c
>> +++ b/arch/arm64/kvm/inject_fault.c
>> @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)
>>  {
>>  pend_guest_serror(vcpu, ESR_ELx_ISV);
>>  }
>> +
>> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome)
>> +{
>> +pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK);
> 
> If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at 
> all.
> 
> It would be better if any validation were in the user-space helpers so we can
> check user-space hasn't put something funny in the top bits.
> 
>> +}
>>
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-03-07 Thread gengdongjiu
Hi James,
   sorry for my late response due to chines new year.

2018-02-16 1:55 GMT+08:00 James Morse <james.mo...@arm.com>:
> Hi gengdongjiu,
>
> On 12/02/18 10:19, gengdongjiu wrote:
>> On 2018/2/10 1:44, James Morse wrote:
>>> The point? We can't know what a CPU without the RAS extensions puts in 
>>> there.
>>>
>>> Why Does this matter? When migrating a pending SError we have to know the
>>> difference between 'use this 64bit value', and 'the CPU will generate it'.
>>> If I make an SError pending with ESR=0 on a CPU with VSESR, I can't 
>>> migrated to
>>> a system that generates an impdef SError-ESR, because I can't know it will 
>>> be 0.
>
>> For the target system, before taking the SError, no one can know whether its 
>> syndrome value
>> is IMPLEMENTATION DEFINED or architecturally defined.
>
> For a virtual-SError, the hypervisor knows what it generated. (do I have
> VSESR_EL2? What did I put in there?).
>
>
>> when the virtual SError is taken, the ESR_ELx.IDS will be updated, then we 
>> can know
>> whether the ESR value is impdef or architecturally defined.
>
> True, the guest can't know anything about a pending virtual SError until it
> takes it. Why is this a problem?
>
>
>> It seems migration is only allowed only when target system and source system 
>> all support
>> RAS extension, because we do not know whether its syndrome is IMPLEMENTATION 
>> DEFINED or
>> architecturally defined.
>
> I don't think Qemu allows migration between hosts with differing guest-ID
> registers. But we shouldn't depend on this, and we may want to hide the v8.2 
> RAS
> features from the guest's ID register, but still use them from the host.
>
> The way I imagined it working was we would pack the following information into
> that events struct:
> {
> bool serror_pending;
> bool serror_has_esr;
> u64  serror_esr;
> }

I have used your suggestion struct

>
> The problem I was trying to describe is because there is no value of 
> serror_esr
> we can use to mean 'Ignore this, I'm a v8.0 CPU'. VSESR_EL2 is a 64bit 
> register,
> any bits we abuse may get a meaning we want to use in the future.
>
> When it comes to migration, v8.{0,1} systems can only GET/SET events where
> serror_has_esr == false, they can't use the serror_esr. On v8.2 systems we
> should require serror_has_esr to be true.
yes, I agreed.

>
> If we need to support migration from v8.{0,1} to v8.2, we can make up an 
> impdef
> serror_esr.

For the Qemu migration, I need to check more the QEMU code.


Hi Andrew,
  I use KVM_GET/SET_VCPU_EVENTS IOCTL to migrate the Serror
exception status of VM,
The even struct is shown below:

{
  bool serror_pending;
  bool serror_has_esr;
 u64  serror_esr;
}

Only when the target machine is armv8.2, it needs to set the
serror_esr(SError Exception Syndrome Register).
for the armv8.0,  software can not set the serror_esr(SError Exception
Syndrome Register).
so when migration from v8.{0,1} to v8.2, QEMU should make up an impdef
serror_esr for the v8.2 target.
can you give me some suggestion how to set that register in the QEMU?
I do not familar with the QEMU migration.
Thanks very much.

>
> We will need to decide what KVM does when SET is called but an SError was
> already pending. 2.5.3 "Multiple SError interrupts" of [0] has something to 
> say.

how about KVM set again to the same VCPU?

>
>
> Happy new year,
thanks!

>
> James
>
>
> [0]
> https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-03-07 Thread gengdongjiu
Hi James,
   sorry for my late response due to chines new year.

2018-02-16 1:55 GMT+08:00 James Morse :
> Hi gengdongjiu,
>
> On 12/02/18 10:19, gengdongjiu wrote:
>> On 2018/2/10 1:44, James Morse wrote:
>>> The point? We can't know what a CPU without the RAS extensions puts in 
>>> there.
>>>
>>> Why Does this matter? When migrating a pending SError we have to know the
>>> difference between 'use this 64bit value', and 'the CPU will generate it'.
>>> If I make an SError pending with ESR=0 on a CPU with VSESR, I can't 
>>> migrated to
>>> a system that generates an impdef SError-ESR, because I can't know it will 
>>> be 0.
>
>> For the target system, before taking the SError, no one can know whether its 
>> syndrome value
>> is IMPLEMENTATION DEFINED or architecturally defined.
>
> For a virtual-SError, the hypervisor knows what it generated. (do I have
> VSESR_EL2? What did I put in there?).
>
>
>> when the virtual SError is taken, the ESR_ELx.IDS will be updated, then we 
>> can know
>> whether the ESR value is impdef or architecturally defined.
>
> True, the guest can't know anything about a pending virtual SError until it
> takes it. Why is this a problem?
>
>
>> It seems migration is only allowed only when target system and source system 
>> all support
>> RAS extension, because we do not know whether its syndrome is IMPLEMENTATION 
>> DEFINED or
>> architecturally defined.
>
> I don't think Qemu allows migration between hosts with differing guest-ID
> registers. But we shouldn't depend on this, and we may want to hide the v8.2 
> RAS
> features from the guest's ID register, but still use them from the host.
>
> The way I imagined it working was we would pack the following information into
> that events struct:
> {
> bool serror_pending;
> bool serror_has_esr;
> u64  serror_esr;
> }

I have used your suggestion struct

>
> The problem I was trying to describe is because there is no value of 
> serror_esr
> we can use to mean 'Ignore this, I'm a v8.0 CPU'. VSESR_EL2 is a 64bit 
> register,
> any bits we abuse may get a meaning we want to use in the future.
>
> When it comes to migration, v8.{0,1} systems can only GET/SET events where
> serror_has_esr == false, they can't use the serror_esr. On v8.2 systems we
> should require serror_has_esr to be true.
yes, I agreed.

>
> If we need to support migration from v8.{0,1} to v8.2, we can make up an 
> impdef
> serror_esr.

For the Qemu migration, I need to check more the QEMU code.


Hi Andrew,
  I use KVM_GET/SET_VCPU_EVENTS IOCTL to migrate the Serror
exception status of VM,
The even struct is shown below:

{
  bool serror_pending;
  bool serror_has_esr;
 u64  serror_esr;
}

Only when the target machine is armv8.2, it needs to set the
serror_esr(SError Exception Syndrome Register).
for the armv8.0,  software can not set the serror_esr(SError Exception
Syndrome Register).
so when migration from v8.{0,1} to v8.2, QEMU should make up an impdef
serror_esr for the v8.2 target.
can you give me some suggestion how to set that register in the QEMU?
I do not familar with the QEMU migration.
Thanks very much.

>
> We will need to decide what KVM does when SET is called but an SError was
> already pending. 2.5.3 "Multiple SError interrupts" of [0] has something to 
> say.

how about KVM set again to the same VCPU?

>
>
> Happy new year,
thanks!

>
> James
>
>
> [0]
> https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: rename the function arm64_is_ras_serror() to avoid confusion

2018-02-26 Thread gengdongjiu
Hi James,
   Thanks a lot for your review.

2018-02-24 1:58 GMT+08:00 James Morse :
> Hi Dongjiu Geng,
>
> On 22/02/18 18:02, Dongjiu Geng wrote:
>> The RAS SError Syndrome can be Implementation-Defined,
>> arm64_is_ras_serror() is used to judge whether it is RAS SError,
>> but arm64_is_ras_serror() does not include this judgement. In order
>> to avoid function name confusion, we rename the arm64_is_ras_serror()
>> to arm64_is_categorized_ras_serror(), this function is used to
>> judge whether it is categorized RAS Serror.
>
> I don't see how 'categorized' is relevant. The most significant ISS bit is 
> used
> to determine if this is an IMP-DEF ESR, or one that uses the architected 
> layout.

>From the name arm64_is_ras_serror(), it used to judge whether this is
RAS Serror,
but arm64_is_ras_serror() think the IMP-DEF SError is not RAS SError,
as shown the code note
and code in[1]. In fact the IMP-DEF SError is also RAS SError, so when
I read the code, it looks like
confusion, so I rename it to arm64_is_categorized_ras_serror(), then
this function is only used to
judge whether this is categorized RAS SError,
if it is categorized, the code will continue judge its Asynchronous Error Type.
if it is uncategorized, the code will panic(this is the original code
logic) or not panic when we support kernel-first or can isolate the
SError

[1]:
 /*
  *  ...
   *CPUs with the RAS extensions have an Implementation-Defined-Syndrome bit
- * If this bit is set, we know its not a RAS SError.
  * 
...
  */
static inline bool arm64_is_ras_serror(u32 esr)
 {
WARN_ON(preemptible());

if (esr & ESR_ELx_IDS)
return false;
.
}

>
> DFSC of zero means uncategorised. What should we do with an uncategorised 
> SError?

yes, so it returns false because it is uncategorised.

>
> From 2.4.3 "ESB and other physical errors" of [0]
> | It is IMPLEMENTATION DEFINED whether [..]uncategorized SError interrupts
> | are containable or Uncontainable, and whether they can be synchronized by an
> | Error Synchronization Barrier.
>
> We treat uncategorized as uncontainable as that's the worst thing it could be.

in original code, system will panic if it is uncategorized, I do not
change this logic.
 I only rename this function name to make it less confusion
it no need to panic for uncontainable if we can isolate this error.


>
> On aarch32 uncontainable and uncategorized even share an encoding.
> (AET:00 in G7.2.43 "DFSR, Data Fault Status Register", 'state of the PE after
> taking the SError interrupt exception')
>
>
>> Change some code notes, unrecoverable RAS errors is imprecise, but
>> Recoverable RAS errors is precise.
>
> Unrecoverable and Recoverable (but we don't know where it is) are both grouped
> into 'can't make progress'. The comment says:
> | The exception may have been imprecise.
> because one of those two is imprecise.
>
> If anyone cares which one is which, they can read the spec.
>
> I expect this code will one day be replaced with proper kernel-first support,
> its only here to avoid panic()ing due to corrected errors.

Ok, thanks for the explanation.


>
>
> Thanks,
>
> James
>
> [0]
> https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf


Re: [PATCH] arm64: rename the function arm64_is_ras_serror() to avoid confusion

2018-02-26 Thread gengdongjiu
Hi James,
   Thanks a lot for your review.

2018-02-24 1:58 GMT+08:00 James Morse :
> Hi Dongjiu Geng,
>
> On 22/02/18 18:02, Dongjiu Geng wrote:
>> The RAS SError Syndrome can be Implementation-Defined,
>> arm64_is_ras_serror() is used to judge whether it is RAS SError,
>> but arm64_is_ras_serror() does not include this judgement. In order
>> to avoid function name confusion, we rename the arm64_is_ras_serror()
>> to arm64_is_categorized_ras_serror(), this function is used to
>> judge whether it is categorized RAS Serror.
>
> I don't see how 'categorized' is relevant. The most significant ISS bit is 
> used
> to determine if this is an IMP-DEF ESR, or one that uses the architected 
> layout.

>From the name arm64_is_ras_serror(), it used to judge whether this is
RAS Serror,
but arm64_is_ras_serror() think the IMP-DEF SError is not RAS SError,
as shown the code note
and code in[1]. In fact the IMP-DEF SError is also RAS SError, so when
I read the code, it looks like
confusion, so I rename it to arm64_is_categorized_ras_serror(), then
this function is only used to
judge whether this is categorized RAS SError,
if it is categorized, the code will continue judge its Asynchronous Error Type.
if it is uncategorized, the code will panic(this is the original code
logic) or not panic when we support kernel-first or can isolate the
SError

[1]:
 /*
  *  ...
   *CPUs with the RAS extensions have an Implementation-Defined-Syndrome bit
- * If this bit is set, we know its not a RAS SError.
  * 
...
  */
static inline bool arm64_is_ras_serror(u32 esr)
 {
WARN_ON(preemptible());

if (esr & ESR_ELx_IDS)
return false;
.
}

>
> DFSC of zero means uncategorised. What should we do with an uncategorised 
> SError?

yes, so it returns false because it is uncategorised.

>
> From 2.4.3 "ESB and other physical errors" of [0]
> | It is IMPLEMENTATION DEFINED whether [..]uncategorized SError interrupts
> | are containable or Uncontainable, and whether they can be synchronized by an
> | Error Synchronization Barrier.
>
> We treat uncategorized as uncontainable as that's the worst thing it could be.

in original code, system will panic if it is uncategorized, I do not
change this logic.
 I only rename this function name to make it less confusion
it no need to panic for uncontainable if we can isolate this error.


>
> On aarch32 uncontainable and uncategorized even share an encoding.
> (AET:00 in G7.2.43 "DFSR, Data Fault Status Register", 'state of the PE after
> taking the SError interrupt exception')
>
>
>> Change some code notes, unrecoverable RAS errors is imprecise, but
>> Recoverable RAS errors is precise.
>
> Unrecoverable and Recoverable (but we don't know where it is) are both grouped
> into 'can't make progress'. The comment says:
> | The exception may have been imprecise.
> because one of those two is imprecise.
>
> If anyone cares which one is which, they can read the spec.
>
> I expect this code will one day be replaced with proper kernel-first support,
> its only here to avoid panic()ing due to corrected errors.

Ok, thanks for the explanation.


>
>
> Thanks,
>
> James
>
> [0]
> https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf


Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-02-12 Thread gengdongjiu
Hi James,
 Thanks for the mail.

On 2018/2/10 1:44, James Morse wrote:
[...]
> 
>> its ESR is 0, can not control the virtual SError's syndrom value, it does 
>> not have
>> such registers to control that.
> 
> My point was its more nuanced than this: the ARM-ARM's
> TakeVirtualSErrorException() pseudo-code lets virtual-SError have an
> implementation defined syndrome. I've never seen Juno generate anything other
> than '0', but it might do something different on a thursday.

I checked the "aarch64/exceptions/asynch/AArch64.TakeVirtualSErrorException",
you are right, the virtual SError's syndrome value can be 0 or implementation 
defined value, not always 0,
which is decided by the "exception.syndrome<24>".
thanks for the clarification.

> 
> The point? We can't know what a CPU without the RAS extensions puts in there.
> 
> Why Does this matter? When migrating a pending SError we have to know the
> difference between 'use this 64bit value', and 'the CPU will generate it'.
> If I make an SError pending with ESR=0 on a CPU with VSESR, I can't migrated 
> to
> a system that generates an impdef SError-ESR, because I can't know it will be 
> 0.
Yes,
But it will have a issue,
For the target system, before taking the SError, no one can know whether its 
syndrome value
is IMPLEMENTATION DEFINED or architecturally defined.

when the virtual SError is taken, the ESR_ELx.IDS will be updated, then we can 
know
whether the ESR value is impdef or architecturally defined.

It seems migration is only allowed only when target system and source system 
all support RAS extension, because we do not know
whether its syndrome is IMPLEMENTATION DEFINED or architecturally defined.

> 
> 
>> Does Juno not have RAS Extension? 
> 
> It's two types of v8.0 CPU, no RAS extensions.
> 
> 
>> if yes, I think we can only inject an SError, but can not change its ESR 
>> value,
>> because it does not have vsesr_el2.
> 
> I agree, this means we need to be able to tell the difference between 
> 'pending'
> and 'pending with this ESR'.
> 
> 
>>> Just because we can't control the ESR doesn't mean injecting an SError isn't
>>> something user-space may want to do.
> 
>> yes, we may need to support user-space injects an SError even through CPU 
>> does not have RAS Extension.
>>
>>> If we tackle migration of pending-SError first, I think that will give us 
>>> the
>>> API to create a new pending SError with/without an ESR as appropriate.
>>
>> sure, we should. Last week, I checked the KVM_GET/SET_VCPU_EVENTS IOCTL, it 
>> should meet our
>> migration requirements
> 
> Great!
> 
> 
> Thanks,
> 
> James
> 
> 



Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-02-12 Thread gengdongjiu
Hi James,
 Thanks for the mail.

On 2018/2/10 1:44, James Morse wrote:
[...]
> 
>> its ESR is 0, can not control the virtual SError's syndrom value, it does 
>> not have
>> such registers to control that.
> 
> My point was its more nuanced than this: the ARM-ARM's
> TakeVirtualSErrorException() pseudo-code lets virtual-SError have an
> implementation defined syndrome. I've never seen Juno generate anything other
> than '0', but it might do something different on a thursday.

I checked the "aarch64/exceptions/asynch/AArch64.TakeVirtualSErrorException",
you are right, the virtual SError's syndrome value can be 0 or implementation 
defined value, not always 0,
which is decided by the "exception.syndrome<24>".
thanks for the clarification.

> 
> The point? We can't know what a CPU without the RAS extensions puts in there.
> 
> Why Does this matter? When migrating a pending SError we have to know the
> difference between 'use this 64bit value', and 'the CPU will generate it'.
> If I make an SError pending with ESR=0 on a CPU with VSESR, I can't migrated 
> to
> a system that generates an impdef SError-ESR, because I can't know it will be 
> 0.
Yes,
But it will have a issue,
For the target system, before taking the SError, no one can know whether its 
syndrome value
is IMPLEMENTATION DEFINED or architecturally defined.

when the virtual SError is taken, the ESR_ELx.IDS will be updated, then we can 
know
whether the ESR value is impdef or architecturally defined.

It seems migration is only allowed only when target system and source system 
all support RAS extension, because we do not know
whether its syndrome is IMPLEMENTATION DEFINED or architecturally defined.

> 
> 
>> Does Juno not have RAS Extension? 
> 
> It's two types of v8.0 CPU, no RAS extensions.
> 
> 
>> if yes, I think we can only inject an SError, but can not change its ESR 
>> value,
>> because it does not have vsesr_el2.
> 
> I agree, this means we need to be able to tell the difference between 
> 'pending'
> and 'pending with this ESR'.
> 
> 
>>> Just because we can't control the ESR doesn't mean injecting an SError isn't
>>> something user-space may want to do.
> 
>> yes, we may need to support user-space injects an SError even through CPU 
>> does not have RAS Extension.
>>
>>> If we tackle migration of pending-SError first, I think that will give us 
>>> the
>>> API to create a new pending SError with/without an ESR as appropriate.
>>
>> sure, we should. Last week, I checked the KVM_GET/SET_VCPU_EVENTS IOCTL, it 
>> should meet our
>> migration requirements
> 
> Great!
> 
> 
> Thanks,
> 
> James
> 
> 



Re: [PATCH v5 1/3] arm64/ras: support sea error recovery

2018-02-08 Thread gengdongjiu


On 2018/2/8 3:03, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 30/01/18 19:19, James Morse wrote:
>> On 26/01/18 12:31, Xie XiuQi wrote:
>>> With ARM v8.2 RAS Extension, SEA are usually triggered when memory errors
>>> are consumed. According to the existing process, errors occurred in the
>>> kernel, leading to direct panic, if it occurred the user-space, we should
>>> just kill process.
>>>
>>> But there is a class of error, in fact, is not necessary to kill
>>> process, you can recover and continue to run the process. Such as
>>> the instruction data corrupted, where the memory page might be
>>> read-only, which is has not been modified, the disk might have the
>>> correct data, so you can directly drop the page, ant reload it when
>>> necessary.
>>
>> With firmware-first support, we do all this...
>>
>>
>>> So this patchset is just try to solve such problem: if the error is
>>> consumed in user-space and the error occurs on a clean page, you can
>>> directly drop the memory page without killing process.
>>>
>>> If the corrupted page is clean, just dropped it and return to user-space
>>> without side effects. And if corrupted page is dirty, memory_failure()
>>> will send SIGBUS with code=BUS_MCEERR_AR. While without this patchset,
>>> do_sea() will just send SIGBUS, so the process was killed in the same place.
>>
>> ... but this happens too. I agree its something we should fix, but I don't 
>> think
>> this is the best way to do it.
>>
>> This series is pulling the memory-failure-queue details back into the 
>> arch-code
>> to build a second list, that gets processed as extra work when we return to
>> user-space.
>>
>>
>> The root of the issue is ghes_notify_sea() claims the notification as 
>> something
>> APEI has dealt with, ... but it hasn't done it yet. The signals will be
>> generated by something currently stuck in a queue. (Evidently x86 doesn't 
>> handle
>> synchronous errors like this using firmware-first).
>>
>> I think a smaller fix is to give the queues that may be holding the
>> memory_failure() work a kick as part of the code that calls 
>> ghes_notify_sea().
>> This means that by the time we return to do_sea() ghes_notify_sea()'s claim 
>> that
>> APEI has dealt with it is true as any generated signals are pending. We can 
>> then
>> skip the existing SIGBUS generation code.
>>
>>
>>> Because memory_failure() may sleep, we can not call it directly in SEA
>>
>> (this one is more serious, I've attempted to fix it by moving all NMI-like
>> GHES-notifications to use the estatus queue).
>>
>>
>>> exception context. So we saved faulting physical address associated with
>>> a process in the ghes handler and set __TIF_SEA_NOTIFY. When we return
>>> from SEA exception context and get into do_notify_resume() before the
>>> process running, we could check it and call memory_failure() to do
>>> recovery.
>>
>>> It's safe, because we are in process context.
>>
>> I think this is the trick. When we take a Synchronous-external-abort out of
>> userspace, we're in process context too. We can add helpers to drain the
>> memory_failure_queue which can be called when do_sea() when we know we're
>> preemptible and interrupts-et-al are unmasked.
> 
> Something like... base on [0], in arch/arm64/kernel/acpi.c:
> -%<-
> int apei_claim_sea(struct pt_regs *regs)
> {
> int cpu;
> int err = -ENOENT;
> unsigned long current_flags = arch_local_save_flags();
> unsigned long interrupted_flags = current_flags;
> 
> if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> return err;
> 
> if (regs)
> interrupted_flags = regs->pstate;
> 
> /*
>  * APEI expects an NMI-like notification to always be called
>  * in NMI context.
>  */
> local_daif_restore(DAIF_ERRCTX);
> nmi_enter();
> err = ghes_notify_sea();
> cpu = smp_processor_id();
> nmi_exit();
> 
> /*
>  * APEI NMI-like notifications are deferred to irq_work. Unless
>  * we interrupted irqs-masked code, we can do that now.
>  */
> if (!err) {
> if (!arch_irqs_disabled_flags(interrupted_flags)) {
> local_daif_restore(DAIF_PROCCTX_NOIRQ);
> irq_work_run();
> } else {
> err = -EINPROGRESS;
> }
> }
> 
> local_daif_restore(current_flags);
> 
> if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE) && !err) {
> /*
>  * Memory failure work is scheduled on the local CPU.
>  * If we interrupted userspace, or are in process context
>  * we can do that now.
>  */
> if ((regs && !user_mode(regs)) || !preemptible())
> err = -EINPROGRESS;
> else
> memory_failure_queue_kick(cpu);
> 

Re: [PATCH v5 1/3] arm64/ras: support sea error recovery

2018-02-08 Thread gengdongjiu


On 2018/2/8 3:03, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 30/01/18 19:19, James Morse wrote:
>> On 26/01/18 12:31, Xie XiuQi wrote:
>>> With ARM v8.2 RAS Extension, SEA are usually triggered when memory errors
>>> are consumed. According to the existing process, errors occurred in the
>>> kernel, leading to direct panic, if it occurred the user-space, we should
>>> just kill process.
>>>
>>> But there is a class of error, in fact, is not necessary to kill
>>> process, you can recover and continue to run the process. Such as
>>> the instruction data corrupted, where the memory page might be
>>> read-only, which is has not been modified, the disk might have the
>>> correct data, so you can directly drop the page, ant reload it when
>>> necessary.
>>
>> With firmware-first support, we do all this...
>>
>>
>>> So this patchset is just try to solve such problem: if the error is
>>> consumed in user-space and the error occurs on a clean page, you can
>>> directly drop the memory page without killing process.
>>>
>>> If the corrupted page is clean, just dropped it and return to user-space
>>> without side effects. And if corrupted page is dirty, memory_failure()
>>> will send SIGBUS with code=BUS_MCEERR_AR. While without this patchset,
>>> do_sea() will just send SIGBUS, so the process was killed in the same place.
>>
>> ... but this happens too. I agree its something we should fix, but I don't 
>> think
>> this is the best way to do it.
>>
>> This series is pulling the memory-failure-queue details back into the 
>> arch-code
>> to build a second list, that gets processed as extra work when we return to
>> user-space.
>>
>>
>> The root of the issue is ghes_notify_sea() claims the notification as 
>> something
>> APEI has dealt with, ... but it hasn't done it yet. The signals will be
>> generated by something currently stuck in a queue. (Evidently x86 doesn't 
>> handle
>> synchronous errors like this using firmware-first).
>>
>> I think a smaller fix is to give the queues that may be holding the
>> memory_failure() work a kick as part of the code that calls 
>> ghes_notify_sea().
>> This means that by the time we return to do_sea() ghes_notify_sea()'s claim 
>> that
>> APEI has dealt with it is true as any generated signals are pending. We can 
>> then
>> skip the existing SIGBUS generation code.
>>
>>
>>> Because memory_failure() may sleep, we can not call it directly in SEA
>>
>> (this one is more serious, I've attempted to fix it by moving all NMI-like
>> GHES-notifications to use the estatus queue).
>>
>>
>>> exception context. So we saved faulting physical address associated with
>>> a process in the ghes handler and set __TIF_SEA_NOTIFY. When we return
>>> from SEA exception context and get into do_notify_resume() before the
>>> process running, we could check it and call memory_failure() to do
>>> recovery.
>>
>>> It's safe, because we are in process context.
>>
>> I think this is the trick. When we take a Synchronous-external-abort out of
>> userspace, we're in process context too. We can add helpers to drain the
>> memory_failure_queue which can be called when do_sea() when we know we're
>> preemptible and interrupts-et-al are unmasked.
> 
> Something like... base on [0], in arch/arm64/kernel/acpi.c:
> -%<-
> int apei_claim_sea(struct pt_regs *regs)
> {
> int cpu;
> int err = -ENOENT;
> unsigned long current_flags = arch_local_save_flags();
> unsigned long interrupted_flags = current_flags;
> 
> if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> return err;
> 
> if (regs)
> interrupted_flags = regs->pstate;
> 
> /*
>  * APEI expects an NMI-like notification to always be called
>  * in NMI context.
>  */
> local_daif_restore(DAIF_ERRCTX);
> nmi_enter();
> err = ghes_notify_sea();
> cpu = smp_processor_id();
> nmi_exit();
> 
> /*
>  * APEI NMI-like notifications are deferred to irq_work. Unless
>  * we interrupted irqs-masked code, we can do that now.
>  */
> if (!err) {
> if (!arch_irqs_disabled_flags(interrupted_flags)) {
> local_daif_restore(DAIF_PROCCTX_NOIRQ);
> irq_work_run();
> } else {
> err = -EINPROGRESS;
> }
> }
> 
> local_daif_restore(current_flags);
> 
> if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE) && !err) {
> /*
>  * Memory failure work is scheduled on the local CPU.
>  * If we interrupted userspace, or are in process context
>  * we can do that now.
>  */
> if ((regs && !user_mode(regs)) || !preemptible())
> err = -EINPROGRESS;
> else
> memory_failure_queue_kick(cpu);
> 

Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-02-05 Thread gengdongjiu
[...]
> 
> > Yes, I know you are dong that. Your serial's patch will consider all above
> things, right?
> 
> Assuming I got it right, yes. It currently makes the race Xie XiuQi spotted 
> worse,
> which I want to fix too. (details on the cover letter)

Ok.

> 
> 
> > If your patch can be consider that, this patch can based on your patchset.
> thanks.
> 
> I'd like to pick these patches onto the end of that series, but first I want 
> to
> know what NOTIFY_SEI means for any OS. The ACPI spec doesn't say, and
> because its asynchronous, route-able and mask-able, there are many more
> corners than NOTFIY_SEA.
> 
> This thing is a notification using an emulated SError exception. (emulated
> because physical-SError must be routed to EL3 for firmware-first, and
> virtual-SError belongs to EL2).
> 
> Does your firmware emulate SError exactly as the TakeException() pseudo code
> in the Arm-Arm?

Yes, it is.

> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
> TGE}?

Yes, it is.

> What does your firmware do when it wants to emulate SError but its masked?
> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
> PSTATE.A  set.
>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
> emulated  SError should go to EL1. This effectively masks SError.)

Currently we does not consider much about the mask status(SPSR).
I remember that you ever suggested firmware should reboot if the mask status is 
set(SPSR), right?
I ever suggest our firmware team to evaluate that, but there is no response.

I CC "liu jun"  who is our UEFI firmware Architect, if 
you have firmware requirements, you can
raise again.

> 
> Answers to these let us determine whether a bug is in the firmware or the
> kernel. If firmware is expecting the OS to do something special, I'd like to 
> know
> about it from the beginning!

I know your meaning, thanks for raising it again.

> 
> 
> >>> Expose API ghes_notify_sei() to external users. External modules can
> >>> call this exposed API to parse APEI table and handle the SEI
> >>> notification.
> >>
> >> external modules? You mean called by the arch code when it gets this
> NOTIFY_SEI?
> 
> > yes, called by kernel ARCH code, such as below, I remember I have discussed
> with you.
> 
> Sure. The phrase 'external modules' usually means the '.ko' files that live in
> /lib/modules, nothing outside the kernel tree should be doing this stuff.

I will rename 'external modules' to other name. Thanks.

> 
> 
> Thanks,
> 
> James



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-02-05 Thread gengdongjiu
[...]
> 
> > Yes, I know you are dong that. Your serial's patch will consider all above
> things, right?
> 
> Assuming I got it right, yes. It currently makes the race Xie XiuQi spotted 
> worse,
> which I want to fix too. (details on the cover letter)

Ok.

> 
> 
> > If your patch can be consider that, this patch can based on your patchset.
> thanks.
> 
> I'd like to pick these patches onto the end of that series, but first I want 
> to
> know what NOTIFY_SEI means for any OS. The ACPI spec doesn't say, and
> because its asynchronous, route-able and mask-able, there are many more
> corners than NOTFIY_SEA.
> 
> This thing is a notification using an emulated SError exception. (emulated
> because physical-SError must be routed to EL3 for firmware-first, and
> virtual-SError belongs to EL2).
> 
> Does your firmware emulate SError exactly as the TakeException() pseudo code
> in the Arm-Arm?

Yes, it is.

> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
> TGE}?

Yes, it is.

> What does your firmware do when it wants to emulate SError but its masked?
> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
> PSTATE.A  set.
>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
> emulated  SError should go to EL1. This effectively masks SError.)

Currently we does not consider much about the mask status(SPSR).
I remember that you ever suggested firmware should reboot if the mask status is 
set(SPSR), right?
I ever suggest our firmware team to evaluate that, but there is no response.

I CC "liu jun"  who is our UEFI firmware Architect, if 
you have firmware requirements, you can
raise again.

> 
> Answers to these let us determine whether a bug is in the firmware or the
> kernel. If firmware is expecting the OS to do something special, I'd like to 
> know
> about it from the beginning!

I know your meaning, thanks for raising it again.

> 
> 
> >>> Expose API ghes_notify_sei() to external users. External modules can
> >>> call this exposed API to parse APEI table and handle the SEI
> >>> notification.
> >>
> >> external modules? You mean called by the arch code when it gets this
> NOTIFY_SEI?
> 
> > yes, called by kernel ARCH code, such as below, I remember I have discussed
> with you.
> 
> Sure. The phrase 'external modules' usually means the '.ko' files that live in
> /lib/modules, nothing outside the kernel tree should be doing this stuff.

I will rename 'external modules' to other name. Thanks.

> 
> 
> Thanks,
> 
> James



Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-02-04 Thread gengdongjiu
James,
   Thank you for your time to reply me.

On 2018/1/31 3:21, James Morse wrote:
> Hi gengdongjiu,
> 
> On 24/01/18 20:06, gengdongjiu wrote:
>>> On 06/01/18 16:02, Dongjiu Geng wrote:
>>>> The ARM64 RAS SError Interrupt(SEI) syndrome value is specific to the
>>>> guest and user space needs a way to tell KVM this value. So we add a
>>>> new ioctl. Before user space specifies the Exception Syndrome Register
>>>> ESR(ESR), it firstly checks that whether KVM has the capability to set
>>>> the guest ESR, If has, will set it. Otherwise, nothing to do.
>>>>
>>>> For this ESR specifying, Only support for AArch64, not support AArch32.
>>>
>>> After this patch user-space can trigger an SError in the guest. If it wants 
>>> to migrate the guest, how does the pending SError get migrated?
>>>
>>> I think we need to fix migration first. Andrew Jones suggested using
>>> KVM_GET/SET_VCPU_EVENTS:
>>> https://www.spinics.net/lists/arm-kernel/msg616846.html
>>>
>>> Given KVM uses kvm_inject_vabt() on v8.0 hardware too, we should cover 
>>> systems without the v8.2 RAS Extensions with the same API. I
>>> think this means a bit to read/write whether SError is pending, and another 
>>> to indicate the ESR should be set/read.
>>> CPUs without the v8.2 RAS Extensions can reject pending-SError that had an 
>>> ESR.
>>
>> For the CPUs without the v8.2 RAS Extensions, its ESR is always 0, 
>> we only can inject a SError with ESR 0 to guest, cannot set its ESR.
> 
> 0? It's always implementation-defined. On Juno it seems to be always-0, but
> other systems may behave differently. (Juno may generate another ESR value 
> when
> I'm not watching it...)
For the armv8.0 cpu without RAS Extensions, it does not have vsesr_el2, so when 
guest take a virtual SError,
its ESR is 0, can not control the virtual SError's syndrom value, it does not 
have such registers to control that.

Does Juno not have RAS Extension? if yes, I think we can only inject an SError, 
but can not change its ESR value, because it does not have vsesr_el2.

> 
> Just because we can't control the ESR doesn't mean injecting an SError isn't
> something user-space may want to do.
yes, we may need to support user-space injects an SError even through CPU does 
not have RAS Extension.

> If we tackle migration of pending-SError first, I think that will give us the
> API to create a new pending SError with/without an ESR as appropriate.

sure, we should. Last week, I checked the KVM_GET/SET_VCPU_EVENTS IOCTL, it 
should meet our migration requirements

> 
> 
>> About how about to use the KVM_GET/SET_VCPU_EVENTS, I will check the code, 
>> and
>> consider your suggestion at the same time.
> 
> (Not my suggestion, It was Andrew Jones idea.)
Got it.

> 
>> The IOCTL KVM_GET/SET_VCPU_EVENTS has been used by X86.
> 
> We would be re-using the struct to have values with slightly different 
> meanings.
> But for migration the upshot is the same, call KVM_GET_VCPU_EVENTS on one
> system, and pass the struct to KVM_SET_VCPU_EVENTS on the new system. If we're
> lucky Qemu may be able to do this in shared x86/arm64 code.
> 
Thanks for the reminder, I know your meaning.
In the x86, the kvm_vcpu_events includes exception/interrupt/nmi/smi. For the 
RAS, we
only involves the exception, so Qemu handling logic is different. Anyway, I 
will try to
share code for the two platform in Qemu.


/* for KVM_GET/SET_VCPU_EVENTS */
struct kvm_vcpu_events {
struct {
__u8 injected;
__u8 nr;
__u8 has_error_code;
__u8 pad;
__u32 error_code;
} exception;
struct {
__u8 injected;
__u8 nr;
__u8 soft;
__u8 shadow;
} interrupt;
struct {
__u8 injected;
__u8 pending;
__u8 masked;
__u8 pad;
} nmi;
__u32 sipi_vector;
__u32 flags;
struct {
__u8 smm;
__u8 pending;
__u8 smm_inside_nmi;
__u8 latched_init;
} smi;
__u32 reserved[9];
};

> 
>>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index
>>>> 5c7f657..738ae90 100644
>>>> --- a/arch/arm64/kvm/guest.c
>>>> +++ b/arch/arm64/kvm/guest.c
>>>> @@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu 
>>>> *vcpu,
>>>>return -EINVAL;
>>>>  }
>>>>
>>>> +int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome) {
>&

Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-02-04 Thread gengdongjiu
James,
   Thank you for your time to reply me.

On 2018/1/31 3:21, James Morse wrote:
> Hi gengdongjiu,
> 
> On 24/01/18 20:06, gengdongjiu wrote:
>>> On 06/01/18 16:02, Dongjiu Geng wrote:
>>>> The ARM64 RAS SError Interrupt(SEI) syndrome value is specific to the
>>>> guest and user space needs a way to tell KVM this value. So we add a
>>>> new ioctl. Before user space specifies the Exception Syndrome Register
>>>> ESR(ESR), it firstly checks that whether KVM has the capability to set
>>>> the guest ESR, If has, will set it. Otherwise, nothing to do.
>>>>
>>>> For this ESR specifying, Only support for AArch64, not support AArch32.
>>>
>>> After this patch user-space can trigger an SError in the guest. If it wants 
>>> to migrate the guest, how does the pending SError get migrated?
>>>
>>> I think we need to fix migration first. Andrew Jones suggested using
>>> KVM_GET/SET_VCPU_EVENTS:
>>> https://www.spinics.net/lists/arm-kernel/msg616846.html
>>>
>>> Given KVM uses kvm_inject_vabt() on v8.0 hardware too, we should cover 
>>> systems without the v8.2 RAS Extensions with the same API. I
>>> think this means a bit to read/write whether SError is pending, and another 
>>> to indicate the ESR should be set/read.
>>> CPUs without the v8.2 RAS Extensions can reject pending-SError that had an 
>>> ESR.
>>
>> For the CPUs without the v8.2 RAS Extensions, its ESR is always 0, 
>> we only can inject a SError with ESR 0 to guest, cannot set its ESR.
> 
> 0? It's always implementation-defined. On Juno it seems to be always-0, but
> other systems may behave differently. (Juno may generate another ESR value 
> when
> I'm not watching it...)
For the armv8.0 cpu without RAS Extensions, it does not have vsesr_el2, so when 
guest take a virtual SError,
its ESR is 0, can not control the virtual SError's syndrom value, it does not 
have such registers to control that.

Does Juno not have RAS Extension? if yes, I think we can only inject an SError, 
but can not change its ESR value, because it does not have vsesr_el2.

> 
> Just because we can't control the ESR doesn't mean injecting an SError isn't
> something user-space may want to do.
yes, we may need to support user-space injects an SError even through CPU does 
not have RAS Extension.

> If we tackle migration of pending-SError first, I think that will give us the
> API to create a new pending SError with/without an ESR as appropriate.

sure, we should. Last week, I checked the KVM_GET/SET_VCPU_EVENTS IOCTL, it 
should meet our migration requirements

> 
> 
>> About how about to use the KVM_GET/SET_VCPU_EVENTS, I will check the code, 
>> and
>> consider your suggestion at the same time.
> 
> (Not my suggestion, It was Andrew Jones idea.)
Got it.

> 
>> The IOCTL KVM_GET/SET_VCPU_EVENTS has been used by X86.
> 
> We would be re-using the struct to have values with slightly different 
> meanings.
> But for migration the upshot is the same, call KVM_GET_VCPU_EVENTS on one
> system, and pass the struct to KVM_SET_VCPU_EVENTS on the new system. If we're
> lucky Qemu may be able to do this in shared x86/arm64 code.
> 
Thanks for the reminder, I know your meaning.
In the x86, the kvm_vcpu_events includes exception/interrupt/nmi/smi. For the 
RAS, we
only involves the exception, so Qemu handling logic is different. Anyway, I 
will try to
share code for the two platform in Qemu.


/* for KVM_GET/SET_VCPU_EVENTS */
struct kvm_vcpu_events {
struct {
__u8 injected;
__u8 nr;
__u8 has_error_code;
__u8 pad;
__u32 error_code;
} exception;
struct {
__u8 injected;
__u8 nr;
__u8 soft;
__u8 shadow;
} interrupt;
struct {
__u8 injected;
__u8 pending;
__u8 masked;
__u8 pad;
} nmi;
__u32 sipi_vector;
__u32 flags;
struct {
__u8 smm;
__u8 pending;
__u8 smm_inside_nmi;
__u8 latched_init;
} smi;
__u32 reserved[9];
};

> 
>>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index
>>>> 5c7f657..738ae90 100644
>>>> --- a/arch/arm64/kvm/guest.c
>>>> +++ b/arch/arm64/kvm/guest.c
>>>> @@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu 
>>>> *vcpu,
>>>>return -EINVAL;
>>>>  }
>>>>
>>>> +int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome) {
>&

Re: [PATCH v9 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest

2018-01-25 Thread gengdongjiu
Hi James,
  thanks for the review.

On 2018/1/24 3:07, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
>> RAS Extension add a VSESR_EL2 register which can provide
>> the syndrome value reported to software on taking a virtual
>> SError interrupt exception. This patch supports to specify
>> this Syndrome.
>>
>> In the RAS Extensions we can not set all-zero syndrome value
>> for SError, which means 'RAS error: Uncategorized' instead of
>> 'no valid ISS'. So set it to IMPLEMENTATION DEFINED syndrome
>> by default.
>>
>> We also need to support userspace to specify a valid syndrome
>> value, Because in some case, the recovery is driven by userspace.
>> This patch can support that userspace specify it.
>>
>> In the guest/host world switch, restore this value to VSESR_EL2
>> only when HCR_EL2.VSE is set. This value no need to be saved
>> because it is stale vale when guest exit.
> 
> A version of this patch has been queued by Catalin.
yeah, I have seen that.

> 
> Now that the cpufeature bits are queued, I think this can be split up into two
> separate series for v4.16-rc1, one to tackle NOTIFY_SEI and the associated
> plumbing. The second for the KVM 'make SError pending' API.
> 
> 
>> Signed-off-by: Dongjiu Geng 
>> [Set an impdef ESR for Virtual-SError]
>> Signed-off-by: James Morse 
> 
> I didn't sign-off this patch. If you pick some bits from another version and
> want to credit someone else you can 'CC:' them or just mention it in the
> commit-message.

I pick your modification of setting an impdef ESR for Virtual-SError, so add 
your name,
I change it to 'CC'

> 
> 
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index 47b967d..3b035cc 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -86,6 +86,9 @@
>>  #define REG_PSTATE_PAN_IMM  sys_reg(0, 0, 4, 0, 4)
>>  #define REG_PSTATE_UAO_IMM  sys_reg(0, 0, 4, 0, 3)
>>  
>> +/* virtual SError exception syndrome register */
>> +#define REG_VSESR_EL2  sys_reg(3, 4, 5, 2, 3)
> 
> Irrelevant-Nit: sys-regs usually have a 'SYS_' prefix, and are in instruction
> encoding order lower down the file.
will follow that.

> 
> (These PSTATE PAN things are a bit odd as they were used to generate and
> instruction before the fancy {read,write}_sysreg() helpers were added).>
> 
>>  #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM | 
>> \
>>(!!x)<<8 | 0x1f)
>>  #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM | 
>> \
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 738ae90..ffad42b 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>  
>>  int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
> 
> Bits of this are spread between patches 5 and 6. If you put them in the other
> order this wouldn't happen.

Ok, I will adjust that.

> 
> (but after a rebase most of this patch should disappear)
> 
>>  {
>> -return -EINVAL;
>> +u64 reg = *syndrome;
>> +
>> +/* inject virtual system Error or asynchronous abort */
>> +kvm_inject_vabt(vcpu);
> 
> So this writes an impdef ESR, because its the existing code-path in KVM.
> 
> 
>> +if (reg)
>> +/* set vsesr_el2[24:0] with value that user space specified */
>> +kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
> 
> And then you overwrite it. Which is a bit odd as there is a helper to do both 
> in
> one go:
thanks, I will directly call pend_guest_serror() in this function.

> 
> 
>> +
>> +return 0;
>>  }
> 
>>  int __attribute_const__ kvm_target_cpu(void)
> 
>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>> index 3556715..fb94b5e 100644
>> --- a/arch/arm64/kvm/inject_fault.c
>> +++ b/arch/arm64/kvm/inject_fault.c
>> @@ -246,14 +246,25 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
>>  inject_undef64(vcpu);
>>  }
>>  
>> +static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
>> +{
>> +kvm_vcpu_set_vsesr(vcpu, esr);
>> +vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
>> +}
> 
> How come you don't use this in kvm_arm_set_sei_esr()?
thanks, I will call pend_guest_serror() in the kvm_arm_set_sei_esr().

> 
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v9 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest

2018-01-25 Thread gengdongjiu
Hi James,
  thanks for the review.

On 2018/1/24 3:07, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
>> RAS Extension add a VSESR_EL2 register which can provide
>> the syndrome value reported to software on taking a virtual
>> SError interrupt exception. This patch supports to specify
>> this Syndrome.
>>
>> In the RAS Extensions we can not set all-zero syndrome value
>> for SError, which means 'RAS error: Uncategorized' instead of
>> 'no valid ISS'. So set it to IMPLEMENTATION DEFINED syndrome
>> by default.
>>
>> We also need to support userspace to specify a valid syndrome
>> value, Because in some case, the recovery is driven by userspace.
>> This patch can support that userspace specify it.
>>
>> In the guest/host world switch, restore this value to VSESR_EL2
>> only when HCR_EL2.VSE is set. This value no need to be saved
>> because it is stale vale when guest exit.
> 
> A version of this patch has been queued by Catalin.
yeah, I have seen that.

> 
> Now that the cpufeature bits are queued, I think this can be split up into two
> separate series for v4.16-rc1, one to tackle NOTIFY_SEI and the associated
> plumbing. The second for the KVM 'make SError pending' API.
> 
> 
>> Signed-off-by: Dongjiu Geng 
>> [Set an impdef ESR for Virtual-SError]
>> Signed-off-by: James Morse 
> 
> I didn't sign-off this patch. If you pick some bits from another version and
> want to credit someone else you can 'CC:' them or just mention it in the
> commit-message.

I pick your modification of setting an impdef ESR for Virtual-SError, so add 
your name,
I change it to 'CC'

> 
> 
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index 47b967d..3b035cc 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -86,6 +86,9 @@
>>  #define REG_PSTATE_PAN_IMM  sys_reg(0, 0, 4, 0, 4)
>>  #define REG_PSTATE_UAO_IMM  sys_reg(0, 0, 4, 0, 3)
>>  
>> +/* virtual SError exception syndrome register */
>> +#define REG_VSESR_EL2  sys_reg(3, 4, 5, 2, 3)
> 
> Irrelevant-Nit: sys-regs usually have a 'SYS_' prefix, and are in instruction
> encoding order lower down the file.
will follow that.

> 
> (These PSTATE PAN things are a bit odd as they were used to generate and
> instruction before the fancy {read,write}_sysreg() helpers were added).>
> 
>>  #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM | 
>> \
>>(!!x)<<8 | 0x1f)
>>  #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM | 
>> \
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 738ae90..ffad42b 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>  
>>  int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
> 
> Bits of this are spread between patches 5 and 6. If you put them in the other
> order this wouldn't happen.

Ok, I will adjust that.

> 
> (but after a rebase most of this patch should disappear)
> 
>>  {
>> -return -EINVAL;
>> +u64 reg = *syndrome;
>> +
>> +/* inject virtual system Error or asynchronous abort */
>> +kvm_inject_vabt(vcpu);
> 
> So this writes an impdef ESR, because its the existing code-path in KVM.
> 
> 
>> +if (reg)
>> +/* set vsesr_el2[24:0] with value that user space specified */
>> +kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
> 
> And then you overwrite it. Which is a bit odd as there is a helper to do both 
> in
> one go:
thanks, I will directly call pend_guest_serror() in this function.

> 
> 
>> +
>> +return 0;
>>  }
> 
>>  int __attribute_const__ kvm_target_cpu(void)
> 
>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>> index 3556715..fb94b5e 100644
>> --- a/arch/arm64/kvm/inject_fault.c
>> +++ b/arch/arm64/kvm/inject_fault.c
>> @@ -246,14 +246,25 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
>>  inject_undef64(vcpu);
>>  }
>>  
>> +static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
>> +{
>> +kvm_vcpu_set_vsesr(vcpu, esr);
>> +vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
>> +}
> 
> How come you don't use this in kvm_arm_set_sei_esr()?
thanks, I will call pend_guest_serror() in the kvm_arm_set_sei_esr().

> 
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-01-24 Thread gengdongjiu
Hi James,
   Thanks a lot for your review and comments.

> 
> Hi Dongjiu Geng,
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
> > The ARM64 RAS SError Interrupt(SEI) syndrome value is specific to the
> > guest and user space needs a way to tell KVM this value. So we add a
> > new ioctl. Before user space specifies the Exception Syndrome Register
> > ESR(ESR), it firstly checks that whether KVM has the capability to set
> > the guest ESR, If has, will set it. Otherwise, nothing to do.
> >
> > For this ESR specifying, Only support for AArch64, not support AArch32.
> 
> After this patch user-space can trigger an SError in the guest. If it wants 
> to migrate the guest, how does the pending SError get migrated?
> 
> I think we need to fix migration first. Andrew Jones suggested using
> KVM_GET/SET_VCPU_EVENTS:
> https://www.spinics.net/lists/arm-kernel/msg616846.html
> 
> Given KVM uses kvm_inject_vabt() on v8.0 hardware too, we should cover 
> systems without the v8.2 RAS Extensions with the same API. I
> think this means a bit to read/write whether SError is pending, and another 
> to indicate the ESR should be set/read.
> CPUs without the v8.2 RAS Extensions can reject pending-SError that had an 
> ESR.

For the CPUs without the v8.2 RAS Extensions, its ESR is always 0, we only can 
inject a SError with ESR 0 to guest, cannot set its ESR.
About how about to use the KVM_GET/SET_VCPU_EVENTS, I will check the code, and 
consider your suggestion at the same time. 
The IOCTL KVM_GET/SET_VCPU_EVENTS has been used by X86.

> 
> user-space can then use the 'for migration' calls to make a 'new' SError 
> pending.
> 
> Now that the cpufeature bits are queued, I think this can be split up into 
> two separate series for v4.16-rc1, one to tackle NOTIFY_SEI and
> the associated plumbing. The second for the KVM 'make SError pending' API.

Ok, thanks for your suggestion, will split it.

> 
> 
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index
> > 5c7f657..738ae90 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu 
> > *vcpu,
> > return -EINVAL;
> >  }
> >
> > +int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome) {
> > +   return -EINVAL;
> > +}
> 
> Does nothing in the patch that adds the support? This is a bit odd.
> (oh, its hiding in patch 6...)

To make this patch simple and small, I add it in patch 6.

> 
> 
> Thanks,
> 
> James



Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-01-24 Thread gengdongjiu
Hi James,
   Thanks a lot for your review and comments.

> 
> Hi Dongjiu Geng,
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
> > The ARM64 RAS SError Interrupt(SEI) syndrome value is specific to the
> > guest and user space needs a way to tell KVM this value. So we add a
> > new ioctl. Before user space specifies the Exception Syndrome Register
> > ESR(ESR), it firstly checks that whether KVM has the capability to set
> > the guest ESR, If has, will set it. Otherwise, nothing to do.
> >
> > For this ESR specifying, Only support for AArch64, not support AArch32.
> 
> After this patch user-space can trigger an SError in the guest. If it wants 
> to migrate the guest, how does the pending SError get migrated?
> 
> I think we need to fix migration first. Andrew Jones suggested using
> KVM_GET/SET_VCPU_EVENTS:
> https://www.spinics.net/lists/arm-kernel/msg616846.html
> 
> Given KVM uses kvm_inject_vabt() on v8.0 hardware too, we should cover 
> systems without the v8.2 RAS Extensions with the same API. I
> think this means a bit to read/write whether SError is pending, and another 
> to indicate the ESR should be set/read.
> CPUs without the v8.2 RAS Extensions can reject pending-SError that had an 
> ESR.

For the CPUs without the v8.2 RAS Extensions, its ESR is always 0, we only can 
inject a SError with ESR 0 to guest, cannot set its ESR.
About how about to use the KVM_GET/SET_VCPU_EVENTS, I will check the code, and 
consider your suggestion at the same time. 
The IOCTL KVM_GET/SET_VCPU_EVENTS has been used by X86.

> 
> user-space can then use the 'for migration' calls to make a 'new' SError 
> pending.
> 
> Now that the cpufeature bits are queued, I think this can be split up into 
> two separate series for v4.16-rc1, one to tackle NOTIFY_SEI and
> the associated plumbing. The second for the KVM 'make SError pending' API.

Ok, thanks for your suggestion, will split it.

> 
> 
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index
> > 5c7f657..738ae90 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu 
> > *vcpu,
> > return -EINVAL;
> >  }
> >
> > +int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome) {
> > +   return -EINVAL;
> > +}
> 
> Does nothing in the patch that adds the support? This is a bit odd.
> (oh, its hiding in patch 6...)

To make this patch simple and small, I add it in patch 6.

> 
> 
> Thanks,
> 
> James



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-01-23 Thread gengdongjiu
sorry fix a typo.

On 2018/1/23 17:23, gengdongjiu wrote:
>> There are problems with doing this:
>>
>> Oct. 18, 2017, 10:26 a.m. James Morse wrote:
>> | How do SEA and SEI interact?
>> |
>> | As far as I can see they can both interrupt each other, which isn't 
>> something
>> | the single in_nmi() path in APEI can handle. I thinks we should fix this
>> | first.
>>
>> [..]
>>
>> | SEA gets away with a lot of things because its synchronous. SEI isn't. Xie
>> | XiuQi pointed to the memory_failure_queue() code. We can use this directly
>> | from SEA, but not SEI. (what happens if an SError arrives while we are
>> | queueing memory_failure work from an IRQ).
>> |
>> | The one that scares me is the trace-point reporting stuff. What happens if 
>> an
>> | SError arrives while we are enabling a trace point? (these are static-keys
>> | right?)
>> |
>> |  I don't think we can just plumb SEI in like this and be done with it.
>> |  (I'm looking at teasing out the estatus cache code from being x86:NMI 
>> only.
>> |  This way we solve the same 'cant do this from NMI context' with the same
>> |  code'.)
>>
>>
>> I will post what I've got for this estatus-cache thing as an RFC, its not 
>> ready
>> to be considered yet.

Yes, I know you are dong that. Your serial's patch will consider all above 
things, right?
If your patch can be consider that, this patch can based on your patchset. 
thanks.

> 
>>



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-01-23 Thread gengdongjiu
sorry fix a typo.

On 2018/1/23 17:23, gengdongjiu wrote:
>> There are problems with doing this:
>>
>> Oct. 18, 2017, 10:26 a.m. James Morse wrote:
>> | How do SEA and SEI interact?
>> |
>> | As far as I can see they can both interrupt each other, which isn't 
>> something
>> | the single in_nmi() path in APEI can handle. I thinks we should fix this
>> | first.
>>
>> [..]
>>
>> | SEA gets away with a lot of things because its synchronous. SEI isn't. Xie
>> | XiuQi pointed to the memory_failure_queue() code. We can use this directly
>> | from SEA, but not SEI. (what happens if an SError arrives while we are
>> | queueing memory_failure work from an IRQ).
>> |
>> | The one that scares me is the trace-point reporting stuff. What happens if 
>> an
>> | SError arrives while we are enabling a trace point? (these are static-keys
>> | right?)
>> |
>> |  I don't think we can just plumb SEI in like this and be done with it.
>> |  (I'm looking at teasing out the estatus cache code from being x86:NMI 
>> only.
>> |  This way we solve the same 'cant do this from NMI context' with the same
>> |  code'.)
>>
>>
>> I will post what I've got for this estatus-cache thing as an RFC, its not 
>> ready
>> to be considered yet.

Yes, I know you are dong that. Your serial's patch will consider all above 
things, right?
If your patch can be consider that, this patch can based on your patchset. 
thanks.

> 
>>



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-01-23 Thread gengdongjiu
Hi James,

On 2018/1/23 3:39, James Morse wrote:
> Hi Dongjiu Geng,
> 
> (versions of patches 1,2 and 4 have been queued by Catalin)
> 
> (Nit 'ACPI / APEI:' is the normal subject prefix for ghes.c, this helps the
> maintainers know which patches they need to pay attention to when you are
> touching multiple trees)
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
>> ARMv8.2 requires implementation of the RAS extension.
> 
>> In
>> this extension, it adds SEI(SError Interrupt) notification
>> type, this patch adds new GHES error source SEI handling
>> functions. 
> 
> This reads as if this patch is handling SError RAS notifications generated by 
> a
> CPU with the RAS extensions. These are about CPU->Software notifications. APEI
> and GHES are a firmware first mechanism which is Software->Software.
> Reading the v8.2 documents won't help anyone with the APEI/GHES code.
> 
> Please describe this from the ACPI view, "ACPI 6.x adds support for NOTIFY_SEI
> as a GHES notification mechanism... ",  its up to the arch code to spot a v8.2
> RAS Error based on the cpu caps.Ok, I will modify it.

> 
> 
>> This error source parsing and handling method
>> is similar with the SEA.
> 
> There are problems with doing this:
> 
> Oct. 18, 2017, 10:26 a.m. James Morse wrote:
> | How do SEA and SEI interact?
> |
> | As far as I can see they can both interrupt each other, which isn't 
> something
> | the single in_nmi() path in APEI can handle. I thinks we should fix this
> | first.
> 
> [..]
> 
> | SEA gets away with a lot of things because its synchronous. SEI isn't. Xie
> | XiuQi pointed to the memory_failure_queue() code. We can use this directly
> | from SEA, but not SEI. (what happens if an SError arrives while we are
> | queueing memory_failure work from an IRQ).
> |
> | The one that scares me is the trace-point reporting stuff. What happens if 
> an
> | SError arrives while we are enabling a trace point? (these are static-keys
> | right?)
> |
> |  I don't think we can just plumb SEI in like this and be done with it.
> |  (I'm looking at teasing out the estatus cache code from being x86:NMI only.
> |  This way we solve the same 'cant do this from NMI context' with the same
> |  code'.)
> 
> 
> I will post what I've got for this estatus-cache thing as an RFC, its not 
> ready
> to be considered yet.Yes, I know you are dong that. Your serial's patch will 
> consider all above things, right?
If your patch can be consider that, this patch can based on your patchset. 
thanks.

> 
> 
>> Expose API ghes_notify_sei() to external users. External
>> modules can call this exposed API to parse APEI table and
>> handle the SEI notification.
> 
> external modules? You mean called by the arch code when it gets this 
> NOTIFY_SEI?
yes, called by kernel ARCH code, such as below, I remember I have discussed 
with you.

 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
nmi_enter();


if (!ghes_notify_sei())
return;



/* non-RAS errors are not containable */
if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
arm64_serror_panic(regs, esr);

nmi_exit();
}

> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-01-23 Thread gengdongjiu
Hi James,

On 2018/1/23 3:39, James Morse wrote:
> Hi Dongjiu Geng,
> 
> (versions of patches 1,2 and 4 have been queued by Catalin)
> 
> (Nit 'ACPI / APEI:' is the normal subject prefix for ghes.c, this helps the
> maintainers know which patches they need to pay attention to when you are
> touching multiple trees)
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
>> ARMv8.2 requires implementation of the RAS extension.
> 
>> In
>> this extension, it adds SEI(SError Interrupt) notification
>> type, this patch adds new GHES error source SEI handling
>> functions. 
> 
> This reads as if this patch is handling SError RAS notifications generated by 
> a
> CPU with the RAS extensions. These are about CPU->Software notifications. APEI
> and GHES are a firmware first mechanism which is Software->Software.
> Reading the v8.2 documents won't help anyone with the APEI/GHES code.
> 
> Please describe this from the ACPI view, "ACPI 6.x adds support for NOTIFY_SEI
> as a GHES notification mechanism... ",  its up to the arch code to spot a v8.2
> RAS Error based on the cpu caps.Ok, I will modify it.

> 
> 
>> This error source parsing and handling method
>> is similar with the SEA.
> 
> There are problems with doing this:
> 
> Oct. 18, 2017, 10:26 a.m. James Morse wrote:
> | How do SEA and SEI interact?
> |
> | As far as I can see they can both interrupt each other, which isn't 
> something
> | the single in_nmi() path in APEI can handle. I thinks we should fix this
> | first.
> 
> [..]
> 
> | SEA gets away with a lot of things because its synchronous. SEI isn't. Xie
> | XiuQi pointed to the memory_failure_queue() code. We can use this directly
> | from SEA, but not SEI. (what happens if an SError arrives while we are
> | queueing memory_failure work from an IRQ).
> |
> | The one that scares me is the trace-point reporting stuff. What happens if 
> an
> | SError arrives while we are enabling a trace point? (these are static-keys
> | right?)
> |
> |  I don't think we can just plumb SEI in like this and be done with it.
> |  (I'm looking at teasing out the estatus cache code from being x86:NMI only.
> |  This way we solve the same 'cant do this from NMI context' with the same
> |  code'.)
> 
> 
> I will post what I've got for this estatus-cache thing as an RFC, its not 
> ready
> to be considered yet.Yes, I know you are dong that. Your serial's patch will 
> consider all above things, right?
If your patch can be consider that, this patch can based on your patchset. 
thanks.

> 
> 
>> Expose API ghes_notify_sei() to external users. External
>> modules can call this exposed API to parse APEI table and
>> handle the SEI notification.
> 
> external modules? You mean called by the arch code when it gets this 
> NOTIFY_SEI?
yes, called by kernel ARCH code, such as below, I remember I have discussed 
with you.

 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
nmi_enter();


if (!ghes_notify_sei())
return;



/* non-RAS errors are not containable */
if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
arm64_serror_panic(regs, esr);

nmi_exit();
}

> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
2018-01-15 16:33 GMT+08:00 Christoffer Dall <christoffer.d...@linaro.org>:
> On Fri, Jan 12, 2018 at 06:05:23PM +, James Morse wrote:
>> On 15/12/17 03:30, gengdongjiu wrote:
>> > On 2017/12/7 14:37, gengdongjiu wrote:
>
> [...]
>
>>
>> (I recall someone saying migration is needed for any new KVM/cpu features, 
>> but I
>> can't find the thread)
>>
>
> I don't know of any hard set-in-stone rule for this, but I have
> certainly argued that since migration is a popular technique in data
> centers and often a key motivation behind using virtual machines as it
> provides both load-balancing and high availability, we should think
> about migration support for all features and state.  Further, experience
> has shown that retroactively trying to support migration can result in
> really complex interfaces for saving/restoring state (see the ITS
> ordering requirements in
> Documentation/virtual/kvm/devices/arm-vgic-its.txt as an example) so
> thinking about this problem when introducing functionality is a good
> idea.
yes, agree it.

>
> Of course, if there are really good arguments for having some state that
> simply cannot be migrated, then that's fine, and we should just make
> sure that userspace (e.g. QEMU) and higher level components in the
> stack (libvirt, openstack, etc.) can detect this state being used, and
> ideally enable/disable it, so that it can predict that a particular VM
> cannot be migrated off a particular host, or between a particular set of
> two hosts.  As an example, migration is typically prohibited when using
> VFIO direct device assignment, but userspace etc. are already aware of
> this.
Ok,  I think this problem is similar to migrating a VM that uses an irqchip in
 userspace and has set the IRQ or FIQ lines using KVM_IRQ_LINE.

>
> As a final note, if we add support for some architectural feature, which
> may be present on some particular hardware and/or implementation, if the
> KVM support for said feature is automatically enabled (and not
> selectively from userspace), I would push back quite strongly on
> something that doesn't support migration, because it would effectively
> prevent migration of VMs on ARM.
Ok, got it.

>
> Thanks,
> -Christoffer
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
2018-01-15 16:33 GMT+08:00 Christoffer Dall :
> On Fri, Jan 12, 2018 at 06:05:23PM +, James Morse wrote:
>> On 15/12/17 03:30, gengdongjiu wrote:
>> > On 2017/12/7 14:37, gengdongjiu wrote:
>
> [...]
>
>>
>> (I recall someone saying migration is needed for any new KVM/cpu features, 
>> but I
>> can't find the thread)
>>
>
> I don't know of any hard set-in-stone rule for this, but I have
> certainly argued that since migration is a popular technique in data
> centers and often a key motivation behind using virtual machines as it
> provides both load-balancing and high availability, we should think
> about migration support for all features and state.  Further, experience
> has shown that retroactively trying to support migration can result in
> really complex interfaces for saving/restoring state (see the ITS
> ordering requirements in
> Documentation/virtual/kvm/devices/arm-vgic-its.txt as an example) so
> thinking about this problem when introducing functionality is a good
> idea.
yes, agree it.

>
> Of course, if there are really good arguments for having some state that
> simply cannot be migrated, then that's fine, and we should just make
> sure that userspace (e.g. QEMU) and higher level components in the
> stack (libvirt, openstack, etc.) can detect this state being used, and
> ideally enable/disable it, so that it can predict that a particular VM
> cannot be migrated off a particular host, or between a particular set of
> two hosts.  As an example, migration is typically prohibited when using
> VFIO direct device assignment, but userspace etc. are already aware of
> this.
Ok,  I think this problem is similar to migrating a VM that uses an irqchip in
 userspace and has set the IRQ or FIQ lines using KVM_IRQ_LINE.

>
> As a final note, if we add support for some architectural feature, which
> may be present on some particular hardware and/or implementation, if the
> KVM support for said feature is automatically enabled (and not
> selectively from userspace), I would push back quite strongly on
> something that doesn't support migration, because it would effectively
> prevent migration of VMs on ARM.
Ok, got it.

>
> Thanks,
> -Christoffer
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
2018-01-13 2:05 GMT+08:00 James Morse <james.mo...@arm.com>:
> Hi gengdongjiu,
>
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> + case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> + /*
>>>> +  * Userspace only handle the guest SError Interrupt(SEI) if 
>>>> the
>>>> +  * error has not been propagated
>>>> +  */
>>>> + run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> + run->ex.exception = ESR_ELx_EC_SERROR;
>>>> + run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> + return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either 
>>> handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports 
>>> RAS
>>
>> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
>> below, which get from [0]
>
> [..]
>
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception,
>
>> so let guest handling it is
>> better, for example, if it is guest application RAS error, we can kill
>> the guest application instead of panic whole OS; if it is guest kernel
>> RAS error, guest will panic.
>
> If the kernel takes an unhandled RAS error it should panic - we don't know 
> where
> the error is.
OK, here I will panic.

>
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
>
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables 
> from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
>
>
>> Host does not know which application of guest has error, so host can
>> not handle it,
>
> It has to work this out, otherwise the errors we can handle never get a 
> chance.
>
> This kernel is expected to look at the error description, (which for some 
> reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining 
> signals
> on all affected processes.
>
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
>
> If you pass the notification to user-space, all it can do is signal the guest 
> to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
>
>
>> panic OS is not a good choice for the Recoverable error.
>
> If we don't know where the error is, and we can't make progress, its the only
> sane choice.
Ok, I will panic here.

>
> This code is never expected to run! (why are we arguing about it?) We should 
> get
> RAS errors as GHES notifications from firmware via some mechanism. If those 
> are
> NOTIFY_SEI then APEI should claim the notification and kick off the 
> appropriate
> handling based on the CPER records. If/when we get kernel-first, that can 
> claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
>
>
>
> James


Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
2018-01-13 2:05 GMT+08:00 James Morse :
> Hi gengdongjiu,
>
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> + case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> + /*
>>>> +  * Userspace only handle the guest SError Interrupt(SEI) if 
>>>> the
>>>> +  * error has not been propagated
>>>> +  */
>>>> + run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> + run->ex.exception = ESR_ELx_EC_SERROR;
>>>> + run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> + return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either 
>>> handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports 
>>> RAS
>>
>> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
>> below, which get from [0]
>
> [..]
>
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception,
>
>> so let guest handling it is
>> better, for example, if it is guest application RAS error, we can kill
>> the guest application instead of panic whole OS; if it is guest kernel
>> RAS error, guest will panic.
>
> If the kernel takes an unhandled RAS error it should panic - we don't know 
> where
> the error is.
OK, here I will panic.

>
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
>
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables 
> from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
>
>
>> Host does not know which application of guest has error, so host can
>> not handle it,
>
> It has to work this out, otherwise the errors we can handle never get a 
> chance.
>
> This kernel is expected to look at the error description, (which for some 
> reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining 
> signals
> on all affected processes.
>
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
>
> If you pass the notification to user-space, all it can do is signal the guest 
> to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
>
>
>> panic OS is not a good choice for the Recoverable error.
>
> If we don't know where the error is, and we can't make progress, its the only
> sane choice.
Ok, I will panic here.

>
> This code is never expected to run! (why are we arguing about it?) We should 
> get
> RAS errors as GHES notifications from firmware via some mechanism. If those 
> are
> NOTIFY_SEI then APEI should claim the notification and kick off the 
> appropriate
> handling based on the CPER records. If/when we get kernel-first, that can 
> claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
>
>
>
> James


Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
Hi James,
   Sorry for my late response due to out of office recently.

2018-01-13 2:05 GMT+08:00 James Morse <james.mo...@arm.com>:
> Hi gengdongjiu,
>
> On 15/12/17 03:30, gengdongjiu wrote:
>> On 2017/12/7 14:37, gengdongjiu wrote:
>>>> We need to tackle (1) and (3) separately. For (3) we need some API that 
>>>> lets
>>>> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't 
>>>> have
>>>> a way of migrating pending SError yet... which is where I got stuck last 
>>>> time I
>>>> was looking at this.
>>> I understand you most idea.
>>>
>>> But In the Qemu one signal type can only correspond to one behavior, can 
>>> not correspond to two behaviors,
>>> otherwise Qemu will do not know how to do.
>>>
>>> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate 
>>> the CPER
>>> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if 
>>> receives the SIGBUS_MCEERR_AO
>>> signal, it will record the CPER and trigger a IRQ to notify guest, as shown 
>>> below:
>>>
>>> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
>>> SIGBUS_MCEERR_AO trigger GPIO IRQ.
>>>
>>> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify 
>>> trigger method, which all
>>>
>>> not involve _trigger_ an SError.
>>>
>>> so there is no chance for Qemu to trigger the SError when gets the 
>>> SIGBUS_MCEERR_A{O,R}.
>>
>> As I explained above:
>>
>> If Qemu received SIGBUS_MCEERR_AR, it will record CPER and trigger 
>> Synchronous External Abort;
>> If Qemu received SIGBUS_MCEERR_AO, it will record CPER and trigger GPIO IRQ;
>
>> So Qemu does not know when to _trigger_ an SError.
>
> There is no answer to this. How the CPU decides is specific to the CPU design.
> How Qemu decides is going to be specific to the machine it emulates.
>
> My understanding is there is some overlap for which RAS errors are reported as
> synchronous external abort, and which use SError. (Obviously the imprecise 
> ones
> are all SError). Which one the CPU uses depends on how the CPU is designed.
>
> When you take an SIGBUS_MCEERR_AR from KVM, its because KVM can't complete a
> stage2 fault because the page is marked with PG_poisoned. These started out 
> as a
> synchronous exception, but you could still report these with SError.

yes, I agree, it is policy choice.

>
> We don't have a way to signal user-space about imprecise exceptions, this 
> isn't
> a KVM specific problem.
>
>
>> so here I "return a error" to Qemu if ghes_notify_sei() return failure in 
>> [1], if you opposed KVM "return error",
>> do you have a better idea about it? thanks
>
> If ghes_notify_sei() fails to claim the error, we should drop through to
> kernel-first-handling. We don't have that yet, just the stub that ignores 
> errors
> where we can make progress.
>
> If neither firmware-first nor kernel-first claim a RAS error, we're in 
> trouble.
> I'd like to panic() as we got a RAS notification but no description of the
> error. We can't do this until we have kernel-first support, hence that stub.
>
>
>> About the way of migrating pending SError, I think it is a separate case, 
>> because Qemu still does not know
>> how and when to trigger the SError.
>
> I agree, but I think we should fix this first before we add another user of 
> this
> unmigratable hypervisor state.
>
> (I recall someone saying migration is needed for any new KVM/cpu features, 
> but I
> can't find the thread)
>
>
>> [1]:
>> static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> {
>> ...
>> +   case ESR_ELx_AET_UER:   /* The error has not been propagated */
>> +   /*
>> +* Userspace only handle the guest SError Interrupt(SEI) if 
>> the
>> +* error has not been propagated
>> +*/
>> +   run->exit_reason = KVM_EXIT_EXCEPTION;
>> +   run->ex.exception = ESR_ELx_EC_SERROR;
>
> I'm against telling user space RAS errors ever happened, only the final
> user-visible error when the kernel can't fix it.

thanks for the explanation.
For the ESR_ELx_AET_UER, this exception is precise, closing the VM may
be better[1].
But if you think panic is better until we support kernel-first, it is
also OK to me.


+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struc

Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
Hi James,
   Sorry for my late response due to out of office recently.

2018-01-13 2:05 GMT+08:00 James Morse :
> Hi gengdongjiu,
>
> On 15/12/17 03:30, gengdongjiu wrote:
>> On 2017/12/7 14:37, gengdongjiu wrote:
>>>> We need to tackle (1) and (3) separately. For (3) we need some API that 
>>>> lets
>>>> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't 
>>>> have
>>>> a way of migrating pending SError yet... which is where I got stuck last 
>>>> time I
>>>> was looking at this.
>>> I understand you most idea.
>>>
>>> But In the Qemu one signal type can only correspond to one behavior, can 
>>> not correspond to two behaviors,
>>> otherwise Qemu will do not know how to do.
>>>
>>> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate 
>>> the CPER
>>> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if 
>>> receives the SIGBUS_MCEERR_AO
>>> signal, it will record the CPER and trigger a IRQ to notify guest, as shown 
>>> below:
>>>
>>> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
>>> SIGBUS_MCEERR_AO trigger GPIO IRQ.
>>>
>>> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify 
>>> trigger method, which all
>>>
>>> not involve _trigger_ an SError.
>>>
>>> so there is no chance for Qemu to trigger the SError when gets the 
>>> SIGBUS_MCEERR_A{O,R}.
>>
>> As I explained above:
>>
>> If Qemu received SIGBUS_MCEERR_AR, it will record CPER and trigger 
>> Synchronous External Abort;
>> If Qemu received SIGBUS_MCEERR_AO, it will record CPER and trigger GPIO IRQ;
>
>> So Qemu does not know when to _trigger_ an SError.
>
> There is no answer to this. How the CPU decides is specific to the CPU design.
> How Qemu decides is going to be specific to the machine it emulates.
>
> My understanding is there is some overlap for which RAS errors are reported as
> synchronous external abort, and which use SError. (Obviously the imprecise 
> ones
> are all SError). Which one the CPU uses depends on how the CPU is designed.
>
> When you take an SIGBUS_MCEERR_AR from KVM, its because KVM can't complete a
> stage2 fault because the page is marked with PG_poisoned. These started out 
> as a
> synchronous exception, but you could still report these with SError.

yes, I agree, it is policy choice.

>
> We don't have a way to signal user-space about imprecise exceptions, this 
> isn't
> a KVM specific problem.
>
>
>> so here I "return a error" to Qemu if ghes_notify_sei() return failure in 
>> [1], if you opposed KVM "return error",
>> do you have a better idea about it? thanks
>
> If ghes_notify_sei() fails to claim the error, we should drop through to
> kernel-first-handling. We don't have that yet, just the stub that ignores 
> errors
> where we can make progress.
>
> If neither firmware-first nor kernel-first claim a RAS error, we're in 
> trouble.
> I'd like to panic() as we got a RAS notification but no description of the
> error. We can't do this until we have kernel-first support, hence that stub.
>
>
>> About the way of migrating pending SError, I think it is a separate case, 
>> because Qemu still does not know
>> how and when to trigger the SError.
>
> I agree, but I think we should fix this first before we add another user of 
> this
> unmigratable hypervisor state.
>
> (I recall someone saying migration is needed for any new KVM/cpu features, 
> but I
> can't find the thread)
>
>
>> [1]:
>> static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> {
>> ...
>> +   case ESR_ELx_AET_UER:   /* The error has not been propagated */
>> +   /*
>> +* Userspace only handle the guest SError Interrupt(SEI) if 
>> the
>> +* error has not been propagated
>> +*/
>> +   run->exit_reason = KVM_EXIT_EXCEPTION;
>> +   run->ex.exception = ESR_ELx_EC_SERROR;
>
> I'm against telling user space RAS errors ever happened, only the final
> user-visible error when the kernel can't fix it.

thanks for the explanation.
For the ESR_ELx_AET_UER, this exception is precise, closing the VM may
be better[1].
But if you think panic is better until we support kernel-first, it is
also OK to me.


+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ unsigned int esr

Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-16 Thread gengdongjiu
Hi James,
  thanks very much for your mail and reply, I will check it ASAP. Due to 
recently busy with other thing, so reply may be late.

On 2018/1/13 2:05, James Morse wrote:
> Hi gengdongjiu,
> 
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> + case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> + /*
>>>> +  * Userspace only handle the guest SError Interrupt(SEI) if 
>>>> the
>>>> +  * error has not been propagated
>>>> +  */
>>>> + run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> + run->ex.exception = ESR_ELx_EC_SERROR;
>>>> + run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> + return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either 
>>> handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports 
>>> RAS
>>
>> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
>> below, which get from [0]
> 
> [..]
> 
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception, 
> 
>> so let guest handling it is
>> better, for example, if it is guest application RAS error, we can kill
>> the guest application instead of panic whole OS; if it is guest kernel
>> RAS error, guest will panic.
> 
> If the kernel takes an unhandled RAS error it should panic - we don't know 
> where
> the error is.
> 
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
> 
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables 
> from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
> 
> 
>> Host does not know which application of guest has error, so host can
>> not handle it,
> 
> It has to work this out, otherwise the errors we can handle never get a 
> chance.
> 
> This kernel is expected to look at the error description, (which for some 
> reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining 
> signals
> on all affected processes.
> 
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
> 
> If you pass the notification to user-space, all it can do is signal the guest 
> to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
> 
> 
>> panic OS is not a good choice for the Recoverable error.
> 
> If we don't know where the error is, and we can't make progress, its the only
> sane choice.
> 
> This code is never expected to run! (why are we arguing about it?) We should 
> get
> RAS errors as GHES notifications from firmware via some mechanism. If those 
> are
> NOTIFY_SEI then APEI should claim the notification and kick off the 
> appropriate
> handling based on the CPER records. If/when we get kernel-first, that can 
> claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
> 
> 
> 
> James
> 
> .
> 



Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-16 Thread gengdongjiu
Hi James,
  thanks very much for your mail and reply, I will check it ASAP. Due to 
recently busy with other thing, so reply may be late.

On 2018/1/13 2:05, James Morse wrote:
> Hi gengdongjiu,
> 
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> + case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> + /*
>>>> +  * Userspace only handle the guest SError Interrupt(SEI) if 
>>>> the
>>>> +  * error has not been propagated
>>>> +  */
>>>> + run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> + run->ex.exception = ESR_ELx_EC_SERROR;
>>>> + run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> + return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either 
>>> handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports 
>>> RAS
>>
>> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
>> below, which get from [0]
> 
> [..]
> 
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception, 
> 
>> so let guest handling it is
>> better, for example, if it is guest application RAS error, we can kill
>> the guest application instead of panic whole OS; if it is guest kernel
>> RAS error, guest will panic.
> 
> If the kernel takes an unhandled RAS error it should panic - we don't know 
> where
> the error is.
> 
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
> 
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables 
> from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
> 
> 
>> Host does not know which application of guest has error, so host can
>> not handle it,
> 
> It has to work this out, otherwise the errors we can handle never get a 
> chance.
> 
> This kernel is expected to look at the error description, (which for some 
> reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining 
> signals
> on all affected processes.
> 
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
> 
> If you pass the notification to user-space, all it can do is signal the guest 
> to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
> 
> 
>> panic OS is not a good choice for the Recoverable error.
> 
> If we don't know where the error is, and we can't make progress, its the only
> sane choice.
> 
> This code is never expected to run! (why are we arguing about it?) We should 
> get
> RAS errors as GHES notifications from firmware via some mechanism. If those 
> are
> NOTIFY_SEI then APEI should claim the notification and kick off the 
> appropriate
> handling based on the CPER records. If/when we get kernel-first, that can 
> claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
> 
> 
> 
> James
> 
> .
> 



  1   2   3   4   5   >