答复: [PATCH RESEND] arm64: fault: avoid send SIGBUS two times

2017-12-11 Thread gengdongjiu
Hi James,
  Thanks for your review and suggestion.

> Hi gengdongjiu,
> 
> On 08/12/17 04:43, gengdongjiu wrote:
> > by the way, I think also change the info.si_code to "BUS_MCEERR_AR" is 
> > better, as shown [1].
> > BUS_MCEERR_AR can tell user space  "Hardware memory error consumed on a 
> > error; action required".
> 
> Today its also used as the last-resort. This signal tells user-space the page 
> can't be re-read from disk/swap, and its been unmapped from all
> affected processes.
> 
> I think using it like this (tempting as it is) changes the meaning.


Consider again, I think what is your said is reasonable, when the 
ghes_notify_sea() return failure, it means the meory_failure() does not handler 
the error and even not 
unmapped the affected processes. 
So setting the si_code to BUS_MCEERR_AR may not better, I will set the si_code 
to 0.
Thanks for your suggestion and reminder.

> 
> 
> > so it is better than "0". In the X86 platform, it also use the 
> > "BUS_MCEERR_AR" for si_code[2] in "arch/x86/mm/fault.c".
> > what do you think about it?
> 
> This is heading into kernel-first territory, I'd prefer we do that all at 
> once so we know everything is covered.

Yes, it is.

> 
> 
> > [2]:
> > arch/x86/mm/fault.c:
> >
> > static void
> > do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long 
> > address,
> >   u32 *pkey, unsigned int fault)
> > {
> >   ..
> > #ifdef CONFIG_MEMORY_FAILURE
> > if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
> 
> These VM_FAULT flags indicate memory_failure() has run, tried to re-read the 
> memory from disk/swap, failed, and unmapped the page
> from all affected processes.

Understand, These VM_FAULT flags is different with the do_sea() handling. In 
the do_sea(), the memory_failure() may not run.  

> 
> 
> > printk(KERN_ERR
> > "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> > tsk->comm, tsk->pid, address);
> > code = BUS_MCEERR_AR;
> > }
> > #endif
> > force_sig_info_fault(SIGBUS, code, address, tsk, pkey, fault); }
> 
> This is x86's page fault handler, not its Machine-Check-Exception handler.
> 
> arm64's page fault handler does this too, from do_page_fault():

Yes, indeed. 
just now I check the code, you are right. 

> > } else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
> > sig = SIGBUS;
> > code = BUS_MCEERR_AR;
> 
> 
> If you're seeing this, its likely due to the race Xie XiuQi spotted where the 
> recovery action has been queued, then we return to user-space
> before its done.
> 
> I had a go at tackling this, adding helpers to kick the assorted queues, 
> which we can do if we took the exception from user-space. Where I
> got stuck is whether we should still force a signal, and how signals get 
> merged. I'll try and spend some more time on that this week.


Understand, thanks for tacking and effort.

> 
> 
> 
> Thanks,
> 
> James


Re: [PATCH RESEND] arm64: fault: avoid send SIGBUS two times

2017-12-11 Thread James Morse
Hi gengdongjiu,

On 08/12/17 04:43, gengdongjiu wrote:
> by the way, I think also change the info.si_code to "BUS_MCEERR_AR" is 
> better, as shown [1].
> BUS_MCEERR_AR can tell user space  "Hardware memory error consumed on a 
> error; action required".

Today its also used as the last-resort. This signal tells user-space the page
can't be re-read from disk/swap, and its been unmapped from all affected 
processes.

I think using it like this (tempting as it is) changes the meaning.


> so it is better than "0". In the X86 platform, it also use the 
> "BUS_MCEERR_AR" for si_code[2] in "arch/x86/mm/fault.c".
> what do you think about it?

This is heading into kernel-first territory, I'd prefer we do that all at once
so we know everything is covered.


> [2]:
> arch/x86/mm/fault.c:
> 
> static void
> do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long 
> address,
>   u32 *pkey, unsigned int fault)
> {
>   ..
> #ifdef CONFIG_MEMORY_FAILURE
> if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {

These VM_FAULT flags indicate memory_failure() has run, tried to re-read the
memory from disk/swap, failed, and unmapped the page from all affected 
processes.


> printk(KERN_ERR
> "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> tsk->comm, tsk->pid, address);
> code = BUS_MCEERR_AR;
> }
> #endif
> force_sig_info_fault(SIGBUS, code, address, tsk, pkey, fault);
> }

This is x86's page fault handler, not its Machine-Check-Exception handler.

arm64's page fault handler does this too, from do_page_fault():
>   } else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
>   sig = SIGBUS;
>   code = BUS_MCEERR_AR;


If you're seeing this, its likely due to the race Xie XiuQi spotted where the
recovery action has been queued, then we return to user-space before its done.

I had a go at tackling this, adding helpers to kick the assorted queues, which
we can do if we took the exception from user-space. Where I got stuck is whether
we should still force a signal, and how signals get merged. I'll try and spend
some more time on that this week.



Thanks,

James


Re: [PATCH RESEND] arm64: fault: avoid send SIGBUS two times

2017-12-07 Thread gengdongjiu
Hi James, Will

On 2017/12/7 22:32, James Morse wrote:
> Hi gengdongjiu, Will,
> 
> On 07/12/17 05:55, gengdongjiu wrote:
>> On 2017/12/7 0:15, Will Deacon wrote:
 --- a/arch/arm64/mm/fault.c
 +++ b/arch/arm64/mm/fault.c
 @@ -570,7 +570,6 @@ static int do_sea(unsigned long addr, unsigned int 
 esr, struct pt_regs *regs)
  {
struct siginfo info;
const struct fault_info *inf;
 -  int ret = 0;
  
inf = esr_to_fault_info(esr);
pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
 @@ -585,7 +584,7 @@ static int do_sea(unsigned long addr, unsigned int 
 esr, struct pt_regs *regs)
if (interrupts_enabled(regs))
nmi_enter();
  
 -  ret = ghes_notify_sea();
 +  ghes_notify_sea();
  
if (interrupts_enabled(regs))
nmi_exit();
 @@ -600,7 +599,7 @@ static int do_sea(unsigned long addr, unsigned int 
 esr, struct pt_regs *regs)
info.si_addr  = (void __user *)addr;
arm64_notify_die("", regs, &info, esr);
  
 -  return ret;
 +  return 0;
> 
>>> Hmm, so this code is a bit of mess.
>>>
>>> Wouldn't it be better to have the signal dispatching code in do_mem_abort
>>> check ESR.ESR_ELx_FnV, so then do_sea wouldn't have to, and we could just
>>> return an error instead?
> 
> FnV only applies to one of the Synchronous External Abort ESRs, hence it ended
> up in her>
>> Regardless ghes_notify_sea()'s return value, it always needs to deliver 
>> signal,
>> because ghes_notify_sea()'s return value does not reflect whether the memory 
>> error
>> handler(memory_failure()) handle the error successfully or failed. If let 
>> do_mem_abort()
>> delivers the signal, we should always let do_sea() return error, then  the 
>> do_mem_abort() can
>> always deliver signal. Then we will see the strange log as shown below when 
>> happen Synchronous External Abort.
>>
>> [  676.700652] Synchronous External Abort: synchronous external abort 
>> (0x96000410) at 0x33ff7008
>> [  676.723301] Unhandled fault: synchronous external abort (0x96000410) at 
>> 0x33ff7008
>>
>> so I think it is better send the signal in the do_sea(), not send it in the 
>> do_mem_abort().
> 
> I agree: I think improving the commit message would help here, something like:
> -
> do_sea() calls arm64_notify_die() which will always signal user-space.
> It also returns whether APEI claimed the external abort as a RAS notification.
> If it returns failure do_mem_abort() will signal user-space too.
> 
> do_mem_abort() wants to know if we handled the error, we always call
> arm64_notify_die() so can always return success.
> -

Thanks for the agreement and good example. surely I will update the commit 
message to clearly describe it.
by the way, I think also change the info.si_code to "BUS_MCEERR_AR" is better, 
as shown [1].
BUS_MCEERR_AR can tell user space  "Hardware memory error consumed on a error; 
action required".
so it is better than "0". In the X86 platform, it also use the "BUS_MCEERR_AR" 
for si_code[2] in "arch/x86/mm/fault.c".
what do you think about it?

[1]:
static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
{
.
info.si_signo = SIGBUS;
info.si_errno = 0;
-   info.si_code  = 0;
+   info.si_code  = BUS_MCEERR_AR;
}

[2]:
arch/x86/mm/fault.c:

static void
do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
  u32 *pkey, unsigned int fault)
{
  ..
#ifdef CONFIG_MEMORY_FAILURE
if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
printk(KERN_ERR
"MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
tsk->comm, tsk->pid, address);
code = BUS_MCEERR_AR;
}
#endif
force_sig_info_fault(SIGBUS, code, address, tsk, pkey, fault);
}


> 
> APEI's return value matters for KVM, and it will matter here too if we support
> kernel-first.
yes.

> 
> 
>> do_mem_abort() only send the signal when the exception does not defined in 
>> fault_info[]. Another benefit
>> is that do_sea() can send different signal according to the Synchronous 
>> External Abort type, such as SIGBUS or SIGKILL.
>> the do_mem_abort() can only send one kind signal.
> 
> (I'm not convinced we want to do this other than via the firwmare/kernel RAS
> code, but that is a separate issue)
yes, that is a separate issue.

> 
> 
> Thanks,
> 
> James
> 
> 
> .
> 



Re: [PATCH RESEND] arm64: fault: avoid send SIGBUS two times

2017-12-07 Thread James Morse
Hi gengdongjiu, Will,

On 07/12/17 05:55, gengdongjiu wrote:
> On 2017/12/7 0:15, Will Deacon wrote:
>>> --- a/arch/arm64/mm/fault.c
>>> +++ b/arch/arm64/mm/fault.c
>>> @@ -570,7 +570,6 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>>> struct pt_regs *regs)
>>>  {
>>> struct siginfo info;
>>> const struct fault_info *inf;
>>> -   int ret = 0;
>>>  
>>> inf = esr_to_fault_info(esr);
>>> pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>> @@ -585,7 +584,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>>> struct pt_regs *regs)
>>> if (interrupts_enabled(regs))
>>> nmi_enter();
>>>  
>>> -   ret = ghes_notify_sea();
>>> +   ghes_notify_sea();
>>>  
>>> if (interrupts_enabled(regs))
>>> nmi_exit();
>>> @@ -600,7 +599,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>>> struct pt_regs *regs)
>>> info.si_addr  = (void __user *)addr;
>>> arm64_notify_die("", regs, &info, esr);
>>>  
>>> -   return ret;
>>> +   return 0;

>> Hmm, so this code is a bit of mess.
>>
>> Wouldn't it be better to have the signal dispatching code in do_mem_abort
>> check ESR.ESR_ELx_FnV, so then do_sea wouldn't have to, and we could just
>> return an error instead?

FnV only applies to one of the Synchronous External Abort ESRs, hence it ended
up in here.


> Regardless ghes_notify_sea()'s return value, it always needs to deliver 
> signal,
> because ghes_notify_sea()'s return value does not reflect whether the memory 
> error
> handler(memory_failure()) handle the error successfully or failed. If let 
> do_mem_abort()
> delivers the signal, we should always let do_sea() return error, then  the 
> do_mem_abort() can
> always deliver signal. Then we will see the strange log as shown below when 
> happen Synchronous External Abort.
> 
> [  676.700652] Synchronous External Abort: synchronous external abort 
> (0x96000410) at 0x33ff7008
> [  676.723301] Unhandled fault: synchronous external abort (0x96000410) at 
> 0x33ff7008
> 
> so I think it is better send the signal in the do_sea(), not send it in the 
> do_mem_abort().

I agree: I think improving the commit message would help here, something like:
-
do_sea() calls arm64_notify_die() which will always signal user-space.
It also returns whether APEI claimed the external abort as a RAS notification.
If it returns failure do_mem_abort() will signal user-space too.

do_mem_abort() wants to know if we handled the error, we always call
arm64_notify_die() so can always return success.
-

APEI's return value matters for KVM, and it will matter here too if we support
kernel-first.


> do_mem_abort() only send the signal when the exception does not defined in 
> fault_info[]. Another benefit
> is that do_sea() can send different signal according to the Synchronous 
> External Abort type, such as SIGBUS or SIGKILL.
> the do_mem_abort() can only send one kind signal.

(I'm not convinced we want to do this other than via the firwmare/kernel RAS
code, but that is a separate issue)


Thanks,

James



Re: [PATCH RESEND] arm64: fault: avoid send SIGBUS two times

2017-12-06 Thread gengdongjiu
On 2017/12/7 0:15, Will Deacon wrote:
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -570,7 +570,6 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>  {
>>  struct siginfo info;
>>  const struct fault_info *inf;
>> -int ret = 0;
>>  
>>  inf = esr_to_fault_info(esr);
>>  pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>> @@ -585,7 +584,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>  if (interrupts_enabled(regs))
>>  nmi_enter();
>>  
>> -ret = ghes_notify_sea();
>> +ghes_notify_sea();
>>  
>>  if (interrupts_enabled(regs))
>>  nmi_exit();
>> @@ -600,7 +599,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>  info.si_addr  = (void __user *)addr;
>>  arm64_notify_die("", regs, &info, esr);
>>  
>> -return ret;
>> +return 0;
> Hmm, so this code is a bit of mess.
> 
> Wouldn't it be better to have the signal dispatching code in do_mem_abort
> check ESR.ESR_ELx_FnV, so then do_sea wouldn't have to, and we could just
> return an error instead?

Thanks the mail and comments!

Regardless ghes_notify_sea()'s return value, it always needs to deliver signal,
because ghes_notify_sea()'s return value does not reflect whether the memory 
error
handler(memory_failure()) handle the error successfully or failed. If let 
do_mem_abort()
delivers the signal, we should always let do_sea() return error, then  the 
do_mem_abort() can
always deliver signal. Then we will see the strange log as shown below when 
happen Synchronous External Abort.

[  676.700652] Synchronous External Abort: synchronous external abort 
(0x96000410) at 0x33ff7008
[  676.723301] Unhandled fault: synchronous external abort (0x96000410) at 
0x33ff7008

so I think it is better send the signal in the do_sea(), not send it in the 
do_mem_abort().
do_mem_abort() only send the signal when the exception does not defined in 
fault_info[]. Another benefit
is that do_sea() can send different signal according to the Synchronous 
External Abort type, such as SIGBUS or SIGKILL.
the do_mem_abort() can only send one kind signal.




Re: [PATCH RESEND] arm64: fault: avoid send SIGBUS two times

2017-12-06 Thread Will Deacon
On Tue, Dec 05, 2017 at 11:02:35PM +0800, Dongjiu Geng wrote:
> If APEI handling the memory error is failed, the do_mem_abort()
> and do_sea() will all deliver SIGBUS. In fact, sending one time
> can be enough, so correct it.
> 
> Signed-off-by: Dongjiu Geng 
> ---
>  arch/arm64/mm/fault.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index fcf2ede3..9e3f7ca 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -570,7 +570,6 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
>  {
>   struct siginfo info;
>   const struct fault_info *inf;
> - int ret = 0;
>  
>   inf = esr_to_fault_info(esr);
>   pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
> @@ -585,7 +584,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
>   if (interrupts_enabled(regs))
>   nmi_enter();
>  
> - ret = ghes_notify_sea();
> + ghes_notify_sea();
>  
>   if (interrupts_enabled(regs))
>   nmi_exit();
> @@ -600,7 +599,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
>   info.si_addr  = (void __user *)addr;
>   arm64_notify_die("", regs, &info, esr);
>  
> - return ret;
> + return 0;

Hmm, so this code is a bit of mess.

Wouldn't it be better to have the signal dispatching code in do_mem_abort
check ESR.ESR_ELx_FnV, so then do_sea wouldn't have to, and we could just
return an error instead?

Will


[PATCH RESEND] arm64: fault: avoid send SIGBUS two times

2017-12-04 Thread Dongjiu Geng
If APEI handling the memory error is failed, the do_mem_abort()
and do_sea() will all deliver SIGBUS. In fact, sending one time
can be enough, so correct it.

Signed-off-by: Dongjiu Geng 
---
 arch/arm64/mm/fault.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index fcf2ede3..9e3f7ca 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -570,7 +570,6 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
 {
struct siginfo info;
const struct fault_info *inf;
-   int ret = 0;
 
inf = esr_to_fault_info(esr);
pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
@@ -585,7 +584,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
if (interrupts_enabled(regs))
nmi_enter();
 
-   ret = ghes_notify_sea();
+   ghes_notify_sea();
 
if (interrupts_enabled(regs))
nmi_exit();
@@ -600,7 +599,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
info.si_addr  = (void __user *)addr;
arm64_notify_die("", regs, &info, esr);
 
-   return ret;
+   return 0;
 }
 
 static const struct fault_info fault_info[] = {
-- 
2.10.1