Re: linux-next: manual merge of the edac-amd tree with the edac tree

2016-12-01 Thread Yazen Ghannam
On Thu, Dec 01, 2016 at 07:15:01PM +0100, Borislav Petkov wrote:
> On Thu, Dec 01, 2016 at 11:02:04AM -0500, Yazen Ghannam wrote:
> > A deferred error is an uncorrectable error whose handling can be
> > deferred, i.e. it's not urgent. This affects the system behavior, but
> > I'm now thinking that this shouldn't affect users' behavior. I think it
> > would be simpler to just classify deferred errors as uncorrectable
> > errors so that users treat them as such.
> 
> Why would we want to lie about deferred errors being uncorrectable?
>

They are uncorrectable errors that can be handled differently. If you
can't handle them then there's not much difference.
 
> And I believe deferred errors can be handled differently like freeze the
> process using the page instead of killing it. And so on...
> 

If deferred errors can be handled differently in userspace, then you're
right we should maintain the distinction. I was thinking we'd only
handle them in the kernel.

> Why aren't you simply adding the documentation about
> HW_EVENT_ERR_DEFERRED and be done with it? The downstream path like
> tracepoint and all can handle all that just fine.
> 

Okay, will do.

> > Boris,
> > Can we drop or revert commit d12a969ebbfc?
> 
> No can do. It is a public branch and there's no touching it.
> 

Okay, got it.

Thanks,
Yazen


Re: linux-next: manual merge of the edac-amd tree with the edac tree

2016-12-01 Thread Borislav Petkov
On Thu, Dec 01, 2016 at 11:02:04AM -0500, Yazen Ghannam wrote:
> A deferred error is an uncorrectable error whose handling can be
> deferred, i.e. it's not urgent. This affects the system behavior, but
> I'm now thinking that this shouldn't affect users' behavior. I think it
> would be simpler to just classify deferred errors as uncorrectable
> errors so that users treat them as such.

Why would we want to lie about deferred errors being uncorrectable?

And I believe deferred errors can be handled differently like freeze the
process using the page instead of killing it. And so on...

Why aren't you simply adding the documentation about
HW_EVENT_ERR_DEFERRED and be done with it? The downstream path like
tracepoint and all can handle all that just fine.

> Boris,
> Can we drop or revert commit d12a969ebbfc?

No can do. It is a public branch and there's no touching it.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: linux-next: manual merge of the edac-amd tree with the edac tree

2016-12-01 Thread Yazen Ghannam
On Thu, Dec 01, 2016 at 10:06:17AM -0200, Mauro Carvalho Chehab wrote:
> 
> However, rebasing over your tree showed a new documentation gap:
>   ./include/linux/edac.h:144: warning: Enum value 'HW_EVENT_ERR_DEFERRED' 
> not described in enum 'hw_event_mc_err_type'
> 
> With was introduced by this commit:
> 
>   commit d12a969ebbfcfc25853c4147d42b388f758e8784
>   Author: Yazen Ghannam 
>   Date:   Thu Nov 17 17:57:32 2016 -0500
> 
> EDAC, amd64: Add Deferred Error type
> 
> Currently, deferred errors are classified as correctable in EDAC. Add a
> new error type for deferred errors so that they are correctly reported
> to the user.
> 
> Signed-off-by: Yazen Ghannam 
> Cc: Aravind Gopalakrishnan 
> Cc: linux-edac 
> Link: 
> http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-yazen.ghan...@amd.com
> Signed-off-by: Borislav Petkov 
> 
> 
> Yazen introduced a "deferred error" code (whatever it means), but didn't
> document what's that. Unfortunately, the patch description is also
> not clear enough about what a "deferred error" means or how userspace
> is supposed to handle it.
> 
> Yazen,
> 
> Could you please send us a patch adding a proper description for this
> new error code?
>

Hi Mauro,
A deferred error is an uncorrectable error whose handling can be
deferred, i.e. it's not urgent. This affects the system behavior, but
I'm now thinking that this shouldn't affect users' behavior. I think it
would be simpler to just classify deferred errors as uncorrectable
errors so that users treat them as such.

Boris,
Can we drop or revert commit d12a969ebbfc?

And can we apply a fixup like this to commit 713ad54675fd?

---
From: Yazen Ghannam 
Date: Thu, 1 Dec 2016 08:54:49 -0600
Subject: [PATCH] fixup! EDAC, amd64: Define and register UMC error decode
 function

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 991b36c..245b9a0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2480,8 +2480,9 @@ static void decode_umc_error(int node_id, struct mce *m)
 
memset(&err, 0, sizeof(err));
 
+   /* Log deferred errors as uncorrectable errors. */
if (m->status & MCI_STATUS_DEFERRED)
-   ecc_type = 3;
+   ecc_type = 1;
 
err.channel = find_umc_channel(pvt, m);
if (err.channel < 0) {
-- 
2.7.4

---

Thanks,
Yazen


Re: linux-next: manual merge of the edac-amd tree with the edac tree

2016-12-01 Thread Mauro Carvalho Chehab
Em Thu, 1 Dec 2016 11:48:46 +0100
Borislav Petkov  escreveu:

> On Wed, Nov 30, 2016 at 08:50:13AM -0200, Mauro Carvalho Chehab wrote:
> > Fixed. If you have a stable branch, I can rebase it on the top
> > of your patches, in order to avoid the confict at linux-next.
> 
> http://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/log/?h=for-next

Thanks!

> but with that fixed, shouldn't the merge work without a conflict?

Well, we're removing some lines from one file and adding them on another
one. Git is not smart enough to solve such conflicts automatically:

CONFLICT (content): Merge conflict in drivers/edac/edac_mc.c
Recorded preimage for 'drivers/edac/edac_mc.c'
Automatic merge failed; fix conflicts and then commit the result.

Anyway, I rebased my work on the top of your branch. The conflict
is now gone:


https://git.kernel.org/cgit/linux/kernel/git/mchehab/linux-edac.git/log/?h=linux_next

You can see the generated documentation at:
https://mchehab.fedorapeople.org/kernel_docs/admin-guide/ras.html
(user and admin guide)
and:
https://mchehab.fedorapeople.org/kernel_docs/driver-api/edac.html
(kernel API)

However, rebasing over your tree showed a new documentation gap:
./include/linux/edac.h:144: warning: Enum value 'HW_EVENT_ERR_DEFERRED' 
not described in enum 'hw_event_mc_err_type'

With was introduced by this commit:

  commit d12a969ebbfcfc25853c4147d42b388f758e8784
  Author: Yazen Ghannam 
  Date:   Thu Nov 17 17:57:32 2016 -0500

EDAC, amd64: Add Deferred Error type

Currently, deferred errors are classified as correctable in EDAC. Add a
new error type for deferred errors so that they are correctly reported
to the user.

Signed-off-by: Yazen Ghannam 
Cc: Aravind Gopalakrishnan 
Cc: linux-edac 
Link: 
http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-yazen.ghan...@amd.com
Signed-off-by: Borislav Petkov 


Yazen introduced a "deferred error" code (whatever it means), but didn't
document what's that. Unfortunately, the patch description is also
not clear enough about what a "deferred error" means or how userspace
is supposed to handle it.

Yazen,

Could you please send us a patch adding a proper description for this
new error code?

Thanks!
Mauro


Re: linux-next: manual merge of the edac-amd tree with the edac tree

2016-12-01 Thread Borislav Petkov
On Wed, Nov 30, 2016 at 08:50:13AM -0200, Mauro Carvalho Chehab wrote:
> Fixed. If you have a stable branch, I can rebase it on the top
> of your patches, in order to avoid the confict at linux-next.

http://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/log/?h=for-next

but with that fixed, shouldn't the merge work without a conflict?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: linux-next: manual merge of the edac-amd tree with the edac tree

2016-11-30 Thread Mauro Carvalho Chehab
Em Mon, 28 Nov 2016 09:27:35 +0100
Borislav Petkov  escreveu:

> On Mon, Nov 28, 2016 at 02:37:26PM +1100, Stephen Rothwell wrote:
> > Hi Borislav,
> > 
> > Today's linux-next merge of the edac-amd tree got a conflict in:
> > 
> >   drivers/edac/edac_mc.c
> > 
> > between commit:
> > 
> >   ef91afa61088 ("edac: move documentation from edac_mc.c to edac_core.h")
> > 
> > from the edac tree and commit:
> > 
> >   c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
> > 
> > from the edac-amd tree.
> > 
> > I fixed it up (see below - there may be more fixes needed in
> > edac_core.h) and can carry the fix as necessary. This is now fixed as
> > far as linux-next is concerned, but any non trivial conflicts should be
> > mentioned to your upstream maintainer when your tree is submitted for
> > merging.  You may also want to consider cooperating with the maintainer
> > of the conflicting tree to minimise any particularly complex conflicts.  
> 
> Just one issue which has nothing to do with linux-next. There's still
> that in ef91afa61088:
> 
> > +/**
> > + * edac_mc_find: Search for a mem_ctl_info structure whose index is @idx.
> > + *
> > + * @idx: index to be seek
> > + *
> > + * If found, return a pointer to the structure.
> > + * Else return NULL.
> > + *
> > + * Caller must hold mem_ctls_mutex.
> > + */  
> 
> That last sentence in the comment is not true anymore - edac_mc_find()
> is grabbing the mutex itself as it should be. Mauro, please fix that in
> your tree.

Fixed. If you have a stable branch, I can rebase it on the top
of your patches, in order to avoid the confict at linux-next.

Regards,
Mauro


Re: linux-next: manual merge of the edac-amd tree with the edac tree

2016-11-28 Thread Borislav Petkov
On Mon, Nov 28, 2016 at 02:37:26PM +1100, Stephen Rothwell wrote:
> Hi Borislav,
> 
> Today's linux-next merge of the edac-amd tree got a conflict in:
> 
>   drivers/edac/edac_mc.c
> 
> between commit:
> 
>   ef91afa61088 ("edac: move documentation from edac_mc.c to edac_core.h")
> 
> from the edac tree and commit:
> 
>   c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
> 
> from the edac-amd tree.
> 
> I fixed it up (see below - there may be more fixes needed in
> edac_core.h) and can carry the fix as necessary. This is now fixed as
> far as linux-next is concerned, but any non trivial conflicts should be
> mentioned to your upstream maintainer when your tree is submitted for
> merging.  You may also want to consider cooperating with the maintainer
> of the conflicting tree to minimise any particularly complex conflicts.

Just one issue which has nothing to do with linux-next. There's still
that in ef91afa61088:

> +/**
> + * edac_mc_find: Search for a mem_ctl_info structure whose index is @idx.
> + *
> + * @idx: index to be seek
> + *
> + * If found, return a pointer to the structure.
> + * Else return NULL.
> + *
> + * Caller must hold mem_ctls_mutex.
> + */

That last sentence in the comment is not true anymore - edac_mc_find()
is grabbing the mutex itself as it should be. Mauro, please fix that in
your tree.

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.