Re: [PATCH EDACv16 1/2] edac: Change internal representation to work with layers

2012-05-04 Thread Borislav Petkov
On Thu, May 03, 2012 at 11:16:54AM -0300, Mauro Carvalho Chehab wrote:
  +  bool enable_filter,
  +  unsigned pos[EDAC_MAX_LAYERS])
  
  Passing the whole array as an argument instead of only a pointer to it?
 
 This is C, and not C++ or Pascal. Only the pointer is passed here. The size
 of the array is used for type check only.

Right, and you can see where he still has trouble. And by he I mean me :).
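
As an aside, a minimal userspace sketch (not from the patch) of the array-parameter decay being discussed: a parameter declared as unsigned pos[EDAC_MAX_LAYERS] is adjusted by the compiler to a plain pointer, so only the address is passed and the declared size is not enforced:

	#include <stdio.h>

	#define EDAC_MAX_LAYERS 3

	/* "unsigned pos[EDAC_MAX_LAYERS]" is rewritten to "unsigned *pos",
	 * so sizeof(pos) here is the size of a pointer, not of the array. */
	static void show(unsigned pos[EDAC_MAX_LAYERS])
	{
		printf("callee: sizeof(pos) = %zu\n", sizeof(pos));
	}

	int main(void)
	{
		unsigned pos[EDAC_MAX_LAYERS] = { 0, 1, 2 };

		printf("caller: sizeof(pos) = %zu\n", sizeof(pos));
		show(pos);	/* the array decays to &pos[0] */
		return 0;
	}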

[ … ]

  +void edac_mc_handle_error(const enum hw_event_mc_err_type type,
  +struct mem_ctl_info *mci,
  +const unsigned long page_frame_number,
  +const unsigned long offset_in_page,
  +const unsigned long syndrome,
  +const int layer0,
  +const int layer1,
  +const int layer2,
  
  Instead of passing each layer as an arg, you can prepare the array pos[]
  in each edac_mc_handle_*() and pass around a pointer to it - you need it
  anyway in the edac_mc_inc*() functions.
 
 Yes, but the changes in the drivers would be more complex, for no good reason:
 before each call to this function, they would need to create and fill a
 temporary array.
 
 As there are only 3 layers, in the worst case, this way is simpler and more
 efficient. We can review it if we ever need more than 3 layers.

I see, the edac_mc_handle_error is the main interface for all edac drivers, ok.
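
To make the two alternatives concrete, a hedged sketch (the trailing message arguments and the helper name edac_mc_handle_error_pos are assumptions for illustration, not the real API):

	/* Style kept in the patch: each layer is a separate argument and a
	 * driver passes -1 for any layer it does not know. */
	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, page, offset, syndrome,
			     csrow, chan, -1, msg, other_detail, NULL);

	/* Alternative raised in review: the driver fills a pos[] array first
	 * and hands a pointer to it down to the core (hypothetical helper). */
	{
		int pos[EDAC_MAX_LAYERS] = { csrow, chan, -1 };

		edac_mc_handle_error_pos(HW_EVENT_ERR_CORRECTED, mci, page, offset,
					 syndrome, pos, msg, other_detail, NULL);
	}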

[ … ]

  +  bool enable_filter = false;
  
  What does this enable_filter thing mean:
  
  if (pos[i] >= 0)
  	enable_filter = true;
  
  This absolutely needs explanation and better naming!
 
 Renamed it to enable_per_layer_report.

Or detailed_dimm_report or ...

 The code that implements it seems self-explanatory: 
 
 ..
  	if (enable_filter && dimm->nr_pages) {
  		if (p != label) {
  			strcpy(p, OTHER_LABEL);
  			p += strlen(OTHER_LABEL);
  		}
  		strcpy(p, dimm->label);
  		p += strlen(p);
  		*p = '\0';
 
 ..
 
  	if (!enable_filter) {
  		strcpy(label, "any memory");
  	} else {
  		debugf4("%s: csrow/channel to increment: (%d,%d)\n",
  			__func__, row, chan);
  		if (p == label)
  			strcpy(label, "unknown memory");
  		if (type == HW_EVENT_ERR_CORRECTED) {
  			if (row >= 0) {
  				mci->csrows[row].ce_count++;
  				if (chan >= 0)
  					mci->csrows[row].channels[chan].ce_count++;
  			}
  		} else
  			if (row >= 0)
  				mci->csrows[row].ue_count++;
  	}
 
 This flag indicates whether the EDAC driver provided any useful information
 about the affected DIMM(s). If it did, the DIMM location labels are filtered
 and reported, and the per-layer error counters are incremented.
 
 As discussed in previous reviews, with FB-DIMM MCs, and/or when mirror
 mode/lockstep mode is enabled, most memory controllers point errors to 2 DIMMs
 (or to 4 DIMMs, when both mirror mode and lockstep mode are enabled), under
 some conditions. The edac_mc_handle_fbd_ue() function was created because of
 that.
 
 Compared with the old code, enable_filter = false is equivalent to calling
 edac_mc_handle_ce_no_info()/edac_mc_handle_ue_no_info().
 
 I'm adding a comment about it.

Much better, thanks.

Btw, I have to admit, this is a pretty strange way of handling the case
where layers are { -1, -1, -1 }, i.e. edac_mc_handle_error is called
with the no info hint.

I'm wondering whether it wouldn't be more readable if you could do

edac_mc_handle_error(HW_EVENT_ERR_INFO_INVALID | ..)

or similar and define such a flag which simply states that. But you'll
have to change enum hw_event_mc_err_type to a bitfield to allow more
than one set bit.

Hmm.
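
For illustration, a hedged sketch of that suggestion (the flag values below are hypothetical; the series under review keeps hw_event_mc_err_type as a plain enum):

	/* Hypothetical bitfield variant of the event type. */
	enum hw_event_mc_err_type {
		HW_EVENT_ERR_CORRECTED		= 1 << 0,
		HW_EVENT_ERR_UNCORRECTED	= 1 << 1,
		HW_EVENT_ERR_FATAL		= 1 << 2,
		HW_EVENT_ERR_INFO_INVALID	= 1 << 3,	/* no location info */
	};

	/* In the current series the no-info case is expressed purely through
	 * the layers (trailing arguments assumed): */
	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, page, offset, syndrome,
			     -1, -1, -1, msg, other_detail, NULL);

	/* With the extra flag, the intent would be spelled out explicitly: */
	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED | HW_EVENT_ERR_INFO_INVALID,
			     mci, page, offset, syndrome, -1, -1, -1,
			     msg, other_detail, NULL);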


[ … ]

  The SCRUB_SW_SRC piece can be another function.
 
 It is now part of the edac_ce_error().

Hm, I can't find this function on your experimental branch on
infradead but it is mentioned in the inlined patch below, what's going
on? Which patch should I be looking at now?

[ … ]

 The following patch addresses the issues pointed out. I've updated them
 on my experimental branch at infradead:
   git://git.infradead.org/users/mchehab/edac.git experimental

Ok, I checked this one out but can't find the edac_ce_error() function
as already stated above, pls check.

 They'll also be soon available at:
   git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac.git hw_events_v18

Will review the patch below now and reply in another mail.

Thanks.

 
 Regards,
 Mauro
 
 -
 
 edac: Change internal representation to work with layers
 
 

Re: [PATCH EDACv16 1/2] edac: Change internal representation to work with layers

2012-05-04 Thread Mauro Carvalho Chehab
On 04-05-2012 06:52, Borislav Petkov wrote:
 On Thu, May 03, 2012 at 11:16:54AM -0300, Mauro Carvalho Chehab wrote:
 +  bool enable_filter,
 +  unsigned pos[EDAC_MAX_LAYERS])

 Passing the whole array as an argument instead of only a pointer to it?

 This is C, and not C++ or Pascal. Only the pointer is passed here. The size
 of the array is used for type check only.
 
 Right, and you can see where he still has trouble. And by he I mean me :).

:)
 
 [ … ]
 
 +void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 +struct mem_ctl_info *mci,
 +const unsigned long page_frame_number,
 +const unsigned long offset_in_page,
 +const unsigned long syndrome,
 +const int layer0,
 +const int layer1,
 +const int layer2,

 Instead of passing each layer as an arg, you can prepare the array pos[]
 in each edac_mc_handle_*() and pass around a pointer to it - you need it
 anyway in the edac_mc_inc*() functions.

 Yes, but the changes in the drivers would be more complex, for no good reason:
 before each call to this function, they would need to create and fill a
 temporary array.

 As there are only 3 layers, in the worst case, this way is simpler and more
 efficient. We can review it if we ever need more than 3 layers.
 
 I see, the edac_mc_handle_error is the main interface for all edac drivers, 
 ok.
 
 [ … ]
 
 +  bool enable_filter = false;

 What does this enable_filter thing mean:

 if (pos[i] >= 0)
 	enable_filter = true;

 This absolutely needs explanation and better naming!

 Renamed it to enable_per_layer_report.
 
 Or detailed_dimm_report or ...

Detail is used in another context; enable_per_layer_report won't raise
any doubts for anyone reading the code.

 The code that implements it seems self-explanatory: 

 ..
  	if (enable_filter && dimm->nr_pages) {
  		if (p != label) {
  			strcpy(p, OTHER_LABEL);
  			p += strlen(OTHER_LABEL);
  		}
  		strcpy(p, dimm->label);
  		p += strlen(p);
  		*p = '\0';

 ..

  	if (!enable_filter) {
  		strcpy(label, "any memory");
  	} else {
  		debugf4("%s: csrow/channel to increment: (%d,%d)\n",
  			__func__, row, chan);
  		if (p == label)
  			strcpy(label, "unknown memory");
  		if (type == HW_EVENT_ERR_CORRECTED) {
  			if (row >= 0) {
  				mci->csrows[row].ce_count++;
  				if (chan >= 0)
  					mci->csrows[row].channels[chan].ce_count++;
  			}
  		} else
  			if (row >= 0)
  				mci->csrows[row].ue_count++;
  	}

 This flag indicates whether the EDAC driver provided any useful information
 about the affected DIMM(s). If it did, the DIMM location labels are filtered
 and reported, and the per-layer error counters are incremented.

 As discussed in previous reviews, with FB-DIMM MCs, and/or when mirror
 mode/lockstep mode is enabled, most memory controllers point errors to 2 DIMMs
 (or to 4 DIMMs, when both mirror mode and lockstep mode are enabled), under
 some conditions. The edac_mc_handle_fbd_ue() function was created because of
 that.

 Compared with the old code, enable_filter = false is equivalent to calling
 edac_mc_handle_ce_no_info()/edac_mc_handle_ue_no_info().

 I'm adding a comment about it.
 
 Much better, thanks.
 
 Btw, I have to admit, this is a pretty strange way of handling the case
 where layers are { -1, -1, -1 }, i.e. edac_mc_handle_error is called
 with the no info hint.

Well, negative values are used on Linux to indicate error conditions, so this
is not that strange. It also allows partial no-info reports, as, in some cases,
a channel or a csrow may not be known. So, this allows code simplification in
the drivers.

For example, look at this hunk of the amd64_edac conversion patch:

@@ -1585,16 +1609,10 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
 	if (dct_ganging_enabled(pvt))
 		chan = get_channel_from_ecc_syndrome(mci, syndrome);
 
-	if (chan >= 0)
-		edac_mc_handle_ce(mci, page, offset, syndrome, csrow, chan,
-				  EDAC_MOD_STR);
-	else
-		/*
-		 * Channel unknown, report all channels on this CSROW as failed.
-		 */
-		for (chan = 0; chan < mci->csrows[csrow].nr_channels; chan++)
-			edac_mc_handle_ce(mci, page, offset, syndrome,
-					  csrow, chan, EDAC_MOD_STR);
+
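
The added side of the hunk is cut off in the archive; based on the new API discussed in this thread, the replacement most likely collapses both branches into a single call along these lines (a hedged reconstruction, not the literal patch text; trailing arguments assumed):

	+	edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, page, offset,
	+			     syndrome, csrow, chan, -1,
	+			     EDAC_MOD_STR, "", NULL);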

Re: [PATCH EDACv16 1/2] edac: Change internal representation to work with layers, second version

2012-05-04 Thread Borislav Petkov
On Thu, May 03, 2012 at 11:16:54AM -0300, Mauro Carvalho Chehab wrote:
 edac: Change internal representation to work with layers
 
 From: Mauro Carvalho Chehab mche...@redhat.com
 
 Change the EDAC internal representation to work with non-csrow
 based memory controllers.
 
 There are lots of those memory controllers nowadays, and more
 are coming. So, the EDAC internal representation needs to be
 changed, in order to work with those memory controllers, while
 preserving backward compatibility with the old ones.
 
 The edac core was written with the idea that memory controllers
 are able to directly access csrows.
 
 This is not true for FB-DIMM and RAMBUS memory controllers.
 
 Also, some recent advanced memory controllers don't present a per-csrow
 view. Instead, they view memory as DIMMs rather than ranks.
 
 So, change the allocation and error report routines to allow
 them to work with all types of architectures.
 
 This will allow the removal of several hacks with FB-DIMM and RAMBUS
 memory controllers.
 
 Also, several tests were done on different platforms using different
 x86 drivers.
 
 TODO: multi-rank DIMMs are currently represented by multiple DIMM
 entries in struct dimm_info. That means that changing a label for one
 rank won't change the same label for the other ranks of the same DIMM.
 This bug has been present since the beginning of EDAC, so it is not a big
 deal. However, on several drivers, it is possible to fix this issue, but
 it should be a per-driver fix, as the csrow => DIMM arrangement may not
 be the same for all. So, don't try to fix it here yet.
 
 I tried to make this patch as short as possible, preceding it with
 several other patches that simplified the logic here. Yet, as the
 internal API changes, all drivers need changes. The changes are
 generally bigger in the drivers for FB-DIMMs.
 
 Cc: Aristeu Rozanski aroza...@redhat.com
 Cc: Doug Thompson nor...@yahoo.com
 Cc: Borislav Petkov borislav.pet...@amd.com
 Cc: Mark Gross mark.gr...@intel.com
 Cc: Jason Uhlenkott juhle...@akamai.com
 Cc: Tim Small t...@buttersideup.com
 Cc: Ranganathan Desikan r...@jetztechnologies.com
 Cc: Arvind R. arvin...@gmail.com
 Cc: Olof Johansson o...@lixom.net
 Cc: Egor Martovetsky e...@pasemi.com
 Cc: Chris Metcalf cmetc...@tilera.com
 Cc: Michal Marek mma...@suse.cz
 Cc: Jiri Kosina jkos...@suse.cz
 Cc: Joe Perches j...@perches.com
 Cc: Dmitry Eremin-Solenikov dbarysh...@gmail.com
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Hitoshi Mitake h.mit...@gmail.com
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Niklas Söderlund niklas.soderl...@ericsson.com
 Cc: Shaohui Xie shaohui@freescale.com
 Cc: Josh Boyer jwbo...@gmail.com
 Cc: linuxppc-dev@lists.ozlabs.org
 Signed-off-by: Mauro Carvalho Chehab mche...@redhat.com
 
 ---
 v18: Addresses the issues pointed out in the v17 review
 
 diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
 index e48ab31..1286c5e 100644
 --- a/drivers/edac/edac_core.h
 +++ b/drivers/edac/edac_core.h
 @@ -447,8 +447,12 @@ static inline void pci_write_bits32(struct pci_dev *pdev, int offset,
  
  #endif   /* CONFIG_PCI */
  
 -extern struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 -				   unsigned nr_chans, int edac_index);
 +struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
 +				   unsigned nr_chans, int edac_index);
 +struct mem_ctl_info *new_edac_mc_alloc(unsigned edac_index,
 +				       unsigned n_layers,
 +				       struct edac_mc_layer *layers,
 +				       unsigned sz_pvt);
  extern int edac_mc_add_mc(struct mem_ctl_info *mci);
  extern void edac_mc_free(struct mem_ctl_info *mci);
  extern struct mem_ctl_info *edac_mc_find(int idx);
 @@ -467,24 +471,78 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
   * reporting logic and function interface - reduces conditional
   * statement clutter and extra function arguments.
   */
 -extern void edac_mc_handle_ce(struct mem_ctl_info *mci,
 -   unsigned long page_frame_number,
 -   unsigned long offset_in_page,
 -   unsigned long syndrome, int row, int channel,
 -   const char *msg);
 -extern void edac_mc_handle_ce_no_info(struct mem_ctl_info *mci,
 -   const char *msg);
 -extern void edac_mc_handle_ue(struct mem_ctl_info *mci,
 -   unsigned long page_frame_number,
 -   unsigned long offset_in_page, int row,
 -   const char *msg);
 -extern void edac_mc_handle_ue_no_info(struct mem_ctl_info *mci,
 -   const char *msg);
 -extern void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci, unsigned int csrow,
 -   unsigned int channel0, 
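
The excerpt is cut off here; for orientation, a hedged sketch of how a converted driver might describe its hierarchy to the new allocator (the struct edac_mc_layer fields and the EDAC_MC_LAYER_* names are assumptions based on this series, and driver_private_data is a placeholder):

	struct edac_mc_layer layers[2];
	struct mem_ctl_info *mci;

	/* e.g. channels at the top layer, DIMM slots below them */
	layers[0].type = EDAC_MC_LAYER_CHANNEL;
	layers[0].size = num_channels;
	layers[0].is_virt_csrow = false;
	layers[1].type = EDAC_MC_LAYER_SLOT;
	layers[1].size = dimms_per_channel;
	layers[1].is_virt_csrow = true;

	mci = new_edac_mc_alloc(mc_num, ARRAY_SIZE(layers), layers,
				sizeof(struct driver_private_data));
	if (!mci)
		return -ENOMEM;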

Re: [PATCH EDACv16 1/2] edac: Change internal representation to work with layers, second version

2012-05-04 Thread Mauro Carvalho Chehab
On 04-05-2012 07:16, Borislav Petkov wrote:
 On Thu, May 03, 2012 at 11:16:54AM -0300, Mauro Carvalho Chehab wrote:
 edac: Change internal representation to work with layers
...
  /**
 - * edac_mc_alloc: Allocate a struct mem_ctl_info structure
 - * @size_pvt:   size of private storage needed
 - * @nr_csrows:  Number of CWROWS needed for this MC
 - * @nr_chans:   Number of channels for the MC
 + * edac_mc_alloc: Allocate and partially fill a struct mem_ctl_info structure
 + * @mc_num: Memory controller number
 + * @n_layers:   Number of MC hierarchy layers
 + * layers:  Describes each layer as seen by the Memory Controller
 + * @size_pvt:   size of private storage needed
 + *
 + *
 + * Non-csrow based drivers (like FB-DIMM and RAMBUS ones) will likely report
 + * such DIMMS properly, but the CSROWS-based ones will likely do the wrong
 
  DIMMs   csrow-based
 
 + * thing, as two chip select values are used for dual-rank memories (and 4, for
 + * quad-rank ones). I suspect that this issue could be solved inside the EDAC
 + * core for SDRAM memories, but it requires further study at JEDEC JESD 21C.
 
 The paragraph above is still in, let me repeat my last note:
 
  This last paragraph sounds inaccurate, especially the likely
 adverbs make it even more confusing. The gist of what you're saying is
 already present in the commit message anyway, so drop it here pls.

There are two similar comments, one for edac_mc_alloc and another for
new_edac_mc_alloc. It seems that I fixed just one of them.

 
 + *
 + * In summary, solving this issue is not easy, as it requires a lot of testing.
 
 As before:
 
 Also superfluous and has nothing to do with edac_mc_alloc().
 
 Pls remove it.

Dropped both paragraphs.

 
   * Everything is kmalloc'ed as one big chunk - more efficient.
   * Only can be used if all structures have the same lifetime - otherwise
 @@ -168,22 +196,49 @@ void *edac_align_ptr(void **p, unsigned size, int n_elems)
   *
   * Use edac_mc_free() to free mc structures allocated by this function.
   *
 + * NOTE: drivers handle multi-rank memories in different ways: in some
 + * drivers, one multi-rank memory stick is mapped as one entry, while, in
 + * others, a single multi-rank memory stick would be mapped into several
 + * entries. Currently, this function will allocate multiple struct dimm_info
 + * on such scenarios, as grouping the multiple ranks require drivers change.
 + *
   * Returns:
   *  NULL allocation failed
   *  struct mem_ctl_info pointer
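
As background, the "kmalloc'ed as one big chunk" scheme quoted above works roughly like this (a hedged sketch, not the exact edac_mc_alloc() body; the fields and the nr_csrows count are illustrative):

	void *ptr = NULL;
	struct mem_ctl_info *mci;
	struct csrow_info *csr;
	unsigned long size;

	/* First pass from a NULL base: edac_align_ptr() only advances the
	 * cursor, so at the end it holds the total size to allocate. */
	mci = edac_align_ptr(&ptr, sizeof(*mci), 1);
	csr = edac_align_ptr(&ptr, sizeof(*csr), nr_csrows);
	size = (unsigned long)ptr;

	mci = kzalloc(size, GFP_KERNEL);
	if (!mci)
		return NULL;

	/* Second pass from the real base: the same walk now yields the
	 * actual sub-structure pointers inside the single allocation. */
	ptr = mci;
	mci = edac_align_ptr(&ptr, sizeof(*mci), 1);
	csr = edac_align_ptr(&ptr, sizeof(*csr), nr_csrows);
	mci->csrows = csr;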
 
 Ok, this patch still doesn't contain all the changes I requested for
 although you said you did them. Is this not the latest version? I'll
 wait for you to sort it out before I continue reviewing...
 
 Thanks.
 

Patch enclosed (and updated at Infradead).

Regards,
Mauro

-

edac: Change internal representation to work with layers

Change the EDAC internal representation to work with non-csrow
based memory controllers.

There are lots of those memory controllers nowadays, and more
are coming. So, the EDAC internal representation needs to be
changed, in order to work with those memory controllers, while
preserving backward compatibility with the old ones.

The edac core was written with the idea that memory controllers
are able to directly access csrows.

This is not true for FB-DIMM and RAMBUS memory controllers.

Also, some recent advanced memory controllers don't present a per-csrow
view. Instead, they view memory as DIMMs rather than ranks.

So, change the allocation and error report routines to allow
them to work with all types of architectures.

This will allow the removal of several hacks with FB-DIMM and RAMBUS
memory controllers.

Also, several tests were done on different platforms using different
x86 drivers.

TODO: multi-rank DIMMs are currently represented by multiple DIMM
entries in struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks of the same DIMM.
This bug has been present since the beginning of EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be the same for all. So, don't try to fix it here yet.

I tried to make this patch as short as possible, preceding it with
several other patches that simplified the logic here. Yet, as the
internal API changes, all drivers need changes. The changes are
generally bigger in the drivers for FB-DIMMs.

Cc: Aristeu Rozanski aroza...@redhat.com
Cc: Doug Thompson nor...@yahoo.com
Cc: Borislav Petkov borislav.pet...@amd.com
Cc: Mark Gross mark.gr...@intel.com
Cc: Jason Uhlenkott juhle...@akamai.com
Cc: Tim Small t...@buttersideup.com
Cc: Ranganathan Desikan r...@jetztechnologies.com
Cc: Arvind R. arvin...@gmail.com
Cc: Olof Johansson o...@lixom.net
Cc: Egor Martovetsky e...@pasemi.com
Cc: Chris Metcalf cmetc...@tilera.com
Cc: Michal Marek 

Re: [PATCH 1/9] cpu: Introduce clear_tasks_mm_cpumask() helper

2012-05-04 Thread Anton Vorontsov
On Thu, Apr 26, 2012 at 04:59:11PM -0700, Andrew Morton wrote:
[...]
   so it's not like new tasks will ever get this cpu set in
  +* their mm mask. -- Peter Zijlstra
  +* Thus, we may use rcu_read_lock() here, instead of grabbing
  +* full-fledged tasklist_lock.
  +*/
  +   rcu_read_lock();
  +   for_each_process(p) {
  +   struct task_struct *t;
  +
  +   t = find_lock_task_mm(p);
  +   if (!t)
  +   continue;
   +   cpumask_clear_cpu(cpu, mm_cpumask(t->mm));
  +   task_unlock(t);
  +   }
  +   rcu_read_unlock();
  +}
 
 It is good that this code exists under CONFIG_HOTPLUG_CPU.  Did you
 check that everything works correctly with CONFIG_HOTPLUG_CPU=n?

Yeah, only the code under CONFIG_HOTPLUG_CPU calls the function, so
it should be all fine.
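
A hedged sketch of that (the caller below is hypothetical; the point is just that any caller lives under CONFIG_HOTPLUG_CPU and runs after the CPU has gone offline, which is also what the new WARN_ON checks):

	#ifdef CONFIG_HOTPLUG_CPU
	static void example_platform_cpu_die(unsigned int cpu)
	{
		/* The CPU is already offline here, so the relaxed locking in
		 * clear_tasks_mm_cpumask() is safe and the WARN_ON stays quiet. */
		clear_tasks_mm_cpumask(cpu);

		/* ... arch/platform specific teardown continues ... */
	}
	#endif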

Thanks!

-- 
Anton Vorontsov
Email: cbouatmai...@gmail.com


[PATCH] cpu: Document clear_tasks_mm_cpumask()

2012-05-04 Thread Anton Vorontsov
This patch adds more comments on clear_tasks_mm_cpumask, plus adds
a runtime check: the function is only suitable for offlined CPUs,
and if called inappropriately, the kernel should scream aloud.

Suggested-by: Andrew Morton a...@linux-foundation.org
Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
Signed-off-by: Anton Vorontsov anton.voront...@linaro.org
---

On Tue, May 01, 2012 at 12:45:33PM +0200, Peter Zijlstra wrote:
 On Thu, 2012-04-26 at 16:59 -0700, Andrew Morton wrote:
   +void clear_tasks_mm_cpumask(int cpu)
  
  The operation of this function was presumably obvious to you at the
  time you wrote it, but that isn't true of other people at later times.
  
  Please document it?
[...]
  If someone tries to use this function for a different purpose, or
  copies-and-modifies it for a different purpose, we just shot them in
  the foot.
  
  They'd be pretty dumb to do that without reading the local comment,
  but still...
 
 Methinks something simple like:
 
   WARN_ON(cpu_online(cpu));
 
 Ought to cure that worry, no? :-)

Yeah, this is all good ideas, thanks.

How about the following patch?

 kernel/cpu.c |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index ecdf499..1bfa26f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -13,6 +13,7 @@
 #include <linux/oom.h>
 #include <linux/rcupdate.h>
 #include <linux/export.h>
+#include <linux/bug.h>
 #include <linux/kthread.h>
 #include <linux/stop_machine.h>
 #include <linux/mutex.h>
@@ -173,6 +174,18 @@ void __ref unregister_cpu_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_cpu_notifier);
 
+/**
+ * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
+ * @cpu: a CPU id
+ *
+ * This function walks up all processes, finds a valid mm struct for
+ * each one and then clears a corresponding bit in mm's cpumask. While
+ * this all sounds trivial, there are various non-obvious corner cases,
+ * which this function tries to solve in a safe manner.
+ *
+ * Also note that the function uses a somewhat relaxed locking scheme,
+ * so it may be called only for an already offlined CPU.
+ */
 void clear_tasks_mm_cpumask(int cpu)
 {
struct task_struct *p;
@@ -184,10 +197,15 @@ void clear_tasks_mm_cpumask(int cpu)
 * Thus, we may use rcu_read_lock() here, instead of grabbing
 * full-fledged tasklist_lock.
 */
+   WARN_ON(cpu_online(cpu));
rcu_read_lock();
for_each_process(p) {
struct task_struct *t;
 
+   /*
+* Main thread might exit, but other threads may still have
+* a valid mm. Find one.
+*/
t = find_lock_task_mm(p);
if (!t)
continue;
-- 
1.7.9.2
