[PATCH v3 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-08-08 Thread John Allen
When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This can be
especially problematic when PRRN events are continuously queued, in
which case rtas polling is blocked indefinitely.

This patch removes the blocking call to flush_work and allows the
default workqueue behavior to handle duplicate events.
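
For reference, the default behavior being relied on can be sketched with a
minimal, hypothetical module (illustration only, not part of the patch):
schedule_work() returns false and does nothing if the work item is already
pending, so at most one PRRN update can wait on the queue, and the work
function sees the last scope written before it runs.

#include <linux/module.h>
#include <linux/workqueue.h>

static u32 demo_scope; /* stands in for prrn_update_scope */

static void demo_work_fn(struct work_struct *work)
{
	/* Sees the most recent demo_scope written before it runs. */
	pr_info("demo update, scope=%u\n", demo_scope);
}
static DECLARE_WORK(demo_work, demo_work_fn);

static int __init demo_init(void)
{
	demo_scope = 1;
	schedule_work(&demo_work);	/* not yet queued: returns true */
	demo_scope = 2;
	/* If demo_work has not started yet, this is a no-op (returns
	 * false) and the single queued run will print scope=2.
	 */
	schedule_work(&demo_work);
	return 0;
}
module_init(demo_init);

static void __exit demo_exit(void)
{
	flush_work(&demo_work);
}
module_exit(demo_exit);

MODULE_LICENSE("GPL");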

Signed-off-by: John Allen 
---
v3:
  -Scrap the mutex as it only replicates existing workqueue behavior.
v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.
---
 arch/powerpc/kernel/rtasd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..2017934e5985 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -290,7 +290,6 @@ static DECLARE_WORK(prrn_work, prrn_work_fn);
 
 static void prrn_schedule_update(u32 scope)
 {
-   flush_work(&prrn_work);
prrn_update_scope = scope;
schedule_work(&prrn_work);
 }
-- 
2.17.1



[PATCH v3 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling

2018-08-08 Thread John Allen
While handling PRRN events, the time to handle the actual hotplug events
dwarfs the time it takes to perform the device tree updates and queue the
hotplug events. When PRRN events are queued continuously, hotplug events
have been observed to be queued faster than the kernel can actually handle
them. This patch avoids the problem by waiting for each hotplug request to
complete before queueing more hotplug events.

Signed-off-by: John Allen 
---
 arch/powerpc/platforms/pseries/mobility.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a249c7..49930848fa78 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 static void prrn_update_node(__be32 phandle)
 {
struct pseries_hp_errorlog *hp_elog;
+   struct completion hotplug_done;
struct device_node *dn;
 
/*
@@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
hp_elog->_drc_u.drc_index = phandle;
 
-   queue_hotplug_event(hp_elog, NULL, NULL);
+   init_completion(&hotplug_done);
+   queue_hotplug_event(hp_elog, &hotplug_done, NULL);
+   wait_for_completion(&hotplug_done);
 
kfree(hp_elog);
 }
-- 
2.17.1



[PATCH v3 0/2] powerpc/pseries: Improve serialization of PRRN events

2018-08-08 Thread John Allen
Stress testing has uncovered issues with handling continuously queued PRRN
events. Running PRRN events in this way can seriously load the system given
the sheer volume of dlpar being handled. This patchset ensures that PRRN
events are handled more synchronously, only allowing the PRRN handler to
queue a single dlpar event at any given time.  Additionally, it ensures
that rtas polling continues normally when multiple PRRN events are queued
simultaneously.

v3:
  -Scrap the PRRN mutex as it only replicates existing workqueue behavior.
v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.

John Allen (2):
  powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN
events
  powerpc/pseries: Wait for completion of hotplug events during PRRN
handling

 arch/powerpc/kernel/rtasd.c               |  1 -
 arch/powerpc/platforms/pseries/mobility.c |  5 ++++-
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
2.17.1



Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling

2018-08-07 Thread John Allen

On Wed, Aug 01, 2018 at 11:16:22PM +1000, Michael Ellerman wrote:

John Allen  writes:


On Mon, Jul 23, 2018 at 11:41:24PM +1000, Michael Ellerman wrote:

John Allen  writes:


While handling PRRN events, the time to handle the actual hotplug events
dwarfs the time it takes to perform the device tree updates and queue the
hotplug events. When PRRN events are queued continuously, hotplug events
have been observed to be queued faster than the kernel can actually handle
them. This patch avoids the problem by waiting for each hotplug request to
complete before queueing more hotplug events.


Have you tested this patch in isolation, ie. not with patch 1?


While I was away on vacation, I believe a build was tested with just
this patch and not the first, and it has been running with no problems.
However, I think they've had trouble recreating the problem in general,
so it may just be that the environment is not set up properly to
reproduce the issue.





So do we need the hotplug work queue at all? Can we just call
handle_dlpar_errorlog() directly?

Or are we using the work queue to serialise things? And if so would a
mutex be better?


Right, the workqueue is meant to serialize all hotplug events and it
gets used for more than just PRRN events. I believe the motivation for
using the workqueue over a mutex is that KVM guests initiate hotplug
events through the hotplug interrupt and can queue fairly large requests
meaning that in this scenario, waiting for a lock would block interrupts
for a while.


OK, but that just means that path needs to schedule work to run later.


Using the workqueue allows us to serialize hotplug events
from different sources in the same way without worrying about the
context in which the event is generated.


A lock would be so much simpler.

It looks like we have three callers of queue_hotplug_event(), the dlpar
code, the mobility code and the ras interrupt.

The dlpar code already waits synchronously:

 init_completion(&hotplug_done);
 queue_hotplug_event(hp_elog, &hotplug_done, &rc);
 wait_for_completion(&hotplug_done);

You're changing mobility to do the same (this patch), leaving only the
ras interrupt that actually queues work and returns.


So it really seems like a mutex would do the trick, and the ras
interrupt would be the only case that needs to schedule work for later.
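
Michael's suggestion, sketched very roughly (hypothetical code, not a
posted patch): process-context callers would serialize on a mutex and run
the request directly, and only the ras interrupt, which cannot sleep,
would still defer to a work item:

static DEFINE_MUTEX(dlpar_mutex);	/* hypothetical */

/* Process-context callers (dlpar sysfs, mobility) block here directly. */
static int dlpar_handle_sync(struct pseries_hp_errorlog *hp_elog)
{
	int rc;

	mutex_lock(&dlpar_mutex);
	rc = handle_dlpar_errorlog(hp_elog);
	mutex_unlock(&dlpar_mutex);
	return rc;
}

/* The ras interrupt cannot take the mutex, so it alone still
 * schedules work and lets the work function serialize.
 */
static void ras_hp_work_fn(struct work_struct *work)
{
	struct pseries_hp_work *hp_work =
		container_of(work, struct pseries_hp_work, work);

	dlpar_handle_sync(hp_work->errlog);
	kfree(hp_work->errlog);
	kfree(work);
}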


I think you may be right, but I would need some feedback from Nathan 
Fontenot before I redesign the queue. He's been thinking about that 
design for longer than I have and may know something that I don't 
regarding the reason we're using a workqueue rather than a mutex.


Given that the bug this is meant to address is pretty high priority, 
would you consider the wait_for_completion an acceptable stopgap while a 
more substantial redesign of this code is discussed?


-John



Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-08-06 Thread John Allen

On Wed, Aug 01, 2018 at 11:02:59PM +1000, Michael Ellerman wrote:

Hi John,

I'm still not sure about this one.

John Allen  writes:

On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:

Hi John,

I'm a bit puzzled by this one.

John Allen  writes:

When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This can be
especially problematic when PRRN events are continuously queued, in
which case rtas polling is blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while a PRRN event is being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen 
---
v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.
---
 arch/powerpc/kernel/rtasd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..845fc5aec178 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
 */
pseries_devicetree_update(-prrn_update_scope);
numa_update_cpu_topology(false);
+   mutex_unlock(&prrn_lock);
 }

 static DECLARE_WORK(prrn_work, prrn_work_fn);

 static void prrn_schedule_update(u32 scope)
 {
-   flush_work(&prrn_work);


This seems like it's actually the core of the change. Previously we were
basically blocking on the flush before continuing.


The idea here is to replace the blocking flush_work with a non-blocking
mutex. So rather than waiting on the running PRRN event to complete, we
bail out since a PRRN event is already running.


OK, but why is it OK to bail out?

The firmware sent you an error log asking you to do something, with a
scope value that has some meaning, and now you're just going to drop
that on the floor?

Maybe it is OK to just drop these events? Or maybe you're saying that
because the system is crashing under the load of too many events it's OK
to drop the events in this case.


I think I see your point. If a PRRN event comes in while another is
currently running, the new one may contain a different list of LMBs/CPUs
and the old list becomes outdated. With the mutex, the only event that
gets handled is the oldest, and we will lose any additional changes
beyond the initial event. Therefore, as you mentioned in your previous
message, the behavior of the global workqueue should work just fine once
we remove the call to flush_work. While a PRRN event is running, only
one more will remain on the workqueue; then, when the first one completes,
the newly scheduled work function should grab the latest PRRN list.


I will send a new version of the patch with just the call to flush_work 
removed.


-John




The situation this is
meant to address is flooding the workqueue with PRRN events, which, like
the situation in patch 2/2, can be queued up faster than they can
actually be handled.


I'm not really sure why this is a problem though.

The current code synchronously processes the events, so there should
only ever be one in flight.

I guess the issue is that each one can queue multiple events on the
hotplug work queue?

But still, we have terabytes of RAM, we should be able to queue a lot
of events before it becomes a problem.

So what exactly is getting flooded, what's the symptom?

If the queuing of the hotplug events is the problem, then why don't we
stop doing that? We could just process them synchronously from the PRRN
update, that would naturally throttle them.

cheers





Re: Infinite looping observed in __offline_pages

2018-07-27 Thread John Allen

On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:

On Wed 25-07-18 13:11:15, John Allen wrote:
[...]

Does a failure in do_migrate_range indicate that the range is unmigratable
and the loop in __offline_pages should terminate and goto failed_removal? Or
should we allow a certain number of retries before we give up on migrating
the range?


Unfortunately not. Migration code doesn't tell the difference between
ephemeral and permanent failures. We are relying on
start_isolate_page_range to tell us this. So the question is, what kind
of page is not migratable and for what reason.

Are you able to add some debugging to give us more information? The
current debugging code in the hotplug/migration sucks...


After reproducing the problem a couple times, it seems that it can occur 
for different types of pages. Running page-types on the offending page 
over two separate instances produced the following:


# tools/vm/page-types -a 307968-308224
             flags  page-count  MB  symbolic-flags  long-symbolic-flags
            0x0400           1   0  __B             buddy
             total           1   0

And the following on a separate run:

# tools/vm/page-types -a 313088-313344
             flags  page-count  MB  symbolic-flags  long-symbolic-flags
            0x006c           1   0  __RU_lA         referenced,uptodate,lru,active
             total           1   0

The source of the failure in migrate_pages actually doesn't seem to be 
that we're hitting the case of the permanent failure, but instead the 
-EAGAIN case. I traced the EAGAIN return back to 
migrate_page_move_mapping which I've seen return EAGAIN in two places:


mm/migrate.c:453
if (!mapping) {
/* Anonymous page without mapping */
if (page_count(page) != expected_count)
   return -EAGAIN;

mm/migrate.c:476
if (page_count(page) != expected_count ||
   radix_tree_deref_slot_protected(pslot,
   &mapping->i_pages.xa_lock) != page) {
   xa_unlock_irq(&mapping->i_pages);
   return -EAGAIN;
}

So it seems in each case, the actual reference count for the page is not 
what it is expected to be.
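
A hypothetical instrumentation sketch along the lines Michal asked for,
to see how far off the count is when this triggers (illustration only,
not a posted patch):

/* In migrate_page_move_mapping(), at either -EAGAIN site: */
if (page_count(page) != expected_count) {
	pr_warn_ratelimited("migrate: pfn %lx count=%d expected=%d mapcount=%d\n",
			    page_to_pfn(page), page_count(page),
			    expected_count, page_mapcount(page));
	return -EAGAIN;
}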




Infinite looping observed in __offline_pages

2018-07-25 Thread John Allen

Hi All,

Under heavy stress and constant memory hot add/remove, I have observed
the following loop occasionally looping infinitely:


mm/memory_hotplug.c:__offline_pages

repeat:
   /* start memory hot removal */
   ret = -EINTR;
   if (signal_pending(current))
   goto failed_removal;

   cond_resched();
   lru_add_drain_all();
   drain_all_pages(zone);

   pfn = scan_movable_pages(start_pfn, end_pfn);
   if (pfn) { /* We have movable pages */
   ret = do_migrate_range(pfn, end_pfn);
   goto repeat;
   }

What appears to be happening in this case is that do_migrate_range
returns a failure code that is being ignored. The failure stems
from migrate_pages returning "1", which I'm guessing is the result of us
hitting the following case:


mm/migrate.c: migrate_pages

default:
/*
 * Permanent failure (-EBUSY, -ENOSYS, etc.):
 * unlike -EAGAIN case, the failed page is
 * removed from migration page list and not
 * retried in the next outer loop.
 */
nr_failed++;
break;
}

Does a failure in do_migrate_range indicate that the range is
unmigratable and the loop in __offline_pages should terminate and goto
failed_removal? Or should we allow a certain number of retries before we
give up on migrating the range? (One possible shape of a bounded retry
is sketched below.)

This issue was observed on a ppc64le lpar on a 4.18-rc6 kernel.
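
For concreteness, a minimal sketch of the bounded-retry option
(hypothetical, against the 4.18 __offline_pages loop quoted above; the
bound and its value are made up):

#define OFFLINE_MIGRATE_RETRIES	5	/* hypothetical bound */

	int migrate_tries = 0;

repeat:
	ret = -EINTR;
	if (signal_pending(current))
		goto failed_removal;

	cond_resched();
	lru_add_drain_all();
	drain_all_pages(zone);

	pfn = scan_movable_pages(start_pfn, end_pfn);
	if (pfn) { /* We have movable pages */
		ret = do_migrate_range(pfn, end_pfn);
		/* Give up instead of looping forever on a stuck range. */
		if (ret && ++migrate_tries > OFFLINE_MIGRATE_RETRIES)
			goto failed_removal;
		goto repeat;
	}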

-John



Re: [PATCH v07 2/9] hotplug/cpu: Add operation queuing function

2018-07-23 Thread John Allen

On Fri, Jul 13, 2018 at 03:18:01PM -0500, Michael Bringmann wrote:

migration/dlpar: This patch adds function dlpar_queue_action()
which will queue up information about a CPU/Memory 'readd'
operation according to resource type, action code, and DRC index.
At a subsequent point, the list of operations can be run/played
in series.  Examples of such operations include 'readd' of CPU
and Memory blocks identified as having changed their associativity
during an LPAR migration event.

Signed-off-by: Michael Bringmann 
---
Changes in patch:
 -- Correct drc_index before adding to pseries_hp_errorlog struct
 -- Correct text of notice
 -- Revise queuing model to save up all of the DLPAR actions for
later execution.
 -- Restore list init statement missing from patch
 -- Move call to apply queued operations into 'mobility.c'
 -- Compress some code
 -- Rename some of queueing function APIs
 -- Revise implementation to push execution of queued operations
to a workqueue task.
 -- Cleanup reference to outdated queuing operation.
---
arch/powerpc/include/asm/rtas.h   |2 +
arch/powerpc/platforms/pseries/dlpar.c|   61 +
arch/powerpc/platforms/pseries/mobility.c |4 ++
arch/powerpc/platforms/pseries/pseries.h  |2 +
4 files changed, 69 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 71e393c..4f601c7 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -310,12 +310,14 @@ struct pseries_hp_errorlog {
struct { __be32 count, index; } ic;
char drc_name[1];
} _drc_u;
+   struct list_head list;
};

#define PSERIES_HP_ELOG_RESOURCE_CPU    1
#define PSERIES_HP_ELOG_RESOURCE_MEM    2
#define PSERIES_HP_ELOG_RESOURCE_SLOT   3
#define PSERIES_HP_ELOG_RESOURCE_PHB    4
+#define PSERIES_HP_ELOG_RESOURCE_PMT   5

#define PSERIES_HP_ELOG_ACTION_ADD  1
#define PSERIES_HP_ELOG_ACTION_REMOVE   2
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index a0b20c0..7264b8e 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -25,6 +25,7 @@
#include 
#include 
#include 
+#include 
#include 

static struct workqueue_struct *pseries_hp_wq;
@@ -329,6 +330,8 @@ int dlpar_release_drc(u32 drc_index)
return 0;
}

+static int dlpar_pmt(struct pseries_hp_errorlog *work);
+
static int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
{
int rc;
@@ -357,6 +360,9 @@ static int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
case PSERIES_HP_ELOG_RESOURCE_CPU:
rc = dlpar_cpu(hp_elog);
break;
+   case PSERIES_HP_ELOG_RESOURCE_PMT:
+   rc = dlpar_pmt(hp_elog);
+   break;
default:
pr_warn_ratelimited("Invalid resource (%d) specified\n",
hp_elog->resource);
@@ -407,6 +413,61 @@ void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog,
}
}

+LIST_HEAD(dlpar_delayed_list);
+
+int dlpar_queue_action(int resource, int action, u32 drc_index)
+{
+   struct pseries_hp_errorlog *hp_errlog;
+
+   hp_errlog = kmalloc(sizeof(struct pseries_hp_errorlog), GFP_KERNEL);
+   if (!hp_errlog)
+   return -ENOMEM;
+
+   hp_errlog->resource = resource;
+   hp_errlog->action = action;
+   hp_errlog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
+   hp_errlog->_drc_u.drc_index = cpu_to_be32(drc_index);
+
+   list_add_tail(&hp_errlog->list, &dlpar_delayed_list);
+
+   return 0;
+}
+
+static int dlpar_pmt(struct pseries_hp_errorlog *work)
+{
+   struct list_head *pos, *q;
+
+   ssleep(15);


Why do we need to sleep for so long here?

-John


+
+   list_for_each_safe(pos, q, &dlpar_delayed_list) {
+   struct pseries_hp_errorlog *tmp;
+
+   tmp = list_entry(pos, struct pseries_hp_errorlog, list);
+   handle_dlpar_errorlog(tmp);
+
+   list_del(pos);
+   kfree(tmp);
+
+   ssleep(10);
+   }
+
+   return 0;
+}
+
+int dlpar_queued_actions_run(void)
+{
+   if (!list_empty(&dlpar_delayed_list)) {
+   struct pseries_hp_errorlog hp_errlog;
+
+   hp_errlog.resource = PSERIES_HP_ELOG_RESOURCE_PMT;
+   hp_errlog.action = 0;
+   hp_errlog.id_type = 0;
+
+   queue_hotplug_event(&hp_errlog, 0, 0);
+   }
+   return 0;
+}
+
static int dlpar_parse_resource(char **cmd, struct pseries_hp_errorlog *hp_elog)
{
char *arg;
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index f6364d9..d0d1cae 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -378,6 +378,10 @@ static ssize_t migration_store(struct class *class,
return rc;

post_mobility_fixup();
+

Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling

2018-07-23 Thread John Allen

On Mon, Jul 23, 2018 at 11:41:24PM +1000, Michael Ellerman wrote:

John Allen  writes:


While handling PRRN events, the time to handle the actual hotplug events
dwarfs the time it takes to perform the device tree updates and queue the
hotplug events. When PRRN events are queued continuously, hotplug events
have been observed to be queued faster than the kernel can actually handle
them. This patch avoids the problem by waiting for each hotplug request to
complete before queueing more hotplug events.


So do we need the hotplug work queue at all? Can we just call
handle_dlpar_errorlog() directly?

Or are we using the work queue to serialise things? And if so would a
mutex be better?


Right, the workqueue is meant to serialize all hotplug events and it 
gets used for more than just PRRN events. I believe the motivation for 
using the workqueue over a mutex is that KVM guests initiate hotplug 
events through the hotplug interrupt and can queue fairly large requests 
meaning that in this scenario, waiting for a lock would block interrupts
for a while. Using the workqueue allows us to serialize hotplug events 
from different sources in the same way without worrying about the 
context in which the event is generated.




It looks like prrn_update_node() is called via at least, prrn_work_fn()
and post_mobility_fixup().

The latter is called from migration_store(), which seems like it would
be harmless. But also from pseries_suspend_enable_irqs() which I'm less
clear on.


Yeah, that doesn't seem to make sense based on the function name. Odd 
that prrn_update_node is being called from anywhere outside of handling 
PRRN events. Perhaps if other code paths are using the function, it 
needs a more generic name.


-John



cheers


diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a249c7..49930848fa78 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 static void prrn_update_node(__be32 phandle)
 {
struct pseries_hp_errorlog *hp_elog;
+   struct completion hotplug_done;
struct device_node *dn;

/*
@@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
hp_elog->_drc_u.drc_index = phandle;

-   queue_hotplug_event(hp_elog, NULL, NULL);
+   init_completion(&hotplug_done);
+   queue_hotplug_event(hp_elog, &hotplug_done, NULL);
+   wait_for_completion(&hotplug_done);

kfree(hp_elog);
 }
--
2.17.1






Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-07-23 Thread John Allen

On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:

Hi John,

I'm a bit puzzled by this one.

John Allen  writes:

When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This can be
especially problematic when PRRN events are continuously queued, in
which case rtas polling is blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while a PRRN event is being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen 
---
v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.
---
 arch/powerpc/kernel/rtasd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..845fc5aec178 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
 */
pseries_devicetree_update(-prrn_update_scope);
numa_update_cpu_topology(false);
+   mutex_unlock(&prrn_lock);
 }

 static DECLARE_WORK(prrn_work, prrn_work_fn);

 static void prrn_schedule_update(u32 scope)
 {
-   flush_work(&prrn_work);


This seems like it's actually the core of the change. Previously we were
basically blocking on the flush before continuing.


The idea here is to replace the blocking flush_work with a non-blocking
mutex. So rather than waiting on the running PRRN event to complete, we
bail out since a PRRN event is already running. The situation this is
meant to address is flooding the workqueue with PRRN events, which, like
the situation in patch 2/2, can be queued up faster than they can
actually be handled.





-   prrn_update_scope = scope;


I don't really understand the scope. With the old code we always ran the
work function once per call; now we potentially throw away the scope
value (if the trylock fails).


So anytime we actually want to run with the scope (in the event the 
trylock succeeds), we schedule the work with the scope value set 
accordingly as seen in the code below. In the case that we actually 
don't want to run a PRRN event (if one is already running) we do throw 
away the scope and ignore the request entirely.





-   schedule_work(&prrn_work);
+   if (mutex_trylock(&prrn_lock)) {
+   prrn_update_scope = scope;
+   schedule_work(&prrn_work);
+   }


Ignoring the scope, the addition of the mutex should not actually make
any difference. If you see the doco for schedule_work() it says:

* This puts a job in the kernel-global workqueue if it was not already
* queued and leaves it in the same position on the kernel-global
* workqueue otherwise.


So the mutex basically implements that existing behaviour. But maybe the
scope is the issue? Like I said I don't really understand the scope
value.


So I guess I'm wondering if we just need to drop the flush_work() and
the rest is not required?


To sum up the above, the behavior without the mutex is not the same as
with the mutex. Without the mutex, anytime we get a PRRN event it will
get queued on the workqueue, which can get flooded if PRRN events are
queued continuously. With the mutex, only one PRRN event can be queued
for handling at once.


Hope that clears things up!

-John



cheers





[PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-07-17 Thread John Allen
When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This can be
especially problematic when PRRN events are continuously queued, in
which case rtas polling is blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while a PRRN event is being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen 
---
v2:
  -Unlock prrn_lock when PRRN operations are complete, not after handler is
   scheduled.
  -Remove call to flush_work, the previous broken method of serializing
   PRRN events.
---
 arch/powerpc/kernel/rtasd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..845fc5aec178 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -35,6 +35,8 @@
 
 static DEFINE_SPINLOCK(rtasd_log_lock);
 
+static DEFINE_MUTEX(prrn_lock);
+
 static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);
 
 static char *rtas_log_buf;
@@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
 */
pseries_devicetree_update(-prrn_update_scope);
numa_update_cpu_topology(false);
+   mutex_unlock(&prrn_lock);
 }
 
 static DECLARE_WORK(prrn_work, prrn_work_fn);
 
 static void prrn_schedule_update(u32 scope)
 {
-   flush_work(&prrn_work);
-   prrn_update_scope = scope;
-   schedule_work(&prrn_work);
+   if (mutex_trylock(&prrn_lock)) {
+   prrn_update_scope = scope;
+   schedule_work(&prrn_work);
+   }
 }
 
 static void handle_rtas_event(const struct rtas_error_log *log)
-- 
2.17.1



[PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling

2018-07-17 Thread John Allen
While handling PRRN events, the time to handle the actual hotplug events
dwarfs the time it takes to perform the device tree updates and queue the
hotplug events. When PRRN events are queued continuously, hotplug events
have been observed to be queued faster than the kernel can actually handle
them. This patch avoids the problem by waiting for each hotplug request to
complete before queueing more hotplug events.

Signed-off-by: John Allen 
---
 arch/powerpc/platforms/pseries/mobility.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a249c7..49930848fa78 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 static void prrn_update_node(__be32 phandle)
 {
struct pseries_hp_errorlog *hp_elog;
+   struct completion hotplug_done;
struct device_node *dn;
 
/*
@@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
hp_elog->_drc_u.drc_index = phandle;
 
-   queue_hotplug_event(hp_elog, NULL, NULL);
+   init_completion(&hotplug_done);
+   queue_hotplug_event(hp_elog, &hotplug_done, NULL);
+   wait_for_completion(&hotplug_done);
 
kfree(hp_elog);
 }
-- 
2.17.1



Re: [PATCH 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-07-16 Thread John Allen

On Fri, Jul 13, 2018 at 09:22:23AM -0500, John Allen wrote:

When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This can be
especially problematic when PRRN events are continuously queued, in
which case rtas polling is blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while a PRRN event is being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen 
---
arch/powerpc/kernel/rtasd.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..8a72a53d62c0 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -35,6 +35,8 @@

static DEFINE_SPINLOCK(rtasd_log_lock);

+static DEFINE_MUTEX(prrn_lock);
+
static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);

static char *rtas_log_buf;
@@ -290,9 +292,12 @@ static DECLARE_WORK(prrn_work, prrn_work_fn);

static void prrn_schedule_update(u32 scope)
{
-   flush_work(&prrn_work);
-   prrn_update_scope = scope;
-   schedule_work(&prrn_work);
+   if (mutex_trylock(&prrn_lock)) {
+   flush_work(&prrn_work);
+   prrn_update_scope = scope;
+   schedule_work(&prrn_work);
+   mutex_unlock(&prrn_lock);


This appears to be bugged. The mutex_unlock should be done elsewhere.  
Will send an updated version.


-John


+   }
}

static void handle_rtas_event(const struct rtas_error_log *log)
--
2.17.1





[PATCH 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling

2018-07-13 Thread John Allen
While handling PRRN events, the time to handle the actual hotplug events
dwarfs the time it takes to perform the device tree updates and queue the
hotplug events. When PRRN events are queued continuously, hotplug events
have been observed to be queued faster than the kernel can actually handle
them. This patch avoids the problem by waiting for each hotplug request to
complete before queueing more hotplug events.

Signed-off-by: John Allen 
---
 arch/powerpc/platforms/pseries/mobility.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a249c7..49930848fa78 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 static void prrn_update_node(__be32 phandle)
 {
struct pseries_hp_errorlog *hp_elog;
+   struct completion hotplug_done;
struct device_node *dn;
 
/*
@@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
hp_elog->_drc_u.drc_index = phandle;
 
-   queue_hotplug_event(hp_elog, NULL, NULL);
+   init_completion(&hotplug_done);
+   queue_hotplug_event(hp_elog, &hotplug_done, NULL);
+   wait_for_completion(&hotplug_done);
 
kfree(hp_elog);
 }
-- 
2.17.1



[PATCH 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-07-13 Thread John Allen
When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This can be
especially problematic when PRRN events are continuously queued, in
which case rtas polling is blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while a PRRN event is being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen 
---
 arch/powerpc/kernel/rtasd.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..8a72a53d62c0 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -35,6 +35,8 @@
 
 static DEFINE_SPINLOCK(rtasd_log_lock);
 
+static DEFINE_MUTEX(prrn_lock);
+
 static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);
 
 static char *rtas_log_buf;
@@ -290,9 +292,12 @@ static DECLARE_WORK(prrn_work, prrn_work_fn);
 
 static void prrn_schedule_update(u32 scope)
 {
-   flush_work(&prrn_work);
-   prrn_update_scope = scope;
-   schedule_work(&prrn_work);
+   if (mutex_trylock(&prrn_lock)) {
+   flush_work(&prrn_work);
+   prrn_update_scope = scope;
+   schedule_work(&prrn_work);
+   mutex_unlock(&prrn_lock);
+   }
 }
 
 static void handle_rtas_event(const struct rtas_error_log *log)
-- 
2.17.1



[PATCH 0/2] powerpc/pseries: Improve serialization of PRRN events

2018-07-13 Thread John Allen
Stress testing has uncovered issues with handling continuously queued PRRN
events. Running PRRN events in this way can seriously load the system given
the sheer volume of dlpar being handled. This patchset ensures that PRRN
events are handled more synchronously, only allowing the PRRN handler to
queue a single dlpar event at any given time.  Additionally, it ensures
that rtas polling continues normally when multiple PRRN events are queued
simultaneously.

John Allen (2):
  pseries/prrn: Avoid blocking rtas polling handling multiple PRRN
events
  pseries/prrn: Wait for completion of hotplug events during PRRN
handling

 arch/powerpc/kernel/rtasd.c               | 11 ++++++++---
 arch/powerpc/platforms/pseries/mobility.c |  5 ++++-
 2 files changed, 12 insertions(+), 4 deletions(-)

-- 
2.17.1



Re: [PATCH] powerpc/pseries: Don't attempt to acquire drc during memory hot add for assigned lmbs

2017-08-30 Thread John Allen
On 08/30/2017 09:35 AM, Nathan Fontenot wrote:
> On 08/29/2017 09:35 PM, Michael Ellerman wrote:
>> John Allen <jal...@linux.vnet.ibm.com> writes:
>>
>>> Check if an LMB is assigned before attempting to call dlpar_acquire_drc in
>>> order to avoid any unnecessary rtas calls. This substantially reduces the
>>> running time of memory hot add on lpars with large amounts of memory.
>>>
>>> Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
>>> ---
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> index ca9b2f4..95cf2ff 100644
>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> @@ -817,6 +817,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop)
>>> return -EINVAL;
>>>
>>> for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
>>> +   if (lmbs[i].flags & DRCONF_MEM_ASSIGNED)
>>> +   continue;
>>> +
>>> rc = dlpar_acquire_drc(lmbs[i].drc_index);
>>> if (rc)
>>> continue;
>>
>> This doesn't build for me, see below. What compiler are you using to
>> test this?
>>   arch/powerpc/platforms/pseries/hotplug-memory.c: In function 
>> 'dlpar_memory':
>>   arch/powerpc/platforms/pseries/hotplug-memory.c:1081:2: error: 'rc' may be 
>> used uninitialized in this function [-Werror=maybe-uninitialized]
>> return rc;
>> ^
>>
> 
> I don't see this build failure either, looks like its time to update my
> compiler too.

This also builds for me. I'm using gcc version 4.8.5

> 
>> It's a bit confusing because you didn't modify that function, but the
>> function you did modify has been inlined into there.
>>
>> Possibly the compiler is wrong and we do always initialise rc, but it's
>> not clear at all.
>>
>> And it raises a bigger question, how is this supposed to actually work?
>>
>> If we go around the loop and find that something is already assigned:
>>
>>  for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
>>  if (lmbs[i].flags & DRCONF_MEM_ASSIGNED)
>>  continue;
>> ...
>>  lmbs_added++;
>> ...
>> }
>>
>> We don't increment lmbs_added, so at the end of the loop we will see
>> that lmbs_added is not equal to lmbs_to_add, and so we remove
>> everything:
> 
> There is a loop just before this code verifies there are enough
> LMBs available to satisfy the add request. This counts the number
> of LMBs that do not have the assigned flag set. This is where we
> update lmbs_available and if there are not enough lmbs available to
> satisfy the request, we bail.
> 
> When the loop you referenced above exits we should have attempted to
> add enough LMBs to satisfy the request.
> 
>>
>>  if (lmbs_added != lmbs_to_add) {
>>  pr_err("Memory hot-add failed, removing any added LMBs\n");
>>
>>  for (i = 0; i < num_lmbs; i++) {
>>  if (!lmbs[i].reserved)
>>  continue;
>>
>>  rc = dlpar_remove_lmb(&lmbs[i]);
>>
>>
> 
> if we get into this recovery path, we only try to remove any LMBs
> that were added in the loop above. The code marks any LMB that is
> added as 'reserved' so that we only try to remove LMBs marked
> as 'reserved' in the cleanup path.
> 
>> So it seems like if we ever hit that continue you added, the whole
>> operation will fail anyway. So I'm confused.
> 
> The added code was to skip over any LMBs that are already assigned to
> the partition. As the code is written now, we walk through the entire list
> of LMBs possible. For large systems this can be tens, or hundreds of thousands
> of LMBs.
> 
> Without this check the code will try to acquire the drc for each LMB (this
> is done via rtas set-indicator calls), which will fail for LMBs that are
> already assigned to the partition.
> 
> As an example, take a large system that has 100,000 LMBs where the first
> 50,000 are assigned to the partition. When we get an add request to add
> X more LMBs we do see that there are 50,000 available to add. We then try
> to add X LMBs but start at the beginning of the LMB list when trying to
> add them. This results in 50,000 attempts to acquire the drc for LMBs
> that we already own. Instead we should just skip trying to acquire them.
> 
> Hopefully that helps clarify things.
> 
> -Nathan
> 
>>
>> cheers
>>
> 



Re: [PATCH] powerpc/pseries: Don't attempt to acquire drc during memory hot add for assigned lmbs

2017-08-24 Thread John Allen
On 08/24/2017 05:33 AM, Michael Ellerman wrote:
> John Allen <jal...@linux.vnet.ibm.com> writes:
> 
>> Check if an LMB is assigned before attempting to call dlpar_acquire_drc in
>> order to avoid any unnecessary rtas calls. This substantially reduces the
>> running time of memory hot add on lpars with large amounts of memory.
>>
>> Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
> 
> I'll add:
> 
>   Fixes: c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc 
> for memory a seperate step")
> 
> ?

Yes, thanks.

> 
> How bad is the slow down, do we need to backport to stable/distros?

On an lpar with 16 TB of memory assigned, I observed that adding 1 GB of
memory took several minutes without this fix and improved to several
seconds with this fix. Yep, this will need to be backported. Memory hotplug
performance is a hot issue for our team right now and we'll want to have
solid performance improvement to give to customers relatively soon.

> 
> cheers
> 
> 



[PATCH] powerpc/pseries: Don't attempt to acquire drc during memory hot add for assigned lmbs

2017-08-23 Thread John Allen
Check if an LMB is assigned before attempting to call dlpar_acquire_drc in
order to avoid any unnecessary rtas calls. This substantially reduces the
running time of memory hot add on lpars with large amounts of memory.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index ca9b2f4..95cf2ff 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -817,6 +817,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop)
return -EINVAL;

for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
+   if (lmbs[i].flags & DRCONF_MEM_ASSIGNED)
+   continue;
+
rc = dlpar_acquire_drc(lmbs[i].drc_index);
if (rc)
continue;



[PATCH v3 3/3] powerpc/pseries: Update affinity for memory and cpus specified in a PRRN event

2017-01-06 Thread John Allen
Extend the existing PRRN infrastructure to perform the actual affinity
updating for cpus and memory in addition to the device tree updating. For
cpus, dynamic affinity updating already appears to exist in the kernel in
the form of arch_update_cpu_topology. For memory, we must place a READD
operation on the hotplug queue for any phandle included in the PRRN event
that is determined to be an LMB.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index a26a020..8836130 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -282,6 +283,7 @@ static void prrn_work_fn(struct work_struct *work)
 * the RTAS event.
 */
pseries_devicetree_update(-prrn_update_scope);
+   arch_update_cpu_topology();
 }

 static DECLARE_WORK(prrn_work, prrn_work_fn);
@@ -434,7 +436,10 @@ static void do_event_scan(void)
}

if (error == 0) {
-   pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG, 0);
+   if (rtas_error_type((struct rtas_error_log *)logdata) !=
+   RTAS_TYPE_PRRN)
+   pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG,
+ 0);
handle_rtas_event((struct rtas_error_log *)logdata);
}

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index a560a98..9230be5 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -39,6 +39,7 @@ struct update_props_workarea {
 #define ADD_DT_NODE 0x0300

 #define MIGRATION_SCOPE (1)
+#define PRRN_SCOPE -2

 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
@@ -236,6 +237,35 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
return rc;
 }

+void pseries_prrn_update_node(__be32 phandle)
+{
+   struct pseries_hp_errorlog *hp_elog;
+   struct device_node *dn;
+
+   /*
+* If a node is found from the given phandle, the phandle does not
+* represent the drc index of an LMB and we can ignore it.
+*/
+   dn = of_find_node_by_phandle(be32_to_cpu(phandle));
+   if (dn) {
+   of_node_put(dn);
+   return;
+   }
+
+   hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
+   if (!hp_elog)
+   return;
+
+   hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_MEM;
+   hp_elog->action = PSERIES_HP_ELOG_ACTION_READD;
+   hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
+   hp_elog->_drc_u.drc_index = phandle;
+
+   queue_hotplug_event(hp_elog, NULL, NULL);
+
+   kfree(hp_elog);
+}
+
 int pseries_devicetree_update(s32 scope)
 {
char *rtas_buf;
@@ -274,6 +304,10 @@ int pseries_devicetree_update(s32 scope)
break;
case UPDATE_DT_NODE:
update_dt_node(phandle, scope);
+
+   if (scope == PRRN_SCOPE)
+   pseries_prrn_update_node(phandle);
+
break;
case ADD_DT_NODE:
drc_index = *data++;



[PATCH v3 2/3] powerpc/pseries: Introduce memory hotplug READD operation

2017-01-06 Thread John Allen
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9c23baa..076b892 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -318,6 +318,7 @@ struct pseries_hp_errorlog {

 #define PSERIES_HP_ELOG_ACTION_ADD 1
 #define PSERIES_HP_ELOG_ACTION_REMOVE  2
+#define PSERIES_HP_ELOG_ACTION_READD   3

 #define PSERIES_HP_ELOG_ID_DRC_NAME    1
 #define PSERIES_HP_ELOG_ID_DRC_INDEX   2
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0eb4b1d..06f10a8 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -560,6 +560,44 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop)
return rc;
 }

+static int dlpar_memory_readd_by_index(u32 drc_index, struct property *prop)
+{
+   struct of_drconf_cell *lmbs;
+   u32 num_lmbs, *p;
+   int lmb_found;
+   int i, rc;
+
+   pr_info("Attempting to update LMB, drc index %x\n", drc_index);
+
+   p = prop->value;
+   num_lmbs = *p++;
+   lmbs = (struct of_drconf_cell *)p;
+
+   lmb_found = 0;
+   for (i = 0; i < num_lmbs; i++) {
+   if (lmbs[i].drc_index == drc_index) {
+   lmb_found = 1;
+   rc = dlpar_remove_lmb(&lmbs[i]);
+   if (!rc) {
+   rc = dlpar_add_lmb(&lmbs[i]);
+   if (rc)
+   dlpar_release_drc(lmbs[i].drc_index);
+   }
+   break;
+   }
+   }
+
+   if (!lmb_found)
+   rc = -EINVAL;
+
+   if (rc)
+   pr_info("Failed to update memory at %llx\n",
+   lmbs[i].base_addr);
+   else
+   pr_info("Memory at %llx was updated\n", lmbs[i].base_addr);
+
+   return rc;
+}
 #else
 static inline int pseries_remove_memblock(unsigned long base,
  unsigned int memblock_size)
@@ -776,6 +814,9 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
else
rc = -EINVAL;
break;
+   case PSERIES_HP_ELOG_ACTION_READD:
+   rc = dlpar_memory_readd_by_index(drc_index, prop);
+   break;
default:
pr_err("Invalid action (%d) specified\n", hp_elog->action);
rc = -EINVAL;



[PATCH v3 1/3] powerpc/pseries: Make the acquire/release of the drc for memory a seperate step

2017-01-06 Thread John Allen
When adding and removing LMBs we should make the acquire/release of
the DRC a separate step to allow for a few improvements. First,
this will ensure that LMBs removed during a remove-by-count operation
are all available if an error occurs and we need to add them back: by
first removing all the LMBs from the kernel before releasing their
DRCs, the LMBs are available to add back should an error occur.

Also, this will allow for faster re-add operations of memory for
PRRN event handling, since we can skip the unneeded step of having
to release the DRC and then acquire it back.

Signed-off-by: Nathan Fontenot <nf...@linux.vnet.ibm.com>
Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 76ec104..0eb4b1d 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -446,9 +446,7 @@ static int dlpar_remove_lmb(struct of_drconf_cell *lmb)
/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, block_sz);

-   dlpar_release_drc(lmb->drc_index);
dlpar_remove_device_tree_lmb(lmb);
-
return 0;
 }

@@ -513,6 +511,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove,
if (!lmbs[i].reserved)
continue;

+   dlpar_release_drc(lmbs[i].drc_index);
pr_info("Memory at %llx was hot-removed\n",
lmbs[i].base_addr);

@@ -542,6 +541,9 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop)
if (lmbs[i].drc_index == drc_index) {
lmb_found = 1;
rc = dlpar_remove_lmb(&lmbs[i]);
+   if (!rc)
+   dlpar_release_drc(lmbs[i].drc_index);
+
break;
}
}
@@ -596,10 +598,6 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb)
if (lmb->flags & DRCONF_MEM_ASSIGNED)
return -EINVAL;

-   rc = dlpar_acquire_drc(lmb->drc_index);
-   if (rc)
-   return rc;
-
rc = dlpar_add_device_tree_lmb(lmb);
if (rc) {
pr_err("Couldn't update device tree for drc index %x\n",
@@ -615,12 +613,10 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb)

/* Add the memory */
rc = add_memory(nid, lmb->base_addr, block_sz);
-   if (rc) {
+   if (rc)
dlpar_remove_device_tree_lmb(lmb);
-   dlpar_release_drc(lmb->drc_index);
-   } else {
+   else
lmb->flags |= DRCONF_MEM_ASSIGNED;
-   }

return rc;
 }
@@ -652,10 +648,16 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop)
return -EINVAL;

for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
-   rc = dlpar_add_lmb(&lmbs[i]);
+   rc = dlpar_acquire_drc(lmbs[i].drc_index);
if (rc)
continue;

+   rc = dlpar_add_lmb(&lmbs[i]);
+   if (rc) {
+   dlpar_release_drc(lmbs[i].drc_index);
+   continue;
+   }
+
lmbs_added++;

/* Mark this lmb so we can remove it later if all of the
@@ -675,6 +677,8 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop)
if (rc)
pr_err("Failed to remove LMB, drc index %x\n",
   be32_to_cpu(lmbs[i].drc_index));
+   else
+   dlpar_release_drc(lmbs[i].drc_index);
}
rc = -EINVAL;
} else {
@@ -708,7 +712,13 @@ static int dlpar_memory_add_by_index(u32 drc_index, struct property *prop)
for (i = 0; i < num_lmbs; i++) {
if (lmbs[i].drc_index == drc_index) {
lmb_found = 1;
-   rc = dlpar_add_lmb(&lmbs[i]);
+   rc = dlpar_acquire_drc(lmbs[i].drc_index);
+   if (!rc) {
+   rc = dlpar_add_lmb(&lmbs[i]);
+   if (rc)
+   dlpar_release_drc(lmbs[i].drc_index);
+   }
+
break;
}
}



[PATCH v3 0/3] powerpc/pseries: Perform PRRN topology updates in kernel

2017-01-06 Thread John Allen
Formerly, when we received a PRRN rtas event, device tree updating was
performed in the kernel and the actual topology updating was performed in
userspace. This was necessary as in order to update the topology for memory,
we must perform a hot remove and a subsequent hot add and until recently,
memory hotplug was not included in the kernel. Since memory hotplug is now
available, this patchset moves the PRRN topology updating into the kernel.

Changes from v1:
-Introduce patch to separate the acquire and release drc from existing
 memory hotplug
-Create new function "dlpar_memory_readd_by_index" that consolidates the
 necessary steps of memory hot remove and hot add into a single function
-Remove conversion of phandle to BE
-Since error messages are already generated in the memory hotplug code,
 remove redundant error messages in pseries_prrn_update_node. Since we no
 longer use the return code from the hotplug event, remove the
 wait_for_completion infrastructure.

Changes from v2:
-Check if a node exists before allocating memory and call of_node_put
 if the call to of_find_node_by_phandle returns successfully.

John Allen (3):
  powerpc/pseries: Make the acquire/release of the drc for memory a 
seperate step
  powerpc/pseries: Introduce memory hotplug READD operation
  powerpc/pseries: Update affinity for memory and cpus specified in a PRRN 
event

 arch/powerpc/include/asm/rtas.h |1
 arch/powerpc/kernel/rtasd.c |7 ++
 arch/powerpc/platforms/pseries/hotplug-memory.c |   75 +++
 arch/powerpc/platforms/pseries/mobility.c   |   34 ++
 4 files changed, 104 insertions(+), 13 deletions(-)



[PATCH v2 3/3] powerpc/pseries: Update affinity for memory and cpus specified in a PRRN event

2016-12-15 Thread John Allen
Extend the existing PRRN infrastructure to perform the actual affinity
updating for cpus and memory in addition to the device tree updating. For
cpus, dynamic affinity updating already appears to exist in the kernel in
the form of arch_update_cpu_topology. For memory, we must place a READD
operation on the hotplug queue for any phandle included in the PRRN event
that is determined to be an LMB.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index a26a020..8836130 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -282,6 +283,7 @@ static void prrn_work_fn(struct work_struct *work)
 * the RTAS event.
 */
pseries_devicetree_update(-prrn_update_scope);
+   arch_update_cpu_topology();
 }

 static DECLARE_WORK(prrn_work, prrn_work_fn);
@@ -434,7 +436,10 @@ static void do_event_scan(void)
}

if (error == 0) {
-   pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG, 0);
+   if (rtas_error_type((struct rtas_error_log *)logdata) !=
+   RTAS_TYPE_PRRN)
+   pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG,
+ 0);
handle_rtas_event((struct rtas_error_log *)logdata);
}

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index a560a98..d62d48a 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -39,6 +39,7 @@ struct update_props_workarea {
 #define ADD_DT_NODE 0x0300

 #define MIGRATION_SCOPE (1)
+#define PRRN_SCOPE -2

 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
@@ -236,6 +237,33 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
return rc;
 }

+void pseries_prrn_update_node(__be32 phandle)
+{
+   struct pseries_hp_errorlog *hp_elog;
+   struct device_node *dn;
+
+   hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
+   if (!hp_elog)
+   return;
+
+   dn = of_find_node_by_phandle(be32_to_cpu(phandle));
+
+   /*
+* If the phandle was not found, assume phandle is the drc index of
+* an LMB.
+*/
+   if (!dn) {
+   hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_MEM;
+   hp_elog->action = PSERIES_HP_ELOG_ACTION_READD;
+   hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
+   hp_elog->_drc_u.drc_index = phandle;
+
+   queue_hotplug_event(hp_elog, NULL, NULL);
+   }
+
+   kfree(hp_elog);
+}
+
 int pseries_devicetree_update(s32 scope)
 {
char *rtas_buf;
@@ -274,6 +302,10 @@ int pseries_devicetree_update(s32 scope)
break;
case UPDATE_DT_NODE:
update_dt_node(phandle, scope);
+
+   if (scope == PRRN_SCOPE)
+   pseries_prrn_update_node(phandle);
+
break;
case ADD_DT_NODE:
drc_index = *data++;



[PATCH v2 2/3] powerpc/pseries: Introduce memory hotplug READD operation

2016-12-15 Thread John Allen
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9c23baa..076b892 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -318,6 +318,7 @@ struct pseries_hp_errorlog {

 #define PSERIES_HP_ELOG_ACTION_ADD 1
 #define PSERIES_HP_ELOG_ACTION_REMOVE  2
+#define PSERIES_HP_ELOG_ACTION_READD   3

 #define PSERIES_HP_ELOG_ID_DRC_NAME    1
 #define PSERIES_HP_ELOG_ID_DRC_INDEX   2
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0eb4b1d..06f10a8 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -560,6 +560,44 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop)
return rc;
 }

+static int dlpar_memory_readd_by_index(u32 drc_index, struct property *prop)
+{
+   struct of_drconf_cell *lmbs;
+   u32 num_lmbs, *p;
+   int lmb_found;
+   int i, rc;
+
+   pr_info("Attempting to update LMB, drc index %x\n", drc_index);
+
+   p = prop->value;
+   num_lmbs = *p++;
+   lmbs = (struct of_drconf_cell *)p;
+
+   lmb_found = 0;
+   for (i = 0; i < num_lmbs; i++) {
+   if (lmbs[i].drc_index == drc_index) {
+   lmb_found = 1;
+   rc = dlpar_remove_lmb(&lmbs[i]);
+   if (!rc) {
+   rc = dlpar_add_lmb(&lmbs[i]);
+   if (rc)
+   dlpar_release_drc(lmbs[i].drc_index);
+   }
+   break;
+   }
+   }
+
+   if (!lmb_found)
+   rc = -EINVAL;
+
+   if (rc)
+   pr_info("Failed to update memory at %llx\n",
+   lmbs[i].base_addr);
+   else
+   pr_info("Memory at %llx was updated\n", lmbs[i].base_addr);
+
+   return rc;
+}
 #else
 static inline int pseries_remove_memblock(unsigned long base,
  unsigned int memblock_size)
@@ -776,6 +814,9 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
else
rc = -EINVAL;
break;
+   case PSERIES_HP_ELOG_ACTION_READD:
+   rc = dlpar_memory_readd_by_index(drc_index, prop);
+   break;
default:
pr_err("Invalid action (%d) specified\n", hp_elog->action);
rc = -EINVAL;



[PATCH v2 1/3] powerpc/pseries: Make the acquire/release of the drc for memory a separate step

2016-12-15 Thread John Allen
When adding and removing LMBs we should make the acquire/release of
the DRC a separate step to allow for a few improvements. First, this
ensures that LMBs removed during a remove-by-count operation remain
available if an error occurs and we need to add them back: by removing
all the LMBs from the kernel before releasing their DRCs, the removed
LMBs can still be added back should the operation fail partway through.

Also, this allows for faster re-add operations of memory during PRRN
event handling, since we can skip the unneeded step of releasing the
DRC and then acquiring it back.
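
The resulting remove-by-count flow looks roughly like the sketch below
(illustrative only and simplified from the diff that follows; the wrapper
name is invented):

/* Sketch of the reordered flow: release DRCs only after every LMB has
 * been removed, so a partial failure can still add the LMBs back.
 */
static int example_remove_by_count(struct of_drconf_cell *lmbs,
				   u32 num_lmbs, u32 lmbs_to_remove)
{
	u32 lmbs_removed = 0;
	int i;

	for (i = 0; i < num_lmbs && lmbs_removed < lmbs_to_remove; i++) {
		/* Remove from the kernel but do NOT release the DRC yet. */
		if (!dlpar_remove_lmb(&lmbs[i])) {
			lmbs[i].reserved = 1;
			lmbs_removed++;
		}
	}

	if (lmbs_removed < lmbs_to_remove) {
		/* Error: no DRC was released, so every removed LMB can
		 * be added straight back. */
		for (i = 0; i < num_lmbs; i++)
			if (lmbs[i].reserved)
				dlpar_add_lmb(&lmbs[i]);
		return -EINVAL;
	}

	/* Success: release the DRCs as a separate, final step. */
	for (i = 0; i < num_lmbs; i++)
		if (lmbs[i].reserved)
			dlpar_release_drc(lmbs[i].drc_index);

	return 0;
}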

Signed-off-by: Nathan Fontenot <nf...@linux.vnet.ibm.com>
Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   34 +++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2617f9f..be11fc3 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -446,9 +446,7 @@ static int dlpar_remove_lmb(struct of_drconf_cell *lmb)
/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, block_sz);

-   dlpar_release_drc(lmb->drc_index);
dlpar_remove_device_tree_lmb(lmb);
-
return 0;
 }

@@ -516,6 +514,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove,
if (!lmbs[i].reserved)
continue;

+   dlpar_release_drc(lmbs[i].drc_index);
pr_info("Memory at %llx was hot-removed\n",
lmbs[i].base_addr);

@@ -545,6 +544,9 @@ static int dlpar_memory_remove_by_index(u32 drc_index, struct property *prop)
if (lmbs[i].drc_index == drc_index) {
lmb_found = 1;
rc = dlpar_remove_lmb(&lmbs[i]);
+   if (!rc)
+   dlpar_release_drc(lmbs[i].drc_index);
+
break;
}
}
@@ -599,10 +601,6 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb)
if (lmb->flags & DRCONF_MEM_ASSIGNED)
return -EINVAL;

-   rc = dlpar_acquire_drc(lmb->drc_index);
-   if (rc)
-   return rc;
-
rc = dlpar_add_device_tree_lmb(lmb);
if (rc) {
pr_err("Couldn't update device tree for drc index %x\n",
@@ -618,12 +616,10 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb)

/* Add the memory */
rc = add_memory(nid, lmb->base_addr, block_sz);
-   if (rc) {
+   if (rc)
dlpar_remove_device_tree_lmb(lmb);
-   dlpar_release_drc(lmb->drc_index);
-   } else {
+   else
lmb->flags |= DRCONF_MEM_ASSIGNED;
-   }

return rc;
 }
@@ -655,10 +651,16 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop)
return -EINVAL;

for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
-   rc = dlpar_add_lmb(&lmbs[i]);
+   rc = dlpar_acquire_drc(lmbs[i].drc_index);
if (rc)
continue;

+   rc = dlpar_add_lmb(&lmbs[i]);
+   if (rc) {
+   dlpar_release_drc(lmbs[i].drc_index);
+   continue;
+   }
+
lmbs_added++;

/* Mark this lmb so we can remove it later if all of the
@@ -678,6 +680,8 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop)
if (rc)
pr_err("Failed to remove LMB, drc index %x\n",
   be32_to_cpu(lmbs[i].drc_index));
+   else
+   dlpar_release_drc(lmbs[i].drc_index);
}
rc = -EINVAL;
} else {
@@ -711,7 +715,13 @@ static int dlpar_memory_add_by_index(u32 drc_index, struct property *prop)
for (i = 0; i < num_lmbs; i++) {
if (lmbs[i].drc_index == drc_index) {
lmb_found = 1;
-   rc = dlpar_add_lmb(&lmbs[i]);
+   rc = dlpar_acquire_drc(lmbs[i].drc_index);
+   if (!rc) {
+   rc = dlpar_add_lmb(&lmbs[i]);
+   if (rc)
+   dlpar_release_drc(lmbs[i].drc_index);
+   }
+
break;
}
}



[PATCH v2 0/3] powerpc/pseries: Perform PRRN topology updates in kernel

2016-12-15 Thread John Allen
Formerly, when we received a PRRN rtas event, device tree updating was
performed in the kernel and the actual topology updating was performed in
userspace. This was necessary because updating the topology for memory
requires a hot remove and a subsequent hot add, and until recently, memory
hotplug was not included in the kernel. Since memory hotplug is now
available, this patchset moves the PRRN topology updating into the kernel.

Changes from v1:
-Introduce patch to separate the acquire and release drc from existing
 memory hotplug
-Create new function "dlpar_memory_readd_by_index" that consolidates the
 necessary steps of memory hot remove and hot add into a single function
-Remove conversion of phandle to BE
-Since error messages are already generated in the memory hotplug code,
 remove redundant error messages in pseries_prrn_update_node. Since we no
 longer use the return code from the hotplug event, remove the
 wait_for_completion infrastructure.

John Allen (3):
  powerpc/pseries: Make the acquire/release of the drc for memory a separate step
  powerpc/pseries: Introduce memory hotplug READD operation
  powerpc/pseries: Update affinity for memory and cpus specified in a PRRN event

 arch/powerpc/include/asm/rtas.h |1
 arch/powerpc/kernel/rtasd.c |7 ++
 arch/powerpc/platforms/pseries/hotplug-memory.c |   75 +++
 arch/powerpc/platforms/pseries/mobility.c   |   32 ++
 4 files changed, 102 insertions(+), 13 deletions(-)



[PATCH 2/2] powerpc/pseries: Update affinity for memory and cpus specified in a PRRN event

2016-12-12 Thread John Allen
Extend the existing PRRN infrastructure to perform the actual affinity
updating for cpus and memory in addition to the device tree updating. For
cpus, dynamic affinity updating already appears to exist in the kernel in
the form of arch_update_cpu_topology. For memory, we must place a READD
operation on the hotplug queue for any phandle included in the PRRN event
that is determined to be an LMB.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index a26a020..8836130 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -282,6 +283,7 @@ static void prrn_work_fn(struct work_struct *work)
 * the RTAS event.
 */
pseries_devicetree_update(-prrn_update_scope);
+   arch_update_cpu_topology();
 }

 static DECLARE_WORK(prrn_work, prrn_work_fn);
@@ -434,7 +436,10 @@ static void do_event_scan(void)
}

if (error == 0) {
-   pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG, 0);
+   if (rtas_error_type((struct rtas_error_log *)logdata) !=
+   RTAS_TYPE_PRRN)
+   pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG,
+ 0);
handle_rtas_event((struct rtas_error_log *)logdata);
}

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index a560a98..e2ed2d6 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -39,6 +39,7 @@ struct update_props_workarea {
 #define ADD_DT_NODE0x0300

 #define MIGRATION_SCOPE(1)
+#define PRRN_SCOPE -2

 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
@@ -236,6 +237,49 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
return rc;
 }

+void pseries_prrn_update_node(__be32 phandle)
+{
+   struct pseries_hp_errorlog *hp_elog;
+   struct completion hotplug_done;
+   struct device_node *dn;
+   char *type;
+   int rc = 0;
+
+   hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
+   if (!hp_elog)
+   return;
+
+   dn = of_find_node_by_phandle(be32_to_cpu(phandle));
+
+   /*
+* If the phandle was not found, assume phandle is the drc index of
+* an LMB.
+*/
+   if (!dn) {
+   type = "memory";
+   hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_MEM;
+   hp_elog->action = PSERIES_HP_ELOG_ACTION_READD;
+   hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
+   hp_elog->_drc_u.drc_index = cpu_to_be32(phandle);
+
+   pr_info("Attempting to update %s at drc index %x\n", type,
+   hp_elog->_drc_u.drc_index);
+
+   init_completion(&hotplug_done);
+   queue_hotplug_event(hp_elog, &hotplug_done, &rc);
+   wait_for_completion(&hotplug_done);
+
+   if (rc)
+   pr_info("Failed to update %s at drc index %x\n", type,
+   hp_elog->_drc_u.drc_index);
+   else
+   pr_info("Updated %s at drc index %x\n", type,
+   hp_elog->_drc_u.drc_index);
+   }
+
+   kfree(hp_elog);
+}
+
 int pseries_devicetree_update(s32 scope)
 {
char *rtas_buf;
@@ -274,6 +318,10 @@ int pseries_devicetree_update(s32 scope)
break;
case UPDATE_DT_NODE:
update_dt_node(phandle, scope);
+
+   if (scope == PRRN_SCOPE)
+           pseries_prrn_update_node(phandle);
+
break;
case ADD_DT_NODE:
drc_index = *data++;



[PATCH 1/2] powerpc/pseries: Introduce memory hotplug READD operation

2016-12-12 Thread John Allen
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9c23baa..076b892 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -318,6 +318,7 @@ struct pseries_hp_errorlog {

 #define PSERIES_HP_ELOG_ACTION_ADD 1
 #define PSERIES_HP_ELOG_ACTION_REMOVE  2
+#define PSERIES_HP_ELOG_ACTION_READD   3

 #define PSERIES_HP_ELOG_ID_DRC_NAME1
 #define PSERIES_HP_ELOG_ID_DRC_INDEX   2
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 76ec104..291d49b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -766,6 +766,11 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
else
rc = -EINVAL;
break;
+   case PSERIES_HP_ELOG_ACTION_READD:
+   rc = dlpar_memory_remove_by_index(drc_index, prop);
+   if (!rc)
+   dlpar_memory_add_by_index(drc_index, prop);
+   break;
default:
pr_err("Invalid action (%d) specified\n", hp_elog->action);
rc = -EINVAL;



[PATCH 0/2] powerpc/pseries: Perform PRRN topology updates in kernel

2016-12-12 Thread John Allen
Formerly, when we received a PRRN rtas event, device tree updating was
performed in the kernel and the actual topology updating was performed in
userspace. This was necessary because updating the topology for memory
requires a hot remove and a subsequent hot add, and until recently, memory
hotplug was not included in the kernel. Since memory hotplug is now
available, this patchset moves the PRRN topology updating into the kernel.

John Allen (2):
  powerpc/pseries: Introduce memory hotplug READD operation
  powerpc/pseries: Update affinity for memory and cpus specified in a PRRN event

 arch/powerpc/include/asm/rtas.h |1
 arch/powerpc/kernel/rtasd.c |7 +++
 arch/powerpc/platforms/pseries/hotplug-memory.c |5 ++
 arch/powerpc/platforms/pseries/mobility.c   |   48 +++
 4 files changed, 60 insertions(+), 1 deletion(-)



Re: [PATCH v6 2/3] powerpc/pseries: Implement indexed-count hotplug memory add

2016-10-27 Thread John Allen
On 10/18/2016 12:20 PM, Nathan Fontenot wrote:
> From: Sahil Mehta <sahilmeht...@gmail.com>
> 
> Indexed-count add for memory hotplug guarantees that a contiguous block
> of <count> lmbs beginning at a specified <drc index> will be assigned (NOT
> that <count> lmbs will be added). Because of Qemu's per-DIMM memory
> management, the addition of a contiguous block of memory currently
> requires a series of individual calls. Indexed-count add reduces
> this series into a single call.
> 
> Signed-off-by: Sahil Mehta <sahilmeht...@gmail.com>
> Signed-off-by: Nathan Fontenot <nf...@linux.vnet.ibm.com>
> ---

Reviewed-by: John Allen <jal...@linux.vnet.ibm.com>



Re: [PATCH v6 3/3] powerpc/pseries: Implement indexed-count hotplug memory remove

2016-10-27 Thread John Allen
On 10/18/2016 12:21 PM, Nathan Fontenot wrote:
> From: Sahil Mehta <sahilmeht...@gmail.com>
> 
> Indexed-count remove for memory hotplug guarantees that a contiguous block
> of <count> lmbs beginning at a specified <drc index> will be unassigned (NOT
> that <count> lmbs will be removed). Because of Qemu's per-DIMM memory
> management, the removal of a contiguous block of memory currently
> requires a series of individual calls. Indexed-count remove reduces
> this series into a single call.
> 
> Signed-off-by: Sahil Mehta <sahilmeht...@gmail.com>
> Signed-off-by: Nathan Fontenot <nf...@linux.vnet.ibm.com>
> ---

Reviewed-by: John Allen <jal...@linux.vnet.ibm.com>



Re: [PATCH v6 1/3] powerpc/pseries: Correct possible read beyond dlpar sysfs buffer

2016-10-27 Thread John Allen
On 10/18/2016 12:20 PM, Nathan Fontenot wrote:
> The parsing of data written to the dlpar file in sysfs does not correctly
> account for the possibility of reading past the end of the buffer. Correct
> this by updating the buffer parsing code to make a local copy and use the
> strsep() and sysfs_streq() routines to parse the buffer. This also
> separates the parsing code into subroutines to make it cleaner.
> 
> Signed-off-by: Nathan Fontenot <nf...@linux.vnet.ibm.com>
> ---

Reviewed-by: John Allen <jal...@linux.vnet.ibm.com>
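
For reference, the strsep()/sysfs_streq() pattern described in the
changelog looks roughly like the sketch below (illustrative only, not the
actual patch; the function name and the "memory" token are invented
stand-ins):

/* Sketch: parse a local copy so we can never read past the sysfs
 * buffer, then match tokens with sysfs_streq().
 */
static int example_parse(const char *buf)
{
	char *args, *next, *arg;
	int rc = -EINVAL;

	args = next = kstrdup(buf, GFP_KERNEL);
	if (!args)
		return -ENOMEM;

	arg = strsep(&next, " ");
	if (arg && sysfs_streq(arg, "memory"))
		rc = 0;		/* continue with the remaining tokens */

	kfree(args);		/* free the original pointer, not 'next' */
	return rc;
}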



[PATCH 3/3] powerpc/pseries: Use kernel hotplug queue for PowerVM hotplug events

2016-07-07 Thread John Allen
The sysfs interface used to handle PowerVM hotplug events should use the
hotplug queue as well. PRRN events will soon be placing many hotplug
events on the queue at once, and ordinary hotplug events will need to go
through the queue too in order to ensure they are still handled and that
proper serialization is maintained during a PRRN event.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 66a77d7..4748124 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -489,7 +489,9 @@ static ssize_t dlpar_store(struct class *class, struct class_attribute *attr,
goto dlpar_store_out;
}

-   rc = handle_dlpar_errorlog(hp_elog);
+   init_completion(&hotplug_done);
+   queue_hotplug_event(hp_elog, &hotplug_done, &rc);
+   wait_for_completion(&hotplug_done);

 dlpar_store_out:
kfree(hp_elog);


[PATCH 2/3] powerpc/pseries: Add support for hotplug interrupt source

2016-07-07 Thread John Allen
Add handler for new hotplug interrupt. For memory and CPU hotplug events,
we will add the hotplug errorlog to the hotplug workqueue. Since PCI
hotplug is not currently supported in the kernel, PCI hotplug events are
written to the rtas_log_buf and are handled by rtas_errd.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/pseries.h |2 ++
 arch/powerpc/platforms/pseries/ras.c |   38 ++
 2 files changed, 40 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index ddb9aa5..bba3285 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -56,6 +56,8 @@ extern int dlpar_detach_node(struct device_node *);
 extern int dlpar_acquire_drc(u32 drc_index);
 extern int dlpar_release_drc(u32 drc_index);

+void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog,
+struct completion *hotplug_done, int *rc);
 #ifdef CONFIG_MEMORY_HOTPLUG
 int dlpar_memory(struct pseries_hp_errorlog *hp_elog);
 #else
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 9a3e27b..d11e8ca 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -43,6 +43,7 @@ static int ras_check_exception_token;
 /* EPOW events counter variable */
 static int num_epow_events;

+static irqreturn_t ras_hotplug_interrupt(int irq, void *dev_id);
 static irqreturn_t ras_epow_interrupt(int irq, void *dev_id);
 static irqreturn_t ras_error_interrupt(int irq, void *dev_id);

@@ -65,6 +66,14 @@ static int __init init_ras_IRQ(void)
of_node_put(np);
}

+   /* Hotplug Events */
+   np = of_find_node_by_path("/event-sources/hot-plug-events");
+   if (np != NULL) {
+   request_event_sources_irqs(np, ras_hotplug_interrupt,
+  "RAS_HOTPLUG");
+   of_node_put(np);
+   }
+
/* EPOW Events */
np = of_find_node_by_path("/event-sources/epow-events");
if (np != NULL) {
@@ -190,6 +199,35 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
num_epow_events++;
 }

+static irqreturn_t ras_hotplug_interrupt(int irq, void *dev_id)
+{
+   struct pseries_errorlog *pseries_log;
+   struct pseries_hp_errorlog *hp_elog;
+
+   spin_lock(&ras_log_buf_lock);
+
+   rtas_call(ras_check_exception_token, 6, 1, NULL,
+ RTAS_VECTOR_EXTERNAL_INTERRUPT, virq_to_hw(irq),
+ RTAS_HOTPLUG_EVENTS, 0, __pa(&ras_log_buf),
+ rtas_get_error_log_max());
+
+   pseries_log = get_pseries_errorlog((struct rtas_error_log *)ras_log_buf,
+  PSERIES_ELOG_SECT_ID_HOTPLUG);
+   hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
+
+   /* Since PCI hotplug is not currently supported on pseries, put
+* PCI hotplug events on the ras_log_buf to be handled by rtas_errd
+*/
+   if (hp_elog->resource == PSERIES_HP_ELOG_RESOURCE_MEM ||
+   hp_elog->resource == PSERIES_HP_ELOG_RESOURCE_CPU)
+   queue_hotplug_event(hp_elog, NULL, NULL);
+   else
+   log_error(ras_log_buf, ERR_TYPE_RTAS_LOG, 0);
+
+   spin_unlock(&ras_log_buf_lock);
+   return IRQ_HANDLED;
+}
+
 /* Handle environmental and power warning (EPOW) interrupts. */
 static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
 {


[PATCH 1/3] powerpc/pseries: Add pseries hotplug workqueue

2016-07-07 Thread John Allen
In support of PAPR changes to add a new hotplug interrupt, introduce a
hotplug workqueue to avoid processing hotplug events in interrupt context.
We will also take advantage of the queue on PowerVM to ensure hotplug
events initiated from different sources (HMC and PRRN events) are handled
and serialized properly.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c |   52 
 1 file changed, 52 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 2b93ae8..66a77d7 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -27,6 +27,15 @@
 #include 
 #include 

+struct workqueue_struct *pseries_hp_wq;
+
+struct pseries_hp_work {
+   struct work_struct work;
+   struct pseries_hp_errorlog *errlog;
+   struct completion *hp_completion;
+   int *rc;
+};
+
 struct cc_workarea {
__be32  drc_index;
__be32  zero;
@@ -368,10 +377,51 @@ static int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog)
return rc;
 }

+void pseries_hp_work_fn(struct work_struct *work)
+{
+   struct pseries_hp_work *hp_work =
+   container_of(work, struct pseries_hp_work, work);
+
+   if (hp_work->rc)
+   *(hp_work->rc) = handle_dlpar_errorlog(hp_work->errlog);
+   else
+   handle_dlpar_errorlog(hp_work->errlog);
+
+   if (hp_work->hp_completion)
+   complete(hp_work->hp_completion);
+
+   kfree(hp_work->errlog);
+   kfree((void *)work);
+}
+
+void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog,
+struct completion *hotplug_done, int *rc)
+{
+   struct pseries_hp_work *work;
+   struct pseries_hp_errorlog *hp_errlog_copy;
+
+   hp_errlog_copy = kmalloc(sizeof(struct pseries_hp_errorlog),
+GFP_KERNEL);
+   memcpy(hp_errlog_copy, hp_errlog, sizeof(struct pseries_hp_errorlog));
+
+   work = kmalloc(sizeof(struct pseries_hp_work), GFP_KERNEL);
+   if (work) {
+   INIT_WORK((struct work_struct *)work, pseries_hp_work_fn);
+   work->errlog = hp_errlog_copy;
+   work->hp_completion = hotplug_done;
+   work->rc = rc;
+   queue_work(pseries_hp_wq, (struct work_struct *)work);
+   } else {
+   *rc = -ENOMEM;
+   complete(hotplug_done);
+   }
+}
+
 static ssize_t dlpar_store(struct class *class, struct class_attribute *attr,
   const char *buf, size_t count)
 {
struct pseries_hp_errorlog *hp_elog;
+   struct completion hotplug_done;
const char *arg;
int rc;

@@ -450,6 +500,8 @@ static CLASS_ATTR(dlpar, S_IWUSR, NULL, dlpar_store);

 static int __init pseries_dlpar_init(void)
 {
+   pseries_hp_wq = alloc_workqueue("pseries hotplug workqueue",
+   WQ_UNBOUND, 1);
return sysfs_create_file(kernel_kobj, _attr_dlpar.attr);
 }
 machine_device_initcall(pseries, pseries_dlpar_init);


[PATCH 0/3] powerpc/pseries: Add pseries hotplug queue and PowerVM/PowerKVM use cases

2016-07-07 Thread John Allen
This series introduces a new workqueue for handling hotplug events. On
PowerKVM guests, this provides a work deferral mechanism to avoid
processing hotplug events in interrupt context. On PowerVM lpars, this
will be used to ensure that hotplug events initiated from the HMC and
from PRRN events are handled and serialized properly.

---

John Allen (3):
  powerpc/pseries: Add pseries hotplug workqueue
  powerpc/pseries: Add support for hotplug interrupt source
  powerpc/pseries: Use kernel hotplug queue for PowerVM hotplug events


 arch/powerpc/platforms/pseries/dlpar.c   |   57 +-
 arch/powerpc/platforms/pseries/pseries.h |2 +
 arch/powerpc/platforms/pseries/ras.c |   38 
 3 files changed, 96 insertions(+), 1 deletion(-)

--
John Allen


Re: [PATCH 0/3] powerpc/pseries: Add pseries hotplug queue and PowerVM/PowerKVM use cases

2016-07-06 Thread John Allen
I forgot to run this series through checkpatch before sending. Looks like
there are various whitespace errors. Will fix these and send with v2 after
any comments on the patches.

On 07/06/2016 10:44 AM, John Allen wrote:
> This series introduces a new workqueue for handling hotplug events. On
> PowerKVM guests, this provides a work deferral mechanism to avoid
> processing hotplug events in interrupt context. On PowerVM lpars, this
> will be used to ensure that hotplug events initiated from the HMC and
> from PRRN events are handled and serialized properly.
> 
> ---
> 
> John Allen (3):
>   powerpc/pseries: Add pseries hotplug workqueue
>   powerpc/pseries: Add support for hotplug interrupt source
>   powerpc/pseries: Use kernel hotplug queue for PowerVM hotplug events
> 
> 
>  arch/powerpc/platforms/pseries/dlpar.c   |   57 +-
>  arch/powerpc/platforms/pseries/pseries.h |2 +
>  arch/powerpc/platforms/pseries/ras.c |   38 
>  3 files changed, 96 insertions(+), 1 deletion(-)
> 
> --
> John Allen
> 


Re: [PATCH net-next] ibmvnic: Defer tx completion processing using a wait queue

2016-04-12 Thread John Allen
On 04/12/2016 03:12 PM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 14:38 -0500, John Allen wrote:
>> Moves tx completion processing out of interrupt context, deferring work
>> using a wait queue. With this work now deferred, we must account for the
>> possibility that skbs can be sent faster than we can process completion
>> requests in which case the tx buffer will overflow. If the tx buffer is
>> full, ibmvnic_xmit will return NETDEV_TX_BUSY and stop the current tx
>> queue. Subsequently, the queue will be restarted in ibmvnic_complete_tx
>> when all pending tx completion requests have been cleared.
> 
> 1) Why is this needed ?

In the current ibmvnic implementation, tx completion processing is done in
interrupt context. Depending on the load, this can block further
interrupts for a long time. This patch just creates a bottom half so that
when a tx completion interrupt comes in, we can defer the majority of the
work and exit interrupt context quickly.
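
As a generic illustration of that kind of top half/bottom half split (not
the ibmvnic code itself, which uses a kthread woken through a wait queue),
the usual work-item pattern looks something like:

/* Generic sketch: the interrupt handler only defers; the heavy
 * completion processing runs later in process context.
 */
static void example_bottom_half(struct work_struct *work)
{
	/* heavy completion processing goes here */
}
static DECLARE_WORK(example_work, example_bottom_half);

static irqreturn_t example_interrupt(int irq, void *instance)
{
	schedule_work(&example_work);	/* defer and return quickly */
	return IRQ_HANDLED;
}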

> 
> 2) If it is needed, why is this not done in a generic way, so that other
> drivers can use this ?

I'm still fairly new to network driver development so I'm not in tune with
the needs of other drivers. My assumption was that the wait queue data
structure was a reasonably generic way to handle something like this. Is
there a more appropriate/generic way of implementing a bottom half for
this purpose?

-John
> 
> Thanks.
> 
> 
> 
> 


[PATCH net-next] ibmvnic: Defer tx completion processing using a wait queue

2016-04-12 Thread John Allen
Moves tx completion processing out of interrupt context, deferring work
using a wait queue. With this work now deferred, we must account for the
possibility that skbs can be sent faster than we can process completion
requests in which case the tx buffer will overflow. If the tx buffer is
full, ibmvnic_xmit will return NETDEV_TX_BUSY and stop the current tx
queue. Subsequently, the queue will be restarted in ibmvnic_complete_tx
when all pending tx completion requests have been cleared.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 864cb21..641e340 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -105,6 +105,7 @@ static int pending_scrq(struct ibmvnic_adapter *,
struct ibmvnic_sub_crq_queue *);
 static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *,
struct ibmvnic_sub_crq_queue *);
+static int ibmvnic_tx_work(void *data);
 static int ibmvnic_poll(struct napi_struct *napi, int data);
 static void send_map_query(struct ibmvnic_adapter *adapter);
 static void send_request_map(struct ibmvnic_adapter *, dma_addr_t, __be32, u8);
@@ -437,6 +438,17 @@ static int ibmvnic_open(struct net_device *netdev)

tx_pool->consumer_index = 0;
tx_pool->producer_index = 0;
+
+   init_waitqueue_head(&tx_pool->ibmvnic_tx_comp_q);
+   tx_pool->work_thread =
+   kthread_run(ibmvnic_tx_work, adapter->tx_scrq[i],
+   "%s_%s_%d",
+   IBMVNIC_NAME, adapter->netdev->name, i);
+   if (IS_ERR(tx_pool->work_thread)) {
+   dev_err(dev, "Couldn't create kernel thread: %ld\n",
+   PTR_ERR(tx_pool->work_thread));
+   goto thread_failed;
+   }
}
adapter->bounce_buffer_size =
(netdev->mtu + ETH_HLEN - 1) / PAGE_SIZE + 1;
@@ -477,6 +489,9 @@ bounce_map_failed:
 bounce_alloc_failed:
i = tx_subcrqs - 1;
kfree(adapter->tx_pool[i].free_map);
+thread_failed:
+   for (j = 0; j < i; j++)
+   kthread_stop(adapter->tx_pool[j].work_thread);
 tx_fm_alloc_failed:
free_long_term_buff(adapter, &adapter->tx_pool[i].long_term_buff);
 tx_ltb_alloc_failed:
@@ -731,6 +746,16 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
}

index = tx_pool->free_map[tx_pool->consumer_index];
+   /* tx queue full */
+   if ((index + 1) % adapter->max_tx_entries_per_subcrq ==
+   tx_pool->free_map[tx_pool->producer_index]) {
+   netif_tx_stop_queue(netdev_get_tx_queue(netdev, queue_num));
+   tx_send_failed++;
+   tx_dropped++;
+   ret = NETDEV_TX_BUSY;
+   goto out;
+   }
+
offset = index * adapter->req_mtu;
dst = tx_pool->long_term_buff.buff + offset;
memset(dst, 0, adapter->req_mtu);
@@ -1314,6 +1339,7 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
 {
struct device *dev = &adapter->vdev->dev;
struct ibmvnic_tx_buff *txbuff;
+   struct netdev_queue *txq;
union sub_crq *next;
int index;
int i, j;
@@ -1361,6 +1387,10 @@ restart_loop:
next->tx_comp.first = 0;
}

+   txq = netdev_get_tx_queue(adapter->netdev, scrq->pool_index);
+   if (netif_tx_queue_stopped(txq))
+   netif_tx_wake_queue(txq);
+
enable_scrq_irq(adapter, scrq);

if (pending_scrq(adapter, scrq)) {
@@ -1371,13 +1401,35 @@ restart_loop:
return 0;
 }

+static int ibmvnic_tx_work(void *data)
+{
+   int rc;
+   struct ibmvnic_sub_crq_queue *scrq = data;
+   struct ibmvnic_adapter *adapter = scrq->adapter;
+
+   while (1) {
+   rc = wait_event_interruptible(adapter->
+ tx_pool[scrq->pool_index].
+ ibmvnic_tx_comp_q,
+ pending_scrq(adapter, scrq));
+   BUG_ON(rc);
+
+   if (kthread_should_stop())
+   break;
+
+   disable_scrq_irq(adapter, scrq);
+
+   ibmvnic_complete_tx(adapter, scrq);
+   }
+   return 0;
+}
+
 static irqreturn_t ibmvnic_interrupt_tx(int irq, void *instance)
 {
struct ibmvnic_sub_crq_queue *scrq = instance;
struct ibmvnic_adapter *adapter = scrq->adapter;

-   disable_scrq_irq(adapter, scrq);
-   ibmvnic_complete_tx(adapter, scrq);
+   wake_up(&adapter->tx_pool[scrq->pool_index].ibmvnic_tx_comp_q);

return IRQ_HANDLED;
 }


[PATCH net-next] ibmvnic: Enable use of multiple tx/rx scrqs

2016-04-06 Thread John Allen
Enables the use of multiple transmit and receive scrqs allowing the ibmvnic
driver to take advantage of multiqueue functionality. To achieve this, the
driver must implement the process of negotiating the maximum number of
queues allowed by the server. Initially, the driver will attempt to log in
with the maximum number of tx and rx queues supported by the server. If
the server fails to allocate the requested number of scrqs, it will return
partial success in the login response. In this case, we must reinitiate
the login process from the request capabilities stage and attempt to log
in again, requesting fewer scrqs.
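
The resulting probe-time negotiation is roughly the loop sketched below
(illustrative only; the corresponding ibmvnic_probe() hunk in this patch
is truncated in this archive, and the sub-CRQ teardown details are
omitted):

/* Sketch of the login renegotiation loop described above. */
do {
	adapter->renegotiate = false;

	send_login(adapter);
	wait_for_completion(&adapter->init_done);

	if (adapter->renegotiate) {
		/* Partial success from the server: go back to the
		 * request-capabilities stage and request fewer
		 * sub-CRQs before logging in again. */
		init_sub_crqs(adapter, 1);
	}
} while (adapter->renegotiate);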

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 21bccf6..6e9b91d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -800,11 +800,12 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
ret = NETDEV_TX_BUSY;
goto out;
}
-   lpar_rc = send_subcrq_indirect(adapter, handle_array[0],
+   lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
   (u64)tx_buff->indir_dma,
   (u64)num_entries);
} else {
-   lpar_rc = send_subcrq(adapter, handle_array[0], &tx_crq);
+   lpar_rc = send_subcrq(adapter, handle_array[queue_num],
+ &tx_crq);
}
if (lpar_rc != H_SUCCESS) {
dev_err(dev, "tx failed with code %ld\n", lpar_rc);
@@ -989,7 +990,7 @@ restart_poll:
netdev->stats.rx_bytes += length;
frames_processed++;
}
-   replenish_pools(adapter);
+   replenish_rx_pool(adapter, &adapter->rx_pool[scrq_num]);

if (frames_processed < budget) {
enable_scrq_irq(adapter, adapter->rx_scrq[scrq_num]);
@@ -1426,9 +1427,9 @@ static void init_sub_crqs(struct ibmvnic_adapter *adapter, int retry)
entries_page : adapter->max_rx_add_entries_per_subcrq;

/* Choosing the maximum number of queues supported by firmware*/
-   adapter->req_tx_queues = adapter->min_tx_queues;
-   adapter->req_rx_queues = adapter->min_rx_queues;
-   adapter->req_rx_add_queues = adapter->min_rx_add_queues;
+   adapter->req_tx_queues = adapter->max_tx_queues;
+   adapter->req_rx_queues = adapter->max_rx_queues;
+   adapter->req_rx_add_queues = adapter->max_rx_add_queues;

adapter->req_mtu = adapter->max_mtu;
}
@@ -1776,13 +1777,11 @@ static void send_login(struct ibmvnic_adapter *adapter)
goto buf_map_failed;
}

-   rsp_buffer_size =
-   sizeof(struct ibmvnic_login_rsp_buffer) +
-   sizeof(u64) * (adapter->req_tx_queues +
-  adapter->req_rx_queues *
-  adapter->req_rx_add_queues + adapter->
-  req_rx_add_queues) +
-   sizeof(u8) * (IBMVNIC_TX_DESC_VERSIONS);
+   rsp_buffer_size = sizeof(struct ibmvnic_login_rsp_buffer) +
+ sizeof(u64) * adapter->req_tx_queues +
+ sizeof(u64) * adapter->req_rx_queues +
+ sizeof(u64) * adapter->req_rx_queues +
+ sizeof(u8) * IBMVNIC_TX_DESC_VERSIONS;

login_rsp_buffer = kmalloc(rsp_buffer_size, GFP_ATOMIC);
if (!login_rsp_buffer)
@@ -2401,6 +2400,16 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq,
dma_unmap_single(dev, adapter->login_rsp_buf_token,
 adapter->login_rsp_buf_sz, DMA_BIDIRECTIONAL);

+   /* If the number of queues requested can't be allocated by the
+* server, the login response will return with code 1. We will need
+* to resend the login buffer with fewer queues requested.
+*/
+   if (login_rsp_crq->generic.rc.code) {
+   adapter->renegotiate = true;
+   complete(&adapter->init_done);
+   return 0;
+   }
+
netdev_dbg(adapter->netdev, "Login Response Buffer:\n");
for (i = 0; i < (adapter->login_rsp_buf_sz - 1) / 8 + 1; i++) {
netdev_dbg(adapter->netdev, "%016lx\n",
@@ -3627,15 +3636,22 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)

init_completion(&adapter->init_done);
wait_for_completion(&adapter->init_done);
+
+   do {
+   adapter->renegotiate = false;

-   /* needed to pull init_sub_crqs outside of an interrupt context
-* because it creates IRQ mappings for the subCRQ queues, causing
-

[PATCH v4] memory-hotplug: Fix kernel warning during memory hotplug on ppc64

2016-01-06 Thread John Allen
This patch fixes a bug where a kernel warning is triggered when performing
a memory hotplug on ppc64. This warning may also occur on any architecture
that uses the memory_probe_store interface.

WARNING: at drivers/base/memory.c:200
CPU: 9 PID: 13042 Comm: systemd-udevd Not tainted 4.4.0-rc4-00113-g0bd0f1e-dirty #7
NIP [c055e034] pages_correctly_reserved+0x134/0x1b0
LR [c055e7f8] memory_subsys_online+0x68/0x140
Call Trace:
[c000fa9e7b50] [c000fa9e7b90] 0xc000fa9e7b90 (unreliable)
[c000fa9e7bb0] [c055e7f8] memory_subsys_online+0x68/0x140
[c000fa9e7bf0] [c0540064] device_online+0xb4/0x120
[c000fa9e7c30] [c055e6c0] store_mem_state+0xb0/0x180
[c000fa9e7c70] [c053c5e4] dev_attr_store+0x34/0x60
[c000fa9e7c90] [c02db0a4] sysfs_kf_write+0x64/0xa0
[c000fa9e7cb0] [c02da0cc] kernfs_fop_write+0x17c/0x1e0
[c000fa9e7d00] [c02481b0] __vfs_write+0x40/0x160
[c000fa9e7d90] [c0248ce8] vfs_write+0xb8/0x200
[c000fa9e7de0] [c0249b40] SyS_write+0x60/0x110
[c000fa9e7e30] [c0009260] system_call+0x38/0xd0

The warning is triggered because there is a udev rule that automatically
tries to online memory after it has been added. The udev rule varies from
distro to distro, but will generally look something like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

On any architecture that uses memory_probe_store to reserve memory, the
udev rule will be triggered after the first section of the block is
reserved and will subsequently attempt to online the entire block,
interrupting the memory reservation process and causing the warning.
This patch modifies memory_probe_store to add a block of memory with
a single call to add_memory as opposed to looping through and adding
each section individually. A single call to add_memory is protected by
the mem_hotplug mutex which will prevent the udev rule from onlining
memory until the reservation of the entire block is complete.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
v2: -Move call to unlock_device_hotplug under "out" label
v3: -Update changelog with trimmed traces from newer kernel
-After further testing and reviewing the original purpose of
 the lock_device_hotplug_sysfs mutex, modified solution to
 fix the issue by replacing loop that adds memory section
 by section with a single call to add_memory.
v4: -Remove unused variable 'i'

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 25425d3..e5c6ff0 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -450,8 +450,7 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
   const char *buf, size_t count)
 {
u64 phys_addr;
-   int nid;
-   int i, ret;
+   int nid, ret;
unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;

ret = kstrtoull(buf, 0, &phys_addr);
@@ -461,15 +460,12 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;

-   for (i = 0; i < sections_per_block; i++) {
-   nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-PAGES_PER_SECTION << PAGE_SHIFT);
-   if (ret)
-   goto out;
+   nid = memory_add_physaddr_to_nid(phys_addr);
+   ret = add_memory(nid, phys_addr,
+MIN_MEMORY_BLOCK_SIZE * sections_per_block);

-   phys_addr += MIN_MEMORY_BLOCK_SIZE;
-   }
+   if (ret)
+   goto out;

ret = count;
 out:


[PATCH v3] memory-hotplug: Fix kernel warning during memory hotplug on ppc64

2016-01-06 Thread John Allen
This patch fixes a bug where a kernel warning is triggered when performing
a memory hotplug on ppc64. This warning may also occur on any architecture
that uses the memory_probe_store interface.

WARNING: at drivers/base/memory.c:200
CPU: 9 PID: 13042 Comm: systemd-udevd Not tainted 4.4.0-rc4-00113-g0bd0f1e-dirty #7
NIP [c055e034] pages_correctly_reserved+0x134/0x1b0
LR [c055e7f8] memory_subsys_online+0x68/0x140
Call Trace:
[c000fa9e7b50] [c000fa9e7b90] 0xc000fa9e7b90 (unreliable)
[c000fa9e7bb0] [c055e7f8] memory_subsys_online+0x68/0x140
[c000fa9e7bf0] [c0540064] device_online+0xb4/0x120
[c000fa9e7c30] [c055e6c0] store_mem_state+0xb0/0x180
[c000fa9e7c70] [c053c5e4] dev_attr_store+0x34/0x60
[c000fa9e7c90] [c02db0a4] sysfs_kf_write+0x64/0xa0
[c000fa9e7cb0] [c02da0cc] kernfs_fop_write+0x17c/0x1e0
[c000fa9e7d00] [c02481b0] __vfs_write+0x40/0x160
[c000fa9e7d90] [c0248ce8] vfs_write+0xb8/0x200
[c000fa9e7de0] [c0249b40] SyS_write+0x60/0x110
[c000fa9e7e30] [c0009260] system_call+0x38/0xd0

The warning is triggered because there is a udev rule that automatically
tries to online memory after it has been added. The udev rule varies from
distro to distro, but will generally look something like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

On any architecture that uses memory_probe_store to reserve memory, the
udev rule will be triggered after the first section of the block is
reserved and will subsequently attempt to online the entire block,
interrupting the memory reservation process and causing the warning.
This patch modifies memory_probe_store to add a block of memory with
a single call to add_memory as opposed to looping through and adding
each section individually. A single call to add_memory is protected by
the mem_hotplug mutex which will prevent the udev rule from onlining
memory until the reservation of the entire block is complete.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
v2: -Move call to unlock_device_hotplug under "out" label
v3: -Update changelog with trimmed traces from newer kernel
-After further testing and reviewing the original purpose of
 the lock_device_hotplug_sysfs mutex, modified solution to
 fix the issue by replacing loop that adds memory section
 by section with a single call to add_memory.

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 25425d3..e15ddb65 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -461,15 +461,12 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;

-   for (i = 0; i < sections_per_block; i++) {
-   nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-PAGES_PER_SECTION << PAGE_SHIFT);
-   if (ret)
-   goto out;
+   nid = memory_add_physaddr_to_nid(phys_addr);
+   ret = add_memory(nid, phys_addr,
+MIN_MEMORY_BLOCK_SIZE * sections_per_block);

-   phys_addr += MIN_MEMORY_BLOCK_SIZE;
-   }
+   if (ret)
+   goto out;

ret = count;
 out:



Re: [PATCH v2] memory-hotplug: Fix kernel warning during memory hotplug on ppc64

2015-11-24 Thread John Allen
Hi Michael,

On 11/09/2015 07:21 PM, Michael Ellerman wrote:
> Hi John,
> 
> On Tue, 2015-11-03 at 11:21 -0600, John Allen wrote:
>> This patch fixes a bug where a kernel warning is triggered when performing
>> a memory hotplug on ppc64. This warning may also occur on any architecture
>> that has multiple sections per memory block.
> 
> So it looks like the only arches that enable this code at all are powerpc, sh
> and x86 (via CONFIG_ARCH_MEMORY_PROBE). And it sounds like on x86 it's just
> there for debugging, ACPI is meant to notify you of memory hotplug by the
> sounds.
> 
> Do any of sh or x86 have "multiple sections per memory block" ?
> 
> If not then this bug would only apply to powerpc, which would be useful to
> know.

Sorry for the delayed response. It looks like your reply got buried in my
lkml folder instead of going straight to my inbox.

My understanding is that x86 may use multiple sections per block. I
think we would have to assume that any architecture that uses this code
could potentially hit this issue.

> 
> And what is the actual warning? ie. what does the code look like. My line 210
> of memory.c is a printk() not a WARN.

In mainline, the warning is at line 200:

if (WARN_ON_ONCE(!pfn_valid(pfn)))
return false;

> 
> 
>> [   78.300767] [ cut here ]
>> [   78.300768] WARNING: at ../drivers/base/memory.c:210
>> [   78.300769] Modules linked in: rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag 
>> inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c 
>> ibmveth(X) rtc_generic btrfs xor raid6_pq xts
>> gf128mul dm_crypt sd_mod sr_mod cdrom crc_t10dif ibmvscsi(X) 
>> scsi_transport_srp scsi_tgt dm_mod sg scsi_mod autofs4
>> [   78.300789] Supported: Yes, External
>> [   78.300791] CPU: 1 PID: 3090 Comm: systemd-udevd Tainted: G  
>> X 3.12.45-1-default #1
>> [   78.300793] task: c004d7d1d970 ti: c004d7b9 task.ti: 
>> c004d7b9
>> [   78.300794] NIP: c04fcff8 LR: c04fda84 CTR: 
>> 
>> [   78.300795] REGS: c004d7b93930 TRAP: 0700   Tainted: G  X 
>>  (3.12.45-1-default)
>> [   78.300796] MSR: 80029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24088848  
>> XER: 
>> [   78.300800] CFAR: c04fcf98 SOFTE: 1
>> GPR00: 0537 c004d7b93bb0 c0e7f200 00053000
>> GPR04: 1000 0001 c0e0f200 
>> GPR08:  0001 0537 014dc000
>> GPR12: 00054000 ce7f0900 10041040 
>> GPR16: 0100206f0010 1003ff78 1006c824 100410b0
>> GPR20: 1003ff90 1006c00c 01002073cd20 0100206f0760
>> GPR24: 0100206f85a0 c076d950 c004ef7c95e0 c004d7b93e00
>> GPR28: c004de601738 0001 c1218f80 003f
>> [   78.300818] NIP [c04fcff8] memory_block_action+0x258/0x2e0
>> [   78.300820] LR [c04fda84] memory_subsys_online+0x54/0x100
>> [   78.300821] Call Trace:
>> [   78.300822] [c004d7b93bb0] [c9071ce0] 0xc9071ce0 
>> (unreliable)
>> [   78.300824] [c004d7b93c40] [c04fda84] 
>> memory_subsys_online+0x54/0x100
>> [   78.300826] [c004d7b93c70] [c04df784] device_online+0xb4/0x120
>> [   78.300828] [c004d7b93cb0] [c04fd738] 
>> store_mem_state+0x88/0x220
>> [   78.300830] [c004d7b93cf0] [c04db448] dev_attr_store+0x68/0xa0
>> [   78.300833] [c004d7b93d30] [c031f938] 
>> sysfs_write_file+0xf8/0x1d0
>> [   78.300835] [c004d7b93d90] [c027d29c] vfs_write+0xec/0x250
>> [   78.300837] [c004d7b93de0] [c027dfdc] SyS_write+0x6c/0xf0
>> [   78.300839] [c004d7b93e30] [c000a17c] syscall_exit+0x0/0x7c
>> [   78.300840] Instruction dump:
>> [   78.300841] 780a0560 79482ea4 7ce94214 2fa7 41de0014 7d09402a 
>> 396b4000 7907ffe3
>> [   78.300844] 4082ff54 3cc2fff9 8926b83a 69290001 <0b09> 2fa9 
>> 40de006c 3860fff0
>> [   78.300847] ---[ end trace dfec8da06ebbc762 ]---
> 
> For change logs I think it's nice to trim the oops a bit. Others probably have
> different opinions but I'd remove the printk timestamp, the GPRs and some of
> the other regs and the instruction dump, so more like:
> 
>   WARNING: at ../drivers/base/memory.c:210
>   CPU: 1 PID: 3090 Comm: systemd-udevd Tainted: G  X 
> 3.12.45-1-default #1
>   NIP [c04fcff8] memory_block_action+0x258/0x2e0
>   LR

Re: [PATCH] memory-hotplug: Fix kernel warning during memory hotplug on ppc64

2015-11-03 Thread John Allen
On 11/02/2015 02:51 PM, John Allen wrote:
> This patch fixes a bug where a kernel warning is triggered when performing
> a memory hotplug on ppc64. This warning may also occur on any architecture
> that has multiple sections per memory block.
> 
> [   78.300767] [ cut here ]
> [   78.300768] WARNING: at ../drivers/base/memory.c:210
> [   78.300769] Modules linked in: rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag 
> inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c 
> ibmveth(X) rtc_generic btrfs xor raid6_pq xts gf128mul dm_crypt sd_mod sr_mod 
> cdrom crc_t10dif ibmvscsi(X) scsi_transport_srp scsi_tgt dm_mod sg scsi_mod 
> autofs4
> [   78.300789] Supported: Yes, External
> [   78.300791] CPU: 1 PID: 3090 Comm: systemd-udevd Tainted: G  X 
> 3.12.45-1-default #1
> [   78.300793] task: c004d7d1d970 ti: c004d7b9 task.ti: 
> c004d7b9
> [   78.300794] NIP: c04fcff8 LR: c04fda84 CTR: 
> 
> [   78.300795] REGS: c004d7b93930 TRAP: 0700   Tainted: G  X  
> (3.12.45-1-default)
> [   78.300796] MSR: 80029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24088848  
> XER: 
> [   78.300800] CFAR: c04fcf98 SOFTE: 1
> GPR00: 0537 c004d7b93bb0 c0e7f200 00053000
> GPR04: 1000 0001 c0e0f200 
> GPR08:  0001 0537 014dc000
> GPR12: 00054000 ce7f0900 10041040 
> GPR16: 0100206f0010 1003ff78 1006c824 100410b0
> GPR20: 1003ff90 1006c00c 01002073cd20 0100206f0760
> GPR24: 0100206f85a0 c076d950 c004ef7c95e0 c004d7b93e00
> GPR28: c004de601738 0001 c1218f80 003f
> [   78.300818] NIP [c04fcff8] memory_block_action+0x258/0x2e0
> [   78.300820] LR [c04fda84] memory_subsys_online+0x54/0x100
> [   78.300821] Call Trace:
> [   78.300822] [c004d7b93bb0] [c9071ce0] 0xc9071ce0 
> (unreliable)
> [   78.300824] [c004d7b93c40] [c04fda84] 
> memory_subsys_online+0x54/0x100
> [   78.300826] [c004d7b93c70] [c04df784] device_online+0xb4/0x120
> [   78.300828] [c004d7b93cb0] [c04fd738] 
> store_mem_state+0x88/0x220
> [   78.300830] [c004d7b93cf0] [c04db448] dev_attr_store+0x68/0xa0
> [   78.300833] [c004d7b93d30] [c031f938] 
> sysfs_write_file+0xf8/0x1d0
> [   78.300835] [c004d7b93d90] [c027d29c] vfs_write+0xec/0x250
> [   78.300837] [c004d7b93de0] [c027dfdc] SyS_write+0x6c/0xf0
> [   78.300839] [c004d7b93e30] [c000a17c] syscall_exit+0x0/0x7c
> [   78.300840] Instruction dump:
> [   78.300841] 780a0560 79482ea4 7ce94214 2fa7 41de0014 7d09402a 396b4000 
> 7907ffe3
> [   78.300844] 4082ff54 3cc2fff9 8926b83a 69290001 <0b09> 2fa9 
> 40de006c 3860fff0
> [   78.300847] ---[ end trace dfec8da06ebbc762 ]---
> 
> The warning is triggered because there is a udev rule that automatically
> tries to online memory after it has been added. The udev rule varies from
> distro to distro, but will generally look something like:
> 
> SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", 
> ATTR{state}="online"
> 
> On any architecture that uses memory_probe_store to reserve memory,
> this can interrupt the memory reservation process. This patch modifies
> memory_probe_store to take the hotplug sysfs lock to prevent the online
> of added memory before the completion of the probe.
> 
> Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
> ---
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 2804aed..baf7a22 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -457,6 +457,10 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
>   if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
>   return -EINVAL;
> 
> + ret = lock_device_hotplug_sysfs();
> + if (ret)
> + return ret;
> +
>   for (i = 0; i < sections_per_block; i++) {
>   nid = memory_add_physaddr_to_nid(phys_addr);
>   ret = add_memory(nid, phys_addr,
> @@ -467,6 +471,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
>   phys_addr += MIN_MEMORY_BLOCK_SIZE;
>   }
> 
> + unlock_device_hotplug();
> +

I realized that there is actually a bug here. If the call to add_memory in
the above loop fails, we goto out and skip unlocking the device hotplug
lock. I will be sending a v2 patch to address this issue shortly.

John

>   ret = count;
>  out:
>   return ret;
> 


[PATCH v2] memory-hotplug: Fix kernel warning during memory hotplug on ppc64

2015-11-03 Thread John Allen
This patch fixes a bug where a kernel warning is triggered when performing
a memory hotplug on ppc64. This warning may also occur on any architecture
that has multiple sections per memory block.

[   78.300767] [ cut here ]
[   78.300768] WARNING: at ../drivers/base/memory.c:210
[   78.300769] Modules linked in: rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag 
inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c 
ibmveth(X) rtc_generic btrfs xor raid6_pq xts gf128mul dm_crypt sd_mod sr_mod 
cdrom crc_t10dif ibmvscsi(X) scsi_transport_srp scsi_tgt dm_mod sg scsi_mod 
autofs4
[   78.300789] Supported: Yes, External
[   78.300791] CPU: 1 PID: 3090 Comm: systemd-udevd Tainted: G  X 
3.12.45-1-default #1
[   78.300793] task: c004d7d1d970 ti: c004d7b9 task.ti: 
c004d7b9
[   78.300794] NIP: c04fcff8 LR: c04fda84 CTR: 
[   78.300795] REGS: c004d7b93930 TRAP: 0700   Tainted: G  X  
(3.12.45-1-default)
[   78.300796] MSR: 80029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24088848  XER: 

[   78.300800] CFAR: c04fcf98 SOFTE: 1
GPR00: 0537 c004d7b93bb0 c0e7f200 00053000
GPR04: 1000 0001 c0e0f200 
GPR08:  0001 0537 014dc000
GPR12: 00054000 ce7f0900 10041040 
GPR16: 0100206f0010 1003ff78 1006c824 100410b0
GPR20: 1003ff90 1006c00c 01002073cd20 0100206f0760
GPR24: 0100206f85a0 c076d950 c004ef7c95e0 c004d7b93e00
GPR28: c004de601738 0001 c1218f80 003f
[   78.300818] NIP [c04fcff8] memory_block_action+0x258/0x2e0
[   78.300820] LR [c04fda84] memory_subsys_online+0x54/0x100
[   78.300821] Call Trace:
[   78.300822] [c004d7b93bb0] [c9071ce0] 0xc9071ce0 
(unreliable)
[   78.300824] [c004d7b93c40] [c04fda84] 
memory_subsys_online+0x54/0x100
[   78.300826] [c004d7b93c70] [c04df784] device_online+0xb4/0x120
[   78.300828] [c004d7b93cb0] [c04fd738] store_mem_state+0x88/0x220
[   78.300830] [c004d7b93cf0] [c04db448] dev_attr_store+0x68/0xa0
[   78.300833] [c004d7b93d30] [c031f938] sysfs_write_file+0xf8/0x1d0
[   78.300835] [c004d7b93d90] [c027d29c] vfs_write+0xec/0x250
[   78.300837] [c004d7b93de0] [c027dfdc] SyS_write+0x6c/0xf0
[   78.300839] [c004d7b93e30] [c000a17c] syscall_exit+0x0/0x7c
[   78.300840] Instruction dump:
[   78.300841] 780a0560 79482ea4 7ce94214 2fa7 41de0014 7d09402a 396b4000 
7907ffe3
[   78.300844] 4082ff54 3cc2fff9 8926b83a 69290001 <0b09> 2fa9 40de006c 
3860fff0
[   78.300847] ---[ end trace dfec8da06ebbc762 ]---

The warning is triggered because there is a udev rule that automatically
tries to online memory after it has been added. The udev rule varies from
distro to distro, but will generally look something like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

On any architecture that uses memory_probe_store to reserve memory,
this can interrupt the memory reservation process. This patch modifies
memory_probe_store to take the hotplug sysfs lock to prevent the online
of added memory before the completion of the probe.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
v2: Move call to unlock_device_hotplug under "out" label

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bece691..7c50415 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -422,6 +422,10 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;

+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   return ret;
+
for (i = 0; i < sections_per_block; i++) {
nid = memory_add_physaddr_to_nid(phys_addr);
ret = add_memory(nid, phys_addr,
@@ -434,6 +438,7 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,

ret = count;
 out:
+   unlock_device_hotplug();
return ret;
 }


[PATCH] memory-hotplug: Fix kernel warning during memory hotplug on ppc64

2015-11-02 Thread John Allen
This patch fixes a bug where a kernel warning is triggered when performing
a memory hotplug on ppc64. This warning may also occur on any architecture
that has multiple sections per memory block.

[   78.300767] [ cut here ]
[   78.300768] WARNING: at ../drivers/base/memory.c:210
[   78.300769] Modules linked in: rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag 
inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c 
ibmveth(X) rtc_generic btrfs xor raid6_pq xts gf128mul dm_crypt sd_mod sr_mod 
cdrom crc_t10dif ibmvscsi(X) scsi_transport_srp scsi_tgt dm_mod sg scsi_mod 
autofs4
[   78.300789] Supported: Yes, External
[   78.300791] CPU: 1 PID: 3090 Comm: systemd-udevd Tainted: G  X 
3.12.45-1-default #1
[   78.300793] task: c004d7d1d970 ti: c004d7b9 task.ti: 
c004d7b9
[   78.300794] NIP: c04fcff8 LR: c04fda84 CTR: 
[   78.300795] REGS: c004d7b93930 TRAP: 0700   Tainted: G  X  
(3.12.45-1-default)
[   78.300796] MSR: 80029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24088848  XER: 

[   78.300800] CFAR: c04fcf98 SOFTE: 1
GPR00: 0537 c004d7b93bb0 c0e7f200 00053000
GPR04: 1000 0001 c0e0f200 
GPR08:  0001 0537 014dc000
GPR12: 00054000 ce7f0900 10041040 
GPR16: 0100206f0010 1003ff78 1006c824 100410b0
GPR20: 1003ff90 1006c00c 01002073cd20 0100206f0760
GPR24: 0100206f85a0 c076d950 c004ef7c95e0 c004d7b93e00
GPR28: c004de601738 0001 c1218f80 003f
[   78.300818] NIP [c04fcff8] memory_block_action+0x258/0x2e0
[   78.300820] LR [c04fda84] memory_subsys_online+0x54/0x100
[   78.300821] Call Trace:
[   78.300822] [c004d7b93bb0] [c9071ce0] 0xc9071ce0 
(unreliable)
[   78.300824] [c004d7b93c40] [c04fda84] 
memory_subsys_online+0x54/0x100
[   78.300826] [c004d7b93c70] [c04df784] device_online+0xb4/0x120
[   78.300828] [c004d7b93cb0] [c04fd738] store_mem_state+0x88/0x220
[   78.300830] [c004d7b93cf0] [c04db448] dev_attr_store+0x68/0xa0
[   78.300833] [c004d7b93d30] [c031f938] sysfs_write_file+0xf8/0x1d0
[   78.300835] [c004d7b93d90] [c027d29c] vfs_write+0xec/0x250
[   78.300837] [c004d7b93de0] [c027dfdc] SyS_write+0x6c/0xf0
[   78.300839] [c004d7b93e30] [c000a17c] syscall_exit+0x0/0x7c
[   78.300840] Instruction dump:
[   78.300841] 780a0560 79482ea4 7ce94214 2fa7 41de0014 7d09402a 396b4000 
7907ffe3
[   78.300844] 4082ff54 3cc2fff9 8926b83a 69290001 <0b09> 2fa9 40de006c 
3860fff0
[   78.300847] ---[ end trace dfec8da06ebbc762 ]---

The warning is triggered because there is a udev rule that automatically
tries to online memory after it has been added. The udev rule varies from
distro to distro, but will generally look something like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

On any architecture that uses memory_probe_store to reserve memory,
this can interrupt the memory reservation process. This patch modifies
memory_probe_store to take the hotplug sysfs lock to prevent the online
of added memory before the completion of the probe.

Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
---
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 2804aed..baf7a22 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -457,6 +457,10 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;

+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   return ret;
+
for (i = 0; i < sections_per_block; i++) {
nid = memory_add_physaddr_to_nid(phys_addr);
ret = add_memory(nid, phys_addr,
@@ -467,6 +471,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
phys_addr += MIN_MEMORY_BLOCK_SIZE;
}

+   unlock_device_hotplug();
+
ret = count;
 out:
return ret;
