Re: [RFC PATCH] powerpc/powernv: recheck lock bit in core_idle_lock_held after lwarx

2016-10-24 Thread Li Zhong

> On 25 Oct 2016, at 12:15, Paul Mackerras <pau...@ozlabs.org> wrote:
> 
> On Tue, Oct 25, 2016 at 11:24:34AM +0800, Li Zhong wrote:
>> The core_idle_lock_held loop spins while the lock bit is held by others.
>> However, it is possible that after the lock bit is cleared, someone
>> else sees it first and sets the lock bit again. The lwarx then loads a
>> value with the lock bit set, and the lock bit may be cleared by the
>> following stwcx., while the first locker is possibly still executing
>> in the critical section.
>> 
>> This patch rechecks the lock bit after lwarx, and goes back to the
>> loop if it is set.
> 
> You're quite correct, in fact I posted almost exactly the same patch a
> few days ago...  See http://patchwork.ozlabs.org/patch/684963/.


Forgot to check the list before sending … 

One minor difference: maybe we can use bne- for the rechecking :)

Thanks, Zhong

> 
> Thanks,
> Paul.
> 



[RFC PATCH] powerpc/powernv: recheck lock bit in core_idle_lock_held after lwarx

2016-10-24 Thread Li Zhong
The core_idle_lock_held loop spins while the lock bit is held by others.
However, it is possible that after the lock bit is cleared, someone
else sees it first and sets the lock bit again. The lwarx then loads a
value with the lock bit set, and the lock bit may be cleared by the
following stwcx., while the first locker is possibly still executing in
the critical section.

This patch rechecks the lock bit after lwarx, and goes back to the loop
if it is set.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/idle_book3s.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index bd739fe..ce07b3f 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -99,6 +99,8 @@ core_idle_lock_held:
bne 3b
HMT_MEDIUM
lwarx   r15,0,r14
+   andi.   r9,r15,PNV_CORE_IDLE_LOCK_BIT
+   bne-    core_idle_lock_held
blr
 
 /*
-- 
1.9.1
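
For readers without the PPC context, here is a rough userspace analogue of
the fixed sequence: C11 atomics stand in for lwarx/stwcx., and LOCK_BIT is
an illustrative stand-in for PNV_CORE_IDLE_LOCK_BIT, not the kernel's value.

#include <stdatomic.h>

#define LOCK_BIT 0x100u /* stand-in for PNV_CORE_IDLE_LOCK_BIT */

/*
 * Spin while the lock bit is held, then re-load and re-check: another
 * thread may set the bit between the plain load that saw it clear and
 * the reserving load (the lwarx), which is exactly the window the
 * added andi./bne- closes.
 */
static unsigned int wait_for_lock_clear(_Atomic unsigned int *core_state)
{
    unsigned int v;

    do {
        /* the "spin while lock bit held" part (the 3b loop above) */
        while (atomic_load_explicit(core_state, memory_order_relaxed) & LOCK_BIT)
            ;
        /* the lwarx: re-load the value the update will be based on */
        v = atomic_load_explicit(core_state, memory_order_acquire);
        /* the new recheck: if the bit got set again, go back to the loop */
    } while (v & LOCK_BIT);

    return v;
}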





Re: [RFC PATCH 1/2] KVM: PPC: divide the ics lock into smaller ones for each irq

2016-06-20 Thread Li Zhong
> On Jun 20, 2016, at 13:25, Paul Mackerras <pau...@ozlabs.org> wrote:
> 
> On Mon, May 16, 2016 at 02:58:18PM +0800, Li Zhong wrote:
>> This patch tries to use smaller locks for each irq in the ics, instead
>> of a lock at the ics level, to provide better scalability.
> 
> This looks like a worth-while thing to do.  Do you have any
> performance measurements to justify the change?  This will increase
> the size of struct kvmppc_ics by 4kB, so it would be useful to show
> the performance increase that justifies it.

Actually, I saw some “improvement” because the vcpus were not bound, and
the io jobs and irqs on the guest were not bound. After I fixed those
random factors, the result became stable, but I couldn't see any obvious
improvements from the patches... Maybe I need to find some other test
cases that could support this change.

> Also, when you resend the patch, please make the patch description
> more definite - say "With this patch, we use" rather than "this patch
> tries to use", for instance.

OK, I will change that when doing a resend, if I can find some workload
that could benefit from this change.

Thanks, Zhong

> Regards,
> Paul.

Re: [RFC PATCH 2/2] KVM: PPC: Don't take lock when check irq's resend flag

2016-06-20 Thread Li Zhong

> On Jun 20, 2016, at 13:27, Paul Mackerras <pau...@ozlabs.org> wrote:
> 
> On Mon, May 16, 2016 at 03:02:13PM +0800, Li Zhong wrote:
>> It seems that we don't need to take the lock before evaluating the
>> irq's resend flag. What is needed is to make sure that when we clear
>> the ics's bit in the icp's resend_map, we don't miss the resend flag
>> of the irqs that set the bit.
>> 
>> And it seems this could be ordered through the barrier in
>> test_and_clear_bit(), and a newly added wmb when setting the irq's
>> resend flag and the icp's resend_map.
> 
> This looks fine to me.  Is there a measurable performance improvement
> from this?  I understand it could be hard to measure.
> 
> Also, you could make the patch description more definite - just say
> that we don't need to take the lock, there's no need for "seems".

OK :)

However, we may need to ignore this one for now. To implement the P/Q stuff, we 
probably need to make sure the resend irqs are resent only once. It's easier to 
ensure that with the lock here, as the resend flag can be cleared inside 
the lock. 

Thanks, Zhong
 
> 
> Paul.


[RFC PATCH 2/2] KVM: PPC: Don't take lock when check irq's resend flag

2016-05-16 Thread Li Zhong
It seems that we don't need to take the lock before evaluating the irq's
resend flag. What is needed is to make sure that when we clear the ics's
bit in the icp's resend_map, we don't miss the resend flag of the irqs
that set the bit.

And it seems this could be ordered through the barrier in test_and_clear_bit(),
and a newly added wmb when setting the irq's resend flag and the icp's
resend_map.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 14 +++---
 arch/powerpc/kvm/book3s_xics.c   | 22 +++---
 2 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index c3f604e..c3fa386 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -39,14 +39,9 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
 
-   arch_spin_lock(&state->lock);
-
-   if (!state->resend) {
-   arch_spin_unlock(&state->lock);
+   if (!state->resend)
continue;
-   }
 
-   arch_spin_unlock(&state->lock);
icp_rm_deliver_irq(xics, icp, state->number);
}
 }
@@ -374,8 +369,13 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
 * We failed to deliver the interrupt we need to set the
 * resend map bit and mark the ICS state as needing a resend
 */
-   set_bit(ics->icsid, icp->resend_map);
state->resend = 1;
+   /*
+* Make sure when checking resend, we don't miss the resend if
+* resend_map bit is seen and cleared.
+*/
+   smp_wmb();
+   set_bit(ics->icsid, icp->resend_map);
 
/*
 * If the need_resend flag got cleared in the ICP some time
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index b9dbf75..420c4fc 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -110,30 +110,17 @@ static void ics_check_resend(struct kvmppc_xics *xics, 
struct kvmppc_ics *ics,
 {
int i;
 
-   unsigned long flags;
-
-   local_irq_save(flags);
-
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
 
-   arch_spin_lock(&state->lock);
-
-   if (!state->resend) {
-   arch_spin_unlock(&state->lock);
+   if (!state->resend)
continue;
-   }
 
XICS_DBG("resend %#x prio %#x\n", state->number,
  state->priority);
 
-   arch_spin_unlock(&state->lock);
-   local_irq_restore(flags);
icp_deliver_irq(xics, icp, state->number);
-   local_irq_save(flags);
}
-
-   local_irq_restore(flags);
 }
 
 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -474,8 +461,13 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
 * We failed to deliver the interrupt we need to set the
 * resend map bit and mark the ICS state as needing a resend
 */
-   set_bit(ics->icsid, icp->resend_map);
state->resend = 1;
+   /*
+* Make sure when checking resend, we don't miss the resend if
+* resend_map bit is seen and cleared.
+*/
+   smp_wmb();
+   set_bit(ics->icsid, icp->resend_map);
 
/*
 * If the need_resend flag got cleared in the ICP some time
-- 
1.8.3.1
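
The ordering argument can be modeled in plain C11 atomics. This is a
hand-written sketch, not kernel code; the names mirror the patch, and the
acquire RMW stands in for the full barrier test_and_clear_bit() implies.

#include <stdatomic.h>

struct irq_state { _Atomic int resend; };
static _Atomic unsigned long resend_map;

/* producer, as in icp_deliver_irq(): set resend before publishing the map bit */
static void mark_resend(struct irq_state *s, int icsid)
{
    atomic_store_explicit(&s->resend, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);      /* the smp_wmb() */
    atomic_fetch_or_explicit(&resend_map, 1UL << icsid, memory_order_relaxed);
}

/* consumer: clear the map bit, then check resend; the acquire on the RMW
 * keeps the resend load from being satisfied before the bit was observed */
static int check_resend(struct irq_state *s, int icsid)
{
    unsigned long old = atomic_fetch_and_explicit(&resend_map,
            ~(1UL << icsid), memory_order_acquire);

    if (!(old & (1UL << icsid)))
        return 0;
    return atomic_load_explicit(&s->resend, memory_order_relaxed);
}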


[RFC PATCH 1/2] KVM: PPC: divide the ics lock into smaller ones for each irq

2016-05-16 Thread Li Zhong
This patch tries to use smaller locks for each irq in the ics, instead
of a lock at the ics level, to provide better scalability.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 24 ++---
 arch/powerpc/kvm/book3s_xics.c   | 41 +++-
 arch/powerpc/kvm/book3s_xics.h   |  2 +-
 3 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 980d8a6..c3f604e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -36,20 +36,19 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 {
int i;
 
-   arch_spin_lock(&ics->lock);
-
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
 
-   if (!state->resend)
+   arch_spin_lock(&state->lock);
+
+   if (!state->resend) {
+   arch_spin_unlock(&state->lock);
continue;
+   }
 
-   arch_spin_unlock(&ics->lock);
+   arch_spin_unlock(&state->lock);
icp_rm_deliver_irq(xics, icp, state->number);
-   arch_spin_lock(&ics->lock);
}
-
-   arch_spin_unlock(&ics->lock);
 }
 
 /* -- ICP routines -- */
@@ -310,8 +309,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
}
state = &ics->irq_state[src];
 
-   /* Get a lock on the ICS */
-   arch_spin_lock(&ics->lock);
+   arch_spin_lock(&state->lock);
 
/* Get our server */
if (!icp || state->server != icp->server_num) {
@@ -367,7 +365,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
 * Delivery was successful, did we reject somebody else ?
 */
if (reject && reject != XICS_IPI) {
-   arch_spin_unlock(&ics->lock);
+   arch_spin_unlock(&state->lock);
new_irq = reject;
goto again;
}
@@ -387,12 +385,12 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
 */
smp_mb();
if (!icp->state.need_resend) {
-   arch_spin_unlock(&ics->lock);
+   arch_spin_unlock(&state->lock);
goto again;
}
}
- out:
-   arch_spin_unlock(&ics->lock);
+out:
+   arch_spin_unlock(&state->lock);
 }
 
 static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 46871d5..b9dbf75 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -113,25 +113,26 @@ static void ics_check_resend(struct kvmppc_xics *xics, 
struct kvmppc_ics *ics,
unsigned long flags;
 
local_irq_save(flags);
-   arch_spin_lock(&ics->lock);
 
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
 
-   if (!state->resend)
+   arch_spin_lock(&state->lock);
+
+   if (!state->resend) {
+   arch_spin_unlock(&state->lock);
continue;
+   }
 
XICS_DBG("resend %#x prio %#x\n", state->number,
  state->priority);
 
-   arch_spin_unlock(&ics->lock);
+   arch_spin_unlock(&state->lock);
local_irq_restore(flags);
icp_deliver_irq(xics, icp, state->number);
local_irq_save(flags);
-   arch_spin_lock(&ics->lock);
}
 
-   arch_spin_unlock(&ics->lock);
local_irq_restore(flags);
 }
 
@@ -143,7 +144,7 @@ static bool write_xive(struct kvmppc_xics *xics, struct 
kvmppc_ics *ics,
unsigned long flags;
 
local_irq_save(flags);
-   arch_spin_lock(&ics->lock);
+   arch_spin_lock(&state->lock);
 
state->server = server;
state->priority = priority;
@@ -154,7 +155,7 @@ static bool write_xive(struct kvmppc_xics *xics, struct 
kvmppc_ics *ics,
deliver = true;
}
 
-   arch_spin_unlock(&ics->lock);
+   arch_spin_unlock(&state->lock);
local_irq_restore(flags);
 
return deliver;
@@ -207,10 +208,10 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 
*server, u32 *priority)
state = &ics->irq_state[src];
 
local_irq_save(flags);
-   arch_spin_lock(&ics->lock);
+   arch_spin_lock(&state->lock);
*server = state->server;
*priority = state->priority;
-   arch_spin_unlock(&ics->lock);
+   arch_spin_unlock(&state->lock);
local_irq_restore(flags);
 
return 0;
@@ -406,7 +40
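
The shape of the change, reduced to a standalone C sketch: pthread spinlocks
stand in for arch_spinlock_t, the size and names are illustrative, and the
locks are assumed to have been initialized with pthread_spin_init() elsewhere.

#include <pthread.h>

#define IRQ_PER_ICS 1024

struct irq_state {
    pthread_spinlock_t lock;  /* per-irq lock, instead of one lock per ics */
    int resend;
    int number;
};

struct ics { struct irq_state irq_state[IRQ_PER_ICS]; };

/* check_resend with per-irq locking, mirroring the patched loop: sample
 * each irq's flag under its own lock, and deliver with no lock held */
static void check_resend(struct ics *ics, void (*deliver)(int irq))
{
    for (int i = 0; i < IRQ_PER_ICS; i++) {
        struct irq_state *state = &ics->irq_state[i];
        int resend;

        pthread_spin_lock(&state->lock);
        resend = state->resend;
        pthread_spin_unlock(&state->lock);

        if (resend)
            deliver(state->number);
    }
}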

[PATCH v3 powerpc] perf/hv-24x7 fail 24x7 initcall if create_events_from_catalog() fails

2015-04-13 Thread Li Zhong
As Michael pointed out, create_events_from_catalog() fails when 
we either have:
 - a kernel bug
 - some sort of hypervisor misconfiguration
 - ENOMEM

In all the above cases, we can also fail the 24x7 initcall. 

For hypervisor errors, EIO is used so there is something reported 
in dmesg.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9445a82..ead1420 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -637,7 +637,7 @@ static ssize_t catalog_event_len_validate(struct 
hv_24x7_event_data *event,
 
 #define MAX_4K (SIZE_MAX / 4096)
 
-static void create_events_from_catalog(struct attribute ***events_,
+static int create_events_from_catalog(struct attribute ***events_,
struct attribute ***event_descs_,
struct attribute ***event_long_descs_)
 {
@@ -655,19 +655,25 @@ static void create_events_from_catalog(struct attribute 
***events_,
void *event_data, *end;
struct hv_24x7_event_data *event;
struct rb_root ev_uniq = RB_ROOT;
+   int ret = 0;
 
-   if (!page)
+   if (!page) {
+   ret = -ENOMEM;
goto e_out;
+   }
 
hret = h_get_24x7_catalog_page(page, 0, 0);
-   if (hret)
+   if (hret) {
+   ret = -EIO;
goto e_free;
+   }
 
catalog_version_num = be64_to_cpu(page_0->version);
catalog_page_len = be32_to_cpu(page_0->length);
 
if (MAX_4K < catalog_page_len) {
pr_err("invalid page count: %zu\n", catalog_page_len);
+   ret = -EIO;
goto e_free;
}
 
@@ -686,6 +692,7 @@ static void create_events_from_catalog(struct attribute 
***events_,
|| (MAX_4K - event_data_offs < event_data_len)) {
pr_err("invalid event data offs %zu and/or len %zu\n",
event_data_offs, event_data_len);
+   ret = -EIO;
goto e_free;
}
 
@@ -694,12 +701,14 @@ static void create_events_from_catalog(struct attribute 
***events_,
event_data_offs,
event_data_offs + event_data_len,
catalog_page_len);
+   ret = -EIO;
goto e_free;
}
 
if (SIZE_MAX / MAX_EVENTS_PER_EVENT_DATA - 1 < event_entry_count) {
pr_err("event_entry_count %zu is invalid\n",
event_entry_count);
+   ret = -EIO;
goto e_free;
}
 
@@ -712,6 +721,7 @@ static void create_events_from_catalog(struct attribute 
***events_,
event_data = vmalloc(event_data_bytes);
if (!event_data) {
pr_err("could not allocate event data\n");
+   ret = -ENOMEM;
goto e_free;
}
 
@@ -731,6 +741,7 @@ static void create_events_from_catalog(struct attribute 
***events_,
if (hret) {
pr_err("failed to get event data in page %zu\n",
i + event_data_offs);
+   ret = -EIO;
goto e_event_data;
}
}
@@ -778,18 +789,24 @@ static void create_events_from_catalog(struct attribute 
***events_,
event_idx_last, event_entry_count, junk_events);
 
events = kmalloc_array(attr_max + 1, sizeof(*events), GFP_KERNEL);
-   if (!events)
+   if (!events) {
+   ret = -ENOMEM;
goto e_event_data;
+   }
 
event_descs = kmalloc_array(event_idx + 1, sizeof(*event_descs),
GFP_KERNEL);
-   if (!event_descs)
+   if (!event_descs) {
+   ret = -ENOMEM;
goto e_event_attrs;
+   }
 
event_long_descs = kmalloc_array(event_idx + 1,
sizeof(*event_long_descs), GFP_KERNEL);
-   if (!event_long_descs)
+   if (!event_long_descs) {
+   ret = -ENOMEM;
goto e_event_descs;
+   }
 
/* Iterate over the catalog filling in the attribute vector */
for (junk_events = 0, event_attr_ct = 0, desc_ct = 0, long_desc_ct = 0,
@@ -843,7 +860,7 @@ static void create_events_from_catalog(struct attribute 
***events_,
*events_ = events;
*event_descs_ = event_descs;
*event_long_descs_ = event_long_descs;
-   return;
+   return 0;
 
 e_event_descs:
kfree(event_descs);
@@ -857,6 +874,7 @@ e_out:
*events_ = NULL;
*event_descs_ = NULL;
*event_long_descs_ = NULL;
+   return ret;
 }
 
 static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
@@ -1219,10 +1237,13 @@ static int hv_24x7_init(void
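
The conversion follows the usual goto-unwind error pattern; a minimal
standalone sketch of the same shape (read_catalog_page() here is a
hypothetical stub standing in for the hcall):

#include <errno.h>
#include <stdlib.h>

static int read_catalog_page(void *page) { (void)page; return 0; } /* stub */

static int create_events(char **events_)
{
    int ret = 0;
    char *page = malloc(4096);

    if (!page) {
        ret = -ENOMEM;              /* allocation failures -> ENOMEM */
        goto e_out;
    }
    if (read_catalog_page(page)) {
        ret = -EIO;                 /* hypervisor errors -> EIO, visible in dmesg */
        goto e_free;
    }

    *events_ = page;
    return 0;

e_free:
    free(page);
e_out:
    *events_ = NULL;
    return ret;                     /* the initcall can now fail on this */
}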

[RFC PATCH powerpc] perf/hv-24x7 set the attr group to NULL if events failed to be initialized

2015-02-15 Thread Li Zhong
sysfs_create_groups() creates groups one by one in the attr_groups array
until a NULL entry is encountered. But if an error is seen, it stops
and removes all the groups already created:
for (i = 0; groups[i]; i++) {
error = sysfs_create_group(kobj, groups[i]);
if (error) {
while (--i >= 0)
sysfs_remove_group(kobj, groups[i]);
break;
}
}

And for the three event groups of 24x7, if they are not supported,
according to the above logic, the format and interface groups get
removed as well because of the error.

This patch moves the three event groups to the end of the attr groups
array, and if create_events_from_catalog() fails to set their attributes,
we set them to NULL in attr_groups.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9445a82..1e433f7 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -637,7 +637,7 @@ static ssize_t catalog_event_len_validate(struct 
hv_24x7_event_data *event,
 
 #define MAX_4K (SIZE_MAX / 4096)
 
-static void create_events_from_catalog(struct attribute ***events_,
+static int create_events_from_catalog(struct attribute ***events_,
struct attribute ***event_descs_,
struct attribute ***event_long_descs_)
 {
@@ -843,7 +843,7 @@ static void create_events_from_catalog(struct attribute 
***events_,
*events_ = events;
*event_descs_ = event_descs;
*event_long_descs_ = event_long_descs;
-   return;
+   return 0;
 
 e_event_descs:
kfree(event_descs);
@@ -857,6 +857,7 @@ e_out:
*events_ = NULL;
*event_descs_ = NULL;
*event_long_descs_ = NULL;
+   return -1;
 }
 
 static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
@@ -967,13 +968,25 @@ static struct attribute_group if_group = {
.attrs = if_attrs,
 };
 
+enum GROUP_INDEX {
+   FORMAT,
+   IF,
+   EVENT,
+   EVENT_DESC,
+   EVENT_LONG_DESC,
+   MAX
+};
+
+/*
+ * keep the attr group whose attr is set in init (may fail)
+ * at the end
+ */
 static const struct attribute_group *attr_groups[] = {
-   &format_group,
-   &event_group,
-   &event_desc_group,
-   &event_long_desc_group,
-   &if_group,
-   NULL,
+   [FORMAT] = &format_group,
+   [IF] = &if_group,
+   [EVENT] = &event_group,
+   [EVENT_DESC] = &event_desc_group,
+   [EVENT_LONG_DESC] = &event_long_desc_group
 };
 
 DEFINE_PER_CPU(char, hv_24x7_reqb[4096]) __aligned(4096);
@@ -1219,10 +1232,16 @@ static int hv_24x7_init(void)
/* sampling not supported */
h_24x7_pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
 
-   create_events_from_catalog(&event_group.attrs,
+   r = create_events_from_catalog(&event_group.attrs,
   &event_desc_group.attrs,
   &event_long_desc_group.attrs);
 
+   if (r) {
+   h_24x7_pmu.attr_groups[EVENT] =
+   h_24x7_pmu.attr_groups[EVENT_DESC] =
+   h_24x7_pmu.attr_groups[EVENT_LONG_DESC] = 0;
+   }
+
r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
if (r)
return r;
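
Why keeping the maybe-failing groups at the end works: registration walks the
array until the first NULL, so NULL-ing the tail entries truncates it. A small
standalone demo, with the group contents replaced by plain strings:

#include <stdio.h>

enum { FORMAT, IF, EVENT, EVENT_DESC, EVENT_LONG_DESC, MAX };

static const char *attr_groups[MAX + 1] = {
    [FORMAT]          = "format",
    [IF]              = "interface",
    [EVENT]           = "events",
    [EVENT_DESC]      = "event_descs",
    [EVENT_LONG_DESC] = "event_long_descs",
    /* [MAX] is implicitly NULL: the terminator */
};

int main(void)
{
    int events_failed = 1;          /* pretend the catalog parse failed */

    if (events_failed)              /* array now effectively ends at IF */
        attr_groups[EVENT] = attr_groups[EVENT_DESC] =
            attr_groups[EVENT_LONG_DESC] = NULL;

    /* same walk sysfs_create_groups() does: stop at the first NULL */
    for (int i = 0; attr_groups[i]; i++)
        printf("registering %s\n", attr_groups[i]);
    return 0;
}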





Re: [powerpc] init nvram_pstore_info's buf_lock

2014-11-17 Thread Li Zhong
On Tue, 2014-11-18 at 15:33 +1100, Michael Ellerman wrote:
 On Mon, 2014-17-11 at 02:52:30 UTC, Li Zhong wrote:
  It seems nvram_pstore_info's buf_lock is not initialized before
  registering, which causes some strange behavior when trying to obtain
  the lock during kdump process.
 
 What kind of strange behaviour? Does it still work and just print a warning?
 
 It's static, so I'd expect it to be unlocked by default.

Yes, you are right. It still works. 

I saw it on a UP configuration: I noticed the console stopped for a couple
of seconds, then a lockup suspected warning was printed out, but then it
continued to run. 

After some further checking, it seems to be working as designed for the UP
spinlock debug code, per the comments in spinlock_up.h:

 * In the debug case, 1 means unlocked, 0 means locked. (the values
 * are inverted, to catch initialization bugs)

So the trylock fails, and a lockup is reported, but then arch_spin_lock()
passes. 

Thanks, Zhong

 
 cheers
 



[PATCH powerpc] init nvram_pstore_info's buf_lock

2014-11-16 Thread Li Zhong
It seems nvram_pstore_info's buf_lock is not initialized before
registering, which causes some strange behavior when trying to obtain
the lock during kdump process.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/nvram.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/nvram.c 
b/arch/powerpc/platforms/pseries/nvram.c
index 11a3b61..054a0ed 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -715,6 +715,8 @@ static int nvram_pstore_init(void)
nvram_pstore_info.buf = oops_data;
nvram_pstore_info.bufsize = oops_data_sz;
 
+   spin_lock_init(&nvram_pstore_info.buf_lock);
+
rc = pstore_register(&nvram_pstore_info);
if (rc != 0)
pr_err("nvram: pstore_register() failed, defaults to 
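
The bug pattern, as a standalone sketch: a lock embedded in a static struct
is just zero bytes, which is not necessarily a valid initialized lock (the
UP debug spinlock discussed above deliberately inverts the values to catch
this). The names here are stand-ins for the pstore ones:

#include <pthread.h>
#include <stddef.h>

struct backend_info {
    char   *buf;
    size_t  bufsize;
    pthread_spinlock_t buf_lock;    /* stand-in for the kernel spinlock_t */
};

static char oops_data[1024];
static struct backend_info info;    /* static: all-zero, including the lock */

static int backend_init(void)
{
    info.buf = oops_data;
    info.bufsize = sizeof(oops_data);
    /* the fix: explicitly init the lock member before registering */
    pthread_spin_init(&info.buf_lock, PTHREAD_PROCESS_PRIVATE);
    return 0;                        /* then hand &info to the framework */
}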



Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-10-15 Thread Li Zhong
On Tue, 2014-10-14 at 15:35 +1100, Michael Ellerman wrote:
 On Tue, 2014-10-14 at 10:39 +0800, Li Zhong wrote:
  On Tue, 2014-10-07 at 08:33 -0700, Nishanth Aravamudan wrote:
   On 07.10.2014 [17:28:38 +1100], Michael Ellerman wrote:
On Fri, 2014-10-03 at 16:26 -0700, Nishanth Aravamudan wrote:
 On 03.10.2014 [10:50:20 +1000], Michael Ellerman wrote:
  On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
   Ben  Michael,
   
   What's the status of these patches?
  
  Been in my next for a week :)
  
  https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next
 
 Ah ok, thanks -- I wasn't following your tree, my fault. 

Not really your fault, I hadn't announced my trees existence :)

 Do we want these to go back to 3.17-stable, as they fix some annoying 
 splats
 during boot (non-fatal afaict, though)?

Up to you really, I don't know how often/bad they were. I haven't added 
CC
stable tags to the commits, so if you want them in stable you should 
send them
explicitly.
   
   I think they occur every boot, unconditionally, on pseries. Doesn't
   prevent boot, just really noisy. I think it'd be good to get them into
   -stable.
   
   Li Zhong, can you push them once they get sent upstream?
  
  I guess I only need to send the first two patches to stable? 
 
 Probably. It's not clear from the changelog how serious a problem it fixes.
 
 See Documentation/stable_kernel_rules.txt

OK, I'll send the first two to stable. 
Thanks, Zhong

 
 cheers
 
 
 



Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-10-13 Thread Li Zhong
On Tue, 2014-10-07 at 08:33 -0700, Nishanth Aravamudan wrote:
 On 07.10.2014 [17:28:38 +1100], Michael Ellerman wrote:
  On Fri, 2014-10-03 at 16:26 -0700, Nishanth Aravamudan wrote:
   On 03.10.2014 [10:50:20 +1000], Michael Ellerman wrote:
On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
 Ben  Michael,
 
 What's the status of these patches?

Been in my next for a week :)

https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next
   
   Ah ok, thanks -- I wasn't following your tree, my fault. 
  
  Not really your fault, I hadn't announced my trees existence :)
  
   Do we want these to go back to 3.17-stable, as they fix some annoying 
   splats
   during boot (non-fatal afaict, though)?
  
  Up to you really, I don't know how often/bad they were. I haven't added CC
  stable tags to the commits, so if you want them in stable you should send 
  them
  explicitly.
 
 I think they occur every boot, unconditionally, on pseries. Doesn't
 prevent boot, just really noisy. I think it'd be good to get them into
 -stable.
 
 Li Zhong, can you push them once they get sent upstream?

I guess I only need to send the first two patches to stable? 

Thanks, Zhong
 
 Thanks,
 Nish



Re: [PATCH] powerpc: Reimplement __get_SP() as a function not a define

2014-10-08 Thread Li Zhong
On 三, 2014-10-01 at 15:10 +1000, Anton Blanchard wrote:
 Li Zhong points out an issue with our current __get_SP()
 implementation. If ftrace function tracing is enabled (ie -pg
 profiling using _mcount) we spill a stack frame on 64bit all the
 time.
 
 If a function calls __get_SP() and later calls a function that is
 tail call optimised, we will pop the stack frame and the value
 returned by __get_SP() is no longer valid. An example from Li can
 be found in save_stack_trace - save_context_stack:
 
 c00432c0 .save_stack_trace:
 c00432c0:   mflrr0
 c00432c4:   std r0,16(r1)
 c00432c8:   stdur1,-128(r1) -- stack frame for _mcount
 c00432cc:   std r3,112(r1)
 c00432d0:   bl  ._mcount
 c00432d4:   nop
 
 c00432d8:   mr  r4,r1 -- __get_SP()
 
 c00432dc:   ld  r5,632(r13)
 c00432e0:   ld  r3,112(r1)
 c00432e4:   li  r6,1
 
 c00432e8:   addir1,r1,128 -- pop stack frame
 
 c00432ec:   ld  r0,16(r1)
 c00432f0:   mtlrr0
 c00432f4:   b   .save_context_stack -- tail call optimized
 
 save_context_stack ends up with a stack pointer below the current
 one, and it is likely to be scribbled over.
 
 Fix this by making __get_SP() a function which returns the
 callers stack frame. Also replace inline assembly which grabs
 the stack pointer in save_stack_trace and show_stack with
 __get_SP().
 
 Reported-by: Li Zhong <zh...@linux.vnet.ibm.com>
 Signed-off-by: Anton Blanchard <an...@samba.org>
 ---
  arch/powerpc/include/asm/reg.h   | 3 +--
  arch/powerpc/kernel/misc.S   | 4 
  arch/powerpc/kernel/process.c| 2 +-
  arch/powerpc/kernel/stacktrace.c | 2 +-
  4 files changed, 7 insertions(+), 4 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
 index 0c05059..0f973c0 100644
 --- a/arch/powerpc/include/asm/reg.h
 +++ b/arch/powerpc/include/asm/reg.h
 @@ -1264,8 +1264,7 @@ static inline unsigned long mfvtb (void)
  
  #define proc_trap()  asm volatile("trap")
  
 -#define __get_SP()   ({unsigned long sp; \
 - asm volatile("mr %0,1": "=r" (sp)); sp;})
 +extern unsigned long __get_SP(void);

It seems that some module code is using __get_SP, e.g. xfs in the
example below:
ERROR: .__get_SP [fs/xfs/xfs.ko] undefined!

Maybe we need to export this symbol in arch/powerpc/kernel/ppc_ksyms.c? 

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 48d17d6f..eebd4e4 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -207,3 +207,5 @@ EXPORT_SYMBOL_GPL(mmu_psize_defs);
 #ifdef CONFIG_EPAPR_PARAVIRT
 EXPORT_SYMBOL(epapr_hypercall_start);
 #endif
+
+EXPORT_SYMBOL(__get_SP);

With the above compile error fixed, this patch solved the SP issue I saw, so 

Tested-by: Li Zhong <zh...@linux.vnet.ibm.com>

  
  extern unsigned long scom970_read(unsigned int address);
  extern void scom970_write(unsigned int address, unsigned long value);
 diff --git a/arch/powerpc/kernel/misc.S b/arch/powerpc/kernel/misc.S
 index 7ce26d4..120deb7 100644
 --- a/arch/powerpc/kernel/misc.S
 +++ b/arch/powerpc/kernel/misc.S
 @@ -114,3 +114,7 @@ _GLOBAL(longjmp)
   mtlrr0
   mr  r3,r4
   blr
 +
 +_GLOBAL(__get_SP)
 + PPC_LL  r3,0(r1)
 + blr
 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
 index aa1df89..3cc6439 100644
 --- a/arch/powerpc/kernel/process.c
 +++ b/arch/powerpc/kernel/process.c
 @@ -1545,7 +1545,7 @@ void show_stack(struct task_struct *tsk, unsigned long 
 *stack)
   tsk = current;
   if (sp == 0) {
   if (tsk == current)
 - asm("mr %0,1" : "=r" (sp));
 + sp = __get_SP();
   else
   sp = tsk->thread.ksp;
   }
 diff --git a/arch/powerpc/kernel/stacktrace.c 
 b/arch/powerpc/kernel/stacktrace.c
 index 3d30ef1..7f65bae 100644
 --- a/arch/powerpc/kernel/stacktrace.c
 +++ b/arch/powerpc/kernel/stacktrace.c
 @@ -50,7 +50,7 @@ void save_stack_trace(struct stack_trace *trace)
  {
   unsigned long sp;
  
 - asm("mr %0,1" : "=r" (sp));
 + sp = __get_SP();
  
   save_context_stack(trace, sp, current, 1);
  }
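
A userspace illustration of the distinction, as a sketch using GCC builtins
(not the kernel implementation; __builtin_frame_address with a nonzero level
is only reliable when frame pointers are kept, e.g. -fno-omit-frame-pointer):
an out-of-line helper returns its caller's frame, which is still valid in the
caller, whereas an inlined stack-pointer read can be invalidated once the
frame it was taken in is popped by a tail call.

#include <stdio.h>

/* like the new __get_SP(): out of line, returning the *caller's* frame,
 * the moral equivalent of "PPC_LL r3,0(r1); blr" reading the back chain */
__attribute__((noinline))
static void *get_sp(void)
{
    return __builtin_frame_address(1);
}

int main(void)
{
    printf("caller frame: %p\n", get_sp());
    return 0;
}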



Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-10-07 Thread Li Zhong
On Tue, 2014-10-07 at 08:33 -0700, Nishanth Aravamudan wrote:
 On 07.10.2014 [17:28:38 +1100], Michael Ellerman wrote:
  On Fri, 2014-10-03 at 16:26 -0700, Nishanth Aravamudan wrote:
   On 03.10.2014 [10:50:20 +1000], Michael Ellerman wrote:
On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
 Ben  Michael,
 
 What's the status of these patches?

Been in my next for a week :)

https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=next
   
   Ah ok, thanks -- I wasn't following your tree, my fault. 
  
  Not really your fault, I hadn't announced my trees existence :)
  
   Do we want these to go back to 3.17-stable, as they fix some annoying 
   splats
   during boot (non-fatal afaict, though)?
  
  Up to you really, I don't know how often/bad they were. I haven't added CC
  stable tags to the commits, so if you want them in stable you should send 
  them
  explicitly.
 
 I think they occur every boot, unconditionally, on pseries. Doesn't
 prevent boot, just really noisy. I think it'd be good to get them into
 -stable.
 
 Li Zhong, can you push them once they get sent upstream?

Ok, I will send these patches to the stable mailing list after they are merged.

Thanks, Zhong
 
 Thanks,
 Nish



Re: [PATCH v2] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-08-27 Thread Li Zhong
On Wed, 2014-08-27 at 09:41 +0800, Li Zhong wrote:
 On Tue, 2014-08-26 at 08:10 -0500, Nathan Fontenot wrote:
  On 08/25/2014 02:22 AM, Li Zhong wrote:
   With commit 2fabf084b, during boottime, cpu_numa_callback() is called
   earlier (before they are online) for each cpu, and verify_cpu_node_mapping()
   uses cpu_to_node() to check whether siblings are in the same node. 
   
   It skips the checking for siblings that are not online yet. So the only
   check done here is for the bootcpu, which is online at that time. But
   the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
   will be set up in smp_prepare_cpus()).
   
   So I saw something like the following reported:
   [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
   node!
   
   As we don't actually do the checking during this early stage, maybe
   we could directly call numa_setup_cpu() in do_init_bootmem().
   
   Also, as Nish suggested, here it's better to use present cpu mask
   instead of possible mask to avoid warning in numa_setup_cpu().
   
   Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
   ---
   diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
   index d7737a5..3a9061e 100644
   --- a/arch/powerpc/mm/numa.c
   +++ b/arch/powerpc/mm/numa.c
   @@ -1127,9 +1127,8 @@ void __init do_init_bootmem(void)
  * even before we online them, so that we can use cpu_to_{node,mem}
  * early in boot, cf. smp_prepare_cpus().
  */
   - for_each_possible_cpu(cpu) {
   - cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
   -   (void *)(unsigned long)cpu);
   + for_each_present_cpu(cpu) {
   + numa_setup_cpu((unsigned long)cpu);
 }
}

  
  I am getting the following error on my system booting with this patch.
  
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.0-202712-g9e81330-dirty #42
  task: c000fea4 ti: c000fea8 task.ti: c000fea8
  NIP: c01afad8 LR: c0193b68 CTR: 
  REGS: c000fea839e0 TRAP: 0300   Not tainted  
  (3.16.0-202712-g9e81330-dirty)
  MSR: 80019033 SF,EE,ME,IR,DR,RI,LE  CR: 2400  XER: 2004
  CFAR: c00084d4 DAR: 1690 DSISR: 4000 SOFTE: 1 
  GPR00: c0b6db9c c000fea83c60 c0cd0628 1688 
  GPR04: 0001  c000fea83c80 0990 
  GPR08: c0d531e0 c0d66218 c0d60628  
  GPR12:  cec6 c000bc88  
  GPR16:     
  GPR20:   c0c21b88 c0c03738 
  GPR24: c0c03638 c0d24b10 c0c03638 c0c03738 
  GPR28: 0080 0080 c0d208e8 0010 
  NIP [c01afad8] next_zones_zonelist+0x8/0xa0
  LR [c0193b68] local_memory_node+0x38/0x60
  Call Trace:
  [c000fea83c60] [c000fea83c90] 0xc000fea83c90 (unreliable)
  [c000fea83c90] [c0b6db9c] smp_prepare_cpus+0x16c/0x278
  [c000fea83d00] [c0b64098] kernel_init_freeable+0x150/0x340
  [c000fea83dc0] [c000bca4] kernel_init+0x24/0x140
  [c000fea83e30] [c0009560] ret_from_kernel_thread+0x5c/0x7c
  Instruction dump:
  e9230038 39490f00 7fa35040 409c000c 38630780 4e800020 7d234b78 4b64 
  6000 6042 2c25 40c2004c 81230008 7f892040 419d0014 4830 
  ---[ end trace cb88537fdc8fa200 ]---
  
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
  
  I think the loop needs to go back to initializing all possible cpus instead
  of only the present cpus. We can add a check for present cpus in
  numa_setup_cpu() to avoid printing the WARN_ON() for cpus that are not
  present, something like the following...
 
 Ah, yes, seems the panic was caused by smp_prepare_cpus() using
 uninitialized numa_cpu_lookup_table for cpus which are possible but not
 present during boottime. 
 
 However, with the following change, it seems those cpus will be set to node 0
 at boot time, and not be changed after they become present, because of
 the following check in numa_setup_cpu(): 
 if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
 map_cpu_to_node(lcpu, nid);
 return nid;
 }
 
 Maybe we could change smp_prepare_cpus() to set numa information for
 present cpus instead? 
 
 And for those possible, !present cpus, we could do the setup after they
 are started. 

Hi, Nathan, Nish,

I did some draft code based on the above approach, and will send it out.
Could you please help review it? 

I split the code into separate patches, so each small patch addresses only
one small issue. 

Thanks, Zhong

 
 Thanks, Zhong
 
  
  diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
  index d7737a5..b827f2e 100644
  --- a/arch/powerpc/mm/numa.c
  +++ b/arch/powerpc

[RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-08-27 Thread Li Zhong
With commit 2fabf084b6ad (powerpc: reorder per-cpu NUMA information's
initialization), during boottime, cpu_numa_callback() is called
earlier (before they are online) for each cpu, and verify_cpu_node_mapping()
uses cpu_to_node() to check whether siblings are in the same node.

It skips the checking for siblings that are not online yet. So the only
check done here is for the bootcpu, which is online at that time. But
the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
will be set up in smp_prepare_cpus()).

So I saw something like the following reported:
[0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
node!

As we don't actually do the checking during this early stage, maybe
we could directly call numa_setup_cpu() in do_init_bootmem().

Cc: Nishanth Aravamudan <n...@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nf...@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d7737a5..9918c02 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void)
 * early in boot, cf. smp_prepare_cpus().
 */
for_each_possible_cpu(cpu) {
-   cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
- (void *)(unsigned long)cpu);
+   numa_setup_cpu((unsigned long)cpu);
}
 }
 
-- 
1.9.1


[RFC PATCH v3 2/3] powerpc: Only set numa node information for present cpus at boottime

2014-08-27 Thread Li Zhong
As Nish suggested, it makes more sense to init the numa node information
for present cpus at boottime, which could also avoid WARN_ON(1) in
numa_setup_cpu().

With this change, we also need to change the smp_prepare_cpus() to set up
numa information only on present cpus.

For those possible, but not present cpus, their numa information
will be set up after they are started, as the original code did before commit
2fabf084b6ad.

Cc: Nishanth Aravamudan <n...@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nf...@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 10 --
 arch/powerpc/mm/numa.c|  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a0738af..dc0e774 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -379,8 +379,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
/*
 * numa_node_id() works after this.
 */
-   set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
-   set_cpu_numa_mem(cpu, 
local_memory_node(numa_cpu_lookup_table[cpu]));
+   if (cpu_present(cpu)) {
+   set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
+   set_cpu_numa_mem(cpu,
+   local_memory_node(numa_cpu_lookup_table[cpu]));
+   }
}
 
cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
@@ -728,6 +731,9 @@ void start_secondary(void *unused)
}
traverse_core_siblings(cpu, true);
 
+   set_numa_node(numa_cpu_lookup_table[cpu]);
+   set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
+
smp_wmb();
notify_cpu_starting(cpu);
set_cpu_online(cpu, true);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9918c02..3a9061e 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1127,7 +1127,7 @@ void __init do_init_bootmem(void)
 * even before we online them, so that we can use cpu_to_{node,mem}
 * early in boot, cf. smp_prepare_cpus().
 */
-   for_each_possible_cpu(cpu) {
+   for_each_present_cpu(cpu) {
numa_setup_cpu((unsigned long)cpu);
}
 }
-- 
1.9.1
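
The split the patches make, as a standalone C sketch; the mask, table and
node lookup below are made-up stand-ins for the kernel's (the range
initializer is GNU C, which the kernel itself uses):

#define NR_CPUS 8

static int numa_cpu_lookup_table[NR_CPUS] = {
    [0 ... NR_CPUS - 1] = -1        /* -1 = "numa info not set up yet" */
};

static int cpu_present(int cpu)  { return cpu < 4; }      /* cpus 4-7 absent */
static int resolve_node(int cpu) { (void)cpu; return 0; } /* device-tree stub */

static void numa_setup_cpu(int cpu)
{
    numa_cpu_lookup_table[cpu] = resolve_node(cpu);
}

/* boot time: set up numa info only for present cpus, as the patched
 * do_init_bootmem()/smp_prepare_cpus() do */
static void boot_time_setup(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_present(cpu))
            numa_setup_cpu(cpu);
}

/* a possible-but-absent cpu keeps -1 until it actually starts, then
 * resolves its node here, as the patched start_secondary() does */
static void start_secondary(int cpu)
{
    if (numa_cpu_lookup_table[cpu] < 0)
        numa_setup_cpu(cpu);
}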


[RFC PATCH v3 3/3] powerpc: some changes in numa_setup_cpu()

2014-08-27 Thread Li Zhong
This patch changes some error handling logic in numa_setup_cpu() for the
case when the cpu node is not found, so:

if the cpu is possible, but not present, -1 is kept in numa_cpu_lookup_table,
so later, if the cpu is added, we could set correct numa information for it.

if the cpu is present, then we set the first online node in
numa_cpu_lookup_table instead of 0 (in case 0 might not be an online node?)

Cc: Nishanth Aravamudan <n...@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nf...@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3a9061e..ec32d46 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -538,7 +538,7 @@ static int of_drconf_to_nid_single(struct of_drconf_cell 
*drmem,
  */
 static int numa_setup_cpu(unsigned long lcpu)
 {
-   int nid;
+   int nid = -1;
struct device_node *cpu;
 
/*
@@ -555,19 +555,21 @@ static int numa_setup_cpu(unsigned long lcpu)
 
if (!cpu) {
WARN_ON(1);
-   nid = 0;
-   goto out;
+   if (cpu_present(lcpu))
+   goto out_present;
+   else
+   goto out;
}
 
nid = of_node_to_nid_single(cpu);
 
+out_present:
if (nid < 0 || !node_online(nid))
nid = first_online_node;
-out:
-   map_cpu_to_node(lcpu, nid);
 
+   map_cpu_to_node(lcpu, nid);
of_node_put(cpu);
-
+out:
return nid;
 }
 
-- 
1.9.1


Re: [PATCH v2] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-08-26 Thread Li Zhong
On Tue, 2014-08-26 at 08:10 -0500, Nathan Fontenot wrote:
 On 08/25/2014 02:22 AM, Li Zhong wrote:
  With commit 2fabf084b, during boottime, cpu_numa_callback() is called
  earlier (before they are online) for each cpu, and verify_cpu_node_mapping()
  uses cpu_to_node() to check whether siblings are in the same node. 
  
  It skips the checking for siblings that are not online yet. So the only
  check done here is for the bootcpu, which is online at that time. But
  the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
  will be set up in smp_prepare_cpus()).
  
  So I saw something like the following reported:
  [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
  node!
  
  As we don't actually do the checking during this early stage, maybe
  we could directly call numa_setup_cpu() in do_init_bootmem().
  
  Also, as Nish suggested, here it's better to use present cpu mask
  instead of possible mask to avoid warning in numa_setup_cpu().
  
  Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
  ---
  diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
  index d7737a5..3a9061e 100644
  --- a/arch/powerpc/mm/numa.c
  +++ b/arch/powerpc/mm/numa.c
  @@ -1127,9 +1127,8 @@ void __init do_init_bootmem(void)
   * even before we online them, so that we can use cpu_to_{node,mem}
   * early in boot, cf. smp_prepare_cpus().
   */
  -   for_each_possible_cpu(cpu) {
  -   cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
  - (void *)(unsigned long)cpu);
  +   for_each_present_cpu(cpu) {
  +   numa_setup_cpu((unsigned long)cpu);
  }
   }
   
 
 I am getting the following error on my system booting with this patch.
 
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.0-202712-g9e81330-dirty #42
 task: c000fea4 ti: c000fea8 task.ti: c000fea8
 NIP: c01afad8 LR: c0193b68 CTR: 
 REGS: c000fea839e0 TRAP: 0300   Not tainted  
 (3.16.0-202712-g9e81330-dirty)
 MSR: 80019033 SF,EE,ME,IR,DR,RI,LE  CR: 2400  XER: 2004
 CFAR: c00084d4 DAR: 1690 DSISR: 4000 SOFTE: 1 
 GPR00: c0b6db9c c000fea83c60 c0cd0628 1688 
 GPR04: 0001  c000fea83c80 0990 
 GPR08: c0d531e0 c0d66218 c0d60628  
 GPR12:  cec6 c000bc88  
 GPR16:     
 GPR20:   c0c21b88 c0c03738 
 GPR24: c0c03638 c0d24b10 c0c03638 c0c03738 
 GPR28: 0080 0080 c0d208e8 0010 
 NIP [c01afad8] next_zones_zonelist+0x8/0xa0
 LR [c0193b68] local_memory_node+0x38/0x60
 Call Trace:
 [c000fea83c60] [c000fea83c90] 0xc000fea83c90 (unreliable)
 [c000fea83c90] [c0b6db9c] smp_prepare_cpus+0x16c/0x278
 [c000fea83d00] [c0b64098] kernel_init_freeable+0x150/0x340
 [c000fea83dc0] [c000bca4] kernel_init+0x24/0x140
 [c000fea83e30] [c0009560] ret_from_kernel_thread+0x5c/0x7c
 Instruction dump:
 e9230038 39490f00 7fa35040 409c000c 38630780 4e800020 7d234b78 4b64 
 6000 6042 2c25 40c2004c 81230008 7f892040 419d0014 4830 
 ---[ end trace cb88537fdc8fa200 ]---
 
 Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
 
 I think the loop needs to go back to initializing all possible cpus instead of
 only the present cpus. We can add a check for present cpus in numa_setup_cpu()
 to avoid printing the WARN_ON() for cpus that are not present, something like
 the following...

Ah, yes, seems the panic was caused by smp_prepare_cpus() using
uninitialized numa_cpu_lookup_table for cpus which are possible but not
present during boottime. 

However, with the following change, it seems those cpus will be set to node 0
at boot time, and not be changed after they become present, because of
the following check in numa_setup_cpu(): 
if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
map_cpu_to_node(lcpu, nid);
return nid;
}

Maybe we could change smp_prepare_cpus() to set numa information for
present cpus instead? 

And for those possible, !present cpus, we could do the setup after they
are started. 

Thanks, Zhong

 
 diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
 index d7737a5..b827f2e 100644
 --- a/arch/powerpc/mm/numa.c
 +++ b/arch/powerpc/mm/numa.c
 @@ -554,7 +554,8 @@ static int numa_setup_cpu(unsigned long lcpu)
   cpu = of_get_cpu_node(lcpu, NULL);
 
   if (!cpu) {
 - WARN_ON(1);
 + if (cpu_present(lcpu))
 + WARN_ON(1);
   nid = 0;
   goto out;
   }
 @@ -1128,8 +1129,7 @@ void __init do_init_bootmem(void)
* early in boot, cf

Re: [RFC PATCH powerpc] Fix warning reported by verify_cpu_node_mapping()

2014-08-25 Thread Li Zhong
On Fri, 2014-08-22 at 15:04 -0700, Nishanth Aravamudan wrote:
 On 22.08.2014 [10:12:56 +0800], Li Zhong wrote:
  On Thu, 2014-08-21 at 08:45 -0700, Nishanth Aravamudan wrote:
   On 21.08.2014 [16:14:02 +0800], Li Zhong wrote:
With commit 2fabf084b, during boottime, cpu_numa_callback() is called
earlier (before they are online) for each cpu, and verify_cpu_node_mapping()
uses cpu_to_node() to check whether siblings are in the same node. 

It skips the checking for siblings that are not online yet. So the only
check done here is for the bootcpu, which is online at that time. But
the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
will be set up in smp_prepare_cpus()).

So I could see something like the following reported:
[0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
node!
   
   You mean you did see this, right? (as opposed to could based upon code
   inspection or something)
  
  Yes, I did see the warnings. Seems I didn't express it precisely in
  English ...
  
   

As we don't actually do the checking during this early stage, maybe
we could directly call numa_setup_cpu() in do_init_bootmem()?

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
   
   Acked-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com>
  
  Thank you for the review,
  
  Zhong
  
   
---
 arch/powerpc/mm/numa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d7737a5..9918c02 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void)
 * early in boot, cf. smp_prepare_cpus().
 */
for_each_possible_cpu(cpu) {
-   cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
- (void *)(unsigned long)cpu);
+   numa_setup_cpu((unsigned long)cpu);
   
   This is a good change, thanks for catching it. I must have glossed over
   those messages in my testing, my apologies!
 
 Actually, thinking about this more, do you think it makes more sense to
 do:
 
 for_each_present_cpu(cpu) in this loop? That is, at boot, ensure all
 present (but possibly offline) CPUs have their NUMA mapping set up. CPUs
 that aren't present (but are possible) might trigger other warnings,
 right? (e.g., the WARN_ON(1) in numa_setup_cpu)

After reading the code that sets up the cpu masks, I think you are right.
I will send a new version with this fixed. 

Thanks, Zhong

 
 -Nish



[PATCH v2] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-08-25 Thread Li Zhong
With commit 2fabf084b, during boottime, cpu_numa_callback() is called
earlier (before they are online) for each cpu, and verify_cpu_node_mapping()
uses cpu_to_node() to check whether siblings are in the same node. 

It skips the checking for siblings that are not online yet. So the only
check done here is for the bootcpu, which is online at that time. But
the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
will be set up in smp_prepare_cpus()).

So I saw something like the following reported:
[0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
node!

As we don't actually do the checking during this early stage, maybe
we could directly call numa_setup_cpu() in do_init_bootmem().

Also, as Nish suggested, here it's better to use the present cpu mask
instead of the possible mask to avoid a warning in numa_setup_cpu().

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d7737a5..3a9061e 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1127,9 +1127,8 @@ void __init do_init_bootmem(void)
 * even before we online them, so that we can use cpu_to_{node,mem}
 * early in boot, cf. smp_prepare_cpus().
 */
-   for_each_possible_cpu(cpu) {
-   cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
- (void *)(unsigned long)cpu);
+   for_each_present_cpu(cpu) {
+   numa_setup_cpu((unsigned long)cpu);
}
 }
 



[RFC PATCH powerpc] Fix warning reported by verify_cpu_node_mapping()

2014-08-21 Thread Li Zhong
With commit 2fabf084b, during boottime, cpu_numa_callback() is called
earlier (before they are online) for each cpu, and verify_cpu_node_mapping()
uses cpu_to_node() to check whether siblings are in the same node. 

It skips the checking for siblings that are not online yet. So the only
check done here is for the bootcpu, which is online at that time. But
the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
will be set up in smp_prepare_cpus()).

So I could see something like the following reported:
[0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
node!

As we don't actually do the checking during this early stage, maybe
we could directly call numa_setup_cpu() in do_init_bootmem()?

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d7737a5..9918c02 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void)
 * early in boot, cf. smp_prepare_cpus().
 */
for_each_possible_cpu(cpu) {
-   cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
- (void *)(unsigned long)cpu);
+   numa_setup_cpu((unsigned long)cpu);
}
 }
 






Re: [RFC PATCH powerpc] Fix warning reported by verify_cpu_node_mapping()

2014-08-21 Thread Li Zhong
On Thu, 2014-08-21 at 08:45 -0700, Nishanth Aravamudan wrote:
 On 21.08.2014 [16:14:02 +0800], Li Zhong wrote:
  With commit 2fabf084b, during boottime, cpu_numa_callback() is called
  earlier (before they are online) for each cpu, and verify_cpu_node_mapping()
  uses cpu_to_node() to check whether siblings are in the same node. 
  
  It skips the checking for siblings that are not online yet. So the only
  check done here is for the bootcpu, which is online at that time. But
  the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
  will be set up in smp_prepare_cpus()).
  
  So I could see something like the following reported:
  [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
  node!
 
 You mean you did see this, right? (as opposed to could based upon code
 inspection or something)

Yes, I did see the warnings. Seems I didn't express it precisely in
English ...

 
  
  As we don't actually do the checking during this early stage, maybe
  we could directly call numa_setup_cpu() in do_init_bootmem()?
  
  Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
 
 Acked-by: Nishanth Aravamudan <n...@linux.vnet.ibm.com>

Thank you for the review,

Zhong

 
  ---
   arch/powerpc/mm/numa.c | 3 +--
   1 file changed, 1 insertion(+), 2 deletions(-)
  
  diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
  index d7737a5..9918c02 100644
  --- a/arch/powerpc/mm/numa.c
  +++ b/arch/powerpc/mm/numa.c
  @@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void)
   * early in boot, cf. smp_prepare_cpus().
   */
  for_each_possible_cpu(cpu) {
  -   cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
  - (void *)(unsigned long)cpu);
  +   numa_setup_cpu((unsigned long)cpu);
 
 This is a good change, thanks for catching it. I must have glossed over
 those messages in my testing, my apologies!
 
 -Nish




[PATCH powerpc] use machine_subsys_initcall() for opal_hmi_handler_init()

2014-08-12 Thread Li Zhong
As opal_message_init() uses machine_early_initcall(powernv, ...), and
opal_hmi_handler_init() depends on that early initcall, it also needs to
use machine_* so that the machine_id is checked.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-hmi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c 
b/arch/powerpc/platforms/powernv/opal-hmi.c
index 97ac8dc..5e1ed15 100644
--- a/arch/powerpc/platforms/powernv/opal-hmi.c
+++ b/arch/powerpc/platforms/powernv/opal-hmi.c
@@ -28,6 +28,7 @@
 
 #include <asm/opal.h>
 #include <asm/cputable.h>
+#include <asm/machdep.h>
 
 static int opal_hmi_handler_nb_init;
 struct OpalHmiEvtNode {
@@ -185,4 +186,4 @@ static int __init opal_hmi_handler_init(void)
}
return 0;
 }
-subsys_initcall(opal_hmi_handler_init);
+machine_subsys_initcall(powernv, opal_hmi_handler_init);



[PATCH powerpc] use _GLOBAL_TOC for memmove

2014-07-21 Thread Li Zhong
memmove may be called from module code, copy_pages() (btrfs), and it may
call memcpy, which may call back into C code, so it needs to use
_GLOBAL_TOC to set up r2 correctly. 

This fixes the following error when I tried to boot an LE guest:

Vector: 300 (Data Access) at [c00073f97210]
pc: c0015004: enable_kernel_altivec+0x24/0x80
lr: c0058fbc: enter_vmx_copy+0x3c/0x60
sp: c00073f97490
   msr: 82009033
   dar: d1d50170
 dsisr: 4000
  current = 0xc000734c
  paca= 0xcfff   softe: 0irq_happened: 0x01
pid   = 815, comm = mktemp
enter ? for help
[c00073f974f0] c0058fbc enter_vmx_copy+0x3c/0x60
[c00073f97510] c0057d34 memcpy_power7+0x274/0x840
[c00073f97610] d1c3179c copy_pages+0xfc/0x110 [btrfs]
[c00073f97660] d1c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
[c00073f97700] d1be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
[c00073f97820] d1be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
[c00073f97890] d1bfed30 insert_with_overflow+0x70/0x180 [btrfs]
[c00073f97900] d1bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
[c00073f979a0] d1c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
[c00073f97a40] d1c20e94 btrfs_create+0x204/0x270 [btrfs]
[c00073f97b00] c026d438 vfs_create+0x178/0x210
[c00073f97b50] c0270a70 do_last+0x9f0/0xe90
[c00073f97c20] c0271010 path_openat+0x100/0x810
[c00073f97ce0] c0272ea8 do_filp_open+0x58/0xd0
[c00073f97dc0] c025ade8 do_sys_open+0x1b8/0x300
[c00073f97e30] c000a008 syscall_exit+0x0/0x7c
--- Exception: c01 (System Call) at 3fffaabe2854

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/lib/mem_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
index 0738f96..43435c6 100644
--- a/arch/powerpc/lib/mem_64.S
+++ b/arch/powerpc/lib/mem_64.S
@@ -77,7 +77,7 @@ _GLOBAL(memset)
stb r4,0(r6)
blr
 
-_GLOBAL(memmove)
+_GLOBAL_TOC(memmove)
cmplw   0,r3,r4
bgt backwards_memcpy
b   memcpy



[PATCH powerpc] Fix parameter restoring issue in commit 752a6422f

2014-06-24 Thread Li Zhong
In commit 752a6422f, a new stack frame is created for parameters.

However, r1 is added back a little too early, so r3 and r4 are
restored from the wrong place, which could cause the following error
during boot:

Querying for OPAL presence...  there !
DEFAULT CATCH!, exception-handler=fff00700 
at   %SRR0: 04223058   %SRR1: 80081002 
Open Firmware exception handler entered from non-OF code

Client's Fix Pt Regs:
 00 04223054 04223020 04fbe838 0002
 04 28002024 04fbe838 0427e838 04222f20
 08   04222f20 1002
 0c a001  01a3fd20 040eb170
 10 040eb628 040eb368 fffd 01a3fd20
 14 01b37f00 0f34 00cc 0f34
 18 040ebb08 0358 040eb128 04285920
 1c 01a3fd60 040eb100 7c0802a6f8010010 f821ff914b91ebd1
Special Regs:
%IV: 0700 %CR: 28002022%XER:   %DSISR: 4200 
  %SRR0: 04223058   %SRR1: 80081002 
%LR: 04223054%CTR:  
   %DAR: 01a3fcf00020b4ac 
Virtual PID = 0 
 ok

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
diff --git a/arch/powerpc/platforms/powernv/opal-takeover.S 
b/arch/powerpc/platforms/powernv/opal-takeover.S
index 11a3169..9093540 100644
--- a/arch/powerpc/platforms/powernv/opal-takeover.S
+++ b/arch/powerpc/platforms/powernv/opal-takeover.S
@@ -5,6 +5,7 @@
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
+B
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  */
@@ -27,11 +28,11 @@ _GLOBAL(opal_query_takeover)
li  r3,H_HAL_TAKEOVER
li  r4,H_HAL_TAKEOVER_QUERY_MAGIC
HVSC
-   addi    r1,r1,STACKFRAMESIZE
ld  r10,STK_PARAM(R3)(r1)
std r4,0(r10)
ld  r10,STK_PARAM(R4)(r1)
std r5,0(r10)
+   addi    r1,r1,STACKFRAMESIZE
lwz r0,8(r1)
mtcrf   0xff,r0
blr


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH powerpc] Fix parameter restoring issue in commit 752a6422f

2014-06-24 Thread Li Zhong
DaYu just reminded me that Michael already had a patch removing all the
related code. 

Please ignore this patch.

Thanks, Zhong

On Wed, 2014-06-25 at 12:00 +0800, Li Zhong wrote:
> In commit 752a6422f, a new stack frame is created for parameters.
> 
> However, r1 is added back a little too early, so r3 and r4 are
> restored from the wrong place, which could cause the following error
> during boot:
> 
> Querying for OPAL presence...  there !
> DEFAULT CATCH!, exception-handler=fff00700 
> at   %SRR0: 04223058   %SRR1: 80081002 
> Open Firmware exception handler entered from non-OF code
> 
> Client's Fix Pt Regs:
>  00 04223054 04223020 04fbe838 0002
>  04 28002024 04fbe838 0427e838 04222f20
>  08   04222f20 1002
>  0c a001  01a3fd20 040eb170
>  10 040eb628 040eb368 fffd 01a3fd20
>  14 01b37f00 0f34 00cc 0f34
>  18 040ebb08 0358 040eb128 04285920
>  1c 01a3fd60 040eb100 7c0802a6f8010010 f821ff914b91ebd1
> Special Regs:
> %IV: 0700 %CR: 28002022%XER:   %DSISR: 4200 
>   %SRR0: 04223058   %SRR1: 80081002 
> %LR: 04223054%CTR:  
>    %DAR: 01a3fcf00020b4ac 
> Virtual PID = 0 
>  ok
> 
> Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
> ---
> diff --git a/arch/powerpc/platforms/powernv/opal-takeover.S 
> b/arch/powerpc/platforms/powernv/opal-takeover.S
> index 11a3169..9093540 100644
> --- a/arch/powerpc/platforms/powernv/opal-takeover.S
> +++ b/arch/powerpc/platforms/powernv/opal-takeover.S
> @@ -5,6 +5,7 @@
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> +B
>   * as published by the Free Software Foundation; either version
>   * 2 of the License, or (at your option) any later version.
>   */
> @@ -27,11 +28,11 @@ _GLOBAL(opal_query_takeover)
> 	li      r3,H_HAL_TAKEOVER
> 	li      r4,H_HAL_TAKEOVER_QUERY_MAGIC
> 	HVSC
> -	addi    r1,r1,STACKFRAMESIZE
> 	ld      r10,STK_PARAM(R3)(r1)
> 	std     r4,0(r10)
> 	ld      r10,STK_PARAM(R4)(r1)
> 	std     r5,0(r10)
> +	addi    r1,r1,STACKFRAMESIZE
> 	lwz     r0,8(r1)
> 	mtcrf   0xff,r0
> 	blr
> 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/4] powerpc: implement vmemmap_list_free()

2014-06-11 Thread Li Zhong
This patch implements vmemmap_list_free() for vmemmap_free().

The freed entries are removed from vmemmap_list and form a freed list,
with next as its head. The next free position in the last allocated
page is kept at the tail of the list.

When allocating, if there are freed entries left, take one from the
freed list; if not, carve one out of the last allocated page as before.

With this change, realmode_pfn_to_page() also needs to walk all the
entries in vmemmap_list, as the virt_addr of the entries might no
longer be stored in order.

This helps reuse the memory when repeatedly doing memory
hot-plug/remove operations, but it doesn't reclaim the pages already
allocated, so the memory usage will only increase, though it won't
exceed the value needed for the largest memory configuration.
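
A minimal user-space sketch of the reuse scheme (the struct layout is
assumed from asm/pgalloc-64.h, and calloc() stands in for carving
chunks out of the last allocated page):

#include <stdio.h>
#include <stdlib.h>

struct vmemmap_backing {
	struct vmemmap_backing *list;
	unsigned long phys;
	unsigned long virt_addr;
};

static struct vmemmap_backing *next;	/* head of the freed list */
static int num_freed;

static struct vmemmap_backing *alloc_entry(void)
{
	struct vmemmap_backing *vmem_back;

	if (num_freed) {		/* reuse a freed entry first */
		num_freed--;
		vmem_back = next;
		next = next->list;
		return vmem_back;
	}
	/* stand-in for handing out chunks of the last allocated page */
	return calloc(1, sizeof(*vmem_back));
}

static void free_entry(struct vmemmap_backing *vmem_back)
{
	vmem_back->list = next;		/* push onto the freed list */
	next = vmem_back;
	num_freed++;
}

int main(void)
{
	struct vmemmap_backing *a = alloc_entry();

	free_entry(a);
	printf("reused: %d\n", a == alloc_entry());	/* prints 1 */
	return 0;
}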

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Cc: Nathan Fontenot nf...@linux.vnet.ibm.com
---
 arch/powerpc/mm/init_64.c |   62 +
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index e3734ed..fa5d28b 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -226,14 +226,24 @@ static void __meminit vmemmap_create_mapping(unsigned 
long start,
 #endif /* CONFIG_PPC_BOOK3E */
 
 struct vmemmap_backing *vmemmap_list;
+static struct vmemmap_backing *next;
+static int num_left;
+static int num_freed;
 
 static __meminit struct vmemmap_backing * vmemmap_list_alloc(int node)
 {
-   static struct vmemmap_backing *next;
-   static int num_left;
+   struct vmemmap_backing *vmem_back;
+   /* get from freed entries first */
+   if (num_freed) {
+   num_freed--;
+   vmem_back = next;
+   next = next->list;
+
+   return vmem_back;
+   }
 
/* allocate a page when required and hand out chunks */
-   if (!next || !num_left) {
+   if (!num_left) {
next = vmemmap_alloc_block(PAGE_SIZE, node);
if (unlikely(!next)) {
WARN_ON(1);
@@ -266,6 +276,38 @@ static __meminit void vmemmap_list_populate(unsigned long 
phys,
vmemmap_list = vmem_back;
 }
 
+static unsigned long vmemmap_list_free(unsigned long start)
+{
+   struct vmemmap_backing *vmem_back, *vmem_back_prev;
+
+   vmem_back_prev = vmem_back = vmemmap_list;
+
+   /* look for it with prev pointer recorded */
+   for (; vmem_back; vmem_back = vmem_back->list) {
+   if (vmem_back->virt_addr == start)
+   break;
+   vmem_back_prev = vmem_back;
+   }
+
+   if (unlikely(!vmem_back)) {
+   WARN_ON(1);
+   return 0;
+   }
+
+   /* remove it from vmemmap_list */
+   if (vmem_back == vmemmap_list) /* remove head */
+   vmemmap_list = vmem_back->list;
+   else
+   vmem_back_prev->list = vmem_back->list;
+
+   /* next point to this freed entry */
+   vmem_back->list = next;
+   next = vmem_back;
+   num_freed++;
+
+   return vmem_back->phys;
+}
+
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
 {
	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
@@ -331,16 +373,16 @@ struct page *realmode_pfn_to_page(unsigned long pfn)
	if (pg_va < vmem_back->virt_addr)
continue;
 
-   /* Check that page struct is not split between real pages */
-   if ((pg_va + sizeof(struct page)) >
-   (vmem_back->virt_addr + page_size))
-   return NULL;
-
-   page = (struct page *) (vmem_back->phys + pg_va -
+   /* After vmemmap_list entry free is possible, need check all */
+   if ((pg_va + sizeof(struct page)) <=
+   (vmem_back->virt_addr + page_size)) {
+   page = (struct page *) (vmem_back->phys + pg_va -
vmem_back-virt_addr);
-   return page;
+   return page;
+   }
}
 
+   /* Probably that page struct is split between real pages */
return NULL;
 }
 EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/4] powerpc: implement vmemmap_remove_mapping() for BOOK3S

2014-06-11 Thread Li Zhong
This is to be called in vmemmap_free(). The implementation on BOOK3E is
left empty as before.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Cc: Nathan Fontenot nf...@linux.vnet.ibm.com
---
 arch/powerpc/mm/hash_utils_64.c |2 +-
 arch/powerpc/mm/init_64.c   |   22 ++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 88fdd9d..25d9d66 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -243,7 +243,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long 
vend,
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static int htab_remove_mapping(unsigned long vstart, unsigned long vend,
+int htab_remove_mapping(unsigned long vstart, unsigned long vend,
  int psize, int ssize)
 {
unsigned long vaddr;
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index fa5d28b..69203c8 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -212,6 +212,13 @@ static void __meminit vmemmap_create_mapping(unsigned long 
start,
	for (i = 0; i < page_size; i += PAGE_SIZE)
BUG_ON(map_kernel_page(start + i, phys, flags));
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void vmemmap_remove_mapping(unsigned long start,
+  unsigned long page_size)
+{
+}
+#endif
 #else /* CONFIG_PPC_BOOK3E */
 static void __meminit vmemmap_create_mapping(unsigned long start,
 unsigned long page_size,
@@ -223,6 +230,21 @@ static void __meminit vmemmap_create_mapping(unsigned long 
start,
mmu_kernel_ssize);
	BUG_ON(mapped < 0);
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern int htab_remove_mapping(unsigned long vstart, unsigned long vend,
+   int psize, int ssize);
+
+static void vmemmap_remove_mapping(unsigned long start,
+  unsigned long page_size)
+{
+   int mapped = htab_remove_mapping(start, start + page_size,
+mmu_vmemmap_psize,
+mmu_kernel_ssize);
+   BUG_ON(mapped < 0);
+}
+#endif
+
 #endif /* CONFIG_PPC_BOOK3E */
 
 struct vmemmap_backing *vmemmap_list;
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/4] powerpc: implement vmemmap_free()

2014-06-11 Thread Li Zhong
vmemmap_free() does the opposite of vmemmap_populate().
This patch also puts vmemmap_free() and vmemmap_list_free() under
CONFIG_MEMORY_HOTPLUG.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Cc: Nathan Fontenot nf...@linux.vnet.ibm.com
---
 arch/powerpc/mm/init_64.c |   85 ++---
 1 file changed, 64 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 69203c8..4963790 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -298,6 +298,37 @@ static __meminit void vmemmap_list_populate(unsigned long 
phys,
vmemmap_list = vmem_back;
 }
 
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+{
+   unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
+
+   /* Align to the page size of the linear mapping. */
+   start = _ALIGN_DOWN(start, page_size);
+
+   pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
+
+   for (; start < end; start += page_size) {
+   void *p;
+
+   if (vmemmap_populated(start, page_size))
+   continue;
+
+   p = vmemmap_alloc_block(page_size, node);
+   if (!p)
+   return -ENOMEM;
+
+   vmemmap_list_populate(__pa(p), start, node);
+
+   pr_debug("  * %016lx..%016lx allocated at %p\n",
+start, start + page_size, p);
+
+   vmemmap_create_mapping(start, page_size, __pa(p));
+   }
+
+   return 0;
+}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
 static unsigned long vmemmap_list_free(unsigned long start)
 {
struct vmemmap_backing *vmem_back, *vmem_back_prev;
@@ -330,40 +361,52 @@ static unsigned long vmemmap_list_free(unsigned long 
start)
return vmem_back-phys;
 }
 
-int __meminit vmemmap_populate(unsigned long start, unsigned long end, int 
node)
+void __ref vmemmap_free(unsigned long start, unsigned long end)
 {
	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
 
-   /* Align to the page size of the linear mapping. */
start = _ALIGN_DOWN(start, page_size);
 
-   pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
+   pr_debug("vmemmap_free %lx...%lx\n", start, end);
 
	for (; start < end; start += page_size) {
-   void *p;
+   unsigned long addr;
 
+   /*
+	 * the section has already been marked as invalid, so
+	 * vmemmap_populated() returning true means some other sections
+	 * are still in this page; skip it in that case.
+*/
if (vmemmap_populated(start, page_size))
continue;
 
-   p = vmemmap_alloc_block(page_size, node);
-   if (!p)
-   return -ENOMEM;
-
-   vmemmap_list_populate(__pa(p), start, node);
-
-   pr_debug("  * %016lx..%016lx allocated at %p\n",
-start, start + page_size, p);
-
-   vmemmap_create_mapping(start, page_size, __pa(p));
+   addr = vmemmap_list_free(start);
+   if (addr) {
+   struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
+
+   if (PageReserved(page)) {
+   /* allocated from bootmem */
+   if (page_size < PAGE_SIZE) {
+   /*
+* this shouldn't happen, but if it is
+* the case, leave the memory there
+*/
+   WARN_ON_ONCE(1);
+   } else {
+   unsigned int nr_pages =
+   1 << get_order(page_size);
+   while (nr_pages--)
+   free_reserved_page(page++);
+   }
+   } else
+   free_pages((unsigned long)(__va(addr)),
+   get_order(page_size));
+
+   vmemmap_remove_mapping(start, page_size);
+   }
}
-
-   return 0;
-}
-
-void vmemmap_free(unsigned long start, unsigned long end)
-{
 }
-
+#endif
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/4] powerpc: start loop at section start of start in vmemmap_populated()

2014-06-11 Thread Li Zhong
vmemmap_populated() checks whether the [start, start + page_size) has valid
pfn numbers, to know whether a vmemmap mapping has been created that includes
this range.

Some ranges before end might not be checked by this loop:
  sec11start..start11..sec11end/sec12start..end..start12..sec12end
As above, for start11 (in section 11), it checks [sec11start, sec11end),
and the loop ends because the next start (start12) is bigger than end.
However, [sec11end/sec12start, end) is not checked here.

So before the loop, adjust start to the start of its section, so we
don't miss ranges like the above.

After we adjust start to the start of the section, it is also aligned
in vmemmap to a multiple of sizeof(struct page), so we can use
page_to_pfn directly in the loop.
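
A small user-space sketch of the boundary case (SECTION_BYTES is an
assumed illustrative constant, standing in for PAGES_PER_SECTION *
sizeof(struct page)):

#include <stdio.h>

#define SECTION_BYTES 0x400000UL	/* assumed section span in vmemmap */

static unsigned long section_start(unsigned long addr)
{
	return addr & ~(SECTION_BYTES - 1);
}

/* returns 1 if the loop ever tests the section containing end - 1 */
static int checks_last_section(unsigned long start, unsigned long end)
{
	for (; start < end; start += SECTION_BYTES)
		if (section_start(start) == section_start(end - 1))
			return 1;
	return 0;
}

int main(void)
{
	unsigned long start = 0x7ffff8UL;	/* just below a section boundary */
	unsigned long end = 0x800008UL;		/* just above it */

	/* old loop steps from the unaligned start and overshoots: prints 0 */
	printf("old: %d\n", checks_last_section(start, end));
	/* adjusted loop starts at the section start: prints 1 */
	printf("new: %d\n", checks_last_section(section_start(start), end));
	return 0;
}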

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Cc: Nathan Fontenot nf...@linux.vnet.ibm.com
---
 arch/powerpc/mm/init_64.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 4963790..253b4b9 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -175,9 +175,10 @@ static unsigned long __meminit 
vmemmap_section_start(unsigned long page)
 static int __meminit vmemmap_populated(unsigned long start, int page_size)
 {
unsigned long end = start + page_size;
+   start = (unsigned long)(pfn_to_page(vmemmap_section_start(start)));
 
	for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page)))
-   if (pfn_valid(vmemmap_section_start(start)))
+   if (pfn_valid(page_to_pfn((struct page *)start)))
return 1;
 
return 0;
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH] powerpc: fix wrong sp saved in save_stack_trace()

2014-05-08 Thread Li Zhong
I found that stack traces sometimes couldn't be saved. After some
investigation, it seems that when function tracing is enabled,

void save_stack_trace(struct stack_trace *trace)
{
unsigned long sp;

asm("mr %0,1" : "=r" (sp));

save_context_stack(trace, sp, current, 1);
}

is compiled into: 

c00432c0 .save_stack_trace:
c00432c0:   7c 08 02 a6 mflrr0
c00432c4:   f8 01 00 10 std r0,16(r1)
c00432c8:   f8 21 ff 81 stdur1,-128(r1)
c00432cc:   f8 61 00 70 std r3,112(r1)
c00432d0:   4b fc 77 bd bl  c000aa8c
._mcount
c00432d4:   60 00 00 00 nop
c00432d8:   7c 24 0b 78 mr  r4,r1

c00432dc:   e8 ad 02 78 ld  r5,632(r13)
c00432e0:   e8 61 00 70 ld  r3,112(r1)
c00432e4:   38 c0 00 01 li  r6,1
c00432e8:   38 21 00 80 addir1,r1,128
c00432ec:   e8 01 00 10 ld  r0,16(r1)
c00432f0:   7c 08 03 a6 mtlrr0
c00432f4:   4b ff fe 5c b   c0043150
.save_context_stack
c00432f8:   60 00 00 00 nop
c00432fc:   60 42 00 00 ori r2,r2,0

a new stack frame at -128(r1) is created to call ._mcount, and this new
r1 is copied into sp as the stack pointer, which could then be
overwritten by save_context_stack's prologue.

I don't know how to specify in C that the embedded asm must be compiled
after r1 has been restored to its original value. But as a workaround,
maybe we could move this embedded asm into save_context_stack().
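
A hedged sketch of that workaround's shape (not the exact kernel code):
sample r1 inside the walker itself, so the compiler cannot emit the
read while a temporary caller-side frame is still live:

static void save_context_stack_sketch(unsigned long sp)
{
	if (!sp)	/* caller passed 0: take r1 here, in the callee */
		asm volatile("mr %0,1" : "=r" (sp));

	/* ... walk the stack frames starting at sp, as before ... */
}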

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/stacktrace.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 3d30ef1..5c0b461 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -22,6 +22,9 @@
 static void save_context_stack(struct stack_trace *trace, unsigned long sp,
struct task_struct *tsk, int savesched)
 {
+   if (tsk == current)
+   asm("mr %0,1" : "=r" (sp));
+
for (;;) {
unsigned long *stack = (unsigned long *) sp;
unsigned long newsp, ip;
@@ -48,11 +51,7 @@ static void save_context_stack(struct stack_trace *trace, 
unsigned long sp,
 
 void save_stack_trace(struct stack_trace *trace)
 {
-   unsigned long sp;
-
-   asm("mr %0,1" : "=r" (sp));
-
-   save_context_stack(trace, sp, current, 1);
+   save_context_stack(trace, 0, current, 1);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH] Fix Oops in rtas_stop_self()

2014-04-27 Thread Li Zhong
On Fri, 2014-04-25 at 22:18 +1000, Anton Blanchard wrote:
> Hi,
> 
> > When trying to offline cpus, I noticed the following Oops in
> > rtas_stop_self(), and it seems to be caused by commit 41dd03a9. The
> > Oops disappears after reverting this commit.
> > 
> > After reading the code, I guess it might be caused by moving
> > rtas_args onto the stack. I still need some more time to read
> > enter_rtas to understand why it happens, but the problem seems to be
> > solved by moving rtas_args off the stack by making it static.
> 
> Nice catch. RTAS is 32bit and if your box has more than 4GB RAM then
> your stack could easily be outside 32bit range.

Ah, yes, the stack here is obviously at a much higher address than 4GB.

> 
> You can add:
> 
> Signed-off-by: Anton Blanchard an...@samba.org
> 
> And also:
> 
> Cc: sta...@vger.kernel.org # 3.14+

OK, 

Thanks, Zhong
> 
> > Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
> > ---
> >  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > index 9b8e050..20d6297 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -88,13 +88,14 @@ void set_default_offline_state(int cpu)
> >  
> >  static void rtas_stop_self(void)
> >  {
> > -	struct rtas_args args = {
> > -		.token = cpu_to_be32(rtas_stop_self_token),
> > +	static struct rtas_args args = {
> > 		.nargs = 0,
> > 		.nret = 1,
> > 		.rets = &args.args[0],
> > 	};
> >  
> > +	args.token = cpu_to_be32(rtas_stop_self_token);
> > +
> > 	local_irq_disable();
> >  
> > 	BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE);
> > 
> > 
> > ___
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
> 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: Fix Oops in rtas_stop_self()

2014-04-27 Thread Li Zhong
Commit 41dd03a9 may cause an Oops in rtas_stop_self().

The reason is that rtas_args was moved into stack space. For a box
with more than 4GB RAM, the stack could easily be outside the 32-bit
range, but RTAS is 32-bit.

So the patch moves rtas_args off the stack by making it static.
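
A hypothetical way to state the constraint explicitly (not part of the
patch): whatever buffer is handed to RTAS must be addressable with 32
bits, which holds for kernel static data but not necessarily for an
on-stack buffer on a box with more than 4GB RAM:

static struct rtas_args args;	/* static data lives in low memory */

static void rtas_args_addr_check(void)
{
	/* hypothetical sanity check; would trip for an on-stack args */
	WARN_ON(__pa(&args) >= (1UL << 32));
}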

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Signed-off-by: Anton Blanchard an...@samba.org
Cc: sta...@vger.kernel.org # 3.14+
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 9b8e050..20d6297 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -88,13 +88,14 @@ void set_default_offline_state(int cpu)
 
 static void rtas_stop_self(void)
 {
-   struct rtas_args args = {
-   .token = cpu_to_be32(rtas_stop_self_token),
+   static struct rtas_args args = {
.nargs = 0,
.nret = 1,
	.rets = &args.args[0],
};
 
+   args.token = cpu_to_be32(rtas_stop_self_token);
+
local_irq_disable();
 
BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE);


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH] Fix Oops in rtas_stop_self()

2014-04-25 Thread Li Zhong
When trying to offline cpus, I noticed the following Oops in
rtas_stop_self(), and it seems to be caused by commit 41dd03a9. The
Oops disappears after reverting this commit.

After reading the code, I guess it might be caused by moving
rtas_args onto the stack. I still need some more time to read
enter_rtas to understand why it happens, but the problem seems to be
solved by moving rtas_args off the stack by making it static.

[  247.194623] cpu 28 (hwid 28) Ready to die...
[  247.194641] Unable to handle kernel paging request for data at address 
0xecb213c50
[  247.194642] Faulting instruction address: 0x0f34157c
[  247.194645] Oops: Kernel access of bad area, sig: 11 [#1]
[  247.194648] SMP NR_CPUS=1024 NUMA pSeries
[  247.194664] Modules linked in: nf_conntrack_netbios_ns 
nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle 
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat 
iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
ebtable_filter ebtables ip6table_filter ip6_tables ehea ibmvscsi 
scsi_transport_srp scsi_tgt
[  247.194667] CPU: 28 PID: 0 Comm: swapper/28 Not tainted 3.15.0-rc1+ #10
[  247.194670] CPU: 27 PID: 1171 Comm: drmgr Not tainted 3.15.0-rc1+ #10
[  247.194673] task: c007ca2f0ba0 ti: c00ecb21 task.ti: 
c00ecb21
[  247.194679] NIP: 0f34157c LR: a84c CTR: 0151
[  247.194680] REGS: c00ecb213780 TRAP: 0300   Not tainted  (3.15.0-rc1+)
[  247.194684] MSR: 80001000 SF,ME  CR:   XER: 
[  247.194707] CFAR: c000a844 DAR: 000ecb213c50 DSISR: 4000 
SOFTE: 0 
[  247.194707] GPR00: 80001030 c00ecb213a00 c14c7880 
000ecb213c50 
[  247.194707] GPR04: 0f34 000ecb213c50  
80001032 
[  247.194707] GPR08: 1000 c00ecb213a00  
 
[  247.194707] GPR12:  cf227e00 c091f5b0 
0f394e2c 
[  247.194707] GPR16: 001c 0001 001c 
c13ad5b0 
[  247.194707] GPR20: 0001  c13bfb89 
c0d57688 
[  247.194707] GPR24: 001c c0d44e2c c00ecb21 
c13c0f48 
[  247.194707] GPR28: c0d44e28 c150ca68 00e0 
c00ecb21 
[  247.194710] NIP [0f34157c] 0xf34157c
[  247.194712] LR [a84c] 0xa84c
[  247.194714] Call Trace:
[  247.194716] [c00ecb213a00] [001c] 0x1c (unreliable)
[  247.194722] [c00ecb213be0] [c007ffe0] 
.pseries_mach_cpu_die+0x150/0x350
[  247.194725] [c00ecb213cf0] [c00420cc] .cpu_die+0x3c/0x60
[  247.194727] [c00ecb213d60] [c0017e58] 
.arch_cpu_idle_dead+0x28/0x40
[  247.194732] [c00ecb213dd0] [c0104f40] 
.cpu_startup_entry+0x570/0x590
[  247.194735] [c00ecb213ed0] [c0041d4c] 
.start_secondary+0x2dc/0x310
[  247.194738] [c00ecb213f90] [c000996c] 
.start_secondary_prolog+0x10/0x14
[  247.194740] Instruction dump:
[  247.194744]        
 
[  247.194748]        
 
[  247.194751] ---[ end trace d1d21584135396ba ]---

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 9b8e050..20d6297 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -88,13 +88,14 @@ void set_default_offline_state(int cpu)
 
 static void rtas_stop_self(void)
 {
-   struct rtas_args args = {
-   .token = cpu_to_be32(rtas_stop_self_token),
+   static struct rtas_args args = {
.nargs = 0,
.nret = 1,
	.rets = &args.args[0],
};
 
+   args.token = cpu_to_be32(rtas_stop_self_token);
+
local_irq_disable();
 
BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE);


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH v2 powerpc] Protect remove_memory() with device hotplug lock

2014-04-10 Thread Li Zhong
While testing memory hot-remove, I found the following deadlock:

Process #1141 is drmgr, trying to remove some memory, i.e. memory499.
It holds the memory_hotplug_mutex, and blocks when trying to remove the
file "online" under dir memory499, in kernfs_drain(), at
	wait_event(root->deactivate_waitq,
		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

Process #1120 is trying to online memory499 by
	echo 1 > memory499/online

In .kernfs_fop_write, it uses kernfs_get_active() to increase
kn->active, thus blocking process #1141, while it is itself blocked
later when trying to acquire memory_hotplug_mutex, which is held by
process #1141.

The backtrace of both processes are shown below:

# cat /proc/1120/stack 
[c1b18600] 0xc1b18600
[c0015044] .__switch_to+0x144/0x200
[c0263ca4] .online_pages+0x74/0x7b0
[c055b40c] .memory_subsys_online+0x9c/0x150
[c053cbe8] .device_online+0xb8/0x120
[c053cd04] .online_store+0xb4/0xc0
[c0538ce4] .dev_attr_store+0x64/0xa0
[c030f4ec] .sysfs_kf_write+0x7c/0xb0
[c030e574] .kernfs_fop_write+0x154/0x1e0
[c0268450] .vfs_write+0xe0/0x260
[c0269144] .SyS_write+0x64/0x110
[c0009ffc] syscall_exit+0x0/0x7c


# cat /proc/1141/stack 
[c1b18600] 0xc1b18600
[c0015044] .__switch_to+0x144/0x200
[c030be14] .__kernfs_remove+0x204/0x300
[c030d428] .kernfs_remove_by_name_ns+0x68/0xf0
[c030fb38] .sysfs_remove_file_ns+0x38/0x60
[c0539354] .device_remove_attrs+0x54/0xc0
[c0539fd8] .device_del+0x158/0x250
[c053a104] .device_unregister+0x34/0xa0
[c055bc14] .unregister_memory_section+0x164/0x170
[c024ee18] .__remove_pages+0x108/0x4c0
[c004b590] .arch_remove_memory+0x60/0xc0
[c026446c] .remove_memory+0x8c/0xe0
[c007f9f4] .pseries_remove_memblock+0xd4/0x160
[c007fcfc] .pseries_memory_notifier+0x27c/0x290
[c08ae6cc] .notifier_call_chain+0x8c/0x100
[c00d858c] .__blocking_notifier_call_chain+0x6c/0xe0
[c071ddec] .of_property_notify+0x7c/0xc0
[c071ed3c] .of_update_property+0x3c/0x1b0
[c00756cc] .ofdt_write+0x3dc/0x740
[c02f60fc] .proc_reg_write+0xac/0x110
[c0268450] .vfs_write+0xe0/0x260
[c0269144] .SyS_write+0x64/0x110
[c0009ffc] syscall_exit+0x0/0x7c

This patch uses lock_device_hotplug() to protect remove_memory() called
in pseries_remove_memblock(), as is also stated in the comment before
remove_memory():

 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations before this call, as required by
 * try_offline_node().
 */ 
void __ref remove_memory(int nid, u64 start, u64 size)

With this lock held, the other process (#1120 above) trying to online
the memory block will retry the system call when calling
lock_device_hotplug_sysfs(), and finally gets a "No such device" error.
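
For reference, lock_device_hotplug_sysfs() in drivers/base/core.c is
roughly the following (quoted from memory, so treat it as a sketch):
it trylocks, and on contention sleeps briefly and restarts the
syscall, which is how process #1120 eventually sees the device gone:

int lock_device_hotplug_sysfs(void)
{
	if (mutex_trylock(&device_hotplug_lock))
		return 0;

	/* avoid busy looping; a short sleep before restarting should do */
	msleep(5);
	return restart_syscall();
}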

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 573b488..7f75c94 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -100,10 +100,10 @@ static int pseries_remove_memblock(unsigned long base, 
unsigned int memblock_siz
 
	start_pfn = base >> PAGE_SHIFT;
 
-   if (!pfn_valid(start_pfn)) {
-   memblock_remove(base, memblock_size);
-   return 0;
-   }
+   lock_device_hotplug();
+
+   if (!pfn_valid(start_pfn))
+   goto out;
 
block_sz = memory_block_size_bytes();
sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
@@ -114,8 +114,10 @@ static int pseries_remove_memblock(unsigned long base, 
unsigned int memblock_siz
base += MIN_MEMORY_BLOCK_SIZE;
}
 
+out:
/* Update memory regions for memory remove */
memblock_remove(base, memblock_size);
+   unlock_device_hotplug();
return 0;
 }
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH powerpc] Protect remove_memory() with device hotplug lock

2014-04-09 Thread Li Zhong
While testing memory hot-remove, I found the following deadlock:

Process #1141 is drmgr, trying to remove some memory, i.e. memory499.
It holds the memory_hotplug_mutex, and blocks when trying to remove the
file "online" under dir memory499, in kernfs_drain(), at
	wait_event(root->deactivate_waitq,
		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

Process #1120 is trying to online memory499 by
	echo 1 > memory499/online

In .kernfs_fop_write, it uses kernfs_get_active() to increase
kn->active, thus blocking process #1141, while it is itself blocked
later when trying to acquire memory_hotplug_mutex, which is held by
process #1141.

The backtrace of both processes are shown below:

# cat /proc/1120/stack 
[c1b18600] 0xc1b18600
[c0015044] .__switch_to+0x144/0x200
[c0263ca4] .online_pages+0x74/0x7b0
[c055b40c] .memory_subsys_online+0x9c/0x150
[c053cbe8] .device_online+0xb8/0x120
[c053cd04] .online_store+0xb4/0xc0
[c0538ce4] .dev_attr_store+0x64/0xa0
[c030f4ec] .sysfs_kf_write+0x7c/0xb0
[c030e574] .kernfs_fop_write+0x154/0x1e0
[c0268450] .vfs_write+0xe0/0x260
[c0269144] .SyS_write+0x64/0x110
[c0009ffc] syscall_exit+0x0/0x7c


# cat /proc/1141/stack 
[c1b18600] 0xc1b18600
[c0015044] .__switch_to+0x144/0x200
[c030be14] .__kernfs_remove+0x204/0x300
[c030d428] .kernfs_remove_by_name_ns+0x68/0xf0
[c030fb38] .sysfs_remove_file_ns+0x38/0x60
[c0539354] .device_remove_attrs+0x54/0xc0
[c0539fd8] .device_del+0x158/0x250
[c053a104] .device_unregister+0x34/0xa0
[c055bc14] .unregister_memory_section+0x164/0x170
[c024ee18] .__remove_pages+0x108/0x4c0
[c004b590] .arch_remove_memory+0x60/0xc0
[c026446c] .remove_memory+0x8c/0xe0
[c007f9f4] .pseries_remove_memblock+0xd4/0x160
[c007fcfc] .pseries_memory_notifier+0x27c/0x290
[c08ae6cc] .notifier_call_chain+0x8c/0x100
[c00d858c] .__blocking_notifier_call_chain+0x6c/0xe0
[c071ddec] .of_property_notify+0x7c/0xc0
[c071ed3c] .of_update_property+0x3c/0x1b0
[c00756cc] .ofdt_write+0x3dc/0x740
[c02f60fc] .proc_reg_write+0xac/0x110
[c0268450] .vfs_write+0xe0/0x260
[c0269144] .SyS_write+0x64/0x110
[c0009ffc] syscall_exit+0x0/0x7c

This patch uses lock_device_hotplug() to protect remove_memory() called
in pseries_remove_memblock(), as is also stated in the comment before
remove_memory():

 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations before this call, as required by
 * try_offline_node().
 */ 
void __ref remove_memory(int nid, u64 start, u64 size)

With this lock held, the other process (#1120 above) trying to online
the memory block will retry the system call when calling
lock_device_hotplug_sysfs(), and finally gets a "No such device" error.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 573b488..e96357c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -109,10 +109,12 @@ static int pseries_remove_memblock(unsigned long base, 
unsigned int memblock_siz
sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
nid = memory_add_physaddr_to_nid(base);
 
+   lock_device_hotplug();
for (i = 0; i  sections_per_block; i++) {
remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
base += MIN_MEMORY_BLOCK_SIZE;
}
+   unlock_device_hotplug();
 
/* Update memory regions for memory remove */
memblock_remove(base, memblock_size);


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: use of_node_init() for the fakenode in msi_bitmap.c

2014-04-03 Thread Li Zhong
This patch uses of_node_init() to initialize the kobject in the fake
node used in test_of_node(), to avoid the following kobject warning.

[0.897654] kobject: '(null)' (c007ca183a08): is not initialized, yet 
kobject_put() is being called.
[0.897682] [ cut here ]
[0.897688] WARNING: at lib/kobject.c:670
[0.897692] Modules linked in:
[0.897701] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #1
[0.897708] task: c007ca10 ti: c007ca18 task.ti: 
c007ca18
[0.897715] NIP: c046a1f0 LR: c046a1ec CTR: 01704660
[0.897721] REGS: c007ca1835c0 TRAP: 0700   Not tainted  (3.14.0+)
[0.897727] MSR: 80029032 SF,EE,ME,IR,DR,RI  CR: 2824  XER: 
000d
[0.897749] CFAR: c08ef4ec SOFTE: 1 
GPR00: c046a1ec c007ca183840 c14c59b8 005c 
GPR04: 0001 c0129770  0001 
GPR08:    3fef 
GPR12:  cf221200 c000c350  
GPR16:     
GPR20:     
GPR24:  c144e808 c0c56f20 00d8 
GPR28: c0cd5058  c1454ca8 c007ca183a08 
[0.897856] NIP [c046a1f0] .kobject_put+0xa0/0xb0
[0.897863] LR [c046a1ec] .kobject_put+0x9c/0xb0
[0.897868] Call Trace:
[0.897874] [c007ca183840] [c046a1ec] .kobject_put+0x9c/0xb0 
(unreliable)
[0.897885] [c007ca1838c0] [c0743f9c] .of_node_put+0x2c/0x50
[0.897894] [c007ca183940] [c0c83954] .test_of_node+0x1dc/0x208
[0.897902] [c007ca183b80] [c0c839a4] 
.msi_bitmap_selftest+0x24/0x38
[0.897913] [c007ca183bf0] [c000bb34] 
.do_one_initcall+0x144/0x200
[0.897922] [c007ca183ce0] [c0c748e4] 
.kernel_init_freeable+0x2b4/0x394
[0.897931] [c007ca183db0] [c000c374] .kernel_init+0x24/0x130
[0.897940] [c007ca183e30] [c000a2f4] 
.ret_from_kernel_thread+0x5c/0x68
[0.897947] Instruction dump:
[0.897952] 7fe3fb78 38210080 e8010010 ebe1fff8 7c0803a6 4800014c e89f 
3c62ff6e 
[0.897971] 7fe5fb78 3863a950 48485279 6000 0fe0 3900 393f0038 
4b80 
[0.897992] ---[ end trace 1eeffdb9f825a556 ]---
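
For reference, of_node_init() is roughly the following inline helper
from include/linux/of.h (quoted from memory, so treat it as a sketch);
it initializes the embedded kobject so that the later kobject_put()
is balanced:

static inline void of_node_init(struct device_node *node)
{
	kobject_init(&node->kobj, &of_node_ktype);
}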


Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/sysdev/msi_bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
index 8ba6042..2ff6302 100644
--- a/arch/powerpc/sysdev/msi_bitmap.c
+++ b/arch/powerpc/sysdev/msi_bitmap.c
@@ -202,7 +202,7 @@ void __init test_of_node(void)
 
/* There should really be a struct device_node allocator */
 	memset(&of_node, 0, sizeof(of_node));
-	kref_init(&of_node.kobj.kref);
+	of_node_init(&of_node);
 	of_node.full_name = node_name;
 
 	check(0 == msi_bitmap_alloc(&bmp, size, &of_node));


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH powerpc] cpuidle: fix per cpu accessing of cpuidle_devices

2014-01-25 Thread Li Zhong
This patch fixes the following bad address access, which seems to be
caused by a typo in commit 7f74dc0f. cpuidle_devices is defined
statically as a per-cpu variable, so per_cpu() should be used to
access it.

[  204.774193] Unable to handle kernel paging request for data at address 
0x00a7d000
[  204.774215] Faulting instruction address: 0xc06ee92c
[  204.774225] Oops: Kernel access of bad area, sig: 11 [#1]
[  204.774230] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  204.774244] Modules linked in: binfmt_misc
[  204.774254] CPU: 8 PID: 3590 Comm: bash Not tainted 
3.13.0-08526-gb2e448e-dirty #2
[  204.774262] task: c001faf4 ti: c001fa904000 task.ti: 
c001fa904000
[  204.774268] NIP: c06ee92c LR: c06ee920 CTR: c0064e94
[  204.774275] REGS: c001fa907240 TRAP: 0300   Not tainted  
(3.13.0-08526-gb2e448e-dirty)
[  204.774281] MSR: 80009032 SF,EE,ME,IR,DR,RI  CR: 22242424  XER: 
2000
[  204.774303] CFAR: 9088 DAR: 00a7d000 DSISR: 4000 
SOFTE: 1 
GPR00: c06ee920 c001fa9074c0 c0d764a8 c0c5d4c0 
GPR04: 0010 0010   
GPR08:  c16564a8   
GPR12: 0002 cf33e800 1012b3dc  
GPR16:  10129c58 10129bf8 1012b948 
GPR20:  1012b3e4 1013ea90  
GPR24: c0c5d858 c15f38e8 c0cb8b26 c0c664a8 
GPR28:  ffe5 00a7d000 00a7d000 
[  204.774407] NIP [c06ee92c] .cpuidle_disable_device+0x30/0xc4
[  204.774413] LR [c06ee920] .cpuidle_disable_device+0x24/0xc4
[  204.774419] Call Trace:
[  204.774424] [c001fa9074c0] [c06ee920] 
.cpuidle_disable_device+0x24/0xc4 (unreliable)
[  204.774435] [c001fa907540] [c00652a4] 
.pseries_cpuidle_add_cpu_notifier+0xb8/0xe0
[  204.774445] [c001fa9075c0] [c0855548] 
.notifier_call_chain+0x150/0x1c0
[  204.774455] [c001fa907670] [c00b3564] 
.__raw_notifier_call_chain+0x40/0x50
[  204.774463] [c001fa907710] [c0079c40] .__cpu_notify+0x50/0x9c
[  204.774472] [c001fa9077a0] [c018a054] ._cpu_down+0x1ec/0x388
[  204.774479] [c001fa9078b0] [c018a234] .cpu_down+0x44/0x64
[  204.774488] [c001fa907940] [c05627bc] 
.cpu_subsys_offline+0x24/0x3c
[  204.774497] [c001fa9079c0] [c055b2a4] .device_offline+0xc8/0x120
[  204.774504] [c001fa907a50] [c055d000] .online_store+0x74/0xb0
[  204.774512] [c001fa907b00] [c0559924] .dev_attr_store+0x60/0x78
[  204.774520] [c001fa907ba0] [c028cc78] .sysfs_kf_write+0x7c/0x9c
[  204.774528] [c001fa907c30] [c0291ae8] 
.kernfs_fop_write+0x108/0x174
[  204.774537] [c001fa907cd0] [c01f8308] .vfs_write+0x110/0x21c
[  204.774545] [c001fa907d70] [c01f854c] .SyS_write+0x70/0xbc
[  204.774553] [c001fa907e30] [c0009d88] syscall_exit+0x0/0x7c
[  204.774560] Instruction dump:
[  204.774565] 7c0802a6 f8010010 fbe1fff8 f821ff81 7c7f1b78 6000 6000 
7fe3fb78 
[  204.774582] 48000c89 6000 2fbf 419e0084 e81f 780b17e1 41820078 
2fa3 
[  204.774601] ---[ end trace edf1df93cf81e28d ]---
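
A hedged illustration of the distinction (this assumes cpuidle_devices
is declared as DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices)
in drivers/cpuidle/cpuidle.c):

DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);

static struct cpuidle_device *get_dev(int cpu)
{
	/* correct: read the per-cpu pointer variable itself */
	return per_cpu(cpuidle_devices, cpu);
}

static struct cpuidle_device *get_dev_buggy(int cpu)
{
	/* wrong: per_cpu_ptr() expects an alloc_percpu() cookie, so this
	 * offsets the stored pointer value and yields a bogus address */
	return per_cpu_ptr(cpuidle_devices, cpu);
}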

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/processor_idle.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/processor_idle.c 
b/arch/powerpc/platforms/pseries/processor_idle.c
index 94134a5..197cadc 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -190,7 +190,7 @@ static int pseries_cpuidle_add_cpu_notifier(struct 
notifier_block *n,
 {
int hotcpu = (unsigned long)hcpu;
struct cpuidle_device *dev =
-   per_cpu_ptr(cpuidle_devices, hotcpu);
+   per_cpu(cpuidle_devices, hotcpu);
 
	if (dev && cpuidle_get_driver()) {
switch (action) {


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH powerpc] Fix compile error of pgtable-ppc64.h

2014-01-17 Thread Li Zhong
On Fri, 2014-01-17 at 12:27 +0530, Aneesh Kumar K.V wrote:
 Li Zhong zh...@linux.vnet.ibm.com writes:
 
  It seems that forward declaration couldn't work well with typedef, use
  struct spinlock directly to avoiding following build errors:
 
  In file included from include/linux/spinlock.h:81,
   from include/linux/seqlock.h:35,
   from include/linux/time.h:5,
   from include/uapi/linux/timex.h:56,
   from include/linux/timex.h:56,
   from include/linux/sched.h:17,
   from arch/powerpc/kernel/asm-offsets.c:17:
  include/linux/spinlock_types.h:76: error: redefinition of typedef 
  'spinlock_t'
  /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: 
  previous declaration of 'spinlock_t' was here
 
 
 what compiler version ? I have seen that error in gcc 4.3 and it was
 concluded that it is too old a compiler version to worry about. That
 specific compiler version also gave error for forward declaring struct;

gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC), and it doesn't
report error after use struct directly. 

 
> 
> > Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
> > ---
> >  arch/powerpc/include/asm/pgtable-ppc64.h |6 +++---
> >  1 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
> > b/arch/powerpc/include/asm/pgtable-ppc64.h
> > index d27960c..bc141c9 100644
> > --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> > +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> > @@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma,
> > unsigned long address,
> > 			    pmd_t *pmdp);
> > 
> >  #define pmd_move_must_withdraw pmd_move_must_withdraw
> > -typedef struct spinlock spinlock_t;
> > -static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
> > -					 spinlock_t *old_pmd_ptl)
> > +struct spinlock;
> > +static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
> > +					 struct spinlock *old_pmd_ptl)
> >  {
> > 	/*
> > 	 * Archs like ppc64 use pgtable to store per pmd

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH powerpc] Fix compile error of pgtable-ppc64.h

2014-01-16 Thread Li Zhong
It seems that forward declarations don't work well with typedefs; use
struct spinlock directly to avoid the following build errors:

In file included from include/linux/spinlock.h:81,
 from include/linux/seqlock.h:35,
 from include/linux/time.h:5,
 from include/uapi/linux/timex.h:56,
 from include/linux/timex.h:56,
 from include/linux/sched.h:17,
 from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous 
declaration of 'spinlock_t' was here
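
A minimal repro of the underlying C rule (repeating a typedef only
became legal in C11; gcc 4.4 rejects it, while repeating a struct tag
declaration is always fine):

typedef struct spinlock spinlock_t;	/* e.g. from spinlock_types.h */
typedef struct spinlock spinlock_t;	/* error before C11: redefinition */

struct spinlock;	/* a plain forward declaration, by contrast, */
struct spinlock;	/* may be repeated freely */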

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pgtable-ppc64.h |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c..bc141c9 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, 
unsigned long address,
pmd_t *pmdp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
-spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+struct spinlock *old_pmd_ptl)
 {
/*
 * Archs like ppc64 use pgtable to store per pmd


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH powerpc 1/2] fix using page address as the page structure pointer in __pte_free_tlb()

2013-12-02 Thread Li Zhong
It seems that in __pte_free_tlb() (the non-64K-page code path), we
wrongly pass the page address as the page structure pointer to
pgtable_page_dtor(), which needs the page structure pointer as its
argument.

The change also fixes following bug on next-1128:

[0.563113] Unable to handle kernel paging request for data at address 
0x
[0.563121] Faulting instruction address: 0xc01d8e30
[0.563128] Oops: Kernel access of bad area, sig: 11 [#1]
[0.563132] PREEMPT SMP NR_CPUS=16 NUMA pSeries
[0.563143] Modules linked in:
[0.563150] CPU: 9 PID: 1 Comm: init Not tainted 
3.13.0-rc1-next-20131128-dirty #1
[0.563157] task: c001fed4 ti: c001fed3c000 task.ti: 
c001fed3c000
[0.563163] NIP: c01d8e30 LR: c01da4e8 CTR: c018ed28
[0.563171] REGS: c001fed3f060 TRAP: 0300   Not tainted  
(3.13.0-rc1-next-20131128-dirty)
[0.563177] MSR: 80009032 SF,EE,ME,IR,DR,RI  CR: 24222982  XER: 
2001
[0.563197] CFAR: c01d8e0c DAR:  DSISR: 4200 
SOFTE: 1 
GPR00:  c001fed3f2e0 c0d4ce98 c001fe01f500 
GPR04: f000  c01a9980  
GPR08: 00b69000 0004 c001fc550108  
GPR12: 44779982 cf33eb00 0020 1000 
GPR16: c000 0001 c001fcbd8000 c001fa4b83f8 
GPR20:  3fe0 c001fa0a8ff8  
GPR24: 0029 c001fed3c000 00210d00 0001 
GPR28:   c001fe01f500 f000 
[0.563299] NIP [c01d8e30] .__slab_free+0xc8/0x42c
[0.563306] LR [c01da4e8] .kmem_cache_free+0x1d4/0x364
[0.563311] Call Trace:
[0.563316] [c001fed3f2e0] [6db6db6db6db6db7] 0x6db6db6db6db6db7 
(unreliable)
[0.563325] [c001fed3f410] [c01da4e8] 
.kmem_cache_free+0x1d4/0x364
[0.563334] [c001fed3f4d0] [c01a9980] .ptlock_free+0x2c/0x44
[0.563342] [c001fed3f550] [c01aec98] .free_pgd_range+0x340/0x3d8
[0.563350] [c001fed3f680] [c01ee6b8] 
.shift_arg_pages+0x130/0x1a0
[0.563358] [c001fed3f7c0] [c01ee888] 
.setup_arg_pages+0x160/0x214
[0.563366] [c001fed3f870] [c0257664] 
.load_elf_binary+0x540/0x168c
[0.563374] [c001fed3f9d0] [c01ecf28] 
.search_binary_handler+0xcc/0x238
[0.563382] [c001fed3fa80] [c02551e8] .load_script+0x26c/0x290
[0.563390] [c001fed3fb90] [c01ecf28] 
.search_binary_handler+0xcc/0x238
[0.563397] [c001fed3fc40] [c01ef1f4] 
.do_execve_common+0x6b4/0xa0c
[0.563406] [c001fed3fd20] [c01ef634] .do_execve+0x40/0x58
[0.563414] [c001fed3fdb0] [c000bdb4] .kernel_init+0x74/0x158
[0.563422] [c001fed3fe30] [c000a0ec] 
.ret_from_kernel_thread+0x5c/0x70
[0.563429] Instruction dump:
[0.563433] 635a0d00 3b61 2fbc 41fe0014 e8810070 7f83e378 486543c1 
6000 
[0.563450] e81e0022 829f0018 92810090 eaff0010 7efd012a a1210090 8aa10093 
3809 
[0.563469] ---[ end trace 3be1fd3d950f1716 ]---

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pgalloc-64.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc-64.h 
b/arch/powerpc/include/asm/pgalloc-64.h
index 6940128..d7543c2 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -151,7 +151,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
struct page *page = page_address(table);
 
tlb_flush_pgtable(tlb, address);
-   pgtable_page_dtor(page);
+   pgtable_page_dtor(table);
pgtable_free_tlb(tlb, page, 0);
 }
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH powerpc 2/2] adjust the variable type and name in __pte_free_tlb()

2013-12-02 Thread Li Zhong
The patch adjusts the variable type and name for "page" in
__pte_free_tlb(), which currently seem a little confusing.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/pgalloc-64.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc-64.h 
b/arch/powerpc/include/asm/pgalloc-64.h
index d7543c2..1580f9e 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -148,11 +148,11 @@ static inline void pgtable_free_tlb(struct mmu_gather 
*tlb,
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
 {
-   struct page *page = page_address(table);
+   void *page_addr = page_address(table);
 
tlb_flush_pgtable(tlb, address);
pgtable_page_dtor(table);
-   pgtable_free_tlb(tlb, page, 0);
+   pgtable_free_tlb(tlb, page_addr, 0);
 }
 
 #else /* if CONFIG_PPC_64K_PAGES */


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH powerpc] Revert c6102609 and replace it with the correct fix for vio dma mask setting

2013-11-27 Thread Li Zhong
This patch reverts my previous fix and replaces it with the correct
fix from Russell.

And as Russell pointed out -- dma_set_mask_and_coherent() (and the other
dma_set_mask() functions) are really supposed to be used by drivers
only.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/vio.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index 76a6482..d771778 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1419,7 +1419,8 @@ struct vio_dev *vio_register_device_node(struct 
device_node *of_node)
 
/* needed to ensure proper operation of coherent allocations
 * later, in case driver doesn't set it explicitly */
-   dma_coerce_mask_and_coherent(viodev-dev, DMA_BIT_MASK(64));
+   viodev-dev.coherent_dma_mask = DMA_BIT_MASK(64);
+   viodev-dev.dma_mask = viodev-dev.coherent_dma_mask;
}
 
/* register with generic device framework */


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH powerpc] Fix a dma_mask issue of vio

2013-11-20 Thread Li Zhong
On Thu, 2013-11-21 at 11:22 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2013-11-21 at 00:08 +, Russell King - ARM Linux wrote:
> > On Thu, Nov 21, 2013 at 11:01:42AM +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2013-11-20 at 23:23 +, Russell King - ARM Linux wrote:
> > > > Li Zhong's patch works around the issue of a failing dma_set_mask(),
> > > > but as I've already said elsewhere, the real fix is to get whatever
> > > > created the struct device to initialise the dev->dma_mask with a
> > > > bus default.
> > > > 
> > > > Using dma_coerce_xxx() merely makes the problem go away papering
> > > > over the issue - it's fine to do it this way, but someone should still
> > > > fix the broken code creating these devices...
> > > 
> > > Ok, they are created by the vio bus core, so it should be doing the
> > > job here of setting the dma_mask pointer to a proper value.
> > > 
> > > Li, can you take care of that ? Look at other bus types we have in
> > > there such as the macio bus etc...
> > 
> > Oh, hang on a moment, this is the bus code.
> > 
> > In which case, the question becomes: do vio devices ever need to have
> > a separate streaming DMA mask from a coherent DMA mask?  If not, then
> > something like the following is what's needed here, and I should've
> > never have used dma_set_mask_and_coherent().
> 
> No, a single mask.
> 
> > dma_set_mask_and_coherent() (and the other dma_set_mask() functions)
> > are really supposed to be used by drivers only.
> > 
> >  arch/powerpc/kernel/vio.c |3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> > index e7d0c88f621a..d771778f398e 100644
> > --- a/arch/powerpc/kernel/vio.c
> > +++ b/arch/powerpc/kernel/vio.c
> > @@ -1419,7 +1419,8 @@ struct vio_dev *vio_register_device_node(struct
> > device_node *of_node)
> >  
> > 	/* needed to ensure proper operation of coherent allocations
> > 	 * later, in case driver doesn't set it explicitly */
> > -	dma_set_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
> > +	viodev->dev.coherent_dma_mask = DMA_BIT_MASK(64);
> > +	viodev->dev.dma_mask = &viodev->dev.coherent_dma_mask;
> > }
> >  
> > /* register with generic device framework */
> 
> Right that's exactly what I had in mind. Li, can you test this please ?

Sure, and it works.

Tested-by: Li Zhong zh...@linux.vnet.ibm.com

> 
> The previous fix using dma_coerce_mask_and_coherent() is already on
> its way to Linus, so we'll rework the above patch to undo it but for
> now please test.
> 
> Cheers,
> Ben.
> 
> 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH powerpc] Fix a dma_mask issue of vio

2013-11-19 Thread Li Zhong
I encountered following issue:
[0.283035] ibmvscsi 3015: couldn't initialize event pool
[5.688822] ibmvscsi: probe of 3015 failed with error -1

which prevents the storage from being recognized, and the machine from
booting.

After some digging, it seems that it is caused by commit 4886c399da,

as the dma_mask pointer in viodev->dev is not set, so in
dma_set_mask_and_coherent(), dma_set_coherent_mask() is not called
because dma_set_mask(), which is dma_set_mask_pSeriesLP() here,
returned EIO. Before the commit, dma_set_coherent_mask() was always
called.

I tried to replace dma_set_mask_and_coherent() with
dma_coerce_mask_and_coherent(), and the machine could boot again. 

But I'm not sure whether this is the correct fix...

---
 arch/powerpc/kernel/vio.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index e7d0c88..76a6482 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1419,7 +1419,7 @@ struct vio_dev *vio_register_device_node(struct 
device_node *of_node)
 
/* needed to ensure proper operation of coherent allocations
 * later, in case driver doesn't set it explicitly */
-	dma_set_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
+	dma_coerce_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
}
 
/* register with generic device framework */





___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH powerpc] Fix compiling error in powernv/rng.c

2013-11-19 Thread Li Zhong
On Tue, 2013-11-19 at 15:04 +1100, Michael Ellerman wrote:
> On Fri, Nov 15, 2013 at 03:36:04PM +0800, Li Zhong wrote:
> > This is seen when CONFIG_SMP is not enabled:
> > 
> > arch/powerpc/platforms/powernv/rng.c: In function 'rng_init_per_cpu':
> > arch/powerpc/platforms/powernv/rng.c:74: error: implicit declaration of
> > function 'cpu_to_chip_id'
> 
> Hi Li,
> 
> We try whenever possible to avoid adding #ifdefs in C code.
> 
> Also on a multi-chip system where there are multiple RNGs, your code for
> UP will not necessarily choose the RNG on the same core as the cpu.

OK, thank you for the review, Michael.

Just to make sure I understand it:

So even in UP, we could have multiple rng sources, and we should try to
use the source which has the same chip_id as the logical cpu?

Thanks, Zhong

> I have a different fix that I will send.
> 
> cheers
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH powerpc] Fix a dma_mask issue of vio

2013-11-19 Thread Li Zhong
On Wed, 2013-11-20 at 12:28 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2013-11-19 at 16:11 +0800, Li Zhong wrote:
> > I encountered the following issue:
> > [0.283035] ibmvscsi 3015: couldn't initialize event pool
> > [5.688822] ibmvscsi: probe of 3015 failed with error -1
> > 
> > which prevents the storage from being recognized, and the machine from
> > booting.
> > 
> > After some digging, it seems that it is caused by commit 4886c399da,
> > 
> > as the dma_mask pointer in viodev->dev is not set, so in
> > dma_set_mask_and_coherent(), dma_set_coherent_mask() is not called
> > because dma_set_mask(), which is dma_set_mask_pSeriesLP() here,
> > returned EIO. Before the commit, dma_set_coherent_mask() was always
> > called.
> > 
> > I tried to replace dma_set_mask_and_coherent() with
> > dma_coerce_mask_and_coherent(), and the machine could boot again.
> > 
> > But I'm not sure whether this is the correct fix...
> 
> Russell, care to chime in ? I can't make sense of the semantics...
> 
> The original commit was fairly clear:
> 
> 
> Replace the following sequence:
> 
>   dma_set_mask(dev, mask);
>   dma_set_coherent_mask(dev, mask);
> 
> with a call to the new helper dma_set_mask_and_coherent().
> 
> 
> It all makes sense so far ... but doesn't work for some odd reason,
> and the fix uses a function whose name doesn't make much sense to
> me ... what is the difference between "setting" and "coercing"
> the mask ? And why does replacing two "set"s with a "set both" not
> work and require a "coerce" ?

I think the difference is due to the check of the return value from
dma_set_mask() in dma_set_mask_and_coherent():
--
int rc = dma_set_mask(dev, mask);
if (rc == 0)
	dma_set_coherent_mask(dev, mask);
--

and in struct device, dma_mask is a pointer, while coherent_dma_mask
is a value (I don't know why we have this difference).

And here for pseries, dma_set_mask() failed because the dma_mask
pointer remains NULL.

And in dma_coerce_mask_and_coherent(), the dma_mask is set to the
address of coherent_dma_mask:
--
dev->dma_mask = &dev->coherent_dma_mask;
--
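
For reference, the two helpers (quoted from memory from the
include/linux/dma-mapping.h of that time, so treat this as a sketch)
differ exactly in whether dev->dma_mask is pointed somewhere valid
before the mask check runs:

static inline int dma_set_mask_and_coherent(struct device *dev, u64 mask)
{
	int rc = dma_set_mask(dev, mask);
	if (rc == 0)
		dma_set_coherent_mask(dev, mask);
	return rc;
}

static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
{
	dev->dma_mask = &dev->coherent_dma_mask;
	return dma_set_mask_and_coherent(dev, mask);
}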

Thanks, Zhong

> 
> I'm asking because I'm worried about breakage elsewhere...
> 
> Cheers,
> Ben. 
> 
> > ---
> >  arch/powerpc/kernel/vio.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> > index e7d0c88..76a6482 100644
> > --- a/arch/powerpc/kernel/vio.c
> > +++ b/arch/powerpc/kernel/vio.c
> > @@ -1419,7 +1419,7 @@ struct vio_dev *vio_register_device_node(struct
> > device_node *of_node)
> >  
> > 	/* needed to ensure proper operation of coherent allocations
> > 	 * later, in case driver doesn't set it explicitly */
> > -	dma_set_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
> > +	dma_coerce_mask_and_coherent(&viodev->dev, DMA_BIT_MASK(64));
> > }
> >  
> > /* register with generic device framework */
> > 
> > 
> > 
> > 
> 
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH powerpc] Fix compiling error in rng.c

2013-11-14 Thread Li Zhong
This patch fixes the following compile errors by including the needed
header file:

arch/powerpc/platforms/pseries/rng.c: In function 'pseries_get_random_long':
arch/powerpc/platforms/pseries/rng.c:20: error: 'PLPAR_HCALL_BUFSIZE' 
undeclared (first use in this function)
...

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/rng.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/rng.c 
b/arch/powerpc/platforms/pseries/rng.c
index a702f1c..62c7838 100644
--- a/arch/powerpc/platforms/pseries/rng.c
+++ b/arch/powerpc/platforms/pseries/rng.c
@@ -13,7 +13,7 @@
 #include <linux/of.h>
 #include <asm/archrandom.h>
 #include <asm/machdep.h>
-
+#include <asm/hvcall.h>
 
 static int pseries_get_random_long(unsigned long *v)
 {


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH powerpc] Fix compiling error in powernv/rng.c

2013-11-14 Thread Li Zhong
This is seen when CONFIG_SMP is not enabled:

arch/powerpc/platforms/powernv/rng.c: In function 'rng_init_per_cpu':
arch/powerpc/platforms/powernv/rng.c:74: error: implicit declaration of 
function 'cpu_to_chip_id'

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/rng.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/rng.c 
b/arch/powerpc/platforms/powernv/rng.c
index 8844628..04430a7 100644
--- a/arch/powerpc/platforms/powernv/rng.c
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -59,6 +59,7 @@ int powernv_get_random_long(unsigned long *v)
 }
 EXPORT_SYMBOL_GPL(powernv_get_random_long);
 
+#ifdef CONFIG_SMP
 static __init void rng_init_per_cpu(struct powernv_rng *rng,
struct device_node *dn)
 {
@@ -75,6 +76,13 @@ static __init void rng_init_per_cpu(struct powernv_rng *rng,
}
}
 }
+#else
+static __init void rng_init_per_cpu(struct powernv_rng *rng,
+   struct device_node *dn)
+{
+   per_cpu(powernv_rng, 0) = rng;
+}
+#endif
 
 static __init int rng_create(struct device_node *dn)
 {




Re: [RFC PATCH powerpc] Set cpu sibling mask before online cpu

2013-06-13 Thread Li Zhong
On Tue, 2013-06-11 at 14:03 +0530, Srivatsa S. Bhat wrote:
 On 06/11/2013 12:30 PM, Benjamin Herrenschmidt wrote:
  On Thu, 2013-05-16 at 18:20 +0800, Li Zhong wrote:
  It seems following race is possible:
 
  
   .../...
  
 vdso_getcpu_init();
   #endif
  -  notify_cpu_starting(cpu);
  -  set_cpu_online(cpu, true);
 /* Update sibling maps */
 base = cpu_first_thread_sibling(cpu);
  	for (i = 0; i < threads_per_core; i++) {
  -  if (cpu_is_offline(base + i))
  +  if (cpu_is_offline(base + i) && (cpu != base + i))
 continue;
 cpumask_set_cpu(cpu, cpu_sibling_mask(base + i));
 cpumask_set_cpu(base + i, cpu_sibling_mask(cpu));
  @@ -667,6 +665,10 @@ __cpuinit void start_secondary(void *unused)
 }
 of_node_put(l2_cache);
   
  +  smp_wmb();
  +  notify_cpu_starting(cpu);
  +  set_cpu_online(cpu, true);
  +
  
  So we could have an online CPU with an empty sibling mask. Now we can
  have a sibling that isn't online ... Is that ok ?
 
 I think it is OK. We do the same thing on x86 as well - we set up the
 sibling links before calling notify_cpu_starting() and setting the cpu
 in the cpu_online_mask. In fact, there is even a comment explicitly
 noting that order:
 
 arch/x86/kernel/smpboot.c:
  220 /*
  221  * This must be done before setting cpu_online_mask
  222  * or calling notify_cpu_starting.
  223  */
  224 set_cpu_sibling_map(raw_smp_processor_id());
  225 wmb();
  226 
  227 notify_cpu_starting(cpuid);
  228 
  229 /*
  230  * Allow the master to continue.
  231  */
  232 cpumask_set_cpu(cpuid, cpu_callin_mask);
 
 
 So I agree with Li Zhong's solution.
 
 [Arch-specific CPU hotplug code consolidation efforts such as [1] would
 have weeded out such nasty bugs.. I guess we should revive that patchset
 sometime soon.]
 

Thank you both for the review and comments. 

Good to know that it matches that of x86, and there is a patchset
consolidating the code. With the patches in [1], it seems we only need
the line to include the to be onlined cpu in this patch. 

Thanks, Zhong 

 Regards,
 Srivatsa S. Bhat
 
 [1]. https://lwn.net/Articles/500185/
 






[RFC PATCH powerpc] Set cpu sibling mask before online cpu

2013-05-16 Thread Li Zhong
It seems the following race is possible:

cpu0cpux
smp_init->cpu_up->_cpu_up
__cpu_up
kick_cpu(1)
-
waiting online  ...
... notify CPU_STARTING
set cpux active
set cpux online
-
finish waiting online
...
sched_init_smp
init_sched_domains(cpu_active_mask)
build_sched_domains
set cpux sibling info
-

Execution of cpu0 and cpux could be concurrent between two separator
lines.

So if the cpux sibling information is set too late (normally
impossible, but it can be triggered by adding some delay in
start_secondary after setting the cpu online), build_sched_domains()
running on cpu0 might see cpux active with an empty sibling mask, and
then make a bad address access like the following:

[0.099855] Unable to handle kernel paging request for data at address 
0xc0038518078f
[0.099868] Faulting instruction address: 0xc00b7a64
[0.099883] Oops: Kernel access of bad area, sig: 11 [#1]
[0.099895] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[0.099922] Modules linked in:
[0.099940] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
3.10.0-rc1-00120-gb973425-dirty #16
[0.099956] task: c001fed8 ti: c001fed7c000 task.ti: 
c001fed7c000
[0.099971] NIP: c00b7a64 LR: c00b7a40 CTR: c00b4934
[0.099985] REGS: c001fed7f760 TRAP: 0300   Not tainted  
(3.10.0-rc1-00120-gb973425-dirty)
[0.07] MSR: 80009032 SF,EE,ME,IR,DR,RI  CR: 24272828  XER: 
2003
[0.100045] SOFTE: 1
[0.100053] CFAR: c0445ee8
[0.100064] DAR: c0038518078f, DSISR: 4000
[0.100073] 
GPR00: 0080 c001fed7f9e0 c0c84d48 0010 
GPR04: 0010  c001fc55e090  
GPR08:  c0b80b30 c0c962d8 0003845ffc5f 
GPR12:  cf33d000 c000b9e4  
GPR16:   0001  
GPR20: c0ccf750  c0c94d48 c001fc504000 
GPR24: c001fc504000 c001fecef848 c0c94d48 c0ccf000 
GPR28: c001fc522090 0010 c001fecef848 c001fed7fae0 
[0.100293] NIP [c00b7a64] .get_group+0x84/0xc4
[0.100307] LR [c00b7a40] .get_group+0x60/0xc4
[0.100318] Call Trace:
[0.100332] [c001fed7f9e0] [c00dbce4] .lock_is_held+0xa8/0xd0 
(unreliable)
[0.100354] [c001fed7fa70] [c00bf62c] 
.build_sched_domains+0x728/0xd14
[0.100375] [c001fed7fbe0] [c0af67bc] .sched_init_smp+0x4fc/0x654
[0.100394] [c001fed7fce0] [c0adce24] 
.kernel_init_freeable+0x17c/0x30c
[0.100413] [c001fed7fdb0] [c000ba08] .kernel_init+0x24/0x12c
[0.100431] [c001fed7fe30] [c0009f74] 
.ret_from_kernel_thread+0x5c/0x68
[0.100445] Instruction dump:
[0.100456] 38800010 38a0 4838e3f5 6000 7c6307b4 2fbf 419e0040 
3d220001 
[0.100496] 78601f24 39491590 e93e0008 7d6a002a 7d69582a f97f 7d4a002a 
e93e0010 
[0.100559] ---[ end trace 31fd0ba7d8756001 ]---

This patch moves the sibling map updates to before
notify_cpu_starting() and before setting the cpu online, and adds a
write barrier there to make sure the sibling maps are updated before
the active and online masks.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/smp.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7ac5e..c765937 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -637,12 +637,10 @@ __cpuinit void start_secondary(void *unused)
 
vdso_getcpu_init();
 #endif
-   notify_cpu_starting(cpu);
-   set_cpu_online(cpu, true);
/* Update sibling maps */
base = cpu_first_thread_sibling(cpu);
	for (i = 0; i < threads_per_core; i++) {
-   if (cpu_is_offline(base + i))
+   if (cpu_is_offline(base + i) && (cpu != base + i))
continue;
cpumask_set_cpu(cpu, cpu_sibling_mask(base + i));
cpumask_set_cpu(base + i, cpu_sibling_mask(cpu));
@@ -667,6 +665,10 @@ __cpuinit void start_secondary(void *unused)
}
of_node_put(l2_cache);
 
+   smp_wmb();
+   notify_cpu_starting(cpu);
+   set_cpu_online(cpu, true);

Re: [RFC PATCH v3 0/5] powerpc: Support context tracking for Power pSeries

2013-05-13 Thread Li Zhong
On Mon, 2013-05-13 at 15:51 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote:
  These patches try to support context tracking for Power arch, beginning with
  64-bit pSeries. The codes are ported from that of the x86_64, and in each
  patch, I listed the corresponding patch for x86.
 
 So that's yet another pile of bloat on all syscall entry/exit and
 exception entry/exit. What is it used for ? (I haven't followed on
 x86_64 side).

To my understanding, it is used to enable the RCU user extended
quiescent state, so that RCU on that cpu doesn't need scheduler ticks.
And together with some other code (already in 3.10), we are able to
remove the ticks in some cases (e.g. only 1 task running on the cpu,
with some other limitations).

Maybe Paul, or Frederic could give some better descriptions. 

Thanks, Zhong

 
 Cheers,
 Ben.
 
  v3:
  
  This version is mainly a rebasing, against 3.10-rc1, also as the common 
  code 
  to handle the exception are pulled into 3.10, so there is no dependency on 
  tip tree. So patch #2 and #6 in previous version_2 is merged together. 
  
  Li Zhong (5):
powerpc: Syscall hooks for context tracking subsystem
powerpc: Exception hooks for context tracking subsystem
powerpc: Exit user context on notify resume
powerpc: Use the new schedule_user API on userspace preemption
powerpc: select HAVE_CONTEXT_TRACKING for pSeries
  
   arch/powerpc/include/asm/context_tracking.h |   10 +++
   arch/powerpc/include/asm/thread_info.h  |7 ++-
   arch/powerpc/kernel/entry_64.S  |3 +-
   arch/powerpc/kernel/ptrace.c|5 ++
   arch/powerpc/kernel/signal.c|5 ++
   arch/powerpc/kernel/traps.c |   91 
  ---
   arch/powerpc/mm/fault.c |   16 -
   arch/powerpc/mm/hash_utils_64.c |   38 ---
   arch/powerpc/platforms/pseries/Kconfig  |1 +
   9 files changed, 140 insertions(+), 36 deletions(-)
   create mode 100644 arch/powerpc/include/asm/context_tracking.h
  
 
 






Re: [RFC PATCH v3 2/5] powerpc: Exception hooks for context tracking subsystem

2013-05-13 Thread Li Zhong
On Mon, 2013-05-13 at 15:57 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-05-13 at 13:21 +0800, Li Zhong wrote:
  int recover = 0;
  +   enum ctx_state prev_state;
  +
  +   prev_state = exception_enter();
 
 Please make it nicer:
 
   enum ctx_state prev_state = exception_enter();
   int recover = 0;

OK, I'll fix it, and all the others.

  __get_cpu_var(irq_stat).mce_exceptions++;
   
  @@ -683,7 +687,7 @@ void machine_check_exception(struct pt_regs *regs)
	recover = cur_cpu_spec->machine_check(regs);
   
	if (recover > 0)
  -   return;
  +   goto exit;
 
 I'm no fan of exit as a label as it's otherwise a libc function (I
 know there is no relationship here but I just find that annoying).
 
 I usually use bail

OK, I'll fix it, and all the others.

 
 #if defined(CONFIG_8xx) && defined(CONFIG_PCI)
  /* the qspan pci read routines can cause machine checks -- Cort
  @@ -693,20 +697,23 @@ void machine_check_exception(struct pt_regs *regs)
   * -- BenH
   */
	bad_page_fault(regs, regs->dar, SIGBUS);
  -   return;
  +   goto exit;
   #endif
   
  if (debugger_fault_handler(regs))
  -   return;
  +   goto exit;
   
  if (check_io_access(regs))
  -   return;
  +   goto exit;
   
	die("Machine check", regs, SIGBUS);
   
  /* Must die if the interrupt is not recoverable */
	if (!(regs->msr & MSR_RI))
		panic("Unrecoverable Machine check");
  +
  +exit:
  +   exception_exit(prev_state);
   }
   
   void SMIException(struct pt_regs *regs)
  @@ -716,20 +723,31 @@ void SMIException(struct pt_regs *regs)
   
   void unknown_exception(struct pt_regs *regs)
   {
  +   enum ctx_state prev_state;
  +   prev_state = exception_enter();
  +
 
 Same cosmetic comment for all other cases
 
  .../...
 
   #include asm/firmware.h
   #include asm/page.h
  @@ -193,8 +194,8 @@ static int mm_fault_error(struct pt_regs *regs, 
  unsigned long addr, int fault)
* The return value is 0 if the fault was handled, or the signal
* number if this is a kernel fault that can't be handled here.
*/
  -int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
  -   unsigned long error_code)
  +static int __kprobes __do_page_fault(struct pt_regs *regs,
  +   unsigned long address, unsigned long error_code)
   {
  struct vm_area_struct * vma;
	struct mm_struct *mm = current->mm;
  @@ -475,6 +476,17 @@ bad_area_nosemaphore:
   
   }
   
  +int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
  +   unsigned long error_code)
  +{
  +   int ret;
  +   enum ctx_state prev_state;
  +   prev_state = exception_enter();
  +   ret = __do_page_fault(regs, address, error_code);
  +   exception_exit(prev_state);
  +   return ret;
  +}
 
 Any reason why you didn't use the same wrapping trick in traps.c ? I'd
 rather we stay consistent in the way we do things between the two files.

OK, I'll make it consistent with those in traps.c

 
   /*
* bad_page_fault is called when we have a bad access from the kernel.
* It is called from the DSI and ISI handlers in head.S and from some
  diff --git a/arch/powerpc/mm/hash_utils_64.c 
  b/arch/powerpc/mm/hash_utils_64.c
  index 88ac0ee..d92fb26 100644
  --- a/arch/powerpc/mm/hash_utils_64.c
  +++ b/arch/powerpc/mm/hash_utils_64.c
  @@ -33,6 +33,7 @@
   #include linux/init.h
   #include linux/signal.h
   #include linux/memblock.h
  +#include linux/context_tracking.h
   
   #include asm/processor.h
   #include asm/pgtable.h
  @@ -962,6 +963,9 @@ int hash_page(unsigned long ea, unsigned long access, 
  unsigned long trap)
  const struct cpumask *tmp;
  int rc, user_region = 0, local = 0;
  int psize, ssize;
  +   enum ctx_state prev_state;
  +
  +   prev_state = exception_enter();
   
	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
  ea, access, trap);
  @@ -973,7 +977,8 @@ int hash_page(unsigned long ea, unsigned long access, 
  unsigned long trap)
  mm = current-mm;
  if (! mm) {
		DBG_LOW(" user region with no mm !\n");
  -   return 1;
  +   rc = 1;
  +   goto exit;
  }
  psize = get_slice_psize(mm, ea);
  ssize = user_segment_size(ea);
  @@ -992,19 +997,23 @@ int hash_page(unsigned long ea, unsigned long access, 
  unsigned long trap)
  /* Not a valid range
   * Send the problem up to do_page_fault 
   */
  -   return 1;
  +   rc = 1;
  +   goto exit;
  }
	DBG_LOW(" mm=%p, mm->pgdir=%p, vsid=%016lx\n", mm, mm->pgd, vsid);
   
  /* Bad address. */
  if (!vsid) {
		DBG_LOW("Bad address!\n");
  -   return 1;
  +   rc = 1;
  +   goto exit;
  }
  /* Get pgdir */
	pgdir = mm->pgd;
  -   if (pgdir

Re: [RFC PATCH v3 0/5] powerpc: Support context tracking for Power pSeries

2013-05-13 Thread Li Zhong
On Mon, 2013-05-13 at 18:59 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-05-13 at 16:03 +0800, Li Zhong wrote:
  
  To my understanding, it is used to enable RCU user extended quiescent
  state, so RCU on that cpu doesn't need scheduler ticks. And together
  with some other code(already in 3.10), we are able to remove the ticks
  in some cases (e.g. only 1 task running on the cpu, with some other
  limitations).
 
 Ok, sounds interesting. Once you fix the little cosmetic issue, I don't
 see any reason not to merge them as it's basically wiring up an existing
 feature (in that regard the patches are pretty straightforward) and I
 assume the overhead is only there when you enable it.

Thanks for your support, Ben.

Yes, there should be no overhead if not enabled. 

Thanks, Zhong

 
 Cheers,
 Ben.
 
 




Re: [RFC PATCH v3 2/5] powerpc: Exception hooks for context tracking subsystem

2013-05-13 Thread Li Zhong
On Mon, 2013-05-13 at 19:06 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-05-13 at 16:44 +0800, Li Zhong wrote:
  Yes, the above and hash_page() are two C functions for a same exception.
  And the exception hooks enable RCU usage in those C codes. But for asm
  codes, I think we could assume that there would be no RCU usage there,
  so we don't need wrap them in the hooks.
 
 hash_page() won't start a new RCU, at least not in its current incarnation,
 the only thing I can see it ever doing would be to take some RCU read locks 
 one
 day (it doesn't today).

Seems I added the hooks because of the trace point of hcall
entry/exit ...

Thanks, Zhong

 
 low_hash_fault() is a different beast. It will typically kill things, thus
 involving sending signals etc... RCU might well be involved.
 
 Cheers,
 Ben.
 
 




[RFC PATCH v4 0/5] powerpc: Support context tracking for Power pSeries

2013-05-13 Thread Li Zhong
These patches try to support context tracking for Power arch, beginning with
64-bit pSeries. The codes are ported from that of the x86_64, and in each
patch, I listed the corresponding patch for x86.

v4:
fixed some cosmetic issues suggested by Ben.

Li Zhong (5):
  powerpc: Syscall hooks for context tracking subsystem
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Exit user context on notify resume
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries

 arch/powerpc/include/asm/context_tracking.h |   10 
 arch/powerpc/include/asm/thread_info.h  |7 ++-
 arch/powerpc/kernel/entry_64.S  |3 +-
 arch/powerpc/kernel/ptrace.c|5 ++
 arch/powerpc/kernel/signal.c|5 ++
 arch/powerpc/kernel/traps.c |   80 +++
 arch/powerpc/mm/fault.c |   41 +-
 arch/powerpc/mm/hash_utils_64.c |   36 +---
 arch/powerpc/platforms/pseries/Kconfig  |1 +
 9 files changed, 140 insertions(+), 48 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

-- 
1.7.9.5



[RFC PATCH v4 1/5] powerpc: Syscall hooks for context tracking subsystem

2013-05-13 Thread Li Zhong
This is the syscall slow path hooks for context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
  commit bf5a3c13b939813d28ce26c01425054c740d6731

TIF_MEMDIE is moved to the second 16 bits (to bit 17), as it seems there
is no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it
is better for it to stay in the same low 16 bits as the others in the
group, so that in the asm code, a single andi. against this group still
works.
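
The constraint here is that andi. only takes a 16-bit unsigned
immediate. A hypothetical compile-time check, added here purely to
illustrate the point (it is not part of the patch):

#include <linux/bug.h>
#include <asm/thread_info.h>

static inline void check_tif_group_fits_andi(void)
{
	/* "andi. r0,r9,_TIF_SYSCALL_T_OR_A" can only encode a mask in
	 * the low 16 bits, so every flag in the group must stay below
	 * bit 16; TIF_MEMDIE is never tested from asm, so it can move
	 * up to bit 17 to free bit 9 for TIF_NOHZ. */
	BUILD_BUG_ON(_TIF_SYSCALL_T_OR_A & ~0xffffUL);
}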

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Acked-by: Frederic Weisbecker fweis...@gmail.com
---
 arch/powerpc/include/asm/thread_info.h |7 +--
 arch/powerpc/kernel/ptrace.c   |5 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 8ceea14..ba7b197 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_PERFMON_CTXSW  6   /* perfmon needs ctxsw calls */
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SINGLESTEP 8   /* singlestepping active */
-#define TIF_MEMDIE 9   /* is terminating due to OOM killer */
+#define TIF_NOHZ   9   /* in adaptive nohz mode */
 #define TIF_SECCOMP10  /* secure computing */
 #define TIF_RESTOREALL 11  /* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR12  /* Force successful syscall return */
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_TRACEPOINT 15  /* syscall tracepoint instrumentation */
 #define TIF_EMULATE_STACK_STORE16  /* Is an instruction emulation
for stack store? */
+#define TIF_MEMDIE 17  /* is terminating due to OOM killer */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -124,8 +125,10 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_UPROBE		(1<<TIF_UPROBE)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_NOHZ		(1<<TIF_NOHZ)
 #define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
+				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
+				 _TIF_NOHZ)
 
 #define _TIF_USER_WORK_MASK(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 _TIF_NOTIFY_RESUME | _TIF_UPROBE)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 3b14d32..98c2fc1 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <trace/syscall.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/perf_event.h>
+#include <linux/context_tracking.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -1788,6 +1789,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
long ret = 0;
 
+   user_exit();
+
	secure_computing_strict(regs->gpr[0]);
 
	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1832,4 +1835,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, step);
+
+   user_enter();
 }
-- 
1.7.9.5



[RFC PATCH v4 2/5] powerpc: Exception hooks for context tracking subsystem

2013-05-13 Thread Li Zhong
This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
  commit 6ba3c97a38803883c2eee489505796cb0a727122

But after the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7c8734f055da2a6bac553caf4a468eb
  context_tracking: Move exception handling to generic code
6c1e0256fad84a843d915414e4b5973b7443d48d
  context_tracking: Restore correct previous context state on exception exit

it is able for exception hooks to use the generic code above instead of a
redundant arch implementation.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/traps.c |   80 ---
 arch/powerpc/mm/fault.c |   41 +---
 arch/powerpc/mm/hash_utils_64.c |   36 +-
 3 files changed, 112 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 83efa2f..a7a648f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -35,6 +35,7 @@
 #include <linux/kdebug.h>
 #include <linux/debugfs.h>
 #include <linux/ratelimit.h>
+#include <linux/context_tracking.h>
 
 #include <asm/emulated_ops.h>
 #include <asm/pgtable.h>
@@ -667,6 +668,7 @@ int machine_check_generic(struct pt_regs *regs)
 
 void machine_check_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state = exception_enter();
int recover = 0;
 
__get_cpu_var(irq_stat).mce_exceptions++;
@@ -683,7 +685,7 @@ void machine_check_exception(struct pt_regs *regs)
	recover = cur_cpu_spec->machine_check(regs);
 
	if (recover > 0)
-   return;
+   goto bail;
 
 #if defined(CONFIG_8xx) && defined(CONFIG_PCI)
/* the qspan pci read routines can cause machine checks -- Cort
@@ -693,20 +695,23 @@ void machine_check_exception(struct pt_regs *regs)
 * -- BenH
 */
	bad_page_fault(regs, regs->dar, SIGBUS);
-   return;
+   goto bail;
 #endif
 
if (debugger_fault_handler(regs))
-   return;
+   goto bail;
 
if (check_io_access(regs))
-   return;
+   goto bail;
 
	die("Machine check", regs, SIGBUS);
 
/* Must die if the interrupt is not recoverable */
	if (!(regs->msr & MSR_RI))
		panic("Unrecoverable Machine check");
+
+bail:
+   exception_exit(prev_state);
 }
 
 void SMIException(struct pt_regs *regs)
@@ -716,20 +721,29 @@ void SMIException(struct pt_regs *regs)
 
 void unknown_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state = exception_enter();
+
	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
	       regs->nip, regs->msr, regs->trap);
 
_exception(SIGTRAP, regs, 0, 0);
+
+   exception_exit(prev_state);
 }
 
 void instruction_breakpoint_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state = exception_enter();
+
	if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5,
				5, SIGTRAP) == NOTIFY_STOP)
-   return;
+   goto bail;
if (debugger_iabr_match(regs))
-   return;
+   goto bail;
	_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
+
+bail:
+   exception_exit(prev_state);
 }
 
 void RunModeException(struct pt_regs *regs)
@@ -739,15 +753,20 @@ void RunModeException(struct pt_regs *regs)
 
 void __kprobes single_step_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state = exception_enter();
+
clear_single_step(regs);
 
	if (notify_die(DIE_SSTEP, "single_step", regs, 5,
				5, SIGTRAP) == NOTIFY_STOP)
-   return;
+   goto bail;
if (debugger_sstep(regs))
-   return;
+   goto bail;
 
	_exception(SIGTRAP, regs, TRAP_TRACE, regs->nip);
+
+bail:
+   exception_exit(prev_state);
 }
 
 /*
@@ -1005,6 +1024,7 @@ int is_valid_bugaddr(unsigned long addr)
 
 void __kprobes program_check_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state = exception_enter();
unsigned int reason = get_reason(regs);
extern int do_mathemu(struct pt_regs *regs);
 
@@ -1014,26 +1034,26 @@ void __kprobes program_check_exception(struct pt_regs 
*regs)
	if (reason & REASON_FP) {
/* IEEE FP exception */
parse_fpe(regs);
-   return;
+   goto bail;
}
	if (reason & REASON_TRAP) {
/* Debugger is first in line to stop recursive faults in
 * rcu_lock, notify_die, or atomic_notifier_call_chain */
if (debugger_bpt(regs

[RFC PATCH v4 3/5] powerpc: Exit user context on notify resume

2013-05-13 Thread Li Zhong
This patch allows RCU usage in do_notify_resume, e.g. signal handling.
It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
  commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/signal.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index cf12eae..d63b502 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -13,6 +13,7 @@
 #include <linux/signal.h>
 #include <linux/uprobes.h>
 #include <linux/key.h>
+#include <linux/context_tracking.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -159,6 +160,8 @@ static int do_signal(struct pt_regs *regs)
 
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 {
+   user_exit();
+
	if (thread_info_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
 
@@ -169,4 +172,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long 
thread_info_flags)
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+
+   user_enter();
 }
-- 
1.7.9.5



[RFC PATCH v4 4/5] powerpc: Use the new schedule_user API on userspace preemption

2013-05-13 Thread Li Zhong
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
  commit 0430499ce9d78691f3985962021b16bf8f8a8048
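
For readers unfamiliar with it, schedule_user() is provided by the
generic code; a simplified sketch of what it does (modelled on
kernel/sched/core.c of this era, so treat it as illustrative):

asmlinkage void __sched schedule_user(void)
{
	user_exit();	/* leave the user extended quiescent state */
	schedule();
	user_enter();	/* re-enter it before returning to userspace */
}

so the asm side only needs to pick the right function to call.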

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/context_tracking.h |   10 ++
 arch/powerpc/kernel/entry_64.S  |3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

diff --git a/arch/powerpc/include/asm/context_tracking.h 
b/arch/powerpc/include/asm/context_tracking.h
new file mode 100644
index 000..b6f5a33
--- /dev/null
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -0,0 +1,10 @@
+#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
+#define _ASM_POWERPC_CONTEXT_TRACKING_H
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define SCHEDULE_USER bl   .schedule_user
+#else
+#define SCHEDULE_USER bl   .schedule
+#endif
+
+#endif
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 915fbb4..d418977 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -33,6 +33,7 @@
 #include <asm/irqflags.h>
 #include <asm/ftrace.h>
 #include <asm/hw_irq.h>
+#include <asm/context_tracking.h>
 
 /*
  * System calls.
@@ -634,7 +635,7 @@ _GLOBAL(ret_from_except_lite)
andi.   r0,r4,_TIF_NEED_RESCHED
beq 1f
bl  .restore_interrupts
-   bl  .schedule
+   SCHEDULE_USER
b   .ret_from_except_lite
 
 1: bl  .save_nvgprs
-- 
1.7.9.5



[RFC PATCH v4 5/5] powerpc: select HAVE_CONTEXT_TRACKING for pSeries

2013-05-13 Thread Li Zhong
Start context tracking support from pSeries.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 9a0941b..023b288 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -18,6 +18,7 @@ config PPC_PSERIES
select PPC_PCI_CHOICE if EXPERT
select ZLIB_DEFLATE
select PPC_DOORBELL
+   select HAVE_CONTEXT_TRACKING
default y
 
 config PPC_SPLPAR
-- 
1.7.9.5



[RFC PATCH v3 0/5] powerpc: Support context tracking for Power pSeries

2013-05-12 Thread Li Zhong
These patches try to support context tracking for Power arch, beginning with
64-bit pSeries. The codes are ported from that of the x86_64, and in each
patch, I listed the corresponding patch for x86.

v3:

This version is mainly a rebasing, against 3.10-rc1, also as the common code 
to handle the exception are pulled into 3.10, so there is no dependency on 
tip tree. So patch #2 and #6 in previous version_2 is merged together. 

Li Zhong (5):
  powerpc: Syscall hooks for context tracking subsystem
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Exit user context on notify resume
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries

 arch/powerpc/include/asm/context_tracking.h |   10 +++
 arch/powerpc/include/asm/thread_info.h  |7 ++-
 arch/powerpc/kernel/entry_64.S  |3 +-
 arch/powerpc/kernel/ptrace.c|5 ++
 arch/powerpc/kernel/signal.c|5 ++
 arch/powerpc/kernel/traps.c |   91 ---
 arch/powerpc/mm/fault.c |   16 -
 arch/powerpc/mm/hash_utils_64.c |   38 ---
 arch/powerpc/platforms/pseries/Kconfig  |1 +
 9 files changed, 140 insertions(+), 36 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

-- 
1.7.9.5



[RFC PATCH v3 1/5] powerpc: Syscall hooks for context tracking subsystem

2013-05-12 Thread Li Zhong
This is the syscall slow path hooks for context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
  commit bf5a3c13b939813d28ce26c01425054c740d6731

TIF_MEMDIE is moved to the second 16 bits (to bit 17), as it seems there
is no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it
is better for it to stay in the same low 16 bits as the others in the
group, so that in the asm code, a single andi. against this group still
works.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Acked-by: Frederic Weisbecker fweis...@gmail.com
---
 arch/powerpc/include/asm/thread_info.h |7 +--
 arch/powerpc/kernel/ptrace.c   |5 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 8ceea14..ba7b197 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_PERFMON_CTXSW  6   /* perfmon needs ctxsw calls */
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SINGLESTEP 8   /* singlestepping active */
-#define TIF_MEMDIE 9   /* is terminating due to OOM killer */
+#define TIF_NOHZ   9   /* in adaptive nohz mode */
 #define TIF_SECCOMP10  /* secure computing */
 #define TIF_RESTOREALL 11  /* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR12  /* Force successful syscall return */
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_TRACEPOINT 15  /* syscall tracepoint instrumentation */
 #define TIF_EMULATE_STACK_STORE16  /* Is an instruction emulation
for stack store? */
+#define TIF_MEMDIE 17  /* is terminating due to OOM killer */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -124,8 +125,10 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_UPROBE		(1<<TIF_UPROBE)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_NOHZ		(1<<TIF_NOHZ)
 #define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
+				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
+				 _TIF_NOHZ)
 
 #define _TIF_USER_WORK_MASK(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 _TIF_NOTIFY_RESUME | _TIF_UPROBE)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 3b14d32..98c2fc1 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <trace/syscall.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/perf_event.h>
+#include <linux/context_tracking.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -1788,6 +1789,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
long ret = 0;
 
+   user_exit();
+
	secure_computing_strict(regs->gpr[0]);
 
	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1832,4 +1835,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, step);
+
+   user_enter();
 }
-- 
1.7.9.5



[RFC PATCH v3 2/5] powerpc: Exception hooks for context tracking subsystem

2013-05-12 Thread Li Zhong
This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
  commit 6ba3c97a38803883c2eee489505796cb0a727122

But after the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7c8734f055da2a6bac553caf4a468eb
  context_tracking: Move exception handling to generic code
6c1e0256fad84a843d915414e4b5973b7443d48d
  context_tracking: Restore correct previous context state on exception exit

it is able for exception hooks to use the generic code above instead of a
redundant arch implementation.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/traps.c |   91 +--
 arch/powerpc/mm/fault.c |   16 ++-
 arch/powerpc/mm/hash_utils_64.c |   38 
 3 files changed, 112 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 83efa2f..9d3c000 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -35,6 +35,7 @@
 #include <linux/kdebug.h>
 #include <linux/debugfs.h>
 #include <linux/ratelimit.h>
+#include <linux/context_tracking.h>
 
 #include <asm/emulated_ops.h>
 #include <asm/pgtable.h>
@@ -668,6 +669,9 @@ int machine_check_generic(struct pt_regs *regs)
 void machine_check_exception(struct pt_regs *regs)
 {
int recover = 0;
+   enum ctx_state prev_state;
+
+   prev_state = exception_enter();
 
__get_cpu_var(irq_stat).mce_exceptions++;
 
@@ -683,7 +687,7 @@ void machine_check_exception(struct pt_regs *regs)
	recover = cur_cpu_spec->machine_check(regs);
 
	if (recover > 0)
-   return;
+   goto exit;
 
 #if defined(CONFIG_8xx) && defined(CONFIG_PCI)
/* the qspan pci read routines can cause machine checks -- Cort
@@ -693,20 +697,23 @@ void machine_check_exception(struct pt_regs *regs)
 * -- BenH
 */
	bad_page_fault(regs, regs->dar, SIGBUS);
-   return;
+   goto exit;
 #endif
 
if (debugger_fault_handler(regs))
-   return;
+   goto exit;
 
if (check_io_access(regs))
-   return;
+   goto exit;
 
	die("Machine check", regs, SIGBUS);
 
/* Must die if the interrupt is not recoverable */
	if (!(regs->msr & MSR_RI))
		panic("Unrecoverable Machine check");
+
+exit:
+   exception_exit(prev_state);
 }
 
 void SMIException(struct pt_regs *regs)
@@ -716,20 +723,31 @@ void SMIException(struct pt_regs *regs)
 
 void unknown_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state;
+   prev_state = exception_enter();
+
	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
	       regs->nip, regs->msr, regs->trap);
 
_exception(SIGTRAP, regs, 0, 0);
+
+   exception_exit(prev_state);
 }
 
 void instruction_breakpoint_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state;
+   prev_state = exception_enter();
+
	if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5,
				5, SIGTRAP) == NOTIFY_STOP)
-   return;
+   goto exit;
if (debugger_iabr_match(regs))
-   return;
+   goto exit;
	_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
+
+exit:
+   exception_exit(prev_state);
 }
 
 void RunModeException(struct pt_regs *regs)
@@ -739,15 +757,21 @@ void RunModeException(struct pt_regs *regs)
 
 void __kprobes single_step_exception(struct pt_regs *regs)
 {
+   enum ctx_state prev_state;
+   prev_state = exception_enter();
+
clear_single_step(regs);
 
	if (notify_die(DIE_SSTEP, "single_step", regs, 5,
				5, SIGTRAP) == NOTIFY_STOP)
-   return;
+   goto exit;
if (debugger_sstep(regs))
-   return;
+   goto exit;
 
	_exception(SIGTRAP, regs, TRAP_TRACE, regs->nip);
+
+exit:
+   exception_exit(prev_state);
 }
 
 /*
@@ -1006,34 +1030,37 @@ int is_valid_bugaddr(unsigned long addr)
 void __kprobes program_check_exception(struct pt_regs *regs)
 {
unsigned int reason = get_reason(regs);
+   enum ctx_state prev_state;
extern int do_mathemu(struct pt_regs *regs);
 
+   prev_state = exception_enter();
+
/* We can now get here via a FP Unavailable exception if the core
 * has no FPU, in that case the reason flags will be 0 */
 
	if (reason & REASON_FP) {
/* IEEE FP exception */
parse_fpe(regs);
-   return;
+   goto exit;
}
	if (reason & REASON_TRAP) {
/* Debugger is first in line to stop recursive

[RFC PATCH v3 3/5] powerpc: Exit user context on notify resume

2013-05-12 Thread Li Zhong
This patch allows RCU usage in do_notify_resume, e.g. signal handling.
It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
  commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/signal.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index cf12eae..d63b502 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -13,6 +13,7 @@
 #include <linux/signal.h>
 #include <linux/uprobes.h>
 #include <linux/key.h>
+#include <linux/context_tracking.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -159,6 +160,8 @@ static int do_signal(struct pt_regs *regs)
 
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 {
+   user_exit();
+
	if (thread_info_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
 
@@ -169,4 +172,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long 
thread_info_flags)
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+
+   user_enter();
 }
-- 
1.7.9.5



[RFC PATCH v3 4/5] powerpc: Use the new schedule_user API on userspace preemption

2013-05-12 Thread Li Zhong
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
  commit 0430499ce9d78691f3985962021b16bf8f8a8048

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/context_tracking.h |   10 ++
 arch/powerpc/kernel/entry_64.S  |3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

diff --git a/arch/powerpc/include/asm/context_tracking.h 
b/arch/powerpc/include/asm/context_tracking.h
new file mode 100644
index 000..b6f5a33
--- /dev/null
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -0,0 +1,10 @@
+#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
+#define _ASM_POWERPC_CONTEXT_TRACKING_H
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define SCHEDULE_USER bl   .schedule_user
+#else
+#define SCHEDULE_USER bl   .schedule
+#endif
+
+#endif
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 915fbb4..d418977 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -33,6 +33,7 @@
 #include <asm/irqflags.h>
 #include <asm/ftrace.h>
 #include <asm/hw_irq.h>
+#include <asm/context_tracking.h>
 
 /*
  * System calls.
@@ -634,7 +635,7 @@ _GLOBAL(ret_from_except_lite)
andi.   r0,r4,_TIF_NEED_RESCHED
beq 1f
bl  .restore_interrupts
-   bl  .schedule
+   SCHEDULE_USER
b   .ret_from_except_lite
 
 1: bl  .save_nvgprs
-- 
1.7.9.5



[RFC PATCH v3 5/5] powerpc: select HAVE_CONTEXT_TRACKING for pSeries

2013-05-12 Thread Li Zhong
Start context tracking support from pSeries.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 9a0941b..023b288 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -18,6 +18,7 @@ config PPC_PSERIES
select PPC_PCI_CHOICE if EXPERT
select ZLIB_DEFLATE
select PPC_DOORBELL
+   select HAVE_CONTEXT_TRACKING
default y
 
 config PPC_SPLPAR
-- 
1.7.9.5



Re: [PATCH] powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again

2013-05-07 Thread Li Zhong
On Tue, 2013-05-07 at 10:32 +0800, Li Zhong wrote:
 Saw this warning again, and this time from the ret_from_fork path. 
 
 It seems we could clear the back chain earlier in copy_thread(), which
 could cover both path, and also fix potential lockdep usage in
 schedule_tail(), or exception occurred before we clear the back chain.

Sorry, I made some mistake, please ignore this patch...
It seems clearing the back chain shouldn't use kregs->gpr[1] below; it
should be ((unsigned long *)sp)[0]. I'll send an updated version.

Thanks, Zhong

 
 Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
 ---
  arch/powerpc/kernel/entry_32.S |2 --
  arch/powerpc/kernel/entry_64.S |2 --
  arch/powerpc/kernel/process.c  |1 +
  3 files changed, 1 insertions(+), 4 deletions(-)
 
 diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
 index e514de5..d22e73e 100644
 --- a/arch/powerpc/kernel/entry_32.S
 +++ b/arch/powerpc/kernel/entry_32.S
 @@ -439,8 +439,6 @@ ret_from_fork:
  ret_from_kernel_thread:
   REST_NVGPRS(r1)
   bl  schedule_tail
 - li  r3,0
 - stw r3,0(r1)
   mtlr	r14
   mr  r3,r15
   PPC440EP_ERR42
 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
 index 3fe5259..48e8a86 100644
 --- a/arch/powerpc/kernel/entry_64.S
 +++ b/arch/powerpc/kernel/entry_64.S
 @@ -376,8 +376,6 @@ _GLOBAL(ret_from_fork)
  _GLOBAL(ret_from_kernel_thread)
   bl  .schedule_tail
   REST_NVGPRS(r1)
 - li  r3,0
 - std r3,0(r1)
   ld  r14, 0(r14)
   mtlr	r14
   mr  r3,r15
 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
 index ceb4e7b..2c9fc5e 100644
 --- a/arch/powerpc/kernel/process.c
 +++ b/arch/powerpc/kernel/process.c
 @@ -973,6 +973,7 @@ int copy_thread(unsigned long clone_flags, unsigned long 
 usp,
*/
   sp -= sizeof(struct pt_regs);
   kregs = (struct pt_regs *) sp;
 + kregs->gpr[1] = 0;
   sp -= STACK_FRAME_OVERHEAD;
   p->thread.ksp = sp;
   p->thread.ksp_limit = (unsigned long)task_stack_page(p) +




[PATCH v2] powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again

2013-05-07 Thread Li Zhong
Saw this warning again, and this time from the ret_from_fork path. 

It seems we could clear the back chain earlier, in copy_thread(), which
covers both paths, and also covers lockdep usage in schedule_tail() or
in an exception taken before the back chain used to be cleared.
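
For context, the stack walker follows the back chain until it reads a
zero, so a stale non-zero word at the base of a new thread's stack can
send the walk into garbage and blow past MAX_STACK_TRACE_ENTRIES. A
rough sketch of the walk, modelled loosely on
arch/powerpc/kernel/stacktrace.c (record_ip() stands in for the real
bookkeeping):

static void walk_stack(unsigned long sp)
{
	while (validate_sp(sp, current, STACK_FRAME_OVERHEAD)) {
		unsigned long *stack = (unsigned long *)sp;

		record_ip(stack[STACK_FRAME_LR_SAVE]);
		sp = stack[0];	/* back chain; zero terminates the walk */
	}
}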

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
--- 
 arch/powerpc/kernel/entry_32.S |2 --
 arch/powerpc/kernel/entry_64.S |2 --
 arch/powerpc/kernel/process.c  |1 +
 3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e514de5..d22e73e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -439,8 +439,6 @@ ret_from_fork:
 ret_from_kernel_thread:
REST_NVGPRS(r1)
bl  schedule_tail
-   li  r3,0
-   stw r3,0(r1)
	mtlr	r14
mr  r3,r15
PPC440EP_ERR42
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3fe5259..48e8a86 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -376,8 +376,6 @@ _GLOBAL(ret_from_fork)
 _GLOBAL(ret_from_kernel_thread)
bl  .schedule_tail
REST_NVGPRS(r1)
-   li  r3,0
-   std r3,0(r1)
ld  r14, 0(r14)
	mtlr	r14
mr  r3,r15
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ceb4e7b..80af366 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -971,6 +971,7 @@ int copy_thread(unsigned long clone_flags, unsigned long 
usp,
 * do some house keeping and then return from the fork or clone
 * system call, using the stack frame created above.
 */
+   ((unsigned long *)sp)[0] = 0;
sp -= sizeof(struct pt_regs);
kregs = (struct pt_regs *) sp;
sp -= STACK_FRAME_OVERHEAD;




[PATCH] powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again

2013-05-06 Thread Li Zhong
Saw this warning again, and this time from the ret_from_fork path. 

It seems we could clear the back chain earlier, in copy_thread(), which
covers both paths, and also covers lockdep usage in schedule_tail() or
in an exception taken before the back chain used to be cleared.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/entry_32.S |2 --
 arch/powerpc/kernel/entry_64.S |2 --
 arch/powerpc/kernel/process.c  |1 +
 3 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e514de5..d22e73e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -439,8 +439,6 @@ ret_from_fork:
 ret_from_kernel_thread:
REST_NVGPRS(r1)
bl  schedule_tail
-   li  r3,0
-   stw r3,0(r1)
	mtlr	r14
mr  r3,r15
PPC440EP_ERR42
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3fe5259..48e8a86 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -376,8 +376,6 @@ _GLOBAL(ret_from_fork)
 _GLOBAL(ret_from_kernel_thread)
bl  .schedule_tail
REST_NVGPRS(r1)
-   li  r3,0
-   std r3,0(r1)
ld  r14, 0(r14)
	mtlr	r14
mr  r3,r15
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ceb4e7b..2c9fc5e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -973,6 +973,7 @@ int copy_thread(unsigned long clone_flags, unsigned long 
usp,
 */
sp -= sizeof(struct pt_regs);
kregs = (struct pt_regs *) sp;
+   kregs->gpr[1] = 0;
sp -= STACK_FRAME_OVERHEAD;
	p->thread.ksp = sp;
	p->thread.ksp_limit = (unsigned long)task_stack_page(p) +
-- 
1.7.1



Re: [RFC PATCH powerpc] make CONFIG_NUMA depends on CONFIG_SMP

2013-04-18 Thread Li Zhong
On Thu, 2013-04-18 at 11:46 +1000, Michael Ellerman wrote:
 On Wed, May 30, 2012 at 05:31:58PM +0800, Li Zhong wrote:
  I'm not sure whether it makes sense to add this dependency to avoid
 CONFIG_NUMA && !CONFIG_SMP.
  
  I want to do this because I saw some build errors on next-tree when
  compiling with CONFIG_SMP disabled, and it seems they are caused by some
  codes under the CONFIG_NUMA #ifdefs.  
 
 This seems to make sense to me. Can you please repost with a better
 changelog and a description of the actual build error you were seeing.

I tried it today, but didn't find any build errors any more, guess those
errors should have already been fixed. 

But it seems to me that disabling CONFIG_NUMA when CONFIG_SMP is
disabled could at least prevent some unnecessary code from being
compiled into the kernel. (After building a kernel with/without
CONFIG_NUMA just now, the vmlinux is ~100K smaller without CONFIG_NUMA.)

I'm not sure whether this is still needed. 

Thanks, Zhong

 
 cheers
 




Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()

2013-04-15 Thread Li Zhong
On Mon, 2013-04-15 at 08:56 +0200, Benjamin Herrenschmidt wrote:
 On Mon, 2013-04-15 at 13:50 +1000, Paul Mackerras wrote:
  On Fri, Apr 12, 2013 at 10:16:59AM +0800, Li Zhong wrote:
   It seems that in kernel_unmap_linear_page(), it only checks whether there
   is a map in the linear_map_hash_slots array, so seems we don't need bolt
   the hpte.
  

Hi Paul, Ben

Thank you both for the comments and detailed information. I'll keep it
bolted in the next version. If you have time, please help to check
whether my understanding below is correct.

Thanks, Zhong

  I don't exactly understand your rationale here, but I don't think it's
  safe not to have linear mapping pages bolted.  Basically, if a page
  will be used in the process of calling hash_page to demand-fault an
  HPTE into the hash table, then that page needs to be bolted, otherwise
  we can get an infinite recursion of HPT misses.  

So the infinite recursion happens like below?

fault for PAGE A

hash_page for PAGE A 

some page B needed by hash_page processing removed by others,
before inserting the HPTE

fault for PAGE B 

hash_page for PAGE B and recursion for ever


 That includes all
  kernel stack pages, among other things, so I think we need to leave
  the HPTE_V_BOLTED in there.
 
 I suspect Li's confusion comes from the fact that he doesn't realizes
 that we might evict random hash slots. If the linear mapping hash
 entries could only be thrown out via kernel_unmap_linear_page() then his
 comment would make sense. However this isn't the case.
 
 Li: When faulting something in, if both the primary and secondary
 buckets are full, we somewhat randomly evict the content of a slot and
 replace it. However we only do that on non-bolted slots.

So this eviction is implemented in ppc_md.hpte_remove(), which may be
called by __hash_huge_page() and by the asm code at
htab_call_hpte_remove?

 This is why the linear mapping (and the vmemmap) must be bolted.

If not, it would result the infinite recursion like above?

 Cheers,
 Ben.
 
 




Re: [RFC PATCH v2 1/4] powerpc: Move the setting of rflags out of loop in __hash_page_huge

2013-04-15 Thread Li Zhong
On Mon, 2013-04-15 at 16:32 +1000, Michael Ellerman wrote:
 On Fri, Apr 12, 2013 at 10:16:57AM +0800, Li Zhong wrote:
  It seems that rflags don't get changed in  the repeating loop, so move
  it out of the loop.

 You've also changed the way new_pte is handled on repeat, but I think
 that's OK too.

OK, I'll add it in the description :)

Thanks, Zhong

 cheers
 
  diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
  b/arch/powerpc/mm/hugetlbpage-hash64.c
  index cecad34..edb4129 100644
  --- a/arch/powerpc/mm/hugetlbpage-hash64.c
  +++ b/arch/powerpc/mm/hugetlbpage-hash64.c
  @@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long 
  access, unsigned long vsid,
   
	pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
   
  -repeat:
-   hpte_group = ((hash & htab_hash_mask) *
- HPTES_PER_GROUP) & ~0x7UL;
  -
  /* clear HPTE slot informations in new PTE */
   #ifdef CONFIG_PPC_64K_PAGES
	new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;
 
 ie. here new_pte was updated on repeat, but now it's not.
 
  @@ -101,6 +97,10 @@ repeat:
	rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
_PAGE_COHERENT | _PAGE_GUARDED));
   
  +repeat:
+   hpte_group = ((hash & htab_hash_mask) *
+ HPTES_PER_GROUP) & ~0x7UL;
  +
  /* Insert into the hash table, primary slot */
  slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
mmu_psize, ssize);
 




Re: [RFC PATCH v2 4/4] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()

2013-04-15 Thread Li Zhong
On Mon, 2013-04-15 at 16:36 +1000, Michael Ellerman wrote:
 On Fri, Apr 12, 2013 at 10:17:00AM +0800, Li Zhong wrote:
This patch tries to fix the following issue seen when
CONFIG_DEBUG_PAGEALLOC is enabled:
 
 Please include a better changelog.

OK, I'll use the following as a template, thank you for the suggestion. 

 
 This patch does fix (I hope?) the following oops, caused by xxx. Reproducible
 by doing yyy.
 
 cheers
 
  
  [  543.075675] [ cut here ]
  [  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
  [  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
  [  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
  [  543.075741] Modules linked in: binfmt_misc ehea
  [  543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: 
  c005a594
  [  543.075771] REGS: c000a90832c0 TRAP: 0700   Not tainted  
  (3.8.0-next-20130222)
  [  543.075781] MSR: 80029032 SF,EE,ME,IR,DR,RI  CR: 4482  
  XER: 
  [  543.075816] SOFTE: 0
  [  543.075823] CFAR: c004c200
  [  543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: 
  c000a908 CPU: 1
  GPR00: 0001 c000a9083540 c0c600a8 
  GPR04: 0050 fffa c000a90834e0 004ff594
  GPR08: 0001  9592d4d8 c0c86854
  GPR12: 0002 c6ead300 00a51000 0001
  GPR16: f3354380  ff80 
  GPR20: 0001 c0c600a8 0001 0001
  GPR24: 03354380 c000  c0b65950
  GPR28: 0020 000cd50e 00bf50d9 c0c7c230
  [  543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
  [  543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
  [  543.076025] Call Trace:
  [  543.076033] [c000a9083540] [c0036ea4] 
  .kernel_map_pages+0x1d4/0x3f8 (unreliable)
  [  543.076053] [c000a9083640] [c0167638] 
  .get_page_from_freelist+0x6cc/0x8dc
  [  543.076067] [c000a9083800] [c0167a48] 
  .__alloc_pages_nodemask+0x200/0x96c
  [  543.076082] [c000a90839c0] [c01ade44] 
  .alloc_pages_vma+0x160/0x1e4
  [  543.076098] [c000a9083a80] [c018ce04] 
  .handle_pte_fault+0x1b0/0x7e8
  [  543.076113] [c000a9083b50] [c018d5a8] 
  .handle_mm_fault+0x16c/0x1a0
  [  543.076129] [c000a9083c00] [c07bf1dc] 
  .do_page_fault+0x4d0/0x7a4
  [  543.076144] [c000a9083e30] [c00090e8] 
  handle_page_fault+0x10/0x30
  [  543.076155] Instruction dump:
  [  543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de 
  e96a0010 e84a0008
  [  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 0b00 7f63db78 
  48785781 6000
  [  543.076224] ---[ end trace bd5807e8d6ae186b ]---
  
  Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
  ---
   arch/powerpc/mm/hash_utils_64.c |   10 ++
   1 file changed, 6 insertions(+), 4 deletions(-)
  
  diff --git a/arch/powerpc/mm/hash_utils_64.c 
  b/arch/powerpc/mm/hash_utils_64.c
  index a7f54f0..4b449a0 100644
  --- a/arch/powerpc/mm/hash_utils_64.c
  +++ b/arch/powerpc/mm/hash_utils_64.c
  @@ -1272,7 +1272,7 @@ static void kernel_map_linear_page(unsigned long 
  vaddr, unsigned long lmi)
  unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
  unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
  unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL);
  -   int ret;
  +   long ret;
   
  hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
  @@ -1280,9 +1280,11 @@ static void kernel_map_linear_page(unsigned long 
  vaddr, unsigned long lmi)
  /* Don't create HPTE entries for bad address */
  if (!vsid)
  return;
  -   ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
  -mode, 0,
  -mmu_linear_psize, mmu_kernel_ssize);
  +
  +   ret = hpte_insert_repeating(hash, vpn, __pa(vaddr),
  +   mode, mmu_linear_psize,
  +   mmu_kernel_ssize);
  +
	BUG_ON (ret < 0);
  spin_lock(linear_map_hash_lock);
	BUG_ON(linear_map_hash_slots[lmi] & 0x80);
  -- 
  1.7.9.5
  
 




Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()

2013-04-15 Thread Li Zhong
On Mon, 2013-04-15 at 13:27 +0200, Benjamin Herrenschmidt wrote:
 On Mon, 2013-04-15 at 16:15 +0800, Li Zhong wrote:
 
  So the code is implemented in ppc_md.hpte_remove(), may be called by
  __hash_huge_page(), and asm code htab_call_hpte_remove?
  
   This is why the linear mapping (and the vmemmap) must be bolted.
  
  If not, it would result the infinite recursion like above?
 
 Potentially, we don't expect to fault linear mapping or vmemmap entries
 on demand. We aren't equipped to do it and we occasionally have code
 path that access the linear mapping and cannot afford to have SRR0 and
 SRR1 clobbered by a page fault.

Thank you for the education :)

Thanks, Zhong

 
 Cheers,
 Ben.
 
 




[RFC PATCH v3 1/3] powerpc: Move the setting of rflags out of loop in __hash_page_huge

2013-04-15 Thread Li Zhong
It seems that new_pte and rflags don't get changed in the repeating loop, so
move their assignment out of the loop.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hugetlbpage-hash64.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index cecad34..edb4129 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, 
unsigned long vsid,
 
 		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
 
-repeat:
-		hpte_group = ((hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
-
/* clear HPTE slot informations in new PTE */
 #ifdef CONFIG_PPC_64K_PAGES
 		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;
@@ -101,6 +97,10 @@ repeat:
 		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
  _PAGE_COHERENT | _PAGE_GUARDED));
 
+repeat:
+		hpte_group = ((hash & htab_hash_mask) *
+			      HPTES_PER_GROUP) & ~0x7UL;
+
/* Insert into the hash table, primary slot */
slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
  mmu_psize, ssize);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v3 2/3] powerpc: Split the code trying to insert hpte repeatedly as an helper function

2013-04-15 Thread Li Zhong
Move the logic trying to insert hpte in __hash_page_huge() to a helper
function, so it could also be used by others.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_utils_64.c  |   35 ++
 arch/powerpc/mm/hugetlbpage-hash64.c |   31 ++
 2 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index f410c3e..ead9fa8 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1230,6 +1230,41 @@ void low_hash_fault(struct pt_regs *regs, unsigned long 
address, int rc)
bad_page_fault(regs, address, SIGBUS);
 }
 
+long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
+  unsigned long pa, unsigned long rflags,
+  unsigned long vflags, int psize, int ssize)
+{
+   unsigned long hpte_group;
+   long slot;
+
+repeat:
+	hpte_group = ((hash & htab_hash_mask) *
+		       HPTES_PER_GROUP) & ~0x7UL;
+
+   /* Insert into the hash table, primary slot */
+   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, vflags,
+ psize, ssize);
+
+   /* Primary is full, try the secondary */
+   if (unlikely(slot == -1)) {
+		hpte_group = ((~hash & htab_hash_mask) *
+			      HPTES_PER_GROUP) & ~0x7UL;
+   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags,
+ vflags | HPTE_V_SECONDARY,
+ psize, ssize);
+   if (slot == -1) {
+			if (mftb() & 0x1)
+				hpte_group = ((hash & htab_hash_mask) *
+					      HPTES_PER_GROUP) & ~0x7UL;
+
+   ppc_md.hpte_remove(hpte_group);
+   goto repeat;
+   }
+   }
+
+   return slot;
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index edb4129..b913f41 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -14,6 +14,10 @@
 #include <asm/cacheflush.h>
 #include <asm/machdep.h>
 
+extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
+ unsigned long pa, unsigned long rlags,
+ unsigned long vflags, int psize, int ssize);
+
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long 
vsid,
 pte_t *ptep, unsigned long trap, int local, int ssize,
 unsigned int shift, unsigned int mmu_psize)
@@ -83,7 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, 
unsigned long vsid,
 
 	if (likely(!(old_pte & _PAGE_HASHPTE))) {
unsigned long hash = hpt_hash(vpn, shift, ssize);
-   unsigned long hpte_group;
 
 		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
 
@@ -97,30 +100,8 @@ int __hash_page_huge(unsigned long ea, unsigned long 
access, unsigned long vsid,
 		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
  _PAGE_COHERENT | _PAGE_GUARDED));
 
-repeat:
-		hpte_group = ((hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
-
-   /* Insert into the hash table, primary slot */
-   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
- mmu_psize, ssize);
-
-   /* Primary is full, try the secondary */
-   if (unlikely(slot == -1)) {
-			hpte_group = ((~hash & htab_hash_mask) *
-				      HPTES_PER_GROUP) & ~0x7UL;
-   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags,
- HPTE_V_SECONDARY,
- mmu_psize, ssize);
-   if (slot == -1) {
-				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_hash_mask) *
-						      HPTES_PER_GROUP) & ~0x7UL;
-
-   ppc_md.hpte_remove(hpte_group);
-   goto repeat;
-}
-   }
+   slot = hpte_insert_repeating(hash, vpn, pa, rflags, 0,
+mmu_psize, ssize);
 
/*
 * Hypervisor failure. Restore old pte and return -1
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH v3 3/3] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()

2013-04-15 Thread Li Zhong
This patch fixes the following oops, which could be triggered by building the
kernel with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC.

hpte_insert() might return -1, indicating that the bucket (primary here)
is full. We don't necessarily need to report a BUG in this case. Instead, we could
try repeatedly (try secondary, remove and try again) until we find a slot.
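
For illustration, the retry scheme as a minimal C sketch (simplified; the
helpers try_insert(), evict_one() and *_group() are hypothetical stand-ins
for ppc_md.hpte_insert(), ppc_md.hpte_remove() and the bucket computations):

	/* Sketch only: try the primary bucket, then the secondary; if both
	 * are full, evict one entry from a (pseudo)randomly chosen bucket
	 * and start over, as hpte_insert_repeating() does. */
	long insert_with_retry(unsigned long hash)
	{
		long slot;

		for (;;) {
			slot = try_insert(primary_group(hash));
			if (slot != -1)
				return slot;
			slot = try_insert(secondary_group(hash));
			if (slot != -1)
				return slot;
			evict_one((mftb() & 0x1) ? primary_group(hash)
						 : secondary_group(hash));
		}
	}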

[  543.075675] [ cut here ]
[  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  543.075741] Modules linked in: binfmt_misc ehea
[  543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594
[  543.075771] REGS: c000a90832c0 TRAP: 0700   Not tainted  
(3.8.0-next-20130222)
[  543.075781] MSR: 80029032 <SF,EE,ME,IR,DR,RI>  CR: 4482  XER: 

[  543.075816] SOFTE: 0
[  543.075823] CFAR: c004c200
[  543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 
CPU: 1
GPR00: 0001 c000a9083540 c0c600a8 
GPR04: 0050 fffa c000a90834e0 004ff594
GPR08: 0001  9592d4d8 c0c86854
GPR12: 0002 c6ead300 00a51000 0001
GPR16: f3354380  ff80 
GPR20: 0001 c0c600a8 0001 0001
GPR24: 03354380 c000  c0b65950
GPR28: 0020 000cd50e 00bf50d9 c0c7c230
[  543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
[  543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
[  543.076025] Call Trace:
[  543.076033] [c000a9083540] [c0036ea4] 
.kernel_map_pages+0x1d4/0x3f8 (unreliable)
[  543.076053] [c000a9083640] [c0167638] 
.get_page_from_freelist+0x6cc/0x8dc
[  543.076067] [c000a9083800] [c0167a48] 
.__alloc_pages_nodemask+0x200/0x96c
[  543.076082] [c000a90839c0] [c01ade44] 
.alloc_pages_vma+0x160/0x1e4
[  543.076098] [c000a9083a80] [c018ce04] 
.handle_pte_fault+0x1b0/0x7e8
[  543.076113] [c000a9083b50] [c018d5a8] 
.handle_mm_fault+0x16c/0x1a0
[  543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4
[  543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30
[  543.076155] Instruction dump:
[  543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 
e84a0008
[  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 0b00 7f63db78 48785781 
6000
[  543.076224] ---[ end trace bd5807e8d6ae186b ]---

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_utils_64.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ead9fa8..1ed4419 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1268,21 +1268,22 @@ repeat:
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
-   unsigned long hash, hpteg;
+   unsigned long hash;
unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL);
-   int ret;
+   long ret;
 
hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
-	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
 
/* Don't create HPTE entries for bad address */
if (!vsid)
return;
-   ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
-mode, HPTE_V_BOLTED,
-mmu_linear_psize, mmu_kernel_ssize);
+
+   ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
+   HPTE_V_BOLTED,
+   mmu_linear_psize, mmu_kernel_ssize);
+
 	BUG_ON (ret < 0);
 	spin_lock(&linear_map_hash_lock);
 	BUG_ON(linear_map_hash_slots[lmi] & 0x80);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 0/4] try secondary hash before BUG in kernel_map_linear_page()

2013-04-11 Thread Li Zhong
Hi Michael, 

Here is the updated version,  could you please help to review it again? 

As you suggested, this version doesn't copy the code, but splits
the logic into a helper function, so both kernel_map_linear_page() and
__hash_page_huge() can use it. 

Also, patch #1 moves some unnecessary code out of the repeating loop, so the 
splitting is easier. Patch #3 removes the HPTE_V_BOLTED flag in 
kernel_map_linear_page(), which doesn't seem to be needed, based on my understanding. 

Changes are split into smaller ones, so each one does only one thing. 

Thanks, Zhong

Li Zhong (4):
  powerpc: Move the setting of rflags out of loop in __hash_page_huge
  powerpc: Split the code trying to insert hpte repeatedly as an helper
function
  powerpc: Don't bolt the hpte in kernel_map_linear_page()
  powerpc: Try to insert the hptes repeatedly in
kernel_map_linear_page()

 arch/powerpc/mm/hash_utils_64.c  |   45 +++---
 arch/powerpc/mm/hugetlbpage-hash64.c |   31 +--
 2 files changed, 47 insertions(+), 29 deletions(-)

-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 2/4] powerpc: Split the code trying to insert hpte repeatedly as an helper function

2013-04-11 Thread Li Zhong
Move the logic trying to insert hpte in __hash_page_huge() to a helper
function, so it could also be used by others.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_utils_64.c  |   35 ++
 arch/powerpc/mm/hugetlbpage-hash64.c |   31 ++
 2 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index f410c3e..716f42b 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1230,6 +1230,41 @@ void low_hash_fault(struct pt_regs *regs, unsigned long 
address, int rc)
bad_page_fault(regs, address, SIGBUS);
 }
 
+long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
+  unsigned long pa, unsigned long rflags,
+  int psize, int ssize)
+{
+   unsigned long hpte_group;
+   long slot;
+
+repeat:
+	hpte_group = ((hash & htab_hash_mask) *
+		       HPTES_PER_GROUP) & ~0x7UL;
+
+   /* Insert into the hash table, primary slot */
+   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
+ psize, ssize);
+
+   /* Primary is full, try the secondary */
+   if (unlikely(slot == -1)) {
+		hpte_group = ((~hash & htab_hash_mask) *
+			      HPTES_PER_GROUP) & ~0x7UL;
+   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags,
+ HPTE_V_SECONDARY,
+ psize, ssize);
+   if (slot == -1) {
+			if (mftb() & 0x1)
+				hpte_group = ((hash & htab_hash_mask) *
+					      HPTES_PER_GROUP) & ~0x7UL;
+
+   ppc_md.hpte_remove(hpte_group);
+   goto repeat;
+   }
+   }
+
+   return slot;
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index edb4129..bd7d38b 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -14,6 +14,10 @@
 #include <asm/cacheflush.h>
 #include <asm/machdep.h>
 
+extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
+  unsigned long pa, unsigned long rlags,
+  int psize, int ssize);
+
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long 
vsid,
 pte_t *ptep, unsigned long trap, int local, int ssize,
 unsigned int shift, unsigned int mmu_psize)
@@ -83,7 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, 
unsigned long vsid,
 
 	if (likely(!(old_pte & _PAGE_HASHPTE))) {
unsigned long hash = hpt_hash(vpn, shift, ssize);
-   unsigned long hpte_group;
 
 		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
 
@@ -97,30 +100,8 @@ int __hash_page_huge(unsigned long ea, unsigned long 
access, unsigned long vsid,
 		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
  _PAGE_COHERENT | _PAGE_GUARDED));
 
-repeat:
-		hpte_group = ((hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
-
-   /* Insert into the hash table, primary slot */
-   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
- mmu_psize, ssize);
-
-   /* Primary is full, try the secondary */
-   if (unlikely(slot == -1)) {
-			hpte_group = ((~hash & htab_hash_mask) *
-				      HPTES_PER_GROUP) & ~0x7UL;
-   slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags,
- HPTE_V_SECONDARY,
- mmu_psize, ssize);
-   if (slot == -1) {
-				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_hash_mask) *
-						      HPTES_PER_GROUP) & ~0x7UL;
-
-   ppc_md.hpte_remove(hpte_group);
-   goto repeat;
-}
-   }
+   slot = hpte_insert_repeating(hash, vpn, pa,
+rflags, mmu_psize, ssize);
 
/*
 * Hypervisor failure. Restore old pte and return -1
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 1/4] powerpc: Move the setting of rflags out of loop in __hash_page_huge

2013-04-11 Thread Li Zhong
It seems that rflags doesn't get changed in the repeating loop, so move
it out of the loop.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hugetlbpage-hash64.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index cecad34..edb4129 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, 
unsigned long vsid,
 
 		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
 
-repeat:
-		hpte_group = ((hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
-
/* clear HPTE slot informations in new PTE */
 #ifdef CONFIG_PPC_64K_PAGES
 		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;
@@ -101,6 +97,10 @@ repeat:
 		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
  _PAGE_COHERENT | _PAGE_GUARDED));
 
+repeat:
+		hpte_group = ((hash & htab_hash_mask) *
+			      HPTES_PER_GROUP) & ~0x7UL;
+
/* Insert into the hash table, primary slot */
slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
  mmu_psize, ssize);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()

2013-04-11 Thread Li Zhong
It seems that in kernel_unmap_linear_page(), it only checks whether there
is a map in the linear_map_hash_slots array, so we don't need to bolt
the hpte.
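
(For context, a simplified sketch of what the unmap side keys off -- the
linear_map_hash_slots[] bookkeeping, not the HPTE_V_BOLTED flag; this is an
illustration, not the exact kernel code:)

	static void kernel_unmap_linear_page_sketch(unsigned long vaddr,
						    unsigned long lmi)
	{
		unsigned long hidx;

		spin_lock(&linear_map_hash_lock);
		/* bit 0x80 records "mapped"; the low bits record the slot */
		BUG_ON(!(linear_map_hash_slots[lmi] & 0x80));
		hidx = linear_map_hash_slots[lmi] & 0x7f;
		linear_map_hash_slots[lmi] = 0;
		spin_unlock(&linear_map_hash_lock);
		/* ... then invalidate the hpte found via the recorded slot */
	}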

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_utils_64.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 716f42b..a7f54f0 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1281,7 +1281,7 @@ static void kernel_map_linear_page(unsigned long vaddr, 
unsigned long lmi)
if (!vsid)
return;
ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
-mode, HPTE_V_BOLTED,
+mode, 0,
 mmu_linear_psize, mmu_kernel_ssize);
 	BUG_ON (ret < 0);
 	spin_lock(&linear_map_hash_lock);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 4/4] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()

2013-04-11 Thread Li Zhong
This patch tries to fix the following issue when CONFIG_DEBUG_PAGEALLOC
is enabled:

[  543.075675] [ cut here ]
[  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  543.075741] Modules linked in: binfmt_misc ehea
[  543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594
[  543.075771] REGS: c000a90832c0 TRAP: 0700   Not tainted  
(3.8.0-next-20130222)
[  543.075781] MSR: 80029032 <SF,EE,ME,IR,DR,RI>  CR: 4482  XER: 

[  543.075816] SOFTE: 0
[  543.075823] CFAR: c004c200
[  543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 
CPU: 1
GPR00: 0001 c000a9083540 c0c600a8 
GPR04: 0050 fffa c000a90834e0 004ff594
GPR08: 0001  9592d4d8 c0c86854
GPR12: 0002 c6ead300 00a51000 0001
GPR16: f3354380  ff80 
GPR20: 0001 c0c600a8 0001 0001
GPR24: 03354380 c000  c0b65950
GPR28: 0020 000cd50e 00bf50d9 c0c7c230
[  543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
[  543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
[  543.076025] Call Trace:
[  543.076033] [c000a9083540] [c0036ea4] 
.kernel_map_pages+0x1d4/0x3f8 (unreliable)
[  543.076053] [c000a9083640] [c0167638] 
.get_page_from_freelist+0x6cc/0x8dc
[  543.076067] [c000a9083800] [c0167a48] 
.__alloc_pages_nodemask+0x200/0x96c
[  543.076082] [c000a90839c0] [c01ade44] 
.alloc_pages_vma+0x160/0x1e4
[  543.076098] [c000a9083a80] [c018ce04] 
.handle_pte_fault+0x1b0/0x7e8
[  543.076113] [c000a9083b50] [c018d5a8] 
.handle_mm_fault+0x16c/0x1a0
[  543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4
[  543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30
[  543.076155] Instruction dump:
[  543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 
e84a0008
[  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 0b00 7f63db78 48785781 
6000
[  543.076224] ---[ end trace bd5807e8d6ae186b ]---

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_utils_64.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index a7f54f0..4b449a0 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1272,7 +1272,7 @@ static void kernel_map_linear_page(unsigned long vaddr, 
unsigned long lmi)
unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL);
-   int ret;
+   long ret;
 
hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
 	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
@@ -1280,9 +1280,11 @@ static void kernel_map_linear_page(unsigned long vaddr, 
unsigned long lmi)
/* Don't create HPTE entries for bad address */
if (!vsid)
return;
-   ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
-mode, 0,
-mmu_linear_psize, mmu_kernel_ssize);
+
+   ret = hpte_insert_repeating(hash, vpn, __pa(vaddr),
+   mode, mmu_linear_psize,
+   mmu_kernel_ssize);
+
 	BUG_ON (ret < 0);
 	spin_lock(&linear_map_hash_lock);
 	BUG_ON(linear_map_hash_slots[lmi] & 0x80);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH powerpc] try secondary hash before BUG in kernel_map_linear_page()

2013-04-09 Thread Li Zhong
On Wed, 2013-04-10 at 12:21 +1000, Michael Ellerman wrote:
 On Mon, Feb 25, 2013 at 05:29:35PM +0800, Li Zhong wrote:
  This patch tries to fix following issue when CONFIG_DEBUG_PAGEALLOC
  is enabled:
  
  [  543.075675] [ cut here ]
  [  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
  [  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]

Hi Michael, 

Thanks for the review.

 So the issue is that kernel_map_linear_page() doesn't try the secondary
 hash slot.

It seems so.
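
(For reference, the relation between the two buckets, extracted from the
insert paths as a small sketch; HPTES_PER_GROUP is 8 on ppc64:)

	/* Primary vs. secondary hash bucket, with hash and htab_hash_mask
	 * as in hash_utils_64.c. */
	static unsigned long primary_group(unsigned long hash,
					   unsigned long htab_hash_mask)
	{
		return ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
	}

	static unsigned long secondary_group(unsigned long hash,
					     unsigned long htab_hash_mask)
	{
		/* the secondary bucket is derived from the complemented hash */
		return ((~hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
	}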

  The code is borrowed from that in __hash_page_huge().
 
 It is, and in fact there is another copy in hash_low_64.S - in assembler.
 
 So I think we should at least try and keep ourselves to two
 implementations, one in asm and one in C. So can you split it out into a
 helper routine called by both kernel_map_linear_page() and
 __hash_page_huge() ?

OK, I'll try to update it as you suggested. 

Thanks, Zhong

 cheers
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH v2 6/6] powerpc: Use generic code for exception handling

2013-04-09 Thread Li Zhong
On Wed, 2013-04-10 at 14:56 +1000, Michael Ellerman wrote:
 On Fri, Mar 29, 2013 at 06:00:21PM +0800, Li Zhong wrote:
  After the exception handling moved to generic code, and some changes in
 ...
  diff --git a/arch/powerpc/mm/hash_utils_64.c 
  b/arch/powerpc/mm/hash_utils_64.c
  index 360fba8..eeab30f 100644
  --- a/arch/powerpc/mm/hash_utils_64.c
  +++ b/arch/powerpc/mm/hash_utils_64.c
  @@ -33,6 +33,7 @@
   #include <linux/init.h>
   #include <linux/signal.h>
   #include <linux/memblock.h>
  +#include <linux/context_tracking.h>
   
   #include <asm/processor.h>
   #include <asm/pgtable.h>
  @@ -56,7 +57,6 @@
   #include <asm/fadump.h>
   #include <asm/firmware.h>
   #include <asm/tm.h>
  -#include <asm/context_tracking.h>
   
   #ifdef DEBUG
   #define DBG(fmt...) udbg_printf(fmt)
  @@ -919,13 +919,17 @@ int hash_page(unsigned long ea, unsigned long access, 
  unsigned long trap)
  const struct cpumask *tmp;
  int rc, user_region = 0, local = 0;
  int psize, ssize;
  +   enum ctx_state prev_state;
  +
  +   prev_state = exception_enter();
   
	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
  ea, access, trap);
   
	if ((ea & ~REGION_MASK) >= PGTABLE_RANGE) {
		DBG_LOW(" out of pgtable range !\n");
  -   return 1;
  +   rc = 1;
  +   goto exit;
  }
   
  /* Get region  vsid */
 
 This no longer applies on mainline, please send an updated version.

Yes, for the current mainline (powerpc tree), only the previous five patches
could be applied. The dependency of this patch is currently in the tip tree,
and it seems it will be in for 3.10.

There are some more details in the cover letter (#0):

I assume these patches would get in through powerpc tree, so I didn't
combine the new patch (#6) with the original one (#2). So that if
powerpc tree picks these, it could pick the first five patches, and
apply patch #6 later when the dependency enters into powerpc tree (maybe
on some 3.10-rcs).

Thanks, Zhong

 cheers
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH v2 6/6] powerpc: Use generic code for exception handling

2013-04-09 Thread Li Zhong
On Wed, 2013-04-10 at 13:32 +0800, Li Zhong wrote:
 On Wed, 2013-04-10 at 14:56 +1000, Michael Ellerman wrote:
  On Fri, Mar 29, 2013 at 06:00:21PM +0800, Li Zhong wrote:
   After the exception handling moved to generic code, and some changes in
  ...
   diff --git a/arch/powerpc/mm/hash_utils_64.c 
   b/arch/powerpc/mm/hash_utils_64.c
   index 360fba8..eeab30f 100644
   --- a/arch/powerpc/mm/hash_utils_64.c
   +++ b/arch/powerpc/mm/hash_utils_64.c
   @@ -33,6 +33,7 @@
     #include <linux/init.h>
     #include <linux/signal.h>
     #include <linux/memblock.h>
    +#include <linux/context_tracking.h>

     #include <asm/processor.h>
     #include <asm/pgtable.h>
   @@ -56,7 +57,6 @@
     #include <asm/fadump.h>
     #include <asm/firmware.h>
     #include <asm/tm.h>
    -#include <asm/context_tracking.h>

#ifdef DEBUG
#define DBG(fmt...) udbg_printf(fmt)
   @@ -919,13 +919,17 @@ int hash_page(unsigned long ea, unsigned long 
   access, unsigned long trap)
 const struct cpumask *tmp;
 int rc, user_region = 0, local = 0;
 int psize, ssize;
   + enum ctx_state prev_state;
   +
   + prev_state = exception_enter();

  DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
 ea, access, trap);

  if ((ea & ~REGION_MASK) >= PGTABLE_RANGE) {
  	DBG_LOW(" out of pgtable range !\n");
   - return 1;
   + rc = 1;
   + goto exit;
 }

 /* Get region  vsid */
  
  This no longer applies on mainline, please send an updated version.
 
 Yes, for the current mainline (powerpc tree), only the previous five patches
 could be applied. The dependency of this patch is currently in the tip tree,
 and it seems it will be in for 3.10.
 
 There are some more details in the cover letter (#0):
 
 I assume these patches would get in through powerpc tree, so I didn't
 combine the new patch (#6) with the original one (#2). So that if
 powerpc tree picks these, it could pick the first five patches, and
 apply patch #6 later when the dependency enters into powerpc tree (maybe
 on some 3.10-rcs).

And I will send an updated version of this one when I see the dependency
commits in mainline. 

Thanks, Zhong

 Thanks, Zhong
 
  cheers
  
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH v2 2/6] powerpc: Exception hooks for context tracking subsystem

2013-04-08 Thread Li Zhong
On Fri, 2013-04-05 at 13:50 +1100, Paul Mackerras wrote:
 On Fri, Mar 29, 2013 at 06:00:17PM +0800, Li Zhong wrote:
  This is the exception hooks for context tracking subsystem, including
  data access, program check, single step, instruction breakpoint, machine 
  check,
  alignment, fp unavailable, altivec assist, unknown exception, whose handlers
  might use RCU.
  
  This patch corresponds to
  [PATCH] x86: Exception hooks for userspace RCU extended QS
commit 6ba3c97a38803883c2eee489505796cb0a727122
  
   Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>


Hi Paul, 

Thanks for your review! Please check my answers below, and correct me if
any errors.

 Is there a reason why you didn't put the exception_exit() call in
 ret_from_except_lite in entry_64.S, and the exception_entry() call in
 EXCEPTION_PROLOG_COMMON?  That would seem to catch all these cases in
 a more centralized place.

It seems to me that ret_from_except_lite and EXCEPTION_PROLOG_COMMON are
also used by interrupts, where I think we don't need the hooks. So doing it
this way helps avoid adding overhead to those code paths
(interrupts, and some exit paths of syscalls). 

And I think adding the hooks in higher-level code makes them a little easier
to read and check. It seems that some exceptions don't use
EXCEPTION_PROLOG_COMMON, and some don't take the ret_from_except_lite exit
path (e.g. fp unavailable might go directly to fast_exception_return).
Maybe fast_exception_return is a centralized place for us to return to
user space? But it would still add some overhead which is not necessarily
needed. 

And I think it also makes the implementation here consistent with the
style that x86 uses. 
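
(The placement argued for above, as a minimal sketch -- hooks at the top and
bottom of each C-level handler rather than in the common asm prolog/exit
path; some_exception() is a hypothetical handler name:)

	void some_exception(struct pt_regs *regs)
	{
		exception_enter(regs);	/* user_exit() if we came from userspace */

		/* ... handler body, may safely use RCU ... */

		exception_exit(regs);	/* user_enter() if returning to userspace */
	}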

 Also, I notice that with the exception_exit calls where they are, we
 can still deliver signals (thus possibly taking a page fault) or call
 schedule() for preemption after the exception_exit() call.  Is that
 OK, or is it a potential problem?

If I understand correctly, I guess you are talking about the cases where
we might return to user space without the context state being correctly set
to "in user"?

There is user_enter() called in do_notify_resume() in patch #3, so after
handling the signals we always call user_enter(). 

There are also some changes to the context_tracking code from Frederic,
which might be related (they are now in the tip tree; the URL of the
patches, for your convenience: https://lkml.org/lkml/2013/3/1/266 )

6c1e0256fad84a843d915414e4b5973b7443d48d
context_tracking: Restore correct previous context state on exception
exit. 

With this patch, if a later exception happens after user_enter(),
before the CPU actually returns to user space, the correct context
state (in user) is saved and restored when handling the later exception. 

Patch #6 converts the code to use these new APIs, which are currently not
available in the powerpc tree. 

b22366cd54c6fe05db426f20adb10f461c19ec06
context_tracking: Restore preempted context state after
preempt_schedule_irq

With this patch, the user context state could be correctly restored
after schedule returns. 
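
(Putting the two commits together, the handler pattern under the new API, as
a minimal sketch; some_exception() is a hypothetical handler name:)

	/* The previous context state is returned by exception_enter() and
	 * handed back to exception_exit(), so an exception taken after
	 * user_enter() but before the actual return to user space still
	 * restores the right state on exit. */
	void some_exception(struct pt_regs *regs)
	{
		enum ctx_state prev_state = exception_enter();

		/* ... handler body ... */

		exception_exit(prev_state);
	}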

Thanks, Zhong

 Paul.
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 0/6] powerpc: Support context tracking for Power pSeries

2013-03-29 Thread Li Zhong
These patches try to support context tracking for the Power arch, beginning with
64-bit pSeries. The code is ported from that of x86_64, and in each
patch, I listed the corresponding patch for x86.

Would you please help review and give your comments?

v2:

I rebased the patches against 3.9-rcs, and also added a patch to replace
the exception handling with the generic code in tip timers/nohz.

I assume these patches would get in through powerpc tree, so I didn't combine
the new patch (#6) with the original one (#2). So that if powerpc tree picks
these, it could pick the first five patches, and apply patch #6 later
when the dependency enters into powerpc tree (maybe on some 3.10-rcs).

I'm also wondering whether it is possible for these to go through
tip timers/nohz; then, for now, patches #6 and #2 could be combined into one, and
there would be no need to worry about the issues caused by arch/common code merging. It
might also make future changes easier.

Thanks, Zhong

Li Zhong (6):
  powerpc: Syscall hooks for context tracking subsystem
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Exit user context on notify resume
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries
  powerpc: Use generic code for exception handling

 arch/powerpc/include/asm/context_tracking.h |   10 +++
 arch/powerpc/include/asm/thread_info.h  |7 ++-
 arch/powerpc/kernel/entry_64.S  |3 +-
 arch/powerpc/kernel/ptrace.c|5 ++
 arch/powerpc/kernel/signal.c|5 ++
 arch/powerpc/kernel/traps.c |   91 ---
 arch/powerpc/mm/fault.c |   16 -
 arch/powerpc/mm/hash_utils_64.c |   38 ---
 arch/powerpc/platforms/pseries/Kconfig  |1 +
 9 files changed, 140 insertions(+), 36 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 1/6] powerpc: Syscall hooks for context tracking subsystem

2013-03-29 Thread Li Zhong
These are the syscall slow-path hooks for the context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
  commit bf5a3c13b939813d28ce26c01425054c740d6731

TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
is no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it is
better for it to be in the same 16 bits with others in the group, so in the
asm code, andi. with this group could work.
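
(For illustration: andi. only takes a 16-bit unsigned immediate, so a flag
group tested with a single andi. must live entirely in bits 0-15; a sketch
loosely modeled on the entry_64.S syscall path, register choices hypothetical:)

	ld	r11,TI_FLAGS(r12)		/* load thread_info flags */
	andi.	r11,r11,_TIF_SYSCALL_T_OR_A	/* immediate must fit in 16 bits */
	bne	syscall_dotrace			/* slow path if any flag is set */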

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweis...@gmail.com>
---
 arch/powerpc/include/asm/thread_info.h |7 +--
 arch/powerpc/kernel/ptrace.c   |5 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 406b7b9..414a261 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_PERFMON_CTXSW  6   /* perfmon needs ctxsw calls */
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SINGLESTEP 8   /* singlestepping active */
-#define TIF_MEMDIE 9   /* is terminating due to OOM killer */
+#define TIF_NOHZ   9   /* in adaptive nohz mode */
 #define TIF_SECCOMP10  /* secure computing */
 #define TIF_RESTOREALL 11  /* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR12  /* Force successful syscall return */
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_TRACEPOINT 15  /* syscall tracepoint instrumentation */
 #define TIF_EMULATE_STACK_STORE16  /* Is an instruction emulation
for stack store? */
+#define TIF_MEMDIE 17  /* is terminating due to OOM killer */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -124,8 +125,10 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_UPROBE		(1<<TIF_UPROBE)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_NOHZ		(1<<TIF_NOHZ)
 #define _TIF_SYSCALL_T_OR_A(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
+_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
+_TIF_NOHZ)
 
 #define _TIF_USER_WORK_MASK(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 _TIF_NOTIFY_RESUME | _TIF_UPROBE)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 245c1b6..0b7aad0 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <trace/syscall.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/perf_event.h>
+#include <linux/context_tracking.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -1778,6 +1779,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
long ret = 0;
 
+   user_exit();
+
 	secure_computing_strict(regs->gpr[0]);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1822,4 +1825,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, step);
+
+   user_enter();
 }
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 2/6] powerpc: Exception hooks for context tracking subsystem

2013-03-29 Thread Li Zhong
This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
  commit 6ba3c97a38803883c2eee489505796cb0a727122

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/context_tracking.h |   28 +
 arch/powerpc/kernel/exceptions-64s.S|4 +-
 arch/powerpc/kernel/traps.c |   83 ---
 arch/powerpc/mm/fault.c |   15 -
 arch/powerpc/mm/hash_utils_64.c |   17 ++
 5 files changed, 122 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

diff --git a/arch/powerpc/include/asm/context_tracking.h 
b/arch/powerpc/include/asm/context_tracking.h
new file mode 100644
index 000..377146e
--- /dev/null
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -0,0 +1,28 @@
+#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
+#define _ASM_POWERPC_CONTEXT_TRACKING_H
+
+#include <linux/context_tracking.h>
+#include <asm/ptrace.h>
+
+/*
+ * temporarily defined to avoid potential conflicts with the common
+ * implementation, these will be removed by a later patch after the common
+ * code enters powerpc tree
+ */
+#define exception_enter __exception_enter
+#define exception_exit __exception_exit
+
+static inline void __exception_enter(struct pt_regs *regs)
+{
+   user_exit();
+}
+
+static inline void __exception_exit(struct pt_regs *regs)
+{
+#ifdef CONFIG_CONTEXT_TRACKING
+   if (user_mode(regs))
+   user_enter();
+#endif
+}
+
+#endif
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index a8a5361..6d82f4f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1368,15 +1368,17 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_SLB)
rlwimi  r4,r0,32-13,30,30   /* becomes _PAGE_USER access bit */
ori r4,r4,1 /* add _PAGE_PRESENT */
rlwimi  r4,r5,22+2,31-2,31-2/* Set _PAGE_EXEC if trap is 0x400 */
+   addir6,r1,STACK_FRAME_OVERHEAD
 
/*
 * r3 contains the faulting address
 * r4 contains the required access permissions
 * r5 contains the trap number
+* r6 contains the address of pt_regs
 *
 * at return r3 = 0 for success, 1 for page fault, negative for error
 */
-   bl  .hash_page  /* build HPTE if possible */
+   bl  .hash_page_ct   /* build HPTE if possible */
cmpdi   r3,0/* see if hash_page succeeded */
 
/* Success */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 37cc40e..6228b6b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -60,6 +60,7 @@
 #include <asm/switch_to.h>
 #include <asm/tm.h>
 #include <asm/debug.h>
+#include <asm/context_tracking.h>
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -669,6 +670,8 @@ void machine_check_exception(struct pt_regs *regs)
 {
int recover = 0;
 
+   exception_enter(regs);
+
__get_cpu_var(irq_stat).mce_exceptions++;
 
/* See if any machine dependent calls. In theory, we would want
@@ -683,7 +686,7 @@ void machine_check_exception(struct pt_regs *regs)
recover = cur_cpu_spec-machine_check(regs);
 
 	if (recover > 0)
-   return;
+   goto exit;
 
 #if defined(CONFIG_8xx)  defined(CONFIG_PCI)
/* the qspan pci read routines can cause machine checks -- Cort
@@ -693,20 +696,23 @@ void machine_check_exception(struct pt_regs *regs)
 * -- BenH
 */
 	bad_page_fault(regs, regs->dar, SIGBUS);
-   return;
+   goto exit;
 #endif
 
if (debugger_fault_handler(regs))
-   return;
+   goto exit;
 
if (check_io_access(regs))
-   return;
+   goto exit;
 
 	die("Machine check", regs, SIGBUS);
 
/* Must die if the interrupt is not recoverable */
 	if (!(regs->msr & MSR_RI))
 		panic("Unrecoverable Machine check");
+
+exit:
+   exception_exit(regs);
 }
 
 void SMIException(struct pt_regs *regs)
@@ -716,20 +722,29 @@ void SMIException(struct pt_regs *regs)
 
 void unknown_exception(struct pt_regs *regs)
 {
+   exception_enter(regs);
+
 	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
 	       regs->nip, regs->msr, regs->trap);
 
_exception(SIGTRAP, regs, 0, 0);
+
+   exception_exit(regs);
 }
 
 void instruction_breakpoint_exception(struct pt_regs *regs)
 {
+   exception_enter(regs);
+
 	if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5

[RFC PATCH v2 3/6] powerpc: Exit user context on notify resume

2013-03-29 Thread Li Zhong
This patch allows RCU usage in do_notify_resume, e.g. signal handling.
It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
  commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/signal.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index cf12eae..d63b502 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -13,6 +13,7 @@
 #include <linux/signal.h>
 #include <linux/uprobes.h>
 #include <linux/key.h>
+#include <linux/context_tracking.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -159,6 +160,8 @@ static int do_signal(struct pt_regs *regs)
 
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 {
+   user_exit();
+
 	if (thread_info_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
 
@@ -169,4 +172,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long 
thread_info_flags)
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+
+   user_enter();
 }
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 4/6] powerpc: Use the new schedule_user API on userspace preemption

2013-03-29 Thread Li Zhong
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
  commit 0430499ce9d78691f3985962021b16bf8f8a8048

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/context_tracking.h |   11 +++
 arch/powerpc/kernel/entry_64.S  |3 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/context_tracking.h 
b/arch/powerpc/include/asm/context_tracking.h
index 377146e..4da287e 100644
--- a/arch/powerpc/include/asm/context_tracking.h
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
 #define _ASM_POWERPC_CONTEXT_TRACKING_H
 
+#ifndef __ASSEMBLY__
 #include <linux/context_tracking.h>
 #include <asm/ptrace.h>
 
@@ -25,4 +26,14 @@ static inline void __exception_exit(struct pt_regs *regs)
 #endif
 }
 
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define SCHEDULE_USER bl   .schedule_user
+#else
+#define SCHEDULE_USER bl   .schedule
+#endif
+
+#endif /* !__ASSEMBLY__ */
+
 #endif
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 256c5bf..f7e4622 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -33,6 +33,7 @@
 #include <asm/irqflags.h>
 #include <asm/ftrace.h>
 #include <asm/hw_irq.h>
+#include <asm/context_tracking.h>
 
 /*
  * System calls.
@@ -618,7 +619,7 @@ _GLOBAL(ret_from_except_lite)
andi.   r0,r4,_TIF_NEED_RESCHED
beq 1f
bl  .restore_interrupts
-   bl  .schedule
+   SCHEDULE_USER
b   .ret_from_except_lite
 
 1: bl  .save_nvgprs
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v2 6/6] powerpc: Use generic code for exception handling

2013-03-29 Thread Li Zhong
After the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7c8734f055da2a6bac553caf4a468eb
  context_tracking: Move exception handling to generic code
6c1e0256fad84a843d915414e4b5973b7443d48d
  context_tracking: Restore correct previous context state on exception exit

this patch is able to replace the implementation in the arch code
with the generic code in the above commits.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/context_tracking.h |   29 ---
 arch/powerpc/kernel/exceptions-64s.S|4 +--
 arch/powerpc/kernel/traps.c |   42 +-
 arch/powerpc/mm/fault.c |7 ++--
 arch/powerpc/mm/hash_utils_64.c |   51 ++-
 5 files changed, 57 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/include/asm/context_tracking.h 
b/arch/powerpc/include/asm/context_tracking.h
index 4da287e..b6f5a33 100644
--- a/arch/powerpc/include/asm/context_tracking.h
+++ b/arch/powerpc/include/asm/context_tracking.h
@@ -1,39 +1,10 @@
 #ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
 #define _ASM_POWERPC_CONTEXT_TRACKING_H
 
-#ifndef __ASSEMBLY__
-#include <linux/context_tracking.h>
-#include <asm/ptrace.h>
-
-/*
- * temporarily defined to avoid potential conflicts with the common
- * implementation, these will be removed by a later patch after the common
- * code enters powerpc tree
- */
-#define exception_enter __exception_enter
-#define exception_exit __exception_exit
-
-static inline void __exception_enter(struct pt_regs *regs)
-{
-   user_exit();
-}
-
-static inline void __exception_exit(struct pt_regs *regs)
-{
-#ifdef CONFIG_CONTEXT_TRACKING
-   if (user_mode(regs))
-   user_enter();
-#endif
-}
-
-#else /* __ASSEMBLY__ */
-
 #ifdef CONFIG_CONTEXT_TRACKING
 #define SCHEDULE_USER bl   .schedule_user
 #else
 #define SCHEDULE_USER bl   .schedule
 #endif
 
-#endif /* !__ASSEMBLY__ */
-
 #endif
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 6d82f4f..a8a5361 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1368,17 +1368,15 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_SLB)
rlwimi  r4,r0,32-13,30,30   /* becomes _PAGE_USER access bit */
ori r4,r4,1 /* add _PAGE_PRESENT */
rlwimi  r4,r5,22+2,31-2,31-2/* Set _PAGE_EXEC if trap is 0x400 */
-   addir6,r1,STACK_FRAME_OVERHEAD
 
/*
 * r3 contains the faulting address
 * r4 contains the required access permissions
 * r5 contains the trap number
-* r6 contains the address of pt_regs
 *
 * at return r3 = 0 for success, 1 for page fault, negative for error
 */
-   bl  .hash_page_ct   /* build HPTE if possible */
+   bl  .hash_page  /* build HPTE if possible */
cmpdi   r3,0/* see if hash_page succeeded */
 
/* Success */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 6228b6b..1b46c2d9 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -35,6 +35,7 @@
 #include <linux/kdebug.h>
 #include <linux/debugfs.h>
 #include <linux/ratelimit.h>
+#include <linux/context_tracking.h>
 
 #include <asm/emulated_ops.h>
 #include <asm/pgtable.h>
@@ -60,7 +61,6 @@
 #include <asm/switch_to.h>
 #include <asm/tm.h>
 #include <asm/debug.h>
-#include <asm/context_tracking.h>
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -669,8 +669,9 @@ int machine_check_generic(struct pt_regs *regs)
 void machine_check_exception(struct pt_regs *regs)
 {
int recover = 0;
+   enum ctx_state prev_state;
 
-   exception_enter(regs);
+   prev_state = exception_enter();
 
__get_cpu_var(irq_stat).mce_exceptions++;
 
@@ -712,7 +713,7 @@ void machine_check_exception(struct pt_regs *regs)
panic(Unrecoverable Machine check);
 
 exit:
-   exception_exit(regs);
+   exception_exit(prev_state);
 }
 
 void SMIException(struct pt_regs *regs)
@@ -722,19 +723,21 @@ void SMIException(struct pt_regs *regs)
 
 void unknown_exception(struct pt_regs *regs)
 {
-   exception_enter(regs);
+   enum ctx_state prev_state;
+   prev_state = exception_enter();
 
 	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
 	       regs->nip, regs->msr, regs->trap);
 
_exception(SIGTRAP, regs, 0, 0);
 
-   exception_exit(regs);
+   exception_exit(prev_state);
 }
 
 void instruction_breakpoint_exception(struct pt_regs *regs)
 {
-   exception_enter(regs);
+   enum ctx_state prev_state;
+   prev_state = exception_enter();
 
 	if (notify_die(DIE_IABR_MATCH, "iabr_match", regs, 5,
5, SIGTRAP) == NOTIFY_STOP)
@@ -744,7 +747,7 @@ void

[RFC PATCH v2 5/6] powerpc: select HAVE_CONTEXT_TRACKING for pSeries

2013-03-29 Thread Li Zhong
Start context tracking support from pSeries.

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 9a0941b..023b288 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -18,6 +18,7 @@ config PPC_PSERIES
select PPC_PCI_CHOICE if EXPERT
select ZLIB_DEFLATE
select PPC_DOORBELL
+   select HAVE_CONTEXT_TRACKING
default y
 
 config PPC_SPLPAR
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH powerpc] try secondary hash before BUG in kernel_map_linear_page()

2013-02-25 Thread Li Zhong
This patch tries to fix the following issue when CONFIG_DEBUG_PAGEALLOC
is enabled:

[  543.075675] [ cut here ]
[  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  543.075741] Modules linked in: binfmt_misc ehea
[  543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594
[  543.075771] REGS: c000a90832c0 TRAP: 0700   Not tainted  
(3.8.0-next-20130222)
[  543.075781] MSR: 80029032 <SF,EE,ME,IR,DR,RI>  CR: 4482  XER: 

[  543.075816] SOFTE: 0
[  543.075823] CFAR: c004c200
[  543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 
CPU: 1
GPR00: 0001 c000a9083540 c0c600a8 
GPR04: 0050 fffa c000a90834e0 004ff594
GPR08: 0001  9592d4d8 c0c86854
GPR12: 0002 c6ead300 00a51000 0001
GPR16: f3354380  ff80 
GPR20: 0001 c0c600a8 0001 0001
GPR24: 03354380 c000  c0b65950
GPR28: 0020 000cd50e 00bf50d9 c0c7c230
[  543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
[  543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
[  543.076025] Call Trace:
[  543.076033] [c000a9083540] [c0036ea4] 
.kernel_map_pages+0x1d4/0x3f8 (unreliable)
[  543.076053] [c000a9083640] [c0167638] 
.get_page_from_freelist+0x6cc/0x8dc
[  543.076067] [c000a9083800] [c0167a48] 
.__alloc_pages_nodemask+0x200/0x96c
[  543.076082] [c000a90839c0] [c01ade44] 
.alloc_pages_vma+0x160/0x1e4
[  543.076098] [c000a9083a80] [c018ce04] 
.handle_pte_fault+0x1b0/0x7e8
[  543.076113] [c000a9083b50] [c018d5a8] 
.handle_mm_fault+0x16c/0x1a0
[  543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4
[  543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30
[  543.076155] Instruction dump:
[  543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 
e84a0008
[  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 0b00 7f63db78 48785781 
6000
[  543.076224] ---[ end trace bd5807e8d6ae186b ]---

The code is borrowed from that in __hash_page_huge().

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_utils_64.c |   17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 1b6e127..31c7924 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1231,11 +1231,28 @@ static void kernel_map_linear_page(unsigned long vaddr, 
unsigned long lmi)
int ret;
 
hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
+
+repeat:
 	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
 
ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
 mode, HPTE_V_BOLTED,
 mmu_linear_psize, mmu_kernel_ssize);
+
+   if (unlikely(ret == -1)) {
+		hpteg = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
+   ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr), mode,
+HPTE_V_SECONDARY,
+mmu_linear_psize, mmu_kernel_ssize);
+   if (ret == -1) {
+			if (mftb() & 0x1)
+				hpteg = (hash & htab_hash_mask) *
+					 HPTES_PER_GROUP;
+   ppc_md.hpte_remove(hpteg);
+   goto repeat;
+   }
+   }
+
 	BUG_ON (ret < 0);
 	spin_lock(&linear_map_hash_lock);
 	BUG_ON(linear_map_hash_slots[lmi] & 0x80);
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 2/5] powerpc: Exception hooks for context tracking subsystem

2013-02-16 Thread Li Zhong
On Sun, 2013-02-10 at 15:10 +0100, Frederic Weisbecker wrote:
 2013/2/1 Li Zhong <zh...@linux.vnet.ibm.com>:
  This is the exception hooks for context tracking subsystem, including
  data access, program check, single step, instruction breakpoint, machine 
  check,
  alignment, fp unavailable, altivec assist, unknown exception, whose handlers
  might use RCU.
 
  This patch corresponds to
  [PATCH] x86: Exception hooks for userspace RCU extended QS
commit 6ba3c97a38803883c2eee489505796cb0a727122
 
  Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
 
 Looks good!
 
 I guess we should move exception_enter/exit definition to the generic
 code. They should be the same for all archs after all. 

Indeed.

 Also we are
 relying on user_mode(regs) but this may be buggy with some corner
 cases. For example if an exception happen after a call to user_exit()

I guess you mean user_enter() here, or am I confused?

 (on syscall exit) but before we actually resume in userspace, the
 exception will exit in kernel mode from the context tracking POV.
 
 So instead on relying on the regs, which are not sync with the context
 tracking state, we should use something like:
 
 prev_state = exception_enter();
 ...
 exception_exit(prev_state);
 
 Also preempt_schedule_irq() is concerned as well by this problem. So I
 should convert it to that scheme as well. I'm going to prepare some
 patches.
 
 Feel free to merge this patch in the powerpc tree, I'll do the
 conversion along the way.

Or if your patches gets merged earlier than these, I can update my code
according to yours.

Thanks, Zhong

 
 Thanks.
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/5] powerpc: Syscall hooks for context tracking subsystem

2013-02-07 Thread Li Zhong
On Thu, 2013-02-07 at 01:29 +0100, Frederic Weisbecker wrote:
 2013/2/1 Li Zhong <zh...@linux.vnet.ibm.com>:
  This is the syscall slow path hooks for context tracking subsystem,
  corresponding to
  [PATCH] x86: Syscall hooks for userspace RCU extended QS
commit bf5a3c13b939813d28ce26c01425054c740d6731
 
  TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
   is no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it is
  better for it to be in the same 16 bits with others in the group, so in the
  asm code, andi. with this group could work.
 
   Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
  ---
   arch/powerpc/include/asm/thread_info.h |7 +--
   arch/powerpc/kernel/ptrace.c   |5 +
   2 files changed, 10 insertions(+), 2 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/thread_info.h 
  b/arch/powerpc/include/asm/thread_info.h
  index 406b7b9..414a261 100644
  --- a/arch/powerpc/include/asm/thread_info.h
  +++ b/arch/powerpc/include/asm/thread_info.h
  @@ -97,7 +97,7 @@ static inline struct thread_info 
  *current_thread_info(void)
   #define TIF_PERFMON_CTXSW  6   /* perfmon needs ctxsw calls */
   #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
   #define TIF_SINGLESTEP 8   /* singlestepping active */
  -#define TIF_MEMDIE 9   /* is terminating due to OOM killer 
  */
  +#define TIF_NOHZ   9   /* in adaptive nohz mode */
   #define TIF_SECCOMP10  /* secure computing */
   #define TIF_RESTOREALL 11  /* Restore all regs (implies 
  NOERROR) */
   #define TIF_NOERROR12  /* Force successful syscall return 
  */
  @@ -106,6 +106,7 @@ static inline struct thread_info 
  *current_thread_info(void)
   #define TIF_SYSCALL_TRACEPOINT 15  /* syscall tracepoint 
  instrumentation */
   #define TIF_EMULATE_STACK_STORE16  /* Is an instruction 
  emulation
  for stack store? */
  +#define TIF_MEMDIE 17  /* is terminating due to OOM killer 
  */
 
   /* as above, but as bit values */
    #define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
  @@ -124,8 +125,10 @@ static inline struct thread_info 
  *current_thread_info(void)
    #define _TIF_UPROBE(1<<TIF_UPROBE)
    #define _TIF_SYSCALL_TRACEPOINT(1<<TIF_SYSCALL_TRACEPOINT)
    #define _TIF_EMULATE_STACK_STORE   (1<<TIF_EMULATE_STACK_STORE)
   +#define _TIF_NOHZ  (1<<TIF_NOHZ)
   #define _TIF_SYSCALL_T_OR_A(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
  -_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
  +_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
  +_TIF_NOHZ)
 
   #define _TIF_USER_WORK_MASK(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
   _TIF_NOTIFY_RESUME | _TIF_UPROBE)
  diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
  index c497000..62238dd 100644
  --- a/arch/powerpc/kernel/ptrace.c
  +++ b/arch/powerpc/kernel/ptrace.c
  @@ -32,6 +32,7 @@
    #include <trace/syscall.h>
    #include <linux/hw_breakpoint.h>
    #include <linux/perf_event.h>
   +#include <linux/context_tracking.h>
 
    #include <asm/uaccess.h>
    #include <asm/page.h>
  @@ -1745,6 +1746,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
   {
  long ret = 0;
 
  +   user_exit();
  +
   secure_computing_strict(regs->gpr[0]);
 
   if (test_thread_flag(TIF_SYSCALL_TRACE) &&
  @@ -1789,4 +1792,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
  step = test_thread_flag(TIF_SINGLESTEP);
  if (step || test_thread_flag(TIF_SYSCALL_TRACE))
  tracehook_report_syscall_exit(regs, step);
  +
  +   user_enter();
 
 In x86-64, schedule_user() and do_notify_resume() can be called before
 syscall_trace_leave(). As a result we may be entering
 syscall_trace_leave() in user mode (from a context tracking POV). To
 fix this I added a call to user_exit() on the very beginning of that
 function.
 
 You can find the details in 2c5594df344cd1ff0cc9bf007dea3235582b3acf
 (rcu: Fix unrecovered RCU user mode in syscall_trace_leave()).

Hi Frederic, 

Thank you very much for the reminding. 

 
 Could this problem happen in ppc as well?

I checked the code (64-bit) today, and it seems to me that it won't happen.
But fortunately, we are on the ppc mailing list, so please correct me if my
understanding is wrong.

By the way, I enabled CONTEXT_TRACKING_FORCE and PROVE_RCU, so if it
could happen, I think some illegal RCU usage complaints should have been
reported.
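
(For anyone reproducing this, the relevant debug options in .config; a
sketch -- on kernels of this era PROVE_RCU is selected via PROVE_LOCKING:)

	# Force context tracking on even without full nohz, and enable
	# lockdep-based RCU proving so illegal RCU usage gets reported:
	CONFIG_CONTEXT_TRACKING=y
	CONFIG_CONTEXT_TRACKING_FORCE=y
	CONFIG_PROVE_LOCKING=y
	CONFIG_PROVE_RCU=y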

Thanks, Zhong

 Thanks.
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH 0/5] powerpc: Support context tracking for Power pSeries

2013-02-01 Thread Li Zhong
These patches try to support context tracking for the Power arch, beginning with 
64-bit pSeries. The code is ported from that of x86_64, and in each
patch, I listed the corresponding patch for x86.

Would you please help review and give your comments?

Thanks, Zhong

Li Zhong (5):
  powerpc: Syscall hooks for context tracking subsystem
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Exit user context on notify resume
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries

 arch/powerpc/include/asm/context_tracking.h |   31 +++
 arch/powerpc/include/asm/thread_info.h      |    7 ++-
 arch/powerpc/kernel/entry_64.S              |    3 +-
 arch/powerpc/kernel/exceptions-64s.S        |    4 +-
 arch/powerpc/kernel/ptrace.c                |    5 ++
 arch/powerpc/kernel/signal.c                |    5 ++
 arch/powerpc/kernel/traps.c                 |   79 ---
 arch/powerpc/mm/fault.c                     |   15 -
 arch/powerpc/mm/hash_utils_64.c             |   17 ++
 arch/powerpc/platforms/pseries/Kconfig      |    1 +
 10 files changed, 141 insertions(+), 26 deletions(-)
 create mode 100644 arch/powerpc/include/asm/context_tracking.h

-- 
1.7.9.5



[RFC PATCH 1/5] powerpc: Syscall hooks for context tracking subsystem

2013-02-01 Thread Li Zhong
These are the syscall slow-path hooks for the context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
  commit bf5a3c13b939813d28ce26c01425054c740d6731

TIF_MEMDIE is moved to the upper 16 bits (value 17), as no asm code appears
to use it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it needs to stay in
the same low 16 bits as the other flags in that group: the asm code tests the
whole group with a single andi., whose unsigned immediate can only encode
16 bits (see the sketch below).
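
For illustration only (not part of the patch), the constraint can be written
as a compile-time check -- a sketch that assumes TIF_SYSCALL_TRACE is bit 0,
as in thread_info.h; the other bit numbers are the ones this patch uses:

/* Bit numbers of the flags grouped into _TIF_SYSCALL_T_OR_A. */
#define TIF_SYSCALL_TRACE       0
#define TIF_SYSCALL_AUDIT       7
#define TIF_NOHZ                9
#define TIF_SECCOMP             10
#define TIF_SYSCALL_TRACEPOINT  15

#define _TIF_SYSCALL_T_OR_A     ((1 << TIF_SYSCALL_TRACE) | \
                                 (1 << TIF_SYSCALL_AUDIT) | \
                                 (1 << TIF_SECCOMP) | \
                                 (1 << TIF_SYSCALL_TRACEPOINT) | \
                                 (1 << TIF_NOHZ))

/* andi. can only encode a 16-bit unsigned immediate, so the whole
 * group must fit in the low 16 bits of the thread flags word. */
_Static_assert(_TIF_SYSCALL_T_OR_A <= 0xffff,
               "syscall flag group must stay in the low 16 bits");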

Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/thread_info.h |  7 +++++--
 arch/powerpc/kernel/ptrace.c           |  5 +++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 406b7b9..414a261 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -97,7 +97,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_PERFMON_CTXSW      6   /* perfmon needs ctxsw calls */
 #define TIF_SYSCALL_AUDIT      7   /* syscall auditing active */
 #define TIF_SINGLESTEP         8   /* singlestepping active */
-#define TIF_MEMDIE             9   /* is terminating due to OOM killer */
+#define TIF_NOHZ               9   /* in adaptive nohz mode */
 #define TIF_SECCOMP            10  /* secure computing */
 #define TIF_RESTOREALL         11  /* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR            12  /* Force successful syscall return */
@@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_TRACEPOINT 15  /* syscall tracepoint instrumentation */
 #define TIF_EMULATE_STACK_STORE 16 /* Is an instruction emulation
                                       for stack store? */
+#define TIF_MEMDIE             17  /* is terminating due to OOM killer */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE     (1<<TIF_SYSCALL_TRACE)
@@ -124,8 +125,10 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_UPROBE            (1<<TIF_UPROBE)
 #define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE (1<<TIF_EMULATE_STACK_STORE)
+#define _TIF_NOHZ              (1<<TIF_NOHZ)
 #define _TIF_SYSCALL_T_OR_A    (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-                                _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
+                                _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
+                                _TIF_NOHZ)
 
 #define _TIF_USER_WORK_MASK    (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
                                 _TIF_NOTIFY_RESUME | _TIF_UPROBE)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index c497000..62238dd 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <trace/syscall.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/perf_event.h>
+#include <linux/context_tracking.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -1745,6 +1746,8 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
        long ret = 0;
 
+       user_exit();
+
        secure_computing_strict(regs->gpr[0]);
 
        if (test_thread_flag(TIF_SYSCALL_TRACE) &&
@@ -1789,4 +1792,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
        step = test_thread_flag(TIF_SINGLESTEP);
        if (step || test_thread_flag(TIF_SYSCALL_TRACE))
                tracehook_report_syscall_exit(regs, step);
+
+       user_enter();
 }
-- 
1.7.9.5


