Re: [PATCH] rtas: Validate rtas entry before calling enter_rtas

2015-10-16 Thread Denis Kirjanov
On 10/16/15, Vasant Hegde  wrote:
> On 10/16/2015 04:02 PM, Denis Kirjanov wrote:
>> On 10/16/15, Vasant Hegde  wrote:
>>> Currently we do not validate rtas entry before calling enter_rtas().
>>> This
>>> is resulting in a kernel oops (see below) when user space calls rtas
>>> system
>>> call on PowerNV platform. We hit below oops when we ran trinity (system
>>> call
>>> fuzzer) on PowerNV. This patch adds code to validate rtas entry before
>>> making
>>> enter_rtas() call.
>>
>> Hi,
>> have you figured out why we have null entry?
>
> Denis,
>
> Yes... On PowerNV platform we don't have RTAS.. Hence it's not initialized.
But why do we have CONFIG_PPC_RTAS on OPAL machines then?
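
For illustration, the check the patch adds can be modelled in user space as below. This is only a sketch under stated assumptions, not the kernel code: struct fw_state, fw_call() and demo_entry() are hypothetical names standing in for the rtas state, ppc_rtas() and the firmware entry point.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's rtas state: entry stays NULL
 * on platforms where the firmware service was never initialised. */
struct fw_state {
	int (*entry)(int token);
};

/* Hypothetical firmware entry point, used only for this demo. */
static int demo_entry(int token)
{
	return token + 1;
}

/* Refuse the call instead of jumping through a NULL entry pointer. */
static int fw_call(const struct fw_state *fw, int token, int *result)
{
	assert(fw != NULL && result != NULL);

	if (!fw->entry)
		return -EINVAL;

	*result = fw->entry(token);
	return 0;
}
```

With entry left NULL, fw_call() returns -EINVAL; with entry set to demo_entry, it returns 0 and stores the result.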


>
> -Vasant
>
>>
>> Thanks!
>>>
>>> dmesg:
>>> -
>>> [22061.541428] Oops: Exception in kernel mode, sig: 4 [#1]
>>> [22061.541446] SMP NR_CPUS=1024 NUMA PowerNV
>>> [22061.541453] Modules linked in: rfcomm bnep nfnetlink
>>> scsi_transport_iscsi
>>> hidp nfc af_802154 ieee802154 bluetooth rfkill pppoe pppox ppp_generic
>>> slhc
>>> irda crc_ccitt af_key sctp libcrc32c atm appletalk ipx p8023 psnap p8022
>>> ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CHECKSUM tun ip6t_rpfilter
>>> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack
>>> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>> ip6_tables
>>> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
>>> nf_conntrack
>>> iptable_mangle iptable_security iptable_raw windfarm_smu_sat ses
>>> enclosure
>>> windfarm_pid shpchp i2c_opal i2c_core kvm_hv kvm_pr dm_multipath kvm
>>> lpfc
>>> tg3 ptp pps_core scsi_transport_fc
>>> [22061.541561] CPU: 40 PID: 57748 Comm: trinity-c11 Not tainted
>>> 3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le #1
>>> [22061.541566] task: c4294b80 ti: c007e1a78000 task.ti:
>>> c007e1a78000
>>> [22061.541570] NIP:  LR: 9c14 CTR:
>>> c0423140
>>> [22061.541573] REGS: c007e1a7b920 TRAP: 0e40   Not tainted
>>> (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
>>> [22061.541577] MSR: 10081000   CR:   XER:
>>> 
>>> [22061.541585] CFAR: c0009c0c SOFTE: 0
>>> GPR00: 90001031 c007e1a7bba0 c12b1d00
>>> 01338840
>>> GPR04:   10001000
>>> 90001033
>>> GPR08: 4000 80002933 3fff9e9d0068
>>> 
>>> GPR12: 00ff c7db7c00 
>>> 
>>> GPR16:   
>>> 
>>> GPR20:   
>>> 
>>> GPR24:  dc58 0001
>>> c01ee716e000
>>> GPR28:  c1338840 3fff9db3
>>> 
>>> [22061.541629] NIP []   (null)
>>> [22061.541637] LR [9c14] 0x9c14
>>> [22061.541640] Call Trace:
>>> [22061.541649] [c007e1a7bba0] [c041a7f4]
>>> avc_has_perm_noaudit+0x54/0x110 (unreliable)
>>> [22061.541657] [c007e1a7bd80] [c002ddc0]
>>> ppc_rtas+0x150/0x2d0
>>> [22061.541662] [c007e1a7be30] [c0009358]
>>> syscall_exit+0x0/0x98
>>> [22061.541666] Instruction dump:
>>> [22061.541669]      
>>>  
>>> [22061.541675]     6000 6000
>>> 6000 6000
>>> [22061.541688] ---[ end trace 6f9bf0b3d32096aa ]---
>>>
>>> Reported-by: NAGESWARA R. SASTRY 
>>> Signed-off-by: Vasant Hegde 
>>> ---
>>>  arch/powerpc/kernel/rtas.c |3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 84bf934..5a753fa 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -1043,6 +1043,9 @@ asmlinkage int ppc_rtas(struct rtas_args __user
>>> *uargs)
>>> if (!capable(CAP_SYS_ADMIN))
>>> return -EPERM;
>>>
>>> +   if (!rtas.entry)
>>> +   return -EINVAL;
>>> +
>>> if (copy_from_user(, uargs, 3 * sizeof(u32)) != 0)
>>> return -EFAULT;
>>>
>>>
>>> ___
>>> Linuxppc-dev mailing list
>>> Linuxppc-dev@lists.ozlabs.org
>>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
>
>
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] powerpc/prom: Avoid reference to potentially freed memory

2015-10-16 Thread Christophe JAILLET
of_get_property() is used inside the loop, but then the reference to the
node is dropped before dereferencing the prop pointer, which could by then
point to junk if the node has been freed.

Instead use of_property_read_u32() to actually read the property
value before dropping the reference.
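
The ordering issue can be sketched in user space as follows. This is an illustration only: struct node, node_new(), node_put() and read_chip_id_and_put() are hypothetical helpers, not the kernel's device_node API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical refcounted node; not the kernel's struct device_node. */
struct node {
	int refcount;
	uint32_t chip_id;
};

static struct node *node_new(uint32_t chip_id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->refcount = 1;
	n->chip_id = chip_id;
	return n;
}

/* Drop one reference; returns 1 if the node was freed. */
static int node_put(struct node *n)
{
	if (--n->refcount == 0) {
		free(n);
		return 1;
	}
	return 0;
}

/* Copy the value out while the reference is still held, then drop it.
 * Node memory is never touched after node_put(). */
static uint32_t read_chip_id_and_put(struct node *n)
{
	uint32_t id = n->chip_id;	/* read first */

	node_put(n);			/* n may be freed here */
	return id;			/* safe: id is a local copy */
}
```

Returning a pointer into the node and dereferencing it after node_put() would be the unsafe ordering the patch description refers to.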

Signed-off-by: Christophe JAILLET 
---
v2: Fix missing '{'
*** COMPILE-TESTED ONLY ***
---
 arch/powerpc/kernel/prom.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index bef76c5..dc4f6a4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -783,14 +783,13 @@ void __init early_get_first_memblock_info(void *params, 
phys_addr_t *size)
 int of_get_ibm_chip_id(struct device_node *np)
 {
of_node_get(np);
-   while(np) {
+   while (np) {
struct device_node *old = np;
-   const __be32 *prop;
+   u32 chip_id;
 
-   prop = of_get_property(np, "ibm,chip-id", NULL);
-   if (prop) {
+   if (!of_property_read_u32(np, "ibm,chip-id", &chip_id)) {
of_node_put(np);
-   return be32_to_cpup(prop);
+   return chip_id;
}
np = of_get_parent(np);
of_node_put(old);
-- 
2.1.4


Re: [PATCH v2] powerpc/mpc5xxx: Avoid dereferencing potentially freed memory

2015-10-16 Thread Christophe JAILLET

On 16/10/2015 11:49, Michael Ellerman wrote:
> On Fri, 2015-10-16 at 08:20 +0200, Christophe JAILLET wrote:
>> On 15/10/2015 08:36, Michael Ellerman wrote:
>>> On Thu, 2015-10-15 at 07:56 +0200, Christophe JAILLET wrote:
>>>> Use 'of_property_read_u32()' instead of 'of_get_property()'+pointer
>>>> dereference in order to avoid access to potentially freed memory.
>>>>
>>>> Use 'of_get_next_parent()' to simplify the while() loop and avoid the
>>>> need of a temp variable.
>>>>
>>>> Signed-off-by: Christophe JAILLET
>>>> ---
>>>> v2: Use of_property_read_u32 instead of of_get_property+pointer dereference
>>>> *** Untested ***
>>> Thanks.
>>>
>>> Can someone with an mpc5xxx test this?
>>
>> Hi,
>> I don't think it is an issue, but while looking at another similar
>> patch, I noticed that the proposed patch adds a call to be32_to_cpup()
>> (within of_property_read_u32).
>> Apparently, powerPC is a BE architecture, so this call should be a no-op.
>>
>> Just wanted to point it out, in case of.
>
> Hi Christoph,
>
> I'm not sure I follow.
>
> The device tree is always big endian, but of_property_read_u32() does the
> conversion to CPU endian for you already. That is one of the advantages of
> using it.
>
> cheers



Hi,
sorry if I was unclear.

What I mean is that in the code touched by the patch for
'powerpc/sysdev/mpc5xxx_clocks.c', there was no call to 'be32_to_cpup'.

So in the proposed patch, 'of_property_read_u32' adds it.

In the patch against 'powerpc/kernel/prom.c', on the other hand,
'be32_to_cpup' was already called explicitly.

So using 'of_property_read_u32' there keeps the same logic.


Basically, the code in 'mpc5xxx_clocks.c' and in 'prom.c' was written the
same way. I found it odd that 'be32_to_cpup' was called in only one of the
two cases.

Maybe the call was simply missing in 'mpc5xxx_clocks.c'.


I don't know if it can be an issue or not. I just find it 'strange'.


CJ



Re: [PATCH] powerpc/prom: Avoid reference to potentially freed memory

2015-10-16 Thread Christophe JAILLET

On 16/10/2015 12:02, Michael Ellerman wrote:

> As the kbuild robot detected you have left an extra "}" here.
>
> I don't mind too much if you send patches that aren't compile tested, but you
> might save yourself some time by compiling them.


Sorry about it, and thanks for your patience.
IMHO, this should never happen and patches should be at least 
compile-tested.


I will be more careful and compile-test any new patch I submit.

CJ


Re: [PATCH 0/3] mm/powerpc: enabling memory soft dirty tracking

2015-10-16 Thread Andrew Morton
On Fri, 16 Oct 2015 14:07:05 +0200 Laurent Dufour wrote:

> This series is enabling the software memory dirty tracking in the
> kernel for powerpc.  This is the follow up of the commit 0f8975ec4db2
> ("mm: soft-dirty bits for user memory changes tracking") which
> introduced this feature in the mm code.
> 
> The first patch is fixing an issue in the code clearing the soft dirty
> bit.  The PTE were not cleared before being modified, leading to hang
> on ppc64.
> 
> The second patch is fixing a build issue when the transparent huge
> page is not enabled.
> 
> The third patch is introducing the soft dirty tracking in the powerpc
> architecture code. 

I grabbed these patches, but they're more a ppc thing than a core
kernel thing.  I can merge them into 4.3 with suitable acks or drop
them if they turn up in the powerpc tree.  Or something else?

Re: [PATCH v2] powerpc/mpc5xxx: Avoid dereferencing potentially freed memory

2015-10-16 Thread Christophe JAILLET

On 15/10/2015 08:36, Michael Ellerman wrote:
> On Thu, 2015-10-15 at 07:56 +0200, Christophe JAILLET wrote:
>> Use 'of_property_read_u32()' instead of 'of_get_property()'+pointer
>> dereference in order to avoid access to potentially freed memory.
>>
>> Use 'of_get_next_parent()' to simplify the while() loop and avoid the
>> need of a temp variable.
>>
>> Signed-off-by: Christophe JAILLET
>> ---
>> v2: Use of_property_read_u32 instead of of_get_property+pointer dereference
>> *** Untested ***
> Thanks.
>
> Can someone with an mpc5xxx test this?
>
> cheers



Hi,
I don't think it is an issue, but while looking at another similar
patch, I noticed that the proposed patch adds a call to be32_to_cpup()
(within of_property_read_u32).

Apparently, PowerPC is a BE architecture, so this call should be a no-op.

Just wanted to point it out, in case it matters.

Best regards,
CJ


Re: [PATCH] powerpc/prom: Avoid reference to potentially freed memory

2015-10-16 Thread kbuild test robot
Hi Christophe,

[auto build test ERROR on powerpc/next -- if it's inappropriate base, please 
suggest rules for selecting the more suitable base]

url:    https://github.com/0day-ci/linux/commits/Christophe-JAILLET/powerpc-prom-Avoid-reference-to-potentially-freed-memory/20151016-141714
config: powerpc-defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/prom.c: In function 'of_get_ibm_chip_id':
>> arch/powerpc/kernel/prom.c:787:23: error: unused variable 'old' 
>> [-Werror=unused-variable]
  struct device_node *old = np;
  ^
>> arch/powerpc/kernel/prom.c:795:15: error: 'old' undeclared (first use in 
>> this function)
  of_node_put(old);
  ^
   arch/powerpc/kernel/prom.c:795:15: note: each undeclared identifier is 
reported only once for each function it appears in
   arch/powerpc/kernel/prom.c: At top level:
>> arch/powerpc/kernel/prom.c:797:2: error: expected identifier or '(' before 
>> 'return'
 return -1;
 ^
>> arch/powerpc/kernel/prom.c:798:1: error: expected identifier or '(' before 
>> '}' token
}
^
   arch/powerpc/kernel/prom.c: In function 'of_get_ibm_chip_id':
   arch/powerpc/kernel/prom.c:796:2: error: control reaches end of non-void 
function [-Werror=return-type]
 }
 ^
   cc1: all warnings being treated as errors

vim +/old +787 arch/powerpc/kernel/prom.c

b37193b7 Benjamin Herrenschmidt 2013-07-15  781   * be found.
b37193b7 Benjamin Herrenschmidt 2013-07-15  782   */
b37193b7 Benjamin Herrenschmidt 2013-07-15  783  int of_get_ibm_chip_id(struct device_node *np)
b37193b7 Benjamin Herrenschmidt 2013-07-15  784  {
b37193b7 Benjamin Herrenschmidt 2013-07-15  785  	of_node_get(np);
b37193b7 Benjamin Herrenschmidt 2013-07-15  786  	while (np) {
b37193b7 Benjamin Herrenschmidt 2013-07-15 @787  		struct device_node *old = np;
12540384 Christophe JAILLET     2015-10-16  788  		u32 chip_id;
b37193b7 Benjamin Herrenschmidt 2013-07-15  789  
12540384 Christophe JAILLET     2015-10-16  790  		if (!of_property_read_u32(np, "ibm,chip-id", &chip_id))
b37193b7 Benjamin Herrenschmidt 2013-07-15  791  			of_node_put(np);
12540384 Christophe JAILLET     2015-10-16  792  			return chip_id;
b37193b7 Benjamin Herrenschmidt 2013-07-15  793  		}
b37193b7 Benjamin Herrenschmidt 2013-07-15  794  		np = of_get_parent(np);
b37193b7 Benjamin Herrenschmidt 2013-07-15 @795  		of_node_put(old);
b37193b7 Benjamin Herrenschmidt 2013-07-15  796  	}
b37193b7 Benjamin Herrenschmidt 2013-07-15 @797  	return -1;
b37193b7 Benjamin Herrenschmidt 2013-07-15 @798  }
b130e7c0 Dan Streetman          2015-05-07  799  EXPORT_SYMBOL(of_get_ibm_chip_id);
b37193b7 Benjamin Herrenschmidt 2013-07-15  800  
3eb906c6 Michael Ellerman       2013-11-20  801  /**

:: The code at line 787 was first introduced by commit
:: b37193b71846858d816e152d3a5db010d7b73f5e powerpc/powernv: Add helper to 
get ibm,chip-id of a node

:: TO: Benjamin Herrenschmidt <b...@kernel.crashing.org>
:: CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH v2] powerpc/mpc5xxx: Avoid dereferencing potentially freed memory

2015-10-16 Thread Gabriel Paubert
On Fri, Oct 16, 2015 at 08:20:13AM +0200, Christophe JAILLET wrote:
> On 15/10/2015 08:36, Michael Ellerman wrote:
> >On Thu, 2015-10-15 at 07:56 +0200, Christophe JAILLET wrote:
> >>Use 'of_property_read_u32()' instead of 'of_get_property()'+pointer
> >>dereference in order to avoid access to potentially freed memory.
> >>
> >>Use 'of_get_next_parent()' to simplify the while() loop and avoid the
> >>need of a temp variable.
> >>
> >>Signed-off-by: Christophe JAILLET 
> >>---
> >>v2: Use of_property_read_u32 instead of of_get_property+pointer dereference
> >>*** Untested ***
> >Thanks.
> >
> >Can someone with an mpc5xxx test this?
> >
> >cheers
> >
> 
> Hi,
> I don't think it is an issue, but while looking at another similar
> patch, I noticed that the proposed patch adds a call to
> be32_to_cpup() (within of_property_read_u32).
> Apparently, powerPC is a BE architecture, so this call should be a no-op.

Sadly no more. 32 bit is BE only, but 64 bit can be either BEtter or
LEsser.

Gabriel

Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-16 Thread Laurent Vivier


On 16/10/2015 04:14, Michael Ellerman wrote:
> On Thu, 2015-10-15 at 21:00 +0200, Laurent Vivier wrote:
>> On kexec, all secondary offline CPUs are onlined before
>> starting the new kernel, this is not done in the case of kdump.
>>
>> If kdump is configured and a kernel crash occurs whereas
>> some secondaries CPUs are offline (SMT=off),
>> the new kernel is not able to start them and displays some
>> "Processor X is stuck.".
> 
> Do we know why they are stuck?

Yes, we know :)

On the crash, as the CPUs are offline, the kernel doesn't call
opal_return_cpu(), so as far as OPAL is concerned all these CPUs are still
in the kernel.

When the new kernel starts, it calls opal_query_cpu_status() to find out
which CPUs are available. As they were not returned to OPAL, these CPUs
are not available, but the kernel logic assumes they must be (it assumes
SMT is on), so it waits for them to start and waits forever...

When the kernel starts, all secondary processors are started by calling
__cpu_up() for each of them:

__cpu_up()

...
cpu_callin_map[cpu] = 0;
...
rc = smp_ops->kick_cpu(cpu);
...wait...
if (!cpu_callin_map[cpu]) {
printk(KERN_ERR "Processor %u is stuck.\n", cpu);
...

on powernv, kick_cpu() is pnv_smp_kick_cpu():

pnv_smp_kick_cpu()

...
unsigned long start_here =

__pa(ppc_function_entry(generic_secondary_smp_init));
...
/*
 * Already started, just kick it, probably coming from
 * kexec and spinning
 */
rc = opal_query_cpu_status(pcpu, &status);
...
if (status == OPAL_THREAD_STARTED)
goto kick;
...
rc = opal_start_cpu(pcpu, start_here);
...
kick:
...

generic_secondary_smp_init() is an assembly-language function that
ends up calling start_secondary():

start_secondary()

...
cpu_callin_map[cpu] = 1;
...

So processors are stuck because start_secondary() is never called.

start_secondary() is never called because OPAL cpu status is
OPAL_THREAD_STARTED.

Secondary CPUs are in "OPAL_THREAD_STARTED" state because they have not
been returned to OPAL on crash.

CPUs are returned to OPAL by pnv_kexec_cpu_down() which is called by
crash_ipi_callback() (for secondary cpus)... except if the cpu is not
online.

As the CPUs are offline, they are not returned to OPAL, and then the kernel
can't restart them.


> I really don't like this fix. The reason we're doing a kdump is because the
> first kernel has panicked, possibly with locks held or data structures
> corrupted. Calling cpu_up() then goes and tries to run a bunch of code in the
> crashed kernel, which increases the chance of us just wedging completely.

I agree, but the whole logic of the POWER kernel assumes that all the
threads are available.

Moreover, the kernel parameter "maxcpus" is ignored if it is not a
multiple of threads per core:

...
static int subcore_init(void)
{
if (!cpu_has_feature(CPU_FTR_ARCH_207S))
return 0;

/*
 * We need all threads in a core to be present to split/unsplit so
 * continue only if max_cpus are aligned to threads_per_core.
 */
if (setup_max_cpus % threads_per_core)
return 0;

...
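
The alignment test in that excerpt boils down to a modulo check. A stand-alone sketch (max_cpus_aligned() is a hypothetical helper, not a kernel function):

```c
#include <assert.h>
#include <stdbool.h>

/* True when max_cpus covers whole cores only, i.e. it is a multiple of
 * the number of hardware threads per core. */
static bool max_cpus_aligned(unsigned int max_cpus,
			     unsigned int threads_per_core)
{
	assert(threads_per_core != 0);
	return max_cpus % threads_per_core == 0;
}
```

With 8 threads per core, a value of 16 passes the check while 12 does not.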

[PATCH] powerpc/prom: Avoid reference to potentially freed memory

2015-10-16 Thread Christophe JAILLET
of_get_property() is used inside the loop, but then the reference to the
node is dropped before dereferencing the prop pointer, which could by then
point to junk if the node has been freed.

Instead use of_property_read_u32() to actually read the property
value before dropping the reference.

Signed-off-by: Christophe JAILLET 
---
*** UNTESTED ***
---
 arch/powerpc/kernel/prom.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index bef76c5..dc4f6a4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -783,14 +783,13 @@ void __init early_get_first_memblock_info(void *params, 
phys_addr_t *size)
 int of_get_ibm_chip_id(struct device_node *np)
 {
of_node_get(np);
-   while(np) {
+   while (np) {
struct device_node *old = np;
-   const __be32 *prop;
+   u32 chip_id;
 
-   prop = of_get_property(np, "ibm,chip-id", NULL);
-   if (prop) {
+   if (!of_property_read_u32(np, "ibm,chip-id", &chip_id))
of_node_put(np);
-   return be32_to_cpup(prop);
+   return chip_id;
}
np = of_get_parent(np);
of_node_put(old);
-- 
2.1.4


Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-16 Thread Laurent Vivier



On 16/10/2015 04:29, David Gibson wrote:
> On Thu, 15 Oct 2015 21:00:58 +0200 Laurent Vivier
>  wrote:
> 
>> On kexec, all secondary offline CPUs are onlined before starting
>> the new kernel, this is not done in the case of kdump.
>> 
>> If kdump is configured and a kernel crash occurs whereas some
>> secondaries CPUs are offline (SMT=off), the new kernel is not
>> able to start them and displays some "Processor X is stuck.".
>> 
>> Starting with POWER8, subcore logic relies on all threads of core
>> being booted. So, on startup kernel tries to start all threads,
>> and asks OPAL (or RTAS) to start all CPUs (including threads). If
>> a CPU has been offlined by the previous kernel, it has not been
>> returned to OPAL, and thus OPAL cannot restart it: this CPU has
>> been lost...
>> 
>> Signed-off-by: Laurent Vivier 
> 
> Nice analysis of the problem.  But, I'm a bit uneasy about this
> approach to fixing it: Onlining potentially hundreds of CPU threads
> seems like a risky operation in a kernel that's already crashed.

I agree.

> I don't have a terribly clear idea of what is the best way to
> address this.  Here's a few ideas in the right general direction:
> 
> * I'm already looking into a kdump userspace fixes to stop it 
> attempting to bring up secondary CPUs
> 
> * A working kernel option to say "only allow this many online cpus 
> ever" which we could pass to the kdump kernel would be nice
> 
> * Paulus had an idea about offline threads returning themselves 
> directly to OPAL by kicking a flag at kdump/kexec time.

For me the problem is: as these CPUs are offline, I guess the core has
been switched to 1 thread per core, so the CPUs (1 to 7 for core 0)
don't exist anymore, how can we return them to OPAL ?

> 
> BenH, Paulus,
> 
> OPAL <-> kernel cpu transitions don't seem to work quite how I
> thought they would.  IIUC there's a register we can use to directly
> control which threads on a core are active.  Given that I would
> have thought cpu "ownership" OPAL vs. kernel would be on a
> per-core, rather than per-thread basis.
> 
> Is there some way we can change the CPU onlining / offlining code
> so that if threads aren't in OPAL, we directly enable them, rather
> than just hoping they're in a nap loop somewhere?
> 

Laurent

[PATCH V2 2/2] powerpc/85xx: Add PCIe controller support for bsc9132qds

2015-10-16 Thread Zhiqiang Hou
From: Harninder Rai 

1. Use machine_arch_initcall to hook mpc85xx_common_publish_devices
This can ensure before pcibios_init() is called, pci controllers have
been probed and added to the hose_list.
2. Add a workaround for errata A-005434
For the BSC9132, PEX_PEXIWARn[TRGT] for all windows defaults to 0xF,
which is mapped to CCSRBAR. However, for other products, 0xF is
mapped to the local memory. Therefore, for the BSC9132, any default
PCI Express access to the local memory (DDR) will now access the
CCSRBAR. This patch changes the mapping of targets of inbound windows
PEX_PEXIWARn[TRGT] to the Local address space – 0x0 (from 0xF).
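
The effect of the workaround on the window attribute value can be sketched as below. This is illustrative only: the DEMO_* names are hypothetical, and the assumption that the target field sits in bits 23:20 is mine, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch: the inbound-window target field
 * occupies bits 23:20, so a target of 0xF corresponds to this mask. */
#define DEMO_PIWAR_TGT_MASK	0x00f00000u

/* Extract the 4-bit target field. */
static uint32_t demo_piwar_target(uint32_t piwar)
{
	return (piwar & DEMO_PIWAR_TGT_MASK) >> 20;
}

/* Force the target field to 0x0 and leave every other bit untouched. */
static uint32_t demo_piwar_target_local(uint32_t piwar)
{
	uint32_t fixed = piwar & ~DEMO_PIWAR_TGT_MASK;

	assert(demo_piwar_target(fixed) == 0);
	return fixed;
}
```

Clearing the field with a mask, rather than writing a whole new value, keeps the enable, snoop and size bits of the attribute word as they were.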

Signed-off-by: Harninder Rai 
Signed-off-by: Minghuan Lian 
Change-Id: Iebb3d0e057d64d5b96263c130495e5d04caf7948
Reviewed-on: http://git.am.freescale.net:8181/2455
Reviewed-by: Zang Tiefei-R61911 
Reviewed-by: Kushwaha Prabhakar-B32579 
Reviewed-by: Fleming Andrew-AFLEMING 
Tested-by: Fleming Andrew-AFLEMING 
---
 arch/powerpc/platforms/85xx/bsc913x_qds.c |  8 +++-
 arch/powerpc/sysdev/fsl_pci.c | 13 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/85xx/bsc913x_qds.c 
b/arch/powerpc/platforms/85xx/bsc913x_qds.c
index f0927e5..dcfafd6 100644
--- a/arch/powerpc/platforms/85xx/bsc913x_qds.c
+++ b/arch/powerpc/platforms/85xx/bsc913x_qds.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "mpc85xx.h"
@@ -46,10 +47,12 @@ static void __init bsc913x_qds_setup_arch(void)
mpc85xx_smp_init();
 #endif
 
+   fsl_pci_assign_primary();
+
pr_info("bsc913x board from Freescale Semiconductor\n");
 }
 
-machine_device_initcall(bsc9132_qds, mpc85xx_common_publish_devices);
+machine_arch_initcall(bsc9132_qds, mpc85xx_common_publish_devices);
 
 /*
  * Called very early, device-tree isn't unflattened
@@ -67,6 +70,9 @@ define_machine(bsc9132_qds) {
.probe  = bsc9132_qds_probe,
.setup_arch = bsc913x_qds_setup_arch,
.init_IRQ   = bsc913x_qds_pic_init,
+#ifdef CONFIG_PCI
+   .pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
+#endif
.get_irq= mpic_get_irq,
.restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index ebc1f412..b8607f6 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -193,6 +193,19 @@ static void setup_pci_atmu(struct pci_controller *hose)
const u64 *reg;
int len;
 
+   if (of_device_is_compatible(hose->dn, "fsl,bsc9132-pcie")) {
+   /*
+* BSC9132 Rev1.0 has an issue where all the PEX inbound
+* windows have implemented the default target value as 0xf
+* for CCSR space. In all Freescale legacy devices the target
+* of 0xf is reserved for local memory space. 9132 Rev1.0
+* now has local memory space mapped to target 0x0 instead of
+* 0xf. Hence adding a workaround to remove the target 0xf
+* defined for memory space from Inbound window attributes.
+*/
+   piwar &= ~PIWAR_TGI_LOCAL;
+   }
+
if (early_find_capability(hose, 0, 0, PCI_CAP_ID_EXP)) {
if (in_be32(>block_rev1) >= PCIE_IP_REV_2_2) {
win_idx = 2;
-- 
2.1.0.27.g96db324


[PATCH V2 1/2] powerpc/fsl: Add PCI node in device tree of bsc9132qds

2015-10-16 Thread Zhiqiang Hou
From: Harninder Rai 

Signed-off-by: Harninder Rai 
Signed-off-by: Minghuan Lian 
Change-Id: I4355add4a92d1fcf514843aea5ecadd2e2517969
Reviewed-on: http://git.am.freescale.net:8181/2454
Reviewed-by: Zang Tiefei-R61911 
Reviewed-by: Kushwaha Prabhakar-B32579 
Reviewed-by: Fleming Andrew-AFLEMING 
Tested-by: Fleming Andrew-AFLEMING 
---
V2:
 - Remove property clock-frequency.

 arch/powerpc/boot/dts/bsc9132qds.dts  | 15 ++
 arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi | 28 +++
 arch/powerpc/boot/dts/fsl/bsc9132si-pre.dtsi  |  1 +
 3 files changed, 44 insertions(+)

diff --git a/arch/powerpc/boot/dts/bsc9132qds.dts 
b/arch/powerpc/boot/dts/bsc9132qds.dts
index 6cab106..940d719 100644
--- a/arch/powerpc/boot/dts/bsc9132qds.dts
+++ b/arch/powerpc/boot/dts/bsc9132qds.dts
@@ -29,6 +29,21 @@
soc: soc@ff70 {
ranges = <0x0 0x0 0xff70 0x10>;
};
+
+   pci0: pcie@ff70a000 {
+   reg = <0 0xff70a000 0 0x1000>;
+   ranges = <0x200 0x0 0x9000 0 0x9000 0x0 0x2000
+ 0x100 0x0 0x 0 0xc001 0x0 0x1>;
+   pcie@0 {
+   ranges = <0x200 0x0 0x9000
+ 0x200 0x0 0x9000
+ 0x0 0x2000
+
+ 0x100 0x0 0x0
+ 0x100 0x0 0x0
+ 0x0 0x10>;
+   };
+   };
 };
 
 /include/ "bsc9132qds.dtsi"
diff --git a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
index c723071..b5f0715 100644
--- a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
@@ -40,6 +40,34 @@
interrupts = <16 2 0 0 20 2 0 0>;
 };
 
+/* controller at 0xa000 */
+ {
+   compatible = "fsl,bsc9132-pcie", "fsl,qoriq-pcie-v2.2";
+   device_type = "pci";
+   #size-cells = <2>;
+   #address-cells = <3>;
+   bus-range = <0 255>;
+   interrupts = <16 2 0 0>;
+
+   pcie@0 {
+   reg = <0 0 0 0 0>;
+   #interrupt-cells = <1>;
+   #size-cells = <2>;
+   #address-cells = <3>;
+   device_type = "pci";
+   interrupts = <16 2 0 0>;
+   interrupt-map-mask = <0xf800 0 0 7>;
+
+   interrupt-map = <
+   /* IDSEL 0x0 */
+    0x0 0x0 0x1  0x0 0x2 0x0 0x0
+    0x0 0x0 0x2  0x1 0x2 0x0 0x0
+    0x0 0x0 0x3  0x2 0x2 0x0 0x0
+    0x0 0x0 0x4  0x3 0x2 0x0 0x0
+   >;
+   };
+};
+
  {
#address-cells = <1>;
#size-cells = <1>;
diff --git a/arch/powerpc/boot/dts/fsl/bsc9132si-pre.dtsi 
b/arch/powerpc/boot/dts/fsl/bsc9132si-pre.dtsi
index 301a9db..90f7949 100644
--- a/arch/powerpc/boot/dts/fsl/bsc9132si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/bsc9132si-pre.dtsi
@@ -45,6 +45,7 @@
serial0 = 
ethernet0 = 
ethernet1 = 
+   pci0 = 
};
 
cpus {
-- 
2.1.0.27.g96db324


Re: [PATCH] powerpc/prom: Avoid reference to potentially freed memory

2015-10-16 Thread Michael Ellerman
On Fri, 2015-10-16 at 08:14 +0200, Christophe JAILLET wrote:

> of_get_property() is used inside the loop, but then the reference to the
> node is dropped before dereferencing the prop pointer, which could by then
> point to junk if the node has been freed.
> 
> Instead use of_property_read_u32() to actually read the property
> value before dropping the reference.
> 
> Signed-off-by: Christophe JAILLET 
> ---
> *** UNTESTED ***
> ---
>  arch/powerpc/kernel/prom.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index bef76c5..dc4f6a4 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -783,14 +783,13 @@ void __init early_get_first_memblock_info(void *params, 
> phys_addr_t *size)
>  int of_get_ibm_chip_id(struct device_node *np)
>  {
>   of_node_get(np);
> - while(np) {
> + while (np) {
>   struct device_node *old = np;
> - const __be32 *prop;
> + u32 chip_id;
>  
> - prop = of_get_property(np, "ibm,chip-id", NULL);
> - if (prop) {
> + if (!of_property_read_u32(np, "ibm,chip-id", &chip_id))
>   of_node_put(np);
> - return be32_to_cpup(prop);
> + return chip_id;
>   }


As the kbuild robot detected you have left an extra "}" here.

I don't mind too much if you send patches that aren't compile tested, but you
might save yourself some time by compiling them.

There are x86->powerpc cross compilers here:

https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/x86_64-gcc-4.9.0-nolibc_powerpc64-linux.tar.gz

Or if you're running on Ubuntu you can just do:

$ apt-get install gcc-powerpc-linux-gnu

I think there's a package for Fedora too but I don't know the name off the top
of my head.

cheers


Re: [PATCH] rtas: Validate rtas entry before calling enter_rtas

2015-10-16 Thread Denis Kirjanov
On 10/16/15, Vasant Hegde  wrote:
> Currently we do not validate rtas entry before calling enter_rtas(). This
> is resulting in a kernel oops (see below) when user space calls rtas system
> call on PowerNV platform. We hit below oops when we ran trinity (system call
> fuzzer) on PowerNV. This patch adds code to validate rtas entry before
> making
> enter_rtas() call.

Hi,
have you figured out why we have null entry?

Thanks!
>
> dmesg:
> -
> [22061.541428] Oops: Exception in kernel mode, sig: 4 [#1]
> [22061.541446] SMP NR_CPUS=1024 NUMA PowerNV
> [22061.541453] Modules linked in: rfcomm bnep nfnetlink scsi_transport_iscsi
> hidp nfc af_802154 ieee802154 bluetooth rfkill pppoe pppox ppp_generic slhc
> irda crc_ccitt af_key sctp libcrc32c atm appletalk ipx p8023 psnap p8022
> ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CHECKSUM tun ip6t_rpfilter
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> iptable_mangle iptable_security iptable_raw windfarm_smu_sat ses enclosure
> windfarm_pid shpchp i2c_opal i2c_core kvm_hv kvm_pr dm_multipath kvm lpfc
> tg3 ptp pps_core scsi_transport_fc
> [22061.541561] CPU: 40 PID: 57748 Comm: trinity-c11 Not tainted
> 3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le #1
> [22061.541566] task: c4294b80 ti: c007e1a78000 task.ti:
> c007e1a78000
> [22061.541570] NIP:  LR: 9c14 CTR:
> c0423140
> [22061.541573] REGS: c007e1a7b920 TRAP: 0e40   Not tainted
> (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
>   [22061.541577] MSR: 10081000   CR:   XER: 
> 
>   [22061.541585] CFAR: c0009c0c SOFTE: 0
>   GPR00: 90001031 c007e1a7bba0 c12b1d00 
> 01338840
>   GPR04:   10001000 
> 90001033
>   GPR08: 4000 80002933 3fff9e9d0068 
> 
>   GPR12: 00ff c7db7c00  
> 
>   GPR16:    
> 
>   GPR20:    
> 
>   GPR24:  dc58 0001 
> c01ee716e000
>   GPR28:  c1338840 3fff9db3 
> 
>   [22061.541629] NIP []   (null)
>   [22061.541637] LR [9c14] 0x9c14
>   [22061.541640] Call Trace:
>   [22061.541649] [c007e1a7bba0] [c041a7f4]
> avc_has_perm_noaudit+0x54/0x110 (unreliable)
>   [22061.541657] [c007e1a7bd80] [c002ddc0] 
> ppc_rtas+0x150/0x2d0
>   [22061.541662] [c007e1a7be30] [c0009358] 
> syscall_exit+0x0/0x98
>   [22061.541666] Instruction dump:
>   [22061.541669]      
>  
>   [22061.541675]     6000 6000
> 6000 6000
>   [22061.541688] ---[ end trace 6f9bf0b3d32096aa ]---
>
> Reported-by: NAGESWARA R. SASTRY 
> Signed-off-by: Vasant Hegde 
> ---
>  arch/powerpc/kernel/rtas.c |3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 84bf934..5a753fa 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1043,6 +1043,9 @@ asmlinkage int ppc_rtas(struct rtas_args __user
> *uargs)
>   if (!capable(CAP_SYS_ADMIN))
>   return -EPERM;
>
> + if (!rtas.entry)
> + return -EINVAL;
> +
>   if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0)
>   return -EFAULT;
>
>

[GIT PULL] Please pull powerpc/linux.git powerpc-4.3-4 tag

2015-10-16 Thread Michael Ellerman
Hi Linus,

Please pull some more powerpc fixes for 4.3:

The following changes since commit 9ffecb10283508260936b96022d4ee43a7798b4c:

  Linux 4.3-rc3 (2015-09-27 07:50:08 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.3-4

for you to fetch changes up to abb39bc792aa8e9143e78a39ec13d7de1391f4b4:

  selftests/powerpc: Fix build failure of load_unaligned_zeropad test (2015-10-13 15:08:37 +1100)


powerpc fixes for 4.3 #3

 - Re-enable CONFIG_SCSI_DH in our defconfigs
 - Remove unused os_area_db_id_video_mode
 - cxl: fix leak of IRQ names in cxl_free_afu_irqs() from Andrew
 - cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API from Andrew
 - cxl: fix leak of ctx->mapping when releasing kernel API contexts from Andrew
 - cxl: Workaround malformed pcie packets on some cards from Philippe
 - cxl: Fix number of allocated pages in SPA from Christophe Lombard
 - Fix checkstop in native_hpte_clear() with lockdep from Cyril
 - Panic on unhandled Machine Check on powernv from Daniel
 - selftests/powerpc: Fix build failure of load_unaligned_zeropad test


Andrew Donnellan (3):
  cxl: fix leak of IRQ names in cxl_free_afu_irqs()
  cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API
  cxl: fix leak of ctx->mapping when releasing kernel API contexts

Christophe Lombard (1):
  cxl: Fix number of allocated pages in SPA

Cyril Bur (1):
  powerpc: Fix checkstop in native_hpte_clear() with lockdep

Daniel Axtens (1):
  powerpc/powernv: Panic on unhandled Machine Check

Michael Ellerman (3):
  powerpc/configs: Re-enable CONFIG_SCSI_DH
  powerpc/ps3: Remove unused os_area_db_id_video_mode
  selftests/powerpc: Fix build failure of load_unaligned_zeropad test

Philippe Bergheaud (1):
  cxl: Workaround malformed pcie packets on some cards

 arch/powerpc/configs/ppc64_defconfig   |  2 +-
 arch/powerpc/configs/pseries_defconfig |  2 +-
 arch/powerpc/include/asm/machdep.h |  9 +--
 arch/powerpc/mm/hash_native_64.c   | 23 +-
 arch/powerpc/platforms/powernv/opal.c  |  7 --
 arch/powerpc/platforms/ps3/os-area.c   |  5 
 drivers/misc/cxl/api.c |  1 +
 drivers/misc/cxl/context.c |  3 +++
 drivers/misc/cxl/cxl.h |  1 +
 drivers/misc/cxl/file.c| 13 +++---
 drivers/misc/cxl/irq.c |  4 +---
 drivers/misc/cxl/native.c  |  2 +-
 drivers/misc/cxl/pci.c | 28 ++
 .../powerpc/primitives/load_unaligned_zeropad.c|  9 +++
 14 files changed, 79 insertions(+), 30 deletions(-)



Re: [PATCH v2] powerpc/mpc5xxx: Avoid dereferencing potentially freed memory

2015-10-16 Thread Michael Ellerman
On Fri, 2015-10-16 at 08:20 +0200, Christophe JAILLET wrote:
> Le 15/10/2015 08:36, Michael Ellerman a écrit :
> > On Thu, 2015-10-15 at 07:56 +0200, Christophe JAILLET wrote:
> > > Use 'of_property_read_u32()' instead of 'of_get_property()'+pointer
> > > dereference in order to avoid access to potentially freed memory.
> > > 
> > > Use 'of_get_next_parent()' to simplify the while() loop and avoid the
> > > need of a temp variable.
> > > Signed-off-by: Christophe JAILLET 
> > > ---
> > > v2: Use of_property_read_u32 instead of of_get_property+pointer dereference
> > > *** Untested ***
> > Thanks.
> > 
> > Can someone with an mpc5xxx test this?
> 
> Hi,
> I don't think it is an issue, but while looking at another similar
> patch, I noticed that the proposed patch adds a call to be32_to_cpup()
> (within of_property_read_u32).
> Apparently, powerPC is a BE architecture, so this call should be a no-op.
> 
> Just wanted to point it out, just in case.

Hi Christophe,

I'm not sure I follow.

The device tree is always big endian, but of_property_read_u32() does the
conversion to CPU endian for you already. That is one of the advantages of
using it.
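
To illustrate the point about endianness, here is a minimal user-space
sketch of reading a big-endian 32-bit cell in a way that works on any
host. It is not the kernel's implementation; demo_read_be32() is a
hypothetical helper that only shows the idea behind the conversion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assemble a 32-bit value from four big-endian bytes. The result is the
 * same on little- and big-endian hosts because the bytes are combined by
 * value instead of being reinterpreted through a host-order pointer.
 */
static uint32_t demo_read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

A caller would pass a pointer to the raw property bytes and use the
returned value directly, with no further byte swapping.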

cheers


Re: [PATCH] selfttest/powerpc: Add memory page migration tests

2015-10-16 Thread Anshuman Khandual
On 10/15/2015 09:16 PM, Anshuman Khandual wrote:
> This adds two tests for memory page migration. One for normal page
> migration which works for both 4K or 64K base page size kernel and
> the other one is for 16MB huge page migration which will work both
> 4K or 64K base page sized 16MB huge pages as and when we support
> huge page migration.
> 
> Signed-off-by: Anshuman Khandual 
> ---
> - Works for normal page migration on both 64K and 4K base pages
> - Works for 16MB huge page migration (64K) on Aneesh's V2 PTE changes
> 
> +
> +int test_migration(unsigned long length)
> +{
> + unsigned long failed;
> + void *addr;
> + int ret;
> +
> + addr = mmap(MMAP_ADDR, length, MMAP_PROT, MMAP_FLAGS, -1, 0);
> + if (addr == MAP_FAILED) {
> + perror("mmap() failed");
> + exit(-1);
> + }

Will add an mlock() call here as well. Sometimes soft offline
fails while trying to move a huge chunk of memory on a system
because of swapping. With an mlock() in there the problem gets
resolved.
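
A rough sketch of what mmap() followed by mlock() could look like, as a
standalone helper. This is an assumption about the shape of the change,
not the actual test code: demo_map_and_lock() is a hypothetical name, and
the protection and mapping flags below stand in for the test's MMAP_PROT
and MMAP_FLAGS, whose definitions are not shown here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map an anonymous private region and try to pin it in RAM.
 * Returns NULL if the mapping fails. *locked reports whether mlock()
 * succeeded; it can fail, for example when RLIMIT_MEMLOCK is too small.
 */
static void *demo_map_and_lock(size_t length, int *locked)
{
	void *addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	*locked = 0;
	if (addr == MAP_FAILED)
		return NULL;
	if (mlock(addr, length) == 0)
		*locked = 1;
	return addr;
}
```

Reporting the mlock() result separately lets a test decide whether an
unpinned mapping should be treated as a failure or only as a warning.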


[PATCH] rtas: Validate rtas entry before calling enter_rtas

2015-10-16 Thread Vasant Hegde
Currently we do not validate rtas entry before calling enter_rtas(). This
is resulting in a kernel oops (see below) when user space calls rtas system
call on PowerNV platform. We hit below oops when we ran trinity (system call
fuzzer) on PowerNV. This patch adds code to validate rtas entry before making
enter_rtas() call.
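
For readers outside the kernel tree, the guard described above can be
sketched in plain user-space C. This is only an illustration under
assumptions: struct demo_rtas, demo_call_rtas() and demo_firmware_entry()
are hypothetical names and do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a firmware descriptor: entry stays NULL on
 * platforms where no entry point was ever registered.
 */
struct demo_rtas {
	int (*entry)(int token);
};

/* Hypothetical firmware routine, used only for the demonstration. */
static int demo_firmware_entry(int token)
{
	return token + 1;
}

/*
 * Refuse the call when no entry point was registered instead of
 * jumping through a NULL pointer.
 */
static int demo_call_rtas(const struct demo_rtas *r, int token)
{
	if (!r->entry)
		return -EINVAL;
	return r->entry(token);
}
```

With an uninitialized descriptor the caller gets -EINVAL back, which
mirrors the error code the patch returns.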

dmesg:
-
[22061.541428] Oops: Exception in kernel mode, sig: 4 [#1]
[22061.541446] SMP NR_CPUS=1024 NUMA PowerNV
[22061.541453] Modules linked in: rfcomm bnep nfnetlink scsi_transport_iscsi 
hidp nfc af_802154 ieee802154 bluetooth rfkill pppoe pppox ppp_generic slhc 
irda crc_ccitt af_key sctp libcrc32c atm appletalk ipx p8023 psnap p8022 
ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT 
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ebtable_nat 
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle 
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle 
iptable_security iptable_raw windfarm_smu_sat ses enclosure windfarm_pid shpchp 
i2c_opal i2c_core kvm_hv kvm_pr dm_multipath kvm lpfc tg3 ptp pps_core 
scsi_transport_fc
[22061.541561] CPU: 40 PID: 57748 Comm: trinity-c11 Not tainted 
3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le #1
[22061.541566] task: c4294b80 ti: c007e1a78000 task.ti: 
c007e1a78000
[22061.541570] NIP:  LR: 9c14 CTR: c0423140
[22061.541573] REGS: c007e1a7b920 TRAP: 0e40   Not tainted  
(3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
[22061.541577] MSR: 10081000   CR:   XER: 

[22061.541585] CFAR: c0009c0c SOFTE: 0
GPR00: 90001031 c007e1a7bba0 c12b1d00 
01338840
GPR04:   10001000 
90001033
GPR08: 4000 80002933 3fff9e9d0068 

GPR12: 00ff c7db7c00  

GPR16:    

GPR20:    

GPR24:  dc58 0001 
c01ee716e000
GPR28:  c1338840 3fff9db3 

[22061.541629] NIP []   (null)
[22061.541637] LR [9c14] 0x9c14
[22061.541640] Call Trace:
[22061.541649] [c007e1a7bba0] [c041a7f4] 
avc_has_perm_noaudit+0x54/0x110 (unreliable)
[22061.541657] [c007e1a7bd80] [c002ddc0] 
ppc_rtas+0x150/0x2d0
[22061.541662] [c007e1a7be30] [c0009358] 
syscall_exit+0x0/0x98
[22061.541666] Instruction dump:
[22061.541669]       
 
[22061.541675]     6000 6000 
6000 6000
[22061.541688] ---[ end trace 6f9bf0b3d32096aa ]---

Reported-by: NAGESWARA R. SASTRY 
Signed-off-by: Vasant Hegde 
---
 arch/powerpc/kernel/rtas.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 84bf934..5a753fa 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1043,6 +1043,9 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (!rtas.entry)
+		return -EINVAL;
+
 	if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0)
 		return -EFAULT;
 


[PATCH 3/3] powerpc/mm: Add page soft dirty tracking

2015-10-16 Thread Laurent Dufour
The user space checkpoint and restart tool (CRIU) needs page changes
to be soft tracked. This allows doing a pre-checkpoint and then dumping
only the touched pages.

This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when
the page is swapped out.

The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type
in the swap pte. A check is added to ensure that the bit is not
overwritten by _PAGE_HPTEFLAGS.
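
As a rough user-space illustration of the bit handling described above,
the sketch below models a PTE as an unsigned long. All DEMO_* names and
bit positions are hypothetical and chosen for the demo only; they are
not the values used by the patch.

```c
#include <assert.h>

typedef unsigned long demo_pte_t;

/* Hypothetical bit layout, for illustration only. */
#define DEMO_PAGE_SOFT_DIRTY     (1UL << 13)
#define DEMO_SWP_TYPE_SHIFT      2
#define DEMO_SWP_TYPE_BITS       5
#define DEMO_SWP_TYPE_MASK       (0x1fUL << DEMO_SWP_TYPE_SHIFT)
/* The swap soft-dirty bit sits right after the swap type field. */
#define DEMO_PAGE_SWP_SOFT_DIRTY \
	(1UL << (DEMO_SWP_TYPE_BITS + DEMO_SWP_TYPE_SHIFT))

/* Compile-time overlap check, in the spirit of BUILD_BUG_ON(). */
_Static_assert((DEMO_PAGE_SWP_SOFT_DIRTY & DEMO_SWP_TYPE_MASK) == 0,
	       "swap soft-dirty bit overlaps the swap type field");

static int demo_pte_soft_dirty(demo_pte_t pte)
{
	return (pte & DEMO_PAGE_SOFT_DIRTY) != 0;
}

static demo_pte_t demo_pte_mksoft_dirty(demo_pte_t pte)
{
	return pte | DEMO_PAGE_SOFT_DIRTY;
}

static demo_pte_t demo_pte_clear_soft_dirty(demo_pte_t pte)
{
	return pte & ~DEMO_PAGE_SOFT_DIRTY;
}

static demo_pte_t demo_pte_swp_mksoft_dirty(demo_pte_t pte)
{
	return pte | DEMO_PAGE_SWP_SOFT_DIRTY;
}

static int demo_pte_swp_soft_dirty(demo_pte_t pte)
{
	return (pte & DEMO_PAGE_SWP_SOFT_DIRTY) != 0;
}
```

The static assertion fails the build if someone later moves the swap
soft-dirty bit into the swap type field, which is the kind of mistake
the added BUILD_BUG_ON is meant to catch.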

Signed-off-by: Laurent Dufour 
CC: Aneesh Kumar K.V 
---
 arch/powerpc/Kconfig |  2 ++
 arch/powerpc/include/asm/pgtable-ppc64.h | 13 +--
 arch/powerpc/include/asm/pgtable.h   | 40 +++-
 arch/powerpc/include/asm/pte-book3e.h|  1 +
 arch/powerpc/include/asm/pte-common.h|  5 ++--
 arch/powerpc/include/asm/pte-hash64.h|  1 +
 6 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9a7057ec2154..73a4a36a6b38 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -559,6 +559,7 @@ choice
 
 config PPC_4K_PAGES
bool "4k page size"
+   select HAVE_ARCH_SOFT_DIRTY if CHECKPOINT_RESTORE && PPC_BOOK3S
 
 config PPC_16K_PAGES
bool "16k page size"
@@ -567,6 +568,7 @@ config PPC_16K_PAGES
 config PPC_64K_PAGES
bool "64k page size"
depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
+   select HAVE_ARCH_SOFT_DIRTY if CHECKPOINT_RESTORE && PPC_BOOK3S
 
 config PPC_256K_PAGES
bool "256k page size"
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index fa1dfb7f7b48..2738bf4a8c55 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -315,7 +315,8 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
 static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 {
unsigned long bits = pte_val(entry) &
-   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
+   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC |
+_PAGE_SOFT_DIRTY);
 
 #ifdef PTE_ATOMIC_UPDATES
unsigned long old, tmp;
@@ -354,6 +355,7 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 * We filter HPTEFLAGS on set_pte.  \
 */ \
BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
+   BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);   \
} while (0)
 /*
  * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
@@ -371,6 +373,8 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
+
+#define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
 #endif /* __ASSEMBLY__ */
 
 /*
@@ -389,7 +393,7 @@ void pgtable_cache_init(void);
  */
 #define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS |  \
 _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPLITTING | \
-_PAGE_THP_HUGE)
+_PAGE_THP_HUGE | _PAGE_SOFT_DIRTY)
 
 #ifndef __ASSEMBLY__
 /*
@@ -513,6 +517,11 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mkyoung(pmd)   pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
 
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+#define pmd_soft_dirty(pmd)pte_soft_dirty(pmd_pte(pmd))
+#define pmd_mksoft_dirty(pmd)  pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)))
+#endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
+
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd) pte_write(pmd_pte(pmd))
 
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 0717693c8428..88baad3d66e2 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -38,6 +38,44 @@ static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL;
 static inline int pte_none(pte_t pte)  { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
 static inline pgprot_t pte_pgprot(pte_t pte)   { return __pgprot(pte_val(pte) & PAGE_PROT_BITS); }
 
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+static inline int pte_soft_dirty(pte_t pte)
+{
+   return pte_val(pte) & _PAGE_SOFT_DIRTY;
+}
+static inline pte_t pte_mksoft_dirty(pte_t pte)
+{
+   pte_val(pte) |= _PAGE_SOFT_DIRTY;
+   return pte;
+}
+
+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
+{
+   pte_val(pte) |= _PAGE_SWP_SOFT_DIRTY;
+   return pte;
+}
+static inline int pte_swp_soft_dirty(pte_t pte)
+{
+   return pte_val(pte) & _PAGE_SWP_SOFT_DIRTY;
+}
+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
+{
+   

[PATCH 0/3] mm/powerpc: enabling memory soft dirty tracking

2015-10-16 Thread Laurent Dufour
This series enables software memory dirty tracking in the kernel for
powerpc.  It follows up commit 0f8975ec4db2 ("mm: soft-dirty bits for
user memory changes tracking"), which introduced this feature in the
mm code.

The first patch fixes an issue in the code clearing the soft dirty
bit.  The PTEs were not cleared before being modified, leading to a
hang on ppc64.

The second patch fixes a build issue when transparent huge pages are
not enabled.

The third patch introduces soft dirty tracking in the powerpc
architecture code.

Laurent Dufour (3):
  mm: clearing pte in clear_soft_dirty()
  mm: clear_soft_dirty_pmd requires THP
  powerpc/mm: Add page soft dirty tracking

 arch/powerpc/Kconfig |  2 ++
 arch/powerpc/include/asm/pgtable-ppc64.h | 13 +--
 arch/powerpc/include/asm/pgtable.h   | 40 +++-
 arch/powerpc/include/asm/pte-book3e.h|  1 +
 arch/powerpc/include/asm/pte-common.h|  5 ++--
 arch/powerpc/include/asm/pte-hash64.h|  1 +
 fs/proc/task_mmu.c   | 21 +
 7 files changed, 68 insertions(+), 15 deletions(-)

-- 
1.9.1


[PATCH 2/3] mm: clear_soft_dirty_pmd requires THP

2015-10-16 Thread Laurent Dufour
Don't build clear_soft_dirty_pmd() if transparent huge pages are
not enabled.
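
The pattern used here, a real implementation when both options are set
and an empty stub otherwise, can be sketched outside the kernel as
follows. DEMO_SOFT_DIRTY and DEMO_THP are hypothetical stand-ins for the
two config options, and the function returns a value (unlike the
kernel's void helper) only so that the effect is easy to observe.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two kernel config options. */
#define DEMO_SOFT_DIRTY 1
/* #define DEMO_THP 1 */

#define DEMO_PMD_SOFT_DIRTY 0x4UL

#if defined(DEMO_SOFT_DIRTY) && defined(DEMO_THP)
/* Real implementation: only built when both features are enabled. */
static inline unsigned long demo_clear_soft_dirty_pmd(unsigned long pmd)
{
	return pmd & ~DEMO_PMD_SOFT_DIRTY;
}
#else
/* Stub: callers still compile, but nothing is changed. */
static inline unsigned long demo_clear_soft_dirty_pmd(unsigned long pmd)
{
	return pmd;
}
#endif
```

Because DEMO_THP is left undefined above, the stub is the variant that
gets compiled, so callers need no #ifdef of their own.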

Signed-off-by: Laurent Dufour 
CC: Aneesh Kumar K.V 
---
 fs/proc/task_mmu.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c9454ee39b28..fa847a982a9f 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -762,7 +762,14 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
set_pte_at(vma->vm_mm, addr, pte, ptent);
}
 }
+#else
+static inline void clear_soft_dirty(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *pte)
+{
+}
+#endif
 
+#if defined(CONFIG_MEM_SOFT_DIRTY) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
 {
@@ -776,14 +783,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
 
set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
 }
-
 #else
-
-static inline void clear_soft_dirty(struct vm_area_struct *vma,
-   unsigned long addr, pte_t *pte)
-{
-}
-
 static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
 {
-- 
1.9.1


Re: [PATCH] selfttest/powerpc: Add memory page migration tests

2015-10-16 Thread Aneesh Kumar K.V
Anshuman Khandual  writes:

> On 10/15/2015 09:16 PM, Anshuman Khandual wrote:
>> This adds two tests for memory page migration. One for normal page
>> migration which works for both 4K or 64K base page size kernel and
>> the other one is for 16MB huge page migration which will work both
>> 4K or 64K base page sized 16MB huge pages as and when we support
>> huge page migration.
>> 
>> Signed-off-by: Anshuman Khandual 
>> ---
>> - Works for normal page migration on both 64K and 4K base pages
>> - Works for 16MB huge page migration (64K) on Aneesh's V2 PTE changes
>> 
>> +
>> +int test_migration(unsigned long length)
>> +{
>> +unsigned long failed;
>> +void *addr;
>> +int ret;
>> +
>> +addr = mmap(MMAP_ADDR, length, MMAP_PROT, MMAP_FLAGS, -1, 0);
>> +if (addr == MAP_FAILED) {
>> +perror("mmap() failed");
>> +exit(-1);
>> +}
>
> Will add a mlock() call here as well. Some times soft offline
> fails while trying to move a huge chunk of memory on a system
> because of swapping. With a mlock() in there the problem gets
> resolved.

Is that with respect to hugetlb pages? I doubt it, because hugetlb pages
are not swapped out.

-aneesh


[PATCH 1/3] mm: clearing pte in clear_soft_dirty()

2015-10-16 Thread Laurent Dufour
As mentioned in commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa()
for updating _PAGE_NUMA bit"), architectures like ppc64 don't do a
TLB flush in the set_pte/pmd functions.

So when dealing with an existing pte in clear_soft_dirty, the pte must
be cleared before being modified.

Signed-off-by: Laurent Dufour 
CC: Aneesh Kumar K.V 
---
 fs/proc/task_mmu.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e2d46adb54b4..c9454ee39b28 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -753,19 +753,20 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
pte_t ptent = *pte;
 
if (pte_present(ptent)) {
+   ptent = ptep_modify_prot_start(vma->vm_mm, addr, pte);
ptent = pte_wrprotect(ptent);
ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
+   ptep_modify_prot_commit(vma->vm_mm, addr, pte, ptent);
} else if (is_swap_pte(ptent)) {
ptent = pte_swp_clear_soft_dirty(ptent);
+   set_pte_at(vma->vm_mm, addr, pte, ptent);
}
-
-   set_pte_at(vma->vm_mm, addr, pte, ptent);
 }
 
 static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
 {
-   pmd_t pmd = *pmdp;
+   pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp);
 
pmd = pmd_wrprotect(pmd);
pmd = pmd_clear_flags(pmd, _PAGE_SOFT_DIRTY);
-- 
1.9.1


Re: [PATCH] rtas: Validate rtas entry before calling enter_rtas

2015-10-16 Thread Vasant Hegde
On 10/16/2015 04:02 PM, Denis Kirjanov wrote:
> On 10/16/15, Vasant Hegde  wrote:
>> Currently we do not validate rtas entry before calling enter_rtas(). This
>> is resulting in a kernel oops (see below) when user space calls rtas system
>> call on PowerNV platform. We hit below oops when we ran trinity (system call
>> fuzzer) on PowerNV. This patch adds code to validate rtas entry before
>> making
>> enter_rtas() call.
> 
> Hi,
> have you figured out why we have null entry?

Denis,

Yes... On PowerNV platform we don't have RTAS.. Hence it's not initialized.

-Vasant


Re: [PATCH 1/3] mm: clearing pte in clear_soft_dirty()

2015-10-16 Thread Benjamin Herrenschmidt
On Fri, 2015-10-16 at 14:07 +0200, Laurent Dufour wrote:
> As mentioned in the commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa()
> for updating _PAGE_NUMA bit"), architecture like ppc64 doesn't do
> tlb flush in set_pte/pmd functions.
> 
> So when dealing with existing pte in clear_soft_dirty, the pte must
> be cleared before being modified.

Note that this is true of more than powerpc afaik. There is a general
rule that we don't "restrict" a PTE's access permissions without first
clearing it, due to various races.
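
A small user-space model of that "clear first, then modify, then commit"
sequence, using C11 atomics. This is only a sketch under assumptions:
the DEMO_* flags and demo_* helpers are hypothetical, and the model
covers just the ordering of the updates, not TLB or hash table
invalidation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_PRESENT     0x1UL
#define DEMO_WRITE       0x2UL
#define DEMO_SOFT_DIRTY  0x4UL

/* Atomically fetch the entry and leave it empty while it is edited. */
static unsigned long demo_modify_start(_Atomic unsigned long *ptep)
{
	return atomic_exchange(ptep, 0UL);
}

/* Publish the edited entry. */
static void demo_modify_commit(_Atomic unsigned long *ptep,
			       unsigned long val)
{
	atomic_store(ptep, val);
}

/* Write-protect a present entry and drop its soft-dirty flag. */
static void demo_clear_soft_dirty(_Atomic unsigned long *ptep)
{
	unsigned long ent = atomic_load(ptep);

	if (!(ent & DEMO_PRESENT))
		return;
	ent = demo_modify_start(ptep);
	ent &= ~(DEMO_WRITE | DEMO_SOFT_DIRTY);
	demo_modify_commit(ptep, ent);
}
```

Between the start and commit steps any other observer sees an empty
entry rather than a stale, more permissive one.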

> Signed-off-by: Laurent Dufour 
> CC: Aneesh Kumar K.V 
> ---
>  fs/proc/task_mmu.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index e2d46adb54b4..c9454ee39b28 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -753,19 +753,20 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
>   pte_t ptent = *pte;
>  
>   if (pte_present(ptent)) {
> + ptent = ptep_modify_prot_start(vma->vm_mm, addr, pte);
>   ptent = pte_wrprotect(ptent);
>   ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
> + ptep_modify_prot_commit(vma->vm_mm, addr, pte, ptent);
>   } else if (is_swap_pte(ptent)) {
>   ptent = pte_swp_clear_soft_dirty(ptent);
> + set_pte_at(vma->vm_mm, addr, pte, ptent);
>   }
> -
> - set_pte_at(vma->vm_mm, addr, pte, ptent);
>  }
>  
>  static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
>   unsigned long addr, pmd_t *pmdp)
>  {
> - pmd_t pmd = *pmdp;
> + pmd_t pmd = pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp);
>  
>   pmd = pmd_wrprotect(pmd);
>   pmd = pmd_clear_flags(pmd, _PAGE_SOFT_DIRTY);

Re: [v2] powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry

2015-10-16 Thread Scott Wood
On Tue, Aug 18, 2015 at 03:55:56PM +0800, Kevin Hao wrote:
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index e4185581c5a7..3a5b89dfb5a1 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -68,11 +68,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
>   ld  r14,PACAPGD(r13)
>   std r15,EX_TLB_R15(r12)
>   std r10,EX_TLB_CR(r12)
> +#ifdef CONFIG_PPC_FSL_BOOK3E
> +BEGIN_FTR_SECTION
> + std r7,EX_TLB_R7(r12)
> +END_FTR_SECTION_IFSET(CPU_FTR_SMT)
> +#endif
>   TLB_MISS_PROLOG_STATS
>  .endm
>  
>  .macro tlb_epilog_bolted
>   ld  r14,EX_TLB_CR(r12)
> +#ifdef CONFIG_PPC_FSL_BOOK3E
> +BEGIN_FTR_SECTION
> + ld  r7,EX_TLB_R7(r12)
> +END_FTR_SECTION_IFSET(CPU_FTR_SMT)
> +#endif

r7 is used outside the CPU_FTR_SMT section of the e6500 TLB handler.

-Scott

Re: [2/3] powerpc: dts: p1022si: Add fsl,wake-on-filer for eTSEC

2015-10-16 Thread Scott Wood
On Thu, Oct 01, 2015 at 01:10:22PM +0300, Claudiu Manoil wrote:
> Enable the "wake-on-filer" (aka. wake on user defined packet)
> wake on lan capability for the eTSEC ethernet nodes.
> 
> Cc: Li Yang 
> Cc: Zhao Chenhui 
> 
> Signed-off-by: Claudiu Manoil 
> ---
>  arch/powerpc/boot/dts/fsl/p1022si-post.dtsi | 2 ++
>  1 file changed, 2 insertions(+)

Acked-by: Scott Wood 

-Scott

Re: [V2,1/2] powerpc/fsl: Add PCI node in device tree of bsc9132qds

2015-10-16 Thread Scott Wood
On Fri, Oct 16, 2015 at 05:01:10PM +0800, Zhiqiang Hou wrote:
> From: Harninder Rai 
> 
> Signed-off-by: Harninder Rai 
> Signed-off-by: Minghuan Lian 
> Change-Id: I4355add4a92d1fcf514843aea5ecadd2e2517969
> Reviewed-on: http://git.am.freescale.net:8181/2454
> Reviewed-by: Zang Tiefei-R61911 
> Reviewed-by: Kushwaha Prabhakar-B32579 
> Reviewed-by: Fleming Andrew-AFLEMING 
> Tested-by: Fleming Andrew-AFLEMING 

Again:

Get rid of the gerrit stuff.  And where is your signoff?

-Scott

Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-16 Thread Benjamin Herrenschmidt
On Fri, 2015-10-16 at 09:57 +0200, Laurent Vivier wrote:
> For me the problem is: as these CPUs are offline, I guess the core has
> been switched to 1 thread per core, so the CPUs (1 to 7 for core 0)
> don't exist anymore, how can we return them to OPAL ?

Another option is to make the new kernel's kick_cpu fall back to sending
IPIs if it knows it's coming up as a crashdump kernel.

We would need some sane way to catch the guys coming in at 0x100 and
route them to secondary start.

Cheers,
Ben.


Re: PowerPC: Move Freescale device tree files into fsl folder

2015-10-16 Thread Scott Wood
On Fri, Sep 18, 2015 at 12:00:24PM +0800, Hongtao Jia wrote:
> It makes no sense that some Freescale device tree files are in fsl
> directory while some others not. This patch move Freescale device tree
> files into fsl folder. To do that the following two steps are made:
> - Move Freescale device tree files into fsl folder.
> - Update the include path in these files from "fsl/*.dtsi" to "*.dtsi".
> 
> Please add "fsl/" prefix when you make dtb using Makefile.
> 
> Signed-off-by: Jia Hongtao 

This broke cuImage -- I made this change when applying:

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 4eec430..99e4487 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -364,6 +364,9 @@ $(obj)/cuImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits)
 $(obj)/cuImage.%: vmlinux $(obj)/%.dtb $(wrapperbits)
$(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb)
 
+$(obj)/cuImage.%: vmlinux $(obj)/fsl/%.dtb $(wrapperbits)
+   $(call if_changed,wrap,cuboot-$*,,$(obj)/fsl/$*.dtb)
+
 $(obj)/simpleImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits)
 	$(call if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)

-Scott

Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-16 Thread Benjamin Herrenschmidt
On Fri, 2015-10-16 at 09:48 +0200, Laurent Vivier wrote:
> 
> Yes, we know :)
> 
> On the crash, as the CPUs are offline, kernel doesn't call
> opal_return_cpu(), so for OPAL all these CPU are always in the
> kernel.

Hrm and they may even be in winkle state, so basically off... waking
them up *could* be a tricky business.

I suppose we could, near the last stage of kexec, patch the 0x100
vector to send anybody coming in to a kexec wait loop, and then machine
gun the IPIs. But that will make them come in with an unclean ICP
needing an EOI, I'm not sure we handle that very well.

Ideally we could just soft-reset them but that's broken on P8.

Cheers,
Ben.


Re: [PATCH 0/3] mm/powerpc: enabling memory soft dirty tracking

2015-10-16 Thread Benjamin Herrenschmidt
On Fri, 2015-10-16 at 14:11 -0700, Andrew Morton wrote:
> I grabbed these patches, but they're more a ppc thing than a core
> kernel thing.  I can merge them into 4.3 with suitable acks or drop
> them if they turn up in the powerpc tree.  Or something else?

I'm happy for you to keep the generic ones but the powerpc one at the
end should be reviewed by Aneesh at least.

Cheers,
Ben.


Re: [PATCH] rtas: Validate rtas entry before calling enter_rtas

2015-10-16 Thread Vasant Hegde
On 10/16/2015 11:49 PM, Denis Kirjanov wrote:
> On 10/16/15, Vasant Hegde  wrote:
>> On 10/16/2015 04:02 PM, Denis Kirjanov wrote:
>>> On 10/16/15, Vasant Hegde  wrote:
>>>> Currently we do not validate rtas entry before calling enter_rtas(). This
>>>> is resulting in a kernel oops (see below) when user space calls rtas system
>>>> call on PowerNV platform. We hit below oops when we ran trinity (system call
>>>> fuzzer) on PowerNV. This patch adds code to validate rtas entry before
>>>> making enter_rtas() call.
>>>
>>> Hi,
>>> have you figured out why we have null entry?
>>
>> Denis,

Denis,

>>
>> Yes... On PowerNV platform we don't have RTAS.. Hence it's not initialized.
> But why do we have CONFIG_PPC_RTAS on OPAL machines then?


Sorry, I should have elaborated a little.

Today we use a single config to build the kernel for both PowerNV and PAPR
guests, so that the same ISO can be used in different environments (PAPR LPAR,
PowerNV host, guest). I believe most distros also follow this method. Hence we
need this validation.

-Vasant
