Re: [PATCH 19/31] powerpc/mm: Convert 4k hash insert to C

2015-09-29 Thread Aneesh Kumar K.V
Benjamin Herrenschmidt  writes:

> On Mon, 2015-09-21 at 12:10 +0530, Aneesh Kumar K.V wrote:
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/mm/Makefile        |   3 +
>>  arch/powerpc/mm/hash64_64k.c    | 204 ++++++++++++++
>>  arch/powerpc/mm/hash_low_64.S   | 380 --------------------
>>  arch/powerpc/mm/hash_utils_64.c |   4 +-
>>  4 files changed, 210 insertions(+), 381 deletions(-)
>>  create mode 100644 arch/powerpc/mm/hash64_64k.c
>
> Did you check if there was any measurable performance difference ?
>


I looked at the performance numbers with and without the patch. I don't see
much impact in the numbers. We do have a path length increase (I measured
this using systemsim).

Path length __hash_page_4k
with patch: 196
without patch: 142

Path length __hash_page_64k
with patch: 219
without patch: 154


But even with a path length increase of around 50 instructions, we don't see
the impact when running workloads. I tried the kernel build test.

With THP enabled (which is the default) we see an improvement. I haven't fully
looked at the reason. This could be due to reduced contention on the ptl lock;
__hash_thp_page is already C code.

make -j64 vmlinux modules
With fix:
---------
real    1m35.509s
user    56m8.565s
sys     4m34.973s

real    1m32.174s
user    57m2.336s
sys     4m39.142s

Without fix:
------------
real    1m37.703s
user    58m50.783s
sys     7m52.440s

real    1m37.890s
user    57m55.445s
sys     7m50.501s


THP disabled:

make -j64 vmlinux modules
With fix:
---------
real    1m37.197s
user    58m28.672s
sys     7m58.188s

real    1m44.638s
user    58m37.551s
sys     7m53.960s

Without fix:
------------
real    1m41.224s
user    58m46.944s
sys     7m49.714s

real    1m42.585s
user    59m14.019s
sys     7m52.714s


-aneesh


Re: [PATCH] perf record: Limit --intr-regs to platforms supporting PERF_REGS

2015-09-29 Thread Naveen N. Rao
On 2015/09/29 08:53AM, Jiri Olsa wrote:
> On Tue, Sep 29, 2015 at 11:06:17AM +0530, Naveen N. Rao wrote:
> > On 2015/09/24 10:15PM, Naveen N Rao wrote:
> > > On 2015/09/24 08:32AM, Stephane Eranian wrote:
> > > > On Thu, Sep 24, 2015 at 5:57 AM, Jiri Olsa  wrote:
> > > > >
> > > > > On Thu, Sep 24, 2015 at 05:41:58PM +0530, Naveen N. Rao wrote:
> > > > > > perf build currently fails on powerpc:
> > > > > >
> > > > > >   LINK perf
> > > > > > libperf.a(libperf-in.o):(.toc+0x120): undefined reference to
> > > > > > `sample_reg_masks'
> > > > > > libperf.a(libperf-in.o):(.toc+0x130): undefined reference to
> > > > > > `sample_reg_masks'
> > > > > > collect2: error: ld returned 1 exit status
> > > > > > make[1]: *** [perf] Error 1
> > > > > > make: *** [all] Error 2
> > > > > >
> > > > > > This is due to parse-regs-options.c using sample_reg_masks, which is
> > > > > > defined only with CONFIG_PERF_REGS.
> > > > > >
> > > > > > In addition, perf record -I is only useful if the arch supports
> > > > > > PERF_REGS. Hence, let's expose -I conditionally.
> > > > > >
> > > > > > Signed-off-by: Naveen N. Rao 
> > > > >
> > > > > hum, I wonder why we have sample_reg_masks defined as weak in 
> > > > > util/perf_regs.c
> > > > > which is also built only via CONFIG_PERF_REGS
> > > > >
> > > > > I wonder if we could get rid of the weak definition via the attached patch, 
> > > > > Stephane?
> > > > >
> > > > But the whole point of having it weak is to avoid this error scenario
> > > > on any arch without support
> > > > and avoid ugly #ifdef HAVE_ in generic files.
> > > > 
> > > > if perf_regs.c is compiled on PPC, then why do we get the undefined?
> > > 
> > > As Jiri Olsa pointed out, powerpc and many other architectures don't 
> > > (yet) have support for perf regs.
> > > 
> > > But, the larger reason to introduce #ifdef is so the user doesn't see 
> > > options (s)he can't use on a specific architecture, along the same lines 
> > > as builtin-probe.c
> > 
> > Stephane, Arnaldo,
> > Suka has also posted a fix for this with a different approach [1]. Can 
> > you please ack/pull one of these versions? Building perf is broken on 
> > v4.3-rc due to this.
> 
> I did not get any answer for additional comments I made to the patch
> (couldn't get marc.info working, sending the patch again)

Hi Jiri,
I concur with the changes you proposed to my patch here (getting rid of 
the weak variant):
http://article.gmane.org/gmane.linux.kernel/2046108
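
The weak-symbol pattern under discussion works roughly like this minimal
standalone sketch (hypothetical names; the real perf code uses struct
sample_reg and SMPL_REG_END, not strings):

#include <stdio.h>

/* Generic file: a __weak default that the linker silently replaces whenever
 * some arch file provides a strong definition of the same symbol. The catch
 * is that the weak default only exists if this object is compiled at all,
 * which is why one proposed fix builds perf_regs.o unconditionally. */
__attribute__((weak)) const char *sample_regs[] = { NULL };

/* An arch file would override it with a strong definition, e.g.:
 *	const char *sample_regs[] = { "AX", "BX", NULL };
 */

int main(void)
{
	for (const char **r = sample_regs; *r; r++)
		printf("%s\n", *r);
	return 0;
}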

I am aware of the other approach you posted (and the one attached 
below). When I said "please ack/pull one of these versions", I meant one 
of: your version, Suka's and mine.

> 
> > 
> > [1] http://article.gmane.org/gmane.linux.kernel/2046370
> 
> I don't have this last version, which seems to have other changes,
> and the patch in the above link looks mangled; could you please repost it?

Can you please check the raw version:
http://article.gmane.org/gmane.linux.kernel/2046370/raw


Thanks,
Naveen


Re: [PATCH] perf record: Limit --intr-regs to platforms supporting PERF_REGS

2015-09-29 Thread Jiri Olsa
On Tue, Sep 29, 2015 at 11:06:17AM +0530, Naveen N. Rao wrote:
> On 2015/09/24 10:15PM, Naveen N Rao wrote:
> > On 2015/09/24 08:32AM, Stephane Eranian wrote:
> > > On Thu, Sep 24, 2015 at 5:57 AM, Jiri Olsa  wrote:
> > > >
> > > > On Thu, Sep 24, 2015 at 05:41:58PM +0530, Naveen N. Rao wrote:
> > > > > perf build currently fails on powerpc:
> > > > >
> > > > >   LINK perf
> > > > > libperf.a(libperf-in.o):(.toc+0x120): undefined reference to
> > > > > `sample_reg_masks'
> > > > > libperf.a(libperf-in.o):(.toc+0x130): undefined reference to
> > > > > `sample_reg_masks'
> > > > > collect2: error: ld returned 1 exit status
> > > > > make[1]: *** [perf] Error 1
> > > > > make: *** [all] Error 2
> > > > >
> > > > > This is due to parse-regs-options.c using sample_reg_masks, which is
> > > > > defined only with CONFIG_PERF_REGS.
> > > > >
> > > > > In addition, perf record -I is only useful if the arch supports
> > > > > PERF_REGS. Hence, let's expose -I conditionally.
> > > > >
> > > > > Signed-off-by: Naveen N. Rao 
> > > >
> > > > hum, I wonder why we have sample_reg_masks defined as weak in 
> > > > util/perf_regs.c
> > > > which is also built only via CONFIG_PERF_REGS
> > > >
> > > > I wonder if we could get rid of the weak definition via the attached patch, 
> > > > Stephane?
> > > >
> > > But the whole point of having it weak is to avoid this error scenario
> > > on any arch without support
> > > and avoid ugly #ifdef HAVE_ in generic files.
> > > 
> > > if perf_regs.c is compiled on PPC, then why do we get the undefined?
> > 
> > As Jiri Olsa pointed out, powerpc and many other architectures don't 
> > (yet) have support for perf regs.
> > 
> > But, the larger reason to introduce #ifdef is so the user doesn't see 
> > options (s)he can't use on a specific architecture, along the same lines 
> > as builtin-probe.c
> 
> Stephane, Arnaldo,
> Suka has also posted a fix for this with a different approach [1]. Can 
> you please ack/pull one of these versions? Building perf is broken on 
> v4.3-rc due to this.

I did not get any answer for additional comments I made to the patch
(couldn't get marc.info working, sending the patch again)

> 
> [1] http://article.gmane.org/gmane.linux.kernel/2046370

I don't have this last version, which seems to have other changes,
and the patch in the above link looks mangled; could you please repost it?

thanks,
jirka


---
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 142eeb341b29..19c8fd22fbe3 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1082,9 +1082,11 @@ struct option __record_options[] = {
"sample transaction flags (special events only)"),
OPT_BOOLEAN(0, "per-thread", _thread,
"use per-thread mmaps"),
+#ifdef HAVE_PERF_REGS_SUPPORT
OPT_CALLBACK_OPTARG('I', "intr-regs", _intr_regs, 
NULL, "any register",
"sample selected machine registers on interrupt,"
" use -I ? to list register names", parse_regs),
+#endif
OPT_BOOLEAN(0, "running-time", _time,
"Record running/enabled time of read (:S) events"),
OPT_CALLBACK('k', "clockid", ,
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 4bc7a9ab45b1..93c6371405a3 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -104,7 +104,7 @@ libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
 
 libperf-y += scripting-engines/
 
-libperf-$(CONFIG_PERF_REGS) += perf_regs.o
+libperf-y += perf_regs.o
 libperf-$(CONFIG_ZLIB) += zlib.o
 libperf-$(CONFIG_LZMA) += lzma.o
 

Re: [PATCH v3 2/3] powerpc/512x: add a device tree binding for LocalPlus Bus FIFO

2015-09-29 Thread Alexander Popov
On 28.09.2015 16:26, Timur Tabi wrote:
> Alexander Popov wrote:
>> I've just followed devicetree/bindings/dma/dma.txt...
>> This "rx-tx" doesn't mean much but it could show that LocalPlus Bus FIFO
>> uses a single DMA read-write channel. Should I really drop it?
> 
> Hmmm, I'm not sure.  Is there anything else (besides your driver) that
> parses this device tree node?

No, mpc512x_lpbfifo.c is the only piece of code which is going to use this
device tree node.

> dma.txt says this:
> 
> "The specific strings that can be used are defined in the binding of the
> DMA client device."
> 
> So this looks like it's driver-specific,

Yes.
MPC512x LocalPlus Bus FIFO uses channel #26 of the DMA controller
both for reading and writing, and other DMA clients use other specific
DMA channels. This channel assignment is fixed in hardware and described
in the Reference Manual.
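
For illustration, a binding along these lines would look something like the
following node (a hypothetical example: the unit address, reg value and DMA
controller label are made up here, not taken from an actual dts):

lpbfifo: lpbfifo@10000f000 {
	compatible = "fsl,mpc512x-lpbfifo";
	reg = <0x10000f000 0x10>;
	dmas = <&dma0 26>;	/* the fixed channel #26, both directions */
	dma-names = "rx-tx";
};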

> but it is a required property.
> I guess you should keep it, but I think you should get a second opinion.

Ok, thanks.

Best regards,
Alexander

Re: [PATCH v3 1/3] powerpc/512x: add LocalPlus Bus FIFO device driver

2015-09-29 Thread Alexander Popov
On 28.09.2015 16:18, Timur Tabi wrote:
> Alexander Popov wrote:
>> The only question I have: why calling dma_unmap_single() from within
>> a spinlock is a bad practice?
> 
> I don't know, but usually functions that allocate or free memory cannot be
> called from within a spinlock.  You need to check that.  Since the MPC5121
> is a single-core CPU, you might not notice if you're doing something wrong.

I've double-checked the code and LDD and don't see any reason to avoid
calling dma_unmap_single() from interrupt context or within a spinlock.
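
The pattern in question, as a hypothetical sketch (not the driver source):
dma_unmap_single() on a streaming mapping neither sleeps nor allocates
memory (unlike dma_alloc_coherent() with GFP_KERNEL), so the DMA API
permits it in atomic context:

#include <linux/dma-mapping.h>
#include <linux/spinlock.h>

/* Hypothetical helper: unmap a streaming DMA buffer while holding a
 * spinlock, e.g. from an interrupt handler. */
static void fifo_dma_done(struct device *dev, spinlock_t *lock,
			  dma_addr_t addr, size_t len)
{
	unsigned long flags;

	spin_lock_irqsave(lock, flags);
	dma_unmap_single(dev, addr, len, DMA_FROM_DEVICE);
	spin_unlock_irqrestore(lock, flags);
}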

Please correct me if I'm wrong.
Thanks.

Best regards,
Alexander

[PATCH] powerpc/configs: Re-enable CONFIG_SCSI_DH

2015-09-29 Thread Michael Ellerman
Commit 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
changed CONFIG_SCSI_DH from tristate to bool.

Our defconfigs have CONFIG_SCSI_DH=m, which the kconfig machinery warns
us is invalid, but instead of converting it to =y it leaves it unset.
This means we lose the CONFIG_SCSI_DH code and everything that depends
on it.

So convert the values in the defconfigs to =y.

Fixes: 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/ppc64_defconfig   | 2 +-
 arch/powerpc/configs/pseries_defconfig | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index 6bc0ee4b1070..2c041b535a64 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -111,7 +111,7 @@ CONFIG_SCSI_QLA_FC=m
 CONFIG_SCSI_QLA_ISCSI=m
 CONFIG_SCSI_LPFC=m
 CONFIG_SCSI_VIRTIO=m
-CONFIG_SCSI_DH=m
+CONFIG_SCSI_DH=y
 CONFIG_SCSI_DH_RDAC=m
 CONFIG_SCSI_DH_ALUA=m
 CONFIG_ATA=y
diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 7991f37e5fe2..36871a4bfa54 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -114,7 +114,7 @@ CONFIG_SCSI_QLA_FC=m
 CONFIG_SCSI_QLA_ISCSI=m
 CONFIG_SCSI_LPFC=m
 CONFIG_SCSI_VIRTIO=m
-CONFIG_SCSI_DH=m
+CONFIG_SCSI_DH=y
 CONFIG_SCSI_DH_RDAC=m
 CONFIG_SCSI_DH_ALUA=m
 CONFIG_ATA=y
-- 
2.1.4


Re: [PATCH] perf record: Limit --intr-regs to platforms supporting PERF_REGS

2015-09-29 Thread Jiri Olsa
On Tue, Sep 29, 2015 at 01:30:10PM +0530, Naveen N. Rao wrote:

SNIP

> > > Suka has also posted a fix for this with a different approach [1]. Can 
> > > you please ack/pull one of these versions? Building perf is broken on 
> > > v4.3-rc due to this.
> > 
> > I did not get any answer for additional comments I made to the patch
> > (couldnt get marc.info working, sending the patch again)
> 
> Hi Jiri,
> I concur with the changes you proposed to my patch here (getting rid of 
> the weak variant):
> http://article.gmane.org/gmane.linux.kernel/2046108
> 
> I am aware of the other approach you posted (and the one attached 
> below). When I said "please ack/pull one of these versions", I meant one 
> of: your version, Suka's and mine.

I was hoping somebody could test it on ppc ;-)

I think the last version (in my last email) that keeps the weak
variable is correct; let's wait for Arnaldo to sort this out

> 
> > 
> > > 
> > > [1] http://article.gmane.org/gmane.linux.kernel/2046370
> > 
> > I dont have this last version, which seems to have other changes
> > and patch in above link looks mangled, could you please repost it?
> 
> Can you please check the raw version:
> http://article.gmane.org/gmane.linux.kernel/2046370/raw

we have a __maybe_unused definition in tools/include/linux/compiler.h,
why redeclare it?

jirka

Re: [PATCH v4 30/32] cxlflash: Fix to avoid corrupting adapter fops

2015-09-29 Thread Daniel Axtens
"Matthew R. Ochs"  writes:

> The corruption that this fix remedies is due to the fact that the fops
> is initially defaulted to values found within a static structure. When
> the fops is handed down to the CXL services later in the attach path,
> certain services are patched. The fops structure remains correct until
> the user count drops to 0 and the fops is reset, triggering the process
> to repeat again. The user counts are tightly coupled with the creation
> and deletion of the user context. If multiple users perform a disk
> attach at the same time, when the user count is currently 0, some users
> can be in the middle of obtaining a file descriptor and have not yet
> reached the context creation code that [in addition to creating the
> context] increments the user count. Subsequent users coming in to
> perform the attach see that the user count is still 0, and reinitialize
> the fops, temporarily removing the patched fops. The users that are in
> the middle of obtaining their file descriptor may then receive an invalid
> descriptor.
>
> The fix simply removes the user count altogether and moves the fops
> initialization to probe time such that it is only performed one time
> for the life of the adapter. In the future, if the CXL services adopt
> a private member for their context, that could be used to store the
> adapter structure reference and cxlflash could revert to a model that
> does not require an embedded fops.

Yep, this looks good.

We have discussed adding a private data field to a cxl context, and will
no doubt revisit the question at some point in the future :)

Reviewed-by: Daniel Axtens 

>
> Signed-off-by: Matthew R. Ochs 
> Signed-off-by: Manoj N. Kumar 
> ---
>  drivers/scsi/cxlflash/common.h|  3 +--
>  drivers/scsi/cxlflash/main.c  |  1 +
>  drivers/scsi/cxlflash/superpipe.c | 11 +--
>  3 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
> index bbfe711..c11cd19 100644
> --- a/drivers/scsi/cxlflash/common.h
> +++ b/drivers/scsi/cxlflash/common.h
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  
> +extern const struct file_operations cxlflash_cxl_fops;
>  
>  #define MAX_CONTEXT  CXLFLASH_MAX_CONTEXT   /* num contexts per afu */
>  
> @@ -115,8 +116,6 @@ struct cxlflash_cfg {
>   struct list_head ctx_err_recovery; /* contexts w/ recovery pending */
>   struct file_operations cxl_fops;
>  
> - atomic_t num_user_contexts;
> -
>   /* Parameters that are LUN table related */
>   int last_lun_index[CXLFLASH_NUM_FC_PORTS];
>   int promote_lun_index;
> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
> index be78906..38e7edc 100644
> --- a/drivers/scsi/cxlflash/main.c
> +++ b/drivers/scsi/cxlflash/main.c
> @@ -2386,6 +2386,7 @@ static int cxlflash_probe(struct pci_dev *pdev,
>  
>   cfg->init_state = INIT_STATE_NONE;
>   cfg->dev = pdev;
> + cfg->cxl_fops = cxlflash_cxl_fops;
>  
>   /*
>* The promoted LUNs move to the top of the LUN table. The rest stay
> diff --git a/drivers/scsi/cxlflash/superpipe.c 
> b/drivers/scsi/cxlflash/superpipe.c
> index 3cc8609..f625e07 100644
> --- a/drivers/scsi/cxlflash/superpipe.c
> +++ b/drivers/scsi/cxlflash/superpipe.c
> @@ -712,7 +712,6 @@ static void destroy_context(struct cxlflash_cfg *cfg,
>   kfree(ctxi->rht_needs_ws);
>   kfree(ctxi->rht_lun);
>   kfree(ctxi);
> - atomic_dec_if_positive(&cfg->num_user_contexts);
>  }
>  
>  /**
> @@ -769,7 +768,6 @@ static struct ctx_info *create_context(struct 
> cxlflash_cfg *cfg,
>   INIT_LIST_HEAD(&ctxi->luns);
>   INIT_LIST_HEAD(&ctxi->list); /* initialize for list_empty() */
>  
> - atomic_inc(&cfg->num_user_contexts);
>   mutex_lock(&ctxi->mutex);
>  out:
>   return ctxi;
> @@ -1164,10 +1162,7 @@ out:
>   return rc;
>  }
>  
> -/*
> - * Local fops for adapter file descriptor
> - */
> -static const struct file_operations cxlflash_cxl_fops = {
> +const struct file_operations cxlflash_cxl_fops = {
>   .owner = THIS_MODULE,
>   .mmap = cxlflash_cxl_mmap,
>   .release = cxlflash_cxl_release,
> @@ -1286,10 +1281,6 @@ static int cxlflash_disk_attach(struct scsi_device 
> *sdev,
>  
>   int fd = -1;
>  
> - /* On first attach set fileops */
> - if (atomic_read(&cfg->num_user_contexts) == 0)
> - cfg->cxl_fops = cxlflash_cxl_fops;
> -
>   if (attach->num_interrupts > 4) {
>   dev_dbg(dev, "%s: Cannot support this many interrupts %llu\n",
>   __func__, attach->num_interrupts);
> -- 
> 2.1.0



Re: [PATCH v4 32/32] cxlflash: Fix to avoid potential deadlock on EEH

2015-09-29 Thread Daniel Axtens
"Matthew R. Ochs"  writes:

> Ioctl threads that use scsi_execute() can run for an excessive amount
> of time due to the fact that they have lengthy timeouts and retry logic
> built in. Under normal operation this is not an issue. However, once EEH
> enters the picture, a long execution time coupled with the possibility
> that a timeout can trigger entry to the driver via registered reset
> callbacks becomes a liability.
>
> In particular, a deadlock can occur when an EEH event is encountered
> while in running in scsi_execute(). As part of the recovery, the EEH
> handler drains all currently running ioctls, waiting until they have
> completed before proceeding with a reset. As the scsi_execute()'s are
> situated on the ioctl path, the EEH handler will wait until they (and
> the remainder of the ioctl handler they're associated with) have
> completed. Normally this would not be much of an issue aside from the
> longer recovery period. Unfortunately, the scsi_execute() triggers a
> reset when it times out. The reset handler will see that the device is
> already being reset and wait until that reset completed. This creates
> a condition where the EEH handler becomes stuck, infinitely waiting for
> the ioctl thread to complete.
>
> To avoid this behavior, temporarily unmark the scsi_execute() threads
> as an ioctl thread by releasing the ioctl read semaphore. This allows
> the EEH handler to proceed with a recovery while the thread is still
> running. Once the scsi_execute() returns, the ioctl read semaphore is
> reacquired and the adapter state is rechecked in case it changed while
> inside of scsi_execute(). The state check will wait if the adapter is
> still being recovered or returns a failure if the recovery failed. In
> the event that the adapter reset failed, the failure is simply returned
> as the ioctl would be unable to continue.

Yep, looks good.

Reviewed-by: Daniel Axtens 

>
> Reported-by: Brian King 
> Signed-off-by: Matthew R. Ochs 
> Signed-off-by: Manoj N. Kumar 
> ---
>  drivers/scsi/cxlflash/superpipe.c | 30 +-
>  drivers/scsi/cxlflash/superpipe.h |  2 ++
>  drivers/scsi/cxlflash/vlun.c  | 29 +
>  3 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/cxlflash/superpipe.c 
> b/drivers/scsi/cxlflash/superpipe.c
> index f625e07..8af7cdc 100644
> --- a/drivers/scsi/cxlflash/superpipe.c
> +++ b/drivers/scsi/cxlflash/superpipe.c
> @@ -283,6 +283,24 @@ out:
>   * @sdev:SCSI device associated with LUN.
>   * @lli: LUN destined for capacity request.
>   *
> + * The READ_CAP16 can take quite a while to complete. Should an EEH occur 
> while
> + * in scsi_execute(), the EEH handler will attempt to recover. As part of the
> + * recovery, the handler drains all currently running ioctls, waiting until 
> they
> + * have completed before proceeding with a reset. As this routine is used on 
> the
> + * ioctl path, this can create a condition where the EEH handler becomes 
> stuck,
> + * infinitely waiting for this ioctl thread. To avoid this behavior, 
> temporarily
> + * unmark this thread as an ioctl thread by releasing the ioctl read 
> semaphore.
> + * This will allow the EEH handler to proceed with a recovery while this 
> thread
> + * is still running. Once the scsi_execute() returns, reacquire the ioctl 
> read
> + * semaphore and check the adapter state in case it changed while inside of
> + * scsi_execute(). The state check will wait if the adapter is still being
> + * recovered or return a failure if the recovery failed. In the event that 
> the
> + * adapter reset failed, simply return the failure as the ioctl would be 
> unable
> + * to continue.
> + *
> + * Note that the above puts a requirement on this routine to only be called 
> on
> + * an ioctl thread.
> + *
>   * Return: 0 on success, -errno on failure
>   */
>  static int read_cap16(struct scsi_device *sdev, struct llun_info *lli)
> @@ -314,8 +332,18 @@ retry:
>   dev_dbg(dev, "%s: %ssending cmd(0x%x)\n", __func__,
>   retry_cnt ? "re" : "", scsi_cmd[0]);
>  
> + /* Drop the ioctl read semaphore across lengthy call */
> + up_read(&cfg->ioctl_rwsem);
> + result = scsi_execute(sdev, scsi_cmd, DMA_FROM_DEVICE, cmd_buf,
> CMD_BUFSIZE, sense_buf, to, CMD_RETRIES, 0, NULL);
> + down_read(&cfg->ioctl_rwsem);
> + rc = check_state(cfg);
> + if (rc) {
> + dev_err(dev, "%s: Failed state! result=0x08%X\n",
> + __func__, result);
> + rc = -ENODEV;
> + goto out;
> + }
>  
>   if (driver_byte(result) == DRIVER_SENSE) {
>   result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */
> @@ -1221,7 +1249,7 @@ static const struct file_operations null_fops = {
>   *
>   * Return: 0 on success, -errno on failure
> 

Re: [PATCH v4 29/32] cxlflash: Fix to double the delay each time

2015-09-29 Thread Matthew R. Ochs
> On Sep 28, 2015, at 8:40 PM, Daniel Axtens  wrote:
> 
> 
> "Matthew R. Ochs"  writes:
> 
>> From: Manoj Kumar 
>> 
>> The operator used to double the delay is incorrect and
>> does not result in delay doubling.
>> 
>> To fix, use a left shift instead of the XOR operator.
>> 
> I can see that the patch is correct, but this commit message is a bit
> confusing. What delay? In what circumstances are you doubling it? Why?

This is the response delay while resetting the master context. The reset
is performed by writing a bit and then waiting for it to clear. While waiting
for it to clear, the code relaxes the delta between MMIO reads.
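
For illustration (a generic standalone sketch, since the offending operator
isn't quoted in this thread): XOR merely toggles a bit, while a left shift
actually doubles.

#include <stdio.h>

int main(void)
{
	unsigned int xor_delay = 1, shl_delay = 1;

	for (int i = 0; i < 4; i++) {
		xor_delay ^= 2;   /* toggles bit 1: 3, 1, 3, 1, ... never doubles */
		shl_delay <<= 1;  /* doubles every pass: 2, 4, 8, 16 */
		printf("xor=%u shift=%u\n", xor_delay, shl_delay);
	}
	return 0;
}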




Re: [PATCH v4 12/32] cxlflash: Fix to avoid spamming the kernel log

2015-09-29 Thread Matthew R. Ochs
> On Sep 29, 2015, at 12:05 AM, Andrew Donnellan  
> wrote:
> On 26/09/15 09:15, Matthew R. Ochs wrote:
>> During run-time the driver can be very chatty and spam the system
>> kernel log. Various print statements can be limited and/or moved
>> to development-only mode. Additionally, numerous prints can be
>> converted to trace the corresponding device.
>> 
>> The following changes were made:
>>  - pr_debug to pr_devel
>>  - pr_debug to pr_debug_ratelimited
>>  - pr_err to dev_err
>>  - pr_debug to dev_dbg
>> 
>> Signed-off-by: Matthew R. Ochs 
>> Signed-off-by: Manoj N. Kumar 
>> Reviewed-by: Brian King 
> 
> Reviewed-by: Andrew Donnellan 
> 
> Changes mostly look fine, further comments below.
> 
>> --- a/drivers/scsi/cxlflash/main.c
>> +++ b/drivers/scsi/cxlflash/main.c
>> @@ -58,8 +58,8 @@ static struct afu_cmd *cmd_checkout(struct afu *afu)
>>  cmd = >cmd[k];
>> 
>>  if (!atomic_dec_if_positive(>free)) {
>> -pr_debug("%s: returning found index=%d\n",
>> - __func__, cmd->slot);
>> +pr_devel("%s: returning found index=%d cmd=%p\n",
>> + __func__, cmd->slot, cmd);
> 
>>  pr_debug("%s: cmd failed afu_rc=%d scsi_rc=%d fc_rc=%d "
>> - "afu_extra=0x%X, scsi_entra=0x%X, fc_extra=0x%X\n",
>> + "afu_extra=0x%X, scsi_extra=0x%X, fc_extra=0x%X\n",
>>   __func__, ioasa->rc.afu_rc, ioasa->rc.scsi_rc,
>>   ioasa->rc.fc_rc, ioasa->afu_extra, ioasa->scsi_extra,
>>   ioasa->fc_extra);
> 
> Minor nitpicking: mention that you fix these in the commit message.

Noted for the future.

>> @@ -240,9 +240,9 @@ static void cmd_complete(struct afu_cmd *cmd)
>>  cmd_is_tmf = cmd->cmd_tmf;
>>  cmd_checkin(cmd); /* Don't use cmd after here */
>> 
>> -pr_debug("%s: calling scsi_set_resid, scp=%p "
>> - "result=%X resid=%d\n", __func__,
>> - scp, scp->result, resid);
>> +pr_debug_ratelimited("%s: calling scsi_done scp=%p result=%X "
>> + "ioasc=%d\n", __func__, scp, scp->result,
>> + cmd->sa.ioasc);
>> 
>>  scsi_set_resid(scp, resid);
>>  scsi_dma_unmap(scp);
> 
> Why has the message changed from scsi_set_resid to scsi_done, and should the 
> message be moved to immediately before the scsi_done call?

In a later patch in the series the scsi_set_resid() is actually moved.


Re: [PATCH] perf record: Limit --intr-regs to platforms supporting PERF_REGS

2015-09-29 Thread Jiri Olsa
On Tue, Sep 29, 2015 at 11:10:02AM -0700, Sukadev Bhattiprolu wrote:

SNIP

>  
> diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
> index 885e8ac..6b8eb13 100644
> --- a/tools/perf/util/perf_regs.c
> +++ b/tools/perf/util/perf_regs.c
> @@ -6,6 +6,7 @@ const struct sample_reg __weak sample_reg_masks[] = {
>   SMPL_REG_END
>  };
>  
> +#ifdef HAVE_PERF_REGS_SUPPORT
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
>  {
>   int i, idx = 0;
> @@ -29,3 +30,4 @@ out:
>   *valp = regs->cache_regs[id];
>   return 0;
>  }
> +#endif
> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
> index 2984dcc..8dbdfeb 100644
> --- a/tools/perf/util/perf_regs.h
> +++ b/tools/perf/util/perf_regs.h
> @@ -3,6 +3,10 @@
>  
>  #include 
>  
> +#ifndef __maybe_unused
> +#define __maybe_unused __attribute__((unused))
> +#endif
> +

would the linux/compiler.h include do instead?

otherwise I'd be ok with this

thanks,
jirka

Re: [PATCH v4 23/32] cxlflash: Fix function prolog parameters and return codes

2015-09-29 Thread Matthew R. Ochs
> On Sep 28, 2015, at 11:36 PM, Andrew Donnellan  
> wrote:
> On 26/09/15 09:18, Matthew R. Ochs wrote:
>> 
>>   */
>>  static int send_tmf(struct afu *afu, struct scsi_cmnd *scp, u64 tmfcmd)
>>  {
>> @@ -491,9 +490,7 @@ static const char *cxlflash_driver_info(struct Scsi_Host 
>> *host)
>>   * @host:   SCSI host associated with device.
>>   * @scp:SCSI command to send.
>>   *
>> - * Return:
>> - *  0 on success
>> - *  SCSI_MLQUEUE_HOST_BUSY when host is busy
>> + * Return: 0 on success or SCSI_MLQUEUE_HOST_BUSY
>>   */
> 
> I'd prefer it to say "SCSI_MLQUEUE_HOST_BUSY on failure". (Aesthetically I 
> prefer having it on a separate line, but that's just personal preference.)
> 
> As an aside, while checking the correctness of this, I found that the comment 
> for cxlflash_send_cmd() states that it returns -1 on failure, when the only 
> error value it actually returns is SCSI_MLQUEUE_HOST_BUSY. If you send a v5 
> you might want to fix this.

I'll make a note of this.


Re: [PATCH v4 29/32] cxlflash: Fix to double the delay each time

2015-09-29 Thread Daniel Axtens
"Matthew R. Ochs"  writes:

>> On Sep 28, 2015, at 8:40 PM, Daniel Axtens  wrote:
>> 
>> 
>> "Matthew R. Ochs"  writes:
>> 
>>> From: Manoj Kumar 
>>> 
>>> The operator used to double the delay is incorrect and
>>> does not result in delay doubling.
>>> 
>>> To fix, use a left shift instead of the XOR operator.
>>> 
>> I can see that the patch is correct, but this commit message is a bit
>> confusing. What delay? In what circumstances are you doubling it? Why?
>
> This is the response delay while resetting the master context. The reset
> is performed by writing a bit and then waiting for it to clear. While waiting
> for it to clear, the code relaxes the delta between MMIO reads.

OK. If you do a v5, please include this in the commit message.

Regards,
Daniel

Re: [PATCH 00/31] powerpc/mm: Update page table format for book3s 64

2015-09-29 Thread Michael Ellerman
On Mon, 2015-09-28 at 11:41 -0500, Scott Wood wrote:
> On Mon, 2015-09-28 at 10:26 +0530, Aneesh Kumar K.V wrote:
> > Scott Wood  writes:
> > > 
> > > In any case, "nohash" is the term used elsewhere.
> > 
> > How about using swtlb? (nohash always confused me; it would be nice to
> > be explicit and use software tlb?)
> 
> I'd prefer nohash.  Besides being existing practice (what's confusing about 
> it?), e6500 is nohash but has a partial hw tlb, and 603 is considered hash 
> despite having a software-loaded tlb.

It's not a great name because it describes what the MMU is *not*, rather than
what it *is*.

But it is the existing name, and there doesn't seem to be anything
particularly common about the other MMUs that we can use as a name.

cheers



[PATCH V2 01/31] powerpc/mm: move pte headers to book3s directory

2015-09-29 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/{pte-hash32.h => book3s/32/hash.h} | 0
 arch/powerpc/include/asm/{pte-hash64.h => book3s/64/hash.h} | 0
 arch/powerpc/include/asm/pgtable-ppc32.h| 2 +-
 arch/powerpc/include/asm/pgtable-ppc64.h| 2 +-
 4 files changed, 2 insertions(+), 2 deletions(-)
 rename arch/powerpc/include/asm/{pte-hash32.h => book3s/32/hash.h} (100%)
 rename arch/powerpc/include/asm/{pte-hash64.h => book3s/64/hash.h} (100%)

diff --git a/arch/powerpc/include/asm/pte-hash32.h 
b/arch/powerpc/include/asm/book3s/32/hash.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-hash32.h
rename to arch/powerpc/include/asm/book3s/32/hash.h
diff --git a/arch/powerpc/include/asm/pte-hash64.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-hash64.h
rename to arch/powerpc/include/asm/book3s/64/hash.h
diff --git a/arch/powerpc/include/asm/pgtable-ppc32.h 
b/arch/powerpc/include/asm/pgtable-ppc32.h
index 9c326565d498..1a58a05be99c 100644
--- a/arch/powerpc/include/asm/pgtable-ppc32.h
+++ b/arch/powerpc/include/asm/pgtable-ppc32.h
@@ -116,7 +116,7 @@ extern int icache_44x_need_flush;
 #elif defined(CONFIG_8xx)
 #include 
 #else /* CONFIG_6xx */
-#include 
+#include 
 #endif
 
 /* And here we include common definitions */
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index fa1dfb7f7b48..4f7d8e2f2a2f 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -98,7 +98,7 @@
  * Include the PTE bits definitions
  */
 #ifdef CONFIG_PPC_BOOK3S
-#include 
+#include 
 #else
 #include 
 #endif
-- 
2.5.0


[PATCH V2 06/31] powerpc/mm: Delete booke bits from book3s

2015-09-29 Thread Aneesh Kumar K.V
We also move __ASSEMBLY__ towards the end of the header. This avoids
having #ifndef __ASSEMBLY__ all over the header.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 93 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 88 --
 arch/powerpc/include/asm/book3s/pgtable.h|  1 +
 3 files changed, 51 insertions(+), 131 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index a7738dfbe7e5..2afe5958c837 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -3,18 +3,10 @@
 
 #include 
 
-#ifndef __ASSEMBLY__
-#include 
-#include 
-#include /* For sub-arch specific PPC_PIN_SIZE */
-
-extern unsigned long ioremap_bot;
-
-#ifdef CONFIG_44x
-extern int icache_44x_need_flush;
-#endif
+#include 
 
-#endif /* __ASSEMBLY__ */
+/* And here we include common definitions */
+#include 
 
 /*
  * The normal case is that PTEs are 32-bits and we have a 1-page
@@ -31,28 +23,11 @@ extern int icache_44x_need_flush;
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
-/*
- * entries per page directory level: our page-table tree is two-level, so
- * we don't really have any PMD directory.
- */
-#ifndef __ASSEMBLY__
-#define PTE_TABLE_SIZE (sizeof(pte_t) << PTE_SHIFT)
-#define PGD_TABLE_SIZE (sizeof(pgd_t) << (32 - PGDIR_SHIFT))
-#endif /* __ASSEMBLY__ */
-
 #define PTRS_PER_PTE   (1 << PTE_SHIFT)
 #define PTRS_PER_PMD   1
 #define PTRS_PER_PGD   (1 << (32 - PGDIR_SHIFT))
 
 #define USER_PTRS_PER_PGD  (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0UL
-
-#define pte_ERROR(e) \
-   pr_err("%s:%d: bad pte %llx.\n", __FILE__, __LINE__, \
-   (unsigned long long)pte_val(e))
-#define pgd_ERROR(e) \
-   pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
-
 /*
  * This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
  * value (for now) on others, from where we can start layout kernel
@@ -100,30 +75,30 @@ extern int icache_44x_need_flush;
 #endif
 #define VMALLOC_ENDioremap_bot
 
+#ifndef __ASSEMBLY__
+#include 
+#include 
+#include /* For sub-arch specific PPC_PIN_SIZE */
+
+extern unsigned long ioremap_bot;
+
+/*
+ * entries per page directory level: our page-table tree is two-level, so
+ * we don't really have any PMD directory.
+ */
+#define PTE_TABLE_SIZE (sizeof(pte_t) << PTE_SHIFT)
+#define PGD_TABLE_SIZE (sizeof(pgd_t) << (32 - PGDIR_SHIFT))
+
+#define pte_ERROR(e) \
+   pr_err("%s:%d: bad pte %llx.\n", __FILE__, __LINE__, \
+   (unsigned long long)pte_val(e))
+#define pgd_ERROR(e) \
+   pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 /*
  * Bits in a linux-style PTE.  These match the bits in the
  * (hardware-defined) PowerPC PTE as closely as possible.
  */
 
-#if defined(CONFIG_40x)
-#include 
-#elif defined(CONFIG_44x)
-#include 
-#elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
-#include 
-#elif defined(CONFIG_FSL_BOOKE)
-#include 
-#elif defined(CONFIG_8xx)
-#include 
-#else /* CONFIG_6xx */
-#include 
-#endif
-
-/* And here we include common definitions */
-#include 
-
-#ifndef __ASSEMBLY__
-
 #define pte_clear(mm, addr, ptep) \
do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)
 
@@ -167,7 +142,6 @@ static inline unsigned long pte_update(pte_t *p,
   unsigned long clr,
   unsigned long set)
 {
-#ifdef PTE_ATOMIC_UPDATES
unsigned long old, tmp;
 
__asm__ __volatile__("\
@@ -180,15 +154,7 @@ static inline unsigned long pte_update(pte_t *p,
: "=" (old), "=" (tmp), "=m" (*p)
: "r" (p), "r" (clr), "r" (set), "m" (*p)
: "cc" );
-#else /* PTE_ATOMIC_UPDATES */
-   unsigned long old = pte_val(*p);
-   *p = __pte((old & ~clr) | set);
-#endif /* !PTE_ATOMIC_UPDATES */
-
-#ifdef CONFIG_44x
-   if ((old & _PAGE_USER) && (old & _PAGE_EXEC))
-   icache_44x_need_flush = 1;
-#endif
+
return old;
 }
 #else /* CONFIG_PTE_64BIT */
@@ -196,7 +162,6 @@ static inline unsigned long long pte_update(pte_t *p,
unsigned long clr,
unsigned long set)
 {
-#ifdef PTE_ATOMIC_UPDATES
unsigned long long old;
unsigned long tmp;
 
@@ -211,15 +176,7 @@ static inline unsigned long long pte_update(pte_t *p,
: "=" (old), "=" (tmp), "=m" (*p)
: "r" (p), "r" ((unsigned long)(p) + 4), "r" (clr), "r" (set), "m" (*p)
: "cc" );
-#else /* PTE_ATOMIC_UPDATES */
-   unsigned long long old = pte_val(*p);
-   *p = __pte((old & ~(unsigned long long)clr) | set);
-#endif /* !PTE_ATOMIC_UPDATES */
-
-#ifdef CONFIG_44x
-   if ((old & _PAGE_USER) && (old & 

[PATCH V2 07/31] powerpc/mm: Don't have generic headers introduce functions touching pte bits

2015-09-29 Thread Aneesh Kumar K.V
We are going to drop pte_common.h in a later patch. The idea is to
allow the hash code to avoid having to define all PTE bits. Having PTE
bits defined in pte_common.h made the code unnecessarily complex.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/pgtable.h | 176 ++
 arch/powerpc/include/asm/pgtable-book3e.h | 199 ++
 arch/powerpc/include/asm/pgtable.h| 192 +---
 3 files changed, 376 insertions(+), 191 deletions(-)
 create mode 100644 arch/powerpc/include/asm/pgtable-book3e.h

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h 
b/arch/powerpc/include/asm/book3s/pgtable.h
index 3818cc7bc9b7..fa270cfcf30a 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -8,4 +8,180 @@
 #endif
 
 #define FIRST_USER_ADDRESS 0UL
+#ifndef __ASSEMBLY__
+
+/* Generic accessors to PTE bits */
+static inline int pte_write(pte_t pte)
+{
+   return (pte_val(pte) & (_PAGE_RW | _PAGE_RO)) != _PAGE_RO;
+}
+static inline int pte_dirty(pte_t pte) { return pte_val(pte) & 
_PAGE_DIRTY; }
+static inline int pte_young(pte_t pte) { return pte_val(pte) & 
_PAGE_ACCESSED; }
+static inline int pte_special(pte_t pte)   { return pte_val(pte) & 
_PAGE_SPECIAL; }
+static inline int pte_none(pte_t pte)  { return (pte_val(pte) & 
~_PTE_NONE_MASK) == 0; }
+static inline pgprot_t pte_pgprot(pte_t pte)   { return __pgprot(pte_val(pte) 
& PAGE_PROT_BITS); }
+
+#ifdef CONFIG_NUMA_BALANCING
+/*
+ * These work without NUMA balancing but the kernel does not care. See the
+ * comment in include/asm-generic/pgtable.h . On powerpc, this will only
+ * work for user pages and always return true for kernel pages.
+ */
+static inline int pte_protnone(pte_t pte)
+{
+   return (pte_val(pte) &
+   (_PAGE_PRESENT | _PAGE_USER)) == _PAGE_PRESENT;
+}
+
+static inline int pmd_protnone(pmd_t pmd)
+{
+   return pte_protnone(pmd_pte(pmd));
+}
+#endif /* CONFIG_NUMA_BALANCING */
+
+static inline int pte_present(pte_t pte)
+{
+   return pte_val(pte) & _PAGE_PRESENT;
+}
+
+/* Conversion functions: convert a page and protection to a page entry,
+ * and a page entry and page directory to the page they refer to.
+ *
+ * Even if PTEs can be unsigned long long, a PFN is always an unsigned
+ * long for now.
+ */
+static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot) {
+   return __pte(((pte_basic_t)(pfn) << PTE_RPN_SHIFT) |
+pgprot_val(pgprot)); }
+static inline unsigned long pte_pfn(pte_t pte) {
+   return pte_val(pte) >> PTE_RPN_SHIFT; }
+
+/* Generic modifiers for PTE bits */
+static inline pte_t pte_wrprotect(pte_t pte) {
+   pte_val(pte) &= ~(_PAGE_RW | _PAGE_HWWRITE);
+   pte_val(pte) |= _PAGE_RO; return pte; }
+static inline pte_t pte_mkclean(pte_t pte) {
+   pte_val(pte) &= ~(_PAGE_DIRTY | _PAGE_HWWRITE); return pte; }
+static inline pte_t pte_mkold(pte_t pte) {
+   pte_val(pte) &= ~_PAGE_ACCESSED; return pte; }
+static inline pte_t pte_mkwrite(pte_t pte) {
+   pte_val(pte) &= ~_PAGE_RO;
+   pte_val(pte) |= _PAGE_RW; return pte; }
+static inline pte_t pte_mkdirty(pte_t pte) {
+   pte_val(pte) |= _PAGE_DIRTY; return pte; }
+static inline pte_t pte_mkyoung(pte_t pte) {
+   pte_val(pte) |= _PAGE_ACCESSED; return pte; }
+static inline pte_t pte_mkspecial(pte_t pte) {
+   pte_val(pte) |= _PAGE_SPECIAL; return pte; }
+static inline pte_t pte_mkhuge(pte_t pte) {
+   return pte; }
+static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
+{
+   pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot);
+   return pte;
+}
+
+
+/* Insert a PTE, top-level function is out of line. It uses an inline
+ * low level function in the respective pgtable-* files
+ */
+extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+  pte_t pte);
+
+/* This low level function performs the actual PTE insertion
+ * Setting the PTE depends on the MMU type and other factors. It's
+ * an horrible mess that I'm not going to try to clean up now but
+ * I'm keeping it in one place rather than spread around
+ */
+static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
+   pte_t *ptep, pte_t pte, int percpu)
+{
+#if defined(CONFIG_PPC_STD_MMU_32) && defined(CONFIG_SMP) && 
!defined(CONFIG_PTE_64BIT)
+   /* First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use 
the
+* helper pte_update() which does an atomic update. We need to do that
+* because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
+* per-CPU PTE such as a kmap_atomic, we do a simple update preserving
+* the hash bits instead (ie, same as the non-SMP case)
+*/
+   if (percpu)
+   *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
+  

[PATCH V2 05/31] powerpc/mm: Move hash specific pte width and other defines to book3s

2015-09-29 Thread Aneesh Kumar K.V
This further makes a copy of the pte defines in book3s/64/hash*.h. This
removes the dependency on ppc64-4k.h and ppc64-64k.h.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 87 ++-
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 46 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  6 +-
 3 files changed, 130 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 79750fd3eeb8..f2c51cd61f69 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -1,4 +1,51 @@
-/* To be include by pgtable-hash64.h only */
+#ifndef _ASM_POWERPC_BOOK3S_64_HASH_4K_H
+#define _ASM_POWERPC_BOOK3S_64_HASH_4K_H
+/*
+ * Entries per page directory level.  The PTE level must use a 64b record
+ * for each page table entry.  The PMD and PGD level use a 32b record for
+ * each entry by assuming that each entry is page aligned.
+ */
+#define PTE_INDEX_SIZE  9
+#define PMD_INDEX_SIZE  7
+#define PUD_INDEX_SIZE  9
+#define PGD_INDEX_SIZE  9
+
+#ifndef __ASSEMBLY__
+#define PTE_TABLE_SIZE (sizeof(pte_t) << PTE_INDEX_SIZE)
+#define PMD_TABLE_SIZE (sizeof(pmd_t) << PMD_INDEX_SIZE)
+#define PUD_TABLE_SIZE (sizeof(pud_t) << PUD_INDEX_SIZE)
+#define PGD_TABLE_SIZE (sizeof(pgd_t) << PGD_INDEX_SIZE)
+#endif /* __ASSEMBLY__ */
+
+#define PTRS_PER_PTE   (1 << PTE_INDEX_SIZE)
+#define PTRS_PER_PMD   (1 << PMD_INDEX_SIZE)
+#define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
+#define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
+
+/* PMD_SHIFT determines what a second-level page table entry can map */
+#define PMD_SHIFT  (PAGE_SHIFT + PTE_INDEX_SIZE)
+#define PMD_SIZE   (1UL << PMD_SHIFT)
+#define PMD_MASK   (~(PMD_SIZE-1))
+
+/* With 4k base page size, hugepage PTEs go at the PMD level */
+#define MIN_HUGEPTE_SHIFT  PMD_SHIFT
+
+/* PUD_SHIFT determines what a third-level page table entry can map */
+#define PUD_SHIFT  (PMD_SHIFT + PMD_INDEX_SIZE)
+#define PUD_SIZE   (1UL << PUD_SHIFT)
+#define PUD_MASK   (~(PUD_SIZE-1))
+
+/* PGDIR_SHIFT determines what a fourth-level page table entry can map */
+#define PGDIR_SHIFT(PUD_SHIFT + PUD_INDEX_SIZE)
+#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
+#define PGDIR_MASK (~(PGDIR_SIZE-1))
+
+/* Bits to mask out from a PMD to get to the PTE page */
+#define PMD_MASKED_BITS0
+/* Bits to mask out from a PUD to get to the PMD page */
+#define PUD_MASKED_BITS0
+/* Bits to mask out from a PGD to get to the PUD page */
+#define PGD_MASKED_BITS0
 
 /* PTE bits */
 #define _PAGE_HASHPTE  0x0400 /* software: pte has an associated HPTE */
@@ -14,3 +61,41 @@
 
 /* shift to put page number into pte */
 #define PTE_RPN_SHIFT  (17)
+
+#ifndef __ASSEMBLY__
+/*
+ * 4-level page tables related bits
+ */
+
+#define pgd_none(pgd)  (!pgd_val(pgd))
+#define pgd_bad(pgd)   (pgd_val(pgd) == 0)
+#define pgd_present(pgd)   (pgd_val(pgd) != 0)
+#define pgd_clear(pgdp)(pgd_val(*(pgdp)) = 0)
+#define pgd_page_vaddr(pgd)(pgd_val(pgd) & ~PGD_MASKED_BITS)
+
+static inline pte_t pgd_pte(pgd_t pgd)
+{
+   return __pte(pgd_val(pgd));
+}
+
+static inline pgd_t pte_pgd(pte_t pte)
+{
+   return __pgd(pte_val(pte));
+}
+extern struct page *pgd_page(pgd_t pgd);
+
+#define pud_offset(pgdp, addr) \
+  (((pud_t *) pgd_page_vaddr(*(pgdp))) + \
+(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1)))
+
+#define pud_ERROR(e) \
+   pr_err("%s:%d: bad pud %08lx.\n", __FILE__, __LINE__, pud_val(e))
+
+/*
+ * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range() */
+#define remap_4k_pfn(vma, addr, pfn, prot) \
+   remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, (prot))
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* _ASM_POWERPC_BOOK3S_64_HASH_4K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 4f4ec2ab45c9..ee073822145d 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -1,4 +1,35 @@
-/* To be include by pgtable-hash64.h only */
+#ifndef _ASM_POWERPC_BOOK3S_64_HASH_64K_H
+#define _ASM_POWERPC_BOOK3S_64_HASH_64K_H
+
+#include 
+
+#define PTE_INDEX_SIZE  8
+#define PMD_INDEX_SIZE  10
+#define PUD_INDEX_SIZE 0
+#define PGD_INDEX_SIZE  12
+
+#define PTRS_PER_PTE   (1 << PTE_INDEX_SIZE)
+#define PTRS_PER_PMD   (1 << PMD_INDEX_SIZE)
+#define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
+
+/* With 4k base page size, hugepage PTEs go at the PMD level */
+#define MIN_HUGEPTE_SHIFT  PAGE_SHIFT
+
+/* PMD_SHIFT determines what a second-level page table entry can map */
+#define PMD_SHIFT  (PAGE_SHIFT + PTE_INDEX_SIZE)
+#define PMD_SIZE   (1UL << PMD_SHIFT)
+#define PMD_MASK   (~(PMD_SIZE-1))
+
+/* PGDIR_SHIFT determines what a third-level page table entry can map */

[PATCH V2 24/31] powerpc/mm: Convert __hash_page_64K to C

2015-09-29 Thread Aneesh Kumar K.V
Convert from asm to C

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   3 +-
 arch/powerpc/include/asm/book3s/64/hash.h |   1 +
 arch/powerpc/mm/hash64_64k.c  | 134 +++-
 arch/powerpc/mm/hash_low_64.S | 290 +-
 arch/powerpc/mm/hash_utils_64.c   |  19 +-
 5 files changed, 137 insertions(+), 310 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index b363d73ca225..f46fbd6cd837 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -35,7 +35,8 @@
 #define _PAGE_4K_PFN   0x0004 /* PFN is for a single 4k page */
 
 /* PTE flags to conserve for HPTE identification */
-#define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_COMBO)
+#define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_F_SECOND | \
+_PAGE_F_GIX | _PAGE_HASHPTE | _PAGE_COMBO)
 
 /* Shift to put page number into pte.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index f5b57d1c00dc..e84987ade89c 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -86,6 +86,7 @@
 #define _PAGE_HASHPTE  0x00400 /* software: pte has an associated HPTE 
*/
 #define _PAGE_BUSY 0x00800 /* software: PTE & hash are busy */
 #define _PAGE_F_GIX0x07000 /* full page: hidx bits */
+#define _PAGE_F_GIX_SHIFT  12
 #define _PAGE_F_SECOND 0x08000 /* Whether to use secondary hash or not 
*/
 #define _PAGE_SPECIAL  0x1 /* software: special page */
 
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 2beead9c760e..5736940f0b86 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -44,10 +44,10 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
real_pte_t rpte;
unsigned long hpte_group;
unsigned int subpg_index;
-   unsigned long shift = 12; /* 4K */
unsigned long rflags, pa, hidx;
unsigned long old_pte, new_pte, subpg_pte;
unsigned long vpn, hash, slot;
+   unsigned long shift = mmu_psize_defs[MMU_PAGE_4K].shift;
 
/*
 * atomically mark the linux large page PTE busy and dirty
@@ -212,7 +212,7 @@ repeat:
 * nobody is undating hidx.
 */
rpte.hidx[subpg_index] = (unsigned char)(slot << 4 | 0x1 << 3);
-   new_pte |= _PAGE_HASHPTE;
+   new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE | _PAGE_COMBO;
/*
 * check __real_pte for details on matching smp_rmb()
 */
@@ -220,3 +220,133 @@ repeat:
*ptep = __pte(new_pte & ~_PAGE_BUSY);
return 0;
 }
+
+int __hash_page_64K(unsigned long ea, unsigned long access,
+   unsigned long vsid, pte_t *ptep, unsigned long trap,
+   unsigned long flags, int ssize)
+{
+
+   unsigned long hpte_group;
+   unsigned long rflags, pa;
+   unsigned long old_pte, new_pte;
+   unsigned long vpn, hash, slot;
+   unsigned long shift = mmu_psize_defs[MMU_PAGE_64K].shift;
+
+   /*
+* atomically mark the linux large page PTE busy and dirty
+*/
+   do {
+   pte_t pte = READ_ONCE(*ptep);
+
+   old_pte = pte_val(pte);
+   /* If PTE busy, retry the access */
+   if (unlikely(old_pte & _PAGE_BUSY))
+   return 0;
+   /* If PTE permissions don't match, take page fault */
+   if (unlikely(access & ~old_pte))
+   return 1;
+   /*
+* Check if PTE has the cache-inhibit bit set
+* If so, bail out and refault as a 4k page
+*/
+   if (!mmu_has_feature(MMU_FTR_CI_LARGE_PAGE) &&
+   unlikely(old_pte & _PAGE_NO_CACHE))
+   return 0;
+   /*
+* Try to lock the PTE, add ACCESSED and DIRTY if it was
+* a write access. Since this is 4K insert of 64K page size
+* also add _PAGE_COMBO
+*/
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
+   if (access & _PAGE_RW)
+   new_pte |= _PAGE_DIRTY;
+   } while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
+ old_pte, new_pte));
+   /*
+* PP bits. _PAGE_USER is already PP bit 0x2, so we only
+* need to add in 0x1 if it's a read-only user page
+*/
+   rflags = new_pte & _PAGE_USER;
+   if ((new_pte & _PAGE_USER) && !((new_pte & _PAGE_RW) &&
+   (new_pte & _PAGE_DIRTY)))
+   rflags |= 0x1;
+   /*
+* _PAGE_EXEC -> 

[PATCH V2 10/31] powerpc/mm: Don't use pmd_val, pud_val and pgd_val as lvalue

2015-09-29 Thread Aneesh Kumar K.V
We convert them to static inline functions here, as we did with pte_val in
the previous patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  6 -
 arch/powerpc/include/asm/book3s/64/hash-4k.h |  6 -
 arch/powerpc/include/asm/book3s/64/pgtable.h | 36 +---
 arch/powerpc/include/asm/page.h  | 34 +++---
 arch/powerpc/include/asm/pgalloc-32.h| 34 +++---
 arch/powerpc/include/asm/pgalloc-64.h| 17 +
 arch/powerpc/include/asm/pgtable-ppc32.h |  7 +-
 arch/powerpc/include/asm/pgtable-ppc64-4k.h  |  6 -
 arch/powerpc/include/asm/pgtable-ppc64.h | 36 +---
 arch/powerpc/mm/pgtable_64.c | 19 +++
 10 files changed, 149 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 2afe5958c837..9e47515b2e01 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -105,7 +105,11 @@ extern unsigned long ioremap_bot;
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(pmd_val(pmd) & _PMD_BAD)
 #definepmd_present(pmd)(pmd_val(pmd) & _PMD_PRESENT_MASK)
-#definepmd_clear(pmdp) do { pmd_val(*(pmdp)) = 0; } while (0)
+static inline void pmd_clear(pmd_t *pmdp)
+{
+   *pmdp = __pmd(0);
+}
+
 
 /*
  * When flushing the tlb entry for a page, we also need to flush the hash
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 15518b620f5a..537eacecf6e9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -71,9 +71,13 @@
 #define pgd_none(pgd)  (!pgd_val(pgd))
 #define pgd_bad(pgd)   (pgd_val(pgd) == 0)
 #define pgd_present(pgd)   (pgd_val(pgd) != 0)
-#define pgd_clear(pgdp)(pgd_val(*(pgdp)) = 0)
 #define pgd_page_vaddr(pgd)(pgd_val(pgd) & ~PGD_MASKED_BITS)
 
+static inline void pgd_clear(pgd_t *pgdp)
+{
+   *pgdp = __pgd(0);
+}
+
 static inline pte_t pgd_pte(pgd_t pgd)
 {
return __pte(pgd_val(pgd));
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 02b2a8264028..57ce35b077e0 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -240,21 +240,38 @@
 #define PMD_BAD_BITS   (PTE_TABLE_SIZE-1)
 #define PUD_BAD_BITS   (PMD_TABLE_SIZE-1)
 
-#define pmd_set(pmdp, pmdval)  (pmd_val(*(pmdp)) = (pmdval))
+static inline void pmd_set(pmd_t *pmdp, unsigned long val)
+{
+   *pmdp = __pmd(val);
+}
+
+static inline void pmd_clear(pmd_t *pmdp)
+{
+   *pmdp = __pmd(0);
+}
+
+
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(!is_kernel_addr(pmd_val(pmd)) \
 || (pmd_val(pmd) & PMD_BAD_BITS))
 #definepmd_present(pmd)(!pmd_none(pmd))
-#definepmd_clear(pmdp) (pmd_val(*(pmdp)) = 0)
 #define pmd_page_vaddr(pmd)(pmd_val(pmd) & ~PMD_MASKED_BITS)
 extern struct page *pmd_page(pmd_t pmd);
 
-#define pud_set(pudp, pudval)  (pud_val(*(pudp)) = (pudval))
+static inline void pud_set(pud_t *pudp, unsigned long val)
+{
+   *pudp = __pud(val);
+}
+
+static inline void pud_clear(pud_t *pudp)
+{
+   *pudp = __pud(0);
+}
+
 #define pud_none(pud)  (!pud_val(pud))
 #definepud_bad(pud)(!is_kernel_addr(pud_val(pud)) \
 || (pud_val(pud) & PUD_BAD_BITS))
 #define pud_present(pud)   (pud_val(pud) != 0)
-#define pud_clear(pudp)(pud_val(*(pudp)) = 0)
 #define pud_page_vaddr(pud)(pud_val(pud) & ~PUD_MASKED_BITS)
 
 extern struct page *pud_page(pud_t pud);
@@ -269,8 +286,11 @@ static inline pud_t pte_pud(pte_t pte)
return __pud(pte_val(pte));
 }
 #define pud_write(pud) pte_write(pud_pte(pud))
-#define pgd_set(pgdp, pudp)({pgd_val(*(pgdp)) = (unsigned long)(pudp);})
 #define pgd_write(pgd) pte_write(pgd_pte(pgd))
+static inline void pgd_set(pgd_t *pgdp, unsigned long val)
+{
+   *pgdp = __pgd(val);
+}
 
 /*
  * Find an entry in a page-table-directory.  We combine the address region
@@ -584,14 +604,12 @@ static inline pmd_t pmd_mkhuge(pmd_t pmd)
 
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
 {
-   pmd_val(pmd) &= ~_PAGE_PRESENT;
-   return pmd;
+   return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
 }
 
 static inline pmd_t pmd_mksplitting(pmd_t pmd)
 {
-   pmd_val(pmd) |= _PAGE_SPLITTING;
-   return pmd;
+   return __pmd(pmd_val(pmd) | _PAGE_SPLITTING);
 }
 
 #define __HAVE_ARCH_PMD_SAME
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 

[PATCH V2 09/31] powerpc/mm: Don't use pte_val as lvalue

2015-09-29 Thread Aneesh Kumar K.V
We also convert a few #defines to static inlines in this patch for better
type checking.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/pgtable.h | 112 --
 arch/powerpc/include/asm/page.h   |  10 ++-
 arch/powerpc/include/asm/pgtable-book3e.h |  68 --
 3 files changed, 133 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h 
b/arch/powerpc/include/asm/book3s/pgtable.h
index 87333618af3b..e156a6c9d84c 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -12,9 +12,9 @@
 
 /* Generic accessors to PTE bits */
 static inline int pte_write(pte_t pte) { return !!(pte_val(pte) & _PAGE_RW);}
-static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; }
-static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; }
-static inline int pte_special(pte_t pte)   { return pte_val(pte) & _PAGE_SPECIAL; }
+static inline int pte_dirty(pte_t pte) { return !!(pte_val(pte) & _PAGE_DIRTY); }
+static inline int pte_young(pte_t pte) { return !!(pte_val(pte) & _PAGE_ACCESSED); }
+static inline int pte_special(pte_t pte)   { return !!(pte_val(pte) & _PAGE_SPECIAL); }
 static inline int pte_none(pte_t pte)  { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
 static inline pgprot_t pte_pgprot(pte_t pte)   { return __pgprot(pte_val(pte) & PAGE_PROT_BITS); }
 
@@ -47,36 +47,61 @@ static inline int pte_present(pte_t pte)
  * Even if PTEs can be unsigned long long, a PFN is always an unsigned
  * long for now.
  */
-static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot) {
+static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
+{
return __pte(((pte_basic_t)(pfn) << PTE_RPN_SHIFT) |
-pgprot_val(pgprot)); }
-static inline unsigned long pte_pfn(pte_t pte) {
-   return pte_val(pte) >> PTE_RPN_SHIFT; }
+pgprot_val(pgprot));
+}
+
+static inline unsigned long pte_pfn(pte_t pte)
+{
+   return pte_val(pte) >> PTE_RPN_SHIFT;
+}
 
 /* Generic modifiers for PTE bits */
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-   pte_val(pte) &= ~_PAGE_RW;
+   return __pte(pte_val(pte) & ~_PAGE_RW);
+}
+
+static inline pte_t pte_mkclean(pte_t pte)
+{
+   return __pte(pte_val(pte) & ~_PAGE_DIRTY);
+}
+
+static inline pte_t pte_mkold(pte_t pte)
+{
+   return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
+}
+
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_RW);
+}
+
+static inline pte_t pte_mkdirty(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_DIRTY);
+}
+
+static inline pte_t pte_mkyoung(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_ACCESSED);
+}
+
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_SPECIAL);
+}
+
+static inline pte_t pte_mkhuge(pte_t pte)
+{
return pte;
 }
-static inline pte_t pte_mkclean(pte_t pte) {
-   pte_val(pte) &= ~_PAGE_DIRTY; return pte; }
-static inline pte_t pte_mkold(pte_t pte) {
-   pte_val(pte) &= ~_PAGE_ACCESSED; return pte; }
-static inline pte_t pte_mkwrite(pte_t pte) {
-   pte_val(pte) |= _PAGE_RW; return pte; }
-static inline pte_t pte_mkdirty(pte_t pte) {
-   pte_val(pte) |= _PAGE_DIRTY; return pte; }
-static inline pte_t pte_mkyoung(pte_t pte) {
-   pte_val(pte) |= _PAGE_ACCESSED; return pte; }
-static inline pte_t pte_mkspecial(pte_t pte) {
-   pte_val(pte) |= _PAGE_SPECIAL; return pte; }
-static inline pte_t pte_mkhuge(pte_t pte) {
-   return pte; }
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-   pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot);
-   return pte;
+   return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
 
@@ -159,22 +184,39 @@ extern int ptep_set_access_flags(struct vm_area_struct 
*vma, unsigned long addre
 #define _PAGE_CACHE_CTL(_PAGE_COHERENT | _PAGE_GUARDED | 
_PAGE_NO_CACHE | \
 _PAGE_WRITETHRU)
 
-#define pgprot_noncached(prot)   (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
-   _PAGE_NO_CACHE | _PAGE_GUARDED))
+static inline pgprot_t pgprot_noncached(pgprot_t prot)
+{
+   return __pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) |
+   _PAGE_NO_CACHE | _PAGE_GUARDED);
+}
 
-#define pgprot_noncached_wc(prot) (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
-   _PAGE_NO_CACHE))
+static inline pgprot_t pgprot_noncached_wc(pgprot_t prot)
+{
+   return __pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) |
+   _PAGE_NO_CACHE);
+}
 
-#define pgprot_cached(prot)   (__pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) | \
-   

[PATCH V2 25/31] powerpc/mm: Convert 4k insert from asm to C

2015-09-29 Thread Aneesh Kumar K.V
This is similar to the 64K insert. Maybe we want to consolidate the two
implementations later.
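
For orientation, the call into the new C version ends up looking roughly
like the following (a simplified sketch, not the literal hash_utils_64.c
change; `spp' here stands in for the subpage protection bits):

	/* sketch: base-page-size 4K path in hash_page_mm() */
	if (psize == MMU_PAGE_4K)
		rc = __hash_page_4K(ea, access, vsid, ptep, trap,
				    flags, ssize, spp);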

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/Makefile|   6 +-
 arch/powerpc/mm/hash64_4k.c | 139 +
 arch/powerpc/mm/hash_low_64.S   | 331 
 arch/powerpc/mm/hash_utils_64.c |  26 
 4 files changed, 142 insertions(+), 360 deletions(-)
 create mode 100644 arch/powerpc/mm/hash64_4k.c
 delete mode 100644 arch/powerpc/mm/hash_low_64.S

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index f80ad1a76cc8..1ffeda85c086 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -14,11 +14,11 @@ obj-$(CONFIG_PPC_MMU_NOHASH)+= mmu_context_nohash.o 
tlb_nohash.o \
 obj-$(CONFIG_PPC_BOOK3E)   += tlb_low_$(CONFIG_WORD_SIZE)e.o
 hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o
 obj-$(CONFIG_PPC_STD_MMU_64)   += hash_utils_64.o slb_low.o slb.o $(hash64-y)
-obj-$(CONFIG_PPC_STD_MMU_32)   += ppc_mmu_32.o
-obj-$(CONFIG_PPC_STD_MMU)  += hash_low_$(CONFIG_WORD_SIZE).o \
-  tlb_hash$(CONFIG_WORD_SIZE).o \
+obj-$(CONFIG_PPC_STD_MMU_32)   += ppc_mmu_32.o hash_low_32.o
+obj-$(CONFIG_PPC_STD_MMU)  += tlb_hash$(CONFIG_WORD_SIZE).o \
   mmu_context_hash$(CONFIG_WORD_SIZE).o
 ifeq ($(CONFIG_PPC_STD_MMU_64),y)
+obj-$(CONFIG_PPC_4K_PAGES) += hash64_4k.o
 obj-$(CONFIG_PPC_64K_PAGES)+= hash64_64k.o
 endif
 obj-$(CONFIG_PPC_ICSWX)+= icswx.o
diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
new file mode 100644
index ..1832ed7fef0d
--- /dev/null
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -0,0 +1,139 @@
+/*
+ * Copyright IBM Corporation, 2015
+ * Author Aneesh Kumar K.V 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <linux/mm.h>
+#include <asm/machdep.h>
+#include <asm/mmu.h>
+
+int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
+  pte_t *ptep, unsigned long trap, unsigned long flags,
+  int ssize, int subpg_prot)
+{
+   unsigned long hpte_group;
+   unsigned long rflags, pa;
+   unsigned long old_pte, new_pte;
+   unsigned long vpn, hash, slot;
+   unsigned long shift = mmu_psize_defs[MMU_PAGE_4K].shift;
+
+   /*
+* atomically mark the linux large page PTE busy and dirty
+*/
+   do {
+   pte_t pte = READ_ONCE(*ptep);
+
+   old_pte = pte_val(pte);
+   /* If PTE busy, retry the access */
+   if (unlikely(old_pte & _PAGE_BUSY))
+   return 0;
+   /* If PTE permissions don't match, take page fault */
+   if (unlikely(access & ~old_pte))
+   return 1;
+   /*
+* Try to lock the PTE, add ACCESSED and DIRTY if it was
+* a write access.
+*/
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE;
+   if (access & _PAGE_RW)
+   new_pte |= _PAGE_DIRTY;
+   } while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
+ old_pte, new_pte));
+   /*
+* PP bits. _PAGE_USER is already PP bit 0x2, so we only
+* need to add in 0x1 if it's a read-only user page
+*/
+   rflags = new_pte & _PAGE_USER;
+   if ((new_pte & _PAGE_USER) && !((new_pte & _PAGE_RW) &&
+   (new_pte & _PAGE_DIRTY)))
+   rflags |= 0x1;
+   /*
+* _PAGE_EXEC -> HW_NO_EXEC since it's inverted
+*/
+   rflags |= ((new_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
+   /*
+* Always add C and Memory coherence bit
+*/
+   rflags |= HPTE_R_C | HPTE_R_M;
+   /*
+* Add in WIMG bits
+*/
+   rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
+   _PAGE_COHERENT | _PAGE_GUARDED));
+
+   if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
+   !cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
+   rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);
+
+   vpn  = hpt_vpn(ea, vsid, ssize);
+   if (unlikely(old_pte & _PAGE_HASHPTE)) {
+   /*
+* There MIGHT be an HPTE for this pte
+*/
+   hash = hpt_hash(vpn, shift, ssize);
+   if (old_pte & _PAGE_F_SECOND)
+   hash = ~hash;
+

[PATCH V2 11/31] powerpc/mm: Move hash64 PTE bits from book3s/64/pgtable.h to hash.h

2015-09-29 Thread Aneesh Kumar K.V
This enables us to keep the hash64-related bits together, and makes the
code easier to follow.
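
As a quick orientation for the region defines that move into hash.h
below (worked example, not part of the patch), REGION_ID(ea) is just
ea >> 60:

	/* illustration only */
	/* REGION_ID(0xD000000000000000UL) == 0xd -> VMALLOC_REGION_ID */
	/* REGION_ID(0xC000000000000000UL) == 0xc -> KERNEL_REGION_ID  */
	/* REGION_ID(0x00003fff80000000UL) == 0x0 -> USER_REGION_ID    */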

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 442 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 441 +-
 arch/powerpc/include/asm/pgtable.h   |   6 -
 3 files changed, 442 insertions(+), 447 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 7deb5063ff8c..6d62be326366 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -2,6 +2,63 @@
 #define _ASM_POWERPC_BOOK3S_64_HASH_H
 #ifdef __KERNEL__
 
+#ifdef CONFIG_PPC_64K_PAGES
+#include <asm/book3s/64/hash-64k.h>
+#else
+#include <asm/book3s/64/hash-4k.h>
+#endif
+
+/*
+ * Size of EA range mapped by our pagetables.
+ */
+#define PGTABLE_EADDR_SIZE (PTE_INDEX_SIZE + PMD_INDEX_SIZE + \
+PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
+#define PGTABLE_RANGE  (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PMD_CACHE_INDEX(PMD_INDEX_SIZE + 1)
+#else
+#define PMD_CACHE_INDEXPMD_INDEX_SIZE
+#endif
+/*
+ * Define the address range of the kernel non-linear virtual area
+ */
+#define KERN_VIRT_START ASM_CONST(0xD000000000000000)
+#define KERN_VIRT_SIZE ASM_CONST(0x0000100000000000)
+
+/*
+ * The vmalloc space starts at the beginning of that region, and
+ * occupies half of it on hash CPUs and a quarter of it on Book3E
+ * (we keep a quarter for the virtual memmap)
+ */
+#define VMALLOC_START  KERN_VIRT_START
+#define VMALLOC_SIZE   (KERN_VIRT_SIZE >> 1)
+#define VMALLOC_END(VMALLOC_START + VMALLOC_SIZE)
+
+/*
+ * Region IDs
+ */
+#define REGION_SHIFT   60UL
+#define REGION_MASK(0xfUL << REGION_SHIFT)
+#define REGION_ID(ea)  (((unsigned long)(ea)) >> REGION_SHIFT)
+
+#define VMALLOC_REGION_ID  (REGION_ID(VMALLOC_START))
+#define KERNEL_REGION_ID   (REGION_ID(PAGE_OFFSET))
+#define VMEMMAP_REGION_ID  (0xfUL) /* Server only */
+#define USER_REGION_ID (0UL)
+
+/*
+ * Defines the address of the vmemap area, in its own region on
+ * hash table CPUs.
+ */
+#define VMEMMAP_BASE   (VMEMMAP_REGION_ID << REGION_SHIFT)
+
+#ifdef CONFIG_PPC_MM_SLICES
+#define HAVE_ARCH_UNMAPPED_AREA
+#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
+#else
+#error "Hash Config needs mm slice enabled"
+#endif /* CONFIG_PPC_MM_SLICES */
 /*
  * Common bits between 4K and 64K pages in a linux-style PTE.
  * These match the bits in the (hardware-defined) PowerPC PTE as closely
@@ -46,11 +103,390 @@
 /* Hash table based platforms need atomic updates of the linux PTE */
 #define PTE_ATOMIC_UPDATES 1
 
-#ifdef CONFIG_PPC_64K_PAGES
-#include <asm/book3s/64/hash-64k.h>
+/*
+ * THP pages can't be special. So use the _PAGE_SPECIAL
+ */
+#define _PAGE_SPLITTING _PAGE_SPECIAL
+
+/*
+ * We need to differentiate between explicit huge page and THP huge
+ * page, since THP huge page also need to track real subpage details
+ */
+#define _PAGE_THP_HUGE  _PAGE_4K_PFN
+
+/*
+ * set of bits not changed in pmd_modify.
+ */
+#define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS |  \
+_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPLITTING | \
+_PAGE_THP_HUGE)
+#define _PTE_NONE_MASK _PAGE_HPTEFLAGS
+/*
+ * The mask covered by the RPN must be a ULL on 32-bit platforms with
+ * 64-bit PTEs
+ * FIXME!! double check the RPN_MAX May be not used
+ */
+//#define PTE_RPN_MAX  (1UL << (32 - PTE_RPN_SHIFT))
+#define PTE_RPN_MASK   (~((1UL << PTE_RPN_SHIFT) - 1))
+/*
+ * _PAGE_CHG_MASK masks of bits that are to be preserved across
+ * pgprot changes
+ */
+#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
+_PAGE_ACCESSED | _PAGE_SPECIAL)
+/*
+ * Mask of bits returned by pte_pgprot()
+ */
+#define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \
+_PAGE_WRITETHRU | _PAGE_4K_PFN | \
+_PAGE_USER | _PAGE_ACCESSED |  \
+_PAGE_RW |  _PAGE_DIRTY | _PAGE_EXEC)
+/*
+ * We define 2 sets of base prot bits, one for basic pages (ie,
+ * cacheable kernel and user pages) and one for non cacheable
+ * pages. We always set _PAGE_COHERENT when SMP is enabled or
+ * the processor might need it for DMA coherency.
+ */
+#define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
+#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT)
+
+/* Permission masks used to generate the __P and __S table,
+ *
+ * Note:__pgprot is defined in arch/powerpc/include/asm/page.h
+ *
+ * Write permissions imply read permissions for now (we could make write-only
+ * pages on BookE but we don't bother for now). Execute permission control is
+ * possible on platforms that define _PAGE_EXEC
+ *
+ * Note due to the way vm flags are laid out, the bits are XWR
+ */
+#define 

[PATCH V2 12/31] powerpc/mm: Move PTE bits from generic functions to hash64 functions.

2015-09-29 Thread Aneesh Kumar K.V
Functions which operate on pte bits are moved to hash*.h, and the other
generic functions are moved to pgtable.h.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 177 
 arch/powerpc/include/asm/book3s/64/hash.h| 144 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h |   6 +
 arch/powerpc/include/asm/book3s/pgtable.h| 198 ---
 4 files changed, 327 insertions(+), 198 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 9e47515b2e01..affcbfc14e3a 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -294,6 +294,183 @@ void pgtable_cache_init(void);
 extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
  pmd_t **pmdp);
 
+/* Generic accessors to PTE bits */
+static inline int pte_write(pte_t pte) { return !!(pte_val(pte) & _PAGE_RW);}
+static inline int pte_dirty(pte_t pte) { return !!(pte_val(pte) & _PAGE_DIRTY); }
+static inline int pte_young(pte_t pte) { return !!(pte_val(pte) & _PAGE_ACCESSED); }
+static inline int pte_special(pte_t pte)   { return !!(pte_val(pte) & _PAGE_SPECIAL); }
+static inline int pte_none(pte_t pte)  { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
+static inline pgprot_t pte_pgprot(pte_t pte)   { return __pgprot(pte_val(pte) & PAGE_PROT_BITS); }
+
+static inline int pte_present(pte_t pte)
+{
+   return pte_val(pte) & _PAGE_PRESENT;
+}
+
+/* Conversion functions: convert a page and protection to a page entry,
+ * and a page entry and page directory to the page they refer to.
+ *
+ * Even if PTEs can be unsigned long long, a PFN is always an unsigned
+ * long for now.
+ */
+static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
+{
+   return __pte(((pte_basic_t)(pfn) << PTE_RPN_SHIFT) |
+pgprot_val(pgprot));
+}
+
+static inline unsigned long pte_pfn(pte_t pte)
+{
+   return pte_val(pte) >> PTE_RPN_SHIFT;
+}
+
+/* Generic modifiers for PTE bits */
+static inline pte_t pte_wrprotect(pte_t pte)
+{
+   return __pte(pte_val(pte) & ~_PAGE_RW);
+}
+
+static inline pte_t pte_mkclean(pte_t pte)
+{
+   return __pte(pte_val(pte) & ~_PAGE_DIRTY);
+}
+
+static inline pte_t pte_mkold(pte_t pte)
+{
+   return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
+}
+
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_RW);
+}
+
+static inline pte_t pte_mkdirty(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_DIRTY);
+}
+
+static inline pte_t pte_mkyoung(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_ACCESSED);
+}
+
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+   return __pte(pte_val(pte) | _PAGE_SPECIAL);
+}
+
+static inline pte_t pte_mkhuge(pte_t pte)
+{
+   return pte;
+}
+
+static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
+{
+   return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
+}
+
+
+
+/* This low level function performs the actual PTE insertion
+ * Setting the PTE depends on the MMU type and other factors. It's
+ * an horrible mess that I'm not going to try to clean up now but
+ * I'm keeping it in one place rather than spread around
+ */
+static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
+   pte_t *ptep, pte_t pte, int percpu)
+{
+#if defined(CONFIG_PPC_STD_MMU_32) && defined(CONFIG_SMP) && !defined(CONFIG_PTE_64BIT)
+   /* First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use the
+* helper pte_update() which does an atomic update. We need to do that
+* because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
+* per-CPU PTE such as a kmap_atomic, we do a simple update preserving
+* the hash bits instead (ie, same as the non-SMP case)
+*/
+   if (percpu)
+   *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
+ | (pte_val(pte) & ~_PAGE_HASHPTE));
+   else
+   pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte));
+
+#elif defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
+   /* Second case is 32-bit with 64-bit PTE.  In this case, we
+* can just store as long as we do the two halves in the right order
+* with a barrier in between. This is possible because we take care,
+* in the hash code, to pre-invalidate if the PTE was already hashed,
+* which synchronizes us with any concurrent invalidation.
+* In the percpu case, we also fallback to the simple update preserving
+* the hash bits
+*/
+   if (percpu) {
+   *ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
+ | (pte_val(pte) & ~_PAGE_HASHPTE));
+   return;
+   }
+   if 

[PATCH 1/3] cxl: fix leak of IRQ names in cxl_free_afu_irqs()

2015-09-29 Thread Andrew Donnellan
cxl_free_afu_irqs() doesn't free IRQ names when it releases an AFU's IRQ
ranges. The userspace API equivalent in afu_release_irqs() calls
afu_irq_name_free() to release the IRQ names.

Call afu_irq_name_free() in cxl_free_afu_irqs() to release the IRQ names.
Make afu_irq_name_free() non-static to allow this.

Reported-by: Matthew R. Ochs 
Fixes: 6f7f0b3df6d4 ("cxl: Add AFU virtual PHB and kernel API")
Signed-off-by: Andrew Donnellan 
Signed-off-by: Ian Munsie 
---
 drivers/misc/cxl/api.c | 1 +
 drivers/misc/cxl/cxl.h | 1 +
 drivers/misc/cxl/irq.c | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 8af12c8..103baf0 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -105,6 +105,7 @@ EXPORT_SYMBOL_GPL(cxl_allocate_afu_irqs);
 
 void cxl_free_afu_irqs(struct cxl_context *ctx)
 {
+   afu_irq_name_free(ctx);
cxl_release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
 }
 EXPORT_SYMBOL_GPL(cxl_free_afu_irqs);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 1c30ef7..0cfb9c1 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -677,6 +677,7 @@ int cxl_register_serr_irq(struct cxl_afu *afu);
 void cxl_release_serr_irq(struct cxl_afu *afu);
 int afu_register_irqs(struct cxl_context *ctx, u32 count);
 void afu_release_irqs(struct cxl_context *ctx, void *cookie);
+void afu_irq_name_free(struct cxl_context *ctx);
 irqreturn_t cxl_slice_irq_err(int irq, void *data);
 
 int cxl_debugfs_init(void);
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 583b42a..38b57d6 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -414,7 +414,7 @@ void cxl_release_psl_irq(struct cxl_afu *afu)
kfree(afu->psl_irq_name);
 }
 
-static void afu_irq_name_free(struct cxl_context *ctx)
+void afu_irq_name_free(struct cxl_context *ctx)
 {
struct cxl_irq_name *irq_name, *tmp;
 
-- 
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)IBM Australia Limited

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 04/31] powerpc/mm: make a separate copy for book3s (part 2)

2015-09-29 Thread Aneesh Kumar K.V
Keep it separate to make rebasing easier.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 4 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 1a58a05be99c..a7738dfbe7e5 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGTABLE_PPC32_H
-#define _ASM_POWERPC_PGTABLE_PPC32_H
+#ifndef _ASM_POWERPC_BOOK3S_32_PGTABLE_H
+#define _ASM_POWERPC_BOOK3S_32_PGTABLE_H
 
 #include <asm-generic/pgtable-nopmd.h>
 
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fd00cab62008..28baca35935a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGTABLE_PPC64_H_
-#define _ASM_POWERPC_PGTABLE_PPC64_H_
+#ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
+#define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
 /*
  * This file contains the functions and defines necessary to modify and use
  * the ppc64 hashed page table.
@@ -615,4 +615,4 @@ static inline int pmd_move_must_withdraw(struct spinlock 
*new_pmd_ptl,
return true;
 }
 #endif /* __ASSEMBLY__ */
-#endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
+#endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 14/31] powerpc/booke: Move booke headers (part 2)

2015-09-29 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/{pgtable-ppc32.h => nohash/32/pgtable.h} | 0
 arch/powerpc/include/asm/{pgtable-ppc64.h => nohash/64/pgtable.h} | 2 +-
 arch/powerpc/include/asm/nohash/pgtable.h | 8 
 3 files changed, 5 insertions(+), 5 deletions(-)
 rename arch/powerpc/include/asm/{pgtable-ppc32.h => nohash/32/pgtable.h} (100%)
 rename arch/powerpc/include/asm/{pgtable-ppc64.h => nohash/64/pgtable.h} (99%)

diff --git a/arch/powerpc/include/asm/pgtable-ppc32.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
similarity index 100%
rename from arch/powerpc/include/asm/pgtable-ppc32.h
rename to arch/powerpc/include/asm/nohash/32/pgtable.h
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
similarity index 99%
rename from arch/powerpc/include/asm/pgtable-ppc64.h
rename to arch/powerpc/include/asm/nohash/64/pgtable.h
index 8c851dbc625f..8f6195f147a4 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -18,7 +18,7 @@
  * Size of EA range mapped by our pagetables.
  */
 #define PGTABLE_EADDR_SIZE (PTE_INDEX_SIZE + PMD_INDEX_SIZE + \
-   PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
+   PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
 #define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 91325997ba25..c0c41a2409d2 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -1,10 +1,10 @@
-#ifndef _ASM_POWERPC_PGTABLE_BOOK3E_H
-#define _ASM_POWERPC_PGTABLE_BOOK3E_H
+#ifndef _ASM_POWERPC_NOHASH_PGTABLE_H
+#define _ASM_POWERPC_NOHASH_PGTABLE_H
 
 #if defined(CONFIG_PPC64)
-#include <asm/pgtable-ppc64.h>
+#include <asm/nohash/64/pgtable.h>
 #else
-#include <asm/pgtable-ppc32.h>
+#include <asm/nohash/32/pgtable.h>
 #endif
 
 #ifndef __ASSEMBLY__
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 13/31] powerpc/booke: Move booke headers (part 1)

2015-09-29 Thread Aneesh Kumar K.V
Move the booke related headers below nohash/32 and nohash/64.

We are splitting this change into multiple patches to make the rebasing
easier. The following patches can be folded into this one if needed;
they are kept separate for easier review.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/{pgtable-book3e.h => nohash/pgtable.h} | 0
 arch/powerpc/include/asm/pgtable.h  | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/include/asm/{pgtable-book3e.h => nohash/pgtable.h} (100%)

diff --git a/arch/powerpc/include/asm/pgtable-book3e.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
similarity index 100%
rename from arch/powerpc/include/asm/pgtable-book3e.h
rename to arch/powerpc/include/asm/nohash/pgtable.h
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 485b50cd03ff..10f94635ecfb 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -15,7 +15,7 @@ struct mm_struct;
 #ifdef CONFIG_PPC_BOOK3S
 #include <asm/book3s/pgtable.h>
 #else
-#include <asm/pgtable-book3e.h>
+#include <asm/nohash/pgtable.h>
 #endif /* !CONFIG_PPC_BOOK3S */
 
 #ifndef __ASSEMBLY__
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 28/31] powerpc/mm: Move WIMG update to helper.

2015-09-29 Thread Aneesh Kumar K.V
The only difference here is that we apply the WIMG mapping early, so the
rflags passed to updatepp will also include the WIMG bits.
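
Concretely, the helper now performs the whole Linux-pte to HPTE WIMG
translation itself (summary of the hash_utils_64.c hunk below):

	/*
	 * _PAGE_WRITETHRU -> HPTE_R_W   (W: write-through)
	 * _PAGE_NO_CACHE  -> HPTE_R_I   (I: cache-inhibited)
	 * _PAGE_GUARDED   -> HPTE_R_G   (G: guarded)
	 * HPTE_R_C | HPTE_R_M are always set.
	 */

so callers such as __hash_page_4K() no longer OR the WIMG bits into
rflags themselves.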

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hash64_4k.c  |  5 -
 arch/powerpc/mm/hash64_64k.c | 10 --
 arch/powerpc/mm/hash_utils_64.c  | 13 -
 arch/powerpc/mm/hugepage-hash64.c|  7 ---
 arch/powerpc/mm/hugetlbpage-hash64.c |  8 
 5 files changed, 12 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index 7749857f6ced..42af2d3a8b63 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -54,11 +54,6 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * need to add in 0x1 if it's a read-only user page
 */
rflags = htab_convert_pte_flags(new_pte);
-   /*
-* Add in WIMG bits
-*/
-   rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
-   _PAGE_COHERENT | _PAGE_GUARDED));
 
if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index c30635d27c0f..0f5f3fc4923e 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -77,11 +77,6 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 */
subpg_pte = new_pte & ~subpg_prot;
rflags = htab_convert_pte_flags(subpg_pte);
-   /*
-* Add in WIMG bits
-*/
-   rflags |= (subpg_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
-   _PAGE_COHERENT | _PAGE_GUARDED));
 
if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
@@ -249,11 +244,6 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
  old_pte, new_pte));
 
rflags = htab_convert_pte_flags(new_pte);
-   /*
-* Add in WIMG bits
-*/
-   rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
-   _PAGE_COHERENT | _PAGE_GUARDED));
 
if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 444c42eabfdf..b5c24455715e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -182,7 +182,18 @@ unsigned long htab_convert_pte_flags(unsigned long 
pteflags)
/*
 * Always add "C" bit for perf. Memory coherence is always enabled
 */
-   return rflags | HPTE_R_C | HPTE_R_M;
+   rflags |=  HPTE_R_C | HPTE_R_M;
+   /*
+* Add in WIG bits
+*/
+   if (pteflags & _PAGE_WRITETHRU)
+   rflags |= HPTE_R_W;
+   if (pteflags & _PAGE_NO_CACHE)
+   rflags |= HPTE_R_I;
+   if (pteflags & _PAGE_GUARDED)
+   rflags |= HPTE_R_G;
+
+   return rflags;
 }
 
 int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
diff --git a/arch/powerpc/mm/hugepage-hash64.c 
b/arch/powerpc/mm/hugepage-hash64.c
index 91fcac6f989d..1f666de0110a 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -120,13 +120,6 @@ int __hash_page_thp(unsigned long ea, unsigned long 
access, unsigned long vsid,
pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT;
new_pmd |= _PAGE_HASHPTE;
 
-   /* Add in WIMG bits */
-   rflags |= (new_pmd & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
- _PAGE_GUARDED));
-   /*
-* enable the memory coherence always
-*/
-   rflags |= HPTE_R_M;
 repeat:
hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & 
~0x7UL;
 
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c 
b/arch/powerpc/mm/hugetlbpage-hash64.c
index 304c8520506e..0734e4daffef 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -91,14 +91,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, 
unsigned long vsid,
/* clear HPTE slot informations in new PTE */
new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE;
 
-   /* Add in WIMG bits */
-   rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
- _PAGE_COHERENT | _PAGE_GUARDED));
-   /*
-* enable the memory coherence always
-*/
-   rflags |= HPTE_R_M;
-
slot = hpte_insert_repeating(hash, vpn, pa, rflags, 0,
 mmu_psize, ssize);
 
-- 
2.5.0

___
Linuxppc-dev mailing list

[PATCH V2 29/31] powerpc/mm: Move hugetlb related headers

2015-09-29 Thread Aneesh Kumar K.V
W.r.t. hugetlb, we support two formats for the pmd. With book3s_64 and
a 64K linux page size, we can have a pte at the pmd level. Hence we
don't need to support hugepd there. For everything else hugepd is
supported and pmd_huge() is 0.
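
To summarise the two cases this patch separates (the real definitions
are in the hunks below):

	/* book3s_64 + 64K pages: a pmd entry can itself be a leaf (huge) pte */
	static inline int pmd_huge(pmd_t pmd)
	{
		/* leaf pte for huge page, bottom two bits != 00 */
		return ((pmd_val(pmd) & 0x3) != 0x0);
	}

	/* everything else: huge pages go through hugepd directories */
	static inline int hugepd_ok(hugepd_t hpd)
	{
		return (hpd.pd > 0);	/* nohash variant */
	}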

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 31 
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 40 
 arch/powerpc/include/asm/nohash/pgtable.h | 25 +
 arch/powerpc/include/asm/page.h   | 27 ++
 arch/powerpc/mm/hugetlbpage.c | 53 ---
 5 files changed, 100 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 75e8b9326e4b..b4d25529d179 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -93,6 +93,37 @@ extern struct page *pgd_page(pgd_t pgd);
 #define remap_4k_pfn(vma, addr, pfn, prot) \
remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, (prot))
 
+#ifdef CONFIG_HUGETLB_PAGE
+/*
+ * For 4k page size, we support explicit hugepage via hugepd
+ */
+static inline int pmd_huge(pmd_t pmd)
+{
+   return 0;
+}
+
+static inline int pud_huge(pud_t pud)
+{
+   return 0;
+}
+
+static inline int pgd_huge(pgd_t pgd)
+{
+   return 0;
+}
+#define pgd_huge pgd_huge
+
+static inline int hugepd_ok(hugepd_t hpd)
+{
+   /*
+* hugepd pointer, bottom two bits == 00 and next 4 bits
+* indicate size of table
+*/
+   return (((hpd.pd & 0x3) == 0x0) && ((hpd.pd & HUGEPD_SHIFT_MASK) != 0));
+}
+#define is_hugepd(hpd) (hugepd_ok(hpd))
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_4K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index f46fbd6cd837..0869e5fe5d08 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -119,6 +119,46 @@ static inline bool __rpte_sub_valid(real_pte_t rpte, 
unsigned long index)
 #define pgd_pte(pgd)   (pud_pte(((pud_t){ pgd })))
 #define pte_pgd(pte)   ((pgd_t)pte_pud(pte))
 
+#ifdef CONFIG_HUGETLB_PAGE
+/*
+ * We have PGD_INDEX_SIZE = 12 and PTE_INDEX_SIZE = 8, so that we can have
+ * 16GB hugepage pte in PGD and 16MB hugepage pte at PMD;
+ *
+ * Defined in such a way that we can optimize away code block at build time
+ * if CONFIG_HUGETLB_PAGE=n.
+ */
+static inline int pmd_huge(pmd_t pmd)
+{
+   /*
+* leaf pte for huge page, bottom two bits != 00
+*/
+   return ((pmd_val(pmd) & 0x3) != 0x0);
+}
+
+static inline int pud_huge(pud_t pud)
+{
+   /*
+* leaf pte for huge page, bottom two bits != 00
+*/
+   return ((pud_val(pud) & 0x3) != 0x0);
+}
+
+static inline int pgd_huge(pgd_t pgd)
+{
+   /*
+* leaf pte for huge page, bottom two bits != 00
+*/
+   return ((pgd_val(pgd) & 0x3) != 0x0);
+}
+#define pgd_huge pgd_huge
+
+static inline int hugepd_ok(hugepd_t hpd)
+{
+   return 0;
+}
+#define is_hugepd(pdep)0
+#endif /* CONFIG_HUGETLB_PAGE */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_BOOK3S_64_HASH_64K_H */
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index c0c41a2409d2..1263c22d60d8 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -223,5 +223,30 @@ extern pgprot_t phys_mem_access_prot(struct file *file, 
unsigned long pfn,
 unsigned long size, pgprot_t vma_prot);
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 
+#ifdef CONFIG_HUGETLB_PAGE
+static inline int hugepd_ok(hugepd_t hpd)
+{
+   return (hpd.pd > 0);
+}
+
+static inline int pmd_huge(pmd_t pmd)
+{
+   return 0;
+}
+
+static inline int pud_huge(pud_t pud)
+{
+   return 0;
+}
+
+static inline int pgd_huge(pgd_t pgd)
+{
+   return 0;
+}
+#define pgd_huge   pgd_huge
+
+#define is_hugepd(hpd) (hugepd_ok(hpd))
+#endif
+
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 83a4cc5fc306..31835988c12a 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -384,30 +384,11 @@ typedef unsigned long pgprot_t;
 
 typedef struct { signed long pd; } hugepd_t;
 
-#ifdef CONFIG_HUGETLB_PAGE
-#ifdef CONFIG_PPC_BOOK3S_64
-static inline int hugepd_ok(hugepd_t hpd)
-{
-   /*
-* hugepd pointer, bottom two bits == 00 and next 4 bits
-* indicate size of table
-*/
-   return (((hpd.pd & 0x3) == 0x0) && ((hpd.pd & HUGEPD_SHIFT_MASK) != 0));
-}
-#else
-static inline int hugepd_ok(hugepd_t hpd)
-{
-   return (hpd.pd > 0);
-}
-#endif
-
-#define is_hugepd(hpd)   (hugepd_ok(hpd))

[PATCH 2/3] cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API

2015-09-29 Thread Andrew Donnellan
At present, ctx->irq_bitmap is freed in afu_release_irqs(), which is called
from afu_release() via cxl_context_detach().

Move the freeing of ctx->irq_bitmap from afu_release_irqs() to
reclaim_ctx() (called through cxl_context_free()) so it's freed when
releasing a context via the kernel API (cxl_release_context()) or the
userspace API (afu_release()).
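
Both release paths now converge on reclaim_ctx() (flow sketch based on
the description above; the intermediate calls are simplified):

	/*
	 * kernel API:    cxl_release_context()
	 *                  -> cxl_context_free() -> reclaim_ctx()
	 * userspace API: afu_release() -> cxl_context_detach(),
	 *                  ... -> cxl_context_free() -> reclaim_ctx()
	 *
	 * reclaim_ctx() frees ctx->irq_bitmap in both cases.
	 */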

Reported-by: Matthew R. Ochs 
Fixes: 6f7f0b3df6d4 ("cxl: Add AFU virtual PHB and kernel API")
Signed-off-by: Andrew Donnellan 
---
 drivers/misc/cxl/context.c | 3 +++
 drivers/misc/cxl/irq.c | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index e762f85..2faa127 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -275,6 +275,9 @@ static void reclaim_ctx(struct rcu_head *rcu)
if (ctx->kernelapi)
kfree(ctx->mapping);
 
+   if (ctx->irq_bitmap)
+   kfree(ctx->irq_bitmap);
+
kfree(ctx);
 }
 
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 38b57d6..09a4060 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -524,7 +524,5 @@ void afu_release_irqs(struct cxl_context *ctx, void *cookie)
afu_irq_name_free(ctx);
cxl_release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
 
-   kfree(ctx->irq_bitmap);
-   ctx->irq_bitmap = NULL;
ctx->irq_count = 0;
 }
-- 
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)IBM Australia Limited

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 00/31] powerpc/mm: Update page table format for book3s 64

2015-09-29 Thread Aneesh Kumar K.V
Hi All,

This patch series attempts to update the book3s 64 linux page table format
to make it more flexible. Our current pte format is very restrictive and we
overload multiple pte bits. This is due to the non-availability of free bits
in pte_t: we currently use pte_t to track the validity of 4K subpages. This
patch series frees up 11 bits in pte_t by moving the 4K subpage tracking to
the lower half of the PTE page. The pte format is updated such that we have
a better method for identifying a pte entry at the pmd level. This will also
enable us to implement hugetlb migration (not yet done in this series).
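
To make that concrete, the idea is roughly the following (illustrative
sketch only; the exact helpers are introduced by the __real_pte patches
later in the series, and the names here are simplified):

	/* subpage hash slot details live next to, not inside, the pte */
	struct ex_real_pte {
		pte_t pte;
		unsigned long hidx;	/* per-4K-subpage hash slot bits */
	};

	static inline struct ex_real_pte ex_real_pte(pte_t pte, pte_t *ptep)
	{
		struct ex_real_pte rpte;

		rpte.pte = pte;
		/* slot info is read from the second half of the PTE page */
		rpte.hidx = pte_val(*(ptep + PTRS_PER_PTE));
		return rpte;
	}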

Before making the changes to the pte format, I am splitting the
pte header definition such that we now have the below layout for headers

book3s
  32
    hash.h pgtable.h
  64
    hash.h pgtable.h hash-4k.h hash-64k.h
booke
  32
    pgtable.h pte-40x.h pte-44x.h pte-8xx.h pte-fsl-booke.h
  64
    pgtable-4k.h pgtable-64k.h pgtable.h

I have done the header split such that the booke headers are modified as
little as possible, so as to avoid causing breakage in booke.

The patch series can also be found at
https://github.com/kvaneesh/linux.git book3s-pte-format 
https://github.com/kvaneesh/linux/commits/book3s-pte-format

Performance numbers with and without patch series.

Path length __hash_page_4k
with patch: 196
without patch: 142

Path length __hash_page_64k
with patch: 219
without patch: 154

But even with a path length increase of around 50 instructions, we don't
see the impact when running workloads. I tried the kernel build test.

With THP enabled (which is the default) we see an improvement. I haven't
fully looked at the reason. This could be due to reduced contention on the
ptl lock; __hash_page_thp is already C code.

make -j64 vmlinux modules 
With fix:
-
real1m35.509s
user56m8.565s
sys 4m34.973s

real1m32.174s
user57m2.336s
sys 4m39.142s

Without fix:
---
real1m37.703s
user58m50.783s
sys 7m52.440s

real1m37.890s
user57m55.445s
sys 7m50.501s

THP disabled:

make -j64 vmlinux modules 
With fix:
-
real1m37.197s
user58m28.672s
sys 7m58.188s

real1m44.638s
user58m37.551s
sys 7m53.960s

Without fix:

real1m41.224s
user58m46.944s
sys 7m49.714s

real1m42.585s
user59m14.019s
sys 7m52.714s

Changes from V1:
1) Build fix with STRICT_MM_TYPECHECKS enabled
2) pte_mkwrite fix for nohash
3) rebase to latest linus tree.

Aneesh Kumar K.V (31):
  powerpc/mm: move pte headers to book3s directory
  powerpc/mm: move pte headers to book3s directory (part 2)
  powerpc/mm: make a separate copy for book3s
  powerpc/mm: make a separate copy for book3s (part 2)
  powerpc/mm: Move hash specific pte width and other defines to book3s
  powerpc/mm: Delete booke bits from book3s
  powerpc/mm: Don't have generic headers introduce functions touching
pte bits
  powerpc/mm: Drop pte-common.h from BOOK3S 64
  powerpc/mm: Don't use pte_val as lvalue
  powerpc/mm: Don't use pmd_val,pud_val and pgd_val as lvalue
  powerpc/mm: Move hash64 PTE bits from book3s/64/pgtable.h to hash.h
  powerpc/mm: Move PTE bits from generic functions to hash64 functions.
  powerpc/booke: Move booke headers (part 1)
  powerpc/booke: Move booke headers (part 2)
  powerpc/booke: Move booke headers (part 3)
  powerpc/booke: Move booke headers (part 4)
  powerpc/booke: Move booke headers (part 5)
  powerpc/mm: Increase the pte frag size.
  powerpc/mm: Convert 4k hash insert to C
  powerpc/mm: update __real_pte to take address as argument
  powerpc/mm: make pte page hash index slot 8 bits
  powerpc/mm: Don't track subpage valid bit in pte_t
  powerpc/mm: Increase the width of #define
  powerpc/mm: Convert __hash_page_64K to C
  powerpc/mm: Convert 4k insert from asm to C
  powerpc/mm: Remove the dependency on pte bit position in asm code
  powerpc/mm: Add helper for converting pte bit to hpte bits
  powerpc/mm: Move WIMG update to helper.
  powerpc/mm: Move hugetlb related headers
  powerpc/mm: Move THP headers around
  powerpc/mm: Add a _PAGE_PTE bit

 .../include/asm/{pte-hash32.h => book3s/32/hash.h} |6 +-
 arch/powerpc/include/asm/book3s/32/pgtable.h   |  476 ++
 arch/powerpc/include/asm/book3s/64/hash-4k.h   |  132 +++
 arch/powerpc/include/asm/book3s/64/hash-64k.h  |  285 ++
 arch/powerpc/include/asm/book3s/64/hash.h  |  513 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h   |  266 ++
 arch/powerpc/include/asm/book3s/pgtable.h  |   29 +
 arch/powerpc/include/asm/mmu-hash64.h  |2 +-
 .../asm/{pgtable-ppc32.h => nohash/32/pgtable.h}   |   25 +-
 arch/powerpc/include/asm/{ => nohash/32}/pte-40x.h |6 +-
 arch/powerpc/include/asm/{ => nohash/32}/pte-44x.h |6 +-
 arch/powerpc/include/asm/{ => nohash/32}/pte-8xx.h |6 +-
 .../include/asm/{ => nohash/32}/pte-fsl-booke.h|6 +-
 .../{pgtable-ppc64-4k.h => nohash/64/pgtable-4k.h} |   12 +-
 .../64/pgtable-64k.h}  

[PATCH V2 16/31] powerpc/booke: Move booke headers (part 4)

2015-09-29 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 16 
 arch/powerpc/include/asm/{ => nohash/32}/pte-40x.h   |  0
 arch/powerpc/include/asm/{ => nohash/32}/pte-44x.h   |  0
 arch/powerpc/include/asm/{ => nohash/32}/pte-8xx.h   |  0
 arch/powerpc/include/asm/{ => nohash/32}/pte-fsl-booke.h |  0
 arch/powerpc/include/asm/nohash/64/pgtable.h |  2 +-
 arch/powerpc/include/asm/{ => nohash}/pte-book3e.h   |  0
 7 files changed, 9 insertions(+), 9 deletions(-)
 rename arch/powerpc/include/asm/{ => nohash/32}/pte-40x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/pte-44x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/pte-8xx.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/pte-fsl-booke.h (100%)
 rename arch/powerpc/include/asm/{ => nohash}/pte-book3e.h (100%)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index fbb23c54b998..c82cbf52d19e 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGTABLE_PPC32_H
-#define _ASM_POWERPC_PGTABLE_PPC32_H
+#ifndef _ASM_POWERPC_NOHASH_32_PGTABLE_H
+#define _ASM_POWERPC_NOHASH_32_PGTABLE_H
 
 #include <asm-generic/pgtable-nopmd.h>
 
@@ -106,15 +106,15 @@ extern int icache_44x_need_flush;
  */
 
 #if defined(CONFIG_40x)
-#include <asm/pte-40x.h>
+#include <asm/nohash/32/pte-40x.h>
 #elif defined(CONFIG_44x)
-#include <asm/pte-44x.h>
+#include <asm/nohash/32/pte-44x.h>
 #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
-#include <asm/pte-book3e.h>
+#include <asm/nohash/pte-book3e.h>
 #elif defined(CONFIG_FSL_BOOKE)
-#include <asm/pte-fsl-booke.h>
+#include <asm/nohash/32/pte-fsl-booke.h>
 #elif defined(CONFIG_8xx)
-#include <asm/pte-8xx.h>
+#include <asm/nohash/32/pte-8xx.h>
 #endif
 
 /* And here we include common definitions */
@@ -340,4 +340,4 @@ extern int get_pteptr(struct mm_struct *mm, unsigned long 
addr, pte_t **ptep,
 
 #endif /* !__ASSEMBLY__ */
 
-#endif /* _ASM_POWERPC_PGTABLE_PPC32_H */
+#endif /* __ASM_POWERPC_NOHASH_32_PGTABLE_H */
diff --git a/arch/powerpc/include/asm/pte-40x.h 
b/arch/powerpc/include/asm/nohash/32/pte-40x.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-40x.h
rename to arch/powerpc/include/asm/nohash/32/pte-40x.h
diff --git a/arch/powerpc/include/asm/pte-44x.h 
b/arch/powerpc/include/asm/nohash/32/pte-44x.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-44x.h
rename to arch/powerpc/include/asm/nohash/32/pte-44x.h
diff --git a/arch/powerpc/include/asm/pte-8xx.h 
b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-8xx.h
rename to arch/powerpc/include/asm/nohash/32/pte-8xx.h
diff --git a/arch/powerpc/include/asm/pte-fsl-booke.h 
b/arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-fsl-booke.h
rename to arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 1b582e56b60c..c33aa32ffba5 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -97,7 +97,7 @@
 /*
  * Include the PTE bits definitions
  */
-#include <asm/pte-book3e.h>
+#include <asm/nohash/pte-book3e.h>
 #include <asm/pte-common.h>
 
 #ifdef CONFIG_PPC_MM_SLICES
diff --git a/arch/powerpc/include/asm/pte-book3e.h 
b/arch/powerpc/include/asm/nohash/pte-book3e.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-book3e.h
rename to arch/powerpc/include/asm/nohash/pte-book3e.h
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 15/31] powerpc/booke: Move booke headers (part 3)

2015-09-29 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 .../include/asm/{pgtable-ppc64-4k.h => nohash/64/pgtable-4k.h} |  0
 .../asm/{pgtable-ppc64-64k.h => nohash/64/pgtable-64k.h}   |  0
 arch/powerpc/include/asm/nohash/64/pgtable.h   | 10 +-
 3 files changed, 5 insertions(+), 5 deletions(-)
 rename arch/powerpc/include/asm/{pgtable-ppc64-4k.h => nohash/64/pgtable-4k.h} 
(100%)
 rename arch/powerpc/include/asm/{pgtable-ppc64-64k.h => 
nohash/64/pgtable-64k.h} (100%)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64-4k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-4k.h
similarity index 100%
rename from arch/powerpc/include/asm/pgtable-ppc64-4k.h
rename to arch/powerpc/include/asm/nohash/64/pgtable-4k.h
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
similarity index 100%
rename from arch/powerpc/include/asm/pgtable-ppc64-64k.h
rename to arch/powerpc/include/asm/nohash/64/pgtable-64k.h
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 8f6195f147a4..1b582e56b60c 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -1,14 +1,14 @@
-#ifndef _ASM_POWERPC_PGTABLE_PPC64_H_
-#define _ASM_POWERPC_PGTABLE_PPC64_H_
+#ifndef _ASM_POWERPC_NOHASH_64_PGTABLE_H
+#define _ASM_POWERPC_NOHASH_64_PGTABLE_H
 /*
  * This file contains the functions and defines necessary to modify and use
  * the ppc64 hashed page table.
  */
 
 #ifdef CONFIG_PPC_64K_PAGES
-#include <asm/pgtable-ppc64-64k.h>
+#include <asm/nohash/64/pgtable-64k.h>
 #else
-#include <asm/pgtable-ppc64-4k.h>
+#include <asm/nohash/64/pgtable-4k.h>
 #endif
 #include <asm/barrier.h>
 
@@ -629,4 +629,4 @@ static inline int pmd_move_must_withdraw(struct spinlock 
*new_pmd_ptl,
return true;
 }
 #endif /* __ASSEMBLY__ */
-#endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
+#endif /* _ASM_POWERPC_NOHASH_64_PGTABLE_H */
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 22/31] powerpc/mm: Don't track subpage valid bit in pte_t

2015-09-29 Thread Aneesh Kumar K.V
This frees up 11 bits in pte_t. In a later patch we also change the
pte_t format so that we can start supporting migration ptes at the
pmd level.
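
As a sanity check on the "11 bits" claim (reviewer's arithmetic based
on the hunks below, not text from the patch):

	/*
	 * before: _PAGE_HPTE_SUB  0x0ffff000  bits 12-27 (16 subpage bits)
	 *         _PAGE_COMBO     0x10000000  bit  28
	 *         _PAGE_4K_PFN    0x20000000  bit  29
	 * after:  _PAGE_F_GIX     0x00007000  bits 12-14
	 *         _PAGE_F_SECOND  0x00008000  bit  15
	 *         _PAGE_SPECIAL   0x00010000  bit  16
	 *         _PAGE_COMBO     0x00020000  bit  17
	 *         _PAGE_4K_PFN    0x00040000  bit  18
	 *
	 * => bits 19-29, i.e. 11 bits, become free in pte_t.
	 */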

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 10 +
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 29 ++-
 arch/powerpc/include/asm/book3s/64/hash.h |  4 
 arch/powerpc/mm/hash64_64k.c  |  4 ++--
 arch/powerpc/mm/hash_low_64.S |  6 +-
 arch/powerpc/mm/hugetlbpage-hash64.c  |  5 +
 arch/powerpc/mm/pgtable_64.c  |  2 +-
 7 files changed, 12 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 537eacecf6e9..75e8b9326e4b 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -47,17 +47,9 @@
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS0
 
-/* PTE bits */
-#define _PAGE_HASHPTE  0x0400 /* software: pte has an associated HPTE */
-#define _PAGE_SECONDARY 0x8000 /* software: HPTE is in secondary group */
-#define _PAGE_GROUP_IX  0x7000 /* software: HPTE index within group */
-#define _PAGE_F_SECOND  _PAGE_SECONDARY
-#define _PAGE_F_GIX _PAGE_GROUP_IX
-#define _PAGE_SPECIAL  0x1 /* software: special page */
-
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | \
-_PAGE_SECONDARY | _PAGE_GROUP_IX)
+_PAGE_F_SECOND | _PAGE_F_GIX)
 
 /* shift to put page number into pte */
 #define PTE_RPN_SHIFT  (17)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index dafc2f31c843..b363d73ca225 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -31,33 +31,8 @@
 /* Bits to mask out from a PGD/PUD to get to the PMD page */
 #define PUD_MASKED_BITS0x1ff
 
-/* Additional PTE bits (don't change without checking asm in hash_low.S) */
-#define _PAGE_SPECIAL  0x00000400 /* software: special page */
-#define _PAGE_HPTE_SUB 0x0ffff000 /* combo only: sub pages HPTE bits */
-#define _PAGE_HPTE_SUB00x08000000 /* combo only: first sub page */
-#define _PAGE_COMBO    0x10000000 /* this is a combo 4k page */
-#define _PAGE_4K_PFN   0x20000000 /* PFN is for a single 4k page */
-
-/* For 64K page, we don't have a separate _PAGE_HASHPTE bit. Instead,
- * we set that to be the whole sub-bits mask. The C code will only
- * test this, so a multi-bit mask will work. For combo pages, this
- * is equivalent as effectively, the old _PAGE_HASHPTE was an OR of
- * all the sub bits. For real 64k pages, we now have the assembly set
- * _PAGE_HPTE_SUB0 in addition to setting the HIDX bits which overlap
- * that mask. This is fine as long as the HIDX bits are never set on
- * a PTE that isn't hashed, which is the case today.
- *
- * A little nit is for the huge page C code, which does the hashing
- * in C, we need to provide which bit to use.
- */
-#define _PAGE_HASHPTE  _PAGE_HPTE_SUB
-
-/* Note the full page bits must be in the same location as for normal
- * 4k pages as the same assembly will be used to insert 64K pages
- * whether the kernel has CONFIG_PPC_64K_PAGES or not
- */
-#define _PAGE_F_SECOND  0x00008000 /* full page: hidx bits */
-#define _PAGE_F_GIX     0x00007000 /* full page: hidx bits */
+#define _PAGE_COMBO    0x00020000 /* this is a combo 4k page */
+#define _PAGE_4K_PFN   0x00040000 /* PFN is for a single 4k page */
 
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_COMBO)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 32a1f94201d0..f377757d2cbf 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -83,7 +83,11 @@
 #define _PAGE_DIRTY0x0080 /* C: page changed */
 #define _PAGE_ACCESSED 0x0100 /* R: page referenced */
 #define _PAGE_RW   0x0200 /* software: user write access allowed */
+#define _PAGE_HASHPTE  0x0400 /* software: pte has an associated HPTE */
 #define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */
+#define _PAGE_F_GIX0x7000 /* full page: hidx bits */
+#define _PAGE_F_SECOND 0x8000 /* Whether to use secondary hash or not */
+#define _PAGE_SPECIAL  0x10000 /* software: special page */
 
 /* No separate kernel read-only */
 #define _PAGE_KERNEL_RW(_PAGE_RW | _PAGE_DIRTY) /* user access blocked by key */
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 423f47a89299..2beead9c760e 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ 

[PATCH V2 23/31] powerpc/mm: Increase the width of #define

2015-09-29 Thread Aneesh Kumar K.V
No real change, only style changes

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index f377757d2cbf..f5b57d1c00dc 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -71,22 +71,22 @@
  * We could create separate kernel read-only if we used the 3 PP bits
  * combinations that newer processors provide but we currently don't.
  */
-#define _PAGE_PRESENT  0x0001 /* software: pte contains a translation */
-#define _PAGE_USER 0x0002 /* matches one of the PP bits */
+#define _PAGE_PRESENT  0x00001 /* software: pte contains a translation */
+#define _PAGE_USER 0x00002 /* matches one of the PP bits */
 #define _PAGE_BIT_SWAP_TYPE    2
-#define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we invert) */
-#define _PAGE_GUARDED  0x0008
+#define _PAGE_EXEC 0x00004 /* No execute on POWER4 and newer (we invert) */
+#define _PAGE_GUARDED  0x00008
 /* We can derive Memory coherence from _PAGE_NO_CACHE */
 #define _PAGE_COHERENT 0x0
-#define _PAGE_NO_CACHE 0x0020 /* I: cache inhibit */
-#define _PAGE_WRITETHRU0x0040 /* W: cache write-through */
-#define _PAGE_DIRTY0x0080 /* C: page changed */
-#define _PAGE_ACCESSED 0x0100 /* R: page referenced */
-#define _PAGE_RW   0x0200 /* software: user write access allowed */
-#define _PAGE_HASHPTE  0x0400 /* software: pte has an associated HPTE */
-#define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */
-#define _PAGE_F_GIX0x7000 /* full page: hidx bits */
-#define _PAGE_F_SECOND 0x8000 /* Whether to use secondary hash or not */
+#define _PAGE_NO_CACHE 0x00020 /* I: cache inhibit */
+#define _PAGE_WRITETHRU0x00040 /* W: cache write-through */
+#define _PAGE_DIRTY0x00080 /* C: page changed */
+#define _PAGE_ACCESSED 0x00100 /* R: page referenced */
+#define _PAGE_RW   0x00200 /* software: user write access allowed */
+#define _PAGE_HASHPTE  0x00400 /* software: pte has an associated HPTE */
+#define _PAGE_BUSY 0x00800 /* software: PTE & hash are busy */
+#define _PAGE_F_GIX0x07000 /* full page: hidx bits */
+#define _PAGE_F_SECOND 0x08000 /* Whether to use secondary hash or not */
 #define _PAGE_SPECIAL  0x10000 /* software: special page */
 
 /* No separate kernel read-only */
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 30/31] powerpc/mm: Move THP headers around

2015-09-29 Thread Aneesh Kumar K.V
We support THP only with book3s_64 and 64K page size. Move the
THP details to hash-64k.h to make this explicit.
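
As a worked example of the hpte slot array encoding that moves with
this patch (illustration only; each subpage byte is laid out as
[ 1 bit secondary | 3 bit hidx | 1 bit valid | 000 ], see the helpers
in the hash-64k.h hunk below):

	unsigned char hpte_slot_array[4096];	/* fits in a 4K pgtable_t */

	mark_hpte_slot_valid(hpte_slot_array, 7, 5); /* byte 7 = 5 << 4 | 1 << 3 = 0x58 */
	hpte_valid(hpte_slot_array, 7);		/* == 1 */
	hpte_hash_index(hpte_slot_array, 7);	/* == 5 */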

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 126 +
 arch/powerpc/include/asm/book3s/64/hash.h | 223 ++-
 arch/powerpc/include/asm/nohash/64/pgtable.h  | 245 +-
 arch/powerpc/mm/hash_native_64.c  |  10 ++
 arch/powerpc/mm/pgtable_64.c  |   2 +-
 arch/powerpc/platforms/pseries/lpar.c |  10 ++
 6 files changed, 201 insertions(+), 415 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0869e5fe5d08..3ea7cd4704b9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -159,6 +159,132 @@ static inline int hugepd_ok(hugepd_t hpd)
 #define is_hugepd(pdep)0
 #endif /* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern unsigned long pmd_hugepage_update(struct mm_struct *mm,
+unsigned long addr,
+pmd_t *pmdp,
+unsigned long clr,
+unsigned long set);
+static inline char *get_hpte_slot_array(pmd_t *pmdp)
+{
+   /*
+* The hpte hindex is stored in the pgtable whose address is in the
+* second half of the PMD
+*
+* Order this load with the test for pmd_trans_huge in the caller
+*/
+   smp_rmb();
+   return *(char **)(pmdp + PTRS_PER_PMD);
+
+
+}
+/*
+ * The linux hugepage PMD now include the pmd entries followed by the address
+ * to the stashed pgtable_t. The stashed pgtable_t contains the hpte bits.
+ * [ 1 bit secondary | 3 bit hidx | 1 bit valid | 000]. We use one byte per
+ * each HPTE entry. With 16MB hugepage and 64K HPTE we need 256 entries and
+ * with 4K HPTE we need 4096 entries. Both will fit in a 4K pgtable_t.
+ *
+ * The last three bits are intentionally left to zero. This memory location
+ * are also used as normal page PTE pointers. So if we have any pointers
+ * left around while we collapse a hugepage, we need to make sure
+ * _PAGE_PRESENT bit of that is zero when we look at them
+ */
+static inline unsigned int hpte_valid(unsigned char *hpte_slot_array, int index)
+{
+   return (hpte_slot_array[index] >> 3) & 0x1;
+}
+
+static inline unsigned int hpte_hash_index(unsigned char *hpte_slot_array,
+  int index)
+{
+   return hpte_slot_array[index] >> 4;
+}
+
+static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
+   unsigned int index, unsigned int hidx)
+{
+   hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
+}
+
+/*
+ *
+ * For core kernel code by design pmd_trans_huge is never run on any hugetlbfs
+ * page. The hugetlbfs page table walking and mangling paths are totally
+ * separated from the core VM paths and they're differentiated by
+ *  VM_HUGETLB being set on vm_flags well before any pmd_trans_huge could run.
+ *
+ * pmd_trans_huge() is defined as false at build time if
+ * CONFIG_TRANSPARENT_HUGEPAGE=n to optimize away code blocks at build
+ * time in such case.
+ *
+ * For ppc64 we need to differentiate explicit hugepages from THP, because
+ * for THP we also track the subpage details at the pmd level. We don't do
+ * that for explicit huge pages.
+ *
+ */
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+   /*
+* leaf pte for huge page, bottom two bits != 00
+*/
+   return (pmd_val(pmd) & 0x3) && (pmd_val(pmd) & _PAGE_THP_HUGE);
+}
+
+static inline int pmd_trans_splitting(pmd_t pmd)
+{
+   if (pmd_trans_huge(pmd))
+   return pmd_val(pmd) & _PAGE_SPLITTING;
+   return 0;
+}
+
+static inline int pmd_large(pmd_t pmd)
+{
+   /*
+* leaf pte for huge page, bottom two bits != 00
+*/
+   return ((pmd_val(pmd) & 0x3) != 0x0);
+}
+
+static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+{
+   return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
+}
+
+static inline pmd_t pmd_mksplitting(pmd_t pmd)
+{
+   return __pmd(pmd_val(pmd) | _PAGE_SPLITTING);
+}
+
+#define __HAVE_ARCH_PMD_SAME
+static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
+{
+   return (((pmd_val(pmd_a) ^ pmd_val(pmd_b)) & ~_PAGE_HPTEFLAGS) == 0);
+}
+
+static inline int __pmdp_test_and_clear_young(struct mm_struct *mm,
+ unsigned long addr, pmd_t *pmdp)
+{
+   unsigned long old;
+
+   if ((pmd_val(*pmdp) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
+   return 0;
+   old = pmd_hugepage_update(mm, addr, pmdp, _PAGE_ACCESSED, 0);
+   return ((old & _PAGE_ACCESSED) != 0);
+}
+
+#define __HAVE_ARCH_PMDP_SET_WRPROTECT
+static inline void 

[PATCH V2 31/31] powerpc/mm: Add a _PAGE_PTE bit

2015-09-29 Thread Aneesh Kumar K.V
For a pte entry we will have _PAGE_PTE set. Our pte page addresses have
a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1, which leaves
the lower 7 bits free; we use them to indicate a hugepd. i.e.

For pmd and pgd we can find:
1) _PAGE_PTE set -> a leaf pte entry
2) bits [2..6] non-zero -> a hugepd pointer.
   They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) otherwise -> a pointer to the next table.
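
A rough sketch of the resulting decode (illustrative only; the enum and
function names below are made up, not part of the series):

enum entry_kind { ENTRY_LEAF_PTE, ENTRY_HUGEPD, ENTRY_NEXT_TABLE };

static inline enum entry_kind classify_entry(unsigned long val)
{
	if (val & _PAGE_PTE)		/* case 1: leaf pte */
		return ENTRY_LEAF_PTE;
	if (val & HUGEPD_SHIFT_MASK)	/* case 2: hugepd, size encoded in the low bits */
		return ENTRY_HUGEPD;
	return ENTRY_NEXT_TABLE;	/* case 3: pointer to next table */
}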

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  9 ++---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 23 +--
 arch/powerpc/include/asm/book3s/64/hash.h | 13 +++--
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  3 +--
 arch/powerpc/include/asm/pte-common.h |  5 +
 arch/powerpc/mm/hugetlbpage.c |  4 ++--
 arch/powerpc/mm/pgtable.c |  4 
 arch/powerpc/mm/pgtable_64.c  |  7 +--
 8 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index b4d25529d179..e59832c94609 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -116,10 +116,13 @@ static inline int pgd_huge(pgd_t pgd)
 static inline int hugepd_ok(hugepd_t hpd)
 {
/*
-* hugepd pointer, bottom two bits == 00 and next 4 bits
-* indicate size of table
+* if it is not a pte and has the hugepd shift mask
+* set, then it is a hugepd directory pointer
 */
-   return (((hpd.pd & 0x3) == 0x0) && ((hpd.pd & HUGEPD_SHIFT_MASK) != 0));
+   if (!(hpd.pd & _PAGE_PTE) &&
+   ((hpd.pd & HUGEPD_SHIFT_MASK) != 0))
+   return true;
+   return false;
 }
 #define is_hugepd(hpd) (hugepd_ok(hpd))
 #endif
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 3ea7cd4704b9..741e07504745 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -130,25 +130,25 @@ static inline bool __rpte_sub_valid(real_pte_t rpte, 
unsigned long index)
 static inline int pmd_huge(pmd_t pmd)
 {
/*
-* leaf pte for huge page, bottom two bits != 00
+* leaf pte for huge page
 */
-   return ((pmd_val(pmd) & 0x3) != 0x0);
+   return !!(pmd_val(pmd) & _PAGE_PTE);
 }
 
 static inline int pud_huge(pud_t pud)
 {
/*
-* leaf pte for huge page, bottom two bits != 00
+* leaf pte for huge page
 */
-   return ((pud_val(pud) & 0x3) != 0x0);
+   return !!(pud_val(pud) & _PAGE_PTE);
 }
 
 static inline int pgd_huge(pgd_t pgd)
 {
/*
-* leaf pte for huge page, bottom two bits != 00
+* leaf pte for huge page
 */
-   return ((pgd_val(pgd) & 0x3) != 0x0);
+   return !!(pgd_val(pgd) & _PAGE_PTE);
 }
 #define pgd_huge pgd_huge
 
@@ -225,10 +225,8 @@ static inline void mark_hpte_slot_valid(unsigned char 
*hpte_slot_array,
  */
 static inline int pmd_trans_huge(pmd_t pmd)
 {
-   /*
-* leaf pte for huge page, bottom two bits != 00
-*/
-   return (pmd_val(pmd) & 0x3) && (pmd_val(pmd) & _PAGE_THP_HUGE);
+   return !!((pmd_val(pmd) & (_PAGE_PTE | _PAGE_THP_HUGE)) ==
+ (_PAGE_PTE | _PAGE_THP_HUGE));
 }
 
 static inline int pmd_trans_splitting(pmd_t pmd)
@@ -240,10 +238,7 @@ static inline int pmd_trans_splitting(pmd_t pmd)
 
 static inline int pmd_large(pmd_t pmd)
 {
-   /*
-* leaf pte for huge page, bottom two bits != 00
-*/
-   return ((pmd_val(pmd) & 0x3) != 0x0);
+   return !!(pmd_val(pmd) & _PAGE_PTE);
 }
 
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index d7c5c96b0faa..5c315cfade0e 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -14,11 +14,12 @@
  * We could create separate kernel read-only if we used the 3 PP bits
  * combinations that newer processors provide but we currently don't.
  */
-#define _PAGE_PRESENT  0x1 /* software: pte contains a translation */
-#define _PAGE_USER 0x2 /* matches one of the PP bits */
+#define _PAGE_PTE  0x1
+#define _PAGE_PRESENT  0x2 /* software: pte contains a translation */
+#define _PAGE_USER 0x4 /* matches one of the PP bits */
 #define _PAGE_BIT_SWAP_TYPE	2
-#define _PAGE_EXEC 0x4 /* No execute on POWER4 and newer (we invert) */
-#define _PAGE_GUARDED  0x8
+#define _PAGE_EXEC 0x8 /* No execute on POWER4 and newer (we invert) */
+#define _PAGE_GUARDED  0x00010
 /* We can derive Memory coherence from _PAGE_NO_CACHE */
 #define _PAGE_COHERENT 0x0
 #define _PAGE_NO_CACHE 0x00020 /* I: 

[PATCH 3/3] cxl: fix leak of ctx->mapping when releasing kernel API contexts

2015-09-29 Thread Andrew Donnellan
When a context is created via the kernel API, ctx->mapping is allocated
within the kernel and thus needs to be freed when the context is freed.
reclaim_ctx() attempts to do this for contexts with the ctx->kernelapi flag
set, but afu_release() (which can be called from the kernel API through
cxl_fd_release()) sets ctx->mapping to NULL before calling
cxl_context_free() to free the context.

Add a check to afu_release() so that the mappings in contexts created via
the kernel API are left alone so reclaim_ctx() can free them.
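
For context, the kernel-API free path this enables has roughly the
following shape (a sketch assuming the driver's naming, not quoted code):

static void reclaim_ctx(struct rcu_head *rcu)
{
	struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu);

	/* the mapping was allocated in-kernel, so it is freed here */
	if (ctx->kernelapi)
		kfree(ctx->mapping);
	kfree(ctx);
}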

Reported-by: Matthew R. Ochs 
Fixes: 6f7f0b3df6d4 ("cxl: Add AFU virtual PHB and kernel API")
Signed-off-by: Andrew Donnellan 
---
 drivers/misc/cxl/file.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index a30bf28..fcda6b0 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -120,9 +120,16 @@ int afu_release(struct inode *inode, struct file *file)
 __func__, ctx->pe);
cxl_context_detach(ctx);
 
-   mutex_lock(&ctx->mapping_lock);
-   ctx->mapping = NULL;
-   mutex_unlock(&ctx->mapping_lock);
+
+   /*
+    * Delete the context's mapping pointer, unless it's created by the
+    * kernel API, in which case leave it so it can be freed by
+    * reclaim_ctx()
+    */
+   if (!ctx->kernelapi) {
+   mutex_lock(&ctx->mapping_lock);
+   ctx->mapping = NULL;
+   mutex_unlock(&ctx->mapping_lock);
+   }
 
    put_device(&ctx->afu->dev);
 
-- 
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)IBM Australia Limited

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 03/31] powerpc/mm: make a separate copy for book3s

2015-09-29 Thread Aneesh Kumar K.V
In this patch we do:
cp pgtable-ppc32.h book3s/32/pgtable.h
cp pgtable-ppc64.h book3s/64/pgtable.h

This enables us to make further changes to the hash-specific config.
We will change the page table format for 64-bit hash in later patches.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 340 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 618 +++
 arch/powerpc/include/asm/book3s/pgtable.h|  10 +
 arch/powerpc/include/asm/mmu-hash64.h|   2 +-
 arch/powerpc/include/asm/pgtable-ppc32.h |   2 -
 arch/powerpc/include/asm/pgtable-ppc64.h |   4 -
 arch/powerpc/include/asm/pgtable.h   |   4 +
 7 files changed, 973 insertions(+), 7 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/32/pgtable.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pgtable.h
 create mode 100644 arch/powerpc/include/asm/book3s/pgtable.h

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
new file mode 100644
index ..1a58a05be99c
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -0,0 +1,340 @@
+#ifndef _ASM_POWERPC_PGTABLE_PPC32_H
+#define _ASM_POWERPC_PGTABLE_PPC32_H
+
+#include <asm-generic/pgtable-nopmd.h>
+
+#ifndef __ASSEMBLY__
+#include <linux/sched.h>
+#include <linux/threads.h>
+#include <asm/io.h>   /* For sub-arch specific PPC_PIN_SIZE */
+
+extern unsigned long ioremap_bot;
+
+#ifdef CONFIG_44x
+extern int icache_44x_need_flush;
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+/*
+ * The normal case is that PTEs are 32-bits and we have a 1-page
+ * 1024-entry pgdir pointing to 1-page 1024-entry PTE pages.  -- paulus
+ *
+ * For any >32-bit physical address platform, we can use the following
+ * two level page table layout where the pgdir is 8KB and the MS 13 bits
+ * are an index to the second level table.  The combined pgdir/pmd first
+ * level has 2048 entries and the second level has 512 64-bit PTE entries.
+ * -Matt
+ */
+/* PGDIR_SHIFT determines what a top-level page table entry can map */
+#define PGDIR_SHIFT(PAGE_SHIFT + PTE_SHIFT)
+#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
+#define PGDIR_MASK (~(PGDIR_SIZE-1))
+
+/*
+ * entries per page directory level: our page-table tree is two-level, so
+ * we don't really have any PMD directory.
+ */
+#ifndef __ASSEMBLY__
+#define PTE_TABLE_SIZE (sizeof(pte_t) << PTE_SHIFT)
+#define PGD_TABLE_SIZE (sizeof(pgd_t) << (32 - PGDIR_SHIFT))
+#endif /* __ASSEMBLY__ */
+
+#define PTRS_PER_PTE   (1 << PTE_SHIFT)
+#define PTRS_PER_PMD   1
+#define PTRS_PER_PGD   (1 << (32 - PGDIR_SHIFT))
+
+#define USER_PTRS_PER_PGD  (TASK_SIZE / PGDIR_SIZE)
+#define FIRST_USER_ADDRESS 0UL
+
+#define pte_ERROR(e) \
+   pr_err("%s:%d: bad pte %llx.\n", __FILE__, __LINE__, \
+   (unsigned long long)pte_val(e))
+#define pgd_ERROR(e) \
+   pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
+
+/*
+ * This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
+ * value (for now) on others, from where we can start layout kernel
+ * virtual space that goes below PKMAP and FIXMAP
+ */
+#ifdef CONFIG_HIGHMEM
+#define KVIRT_TOP  PKMAP_BASE
+#else
+#define KVIRT_TOP  (0xfe00UL)  /* for now, could be FIXMAP_BASE ? */
+#endif
+
+/*
+ * ioremap_bot starts at that address. Early ioremaps move down from there,
+ * until mem_init() at which point this becomes the top of the vmalloc
+ * and ioremap space
+ */
+#ifdef CONFIG_NOT_COHERENT_CACHE
+#define IOREMAP_TOP((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
+#else
+#define IOREMAP_TOPKVIRT_TOP
+#endif
+
+/*
+ * Just any arbitrary offset to the start of the vmalloc VM area: the
+ * current 16MB value just means that there will be a 64MB "hole" after the
+ * physical memory until the kernel virtual memory starts.  That means that
+ * any out-of-bounds memory accesses will hopefully be caught.
+ * The vmalloc() routines leaves a hole of 4kB between each vmalloced
+ * area for the same reason. ;)
+ *
+ * We no longer map larger than phys RAM with the BATs so we don't have
+ * to worry about the VMALLOC_OFFSET causing problems.  We do have to worry
+ * about clashes between our early calls to ioremap() that start growing down
+ * from ioremap_base being run into the VM area allocations (growing upwards
+ * from VMALLOC_START).  For this reason we have ioremap_bot to check when
+ * we actually run into our mappings setup in the early boot with the VM
+ * system.  This really does become a problem for machines with good amounts
+ * of RAM.  -- Cort
+ */
+#define VMALLOC_OFFSET (0x100) /* 16M */
+#ifdef PPC_PIN_SIZE
+#define VMALLOC_START (((_ALIGN((long)high_memory, PPC_PIN_SIZE) + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#else
+#define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#endif
+#define VMALLOC_ENDioremap_bot
+
+/*
+ * Bits in a linux-style PTE.  These match the bits in the
+ * 

[PATCH V2 02/31] powerpc/mm: move pte headers to book3s directory (part 2)

2015-09-29 Thread Aneesh Kumar K.V
Splitting this out so that rename detection can track changes to the
files. Before merging we will fold this.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/hash.h  |  6 +++---
 .../include/asm/{pte-hash64-4k.h => book3s/64/hash-4k.h}   |  1 -
 .../include/asm/{pte-hash64-64k.h => book3s/64/hash-64k.h} |  0
 arch/powerpc/include/asm/book3s/64/hash.h  | 10 +-
 4 files changed, 8 insertions(+), 9 deletions(-)
 rename arch/powerpc/include/asm/{pte-hash64-4k.h => book3s/64/hash-4k.h} (99%)
 rename arch/powerpc/include/asm/{pte-hash64-64k.h => book3s/64/hash-64k.h} 
(100%)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h 
b/arch/powerpc/include/asm/book3s/32/hash.h
index 62cfb0c663bb..264b754d65b0 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PTE_HASH32_H
-#define _ASM_POWERPC_PTE_HASH32_H
+#ifndef _ASM_POWERPC_BOOK3S_32_HASH_H
+#define _ASM_POWERPC_BOOK3S_32_HASH_H
 #ifdef __KERNEL__
 
 /*
@@ -43,4 +43,4 @@
 #define PTE_ATOMIC_UPDATES 1
 
 #endif /* __KERNEL__ */
-#endif /*  _ASM_POWERPC_PTE_HASH32_H */
+#endif /* _ASM_POWERPC_BOOK3S_32_HASH_H */
diff --git a/arch/powerpc/include/asm/pte-hash64-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
similarity index 99%
rename from arch/powerpc/include/asm/pte-hash64-4k.h
rename to arch/powerpc/include/asm/book3s/64/hash-4k.h
index c134e809aac3..79750fd3eeb8 100644
--- a/arch/powerpc/include/asm/pte-hash64-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -14,4 +14,3 @@
 
 /* shift to put page number into pte */
 #define PTE_RPN_SHIFT  (17)
-
diff --git a/arch/powerpc/include/asm/pte-hash64-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
similarity index 100%
rename from arch/powerpc/include/asm/pte-hash64-64k.h
rename to arch/powerpc/include/asm/book3s/64/hash-64k.h
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index ef612c160da7..8e60d4fa434d 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PTE_HASH64_H
-#define _ASM_POWERPC_PTE_HASH64_H
+#ifndef _ASM_POWERPC_BOOK3S_64_HASH_H
+#define _ASM_POWERPC_BOOK3S_64_HASH_H
 #ifdef __KERNEL__
 
 /*
@@ -45,10 +45,10 @@
 #define PTE_ATOMIC_UPDATES 1
 
 #ifdef CONFIG_PPC_64K_PAGES
-#include <asm/pte-hash64-64k.h>
+#include <asm/book3s/64/hash-64k.h>
 #else
-#include <asm/pte-hash64-4k.h>
+#include <asm/book3s/64/hash-4k.h>
 #endif
 
 #endif /* __KERNEL__ */
-#endif /*  _ASM_POWERPC_PTE_HASH64_H */
+#endif /* _ASM_POWERPC_BOOK3S_64_HASH_H */
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 18/31] powerpc/mm: Increase the pte frag size.

2015-09-29 Thread Aneesh Kumar K.V
We will use the increased size to store more information about the 4K
ptes when using a 64K page size. The idea is to free up bits in pte_t.
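
To make the arithmetic concrete: a 2K PTE fragment plus 4K of hash-index
storage is 6K; rounded up to the next power of two this gives 8K
(1UL << 13), so a 64K page now holds 64K / 8K = 8 fragments, which is why
PTE_FRAG_NR drops from 16 to 8 in the hunk below.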

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/pgalloc-64.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc-64.h 
b/arch/powerpc/include/asm/pgalloc-64.h
index d8cde71f6734..4f1cc6c46728 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -164,15 +164,15 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
 
 #else /* if CONFIG_PPC_64K_PAGES */
 /*
- * we support 16 fragments per PTE page.
+ * we support 8 fragments per PTE page.
  */
-#define PTE_FRAG_NR	16
+#define PTE_FRAG_NR	8
 /*
- * We use a 2K PTE page fragment and another 2K for storing
- * real_pte_t hash index
+ * We use a 2K PTE page fragment and another 4K for storing
+ * real_pte_t hash index. Rounding the entire thing to 8K
  */
-#define PTE_FRAG_SIZE_SHIFT  12
-#define PTE_FRAG_SIZE (2 * PTRS_PER_PTE * sizeof(pte_t))
+#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
 extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
 extern void page_table_free(struct mm_struct *, unsigned long *, int);
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 17/31] powerpc/booke: Move booke headers (part 5)

2015-09-29 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/nohash/32/pte-40x.h   | 6 +++---
 arch/powerpc/include/asm/nohash/32/pte-44x.h   | 6 +++---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h   | 6 +++---
 arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h | 6 +++---
 arch/powerpc/include/asm/nohash/64/pgtable-4k.h| 6 +++---
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h   | 6 +++---
 arch/powerpc/include/asm/nohash/pte-book3e.h   | 6 +++---
 7 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-40x.h 
b/arch/powerpc/include/asm/nohash/32/pte-40x.h
index 486b1ef81338..9624ebdacc47 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-40x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-40x.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PTE_40x_H
-#define _ASM_POWERPC_PTE_40x_H
+#ifndef _ASM_POWERPC_NOHASH_32_PTE_40x_H
+#define _ASM_POWERPC_NOHASH_32_PTE_40x_H
 #ifdef __KERNEL__
 
 /*
@@ -61,4 +61,4 @@
 #define PTE_ATOMIC_UPDATES 1
 
 #endif /* __KERNEL__ */
-#endif /*  _ASM_POWERPC_PTE_40x_H */
+#endif /*  _ASM_POWERPC_NOHASH_32_PTE_40x_H */
diff --git a/arch/powerpc/include/asm/nohash/32/pte-44x.h 
b/arch/powerpc/include/asm/nohash/32/pte-44x.h
index 36f75fab23f5..fdab41c654ef 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-44x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-44x.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PTE_44x_H
-#define _ASM_POWERPC_PTE_44x_H
+#ifndef _ASM_POWERPC_NOHASH_32_PTE_44x_H
+#define _ASM_POWERPC_NOHASH_32_PTE_44x_H
 #ifdef __KERNEL__
 
 /*
@@ -94,4 +94,4 @@
 
 
 #endif /* __KERNEL__ */
-#endif /*  _ASM_POWERPC_PTE_44x_H */
+#endif /*  _ASM_POWERPC_NOHASH_32_PTE_44x_H */
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h 
b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index a0e2ba960976..3742b1919661 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PTE_8xx_H
-#define _ASM_POWERPC_PTE_8xx_H
+#ifndef _ASM_POWERPC_NOHASH_32_PTE_8xx_H
+#define _ASM_POWERPC_NOHASH_32_PTE_8xx_H
 #ifdef __KERNEL__
 
 /*
@@ -62,4 +62,4 @@
 _PAGE_HWWRITE | _PAGE_EXEC)
 
 #endif /* __KERNEL__ */
-#endif /*  _ASM_POWERPC_PTE_8xx_H */
+#endif /*  _ASM_POWERPC_NOHASH_32_PTE_8xx_H */
diff --git a/arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h 
b/arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h
index 9f5c3d04a1a3..5422d00c6145 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-fsl-booke.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PTE_FSL_BOOKE_H
-#define _ASM_POWERPC_PTE_FSL_BOOKE_H
+#ifndef _ASM_POWERPC_NOHASH_32_PTE_FSL_BOOKE_H
+#define _ASM_POWERPC_NOHASH_32_PTE_FSL_BOOKE_H
 #ifdef __KERNEL__
 
 /* PTE bit definitions for Freescale BookE SW loaded TLB MMU based
@@ -37,4 +37,4 @@
 #define PTE_WIMGE_SHIFT (6)
 
 #endif /* __KERNEL__ */
-#endif /*  _ASM_POWERPC_PTE_FSL_BOOKE_H */
+#endif /*  _ASM_POWERPC_NOHASH_32_PTE_FSL_BOOKE_H */
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-4k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-4k.h
index 7bace25d6b62..fc7d51753f81 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-4k.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGTABLE_PPC64_4K_H
-#define _ASM_POWERPC_PGTABLE_PPC64_4K_H
+#ifndef _ASM_POWERPC_NOHASH_64_PGTABLE_4K_H
+#define _ASM_POWERPC_NOHASH_64_PGTABLE_4K_H
 /*
  * Entries per page directory level.  The PTE level must use a 64b record
  * for each page table entry.  The PMD and PGD level use a 32b record for
@@ -89,4 +89,4 @@ extern struct page *pgd_page(pgd_t pgd);
 #define remap_4k_pfn(vma, addr, pfn, prot) \
remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, (prot))
 
-#endif /* _ASM_POWERPC_PGTABLE_PPC64_4K_H */
+#endif /* _ASM_POWERPC_NOHASH_64_PGTABLE_4K_H */
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
index 1de35bbd02a6..a44660d76096 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PGTABLE_PPC64_64K_H
-#define _ASM_POWERPC_PGTABLE_PPC64_64K_H
+#ifndef _ASM_POWERPC_NOHASH_64_PGTABLE_64K_H
+#define _ASM_POWERPC_NOHASH_64_PGTABLE_64K_H
 
 #include 
 
@@ -41,4 +41,4 @@
 #define pgd_pte(pgd)   (pud_pte(((pud_t){ pgd })))
 #define pte_pgd(pte)   ((pgd_t)pte_pud(pte))
 
-#endif /* _ASM_POWERPC_PGTABLE_PPC64_64K_H */
+#endif /* _ASM_POWERPC_NOHASH_64_PGTABLE_64K_H */
diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h 
b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 8d8473278d91..e16807b78edf 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_PTE_BOOK3E_H
-#define _ASM_POWERPC_PTE_BOOK3E_H
+#ifndef 

[PATCH V2 27/31] powerpc/mm: Add helper for converting pte bit to hpte bits

2015-09-29 Thread Aneesh Kumar K.V
Instead of open coding it in multiple code paths, export the helper
and add more documentation. Also make sure we don't make assumptions
about pte bit positions.
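
For reference, the open-coded logic being consolidated (visible in the
hunks below) amounts to roughly the following. This is a sketch of the
pre-existing behaviour, not necessarily the final body of the helper:

unsigned long htab_convert_pte_flags(unsigned long pteflags)
{
	unsigned long rflags = 0;

	/* PP bits: a user page is PP 0x2; read-only user pages also get 0x1 */
	if (pteflags & _PAGE_USER) {
		rflags |= 0x2;
		if (!((pteflags & _PAGE_RW) && (pteflags & _PAGE_DIRTY)))
			rflags |= 0x1;
	}
	/* _PAGE_EXEC -> HPTE_R_N (no-execute), since the exec bit is inverted */
	if ((pteflags & _PAGE_EXEC) == 0)
		rflags |= HPTE_R_N;
	/* always add Changed and Memory coherence */
	return rflags | HPTE_R_C | HPTE_R_M;
}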

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h |  1 +
 arch/powerpc/mm/hash64_4k.c   | 13 +---
 arch/powerpc/mm/hash64_64k.c  | 35 +++
 arch/powerpc/mm/hash_utils_64.c   | 22 ---
 arch/powerpc/mm/hugepage-hash64.c | 13 +---
 arch/powerpc/mm/hugetlbpage-hash64.c  |  4 +---
 6 files changed, 21 insertions(+), 67 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index e84987ade89c..92831cf798ce 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -242,6 +242,7 @@ extern unsigned long pmd_hugepage_update(struct mm_struct 
*mm,
 pmd_t *pmdp,
 unsigned long clr,
 unsigned long set);
+extern unsigned long htab_convert_pte_flags(unsigned long pteflags);
 /* Atomic PTE updates */
 static inline unsigned long pte_update(struct mm_struct *mm,
   unsigned long addr,
diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index 1832ed7fef0d..7749857f6ced 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -53,18 +53,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * PP bits. _PAGE_USER is already PP bit 0x2, so we only
 * need to add in 0x1 if it's a read-only user page
 */
-   rflags = new_pte & _PAGE_USER;
-   if ((new_pte & _PAGE_USER) && !((new_pte & _PAGE_RW) &&
-   (new_pte & _PAGE_DIRTY)))
-   rflags |= 0x1;
-   /*
-* _PAGE_EXEC -> HW_NO_EXEC since it's inverted
-*/
-   rflags |= ((new_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
-   /*
-* Always add C and Memory coherence bit
-*/
-   rflags |= HPTE_R_C | HPTE_R_M;
+   rflags = htab_convert_pte_flags(new_pte);
/*
 * Add in WIMG bits
 */
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 5736940f0b86..c30635d27c0f 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -76,22 +76,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * Handle the subpage protection bits
 */
subpg_pte = new_pte & ~subpg_prot;
-   /*
-* PP bits. _PAGE_USER is already PP bit 0x2, so we only
-* need to add in 0x1 if it's a read-only user page
-*/
-   rflags = subpg_pte & _PAGE_USER;
-   if ((subpg_pte & _PAGE_USER) && !((subpg_pte & _PAGE_RW) &&
-   (subpg_pte & _PAGE_DIRTY)))
-   rflags |= 0x1;
-   /*
-* _PAGE_EXEC -> HW_NO_EXEC since it's inverted
-*/
-   rflags |= ((subpg_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
-   /*
-* Always add C and Memory coherence bit
-*/
-   rflags |= HPTE_R_C | HPTE_R_M;
+   rflags = htab_convert_pte_flags(subpg_pte);
/*
 * Add in WIMG bits
 */
@@ -262,22 +247,8 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
new_pte |= _PAGE_DIRTY;
} while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
  old_pte, new_pte));
-   /*
-* PP bits. _PAGE_USER is already PP bit 0x2, so we only
-* need to add in 0x1 if it's a read-only user page
-*/
-   rflags = new_pte & _PAGE_USER;
-   if ((new_pte & _PAGE_USER) && !((new_pte & _PAGE_RW) &&
-   (new_pte & _PAGE_DIRTY)))
-   rflags |= 0x1;
-   /*
-* _PAGE_EXEC -> HW_NO_EXEC since it's inverted
-*/
-   rflags |= ((new_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
-   /*
-* Always add C and Memory coherence bit
-*/
-   rflags |= HPTE_R_C | HPTE_R_M;
+
+   rflags = htab_convert_pte_flags(new_pte);
/*
 * Add in WIMG bits
 */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 6cd9e40aae01..444c42eabfdf 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -159,20 +159,26 @@ static struct mmu_psize_def mmu_psize_defaults_gp[] = {
},
 };
 
-static unsigned long htab_convert_pte_flags(unsigned long pteflags)
+unsigned long htab_convert_pte_flags(unsigned long pteflags)
 {
-   unsigned long rflags = pteflags & 0x1fa;
+   unsigned long rflags = 0;
 
/* _PAGE_EXEC -> NOEXEC */
if ((pteflags & _PAGE_EXEC) == 0)
rflags |= 

[PATCH V2 08/31] powerpc/mm: Drop pte-common.h from BOOK3S 64

2015-09-29 Thread Aneesh Kumar K.V
We copy only the needed PTE bit defines from pte-common.h to the
respective hash-related headers. This should greatly simplify later
patches in which we are going to change the pte format for the hash
config.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h |   1 +
 arch/powerpc/include/asm/book3s/64/hash.h|   2 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 106 ++-
 arch/powerpc/include/asm/book3s/pgtable.h|  16 ++--
 4 files changed, 113 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index f2c51cd61f69..15518b620f5a 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -62,6 +62,7 @@
 /* shift to put page number into pte */
 #define PTE_RPN_SHIFT  (17)
 
+#define _PAGE_4K_PFN   0
 #ifndef __ASSEMBLY__
 /*
  * 4-level page tables related bits
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 8e60d4fa434d..7deb5063ff8c 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -20,6 +20,7 @@
 #define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we 
invert) */
 #define _PAGE_GUARDED  0x0008
 /* We can derive Memory coherence from _PAGE_NO_CACHE */
+#define _PAGE_COHERENT 0x0
 #define _PAGE_NO_CACHE 0x0020 /* I: cache inhibit */
 #define _PAGE_WRITETHRU0x0040 /* W: cache write-through */
 #define _PAGE_DIRTY0x0080 /* C: page changed */
@@ -30,6 +31,7 @@
 /* No separate kernel read-only */
 #define _PAGE_KERNEL_RW(_PAGE_RW | _PAGE_DIRTY) /* user access 
blocked by key */
 #define _PAGE_KERNEL_RO _PAGE_KERNEL_RW
+#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
 
 /* Strong Access Ordering */
 #define _PAGE_SAO  (_PAGE_WRITETHRU | _PAGE_NO_CACHE | 
_PAGE_COHERENT)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8bd2f66738f2..02b2a8264028 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -96,11 +96,111 @@
 #define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS |  \
 _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPLITTING | \
 _PAGE_THP_HUGE)
+#define _PTE_NONE_MASK _PAGE_HPTEFLAGS
 /*
- * Default defines for things which we don't use.
- * We should get this removed.
+ * The mask covered by the RPN must be a ULL on 32-bit platforms with
+ * 64-bit PTEs.
+ * FIXME!! double check; PTE_RPN_MAX may be unused
  */
-#include <asm/pte-common.h>
+//#define PTE_RPN_MAX  (1UL << (32 - PTE_RPN_SHIFT))
+#define PTE_RPN_MASK   (~((1UL << PTE_RPN_SHIFT) - 1))
+/*
+ * _PAGE_CHG_MASK masks of bits that are to be preserved across
+ * pgprot changes
+ */
+#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
+_PAGE_ACCESSED | _PAGE_SPECIAL)
+/*
+ * Mask of bits returned by pte_pgprot()
+ */
+#define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \
+_PAGE_WRITETHRU | _PAGE_4K_PFN | \
+_PAGE_USER | _PAGE_ACCESSED |  \
+_PAGE_RW |  _PAGE_DIRTY | _PAGE_EXEC)
+/*
+ * We define 2 sets of base prot bits, one for basic pages (ie,
+ * cacheable kernel and user pages) and one for non cacheable
+ * pages. We always set _PAGE_COHERENT when SMP is enabled or
+ * the processor might need it for DMA coherency.
+ */
+#define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
+#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT)
+
+/* Permission masks used to generate the __P and __S table,
+ *
+ * Note:__pgprot is defined in arch/powerpc/include/asm/page.h
+ *
+ * Write permissions imply read permissions for now (we could make write-only
+ * pages on BookE but we don't bother for now). Execute permission control is
+ * possible on platforms that define _PAGE_EXEC
+ *
+ * Note due to the way vm flags are laid out, the bits are XWR
+ */
+#define PAGE_NONE  __pgprot(_PAGE_BASE)
+#define PAGE_SHARED__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
+#define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | \
+_PAGE_EXEC)
+#define PAGE_COPY  __pgprot(_PAGE_BASE | _PAGE_USER )
+#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+#define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_USER )
+#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
+
+#define __P000 PAGE_NONE
+#define __P001 PAGE_READONLY
+#define __P010 PAGE_COPY
+#define __P011 PAGE_COPY
+#define __P100 PAGE_READONLY_X
+#define __P101 PAGE_READONLY_X
+#define __P110 PAGE_COPY_X
+#define __P111 PAGE_COPY_X
+
+#define __S000 

[PATCH V2 19/31] powerpc/mm: Convert 4k hash insert to C

2015-09-29 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/Makefile|   3 +
 arch/powerpc/mm/hash64_64k.c| 202 +
 arch/powerpc/mm/hash_low_64.S   | 380 
 arch/powerpc/mm/hash_utils_64.c |   4 +-
 4 files changed, 208 insertions(+), 381 deletions(-)
 create mode 100644 arch/powerpc/mm/hash64_64k.c

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 3eb73a38220d..f80ad1a76cc8 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -18,6 +18,9 @@ obj-$(CONFIG_PPC_STD_MMU_32)  += ppc_mmu_32.o
 obj-$(CONFIG_PPC_STD_MMU)  += hash_low_$(CONFIG_WORD_SIZE).o \
   tlb_hash$(CONFIG_WORD_SIZE).o \
   mmu_context_hash$(CONFIG_WORD_SIZE).o
+ifeq ($(CONFIG_PPC_STD_MMU_64),y)
+obj-$(CONFIG_PPC_64K_PAGES)	+= hash64_64k.o
+endif
 obj-$(CONFIG_PPC_ICSWX)	+= icswx.o
 obj-$(CONFIG_PPC_ICSWX_PID)	+= icswx_pid.o
 obj-$(CONFIG_40x)  += 40x_mmu.o
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
new file mode 100644
index ..b137e50a3e57
--- /dev/null
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -0,0 +1,202 @@
+/*
+ * Copyright IBM Corporation, 2015
+ * Author Aneesh Kumar K.V 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include 
+#include 
+#include 
+
+int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
+  pte_t *ptep, unsigned long trap, unsigned long flags,
+  int ssize, int subpg_prot)
+{
+   real_pte_t rpte;
+   unsigned long *hidxp;
+   unsigned long hpte_group;
+   unsigned int subpg_index;
+   unsigned long shift = 12; /* 4K */
+   unsigned long rflags, pa, hidx;
+   unsigned long old_pte, new_pte, subpg_pte;
+   unsigned long vpn, hash, slot;
+
+   /*
+* atomically mark the linux large page PTE busy and dirty
+*/
+   do {
+   pte_t pte = READ_ONCE(*ptep);
+
+   old_pte = pte_val(pte);
+   /* If PTE busy, retry the access */
+   if (unlikely(old_pte & _PAGE_BUSY))
+   return 0;
+   /* If PTE permissions don't match, take page fault */
+   if (unlikely(access & ~old_pte))
+   return 1;
+   /*
+* Try to lock the PTE, add ACCESSED and DIRTY if it was
+* a write access. Since this is 4K insert of 64K page size
+* also add _PAGE_COMBO
+*/
+   new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED | _PAGE_COMBO;
+   if (access & _PAGE_RW)
+   new_pte |= _PAGE_DIRTY;
+   } while (old_pte != __cmpxchg_u64((unsigned long *)ptep,
+ old_pte, new_pte));
+   /*
+* Handle the subpage protection bits
+*/
+   subpg_pte = new_pte & ~subpg_prot;
+   /*
+* PP bits. _PAGE_USER is already PP bit 0x2, so we only
+* need to add in 0x1 if it's a read-only user page
+*/
+   rflags = subpg_pte & _PAGE_USER;
+   if ((subpg_pte & _PAGE_USER) && !((subpg_pte & _PAGE_RW) &&
+   (subpg_pte & _PAGE_DIRTY)))
+   rflags |= 0x1;
+   /*
+* _PAGE_EXEC -> HW_NO_EXEC since it's inverted
+*/
+   rflags |= ((subpg_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
+   /*
+* Always add C and Memory coherence bit
+*/
+   rflags |= HPTE_R_C | HPTE_R_M;
+   /*
+* Add in WIMG bits
+*/
+   rflags |= (subpg_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
+   _PAGE_COHERENT | _PAGE_GUARDED));
+
+   if (!cpu_has_feature(CPU_FTR_NOEXECUTE) &&
+   !cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
+
+   /*
+* No CPU has hugepages but lacks no execute, so we
+* don't need to worry about that case
+*/
+   rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);
+   }
+
+   subpg_index = (ea & (PAGE_SIZE - 1)) >> shift;
+   vpn  = hpt_vpn(ea, vsid, ssize);
+   rpte = __real_pte(__pte(old_pte), ptep);
+   /*
+    * None of the sub 4k pages is hashed
+*/
+   if (!(old_pte & _PAGE_HASHPTE))
+   goto htab_insert_hpte;
+   /*
+* Check if the pte was already inserted into the hash table
+* as a 64k HW page, and 

[PATCH V2 21/31] powerpc/mm: make pte page hash index slot 8 bits

2015-09-29 Thread Aneesh Kumar K.V
Currently we use 4 bits for each slot and pack all the 16-slot
information related to a 64K linux page into a 64-bit value. To do this
we use 16 bits of pte_t. Move the hash slot valid bits out of pte_t
and place them in the second half of the pte page. We also use 8 bits
per slot.
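
In other words: 16 subpage slots at 8 bits each is 128 bits, i.e. 16
bytes of slot state per 64K linux page, now kept in the second half of
the pte page instead of being packed into pte_t.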

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 48 +++
 arch/powerpc/include/asm/book3s/64/hash.h |  5 ---
 arch/powerpc/include/asm/page.h   |  4 +--
 arch/powerpc/mm/hash64_64k.c  | 34 +++
 4 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index ced5a17a8d1a..dafc2f31c843 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -78,33 +78,39 @@
  * generic accessors and iterators here
  */
 #define __real_pte __real_pte
-static inline real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep)
-{
-   real_pte_t rpte;
-
-   rpte.pte = pte;
-   rpte.hidx = 0;
-   if (pte_val(pte) & _PAGE_COMBO) {
-   /*
-* Make sure we order the hidx load against the _PAGE_COMBO
-* check. The store side ordering is done in __hash_page_4K
-*/
-   smp_rmb();
-   rpte.hidx = pte_val(*((ptep) + PTRS_PER_PTE));
-   }
-   return rpte;
-}
-
+extern real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep);
 static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long 
index)
 {
if ((pte_val(rpte.pte) & _PAGE_COMBO))
-   return (rpte.hidx >> (index<<2)) & 0xf;
+   return (unsigned long) rpte.hidx[index] >> 4;
return (pte_val(rpte.pte) >> 12) & 0xf;
 }
 
-#define __rpte_to_pte(r)   ((r).pte)
-#define __rpte_sub_valid(rpte, index) \
-   (pte_val(rpte.pte) & (_PAGE_HPTE_SUB0 >> (index)))
+static inline pte_t __rpte_to_pte(real_pte_t rpte)
+{
+   return rpte.pte;
+}
+/*
+ * We look at the second half of the pte page to determine whether
+ * the sub 4k hpte is valid. We use 8 bits per index, and we have
+ * 16 indices mapping the full 64K page. Hence for each
+ * 64K linux page we use 128 bits from the second half of the pte page.
+ * The encoding in the second half of the page is as below:
+ * [ index 15 ] ... [ index 0 ]
+ * [ bit 127 ] ... [ bit 0 ]
+ * format of each index:
+ * bit 7 ... bit 0
+ * [one bit secondary][3 bit hidx][1 bit valid][000]
+ */
+static inline bool __rpte_sub_valid(real_pte_t rpte, unsigned long index)
+{
+   unsigned char index_val = rpte.hidx[index];
+
+   if ((index_val >> 3) & 0x1)
+   return true;
+   return false;
+}
+
 /*
  * Trick: we set __end to va + 64k, which happens to work for
  * a 16M page as well as we want only one iteration
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 25e38809e4f7..32a1f94201d0 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -214,11 +214,6 @@
 
 #define PMD_BAD_BITS   (PTE_TABLE_SIZE-1)
 #define PUD_BAD_BITS   (PMD_TABLE_SIZE-1)
-/*
- * We save the slot number & secondary bit in the second half of the
- * PTE page. We use the 8 bytes per each pte entry.
- */
-#define PTE_PAGE_HIDX_OFFSET (PTRS_PER_PTE * 8)
 
 #ifndef __ASSEMBLY__
 #define	pmd_bad(pmd)	(!is_kernel_addr(pmd_val(pmd)) \
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f0faa80ceac1..83a4cc5fc306 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -293,7 +293,7 @@ static inline pte_basic_t pte_val(pte_t x)
  * the "second half" part of the PTE for pseudo 64k pages
  */
 #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
+typedef struct { pte_t pte; unsigned char *hidx; } real_pte_t;
 #else
 typedef struct { pte_t pte; } real_pte_t;
 #endif
@@ -345,7 +345,7 @@ static inline pte_basic_t pte_val(pte_t pte)
 }
 
 #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
+typedef struct { pte_t pte; unsigned char *hidx; } real_pte_t;
 #else
 typedef pte_t real_pte_t;
 #endif
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 87cba0688fce..423f47a89299 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -16,12 +16,32 @@
 #include 
 #include 
 
+real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep)
+{
+   int indx;
+   real_pte_t rpte;
+   pte_t *pte_headp;
+
+   rpte.pte = pte;
+   rpte.hidx = NULL;
+   if (pte_val(pte) & _PAGE_COMBO) 

[PATCH V2 20/31] powerpc/mm: update __real_pte to take address as argument

2015-09-29 Thread Aneesh Kumar K.V
We will use this in a later patch to compute the right hash index.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 4 ++--
 arch/powerpc/include/asm/nohash/64/pgtable.h  | 4 ++--
 arch/powerpc/mm/hash64_64k.c  | 2 +-
 arch/powerpc/mm/tlb_hash64.c  | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index ee073822145d..ced5a17a8d1a 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -78,7 +78,7 @@
  * generic accessors and iterators here
  */
 #define __real_pte __real_pte
-static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep)
+static inline real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep)
 {
real_pte_t rpte;
 
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index f2ace2cac7bb..3117f0495b74 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -44,10 +44,10 @@
 #ifndef __real_pte
 
 #ifdef CONFIG_STRICT_MM_TYPECHECKS
-#define __real_pte(e,p)((real_pte_t){(e)})
+#define __real_pte(a,e,p)  ((real_pte_t){(e)})
 #define __rpte_to_pte(r)   ((r).pte)
 #else
-#define __real_pte(e,p)(e)
+#define __real_pte(a,e,p)  (e)
 #define __rpte_to_pte(r)   (__pte(r))
 #endif
 #define __rpte_to_hidx(r,index)(pte_val(__rpte_to_pte(r)) >> 12)
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index c33aa32ffba5..6c35bd434f80 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -115,10 +115,10 @@
 #ifndef __real_pte
 
 #ifdef CONFIG_STRICT_MM_TYPECHECKS
-#define __real_pte(e,p)((real_pte_t){(e)})
+#define __real_pte(a,e,p)  ((real_pte_t){(e)})
 #define __rpte_to_pte(r)   ((r).pte)
 #else
-#define __real_pte(e,p)(e)
+#define __real_pte(a,e,p)  (e)
 #define __rpte_to_pte(r)   (__pte(r))
 #endif
 #define __rpte_to_hidx(r,index)(pte_val(__rpte_to_pte(r)) >> 12)
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index b137e50a3e57..87cba0688fce 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -90,7 +90,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 
subpg_index = (ea & (PAGE_SIZE - 1)) >> shift;
vpn  = hpt_vpn(ea, vsid, ssize);
-   rpte = __real_pte(__pte(old_pte), ptep);
+   rpte = __real_pte(ea, __pte(old_pte), ptep);
/*
  * None of the sub 4k pages is hashed
 */
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index c522969f012d..cc35ae0d02e6 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -89,7 +89,7 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
}
WARN_ON(vsid == 0);
vpn = hpt_vpn(addr, vsid, ssize);
-   rpte = __real_pte(__pte(pte), ptep);
+   rpte = __real_pte(addr, __pte(pte), ptep);
 
/*
 * Check if we have an active batch on this CPU. If not, just
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2 26/31] powerpc/mm: Remove the dependency on pte bit position in asm code

2015-09-29 Thread Aneesh Kumar K.V
We should not depend on pte bit positions in asm code. Fix this by
simply moving that part to C.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/exceptions-64s.S | 16 +++-
 arch/powerpc/mm/hash_utils_64.c  | 29 +
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 0a0399c2af11..34920f11dbdd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1556,28 +1556,18 @@ do_hash_page:
lwz r0,TI_PREEMPT(r11)  /* If we're in an "NMI" */
andis.  r0,r0,NMI_MASK@h/* (i.e. an irq when soft-disabled) */
bne 77f /* then don't call hash_page now */
-   /*
-* We need to set the _PAGE_USER bit if MSR_PR is set or if we are
-* accessing a userspace segment (even from the kernel). We assume
-* kernel addresses always have the high bit set.
-*/
-   rlwinm  r4,r4,32-25+9,31-9,31-9 /* DSISR_STORE -> _PAGE_RW */
-   rotldi  r0,r3,15/* Move high bit into MSR_PR posn */
-   orc r0,r12,r0   /* MSR_PR | ~high_bit */
-   rlwimi  r4,r0,32-13,30,30   /* becomes _PAGE_USER access bit */
-   ori r4,r4,1 /* add _PAGE_PRESENT */
-   rlwimi  r4,r5,22+2,31-2,31-2/* Set _PAGE_EXEC if trap is 0x400 */
 
/*
 * r3 contains the faulting address
-* r4 contains the required access permissions
+* r4 msr
 * r5 contains the trap number
 * r6 contains dsisr
 *
 * at return r3 = 0 for success, 1 for page fault, negative for error
 */
+   mr  r4,r12
ld  r6,_DSISR(r1)
-   bl  hash_page   /* build HPTE if possible */
+   bl  __hash_page /* build HPTE if possible */
cmpdi   r3,0/* see if hash_page succeeded */
 
/* Success */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 2b90850bdaf8..6cd9e40aae01 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1161,6 +1161,35 @@ int hash_page(unsigned long ea, unsigned long access, 
unsigned long trap,
 }
 EXPORT_SYMBOL_GPL(hash_page);
 
+int __hash_page(unsigned long ea, unsigned long msr, unsigned long trap,
+   unsigned long dsisr)
+{
+   unsigned long access = _PAGE_PRESENT;
+   unsigned long flags = 0;
+   struct mm_struct *mm = current->mm;
+
+   if (REGION_ID(ea) == VMALLOC_REGION_ID)
+   mm = &init_mm;
+
+   if (dsisr & DSISR_NOHPTE)
+   flags |= HPTE_NOHPTE_UPDATE;
+
+   if (dsisr & DSISR_ISSTORE)
+   access |= _PAGE_RW;
+   /*
+* We need to set the _PAGE_USER bit if MSR_PR is set or if we are
+* accessing a userspace segment (even from the kernel). We assume
+* kernel addresses always have the high bit set.
+*/
+   if ((msr & MSR_PR) || (REGION_ID(ea) == USER_REGION_ID))
+   access |= _PAGE_USER;
+
+   if (trap == 0x400)
+   access |= _PAGE_EXEC;
+
+   return hash_page_mm(mm, ea, access, trap, flags);
+}
+
 void hash_preload(struct mm_struct *mm, unsigned long ea,
  unsigned long access, unsigned long trap)
 {
-- 
2.5.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] perf record: Limit --intr-regs to platforms supporting PERF_REGS

2015-09-29 Thread Jiri Olsa
On Tue, Sep 29, 2015 at 10:01:36PM +0530, Naveen N. Rao wrote:
> On 2015/09/29 12:47PM, Jiri Olsa wrote:
> > On Tue, Sep 29, 2015 at 01:30:10PM +0530, Naveen N. Rao wrote:
> > 
> > SNIP
> > 
> > > > > Suka has also posted a fix for this with a different approach [1]. 
> > > > > Can 
> > > > > you please ack/pull one of these versions? Building perf is broken on 
> > > > > v4.3-rc due to this.
> > > > 
> > > > I did not get any answer for additional comments I made to the patch
> > > > (couldnt get marc.info working, sending the patch again)
> > > 
> > > Hi Jiri,
> > > I concur with the changes you proposed to my patch here (getting rid of 
> > > the weak variant):
> > > http://article.gmane.org/gmane.linux.kernel/2046108
> > > 
> > > I am aware of the other approach you posted (and the one attached 
> > > below). When I said "please ack/pull one of these versions", I meant one 
> > > of: your version, Suka's and mine.
> > 
> > I was hoping somebody could test it on ppc ;-)
> > 
> > I think the last version (in my last email) that keeps the weak
> > variable is correct, let's wait for Arnaldo to sort this out
> 
> I just tried it, but it fails. As Suka points out in his patch:
> "Adding perf_regs.o to util/Build unconditionally, exposes a 
> redefinition error for 'perf_reg_value()' function (due to the static 
> inline version in util/perf_regs.h). So use #ifdef 
> HAVE_PERF_REGS_SUPPORT' around that function."

could you (or Suka) please reply in here with the patch?

thanks,
jirka
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-09-29 Thread Raghavendra K T

On 09/28/2015 10:34 PM, Nishanth Aravamudan wrote:

On 28.09.2015 [13:44:42 +0300], Denis Kirjanov wrote:

On 9/27/15, Raghavendra K T  wrote:

Problem description:
Powerpc has sparse node numbering, i.e. on a 4-node system the nodes are
(possibly) numbered 0,1,16,17. At a lower level, the chipid obtained from
the device tree is naturally (directly) mapped to the nid. With this
series, the sparse chipids 0,1,16,17 would instead map to the serial
nids 0,1,2,3.


Interesting thing to play with, I'll try to test it on my POWER7 box,
but it doesn't have the OPAL layer :(


Hi Denis,
Thanks for your interest. I have pushed the patches to

https://github.com/ktraghavendra/linux/tree/serialnuma_v1 in case that
makes the patches easier to grab.



Note that it's also interesting to try it under PowerVM, with odd NUMA
topologies and report any issues found :)



Thanks Nish, I'll also grab a PowerVM system and test.




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC 2/5] powerpc:numa Rename functions referring to nid as chipid

2015-09-29 Thread Raghavendra K T

On 09/28/2015 10:57 PM, Nishanth Aravamudan wrote:

On 27.09.2015 [23:59:10 +0530], Raghavendra K T wrote:

There is no change in the functionality.

Signed-off-by: Raghavendra K T 
---
  arch/powerpc/mm/numa.c | 42 +-
  1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d5e6eee..f84ed2f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -235,47 +235,47 @@ static void initialize_distance_lookup_table(int nid,
}
  }

-/* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
+/* Returns chipid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
   * info is found.
   */
-static int associativity_to_nid(const __be32 *associativity)
+static int associativity_to_chipid(const __be32 *associativity)


This is confusing to me. This function is also used by the DLPAR code
under PowerVM to indicate what node the CPU is on -- not a chip (which I
don't believe is exposed at all under PowerVM).



Good point.

Should I retain the name nid?
Or any suggestions instead of chipid -> nid that fit both cases?
Or should I rename it to something like nid -> vnid?
[...]

@@ -1415,7 +1415,7 @@ int arch_update_cpu_topology(void)

/* Use associativity from first thread for all siblings */
vphn_get_associativity(cpu, associativity);
-   new_nid = associativity_to_nid(associativity);
+   new_nid = associativity_to_chipid(associativity);


If you are getting a chipid, shouldn't you be assigning it to a variable
called 'new_chipid'?


Yes, perhaps.
My splitting idea was:
1. change the nid name in functions to chipid (without changing the nid
variables calling those functions)
2. rename the variables to chipid and assign nid = chipid (1:1 mapping)
3. now let nid = mapped chipid

But I see that it isn't consistent in some places. Do you think
merging step 1 and step 2 is okay?

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] perf record: Limit --intr-regs to platforms supporting PERF_REGS

2015-09-29 Thread Naveen N. Rao
On 2015/09/29 12:47PM, Jiri Olsa wrote:
> On Tue, Sep 29, 2015 at 01:30:10PM +0530, Naveen N. Rao wrote:
> 
> SNIP
> 
> > > > Suka has also posted a fix for this with a different approach [1]. Can 
> > > > you please ack/pull one of these versions? Building perf is broken on 
> > > > v4.3-rc due to this.
> > > 
> > > I did not get any answer for additional comments I made to the patch
> > > (couldnt get marc.info working, sending the patch again)
> > 
> > Hi Jiri,
> > I concur with the changes you proposed to my patch here (getting rid of 
> > the weak variant):
> > http://article.gmane.org/gmane.linux.kernel/2046108
> > 
> > I am aware of the other approach you posted (and the one attached 
> > below). When I said "please ack/pull one of these versions", I meant one 
> > of: your version, Suka's and mine.
> 
> I was hoping somebody could test it on ppc ;-)
> 
> I think the last version (in my last email) that keeps the weak
> variable is correct, let's wait for Arnaldo to sort this out

I just tried it, but it fails. As Suka points out in his patch:
"Adding perf_regs.o to util/Build unconditionally, exposes a 
redefinition error for 'perf_reg_value()' function (due to the static 
inline version in util/perf_regs.h). So use #ifdef 
HAVE_PERF_REGS_SUPPORT' around that function."
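
For reference, the guard being described would look roughly like this in
util/perf_regs.h (a sketch around the existing perf_reg_value() prototype,
assuming the usual HAVE_PERF_REGS_SUPPORT feature macro):

#ifdef HAVE_PERF_REGS_SUPPORT
int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
#else
static inline int perf_reg_value(u64 *valp __maybe_unused,
				 struct regs_dump *regs __maybe_unused,
				 int id __maybe_unused)
{
	return 0;
}
#endif /* HAVE_PERF_REGS_SUPPORT */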

- Naveen

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC 3/5] powerpc:numa create 1:1 mappaing between chipid and nid

2015-09-29 Thread Raghavendra K T

On 09/28/2015 10:58 PM, Nishanth Aravamudan wrote:

On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote:

Once we have made the distinction between nid and chipid
create a 1:1 mapping between them. This makes compacting the
nids easy later.



Didn't the previous patch just do the opposite of...



As per my thinking it was:
1. rename functions to say out loud that it is a chipid (and not a nid)
2. and then assign nid = chipid, so that we are clear that
we made a 1:1 nid:chipid mapping, and compact the nids later.

But again, maybe I should combine patches 2 and 3.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-09-29 Thread Raghavendra K T

On 09/28/2015 11:04 PM, Nishanth Aravamudan wrote:

On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote:

[...]


2) Map the sparse chipid obtained from the device tree to a serial nid at
the kernel level (the idea proposed in this series).
Pro: It is more natural to handle this at the kernel level than at the
lower (OPAL) layer.
Con: The chipid in the device tree is no longer the same as the nid in
the kernel.


Is there any debugging/logging? Looks like not -- so how does a sysadmin
map from firmware-provided values to the Linux values? That's going to
make debugging of large systems (PowerVM or otherwise) less than
pleasant, it seems? Possibly you could put something in sysfs?


I see 2 things that could be done here:

1) While doing dump_numa_cpu_topology() we can dump nid_to_chipid()
as additional information.

2) sysfs ->
Does /sys/devices/system/node/nodeX/*chipid* look good? Maybe we should
add it only for powerpc, or otherwise we need to have chipid = nid
populated for the other archs. [ I think this change can be done slowly ]
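
A rough sketch of option 2, inside arch/powerpc/mm/numa.c (names and
placement assumed, untested):

/* per-node attribute, e.g. /sys/devices/system/node/nodeX/chipid */
static ssize_t chipid_show(struct device *dev,
			   struct device_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", nid_to_chipid(dev->id));
}
static DEVICE_ATTR_RO(chipid);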





___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC 3/5] powerpc:numa create 1:1 mappaing between chipid and nid

2015-09-29 Thread Raghavendra K T

On 09/28/2015 11:05 PM, Nishanth Aravamudan wrote:

On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote:

Once we have made the distinction between nid and chipid
create a 1:1 mapping between them. This makes compacting the
nids easy later.

No functionality change.

Signed-off-by: Raghavendra K T 
---
  arch/powerpc/mm/numa.c | 36 +---
  1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f84ed2f..dd2073b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -264,6 +264,17 @@ out:
return chipid;
  }

+
+ /* Return the nid from associativity */
+static int associativity_to_nid(const __be32 *associativity)
+{
+   int nid;
+
+   nid = associativity_to_chipid(associativity);
+   return nid;
+}


This is ultimately confusing. You are assigning the semantic return
value of a chipid to a nid -- is it a nid or a chipid? Shouldn't the
variable naming be consistent?



:( yes. will come up with some consistent naming.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

2015-09-29 Thread Matthew R. Ochs
> On Sep 28, 2015, at 6:05 PM, Daniel Axtens  wrote:
> 
> You have two versions of check_state() below, which is a bit
> confusing. It looks like you've moved the function and also added the
> up/down of the read semaphore. I assume that's all that changed?

Correct.

It was originally moved to meet a dependency due to it being defined statically.

> 
>> 
>> /**
>> + * check_state() - checks and responds to the current adapter state
>> + * @cfg:Internal structure associated with the host.
>> + *
>> + * This routine can block and should only be used on process context.
>> + * It assumes that the caller is an ioctl thread and holding the ioctl
>> + * read semaphore. This is temporarily let up across the wait to allow
>> + * for draining actively running ioctls. Also note that when waking up
>> + * from waiting in reset, the state is unknown and must be checked again
>> + * before proceeding.
>> + *
>> + * Return: 0 on success, -errno on failure
>> + */
>> +static int check_state(struct cxlflash_cfg *cfg)
>> +{
>> +struct device *dev = &cfg->dev->dev;
>> +int rc = 0;
>> +
>> +retry:
>> +switch (cfg->state) {
>> +case STATE_LIMBO:
>> +dev_dbg(dev, "%s: Limbo state, going to wait...\n", __func__);
>> +up_read(&cfg->ioctl_rwsem);
>> +rc = wait_event_interruptible(cfg->limbo_waitq,
>> +  cfg->state != STATE_LIMBO);
>> +down_read(&cfg->ioctl_rwsem);
>> +if (unlikely(rc))
>> +break;
>> +goto retry;
>> +case STATE_FAILTERM:
>> +dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
>> +rc = -ENODEV;
>> +break;
>> +default:
>> +break;
>> +}
>> +
>> +return rc;
>> +}
>> +
>> +/**
>>  * cxlflash_disk_attach() - attach a LUN to a context
>>  * @sdev:SCSI device associated with LUN.
>>  * @attach:  Attach ioctl data structure.
>> @@ -1523,41 +1563,6 @@ err1:
>> }
>> 
>> /**
>> - * check_state() - checks and responds to the current adapter state
>> - * @cfg:Internal structure associated with the host.
>> - *
>> - * This routine can block and should only be used on process context.
>> - * Note that when waking up from waiting in limbo, the state is unknown
>> - * and must be checked again before proceeding.
>> - *
>> - * Return: 0 on success, -errno on failure
>> - */
>> -static int check_state(struct cxlflash_cfg *cfg)
>> -{
>> -	struct device *dev = &cfg->dev->dev;
>> -	int rc = 0;
>> -
>> -retry:
>> -	switch (cfg->state) {
>> -	case STATE_LIMBO:
>> -		dev_dbg(dev, "%s: Limbo, going to wait...\n", __func__);
>> -		rc = wait_event_interruptible(cfg->limbo_waitq,
>> -					      cfg->state != STATE_LIMBO);
>> -		if (unlikely(rc))
>> -			break;
>> -		goto retry;
>> -	case STATE_FAILTERM:
>> -		dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
>> -		rc = -ENODEV;
>> -		break;
>> -	default:
>> -		break;
>> -	}
>> -
>> -	return rc;
>> -}
>> -
>> -/**
>>  * cxlflash_afu_recover() - initiates AFU recovery
>>  * @sdev:SCSI device associated with LUN.
>>  * @recover: Recover ioctl data structure.
>> @@ -1646,9 +1651,14 @@ retry_recover:
>>  	/* Test if in error state */
>>  	reg = readq_be(&afu->ctrl_map->mbox_r);
>>  	if (reg == -1) {
>> -		dev_dbg(dev, "%s: MMIO read fail! Wait for recovery...\n",
>> -			__func__);
>> -		mutex_unlock(&ctxi->mutex);
>> +		dev_dbg(dev, "%s: MMIO fail, wait for recovery.\n", __func__);
>> +
>> +		/*
>> +		 * Before checking the state, put back the context obtained with
>> +		 * get_context() as it is no longer needed and sleep for a short
>> +		 * period of time (see prolog notes).
>> +		 */
>> +		put_context(ctxi);
> 
> Is this needed for the drain to work? It looks like it fixes a
> refcounting bug in the function, but I'm not sure I understand how it
> interacts with the rest of this patch.

This was simply some "while I'm here" refactoring as the commit originally
included a change here. The main point of this change was to replace the
mutex_unlock() with put_context(), which is a wrapper around the unlocking
of the context's mutex.
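For reference, the wrapper as described would be shaped roughly like this (an assumed sketch, not the actual driver source):

static void put_context(struct ctx_info *ctxi)
{
	mutex_unlock(&ctxi->mutex);	/* release the per-context lock */
}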
> 
> Anyway, the patch overall looks good to me, and makes your driver
> interact with CXL's EEH support in the way I intended when I wrote it.

Thanks for reviewing.


Re: [PATCH RFC 4/5] powerpc:numa Add helper functions to maintain chipid to nid mapping

2015-09-29 Thread Raghavendra K T

On 09/28/2015 11:02 PM, Nishanth Aravamudan wrote:

On 27.09.2015 [23:59:12 +0530], Raghavendra K T wrote:

Create arrays that map serial nids and sparse chipids.

Note: My original idea had only two arrays for the chipid to nid map. The
final code is inspired by driver/acpi/numa.c, which maps a proximity node
to a logical node (by Takayoshi Kochi), and thus uses an additional
chipid_map nodemask. The mask helps find the first unused nid easily, by
locating the first unset bit in the mask.

No change in functionality.

Signed-off-by: Raghavendra K T 
---
  arch/powerpc/mm/numa.c | 48 +++-
  1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index dd2073b..f015cad 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -63,6 +63,11 @@ static int form1_affinity;
  static int distance_ref_points_depth;
  static const __be32 *distance_ref_points;
  static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
+static nodemask_t chipid_map = NODE_MASK_NONE;
+static int chipid_to_nid_map[MAX_NUMNODES]
+   = { [0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE };


Hrm, conceptually there are *more* chips than nodes, right? So what
guarantees we won't see > MAX_NUMNODES chips?


You are correct that #nids <= #chipids,
and #nids = #chipids when all possible slots are populated. Assuming
that the number of chip slots is no more than MAX_NUMNODES,


how about having

#define MAX_CHIPNODES MAX_NUMNODES
and
chipid_to_nid_map[MAX_CHIPNODES] = { [0 ... MAX_CHIPNODES - 1] = ..
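Spelled out, that suggestion would be something like (a sketch using the proposed MAX_CHIPNODES name):

#define MAX_CHIPNODES MAX_NUMNODES

static int chipid_to_nid_map[MAX_CHIPNODES]
	= { [0 ... MAX_CHIPNODES - 1] = NUMA_NO_NODE };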




+static int nid_to_chipid_map[MAX_NUMNODES]
+   = { [0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE };

  /*
   * Allocate node_to_cpumask_map based on number of available nodes
@@ -133,6 +138,48 @@ static int __init fake_numa_create_new_node(unsigned long end_pfn,
return 0;
  }

+int chipid_to_nid(int chipid)
+{
+   if (chipid < 0)
+      return NUMA_NO_NODE;


Do you really want to support these cases? Or should they be
bugs/warnings indicating that you got an unexpected input? Or at least
WARN_ON_ONCE?



Right. Querying the nid of an invalid chipid should be at least a
WARN_ON_ONCE(). But I'll check whether there is any valid scenario
before making the change.
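For concreteness, the hardened lookup might read (hypothetical sketch; the upper bound is assumed from the array size):

int chipid_to_nid(int chipid)
{
	if (WARN_ON_ONCE(chipid < 0 || chipid >= MAX_NUMNODES))
		return NUMA_NO_NODE;

	return chipid_to_nid_map[chipid];
}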


+   return chipid_to_nid_map[chipid];
+}
+
+int nid_to_chipid(int nid)
+{
+   if (nid < 0)
+      return NUMA_NO_NODE;
+   return nid_to_chipid_map[nid];
+}
+
+static void __map_chipid_to_nid(int chipid, int nid)
+{
+   if (chipid_to_nid_map[chipid] == NUMA_NO_NODE
+       || nid < chipid_to_nid_map[chipid])
+      chipid_to_nid_map[chipid] = nid;
+   if (nid_to_chipid_map[nid] == NUMA_NO_NODE
+       || chipid < nid_to_chipid_map[nid])
+      nid_to_chipid_map[nid] = chipid;
+}


chip <-> node mapping is a static (physical) concept, right? Should we
emit some debugging if for some reason we get a runtime call to remap
an already mapped chip to a new node?



Good point. Remapping an already mapped chipid to a different nid is
unexpected, whereas mapping a chipid to the same nid is expected
(because the mapping comes from cpus belonging to the same node).

WARN_ON() should suffice here?
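Roughly like this, perhaps (a sketch that assumes any remap to a different nid is a bug; note the patch as posted instead keeps the minimum nid):

static void __map_chipid_to_nid(int chipid, int nid)
{
	WARN_ON(chipid_to_nid_map[chipid] != NUMA_NO_NODE &&
		chipid_to_nid_map[chipid] != nid);

	chipid_to_nid_map[chipid] = nid;
	nid_to_chipid_map[nid] = chipid;
}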



+
+int map_chipid_to_nid(int chipid)
+{
+   int nid;
+
+   if (chipid < 0 || chipid >= MAX_NUMNODES)
+      return NUMA_NO_NODE;
+
+   nid = chipid_to_nid_map[chipid];
+   if (nid == NUMA_NO_NODE) {
+      if (nodes_weight(chipid_map) >= MAX_NUMNODES)
+         return NUMA_NO_NODE;


If you create a KVM guest with a bogus topology, doesn't this just start
losing NUMA information for very high-noded guests?



I'll try to see if it is possible to hit this case; ideally we should
not allow more than MAX_NUMNODES chipids, and we should abort early.


+      nid = first_unset_node(chipid_map);
+      __map_chipid_to_nid(chipid, nid);
+      node_set(nid, chipid_map);
+   }
+   return nid;
+}
+
  int numa_cpu_lookup(int cpu)
  {
return numa_cpu_lookup_table[cpu];
@@ -264,7 +311,6 @@ out:
return chipid;
  }

-


stray change?



yep, will correct that.



Re: [PATCH v4 32/32] cxlflash: Fix to avoid potential deadlock on EEH

2015-09-29 Thread Matthew R. Ochs
> On Sep 28, 2015, at 6:41 PM, Brian King  wrote:
> On 09/25/2015 06:19 PM, Matthew R. Ochs wrote:
>> static int write_same16(struct scsi_device *sdev,
>> @@ -433,9 +451,20 @@ static int write_same16(struct scsi_device *sdev,
>>  	put_unaligned_be32(ws_limit < left ? ws_limit : left,
>>  			   &scsi_cmd[10]);
>> 
>> +	/* Drop the ioctl read semaphore across lengthy call */
>> +	up_read(&cfg->ioctl_rwsem);
>>  	result = scsi_execute(sdev, scsi_cmd, DMA_TO_DEVICE, cmd_buf,
>>  			      CMD_BUFSIZE, sense_buf, to, CMD_RETRIES,
>>  			      0, NULL);
>> +	down_read(&cfg->ioctl_rwsem);
>> +	rc = check_state(cfg);
>> +	if (rc) {
>> +		dev_err(dev, "%s: Failed state! result=0x%08X\n",
>> +			__func__, result);
>> +		rc = -ENODEV;
> 
> Since check_state only returns 0 or -ENODEV, this is a bit redundant, but not 
> worth redoing the
> patch in my mind.

Agreed. This occurred to me the other day after submitting this patch when I was
reviewing the state locking code. Will look at revising in a future patch.
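For illustration, the simplified call site could look like (a sketch only, not a submitted change):

	rc = check_state(cfg);
	if (rc) {
		dev_err(dev, "%s: Failed state! result=0x%08X\n",
			__func__, result);
		goto out;	/* propagate check_state()'s -ENODEV */
	}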

Thanks again for reviewing.


Re: [PATCH] perf record: Limit --intr-regs to platforms supporting PERF_REGS

2015-09-29 Thread Sukadev Bhattiprolu
Jiri Olsa [jo...@redhat.com] wrote:
| > I just tried it, but it fails. As Suka points out in his patch:
| > "Adding perf_regs.o to util/Build unconditionally, exposes a 
| > redefinition error for 'perf_reg_value()' function (due to the static 
| > inline version in util/perf_regs.h). So use #ifdef 
| > HAVE_PERF_REGS_SUPPORT' around that function."
| 
| could you (or Suka) please reply in here with the patch?
Jiri,

Do you mean this patch? I was planning on pinging Arnaldo again in a
couple of days about this patch, since the powerpc build is broken.

Sukadev

---


From d1171a4c34c6100ec8b663ddb803dd69ef3fb7ce Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Thu, 24 Sep 2015 17:53:49 -0400
Subject: [PATCH] perf: Fix build break on powerpc due to sample_reg_masks

perf_regs.c does not get built on powerpc, as CONFIG_PERF_REGS is false.
So the weak definition of 'sample_reg_masks' doesn't get picked up.

Adding perf_regs.o to util/Build unconditionally exposes a redefinition
error for the 'perf_reg_value()' function (due to the static inline version
in util/perf_regs.h). So use '#ifdef HAVE_PERF_REGS_SUPPORT' around that
function.

Signed-off-by: Sukadev Bhattiprolu 
---
 tools/perf/util/Build   | 2 +-
 tools/perf/util/perf_regs.c | 2 ++
 tools/perf/util/perf_regs.h | 4 
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 349bc96..e5f18a2 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -17,6 +17,7 @@ libperf-y += levenshtein.o
 libperf-y += llvm-utils.o
 libperf-y += parse-options.o
 libperf-y += parse-events.o
+libperf-y += perf_regs.o
 libperf-y += path.o
 libperf-y += rbtree.o
 libperf-y += bitmap.o
@@ -103,7 +104,6 @@ libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
 
 libperf-y += scripting-engines/
 
-libperf-$(CONFIG_PERF_REGS) += perf_regs.o
 libperf-$(CONFIG_ZLIB) += zlib.o
 libperf-$(CONFIG_LZMA) += lzma.o
 
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index 885e8ac..6b8eb13 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -6,6 +6,7 @@ const struct sample_reg __weak sample_reg_masks[] = {
SMPL_REG_END
 };
 
+#ifdef HAVE_PERF_REGS_SUPPORT
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
 {
int i, idx = 0;
@@ -29,3 +30,4 @@ out:
*valp = regs->cache_regs[id];
return 0;
 }
+#endif
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 2984dcc..8dbdfeb 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -3,6 +3,10 @@
 
 #include <linux/types.h>
 
+#ifndef __maybe_unused
+#define __maybe_unused __attribute__((unused))
+#endif
+
 struct regs_dump;
 
 struct sample_reg {
-- 
1.8.3.1
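As background on why the weak default must still be linked in, here is a tiny generic illustration of the weak-symbol pattern (a standalone example, not the perf sources):

#include <stdio.h>

/* Weak default: any strong definition elsewhere overrides it, but if
 * no object file providing it is linked at all, references fail with
 * exactly the 'undefined reference' error seen on powerpc.
 */
int __attribute__((weak)) sample_value = 42;

int main(void)
{
	printf("sample_value = %d\n", sample_value);
	return 0;
}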


Re: [PATCH v4 27/32] cxlflash: Fix to prevent stale AFU RRQ

2015-09-29 Thread Matthew R. Ochs
> On Sep 28, 2015, at 8:36 PM, Daniel Axtens  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
> 
> "Matthew R. Ochs"  writes:
> 
>> Following an adapter reset, the AFU RRQ that resides in host memory
>> holds stale data. This can lead to a condition where the RRQ interrupt
>> handler tries to process stale entries and/or endlessly loops due to an
>> out of sync generation bit.
>> 
>> To fix, the AFU RRQ in host memory needs to be cleared after each reset.
> 
> This looks good. Do you need anything to bail out of cxlflash_rrq_irq if
> the data goes stale or to all Fs while that function is running?

We're not performing an MMIO here, so I'm not sure how the all Fs check
would apply. We're also protected fairly well by the generation bit. I suppose
we could look at adding some type of 'max iterations' count to protect against
a runaway handler but that would be in a future patch.
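For what a 'max iterations' guard could look like, here is a minimal userspace sketch (the names, budget value, and valid-bit layout are all illustrative, not from the driver):

#include <stdio.h>
#include <stdint.h>

#define NUM_RRQ_ENTRY	64
#define IRQ_BUDGET	32	/* assumed per-interrupt cap */

static uint64_t rrq[NUM_RRQ_ENTRY];

/* An entry is 'valid' when its low (toggle) bit matches the current
 * generation; the budget bounds the work done per interrupt so a
 * stale or corrupt queue cannot spin the handler forever.
 */
static int process_rrq(unsigned int *idx, uint64_t toggle)
{
	int handled = 0;

	while (handled < IRQ_BUDGET && (rrq[*idx] & 1) == toggle) {
		/* ... complete the command referenced by rrq[*idx] ... */
		rrq[*idx] = 0;
		*idx = (*idx + 1) % NUM_RRQ_ENTRY;
		handled++;
	}

	return handled;	/* caller can defer the rest if the budget hit */
}

int main(void)
{
	unsigned int idx = 0;
	int i;

	for (i = 0; i < NUM_RRQ_ENTRY; i++)
		rrq[i] = 1;	/* simulate a queue full of valid entries */

	printf("handled %d of %d entries\n",
	       process_rrq(&idx, 1), NUM_RRQ_ENTRY);
	return 0;
}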



Re: [PATCH v4 17/32] cxlflash: Remove dual port online dependency

2015-09-29 Thread Matthew R. Ochs
> On Sep 28, 2015, at 6:37 PM, Daniel Axtens  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
> 
> Hi,
> 
>> static int afu_set_wwpn(struct afu *afu, int port, u64 *fc_regs, u64 wwpn)
>> {
>> -int ret = 0;
>> +int rc = 0;
> I realise it's nice to have things consistent, but making this change
> now makes the rest of the patch quite difficult to follow.

Next time I will try to separate out a change like this.

> 
>> 
>>  set_port_offline(fc_regs);
>> 
>> @@ -1038,33 +1038,26 @@ static int afu_set_wwpn(struct afu *afu, int port, u64 *fc_regs, u64 wwpn)
>>  if (!wait_port_offline(fc_regs, FC_PORT_STATUS_RETRY_INTERVAL_US,
>>      FC_PORT_STATUS_RETRY_CNT)) {
>>  pr_debug("%s: wait on port %d to go offline timed out\n",
>>   __func__, port);
>> -ret = -1; /* but continue on to leave the port back online */
>> +rc = -1; /* but continue on to leave the port back online */
>>  }
>> 
>> -if (ret == 0)
>> +if (rc == 0)
>>  writeq_be(wwpn, &fc_regs[FC_PNAME / 8]);
>> 
>> +/* Always return success after programming WWPN */
>> +rc = 0;
>> +
>>  set_port_online(fc_regs);
>> 
>>  if (!wait_port_online(fc_regs, FC_PORT_STATUS_RETRY_INTERVAL_US,
>>      FC_PORT_STATUS_RETRY_CNT)) {
>>  pr_debug("%s: wait on port %d to go online timed out\n",
>>   __func__, port);
>> -ret = -1;
>> -
>> -/*
>> - * Override for internal lun!!!
>> - */
>> -if (afu->internal_lun) {
>> -pr_debug("%s: Overriding port %d online timeout!!!\n",
>> - __func__, port);
>> -ret = 0;
>> -}
>>  }
>> 
>> -pr_debug("%s: returning rc=%d\n", __func__, ret);
>> +pr_debug("%s: returning rc=%d\n", __func__, rc);
> I'm not sure I fully understand the flow of this function, but it looks
> like you set rc=0 regardless of how things actually go: is this ever
> going to print a return value other than zero?

Correct, this function behaves more like a void for the time being. The
overall goal of this is to allow a card to configure even when the link is
down. At some later point when the link is transitioned to 'up', a link state
change interrupt will trigger the port configuration. I left this with a return
code for right now in case we need to alter the behavior again (based
upon testing) and actually return a value other than 0.
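If the return value never varies, one future option (a sketch derived from the quoted hunks, not a planned change) is to make the function void:

static void afu_set_wwpn(struct afu *afu, int port, u64 *fc_regs, u64 wwpn)
{
	set_port_offline(fc_regs);

	if (!wait_port_offline(fc_regs, FC_PORT_STATUS_RETRY_INTERVAL_US,
			       FC_PORT_STATUS_RETRY_CNT))
		pr_debug("%s: wait on port %d to go offline timed out\n",
			 __func__, port);
	else
		writeq_be(wwpn, &fc_regs[FC_PNAME / 8]);

	set_port_online(fc_regs);

	if (!wait_port_online(fc_regs, FC_PORT_STATUS_RETRY_INTERVAL_US,
			      FC_PORT_STATUS_RETRY_CNT))
		pr_debug("%s: wait on port %d to go online timed out\n",
			 __func__, port);
}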


Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-09-29 Thread Denis Kirjanov
On 9/29/15, Raghavendra K T  wrote:
> On 09/28/2015 10:34 PM, Nishanth Aravamudan wrote:
>> On 28.09.2015 [13:44:42 +0300], Denis Kirjanov wrote:
>>> On 9/27/15, Raghavendra K T  wrote:
 Problem description:
 Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
numbered (possibly) as 0,1,16,17. At a lower level, the chipid
obtained from the device tree is naturally mapped (directly) to the nid.
>>>
>>> Interesting thing to play with, I'll try to test it on my POWER7 box,
>>> but it doesn't have the OPAL layer :(
>
> Hi Denis,
> Thanks for your interest. I have pushed the patches to
>
> https://github.com/ktraghavendra/linux/tree/serialnuma_v1 in case that
> makes the patches easier to grab.

Thanks!
One sad thing is that I can't test the actual node id mapping now,
since currently I only have access to a machine with a single memory node
:/ Can we fake it through qemu?
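(For what it's worth, QEMU can fake a multi-node topology; an illustrative invocation, untested here, with the machine type and memory split as assumptions:)

qemu-system-ppc64 -machine pseries -smp 8 -m 4G \
    -numa node,nodeid=0,cpus=0-3,mem=2G \
    -numa node,nodeid=1,cpus=4-7,mem=2G \
    -kernel vmlinux

That should present two memory nodes to the guest kernel.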

>
>>
>> Note that it's also interesting to try it under PowerVM, with odd NUMA
>> topologies and report any issues found :)
>>
>
> Thanks Nish, I'll also grab a PowerVM machine and test.
>

Re: [PATCH v4 25/32] cxlflash: Fix to prevent EEH recovery failure

2015-09-29 Thread Matthew R. Ochs
> On Sep 28, 2015, at 8:25 PM, Daniel Axtens  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
> 
> "Matthew R. Ochs"  writes:
> 
> 
>> The process_sense() routine can perform a read capacity which
>> can take some time to complete. If an EEH occurs while waiting
>> on the read capacity, the EEH handler is unable to obtain the
>> context's mutex in order to put the context in an error state.
>> The EEH handler will sit and wait until the context is free,
>> but this wait can last longer than the EEH handler tolerates,
>> leading to a failed recovery.
> 
> I'm not quite clear on what you mean by the EEH handler timing
> out. AFAIK there's nothing in eehd and the EEH core that times out if a
> driver doesn't respond - indeed, it's pretty easy to hang eehd with a
> misbehaving driver.
> 
> Are you referring to your own internal timeouts?
> cxlflash_wait_for_pci_err_recovery and anything else that uses
> CXLFLASH_PCI_ERROR_RECOVERY_TIMEOUT?

Reading through this again I can see how this is misleading. This is
actually similar and related to the deadlock scenario described in
"Fix to avoid potential deadlock on EEH". Without this fix, you'd end
up in a similar situation but deadlocked on the context mutex instead
of the ioctl semaphore.
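In other words, the shape of the fix is roughly (an assumed sketch; read_cap16() stands in for the lengthy read capacity path):

	/* Put the context back before the long-running command so the
	 * EEH handler is never stuck waiting on ctxi->mutex, then
	 * revalidate the adapter state once the command completes.
	 */
	put_context(ctxi);
	rc = read_cap16(sdev, lli);
	if (check_state(cfg))
		rc = -ENODEV;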
