Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-30 Thread Jane Chu

Hi, Davidlohr,

On 7/30/2018 9:44 AM, Davidlohr Bueso wrote:


On Fri, 27 Jul 2018, Jane Chu wrote:


Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct) adds a new ->pagesize() function to
hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

With System V shared memory model, if "huge page" is specified,
the "shared memory" is backed by hugetlbfs files, but the mappings
initiated via shmget/shmat have their original vm_ops overwritten
with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
backed vma, result in below BUG:

fs/hugetlbfs/inode.c
   443 if (unlikely(page_mapped(page))) {
   444 BUG_ON(truncate_op);

[  242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
[  282.653208] ------------[ cut here ]------------
[  282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
[  282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
[  284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2

[  284.246609] task: 9bf0507aaf80 task.stack: a9e625628000
[  284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2

[  285.292389] Call Trace:
[  285.321630]  hugetlbfs_evict_inode+0x1e/0x3e
[  285.372707]  evict+0xdb/0x1af
[  285.408185]  iput+0x1a2/0x1f7
[  285.443661]  dentry_unlink_inode+0xc6/0xf0
[  285.492661]  __dentry_kill+0xd8/0x18d
[  285.536459]  dput+0x1b5/0x1ed
[  285.571939]  __fput+0x18b/0x216
[  285.609495]  fput+0xe/0x10
[  285.646030]  task_work_run+0x90/0xa7
[  285.688788]  exit_to_usermode_loop+0xdd/0x116
[  285.740905]  do_syscall_64+0x187/0x1ae
[  285.785740]  entry_SYSCALL_64_after_hwframe+0x150/0x0

Suggested-by: Mike Kravetz 
Signed-off-by: Jane Chu 


Acked-by: Davidlohr Bueso 


Thank you!

-jane

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-30 Thread Jane Chu

Hi, Michal,


On 7/30/2018 1:58 AM, Michal Hocko wrote:

On Fri 27-07-18 15:17:27, Jane Chu wrote:

Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct) adds a new ->pagesize() function to
hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

With System V shared memory model, if "huge page" is specified,
the "shared memory" is backed by hugetlbfs files, but the mappings
initiated via shmget/shmat have their original vm_ops overwritten
with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
backed vma, result in below BUG:

fs/hugetlbfs/inode.c
 443 if (unlikely(page_mapped(page))) {
 444 BUG_ON(truncate_op);

[  242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
[  282.653208] ------------[ cut here ]------------
[  282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
[  282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
[  284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
[  284.246609] task: 9bf0507aaf80 task.stack: a9e625628000
[  284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2

[  285.292389] Call Trace:
[  285.321630]  hugetlbfs_evict_inode+0x1e/0x3e
[  285.372707]  evict+0xdb/0x1af
[  285.408185]  iput+0x1a2/0x1f7
[  285.443661]  dentry_unlink_inode+0xc6/0xf0
[  285.492661]  __dentry_kill+0xd8/0x18d
[  285.536459]  dput+0x1b5/0x1ed
[  285.571939]  __fput+0x18b/0x216
[  285.609495]  fput+0xe/0x10
[  285.646030]  task_work_run+0x90/0xa7
[  285.688788]  exit_to_usermode_loop+0xdd/0x116
[  285.740905]  do_syscall_64+0x187/0x1ae
[  285.785740]  entry_SYSCALL_64_after_hwframe+0x150/0x0

Suggested-by: Mike Kravetz 
Signed-off-by: Jane Chu 

Acked-by: Michal Hocko 

with Cc: stable and Fixes: tag as suggested by Mike.

I also agree with Matthew that the comment should be associated with
hugetlb_vm_ops/shm_vm_ops.


Indeed, will make the above changes.
Thanks for reviewing!

-jane



Thanks!


---
 include/linux/mm.h |  7 +++++++
 ipc/shm.c          | 12 ++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9ffe380..0c759379f2d9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -387,6 +387,13 @@ enum page_entry_size {
  * These are the virtual MM functions - opening of an area, closing and
  * unmapping it (needed to keep files on disk up-to-date etc), pointer
  * to the functions called when a no-page or a wp-page exception occurs.
+ *
+ * Note, when a new function is introduced to vm_operations_struct and
+ * added to hugetlb_vm_ops, please consider adding the function to
+ * shm_vm_ops. This is because under System V memory model, though
+ * mappings created via shmget/shmat with "huge page" specified are
+ * backed by hugetlbfs files, their original vm_ops are overwritten with
+ * shm_vm_ops.
  */
 struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
diff --git a/ipc/shm.c b/ipc/shm.c
index 051a3e1fb8df..fefa00d310fb 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -427,6 +427,17 @@ static int shm_split(struct vm_area_struct *vma, unsigned long addr)
 	return 0;
 }
 
+static unsigned long shm_pagesize(struct vm_area_struct *vma)
+{
+	struct file *file = vma->vm_file;
+	struct shm_file_data *sfd = shm_file_data(file);
+
+	if (sfd->vm_ops->pagesize)
+		return sfd->vm_ops->pagesize(vma);
+
+	return PAGE_SIZE;
+}
+
 #ifdef CONFIG_NUMA
 static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
@@ -554,6 +565,7 @@ static const struct vm_operations_struct shm_vm_ops = {
 	.close	= shm_close,	/* callback for when the vm-area is released */
 	.fault	= shm_fault,
 	.split	= shm_split,
+	.pagesize = shm_pagesize,
 #if defined(CONFIG_NUMA)
 	.set_policy = shm_set_policy,
 	.get_policy = shm_get_policy,
--
2.15.GIT





Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-30 Thread Jane Chu

Hi, Matthew,


On 7/28/2018 12:02 PM, Matthew Wilcox wrote:

> On Fri, Jul 27, 2018 at 03:17:27PM -0600, Jane Chu wrote:
> > +++ b/include/linux/mm.h
> > @@ -387,6 +387,13 @@ enum page_entry_size {
> >   * These are the virtual MM functions - opening of an area, closing and
> >   * unmapping it (needed to keep files on disk up-to-date etc), pointer
> >   * to the functions called when a no-page or a wp-page exception occurs.
> > + *
> > + * Note, when a new function is introduced to vm_operations_struct and
> > + * added to hugetlb_vm_ops, please consider adding the function to
> > + * shm_vm_ops. This is because under System V memory model, though
> > + * mappings created via shmget/shmat with "huge page" specified are
> > + * backed by hugetlbfs files, their original vm_ops are overwritten with
> > + * shm_vm_ops.
> >   */
> >  struct vm_operations_struct {
>
> I don't think this header file is the right place for this comment.
> I'd think a better place for it would be at the definition of hugetlb_vm_ops.


Agreed, will make the change.
Thanks for reviewing!

-jane



Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-30 Thread Davidlohr Bueso

On Fri, 27 Jul 2018, Jane Chu wrote:


Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct) adds a new ->pagesize() function to
hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

With System V shared memory model, if "huge page" is specified,
the "shared memory" is backed by hugetlbfs files, but the mappings
initiated via shmget/shmat have their original vm_ops overwritten
with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
backed vma, result in below BUG:

fs/hugetlbfs/inode.c
   443 if (unlikely(page_mapped(page))) {
   444 BUG_ON(truncate_op);

[  242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
[  282.653208] ------------[ cut here ]------------
[  282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
[  282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
[  284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
[  284.246609] task: 9bf0507aaf80 task.stack: a9e625628000
[  284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2

[  285.292389] Call Trace:
[  285.321630]  hugetlbfs_evict_inode+0x1e/0x3e
[  285.372707]  evict+0xdb/0x1af
[  285.408185]  iput+0x1a2/0x1f7
[  285.443661]  dentry_unlink_inode+0xc6/0xf0
[  285.492661]  __dentry_kill+0xd8/0x18d
[  285.536459]  dput+0x1b5/0x1ed
[  285.571939]  __fput+0x18b/0x216
[  285.609495]  fput+0xe/0x10
[  285.646030]  task_work_run+0x90/0xa7
[  285.688788]  exit_to_usermode_loop+0xdd/0x116
[  285.740905]  do_syscall_64+0x187/0x1ae
[  285.785740]  entry_SYSCALL_64_after_hwframe+0x150/0x0

Suggested-by: Mike Kravetz 
Signed-off-by: Jane Chu 


Acked-by: Davidlohr Bueso 


Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-30 Thread Michal Hocko
On Fri 27-07-18 15:17:27, Jane Chu wrote:
> Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
> vm_operations_struct) adds a new ->pagesize() function to
> hugetlb_vm_ops, intended to cover all hugetlbfs backed files.
> 
> With System V shared memory model, if "huge page" is specified,
> the "shared memory" is backed by hugetlbfs files, but the mappings
> initiated via shmget/shmat have their original vm_ops overwritten
> with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
> Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
> backed vma, result in below BUG:
> 
> fs/hugetlbfs/inode.c
> 443 if (unlikely(page_mapped(page))) {
> 444 BUG_ON(truncate_op);
> 
> [  242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
> [  282.653208] ------------[ cut here ]------------
> [  282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
> [  282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
> [  284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
> [  284.246609] task: 9bf0507aaf80 task.stack: a9e625628000
> [  284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
> 
> [  285.292389] Call Trace:
> [  285.321630]  hugetlbfs_evict_inode+0x1e/0x3e
> [  285.372707]  evict+0xdb/0x1af
> [  285.408185]  iput+0x1a2/0x1f7
> [  285.443661]  dentry_unlink_inode+0xc6/0xf0
> [  285.492661]  __dentry_kill+0xd8/0x18d
> [  285.536459]  dput+0x1b5/0x1ed
> [  285.571939]  __fput+0x18b/0x216
> [  285.609495]  fput+0xe/0x10
> [  285.646030]  task_work_run+0x90/0xa7
> [  285.688788]  exit_to_usermode_loop+0xdd/0x116
> [  285.740905]  do_syscall_64+0x187/0x1ae
> [  285.785740]  entry_SYSCALL_64_after_hwframe+0x150/0x0
> 
> Suggested-by: Mike Kravetz 
> Signed-off-by: Jane Chu 

Acked-by: Michal Hocko 

with Cc: stable and Fixes: tag as suggested by Mike.

I also agree with Matthew that the comment should be associated with
hugetlb_vm_ops/shm_vm_ops.

Thanks!

> ---
>  include/linux/mm.h |  7 +++++++
>  ipc/shm.c          | 12 ++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a0fbb9ffe380..0c759379f2d9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -387,6 +387,13 @@ enum page_entry_size {
>   * These are the virtual MM functions - opening of an area, closing and
>   * unmapping it (needed to keep files on disk up-to-date etc), pointer
>   * to the functions called when a no-page or a wp-page exception occurs.
> + *
> + * Note, when a new function is introduced to vm_operations_struct and
> + * added to hugetlb_vm_ops, please consider adding the function to
> + * shm_vm_ops. This is because under System V memory model, though
> + * mappings created via shmget/shmat with "huge page" specified are
> + * backed by hugetlbfs files, their original vm_ops are overwritten with
> + * shm_vm_ops.
>   */
>  struct vm_operations_struct {
>   void (*open)(struct vm_area_struct * area);
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 051a3e1fb8df..fefa00d310fb 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -427,6 +427,17 @@ static int shm_split(struct vm_area_struct *vma, unsigned long addr)
>  	return 0;
>  }
>  
> +static unsigned long shm_pagesize(struct vm_area_struct *vma)
> +{
> +	struct file *file = vma->vm_file;
> +	struct shm_file_data *sfd = shm_file_data(file);
> +
> +	if (sfd->vm_ops->pagesize)
> +		return sfd->vm_ops->pagesize(vma);
> +
> +	return PAGE_SIZE;
> +}
> +
>  #ifdef CONFIG_NUMA
>  static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
>  {
> @@ -554,6 +565,7 @@ static const struct vm_operations_struct shm_vm_ops = {
>  	.close	= shm_close,	/* callback for when the vm-area is released */
>  	.fault	= shm_fault,
>  	.split	= shm_split,
> +	.pagesize = shm_pagesize,
>  #if defined(CONFIG_NUMA)
>  	.set_policy = shm_set_policy,
>  	.get_policy = shm_get_policy,
> -- 
> 2.15.GIT
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-28 Thread Matthew Wilcox
On Fri, Jul 27, 2018 at 03:17:27PM -0600, Jane Chu wrote:
> +++ b/include/linux/mm.h
> @@ -387,6 +387,13 @@ enum page_entry_size {
>   * These are the virtual MM functions - opening of an area, closing and
>   * unmapping it (needed to keep files on disk up-to-date etc), pointer
>   * to the functions called when a no-page or a wp-page exception occurs.
> + *
> + * Note, when a new function is introduced to vm_operations_struct and
> + * added to hugetlb_vm_ops, please consider adding the function to
> + * shm_vm_ops. This is because under System V memory model, though
> + * mappings created via shmget/shmat with "huge page" specified are
> + * backed by hugetlbfs files, their original vm_ops are overwritten with
> + * shm_vm_ops.
>   */
>  struct vm_operations_struct {

I don't think this header file is the right place for this comment.
I'd think a better place for it would be at the definition of hugetlb_vm_ops.


Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-27 Thread Jane Chu

Hi, Andrew,

On 7/27/2018 2:50 PM, Andrew Morton wrote:
> On Fri, 27 Jul 2018 15:17:27 -0600 Jane Chu wrote:
>
> > Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
> > vm_operations_struct) adds a new ->pagesize() function to
> > hugetlb_vm_ops, intended to cover all hugetlbfs backed files.
>
> That was merged three months ago.  Can you suggest why this was only
> noticed now?

The issue was recently reported by a QA engineer running an Oracle database
test on Oracle Linux. He first noticed it in upstream 4.17, then in 4.18,
but because the issue wasn't in an Oracle product it wasn't reported until
I recently cherry-picked the patch into Oracle Linux.

> What workload triggered this?  I see no cc:stable, but 4.17 is affected?

It's an Oracle database workload. Large shared memory segments (SGAs) were
created and shared among dozens to hundreds of processes. The crash occurs
when the test stops the database workload.  I do not have access to the
test source. Yes, 4.17 is affected.

> > With System V shared memory model, if "huge page" is specified,
> > the "shared memory" is backed by hugetlbfs files, but the mappings
> > initiated via shmget/shmat have their original vm_ops overwritten
> > with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
> > Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
> > backed vma, result in below BUG:
> >
> > fs/hugetlbfs/inode.c
> >  443 if (unlikely(page_mapped(page))) {
> >  444 BUG_ON(truncate_op);
>
> OK, help me out here.  How does an incorrect return value from
> vma_kernel_pagesize() result in remove_inode_hugepages() deciding that
> it's truncating a mapped page?

To be honest, I don't have a satisfactory answer to how the wrong
pagesize causes a page that's about to be truncated to remain mapped.
I relied on the hindsight of BUG_ON(truncate_op).

At one point I inserted dump_stack() into vma_kernel_pagesize(), as Mike
suggested, to try to dig out more:

 unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
 {
-	if (vma->vm_ops && vma->vm_ops->pagesize)
+	if (vma->vm_ops && vma->vm_ops->pagesize) {
 		return vma->vm_ops->pagesize(vma);
+	} else if (is_vm_hugetlb_page(vma)) {
+		struct hstate *hstate;
+		dump_stack();
+		hstate = hstate_vma(vma);
+		return 1UL << huge_page_shift(hstate);
+	}
 	return PAGE_SIZE;
 }

So many stack traces clogged the console that I didn't capture the
entire output; perhaps I should go back and capture it.

Any other ideas?

Regards,
-jane
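
For readers unfamiliar with the path being discussed, the following is a
minimal user-space sketch of how a hugetlbfs-backed System V segment is
created and attached with shmget/shmat. It is not the Oracle test (whose
source is not available) and is not claimed to reproduce the BUG. The helper
name attach_huge_segment() is hypothetical, and the sketch assumes that len
is a multiple of the huge page size (for example 2 MiB on x86_64).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Fallback for C libraries that hide SHM_HUGETLB without _GNU_SOURCE. */
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif

/*
 * Hypothetical helper: create a private SysV segment backed by huge pages,
 * attach it, touch one byte, then detach and remove it.
 * Returns 0 on success or the errno value of the first failing call.
 */
int attach_huge_segment(size_t len)
{
	int err = 0;
	void *addr;
	int id;

	/* SHM_HUGETLB asks the kernel to back the segment with hugetlbfs. */
	id = shmget(IPC_PRIVATE, len, IPC_CREAT | SHM_HUGETLB | 0600);
	if (id < 0)
		return errno;

	/* The resulting mapping is the kind that ends up using shm_vm_ops. */
	addr = shmat(id, NULL, 0);
	if (addr == (void *)-1) {
		err = errno;
	} else {
		((volatile char *)addr)[0] = 1;	/* fault in the first page */
		if (shmdt(addr) < 0)
			err = errno;
	}

	/* Mark the segment for removal so it does not outlive the caller. */
	if (shmctl(id, IPC_RMID, NULL) < 0 && err == 0)
		err = errno;

	return err;
}
```

Calling attach_huge_segment(2UL << 20) returns 0 when a huge page can be
reserved for the segment, and a non-zero errno value otherwise, for example
on a system with no huge pages configured.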




Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-27 Thread Andrew Morton
On Fri, 27 Jul 2018 15:17:27 -0600 Jane Chu  wrote:

> Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
> vm_operations_struct) adds a new ->pagesize() function to
> hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

That was merged three months ago.  Can you suggest why this was only
noticed now?

What workload triggered this?  I see no cc:stable, but 4.17 is affected?

> With System V shared memory model, if "huge page" is specified,
> the "shared memory" is backed by hugetlbfs files, but the mappings
> initiated via shmget/shmat have their original vm_ops overwritten
> with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
> Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
> backed vma, result in below BUG:
> 
> fs/hugetlbfs/inode.c
> 443 if (unlikely(page_mapped(page))) {
> 444 BUG_ON(truncate_op);

OK, help me out here.  How does an incorrect return value from
vma_kernel_pagesize() result in remove_inode_hugepages() deciding that
it's truncating a mapped page?




Re: [PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-27 Thread Mike Kravetz
On 07/27/2018 02:17 PM, Jane Chu wrote:
> Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
> vm_operations_struct) adds a new ->pagesize() function to
> hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

Thanks Jane!
Adding Dan on Cc as he authored 05ea88608d4e13.  Note that this is the
same type of omission that was made when adding ->split() to
vm_operations_struct. :(  That is why I suggested adding the comment
above vm_operations_struct.

> With System V shared memory model, if "huge page" is specified,
> the "shared memory" is backed by hugetlbfs files, but the mappings
> initiated via shmget/shmat have their original vm_ops overwritten
> with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
> Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
> backed vma, result in below BUG:
> 
> fs/hugetlbfs/inode.c
> 443 if (unlikely(page_mapped(page))) {
> 444 BUG_ON(truncate_op);
> 
> [  242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
> [  282.653208] ------------[ cut here ]------------
> [  282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
> [  282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
> [  284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
> [  284.246609] task: 9bf0507aaf80 task.stack: a9e625628000
> [  284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
> 
> [  285.292389] Call Trace:
> [  285.321630]  hugetlbfs_evict_inode+0x1e/0x3e
> [  285.372707]  evict+0xdb/0x1af
> [  285.408185]  iput+0x1a2/0x1f7
> [  285.443661]  dentry_unlink_inode+0xc6/0xf0
> [  285.492661]  __dentry_kill+0xd8/0x18d
> [  285.536459]  dput+0x1b5/0x1ed
> [  285.571939]  __fput+0x18b/0x216
> [  285.609495]  fput+0xe/0x10
> [  285.646030]  task_work_run+0x90/0xa7
> [  285.688788]  exit_to_usermode_loop+0xdd/0x116
> [  285.740905]  do_syscall_64+0x187/0x1ae
> [  285.785740]  entry_SYSCALL_64_after_hwframe+0x150/0x0

We will need the tag,
Fixes: 05ea88608d4e13 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct")
and,
CC: sta...@vger.kernel.org

> Suggested-by: Mike Kravetz 
> Signed-off-by: Jane Chu 
> ---
>  include/linux/mm.h |  7 +++++++
>  ipc/shm.c          | 12 ++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a0fbb9ffe380..0c759379f2d9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -387,6 +387,13 @@ enum page_entry_size {
>   * These are the virtual MM functions - opening of an area, closing and
>   * unmapping it (needed to keep files on disk up-to-date etc), pointer
>   * to the functions called when a no-page or a wp-page exception occurs.
> + *
> + * Note, when a new function is introduced to vm_operations_struct and
> + * added to hugetlb_vm_ops, please consider adding the function to
> + * shm_vm_ops. This is because under System V memory model, though
> + * mappings created via shmget/shmat with "huge page" specified are
> + * backed by hugetlbfs files, their original vm_ops are overwritten with
> + * shm_vm_ops.
>   */
>  struct vm_operations_struct {
>   void (*open)(struct vm_area_struct * area);
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 051a3e1fb8df..fefa00d310fb 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -427,6 +427,17 @@ static int shm_split(struct vm_area_struct *vma, unsigned long addr)
>  	return 0;
>  }
>  
> +static unsigned long shm_pagesize(struct vm_area_struct *vma)
> +{
> +	struct file *file = vma->vm_file;
> +	struct shm_file_data *sfd = shm_file_data(file);
> +
> +	if (sfd->vm_ops->pagesize)
> +		return sfd->vm_ops->pagesize(vma);
> +
> +	return PAGE_SIZE;
> +}
> +
>  #ifdef CONFIG_NUMA
>  static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
>  {
> @@ -554,6 +565,7 @@ static const struct vm_operations_struct shm_vm_ops = {
>  	.close	= shm_close,	/* callback for when the vm-area is released */
>  	.fault	= shm_fault,
>  	.split	= shm_split,
> +	.pagesize = shm_pagesize,
>  #if defined(CONFIG_NUMA)
>  	.set_policy = shm_set_policy,
>  	.get_policy = shm_get_policy,
> 

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz


[PATCH] ipc/shm.c add ->pagesize function to shm_vm_ops

2018-07-27 Thread Jane Chu
Commit 05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct) adds a new ->pagesize() function to
hugetlb_vm_ops, intended to cover all hugetlbfs backed files.

With System V shared memory model, if "huge page" is specified,
the "shared memory" is backed by hugetlbfs files, but the mappings
initiated via shmget/shmat have their original vm_ops overwritten
with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops.
Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs
backed vma, result in below BUG:

fs/hugetlbfs/inode.c
443 if (unlikely(page_mapped(page))) {
444 BUG_ON(truncate_op);

[  242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
[  282.653208] ------------[ cut here ]------------
[  282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
[  282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
[  284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
[  284.246609] task: 9bf0507aaf80 task.stack: a9e625628000
[  284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2

[  285.292389] Call Trace:
[  285.321630]  hugetlbfs_evict_inode+0x1e/0x3e
[  285.372707]  evict+0xdb/0x1af
[  285.408185]  iput+0x1a2/0x1f7
[  285.443661]  dentry_unlink_inode+0xc6/0xf0
[  285.492661]  __dentry_kill+0xd8/0x18d
[  285.536459]  dput+0x1b5/0x1ed
[  285.571939]  __fput+0x18b/0x216
[  285.609495]  fput+0xe/0x10
[  285.646030]  task_work_run+0x90/0xa7
[  285.688788]  exit_to_usermode_loop+0xdd/0x116
[  285.740905]  do_syscall_64+0x187/0x1ae
[  285.785740]  entry_SYSCALL_64_after_hwframe+0x150/0x0

Suggested-by: Mike Kravetz 
Signed-off-by: Jane Chu 
---
 include/linux/mm.h |  7 +++++++
 ipc/shm.c          | 12 ++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9ffe380..0c759379f2d9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -387,6 +387,13 @@ enum page_entry_size {
  * These are the virtual MM functions - opening of an area, closing and
  * unmapping it (needed to keep files on disk up-to-date etc), pointer
  * to the functions called when a no-page or a wp-page exception occurs.
+ *
+ * Note, when a new function is introduced to vm_operations_struct and
+ * added to hugetlb_vm_ops, please consider adding the function to
+ * shm_vm_ops. This is because under System V memory model, though
+ * mappings created via shmget/shmat with "huge page" specified are
+ * backed by hugetlbfs files, their original vm_ops are overwritten with
+ * shm_vm_ops.
  */
 struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
diff --git a/ipc/shm.c b/ipc/shm.c
index 051a3e1fb8df..fefa00d310fb 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -427,6 +427,17 @@ static int shm_split(struct vm_area_struct *vma, unsigned long addr)
 	return 0;
 }
 
+static unsigned long shm_pagesize(struct vm_area_struct *vma)
+{
+	struct file *file = vma->vm_file;
+	struct shm_file_data *sfd = shm_file_data(file);
+
+	if (sfd->vm_ops->pagesize)
+		return sfd->vm_ops->pagesize(vma);
+
+	return PAGE_SIZE;
+}
+
 #ifdef CONFIG_NUMA
 static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
@@ -554,6 +565,7 @@ static const struct vm_operations_struct shm_vm_ops = {
 	.close	= shm_close,	/* callback for when the vm-area is released */
 	.fault	= shm_fault,
 	.split	= shm_split,
+	.pagesize = shm_pagesize,
 #if defined(CONFIG_NUMA)
 	.set_policy = shm_set_policy,
 	.get_policy = shm_get_policy,
-- 
2.15.GIT
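
The delegation problem described in the changelog above can be illustrated
with a small user-space model. This is only a sketch, not kernel code: the
types struct vm_ops, struct shm_data and struct vma, the constants
MODEL_PAGE_SIZE and MODEL_HUGE_PAGE_SIZE, and the function vma_pagesize()
are simplified, hypothetical stand-ins for vm_operations_struct,
shm_file_data, vm_area_struct, PAGE_SIZE, the huge page size and
vma_kernel_pagesize(). A 4 KiB base page and a 2 MiB huge page are assumed.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE      4096UL		/* assumed base page size */
#define MODEL_HUGE_PAGE_SIZE (1UL << 21)	/* assumed 2 MiB huge page */

struct vma;

/* Toy stand-in for vm_operations_struct: only the ->pagesize hook. */
struct vm_ops {
	unsigned long (*pagesize)(struct vma *vma);
};

/* Toy stand-in for shm_file_data: remembers the backing file's ops. */
struct shm_data {
	const struct vm_ops *backing_ops;
};

/* Toy stand-in for vm_area_struct. */
struct vma {
	const struct vm_ops *ops;	/* ops installed on the mapping */
	struct shm_data *sfd;		/* per-attach SysV shm data */
};

/* What the hugetlbfs side reports for its mappings. */
unsigned long hugetlb_pagesize(struct vma *vma)
{
	(void)vma;
	return MODEL_HUGE_PAGE_SIZE;
}

const struct vm_ops hugetlb_ops = { .pagesize = hugetlb_pagesize };

/* Wrapper ops before the patch: no ->pagesize hook at all. */
const struct vm_ops shm_ops_before = { .pagesize = NULL };

/* Wrapper hook after the patch: forward to the backing ops if present. */
unsigned long shm_pagesize(struct vma *vma)
{
	const struct vm_ops *inner = vma->sfd->backing_ops;

	if (inner && inner->pagesize)
		return inner->pagesize(vma);
	return MODEL_PAGE_SIZE;
}

const struct vm_ops shm_ops_after = { .pagesize = shm_pagesize };

/* Caller side: ask the installed ops, else fall back to the base size. */
unsigned long vma_pagesize(struct vma *vma)
{
	if (vma->ops && vma->ops->pagesize)
		return vma->ops->pagesize(vma);
	return MODEL_PAGE_SIZE;
}
```

With a struct shm_data whose backing_ops points at hugetlb_ops, a struct vma
using shm_ops_before makes vma_pagesize() fall back to MODEL_PAGE_SIZE, while
the same vma using shm_ops_after yields MODEL_HUGE_PAGE_SIZE. Overwriting the
ops hides the backing file's hook unless the wrapper forwards it.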
