Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-13 Thread Hugh Dickins
On Tue, 13 May 2014, Jianyu Zhan wrote:

> >> This means they guarantee that even if they are preempted the vm
> >> counter won't be modified incorrectly, because the counter is page-related
> >> (e.g., a new anon page added) and they exclusively hold the pte lock.
> >
> >But there are multiple pte locks for numerous pages. Another process could
> >modify the counter because the pte lock for a different page was
> >available, which would cause counter corruption.
> >
> >
> >> So, as you conclude in the other mail that __mod_zone_page_stat
> >> couldn't be used in mlocked_vma_newpage, then what qualifies other
> >> call sites for using it, in the same situation?
> 
> Thanks, now everything is clear.
> 
> I've renewed the patch, would you please review it? Thanks!
> 
> ---<8---
> mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()
> 
> mlocked_vma_newpage() is called with pte lock held(a spinlock), which
> implies preemtion disabled, and the vm stat counter is not modified from
> interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
> using a light-weight version __mod_zone_page_state() would be OK.
> 
> This patch also documents __mod_zone_page_state() and some of its
> callsites. The comment above __mod_zone_page_state() is from Hugh
> Dickins, and acked by Christoph.
> 
> Most credit goes to Hugh and Christoph for the clarification on the usage of
> __mod_zone_page_state().
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Hugh Dickins <hughd@google.com>

No, I didn't sign it off, just offered some text (which you already
acknowledged graciously).  Andrew, please delete that line, but...

> Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>

Acked-by: Hugh Dickins <hughd@google.com>

Yes, this is good - and even better is the version which Andrew
has slipped into mmotm, removing the duplicated comment line from
page_remove_rmap(), and reformatting the comments to fit better.

It is a little random, in that we can easily see other
__mod_zone_page_states to which the same comment could be applied;
but that's okay, please leave it as is, you've made the point enough
to help other people get it, and it would get boring to repeat it more.

Hugh

> ---
>  mm/internal.h |  7 ++++++-
>  mm/rmap.c     | 10 ++++++++++
>  mm/vmstat.c   |  4 +++-
>  3 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 07b6736..53d439e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
>   return 0;
>  
>   if (!TestSetPageMlocked(page)) {
> - mod_zone_page_state(page_zone(page), NR_MLOCK,
> + /*
> +  * We use the irq-unsafe __mod_zone_page_stat because
> +  * this counter is not modified from interrupt context, and the
> +  * pte lock is held(spinlock), which implies preemtion disabled.
> +  */
> + __mod_zone_page_state(page_zone(page), NR_MLOCK,
>   hpage_nr_pages(page));
>   count_vm_event(UNEVICTABLE_PGMLOCKED);
>   }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9c3e773..2fa4375 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
>  {
>   int first = atomic_inc_and_test(&page->_mapcount);
>   if (first) {
> + /*
> +  * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +  * these counters are not modified in interrupt context, and
> +  * pte lock(a spinlock) is held, which implies preemtion disabled.
> +  */
>   if (PageTransHuge(page))
>   __inc_zone_page_state(page,
> NR_ANON_TRANSPARENT_HUGEPAGES);
> @@ -1077,6 +1082,11 @@ void page_remove_rmap(struct page *page)
>   /*
>* Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
>* and not charged by memcg for now.
> +  *
> +  * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +  * these counters are not modified in interrupt context, and
> +  * these counters are not modified in interrupt context, and
> +  * pte lock(a spinlock) is held, which implies preemtion disabled.
>*/
>   if (unlikely(PageHuge(page)))
>   goto out;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 302dd07..704928e 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
>  }
>  
>  /*
> - * For use when we know that interrupts are disabled.
> + * For use when we know that interrupts are disabled,
> + * or when we know that preemption is disabled and that
> + * particular counter cannot be updated from interrupt context.
>   */
>  void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>   int delta)
> -- 
> 2.0.0-rc1

Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-12 Thread Jianyu Zhan
>Preemption is mis-spelled throughout.
>
>Otherwise
>
>Reviewed-by: Christoph Lameter <cl@linux.com>

 Oops, corrected. Thanks.

-<8-
From 1d0a080429542accfeac7de03e159a0bea12abba Mon Sep 17 00:00:00 2001
From: Jianyu Zhan <nasa4836@gmail.com>
Date: Tue, 13 May 2014 00:19:08 +0800
Subject: [PATCH] mm: use the light version __mod_zone_page_state in
 mlocked_vma_newpage()

mlocked_vma_newpage() is called with the pte lock held (a spinlock), which
implies preemption is disabled, and the vm stat counter is not modified from
interrupt context, so we need not use the irq-safe mod_zone_page_state() here;
using the light-weight version __mod_zone_page_state() is OK.

This patch also documents __mod_zone_page_state() and some of its
callsites. The comment above __mod_zone_page_state() is from Hugh
Dickins, and acked by Christoph.

Most credit goes to Hugh and Christoph for the clarification on the usage of
__mod_zone_page_state().

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h | 7 ++++++-
 mm/rmap.c     | 9 +++++++++
 mm/vmstat.c   | 4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..d6a4868 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
return 0;
 
if (!TestSetPageMlocked(page)) {
-   mod_zone_page_state(page_zone(page), NR_MLOCK,
+   /*
+* We use the irq-unsafe __mod_zone_page_stat because
+* this counter is not modified from interrupt context, and the
+    * pte lock is held(spinlock), which implies preemption disabled.
+*/
+   __mod_zone_page_state(page_zone(page), NR_MLOCK,
hpage_nr_pages(page));
count_vm_event(UNEVICTABLE_PGMLOCKED);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..fa73194 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
int first = atomic_inc_and_test(&page->_mapcount);
if (first) {
+   /*
+* We use the irq-unsafe __{inc|mod}_zone_page_stat because
+* these counters are not modified from interrupt context, and
+    * pte lock(a spinlock) is held, which implies preemption disabled.
+*/
if (PageTransHuge(page))
__inc_zone_page_state(page,
  NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1082,10 @@ void page_remove_rmap(struct page *page)
/*
 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 * and not charged by memcg for now.
+*
+* We use the irq-unsafe __{inc|mod}_zone_page_stat because
+* these counters are not modified from interrupt context, and
+* pte lock(a spinlock) is held, which implies preemption disabled.
 */
if (unlikely(PageHuge(page)))
goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..704928e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that interrupts are disabled,
+ * or when we know that preemption is disabled and that
+ * particular counter cannot be updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
int delta)
-- 
2.0.0-rc1



Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-12 Thread Christoph Lameter
On Tue, 13 May 2014, Jianyu Zhan wrote:

> diff --git a/mm/internal.h b/mm/internal.h
> index 07b6736..53d439e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
>   return 0;
>
>   if (!TestSetPageMlocked(page)) {
> - mod_zone_page_state(page_zone(page), NR_MLOCK,
> + /*
> +  * We use the irq-unsafe __mod_zone_page_stat because
> +  * this counter is not modified from interrupt context, and the
> +  * pte lock is held(spinlock), which implies preemtion disabled.

Preemption is mis-spelled throughout.

Otherwise

Reviewed-by: Christoph Lameter <cl@linux.com>



Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-12 Thread Jianyu Zhan
>> This means they guarantee that even if they are preempted the vm
>> counter won't be modified incorrectly, because the counter is page-related
>> (e.g., a new anon page added) and they exclusively hold the pte lock.
>
>But there are multiple pte locks for numerous pages. Another process could
>modify the counter because the pte lock for a different page was
>available, which would cause counter corruption.
>
>
>> So, as you conclude in the other mail that __mod_zone_page_stat
>> couldn't be used in mlocked_vma_newpage, then what qualifies other
>> call sites for using it, in the same situation?

Thanks, now everything is clear.

I've renewed the patch, would you please review it? Thanks!

---<8---
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is called with pte lock held(a spinlock), which
implies preemtion disabled, and the vm stat counter is not modified from
interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
using a light-weight version __mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites. The comment above __mod_zone_page_state() is from Hugh
Dickins, and acked by Christoph.

Most credit goes to Hugh and Christoph for the clarification on the usage of
__mod_zone_page_state().

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h |  7 ++++++-
 mm/rmap.c     | 10 ++++++++++
 mm/vmstat.c   |  4 +++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..53d439e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
return 0;
 
if (!TestSetPageMlocked(page)) {
-   mod_zone_page_state(page_zone(page), NR_MLOCK,
+   /*
+* We use the irq-unsafe __mod_zone_page_stat because
+* this counter is not modified from interrupt context, and the
+* pte lock is held(spinlock), which implies preemtion disabled.
+*/
+   __mod_zone_page_state(page_zone(page), NR_MLOCK,
hpage_nr_pages(page));
count_vm_event(UNEVICTABLE_PGMLOCKED);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..2fa4375 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
int first = atomic_inc_and_test(&page->_mapcount);
if (first) {
+   /*
+* We use the irq-unsafe __{inc|mod}_zone_page_stat because
+* these counters are not modified in interrupt context, and
+    * pte lock(a spinlock) is held, which implies preemtion disabled.
+*/
if (PageTransHuge(page))
__inc_zone_page_state(page,
  NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1082,11 @@ void page_remove_rmap(struct page *page)
/*
 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 * and not charged by memcg for now.
+*
+* We use the irq-unsafe __{inc|mod}_zone_page_stat because
+* these counters are not modified in interrupt context, and
+* these counters are not modified in interrupt context, and
+* pte lock(a spinlock) is held, which implies preemtion disabled.
 */
if (unlikely(PageHuge(page)))
goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..704928e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that interrupts are disabled,
+ * or when we know that preemption is disabled and that
+ * particular counter cannot be updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
int delta)
-- 
2.0.0-rc1



Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-12 Thread Christoph Lameter
On Mon, 12 May 2014, Jianyu Zhan wrote:

> This means they guarantee that even if they are preempted the vm
> counter won't be modified incorrectly, because the counter is page-related
> (e.g., a new anon page added) and they exclusively hold the pte lock.

Ok. if these locations hold the pte lock then preemption is disabled and
you are ok to use __mod_zone_page_state. Has nothing to do with the page.
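
To make that concrete, here is a minimal sketch of the pattern those call
sites rely on (a hypothetical fragment for illustration, using 3.14-era
names; it is not taken from the patches in this thread):

	pte_t *pte;
	spinlock_t *ptl;

	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
	/*
	 * The spin_lock() taken inside pte_offset_map_lock() disables
	 * preemption (on a non-RT kernel), so the percpu read-modify-write
	 * done by __mod_zone_page_state() cannot be interleaved with
	 * another task on this cpu; NR_MLOCK is also never modified from
	 * interrupt context.
	 */
	__mod_zone_page_state(page_zone(page), NR_MLOCK,
			      hpage_nr_pages(page));
	pte_unmap_unlock(pte, ptl);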


Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-12 Thread Christoph Lameter
On Mon, 12 May 2014, Jianyu Zhan wrote:

> This means they guarantee that even if they are preempted the vm
> counter won't be modified incorrectly, because the counter is page-related
> (e.g., a new anon page added) and they exclusively hold the pte lock.

But there are multiple pte locks for numerous pages. Another process could
modify the counter because the pte lock for a different page was
available, which would cause counter corruption.


> So, as you conclude in the other mail that __mod_zone_page_stat
> couldn't be used in mlocked_vma_newpage, then what qualifies other
> call sites for using it, in the same situation?

Preemption should be off in those functions because a spinlock is being
held.


Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-12 Thread Jianyu Zhan
On Mon, May 12, 2014 at 10:01 PM, Christoph Lameter <cl@linux.com> wrote:
> >
> >/*
> > * For use when we know that interrupts are disabled,
> > * or when we know that preemption is disabled and that
> > * particular counter cannot be updated from interrupt context.
> > */
>
> The description above looks ok to me. The problem is that you are
> considering the page related data structures as an issue for races and not
> the data structures relevant for vm statistics.

Hi,  Christoph,

Yep. I did.

Let me restate the point here.

 To use  __mod_zone_page_stat when we know that either
 1. interrupts are disabled, or
 2. preemption is disabled and that particular counter cannot be
updated from interrupt context.

For those call sites currently using __mod_zone_page_stat, they just guarantee
the counter is never modified from an interrupt context, but they don't disable
preemption.

This means they guarantee that even if they are preempted the vm
counter won't be modified incorrectly, because the counter is page-related
(e.g., a new anon page added) and they exclusively hold the pte lock.

This is why I emphasized 'the page-related data structures as an
issue for races'.

So, as you conclude in the other mail that __mod_zone_page_stat
couldn't be used in mlocked_vma_newpage, then what qualifies other
call sites for using it, in the same situation?
See:

void page_add_new_anon_rmap(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
VM_BUG_ON(address < vma->vm_start || address >= vma->vm_end);
SetPageSwapBacked(page);
atomic_set(&page->_mapcount, 0); /* increment count (starts at -1) */
if (PageTransHuge(page))
__inc_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
__mod_zone_page_state(page_zone(page), NR_ANON_PAGES,
hpage_nr_pages(page)); <---  using it.
__page_set_anon_rmap(page, vma, address, 1);
if (!mlocked_vma_newpage(vma, page)) { <--- couldn't use it ?
SetPageActive(page);
lru_cache_add(page);
} else
add_page_to_unevictable_list(page);
}

Hope I have expressed it clearly enough.  Thanks.
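
The kind of counter corruption at stake can also be demonstrated outside the
kernel. A self-contained, hypothetical userspace demo of the lost-update race
on a two-step counter (not kernel code; build with: gcc -O0 -pthread demo.c):

#include <pthread.h>
#include <stdio.h>

static long counter;	/* stand-in for a percpu vmstat counter */

static void *bump(void *arg)
{
	int i;

	for (i = 0; i < 1000000; i++) {
		long x = counter;	/* step 1: read               */
		counter = x + 1;	/* step 2: modify, write back */
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* two contexts interleaving the two steps lose updates, just as
	 * a preempting task or an interrupt handler could in the kernel */
	pthread_create(&a, NULL, bump, NULL);
	pthread_create(&b, NULL, bump, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* typically prints far less than the expected 2000000 */
	printf("expected 2000000, got %ld\n", counter);
	return 0;
}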


Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-12 Thread Christoph Lameter
On Sun, 11 May 2014, Jianyu Zhan wrote:

> >
> >/*
> > * For use when we know that interrupts are disabled,
> > * or when we know that preemption is disabled and that
> > * particular counter cannot be updated from interrupt context.
> > */
>
>  Seconded. Christoph, would you please write a comment? I've written
>  a new one based on Hugh's, would you please also take a look? Thanks.

The description above looks ok to me. The problem is that you are
considering the page related data structures as an issue for races and not
the data structures relevant for vm statistics.

> It is essential to have such guarantees, because __mod_zone_page_stat()
> is a two-step operation: read-the-percpu-counter-then-modify-it.
> (Need comments. Christoph, do I misunderstand it?)

Yup.

> mlocked_vma_newpage() is only called in fault path by
> page_add_new_anon_rmap(), which is called on a *new* page.
> And such page is initially only visible via the pagetables, and the
> pte is locked while calling page_add_new_anon_rmap(), so we need not
> use an irq-safe mod_zone_page_state() here, using a light-weight version
> __mod_zone_page_state() would be OK.

This is wrong.  What you could say is that preemption is off and that the
counter is never incremented from an interrupt context that could race
with it. If this is the case then it would be safe.
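
To illustrate with a hypothetical single-cpu interleaving (not from the
thread, just the failure mode spelled out):

	/*
	 *   task:  x = __this_cpu_read(*p);      reads 5
	 *   irq:   __this_cpu_write(*p, 6);      handler bumps same counter
	 *   task:  __this_cpu_write(*p, x + 1);  writes 6, the irq's update
	 *                                        is silently lost
	 */

With preemption off but interrupts enabled, only a counter that is never
touched from interrupt context is immune to this.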


Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-11 Thread Jianyu Zhan
>Your original __mod_zone_page_state happened to be correct;
>but you have no understanding of why it was correct, so its
>comment was very wrong, even after you changed the irq wording.
>
>This series just propagates your misunderstanding further,
>while providing an object lesson in how not to present a series.
>
>Sorry, you have quickly developed an unenviable reputation for
>patches which waste developers' time: please consider your
>patches much more carefully before posting them.

Hi, Hugh.

I'm sorry for my misunderstanding in the previous patches. I should
have given them more thought. I'm really sorry. {palmface} X 3

And I really appreciate your detailed comments and your patience
with a noob like me ;-) Today I've spent some time digging into
the history to figure out what is under the hood.

>I'd prefer to let Christoph write the definitive version,
>but my first stab at it would be:
>
>/*
> * For use when we know that interrupts are disabled,
> * or when we know that preemption is disabled and that
> * particular counter cannot be updated from interrupt context.
> */

 Seconded. Christoph, would you please write a comment? I've written
 a new one based on Hugh's, would you please also take a look? Thanks.

>Once upon a time, from 2.6.16 to 2.6.32, there was indeed a relevant
>and helpful comment in __page_set_anon_rmap():
>/*
> * nr_mapped state can be updated without turning off
> * interrupts because it is not modified via interrupt.
> */
>__inc_page_state(nr_mapped);
>
>The comment survived the replacement of nr_mapped, but eventually
>it got cleaned away completely.
>
>It is safe to use the irq-unsafe __mod_zone_page_stat on counters
>which are never modified via interrupt.
>You are right that the comment is not good enough, but I don't trust
>your version either.  Since percpu variables are involved, it's important
>that preemption be disabled too (see comment above __inc_zone_state).

Thanks for clarifying this. I checked the history; this is my understanding,
correct me if I am wrong, thanks!

__mod_zone_page_stat() should be called with irqs off; this is the strongest
guarantee for safety. For a weaker guarantee, disabling preemption is needed,
and in this situation the item counter in question must not be modified in
interrupt context.

It is essential to have such guarantees, because __mod_zone_page_stat()
is a two-step operation: read-the-percpu-counter-then-modify-it.
(Need comments. Christoph, do I misunderstand it?)
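
For reference, the 3.14-era body of __mod_zone_page_state() in mm/vmstat.c
is roughly the following (quoted from memory, so treat it as a sketch); the
__this_cpu_read() and __this_cpu_write() are the two steps in question:

void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
				int delta)
{
	struct per_cpu_pageset __percpu *pcp = zone->pageset;
	s8 __percpu *p = pcp->vm_stat_diff + item;
	long x;
	long t;

	/* step 1: read this cpu's counter delta */
	x = delta + __this_cpu_read(*p);

	t = __this_cpu_read(pcp->stat_threshold);

	if (unlikely(x > t || x < -t)) {
		/* percpu delta overflowed: fold it into the zone counter */
		zone_page_state_add(x, zone, item);
		x = 0;
	}
	/* step 2: write it back; any update that slipped in between
	 * the read and this write is lost */
	__this_cpu_write(*p, x);
}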

For all other call sites of __mod_zone_page_stat() in the patch,
I think your comment is right: it is because the counters are not modified in
interrupt context that we could safely use __mod_zone_page_stat().
 
But for the call site in page_add_new_anon_rmap(), I think my previous
wording is appropriate: mlocked_vma_newpage() is only called in the fault
path by page_add_new_anon_rmap(), on a *new* page, which is initially only
visible via the pagetables, and the pte is locked while calling
page_add_new_anon_rmap(), so we are not afraid of others seeing it at this
point, let alone modifying it. So whether it could be modified in interrupt
context or not does matter, but it is not the key point here.

I've renewed the patch. And in this patch, I also change page_add_new_anon_rmap
to use __mod_zone_page_stat(), since the two changes are closely related and
belong in one patch.

Thanks for your review; I appreciate your comments. Andrew, Christoph,
would you please review it? Thanks.

-<8-
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is only called in fault path by
page_add_new_anon_rmap(), which is called on a *new* page.
And such page is initially only visible via the pagetables, and the
pte is locked while calling page_add_new_anon_rmap(), so we need not
use an irq-safe mod_zone_page_state() here, using a light-weight version
__mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h |  9 ++++++++-
 mm/rmap.c     | 11 +++++++++++
 mm/vmstat.c   |  5 ++++-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..7140c9b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,14 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
return 0;
 
if (!TestSetPageMlocked(page)) {
-   mod_zone_page_state(page_zone(page), NR_MLOCK,
+   /*
+* We use the irq-unsafe __mod_zone_page_stat because
+* 1. this counter is not modified in interrupt context, and
+* 2. pte lock is held, and this a newpage, which is initially
+*only visible via the pagetables. So this would exclude
+*racy processes from preemting us and to modify it.
+   


Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-10 Thread Hugh Dickins
On Sat, 10 May 2014, Jianyu Zhan wrote:

> __mod_zone_page_stat() is not irq-safe, so it should be used carefully.
> And it is not appropriately documented now. This patch adds a comment for
> it, and also documents some of its call sites.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>

Your original __mod_zone_page_state happened to be correct;
but you have no understanding of why it was correct, so its
comment was very wrong, even after you changed the irq wording.

This series just propagates your misunderstanding further,
while providing an object lesson in how not to present a series.

Sorry, you have quickly developed an unenviable reputation for
patches which waste developers' time: please consider your
patches much more carefully before posting them.

> ---
>  mm/page_alloc.c |  2 ++
>  mm/rmap.c       |  6 ++++++
>  mm/vmstat.c     | 16 +++++++++++++++-
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..9d6f474 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -659,6 +659,8 @@ static inline int free_pages_check(struct page *page)
>   *
>   * And clear the zone's pages_scanned counter, to hold off the "all pages are
>   * pinned" detection logic.
> + *
> + * Note: this function should be used with irq disabled.

Correct, but I don't see that that needed saying.  This is a static
function which is being used as intended: just because it matched your
search for "__mod_zone_page_state" is not a reason for you to add that
comment; irq disabled is only one of its prerequisites.

>   */
>  static void free_pcppages_bulk(struct zone *zone, int count,
>   struct per_cpu_pages *pcp)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9c3e773..6078a30 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -979,6 +979,8 @@ void page_add_anon_rmap(struct page *page,
>  /*
>   * Special version of the above for do_swap_page, which often runs
>   * into pages that are exclusively owned by the current process.
> + * So we could use the irq-unsafe version __{inc|mod}_zone_page_stat
> + * here without others racing change it in between.

And yet you can immediately see them being used without any test
for "exclusive" below: why is that?  Think about it.

>   * Everybody else should continue to use page_add_anon_rmap above.
>   */
>  void do_page_add_anon_rmap(struct page *page,
> @@ -1077,6 +1079,10 @@ void page_remove_rmap(struct page *page)
>   /*
>* Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
>* and not charged by memcg for now.
> +  *
> +  * And we are the last user of this page, so it is safe to use
> +  * the irq-unsafe version __{mod|dec}_zone_page here, since we
> +  * have no racer.

Again, the code is correct to be using the irq-unsafe version, but your
comment is doubly wrong.

We are not necessarily the last user of this page, merely the one that
just now brought the mapcount down to 0.

But think: what bearing would being the last user of this page have on
the safety of using __mod_zone_page_state to adjust per-zone counters?

None at all.  A page does not move from one zone to another (though its
contents might be migrated from one page to another when safe to do so).

Once upon a time, from 2.6.16 to 2.6.32, there was indeed a relevant
and helpful comment in __page_set_anon_rmap():
/*
 * nr_mapped state can be updated without turning off
 * interrupts because it is not modified via interrupt.
 */
__inc_page_state(nr_mapped);

The comment survived the replacement of nr_mapped, but eventually
it got cleaned away completely.

It is safe to use the irq-unsafe __mod_zone_page_stat on counters
which are never modified via interrupt.

>*/
>   if (unlikely(PageHuge(page)))
>   goto out;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 302dd07..778f154 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -207,7 +207,21 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
>  }
>  
>  /*
> - * For use when we know that interrupts are disabled.
> + * Optimized modification function.
> + *
> + * The code basically does the modification in two steps:
> + *
> + *  1. read the current counter based on the processor number
> + *  2. modify the counter and write it back.
> + *
> + * So this function should be used with the guarantee that
> + *
> + *  1. interrupts are disabled, or
> + *  2. interrupts are enabled, but no other sites would race to
> + * modify this counter in between.
> + *
> + * Otherwise, an irq-safe version mod_zone_page_state() should
> + * be used instead.

You are right that the comment is not good enough, but I don't trust
your version either.  Since percpu variables are involved, it's important
that preemption be disabled too (see comment above __inc_zone_state).
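
For comparison, the irq-safe mod_zone_page_state() of that era is essentially
the same update bracketed by local_irq_save()/local_irq_restore() (again a
sketch of the mm/vmstat.c wrapper, not text from this thread):

void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
					int delta)
{
	unsigned long flags;

	/* disabling irqs also disables preemption, so the percpu
	 * read-modify-write below is safe against both interrupt
	 * handlers and other tasks on this cpu */
	local_irq_save(flags);
	__mod_zone_page_state(zone, item, delta);
	local_irq_restore(flags);
}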

I'd prefer to let Christoph write the definitive version,
but my first stab at it would be:

/*
 * For use when we know that interrupts are disabled,
 * or when we know that preemption is disabled and that
 * particular counter cannot be updated from interrupt context.
 */

[PATCH 1/3] mm: add comment for __mod_zone_page_stat

2014-05-10 Thread Jianyu Zhan
__mod_zone_page_stat() is not irq-safe, so it should be used carefully.
And it is not appropriately documented now. This patch adds a comment for
it, and also documents some of its call sites.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/page_alloc.c |  2 ++
 mm/rmap.c       |  6 ++++++
 mm/vmstat.c     | 16 +++++++++++++++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..9d6f474 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -659,6 +659,8 @@ static inline int free_pages_check(struct page *page)
  *
  * And clear the zone's pages_scanned counter, to hold off the "all pages are
  * pinned" detection logic.
+ *
+ * Note: this function should be used with irq disabled.
  */
 static void free_pcppages_bulk(struct zone *zone, int count,
struct per_cpu_pages *pcp)
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..6078a30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -979,6 +979,8 @@ void page_add_anon_rmap(struct page *page,
 /*
  * Special version of the above for do_swap_page, which often runs
  * into pages that are exclusively owned by the current process.
+ * So we could use the irq-unsafe version __{inc|mod}_zone_page_stat
+ * here without others racing change it in between.
  * Everybody else should continue to use page_add_anon_rmap above.
  */
 void do_page_add_anon_rmap(struct page *page,
@@ -1077,6 +1079,10 @@ void page_remove_rmap(struct page *page)
/*
 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 * and not charged by memcg for now.
+*
+* And we are the last user of this page, so it is safe to use
+* the irq-unsafe version __{mod|dec}_zone_page here, since we
+* have no racer.
 */
if (unlikely(PageHuge(page)))
goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..778f154 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,21 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * Optimized modification function.
+ *
+ * The code basically does the modification in two steps:
+ *
+ *  1. read the current counter based on the processor number
+ *  2. modify the counter and write it back.
+ *
+ * So this function should be used with the guarantee that
+ *
+ *  1. interrupts are disabled, or
+ *  2. interrupts are enabled, but no other sites would race to
+ * modify this counter in between.
+ *
+ * Otherwise, an irq-safe version mod_zone_page_state() should
+ * be used instead.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
int delta)
-- 
2.0.0-rc1
