from:"Wang, Yalin"

RE: [RFC V2] proc: change /proc/stat show

2015-03-19 Thread Wang, Yalin

> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Thursday, March 19, 2015 6:08 AM
> To: Wang, Yalin
> Cc: 'David Rientjes'; 'heiko.carst...@de.ibm.com'; 't...@linutronix.de';
> 'linux-kernel@vger.kernel.org'
> Subject: Re: [RFC V2] proc: change /proc/stat show
> 
> On Wed, 18 Mar 2015 10:35:39 +0800 "Wang, Yalin"
>  wrote:
> 
> > This patch change /proc/stat to show each cpu,
> > we show each present cpus instead of eacn online cpu,
> > because some cpus are online / offline dynamically,
> > we should also show its cputime even it is offline,
> > some lib will read this file to detect cpu numbers,
> > we should also return the real present cpu numbers,
> > not just online cpus.
> 
> /proc/cpuinfo also skips offline CPUs.
> 
> > --- a/fs/proc/stat.c
> > +++ b/fs/proc/stat.c
> > @@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
> > seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
> > seq_putc(p, '\n');
> >
> > -   for_each_online_cpu(i) {
> > +   for_each_present_cpu(i) {
> > /* Copy values here to work around gcc-2.95.3, gcc-2.96 */
> > user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
> > nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];
> 
> It's a non-back-compatible userspace interface change and I don't know
> what might break as a result.  For example, anyone who is using
> /proc/stat to find out which CPUs are online will get a surprise!
> 
I see,
I found on android, sysconf(_SC_NPROCESSORS_ONLN) function is implemented by
read /proc/stat file, and return the online processor numbers,
it will be non-compatible if I change this.

Thanks for your comments.





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V2] proc: change /proc/stat show

2015-03-19 Thread Wang, Yalin

 -Original Message-
 From: Andrew Morton [mailto:a...@linux-foundation.org]
 Sent: Thursday, March 19, 2015 6:08 AM
 To: Wang, Yalin
 Cc: 'David Rientjes'; 'heiko.carst...@de.ibm.com'; 't...@linutronix.de';
 'linux-kernel@vger.kernel.org'
 Subject: Re: [RFC V2] proc: change /proc/stat show

 On Wed, 18 Mar 2015 10:35:39 +0800 Wang, Yalin
 yalin.w...@sonymobile.com wrote:

  This patch change /proc/stat to show each cpu,
  we show each present cpus instead of eacn online cpu,
  because some cpus are online / offline dynamically,
  we should also show its cputime even it is offline,
  some lib will read this file to detect cpu numbers,
  we should also return the real present cpu numbers,
  not just online cpus.

 /proc/cpuinfo also skips offline CPUs.

  --- a/fs/proc/stat.c
  +++ b/fs/proc/stat.c
  @@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
  seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
  seq_putc(p, '\n');

  -   for_each_online_cpu(i) {
  +   for_each_present_cpu(i) {
  /* Copy values here to work around gcc-2.95.3, gcc-2.96 */
  user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
  nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];

 It's a non-back-compatible userspace interface change and I don't know
 what might break as a result.  For example, anyone who is using
 /proc/stat to find out which CPUs are online will get a surprise!

I see,
I found on android, sysconf(_SC_NPROCESSORS_ONLN) function is implemented by
read /proc/stat file, and return the online processor numbers,
it will be non-compatible if I change this.

Thanks for your comments.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] proc: change /proc/stat show

2015-03-17 Thread Wang, Yalin

This patch change /proc/stat to show each cpu,
we show each present cpus instead of eacn online cpu,
because some cpus are online / offline dynamically,
we should also show its cputime even it is offline,
some lib will read this file to detect cpu numbers,
we should also return the real present cpu numbers,
not just online cpus.

Signed-off-by: Yalin Wang 
---
 fs/proc/stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 510413eb..f009cdd 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
seq_putc(p, '\n');
 
-   for_each_online_cpu(i) {
+   for_each_present_cpu(i) {
/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] proc: change /proc/stat show

2015-03-17 Thread Wang, Yalin

> -Original Message-
> From: David Rientjes [mailto:rient...@google.com]
> Sent: Wednesday, March 18, 2015 7:02 AM
> To: Wang, Yalin
> Cc: a...@linux-foundation.org; heiko.carst...@de.ibm.com;
> t...@linutronix.de; linux-kernel@vger.kernel.org
> Subject: Re: [RFC] proc: change /proc/stat show
> 
> On Tue, 17 Mar 2015, Wang, Yalin wrote:
> 
> > diff --git a/fs/proc/stat.c b/fs/proc/stat.c
> > index 510413eb..60b39e2 100644
> > --- a/fs/proc/stat.c
> > +++ b/fs/proc/stat.c
> > @@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
> > seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
> > seq_putc(p, '\n');
> >
> > -   for_each_online_cpu(i) {
> > +   for_each_present_cpu(i) {
> > /* Copy values here to work around gcc-2.95.3, gcc-2.96 */
> > user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
> > nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];
> > @@ -153,6 +153,7 @@ static int show_stat(struct seq_file *p, void *v)
> > seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
> > seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
> > seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
> > +   seq_printf(p, " %d", cpu_online(i)? 1 : 0);
> > seq_putc(p, '\n');
> > }
> > seq_printf(p, "intr %llu", (unsigned long long)sum);
> 
> Makes sense, but why do we need to output the cpu state as part of
> /proc/stat?  This information should already be available elsewhere.
Yeah,  online / offline info can be read from /sys/devices/cpu .
I can remove this line , if you think it is not needed .

Thanks



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] proc: change /proc/stat show

2015-03-17 Thread Wang, Yalin

This patch change /proc/stat to show each cpu,
we show each present cpus instead of eacn online cpu,
because some cpus are online / offline dynamically,
we should also show its cputime even it is offline,
some lib will read this file to detect cpu numbers,
we should also return the real present cpu numbers,
not just online cpus.

Signed-off-by: Yalin Wang 
---
 fs/proc/stat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 510413eb..60b39e2 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
seq_putc(p, '\n');
 
-   for_each_online_cpu(i) {
+   for_each_present_cpu(i) {
/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];
@@ -153,6 +153,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_printf(p, " %d", cpu_online(i)? 1 : 0);
seq_putc(p, '\n');
}
seq_printf(p, "intr %llu", (unsigned long long)sum);
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] proc: change /proc/stat show

2015-03-17 Thread Wang, Yalin

This patch change /proc/stat to show each cpu,
we show each present cpus instead of eacn online cpu,
because some cpus are online / offline dynamically,
we should also show its cputime even it is offline,
some lib will read this file to detect cpu numbers,
we should also return the real present cpu numbers,
not just online cpus.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/proc/stat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 510413eb..60b39e2 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
seq_putc(p, '\n');
 
-   for_each_online_cpu(i) {
+   for_each_present_cpu(i) {
/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];
@@ -153,6 +153,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_printf(p,  %d, cpu_online(i)? 1 : 0);
seq_putc(p, '\n');
}
seq_printf(p, intr %llu, (unsigned long long)sum);
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] proc: change /proc/stat show

2015-03-17 Thread Wang, Yalin

 -Original Message-
 From: David Rientjes [mailto:rient...@google.com]
 Sent: Wednesday, March 18, 2015 7:02 AM
 To: Wang, Yalin
 Cc: a...@linux-foundation.org; heiko.carst...@de.ibm.com;
 t...@linutronix.de; linux-kernel@vger.kernel.org
 Subject: Re: [RFC] proc: change /proc/stat show
 
 On Tue, 17 Mar 2015, Wang, Yalin wrote:
 
  diff --git a/fs/proc/stat.c b/fs/proc/stat.c
  index 510413eb..60b39e2 100644
  --- a/fs/proc/stat.c
  +++ b/fs/proc/stat.c
  @@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
  seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
  seq_putc(p, '\n');
 
  -   for_each_online_cpu(i) {
  +   for_each_present_cpu(i) {
  /* Copy values here to work around gcc-2.95.3, gcc-2.96 */
  user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
  nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];
  @@ -153,6 +153,7 @@ static int show_stat(struct seq_file *p, void *v)
  seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
  seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
  seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
  +   seq_printf(p,  %d, cpu_online(i)? 1 : 0);
  seq_putc(p, '\n');
  }
  seq_printf(p, intr %llu, (unsigned long long)sum);
 
 Makes sense, but why do we need to output the cpu state as part of
 /proc/stat?  This information should already be available elsewhere.
Yeah,  online / offline info can be read from /sys/devices/cpu .
I can remove this line , if you think it is not needed .

Thanks



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] proc: change /proc/stat show

2015-03-17 Thread Wang, Yalin

This patch change /proc/stat to show each cpu,
we show each present cpus instead of eacn online cpu,
because some cpus are online / offline dynamically,
we should also show its cputime even it is offline,
some lib will read this file to detect cpu numbers,
we should also return the real present cpu numbers,
not just online cpus.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/proc/stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 510413eb..f009cdd 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -130,7 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
seq_putc(p, '\n');
 
-   for_each_online_cpu(i) {
+   for_each_present_cpu(i) {
/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE];
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mm:do recheck for freeable page in reclaim path

2015-03-11 Thread Wang, Yalin

In reclaim path, if encounter a freeable page,
the try_to_unmap may fail, because the page's pte is
dirty, we can recheck this page as normal non-freeable page,
this means we can swap out this page into swap partition.

Signed-off-by: Yalin Wang 
---
 mm/vmscan.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 260c413..9930850 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1000,6 +1000,7 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
}
}
 
+recheck:
if (!force_reclaim)
references = page_check_references(page, sc,
);
@@ -1045,6 +1046,10 @@ unmap:
switch (try_to_unmap(page,
freeable ? TTU_FREE : ttu_flags)) {
case SWAP_FAIL:
+   if (freeable) {
+   freeable = false;
+   goto recheck;
+   }
goto activate_locked;
case SWAP_AGAIN:
goto keep_locked;
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC ] mm: don't ignore file map pages for madvise_free( )

2015-03-11 Thread Wang, Yalin

Hi

I just want to explain my ideas about file map pages for madvise_free() syscall.
As the following patch,
For file map vma, there is 2 types:
1. private file map
In this type, the pages of this vma are file map pages or anon page 
(when COW happened),
2. shared file map
In this type, the pages of this vma are all file map pages.

No matter which type file map,
We can handle file map vma as the following:
If the page is file map pages,
We just clear its pte young bit(pte_mkold()),
This will have some advantages, it will make page
Reclaim path move this file map page into inactive
lru list aggressively.

If the page is anon map page, we just handle it in the
Same way as for the pages in anon vma.


---

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..8fdc82f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -322,7 +322,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long 
addr,
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
ptent = pte_mkold(ptent);
-   ptent = pte_mkclean(ptent);
+   if (PageAnon(page))
+   ptent = pte_mkclean(ptent);
set_pte_at(mm, addr, pte, ptent);
tlb_remove_tlb_entry(tlb, pte, addr);
}
@@ -364,10 +365,6 @@ static int madvise_free_single_vma(struct vm_area_struct 
*vma,
if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
return -EINVAL;

-   /* MADV_FREE works for only anon vma at the moment */
-   if (vma->vm_file)
-   return -EINVAL;
-
start = max(vma->vm_start, start_addr);
if (start >= vma->vm_end)
return -EINVAL;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mm:do recheck for freeable page in reclaim path

2015-03-11 Thread Wang, Yalin

In reclaim path, if encounter a freeable page,
the try_to_unmap may fail, because the page's pte is
dirty, we can recheck this page as normal non-freeable page,
this means we can swap out this page into swap partition.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 mm/vmscan.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 260c413..9930850 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1000,6 +1000,7 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
}
}
 
+recheck:
if (!force_reclaim)
references = page_check_references(page, sc,
freeable);
@@ -1045,6 +1046,10 @@ unmap:
switch (try_to_unmap(page,
freeable ? TTU_FREE : ttu_flags)) {
case SWAP_FAIL:
+   if (freeable) {
+   freeable = false;
+   goto recheck;
+   }
goto activate_locked;
case SWAP_AGAIN:
goto keep_locked;
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC ] mm: don't ignore file map pages for madvise_free( )

2015-03-11 Thread Wang, Yalin

Hi

I just want to explain my ideas about file map pages for madvise_free() syscall.
As the following patch,
For file map vma, there is 2 types:
1. private file map
In this type, the pages of this vma are file map pages or anon page 
(when COW happened),
2. shared file map
In this type, the pages of this vma are all file map pages.

No matter which type file map,
We can handle file map vma as the following:
If the page is file map pages,
We just clear its pte young bit(pte_mkold()),
This will have some advantages, it will make page
Reclaim path move this file map page into inactive
lru list aggressively.

If the page is anon map page, we just handle it in the
Same way as for the pages in anon vma.


---

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..8fdc82f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -322,7 +322,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long 
addr,
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb-fullmm);
ptent = pte_mkold(ptent);
-   ptent = pte_mkclean(ptent);
+   if (PageAnon(page))
+   ptent = pte_mkclean(ptent);
set_pte_at(mm, addr, pte, ptent);
tlb_remove_tlb_entry(tlb, pte, addr);
}
@@ -364,10 +365,6 @@ static int madvise_free_single_vma(struct vm_area_struct 
*vma,
if (vma-vm_flags  (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
return -EINVAL;

-   /* MADV_FREE works for only anon vma at the moment */
-   if (vma-vm_file)
-   return -EINVAL;
-
start = max(vma-vm_start, start_addr);
if (start = vma-vm_end)
return -EINVAL;
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 3/4] mm: move lazy free pages to inactive list

2015-03-10 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minc...@kernel.org]
> Sent: Wednesday, March 11, 2015 9:21 AM
> To: Andrew Morton
> Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; Michal Hocko;
> Johannes Weiner; Mel Gorman; Rik van Riel; Shaohua Li; Wang, Yalin; Minchan
> Kim
> Subject: [PATCH 3/4] mm: move lazy free pages to inactive list
> 
> MADV_FREE is hint that it's okay to discard pages if there is
> memory pressure and we uses reclaimers(ie, kswapd and direct reclaim)
> to free them so there is no worth to remain them in active anonymous LRU
> so this patch moves them to inactive LRU list's head.
> 
> This means that MADV_FREE-ed pages which were living on the inactive list
> are reclaimed first because they are more likely to be cold rather than
> recently active pages.
> 
> A arguable issue for the approach would be whether we should put it to
> head or tail in inactive list. I selected *head* because kernel cannot
> make sure it's really cold or warm for every MADV_FREE usecase but
> at least we know it's not *hot* so landing of inactive head would be
> comprimise for various usecases.
> 
> This is fixing a suboptimal behavior of MADV_FREE when pages living on
> the active list will sit there for a long time even under memory
> pressure while the inactive list is reclaimed heavily. This basically
> breaks the whole purpose of using MADV_FREE to help the system to free
> memory which is might not be used.
> 
> Acked-by: Michal Hocko 
> Signed-off-by: Minchan Kim 
> ---
>  include/linux/swap.h |  1 +
>  mm/madvise.c |  2 ++
>  mm/swap.c| 35 +++
>  3 files changed, 38 insertions(+)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index cee108c..0428e4c 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -308,6 +308,7 @@ extern void lru_add_drain_cpu(int cpu);
>  extern void lru_add_drain_all(void);
>  extern void rotate_reclaimable_page(struct page *page);
>  extern void deactivate_file_page(struct page *page);
> +extern void deactivate_page(struct page *page);
>  extern void swap_setup(void);
> 
>  extern void add_page_to_unevictable_list(struct page *page);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index ebe692e..22e8f0c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -340,6 +340,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned
> long addr,
>   ptent = pte_mkold(ptent);
>   ptent = pte_mkclean(ptent);
>   set_pte_at(mm, addr, pte, ptent);
> + if (PageActive(page))
> + deactivate_page(page);
>   tlb_remove_tlb_entry(tlb, pte, addr);
>   }

I think this place should be changed like this:
  + if (!page_referenced(page, false, NULL, NULL, NULL) && 
PageActive(page))
  + deactivate_page(page);
Because we don't know if other processes are reference this page,
If it is true, don't need deactivate this page.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 3/4] mm: move lazy free pages to inactive list

2015-03-10 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minc...@kernel.org]
 Sent: Wednesday, March 11, 2015 9:21 AM
 To: Andrew Morton
 Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; Michal Hocko;
 Johannes Weiner; Mel Gorman; Rik van Riel; Shaohua Li; Wang, Yalin; Minchan
 Kim
 Subject: [PATCH 3/4] mm: move lazy free pages to inactive list

 MADV_FREE is hint that it's okay to discard pages if there is
 memory pressure and we uses reclaimers(ie, kswapd and direct reclaim)
 to free them so there is no worth to remain them in active anonymous LRU
 so this patch moves them to inactive LRU list's head.

 This means that MADV_FREE-ed pages which were living on the inactive list
 are reclaimed first because they are more likely to be cold rather than
 recently active pages.

 A arguable issue for the approach would be whether we should put it to
 head or tail in inactive list. I selected *head* because kernel cannot
 make sure it's really cold or warm for every MADV_FREE usecase but
 at least we know it's not *hot* so landing of inactive head would be
 comprimise for various usecases.

 This is fixing a suboptimal behavior of MADV_FREE when pages living on
 the active list will sit there for a long time even under memory
 pressure while the inactive list is reclaimed heavily. This basically
 breaks the whole purpose of using MADV_FREE to help the system to free
 memory which is might not be used.

 Acked-by: Michal Hocko mho...@suse.cz
 Signed-off-by: Minchan Kim minc...@kernel.org
 ---
  include/linux/swap.h |  1 +
  mm/madvise.c |  2 ++
  mm/swap.c| 35 +++
  3 files changed, 38 insertions(+)

 diff --git a/include/linux/swap.h b/include/linux/swap.h
 index cee108c..0428e4c 100644
 --- a/include/linux/swap.h
 +++ b/include/linux/swap.h
 @@ -308,6 +308,7 @@ extern void lru_add_drain_cpu(int cpu);
  extern void lru_add_drain_all(void);
  extern void rotate_reclaimable_page(struct page *page);
  extern void deactivate_file_page(struct page *page);
 +extern void deactivate_page(struct page *page);
  extern void swap_setup(void);

  extern void add_page_to_unevictable_list(struct page *page);
 diff --git a/mm/madvise.c b/mm/madvise.c
 index ebe692e..22e8f0c 100644
 --- a/mm/madvise.c
 +++ b/mm/madvise.c
 @@ -340,6 +340,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned
 long addr,
   ptent = pte_mkold(ptent);
   ptent = pte_mkclean(ptent);
   set_pte_at(mm, addr, pte, ptent);
 + if (PageActive(page))
 + deactivate_page(page);
   tlb_remove_tlb_entry(tlb, pte, addr);
   }

I think this place should be changed like this:
  + if (!page_referenced(page, false, NULL, NULL, NULL)  
PageActive(page))
  + deactivate_page(page);
Because we don't know if other processes are reference this page,
If it is true, don't need deactivate this page.

Thanks
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V3] mm: change mm_advise_free to clear page dirty

2015-03-02 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
> Sent: Tuesday, March 03, 2015 12:15 PM
> To: Wang, Yalin
> Cc: 'Michal Hocko'; 'Andrew Morton'; 'linux-kernel@vger.kernel.org';
> 'linux...@kvack.org'; 'Rik van Riel'; 'Johannes Weiner'; 'Mel Gorman';
> 'Shaohua Li'; Hugh Dickins; Cyrill Gorcunov
> Subject: Re: [RFC V3] mm: change mm_advise_free to clear page dirty
> 
> On Tue, Mar 03, 2015 at 11:59:17AM +0800, Wang, Yalin wrote:
> > > -Original Message-
> > > From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan
> Kim
> > > Sent: Tuesday, March 03, 2015 11:26 AM
> > > To: Wang, Yalin
> > > Cc: 'Michal Hocko'; 'Andrew Morton'; 'linux-kernel@vger.kernel.org';
> > > 'linux...@kvack.org'; 'Rik van Riel'; 'Johannes Weiner'; 'Mel Gorman';
> > > 'Shaohua Li'; Hugh Dickins; Cyrill Gorcunov
> > > Subject: Re: [RFC V3] mm: change mm_advise_free to clear page dirty
> > >
> > > Could you separte this patch in this patchset thread?
> > > It's tackling differnt problem.
> > >
> > > As well, I had a question to previous thread about why shared page
> > > has a problem now but you didn't answer and send a new patchset.
> > > It makes reviewers/maintainer time waste/confuse. Please, don't
> > > hurry to send a code. Before that, resolve reviewers's comments.
> > >
> > > On Tue, Mar 03, 2015 at 10:06:40AM +0800, Wang, Yalin wrote:
> > > > This patch add ClearPageDirty() to clear AnonPage dirty flag,
> > > > if not clear page dirty for this anon page, the page will never be
> > > > treated as freeable. We also make sure the shared AnonPage is not
> > > > freeable, we implement it by dirty all copyed AnonPage pte,
> > > > so that make sure the Anonpage will not become freeable, unless
> > > > all process which shared this page call madvise_free syscall.
> > >
> > > Please, spend more time to make description clear. I really doubt
> > > who understand this description without code inspection. :(
> > > Of course, I'm not a person to write description clear like native
> > > , either but just I'm sure I spend a more time to write description
> > > rather than coding, at least. :)
> > >
> > I see, I will send another mail for file private map pages.
> > Sorry for my English expressions.
> > I think your solution is ok,
> > Your patch will make sure the anonpage pte will always be dirty.
> > I add some comments for your patch:
> >
> > > ---
> > >  mm/madvise.c | 1 -
> > >  mm/memory.c  | 9 +++--
> > >  mm/rmap.c| 2 +-
> > >  mm/vmscan.c  | 3 +--
> > >  4 files changed, 9 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index 6d0fcb8..d64200e 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -309,7 +309,6 @@ static int madvise_free_pte_range(pmd_t *pmd,
> unsigned
> > > long addr,
> > >   continue;
> > >   }
> > >
> > > - ClearPageDirty(page);
> > >   unlock_page(page);
> > >   }
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 8ae52c9..2f45e77 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -2460,9 +2460,14 @@ static int do_swap_page(struct mm_struct *mm,
> struct
> > > vm_area_struct *vma,
> > >
> > >   inc_mm_counter_fast(mm, MM_ANONPAGES);
> > >   dec_mm_counter_fast(mm, MM_SWAPENTS);
> > > - pte = mk_pte(page, vma->vm_page_prot);
> > > +
> > > + /*
> > > +  * Every page swapped-out was pte_dirty so we makes pte dirty again.
> > > +  * MADV_FREE relys on it.
> > > +  */
> > > + pte = mk_pte(pte_mkdirty(page), vma->vm_page_prot);
> > pte_mkdirty() usage seems wrong here.
> 
> Argh, it reveals I didn't test even build. My shame.
> But RFC tag might mitigate my shame. :)
> I will fix it if I send a formal version.
> Thanks for the review.
> 
> >
> > >   if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
> > > - pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> > > + pte = maybe_mkwrite(pte, vma);
> > >   flags &= ~FAULT_FLAG_WRITE;
> > >   ret |= VM_FAULT_WRITE;
> > >   exclusive = 1;
> > > diff --git a/mm/rmap.c b/mm/rmap.c
&

RE: [RFC V3] mm: change mm_advise_free to clear page dirty

2015-03-02 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
> Sent: Tuesday, March 03, 2015 11:26 AM
> To: Wang, Yalin
> Cc: 'Michal Hocko'; 'Andrew Morton'; 'linux-kernel@vger.kernel.org';
> 'linux...@kvack.org'; 'Rik van Riel'; 'Johannes Weiner'; 'Mel Gorman';
> 'Shaohua Li'; Hugh Dickins; Cyrill Gorcunov
> Subject: Re: [RFC V3] mm: change mm_advise_free to clear page dirty
> 
> Could you separte this patch in this patchset thread?
> It's tackling differnt problem.
> 
> As well, I had a question to previous thread about why shared page
> has a problem now but you didn't answer and send a new patchset.
> It makes reviewers/maintainer time waste/confuse. Please, don't
> hurry to send a code. Before that, resolve reviewers's comments.
> 
> On Tue, Mar 03, 2015 at 10:06:40AM +0800, Wang, Yalin wrote:
> > This patch add ClearPageDirty() to clear AnonPage dirty flag,
> > if not clear page dirty for this anon page, the page will never be
> > treated as freeable. We also make sure the shared AnonPage is not
> > freeable, we implement it by dirty all copyed AnonPage pte,
> > so that make sure the Anonpage will not become freeable, unless
> > all process which shared this page call madvise_free syscall.
> 
> Please, spend more time to make description clear. I really doubt
> who understand this description without code inspection. :(
> Of course, I'm not a person to write description clear like native
> , either but just I'm sure I spend a more time to write description
> rather than coding, at least. :)
> 
I see, I will send another mail for file private map pages.
Sorry for my English expressions.
I think your solution is ok,
Your patch will make sure the anonpage pte will always be dirty.
I add some comments for your patch:

> ---
>  mm/madvise.c | 1 -
>  mm/memory.c  | 9 +++--
>  mm/rmap.c| 2 +-
>  mm/vmscan.c  | 3 +--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6d0fcb8..d64200e 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -309,7 +309,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned
> long addr,
>   continue;
>   }
> 
> - ClearPageDirty(page);
>   unlock_page(page);
>   }
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 8ae52c9..2f45e77 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2460,9 +2460,14 @@ static int do_swap_page(struct mm_struct *mm, struct
> vm_area_struct *vma,
> 
>   inc_mm_counter_fast(mm, MM_ANONPAGES);
>   dec_mm_counter_fast(mm, MM_SWAPENTS);
> - pte = mk_pte(page, vma->vm_page_prot);
> +
> + /*
> +  * Every page swapped-out was pte_dirty so we makes pte dirty again.
> +  * MADV_FREE relys on it.
> +  */
> + pte = mk_pte(pte_mkdirty(page), vma->vm_page_prot);
pte_mkdirty() usage seems wrong here.

>   if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
> - pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> + pte = maybe_mkwrite(pte, vma);
>   flags &= ~FAULT_FLAG_WRITE;
>   ret |= VM_FAULT_WRITE;
>   exclusive = 1;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 47b3ba8..34c1d66 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page, struct
> vm_area_struct *vma,
> 
>   if (flags & TTU_FREE) {
>   VM_BUG_ON_PAGE(PageSwapCache(page), page);
> - if (!dirty && !PageDirty(page)) {
> + if (!dirty) {
>   /* It's a freeable page by MADV_FREE */
>   dec_mm_counter(mm, MM_ANONPAGES);
>   goto discard;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 671e47e..7f520c9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -805,8 +805,7 @@ static enum page_references
> page_check_references(struct page *page,
>   return PAGEREF_KEEP;
>   }
> 
> - if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
> - !PageDirty(page))
> + if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
>   *freeable = true;
> 
>   /* Reclaim if clean, defer dirty pages to writeback */
> --
> 1.9.3
Could we remove SetPageDirty(page); in try_to_free_swap() function based on 
this patch?
Because your patch will make sure the pte is always dirty,
We don't need setpagedirty(),
The try_to_unmap() path will re-dirty the page during reclaim path,
Isn't it?

Thanks







--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V3] mm: change mm_advise_free to clear page dirty

2015-03-02 Thread Wang, Yalin

This patch add ClearPageDirty() to clear AnonPage dirty flag,
if not clear page dirty for this anon page, the page will never be
treated as freeable. We also make sure the shared AnonPage is not
freeable, we implement it by dirty all copyed AnonPage pte,
so that make sure the Anonpage will not become freeable, unless
all process which shared this page call madvise_free syscall.

Signed-off-by: Yalin Wang 
---
 mm/madvise.c | 16 +---
 mm/memory.c  | 12 ++--
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..b61070d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -297,23 +297,25 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
long addr,
continue;
 
page = vm_normal_page(vma, addr, ptent);
-   if (!page)
+   if (!page || !trylock_page(page))
continue;
 
if (PageSwapCache(page)) {
-   if (!trylock_page(page))
-   continue;
-
if (!try_to_free_swap(page)) {
unlock_page(page);
continue;
}
-
-   ClearPageDirty(page);
-   unlock_page(page);
}
 
/*
+* we clear page dirty flag for AnonPage, no matter if this
+* page is in swapcahce or not, AnonPage not in swapcache also 
set
+* dirty flag sometimes, this happened when a AnonPage is 
removed
+* from swapcahce by try_to_free_swap()
+*/
+   ClearPageDirty(page);
+   unlock_page(page);
+   /*
 * Some of architecture(ex, PPC) don't update TLB
 * with set_pte_at and tlb_remove_tlb_entry so for
 * the portability, remap the pte with old|clean
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..3d949b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -874,10 +874,18 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
if (page) {
get_page(page);
page_dup_rmap(page);
-   if (PageAnon(page))
+   if (PageAnon(page)) {
+   /*
+* we dirty the copyed pte for anon page,
+* this is useful for madvise_free_pte_range(),
+* this can prevent shared anon page freed by 
madvise_free
+* syscall
+*/
+   pte = pte_mkdirty(pte);
rss[MM_ANONPAGES]++;
-   else
+   } else {
rss[MM_FILEPAGES]++;
+   }
}
 
 out_set_pte:
-- 
2.2.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V3] mm: change mm_advise_free to clear page dirty

2015-03-02 Thread Wang, Yalin

This patch add ClearPageDirty() to clear AnonPage dirty flag,
if not clear page dirty for this anon page, the page will never be
treated as freeable. We also make sure the shared AnonPage is not
freeable, we implement it by dirty all copyed AnonPage pte,
so that make sure the Anonpage will not become freeable, unless
all process which shared this page call madvise_free syscall.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 mm/madvise.c | 16 +---
 mm/memory.c  | 12 ++--
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..b61070d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -297,23 +297,25 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
long addr,
continue;
 
page = vm_normal_page(vma, addr, ptent);
-   if (!page)
+   if (!page || !trylock_page(page))
continue;
 
if (PageSwapCache(page)) {
-   if (!trylock_page(page))
-   continue;
-
if (!try_to_free_swap(page)) {
unlock_page(page);
continue;
}
-
-   ClearPageDirty(page);
-   unlock_page(page);
}
 
/*
+* we clear page dirty flag for AnonPage, no matter if this
+* page is in swapcahce or not, AnonPage not in swapcache also 
set
+* dirty flag sometimes, this happened when a AnonPage is 
removed
+* from swapcahce by try_to_free_swap()
+*/
+   ClearPageDirty(page);
+   unlock_page(page);
+   /*
 * Some of architecture(ex, PPC) don't update TLB
 * with set_pte_at and tlb_remove_tlb_entry so for
 * the portability, remap the pte with old|clean
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..3d949b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -874,10 +874,18 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
if (page) {
get_page(page);
page_dup_rmap(page);
-   if (PageAnon(page))
+   if (PageAnon(page)) {
+   /*
+* we dirty the copyed pte for anon page,
+* this is useful for madvise_free_pte_range(),
+* this can prevent shared anon page freed by 
madvise_free
+* syscall
+*/
+   pte = pte_mkdirty(pte);
rss[MM_ANONPAGES]++;
-   else
+   } else {
rss[MM_FILEPAGES]++;
+   }
}
 
 out_set_pte:
-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V3] mm: change mm_advise_free to clear page dirty

2015-03-02 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
 Sent: Tuesday, March 03, 2015 11:26 AM
 To: Wang, Yalin
 Cc: 'Michal Hocko'; 'Andrew Morton'; 'linux-kernel@vger.kernel.org';
 'linux...@kvack.org'; 'Rik van Riel'; 'Johannes Weiner'; 'Mel Gorman';
 'Shaohua Li'; Hugh Dickins; Cyrill Gorcunov
 Subject: Re: [RFC V3] mm: change mm_advise_free to clear page dirty

 Could you separte this patch in this patchset thread?
 It's tackling differnt problem.

 As well, I had a question to previous thread about why shared page
 has a problem now but you didn't answer and send a new patchset.
 It makes reviewers/maintainer time waste/confuse. Please, don't
 hurry to send a code. Before that, resolve reviewers's comments.

 On Tue, Mar 03, 2015 at 10:06:40AM +0800, Wang, Yalin wrote:
  This patch add ClearPageDirty() to clear AnonPage dirty flag,
  if not clear page dirty for this anon page, the page will never be
  treated as freeable. We also make sure the shared AnonPage is not
  freeable, we implement it by dirty all copyed AnonPage pte,
  so that make sure the Anonpage will not become freeable, unless
  all process which shared this page call madvise_free syscall.

 Please, spend more time to make description clear. I really doubt
 who understand this description without code inspection. :(
 Of course, I'm not a person to write description clear like native
 , either but just I'm sure I spend a more time to write description
 rather than coding, at least. :)

I see, I will send another mail for file private map pages.
Sorry for my English expressions.
I think your solution is ok,
Your patch will make sure the anonpage pte will always be dirty.
I add some comments for your patch:

 ---
  mm/madvise.c | 1 -
  mm/memory.c  | 9 +++--
  mm/rmap.c| 2 +-
  mm/vmscan.c  | 3 +--
  4 files changed, 9 insertions(+), 6 deletions(-)

 diff --git a/mm/madvise.c b/mm/madvise.c
 index 6d0fcb8..d64200e 100644
 --- a/mm/madvise.c
 +++ b/mm/madvise.c
 @@ -309,7 +309,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned
 long addr,
   continue;
   }

 - ClearPageDirty(page);
   unlock_page(page);
   }

 diff --git a/mm/memory.c b/mm/memory.c
 index 8ae52c9..2f45e77 100644
 --- a/mm/memory.c
 +++ b/mm/memory.c
 @@ -2460,9 +2460,14 @@ static int do_swap_page(struct mm_struct *mm, struct
 vm_area_struct *vma,

   inc_mm_counter_fast(mm, MM_ANONPAGES);
   dec_mm_counter_fast(mm, MM_SWAPENTS);
 - pte = mk_pte(page, vma-vm_page_prot);
 +
 + /*
 +  * Every page swapped-out was pte_dirty so we makes pte dirty again.
 +  * MADV_FREE relys on it.
 +  */
 + pte = mk_pte(pte_mkdirty(page), vma-vm_page_prot);
pte_mkdirty() usage seems wrong here.

   if ((flags  FAULT_FLAG_WRITE)  reuse_swap_page(page)) {
 - pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 + pte = maybe_mkwrite(pte, vma);
   flags = ~FAULT_FLAG_WRITE;
   ret |= VM_FAULT_WRITE;
   exclusive = 1;
 diff --git a/mm/rmap.c b/mm/rmap.c
 index 47b3ba8..34c1d66 100644
 --- a/mm/rmap.c
 +++ b/mm/rmap.c
 @@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page, struct
 vm_area_struct *vma,

   if (flags  TTU_FREE) {
   VM_BUG_ON_PAGE(PageSwapCache(page), page);
 - if (!dirty  !PageDirty(page)) {
 + if (!dirty) {
   /* It's a freeable page by MADV_FREE */
   dec_mm_counter(mm, MM_ANONPAGES);
   goto discard;
 diff --git a/mm/vmscan.c b/mm/vmscan.c
 index 671e47e..7f520c9 100644
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -805,8 +805,7 @@ static enum page_references
 page_check_references(struct page *page,
   return PAGEREF_KEEP;
   }

 - if (PageAnon(page)  !pte_dirty  !PageSwapCache(page) 
 - !PageDirty(page))
 + if (PageAnon(page)  !pte_dirty  !PageSwapCache(page))
   *freeable = true;

   /* Reclaim if clean, defer dirty pages to writeback */
 --
 1.9.3
Could we remove SetPageDirty(page); in try_to_free_swap() function based on 
this patch?
Because your patch will make sure the pte is always dirty,
We don't need setpagedirty(),
The try_to_unmap() path will re-dirty the page during reclaim path,
Isn't it?

Thanks

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V3] mm: change mm_advise_free to clear page dirty

2015-03-02 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
 Sent: Tuesday, March 03, 2015 12:15 PM
 To: Wang, Yalin
 Cc: 'Michal Hocko'; 'Andrew Morton'; 'linux-kernel@vger.kernel.org';
 'linux...@kvack.org'; 'Rik van Riel'; 'Johannes Weiner'; 'Mel Gorman';
 'Shaohua Li'; Hugh Dickins; Cyrill Gorcunov
 Subject: Re: [RFC V3] mm: change mm_advise_free to clear page dirty

 On Tue, Mar 03, 2015 at 11:59:17AM +0800, Wang, Yalin wrote:
   -Original Message-
   From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan
 Kim
   Sent: Tuesday, March 03, 2015 11:26 AM
   To: Wang, Yalin
   Cc: 'Michal Hocko'; 'Andrew Morton'; 'linux-kernel@vger.kernel.org';
   'linux...@kvack.org'; 'Rik van Riel'; 'Johannes Weiner'; 'Mel Gorman';
   'Shaohua Li'; Hugh Dickins; Cyrill Gorcunov
   Subject: Re: [RFC V3] mm: change mm_advise_free to clear page dirty

   Could you separte this patch in this patchset thread?
   It's tackling differnt problem.

   As well, I had a question to previous thread about why shared page
   has a problem now but you didn't answer and send a new patchset.
   It makes reviewers/maintainer time waste/confuse. Please, don't
   hurry to send a code. Before that, resolve reviewers's comments.

   On Tue, Mar 03, 2015 at 10:06:40AM +0800, Wang, Yalin wrote:
This patch add ClearPageDirty() to clear AnonPage dirty flag,
if not clear page dirty for this anon page, the page will never be
treated as freeable. We also make sure the shared AnonPage is not
freeable, we implement it by dirty all copyed AnonPage pte,
so that make sure the Anonpage will not become freeable, unless
all process which shared this page call madvise_free syscall.

   Please, spend more time to make description clear. I really doubt
   who understand this description without code inspection. :(
   Of course, I'm not a person to write description clear like native
   , either but just I'm sure I spend a more time to write description
   rather than coding, at least. :)

  I see, I will send another mail for file private map pages.
  Sorry for my English expressions.
  I think your solution is ok,
  Your patch will make sure the anonpage pte will always be dirty.
  I add some comments for your patch:

   ---
mm/madvise.c | 1 -
mm/memory.c  | 9 +++--
mm/rmap.c| 2 +-
mm/vmscan.c  | 3 +--
4 files changed, 9 insertions(+), 6 deletions(-)

   diff --git a/mm/madvise.c b/mm/madvise.c
   index 6d0fcb8..d64200e 100644
   --- a/mm/madvise.c
   +++ b/mm/madvise.c
   @@ -309,7 +309,6 @@ static int madvise_free_pte_range(pmd_t *pmd,
 unsigned
   long addr,
 continue;
 }

   - ClearPageDirty(page);
 unlock_page(page);
 }

   diff --git a/mm/memory.c b/mm/memory.c
   index 8ae52c9..2f45e77 100644
   --- a/mm/memory.c
   +++ b/mm/memory.c
   @@ -2460,9 +2460,14 @@ static int do_swap_page(struct mm_struct *mm,
 struct
   vm_area_struct *vma,

 inc_mm_counter_fast(mm, MM_ANONPAGES);
 dec_mm_counter_fast(mm, MM_SWAPENTS);
   - pte = mk_pte(page, vma-vm_page_prot);
   +
   + /*
   +  * Every page swapped-out was pte_dirty so we makes pte dirty again.
   +  * MADV_FREE relys on it.
   +  */
   + pte = mk_pte(pte_mkdirty(page), vma-vm_page_prot);
  pte_mkdirty() usage seems wrong here.

 Argh, it reveals I didn't test even build. My shame.
 But RFC tag might mitigate my shame. :)
 I will fix it if I send a formal version.
 Thanks for the review.

 if ((flags  FAULT_FLAG_WRITE)  reuse_swap_page(page)) {
   - pte = maybe_mkwrite(pte_mkdirty(pte), vma);
   + pte = maybe_mkwrite(pte, vma);
 flags = ~FAULT_FLAG_WRITE;
 ret |= VM_FAULT_WRITE;
 exclusive = 1;
   diff --git a/mm/rmap.c b/mm/rmap.c
   index 47b3ba8..34c1d66 100644
   --- a/mm/rmap.c
   +++ b/mm/rmap.c
   @@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page,
 struct
   vm_area_struct *vma,

 if (flags  TTU_FREE) {
 VM_BUG_ON_PAGE(PageSwapCache(page), page);
   - if (!dirty  !PageDirty(page)) {
   + if (!dirty) {
 /* It's a freeable page by MADV_FREE */
 dec_mm_counter(mm, MM_ANONPAGES);
 goto discard;
   diff --git a/mm/vmscan.c b/mm/vmscan.c
   index 671e47e..7f520c9 100644
   --- a/mm/vmscan.c
   +++ b/mm/vmscan.c
   @@ -805,8 +805,7 @@ static enum page_references
   page_check_references(struct page *page,
 return PAGEREF_KEEP;
 }

   - if (PageAnon(page)  !pte_dirty  !PageSwapCache(page) 
   - !PageDirty(page))
   + if (PageAnon(page)  !pte_dirty  !PageSwapCache(page))
 *freeable = true;

 /* Reclaim if clean, defer dirty pages to writeback */
   --
   1.9.3
  Could we remove SetPageDirty

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-03-01 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
> Sent: Saturday, February 28, 2015 9:50 PM
> To: Wang, Yalin
> Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> 
> On Fri, Feb 27, 2015 at 10:37:14PM +0900, Minchan Kim wrote:
> > On Fri, Feb 27, 2015 at 03:50:29PM +0800, Wang, Yalin wrote:
> > > > -Original Message-
> > > > From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan
> Kim
> > > > Sent: Friday, February 27, 2015 2:44 PM
> > > > To: Wang, Yalin
> > > > Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> > > > m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> > > > Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> > > >
> > > > On Fri, Feb 27, 2015 at 01:48:48PM +0800, Wang, Yalin wrote:
> > > > > > -Original Message-
> > > > > > From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of
> Minchan
> > > > Kim
> > > > > > Sent: Friday, February 27, 2015 1:28 PM
> > > > > > To: Wang, Yalin
> > > > > > Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org;
> linux-
> > > > > > m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua
> Li
> > > > > > Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > On Fri, Feb 27, 2015 at 11:37:18AM +0800, Wang, Yalin wrote:
> > > > > > > This patch add ClearPageDirty() to clear AnonPage dirty flag,
> > > > > > > the Anonpage mapcount must be 1, so that this page is only used
> by
> > > > > > > the current process, not shared by other process like fork().
> > > > > > > if not clear page dirty for this anon page, the page will never
> be
> > > > > > > treated as freeable.
> > > > > >
> > > > > > In case of anonymous page, it has PG_dirty when VM adds it to
> > > > > > swap cache and clear it in clear_page_dirty_for_io. That's why
> > > > > > I added ClearPageDirty if we found it in swapcache.
> > > > > > What case am I missing? It would be better to understand if you
> > > > > > describe specific scenario.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Yalin Wang 
> > > > > > > ---
> > > > > > >  mm/madvise.c | 15 +--
> > > > > > >  1 file changed, 5 insertions(+), 10 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > > > index 6d0fcb8..257925a 100644
> > > > > > > --- a/mm/madvise.c
> > > > > > > +++ b/mm/madvise.c
> > > > > > > @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t
> *pmd,
> > > > > > unsigned long addr,
> > > > > > >   continue;
> > > > > > >
> > > > > > >   page = vm_normal_page(vma, addr, ptent);
> > > > > > > - if (!page)
> > > > > > > + if (!page || !PageAnon(page)
> || !trylock_page(page))
> > > > > > >   continue;
> > > > > > >
> > > > > > >   if (PageSwapCache(page)) {
> > > > > > > - if (!trylock_page(page))
> > > > > > > + if (!try_to_free_swap(page))
> > > > > > >   continue;
> > > > > > > -
> > > > > > > - if (!try_to_free_swap(page)) {
> > > > > > > - unlock_page(page);
> > > > > > > - continue;
> > > > > > > - }
> > > > > > > -
> > > > > > > - ClearPageDirty(page);
> > > > > > > - unlock_page(page);
> > > > > > >

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-03-01 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
> Sent: Saturday, February 28, 2015 9:56 PM
> To: Wang, Yalin
> Cc: 'Michal Hocko'; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> 
> On Sat, Feb 28, 2015 at 10:11:13AM +0800, Wang, Yalin wrote:
> > > -Original Message-
> > > From: Michal Hocko [mailto:msts...@gmail.com] On Behalf Of Michal Hocko
> > > Sent: Saturday, February 28, 2015 5:03 AM
> > > To: Wang, Yalin
> > > Cc: 'Minchan Kim'; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> > > m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> > > Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> > >
> > > On Fri 27-02-15 11:37:18, Wang, Yalin wrote:
> > > > This patch add ClearPageDirty() to clear AnonPage dirty flag,
> > > > the Anonpage mapcount must be 1, so that this page is only used by
> > > > the current process, not shared by other process like fork().
> > > > if not clear page dirty for this anon page, the page will never be
> > > > treated as freeable.
> > >
> > > Very well spotted! I haven't noticed that during the review.
> > >
> > > > Signed-off-by: Yalin Wang 
> > > > ---
> > > >  mm/madvise.c | 15 +--
> > > >  1 file changed, 5 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index 6d0fcb8..257925a 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
> > > unsigned long addr,
> > > > continue;
> > > >
> > > > page = vm_normal_page(vma, addr, ptent);
> > > > -   if (!page)
> > > > +   if (!page || !PageAnon(page) || !trylock_page(page))
> > > > continue;
> > >
> > > PageAnon check seems to be redundant because we are not allowing
> > > MADV_FREE on any !anon private mappings AFAIR.
> > I only see this check:
> > /* MADV_FREE works for only anon vma at the moment */
> > if (vma->vm_file)
> > return -EINVAL;
> >
> > but for file private map, there are also AnonPage sometimes, do we need
> change
> > to like this:
> > if (vma->vm_flags & VM_SHARED)
> > return -EINVAL;
> 
> I couldn't understand your point. In this stage, we intentionally
> disabled madvise_free on file mapped area(AFAIRC, some guys tried
> it long time ago but it had many issues so dropped).
> So, how can file-private mmaped can reach this code?
> Could you elaborate it more about that why we need PageAnon check
> in here?
> 
I send a new patch:
[RFC V2] mm: change mm_advise_free to clear page dirty
Please have a look at it.
Thanks for your comments!


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-03-01 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
 Sent: Saturday, February 28, 2015 9:50 PM
 To: Wang, Yalin
 Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
 m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
 Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

 On Fri, Feb 27, 2015 at 10:37:14PM +0900, Minchan Kim wrote:
  On Fri, Feb 27, 2015 at 03:50:29PM +0800, Wang, Yalin wrote:
-Original Message-
From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan
 Kim
Sent: Friday, February 27, 2015 2:44 PM
To: Wang, Yalin
Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

On Fri, Feb 27, 2015 at 01:48:48PM +0800, Wang, Yalin wrote:
  -Original Message-
  From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of
 Minchan
Kim
  Sent: Friday, February 27, 2015 1:28 PM
  To: Wang, Yalin
  Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org;
 linux-
  m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua
 Li
  Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

  Hello,

  On Fri, Feb 27, 2015 at 11:37:18AM +0800, Wang, Yalin wrote:
   This patch add ClearPageDirty() to clear AnonPage dirty flag,
   the Anonpage mapcount must be 1, so that this page is only used
 by
   the current process, not shared by other process like fork().
   if not clear page dirty for this anon page, the page will never
 be
   treated as freeable.

  In case of anonymous page, it has PG_dirty when VM adds it to
  swap cache and clear it in clear_page_dirty_for_io. That's why
  I added ClearPageDirty if we found it in swapcache.
  What case am I missing? It would be better to understand if you
  describe specific scenario.

  Thanks.

   Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
   ---
mm/madvise.c | 15 +--
1 file changed, 5 insertions(+), 10 deletions(-)

   diff --git a/mm/madvise.c b/mm/madvise.c
   index 6d0fcb8..257925a 100644
   --- a/mm/madvise.c
   +++ b/mm/madvise.c
   @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t
 *pmd,
  unsigned long addr,
 continue;

 page = vm_normal_page(vma, addr, ptent);
   - if (!page)
   + if (!page || !PageAnon(page)
 || !trylock_page(page))
 continue;

 if (PageSwapCache(page)) {
   - if (!trylock_page(page))
   + if (!try_to_free_swap(page))
 continue;
   -
   - if (!try_to_free_swap(page)) {
   - unlock_page(page);
   - continue;
   - }
   -
   - ClearPageDirty(page);
   - unlock_page(page);
 }

   + if (page_mapcount(page) == 1)
   + ClearPageDirty(page);
   + unlock_page(page);
 /*
  * Some of architecture(ex, PPC) don't update TLB
  * with set_pte_at and tlb_remove_tlb_entry so for
   --
 Yes, for page which is in SwapCache, it is correct,
 But for anon page which is not in SwapCache, it is always
 PageDirty(), so we should also clear dirty bit to make it freeable,

No. Every anon page starts from !PageDirty and it has PG_dirty
only when it's addeded into swap cache. If vm_swap_full turns on,
a page in swap cache could have PG_dirty via try_to_free_swap again.

   mmm..
   sometimes you can see an anon page PageDirty(), but it is not in
 swapcache,
   for example, handle_pte_fault()--do_swap_page()--try_to_free_swap(),
   at this time, the page is deleted from swapcache and is marked
 PageDirty(),

  That's what I missed. It's clear and would be simple patch so
  could you send a patch to fix this issue with detailed description
  like above?

So, Do you have concern about swapped-out pages when MADV_FREE is
called? If so, please look at my patch.

https://lkml.org/lkml/2015/2/25/43

It will zap the swapped out page. So, this is not a issue any more?

 Another problem  is that if an anon page is shared by more than one
process,
 This happened when fork(), the anon page will be copy on write,
 In this case, we should not clear page dirty,
 This is not correct for other process which don't call MADV_FREE
 syscall.

You mean we shouldn't inherit MADV_FREE attribute?
Why?

   Is it correct

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-03-01 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
 Sent: Saturday, February 28, 2015 9:56 PM
 To: Wang, Yalin
 Cc: 'Michal Hocko'; Andrew Morton; linux-kernel@vger.kernel.org; linux-
 m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
 Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

 On Sat, Feb 28, 2015 at 10:11:13AM +0800, Wang, Yalin wrote:
   -Original Message-
   From: Michal Hocko [mailto:msts...@gmail.com] On Behalf Of Michal Hocko
   Sent: Saturday, February 28, 2015 5:03 AM
   To: Wang, Yalin
   Cc: 'Minchan Kim'; Andrew Morton; linux-kernel@vger.kernel.org; linux-
   m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
   Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

   On Fri 27-02-15 11:37:18, Wang, Yalin wrote:
This patch add ClearPageDirty() to clear AnonPage dirty flag,
the Anonpage mapcount must be 1, so that this page is only used by
the current process, not shared by other process like fork().
if not clear page dirty for this anon page, the page will never be
treated as freeable.

   Very well spotted! I haven't noticed that during the review.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 mm/madvise.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..257925a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
   unsigned long addr,
continue;

page = vm_normal_page(vma, addr, ptent);
-   if (!page)
+   if (!page || !PageAnon(page) || !trylock_page(page))
continue;

   PageAnon check seems to be redundant because we are not allowing
   MADV_FREE on any !anon private mappings AFAIR.
  I only see this check:
  /* MADV_FREE works for only anon vma at the moment */
  if (vma-vm_file)
  return -EINVAL;

  but for file private map, there are also AnonPage sometimes, do we need
 change
  to like this:
  if (vma-vm_flags  VM_SHARED)
  return -EINVAL;

 I couldn't understand your point. In this stage, we intentionally
 disabled madvise_free on file mapped area(AFAIRC, some guys tried
 it long time ago but it had many issues so dropped).
 So, how can file-private mmaped can reach this code?
 Could you elaborate it more about that why we need PageAnon check
 in here?

I send a new patch:
[RFC V2] mm: change mm_advise_free to clear page dirty
Please have a look at it.
Thanks for your comments!

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] mm: change mm_advise_free to clear page dirty

2015-02-27 Thread Wang, Yalin

This patch add ClearPageDirty() to clear AnonPage dirty flag,
if not clear page dirty for this anon page, the page will never be
treated as freeable. we also make sure the shared AnonPage is not
freeable, we implement it by dirty all copyed AnonPage pte,
so that make sure the Anonpage will not become freeable, unless
all process which shared this page call madvise_free syscall.

Another change is that we also handle file map page,
we just clear pte young bit for file map, this is useful,
it can make reclaim patch move file pages into inactive
lru list aggressively.

Signed-off-by: Yalin Wang 
---
 mm/madvise.c | 26 +++---
 mm/memory.c  | 12 ++--
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..712756b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -299,30 +299,38 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
long addr,
page = vm_normal_page(vma, addr, ptent);
if (!page)
continue;
+   if (!PageAnon(page))
+   goto set_pte;
+   if (!trylock_page(page))
+   continue;
 
if (PageSwapCache(page)) {
-   if (!trylock_page(page))
-   continue;
-
if (!try_to_free_swap(page)) {
unlock_page(page);
continue;
}
-
-   ClearPageDirty(page);
-   unlock_page(page);
}
 
/*
+* we clear page dirty flag for AnonPage, no matter if this
+* page is in swapcahce or not, AnonPage not in swapcache also 
set
+* dirty flag sometimes, this happened when an AnonPage is 
removed
+* from swapcahce by try_to_free_swap()
+*/
+   ClearPageDirty(page);
+   unlock_page(page);
+   /*
 * Some of architecture(ex, PPC) don't update TLB
 * with set_pte_at and tlb_remove_tlb_entry so for
 * the portability, remap the pte with old|clean
 * after pte clearing.
 */
+set_pte:
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
ptent = pte_mkold(ptent);
-   ptent = pte_mkclean(ptent);
+   if (PageAnon(page))
+   ptent = pte_mkclean(ptent);
set_pte_at(mm, addr, pte, ptent);
tlb_remove_tlb_entry(tlb, pte, addr);
}
@@ -364,10 +372,6 @@ static int madvise_free_single_vma(struct vm_area_struct 
*vma,
if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
return -EINVAL;
 
-   /* MADV_FREE works for only anon vma at the moment */
-   if (vma->vm_file)
-   return -EINVAL;
-
start = max(vma->vm_start, start_addr);
if (start >= vma->vm_end)
return -EINVAL;
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..3d949b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -874,10 +874,18 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
if (page) {
get_page(page);
page_dup_rmap(page);
-   if (PageAnon(page))
+   if (PageAnon(page)) {
+   /*
+* we dirty the copyed pte for anon page,
+* this is useful for madvise_free_pte_range(),
+* this can prevent shared anon page freed by 
madvise_free
+* syscall
+*/
+   pte = pte_mkdirty(pte);
rss[MM_ANONPAGES]++;
-   else
+   } else {
rss[MM_FILEPAGES]++;
+   }
}
 
 out_set_pte:
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-02-27 Thread Wang, Yalin

> -Original Message-
> From: Michal Hocko [mailto:msts...@gmail.com] On Behalf Of Michal Hocko
> Sent: Saturday, February 28, 2015 5:03 AM
> To: Wang, Yalin
> Cc: 'Minchan Kim'; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> 
> On Fri 27-02-15 11:37:18, Wang, Yalin wrote:
> > This patch add ClearPageDirty() to clear AnonPage dirty flag,
> > the Anonpage mapcount must be 1, so that this page is only used by
> > the current process, not shared by other process like fork().
> > if not clear page dirty for this anon page, the page will never be
> > treated as freeable.
> 
> Very well spotted! I haven't noticed that during the review.
> 
> > Signed-off-by: Yalin Wang 
> > ---
> >  mm/madvise.c | 15 +--
> >  1 file changed, 5 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6d0fcb8..257925a 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
> unsigned long addr,
> > continue;
> >
> > page = vm_normal_page(vma, addr, ptent);
> > -   if (!page)
> > +   if (!page || !PageAnon(page) || !trylock_page(page))
> > continue;
> 
> PageAnon check seems to be redundant because we are not allowing
> MADV_FREE on any !anon private mappings AFAIR.
I only see this check:
/* MADV_FREE works for only anon vma at the moment */
if (vma->vm_file)
return -EINVAL;

but for file private map, there are also AnonPage sometimes, do we need change
to like this:
if (vma->vm_flags & VM_SHARED)
return -EINVAL;
> >
> > if (PageSwapCache(page)) {
> > -   if (!trylock_page(page))
> > +   if (!try_to_free_swap(page))
> > continue;
> 
> You need to unlock the page here.
Good spot.

> > -
> > -   if (!try_to_free_swap(page)) {
> > -   unlock_page(page);
> > -   continue;
> > -   }
> > -
> > -   ClearPageDirty(page);
> > -   unlock_page(page);
> > }
> >
> > +   if (page_mapcount(page) == 1)
> > +   ClearPageDirty(page);
> 
> Please add a comment about why we need to ClearPageDirty even
> !PageSwapCache. Anon pages are usually not marked dirty AFAIR. The
> reason seem to be racing try_to_free_swap which sets the page that way
> (although I do not seem to remember why are we doing that in the first
> place...)
> 
Use page_mapcount to judge if a page can be clear dirty flag seems
Not a very good solution, that is because we don't know how many
ptes are share this page, I am thinking if there is some good solution
For shared AnonPage.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-02-27 Thread Wang, Yalin

 -Original Message-
 From: Michal Hocko [mailto:msts...@gmail.com] On Behalf Of Michal Hocko
 Sent: Saturday, February 28, 2015 5:03 AM
 To: Wang, Yalin
 Cc: 'Minchan Kim'; Andrew Morton; linux-kernel@vger.kernel.org; linux-
 m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
 Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

 On Fri 27-02-15 11:37:18, Wang, Yalin wrote:
  This patch add ClearPageDirty() to clear AnonPage dirty flag,
  the Anonpage mapcount must be 1, so that this page is only used by
  the current process, not shared by other process like fork().
  if not clear page dirty for this anon page, the page will never be
  treated as freeable.

 Very well spotted! I haven't noticed that during the review.

  Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
  ---
   mm/madvise.c | 15 +--
   1 file changed, 5 insertions(+), 10 deletions(-)

  diff --git a/mm/madvise.c b/mm/madvise.c
  index 6d0fcb8..257925a 100644
  --- a/mm/madvise.c
  +++ b/mm/madvise.c
  @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
 unsigned long addr,
  continue;

  page = vm_normal_page(vma, addr, ptent);
  -   if (!page)
  +   if (!page || !PageAnon(page) || !trylock_page(page))
  continue;

 PageAnon check seems to be redundant because we are not allowing
 MADV_FREE on any !anon private mappings AFAIR.
I only see this check:
/* MADV_FREE works for only anon vma at the moment */
if (vma-vm_file)
return -EINVAL;

but for file private map, there are also AnonPage sometimes, do we need change
to like this:
if (vma-vm_flags  VM_SHARED)
return -EINVAL;

  if (PageSwapCache(page)) {
  -   if (!trylock_page(page))
  +   if (!try_to_free_swap(page))
  continue;

 You need to unlock the page here.
Good spot.

  -
  -   if (!try_to_free_swap(page)) {
  -   unlock_page(page);
  -   continue;
  -   }
  -
  -   ClearPageDirty(page);
  -   unlock_page(page);
  }

  +   if (page_mapcount(page) == 1)
  +   ClearPageDirty(page);

 Please add a comment about why we need to ClearPageDirty even
 !PageSwapCache. Anon pages are usually not marked dirty AFAIR. The
 reason seem to be racing try_to_free_swap which sets the page that way
 (although I do not seem to remember why are we doing that in the first
 place...)

Use page_mapcount to judge if a page can be clear dirty flag seems
Not a very good solution, that is because we don't know how many
ptes are share this page, I am thinking if there is some good solution
For shared AnonPage.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] mm: change mm_advise_free to clear page dirty

2015-02-27 Thread Wang, Yalin

This patch add ClearPageDirty() to clear AnonPage dirty flag,
if not clear page dirty for this anon page, the page will never be
treated as freeable. we also make sure the shared AnonPage is not
freeable, we implement it by dirty all copyed AnonPage pte,
so that make sure the Anonpage will not become freeable, unless
all process which shared this page call madvise_free syscall.

Another change is that we also handle file map page,
we just clear pte young bit for file map, this is useful,
it can make reclaim patch move file pages into inactive
lru list aggressively.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 mm/madvise.c | 26 +++---
 mm/memory.c  | 12 ++--
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..712756b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -299,30 +299,38 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
long addr,
page = vm_normal_page(vma, addr, ptent);
if (!page)
continue;
+   if (!PageAnon(page))
+   goto set_pte;
+   if (!trylock_page(page))
+   continue;
 
if (PageSwapCache(page)) {
-   if (!trylock_page(page))
-   continue;
-
if (!try_to_free_swap(page)) {
unlock_page(page);
continue;
}
-
-   ClearPageDirty(page);
-   unlock_page(page);
}
 
/*
+* we clear page dirty flag for AnonPage, no matter if this
+* page is in swapcahce or not, AnonPage not in swapcache also 
set
+* dirty flag sometimes, this happened when an AnonPage is 
removed
+* from swapcahce by try_to_free_swap()
+*/
+   ClearPageDirty(page);
+   unlock_page(page);
+   /*
 * Some of architecture(ex, PPC) don't update TLB
 * with set_pte_at and tlb_remove_tlb_entry so for
 * the portability, remap the pte with old|clean
 * after pte clearing.
 */
+set_pte:
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb-fullmm);
ptent = pte_mkold(ptent);
-   ptent = pte_mkclean(ptent);
+   if (PageAnon(page))
+   ptent = pte_mkclean(ptent);
set_pte_at(mm, addr, pte, ptent);
tlb_remove_tlb_entry(tlb, pte, addr);
}
@@ -364,10 +372,6 @@ static int madvise_free_single_vma(struct vm_area_struct 
*vma,
if (vma-vm_flags  (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
return -EINVAL;
 
-   /* MADV_FREE works for only anon vma at the moment */
-   if (vma-vm_file)
-   return -EINVAL;
-
start = max(vma-vm_start, start_addr);
if (start = vma-vm_end)
return -EINVAL;
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..3d949b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -874,10 +874,18 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
if (page) {
get_page(page);
page_dup_rmap(page);
-   if (PageAnon(page))
+   if (PageAnon(page)) {
+   /*
+* we dirty the copyed pte for anon page,
+* this is useful for madvise_free_pte_range(),
+* this can prevent shared anon page freed by 
madvise_free
+* syscall
+*/
+   pte = pte_mkdirty(pte);
rss[MM_ANONPAGES]++;
-   else
+   } else {
rss[MM_FILEPAGES]++;
+   }
}
 
 out_set_pte:
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-02-26 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
> Sent: Friday, February 27, 2015 2:44 PM
> To: Wang, Yalin
> Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> 
> On Fri, Feb 27, 2015 at 01:48:48PM +0800, Wang, Yalin wrote:
> > > -Original Message-
> > > From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan
> Kim
> > > Sent: Friday, February 27, 2015 1:28 PM
> > > To: Wang, Yalin
> > > Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> > > m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> > > Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> > >
> > > Hello,
> > >
> > > On Fri, Feb 27, 2015 at 11:37:18AM +0800, Wang, Yalin wrote:
> > > > This patch add ClearPageDirty() to clear AnonPage dirty flag,
> > > > the Anonpage mapcount must be 1, so that this page is only used by
> > > > the current process, not shared by other process like fork().
> > > > if not clear page dirty for this anon page, the page will never be
> > > > treated as freeable.
> > >
> > > In case of anonymous page, it has PG_dirty when VM adds it to
> > > swap cache and clear it in clear_page_dirty_for_io. That's why
> > > I added ClearPageDirty if we found it in swapcache.
> > > What case am I missing? It would be better to understand if you
> > > describe specific scenario.
> > >
> > > Thanks.
> > >
> > > >
> > > > Signed-off-by: Yalin Wang 
> > > > ---
> > > >  mm/madvise.c | 15 +--
> > > >  1 file changed, 5 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index 6d0fcb8..257925a 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
> > > unsigned long addr,
> > > > continue;
> > > >
> > > > page = vm_normal_page(vma, addr, ptent);
> > > > -   if (!page)
> > > > +   if (!page || !PageAnon(page) || !trylock_page(page))
> > > > continue;
> > > >
> > > > if (PageSwapCache(page)) {
> > > > -   if (!trylock_page(page))
> > > > +   if (!try_to_free_swap(page))
> > > > continue;
> > > > -
> > > > -   if (!try_to_free_swap(page)) {
> > > > -   unlock_page(page);
> > > > -   continue;
> > > > -   }
> > > > -
> > > > -   ClearPageDirty(page);
> > > > -   unlock_page(page);
> > > > }
> > > >
> > > > +   if (page_mapcount(page) == 1)
> > > > +   ClearPageDirty(page);
> > > > +   unlock_page(page);
> > > > /*
> > > >  * Some of architecture(ex, PPC) don't update TLB
> > > >  * with set_pte_at and tlb_remove_tlb_entry so for
> > > > --
> > Yes, for page which is in SwapCache, it is correct,
> > But for anon page which is not in SwapCache, it is always
> > PageDirty(), so we should also clear dirty bit to make it freeable,
> 
> No. Every anon page starts from !PageDirty and it has PG_dirty
> only when it's addeded into swap cache. If vm_swap_full turns on,
> a page in swap cache could have PG_dirty via try_to_free_swap again.

mmm..
sometimes you can see an anon page PageDirty(), but it is not in swapcache,
for example, handle_pte_fault()-->do_swap_page()-->try_to_free_swap(),
at this time, the page is deleted from swapcache and is marked PageDirty(),


> So, Do you have concern about swapped-out pages when MADV_FREE is
> called? If so, please look at my patch.
> 
> https://lkml.org/lkml/2015/2/25/43
> 
> It will zap the swapped out page. So, this is not a issue any more?
> 
> >
> > Another problem  is that if an anon page is shared by more than one
> process,
> > This happened when fork(

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-02-26 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
> Sent: Friday, February 27, 2015 1:28 PM
> To: Wang, Yalin
> Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
> m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
> Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty
> 
> Hello,
> 
> On Fri, Feb 27, 2015 at 11:37:18AM +0800, Wang, Yalin wrote:
> > This patch add ClearPageDirty() to clear AnonPage dirty flag,
> > the Anonpage mapcount must be 1, so that this page is only used by
> > the current process, not shared by other process like fork().
> > if not clear page dirty for this anon page, the page will never be
> > treated as freeable.
> 
> In case of anonymous page, it has PG_dirty when VM adds it to
> swap cache and clear it in clear_page_dirty_for_io. That's why
> I added ClearPageDirty if we found it in swapcache.
> What case am I missing? It would be better to understand if you
> describe specific scenario.
> 
> Thanks.
> 
> >
> > Signed-off-by: Yalin Wang 
> > ---
> >  mm/madvise.c | 15 +--
> >  1 file changed, 5 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6d0fcb8..257925a 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
> unsigned long addr,
> > continue;
> >
> > page = vm_normal_page(vma, addr, ptent);
> > -   if (!page)
> > +   if (!page || !PageAnon(page) || !trylock_page(page))
> > continue;
> >
> > if (PageSwapCache(page)) {
> > -   if (!trylock_page(page))
> > +   if (!try_to_free_swap(page))
> > continue;
> > -
> > -   if (!try_to_free_swap(page)) {
> > -   unlock_page(page);
> > -   continue;
> > -   }
> > -
> > -   ClearPageDirty(page);
> > -   unlock_page(page);
> > }
> >
> > +   if (page_mapcount(page) == 1)
> > +   ClearPageDirty(page);
> > +   unlock_page(page);
> > /*
> >  * Some of architecture(ex, PPC) don't update TLB
> >  * with set_pte_at and tlb_remove_tlb_entry so for
> > --
Yes, for page which is in SwapCache, it is correct,
But for anon page which is not in SwapCache, it is always
PageDirty(), so we should also clear dirty bit to make it freeable,

Another problem  is that if an anon page is shared by more than one process,
This happened when fork(), the anon page will be copy on write,
In this case, we should not clear page dirty,
Otherwise, other process will get zero page if the page is freed by reclaim 
path,
This is not correct for other process which don't call MADV_FREE syscall.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mm: change mm_advise_free to clear page dirty

2015-02-26 Thread Wang, Yalin

This patch add ClearPageDirty() to clear AnonPage dirty flag,
the Anonpage mapcount must be 1, so that this page is only used by
the current process, not shared by other process like fork().
if not clear page dirty for this anon page, the page will never be
treated as freeable.

Signed-off-by: Yalin Wang 
---
 mm/madvise.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..257925a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
long addr,
continue;
 
page = vm_normal_page(vma, addr, ptent);
-   if (!page)
+   if (!page || !PageAnon(page) || !trylock_page(page))
continue;
 
if (PageSwapCache(page)) {
-   if (!trylock_page(page))
+   if (!try_to_free_swap(page))
continue;
-
-   if (!try_to_free_swap(page)) {
-   unlock_page(page);
-   continue;
-   }
-
-   ClearPageDirty(page);
-   unlock_page(page);
}
 
+   if (page_mapcount(page) == 1)
+   ClearPageDirty(page);
+   unlock_page(page);
/*
 * Some of architecture(ex, PPC) don't update TLB
 * with set_pte_at and tlb_remove_tlb_entry so for
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-02-26 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
 Sent: Friday, February 27, 2015 1:28 PM
 To: Wang, Yalin
 Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
 m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
 Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

 Hello,

 On Fri, Feb 27, 2015 at 11:37:18AM +0800, Wang, Yalin wrote:
  This patch add ClearPageDirty() to clear AnonPage dirty flag,
  the Anonpage mapcount must be 1, so that this page is only used by
  the current process, not shared by other process like fork().
  if not clear page dirty for this anon page, the page will never be
  treated as freeable.

 In case of anonymous page, it has PG_dirty when VM adds it to
 swap cache and clear it in clear_page_dirty_for_io. That's why
 I added ClearPageDirty if we found it in swapcache.
 What case am I missing? It would be better to understand if you
 describe specific scenario.

 Thanks.

  Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
  ---
   mm/madvise.c | 15 +--
   1 file changed, 5 insertions(+), 10 deletions(-)

  diff --git a/mm/madvise.c b/mm/madvise.c
  index 6d0fcb8..257925a 100644
  --- a/mm/madvise.c
  +++ b/mm/madvise.c
  @@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
 unsigned long addr,
  continue;

  page = vm_normal_page(vma, addr, ptent);
  -   if (!page)
  +   if (!page || !PageAnon(page) || !trylock_page(page))
  continue;

  if (PageSwapCache(page)) {
  -   if (!trylock_page(page))
  +   if (!try_to_free_swap(page))
  continue;
  -
  -   if (!try_to_free_swap(page)) {
  -   unlock_page(page);
  -   continue;
  -   }
  -
  -   ClearPageDirty(page);
  -   unlock_page(page);
  }

  +   if (page_mapcount(page) == 1)
  +   ClearPageDirty(page);
  +   unlock_page(page);
  /*
   * Some of architecture(ex, PPC) don't update TLB
   * with set_pte_at and tlb_remove_tlb_entry so for
  --
Yes, for page which is in SwapCache, it is correct,
But for anon page which is not in SwapCache, it is always
PageDirty(), so we should also clear dirty bit to make it freeable,

Another problem  is that if an anon page is shared by more than one process,
This happened when fork(), the anon page will be copy on write,
In this case, we should not clear page dirty,
Otherwise, other process will get zero page if the page is freed by reclaim 
path,
This is not correct for other process which don't call MADV_FREE syscall.

Thanks

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mm: change mm_advise_free to clear page dirty

2015-02-26 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan Kim
 Sent: Friday, February 27, 2015 2:44 PM
 To: Wang, Yalin
 Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
 m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
 Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

 On Fri, Feb 27, 2015 at 01:48:48PM +0800, Wang, Yalin wrote:
   -Original Message-
   From: Minchan Kim [mailto:minchan@gmail.com] On Behalf Of Minchan
 Kim
   Sent: Friday, February 27, 2015 1:28 PM
   To: Wang, Yalin
   Cc: Michal Hocko; Andrew Morton; linux-kernel@vger.kernel.org; linux-
   m...@kvack.org; Rik van Riel; Johannes Weiner; Mel Gorman; Shaohua Li
   Subject: Re: [RFC] mm: change mm_advise_free to clear page dirty

   Hello,

   On Fri, Feb 27, 2015 at 11:37:18AM +0800, Wang, Yalin wrote:
This patch add ClearPageDirty() to clear AnonPage dirty flag,
the Anonpage mapcount must be 1, so that this page is only used by
the current process, not shared by other process like fork().
if not clear page dirty for this anon page, the page will never be
treated as freeable.

   In case of anonymous page, it has PG_dirty when VM adds it to
   swap cache and clear it in clear_page_dirty_for_io. That's why
   I added ClearPageDirty if we found it in swapcache.
   What case am I missing? It would be better to understand if you
   describe specific scenario.

   Thanks.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 mm/madvise.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..257925a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd,
   unsigned long addr,
continue;

page = vm_normal_page(vma, addr, ptent);
-   if (!page)
+   if (!page || !PageAnon(page) || !trylock_page(page))
continue;

if (PageSwapCache(page)) {
-   if (!trylock_page(page))
+   if (!try_to_free_swap(page))
continue;
-
-   if (!try_to_free_swap(page)) {
-   unlock_page(page);
-   continue;
-   }
-
-   ClearPageDirty(page);
-   unlock_page(page);
}

+   if (page_mapcount(page) == 1)
+   ClearPageDirty(page);
+   unlock_page(page);
/*
 * Some of architecture(ex, PPC) don't update TLB
 * with set_pte_at and tlb_remove_tlb_entry so for
--
  Yes, for page which is in SwapCache, it is correct,
  But for anon page which is not in SwapCache, it is always
  PageDirty(), so we should also clear dirty bit to make it freeable,

 No. Every anon page starts from !PageDirty and it has PG_dirty
 only when it's addeded into swap cache. If vm_swap_full turns on,
 a page in swap cache could have PG_dirty via try_to_free_swap again.

mmm..
sometimes you can see an anon page PageDirty(), but it is not in swapcache,
for example, handle_pte_fault()--do_swap_page()--try_to_free_swap(),
at this time, the page is deleted from swapcache and is marked PageDirty(),

 So, Do you have concern about swapped-out pages when MADV_FREE is
 called? If so, please look at my patch.

 https://lkml.org/lkml/2015/2/25/43

 It will zap the swapped out page. So, this is not a issue any more?

  Another problem  is that if an anon page is shared by more than one
 process,
  This happened when fork(), the anon page will be copy on write,
  In this case, we should not clear page dirty,
  This is not correct for other process which don't call MADV_FREE syscall.

 You mean we shouldn't inherit MADV_FREE attribute?
 Why?

Is it correct behavior if code like this:

Parent:
ptr1 = malloc(len);
memset(ptr1, 'a', len);
fork();
if (I am parent)
madvise_free(ptr1, len);

child:
sleep(10);
parse_data(ptr1, len);  // child may see zero, not 'a',
// is it the right behavior that the programer want?

Because child don't call madvise_free(), so it should see 'a', not zero page.
Isn't it ?
Thanks

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mm: change mm_advise_free to clear page dirty

2015-02-26 Thread Wang, Yalin

This patch add ClearPageDirty() to clear AnonPage dirty flag,
the Anonpage mapcount must be 1, so that this page is only used by
the current process, not shared by other process like fork().
if not clear page dirty for this anon page, the page will never be
treated as freeable.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 mm/madvise.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..257925a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -297,22 +297,17 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
long addr,
continue;
 
page = vm_normal_page(vma, addr, ptent);
-   if (!page)
+   if (!page || !PageAnon(page) || !trylock_page(page))
continue;
 
if (PageSwapCache(page)) {
-   if (!trylock_page(page))
+   if (!try_to_free_swap(page))
continue;
-
-   if (!try_to_free_swap(page)) {
-   unlock_page(page);
-   continue;
-   }
-
-   ClearPageDirty(page);
-   unlock_page(page);
}
 
+   if (page_mapcount(page) == 1)
+   ClearPageDirty(page);
+   unlock_page(page);
/*
 * Some of architecture(ex, PPC) don't update TLB
 * with set_pte_at and tlb_remove_tlb_entry so for
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V3] test bit before clear files_struct bits

2015-02-15 Thread Wang, Yalin

Test bit before clear close_on_exec and open_fds,
by trace __clear_bit(), these 2 place are false in most times,
we test it so that we don't need clear_bit, and we can win
in most time.
Add *_if_need bitop non-atomic version.

Signed-off-by: Yalin Wang 
---
 fs/file.c   |  9 +++--
 include/asm-generic/bitops/non-atomic.h | 35 +
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ee738ea..2e08e6f 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -209,7 +209,7 @@ static inline void __set_close_on_exec(int fd, struct 
fdtable *fdt)
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
 {
-   __clear_bit(fd, fdt->close_on_exec);
+   __clear_bit_if_need(fd, fdt->close_on_exec);
 }
 
 static inline void __set_open_fd(int fd, struct fdtable *fdt)
@@ -222,6 +222,11 @@ static inline void __clear_open_fd(int fd, struct fdtable 
*fdt)
__clear_bit(fd, fdt->open_fds);
 }
 
+static inline void __clear_open_fd_if_need(int fd, struct fdtable *fdt)
+{
+   __clear_bit_if_need(fd, fdt->open_fds);
+}
+
 static int count_open_files(struct fdtable *fdt)
 {
int size = fdt->max_fds;
@@ -316,7 +321,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int 
*errorp)
 * is partway through open().  So make sure that this
 * fd is available to the new process.
 */
-   __clear_open_fd(open_files - i, new_fdt);
+   __clear_open_fd_if_need(open_files - i, new_fdt);
}
rcu_assign_pointer(*new_fds++, f);
}
diff --git a/include/asm-generic/bitops/non-atomic.h 
b/include/asm-generic/bitops/non-atomic.h
index 697cc2b..0713e3b 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -105,4 +105,39 @@ static inline int test_bit(int nr, const volatile unsigned 
long *addr)
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
 
+/*
+ * __*_if_need version is used in cases that you don't want write a bit which
+ * have been set / clear, to avoid dirty cacheline.
+ */
+static inline void __set_bit_if_need(int nr, volatile unsigned long *addr)
+{
+   if (!test_bit(nr, addr))
+   __set_bit(nr, addr);
+}
+
+static inline void __clear_bit_if_need(int nr, volatile unsigned long *addr)
+{
+   if (test_bit(nr, addr))
+   __clear_bit(nr, addr);
+}
+
+static inline int __test_and_set_bit_if_need(int nr, volatile unsigned long 
*addr)
+{
+   if (!test_bit(nr, addr)) {
+   __set_bit(nr, addr);
+   return false;
+   } else {
+   return true;
+   }
+}
+
+static inline int __test_and_clear_bit_if_need(int nr, volatile unsigned long 
*addr)
+{
+   if (test_bit(nr, addr)) {
+   __clear_bit(nr, addr);
+   return true;
+   } else {
+   return false;
+   }
+}
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V3] test bit before clear files_struct bits

2015-02-15 Thread Wang, Yalin

Test bit before clear close_on_exec and open_fds,
by trace __clear_bit(), these 2 place are false in most times,
we test it so that we don't need clear_bit, and we can win
in most time.
Add *_if_need bitop non-atomic version.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/file.c   |  9 +++--
 include/asm-generic/bitops/non-atomic.h | 35 +
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ee738ea..2e08e6f 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -209,7 +209,7 @@ static inline void __set_close_on_exec(int fd, struct 
fdtable *fdt)
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
 {
-   __clear_bit(fd, fdt-close_on_exec);
+   __clear_bit_if_need(fd, fdt-close_on_exec);
 }
 
 static inline void __set_open_fd(int fd, struct fdtable *fdt)
@@ -222,6 +222,11 @@ static inline void __clear_open_fd(int fd, struct fdtable 
*fdt)
__clear_bit(fd, fdt-open_fds);
 }
 
+static inline void __clear_open_fd_if_need(int fd, struct fdtable *fdt)
+{
+   __clear_bit_if_need(fd, fdt-open_fds);
+}
+
 static int count_open_files(struct fdtable *fdt)
 {
int size = fdt-max_fds;
@@ -316,7 +321,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int 
*errorp)
 * is partway through open().  So make sure that this
 * fd is available to the new process.
 */
-   __clear_open_fd(open_files - i, new_fdt);
+   __clear_open_fd_if_need(open_files - i, new_fdt);
}
rcu_assign_pointer(*new_fds++, f);
}
diff --git a/include/asm-generic/bitops/non-atomic.h 
b/include/asm-generic/bitops/non-atomic.h
index 697cc2b..0713e3b 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -105,4 +105,39 @@ static inline int test_bit(int nr, const volatile unsigned 
long *addr)
return 1UL  (addr[BIT_WORD(nr)]  (nr  (BITS_PER_LONG-1)));
 }
 
+/*
+ * __*_if_need version is used in cases that you don't want write a bit which
+ * have been set / clear, to avoid dirty cacheline.
+ */
+static inline void __set_bit_if_need(int nr, volatile unsigned long *addr)
+{
+   if (!test_bit(nr, addr))
+   __set_bit(nr, addr);
+}
+
+static inline void __clear_bit_if_need(int nr, volatile unsigned long *addr)
+{
+   if (test_bit(nr, addr))
+   __clear_bit(nr, addr);
+}
+
+static inline int __test_and_set_bit_if_need(int nr, volatile unsigned long 
*addr)
+{
+   if (!test_bit(nr, addr)) {
+   __set_bit(nr, addr);
+   return false;
+   } else {
+   return true;
+   }
+}
+
+static inline int __test_and_clear_bit_if_need(int nr, volatile unsigned long 
*addr)
+{
+   if (test_bit(nr, addr)) {
+   __clear_bit(nr, addr);
+   return true;
+   } else {
+   return false;
+   }
+}
 #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] test_bit before clear files_struct bits

2015-02-09 Thread Wang, Yalin

add test_bit() before clear close_on_exec and open_fds,
by trace __clear_bit(), these 2 place are false in most times,
we test it so that we don't need clear_bit, and we can win
in most time.

Signed-off-by: Yalin Wang 
---
 fs/file.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ee738ea..b0e059c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -209,7 +209,8 @@ static inline void __set_close_on_exec(int fd, struct 
fdtable *fdt)
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
 {
-   __clear_bit(fd, fdt->close_on_exec);
+   if (test_bit(fd, fdt->close_on_exec))
+   __clear_bit(fd, fdt->close_on_exec);
 }
 
 static inline void __set_open_fd(int fd, struct fdtable *fdt)
@@ -309,7 +310,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int 
*errorp)
struct file *f = *old_fds++;
if (f) {
get_file(f);
-   } else {
+   } else if (test_bit(open_files - i, new_fdt->open_fds)) {
/*
 * The fd may be claimed in the fd bitmap but not yet
 * instantiated in the files array if a sibling thread
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] test_bit before clear files_struct bits

2015-02-09 Thread Wang, Yalin

add test_bit() before clear close_on_exec and open_fds,
by trace __clear_bit(), these 2 place are false in most times,
we test it so that we don't need dirty cacheline, and we can win
in most time.

Signed-off-by: Yalin Wang 
---
 fs/file.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ee738ea..2fd296b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -209,7 +209,8 @@ static inline void __set_close_on_exec(int fd, struct 
fdtable *fdt)
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
 {
-   __clear_bit(fd, fdt->close_on_exec);
+   if (test_bit(fd, fdt->close_on_exec))
+   __clear_bit(fd, fdt->close_on_exec);
 }
 
 static inline void __set_open_fd(int fd, struct fdtable *fdt)
@@ -309,7 +310,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int 
*errorp)
struct file *f = *old_fds++;
if (f) {
get_file(f);
-   } else {
+   } else if (test_bit(open_files - i, new_fdt)) {
/*
 * The fd may be claimed in the fd bitmap but not yet
 * instantiated in the files array if a sibling thread
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-09 Thread Wang, Yalin

> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Tuesday, February 10, 2015 4:34 AM
> To: Wang, Yalin
> Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
> 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC] change non-atomic bitops method
> 
> On Mon, 9 Feb 2015 16:18:10 +0800 "Wang, Yalin" 
> wrote:
> 
> > > That we're running clear_bit against a cleared bit 10% of the time is a
> > > bit alarming.  I wonder where that's coming from.
> > >
> > > The enormous miss count in test_and_clear_bit() might indicate an
> > > inefficiency somewhere.
> > I te-test the patch on 3.10 kernel.
> > The result like this:
> >
> > VmallocChunk:   251498164 kB
> > __set_bit_miss_count:11730 __set_bit_success_count:1036316
> > __clear_bit_miss_count:209640 __clear_bit_success_count:4806556
> > __test_and_set_bit_miss_count:0 __test_and_set_bit_success_count:121
> > __test_and_clear_bit_miss_count:0 __test_and_clear_bit_success_count:445
> >
> > __clear_bit miss rate is a little high,
> > I check the log, and most miss coming from this code:
> >
> > <6>[  442.701798] [] warn_slowpath_fmt+0x4c/0x58
> > <6>[  442.701805] [] __clear_bit+0x98/0xa4
> > <6>[  442.701813] [] __alloc_fd+0xc8/0x124
> > <6>[  442.701821] [] get_unused_fd_flags+0x28/0x34
> > <6>[  442.701828] [] do_sys_open+0x10c/0x1c0
> > <6>[  442.701835] [] SyS_openat+0xc/0x18
> > In __clear_close_on_exec(fd, fdt);
> >
> >
> >
> > <6>[  442.695354] [] warn_slowpath_fmt+0x4c/0x58
> > <6>[  442.695359] [] __clear_bit+0x98/0xa4
> > <6>[  442.695367] [] dup_fd+0x1d4/0x280
> > <6>[  442.695375] [] copy_process.part.56+0x42c/0xe38
> > <6>[  442.695382] [] do_fork+0xe0/0x360
> > <6>[  442.695389] [] SyS_clone+0x10/0x1c
> > In __clear_open_fd(open_files - i, new_fdt);
> >
> > Do we need test_bit() before clear_bit()at these 2 place?
> 
> I don't know.  I was happily typing in this:
> 
> diff -puN include/linux/bitops.h~a include/linux/bitops.h
> --- a/include/linux/bitops.h~a
> +++ a/include/linux/bitops.h
> @@ -226,5 +226,37 @@ extern unsigned long find_last_bit(const
>  unsigned long size);
>  #endif
> 
> +/**
> + * __set_clear_bit - non-atomically set a bit if it is presently clear
> + * @nr: The bit number
> + * @addr: The base address of the operation
> + *
> + * __set_clear_bit() and similar functions avoid unnecessarily dirtying a
> + * cacheline when the operation will have no effect.
> + */
> +static inline void __set_clear_bit(unsigned nr, volatile unsigned long
> *addr)
> +{
> + if (!test_bit(nr, addr))
> + __set_bit(nr, addr);
> +}
> +
> +static inline void __clear_set_bit(unsigned nr, volatile unsigned long
> *addr)
> +{
> + if (test_bit(nr, addr))
> + __clear_bit(nr, addr);
> +}
> +
> +static inline void set_clear_bit(unsigned nr, volatile unsigned long
> *addr)
> +{
> + if (!test_bit(nr, addr))
> + set_bit(nr, addr);
> +}
> +
> +static inline void clear_set_bit(unsigned nr, volatile unsigned long
> *addr)
> +{
> + if (test_bit(nr, addr))
> + clear_bit(nr, addr);
> +}
> +
>  #endif /* __KERNEL__ */
>  #endif
> 
> (maybe __set_bit_if_clear would be a better name)
> 
> But I don't know if it will do anything useful.  The CPU *should* be
> able to avoid dirtying the cacheline on its own: it has all the info it
> needs to know that no writeback will be needed.  But I don't know which
> (if any) CPUs perform this optimisation.
I will send a new patch for your review .

Thanks

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-09 Thread Wang, Yalin

> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Tuesday, February 03, 2015 6:59 PM
> To: Wang, Yalin
> Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
> 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC] change non-atomic bitops method
> 
> On Tue, 3 Feb 2015 16:42:14 +0800 "Wang, Yalin" 
> wrote:
> 
> > I make a change in kernel to test hit/miss ratio:
> 
> Neat, thanks.
> 
> >
> > ...
> >
> > After use the phone some time:
> > root@D5303:/ # cat /proc/meminfo
> > VmallocUsed:   10348 kB
> > VmallocChunk:  75632 kB
> > __set_bit_miss_count:10002 __set_bit_success_count:1096661
> > __clear_bit_miss_count:359484 __clear_bit_success_count:3674617
> > __test_and_set_bit_miss_count:7 __test_and_set_bit_success_count:221
> > __test_and_clear_bit_miss_count:924611
> __test_and_clear_bit_success_count:193
> >
> > __test_and_clear_bit_miss_count has a very high miss rate.
> > In fact, I think set/clear/test_and_set(clear)_bit atomic version can
> also
> > Be investigated to see its miss ratio,
> > I have not tested the atomic version,
> > Because it reside in different architectures.
> 
> Hopefully misses in test_and_X_bit are not a problem.  The CPU
> implementation would be pretty stupid to go and dirty the cacheline
> when it knows it didn't change anything.  But maybe I'm wrong about
> that.
> 
> That we're running clear_bit against a cleared bit 10% of the time is a
> bit alarming.  I wonder where that's coming from.
> 
> The enormous miss count in test_and_clear_bit() might indicate an
> inefficiency somewhere.
I te-test the patch on 3.10 kernel.
The result like this:

VmallocChunk:   251498164 kB
__set_bit_miss_count:11730 __set_bit_success_count:1036316
__clear_bit_miss_count:209640 __clear_bit_success_count:4806556
__test_and_set_bit_miss_count:0 __test_and_set_bit_success_count:121
__test_and_clear_bit_miss_count:0 __test_and_clear_bit_success_count:445

__clear_bit miss rate is a little high,
I check the log, and most miss coming from this code:

<6>[  442.701798] [] warn_slowpath_fmt+0x4c/0x58
<6>[  442.701805] [] __clear_bit+0x98/0xa4
<6>[  442.701813] [] __alloc_fd+0xc8/0x124
<6>[  442.701821] [] get_unused_fd_flags+0x28/0x34
<6>[  442.701828] [] do_sys_open+0x10c/0x1c0
<6>[  442.701835] [] SyS_openat+0xc/0x18
In __clear_close_on_exec(fd, fdt);



<6>[  442.695354] [] warn_slowpath_fmt+0x4c/0x58
<6>[  442.695359] [] __clear_bit+0x98/0xa4
<6>[  442.695367] [] dup_fd+0x1d4/0x280
<6>[  442.695375] [] copy_process.part.56+0x42c/0xe38
<6>[  442.695382] [] do_fork+0xe0/0x360
<6>[  442.695389] [] SyS_clone+0x10/0x1c
In __clear_open_fd(open_files - i, new_fdt);

Do we need test_bit() before clear_bit()at these 2 place?

Thanks




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] test_bit before clear files_struct bits

2015-02-09 Thread Wang, Yalin

add test_bit() before clear close_on_exec and open_fds,
by trace __clear_bit(), these 2 place are false in most times,
we test it so that we don't need dirty cacheline, and we can win
in most time.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/file.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ee738ea..2fd296b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -209,7 +209,8 @@ static inline void __set_close_on_exec(int fd, struct 
fdtable *fdt)
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
 {
-   __clear_bit(fd, fdt-close_on_exec);
+   if (test_bit(fd, fdt-close_on_exec))
+   __clear_bit(fd, fdt-close_on_exec);
 }
 
 static inline void __set_open_fd(int fd, struct fdtable *fdt)
@@ -309,7 +310,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int 
*errorp)
struct file *f = *old_fds++;
if (f) {
get_file(f);
-   } else {
+   } else if (test_bit(open_files - i, new_fdt)) {
/*
 * The fd may be claimed in the fd bitmap but not yet
 * instantiated in the files array if a sibling thread
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-09 Thread Wang, Yalin

 -Original Message-
 From: Andrew Morton [mailto:a...@linux-foundation.org]
 Sent: Tuesday, February 10, 2015 4:34 AM
 To: Wang, Yalin
 Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC] change non-atomic bitops method
 
 On Mon, 9 Feb 2015 16:18:10 +0800 Wang, Yalin yalin.w...@sonymobile.com
 wrote:
 
   That we're running clear_bit against a cleared bit 10% of the time is a
   bit alarming.  I wonder where that's coming from.
  
   The enormous miss count in test_and_clear_bit() might indicate an
   inefficiency somewhere.
  I te-test the patch on 3.10 kernel.
  The result like this:
 
  VmallocChunk:   251498164 kB
  __set_bit_miss_count:11730 __set_bit_success_count:1036316
  __clear_bit_miss_count:209640 __clear_bit_success_count:4806556
  __test_and_set_bit_miss_count:0 __test_and_set_bit_success_count:121
  __test_and_clear_bit_miss_count:0 __test_and_clear_bit_success_count:445
 
  __clear_bit miss rate is a little high,
  I check the log, and most miss coming from this code:
 
  6[  442.701798] [ffc00021d084] warn_slowpath_fmt+0x4c/0x58
  6[  442.701805] [ffc0002461a8] __clear_bit+0x98/0xa4
  6[  442.701813] [ffc0003126ac] __alloc_fd+0xc8/0x124
  6[  442.701821] [ffc000312768] get_unused_fd_flags+0x28/0x34
  6[  442.701828] [ffc0002f9370] do_sys_open+0x10c/0x1c0
  6[  442.701835] [ffc0002f9458] SyS_openat+0xc/0x18
  In __clear_close_on_exec(fd, fdt);
 
 
 
  6[  442.695354] [ffc00021d084] warn_slowpath_fmt+0x4c/0x58
  6[  442.695359] [ffc0002461a8] __clear_bit+0x98/0xa4
  6[  442.695367] [ffc000312340] dup_fd+0x1d4/0x280
  6[  442.695375] [ffc00021b07c] copy_process.part.56+0x42c/0xe38
  6[  442.695382] [ffc00021bb9c] do_fork+0xe0/0x360
  6[  442.695389] [ffc00021beb4] SyS_clone+0x10/0x1c
  In __clear_open_fd(open_files - i, new_fdt);
 
  Do we need test_bit() before clear_bit()at these 2 place?
 
 I don't know.  I was happily typing in this:
 
 diff -puN include/linux/bitops.h~a include/linux/bitops.h
 --- a/include/linux/bitops.h~a
 +++ a/include/linux/bitops.h
 @@ -226,5 +226,37 @@ extern unsigned long find_last_bit(const
  unsigned long size);
  #endif
 
 +/**
 + * __set_clear_bit - non-atomically set a bit if it is presently clear
 + * @nr: The bit number
 + * @addr: The base address of the operation
 + *
 + * __set_clear_bit() and similar functions avoid unnecessarily dirtying a
 + * cacheline when the operation will have no effect.
 + */
 +static inline void __set_clear_bit(unsigned nr, volatile unsigned long
 *addr)
 +{
 + if (!test_bit(nr, addr))
 + __set_bit(nr, addr);
 +}
 +
 +static inline void __clear_set_bit(unsigned nr, volatile unsigned long
 *addr)
 +{
 + if (test_bit(nr, addr))
 + __clear_bit(nr, addr);
 +}
 +
 +static inline void set_clear_bit(unsigned nr, volatile unsigned long
 *addr)
 +{
 + if (!test_bit(nr, addr))
 + set_bit(nr, addr);
 +}
 +
 +static inline void clear_set_bit(unsigned nr, volatile unsigned long
 *addr)
 +{
 + if (test_bit(nr, addr))
 + clear_bit(nr, addr);
 +}
 +
  #endif /* __KERNEL__ */
  #endif
 
 (maybe __set_bit_if_clear would be a better name)
 
 But I don't know if it will do anything useful.  The CPU *should* be
 able to avoid dirtying the cacheline on its own: it has all the info it
 needs to know that no writeback will be needed.  But I don't know which
 (if any) CPUs perform this optimisation.
I will send a new patch for your review .

Thanks

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] test_bit before clear files_struct bits

2015-02-09 Thread Wang, Yalin

add test_bit() before clear close_on_exec and open_fds,
by trace __clear_bit(), these 2 place are false in most times,
we test it so that we don't need clear_bit, and we can win
in most time.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/file.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ee738ea..b0e059c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -209,7 +209,8 @@ static inline void __set_close_on_exec(int fd, struct 
fdtable *fdt)
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
 {
-   __clear_bit(fd, fdt-close_on_exec);
+   if (test_bit(fd, fdt-close_on_exec))
+   __clear_bit(fd, fdt-close_on_exec);
 }
 
 static inline void __set_open_fd(int fd, struct fdtable *fdt)
@@ -309,7 +310,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int 
*errorp)
struct file *f = *old_fds++;
if (f) {
get_file(f);
-   } else {
+   } else if (test_bit(open_files - i, new_fdt-open_fds)) {
/*
 * The fd may be claimed in the fd bitmap but not yet
 * instantiated in the files array if a sibling thread
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-09 Thread Wang, Yalin

 -Original Message-
 From: Andrew Morton [mailto:a...@linux-foundation.org]
 Sent: Tuesday, February 03, 2015 6:59 PM
 To: Wang, Yalin
 Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC] change non-atomic bitops method
 
 On Tue, 3 Feb 2015 16:42:14 +0800 Wang, Yalin yalin.w...@sonymobile.com
 wrote:
 
  I make a change in kernel to test hit/miss ratio:
 
 Neat, thanks.
 
 
  ...
 
  After use the phone some time:
  root@D5303:/ # cat /proc/meminfo
  VmallocUsed:   10348 kB
  VmallocChunk:  75632 kB
  __set_bit_miss_count:10002 __set_bit_success_count:1096661
  __clear_bit_miss_count:359484 __clear_bit_success_count:3674617
  __test_and_set_bit_miss_count:7 __test_and_set_bit_success_count:221
  __test_and_clear_bit_miss_count:924611
 __test_and_clear_bit_success_count:193
 
  __test_and_clear_bit_miss_count has a very high miss rate.
  In fact, I think set/clear/test_and_set(clear)_bit atomic version can
 also
  Be investigated to see its miss ratio,
  I have not tested the atomic version,
  Because it reside in different architectures.
 
 Hopefully misses in test_and_X_bit are not a problem.  The CPU
 implementation would be pretty stupid to go and dirty the cacheline
 when it knows it didn't change anything.  But maybe I'm wrong about
 that.
 
 That we're running clear_bit against a cleared bit 10% of the time is a
 bit alarming.  I wonder where that's coming from.
 
 The enormous miss count in test_and_clear_bit() might indicate an
 inefficiency somewhere.
I te-test the patch on 3.10 kernel.
The result like this:

VmallocChunk:   251498164 kB
__set_bit_miss_count:11730 __set_bit_success_count:1036316
__clear_bit_miss_count:209640 __clear_bit_success_count:4806556
__test_and_set_bit_miss_count:0 __test_and_set_bit_success_count:121
__test_and_clear_bit_miss_count:0 __test_and_clear_bit_success_count:445

__clear_bit miss rate is a little high,
I check the log, and most miss coming from this code:

6[  442.701798] [ffc00021d084] warn_slowpath_fmt+0x4c/0x58
6[  442.701805] [ffc0002461a8] __clear_bit+0x98/0xa4
6[  442.701813] [ffc0003126ac] __alloc_fd+0xc8/0x124
6[  442.701821] [ffc000312768] get_unused_fd_flags+0x28/0x34
6[  442.701828] [ffc0002f9370] do_sys_open+0x10c/0x1c0
6[  442.701835] [ffc0002f9458] SyS_openat+0xc/0x18
In __clear_close_on_exec(fd, fdt);



6[  442.695354] [ffc00021d084] warn_slowpath_fmt+0x4c/0x58
6[  442.695359] [ffc0002461a8] __clear_bit+0x98/0xa4
6[  442.695367] [ffc000312340] dup_fd+0x1d4/0x280
6[  442.695375] [ffc00021b07c] copy_process.part.56+0x42c/0xe38
6[  442.695382] [ffc00021bb9c] do_fork+0xe0/0x360
6[  442.695389] [ffc00021beb4] SyS_clone+0x10/0x1c
In __clear_open_fd(open_files - i, new_fdt);

Do we need test_bit() before clear_bit()at these 2 place?

Thanks




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-03 Thread Wang, Yalin

> -Original Message-
> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> Sent: Tuesday, February 03, 2015 5:34 PM
> To: Andrew Morton
> Cc: Wang, Yalin; 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-
> a...@vger.kernel.org'; 'linux-kernel@vger.kernel.org';
> 'li...@arm.linux.org.uk'; 'linux-arm-ker...@lists.infradead.org'
> Subject: Re: [RFC] change non-atomic bitops method
> 
> On Tue, Feb 03 2015, Andrew Morton  wrote:
> 
> >
> > You aren't measuring the right thing.  You should compare
> >
> > if (p[i] != x)
> > p[i] = x;
> >
> > versus
> >
> > p[i] = x;
> >
> > and you should do this for two cases:
> >
> > a) p[i] == x
> >
> > b) p[i] != x
> >
> >
> > The first code sequence will be slower when (p[i] != x) and faster when
> > (p[i] == x).
> >
> >
> > Next, we should instrument the kernel to work out the frequency of
> > set_bit on an already-set bit.
> >
> > It is only with both these ratios that we can work out whether the
> > patch is a net gain.  My suspicion is that set_bit on an already-set
> > bit is so rare that the patch will be a loss.
> 
> There's also the code-bloat issue to consider (instruction cache and all
> that); the conditional versions will usually require three extra
> instructions and an extra register. Also, the cache line might already
> be dirty because of something in the surrounding code. Instruction cache
> misses and larger stack footprint (from larger register pressure) won't
> show up in a microbenchmark, so I think this needs a real-world example
> to justify.
> 
> But even if one finds some hot spot that would benefit from the
> conditional, that should simply be added explicitly there, instead of
> pessimizing every other user. (A good example of that is 358eec18243a
> ("vfs: decrapify dput(), fix cache behavior under normal load")).

Oh, thank you, it is really a very nice example.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-03 Thread Wang, Yalin

> -Original Message-
> From: Wang, Yalin
> Sent: Tuesday, February 03, 2015 3:04 PM
> To: 'Andrew Morton'
> Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
> 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: RE: [RFC] change non-atomic bitops method
> 
> > -Original Message-
> > From: Andrew Morton [mailto:a...@linux-foundation.org]
> > Sent: Tuesday, February 03, 2015 2:39 PM
> > To: Wang, Yalin
> > Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
> > 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> > ker...@lists.infradead.org'
> > Subject: Re: [RFC] change non-atomic bitops method
> >
> > On Tue, 3 Feb 2015 13:42:45 +0800 "Wang, Yalin"
> 
> > wrote:
> > >
> > > ...
> > >
> > > #ifdef CHECK_BEFORE_SET
> > >   if (p[i] != times)
> > > #endif
> > >
> > > ...
> > >
> > > 
> > > One run on CPU0, reader thread run on CPU1,
> > > Test result:
> > > sudo ./cache_test
> > > reader:8.426228173
> > > 8.672198335
> > >
> > > With -DCHECK_BEFORE_SET
> > > sudo ./cache_test_check
> > > reader:7.537036819
> > > 10.799746531
> > >
> >
> > You aren't measuring the right thing.  You should compare
> >
> > if (p[i] != x)
> > p[i] = x;
> >
> > versus
> >
> > p[i] = x;
> >
> > and you should do this for two cases:
> >
> > a) p[i] == x
> >
> > b) p[i] != x
> >
> >
> > The first code sequence will be slower when (p[i] != x) and faster when
> > (p[i] == x).
> >
> >
> > Next, we should instrument the kernel to work out the frequency of
> > set_bit on an already-set bit.
> >
> > It is only with both these ratios that we can work out whether the
> > patch is a net gain.  My suspicion is that set_bit on an already-set
> > bit is so rare that the patch will be a loss.
> I see, let's change the test a little:
> 1)
>   memset(p, 0, SIZE);
>   if (p[i] != 0)
>   p[i] = 0;  // never called
> 
>   #sudo ./cache_test_check
>   6.698153838
>   reader:7.529402625
> 
> 
> 2)
>   memset(p, 0, SIZE);
>   if (p[i] == 0)
>   p[i] = 0; // always called
>   #sudo ./cache_test_check
>   reader:7.895421311
>   9.000889973
> 
> Thanks
> 
> 
I make a change in kernel to test hit/miss ratio:
---
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 80e4645..a82937b 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -15,6 +16,41 @@
 #include 
 #include "internal.h"
 
+atomic_t __set_bit_success_count = ATOMIC_INIT(0);
+atomic_t __set_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__set_bit_success_count);
+EXPORT_SYMBOL_GPL(__set_bit_miss_count);
+
+atomic_t __clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t __clear_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__clear_bit_success_count);
+EXPORT_SYMBOL_GPL(__clear_bit_miss_count);
+
+atomic_t __test_and_set_bit_success_count = ATOMIC_INIT(0);
+atomic_t __test_and_set_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__test_and_set_bit_success_count);
+EXPORT_SYMBOL_GPL(__test_and_set_bit_miss_count);
+
+atomic_t __test_and_clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t __test_and_clear_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__test_and_clear_bit_success_count);
+EXPORT_SYMBOL_GPL(__test_and_clear_bit_miss_count);
+
+/*
+ * atomic bitops
+ */
+atomic_t set_bit_success_count = ATOMIC_INIT(0);
+atomic_t set_bit_miss_count = ATOMIC_INIT(0);
+
+atomic_t clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t clear_bit_miss_count = ATOMIC_INIT(0);
+
+atomic_t test_and_set_bit_success_count = ATOMIC_INIT(0);
+atomic_t test_and_set_bit_miss_count = ATOMIC_INIT(0);
+
+atomic_t test_and_clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t test_and_clear_bit_miss_count = ATOMIC_INIT(0);
+
 void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
 {
 }
@@ -165,6 +201,18 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
   HPAGE_PMD_NR)
 #endif
);
+   seq_printf(m,   "__set_bit_miss_count:%d __set_bit_success_count:%d\n"
+   "__clear_bit_miss_count:%d 
__clear_bit_success_count:%d\n"
+   "__test_and_set_bit_miss_count:%d 
__test_and_set_bit_success_count:%d\n&q

RE: [RFC] change non-atomic bitops method

2015-02-03 Thread Wang, Yalin

 -Original Message-
 From: Wang, Yalin
 Sent: Tuesday, February 03, 2015 3:04 PM
 To: 'Andrew Morton'
 Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: RE: [RFC] change non-atomic bitops method
 
  -Original Message-
  From: Andrew Morton [mailto:a...@linux-foundation.org]
  Sent: Tuesday, February 03, 2015 2:39 PM
  To: Wang, Yalin
  Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
  'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
  ker...@lists.infradead.org'
  Subject: Re: [RFC] change non-atomic bitops method
 
  On Tue, 3 Feb 2015 13:42:45 +0800 Wang, Yalin
 yalin.w...@sonymobile.com
  wrote:
  
   ...
  
   #ifdef CHECK_BEFORE_SET
 if (p[i] != times)
   #endif
  
   ...
  
   
   One run on CPU0, reader thread run on CPU1,
   Test result:
   sudo ./cache_test
   reader:8.426228173
   8.672198335
  
   With -DCHECK_BEFORE_SET
   sudo ./cache_test_check
   reader:7.537036819
   10.799746531
  
 
  You aren't measuring the right thing.  You should compare
 
  if (p[i] != x)
  p[i] = x;
 
  versus
 
  p[i] = x;
 
  and you should do this for two cases:
 
  a) p[i] == x
 
  b) p[i] != x
 
 
  The first code sequence will be slower when (p[i] != x) and faster when
  (p[i] == x).
 
 
  Next, we should instrument the kernel to work out the frequency of
  set_bit on an already-set bit.
 
  It is only with both these ratios that we can work out whether the
  patch is a net gain.  My suspicion is that set_bit on an already-set
  bit is so rare that the patch will be a loss.
 I see, let's change the test a little:
 1)
   memset(p, 0, SIZE);
   if (p[i] != 0)
   p[i] = 0;  // never called
 
   #sudo ./cache_test_check
   6.698153838
   reader:7.529402625
 
 
 2)
   memset(p, 0, SIZE);
   if (p[i] == 0)
   p[i] = 0; // always called
   #sudo ./cache_test_check
   reader:7.895421311
   9.000889973
 
 Thanks
 
 
I make a change in kernel to test hit/miss ratio:
---
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 80e4645..a82937b 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -2,6 +2,7 @@
 #include linux/hugetlb.h
 #include linux/init.h
 #include linux/kernel.h
+#include linux/module.h
 #include linux/mm.h
 #include linux/mman.h
 #include linux/mmzone.h
@@ -15,6 +16,41 @@
 #include asm/pgtable.h
 #include internal.h
 
+atomic_t __set_bit_success_count = ATOMIC_INIT(0);
+atomic_t __set_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__set_bit_success_count);
+EXPORT_SYMBOL_GPL(__set_bit_miss_count);
+
+atomic_t __clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t __clear_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__clear_bit_success_count);
+EXPORT_SYMBOL_GPL(__clear_bit_miss_count);
+
+atomic_t __test_and_set_bit_success_count = ATOMIC_INIT(0);
+atomic_t __test_and_set_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__test_and_set_bit_success_count);
+EXPORT_SYMBOL_GPL(__test_and_set_bit_miss_count);
+
+atomic_t __test_and_clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t __test_and_clear_bit_miss_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL_GPL(__test_and_clear_bit_success_count);
+EXPORT_SYMBOL_GPL(__test_and_clear_bit_miss_count);
+
+/*
+ * atomic bitops
+ */
+atomic_t set_bit_success_count = ATOMIC_INIT(0);
+atomic_t set_bit_miss_count = ATOMIC_INIT(0);
+
+atomic_t clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t clear_bit_miss_count = ATOMIC_INIT(0);
+
+atomic_t test_and_set_bit_success_count = ATOMIC_INIT(0);
+atomic_t test_and_set_bit_miss_count = ATOMIC_INIT(0);
+
+atomic_t test_and_clear_bit_success_count = ATOMIC_INIT(0);
+atomic_t test_and_clear_bit_miss_count = ATOMIC_INIT(0);
+
 void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
 {
 }
@@ -165,6 +201,18 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
   HPAGE_PMD_NR)
 #endif
);
+   seq_printf(m,   __set_bit_miss_count:%d __set_bit_success_count:%d\n
+   __clear_bit_miss_count:%d 
__clear_bit_success_count:%d\n
+   __test_and_set_bit_miss_count:%d 
__test_and_set_bit_success_count:%d\n
+   __test_and_clear_bit_miss_count:%d 
__test_and_clear_bit_success_count:%d\n,
+   atomic_read(__set_bit_miss_count), 
atomic_read(__set_bit_success_count),
+   atomic_read(__clear_bit_miss_count), 
atomic_read(__clear_bit_success_count),
+
+   atomic_read(__test_and_set_bit_miss_count),
+   atomic_read(__test_and_set_bit_success_count),
+
+   atomic_read(__test_and_clear_bit_miss_count),
+   atomic_read(__test_and_clear_bit_success_count));
 
hugetlb_report_meminfo(m);
 
diff --git

RE: [RFC] change non-atomic bitops method

2015-02-03 Thread Wang, Yalin

 -Original Message-
 From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
 Sent: Tuesday, February 03, 2015 5:34 PM
 To: Andrew Morton
 Cc: Wang, Yalin; 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-
 a...@vger.kernel.org'; 'linux-kernel@vger.kernel.org';
 'li...@arm.linux.org.uk'; 'linux-arm-ker...@lists.infradead.org'
 Subject: Re: [RFC] change non-atomic bitops method

 On Tue, Feb 03 2015, Andrew Morton a...@linux-foundation.org wrote:

  You aren't measuring the right thing.  You should compare

  if (p[i] != x)
  p[i] = x;

  versus

  p[i] = x;

  and you should do this for two cases:

  a) p[i] == x

  b) p[i] != x

  The first code sequence will be slower when (p[i] != x) and faster when
  (p[i] == x).

  Next, we should instrument the kernel to work out the frequency of
  set_bit on an already-set bit.

  It is only with both these ratios that we can work out whether the
  patch is a net gain.  My suspicion is that set_bit on an already-set
  bit is so rare that the patch will be a loss.

 There's also the code-bloat issue to consider (instruction cache and all
 that); the conditional versions will usually require three extra
 instructions and an extra register. Also, the cache line might already
 be dirty because of something in the surrounding code. Instruction cache
 misses and larger stack footprint (from larger register pressure) won't
 show up in a microbenchmark, so I think this needs a real-world example
 to justify.

 But even if one finds some hot spot that would benefit from the
 conditional, that should simply be added explicitly there, instead of
 pessimizing every other user. (A good example of that is 358eec18243a
 (vfs: decrapify dput(), fix cache behavior under normal load)).

Oh, thank you, it is really a very nice example.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-02 Thread Wang, Yalin

> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Tuesday, February 03, 2015 2:39 PM
> To: Wang, Yalin
> Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
> 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC] change non-atomic bitops method
> 
> On Tue, 3 Feb 2015 13:42:45 +0800 "Wang, Yalin" 
> wrote:
> >
> > ...
> >
> > #ifdef CHECK_BEFORE_SET
> > if (p[i] != times)
> > #endif
> >
> > ...
> >
> > 
> > One run on CPU0, reader thread run on CPU1,
> > Test result:
> > sudo ./cache_test
> > reader:8.426228173
> > 8.672198335
> >
> > With -DCHECK_BEFORE_SET
> > sudo ./cache_test_check
> > reader:7.537036819
> > 10.799746531
> >
> 
> You aren't measuring the right thing.  You should compare
> 
>   if (p[i] != x)
>   p[i] = x;
> 
> versus
> 
>   p[i] = x;
> 
> and you should do this for two cases:
> 
> a) p[i] == x
> 
> b) p[i] != x
> 
> 
> The first code sequence will be slower when (p[i] != x) and faster when
> (p[i] == x).
> 
> 
> Next, we should instrument the kernel to work out the frequency of
> set_bit on an already-set bit.
> 
> It is only with both these ratios that we can work out whether the
> patch is a net gain.  My suspicion is that set_bit on an already-set
> bit is so rare that the patch will be a loss.
I see, let's change the test a little:
1)
memset(p, 0, SIZE);
if (p[i] != 0)
p[i] = 0;  // never called

#sudo ./cache_test_check
6.698153838
reader:7.529402625


2)
memset(p, 0, SIZE);
if (p[i] == 0)
p[i] = 0; // always called
#sudo ./cache_test_check
reader:7.895421311
9.000889973

Thanks




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-02 Thread Wang, Yalin

> -Original Message-
> From: Wang, Yalin
> Sent: Tuesday, February 03, 2015 10:13 AM
> To: 'Kirill A. Shutemov'; Andrew Morton
> Cc: 'a...@arndb.de'; 'linux-a...@vger.kernel.org'; 'linux-
> ker...@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: RE: [RFC] change non-atomic bitops method
> 
> > -Original Message-
> > From: Kirill A. Shutemov [mailto:kir...@shutemov.name]
> > Sent: Tuesday, February 03, 2015 9:18 AM
> > To: Andrew Morton
> > Cc: Wang, Yalin; 'a...@arndb.de'; 'linux-a...@vger.kernel.org'; 'linux-
> > ker...@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> > ker...@lists.infradead.org'
> > Subject: Re: [RFC] change non-atomic bitops method
> >
> > On Mon, Feb 02, 2015 at 03:29:09PM -0800, Andrew Morton wrote:
> > > On Mon, 2 Feb 2015 11:55:03 +0800 "Wang, Yalin"
> >  wrote:
> > >
> > > > This patch change non-atomic bitops,
> > > > add a if() condition to test it, before set/clear the bit.
> > > > so that we don't need dirty the cache line, if this bit
> > > > have been set or clear. On SMP system, dirty cache line will
> > > > need invalidate other processors cache line, this will have
> > > > some impact on SMP systems.
> > > >
> > > > --- a/include/asm-generic/bitops/non-atomic.h
> > > > +++ b/include/asm-generic/bitops/non-atomic.h
> > > > @@ -17,7 +17,9 @@ static inline void __set_bit(int nr, volatile
> > unsigned long *addr)
> > > > unsigned long mask = BIT_MASK(nr);
> > > > unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> > > >
> > > > -   *p  |= mask;
> > > > +   if ((*p & mask) == 0)
> > > > +   *p  |= mask;
> > > > +
> > > >  }
> > >
> > > hm, maybe.
> > >
> > > It will speed up set_bit on an already-set bit.  But it will slow down
> > > set_bit on a not-set bit.  And the latter case is presumably much, much
> > > more common.
> > >
> > > How do we know the patch is a net performance gain?
> >
> > Let's try to measure. The micro benchmark:
> >
> > #include 
> > #include 
> > #include 
> >
> > #ifdef CACHE_HOT
> > #define SIZE (2UL << 20)
> > #define TIMES 1000
> > #else
> > #define SIZE (1UL << 30)
> > #define TIMES 1
> > #endif
> >
> > int main(int argc, char **argv)
> > {
> > struct timespec a, b, diff;
> > unsigned long i, *p, times = TIMES;
> >
> > p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> > MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1,
> > 0);
> >
> > clock_gettime(CLOCK_MONOTONIC, );
> > while (times--) {
> > for (i = 0; i < SIZE/64/sizeof(*p); i++) {
> > #ifdef CHECK_BEFORE_SET
> > if (p[i] != times)
> > #endif
> > p[i] = times;
> > }
> > }
> > clock_gettime(CLOCK_MONOTONIC, );
> >
> > diff.tv_sec = b.tv_sec - a.tv_sec;
> > if (a.tv_nsec > b.tv_nsec) {
> > diff.tv_sec--;
> > diff.tv_nsec = 10 + b.tv_nsec - a.tv_nsec;
> > } else
> > diff.tv_nsec = b.tv_nsec - a.tv_nsec;
> >
> > printf("%lu.%09lu\n", diff.tv_sec, diff.tv_nsec);
> > return 0;
> > }
> >
> > Results for 10 runs on my laptop -- i5-3427U (IvyBridge 1.8 Ghz, 2.8Ghz
> > Turbo
> > with 3MB LLC):
> >
> > Avg Stddev
> > baseline21.5351 0.5315
> > -DCHECK_BEFORE_SET  21.9834 0.0789
> > -DCACHE_HOT 14.9987 0.0365
> > -DCACHE_HOT -DCHECK_BEFORE_SET  29.9010 0.0204
> >
> > Difference between -DCACHE_HOT and -DCACHE_HOT -DCHECK_BEFORE_SET appears
> > huge, but if you recalculate it to CPU cycles per inner loop @ 2.8 Ghz,
> > it's 1.02530 and 2.04401 CPU cycles respectively.
> >
> > Basically, the check is free on decent CPU.
> >
> Awesome test, but you only test the one cpu which running this code,
> Have not consider the other CPUs, whose cache line wi

RE: [RFC] change non-atomic bitops method

2015-02-02 Thread Wang, Yalin

> -Original Message-
> From: Kirill A. Shutemov [mailto:kir...@shutemov.name]
> Sent: Tuesday, February 03, 2015 9:18 AM
> To: Andrew Morton
> Cc: Wang, Yalin; 'a...@arndb.de'; 'linux-a...@vger.kernel.org'; 'linux-
> ker...@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC] change non-atomic bitops method
> 
> On Mon, Feb 02, 2015 at 03:29:09PM -0800, Andrew Morton wrote:
> > On Mon, 2 Feb 2015 11:55:03 +0800 "Wang, Yalin"
>  wrote:
> >
> > > This patch change non-atomic bitops,
> > > add a if() condition to test it, before set/clear the bit.
> > > so that we don't need dirty the cache line, if this bit
> > > have been set or clear. On SMP system, dirty cache line will
> > > need invalidate other processors cache line, this will have
> > > some impact on SMP systems.
> > >
> > > --- a/include/asm-generic/bitops/non-atomic.h
> > > +++ b/include/asm-generic/bitops/non-atomic.h
> > > @@ -17,7 +17,9 @@ static inline void __set_bit(int nr, volatile
> unsigned long *addr)
> > >   unsigned long mask = BIT_MASK(nr);
> > >   unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> > >
> > > - *p  |= mask;
> > > + if ((*p & mask) == 0)
> > > + *p  |= mask;
> > > +
> > >  }
> >
> > hm, maybe.
> >
> > It will speed up set_bit on an already-set bit.  But it will slow down
> > set_bit on a not-set bit.  And the latter case is presumably much, much
> > more common.
> >
> > How do we know the patch is a net performance gain?
> 
> Let's try to measure. The micro benchmark:
> 
>   #include 
>   #include 
>   #include 
> 
>   #ifdef CACHE_HOT
>   #define SIZE (2UL << 20)
>   #define TIMES 1000
>   #else
>   #define SIZE (1UL << 30)
>   #define TIMES 1
>   #endif
> 
>   int main(int argc, char **argv)
>   {
>   struct timespec a, b, diff;
>   unsigned long i, *p, times = TIMES;
> 
>   p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
>   MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1,
> 0);
> 
>   clock_gettime(CLOCK_MONOTONIC, );
>   while (times--) {
>   for (i = 0; i < SIZE/64/sizeof(*p); i++) {
>   #ifdef CHECK_BEFORE_SET
>   if (p[i] != times)
>   #endif
>   p[i] = times;
>   }
>   }
>   clock_gettime(CLOCK_MONOTONIC, );
> 
>   diff.tv_sec = b.tv_sec - a.tv_sec;
>   if (a.tv_nsec > b.tv_nsec) {
>   diff.tv_sec--;
>   diff.tv_nsec = 10 + b.tv_nsec - a.tv_nsec;
>   } else
>   diff.tv_nsec = b.tv_nsec - a.tv_nsec;
> 
>   printf("%lu.%09lu\n", diff.tv_sec, diff.tv_nsec);
>   return 0;
>   }
> 
> Results for 10 runs on my laptop -- i5-3427U (IvyBridge 1.8 Ghz, 2.8Ghz
> Turbo
> with 3MB LLC):
> 
>   Avg Stddev
> baseline  21.5351 0.5315
> -DCHECK_BEFORE_SET21.9834 0.0789
> -DCACHE_HOT   14.9987 0.0365
> -DCACHE_HOT -DCHECK_BEFORE_SET29.9010 0.0204
> 
> Difference between -DCACHE_HOT and -DCACHE_HOT -DCHECK_BEFORE_SET appears
> huge, but if you recalculate it to CPU cycles per inner loop @ 2.8 Ghz,
> it's 1.02530 and 2.04401 CPU cycles respectively.
> 
> Basically, the check is free on decent CPU.
> 
Awesome test, but you only test the one cpu which running this code,
Have not consider the other CPUs, whose cache line will be invalidate if
The cache is dirtied by writer CPU,
So another test should be running 2 thread on two different CPUs(bind to CPU),
One write , one read, to see the impact on the reader CPU.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-02 Thread Wang, Yalin

 -Original Message-
 From: Kirill A. Shutemov [mailto:kir...@shutemov.name]
 Sent: Tuesday, February 03, 2015 9:18 AM
 To: Andrew Morton
 Cc: Wang, Yalin; 'a...@arndb.de'; 'linux-a...@vger.kernel.org'; 'linux-
 ker...@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC] change non-atomic bitops method

 On Mon, Feb 02, 2015 at 03:29:09PM -0800, Andrew Morton wrote:
  On Mon, 2 Feb 2015 11:55:03 +0800 Wang, Yalin
 yalin.w...@sonymobile.com wrote:

   This patch change non-atomic bitops,
   add a if() condition to test it, before set/clear the bit.
   so that we don't need dirty the cache line, if this bit
   have been set or clear. On SMP system, dirty cache line will
   need invalidate other processors cache line, this will have
   some impact on SMP systems.

   --- a/include/asm-generic/bitops/non-atomic.h
   +++ b/include/asm-generic/bitops/non-atomic.h
   @@ -17,7 +17,9 @@ static inline void __set_bit(int nr, volatile
 unsigned long *addr)
 unsigned long mask = BIT_MASK(nr);
 unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);

   - *p  |= mask;
   + if ((*p  mask) == 0)
   + *p  |= mask;
   +
}

  hm, maybe.

  It will speed up set_bit on an already-set bit.  But it will slow down
  set_bit on a not-set bit.  And the latter case is presumably much, much
  more common.

  How do we know the patch is a net performance gain?

 Let's try to measure. The micro benchmark:

   #include stdio.h
   #include time.h
   #include sys/mman.h

   #ifdef CACHE_HOT
   #define SIZE (2UL  20)
   #define TIMES 1000
   #else
   #define SIZE (1UL  30)
   #define TIMES 1
   #endif

   int main(int argc, char **argv)
   {
   struct timespec a, b, diff;
   unsigned long i, *p, times = TIMES;

   p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
   MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1,
 0);

   clock_gettime(CLOCK_MONOTONIC, a);
   while (times--) {
   for (i = 0; i  SIZE/64/sizeof(*p); i++) {
   #ifdef CHECK_BEFORE_SET
   if (p[i] != times)
   #endif
   p[i] = times;
   }
   }
   clock_gettime(CLOCK_MONOTONIC, b);

   diff.tv_sec = b.tv_sec - a.tv_sec;
   if (a.tv_nsec  b.tv_nsec) {
   diff.tv_sec--;
   diff.tv_nsec = 10 + b.tv_nsec - a.tv_nsec;
   } else
   diff.tv_nsec = b.tv_nsec - a.tv_nsec;

   printf(%lu.%09lu\n, diff.tv_sec, diff.tv_nsec);
   return 0;
   }

 Results for 10 runs on my laptop -- i5-3427U (IvyBridge 1.8 Ghz, 2.8Ghz
 Turbo
 with 3MB LLC):

   Avg Stddev
 baseline  21.5351 0.5315
 -DCHECK_BEFORE_SET21.9834 0.0789
 -DCACHE_HOT   14.9987 0.0365
 -DCACHE_HOT -DCHECK_BEFORE_SET29.9010 0.0204

 Difference between -DCACHE_HOT and -DCACHE_HOT -DCHECK_BEFORE_SET appears
 huge, but if you recalculate it to CPU cycles per inner loop @ 2.8 Ghz,
 it's 1.02530 and 2.04401 CPU cycles respectively.

 Basically, the check is free on decent CPU.

Awesome test, but you only test the one cpu which running this code,
Have not consider the other CPUs, whose cache line will be invalidate if
The cache is dirtied by writer CPU,
So another test should be running 2 thread on two different CPUs(bind to CPU),
One write , one read, to see the impact on the reader CPU.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] change non-atomic bitops method

2015-02-02 Thread Wang, Yalin

 -Original Message-
 From: Wang, Yalin
 Sent: Tuesday, February 03, 2015 10:13 AM
 To: 'Kirill A. Shutemov'; Andrew Morton
 Cc: 'a...@arndb.de'; 'linux-a...@vger.kernel.org'; 'linux-
 ker...@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: RE: [RFC] change non-atomic bitops method
 
  -Original Message-
  From: Kirill A. Shutemov [mailto:kir...@shutemov.name]
  Sent: Tuesday, February 03, 2015 9:18 AM
  To: Andrew Morton
  Cc: Wang, Yalin; 'a...@arndb.de'; 'linux-a...@vger.kernel.org'; 'linux-
  ker...@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
  ker...@lists.infradead.org'
  Subject: Re: [RFC] change non-atomic bitops method
 
  On Mon, Feb 02, 2015 at 03:29:09PM -0800, Andrew Morton wrote:
   On Mon, 2 Feb 2015 11:55:03 +0800 Wang, Yalin
  yalin.w...@sonymobile.com wrote:
  
This patch change non-atomic bitops,
add a if() condition to test it, before set/clear the bit.
so that we don't need dirty the cache line, if this bit
have been set or clear. On SMP system, dirty cache line will
need invalidate other processors cache line, this will have
some impact on SMP systems.
   
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -17,7 +17,9 @@ static inline void __set_bit(int nr, volatile
  unsigned long *addr)
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
   
-   *p  |= mask;
+   if ((*p  mask) == 0)
+   *p  |= mask;
+
 }
  
   hm, maybe.
  
   It will speed up set_bit on an already-set bit.  But it will slow down
   set_bit on a not-set bit.  And the latter case is presumably much, much
   more common.
  
   How do we know the patch is a net performance gain?
 
  Let's try to measure. The micro benchmark:
 
  #include stdio.h
  #include time.h
  #include sys/mman.h
 
  #ifdef CACHE_HOT
  #define SIZE (2UL  20)
  #define TIMES 1000
  #else
  #define SIZE (1UL  30)
  #define TIMES 1
  #endif
 
  int main(int argc, char **argv)
  {
  struct timespec a, b, diff;
  unsigned long i, *p, times = TIMES;
 
  p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
  MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1,
  0);
 
  clock_gettime(CLOCK_MONOTONIC, a);
  while (times--) {
  for (i = 0; i  SIZE/64/sizeof(*p); i++) {
  #ifdef CHECK_BEFORE_SET
  if (p[i] != times)
  #endif
  p[i] = times;
  }
  }
  clock_gettime(CLOCK_MONOTONIC, b);
 
  diff.tv_sec = b.tv_sec - a.tv_sec;
  if (a.tv_nsec  b.tv_nsec) {
  diff.tv_sec--;
  diff.tv_nsec = 10 + b.tv_nsec - a.tv_nsec;
  } else
  diff.tv_nsec = b.tv_nsec - a.tv_nsec;
 
  printf(%lu.%09lu\n, diff.tv_sec, diff.tv_nsec);
  return 0;
  }
 
  Results for 10 runs on my laptop -- i5-3427U (IvyBridge 1.8 Ghz, 2.8Ghz
  Turbo
  with 3MB LLC):
 
  Avg Stddev
  baseline21.5351 0.5315
  -DCHECK_BEFORE_SET  21.9834 0.0789
  -DCACHE_HOT 14.9987 0.0365
  -DCACHE_HOT -DCHECK_BEFORE_SET  29.9010 0.0204
 
  Difference between -DCACHE_HOT and -DCACHE_HOT -DCHECK_BEFORE_SET appears
  huge, but if you recalculate it to CPU cycles per inner loop @ 2.8 Ghz,
  it's 1.02530 and 2.04401 CPU cycles respectively.
 
  Basically, the check is free on decent CPU.
 
 Awesome test, but you only test the one cpu which running this code,
 Have not consider the other CPUs, whose cache line will be invalidate if
 The cache is dirtied by writer CPU,
 So another test should be running 2 thread on two different CPUs(bind to
 CPU),
 One write , one read, to see the impact on the reader CPU.
I make a little change about your test progrom,
Add a new thread to test SMP cache impact.
---
#include stdio.h
#include time.h
#include sys/mman.h
#include errno.h
#define _GNU_SOURCE
#define __USE_GNU
#include sched.h
#include pthread.h

#ifdef CACHE_HOT
#define SIZE (2UL  20)
#define TIMES 10
#else
#define SIZE (1UL  20)
#define TIMES 1
#endif
static void *reader_thread(void *arg)
{

struct timespec a, b, diff;
unsigned long *p = arg;
volatile unsigned long temp;
unsigned long i, ret, times = TIMES;
cpu_set_t set;
CPU_ZERO(set);
CPU_SET(1, set);
ret = sched_setaffinity(-1, sizeof(cpu_set_t), set);
if (ret  0) {
printf(sched_setaffinity error:%s, strerror(errno));
}
clock_gettime(CLOCK_MONOTONIC, a);
while (times

RE: [RFC] change non-atomic bitops method

2015-02-02 Thread Wang, Yalin

 -Original Message-
 From: Andrew Morton [mailto:a...@linux-foundation.org]
 Sent: Tuesday, February 03, 2015 2:39 PM
 To: Wang, Yalin
 Cc: 'Kirill A. Shutemov'; 'a...@arndb.de'; 'linux-a...@vger.kernel.org';
 'linux-kernel@vger.kernel.org'; 'li...@arm.linux.org.uk'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC] change non-atomic bitops method
 
 On Tue, 3 Feb 2015 13:42:45 +0800 Wang, Yalin yalin.w...@sonymobile.com
 wrote:
 
  ...
 
  #ifdef CHECK_BEFORE_SET
  if (p[i] != times)
  #endif
 
  ...
 
  
  One run on CPU0, reader thread run on CPU1,
  Test result:
  sudo ./cache_test
  reader:8.426228173
  8.672198335
 
  With -DCHECK_BEFORE_SET
  sudo ./cache_test_check
  reader:7.537036819
  10.799746531
 
 
 You aren't measuring the right thing.  You should compare
 
   if (p[i] != x)
   p[i] = x;
 
 versus
 
   p[i] = x;
 
 and you should do this for two cases:
 
 a) p[i] == x
 
 b) p[i] != x
 
 
 The first code sequence will be slower when (p[i] != x) and faster when
 (p[i] == x).
 
 
 Next, we should instrument the kernel to work out the frequency of
 set_bit on an already-set bit.
 
 It is only with both these ratios that we can work out whether the
 patch is a net gain.  My suspicion is that set_bit on an already-set
 bit is so rare that the patch will be a loss.
I see, let's change the test a little:
1)
memset(p, 0, SIZE);
if (p[i] != 0)
p[i] = 0;  // never called

#sudo ./cache_test_check
6.698153838
reader:7.529402625


2)
memset(p, 0, SIZE);
if (p[i] == 0)
p[i] = 0; // always called
#sudo ./cache_test_check
reader:7.895421311
9.000889973

Thanks




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] change non-atomic bitops method

2015-02-01 Thread Wang, Yalin

This patch change non-atomic bitops,
add a if() condition to test it, before set/clear the bit.
so that we don't need dirty the cache line, if this bit
have been set or clear. On SMP system, dirty cache line will
need invalidate other processors cache line, this will have
some impact on SMP systems.

Signed-off-by: Yalin Wang 
---
 include/asm-generic/bitops/non-atomic.h | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/bitops/non-atomic.h 
b/include/asm-generic/bitops/non-atomic.h
index 697cc2b..e4ef18a 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -17,7 +17,9 @@ static inline void __set_bit(int nr, volatile unsigned long 
*addr)
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
 
-   *p  |= mask;
+   if ((*p & mask) == 0)
+   *p  |= mask;
+
 }
 
 static inline void __clear_bit(int nr, volatile unsigned long *addr)
@@ -25,7 +27,8 @@ static inline void __clear_bit(int nr, volatile unsigned long 
*addr)
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
 
-   *p &= ~mask;
+   if ((*p & mask) != 0)
+   *p &= ~mask;
 }
 
 /**
@@ -60,7 +63,8 @@ static inline int __test_and_set_bit(int nr, volatile 
unsigned long *addr)
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
unsigned long old = *p;
 
-   *p = old | mask;
+   if ((old & mask) == 0)
+   *p = old | mask;
return (old & mask) != 0;
 }
 
@@ -79,7 +83,8 @@ static inline int __test_and_clear_bit(int nr, volatile 
unsigned long *addr)
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
unsigned long old = *p;
 
-   *p = old & ~mask;
+   if ((old & mask) != 0)
+   *p = old & ~mask;
return (old & mask) != 0;
 }
 
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] change non-atomic bitops method

2015-02-01 Thread Wang, Yalin

This patch change non-atomic bitops,
add a if() condition to test it, before set/clear the bit.
so that we don't need dirty the cache line, if this bit
have been set or clear. On SMP system, dirty cache line will
need invalidate other processors cache line, this will have
some impact on SMP systems.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 include/asm-generic/bitops/non-atomic.h | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/bitops/non-atomic.h 
b/include/asm-generic/bitops/non-atomic.h
index 697cc2b..e4ef18a 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -17,7 +17,9 @@ static inline void __set_bit(int nr, volatile unsigned long 
*addr)
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
 
-   *p  |= mask;
+   if ((*p  mask) == 0)
+   *p  |= mask;
+
 }
 
 static inline void __clear_bit(int nr, volatile unsigned long *addr)
@@ -25,7 +27,8 @@ static inline void __clear_bit(int nr, volatile unsigned long 
*addr)
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
 
-   *p = ~mask;
+   if ((*p  mask) != 0)
+   *p = ~mask;
 }
 
 /**
@@ -60,7 +63,8 @@ static inline int __test_and_set_bit(int nr, volatile 
unsigned long *addr)
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
unsigned long old = *p;
 
-   *p = old | mask;
+   if ((old  mask) == 0)
+   *p = old | mask;
return (old  mask) != 0;
 }
 
@@ -79,7 +83,8 @@ static inline int __test_and_clear_bit(int nr, volatile 
unsigned long *addr)
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
unsigned long old = *p;
 
-   *p = old  ~mask;
+   if ((old  mask) != 0)
+   *p = old  ~mask;
return (old  mask) != 0;
 }
 
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V2] mm:change smaps/pagemap_read calculation behavior

2015-01-30 Thread Wang, Yalin

> -Original Message-
> From: Naoya Horiguchi [mailto:n-horigu...@ah.jp.nec.com]
> Sent: Friday, January 30, 2015 4:24 PM
> To: Wang, Yalin
> Cc: 'a...@linux-foundation.org'; 'kirill.shute...@linux.intel.com';
> 'o...@redhat.com'; 'gorcu...@openvz.org'; 'pfei...@google.com';
> 'aqu...@redhat.com'; 'linux-kernel@vger.kernel.org'
> Subject: Re: [RFC V2] mm:change smaps/pagemap_read calculation behavior
> 
> On Fri, Jan 30, 2015 at 03:47:54PM +0800, Wang, Yalin wrote:
> > This patch change smaps/pagemap_read pagetable walk behavior, to make
> > sure not skip VM_PFNMAP pagetables,
> > so that we can calculate COW pages of VM_PFNMAP as normal pages.
> >
> > Signed-off-by: Yalin Wang 
> 
> Hi Yalin,
> 
> The original motivation of the VM_PFNMAP code in pagewalk.c comes from the
> following patch:
> 
>   commit a9ff785e4437c83d2179161e012f5bdfbd6381f0
>   Author: Cliff Wickman 
>   Date:   Fri May 24 15:55:36 2013 -0700
> 
>   mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
> 
> , where Cliff stated that some kind of vma(VM_PFNMAP) caused kernel panic
> when walk_page_range() was called over it. So I don't think that re-
> enabling
> to walk over every vma(VM_PFNMAP) unexceptionally is a good idea.
> 
> If you really want to get some information from a vma(VM_PFNMAP) via these
> interfaces, I recommend you to implement proper judging code which returns
> 0 for your vma(VM_PFNMAP) and returns 1 for Cliff's vma(VM_PFNMAP).
> 
I see, but I am curious that why kernel panic when I just
access process pagetables in page_table_walk()?
Is it caused by hardware problem?

The reason that I want to enable it is to see some drivers map some COW pages 
With VM_PFNMAP, so that user space can get correct page allocation info for 
These COW special pages with VM_PFNMAP flag.

Thanks 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V2] mm:change smaps/pagemap_read calculation behavior

2015-01-30 Thread Wang, Yalin

 -Original Message-
 From: Naoya Horiguchi [mailto:n-horigu...@ah.jp.nec.com]
 Sent: Friday, January 30, 2015 4:24 PM
 To: Wang, Yalin
 Cc: 'a...@linux-foundation.org'; 'kirill.shute...@linux.intel.com';
 'o...@redhat.com'; 'gorcu...@openvz.org'; 'pfei...@google.com';
 'aqu...@redhat.com'; 'linux-kernel@vger.kernel.org'
 Subject: Re: [RFC V2] mm:change smaps/pagemap_read calculation behavior

 On Fri, Jan 30, 2015 at 03:47:54PM +0800, Wang, Yalin wrote:
  This patch change smaps/pagemap_read pagetable walk behavior, to make
  sure not skip VM_PFNMAP pagetables,
  so that we can calculate COW pages of VM_PFNMAP as normal pages.

  Signed-off-by: Yalin Wang yalin.w...@sonymobile.com

 Hi Yalin,

 The original motivation of the VM_PFNMAP code in pagewalk.c comes from the
 following patch:

   commit a9ff785e4437c83d2179161e012f5bdfbd6381f0
   Author: Cliff Wickman c...@sgi.com
   Date:   Fri May 24 15:55:36 2013 -0700

   mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas

 , where Cliff stated that some kind of vma(VM_PFNMAP) caused kernel panic
 when walk_page_range() was called over it. So I don't think that re-
 enabling
 to walk over every vma(VM_PFNMAP) unexceptionally is a good idea.

 If you really want to get some information from a vma(VM_PFNMAP) via these
 interfaces, I recommend you to implement proper judging code which returns
 0 for your vma(VM_PFNMAP) and returns 1 for Cliff's vma(VM_PFNMAP).

I see, but I am curious that why kernel panic when I just
access process pagetables in page_table_walk()?
Is it caused by hardware problem?

The reason that I want to enable it is to see some drivers map some COW pages 
With VM_PFNMAP, so that user space can get correct page allocation info for 
These COW special pages with VM_PFNMAP flag.

Thanks 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] mm:change smaps/pagemap_read calculation behavior

2015-01-29 Thread Wang, Yalin

This patch change smaps/pagemap_read pagetable walk behavior, to make
sure not skip VM_PFNMAP pagetables,
so that we can calculate COW pages of VM_PFNMAP as normal pages.

Signed-off-by: Yalin Wang 
---
 fs/proc/task_mmu.c | 2 ++
 include/linux/mm.h | 2 ++
 mm/pagewalk.c  | 5 +
 3 files changed, 9 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c7267e9..e7d7c43 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -616,6 +616,7 @@ static int show_smap(struct seq_file *m, void *v, int 
is_pid)
struct mem_size_stats mss;
struct mm_walk smaps_walk = {
.pmd_entry = smaps_pte_range,
+   .test_walk = generic_walk_page_test_no_skip,
.mm = vma->vm_mm,
.private = ,
};
@@ -1264,6 +1265,7 @@ static ssize_t pagemap_read(struct file *file, char 
__user *buf,
 
pagemap_walk.pmd_entry = pagemap_pte_range;
pagemap_walk.pte_hole = pagemap_pte_hole;
+   pagemap_walk.test_walk = generic_walk_page_test_no_skip;
 #ifdef CONFIG_HUGETLB_PAGE
pagemap_walk.hugetlb_entry = pagemap_hugetlb_range;
 #endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b976d9f..07f71c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1191,6 +1191,8 @@ struct mm_walk {
void *private;
 };
 
+int generic_walk_page_test_no_skip(unsigned long start, unsigned long end,
+   struct mm_walk *walk);
 int walk_page_range(unsigned long addr, unsigned long end,
struct mm_walk *walk);
 int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 75c1f28..14f38d5 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -206,6 +206,11 @@ static int __walk_page_range(unsigned long start, unsigned 
long end,
return err;
 }
 
+int generic_walk_page_test_no_skip(unsigned long start, unsigned long end,
+   struct mm_walk *walk)
+{
+   return 0;
+}
 /**
  * walk_page_range - walk page table with caller specific callbacks
  *
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mm:change /proc/smaps caculation behavior

2015-01-29 Thread Wang, Yalin

This patch change smaps pagetable walk behavior, to make
sure not skip VM_PFNMAP pagetables,
so that we can calculate COW pages of VM_PFNMAP as normal pages.

Signed-off-by: Yalin Wang 
---
 fs/proc/task_mmu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c7267e9..00a5b73 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -503,6 +503,15 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
smaps_account(mss, page, PAGE_SIZE, pte_young(*pte), pte_dirty(*pte));
 }
 
+static int smaps_test_walk(unsigned long addr, unsigned long next,
+   struct mm_walk *walk)
+{
+   /*
+* don't skip VM_PFNMAP, so that we can caculate some COW pages.
+*/
+   return 0;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
struct mm_walk *walk)
@@ -616,6 +625,7 @@ static int show_smap(struct seq_file *m, void *v, int 
is_pid)
struct mem_size_stats mss;
struct mm_walk smaps_walk = {
.pmd_entry = smaps_pte_range,
+   .test_walk = smaps_test_walk,
.mm = vma->vm_mm,
.private = ,
};
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mm:change /proc/smaps caculation behavior

2015-01-29 Thread Wang, Yalin

This patch change smaps pagetable walk behavior, to make
sure not skip VM_PFNMAP pagetables,
so that we can calculate COW pages of VM_PFNMAP as normal pages.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/proc/task_mmu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c7267e9..00a5b73 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -503,6 +503,15 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
smaps_account(mss, page, PAGE_SIZE, pte_young(*pte), pte_dirty(*pte));
 }
 
+static int smaps_test_walk(unsigned long addr, unsigned long next,
+   struct mm_walk *walk)
+{
+   /*
+* don't skip VM_PFNMAP, so that we can caculate some COW pages.
+*/
+   return 0;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
struct mm_walk *walk)
@@ -616,6 +625,7 @@ static int show_smap(struct seq_file *m, void *v, int 
is_pid)
struct mem_size_stats mss;
struct mm_walk smaps_walk = {
.pmd_entry = smaps_pte_range,
+   .test_walk = smaps_test_walk,
.mm = vma-vm_mm,
.private = mss,
};
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] mm:change smaps/pagemap_read calculation behavior

2015-01-29 Thread Wang, Yalin

This patch change smaps/pagemap_read pagetable walk behavior, to make
sure not skip VM_PFNMAP pagetables,
so that we can calculate COW pages of VM_PFNMAP as normal pages.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/proc/task_mmu.c | 2 ++
 include/linux/mm.h | 2 ++
 mm/pagewalk.c  | 5 +
 3 files changed, 9 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c7267e9..e7d7c43 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -616,6 +616,7 @@ static int show_smap(struct seq_file *m, void *v, int 
is_pid)
struct mem_size_stats mss;
struct mm_walk smaps_walk = {
.pmd_entry = smaps_pte_range,
+   .test_walk = generic_walk_page_test_no_skip,
.mm = vma-vm_mm,
.private = mss,
};
@@ -1264,6 +1265,7 @@ static ssize_t pagemap_read(struct file *file, char 
__user *buf,
 
pagemap_walk.pmd_entry = pagemap_pte_range;
pagemap_walk.pte_hole = pagemap_pte_hole;
+   pagemap_walk.test_walk = generic_walk_page_test_no_skip;
 #ifdef CONFIG_HUGETLB_PAGE
pagemap_walk.hugetlb_entry = pagemap_hugetlb_range;
 #endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b976d9f..07f71c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1191,6 +1191,8 @@ struct mm_walk {
void *private;
 };
 
+int generic_walk_page_test_no_skip(unsigned long start, unsigned long end,
+   struct mm_walk *walk);
 int walk_page_range(unsigned long addr, unsigned long end,
struct mm_walk *walk);
 int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 75c1f28..14f38d5 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -206,6 +206,11 @@ static int __walk_page_range(unsigned long start, unsigned 
long end,
return err;
 }
 
+int generic_walk_page_test_no_skip(unsigned long start, unsigned long end,
+   struct mm_walk *walk)
+{
+   return 0;
+}
 /**
  * walk_page_range - walk page table with caller specific callbacks
  *
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] ion:change ion_cma_allocate return error value

2015-01-28 Thread Wang, Yalin

> -Original Message-
> From: 'gre...@linuxfoundation.org' [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, January 29, 2015 2:45 AM
> To: Wang, Yalin
> Cc: 'tranmanph...@gmail.com'; 'fabio.este...@freescale.com';
> 'prime.z...@huawei.com'; 'de...@driverdev.osuosl.org'; 'linux-
> ker...@vger.kernel.org'; Gao, Neil
> Subject: Re: [RFC] ion:change ion_cma_allocate return error value
> 
> On Tue, Jan 27, 2015 at 02:04:21PM +0800, Wang, Yalin wrote:
> > This patch change the error return value from -1 to -ENOMEM, so that
> > userspace can get the correct errno, otherwise,
> > -1 will be -EPERM, userspace will print permission deny for allocation
> > failure.
> >
> > Signed-off-by: Yalin Wang 
> > ---
> >  drivers/staging/android/ion/ion_cma_heap.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> Don't you also need to now change userspace code to properly handle these
> errors?
> 
No, because userspace use strerror() to print errno,
It will print correct error string depending on kernel return
Errno.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] ion:change ion_cma_allocate return error value

2015-01-28 Thread Wang, Yalin

 -Original Message-
 From: 'gre...@linuxfoundation.org' [mailto:gre...@linuxfoundation.org]
 Sent: Thursday, January 29, 2015 2:45 AM
 To: Wang, Yalin
 Cc: 'tranmanph...@gmail.com'; 'fabio.este...@freescale.com';
 'prime.z...@huawei.com'; 'de...@driverdev.osuosl.org'; 'linux-
 ker...@vger.kernel.org'; Gao, Neil
 Subject: Re: [RFC] ion:change ion_cma_allocate return error value
 
 On Tue, Jan 27, 2015 at 02:04:21PM +0800, Wang, Yalin wrote:
  This patch change the error return value from -1 to -ENOMEM, so that
  userspace can get the correct errno, otherwise,
  -1 will be -EPERM, userspace will print permission deny for allocation
  failure.
 
  Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
  ---
   drivers/staging/android/ion/ion_cma_heap.c | 6 ++
   1 file changed, 2 insertions(+), 4 deletions(-)
 
 Don't you also need to now change userspace code to properly handle these
 errors?
 
No, because userspace use strerror() to print errno,
It will print correct error string depending on kernel return
Errno.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] xfs:change kmem_free to use generic kvfree()

2015-01-27 Thread Wang, Yalin

Change kmem_free to use kvfree() generic function,
remove the duplicated code.

Signed-off-by: Yalin Wang 
---
 fs/xfs/kmem.c | 10 --
 fs/xfs/kmem.h |  5 -
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 53e95b2..a7a3a63 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -91,16 +91,6 @@ kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
return ptr;
 }
 
-void
-kmem_free(const void *ptr)
-{
-   if (!is_vmalloc_addr(ptr)) {
-   kfree(ptr);
-   } else {
-   vfree(ptr);
-   }
-}
-
 void *
 kmem_realloc(const void *ptr, size_t newsize, size_t oldsize,
 xfs_km_flags_t flags)
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 64db0e5..cc6b768 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -63,7 +63,10 @@ kmem_flags_convert(xfs_km_flags_t flags)
 extern void *kmem_alloc(size_t, xfs_km_flags_t);
 extern void *kmem_zalloc_large(size_t size, xfs_km_flags_t);
 extern void *kmem_realloc(const void *, size_t, size_t, xfs_km_flags_t);
-extern void  kmem_free(const void *);
+static inline void  kmem_free(const void *ptr)
+{
+   kvfree(ptr);
+}
 
 
 extern void *kmem_zalloc_greedy(size_t *, size_t, size_t);
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] agp:change agp_free_page_array to use kvfree

2015-01-27 Thread Wang, Yalin

Change agp_free_page_array to use kvfree function,
remove the duplicated code.

Signed-off-by: Yalin Wang 
---
 drivers/char/agp/agp.h |  5 -
 drivers/char/agp/generic.c | 11 ---
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index b709749..4eb1c77 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -219,7 +219,10 @@ struct agp_bridge_data *agp_generic_find_bridge(struct 
pci_dev *pdev);
 /* generic functions for user-populated AGP memory types */
 struct agp_memory *agp_generic_alloc_user(size_t page_count, int type);
 void agp_alloc_page_array(size_t size, struct agp_memory *mem);
-void agp_free_page_array(struct agp_memory *mem);
+static inline void agp_free_page_array(struct agp_memory *mem)
+{
+   kvfree(mem->pages);
+}
 
 
 /* generic routines for agp>=3 */
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 0fbccce..f002fa5 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -98,17 +98,6 @@ void agp_alloc_page_array(size_t size, struct agp_memory 
*mem)
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
-void agp_free_page_array(struct agp_memory *mem)
-{
-   if (is_vmalloc_addr(mem->pages)) {
-   vfree(mem->pages);
-   } else {
-   kfree(mem->pages);
-   }
-}
-EXPORT_SYMBOL(agp_free_page_array);
-
-
 static struct agp_memory *agp_create_user_memory(unsigned long num_agp_pages)
 {
struct agp_memory *new;
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] xfs:change kmem_free to use generic kvfree()

2015-01-27 Thread Wang, Yalin

Change kmem_free to use kvfree() generic function,
remove the duplicated code.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/xfs/kmem.c | 10 --
 fs/xfs/kmem.h |  5 -
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 53e95b2..a7a3a63 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -91,16 +91,6 @@ kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
return ptr;
 }
 
-void
-kmem_free(const void *ptr)
-{
-   if (!is_vmalloc_addr(ptr)) {
-   kfree(ptr);
-   } else {
-   vfree(ptr);
-   }
-}
-
 void *
 kmem_realloc(const void *ptr, size_t newsize, size_t oldsize,
 xfs_km_flags_t flags)
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 64db0e5..cc6b768 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -63,7 +63,10 @@ kmem_flags_convert(xfs_km_flags_t flags)
 extern void *kmem_alloc(size_t, xfs_km_flags_t);
 extern void *kmem_zalloc_large(size_t size, xfs_km_flags_t);
 extern void *kmem_realloc(const void *, size_t, size_t, xfs_km_flags_t);
-extern void  kmem_free(const void *);
+static inline void  kmem_free(const void *ptr)
+{
+   kvfree(ptr);
+}
 
 
 extern void *kmem_zalloc_greedy(size_t *, size_t, size_t);
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] agp:change agp_free_page_array to use kvfree

2015-01-27 Thread Wang, Yalin

Change agp_free_page_array to use kvfree function,
remove the duplicated code.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 drivers/char/agp/agp.h |  5 -
 drivers/char/agp/generic.c | 11 ---
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index b709749..4eb1c77 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -219,7 +219,10 @@ struct agp_bridge_data *agp_generic_find_bridge(struct 
pci_dev *pdev);
 /* generic functions for user-populated AGP memory types */
 struct agp_memory *agp_generic_alloc_user(size_t page_count, int type);
 void agp_alloc_page_array(size_t size, struct agp_memory *mem);
-void agp_free_page_array(struct agp_memory *mem);
+static inline void agp_free_page_array(struct agp_memory *mem)
+{
+   kvfree(mem-pages);
+}
 
 
 /* generic routines for agp=3 */
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 0fbccce..f002fa5 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -98,17 +98,6 @@ void agp_alloc_page_array(size_t size, struct agp_memory 
*mem)
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
-void agp_free_page_array(struct agp_memory *mem)
-{
-   if (is_vmalloc_addr(mem-pages)) {
-   vfree(mem-pages);
-   } else {
-   kfree(mem-pages);
-   }
-}
-EXPORT_SYMBOL(agp_free_page_array);
-
-
 static struct agp_memory *agp_create_user_memory(unsigned long num_agp_pages)
 {
struct agp_memory *new;
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] ion:change ion_cma_allocate return error value

2015-01-26 Thread Wang, Yalin

This patch change the error return value from -1 to -ENOMEM,
so that userspace can get the correct errno, otherwise,
-1 will be -EPERM, userspace will print permission deny for allocation
failure.

Signed-off-by: Yalin Wang 
---
 drivers/staging/android/ion/ion_cma_heap.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/android/ion/ion_cma_heap.c 
b/drivers/staging/android/ion/ion_cma_heap.c
index f4211f1..55bcaec2 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -24,8 +24,6 @@
 #include "ion.h"
 #include "ion_priv.h"
 
-#define ION_CMA_ALLOCATE_FAILED -1
-
 struct ion_cma_heap {
struct ion_heap heap;
struct device *dev;
@@ -59,7 +57,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct 
ion_buffer *buffer,
 
info = kzalloc(sizeof(struct ion_cma_buffer_info), GFP_KERNEL);
if (!info)
-   return ION_CMA_ALLOCATE_FAILED;
+   return -ENOMEM;
 
info->cpu_addr = dma_alloc_coherent(dev, len, &(info->handle),
GFP_HIGHUSER | __GFP_ZERO);
@@ -87,7 +85,7 @@ free_mem:
dma_free_coherent(dev, len, info->cpu_addr, info->handle);
 err:
kfree(info);
-   return ION_CMA_ALLOCATE_FAILED;
+   return -ENOMEM;
 }
 
 static void ion_cma_free(struct ion_buffer *buffer)
-- 
2.2.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] ion:change ion_cma_allocate return error value

2015-01-26 Thread Wang, Yalin

This patch change the error return value from -1 to -ENOMEM,
so that userspace can get the correct errno, otherwise,
-1 will be -EPERM, userspace will print permission deny for allocation
failure.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 drivers/staging/android/ion/ion_cma_heap.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/android/ion/ion_cma_heap.c 
b/drivers/staging/android/ion/ion_cma_heap.c
index f4211f1..55bcaec2 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -24,8 +24,6 @@
 #include ion.h
 #include ion_priv.h
 
-#define ION_CMA_ALLOCATE_FAILED -1
-
 struct ion_cma_heap {
struct ion_heap heap;
struct device *dev;
@@ -59,7 +57,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct 
ion_buffer *buffer,
 
info = kzalloc(sizeof(struct ion_cma_buffer_info), GFP_KERNEL);
if (!info)
-   return ION_CMA_ALLOCATE_FAILED;
+   return -ENOMEM;
 
info-cpu_addr = dma_alloc_coherent(dev, len, (info-handle),
GFP_HIGHUSER | __GFP_ZERO);
@@ -87,7 +85,7 @@ free_mem:
dma_free_coherent(dev, len, info-cpu_addr, info-handle);
 err:
kfree(info);
-   return ION_CMA_ALLOCATE_FAILED;
+   return -ENOMEM;
 }
 
 static void ion_cma_free(struct ion_buffer *buffer)
-- 
2.2.2
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] export module parameters even the permission is zero

2015-01-21 Thread Wang, Yalin

> -Original Message-
> From: Rusty Russell [mailto:ru...@rustcorp.com.au]
> Sent: Tuesday, January 20, 2015 2:33 PM
> To: Wang, Yalin; 'a...@linux-foundation.org'; 'jani.nik...@intel.com';
> 'h...@infradead.org'; 'h...@suse.de'; 'keesc...@chromium.org'; 'linux-
> ker...@vger.kernel.org'
> Subject: Re: [RFC] export module parameters even the permission is zero
> 
> "Wang, Yalin"  writes:
> > This patch make sure to export module parameters even the permission
> > is zero, this is useful for some platforms like Android, the init
> > process can change the parameter mode/owner by chmod/chown during
> > bootup
> >
> > Signed-off-by: Yalin Wang 
> 
> Hi!
> 
> Unfortunately, this won't work.  Various parts of the code assume
> that 0 permissions means "cannot be changed or read", so this change would
> introduce many bugs (eg. the parameter could be __initdata).
> 
> Since these are owned by root at boot, changing mode and owner should still
> work.  However, you can't make a read-only field writable and expect it to
> work.
> 
Oh, I see,
This means kernel cmdline can change this module_parameter ignored permission,

Thanks for your comments.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] export module parameters even the permission is zero

2015-01-21 Thread Wang, Yalin

 -Original Message-
 From: Rusty Russell [mailto:ru...@rustcorp.com.au]
 Sent: Tuesday, January 20, 2015 2:33 PM
 To: Wang, Yalin; 'a...@linux-foundation.org'; 'jani.nik...@intel.com';
 'h...@infradead.org'; 'h...@suse.de'; 'keesc...@chromium.org'; 'linux-
 ker...@vger.kernel.org'
 Subject: Re: [RFC] export module parameters even the permission is zero

 Wang, Yalin yalin.w...@sonymobile.com writes:
  This patch make sure to export module parameters even the permission
  is zero, this is useful for some platforms like Android, the init
  process can change the parameter mode/owner by chmod/chown during
  bootup

  Signed-off-by: Yalin Wang yalin.w...@sonymobile.com

 Hi!

 Unfortunately, this won't work.  Various parts of the code assume
 that 0 permissions means cannot be changed or read, so this change would
 introduce many bugs (eg. the parameter could be __initdata).

 Since these are owned by root at boot, changing mode and owner should still
 work.  However, you can't make a read-only field writable and expect it to
 work.

Oh, I see,
This means kernel cmdline can change this module_parameter ignored permission,

Thanks for your comments.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction

2015-01-15 Thread Wang, Yalin

> -Original Message-
> From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
> Sent: Thursday, January 15, 2015 12:38 AM
> To: Wang, Yalin
> Cc: 'Ard Biesheuvel'; 'Will Deacon'; 'linux-kernel@vger.kernel.org';
> 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction
> 
> On Fri, Jan 09, 2015 at 08:40:56PM +0800, Wang, Yalin wrote:
> > Oh, I see,
> > How about change like this:
> > +   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7) && !CPU_V6 &&
> > +!CPU_V6K)
> > I am not sure if I also need add some older CPU types like !CPU_ARM9TDMI
> &&　!CPU_ARM940T ?
> >
> > Another solution is:
> > +   select HAVE_ARCH_BITREVERSE if ((CPU_32V7M || CPU_32V7) && !CPU_32V6
> > +&& !CPU_32V5 && !CPU_32V4 && !CPU_32V4T && !CPU_32V3)
> >
> > By the way, I am not clear about the difference between CPU_V6 and
> > CPU_V6K, could you tell me? :)
> 
> I think
> 
>   select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
> 
> is sufficient - we don't support mixing pre-v6 and v6+ CPU architectures
> into a single kernel.
> 
Ok, I will re-send a patch. 

Thanks

RE: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction

2015-01-15 Thread Wang, Yalin

 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Thursday, January 15, 2015 12:38 AM
 To: Wang, Yalin
 Cc: 'Ard Biesheuvel'; 'Will Deacon'; 'linux-kernel@vger.kernel.org';
 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction
 
 On Fri, Jan 09, 2015 at 08:40:56PM +0800, Wang, Yalin wrote:
  Oh, I see,
  How about change like this:
  +   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7)  !CPU_V6 
  +!CPU_V6K)
  I am not sure if I also need add some older CPU types like !CPU_ARM9TDMI
 　!CPU_ARM940T ?
 
  Another solution is:
  +   select HAVE_ARCH_BITREVERSE if ((CPU_32V7M || CPU_32V7)  !CPU_32V6
  + !CPU_32V5  !CPU_32V4  !CPU_32V4T  !CPU_32V3)
 
  By the way, I am not clear about the difference between CPU_V6 and
  CPU_V6K, could you tell me? :)
 
 I think
 
   select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7)  !CPU_32v6
 
 is sufficient - we don't support mixing pre-v6 and v6+ CPU architectures
 into a single kernel.
 
Ok, I will re-send a patch. 

Thanks

[RFC] export module parameters even the permission is zero

2015-01-13 Thread Wang, Yalin

This patch make sure to export module parameters even the permission
is zero, this is useful for some platforms like Android,
the init process can change the parameter mode/owner by
chmod/chown during bootup

Signed-off-by: Yalin Wang 
---
 kernel/params.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/kernel/params.c b/kernel/params.c
index bd65d136..aa80c04 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -607,9 +607,6 @@ static __modinit int add_sysfs_param(struct module_kobject 
*mk,
struct attribute **new_attrs;
unsigned int i;
 
-   /* We don't bother calling this with invisible parameters. */
-   BUG_ON(!kp->perm);
-
if (!mk->mp) {
/* First allocation. */
mk->mp = kzalloc(sizeof(*mk->mp), GFP_KERNEL);
@@ -812,9 +809,6 @@ static void __init param_sysfs_builtin(void)
for (kp = __start___param; kp < __stop___param; kp++) {
char *dot;
 
-   if (kp->perm == 0)
-   continue;
-
dot = strchr(kp->name, '.');
if (!dot) {
/* This happens for core_param() */
-- 
2.1.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] export module parameters even the permission is zero

2015-01-13 Thread Wang, Yalin

This patch make sure to export module parameters even the permission
is zero, this is useful for some platforms like Android,
the init process can change the parameter mode/owner by
chmod/chown during bootup

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 kernel/params.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/kernel/params.c b/kernel/params.c
index bd65d136..aa80c04 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -607,9 +607,6 @@ static __modinit int add_sysfs_param(struct module_kobject 
*mk,
struct attribute **new_attrs;
unsigned int i;
 
-   /* We don't bother calling this with invisible parameters. */
-   BUG_ON(!kp-perm);
-
if (!mk-mp) {
/* First allocation. */
mk-mp = kzalloc(sizeof(*mk-mp), GFP_KERNEL);
@@ -812,9 +809,6 @@ static void __init param_sysfs_builtin(void)
for (kp = __start___param; kp  __stop___param; kp++) {
char *dot;
 
-   if (kp-perm == 0)
-   continue;
-
dot = strchr(kp-name, '.');
if (!dot) {
/* This happens for core_param() */
-- 
2.1.3
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

FW: [RFC V6 2/3 resend] arm:add bitrev.h file to support rbit instruction

2015-01-11 Thread Wang, Yalin

> -Original Message-
> From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
> Sent: Friday, January 09, 2015 7:11 PM
> To: Wang, Yalin
> Cc: 'Ard Biesheuvel'; 'Will Deacon'; 'linux-kernel@vger.kernel.org'; 
> 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 
> 'linux-arm- ker...@lists.infradead.org'
> Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit 
> instruction
> 
> On Fri, Jan 09, 2015 at 10:16:32AM +0800, Wang, Yalin wrote:
> > > -Original Message-
> > > From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
> > > Sent: Friday, January 09, 2015 2:41 AM
> > > To: Wang, Yalin
> > > Cc: 'Will Deacon'; 'Ard Biesheuvel'; 
> > > 'linux-kernel@vger.kernel.org'; 'akinobu.m...@gmail.com'; 
> > > 'linux...@kvack.org'; 'Joe Perches';
> > > 'linux-arm- ker...@lists.infradead.org'
> > > Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit 
> > > instruction
> > >
> > > The root cause is that the kernel being built is supposed to 
> > > support both
> > > ARMv7 and ARMv6K CPUs.  However, "rbit" is only available on
> > > ARMv6T2 (thumb2) and ARMv7, and not plain ARMv6 or ARMv6K CPUs.
> > >
> > In the patch that you applied:
> > 8205/1  add bitrev.h file to support rbit instruction
> >
> > I have add :
> > +   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7) && !CPU_V6)
> >
> > If you build kernel support ARMv6K, should CONFIG_CPU_V6=y, isn't it ?
> > Then will not build hardware rbit instruction, isn't it ?
> 
> The config has:
> 
> CONFIG_CPU_PJ4=y
> # CONFIG_CPU_V6 is not set
> CONFIG_CPU_V6K=y
> CONFIG_CPU_V7=y
> CONFIG_CPU_32v6=y
> CONFIG_CPU_32v6K=y
> CONFIG_CPU_32v7=y
> 
> And no, the CONFIG_CPU_V* flags refer to the CPUs.  The
> CONFIG_CPU_32v* symbols refer to the CPU architectures.
> 
Oh, I see,
How about change like this:
+   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7) && !CPU_V6 && 
+!CPU_V6K)
I am not sure if I also need add some older CPU types like !CPU_ARM9TDMI &&　
!CPU_ARM940T ?

Another solution is:
+   select HAVE_ARCH_BITREVERSE if ((CPU_32V7M || CPU_32V7) && !CPU_32V6 
+&& !CPU_32V5 && !CPU_32V4 && !CPU_32V4T && !CPU_32V3)

By the way, I am not clear about the difference between CPU_V6 and CPU_V6K, 
could you tell me? :)

Thank you 







--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

FW: [RFC V6 2/3 resend] arm:add bitrev.h file to support rbit instruction

2015-01-11 Thread Wang, Yalin

 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Friday, January 09, 2015 7:11 PM
 To: Wang, Yalin
 Cc: 'Ard Biesheuvel'; 'Will Deacon'; 'linux-kernel@vger.kernel.org'; 
 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 
 'linux-arm- ker...@lists.infradead.org'
 Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit 
 instruction
 
 On Fri, Jan 09, 2015 at 10:16:32AM +0800, Wang, Yalin wrote:
   -Original Message-
   From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
   Sent: Friday, January 09, 2015 2:41 AM
   To: Wang, Yalin
   Cc: 'Will Deacon'; 'Ard Biesheuvel'; 
   'linux-kernel@vger.kernel.org'; 'akinobu.m...@gmail.com'; 
   'linux...@kvack.org'; 'Joe Perches';
   'linux-arm- ker...@lists.infradead.org'
   Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit 
   instruction
  
   The root cause is that the kernel being built is supposed to 
   support both
   ARMv7 and ARMv6K CPUs.  However, rbit is only available on
   ARMv6T2 (thumb2) and ARMv7, and not plain ARMv6 or ARMv6K CPUs.
  
  In the patch that you applied:
  8205/1  add bitrev.h file to support rbit instruction
 
  I have add :
  +   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7)  !CPU_V6)
 
  If you build kernel support ARMv6K, should CONFIG_CPU_V6=y, isn't it ?
  Then will not build hardware rbit instruction, isn't it ?
 
 The config has:
 
 CONFIG_CPU_PJ4=y
 # CONFIG_CPU_V6 is not set
 CONFIG_CPU_V6K=y
 CONFIG_CPU_V7=y
 CONFIG_CPU_32v6=y
 CONFIG_CPU_32v6K=y
 CONFIG_CPU_32v7=y
 
 And no, the CONFIG_CPU_V* flags refer to the CPUs.  The
 CONFIG_CPU_32v* symbols refer to the CPU architectures.
 
Oh, I see,
How about change like this:
+   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7)  !CPU_V6  
+!CPU_V6K)
I am not sure if I also need add some older CPU types like !CPU_ARM9TDMI 　
!CPU_ARM940T ?

Another solution is:
+   select HAVE_ARCH_BITREVERSE if ((CPU_32V7M || CPU_32V7)  !CPU_32V6 
+ !CPU_32V5  !CPU_32V4  !CPU_32V4T  !CPU_32V3)

By the way, I am not clear about the difference between CPU_V6 and CPU_V6K, 
could you tell me? :)

Thank you 







--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction

2015-01-09 Thread Wang, Yalin

> -Original Message-
> From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
> Sent: Friday, January 09, 2015 7:11 PM
> To: Wang, Yalin
> Cc: 'Ard Biesheuvel'; 'Will Deacon'; 'linux-kernel@vger.kernel.org';
> 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction
> 
> On Fri, Jan 09, 2015 at 10:16:32AM +0800, Wang, Yalin wrote:
> > > -Original Message-
> > > From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
> > > Sent: Friday, January 09, 2015 2:41 AM
> > > To: Wang, Yalin
> > > Cc: 'Will Deacon'; 'Ard Biesheuvel'; 'linux-kernel@vger.kernel.org';
> > > 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches';
> > > 'linux-arm- ker...@lists.infradead.org'
> > > Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit
> > > instruction
> > >
> > > The root cause is that the kernel being built is supposed to support
> > > both
> > > ARMv7 and ARMv6K CPUs.  However, "rbit" is only available on
> > > ARMv6T2 (thumb2) and ARMv7, and not plain ARMv6 or ARMv6K CPUs.
> > >
> > In the patch that you applied:
> > 8205/1  add bitrev.h file to support rbit instruction
> >
> > I have add :
> > +   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7) && !CPU_V6)
> >
> > If you build kernel support ARMv6K, should CONFIG_CPU_V6=y, isn't it ?
> > Then will not build hardware rbit instruction, isn't it ?
> 
> The config has:
> 
> CONFIG_CPU_PJ4=y
> # CONFIG_CPU_V6 is not set
> CONFIG_CPU_V6K=y
> CONFIG_CPU_V7=y
> CONFIG_CPU_32v6=y
> CONFIG_CPU_32v6K=y
> CONFIG_CPU_32v7=y
> 
> And no, the CONFIG_CPU_V* flags refer to the CPUs.  The
> CONFIG_CPU_32v* symbols refer to the CPU architectures.
> 
Oh, I see,
How about change like this:
+   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7) && !CPU_V6 && 
!CPU_V6K)
I am not sure if I also need add some older CPU types like !CPU_ARM9TDMI &&　
!CPU_ARM940T ?

Another solution is:
+   select HAVE_ARCH_BITREVERSE if ((CPU_32V7M || CPU_32V7) && !CPU_32V6 && 
!CPU_32V5 && !CPU_32V4 && !CPU_32V4T && !CPU_32V3)

By the way, I am not clear about the difference between CPU_V6 and CPU_V6K, 
could you tell me? :)

Thank you 







--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction

2015-01-09 Thread Wang, Yalin

 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Friday, January 09, 2015 7:11 PM
 To: Wang, Yalin
 Cc: 'Ard Biesheuvel'; 'Will Deacon'; 'linux-kernel@vger.kernel.org';
 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction
 
 On Fri, Jan 09, 2015 at 10:16:32AM +0800, Wang, Yalin wrote:
   -Original Message-
   From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
   Sent: Friday, January 09, 2015 2:41 AM
   To: Wang, Yalin
   Cc: 'Will Deacon'; 'Ard Biesheuvel'; 'linux-kernel@vger.kernel.org';
   'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches';
   'linux-arm- ker...@lists.infradead.org'
   Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit
   instruction
  
   The root cause is that the kernel being built is supposed to support
   both
   ARMv7 and ARMv6K CPUs.  However, rbit is only available on
   ARMv6T2 (thumb2) and ARMv7, and not plain ARMv6 or ARMv6K CPUs.
  
  In the patch that you applied:
  8205/1  add bitrev.h file to support rbit instruction
 
  I have add :
  +   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7)  !CPU_V6)
 
  If you build kernel support ARMv6K, should CONFIG_CPU_V6=y, isn't it ?
  Then will not build hardware rbit instruction, isn't it ?
 
 The config has:
 
 CONFIG_CPU_PJ4=y
 # CONFIG_CPU_V6 is not set
 CONFIG_CPU_V6K=y
 CONFIG_CPU_V7=y
 CONFIG_CPU_32v6=y
 CONFIG_CPU_32v6K=y
 CONFIG_CPU_32v7=y
 
 And no, the CONFIG_CPU_V* flags refer to the CPUs.  The
 CONFIG_CPU_32v* symbols refer to the CPU architectures.
 
Oh, I see,
How about change like this:
+   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7)  !CPU_V6  
!CPU_V6K)
I am not sure if I also need add some older CPU types like !CPU_ARM9TDMI 　
!CPU_ARM940T ?

Another solution is:
+   select HAVE_ARCH_BITREVERSE if ((CPU_32V7M || CPU_32V7)  !CPU_32V6  
!CPU_32V5  !CPU_32V4  !CPU_32V4T  !CPU_32V3)

By the way, I am not clear about the difference between CPU_V6 and CPU_V6K, 
could you tell me? :)

Thank you 







--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mmc:change mmc_init workqueue into a freezable workqueue

2015-01-08 Thread Wang, Yalin

> -Original Message-
> From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
> Sent: Thursday, January 08, 2015 7:42 PM
> To: Wang, Yalin
> Cc: 'ch...@printf.net'; 'ulf.hans...@linaro.org'; 'tim.kry...@gmail.com';
> 'tgih@samsung.com'; 'johan.rudh...@axis.com'; 'linux-
> m...@vger.kernel.org'; 'linux-kernel@vger.kernel.org'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC] mmc:change mmc_init workqueue into a freezable workqueue
> 
> On Thu, Jan 08, 2015 at 06:06:32PM +0800, Wang, Yalin wrote:
> > This patch fix the mmc driver suspend/resume conflict problems, mmc
> > workqueue will queue mmc_rescan(), and it will call some
> > pm_runtime_* functions, this will conflict with suspend path
> > sometimes, and will result in some strange behavior:
> >
> > Suspend path:
> > -000 |context_switch(inline)
> > -000 |__schedule()
> > -001 |schedule_preempt_disabled()
> > -002 |spin_lock(inline)
> > -002 |__mutex_lock_common(inline)
> > -002 |__mutex_lock_slowpath(lock_count = 0xEDCD0F48)
> > -003 |__mutex_fastpath_lock(inline)
> > -003 |mutex_lock(lock = 0xEDCD0F48)
> > -004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
> > -005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
> > -006 |mmc_set_ios(host = 0xEDCD0800)
> > -007 |mmc_delay(inline)
> > -007 |mmc_power_off(host = 0xEDCD0800)
> > -008 |mmc_suspend_host(inline)
> > -008 |mmc_suspend_host(host = 0xEDCD0800)
> > -009 |mmc_host_suspend(dev = 0xEDCD0808)
> > -010 |dpm_run_callback(cb = 0xC0627A88, dev = 0xEDCD0808, state =
> (event = 2), info = 0xC0B6EF9B)
> > -011 |__device_suspend(dev = 0xEDCD0808, state = (event = 2), ?)
> > -012 |device_suspend(inline)
> 
> Suspend can't complete until device_suspend() returns.  Hence,
> mmc_power_off() must finish.
> 
> > mmc_rescan() resume path:
> >  -000 |context_switch(inline)
> >  -000 |__schedule()
> >  -001 |do_undefinstr(regs = 0xD12242F0)
> >  -002 |__und_svc(asm)
> >   --> |exception
> 
> I assume this is what you mean by "strange behaviour" ?  If so, please give
> the full oops.  Have you fully diagnosed the failure?
> 
> > most mmc power callback function don't check this special case, and
> > will cause problems, make sure the workqueue is stopped during suspend
> > is more safe.
> 
> I think there's a bad side effect of this: if you have swap on a SD card,
> and you ask the system to hibernate, you don't want this thread to freeze.
> 
> What you do need to do is to ensure that new requests can't be processed
> while the host is suspended.
> 
> A hibernate works (approximately) as:
> 
> - freeze all tasks
> - push as much out to swap as possible
> - suspend devices
> - create system snapshot
> - resume devices
> - write system snapshot to swap
> - power down
> 
> If the MMC thread is frozen, and you have swap on SD, then it can't write
> the image to swap.
> 
I see your concerns, thanks for your comment.
I will think about some other solutions for this problems.


Thank you

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction

2015-01-08 Thread Wang, Yalin

> -Original Message-
> From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
> Sent: Friday, January 09, 2015 2:41 AM
> To: Wang, Yalin
> Cc: 'Will Deacon'; 'Ard Biesheuvel'; 'linux-kernel@vger.kernel.org';
> 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 'linux-arm-
> ker...@lists.infradead.org'
> Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction
> 
> On Mon, Nov 17, 2014 at 10:38:58AM +0800, Wang, Yalin wrote:
> > Joe has submitted patches to maintainers, So we need wait for them to
> > be accepted .
> 
> I ran these patches through my autobuilder, and while most builds didn't
> seem to be a problem, the randconfigs found errors:
> 
> /tmp/ccbiuDjS.s:137: Error: selected processor does not support ARM mode
> `rbit r3,r2'
> /tmp/ccbiuDjS.s:145: Error: selected processor does not support ARM mode
> `rbit r0,r1'
> make[4]: *** [drivers/iio/amplifiers/ad8366.o] Error 1
> 
> /tmp/ccFhnoO3.s:6789: Error: selected processor does not support ARM mode
> `rbit r2,r2'
> make[4]: *** [drivers/mtd/devices/docg3.o] Error 1
> 
> /tmp/cckMf2pp.s:239: Error: selected processor does not support ARM mode
> `rbit ip,ip'
> /tmp/cckMf2pp.s:241: Error: selected processor does not support ARM mode
> `rbit r2,r2'
> /tmp/cckMf2pp.s:248: Error: selected processor does not support ARM mode
> `rbit lr,lr'
> /tmp/cckMf2pp.s:250: Error: selected processor does not support ARM mode
> `rbit r3,r3'
> make[5]: *** [drivers/video/fbdev/nvidia/nvidia.o] Error 1
> 
> /tmp/ccTgULsO.s:1151: Error: selected processor does not support ARM mode
> `rbit r1,r1'
> /tmp/ccTgULsO.s:1158: Error: selected processor does not support ARM mode
> `rbit r0,r0'
> /tmp/ccTgULsO.s:1164: Error: selected processor does not support ARM mode
> `rbit ip,ip'
> /tmp/ccTgULsO.s:1166: Error: selected processor does not support ARM mode
> `rbit r3,r3'
> /tmp/ccTgULsO.s:1227: Error: selected processor does not support ARM mode
> `rbit r5,r5'
> /tmp/ccTgULsO.s:1229: Error: selected processor does not support ARM mode
> `rbit lr,lr'
> /tmp/ccTgULsO.s:1236: Error: selected processor does not support ARM mode
> `rbit r0,r0'
> /tmp/ccTgULsO.s:1238: Error: selected processor does not support ARM mode
> `rbit r3,r3'
> make[5]: *** [drivers/video/fbdev/nvidia/nv_accel.o] Error 1
> 
> The root cause is that the kernel being built is supposed to support both
> ARMv7 and ARMv6K CPUs.  However, "rbit" is only available on
> ARMv6T2 (thumb2) and ARMv7, and not plain ARMv6 or ARMv6K CPUs.
> 
In the patch that you applied:
8205/1  add bitrev.h file to support rbit instruction

I have add :
+   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7) && !CPU_V6)

If you build kernel support ARMv6K, should CONFIG_CPU_V6=y, isn't it ?
Then will not build hardware rbit instruction, isn't it ?

Thanks








--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] mmc:change mmc_init workqueue into a freezable workqueue

2015-01-08 Thread Wang, Yalin

This patch fix the mmc driver suspend/resume conflict problems,
mmc workqueue will queue mmc_rescan(), and it will call some
pm_runtime_* functions, this will conflict with suspend path sometimes,
and will result in some strange behavior:

Suspend path:
-000 |context_switch(inline)
-000 |__schedule()
-001 |schedule_preempt_disabled()
-002 |spin_lock(inline)
-002 |__mutex_lock_common(inline)
-002 |__mutex_lock_slowpath(lock_count = 0xEDCD0F48)
-003 |__mutex_fastpath_lock(inline)
-003 |mutex_lock(lock = 0xEDCD0F48)
-004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
-005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
-006 |mmc_set_ios(host = 0xEDCD0800)
-007 |mmc_delay(inline)
-007 |mmc_power_off(host = 0xEDCD0800)
-008 |mmc_suspend_host(inline)
-008 |mmc_suspend_host(host = 0xEDCD0800)
-009 |mmc_host_suspend(dev = 0xEDCD0808)
-010 |dpm_run_callback(cb = 0xC0627A88, dev = 0xEDCD0808, state = 
(event = 2), info = 0xC0B6EF9B)
-011 |__device_suspend(dev = 0xEDCD0808, state = (event = 2), ?)
-012 |device_suspend(inline)
-012 |dpm_suspend(state = (event = 2))
-013 |suspend_devices_and_enter(state = 3)
-014 |enter_state(inline)
-014 |pm_suspend(state = 3)
-015 |try_to_suspend(?)
-016 |static_key_false(inline)
-016 |trace_workqueue_execute_end(inline)
-016 |process_one_work(worker = 0xD5E87040, work = 0xC0E3FF54)
-017 |worker_thread(__worker = 0xD5E87040)
-018 |kthread(_create = 0xE9FFBF10)
-019 |kernel_thread_exit(asm)
 --- |end of frame

mmc_rescan() resume path:
 -000 |context_switch(inline)
 -000 |__schedule()
 -001 |do_undefinstr(regs = 0xD12242F0)
 -002 |__und_svc(asm)
  --> |exception
  -003 |sdhci_set_power(host = 0xEDCD0CC0, power = 0)
  -004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
  -005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
  -006 |mmc_set_ios(host = 0xEDCD0800)
  -007 |mmc_delay(inline)
  -007 |mmc_power_up(host = 0xEDCD0800)
  -008 |mmc_resume_bus(inline)
  -008 |mmc_resume_bus(host = 0xEDCD0800)
  -009 |mmc_rpm_hold(host = 0xEDCD0800, ?)
  -010 |mmc_rescan(work = 0xEDCD0AB0)
  -011 |static_key_false(inline)
  -011 |trace_workqueue_execute_end(inline)
  -011 |process_one_work(worker = 0xE5F0BBC0, work = 0xEDCD0AB0)
  -012 |worker_thread(__worker = 0xE5F0BBC0)
  -013 |kthread(_create = 0xD03A5F10)
  -014 |kernel_thread_exit(asm)
   --- |end of frame

most mmc power callback function don't check this special case, and
will cause problems.

Signed-off-by: Yalin Wang 
---
 drivers/mmc/core/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index a6c139d..ae8757f 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -3881,7 +3881,7 @@ static int __init mmc_init(void)
 {
int ret;
 
-   workqueue = alloc_ordered_workqueue("kmmcd", 0);
+   workqueue = alloc_ordered_workqueue("kmmcd", WQ_FREEZABLE);
if (!workqueue)
return -ENOMEM;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mmc:change mmc_init workqueue into a freezable workqueue

2015-01-08 Thread Wang, Yalin

This patch fix the mmc driver suspend/resume conflict problems,
mmc workqueue will queue mmc_rescan(), and it will call some
pm_runtime_* functions, this will conflict with suspend path sometimes,
and will result in some strange behavior:

Suspend path:
-000 |context_switch(inline)
-000 |__schedule()
-001 |schedule_preempt_disabled()
-002 |spin_lock(inline)
-002 |__mutex_lock_common(inline)
-002 |__mutex_lock_slowpath(lock_count = 0xEDCD0F48)
-003 |__mutex_fastpath_lock(inline)
-003 |mutex_lock(lock = 0xEDCD0F48)
-004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
-005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
-006 |mmc_set_ios(host = 0xEDCD0800)
-007 |mmc_delay(inline)
-007 |mmc_power_off(host = 0xEDCD0800)
-008 |mmc_suspend_host(inline)
-008 |mmc_suspend_host(host = 0xEDCD0800)
-009 |mmc_host_suspend(dev = 0xEDCD0808)
-010 |dpm_run_callback(cb = 0xC0627A88, dev = 0xEDCD0808, state = 
(event = 2), info = 0xC0B6EF9B)
-011 |__device_suspend(dev = 0xEDCD0808, state = (event = 2), ?)
-012 |device_suspend(inline)
-012 |dpm_suspend(state = (event = 2))
-013 |suspend_devices_and_enter(state = 3)
-014 |enter_state(inline)
-014 |pm_suspend(state = 3)
-015 |try_to_suspend(?)
-016 |static_key_false(inline)
-016 |trace_workqueue_execute_end(inline)
-016 |process_one_work(worker = 0xD5E87040, work = 0xC0E3FF54)
-017 |worker_thread(__worker = 0xD5E87040)
-018 |kthread(_create = 0xE9FFBF10)
-019 |kernel_thread_exit(asm)
 --- |end of frame

mmc_rescan() resume path:
 -000 |context_switch(inline)
 -000 |__schedule()
 -001 |do_undefinstr(regs = 0xD12242F0)
 -002 |__und_svc(asm)
  --> |exception
  -003 |sdhci_set_power(host = 0xEDCD0CC0, power = 0)
  -004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
  -005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
  -006 |mmc_set_ios(host = 0xEDCD0800)
  -007 |mmc_delay(inline)
  -007 |mmc_power_up(host = 0xEDCD0800)
  -008 |mmc_resume_bus(inline)
  -008 |mmc_resume_bus(host = 0xEDCD0800)
  -009 |mmc_rpm_hold(host = 0xEDCD0800, ?)
  -010 |mmc_rescan(work = 0xEDCD0AB0)
  -011 |static_key_false(inline)
  -011 |trace_workqueue_execute_end(inline)
  -011 |process_one_work(worker = 0xE5F0BBC0, work = 0xEDCD0AB0)
  -012 |worker_thread(__worker = 0xE5F0BBC0)
  -013 |kthread(_create = 0xD03A5F10)
  -014 |kernel_thread_exit(asm)
   --- |end of frame

most mmc power callback function don't check this special case, and
will cause problems, make sure the workqueue is stopped during suspend
is more safe.
---
 drivers/mmc/core/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index a6c139d..ae8757f 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -3881,7 +3881,7 @@ static int __init mmc_init(void)
 {
int ret;
 
-   workqueue = alloc_ordered_workqueue("kmmcd", 0);
+   workqueue = alloc_ordered_workqueue("kmmcd", WQ_FREEZABLE);
if (!workqueue)
return -ENOMEM;
 
-- 
2.1.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction

2015-01-08 Thread Wang, Yalin

 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Friday, January 09, 2015 2:41 AM
 To: Wang, Yalin
 Cc: 'Will Deacon'; 'Ard Biesheuvel'; 'linux-kernel@vger.kernel.org';
 'akinobu.m...@gmail.com'; 'linux...@kvack.org'; 'Joe Perches'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC V6 2/3] arm:add bitrev.h file to support rbit instruction
 
 On Mon, Nov 17, 2014 at 10:38:58AM +0800, Wang, Yalin wrote:
  Joe has submitted patches to maintainers, So we need wait for them to
  be accepted .
 
 I ran these patches through my autobuilder, and while most builds didn't
 seem to be a problem, the randconfigs found errors:
 
 /tmp/ccbiuDjS.s:137: Error: selected processor does not support ARM mode
 `rbit r3,r2'
 /tmp/ccbiuDjS.s:145: Error: selected processor does not support ARM mode
 `rbit r0,r1'
 make[4]: *** [drivers/iio/amplifiers/ad8366.o] Error 1
 
 /tmp/ccFhnoO3.s:6789: Error: selected processor does not support ARM mode
 `rbit r2,r2'
 make[4]: *** [drivers/mtd/devices/docg3.o] Error 1
 
 /tmp/cckMf2pp.s:239: Error: selected processor does not support ARM mode
 `rbit ip,ip'
 /tmp/cckMf2pp.s:241: Error: selected processor does not support ARM mode
 `rbit r2,r2'
 /tmp/cckMf2pp.s:248: Error: selected processor does not support ARM mode
 `rbit lr,lr'
 /tmp/cckMf2pp.s:250: Error: selected processor does not support ARM mode
 `rbit r3,r3'
 make[5]: *** [drivers/video/fbdev/nvidia/nvidia.o] Error 1
 
 /tmp/ccTgULsO.s:1151: Error: selected processor does not support ARM mode
 `rbit r1,r1'
 /tmp/ccTgULsO.s:1158: Error: selected processor does not support ARM mode
 `rbit r0,r0'
 /tmp/ccTgULsO.s:1164: Error: selected processor does not support ARM mode
 `rbit ip,ip'
 /tmp/ccTgULsO.s:1166: Error: selected processor does not support ARM mode
 `rbit r3,r3'
 /tmp/ccTgULsO.s:1227: Error: selected processor does not support ARM mode
 `rbit r5,r5'
 /tmp/ccTgULsO.s:1229: Error: selected processor does not support ARM mode
 `rbit lr,lr'
 /tmp/ccTgULsO.s:1236: Error: selected processor does not support ARM mode
 `rbit r0,r0'
 /tmp/ccTgULsO.s:1238: Error: selected processor does not support ARM mode
 `rbit r3,r3'
 make[5]: *** [drivers/video/fbdev/nvidia/nv_accel.o] Error 1
 
 The root cause is that the kernel being built is supposed to support both
 ARMv7 and ARMv6K CPUs.  However, rbit is only available on
 ARMv6T2 (thumb2) and ARMv7, and not plain ARMv6 or ARMv6K CPUs.
 
In the patch that you applied:
8205/1  add bitrev.h file to support rbit instruction

I have add :
+   select HAVE_ARCH_BITREVERSE if ((CPU_V7M || CPU_V7)  !CPU_V6)

If you build kernel support ARMv6K, should CONFIG_CPU_V6=y, isn't it ?
Then will not build hardware rbit instruction, isn't it ?

Thanks








--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mmc:change mmc_init workqueue into a freezable workqueue

2015-01-08 Thread Wang, Yalin

 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Thursday, January 08, 2015 7:42 PM
 To: Wang, Yalin
 Cc: 'ch...@printf.net'; 'ulf.hans...@linaro.org'; 'tim.kry...@gmail.com';
 'tgih@samsung.com'; 'johan.rudh...@axis.com'; 'linux-
 m...@vger.kernel.org'; 'linux-kernel@vger.kernel.org'; 'linux-arm-
 ker...@lists.infradead.org'
 Subject: Re: [RFC] mmc:change mmc_init workqueue into a freezable workqueue
 
 On Thu, Jan 08, 2015 at 06:06:32PM +0800, Wang, Yalin wrote:
  This patch fix the mmc driver suspend/resume conflict problems, mmc
  workqueue will queue mmc_rescan(), and it will call some
  pm_runtime_* functions, this will conflict with suspend path
  sometimes, and will result in some strange behavior:
 
  Suspend path:
  -000 |context_switch(inline)
  -000 |__schedule()
  -001 |schedule_preempt_disabled()
  -002 |spin_lock(inline)
  -002 |__mutex_lock_common(inline)
  -002 |__mutex_lock_slowpath(lock_count = 0xEDCD0F48)
  -003 |__mutex_fastpath_lock(inline)
  -003 |mutex_lock(lock = 0xEDCD0F48)
  -004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
  -005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
  -006 |mmc_set_ios(host = 0xEDCD0800)
  -007 |mmc_delay(inline)
  -007 |mmc_power_off(host = 0xEDCD0800)
  -008 |mmc_suspend_host(inline)
  -008 |mmc_suspend_host(host = 0xEDCD0800)
  -009 |mmc_host_suspend(dev = 0xEDCD0808)
  -010 |dpm_run_callback(cb = 0xC0627A88, dev = 0xEDCD0808, state =
 (event = 2), info = 0xC0B6EF9B)
  -011 |__device_suspend(dev = 0xEDCD0808, state = (event = 2), ?)
  -012 |device_suspend(inline)
 
 Suspend can't complete until device_suspend() returns.  Hence,
 mmc_power_off() must finish.
 
  mmc_rescan() resume path:
   -000 |context_switch(inline)
   -000 |__schedule()
   -001 |do_undefinstr(regs = 0xD12242F0)
   -002 |__und_svc(asm)
-- |exception
 
 I assume this is what you mean by strange behaviour ?  If so, please give
 the full oops.  Have you fully diagnosed the failure?
 
  most mmc power callback function don't check this special case, and
  will cause problems, make sure the workqueue is stopped during suspend
  is more safe.
 
 I think there's a bad side effect of this: if you have swap on a SD card,
 and you ask the system to hibernate, you don't want this thread to freeze.
 
 What you do need to do is to ensure that new requests can't be processed
 while the host is suspended.
 
 A hibernate works (approximately) as:
 
 - freeze all tasks
 - push as much out to swap as possible
 - suspend devices
 - create system snapshot
 - resume devices
 - write system snapshot to swap
 - power down
 
 If the MMC thread is frozen, and you have swap on SD, then it can't write
 the image to swap.
 
I see your concerns, thanks for your comment.
I will think about some other solutions for this problems.


Thank you

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC V2] mmc:change mmc_init workqueue into a freezable workqueue

2015-01-08 Thread Wang, Yalin

This patch fix the mmc driver suspend/resume conflict problems,
mmc workqueue will queue mmc_rescan(), and it will call some
pm_runtime_* functions, this will conflict with suspend path sometimes,
and will result in some strange behavior:

Suspend path:
-000 |context_switch(inline)
-000 |__schedule()
-001 |schedule_preempt_disabled()
-002 |spin_lock(inline)
-002 |__mutex_lock_common(inline)
-002 |__mutex_lock_slowpath(lock_count = 0xEDCD0F48)
-003 |__mutex_fastpath_lock(inline)
-003 |mutex_lock(lock = 0xEDCD0F48)
-004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
-005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
-006 |mmc_set_ios(host = 0xEDCD0800)
-007 |mmc_delay(inline)
-007 |mmc_power_off(host = 0xEDCD0800)
-008 |mmc_suspend_host(inline)
-008 |mmc_suspend_host(host = 0xEDCD0800)
-009 |mmc_host_suspend(dev = 0xEDCD0808)
-010 |dpm_run_callback(cb = 0xC0627A88, dev = 0xEDCD0808, state = 
(event = 2), info = 0xC0B6EF9B)
-011 |__device_suspend(dev = 0xEDCD0808, state = (event = 2), ?)
-012 |device_suspend(inline)
-012 |dpm_suspend(state = (event = 2))
-013 |suspend_devices_and_enter(state = 3)
-014 |enter_state(inline)
-014 |pm_suspend(state = 3)
-015 |try_to_suspend(?)
-016 |static_key_false(inline)
-016 |trace_workqueue_execute_end(inline)
-016 |process_one_work(worker = 0xD5E87040, work = 0xC0E3FF54)
-017 |worker_thread(__worker = 0xD5E87040)
-018 |kthread(_create = 0xE9FFBF10)
-019 |kernel_thread_exit(asm)
 --- |end of frame

mmc_rescan() resume path:
 -000 |context_switch(inline)
 -000 |__schedule()
 -001 |do_undefinstr(regs = 0xD12242F0)
 -002 |__und_svc(asm)
  -- |exception
  -003 |sdhci_set_power(host = 0xEDCD0CC0, power = 0)
  -004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
  -005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
  -006 |mmc_set_ios(host = 0xEDCD0800)
  -007 |mmc_delay(inline)
  -007 |mmc_power_up(host = 0xEDCD0800)
  -008 |mmc_resume_bus(inline)
  -008 |mmc_resume_bus(host = 0xEDCD0800)
  -009 |mmc_rpm_hold(host = 0xEDCD0800, ?)
  -010 |mmc_rescan(work = 0xEDCD0AB0)
  -011 |static_key_false(inline)
  -011 |trace_workqueue_execute_end(inline)
  -011 |process_one_work(worker = 0xE5F0BBC0, work = 0xEDCD0AB0)
  -012 |worker_thread(__worker = 0xE5F0BBC0)
  -013 |kthread(_create = 0xD03A5F10)
  -014 |kernel_thread_exit(asm)
   --- |end of frame

most mmc power callback function don't check this special case, and
will cause problems.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 drivers/mmc/core/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index a6c139d..ae8757f 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -3881,7 +3881,7 @@ static int __init mmc_init(void)
 {
int ret;
 
-   workqueue = alloc_ordered_workqueue(kmmcd, 0);
+   workqueue = alloc_ordered_workqueue(kmmcd, WQ_FREEZABLE);
if (!workqueue)
return -ENOMEM;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mmc:change mmc_init workqueue into a freezable workqueue

2015-01-08 Thread Wang, Yalin

This patch fix the mmc driver suspend/resume conflict problems,
mmc workqueue will queue mmc_rescan(), and it will call some
pm_runtime_* functions, this will conflict with suspend path sometimes,
and will result in some strange behavior:

Suspend path:
-000 |context_switch(inline)
-000 |__schedule()
-001 |schedule_preempt_disabled()
-002 |spin_lock(inline)
-002 |__mutex_lock_common(inline)
-002 |__mutex_lock_slowpath(lock_count = 0xEDCD0F48)
-003 |__mutex_fastpath_lock(inline)
-003 |mutex_lock(lock = 0xEDCD0F48)
-004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
-005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
-006 |mmc_set_ios(host = 0xEDCD0800)
-007 |mmc_delay(inline)
-007 |mmc_power_off(host = 0xEDCD0800)
-008 |mmc_suspend_host(inline)
-008 |mmc_suspend_host(host = 0xEDCD0800)
-009 |mmc_host_suspend(dev = 0xEDCD0808)
-010 |dpm_run_callback(cb = 0xC0627A88, dev = 0xEDCD0808, state = 
(event = 2), info = 0xC0B6EF9B)
-011 |__device_suspend(dev = 0xEDCD0808, state = (event = 2), ?)
-012 |device_suspend(inline)
-012 |dpm_suspend(state = (event = 2))
-013 |suspend_devices_and_enter(state = 3)
-014 |enter_state(inline)
-014 |pm_suspend(state = 3)
-015 |try_to_suspend(?)
-016 |static_key_false(inline)
-016 |trace_workqueue_execute_end(inline)
-016 |process_one_work(worker = 0xD5E87040, work = 0xC0E3FF54)
-017 |worker_thread(__worker = 0xD5E87040)
-018 |kthread(_create = 0xE9FFBF10)
-019 |kernel_thread_exit(asm)
 --- |end of frame

mmc_rescan() resume path:
 -000 |context_switch(inline)
 -000 |__schedule()
 -001 |do_undefinstr(regs = 0xD12242F0)
 -002 |__und_svc(asm)
  -- |exception
  -003 |sdhci_set_power(host = 0xEDCD0CC0, power = 0)
  -004 |sdhci_do_set_ios(host = 0xEDCD0CC0, ios = 0xEDCD0A70)
  -005 |sdhci_set_ios(?, ios = 0xEDCD0A70)
  -006 |mmc_set_ios(host = 0xEDCD0800)
  -007 |mmc_delay(inline)
  -007 |mmc_power_up(host = 0xEDCD0800)
  -008 |mmc_resume_bus(inline)
  -008 |mmc_resume_bus(host = 0xEDCD0800)
  -009 |mmc_rpm_hold(host = 0xEDCD0800, ?)
  -010 |mmc_rescan(work = 0xEDCD0AB0)
  -011 |static_key_false(inline)
  -011 |trace_workqueue_execute_end(inline)
  -011 |process_one_work(worker = 0xE5F0BBC0, work = 0xEDCD0AB0)
  -012 |worker_thread(__worker = 0xE5F0BBC0)
  -013 |kthread(_create = 0xD03A5F10)
  -014 |kernel_thread_exit(asm)
   --- |end of frame

most mmc power callback function don't check this special case, and
will cause problems, make sure the workqueue is stopped during suspend
is more safe.
---
 drivers/mmc/core/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index a6c139d..ae8757f 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -3881,7 +3881,7 @@ static int __init mmc_init(void)
 {
int ret;
 
-   workqueue = alloc_ordered_workqueue(kmmcd, 0);
+   workqueue = alloc_ordered_workqueue(kmmcd, WQ_FREEZABLE);
if (!workqueue)
return -ENOMEM;
 
-- 
2.1.3
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] mm:change meminfo cached calculation

2014-12-26 Thread Wang, Yalin

This patch subtract sharedram from cached,
sharedram can only be swap into swap partitions,
they should be treated as swap pages, not as cached pages.

Signed-off-by: Yalin Wang 
---
 fs/proc/meminfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index d3ebf2e..2b598a6 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -45,7 +45,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
committed = percpu_counter_read_positive(_committed_as);
 
cached = global_page_state(NR_FILE_PAGES) -
-   total_swapcache_pages() - i.bufferram;
+   total_swapcache_pages() - i.bufferram - i.sharedram;
if (cached < 0)
cached = 0;
 
-- 
2.1.3
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

[RFC] mm:change meminfo cached calculation

2014-12-26 Thread Wang, Yalin

This patch subtract sharedram from cached,
sharedram can only be swap into swap partitions,
they should be treated as swap pages, not as cached pages.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 fs/proc/meminfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index d3ebf2e..2b598a6 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -45,7 +45,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
committed = percpu_counter_read_positive(vm_committed_as);
 
cached = global_page_state(NR_FILE_PAGES) -
-   total_swapcache_pages() - i.bufferram;
+   total_swapcache_pages() - i.bufferram - i.sharedram;
if (cached  0)
cached = 0;
 
-- 
2.1.3
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [RFC] MADV_FREE doesn't work when doesn't have swap partition

2014-12-18 Thread Wang, Yalin

> -Original Message-
> From: Minchan Kim [mailto:minc...@kernel.org]
> Sent: Friday, December 19, 2014 9:05 AM
> To: Wang, Yalin
> Cc: 'Konstantin Khlebnikov'; 'Kirill A. Shutemov'; 'Andrew Morton'; 'linux-
> ker...@vger.kernel.org'; 'linux...@kvack.org'; 'linux-arm-
> ker...@lists.infradead.org'; 'n-horigu...@ah.jp.nec.com'
> Subject: Re: [RFC] MADV_FREE doesn't work when doesn't have swap partition
> 
> On Thu, Dec 18, 2014 at 11:50:01AM +0800, Wang, Yalin wrote:
> > I notice this commit:
> > mm: support madvise(MADV_FREE),
> >
> > it can free clean anonymous pages directly, doesn't need pageout to
> > swap partition,
> >
> > but I found it doesn't work on my platform, which don't enable any
> > swap partitions.
> 
> Current implementation, if there is no empty slot in swap, it does instant
> free instead of delayed free. Look at madvise_vma.
> 
> >
> > I make a change for this.
> > Just to explain my issue clearly,
> > Do we need some other checks to still scan anonymous pages even Don't
> > have swap partition but have clean anonymous pages?
> 
> There is a few places we should consider if you want to scan anonymous page
> withotu swap. Refer 69c854817566 and 74e3f3c3391d.
> 
> However, it's not simple at the moment. If we reenable anonymous scan
> without swap, it would make much regress of reclaim. So my direction is
> move normal anonymos pages into unevictable LRU list because they're real
> unevictable without swap and put delayed freeing pages into anon LRU list
> and age them.
> 
I understand your solution, sounds a great idea!
When this design will be merged into main stream?

Thanks.

RE: [RFC] MADV_FREE doesn't work when doesn't have swap partition

2014-12-18 Thread Wang, Yalin

 -Original Message-
 From: Minchan Kim [mailto:minc...@kernel.org]
 Sent: Friday, December 19, 2014 9:05 AM
 To: Wang, Yalin
 Cc: 'Konstantin Khlebnikov'; 'Kirill A. Shutemov'; 'Andrew Morton'; 'linux-
 ker...@vger.kernel.org'; 'linux...@kvack.org'; 'linux-arm-
 ker...@lists.infradead.org'; 'n-horigu...@ah.jp.nec.com'
 Subject: Re: [RFC] MADV_FREE doesn't work when doesn't have swap partition

 On Thu, Dec 18, 2014 at 11:50:01AM +0800, Wang, Yalin wrote:
  I notice this commit:
  mm: support madvise(MADV_FREE),

  it can free clean anonymous pages directly, doesn't need pageout to
  swap partition,

  but I found it doesn't work on my platform, which don't enable any
  swap partitions.

 Current implementation, if there is no empty slot in swap, it does instant
 free instead of delayed free. Look at madvise_vma.

  I make a change for this.
  Just to explain my issue clearly,
  Do we need some other checks to still scan anonymous pages even Don't
  have swap partition but have clean anonymous pages?

 There is a few places we should consider if you want to scan anonymous page
 withotu swap. Refer 69c854817566 and 74e3f3c3391d.

 However, it's not simple at the moment. If we reenable anonymous scan
 without swap, it would make much regress of reclaim. So my direction is
 move normal anonymos pages into unevictable LRU list because they're real
 unevictable without swap and put delayed freeing pages into anon LRU list
 and age them.

I understand your solution, sounds a great idea!
When this design will be merged into main stream?

Thanks.

[RFC] MADV_FREE doesn't work when doesn't have swap partition

2014-12-17 Thread Wang, Yalin

I notice this commit:
mm: support madvise(MADV_FREE),

it can free clean anonymous pages directly,
doesn't need pageout to swap partition,

but I found it doesn't work on my platform,
which don't enable any swap partitions.

I make a change for this.
Just to explain my issue clearly,
Do we need some other checks to still scan anonymous pages even
Don't have swap partition but have clean anonymous pages?
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8772b..8258f3a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1941,7 +1941,7 @@ static void get_scan_count(struct lruvec *lruvec, int 
swappiness,
force_scan = true;

/* If we have no swap space, do not bother scanning anon pages. */
-   if (!sc->may_swap || (get_nr_swap_pages() <= 0)) {
+   if (!sc->may_swap) {
scan_balance = SCAN_FILE;
goto out;
}

[RFC] MADV_FREE doesn't work when doesn't have swap partition

2014-12-17 Thread Wang, Yalin

I notice this commit:
mm: support madvise(MADV_FREE),

it can free clean anonymous pages directly,
doesn't need pageout to swap partition,

but I found it doesn't work on my platform,
which don't enable any swap partitions.

I make a change for this.
Just to explain my issue clearly,
Do we need some other checks to still scan anonymous pages even
Don't have swap partition but have clean anonymous pages?
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8772b..8258f3a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1941,7 +1941,7 @@ static void get_scan_count(struct lruvec *lruvec, int 
swappiness,
force_scan = true;

/* If we have no swap space, do not bother scanning anon pages. */
-   if (!sc-may_swap || (get_nr_swap_pages() = 0)) {
+   if (!sc-may_swap) {
scan_balance = SCAN_FILE;
goto out;
}

[PATCH] regmap:change spinlock_flags into the union

2014-12-15 Thread Wang, Yalin

This patch move struct regmap.spinlock_flags into the union of
spinlock, so that we can shrink struct regmap size.

Signed-off-by: Yalin Wang 
---
 drivers/base/regmap/internal.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/base/regmap/internal.h b/drivers/base/regmap/internal.h
index 0da5865..8e94584 100644
--- a/drivers/base/regmap/internal.h
+++ b/drivers/base/regmap/internal.h
@@ -51,9 +51,11 @@ struct regmap_async {
 struct regmap {
union {
struct mutex mutex;
-   spinlock_t spinlock;
+   struct {
+   spinlock_t spinlock;
+   unsigned long spinlock_flags;
+   };
};
-   unsigned long spinlock_flags;
regmap_lock lock;
regmap_unlock unlock;
void *lock_arg; /* This is passed to lock/unlock functions */
-- 
2.1.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] regmap:change spinlock_flags into the union

2014-12-15 Thread Wang, Yalin

This patch move struct regmap.spinlock_flags into the union of
spinlock, so that we can shrink struct regmap size.

Signed-off-by: Yalin Wang yalin.w...@sonymobile.com
---
 drivers/base/regmap/internal.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/base/regmap/internal.h b/drivers/base/regmap/internal.h
index 0da5865..8e94584 100644
--- a/drivers/base/regmap/internal.h
+++ b/drivers/base/regmap/internal.h
@@ -51,9 +51,11 @@ struct regmap_async {
 struct regmap {
union {
struct mutex mutex;
-   spinlock_t spinlock;
+   struct {
+   spinlock_t spinlock;
+   unsigned long spinlock_flags;
+   };
};
-   unsigned long spinlock_flags;
regmap_lock lock;
regmap_unlock unlock;
void *lock_arg; /* This is passed to lock/unlock functions */
-- 
2.1.3
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] discard task stack pages instead of pageout into swap partition

2014-12-11 Thread Wang, Yalin

Hi,

I am think of discard stack pages if the old
Page is under the stack pointer(Assume stack grow down)
of the task, This page don't need pageout, we can free it directly,
When the task need it again, we just use a zero page to
Map, it is safe for stack .

But I don't know how to implement it,
And is there some issue if do like this ?

The following is pseudo code to explain my ideas.
Any comments are appreciated !
Thanks
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcb4707..52e8314 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -962,6 +962,12 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
; /* try to reclaim the page below */
}
 
+   if (page_vm_flags(page) & (VM_GROWSUP | VM_GROWSDOWN) &&
+   PageAnon(page) && !PageSwapCache(page)) {
+   if (page_task_is_sleep(page) && task_sp > page->index) {
+   zap_page_range(vma, page->index, PAGE_SIZE);
+   }
+   }
/*
 * Anonymous process memory has backing store?
 * Try to allocate it some swap space here.
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [RFC] mm:fix zero_page huge_zero_page rss/pss statistic

2014-12-11 Thread Wang, Yalin

> -Original Message-
> From: Kirill A. Shutemov [mailto:kir...@shutemov.name]
> Sent: Wednesday, December 10, 2014 7:06 PM
> To: Wang, Yalin
> Cc: 'Andrew Morton'; 'Konstantin Khlebnikov'; 'linux-
> ker...@vger.kernel.org'; 'linux...@kvack.org'; 'linux-arm-
> ker...@lists.infradead.org'; 'n-horigu...@ah.jp.nec.com'; 'o...@redhat.com';
> 'gorcu...@openvz.org'; 'pfei...@google.com'
> Subject: Re: [RFC] mm:fix zero_page huge_zero_page rss/pss statistic
> 
> On Wed, Dec 10, 2014 at 03:22:21PM +0800, Wang, Yalin wrote:
> > smaps_pte_entry() doesn't ignore zero_huge_page, but it ignore
> > zero_page, because vm_normal_page() will ignore it. We remove
> > vm_normal_page() call, because walk_page_range() have ignore VM_PFNMAP
> > vma maps, it's safe to just use pfn_valid(), so that we can also
> > consider zero_page to be a valid page.
> 
> We fixed huge zero page accounting in smaps recentely. See mm tree.
> 
Hi 
I can't find the git, could you send me a link?
Thank you !
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] mm:fix zero_page huge_zero_page rss/pss statistic

2014-12-11 Thread Wang, Yalin

 -Original Message-
 From: Kirill A. Shutemov [mailto:kir...@shutemov.name]
 Sent: Wednesday, December 10, 2014 7:06 PM
 To: Wang, Yalin
 Cc: 'Andrew Morton'; 'Konstantin Khlebnikov'; 'linux-
 ker...@vger.kernel.org'; 'linux...@kvack.org'; 'linux-arm-
 ker...@lists.infradead.org'; 'n-horigu...@ah.jp.nec.com'; 'o...@redhat.com';
 'gorcu...@openvz.org'; 'pfei...@google.com'
 Subject: Re: [RFC] mm:fix zero_page huge_zero_page rss/pss statistic

 On Wed, Dec 10, 2014 at 03:22:21PM +0800, Wang, Yalin wrote:
  smaps_pte_entry() doesn't ignore zero_huge_page, but it ignore
  zero_page, because vm_normal_page() will ignore it. We remove
  vm_normal_page() call, because walk_page_range() have ignore VM_PFNMAP
  vma maps, it's safe to just use pfn_valid(), so that we can also
  consider zero_page to be a valid page.

 We fixed huge zero page accounting in smaps recentely. See mm tree.

Hi 
I can't find the git, could you send me a link?
Thank you !
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] discard task stack pages instead of pageout into swap partition

2014-12-11 Thread Wang, Yalin

Hi,

I am think of discard stack pages if the old
Page is under the stack pointer(Assume stack grow down)
of the task, This page don't need pageout, we can free it directly,
When the task need it again, we just use a zero page to
Map, it is safe for stack .

But I don't know how to implement it,
And is there some issue if do like this ?

The following is pseudo code to explain my ideas.
Any comments are appreciated !
Thanks
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcb4707..52e8314 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -962,6 +962,12 @@ static unsigned long shrink_page_list(struct list_head 
*page_list,
; /* try to reclaim the page below */
}
 
+   if (page_vm_flags(page)  (VM_GROWSUP | VM_GROWSDOWN) 
+   PageAnon(page)  !PageSwapCache(page)) {
+   if (page_task_is_sleep(page)  task_sp  page-index) {
+   zap_page_range(vma, page-index, PAGE_SIZE);
+   }
+   }
/*
 * Anonymous process memory has backing store?
 * Try to allocate it some swap space here.
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

1 2 3 4 >

1 - 100 of 358 matches

Mail list logo