Re: [PATCH] ARM:mm: define arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case

2015-09-09 Thread Sergey Dyasly

On 09.09.2015 6:10, Nicolas Pitre wrote:

On Tue, 8 Sep 2015, Sergey Dyasly wrote:


On 08.09.2015 5:45, Zhang Zhen wrote:

The arch-specific IOREMAP_MAX_ORDER is introduced in
commit: ff0daca([ARM] Add section support to ioremap) and
commit: a069c89 ([ARM] 3705/1: add supersection support to ioremap()).
But supersections and sections mappings are only used in !SMP && !LPAE case.
Otherwise, mapping is created using the usual 4K pages.

In most cases without !SMP && !LPAE, the big alignment cause high
fragmentation
issue in vmalloc area.
Here we use arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case,
otherwise use generic IOREMAP_MAX_ORDER in include/linux/vmalloc.h.

Signed-off-by: Zhang Zhen 
---
   arch/arm/include/asm/memory.h | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index b7f6fb4..3209012 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -76,10 +76,12 @@
*/
   #define XIP_VIRT_ADDR(physaddr)  (MODULES_VADDR + ((physaddr) &
0x000f))

+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
   /*
* Allow 16MB-aligned ioremap pages
*/
   #define IOREMAP_MAX_ORDER24
+#endif

   #else /* CONFIG_MMU */


Hmm... This looks exactly like my old patch -
http://thread.gmane.org/gmane.linux.kernel.mm/127620
(Thanks to Vladimir for pointing that out!)

Hasn't there been any progress about this issue?

IMHO the patch makes sense.

Did you send it to RMK's patch system?

Ok, submitted it to the patch system. Thanks!




Nicolas


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ARM:mm: define arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case

2015-09-09 Thread Sergey Dyasly

On 09.09.2015 6:10, Nicolas Pitre wrote:

On Tue, 8 Sep 2015, Sergey Dyasly wrote:


On 08.09.2015 5:45, Zhang Zhen wrote:

The arch-specific IOREMAP_MAX_ORDER is introduced in
commit: ff0daca([ARM] Add section support to ioremap) and
commit: a069c89 ([ARM] 3705/1: add supersection support to ioremap()).
But supersections and sections mappings are only used in !SMP && !LPAE case.
Otherwise, mapping is created using the usual 4K pages.

In most cases without !SMP && !LPAE, the big alignment cause high
fragmentation
issue in vmalloc area.
Here we use arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case,
otherwise use generic IOREMAP_MAX_ORDER in include/linux/vmalloc.h.

Signed-off-by: Zhang Zhen <zhenzhang.zh...@huawei.com>
---
   arch/arm/include/asm/memory.h | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index b7f6fb4..3209012 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -76,10 +76,12 @@
*/
   #define XIP_VIRT_ADDR(physaddr)  (MODULES_VADDR + ((physaddr) &
0x000f))

+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
   /*
* Allow 16MB-aligned ioremap pages
*/
   #define IOREMAP_MAX_ORDER24
+#endif

   #else /* CONFIG_MMU */


Hmm... This looks exactly like my old patch -
http://thread.gmane.org/gmane.linux.kernel.mm/127620
(Thanks to Vladimir for pointing that out!)

Hasn't there been any progress about this issue?

IMHO the patch makes sense.

Did you send it to RMK's patch system?

Ok, submitted it to the patch system. Thanks!




Nicolas


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ARM:mm: define arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case

2015-09-08 Thread Sergey Dyasly

On 08.09.2015 5:45, Zhang Zhen wrote:

The arch-specific IOREMAP_MAX_ORDER is introduced in
commit: ff0daca([ARM] Add section support to ioremap) and
commit: a069c89 ([ARM] 3705/1: add supersection support to ioremap()).
But supersections and sections mappings are only used in !SMP && !LPAE case.
Otherwise, mapping is created using the usual 4K pages.

In most cases without !SMP && !LPAE, the big alignment cause high fragmentation
issue in vmalloc area.
Here we use arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case,
otherwise use generic IOREMAP_MAX_ORDER in include/linux/vmalloc.h.

Signed-off-by: Zhang Zhen 
---
  arch/arm/include/asm/memory.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index b7f6fb4..3209012 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -76,10 +76,12 @@
   */
  #define XIP_VIRT_ADDR(physaddr)  (MODULES_VADDR + ((physaddr) & 0x000f))

+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
  /*
   * Allow 16MB-aligned ioremap pages
   */
  #define IOREMAP_MAX_ORDER 24
+#endif

  #else /* CONFIG_MMU */



Hmm... This looks exactly like my old patch -
http://thread.gmane.org/gmane.linux.kernel.mm/127620
(Thanks to Vladimir for pointing that out!)

Hasn't there been any progress about this issue?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ARM:mm: define arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case

2015-09-08 Thread Sergey Dyasly

On 08.09.2015 5:45, Zhang Zhen wrote:

The arch-specific IOREMAP_MAX_ORDER is introduced in
commit: ff0daca([ARM] Add section support to ioremap) and
commit: a069c89 ([ARM] 3705/1: add supersection support to ioremap()).
But supersections and sections mappings are only used in !SMP && !LPAE case.
Otherwise, mapping is created using the usual 4K pages.

In most cases without !SMP && !LPAE, the big alignment cause high fragmentation
issue in vmalloc area.
Here we use arch-specific IOREMAP_MAX_ORDER only in !SMP && !LPAE case,
otherwise use generic IOREMAP_MAX_ORDER in include/linux/vmalloc.h.

Signed-off-by: Zhang Zhen 
---
  arch/arm/include/asm/memory.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index b7f6fb4..3209012 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -76,10 +76,12 @@
   */
  #define XIP_VIRT_ADDR(physaddr)  (MODULES_VADDR + ((physaddr) & 0x000f))

+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
  /*
   * Allow 16MB-aligned ioremap pages
   */
  #define IOREMAP_MAX_ORDER 24
+#endif

  #else /* CONFIG_MMU */



Hmm... This looks exactly like my old patch -
http://thread.gmane.org/gmane.linux.kernel.mm/127620
(Thanks to Vladimir for pointing that out!)

Hasn't there been any progress about this issue?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ARM: use default ioremap alignment for SMP or LPAE

2015-01-21 Thread Sergey Dyasly
16MB alignment for ioremap mappings was added by commit a069c896d0d6 ("[ARM]
3705/1: add supersection support to ioremap()") in order to support supersection
mappings. But __arm_ioremap_pfn_caller uses section and supersection mappings
only in !SMP && !LPAE case. There is no need for such big alignment if either
SMP or LPAE is enabled.

After this change, ioremap will use default maximum alignment of 128 pages.

Cc: Russell King 
Cc: Guan Xuetao 
Cc: Nicolas Pitre 
Cc: James Bottomley 
Cc: Will Deacon 
Cc: Arnd Bergmann 
Cc: Catalin Marinas 
Cc: Andrew Morton 
Cc: Dmitry Safonov 
Link: 
https://lkml.kernel.org/g/1419328813-2211-1-git-send-email-d.safo...@partner.samsung.com
Signed-off-by: Sergey Dyasly 
---
 arch/arm/include/asm/memory.h |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 184def0..c3ef139 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -78,10 +78,12 @@
  */
 #define XIP_VIRT_ADDR(physaddr)  (MODULES_VADDR + ((physaddr) & 0x000f))
 
+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
 /*
  * Allow 16MB-aligned ioremap pages
  */
 #define IOREMAP_MAX_ORDER  24
+#endif
 
 #else /* CONFIG_MMU */
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] checkpatch: add check for the buggy while_each_thread()

2015-01-21 Thread Sergey Dyasly
Now it's preferable to use for_each_thread() instead of while_each_thread().
Add a check to checkpatch.pl in order to prevent any new usages of the buggy
while_each_thread() when possible.

Cc: Oleg Nesterov 
Cc: Andy Whitcroft 
Cc: Joe Perches 
Signed-off-by: Sergey Dyasly 
---
Changes since v1:
- Added "\s*" to the regular expression
- Limited commit id to 12 digits
- "Prefer to use" --> "Consider using"
- Moved check to the end of the file

 scripts/checkpatch.pl | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f0bb6d6..d48f7b2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5259,6 +5259,12 @@ sub process {
}
}
}
+
+# Check for the buggy while_each_thread()
+   if ($line =~ /\bwhile_each_thread\s*\(/) {
+   WARN("WHILE_EACH_THREAD",
+"Consider using for_each_thread() instead of the 
buggy while_each_thread(). See commit 0c740d0afc3b for details.\n" . $herecurr);
+   }
}
 
# If we have no input at all, then there is nothing to report on
-- 
2.2.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] checkpatch: add check for the buggy while_each_thread()

2015-01-21 Thread Sergey Dyasly
On Sun, 18 Jan 2015 17:27:33 -0800
Joe Perches  wrote:

> On Sun, 2015-01-18 at 17:24 +0300, Sergey Dyasly wrote:
> > Now it's preferable to use for_each_thread() instead of while_each_thread().
> > Add a check to checkpatch.pl in order to prevent any new usages of the buggy
> > while_each_thread() when possible.
> []
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -3255,6 +3255,12 @@ sub process {
> >  "Prefer dev_$level(... to dev_printk(KERN_$orig, 
> > ...\n" . $herecurr);
> > }
> >  
> > +# Check for the buggy while_each_thread()
> > +   if ($line =~ /\bwhile_each_thread\(/) {
> 
>   \bwhile_each_thread\s*\(
> or maybe
>   \bwhile_each_thread\b
> 
> > +   WARN("WHILE_EACH_THREAD",
> > +"Prefer to use for_each_thread() instead of the 
> > buggy while_each_thread(). See commit 
> > 0c740d0afc3bff0a097ad03a1c8df92757516f5c for details.\n" . $herecurr);
> 
> Pretty long commit id, 12 is probably enough.

"\bwhile_each_thread\s*\(" and 12-digit commit id sound reasonable, thanks.
I'll update the patch.

-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] checkpatch: add check for the buggy while_each_thread()

2015-01-21 Thread Sergey Dyasly
Now it's preferable to use for_each_thread() instead of while_each_thread().
Add a check to checkpatch.pl in order to prevent any new usages of the buggy
while_each_thread() when possible.

Cc: Oleg Nesterov o...@redhat.com
Cc: Andy Whitcroft a...@canonical.com
Cc: Joe Perches j...@perches.com
Signed-off-by: Sergey Dyasly dse...@gmail.com
---
Changes since v1:
- Added \s* to the regular expression
- Limited commit id to 12 digits
- Prefer to use -- Consider using
- Moved check to the end of the file

 scripts/checkpatch.pl | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f0bb6d6..d48f7b2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5259,6 +5259,12 @@ sub process {
}
}
}
+
+# Check for the buggy while_each_thread()
+   if ($line =~ /\bwhile_each_thread\s*\(/) {
+   WARN(WHILE_EACH_THREAD,
+Consider using for_each_thread() instead of the 
buggy while_each_thread(). See commit 0c740d0afc3b for details.\n . $herecurr);
+   }
}
 
# If we have no input at all, then there is nothing to report on
-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] checkpatch: add check for the buggy while_each_thread()

2015-01-21 Thread Sergey Dyasly
On Sun, 18 Jan 2015 17:27:33 -0800
Joe Perches j...@perches.com wrote:

 On Sun, 2015-01-18 at 17:24 +0300, Sergey Dyasly wrote:
  Now it's preferable to use for_each_thread() instead of while_each_thread().
  Add a check to checkpatch.pl in order to prevent any new usages of the buggy
  while_each_thread() when possible.
 []
  diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
 []
  @@ -3255,6 +3255,12 @@ sub process {
   Prefer dev_$level(... to dev_printk(KERN_$orig, 
  ...\n . $herecurr);
  }
   
  +# Check for the buggy while_each_thread()
  +   if ($line =~ /\bwhile_each_thread\(/) {
 
   \bwhile_each_thread\s*\(
 or maybe
   \bwhile_each_thread\b
 
  +   WARN(WHILE_EACH_THREAD,
  +Prefer to use for_each_thread() instead of the 
  buggy while_each_thread(). See commit 
  0c740d0afc3bff0a097ad03a1c8df92757516f5c for details.\n . $herecurr);
 
 Pretty long commit id, 12 is probably enough.

\bwhile_each_thread\s*\( and 12-digit commit id sound reasonable, thanks.
I'll update the patch.

-- 
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ARM: use default ioremap alignment for SMP or LPAE

2015-01-21 Thread Sergey Dyasly
16MB alignment for ioremap mappings was added by commit a069c896d0d6 ([ARM]
3705/1: add supersection support to ioremap()) in order to support supersection
mappings. But __arm_ioremap_pfn_caller uses section and supersection mappings
only in !SMP  !LPAE case. There is no need for such big alignment if either
SMP or LPAE is enabled.

After this change, ioremap will use default maximum alignment of 128 pages.

Cc: Russell King li...@arm.linux.org.uk
Cc: Guan Xuetao g...@mprc.pku.edu.cn
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: James Bottomley jbottom...@parallels.com
Cc: Will Deacon will.dea...@arm.com
Cc: Arnd Bergmann arnd.bergm...@linaro.org
Cc: Catalin Marinas catalin.mari...@arm.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Dmitry Safonov d.safo...@partner.samsung.com
Link: 
https://lkml.kernel.org/g/1419328813-2211-1-git-send-email-d.safo...@partner.samsung.com
Signed-off-by: Sergey Dyasly s.dya...@samsung.com
---
 arch/arm/include/asm/memory.h |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 184def0..c3ef139 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -78,10 +78,12 @@
  */
 #define XIP_VIRT_ADDR(physaddr)  (MODULES_VADDR + ((physaddr)  0x000f))
 
+#if !defined(CONFIG_SMP)  !defined(CONFIG_ARM_LPAE)
 /*
  * Allow 16MB-aligned ioremap pages
  */
 #define IOREMAP_MAX_ORDER  24
+#endif
 
 #else /* CONFIG_MMU */
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH RESEND] mm: vmalloc: remove ioremap align constraint

2015-01-20 Thread Sergey Dyasly
On Sun, 04 Jan 2015 17:38:06 +0100
Arnd Bergmann  wrote:

> On Saturday 03 January 2015 18:59:46 Sergey Dyasly wrote:
> > Hi Arnd,
> > 
> > First, some background information. We originally encountered high 
> > fragmentation
> > issue in vmalloc area:
> > 
> > 1. Total size of vmalloc area was 400 MB.
> > 2. 200 MB of vmalloc area was consumed by ioremaps of various sizes.
> > 3. Largest contiguous chunk of vmalloc area was 12 MB.
> > 4. ioremap of 10 MB failed due to 8 MB alignment requirement.
> 
> Interesting, can you describe how you end up with that many ioremap mappings?
> 200MB seems like a lot. Do you perhaps get a lot of duplicate entries for the
> same hardware registers, or maybe a leak?
> 
> Can you send the output of /proc/vmallocinfo?
>  
> > It was decided to further increase the size of vmalloc area to resolve the 
> > above
> > issue. And I don't like that solution because it decreases the amount of 
> > lowmem.
> 
> If all the mappings are in fact required, have you considered using
> CONFIG_VMSPLIT_2G split to avoid the use of highmem?
> 
> > Now let's see how ioremap uses supersections. Judging from current 
> > implementation
> > of __arm_ioremap_pfn_caller:
> > 
> > #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
> > if (pfn >= 0x10 && !((paddr | size | addr) & 
> > ~SUPERSECTION_MASK)) {
> > remap_area_supersections();
> > } else if (!((paddr | size | addr) & ~PMD_MASK)) {
> > remap_area_sections();
> > } else
> > #endif
> > err = ioremap_page_range();
> > 
> > supersections and sections mappings are used only in !SMP && !LPAE case.
> > Otherwise, mapping is created using the usual 4K pages (and we are using 
> > SMP).
> > The suggested patch removes alignment requirements for ioremap but it means 
> > that
> > sections will not be used in !SMP case. So another solution is required.
> > 
> > __get_vm_area_node has align parameter, maybe it can be used to specify the
> > required alignment of ioremap operation? Because I find current generic fls
> > algorithm to be very restrictive in cases when it's not necessary to use 
> > such
> > a big alignment.
> 
> I think using next-power-of-two alignment generally helps limit the effects of
> fragmentation the same way that the buddy allocator works.
> 
> Since the section and supersection maps are only used with non-SMP non-LPAE
> (why is that the case btw?),

vmap/vunmap mechanism works that way. ARM is using 2 levels of page tables:
PGD and PTE; and that provides the needed level of indirection. Every mm
contains a copy of init_mm's pgd mappings for kernel and they point to the same
set of PTEs. vmap/vunmap manipulates only with *pgd->pte and the change becomes
visible to every mm. This is impossible to do for sections because they use
PGD entries directly.

> it would however make sense to use the default
> (7 + PAGE_SHIFT) instead of the ARM-specific 24 here if one of them is set,
> I don't see any downsides to that.

This makes sense. I'll prepare a patch for that.

> 
>   Arnd


-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH RESEND] mm: vmalloc: remove ioremap align constraint

2015-01-20 Thread Sergey Dyasly
On Sun, 04 Jan 2015 17:38:06 +0100
Arnd Bergmann a...@arndb.de wrote:

 On Saturday 03 January 2015 18:59:46 Sergey Dyasly wrote:
  Hi Arnd,
  
  First, some background information. We originally encountered high 
  fragmentation
  issue in vmalloc area:
  
  1. Total size of vmalloc area was 400 MB.
  2. 200 MB of vmalloc area was consumed by ioremaps of various sizes.
  3. Largest contiguous chunk of vmalloc area was 12 MB.
  4. ioremap of 10 MB failed due to 8 MB alignment requirement.
 
 Interesting, can you describe how you end up with that many ioremap mappings?
 200MB seems like a lot. Do you perhaps get a lot of duplicate entries for the
 same hardware registers, or maybe a leak?
 
 Can you send the output of /proc/vmallocinfo?
  
  It was decided to further increase the size of vmalloc area to resolve the 
  above
  issue. And I don't like that solution because it decreases the amount of 
  lowmem.
 
 If all the mappings are in fact required, have you considered using
 CONFIG_VMSPLIT_2G split to avoid the use of highmem?
 
  Now let's see how ioremap uses supersections. Judging from current 
  implementation
  of __arm_ioremap_pfn_caller:
  
  #if !defined(CONFIG_SMP)  !defined(CONFIG_ARM_LPAE)
  if (pfn = 0x10  !((paddr | size | addr)  
  ~SUPERSECTION_MASK)) {
  remap_area_supersections();
  } else if (!((paddr | size | addr)  ~PMD_MASK)) {
  remap_area_sections();
  } else
  #endif
  err = ioremap_page_range();
  
  supersections and sections mappings are used only in !SMP  !LPAE case.
  Otherwise, mapping is created using the usual 4K pages (and we are using 
  SMP).
  The suggested patch removes alignment requirements for ioremap but it means 
  that
  sections will not be used in !SMP case. So another solution is required.
  
  __get_vm_area_node has align parameter, maybe it can be used to specify the
  required alignment of ioremap operation? Because I find current generic fls
  algorithm to be very restrictive in cases when it's not necessary to use 
  such
  a big alignment.
 
 I think using next-power-of-two alignment generally helps limit the effects of
 fragmentation the same way that the buddy allocator works.
 
 Since the section and supersection maps are only used with non-SMP non-LPAE
 (why is that the case btw?),

vmap/vunmap mechanism works that way. ARM is using 2 levels of page tables:
PGD and PTE; and that provides the needed level of indirection. Every mm
contains a copy of init_mm's pgd mappings for kernel and they point to the same
set of PTEs. vmap/vunmap manipulates only with *pgd-pte and the change becomes
visible to every mm. This is impossible to do for sections because they use
PGD entries directly.

 it would however make sense to use the default
 (7 + PAGE_SHIFT) instead of the ARM-specific 24 here if one of them is set,
 I don't see any downsides to that.

This makes sense. I'll prepare a patch for that.

 
   Arnd


-- 
Sergey Dyasly s.dya...@samsung.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] checkpatch: add check for the buggy while_each_thread()

2015-01-18 Thread Sergey Dyasly
Now it's preferable to use for_each_thread() instead of while_each_thread().
Add a check to checkpatch.pl in order to prevent any new usages of the buggy
while_each_thread() when possible.

Cc: Oleg Nesterov 
Cc: Andy Whitcroft 
Cc: Joe Perches 
Signed-off-by: Sergey Dyasly 
---
 scripts/checkpatch.pl | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f0bb6d6..0c5cc0b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3255,6 +3255,12 @@ sub process {
 "Prefer dev_$level(... to dev_printk(KERN_$orig, 
...\n" . $herecurr);
}
 
+# Check for the buggy while_each_thread()
+   if ($line =~ /\bwhile_each_thread\(/) {
+   WARN("WHILE_EACH_THREAD",
+"Prefer to use for_each_thread() instead of the 
buggy while_each_thread(). See commit 0c740d0afc3bff0a097ad03a1c8df92757516f5c 
for details.\n" . $herecurr);
+   }
+
 # function brace can't be on same line, except for #defines of do while,
 # or if closed on same line
if (($line=~/$Type\s*$Ident\(.*\).*\s*{/) and
-- 
2.2.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] checkpatch: add check for the buggy while_each_thread()

2015-01-18 Thread Sergey Dyasly
Now it's preferable to use for_each_thread() instead of while_each_thread().
Add a check to checkpatch.pl in order to prevent any new usages of the buggy
while_each_thread() when possible.

Cc: Oleg Nesterov o...@redhat.com
Cc: Andy Whitcroft a...@canonical.com
Cc: Joe Perches j...@perches.com
Signed-off-by: Sergey Dyasly dse...@gmail.com
---
 scripts/checkpatch.pl | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f0bb6d6..0c5cc0b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3255,6 +3255,12 @@ sub process {
 Prefer dev_$level(... to dev_printk(KERN_$orig, 
...\n . $herecurr);
}
 
+# Check for the buggy while_each_thread()
+   if ($line =~ /\bwhile_each_thread\(/) {
+   WARN(WHILE_EACH_THREAD,
+Prefer to use for_each_thread() instead of the 
buggy while_each_thread(). See commit 0c740d0afc3bff0a097ad03a1c8df92757516f5c 
for details.\n . $herecurr);
+   }
+
 # function brace can't be on same line, except for #defines of do while,
 # or if closed on same line
if (($line=~/$Type\s*$Ident\(.*\).*\s*{/) and
-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH RESEND] mm: vmalloc: remove ioremap align constraint

2015-01-03 Thread Sergey Dyasly
Hi Arnd,

First, some background information. We originally encountered high fragmentation
issue in vmalloc area:

1. Total size of vmalloc area was 400 MB.
2. 200 MB of vmalloc area was consumed by ioremaps of various sizes.
3. Largest contiguous chunk of vmalloc area was 12 MB.
4. ioremap of 10 MB failed due to 8 MB alignment requirement.

It was decided to further increase the size of vmalloc area to resolve the above
issue. And I don't like that solution because it decreases the amount of lowmem.

Now let's see how ioremap uses supersections. Judging from current 
implementation
of __arm_ioremap_pfn_caller:

#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
if (pfn >= 0x10 && !((paddr | size | addr) & 
~SUPERSECTION_MASK)) {
remap_area_supersections();
} else if (!((paddr | size | addr) & ~PMD_MASK)) {
remap_area_sections();
} else
#endif
err = ioremap_page_range();

supersections and sections mappings are used only in !SMP && !LPAE case.
Otherwise, mapping is created using the usual 4K pages (and we are using SMP).
The suggested patch removes alignment requirements for ioremap but it means that
sections will not be used in !SMP case. So another solution is required.

__get_vm_area_node has align parameter, maybe it can be used to specify the
required alignment of ioremap operation? Because I find current generic fls
algorithm to be very restrictive in cases when it's not necessary to use such
a big alignment.


On Tue, 23 Dec 2014 21:58:49 +0100
Arnd Bergmann  wrote:

> On Tuesday 23 December 2014 13:00:13 Dmitry Safonov wrote:
> > ioremap uses __get_vm_area_node which sets alignment to fls of requested 
> > size.
> > I couldn't find any reason for such big align. Does it decrease TLB misses?
> > I tested it on custom ARM board with 200+ Mb of ioremap and it works.
> > What am I missing?
> 
> The alignment was originally introduced in this commit:
> 
> commit ff0daca525dde796382b9ccd563f169df2571211
> Author: Russell King 
> Date:   Thu Jun 29 20:17:15 2006 +0100
> 
> [ARM] Add section support to ioremap
> 
> Allow section mappings to be setup using ioremap() and torn down
> with iounmap().  This requires additional support in the MM
> context switch to ensure that mappings are properly synchronised
> when mapped in.
> 
> Based an original implementation by Deepak Saxena, reworked and
> ARMv6 support added by rmk.
> 
> Signed-off-by: Russell King 
> 
> and then later extended to 16MB supersection mappings, which indeed
> is used to reduce TLB pressure.
> 
> I don't see any downsides to it, why change it?
> 
>   Arnd

-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH RESEND] mm: vmalloc: remove ioremap align constraint

2015-01-03 Thread Sergey Dyasly
Hi Arnd,

First, some background information. We originally encountered high fragmentation
issue in vmalloc area:

1. Total size of vmalloc area was 400 MB.
2. 200 MB of vmalloc area was consumed by ioremaps of various sizes.
3. Largest contiguous chunk of vmalloc area was 12 MB.
4. ioremap of 10 MB failed due to 8 MB alignment requirement.

It was decided to further increase the size of vmalloc area to resolve the above
issue. And I don't like that solution because it decreases the amount of lowmem.

Now let's see how ioremap uses supersections. Judging from current 
implementation
of __arm_ioremap_pfn_caller:

#if !defined(CONFIG_SMP)  !defined(CONFIG_ARM_LPAE)
if (pfn = 0x10  !((paddr | size | addr)  
~SUPERSECTION_MASK)) {
remap_area_supersections();
} else if (!((paddr | size | addr)  ~PMD_MASK)) {
remap_area_sections();
} else
#endif
err = ioremap_page_range();

supersections and sections mappings are used only in !SMP  !LPAE case.
Otherwise, mapping is created using the usual 4K pages (and we are using SMP).
The suggested patch removes alignment requirements for ioremap but it means that
sections will not be used in !SMP case. So another solution is required.

__get_vm_area_node has align parameter, maybe it can be used to specify the
required alignment of ioremap operation? Because I find current generic fls
algorithm to be very restrictive in cases when it's not necessary to use such
a big alignment.


On Tue, 23 Dec 2014 21:58:49 +0100
Arnd Bergmann a...@arndb.de wrote:

 On Tuesday 23 December 2014 13:00:13 Dmitry Safonov wrote:
  ioremap uses __get_vm_area_node which sets alignment to fls of requested 
  size.
  I couldn't find any reason for such big align. Does it decrease TLB misses?
  I tested it on custom ARM board with 200+ Mb of ioremap and it works.
  What am I missing?
 
 The alignment was originally introduced in this commit:
 
 commit ff0daca525dde796382b9ccd563f169df2571211
 Author: Russell King r...@dyn-67.arm.linux.org.uk
 Date:   Thu Jun 29 20:17:15 2006 +0100
 
 [ARM] Add section support to ioremap
 
 Allow section mappings to be setup using ioremap() and torn down
 with iounmap().  This requires additional support in the MM
 context switch to ensure that mappings are properly synchronised
 when mapped in.
 
 Based an original implementation by Deepak Saxena, reworked and
 ARMv6 support added by rmk.
 
 Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk
 
 and then later extended to 16MB supersection mappings, which indeed
 is used to reduce TLB pressure.
 
 I don't see any downsides to it, why change it?
 
   Arnd

-- 
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] initial while_each_thread() fixes

2013-12-03 Thread Sergey Dyasly
Hi Oleg,

I was waiting for this one!

On Mon, 2 Dec 2013 16:24:23 +0100
Oleg Nesterov  wrote:

> Hello.
> 
> This was reported several times, I believe the first report is
> http://marc.info/?l=linux-kernel=127688978121665. Hmm, 3 years
> ago. The lockless while_each_thread() is racy and broken, almost
> every user can loop forever.
> 
> Recently people started to report they actually hit this problem in
> oom_kill.c. This doesn't really matter and I can be wrong, but in
> fact I do not think they really hit this race, it is very unlikely.

The race is very easy to catch if you have a process with several threads,
all of which allocates memory simultaneously. This leads to:

  1) OOMk selects and sends SIGKILL to one of the threads

  2) another thread invokes OOMk and the first thread gets selected,
 but it gets unhashed before while_each_thread...

> Another problem with while_each_thread() is that it is very easy
> to use it wrongly, and oom_kill.c is the good example.
> 
> I came to conclusion that it is practically impossible to send a
> single series which fixes all problems, too many different users.
> 
> So 1/2 adds the new for_each_thread() interface, and 2/2 fixes oom
> kill as an example.
> 
> We obviously need a lot more changes like 2/2 before we can kill
> while_each_thread() and task_struct->thread_group, but I hope they
> will be straighforward. And in fact I hope that task->thread_group
> can go away before we change all users of while_each_thread().
> 
> David, et al, I din't actually test 2/2, I do not know how. Please
> review, although it looks simple.

The patches look correct and my test case no longer hangs, so

Reviewed-and-Tested-by: Sergey Dyasly 

> 
> Oleg.
> 
>  include/linux/init_task.h |2 ++
>  include/linux/sched.h |   12 
>  kernel/exit.c |1 +
>  kernel/fork.c |7 +++
>  mm/oom_kill.c |   37 ++++-
>  5 files changed, 42 insertions(+), 17 deletions(-)
> 

-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] initial while_each_thread() fixes

2013-12-03 Thread Sergey Dyasly
Hi Oleg,

I was waiting for this one!

On Mon, 2 Dec 2013 16:24:23 +0100
Oleg Nesterov o...@redhat.com wrote:

 Hello.
 
 This was reported several times, I believe the first report is
 http://marc.info/?l=linux-kernelm=127688978121665. Hmm, 3 years
 ago. The lockless while_each_thread() is racy and broken, almost
 every user can loop forever.
 
 Recently people started to report they actually hit this problem in
 oom_kill.c. This doesn't really matter and I can be wrong, but in
 fact I do not think they really hit this race, it is very unlikely.

The race is very easy to catch if you have a process with several threads,
all of which allocates memory simultaneously. This leads to:

  1) OOMk selects and sends SIGKILL to one of the threads

  2) another thread invokes OOMk and the first thread gets selected,
 but it gets unhashed before while_each_thread...

 Another problem with while_each_thread() is that it is very easy
 to use it wrongly, and oom_kill.c is the good example.
 
 I came to conclusion that it is practically impossible to send a
 single series which fixes all problems, too many different users.
 
 So 1/2 adds the new for_each_thread() interface, and 2/2 fixes oom
 kill as an example.
 
 We obviously need a lot more changes like 2/2 before we can kill
 while_each_thread() and task_struct-thread_group, but I hope they
 will be straighforward. And in fact I hope that task-thread_group
 can go away before we change all users of while_each_thread().
 
 David, et al, I din't actually test 2/2, I do not know how. Please
 review, although it looks simple.

The patches look correct and my test case no longer hangs, so

Reviewed-and-Tested-by: Sergey Dyasly dse...@gmail.com

 
 Oleg.
 
  include/linux/init_task.h |2 ++
  include/linux/sched.h |   12 
  kernel/exit.c |1 +
  kernel/fork.c |7 +++
  mm/oom_kill.c |   37 -
  5 files changed, 42 insertions(+), 17 deletions(-)
 

-- 
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-10-01 Thread Sergey Dyasly
It seems to me that we are going nowhere with this discussion...

If you are ok with the first change in my patch regarding fatal_signal_pending,
I can send new patch with just that change.


On Mon, 30 Sep 2013 15:08:25 -0700 (PDT)
David Rientjes  wrote:

> On Fri, 27 Sep 2013, Sergey Dyasly wrote:
> 
> > What you are saying contradicts current OOMk code the way I read it. 
> > Comment in
> > oom_kill_process() says:
> > 
> > "If the task is already exiting ... set TIF_MEMDIE so it can die quickly"
> > 
> > I just want to know the right solution.
> > 
> 
> That's a comment, not code.  The point of the PF_EXITING special handling 
> in oom_kill_process() is to avoid telling sysadmins that a process has 
> been killed to free memory when it has already called exit() and to avoid 
> sacrificing one of its children for the exiting process.
> 
> It may or may not need access to memory reserves to actually exit after 
> PF_EXITING depending on whether it needs to allocate memory for 
> coredumping or anything else.  So instead of waiting for it to recall the 
> oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
> processes can already get TIF_MEMDIE immediately when their memory 
> allocation fails so there's no reason not to set it now as an 
> optimization.
> 
> But we definitely want to avoid printing anything to the kernel log when 
> the process has already called exit() and issuing the SIGKILL at that 
> point would be pointless.
> 
> > You are mistaken, oom_kill_process() is only called from out_of_memory()
> > and mem_cgroup_out_of_memory().
> > 
> 
> out_of_memory() calls oom_kill_process() in two places, plus the call from 
> mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
> matters in the slightest, though.
> 
> > > Read the comment about why we don't emit anything to the kernel log in 
> > > this case; the process is already exiting, there's no need to kill it or 
> > > make anyone believe that it was killed.
> > 
> > Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
> > and in this case oom_kill_process() won't be even called. That's why it's
> > redundant.
> > 
> 
> You apparently have no idea how long select_bad_process() runs on a large 
> system with thousands of processes.  Keep in mind that SGI requested the 
> addition of the oom_kill_allocating_task sysctl specifically because of 
> how long select_bad_process() runs.  The PF_EXITING check in 
> oom_kill_process() is simply an optimization to return early and with 
> access to memory reserves so it can exit as quickly as possible and 
> without the kernel stating it's killing something that has already called 
> exit().


-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-10-01 Thread Sergey Dyasly
It seems to me that we are going nowhere with this discussion...

If you are ok with the first change in my patch regarding fatal_signal_pending,
I can send new patch with just that change.


On Mon, 30 Sep 2013 15:08:25 -0700 (PDT)
David Rientjes rient...@google.com wrote:

 On Fri, 27 Sep 2013, Sergey Dyasly wrote:
 
  What you are saying contradicts current OOMk code the way I read it. 
  Comment in
  oom_kill_process() says:
  
  If the task is already exiting ... set TIF_MEMDIE so it can die quickly
  
  I just want to know the right solution.
  
 
 That's a comment, not code.  The point of the PF_EXITING special handling 
 in oom_kill_process() is to avoid telling sysadmins that a process has 
 been killed to free memory when it has already called exit() and to avoid 
 sacrificing one of its children for the exiting process.
 
 It may or may not need access to memory reserves to actually exit after 
 PF_EXITING depending on whether it needs to allocate memory for 
 coredumping or anything else.  So instead of waiting for it to recall the 
 oom killer, TIF_MEMDIE is set anyway.  The point is that PF_EXITING 
 processes can already get TIF_MEMDIE immediately when their memory 
 allocation fails so there's no reason not to set it now as an 
 optimization.
 
 But we definitely want to avoid printing anything to the kernel log when 
 the process has already called exit() and issuing the SIGKILL at that 
 point would be pointless.
 
  You are mistaken, oom_kill_process() is only called from out_of_memory()
  and mem_cgroup_out_of_memory().
  
 
 out_of_memory() calls oom_kill_process() in two places, plus the call from 
 mem_cgroup_out_of_memory(), making three calls in the tree.  Not that this 
 matters in the slightest, though.
 
   Read the comment about why we don't emit anything to the kernel log in 
   this case; the process is already exiting, there's no need to kill it or 
   make anyone believe that it was killed.
  
  Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
  and in this case oom_kill_process() won't be even called. That's why it's
  redundant.
  
 
 You apparently have no idea how long select_bad_process() runs on a large 
 system with thousands of processes.  Keep in mind that SGI requested the 
 addition of the oom_kill_allocating_task sysctl specifically because of 
 how long select_bad_process() runs.  The PF_EXITING check in 
 oom_kill_process() is simply an optimization to return early and with 
 access to memory reserves so it can exit as quickly as possible and 
 without the kernel stating it's killing something that has already called 
 exit().


-- 
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-27 Thread Sergey Dyasly
On Wed, 25 Sep 2013 13:31:32 -0700 (PDT)
David Rientjes  wrote:

> On Wed, 11 Sep 2013, Sergey Dyasly wrote:
> 
> > > > /*
> > > >  * If this task is not being ptraced on exit, then wait 
> > > > for it
> > > >  * to finish before killing some other task 
> > > > unnecessarily.
> > > >  */
> > > > -   if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > > +   if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > > +   set_tsk_thread_flag(task, TIF_MEMDIE);
> > > 
> > > This does not, we do not give access to memory reserves unless the 
> > > process 
> > > needs it to allocate memory.  The task here, which is not current, can 
> > > call into the oom killer and be granted memory reserves if necessary.
> > 
> > True. However, why TIF_MEMDIE is set for PF_EXITING task in 
> > oom_kill_process()
> > then?
> 
> If current needs access to memory reserves while PF_EXITING, it should 
> call the page allocator, find that it is out of memory, and call the oom 
> killer to silently be granted memory reserves.

I understand this and you are repeating yourself :)
What you are saying contradicts current OOMk code the way I read it. Comment in
oom_kill_process() says:

"If the task is already exiting ... set TIF_MEMDIE so it can die quickly"

I just want to know the right solution.

> > > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> > > > gfp_mask, int order,
> > > > static DEFINE_RATELIMIT_STATE(oom_rs, 
> > > > DEFAULT_RATELIMIT_INTERVAL,
> > > >   DEFAULT_RATELIMIT_BURST);
> > > >  
> > > > -   /*
> > > > -* If the task is already exiting, don't alarm the sysadmin or 
> > > > kill
> > > > -* its children or threads, just set TIF_MEMDIE so it can die 
> > > > quickly
> > > > -*/
> > > > -   if (p->flags & PF_EXITING) {
> > > > -   set_tsk_thread_flag(p, TIF_MEMDIE);
> > > > -   put_task_struct(p);
> > > > -   return;
> > > > -   }
> > > 
> > > I think you misunderstood the point of this; if a selected process is 
> > > already in the exit path then this is simply avoiding dumping oom kill 
> > > lines to the kernel log.  We want to keep doing that.
> > 
> > This happens in oom_kill_process() after victim has been selected by
> > select_bad_process(). But there is already PF_EXITING check in
> > oom_scan_process_thread() and in this case OOM code won't call 
> > oom_kill_process.
> 
> select_bad_process() is one of three callers to oom_kill_process().

You are mistaken, oom_kill_process() is only called from out_of_memory()
and mem_cgroup_out_of_memory().

> > The only difference is in force_kill flag, and the only case where it's set
> > is SysRq. And I think in this case OOM killer messages are a good thing to 
> > have
> > even when victim is already exiting, instead of just silence.
> > 
> 
> Read the comment about why we don't emit anything to the kernel log in 
> this case; the process is already exiting, there's no need to kill it or 
> make anyone believe that it was killed.

Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
and in this case oom_kill_process() won't be even called. That's why it's
redundant.

--
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-27 Thread Sergey Dyasly
On Wed, 25 Sep 2013 13:31:32 -0700 (PDT)
David Rientjes rient...@google.com wrote:

 On Wed, 11 Sep 2013, Sergey Dyasly wrote:
 
/*
 * If this task is not being ptraced on exit, then wait 
for it
 * to finish before killing some other task 
unnecessarily.
 */
-   if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
+   if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
+   set_tsk_thread_flag(task, TIF_MEMDIE);
   
   This does not, we do not give access to memory reserves unless the 
   process 
   needs it to allocate memory.  The task here, which is not current, can 
   call into the oom killer and be granted memory reserves if necessary.
  
  True. However, why TIF_MEMDIE is set for PF_EXITING task in 
  oom_kill_process()
  then?
 
 If current needs access to memory reserves while PF_EXITING, it should 
 call the page allocator, find that it is out of memory, and call the oom 
 killer to silently be granted memory reserves.

I understand this and you are repeating yourself :)
What you are saying contradicts current OOMk code the way I read it. Comment in
oom_kill_process() says:

If the task is already exiting ... set TIF_MEMDIE so it can die quickly

I just want to know the right solution.

@@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
gfp_mask, int order,
static DEFINE_RATELIMIT_STATE(oom_rs, 
DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /*
-* If the task is already exiting, don't alarm the sysadmin or 
kill
-* its children or threads, just set TIF_MEMDIE so it can die 
quickly
-*/
-   if (p-flags  PF_EXITING) {
-   set_tsk_thread_flag(p, TIF_MEMDIE);
-   put_task_struct(p);
-   return;
-   }
   
   I think you misunderstood the point of this; if a selected process is 
   already in the exit path then this is simply avoiding dumping oom kill 
   lines to the kernel log.  We want to keep doing that.
  
  This happens in oom_kill_process() after victim has been selected by
  select_bad_process(). But there is already PF_EXITING check in
  oom_scan_process_thread() and in this case OOM code won't call 
  oom_kill_process.
 
 select_bad_process() is one of three callers to oom_kill_process().

You are mistaken, oom_kill_process() is only called from out_of_memory()
and mem_cgroup_out_of_memory().

  The only difference is in force_kill flag, and the only case where it's set
  is SysRq. And I think in this case OOM killer messages are a good thing to 
  have
  even when victim is already exiting, instead of just silence.
  
 
 Read the comment about why we don't emit anything to the kernel log in 
 this case; the process is already exiting, there's no need to kill it or 
 make anyone believe that it was killed.

Yes, but there is already the PF_EXITING check in oom_scan_process_thread(),
and in this case oom_kill_process() won't be even called. That's why it's
redundant.

--
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-19 Thread Sergey Dyasly
Ping :)

On Wed, 11 Sep 2013 19:06:05 +0400
Sergey Dyasly  wrote:

> On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
> David Rientjes  wrote:
> 
> > >   /*
> > >* If this task is not being ptraced on exit, then wait for it
> > >* to finish before killing some other task unnecessarily.
> > >*/
> > > - if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > > + if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > > + set_tsk_thread_flag(task, TIF_MEMDIE);
> > 
> > This does not, we do not give access to memory reserves unless the process 
> > needs it to allocate memory.  The task here, which is not current, can 
> > call into the oom killer and be granted memory reserves if necessary.
> 
> True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
> then?
> Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation 
> should
> be fast if exiting task needs it.
> 
> > > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> > > gfp_mask, int order,
> > >   static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > DEFAULT_RATELIMIT_BURST);
> > >  
> > > - /*
> > > -  * If the task is already exiting, don't alarm the sysadmin or kill
> > > -  * its children or threads, just set TIF_MEMDIE so it can die quickly
> > > -  */
> > > - if (p->flags & PF_EXITING) {
> > > - set_tsk_thread_flag(p, TIF_MEMDIE);
> > > - put_task_struct(p);
> > > - return;
> > > - }
> > 
> > I think you misunderstood the point of this; if a selected process is 
> > already in the exit path then this is simply avoiding dumping oom kill 
> > lines to the kernel log.  We want to keep doing that.
> 
> This happens in oom_kill_process() after victim has been selected by
> select_bad_process(). But there is already PF_EXITING check in
> oom_scan_process_thread() and in this case OOM code won't call 
> oom_kill_process.
> There is only a slight chance that victim will become PF_EXITING between
> scan and kill.
> 
> The only difference is in force_kill flag, and the only case where it's set
> is SysRq. And I think in this case OOM killer messages are a good thing to 
> have
> even when victim is already exiting, instead of just silence.
> 
> -- 
> Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-19 Thread Sergey Dyasly
Ping :)

On Wed, 11 Sep 2013 19:06:05 +0400
Sergey Dyasly dse...@gmail.com wrote:

 On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
 David Rientjes rient...@google.com wrote:
 
 /*
  * If this task is not being ptraced on exit, then wait for it
  * to finish before killing some other task unnecessarily.
  */
   - if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
   + if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
   + set_tsk_thread_flag(task, TIF_MEMDIE);
  
  This does not, we do not give access to memory reserves unless the process 
  needs it to allocate memory.  The task here, which is not current, can 
  call into the oom killer and be granted memory reserves if necessary.
 
 True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
 then?
 Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation 
 should
 be fast if exiting task needs it.
 
   @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
   gfp_mask, int order,
 static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
   DEFAULT_RATELIMIT_BURST);

   - /*
   -  * If the task is already exiting, don't alarm the sysadmin or kill
   -  * its children or threads, just set TIF_MEMDIE so it can die quickly
   -  */
   - if (p-flags  PF_EXITING) {
   - set_tsk_thread_flag(p, TIF_MEMDIE);
   - put_task_struct(p);
   - return;
   - }
  
  I think you misunderstood the point of this; if a selected process is 
  already in the exit path then this is simply avoiding dumping oom kill 
  lines to the kernel log.  We want to keep doing that.
 
 This happens in oom_kill_process() after victim has been selected by
 select_bad_process(). But there is already PF_EXITING check in
 oom_scan_process_thread() and in this case OOM code won't call 
 oom_kill_process.
 There is only a slight chance that victim will become PF_EXITING between
 scan and kill.
 
 The only difference is in force_kill flag, and the only case where it's set
 is SysRq. And I think in this case OOM killer messages are a good thing to 
 have
 even when victim is already exiting, instead of just silence.
 
 -- 
 Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-11 Thread Sergey Dyasly
On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
David Rientjes  wrote:

> > /*
> >  * If this task is not being ptraced on exit, then wait for it
> >  * to finish before killing some other task unnecessarily.
> >  */
> > -   if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
> > +   if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
> > +   set_tsk_thread_flag(task, TIF_MEMDIE);
> 
> This does not, we do not give access to memory reserves unless the process 
> needs it to allocate memory.  The task here, which is not current, can 
> call into the oom killer and be granted memory reserves if necessary.

True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
then?
Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation should
be fast if exiting task needs it.

> > @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
> > gfp_mask, int order,
> > static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
> >   DEFAULT_RATELIMIT_BURST);
> >  
> > -   /*
> > -* If the task is already exiting, don't alarm the sysadmin or kill
> > -* its children or threads, just set TIF_MEMDIE so it can die quickly
> > -*/
> > -   if (p->flags & PF_EXITING) {
> > -   set_tsk_thread_flag(p, TIF_MEMDIE);
> > -   put_task_struct(p);
> > -   return;
> > -   }
> 
> I think you misunderstood the point of this; if a selected process is 
> already in the exit path then this is simply avoiding dumping oom kill 
> lines to the kernel log.  We want to keep doing that.

This happens in oom_kill_process() after victim has been selected by
select_bad_process(). But there is already PF_EXITING check in
oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.
There is only a slight chance that victim will become PF_EXITING between
scan and kill.

The only difference is in force_kill flag, and the only case where it's set
is SysRq. And I think in this case OOM killer messages are a good thing to have
even when victim is already exiting, instead of just silence.

-- 
Sergey Dyasly 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-11 Thread Sergey Dyasly
On Mon, 9 Sep 2013 13:07:08 -0700 (PDT)
David Rientjes rient...@google.com wrote:

  /*
   * If this task is not being ptraced on exit, then wait for it
   * to finish before killing some other task unnecessarily.
   */
  -   if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
  +   if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
  +   set_tsk_thread_flag(task, TIF_MEMDIE);
 
 This does not, we do not give access to memory reserves unless the process 
 needs it to allocate memory.  The task here, which is not current, can 
 call into the oom killer and be granted memory reserves if necessary.

True. However, why TIF_MEMDIE is set for PF_EXITING task in oom_kill_process()
then?
Also, setting TIF_MEMDIE will avoid direct reclaim and memory allocation should
be fast if exiting task needs it.

  @@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
  gfp_mask, int order,
  static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
   
  -   /*
  -* If the task is already exiting, don't alarm the sysadmin or kill
  -* its children or threads, just set TIF_MEMDIE so it can die quickly
  -*/
  -   if (p-flags  PF_EXITING) {
  -   set_tsk_thread_flag(p, TIF_MEMDIE);
  -   put_task_struct(p);
  -   return;
  -   }
 
 I think you misunderstood the point of this; if a selected process is 
 already in the exit path then this is simply avoiding dumping oom kill 
 lines to the kernel log.  We want to keep doing that.

This happens in oom_kill_process() after victim has been selected by
select_bad_process(). But there is already PF_EXITING check in
oom_scan_process_thread() and in this case OOM code won't call oom_kill_process.
There is only a slight chance that victim will become PF_EXITING between
scan and kill.

The only difference is in force_kill flag, and the only case where it's set
is SysRq. And I think in this case OOM killer messages are a good thing to have
even when victim is already exiting, instead of just silence.

-- 
Sergey Dyasly dse...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread Sergey Dyasly
If OOM killer finds a task which is about to exit or is already doing so,
there is no need to kill anyone. It should just wait until task dies.

Add missing fatal_signal_pending() check and allow selected task to use memory
reserves in order to exit quickly.

Also remove redundant PF_EXITING check after victim has been selected.

Signed-off-by: Sergey Dyasly 
---
 mm/oom_kill.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 98e75f2..ef83b81 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
task_struct *task,
if (oom_task_origin(task))
return OOM_SCAN_SELECT;
 
-   if (task->flags & PF_EXITING && !force_kill) {
+   if ((task->flags & PF_EXITING || fatal_signal_pending(task)) &&
+   !force_kill) {
/*
 * If this task is not being ptraced on exit, then wait for it
 * to finish before killing some other task unnecessarily.
 */
-   if (!(task->group_leader->ptrace & PT_TRACE_EXIT))
+   if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) {
+   set_tsk_thread_flag(task, TIF_MEMDIE);
return OOM_SCAN_ABORT;
+   }
}
return OOM_SCAN_OK;
 }
@@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
gfp_mask, int order,
static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /*
-* If the task is already exiting, don't alarm the sysadmin or kill
-* its children or threads, just set TIF_MEMDIE so it can die quickly
-*/
-   if (p->flags & PF_EXITING) {
-   set_tsk_thread_flag(p, TIF_MEMDIE);
-   put_task_struct(p);
-   return;
-   }
-
if (__ratelimit(_rs))
dump_header(p, gfp_mask, order, memcg, nodemask);
 
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] OOM killer: wait for tasks with pending SIGKILL to exit

2013-09-09 Thread Sergey Dyasly
If OOM killer finds a task which is about to exit or is already doing so,
there is no need to kill anyone. It should just wait until task dies.

Add missing fatal_signal_pending() check and allow selected task to use memory
reserves in order to exit quickly.

Also remove redundant PF_EXITING check after victim has been selected.

Signed-off-by: Sergey Dyasly dse...@gmail.com
---
 mm/oom_kill.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 98e75f2..ef83b81 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -275,13 +275,16 @@ enum oom_scan_t oom_scan_process_thread(struct 
task_struct *task,
if (oom_task_origin(task))
return OOM_SCAN_SELECT;
 
-   if (task-flags  PF_EXITING  !force_kill) {
+   if ((task-flags  PF_EXITING || fatal_signal_pending(task)) 
+   !force_kill) {
/*
 * If this task is not being ptraced on exit, then wait for it
 * to finish before killing some other task unnecessarily.
 */
-   if (!(task-group_leader-ptrace  PT_TRACE_EXIT))
+   if (!(task-group_leader-ptrace  PT_TRACE_EXIT)) {
+   set_tsk_thread_flag(task, TIF_MEMDIE);
return OOM_SCAN_ABORT;
+   }
}
return OOM_SCAN_OK;
 }
@@ -412,16 +415,6 @@ void oom_kill_process(struct task_struct *p, gfp_t 
gfp_mask, int order,
static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
 
-   /*
-* If the task is already exiting, don't alarm the sysadmin or kill
-* its children or threads, just set TIF_MEMDIE so it can die quickly
-*/
-   if (p-flags  PF_EXITING) {
-   set_tsk_thread_flag(p, TIF_MEMDIE);
-   put_task_struct(p);
-   return;
-   }
-
if (__ratelimit(oom_rs))
dump_header(p, gfp_mask, order, memcg, nodemask);
 
-- 
1.8.1.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/