Re: [PATCH] page_alloc: skip cpuset enforcement for lower zone allocations (v3)

2014-05-29 Thread Marcelo Tosatti

On Wed, May 28, 2014 at 06:45:04PM -0500, Christoph Lameter wrote:

<snip>

Much cleaner, sent v4 with your suggestions.

> Why call __alloc_pages_nodemask at all if you want to skip the node
> handling? Punt to alloc_pages()

- __alloc_pages_nodemask ignored GFP_DMA32 on older kernels, so the
interface should remain functional.
- There are other callers of alloc_pages(GFP_DMA) that can suffer
from the same problem.
- Mirrors mempolicy behaviour.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] page_alloc: skip cpuset enforcement for lower zone allocations (v3)

2014-05-28 Thread Christoph Lameter
On Wed, 28 May 2014, Marcelo Tosatti wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..dfea3dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2698,6 +2698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  	unsigned int cpuset_mems_cookie;
>  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
>  	struct mem_cgroup *memcg = NULL;
> +	nodemask_t *cpuset_mems_allowed = &cpuset_current_mems_allowed;

Why do you need this one?

>   gfp_mask &= gfp_allowed_mask;
>
> @@ -2726,9 +2727,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  retry_cpuset:
>  	cpuset_mems_cookie = read_mems_allowed_begin();
>
> +#ifdef CONFIG_NUMA
> +	if (gfp_zone(gfp_mask) < policy_zone)
> +		cpuset_mems_allowed = NULL;

nodemask = &node_states[N_ONLINE];

> +#endif
> +
>  	/* The preferred zone is used for statistics later */
>  	first_zones_zonelist(zonelist, high_zoneidx,
> -				nodemask ? : &cpuset_current_mems_allowed,
> +				nodemask ? : cpuset_mems_allowed,

Skip this?

>  				&preferred_zone);
>  	if (!preferred_zone)
>  		goto out;
>

Why call __alloc_pages_nodemask at all if you want to skip the node
handling? Punt to alloc_pages()


[PATCH] page_alloc: skip cpuset enforcement for lower zone allocations (v3)

2014-05-28 Thread Marcelo Tosatti

Zone-specific allocations, such as GFP_DMA32, should not be restricted
to the cpuset's allowed node list: the zones that such allocations
require may exist only on nodes outside that list.

Necessary for the following use case:
- a driver requires zone-specific memory (such as KVM, which
requires the root page table at paddr < 4GB).
- the user wants to limit the application's allocations to nodeX, and
nodeX has no memory < 4GB.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3d54c41..3bbc23f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2374,6 +2374,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
  * variable 'wait' is not set, and the bit ALLOC_CPUSET is not set
  * in alloc_flags.  That logic and the checks below have the combined
  * affect that:
+ * gfp_zone(mask) < policy_zone - any node ok
  * in_interrupt - any node ok (current task context irrelevant)
  * GFP_ATOMIC   - any node ok
  * TIF_MEMDIE   - any node ok
@@ -2392,6 +2393,10 @@ int __cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)
 
 	if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
 		return 1;
+#ifdef CONFIG_NUMA
+	if (gfp_zone(gfp_mask) < policy_zone)
+		return 1;
+#endif
 	might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
 	if (node_isset(node, current->mems_allowed))
 		return 1;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..dfea3dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2698,6 +2698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	unsigned int cpuset_mems_cookie;
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
 	struct mem_cgroup *memcg = NULL;
+	nodemask_t *cpuset_mems_allowed = &cpuset_current_mems_allowed;
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2726,9 +2727,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
+#ifdef CONFIG_NUMA
+	if (gfp_zone(gfp_mask) < policy_zone)
+		cpuset_mems_allowed = NULL;
+#endif
+
 	/* The preferred zone is used for statistics later */
 	first_zones_zonelist(zonelist, high_zoneidx,
-				nodemask ? : &cpuset_current_mems_allowed,
+				nodemask ? : cpuset_mems_allowed,
 				&preferred_zone);
 	if (!preferred_zone)
 		goto out;
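
The rule this patch adds to __cpuset_node_allowed_softwall() can be
modelled in user space. What follows is a minimal sketch, not kernel
code: every toy_* name, the three-zone ordering, the choice of
ZONE_NORMAL as the policy zone, and the bit-mask representation of
mems_allowed are assumptions made for illustration only. The real
gfp_zone() and policy_zone depend on the kernel configuration and the
machine's memory layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone ordering, lowest first (stand-in for enum zone_type). */
enum toy_zone { TOY_ZONE_DMA, TOY_ZONE_DMA32, TOY_ZONE_NORMAL };

/* Hypothetical request kinds; real GFP flags are bit masks with more cases. */
enum toy_gfp { TOY_GFP_DMA, TOY_GFP_DMA32, TOY_GFP_KERNEL };

/* Assumption: node policies apply from ZONE_NORMAL upward on this toy box. */
static const enum toy_zone toy_policy_zone = TOY_ZONE_NORMAL;

/* Map a request to the highest zone it may use (stand-in for gfp_zone()). */
static enum toy_zone toy_gfp_zone(enum toy_gfp gfp)
{
	switch (gfp) {
	case TOY_GFP_DMA:
		return TOY_ZONE_DMA;
	case TOY_GFP_DMA32:
		return TOY_ZONE_DMA32;
	default:
		return TOY_ZONE_NORMAL;
	}
}

/*
 * Return true if an allocation may be served from 'node'.
 * 'mems_allowed' is a bit mask of node ids the task is confined to;
 * 'node' must be smaller than the number of bits in unsigned long.
 */
bool toy_node_allowed(int node, enum toy_gfp gfp, unsigned long mems_allowed)
{
	/* Lower-zone request: skip the cpuset check, any node is ok. */
	if (toy_gfp_zone(gfp) < toy_policy_zone)
		return true;

	/* Otherwise the node must be in the task's allowed set. */
	return ((mems_allowed >> node) & 1UL) != 0;
}

/* Task confined to node 1; assume only node 0 has memory below 4GB. */
void toy_self_check(void)
{
	const unsigned long only_node1 = 1UL << 1;

	assert(toy_node_allowed(0, TOY_GFP_DMA32, only_node1));   /* bypass */
	assert(!toy_node_allowed(0, TOY_GFP_KERNEL, only_node1)); /* enforced */
	assert(toy_node_allowed(1, TOY_GFP_KERNEL, only_node1));  /* in set */
}
```

In this model a DMA32 request from a task confined to node 1 may still
be served from node 0, while an ordinary request is kept inside the
cpuset, which is the behaviour the changelog asks for.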

