Re: [PATCH] page_alloc: skip cpuset enforcement for lower zone allocations (v2)

2014-05-28 Thread Li Zefan
On 2014/5/27 2:53, Marcelo Tosatti wrote:
> 
> Zone-specific allocations, such as GFP_DMA32, should not be restricted
> to the cpuset's allowed node list: the zones that such allocations
> demand might be contained in nodes outside the cpuset's node list.
> 
> The alternative would be to not perform such allocations from
> applications which are cpuset restricted, which is unrealistic.
> 
> Fixes KVM's alloc_page(gfp_mask=GFP_DMA32) with cpuset as explained.
> 

Could you add the use case that you described in a previous email to
the changelog?

> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> 
> v2: fix slowpath as well (David Rientjes)
> 
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 3d54c41..b70a336 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -2392,6 +2392,10 @@ int __cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)
>  

Add a comment accordingly?

 *  in_interrupt - any node ok (current task context irrelevant)
 *  GFP_ATOMIC   - any node ok
 *  TIF_MEMDIE   - any node ok
 *  GFP_KERNEL   - any node in enclosing hardwalled cpuset ok
 *  GFP_USER     - only nodes in current tasks mems allowed ok.
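
As a rough illustration of where the new rule sits relative to the existing ones, here is a minimal userspace sketch of the check order in the first hunk. It is not kernel code: `struct alloc_ctx`, `node_allowed()` and `MAX_NODES` are hypothetical names, `policy_zone` is assumed to be ZONE_NORMAL (as is typical on x86_64), and the GFP_ATOMIC, TIF_MEMDIE and hardwall rules from the list above are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified zone ordering; lower values are the "lower" zones. */
enum zone_idx { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL };

#define MAX_NODES 8

/* Hypothetical stand-in for the state the real function consults. */
struct alloc_ctx {
	bool in_interrupt;          /* models in_interrupt() */
	bool thisnode;              /* models gfp_mask & __GFP_THISNODE */
	enum zone_idx gfp_zone;     /* models gfp_zone(gfp_mask) */
	unsigned int mems_allowed;  /* bit n set: node n is in the cpuset */
};

/* Assumption: the highest zone that memory policies apply to. */
static const enum zone_idx policy_zone = ZONE_NORMAL;

/* Simplified model of the check order after the patch. */
static bool node_allowed(int node, const struct alloc_ctx *c)
{
	assert(node >= 0 && node < MAX_NODES);

	if (c->in_interrupt || c->thisnode)
		return true;
	/* The rule the patch adds: lower-zone requests bypass the cpuset. */
	if (c->gfp_zone < policy_zone)
		return true;
	return (c->mems_allowed >> node) & 1u;
}
```

With `mems_allowed` set to node 1 only, a ZONE_DMA32 request is allowed on node 0 in this model, while a ZONE_NORMAL request is still confined to node 1.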

>  	if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
>  		return 1;
> +#ifdef CONFIG_NUMA
> +	if (gfp_zone(gfp_mask) < policy_zone)
> +		return 1;
> +#endif
>  	might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
>  	if (node_isset(node, current->mems_allowed))
>  		return 1;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..dfea3dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2698,6 +2698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  	unsigned int cpuset_mems_cookie;
>  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
>  	struct mem_cgroup *memcg = NULL;
> +	nodemask_t *cpuset_mems_allowed = &cpuset_current_mems_allowed;
>  
>  	gfp_mask &= gfp_allowed_mask;
>  
> @@ -2726,9 +2727,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  retry_cpuset:
>  	cpuset_mems_cookie = read_mems_allowed_begin();
>  
> +#ifdef CONFIG_NUMA
> +	if (gfp_zone(gfp_mask) < policy_zone)
> +		cpuset_mems_allowed = NULL;
> +#endif
> +
>  	/* The preferred zone is used for statistics later */
>  	first_zones_zonelist(zonelist, high_zoneidx,
> -				nodemask ? : &cpuset_current_mems_allowed,
> +				nodemask ? : cpuset_mems_allowed,
>  				&preferred_zone);
>  	if (!preferred_zone)
>  		goto out;
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] page_alloc: skip cpuset enforcement for lower zone allocations (v2)

2014-05-26 Thread Marcelo Tosatti

Zone-specific allocations, such as GFP_DMA32, should not be restricted
to the cpuset's allowed node list: the zones that such allocations
demand might be contained in nodes outside the cpuset's node list.

The alternative would be to not perform such allocations from
applications which are cpuset restricted, which is unrealistic.

Fixes KVM's alloc_page(gfp_mask=GFP_DMA32) with cpuset as explained.
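
To make the failure mode concrete, here is a minimal userspace sketch, not kernel code, of a zonelist walk filtered by a node mask. It assumes a hypothetical two-node machine where only node 0 has DMA and DMA32 zones, and assumes `policy_zone` is ZONE_NORMAL. The names `struct zone_ref`, `first_zone()` and `pick_preferred()` are made up for this illustration; the `patched` flag models the NULL-mask override that the page_alloc.c hunk adds.

```c
#include <assert.h>
#include <stddef.h>

enum zone_idx { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL };

struct zone_ref {
	int node;
	enum zone_idx idx;
};

/* Hypothetical zonelist as seen from node 1: highest zone first,
 * local node first. Only node 0 has the lower zones. */
static const struct zone_ref zonelist[] = {
	{ 1, ZONE_NORMAL }, { 0, ZONE_NORMAL },
	{ 0, ZONE_DMA32 },  { 0, ZONE_DMA },
};

/* Assumption: the highest zone that memory policies apply to. */
static const enum zone_idx policy_zone = ZONE_NORMAL;

/* First zone at or below high_zoneidx whose node is in the mask.
 * A NULL mask means "no node restriction". */
static const struct zone_ref *first_zone(enum zone_idx high_zoneidx,
					 const unsigned long *nodemask)
{
	for (size_t i = 0; i < sizeof(zonelist) / sizeof(zonelist[0]); i++) {
		const struct zone_ref *z = &zonelist[i];

		if (z->idx > high_zoneidx)
			continue;
		if (nodemask && !((*nodemask >> z->node) & 1UL))
			continue;
		return z;
	}
	return NULL;
}

/* Models the preferred-zone lookup with and without the patch. */
static const struct zone_ref *pick_preferred(enum zone_idx high_zoneidx,
					     const unsigned long *cpuset_mems,
					     int patched)
{
	assert(cpuset_mems != NULL);	/* every task has a cpuset mask */

	if (patched && high_zoneidx < policy_zone)
		cpuset_mems = NULL;	/* lower zones ignore the cpuset */
	return first_zone(high_zoneidx, cpuset_mems);
}
```

For a task whose cpuset allows only node 1, `pick_preferred(ZONE_DMA32, &mask, 0)` finds no zone in this model, which corresponds to the allocation bailing out at the `!preferred_zone` check. With `patched` set, the same request resolves to node 0's DMA32 zone, while a ZONE_NORMAL request still stays on node 1.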

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

v2: fix slowpath as well (David Rientjes)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3d54c41..b70a336 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2392,6 +2392,10 @@ int __cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)
 
 	if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
 		return 1;
+#ifdef CONFIG_NUMA
+	if (gfp_zone(gfp_mask) < policy_zone)
+		return 1;
+#endif
 	might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
 	if (node_isset(node, current->mems_allowed))
 		return 1;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..dfea3dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2698,6 +2698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	unsigned int cpuset_mems_cookie;
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
 	struct mem_cgroup *memcg = NULL;
+	nodemask_t *cpuset_mems_allowed = &cpuset_current_mems_allowed;
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2726,9 +2727,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
+#ifdef CONFIG_NUMA
+	if (gfp_zone(gfp_mask) < policy_zone)
+		cpuset_mems_allowed = NULL;
+#endif
+
 	/* The preferred zone is used for statistics later */
 	first_zones_zonelist(zonelist, high_zoneidx,
-				nodemask ? : &cpuset_current_mems_allowed,
+				nodemask ? : cpuset_mems_allowed,
 				&preferred_zone);
 	if (!preferred_zone)
 		goto out;

