Re: [RFC v2 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator

2015-03-25 Thread Roman Peniaev
On Wed, Mar 25, 2015 at 7:00 AM, Andrew Morton wrote:
> On Thu, 19 Mar 2015 23:04:39 +0900 Roman Pen  wrote:
>
>> If a suitable block can't be found, a new block is allocated and put at
>> the head of the free list, so on the next iteration this new block will
>> be found first.
>>
>> ...
>>
>> Cc: sta...@vger.kernel.org
>>
>> ...
>>
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -837,7 +837,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
>>
>>   vbq = &get_cpu_var(vmap_block_queue);
>>   spin_lock(&vbq->lock);
>> - list_add_rcu(&vb->free_list, &vbq->free);
>> + list_add_tail_rcu(&vb->free_list, &vbq->free);
>>   spin_unlock(&vbq->lock);
>>   put_cpu_var(vmap_block_queue);
>>
>
> I'm not sure about the cc:stable here.  There is potential for
> unexpected side-effects

The only potential side effect I see is that the allocator has to iterate
over up to 6 blocks (7 on 64-bit systems) in a free list twice.
The second patch fixes this by occupying the block right away after
allocation.  But even if the second patch is not applied, iterating over
6 (7) blocks (and this is the worst and rare case) is not a big deal
compared to the size of a free list, which keeps growing over time if this
patch is not applied.
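
Roughly, the idea of the second patch is the following (just a sketch of
the approach, not the literal diff; the order parameter and the exact
bookkeeping below are illustrative):

 --
/* Sketch: reserve space for the current request in the freshly allocated
 * block *before* publishing it on the per-CPU free list, so vb_alloc()
 * does not have to walk the list a second time.  Field names follow
 * mm/vmalloc.c of that time; the allocation itself is elided. */
static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
{
        struct vmap_block_queue *vbq;
        struct vmap_block *vb;
        void *vaddr;

        /* ... allocate vb and its backing vmap_area exactly as before ... */

        /* Occupy the first (1 << order) pages for the caller right away. */
        vaddr = (void *)vb->va->va_start;
        vb->free = VMAP_BBMAP_BITS - (1UL << order);

        /* Only now make the block visible to other vb_alloc() callers. */
        vbq = &get_cpu_var(vmap_block_queue);
        spin_lock(&vbq->lock);
        list_add_tail_rcu(&vb->free_list, &vbq->free);
        spin_unlock(&vbq->lock);
        put_cpu_var(vmap_block_queue);

        return vaddr;
}
 --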

I can compare the behaviour of an allocator which puts new blocks at the
head of a free list to the game of Tetris: sooner or later the incoming
blocks will reach the top, and you will lose, even if you are the champion.

> and I don't *think* people are hurting from
> this issue in real life.  Or maybe I'm wrong about that?

Yes, probably they are not.  I showed one specially crafted synthetic
scenario, which works pretty well and exhausts the virtual space very fast;
the other scenario is a random one, which also works, but very slowly.

I think drivers tend either only to preallocate (infrequent usage) or to
pass sequential sizes to vm_map_ram.  In those cases everything will be
fine.  Also, the free list is a per-CPU variable.  Good and fast
reproduction happens only if you bind the vm_map_ram calls to one CPU or
use a uniprocessor system.

Probably the conjunction of all of these reasons hid the problem for a
long time.  But I tend to think that this is a bug, a long-standing one.

--
Roman


Re: [RFC v2 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator

2015-03-24 Thread Gioh Kim
> 
> In the current patch I simply put the newly allocated block at the tail
> of the free list, thus reducing fragmentation and giving a chance to
> resolve allocation requests using older blocks with holes left.

It's great.
I think this might help with the fragmentation caused by a mix of
long-lived and short-lived mappings.
Thank you for your work.

> 
> Signed-off-by: Roman Pen 
> Cc: Andrew Morton 
> Cc: Eric Dumazet 
> Acked-by: Joonsoo Kim 
> Cc: David Rientjes 
> Cc: WANG Chao 
> Cc: Fabian Frederick 
> Cc: Christoph Lameter 
> Cc: Gioh Kim 
> Cc: Rob Jones 
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: sta...@vger.kernel.org
> ---
>   mm/vmalloc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 39c3388..db6bffb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -837,7 +837,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
>   
>   vbq = &get_cpu_var(vmap_block_queue);
>   spin_lock(&vbq->lock);
> - list_add_rcu(&vb->free_list, &vbq->free);
> + list_add_tail_rcu(&vb->free_list, &vbq->free);
>   spin_unlock(&vbq->lock);
>   put_cpu_var(vmap_block_queue);
>   
> 


Re: [RFC v2 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator

2015-03-24 Thread Andrew Morton
On Thu, 19 Mar 2015 23:04:39 +0900 Roman Pen  wrote:

> If a suitable block can't be found, a new block is allocated and put at
> the head of the free list, so on the next iteration this new block will
> be found first.
> 
> ...
>
> Cc: sta...@vger.kernel.org
>
> ...
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -837,7 +837,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
>  
>   vbq = &get_cpu_var(vmap_block_queue);
>   spin_lock(&vbq->lock);
> - list_add_rcu(&vb->free_list, &vbq->free);
> + list_add_tail_rcu(&vb->free_list, &vbq->free);
>   spin_unlock(&vbq->lock);
>   put_cpu_var(vmap_block_queue);
>  

I'm not sure about the cc:stable here.  There is potential for
unexpected side-effects and I don't *think* people are hurting from
this issue in real life.  Or maybe I'm wrong about that?



[RFC v2 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator

2015-03-19 Thread Roman Pen
If a suitable block can't be found, a new block is allocated and put at the
head of the free list, so on the next iteration this new block will be found
first.

That's bad, because older blocks in the free list will not get a chance to
be fully used, and fragmentation will grow.

Let's consider this simple example:

 #1 We have one block in a free list which is partially used, and where only
one page is free:

HEAD |xxxxxxxxx-| TAIL
               ^
               free space for 1 page, order 0

 #2 New allocation request of order 1 (2 pages) comes, and a new block is
allocated since we do not have free space to complete this request.  The
new block is put at the head of the free list:

HEAD |----------|xxxxxxxxx-| TAIL

 #3 Two pages were occupied in the newly found block:

HEAD |xx--------|xxxxxxxxx-| TAIL
      ^
      two pages mapped here

 #4 New allocation request of order 0 (1 page) comes.  The block which was
created at step #2 is located at the beginning of the free list, so it will
be found first:

HEAD |xxX-------|xxxxxxxxx-| TAIL
        ^                 ^
        page mapped here, but better to use this hole

It is obvious that it is better to complete the request of step #4 using
the old block, where free space is left, because otherwise fragmentation
will increase significantly.

But fragmentation is not the only problem.  The worst thing is that I can
easily create a scenario in which the whole vmalloc space is exhausted by
blocks which are not used, but are already dirty and have several free
pages.

Let's consider this function, whose execution should be pinned to one CPU:

 --
static void exhaust_virtual_space(struct page *pages[16], int iters)
{
        /* Firstly we have to map a big chunk, e.g. 16 pages.
         * Then we have to occupy the remaining space with smaller
         * chunks, i.e. 8 pages. At the end a small hole should remain.
         * So at the end of our allocation sequence the block looks like
         * this:
         *                XX  big chunk
         * |XXxxxxxxx-|    x  small chunk
         *                 -  hole, which is enough for a small chunk,
         *                    but is not enough for a big chunk
         */
        while (iters--) {
                int i;
                void *vaddr;

                /* Map/unmap big chunk */
                vaddr = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
                vm_unmap_ram(vaddr, 16);

                /* Map/unmap small chunks.
                 *
                 * -1 for the hole, which should be left at the end of each
                 * block to keep it partially used, with some free space
                 * available */
                for (i = 0; i < (VMAP_BBMAP_BITS - 16) / 8 - 1; i++) {
                        vaddr = vm_map_ram(pages, 8, -1, PAGE_KERNEL);
                        vm_unmap_ram(vaddr, 8);
                }
        }
}
 --
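
To actually pin the execution to one CPU from a test module, something
like the following can be used (a sketch only; the wrapper, the use of
work_on_cpu() and the iteration count are not part of the original test,
and the pages[] array is assumed to be pre-filled with allocated pages):

 --
static long exhaust_on_cpu0(void *arg)
{
        struct page **pages = arg;

        /* Run the whole reproducer on the CPU this work is queued to, so
         * that all vm_map_ram() calls hit the same per-CPU free list. */
        exhaust_virtual_space(pages, 1000);
        return 0;
}

        /* e.g. from the module init path: */
        work_on_cpu(0, exhaust_on_cpu0, pages);
 --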

On every iteration a new block (1MB of vm area in my case) will be
allocated and then occupied, without any attempt to resolve the small
allocation requests using previously allocated blocks in the free list.

In case of random allocations (with the size randomly taken from the range
[1..64] in the 64-bit case or [1..32] in the 32-bit case) the situation is
the same: new blocks continue to appear whenever the maximum possible
allocation size (32 or 64) is passed to the allocator, because none of the
remaining blocks in the free list have enough free space to complete such
an allocation request.
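
A sketch of this random scenario (the helper name, the use of
prandom_u32() and the loop structure are illustrative; only the idea of
random sizes in the range [1..64] / [1..32] comes from the description
above):

 --
static void random_map_unmap(struct page *pages[64], int iters)
{
        while (iters--) {
                /* 64 on 64-bit systems, 32 on 32-bit systems */
                unsigned int count = 1 + prandom_u32() % 64;
                void *vaddr;

                vaddr = vm_map_ram(pages, count, -1, PAGE_KERNEL);
                vm_unmap_ram(vaddr, count);
        }
}
 --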

In summary, if new blocks are put at the head of the free list, virtual
space will eventually be exhausted.

In the current patch I simply put the newly allocated block at the tail of
the free list, thus reducing fragmentation and giving a chance to resolve
allocation requests using older blocks with holes left.

Signed-off-by: Roman Pen 
Cc: Andrew Morton 
Cc: Eric Dumazet 
Acked-by: Joonsoo Kim 
Cc: David Rientjes 
Cc: WANG Chao 
Cc: Fabian Frederick 
Cc: Christoph Lameter 
Cc: Gioh Kim 
Cc: Rob Jones 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: sta...@vger.kernel.org
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 39c3388..db6bffb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -837,7 +837,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
 
 	vbq = &get_cpu_var(vmap_block_queue);
 	spin_lock(&vbq->lock);
-	list_add_rcu(&vb->free_list, &vbq->free);
+	list_add_tail_rcu(&vb->free_list, &vbq->free);
 	spin_unlock(&vbq->lock);
 	put_cpu_var(vmap_block_queue);
 
-- 
2.2.2
