[PATCH v2 1/1] irqchip/gicv3: iterate over possible CPUs by for_each_possible_cpu()
From: zijun_hu <zijun...@htc.com>

get_cpu_number() doesn't use the existing helper to iterate over possible
CPUs, so it gives a wrong result in case of a discontinuous
@cpu_possible_mask such as 0b0001. Such a discontinuous @cpu_possible_mask
most likely results from one core having failed to come up on an SMP
machine.

Fix it by using the existing helper for_each_possible_cpu().

Signed-off-by: zijun_hu <zijun...@htc.com>
---
Changes in v2:
 - more detailed commit message offered

 drivers/irqchip/irq-gic-v3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 519149ec9053..b5df99c6f680 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1042,7 +1042,7 @@ static int get_cpu_number(struct device_node *dn)
 {
 	const __be32 *cell;
 	u64 hwid;
-	int i;
+	int cpu;
 
 	cell = of_get_property(dn, "reg", NULL);
 	if (!cell)
@@ -1056,9 +1056,9 @@ static int get_cpu_number(struct device_node *dn)
 	if (hwid & ~MPIDR_HWID_BITMASK)
 		return -1;
 
-	for (i = 0; i < num_possible_cpus(); i++)
-		if (cpu_logical_map(i) == hwid)
-			return i;
+	for_each_possible_cpu(cpu)
+		if (cpu_logical_map(cpu) == hwid)
+			return cpu;
 
 	return -1;
 }
-- 
1.9.1
Re: [PATCH 1/1] irqchip/gicv3: iterate over possible CPUs by for_each_possible_cpu()
On 09/15/2017 03:20 AM, Marc Zyngier wrote:
> On Thu, Sep 14 2017 at 1:15:14 pm BST, zijun_hu <zijun...@zoho.com> wrote:
>> From: zijun_hu <zijun...@htc.com>
>>
>> get_cpu_number() doesn't use existing helper to iterate over possible
>> CPUs, so error happens in case of discontinuous @cpu_possible_mask
>> such as 0b0001.
>
> Do you have an example of such a situation? Your patch is definitely an
> improvement, but I'd like to understand how you get there...
>
> Thanks,
>
> 	M.
>

A few conditions that may result in a discontiguous @cpu_possible_mask
are noticed and handled by the ARM64 init code, as indicated by the code
segments below.

In arch/arm64/kernel/smp.c:

void __init smp_init_cpus(void)
{
	...
	/*
	 * We need to set the cpu_logical_map entries before enabling
	 * the cpus so that cpu processor description entries (DT cpu nodes
	 * and ACPI MADT entries) can be retrieved by matching the cpu hwid
	 * with entries in cpu_logical_map while initializing the cpus.
	 * If the cpu set-up fails, invalidate the cpu_logical_map entry.
	 */
	for (i = 1; i < nr_cpu_ids; i++) {
		if (cpu_logical_map(i) != INVALID_HWID) {
			if (smp_cpu_setup(i))
				cpu_logical_map(i) = INVALID_HWID;
		}
	}
	...
}

/*
 * Initialize cpu operations for a logical cpu and
 * set it in the possible mask on success
 */
static int __init smp_cpu_setup(int cpu)
{
	if (cpu_read_ops(cpu))
		return -ENODEV;

	if (cpu_ops[cpu]->cpu_init(cpu))
		return -ENODEV;

	set_cpu_possible(cpu, true);
	return 0;
}

I browsed the GICv3 driver code and noticed these slightly weird code
segments.
[PATCH 1/1] irqchip/gicv3: iterate over possible CPUs by for_each_possible_cpu()
From: zijun_hu <zijun...@htc.com>

get_cpu_number() doesn't use the existing helper to iterate over possible
CPUs, so an error happens in case of a discontinuous @cpu_possible_mask
such as 0b0001.

Fix it by using the existing helper for_each_possible_cpu().

Signed-off-by: zijun_hu <zijun...@htc.com>
---
 drivers/irqchip/irq-gic-v3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 19d642eae096..2a0427c2971e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -990,7 +990,7 @@ static int get_cpu_number(struct device_node *dn)
 {
 	const __be32 *cell;
 	u64 hwid;
-	int i;
+	int cpu;
 
 	cell = of_get_property(dn, "reg", NULL);
 	if (!cell)
@@ -1004,9 +1004,9 @@ static int get_cpu_number(struct device_node *dn)
 	if (hwid & ~MPIDR_HWID_BITMASK)
 		return -1;
 
-	for (i = 0; i < num_possible_cpus(); i++)
-		if (cpu_logical_map(i) == hwid)
-			return i;
+	for_each_possible_cpu(cpu)
+		if (cpu_logical_map(cpu) == hwid)
+			return cpu;
 
 	return -1;
 }
-- 
1.9.1
Re: [PATCH 1/1] workqueue: use type int instead of bool to index array
On 2017/9/7 0:40, Tejun Heo wrote:
> On Thu, Sep 07, 2017 at 12:04:59AM +0800, zijun_hu wrote:
>> On 2017/9/6 22:33, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Wed, Sep 06, 2017 at 11:34:14AM +0800, zijun_hu wrote:
>>>> From: zijun_hu <zijun...@htc.com>
>>>>
>>>> type bool is used to index three arrays in alloc_and_link_pwqs()
>>>> it doesn't look like conventional.
>>>>
>>>> it is fixed by using type int to index the relevant arrays.
>>>
>>> bool is a uint type which can be either 0 or 1.  I don't see what the
>>> benefit of this patch is.
>>>
>> bool is NOT a uint type now, it is a new type introduced by gcc, it is
>> rather different with "typedef int bool" historically
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n815.htm
>
> Because C has existed for so long without a Boolean type, however, the
> new standard must coexist with the old remedies. Therefore, the type
> name is taken from the reserved identifier space. To maintain
> orthogonal promotion rules, the Boolean type is defined as an unsigned
> integer type capable of representing the values 0 and 1. The more
> conventional names for the type and its values are then made available
> only with the inclusion of the <stdbool.h> header. In addition, the
> <stdbool.h> header defines a feature test macro to aid in integrating
> new code with old code that defines its own Boolean type.
>

In this case, I think type int is more suitable than bool in terms of
extensibility, programming convention, and consistency.
Re: [PATCH 1/1] workqueue: use type int instead of bool to index array
On 2017/9/6 22:33, Tejun Heo wrote:
> Hello,
>
> On Wed, Sep 06, 2017 at 11:34:14AM +0800, zijun_hu wrote:
>> From: zijun_hu <zijun...@htc.com>
>>
>> type bool is used to index three arrays in alloc_and_link_pwqs()
>> it doesn't look like conventional.
>>
>> it is fixed by using type int to index the relevant arrays.
>
> bool is a uint type which can be either 0 or 1.  I don't see what the
> benefit of this patch is.
>

bool is NOT a plain uint type now; it is a distinct type introduced by gcc,
and it behaves rather differently from the historical "typedef int bool".
See the following code segment for more info about type bool:

	bool v = 0x10;
	printf("v = %d\n", v);

The output is "v = 1".

An invalid array index could result if bool were represented as a plain
uint:

	bool highpri = wq->flags & WQ_HIGHPRI;

WQ_HIGHPRI = 1 << 4, so @highpri could be 16, but the number of array
elements is 2.

bool is a logic value whose valid values are true or false; indexing an
array by type bool is not a good programming convention.

It is also more extensible to use type int: type bool would become improper
if the number of array elements were extended beyond 2 in the future.

Besides, the relevant arrays are indexed by type int in many other places
in the same source file, so this patch keeps consistency.
[PATCH 1/1] workqueue: use type int instead of bool to index array
From: zijun_hu <zijun...@htc.com>

type bool is used to index three arrays in alloc_and_link_pwqs();
it doesn't look conventional.

It is fixed by using type int to index the relevant arrays.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
 kernel/workqueue.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ab3c0dc8c7ed..0c7a16792949 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3917,7 +3917,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
 
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
-	bool highpri = wq->flags & WQ_HIGHPRI;
+	int pool_idx = (wq->flags & WQ_HIGHPRI) ? 1 : 0;
 	int cpu, ret;
 
 	if (!(wq->flags & WQ_UNBOUND)) {
@@ -3931,7 +3931,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 			struct worker_pool *cpu_pools =
 				per_cpu(cpu_worker_pools, cpu);
 
-			init_pwq(pwq, wq, &cpu_pools[highpri]);
+			init_pwq(pwq, wq, &cpu_pools[pool_idx]);
 
 			mutex_lock(&wq->mutex);
 			link_pwq(pwq);
@@ -3939,14 +3939,15 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 		}
 		return 0;
 	} else if (wq->flags & __WQ_ORDERED) {
-		ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
+		ret = apply_workqueue_attrs(wq, ordered_wq_attrs[pool_idx]);
 		/* there should only be single pwq for ordering guarantee */
 		WARN(!ret && (wq->pwqs.next != &wq->dfl_pwq->pwqs_node ||
 			      wq->pwqs.prev != &wq->dfl_pwq->pwqs_node),
 		     "ordering guarantee broken for workqueue %s\n", wq->name);
 		return ret;
 	} else {
-		return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
+		return apply_workqueue_attrs(wq,
					     unbound_std_wq_attrs[pool_idx]);
 	}
 }
-- 
1.9.1
Re: [PATCH] mm/vmalloc: add vm_struct for vm_map_ram area
On 07/19/2017 06:44 PM, Zhaoyang Huang wrote:
> /proc/vmallocinfo will not show the area allocated by vm_map_ram, which
> will make confusion when debug. Add vm_struct for them and show them in
> proc.
>
> Signed-off-by: Zhaoyang Huang
> ---

Another patch titled "vmalloc: show lazy-purged vma info in vmallocinfo"
has already been merged into the linux-next branch to resolve this problem.

>  mm/vmalloc.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 34a1c3e..4a2e93c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -46,6 +46,9 @@ struct vfree_deferred {
>
>  static void __vunmap(const void *, int);
>
> +static void setup_vmap_ram_vm(struct vm_struct *vm, struct vmap_area *va,
> +			unsigned long flags, const void *caller);
> +
>  static void free_work(struct work_struct *w)
>  {
>  	struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
> @@ -315,6 +318,7 @@ unsigned long vmalloc_to_pfn(const void *vmalloc_addr)
>  /*** Global kva allocator ***/
>
>  #define VM_VM_AREA	0x04
> +#define VM_VM_RAM	0x08
>
>  static DEFINE_SPINLOCK(vmap_area_lock);
>  /* Export for kexec only */
> @@ -1141,6 +1145,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
>
>  	va = find_vmap_area(addr);
>  	BUG_ON(!va);
> +	kfree(va->vm);
>  	free_unmap_vmap_area(va);
>  }
>  EXPORT_SYMBOL(vm_unmap_ram);
> @@ -1173,6 +1178,12 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t prot)
>  		addr = (unsigned long)mem;
>  	} else {
>  		struct vmap_area *va;
> +		struct vm_struct *area;
> +
> +		area = kzalloc_node(sizeof(*area), GFP_KERNEL, node);
> +		if (unlikely(!area))
> +			return NULL;
> +
>  		va = alloc_vmap_area(size, PAGE_SIZE,
>  				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
>  		if (IS_ERR(va))
> @@ -1180,6 +1191,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t prot)
>
>  		addr = va->va_start;
>  		mem = (void *)addr;
> +		setup_vmap_ram_vm(area, va, 0, __builtin_return_address(0));
>  	}
>  	if (vmap_page_range(addr, addr + size, prot, pages) < 0) {
>  		vm_unmap_ram(mem, count);
> @@ -1362,6 +1374,19 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
>  	spin_unlock(&vmap_area_lock);
>  }
>
> +static void setup_vmap_ram_vm(struct vm_struct *vm, struct vmap_area *va,
> +			unsigned long flags, const void *caller)
> +{
> +	spin_lock(&vmap_area_lock);
> +	vm->flags = flags;
> +	vm->addr = (void *)va->va_start;
> +	vm->size = va->va_end - va->va_start;
> +	vm->caller = caller;
> +	va->vm = vm;
> +	va->flags |= VM_VM_RAM;
> +	spin_unlock(&vmap_area_lock);
> +}
> +
>  static void clear_vm_uninitialized_flag(struct vm_struct *vm)
>  {
>  	/*
> @@ -2698,7 +2723,7 @@ static int s_show(struct seq_file *m, void *p)
>  	 * s_show can encounter race with remove_vm_area, !VM_VM_AREA on
>  	 * behalf of vmap area is being tear down or vm_map_ram allocation.
>  	 */
> -	if (!(va->flags & VM_VM_AREA))
> +	if (!(va->flags & (VM_VM_AREA | VM_VM_RAM)))
>  		return 0;
>
>  	v = va->vm;
>
Re: [PATCH v3] mm/vmalloc: terminate searching since one node found
On 07/18/2017 04:31 PM, Zhaoyang Huang (黄朝阳) wrote:
>
> It is no need to find the very beginning of the area within
> alloc_vmap_area, which can be done by judging each node during the process
>

It seems the original code is written to achieve the following two purposes:
A, the resulting vmap_area has the lowest available address in the required
   range [vstart, vend);
B, it may update the cached vmap_area node info, which can speed up other
   related allocations.
It looks redundant but is conventional and necessary; this approach may
destroy both of the original purposes.

> For current approach, the worst case is that the starting node which be found
> for searching the 'vmap_area_list' is close to the 'vstart', while the final
> available one is round to the tail(especially for the left branch).
> This commit have the list searching start at the first available node, which
> will save the time of walking the rb tree'(1)' and walking the list'(2)'.
>
>            vmap_area_root
>            /            \
>       tmp_next           U
>         /
>      tmp
>      /
>    ...   (1)
>    /
>  first (current approach)
>
> @tmp_next is the next node of @tmp in the ordered list_head, not in the rbtree
>
> vmap_area_list->...->first->...->tmp->tmp_next
>                                  (2)
>
> Signed-off-by: Zhaoyang Huang
> ---
>  mm/vmalloc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 34a1c3e..9a5c177 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -459,9 +459,18 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>
>  	while (n) {
>  		struct vmap_area *tmp;
> +		struct vmap_area *tmp_next;
>  		tmp = rb_entry(n, struct vmap_area, rb_node);
> +		tmp_next = list_next_entry(tmp, list);
>  		if (tmp->va_end >= addr) {
>  			first = tmp;
> +			if (ALIGN(tmp->va_end, align) + size
> +				< tmp_next->va_start) {

If the @tmp node does not locate within the required range [vstart, vend)
but to the right of it, it may still satisfy this condition; and even if it
locates within the range, it may not have the lowest free address.
If @tmp doesn't have a next node, tmp_next->va_start will cause a NULL
dereference.

> +				addr = ALIGN(tmp->va_end, align);
> +				if (cached_hole_size >= size)
> +					cached_hole_size = 0;

It seems a little rough to reset @cached_hole_size this way; it will cause
the cached info to be updated in the next allocation regardless of the
allocation arguments.

> +				goto found;
> +			}
>  			if (tmp->va_start <= addr)
>  				break;
>  			n = n->rb_left;
> --
> 1.9.1
>
Re: [PATCH v2] mm/vmalloc: terminate searching since one node found
On 07/17/2017 04:45 PM, zijun_hu wrote:
> On 07/17/2017 04:07 PM, Zhaoyang Huang wrote:
>> It is no need to find the very beginning of the area within
>> alloc_vmap_area, which can be done by judging each node during the process
>>
>> For current approach, the worst case is that the starting node which be found
>> for searching the 'vmap_area_list' is close to the 'vstart', while the final
>> available one is round to the tail(especially for the left branch).
>> This commit have the list searching start at the first available node, which
>> will save the time of walking the rb tree'(1)' and walking the list(2).
>>
>>            vmap_area_root
>>            /            \
>>       tmp_next           U
>>         /        (1)
>>      tmp
>>      /
>>    ...
>>    /
>>  first (current approach)
>>
>> vmap_area_list->...->first->...->tmp->tmp_next
>>                                  (2)
>
> The original code can ensure the following two points:
> A, the resulting vmap_area has the lowest available address in the range
>    [vstart, vend);
> B, it can maintain the cached vmap_area node correctly, which can speed up
>    related allocations.
> I suspect this patch may destroy both of the above points.
>
>> Signed-off-by: Zhaoyang Huang <zhaoyang.hu...@spreadtrum.com>
>> ---
>>  mm/vmalloc.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 34a1c3e..f833e07 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -459,9 +459,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>>
>>  	while (n) {
>>  		struct vmap_area *tmp;
>> +		struct vmap_area *tmp_next;
>>  		tmp = rb_entry(n, struct vmap_area, rb_node);
>> +		tmp_next = list_next_entry(tmp, list);
>>  		if (tmp->va_end >= addr) {
>>  			first = tmp;
>> +			if (ALIGN(tmp->va_end, align) + size
>> +				< tmp_next->va_start) {
>> +				addr = ALIGN(tmp->va_end, align);
>> +				goto found;
>> +			}
> Is the aim vmap_area the lowest available one if the goto occurs?
> It will possibly bypass the later update of the cached vmap_area info
> @cached_hole_size if the goto occurs.

I think the aim area possibly does not locate within the required range
[vstart, vend) either.

>>  			if (tmp->va_start <= addr)
>>  				break;
>>  			n = n->rb_left;
>> --
>> 1.9.1
>>
>
Re: [PATCH v2] mm/vmalloc: terminate searching since one node found
On 07/17/2017 04:07 PM, Zhaoyang Huang wrote: > There is no need to find the very beginning of the area within > alloc_vmap_area, which can be done by judging each node during the process > > For the current approach, the worst case is that the starting node found > for searching the 'vmap_area_list' is close to the 'vstart', while the final > available one is round to the tail (especially for the left branch). > This commit has the list searching start at the first available node, which > will save the time of walking the rb tree '(1)' and walking the list (2). > > vmap_area_root > / \ > tmp_next U > / (1) > tmp >/ > ... > / > first(current approach) > > vmap_area_list->...->first->...->tmp->tmp_next the original code can ensure the following two points: A, the result vmap_area has the lowest available address in the range [vstart, vend) B, it can maintain the cached vmap_area node rightly, which can speed up related allocation i suspect this patch may destroy the above two points > (2) > > Signed-off-by: Zhaoyang Huang > --- > mm/vmalloc.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index 34a1c3e..f833e07 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -459,9 +459,16 @@ static struct vmap_area *alloc_vmap_area(unsigned > long size, > > while (n) { > struct vmap_area *tmp; > + struct vmap_area *tmp_next; > tmp = rb_entry(n, struct vmap_area, rb_node); > + tmp_next = list_next_entry(tmp, list); > if (tmp->va_end >= addr) { > first = tmp; > + if (ALIGN(tmp->va_end, align) + size > + < tmp_next->va_start) { > + addr = ALIGN(tmp->va_end, align); > + goto found; > + } is the aim vmap_area the lowest available one if the goto occurs ? it will possibly bypass the later cached vmap_area info (cached_hole_size) update if the goto occurs > if (tmp->va_start <= addr) > break; > n = n->rb_left; > -- > 1.9.1 >
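Guarantee A in the reply, that alloc_vmap_area() returns the lowest available address in [vstart, vend), is exactly what an early exit at an arbitrary rbtree node cannot promise. Below is a minimal user-space sketch of that guarantee: scan the sorted busy list from vstart and take the first hole that fits. Plain arrays stand in for the kernel's vmap_area_list; `struct area`, `lowest_fit`, and all names here are illustrative, not kernel API, and `align` is assumed to be a power of 2.

```c
#include <assert.h>

/* Occupied range [start, end), kept sorted by address, standing in
 * for the kernel's vmap_area_list. Illustrative only. */
struct area { unsigned long start, end; };

/* Return the lowest aligned address in [vstart, vend) where a hole of
 * @size bytes fits, or 0 if none does (a bounds check against vend on
 * the early return is omitted for brevity). */
static unsigned long lowest_fit(const struct area *a, int n,
                                unsigned long size, unsigned long align,
                                unsigned long vstart, unsigned long vend)
{
    unsigned long addr = (vstart + align - 1) & ~(align - 1);
    int i;

    for (i = 0; i < n; i++) {
        if (addr + size <= a[i].start)   /* hole before a[i] fits */
            return addr;
        if (a[i].end > addr)             /* bump past this busy area */
            addr = (a[i].end + align - 1) & ~(align - 1);
    }
    return addr + size <= vend ? addr : 0;
}
```

Because the scan proceeds from vstart in address order, the first fitting hole is necessarily the lowest one; the patched loop's `goto found` instead fires at whichever node the rbtree descent happens to visit, which is what the reply questions.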
Re: [RFC PATCH 1/1] mm/vmalloc.c: correct logic errors when insert vmap_area
On 10/13/2016 02:39 PM, zijun_hu wrote: Hi Nicholas, could you give some comments for this patch? thanks a lot > Hi Nicholas, > > i find __insert_vmap_area() was introduced by you > could you offer comments for this patch related to that function > > thanks > > On 10/12/2016 10:46 PM, Michal Hocko wrote: >> [Let's CC Nick who has written this code] >> >> On Wed 12-10-16 22:30:13, zijun_hu wrote: >>> From: zijun_hu <zijun...@htc.com> >>> >>> the KVA allocator organizes allocated vmap_areas in an rbtree. in order to >>> insert a new vmap_area @i_va into the rbtree, walk the rbtree from the >>> root and compare the vmap_area @t_va met on the rbtree against @i_va; walk >>> toward the left branch of @t_va if @i_va is lower than @t_va, and the right >>> branch if higher, otherwise handle this error case since @i_va overlaps >>> with @t_va; however, __insert_vmap_area() doesn't follow the desired >>> procedure correctly; moreover, it includes a meaningless else if condition >>> and a redundant else branch as shown by the comments in the below code segments: >>> static void __insert_vmap_area(struct vmap_area *va) >>> { >>> as an internal interface parameter, we assume vmap_area @va has nonzero size >>> ... >>> if (va->va_start < tmp->va_end) >>> p = &(*p)->rb_left; >>> else if (va->va_end > tmp->va_start) >>> p = &(*p)->rb_right; >>> this else if condition is always true and meaningless due to >>> va->va_end > va->va_start >= tmp_va->va_end > tmp_va->va_start normally >>> else >>> BUG(); >>> this BUG() is meaningless too since it is never reached normally >>> ... >>> } >>> >>> it looks like the else if condition and else branch are cancelled. no errors >>> are caused since the vmap_area @va to insert, as an internal interface >>> parameter, doesn't overlap with any one on the rbtree normally. however >>> __insert_vmap_area() looks weird and really has several logic errors as >>> pointed out above when it is viewed as a separate function. 
>> >> I have tried to read this several times but I am completely lost to >> understand what the actual bug is and how it causes vmap_area sorting to >> misbehave. So is this a correctness issue, performance improvement or >> theoretical fix for an incorrect input? >> >>> fix by walking the vmap_area rbtree as described above to insert >>> a vmap_area. >>> >>> BTW, (va->va_end == tmp_va->va_start) is considered a legal case since it >>> indicates vmap_area @va left-neighbors @tmp_va tightly. >>> >>> Fixes: db64fe02258f ("mm: rewrite vmap layer") >>> Signed-off-by: zijun_hu <zijun...@htc.com> >>> --- >>> mm/vmalloc.c | 8 >>> 1 file changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c >>> index 5daf3211b84f..8b80931654b7 100644 >>> --- a/mm/vmalloc.c >>> +++ b/mm/vmalloc.c >>> @@ -321,10 +321,10 @@ static void __insert_vmap_area(struct vmap_area *va) >>> >>> parent = *p; >>> tmp_va = rb_entry(parent, struct vmap_area, rb_node); >>> - if (va->va_start < tmp_va->va_end) >>> - p = &(*p)->rb_left; >>> - else if (va->va_end > tmp_va->va_start) >>> - p = &(*p)->rb_right; >>> + if (va->va_end <= tmp_va->va_start) >>> + p = &(*p)->rb_left; >>> + else if (va->va_start >= tmp_va->va_end) >>> + p = &(*p)->rb_right; >>> else >>> BUG(); >>> } >>> -- >>> 1.9.1 >> >
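The corrected descent order in the patch can be sketched in user space with a plain binary search tree standing in for the kernel rbtree. `struct node` and `insert()` below are illustrative, not kernel API; the comparison mirrors the patched hunk.

```c
#include <assert.h>
#include <stdlib.h>

/* A vmap_area stand-in covering [va_start, va_end). */
struct node {
    unsigned long va_start, va_end;
    struct node *left, *right;
};

/* Go left when @va ends at or before the current node starts, right
 * when it begins at or after the current node ends; touching end
 * points are legal, anything else is an overlap (BUG() in the patch). */
static struct node *insert(struct node *root, struct node *va)
{
    if (!root)
        return va;
    if (va->va_end <= root->va_start)
        root->left = insert(root->left, va);
    else if (va->va_start >= root->va_end)
        root->right = insert(root->right, va);
    else
        abort();
    return root;
}
```

With this ordering the overlap branch is genuinely reachable on bad input, which is the point of the patch: the original pair of conditions made the else if always true and the BUG() dead.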
[PATCH 1/1] mm/percpu.c: append alignment sanity checkup to avoid memory leakage
From: zijun_hu <zijun...@htc.com> the percpu allocator currently only works well when allocating a power of 2 aligned area, but there aren't any hints about this alignment requirement, so memory leakage may be caused when allocating areas with other alignments. the alignment must be at least an even number since the LSB of a chunk->map element is used as the free/in-use flag of an area; besides, the alignment must also be a power of 2 since ALIGN() doesn't always work well for other alignments but is adopted by pcpu_fit_in_area(). IOW, the current allocator only works well for a power of 2 aligned area allocation. see the counterexample below for why an odd alignment doesn't work let's assume area [16, 36) is free but its previous one is in-use, and we want to allocate a @size == 8 and @align == 7 area. the larger area [16, 36) is split into three areas [16, 21), [21, 29), [29, 36) eventually. however, due to the usage of a chunk->map element, the actual offset of the aim area [21, 29) is 21 but is recorded in the relevant element as 20; moreover the residual tail free area [29, 36) is mistaken as in-use and is lost silently unlike macro roundup(), ALIGN(x, a) doesn't work if @a isn't a power of 2 for example, roundup(10, 6) == 12 but ALIGN(10, 6) == 10, and the latter result obviously isn't desired. 
fix it by appending a sanity check for the alignment requirement Signed-off-by: zijun_hu <zijun...@htc.com> Suggested-by: Tejun Heo <t...@kernel.org> --- include/linux/kernel.h | 1 + mm/percpu.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index bc6ed52a39b9..0dc0b21bd164 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -45,6 +45,7 @@ #define REPEAT_BYTE(x) ((~0ul / 0xff) * (x)) +/* @a is a power of 2 value */ #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) #define __ALIGN_MASK(x, mask) __ALIGN_KERNEL_MASK((x), (mask)) #define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a))) diff --git a/mm/percpu.c b/mm/percpu.c index 255714302394..10ba3f9a3826 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -886,7 +886,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, size = ALIGN(size, 2); - if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) { + if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE + || !is_power_of_2(align))) { WARN(true, "illegal size (%zu) or align (%zu) for percpu allocation\n", size, align); return NULL; -- 1.9.1
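The roundup()/ALIGN() contrast in the commit message can be checked directly. The macro bodies below are simplified stand-alone copies of the kernel definitions, reproduced here so the claim is testable outside the kernel tree:

```c
#include <assert.h>

/* ALIGN() masks low bits and is therefore only correct when @a is a
 * power of 2; roundup() divides and works for any non-zero @a. */
#define ALIGN(x, a)    (((x) + ((a) - 1)) & ~((a) - 1))
#define roundup(x, y)  ((((x) + ((y) - 1)) / (y)) * (y))
```

For the odd alignment from the example, ALIGN(10, 6) yields 10 while roundup(10, 6) yields the correct 12; for a power of 2 the two agree, which is why the bug only bites non-power-of-2 alignments.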
Re: [RFC v2 PATCH] mm/percpu.c: simplify grouping CPU algorithm
On 2016/10/14 8:33, Tejun Heo wrote: > Hello, > > On Fri, Oct 14, 2016 at 07:49:44AM +0800, zijun_hu wrote: >> the main intent of this change is making the CPU grouping algorithm easier >> to understand, especially for newcomers to memory management >> take me as an example, it really took me a long time to understand it > > If the new code is easier to understand, it's only so marginally. It > just isn't worth the effort or risk. > > Thanks. > Okay, I agree with your opinion, but I am sure these changes don't carry any risk after tests and theoretical analysis.
Re: [RFC PATCH 1/1] mm/percpu.c: fix memory leakage issue when allocate a odd alignment area
On 2016/10/14 8:28, Tejun Heo wrote: > Hello, > > On Fri, Oct 14, 2016 at 08:23:06AM +0800, zijun_hu wrote: >> for the current code, only a power of 2 alignment value works well >> >> is it acceptable to perform a power of 2 check and return an error code >> if it fails? > > Yeah, just add is_power_of_2() test to the existing sanity check. > > Thanks. > Okay, I will do that.
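The agreed-on check is the classic single-bit trick behind the kernel's is_power_of_2() (include/linux/log2.h); reproduced here as a sketch, not the kernel source:

```c
#include <assert.h>

/* A power of 2 has exactly one bit set, so n & (n - 1) clears it to
 * zero; the n != 0 guard rejects zero, which the mask test alone
 * would wrongly accept. */
static int is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```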
Re: [RFC PATCH 1/1] mm/percpu.c: fix several trivial issues
On 2016/10/14 8:34, Tejun Heo wrote: > On Tue, Oct 11, 2016 at 09:29:27PM +0800, zijun_hu wrote: >> From: zijun_hu <zijun...@htc.com> >> >> as shown by pcpu_setup_first_chunk(), the first chunk is the same as the >> reserved chunk if the reserved size is nonzero but the dynamic one is zero >> this special scenario is referred to as the special case in the content below >> >> fix several trivial issues: >> >> 1) correct or fix several comments >> the LSB of a chunk->map element is used as the free/in-use flag and is cleared >> for a free area and set for an in-use one, rather than using a positive/negative number >> to mark the area state. >> >> 2) change atomic size to PAGE_SIZE for consistency when CONFIG_SMP == n >> both the default setup_per_cpu_areas() and pcpu_page_first_chunk() >> use PAGE_SIZE as the atomic size when CONFIG_SMP == y; however >> setup_per_cpu_areas() allocates memory for the only unit with alignment >> PAGE_SIZE but assigns the unit size to the atomic size when CONFIG_SMP == n, so the >> atomic size isn't consistent with either the alignment or the SMP ones. >> fix it by changing the atomic size to PAGE_SIZE when CONFIG_SMP == n >> >> 3) correct an empty and populated pages statistics error >> in order to service dynamic atomic memory allocation, the number of empty >> and populated pages of chunks is counted to stay above a low threshold. >> however, for the special case, the first chunk is taken into account by >> pcpu_setup_first_chunk(); this is meaningless since the chunk doesn't include >> any dynamic areas. >> fix it by excluding the reserved chunk before the statistics as the other >> contexts do. 
>> >> 4) fix potential memory leakage in percpu_init_late() >> in order to manage chunk->map memory uniformly, for the first and reserved >> chunks, percpu_init_late() will allocate memory to replace the static >> chunk->map array within section .init.data after slab is brought up >> however, for the special case, memory is allocated for the same chunk->map >> twice since the first chunk reference is the same as the reserved one, so the >> memory allocated the first time is obviously leaked. >> fix it by eliminating the second memory allocation under the special case >> >> Signed-off-by: zijun_hu <zijun...@htc.com> > > Can you please break the changes into separate patches? Yes, I can. Could you give comments on these trivial issues first? I will then separate out and formally produce the ones you think are reasonable. thanks > > Thanks. >
Re: [RFC PATCH 1/1] mm/percpu.c: fix memory leakage issue when allocate a odd alignment area
On 2016/10/14 7:31, Tejun Heo wrote: > On Tue, Oct 11, 2016 at 09:24:50PM +0800, zijun_hu wrote: >> From: zijun_hu <zijun...@htc.com> >> >> the LSB of a chunk->map element is used as the free/in-use flag of an area >> and the other bits for the offset; the sufficient and necessary condition of >> this usage is that both the size and alignment of an area must be even numbers >> however, pcpu_alloc() doesn't explicitly force its @align parameter to an even >> number, so an odd @align may cause a series of errors, see the below >> example for concrete descriptions. >> >> let's assume area [16, 36) is free but its previous one is in-use, and we want >> to allocate a @size == 8 and @align == 7 area. the larger area [16, 36) is >> split into three areas [16, 21), [21, 29), [29, 36) eventually. however, due >> to the usage of a chunk->map element, the actual offset of the aim area >> [21, 29) is 21 but is recorded in the relevant element as 20; moreover the >> residual tail free area [29, 36) is mistaken as in-use and is lost silently >> >> as explained above, an inaccurate offset or free/in-use state of >> an area is recorded into the relevant chunk->map element if an odd >> alignment area is requested, and so causes a memory leakage issue >> >> fix it by forcing the @align of an area to allocate to an even number >> as is done for @size. >> >> BTW, macro ALIGN() within pcpu_fit_in_area() is replaced by roundup() too >> for the above reason. in order to align a value @v up to an @a boundary, macro >> roundup(v, a) is more generic than ALIGN(x, a); the latter doesn't work >> well when @a isn't a power of 2 value. for example, roundup(10, 6) == 12 >> but ALIGN(10, 6) == 10, and the former result is obviously the desired one >> >> Signed-off-by: zijun_hu <zijun...@htc.com> > > Nacked-by: Tejun Heo <t...@kernel.org> > > This is a fix for an imaginary problem. The most we should do about > odd alignment is triggering a WARN_ON. 
> For the current code, only a power of 2 alignment value works well. Is it acceptable to perform a power of 2 check and return an error code on failure?
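The encoding that makes odd offsets dangerous can be modeled in a few lines: a chunk->map element keeps the offset in its high bits and the free/in-use flag in bit 0, so only even offsets survive a round trip. The helpers below are an illustrative toy model, not the kernel's representation verbatim:

```c
#include <assert.h>

/* Bit 0 = in-use flag, remaining bits = offset. */
static int map_entry(int off, int in_use) { return off | in_use; }
static int map_offset(int e)              { return e & ~1; }
static int map_in_use(int e)              { return e & 1; }
```

With the odd offset 21 from the commit message, the stored entry decodes back as offset 20, and a free area at offset 21 even reads back as in-use, matching the silently-lost-area symptom described above.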
Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
On 2016/10/14 7:29, Tejun Heo wrote: > On Tue, Oct 11, 2016 at 10:00:28PM +0800, zijun_hu wrote: >> From: zijun_hu <zijun...@htc.com> >> >> as shown by pcpu_build_alloc_info(), the number of units within a percpu >> group is deduced by rounding up the number of CPUs within the group to the >> @upa boundary; therefore, the number of CPUs isn't equal to the number of units >> if it isn't aligned to @upa normally. however, pcpu_page_first_chunk() >> uses BUG_ON() to roughly assert that one number is equal to the other, so a panic >> may be triggered falsely by the BUG_ON(). >> >> in order to fix this issue, the number of CPUs is rounded up then compared >> with the number of units, and the BUG_ON() is replaced by a warning and returning an error code >> as well to keep the system alive as much as possible. > > I really can't decode what the actual issue is here. Can you please > give an example of a concrete case? > the right relationship between the number of CPUs @nr_cpus within a percpu group and the number of units @nr_units within the same group is that @nr_units == roundup(@nr_cpus, @upa); the process of consideration is shown as follows: 1) current code segments: BUG_ON(ai->nr_groups != 1); BUG_ON(ai->groups[0].nr_units != num_possible_cpus()); 2) changes considering the right relationship between the number of CPUs and units: BUG_ON(ai->nr_groups != 1); BUG_ON(ai->groups[0].nr_units != roundup(num_possible_cpus(), @upa)); 3) replace BUG_ON() with a warning and returning an error code, since it seems BUG_ON() isn't nice as shown by Linus' recent LKML mail: BUG_ON(ai->nr_groups != 1); if (ai->groups[0].nr_units != roundup(num_possible_cpus(), @upa)) return -EINVAL; so 3) is my final change; for the relationship of both numbers, see the reply to Andrew >> @@ -2113,21 +2120,22 @@ int __init pcpu_page_first_chunk(size_t >> reserved_size, >> >> /* allocate pages */ >> j = 0; >> -for (unit = 0; unit < num_possible_cpus(); unit++) >> +for (unit = 0; unit < num_possible_cpus(); unit++) { >> +unsigned int cpu = 
ai->groups[0].cpu_map[unit]; >> for (i = 0; i < unit_pages; i++) { >> -unsigned int cpu = ai->groups[0].cpu_map[unit]; >> void *ptr; >> >> ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE); >> if (!ptr) { >> pr_warn("failed to allocate %s page for >> cpu%u\n", >> -psize_str, cpu); >> +psize_str, cpu); > > And stop making gratuitous changes? > This change is just to look nicer; @cpu can be determined once in the outer loop. > Thanks. >
Re: [RFC v2 PATCH] mm/percpu.c: simplify grouping CPU algorithm
On 2016/10/14 7:37, Tejun Heo wrote:
> Hello, Zijun.
>
> On Tue, Oct 11, 2016 at 08:48:45PM +0800, zijun_hu wrote:
>> Compared with the original algorithm, both theoretically and in
>> practice, the new one produces the same grouping results; besides, it
>> is more efficient, simpler, and easier to understand.
>
> If the original code wasn't broken and the new code produces the same
> output, I'd really not mess with this code. There simply is no upside
> to messing with this code. It's run once during boot and never a
> noticeable contributor of boot overhead. Maybe the new code is a bit
> simpler and more efficient but the actual benefit is so small that any
> risk would outweigh it.
>
> Thanks.

The main intent of this change is to make the CPU grouping algorithm
easier to understand, especially for newcomers to memory management.
Taking myself as an example, it really took me a long time to
understand it.
Re: [RFC PATCH 1/1] mm/vmalloc.c: correct logic errors when insert vmap_area
Hi Nicholas,

I find that __insert_vmap_area() was introduced by you. Could you offer
comments on this patch, which is related to that function?

Thanks.

On 10/12/2016 10:46 PM, Michal Hocko wrote:
> [Let's CC Nick who has written this code]
>
> On Wed 12-10-16 22:30:13, zijun_hu wrote:
>> From: zijun_hu <zijun...@htc.com>
>>
>> The KVA allocator organizes the allocated vmap_areas in an rbtree. To
>> insert a new vmap_area @i_va into the rbtree, walk the rbtree from the
>> root and compare each vmap_area @t_va met on the tree against @i_va:
>> walk toward the left branch of @t_va if @i_va is lower than @t_va, and
>> toward the right branch if higher; otherwise, handle the error case,
>> since @i_va overlaps @t_va. However, __insert_vmap_area() doesn't
>> follow this procedure correctly; moreover, it includes a meaningless
>> else-if condition and a redundant else branch, as shown by the
>> comments in the code segment below:
>>
>> static void __insert_vmap_area(struct vmap_area *va)
>> {
>> 	/* as an internal interface parameter, vmap_area @va is assumed
>> 	 * to have nonzero size */
>> 	...
>> 	if (va->va_start < tmp->va_end)
>> 		p = &(*p)->rb_left;
>> 	else if (va->va_end > tmp->va_start)
>> 		p = &(*p)->rb_right;
>> 	/* this else-if condition is always true and meaningless, because
>> 	 * va->va_end > va->va_start >= tmp->va_end > tmp->va_start
>> 	 * holds normally */
>> 	else
>> 		BUG();
>> 	/* this BUG() is meaningless too, because it is never reached
>> 	 * normally */
>> 	...
>> }
>>
>> It looks as if the else-if condition and the else branch cancel out.
>> No errors are caused, because a vmap_area @va passed to this internal
>> interface normally doesn't overlap any area already on the rbtree.
>> However, __insert_vmap_area() looks weird and really has several
>> logic errors, as pointed out above, when viewed as a standalone
>> function.
>
> I have tried to read this several times but I am completely lost to
> understand what the actual bug is and how it causes vmap_area sorting
> to misbehave. So is this a correctness issue, performance improvement
> or theoretical fix for an incorrect input?
>
>> Fix by walking the vmap_area rbtree as described above to insert a
>> vmap_area.
>>
>> BTW, (va->va_end == tmp_va->va_start) is considered a legal case,
>> since it indicates that vmap_area @va is a tight left neighbor of
>> @tmp_va.
>>
>> Fixes: db64fe02258f ("mm: rewrite vmap layer")
>> Signed-off-by: zijun_hu <zijun...@htc.com>
>> ---
>>  mm/vmalloc.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 5daf3211b84f..8b80931654b7 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -321,10 +321,10 @@ static void __insert_vmap_area(struct vmap_area *va)
>>
>>  		parent = *p;
>>  		tmp_va = rb_entry(parent, struct vmap_area, rb_node);
>> -		if (va->va_start < tmp_va->va_end)
>> -			p = &(*p)->rb_left;
>> -		else if (va->va_end > tmp_va->va_start)
>> -			p = &(*p)->rb_right;
>> +		if (va->va_end <= tmp_va->va_start)
>> +			p = &(*p)->rb_left;
>> +		else if (va->va_start >= tmp_va->va_end)
>> +			p = &(*p)->rb_right;
>>  		else
>>  			BUG();
>>  	}
>> --
>> 1.9.1
Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
On 10/13/2016 05:41 AM, Andrew Morton wrote:
> On Tue, 11 Oct 2016 22:00:28 +0800 zijun_hu <zijun...@zoho.com> wrote:
>
>> As shown by pcpu_build_alloc_info(), the number of units within a percpu
>> group is derived by rounding up the number of CPUs within the group to
>> the @upa boundary; therefore, the number of CPUs is not equal to the
>> number of units if it is not aligned to @upa. However,
>> pcpu_page_first_chunk() uses BUG_ON() to assert that the two numbers
>> are equal, so the BUG_ON() may be triggered falsely.
>>
>> To fix this issue, round up the number of CPUs before comparing it with
>> the number of units, and replace the BUG_ON() with a warning plus an
>> error return code, to keep the system alive as much as possible.
>
> Under what circumstances is this triggered? In other words, what are
> the end-user visible effects of the fix?

The BUG_ON() fires when the number of CPUs is not aligned to @upa, even
though that is a normal circumstance in which it should not be
triggered. The aim of this fix is to prevent the BUG_ON() from firing
in that case. See the original code segments below for the reason.

pcpu_build_alloc_info()
{
	...
	for_each_possible_cpu(cpu)
		if (group_map[cpu] == group)
			gi->cpu_map[gi->nr_units++] = cpu;
	gi->nr_units = roundup(gi->nr_units, upa);
	/* the number of CPUs belonging to a group is accumulated into
	 * the relevant @gi->nr_units, which is then rounded up to @upa */
	unit += gi->nr_units;
	...
}

pcpu_page_first_chunk()
{
	...
	ai = pcpu_build_alloc_info(reserved_size, 0, PAGE_SIZE, NULL);
	if (IS_ERR(ai))
		return PTR_ERR(ai);
	BUG_ON(ai->nr_groups != 1);
	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());
	/* there is only one group and all CPUs belong to it, but the
	 * number of CPUs is compared with the number of units directly */
	...
}

As shown by the comments above, ai->groups[0].nr_units should equal
roundup(num_possible_cpus(), upa), not num_possible_cpus() directly.

> I mean, this is pretty old code (isn't it?) so what are you doing that
> triggers this?

I am studying the memory management source; I found the inconsistency
and think the BUG_ON() may be triggered in this unusual but possible
case. It may be a logic error.
Re: [RFC PATCH 1/1] mm/vmalloc.c: correct logic errors when insert vmap_area
On 2016/10/12 22:46, Michal Hocko wrote:
> [Let's CC Nick who has written this code]
>
> On Wed 12-10-16 22:30:13, zijun_hu wrote:
>> From: zijun_hu <zijun...@htc.com>
>>
>> The KVA allocator organizes the allocated vmap_areas in an rbtree. To
>> insert a new vmap_area @i_va into the rbtree, walk the rbtree from the
>> root and compare each vmap_area @t_va met on the tree against @i_va:
>> walk toward the left branch of @t_va if @i_va is lower than @t_va, and
>> toward the right branch if higher; otherwise, handle the error case,
>> since @i_va overlaps @t_va. However, __insert_vmap_area() doesn't
>> follow this procedure correctly; moreover, it includes a meaningless
>> else-if condition and a redundant else branch. It looks as if the
>> else-if condition and the else branch cancel out. No errors are
>> caused, because a vmap_area @va passed to this internal interface
>> normally doesn't overlap any area already on the rbtree. However,
>> __insert_vmap_area() looks weird and really has several logic errors
>> when viewed as a standalone function.
>
> I have tried to read this several times but I am completely lost to
> understand what the actual bug is and how it causes vmap_area sorting
> to misbehave. So is this a correctness issue, performance improvement
> or theoretical fix for an incorrect input?

There are several logic errors in this function. The current code is:

static void __insert_vmap_area(struct vmap_area *va)
{
	...
	if (va->va_start < tmp->va_end)
		p = &(*p)->rb_left;
	else if (va->va_end > tmp->va_start)
		p = &(*p)->rb_right;
	else
		BUG();
	...
}

which is equivalent to the following code:

static void __insert_vmap_area(struct vmap_area *va)
{
	...
	if (va->va_start < tmp->va_end)
		p = &(*p)->rb_left;
	else
		p = &(*p)->rb_right;
	...
}

As shown above, in the current code the condition

	else if (va->va_end > tmp->va_start)

is meaningless, since it is always true, and the else branch

	BUG();

is meaningless too, since it is never reached. It seems there are logic
errors in the function. The code we expect is:

static void __insert_vmap_area(struct vmap_area *va)
{
	...
	if (va->va_end <= tmp_va->va_start)
		p = &(*p)->rb_left;
	else if (va->va_start >= tmp_va->va_end)
		p = &(*p)->rb_right;
	else
		BUG();
	...
}

>> Fix by walking the vmap_area rbtree as described above to insert a
>> vmap_area.
>>
>> BTW, (va->va_end == tmp_va->va_start) is considered a legal case,
>> since it indicates that vmap_area @va is a tight left neighbor of
>> @tmp_va.
>>
>> Fixes: db64fe02258f ("mm: rewrite vmap layer")
>> Signed-off-by: zijun_hu <zijun...@htc.com>
>> ---
>>  mm/vmalloc.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 5daf3211b84f..8b80931654b7 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -321,10 +321,10 @@ static void __insert_vmap_area(struct vmap_area *va)
>>
>>  		parent = *p;
>>  		tmp_va = rb_entry(parent, struct vmap_area, rb_node);
>> -		if (va->va_start < tmp_va->va_end)
>> -			p = &(*p)->rb_left;
>> -		else if (va->va_end > tmp_va->va_start)
>> -			p = &(*p)->rb_right;
>> +		if (va->va_end <= tmp_va->va_start)
>> +			p = &(*p)->rb_left;
>> +		else if (va->va_start >= tmp_va->va_end)
>> +			p = &(*p)->rb_right;
>>  		else
>>  			BUG();
>>  	}
>> --
>> 1.9.1
[RFC PATCH 1/1] mm/vmalloc.c: correct logic errors when insert vmap_area
From: zijun_hu <zijun...@htc.com>

the KVA allocator organizes allocated vmap_areas in an rbtree. in order to
insert a new vmap_area @i_va into the rbtree, walk the rbtree from the
root and compare each vmap_area @t_va met on the rbtree against @i_va;
walk toward the left branch of @t_va if @i_va is lower than @t_va, and the
right branch if higher, otherwise handle it as an error case since @i_va
overlaps @t_va; however, __insert_vmap_area() doesn't follow this
procedure; moreover, it includes a meaningless else if condition and an
unreachable else branch, as shown by the comments in the code segment
below:

static void __insert_vmap_area(struct vmap_area *va)
{
	/* as an internal interface parameter, vmap_area @va is assumed
	 * to have nonzero size */
	...
	if (va->va_start < tmp->va_end)
		p = &(*p)->rb_left;
	else if (va->va_end > tmp->va_start)
		p = &(*p)->rb_right;
	/* this else if condition is always true and meaningless due to
	 * va->va_end > va->va_start >= tmp_va->va_end > tmp_va->va_start
	 * normally */
	else
		BUG();
	/* this BUG() is meaningless too since it is never reached
	 * normally */
	...
}

it looks like the else if condition and else branch cancel each other. no
errors are caused since the vmap_area @va to insert, as an internal
interface parameter, doesn't normally overlap any area already on the
rbtree. however, __insert_vmap_area() looks weird and really has several
logic errors as pointed out above when it is viewed as a separate
function.

fix by walking the vmap_area rbtree as described above to insert a
vmap_area.

BTW, (va->va_end == tmp_va->va_start) is considered a legal case since it
indicates vmap_area @va neighbors @tmp_va tightly on the left.
Fixes: db64fe02258f ("mm: rewrite vmap layer")
Signed-off-by: zijun_hu <zijun...@htc.com>
---
 mm/vmalloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5daf3211b84f..8b80931654b7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -321,10 +321,10 @@ static void __insert_vmap_area(struct vmap_area *va)
 
 		parent = *p;
 		tmp_va = rb_entry(parent, struct vmap_area, rb_node);
-		if (va->va_start < tmp_va->va_end)
-			p = &(*p)->rb_left;
-		else if (va->va_end > tmp_va->va_start)
-			p = &(*p)->rb_right;
+		if (va->va_end <= tmp_va->va_start)
+			p = &(*p)->rb_left;
+		else if (va->va_start >= tmp_va->va_end)
+			p = &(*p)->rb_right;
 		else
 			BUG();
 	}
-- 
1.9.1
[RESEND RFC PATCH v2 1/1] mm/vmalloc.c: simplify /proc/vmallocinfo implementation
From: zijun_hu <zijun...@htc.com>

many seq_file helpers exist for simplifying the implementation of virtual
files, especially for /proc nodes. however, the helpers for iteration
over a list_head are available but aren't adopted to implement
/proc/vmallocinfo currently.

simplify the /proc/vmallocinfo implementation by using the existing
seq_file helpers.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
Changes in v2:
 - the redundant type cast is removed as advised by rient...@google.com
 - commit messages are updated

 mm/vmalloc.c | 27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f2481cb4e6b2..e73948afac70 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2574,32 +2574,13 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_area_lock)
 {
-	loff_t n = *pos;
-	struct vmap_area *va;
-
 	spin_lock(&vmap_area_lock);
-	va = list_first_entry(&vmap_area_list, typeof(*va), list);
-	while (n > 0 && &va->list != &vmap_area_list) {
-		n--;
-		va = list_next_entry(va, list);
-	}
-	if (!n && &va->list != &vmap_area_list)
-		return va;
-
-	return NULL;
-
+	return seq_list_start(&vmap_area_list, *pos);
 }
 
 static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 {
-	struct vmap_area *va = p, *next;
-
-	++*pos;
-	next = list_next_entry(va, list);
-	if (&next->list != &vmap_area_list)
-		return next;
-
-	return NULL;
+	return seq_list_next(p, &vmap_area_list, pos);
 }
 
 static void s_stop(struct seq_file *m, void *p)
@@ -2634,9 +2615,11 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 static int s_show(struct seq_file *m, void *p)
 {
-	struct vmap_area *va = p;
+	struct vmap_area *va;
 	struct vm_struct *v;
 
+	va = list_entry(p, struct vmap_area, list);
+
 	/*
 	 * s_show can encounter race with remove_vm_area, !VM_VM_AREA on
 	 * behalf of vmap area is being tear down or vm_map_ram allocation.
-- 
1.9.1
Re: [RFC PATCH 1/1] mm/percpu.c: fix memory leakage issue when allocate a odd alignment area
On 10/12/2016 05:54 PM, Michal Hocko wrote:
> On Wed 12-10-16 16:44:31, zijun_hu wrote:
>> On 10/12/2016 04:25 PM, Michal Hocko wrote:
>>> On Wed 12-10-16 15:24:33, zijun_hu wrote:
> [...]
>>>> i found the following code segments in mm/vmalloc.c
>>>> static struct vmap_area *alloc_vmap_area(unsigned long size,
>>>> 				unsigned long align,
>>>> 				unsigned long vstart, unsigned long vend,
>>>> 				int node, gfp_t gfp_mask)
>>>> {
>>>> 	...
>>>>
>>>> 	BUG_ON(!size);
>>>> 	BUG_ON(offset_in_page(size));
>>>> 	BUG_ON(!is_power_of_2(align));
>>>
>>> See a recent Linus rant about BUG_ONs. These BUG_ONs are quite old and
>>> from a quick look they are even unnecessary. So rather than adding more
>>> of those, I think removing those that are not needed is much more
>>> preferred.
>>>
>> i notice that, and the above code segment is used to illustrate that
>> input parameter checking is necessary sometimes
>
> Why do you think it is necessary here?
>
i am sorry for replying late
i don't know whether it is necessary
i just find there are so many sanity checks in the current internal
interfaces
Re: [RFC PATCH 1/1] mm/percpu.c: fix memory leakage issue when allocate a odd alignment area
On 10/12/2016 04:25 PM, Michal Hocko wrote:
> On Wed 12-10-16 15:24:33, zijun_hu wrote:
>> On 10/12/2016 02:53 PM, Michal Hocko wrote:
>>> On Wed 12-10-16 08:28:17, zijun_hu wrote:
>>>> On 2016/10/12 1:22, Michal Hocko wrote:
>>>>> On Tue 11-10-16 21:24:50, zijun_hu wrote:
>>>>>> From: zijun_hu <zijun...@htc.com>
>>>>>>
>> should we have a generic discussion whether such patches which consider
>> many boundary or rare conditions are necessary?
>
> In general, I believe that kernel internal interfaces which have no
> userspace exposure shouldn't be cluttered with sanity checks.
>
you are right and i agree with you. but there are many internal
interfaces that perform sanity checks in the current linux sources

>> i found the following code segments in mm/vmalloc.c
>> static struct vmap_area *alloc_vmap_area(unsigned long size,
>> 				unsigned long align,
>> 				unsigned long vstart, unsigned long vend,
>> 				int node, gfp_t gfp_mask)
>> {
>> 	...
>>
>> 	BUG_ON(!size);
>> 	BUG_ON(offset_in_page(size));
>> 	BUG_ON(!is_power_of_2(align));
>
> See a recent Linus rant about BUG_ONs. These BUG_ONs are quite old and
> from a quick look they are even unnecessary. So rather than adding more
> of those, I think removing those that are not needed is much more
> preferred.
>
i notice that, and the above code segment is used to illustrate that
input parameter checking is necessary sometimes

>> should we make the below declarations conventions?
>> 1) when we say 'alignment', it means aligning to a power of 2 value
>>    for example, aligning value @v to @b implies @b is a power of 2;
>>    aligning 10 to 4 is 12
>
> alignment other than power-of-two makes only very limited sense to me.
>
you are right and i agree with you.

>> 2) when we say 'round value @v up/down to boundary @b', it means the
>>    result is a multiple of @b; it doesn't require @b to be a power of 2
>
i will write to linus to ask for opinions on whether we should declare
the meaning of 'align' and 'round up/down' formally and whether such
patches are necessary
Re: [RFC PATCH 1/1] mm/percpu.c: fix memory leakage issue when allocate a odd alignment area
On 10/12/2016 02:53 PM, Michal Hocko wrote:
> On Wed 12-10-16 08:28:17, zijun_hu wrote:
>> On 2016/10/12 1:22, Michal Hocko wrote:
>>> On Tue 11-10-16 21:24:50, zijun_hu wrote:
>>>> From: zijun_hu <zijun...@htc.com>
>>>>
>>>> the LSB of a chunk->map element is used for the free/in-use flag of
>>>> an area and the other bits for the offset; the sufficient and
>>>> necessary condition of this usage is that both the size and alignment
>>>> of an area must be even numbers. however, pcpu_alloc() doesn't force
>>>> its @align parameter to be an even number explicitly, so an odd
>>>> @align may cause a series of errors; see the example below for
>>>> concrete descriptions.
>>>
>>> Is or was there any user who would use a different than even (or power
>>> of 2) alignment? If not is this really worth handling?
>>>
>> it seems only a power of 2 alignment except 1 can make sure it works
>> very well; that is a strict limit, maybe this more strict limit should
>> be checked
>
> I fail to see how any other alignment would actually make any sense
> what so ever. Look, I am not a maintainer of this code but adding new
> code to catch something that doesn't make any sense sounds dubious at
> best to me.
>
> I could understand this patch if you see a problem and want to prevent
> it from repeating, but doing these kind of changes just in case sounds
> like a bad idea.
>
thanks for your reply
should we have a generic discussion whether such patches which consider
many boundary or rare conditions are necessary?

i found the following code segments in mm/vmalloc.c
static struct vmap_area *alloc_vmap_area(unsigned long size,
				unsigned long align,
				unsigned long vstart, unsigned long vend,
				int node, gfp_t gfp_mask)
{
	...

	BUG_ON(!size);
	BUG_ON(offset_in_page(size));
	BUG_ON(!is_power_of_2(align));

should we make the below declarations conventions?
1) when we say 'alignment', it means aligning to a power of 2 value
   for example, aligning value @v to @b implies @b is a power of 2;
   aligning 10 to 4 is 12
2) when we say 'round value @v up/down to boundary @b', it means the
   result is a multiple of @b; it doesn't require @b to be a power of 2
Re: [RFC PATCH 1/1] mm/percpu.c: fix memory leakage issue when allocate a odd alignment area
On 2016/10/12 1:22, Michal Hocko wrote:
> On Tue 11-10-16 21:24:50, zijun_hu wrote:
>> From: zijun_hu <zijun...@htc.com>
>>
>> the LSB of a chunk->map element is used for the free/in-use flag of an
>> area and the other bits for the offset; the sufficient and necessary
>> condition of this usage is that both the size and alignment of an area
>> must be even numbers. however, pcpu_alloc() doesn't force its @align
>> parameter to be an even number explicitly, so an odd @align may cause a
>> series of errors; see the example below for concrete descriptions.
>
> Is or was there any user who would use a different than even (or power
> of 2) alignment? If not is this really worth handling?
>
it seems only a power of 2 alignment except 1 can make sure it works very
well; that is a strict limit, maybe this more strict limit should be
checked

i don't know, since there are too many sources and too many users and too
many use cases. even if nobody does, i can't be sure it won't happen in
the future

it is worth it for the below reasons
1) if it is used in right ways, this patch has no impact; otherwise, it
   can alert the user with a warning message and correct the behavior.
   isn't a warning plus correction better than silently resulting in many
   terrible errors under a special case, by chance? it can make the
   program stronger.
2) does any alignment but 1 mean a power of 2 alignment conventionally
   and implicitly? if not, isn't it better to adjust both @align and
   @size uniformly based on the sufficient and necessary condition than
   to mix supposing one part is right and correcting the other?
   i find that there is a BUG_ON(!is_power_of_2(align)) statement in
   mm/vmalloc.c
3) this simple fix can make the function applicable in a wider range; it
   hints to the reader that the lowest requirement for alignment is an
   even number
4) for
   char a[10][10];
   char (*p)[10];
   if a user wants to allocate a @size = 10 and @align = 10 memory block,
   should we reject the user's request?
Re: [RFC PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
Hi all,
please ignore this patch since it includes a build error
i resend the fixed patch in v2 version
i am sorry for my incaution

On 2016/10/11 21:03, zijun_hu wrote:
> From: zijun_hu <zijun...@htc.com>
>
> as shown by pcpu_build_alloc_info(), the number of units within a percpu
> group is deduced by rounding up the number of CPUs within the group to
> the @upa boundary; therefore, the number of CPUs isn't equal to the
> units' if it isn't aligned to @upa normally. however,
> pcpu_page_first_chunk() uses BUG_ON() to assert that one number equals
> the other roughly, so a panic may be triggered by the BUG_ON() falsely.
>
> in order to fix this issue, the number of CPUs is rounded up then
> compared with the units'; the BUG_ON() is replaced by warning and
> returning an error code as well, to keep the system alive as much as
> possible.
>
> Signed-off-by: zijun_hu <zijun...@htc.com>
> ---
> mm/percpu.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 32e2d8d128c1..c2f0d9734d8c 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2095,6 +2095,8 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
>  	size_t pages_size;
>  	struct page **pages;
>  	int unit, i, j, rc;
> +	int upa;
> +	int nr_g0_units;
>
>  	snprintf(psize_str, sizeof(psize_str), "%luK", PAGE_SIZE >> 10);
>
> @@ -2102,7 +2104,12 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
>  	if (IS_ERR(ai))
>  		return PTR_ERR(ai);
>  	BUG_ON(ai->nr_groups != 1);
> -	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());
> +	upa = ai->alloc_size/ai->unit_size;
> +	g0_nr_units = roundup(num_possible_cpus(), upa);
> +	if (unlikely(WARN_ON(ai->groups[0].nr_units != nr_g0_units))) {
> +		pcpu_free_alloc_info(ai);
> +		return -EINVAL;
> +	}
>
>  	unit_pages = ai->unit_size >> PAGE_SHIFT;
>
> @@ -2113,21 +2120,22 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
>
>  	/* allocate pages */
>  	j = 0;
> -	for (unit = 0; unit < num_possible_cpus(); unit++)
> +	for (unit = 0; unit < num_possible_cpus(); unit++) {
> +		unsigned int cpu = ai->groups[0].cpu_map[unit];
>  		for (i = 0; i < unit_pages; i++) {
> -			unsigned int cpu = ai->groups[0].cpu_map[unit];
>  			void *ptr;
>
>  			ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
>  			if (!ptr) {
>  				pr_warn("failed to allocate %s page for cpu%u\n",
> -					psize_str, cpu);
> +						psize_str, cpu);
>  				goto enomem;
>  			}
>  			/* kmemleak tracks the percpu allocations separately */
>  			kmemleak_free(ptr);
>  			pages[j++] = virt_to_page(ptr);
>  		}
> +	}
>
>  	/* allocate vm area, map the pages and copy static data */
>  	vm.flags = VM_ALLOC;
>
[RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
From: zijun_hu <zijun...@htc.com>

as shown by pcpu_build_alloc_info(), the number of units within a percpu
group is deduced by rounding up the number of CPUs within the group to
the @upa boundary; therefore, the number of CPUs isn't equal to the number
of units if it isn't aligned to @upa. however, pcpu_page_first_chunk()
uses BUG_ON() to assert that one number equals the other, so a panic may
be triggered by the BUG_ON() falsely.

in order to fix this issue, the number of CPUs is rounded up then compared
with the number of units, and the BUG_ON() is replaced by a warning and an
error return as well, to keep the system alive as much as possible.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
Changes in v2:
 - fix build error

 mm/percpu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 32e2d8d128c1..ab1186c68ab6 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2095,6 +2095,8 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
 	size_t pages_size;
 	struct page **pages;
 	int unit, i, j, rc;
+	int upa;
+	int nr_g0_units;

 	snprintf(psize_str, sizeof(psize_str), "%luK", PAGE_SIZE >> 10);

@@ -2102,7 +2104,12 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
 	if (IS_ERR(ai))
 		return PTR_ERR(ai);
 	BUG_ON(ai->nr_groups != 1);
-	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());
+	upa = ai->alloc_size/ai->unit_size;
+	nr_g0_units = roundup(num_possible_cpus(), upa);
+	if (unlikely(WARN_ON(ai->groups[0].nr_units != nr_g0_units))) {
+		pcpu_free_alloc_info(ai);
+		return -EINVAL;
+	}

 	unit_pages = ai->unit_size >> PAGE_SHIFT;

@@ -2113,21 +2120,22 @@ int __init pcpu_page_first_chunk(size_t reserved_size,

 	/* allocate pages */
 	j = 0;
-	for (unit = 0; unit < num_possible_cpus(); unit++)
+	for (unit = 0; unit < num_possible_cpus(); unit++) {
+		unsigned int cpu = ai->groups[0].cpu_map[unit];
 		for (i = 0; i < unit_pages; i++) {
-			unsigned int cpu = ai->groups[0].cpu_map[unit];
 			void *ptr;

 			ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
 			if (!ptr) {
 				pr_warn("failed to allocate %s page for cpu%u\n",
-					psize_str, cpu);
+						psize_str, cpu);
 				goto enomem;
 			}
 			/* kmemleak tracks the percpu allocations separately */
 			kmemleak_free(ptr);
 			pages[j++] = virt_to_page(ptr);
 		}
+	}

 	/* allocate vm area, map the pages and copy static data */
 	vm.flags = VM_ALLOC;
--
1.9.1
[RFC PATCH 1/1] mm/percpu.c: fix several trivial issues
From: zijun_hu <zijun...@htc.com>

as shown by pcpu_setup_first_chunk(), the first chunk is the same as the
reserved chunk if the reserved size is nonzero but the dynamic size is
zero; this special scenario is referred to as the special case below.

fix several trivial issues:

1) correct or fix several comments
   the LSB of a chunk->map element is used as the free/in-use flag: it is
   cleared for a free area and set for an in-use one, rather than using a
   positive/negative number to mark the area state.

2) change the atomic size to PAGE_SIZE for consistency when CONFIG_SMP == n
   both the default setup_per_cpu_areas() and pcpu_page_first_chunk() use
   PAGE_SIZE as the atomic size when CONFIG_SMP == y; however, when
   CONFIG_SMP == n, setup_per_cpu_areas() allocates memory for the only
   unit with PAGE_SIZE alignment but assigns the unit size to the atomic
   size, so the atomic size is consistent with neither the alignment nor
   the SMP case. fix it by changing the atomic size to PAGE_SIZE when
   CONFIG_SMP == n.

3) correct an empty and populated pages statistics error
   in order to service dynamic atomic memory allocation, the number of
   empty and populated pages of chunks is counted to keep it above a low
   threshold. however, for the special case, the first chunk is taken into
   account by pcpu_setup_first_chunk(), which is meaningless since that
   chunk doesn't include any dynamic areas. fix it by excluding the
   reserved chunk before counting, as the other contexts do.

4) fix potential memory leakage in percpu_init_late()
   in order to manage chunk->map memory uniformly, for the first and
   reserved chunks, percpu_init_late() allocates memory to replace the
   static chunk->map arrays within section .init.data after slab is
   brought up. however, for the special case, memory is allocated for the
   same chunk->map twice since the first chunk reference is the same as
   the reserved one, so the memory allocated the first time is obviously
   leaked. fix it by eliminating the second memory allocation under the
   special case.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
 mm/percpu.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 26d1c73bd9e2..760a4730f6fd 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -36,8 +36,8 @@
  * chunk maps unnecessarily.
  *
  * Allocation state in each chunk is kept using an array of integers
- * on chunk->map.  A positive value in the map represents a free
- * region and negative allocated.  Allocation inside a chunk is done
+ * on chunk->map.  A LSB cleared value in the map represents a free
+ * region and set allocated.  Allocation inside a chunk is done
  * by scanning this map sequentially and serving the first matching
  * entry.  This is mostly copied from the percpu_modalloc() allocator.
  * Chunks can be determined from the address using the index field
@@ -93,9 +93,9 @@
 #endif
 #ifndef __pcpu_ptr_to_addr
 #define __pcpu_ptr_to_addr(ptr)					\
-	(void __force *)((unsigned long)(ptr) +			\
-			 (unsigned long)pcpu_base_addr -	\
-			 (unsigned long)__per_cpu_start)
+	(void __force *)((unsigned long)(ptr) -			\
+			 (unsigned long)__per_cpu_start +	\
+			 (unsigned long)pcpu_base_addr)
 #endif
 #else	/* CONFIG_SMP */
 /* on UP, it's always identity mapped */
@@ -583,6 +583,11 @@ static int pcpu_alloc_area(struct pcpu_chunk *chunk, int size, int align,
 		 * merge'em.  Note that 'small' is defined as smaller
 		 * than sizeof(int), which is very small but isn't too
 		 * uncommon for percpu allocations.
+		 *
+		 * it is unnecessary to append i > 0 to make sure p[-1]
+		 * available: the offset of the first area is 0 and is aligned
+		 * very well, no head free area is left if segments one piece
+		 * from it
 		 */
 		if (head && (head < sizeof(int) || !(p[-1] & 1))) {
 			*p = off += head;
@@ -1707,8 +1712,9 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,

 	/* link the first chunk in */
 	pcpu_first_chunk = dchunk ?: schunk;
-	pcpu_nr_empty_pop_pages +=
-		pcpu_count_occupied_pages(pcpu_first_chunk, 1);
+	if (pcpu_first_chunk != pcpu_reserved_chunk)
+		pcpu_nr_empty_pop_pages +=
+			pcpu_count_occupied_pages(pcpu_first_chunk, 1);
 	pcpu_chunk_relocate(pcpu_first_chunk, -1);

 	/* we're done */
@@ -2266,7 +2272,7 @@ void __init setup_per_cpu_areas(void)

 	ai->dyn_size = unit_size;
 	ai->unit_size = unit_size;
-	ai->atom_size = unit_size;
+	ai->atom_size = PAGE_SIZE;
 	ai->alloc_size = unit_size;
 	ai->groups[0].nr_units = 1;
 	ai->groups[0].cpu_map[0] = 0
[RFC PATCH 1/1] mm/percpu.c: fix memory leakage issue when allocating an odd alignment area
From: zijun_hu <zijun...@htc.com>

the LSB of a chunk->map element is used as the free/in-use flag of an area
and the other bits for the offset; the sufficient and necessary condition
for this usage is that both the size and the alignment of an area must be
even numbers.

however, pcpu_alloc() doesn't force its @align parameter to an even number
explicitly, so an odd @align may cause a series of errors; see the below
example for a concrete description.

let's assume area [16, 36) is free but its previous one is in-use, and we
want to allocate an area with @size == 8 and @align == 7. the larger area
[16, 36) is split into three areas [16, 21), [21, 29), [29, 36) eventually.
however, due to the usage of a chunk->map element, the actual offset of the
target area [21, 29) is 21 but it is recorded in the relevant element as
20; moreover, the residual tail free area [29, 36) is mistaken as in-use
and is lost silently.

as explained above, either an inaccurate offset or a wrong free/in-use
state of an area is recorded into the relevant chunk->map element when an
odd-alignment area is requested, which causes a memory leakage issue.

fix it by forcing the @align of an area to allocate to an even number, as
is done for @size.

BTW, the macro ALIGN() within pcpu_fit_in_area() is replaced by roundup()
too, for the same reason: in order to align a value @v up to an @a
boundary, the macro roundup(v, a) is more generic than ALIGN(x, a); the
latter doesn't work well when @a isn't a power of 2 value. for example,
roundup(10, 6) == 12 but ALIGN(10, 6) == 10; the former result is
obviously the desired one.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
 include/linux/kernel.h | 1 +
 mm/percpu.c            | 6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 74fd6f05bc5b..ddf46638ef21 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -45,6 +45,7 @@

 #define REPEAT_BYTE(x)	((~0ul / 0xff) * (x))

+/* @a is a power of 2 value */
 #define ALIGN(x, a)		__ALIGN_KERNEL((x), (a))
 #define __ALIGN_MASK(x, mask)	__ALIGN_KERNEL_MASK((x), (mask))
 #define PTR_ALIGN(p, a)		((typeof(p))ALIGN((unsigned long)(p), (a)))
diff --git a/mm/percpu.c b/mm/percpu.c
index c2f0d9734d8c..26d1c73bd9e2 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -502,7 +502,7 @@ static int pcpu_fit_in_area(struct pcpu_chunk *chunk, int off, int this_size,
 	int cand_off = off;

 	while (true) {
-		int head = ALIGN(cand_off, align) - off;
+		int head = roundup(cand_off, align) - off;
 		int page_start, page_end, rs, re;

 		if (this_size < head + size)
@@ -879,11 +879,13 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,

 	/*
 	 * We want the lowest bit of offset available for in-use/free
-	 * indicator, so force >= 16bit alignment and make size even.
+	 * indicator, so force alignment >= 2 even and make size even.
 	 */
 	if (unlikely(align < 2))
 		align = 2;

+	if (WARN_ON_ONCE(!IS_ALIGNED(align, 2)))
+		align = ALIGN(align, 2);
 	size = ALIGN(size, 2);

 	if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
--
1.9.1
[RFC PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
From: zijun_hu <zijun...@htc.com>

as shown by pcpu_build_alloc_info(), the number of units within a percpu
group is deduced by rounding up the number of CPUs within the group to
the @upa boundary; therefore, the number of CPUs isn't equal to the number
of units if it isn't aligned to @upa. however, pcpu_page_first_chunk()
uses BUG_ON() to assert that one number equals the other, so a panic may
be triggered by the BUG_ON() falsely.

in order to fix this issue, the number of CPUs is rounded up then compared
with the number of units, and the BUG_ON() is replaced by a warning and an
error return as well, to keep the system alive as much as possible.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
 mm/percpu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 32e2d8d128c1..c2f0d9734d8c 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2095,6 +2095,8 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
 	size_t pages_size;
 	struct page **pages;
 	int unit, i, j, rc;
+	int upa;
+	int nr_g0_units;

 	snprintf(psize_str, sizeof(psize_str), "%luK", PAGE_SIZE >> 10);

@@ -2102,7 +2104,12 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
 	if (IS_ERR(ai))
 		return PTR_ERR(ai);
 	BUG_ON(ai->nr_groups != 1);
-	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());
+	upa = ai->alloc_size/ai->unit_size;
+	g0_nr_units = roundup(num_possible_cpus(), upa);
+	if (unlikely(WARN_ON(ai->groups[0].nr_units != nr_g0_units))) {
+		pcpu_free_alloc_info(ai);
+		return -EINVAL;
+	}

 	unit_pages = ai->unit_size >> PAGE_SHIFT;

@@ -2113,21 +2120,22 @@ int __init pcpu_page_first_chunk(size_t reserved_size,

 	/* allocate pages */
 	j = 0;
-	for (unit = 0; unit < num_possible_cpus(); unit++)
+	for (unit = 0; unit < num_possible_cpus(); unit++) {
+		unsigned int cpu = ai->groups[0].cpu_map[unit];
 		for (i = 0; i < unit_pages; i++) {
-			unsigned int cpu = ai->groups[0].cpu_map[unit];
 			void *ptr;

 			ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
 			if (!ptr) {
 				pr_warn("failed to allocate %s page for cpu%u\n",
-					psize_str, cpu);
+						psize_str, cpu);
 				goto enomem;
 			}
 			/* kmemleak tracks the percpu allocations separately */
 			kmemleak_free(ptr);
 			pages[j++] = virt_to_page(ptr);
 		}
+	}

 	/* allocate vm area, map the pages and copy static data */
 	vm.flags = VM_ALLOC;
--
1.9.1
Re: [RFC v2 PATCH] mm/percpu.c: simplify grouping CPU algorithm
On 2016/10/11 20:48, zijun_hu wrote:
> From: zijun_hu <zijun...@htc.com>
> in order to verify the new algorithm, we enumerate many pairs of a
> @pcpu_fc_cpu_distance_fn_t function and the relevant CPU IDs array such
> as the below sample, then apply both algorithms to the same pair and
> print the grouping results separately; the new algorithm is okay after
> checking that the result printed by the new one is the same as the
> original.
> a sample pair of function and array is shown as follows:
> /* group CPUs by even/odd number */
> static int cpu_distance_fn0(int from, int to)
> {
> 	if (from % 2 ^ to % 2)
> 		return REMOTE_DISTANCE;
> 	else
> 		return LOCAL_DISTANCE;
> }
> /* end with -1 */
> int cpu_ids_0[] = {0, 1, 2, 3, 7, 8, 9, 11, 14, 17, 19, 20, 22, 24, -1};
>
> Signed-off-by: zijun_hu <zijun...@htc.com>
> Tested-by: zijun_hu <zijun...@htc.com>
> ---

how to test the new grouping CPU algorithm?
1) copy the test source code to a file named percpu_group_test.c
2) compile the test program with the below command line
   gcc -Wall -o percpu_group_test percpu_group_test.c
3) get usage info about percpu_group_test
   ./percpu_group_test -h
4) produce the grouping result by the new algorithm
   ./percpu_group_test new > percpu.new
5) produce the grouping result by the original algorithm
   ./percpu_group_test orig > percpu.orig
6) examine the test result; okay if the results are the same, otherwise
   failed
   diff -u percpu.new percpu.orig

the test program source is shown as follows:

#include #include #include #include #include #include #include

#define LOCAL_DISTANCE	10
#define REMOTE_DISTANCE	20
#define NR_CPUS		96

static int nr_groups;
static int group_map[NR_CPUS];
static int group_cnt[NR_CPUS];
static int *cpu_ids_ptr;
static int (*cpu_distance_fn)(int from, int to);

static int cpu_distance_fn0(int from, int to);
extern int cpu_ids_0[];
static int cpu_distance_fn1(int from, int to);
extern int cpu_ids_1[];
static int cpu_distance_fn2(int from, int to);
extern int cpu_ids_2[];

int (*cpu_distance_funcx[])(int, int) = {
	cpu_distance_fn0, cpu_distance_fn1, cpu_distance_fn2, NULL };
int *cpu_ids_x[] = { cpu_ids_0, cpu_ids_1, cpu_ids_2, NULL };

static void percpu_test_prepare(int test_type)
{
	nr_groups = 0;
	memset(group_map, 0xff, sizeof(group_map));
	memset(group_cnt, 0xff, sizeof(group_cnt));
	cpu_ids_ptr = cpu_ids_x[test_type];
	cpu_distance_fn = cpu_distance_funcx[test_type];
}

static int next_cpu(int cpu)
{
	int i = 0;

	while (cpu_ids_ptr[i] != cpu)
		i++;
	return cpu_ids_ptr[i + 1];
}

#define for_each_possible_cpu(cpu)	\
	for ((cpu) = cpu_ids_ptr[0];	\
	     (cpu) != -1;		\
	     (cpu) = next_cpu((cpu)))

#define max(v0, v1) ((v0) > (v1) ? (v0) : (v1))

static void percpu_result_printf(void)
{
	int g;
	int c;

	printf("nr_groups = %d\n", nr_groups);
#if 0
	for_each_possible_cpu(c)
		if (group_map[c] != -1)
			printf("group_map[%d] = %d\n", c, group_map[c]);
	for (g = 0; g < nr_groups; g++)
		printf("group_cnt[%d] = %d\n", g, group_cnt[g]);
#else
	for (g = 0; g < nr_groups; g++) {
		printf("group id %d : ", g);
		for_each_possible_cpu(c)
			if (group_map[c] == g)
				printf("%d ", c);
		printf("\n");
	}
	printf("\n");
#endif
}

/*
 * group cpus by even or odd ids
 */
static int cpu_distance_fn0(int from, int to)
{
	if (from % 2 ^ to % 2)
		return REMOTE_DISTANCE;
	else
		return LOCAL_DISTANCE;
}
/* end with -1 */
int cpu_ids_0[] = { 0, 1, 2, 3, 7, 8, 9, 11, 14, 17, 19, 20, 22, 24, 31, -1 };

/*
 * group cpus by 3x of cpu ids
 */
int cpu_distance_fn1(int from, int to)
{
	if (from % 3 == 0 && to % 3 == 0)
		return LOCAL_DISTANCE;
	else if (from % 3 && to % 3)
		return LOCAL_DISTANCE;
	else
		return REMOTE_DISTANCE;
}
int cpu_ids_1[] = { 0, 3, 5, 6, 8, 9, 10, 11, 12, 14, 17, 18, 21, 24, 25, -1 };

/*
 * group cpus by range, [..., 10), [10, 20), [20, ...)
 */
int cpu_distance_fn2(int from, int to)
{
	if (from < 10 && to < 10)
		return LOCAL_DISTANCE;
	else if (from >= 20 && to >= 20)
		return LOCAL_DISTANCE;
	else if ((from >= 10 && from < 20) && (to >= 10 && to < 20))
		return LOCAL_DISTANCE;
	else
		return REMOTE_DISTANCE;
}
int cpu_ids_2[] = { 0, 1, 2, 4, 6, 8, 9, 10, 12, 15, 16, 18, 19, 20, 22, 25, 27, -1 };

void orig_group_cpus(
[RFC v2 PATCH] mm/percpu.c: simplify grouping CPU algorithm
From: zijun_hu <zijun...@htc.com>

pcpu_build_alloc_info() groups CPUs with relevant proximity together and allocates memory for each percpu unit based on those groups. However, the grouping algorithm consists of three loops plus a goto statement, which is inefficient and difficult to understand.

Simplify the algorithm to two loops with no goto statement: in order to assign a group number to a target CPU, check whether it can share a group with any lower-index CPU; if so, reuse that group number; otherwise, assign it a new one.

Both theoretically and practically, the new algorithm produces the same grouping results as the original; besides, it is more efficient, simpler, and easier to understand.

in order to verify the new algorithm, we enumerate many pairs of type @pcpu_fc_cpu_distance_fn_t function and the relevant CPU IDs array such below sample, then apply both algorithms to the same pair and print the grouping results separately, the new algorithm is okay after checking whether the result printed from the new one is same with the original.

a sample pair of function and array format is shown as follows:

/* group CPUs by even/odd number */
static int cpu_distance_fn0(int from, int to)
{
        if (from % 2 ^ to % 2)
                return REMOTE_DISTANCE;
        else
                return LOCAL_DISTANCE;
}

/* end with -1 */
int cpu_ids_0[] = {0, 1, 2, 3, 7, 8, 9, 11, 14, 17, 19, 20, 22, 24, -1};

Signed-off-by: zijun_hu <zijun...@htc.com>
Tested-by: zijun_hu <zijun...@htc.com>
---
Changes in v2:
  - update commit messages

 mm/percpu.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 255714302394..32e2d8d128c1 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1824,23 +1824,25 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
         max_upa = upa;
 
         /* group cpus according to their proximity */
-        for_each_possible_cpu(cpu) {
-                group = 0;
-        next_group:
+        group = 0;
+        for_each_possible_cpu(cpu)
                 for_each_possible_cpu(tcpu) {
-                        if (cpu == tcpu)
-                                break;
-                        if (group_map[tcpu] == group && cpu_distance_fn &&
-                            (cpu_distance_fn(cpu, tcpu) > LOCAL_DISTANCE ||
-                             cpu_distance_fn(tcpu, cpu) > LOCAL_DISTANCE)) {
+                        if (tcpu == cpu) {
+                                group_map[cpu] = group;
+                                group_cnt[group] = 1;
                                 group++;
-                                nr_groups = max(nr_groups, group + 1);
-                                goto next_group;
+                                break;
+                        }
+
+                        if (!cpu_distance_fn ||
+                            (cpu_distance_fn(cpu, tcpu) == LOCAL_DISTANCE &&
+                             cpu_distance_fn(tcpu, cpu) == LOCAL_DISTANCE)) {
+                                group_map[cpu] = group_map[tcpu];
+                                group_cnt[group_map[cpu]]++;
+                                break;
                         }
                 }
-                group_map[cpu] = group;
-                group_cnt[group]++;
-        }
+        nr_groups = group;
 
         /*
          * Expand unit size until address space usage goes over 75%
-- 
1.9.1
[RFC PATCH v3 2/2] mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
From: zijun_hu <zijun...@htc.com>

To ensure that the percpu group areas within a chunk aren't distributed too sparsely, pcpu_embed_first_chunk() takes the error handling path when a chunk spans more than 3/4 of the VMALLOC area. However, during that error handling it forgets to free the memory allocated for the percpu groups, because it jumps to label @out_free rather than @out_free_areas. This leaks memory if this rare scenario ever happens.

To fix the issue, check the chunk's spanned area immediately after completing memory allocation for all percpu groups, and jump to label @out_free_areas to free the memory before returning if the check fails.

To verify the approach, we dump all memory allocated, force the jump, then dump all memory freed; the result is okay after checking that everything allocated in this function is freed.

BTW, this approach was chosen after considering the following:
 - we don't jump to label @out_free directly to fix this issue, since
   that could free several allocated memory blocks twice
 - the jump taken after pcpu_setup_first_chunk() is meant to skip
   freeing usable memory rather than to handle an error; moreover, that
   function never returns an error code: it either panics due to
   BUG_ON() or returns 0

Signed-off-by: zijun_hu <zijun...@htc.com>
Tested-by: zijun_hu <zijun...@htc.com>
---
Changes in v3:
  - commit message is updated
  - more descriptive local variable name highest_group is used

Changes in v2:
  - more detailed commit message is provided as discussed
    with t...@kernel.org

 mm/percpu.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index e2737e56b017..255714302394 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1963,7 +1963,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
         struct pcpu_alloc_info *ai;
         size_t size_sum, areas_size;
         unsigned long max_distance;
-        int group, i, rc;
+        int group, i, highest_group, rc;
 
         ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
                                    cpu_distance_fn);
@@ -1979,7 +1979,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                 goto out_free;
         }
 
-        /* allocate, copy and determine base address */
+        /* allocate, copy and determine base address & max_distance */
+        highest_group = 0;
         for (group = 0; group < ai->nr_groups; group++) {
                 struct pcpu_group_info *gi = &ai->groups[group];
                 unsigned int cpu = NR_CPUS;
@@ -2000,6 +2001,21 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                 areas[group] = ptr;
 
                 base = min(ptr, base);
+                if (ptr > areas[highest_group])
+                        highest_group = group;
+        }
+        max_distance = areas[highest_group] - base;
+        max_distance += ai->unit_size * ai->groups[highest_group].nr_units;
+
+        /* warn if maximum distance is further than 75% of vmalloc space */
+        if (max_distance > VMALLOC_TOTAL * 3 / 4) {
+                pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
+                        max_distance, VMALLOC_TOTAL);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+                /* and fail if we have fallback */
+                rc = -EINVAL;
+                goto out_free_areas;
+#endif
         }
 
         /*
@@ -2024,24 +2040,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
         }
 
         /* base address is now known, determine group base offsets */
-        i = 0;
         for (group = 0; group < ai->nr_groups; group++) {
                 ai->groups[group].base_offset = areas[group] - base;
-                if (areas[group] > areas[i])
-                        i = group;
-        }
-        max_distance = ai->groups[i].base_offset +
-                ai->unit_size * ai->groups[i].nr_units;
-
-        /* warn if maximum distance is further than 75% of vmalloc space */
-        if (max_distance > VMALLOC_TOTAL * 3 / 4) {
-                pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
-                        max_distance, VMALLOC_TOTAL);
-#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
-                /* and fail if we have fallback */
-                rc = -EINVAL;
-                goto out_free;
-#endif
         }
 
         pr_info("Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
-- 
1.9.1
[RFC PATCH v2 1/2] mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk()
From: zijun_hu <zijun...@htc.com>

pcpu_embed_first_chunk() calculates the range a percpu chunk spans into @max_distance and uses it to ensure that a chunk is not too big compared to the total vmalloc area. However, during the calculation, it uses an incorrect top address by adding a unit size to the highest group's base address. This can make the calculated max_distance slightly smaller than the actual distance, although given the scale of the values involved the error is very unlikely to have an actual impact.

Fix this issue by adding the group's size instead of a unit size.

BTW, the type of variable max_distance is changed from size_t to unsigned long too, based on the following considerations:
 - unsigned long usually has the same width as the IP core registers
   and can be applied here very well
 - it makes the type of @max_distance consistent with the operands it
   is calculated against, such as @ai->groups[i].base_offset and the
   macro VMALLOC_TOTAL
 - unsigned long is more universal than size_t; size_t is usually
   typedef'd to unsigned int or unsigned long, varying among ARCHs

Signed-off-by: zijun_hu <zijun...@htc.com>
---
Changes in v2:
  - commit messages offered by Tejun are applied
  - redundant type cast is removed

 mm/percpu.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 9903830aaebb..e2737e56b017 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1961,7 +1961,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
         void *base = (void *)ULONG_MAX;
         void **areas = NULL;
         struct pcpu_alloc_info *ai;
-        size_t size_sum, areas_size, max_distance;
+        size_t size_sum, areas_size;
+        unsigned long max_distance;
         int group, i, rc;
 
         ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
@@ -2023,17 +2024,18 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
         }
 
         /* base address is now known, determine group base offsets */
-        max_distance = 0;
+        i = 0;
         for (group = 0; group < ai->nr_groups; group++) {
                 ai->groups[group].base_offset = areas[group] - base;
-                max_distance = max_t(size_t, max_distance,
-                                     ai->groups[group].base_offset);
+                if (areas[group] > areas[i])
+                        i = group;
         }
-        max_distance += ai->unit_size;
+        max_distance = ai->groups[i].base_offset +
+                ai->unit_size * ai->groups[i].nr_units;
 
         /* warn if maximum distance is further than 75% of vmalloc space */
         if (max_distance > VMALLOC_TOTAL * 3 / 4) {
-                pr_warn("max_distance=0x%zx too large for vmalloc space 0x%lx\n",
+                pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
                         max_distance, VMALLOC_TOTAL);
 #ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
                 /* and fail if we have fallback */
-- 
1.9.1
Re: [PATCH v2 1/1] mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
Hi Tejun,

As we discussed, I have included some of the discussion content in the commit message. Could you give some new comments or an acknowledgment for this patch?

On 2016/9/30 19:30, zijun_hu wrote:
> From: zijun_hu <zijun...@htc.com>
>
> it will cause memory leakage for pcpu_embed_first_chunk() to go to
> label @out_free if the chunk spans over 3/4 VMALLOC area. all memory
> are allocated and recorded into array @areas for each CPU group, but
> the memory allocated aren't be freed before returning after going to
> label @out_free.
>
> in order to fix this bug, we check chunk spanned area immediately
> after completing memory allocation for all CPU group, we go to label
> @out_free_areas other than @out_free to free all memory allocated if
> the checking is failed.
>
> in order to verify the approach, we dump all memory allocated then
> enforce the jump then dump all memory freed, the result is okay after
> checking whether we free all memory we allocate in this function.
>
> BTW, The approach is chosen after thinking over the below scenes
> - we don't go to label @out_free directly to fix this issue since we
>   maybe free several allocated memory blocks twice
> - the aim of jumping after pcpu_setup_first_chunk() is bypassing free
>   usable memory other than handling error, moreover, the function does
>   not return error code in any case, it either panics due to BUG_ON()
>   or return 0.
>
> Signed-off-by: zijun_hu <zijun...@htc.com>
> Tested-by: zijun_hu <zijun...@htc.com>
> ---
> this patch is based on mmotm/linux-next branch so can be
> applied to them directly
>
> Changes in v2:
> - more detailed commit message is provided as discussed
>   with t...@kernel.org
>
>  mm/percpu.c | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 41d9d0b35801..7a5dae185ce1 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1963,7 +1963,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
>          struct pcpu_alloc_info *ai;
>          size_t size_sum, areas_size;
>          unsigned long max_distance;
> -        int group, i, rc;
> +        int group, i, j, rc;
>  
>          ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
>                                     cpu_distance_fn);
> @@ -1979,7 +1979,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
>                  goto out_free;
>          }
>  
> -        /* allocate, copy and determine base address */
> +        /* allocate, copy and determine base address & max_distance */
> +        j = 0;
>          for (group = 0; group < ai->nr_groups; group++) {
>                  struct pcpu_group_info *gi = &ai->groups[group];
>                  unsigned int cpu = NR_CPUS;
> @@ -2000,6 +2001,21 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
>                  areas[group] = ptr;
>  
>                  base = min(ptr, base);
> +                if (ptr > areas[j])
> +                        j = group;
> +        }
> +        max_distance = areas[j] - base;
> +        max_distance += ai->unit_size * ai->groups[j].nr_units;
> +
> +        /* warn if maximum distance is further than 75% of vmalloc space */
> +        if (max_distance > VMALLOC_TOTAL * 3 / 4) {
> +                pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
> +                        max_distance, VMALLOC_TOTAL);
> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
> +                /* and fail if we have fallback */
> +                rc = -EINVAL;
> +                goto out_free_areas;
> +#endif
>          }
>  
>          /*
> @@ -2024,24 +2040,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
>          }
>  
>          /* base address is now known, determine group base offsets */
> -        i = 0;
>          for (group = 0; group < ai->nr_groups; group++) {
>                  ai->groups[group].base_offset = areas[group] - base;
> -                if (areas[group] > areas[i])
> -                        i = group;
> -        }
> -        max_distance = ai->groups[i].base_offset +
> -                (unsigned long)ai->unit_size * ai->groups[i].nr_units;
> -
> -        /* warn if maximum distance is further than 75% of vmalloc space */
> -        if (max_distance > VMALLOC_TOTAL * 3 / 4) {
> -                pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
> -                        max_distance, VMALLOC_TOTAL);
> -#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
> -                /* and fail if we have fallback */
> -                rc = -EINVAL;
> -                goto out_free;
> -#endif
>          }
>  
>          pr_info("Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
[PATCH v2 1/1] mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
From: zijun_hu <zijun...@htc.com>

pcpu_embed_first_chunk() leaks memory when it goes to label @out_free because the chunk spans more than 3/4 of the VMALLOC area: memory has been allocated and recorded into array @areas for each CPU group, but none of it is freed before returning via @out_free.

In order to fix this bug, check the chunk's spanned area immediately after completing memory allocation for all CPU groups, and go to label @out_free_areas rather than @out_free to free all allocated memory if the check fails.

In order to verify the approach, we dump all memory allocated, force the jump, then dump all memory freed; the result is okay after checking that everything allocated in this function is freed.

BTW, this approach was chosen after considering the following:
 - we don't go to label @out_free directly to fix this issue, since
   that could free several allocated memory blocks twice
 - the jump taken after pcpu_setup_first_chunk() is meant to skip
   freeing usable memory rather than to handle an error; moreover, that
   function never returns an error code: it either panics due to
   BUG_ON() or returns 0

Signed-off-by: zijun_hu <zijun...@htc.com>
Tested-by: zijun_hu <zijun...@htc.com>
---
this patch is based on the mmotm/linux-next branch so it can be
applied to them directly

Changes in v2:
  - more detailed commit message is provided as discussed
    with t...@kernel.org

 mm/percpu.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 41d9d0b35801..7a5dae185ce1 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1963,7 +1963,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
         struct pcpu_alloc_info *ai;
         size_t size_sum, areas_size;
         unsigned long max_distance;
-        int group, i, rc;
+        int group, i, j, rc;
 
         ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
                                    cpu_distance_fn);
@@ -1979,7 +1979,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                 goto out_free;
         }
 
-        /* allocate, copy and determine base address */
+        /* allocate, copy and determine base address & max_distance */
+        j = 0;
         for (group = 0; group < ai->nr_groups; group++) {
                 struct pcpu_group_info *gi = &ai->groups[group];
                 unsigned int cpu = NR_CPUS;
@@ -2000,6 +2001,21 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                 areas[group] = ptr;
 
                 base = min(ptr, base);
+                if (ptr > areas[j])
+                        j = group;
+        }
+        max_distance = areas[j] - base;
+        max_distance += ai->unit_size * ai->groups[j].nr_units;
+
+        /* warn if maximum distance is further than 75% of vmalloc space */
+        if (max_distance > VMALLOC_TOTAL * 3 / 4) {
+                pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
+                        max_distance, VMALLOC_TOTAL);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+                /* and fail if we have fallback */
+                rc = -EINVAL;
+                goto out_free_areas;
+#endif
         }
 
         /*
@@ -2024,24 +2040,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
         }
 
         /* base address is now known, determine group base offsets */
-        i = 0;
         for (group = 0; group < ai->nr_groups; group++) {
                 ai->groups[group].base_offset = areas[group] - base;
-                if (areas[group] > areas[i])
-                        i = group;
-        }
-        max_distance = ai->groups[i].base_offset +
-                (unsigned long)ai->unit_size * ai->groups[i].nr_units;
-
-        /* warn if maximum distance is further than 75% of vmalloc space */
-        if (max_distance > VMALLOC_TOTAL * 3 / 4) {
-                pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
-                        max_distance, VMALLOC_TOTAL);
-#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
-                /* and fail if we have fallback */
-                rc = -EINVAL;
-                goto out_free;
-#endif
         }
 
         pr_info("Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
-- 
1.9.1
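The reason the commit message insists on @out_free_areas rather than @out_free is that only the former walks @areas and returns each group's memory. The error-path pattern can be sketched in user space; the allocator, sizes, and group count below are invented for the demo:

```c
#include <stdlib.h>

#define NR_GROUPS 4

/*
 * Error-path pattern from the patch: if validation fails after the
 * allocation loop, jump to the label that frees every recorded area.
 * Jumping past it (the equivalent of @out_free) would leak them all.
 * Returns 0 on success, -1 on failure.
 */
int setup_areas(int fail_after_alloc)
{
    void *areas[NR_GROUPS] = { NULL };
    int group, rc = 0;

    for (group = 0; group < NR_GROUPS; group++) {
        areas[group] = malloc(64);
        if (!areas[group]) {
            rc = -1;
            goto out_free_areas;
        }
    }

    /* validation runs while areas[] still records every allocation,
     * e.g. the chunk-spans-too-much-address-space check */
    if (fail_after_alloc) {
        rc = -1;
        goto out_free_areas;   /* not the later label: that would leak */
    }

    /* a real caller would keep the areas; this demo frees them below
     * so it doesn't leak on the success path either */

out_free_areas:
    for (group = 0; group < NR_GROUPS; group++)
        free(areas[group]);    /* free(NULL) is a no-op */
    return rc;
}
```

Initializing @areas to NULL and relying on free(NULL) being a no-op is what makes the single cleanup loop safe for partial allocation failures as well, avoiding the double-free risk the commit message mentions.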
[PATCH v2 1/1] mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
From: zijun_hu it will cause memory leakage for pcpu_embed_first_chunk() to go to label @out_free if the chunk spans over 3/4 VMALLOC area. all memory are allocated and recorded into array @areas for each CPU group, but the memory allocated aren't be freed before returning after going to label @out_free. in order to fix this bug, we check chunk spanned area immediately after completing memory allocation for all CPU group, we go to label @out_free_areas other than @out_free to free all memory allocated if the checking is failed. in order to verify the approach, we dump all memory allocated then enforce the jump then dump all memory freed, the result is okay after checking whether we free all memory we allocate in this function. BTW, The approach is chosen after thinking over the below scenes - we don't go to label @out_free directly to fix this issue since we maybe free several allocated memory blocks twice - the aim of jumping after pcpu_setup_first_chunk() is bypassing free usable memory other than handling error, moreover, the function does not return error code in any case, it either panics due to BUG_ON() or return 0. 
Signed-off-by: zijun_hu
Tested-by: zijun_hu
---
this patch is based on the mmotm/linux-next branch so can be applied to them directly

Changes in v2:
 - more detailed commit message is provided as discussed with t...@kernel.org

 mm/percpu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 41d9d0b35801..7a5dae185ce1 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1963,7 +1963,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 	struct pcpu_alloc_info *ai;
 	size_t size_sum, areas_size;
 	unsigned long max_distance;
-	int group, i, rc;
+	int group, i, j, rc;
 
 	ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
 				   cpu_distance_fn);
@@ -1979,7 +1979,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 		goto out_free;
 	}
 
-	/* allocate, copy and determine base address */
+	/* allocate, copy and determine base address & max_distance */
+	j = 0;
 	for (group = 0; group < ai->nr_groups; group++) {
 		struct pcpu_group_info *gi = &ai->groups[group];
 		unsigned int cpu = NR_CPUS;
@@ -2000,6 +2001,21 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 		areas[group] = ptr;
 
 		base = min(ptr, base);
+		if (ptr > areas[j])
+			j = group;
+	}
+	max_distance = areas[j] - base;
+	max_distance += ai->unit_size * ai->groups[j].nr_units;
+
+	/* warn if maximum distance is further than 75% of vmalloc space */
+	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
+		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
+			max_distance, VMALLOC_TOTAL);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+		/* and fail if we have fallback */
+		rc = -EINVAL;
+		goto out_free_areas;
+#endif
 	}
 
 	/*
@@ -2024,24 +2040,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 	}
 
 	/* base address is now known, determine group base offsets */
-	i = 0;
 	for (group = 0; group < ai->nr_groups; group++) {
 		ai->groups[group].base_offset = areas[group] - base;
-		if (areas[group] > areas[i])
-			i = group;
-	}
-	max_distance = ai->groups[i].base_offset +
-		(unsigned long)ai->unit_size * ai->groups[i].nr_units;
-
-	/* warn if maximum distance is further than 75% of vmalloc space */
-	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
-		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
-			max_distance, VMALLOC_TOTAL);
-#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
-		/* and fail if we have fallback */
-		rc = -EINVAL;
-		goto out_free;
-#endif
 	}
 
 	pr_info("Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
-- 
1.9.1
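[Editor's note] The fix relies on the classic kernel goto-cleanup idiom: each failure point jumps to a label that releases exactly what has been set up so far. A minimal userspace sketch of that idiom follows; the function name, sizes, and the injected failure are invented for illustration and are not kernel code.

```c
#include <stdlib.h>

#define NR_GROUPS 4

/*
 * Allocate one block per group; if an allocation or a later validity
 * check fails, jump to a label that frees every block recorded in
 * areas[] so far -- the analogue of @out_free_areas -- instead of a
 * label that only releases bookkeeping.  Returns 0 on success, or -1
 * on failure with no memory left allocated.
 */
int alloc_groups(int fail_check, void *areas[NR_GROUPS])
{
	int group;

	for (group = 0; group < NR_GROUPS; group++) {
		areas[group] = malloc(64);
		if (!areas[group])
			goto out_free_areas;
	}

	/* post-allocation sanity check, like the 3/4-of-vmalloc test */
	if (fail_check)
		goto out_free_areas;

	return 0;

out_free_areas:
	while (--group >= 0) {		/* free only what was recorded */
		free(areas[group]);
		areas[group] = NULL;
	}
	return -1;
}
```

Because `group` counts exactly how many entries of `areas[]` were filled, the cleanup label can never free an unrecorded block or leak a recorded one, which is the property the patch restores for the 3/4-of-vmalloc failure path.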
Re: [RFC PATCH 1/1] mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
On 2016/9/30 0:44, Tejun Heo wrote:
> Hello,
>
> On Fri, Sep 30, 2016 at 12:03:20AM +0800, zijun_hu wrote:
>> From: zijun_hu <zijun...@htc.com>
>>
>> it will cause memory leakage for pcpu_embed_first_chunk() to go to
>> label @out_free if the chunk spans over 3/4 VMALLOC area. all memory
>> are allocated and recorded into array @areas for each CPU group, but
>> the memory allocated aren't be freed before returning after going to
>> label @out_free
>>
>> in order to fix this bug, we check chunk spanned area immediately
>> after completing memory allocation for all CPU group, we go to label
>> @out_free_areas other than @out_free to free all memory allocated if
>> the checking is failed.
>>
>> Signed-off-by: zijun_hu <zijun...@htc.com>
> ...
>> @@ -2000,6 +2001,21 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
>>  		areas[group] = ptr;
>>
>>  		base = min(ptr, base);
>> +		if (ptr > areas[j])
>> +			j = group;
>> +	}
>> +	max_distance = areas[j] - base;
>> +	max_distance += ai->unit_size * ai->groups[j].nr_units;
>> +
>> +	/* warn if maximum distance is further than 75% of vmalloc space */
>> +	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
>> +		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
>> +			max_distance, VMALLOC_TOTAL);
>> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
>> +		/* and fail if we have fallback */
>> +		rc = -EINVAL;
>> +		goto out_free_areas;
>> +#endif
>
> Isn't it way simpler to make the error path jump to out_free_areas?
> There's another similar case after pcpu_setup_first_chunk() failure
> too. Also, can you please explain how you tested the changes?
>
> Thanks.
>

1) the simpler way doesn't work because it may free many memory blocks twice.
Let us take a CPU group as an example: after we have allocated all memory
needed by a CPU group, we may free a unit memory block which doesn't map to
an available CPU, and we may also free the unused part of a used unit; you
can refer to the following code segment for details.

	for (group = 0; group < ai->nr_groups; group++) {
		struct pcpu_group_info *gi = &ai->groups[group];
		void *ptr = areas[group];

		for (i = 0; i < gi->nr_units; i++, ptr += ai->unit_size) {
			if (gi->cpu_map[i] == NR_CPUS) {
				/* unused unit, free whole */
				free_fn(ptr, ai->unit_size);
				continue;
			}
			/* copy and return the unused part */
			memcpy(ptr, __per_cpu_load, ai->static_size);
			free_fn(ptr + size_sum, ai->unit_size - size_sum);
		}
	}

2) as we have seen, pcpu_setup_first_chunk() never fails: it either returns
0 or panics via BUG_ON(). Even if it could fail, we could reconstruct what
was allocated from the information it records, such as pcpu_base_addr and
many static variables, and complete the free operations; we cannot do that
if we fail in the case addressed by this patch.

3) my test is simple: I forced "if (max_distance > VMALLOC_TOTAL * 3 / 4)"
to "if (1)", printed which memory was allocated before the jump, then
printed which memory was freed after the jump and before returning, and
checked that everything allocated in this function was freed; the result
was okay.
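[Editor's note] The double-free hazard described in point 1) can be illustrated outside the kernel with a small simulation. The unit sizes, the cpu_map, and the per-byte free counter below are made-up stand-ins rather than the kernel's real allocator; the point is only that once parts of a group's area have been returned, a blanket free of the whole area frees some bytes a second time.

```c
#include <assert.h>

#define UNIT_SIZE 16
#define NR_UNITS   4
#define NO_CPU    -1	/* stand-in for the kernel's NR_CPUS sentinel */

/*
 * Simulate one group's area as a per-byte free counter.  The copy loop
 * frees unused units whole and trims the unused tail of used units; a
 * later blanket free of the whole area then hits some bytes again.
 * Returns the number of double-freed bytes.
 */
int count_double_frees(int size_sum)
{
	int freed[NR_UNITS * UNIT_SIZE] = { 0 };
	int cpu_map[NR_UNITS] = { 0, NO_CPU, 1, 2 };	/* unit 1 maps to no CPU */
	int i, b, dbl = 0;

	for (i = 0; i < NR_UNITS; i++) {
		int off = i * UNIT_SIZE;

		if (cpu_map[i] == NO_CPU) {
			for (b = off; b < off + UNIT_SIZE; b++)	/* unused unit, free whole */
				freed[b]++;
			continue;
		}
		for (b = off + size_sum; b < off + UNIT_SIZE; b++)	/* return unused tail */
			freed[b]++;
	}

	/* blanket free of the whole area, as a plain free-everything label would do */
	for (b = 0; b < NR_UNITS * UNIT_SIZE; b++)
		freed[b]++;

	for (b = 0; b < NR_UNITS * UNIT_SIZE; b++)
		if (freed[b] > 1)
			dbl++;
	return dbl;
}
```

With a used size of 10 bytes per 16-byte unit, the unused unit (16 bytes) plus three 6-byte tails are freed twice, which is why the error path must free whole areas only before the copy loop has started returning pieces of them.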
[RFC PATCH 1/1] mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
From: zijun_hu <zijun...@htc.com>

pcpu_embed_first_chunk() leaks memory when it reaches label @out_free after the chunk is found to span more than 3/4 of the VMALLOC area: at that point memory has already been allocated and recorded into array @areas for every CPU group, but none of it is freed before returning via @out_free.

To fix this bug, check the spanned area immediately after completing the memory allocation for all CPU groups, and jump to label @out_free_areas rather than @out_free if the check fails, so that all allocated memory is freed.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
Hi Andrew,
I am sorry I forgot to prefix the title with the "PATCH" keyword in the previous mail, so I resend it with that correction

this patch is based on the mmotm/linux-next branch so can be applied directly

 mm/percpu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 41d9d0b35801..7a5dae185ce1 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1963,7 +1963,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 	struct pcpu_alloc_info *ai;
 	size_t size_sum, areas_size;
 	unsigned long max_distance;
-	int group, i, rc;
+	int group, i, j, rc;
 
 	ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
 				   cpu_distance_fn);
@@ -1979,7 +1979,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 		goto out_free;
 	}
 
-	/* allocate, copy and determine base address */
+	/* allocate, copy and determine base address & max_distance */
+	j = 0;
 	for (group = 0; group < ai->nr_groups; group++) {
 		struct pcpu_group_info *gi = &ai->groups[group];
 		unsigned int cpu = NR_CPUS;
@@ -2000,6 +2001,21 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 		areas[group] = ptr;
 
 		base = min(ptr, base);
+		if (ptr > areas[j])
+			j = group;
+	}
+	max_distance = areas[j] - base;
+	max_distance += ai->unit_size * ai->groups[j].nr_units;
+
+	/* warn if maximum distance is further than 75% of vmalloc space */
+	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
+		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
+			max_distance, VMALLOC_TOTAL);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+		/* and fail if we have fallback */
+		rc = -EINVAL;
+		goto out_free_areas;
+#endif
 	}
 
 	/*
@@ -2024,24 +2040,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 	}
 
 	/* base address is now known, determine group base offsets */
-	i = 0;
 	for (group = 0; group < ai->nr_groups; group++) {
 		ai->groups[group].base_offset = areas[group] - base;
-		if (areas[group] > areas[i])
-			i = group;
-	}
-	max_distance = ai->groups[i].base_offset +
-		(unsigned long)ai->unit_size * ai->groups[i].nr_units;
-
-	/* warn if maximum distance is further than 75% of vmalloc space */
-	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
-		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
-			max_distance, VMALLOC_TOTAL);
-#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
-		/* and fail if we have fallback */
-		rc = -EINVAL;
-		goto out_free;
-#endif
 	}
 
 	pr_info("Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
-- 
1.9.1
mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
From: zijun_hu <zijun...@htc.com>

pcpu_embed_first_chunk() leaks memory when it reaches label @out_free after the chunk is found to span more than 3/4 of the VMALLOC area: at that point memory has already been allocated and recorded into array @areas for every CPU group, but none of it is freed before returning via @out_free.

To fix this bug, check the spanned area immediately after completing the memory allocation for all CPU groups, and jump to label @out_free_areas rather than @out_free if the check fails, so that all allocated memory is freed.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
Hi Andrew,
this patch is based on the mmotm/linux-next branch so can be applied directly

 mm/percpu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 41d9d0b35801..7a5dae185ce1 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1963,7 +1963,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 	struct pcpu_alloc_info *ai;
 	size_t size_sum, areas_size;
 	unsigned long max_distance;
-	int group, i, rc;
+	int group, i, j, rc;
 
 	ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
 				   cpu_distance_fn);
@@ -1979,7 +1979,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 		goto out_free;
 	}
 
-	/* allocate, copy and determine base address */
+	/* allocate, copy and determine base address & max_distance */
+	j = 0;
 	for (group = 0; group < ai->nr_groups; group++) {
 		struct pcpu_group_info *gi = &ai->groups[group];
 		unsigned int cpu = NR_CPUS;
@@ -2000,6 +2001,21 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 		areas[group] = ptr;
 
 		base = min(ptr, base);
+		if (ptr > areas[j])
+			j = group;
+	}
+	max_distance = areas[j] - base;
+	max_distance += ai->unit_size * ai->groups[j].nr_units;
+
+	/* warn if maximum distance is further than 75% of vmalloc space */
+	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
+		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
+			max_distance, VMALLOC_TOTAL);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+		/* and fail if we have fallback */
+		rc = -EINVAL;
+		goto out_free_areas;
+#endif
 	}
 
 	/*
@@ -2024,24 +2040,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 	}
 
 	/* base address is now known, determine group base offsets */
-	i = 0;
 	for (group = 0; group < ai->nr_groups; group++) {
 		ai->groups[group].base_offset = areas[group] - base;
-		if (areas[group] > areas[i])
-			i = group;
-	}
-	max_distance = ai->groups[i].base_offset +
-		(unsigned long)ai->unit_size * ai->groups[i].nr_units;
-
-	/* warn if maximum distance is further than 75% of vmalloc space */
-	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
-		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
-			max_distance, VMALLOC_TOTAL);
-#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
-		/* and fail if we have fallback */
-		rc = -EINVAL;
-		goto out_free;
-#endif
 	}
 
 	pr_info("Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
-- 
1.9.1
Re: [RESEND PATCH 1/1] mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk()
On 2016/9/29 18:35, Tejun Heo wrote:
> Hello,
>
> On Sat, Sep 24, 2016 at 07:20:49AM +0800, zijun_hu wrote:
>> it is error to represent the max range max_distance spanned by all the
>> group areas as the offset of the highest group area plus unit size in
>> pcpu_embed_first_chunk(), it should equal to the offset plus the size
>> of the highest group area
>>
>> in order to fix this issue,let us find the highest group area who has the
>> biggest base address among all the ones, then max_distance is formed by
>> add it's offset and size value
>
> [PATCH] percpu: fix max_distance calculation in pcpu_embed_first_chunk()
>
> pcpu_embed_first_chunk() calculates the range a percpu chunk spans
> into max_distance and uses it to ensure that a chunk is not too big
> compared to the total vmalloc area. However, during calculation, it
> used an incorrect top address by adding a unit size to the highest
> group's base address.
>
> This can make the calculated max_distance slightly smaller than the
> actual distance although given the scale of values involved the error
> is very unlikely to have an actual impact.
>
> Fix this issue by adding the group's size instead of a unit size.
>
>> the type of variant max_distance is changed from size_t to unsigned long
>> to prevent potential overflow
>
> This doesn't make any sense. All the values involved are valid
> addresses (or +1 of it), they can't overflow and size_t is the same
> size as ulong.
>
>> @@ -2025,17 +2026,18 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
>>  	}
>>
>>  	/* base address is now known, determine group base offsets */
>> -	max_distance = 0;
>> +	i = 0;
>>  	for (group = 0; group < ai->nr_groups; group++) {
>>  		ai->groups[group].base_offset = areas[group] - base;
>> -		max_distance = max_t(size_t, max_distance,
>> -				     ai->groups[group].base_offset);
>> +		if (areas[group] > areas[i])
>> +			i = group;
>>  	}
>> -	max_distance += ai->unit_size;
>> +	max_distance = ai->groups[i].base_offset +
>> +		(unsigned long)ai->unit_size * ai->groups[i].nr_units;
>
> I don't think you need the ulong cast here.
>
> Thanks.
>

okay, thanks for your reply; I will correct this in another patch
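[Editor's note] The difference between the two formulas is easy to check with concrete numbers. The layout below is invented purely for illustration (one highest group holding four units); the macros mirror the quantities in the patch, not any kernel API.

```c
/* made-up layout: the highest group starts 0x100000 above base and holds 4 units */
#define HIGHEST_OFFSET	0x100000UL
#define UNIT_SIZE	0x8000UL
#define NR_UNITS	4

/* old formula: offset of the highest group plus a single unit size */
unsigned long max_distance_old(void)
{
	return HIGHEST_OFFSET + UNIT_SIZE;
}

/* fixed formula: offset plus the whole size of the highest group */
unsigned long max_distance_new(void)
{
	return HIGHEST_OFFSET + UNIT_SIZE * NR_UNITS;
}
```

The old formula undercounts the span by `(NR_UNITS - 1) * UNIT_SIZE` bytes whenever the highest group holds more than one unit, which is exactly the error the patch corrects.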
[RFC PATCH v2 1/1] mm/vmalloc.c: simplify /proc/vmallocinfo implementation
From: zijun_hu <zijun...@htc.com>

simplify the /proc/vmallocinfo implementation via the existing seq_file helpers for list_head

Signed-off-by: zijun_hu <zijun...@htc.com>
---
Changes in v2:
 - more detailed commit message is provided
 - the redundant type cast for list_entry() is removed as advised by rient...@google.com

 mm/vmalloc.c | 27 +--
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cc6ecd60cc0e..8b80931654b7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2576,32 +2576,13 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_area_lock)
 {
-	loff_t n = *pos;
-	struct vmap_area *va;
-
 	spin_lock(&vmap_area_lock);
-	va = list_first_entry(&vmap_area_list, typeof(*va), list);
-	while (n > 0 && &va->list != &vmap_area_list) {
-		n--;
-		va = list_next_entry(va, list);
-	}
-	if (!n && &va->list != &vmap_area_list)
-		return va;
-
-	return NULL;
-
+	return seq_list_start(&vmap_area_list, *pos);
 }
 
 static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 {
-	struct vmap_area *va = p, *next;
-
-	++*pos;
-	next = list_next_entry(va, list);
-	if (&next->list != &vmap_area_list)
-		return next;
-
-	return NULL;
+	return seq_list_next(p, &vmap_area_list, pos);
 }
 
 static void s_stop(struct seq_file *m, void *p)
@@ -2636,9 +2617,11 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 static int s_show(struct seq_file *m, void *p)
 {
-	struct vmap_area *va = p;
+	struct vmap_area *va;
 	struct vm_struct *v;
 
+	va = list_entry(p, struct vmap_area, list);
+
 	/*
 	 * s_show can encounter race with remove_vm_area, !VM_VM_AREA on
 	 * behalf of vmap area is being tear down or vm_map_ram allocation.
-- 
1.9.1
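[Editor's note] seq_list_start()/seq_list_next() encapsulate exactly the positional walk that the removed code open-coded. A rough userspace sketch of that contract follows, simplified to a plain singly linked list instead of the kernel's list_head; the function names here are invented and only approximate the kernel helpers' semantics.

```c
#include <stddef.h>

struct node {
	int value;
	struct node *next;
};

/*
 * Return the pos'th node of the list, or NULL when pos is past the
 * end -- the contract the open-coded loop in the old s_start()
 * implemented by hand.
 */
struct node *seq_start(struct node *head, long pos)
{
	struct node *n = head;

	while (n && pos > 0) {
		n = n->next;
		pos--;
	}
	return n;
}

/* advance to the next element and bump *pos, like seq_list_next() */
struct node *seq_next(struct node *n, long *pos)
{
	++*pos;
	return n ? n->next : NULL;
}
```

Folding the walk into shared helpers removes three hand-written loops and the easy-to-miss off-by-one and end-of-list checks they each had to repeat.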
[RFC PATCH v2 1/1] mm/vmalloc.c: correct a few logic error for __insert_vmap_area()
From: zijun_hu <zijun...@htc.com>

__insert_vmap_area() has a few obvious logic errors, shown by the comments within the code segment below

static void __insert_vmap_area(struct vmap_area *va)
{
	/* as an internal function parameter, vmap_area @va is assumed to have nonzero size */
	...
	if (va->va_start < tmp->va_end)
		p = &(*p)->rb_left;
	else if (va->va_end > tmp->va_start)
		p = &(*p)->rb_right;
	/* this else-if condition is always true and meaningless, because when it is
	   reached va->va_end > va->va_start >= tmp_va->va_end > tmp_va->va_start
	   holds normally */
	else
		BUG();
	/* this BUG() is meaningless too, since it is never reached normally */
	...
}

the function doesn't implement the behavior its context requires: if the
vmap_area @va to be inserted is lower than the one at the current rbtree
node, walk the left branch; if higher, walk the right branch; otherwise
the two vmap_areas overlap and the existing BUG() should be triggered

it is fixed by correcting the vmap_area rbtree walk as described above

BTW, we consider (va->va_end == tmp_va->va_start) a legal case, since it
indicates that vmap_area @va tightly neighbors @tmp_va

Fixes: db64fe02258f ("mm: rewrite vmap layer")
Signed-off-by: zijun_hu <zijun...@htc.com>
---
Hi npiggin,
could you offer some comments for this patch since __insert_vmap_area()
was introduced by you? thanks a lot

Changes in v2:
 - more detailed commit message is provided

 mm/vmalloc.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 91f44e78c516..cc6ecd60cc0e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -321,10 +321,10 @@ static void __insert_vmap_area(struct vmap_area *va)
 		parent = *p;
 		tmp_va = rb_entry(parent, struct vmap_area, rb_node);
 
-		if (va->va_start < tmp_va->va_end)
-			p = &(*p)->rb_left;
-		else if (va->va_end > tmp_va->va_start)
-			p = &(*p)->rb_right;
+		if (va->va_end <= tmp_va->va_start)
+			p = &(*p)->rb_left;
+		else if (va->va_start >= tmp_va->va_end)
+			p = &(*p)->rb_right;
 		else
 			BUG();
 	}
-- 
1.9.1
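[Editor's note] The corrected walk reduces to a three-way comparison of half-open ranges: strictly below, strictly above, or overlapping. A standalone sketch of that predicate (the enum and function are invented for illustration, with the same endpoint conventions as vmap_area):

```c
/* three-way comparison of half-open ranges [a_start, a_end) and [b_start, b_end) */
enum range_cmp { RANGE_LEFT, RANGE_RIGHT, RANGE_OVERLAP };

enum range_cmp range_compare(unsigned long a_start, unsigned long a_end,
			     unsigned long b_start, unsigned long b_end)
{
	if (a_end <= b_start)		/* a lies entirely below b: walk left  */
		return RANGE_LEFT;
	if (a_start >= b_end)		/* a lies entirely above b: walk right */
		return RANGE_RIGHT;
	return RANGE_OVERLAP;		/* the insert-time BUG() case */
}
```

With the pre-patch conditions (`a_start < b_end` goes left, else `a_end > b_start` goes right), the overlap branch is unreachable for any nonzero-sized ranges, because whenever the first test fails the second is automatically true; that unreachability is the bug the patch fixes.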
[RESEND RFC PATCH 1/1] linux/mm.h: canonicalize macro PAGE_ALIGNED() definition
From: zijun_hu <zijun...@htc.com>

The macro PAGE_ALIGNED() is prone to error because it does not follow
the convention of parenthesizing the parameter @addr within the macro
body. For example:

	unsigned long *ptr = kmalloc(...);
	PAGE_ALIGNED(ptr + 16);

For the left parameter of macro IS_ALIGNED(), (unsigned long)(ptr + 16)
is desired, but the actual expansion is (unsigned long)ptr + 16.

Fix this by simply canonicalizing the PAGE_ALIGNED() definition.

Signed-off-by: zijun_hu <zijun...@htc.com>
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef815b9cd426..ec6818631635 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -126,7 +126,7 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
 #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
 
 /* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE */
-#define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)addr, PAGE_SIZE)
+#define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)
 
 /*
  * Linux kernel virtual memory manager primitives.
-- 
1.9.1
Re: [PATCH 1/5] mm/vmalloc.c: correct a few logic errors in __insert_vmap_area()
On 09/22/2016 07:15 AM, David Rientjes wrote:
> On Thu, 22 Sep 2016, zijun_hu wrote:
>
>>> We don't support inserting when va->va_start == tmp_va->va_end, plain and
>>> simple. There's no reason to do so. NACK to the patch.
>>>
>> I am sorry, I disagree with you, because:
>> 1) in almost all contexts of vmalloc, the original logic treats the
>>    special case as normal, for example __find_vmap_area() or
>>    alloc_vmap_area()
>
> The ranges are [start, end) like everywhere else. __find_vmap_area() is
> implemented as such for the passed address. The address is aligned in
> alloc_vmap_area(), there's no surprise here. The logic is correct in
> __insert_vmap_area().
>
I am sorry to disagree with you; I will resend this patch with a more
detailed illustration.