Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-10 Thread Kamezawa Hiroyuki

(2013/01/10 16:55), Glauber Costa wrote:

On 01/10/2013 11:31 AM, Kamezawa Hiroyuki wrote:

(2013/01/10 16:14), Glauber Costa wrote:

On 01/10/2013 06:17 AM, Tang Chen wrote:

Note: if the memory provided by the memory device is used by the
kernel, it can't be offlined. It is not a bug.


Right.  But how often does this happen in testing?  In other words,
please provide an overall description of how well memory hot-remove is
presently operating.  Is it reliable?  What is the success rate in
real-world situations?


We test the hot-remove functionality mostly with movable_online used.
And the memory used by kernel is not allowed to be removed.


Can you try doing this using cpusets configured to hardwall ?
It is my understanding that the object allocators will try hard not to
allocate anything outside the walls defined by cpuset. Which means that
if you have one process per node, and they are hardwalled, your kernel
memory will be spread evenly among the machine. With a big enough load,
they should eventually be present in all blocks.



I'm sorry, I couldn't catch your point.
Do you want to confirm whether cpuset can work well enough instead of
ZONE_MOVABLE?
Or do you want to confirm whether ZONE_MOVABLE will not work if it's
used with cpuset?



No, I am not proposing to use cpuset to tackle the problem. I am just
wondering if you would still have high success rates with cpusets in use
with hardwalls. This is just one example of a workload that would spread
kernel memory around quite heavily.

So this is just me trying to understand the limitations of the mechanism.



Hm, okay. In my understanding, if the whole memory of a node is configured as
MOVABLE, no kernel memory will be allocated in the node because the zonelist
will not match. So, if cpuset is used with hardwalls, the user will see -ENOMEM
or OOM, I guess. Even fork() will fail if fallback-to-other-node is not allowed.

If it's configured as ZONE_NORMAL, you need to pray for offlining memory.

AFAIK, IBM's ppc? has a 16MB section size. So, some sections can be offlined
even if they are configured as ZONE_NORMAL. For them, placement of offlined
memory is not important because it's virtualized by LPAR; they don't try
to remove a DIMM, they just want to increase/decrease the amount of memory.
It's another approach.

But here, we (Fujitsu) try to remove a system board/DIMM.
So we configure the whole memory of a node as ZONE_MOVABLE and try to
guarantee that the DIMM is removable.


IMHO, I don't think shrink_slab() can kill all objects in a node even
if they are some caches. We need more study for doing that.



Indeed, shrink_slab can only kill cached objects. They, however, are
usually a very big part of kernel memory. I wonder though if in case of
failure, it is worth it to try at least one shrink pass before you give up.



Yeah, now, his (our) approach is to never allow kernel memory on a node that is
to be hot-removed, by using ZONE_MOVABLE. So, shrink_slab()'s effect will not
be seen.

If other brave guys try to use ZONE_NORMAL for a hot-pluggable DIMM, I see,
it's worth trying.

How about checking whether the target memsection is in NORMAL or in MOVABLE at
hot-remove time? If NORMAL, shrink_slab() will be worth calling.

BTW, is shrink_slab() node/zone aware now? If not, fixing that first will
be the better direction, I guess.
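
For illustration only, the "check the zone, then try one shrink pass" idea
could be sketched in userspace C as below. Every name here (toy_section,
toy_shrink_pass and so on) is hypothetical and is not kernel API; the shrink
pass is assumed to free cached objects only, as discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy types; none of these names exist in the kernel. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_section {
    enum toy_zone zone;
    unsigned int cached_objs;   /* reclaimable kernel objects (caches) */
    unsigned int pinned_objs;   /* kernel objects that cannot be reclaimed */
};

/* Offlining succeeds only when no kernel object is left in the section. */
static int toy_try_offline(const struct toy_section *s)
{
    return (s->cached_objs || s->pinned_objs) ? -EBUSY : 0;
}

/* Stand-in for one shrink pass: drops cached objects and nothing else. */
static void toy_shrink_pass(struct toy_section *s)
{
    s->cached_objs = 0;
}

/* Try to offline; on failure in a NORMAL section, shrink once and retry. */
int toy_offline_section(struct toy_section *s)
{
    int ret;

    assert(s != NULL);
    ret = toy_try_offline(s);
    if (ret && s->zone == TOY_ZONE_NORMAL) {
        toy_shrink_pass(s);
        ret = toy_try_offline(s);
    }
    return ret;
}

/* Convenience wrapper: build a section and report the offline result. */
int toy_offline_result(enum toy_zone zone, unsigned int cached,
                       unsigned int pinned)
{
    struct toy_section s = { zone, cached, pinned };

    return toy_offline_section(&s);
}
```

In this toy model a NORMAL section holding only cached objects becomes
offlinable after the extra pass, while one with pinned objects still fails,
which is the limitation mentioned above.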

Thanks,
-Kame


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-10 Thread Kamezawa Hiroyuki

(2013/01/10 17:36), Glauber Costa wrote:
 

BTW, shrink_slab() is now node/zone aware ? If not, fixing that first will
be better direction I guess.


It is not upstream, but there are patches for this that I am already
using in my private tree.



Oh, I see. If it's merged, it's worth adding the
"shrink_slab() if ZONE_NORMAL" code.

Thanks,
-Kame



Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Kamezawa Hiroyuki

(2013/01/10 16:14), Glauber Costa wrote:

On 01/10/2013 06:17 AM, Tang Chen wrote:

Note: if the memory provided by the memory device is used by the
kernel, it can't be offlined. It is not a bug.


Right.  But how often does this happen in testing?  In other words,
please provide an overall description of how well memory hot-remove is
presently operating.  Is it reliable?  What is the success rate in
real-world situations?


We test the hot-remove functionality mostly with movable_online used.
And the memory used by kernel is not allowed to be removed.


Can you try doing this using cpusets configured to hardwall ?
It is my understanding that the object allocators will try hard not to
allocate anything outside the walls defined by cpuset. Which means that
if you have one process per node, and they are hardwalled, your kernel
memory will be spread evenly among the machine. With a big enough load,
they should eventually be present in all blocks.



I'm sorry, I couldn't catch your point.
Do you want to confirm whether cpuset can work well enough instead of ZONE_MOVABLE?
Or do you want to confirm whether ZONE_MOVABLE will not work if it's used with
cpuset?



Another question I have for you: Have you considered calling
shrink_slab to try to deplete the caches and therefore free at least
slab memory in the nodes that can't be offlined? Is it relevant?



At this stage, we don't consider calling shrink_slab(). We require
nearly 100% success at offlining memory for removing a DIMM.
That's my understanding.

IMHO, I don't think shrink_slab() can kill all objects in a node even
if they are some caches. We need more study for doing that.

Thanks,
-Kame




Re: [PATCH v5 14/14] memory-hotplug: free node_data when a node is offlined

2013-01-06 Thread Kamezawa Hiroyuki
(2012/12/30 15:02), Wen Congyang wrote:
 At 12/28/2012 08:28 AM, Kamezawa Hiroyuki Wrote:
 (2012/12/27 21:16), Wen Congyang wrote:
 At 12/26/2012 11:55 AM, Kamezawa Hiroyuki Wrote:
 (2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com

 We call hotadd_new_pgdat() to allocate memory to store node_data. So we
 should free it when removing a node.

 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

 I'm sorry, but is it safe to remove pgdat? Are all zone caches and zonelists
 properly cleared/rebuilt in a synchronous way? And are no threads visiting
 a zone in vmscan.c?

 We have rebuilt zonelists when a zone has no memory after offlining some
 pages.


 How do you guarantee that the address of pgdat/zone is not on the stack of any
 kernel threads or held by other kernel objects without reference counting or
 another syncing method?
 
 There is no way to guarantee this. But the kernel should not use the address
 of pgdat/zone when it is offlined.
 
 Hmm, what about this: reuse the memory when the node is onlined again?
 

That's the only way which we can go now. Please don't free it.
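
As a rough userspace sketch of "keep it and reuse it", assuming a simple
per-node pointer table: the node data is allocated on the first online, kept
across offline, and reused on the next online. The toy_* names are
hypothetical and only mirror the idea, not the real pgdat handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_NODES 4

/* Hypothetical stand-in for pg_data_t. */
struct toy_pgdat {
    int nid;
    unsigned long node_spanned_pages;
};

static struct toy_pgdat *toy_node_data[TOY_MAX_NODES];
bool toy_node_online[TOY_MAX_NODES];

/* Online a node: allocate node data only if none was kept from before. */
struct toy_pgdat *toy_online_node(int nid)
{
    struct toy_pgdat *pgdat;

    assert(nid >= 0 && nid < TOY_MAX_NODES);
    pgdat = toy_node_data[nid];
    if (!pgdat) {
        pgdat = malloc(sizeof(*pgdat));
        if (!pgdat)
            return NULL;
        toy_node_data[nid] = pgdat;
    }
    /* Reset the contents, but keep the address stable. */
    memset(pgdat, 0, sizeof(*pgdat));
    pgdat->nid = nid;
    toy_node_online[nid] = true;
    return pgdat;
}

/* Offline a node: mark it offline, but intentionally do not free. */
void toy_offline_node(int nid)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    toy_node_online[nid] = false;
}

/* True if the same address is handed out again after an offline/online. */
bool toy_pgdat_survives_cycle(int nid)
{
    struct toy_pgdat *first = toy_online_node(nid);
    struct toy_pgdat *second;

    toy_offline_node(nid);
    second = toy_online_node(nid);
    return first != NULL && first == second;
}
```

Because the address never changes, a stale pointer held by some thread still
points at valid (if offline) node data, which is the point of not freeing it.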

Thanks,
-Kame




Re: [PATCH v5 14/14] memory-hotplug: free node_data when a node is offlined

2012-12-27 Thread Kamezawa Hiroyuki
(2012/12/27 21:16), Wen Congyang wrote:
 At 12/26/2012 11:55 AM, Kamezawa Hiroyuki Wrote:
 (2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com

 We call hotadd_new_pgdat() to allocate memory to store node_data. So we
 should free it when removing a node.

 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

 I'm sorry, but is it safe to remove pgdat? Are all zone caches and zonelists
 properly cleared/rebuilt in a synchronous way? And are no threads visiting
 a zone in vmscan.c?
 
 We have rebuilt zonelists when a zone has no memory after offlining some
 pages.
 

How do you guarantee that the address of pgdat/zone is not on the stack of any
kernel threads or held by other kernel objects without reference counting or
another syncing method?


Thanks,
-Kame




Re: [PATCH v5 02/14] memory-hotplug: check whether all memory blocks are offlined or not when removing memory

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/24 21:09), Tang Chen wrote:
 From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 
 We remove the memory like this:
 1. lock memory hotplug
 2. offline a memory block
 3. unlock memory hotplug
 4. repeat 1-3 to offline all memory blocks
 5. lock memory hotplug
 6. remove memory(TODO)
 7. unlock memory hotplug
 
 All memory blocks must be offlined before removing memory. But we don't hold
 the lock in the whole operation. So we should check whether all memory blocks
 are offlined before step 6. Otherwise, the kernel may panic.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

a nitpick below.

 ---
   drivers/base/memory.c  |6 +
   include/linux/memory_hotplug.h |1 +
   mm/memory_hotplug.c|   47 
 
   3 files changed, 54 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/base/memory.c b/drivers/base/memory.c
 index 987604d..8300a18 100644
 --- a/drivers/base/memory.c
 +++ b/drivers/base/memory.c
 @@ -693,6 +693,12 @@ int offline_memory_block(struct memory_block *mem)
   return ret;
   }
   
 +/* return true if the memory block is offlined, otherwise, return false */
 +bool is_memblock_offlined(struct memory_block *mem)
 +{
 + return mem->state == MEM_OFFLINE;
 +}
 +
   /*
* Initialize the sysfs support for memory devices...
*/
 diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
 index 4a45c4e..8dd0950 100644
 --- a/include/linux/memory_hotplug.h
 +++ b/include/linux/memory_hotplug.h
 @@ -247,6 +247,7 @@ extern int add_memory(int nid, u64 start, u64 size);
   extern int arch_add_memory(int nid, u64 start, u64 size);
   extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
   extern int offline_memory_block(struct memory_block *mem);
 +extern bool is_memblock_offlined(struct memory_block *mem);
   extern int remove_memory(u64 start, u64 size);
   extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
   int nr_pages);
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 62e04c9..d43d97b 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -1430,6 +1430,53 @@ repeat:
   goto repeat;
   }
   
 + lock_memory_hotplug();
 +
 + /*
 +  * we have offlined all memory blocks like this:
 +  *   1. lock memory hotplug
 +  *   2. offline a memory block
 +  *   3. unlock memory hotplug
 +  *
 +  * repeat step1-3 to offline the memory block. All memory blocks
 +  * must be offlined before removing memory. But we don't hold the
 +  * lock in the whole operation. So we should check whether all
 +  * memory blocks are offlined.
 +  */
 +
 + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {

I prefer adding mem = NULL at the start of this for().

 + section_nr = pfn_to_section_nr(pfn);
 + if (!present_section_nr(section_nr))
 + continue;
 +
 + section = __nr_to_section(section_nr);
 + /* same memblock? */
 + if (mem)
 + if ((section_nr >= mem->start_section_nr) &&
 + (section_nr <= mem->end_section_nr))
 + continue;
 +
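
A minimal userspace sketch of the scan quoted above, assuming a fixed toy
table of memory blocks: it walks section numbers, skips sections that belong
to the block just checked, and stops at the first block that is still online.
The toy_* names are hypothetical; starting from mem = NULL is my reading of
the nitpick, so that no stale block from an earlier loop is used as the hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct memory_block. */
struct toy_memory_block {
    unsigned long start_section_nr;
    unsigned long end_section_nr;   /* inclusive */
    bool offline;
};

/* Three 4-section blocks; the last one is still online. */
static const struct toy_memory_block toy_blocks[] = {
    { 0, 3, true },
    { 4, 7, true },
    { 8, 11, false },
};
#define TOY_NBLOCKS (sizeof(toy_blocks) / sizeof(toy_blocks[0]))

static const struct toy_memory_block *toy_find_block(unsigned long nr)
{
    for (size_t i = 0; i < TOY_NBLOCKS; i++)
        if (nr >= toy_blocks[i].start_section_nr &&
            nr <= toy_blocks[i].end_section_nr)
            return &toy_blocks[i];
    return NULL;
}

/*
 * Returns true when every block covering sections [start, end) is offline.
 * *checked counts how many blocks were actually inspected.
 */
bool toy_all_offline(unsigned long start, unsigned long end, size_t *checked)
{
    const struct toy_memory_block *mem = NULL;  /* no stale hint */

    assert(checked != NULL);
    *checked = 0;
    for (unsigned long nr = start; nr < end; nr++) {
        /* same memblock as the previous section? already checked */
        if (mem && nr >= mem->start_section_nr &&
            nr <= mem->end_section_nr)
            continue;
        mem = toy_find_block(nr);
        if (!mem)
            continue;
        (*checked)++;
        if (!mem->offline)
            return false;
    }
    return true;
}

/* Small wrappers so each property can be queried with one call. */
bool toy_range_offline(unsigned long start, unsigned long end)
{
    size_t n;

    return toy_all_offline(start, end, &n);
}

size_t toy_blocks_checked(unsigned long start, unsigned long end)
{
    size_t n;

    (void)toy_all_offline(start, end, &n);
    return n;
}
```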

Thanks,
-Kame




Re: [PATCH v5 03/14] memory-hotplug: remove redundant codes

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com
 
 offlining memory blocks and checking whether memory blocks are offlined
 are very similar. This patch introduces a new function to remove
 redundant codes.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
   mm/memory_hotplug.c |  101 
 ---
   1 files changed, 55 insertions(+), 46 deletions(-)
 
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index d43d97b..dbb04d8 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -1381,20 +1381,14 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
   return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
   }
   
 -int remove_memory(u64 start, u64 size)

Please add an explanation of this function here. If (*func) returns a value
other than 0, this function fails and returns the callback's return value...
right?


 +static int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 + void *arg, int (*func)(struct memory_block *, void *))
   {
   struct memory_block *mem = NULL;
   struct mem_section *section;
 - unsigned long start_pfn, end_pfn;
   unsigned long pfn, section_nr;
   int ret;
 - int return_on_error = 0;
 - int retry = 0;
 -
 - start_pfn = PFN_DOWN(start);
 - end_pfn = start_pfn + PFN_DOWN(size);
   
 -repeat:

Shouldn't we check that the lock is held here?
(VM_BUG_ON(!mutex_is_locked(&mem_hotplug_mutex));)


   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
   section_nr = pfn_to_section_nr(pfn);
   if (!present_section_nr(section_nr))
 @@ -1411,22 +1405,61 @@ repeat:
   if (!mem)
   continue;
   
 - ret = offline_memory_block(mem);
 + ret = func(mem, arg);
   if (ret) {
 - if (return_on_error) {
 - kobject_put(&mem->dev.kobj);
 - return ret;
 - } else {
 - retry = 1;
 - }
 + kobject_put(&mem->dev.kobj);
 + return ret;
   }
   }
   
   if (mem)
   kobject_put(&mem->dev.kobj);
   
 - if (retry) {
 - return_on_error = 1;
 + return 0;
 +}
 +
 +static int offline_memory_block_cb(struct memory_block *mem, void *arg)
 +{
 + int *ret = arg;
 + int error = offline_memory_block(mem);
 +
 + if (error != 0 && *ret == 0)
 + *ret = error;
 +
 + return 0;

This always returns 0 and runs through all mem blocks for scan-and-retry, right?
It needs an explanation here!


 +}
 +
 +static int is_memblock_offlined_cb(struct memory_block *mem, void *arg)
 +{
 + int ret = !is_memblock_offlined(mem);
 +
 + if (unlikely(ret))
 + pr_warn("removing memory fails, because memory "
 + "[%#010llx-%#010llx] is onlined\n",
 + PFN_PHYS(section_nr_to_pfn(mem->start_section_nr)),
 + PFN_PHYS(section_nr_to_pfn(mem->end_section_nr + 1))-1);
 +
 + return ret;
 +}
 +
 +int remove_memory(u64 start, u64 size)
 +{
 + unsigned long start_pfn, end_pfn;
 + int ret = 0;
 + int retry = 1;
 +
 + start_pfn = PFN_DOWN(start);
 + end_pfn = start_pfn + PFN_DOWN(size);
 +
 +repeat:

Please explain why you repeat here.

 + walk_memory_range(start_pfn, end_pfn, &ret,
 +   offline_memory_block_cb);
 + if (ret) {
 + if (!retry)
 + return ret;
 +
 + retry = 0;
 + ret = 0;
   goto repeat;
   }
   
 @@ -1444,37 +1477,13 @@ repeat:
* memory blocks are offlined.
*/
   
 - for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 - section_nr = pfn_to_section_nr(pfn);
 - if (!present_section_nr(section_nr))
 - continue;
 -
 - section = __nr_to_section(section_nr);
 - /* same memblock? */
 - if (mem)
 - if ((section_nr >= mem->start_section_nr) &&
 - (section_nr <= mem->end_section_nr))
 - continue;
 -
 - mem = find_memory_block_hinted(section, mem);
 - if (!mem)
 - continue;
 -
 - ret = is_memblock_offlined(mem);
 - if (!ret) {
 - pr_warn("removing memory fails, because memory "
 - "[%#010llx-%#010llx] is onlined\n",
 - PFN_PHYS(section_nr_to_pfn(mem->start_section_nr)),
 - PFN_PHYS(section_nr_to_pfn(mem->end_section_nr + 1)) - 1);
 -
 - kobject_put(&mem->dev.kobj);
 - unlock_memory_hotplug();
 - return ret;
 - }

please explain 
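
The callback convention asked about in this review can be sketched in plain
userspace C as follows. This is only an illustration under assumed toy types
(toy_block, toy_walk_blocks and the demo helpers are hypothetical): the walker
stops and propagates the first nonzero return from the callback, the offline
callback records the first error but returns 0 so every block is still
visited, and the check callback returns nonzero to stop at the first block
that is still online.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy block: offline_err is what an offline attempt returns. */
struct toy_block {
    int id;
    int offline_err;   /* 0 means the block can be (and is) offlined */
};

typedef int (*toy_block_fn)(struct toy_block *blk, void *arg);

/* Walk all blocks; a nonzero return from func stops the walk. */
static int toy_walk_blocks(struct toy_block *blks, size_t n, void *arg,
                           toy_block_fn func)
{
    for (size_t i = 0; i < n; i++) {
        int ret = func(&blks[i], arg);

        if (ret)
            return ret;
    }
    return 0;
}

/* Record the first error in *arg, but return 0 so the walk continues. */
static int toy_offline_cb(struct toy_block *blk, void *arg)
{
    int *first_err = arg;
    int error = blk->offline_err;

    if (error != 0 && *first_err == 0)
        *first_err = error;
    return 0;
}

/* Count visits in *arg; return nonzero for a block that is still online. */
static int toy_check_cb(struct toy_block *blk, void *arg)
{
    size_t *visited = arg;

    (*visited)++;
    return blk->offline_err != 0;
}

/* Demo: the offline pass visits every block and keeps the first error. */
int toy_demo_first_error(void)
{
    struct toy_block blks[] = { { 0, 0 }, { 1, -EBUSY }, { 2, -ENOMEM } };
    int first_err = 0;
    int walk_ret = toy_walk_blocks(blks, 3, &first_err, toy_offline_cb);

    assert(walk_ret == 0);   /* this callback never stops the walk */
    return first_err;
}

/* Demo: the check pass stops at the first block that is still online. */
size_t toy_demo_visited_until_stop(void)
{
    struct toy_block blks[] = { { 0, 0 }, { 1, -EBUSY }, { 2, -ENOMEM } };
    size_t visited = 0;

    (void)toy_walk_blocks(blks, 3, &visited, toy_check_cb);
    return visited;
}
```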

Re: [PATCH v5 04/14] memory-hotplug: remove /sys/firmware/memmap/X sysfs

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/24 21:09), Tang Chen wrote:
 From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 
 When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
 sysfs files are created. But there is no code to remove these files. The patch
 implements the function to remove them.
 
 Note: The code does not free firmware_map_entry which is allocated by bootmem.
So the patch makes memory leak. But I think the memory leak size is
very small. And it does not affect the system.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 ---
   drivers/firmware/memmap.c|   98 
 +-
   include/linux/firmware-map.h |6 +++
   mm/memory_hotplug.c  |5 ++-
   3 files changed, 106 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
 index 90723e6..49be12a 100644
 --- a/drivers/firmware/memmap.c
 +++ b/drivers/firmware/memmap.c
 @@ -21,6 +21,7 @@
   #include <linux/types.h>
   #include <linux/bootmem.h>
   #include <linux/slab.h>
 +#include <linux/mm.h>
   
   /*
* Data types 
 --
 @@ -41,6 +42,7 @@ struct firmware_map_entry {
   const char  *type;  /* type of the memory range */
   struct list_head list;   /* entry for the linked list */
   struct kobject  kobj;   /* kobject for each entry */
 + unsigned int bootmem:1; /* allocated from bootmem */
   };

Can't we detect which allocator the object came from, slab or bootmem?

Hm, for example,

PageReserved(virt_to_page(address_of_obj)) ?
PageSlab(virt_to_page(address_of_obj)) ?
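
Just to illustrate the suggestion, here is a userspace sketch that classifies
an entry by the flags of the page it lives on instead of carrying a bootmem
bit in the structure. The flag and enum names are hypothetical stand-ins, and
whether bootmem-backed pages can be recognized reliably this way is an
assumption that would need checking in the real code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PageReserved()/PageSlab() page flags. */
enum toy_page_flag {
    TOY_PG_RESERVED = 1 << 0,
    TOY_PG_SLAB     = 1 << 1,
};

enum toy_origin { TOY_FROM_BOOTMEM, TOY_FROM_SLAB, TOY_FROM_OTHER };

/* Decide where an object came from, based only on its page's flags. */
enum toy_origin toy_entry_origin(unsigned int page_flags)
{
    if (page_flags & TOY_PG_SLAB)
        return TOY_FROM_SLAB;
    if (page_flags & TOY_PG_RESERVED)
        return TOY_FROM_BOOTMEM;
    return TOY_FROM_OTHER;
}

/* Only slab-backed entries may be handed back to the slab allocator. */
int toy_may_free_to_slab(unsigned int page_flags)
{
    return toy_entry_origin(page_flags) == TOY_FROM_SLAB;
}
```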

Thanks,
-Kame



Re: [PATCH v5 05/14] memory-hotplug: introduce new function arch_remove_memory() for removing page table depends on architecture

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com
 
 For removing memory, we need to remove page table. But it depends
 on architecture. So the patch introduce arch_remove_memory() for
 removing page table. Now it only calls __remove_pages().
 
 Note: __remove_pages() for some architectures is not implemented
(I don't know how to implement it for s390).
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

Then, the remove code will be symmetric to the add code.

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com





Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com
 
 memory can't be offlined when CONFIG_MEMCG is selected.
 For example: there is a memory device on node 1. The address range
 is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
 and memory11 under the directory /sys/devices/system/memory/.
 
 If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
 when we online pages. When we online memory8, the memory stored page cgroup
 is not provided by this memory device. But when we online memory9, the memory
 stored page cgroup may be provided by memory8. So we can't offline memory8
 now. We should offline the memory in the reversed order.
 

If memory8 is onlined as NORMAL memory ...right ?

IIUC, vmalloc() uses __GFP_HIGHMEM but doesn't use __GFP_MOVABLE.

 When the memory device is hotremoved, we will auto offline memory provided
 by this memory device. But we don't know which memory is onlined first, so
 offlining memory may fail. In such case, iterate twice to offline the memory.
 1st iterate: offline every non primary memory block.
 2nd iterate: offline primary (i.e. first added) memory block.
 
 This idea is suggested by KOSAKI Motohiro.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

I'm not sure, but shouldn't the whole DIMM be onlined as MOVABLE mem?

Anyway, I agree this kind of retry is required if memory is onlined as NORMAL
mem.
But is retry-once enough?
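
To make the retry-once question concrete, here is a toy userspace model (all
names hypothetical, and a big simplification of page_cgroup placement): each
block may have its metadata stored in another block, and a block cannot be
offlined while a still-online block keeps metadata in it. One pass tries the
blocks in ascending order.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 4

/*
 * meta_home[i] is the index of the block that stores block i's metadata,
 * or -1 if the metadata lives outside this memory device.
 */
struct toy_mem {
    bool online[TOY_NBLOCKS];
    int meta_home[TOY_NBLOCKS];
};

/* A block is busy while any other online block keeps metadata in it. */
static bool toy_can_offline(const struct toy_mem *m, int blk)
{
    for (int i = 0; i < TOY_NBLOCKS; i++)
        if (i != blk && m->online[i] && m->meta_home[i] == blk)
            return false;
    return true;
}

/* One pass over all blocks; returns how many are still online afterwards. */
static int toy_offline_pass(struct toy_mem *m)
{
    int remaining = 0;

    for (int blk = 0; blk < TOY_NBLOCKS; blk++) {
        if (!m->online[blk])
            continue;
        if (toy_can_offline(m, blk))
            m->online[blk] = false;
        else
            remaining++;
    }
    return remaining;
}

/* Run up to max_passes passes; returns the number of blocks left online. */
int toy_remaining_after(const int meta_home[TOY_NBLOCKS], int max_passes)
{
    struct toy_mem m;
    int remaining = TOY_NBLOCKS;

    assert(max_passes > 0);
    for (int i = 0; i < TOY_NBLOCKS; i++) {
        m.online[i] = true;
        m.meta_home[i] = meta_home[i];
    }
    for (int pass = 0; pass < max_passes && remaining > 0; pass++)
        remaining = toy_offline_pass(&m);
    return remaining;
}

/* All metadata sits in the first ("primary") block. */
int toy_star_remaining(int max_passes)
{
    static const int star[TOY_NBLOCKS] = { -1, 0, 0, 0 };

    return toy_remaining_after(star, max_passes);
}

/* Each block's metadata sits in the block onlined just before it. */
int toy_chain_remaining(int max_passes)
{
    static const int chain[TOY_NBLOCKS] = { -1, 0, 1, 2 };

    return toy_remaining_after(chain, max_passes);
}
```

In this model two passes are enough when everything depends on the primary
block, but a chain of dependencies still leaves blocks online after the second
pass, so whether retry-once suffices depends on where the metadata really
ends up.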

Thanks,
-Kame

 ---
   mm/memory_hotplug.c |   16 ++--
   1 files changed, 14 insertions(+), 2 deletions(-)
 
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index d04ed87..62e04c9 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -1388,10 +1388,13 @@ int remove_memory(u64 start, u64 size)
   unsigned long start_pfn, end_pfn;
   unsigned long pfn, section_nr;
   int ret;
 + int return_on_error = 0;
 + int retry = 0;
   
   start_pfn = PFN_DOWN(start);
   end_pfn = start_pfn + PFN_DOWN(size);
   
 +repeat:
   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
   section_nr = pfn_to_section_nr(pfn);
   if (!present_section_nr(section_nr))
 @@ -1410,14 +1413,23 @@ int remove_memory(u64 start, u64 size)
   
   ret = offline_memory_block(mem);
   if (ret) {
 - kobject_put(&mem->dev.kobj);
 - return ret;
 + if (return_on_error) {
 + kobject_put(&mem->dev.kobj);
 + return ret;
 + } else {
 + retry = 1;
 + }
   }
   }
   
   if (mem)
   kobject_put(&mem->dev.kobj);
   
 + if (retry) {
 + return_on_error = 1;
 + goto repeat;
 + }
 +
   return 0;
   }
   #else
 




Re: [PATCH v5 07/14] memory-hotplug: move pgdat_resize_lock into sparse_remove_one_section()

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/24 21:09), Tang Chen wrote:
 In __remove_section(), we locked pgdat_resize_lock when calling
 sparse_remove_one_section(). This lock will disable irq. But we don't need
 to lock the whole function. If we do some work to free pagetables in
 free_section_usemap(), we need to call flush_tlb_all(), which need
 irq enabled. Otherwise the WARN_ON_ONCE() in smp_call_function_many()
 will be triggered.
 
 Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

If this is a bug fix, call-trace in your log and BUGFIX or -fix- in patch title
will be appreciated, I think.

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com




Re: [PATCH v5 14/14] memory-hotplug: free node_data when a node is offlined

2012-12-25 Thread Kamezawa Hiroyuki
(2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com
 
 We call hotadd_new_pgdat() to allocate memory to store node_data. So we
 should free it when removing a node.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

I'm sorry, but is it safe to remove pgdat? Are all zone caches and zonelists
properly cleared/rebuilt in a synchronous way? And are no threads visiting
a zone in vmscan.c?

Thanks,
-Kame

 ---
   mm/memory_hotplug.c |   20 +++-
   1 files changed, 19 insertions(+), 1 deletions(-)
 
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index f8a1d2f..447fa24 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -1680,9 +1680,12 @@ static int check_cpu_on_node(void *data)
   /* offline the node if all memory sections of this node are removed */
   static void try_offline_node(int nid)
   {
 + pg_data_t *pgdat = NODE_DATA(nid);
   unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
 - unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
 + unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
   unsigned long pfn;
 + struct page *pgdat_page = virt_to_page(pgdat);
 + int i;
   
   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
   unsigned long section_nr = pfn_to_section_nr(pfn);
 @@ -1709,6 +1712,21 @@ static void try_offline_node(int nid)
*/
   node_set_offline(nid);
   unregister_one_node(nid);
 +
 + if (!PageSlab(pgdat_page) && !PageCompound(pgdat_page))
 + /* node data is allocated from boot memory */
 + return;
 +
 + /* free wait_table in each zone */
 + for (i = 0; i < MAX_NR_ZONES; i++) {
 + struct zone *zone = pgdat->node_zones + i;
 +
 + if (zone->wait_table)
 + vfree(zone->wait_table);
 + }
 +
 + arch_refresh_nodedata(nid, NULL);
 + arch_free_nodedata(pgdat);
   }
   
   int __ref remove_memory(int nid, u64 start, u64 size)
 




Re: [patch 4/4] mm, oom: remove statically defined arch functions of same name

2012-11-15 Thread Kamezawa Hiroyuki

(2012/11/14 18:15), David Rientjes wrote:

out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file
scope in their respective fault.c unnecessarily.  Inline the functions
into the pagefault handlers to clean the code up.

Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Paul Mackerras pau...@samba.org
Cc: Paul Mundt let...@linux-sh.org
Signed-off-by: David Rientjes rient...@google.com


I think this is good.

Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com




Re: linux-next: build failure after merge of the final tree (Linus' tree related)

2011-06-17 Thread KAMEZAWA Hiroyuki
On Fri, 17 Jun 2011 15:38:09 +1000
Stephen Rothwell s...@canb.auug.org.au wrote:

 Hi all,
 
 After merging the final tree, today's linux-next build (powerpc
 allyesconfig) failed like this:
 
 mm/page_cgroup.c: In function 'page_cgroup_init':
 mm/page_cgroup.c:309:13: error: 'pg_data_t' has no member named 'node_end_pfn'
 
 Caused by commit 37573e8c7182 (memcg: fix init_page_cgroup nid with
 sparsemem).  On powerpc, node_end_pfn() is defined to be (NODE_DATA
 (nid)->node_end_pfn) where NODE_DATA(nid) is (node_data[nid]) and
 node_data is struct pglist_data *node_data[].  As far as I can see,
 struct pglist_data has never had a member called node_end_pfn.
 
 This commit introduces the only use of node_end_pfn() in the generic
 kernel code.  Presumably the powerpc definition needs to be fixed (to
 maybe something like the x86 version).  It looks like the sparc version
 is broken as well.
 

Sorry, here is a fix I posted today, but there is no ack yet.
==
From 507cc95c5ba2351bff16c5421255d1395a3b555b Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
Date: Thu, 16 Jun 2011 17:28:07 +0900
Subject: [PATCH] Fix node_start/end_pfn() definition for mm/page_cgroup.c

commit 21a3c96 uses node_start/end_pfn(nid) to detect the start/end
of nodes. But it's not defined in linux/mmzone.h; it is defined in
/arch/???/include/mmzone.h, which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.

Then, we see
mm/page_cgroup.c: In function 'page_cgroup_init':
mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'

So, fixing page_cgroup.c is an idea...

But node_start_pfn()/node_end_pfn() is a very generic macro and
should be implemented in the same manner for all archs.
(m32r has different implementation...)

This patch removes the definitions of node_start/end_pfn() in each arch
and defines a unified one in linux/mmzone.h. It's no longer under
CONFIG_NEED_MULTIPLE_NODES.

A result of macro expansion is here (mm/page_cgroup.c)

for !NUMA
  start_pfn = ((&contig_page_data)->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data);
 __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

for NUMA (x86-64)
  start_pfn = ((node_data[nid])->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]);
 __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
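
As a portable userspace sketch of the same semantics (plain functions instead
of the GNU statement-expression macro, with a hypothetical two-node table),
the end pfn is start + spanned pages, i.e. exclusive, so range checks use
"pfn < end":

```c
#include <assert.h>

#define TOY_MAX_NUMNODES 2

/* Hypothetical stand-in for the two pg_data_t fields used here. */
struct toy_pgdat {
    unsigned long node_start_pfn;
    unsigned long node_spanned_pages;
};

/* Node 0 spans pfns [0, 1024), node 1 spans [1024, 1536). */
static const struct toy_pgdat toy_node_data[TOY_MAX_NUMNODES] = {
    { 0, 1024 },
    { 1024, 512 },
};

unsigned long toy_node_start_pfn(int nid)
{
    assert(nid >= 0 && nid < TOY_MAX_NUMNODES);
    return toy_node_data[nid].node_start_pfn;
}

/* Exclusive end: the first pfn that is no longer part of the node. */
unsigned long toy_node_end_pfn(int nid)
{
    const struct toy_pgdat *pgdat;

    assert(nid >= 0 && nid < TOY_MAX_NUMNODES);
    pgdat = &toy_node_data[nid];
    return pgdat->node_start_pfn + pgdat->node_spanned_pages;
}

/* Returns the owning node, or TOY_MAX_NUMNODES if no node contains pfn. */
int toy_pfn_to_nid(unsigned long pfn)
{
    int node;

    for (node = 0; node < TOY_MAX_NUMNODES; node++)
        if (pfn >= toy_node_start_pfn(node) && pfn < toy_node_end_pfn(node))
            break;
    return node;
}
```

With an inclusive end (the old m32r style, "- 1" and "<="), the same lookup
would give the same answers; mixing the two conventions is what has to be
avoided.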

Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

Changelog:
 - fixed to avoid using nid twice in node_end_pfn() macro.
---
 arch/alpha/include/asm/mmzone.h   |1 -
 arch/m32r/include/asm/mmzone.h|8 +---
 arch/parisc/include/asm/mmzone.h  |7 ---
 arch/powerpc/include/asm/mmzone.h |7 ---
 arch/sh/include/asm/mmzone.h  |4 
 arch/sparc/include/asm/mmzone.h   |2 --
 arch/tile/include/asm/mmzone.h|   11 ---
 arch/x86/include/asm/mmzone_32.h  |   11 ---
 arch/x86/include/asm/mmzone_64.h  |3 ---
 include/linux/mmzone.h|7 +++
 10 files changed, 8 insertions(+), 53 deletions(-)

diff --git a/arch/alpha/include/asm/mmzone.h b/arch/alpha/include/asm/mmzone.h
index 8af56ce..445dc42 100644
--- a/arch/alpha/include/asm/mmzone.h
+++ b/arch/alpha/include/asm/mmzone.h
@@ -56,7 +56,6 @@ PLAT_NODE_DATA_LOCALNR(unsigned long p, int n)
  * Given a kernel address, find the home node of the underlying memory.
  */
 #define kvaddr_to_nid(kaddr)   pa_to_nid(__pa(kaddr))
-#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
 
 /*
  * Given a kaddr, LOCAL_BASE_ADDR finds the owning node of the memory
diff --git a/arch/m32r/include/asm/mmzone.h b/arch/m32r/include/asm/mmzone.h
index 9f3b5ac..115ced3 100644
--- a/arch/m32r/include/asm/mmzone.h
+++ b/arch/m32r/include/asm/mmzone.h
@@ -14,12 +14,6 @@ extern struct pglist_data *node_data[];
 #define NODE_DATA(nid) (node_data[nid])
 
 #define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn)
-#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
-#define node_end_pfn(nid)  \
-({ \
-   pg_data_t *__pgdat = NODE_DATA(nid);\
-   __pgdat->node_start_pfn + __pgdat->node_spanned_pages - 1;  \
-})
 
 #define pmd_page(pmd)  (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
 /*
@@ -44,7 +38,7 @@ static __inline__ int pfn_to_nid(unsigned long pfn)
int node;
 
for (node = 0 ; node < MAX_NUMNODES ; node++)
-   if (pfn >= node_start_pfn(node) && pfn <= node_end_pfn(node))
+   if (pfn >= node_start_pfn(node) && pfn < node_end_pfn(node))
break;
 
return node;
diff --git a/arch/parisc/include/asm/mmzone.h b/arch/parisc/include/asm/mmzone.h
index 9608d2c..e67eb9c 100644
--- a/arch/parisc/include/asm/mmzone.h
+++ b/arch/parisc/include/asm/mmzone.h
@@ -14,13 +14,6 @@ extern struct

Re: [linux-2.6.36-git7: Power7] LTP Memory CGROUP Controller functional test creates Backtrace, OOMKill rcu_sched_state detected stall jiffies

2010-10-26 Thread KAMEZAWA Hiroyuki
On Tue, 26 Oct 2010 16:03:56 +0530
Subrata Modak subr...@linux.vnet.ibm.com wrote:

 If you run LTP Memory CGROUP Controller functional test on
 linux-2.6.36-git7, the following Backtrace, OOMKill & rcu_sched_state
 detected stall jiffies are created. The machine is not reachable
 thereafter. Ways to reproduce this problem:
 
 1) Build and boot kernel 2.6.36-git7 on Power7 machine with attached
 config file,
 2) Fetch, build and install LTP:
   git clone git://ltp.git.sourceforge.net/gitroot/ltp/ltp
   cd ltp
   ./configure
   make
   make install
 3) Create a LTP runtest file /opt/ltp/runtest/memcg_function_test with
 the following entry:
 memcg_function  memcg_function_test.sh
 EOF
   cd /opt/ltp
   ./runltp -f memcg_function_test
 

IIUC, memcg test includes intentional OOM-Kill test by setting the limit to 0.
And it has another test to set the limit to PAGE_SIZE.

In your environment, I think the page size is 64KB... right?

About rcu_sched_state, I have no idea at this stage. I reviewed memcontrol.c
and oom_kill.c again and couldn't find anything in a quick review.

Could you try again after -rc1 ships?
I think Andrew Morton sent a number of updates for oom_kill, memcg, and vmscan
to Linus today.

Thanks,
-Kame



Re: [PATCH 3/9] v3 Add section count to memory_block struct

2010-10-04 Thread KAMEZAWA Hiroyuki
On Fri, 01 Oct 2010 13:30:40 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Add a section count property to the memory_block struct to track the number
 of memory sections that have been added/removed from a memory block. This
 allows us to know when the last memory section of a memory block has been
 removed so we can remove the memory block.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 

Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

a nitpick,


 Index: linux-next/include/linux/memory.h
 ===
 --- linux-next.orig/include/linux/memory.h 2010-09-29 14:56:29.0 -0500
 +++ linux-next/include/linux/memory.h 2010-09-30 14:13:50.0 -0500
 @@ -23,6 +23,8 @@
  struct memory_block {
   unsigned long phys_index;
   unsigned long state;
 + int section_count;

I prefer
int section_count; /* updated under mutex */

or something similar for this kind of non-atomic counter. But it's just a nitpick.
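
As an illustration of why such a comment helps, here is a userspace sketch that uses pthreads in place of the kernel mutex. struct block and the helper names are made up for the example; the point is only that a plain int counter is safe when every update happens with the lock held, and the comment on the field records that rule.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for the kernel's struct memory_block. */
struct block {
	pthread_mutex_t lock;
	int section_count;	/* updated under lock */
};

void block_init(struct block *b)
{
	pthread_mutex_init(&b->lock, NULL);
	b->section_count = 0;
}

void block_add_section(struct block *b)
{
	pthread_mutex_lock(&b->lock);
	b->section_count++;
	pthread_mutex_unlock(&b->lock);
}

/* Returns 1 when the last section was removed, 0 otherwise. */
int block_remove_section(struct block *b)
{
	int now_empty;

	pthread_mutex_lock(&b->lock);
	b->section_count--;
	now_empty = (b->section_count == 0);
	pthread_mutex_unlock(&b->lock);
	return now_empty;
}
```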



Re: [PATCH 4/9] v3 Allow memory blocks to span multiple memory sections

2010-10-04 Thread KAMEZAWA Hiroyuki
On Fri, 01 Oct 2010 14:00:50 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the memory sysfs code such that each sysfs memory directory is now
 considered a memory block that can span multiple memory sections per
 memory block.  The default size of each memory block is SECTION_SIZE_BITS
 to maintain the current behavior of having a single memory section per
 memory block (i.e. one sysfs directory per memory section).
 
 For architectures that want to have memory blocks span multiple
 memory sections they need only define their own memory_block_size_bytes()
 routine.
 
This should be documented in a code comment before the MEMORY_BLOCK_SIZE declaration.

 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 

Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
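
To make the block/section relationship concrete, the arithmetic can be sketched in userspace C as below. SECTION_SIZE_BITS is set to 24 purely as an assumed example value (the real one is per-architecture), sections_per_block_for() is a hypothetical helper, and base_memory_block_id() mirrors the helper of the same name that appears in an earlier revision of this series.

```c
#include <assert.h>

/* Assumed value for illustration; the real one is per-architecture. */
#define SECTION_SIZE_BITS	24
#define MIN_MEMORY_BLOCK_SIZE	(1UL << SECTION_SIZE_BITS)

/* Default: one section per block. An architecture could return more. */
unsigned long memory_block_size_bytes(void)
{
	return MIN_MEMORY_BLOCK_SIZE;
}

/* Number of memory sections covered by one sysfs memory block. */
unsigned long sections_per_block_for(unsigned long block_size)
{
	assert(block_size >= MIN_MEMORY_BLOCK_SIZE);
	assert(block_size % MIN_MEMORY_BLOCK_SIZE == 0);
	return block_size / MIN_MEMORY_BLOCK_SIZE;
}

/* Section number of the first section in the block holding section_nr. */
unsigned long base_memory_block_id(unsigned long section_nr,
				   unsigned long sections_per_block)
{
	return (section_nr / sections_per_block) * sections_per_block;
}
```

With the default size every section is its own block, so base_memory_block_id() returns its argument unchanged; with 16 sections per block, section 37 belongs to the block that starts at section 32.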






Re: [PATCH 5/9] v3 rename phys_index properties of memory block struct

2010-10-04 Thread KAMEZAWA Hiroyuki
On Fri, 01 Oct 2010 13:33:38 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the 'phys_index' property of the memory_block struct to be
 called start_section_nr, and add an end_section_nr property.  The
 data tracked here is the same but the updated naming is more in line
 with what is stored here, namely the first and last section number
 that the memory block spans.
 
 The names presented to userspace remain the same, phys_index for
 start_section_nr and end_phys_index for end_section_nr, to avoid breaking
 anything in userspace.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com



Re: [PATCH 6/9] v3 Update node sysfs code

2010-10-04 Thread KAMEZAWA Hiroyuki
On Fri, 01 Oct 2010 13:34:34 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the node sysfs code to be aware of the new capability for a memory
 block to contain multiple memory sections and be aware of the memory block
 structure name changes (start_section_nr).  This requires an additional
 parameter to unregister_mem_sect_under_nodes so that we know which memory
 section of the memory block to unregister.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com



Re: [PATCH 9/9] v3 Update memory hotplug documentation

2010-10-04 Thread KAMEZAWA Hiroyuki
On Fri, 01 Oct 2010 13:37:49 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the memory hotplug documentation to reflect the new behaviors of
 memory blocks reflected in sysfs.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 
Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

Thank you for your patient work!



 ---
  Documentation/memory-hotplug.txt |   47 
 +--
  1 file changed, 31 insertions(+), 16 deletions(-)
 
 Index: linux-next/Documentation/memory-hotplug.txt
 ===
 --- linux-next.orig/Documentation/memory-hotplug.txt 2010-09-29 14:56:24.0 -0500
 +++ linux-next/Documentation/memory-hotplug.txt 2010-09-30 14:59:47.0 -0500
 @@ -126,36 +126,51 @@
  
  4 sysfs files for memory hotplug
  
 -All sections have their device information under /sys/devices/system/memory 
 as
 +All sections have their device information in sysfs.  Each section is part of
 +a memory block under /sys/devices/system/memory as
  
  /sys/devices/system/memory/memoryXXX
 -(XXX is section id.)
 +(XXX is the section id.)
  
 -Now, XXX is defined as start_address_of_section / section_size.
 +Now, XXX is defined as (start_address_of_section / section_size) of the first
 +section contained in the memory block.  The files 'phys_index' and
 +'end_phys_index' under each directory report the beginning and end section 
 id's
 +for the memory block covered by the sysfs directory.  It is expected that all
 +memory sections in this range are present and no memory holes exist in the
 +range. Currently there is no way to determine if there is a memory hole, but
 +the existence of one should not affect the hotplug capabilities of the memory
 +block.
  
  For example, assume 1GiB section size. A device for a memory starting at
  0x100000000 is /sys/device/system/memory/memory4
  (0x100000000 / 1Gib = 4)
  This device covers address range [0x100000000 ... 0x140000000)
  
 -Under each section, you can see 4 files.
 +Under each section, you can see 4 or 5 files, the end_phys_index file being
 +a recent addition and not present on older kernels.
  
 -/sys/devices/system/memory/memoryXXX/phys_index
 +/sys/devices/system/memory/memoryXXX/start_phys_index
 +/sys/devices/system/memory/memoryXXX/end_phys_index
  /sys/devices/system/memory/memoryXXX/phys_device
  /sys/devices/system/memory/memoryXXX/state
  /sys/devices/system/memory/memoryXXX/removable
  
 -'phys_index' : read-only and contains section id, same as XXX.
 -'state'  : read-write
 -   at read:  contains online/offline state of memory.
 -   at write: user can specify online, offline command
 -'phys_device': read-only: designed to show the name of physical memory 
 device.
 -   This is not well implemented now.
 -'removable'  : read-only: contains an integer value indicating
 -   whether the memory section is removable or not
 -   removable.  A value of 1 indicates that the memory
 -   section is removable and a value of 0 indicates that
 -   it is not removable.
 +'phys_index'  : read-only and contains section id of the first section
 + in the memory block, same as XXX.
 +'end_phys_index'  : read-only and contains section id of the last section
 + in the memory block.
 +'state'   : read-write
 +at read:  contains online/offline state of memory.
 +at write: user can specify online, offline command
 +which will be performed on all sections in the block.
 +'phys_device' : read-only: designed to show the name of physical memory
 +device.  This is not well implemented now.
 +'removable'   : read-only: contains an integer value indicating
 +whether the memory block is removable or not
 +removable.  A value of 1 indicates that the memory
 +block is removable and a value of 0 indicates that
 +it is not removable. A memory block is removable only if
 +every section in the block is removable.
  
  NOTE:
These directories/files appear after physical memory hotplug phase.
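
The naming rule in the documentation above (XXX is the start address of the first section divided by the section size) can be sketched as follows. section_id_for_address() is a hypothetical helper, and the 4GiB start address in the usage note is chosen to be consistent with the documented result of 4 for 1GiB sections.

```c
#include <assert.h>
#include <stdint.h>

/* Section id = start address of the section / section size. */
uint64_t section_id_for_address(uint64_t start_address, uint64_t section_size)
{
	assert(section_size != 0);
	return start_address / section_size;
}

/* First address past the section (exclusive end of its range). */
uint64_t section_end_address(uint64_t section_id, uint64_t section_size)
{
	return (section_id + 1) * section_size;
}
```

For a 1GiB section size, an address of 0x100000000 gives id 4, which would correspond to the directory memory4, and the section ends just below 0x140000000.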
 
 
 



Re: [PATCH 1/9] v3 Move find_memory_block routine

2010-10-04 Thread KAMEZAWA Hiroyuki
On Fri, 01 Oct 2010 13:28:39 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Move the find_memory_block() routine up to avoid needing a forward
 declaration in subsequent patches.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 
Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com



Re: [PATCH 2/9] v3 Add mutex for adding/removing memory blocks

2010-10-04 Thread KAMEZAWA Hiroyuki
On Fri, 01 Oct 2010 13:29:42 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Add a new mutex for use in adding and removing of memory blocks.  This
 is needed to avoid any race conditions in which the same memory block could
 be added and removed at the same time.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 
Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com



Re: [PATCH 1/9] v4 Move the find_memory_block() routine up

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:36:39 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Move the find_memory_block() routine up to avoid needing a forward
 declaration in subsequent patches.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com



Re: [PATCH 2/9] v4 Add new phys_index properties

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:37:31 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the 'phys_index' properties of a memory block to include a
 'start_phys_index' which is the same as the current 'phys_index' property.
 The property still appears as 'phys_index' in sysfs but the memory_block
 struct name is updated to indicate the start and end values.
 This also adds an 'end_phys_index' property to indicate the id of the
 last section in the memory block.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

Nitpick: after this patch, end_phys_index is added but contains 0.
It would be better for it to contain the same value as phys_index.

But OK, a following patch will fix it.
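
A small userspace sketch of the point: if the end index is derived from the start index when the block is set up, it never reads back as 0, and with one section per block it equals phys_index. struct block_range and both helpers are hypothetical names; only the "%08lx\n" output format is taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the two index fields of a memory block. */
struct block_range {
	unsigned long start_phys_index;
	unsigned long end_phys_index;
};

/* Set both indexes together so end_phys_index is never left at 0. */
void block_range_init(struct block_range *r, unsigned long start,
		      unsigned long sections_per_block)
{
	assert(sections_per_block >= 1);
	r->start_phys_index = start;
	r->end_phys_index = start + sections_per_block - 1;
}

/* Format an index the way the sysfs show routines in the patch do. */
int format_phys_index(char *buf, size_t len, unsigned long index)
{
	return snprintf(buf, len, "%08lx\n", index);
}
```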

Thanks,
-Kame

 ---
  drivers/base/memory.c  |   28 
  include/linux/memory.h |3 ++-
  2 files changed, 22 insertions(+), 9 deletions(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c 2010-08-02 13:32:21.0 -0500
 +++ linux-2.6/drivers/base/memory.c 2010-08-02 13:33:27.0 -0500
 @@ -109,12 +109,20 @@ unregister_memory(struct memory_block *m
   * uses.
   */
  
 -static ssize_t show_mem_phys_index(struct sys_device *dev,
 +static ssize_t show_mem_start_phys_index(struct sys_device *dev,
   struct sysdev_attribute *attr, char *buf)
  {
   struct memory_block *mem =
   container_of(dev, struct memory_block, sysdev);
 - return sprintf(buf, "%08lx\n", mem->phys_index);
 + return sprintf(buf, "%08lx\n", mem->start_phys_index);
 +}
 +
 +static ssize_t show_mem_end_phys_index(struct sys_device *dev,
 + struct sysdev_attribute *attr, char *buf)
 +{
 + struct memory_block *mem =
 + container_of(dev, struct memory_block, sysdev);
 + return sprintf(buf, "%08lx\n", mem->end_phys_index);
  }
  
  /*
 @@ -128,7 +136,7 @@ static ssize_t show_mem_removable(struct
   struct memory_block *mem =
   container_of(dev, struct memory_block, sysdev);
  
 - start_pfn = section_nr_to_pfn(mem->phys_index);
 + start_pfn = section_nr_to_pfn(mem->start_phys_index);
   ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
   return sprintf(buf, "%d\n", ret);
  }
 @@ -191,7 +199,7 @@ memory_block_action(struct memory_block
   int ret;
   int old_state = mem->state;
  
 - psection = mem->phys_index;
 + psection = mem->start_phys_index;
   first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
  
   /*
 @@ -264,7 +272,7 @@ store_mem_state(struct sys_device *dev,
   int ret = -EINVAL;
  
   mem = container_of(dev, struct memory_block, sysdev);
 - phys_section_nr = mem->phys_index;
 + phys_section_nr = mem->start_phys_index;
  
   if (!present_section_nr(phys_section_nr))
   goto out;
 @@ -296,7 +304,8 @@ static ssize_t show_phys_device(struct s
   return sprintf(buf, "%d\n", mem->phys_device);
  }
  
 -static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
 +static SYSDEV_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
 +static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
  static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
  static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
  static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
 @@ -476,16 +485,18 @@ static int add_memory_block(int nid, str
   if (!mem)
   return -ENOMEM;
  
 - mem->phys_index = __section_nr(section);
 + mem->start_phys_index = __section_nr(section);
   mem->state = state;
   mutex_init(&mem->state_mutex);
 - start_pfn = section_nr_to_pfn(mem->phys_index);
 + start_pfn = section_nr_to_pfn(mem->start_phys_index);
   mem->phys_device = arch_get_memory_phys_device(start_pfn);
  
   ret = register_memory(mem, section);
   if (!ret)
   ret = mem_create_simple_file(mem, phys_index);
   if (!ret)
 + ret = mem_create_simple_file(mem, end_phys_index);
 + if (!ret)
   ret = mem_create_simple_file(mem, state);
   if (!ret)
   ret = mem_create_simple_file(mem, phys_device);
 @@ -507,6 +518,7 @@ int remove_memory_block(unsigned long no
   mem = find_memory_block(section);
   unregister_mem_sect_under_nodes(mem);
   mem_remove_simple_file(mem, phys_index);
 + mem_remove_simple_file(mem, end_phys_index);
   mem_remove_simple_file(mem, state);
   mem_remove_simple_file(mem, phys_device);
   mem_remove_simple_file(mem, removable);
 Index: linux-2.6/include/linux/memory.h
 ===
 --- linux-2.6.orig/include/linux/memory.h 2010-08-02 13:23:49.0 -0500
 +++ linux-2.6/include/linux/memory.h 2010-08-02 13:33:27.0 -0500
 @@ -21,7 +21,8 @@
  #include linux

Re: [PATCH 3/9] v4 Add section count to memory_block

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:38:37 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Add a section count property to the memory_block struct to track the number
 of memory sections that have been added/removed from a memory block. This
 allows us to know when the last memory section of a memory block has been
 removed so we can remove the memory block.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com




Re: [PATCH 4/9] v4 Add mutex for add/remove of memory blocks

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:39:50 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Add a new mutex for use in adding and removing of memory blocks.  This
 is needed to avoid any race conditions in which the same memory block could
 be added and removed at the same time.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

But a nitpick (see below)

 ---
  drivers/base/memory.c |9 +
  1 file changed, 9 insertions(+)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c 2010-08-02 13:35:00.0 -0500
 +++ linux-2.6/drivers/base/memory.c 2010-08-02 13:45:34.0 -0500
 @@ -27,6 +27,8 @@
  #include <asm/atomic.h>
  #include <asm/uaccess.h>
  
 +static struct mutex mem_sysfs_mutex;
 +

For a static mutex, we usually write
static DEFINE_MUTEX(mem_sysfs_mutex);

Then an extra call to mutex_init() is not required.
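
The same idea exists in userspace, which makes for a convenient runnable analogy. The sketch below uses pthreads, where PTHREAD_MUTEX_INITIALIZER plays a role comparable to DEFINE_MUTEX(); register_block() and blocks_registered are hypothetical names for the example and are not from the patch.

```c
#include <pthread.h>

/* Initialized at compile time; no runtime init call is needed. */
static pthread_mutex_t mem_sysfs_mutex = PTHREAD_MUTEX_INITIALIZER;
static int blocks_registered;	/* updated under mem_sysfs_mutex */

/* Count one more registered block and return the new total. */
int register_block(void)
{
	int count;

	pthread_mutex_lock(&mem_sysfs_mutex);
	count = ++blocks_registered;
	pthread_mutex_unlock(&mem_sysfs_mutex);
	return count;
}
```

Because the lock is usable from the first call, there is no window in which code could take it before an init routine has run.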



Re: [PATCH 5/9] v4 Allow memory_block to span multiple memory sections

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:40:49 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the memory sysfs code so that each sysfs memory directory is now
 considered a memory block that can contain multiple memory sections per
 memory block.  The default size of each memory block is SECTION_SIZE_BITS
 to maintain the current behavior of having a single memory section per
 memory block (i.e. one sysfs directory per memory section).
 
 For architectures that want to have memory blocks span multiple
 memory sections they need only define their own memory_block_size_bytes()
 routine.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
(But it may be better to get an Ack from a ppc developer as well.)




Re: [PATCH 6/9] v4 Update the find_memory_block declaration

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:41:45 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the find_memory_block declaration to take a struct mem_section *
 so that it matches the definition.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

Hmm... my mmotm-0727 has this declaration in memory.h:

extern struct memory_block *find_memory_block(struct mem_section *);

Which patch makes it unsigned long?



Re: [PATCH 7/9] v4 Update the node sysfs code

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:42:35 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the node sysfs code to be aware of the new capability for a memory
 block to contain multiple memory sections.  This requires an additional
 parameter to unregister_mem_sect_under_nodes so that we know which memory
 section of the memory block to unregister.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com




Re: [PATCH 9/9] v4 Update memory-hotplug documentation

2010-08-04 Thread KAMEZAWA Hiroyuki
On Tue, 03 Aug 2010 08:44:16 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the memory hotplug documentation to reflect the new behaviors of
 memory blocks reflected in sysfs.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

A request from me:

 Could you clarify in the Documentation what happens if there is a memory hole
 in the range between phys_index and end_phys_index? (Or add it to a TODO list.)

Thanks,
-Kame


 ---
  Documentation/memory-hotplug.txt |   40 
 +++
  1 file changed, 24 insertions(+), 16 deletions(-)
 
 Index: linux-2.6/Documentation/memory-hotplug.txt
 ===
 --- linux-2.6.orig/Documentation/memory-hotplug.txt 2010-08-02 14:09:28.0 -0500
 +++ linux-2.6/Documentation/memory-hotplug.txt 2010-08-02 14:10:36.0 -0500
 @@ -126,36 +126,44 @@ config options.
  
  4 sysfs files for memory hotplug
  
 -All sections have their device information under /sys/devices/system/memory 
 as
 +All sections have their device information in sysfs.  Each section is part of
 +a memory block under /sys/devices/system/memory as
  
  /sys/devices/system/memory/memoryXXX
 -(XXX is section id.)
 +(XXX is the section id.)
  
 -Now, XXX is defined as start_address_of_section / section_size.
 +Now, XXX is defined as (start_address_of_section / section_size) of the first
 +section contained in the memory block.
  
  For example, assume 1GiB section size. A device for a memory starting at
  0x100000000 is /sys/device/system/memory/memory4
  (0x100000000 / 1Gib = 4)
  This device covers address range [0x100000000 ... 0x140000000)
  
 -Under each section, you can see 4 files.
 +Under each section, you can see 5 files.
  
 -/sys/devices/system/memory/memoryXXX/phys_index
 +/sys/devices/system/memory/memoryXXX/start_phys_index
 +/sys/devices/system/memory/memoryXXX/end_phys_index
  /sys/devices/system/memory/memoryXXX/phys_device
  /sys/devices/system/memory/memoryXXX/state
  /sys/devices/system/memory/memoryXXX/removable
  
 -'phys_index' : read-only and contains section id, same as XXX.
 -'state'  : read-write
 -   at read:  contains online/offline state of memory.
 -   at write: user can specify online, offline command
 -'phys_device': read-only: designed to show the name of physical memory 
 device.
 -   This is not well implemented now.
 -'removable'  : read-only: contains an integer value indicating
 -   whether the memory section is removable or not
 -   removable.  A value of 1 indicates that the memory
 -   section is removable and a value of 0 indicates that
 -   it is not removable.
 +'phys_index'  : read-only and contains section id of the first section
 + in the memory block, same as XXX.
 +'end_phys_index'  : read-only and contains section id of the last section
 + in the memory block.
 +'state'   : read-write
 +at read:  contains online/offline state of memory.
 +at write: user can specify online, offline command
 +which will be performed on all sections in the block.
 +'phys_device' : read-only: designed to show the name of physical memory
 +device.  This is not well implemented now.
 +'removable'   : read-only: contains an integer value indicating
 +whether the memory block is removable or not
 +removable.  A value of 1 indicates that the memory
 +block is removable and a value of 0 indicates that
 +it is not removable. A memory block is removable only if
 +every section in the block is removable.
  
  NOTE:
These directories/files appear after physical memory hotplug phase.
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 



Re: [PATCH 1/8] v3 Move the find_memory_block() routine up

2010-07-20 Thread KAMEZAWA Hiroyuki
On Mon, 19 Jul 2010 22:51:42 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Move the find_memory_block() routine up to avoid needing a forward
 declaration in subsequent patches.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
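
As a side note on the routine being moved: it looks a block up by a name built from the class name and the section number. A userspace sketch of that name construction is below. memory_block_name() is a hypothetical helper, and it uses snprintf() with an explicit length check, which is a substitution for illustration rather than what the patch does.

```c
#include <stdio.h>

#define MEMORY_CLASS_NAME "memory"

/*
 * Build "memory<N>" into name.
 * Returns 0 on success, -1 if the name would not fit in the buffer.
 */
int memory_block_name(char *name, size_t len, unsigned int section_nr)
{
	int n = snprintf(name, len, "%s%u", MEMORY_CLASS_NAME, section_nr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A buffer sized like the one in the patch, sizeof(MEMORY_CLASS_NAME) + 9 + 1 bytes, holds "memory4" with room to spare, while a 4-byte buffer is rejected instead of being overrun.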

 ---
  drivers/base/memory.c |   62 
 +-
  1 file changed, 31 insertions(+), 31 deletions(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c 2010-07-16 12:41:30.0 -0500
 +++ linux-2.6/drivers/base/memory.c 2010-07-19 20:42:11.0 -0500
 @@ -435,6 +435,37 @@ int __weak arch_get_memory_phys_device(u
   return 0;
  }
  
 +/*
 + * For now, we have a linear search to go find the appropriate
 + * memory_block corresponding to a particular phys_index. If
 + * this gets to be a real problem, we can always use a radix
 + * tree or something here.
 + *
 + * This could be made generic for all sysdev classes.
 + */
 +struct memory_block *find_memory_block(struct mem_section *section)
 +{
 + struct kobject *kobj;
 + struct sys_device *sysdev;
 + struct memory_block *mem;
 + char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
 +
 + /*
 +  * This only works because we know that section == sysdev->id
 +  * slightly redundant with sysdev_register()
 +  */
 + sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
 +
 + kobj = kset_find_obj(&memory_sysdev_class.kset, name);
 + if (!kobj)
 + return NULL;
 +
 + sysdev = container_of(kobj, struct sys_device, kobj);
 + mem = container_of(sysdev, struct memory_block, sysdev);
 +
 + return mem;
 +}
 +
  static int add_memory_block(int nid, struct mem_section *section,
   unsigned long state, enum mem_add_context context)
  {
 @@ -468,37 +499,6 @@ static int add_memory_block(int nid, str
   return ret;
  }
  
 -/*
 - * For now, we have a linear search to go find the appropriate
 - * memory_block corresponding to a particular phys_index. If
 - * this gets to be a real problem, we can always use a radix
 - * tree or something here.
 - *
 - * This could be made generic for all sysdev classes.
 - */
 -struct memory_block *find_memory_block(struct mem_section *section)
 -{
 - struct kobject *kobj;
 - struct sys_device *sysdev;
 - struct memory_block *mem;
 - char name[sizeof(MEMORY_CLASS_NAME) + 9 + 1];
 -
 - /*
 -  * This only works because we know that section == sysdev->id
 -  * slightly redundant with sysdev_register()
 -  */
 - sprintf(&name[0], "%s%d", MEMORY_CLASS_NAME, __section_nr(section));
 -
 - kobj = kset_find_obj(&memory_sysdev_class.kset, name);
 - if (!kobj)
 - return NULL;
 -
 - sysdev = container_of(kobj, struct sys_device, kobj);
 - mem = container_of(sysdev, struct memory_block, sysdev);
 -
 - return mem;
 -}
 -
  int remove_memory_block(unsigned long node_id, struct mem_section *section,
   int phys_device)
  {
 



Re: [PATCH 2/8] v3 Add new phys_index properties

2010-07-20 Thread KAMEZAWA Hiroyuki
On Mon, 19 Jul 2010 22:52:50 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the 'phys_index' properties of a memory block to include a
 'start_phys_index' which is the same as the current 'phys_index' property.
 This also adds an 'end_phys_index' property to indicate the id of the
 last section in the memory block.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

No, please keep phys_index as it is; please don't rename it.
IMHO, just adding end_phys_index is better.
Please avoid interface changes as far as possible.

Do you have a problem if phys_index means start_phys_index?

Thanks,
-Kame



Re: [PATCH 3/8] v3 Add section count to memory_block

2010-07-20 Thread KAMEZAWA Hiroyuki
On Mon, 19 Jul 2010 22:53:58 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Add a section count property to the memory_block struct to track the number
 of memory sections that have been added/removed from a memory block.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 ---
  drivers/base/memory.c  |   19 ---
  include/linux/memory.h |2 ++
  2 files changed, 14 insertions(+), 7 deletions(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c 2010-07-19 20:43:49.0 -0500
 +++ linux-2.6/drivers/base/memory.c 2010-07-19 20:44:01.0 -0500
 @@ -487,6 +487,7 @@ static int add_memory_block(int nid, str
  
   mem->start_phys_index = __section_nr(section);
   mem->state = state;
 + atomic_inc(&mem->section_count);
   mutex_init(&mem->state_mutex);
   start_pfn = section_nr_to_pfn(mem->start_phys_index);
   mem->phys_device = arch_get_memory_phys_device(start_pfn);
 @@ -516,13 +517,17 @@ int remove_memory_block(unsigned long no
   struct memory_block *mem;
  
   mem = find_memory_block(section);
 - unregister_mem_sect_under_nodes(mem);
 - mem_remove_simple_file(mem, start_phys_index);
 - mem_remove_simple_file(mem, end_phys_index);
 - mem_remove_simple_file(mem, state);
 - mem_remove_simple_file(mem, phys_device);
 - mem_remove_simple_file(mem, removable);
 - unregister_memory(mem, section);
 + atomic_dec(&mem->section_count);
 +
 + if (atomic_read(&mem->section_count) == 0) {

We usually use atomic_dec_and_test() for this.

Otherwise, I don't see any problems in the other parts. Please fix this nitpick.
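
The reason for preferring a combined operation is that a separate decrement followed by a read leaves a window in which another thread can change the counter, so two callers could both (or neither) observe zero. A userspace sketch of the combined form using C11 atomics is below; dec_and_test() is a hypothetical helper that is only analogous to the kernel's atomic_dec_and_test().

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Decrement the counter and report whether it reached zero,
 * as one atomic step.
 */
bool dec_and_test(atomic_int *counter)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(counter, 1) == 1;
}
```

Exactly one caller sees the transition to zero, so the teardown that follows runs once.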

Regards,
-Kame




Re: [PATCH 4/8] v3 Allow memory_block to span multiple memory sections

2010-07-20 Thread KAMEZAWA Hiroyuki
On Mon, 19 Jul 2010 22:55:08 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the memory sysfs code so that each sysfs memory directory is now
 considered a memory block that can contain multiple memory sections per
 memory block.  The default size of each memory block is SECTION_SIZE_BITS
 to maintain the current behavior of having a single memory section per
 memory block (i.e. one sysfs directory per memory section).
 
 For architectures that want to have memory blocks span multiple
 memory sections they need only define their own memory_block_size_bytes()
 routine.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 ---
  drivers/base/memory.c |  141 
 ++
  1 file changed, 98 insertions(+), 43 deletions(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c 2010-07-19 20:44:01.0 -0500
 +++ linux-2.6/drivers/base/memory.c 2010-07-19 21:12:22.0 -0500
 @@ -28,6 +28,14 @@
  #include <asm/uaccess.h>
  
  #define MEMORY_CLASS_NAME "memory"
 +#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
 +
 +static int sections_per_block;
 +
 +static inline int base_memory_block_id(int section_nr)
 +{
 + return (section_nr / sections_per_block) * sections_per_block;
 +}
  
  static struct sysdev_class memory_sysdev_class = {
   .name = MEMORY_CLASS_NAME,
 @@ -82,22 +90,21 @@ EXPORT_SYMBOL(unregister_memory_isolate_
   * register_memory - Setup a sysfs device for a memory block
   */
  static
 -int register_memory(struct memory_block *memory, struct mem_section *section)
 +int register_memory(struct memory_block *memory)
  {
   int error;
  
   memory->sysdev.cls = &memory_sysdev_class;
 - memory->sysdev.id = __section_nr(section);
 + memory->sysdev.id = memory->start_phys_index;

I'm curious: can't this memory->start_phys_index overflow?
sysdev.id is 32 bits.
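
One way to reason about the question is to bound the largest possible section number by the physical address width and the section size, then check it against a 32-bit id. The sketch below does that in userspace; both helper names are hypothetical, and the widths used in the usage note (46 physical address bits, 27 section size bits) are assumed example values, not figures from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest section number for a given address width and section size. */
uint64_t max_section_nr(unsigned int physmem_bits,
			unsigned int section_size_bits)
{
	assert(physmem_bits > section_size_bits);
	assert(physmem_bits - section_size_bits < 64);
	return (UINT64_C(1) << (physmem_bits - section_size_bits)) - 1;
}

/* True if the value can be stored in a 32-bit id without truncation. */
bool fits_in_u32(uint64_t value)
{
	return value <= UINT32_MAX;
}
```

With the assumed 46 and 27 bits the largest section number is 524287, which fits easily; the check only fails when the difference between the two widths exceeds 32 bits.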


Thanks,
-Kame



Re: [PATCH 5/8] v3 Update the find_memory_block declaration

2010-07-20 Thread KAMEZAWA Hiroyuki
On Mon, 19 Jul 2010 22:56:16 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the find_memory_block declaration to take a struct mem_section *
 so that it matches the definition.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
Reviewed-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com



Re: [PATCH 6/8] v3 Update the node sysfs code

2010-07-20 Thread KAMEZAWA Hiroyuki
On Mon, 19 Jul 2010 22:57:35 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the node sysfs code to be aware of the new capability for a memory
 block to contain multiple memory sections.  This requires an additional
 parameter to unregister_mem_sect_under_nodes so that we know which memory
 section of the memory block to unregister.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com



Re: [PATCH 1/5] v2 Split the memory_block structure

2010-07-15 Thread KAMEZAWA Hiroyuki
On Thu, 15 Jul 2010 13:37:51 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Split the memory_block struct into a memory_block
 struct to cover each sysfs directory and a new memory_block_section
 struct for each memory section covered by the sysfs directory.
 This change allows for creation of memory sysfs directories that
 can span multiple memory sections.
 
 This can be beneficial in that it can reduce the number of memory
 sysfs directories created at boot.  This also allows different
 architectures to define how many memory sections are covered by
 a sysfs directory.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 ---
  drivers/base/memory.c  |  222 
 ++---
  include/linux/memory.h |   11 +-
  2 files changed, 167 insertions(+), 66 deletions(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c 2010-07-15 08:48:41.0 -0500
 +++ linux-2.6/drivers/base/memory.c 2010-07-15 09:55:54.0 -0500
 @@ -28,6 +28,14 @@
  #include <asm/uaccess.h>
  
  #define MEMORY_CLASS_NAME "memory"
 +#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
 +
 +static int sections_per_block;
 +
 +static inline int base_memory_block_id(int section_nr)
 +{
 + return (section_nr / sections_per_block) * sections_per_block;
 +}
  
  static struct sysdev_class memory_sysdev_class = {
   .name = MEMORY_CLASS_NAME,
 @@ -94,10 +102,9 @@
  }
  
  static void
 -unregister_memory(struct memory_block *memory, struct mem_section *section)
 +unregister_memory(struct memory_block *memory)
  {
   BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
 - BUG_ON(memory->sysdev.id != __section_nr(section));
  
   /* drop the ref. we got in remove_memory_block() */
   kobject_put(&memory->sysdev.kobj);
 @@ -123,13 +130,20 @@
  static ssize_t show_mem_removable(struct sys_device *dev,
   struct sysdev_attribute *attr, char *buf)
  {
 + struct memory_block *mem;
 + struct memory_block_section *mbs;
   unsigned long start_pfn;
 - int ret;
 - struct memory_block *mem =
 - container_of(dev, struct memory_block, sysdev);
 + int ret = 1;
 +
 + mem = container_of(dev, struct memory_block, sysdev);
 + mutex_lock(&mem->state_mutex);
  
 - start_pfn = section_nr_to_pfn(mem->phys_index);
 - ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
 + list_for_each_entry(mbs, &mem->sections, next) {
 + start_pfn = section_nr_to_pfn(mbs->phys_index);
 + ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
 + }
 +
 + mutex_unlock(&mem->state_mutex);

Hmm, this means memory can be offlined only as a whole memory block. Right?
Please state this fact in the patch description...
And update Documentation/memory-hotplug.txt to say that, from the user's
perspective, a memory section is not the unit of memory hotplug anymore.
And describe the new rule.


   return sprintf(buf, "%d\n", ret);
  }
  
 @@ -182,16 +196,16 @@
   * OK to have direct references to sparsemem variables in here.
   */
  static int
 -memory_block_action(struct memory_block *mem, unsigned long action)
 +memory_block_action(struct memory_block_section *mbs, unsigned long action)
  {
   int i;
   unsigned long psection;
   unsigned long start_pfn, start_paddr;
   struct page *first_page;
   int ret;
 - int old_state = mem->state;
 + int old_state = mbs->state;
  
 - psection = mem->phys_index;
 + psection = mbs->phys_index;
   first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
  
   /*
 @@ -217,18 +231,18 @@
   ret = online_pages(start_pfn, PAGES_PER_SECTION);
   break;
   case MEM_OFFLINE:
 - mem->state = MEM_GOING_OFFLINE;
 + mbs->state = MEM_GOING_OFFLINE;
   start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
   ret = remove_memory(start_paddr,
   PAGES_PER_SECTION << PAGE_SHIFT);
   if (ret) {
 - mem->state = old_state;
 + mbs->state = old_state;
   break;
   }
   break;
   default:
   WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
 - __func__, mem, action, action);
 + __func__, mbs, action, action);
   ret = -EINVAL;
   }
  
 @@ -238,19 +252,34 @@

And please check quilt's diff options.
Patches on the ML usually show the function name in each hunk header, as in
@@ -241,6 +293,8 @@ static int memory_block_change_state(str

Maybe the -p option is missing.


  static int memory_block_change_state(struct memory_block *mem,
   unsigned 

Re: [PATCH 2/5] v2 Create new 'end_phys_index' file

2010-07-15 Thread KAMEZAWA Hiroyuki
On Thu, 15 Jul 2010 13:38:52 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Add a new 'end_phys_index' file to each memory sysfs directory to
 report the physical index of the last memory section
 covered by the sysfs directory.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Does memory_block have to be contiguous between [phys_index, end_phys_index] ?
Should we provide # of sections or amount of memory under a block ?

No objections to end_phys_index... but please fix the diff style.

Thanks,
-Kame


 ---
  drivers/base/memory.c  |   14 +-
  include/linux/memory.h |3 +++
  2 files changed, 16 insertions(+), 1 deletion(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c  2010-07-15 09:55:54.0 
 -0500
 +++ linux-2.6/drivers/base/memory.c   2010-07-15 09:56:05.0 -0500
 @@ -121,7 +121,15 @@
  {
   struct memory_block *mem =
   container_of(dev, struct memory_block, sysdev);
 - return sprintf(buf, "%08lx\n", mem->phys_index);
 + return sprintf(buf, "%08lx\n", mem->start_phys_index);
 +}
 +
 +static ssize_t show_mem_end_phys_index(struct sys_device *dev,
 + struct sysdev_attribute *attr, char *buf)
 +{
 + struct memory_block *mem =
 + container_of(dev, struct memory_block, sysdev);
 + return sprintf(buf, "%08lx\n", mem->end_phys_index);
  }
  
  /*
 @@ -321,6 +329,7 @@
  }
  
  static SYSDEV_ATTR(phys_index, 0444, show_mem_phys_index, NULL);
 +static SYSDEV_ATTR(end_phys_index, 0444, show_mem_end_phys_index, NULL);
  static SYSDEV_ATTR(state, 0644, show_mem_state, store_mem_state);
  static SYSDEV_ATTR(phys_device, 0444, show_phys_device, NULL);
  static SYSDEV_ATTR(removable, 0444, show_mem_removable, NULL);
 @@ -533,6 +542,8 @@
   if (!ret)
   ret = mem_create_simple_file(mem, phys_index);
   if (!ret)
 + ret = mem_create_simple_file(mem, end_phys_index);
 + if (!ret)
   ret = mem_create_simple_file(mem, state);
   if (!ret)
   ret = mem_create_simple_file(mem, phys_device);
 @@ -577,6 +588,7 @@
   if (list_empty(&mem->sections)) {
   unregister_mem_sect_under_nodes(mem);
   mem_remove_simple_file(mem, phys_index);
 + mem_remove_simple_file(mem, end_phys_index);
   mem_remove_simple_file(mem, state);
   mem_remove_simple_file(mem, phys_device);
   mem_remove_simple_file(mem, removable);
 Index: linux-2.6/include/linux/memory.h
 ===
 --- linux-2.6.orig/include/linux/memory.h 2010-07-15 09:54:06.0 
 -0500
 +++ linux-2.6/include/linux/memory.h  2010-07-15 09:56:05.0 -0500
 @@ -29,6 +29,9 @@
  
  struct memory_block {
   unsigned long state;
 + unsigned long start_phys_index;
 + unsigned long end_phys_index;
 +
   /*
* This serializes all state change requests.  It isn't
* held during creation because the control files are
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 



Re: [PATCH 4/5] v2 Update sysfs node routines for new sysfs memory directories

2010-07-15 Thread KAMEZAWA Hiroyuki
On Thu, 15 Jul 2010 13:40:40 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 Update the node sysfs directory routines that create
 links to the memory sysfs directories under each node.
 This update makes the node code aware that a memory sysfs
 directory can cover multiple memory sections.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com

Shouldn't static int link_mem_sections(int nid) be updated?
It does
 for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
register...

Thanks,
-Kame


 ---
  drivers/base/node.c |   12 
  1 file changed, 8 insertions(+), 4 deletions(-)
 
 Index: linux-2.6/drivers/base/node.c
 ===
 --- linux-2.6.orig/drivers/base/node.c2010-07-15 09:54:06.0 
 -0500
 +++ linux-2.6/drivers/base/node.c 2010-07-15 09:56:16.0 -0500
 @@ -346,8 +346,10 @@
   return -EFAULT;
   if (!node_online(nid))
   return 0;
 - sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
 - sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
 +
 + sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
 + sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
 + sect_end_pfn += PAGES_PER_SECTION - 1;
   for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
   int page_nid;
  
 @@ -383,8 +385,10 @@
   if (!unlinked_nodes)
   return -ENOMEM;
   nodes_clear(*unlinked_nodes);
 - sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
 - sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
 +
 + sect_start_pfn = section_nr_to_pfn(mem_blk->start_phys_index);
 + sect_end_pfn = section_nr_to_pfn(mem_blk->end_phys_index);
 + sect_end_pfn += PAGES_PER_SECTION - 1;
   for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
   int nid;
  
 
 



Re: [PATCH 4/7] Allow sysfs memory directories to be split

2010-07-14 Thread KAMEZAWA Hiroyuki
On Wed, 14 Jul 2010 12:25:03 +0900
KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:

 On Tue, 13 Jul 2010 22:18:03 -0500
 Nathan Fontenot nf...@austin.ibm.com wrote:
 
  On 07/13/2010 07:35 PM, KAMEZAWA Hiroyuki wrote:
   On Tue, 13 Jul 2010 10:51:58 -0500
   Nathan Fontenot nf...@austin.ibm.com wrote:
   
  
   And what is the purpose of this interface? Does this split a memory block into 2 pieces
   of the same size?? It sounds like a __very__ strange interface to me.
  
   Yes, this splits the memory_block into two blocks of the same size.  This was
   suggested as something we may want to do.  From ppc perspective I am not sure we
   would use this.
  
   The split functionality is not required.  The main goal of the patch set is to
   reduce the number of memory sysfs directories created.  From a ppc perspective
   the split functionality is not really needed.
  
   
   Okay, here is an offer from me.
   
     1. I think you can add a boot option such as "don't create memory sysfs".
        Please do.
  
  I posted a patch to do that a week or so ago, it didn't go over very well.
  
   
     2. I'd like to write a configfs module for handling memory hotplug even when
        the sysfs directory is not created.
        Because configfs supports rmdir/mkdir, the user (ppc's daemon?) has to do
        the following when offlining section X:
   
        # insmod configfs_memory.ko
        # mount -t configfs none /configfs
        # mkdir /configfs/memoryX
        # echo offline > /configfs/memoryX/state
        # rmdir /configfs/memoryX
   
     And making this operation the default behavior for all archs' memory hotplug
     may be better...
   
   Dave, what do you think? Because the ppc guys already use the probe interface,
   this can be handled... no?
  
  ppc would still require the existence of the 'probe' interface.
  
  Are you objecting to the 'split' functionality? 
 yes.
 
  If so I do not see any reason from ppc
  perspective that it is needed.  This was something Dave suggested, unless I am missing
  something.
  
  ppc needs the 'probe' interface in sysfs, and for ppc having multiple
  memory_block_sections reside under a single memory_block makes memory hotplug
  simpler.  On ppc we do memory hotplug operations on an LMB size basis.  With my
  patches this now lets us set each memory_block to span an LMB's worth of
  memory.  Now we could do memory hotplug in a single operation instead of multiple
  operations to offline/online all of the memory sections in an LMB.
  
 
 Per-section memory offlining is provided to allow a good success rate for
 memory offlining. Because memory hotplug has to migrate or free all used pages
 under a section, the chance of a successful unplug depends on how the memory is used.
 If a section contains an unmovable page (a kernel page), we can't offline that section.
 
 For example, comparing
   1. offlining 128MB of memory at once
   2. offlining 8 chunks of 16MB memory
 option 2 has a much better chance of success, and system-busy time can be much reduced.
 
 IIUC, ppc's first requirement is resizing, not hot-removing a memory device, so
 option 2 is much preferred. So a fine-grained interface at section_size granularity is
 appreciated, and multiple operations are much better than a single operation.
 
 As I posted a show/hide patch, I'm writing it in configfs. I think it meets IBM's
 requirements.
 _But_ it's IBM's issue, not Fujitsu's. So the final decision will depend on you guys.
 
 Anyway, I don't like a too-fancy interface such as split.
 

This is a sample configfs module for handling memory hotplug.
I wrote this just for fun and study. The code duplication was not as
big as expected... most of the code is for configfs management.

You can ignore this, but please avoid changing the existing interface in a fancy way.

==
[r...@bluextal kamezawa]# mount -t configfs none /configfs/
[r...@bluextal kamezawa]# mkdir /configfs/memory/72
[r...@bluextal kamezawa]# cat /configfs/memory/72/phys_index
0048
[r...@bluextal kamezawa]# cat /sys/devices/system/memory/memory72/phys_index
0048
[r...@bluextal kamezawa]# echo offline > /configfs/memory/72/state
[r...@bluextal kamezawa]# cat /configfs/memory/72/state
offline
[r...@bluextal kamezawa]# cat /sys/devices/system/memory/memory72/state
offline
[r...@bluextal kamezawa]# echo online > /configfs/memory/72/state
[r...@bluextal kamezawa]# cat /sys/devices/system/memory/memory72/state
online

Not signed off.

---
 drivers/base/Makefile|2 
 drivers/base/memory.c|   87 +--
 drivers/base/memory_config.c |  192 +++
 include/linux/memory.h   |   10 ++
 mm/Kconfig   |1 
 5 files changed, 280 insertions(+), 12 deletions(-)

Index: mmotm-2.6.35-0701/drivers/base/memory.c
===
--- mmotm-2.6.35-0701.orig/drivers/base/memory.c
+++ mmotm-2.6.35-0701/drivers/base/memory.c
@@ -23,12 +23,15 @@
 #include <linux/mutex.h>

Re: [PATCH 1/7] Split the memory_block structure

2010-07-13 Thread KAMEZAWA Hiroyuki

Please cc linux-mm next time...
And please include updates for Documentation/memory-hotplug.txt.


On Mon, 12 Jul 2010 10:42:06 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 This patch splits the memory_block struct into a memory_block
 struct to cover each sysfs directory and a new memory_block_section
 struct for each memory section covered by the sysfs directory.
 
 This also updates the routine handling memory_block creation
 and manipulation to use these updated structures.
 

Could you clarify the number of memory_block_section per memory_block ?


 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 ---
  drivers/base/memory.c  |  228 
 +++--
  include/linux/memory.h |   11 +-
  2 files changed, 172 insertions(+), 67 deletions(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c  2010-07-08 11:27:21.0 
 -0500
 +++ linux-2.6/drivers/base/memory.c   2010-07-09 14:23:09.0 -0500
 @@ -28,6 +28,14 @@
  #include <asm/uaccess.h>
  
  #define MEMORY_CLASS_NAME "memory"
 +#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
 +
 +static int sections_per_block;
 +
Please give this a default value. Can it be determined only by .config?


 +static inline int base_memory_block_id(int section_nr)
 +{
 + return (section_nr / sections_per_block) * sections_per_block;
 +}
  
  static struct sysdev_class memory_sysdev_class = {
   .name = MEMORY_CLASS_NAME,
 @@ -94,10 +102,9 @@
  }
  
  static void
 -unregister_memory(struct memory_block *memory, struct mem_section *section)
 +unregister_memory(struct memory_block *memory)
  {
   BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
 - BUG_ON(memory->sysdev.id != __section_nr(section));
  
   /* drop the ref. we got in remove_memory_block() */
   kobject_put(&memory->sysdev.kobj);
 @@ -123,13 +130,20 @@
  static ssize_t show_mem_removable(struct sys_device *dev,
   struct sysdev_attribute *attr, char *buf)
  {
 - unsigned long start_pfn;
 - int ret;
 - struct memory_block *mem =
 - container_of(dev, struct memory_block, sysdev);
 + struct list_head *pos, *tmp;
 + struct memory_block *mem;
 + int ret = 1;
 +
 + mem = container_of(dev, struct memory_block, sysdev);
 + list_for_each_safe(pos, tmp, &mem->sections) {
 + struct memory_block_section *mbs;
 + unsigned long start_pfn;
 +
 + mbs = list_entry(pos, struct memory_block_section, next);

list_for_each_entry ?



 + start_pfn = section_nr_to_pfn(mbs->phys_index);
 + ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
 + }

Hmm, then it's shown as removable only when the whole memory block is
removable. Right?
Does that meet the ppc guys' requirements?

  
 - start_pfn = section_nr_to_pfn(mem->phys_index);
 - ret = is_mem_section_removable(start_pfn, PAGES_PER_SECTION);
   return sprintf(buf, "%d\n", ret);
  }

Hmm... can't you print the removable information as a bitmap here?
Overkill?


  
 @@ -182,16 +196,16 @@
   * OK to have direct references to sparsemem variables in here.
   */
  static int
 -memory_block_action(struct memory_block *mem, unsigned long action)
 +memory_block_action(struct memory_block_section *mbs, unsigned long action)
  {
   int i;
   unsigned long psection;
   unsigned long start_pfn, start_paddr;
   struct page *first_page;
   int ret;
 - int old_state = mem->state;
 ot-option-to-disable-memory-hotplug.patch
 + int old_state = mbs->state;

Where is this noise from?

  
 - psection = mem->phys_index;
 + psection = mbs->phys_index;
   first_page = pfn_to_page(psection << PFN_SECTION_SHIFT);
  
   /*
 @@ -217,18 +231,18 @@
   ret = online_pages(start_pfn, PAGES_PER_SECTION);
   break;
   case MEM_OFFLINE:
 - mem->state = MEM_GOING_OFFLINE;
 + mbs->state = MEM_GOING_OFFLINE;
   start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
   ret = remove_memory(start_paddr,
   PAGES_PER_SECTION << PAGE_SHIFT);
   if (ret) {
 - mem->state = old_state;
 + mbs->state = old_state;
   break;
   }
   break;
   default:
   WARN(1, KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
 - __func__, mem, action, action);
 + __func__, mbs, action, action);
   ret = -EINVAL;
   }
  
 @@ -238,19 +252,40 @@
  static int memory_block_change_state(struct memory_block *mem,
   unsigned long to_state, unsigned 

Re: [PATCH 3/7] Update the [register,unregister]_memory routines

2010-07-13 Thread KAMEZAWA Hiroyuki
On Mon, 12 Jul 2010 10:44:10 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 This patch moves the register/unregister_memory routines to
 avoid a forward declaration.  It also moves the sysfs file
 creation and deletion for each directory into the register/
 unregister routines to avoid duplicating it with these updates.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 ---
  drivers/base/memory.c |   93 
 +-
  1 file changed, 48 insertions(+), 45 deletions(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c  2010-07-09 14:23:17.0 
 -0500
 +++ linux-2.6/drivers/base/memory.c   2010-07-09 14:23:20.0 -0500
 @@ -87,31 +87,6 @@
  EXPORT_SYMBOL(unregister_memory_isolate_notifier);
  
  /*
 - * register_memory - Setup a sysfs device for a memory block
 - */
 -static
 -int register_memory(struct memory_block *memory, struct mem_section *section)
 -{
 - int error;
 -
 - memory->sysdev.cls = &memory_sysdev_class;
 - memory->sysdev.id = __section_nr(section);
 -
 - error = sysdev_register(&memory->sysdev);
 - return error;
 -}
 -
 -static void
 -unregister_memory(struct memory_block *memory)
 -{
 - BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
 -
 - /* drop the ref. we got in remove_memory_block() */
 - kobject_put(&memory->sysdev.kobj);
 - sysdev_unregister(&memory->sysdev);
 -}
 -
 -/*
   * use this as the physical section index that this memsection
   * uses.
   */
 @@ -346,6 +321,53 @@
   sysdev_remove_file(&mem->sysdev, &attr_##attr_name)
  
  /*
 + * register_memory - Setup a sysfs device for a memory block
 + */
 +static
 +int register_memory(struct memory_block *memory, struct mem_section *section,
 + int nid, enum mem_add_context context)
 +{
 + int ret;
 +
 + memory->sysdev.cls = &memory_sysdev_class;
 + memory->sysdev.id = __section_nr(section);
 +
Why is this the section ID and not the block ID?

-Kame



Re: [PATCH 4/7] Allow sysfs memory directories to be split

2010-07-13 Thread KAMEZAWA Hiroyuki
On Mon, 12 Jul 2010 10:45:25 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 This patch introduces the new 'split' file in each memory sysfs
 directory and the associated routines needed to handle splitting
 a directory.
 
 Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
 ---

Please check the diff options...


  drivers/base/memory.c |   99 
 +-
  1 file changed, 98 insertions(+), 1 deletion(-)
 
 Index: linux-2.6/drivers/base/memory.c
 ===
 --- linux-2.6.orig/drivers/base/memory.c  2010-07-09 14:23:20.0 
 -0500
 +++ linux-2.6/drivers/base/memory.c   2010-07-09 14:38:09.0 -0500
 @@ -32,6 +32,9 @@
  
  static int sections_per_block;
  
 +static int register_memory(struct memory_block *, struct mem_section *,
 +int, enum mem_add_context);
 +
  static inline int base_memory_block_id(int section_nr)
  {
   return (section_nr / sections_per_block) * sections_per_block;
 @@ -309,11 +312,100 @@
   return sprintf(buf, %d\n, mem-phys_device);
  }
  
 +static void update_memory_block_phys_indexes(struct memory_block *mem)
 +{
 + struct list_head *pos;
 + struct memory_block_section *mbs;
 + unsigned long min_index = 0x;
 + unsigned long max_index = 0;
 +
 + list_for_each(pos, &mem->sections) {
 + mbs = list_entry(pos, struct memory_block_section, next);
 +
 + if (mbs->phys_index < min_index)
 + min_index = mbs->phys_index;
 +
 + if (mbs->phys_index > max_index)
 + max_index = mbs->phys_index;
 + }
 +
 + mem->start_phys_index = min_index;
 + mem->end_phys_index = max_index;
 +}
 +
 +static ssize_t
 +store_mem_split_block(struct sys_device *dev, struct sysdev_attribute *attr,
 +   const char *buf, size_t count)
 +{
 + struct memory_block *mem, *new_mem_blk;
 + struct memory_block_section *mbs;
 + struct list_head *pos, *tmp;
 + struct mem_section *section;
 + int min_scn_nr = 0;
 + int max_scn_nr = 0;
 + int total_scns = 0;
 + int new_blk_min, new_blk_total;
 + int ret = -EINVAL;
 +
 + mem = container_of(dev, struct memory_block, sysdev);
 +
 + if (list_is_singular(&mem->sections))
 + return -EINVAL;

What does this mean?


 +
 + mutex_lock(&mem->state_mutex);
 +
 + list_for_each(pos, &mem->sections) {
 + mbs = list_entry(pos, struct memory_block_section, next);
 +
 + total_scns++;
 +
 + if (min_scn_nr > mbs->phys_index)
 + min_scn_nr = mbs->phys_index;
 +
 + if (max_scn_nr < mbs->phys_index)
 + max_scn_nr = mbs->phys_index;
 + }
 +
 + new_mem_blk = kzalloc(sizeof(*new_mem_blk), GFP_KERNEL);
 + if (!new_mem_blk)
 + return -ENOMEM;
 +
 + mutex_init(&new_mem_blk->state_mutex);
 + INIT_LIST_HEAD(&new_mem_blk->sections);
 + new_mem_blk->state = mem->state;
 +
 + mutex_lock(&new_mem_blk->state_mutex);
 +
 + new_blk_total = total_scns / 2;
 + new_blk_min = max_scn_nr - new_blk_total + 1;
 +
 + section = __nr_to_section(new_blk_min);
 + ret = register_memory(new_mem_blk, section, 0, HOTPLUG);
 +
Is 'nid' always 0?

And what is the purpose of this interface? Does this split a memory block into 2 pieces
of the same size?? It sounds like a __very__ strange interface to me.

If this is necessary, I'd rather move the whole thing to configfs than
do something tricky.

Bye.
-Kame



Re: [PATCH 4/7] Allow sysfs memory directories to be split

2010-07-13 Thread KAMEZAWA Hiroyuki
On Tue, 13 Jul 2010 10:51:58 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

  
  And what is the purpose of this interface? Does this split a memory block into 2 pieces
  of the same size?? It sounds like a __very__ strange interface to me.
 
 Yes, this splits the memory_block into two blocks of the same size.  This was
 suggested as something we may want to do.  From ppc perspective I am not sure we
 would use this.
 
 The split functionality is not required.  The main goal of the patch set is to
 reduce the number of memory sysfs directories created.  From a ppc perspective
 the split functionality is not really needed.
 

Okay, here is an offer from me.

  1. I think you can add a boot option such as "don't create memory sysfs".
     Please do.

  2. I'd like to write a configfs module for handling memory hotplug even when
     the sysfs directory is not created.
     Because configfs supports rmdir/mkdir, the user (ppc's daemon?) has to do
     the following when offlining section X:

     # insmod configfs_memory.ko
     # mount -t configfs none /configfs
     # mkdir /configfs/memoryX
     # echo offline > /configfs/memoryX/state
     # rmdir /configfs/memoryX

  And making this operation the default behavior for all archs' memory hotplug
  may be better...

Dave, what do you think? Because the ppc guys already use the probe interface,
this can be handled... no?

One problem is that I don't have enough knowledge about configfs... it seems
complex.

Thanks,
-Kame
  



Re: [PATCH 4/7] Allow sysfs memory directories to be split

2010-07-13 Thread KAMEZAWA Hiroyuki
On Tue, 13 Jul 2010 22:18:03 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 On 07/13/2010 07:35 PM, KAMEZAWA Hiroyuki wrote:
  On Tue, 13 Jul 2010 10:51:58 -0500
  Nathan Fontenot nf...@austin.ibm.com wrote:
  
 
  And what is the purpose of this interface? Does this split a memory block into 2 pieces
  of the same size?? It sounds like a __very__ strange interface to me.
 
  Yes, this splits the memory_block into two blocks of the same size.  This was
  suggested as something we may want to do.  From ppc perspective I am not sure we
  would use this.
 
  The split functionality is not required.  The main goal of the patch set is to
  reduce the number of memory sysfs directories created.  From a ppc perspective
  the split functionality is not really needed.
 
  
  Okay, here is an offer from me.
  
    1. I think you can add a boot option such as "don't create memory sysfs".
       Please do.
 
 I posted a patch to do that a week or so ago, it didn't go over very well.
 
  
    2. I'd like to write a configfs module for handling memory hotplug even when
       the sysfs directory is not created.
       Because configfs supports rmdir/mkdir, the user (ppc's daemon?) has to do
       the following when offlining section X:
  
       # insmod configfs_memory.ko
       # mount -t configfs none /configfs
       # mkdir /configfs/memoryX
       # echo offline > /configfs/memoryX/state
       # rmdir /configfs/memoryX
  
    And making this operation the default behavior for all archs' memory hotplug
    may be better...
  
  Dave, what do you think? Because the ppc guys already use the probe interface,
  this can be handled... no?
 
 ppc would still require the existence of the 'probe' interface.
 
 Are you objecting to the 'split' functionality? 
yes.

 If so I do not see any reason from ppc
 perspective that it is needed.  This was something Dave suggested, unless I am missing
 something.
 
 ppc needs the 'probe' interface in sysfs, and for ppc having multiple
 memory_block_sections reside under a single memory_block makes memory hotplug
 simpler.  On ppc we do memory hotplug operations on an LMB size basis.  With my
 patches this now lets us set each memory_block to span an LMB's worth of
 memory.  Now we could do memory hotplug in a single operation instead of multiple
 operations to offline/online all of the memory sections in an LMB.
 

Per-section memory offlining is provided to allow a good success rate for
memory offlining. Because memory hotplug has to migrate or free all used pages
under a section, the chance of a successful unplug depends on how the memory is used.
If a section contains an unmovable page (a kernel page), we can't offline that section.

For example, comparing
  1. offlining 128MB of memory at once
  2. offlining 8 chunks of 16MB memory
option 2 has a much better chance of success, and system-busy time can be much reduced.

IIUC, ppc's first requirement is resizing, not hot-removing a memory device, so
option 2 is much preferred. So a fine-grained interface at section_size granularity is
appreciated, and multiple operations are much better than a single operation.

As I posted a show/hide patch, I'm writing it in configfs. I think it meets IBM's
requirements.
_But_ it's IBM's issue, not Fujitsu's. So the final decision will depend on you guys.

Anyway, I don't like a too-fancy interface such as split.

Thanks,
-Kame



Re: 2.6.35-rc2 : OOPS with LTP memcg regression test run.

2010-06-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 Jun 2010 22:00:57 +0200
Maciej Rutecki maciej.rute...@gmail.com wrote:

 I created a Bugzilla entry at 
 https://bugzilla.kernel.org/show_bug.cgi?id=16178
 for your bug report, please add your address to the CC list in there, thanks!
 

Hmm... It seems to be a panic in SLUB or SLAB.
Is the .config available?

-Kame


 On Sunday, 6 June 2010 at 17:06:54 Sachin Sant wrote:
  While executing LTP Controller tests(memcg regression) on
  a POWER6 box came across this following OOPS.
  
  Memory cgroup out of memory: kill process 9139 (memcg_test_1) score 3 or a child
  Killed process 9139 (memcg_test_1) vsz:3456kB, anon-rss:448kB, file-rss:1088kB
  Memory cgroup out of memory: kill process 9140 (memcg_test_1) score 3 or a child
  Killed process 9140 (memcg_test_1) vsz:3456kB, anon-rss:448kB, file-rss:1088kB
  Unable to handle kernel paging request for data at address 0x720072007200720
  Faulting instruction address: 0xc015b778
  Oops: Kernel access of bad area, sig: 11 [#2]
  SMP NR_CPUS=1024 NUMA pSeries
  last sysfs file: /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map
  Modules linked in: quota_v2 quota_tree ipv6 fuse loop dm_mod sr_mod cdrom sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
  NIP: c015b778 LR: c015b740 CTR: 
  REGS: c9812ff0 TRAP: 0300   Tainted: G  D (2.6.35-rc2-autotest)
  MSR: 80009032 EE,ME,IR,DR  CR: 44004424 XER: 0001
  DAR: 0720072007200720, DSISR: 4000
  TASK = c5fb1100[9155] 'umount' THREAD: c981 CPU: 0
  GPR00:  c9813270 c0d3d7a0 
  GPR04: 8050 0016 0027 cf2c6870
  GPR08: 06a5 c0b16870 c0cf0140 0e7b
  GPR12: 24004428 c744 8000 f000
  GPR16:  c98138f0 002d 0027
  GPR20:  0027  c7063138
  GPR24:   c019bafc ce02e000
  GPR28: 0001 8050 c0ca6b00 0720072007200720
  NIP [c015b778] .kmem_cache_alloc+0xb0/0x13c
  LR [c015b740] .kmem_cache_alloc+0x78/0x13c
  Call Trace:
  [c9813270] [c015b740] .kmem_cache_alloc+0x78/0x13c (unreliable)
  [c9813310] [c019bafc] .alloc_buffer_head+0x2c/0x78
  [c9813390] [c019c99c] .alloc_page_buffers+0x60/0x114
  [c9813450] [c019ca78] .create_empty_buffers+0x28/0x140
  [c98134e0] [c019f2ec] .__block_prepare_write+0xe4/0x4f0
  [c9813610] [c019f94c] .block_write_begin_newtrunc+0xa8/0x120
  [c98136d0] [c019fea0] .block_write_begin+0x34/0x8c
  [c9813770] [c022b458] .ext3_write_begin+0x13c/0x298
  [c9813880] [c0117500] .generic_file_buffered_write+0x13c/0x320
  [c98139b0] [c0119c80] .__generic_file_aio_write+0x378/0x3dc
  [c9813ab0] [c0119d68] .generic_file_aio_write+0x84/0xfc
  [c9813b60] [c016e460] .do_sync_write+0xac/0x10c
  [c9813ce0] [c016f204] .vfs_write+0xd0/0x1dc
  [c9813d80] [c016f418] .SyS_write+0x58/0xa0
  [c9813e30] [c00085b4] syscall_exit+0x0/0x40
  Instruction dump:
  3860 409e0090 3800 8b8d0212 980d0212 e96d0040 e93b 7ce95a14
  7fe9582a 2fbf 419e0014 e81b001a 7c1f002a 7c09592a 481c 7f46d378
  ---[ end trace f24cb0cb5729d2bb ]---
  
  And few more of these. Previous snapshot release
   2.6.35-rc1-git5(6c5de280b6...) was good.
  
  Thanks
  -Sachin
  
 
 -- 
 Maciej Rutecki
 http://www.maciek.unixy.pl
 
 --
 To unsubscribe, send a message with 'unsubscribe linux-mm' in
 the body to majord...@kvack.org.  For more info on Linux MM,
 see: http://www.linux-mm.org/ .
 Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a
 



[BUGFIX][PATCH] memcg: avoid use cmpxchg in swap cgroup maintenance (Was Re: 34-rc1-git3 build failure with CGROUP_MEM_RES_CTLR_SWAP=y)

2010-03-14 Thread KAMEZAWA Hiroyuki
On Sun, 14 Mar 2010 16:18:06 +0530
Sachin Sant sach...@in.ibm.com wrote:

 On a PowerPC box, latest 34-rc1 git(d89b218b8...) fails to build
 with CGROUP_MEM_RES_CTLR_SWAP=y. 
 
 LD  init/built-in.o
 LD  .tmp_vmlinux1
 mm/built-in.o: In function __xchg:
 arch/powerpc/include/asm/system.h:331: undefined reference to .__xchg_called_with_bad_pointer
 mm/built-in.o: In function __cmpxchg:
 arch/powerpc/include/asm/system.h:474: undefined reference to .__cmpxchg_called_with_bad_pointer
 make: *** [.tmp_vmlinux1] Error 1
 
 The code in question was added via commit 024914477e...
 
 memcg: move charges of anonymous swap
 
Oh, OK. powerpc (and other archs?) can't do 2-byte cmpxchg and xchg.
So we should use a spinlock instead.

How about this? Nishimura-san, could you think of something better?
We need a quick fix.

==
swap_cgroup stores 2-byte data and a new operation uses cmpxchg on it.
2-byte cmpxchg/xchg is not available on some archs. This patch replaces
cmpxchg/xchg with plain operations under a lock.

Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
---
 mm/page_cgroup.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Index: mmotm-2.6.34-Mar11/mm/page_cgroup.c
===
--- mmotm-2.6.34-Mar11.orig/mm/page_cgroup.c
+++ mmotm-2.6.34-Mar11/mm/page_cgroup.c
@@ -284,6 +284,7 @@ static DEFINE_MUTEX(swap_cgroup_mutex);
 struct swap_cgroup_ctrl {
 	struct page **map;
 	unsigned long length;
+	spinlock_t	lock;
 };
 
 struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
@@ -353,16 +354,22 @@ unsigned short swap_cgroup_cmpxchg(swp_e
 	struct swap_cgroup_ctrl *ctrl;
 	struct page *mappage;
 	struct swap_cgroup *sc;
+	unsigned long flags;
+	unsigned short retval;
 
 	ctrl = &swap_cgroup_ctrl[type];
 
 	mappage = ctrl->map[idx];
 	sc = page_address(mappage);
 	sc += pos;
-	if (cmpxchg(&sc->id, old, new) == old)
-		return old;
+	spin_lock_irqsave(&ctrl->lock, flags);
+	retval = sc->id;
+	if (retval == old)
+		sc->id = new;
 	else
-		return 0;
+		retval = 0;
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+	return retval;
 }
 
 /**
@@ -383,13 +390,17 @@ unsigned short swap_cgroup_record(swp_en
 	struct page *mappage;
 	struct swap_cgroup *sc;
 	unsigned short old;
+	unsigned long flags;
 
 	ctrl = &swap_cgroup_ctrl[type];
 
 	mappage = ctrl->map[idx];
 	sc = page_address(mappage);
 	sc += pos;
-	old = xchg(&sc->id, id);
+	spin_lock_irqsave(&ctrl->lock, flags);
+	old = sc->id;
+	sc->id = id;
+	spin_unlock_irqrestore(&ctrl->lock, flags);
 
 	return old;
 }
@@ -441,6 +452,7 @@ int swap_cgroup_swapon(int type, unsigne
 	mutex_lock(&swap_cgroup_mutex);
 	ctrl->length = length;
 	ctrl->map = array;
+	spin_lock_init(&ctrl->lock);
 	if (swap_cgroup_prepare(type)) {
 		/* memory shortage */
 		ctrl->map = NULL;
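
Editorial sketch (not part of the original mail): the idea of the patch, doing
the 2-byte compare-and-exchange and exchange as plain loads and stores under a
lock, can be modelled in userspace C roughly as below. Everything here is an
assumption made for illustration: struct id_table, id_cmpxchg() and id_xchg()
are hypothetical names, and a C11 atomic_flag spin loop stands in for the
kernel's spin_lock_irqsave()/spin_unlock_irqrestore().

```c
#include <assert.h>
#include <stdatomic.h>

#define ID_SLOTS 8

/* Hypothetical stand-in for struct swap_cgroup_ctrl: one lock per table. */
struct id_table {
	atomic_flag lock;
	unsigned short ids[ID_SLOTS];
};

static void table_lock(struct id_table *t)
{
	/* Spin until the flag was previously clear (userspace toy spinlock). */
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;
}

static void table_unlock(struct id_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/*
 * Compare-and-exchange done under the lock. Returns 'old' when the slot
 * held 'old' and was updated, 0 otherwise (same convention as the patch).
 */
static unsigned short id_cmpxchg(struct id_table *t, int pos,
				 unsigned short old, unsigned short new)
{
	unsigned short retval;

	assert(pos >= 0 && pos < ID_SLOTS);
	table_lock(t);
	retval = t->ids[pos];
	if (retval == old)
		t->ids[pos] = new;
	else
		retval = 0;
	table_unlock(t);
	return retval;
}

/* Unconditional exchange done under the lock; returns the previous value. */
static unsigned short id_xchg(struct id_table *t, int pos, unsigned short id)
{
	unsigned short old;

	assert(pos >= 0 && pos < ID_SLOTS);
	table_lock(t);
	old = t->ids[pos];
	t->ids[pos] = id;
	table_unlock(t);
	return old;
}
```

The trade-off is the one the mail implies: the lock serializes all updates to
one table, but it needs no sub-word atomic instruction from the architecture.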



Re: [PATCH 1/2][v2] mm: add notifier in pageblock isolation for balloon drivers

2009-10-08 Thread KAMEZAWA Hiroyuki
On Fri, 2 Oct 2009 13:44:58 -0500
Robert Jennings r...@linux.vnet.ibm.com wrote:

 Memory balloon drivers can allocate a large amount of memory which
 is not movable but could be freed to accommodate memory hotplug remove.
 
 Prior to calling the memory hotplug notifier chain the memory in the
 pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
 isolation will not proceed, causing the memory removal for that page
 range to fail.
 
 Rather than failing pageblock isolation if the migratetype is not
 MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock
 are owned by a registered balloon driver (or other entity) using a
 notifier chain.  If all of the non-movable pages are owned by a balloon,
 they can be freed later through the memory notifier chain and the range
 can still be isolated in set_migratetype_isolate().
 
 Signed-off-by: Robert Jennings r...@linux.vnet.ibm.com
 
 ---
  drivers/base/memory.c  |   19 +++
  include/linux/memory.h |   26 ++
  mm/page_alloc.c|   45 ++---
  3 files changed, 83 insertions(+), 7 deletions(-)
 
 Index: b/drivers/base/memory.c
 ===
 --- a/drivers/base/memory.c
 +++ b/drivers/base/memory.c
 @@ -63,6 +63,20 @@ void unregister_memory_notifier(struct n
  }
  EXPORT_SYMBOL(unregister_memory_notifier);
  
 +static BLOCKING_NOTIFIER_HEAD(memory_isolate_chain);
 +

IIUC, this notifier is called under zone->lock.
Please use ATOMIC_NOTIFIER_HEAD().




 +int register_memory_isolate_notifier(struct notifier_block *nb)
 +{
 + return blocking_notifier_chain_register(&memory_isolate_chain, nb);
 +}
 +EXPORT_SYMBOL(register_memory_isolate_notifier);
 +
 +void unregister_memory_isolate_notifier(struct notifier_block *nb)
 +{
 + blocking_notifier_chain_unregister(&memory_isolate_chain, nb);
 +}
 +EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 +
  /*
   * register_memory - Setup a sysfs device for a memory block
   */
 @@ -157,6 +171,11 @@ int memory_notify(unsigned long val, voi
   return blocking_notifier_call_chain(&memory_chain, val, v);
  }
  
 +int memory_isolate_notify(unsigned long val, void *v)
 +{
 + return blocking_notifier_call_chain(&memory_isolate_chain, val, v);
 +}
 +
  /*
   * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
   * OK to have direct references to sparsemem variables in here.
 Index: b/include/linux/memory.h
 ===
 --- a/include/linux/memory.h
 +++ b/include/linux/memory.h
 @@ -50,6 +50,18 @@ struct memory_notify {
   int status_change_nid;
  };
  
 +/*
 + * During pageblock isolation, count the number of pages in the
 + * range [start_pfn, start_pfn + nr_pages)
 + */
 +#define MEM_ISOLATE_COUNT	(1<<0)
 +
 +struct memory_isolate_notify {
 + unsigned long start_pfn;
 + unsigned int nr_pages;
 + unsigned int pages_found;
 +};

Could you add a comment for each field?
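
Editorial sketch (not part of the original mail): one way the per-field
comments could read, together with a toy userspace callback that fills in
pages_found. The field descriptions are my reading of the block comment in the
quoted patch, and struct balloon and balloon_isolate_notify() are hypothetical
names invented for this illustration. A real driver would register a
notifier_block instead of being called directly.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_ISOLATE_COUNT	(1 << 0)

struct memory_isolate_notify {
	unsigned long start_pfn;	/* first page frame number of the range */
	unsigned int nr_pages;		/* number of pages in the range */
	unsigned int pages_found;	/* pages in the range owned by callbacks */
};

/* Hypothetical balloon owning the pfn range [first_pfn, first_pfn + count). */
struct balloon {
	unsigned long first_pfn;
	unsigned int count;
};

/*
 * Toy callback: on MEM_ISOLATE_COUNT, add the number of balloon-owned pages
 * that fall inside [start_pfn, start_pfn + nr_pages) to pages_found.
 */
static int balloon_isolate_notify(const struct balloon *b, unsigned long action,
				  struct memory_isolate_notify *arg)
{
	unsigned long start, end, bstart, bend, lo, hi;

	assert(b != NULL && arg != NULL);
	if (action != MEM_ISOLATE_COUNT)
		return 0;

	start = arg->start_pfn;
	end = start + arg->nr_pages;
	bstart = b->first_pfn;
	bend = bstart + b->count;

	/* Intersection of the two half-open ranges. */
	lo = start > bstart ? start : bstart;
	hi = end < bend ? end : bend;
	if (lo < hi)
		arg->pages_found += (unsigned int)(hi - lo);
	return 0;
}
```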

 +
  struct notifier_block;
  struct mem_section;
  
 @@ -76,14 +88,28 @@ static inline int memory_notify(unsigned
  {
   return 0;
  }
 +static inline int register_memory_isolate_notifier(struct notifier_block *nb)
 +{
 + return 0;
 +}
 +static inline void unregister_memory_isolate_notifier(struct notifier_block *nb)
 +{
 +}
 +static inline int memory_isolate_notify(unsigned long val, void *v)
 +{
 + return 0;
 +}
  #else
  extern int register_memory_notifier(struct notifier_block *nb);
  extern void unregister_memory_notifier(struct notifier_block *nb);
 +extern int register_memory_isolate_notifier(struct notifier_block *nb);
 +extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
  extern int register_new_memory(int, struct mem_section *);
  extern int unregister_memory_section(struct mem_section *);
  extern int memory_dev_init(void);
  extern int remove_memory_block(unsigned long, struct mem_section *, int);
  extern int memory_notify(unsigned long val, void *v);
 +extern int memory_isolate_notify(unsigned long val, void *v);
  extern struct memory_block *find_memory_block(struct mem_section *);
  #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
  enum mem_add_context { BOOT, HOTPLUG };
 Index: b/mm/page_alloc.c
 ===
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -48,6 +48,7 @@
  #include <linux/page_cgroup.h>
  #include <linux/debugobjects.h>
  #include <linux/kmemleak.h>
 +#include <linux/memory.h>
  #include <trace/events/kmem.h>
  
  #include <asm/tlbflush.h>
 @@ -4985,23 +4986,53 @@ void set_pageblock_flags_group(struct pa
  int set_migratetype_isolate(struct page *page)
  {
   struct zone *zone;
 - unsigned long flags;
 + unsigned long flags, pfn, iter;
 + unsigned long immobile = 0;
 + struct memory_isolate_notify arg;
 + int notifier_ret;
   int ret = -EBUSY;

Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread KAMEZAWA Hiroyuki
On Sun, 17 Feb 2008 20:29:13 +0100
Jens Axboe [EMAIL PROTECTED] wrote:

 It's odd stuff. Could you perhaps try and add some printks to
 block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
 from radix_tree_gang_lookup() and the pointer value of cics[i] in the
 for() loop after the lookup?
 
I met the same issue on ia64/NUMA box.
It seems cics[]->key is NULL, so the index for radix_tree_gang_lookup() was always '1'.

The attached patch works well for me,
but I don't know much about cfq. Please confirm.

Regards,
-Kame

==
cics[]->key can be NULL.
In that case, cics[]->dead_key holds the key value.

Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED]

Index: linux-2.6.25-rc2/block/cfq-iosched.c
===
--- linux-2.6.25-rc2.orig/block/cfq-iosched.c
+++ linux-2.6.25-rc2/block/cfq-iosched.c
@@ -1171,7 +1171,11 @@ call_for_each_cic(struct io_context *ioc
 			break;
 
 		called += nr;
-		index = 1 + (unsigned long) cics[nr - 1]->key;
+
+		if (!cics[nr - 1]->key)
+			index = 1 + (unsigned long) cics[nr - 1]->dead_key;
+		else
+			index = 1 + (unsigned long) cics[nr - 1]->key;
 
 		for (i = 0; i < nr; i++)
 			func(ioc, cics[i]);
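
Editorial sketch (not part of the original mail): the index computation from
the patch, pulled out into a standalone userspace function. If the last entry
of a batch has a NULL key and no fallback is used, the next index is always 1
and the same batch is looked up again and again, which matches the softlockup
reported in this thread. struct cic_model and next_lookup_index() are
hypothetical names used only for illustration; only the two fields the patch
touches are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch reads. */
struct cic_model {
	void *key;	/* NULL once the entry has been marked dead */
	void *dead_key;	/* holds the old key value for dead entries */
};

/*
 * Index at which the next gang lookup should start: one past the key of
 * the last entry returned, falling back to dead_key when key is NULL.
 */
static unsigned long next_lookup_index(const struct cic_model *last)
{
	const void *k;

	assert(last != NULL);
	k = last->key ? last->key : last->dead_key;
	return 1 + (unsigned long)(uintptr_t)k;
}
```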

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread KAMEZAWA Hiroyuki
On Tue, 19 Feb 2008 09:36:34 +0100
Jens Axboe [EMAIL PROTECTED] wrote:

 On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
  On Sun, 17 Feb 2008 20:29:13 +0100
  Jens Axboe [EMAIL PROTECTED] wrote:
  
   It's odd stuff. Could you perhaps try and add some printks to
   block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
   from radix_tree_gang_lookup() and the pointer value of cics[i] in the
   for() loop after the lookup?
   
  I met the same issue on ia64/NUMA box.
  seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
  always '1'.
 
 Why does it keep repeating then? If ->key is NULL, the next lookup index
 should be 1UL.
 
When I inserted a printk here
==
for (i = 0; i < nr; i++)
        func(ioc, cics[i]);
printk("%d %lx\n", nr, index);
==
index was always 1 and nr was always 32.

So, cics[31]->key was always NULL when index=1 was passed to
radix_tree_gang_lookup().


 But I think the radix 'scan over entire tree' is a bit fragile. This
 patch adds a parallel hlist for ease of properly browsing the members,
 does that work for you? It compiles, but I haven't booted it here yet...
 
will try. please wait a bit.

Thanks,
-Kame



Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread KAMEZAWA Hiroyuki
On Tue, 19 Feb 2008 09:58:38 +0100
Jens Axboe [EMAIL PROTECTED] wrote:
  when I inserted printk here
  ==
  for (i = 0; i < nr; i++)
          func(ioc, cics[i]);
  printk("%d %lx\n", nr, index);
  ==
  index was always 1 and nr was always 32.
  
  So, cics[31]->key was always NULL when index=1 is passed to
  radix_tree_gang_lookup().
 
 Hang on, it returned 32? It should not return more than 16, since that
 is what we have room for and asked for. 
sorry. Of course, it was 16 ;(

your patch works well. thank you.

-Kame



Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc

2008-02-19 Thread KAMEZAWA Hiroyuki
On Tue, 19 Feb 2008 09:36:34 +0100
Jens Axboe [EMAIL PROTECTED] wrote:

 On Tue, Feb 19 2008, KAMEZAWA Hiroyuki wrote:
  On Sun, 17 Feb 2008 20:29:13 +0100
  Jens Axboe [EMAIL PROTECTED] wrote:
  
   It's odd stuff. Could you perhaps try and add some printks to
   block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return
   from radix_tree_gang_lookup() and the pointer value of cics[i] in the
   for() loop after the lookup?
   
  I met the same issue on ia64/NUMA box.
  seems cics[]->key is NULL and index for radix_tree_gang_lookup() was
  always '1'.
 
 Why does it keep repeating then? If ->key is NULL, the next lookup index
 should be 1UL.
 
 But I think the radix 'scan over entire tree' is a bit fragile. This
 patch adds a parallel hlist for ease of properly browsing the members,
 does that work for you? It compiles, but I haven't booted it here yet...
 
Works well for me and my box booted !

Thanks,
-Kame



Re: [RFC] hotplug memory remove - walk_memory_resource for ppc64

2007-10-31 Thread KAMEZAWA Hiroyuki
On Wed, 31 Oct 2007 08:02:40 -0800
Badari Pulavarty [EMAIL PROTECTED] wrote:
 Paul's concern is: since we didn't need it so far, why do we need this
 for hotplug memory remove to work? It might break the API for *unknown*
 applications. It's unfortunate that hotplug memory add updates
 /proc/iomem. We can deal with it later, as a separate patch.
 
I have no objection to skipping the /proc/iomem-related routine when the
arch doesn't need it.

My only advice is: please take care of both hot-add and hot-remove.

If the ppc64 people agree to use an arch-specific routine to detect
conventional memory, there is no problem, I think.

Thanks,
-Kame


Re: [PATCH 1/3] Add remove_memory() for ppc64

2007-10-31 Thread KAMEZAWA Hiroyuki
On Wed, 31 Oct 2007 14:55:03 -0700
Dave Hansen [EMAIL PROTECTED] wrote:

 On Wed, 2007-10-31 at 14:11 -0800, Badari Pulavarty wrote:
  
  Well, We don't need arch-specific remove_memory() for ia64 and ppc64.
  x86_64, I don't know. We will know, only when some one does the
  verification. I don't need arch_remove_memory() hook also at this
  time.
 
 I wasn't being very clear.  I say, add the arch hook only if you need
 it.  But, for now, just take the ia64 code and make it generic.  
 

remove_memory() has been arch-specific because there was no unplug code at
all before. And I didn't make it generic when I implemented the ia64 version.

Hmm... I have no objection to merging them. But let's see how memory hot-remove
for ppc64 works for a while. We can merge them later.

I'm glad to have new testers :)

Thanks,
-Kame



Re: [RFC] hotplug memory remove - walk_memory_resource for ppc64

2007-10-30 Thread KAMEZAWA Hiroyuki
On Wed, 31 Oct 2007 14:28:46 +0900
KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote:

 ioresource was a good structure for remembering which memory is conventional
 memory, and i386/x86_64/ia64 registered conventional memory as System RAM
 when I posted the patch. (That is, System RAM was not added for memory hotplug.)
 
If I remember correctly, System RAM is there for kdump (to know which memory
should be dumped). So memory hot-add/remove has to modify it anyway.

Thanks,
-Kame



Re: [RFC] PPC64 Exporting memory information through /proc/iomem

2007-10-03 Thread KAMEZAWA Hiroyuki
On Wed, 03 Oct 2007 08:35:35 -0700
Badari Pulavarty [EMAIL PROTECTED] wrote:

 On Wed, 2007-10-03 at 10:19 +0900, KAMEZAWA Hiroyuki wrote:
 CONFIG_ARCH_HAS_VALID_MEMORY_RANGE. Then define own
 find_next_system_ram() (rename to is_valid_memory_range()) - which
 checks the given range is a valid memory range for memory-remove
 or not. What do you think ?
 
My concern is...
Now, memory hot *add* makes use of resource (/proc/iomem) information for
onlining memory. (See add_memory()->register_memory_resource() in
mm/memory_hotplug.c.) So we'll have to consider changing that too, if needed.

Does PPC64 memory hot-add register new memory information in an arch-dependent
information list? It seems ppc64 takes hot-added memory information from the
*probe* file and registers it via add_memory()->register_memory_resource().

If you handle add/remove/walk of system RAM information in a sane way, I have
no objection.

I like find_next_system_ram() because I used some amount of time to debug it ;)
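
Editorial sketch (not part of the original mail): the kind of check being
discussed, "find the next System RAM range inside a requested range", modelled
in userspace C over a sorted table. This is only an illustration of the idea,
not the kernel's find_next_system_ram(). struct mem_range, next_system_ram()
and the inclusive-end convention are assumptions made here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical resource entry; 'end' is inclusive, as in /proc/iomem. */
struct mem_range {
	unsigned long start;
	unsigned long end;
	const char *name;
};

/*
 * Find the first "System RAM" entry overlapping [*start, *end] in a table
 * sorted by address. On success, clamp *start and *end to the overlap and
 * return 0. Return -1 when no System RAM overlaps the requested range.
 */
static int next_system_ram(const struct mem_range *tbl, size_t n,
			   unsigned long *start, unsigned long *end)
{
	size_t i;

	assert(tbl != NULL && start != NULL && end != NULL);
	for (i = 0; i < n; i++) {
		const struct mem_range *r = &tbl[i];

		if (strcmp(r->name, "System RAM") != 0)
			continue;	/* not conventional memory */
		if (r->end < *start)
			continue;	/* entirely below the request */
		if (r->start > *end)
			break;		/* sorted table: nothing further overlaps */
		if (r->start > *start)
			*start = r->start;
		if (r->end < *end)
			*end = r->end;
		return 0;
	}
	return -1;
}
```

A caller would repeat the lookup with *start set to one past the previous
result to walk every System RAM chunk in a larger range.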

Thanks,
-Kame


Re: [RFC] PPC64 Exporting memory information through /proc/iomem

2007-10-02 Thread KAMEZAWA Hiroyuki
On Tue, 02 Oct 2007 16:10:53 -0700
Badari Pulavarty [EMAIL PROTECTED] wrote:
   Otherwise, we need to add arch-specific hooks in hotplug-remove
   code to be able to do this.
  
  Isn't it just a matter of abstracting the test for a valid range of
  memory?  If it's really hard to abstract that, then I guess we can put
  RAM in iomem_resource, but I'd rather not.
  
 
 Sure. I will work on it and see how ugly it looks.
 
 KAME, are you okay with abstracting the find_next_system_ram() and
 let arch provide whatever implementation they want ? (since current
 code doesn't work for x86-64 also ?).
 
Hmm, is registering in /proc/iomem complicated? If it is too complicated,
adding a config like CONFIG_ARCH_SUPPORT_IORESOURCE_RAM or something would do.
Then you can define your own check_pages_isolated (you can rename this to
arch_check_pages_isolated()).


BTW, I should ask people how to describe conventional memory:

A. #define IORESOURCE_RAM   IORESOURCE_MEM  (ia64)
B. #define IORESOURCE_RAM   IORESOURCE_MEM | IORESOURCE_BUSY (i386, x86_64)

Sad to say, memory hot-add registers new memory just as IORESOURCE_MEM.
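
Editorial sketch (not part of the original mail): why the two definitions
matter for hot-added memory. If a lookup requires all of the "RAM" flag bits,
a region registered with only IORESOURCE_MEM matches definition A but not
definition B. The flag values and the mask-style comparison in flags_match()
are assumptions for illustration; the kernel's real comparison may differ.

```c
#include <assert.h>

/* Illustrative flag values (assumed here, not quoted from the mail). */
#define IORESOURCE_MEM	0x00000200UL
#define IORESOURCE_BUSY	0x80000000UL

/* Hypothetical helper: true when every bit in 'wanted' is set in 'res_flags'. */
static int flags_match(unsigned long res_flags, unsigned long wanted)
{
	assert(wanted != 0);
	return (res_flags & wanted) == wanted;
}
```

With hot-added memory flagged as IORESOURCE_MEM only, flags_match() succeeds
for definition A and fails for definition B, so an i386/x86_64-style lookup
would skip that memory unless hot-add also sets IORESOURCE_BUSY.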

Thanks,
-Kame
