Re: [PATCH 1/2] mm: add notifier in pageblock isolation for balloon drivers

2009-10-02 Thread Mel Gorman
On Thu, Oct 01, 2009 at 02:53:11PM -0500, Robert Jennings wrote:
 Memory balloon drivers can allocate a large amount of memory which
 is not movable but could be freed to accommodate memory hotplug remove.
 
 Prior to calling the memory hotplug notifier chain the memory in the
 pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
 isolation will not proceed, causing the memory removal for that page
 range to fail.
 
 Rather than immediately failing pageblock isolation if the
 migratetype is not MIGRATE_MOVABLE, this patch checks if all of the
 pages in the pageblock are owned by a registered balloon driver using a
 notifier chain.  If all of the non-movable pages are owned by a balloon,
 they can be freed later through the memory notifier chain and the range
 can still be isolated in set_migratetype_isolate().
 
 Signed-off-by: Robert Jennings r...@linux.vnet.ibm.com
 
 ---
  drivers/base/memory.c  |   19 +++
  include/linux/memory.h |   22 ++
  mm/page_alloc.c        |   49 +
  3 files changed, 82 insertions(+), 8 deletions(-)
 
 Index: b/drivers/base/memory.c
 ===
 --- a/drivers/base/memory.c
 +++ b/drivers/base/memory.c
 @@ -63,6 +63,20 @@ void unregister_memory_notifier(struct n
  }
  EXPORT_SYMBOL(unregister_memory_notifier);
  
 +static BLOCKING_NOTIFIER_HEAD(memory_isolate_chain);
 +
 +int register_memory_isolate_notifier(struct notifier_block *nb)
 +{
 + return blocking_notifier_chain_register(&memory_isolate_chain, nb);
 +}
 +EXPORT_SYMBOL(register_memory_isolate_notifier);
 +
 +void unregister_memory_isolate_notifier(struct notifier_block *nb)
 +{
 + blocking_notifier_chain_unregister(&memory_isolate_chain, nb);
 +}
 +EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 +
  /*
   * register_memory - Setup a sysfs device for a memory block
   */
 @@ -157,6 +171,11 @@ int memory_notify(unsigned long val, voi
   return blocking_notifier_call_chain(&memory_chain, val, v);
  }
  
 +int memory_isolate_notify(unsigned long val, void *v)
 +{
 + return blocking_notifier_call_chain(&memory_isolate_chain, val, v);
 +}
 +
  /*
   * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
   * OK to have direct references to sparsemem variables in here.
 Index: b/include/linux/memory.h
 ===
 --- a/include/linux/memory.h
 +++ b/include/linux/memory.h
 @@ -50,6 +50,14 @@ struct memory_notify {
   int status_change_nid;
  };
  
 +#define MEM_ISOLATE_COUNT	(10)
 +

This needs a comment explaining that this is an action to count the
number of pages that have been isolated within a range of pages, and
not a default value for nr_pages in the next structure.

 +struct memory_isolate_notify {
 + unsigned long start_addr;
 + unsigned int nr_pages;
 + unsigned int pages_found;
 +};

Is there any particular reason you used the virtual address of the mapped
page instead of the PFN? I am guessing at this point that the balloon driver
is based on addresses, but the code that populates the structure more
commonly deals with PFNs. Outside of debugging code, page_address() is
rarely used in mm/page_alloc.c.

It's picky but it feels more natural to me to have the structure have
start_pfn and nr_pages or start_addr and end_addr but not a mix of both.

 +
  struct notifier_block;
  struct mem_section;
  
 @@ -76,14 +84,28 @@ static inline int memory_notify(unsigned
  {
   return 0;
  }
 +static inline int register_memory_isolate_notifier(struct notifier_block *nb)
 +{
 + return 0;
 +}
 +static inline void unregister_memory_isolate_notifier(struct notifier_block *nb)
 +{
 +}
 +static inline int memory_isolate_notify(unsigned long val, void *v)
 +{
 + return 0;
 +}
  #else
  extern int register_memory_notifier(struct notifier_block *nb);
  extern void unregister_memory_notifier(struct notifier_block *nb);
 +extern int register_memory_isolate_notifier(struct notifier_block *nb);
 +extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
  extern int register_new_memory(int, struct mem_section *);
  extern int unregister_memory_section(struct mem_section *);
  extern int memory_dev_init(void);
  extern int remove_memory_block(unsigned long, struct mem_section *, int);
  extern int memory_notify(unsigned long val, void *v);
 +extern int memory_isolate_notify(unsigned long val, void *v);
  extern struct memory_block *find_memory_block(struct mem_section *);
  #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION << PAGE_SHIFT)
  enum mem_add_context { BOOT, HOTPLUG };
 Index: b/mm/page_alloc.c
 ===
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -48,6 +48,7 @@
   #include <linux/page_cgroup.h>
   #include <linux/debugobjects.h>
   #include <linux/kmemleak.h>
  +#include <linux/memory.h>

Is volatile always verboten for FSL QE structures?

2009-10-02 Thread Michael Barkowski
Just wondering - is there a case where using volatile for UCC parameter RAM for 
example will not work, or is the use of I/O accessors everywhere an attempt to 
be portable to other architectures?

I'm asking because I really want to know ;)

-- 
Michael Barkowski
905-482-4577


Re: Is volatile always verboten for FSL QE structures?

2009-10-02 Thread Timur Tabi
Michael Barkowski wrote:
 Just wondering - is there a case where using volatile for UCC parameter RAM 
 for example will not work, or is the use of I/O accessors everywhere an 
 attempt to be portable to other architectures?

'volatile' just doesn't really do what you think it should do.  The PowerPC 
architecture is too complicated w.r.t. ordering of reads and writes.  In other 
words, you can't trust it.

No one should be using 'volatile' to access I/O registers.
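
As a rough illustration of the accessor style being recommended here -- the
structure and field names below are invented for the example and are not the
real UCC parameter RAM layout:

#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical parameter RAM layout, for illustration only.  Note that
 * the struct itself carries no 'volatile'; the ordering guarantees come
 * from the accessors. */
struct example_ucc_pram {
	__be32	rbase;		/* RX BD ring base */
	__be32	tbase;		/* TX BD ring base */
	__be16	mrblr;		/* maximum RX buffer length */
};

static void example_init_pram(struct example_ucc_pram __iomem *pram)
{
	/* out_be32()/out_be16() include the byte swapping and ordering
	 * barriers that a plain store through a volatile pointer does not. */
	out_be32(&pram->rbase, 0x2000);
	out_be32(&pram->tbase, 0x2100);
	out_be16(&pram->mrblr, 1536);
}

static u16 example_read_mrblr(struct example_ucc_pram __iomem *pram)
{
	return in_be16(&pram->mrblr);
}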

-- 
Timur Tabi
Linux kernel developer at Freescale


Re: linux-next: tree build failure

2009-10-02 Thread Hollis Blanchard
On Wed, 2009-09-30 at 07:35 +0100, Jan Beulich wrote:
  Hollis Blanchard holl...@us.ibm.com 30.09.09 01:39 
 On Tue, 2009-09-29 at 10:28 +0100, Jan Beulich wrote:
   Hollis Blanchard  09/29/09 2:00 AM 
  First, I think there is a real bug here, and the code should read like
  this (to match the comment):
  /* type has to be known at build time for optimization */
  -BUILD_BUG_ON(__builtin_constant_p(type));
  +BUILD_BUG_ON(!__builtin_constant_p(type));
  
  However, I get the same build error *both* ways, i.e.
  __builtin_constant_p(type) evaluates to both 0 and 1? Either that, or
  the new BUILD_BUG_ON() macro isn't working...
  
  No, at this point of the compilation process it's neither zero nor one,
  it's simply considered non-constant by the compiler at that stage
  (this builtin is used for optimization, not during parsing, and the
  error gets generated when the body of the function gets parsed,
  not when code gets generated from it).
 
 I think I see what you're saying. Do you have a fix to suggest?
 
 The one Rusty suggested the other day may help here. I don't like it
 as a drop-in replacement for BUILD_BUG_ON() though (due to it
 deferring the error generated to the linking stage), I'd rather view
 this as an improvement to MAYBE_BUILD_BUG_ON() (which should
 then be used here).

Can you be more specific?

I have no idea what Rusty suggested where. I can't even guess what
MAYBE_BUILD_BUG_ON() is supposed to do (sounds like a terrible name).

All I know is that this used to build...
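
For what it's worth, here is a toy userspace illustration of the distinction
Jan is drawing; the macro and function names are made up and are not the
kernel's BUILD_BUG_ON family:

#include <stdio.h>

/*
 * Link-time assertion in the spirit of what was suggested: the call to
 * the undefined function survives only if the optimizer cannot prove
 * the condition false, so the failure shows up when linking.
 */
extern void hypothetical_build_error(void);
#define LINK_TIME_BUG_ON(cond) do { if (cond) hypothetical_build_error(); } while (0)

static inline int scale(int type)
{
	/*
	 * While this body is parsed, __builtin_constant_p(type) has not
	 * been folded; only after inlining and constant propagation does
	 * it become 1 for constant callers, letting the dead call to
	 * hypothetical_build_error() be removed.
	 */
	LINK_TIME_BUG_ON(!__builtin_constant_p(type));
	return type * 2;
}

int main(void)
{
	/*
	 * Builds and links cleanly with gcc -O2.  At -O0 (no inlining, no
	 * constant propagation) the builtin stays 0 and the link fails --
	 * which is exactly the "error deferred to the linking stage"
	 * behaviour being discussed.
	 */
	printf("%d\n", scale(21));
	return 0;
}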

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: Is volatile always verboten for FSL QE structures?

2009-10-02 Thread Kumar Gala


On Oct 2, 2009, at 9:46 AM, Timur Tabi wrote:


Michael Barkowski wrote:
Just wondering - is there a case where using volatile for UCC  
parameter RAM for example will not work, or is the use of I/O  
accessors everywhere an attempt to be portable to other  
architectures?


'volatile' just doesn't really do what you think it should do.  The  
PowerPC architecture is too complicated w.r.t. ordering of reads and  
writes.  In other words, you can't trust it.


No one should be using 'volatile' to access I/O registers.


See Documentation/volatile-considered-harmful.txt

- k


Re: Is volatile always verboten for FSL QE structures?

2009-10-02 Thread Michael Barkowski
Kumar Gala wrote:
 
 On Oct 2, 2009, at 9:46 AM, Timur Tabi wrote:
 
 Michael Barkowski wrote:
 Just wondering - is there a case where using volatile for UCC
 parameter RAM for example will not work, or is the use of I/O
 accessors everywhere an attempt to be portable to other architectures?

 'volatile' just doesn't really do what you think it should do.  The
 PowerPC architecture is too complicated w.r.t. ordering of reads and
 writes.  In other words, you can't trust it.

 No one should be using 'volatile' to access I/O registers.
 
 See Documentation/volatile-considered-harmful.txt
 

I'm happy to adopt your interpretation of it, and I appreciate the explanation.

from Documentation/volatile-considered-harmful.txt:
   - The above-mentioned accessor functions might use volatile on
 architectures where direct I/O memory access does work.  Essentially,
 each accessor call becomes a little critical section on its own and
 ensures that the access happens as expected by the programmer.

Part of it was that I wondered if this was one of those architectures.  I guess 
not.

-- 
Michael Barkowski
905-482-4577


Re: linux booting fails on ppc440x5 with SRAM

2009-10-02 Thread Johnny Hung
It seems like a RAM initialization problem. Try using ICE or your bootloader
to test the initialized RAM with write/read operations.
For example, use mtest in U-Boot to check memory. With ICE, there should be a
detailed memory test function, similar to a hardware diagnostic.

2009/9/24 Benjamin Herrenschmidt b...@kernel.crashing.org:
 On Wed, 2009-09-23 at 20:19 +0530, Vineeth _ wrote:
 I am trying to port Linux to the IBM PowerPC 440x5. I have a board
 with this processor, a 16MB SRAM at location 0x0, a UART and a
 flash. I have a simple bootloader which does the following:
     1. Initialize the processor (as part of this, we generate the TLBs
 for the UART, the 16MB flash and the 16MB SRAM)
     2. Initialize the UART
     3. Copy the simple-boot linux_image (binary file) from flash to
 location 0x40 of the SRAM.
     4. Enter the kernel at 0x40

 Not sure yet, looks like things are being overwritten during boot.
 Interestingly enough, the DEAR value looks like an MSR value..

 Hard to tell what's up. You'll have to dig a bit more yourself to
 figure out how come the kernel's getting that bad pointer.

 All I can tell you is that things work fine on 440x5 cores, though
 I haven't tested a configuration with so little memory in a while.

 Ben.

 zImage starting: loaded at 0x0040 (sp: 0x004deeb0)
 Allocating 0x1dad84 bytes for kernel ...
 gunzipping (0x - 0x0040c000:0x004dd3f1)...done 0x1c31cc bytes

 Linux/PowerPC load: console=ttyS0 root=/dev/ram
 Finalizing device tree... flat tree at 0x4eb300
 id mach(): done
 inside skybeam_register_console function
 MMU:enterMMU:hw initMMU:mapinMMU:setioMMU:exitinside
 _setup_arch-begininginside _setup_arch-1inside
 _setup_arch-2setup_arch: bootmemarch: exit7Top of RAM: 0x100,
 Total RAM: 0x100
 Zone PFN ranges:
  DMA      0x - 0x1000
  Normal   0x1000 - 0x1000
 Movable zone start PFN for each node
 early_node_map[1] active PFN ranges
    0: 0x - 0x1000
 MMU: Allocated 1088 bytes of context maps for 255 contexts
 Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 4064
 Kernel command line: console=ttyS0 root=/dev/ram
 Unable to handle kernel paging request for data at address 0x00021000
 Faulting instruction address: 0xc010a7c4
 Oops: Kernel access of bad area, sig: 11 [#1]
 PREEMPT PowerPC 44x Platform
 Modules linked in:
 NIP: c010a7c4 LR: c010dc50 CTR: 
 REGS: c01bfeb0 TRAP: 0300   Not tainted  (2.6.30)
 MSR: 00021000 ME,CE  CR: 2444  XER: 
 DEAR: 00021000, ESR: 
 TASK = c01a94b8[0] 'swapper' THREAD: c01be000
 GPR00: 1180 c01bff60 c01a94b8 00021000 0025 0008 c0104968 
 
 GPR08: 2f72616d c011 c0155938 c01a 2224  f104 
 
 GPR16:     fff8 08b8 c010d758 
 c0104968
 GPR24: 1198 1190 c018a001 c01c5498 08c0 1188 00021000 
 c01c42f0
 NIP [c010a7c4] strchr+0x0/0x3c
 LR [c010dc50] match_token+0x138/0x228
 Call Trace:
 [c01bff60] [c016b99c] 0xc016b99c (unreliable)
 [c01bffa0] [c0104a00] sort_extable+0x28/0x38
 [c01bffb0] [c01938ec] sort_main_extable+0x20/0x30
 [c01bffc0] [c018c734] start_kernel+0x140/0x288
 [c01bfff0] [c200] skpinv+0x190/0x1cc
 Instruction dump:
 7ca903a6 8804 38a5 38840001 2f80 9809 39290001 419e0010
 4200ffe4 98a9 4e800020 4e800020 8803 5484063e 7f802000 4d9e0020
 ---[ end trace 31fd0ba7d8756001 ]---
 Kernel panic - not syncing: Attempted to kill the idle task!
 Call Trace:
 [c01bfd90] [c0005d5c] show_stack+0x4c/0x16c (unreliable)
 [c01bfdd0] [c002f17c] panic+0xa0/0x168
 [c01bfe20] [c0032eb8] do_exit+0x61c/0x638
 [c01bfe60] [c000b60c] kernel_bad_stack+0x0/0x4c
 [c01bfe90] [c000f310] bad_page_fault+0x90/0xd8
 [c01bfea0] [c000e184] handle_page_fault+0x7c/0x80
 [c01bff60] [c016b99c] 0xc016b99c
 [c01bffa0] [c0104a00] sort_extable+0x28/0x38
 [c01bffb0] [c01938ec] sort_main_extable+0x20/0x30
 [c01bffc0] [c018c734] start_kernel+0x140/0x288
 [c01bfff0] [c200] skpinv+0x190/0x1cc
 Rebooting in 180 seconds..



Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite

2009-10-02 Thread Joakim Tjernlund

 Benjamin Herrenschmidt b...@kernel.crashing.org wrote on 01/10/2009 
 00:35:59:
 
 
Had a look at linus tree and there is something I don't understand.
Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
works and top of tree does not, how can that be?
   
To me it seems more likely that some mm change introduced between .31 
and
top of tree is the culprit.
My patch addresses the problem described in the comment:
/* On 8xx, cache control instructions (particularly
 * dcbst from flush_dcache_icache) fault as write
 * operation if there is an unpopulated TLB entry
 * for the address in question. To workaround that,
 * we invalidate the TLB here, thus avoiding dcbst
 * misbehaviour.
 */
Now you are using this old fix to paper over some other bug or so I 
think.
  
   There is something fishy with the TLB status, looking into the mpc862 
   manual I
   don't see how it can work reliably. Need to dwell some more.
   Anyhow, I have incorporated some of my findings into a new patch,
   perhaps this will be the golden one?
 
  Well, I still wonder about that whole unpopulated entry business.
 
  From what I can see, the TLB miss code will check _PAGE_PRESENT, and
  when not set, it will -still- insert something into the TLB (unlike
  all other CPU types that go straight to data access faults from there).
 
  So we end up populating with those unpopulated entries that will then
  cause us to take a DSI (or ISI) instead of a TLB miss the next time
  around ... and so we need to remove them once we do that, no ? IE. Once
  we have actually put a valid PTE in.
 
  At least that's my understanding and why I believe we should always have
  tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
  in putting it into the 2 filter functions in the new code.
 
  Well.. actually, the ptep_set_access_flags() will already do a
  flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
  really need here would be in set_pte_filter(), to do the same if we are
  writing a PTE that has _PAGE_PRESENT, at least on 8xx.
 
  But just unconditionally doing a tlbil_va() in both the filter functions
  shouldn't harm and also fix the problem, though Rex seems to indicate
  that is not the case.
 
  Now, we -might- have something else broken on 8xx, hard to tell. You may
  want to try to bisect, adding back the removed tlbil_va() as you go
  backward, between .31 and upstream...

 Found something odd, perhaps Rex can test?
 I tested this on my old 2.4 and it worked well.

That was not quite correct, sorry.
Try this instead:

From c5f1a70561730b8a443f7081fbd9c5b023147043 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund joakim.tjernl...@transmode.se
Date: Fri, 2 Oct 2009 14:59:21 +0200
Subject: [PATCH] powerpc, 8xx: DTLB Error must check for more errors.

DataTLBError currently does:
 if ((err & 0x0200) == 0)
DSI();
This won't handle a store with no valid translation.
Change this to
 if ((err & 0x4800) != 0)
DSI();
that is, branch to DSI if either !permission or
!translation.
---
 arch/powerpc/kernel/head_8xx.S |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..118bb05 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -472,8 +472,8 @@ DataTLBError:
/* First, make sure this was a store operation.
*/
mfspr   r10, SPRN_DSISR
-   andis.  r11, r10, 0x0200	/* If set, indicates store op */
-   beq 2f
+   andis.  r11, r10, 0x4800 /* no translation, no permission. */
+   bne 2f  /* branch if either is set */

/* The EA of a data TLB miss is automatically stored in the MD_EPN
 * register.  The EA of a data TLB error is automatically stored in
--
1.6.4.4



Re: Is volatile always verboten for FSL QE structures?

2009-10-02 Thread Guillaume Knispel
Michael Barkowski wrote:

 Kumar Gala wrote:
  
  On Oct 2, 2009, at 9:46 AM, Timur Tabi wrote:
  
  Michael Barkowski wrote:
  Just wondering - is there a case where using volatile for UCC
  parameter RAM for example will not work, or is the use of I/O
  accessors everywhere an attempt to be portable to other architectures?
 
  'volatile' just doesn't really do what you think it should do.  The
  PowerPC architecture is too complicated w.r.t. ordering of reads and
  writes.  In other words, you can't trust it.
 
  No one should be using 'volatile' to access I/O registers.
  
  See Documentation/volatile-considered-harmful.txt
  
 
 I'm happy to adopt your interpretation of it, and I appreciate the 
 explanation.
 
 from Documentation/volatile-considered-harmful.txt:
- The above-mentioned accessor functions might use volatile on
  architectures where direct I/O memory access does work.  Essentially,
  each accessor call becomes a little critical section on its own and
  ensures that the access happens as expected by the programmer.
 
 Part of it was that I wondered if this was one of those architectures.  I 
 guess not.

I guess this could only work on architectures having a totally ordered
memory model.  Definitely not the case for Power.


Re: [PATCH 1/2] mm: add notifier in pageblock isolation for balloon drivers

2009-10-02 Thread Robert Jennings
* Nathan Fontenot (nf...@austin.ibm.com) wrote:
 Robert Jennings wrote:
 Memory balloon drivers can allocate a large amount of memory which
 is not movable but could be freed to accommodate memory hotplug remove.

 Prior to calling the memory hotplug notifier chain the memory in the
 pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
 isolation will not proceed, causing the memory removal for that page
 range to fail.

 Rather than immediately failing pageblock isolation if the
 migratetype is not MIGRATE_MOVABLE, this patch checks if all of the
 pages in the pageblock are owned by a registered balloon driver using a
 notifier chain.  If all of the non-movable pages are owned by a balloon,
 they can be freed later through the memory notifier chain and the range
 can still be isolated in set_migratetype_isolate().

 Signed-off-by: Robert Jennings r...@linux.vnet.ibm.com

 ---
  drivers/base/memory.c  |   19 +++
  include/linux/memory.h |   22 ++
  mm/page_alloc.c        |   49 +
  3 files changed, 82 insertions(+), 8 deletions(-)

snip
 Index: b/mm/page_alloc.c
 ===
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -48,6 +48,7 @@
   #include <linux/page_cgroup.h>
   #include <linux/debugobjects.h>
   #include <linux/kmemleak.h>
  +#include <linux/memory.h>
   #include <trace/events/kmem.h>
    #include <asm/tlbflush.h>
 @@ -4985,23 +4986,55 @@ void set_pageblock_flags_group(struct pa
  int set_migratetype_isolate(struct page *page)
  {
  struct zone *zone;
 -unsigned long flags;
 +unsigned long flags, pfn, iter;
 +long immobile = 0;
 +struct memory_isolate_notify arg;
 +int notifier_ret;
  int ret = -EBUSY;
  int zone_idx;
  zone = page_zone(page);
  zone_idx = zone_idx(zone);
 +
 +pfn = page_to_pfn(page);
 +arg.start_addr = (unsigned long)page_address(page);
 +arg.nr_pages = pageblock_nr_pages;
 +arg.pages_found = 0;
 +
   spin_lock_irqsave(&zone->lock, flags);
  /*
   * In future, more migrate types will be able to be isolation target.
   */
  -if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE &&
 -zone_idx != ZONE_MOVABLE)
 -goto out;
 -set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 -move_freepages_block(zone, page, MIGRATE_ISOLATE);
 -ret = 0;
 -out:
 +do {
  +if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE &&
 +zone_idx == ZONE_MOVABLE) {
 +ret = 0;
 +break;
 +}
 +
 +/*
 + * If all of the pages in a zone are used by a balloon,
 + * the range can be still be isolated.  The balloon will
 + * free these pages from the memory notifier chain.
 + */
  +notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
 +notifier_ret = notifier_to_errno(ret);

 Should this be

   notifier_ret = notifier_to_errno(notifier_ret);

 -Nathan

I'll correct this.  Thanks



Re: [PATCH 1/2] mm: add notifier in pageblock isolation for balloon drivers

2009-10-02 Thread Robert Jennings
* Mel Gorman (m...@csn.ul.ie) wrote:
 On Thu, Oct 01, 2009 at 02:53:11PM -0500, Robert Jennings wrote:
  Memory balloon drivers can allocate a large amount of memory which
  is not movable but could be freed to accommodate memory hotplug remove.
  
  Prior to calling the memory hotplug notifier chain the memory in the
  pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
  isolation will not proceed, causing the memory removal for that page
  range to fail.
  
  Rather than immediately failing pageblock isolation if the
  migratetype is not MIGRATE_MOVABLE, this patch checks if all of the
  pages in the pageblock are owned by a registered balloon driver using a
  notifier chain.  If all of the non-movable pages are owned by a balloon,
  they can be freed later through the memory notifier chain and the range
  can still be isolated in set_migratetype_isolate().
  
  Signed-off-by: Robert Jennings r...@linux.vnet.ibm.com
  
  ---
   drivers/base/memory.c  |   19 +++
   include/linux/memory.h |   22 ++
   mm/page_alloc.c        |   49 +
   3 files changed, 82 insertions(+), 8 deletions(-)
  
  Index: b/drivers/base/memory.c
  ===
  --- a/drivers/base/memory.c
  +++ b/drivers/base/memory.c
  @@ -63,6 +63,20 @@ void unregister_memory_notifier(struct n
   }
   EXPORT_SYMBOL(unregister_memory_notifier);
   
  +static BLOCKING_NOTIFIER_HEAD(memory_isolate_chain);
  +
  +int register_memory_isolate_notifier(struct notifier_block *nb)
  +{
  +   return blocking_notifier_chain_register(&memory_isolate_chain, nb);
  +}
  +EXPORT_SYMBOL(register_memory_isolate_notifier);
  +
  +void unregister_memory_isolate_notifier(struct notifier_block *nb)
  +{
  +   blocking_notifier_chain_unregister(&memory_isolate_chain, nb);
  +}
  +EXPORT_SYMBOL(unregister_memory_isolate_notifier);
  +
   /*
* register_memory - Setup a sysfs device for a memory block
*/
  @@ -157,6 +171,11 @@ int memory_notify(unsigned long val, voi
  return blocking_notifier_call_chain(&memory_chain, val, v);
   }
   
  +int memory_isolate_notify(unsigned long val, void *v)
  +{
  +   return blocking_notifier_call_chain(&memory_isolate_chain, val, v);
  +}
  +
   /*
* MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
* OK to have direct references to sparsemem variables in here.
  Index: b/include/linux/memory.h
  ===
  --- a/include/linux/memory.h
  +++ b/include/linux/memory.h
  @@ -50,6 +50,14 @@ struct memory_notify {
  int status_change_nid;
   };
   
  +#define MEM_ISOLATE_COUNT  (10)
  +
 
 This needs a comment explaining that this is an action to count the
 number of pages that have been isolated within a range of pages, and
 not a default value for nr_pages in the next structure.

I'll provide a clear explanation for this.

  +struct memory_isolate_notify {
  +   unsigned long start_addr;
  +   unsigned int nr_pages;
  +   unsigned int pages_found;
  +};
 
 Is there any particular reason you used the virtual address of the mapped
 page instead of the PFN? I am guessing at this point that the balloon driver
 is based on addresses, but the code that populates the structure more
 commonly deals with PFNs. Outside of debugging code, page_address() is
 rarely used in mm/page_alloc.c.
 
 It's picky but it feels more natural to me to have the structure have
 start_pfn and nr_pages or start_addr and end_addr but not a mix of both.

Changing this to use start_pfn and nr_pages will also match
struct memory_notify.  Thanks for the review of this patch.

  +
   struct notifier_block;
   struct mem_section;
   
  @@ -76,14 +84,28 @@ static inline int memory_notify(unsigned
   {
  return 0;
   }
  +static inline int register_memory_isolate_notifier(struct notifier_block 
  *nb)
  +{
  +   return 0;
  +}
  +static inline void unregister_memory_isolate_notifier(struct 
  notifier_block *nb)
  +{
  +}
  +static inline int memory_isolate_notify(unsigned long val, void *v)
  +{
  +   return 0;
  +}
   #else
   extern int register_memory_notifier(struct notifier_block *nb);
   extern void unregister_memory_notifier(struct notifier_block *nb);
  +extern int register_memory_isolate_notifier(struct notifier_block *nb);
  +extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
   extern int register_new_memory(int, struct mem_section *);
   extern int unregister_memory_section(struct mem_section *);
   extern int memory_dev_init(void);
   extern int remove_memory_block(unsigned long, struct mem_section *, int);
   extern int memory_notify(unsigned long val, void *v);
  +extern int memory_isolate_notify(unsigned long val, void *v);
   extern struct memory_block *find_memory_block(struct mem_section *);
   #define CONFIG_MEM_BLOCK_SIZE  (PAGES_PER_SECTION << PAGE_SHIFT)
   

[PATCH 1/2][v2] mm: add notifier in pageblock isolation for balloon drivers

2009-10-02 Thread Robert Jennings
Memory balloon drivers can allocate a large amount of memory which
is not movable but could be freed to accommodate memory hotplug remove.

Prior to calling the memory hotplug notifier chain the memory in the
pageblock is isolated.  If the migrate type is not MIGRATE_MOVABLE the
isolation will not proceed, causing the memory removal for that page
range to fail.

Rather than failing pageblock isolation if the migratetype is not
MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock
are owned by a registered balloon driver (or other entity) using a
notifier chain.  If all of the non-movable pages are owned by a balloon,
they can be freed later through the memory notifier chain and the range
can still be isolated in set_migratetype_isolate().

Signed-off-by: Robert Jennings r...@linux.vnet.ibm.com

---
 drivers/base/memory.c  |   19 +++
 include/linux/memory.h |   26 ++
 mm/page_alloc.c        |   45 ++---
 3 files changed, 83 insertions(+), 7 deletions(-)

Index: b/drivers/base/memory.c
===
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -63,6 +63,20 @@ void unregister_memory_notifier(struct n
 }
 EXPORT_SYMBOL(unregister_memory_notifier);
 
+static BLOCKING_NOTIFIER_HEAD(memory_isolate_chain);
+
+int register_memory_isolate_notifier(struct notifier_block *nb)
+{
+   return blocking_notifier_chain_register(&memory_isolate_chain, nb);
+}
+EXPORT_SYMBOL(register_memory_isolate_notifier);
+
+void unregister_memory_isolate_notifier(struct notifier_block *nb)
+{
+   blocking_notifier_chain_unregister(&memory_isolate_chain, nb);
+}
+EXPORT_SYMBOL(unregister_memory_isolate_notifier);
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -157,6 +171,11 @@ int memory_notify(unsigned long val, voi
return blocking_notifier_call_chain(&memory_chain, val, v);
 }
 
+int memory_isolate_notify(unsigned long val, void *v)
+{
+   return blocking_notifier_call_chain(&memory_isolate_chain, val, v);
+}
+
 /*
  * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
  * OK to have direct references to sparsemem variables in here.
Index: b/include/linux/memory.h
===
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -50,6 +50,18 @@ struct memory_notify {
int status_change_nid;
 };
 
+/*
+ * During pageblock isolation, count the number of pages in the
+ * range [start_pfn, start_pfn + nr_pages)
+ */
+#define MEM_ISOLATE_COUNT  (10)
+
+struct memory_isolate_notify {
+   unsigned long start_pfn;
+   unsigned int nr_pages;
+   unsigned int pages_found;
+};
+
 struct notifier_block;
 struct mem_section;
 
@@ -76,14 +88,28 @@ static inline int memory_notify(unsigned
 {
return 0;
 }
+static inline int register_memory_isolate_notifier(struct notifier_block *nb)
+{
+   return 0;
+}
+static inline void unregister_memory_isolate_notifier(struct notifier_block *nb)
+{
+}
+static inline int memory_isolate_notify(unsigned long val, void *v)
+{
+   return 0;
+}
 #else
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
+extern int register_memory_isolate_notifier(struct notifier_block *nb);
+extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 extern int register_new_memory(int, struct mem_section *);
 extern int unregister_memory_section(struct mem_section *);
 extern int memory_dev_init(void);
 extern int remove_memory_block(unsigned long, struct mem_section *, int);
 extern int memory_notify(unsigned long val, void *v);
+extern int memory_isolate_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block(struct mem_section *);
 #define CONFIG_MEM_BLOCK_SIZE  (PAGES_PER_SECTION << PAGE_SHIFT)
 enum mem_add_context { BOOT, HOTPLUG };
Index: b/mm/page_alloc.c
===
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -48,6 +48,7 @@
 #include <linux/page_cgroup.h>
 #include <linux/debugobjects.h>
 #include <linux/kmemleak.h>
+#include <linux/memory.h>
 #include <trace/events/kmem.h>
 
 #include <asm/tlbflush.h>
@@ -4985,23 +4986,53 @@ void set_pageblock_flags_group(struct pa
 int set_migratetype_isolate(struct page *page)
 {
struct zone *zone;
-   unsigned long flags;
+   unsigned long flags, pfn, iter;
+   unsigned long immobile = 0;
+   struct memory_isolate_notify arg;
+   int notifier_ret;
int ret = -EBUSY;
int zone_idx;
 
zone = page_zone(page);
zone_idx = zone_idx(zone);
+
 spin_lock_irqsave(&zone->lock, flags);
+   if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+   zone_idx == ZONE_MOVABLE) {
+   ret = 0;
+   goto out;
+   }
+
+

[PATCH 2/2][v2] powerpc: Make the CMM memory hotplug aware

2009-10-02 Thread Robert Jennings

The Collaborative Memory Manager (CMM) module allocates individual pages
over time that are not migratable.  On a long running system this can
severely impact the ability to find enough pages to support a hotplug
memory remove operation.

This patch adds a memory isolation notifier and a memory hotplug notifier.
The memory isolation notifier will return the number of pages found
in the range specified.  This is used to determine if all of the used
pages in a pageblock are owned by the balloon (or other entities in
the notifier chain).  The hotplug notifier will free pages in the range
which is to be removed.  The priority of this hotplug notifier is low
so that it will be called near the end; this helps avoid removing loaned
pages in operations that fail due to other handlers.

CMM activity will be halted when hotplug remove operations are active
and resume activity after a delay period to allow the hypervisor time
to adjust.
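
As a rough sketch of the isolation-notifier side of this (simplified from the
patch body; the page-list walk and locking reuse the driver's existing
cmm_page_list and cmm_lock structures):

/* Count how many loaned pages fall inside the range the isolation
 * notifier asks about, so set_migratetype_isolate() can decide whether
 * the pageblock may still be isolated. */
static int cmm_count_pages(void *arg)
{
	struct memory_isolate_notify *marg = arg;
	struct cmm_page_array *pa;
	unsigned long start = (unsigned long)pfn_to_kaddr(marg->start_pfn);
	unsigned long end = start + (marg->nr_pages << PAGE_SHIFT);
	unsigned long idx;

	spin_lock(&cmm_lock);
	for (pa = cmm_page_list; pa; pa = pa->next) {
		for (idx = 0; idx < pa->index; idx++)
			if (pa->page[idx] >= start && pa->page[idx] < end)
				marg->pages_found++;
	}
	spin_unlock(&cmm_lock);

	return 0;
}

static int cmm_memory_isolate_cb(struct notifier_block *self,
				 unsigned long action, void *arg)
{
	int ret = 0;

	if (action == MEM_ISOLATE_COUNT)
		ret = cmm_count_pages(arg);

	return notifier_from_errno(ret);
}

static struct notifier_block cmm_mem_isolate_nb = {
	.notifier_call = cmm_memory_isolate_cb,
	.priority = CMM_MEM_ISOLATE_PRI
};

/* registered from the module init path:
 *	register_memory_isolate_notifier(&cmm_mem_isolate_nb);
 */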

Signed-off-by: Robert Jennings r...@linux.vnet.ibm.com

---
Minor update to cmm_count_pages() to account for changes in
struct memory_isolate_notify.
---
 arch/powerpc/platforms/pseries/cmm.c |  207 ++-
 1 file changed, 201 insertions(+), 6 deletions(-)

Index: b/arch/powerpc/platforms/pseries/cmm.c
===
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -38,19 +38,28 @@
 #include <asm/mmu.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
+#include <linux/memory.h>
 
 #include "plpar_wrappers.h"
 
 #define CMM_DRIVER_VERSION "1.0.0"
 #define CMM_DEFAULT_DELAY  1
+#define CMM_HOTPLUG_DELAY  5
 #define CMM_DEBUG  0
 #define CMM_DISABLE0
 #define CMM_OOM_KB 1024
 #define CMM_MIN_MEM_MB 256
 #define KB2PAGES(_p)   ((_p) >> (PAGE_SHIFT-10))
 #define PAGES2KB(_p)   ((_p) << (PAGE_SHIFT-10))
+/*
+ * The priority level tries to ensure that this notifier is called as
+ * late as possible to reduce thrashing in the shared memory pool.
+ */
+#define CMM_MEM_HOTPLUG_PRI1
+#define CMM_MEM_ISOLATE_PRI15
 
 static unsigned int delay = CMM_DEFAULT_DELAY;
+static unsigned int hotplug_delay = CMM_HOTPLUG_DELAY;
 static unsigned int oom_kb = CMM_OOM_KB;
 static unsigned int cmm_debug = CMM_DEBUG;
 static unsigned int cmm_disabled = CMM_DISABLE;
@@ -65,6 +74,10 @@ MODULE_VERSION(CMM_DRIVER_VERSION);
 module_param_named(delay, delay, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(delay, "Delay (in seconds) between polls to query hypervisor "
		 "paging requests. "
		 "[Default=" __stringify(CMM_DEFAULT_DELAY) "]");
+module_param_named(hotplug_delay, hotplug_delay, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(delay, "Delay (in seconds) after memory hotplug remove "
+		 "before activity resumes. "
+		 "[Default=" __stringify(CMM_HOTPLUG_DELAY) "]");
 module_param_named(oom_kb, oom_kb, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(oom_kb, "Amount of memory in kb to free on OOM. "
		 "[Default=" __stringify(CMM_OOM_KB) "]");
@@ -88,6 +101,8 @@ struct cmm_page_array {
 static unsigned long loaned_pages;
 static unsigned long loaned_pages_target;
 static unsigned long oom_freed_pages;
+static atomic_t hotplug_active = ATOMIC_INIT(0);
+static atomic_t hotplug_occurred = ATOMIC_INIT(0);
 
 static struct cmm_page_array *cmm_page_list;
 static DEFINE_SPINLOCK(cmm_lock);
@@ -110,6 +125,9 @@ static long cmm_alloc_pages(long nr)
 cmm_dbg("Begin request for %ld pages\n", nr);
 
while (nr) {
+   if (atomic_read(&hotplug_active))
+   break;
+
addr = __get_free_page(GFP_NOIO | __GFP_NOWARN |
   __GFP_NORETRY | __GFP_NOMEMALLOC);
if (!addr)
@@ -119,8 +137,10 @@ static long cmm_alloc_pages(long nr)
 if (!pa || pa->index >= CMM_NR_PAGES) {
/* Need a new page for the page list. */
 spin_unlock(&cmm_lock);
-   npa = (struct cmm_page_array *)__get_free_page(GFP_NOIO | __GFP_NOWARN |
-  __GFP_NORETRY | __GFP_NOMEMALLOC);
+   npa = (struct cmm_page_array *)__get_free_page(
+   GFP_NOIO | __GFP_NOWARN |
+   __GFP_NORETRY | __GFP_NOMEMALLOC |
+   __GFP_MOVABLE);
if (!npa) {
 pr_info("%s: Can not allocate new page list\n", __func__);
free_page(addr);
@@ -273,9 +293,23 @@ static int cmm_thread(void *dummy)
while (1) {
timeleft = msleep_interruptible(delay * 1000);
 
-   if (kthread_should_stop() || timeleft) {
-   loaned_pages_target = loaned_pages;
+   if 

Re: [patch 1/3] powerpc: Move 64bit VDSO to improve context switch performance

2009-10-02 Thread Andreas Schwab
Anton Blanchard an...@samba.org writes:

 On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO
 is position independent we can remove the hint and let get_unmapped_area pick
 an area.

This breaks gdb.  The section table in the VDSO image when mapped into
the process no longer contains meaningful values, and gdb rejects it.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite

2009-10-02 Thread Scott Wood
On Thu, Oct 01, 2009 at 08:35:59AM +1000, Benjamin Herrenschmidt wrote:
 From what I can see, the TLB miss code will check _PAGE_PRESENT, and
 when not set, it will -still- insert something into the TLB (unlike
 all other CPU types that go straight to data access faults from there).
 
 So we end up populating with those unpopulated entries that will then
 cause us to take a DSI (or ISI) instead of a TLB miss the next time
 around ... and so we need to remove them once we do that, no ? IE. Once
 we have actually put a valid PTE in.
 
 At least that's my understanding and why I believe we should always have
 tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
 in putting it into the 2 filter functions in the new code.
 
 Well.. actually, the ptep_set_access_flags() will already do a
 flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
 really need here would be in set_pte_filter(), to do the same if we are
 writing a PTE that has _PAGE_PRESENT, at least on 8xx.
 
 But just unconditionally doing a tlbil_va() in both the filter functions
 shouldn't harm and also fix the problem, though Rex seems to indicate
 that is not the case.

Adding a tlbil_va to do_page_fault makes the problem go away for me (on
top of your merge branch) -- none of the other changes in this thread
do (assuming I didn't miss any).  FWIW, when it gets stuck on a fault,
DSISR is 0xc000, and handle_mm_fault returns zero.

-Scott


Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite

2009-10-02 Thread Benjamin Herrenschmidt
On Fri, 2009-10-02 at 16:49 -0500, Scott Wood wrote:
 Adding a tlbil_va to do_page_fault makes the problem go away for me (on
 top of your merge branch) -- none of the other changes in this thread
 do (assuming I didn't miss any).  FWIW, when it gets stuck on a fault,
 DSISR is 0xc000, and handle_mm_fault returns zero. 

But in that case, it should hit ptep_set_access_flags() (via
handle_mm_fault) and from there call tlbil_va (provided we add a call to
it in the relevant filter function which I proposed initially), no ?

Ben.

