Re: [PATCH V2] powerpc/mm: Fix Multi hit ERAT cause by recent THP update

2016-02-07 Thread Kirill A. Shutemov
On Mon, Feb 08, 2016 at 11:44:22AM +0530, Aneesh Kumar K.V wrote:
> With ppc64 we use the deposited pgtable_t to store the hash pte slot
> information. We should not withdraw the deposited pgtable_t without
> marking the pmd none. This ensures that the low-level hash fault handling
> will skip this huge pte and that we will handle it at the upper levels.
> 
> A recent change to pmd splitting changed the above in order to handle the
> race between pmd split and exit_mmap. The race is explained below.
> 
> Consider following race:
> 
>   CPU0                                  CPU1
> shrink_page_list()
>   add_to_swap()
>     split_huge_page_to_list()
>       __split_huge_pmd_locked()
>         pmdp_huge_clear_flush_notify()
>         // pmd_none() == true
>                                         exit_mmap()
>                                           unmap_vmas()
>                                             zap_pmd_range()
>                                               // no action on pmd since pmd_none() == true
>         pmd_populate()
> 
> As a result, the THP will not be freed. The leak is detected by check_mm():
> 
>   BUG: Bad rss-counter state mm:880058d2e580 idx:1 val:512
> 
> The above requires us not to mark the pmd none during a pmd split.
> 
> The fix for ppc is to clear _PAGE_USER in the huge pte, so that the
> low-level fault handling code skips this pte. At the higher level we do
> take the ptl lock, which should serialize us against the pmd split. Once
> the lock is acquired we check the pmd again using pmd_same. That should
> always return false for us, and hence we should retry the access.

I guess it's worth mentioning that this serialization against ptl happens in
huge_pmd_set_accessed(), if I didn't miss anything.
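
A minimal userspace model of the scheme described above may help when reviewing it. It assumes a pmd is a single atomic word, uses hypothetical PAGE_USER_BIT/PAGE_HUGE_BIT values in place of _PAGE_USER and the huge marking, and a C11 atomic_flag spinlock in place of the ptl. split_mark_entry(), hash_fault_fastpath() and huge_fault_slowpath() are hypothetical names that only play the roles of pmdp_huge_splitting_flush(), the low-level hash fault path and huge_pmd_set_accessed(); none of this is the ppc64 code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only. */
#define PAGE_USER_BIT 0x1UL /* stands in for _PAGE_USER */
#define PAGE_HUGE_BIT 0x2UL /* stands in for the "trans huge" marking */

typedef _Atomic unsigned long pmd_word; /* stands in for pmd_t */

static atomic_flag ptl = ATOMIC_FLAG_INIT; /* stands in for the ptl */

static void ptl_lock(void)
{
	while (atomic_flag_test_and_set(&ptl)) {
		/* spin */
	}
}

static void ptl_unlock(void)
{
	atomic_flag_clear(&ptl);
}

/* Atomically clear @clr and set @set, returning the old value. */
static unsigned long pmd_update_bits(pmd_word *pmd, unsigned long clr,
				     unsigned long set)
{
	unsigned long old = atomic_load(pmd);

	while (!atomic_compare_exchange_weak(pmd, &old, (old & ~clr) | set)) {
		/* @old was refreshed by the failed exchange; try again */
	}
	return old;
}

/* Split side: runs under the lock and keeps the entry marked huge (so a
 * concurrent zap does not see pmd_none()), but drops the user bit. */
static void split_mark_entry(pmd_word *pmd)
{
	ptl_lock();
	pmd_update_bits(pmd, PAGE_USER_BIT, 0);
	/* ... the deposited page table would be withdrawn here ... */
	ptl_unlock();
}

/* Lockless low-level path: only handles entries with the user bit set. */
static bool hash_fault_fastpath(pmd_word *pmd)
{
	return (atomic_load(pmd) & PAGE_USER_BIT) != 0;
}

enum fault_result { FAULT_HANDLED, FAULT_RETRY };

/* Higher-level path: take the lock and compare against the value seen
 * before locking, the way pmd_same() is used; a mismatch means retry. */
static enum fault_result huge_fault_slowpath(pmd_word *pmd,
					     unsigned long snapshot)
{
	enum fault_result ret;

	ptl_lock();
	ret = atomic_load(pmd) == snapshot ? FAULT_HANDLED : FAULT_RETRY;
	ptl_unlock();
	return ret;
}
```

Taking a snapshot of the entry and then calling split_mark_entry() makes hash_fault_fastpath() return false and huge_fault_slowpath() return FAULT_RETRY, which is the "pmd_same() fails, retry the access" outcome described in the patch.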

> 
> Also make sure we wait for irq-disabled sections on other cpus to finish
> before replacing a huge pte entry with a regular pmd entry. Code paths
> like find_linux_pte_or_hugepte depend on irqs being disabled to get
> a stable pte_t pointer. A parallel thp split needs to make sure we
> don't convert a pmd pte to a regular pmd entry without waiting for the
> irq-disabled section to finish.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  4
>  arch/powerpc/mm/pgtable_64.c                 | 35 +++-
>  include/asm-generic/pgtable.h                |  8 +++
>  mm/huge_memory.c                             |  1 +
>  4 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 8d1c41d28318..0415856941e0 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -281,6 +281,10 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
>  extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>   pmd_t *pmdp);
>  
> +#define __HAVE_ARCH_PMDP_HUGE_SPLITTING_FLUSH
> +extern void pmdp_huge_splitting_flush(struct vm_area_struct *vma,
> +   unsigned long address, pmd_t *pmdp);
> +
>  #define pmd_move_must_withdraw pmd_move_must_withdraw
>  struct spinlock;
>  static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 3124a20d0fab..e8214b7f2210 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -646,6 +646,30 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>   return pgtable;
>  }
>  
> +void pmdp_huge_splitting_flush(struct vm_area_struct *vma,
> +unsigned long address, pmd_t *pmdp)
> +{
> + VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> +
> +#ifdef CONFIG_DEBUG_VM
> + BUG_ON(REGION_ID(address) != USER_REGION_ID);
> +#endif
> + /*
> +  * We can't mark the pmd none here, because that will cause a race
> +  * against exit_mmap. We need to keep the pmd marked TRANS HUGE while
> +  * we split, but at the same time we want the rest of the ppc64 code
> +  * not to insert a hash pte for it, because we will be modifying
> +  * the deposited pgtable in the caller of this function. Hence
> +  * clear _PAGE_USER so that we move the fault handling to the
> +  * higher level function, which will serialize against ptl.
> +  * We need to flush existing hash pte entries here, even though
> +  * the translation is still valid, because we will withdraw
> +  * pgtable_t after this.
> +  */
> + pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_USER, 0);
> +}
> +
> +
>  /*
>   * set a new huge pmd. We should not be called for updating
>   * an existing pmd entry. That should go via pmd_hugepage_update.
> @@ -663,10 +687,19 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>   return set_pte_at(mm, addr, 

Re: [PATCH v10 8/8] numa, mm, cleanup: remove redundant NODE_DATA macro from asm header files.

2016-02-07 Thread kbuild test robot
Hi Ganapatrao,

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.5-rc2 next-20160205]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ganapatrao-Kulkarni/arm64-numa-adding-numa-support-for-arm64-platforms/20160202-181522
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/core
config: i386-randconfig-sb0-02030124 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/gfp.h:5:0,
from include/linux/slab.h:14,
from include/linux/crypto.h:24,
from arch/x86/kernel/asm-offsets.c:8:
   arch/x86/include/asm/mmzone_32.h: In function 'pfn_valid':
>> include/linux/mmzone.h:704:41: error: implicit declaration of function 'NODE_DATA' [-Werror=implicit-function-declaration]
#define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid))
^
>> arch/x86/include/asm/mmzone_32.h:42:17: note: in expansion of macro 'node_end_pfn'
  return (pfn < node_end_pfn(nid));
^
>> include/linux/mmzone.h:704:41: warning: passing argument 1 of 'pgdat_end_pfn' makes pointer from integer without a cast [-Wint-conversion]
#define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid))
^
>> arch/x86/include/asm/mmzone_32.h:42:17: note: in expansion of macro 'node_end_pfn'
  return (pfn < node_end_pfn(nid));
^
   include/linux/mmzone.h:706:29: note: expected 'pg_data_t * {aka struct pglist_data *}' but argument is of type 'int'
static inline unsigned long pgdat_end_pfn(pg_data_t *pgdat)
^
   cc1: some warnings being treated as errors
   make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/NODE_DATA +704 include/linux/mmzone.h

d41dee369 Andy Whitcroft    2005-06-23  698  #else
d41dee369 Andy Whitcroft    2005-06-23  699  #define pgdat_page_nr(pgdat, pagenr)   pfn_to_page((pgdat)->node_start_pfn + (pagenr))
d41dee369 Andy Whitcroft    2005-06-23  700  #endif
408fde81c Dave Hansen       2005-06-23  701  #define nid_page_nr(nid, pagenr)   pgdat_page_nr(NODE_DATA(nid),(pagenr))
^1da177e4 Linus Torvalds    2005-04-16  702  
c6830c226 KAMEZAWA Hiroyuki 2011-06-16  703  #define node_start_pfn(nid)   (NODE_DATA(nid)->node_start_pfn)
da3649e13 Cody P Schafer    2013-02-22 @704  #define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid))
c6830c226 KAMEZAWA Hiroyuki 2011-06-16  705  
da3649e13 Cody P Schafer    2013-02-22  706  static inline unsigned long pgdat_end_pfn(pg_data_t *pgdat)
da3649e13 Cody P Schafer    2013-02-22  707  {

:: The code at line 704 was first introduced by commit
:: da3649e133948d8b7d8c57b05a33faf62ac2cc7e mmzone: add pgdat_{end_pfn,is_empty}() helpers & consolidate.

:: TO: Cody P Schafer 
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v5 06/23] powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together

2016-02-07 Thread kbuild test robot
Hi Christophe,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.5-rc2 next-20160205]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-8xx-Use-large-pages-for-RAM-and-IMMR-and-other-improvments/20160204-071322
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-ppc64e_defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

>> arch/powerpc/mm/fsl_booke_mmu.c:78:13: error: redefinition of 'v_block_mapped'
phys_addr_t v_block_mapped(unsigned long va)
^
   In file included from arch/powerpc/mm/fsl_booke_mmu.c:57:0:
   arch/powerpc/mm/mmu_decl.h:168:27: note: previous definition of 'v_block_mapped' was here
static inline phys_addr_t v_block_mapped(unsigned long va) { return 0; }
  ^
>> arch/powerpc/mm/fsl_booke_mmu.c:90:15: error: redefinition of 'p_block_mapped'
unsigned long p_block_mapped(phys_addr_t pa)
  ^
   In file included from arch/powerpc/mm/fsl_booke_mmu.c:57:0:
   arch/powerpc/mm/mmu_decl.h:169:29: note: previous definition of 'p_block_mapped' was here
static inline unsigned long p_block_mapped(phys_addr_t pa) { return 0; }
^

vim +/v_block_mapped +78 arch/powerpc/mm/fsl_booke_mmu.c

72  return tlbcam_addrs[idx].limit - tlbcam_addrs[idx].start + 1;
73  }
74  
75  /*
76   * Return PA for this VA if it is mapped by a CAM, or 0
77   */
  > 78  phys_addr_t v_block_mapped(unsigned long va)
79  {
80  int b;
81  for (b = 0; b < tlbcam_index; ++b)
82  if (va >= tlbcam_addrs[b].start && va < tlbcam_addrs[b].limit)
83  return tlbcam_addrs[b].phys + (va - tlbcam_addrs[b].start);
84  return 0;
85  }
86  
87  /*
88   * Return VA for a given PA or 0 if not mapped
89   */
  > 90  unsigned long p_block_mapped(phys_addr_t pa)
91  {
92  int b;
93  for (b = 0; b < tlbcam_index; ++b)
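
Both errors say that, with this ppc64e config, mmu_decl.h supplies static inline stubs of v_block_mapped() and p_block_mapped() while fsl_booke_mmu.c, which is also built, defines the real functions. A common way to avoid this class of error, sketched here and not necessarily the fix the series ended up using, is to choose between the prototype and the stub with the same condition that controls whether the real definitions are compiled. DEMO_HAVE_CAM_MAPPINGS is a hypothetical symbol, and phys_addr_t is simplified to unsigned long.

```c
typedef unsigned long phys_addr_t; /* simplified stand-in for the sketch */

/*
 * DEMO_HAVE_CAM_MAPPINGS is hypothetical: it plays the role of whatever
 * config symbol decides that the real definitions (e.g. the CAM lookup
 * in fsl_booke_mmu.c) are built. Because the header keys off the same
 * symbol, the stub and the real definition never meet in one
 * translation unit.
 */
#ifdef DEMO_HAVE_CAM_MAPPINGS
phys_addr_t v_block_mapped(unsigned long va);
unsigned long p_block_mapped(phys_addr_t pa);
#else
/* No block mappings in this configuration: nothing is ever mapped. */
static inline phys_addr_t v_block_mapped(unsigned long va)
{
	(void)va;
	return 0;
}

static inline unsigned long p_block_mapped(phys_addr_t pa)
{
	(void)pa;
	return 0;
}
#endif
```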


[PATCH 4/4] powerpc/ps3: gelic_udbg: use struct udphdr from

2016-02-07 Thread Luis Henriques
Instead of defining a local version of struct udphdr, use the standard
definition from .

The 'src' field is named 'source', and 'checksum' is named 'check', in the  definition.
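
Since the patch is meant to be a pure rename, the old and standard layouts should be identical. A userspace sketch of a compile-time check, assuming the standard header's layout of four 16-bit fields (source, dest, len, check); udphdr_mirror is a hypothetical stand-in with the kernel's __be16/__sum16 types replaced by uint16_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The local definition removed by this patch. */
struct gelic_udphdr_old {
	uint16_t src;
	uint16_t dest;
	uint16_t len;
	uint16_t checksum;
} __attribute__((packed));

/* Hypothetical userspace mirror of the standard UDP header layout. */
struct udphdr_mirror {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Same size and same field offsets: only the names change. */
static_assert(sizeof(struct gelic_udphdr_old) == sizeof(struct udphdr_mirror),
	      "UDP header size differs");
static_assert(offsetof(struct gelic_udphdr_old, src) ==
	      offsetof(struct udphdr_mirror, source), "src/source moved");
static_assert(offsetof(struct gelic_udphdr_old, len) ==
	      offsetof(struct udphdr_mirror, len), "len moved");
static_assert(offsetof(struct gelic_udphdr_old, checksum) ==
	      offsetof(struct udphdr_mirror, check), "checksum/check moved");
```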

Signed-off-by: Luis Henriques 
---
 arch/powerpc/platforms/ps3/gelic_udbg.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/gelic_udbg.c b/arch/powerpc/platforms/ps3/gelic_udbg.c
index 01d274fcbe51..b8f90a8465b9 100644
--- a/arch/powerpc/platforms/ps3/gelic_udbg.c
+++ b/arch/powerpc/platforms/ps3/gelic_udbg.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -60,13 +61,6 @@ struct debug_block {
u8 pkt[1520];
 } __packed;
 
-struct udphdr {
-   u16 src;
-   u16 dest;
-   u16 len;
-   u16 checksum;
-} __packed;
-
 static __iomem struct ethhdr *h_eth;
 static __iomem struct vlan_hdr *h_vlan;
 static __iomem struct iphdr *h_ip;
@@ -185,7 +179,7 @@ static void gelic_debug_init(void)
 
header_size += sizeof(struct udphdr);
h_udp = (struct udphdr *)(h_ip + 1);
-   h_udp->src = GELIC_DEBUG_PORT;
+   h_udp->source = GELIC_DEBUG_PORT;
h_udp->dest = GELIC_DEBUG_PORT;
 
pmsgc = pmsg = (char *)(h_udp + 1);

[PATCH 0/4] powerpc/ps3: gelic_udbg: drop local versions of network data structs

2016-02-07 Thread Luis Henriques
Several network-related data structures are defined in gelic_udbg.
These could easily be dropped, and the standard ones defined in the network
headers used instead.

The 4 patches that follow replace ethernet, vlan, ip and udp
structures in gelic_udbg.  Note that this has been compile-tested
only.

Luis Henriques (4):
  powerpc/ps3: gelic_udbg: use struct ethhdr from 
  powerpc/ps3: gelic_udbg: use struct vlan_hdr from 
  powerpc/ps3: gelic_udbg: use struct iphdr from 
  powerpc/ps3: gelic_udbg: use struct udphdr from 

 arch/powerpc/platforms/ps3/gelic_udbg.c | 71 +++--
 1 file changed, 23 insertions(+), 48 deletions(-)


[PATCH 2/4] powerpc/ps3: gelic_udbg: use struct vlan_hdr from

2016-02-07 Thread Luis Henriques
Instead of defining the local struct vlantag, use the standard definition
of vlan_hdr from .

The fields in the  definition have different names:
 - vlan -> h_vlan_TCI
 - subtype -> h_vlan_encapsulated_proto
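
A sketch of filling the tag through the new field names and checking the resulting wire bytes, assuming a userspace mirror of the two 16-bit vlan_hdr fields (vlan_hdr_mirror and build_vlan_tag() are hypothetical). htons() is an addition of this sketch to make it host-endian neutral; on a big-endian host such as the PS3 it does nothing, which is why assigning the raw values, as the patch does, yields the same bytes there.

```c
#include <arpa/inet.h> /* htons() */
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the 802.1Q tag; the kernel's __be16
 * fields are modelled with uint16_t. */
struct vlan_hdr_mirror {
	uint16_t h_vlan_TCI;                /* was 'vlan' in struct vlantag */
	uint16_t h_vlan_encapsulated_proto; /* was 'subtype' */
};

/* Fill a tag using the new field names and copy it out as wire bytes. */
static void build_vlan_tag(uint16_t vlan_id, uint8_t out[4])
{
	struct vlan_hdr_mirror tag;

	tag.h_vlan_TCI = htons(vlan_id);
	tag.h_vlan_encapsulated_proto = htons(0x0800); /* IPv4 payload */
	memcpy(out, &tag, sizeof(tag));
}
```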

Signed-off-by: Luis Henriques 
---
 arch/powerpc/platforms/ps3/gelic_udbg.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/gelic_udbg.c b/arch/powerpc/platforms/ps3/gelic_udbg.c
index ac87811e8b4e..4d6e827edfde 100644
--- a/arch/powerpc/platforms/ps3/gelic_udbg.c
+++ b/arch/powerpc/platforms/ps3/gelic_udbg.c
@@ -14,6 +14,7 @@
  */
 
 #include 
+#include 
 
 #include 
 #include 
@@ -58,11 +59,6 @@ struct debug_block {
u8 pkt[1520];
 } __packed;
 
-struct vlantag {
-   u16 vlan;
-   u16 subtype;
-} __packed;
-
 struct iphdr {
u8 ver_len;
u8 dscp_ecn;
@@ -84,7 +80,7 @@ struct udphdr {
 } __packed;
 
 static __iomem struct ethhdr *h_eth;
-static __iomem struct vlantag *h_vlan;
+static __iomem struct vlan_hdr *h_vlan;
 static __iomem struct iphdr *h_ip;
 static __iomem struct udphdr *h_udp;
 
@@ -181,10 +177,10 @@ static void gelic_debug_init(void)
if (!result) {
h_eth->h_proto= 0x8100;
 
-   header_size += sizeof(struct vlantag);
-   h_vlan = (struct vlantag *)(h_eth + 1);
-   h_vlan->vlan = vlan_id;
-   h_vlan->subtype = 0x0800;
+   header_size += sizeof(struct vlan_hdr);
+   h_vlan = (struct vlan_hdr *)(h_eth + 1);
+   h_vlan->h_vlan_TCI = vlan_id;
+   h_vlan->h_vlan_encapsulated_proto = 0x0800;
h_ip = (struct iphdr *)(h_vlan + 1);
} else {
h_eth->h_proto= 0x0800;

[PATCH 3/4] powerpc/ps3: gelic_udbg: use struct iphdr from

2016-02-07 Thread Luis Henriques
Instead of defining a local version of struct iphdr, use the standard
definition from .

Several fields in the  definition have different names:
 - proto -> protocol
 - src -> saddr
 - dest -> daddr
 - total_length -> tot_len
 - checksum -> check

Also, 'ver_len' is split into the 'version' and 'ihl' fields in .
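
The old code stored 0x45 in 'ver_len'; with struct iphdr the same byte is written as version = 4 and ihl = 5. The kernel header orders the two 4-bit bitfields according to __BIG_ENDIAN_BITFIELD/__LITTLE_ENDIAN_BITFIELD so that version lands in the high nibble. The arithmetic, as a small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Pack version (high nibble) and ihl (low nibble) into one byte. */
static uint8_t ip_ver_len(uint8_t version, uint8_t ihl)
{
	return (uint8_t)(((version & 0xf) << 4) | (ihl & 0xf));
}

/* Unpack the two nibbles again. */
static uint8_t ip_version(uint8_t ver_len)
{
	return ver_len >> 4;
}

/* ihl counts 32-bit words, so 5 means a 20-byte header without options. */
static uint8_t ip_ihl(uint8_t ver_len)
{
	return ver_len & 0xf;
}
```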

Signed-off-by: Luis Henriques 
---
 arch/powerpc/platforms/ps3/gelic_udbg.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/gelic_udbg.c b/arch/powerpc/platforms/ps3/gelic_udbg.c
index 4d6e827edfde..01d274fcbe51 100644
--- a/arch/powerpc/platforms/ps3/gelic_udbg.c
+++ b/arch/powerpc/platforms/ps3/gelic_udbg.c
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -59,19 +60,6 @@ struct debug_block {
u8 pkt[1520];
 } __packed;
 
-struct iphdr {
-   u8 ver_len;
-   u8 dscp_ecn;
-   u16 total_length;
-   u16 ident;
-   u16 frag_off_flags;
-   u8 ttl;
-   u8 proto;
-   u16 checksum;
-   u32 src;
-   u32 dest;
-} __packed;
-
 struct udphdr {
u16 src;
u16 dest;
@@ -188,11 +176,12 @@ static void gelic_debug_init(void)
}
 
header_size += sizeof(struct iphdr);
-   h_ip->ver_len = 0x45;
+   h_ip->version = 4;
+   h_ip->ihl = 5;
h_ip->ttl = 10;
-   h_ip->proto = 0x11;
-   h_ip->src = 0x;
-   h_ip->dest = 0x;
+   h_ip->protocol = 0x11;
+   h_ip->saddr = 0x;
+   h_ip->daddr = 0x;
 
header_size += sizeof(struct udphdr);
h_udp = (struct udphdr *)(h_ip + 1);
@@ -217,16 +206,16 @@ static void gelic_sendbuf(int msgsize)
int i;
 
dbg.descr.buf_size = header_size + msgsize;
-   h_ip->total_length = msgsize + sizeof(struct udphdr) +
+   h_ip->tot_len = msgsize + sizeof(struct udphdr) +
 sizeof(struct iphdr);
h_udp->len = msgsize + sizeof(struct udphdr);
 
-   h_ip->checksum = 0;
+   h_ip->check = 0;
sum = 0;
p = (u16 *)h_ip;
for (i = 0; i < 5; i++)
sum += *p++;
-   h_ip->checksum = ~(sum + (sum >> 16));
+   h_ip->check = ~(sum + (sum >> 16));
 
dbg.descr.dmac_cmd_status = GELIC_DESCR_DMA_CMD_NO_CHKSUM |
GELIC_DESCR_TX_DMA_FRAME_TAIL;

[PATCH 1/4] powerpc/ps3: gelic_udbg: use struct ethhdr from

2016-02-07 Thread Luis Henriques
Instead of defining a local version of struct ethhdr, use the standard
definition from .

The fields in the  definition have different names:
 - dest -> h_dest
 - src -> h_source
 - type -> h_proto
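
As with the other headers in this series, the swap only works if both definitions describe the same 14-byte frame header. A userspace sketch of that check, assuming a mirror of the standard packed layout (ethhdr_mirror and DEMO_ETH_ALEN are hypothetical names; the kernel uses ETH_ALEN and __be16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6 /* same value as the kernel's ETH_ALEN */

/* Hypothetical userspace mirror of the standard Ethernet header. */
struct ethhdr_mirror {
	unsigned char h_dest[DEMO_ETH_ALEN];   /* was 'dest' */
	unsigned char h_source[DEMO_ETH_ALEN]; /* was 'src' */
	uint16_t h_proto;                      /* was 'type' */
} __attribute__((packed));

/* The old struct was 6 + 6 + 2 packed bytes; the mirror must match. */
static_assert(sizeof(struct ethhdr_mirror) == 14, "Ethernet header is 14 bytes");
static_assert(offsetof(struct ethhdr_mirror, h_source) == 6, "h_source offset");
static_assert(offsetof(struct ethhdr_mirror, h_proto) == 12, "h_proto offset");
```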

Signed-off-by: Luis Henriques 
---
 arch/powerpc/platforms/ps3/gelic_udbg.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/gelic_udbg.c b/arch/powerpc/platforms/ps3/gelic_udbg.c
index 20b46a19a48f..ac87811e8b4e 100644
--- a/arch/powerpc/platforms/ps3/gelic_udbg.c
+++ b/arch/powerpc/platforms/ps3/gelic_udbg.c
@@ -13,6 +13,8 @@
  *
  */
 
+#include 
+
 #include 
 #include 
 #include 
@@ -56,12 +58,6 @@ struct debug_block {
u8 pkt[1520];
 } __packed;
 
-struct ethhdr {
-   u8 dest[6];
-   u8 src[6];
-   u16 type;
-} __packed;
-
 struct vlantag {
u16 vlan;
u16 subtype;
@@ -173,8 +169,8 @@ static void gelic_debug_init(void)
 
h_eth = (struct ethhdr *)dbg.pkt;
 
-   memset(_eth->dest, 0xff, 6);
-   memcpy(_eth->src, , 6);
+   memset(_eth->h_dest, 0xff, 6);
+   memcpy(_eth->h_source, , 6);
 
header_size = sizeof(struct ethhdr);
 
@@ -183,7 +179,7 @@ static void gelic_debug_init(void)
 GELIC_LV1_VLAN_TX_ETHERNET_0, 0, 0,
 _id, );
if (!result) {
-   h_eth->type = 0x8100;
+   h_eth->h_proto= 0x8100;
 
header_size += sizeof(struct vlantag);
h_vlan = (struct vlantag *)(h_eth + 1);
@@ -191,7 +187,7 @@ static void gelic_debug_init(void)
h_vlan->subtype = 0x0800;
h_ip = (struct iphdr *)(h_vlan + 1);
} else {
-   h_eth->type = 0x0800;
+   h_eth->h_proto= 0x0800;
h_ip = (struct iphdr *)(h_eth + 1);
}
 

Re: [PATCH 1/4] powerpc/ps3: gelic_udbg: use struct ethhdr from

2016-02-07 Thread Joe Perches
On Sun, 2016-02-07 at 17:38 +, Luis Henriques wrote:
> Instead of defining a local version of struct ethhdr use the standard
> definition from .

trivia:

> diff --git a/arch/powerpc/platforms/ps3/gelic_udbg.c b/arch/powerpc/platforms/ps3/gelic_udbg.c
[]
> @@ -173,8 +169,8 @@ static void gelic_debug_init(void)
>  
>   h_eth = (struct ethhdr *)dbg.pkt;
>  
> - memset(_eth->dest, 0xff, 6);
> - memcpy(_eth->src, , 6);
> + memset(_eth->h_dest, 0xff, 6);
> + memcpy(_eth->h_source, , 6);

Be nice to use ETH_ALEN and eth_broadcast_addr.
Maybe ether_addr_copy too.

> @@ -183,7 +179,7 @@ static void gelic_debug_init(void)
>    GELIC_LV1_VLAN_TX_ETHERNET_0, 0, 0,
>    _id, );
>   if (!result) {
> - h_eth->type = 0x8100;
> + h_eth->h_proto= 0x8100;

ETH_P_8021Q

>   header_size += sizeof(struct vlantag);
>   h_vlan = (struct vlantag *)(h_eth + 1);
> @@ -191,7 +187,7 @@ static void gelic_debug_init(void)
>   h_vlan->subtype = 0x0800;
>   h_ip = (struct iphdr *)(h_vlan + 1);
>   } else {
> - h_eth->type = 0x0800;
> + h_eth->h_proto= 0x0800;

ETH_P_IP
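
Put together, the suggestions above might look like the following userspace sketch. eth_broadcast_addr() and ether_addr_copy() are local stand-ins for the <linux/etherdevice.h> helpers (the real ether_addr_copy() additionally expects both addresses to be 16-bit aligned), the constants carry the values of the kernel's ETH_ALEN, ETH_P_IP and ETH_P_8021Q, and eth_hdr_demo/fill_eth_header() are hypothetical. The htons() is this sketch's addition for host-endian neutrality; it is a no-op on the big-endian PS3.

```c
#include <arpa/inet.h> /* htons() */
#include <stdint.h>
#include <string.h>

/* Same values as in the kernel's <linux/if_ether.h>. */
#define ETH_ALEN    6
#define ETH_P_IP    0x0800
#define ETH_P_8021Q 0x8100

/* Userspace stand-ins for the <linux/etherdevice.h> helpers. */
static void eth_broadcast_addr(uint8_t *addr)
{
	memset(addr, 0xff, ETH_ALEN);
}

static void ether_addr_copy(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, ETH_ALEN);
}

struct eth_hdr_demo {
	uint8_t h_dest[ETH_ALEN];
	uint8_t h_source[ETH_ALEN];
	uint16_t h_proto;
} __attribute__((packed));

/* Hypothetical helper mirroring the header setup in gelic_debug_init(). */
static void fill_eth_header(struct eth_hdr_demo *h, const uint8_t *mac,
			    int have_vlan)
{
	eth_broadcast_addr(h->h_dest);
	ether_addr_copy(h->h_source, mac);
	h->h_proto = htons(have_vlan ? ETH_P_8021Q : ETH_P_IP);
}
```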


Re: [RFCv2 6/9] pseries: Add hypercall wrappers for hash page table resizing

2016-02-07 Thread David Gibson
On Thu, Feb 04, 2016 at 04:41:10PM +0530, Anshuman Khandual wrote:
> On 02/02/2016 06:28 AM, David Gibson wrote:
> > On Mon, Feb 01, 2016 at 12:41:31PM +0530, Anshuman Khandual wrote:
> >> On 01/29/2016 10:54 AM, David Gibson wrote:
> >>> This adds the hypercall numbers and wrapper functions for the hash page
> >>> table resizing hypercalls.
> >>>
> >>> These are experimental "platform specific" values for now, until we have a
> >>> formal PAPR update.
> >>>
> >>> It also adds a new firmware feature flat to track the presence of the
> >>> HPT resizing calls.
> >>
>> It's a flag   ... ^^^ here.
> > 
> > Oops, thanks.
> > 
> >>
> >>>
> >>> Signed-off-by: David Gibson 
> >>> ---
> >>>  arch/powerpc/include/asm/firmware.h   |  5 +++--
> >>>  arch/powerpc/include/asm/hvcall.h |  2 ++
> >>>  arch/powerpc/include/asm/plpar_wrappers.h | 12 
> >>>  arch/powerpc/platforms/pseries/firmware.c |  1 +
> >>>  4 files changed, 18 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
> >>> index b062924..32435d2 100644
> >>> --- a/arch/powerpc/include/asm/firmware.h
> >>> +++ b/arch/powerpc/include/asm/firmware.h
> >>> @@ -42,7 +42,7 @@
> >>>  #define FW_FEATURE_SPLPAR        ASM_CONST(0x0010)
> >>>  #define FW_FEATURE_LPAR          ASM_CONST(0x0040)
> >>>  #define FW_FEATURE_PS3_LV1       ASM_CONST(0x0080)
> >>> -/* Free                          ASM_CONST(0x0100) */
> >>> +#define FW_FEATURE_HPT_RESIZE    ASM_CONST(0x0100)
> >>>  #define FW_FEATURE_CMO           ASM_CONST(0x0200)
> >>>  #define FW_FEATURE_VPHN          ASM_CONST(0x0400)
> >>>  #define FW_FEATURE_XCMO          ASM_CONST(0x0800)
> >>> @@ -66,7 +66,8 @@ enum {
> >>>   FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
> >>>   FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
> >>>   FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
> >>> - FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
> >>> + FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
> >>> + FW_FEATURE_HPT_RESIZE,
> >>>   FW_FEATURE_PSERIES_ALWAYS = 0,
> >>>   FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
> >>>   FW_FEATURE_POWERNV_ALWAYS = 0,
> >>> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> >>> index e3b54dd..195e080 100644
> >>> --- a/arch/powerpc/include/asm/hvcall.h
> >>> +++ b/arch/powerpc/include/asm/hvcall.h
> >>> @@ -293,6 +293,8 @@
> >>>  
> >>>  /* Platform specific hcalls, used by KVM */
> >>>  #define H_RTAS   0xf000
> >>> +#define H_RESIZE_HPT_PREPARE 0xf003
> >>> +#define H_RESIZE_HPT_COMMIT  0xf004
> >>
> >> Wouldn't this sound better and match FW_FEATURE_HPT_RESIZE?
> > 
> > I'm not quite sure what you're suggesting here.
> > 
> >> #define H_HPT_RESIZE_PREPARE   0xf003
> >> #define H_HPT_RESIZE_COMMIT    0xf004
> 
> Just a small change to the macro names, like this:
> 
> 
> H_RESIZE_HPT_PREPARE -->  H_HPT_RESIZE_PREPARE
> H_RESIZE_HPT_COMMIT -->  H_HPT_RESIZE_COMMIT

Oh, I see.  Actually, I'm trying to standardize on "resize hpt" rather
than "hpt resize" everywhere.
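
Whatever the final names, the patch pairs a firmware feature bit with the two hcall numbers, so a caller is expected to check the feature before issuing either call. A minimal userspace sketch of that gating: the hcall numbers are the experimental values from the patch, while demo_firmware_has_feature() mocks the kernel's firmware_has_feature(), the bit value is hypothetical, and the prepare-then-commit order is assumed from the names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Experimental, platform-specific hcall numbers from the patch. */
#define H_RESIZE_HPT_PREPARE 0xf003
#define H_RESIZE_HPT_COMMIT  0xf004

/* Hypothetical bit; the real FW_FEATURE_HPT_RESIZE lives in firmware.h. */
#define DEMO_FW_FEATURE_HPT_RESIZE (1ULL << 8)

/* Mocked feature mask; in the kernel it is filled in from firmware. */
static uint64_t demo_fw_features;

static bool demo_firmware_has_feature(uint64_t feature)
{
	return (demo_fw_features & feature) != 0;
}

/* Record the hcalls a resize would issue, or return 0 if the hypervisor
 * does not advertise support for resizing. */
static int demo_plan_resize(unsigned long out[2])
{
	if (!demo_firmware_has_feature(DEMO_FW_FEATURE_HPT_RESIZE))
		return 0;
	out[0] = H_RESIZE_HPT_PREPARE;
	out[1] = H_RESIZE_HPT_COMMIT;
	return 2;
}
```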


-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[PATCH] powerpc/eeh: fix incorrect function name in comment

2016-02-07 Thread Andrew Donnellan
The comment block above pcibios_set_pcie_reset_state() incorrectly refers
to pcibios_set_pcie_slot_reset(). Fix the comment accordingly.

Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 40e4d4a..8c6005c 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -739,7 +739,7 @@ static void *eeh_restore_dev_state(void *data, void *userdata)
 }
 
 /**
- * pcibios_set_pcie_slot_reset - Set PCI-E reset state
+ * pcibios_set_pcie_reset_state - Set PCI-E reset state
  * @dev: pci device struct
  * @state: reset state to enter
  *
-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited


Re: [PATCH v3 13/18] cxl: sysfs support for guests

2016-02-07 Thread Stewart Smith
Frederic Barrat  writes:
> --- a/Documentation/ABI/testing/sysfs-class-cxl
> +++ b/Documentation/ABI/testing/sysfs-class-cxl
> @@ -183,7 +183,7 @@ Description:read only
>  Identifies the revision level of the PSL.
>  Users:   https://github.com/ibm-capi/libcxl
>  
> -What:   /sys/class/cxl//base_image
> +What:   /sys/class/cxl//base_image (not in a guest)

Is this going to be the case for KVM guest as well as PowerVM guest?

-- 
Stewart Smith
OPAL Architect, IBM.


Re: [PATCH] powerpc/eeh: fix incorrect function name in comment

2016-02-07 Thread Gavin Shan
On Mon, Feb 08, 2016 at 02:39:19PM +1100, Andrew Donnellan wrote:
>The comment block above pcibios_set_pcie_reset_state() incorrectly refers
>to pcibios_set_pcie_slot_reset(). Fix the comment accordingly.
>
>Signed-off-by: Andrew Donnellan 

Acked-by: Gavin Shan 

Thanks,
Gavin

>---
> arch/powerpc/kernel/eeh.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>index 40e4d4a..8c6005c 100644
>--- a/arch/powerpc/kernel/eeh.c
>+++ b/arch/powerpc/kernel/eeh.c
>@@ -739,7 +739,7 @@ static void *eeh_restore_dev_state(void *data, void *userdata)
> }
>
> /**
>- * pcibios_set_pcie_slot_reset - Set PCI-E reset state
>+ * pcibios_set_pcie_reset_state - Set PCI-E reset state
>  * @dev: pci device struct
>  * @state: reset state to enter
>  *
>-- 
>Andrew Donnellan              Software Engineer, OzLabs
>andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
>+61 2 6201 8874 (work)        IBM Australia Limited
>


[PATCH V3] powerpc/powernv: Remove support for p5ioc2

2016-02-07 Thread Russell Currey
"p5ioc2 is used by approximately 2 machines in the world, and has never
ever been a supported configuration."

The code for p5ioc2 is essentially unused and complicates what is already
a very complicated codebase.  Its removal is essentially a "free win" in
the effort to simplify the powernv PCI code.

In addition, support for p5ioc2 has been dropped from skiboot.  There's no
reason to keep it around in the kernel.

Signed-off-by: Russell Currey 
---
V3: Remove now-useless variable "found_ioda" thanks to Gavin
V2: Remove pointless union and rebase on -next thanks to Andrew

Tested on a P7IOC machine and a PHB3 machine.

Skiboot p5ioc2 removal patch: https://patchwork.ozlabs.org/patch/544898/
---
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/pci-p5ioc2.c | 271 
 arch/powerpc/platforms/powernv/pci.c|  17 +-
 arch/powerpc/platforms/powernv/pci.h| 152 
 4 files changed, 74 insertions(+), 368 deletions(-)
 delete mode 100644 arch/powerpc/platforms/powernv/pci-p5ioc2.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index f1516b5..cd9711e 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -5,7 +5,7 @@ obj-y   += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 obj-y  += opal-kmsg.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o npu-dma.o
+obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
 obj-$(CONFIG_EEH)  += eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)   += opal-memory-errors.o
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
deleted file mode 100644
index f2bdfea..000
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ /dev/null
@@ -1,271 +0,0 @@
-/*
- * Support PCI/PCIe on PowerNV platforms
- *
- * Currently supports only P5IOC2
- *
- * Copyright 2011 Benjamin Herrenschmidt, IBM Corp.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "powernv.h"
-#include "pci.h"
-
-/* For now, use a fixed amount of TCE memory for each p5ioc2
- * hub, 16M will do
- */
-#define P5IOC2_TCE_MEMORY  0x0100
-
-#ifdef CONFIG_PCI_MSI
-static int pnv_pci_p5ioc2_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
-   unsigned int hwirq, unsigned int virq,
-   unsigned int is_64, struct msi_msg *msg)
-{
-   if (WARN_ON(!is_64))
-   return -ENXIO;
-   msg->data = hwirq - phb->msi_base;
-   msg->address_hi = 0x1000;
-   msg->address_lo = 0;
-
-   return 0;
-}
-
-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb)
-{
-   unsigned int count;
-   const __be32 *prop = of_get_property(phb->hose->dn,
-"ibm,opal-msi-ranges", NULL);
-   if (!prop)
-   return;
-
-   /* Don't do MSI's on p5ioc2 PCI-X are they are not properly
-* verified in HW
-*/
-   if (of_device_is_compatible(phb->hose->dn, "ibm,p5ioc2-pcix"))
-   return;
-   phb->msi_base = be32_to_cpup(prop);
-   count = be32_to_cpup(prop + 1);
-   if (msi_bitmap_alloc(>msi_bmp, count, phb->hose->dn)) {
-   pr_err("PCI %d: Failed to allocate MSI bitmap !\n",
-  phb->hose->global_number);
-   return;
-   }
-   phb->msi_setup = pnv_pci_p5ioc2_msi_setup;
-   phb->msi32_support = 0;
-   pr_info(" Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
-   count, phb->msi_base);
-}
-#else
-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
-#endif /* CONFIG_PCI_MSI */
-
-static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
-   .set = pnv_tce_build,
-#ifdef CONFIG_IOMMU_API
-   .exchange = pnv_tce_xchg,
-#endif
-   .clear = pnv_tce_free,
-   .get = pnv_tce_get,
-};
-
-static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
-struct pci_dev *pdev)
-{
-   struct iommu_table *tbl = phb->p5ioc2.table_group.tables[0];
-
-   if (!tbl->it_map) {
-   tbl->it_ops = _p5ioc2_iommu_ops;
-   iommu_init_table(tbl, phb->hose->node);
-   iommu_register_group(>p5ioc2.table_group,
-   pci_domain_nr(phb->hose->bus), 

Re: [PATCH V3] powerpc/powernv: Remove support for p5ioc2

2016-02-07 Thread Gavin Shan
On Mon, Feb 08, 2016 at 03:08:20PM +1100, Russell Currey wrote:
>"p5ioc2 is used by approximately 2 machines in the world, and has never
>ever been a supported configuration."
>
>The code for p5ioc2 is essentially unused and complicates what is already
>a very complicated codebase.  Its removal is essentially a "free win" in
>the effort to simplify the powernv PCI code.
>
>In addition, support for p5ioc2 has been dropped from skiboot.  There's no
>reason to keep it around in the kernel.
>
>Signed-off-by: Russell Currey 

Acked-by: Gavin Shan 

Note: I plan to rebase my next hotplug patchset revision (v8) on top of this one.

Thanks,
Gavin

>---
>V3: Remove now-useless variable "found_ioda" thanks to Gavin
>V2: Remove pointless union and rebase on -next thanks to Andrew
>
>Tested on a P7IOC machine and a PHB3 machine.
>
>Skiboot p5ioc2 removal patch: https://patchwork.ozlabs.org/patch/544898/
>---
> arch/powerpc/platforms/powernv/Makefile |   2 +-
> arch/powerpc/platforms/powernv/pci-p5ioc2.c | 271 
> arch/powerpc/platforms/powernv/pci.c|  17 +-
> arch/powerpc/platforms/powernv/pci.h| 152 
> 4 files changed, 74 insertions(+), 368 deletions(-)
> delete mode 100644 arch/powerpc/platforms/powernv/pci-p5ioc2.c
>
>diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
>index f1516b5..cd9711e 100644
>--- a/arch/powerpc/platforms/powernv/Makefile
>+++ b/arch/powerpc/platforms/powernv/Makefile
>@@ -5,7 +5,7 @@ obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> obj-y += opal-kmsg.o
> 
> obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o
>-obj-$(CONFIG_PCI) += pci.o pci-p5ioc2.o pci-ioda.o npu-dma.o
>+obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o
> obj-$(CONFIG_EEH) += eeh-powernv.o
> obj-$(CONFIG_PPC_SCOM)+= opal-xscom.o
> obj-$(CONFIG_MEMORY_FAILURE)  += opal-memory-errors.o
>diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
>deleted file mode 100644
>index f2bdfea..000
>--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
>+++ /dev/null
>@@ -1,271 +0,0 @@
>-/*
>- * Support PCI/PCIe on PowerNV platforms
>- *
>- * Currently supports only P5IOC2
>- *
>- * Copyright 2011 Benjamin Herrenschmidt, IBM Corp.
>- *
>- * This program is free software; you can redistribute it and/or
>- * modify it under the terms of the GNU General Public License
>- * as published by the Free Software Foundation; either version
>- * 2 of the License, or (at your option) any later version.
>- */
>-
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-#include 
>-
>-#include "powernv.h"
>-#include "pci.h"
>-
>-/* For now, use a fixed amount of TCE memory for each p5ioc2
>- * hub, 16M will do
>- */
>-#define P5IOC2_TCE_MEMORY 0x01000000
>-
>-#ifdef CONFIG_PCI_MSI
>-static int pnv_pci_p5ioc2_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
>-  unsigned int hwirq, unsigned int virq,
>-  unsigned int is_64, struct msi_msg *msg)
>-{
>-  if (WARN_ON(!is_64))
>-  return -ENXIO;
>-  msg->data = hwirq - phb->msi_base;
>-  msg->address_hi = 0x1000;
>-  msg->address_lo = 0;
>-
>-  return 0;
>-}
>-
>-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb)
>-{
>-  unsigned int count;
>-  const __be32 *prop = of_get_property(phb->hose->dn,
>-   "ibm,opal-msi-ranges", NULL);
>-  if (!prop)
>-  return;
>-
>-  /* Don't do MSI's on p5ioc2 PCI-X are they are not properly
>-   * verified in HW
>-   */
>-  if (of_device_is_compatible(phb->hose->dn, "ibm,p5ioc2-pcix"))
>-  return;
>-  phb->msi_base = be32_to_cpup(prop);
>-  count = be32_to_cpup(prop + 1);
>-  if (msi_bitmap_alloc(&phb->msi_bmp, count, phb->hose->dn)) {
>-  pr_err("PCI %d: Failed to allocate MSI bitmap !\n",
>- phb->hose->global_number);
>-  return;
>-  }
>-  phb->msi_setup = pnv_pci_p5ioc2_msi_setup;
>-  phb->msi32_support = 0;
>-  pr_info(" Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
>-  count, phb->msi_base);
>-}
>-#else
>-static void pnv_pci_init_p5ioc2_msis(struct pnv_phb *phb) { }
>-#endif /* CONFIG_PCI_MSI */
>-
>-static struct iommu_table_ops pnv_p5ioc2_iommu_ops = {
>-  .set = pnv_tce_build,
>-#ifdef CONFIG_IOMMU_API
>-  .exchange = pnv_tce_xchg,
>-#endif
>-  .clear = pnv_tce_free,
>-  .get = pnv_tce_get,
>-};
>-
>-static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
>-   struct pci_dev *pdev)
>-{
>-  

[PATCH 2/2] powerpc/eeh: Reworked eeh_pe_bus_get()

2016-02-07 Thread Gavin Shan
The original implementation is ugly: it has unnecessary if statements
and an "out" label. This reworks the function to avoid the above
weaknesses. No functional changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_pe.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 8654cb1..1d64e60 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -923,25 +923,21 @@ out:
  */
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 {
-   struct pci_bus *bus = NULL;
struct eeh_dev *edev;
struct pci_dev *pdev;
 
-   if (pe->type & EEH_PE_PHB) {
-   bus = pe->phb->bus;
-   } else if (pe->type & EEH_PE_BUS ||
-  pe->type & EEH_PE_DEVICE) {
-   if (pe->bus) {
-   bus = pe->bus;
-   goto out;
-   }
+   if (pe->type & EEH_PE_PHB)
+   return pe->phb->bus;
 
-   edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
-   pdev = eeh_dev_to_pci_dev(edev);
-   if (pdev)
-   bus = pdev->bus;
-   }
+   /* The primary bus might be cached during probe time */
+   if (pe->bus)
+   return pe->bus;
 
-out:
-   return bus;
+   /* Retrieve the parent PCI bus of first (top) PCI device */
+   edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, list);
+   pdev = eeh_dev_to_pci_dev(edev);
+   if (pdev)
+   return pdev->bus;
+
+   return NULL;
 }
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFCv2 1/9] memblock: Don't mark memblock_phys_mem_size() as __init

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:23:55PM +1100, David Gibson wrote:
> At the moment memblock_phys_mem_size() is marked as __init, and so is
> discarded after boot.  This is different from most of the memblock
> functions which are marked __init_memblock, and are only discarded after
> boot if memory hotplug is not configured.
> 
> To allow for upcoming code which will need memblock_phys_mem_size() in the
> hotplug path, change it from __init to __init_memblock.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Paul Mackerras 

Re: [RFCv2 2/9] arch/powerpc: Clean up error handling for htab_remove_mapping

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:23:56PM +1100, David Gibson wrote:
> Currently, the only error that htab_remove_mapping() can report is -EINVAL,
> if removal of bolted HPTEs isn't implemented for this platform.  We make
> a few clean ups to the handling of this:
> 
>  * EINVAL isn't really the right code - there's nothing wrong with the
>function's arguments - use ENODEV instead
>  * We were also printing a warning message, but that's a decision better
>left up to the callers, so remove it
>  * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
>error, making the warning message irrelevant, so no change is needed
>there.
>  * The other caller is remove_section_mapping().  This is called in the
>memory hot remove path at a point after vmemmap_remove_mapping() so
>if hpte_removebolted isn't implemented, we'd expect to have already
>BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk() since this
>really shouldn't be happening.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Paul Mackerras 

Re: [RFCv2 5/9] arch/powerpc: Split hash page table sizing heuristic into a helper

2016-02-07 Thread Paul Mackerras
On Thu, Feb 04, 2016 at 04:26:20PM +0530, Anshuman Khandual wrote:
> On 02/02/2016 06:34 AM, David Gibson wrote:
> > On Mon, Feb 01, 2016 at 12:34:32PM +0530, Anshuman Khandual wrote:
> >> On 01/29/2016 10:53 AM, David Gibson wrote:
> >>> htab_get_table_size() either retrieves the size of the hash page table (HPT)
> >>> from the device tree - if the HPT size is determined by firmware - or
> >>> uses a heuristic to determine a good size based on RAM size if the kernel
> >>> is responsible for allocating the HPT.
> >>>
> >>> To support a PAPR extension allowing resizing of the HPT, we're going to
> >>> want the memory size -> HPT size logic elsewhere, so split it out into a
> >>> helper function.
> >>>
> >>> Signed-off-by: David Gibson 
> >>> ---
> >>>  arch/powerpc/include/asm/mmu-hash64.h |  3 +++
> >>>  arch/powerpc/mm/hash_utils_64.c   | 30 +-
> >>>  2 files changed, 20 insertions(+), 13 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> >>> index 7352d3f..cf070fd 100644
> >>> --- a/arch/powerpc/include/asm/mmu-hash64.h
> >>> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> >>> @@ -607,6 +607,9 @@ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize)
> >>>   context = (MAX_USER_CONTEXT) + ((ea >> 60) - 0xc) + 1;
> >>>   return get_vsid(context, ea, ssize);
> >>>  }
> >>> +
> >>> +unsigned htab_shift_for_mem_size(unsigned long mem_size);
> >>> +
> >>>  #endif /* __ASSEMBLY__ */
> >>>  
> >>>  #endif /* _ASM_POWERPC_MMU_HASH64_H_ */
> >>> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> >>> index e88a86e..d63f7dc 100644
> >>> --- a/arch/powerpc/mm/hash_utils_64.c
> >>> +++ b/arch/powerpc/mm/hash_utils_64.c
> >>> @@ -606,10 +606,24 @@ static int __init htab_dt_scan_pftsize(unsigned long node,
> >>>   return 0;
> >>>  }
> >>>  
> >>> -static unsigned long __init htab_get_table_size(void)
> >>> +unsigned htab_shift_for_mem_size(unsigned long mem_size)
> >>>  {
> >>> - unsigned long mem_size, rnd_mem_size, pteg_count, psize;
> >>> + unsigned memshift = __ilog2(mem_size);
> >>> + unsigned pshift = mmu_psize_defs[mmu_virtual_psize].shift;
> >>> + unsigned pteg_shift;
> >>> +
> >>> + /* round mem_size up to next power of 2 */
> >>> + if ((1UL << memshift) < mem_size)
> >>> + memshift += 1;
> >>> +
> >>> + /* aim for 2 pages / pteg */
> >>
> >> While here I guess it's a good opportunity to write a couple of lines
> >> about why one PTE group for every two physical pages on the system,
> > 
> > Well, I don't really know; it's just copied from the existing code.
> 
> Aneesh, would you know why ?

1 PTEG per 2 pages means 4 HPTEs per page, which means you can map
each page to an average of 4 different virtual addresses.  It's a
heuristic that has been around for a long time and dates back to the
early days of AIX.  For Linux, running on machines which typically
have quite a lot of memory, it's probably overkill.
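To make the arithmetic concrete, here is a small standalone C sketch of the "1 PTEG per 2 pages" sizing rule. It is not the kernel's htab_shift_for_mem_size(); the helper names are hypothetical, it assumes a 64-bit memory size and 128-byte PTEGs (8 HPTEs of 16 bytes each), and it deliberately applies no architectural minimum size clamp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(log2(x)) for x > 0. */
static unsigned int floor_log2_u64(uint64_t x)
{
	unsigned int shift = 0;

	assert(x != 0);
	while (x >>= 1)
		shift++;
	return shift;
}

/*
 * Sketch of the sizing heuristic: one 128-byte PTEG for every two
 * pages of RAM, with RAM rounded up to a power of two.  Returns log2
 * of the hash table size in bytes.  No minimum size is applied here.
 */
static unsigned int hpt_shift_for_mem(uint64_t mem_size, unsigned int page_shift)
{
	unsigned int mem_shift = floor_log2_u64(mem_size);
	int pteg_shift;

	/* Round mem_size up to the next power of two. */
	if (((uint64_t)1 << mem_shift) < mem_size)
		mem_shift++;

	/* 2^(mem_shift - page_shift) pages, halved: one PTEG per 2 pages. */
	pteg_shift = (int)mem_shift - (int)page_shift - 1;
	if (pteg_shift < 0)
		pteg_shift = 0;

	/* Each PTEG is 8 HPTEs x 16 bytes = 128 = 2^7 bytes. */
	return (unsigned int)pteg_shift + 7;
}
```

For example, 4 GiB of RAM with 64 KiB pages gives 2^16 pages, 2^15 PTEGs and a 4 MiB (2^22 byte) table, i.e. 2^18 HPTEs, or 4 per page.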

> > 
> >> why minimum (1UL << 11 = 2048) number of PTE groups required,
> 
> Aneesh, would you know why ?

It's in the architecture, which specifies the minimum size of the HPT
as 256kB.  This is because not all of the virtual address bits
are present in the HPT.  That's OK because some of the virtual address
bits are implied by the HPTEG index in the hash table.  If the HPT was
less than 256kB (2048 HPTEGs) there would be the possibility of
collisions where two different virtual addresses could hash to the
same HPTEG and their HPTEs would be impossible to tell apart.
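The minimum works out as follows. This is a sketch with hypothetical macro names, assuming 16-byte HPTEs grouped eight to a PTEG; the compile-time checks confirm that 2048 PTEGs of 128 bytes is 256kB (2^18 bytes), and the helper clamps a proposed table size, given as log2 of bytes, to that floor.

```c
#include <assert.h>

/* Hypothetical names; values follow the discussion in this thread. */
#define HPTE_BYTES      16u                           /* one hashed PTE */
#define HPTES_PER_PTEG  8u
#define PTEG_BYTES      (HPTE_BYTES * HPTES_PER_PTEG) /* 128 = 2^7 */
#define MIN_HPT_PTEGS   2048u                         /* 2^11 */
#define MIN_HPT_BYTES   (MIN_HPT_PTEGS * PTEG_BYTES)
#define MIN_HPT_SHIFT   18u                           /* log2(256kB) */

_Static_assert(PTEG_BYTES == 128u, "a PTEG is 2^7 bytes");
_Static_assert(MIN_HPT_BYTES == 256u * 1024u, "2048 PTEGs is 256kB");
_Static_assert((1u << MIN_HPT_SHIFT) == MIN_HPT_BYTES, "256kB is 2^18 bytes");

/* Clamp a proposed HPT size, given as log2(bytes), to the minimum. */
static unsigned int clamp_hpt_shift(unsigned int shift)
{
	assert(shift < 64);
	return shift < MIN_HPT_SHIFT ? MIN_HPT_SHIFT : shift;
}
```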

> 
> > 
> > Ok.
> > 
> >> why
> >> (1U << 7 = 128) entries per PTE group
> > 
> > Um.. what?  Because that's how big a PTEG is, I don't think
> > re-explaining the HPT structure here is useful.
> 
> Agreed, though I think somewhere these things should be macros rather
> than hard-coded numbers like this.

Using symbols instead of constant numbers is not always clearer.  The
symbol name can give some context (but so can a suitable comment) but
has the cost of obscuring the actual numeric value.

Paul.

Re: [RFCv2 7/9] pseries: Add support for hash table resizing

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:24:01PM +1100, David Gibson wrote:
> This adds support for using experimental hypercalls to change the size
> of the main hash page table while running as a PAPR guest.  For now these
> hypercalls are only in experimental qemu versions.
> 
> The interface is two part: first H_RESIZE_HPT_PREPARE is used to allocate
> and prepare the new hash table.  This may be slow, but can be done
> asynchronously.  Then, H_RESIZE_HPT_COMMIT is used to switch to the new
> hash table.  This requires that no CPUs be concurrently updating the HPT,
> and so must be run under stop_machine().
> 
> This also adds a debugfs file which can be used to manually control
> HPT resizing for testing purposes.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Paul Mackerras 

Re: [RFCv2 6/9] pseries: Add hypercall wrappers for hash page table resizing

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:24:00PM +1100, David Gibson wrote:
> This adds the hypercall numbers and wrapper functions for the hash page
> table resizing hypercalls.
> 
> These are experimental "platform specific" values for now, until we have a
> formal PAPR update.
> 
> It also adds a new firmware feature flag to track the presence of the
> HPT resizing calls.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Paul Mackerras 

Re: [RFCv2 9/9] pseries: Automatically resize HPT for memory hot add/remove

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:24:03PM +1100, David Gibson wrote:
> We've now implemented code in the pseries platform to use the new PAPR
> interface to allow resizing the hash page table (HPT) at runtime.
> 
> This patch uses that interface to automatically attempt to resize the HPT
> when memory is hot added or removed.  This tries to always keep the HPT at
> a reasonable size for our current memory size.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Paul Mackerras 

[PATCH V2] powerpc/mm: Fix Multi hit ERAT cause by recent THP update

2016-02-07 Thread Aneesh Kumar K.V
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensures that the low-level hash fault handling
will skip this huge pte and that we handle it at upper levels.

A recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.

Consider the following race:

        CPU0                                CPU1
shrink_page_list()
  add_to_swap()
    split_huge_page_to_list()
      __split_huge_pmd_locked()
        pmdp_huge_clear_flush_notify()
        // pmd_none() == true
                                    exit_mmap()
                                      unmap_vmas()
                                        zap_pmd_range()
                                          // no action on pmd since pmd_none() == true
        pmd_populate()

As a result, the THP will not be freed. The leak is detected by check_mm():

BUG: Bad rss-counter state mm:880058d2e580 idx:1 val:512

The above requires us not to mark the pmd none during a pmd split.

The fix for ppc is to clear _PAGE_USER in the huge pte, so that the
low-level fault handling code skips this pte. At the higher level we do
take the ptl lock, which should serialize us against the pmd split. Once
the lock is acquired, we check the pmd again using pmd_same. That should
always return false for us, and hence we should retry the access.
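The serialization described here is the usual "snapshot, lock, recheck" pattern. Below is a userspace C sketch of it, not kernel code: `struct fake_pmd`, `handle_fault()` and the lock helpers are hypothetical, an `atomic_flag` spinlock stands in for the ptl, and a plain value comparison stands in for pmd_same(). If the entry changed between the lockless snapshot and taking the lock (for example, because a split cleared _PAGE_USER), the handler backs out and asks for a retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a pmd entry guarded by its page table lock. */
struct fake_pmd {
	atomic_flag ptl;
	uint64_t val;
};

enum fault_result { FAULT_HANDLED, FAULT_RETRY };

static void ptl_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void ptl_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * 'orig' is the value the caller read without the lock.  Only update the
 * entry if it is still the same once the lock is held; otherwise retry.
 */
static enum fault_result handle_fault(struct fake_pmd *pmd, uint64_t orig,
				      uint64_t new_val)
{
	enum fault_result ret = FAULT_RETRY;

	ptl_lock(&pmd->ptl);
	if (pmd->val == orig) {		/* analogue of pmd_same() */
		pmd->val = new_val;
		ret = FAULT_HANDLED;
	}
	ptl_unlock(&pmd->ptl);
	return ret;
}
```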

Also make sure we wait for irq-disabled sections on other CPUs to finish
before replacing a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irqs being disabled to get
a stable pte_t pointer. A parallel THP split needs to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq-disabled section to finish.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  4 
 arch/powerpc/mm/pgtable_64.c | 35 +++-
 include/asm-generic/pgtable.h|  8 +++
 mm/huge_memory.c |  1 +
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8d1c41d28318..0415856941e0 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -281,6 +281,10 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp);
 
+#define __HAVE_ARCH_PMDP_HUGE_SPLITTING_FLUSH
+extern void pmdp_huge_splitting_flush(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+
 #define pmd_move_must_withdraw pmd_move_must_withdraw
 struct spinlock;
 static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 3124a20d0fab..e8214b7f2210 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -646,6 +646,30 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
return pgtable;
 }
 
+void pmdp_huge_splitting_flush(struct vm_area_struct *vma,
+  unsigned long address, pmd_t *pmdp)
+{
+   VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+#ifdef CONFIG_DEBUG_VM
+   BUG_ON(REGION_ID(address) != USER_REGION_ID);
+#endif
+   /*
+* We can't mark the pmd none here, because that will cause a race
* against exit_mmap. We need to keep the pmd marked TRANS HUGE while
* we split, but at the same time we want the rest of the ppc64 code
* not to insert a hash pte for it, because we will be modifying
* the deposited pgtable in the caller of this function. Hence
* clear _PAGE_USER so that fault handling moves to the
* higher-level function, which will serialize against the ptl.
* We need to flush existing hash pte entries here even though
* the translation is still valid, because we will withdraw
* pgtable_t after this.
+*/
+   pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_USER, 0);
+}
+
+
 /*
  * set a new huge pmd. We should not be called for updating
  * an existing pmd entry. That should go via pmd_hugepage_update.
@@ -663,10 +687,19 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }
 
+/*
+ * We use this to invalidate a pmdp entry before switching from a
+ * hugepte to regular pmd entry.
+ */
 void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 pmd_t *pmdp)
 {
-   pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
+   

Re: [PATCH V3] powerpc/powernv: Remove support for p5ioc2

2016-02-07 Thread Stewart Smith
Russell Currey  writes:

> "p5ioc2 is used by approximately 2 machines in the world, and has never
> ever been a supported configuration."
>
> The code for p5ioc2 is essentially unused and complicates what is already
> a very complicated codebase.  Its removal is essentially a "free win" in
> the effort to simplify the powernv PCI code.
>
> In addition, support for p5ioc2 has been dropped from skiboot.  There's no
> reason to keep it around in the kernel.
>
> Signed-off-by: Russell Currey 

Yep, it's gone from firmware and there was only ever a handful of
machines inside development labs inside IBM that had it.

We may still have one in the lab, but I agree - it's not worth
maintaining it.

Acked-by: Stewart Smith 

-- 
Stewart Smith
OPAL Architect, IBM.


[PATCH 1/2] powerpc/powernv: Simplify definitions of EEH debugfs handlers

2016-02-07 Thread Gavin Shan
The EEH debugfs handlers have the same prototype. This introduces
a macro to define them, simplifying the code. No logical
changes.
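The token-pasting technique used by the macro can be tried outside the kernel. This is a hypothetical userspace sketch: `fake_regs`, `reg_set()`, `reg_get()` and `DEMO_DBGFS_ENTRY` are illustrative names, and the DEFINE_SIMPLE_ATTRIBUTE step is left out because it needs the kernel's debugfs headers. Each expansion emits a set/get pair bound to one register offset, using the offsets from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file standing in for the PHB's registers. */
static uint64_t fake_regs[0x1000 / sizeof(uint64_t)];

static int reg_set(void *data, unsigned int offset, uint64_t val)
{
	(void)data;
	assert(offset % sizeof(uint64_t) == 0 && offset < sizeof(fake_regs));
	fake_regs[offset / sizeof(uint64_t)] = val;
	return 0;
}

static int reg_get(void *data, unsigned int offset, uint64_t *val)
{
	(void)data;
	assert(offset % sizeof(uint64_t) == 0 && offset < sizeof(fake_regs));
	*val = fake_regs[offset / sizeof(uint64_t)];
	return 0;
}

/* Each use defines demo_set_<name>() and demo_get_<name>() for one offset. */
#define DEMO_DBGFS_ENTRY(name, reg)				\
static int demo_set_##name(void *data, uint64_t val)		\
{								\
	return reg_set(data, reg, val);				\
}								\
								\
static int demo_get_##name(void *data, uint64_t *val)		\
{								\
	return reg_get(data, reg, val);				\
}

DEMO_DBGFS_ENTRY(outb, 0xD10)
DEMO_DBGFS_ENTRY(inbA, 0xD90)
DEMO_DBGFS_ENTRY(inbB, 0xE10)
```

Adding another register is then a single macro invocation instead of two hand-written wrappers.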

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 60 ++--
 1 file changed, 22 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 5f152b9..3f1cb35 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -167,42 +167,26 @@ static int pnv_eeh_dbgfs_get(void *data, int offset, u64 *val)
return 0;
 }
 
-static int pnv_eeh_outb_dbgfs_set(void *data, u64 val)
-{
-   return pnv_eeh_dbgfs_set(data, 0xD10, val);
-}
-
-static int pnv_eeh_outb_dbgfs_get(void *data, u64 *val)
-{
-   return pnv_eeh_dbgfs_get(data, 0xD10, val);
-}
-
-static int pnv_eeh_inbA_dbgfs_set(void *data, u64 val)
-{
-   return pnv_eeh_dbgfs_set(data, 0xD90, val);
-}
-
-static int pnv_eeh_inbA_dbgfs_get(void *data, u64 *val)
-{
-   return pnv_eeh_dbgfs_get(data, 0xD90, val);
-}
-
-static int pnv_eeh_inbB_dbgfs_set(void *data, u64 val)
-{
-   return pnv_eeh_dbgfs_set(data, 0xE10, val);
-}
-
-static int pnv_eeh_inbB_dbgfs_get(void *data, u64 *val)
-{
-   return pnv_eeh_dbgfs_get(data, 0xE10, val);
-}
+#define PNV_EEH_DBGFS_ENTRY(name, reg) \
+static int pnv_eeh_dbgfs_set_##name(void *data, u64 val)   \
+{  \
+   return pnv_eeh_dbgfs_set(data, reg, val);   \
+}  \
+   \
+static int pnv_eeh_dbgfs_get_##name(void *data, u64 *val)  \
+{  \
+   return pnv_eeh_dbgfs_get(data, reg, val);   \
+}  \
+   \
+DEFINE_SIMPLE_ATTRIBUTE(pnv_eeh_dbgfs_ops_##name,  \
+   pnv_eeh_dbgfs_get_##name,   \
+pnv_eeh_dbgfs_set_##name,  \
+   "0x%llx\n")
+
+PNV_EEH_DBGFS_ENTRY(outb, 0xD10);
+PNV_EEH_DBGFS_ENTRY(inbA, 0xD90);
+PNV_EEH_DBGFS_ENTRY(inbB, 0xE10);
 
-DEFINE_SIMPLE_ATTRIBUTE(pnv_eeh_outb_dbgfs_ops, pnv_eeh_outb_dbgfs_get,
-   pnv_eeh_outb_dbgfs_set, "0x%llx\n");
-DEFINE_SIMPLE_ATTRIBUTE(pnv_eeh_inbA_dbgfs_ops, pnv_eeh_inbA_dbgfs_get,
-   pnv_eeh_inbA_dbgfs_set, "0x%llx\n");
-DEFINE_SIMPLE_ATTRIBUTE(pnv_eeh_inbB_dbgfs_ops, pnv_eeh_inbB_dbgfs_get,
-   pnv_eeh_inbB_dbgfs_set, "0x%llx\n");
 #endif /* CONFIG_DEBUG_FS */
 
 /**
@@ -268,13 +252,13 @@ static int pnv_eeh_post_init(void)
 
debugfs_create_file("err_injct_outbound", 0600,
phb->dbgfs, hose,
-   &pnv_eeh_outb_dbgfs_ops);
+   &pnv_eeh_dbgfs_ops_outb);
debugfs_create_file("err_injct_inboundA", 0600,
phb->dbgfs, hose,
-   &pnv_eeh_inbA_dbgfs_ops);
+   &pnv_eeh_dbgfs_ops_inbA);
debugfs_create_file("err_injct_inboundB", 0600,
phb->dbgfs, hose,
-   &pnv_eeh_inbB_dbgfs_ops);
+   &pnv_eeh_dbgfs_ops_inbB);
 #endif /* CONFIG_DEBUG_FS */
}
 
-- 
2.1.0


Re: [RFCv2 3/9] arch/powerpc: Handle removing maybe-present bolted HPTEs

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:23:57PM +1100, David Gibson wrote:
> At the moment the hpte_removebolted callback in ppc_md returns void and
> will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
> place.  This is awkward for the case of cleaning up a mapping which was
> partially made before failing.
> 
> So, we add a return value to hpte_removebolted, and have it return ENOENT
> in the case that the HPTE to remove didn't exist in the first place.
> 
> In the (sole) caller, we propagate errors in hpte_removebolted to its
> caller to handle.  However, we handle ENOENT specially, continuing to
> complete the unmapping over the specified range before returning the error
> to the caller.
> 
> This means that htab_remove_mapping() will work sanely on a partially
> present mapping, removing any HPTEs which are present, while also returning
> ENOENT to its caller in case it's important there.
> 
> There are two callers of htab_remove_mapping():
>- In remove_section_mapping() we already WARN_ON() any error return,
>  which is reasonable - in this case the mapping should be fully
>  present
>- In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
>  just a WARN_ON() in the case of ENOENT, since failing to remove a
>  mapping that wasn't there in the first place probably shouldn't be
>  fatal.
> 
> Signed-off-by: David Gibson 

[snip]

> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -269,6 +269,7 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend,
>  {
>   unsigned long vaddr;
>   unsigned int step, shift;
> + int rc = 0;
>  
>   shift = mmu_psize_defs[psize].shift;
>   step = 1 << shift;
> @@ -276,10 +277,13 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend,
>   if (!ppc_md.hpte_removebolted)
>   return -ENODEV;
>  
> - for (vaddr = vstart; vaddr < vend; vaddr += step)
> - ppc_md.hpte_removebolted(vaddr, psize, ssize);
> + for (vaddr = vstart; vaddr < vend; vaddr += step) {
> + rc = ppc_md.hpte_removebolted(vaddr, psize, ssize);
> + if ((rc < 0) && (rc != -ENOENT))
> + return rc;
> + }
>  
> - return 0;
> + return rc;

This will return the rc from the last hpte_removebolted call, which
might be 0 even if earlier calls had returned -ENOENT.  Or, if the
last call fails with -ENOENT, this will return -ENOENT.  Is that
exactly what you meant?  In the case where some calls to
hpte_removebolted return -ENOENT, I would think we would want a
consistent return value, which could be either 0 or -ENOENT, but it
shouldn't depend on which specific calls fail with -ENOENT, in my
opinion.
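One way to get the consistent behaviour suggested here is to remember that -ENOENT was seen and report it at the end, regardless of which iteration hit it. The C sketch below is hypothetical and self-contained: `remove_range()` and the demo callbacks stand in for htab_remove_mapping() and ppc_md.hpte_removebolted, and it picks -ENOENT (rather than 0) as the consistent result.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type standing in for ppc_md.hpte_removebolted. */
typedef int (*remove_fn)(unsigned long vaddr);

/*
 * Walk [vstart, vend) in 'step' increments.  -ENOENT is remembered but
 * does not stop the walk; any other negative value aborts immediately.
 * The result therefore does not depend on which step saw -ENOENT.
 */
static int remove_range(unsigned long vstart, unsigned long vend,
			unsigned long step, remove_fn remove)
{
	unsigned long vaddr;
	int ret = 0;

	assert(step != 0);
	for (vaddr = vstart; vaddr < vend; vaddr += step) {
		int rc = remove(vaddr);

		if (rc == -ENOENT)
			ret = -ENOENT;	/* note it, keep unmapping */
		else if (rc < 0)
			return rc;	/* hard error: give up now */
	}
	return ret;
}

/* Demo callbacks: which address is "missing" differs between them. */
static int missing_first(unsigned long vaddr)
{
	return vaddr == 0x0000 ? -ENOENT : 0;
}

static int missing_last(unsigned long vaddr)
{
	return vaddr == 0x3000 ? -ENOENT : 0;
}

static int all_present(unsigned long vaddr)
{
	(void)vaddr;
	return 0;
}

static int hard_error(unsigned long vaddr)
{
	return vaddr == 0x1000 ? -EIO : -ENOENT;
}
```

With the loop as posted, `missing_first` would yield 0 while `missing_last` would yield -ENOENT; here both yield -ENOENT.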

Paul.

Re: [RFCv2 4/9] arch/powerpc: Clean up memory hotplug failure paths

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:23:58PM +1100, David Gibson wrote:
> This makes a number of cleanups to handling of mapping failures during
> memory hotplug on Power:
> 
> For errors creating the linear mapping for the hot-added region:
>   * This is now reported with EFAULT which is more appropriate than the
> previous EINVAL (the failure is unlikely to be related to the
> function's parameters)
>   * An error in this path now prints a warning message, rather than just
> silently failing to add the extra memory.
>   * Previously a failure here could result in the region being partially
> mapped.  We now clean up any partial mapping before failing.
> 
> For errors creating the vmemmap for the hot-added region:
>* This is now reported with EFAULT instead of causing a BUG() - this
>  could happen for external reasons (e.g. a full hash table), so it's better
>  to handle this non-fatally
>* An error message is also printed, so the failure won't be silent
>* As above a failure could cause a partially mapped region, we now
>  clean this up.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Paul Mackerras 
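The "clean up any partial mapping before failing" behaviour is a standard unwind-on-error loop. Here is a hypothetical userspace C sketch; `map_one()`, `unmap_one()`, the `mapped[]` array and the `fail_at` knob are illustrative and not the actual hotplug code. On failure it unmaps, in reverse order, only the pages it mapped itself and returns -EFAULT, matching the error code described in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NPAGES 8

/* Hypothetical mapping state, for illustration only. */
static bool mapped[NPAGES];
static int fail_at = -1;	/* index where map_one() fails; -1 = never */

static int map_one(int i)
{
	if (i == fail_at)
		return -EFAULT;
	mapped[i] = true;
	return 0;
}

static void unmap_one(int i)
{
	mapped[i] = false;
}

/* Map pages [start, end); on failure, undo the pages mapped so far. */
static int map_range(int start, int end)
{
	int i, rc;

	assert(0 <= start && start <= end && end <= NPAGES);
	for (i = start; i < end; i++) {
		rc = map_one(i);
		if (rc < 0) {
			while (--i >= start)
				unmap_one(i);
			return rc;
		}
	}
	return 0;
}
```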

Re: [PATCH] powerpc/mm: Fix Multi hit ERAT cause by recent THP update

2016-02-07 Thread Aneesh Kumar K.V
"Kirill A. Shutemov"  writes:

> On Fri, Feb 05, 2016 at 11:41:40PM +0530, Aneesh Kumar K.V wrote:
>> With ppc64 we use the deposited pgtable_t to store the hash pte slot
>> information. We should not withdraw the deposited pgtable_t without
>> marking the pmd none. This ensure that low level hash fault handling
>> will skip this huge pte and we will handle them at upper levels. We
>> do take page table lock there and we can serialize against a parallel
>> THP split there. Hence mark the pte none (ie, remove __PAGE_USER) before
>> splitting the huge pmd.
>> 
>> Also make sure we wait for irq disable section in other cpus to finish
>> before flipping a huge pte entry with a regular pmd entry. Code paths
>> like find_linux_pte_or_hugepte depend on irq disable to get
>> a stable pte_t pointer. A parallel thp split need to make sure we
>> don't convert a pmd pte to a regular pmd entry without waiting for the
>> irq disable section to finish.
>> 
>> Signed-off-by: Aneesh Kumar K.V 
>
> Cc list is too short. At least akpm@ and linux-mm@ should be there.
> Probably numa balancing folks.

Will add them in the next iteration.

>
> Have you tested it with CONFIG_NUMA_BALANCING disabled?


yes.


>
> I would expect some additional changes in this area would be required.
> pmd_protnone() is always zero without numa balancing compiled in and
> therefore I don't see where we will get this serialization against ptl on
> fault side.


I am not really depending on the pmd_protnone definition here. What I
am depending on in the core code is that, after taking the ptl, all code
paths recheck the pmd using pmd_same. If it does not match, they should
force a retry. All code paths under the pmd_trans_huge() check seem to do
so.

>
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h |  4 
>>  arch/powerpc/mm/pgtable_64.c | 36 
>> +++-
>>  include/asm-generic/pgtable.h|  8 +++
>>  mm/huge_memory.c |  1 +
>>  4 files changed, 48 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 8d1c41d28318..0415856941e0 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -281,6 +281,10 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
>>  extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>>  pmd_t *pmdp);
>>  
>> +#define __HAVE_ARCH_PMDP_HUGE_SPLITTING_FLUSH
>> +extern void pmdp_huge_splitting_flush(struct vm_area_struct *vma,
>> +  unsigned long address, pmd_t *pmdp);
>
> I don't really like the name, but cannot think of anything better.


Same here. I will keep this as it is for now.


>
>> +
>>  #define pmd_move_must_withdraw pmd_move_must_withdraw
>>  struct spinlock;
>>  static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
>> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
>> index 3124a20d0fab..d80a23a92f95 100644
>> --- a/arch/powerpc/mm/pgtable_64.c
>> +++ b/arch/powerpc/mm/pgtable_64.c
>> @@ -646,6 +646,31 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>>  return pgtable;
>>  }

-aneesh


Re: [PATCH 2/2] powerpc/eeh: Reworked eeh_pe_bus_get()

2016-02-07 Thread Andrew Donnellan

On 08/02/16 16:35, Gavin Shan wrote:

The original implementation is ugly: unnecessary if statements and
"out" tag. This reworks the function to avoid above weaknesses. No
functional changes introduced.

Signed-off-by: Gavin Shan 


This is definitely a lot nicer to read and doesn't appear to have any 
functional changes.


Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)IBM Australia Limited


Re: [RFCv2 8/9] pseries: Advertise HPT resizing support via CAS

2016-02-07 Thread Paul Mackerras
On Fri, Jan 29, 2016 at 04:24:02PM +1100, David Gibson wrote:
> The hypervisor needs to know a guest is capable of using the HPT resizing
> PAPR extension in order to make full advantage of it for memory hotplug.
> 
> If the hypervisor knows the guest is HPT resize aware, it can size the
> initial HPT based on the initial guest RAM size, relying on the guest to
> resize the HPT when more memory is hot-added.  Without this, the hypervisor
> must size the HPT for the maximum possible guest RAM, which can lead to
> a huge waste of space if the guest never actually expands to that maximum
> size.
> 
> This patch advertises the guest's support for HPT resizing via the
> ibm,client-architecture-support OF interface.  Obviously, the actual
> encoding in the CAS vector is tentative until the extension is officially
> incorporated into PAPR.  For now we use bit 0 of (previously unused) byte 8
> of option vector 5.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Paul Mackerras 
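For readers unfamiliar with the convention: PAPR and the Power architecture number bits MSB-first, so "bit 0 of byte 8" is the mask 0x80 in that byte. The C sketch below illustrates this with hypothetical helpers (`set_ibm_bit()`, `test_ibm_bit()`) on a plain byte array; it assumes index 8 counts from the first option byte of the vector and is not the kernel's prom_init.c encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IBM (MSB-0) numbering: bit 0 is the most significant bit of the byte. */
static void set_ibm_bit(uint8_t *vec, size_t len, size_t byte, unsigned int bit)
{
	assert(byte < len && bit < 8);
	vec[byte] |= (uint8_t)(0x80u >> bit);
}

/* Returns 1 if IBM bit 'bit' of vec[byte] is set, else 0. */
static int test_ibm_bit(const uint8_t *vec, size_t len, size_t byte,
			unsigned int bit)
{
	assert(byte < len && bit < 8);
	return (vec[byte] >> (7 - bit)) & 1;
}
```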