Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-20 Thread Boqun Feng
On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> 
> Am I missing something here?  If not, it seems to me that you need
> the leading lwsync to instead be a sync.
> 
> Of course, if I am not missing something, then this applies also to the
> value-returning RMW atomic operations that you pulled this pattern from.
> If so, it would seem that I didn't think through all the possibilities
> back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> that I worried about the RMW atomic operation acting as a barrier,
> but not as the load/store itself.  :-/
> 

Paul, I know this may be difficult, but could you recall why
__futex_atomic_op() and futex_atomic_cmpxchg_inatomic() were also
included in the move of PPC_ATOMIC_EXIT_BARRIER to "sync"?

I did some search, but couldn't find the discussion of that patch.

I ask this because I recall Peter once brought up a discussion:

https://lkml.org/lkml/2015/8/26/596

Peter's conclusion seems to be that we could (though didn't want to)
live with futex atomics not being full barriers.


Peter, just to be clear, I'm not in favor of relaxing futex atomics. But
if I make PPC_ATOMIC_ENTRY_BARRIER a "sync", it will also strengthen the
futex atomics; I just wonder whether such strengthening is a -fix- or
not, considering that I want this patch to go to the -stable tree.
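
For concreteness, a minimal sketch of the powerpc cmpxchg shape under
discussion, with the entry barrier written as the full "sync" being
proposed (illustrative only; the kernel's actual PPC_ATOMIC_ENTRY_BARRIER
and PPC_ATOMIC_EXIT_BARRIER macros and operand constraints differ in
detail):

static inline unsigned long
sketch_cmpxchg_u32(volatile unsigned int *p, unsigned long old,
		   unsigned long new)
{
	unsigned int prev;

	__asm__ __volatile__ (
"	sync\n"			/* entry barrier: full sync, was lwsync */
"1:	lwarx	%0,0,%2\n"	/* load and reserve */
"	cmpw	0,%0,%3\n"
"	bne-	2f\n"
"	stwcx.	%4,0,%2\n"	/* store new iff reservation still held */
"	bne-	1b\n"		/* lost the reservation: retry */
"	sync\n"			/* exit barrier (PPC_ATOMIC_EXIT_BARRIER) */
"2:"
	: "=&r" (prev), "+m" (*p)
	: "r" (p), "r" (old), "r" (new)
	: "cc", "memory");

	return prev;
}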


Of course, while waiting for your answer, I will try to
figure this out by myself ;-)

Regards,
Boqun



Re: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation

2015-10-20 Thread Boqun Feng
On Mon, Oct 19, 2015 at 12:23:24PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:17:18AM +0800, Boqun Feng wrote:
> > This is confusing me right now. ;-)
> > 
> > Let's use a simple example for only one primitive. As I understand it,
> > if we say a primitive A is "fully ordered", we actually mean:
> > 
> > 1.  The memory operations preceding (in program order) A can't be
> > reordered after the memory operations following (in PO) A.
> > 
> > and
> > 
> > 2.  The memory operation(s) in A can't be reordered before the
> > memory operations preceding (in PO) A, nor after the memory
> > operations following (in PO) A.
> > 
> > If we say A is a "full barrier", we actually mean:
> > 
> > 1.  The memory operations preceding (in program order) A can't be
> > reordered after the memory operations following (in PO) A.
> > 
> > and
> > 
> > 2.  The memory ordering guarantee in #1 is visible globally.
> > 
> > Is that correct? Or is "full barrier" stronger than I understand,
> > i.e. there is a third property of "full barrier":
> > 
> > 3.  The memory operation(s) in A can't be reordered before the
> > memory operations preceding (in PO) A, nor after the memory
> > operations following (in PO) A.
> > 
> > IOW, is "full barrier" a stronger version of "fully ordered" or not?
> 
> Yes, that was how I used it.
> 
> Now of course, the big question is whether we want to promote this usage or
> come up with a different set of words to describe this stuff.
> 
> I think separating the ordering from the transitivity is useful, for we
> can then talk about and specify them independently.
> 

Great idea! 

> That is, we can say:
> 
>   LOAD-ACQUIRE: orders LOAD->{LOAD,STORE}
> weak transitivity (RCpc)
> 
>   MB: orders {LOAD,STORE}->{LOAD,STORE} (fully ordered)
>   strong transitivity (RCsc)
> 

It would be helpful to have this kind of description for each
primitive mentioned in memory-barriers.txt, which, IMO, is better than
descriptions like the following:

"""
Any atomic operation that modifies some state in memory and returns information
about the state (old or new) implies an SMP-conditional general memory barrier
(smp_mb()) on each side of the actual operation (with the exception of
"""

I'm assuming that the arrow "->" stands for program order, and that the
word "orders" means a primitive guarantees that this program order
becomes the order of the memory operations, so that the description
above can be rewritten as:

value-returning atomics:
orders {LOAD,STORE}->RmW(atomic operation)->{LOAD,STORE}
strong transitivity

which is much simpler and clearer for discussion and reasoning.
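
As a hypothetical illustration of the notation, consider a
store-buffering shaped test: if atomic_inc_return() is fully ordered
with strong transitivity, it acts like smp_mb() on both sides, so the
outcome (r0 == 0 && r1 == 0) below is forbidden (a sketch, with cpu0 and
cpu1 run on two CPUs):

#include <linux/atomic.h>
#include <linux/compiler.h>

static int x, y, r0, r1;
static atomic_t v0 = ATOMIC_INIT(0), v1 = ATOMIC_INIT(0);

static void cpu0(void)
{
	WRITE_ONCE(x, 1);
	(void)atomic_inc_return(&v0);	/* fully ordered RmW */
	r0 = READ_ONCE(y);
}

static void cpu1(void)
{
	WRITE_ONCE(y, 1);
	(void)atomic_inc_return(&v1);	/* fully ordered RmW */
	r1 = READ_ONCE(x);
}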

Regards,
Boqun

> etc..
> 
> Also, in the above I used weak and strong transitivity, but that too is
> of course up for grabs.



[PATCH V6 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR

2015-10-20 Thread Wei Yang
On PHB_IODA2, we enable SR-IOV devices by mapping the IOV BAR with M64
BARs. If an SR-IOV device's IOV BAR is not 64bit-prefetchable, it is not
assigned from the 64bit-prefetchable window, which means an M64 BAR
can't work on it.

The reason is that PCI bridges support only 2 windows and the kernel
code programs bridges so that one window is 32bit-nonprefetchable and
the other one is 64bit-prefetchable. So if a device's IOV BAR is 64bit
and non-prefetchable, it will be mapped into 32bit space and therefore
M64 cannot be used for it.

This patch makes this explicit and truncates the IOV resource in this
case to save MMIO space.

Signed-off-by: Wei Yang 
Reviewed-by: Gavin Shan 
Acked-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 34 ---
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 85cbc96..f042fed 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
int offset)
if (!res->flags || !res->parent)
continue;
 
-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
/*
 * The actual IOV BAR range is determined by the start address
 * and the actual size for num_vfs VFs BAR.  This check is to
@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
int offset)
if (!res->flags || !res->parent)
continue;
 
-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
res2 = *res;
res->start += size * offset;
@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
if (!res->flags || !res->parent)
continue;
 
-   if (!pnv_pci_is_mem_pref_64(res->flags))
-   continue;
-
for (j = 0; j < vf_groups; j++) {
do {
win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
num_vfs)
pdn = pci_get_pdn(pdev);
 
if (phb->type == PNV_PHB_IODA2) {
+   if (!pdn->vfs_expanded) {
+   dev_info(&pdev->dev, "don't support this SRIOV device"
+   " with non 64bit-prefetchable IOV BAR\n");
+   return -ENOSPC;
+   }
+
/* Calculate available PE for required VFs */
mutex_lock(&phb->ioda.pe_alloc_mutex);
pdn->offset = bitmap_find_next_zero_area(
@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
if (!res->flags || res->parent)
continue;
if (!pnv_pci_is_mem_pref_64(res->flags)) {
-   dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
+   dev_warn(&pdev->dev, "Don't support SR-IOV with"
+   " non M64 VF BAR%d: %pR. \n",
 i, res);
-   continue;
+   goto truncate_iov;
}
 
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
res = &pdev->resource[i + PCI_IOV_RESOURCES];
if (!res->flags || res->parent)
continue;
-   if (!pnv_pci_is_mem_pref_64(res->flags)) {
-   dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
-i, res);
-   continue;
-   }
 
dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2810,6 +2803,15 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
 i, res, mul);
}
pdn->vfs_expanded = mul;
+
+   return;
+
+truncate_iov:
+   /* To save MMIO space, IOV BAR is truncated. */
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = &pdev->resource[i + PCI_IOV_RESOURCES];
+   res->end = res->start - 1;
+   }
 }
 #endif /* CONFIG_PCI_IOV */
 
-- 
2.5.0


[PATCH V6 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-10-20 Thread Wei Yang
In the current implementation, when a VF BAR is bigger than 64MB, it uses 4 M64
BARs in Single PE mode to cover the number of VFs required to be enabled.
By doing so, several VFs would be in one VF group, which leads to interference
between VFs in the same group.

In this patch, based on Gavin's comments, m64_wins is renamed to m64_map,
meaning the index number of the M64 BAR used to map the VF BAR. The patch also
makes sure the VF BAR size is bigger than 32MB when an M64 BAR is used in
Single PE mode.

This patch changes the design by using one M64 BAR in Single PE mode for
one VF BAR. This gives absolute isolation for VFs.

Signed-off-by: Wei Yang 
Reviewed-by: Gavin Shan 
Acked-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/pci-bridge.h |   5 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 177 --
 2 files changed, 75 insertions(+), 107 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..8aeba4c 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -214,10 +214,9 @@ struct pci_dn {
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
int offset; /* PE# for the first VF PE */
-#define M64_PER_IOV 4
-   int m64_per_iov;
+   boolm64_single_mode;/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64(-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
+   int (*m64_map)[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 629ab1b..dc64026 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
 }
 
 #ifdef CONFIG_PCI_IOV
-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
 {
struct pci_bus*bus;
struct pci_controller *hose;
struct pnv_phb*phb;
struct pci_dn *pdn;
inti, j;
+   intm64_bars;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
 
+   if (pdn->m64_single_mode)
+   m64_bars = num_vfs;
+   else
+   m64_bars = 1;
+
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   for (j = 0; j < M64_PER_IOV; j++) {
-   if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+   for (j = 0; j < m64_bars; j++) {
+   if (pdn->m64_map[j][i] == IODA_INVALID_M64)
continue;
opal_pci_phb_mmio_enable(phb->opal_id,
-   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
-   clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
-   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
+   clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
+   pdn->m64_map[j][i] = IODA_INVALID_M64;
}
 
+   kfree(pdn->m64_map);
return 0;
 }
 
@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
inttotal_vfs;
resource_size_tsize, start;
intpe_num;
-   intvf_groups;
-   intvf_per_group;
+   intm64_bars;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
pdn = pci_get_pdn(pdev);
total_vfs = pci_sriov_get_totalvfs(pdev);
 
-   /* Initialize the m64_wins to IODA_INVALID_M64 */
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   for (j = 0; j < M64_PER_IOV; j++)
-   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   if (pdn->m64_single_mode)
+   m64_bars = num_vfs;
+   else
+   m64_bars = 1;
+
+   pdn->m64_map = kmalloc(sizeof(*pdn->m64_map) * m64_bars, GFP_KERNEL);
+   if (!pdn->m64_map)
+   return -ENOMEM;
+   /* Initialize the m64_map to IODA_INVALID_M64 */
+   for (i = 0; i < m64_bars ; i++)
+   for (j = 0; j < PCI_SRIOV_NUM_BARS; j++)
+   pdn->m64_map[i][j] = IODA_INVALID_M64;
 
-   if (pdn->m64_per_iov == M64_PER_IOV) {
-   vf_groups = (num_vfs <= M64_PER_IOV) ? num_vfs: M64_PER_IOV;
-  
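
A side note on the m64_map declaration in the header hunk above:
int (*m64_map)[PCI_SRIOV_NUM_BARS] is a pointer to rows of
PCI_SRIOV_NUM_BARS ints, so sizeof(*m64_map) is the size of one whole
row. A standalone sketch of the idiom (names hypothetical):

#include <linux/slab.h>
#include <linux/errno.h>

#define NUM_BARS 6	/* stands in for PCI_SRIOV_NUM_BARS */

static int (*m64_map)[NUM_BARS];

static int alloc_m64_map(int m64_bars)
{
	/* one kmalloc gives an m64_bars x NUM_BARS matrix;
	 * m64_map[j][i] is then BAR i of the j-th M64 window */
	m64_map = kmalloc(sizeof(*m64_map) * m64_bars, GFP_KERNEL);
	return m64_map ? 0 : -ENOMEM;
}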

[PATCH V6 2/6] powerpc/powernv: simplify the calculation of iov resource alignment

2015-10-20 Thread Wei Yang
The alignment of the IOV BAR on the PowerNV platform is the total size of
the IOV BAR. No matter whether the IOV BAR is extended by
roundup_pow_of_two(total_vfs) or by the max PE number (256), the total
size can be calculated as (vfs_expanded * VF_BAR_size).

This patch simplifies pnv_pci_iov_resource_alignment() by removing the
first case.
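
As a worked example of the rule (hypothetical numbers: a 2MB per-VF BAR,
IOV BAR expanded by the max PE number of 256):

#include <linux/types.h>

static resource_size_t example_iov_alignment(void)
{
	resource_size_t align = 2 << 20;	/* pci_iov_resource_size(): 2MB */
	u16 vfs_expanded = 256;			/* expanded by max PE number */

	/* total IOV BAR size, 256 * 2MB = 512MB; plain VF BAR size if 0 */
	return vfs_expanded ? vfs_expanded * align : align;
}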

Signed-off-by: Wei Yang 
Reviewed-by: Gavin Shan 
Acked-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f042fed..629ab1b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2997,17 +2997,21 @@ static resource_size_t 
pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
  int resno)
 {
struct pci_dn *pdn = pci_get_pdn(pdev);
-   resource_size_t align, iov_align;
-
-   iov_align = resource_size(>resource[resno]);
-   if (iov_align)
-   return iov_align;
+   resource_size_t align;
 
+   /*
+* On PowerNV platform, IOV BAR is mapped by M64 BAR to enable the
+* SR-IOV. While from hardware perspective, the range mapped by M64
+* BAR should be size aligned.
+*
+* This function returns the total IOV BAR size if M64 BAR is in
+* Shared PE mode or just VF BAR size if not.
+*/
align = pci_iov_resource_size(pdev, resno);
-   if (pdn->vfs_expanded)
-   return pdn->vfs_expanded * align;
+   if (!pdn->vfs_expanded)
+   return align;
 
-   return align;
+   return pdn->vfs_expanded * align;
 }
 #endif /* CONFIG_PCI_IOV */
 
-- 
2.5.0


[PATCH V6 6/6] powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE mode

2015-10-20 Thread Wei Yang
When the M64 BAR is set to Single PE mode, the PE# assigned to a VF could
be sparse.

This patch restructures the code to allocate sparse PE# for VFs when the
M64 BAR is set to Single PE mode. It also renames offset to pe_num_map to
reflect that its content is the PE number.

Signed-off-by: Wei Yang 
Reviewed-by: Gavin Shan 
Acked-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/pci-bridge.h |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 81 +++
 2 files changed, 63 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 8aeba4c..b3a226b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -213,7 +213,7 @@ struct pci_dn {
 #ifdef CONFIG_PCI_IOV
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
-   int offset; /* PE# for the first VF PE */
+   int *pe_num_map;/* PE# for the first VF PE or array */
boolm64_single_mode;/* Use M64 BAR in Single Mode */
 #define IODA_INVALID_M64(-1)
int (*m64_map)[PCI_SRIOV_NUM_BARS];
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index a8c55f5..b8dfd31 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1243,7 +1243,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, 
u16 num_vfs)
 
/* Map the M64 here */
if (pdn->m64_single_mode) {
-   pe_num = pdn->offset + j;
+   pe_num = pdn->pe_num_map[j];
rc = opal_pci_map_pe_mmio_window(phb->opal_id,
pe_num, OPAL_M64_WINDOW_TYPE,
pdn->m64_map[j][i], 0);
@@ -1347,7 +1347,7 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
struct pnv_phb*phb;
struct pci_dn *pdn;
struct pci_sriov  *iov;
-   u16 num_vfs;
+   u16num_vfs, i;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
@@ -1361,14 +1361,21 @@ void pnv_pci_sriov_disable(struct pci_dev *pdev)
 
if (phb->type == PNV_PHB_IODA2) {
if (!pdn->m64_single_mode)
-   pnv_pci_vf_resource_shift(pdev, -pdn->offset);
+   pnv_pci_vf_resource_shift(pdev, -*pdn->pe_num_map);
 
/* Release M64 windows */
pnv_pci_vf_release_m64(pdev, num_vfs);
 
/* Release PE numbers */
-   bitmap_clear(phb->ioda.pe_alloc, pdn->offset, num_vfs);
-   pdn->offset = 0;
+   if (pdn->m64_single_mode) {
+   for (i = 0; i < num_vfs; i++) {
+   if (pdn->pe_num_map[i] != IODA_INVALID_PE)
+   pnv_ioda_free_pe(phb, 
pdn->pe_num_map[i]);
+   }
+   } else
+   bitmap_clear(phb->ioda.pe_alloc, *pdn->pe_num_map, 
num_vfs);
+   /* Releasing pe_num_map */
+   kfree(pdn->pe_num_map);
}
 }
 
@@ -1394,7 +1401,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, 
u16 num_vfs)
 
/* Reserve PE for each VF */
for (vf_index = 0; vf_index < num_vfs; vf_index++) {
-   pe_num = pdn->offset + vf_index;
+   if (pdn->m64_single_mode)
+   pe_num = pdn->pe_num_map[vf_index];
+   else
+   pe_num = *pdn->pe_num_map + vf_index;
 
pe = &phb->ioda.pe_array[pe_num];
pe->pe_number = pe_num;
@@ -1436,6 +1446,7 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
num_vfs)
struct pnv_phb*phb;
struct pci_dn *pdn;
intret;
+   u16i;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
@@ -1458,20 +1469,44 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 
num_vfs)
return -EBUSY;
}
 
+   /* Allocating pe_num_map */
+   if (pdn->m64_single_mode)
+   pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map) * 
num_vfs,
+   GFP_KERNEL);
+   else
+   pdn->pe_num_map = kmalloc(sizeof(*pdn->pe_num_map), 
GFP_KERNEL);
+
+   if (!pdn->pe_num_map)
+   return -ENOMEM;
+
+   if (pdn->m64_single_mode)
+   for (i = 0; i < num_vfs; i++)
+   pdn->pe_num_map[i] = IODA_INVALID_PE;
+
/* 

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-20 Thread Peter Zijlstra
On Tue, Oct 20, 2015 at 03:15:32PM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > 
> > Am I missing something here?  If not, it seems to me that you need
> > the leading lwsync to instead be a sync.
> > 
> > Of course, if I am not missing something, then this applies also to the
> > value-returning RMW atomic operations that you pulled this pattern from.
> > If so, it would seem that I didn't think through all the possibilities
> > back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> > that I worried about the RMW atomic operation acting as a barrier,
> > but not as the load/store itself.  :-/
> > 
> 
> Paul, I know this may be difficult, but could you recall why
> __futex_atomic_op() and futex_atomic_cmpxchg_inatomic() were also
> included in the move of PPC_ATOMIC_EXIT_BARRIER to "sync"?
> 
> I did some search, but couldn't find the discussion of that patch.
> 
> I ask this because I recall Peter once brought up a discussion:
> 
> https://lkml.org/lkml/2015/8/26/596
> 
> Peter's conclusion seems to be that we could (though didn't want to)
> live with futex atomics not being full barriers.
> 
> 
> Peter, just to be clear, I'm not in favor of relaxing futex atomics.
> But if I make PPC_ATOMIC_ENTRY_BARRIER a "sync", it will also
> strengthen the futex atomics; I just wonder whether such strengthening
> is a -fix- or not, considering that I want this patch to go to the
> -stable tree.

So Linus argued that since we only need to order against user accesses
(true) and priv changes typically imply strong barriers (open), we might
want to allow archs to rely on those instead of mandating that they have
explicit barriers in the futex primitives.

And I indeed forgot to follow up on that discussion.

So: does PPC imply full barriers on user<->kernel boundaries? If so, it's
not critical to the futex atomic implementations what extra barriers are
added.

If not, then strengthening the futex ops is indeed (probably) a good
thing :-)

[PATCH V6 0/6] Redesign SR-IOV on PowerNV

2015-10-20 Thread Wei Yang
In the original design, VFs are grouped to enable a larger number of VFs
in the system when a VF BAR is bigger than 64MB. This design has a flaw in
which one error on a VF will interfere with the other VFs in the same group.

This patch series changes the design by using an M64 BAR in Single PE mode
to cover only one VF BAR. By doing so, it gives absolute isolation between
VFs.

v6:
   * add the minimum size check when M64 BAR is in Single PE mode
   * truncate IOV BAR when powernv can't handle it
v5:
   * rebase on top of v4.3-rc4, with commit 68230242cdb "net/mlx4_core: Add port
 attribute when tracking counters" reverted
   * add some reason in change log of Patch 1
   * make pnv_pci_iov_resource_alignment() easier to read
   * initialize pe_num_map[] just after it is allocated
   * test ssh from guest to host via VF passed and then shutdown the guest
   * no code change
v4:
   * rebase the code on top of v4.2-rc7
   * switch back to use the dynamic version of pe_num_map and m64_map
   * split the memory allocation and PE assignment of pe_num_map to make it
 easier to read
   * check pe_num_map value before freeing a PE
   * add the rename reason for pe_num_map and m64_map in change log
v3:
   * return -ENOSPC when a VF has non-64bit prefetchable BAR
   * rename offset to pe_num_map and define it statically
   * change commit log based on comments
   * define m64_map statically
v2:
   * clean up iov bar alignment calculation
   * change m64s to m64_bars
   * add a field to represent M64 Single PE mode will be used
   * change m64_wins to m64_map
   * calculate the gate instead of hard coded
   * dynamically allocate m64_map
   * dynamically allocate PE#
   * add a case to calculate iov bar alignment when M64 Single PE is used
   * when M64 Single PE is used, first compare num_vfs with the number of
 M64 BARs available in the system

Wei Yang (6):
  powerpc/powernv: don't enable SRIOV when VF BAR has non
64bit-prefetchable BAR
  powerpc/powernv: simplify the calculation of iov resource alignment
  powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR
  powerpc/powernv: replace the hard coded boundary with gate
  powerpc/powernv: boundary the total VF BAR size instead of the
individual one
  powerpc/powernv: allocate sparse PE# when using M64 BAR in Single PE
mode

 arch/powerpc/include/asm/pci-bridge.h |   7 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 346 --
 2 files changed, 191 insertions(+), 162 deletions(-)

-- 
2.5.0


[PATCH V6 5/6] powerpc/powernv: boundary the total VF BAR size instead of the individual one

2015-10-20 Thread Wei Yang
Each VF can have 6 BARs at most. When the total BAR size exceeds the
gate, expanding each BAR will also exhaust the M64 window.

This patch limits the boundary by checking the total VF BAR size instead
of the individual BARs. For example (hypothetical numbers), with a 64MB
gate, six 60MB VF BARs would each pass a per-BAR check, yet expanding
each of them by 256 PEs would require 6 * 60MB * 256 = 90GB, far more
than the 64GB window.

Signed-off-by: Wei Yang 
Reviewed-by: Gavin Shan 
Acked-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index de5a194..a8c55f5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2701,7 +2701,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
const resource_size_t gate = phb->ioda.m64_segsize >> 2;
struct resource *res;
int i;
-   resource_size_t size;
+   resource_size_t size, total_vf_bar_sz;
struct pci_dn *pdn;
int mul, total_vfs;
 
@@ -2714,6 +2714,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
 
total_vfs = pci_sriov_get_totalvfs(pdev);
mul = phb->ioda.total_pe;
+   total_vf_bar_sz = 0;
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &pdev->resource[i + PCI_IOV_RESOURCES];
@@ -2726,7 +2727,8 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
goto truncate_iov;
}
 
-   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
+   total_vf_bar_sz += pci_iov_resource_size(pdev,
+   i + PCI_IOV_RESOURCES);
 
/*
 * If bigger than quarter of M64 segment size, just round up
@@ -2740,11 +2742,11 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
 * limit the system flexibility.  This is a design decision to
 * set the boundary to quarter of the M64 segment size.
 */
-   if (size > gate) {
-   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
-   "is bigger than %lld, roundup power2\n",
-i, res, gate);
+   if (total_vf_bar_sz > gate) {
mul = roundup_pow_of_two(total_vfs);
+   dev_info(&pdev->dev,
+   "VF BAR Total IOV size %llx > %llx, roundup to 
%d VFs\n",
+   total_vf_bar_sz, gate, mul);
pdn->m64_single_mode = true;
break;
}
-- 
2.5.0


[PATCH V6 4/6] powerpc/powernv: replace the hard coded boundary with gate

2015-10-20 Thread Wei Yang
At the moment the 64bit-prefetchable window can be 64GB at maximum, a
value currently obtained from the device tree. This means that in Shared
PE mode the maximum supported VF BAR size is 64GB/256 = 256MB, and a BAR
of that size could exhaust the whole 64bit-prefetchable window. It is a
design decision to set the boundary at a VF BAR size of 64MB: since a
64MB VF BAR would occupy a quarter of the 64bit-prefetchable window after
expansion, this is affordable.

This patch replaces the magic limit of 64MB with a "gate", which is 1/4
of the M64 segment size (m64_segsize >> 2), and adds a comment to explain
the reason for it.
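
For reference, a sketch of the gate arithmetic (the 64GB window size is
assumed from the text above, not taken from the patch):

#include <linux/types.h>

static resource_size_t example_gate(void)
{
	resource_size_t m64_size    = 64ULL << 30;	/* 64GB M64 window  */
	resource_size_t m64_segsize = m64_size >> 8;	/* /256 PEs = 256MB */

	return m64_segsize >> 2;			/* gate = 64MB      */
}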

Signed-off-by: Wei Yang 
Reviewed-by: Gavin Shan 
Acked-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index dc64026..de5a194 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2696,8 +2696,9 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { 
}
 #ifdef CONFIG_PCI_IOV
 static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 {
-   struct pci_controller *hose;
-   struct pnv_phb *phb;
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   const resource_size_t gate = phb->ioda.m64_segsize >> 2;
struct resource *res;
int i;
resource_size_t size;
@@ -2707,9 +2708,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
if (!pdev->is_physfn || pdev->is_added)
return;
 
-   hose = pci_bus_to_host(pdev->bus);
-   phb = hose->private_data;
-
pdn = pci_get_pdn(pdev);
pdn->vfs_expanded = 0;
pdn->m64_single_mode = false;
@@ -2730,10 +2728,22 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
 
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
 
-   /* bigger than 64M */
-   if (size > (1 << 26)) {
-   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size is bigger than 64M, roundup power2\n",
-i, res);
+   /*
+* If bigger than quarter of M64 segment size, just round up
+* power of two.
+*
+* Generally, one M64 BAR maps one IOV BAR. To avoid conflict
+* with other devices, IOV BAR size is expanded to be
+* (total_pe * VF_BAR_size).  When VF_BAR_size is half of M64
+* segment size , the expanded size would equal to half of the
+* whole M64 space size, which will exhaust the M64 Space and
+* limit the system flexibility.  This is a design decision to
+* set the boundary to quarter of the M64 segment size.
+*/
+   if (size > gate) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size "
+   "is bigger than %lld, roundup power2\n",
+i, res, gate);
mul = roundup_pow_of_two(total_vfs);
pdn->m64_single_mode = true;
break;
-- 
2.5.0


Re: [RFC][PATCH 3/3]perf/powerpc :add support for sampling intr machine state

2015-10-20 Thread AnjuTSudhakar

On Tuesday 20 October 2015 09:50 AM, Madhavan Srinivasan wrote:


On Monday 19 October 2015 05:48 PM, Anju T wrote:

From: Anju 

The registers to sample are passed through the sample_regs_intr bitmask.
The name and bit position of each register are defined in asm/perf_regs.h.
This feature can be enabled by using the -I option of the perf record command.
To display the sampled register values, use perf script -D.
The kernel uses the "PERF" register ids to find the offset of each register in
'struct pt_regs'.
CONFIG_HAVE_PERF_REGS will enable sampling of the interrupted machine state.

Signed-off-by: Anju T 
---
  arch/powerpc/Kconfig  |  1 +
  arch/powerpc/perf/Makefile|  2 +-
  arch/powerpc/perf/perf_regs.c | 85 +++
  tools/perf/config/Makefile|  4 ++
  4 files changed, 91 insertions(+), 1 deletion(-)
  create mode 100644 arch/powerpc/perf/perf_regs.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9a7057e..c4ce60d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -119,6 +119,7 @@ config PPC
select GENERIC_ATOMIC64 if PPC32
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select HAVE_PERF_EVENTS
+   select HAVE_PERF_REGS
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
select ARCH_WANT_IPC_PARSE_VERSION
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..8e7f545 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -7,7 +7,7 @@ obj64-$(CONFIG_PPC_PERF_CTRS)   += power4-pmu.o ppc970-pmu.o 
power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
   power8-pmu.o
  obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
-
+obj-$(CONFIG_PERF_EVENTS)  += perf_regs.o
  obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
  obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o

diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
new file mode 100644
index 000..7a71de2
--- /dev/null
+++ b/arch/powerpc/perf/perf_regs.c
@@ -0,0 +1,85 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
+
+#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1))
+
+static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR0, gpr[0]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR1, gpr[1]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR2, gpr[2]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR3, gpr[3]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR4, gpr[4]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR5, gpr[5]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR6, gpr[6]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR7, gpr[7]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR8, gpr[8]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR9, gpr[9]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR10, gpr[10]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR11, gpr[11]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR12, gpr[12]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR13, gpr[13]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR14, gpr[14]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR15, gpr[15]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR16, gpr[16]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR17, gpr[17]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR18, gpr[18]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR19, gpr[19]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR20, gpr[20]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR21, gpr[21]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR22, gpr[22]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR23, gpr[23]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR24, gpr[24]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR25, gpr[25]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR26, gpr[26]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR27, gpr[27]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR28, gpr[28]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR29, gpr[29]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR30, gpr[30]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR31, gpr[31]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_NIP, nip),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_MSR, msr),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_ORIG_R3, orig_gpr3),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_CTR, ctr),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_LNK, link),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_XER, xer),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_CCR, ccr),
+#ifdef __powerpc64__
+   PT_REGS_OFFSET(PERF_REG_POWERPC_SOFTE, softe),
+#else
+   PT_REGS_OFFSET(PERF_REG_POWERPC_MQ, mq),
+#endif
+   PT_REGS_OFFSET(PERF_REG_POWERPC_TRAP, trap),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_DAR, dar),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_DSISR, dsisr),
+ 
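
The diff is truncated above. For orientation, the accessor that consumes
pt_regs_offset[] in this kind of patch typically looks roughly like the
following (a sketch of the usual pattern, not necessarily the exact code
in this patch; regs_get_register() is available here since the Kconfig
hunk selects HAVE_REGS_AND_STACK_ACCESS_API):

u64 perf_reg_value(struct pt_regs *regs, int idx)
{
	if (WARN_ON_ONCE(idx >= PERF_REG_POWERPC_MAX))
		return 0;

	/* pt_regs_offset[] maps a PERF_REG_POWERPC_* id to the byte
	 * offset of that register inside struct pt_regs */
	return regs_get_register(regs, pt_regs_offset[idx]);
}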

Re: [PATCH 1/3] perf/powerpc:add ability to sample intr machine state in power

2015-10-20 Thread AnjuTSudhakar

Hi maddy,
On Tuesday 20 October 2015 09:46 AM, Madhavan Srinivasan wrote:


On Monday 19 October 2015 05:48 PM, Anju T wrote:

From: Anju 

The enum definition assigns an 'id' to each register in power.

I guess it should be "each register in 'struct pt_regs' of arch/powerpc".

Right, that seems better. I will change the description like that.

Thanks a lot for reviewing the patch.

The order of these values in the enum definition is based on
the corresponding macros in arch/powerpc/include/uapi/asm/ptrace.h.

Signed-off-by: Anju T 
---
  arch/powerpc/include/uapi/asm/perf_regs.h | 55 +++
  1 file changed, 55 insertions(+)
  create mode 100644 arch/powerpc/include/uapi/asm/perf_regs.h

diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h 
b/arch/powerpc/include/uapi/asm/perf_regs.h
new file mode 100644
index 000..b97727c
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -0,0 +1,55 @@
+#ifndef _ASM_POWERPC_PERF_REGS_H
+#define _ASM_POWERPC_PERF_REGS_H
+
+enum perf_event_powerpc_regs {
+   PERF_REG_POWERPC_GPR0,
+   PERF_REG_POWERPC_GPR1,
+   PERF_REG_POWERPC_GPR2,
+   PERF_REG_POWERPC_GPR3,
+   PERF_REG_POWERPC_GPR4,
+   PERF_REG_POWERPC_GPR5,
+   PERF_REG_POWERPC_GPR6,
+   PERF_REG_POWERPC_GPR7,
+   PERF_REG_POWERPC_GPR8,
+   PERF_REG_POWERPC_GPR9,
+   PERF_REG_POWERPC_GPR10,
+   PERF_REG_POWERPC_GPR11,
+   PERF_REG_POWERPC_GPR12,
+   PERF_REG_POWERPC_GPR13,
+   PERF_REG_POWERPC_GPR14,
+   PERF_REG_POWERPC_GPR15,
+   PERF_REG_POWERPC_GPR16,
+   PERF_REG_POWERPC_GPR17,
+   PERF_REG_POWERPC_GPR18,
+   PERF_REG_POWERPC_GPR19,
+   PERF_REG_POWERPC_GPR20,
+   PERF_REG_POWERPC_GPR21,
+   PERF_REG_POWERPC_GPR22,
+   PERF_REG_POWERPC_GPR23,
+   PERF_REG_POWERPC_GPR24,
+   PERF_REG_POWERPC_GPR25,
+   PERF_REG_POWERPC_GPR26,
+   PERF_REG_POWERPC_GPR27,
+   PERF_REG_POWERPC_GPR28,
+   PERF_REG_POWERPC_GPR29,
+   PERF_REG_POWERPC_GPR30,
+   PERF_REG_POWERPC_GPR31,
+   PERF_REG_POWERPC_NIP,
+   PERF_REG_POWERPC_MSR,
+   PERF_REG_POWERPC_ORIG_R3,
+   PERF_REG_POWERPC_CTR,
+   PERF_REG_POWERPC_LNK,
+   PERF_REG_POWERPC_XER,
+   PERF_REG_POWERPC_CCR,
+#ifdef __powerpc64__
+   PERF_REG_POWERPC_SOFTE,
+#else
+   PERF_REG_POWERPC_MQ,
+#endif
+   PERF_REG_POWERPC_TRAP,
+   PERF_REG_POWERPC_DAR,
+   PERF_REG_POWERPC_DSISR,
+   PERF_REG_POWERPC_RESULT,
+   PERF_REG_POWERPC_MAX,
+};
+#endif /* _ASM_POWERPC_PERF_REGS_H */





Thanks
Anju


Re: linux-next: build warning after merge of the powerpc tree

2015-10-20 Thread Michael Ellerman
On Tue, 2015-10-20 at 16:21 +1100, Stephen Rothwell wrote:

> Hi all,
> 
> After merging the powerpc tree, today's linux-next build (powerpc
> allyesconfig) produced this warning:
> 
> WARNING: vmlinux.o(.text+0x9367c): Section mismatch in reference from the 
> function .msi_bitmap_alloc() to the function 
> .init.text:.memblock_virt_alloc_try_nid()
> The function .msi_bitmap_alloc() references
> the function __init .memblock_virt_alloc_try_nid().
> This is often because .msi_bitmap_alloc lacks a __init
> annotation or the annotation of .memblock_virt_alloc_try_nid is wrong.
> 
> Introduced (probably) by commit
> 
>   cb2d3883c603 ("powerpc/msi: Free the bitmap if it was slab allocated")


Yeah that's correct, though it should be safe in practice.

I'm not sure why you only saw it now though; the patch has been in next since
the 13th of October.

cheers
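
For readers unfamiliar with the warning: modpost flags any reference from
regular .text into .init.text, since init memory is freed after boot. A
minimal hypothetical shape of the problem (names invented):

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/bootmem.h>

static void * __init early_buf_alloc(unsigned long sz)	/* .init.text */
{
	return memblock_virt_alloc(sz, 0);	/* __init-only allocator */
}

int bitmap_like_alloc(void)			/* regular .text */
{
	/* this .text -> .init.text call is what modpost warns about; it
	 * becomes a real bug if reachable after init memory is freed */
	return early_buf_alloc(128) ? 0 : -ENOMEM;
}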


Re: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation

2015-10-20 Thread Boqun Feng
On Mon, Oct 12, 2015 at 04:30:48PM -0700, Paul E. McKenney wrote:
> On Fri, Oct 09, 2015 at 07:33:28PM +0100, Will Deacon wrote:
> > On Fri, Oct 09, 2015 at 10:43:27AM -0700, Paul E. McKenney wrote:
> > > On Fri, Oct 09, 2015 at 10:51:29AM +0100, Will Deacon wrote:
[snip]
> 
> > > > We could also include a link to the ppcmem/herd web frontends and your
> > > > lwn.net article. (ppcmem is already linked, but it's not obvious that
> > > > you can run litmus tests in your browser).
> > > 
> > > I bet that the URLs for the web frontends are not stable long term.
> > > Don't get me wrong, PPCMEM/ARMMEM has been there for me for a goodly
> > > number of years, but professors do occasionally move from one institution
> > > to another.  For but one example, Susmit Sarkar is now at University
> > > of St. Andrews rather than at Cambridge.
> > > 
> > > So to make this work, we probably need to be thinking in terms of
> > > asking the researchers for permission to include their ocaml code in the
> > > Linux-kernel source tree.  I would be strongly in favor of this, actually.
> > > 
> > > Thoughts?
> > 
> > I'm extremely hesitant to import a bunch of dubiously licensed, academic
> > ocaml code into the kernel. Even if we did, who would maintain it?
> > 
> > A better solution might be to host a mirror of the code on kernel.org,
> > along with a web front-end for people to play with (the tests we're talking
> > about here do seem to run ok in my browser).
> 
> I am not too worried about how this happens, but we should avoid
> constraining the work of our academic partners.  The reason I was thinking
> in terms of in the kernel was to avoid version-synchronization issues.
> "Wait, this is Linux kernel v4.17, which means that you need to use
> version 8.3.5.1 of the tooling...  And with these four patches as well."
> 

Maybe including only the models' code (arm.cat, ppc.cat, etc.) in the
kernel, rather than the whole code base, could also solve the
version-synchronization problem to some degree and avoid maintaining the
whole tool's code? I'm assuming that modifying the verifier's code, as
opposed to the models' code, is unlikely to change the result of a
litmus test.

Regards,
Boqun



Re: [PATCH 0/5] drivers/tty: make more bool drivers explicitly non-modular

2015-10-20 Thread Alexandre Belloni
On 18/10/2015 at 18:21:13 -0400, Paul Gortmaker wrote :
> The one common thread here for all the patches is that we also
> scrap the .remove functions which would only be used for module
> unload (impossible) and driver unbind.  For the drivers here, there
> doesn't seem to be a sensible unbind use case (vs. e.g. a multiport
> PCI ethernet driver where one port is unbound and passed through to
> a kvm guest or similar).  Hence we just explicitly disallow any
> driver unbind operations to help prevent root from doing something
> illogical to the machine that they could have done previously.
> 
> We've already done this for drivers/tty/serial/mpsc.c previously.
> 
> Build tested for allmodconfig on ARM64 and powerpc for tty/tty-testing.
> 

So, how does this actually build test atmel_serial?

A proper solution would be to actually make it a tristate and allow
building as a module. I think it currently fails because of
console_initcall() but that is certainly fixable.


-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[RFC PATCH 5/7] powerpc/mm: update frag size

2015-10-20 Thread Aneesh Kumar K.V
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 5062c6d423fd..a28dbfe2baed 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -39,14 +39,14 @@
  */
 #define PTE_RPN_SHIFT  (30)
 /*
- * we support 8 fragments per PTE page of 64K size.
+ * we support 32 fragments per PTE page of 64K size.
  */
-#define PTE_FRAG_NR8
+#define PTE_FRAG_NR32
 /*
  * We use a 2K PTE page fragment and another 4K for storing
  * real_pte_t hash index. Rounding the entire thing to 8K
  */
-#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
 /*
-- 
2.5.0
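
For reference, the fragment arithmetic implied by this patch (an
editorial sketch, not kernel code): a 64K PTE page split into 2K
fragments yields 32 fragments.

#define EX_PAGE_SHIFT_64K	16	/* 64K PTE page                 */
#define EX_PTE_FRAG_SIZE_SHIFT	11	/* 2K fragment (1 << 11 = 2048) */
#define EX_PTE_FRAG_NR	(1UL << (EX_PAGE_SHIFT_64K - EX_PTE_FRAG_SIZE_SHIFT))
/* EX_PTE_FRAG_NR == 32, matching PTE_FRAG_NR in the hunk above */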


[RFC PATCH 3/7] powerpc/nohash: Update 64K nohash config to have 32 pte fragement

2015-10-20 Thread Aneesh Kumar K.V
They don't need to track 4k subpage slot details and hence don't need the
second half of pgtable_t.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
index 1d8e26e8167b..dbd9de9264c2 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
@@ -10,14 +10,14 @@
 #define PGD_INDEX_SIZE  12
 
 /*
- * we support 8 fragments per PTE page of 64K size
+ * we support 32 fragments per PTE page of 64K size
  */
-#define PTE_FRAG_NR8
+#define PTE_FRAG_NR32
 /*
  * We use a 2K PTE page fragment and another 4K for storing
  * real_pte_t hash index. Rounding the entire thing to 8K
  */
-#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
 
-- 
2.5.0


[RFC PATCH 7/7] powerpc/mm: getrid of real_pte_t

2015-10-20 Thread Aneesh Kumar K.V
Now that we don't track 4k subpage slot details, get rid of real_pte_t.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h| 15 -
 arch/powerpc/include/asm/book3s/64/pgtable.h | 24 
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h |  3 +--
 arch/powerpc/include/asm/nohash/64/pgtable.h | 17 +-
 arch/powerpc/include/asm/page.h  | 15 -
 arch/powerpc/include/asm/tlbflush.h  |  4 ++--
 arch/powerpc/mm/hash64_64k.c | 28 +++-
 arch/powerpc/mm/hash_native_64.c |  4 ++--
 arch/powerpc/mm/hash_utils_64.c  |  4 ++--
 arch/powerpc/mm/init_64.c|  3 +--
 arch/powerpc/mm/tlb_hash64.c | 15 ++---
 arch/powerpc/platforms/pseries/lpar.c|  4 ++--
 12 files changed, 44 insertions(+), 92 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 19e0afb36fa8..90d4c3bfbafd 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -43,8 +43,7 @@
  */
 #define PTE_FRAG_NR32
 /*
- * We use a 2K PTE page fragment and another 4K for storing
- * real_pte_t hash index. Rounding the entire thing to 8K
+ * We use a 2K PTE page fragment
  */
 #define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
@@ -58,21 +57,15 @@
 #define PUD_MASKED_BITS0x1ff
 
 #ifndef __ASSEMBLY__
-
 /*
  * With 64K pages on hash table, we have a special PTE format that
  * uses a second "half" of the page table to encode sub-page information
  * in order to deal with 64K made of 4K HW pages. Thus we override the
  * generic accessors and iterators here
  */
-#define __real_pte __real_pte
-extern real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep);
-extern unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
-   unsigned long vpn, int ssize, bool *valid);
-static inline pte_t __rpte_to_pte(real_pte_t rpte)
-{
-   return rpte.pte;
-}
+#define pte_to_hidx pte_to_hidx
+extern unsigned long pte_to_hidx(pte_t rpte, unsigned long hash,
+unsigned long vpn, int ssize, bool *valid);
 /*
  * Trick: we set __end to va + 64k, which happens works for
  * a 16M page as well as we want only one iteration
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 79a90ca7b9f6..1d5648e25fcb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -35,36 +35,30 @@
 #define __HAVE_ARCH_PTE_SPECIAL
 
 #ifndef __ASSEMBLY__
-
 /*
  * This is the default implementation of various PTE accessors, it's
  * used in all cases except Book3S with 64K pages where we have a
  * concept of sub-pages
  */
-#ifndef __real_pte
-
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
-#define __real_pte(a,e,p)  ((real_pte_t){(e)})
-#define __rpte_to_pte(r)   ((r).pte)
-#else
-#define __real_pte(a,e,p)  (e)
-#define __rpte_to_pte(r)   (__pte(r))
+#ifndef pte_to_hidx
+#define pte_to_hidx(pte, index)(pte_val(pte) >> _PAGE_F_GIX_SHIFT)
 #endif
-#define __rpte_to_hidx(r,index)(pte_val(__rpte_to_pte(r)) 
>>_PAGE_F_GIX_SHIFT)
 
-#define pte_iterate_hashed_subpages(vpn, psize, shift) \
-   do {\
-   shift = mmu_psize_defs[psize].shift;\
+#ifndef pte_iterate_hashed_subpages
+#define pte_iterate_hashed_subpages(vpn, psize, shift) \
+   do {\
+   shift = mmu_psize_defs[psize].shift;\
 
 #define pte_iterate_hashed_end() } while(0)
+#endif
 
 /*
  * We expect this to be called only for user addresses or kernel virtual
  * addresses other than the linear mapping.
  */
+#ifndef pte_pagesize_index
 #define pte_pagesize_index(mm, addr, pte)  MMU_PAGE_4K
-
-#endif /* __real_pte */
+#endif
 
 static inline void pmd_set(pmd_t *pmdp, unsigned long val)
 {
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
index dbd9de9264c2..0f075799ae97 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
@@ -14,8 +14,7 @@
  */
 #define PTE_FRAG_NR32
 /*
- * We use a 2K PTE page fragment and another 4K for storing
- * real_pte_t hash index. Rounding the entire thing to 8K
+ * We use a 2K PTE page fragment
  */
 #define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 37b5a62d18f4..ddde5f16c385 100644
--- 

Re: linux-next: build warning after merge of the powerpc tree

2015-10-20 Thread Stephen Rothwell
Hi Michael,

On Tue, 20 Oct 2015 21:06:51 +1100 Michael Ellerman  wrote:
>
> On Tue, 2015-10-20 at 16:21 +1100, Stephen Rothwell wrote:
> 
> > After merging the powerpc tree, today's linux-next build (powerpc
> > allyesconfig) produced this warning:
> > 
> > WARNING: vmlinux.o(.text+0x9367c): Section mismatch in reference from the 
> > function .msi_bitmap_alloc() to the function 
> > .init.text:.memblock_virt_alloc_try_nid()
> > The function .msi_bitmap_alloc() references
> > the function __init .memblock_virt_alloc_try_nid().
> > This is often because .msi_bitmap_alloc lacks a __init
> > annotation or the annotation of .memblock_virt_alloc_try_nid is wrong.
> > 
> > Introduced (probably) by commit
> > 
> >   cb2d3883c603 ("powerpc/msi: Free the bitmap if it was slab allocated")  
> 
> Yeah that's correct, though it should be safe in practice.
> 
> I'm not sure why you only saw that now though, the patch has been in next 
> since
> the 13th of October.

I don't always notice new warnings immediately among all the others :-(

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

[RFC PATCH 4/7] powerpc/mm: Don't track 4k subpage information with 64k linux page size

2015-10-20 Thread Aneesh Kumar K.V
We search the hash table to find the slot information. This slows down
the lookup, but we do that only for the 4k subpage config.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 33 +--
 arch/powerpc/include/asm/machdep.h|  2 +
 arch/powerpc/include/asm/page.h   |  4 +-
 arch/powerpc/mm/hash64_64k.c  | 59 ---
 arch/powerpc/mm/hash_native_64.c  | 23 ++-
 arch/powerpc/mm/hash_utils_64.c   |  5 ++-
 arch/powerpc/mm/pgtable_64.c  |  6 ++-
 arch/powerpc/platforms/pseries/lpar.c | 17 +++-
 8 files changed, 96 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 681657cabbe4..5062c6d423fd 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -67,51 +67,22 @@
  */
 #define __real_pte __real_pte
 extern real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep);
-static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long 
index)
-{
-   if ((pte_val(rpte.pte) & _PAGE_COMBO))
-   return (unsigned long) rpte.hidx[index] >> 4;
-   return (pte_val(rpte.pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
-}
-
+extern unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
+   unsigned long vpn, int ssize, bool *valid);
 static inline pte_t __rpte_to_pte(real_pte_t rpte)
 {
return rpte.pte;
 }
 /*
- * we look at the second half of the pte page to determine whether
- * the sub 4k hpte is valid. We use 8 bits per each index, and we have
- * 16 index mapping full 64K page. Hence for each
- * 64K linux page we use 128 bit from the second half of pte page.
- * The encoding in the second half of the page is as below:
- * [ index 15 ] .[index 0]
- * [bit 127 ..bit 0]
- * fomat of each index
- * bit 7  bit0
- * [one bit secondary][ 3 bit hidx][1 bit valid][000]
- */
-static inline bool __rpte_sub_valid(real_pte_t rpte, unsigned long index)
-{
-   unsigned char index_val = rpte.hidx[index];
-
-   if ((index_val >> 3) & 0x1)
-   return true;
-   return false;
-}
-
-/*
  * Trick: we set __end to va + 64k, which happens works for
  * a 16M page as well as we want only one iteration
  */
 #define pte_iterate_hashed_subpages(rpte, psize, vpn, index, shift)\
do {\
unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT));  
\
-   unsigned __split = (psize == MMU_PAGE_4K || \
-   psize == MMU_PAGE_64K_AP);  \
shift = mmu_psize_defs[psize].shift;\
for (index = 0; vpn < __end; index++,   \
 vpn += (1L << (shift - VPN_SHIFT))) {  \
-   if (!__split || __rpte_sub_valid(rpte, index))  \
do {
 
 #define pte_iterate_hashed_end() } while(0); } } while(0)
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index cab6753f1be5..40df21982ae1 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -61,6 +61,8 @@ struct machdep_calls {
   unsigned long addr,
   unsigned char *hpte_slot_array,
   int psize, int ssize, int local);
+
+   unsigned long (*get_hpte_v)(unsigned long slot);
/* special for kexec, to be called in real mode, linear mapping is
 * destroyed as well */
void(*hpte_clear_all)(void);
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f63b2761cdd0..bbdf9e6cc8b1 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -295,7 +295,7 @@ static inline pte_basic_t pte_val(pte_t x)
  * the "second half" part of the PTE for pseudo 64k pages
  */
 #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned char *hidx; } real_pte_t;
+typedef struct { pte_t pte; } real_pte_t;
 #else
 typedef struct { pte_t pte; } real_pte_t;
 #endif
@@ -347,7 +347,7 @@ static inline pte_basic_t pte_val(pte_t pte)
 }
 
 #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned char *hidx; } real_pte_t;
+typedef struct { pte_t pte; } real_pte_t;
 #else
 typedef pte_t real_pte_t;
 #endif
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 84867a1491a2..e063895694e9 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ 

[RESEND PATCH 1/2] powerpc: platforms: mpc52xx_lpbfifo: Fix module autoload for OF platform driver

2015-10-20 Thread Luis de Bethencourt
From: Luis de Bethencourt 

This platform driver has an OF device ID table, but the OF module
alias information is not created, so module autoloading won't work.

Signed-off-by: Luis de Bethencourt 
---
 arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c 
b/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
index 251dcb9..7bb42a0 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c
@@ -568,6 +568,7 @@ static const struct of_device_id mpc52xx_lpbfifo_match[] = {
{ .compatible = "fsl,mpc5200-lpbfifo", },
{},
 };
+MODULE_DEVICE_TABLE(of, mpc52xx_lpbfifo_match);
 
 static struct platform_driver mpc52xx_lpbfifo_driver = {
.driver = {
-- 
2.5.3
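
For background, a minimal sketch of the pattern being fixed (hypothetical
driver; the MODULE_DEVICE_TABLE() line is the shape of the one-line fix):
exporting the table is what lets modpost generate the of: module aliases
that userspace module autoloading matches against.

#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static const struct of_device_id demo_match[] = {
	{ .compatible = "vendor,demo-device" },	/* hypothetical */
	{ },
};
MODULE_DEVICE_TABLE(of, demo_match);	/* emit the of: aliases */

static struct platform_driver demo_driver = {
	.driver = {
		.name = "demo-device",
		.of_match_table = demo_match,
	},
};
module_platform_driver(demo_driver);

MODULE_LICENSE("GPL");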


[RFC PATCH 1/7] powerpc/mm: Don't hardcode page table size

2015-10-20 Thread Aneesh Kumar K.V
pte and pmd table sizes depend on config items, so don't
hardcode them. This makes sure we use the right value
when masking pmd entries and also when checking pmd_bad.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h| 30 ++--
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h | 22 +
 arch/powerpc/include/asm/pgalloc-64.h| 10 
 arch/powerpc/mm/init_64.c|  4 
 4 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 957d66d13a97..565f9418c25f 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -25,12 +25,6 @@
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
-/* Bits to mask out from a PMD to get to the PTE page */
-/* PMDs point to PTE table fragments which are 4K aligned.  */
-#define PMD_MASKED_BITS0xfff
-/* Bits to mask out from a PGD/PUD to get to the PMD page */
-#define PUD_MASKED_BITS0x1ff
-
 #define _PAGE_COMBO0x0002 /* this is a combo 4k page */
 #define _PAGE_4K_PFN   0x0004 /* PFN is for a single 4k page */
 
@@ -44,6 +38,24 @@
  * of addressable physical space, or 46 bits for the special 4k PFNs.
  */
 #define PTE_RPN_SHIFT  (30)
+/*
+ * we support 8 fragments per PTE page of 64K size.
+ */
+#define PTE_FRAG_NR8
+/*
+ * We use a 2K PTE page fragment and another 4K for storing
+ * real_pte_t hash index. Rounding the entire thing to 8K
+ */
+#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+
+/*
+ * Bits to mask out from a PMD to get to the PTE page
+ * PMDs point to PTE table fragments which are PTE_FRAG_SIZE aligned.
+ */
+#define PMD_MASKED_BITS(PTE_FRAG_SIZE - 1)
+/* Bits to mask out from a PGD/PUD to get to the PMD page */
+#define PUD_MASKED_BITS0x1ff
 
 #ifndef __ASSEMBLY__
 
@@ -112,8 +124,12 @@ static inline bool __rpte_sub_valid(real_pte_t rpte, 
unsigned long index)
remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE,\
__pgprot(pgprot_val((prot)) | _PAGE_4K_PFN)))
 
-#define PTE_TABLE_SIZE (sizeof(real_pte_t) << PTE_INDEX_SIZE)
+#define PTE_TABLE_SIZE PTE_FRAG_SIZE
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PMD_TABLE_SIZE ((sizeof(pmd_t) << PMD_INDEX_SIZE) + (sizeof(unsigned 
long) << PMD_INDEX_SIZE))
+#else
 #define PMD_TABLE_SIZE (sizeof(pmd_t) << PMD_INDEX_SIZE)
+#endif
 #define PGD_TABLE_SIZE (sizeof(pgd_t) << PGD_INDEX_SIZE)
 
 #define pgd_pte(pgd)   (pud_pte(((pud_t){ pgd })))
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
index a44660d76096..1d8e26e8167b 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
@@ -9,8 +9,20 @@
 #define PUD_INDEX_SIZE 0
 #define PGD_INDEX_SIZE  12
 
+/*
+ * we support 8 fragments per PTE page of 64K size
+ */
+#define PTE_FRAG_NR	8
+/*
+ * We use a 2K PTE page fragment and another 4K for storing
+ * real_pte_t hash index. Rounding the entire thing to 8K
+ */
+#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+
+
 #ifndef __ASSEMBLY__
-#define PTE_TABLE_SIZE (sizeof(real_pte_t) << PTE_INDEX_SIZE)
+#define PTE_TABLE_SIZE PTE_FRAG_SIZE
 #define PMD_TABLE_SIZE (sizeof(pmd_t) << PMD_INDEX_SIZE)
 #define PGD_TABLE_SIZE (sizeof(pgd_t) << PGD_INDEX_SIZE)
 #endif /* __ASSEMBLY__ */
@@ -32,9 +44,11 @@
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
-/* Bits to mask out from a PMD to get to the PTE page */
-/* PMDs point to PTE table fragments which are 4K aligned.  */
-#define PMD_MASKED_BITS	0xfff
+/*
+ * Bits to mask out from a PMD to get to the PTE page
+ * PMDs point to PTE table fragments which are PTE_FRAG_SIZE aligned.
+ */
+#define PMD_MASKED_BITS	(PTE_FRAG_SIZE - 1)
 /* Bits to mask out from a PGD/PUD to get to the PMD page */
 #define PUD_MASKED_BITS	0x1ff
 
diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 4f1cc6c46728..69ef28a81733 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -163,16 +163,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 }
 
 #else /* if CONFIG_PPC_64K_PAGES */
-/*
- * we support 8 fragments per PTE page.
- */
-#define PTE_FRAG_NR	8
-/*
- * We use a 2K PTE page fragment and another 4K for storing
- * real_pte_t hash index. Rounding the entire thing to 8K
- */
-#define PTE_FRAG_SIZE_SHIFT  13
-#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
 extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
 extern 

[RESEND PATCH] powerpc: rackmeter: Fix module autoload for OF platform driver

2015-10-20 Thread Luis de Bethencourt
From: Luis de Bethencourt 

This platform driver has an OF device ID table, but the OF module
alias information is not created, so module autoloading won't work.

Signed-off-by: Luis de Bethencourt 
---

Hi,

This is a resend of a patch sent September 17 [0]

This patch adds the missing MODULE_DEVICE_TABLE() for OF to export
that information so modules have the correct aliases built-in and
autoloading works correctly.

A longer explanation by Javier Canillas can be found here:
https://lkml.org/lkml/2015/7/30/519
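
For reference, the pattern being fixed looks roughly like this (a minimal
sketch of a hypothetical OF platform driver, not the rackmeter code itself;
"vendor,foo" is an invented compatible string):

#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static const struct of_device_id foo_match[] = {
	{ .compatible = "vendor,foo" },
	{ }
};
/* Without this line no "of:" alias is built into the module, so
 * udev/kmod cannot autoload it when the device shows up in the DT. */
MODULE_DEVICE_TABLE(of, foo_match);

static struct platform_driver foo_driver = {
	.driver = {
		.name		= "foo",
		.of_match_table	= foo_match,
	},
};
module_platform_driver(foo_driver);

MODULE_LICENSE("GPL");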

Thanks,
Luis

[0] https://lkml.org/lkml/2015/9/17/598

 drivers/macintosh/rack-meter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
index 048901a..caaec65 100644
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -582,6 +582,7 @@ static struct of_device_id rackmeter_match[] = {
{ .name = "i2s" },
{ }
 };
+MODULE_DEVICE_TABLE(of, rackmeter_match);
 
 static struct macio_driver rackmeter_driver = {
.driver = {
-- 
2.5.3


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-20 Thread Paul E. McKenney
On Tue, Oct 20, 2015 at 11:21:47AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 20, 2015 at 03:15:32PM +0800, Boqun Feng wrote:
> > On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > > 
> > > Am I missing something here?  If not, it seems to me that you need
> > > the leading lwsync to instead be a sync.
> > > 
> > > Of course, if I am not missing something, then this applies also to the
> > > value-returning RMW atomic operations that you pulled this pattern from.
> > > If so, it would seem that I didn't think through all the possibilities
> > > back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> > > that I worried about the RMW atomic operation acting as a barrier,
> > > but not as the load/store itself.  :-/
> > > 
> > 
> > Paul, I know this may be difficult, but could you recall why the
> > __futex_atomic_op() and futex_atomic_cmpxchg_inatomic() also got
> > involved into the movement of PPC_ATOMIC_EXIT_BARRIER to "sync"?
> > 
> > I did some search, but couldn't find the discussion of that patch.
> > 
> > I ask this because I recall Peter once brought up a discussion:
> > 
> > https://lkml.org/lkml/2015/8/26/596
> > 
> > Peter's conclusion seems to be that we could (though didn't want to) live
> > with futex atomics not being full barriers.

I have heard of user-level applications relying on unlock-lock being a
full barrier.  So paranoia would argue for the full barrier.

> > Peter, just be clear, I'm not in favor of relaxing futex atomics. But if
> > I make PPC_ATOMIC_ENTRY_BARRIER being "sync", it will also strengthen
> > the futex atomics, just wonder whether such strengthening is a -fix- or
> > not, considering that I want this patch to go to -stable tree.
> 
> So Linus argued that since we only need to order against user accesses
> (true) and priv changes typically imply strong barriers (open) we might
> want to allow archs to rely on those instead of mandating they have
> explicit barriers in the futex primitives.
> 
> And I indeed forgot to follow up on that discussion.
> 
> So; does PPC imply full barriers on user<->kernel boundaries? If so, it's
> not critical to the futex atomic implementations what extra barriers are
> added.
> 
> If not; then strengthening the futex ops is indeed (probably) a good
> thing :-)

I am not seeing a sync there, but I really have to defer to the
maintainers on this one.  I could easily have missed one.

Thanx, Paul


[RFC PATCH 0/7] Remove 4k subpage tracking with hash 64K config

2015-10-20 Thread Aneesh Kumar K.V
Hi,

This patch series is on top of the series posted at 

https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135299.html
"[PATCH V4 00/31] powerpc/mm: Update page table format for book3s 64". In this
series we remove 4k subpage tracking with 64K config. Instead we do a hash
table lookup to get the slot information of 4k hash ptes. This also allow us
to remove real_pte_t. Side effect of the change is that a specific 4k slot
lookup can result in multiple H_READ hcalls. But that should only impact
when we are using 4K subpages which should be rare.

NOTE: I only tested this on systemsim. Wanted to get this out to get early
feedback.

Aneesh Kumar K.V (7):
  powerpc/mm: Don't hardcode page table size
  powerpc/mm: Don't hardcode the hash pte slot shift
  powerpc/nohash: Update 64K nohash config to have 32 pte fragments
  powerpc/mm: Don't track 4k subpage information with 64k linux page
size
  powerpc/mm: update frag size
  powerpc/mm: Update pte_iterate_hashed_subpages args
  powerpc/mm: get rid of real_pte_t

 arch/powerpc/include/asm/book3s/64/hash-64k.h| 75 +---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 25 +++-
 arch/powerpc/include/asm/machdep.h   |  2 +
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h | 21 +--
 arch/powerpc/include/asm/nohash/64/pgtable.h | 24 +++-
 arch/powerpc/include/asm/page.h  | 15 -
 arch/powerpc/include/asm/pgalloc-64.h| 10 
 arch/powerpc/include/asm/tlbflush.h  |  4 +-
 arch/powerpc/mm/hash64_64k.c | 67 +
 arch/powerpc/mm/hash_native_64.c | 35 ---
 arch/powerpc/mm/hash_utils_64.c  | 13 ++--
 arch/powerpc/mm/init_64.c|  7 +--
 arch/powerpc/mm/pgtable_64.c |  6 +-
 arch/powerpc/mm/tlb_hash64.c | 15 +++--
 arch/powerpc/platforms/pseries/lpar.c| 23 ++--
 15 files changed, 175 insertions(+), 167 deletions(-)

-- 
2.5.0


[RFC PATCH 2/7] powerpc/mm: Don't hardcode the hash pte slot shift

2015-10-20 Thread Aneesh Kumar K.V
Use the existing #define instead of open-coding the same value.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 2 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 565f9418c25f..681657cabbe4 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -71,7 +71,7 @@ static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index)
 {
if ((pte_val(rpte.pte) & _PAGE_COMBO))
return (unsigned long) rpte.hidx[index] >> 4;
-   return (pte_val(rpte.pte) >> 12) & 0xf;
+   return (pte_val(rpte.pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
 }
 
 static inline pte_t __rpte_to_pte(real_pte_t rpte)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 0b43ca60dcb9..64ef7316ff88 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -50,7 +50,7 @@
 #define __real_pte(a,e,p)  (e)
 #define __rpte_to_pte(r)   (__pte(r))
 #endif
-#define __rpte_to_hidx(r,index)(pte_val(__rpte_to_pte(r)) >> 12)
+#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> _PAGE_F_GIX_SHIFT)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)   \
do { \
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index c4dff4d41c26..8969b4c93c4f 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -121,7 +121,7 @@
 #define __real_pte(a,e,p)  (e)
 #define __rpte_to_pte(r)   (__pte(r))
 #endif
-#define __rpte_to_hidx(r,index)(pte_val(__rpte_to_pte(r)) >> 12)
+#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> _PAGE_F_GIX_SHIFT)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)   \
do { \
-- 
2.5.0


[RESEND PATCH 0/2] powerpc: Fix module autoload for OF platform drivers

2015-10-20 Thread Luis de Bethencourt
Hi,

This is a resend of this patch series. It was posted on September 18 [0]

These patches add the missing MODULE_DEVICE_TABLE() for OF to export
the information so modules have the correct aliases built-in and
autoloading works correctly.

A longer explanation by Javier Canillas can be found here:
https://lkml.org/lkml/2015/7/30/519

Thanks,
Luis

[0] https://lkml.org/lkml/2015/9/18/749
[1] https://lkml.org/lkml/2015/9/18/750
[2] https://lkml.org/lkml/2015/9/18/752

Luis de Bethencourt (2):
  powerpc: platforms: mpc52xx_lpbfifo: Fix module autoload for OF
platform driver
  powerpc: axonram: Fix module autoload for OF platform driver

 arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c | 1 +
 arch/powerpc/sysdev/axonram.c | 1 +
 2 files changed, 2 insertions(+)

-- 
2.5.3


[RFC PATCH 6/7] powerpc/mm: Update pte_iterate_hashed_subpages args

2015-10-20 Thread Aneesh Kumar K.V
Now that we don't really use real_pte_t, drop it from the iterator argument
list. The follow-up patch will remove real_pte_t completely.
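
For reference, the shape of the conversion at call sites (a sketch only;
compute_hash_slot() stands in for the real loop body, and index is now
declared inside the macro itself):

/* before */
pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
	hash = hpt_hash(vpn, shift, ssize);
	slot = compute_hash_slot(hash);
} pte_iterate_hashed_end();

/* after */
pte_iterate_hashed_subpages(vpn, psize, shift) {
	hash = hpt_hash(vpn, shift, ssize);
	slot = compute_hash_slot(hash);
} pte_iterate_hashed_end();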

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  5 +++--
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  7 +++
 arch/powerpc/include/asm/nohash/64/pgtable.h  |  7 +++
 arch/powerpc/mm/hash_native_64.c  | 10 --
 arch/powerpc/mm/hash_utils_64.c   |  6 +++---
 arch/powerpc/platforms/pseries/lpar.c |  4 ++--
 6 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index a28dbfe2baed..19e0afb36fa8 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -77,9 +77,10 @@ static inline pte_t __rpte_to_pte(real_pte_t rpte)
  * Trick: we set __end to va + 64k, which happens to work for
  * a 16M page as well, as we want only one iteration
  */
-#define pte_iterate_hashed_subpages(rpte, psize, vpn, index, shift)\
+#define pte_iterate_hashed_subpages(vpn, psize, shift) \
do {\
-	unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT));	\
+   unsigned long index;\
+   unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT)); \
shift = mmu_psize_defs[psize].shift;\
for (index = 0; vpn < __end; index++,   \
 vpn += (1L << (shift - VPN_SHIFT))) {  \
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 64ef7316ff88..79a90ca7b9f6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -52,10 +52,9 @@
 #endif
 #define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> _PAGE_F_GIX_SHIFT)
 
-#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)   \
-   do { \
-   index = 0;   \
-   shift = mmu_psize_defs[psize].shift; \
+#define pte_iterate_hashed_subpages(vpn, psize, shift) \
+   do {\
+   shift = mmu_psize_defs[psize].shift;\
 
 #define pte_iterate_hashed_end() } while(0)
 
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 8969b4c93c4f..37b5a62d18f4 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -123,10 +123,9 @@
 #endif
 #define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> _PAGE_F_GIX_SHIFT)
 
-#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)   \
-   do { \
-   index = 0;   \
-   shift = mmu_psize_defs[psize].shift; \
+#define pte_iterate_hashed_subpages(vpn, psize, shift)   \
+   do { \
+   shift = mmu_psize_defs[psize].shift; \
 
 #define pte_iterate_hashed_end() } while(0)
 
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index ca747ae19c76..b035dafcdea0 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -646,7 +646,7 @@ static void native_hpte_clear(void)
 static void native_flush_hash_range(unsigned long number, int local)
 {
unsigned long vpn;
-   unsigned long hash, index, hidx, shift, slot;
+   unsigned long hash, hidx, shift, slot;
struct hash_pte *hptep;
unsigned long hpte_v;
unsigned long want_v;
@@ -665,7 +665,7 @@ static void native_flush_hash_range(unsigned long number, int local)
vpn = batch->vpn[i];
pte = batch->pte[i];
 
-   pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+   pte_iterate_hashed_subpages(vpn, psize, shift) {
hash = hpt_hash(vpn, shift, ssize);
		hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
if (!valid_slot)
@@ -693,8 +693,7 @@ static void native_flush_hash_range(unsigned long number, int local)
vpn = batch->vpn[i];
pte = batch->pte[i];
 
-   pte_iterate_hashed_subpages(pte, psize,
-   vpn, index, shift) {
+   pte_iterate_hashed_subpages(vpn, psize, shift) {

[RESEND PATCH 2/2] powerpc: axonram: Fix module autoload for OF platform driver

2015-10-20 Thread Luis de Bethencourt
From: Luis de Bethencourt 

This platform driver has an OF device ID table, but the OF module
alias information is not created, so module autoloading won't work.

Signed-off-by: Luis de Bethencourt 
---
 arch/powerpc/sysdev/axonram.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index d2b79bc..51b41c9 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -312,6 +312,7 @@ static const struct of_device_id axon_ram_device_id[] = {
},
{}
 };
+MODULE_DEVICE_TABLE(of, axon_ram_device_id);
 
 static struct platform_driver axon_ram_driver = {
.probe  = axon_ram_probe,
-- 
2.5.3


Re: [PATCH 0/5] drivers/tty: make more bool drivers explicitly non-modular

2015-10-20 Thread Paul Gortmaker
[Re: [PATCH 0/5] drivers/tty: make more bool drivers explicitly non-modular] On 20/10/2015 (Tue 17:10) Alexandre Belloni wrote:

> On 18/10/2015 at 18:21:13 -0400, Paul Gortmaker wrote :
> > The one common thread here for all the patches is that we also
> > scrap the .remove functions which would only be used for module
> > unload (impossible) and driver unbind.  For the drivers here, there
> > doesn't seem to be a sensible unbind use case (vs. e.g. a multiport
> > PCI ethernet driver where one port is unbound and passed through to
> > a kvm guest or similar).  Hence we just explicitly disallow any
> > driver unbind operations to help prevent root from doing something
> > illogical to the machine that they could have done previously.
> > 
> > We've already done this for drivers/tty/serial/mpsc.c previously.
> > 
> > Build tested for allmodconfig on ARM64 and powerpc for tty/tty-testing.
> > 
> 
> So, how does this actually build test atmel_serial?

Not sure why this should be a surprise;  I build test it exactly like this:

paul@builder-02:~/git/linux-head$ echo $ARCH
arm64
paul@builder-02:~/git/linux-head$ echo $CROSS_COMPILE 
aarch64-linux-gnu-
paul@builder-02:~/git/linux-head$ make O=../arm-build/ drivers/tty/serial/atmel_serial.o
make[1]: Entering directory '/home/paul/git/arm-build'
arch/arm64/Makefile:25: LSE atomics not supported by binutils
  CHK include/config/kernel.release
  Using /home/paul/git/linux-head as source for kernel
  GEN ./Makefile
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h

[...]

  HOSTCC  scripts/sign-file
  HOSTCC  scripts/extract-cert
  CC  drivers/tty/serial/atmel_serial.o
make[1]: Leaving directory '/home/paul/git/arm-build'
paul@builder-02:~/git/linux-head$ 

It did build; no warning/error.  Would you call it an invalid build test?

> 
> A proper solution would be to actually make it a tristate and allow
> building as a module. I think it currently fails because of
> console_initcall() but that is certainly fixable.

Well, as per other threads on this topic, if people want to extend
the functionality to support tristate, then great.  But please do
not confuse that with existing functionality, which is clearly
non-modular in this case.

Thanks,
Paul.
--

> 
> 
> -- 
> Alexandre Belloni, Free Electrons
> Embedded Linux, Kernel and Android engineering
> http://free-electrons.com

Re: [PATCH V6 1/6] powerpc/powernv: don't enable SRIOV when VF BAR has non 64bit-prefetchable BAR

2015-10-20 Thread Gavin Shan
On Tue, Oct 20, 2015 at 05:03:00PM +0800, Wei Yang wrote:
>On PHB_IODA2, we enable SRIOV devices by mapping IOV BAR with M64 BARs. If
^

s/PHB_IODA2/PHB3 or s/PHB_IODA2/IODA2 PHB

>a SRIOV device's IOV BAR is not 64bit-prefetchable, this is not assigned
>from 64bit prefetchable window, which means M64 BAR can't work on it.
>
>The reason is PCI bridges support only 2 windows and the kernel code
^

It would be more accurate: "2 memory windows".

>programs bridges in the way that one window is 32bit-nonprefetchable and
>the other one is 64bit-prefetchable. So if devices' IOV BAR is 64bit and
>non-prefetchable, it will be mapped into 32bit space and therefore M64
>cannot be used for it.
>
>This patch makes this explicit and truncates the IOV resource in this case
>to save MMIO space.
>
>Signed-off-by: Wei Yang 
>Reviewed-by: Gavin Shan 
>Acked-by: Alexey Kardashevskiy 
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 34 ---
> 1 file changed, 18 insertions(+), 16 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 85cbc96..f042fed 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -908,9 +908,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   /*
>* The actual IOV BAR range is determined by the start address
>* and the actual size for num_vfs VFs BAR.  This check is to
>@@ -939,9 +936,6 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
>   res2 = *res;
>   res->start += size * offset;
>@@ -1221,9 +1215,6 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
>   if (!res->flags || !res->parent)
>   continue;
>
>-  if (!pnv_pci_is_mem_pref_64(res->flags))
>-  continue;
>-
>   for (j = 0; j < vf_groups; j++) {
>   do {
>   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
>@@ -1510,6 +1501,12 @@ int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
>   pdn = pci_get_pdn(pdev);
>
>   if (phb->type == PNV_PHB_IODA2) {
>+  if (!pdn->vfs_expanded) {
>+  dev_info(&pdev->dev, "don't support this SRIOV device"
>+  " with non 64bit-prefetchable IOV BAR\n");
>+  return -ENOSPC;
>+  }
>+
>   /* Calculate available PE for required VFs */
>   mutex_lock(&phb->ioda.pe_alloc_mutex);
>   pdn->offset = bitmap_find_next_zero_area(
>@@ -2775,9 +2772,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   if (!res->flags || res->parent)
>   continue;
>   if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-  dev_warn(&pdev->dev, " non M64 VF BAR%d: %pR\n",
>+  dev_warn(&pdev->dev, "Don't support SR-IOV with"
>+  " non M64 VF BAR%d: %pR. \n",
>i, res);
>-  continue;
>+  goto truncate_iov;
>   }
>
>   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>@@ -2796,11 +2794,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>   res = &pdev->resource[i + PCI_IOV_RESOURCES];
>   if (!res->flags || res->parent)
>   continue;
>-  if (!pnv_pci_is_mem_pref_64(res->flags)) {
>-  dev_warn(&pdev->dev, "Skipping expanding VF BAR%d: %pR\n",
>-   i, res);
>-  continue;
>-  }
>
>   dev_dbg(&pdev->dev, " Fixing VF BAR%d: %pR to\n", i, res);
>   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>@@ -2810,6 +2803,15 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>i, res, mul);
>   }
>   pdn->vfs_expanded = mul;
>+
>+  return;
>+
>+truncate_iov:
>+  /* To save MMIO space, IOV BAR is truncated. */
>+  for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>+  res = &pdev->resource[i + PCI_IOV_RESOURCES];
>+  res->end = res->start - 1;
>+  }


Re: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation

2015-10-20 Thread Paul E. McKenney
On Mon, Oct 19, 2015 at 09:17:18AM +0800, Boqun Feng wrote:
> On Fri, Oct 09, 2015 at 10:40:39AM +0100, Will Deacon wrote:
> > On Fri, Oct 09, 2015 at 10:31:38AM +0200, Peter Zijlstra wrote:
> [snip]
> > > 
> > > So lots of little confusions added up to complete fail :-{
> > > 
> > > Mostly I think it was the UNLOCK x + LOCK x are fully ordered (where I
> > > forgot: but not against uninvolved CPUs) and RELEASE/ACQUIRE are
> > > transitive (where I forgot: RELEASE/ACQUIRE _chains_ are transitive, but
> > > again not against uninvolved CPUs).
> > > 
> > > Which leads me to think I would like to suggest alternative rules for
> > > RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
> > > partly responsible for my confusion).
> > 
> > Yeah, sorry. I originally used the phrase "fully ordered" but changed it
> > to "full barrier", which has stronger transitivity (newly understood
> > definition) requirements that I didn't intend.
> > 
> > RELEASE -> ACQUIRE should be used for message passing between two CPUs
> > and not have ordering effects on other observers unless they're part of
> > the RELEASE -> ACQUIRE chain.
> > 
> > >  - RELEASE -> ACQUIRE is fully ordered (but not a full barrier) when
> > >they operate on the same variable and the ACQUIRE reads from the
> > >RELEASE. Notable, RELEASE/ACQUIRE are RCpc and lack transitivity.
> > 
> > Are we explicit about the difference between "fully ordered" and "full
> > barrier" somewhere else, because this looks like it will confuse people.
> > 
> 
> This is confusing me right now. ;-)
> 
> Let's use a simple example for only one primitive, as I understand it,
> if we say a primitive A is "fully ordered", we actually mean:
> 
> 1.The memory operations preceding(in program order) A can't be
>   reordered after the memory operations following(in PO) A.
> 
> and
> 
> 2.The memory operation(s) in A can't be reordered before the
>   memory operations preceding(in PO) A and after the memory
>   operations following(in PO) A.
> 
> If we say A is a "full barrier", we actually mean:
> 
> 1.The memory operations preceding(in program order) A can't be
>   reordered after the memory operations following(in PO) A.
> 
> and
> 
> 2.The memory ordering guarantee in #1 is visible globally.
> 
> Is that correct? Or is "full barrier" stronger than I understand,
> i.e. there is a third property of "full barrier":
> 
> 3.The memory operation(s) in A can't be reordered before the
>   memory operations preceding(in PO) A and after the memory
>   operations following(in PO) A.
> 
> IOW, is "full barrier" a stronger version of "fully ordered" or not?

There is also the question of whether the barrier forces ordering
of unrelated stores, everything initially zero and all accesses
READ_ONCE() or WRITE_ONCE():

	P0		P1		P2			P3
	X = 1;		Y = 1;		r1 = X;			r3 = Y;
					some_barrier();		some_barrier();
					r2 = Y;			r4 = X;

P2's and P3's ordering could be globally visible without requiring
P0's and P1's independent stores to be ordered, for example, if you
used smp_rmb() for some_barrier().  In contrast, if we used smp_mb()
for the barrier, everyone would agree on the order of P0's and P1's stores.
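
For anyone who wants to play with this, a user-space analogue of the litmus
test above, a sketch only: C11 seq_cst fences stand in for smp_mb(), relaxed
atomics stand in for READ_ONCE()/WRITE_ONCE(), and a single run of course
only samples one possible execution:

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

static atomic_int X, Y;
static int r1, r2, r3, r4;

static int p0(void *u) { (void)u; atomic_store_explicit(&X, 1, memory_order_relaxed); return 0; }
static int p1(void *u) { (void)u; atomic_store_explicit(&Y, 1, memory_order_relaxed); return 0; }

static int p2(void *u)
{
	(void)u;
	r1 = atomic_load_explicit(&X, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* some_barrier() */
	r2 = atomic_load_explicit(&Y, memory_order_relaxed);
	return 0;
}

static int p3(void *u)
{
	(void)u;
	r3 = atomic_load_explicit(&Y, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* some_barrier() */
	r4 = atomic_load_explicit(&X, memory_order_relaxed);
	return 0;
}

int main(void)
{
	thrd_t t[4];

	thrd_create(&t[0], p0, NULL);
	thrd_create(&t[1], p1, NULL);
	thrd_create(&t[2], p2, NULL);
	thrd_create(&t[3], p3, NULL);
	for (int i = 0; i < 4; i++)
		thrd_join(t[i], NULL);

	/* With seq_cst fences, r1==1 r2==0 r3==1 r4==0 is forbidden:
	 * both readers must agree on the order of the two stores.
	 * Weaken the fences to memory_order_acquire (roughly the
	 * smp_rmb() analogue) and that outcome becomes legal. */
	printf("r1=%d r2=%d r3=%d r4=%d\n", r1, r2, r3, r4);
	return 0;
}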

There are actually a fair number of different combinations of
aspects of memory ordering.  We will need to choose wisely.  ;-)

My hope is that the store-ordering gets folded into the globally
visible transitive level.  Especially given that I have not (yet)
seen any algorithms used in production that relied on the ordering of
independent stores.

Thanx, Paul

> Regards,
> Boqun
> 
> > >  - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
> > >transitivity) using smp_mb__release_acquire(), either before RELEASE
> > >or after ACQUIRE (but consistently [*]).
> > 
> > Hmm, but we don't actually need this for RELEASE -> ACQUIRE, afaict. This
> > is just needed for UNLOCK -> LOCK, and is exactly what RCU is currently
> > using (for PPC only).
> > 
> > Stepping back a second, I believe that there are three cases:
> > 
> > 
> >  RELEASE X -> ACQUIRE Y (same CPU)
> >* Needs a barrier on TSO architectures for full ordering
> > 
> >  UNLOCK X -> LOCK Y (same CPU)
> >* Needs a barrier on PPC for full ordering
> > 
> >  RELEASE X -> ACQUIRE X (different CPUs)
> >  UNLOCK X -> ACQUIRE X (different CPUs)
> >* Fully ordered everywhere...
> >* ... but needs a barrier on PPC to become a full barrier
> > 
> > 



[PATCH v3] powerpc/prom: Avoid reference to potentially freed memory

2015-10-20 Thread Christophe JAILLET
of_get_property() is used inside the loop, but then the reference to the
node is dropped before dereferencing the prop pointer, which could by then
point to junk if the node has been freed.
Instead use of_property_read_u32() to actually read the property
value before dropping the reference.

Use of_get_next_parent() to simplify the code; a sketch of what it does follows.
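
For reference, of_get_next_parent() is roughly equivalent to the open-coded
pair it replaces (ignoring the devtree locking details):

/* take a reference on the parent, drop the one held on np */
static struct device_node *next_parent(struct device_node *np)
{
	struct device_node *parent = of_get_parent(np);

	of_node_put(np);
	return parent;
}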

Signed-off-by: Christophe JAILLET 
---
v2: Fix missing '{'
v3: Use of_get_next_parent to simplify code
*** COMPILE-TESTED ONLY ***
---
 arch/powerpc/kernel/prom.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index bef76c5..ba29c0d 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -783,17 +783,14 @@ void __init early_get_first_memblock_info(void *params, phys_addr_t *size)
 int of_get_ibm_chip_id(struct device_node *np)
 {
of_node_get(np);
-   while(np) {
-   struct device_node *old = np;
-   const __be32 *prop;
+   while (np) {
+   u32 chip_id;
 
-   prop = of_get_property(np, "ibm,chip-id", NULL);
-   if (prop) {
+   if (!of_property_read_u32(np, "ibm,chip-id", &chip_id)) {
of_node_put(np);
-   return be32_to_cpup(prop);
+   return chip_id;
}
-   np = of_get_parent(np);
-   of_node_put(old);
+   np = of_get_next_parent(np);
}
return -1;
 }
-- 
2.1.4


Re: [PATCH v6 22/22] of/platform: Defer probes of registered devices

2015-10-20 Thread Scott Wood
On Mon, 2015-09-21 at 16:03 +0200, Tomeu Vizoso wrote:
> Instead of trying to match and probe platform and AMBA devices right
> after each is registered, delay their probes until device_initcall_sync.
> 
> This means that devices will start probing once all built-in drivers
> have registered, and after all platform and AMBA devices from the DT
> have been registered already.
> 
> This allows us to prevent deferred probes by probing dependencies on
> demand.
> 
> Signed-off-by: Tomeu Vizoso 
> ---
> 
> Changes in v4:
> - Also defer probes of AMBA devices registered from the DT as they can
>   also request resources.
> 
>  drivers/of/platform.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)

This breaks arch/powerpc/sysdev/fsl_pci.c.  The PCI bus is an OF platform
device, and it must be probed before pcibios_init(), which is a
subsys_initcall(), or else the PCI bus never gets scanned.

-Scott


[PATCH] powerpc/msi: fix section mismatch warning

2015-10-20 Thread Denis Kirjanov
Building with CONFIG_DEBUG_SECTION_MISMATCH
gives the following warning:

WARNING: vmlinux.o(.text+0x41fa8): Section mismatch in reference from
the function .msi_bitmap_alloc() to the function
.init.text:.memblock_virt_alloc_try_nid()
The function .msi_bitmap_alloc() references
the function __init .memblock_virt_alloc_try_nid().
This is often because .msi_bitmap_alloc lacks a __init
annotation or the annotation of .memblock_virt_alloc_try_nid is wrong.

Signed-off-by: Denis Kirjanov 
---
 arch/powerpc/include/asm/msi_bitmap.h | 2 +-
 arch/powerpc/sysdev/msi_bitmap.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/msi_bitmap.h b/arch/powerpc/include/asm/msi_bitmap.h
index 1ec7125..fbd3424 100644
--- a/arch/powerpc/include/asm/msi_bitmap.h
+++ b/arch/powerpc/include/asm/msi_bitmap.h
@@ -29,7 +29,7 @@ void msi_bitmap_reserve_hwirq(struct msi_bitmap *bmp, unsigned int hwirq);
 
 int msi_bitmap_reserve_dt_hwirqs(struct msi_bitmap *bmp);
 
-int msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count,
+int __init_refok msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count,
 struct device_node *of_node);
 void msi_bitmap_free(struct msi_bitmap *bmp);
 
diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
index 1a826f3..ed5234e 100644
--- a/arch/powerpc/sysdev/msi_bitmap.c
+++ b/arch/powerpc/sysdev/msi_bitmap.c
@@ -112,7 +112,7 @@ int msi_bitmap_reserve_dt_hwirqs(struct msi_bitmap *bmp)
return 0;
 }
 
-int msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count,
+int __init_refok msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count,
 struct device_node *of_node)
 {
int size;
-- 
2.4.0
