Re: [PATCH 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-13 Thread Balbir Singh


On 13/04/16 04:06, Akshay Adiga wrote:
> This patch brings down the global pstate at a slower rate than the local
> pstate. The frequency transition latency from pmin to pmax is observed to
> be on the order of a few milliseconds, which incurs a performance penalty
> during a sudden frequency ramp-up. Holding the global pstate higher than
> the local pstate therefore makes subsequent ramp-ups faster.

What domains do local and global refer to?

> 
> A global per policy structure is maintained to keep track of the global
> and local pstate changes. The global pstate is brought down using a
> parabolic equation. The ramp down time to pmin is set to 6 seconds. To
> make sure that the global pstate is dropped at regular intervals, a
> timer is queued every 2 seconds, which eventually brings the global
> pstate down to the local pstate.
> 
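
For illustration, a parabolic ramp-down of the kind described above could be
sketched as follows. This is not taken from the patch: the function name
ramp_down_gpstate(), its parameters, and the convention that a larger number
means a higher frequency are assumptions. Only the 6 second ramp-down time
and the 2 second timer period come from the description.

```c
#include <stdint.h>

#define RAMP_DOWN_MS    6000    /* time to ramp from pmax down to pmin */
#define TIMER_PERIOD_MS 2000    /* period of the re-evaluation timer */

/*
 * Hypothetical helper: global pstate to use elapsed_ms after the ramp-down
 * started. Assumes a larger pstate number means a higher frequency.
 * The drop grows with the square of the elapsed time, so the global pstate
 * falls slowly at first and faster later, and never goes below the local
 * pstate.
 */
static int ramp_down_gpstate(int pmax, int pmin, int highest_gpstate,
                             int local_pstate, unsigned int elapsed_ms)
{
    int64_t drop;
    int gpstate;

    if (elapsed_ms >= RAMP_DOWN_MS)
        return local_pstate;

    drop = (int64_t)(pmax - pmin) * elapsed_ms * elapsed_ms /
           ((int64_t)RAMP_DOWN_MS * RAMP_DOWN_MS);
    gpstate = highest_gpstate - (int)drop;

    return gpstate > local_pstate ? gpstate : local_pstate;
}
```

With pmax = 100, pmin = 0 and a local pstate of 20, calling the helper at each
timer tick (0, 2000, 4000 and 6000 ms) steps the global pstate down and ends
at the local pstate.
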
> Iozone results show fairly consistent performance boost.
> YCSB on redis shows improved Max latencies in most cases.
> 
> Iozone write/rewrite tests were run with file sizes of 200704 KB and 401408 KB
> and different record sizes. The following table shows I/O operations/sec with
> and without the patch.
> 
> Iozone Results (in op/sec) (mean over 3 iterations)
> 
> filesize-recordsize-IOtype  with patch   without patch   % change
> ------------------------------------------------------------------
> 200704-1-SeqWrite           1616532      1615425         0.06
> 200704-1-Rewrite            2423195      2303130         5.21
> 200704-2-SeqWrite           1628577      1602620         1.61
> 200704-2-Rewrite            2428264      2312154         5.02
> 200704-4-SeqWrite           1617605      1617182         0.02
> 200704-4-Rewrite            2430524      2351238         3.37
> 200704-8-SeqWrite           1629478      1600436         1.81
> 200704-8-Rewrite            2415308      2298136         5.09
> 200704-16-SeqWrite          1619632      1618250         0.08
> 200704-16-Rewrite           2396650      2352591         1.87
> 200704-32-SeqWrite          1632544      1598083         2.15
> 200704-32-Rewrite           2425119      2329743         4.09
> 200704-64-SeqWrite          1617812      1617235         0.03
> 200704-64-Rewrite           2402021      2321080         3.48
> 200704-128-SeqWrite         1631998      1600256         1.98
> 200704-128-Rewrite          2422389      2304954         5.09
> 200704-256-SeqWrite         1617065      1616962         0.00
> 200704-256-Rewrite          2432539      2301980         5.67
> 200704-512-SeqWrite         1632599      1598656         2.12
> 200704-512-Rewrite          2429270      2323676         4.54
> 200704-1024-SeqWrite        1618758      1616156         0.16
> 200704-1024-Rewrite         2431631      2315889         4.99
> 401408-1-SeqWrite           1631479      1608132         1.45
> 401408-1-Rewrite            2501550      2459409         1.71
> 401408-2-SeqWrite           1617095      1626069         -0.55
> 401408-2-Rewrite            2507557      2443621         2.61
> 401408-4-SeqWrite           1629601      1611869         1.10
> 401408-4-Rewrite            2505909      2462098         1.77
> 401408-8-SeqWrite           1617110      1626968         -0.60
> 401408-8-Rewrite            2512244      2456827         2.25
> 401408-16-SeqWrite          1632609      1609603         1.42
> 401408-16-Rewrite           2500792      2451405         2.01
> 401408-32-SeqWrite          1619294      1628167         -0.54
> 401408-32-Rewrite           2510115      2451292         2.39
> 401408-64-SeqWrite          1632709      1603746         1.80
> 401408-64-Rewrite           2506692      2433186         3.02
> 401408-128-SeqWrite         1619284      1627461         -0.50
> 401408-128-Rewrite          2518698      2453361         2.66
> 401408-256-SeqWrite         1634022      1610681         1.44
> 401408-256-Rewrite          2509987      2446328         2.60
> 401408-512-SeqWrite         1617524      1628016         -0.64
> 401408-512-Rewrite          2504409      2442899         2.51
> 401408-1024-SeqWrite        1629812      1611566         1.13
> 401408-1024-Rewrite         2507620      2442968         2.64
> 
> Tested with YCSB workloada over redis for 1 million records and 1 million
> operations. Each test was carried out with target operations per second and
> persistence disabled. 
> 
> Max-latency (in us)( mean over 5 iterations )
> ---
> op/s    Operation   with patch   without patch   %change
> 
> 15000   Read        61480.6      50261.4         22.32
> 15000   cleanup     215.2   

Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Gavin Shan
On Thu, Apr 14, 2016 at 01:26:51PM +1000, Alexey Kardashevskiy wrote:

.../...

>>
>>Do you mean physically pull the adapter out and insert the same
>>adapter back? What's the point for the test case?
>
>
>Because this is what the patchset is for - to replace a physical device on a
>physical machine. Powering on/off the slots via sysfs is just an
>approximation (which is fine when you are debugging), something can go wrong
>and require some work but you do not know it for sure.
>

Yes, it's absolutely worth covering in the test cases, though case (2)
covers part of that. Anyway, I'll test it in the next revision. Thanks
for your review.

>
>
>-- 
>Alexey
>

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V3 00/29] bitops: add parity functions

2016-04-13 Thread zengzhaoxiu
From: Zhaoxiu Zeng 

When I ran "grep parity -r linux", I found parity calculations
scattered across many drivers.

This patch series does:
  1. provide generic and architecture-specific parity calculations
  2. remove drivers' local parity calculations, use bitops' parity
 functions instead
  3. replace "hweightN(x) & 1" with "parityN(x)" to improve readability,
 and improve performance on some CPUs without popcount support

I did not use GCC's __builtin_parity* functions, for the following reasons:
  1. I don't know how to identify the first GCC version that supported
     __builtin_parity for each architecture.
  2. For architectures that don't have a popcount instruction, GCC instead
     emits "call __paritysi2" (__paritydi2 for 64 bits). So if we use
     __builtin_parity, we must provide __paritysi2 and __paritydi2 functions
     for these architectures. Additionally, parity4, parity8 and parity16
     might become "__builtin_parity(x & mask)", but the "& mask" operation
     is totally unnecessary.
  3. For architectures that have a popcount instruction, we do the same things.
  4. For powerpc, sparc, and x86, we do runtime patching to use the popcount
     instruction if the CPU supports it.

I have compiled successfully with x86_64_defconfig, i386_defconfig,
pseries_defconfig and sparc64_defconfig.
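
As a rough illustration of what a generic parityN(x) can look like, here is a
minimal user-space sketch. It is not the code from this series: the function
bodies are assumptions, and PARITY_MAGIC is assumed to be 0x6996, the constant
the powerpc assembly in patch 07/29 loads. hweight32_naive() is a hypothetical
stand-in for "hweight32(x) & 1", included only for comparison.

```c
#include <stdint.h>

/* Bit i of this constant is the parity of the 4-bit value i. */
#define PARITY_MAGIC 0x6996u

static unsigned int parity4(unsigned int w)
{
    return (PARITY_MAGIC >> (w & 0xf)) & 1;
}

static unsigned int parity8(unsigned int w)
{
    w ^= w >> 4;    /* fold the high nibble into the low nibble */
    return parity4(w);
}

static unsigned int parity16(unsigned int w)
{
    w ^= w >> 8;    /* fold the high byte into the low byte */
    return parity8(w);
}

static unsigned int parity32(uint32_t w)
{
    w ^= w >> 16;
    return parity16(w);
}

static unsigned int parity64(uint64_t w)
{
    return parity32((uint32_t)(w ^ (w >> 32)));
}

/* Reference: bit-by-bit population count, for "hweight32(x) & 1". */
static unsigned int hweight32_naive(uint32_t w)
{
    unsigned int n = 0;

    for (; w; w >>= 1)
        n += w & 1;
    return n;
}
```

The folding steps never need a mask: bits above the width of interest only
reach positions that the final "& 0xf" in parity4() discards.
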

Changes since v2:
- Add constant PARITY_MAGIC (proposed by Sam Ravnborg)
- Add include/asm-generic/bitops/popc-parity.h (proposed by Chris Metcalf)
- Tile uses popc-parity.h directly
- MIPS uses popc-parity.h if it has a usable __builtin_popcount
- Add a few comments in powerpc's and sparc's parity.S
- x86: remove custom calling convention

Changes since v1:
- Add runtime patching for powerpc, sparc, and x86
- Avr32 uses generic parity too
- Fix error in ssfdc's patch, and add commit message
- Don't change the original code composition of drivers/iio/gyro/adxrs450.c
- Assign directly to phy_cap.parity in drivers/scsi/isci/phy.c

Regards,

=== diffstat ===

Zhaoxiu Zeng (29):
  bitops: add parity functions
  Include generic parity.h in some architectures' bitops.h
  Add alpha-specific parity functions
  Add blackfin-specific parity functions
  Add ia64-specific parity functions
  Tile and MIPS (if has usable __builtin_popcount) use popcount parity functions
  Add powerpc-specific parity functions
  Add sparc-specific parity functions
  Add x86-specific parity functions
  sunrpc: use parity8
  mips: use parity functions in cerr-sb1.c
  bch: use parity32
  media: use parity8 in vivid-vbi-gen.c
  media: use parity functions in saa7115
  input: use parity32 in grip_mp
  input: use parity64 in sidewinder
  input: use parity16 in ams_delta_serio
  scsi: use parity32 in isci's phy
  mtd: use parity16 in ssfdc
  mtd: use parity functions in inftlcore
  crypto: use parity functions in qat_hal
  mtd: use parity16 in sm_ftl
  ethernet: use parity8 in sun/niu.c
  input: use parity8 in pcips2
  input: use parity8 in saps2
  iio: use parity32 in adxrs450
  serial: use parity32 in max3100
  input: use parity8 in elantech
  ethernet: use parity8 in broadcom/tg3.c

 arch/alpha/include/asm/bitops.h     |  27 +
 arch/arc/include/asm/bitops.h       |   1 +
 arch/arm/include/asm/bitops.h       |   1 +
 arch/arm64/include/asm/bitops.h     |   1 +
 arch/avr32/include/asm/bitops.h     |   1 +
 arch/blackfin/include/asm/bitops.h  |  31 ++
 arch/c6x/include/asm/bitops.h       |   1 +
 arch/cris/include/asm/bitops.h      |   1 +
 arch/frv/include/asm/bitops.h       |   1 +
 arch/h8300/include/asm/bitops.h     |   1 +
 arch/hexagon/include/asm/bitops.h   |   1 +
 arch/ia64/include/asm/bitops.h      |  31 ++
 arch/m32r/include/asm/bitops.h      |   1 +
 arch/m68k/include/asm/bitops.h      |   1 +
 arch/metag/include/asm/bitops.h     |   1 +
 arch/mips/include/asm/bitops.h      |   7 ++
 arch/mips/mm/cerr-sb1.c             |  67 -
 arch/mn10300/include/asm/bitops.h   |   1 +
 arch/openrisc/include/asm/bitops.h  |   1 +
 arch/parisc/include/asm/bitops.h    |   1 +
 arch/powerpc/include/asm/bitops.h   |  11 +++
 arch/powerpc/lib/Makefile           |   2 +-
 arch/powerpc/lib/parity_64.S        | 142 +++
 arch/powerpc/lib/ppc_ksyms.c        |   5 +
 arch/s390/include/asm/bitops.h      |   1 +
 arch/sh/include/asm/bitops.h        |   1 +
 arch/sparc/include/asm/bitops_32.h  |   1 +
 arch/sparc/include/asm/bitops_64.h  |  18 
 arch/sparc/kernel/sparc_ksyms_64.c  |   6 ++
 arch/sparc/lib/Makefile             |   2 +-
 arch/sparc/lib/parity.S             | 128 
 arch/tile/include/asm/bitops.h      |   2 +
 arch/x86/include/asm/arch_hweight.h |   5 

Re: [PATCH] ftrace: filter: Match dot symbols when searching functions on ppc64.

2016-04-13 Thread Steven Rostedt
On Wed, 13 Apr 2016 21:39 -0300
Thiago Jung Bauermann  wrote:


> People seem to be considering patches for next, so this looks like a good 
> moment to ping about this one.

Your timing is fine with respect to the merge window. I'm currently
traveling, but I'll get to it on Monday. I have it marked as TODO.

> 
> Ps: patchwork seems to have an issue which causes it to show the message 
> body as if it were the commit message, but if you feed my original email 
> (the one I’m replying to here) to git am, the commit message will be 
> correct.
> 

Yeah I noticed that. But I'll be able to handle it.

Thanks,

-- Steve


Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Alexey Kardashevskiy

On 04/14/2016 11:30 AM, Gavin Shan wrote:

On Thu, Apr 14, 2016 at 09:57:32AM +1000, Alistair Popple wrote:

Hi Gavin,




Why exactly cannot EEH reset changes go to a smaller separate patchset
(before hotplug)?



As I explained before, the patchset's order is: PCI generic part,
PowerNV PCI related, EEH related, device-tree part and hotplug driver.

The EEH reset change is included in PATCH[37/45]. There is no point
to reorder the patches.


I don't understand all of the dependencies but if possible splitting the
series up into a set of smaller self-contained patch series makes things
easier to review and may make it easier for you to get this functionality
reviewed and accepted into upstream.



Thanks, Alistair. I will move the cleanup/refactor related patches
into a separate series, which is expected to be merged first. That
will help the reviewers focus on the patches with complicated
changes, as you suggested. Alexey, please let me know whether that is
what you would like to see.


I do not know yet, I have not finished reviewing this version. Maybe the
EEH reset patch depends on 1/45..36/45, or it only makes sense when 45/45
is applied; this is all unclear.


If 37/45 has no dependencies and is good just by itself, you could have posted
it separately a few months ago; it would have reached upstream by now,
this patchset would be at least one patch shorter, and you would not have to
rebase all 45 patches over and over again on top of the current upstream
tree...




--
Alexey

Re: [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE

2016-04-13 Thread Alexey Kardashevskiy

On 04/14/2016 09:54 AM, Gavin Shan wrote:

On Wed, Apr 13, 2016 at 06:29:42PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:43 PM, Gavin Shan wrote:

Currently, there is one macro (TCE32_TABLE_SIZE) representing the
TCE table size for one DMA32 segment. The constant representing
the DMA32 segment size (1 << 28) is still used in the code.

This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
segment size. the TCE table size can be calcualted when the page


s/calcualted/calculated/



has fixed 4KB size. So all the related calculation depends on one
macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.


Please move PNV_IODA1_DMA32_SEGSIZE where TCE32_TABLE_SIZE was.




Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 30 +-
  arch/powerpc/platforms/powernv/pci.h      |  1 +
  2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d18b95e..e60cff6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -48,9 +48,6 @@
  #include "powernv.h"
  #include "pci.h"

-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
-#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
-
  #define POWERNV_IOMMU_DEFAULT_LEVELS  1
  #define POWERNV_IOMMU_MAX_LEVELS  5

@@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,

struct page *tce_mem = NULL;
struct iommu_table *tbl;
-   unsigned int i;
+   unsigned int tce32_segsz, i;



PNV_IODA1_DMA32_SEGSIZE is a segment size in bytes. The name @tce32_segsz
also suggests that it is a segment size in bytes (otherwise it would be
tce32_seg_entries or something like this) but it is not, it is a number of
TCE entries (arch/powerpc/kernel/iommu.c uses "entry" for these). And
tce32_segsz never changes. So:

const unsigned int entries = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);



Are you sure @tce32_segsz and the equation you gave are for the number of TCE
entries, not the size of memory required for the DMA32 segment TCE table?


No, I am not :) "-3" makes it a table size in bytes, so it is rather 
tablesz then.
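
To make the arithmetic in this exchange concrete, here is a small user-space
sketch. The macro and function names below are local stand-ins, not kernel
definitions; the values (256M segment, 4K TCE pages, 8 bytes per TCE) are the
ones stated in the patch comments.

```c
#include <stdint.h>

#define SEGSIZE         (1u << 28)  /* one DMA32 segment: 256M */
#define PAGE_SHIFT_4K   12          /* 4K TCE pages */
#define TCE_ENTRY_SHIFT 3           /* each TCE entry is 8 bytes */

/* Number of TCE entries needed to map one segment. */
static uint32_t tce_entries_per_segment(void)
{
    return SEGSIZE >> PAGE_SHIFT_4K;
}

/* Size in bytes of the TCE table for one segment (entries * 8). */
static uint32_t tce_table_bytes_per_segment(void)
{
    return SEGSIZE >> (PAGE_SHIFT_4K - TCE_ENTRY_SHIFT);
}
```

Shifting by (12 - 3) yields 0x80000 bytes (512K), the same value as the removed
TCE32_TABLE_SIZE macro, while shifting by 12 alone yields 65536 entries.
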






int64_t rc;
void *addr;

@@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
/* Grab a 32-bit TCE table */
pe->tce32_seg = base;
pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
-   (base << 28), ((base + segs) << 28) - 1);
+   base * PNV_IODA1_DMA32_SEGSIZE,
+   (base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);

/* XXX Currently, we allocate one big contiguous table for the
 * TCEs. We only really need one chunk per 256M of TCE space
 * (ie per segment) but that's an optimization for later, it
 * requires some added smarts with our get/put_tce implementation
+*
+* Each TCE page is 4KB in size and each TCE entry occupies 8
+* bytes
 */
+   tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);



tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-  get_order(TCE32_TABLE_SIZE * segs));
+  get_order(tce32_segsz * segs));
if (!tce_mem) {
pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
goto fail;
}
addr = page_address(tce_mem);
-   memset(addr, 0, TCE32_TABLE_SIZE * segs);
+   memset(addr, 0, tce32_segsz * segs);

/* Configure HW */
for (i = 0; i < segs; i++) {
rc = opal_pci_map_pe_dma_window(phb->opal_id,
  pe->pe_number,
  base + i, 1,
- __pa(addr) + TCE32_TABLE_SIZE * i,
- TCE32_TABLE_SIZE, 0x1000);
+ __pa(addr) + tce32_segsz * i,
+ tce32_segsz, 0x1000);



As you started using IOMMU_PAGE_SHIFT_4K and you are also touching this piece
of code -

s/0x1000/IOMMU_PAGE_SHIFT_4K/



Is 0x1000 equal to IOMMU_PAGE_SHIFT_4K? I guess you meant to suggest
using IOMMU_PAGE_SIZE_4K instead?



Ah, my bad, should have been IOMMU_PAGE_SIZE_4K. I'll pay more attention to 
the details, sorry.
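
The difference between the two constants can be shown with a tiny sketch. The
names PAGE_SHIFT_4K and PAGE_SIZE_4K and the helper page_size_from_shift() are
local stand-ins chosen for illustration, assuming a 4K page has shift 12.

```c
#define PAGE_SHIFT_4K 12UL                      /* log2 of the page size */
#define PAGE_SIZE_4K  (1UL << PAGE_SHIFT_4K)    /* the page size in bytes */

/* Hypothetical helper: convert a page shift into a page size in bytes. */
static unsigned long page_size_from_shift(unsigned int shift)
{
    return 1UL << shift;
}
```

The literal 0x1000 in the call is a size in bytes, so it corresponds to the
SIZE constant; substituting the SHIFT constant would pass 12 instead of 4096.
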






if (rc) {
pe_err(pe, " Failed to configure 32-bit TCE table,"
   " err %ld\n", rc);
@@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
}

/* Setup linux iommu table */
-   pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
- base << 28, IOMMU_PAGE_SHIFT_4K);
+   

Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Alexey Kardashevskiy

On 04/14/2016 09:42 AM, Gavin Shan wrote:

On Wed, Apr 13, 2016 at 07:14:59PM +1000, Alexey Kardashevskiy wrote:

On 04/13/2016 05:42 PM, Gavin Shan wrote:

On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This series of patches rebases on powerpc/next branch, plus below additional
patches:



https://patchwork.ozlabs.org/patch/581315/  (PATCH[1/9] Richard's SRIOV EEH)
https://patchwork.ozlabs.org/patch/582639/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/582093/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/580626/  (PATCH[1/4] Gavin's PCI fix)
https://patchwork.ozlabs.org/patch/580153/  (PATCH[1/1] Andrew's EEH minor fix)
https://patchwork.ozlabs.org/patch/566827/  (PATCH[1/1] Russell's P5IOC2 removal)
https://patchwork.ozlabs.org/patch/534154/  (PATCH[1/7] Richard's SRIOV rework)
commit 388f7b1 ("Linux 4.5-rc3")

The series of patches intends to support PCI slots on the PowerPC PowerNV platform,
which runs on top of skiboot firmware. The patchset requires corresponding
changes to the skiboot firmware, which have been sent to skib...@lists.ozlabs.org
for review. The PCI slots are exposed by skiboot with device node properties,
and the kernel uses those properties to populate PCI slots accordingly.

The original PCI infrastructure on the PowerNV platform can't support hotplug
because the PE is assigned at PHB fixup time, which happens only once
during system boot. For this reason, the PCI infrastructure on the PowerNV platform
has been reworked substantially. After that, the PE and its corresponding resources
(IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
the PCI bridge's resources, which might decide the PE# assigned to the PE (e.g. M64
resources, on P8 strictly speaking). Each PE maintains a reference count,
which is (number of child PCI devices + 1). That means when the last child PCI
device leaves the PE, the PE and its included resources will be released and put
back into the free pool again. With this design, the PE will be released when the
EEH PE is released. PATCH[1 - 23] are related to this part.

From the skiboot perspective, the PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns whether skiboot supports the various resets on a
particular PCI slot through the device-tree node. If it does, EEH will use the
functionality provided by skiboot. Besides, the device-tree nodes have to change
in order to support PCI hotplug. For example, when a PCI adapter is inserted into
a slot, its device-tree node should be added to the system dynamically.
Conversely, the device-tree node should be removed from the system when the PCI
adapter is going offline. Since pci_dn and eeh_dev have the same life cycle as PCI
device nodes, they should be added/removed accordingly during PCI hotplug.
PATCH[24 - 39] do the related work.

The OF driver is changed to support unflattening the FDT blob for a subtree, which
is covered by PATCH[40 - 44].

The last one, PATCH[45], is the standalone PCI hotplug driver for the PowerPC
PowerNV platform.

===
Testing
===
1. Unplug adapters behind non-empty slot, then plug them.

1.1 Check status
# cat /sys/bus/pci/slots/C10/address
0003:09:00
# cat /sys/bus/pci/slots/C10/adapter
1
# cat /sys/bus/pci/slots/C10/power
1
# lspci
0003:09:00.0 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
# lspci -t
# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |   +-09.0-[09]--+-00.0
 |   |+-00.1
 |   |+-00.2
 |   |\-00.3
 |   +-10.0-[0a-0e]--
 |   \-11.0-[0f-13]--

1.2 Unplug adapter 0003:09.00.x
# echo 0 > /sys/bus/pci/slots/C10/power
# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |   +-09.0-[09]--
 |   +-10.0-[0a-0e]--
 |   \-11.0-[0f-13]--

1.3 Plug adapter 0003:09.00.x
# echo 1 > /sys/bus/pci/slots/C10/power



Do I understand correctly that the adapter 

[PATCH V3 07/29] Add powerpc-specific parity functions

2016-04-13 Thread zengzhaoxiu
From: Zhaoxiu Zeng 

Use runtime patching for ppc64, lifted from hweight_64

Signed-off-by: Zhaoxiu Zeng 
---
 arch/powerpc/include/asm/bitops.h |  11 +++
 arch/powerpc/lib/Makefile         |   2 +-
 arch/powerpc/lib/parity_64.S      | 142 ++
 arch/powerpc/lib/ppc_ksyms.c      |   5 ++
 4 files changed, 159 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/lib/parity_64.S

diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
index 59abc62..cd34030 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -269,8 +269,19 @@ unsigned int __arch_hweight16(unsigned int w);
 unsigned int __arch_hweight32(unsigned int w);
 unsigned long __arch_hweight64(__u64 w);
 #include 
+static inline unsigned int __arch_parity4(unsigned int w)
+{
+   w &= 0xf;
+   return ((PARITY_MAGIC) >> w) & 1;
+}
+unsigned int __arch_parity8(unsigned int w);
+unsigned int __arch_parity16(unsigned int w);
+unsigned int __arch_parity32(unsigned int w);
+unsigned int __arch_parity64(__u64 w);
+#include 
 #else
 #include 
+#include 
 #endif
 
 #include 
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index ba21be1..cae2e7f 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -16,7 +16,7 @@ obj-$(CONFIG_PPC32)   += div64.o copy_32.o
 
 obj64-y    += copypage_64.o copyuser_64.o usercopy_64.o mem_64.o hweight_64.o \
   copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \
-  memcpy_64.o memcmp_64.o
+  memcpy_64.o memcmp_64.o parity_64.o
 
 obj64-$(CONFIG_SMP)    += locks.o
 obj64-$(CONFIG_ALTIVEC)    += vmx-helper.o
diff --git a/arch/powerpc/lib/parity_64.S b/arch/powerpc/lib/parity_64.S
new file mode 100644
index 0000000..7eff686
--- /dev/null
+++ b/arch/powerpc/lib/parity_64.S
@@ -0,0 +1,142 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Author: Zhaoxiu Zeng 
+ */
+
+#include 
+#include 
+
+/*
+ * This file contains the generic code to calculate the odd parity
+ * of N-bits number, and the POPCNT feature sections.
+ *
+ * Note: This code relies on -mminimal-toc
+ */
+
+/*
+ * unsigned int __arch_parity8(unsigned int w)
+ */
+_GLOBAL(__arch_parity8)
+BEGIN_FTR_SECTION
+   srdi    r4,r3,4
+   xor r3,r3,r4
+   clrldi  r3,r3,64-4
+   li  r4,0x6996
+   srd r3,r4,r3
+   clrldi  r3,r3,64-1
+   blr
+FTR_SECTION_ELSE
+   PPC_POPCNTB(R3,R3)
+   clrldi  r3,r3,64-1
+   blr
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
+
+/*
+ * unsigned int __arch_parity16(unsigned int w)
+ */
+_GLOBAL(__arch_parity16)
+BEGIN_FTR_SECTION
+   srdi    r4,r3,8
+   xor r3,r3,r4
+   srdi    r4,r3,4
+   xor r3,r3,r4
+   clrldi  r3,r3,64-4
+   li  r4,0x6996
+   srd r3,r4,r3
+   clrldi  r3,r3,64-1
+   blr
+FTR_SECTION_ELSE
+  BEGIN_FTR_SECTION_NESTED(50)
+   PPC_POPCNTB(R3,R3)
+   srdi    r4,r3,8
+   add r3,r4,r3
+   clrldi  r3,r3,64-1
+   blr
+  FTR_SECTION_ELSE_NESTED(50)
+   clrlwi  r3,r3,16
+   PPC_POPCNTW(R3,R3)
+   clrldi  r3,r3,64-1
+   blr
+  ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 50)
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
+
+/*
+ * unsigned int __arch_parity32(unsigned int w)
+ */
+_GLOBAL(__arch_parity32)
+BEGIN_FTR_SECTION
+   srdi    r4,r3,16
+   xor r3,r3,r4
+   srdi    r4,r3,8
+   xor r3,r3,r4
+   srdi    r4,r3,4
+   xor r3,r3,r4
+   clrldi  r3,r3,64-4
+   li  r4,0x6996
+   srd r3,r4,r3
+   clrldi  r3,r3,64-1
+   blr
+FTR_SECTION_ELSE
+  BEGIN_FTR_SECTION_NESTED(51)
+   PPC_POPCNTB(R3,R3)
+   srdi    r4,r3,16
+   add r3,r4,r3
+   srdi    r4,r3,8
+   add r3,r4,r3
+   clrldi  r3,r3,64-1
+   blr
+  FTR_SECTION_ELSE_NESTED(51)
+   PPC_POPCNTW(R3,R3)
+   clrldi  r3,r3,64-1
+   blr
+  ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 51)
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
+
+/*
+ * unsigned int __arch_parity64(__u64 w)
+ */
+_GLOBAL(__arch_parity64)
+BEGIN_FTR_SECTION
+   srdi    r4,r3,32
+   xor r3,r3,r4
+   srdi    r4,r3,16
+  
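
For readers less familiar with the assembly, two of the __arch_parity16 code
paths shown above can be modelled in plain C. This is only a sketch:
popcntb_model() is a hypothetical software stand-in for the popcntb
instruction (each result byte holds the bit count of the matching input byte),
and the function names are invented for illustration.

```c
#include <stdint.h>

/* Software model of popcntb: per-byte population count. */
static uint64_t popcntb_model(uint64_t x)
{
    uint64_t result = 0;
    int byte;

    for (byte = 0; byte < 8; byte++) {
        uint64_t b = (x >> (8 * byte)) & 0xff;
        uint64_t n = 0;

        for (; b; b >>= 1)
            n += b & 1;
        result |= n << (8 * byte);
    }
    return result;
}

/* Fallback path: XOR-fold down to a nibble, then index into 0x6996. */
static unsigned int parity16_fold(uint64_t w)
{
    w ^= w >> 8;
    w ^= w >> 4;
    w &= 0xf;
    return (unsigned int)((0x6996u >> w) & 1);
}

/* popcntb path: add the two byte counts and keep the lowest bit. */
static unsigned int parity16_popcntb(uint64_t w)
{
    uint64_t counts = popcntb_model(w);

    counts += counts >> 8;
    return (unsigned int)(counts & 1);
}
```

Both functions should agree for every 16-bit input; only the lowest bit of the
summed byte counts matters, so no masking of the upper bytes is needed.
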

Re: [PATCH 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-13 Thread Viresh Kumar
On 13-04-16, 23:27, Akshay Adiga wrote:
> On 04/13/2016 10:33 AM, Viresh Kumar wrote:
> >>+void gpstate_timer_handler(unsigned long data)
> >>+{
> >>+   struct cpufreq_policy *policy = (struct cpufreq_policy *) data;
> >no need to cast.
> 
> Maybe I need a cast here, because data is unsigned long (unlike other
> places where it's void *).
> Building without the cast throws a warning.

My bad, yeah :(
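
The point about the cast can be shown with a minimal user-space sketch. The
struct policy_model and both handler names are hypothetical; the sketch also
assumes, as the kernel does, that unsigned long is wide enough to hold a
pointer.

```c
/* Hypothetical stand-in for struct cpufreq_policy. */
struct policy_model {
    int cpu;
};

/* Callback receiving an integer: converting to a pointer needs a cast. */
static int handler_ulong(unsigned long data)
{
    struct policy_model *policy = (struct policy_model *)data;

    return policy->cpu;
}

/* Callback receiving void *: C converts it implicitly, no cast needed. */
static int handler_voidp(void *data)
{
    struct policy_model *policy = data;

    return policy->cpu;
}
```

Dropping the cast in handler_ulong() makes the compiler complain about an
integer-to-pointer conversion, which matches the warning mentioned above.
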

> >>+   if (freq_data.gpstate_id != freq_data.pstate_id)
> >>+   ret = queue_gpstate_timer(gpstates);
> >ret not used.
> 
> Should I make it void instead of returning int? I cannot do much even if
> it fails, except for notifying.

Sure.

-- 
viresh

Re: [PATCH kernel 1/2] powerpc/iommu: Get rid of default group_release()

2016-04-13 Thread David Gibson
On Fri, Apr 08, 2016 at 04:36:43PM +1000, Alexey Kardashevskiy wrote:
> IBM PPC IOMMU API users always set IOMMU data and IOMMU release callback
> to an IOMMU group. At the moment the callback clears one pointer in
> iommu_table_group and that's it.
> 
> The platform code calls iommu_group_put() and counts on _put() being
> called last so they check for table_group->group being reset which
> is conceptually wrong as there may be another user holding a reference.
> 
> This removes the default IOMMU group release() callback and adds it
> as a parameter to iommu_register_group(). As we are changing the prototype
> anyway, this also changes the function name to more distinctive
> iommu_register_table_group().
> 
> This should cause no behavioral change as it leaves BUG_ON for IODA2
> (where it was reported) and removes BUG_ON for pseries/IODA1 as they
> do not support IOV anyway and this BUG_ON has never been reported for
> these platforms.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 
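
As a sketch of the idea in the commit message, the following user-space model
shows a caller-supplied release callback that only runs when the last
reference is dropped. All type and function names here are hypothetical
stand-ins and do not reproduce the kernel's IOMMU group API.

```c
#include <stddef.h>

typedef void (*release_fn)(void *iommu_data);

/* Hypothetical model of a reference-counted IOMMU group. */
struct group_model {
    int refs;
    void *iommu_data;
    release_fn release;
};

struct table_group_model {
    struct group_model *group;
};

/* A platform-specific release callback: clears the back pointer. */
static void release_table_group(void *iommu_data)
{
    struct table_group_model *tg = iommu_data;

    tg->group = NULL;
}

/* Registration takes the release callback as a parameter. */
static void register_table_group_model(struct table_group_model *tg,
                                       struct group_model *grp,
                                       release_fn release)
{
    grp->refs = 1;
    grp->iommu_data = tg;
    grp->release = release;
    tg->group = grp;
}

static void group_get_model(struct group_model *grp)
{
    grp->refs++;
}

/* The release callback runs only when the last reference goes away. */
static void group_put_model(struct group_model *grp)
{
    if (--grp->refs == 0 && grp->release)
        grp->release(grp->iommu_data);
}
```

If another user still holds a reference, a single put leaves the back pointer
set, which is why checking it right after calling put is unreliable.
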

> ---
>  arch/powerpc/include/asm/iommu.h          | 12 +++-
>  arch/powerpc/kernel/iommu.c               | 14 --
>  arch/powerpc/platforms/powernv/pci-ioda.c | 15 +++
>  arch/powerpc/platforms/pseries/iommu.c    | 17 +
>  4 files changed, 31 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 7b87bab..d7ba3b4 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -201,17 +201,19 @@ struct iommu_table_group {
>  
>  #ifdef CONFIG_IOMMU_API
>  
> -extern void iommu_register_group(struct iommu_table_group *table_group,
> -  int pci_domain_number, unsigned long pe_num);
> +extern void iommu_register_table_group(struct iommu_table_group *table_group,
> + int pci_domain_number, unsigned long pe_num,
> + void (*release)(void *iommu_data));
>  extern int iommu_add_device(struct device *dev);
>  extern void iommu_del_device(struct device *dev);
>  extern int __init tce_iommu_bus_notifier_init(void);
>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>   unsigned long *hpa, enum dma_data_direction *direction);
>  #else
> -static inline void iommu_register_group(struct iommu_table_group 
> *table_group,
> - int pci_domain_number,
> - unsigned long pe_num)
> +static inline void iommu_register_table_group(
> + struct iommu_table_group *table_group,
> + int pci_domain_number, unsigned long pe_num,
> + void (*release)(void *iommu_data))
>  {
>  }
>  
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index a8e3490..8eed2fa 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -887,15 +887,9 @@ EXPORT_SYMBOL_GPL(iommu_direction_to_tce_perm);
>  /*
>   * SPAPR TCE API
>   */
> -static void group_release(void *iommu_data)
> -{
> - struct iommu_table_group *table_group = iommu_data;
> -
> - table_group->group = NULL;
> -}
> -
> -void iommu_register_group(struct iommu_table_group *table_group,
> - int pci_domain_number, unsigned long pe_num)
> +void iommu_register_table_group(struct iommu_table_group *table_group,
> + int pci_domain_number, unsigned long pe_num,
> + void (*release)(void *iommu_data))
>  {
>   struct iommu_group *grp;
>   char *name;
> @@ -907,7 +901,7 @@ void iommu_register_group(struct iommu_table_group 
> *table_group,
>   return;
>   }
>   table_group->group = grp;
> - iommu_group_set_iommudata(grp, table_group, group_release);
> + iommu_group_set_iommudata(grp, table_group, release);
>   name = kasprintf(GFP_KERNEL, "domain%d-pe%lx",
>   pci_domain_number, pe_num);
>   if (!name)
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c5baaf3..ce9f2bf 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1330,6 +1330,13 @@ static long pnv_pci_ioda2_unset_window(struct 
> iommu_table_group *table_group,
>   int num);
>  static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
>  
> +static void pnv_pci_ioda2_group_release(void *iommu_data)
> +{
> + struct iommu_table_group *table_group = iommu_data;
> +
> + table_group->group = NULL;
> +}
> +
>  static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct 
> pnv_ioda_pe *pe)
>  {
>   struct iommu_table*tbl;
> @@ -1965,8 +1972,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
> *phb,
>   return;
>  
>   tbl = pnv_pci_table_alloc(phb->hose->node);
> - iommu_register_group(&pe->table_group, 

Re: [PATCH kernel 2/2] powerpc/powernv/ioda2: Delay PE disposal

2016-04-13 Thread David Gibson
On Fri, Apr 08, 2016 at 04:36:44PM +1000, Alexey Kardashevskiy wrote:
> When SRIOV is disabled, the existing code presumes there is no
> virtual function (VF) in use and destroys all associated PEs.
> However it is possible to get into the situation when the user
> activated SRIOV disabling while a VF is still in use via VFIO.
> For example, unbinding a physical function (PF) while there is a guest
> running with a VF passed through via VFIO will trigger the bug.
> 
> This defines an IODA2-specific IOMMU group release() callback.
> This moves all the disposal code from pnv_ioda_release_vf_PE() to this
> new callback so the cleanup happens when the last user of an IOMMU
> group released the reference.
> 
> As pnv_pci_ioda2_release_dma_pe() was reduced to just calling
> iommu_group_put(), this merges pnv_pci_ioda2_release_dma_pe()
> into pnv_ioda_release_vf_PE().
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 33 
> +--
>  1 file changed, 14 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index ce9f2bf..8108c54 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1333,27 +1333,25 @@ static void pnv_pci_ioda2_set_bypass(struct 
> pnv_ioda_pe *pe, bool enable);
>  static void pnv_pci_ioda2_group_release(void *iommu_data)
>  {
>   struct iommu_table_group *table_group = iommu_data;
> + struct pnv_ioda_pe *pe = container_of(table_group,
> + struct pnv_ioda_pe, table_group);
> + struct pci_controller *hose = pci_bus_to_host(pe->parent_dev->bus);
> + struct pnv_phb *phb = hose->private_data;
> + struct iommu_table *tbl = pe->table_group.tables[0];
> + int64_t rc;
>  
> - table_group->group = NULL;
> -}
> -
> -static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct 
> pnv_ioda_pe *pe)
> -{
> - struct iommu_table*tbl;
> - int64_t   rc;
> -
> - tbl = pe->table_group.tables[0];
>   rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);

Is it safe to go manipulating the PE windows, etc. after SR-IOV is
disabled?

When SR-IOV is disabled, you need to immediately disable the VF (I'm
guessing that happens somewhere) and stop all access to the VF
"hardware".  Only the iommu group structure *has* to stick around
until the reference count drops to zero.  I think other structures and
hardware reconfiguration can be deferred or done immediately,
whichever is more convenient.
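
To illustrate the point about the group structure having to outlive the
last reference, here is a minimal userspace sketch of a reference-counted
group with a release callback. All names (struct group, group_get,
group_put, pe_release) are hypothetical stand-ins, not the kernel's
iommu_group API; the sketch only shows that teardown placed in release()
runs when the final holder drops its reference, not when the platform
code does.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct iommu_group API. */
struct group {
	int refcount;                  /* number of holders */
	void *data;                    /* opaque per-group data */
	void (*release)(void *data);   /* runs when the last holder leaves */
};

static int pe_freed;   /* set by the release callback below */

/* Release callback: the only place where deferred teardown happens. */
static void pe_release(void *data)
{
	int *flag = data;

	*flag = 1;
}

static void group_get(struct group *g)
{
	g->refcount++;
}

/* Drop one reference; invoke release() only when the count hits zero. */
static void group_put(struct group *g)
{
	if (--g->refcount == 0 && g->release)
		g->release(g->data);
}

/*
 * Model of "SR-IOV is disabled while another user still holds the group".
 * Returns 1 if teardown was deferred until the second put.
 */
static int demo_deferred_release(void)
{
	struct group g = { .refcount = 1, .data = &pe_freed,
			   .release = pe_release };
	int freed_after_platform_put;

	pe_freed = 0;
	group_get(&g);          /* e.g. a VFIO user takes a reference */
	group_put(&g);          /* platform code drops its reference */
	freed_after_platform_put = pe_freed;
	group_put(&g);          /* last user goes away */

	return !freed_after_platform_put && pe_freed;
}
```

Which pieces of the real teardown belong in the callback and which should
run immediately at SR-IOV disable time is exactly the open question in
this review; the sketch does not settle it.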

>   if (rc)
>   pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>  
>   pnv_pci_ioda2_set_bypass(pe, false);
> - if (pe->table_group.group) {
> - iommu_group_put(pe->table_group.group);
> - BUG_ON(pe->table_group.group);
> - }
> +
> + BUG_ON(!tbl);
>   pnv_pci_ioda2_table_free_pages(tbl);
> - iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> + iommu_free_table(tbl, of_node_full_name(pe->parent_dev->dev.of_node));
> +
> + pnv_ioda_deconfigure_pe(phb, pe);
> + pnv_ioda_free_pe(phb, pe->pe_number);
>  }
>  
>  static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
> @@ -1376,16 +1374,13 @@ static void pnv_ioda_release_vf_PE(struct pci_dev 
> *pdev)
>   if (pe->parent_dev != pdev)
>   continue;
>  
> - pnv_pci_ioda2_release_dma_pe(pdev, pe);
> -
>   /* Remove from list */
>   mutex_lock(&phb->ioda.pe_list_mutex);
>   list_del(&pe->list);
>   mutex_unlock(&phb->ioda.pe_list_mutex);
>  
> - pnv_ioda_deconfigure_pe(phb, pe);
> -
> - pnv_ioda_free_pe(phb, pe->pe_number);
> + if (pe->table_group.group)
> + iommu_group_put(pe->table_group.group);
>   }
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Gavin Shan
On Thu, Apr 14, 2016 at 09:57:32AM +1000, Alistair Popple wrote:
>Hi Gavin,
>
>
>
>> >Why exactly cannot EEH reset changes go to a smaller separate patchset
>> >(before hotplug)?
>> >
>> 
>> As I explained before, the patchset's order is: PCI generic part,
>> PowerNV PCI related, EEH related, device-tree part and hotplug driver.
>> 
>> The EEH reset change is included in PATCH[37/45]. There is no point
>> to reorder the patches.
>
>I don't understand all of the dependencies but if possible splitting the 
>series up into a set of smaller self-contained patch series makes things 
>easier to review and may make it easier for you to get this functionality 
>reviewed and accepted into upstream.
>

Thanks, Alistair. I will move the cleanup/refactoring patches into a
separate series that is expected to be merged first. That will help the
reviewers focus on the patches with complicated changes, as you
suggested. Alexey, please let me know whether that is what you would
like to see.

Thanks,
Gavin

>Regards,
>
>Alistair
>


Re: [PATCH v2 0/2] perf probe fixes for ppc64le

2016-04-13 Thread Balbir Singh


On 12/04/16 19:10, Naveen N. Rao wrote:
> This patchset fixes three issues found with perf probe on ppc64le:
> 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> Ellerman). This was due to the symbols being fixed up during symbol
> table load. This is fixed in patch 2 by delaying symbol fixup until
> later.
> 2. perf probe function offset was being calculated from the local entry
> point (LEP), which does not match user expectation when trying to look
> at function disassembly output (reported by Ananth N). This is fixed for
> kallsyms in patch 1 and for symbol table in patch 2.
> 3. perf probe failure with kretprobe when using kallsyms. This was
> failing as we were specifying an offset. This is fixed in patch 1.
> 
> A few examples demonstrating the issues and the fix:
> 

Given the choices, I think this makes sense

Acked-by: Balbir Singh 

Re: [PATCH] powerpc/kprobes: Remove kretprobe_trampoline_holder.

2016-04-13 Thread Thiago Jung Bauermann
Hello,

People seem to be considering patches for next, so this looks like a good 
moment to ping about this one.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center



Am Donnerstag, 31 März 2016, 17:10:40 schrieb Thiago Jung Bauermann:
> Fixes the following testsuite failure:
> 
>   $ sudo ./perf test -v kallsyms
>1: vmlinux symtab matches kallsyms  :
>   --- start ---
>   test child forked, pid 12489
>   Using /proc/kcore for kernel object code
>   Looking at the vmlinux_path (8 entries long)
>   Using /boot/vmlinux for symbols
>   0xc003d300: diff name v: .kretprobe_trampoline_holder k: kretprobe_trampoline
>   Maps only in vmlinux:
>c086ca38-c0879b6c 87ca38 [kernel].text.unlikely
>c0879b6c-c0bf 889b6c [kernel].meminit.text
>c0bf-c0c53264 c0 [kernel].init.text
>c0c53264-d425 c63264 [kernel].exit.text
>d425-d445 0 [libcrc32c]
>d445-d462 0 [xfs]
>d462-d468 0 [autofs4]
>d468-d46e 0 [x_tables]
>d46e-d478 0 [ip_tables]
>d478-d47e 0 [rng_core]
>d47e- 0 [pseries_rng]
>   Maps in vmlinux with a different name in kallsyms:
>   Maps only in kallsyms:
>d000-f000 1001 [kernel.kallsyms]
>f000- 3001 [kernel.kallsyms]
>   test child finished with -1
>   ---- end ----
>   vmlinux symtab matches kallsyms: FAILED!
> 
> The problem is that the kretprobe_trampoline symbol looks like this:
> 
>   $ eu-readelf -s /boot/vmlinux G kretprobe_trampoline
>    2431: c1302368 24 NOTYPE  LOCAL  DEFAULT   37 kretprobe_trampoline_holder
>    2432: c003d300  8 FUNC    LOCAL  DEFAULT    1 .kretprobe_trampoline_holder
>   97543: c003d300  0 NOTYPE  GLOBAL DEFAULT    1 kretprobe_trampoline
> 
> Its type is NOTYPE, and its size is 0, and this is a problem because
> symbol-elf.c:dso__load_sym skips function symbols that are not STT_FUNC
> or STT_GNU_IFUNC (this is determined by elf_sym__is_function). Even
> if the type is changed to STT_FUNC, when dso__load_sym calls
> symbols__fixup_duplicate, the kretprobe_trampoline symbol is dropped in
> favour of .kretprobe_trampoline_holder because the latter has non-zero
> size (as determined by choose_best_symbol).
> 
> With this patch, all vmlinux symbols match /proc/kallsyms and the
> testcase passes.
> 
> Commit c1c355ce14c0 ("x86/kprobes: Get rid of
> kretprobe_trampoline_holder()") gets rid of kretprobe_trampoline_holder
> altogether on x86. This commit does the same on powerpc. This change
> introduces no regressions on the perf and ftracetest testsuite results.
> 
> Cc: Ananth N Mavinakayanahalli 
> Cc: Michael Ellerman 
> Reviewed-by: Naveen N. Rao 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/kernel/kprobes.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 7c053f281406..417c0eadd094 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -278,12 +278,11 @@ no_kprobe:
>   *   - When the probed function returns, this probe
>   *   causes the handlers to fire
>   */
> -static void __used kretprobe_trampoline_holder(void)
> -{
> - asm volatile(".global kretprobe_trampoline\n"
> - "kretprobe_trampoline:\n"
> - "nop\n");
> -}
> +asm(".global kretprobe_trampoline\n"
> + ".type kretprobe_trampoline, @function\n"
> + "kretprobe_trampoline:\n"
> + "nop\n"
> + ".size kretprobe_trampoline, .-kretprobe_trampoline\n");
> 
>  /*
>   * Called when the probe at kretprobe trampoline is hit


Re: [PATCH] ftrace: filter: Match dot symbols when searching functions on ppc64.

2016-04-13 Thread Thiago Jung Bauermann
Hello,

Am Freitag, 01 April 2016, 18:28:06 schrieb Thiago Jung Bauermann:
> Am Samstag, 02 April 2016, 03:51:21 schrieb kbuild test robot:
> > >> arch/powerpc/include/asm/ftrace.h:62:5: error: "CONFIG_PPC64" is not
> > >> defined [-Werror=undef]
> > >> 
> > #if CONFIG_PPC64 && (!defined(_CALL_ELF) || _CALL_ELF != 2)
> > 
> > ^
> >
> >cc1: all warnings being treated as errors
> 
> I forgot to use defined() in the #if expression. Here’s the fixed version.

People seem to be considering patches for next, so this looks like a good 
moment to ping about this one.

Ps: patchwork seems to have an issue which causes it to show the message 
body as if it were the commit message, but if you feed my original email 
(the one I’m replying to here) to git am, the commit message will be 
correct.
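
For readers following the build error quoted above, this is a small
sketch of the defined() form of such a check. CONFIG_DEMO_PPC64 and
DEMO_CALL_ELF are made-up names standing in for the real macros, so the
snippet builds anywhere; it is not the actual ftrace.h hunk.

```c
/*
 * With -Wundef (and -Werror), "#if CONFIG_FOO" is diagnosed when
 * CONFIG_FOO is not defined at all, which is the error the test robot
 * reported. "#if defined(CONFIG_FOO)" is well formed either way.
 * CONFIG_DEMO_PPC64 and DEMO_CALL_ELF are made-up names for this sketch.
 */
#if defined(CONFIG_DEMO_PPC64) && \
	(!defined(DEMO_CALL_ELF) || DEMO_CALL_ELF != 2)
#define DEMO_USES_DOT_SYMBOLS 1
#else
#define DEMO_USES_DOT_SYMBOLS 0
#endif

/* Reports which branch of the preprocessor check was taken. */
static int demo_uses_dot_symbols(void)
{
	return DEMO_USES_DOT_SYMBOLS;
}
```

With neither macro defined, the check evaluates to false and the function
returns 0.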

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Alistair Popple
Hi Gavin,



> >Why exactly cannot EEH reset changes go to a smaller separate patchset
> >(before hotplug)?
> >
> 
> As I explained before, the patchset's order is: PCI generic part,
> PowerNV PCI related, EEH related, device-tree part and hotplug driver.
> 
> The EEH reset change is included in PATCH[37/45]. There is no point
> to reorder the patches.
 
I don't understand all of the dependencies but if possible splitting the 
series up into a set of smaller self-contained patch series makes things 
easier to review and may make it easier for you to get this functionality 
reviewed and accepted into upstream.

Regards,

Alistair


Re: [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE

2016-04-13 Thread Gavin Shan
On Wed, Apr 13, 2016 at 06:29:42PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>Currently, there is one macro (TCE32_TABLE_SIZE) representing the
>>TCE table size for one DMA32 segment. The constant representing
>>the DMA32 segment size (1 << 28) is still used in the code.
>>
>>This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
>>segment size. the TCE table size can be calcualted when the page
>
>s/calcualted/calculated/
>
>
>>has fixed 4KB size. So all the related calculation depends on one
>>macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.
>
>Please move PNV_IODA1_DMA32_SEGSIZE where TCE32_TABLE_SIZE was.
>
>
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 30 
>> +-
>>  arch/powerpc/platforms/powernv/pci.h  |  1 +
>>  2 files changed, 18 insertions(+), 13 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index d18b95e..e60cff6 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -48,9 +48,6 @@
>>  #include "powernv.h"
>>  #include "pci.h"
>>
>>-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>-#define TCE32_TABLE_SIZE ((0x10000000 / 0x1000) * 8)
>>-
>>  #define POWERNV_IOMMU_DEFAULT_LEVELS1
>>  #define POWERNV_IOMMU_MAX_LEVELS5
>>
>>@@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
>>*phb,
>>
>>  struct page *tce_mem = NULL;
>>  struct iommu_table *tbl;
>>- unsigned int i;
>>+ unsigned int tce32_segsz, i;
>
>
>PNV_IODA1_DMA32_SEGSIZE is a segment size in bytes. The name @tce32_segsz
>also suggests that it is a segment size in bytes (otherwise it would be
>tce32_seg_entries or something like this) but it is not, it is a number of
>TCE entries (arch/powerpc/kernel/iommu.c uses "entry" for these). And
>tce32_segsz never changes. So:
>
>const unsigned int entries = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K
>- 3);
>

Are you sure @tce32_segsz and the equation you gave are the number of TCE
entries, not the size of memory required for the DMA32 segment TCE table?
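
The arithmetic behind this question can be worked through in a few lines.
This is only a sketch under the values stated in the patch (256MB
segment, 4KB TCE pages, 8-byte TCE entries); the DEMO_* names are
placeholders, not kernel macros.

```c
#include <stdint.h>

#define DEMO_DMA32_SEGSIZE   (UINT64_C(1) << 28)  /* 256MB segment */
#define DEMO_PAGE_SHIFT_4K   12                   /* 4KB TCE pages */
#define DEMO_TCE_ENTRY_BYTES 8                    /* 8 bytes per TCE entry */

/* Number of TCE entries needed to map one segment with 4KB pages. */
static uint64_t demo_seg_entries(void)
{
	return DEMO_DMA32_SEGSIZE >> DEMO_PAGE_SHIFT_4K;
}

/* Bytes of TCE table per segment: entries * 8, i.e. SEGSIZE >> (12 - 3). */
static uint64_t demo_seg_table_bytes(void)
{
	return DEMO_DMA32_SEGSIZE >> (DEMO_PAGE_SHIFT_4K - 3);
}
```

Under these assumptions the shift by (12 - 3) yields 524288, which is a
byte count (65536 entries times 8 bytes), not an entry count.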

>>  int64_t rc;
>>  void *addr;
>>
>>@@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
>>*phb,
>>  /* Grab a 32-bit TCE table */
>>  pe->tce32_seg = base;
>>  pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
>>- (base << 28), ((base + segs) << 28) - 1);
>>+ base * PNV_IODA1_DMA32_SEGSIZE,
>>+ (base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>>
>>  /* XXX Currently, we allocate one big contiguous table for the
>>   * TCEs. We only really need one chunk per 256M of TCE space
>>   * (ie per segment) but that's an optimization for later, it
>>   * requires some added smarts with our get/put_tce implementation
>>+  *
>>+  * Each TCE page is 4KB in size and each TCE entry occupies 8
>>+  * bytes
>>   */
>>+ tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);
>
>>  tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
>>-get_order(TCE32_TABLE_SIZE * segs));
>>+get_order(tce32_segsz * segs));
>>  if (!tce_mem) {
>>  pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
>>  goto fail;
>>  }
>>  addr = page_address(tce_mem);
>>- memset(addr, 0, TCE32_TABLE_SIZE * segs);
>>+ memset(addr, 0, tce32_segsz * segs);
>>
>>  /* Configure HW */
>>  for (i = 0; i < segs; i++) {
>>  rc = opal_pci_map_pe_dma_window(phb->opal_id,
>>pe->pe_number,
>>base + i, 1,
>>-   __pa(addr) + TCE32_TABLE_SIZE * i,
>>-   TCE32_TABLE_SIZE, 0x1000);
>>+   __pa(addr) + tce32_segsz * i,
>>+   tce32_segsz, 0x1000);
>
>
>As you started using IOMMU_PAGE_SHIFT_4K and you are also touching this piece
>of code -
>
>s/0x1000/IOMMU_PAGE_SHIFT_4K/
>

Is 0x1000 equal to IOMMU_PAGE_SHIFT_4K? I guess you meant to suggest
IOMMU_PAGE_SIZE_4K instead?
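
A tiny sketch of the shift/size distinction, assuming the 4K shift is 12
as the "4K TCE pages" comment implies. The DEMO_* names are placeholders
mirroring, not copying, the kernel macros.

```c
/* Assumed values: a 4KB page has shift 12 and size 1 << 12 = 0x1000. */
#define DEMO_PAGE_SHIFT_4K 12
#define DEMO_PAGE_SIZE_4K  (1UL << DEMO_PAGE_SHIFT_4K)

/* The page size in bytes; this is what a literal 0x1000 corresponds to. */
static unsigned long demo_page_size_4k(void)
{
	return DEMO_PAGE_SIZE_4K;
}

/* The shift; substituting it for 0x1000 would change the value passed. */
static int demo_page_shift_4k(void)
{
	return DEMO_PAGE_SHIFT_4K;
}
```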

>>  if (rc) {
>>  pe_err(pe, " Failed to configure 32-bit TCE table,"
>> " err %ld\n", rc);
>>@@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
>>*phb,
>>  }
>>
>>  /* Setup linux iommu table */
>>- pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
>>-   base << 28, IOMMU_PAGE_SHIFT_4K);
>>+ pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
>>+   base * PNV_IODA1_DMA32_SEGSIZE,
>>+  

Re: [PATCH 1/3] powerpc: Complete FSCR context switch

2016-04-13 Thread Michael Neuling
On Wed, 2016-04-13 at 12:52 -0500, Jack Miller wrote:

> Hi Anton.
> 
> On Wed, Apr 13, 2016 at 5:51 AM, Anton Blanchard  wrote:

> > Hi Jack,
> > 

> > > Previously we just saved the FSCR, but only restored it in some
> > > settings, and never copied it thread to thread. This patch always
> > > restores the FSCR and formalizes new threads inheriting its setting so
> > > that later we can manipulate FSCR bits in start_thread.
> > 
> > Will this break the existing FSCR_DSCR bit handling?
> > 
> >  if (cpu_has_feature(CPU_FTR_DSCR)) {
> > u64 dscr = get_paca()->dscr_default;
> > u64 fscr = old_thread->fscr & ~FSCR_DSCR;
> > 
> > if (new_thread->dscr_inherit) {
> > dscr = new_thread->dscr;
> > fscr |= FSCR_DSCR;
> > }
> > 
> > if (old_thread->dscr != dscr)
> > mtspr(SPRN_DSCR, dscr);
> > 
> > if (old_thread->fscr != fscr)
> > mtspr(SPRN_FSCR, fscr);
> > }
> > 
> > If not, we should modify the above so we don't write the FSCR twice.
> 
> I think this code is just partially redundant. I think it's trying to
> predict the right FSCR value based on this dscr_inherit flag. Now that
> we fully switch it, we could skip setting FSCR here, and just set DSCR
> if FSCR.DSCR is set (similar to what my patches do with FSCR.LM). In
> fact, we might be able to just entirely get rid of the dscr_inherit
> flag, but I'd have to look harder at that.

I'm not sure that works on processors before POWER8.

There DSCR SPR number 0x11 will always trap and emulate from userspace
(see arch/powerpc/kernel/traps.c:emulate_instruction()).  That is not
controlled by FSCR and should work on POWER7 where FSCR is not
present.  We need to set the inherit bit there too.

DSCR SPR number 0x3 is controlled by FSCR, but it's only available on
POWER8.

> Right now the FSCR switch is conditional on FTR_ARCH_207S which is
> more exclusive than FTR_DSCR, but I guess the actual FSCR register is
> universal to PPC64 like the fscr field in the thread struct? If so, I
> can just move the FSCR save/restore out of the 207 conditional.

FSCR was only introduced in POWER8, so it needs to be 207 conditional.

Mikey

> 
> Anyway, I'll clean this up a bit, add the little asm tweak from
> Segher, and put another spin on the list.
> 
> - Jack
> 

Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Gavin Shan
On Wed, Apr 13, 2016 at 07:14:59PM +1000, Alexey Kardashevskiy wrote:
>On 04/13/2016 05:42 PM, Gavin Shan wrote:
>>On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:
>>>On 02/17/2016 02:43 PM, Gavin Shan wrote:
This series of patches rebases on powerpc/next branch, plus below additional
patches:



https://patchwork.ozlabs.org/patch/581315/  (PATCH[1/9] Richard's SRIOV EEH)
https://patchwork.ozlabs.org/patch/582639/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/582093/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/580626/  (PATCH[1/4] Gavin's PCI fix)
https://patchwork.ozlabs.org/patch/580153/  (PATCH[1/1] Andrew's EEH minor fix)
https://patchwork.ozlabs.org/patch/566827/  (PATCH[1/1] Russell's P5IOC2 removal)
https://patchwork.ozlabs.org/patch/534154/  (PATCH[1/7] Richard's SRIOV rework)
commit 388f7b1 ("Linux 4.5-rc3")

The series of patches intends to support PCI slots on the PowerPC PowerNV
platform, which runs on top of skiboot firmware. The patchset requires
corresponding changes in skiboot firmware, which have been sent to
skib...@lists.ozlabs.org for review. The PCI slots are exposed by skiboot
through device node properties, and the kernel uses those properties to
populate PCI slots accordingly.

The original PCI infrastructure on the PowerNV platform can't support
hotplug because the PE is assigned during PHB fixup time, which is called
once during system boot. For this, the PCI infrastructure on the PowerNV
platform has been reworked extensively. After that, the PE and its
corresponding resources (IODT, M32DT, M64 segments, DMA32 and bypass
window) are assigned upon updating the PCI bridge's resources, which might
decide the PE# assigned to the PE (e.g. M64 resources, on P8 strictly
speaking). Each PE maintains a reference count, which is (number of child
PCI devices + 1). That means that when the last child PCI device leaves the
PE, the PE and its included resources are released and put back into the
free pool. With this design, the PE will be released when the EEH PE is
released. PATCH[1 - 23] are related to this part.

From skiboot's perspective, a PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns whether skiboot supports the various
resets on a particular PCI slot through the device-tree node. If it does,
EEH will use the functionality provided by skiboot. Besides, the
device-tree nodes have to change in order to support PCI hotplug. For
example, when a PCI adapter is inserted into a slot, its device-tree node
should be added to the system dynamically. Conversely, the device-tree
node should be removed from the system when the PCI adapter is going
offline. Since pci_dn and eeh_dev have the same life cycle as PCI device
nodes, they should be added/removed accordingly during PCI hotplug.
PATCH[24 - 39] do the related work.

The OF driver is changed to support unflattening the FDT blob for a
sub-tree, which is covered by PATCH[40 - 44].

The last one, PATCH[45], is the standalone PCI hotplug driver for the
PowerPC PowerNV platform.

===
Testing
===
1. Unplug adapters behind non-empty slot, then plug them.

1.1 Check status
# cat /sys/bus/pci/slots/C10/address
0003:09:00
# cat /sys/bus/pci/slots/C10/adapter
1
# cat /sys/bus/pci/slots/C10/power
1
# lspci
0003:09:00.0 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |   +-09.0-[09]--+-00.0
 |   |+-00.1
 |   |+-00.2
 |   |\-00.3
 |   +-10.0-[0a-0e]--
 |   \-11.0-[0f-13]--

1.2 Unplug adapter 0003:09.00.x
# echo 0 > /sys/bus/pci/slots/C10/power
# lspci 

Re: [PATCH 1/3] powerpc: Complete FSCR context switch

2016-04-13 Thread Jack Miller
Hi Anton.

On Wed, Apr 13, 2016 at 5:51 AM, Anton Blanchard  wrote:
> Hi Jack,
>
>> Previously we just saved the FSCR, but only restored it in some
>> settings, and never copied it thread to thread. This patch always
>> restores the FSCR and formalizes new threads inheriting its setting so
>> that later we can manipulate FSCR bits in start_thread.
>
> Will this break the existing FSCR_DSCR bit handling?
>
>  if (cpu_has_feature(CPU_FTR_DSCR)) {
> u64 dscr = get_paca()->dscr_default;
> u64 fscr = old_thread->fscr & ~FSCR_DSCR;
>
> if (new_thread->dscr_inherit) {
> dscr = new_thread->dscr;
> fscr |= FSCR_DSCR;
> }
>
> if (old_thread->dscr != dscr)
> mtspr(SPRN_DSCR, dscr);
>
> if (old_thread->fscr != fscr)
> mtspr(SPRN_FSCR, fscr);
> }
>
> If not, we should modify the above so we don't write the FSCR twice.

I think this code is just partially redundant. I think it's trying to
predict the right FSCR value based on this dscr_inherit flag. Now that
we fully switch it, we could skip setting FSCR here, and just set DSCR
if FSCR.DSCR is set (similar to what my patches do with FSCR.LM). In
fact, we might be able to just entirely get rid of the dscr_inherit
flag, but I'd have to look harder at that.

Right now the FSCR switch is conditional on FTR_ARCH_207S which is
more exclusive than FTR_DSCR, but I guess the actual FSCR register is
universal to PPC64 like the fscr field in the thread struct? If so, I
can just move the FSCR save/restore out of the 207 conditional.

Anyway, I'll clean this up a bit, add the little asm tweak from
Segher, and put another spin on the list.

- Jack

Re: [PATCH 2/3] powerpc: Load Monitor Register Support

2016-04-13 Thread Jack Miller
Thanks, yeah, that's more readable and more correct. I'll change it in
the next spin.

- Jack

On Tue, Apr 12, 2016 at 12:40 AM, Segher Boessenkool
 wrote:
> Hi,
>
> On Mon, Apr 11, 2016 at 01:57:44PM -0500, Jack Miller wrote:
>>  __init_FSCR:
>>   mfspr   r3,SPRN_FSCR
>> + andi.   r3,r3,(~FSCR_LM)@L
>>   ori r3,r3,FSCR_TAR|FSCR_DSCR|FSCR_EBB
>>   mtspr   SPRN_FSCR,r3
>>   blr
>
> This clears the top 48 bits as well.  Shouldn't matter currently; but
> more robust (and easier to read, if you know the idiom) is
>
> ori r3,r3,FSCR_LM|FSCR_TAR|FSCR_DSCR|FSCR_EBB
> xorir3,r3,FSCR_LM
>
>
> Segher
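
The difference between the two instruction sequences can be modelled in
plain C. This is a sketch only: the FSCR bit values below are assumptions
chosen for the example, and the functions imitate the instructions'
effect on a 64-bit register instead of being real assembly. The key
property modelled is that andi. ANDs with a zero-extended 16-bit
immediate.

```c
#include <stdint.h>

/* Bit values are assumptions for this sketch, not taken from the thread. */
#define DEMO_FSCR_DSCR UINT64_C(0x004)
#define DEMO_FSCR_EBB  UINT64_C(0x080)
#define DEMO_FSCR_TAR  UINT64_C(0x100)
#define DEMO_FSCR_LM   UINT64_C(0x800)
#define DEMO_SET_BITS  (DEMO_FSCR_TAR | DEMO_FSCR_DSCR | DEMO_FSCR_EBB)

/* andi. takes a 16-bit immediate that is zero-extended before the AND. */
static uint64_t demo_andi_then_ori(uint64_t fscr)
{
	uint64_t imm = (~DEMO_FSCR_LM) & UINT64_C(0xffff);  /* the @l half */

	fscr &= imm;               /* also clears bits 16..63 */
	return fscr | DEMO_SET_BITS;
}

/* Set LM together with the wanted bits, then flip LM back off. */
static uint64_t demo_ori_then_xori(uint64_t fscr)
{
	fscr |= DEMO_FSCR_LM | DEMO_SET_BITS;
	return fscr ^ DEMO_FSCR_LM;
}
```

Feeding both functions a value with high bits set shows the first one
dropping them while the second one preserves them; both leave the LM bit
clear.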

Re: [PATCH] powerpc: introduce {cmp}xchg for u8 and u16

2016-04-13 Thread Waiman Long

On 04/13/2016 07:15 AM, Pan Xinhui wrote:

Hello Peter,

On 2016年04月12日 22:30, Peter Zijlstra wrote:


I am working on the qspinlock implementation on PPC.
Your and Waiman's patches are so nice. :)

Thanks!, last time I looked at PPC spinlocks they could not use things
like ticket locks because PPC might be a guest and fairness blows etc..

You're making the qspinlock-paravirt thing work on PPC, or doing
qspinlock only for bare-metal PPC?


I am making both work. :)
qspinlock works on PPC now. I am preparing the patches and will send them out
in the coming weeks :)


What kind of performance improvement are you seeing on PPC?


The paravirt work is a little hard.
Currently there are pv_wait() and pv_kick(), but only pv_kick() has the cpu
parameter (the CPU that will hold the lock as soon as it is unlocked).
We need a cpu parameter (the CPU that holds the lock now) in pv_wait() too.


That may be doable to a certain extent. However, if the current lock
holder acquired the lock via the fastpath, the CPU information is
not logged anywhere. For a contended lock, the information should be there.


Cheers,
Longman

Re: Live patching for powerpc

2016-04-13 Thread Jessica Yu

+++ Miroslav Benes [13/04/16 15:01 +0200]:

On Wed, 13 Apr 2016, Michael Ellerman wrote:


This series adds live patching support for powerpc (ppc64le only ATM).

It's unchanged since the version I posted on March 24, with the exception that
I've dropped the first patch, which was a testing-only patch.

If there's no further comments I'll put this in a topic branch in the next day
or two and Jiri & I will both merge that into next.


Hi,

I'll definitely give it a proper look today or tomorrow, but there is one
thing that needs to be solved. The patch set from Jessica reworking
relocations for live patching is now merged in our for-next branch. This
means that we need to find out if there is something in struct
mod_arch_specific for powerpc which needs to be preserved and do it.



I took a look around the powerpc module.c code and it looks like the
mod_arch_specific stuff should be fine, since it is statically allocated
in the module struct (unlike the situation in s390, where
mod->arch.syminfo was vmalloc'd and we had to delay the free).
However I'm not familiar with the powerpc code so I need to dig around
a bit more to be 100% sure.

A second concern I have is that apply_relocate_add() relies on
sections like .stubs and .toc (for 64-bit) and .init.plt and .plt
sections (for 32-bit). In order for apply_relocate_add() to work for
livepatch, we must make sure these sections aren't thrown away and are
not in init module memory since this memory will be freed at the end
of module load (see how INIT_OFFSET_MASK is used in kernel/module.c).
As long as these sections are placed in module core memory, we will be
OK. I need to think about this a bit more.

Third and unrelated comment: the klp_write_module_reloc stub isn't
needed anymore :-)

Thanks,
Jessica

Re: [PATCH 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-13 Thread Akshay Adiga

Hi Viresh ,

Thanks for reviewing in detail.
I will correct all comments related to coding standards in my next patch.

On 04/13/2016 10:33 AM, Viresh Kumar wrote:


Comments mostly on the coding standards which you have *not* followed.

Also, please run checkpatch --strict next time you send patches
upstream.


Thanks for pointing out the --strict option, was not aware of that. I will
run checkpatch --strict on the next versions.


On 12-04-16, 23:36, Akshay Adiga wrote:
+
+/*
+ * While resetting we don't want "timer" fields to be set to zero as we
+ * may lose track of timer and will not be able to cleanly remove it
+ */
+#define reset_gpstates(policy)   memset(policy->driver_data, 0,\
+   sizeof(struct global_pstate_info)-\
+   sizeof(struct timer_list)-\
+   sizeof(spinlock_t))
That's super *ugly*. Why don't you create a simple routine which will
set the 5 integer variables to 0 in a straight forward way ?


Yes, I will create a routine.
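
A minimal sketch of the kind of routine being requested, assuming the five
integer fields are the ones referenced elsewhere in the quoted patch
(highest_lpstate, elapsed_time, last_sampled_time, last_lpstate,
last_gpstate). The struct below is a user-space stand-in, not the real
struct global_pstate_info; the member types and the two placeholder members
are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct global_pstate_info; member types are assumed. */
struct gpstate_info_sketch {
	int highest_lpstate;
	unsigned int elapsed_time;
	unsigned int last_sampled_time;
	int last_lpstate;
	int last_gpstate;
	long timer_placeholder;	/* stands in for struct timer_list */
	long lock_placeholder;	/* stands in for spinlock_t */
};

/* Clear only the bookkeeping fields; the timer and the lock stay untouched. */
static void reset_gpstates_sketch(struct gpstate_info_sketch *gpstates)
{
	assert(gpstates != NULL);

	gpstates->highest_lpstate = 0;
	gpstates->elapsed_time = 0;
	gpstates->last_sampled_time = 0;
	gpstates->last_lpstate = 0;
	gpstates->last_gpstate = 0;
}
```

Unlike the memset-based macro, this does not depend on the timer and the
lock being the last members of the structure.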


@@ -348,14 +395,17 @@ static void set_pstate(void *freq_data)
unsigned long val;
unsigned long pstate_ul =
((struct powernv_smp_call_data *) freq_data)->pstate_id;
+   unsigned long gpstate_ul =
+   ((struct powernv_smp_call_data *) freq_data)->gpstate_id;

Remove these unnecessary casts and do:

struct powernv_smp_call_data *freq_data = data; //Name func arg as data

And then use freq_data->*.


Ok. Will do that.


+/*
+ * gpstate_timer_handler
+ *
+ * @data: pointer to cpufreq_policy on which timer was queued
+ *
+ * This handler brings down the global pstate closer to the local pstate
+ * according quadratic equation. Queues a new timer if it is still not equal
+ * to local pstate
+ */
+void gpstate_timer_handler(unsigned long data)
+{
+   struct cpufreq_policy *policy = (struct cpufreq_policy *) data;

no need to cast.


Maybe I do need a cast here, because data is an unsigned long (unlike the
other places, where it is a void *).
Building without the cast produces a warning.
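
The point about the cast can be sketched in user space as follows. The names
policy_sketch and timer_handler_sketch are hypothetical, and the sketch
assumes that unsigned long is as wide as a pointer (true in the kernel and on
LP64 systems).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct cpufreq_policy; only one illustrative member. */
struct policy_sketch {
	int cpu;
};

static int last_cpu_seen = -1;

/*
 * Handler with an integer "data" argument, as in the quoted patch.
 * Converting an integer to a pointer needs an explicit cast; a void *
 * argument would convert implicitly.
 */
static void timer_handler_sketch(unsigned long data)
{
	struct policy_sketch *policy = (struct policy_sketch *)(uintptr_t)data;

	assert(policy != NULL);
	last_cpu_seen = policy->cpu;
}
```

The caller passes the pointer as `(unsigned long)&policy`, mirroring the
`timer.data = (unsigned long) policy` assignment in the patch.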


+   struct global_pstate_info *gpstates = (struct global_pstate_info *)
+   struct powernv_smp_call_data freq_data;
+   int ret;
+
+   ret = spin_trylock(&gpstates->gpstate_lock);

no need of 'ret' for just this, simply do: if (!spin_trylock())...


Sure will do that.



+   if (!ret)
+   return;
+
+   gpstates->last_sampled_time += time_diff;
+   gpstates->elapsed_time += time_diff;
+   freq_data.pstate_id = gpstates->last_lpstate;
+   if ((gpstates->last_gpstate == freq_data.pstate_id) ||
+   (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME)) {
+   freq_data.gpstate_id = freq_data.pstate_id;
+   reset_gpstates(policy);
+   gpstates->highest_lpstate = freq_data.pstate_id;
+   } else {
+   freq_data.gpstate_id = calculate_global_pstate(

You can't break a line after ( of a function call :)

Let it go beyond 80 columns if it has to.


Maybe I will try to keep it within 80 columns by using a temporary variable
instead of freq_data.gpstate_id.


+   gpstates->elapsed_time, gpstates->highest_lpstate,
+   freq_data.pstate_id);
+   }
+
+   /* If local pstate is equal to global pstate, rampdown is over

Bad style again.


+* So timer is not required to be queued.
+*/
+   if (freq_data.gpstate_id != freq_data.pstate_id)
+   ret = queue_gpstate_timer(gpstates);

ret not used.


Should I make it return void instead of int? I cannot do much even if it
fails, except for notifying.


+gpstates_done:
+   gpstates->last_sampled_time = cur_msec;
+   gpstates->last_gpstate = freq_data.gpstate_id;
+   gpstates->last_lpstate = freq_data.pstate_id;
+
/*
 * Use smp_call_function to send IPI and execute the
 * mtspr on target CPU.  We could do that without IPI
 * if current CPU is within policy->cpus (core)
 */
smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
+   spin_unlock_irqrestore(&gpstates->gpstate_lock, flags);
+   return 0;
+}
  
+static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)

Add this after the init() routine.


Ok will do it.


+   policy->driver_data = gpstates;
+
+   /* initialize timer */
+   init_timer_deferrable(&gpstates->timer);
+   gpstates->timer.data = (unsigned long) policy;
+   gpstates->timer.function = gpstate_timer_handler;
+   gpstates->timer.expires = jiffies +
+   msecs_to_jiffies(GPSTATE_TIMER_INTERVAL);
+
+   pr_info("Added global_pstate_info & timer for %d cpu\n", base);
return cpufreq_table_validate_and_show(policy, powernv_freqs);

Who will free gpstates if this fails ?


Thanks for pointing out. Will fix in v2.
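
One way the leak raised above could be avoided, sketched in user space.
validate_table_sketch() is a hypothetical stand-in for
cpufreq_table_validate_and_show(), and calloc()/free() stand in for the
kernel allocator; this is not the v2 patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct gpstate_info_sketch {
	int last_gpstate;
};

struct policy_sketch {
	void *driver_data;
};

/* Hypothetical validation step: returns 0 or a negative errno value. */
static int validate_table_sketch(int should_fail)
{
	return should_fail ? -EINVAL : 0;
}

static int cpu_init_sketch(struct policy_sketch *policy, int should_fail)
{
	struct gpstate_info_sketch *gpstates;
	int ret;

	assert(policy != NULL);

	gpstates = calloc(1, sizeof(*gpstates));
	if (!gpstates)
		return -ENOMEM;

	policy->driver_data = gpstates;

	ret = validate_table_sketch(should_fail);
	if (ret) {
		/* Undo the allocation so that a failed init does not leak it. */
		policy->driver_data = NULL;
		free(gpstates);
	}

	return ret;
}
```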

Regards
Akshay Adiga


Re: [PATCH V2 1/2] devicetree/bindings: Add binding for operator panel on FSP machines

2016-04-13 Thread Rob Herring
On Tue, Apr 12, 2016 at 11:05:06AM +1000, Suraj Jitindar Singh wrote:
> Add a binding to Documentation/devicetree/bindings/powerpc/opal
> (oppanel-opal.txt) for the operator panel which is present on IBM
> pseries machines with FSPs.
> 
> Signed-off-by: Suraj Jitindar Singh 
> ---
>  .../devicetree/bindings/powerpc/opal/oppanel-opal.txt  | 14 ++
>  1 file changed, 14 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt

Acked-by: Rob Herring 

Re: cxl: Delete an unnecessary check before the function call "kfree"

2016-04-13 Thread Michael Ellerman
On Fri, 2015-06-11 at 10:05:46 UTC, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Fri, 6 Nov 2015 11:00:23 +0100
> 
> The kfree() function tests whether its argument is NULL and then
> returns immediately. Thus the test around the call is not needed.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> Reviewed-by: Andrew Donnellan 
> Acked-by: Ian Munsie 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1050e689a63baffdadcd33498c

cheers

Re: [1/2] powerpc: sparse: static-ify some things

2016-04-13 Thread Michael Ellerman
On Wed, 2016-06-01 at 00:45:50 UTC, Daniel Axtens wrote:
> As sparse suggests, these should be made static.
> 
> Signed-off-by: Daniel Axtens 
> Reviewed-by: Andrew Donnellan 
> Reviewed-by: Stewart Smith 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/635218c785bef355bc8266a1fd

cheers

Re: [RESEND, 1/2] drivers: macintosh: rack-meter: limit idle ticks to total ticks

2016-04-13 Thread Michael Ellerman
On Sun, 2016-10-04 at 19:53:47 UTC, Aaro Koskinen wrote:
> Limit idle ticks to total ticks. This prevents the annoying rackmeter
> leds fully ON / OFF blinking state that happens on fully idling
> G5 Xserve systems.
> 
> Signed-off-by: Aaro Koskinen 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c796d1d97c3035cf54d4d5a9e7

cheers

Re: [PATCH 0/5] Live patching for powerpc

2016-04-13 Thread Jiri Kosina
On Wed, 13 Apr 2016, Miroslav Benes wrote:

> > This series adds live patching support for powerpc (ppc64le only ATM).
> > 
> > It's unchanged since the version I posted on March 24, with the exception
> > that I've dropped the first patch, which was a testing-only patch.
> > 
> > If there's no further comments I'll put this in a topic branch in the next
> > day or two and Jiri & I will both merge that into next.
> 
> Hi,
> 
> I'll definitely give it a proper look today or tomorrow, but there is one 
> thing that needs to be solved. The patch set from Jessica reworking 
> relocations for live patching is now merged in our for-next branch. This 
> means that we need to find out if there is something in struct 
> mod_arch_specific for powerpc which needs to be preserved and do it.

Michael, if the plan is still the original one, i.e. you push it to your 
branch, and I merge it to livepatching (and resolve any dependencies on 
the relocations code during the merge) and push it to Linus from 
livepatching.git, then there shouldn't be anything to do on your side.

Alternatively, you can rebase on top of livepatching.git#for-next, and 
I'll take it directly.

Thanks,

-- 
Jiri Kosina
SUSE Labs


Re: [PATCH 0/5] Live patching for powerpc

2016-04-13 Thread Miroslav Benes
On Wed, 13 Apr 2016, Michael Ellerman wrote:

> This series adds live patching support for powerpc (ppc64le only ATM).
> 
> It's unchanged since the version I posted on March 24, with the exception that
> I've dropped the first patch, which was a testing-only patch.
> 
> If there's no further comments I'll put this in a topic branch in the next day
> or two and Jiri & I will both merge that into next.

Hi,

I'll definitely give it a proper look today or tomorrow, but there is one 
thing that needs to be solved. The patch set from Jessica reworking 
relocations for live patching is now merged in our for-next branch. This 
means that we need to find out if there is something in struct 
mod_arch_specific for powerpc which needs to be preserved and do it.

Regards,
Miroslav

[PATCH 5/5] powerpc/livepatch: Add live patching support on ppc64le

2016-04-13 Thread Michael Ellerman
Add the kconfig logic & assembly support for handling live patched
functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
depends on the new -mprofile-kernel ftrace ABI, which is only supported
currently on ppc64le.

Live patching is handled by a special ftrace handler. This means it runs
from ftrace_caller(). The live patch handler modifies the NIP so as to
redirect the return from ftrace_caller() to the new patched function.

However there is one particularly tricky case we need to handle.

If a function A calls another function B, and it is known at link time
that they share the same TOC, then A will not save or restore its TOC,
and will call the local entry point of B.

When we live patch B, we replace it with a new function C, which may
not have the same TOC as A. At live patch time it's too late to modify A
to do the TOC save/restore, so the live patching code must interpose
itself between A and C, and do the TOC save/restore that A omitted.

An additional complication is that the livepatch code can not create a
stack frame in order to save the TOC. That is because if C takes > 8
arguments, or is varargs, A will have written the arguments for C in
A's stack frame.

To solve this, we introduce a "livepatch stack" which grows upward from
the base of the regular stack, and is used to store the TOC & LR when
calling a live patched function.

When the patched function returns, we retrieve the real LR & TOC from
the livepatch stack, restore them, and pop the livepatch "stack frame".
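
The save/restore scheme described above can be sketched in C as follows.
This is only an illustration: the real code is assembly in entry_64.S, and
the two-word frame layout, the array size and all names here are
assumptions.

```c
#include <assert.h>

#define STACK_WORDS_SKETCH 64	/* illustrative size, not THREAD_SIZE */

struct lp_stack_sketch {
	unsigned long words[STACK_WORDS_SKETCH];
	unsigned long *livepatch_sp;	/* grows upward from the base */
};

static void lp_init(struct lp_stack_sketch *s)
{
	s->livepatch_sp = s->words;
}

/* On entry to a patched function: save the caller's TOC and LR. */
static void lp_push(struct lp_stack_sketch *s, unsigned long toc,
		    unsigned long lr)
{
	assert(s->livepatch_sp + 2 <= s->words + STACK_WORDS_SKETCH);
	s->livepatch_sp[0] = toc;
	s->livepatch_sp[1] = lr;
	s->livepatch_sp += 2;
}

/* On return from the patched function: restore both and pop the frame. */
static void lp_pop(struct lp_stack_sketch *s, unsigned long *toc,
		   unsigned long *lr)
{
	assert(s->livepatch_sp - 2 >= s->words);
	s->livepatch_sp -= 2;
	*toc = s->livepatch_sp[0];
	*lr = s->livepatch_sp[1];
}
```

Because the saved values live outside the regular stack frames, nothing is
placed between A's frame and the arguments that A wrote for C.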

Signed-off-by: Michael Ellerman 
Reviewed-by: Torsten Duwe 
Reviewed-by: Balbir Singh 
---
 arch/powerpc/Kconfig  |  3 ++
 arch/powerpc/kernel/asm-offsets.c |  4 ++
 arch/powerpc/kernel/entry_64.S| 97 +++
 3 files changed, 104 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c038286..ed0603102442 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -160,6 +160,7 @@ config PPC
select HAVE_ARCH_SECCOMP_FILTER
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
+   select HAVE_LIVEPATCH if HAVE_DYNAMIC_FTRACE_WITH_REGS
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
@@ -1107,3 +1108,5 @@ config PPC_LIB_RHEAP
bool
 
 source "arch/powerpc/kvm/Kconfig"
+
+source "kernel/livepatch/Kconfig"
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0d0183d3180a..c9370d4e36bd 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -86,6 +86,10 @@ int main(void)
DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_LIVEPATCH
+   DEFINE(TI_livepatch_sp, offsetof(struct thread_info, livepatch_sp));
+#endif
+
DEFINE(KSP, offsetof(struct thread_struct, ksp));
DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
 #ifdef CONFIG_BOOKE
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9916d150b28c..39a79c89a4b6 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1248,6 +1249,9 @@ _GLOBAL(ftrace_caller)
addir3,r3,function_trace_op@toc@l
ld  r5,0(r3)
 
+#ifdef CONFIG_LIVEPATCH
+   mr  r14,r7  /* remember old NIP */
+#endif
/* Calculate ip from nip-4 into r3 for call below */
subir3, r7, MCOUNT_INSN_SIZE
 
@@ -1272,6 +1276,9 @@ ftrace_call:
/* Load ctr with the possibly modified NIP */
ld  r3, _NIP(r1)
mtctr   r3
+#ifdef CONFIG_LIVEPATCH
+   cmpdr14,r3  /* has NIP been altered? */
+#endif
 
/* Restore gprs */
REST_8GPRS(0,r1)
@@ -1289,6 +1296,11 @@ ftrace_call:
ld  r0, LRSAVE(r1)
mtlrr0
 
+#ifdef CONFIG_LIVEPATCH
+/* Based on the cmpd above, if the NIP was altered handle livepatch */
+   bne-livepatch_handler
+#endif
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
stdur1, -112(r1)
 .globl ftrace_graph_call
@@ -1305,6 +1317,91 @@ _GLOBAL(ftrace_graph_stub)
 
 _GLOBAL(ftrace_stub)
blr
+
+#ifdef CONFIG_LIVEPATCH
+   /*
+* This function runs in the mcount context, between two functions. As
+* such it can only clobber registers which are volatile and used in
+* function linkage.
+*
+* We get here when a function A, calls another function B, but B has
+* been live patched with a new function C.
+*
+* On entry:
+*  - we have no stack frame and can not allocate one
+*  - LR points back to the original caller (in A)
+*  - CTR holds the new NIP in C
+*  - r0 & r12 are free
+*
+* r0 can't be used as the base register for a DS-form load or 

[PATCH 4/5] powerpc/livepatch: Add livepatch stack to struct thread_info

2016-04-13 Thread Michael Ellerman
In order to support live patching we need to maintain an alternate
stack of TOC & LR values. We use the base of the stack for this, and
store the "live patch stack pointer" in struct thread_info.

Unlike the other fields of thread_info, we can not statically initialise
that value, so it must be done at run time.

This patch just adds the code to support that, it is not enabled until
the next patch which actually adds live patch support.
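
The run-time initialisation amounts to a small piece of pointer arithmetic,
which the following sketch reproduces on a stand-in structure.
thread_info_sketch and its members are assumptions; only the expression
itself mirrors klp_init_thread_info() in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct thread_info, which sits at the base of the stack. */
struct thread_info_sketch {
	int cpu;
	unsigned long flags;
	unsigned long *livepatch_sp;
};

static void init_livepatch_sp_sketch(struct thread_info_sketch *ti)
{
	assert(ti != NULL);

	/*
	 * (ti + 1) is the first word past the structure. One more word is
	 * skipped because that slot holds the end-of-stack magic value.
	 */
	ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
}
```

The result depends on the address of the structure, which is why it cannot
be part of a static initialiser.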

Signed-off-by: Michael Ellerman 
Acked-by: Balbir Singh 
---
 arch/powerpc/include/asm/livepatch.h   |  8 
 arch/powerpc/include/asm/thread_info.h |  4 +++-
 arch/powerpc/kernel/irq.c  |  3 +++
 arch/powerpc/kernel/process.c  |  6 +-
 arch/powerpc/kernel/setup_64.c | 17 ++---
 5 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h
index ad36e8e34fa1..a402f7f94896 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -49,6 +49,14 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
 */
return ftrace_location_range(faddr, faddr + 16);
 }
+
+static inline void klp_init_thread_info(struct thread_info *ti)
+{
+   /* + 1 to account for STACK_END_MAGIC */
+   ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+}
+#else
+static void klp_init_thread_info(struct thread_info *ti) { }
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 7efee4a3240b..8febc3f66d53 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -43,7 +43,9 @@ struct thread_info {
int preempt_count;  /* 0 => preemptable,
   <0 => BUG */
unsigned long   local_flags;/* private flags for thread */
-
+#ifdef CONFIG_LIVEPATCH
+   unsigned long *livepatch_sp;
+#endif
/* low level flags - has atomic operations done on it */
unsigned long   flags cacheline_aligned_in_smp;
 };
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 290559df1e8b..3cb46a3b1de7 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -66,6 +66,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_PPC64
 #include 
@@ -607,10 +608,12 @@ void irq_ctx_init(void)
memset((void *)softirq_ctx[i], 0, THREAD_SIZE);
tp = softirq_ctx[i];
tp->cpu = i;
+   klp_init_thread_info(tp);
 
memset((void *)hardirq_ctx[i], 0, THREAD_SIZE);
tp = hardirq_ctx[i];
tp->cpu = i;
+   klp_init_thread_info(tp);
}
 }
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b8500b4ac7fe..2a9280b945e0 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -55,6 +55,8 @@
 #include 
 #endif
 #include 
+#include 
+
 #include 
 #include 
 
@@ -1400,13 +1402,15 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
extern void ret_from_kernel_thread(void);
void (*f)(void);
unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE;
+   struct thread_info *ti = task_thread_info(p);
+
+   klp_init_thread_info(ti);
 
/* Copy registers */
sp -= sizeof(struct pt_regs);
childregs = (struct pt_regs *) sp;
if (unlikely(p->flags & PF_KTHREAD)) {
/* kernel thread */
-   struct thread_info *ti = (void *)task_stack_page(p);
memset(childregs, 0, sizeof(struct pt_regs));
childregs->gpr[1] = sp + sizeof(struct pt_regs);
/* function */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f98be8383a39..96d4a2b23d0f 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -667,16 +668,16 @@ static void __init emergency_stack_init(void)
limit = min(safe_stack_limit(), ppc64_rma_size);
 
for_each_possible_cpu(i) {
-   unsigned long sp;
-   sp  = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
-   sp += THREAD_SIZE;
-   paca[i].emergency_sp = __va(sp);
+   struct thread_info *ti;
+   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+   klp_init_thread_info(ti);
+   paca[i].emergency_sp = (void *)ti + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for machine check exception handling. */
-   sp  = memblock_alloc_base(THREAD_SIZE, 

[PATCH 3/5] powerpc/livepatch: Add livepatch header

2016-04-13 Thread Michael Ellerman
Add the powerpc specific livepatch definitions. In particular we provide
a non-default implementation of klp_get_ftrace_location().

This is required because the location of the mcount call is not constant
when using -mprofile-kernel (which we always do for live patching).

Signed-off-by: Torsten Duwe 
Signed-off-by: Balbir Singh 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/livepatch.h | 54 
 1 file changed, 54 insertions(+)
 create mode 100644 arch/powerpc/include/asm/livepatch.h

diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h
new file mode 100644
index ..ad36e8e34fa1
--- /dev/null
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -0,0 +1,54 @@
+/*
+ * livepatch.h - powerpc-specific Kernel Live Patching Core
+ *
+ * Copyright (C) 2015-2016, SUSE, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+#ifndef _ASM_POWERPC_LIVEPATCH_H
+#define _ASM_POWERPC_LIVEPATCH_H
+
+#include 
+#include 
+
+#ifdef CONFIG_LIVEPATCH
+static inline int klp_check_compiler_support(void)
+{
+   return 0;
+}
+
+static inline int klp_write_module_reloc(struct module *mod, unsigned long
+   type, unsigned long loc, unsigned long value)
+{
+   /* This requires infrastructure changes; we need the loadinfos. */
+   return -ENOSYS;
+}
+
+static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
+{
+   regs->nip = ip;
+}
+
+#define klp_get_ftrace_location klp_get_ftrace_location
+static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
+{
+   /*
+* Live patch works only with -mprofile-kernel on PPC. In this case,
+* the ftrace location is always within the first 16 bytes.
+*/
+   return ftrace_location_range(faddr, faddr + 16);
+}
+#endif /* CONFIG_LIVEPATCH */
+
+#endif /* _ASM_POWERPC_LIVEPATCH_H */
-- 
2.5.0


[PATCH 2/5] livepatch: Allow architectures to specify an alternate ftrace location

2016-04-13 Thread Michael Ellerman
When livepatch tries to patch a function it takes the function address
and asks ftrace to install the livepatch handler at that location.
ftrace will look for an mcount call site at that exact address.

On powerpc the mcount location is not the first instruction of the
function, and in fact it's not at a constant offset from the start of
the function. To accommodate this add a hook which arch code can
override to customise the behaviour.
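
The override mechanism relies on a common preprocessor idiom, sketched here
in standalone C. The names get_ftrace_location_sketch,
ARCH_HAS_CUSTOM_LOCATION_SKETCH and enable_func_sketch are hypothetical, and
the "+ 8" offset is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * An architecture header provides its own version and defines a macro with
 * the same name, so that generic code can detect the override.
 */
#ifdef ARCH_HAS_CUSTOM_LOCATION_SKETCH
#define get_ftrace_location_sketch get_ftrace_location_sketch
static unsigned long get_ftrace_location_sketch(unsigned long faddr)
{
	return faddr + 8;	/* made-up arch-specific adjustment */
}
#endif

/* Generic code: fall back to the function address if nothing was defined. */
#ifndef get_ftrace_location_sketch
static unsigned long get_ftrace_location_sketch(unsigned long faddr)
{
	return faddr;
}
#endif

/* The caller treats a zero location as "not found", as the patch does. */
static int enable_func_sketch(unsigned long old_addr, unsigned long *loc_out)
{
	unsigned long loc = get_ftrace_location_sketch(old_addr);

	assert(loc_out != NULL);
	if (!loc)
		return -EINVAL;

	*loc_out = loc;
	return 0;
}
```

With ARCH_HAS_CUSTOM_LOCATION_SKETCH left undefined, as here, the generic
fallback is compiled and the location equals the function address.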

Signed-off-by: Torsten Duwe 
Signed-off-by: Balbir Singh 
Signed-off-by: Petr Mladek 
Signed-off-by: Michael Ellerman 
---
 kernel/livepatch/core.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index d68fbf63b083..b0476bb30f92 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -298,6 +298,19 @@ unlock:
rcu_read_unlock();
 }
 
+/*
+ * Convert a function address into the appropriate ftrace location.
+ *
+ * Usually this is just the address of the function, but on some architectures
+ * it's more complicated so allow them to provide a custom behaviour.
+ */
+#ifndef klp_get_ftrace_location
+static unsigned long klp_get_ftrace_location(unsigned long faddr)
+{
+   return faddr;
+}
+#endif
+
 static void klp_disable_func(struct klp_func *func)
 {
struct klp_ops *ops;
@@ -312,8 +325,14 @@ static void klp_disable_func(struct klp_func *func)
return;
 
if (list_is_singular(&ops->func_stack)) {
+   unsigned long ftrace_loc;
+
+   ftrace_loc = klp_get_ftrace_location(func->old_addr);
+   if (WARN_ON(!ftrace_loc))
+   return;
+
WARN_ON(unregister_ftrace_function(&ops->fops));
-   WARN_ON(ftrace_set_filter_ip(&ops->fops, func->old_addr, 1, 0));
+   WARN_ON(ftrace_set_filter_ip(&ops->fops, ftrace_loc, 1, 0));
 
list_del_rcu(&func->stack_node);
list_del(&ops->node);
@@ -338,6 +357,15 @@ static int klp_enable_func(struct klp_func *func)
 
ops = klp_find_ops(func->old_addr);
if (!ops) {
+   unsigned long ftrace_loc;
+
+   ftrace_loc = klp_get_ftrace_location(func->old_addr);
+   if (!ftrace_loc) {
+   pr_err("failed to find location for function '%s'\n",
+   func->old_name);
+   return -EINVAL;
+   }
+
ops = kzalloc(sizeof(*ops), GFP_KERNEL);
if (!ops)
return -ENOMEM;
@@ -352,7 +380,7 @@ static int klp_enable_func(struct klp_func *func)
INIT_LIST_HEAD(&ops->func_stack);
list_add_rcu(&func->stack_node, &ops->func_stack);
 
-   ret = ftrace_set_filter_ip(&ops->fops, func->old_addr, 0, 0);
+   ret = ftrace_set_filter_ip(&ops->fops, ftrace_loc, 0, 0);
if (ret) {
pr_err("failed to set ftrace filter for function '%s' (%d)\n",
   func->old_name, ret);
@@ -363,7 +391,7 @@ static int klp_enable_func(struct klp_func *func)
if (ret) {
pr_err("failed to register ftrace handler for function '%s' (%d)\n",
   func->old_name, ret);
-   ftrace_set_filter_ip(&ops->fops, func->old_addr, 1, 0);
+   ftrace_set_filter_ip(&ops->fops, ftrace_loc, 1, 0);
goto err;
}
 
-- 
2.5.0


[PATCH 1/5] ftrace: Make ftrace_location_range() global

2016-04-13 Thread Michael Ellerman
In order to support live patching on powerpc we would like to call
ftrace_location_range(), so make it global.
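
As a rough illustration of the range semantics documented in the kernel-doc
comment in the diff below, here is a standalone sketch. It does a linear scan
over a plain array and assumes 4-byte instructions; the real function
searches ftrace's internal record pages, so every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define INSN_SIZE_SKETCH 4	/* assumed size of one traced instruction */

/*
 * Return the start address of the first record that overlaps the range
 * [start, end] (end inclusive), or 0 if no record touches it.
 */
static unsigned long location_range_sketch(const unsigned long *recs, size_t n,
					    unsigned long start,
					    unsigned long end)
{
	size_t i;

	assert(recs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned long first = recs[i];
		unsigned long last = recs[i] + INSN_SIZE_SKETCH - 1;

		if (last >= start && first <= end)
			return first;
	}

	return 0;
}
```

A caller that knows the traced instruction lies within the first 16 bytes of
a function can pass (faddr, faddr + 16) and gets back its exact address.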

Signed-off-by: Torsten Duwe 
Signed-off-by: Balbir Singh 
Signed-off-by: Michael Ellerman 
---
 include/linux/ftrace.h |  1 +
 kernel/trace/ftrace.c  | 14 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dea12a6e413b..66a36a815f0a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -455,6 +455,7 @@ int ftrace_update_record(struct dyn_ftrace *rec, int enable);
 int ftrace_test_record(struct dyn_ftrace *rec, int enable);
 void ftrace_run_stop_machine(int command);
 unsigned long ftrace_location(unsigned long ip);
+unsigned long ftrace_location_range(unsigned long start, unsigned long end);
 unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec);
 unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b1870fbd2b67..7e8d792da963 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1530,7 +1530,19 @@ static int ftrace_cmp_recs(const void *a, const void *b)
return 0;
 }
 
-static unsigned long ftrace_location_range(unsigned long start, unsigned long end)
+/**
+ * ftrace_location_range - return the first address of a traced location
+ * if it touches the given ip range
+ * @start: start of range to search.
+ * @end: end of range to search (inclusive). @end points to the last byte
+ * to check.
+ *
+ * Returns rec->ip if the related ftrace location is at least partly within
+ * the given address range. That is, the first address of the instruction
+ * that is either a NOP or call to the function tracer. It checks the ftrace
+ * internal tables to determine if the address belongs or not.
+ */
+unsigned long ftrace_location_range(unsigned long start, unsigned long end)
 {
struct ftrace_page *pg;
struct dyn_ftrace *rec;
-- 
2.5.0


[PATCH 0/5] Live patching for powerpc

2016-04-13 Thread Michael Ellerman
This series adds live patching support for powerpc (ppc64le only ATM).

It's unchanged since the version I posted on March 24, with the exception that
I've dropped the first patch, which was a testing-only patch.

If there's no further comments I'll put this in a topic branch in the next day
or two and Jiri & I will both merge that into next.

cheers

Michael Ellerman (5):
  ftrace: Make ftrace_location_range() global
  livepatch: Allow architectures to specify an alternate ftrace location
  powerpc/livepatch: Add livepatch header
  powerpc/livepatch: Add livepatch stack to struct thread_info
  powerpc/livepatch: Add live patching support on ppc64le

 arch/powerpc/Kconfig   |  3 ++
 arch/powerpc/include/asm/livepatch.h   | 62 ++
 arch/powerpc/include/asm/thread_info.h |  4 +-
 arch/powerpc/kernel/asm-offsets.c  |  4 ++
 arch/powerpc/kernel/entry_64.S | 97 ++
 arch/powerpc/kernel/irq.c  |  3 ++
 arch/powerpc/kernel/process.c  |  6 ++-
 arch/powerpc/kernel/setup_64.c | 17 +++---
 include/linux/ftrace.h |  1 +
 kernel/livepatch/core.c| 34 ++--
 kernel/trace/ftrace.c  | 14 -
 11 files changed, 232 insertions(+), 13 deletions(-)
 create mode 100644 arch/powerpc/include/asm/livepatch.h

-- 
2.5.0


Re: [PATCH V3] mtd: nand: pasemi: switch to dev_* printing functions

2016-04-13 Thread Boris Brezillon
On Wed, 13 Apr 2016 11:48:05 +0200
Rafał Miłecki  wrote:

> It also contains some minor related changes:
> 1) Don't warn if kzalloc fails as it dumps stack on its own
> 2) Use %pR format for displaying whole resource to avoid:
> warning: format ‘%08llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘resource_size_t’
> 
> Signed-off-by: Rafał Miłecki 

Applied with a slightly different commit message to avoid "Possible
unwrapped commit description" checkpatch warning.

Thanks,

Boris

> ---
> V3: Switch to dev_* instead of pr_*
> ---
>  drivers/mtd/nand/pasemi_nand.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/mtd/nand/pasemi_nand.c b/drivers/mtd/nand/pasemi_nand.c
> index 63fcb8c..5de7591 100644
> --- a/drivers/mtd/nand/pasemi_nand.c
> +++ b/drivers/mtd/nand/pasemi_nand.c
> @@ -92,8 +92,9 @@ int pasemi_device_ready(struct mtd_info *mtd)
>  
>  static int pasemi_nand_probe(struct platform_device *ofdev)
>  {
> + struct device *dev = &ofdev->dev;
>   struct pci_dev *pdev;
> - struct device_node *np = ofdev->dev.of_node;
> + struct device_node *np = dev->of_node;
>   struct resource res;
>   struct nand_chip *chip;
>   int err = 0;
> @@ -107,13 +108,11 @@ static int pasemi_nand_probe(struct platform_device *ofdev)
>   if (pasemi_nand_mtd)
>   return -ENODEV;
>  
> - pr_debug("pasemi_nand at %pR\n", &res);
> + dev_dbg(dev, "pasemi_nand at %pR\n", &res);
>  
>   /* Allocate memory for MTD device structure and private data */
>   chip = kzalloc(sizeof(struct nand_chip), GFP_KERNEL);
>   if (!chip) {
> - printk(KERN_WARNING
> -"Unable to allocate PASEMI NAND MTD device structure\n");
>   err = -ENOMEM;
>   goto out;
>   }
> @@ -121,7 +120,7 @@ static int pasemi_nand_probe(struct platform_device *ofdev)
>   pasemi_nand_mtd = nand_to_mtd(chip);
>  
>   /* Link the private data with the MTD structure */
> - pasemi_nand_mtd->dev.parent = &ofdev->dev;
> + pasemi_nand_mtd->dev.parent = dev;
>  
>   chip->IO_ADDR_R = of_iomap(np, 0);
>   chip->IO_ADDR_W = chip->IO_ADDR_R;
> @@ -163,13 +162,13 @@ static int pasemi_nand_probe(struct platform_device *ofdev)
>   }
>  
>   if (mtd_device_register(pasemi_nand_mtd, NULL, 0)) {
> - printk(KERN_ERR "pasemi_nand: Unable to register MTD device\n");
> + dev_err(dev, "Unable to register MTD device\n");
>   err = -ENODEV;
>   goto out_lpc;
>   }
>  
> - printk(KERN_INFO "PA Semi NAND flash at %08llx, control at I/O %x\n",
> -res.start, lpcctl);
> + dev_info(dev, "PA Semi NAND flash at %pR, control at I/O %x\n", &res,
> +  lpcctl);
>  
>   return 0;
>  



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[PATCH v2] powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs

2016-04-13 Thread Michael Ellerman
From: Paul Mackerras 

xmon has commands for reading and writing SPRs, but they don't work
currently for several reasons. They attempt to synthesize a small
function containing an mfspr or mtspr instruction and call it. However,
the instructions are on the stack, which is usually not executable.
Also, for 64-bit we set up a procedure descriptor, which is fine for the
big-endian ABIv1, but not correct for ABIv2. Finally, the code uses the
infrastructure for catching memory errors, which only catches data
storage interrupts and machine check interrupts, whereas a failed
mfspr/mtspr can generate a program interrupt or a hypervisor emulation
assist interrupt, or be a no-op.

Instead of trying to synthesize a function on the fly, this adds two new
functions, xmon_mfspr() and xmon_mtspr(), which take an SPR number as an
argument and read or write the SPR. Because there is no Power ISA
instruction which takes an SPR number in a register, we have to generate
one of each possible mfspr and mtspr instruction, for all 1024 possible
SPRs. Thus we get just over 8k bytes of code for each of xmon_mfspr()
and xmon_mtspr(). However, this 16kB of code pales in comparison to the
> 130kB of PPC opcode tables used by the xmon disassembler.
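
The size estimate and the table indexing can be checked with a short C
sketch. The 8-byte entry size comes from the index formula in the assembly
further down; that each entry is two 4-byte instructions is an assumption,
and the function names are hypothetical.

```c
#include <assert.h>

#define SPR_COUNT_SKETCH	1024u	/* SPR numbers are 10 bits wide */
#define SPR_ENTRY_BYTES_SKETCH	8u	/* assumed: two 4-byte instructions */

/* Straightforward form: mask the SPR number, then scale by the entry size. */
static unsigned int spr_table_offset(unsigned int sprn)
{
	unsigned int off = (sprn & 0x3ffu) * SPR_ENTRY_BYTES_SKETCH;

	/* The offset always stays inside the 8192-byte table. */
	assert(off < SPR_COUNT_SKETCH * SPR_ENTRY_BYTES_SKETCH);
	return off;
}

/* Shift-then-mask form, which is what a single rlwinm computes. */
static unsigned int spr_table_offset_shift_mask(unsigned int sprn)
{
	return (sprn << 3) & (0x3ffu << 3);
}
```

Both forms agree for every input, and 1024 entries of 8 bytes give 8192
bytes per table, which together with the short dispatch stub matches the
"just over 8k bytes" figure.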

To catch interrupts caused by the mfspr/mtspr instructions, we add a new
'catch_spr_faults' flag. If an interrupt occurs while it is set, we come
back into xmon() via program_check_interrupt(), _exception() and die(),
see that catch_spr_faults is set and do a longjmp to bus_error_jmp, back
into read_spr() or write_spr().
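
The control flow can be approximated in user space with setjmp()/longjmp().
In this sketch fault_sketch() stands in for the whole interrupt path and odd
SPR numbers are made to "fault"; none of these names come from the patch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf fault_jmp_sketch;
static volatile int catch_faults_sketch;

/* Stand-in for the interrupt path: unwind if a guarded access is active. */
static void fault_sketch(void)
{
	if (catch_faults_sketch)
		longjmp(fault_jmp_sketch, 1);
}

/* Stand-in for one mfspr: odd SPR numbers fault in this sketch. */
static unsigned long risky_read_sketch(unsigned int sprn)
{
	if (sprn & 1)
		fault_sketch();
	return 0x1000ul + sprn;
}

/* Returns 1 and stores the value on success, 0 if the read faulted. */
static int read_spr_sketch(unsigned int sprn, unsigned long *val)
{
	volatile int ok = 0;

	assert(val != NULL);

	if (setjmp(fault_jmp_sketch) == 0) {
		catch_faults_sketch = 1;
		*val = risky_read_sketch(sprn);
		ok = 1;
	}
	/* Reached on both the normal path and the longjmp path. */
	catch_faults_sketch = 0;

	return ok;
}
```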

This adds a couple of other nice features: first, a "Sa" command that
attempts to read and print out the value of all 1024 SPRs. If any mfspr
instruction acts as a no-op, then the SPR is not implemented and not
printed.

Secondly, the Sr and Sw commands detect when an SPR is not
implemented (i.e. mfspr is a no-op) and print a message to that effect
rather than printing a bogus value.

Signed-off-by: Paul Mackerras 
Signed-off-by: Michael Ellerman 
---
v2: Rename spraccess.S to spr_access.S.
Rename to xmon_mf/tspr().
Use @got and document rlwinm.
Consolidate the asm.
Tweak output formatting.
Update SPR help message.
Refactor SPR dump.
Just let the logic detect 4, 5 & 6 are unimplemented.
Rename SPR fault flag.
Switch all to 'Sa'.
Consolidate to one switch statement.

 arch/powerpc/xmon/Makefile |   2 +-
 arch/powerpc/xmon/spr_access.S |  45 ++
 arch/powerpc/xmon/xmon.c   | 136 +++--
 3 files changed, 122 insertions(+), 61 deletions(-)
 create mode 100644 arch/powerpc/xmon/spr_access.S

diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index 436062dbb6e2..0b2f771593eb 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -7,7 +7,7 @@ UBSAN_SANITIZE := n
 
 ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
 
-obj-y  += xmon.o nonstdio.o
+obj-y  += xmon.o nonstdio.o spr_access.o
 
 ifdef CONFIG_XMON_DISASSEMBLY
 obj-y  += ppc-dis.o ppc-opc.o
diff --git a/arch/powerpc/xmon/spr_access.S b/arch/powerpc/xmon/spr_access.S
new file mode 100644
index ..84ad74213c83
--- /dev/null
+++ b/arch/powerpc/xmon/spr_access.S
@@ -0,0 +1,45 @@
+#include 
+
+/* unsigned long xmon_mfspr(sprn, default_value) */
+_GLOBAL(xmon_mfspr)
+   ld  r5, .Lmfspr_table@got(r2)
+   b   xmon_mxspr
+
+/* void xmon_mtspr(sprn, new_value) */
+_GLOBAL(xmon_mtspr)
+   ld  r5, .Lmtspr_table@got(r2)
+   b   xmon_mxspr
+
+/*
+ * r3 = sprn
+ * r4 = default or new value
+ * r5 = table base
+ */
+xmon_mxspr:
+   /*
+* To index into the table of mxsprs we need:
+*  i = (sprn & 0x3ff) * 8
+* or using rlwinm:
+*  i = (sprn << 3) & (0x3ff << 3)
+*/
+   rlwinm  r3, r3, 3, 0x3ff << 3
+   add r5, r5, r3
+   mtctr   r5
+   mr  r3, r4 /* put default_value in r3 for mfspr */
+   bctr
+
+.Lmfspr_table:
+   spr = 0
+   .rept   1024
+   mfspr   r3, spr
+   blr
+   spr = spr + 1
+   .endr
+
+.Lmtspr_table:
+   spr = 0
+   .rept   1024
+   mtspr   spr, r4
+   blr
+   spr = spr + 1
+   .endr
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 942796fa4767..7f5c74ef80ec 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -86,6 +86,7 @@ static char tmpstr[128];
 
 static long bus_error_jmp[JMP_BUF_LEN];
 static int catch_memory_errors;
+static int catch_spr_faults;
 static long *xmon_fault_jmp[NR_CPUS];
 
 /* Breakpoint stuff */
@@ -147,7 +148,7 @@ void getstring(char *, int);
 static void flush_input(void);
 static int inchar(void);
 static void take_input(char *);
-static unsigned long read_spr(int);
+static int  read_spr(int, unsigned long *);
 static void write_spr(int, unsigned long);
 static void 

Re: [PATCH] sched/cpuacct: Check for NULL when using task_pt_regs()

2016-04-13 Thread Ingo Molnar

* Michael Ellerman  wrote:

> On Wed, 2016-04-13 at 09:43 +0200, Ingo Molnar wrote:
> > * Srikar Dronamraju  wrote:
> > 
> > > * Anton Blanchard  [2016-04-06 21:59:50]:
> > > 
> > > > Looks good, and the patch below does fix the oops for me.
> > > > 
> > > > Anton
> > > > --
> > > > 
> > > > task_pt_regs() can return NULL for kernel threads, so add a check.
> > > > This fixes an oops at boot on ppc64.
> > > > 
> > > > Signed-off-by: Anton Blanchard 
> > > 
> > > Works for me too.
> > > 
> > > Reported-and-Tested-by: Srikar Dronamraju 
> > 
> > Could someone please re-send the fix, because it has not reached me nor 
> > lkml.
> 
> It did hit LKML:
> 
> http://lkml.kernel.org/r/20160406215950.04bc3f0b@kryten
> 
> But that did have some verbiage at the top.
> 
> Anton's also resent it directly to you.

So it was in my Spam folder, due to the following SPF softfail:

  Received-SPF: softfail (google.com: domain of transitioning an...@samba.org 
does not designate 198.145.29.136 as permitted sender) client-ip=198.145.29.136;

I have the patch now.

Thanks,

Ingo

Re: [PATCH] powerpc: introduce {cmp}xchg for u8 and u16

2016-04-13 Thread Pan Xinhui
Hello Peter,

On 2016年04月12日 22:30, Peter Zijlstra wrote:
> On Sun, Apr 10, 2016 at 10:17:28PM +0800, Pan Xinhui wrote:
>>
>> On 2016年04月08日 15:47, Peter Zijlstra wrote:
>>> On Fri, Apr 08, 2016 at 02:41:46PM +0800, Pan Xinhui wrote:
>>>> From: pan xinhui 
>>>>
>>>> Implement xchg{u8,u16}{local,relaxed}, and
>>>> cmpxchg{u8,u16}{,local,acquire,relaxed}.
>>>>
>>>> Atomic operation on 8-bit and 16-bit data type is supported from power7
>>>
>>> And yes I see nothing P7 specific here, this implementation is for
>>> everything PPC64 afaict, no?
>>>
>> Hello Peter,
>>  No, it's not for every ppc. So yes, I need add #ifdef here. Thanks for 
>> pointing it out.
>> We might need a new config option and let it depend on POWER7/POWER8_CPU or 
>> even POWER9...
> 
> Right, I'm not sure if PPC has alternatives, but you could of course
> runtime patch the code from emulated with 32bit ll/sc to native 8/16bit
> ll/sc if present on the current CPU if you have infrastructure for these
> things.
> 
That seems interesting. I have no idea how to runtime-patch the code, so I will 
try to learn that.
If we do that, do we need to change {cmp}xchg into non-inline functions?

>>> Also, note that you don't need explicit 8/16 bit atomics to implement
>>> these. Its fine to use 32bit atomics and only modify half the word.
>>>
>> That is true. But I am a little worried about the performance. It will
>> forbid any other tasks to touch the other half word during the
>> load/reserve, right?
> 
> Well, not forbid, it would just make the LL/SC fail and try again. Other
> archs already implement them this way. See commit 3226aad81aa6 ("sh:
> support 1 and 2 byte xchg") for example.
> 
thanks for your explanation. :)

I wrote a patch similar to what you suggested.

Here is an early (alpha) implementation of the new __xchg_u8. It needs a 
rewrite to be easier to understand.
It does work, but performance tests are still needed.

static __always_inline unsigned long
__xchg_u8_local(volatile void *p, unsigned char val)
{
	unsigned int prev, prev_mask, tmp, offset, _val, *_p;

	/* Aligned address of the 32-bit word that contains the byte. */
	_p = (unsigned int *)round_down((unsigned long)p, sizeof(int));
	_val = val;
	/* Bit offset of the byte inside that word. */
	offset = 8 * ((unsigned long)p - (unsigned long)_p);
#ifndef CONFIG_CPU_LITTLE_ENDIAN
	offset = 8 * (sizeof(int) - sizeof(__typeof__(val))) - offset;
#endif
	_val <<= offset;
	/* Mask that clears only the target byte. */
	prev_mask = ~((unsigned int)(__typeof__(val))-1 << offset);

	__asm__ __volatile__(
"1: lwarx   %0,0,%3\n"
"   and %1,%0,%5\n"
"   or %1,%1,%4\n"
	PPC405_ERR77(0,%2)
"   stwcx.  %1,0,%3\n"
"   bne-1b"
	: "=&r" (prev), "=&r" (tmp), "+m" (*(volatile unsigned int *)_p)
	: "r" (_p), "r" (_val), "r" (prev_mask)
	: "cc", "memory");

	return prev >> offset;
}
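
For readers without a PPC toolchain, the same idea (update one byte by retrying an atomic operation on the containing 32-bit word) can be sketched in portable C11. This is only an illustration, not the kernel code: it uses a compare-and-swap loop from <stdatomic.h> instead of lwarx/stwcx., the function name is hypothetical, and the byte is selected by a lane number counted from the least-significant byte so that endianness does not enter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Exchange one byte inside a 32-bit word and return the previous byte.
 * lane 0 is the least-significant byte, lane 3 the most-significant.
 */
static uint8_t xchg_u8_in_word(_Atomic uint32_t *word, unsigned lane,
                               uint8_t val)
{
    assert(lane < 4);

    const unsigned shift = 8 * lane;
    const uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t new_word;

    do {
        /* Keep the other three bytes, replace only the target lane. */
        new_word = (old & ~mask) | ((uint32_t)val << shift);
        /* On failure, old is reloaded with the current word and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(word, &old, new_word,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));

    /* Mask before shifting so only the exchanged byte is returned. */
    return (uint8_t)((old & mask) >> shift);
}
```

A concurrent write to a neighbouring byte does not get blocked; it only makes the loop go around once more, which is the behaviour Peter describes for LL/SC.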

>> I am working on the qspinlock implementation on PPC.
>> Your and Waiman's patches are so nice. :)
> 
> Thanks!, last time I looked at PPC spinlocks they could not use things
> like ticket locks because PPC might be a guest and fairness blows etc..
> 
> You're making the qspinlock-paravirt thing work on PPC, or doing
> qspinlock only for bare-metal PPC?
> 
I am making both work. :)
qspinlock works on PPC now. I am preparing the patches and will send them out 
in the coming weeks. :)

The paravirt work is a little harder.
Currently there are pv_wait() and pv_kick(), but only pv_kick() has a cpu 
parameter (the CPU that will hold the lock as soon as it is unlocked). 
We need a cpu parameter in pv_wait() too (the CPU that holds the lock now).

thanks
xinhui


Re: [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc'

2016-04-13 Thread Anshuman Khandual
On 04/11/2016 07:21 PM, Balbir Singh wrote:
> 
> 
> On 07/04/16 15:37, Anshuman Khandual wrote:
>> Currently the function 'huge_pte_alloc' has got two versions, one for the
>> BOOK3S server and the other one for the BOOK3E embedded platforms. This
>> change splits only the BOOK3S server version into two parts, one for the
>> ARCH_WANT_GENERAL_HUGETLB config implementation and the other one for
>> everything else. This change is one of the prerequisites towards enabling
>> ARCH_WANT_GENERAL_HUGETLB config option on POWER platform.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/powerpc/mm/hugetlbpage.c | 67 
>> +++
>>  1 file changed, 43 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index d991b9e..e453918 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -59,6 +59,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long 
>> addr)
>>  return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
>>  }
>>  
>> +#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>>  static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>> unsigned long address, unsigned pdshift, unsigned 
>> pshift)
>>  {
>> @@ -116,6 +117,7 @@ static int __hugepte_alloc(struct mm_struct *mm, 
>> hugepd_t *hpdp,
>>  spin_unlock(&mm->page_table_lock);
>>  return 0;
>>  }
>> +#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>>  
>>  /*
>>   * These macros define how to determine which level of the page table holds
>> @@ -130,6 +132,7 @@ static int __hugepte_alloc(struct mm_struct *mm, 
>> hugepd_t *hpdp,
>>  #endif
>>  
>>  #ifdef CONFIG_PPC_BOOK3S_64
>> +#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>>  /*
>>   * At this point we do the placement change only for BOOK3S 64. This would
>>   * possibly work on other subarchs.
>> @@ -145,32 +148,23 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned 
>> long addr, unsigned long sz
>>  
>>  addr &= ~(sz-1);
>>  pg = pgd_offset(mm, addr);
>> -
>> -if (pshift == PGDIR_SHIFT)
>> -/* 16GB huge page */
>> -return (pte_t *) pg;
>> -else if (pshift > PUD_SHIFT)
>> -/*
>> - * We need to use hugepd table
>> - */
>> +if (pshift > PUD_SHIFT) {
>>  hpdp = (hugepd_t *)pg;
>> -else {
>> -pdshift = PUD_SHIFT;
>> -pu = pud_alloc(mm, pg, addr);
>> -if (pshift == PUD_SHIFT)
>> -return (pte_t *)pu;
>> -else if (pshift > PMD_SHIFT)
>> -hpdp = (hugepd_t *)pu;
>> -else {
>> -pdshift = PMD_SHIFT;
>> -pm = pmd_alloc(mm, pu, addr);
>> -if (pshift == PMD_SHIFT)
>> -/* 16MB hugepage */
>> -return (pte_t *)pm;
>> -else
>> -hpdp = (hugepd_t *)pm;
>> -}
>> +goto hugepd_search;
>>  }
>> +
>> +pdshift = PUD_SHIFT;
>> +pu = pud_alloc(mm, pg, addr);
>> +if (pshift > PMD_SHIFT) {
>> +hpdp = (hugepd_t *)pu;
>> +goto hugepd_search;
>> +}
>> +
>> +pdshift = PMD_SHIFT;
>> +pm = pmd_alloc(mm, pu, addr);
>> +hpdp = (hugepd_t *)pm;
>> +
>> +hugepd_search:
>>  if (!hpdp)
>>  return NULL;
>>  
>> @@ -182,6 +176,31 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned 
>> long addr, unsigned long sz
>>  return hugepte_offset(*hpdp, addr, pdshift);
>>  }
>>  
>> +#else /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>> +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned 
>> long sz)
> 
> This is confusing, aren't we using the one from mm/hugetlb.c?

We are using huge_pte_alloc() from mm/hugetlb.c only when we have
CONFIG_ARCH_WANT_GENERAL_HUGETLB enabled. For everything else we
use the definition here for BOOK3S platforms.

> 
>> +{
>> +pgd_t *pg;
>> +pud_t *pu;
>> +pmd_t *pm;
>> +unsigned pshift = __ffs(sz);
>> +
>> +addr &= ~(sz-1);
> 
> Am I reading this right? Shouldn't this be addr &= ~(1 << pshift - 1)

Both are the same. The size is a power of two, so __ffs() computes the
__ilog2 of the size and arrives at the page shift, and (sz - 1) equals
((1 << pshift) - 1). Here we use the size directly instead.
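
A small user-space check of that equivalence, as a sketch only. lowest_set_bit() is a hypothetical stand-in for the kernel's __ffs(), and sz is assumed to be a power of two. Note that the expression as quoted, ~(1 << pshift - 1), parses in C as ~(1 << (pshift - 1)), so the shift form needs the extra parentheses used below.

```c
#include <assert.h>

/* Index of the lowest set bit; stands in for the kernel's __ffs(). */
static unsigned lowest_set_bit(unsigned long x)
{
    unsigned n = 0;

    assert(x != 0);
    while (!(x & 1UL)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Round addr down to a multiple of sz using the size directly. */
static unsigned long align_by_size(unsigned long addr, unsigned long sz)
{
    return addr & ~(sz - 1);
}

/* Same rounding expressed through the page shift derived from sz. */
static unsigned long align_by_shift(unsigned long addr, unsigned long sz)
{
    unsigned pshift = lowest_set_bit(sz);

    return addr & ~((1UL << pshift) - 1);
}
```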


Re: [PATCH] sched/cpuacct: Check for NULL when using task_pt_regs()

2016-04-13 Thread Michael Ellerman
On Wed, 2016-04-13 at 09:43 +0200, Ingo Molnar wrote:
> * Srikar Dronamraju  wrote:
> 
> > * Anton Blanchard  [2016-04-06 21:59:50]:
> > 
> > > Looks good, and the patch below does fix the oops for me.
> > > 
> > > Anton
> > > --
> > > 
> > > task_pt_regs() can return NULL for kernel threads, so add a check.
> > > This fixes an oops at boot on ppc64.
> > > 
> > > Signed-off-by: Anton Blanchard 
> > 
> > Works for me too.
> > 
> > Reported-and-Tested-by: Srikar Dronamraju 
> 
> Could someone please re-send the fix, because it has not reached me nor lkml.

It did hit LKML:

http://lkml.kernel.org/r/20160406215950.04bc3f0b@kryten

But that did have some verbiage at the top.

Anton's also resent it directly to you.

cheers


Re: [PATCH 1/3] powerpc: Complete FSCR context switch

2016-04-13 Thread Anton Blanchard via Linuxppc-dev
Hi Jack,

> Previously we just saved the FSCR, but only restored it in some
> settings, and never copied it thread to thread. This patch always
> restores the FSCR and formalizes new threads inheriting its setting so
> that later we can manipulate FSCR bits in start_thread.

Will this break the existing FSCR_DSCR bit handling?

	if (cpu_has_feature(CPU_FTR_DSCR)) {
		u64 dscr = get_paca()->dscr_default;
		u64 fscr = old_thread->fscr & ~FSCR_DSCR;

		if (new_thread->dscr_inherit) {
			dscr = new_thread->dscr;
			fscr |= FSCR_DSCR;
		}

		if (old_thread->dscr != dscr)
			mtspr(SPRN_DSCR, dscr);

		if (old_thread->fscr != fscr)
			mtspr(SPRN_FSCR, fscr);
	}

If not, we should modify the above so we don't write the FSCR twice.

Anton

Re: [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap

2016-04-13 Thread Alexey Kardashevskiy

On 04/13/2016 05:53 PM, Gavin Shan wrote:

On Wed, Apr 13, 2016 at 04:21:07PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:43 PM, Gavin Shan wrote:

There are two arrays for IO and M32 segment maps on every PHB.
The index of the arrays are segment number and the value stored
in the corresponding element is PE number, indicating the segment
is assigned to the PE. Initially, all elements in those two arrays
are zeroes, meaning all segments are assigned to PE#0. It's wrong.

This fixes the initial values in the elements of those two arrays
to IODA_INVALID_PE, meaning all segments aren't assigned to any
PE.


This is ok.


In order to use IODA_INVALID_PE (-1) to represent invalid PE
number, the types of those two arrays are changed from "unsigned int"
to "int".


"unsigned" can carry (-1) perfectly fine, just add a type cast to
IODA_INVALID_PE:

#define IODA_INVALID_PE    (unsigned int)(-1)

Using "signed" type for indexes which cannot be negative does not make much
sense - instead of checking for the upper boundary, you have to check for "<
0" too.
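
A minimal user-space sketch of that point, under the assumption that PE numbers and segment-map entries are plain "unsigned int". INVALID_PE, segmap_init() and pe_is_valid() are hypothetical names for illustration, not the kernel's.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sentinel modeled on the suggested IODA_INVALID_PE. */
#define INVALID_PE ((unsigned int)(-1))

/* Mark every segment as not assigned to any PE. */
static void segmap_init(unsigned int *segmap, unsigned int nr)
{
    unsigned int i;

    for (i = 0; i < nr; i++)
        segmap[i] = INVALID_PE;
}

/*
 * With an unsigned type one upper-bound test is enough: the sentinel is
 * UINT_MAX, so it fails the same comparison as any out-of-range number.
 */
static bool pe_is_valid(unsigned int pe, unsigned int total_pe_num)
{
    assert(total_pe_num < UINT_MAX);
    return pe < total_pe_num;
}
```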

OPAL uses unsigned type for PE (uint64_t or uint32_t or uint16_t - this is
quite funny).

pnv_ioda_pe::pe_number is "unsigned" and this pe_number is the same thing as
I can see in pnv_ioda_setup_dev_PE().

Some printk() print the PE number as "%x" (which implies "unsigned").



Yes, I can simply have something like below when PE number as well as
segment index are represented by "unsigned int" values, right?

#define IODA_INVALID_PE 0x



This will work too, yes.





I suggest changing the pci_dn::pe_number type from "int" to "unsigned int" to
match pnv_ioda_pe::pe_number, in a separate patch. Or do not touch types for
now.



Yes, I will have a separate patch right before this one to address it.




Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++--
  arch/powerpc/platforms/powernv/pci.h  | 4 ++--
  2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d2514f..44cc5f3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
unsigned long size, m32map_off, pemap_off, iomap_off = 0;
const __be64 *prop64;
const __be32 *prop32;
-   int len;
+   int i, len;
u64 phb_id;
void *aux;
long rc;
@@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
aux = memblock_virt_alloc(size, 0);
phb->ioda.pe_alloc = aux;
phb->ioda.m32_segmap = aux + m32map_off;
-   if (phb->type == PNV_PHB_IODA1)
+   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+   if (phb->type == PNV_PHB_IODA1) {
phb->ioda.io_segmap = aux + iomap_off;
+   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+   }
phb->ioda.pe_array = aux + pemap_off;
set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);

diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 784882a..36c4965 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,8 +146,8 @@ struct pnv_phb {
struct pnv_ioda_pe  *pe_array;

/* M32 & IO segment maps */
-   unsigned int*m32_segmap;
-   unsigned int*io_segmap;
+   int *m32_segmap;
+   int *io_segmap;

/* IRQ chip */
int irq_chip_init;




--
Alexey






--
Alexey

[PATCH V3] mtd: nand: pasemi: switch to dev_* printing functions

2016-04-13 Thread Rafał Miłecki
It also contains some minor related changes:
1) Don't warn if kzalloc fails as it dumps stack on its own
2) Use %pR format for displaying whole resource to avoid:
warning: format ‘%08llx’ expects type ‘long long unsigned int’, but argument 2 
has type ‘resource_size_t’

Signed-off-by: Rafał Miłecki 
---
V3: Switch to dev_* instead of pr_*
---
 drivers/mtd/nand/pasemi_nand.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/mtd/nand/pasemi_nand.c b/drivers/mtd/nand/pasemi_nand.c
index 63fcb8c..5de7591 100644
--- a/drivers/mtd/nand/pasemi_nand.c
+++ b/drivers/mtd/nand/pasemi_nand.c
@@ -92,8 +92,9 @@ int pasemi_device_ready(struct mtd_info *mtd)
 
 static int pasemi_nand_probe(struct platform_device *ofdev)
 {
+   struct device *dev = &ofdev->dev;
struct pci_dev *pdev;
-   struct device_node *np = ofdev->dev.of_node;
+   struct device_node *np = dev->of_node;
struct resource res;
struct nand_chip *chip;
int err = 0;
@@ -107,13 +108,11 @@ static int pasemi_nand_probe(struct platform_device 
*ofdev)
if (pasemi_nand_mtd)
return -ENODEV;
 
-   pr_debug("pasemi_nand at %pR\n", &res);
+   dev_dbg(dev, "pasemi_nand at %pR\n", &res);
 
/* Allocate memory for MTD device structure and private data */
chip = kzalloc(sizeof(struct nand_chip), GFP_KERNEL);
if (!chip) {
-   printk(KERN_WARNING
-  "Unable to allocate PASEMI NAND MTD device structure\n");
err = -ENOMEM;
goto out;
}
@@ -121,7 +120,7 @@ static int pasemi_nand_probe(struct platform_device *ofdev)
pasemi_nand_mtd = nand_to_mtd(chip);
 
/* Link the private data with the MTD structure */
-   pasemi_nand_mtd->dev.parent = &ofdev->dev;
+   pasemi_nand_mtd->dev.parent = dev;
 
chip->IO_ADDR_R = of_iomap(np, 0);
chip->IO_ADDR_W = chip->IO_ADDR_R;
@@ -163,13 +162,13 @@ static int pasemi_nand_probe(struct platform_device 
*ofdev)
}
 
if (mtd_device_register(pasemi_nand_mtd, NULL, 0)) {
-   printk(KERN_ERR "pasemi_nand: Unable to register MTD device\n");
+   dev_err(dev, "Unable to register MTD device\n");
err = -ENODEV;
goto out_lpc;
}
 
-   printk(KERN_INFO "PA Semi NAND flash at %08llx, control at I/O %x\n",
-  res.start, lpcctl);
+   dev_info(dev, "PA Semi NAND flash at %pR, control at I/O %x\n", &res,
+lpcctl);
 
return 0;
 
-- 
1.8.4.5


Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Alexey Kardashevskiy

On 04/13/2016 05:42 PM, Gavin Shan wrote:

On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This series of patches rebases on powerpc/next branch, plus below additional
patches:



https://patchwork.ozlabs.org/patch/581315/  (PATCH[1/9] Richard's SRIOV EEH)
https://patchwork.ozlabs.org/patch/582639/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/582093/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/580626/  (PATCH[1/4] Gavin's PCI fix)
https://patchwork.ozlabs.org/patch/580153/  (PATCH[1/1] Andrew's EEH minor 
fix)
https://patchwork.ozlabs.org/patch/566827/  (PATCH[1/1] Russell's P5IOC2 
removal)
https://patchwork.ozlabs.org/patch/534154/  (PATCH[1/7] Richard's SRIOV 
rework)
commit 388f7b1 ("Linux 4.5-rc3")

This series of patches intends to support PCI slots on the PowerPC PowerNV platform,
which runs on top of skiboot firmware. The patchset requires corresponding
changes in skiboot firmware, which have been sent to skib...@lists.ozlabs.org
for review. The PCI slots are exposed by skiboot with device node properties,
and the kernel uses those properties to populate PCI slots accordingly.

The original PCI infrastructure on the PowerNV platform can't support hotplug
because the PE is assigned during PHB fixup time, which is called only once
during system boot. For this, the PCI infrastructure on the PowerNV platform
has been reworked substantially. After that, the PE and its corresponding resources
(IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating
the PCI bridge's resources, which might decide the PE# assigned to the PE (e.g. M64
resources, on P8 strictly speaking). Each PE will maintain a reference count,
which is (number of child PCI devices + 1). That means when the last child PCI
device leaves the PE, the PE and its included resources will be released and put
back into the free pool again. With this design, the PE will be released when the EEH PE
is released. PATCH[1 - 23] are related to this part.

From skiboot's perspective, the PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns whether skiboot supports the various resets on a
particular PCI slot through the device-tree node. If it does, EEH will use the
functionality provided by skiboot. Besides, the device-tree nodes have to change
in order to support PCI hotplug. For example, when a PCI adapter is inserted into
a slot, its device-tree node should be added to the system dynamically.
Conversely, the device-tree node should be removed from the system when the PCI
adapter is going offline. Since pci_dn and eeh_dev have the same life cycle as PCI
device nodes, they should be added/removed accordingly during PCI hotplug.
PATCH[24 - 39] are doing the related work.

The OF driver is changed to support unflattening an FDT blob for a subtree, which
is covered by PATCH[40 - 44].

The last one, PATCH[45], is the standalone PCI hotplug driver for the PowerPC
PowerNV platform.

===
Testing
===
1. Unplug adapters behind non-empty slot, then plug them.

1.1 Check status
# cat /sys/bus/pci/slots/C10/address
0003:09:00
# cat /sys/bus/pci/slots/C10/adapter
1
# cat /sys/bus/pci/slots/C10/power
1
# lspci
0003:09:00.0 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |   +-09.0-[09]--+-00.0
 |   |+-00.1
 |   |+-00.2
 |   |\-00.3
 |   +-10.0-[0a-0e]--
 |   \-11.0-[0f-13]--

1.2 Unplug adapter 0003:09.00.x
# echo 0 > /sys/bus/pci/slots/C10/power
# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |   +-09.0-[09]--
 |   +-10.0-[0a-0e]--
 |   \-11.0-[0f-13]--

1.3 Plug adapter 0003:09.00.x
# echo 1 > /sys/bus/pci/slots/C10/power



Do I understand correctly that the adapter was not physically moved in/out of
the slot between 1.2 and 1.3?



Correct.



This is not right then... Someone 

Re: [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according
to their DMA32 weight. The PEs on the list are iterated to setup
their TCE32 tables at system booting time. The list is used for
once and there is for keep having it.


"there is no need to keep it" may be?




This moves the logic calculating DMA32 weight of PHB and PE to
pnv_ioda_setup_dma() to drop PHB's DMA32 list. Also, every PE
traces the consumed DMA32 segment by @tce32_seg and @tce32_segcount
are useless and they're removed.

Signed-off-by: Gavin Shan 



Reviewed-by: Alexey Kardashevskiy 

with few comments below...


---
  arch/powerpc/platforms/powernv/pci-ioda.c | 168 +-
  arch/powerpc/platforms/powernv/pci.h  |  19 
  2 files changed, 75 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index e60cff6..0fc2309 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -886,44 +886,6 @@ out:
return 0;
  }

-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
-  struct pnv_ioda_pe *pe)
-{
-   struct pnv_ioda_pe *lpe;
-
-   list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
-   if (lpe->dma_weight < pe->dma_weight) {
-   list_add_tail(&pe->dma_link, &lpe->dma_link);
-   return;
-   }
-   }
-   list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
-}
-
-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
-{
-   /* This is quite simplistic. The "base" weight of a device
-* is 10. 0 means no DMA is to be accounted for it.
-*/
-
-   /* If it's a bridge, no DMA */
-   if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-   return 0;
-
-   /* Reduce the weight of slow USB controllers */
-   if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
-   dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
-   dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-   return 3;
-
-   /* Increase the weight of RAID (includes Obsidian) */
-   if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-   return 15;
-
-   /* Default */
-   return 10;
-}
-
  #ifdef CONFIG_PCI_IOV
  static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
  {
@@ -1028,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct 
pci_dev *dev)
pe->flags = PNV_IODA_PE_DEV;
pe->pdev = dev;
pe->pbus = NULL;
-   pe->tce32_seg = -1;
pe->mve_number = -1;
pe->rid = dev->bus->number << 8 | pdn->devfn;

@@ -1044,16 +1005,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct 
pci_dev *dev)
return NULL;
}

-   /* Assign a DMA weight to the device */
-   pe->dma_weight = pnv_ioda_dma_weight(dev);
-   if (pe->dma_weight != 0) {
-   phb->ioda.dma_weight += pe->dma_weight;
-   phb->ioda.dma_pe_count++;
-   }
-
-   /* Link the PE */
-   pnv_ioda_link_pe_by_weight(phb, pe);
-
return pe;
  }

@@ -1071,7 +1022,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, 
struct pnv_ioda_pe *pe)
}
pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
-   pe->dma_weight += pnv_ioda_dma_weight(dev);
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
pnv_ioda_setup_same_PE(dev->subordinate, pe);
}
@@ -1108,10 +1058,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, 
bool all)
pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
pe->pbus = bus;
pe->pdev = NULL;
-   pe->tce32_seg = -1;
pe->mve_number = -1;
pe->rid = bus->busn_res.start << 8;
-   pe->dma_weight = 0;

if (all)
pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
@@ -1133,17 +1081,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, 
bool all)

/* Put PE to the list */
	list_add_tail(&pe->list, &phb->ioda.pe_list);
-
-   /* Account for one DMA PE if at least one DMA capable device exist
-* below the bridge
-*/
-   if (pe->dma_weight != 0) {
-   phb->ioda.dma_weight += pe->dma_weight;
-   phb->ioda.dma_pe_count++;
-   }
-
-   /* Link the PE */
-   pnv_ioda_link_pe_by_weight(phb, pe);
  }

  static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
@@ -1184,7 +1121,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct 
pci_dev *npu_pdev)
rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
npu_pdn->pcidev = npu_pdev;
npu_pdn->pe_number = pe_num;
-   pe->dma_weight 

Re: [PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

Currently, there is one macro (TCE32_TABLE_SIZE) representing the
TCE table size for one DMA32 segment. The constant representing
the DMA32 segment size (1 << 28) is still used in the code.

This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
segment size. the TCE table size can be calcualted when the page


s/calcualted/calculated/



has fixed 4KB size. So all the related calculation depends on one
macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.


Please move PNV_IODA1_DMA32_SEGSIZE where TCE32_TABLE_SIZE was.




Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 30 +-
  arch/powerpc/platforms/powernv/pci.h  |  1 +
  2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index d18b95e..e60cff6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -48,9 +48,6 @@
  #include "powernv.h"
  #include "pci.h"

-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
-#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
-
  #define POWERNV_IOMMU_DEFAULT_LEVELS  1
  #define POWERNV_IOMMU_MAX_LEVELS  5

@@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
*phb,

struct page *tce_mem = NULL;
struct iommu_table *tbl;
-   unsigned int i;
+   unsigned int tce32_segsz, i;



PNV_IODA1_DMA32_SEGSIZE is a segment size in bytes. The name @tce32_segsz 
also suggests that it is a segment size in bytes (otherwise it would be 
tce32_seg_entries or something like this) but it is not, it is a number of 
TCE entries (arch/powerpc/kernel/iommu.c uses "entry" for these). And 
tce32_segsz never changes. So:


const unsigned int entries = PNV_IODA1_DMA32_SEGSIZE >> 
(IOMMU_PAGE_SHIFT_4K - 3);
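
To make the units concrete, here is the arithmetic as a user-space sketch. It assumes the values stated in the patch description (a 1 << 28 byte segment, 4KB TCE pages, 8-byte TCE entries); the macro and function names are local stand-ins, not the kernel's.

```c
#include <assert.h>

#define DMA32_SEGSIZE   (1UL << 28) /* one DMA32 segment: 256MB (assumed) */
#define PAGE_SHIFT_4K   12          /* 4KB TCE page */
#define TCE_ENTRY_SHIFT 3           /* each TCE entry is 8 bytes */

/* Number of TCE entries needed to map one segment. */
static unsigned long tce_entries_per_segment(void)
{
    assert(PAGE_SHIFT_4K > TCE_ENTRY_SHIFT);
    return DMA32_SEGSIZE >> PAGE_SHIFT_4K;
}

/* Value of SEGSIZE >> (PAGE_SHIFT - 3): entries multiplied by 8 bytes. */
static unsigned long tce_table_bytes_per_segment(void)
{
    return DMA32_SEGSIZE >> (PAGE_SHIFT_4K - TCE_ENTRY_SHIFT);
}
```

Under these assumptions the shifted expression evaluates to 524288, the same as the old (256M / 4K) * 8 table size in bytes, while the entry count is 65536. Whichever name is chosen for the variable, it is worth checking it against those two numbers.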






int64_t rc;
void *addr;

@@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
*phb,
/* Grab a 32-bit TCE table */
pe->tce32_seg = base;
pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
-   (base << 28), ((base + segs) << 28) - 1);
+   base * PNV_IODA1_DMA32_SEGSIZE,
+   (base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);

/* XXX Currently, we allocate one big contiguous table for the
 * TCEs. We only really need one chunk per 256M of TCE space
 * (ie per segment) but that's an optimization for later, it
 * requires some added smarts with our get/put_tce implementation
+*
+* Each TCE page is 4KB in size and each TCE entry occupies 8
+* bytes
 */
+   tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);



tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-  get_order(TCE32_TABLE_SIZE * segs));
+  get_order(tce32_segsz * segs));
if (!tce_mem) {
pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
goto fail;
}
addr = page_address(tce_mem);
-   memset(addr, 0, TCE32_TABLE_SIZE * segs);
+   memset(addr, 0, tce32_segsz * segs);

/* Configure HW */
for (i = 0; i < segs; i++) {
rc = opal_pci_map_pe_dma_window(phb->opal_id,
  pe->pe_number,
  base + i, 1,
- __pa(addr) + TCE32_TABLE_SIZE * i,
- TCE32_TABLE_SIZE, 0x1000);
+ __pa(addr) + tce32_segsz * i,
+ tce32_segsz, 0x1000);



As you started using IOMMU_PAGE_SHIFT_4K and you are also touching this 
piece of code -


s/0x1000/IOMMU_PAGE_SHIFT_4K/



if (rc) {
pe_err(pe, " Failed to configure 32-bit TCE table,"
   " err %ld\n", rc);
@@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
*phb,
}

/* Setup linux iommu table */
-   pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
- base << 28, IOMMU_PAGE_SHIFT_4K);
+   pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
+ base * PNV_IODA1_DMA32_SEGSIZE,
+ IOMMU_PAGE_SHIFT_4K);

/* OPAL variant of P7IOC SW invalidated TCEs */
if (phb->ioda.tce_inval_reg)
@@ -2116,7 +2119,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
*phb,
if (pe->tce32_seg >= 0)
pe->tce32_seg = -1;
if (tce_mem)
-   __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
+   __free_pages(tce_mem, 

[PATCH] sched/cpuacct: Check for NULL when using task_pt_regs()

2016-04-13 Thread Anton Blanchard via Linuxppc-dev
task_pt_regs() can return NULL for kernel threads, so add a check.
This fixes an oops at boot on ppc64.

Fixes: d740037fac70 ("sched/cpuacct: Split usage accounting into user_usage and 
sys_usage")
Signed-off-by: Anton Blanchard 
Reported-and-Tested-by: Srikar Dronamraju 
---

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index df947e0..41f85c4 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -316,12 +316,11 @@ static struct cftype files[] = {
 void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
struct cpuacct *ca;
-   int index;
+   int index = CPUACCT_USAGE_SYSTEM;
+   struct pt_regs *regs = task_pt_regs(tsk);
 
-   if (user_mode(task_pt_regs(tsk)))
+   if (regs && user_mode(regs))
index = CPUACCT_USAGE_USER;
-   else
-   index = CPUACCT_USAGE_SYSTEM;
 
rcu_read_lock();
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
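
The pattern in the cpuacct fix above can be sketched outside the kernel as follows. The types and names here are hypothetical stand-ins, not the real task_pt_regs()/user_mode() definitions; the point is only that the index defaults to system time and the register pointer is checked before it is dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sketch_pt_regs {
	bool user;
};

enum { SKETCH_USAGE_USER, SKETCH_USAGE_SYSTEM };

static bool sketch_user_mode(const struct sketch_pt_regs *regs)
{
	return regs->user;
}

/* Default to system time; switch to user time only when regs exist. */
static int sketch_pick_index(const struct sketch_pt_regs *regs)
{
	int index = SKETCH_USAGE_SYSTEM;

	if (regs && sketch_user_mode(regs))
		index = SKETCH_USAGE_USER;

	return index;
}
```

A kernel thread with no saved user registers corresponds to passing NULL here, which is accounted as system time instead of being dereferenced.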

Re: [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff

2016-04-13 Thread Michal Hocko
On Thu 07-04-16 11:07:35, Anshuman Khandual wrote:
> The commit 091d0d55b286 ("shm: fix null pointer deref when userspace
> specifies invalid hugepage size") had replaced MAP_HUGE_MASK with
> SHM_HUGE_MASK. Though both of them contain the same numeric value of
> 0x3f, MAP_HUGE_MASK flag sounds more appropriate than the other one
> in the context. Hence change it back.

Yes, mixing SHM_HUGE_MASK with MAP_HUGE_SHIFT is not only misleading,
it might also bite us later should either of the two change.
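
For readers who have not looked at this encoding, here is a minimal user-space sketch of what the shift/mask pair does. The constants are redefined locally under SKETCH_ names so the example builds anywhere; 26 and 0x3f are the values I believe the uapi headers use (the patch description confirms the 0x3f mask), so treat the shift as an assumption.

```c
/* Assumed to mirror MAP_HUGE_SHIFT / MAP_HUGE_MASK from the uapi headers. */
#define SKETCH_MAP_HUGE_SHIFT 26
#define SKETCH_MAP_HUGE_MASK  0x3f

/* Encode log2(huge page size) into the upper bits of the mmap flags. */
static unsigned long sketch_encode_huge(unsigned int size_log2)
{
	return (unsigned long)(size_log2 & SKETCH_MAP_HUGE_MASK)
		<< SKETCH_MAP_HUGE_SHIFT;
}

/* Recover log2(huge page size) the way mmap_pgoff() does before the lookup. */
static unsigned int sketch_decode_huge(unsigned long flags)
{
	return (flags >> SKETCH_MAP_HUGE_SHIFT) & SKETCH_MAP_HUGE_MASK;
}
```

Encoding 21 (a 2MB page) and decoding it again returns 21 regardless of what the low flag bits contain, which only holds as long as the shift and the mask come from the same family of definitions.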

> 
> Signed-off-by: Anshuman Khandual 

Acked-by: Michal Hocko 

> ---
>  mm/mmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index bd2e1a53..7d730a4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1315,7 +1315,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, 
> unsigned long, len,
>   struct user_struct *user = NULL;
>   struct hstate *hs;
>  
> - hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
> + hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
>   if (!hs)
>   return -EINVAL;
>  
> -- 
> 2.1.0
> 

-- 
Michal Hocko
SUSE Labs

Re: [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap

2016-04-13 Thread Gavin Shan
On Wed, Apr 13, 2016 at 04:21:07PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>There are two arrays for IO and M32 segment maps on every PHB.
>>The index into each array is the segment number and the value stored
>>in the corresponding element is the PE number, indicating the segment
>>is assigned to that PE. Initially, all elements in those two arrays
>>are zeroes, meaning all segments are assigned to PE#0, which is wrong.
>>
>>This fixes the initial values in the elements of those two arrays
>>to IODA_INVALID_PE, meaning all segments aren't assigned to any
>>PE.
>
>This is ok.
>
>>In order to use IODA_INVALID_PE (-1) to represent invalid PE
>>number, the types of those two arrays are changed from "unsigned int"
>>to "int".
>
>"unsigned" can carry (-1) perfectly fine, just add a type cast to
>IODA_INVALID_PE:
>
>#define IODA_INVALID_PE    (unsigned int)(-1)
>
>Using "signed" type for indexes which cannot be negative does not make much
>sense - instead of checking for the upper boundary, you have to check for "<
>0" too.
>
>OPAL uses unsigned type for PE (uint64_t or uint32_t or uint16_t - this is
>quite funny).
>
>pnv_ioda_pe::pe_number is "unsigned" and this pe_number is the same thing as
>I can see in pnv_ioda_setup_dev_PE().
>
>Some printk() print the PE number as "%x" (which implies "unsigned").
>

Yes, I can simply have something like below when PE number as well as
segment index are represented by "unsigned int" values, right?

#define IODA_INVALID_PE 0x
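
A minimal sketch of why an unsigned sentinel keeps the bounds check simple. SKETCH_INVALID_PE and sketch_pe_valid() are hypothetical names; the sentinel follows the (unsigned int)(-1) form suggested in the review and is not meant to fill in the value of the define above.

```c
#include <stdbool.h>

/* Hypothetical sentinel in the (unsigned int)(-1) style from the review. */
#define SKETCH_INVALID_PE ((unsigned int)-1)

/*
 * With an unsigned PE number, a single upper-bound check rejects both
 * out-of-range values and the sentinel, because the sentinel is the
 * largest representable unsigned int.
 */
static bool sketch_pe_valid(unsigned int pe, unsigned int total_pe_num)
{
	return pe < total_pe_num;
}
```

With a signed type the same helper would need an additional "pe >= 0" test.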

>
>I suggest changing the pci_dn::pe_number type from "int" to "unsigned int" to
>match pnv_ioda_pe::pe_number, in a separate patch. Or do not touch types for
>now.
>

Yes, I will have a separate patch right before this one to address it.

>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++--
>>  arch/powerpc/platforms/powernv/pci.h  | 4 ++--
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 1d2514f..44cc5f3 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>device_node *np,
>>  unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>  const __be64 *prop64;
>>  const __be32 *prop32;
>>- int len;
>>+ int i, len;
>>  u64 phb_id;
>>  void *aux;
>>  long rc;
>>@@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>device_node *np,
>>  aux = memblock_virt_alloc(size, 0);
>>  phb->ioda.pe_alloc = aux;
>>  phb->ioda.m32_segmap = aux + m32map_off;
>>- if (phb->type == PNV_PHB_IODA1)
>>+ for (i = 0; i < phb->ioda.total_pe_num; i++)
>>+ phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
>>+ if (phb->type == PNV_PHB_IODA1) {
>>  phb->ioda.io_segmap = aux + iomap_off;
>>+ for (i = 0; i < phb->ioda.total_pe_num; i++)
>>+ phb->ioda.io_segmap[i] = IODA_INVALID_PE;
>>+ }
>>  phb->ioda.pe_array = aux + pemap_off;
>>  set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h 
>>b/arch/powerpc/platforms/powernv/pci.h
>>index 784882a..36c4965 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -146,8 +146,8 @@ struct pnv_phb {
>>  struct pnv_ioda_pe  *pe_array;
>>
>>  /* M32 & IO segment maps */
>>- unsigned int*m32_segmap;
>>- unsigned int*io_segmap;
>>+ int *m32_segmap;
>>+ int *io_segmap;
>>
>>  /* IRQ chip */
>>  int irq_chip_init;
>>
>
>
>-- 
>Alexey
>


Re: [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This enables the M64 window on P7IOC, which has already been enabled on
PHB3. On PHB3, 16 M64 BARs are supported and each of them can either be
owned by one particular PE# exclusively or be divided evenly into 256
segments. Every P7IOC PHB also has 16 M64 BARs, but each of them is
divided into 8 segments, so every P7IOC PHB supports 128 M64 segments
in total. P7IOC has M64DT, which helps map one particular M64 segment#
to an arbitrary PE#. PHB3 doesn't have M64DT, which means one M64
segment can only be pinned to a fixed PE#. In order to share the code
supporting M64 on P7IOC and PHB3, we just provide 128 M64 segments on
every P7IOC PHB and pin each of them to a fixed PE#, bypassing the
function of M64DT. In turn, we just need a different phb->init_m64()
for P7IOC and PHB3 to support M64.


The comment is not quite correct - in addition to pnv_ioda1_init_m64(), you 
also need to hack pnv_ioda_pick_m64_pe().
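
As a reading aid, the fixed segment-to-PE pinning described in the commit message can be sketched like this. The SKETCH_ constants take the 16 BARs and 8 segments per BAR from the text above; the struct and function names are hypothetical.

```c
/* From the commit message: 16 M64 BARs, 8 segments each, 128 in total. */
#define SKETCH_M64_NUM  16
#define SKETCH_M64_SEGS 8

struct sketch_m64_slot {
	unsigned int bar;	/* which M64 BAR */
	unsigned int seg;	/* which segment inside that BAR */
};

/* Fixed mapping: PE number N always lands in BAR N / 8, segment N % 8. */
static struct sketch_m64_slot sketch_m64_slot_for_pe(unsigned int pe_number)
{
	struct sketch_m64_slot slot = {
		pe_number / SKETCH_M64_SEGS,
		pe_number % SKETCH_M64_SEGS,
	};

	return slot;
}

/* Start address of a BAR when every BAR covers 8 equally sized segments. */
static unsigned long long sketch_m64_bar_base(unsigned long long m64_base,
					      unsigned long long segsz,
					      unsigned int bar)
{
	return m64_base + (unsigned long long)bar * SKETCH_M64_SEGS * segsz;
}
```

PE#127, the last one, ends up in BAR 15, segment 7, which matches the 128-segment total.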





Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +--
  arch/powerpc/platforms/powernv/pci.h  |  3 ++
  2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1dc663a..8488238 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev 
*pdev,
}
  }

+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+   struct resource *r;
+   int index;
+
+   /*
+* There are 16 M64 BARs, each of which has 8 segments. So
+* there are as many M64 segments as the maximum number of
+* PEs, which is 128.
+*/
+   for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
+   unsigned long base, segsz = phb->ioda.m64_segsize;
+   int64_t rc;
+
+   base = phb->ioda.m64_base +
+  index * PNV_IODA1_M64_SEGS * segsz;
+   rc = opal_pci_set_phb_mem_window(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index, base, 0,
+   PNV_IODA1_M64_SEGS * segsz);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
+   rc, phb->hose->global_number, index);
+   goto fail;
+   }
+
+   rc = opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index,
+   OPAL_ENABLE_M64_SPLIT);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
+   rc, phb->hose->global_number, index);
+   goto fail;
+   }
+   }
+
+   /*
+* Exclude the segment used by the reserved PE, which
+* is expected to be 0 or last supported PE#.
+*/
+   r = &phb->hose->mem_resources[1];
+   if (phb->ioda.reserved_pe_idx == 0)
+   r->start += phb->ioda.m64_segsize;
+   else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
+   r->end -= phb->ioda.m64_segsize;
+   else
+   pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
+   phb->ioda.reserved_pe_idx);
+
+   return 0;
+
+fail:
+   for ( ; index >= 0; index--)
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
+
+   return -EIO;
+}
+
  static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
unsigned long *pe_bitmap,
bool all)
@@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool 
all)
pe->master = master_pe;
list_add_tail(&pe->list, &master_pe->slaves);
}
+
+   /*
+* P7IOC supports M64DT, which helps mapping M64 segment
+* to one particular PE#. However, PHB3 has fixed mapping
+* between M64 segment and PE#. In order to have same logic
+* for P7IOC and PHB3, we enforce fixed mapping between M64
+* segment and PE# on P7IOC.
+*/
+   if (phb->type == PNV_PHB_IODA1) {
+   int64_t rc;
+
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   pe->pe_number, OPAL_M64_WINDOW_TYPE,
+   pe->pe_number / PNV_IODA1_M64_SEGS,
+   pe->pe_number % PNV_IODA1_M64_SEGS);
+   if (rc != OPAL_SUCCESS)
+   pr_warn("%s: Error %lld mapping M64 for 

Re: [PATCH V2] mtd: nand: pasemi: switch to pr_* functions

2016-04-13 Thread Boris Brezillon
Hi,

On Sat, 9 Apr 2016 12:50:35 +0300
Andy Shevchenko  wrote:

> On Fri, Apr 8, 2016 at 2:13 PM, Rafał Miłecki  wrote:
> > 1) Use pr_fmt to keep messages consistent
> > 2) Don't warn if kzalloc fails as it dumps stack on its own
> > 3) Use %pR format for displaying whole resource to avoid:
> > warning: format ‘%08llx’ expects type ‘long long unsigned int’, but 
> > argument 2 has type ‘resource_size_t’
> >
> > Signed-off-by: Rafał Miłecki 
> > ---
> >  drivers/mtd/nand/pasemi_nand.c | 9 -
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/mtd/nand/pasemi_nand.c b/drivers/mtd/nand/pasemi_nand.c
> > index 63fcb8c..e8372b4 100644
> > --- a/drivers/mtd/nand/pasemi_nand.c
> > +++ b/drivers/mtd/nand/pasemi_nand.c
> > @@ -22,6 +22,8 @@
> >
> >  #undef DEBUG
> >
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> >  #include 
> >  #include 
> >  #include 
> > @@ -112,8 +114,6 @@ static int pasemi_nand_probe(struct platform_device 
> > *ofdev)
> > /* Allocate memory for MTD device structure and private data */
> > chip = kzalloc(sizeof(struct nand_chip), GFP_KERNEL);
> > if (!chip) {
> > -   printk(KERN_WARNING
> > -  "Unable to allocate PASEMI NAND MTD device 
> > structure\n");
> > err = -ENOMEM;
> > goto out;
> > }
> > @@ -163,13 +163,12 @@ static int pasemi_nand_probe(struct platform_device 
> > *ofdev)
> > }
> >
> > if (mtd_device_register(pasemi_nand_mtd, NULL, 0)) {
> > -   printk(KERN_ERR "pasemi_nand: Unable to register MTD 
> > device\n");
> > +   pr_err("Unable to register MTD device\n");
> 
> > And why not use dev_err(&ofdev->dev, …); ?

Yep, I think it's better to use dev_err().

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

Re: [PATCH] sched/cpuacct: Check for NULL when using task_pt_regs()

2016-04-13 Thread Ingo Molnar

* Srikar Dronamraju  wrote:

> * Anton Blanchard  [2016-04-06 21:59:50]:
> 
> > Looks good, and the patch below does fix the oops for me.
> > 
> > Anton
> > --
> > 
> > task_pt_regs() can return NULL for kernel threads, so add a check.
> > This fixes an oops at boot on ppc64.
> > 
> > Signed-off-by: Anton Blanchard 
> 
> Works for me too.
> 
> Reported-and-Tested-by: Srikar Dronamraju 

Could someone please re-send the fix, because it has not reached me nor lkml.

Thanks,

Ingo

Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Gavin Shan
On Wed, Apr 13, 2016 at 05:28:15PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>This series of patches rebases on powerpc/next branch, plus below additional
>>patches:
>>
>>
>>
>>https://patchwork.ozlabs.org/patch/581315/  (PATCH[1/9] Richard's SRIOV EEH)
>>https://patchwork.ozlabs.org/patch/582639/  (PATCH[1/1] Gavin's EEH fix)
>>https://patchwork.ozlabs.org/patch/582093/  (PATCH[1/1] Gavin's EEH fix)
>>https://patchwork.ozlabs.org/patch/580626/  (PATCH[1/4] Gavin's PCI fix)
>>https://patchwork.ozlabs.org/patch/580153/  (PATCH[1/1] Andrew's EEH minor fix)
>>https://patchwork.ozlabs.org/patch/566827/  (PATCH[1/1] Russell's P5IOC2 removal)
>>https://patchwork.ozlabs.org/patch/534154/  (PATCH[1/7] Richard's SRIOV rework)
>>commit 388f7b1 ("Linux 4.5-rc3")
>>
>>This series of patches intends to support PCI slots on the PowerPC PowerNV
>>platform, which runs on top of skiboot firmware. The patchset requires
>>corresponding changes in skiboot firmware, which have been sent to
>>skib...@lists.ozlabs.org for review. The PCI slots are exposed by skiboot
>>through device node properties, and the kernel uses those properties to
>>populate PCI slots accordingly.
>>
>>The original PCI infrastructure on the PowerNV platform can't support hotplug
>>because the PE is assigned at PHB fixup time, which happens only once during
>>system boot. For this reason, the PCI infrastructure on the PowerNV platform
>>has been reworked a lot. After the rework, the PE and its corresponding
>>resources (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned
>>when the PCI bridge's resources are updated, which might decide the PE#
>>assigned to the PE (e.g. M64 resources, on P8 strictly speaking). Each PE
>>maintains a reference count, which is (number of child PCI devices + 1). This
>>means that when the last child PCI device leaves the PE, the PE and its
>>resources are released and put back into the free pool. With this design, the
>>PE is released when the EEH PE is released. PATCH[1 - 23] are related to this
>>part.
>>
>>From the skiboot perspective, a PCI slot provides (hot/fundamental/complete)
>>resets to EEH. The kernel learns through the device-tree node whether skiboot
>>supports the various resets on a particular PCI slot. If it does, EEH uses the
>>functionality provided by skiboot. Besides, the device-tree nodes have to
>>change in order to support PCI hotplug. For example, when a PCI adapter is
>>inserted into a slot, its device-tree node should be added to the system
>>dynamically. Conversely, the device-tree node should be removed from the
>>system when the PCI adapter is going offline. Since pci_dn and eeh_dev have
>>the same life cycle as PCI device nodes, they should be added/removed
>>accordingly during PCI hotplug. PATCH[24 - 39] do the related work.
>>
>>The OF driver is changed to support unflattening an FDT blob for a sub-tree,
>>which is covered by PATCH[40 - 44].
>>
>>The last one, PATCH[45], is the standalone PCI hotplug driver for the PowerPC
>>PowerNV platform.
>>
>>===
>>Testing
>>===
>>1. Unplug adapters behind non-empty slot, then plug them.
>>
>>1.1 Check status
>># cat /sys/bus/pci/slots/C10/address
>>0003:09:00
>># cat /sys/bus/pci/slots/C10/adapter
>>1
>># cat /sys/bus/pci/slots/C10/power
>>1
>># lspci
>>0003:09:00.0 Ethernet controller: \
>>Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>0003:09:00.1 Ethernet controller: \
>>Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>0003:09:00.2 Ethernet controller: \
>>Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>>0003:09:00.3 Ethernet controller: \
>>Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
>># lspci -t
>>-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
>> |   +-08.0-[04-08]--
>> |   +-09.0-[09]--+-00.0
>> |   |+-00.1
>> |   |+-00.2
>> |   |\-00.3
>> |   +-10.0-[0a-0e]--
>> |   \-11.0-[0f-13]--
>>
>>1.2 Unplug adapter 0003:09.00.x
>># echo 0 > /sys/bus/pci/slots/C10/power
>># lspci -t
>>-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
>> |   +-08.0-[04-08]--
>> |   +-09.0-[09]--
>> |   +-10.0-[0a-0e]--
>> |   \-11.0-[0f-13]--
>>
>>  

Re: [PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe()
as it's the counter-part of IODA2's pnv_pci_ioda2_setup_dma_pe().
No logical changes introduced.

Signed-off-by: Gavin Shan 



Reviewed-by: Alexey Kardashevskiy 




---
  arch/powerpc/platforms/powernv/pci-ioda.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8488238..d18b95e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2026,9 +2026,10 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
.free = pnv_ioda2_table_free,
  };

-static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
- struct pnv_ioda_pe *pe, unsigned int base,
- unsigned int segs)
+static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
+  struct pnv_ioda_pe *pe,
+  unsigned int base,
+  unsigned int segs)
  {

struct page *tce_mem = NULL;
@@ -2616,7 +2617,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
if (phb->type == PNV_PHB_IODA1) {
pe_info(pe, "DMA weight %d, assigned %d DMA32 
segments\n",
pe->dma_weight, segs);
-   pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
+   pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
} else if (phb->type == PNV_PHB_IODA2) {
pe_info(pe, "Assign DMA32 space\n");
segs = 0;




--
Alexey

Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This series of patches rebases on powerpc/next branch, plus below additional
patches:



https://patchwork.ozlabs.org/patch/581315/  (PATCH[1/9] Richard's SRIOV EEH)
https://patchwork.ozlabs.org/patch/582639/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/582093/  (PATCH[1/1] Gavin's EEH fix)
https://patchwork.ozlabs.org/patch/580626/  (PATCH[1/4] Gavin's PCI fix)
https://patchwork.ozlabs.org/patch/580153/  (PATCH[1/1] Andrew's EEH minor fix)
https://patchwork.ozlabs.org/patch/566827/  (PATCH[1/1] Russell's P5IOC2 removal)
https://patchwork.ozlabs.org/patch/534154/  (PATCH[1/7] Richard's SRIOV rework)
commit 388f7b1 ("Linux 4.5-rc3")

This series of patches intends to support PCI slots on the PowerPC PowerNV
platform, which runs on top of skiboot firmware. The patchset requires
corresponding changes in skiboot firmware, which have been sent to
skib...@lists.ozlabs.org for review. The PCI slots are exposed by skiboot
through device node properties, and the kernel uses those properties to
populate PCI slots accordingly.

The original PCI infrastructure on the PowerNV platform can't support hotplug
because the PE is assigned at PHB fixup time, which happens only once during
system boot. For this reason, the PCI infrastructure on the PowerNV platform
has been reworked a lot. After the rework, the PE and its corresponding
resources (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned
when the PCI bridge's resources are updated, which might decide the PE#
assigned to the PE (e.g. M64 resources, on P8 strictly speaking). Each PE
maintains a reference count, which is (number of child PCI devices + 1). This
means that when the last child PCI device leaves the PE, the PE and its
resources are released and put back into the free pool. With this design, the
PE is released when the EEH PE is released. PATCH[1 - 23] are related to this
part.
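
A rough sketch of the reference-counting rule described in the paragraph above. The cover letter does not spell out who holds the extra "+ 1" reference, so this sketch assumes it is taken when the PE is created and dropped by whoever set the PE up; all names are hypothetical and not taken from the series.

```c
#include <stdbool.h>

/* Hypothetical model of a PE; not the kernel's struct pnv_ioda_pe. */
struct sketch_pe {
	int refcount;
	bool released;
};

/* Assumption: creation takes the extra "+ 1" reference. */
static void sketch_pe_init(struct sketch_pe *pe)
{
	pe->refcount = 1;
	pe->released = false;
}

/* Called when a child PCI device joins the PE. */
static void sketch_pe_get(struct sketch_pe *pe)
{
	pe->refcount++;
}

/* Called when a reference goes away; the last put releases the PE. */
static void sketch_pe_put(struct sketch_pe *pe)
{
	if (--pe->refcount == 0)
		pe->released = true;	/* resources return to the free pool */
}
```

With two child devices attached the count is 3, that is, (number of child PCI devices + 1), and the PE is only marked released once every reference has been dropped.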

From the skiboot perspective, a PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns through the device-tree node whether skiboot
supports the various resets on a particular PCI slot. If it does, EEH uses the
functionality provided by skiboot. Besides, the device-tree nodes have to
change in order to support PCI hotplug. For example, when a PCI adapter is
inserted into a slot, its device-tree node should be added to the system
dynamically. Conversely, the device-tree node should be removed from the
system when the PCI adapter is going offline. Since pci_dn and eeh_dev have
the same life cycle as PCI device nodes, they should be added/removed
accordingly during PCI hotplug. PATCH[24 - 39] do the related work.

The OF driver is changed to support unflattening an FDT blob for a sub-tree,
which is covered by PATCH[40 - 44].

The last one, PATCH[45], is the standalone PCI hotplug driver for the PowerPC
PowerNV platform.

===
Testing
===
1. Unplug adapters behind non-empty slot, then plug them.

1.1 Check status
# cat /sys/bus/pci/slots/C10/address
0003:09:00
# cat /sys/bus/pci/slots/C10/adapter
1
# cat /sys/bus/pci/slots/C10/power
1
# lspci
0003:09:00.0 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: \
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |   +-09.0-[09]--+-00.0
 |   |+-00.1
 |   |+-00.2
 |   |\-00.3
 |   +-10.0-[0a-0e]--
 |   \-11.0-[0f-13]--

1.2 Unplug adapter 0003:09.00.x
# echo 0 > /sys/bus/pci/slots/C10/power
# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |   +-09.0-[09]--
 |   +-10.0-[0a-0e]--
 |   \-11.0-[0f-13]--

1.3 Plug adapter 0003:09.00.x
# echo 1 > /sys/bus/pci/slots/C10/power



Do I understand correctly that the adapter was not physically moved in/out 
of the slot between 1.2 and 1.3?





# lspci -t
-+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
 |   +-08.0-[04-08]--
 |  

Re: [PATCH v8 12/45] powerpc/powernv: Rename M64 related functions

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This renames those functions picking PE number based on consumed
M64 segments, mapping M64 segments to PEs as those functions are
going to be shared by IODA1/IODA2 in next patch. No logical changes
introduced.

Signed-off-by: Gavin Shan 



Reviewed-by: Alexey Kardashevskiy 





---
  arch/powerpc/platforms/powernv/pci-ioda.c | 22 +++---
  1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index fc0374a..1dc663a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -219,7 +219,7 @@ fail:
return -EIO;
  }

-static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
+static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
 unsigned long *pe_bitmap)
  {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
@@ -246,22 +246,22 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev 
*pdev,
}
  }

-static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
-unsigned long *pe_bitmap,
-bool all)
+static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
+   unsigned long *pe_bitmap,
+   bool all)
  {
struct pci_dev *pdev;

list_for_each_entry(pdev, &bus->devices, bus_list) {
-   pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
+   pnv_ioda_reserve_dev_m64_pe(pdev, pe_bitmap);

if (all && pdev->subordinate)
-   pnv_ioda2_reserve_m64_pe(pdev->subordinate,
-pe_bitmap, all);
+   pnv_ioda_reserve_m64_pe(pdev->subordinate,
+   pe_bitmap, all);
}
  }

-static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
+static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
  {
struct pci_controller *hose = pci_bus_to_host(bus);
struct pnv_phb *phb = hose->private_data;
@@ -283,7 +283,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
}

/* Figure out reserved PE numbers by the PE */
-   pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
+   pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);

/*
 * the current bus might not own M64 window and that's all
@@ -365,8 +365,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb 
*phb)
/* Use last M64 BAR to cover M64 window */
phb->ioda.m64_bar_idx = 15;
phb->init_m64 = pnv_ioda2_init_m64;
-   phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
-   phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
+   phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
+   phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
  }

  static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)




--
Alexey

Re: [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

When unplugging PCI devices, their parent PEs might go offline.
The M64 resources consumed by those PEs should be released at that
time. As we already do for M32 segment consumption, this introduces an
array in the PHB to track the mapping between M64 segments and
PE numbers.

Signed-off-by: Gavin Shan 



Reviewed-by: Alexey Kardashevskiy 

but it would not hurt to mention in the commit log why M64 segment is not 
tracked/setup by the existing (at this point, at least) 
pnv_ioda_setup_one_res().
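
For orientation, the patch carves the new m64_segmap out of the same auxiliary allocation as the PE bitmap and the M32 map. The following sketch reproduces only that offset arithmetic; the names are hypothetical, the IO map and PE array that follow in the real code are left out, and the alignment macro is a plain stand-in for _ALIGN_UP().

```c
#include <stddef.h>

/* Plain stand-in for the kernel's _ALIGN_UP(); rounds x up to a multiple of a. */
#define SKETCH_ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

struct sketch_aux_layout {
	size_t m64map_off;	/* offset of the M64 segment map */
	size_t m32map_off;	/* offset of the M32 segment map */
	size_t size;		/* bytes used so far (IO map, PE array omitted) */
};

/* PE bitmap first, then one int per PE for each segment map. */
static struct sketch_aux_layout sketch_aux_layout_for(size_t total_pe_num)
{
	struct sketch_aux_layout l;
	size_t size = SKETCH_ALIGN_UP(total_pe_num / 8, sizeof(unsigned long));

	l.m64map_off = size;
	size += total_pe_num * sizeof(int);
	l.m32map_off = size;
	size += total_pe_num * sizeof(int);
	l.size = size;

	return l;
}
```

With 128 PEs the bitmap takes 16 bytes, so the M64 map starts at offset 16 and the M32 map follows directly after its 128 entries.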




---
  arch/powerpc/platforms/powernv/pci-ioda.c | 10 --
  arch/powerpc/platforms/powernv/pci.h  |  1 +
  2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7330a73..fc0374a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -305,6 +305,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
phb->ioda.total_pe_num) {
pe = &phb->ioda.pe_array[i];

+   phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
if (!master_pe) {
pe->flags |= PNV_IODA_PE_MASTER;
INIT_LIST_HEAD(&pe->slaves);
@@ -3245,7 +3246,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
  {
struct pci_controller *hose;
struct pnv_phb *phb;
-   unsigned long size, m32map_off, pemap_off, iomap_off = 0;
+   unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
const __be64 *prop64;
const __be32 *prop32;
int i, len;
@@ -3332,6 +3333,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,

/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
+   m64map_off = size;
+   size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
m32map_off = size;
size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
if (phb->type == PNV_PHB_IODA1) {
@@ -3342,9 +3345,12 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
aux = memblock_virt_alloc(size, 0);
phb->ioda.pe_alloc = aux;
+   phb->ioda.m64_segmap = aux + m64map_off;
phb->ioda.m32_segmap = aux + m32map_off;
-   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   for (i = 0; i < phb->ioda.total_pe_num; i++) {
+   phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+   }
if (phb->type == PNV_PHB_IODA1) {
phb->ioda.io_segmap = aux + iomap_off;
for (i = 0; i < phb->ioda.total_pe_num; i++)
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 36c4965..866a5ea 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,6 +146,7 @@ struct pnv_phb {
struct pnv_ioda_pe  *pe_array;

/* M32 & IO segment maps */
+   int *m64_segmap;
int *m32_segmap;
int *io_segmap;





--
Alexey

Re: [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

The original implementation of pnv_ioda_setup_pe_seg() configures
IO and M32 segments with separate logic, which can be merged by
caching @segmap, @seg_size, @win in advance. This shouldn't
cause any behavioural changes.
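
The shared part of the two loops boils down to walking a region one segment at a time. A stand-alone sketch of that walk, with a hypothetical function name and without the opal_pci_map_pe_mmio_window() call the real code makes for every segment:

```c
/*
 * Assign every segment touched by [start, end] to pe_number.
 * Returns the number of segments assigned. The real code also asks
 * firmware to map each segment and stops on the first error.
 */
static unsigned int sketch_map_segments(int *segmap, unsigned int total,
					unsigned long start, unsigned long end,
					unsigned long segsize, int pe_number)
{
	unsigned long index = start / segsize;
	unsigned int mapped = 0;

	while (index < total && start <= end) {
		segmap[index] = pe_number;
		start += segsize;
		index++;
		mapped++;
	}

	return mapped;
}
```

Only the segment map, the segment size and the window type differ between the IO and M32 cases, which is why the loop body can be shared.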


Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++-
  1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 44cc5f3..fd7d382 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2940,8 +2940,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller 
*hose,
struct pnv_phb *phb = hose->private_data;
struct pci_bus_region region;
struct resource *res;
-   int i, index;
-   int rc;
+   unsigned int segsize;
+   int *segmap, index, i;
+   uint16_t win;
+   int64_t rc;

/*
 * NOTE: We only care PCI bus based PE for now. For PCI
@@ -2958,23 +2960,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller 
*hose,
if (res->flags & IORESOURCE_IO) {
region.start = res->start - phb->ioda.io_pci_base;
region.end   = res->end - phb->ioda.io_pci_base;
-   index = region.start / phb->ioda.io_segsize;
-
-   while (index < phb->ioda.total_pe_num &&
-  region.start <= region.end) {
-   phb->ioda.io_segmap[index] = pe->pe_number;
-   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-   pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, 
index);
-   if (rc != OPAL_SUCCESS) {
-   pr_err("%s: OPAL error %d when mapping IO 
"
-  "segment #%d to PE#%d\n",
-  __func__, rc, index, 
pe->pe_number);
-   break;
-   }
-
-   region.start += phb->ioda.io_segsize;
-   index++;
-   }
+   segsize  = phb->ioda.io_segsize;
+   segmap   = phb->ioda.io_segmap;
+   win  = OPAL_IO_WINDOW_TYPE;
} else if ((res->flags & IORESOURCE_MEM) &&
   !pnv_pci_is_mem_pref_64(res->flags)) {
region.start = res->start -
@@ -2983,23 +2971,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller 
*hose,
region.end   = res->end -
   hose->mem_offset[0] -
   phb->ioda.m32_pci_base;
-   index = region.start / phb->ioda.m32_segsize;
-
-   while (index < phb->ioda.total_pe_num &&
-  region.start <= region.end) {
-   phb->ioda.m32_segmap[index] = pe->pe_number;
-   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-   pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, 
index);
-   if (rc != OPAL_SUCCESS) {
-   pr_err("%s: OPAL error %d when mapping M32 
"
-  "segment#%d to PE#%d",
-  __func__, rc, index, 
pe->pe_number);
-   break;
-   }
+   segsize  = phb->ioda.m32_segsize;
+   segmap   = phb->ioda.m32_segmap;
+   win  = OPAL_M32_WINDOW_TYPE;
+   } else {
+   continue;
+   }

-   region.start += phb->ioda.m32_segsize;
-   index++;
+   index = region.start / segsize;
+   while (index < phb->ioda.total_pe_num &&
+  region.start <= region.end) {
+   segmap[index] = pe->pe_number;
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   pe->pe_number, win, 0, index);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("%s: Error %lld mapping (%d) seg#%d to 
PHB#%d-PE#%d\n",
+   __func__, rc, win, index,
+   pe->phb->hose->global_number,
+   pe->pe_number);
+   break;


Please move this loop to a helper and stop caching segsize/segmap/win; this 
will make 
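
For illustration only, here is a rough userspace sketch of what such a helper could look like. The loop takes the segment map, segment size and window type as parameters instead of reading cached locals. Everything in it is hypothetical: map_pe_segments(), map_window_stub() (standing in for opal_pci_map_pe_mmio_window()), TOTAL_PE_NUM and INVALID_PE are made-up names for this example, not the kernel's.

```c
#include <stdint.h>

#define TOTAL_PE_NUM 8     /* stand-in for phb->ioda.total_pe_num */
#define INVALID_PE   (-1)  /* stand-in for IODA_INVALID_PE */

/* Hypothetical stub replacing the OPAL call; it always reports success. */
static int64_t map_window_stub(int pe_number, uint16_t win, int index)
{
	(void)pe_number;
	(void)win;
	(void)index;
	return 0;
}

/*
 * Hypothetical helper: assign every segment covering [start, end] to
 * pe_number in segmap and return the number of segments mapped.
 */
static int map_pe_segments(int *segmap, unsigned int segsize, uint16_t win,
			   uint64_t start, uint64_t end, int pe_number)
{
	int index, mapped = 0;

	if (segsize == 0)
		return 0;

	index = (int)(start / segsize);
	while (index < TOTAL_PE_NUM && start <= end) {
		segmap[index] = pe_number;
		if (map_window_stub(pe_number, win, index) != 0)
			break;

		start += segsize;
		index++;
		mapped++;
	}

	return mapped;
}
```

With a segment size of 0x1000, a call for the range 0x2000..0x4fff and PE 3 fills segmap[2], segmap[3] and segmap[4] and returns 3. The IO and M32 branches would then differ only in the arguments they pass.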

[patch] tty: hvc_console: silence uninitialized variable warning

2016-04-13 Thread Dan Carpenter
If ->get_chars() returns a negative error code, "ch" may be left
uninitialized.  The callers of this function expect NO_POLL_CHAR
on error, so let's return that.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index e46d628..325747a 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -814,7 +814,7 @@ static int hvc_poll_get_char(struct tty_driver *driver, int 
line)
 
n = hp->ops->get_chars(hp->vtermno, &ch, 1);
 
-   if (n == 0)
+   if (n <= 0)
return NO_POLL_CHAR;
 
return ch;
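
A small userspace sketch of the case the patch guards against, assuming a backend that may return a negative errno without writing to the buffer. The get_chars_fn type, the fake_* backends and poll_get_char() are invented for this example, and the NO_POLL_CHAR value below is only a placeholder for the sketch.

```c
#include <errno.h>

#define NO_POLL_CHAR 0x00ff0000  /* placeholder value for this sketch */

/*
 * Hypothetical backend signature: stores up to count bytes in buf and
 * returns the number read, 0 if nothing is available, or a negative
 * errno on failure.
 */
typedef int (*get_chars_fn)(char *buf, int count);

static int fake_get_chars_error(char *buf, int count)
{
	(void)buf;
	(void)count;
	return -EIO;	/* failure: buf is left untouched */
}

static int fake_get_chars_empty(char *buf, int count)
{
	(void)buf;
	(void)count;
	return 0;	/* nothing to read */
}

static int fake_get_chars_one(char *buf, int count)
{
	if (count < 1)
		return 0;
	buf[0] = 'A';
	return 1;
}

/* Mirrors the patched check: only read ch when a byte was really stored. */
static int poll_get_char(get_chars_fn get_chars)
{
	char ch;
	int n = get_chars(&ch, 1);

	if (n <= 0)
		return NO_POLL_CHAR;

	return ch;
}
```

With the old "n == 0" test, the -EIO backend would fall through to "return ch" and hand back whatever happened to be on the stack.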
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

There are two arrays for IO and M32 segment maps on every PHB.
The indexes of the arrays are segment numbers and the value stored
in the corresponding element is the PE number, indicating the segment
is assigned to that PE. Initially, all elements in those two arrays
are zero, meaning all segments are assigned to PE#0, which is wrong.

This fixes the initial values in the elements of those two arrays
to IODA_INVALID_PE, meaning all segments aren't assigned to any
PE.


This is ok.


In order to use IODA_INVALID_PE (-1) to represent invalid PE
number, the types of those two arrays are changed from "unsigned int"
to "int".


"unsigned" can carry (-1) perfectly fine, just add a type cast to 
IODA_INVALID_PE:


#define IODA_INVALID_PE    (unsigned int)(-1)

Using "signed" type for indexes which cannot be negative does not make much 
sense - instead of checking for the upper boundary, you have to check for 
"< 0" too.
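
To illustrate the point about bounds checks, a minimal sketch. The helper names are made up for this example, and the macro is parenthesised a bit more defensively than the one-liner above.

```c
#include <stdbool.h>

/* Assumption: the suggested cast form, with outer parentheses added. */
#define IODA_INVALID_PE ((unsigned int)(-1))

/*
 * Unsigned index: a single upper-bound comparison rejects both
 * out-of-range values and the all-ones sentinel.
 */
static bool pe_valid_unsigned(unsigned int pe_no, unsigned int total_pe_num)
{
	return pe_no < total_pe_num;
}

/* Signed index: the lower bound has to be checked as well. */
static bool pe_valid_signed(int pe_no, int total_pe_num)
{
	return pe_no >= 0 && pe_no < total_pe_num;
}
```

Since (unsigned int)(-1) is the largest value the type can hold, it can never be below total_pe_num, so the sentinel fails the unsigned check without a special case.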


OPAL uses unsigned type for PE (uint64_t or uint32_t or uint16_t - this is 
quite funny).


pnv_ioda_pe::pe_number is "unsigned" and this pe_number is the same thing 
as I can see in pnv_ioda_setup_dev_PE().


Some printk() print the PE number as "%x" (which implies "unsigned").


I suggest changing the pci_dn::pe_number type from "int" to "unsigned int" 
to match pnv_ioda_pe::pe_number, in a separate patch. Or do not touch types 
for now.




Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++--
  arch/powerpc/platforms/powernv/pci.h  | 4 ++--
  2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d2514f..44cc5f3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
unsigned long size, m32map_off, pemap_off, iomap_off = 0;
const __be64 *prop64;
const __be32 *prop32;
-   int len;
+   int i, len;
u64 phb_id;
void *aux;
long rc;
@@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
aux = memblock_virt_alloc(size, 0);
phb->ioda.pe_alloc = aux;
phb->ioda.m32_segmap = aux + m32map_off;
-   if (phb->type == PNV_PHB_IODA1)
+   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+   if (phb->type == PNV_PHB_IODA1) {
phb->ioda.io_segmap = aux + iomap_off;
+   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+   }
phb->ioda.pe_array = aux + pemap_off;
set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);

diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 784882a..36c4965 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,8 +146,8 @@ struct pnv_phb {
struct pnv_ioda_pe  *pe_array;

/* M32 & IO segment maps */
-   unsigned int*m32_segmap;
-   unsigned int*io_segmap;
+   int *m32_segmap;
+   int *io_segmap;

/* IRQ chip */
int irq_chip_init;




--
Alexey

Re: [PATCH v8 07/45] powerpc/powernv: Rename PE# fields in struct pnv_phb

2016-04-13 Thread Alexey Kardashevskiy

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This renames the fields related to PE number in "struct pnv_phb"
to better reflect their usage, as Alexey suggested. No
logical changes introduced.

Signed-off-by: Gavin Shan 



Reviewed-by: Alexey Kardashevskiy 



---
  arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
  arch/powerpc/platforms/powernv/pci-ioda.c| 58 ++--
  arch/powerpc/platforms/powernv/pci.c |  2 +-
  arch/powerpc/platforms/powernv/pci.h |  4 +-
  4 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 950b3e5..69e41ce 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -75,7 +75,7 @@ static int pnv_eeh_init(void)
 * and P7IOC separately. So we should regard
 * PE#0 as valid for PHB3 and P7IOC.
 */
-   if (phb->ioda.reserved_pe != 0)
+   if (phb->ioda.reserved_pe_idx != 0)
eeh_add_flag(EEH_VALID_PE_ZERO);

break;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 10ecd97..1d2514f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -124,7 +124,7 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long 
flags)

  static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
  {
-   if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
+   if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
pr_warn("%s: Invalid PE %d on PHB#%x\n",
__func__, pe_no, phb->hose->global_number);
return;
@@ -144,8 +144,8 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)

do {
pe = find_next_zero_bit(phb->ioda.pe_alloc,
-   phb->ioda.total_pe, 0);
-   if (pe >= phb->ioda.total_pe)
+   phb->ioda.total_pe_num, 0);
+   if (pe >= phb->ioda.total_pe_num)
return IODA_INVALID_PE;
} while(test_and_set_bit(pe, phb->ioda.pe_alloc));

@@ -199,13 +199,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 * expected to be 0 or last one of PE capabicity.
 */
r = &phb->hose->mem_resources[1];
-   if (phb->ioda.reserved_pe == 0)
+   if (phb->ioda.reserved_pe_idx == 0)
r->start += phb->ioda.m64_segsize;
-   else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+   else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
r->end -= phb->ioda.m64_segsize;
else
pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
-   phb->ioda.reserved_pe);
+   phb->ioda.reserved_pe_idx);

return 0;

@@ -274,7 +274,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
return IODA_INVALID_PE;

/* Allocate bitmap */
-   size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+   size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
pe_alloc = kzalloc(size, GFP_KERNEL);
if (!pe_alloc) {
pr_warn("%s: Out of memory !\n",
@@ -290,7 +290,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
 * contributed by its child buses. For the case, we needn't
 * pick M64 dependent PE#.
 */
-   if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
+   if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
kfree(pe_alloc);
return IODA_INVALID_PE;
}
@@ -301,8 +301,8 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
 */
master_pe = NULL;
i = -1;
-   while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
-   phb->ioda.total_pe) {
+   while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
+   phb->ioda.total_pe_num) {
pe = &phb->ioda.pe_array[i];

if (!master_pe) {
@@ -355,7 +355,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb 
*phb)
hose->mem_offset[1] = res->start - pci_addr;

phb->ioda.m64_size = resource_size(res);
-   phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
+   phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
phb->ioda.m64_base = pci_addr;

pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
@@ -456,7 +456,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int 
pe_no)
s64 rc;

/* Sanity check on PE number */
-   if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
+   if (pe_no <