[PATCH] powerpc: configs: Add Skiroot defconfig

2017-07-10 Thread Joel Stanley
This configuration is used by the OpenPower firmware for it's
Linux-as-bootloader implementation. Also known as the Petitboot kernel,
this configuration broke in 4.12 (CPU_HOTPLUG=n), so add it to the
upstream tree in order to get better coverage.

Signed-off-by: Joel Stanley 
---
 arch/powerpc/configs/skiroot_defconfig | 226 +
 1 file changed, 226 insertions(+)
 create mode 100644 arch/powerpc/configs/skiroot_defconfig

diff --git a/arch/powerpc/configs/skiroot_defconfig 
b/arch/powerpc/configs/skiroot_defconfig
new file mode 100644
index ..24a3ea65d9e2
--- /dev/null
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -0,0 +1,226 @@
+CONFIG_PPC64=y
+CONFIG_ALTIVEC=y
+CONFIG_VSX=y
+CONFIG_NR_CPUS=2048
+CONFIG_CPU_LITTLE_ENDIAN=y
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=20
+CONFIG_RELAY=y
+CONFIG_BLK_DEV_INITRD=y
+# CONFIG_RD_GZIP is not set
+# CONFIG_RD_BZIP2 is not set
+# CONFIG_RD_LZMA is not set
+# CONFIG_RD_LZO is not set
+# CONFIG_RD_LZ4 is not set
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_PERF_EVENTS=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_JUMP_LABEL=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODULE_SIG=y
+CONFIG_MODULE_SIG_FORCE=y
+CONFIG_MODULE_SIG_SHA512=y
+CONFIG_PARTITION_ADVANCED=y
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_PPC_PSERIES is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
+CONFIG_CPU_IDLE=y
+CONFIG_HZ_100=y
+CONFIG_KEXEC=y
+CONFIG_IRQ_ALL_CPUS=y
+CONFIG_NUMA=y
+# CONFIG_COMPACTION is not set
+# CONFIG_MIGRATION is not set
+# CONFIG_BOUNCE is not set
+CONFIG_PPC_64K_PAGES=y
+CONFIG_SCHED_SMT=y
+CONFIG_CMDLINE_BOOL=y
+CONFIG_CMDLINE="console=tty0 console=hvc0 powersave=off"
+# CONFIG_SECCOMP is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_NET_IPIP=y
+CONFIG_SYN_COOKIES=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
+# CONFIG_INET_XFRM_MODE_BEET is not set
+# CONFIG_IPV6 is not set
+CONFIG_DNS_RESOLVER=y
+# CONFIG_WIRELESS is not set
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+CONFIG_MTD=m
+CONFIG_MTD_POWERNV_FLASH=m
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=65536
+CONFIG_VIRTIO_BLK=m
+CONFIG_BLK_DEV_NVME=m
+CONFIG_EEPROM_AT24=y
+# CONFIG_CXL is not set
+CONFIG_BLK_DEV_SD=m
+CONFIG_BLK_DEV_SR=m
+CONFIG_BLK_DEV_SR_VENDOR=y
+CONFIG_CHR_DEV_SG=m
+CONFIG_SCSI_CONSTANTS=y
+CONFIG_SCSI_SCAN_ASYNC=y
+CONFIG_SCSI_FC_ATTRS=y
+CONFIG_SCSI_CXGB3_ISCSI=m
+CONFIG_SCSI_CXGB4_ISCSI=m
+CONFIG_SCSI_BNX2_ISCSI=m
+CONFIG_BE2ISCSI=m
+CONFIG_SCSI_AACRAID=m
+CONFIG_MEGARAID_NEWGEN=y
+CONFIG_MEGARAID_MM=m
+CONFIG_MEGARAID_MAILBOX=m
+CONFIG_MEGARAID_SAS=m
+CONFIG_SCSI_MPT2SAS=m
+CONFIG_SCSI_IPR=m
+# CONFIG_SCSI_IPR_TRACE is not set
+# CONFIG_SCSI_IPR_DUMP is not set
+CONFIG_SCSI_QLA_FC=m
+CONFIG_SCSI_QLA_ISCSI=m
+CONFIG_SCSI_LPFC=m
+CONFIG_SCSI_VIRTIO=m
+CONFIG_SCSI_DH=y
+CONFIG_SCSI_DH_ALUA=m
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+# CONFIG_ATA_SFF is not set
+CONFIG_MD=y
+CONFIG_BLK_DEV_MD=m
+CONFIG_MD_LINEAR=m
+CONFIG_MD_RAID0=m
+CONFIG_MD_RAID1=m
+CONFIG_MD_RAID10=m
+CONFIG_MD_RAID456=m
+CONFIG_MD_MULTIPATH=m
+CONFIG_MD_FAULTY=m
+CONFIG_BLK_DEV_DM=m
+CONFIG_DM_CRYPT=m
+CONFIG_DM_SNAPSHOT=m
+CONFIG_DM_MIRROR=m
+CONFIG_DM_ZERO=m
+CONFIG_DM_MULTIPATH=m
+CONFIG_ACENIC=m
+CONFIG_ACENIC_OMIT_TIGON_I=y
+CONFIG_TIGON3=y
+CONFIG_BNX2X=m
+CONFIG_CHELSIO_T1=y
+CONFIG_BE2NET=m
+CONFIG_S2IO=m
+CONFIG_E100=m
+CONFIG_E1000=m
+CONFIG_E1000E=m
+CONFIG_IXGB=m
+CONFIG_IXGBE=m
+CONFIG_MLX4_EN=m
+CONFIG_MLX5_CORE=m
+CONFIG_MLX5_CORE_EN=y
+CONFIG_MYRI10GE=m
+CONFIG_QLGE=m
+CONFIG_NETXEN_NIC=m
+CONFIG_SFC=m
+# CONFIG_USB_NET_DRIVERS is not set
+# CONFIG_WLAN is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_MISC=y
+# CONFIG_SERIO_SERPORT is not set
+# CONFIG_DEVMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_IPMI_HANDLER=y
+CONFIG_IPMI_DEVICE_INTERFACE=y
+CONFIG_IPMI_POWERNV=y
+CONFIG_HW_RANDOM=y
+CONFIG_RAW_DRIVER=y
+CONFIG_MAX_RAW_DEVS=1024
+CONFIG_TCG_TIS_I2C_NUVOTON=y
+# CONFIG_I2C_COMPAT is not set
+CONFIG_I2C_CHARDEV=y
+# CONFIG_I2C_HELPER_AUTO is not set
+CONFIG_DRM=y
+CONFIG_DRM_RADEON=y
+CONFIG_DRM_AST=m
+CONFIG_FIRMWARE_EDID=y
+CONFIG_FB_MODE_HELPERS=y
+CONFIG_FB_OF=y
+CONFIG_FB_MATROX=y
+CONFIG_FB_MATROX_MILLENIUM=y
+CONFIG_FB_MATROX_MYSTIQUE=y
+CONFIG_FB_MATROX_G=y
+# CONFIG_LCD_CLASS_DEVICE is not set
+# CONFIG_BACKLIGHT_GENERIC is not set
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_LOGO=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB=y
+CONFIG_USB_MON=y
+CONFIG_USB_XHCI_HCD=y
+CONFIG_USB_EHCI_HCD=y
+# CONFIG_USB_EHCI_HCD_PPC_OF is not set
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_STORAGE=y
+CONFIG_RTC_CLASS=y

Re: [PATCH 5/5] powernv:idle: Disable LOSE_FULL_CONTEXT states when stop-api fails.

2017-07-10 Thread Gautham R Shenoy
On Sat, Jul 08, 2017 at 07:05:26PM +1000, Nicholas Piggin wrote:
> On Fri, 7 Jul 2017 23:07:10 +0530
> Gautham R Shenoy  wrote:
> 
> > On Fri, Jul 07, 2017 at 01:29:16AM +1000, Nicholas Piggin wrote:
> > > On Wed,  5 Jul 2017 22:08:16 +0530
> > > "Gautham R. Shenoy"  wrote:
> > >   
> > > > From: "Gautham R. Shenoy" 
> > > > 
> > > > Currently, we use the opal call opal_slw_set_reg() to inform the that
> > > > the Sleep-Winkle Engine (SLW) to restore the contents of some of the
> > > > Hypervisor state on wakeup from deep idle states that lose full
> > > > hypervisor context (characterized by the flag
> > > > OPAL_PM_LOSE_FULL_CONTEXT).
> > > > 
> > > > However, the current code has a bug in that if opal_slw_set_reg()
> > > > fails, we don't disable the use of these deep states (winkle on
> > > > POWER8, stop4 onwards on POWER9).
> > > > 
> > > > This patch fixes this bug by ensuring that if the the sleep winkle
> > > > engine is unable to restore the hypervisor states in
> > > > pnv_save_sprs_for_deep_states(), then we mark as invalid the states
> > > > which lose full context.
> > > > 
> > > > As a side-effect, since supported_cpuidle_states in
> > > > pnv_probe_idle_states() consists of flags of only the valid states,
> > > > this patch will ensure that no other subsystem in the kernel can use
> > > > the states which lose full context on stop-api failures.  
> > > 
> > > Looks good. Is there something minimal we can do for stable here?
> > > 
> > > Aside question, do we need to restore LPCR at all with the SLW engine?
> > > It gets set up again when by the idle wakeup code.  
> > 
> > 
> > > 
> > > And does POWER9 really need MSR and PSSCR restored by SLW? (going a bit
> > > off topic here, I'm just curious)  
> > 
> > MSR is needed to be restored so that we wakeup with the right
> > endianness and with the IR,DR disabled.
> 
> And POWER8 does not require this?
> 
> > PSSCR is set to a value so that in case of a special wakeup for a
> > deep-stop, the SLW can program the core to go back to the stop level
> > provided by the PSSCR value via the stop-api.
> 
> It always restores to deepest stop? Is there any way to restore to the
> achieved stop level? Maybe there is no usefulness for that.

That would have been ideal. But there's no way to achieve that at the
moment. The alternative is to have call the stop-api with psscr set to
the desired stop level before every stop entry. This is will consume
additional cycles which is what we are trying to avoid.

So we are currently setting the psscr value to deepest stop via stop
api as a compromise, because then on wakeup, we end up restoring more
than what would typically be required, but that's still ok since we
would be erring on the side of caution.

Programming the PSSCR to any other value might have safety
concerns. Eg: Suppose a core which was in stop11 got woken up by a
special wakeup and if the psscr programmed via stop API was stop4 then
the firmware will put the core in stop4. Now, since stop4 doesn't lose
timebase and stop11 does, in the aforementioned case TB would have
gone out of sync in the duration that the core was in stop11. Thus,
when the core wakes up in the kernel in stop4, the kernel won't resync
the TB which is a problematic.

> 
> Thanks,
> Nick
> 



Re: [RFC 1/4] libnvdimm: add to_{nvdimm,nd_region}_dev()

2017-07-10 Thread Oliver
On Tue, Jul 11, 2017 at 9:53 AM, Dan Williams  wrote:
> On Tue, Jun 27, 2017 at 3:28 AM, Oliver O'Halloran  wrote:
>> struct device contains the ->of_node pointer so that devices can be
>> assoicated with the device-tree node that created them on DT platforms.
>> libnvdimm hides the struct device for regions and nvdimm devices inside
>> of an opaque structure so this patch adds accessors for each to allow
>> the of_nvdimm driver to set the of_node pointer.
>
> I'd rather go the other way and pass in the of_node to the bus and
> dimm registration routines. It's a generic property of the device so
> we should handle it like other generic device properties that get set
> at initialization time like 'attr_groups' in nvdimm_bus_descriptor, or
> a new parameter to nvdimm_create().

Sure. I just figured it would be preferable to keep firmware specific
details inside the firmware driver rather than adding #ifdef CONFIG_OF
around the place. Do you have any objections to making nvdimm_create()
take a descriptor structure rather than adding a parameter?


Re: [PATCH v2 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()

2017-07-10 Thread kbuild test robot
Hi Cyril,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.12 next-20170710]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Cyril-Bur/Allow-opal-async-waiters-to-get-interrupted/20170710-095504
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

>> ERROR: ".opal_error_code" [drivers/mtd/devices/powernv_flash.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()

2017-07-10 Thread Oliver
On Mon, Jul 10, 2017 at 9:50 PM, Michael Ellerman  wrote:
> Oliver O'Halloran  writes:
>
>> The workaround for the CELL timebase bug does not correctly mark cr0 as
>> being clobbered. This can result in GCC making some poor^W completely
>> broken optimisations.
>
> Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle
> breakage that no one could pin down :E

Dumb luck probably.  The workaround is inside a feature section which
depends on CPU_FTR_CELL_TB_BUG so you would only ever see a problem
when actually running on Cell (can't be that many people...). CR0
being volatile across function calls probably helped mask it too.

> I'll tag it for stable.
Good idea.

> Your change log is not entirely fair, it's not GCC's fault, we changed
> register state without telling it, so it's on us :)  I'll reword it a
> bit here.

Yeah that was probably a bit harsh.

>
> cheers
>
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index 7e50e47375d6..a3b6575c7842 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -1303,7 +1303,7 @@ static inline void msr_check_and_clear(unsigned long 
>> bits)
>>   "   .llong 0\n" \
>>   ".previous" \
>>   : "=r" (rval) \
>> - : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL)); \
>> + : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL) : "cr0"); 
>> \
>>   rval;})
>>  #else
>>  #define mftb()   ({unsigned long rval;   \
>> --
>> 2.9.4


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao



On 7/11/2017 10:28 AM, Michael Ellerman wrote:

"Jin, Yao"  writes:


On 7/10/2017 9:46 PM, Peter Zijlstra wrote:

On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:


PERF_BR_INT is triggered by instruction "int" .
PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
transition).

So your "PERF_BR_INT" is a system call?

The "INT" thing has indeed been used as system call mechanism (typically
INT 80). But these days we have special purpose syscall instructions.

It could maybe be compared to the PPC "Unconditional TRAP with
immediate" where you use the immediate value as an index into a handler
vector.


And PERF_BR_IRQ is not an interrupt request (as its name suggests),
not what we call an "external interrupt" either; instead it is every
interrupt that is not a system call?

It is actual interrupts, but also faults, traps and all the other
exceptions not caused by "INT" I think.


Yes. It's interrupt, traps, faults. If from is in the user space and to
is in the kernel, it indicates the ring3 -> ring0 transition.

If the from instruction is not syscall or other ring transition
instruction, it should be interrupt, traps and faults. That's how we get
the PERF_BR_IRQ on x86.

Anyway, maybe we just use a minimum but the most common set of branch
types now, it could be a good start and acceptable on all architectures.

PERF_BR_COND= 1,/* conditional */
PERF_BR_UNCOND= 2,/* unconditional */
PERF_BR_IND= 3,/* indirect */
PERF_BR_CALL= 4,/* call */
PERF_BR_IND_CALL= 5,/* indirect call */
PERF_BR_RET= 6,/* return */

That would be fine by me, if you're sick of talking about it and just
want to get it merged :)

:)


I think you could expand it a bit, this list would cover the vast bulk
of branch types for us:

   PERF_BR_COND /* Conditional */
   PERF_BR_UNCOND   /* Unconditional */
   PERF_BR_IND  /* Indirect */
   PERF_BR_CALL /* Function call */
   PERF_BR_IND_CALL /* Indirect function call */
   PERF_BR_RET  /* Function return */
   PERF_BR_SYSCALL  /* Syscall */
   PERF_BR_SYSRET   /* Syscall return */
   PERF_BR_COND_CALL/* Conditional function call */
   PERF_BR_COND_RET /* Conditional function return */

cheers


OK, accept! Use 4 bits for above branch types and we can reserve 5 for 
potential future types.


Thanks
Jin Yao



Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()

2017-07-10 Thread Michael Ellerman
Segher Boessenkool  writes:

> Hi!
>
> On Mon, Jul 10, 2017 at 09:50:06PM +1000, Michael Ellerman wrote:
>> Oliver O'Halloran  writes:
>> 
>> > The workaround for the CELL timebase bug does not correctly mark cr0 as
>> > being clobbered. This can result in GCC making some poor^W completely
>> > broken optimisations.
>> 
>> Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle
>> breakage that no one could pin down :E
>
> GCC does not use cr0 before it has used cr7, cr5, cr6, cr1 (unless some
> instruction _requires_ cr0, like record form ("dot") instructions, which
> until recently were disabled when targetting Cell), so it isn't too easy
> to hit the problem here.  Maybe Oliver used a very new GCC?  Or he was
> unlucky.

Aha. I think he actually hit the bug elsewhere and then noticed it here
also? And yeah I think it was a dot instruction that triggered it.

>> I'll tag it for stable.
>> 
>> Your change log is not entirely fair, it's not GCC's fault, we changed
>> register state without telling it, so it's on us :)  I'll reword it a
>> bit here.
>
> Yep, GCC works fine here, GIGO etc.  It isn't feasible to make GCC warn
> for most inline asm problems, either (we do not parse the mnemonics in
> the asm; this is a fundamental part of the design; you must express all
> constraints as, well, constraints).

Ack. It's a terrible terrible interface, but that's the price we pay for
wanting to insert arbitrary bits of asm inside an otherwise compiled
language :)

When I get the time I might go through our inline asm and convert all
the cmpwi x,z to cmpwi cr0,x,z, so at least the cr0 usage is a little
more obvious.

cheers


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
"Jin, Yao"  writes:

> On 7/10/2017 9:46 PM, Peter Zijlstra wrote:
>> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
>>
 PERF_BR_INT is triggered by instruction "int" .
 PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
 transition).
>>> So your "PERF_BR_INT" is a system call?
>> The "INT" thing has indeed been used as system call mechanism (typically
>> INT 80). But these days we have special purpose syscall instructions.
>>
>> It could maybe be compared to the PPC "Unconditional TRAP with
>> immediate" where you use the immediate value as an index into a handler
>> vector.
>>
>>> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
>>> not what we call an "external interrupt" either; instead it is every
>>> interrupt that is not a system call?
>> It is actual interrupts, but also faults, traps and all the other
>> exceptions not caused by "INT" I think.
>>
> Yes. It's interrupt, traps, faults. If from is in the user space and to 
> is in the kernel, it indicates the ring3 -> ring0 transition.
>
> If the from instruction is not syscall or other ring transition 
> instruction, it should be interrupt, traps and faults. That's how we get 
> the PERF_BR_IRQ on x86.
>
> Anyway, maybe we just use a minimum but the most common set of branch 
> types now, it could be a good start and acceptable on all architectures.
>
> PERF_BR_COND= 1,/* conditional */
> PERF_BR_UNCOND= 2,/* unconditional */
> PERF_BR_IND= 3,/* indirect */
> PERF_BR_CALL= 4,/* call */
> PERF_BR_IND_CALL= 5,/* indirect call */
> PERF_BR_RET= 6,/* return */

That would be fine by me, if you're sick of talking about it and just
want to get it merged :)

I think you could expand it a bit, this list would cover the vast bulk
of branch types for us:

  PERF_BR_COND  /* Conditional */
  PERF_BR_UNCOND/* Unconditional */
  PERF_BR_IND   /* Indirect */
  PERF_BR_CALL  /* Function call */
  PERF_BR_IND_CALL  /* Indirect function call */
  PERF_BR_RET   /* Function return */
  PERF_BR_SYSCALL   /* Syscall */
  PERF_BR_SYSRET/* Syscall return */
  PERF_BR_COND_CALL /* Conditional function call */
  PERF_BR_COND_RET  /* Conditional function return */

cheers


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
Segher Boessenkool  writes:
> Hi!
>
> On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
>> 1. We all agree these definitions:
>> 
>> +PERF_BR_COND= 1,/* conditional */
>> +PERF_BR_UNCOND  = 2,/* unconditional */
>> +PERF_BR_IND = 3,/* indirect */
>> +PERF_BR_CALL= 4,/* call */
>> +PERF_BR_IND_CALL= 5,/* indirect call */
>> +PERF_BR_RET = 6,/* return */
>> +PERF_BR_SYSCALL = 7,/* syscall */
>> +PERF_BR_SYSRET  = 8,/* syscall return */
>> +PERF_BR_IRET= 11,   /* return from interrupt */
>
> Do we?  It does not map very well to PowerPC branch types.

I think they map well enough to the types of branches that are actually
used in practice.

To represent the full range of possibilities we'd need to switch to a
bitmap of flags, ie. COND, IND, CALL, RET, SYSCALL, INT, etc. But it
would need more than 4 bits and I don't think there's that much added
value in being able to represent all the bizarre combinations.

But maybe that is the best option as it makes the API more flexible and
means we don't have to get the list of branches correct up front?


I ran some quick numbers on a kernel I had here (powernv w/gcc 7):

  Type  Percent
  -
  cond  40.92%  beq (79166) bne (57379) ble (10411) bgt (9587) blt 
(6248) bge (3704) bdnz (1251) bdz (353) bns (30) bdnzf (2) bdnzt (1)
  uncond14.89%  b (61182) 
  indirect  0.10%   bctr (418)
  call  33.33%  bl (136926)
  ind call  1.44%   bctrl (5912)
  return9.23%   blr (37943)
=   99.91%

If we add cond call/return that covers another 0.08% taking us to 99.99%
of branches.

I know future compilers and or different code might use a different
distribution, but I doubt it will change all that much.

Maybe cond could be broken down further, but the only really meaningful
sub category I can think of is the decrementing type, and those are
quite rare.

cheers


Re: [PATCH] tty: Fix TIOCGPTPEER ioctl definition

2017-07-10 Thread Aleksa Sarai

This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
because it doesn't copy anything from/to userspace to access the
argument.


  Acked-by: Aleksa Sarai 

Oops, I misunderstood what _IOR means semantically. TIL -- thanks!

--
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/


[PATCH] tty: Fix TIOCGPTPEER ioctl definition

2017-07-10 Thread Gleb Fotengauer-Malinovskiy
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
because it doesn't copy anything from/to userspace to access the
argument.

Fixes: 54ebbfb1 ("tty: add TIOCGPTPEER ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy 
---
 arch/alpha/include/uapi/asm/ioctls.h   | 2 +-
 arch/mips/include/uapi/asm/ioctls.h| 2 +-
 arch/parisc/include/uapi/asm/ioctls.h  | 2 +-
 arch/powerpc/include/uapi/asm/ioctls.h | 2 +-
 arch/sh/include/uapi/asm/ioctls.h  | 2 +-
 arch/sparc/include/uapi/asm/ioctls.h   | 2 +-
 arch/xtensa/include/uapi/asm/ioctls.h  | 2 +-
 include/uapi/asm-generic/ioctls.h  | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/ioctls.h 
b/arch/alpha/include/uapi/asm/ioctls.h
index ff67b837..1cd7dc7 100644
--- a/arch/alpha/include/uapi/asm/ioctls.h
+++ b/arch/alpha/include/uapi/asm/ioctls.h
@@ -100,7 +100,7 @@
 #define TIOCGPKT   _IOR('T', 0x38, int) /* Get packet mode state */
 #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
 #define TIOCGEXCL  _IOR('T', 0x40, int) /* Get exclusive mode state */
-#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */
+#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */
 
 #define TIOCSERCONFIG  0x5453
 #define TIOCSERGWILD   0x5454
diff --git a/arch/mips/include/uapi/asm/ioctls.h 
b/arch/mips/include/uapi/asm/ioctls.h
index 68e19b6..1609cb0 100644
--- a/arch/mips/include/uapi/asm/ioctls.h
+++ b/arch/mips/include/uapi/asm/ioctls.h
@@ -91,7 +91,7 @@
 #define TIOCGPKT   _IOR('T', 0x38, int) /* Get packet mode state */
 #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
 #define TIOCGEXCL  _IOR('T', 0x40, int) /* Get exclusive mode state */
-#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */
+#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */
 
 /* I hope the range from 0x5480 on is free ... */
 #define TIOCSCTTY  0x5480  /* become controlling tty */
diff --git a/arch/parisc/include/uapi/asm/ioctls.h 
b/arch/parisc/include/uapi/asm/ioctls.h
index 674c68a..d0e3321 100644
--- a/arch/parisc/include/uapi/asm/ioctls.h
+++ b/arch/parisc/include/uapi/asm/ioctls.h
@@ -60,7 +60,7 @@
 #define TIOCGPKT   _IOR('T', 0x38, int) /* Get packet mode state */
 #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
 #define TIOCGEXCL  _IOR('T', 0x40, int) /* Get exclusive mode state */
-#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */
+#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */
 
 #define FIONCLEX   0x5450  /* these numbers need to be adjusted. */
 #define FIOCLEX0x5451
diff --git a/arch/powerpc/include/uapi/asm/ioctls.h 
b/arch/powerpc/include/uapi/asm/ioctls.h
index bfd609a..e3b1046 100644
--- a/arch/powerpc/include/uapi/asm/ioctls.h
+++ b/arch/powerpc/include/uapi/asm/ioctls.h
@@ -100,7 +100,7 @@
 #define TIOCGPKT   _IOR('T', 0x38, int) /* Get packet mode state */
 #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
 #define TIOCGEXCL  _IOR('T', 0x40, int) /* Get exclusive mode state */
-#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */
+#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */
 
 #define TIOCSERCONFIG  0x5453
 #define TIOCSERGWILD   0x5454
diff --git a/arch/sh/include/uapi/asm/ioctls.h 
b/arch/sh/include/uapi/asm/ioctls.h
index eec7901..787bac9 100644
--- a/arch/sh/include/uapi/asm/ioctls.h
+++ b/arch/sh/include/uapi/asm/ioctls.h
@@ -93,7 +93,7 @@
 #define TIOCGPKT   _IOR('T', 0x38, int) /* Get packet mode state */
 #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
 #define TIOCGEXCL  _IOR('T', 0x40, int) /* Get exclusive mode state */
-#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */
+#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */
 
 #define TIOCSERCONFIG  _IO('T', 83) /* 0x5453 */
 #define TIOCSERGWILD   _IOR('T', 84,  int) /* 0x5454 */
diff --git a/arch/sparc/include/uapi/asm/ioctls.h 
b/arch/sparc/include/uapi/asm/ioctls.h
index 6d27398..f5df72b 100644
--- a/arch/sparc/include/uapi/asm/ioctls.h
+++ b/arch/sparc/include/uapi/asm/ioctls.h
@@ -88,7 +88,7 @@
 #define TIOCGPTN   _IOR('t', 134, unsigned int) /* Get Pty Number */
 #define TIOCSPTLCK _IOW('t', 135, int) /* Lock/unlock PTY */
 #define TIOCSIG_IOW('t', 136, int) /* Generate signal on Pty 
slave */
-#define TIOCGPTPEER_IOR('t', 137, int) /* Safely open the slave */
+#define TIOCGPTPEER_IO('t', 137) /* Safely open the slave */
 
 /* Little f */
 #define FIOCLEX_IO('f', 1)
diff --git a/arch/xtensa/include/uapi/asm/ioctls.h 
b/arch/xtensa/include/uapi/asm/ioctls.h
index 98b004e..47d82c0 100644
--- a/arch/xtensa/include/uapi/asm/ioctls.h
+++ b/arch/xtensa/include/uapi/asm/ioctls.h
@@ -105,7 +105,7 @@
 #define TIOCGPKT   _IOR('T', 0x38, 

Re: [PATCH v2 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async

2017-07-10 Thread Cyril Bur
On Mon, 2017-07-10 at 14:05 +, David Laight wrote:
> From: Cyril Bur
> > Sent: 10 July 2017 02:31
> > This patch adds an _interruptible version of opal_async_wait_response().
> > This is useful when a long running OPAL call is performed on behalf of a
> > userspace thread, for example, the opal_flash_{read,write,erase}
> > functions performed by the powernv-flash MTD driver.
> > 
> > It is foreseeable that these functions would take upwards of two minutes
> > causing the wait_event() to block long enough to cause hung task
> > warnings. Furthermore, wait_event_interruptible() is preferable as
> > otherwise there is no way for signals to stop the process which is going
> > to be confusing in userspace.
> 
> ISTM that if you are doing (something like) a flash full device erase
> (that really can take minutes) it isn't actually an interruptible
> operation - the flash chip will still be busy.
> So allowing the user process be interrupted just leaves a big mess.
> 

Agreed.

> OTOH the 'hung task' warning isn't the only problem with uninterruptible
> sleeps - the processes also count towards the 'load average'.
> Some software believes the 'load average' is a meaningful value.
> 

Yes, and because the read write and erase ops are actually calls into
firmware which in some cases completely emulates flash we can mitigate
the mess of allowing the process to be interrupted that you mention
above.

> It would be more generally useful for tasks to be able to sleep
> uninterruptibly without counting towards the 'load average' or triggering
> the 'task stuck' warning.
> (I've code somewhere that sleeps interruptibly unless there is a signal
> pending when it sleeps uninterruptibly.)
> 

I'm not sure what you mean here, if I understand correctly, this is
what I'm doing. In the patch to the powernv_flash driver which uses
opal_async_wait_response_interruptible() I essentially do the
interruptible and if it breaks early the driver determines if it is
safe to return to userspace otherwise it sleeps uninterruptibly
(hopefully not for long). I was hesitant to put that logic here and
prefered to leave it up to the caller to make the decision as to what
to do.

> WRT flash erases, 'whole device' erases aren't significantly quicker
> than sector by sector erases.

Yes, I'm rewriting some of the userspace tools we have to do everything
sector by sector. MTD interfaces allow big reads/writes and erases so
we should still handle it as best we can.

> The latter can be interrupted between sectors.
> I'm not sure you'd want to do writes than lock down enough kernel
> memory to take even a second to complete.
> 

The MTD core actually chunks them up before passing them to the to
backing driver so I don't think holding too much memory is a problem.
Because the time spent in OPAL servicing flash calls is hard to
determine and might not even be that related to the size of the
operation, even smaller ops might still spend 'time' in OPAL.

Cyril

>   David
> 


Re: [RFC 3/4] powerpc: Add pmem API support

2017-07-10 Thread Dan Williams
On Tue, Jun 27, 2017 at 3:28 AM, Oliver O'Halloran  wrote:
> Adds powerpc64 implementations of:
>
> memcpy_flushcache()
> arch_wb_cache_pmem()
> arch_invalidate_pmem()
>
> Which form the architecture-specific portition of the persistent memory
> API. These functions provide cache-management primitives for the DAX
> drivers and libNVDIMM.
>

Ok, now that we have commit 0aed55af8834 "x86, uaccess: introduce
copy_from_iter_flushcache for pmem / cache-bypass operations" upstream
that changes the model for how an architecture advertises
pmem-specific cache management routines. CONFIG_ARCH_HAS_PMEM_API
causes the pmem driver to try to link to these helpers rather than
dummy fallbacks:

arch_wb_cache_pmem
arch_invalidate_pmem

and CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE indicates that the arch has
all the needed "flushcache" apis:

__copy_from_user_flushcache
memcpy_page_flushcache
memcpy_flushcache


Re: [RFC 1/4] libnvdimm: add to_{nvdimm,nd_region}_dev()

2017-07-10 Thread Dan Williams
On Tue, Jun 27, 2017 at 3:28 AM, Oliver O'Halloran  wrote:
> struct device contains the ->of_node pointer so that devices can be
> assoicated with the device-tree node that created them on DT platforms.
> libnvdimm hides the struct device for regions and nvdimm devices inside
> of an opaque structure so this patch adds accessors for each to allow
> the of_nvdimm driver to set the of_node pointer.

I'd rather go the other way and pass in the of_node to the bus and
dimm registration routines. It's a generic property of the device so
we should handle it like other generic device properties that get set
at initialization time like 'attr_groups' in nvdimm_bus_descriptor, or
a new parameter to nvdimm_create().


Re: Today's linux-next build fail on powerpc

2017-07-10 Thread Abdul Haleem
On Mon, 2017-07-10 at 18:17 +0300, Andy Shevchenko wrote:
> On Mon, Jul 10, 2017 at 5:37 PM, Abdul Haleem
>  wrote:
> > On Fri, 2017-07-07 at 19:36 +0300, Andy Shevchenko wrote:
> >> On Thu, Jul 6, 2017 at 9:00 AM, Abdul Haleem
> >>  wrote:
> >> > next-20170705 fails to build on powerpc with below errors.
> >> I had sent a fix yesterday. Had you chance to test it?
> 
> > drivers/i2c/busses/i2c-designware-platdrv.c:331:6: error: implicit
> > declaration of function
> > ‘i2c_detect_slave_mode’ [-Werror=implicit-function-declaration]
> >   if (i2c_detect_slave_mode(>dev))
> >   ^
> >   CC [M]  fs/jffs2/scan.o
> >
> > Please point me to the correct patch you are referring to ?
> 
> commit 8f1a357d41a22009150cf404b5aa5876efdb59b1
> 
>i2c: Provide a stub for i2c_detect_slave_mode()
> 

Thanks Andy for the fix, the patch fixes the build errors.

Machine booted with no issues.

Reported-and-tested-by: Abdul Haleem 

-- 
Regard's

Abdul Haleem
IBM Linux Technology Centre





Re: [RFC v5 00/38] powerpc: Memory Protection Keys

2017-07-10 Thread Ram Pai
On Sun, Jul 09, 2017 at 11:05:44PM -0700, Ram Pai wrote:
> On Mon, Jul 10, 2017 at 11:13:23AM +0530, Anshuman Khandual wrote:
> > On 07/06/2017 02:51 AM, Ram Pai wrote:
.
> 
> > do you have data points to show the difference in
> > performance between this version and the last one where
> > we skipped the bits from PTE and directly programmed the
> > HPTE entries looking into VMA bits.
> 
> No. I dont. I am hoping you can help me out with this.
Anshuman,
The last version where we skipped the PTE bits is guaranteed
to be bad/horrible. For one it has a bug, since it accesses
the vma without a lock. And even if we did take a lock, it
will slow down the page-hash path un-acceptably. So there is
no point measuring the performance of that design.

I think the number we want to measure is -- the performance 
with the current design and comparing that to the performance
without memkey feature. We want to find if there is
any degradation by adding this feature.

RP



Re: [PATCH] ptrace: Add compat PTRACE_{G,S}ETSIGMASK handlers

2017-07-10 Thread Oleg Nesterov
On 06/29, James Morse wrote:
>
> compat_ptrace_request() lacks handlers for PTRACE_{G,S}ETSIGMASK,
> instead using those in ptrace_request(). The compat variant should
> read a compat_sigset_t from userspace instead of ptrace_request()s
> sigset_t.

Acked-by: Oleg Nesterov 



Re: Today's linux-next build fail on powerpc

2017-07-10 Thread Andy Shevchenko
On Mon, Jul 10, 2017 at 5:37 PM, Abdul Haleem
 wrote:
> On Fri, 2017-07-07 at 19:36 +0300, Andy Shevchenko wrote:
>> On Thu, Jul 6, 2017 at 9:00 AM, Abdul Haleem
>>  wrote:
>> > next-20170705 fails to build on powerpc with below errors.
>> I had sent a fix yesterday. Had you chance to test it?

> drivers/i2c/busses/i2c-designware-platdrv.c:331:6: error: implicit
> declaration of function
> ‘i2c_detect_slave_mode’ [-Werror=implicit-function-declaration]
>   if (i2c_detect_slave_mode(>dev))
>   ^
>   CC [M]  fs/jffs2/scan.o
>
> Please point me to the correct patch you are referring to ?

commit 8f1a357d41a22009150cf404b5aa5876efdb59b1

   i2c: Provide a stub for i2c_detect_slave_mode()

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Segher Boessenkool
Hi Peter,

On Mon, Jul 10, 2017 at 03:46:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
> 
> > > PERF_BR_INT is triggered by instruction "int" .
> > > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> > > transition).
> > 
> > So your "PERF_BR_INT" is a system call? 
> 
> The "INT" thing has indeed been used as system call mechanism (typically
> INT 80). But these days we have special purpose syscall instructions.
> 
> It could maybe be compared to the PPC "Unconditional TRAP with
> immediate" where you use the immediate value as an index into a handler
> vector.

If we would do that, yes :-)  (We just generate a SIGTRAP instead).

> > And PERF_BR_IRQ is not an interrupt request (as its name suggests),
> > not what we call an "external interrupt" either; instead it is every
> > interrupt that is not a system call?
> 
> It is actual interrupts, but also faults, traps and all the other
> exceptions not caused by "INT" I think.

Ah, right, exceptions == interrupts for PowerPC, more terminological
confusion :-)


Segher


Re: Today's linux-next build fail on powerpc

2017-07-10 Thread Abdul Haleem
On Fri, 2017-07-07 at 19:36 +0300, Andy Shevchenko wrote:
> On Thu, Jul 6, 2017 at 9:00 AM, Abdul Haleem
>  wrote:
> > Hi Luis,
> >
> > next-20170705 fails to build on powerpc with below errors.
> 
> Hi,
> 
> I had sent a fix yesterday. Had you chance to test it?
> 
> 

Are these the one you are referring to ?

[v1,2/2] ACPI / boot: Don't define unused variables 
[v1,1/2] ACPI / boot: Correct address space of __acpi_map_table()

Applying above patches did not fix the build failure.

drivers/i2c/busses/i2c-designware-platdrv.c:331:6: error: implicit
declaration of function
‘i2c_detect_slave_mode’ [-Werror=implicit-function-declaration]
  if (i2c_detect_slave_mode(>dev))
  ^
  CC [M]  fs/jffs2/scan.o

Please point me to the correct patch you are referring to ?

-- 
Regard's

Abdul Haleem
IBM Linux Technology Centre





Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao



On 7/10/2017 9:46 PM, Peter Zijlstra wrote:

On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:


PERF_BR_INT is triggered by instruction "int" .
PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
transition).

So your "PERF_BR_INT" is a system call?

The "INT" thing has indeed been used as system call mechanism (typically
INT 80). But these days we have special purpose syscall instructions.

It could maybe be compared to the PPC "Unconditional TRAP with
immediate" where you use the immediate value as an index into a handler
vector.


And PERF_BR_IRQ is not an interrupt request (as its name suggests),
not what we call an "external interrupt" either; instead it is every
interrupt that is not a system call?

It is actual interrupts, but also faults, traps and all the other
exceptions not caused by "INT" I think.

Yes. It's interrupt, traps, faults. If from is in the user space and to 
is in the kernel, it indicates the ring3 -> ring0 transition.


If the from instruction is not syscall or other ring transition 
instruction, it should be interrupt, traps and faults. That's how we get 
the PERF_BR_IRQ on x86.


Anyway, maybe we just use a minimum but the most common set of branch 
types now, it could be a good start and acceptable on all architectures.


PERF_BR_COND= 1,/* conditional */
PERF_BR_UNCOND= 2,/* unconditional */
PERF_BR_IND= 3,/* indirect */
PERF_BR_CALL= 4,/* call */
PERF_BR_IND_CALL= 5,/* indirect call */
PERF_BR_RET= 6,/* return */

Thanks
Jin Yao



RE: [PATCH v2 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async

2017-07-10 Thread David Laight
From: Cyril Bur
> Sent: 10 July 2017 02:31
> This patch adds an _interruptible version of opal_async_wait_response().
> This is useful when a long running OPAL call is performed on behalf of a
> userspace thread, for example, the opal_flash_{read,write,erase}
> functions performed by the powernv-flash MTD driver.
> 
> It is foreseeable that these functions would take upwards of two minutes
> causing the wait_event() to block long enough to cause hung task
> warnings. Furthermore, wait_event_interruptible() is preferable as
> otherwise there is no way for signals to stop the process which is going
> to be confusing in userspace.

ISTM that if you are doing (something like) a flash full device erase
(that really can take minutes) it isn't actually an interruptible
operation - the flash chip will still be busy.
So allowing the user process be interrupted just leaves a big mess.

OTOH the 'hung task' warning isn't the only problem with uninterruptible
sleeps - the processes also count towards the 'load average'.
Some software believes the 'load average' is a meaningful value.

It would be more generally useful for tasks to be able to sleep
uninterruptibly without counting towards the 'load average' or triggering
the 'task stuck' warning.
(I've code somewhere that sleeps interruptibly unless there is a signal
pending when it sleeps uninterruptibly.)

WRT flash erases, 'whole device' erases aren't significantly quicker
than sector by sector erases.
The latter can be interrupted between sectors.
I'm not sure you'd want to do writes than lock down enough kernel
memory to take even a second to complete.

David



Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Peter Zijlstra
On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:

> > PERF_BR_INT is triggered by instruction "int" .
> > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> > transition).
> 
> So your "PERF_BR_INT" is a system call? 

The "INT" thing has indeed been used as system call mechanism (typically
INT 80). But these days we have special purpose syscall instructions.

It could maybe be compared to the PPC "Unconditional TRAP with
immediate" where you use the immediate value as an index into a handler
vector.

> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
> not what we call an "external interrupt" either; instead it is every
> interrupt that is not a system call?

It is actual interrupts, but also faults, traps and all the other
exceptions not caused by "INT" I think.



Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()

2017-07-10 Thread Segher Boessenkool
Hi!

On Mon, Jul 10, 2017 at 09:50:06PM +1000, Michael Ellerman wrote:
> Oliver O'Halloran  writes:
> 
> > The workaround for the CELL timebase bug does not correctly mark cr0 as
> > being clobbered. This can result in GCC making some poor^W completely
> > broken optimisations.
> 
> Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle
> breakage that no one could pin down :E

GCC does not use cr0 before it has used cr7, cr5, cr6, cr1 (unless some
instruction _requires_ cr0, like record form ("dot") instructions, which
until recently were disabled when targetting Cell), so it isn't too easy
to hit the problem here.  Maybe Oliver used a very new GCC?  Or he was
unlucky.

> I'll tag it for stable.
> 
> Your change log is not entirely fair, it's not GCC's fault, we changed
> register state without telling it, so it's on us :)  I'll reword it a
> bit here.

Yep, GCC works fine here, GIGO etc.  It isn't feasible to make GCC warn
for most inline asm problems, either (we do not parse the mnemonics in
the asm; this is a fundamental part of the design; you must express all
constraints as, well, constraints).


Segher


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao

Hi,

Following branch types should be common enough, right?

+   PERF_BR_COND= 1,/* conditional */
+   PERF_BR_UNCOND  = 2,/* unconditional */
+   PERF_BR_IND = 3,/* indirect */
+   PERF_BR_CALL= 4,/* call */
+   PERF_BR_IND_CALL= 5,/* indirect call */
+   PERF_BR_RET = 6,/* return */

I decide to only define these types in this patch set. For other more 
arch-related branch type, we can add it in future.


Is this OK?

Thanks
Jin Yao

On 7/10/2017 9:10 PM, Segher Boessenkool wrote:

Hi!

On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:

1. We all agree these definitions:

+   PERF_BR_COND= 1,/* conditional */
+   PERF_BR_UNCOND  = 2,/* unconditional */
+   PERF_BR_IND = 3,/* indirect */
+   PERF_BR_CALL= 4,/* call */
+   PERF_BR_IND_CALL= 5,/* indirect call */
+   PERF_BR_RET = 6,/* return */
+   PERF_BR_SYSCALL = 7,/* syscall */
+   PERF_BR_SYSRET  = 8,/* syscall return */
+   PERF_BR_IRET= 11,   /* return from interrupt */

Do we?  It does not map very well to PowerPC branch types.


2. I wish to keep following definitions for x86.

+   PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */
+   PERF_BR_INT = 10,   /* sw interrupt */

PERF_BR_INT is triggered by instruction "int" .
PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
transition).

So your "PERF_BR_INT" is a system call?  And PERF_BR_IRQ is not an
interrupt request (as its name suggests), not what we call an "external
interrupt" either; instead it is every interrupt that is not a system
call?

It also does not follow the lines of "software caused interrupt" vs.
the rest.


4. I'd like to add following types for powerpc.

PERF_BR_COND_CALL   /* Conditional call */
PERF_BR_COND_RET/* Condition return */

Almost all PowerPC branches have a "conditional" version (only "syscall"
and "sysret/iret" do not -- and those last two are the same, just like
PERF_BR_INT seems to be the same as PERF_BR_SYSCALL).

So how should those PERF_BR_* be used?  It cannot be used in an
architecture-neutral interface the way you define it now.


Segher




Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Segher Boessenkool
Hi!

On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
> 1. We all agree these definitions:
> 
> + PERF_BR_COND= 1,/* conditional */
> + PERF_BR_UNCOND  = 2,/* unconditional */
> + PERF_BR_IND = 3,/* indirect */
> + PERF_BR_CALL= 4,/* call */
> + PERF_BR_IND_CALL= 5,/* indirect call */
> + PERF_BR_RET = 6,/* return */
> + PERF_BR_SYSCALL = 7,/* syscall */
> + PERF_BR_SYSRET  = 8,/* syscall return */
> + PERF_BR_IRET= 11,   /* return from interrupt */

Do we?  It does not map very well to PowerPC branch types.

> 2. I wish to keep following definitions for x86.
> 
> + PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */
> + PERF_BR_INT = 10,   /* sw interrupt */
> 
> PERF_BR_INT is triggered by instruction "int" .
> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> transition).

So your "PERF_BR_INT" is a system call?  And PERF_BR_IRQ is not an
interrupt request (as its name suggests), not what we call an "external
interrupt" either; instead it is every interrupt that is not a system
call?

It also does not follow the lines of "software caused interrupt" vs.
the rest.

> 4. I'd like to add following types for powerpc.
> 
>   PERF_BR_COND_CALL   /* Conditional call */
>   PERF_BR_COND_RET/* Condition return */

Almost all PowerPC branches have a "conditional" version (only "syscall"
and "sysret/iret" do not -- and those last two are the same, just like
PERF_BR_INT seems to be the same as PERF_BR_SYSCALL).

So how should those PERF_BR_* be used?  It cannot be used in an
architecture-neutral interface the way you define it now.


Segher


Re: [PATCH] POWER9 PMU interrupt after idle workaround

2017-07-10 Thread Madhavan Srinivasan



On Monday 10 July 2017 11:49 AM, Nicholas Piggin wrote:

POWER9 DD2 can see spurious PMU interrupts after state-loss idle in
some conditions.

A solution is to save and reload MMCR0 over state-loss idle.


Acked-by: Madhavan Srinivasan 


Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/kernel/idle_book3s.S | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 5adb390e773b..516ebef905c0 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -30,6 +30,7 @@
   * Use unused space in the interrupt stack to save and restore
   * registers for winkle support.
   */
+#define _MMCR0 GPR0
  #define _SDR1 GPR3
  #define _PTCR GPR3
  #define _RPR  GPR4
@@ -272,6 +273,14 @@ power_enter_stop:
b   pnv_wakeup_noloss

  .Lhandle_esl_ec_set:
+   /*
+* POWER9 DD2 can incorrectly set PMAO when waking up after a
+* state-loss idle. Saving and restoring MMCR0 over idle is a
+* workaround.
+*/
+   mfspr   r4,SPRN_MMCR0
+   std r4,_MMCR0(r1)
+
  /*
   * Check if the requested state is a deep idle state.
   */
@@ -450,10 +459,14 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
  pnv_restore_hyp_resource_arch300:
/*
 * Workaround for POWER9, if we lost resources, the ERAT
-* might have been mixed up and needs flushing.
+* might have been mixed up and needs flushing. We also need
+* to reload MMCR0 (see comment above).
 */
blt cr3,1f
PPC_INVALIDATE_ERAT
+   ld  r1,PACAR1(r13)
+   ld  r4,_MMCR0(r1)
+   mtspr   SPRN_MMCR0,r4
  1:
/*
 * POWER ISA 3. Use PSSCR to determine if we




Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()

2017-07-10 Thread Michael Ellerman
Oliver O'Halloran  writes:

> The workaround for the CELL timebase bug does not correctly mark cr0 as
> being clobbered. This can result in GCC making some poor^W completely
> broken optimisations.

Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle
breakage that no one could pin down :E

I'll tag it for stable.

Your change log is not entirely fair, it's not GCC's fault, we changed
register state without telling it, so it's on us :)  I'll reword it a
bit here.

cheers

> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 7e50e47375d6..a3b6575c7842 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1303,7 +1303,7 @@ static inline void msr_check_and_clear(unsigned long 
> bits)
>   "   .llong 0\n" \
>   ".previous" \
>   : "=r" (rval) \
> - : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL)); \
> + : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL) : "cr0"); \
>   rval;})
>  #else
>  #define mftb()   ({unsigned long rval;   \
> -- 
> 2.9.4


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao

Hi Michael,

Please let me summarize for the new branch type definitions.

1. We all agree these definitions:

+   PERF_BR_COND= 1,/* conditional */
+   PERF_BR_UNCOND  = 2,/* unconditional */
+   PERF_BR_IND = 3,/* indirect */
+   PERF_BR_CALL= 4,/* call */
+   PERF_BR_IND_CALL= 5,/* indirect call */
+   PERF_BR_RET = 6,/* return */
+   PERF_BR_SYSCALL = 7,/* syscall */
+   PERF_BR_SYSRET  = 8,/* syscall return */
+   PERF_BR_IRET= 11,   /* return from interrupt */

2. I wish to keep following definitions for x86.

+   PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */
+   PERF_BR_INT = 10,   /* sw interrupt */

PERF_BR_INT is triggered by instruction "int" .
PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
transition).


3. I can drop PERF_BR_FAR_BRANCH

4. I'd like to add following types for powerpc.

PERF_BR_COND_CALL   /* Conditional call */
PERF_BR_COND_RET/* Condition return */

If you agree these new definitions, I will prepare the new patch.

Thanks
Jin Yao

On 7/10/2017 6:32 PM, Michael Ellerman wrote:

"Jin, Yao"  writes:

On 7/10/2017 2:05 PM, Michael Ellerman wrote:

Jin Yao  writes:


It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.


...

To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.

Most of the code and doc uses "branch" but then a few these are called
"jump". Can we just stick with "branch"?


PERF_BR_NONE  : unknown
PERF_BR_JCC   : conditional jump
PERF_BR_JMP   : jump
PERF_BR_IND_JMP   : indirect jump

eg:

PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch

Call and jump are all branches. If we want to figure out which one is
jump and which one is call, we need the detail branch type definitions.

Yeah I'm not saying we don't need the different types, I'm saying I'd
rather we just called them "branch" not "jump". Just because "jump" can
mean different things on different arches.


For example,  if we only say "PERF_BR_IND", we could not know if it's an
indirect jump or indirect call.

Yes we can, PERF_BR_IND is an indirect branch, which is not a call,
because if it was a call then it would be PERF_BR_IND_CALL.


PERF_BR_CALL  : call
PERF_BR_IND_CALL  : indirect call
PERF_BR_RET   : return
PERF_BR_SYSCALL   : syscall
PERF_BR_SYSRET: syscall return
PERF_BR_IRQ   : hw interrupt/trap/fault
PERF_BR_INT   : sw interrupt

I'm not sure what that means, I'm guessing on x86 it means someone
executed "int" ?

PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

OK, but I still don't know what that means :)

What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ?


PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call
(direct call and indirect call) and return.

Yep makes sense.


PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.

Yep OK.


Is that sufficiently useful to use up a bit? I think we only have 3
free?

Do you means 3 bits? Each bit stands for one branch type? I guess what
you mean is:

PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch

But 3 branch types are not enough for us.

What I meant was you're using 4 bits for the type, so you have 16
possible values, and you've defined 13 of them. Meaning there are only 3
types free.

So we should try to only define branch types that are really useful, and
keep some free for future use.

Maybe PERF_BR_INT is really common on x86 and so it's important to count
it, but like I said above I don't know what it is.


PERF_BR_IRET  : return from interrupt
PERF_BR_FAR_BRANCH: not generic far branch type

What is a "not generic far branch" ?

I don't know what that would mean on powerpc for example.

It's reserved for future using I think.

OK so let's not put it in the Linux API until it's defined?


I think the only thing we have on powerpc that's commonly used and that
isn't covered above is branches that decrement a loop counter and then
branch based on the result.

...

Sorry, I'm not familiar with powerpc arch. Or could you add the branch
type which powerpc needs?

These are good:

+   PERF_BR_COND= 1,/* conditional */
+   PERF_BR_UNCOND  = 2,/* unconditional */
+   PERF_BR_IND = 3,/* indirect */
+   PERF_BR_CALL= 4,/* call */
+   PERF_BR_IND_CALL= 5,/* indirect call */
+   PERF_BR_RET = 6,/* return */

These we wouldn't use currently, but 

Re: issue with kernel 4.12.rc6 addnote -kernel dont build

2017-07-10 Thread Michael Ellerman
luigi burdo  writes:

> Hi all, kernel here is not building.

Did it just stop working? That code is from 2014?

> attached file will explain better than my poor english.
>
> Host Machine BE Qoriq e5500  16GB ram

What userspace are you running? Is it 32-bit ?

>  HOSTCC  arch/powerpc/boot/addnote
   ^^
   What compiler is that?

Building with V=1 should show you.

> arch/powerpc/boot/addnote.c: In function ‘main’:
> arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width of 
> type [-Wshift-count-overflow]
>  #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \

These warnings seem like they're correct, but that code should never
actually run on 32-bit so it should be harmless.

> mv: cannot move 'arch/powerpc/boot/.addnote.tmp' to 
> 'arch/powerpc/boot/.addnote.cmd': No such file or directory
> scripts/Makefile.host:107: recipe for target 'arch/powerpc/boot/addnote' 
> failed
> make[1]: *** [arch/powerpc/boot/addnote] Error 1

But they're only warnings, so they shouldn't be breaking the build AFAICS.

cheers


Re: [PATCH] powerpc/powernv/idle: Put pnv_cpu_offline behind HOTPLUG_CPU

2017-07-10 Thread Michael Ellerman
Joel Stanley  writes:
> In commit 900612315788 ("powerpc/powernv/smp: Add busy-wait loop as fall back
> for CPU-Hotplug") the idle code uses generic_check_cpu_restart(), but that
> function is not available when CONFIG_HOTPLUG_CPU is disabled.
>
> arch/powerpc/platforms/powernv/idle.c: In function ‘pnv_cpu_offline’:
> arch/powerpc/platforms/powernv/idle.c:286:11: error: implicit declaration of
> function ‘generic_check_cpu_restart’ [-Werror=implicit-function-declaration]
>while (!generic_check_cpu_restart(cpu)) {
>^
>
> The callers of pnv_cpu_offline are behind CONFIG_HOTPLUG_CPU, so fix the build
> error by putting this code behind the same gard.
>
> Fixes: 900612315788 ("powerpc/powernv/smp: Add busy-wait loop as fall back 
> for CPU-Hotplug").
> Cc:  # 4.12
> Signed-off-by: Joel Stanley 

This was fixed upstream in commit 67d204180886, but I forgot to tag it
for stable.

https://git.kernel.org/torvalds/c/67d204180886


I've just sent mail to stable asking for it to be backported to 4.12.

cheers


Re: [PATCH 1/2] powerpc/mm/radix: Properly clear process table entry

2017-07-10 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:

> On Mon, 2017-07-10 at 13:51 +1000, Michael Ellerman wrote:
>> Benjamin Herrenschmidt  writes:
>> 
>> > On radix, the process table entry we want to clear when
>> > destroying a context is entry 0, not entry 1
>> 
>> .. but has no real world consequence? Or causes the system to instantly
>> catch fire?
>
> It has no *immediate* consequence on P9 , but it can cause other bugs
> to become worse (such as the KVM issue I'm chasing).

OK, I'll flag it for stable then to be safe.

cheers


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
"Jin, Yao"  writes:
> On 7/10/2017 2:05 PM, Michael Ellerman wrote:
>> Jin Yao  writes:
>>
>>> It is often useful to know the branch types while analyzing branch
>>> data. For example, a call is very different from a conditional branch.
>>>
...
>>> To keep consistent on kernel and userspace and make the classification
>>> more common, the patch adds the common branch type classification
>>> in perf_event.h.
>>
>> Most of the code and doc uses "branch" but then a few these are called
>> "jump". Can we just stick with "branch"?
>>
>>> PERF_BR_NONE  : unknown
>>> PERF_BR_JCC   : conditional jump
>>> PERF_BR_JMP   : jump
>>> PERF_BR_IND_JMP   : indirect jump
>> eg:
>>
>> PERF_BR_COND: conditional branch
>> PERF_BR_UNCOND  : unconditional branch
>> PERF_BR_IND : indirect branch
>
> Call and jump are all branches. If we want to figure out which one is 
> jump and which one is call, we need the detail branch type definitions.

Yeah I'm not saying we don't need the different types, I'm saying I'd
rather we just called them "branch" not "jump". Just because "jump" can
mean different things on different arches.

> For example,  if we only say "PERF_BR_IND", we could not know if it's an 
> indirect jump or indirect call.

Yes we can, PERF_BR_IND is an indirect branch, which is not a call,
because if it was a call then it would be PERF_BR_IND_CALL.

>>> PERF_BR_CALL  : call
>>> PERF_BR_IND_CALL  : indirect call
>>> PERF_BR_RET   : return
>>> PERF_BR_SYSCALL   : syscall
>>> PERF_BR_SYSRET: syscall return
>>> PERF_BR_IRQ   : hw interrupt/trap/fault
>>> PERF_BR_INT   : sw interrupt
>> I'm not sure what that means, I'm guessing on x86 it means someone
>> executed "int" ?
>
> PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

OK, but I still don't know what that means :)

What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ?

> PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call 
> (direct call and indirect call) and return.

Yep makes sense.

> PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.

Yep OK.

>> Is that sufficiently useful to use up a bit? I think we only have 3
>> free?
>
> Do you means 3 bits? Each bit stands for one branch type? I guess what 
> you mean is:
>
> PERF_BR_COND: conditional branch
> PERF_BR_UNCOND  : unconditional branch
> PERF_BR_IND : indirect branch
>
> But 3 branch types are not enough for us.

What I meant was you're using 4 bits for the type, so you have 16
possible values, and you've defined 13 of them. Meaning there are only 3
types free.

So we should try to only define branch types that are really useful, and
keep some free for future use.

Maybe PERF_BR_INT is really common on x86 and so it's important to count
it, but like I said above I don't know what it is.

>>> PERF_BR_IRET  : return from interrupt
>>> PERF_BR_FAR_BRANCH: not generic far branch type
>> What is a "not generic far branch" ?
>>
>> I don't know what that would mean on powerpc for example.
>
> It's reserved for future using I think.

OK so let's not put it in the Linux API until it's defined?

>> I think the only thing we have on powerpc that's commonly used and that
>> isn't covered above is branches that decrement a loop counter and then
>> branch based on the result.
...
>
> Sorry, I'm not familiar with powerpc arch. Or could you add the branch 
> type which powerpc needs?

These are good:

+   PERF_BR_COND= 1,/* conditional */
+   PERF_BR_UNCOND  = 2,/* unconditional */
+   PERF_BR_IND = 3,/* indirect */
+   PERF_BR_CALL= 4,/* call */
+   PERF_BR_IND_CALL= 5,/* indirect call */
+   PERF_BR_RET = 6,/* return */

These we wouldn't use currently, but make sense:

+   PERF_BR_SYSCALL = 7,/* syscall */
+   PERF_BR_SYSRET  = 8,/* syscall return */
+   PERF_BR_IRET= 11,   /* return from interrupt */

These I'm not so sure about, I don't really know what they would map to
for us:

+   PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */
+   PERF_BR_INT = 10,   /* sw interrupt */

And sounds like this should be dropped for now:

+   PERF_BR_FAR_BRANCH  = 12,   /* not generic far branch type */

The branch types you haven't covered which might be useful for us are:

PERF_BR_COND_CALL   /* Conditional call */
PERF_BR_COND_RET/* Condition return */


cheers


Re: [PATCH] ptrace: Add compat PTRACE_{G,S}ETSIGMASK handlers

2017-07-10 Thread Yury Norov
On Thu, Jun 29, 2017 at 05:26:37PM +0100, James Morse wrote:
> compat_ptrace_request() lacks handlers for PTRACE_{G,S}ETSIGMASK,
> instead using those in ptrace_request(). The compat variant should
> read a compat_sigset_t from userspace instead of ptrace_request()s
> sigset_t.
> 
> While compat_sigset_t is the same size as sigset_t, it is defined as
> 2xu32, instead of a single u64. On a big-endian CPU this means that
> compat_sigset_t is passed to user-space using middle-endianness,
> where the least-significant u32 is written most significant byte
> first.
> 
> If ptrace_request()s code is used userspace will read the most
> significant u32 where it expected the least significant.
> 
> Instead of duplicating ptrace_request()s code as a special case in
> the arch code, handle it here.
 
Hi James,

I tested arm64/ilp32 on top of, and everything is fine.

Yury

Acked-by: Yury Norov 

> CC: Yury Norov 
> CC: Andrey Vagin 
> Reported-by: Zhou Chengming 
> Signed-off-by: James Morse 
> Fixes: 29000caecbe87 ("ptrace: add ability to get/set signal-blocked mask")
> ---
> LTP test case here:
> https://lists.linux.it/pipermail/ltp/2017-June/004932.html
> 
>  kernel/ptrace.c | 52 
>  1 file changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 8d2c10714530..a5bebb6713e8 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -843,6 +843,22 @@ static int ptrace_regset(struct task_struct *task, int 
> req, unsigned int type,
>  EXPORT_SYMBOL_GPL(task_user_regset_view);
>  #endif
>  
> +static int ptrace_setsigmask(struct task_struct *child, sigset_t *new_set)
> +{
> + sigdelsetmask(new_set, sigmask(SIGKILL)|sigmask(SIGSTOP));
> +
> + /*
> +  * Every thread does recalc_sigpending() after resume, so
> +  * retarget_shared_pending() and recalc_sigpending() are not
> +  * called here.
> +  */
> + spin_lock_irq(>sighand->siglock);
> + child->blocked = *new_set;
> + spin_unlock_irq(>sighand->siglock);
> +
> + return 0;
> +}
> +
>  int ptrace_request(struct task_struct *child, long request,
>  unsigned long addr, unsigned long data)
>  {
> @@ -914,18 +930,7 @@ int ptrace_request(struct task_struct *child, long 
> request,
>   break;
>   }
>  
> - sigdelsetmask(_set, sigmask(SIGKILL)|sigmask(SIGSTOP));
> -
> - /*
> -  * Every thread does recalc_sigpending() after resume, so
> -  * retarget_shared_pending() and recalc_sigpending() are not
> -  * called here.
> -  */
> - spin_lock_irq(>sighand->siglock);
> - child->blocked = new_set;
> - spin_unlock_irq(>sighand->siglock);
> -
> - ret = 0;
> + ret = ptrace_setsigmask(child, _set);
>   break;
>   }
>  
> @@ -1149,7 +1154,9 @@ int compat_ptrace_request(struct task_struct *child, 
> compat_long_t request,
> compat_ulong_t addr, compat_ulong_t data)
>  {
>   compat_ulong_t __user *datap = compat_ptr(data);
> + compat_sigset_t set32;
>   compat_ulong_t word;
> + sigset_t new_set;
>   siginfo_t siginfo;
>   int ret;
>  
> @@ -1189,6 +1196,27 @@ int compat_ptrace_request(struct task_struct *child, 
> compat_long_t request,
>   else
>   ret = ptrace_setsiginfo(child, );
>   break;
> + case PTRACE_GETSIGMASK:
> + if (addr != sizeof(compat_sigset_t))
> + return -EINVAL;
> +
> + sigset_to_compat(, >blocked);
> +
> + if (copy_to_user(datap, , sizeof(set32)))
> + return -EFAULT;
> +
> + ret = 0;
> + break;
> + case PTRACE_SETSIGMASK:
> + if (addr != sizeof(compat_sigset_t))
> + return -EINVAL;
> +
> + if (copy_from_user(, datap, sizeof(compat_sigset_t)))
> + return -EFAULT;
> +
> + sigset_from_compat(_set, );
> + ret = ptrace_setsigmask(child, _set);
> + break;
>  #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
>   case PTRACE_GETREGSET:
>   case PTRACE_SETREGSET:
> -- 
> 2.11.0


Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Jin, Yao



On 7/10/2017 2:05 PM, Michael Ellerman wrote:

Hi Jin Yao,

Sorry I haven't commented until now, but it got lost in the flood of
patches.


Never mind, it's no problem. :)


Just a few nit-picks below ...
Jin Yao  writes:


It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.

Currently we have to look it up in binary while the binary may later
not be available and even the binary is available but user has to take
some time. It is very useful for user to check it directly in perf
report.

Perf already has support for disassembling the branch instruction
to get the x86 branch type.

To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.

Most of the code and doc uses "branch" but then a few these are called
"jump". Can we just stick with "branch"?


PERF_BR_NONE  : unknown
PERF_BR_JCC   : conditional jump
PERF_BR_JMP   : jump
PERF_BR_IND_JMP   : indirect jump

eg:

PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch


Call and jump are all branches. If we want to figure out which one is 
jump and which one is call, we need the detail branch type definitions.


For example,  if we only say "PERF_BR_IND", we could not know if it's an 
indirect jump or indirect call.

PERF_BR_CALL  : call
PERF_BR_IND_CALL  : indirect call
PERF_BR_RET   : return
PERF_BR_SYSCALL   : syscall
PERF_BR_SYSRET: syscall return
PERF_BR_IRQ   : hw interrupt/trap/fault
PERF_BR_INT   : sw interrupt

I'm not sure what that means, I'm guessing on x86 it means someone
executed "int" ?


PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call 
(direct call and indirect call) and return.


PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.


Is that sufficiently useful to use up a bit? I think we only have 3
free?


Do you means 3 bits? Each bit stands for one branch type? I guess what 
you mean is:


PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch

But 3 branch types are not enough for us.


PERF_BR_IRET  : return from interrupt
PERF_BR_FAR_BRANCH: not generic far branch type

What is a "not generic far branch" ?

I don't know what that would mean on powerpc for example.


It's reserved for future using I think.



I think the only thing we have on powerpc that's commonly used and that
isn't covered above is branches that decrement a loop counter and then
branch based on the result.

It might be nice if we could separate those out from other conditional
branches. Whether it's worth using a bit for I'm not sure. Do other
arches have something similar?

Those branches do tend to be "backward conditional", so that may be
sufficient. But backward conditional also includes if bodies that have
been moved out of line and then branch back to the main body of the
function.

cheers


Sorry, I'm not familiar with powerpc arch. Or could you add the branch 
type which powerpc needs?


For backward conditional and forward conditional, we compute them in 
userspace according to the from/to addresses.


Thanks
Jin Yao





Re: [RESEND PATCH v5 00/16] eeprom: at24: Add OF device ID table

2017-07-10 Thread Javier Martinez Canillas
Hello Wolfram,

On Thu, Jun 15, 2017 at 8:54 PM, Javier Martinez Canillas
 wrote:
>
> This series is a follow-up to patch [0] that added an OF device ID table
> to the at24 EEPROM driver. As you suggested [1], this version instead of
> adding entries for every used  tuple, only adds a single
> entry for each chip type using the "atmel" vendor as a generic fallback.
>
> The first patch documents in the DT binding what's the correct vendor to
> use and what are the ones that are being deprecated. The second one adds
> the OF device ID table for the at24 driver and the next patches use this
> vendor in the compatible string to each DTS that defines a compatible I2C
> EEPROM device node.
>
> Patches can be applied independently since the DTS changes without driver
> changes are no-op and the OF table won't be used without the DTS changes.
>
> [0]: https://lkml.org/lkml/2017/3/14/589
> [1]: https://lkml.org/lkml/2017/3/15/99
>

Are you planning to pick this series? It has been in the list for
months and were resent many times...

Best regards,
Javier


[PATCH] soc/qbman: Simplify bman_release()

2017-07-10 Thread Sebastian Huber
Get the affine portal only once for the complete transaction.

Signed-off-by: Sebastian Huber 
---
 drivers/soc/fsl/qbman/bman.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index a3d6d7cfa929..0af2069a6b7e 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -723,26 +723,23 @@ int bman_release(struct bman_pool *pool, const struct 
bm_buffer *bufs, u8 num)
 
DPAA_ASSERT(num > 0 && num <= 8);
 
-   do {
+   while (1) {
p = get_affine_portal();
local_irq_save(irqflags);
avail = bm_rcr_get_avail(>p);
if (avail < 2)
update_rcr_ci(p, avail);
r = bm_rcr_start(>p);
-   local_irq_restore(irqflags);
-   put_affine_portal();
if (likely(r))
break;
 
+   local_irq_restore(irqflags);
+   put_affine_portal();
+   if (unlikely(--timeout == 0))
+   return -ETIMEDOUT;
udelay(1);
-   } while (--timeout);
-
-   if (unlikely(!timeout))
-   return -ETIMEDOUT;
+   }
 
-   p = get_affine_portal();
-   local_irq_save(irqflags);
/*
 * we can copy all but the first entry, as this can trigger badness
 * with the valid-bit
-- 
2.12.3



Re: [PATCH 2/2] powerpc/mm/radix: Synchronize updates to the process table

2017-07-10 Thread Benjamin Herrenschmidt
On Mon, 2017-07-10 at 14:40 +1000, Nicholas Piggin wrote:
> On Fri, 07 Jul 2017 16:12:16 -0500
> Benjamin Herrenschmidt  wrote:
> 
> > When writing to the process table, we need to ensure the store is
> > visible to a subsequent access by the MMU. We assume we never have
> > the PID active while doing the update, so a ptesync/isync pair
> > should hopefully be a big enough hammer for our purpose.
> > 
> 
> Do we need this if it's going from invalid->valid?

No. While there is no valid bit in radix, I checked with HW and they
will not cache an entry that has an invalid RTS field. We should ensure
this gets architected for future impl. though.

> 
> > Signed-off-by: Benjamin Herrenschmidt 
> > ---
> > 
> > Note: Architecturally, we also need to use a tlbie(l) with RIC=2
> > to flush the process table cache. However this is (very) expensive
> > and we know that POWER9 will invalidate its cache when hitting the
> > mtpid instruction.
> > 
> > To be safe, we should add the tlbie for any ARCH300 processor we
> > don't know about though. (Aneesh, Nick do we need a ftr bit ?)
> 
> Good question, I'm not sure. Aside from this particular thing, it
> seems like a good idea in general to add implementation specific
> tests into the ftr framework.
> 
> We could add the PVR into it so we don't have to pollute FTR bits.
> The POWER9_DD1 bit for example could just be a PVR mask and cmp.

Reading the PVR isn't necessarily cheap though, we may want to cache
it.
> 
> > 
> >  arch/powerpc/mm/mmu_context_book3s64.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
> > b/arch/powerpc/mm/mmu_context_book3s64.c
> > index 9404b5e..e3e2803 100644
> > --- a/arch/powerpc/mm/mmu_context_book3s64.c
> > +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> > @@ -138,6 +138,14 @@ static int radix__init_new_context(struct mm_struct 
> > *mm)
> > rts_field = radix__get_tree_size();
> > process_tb[index].prtb0 = cpu_to_be64(rts_field | __pa(mm->pgd) | 
> > RADIX_PGD_INDEX_SIZE);
> >  
> > +   /*
> > +* Order the above store with subsequent update of the PID
> > +* register (at which point HW can start loading/caching
> > +* the entry) and the corresponding load by the MMU from
> > +* the L2 cache.
> > +*/
> > +   asm volatile("ptesync;isync" : : : "memory");
> > +
> > mm->context.npu_context = NULL;
> >  
> > return index;
> > 


[PATCH] POWER9 PMU interrupt after idle workaround

2017-07-10 Thread Nicholas Piggin
POWER9 DD2 can see spurious PMU interrupts after state-loss idle in
some conditions.

A solution is to save and reload MMCR0 over state-loss idle.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/idle_book3s.S | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 5adb390e773b..516ebef905c0 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -30,6 +30,7 @@
  * Use unused space in the interrupt stack to save and restore
  * registers for winkle support.
  */
+#define _MMCR0 GPR0
 #define _SDR1  GPR3
 #define _PTCR  GPR3
 #define _RPR   GPR4
@@ -272,6 +273,14 @@ power_enter_stop:
b   pnv_wakeup_noloss
 
 .Lhandle_esl_ec_set:
+   /*
+* POWER9 DD2 can incorrectly set PMAO when waking up after a
+* state-loss idle. Saving and restoring MMCR0 over idle is a
+* workaround.
+*/
+   mfspr   r4,SPRN_MMCR0
+   std r4,_MMCR0(r1)
+
 /*
  * Check if the requested state is a deep idle state.
  */
@@ -450,10 +459,14 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 pnv_restore_hyp_resource_arch300:
/*
 * Workaround for POWER9, if we lost resources, the ERAT
-* might have been mixed up and needs flushing.
+* might have been mixed up and needs flushing. We also need
+* to reload MMCR0 (see comment above).
 */
blt cr3,1f
PPC_INVALIDATE_ERAT
+   ld  r1,PACAR1(r13)
+   ld  r4,_MMCR0(r1)
+   mtspr   SPRN_MMCR0,r4
 1:
/*
 * POWER ISA 3. Use PSSCR to determine if we
-- 
2.11.0



Re: [RFC v5 00/38] powerpc: Memory Protection Keys

2017-07-10 Thread Ram Pai
On Mon, Jul 10, 2017 at 11:13:23AM +0530, Anshuman Khandual wrote:
> On 07/06/2017 02:51 AM, Ram Pai wrote:
> > Memory protection keys enable applications to protect its
> > address space from inadvertent access or corruption from
> > itself.
> > 
> > The overall idea:
> > 
> >  A process allocates a   key  and associates it with
> >  an  address  range  withinits   address   space.
> >  The process  then  can  dynamically  set read/write 
> >  permissions on  the   key   without  involving  the 
> >  kernel. Any  code that  violates   the  permissions
> >  of  the address space; as defined by its associated
> >  key, will receive a segmentation fault.
> > 
> > This patch series enables the feature on PPC64 HPTE
> > platform.
> > 
> > ISA3.0 section 5.7.13 describes the detailed specifications.
> > 
> > 
> > Testing:
> > This patch series has passed all the protection key
> > tests available in  the selftests directory.
> > The tests are updated to work on both x86 and powerpc.
> > 
> > version v5:
> > (1) reverted back to the old design -- store the 
> > key in the pte, instead of bypassing it.
> > The v4 design slowed down the hash page path.
> > (2) detects key violation when kernel is told to 
> > access user pages.
> > (3) further refined the patches into smaller consumable
> > units
> > (4) page faults handlers captures the faulting key 
> > from the pte instead of the vma. This closes a
> > race between where the key update in the vma and
> > a key fault caused cause by the key programmed
> > in the pte.
> > (5) a key created with access-denied should
> > also set it up to deny write. Fixed it.
> > (6) protection-key number is displayed in smaps
> > the x86 way.
> 
> Hello Ram,
> 
> This patch series has now grown a lot. Do you have this
> hosted some where for us to pull and test it out ? BTW

https://github.com/rampai/memorykeys.git
branch memkey.v5.3

> do you have data points to show the difference in
> performance between this version and the last one where
> we skipped the bits from PTE and directly programmed the
> HPTE entries looking into VMA bits.

No. I dont. I am hoping you can help me out with this.
RP



Re: [PATCH v6 1/7] perf/core: Define the common branch type classification

2017-07-10 Thread Michael Ellerman
Hi Jin Yao,

Sorry I haven't commented until now, but it got lost in the flood of
patches.

Just a few nit-picks below ...

Jin Yao  writes:

> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
>
> Currently we have to look it up in binary while the binary may later
> not be available and even the binary is available but user has to take
> some time. It is very useful for user to check it directly in perf
> report.
>
> Perf already has support for disassembling the branch instruction
> to get the x86 branch type.
>
> To keep consistent on kernel and userspace and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.

Most of the code and doc uses "branch" but then a few these are called
"jump". Can we just stick with "branch"?

> PERF_BR_NONE  : unknown
> PERF_BR_JCC   : conditional jump
> PERF_BR_JMP   : jump
> PERF_BR_IND_JMP   : indirect jump

eg:

PERF_BR_COND: conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND : indirect branch

> PERF_BR_CALL  : call
> PERF_BR_IND_CALL  : indirect call
> PERF_BR_RET   : return
> PERF_BR_SYSCALL   : syscall
> PERF_BR_SYSRET: syscall return
> PERF_BR_IRQ   : hw interrupt/trap/fault
> PERF_BR_INT   : sw interrupt

I'm not sure what that means, I'm guessing on x86 it means someone
executed "int" ?

Is that sufficiently useful to use up a bit? I think we only have 3
free?

> PERF_BR_IRET  : return from interrupt
> PERF_BR_FAR_BRANCH: not generic far branch type

What is a "not generic far branch" ?

I don't know what that would mean on powerpc for example.


I think the only thing we have on powerpc that's commonly used and that
isn't covered above is branches that decrement a loop counter and then
branch based on the result.

It might be nice if we could separate those out from other conditional
branches. Whether it's worth using a bit for I'm not sure. Do other
arches have something similar?

Those branches do tend to be "backward conditional", so that may be
sufficient. But backward conditional also includes if bodies that have
been moved out of line and then branch back to the main body of the
function.

cheers


Re: [RFC v5 34/38] procfs: display the protection-key number associated with a vma

2017-07-10 Thread Ram Pai
On Mon, Jul 10, 2017 at 08:37:28AM +0530, Anshuman Khandual wrote:
> On 07/06/2017 02:52 AM, Ram Pai wrote:
> > Display the pkey number associated with the vma in smaps of a task.
> > The key will be seen as below:
> > 
> > ProtectionKey: 0
> > 
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/kernel/setup_64.c |8 
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index f35ff9d..ebc82b3 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -37,6 +37,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -745,3 +746,10 @@ static int __init disable_hardlockup_detector(void)
> >  }
> >  early_initcall(disable_hardlockup_detector);
> >  #endif
> > +
> > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> 
> Why not for X86 protection keys ?

hmm.. I dont understand the comment.


-- 
Ram Pai



Re: [RFC v5 38/38] Documentation: PowerPC specific updates to memory protection keys

2017-07-10 Thread Ram Pai
On Mon, Jul 10, 2017 at 08:37:04AM +0530, Anshuman Khandual wrote:
> On 07/06/2017 02:52 AM, Ram Pai wrote:
> > Add documentation updates that capture PowerPC specific changes.
> > 
> > Signed-off-by: Ram Pai 
> > ---
> >  Documentation/vm/protection-keys.txt |   85 
> > ++
> >  1 files changed, 65 insertions(+), 20 deletions(-)
> > 
> > diff --git a/Documentation/vm/protection-keys.txt 
> > b/Documentation/vm/protection-keys.txt
> > index b643045..d50b6ab 100644
> > --- a/Documentation/vm/protection-keys.txt
> > +++ b/Documentation/vm/protection-keys.txt
> > @@ -1,21 +1,46 @@
> > -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
> > -which will be found on future Intel CPUs.
> > +Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature 
> > found in
> > +new generation of intel CPUs and on PowerPC 7 and higher CPUs.
> >  
> >  Memory Protection Keys provides a mechanism for enforcing page-based
> > -protections, but without requiring modification of the page tables
> > -when an application changes protection domains.  It works by
> > -dedicating 4 previously ignored bits in each page table entry to a
> > -"protection key", giving 16 possible keys.
> > -
> > -There is also a new user-accessible register (PKRU) with two separate
> > -bits (Access Disable and Write Disable) for each key.  Being a CPU
> > -register, PKRU is inherently thread-local, potentially giving each
> > -thread a different set of protections from every other thread.
> > -
> > -There are two new instructions (RDPKRU/WRPKRU) for reading and writing
> > -to the new register.  The feature is only available in 64-bit mode,
> > -even though there is theoretically space in the PAE PTEs.  These
> > -permissions are enforced on data access only and have no effect on
> > +protections, but without requiring modification of the page tables when an
> > +application changes protection domains.
> > +
> > +
> > +On Intel:
> > +
> > +   It works by dedicating 4 previously ignored bits in each page table
> > +   entry to a "protection key", giving 16 possible keys.
> > +
> > +   There is also a new user-accessible register (PKRU) with two separate
> > +   bits (Access Disable and Write Disable) for each key.  Being a CPU
> > +   register, PKRU is inherently thread-local, potentially giving each
> > +   thread a different set of protections from every other thread.
> > +
> > +   There are two new instructions (RDPKRU/WRPKRU) for reading and writing
> > +   to the new register.  The feature is only available in 64-bit mode,
> > +   even though there is theoretically space in the PAE PTEs.  These
> > +   permissions are enforced on data access only and have no effect on
> > +   instruction fetches.
> > +
> > +
> > +On PowerPC:
> > +
> > +   It works by dedicating 5 page table entry bits to a "protection key",
> > +   giving 32 possible keys.
> > +
> > +   There  is  a  user-accessible  register (AMR)  with  two separate bits;
> > +   Access Disable and  Write  Disable, for  each key.  Being  a  CPU
> > +   register,  AMR  is inherently  thread-local,  potentially  giving  each
> > +   thread a different set of protections from every other thread.  NOTE:
> > +   Disabling read permission does not disable write and vice-versa.
> 
> We can only enable/disable entire access or write. Then how
> read permission can be changed with protection keys directly ?

Good catch. On powerpc there is a disable read and disable write. They
both can be combined to disable access. Will fix the error. Read it
as 'Access Read' . thanks.

RP