[PATCH] powerpc: configs: Add Skiroot defconfig
This configuration is used by the OpenPower firmware for it's Linux-as-bootloader implementation. Also known as the Petitboot kernel, this configuration broke in 4.12 (CPU_HOTPLUG=n), so add it to the upstream tree in order to get better coverage. Signed-off-by: Joel Stanley--- arch/powerpc/configs/skiroot_defconfig | 226 + 1 file changed, 226 insertions(+) create mode 100644 arch/powerpc/configs/skiroot_defconfig diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig new file mode 100644 index ..24a3ea65d9e2 --- /dev/null +++ b/arch/powerpc/configs/skiroot_defconfig @@ -0,0 +1,226 @@ +CONFIG_PPC64=y +CONFIG_ALTIVEC=y +CONFIG_VSX=y +CONFIG_NR_CPUS=2048 +CONFIG_CPU_LITTLE_ENDIAN=y +# CONFIG_SWAP is not set +CONFIG_SYSVIPC=y +CONFIG_POSIX_MQUEUE=y +# CONFIG_CROSS_MEMORY_ATTACH is not set +CONFIG_NO_HZ=y +CONFIG_HIGH_RES_TIMERS=y +CONFIG_TASKSTATS=y +CONFIG_TASK_DELAY_ACCT=y +CONFIG_TASK_XACCT=y +CONFIG_TASK_IO_ACCOUNTING=y +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y +CONFIG_LOG_BUF_SHIFT=20 +CONFIG_RELAY=y +CONFIG_BLK_DEV_INITRD=y +# CONFIG_RD_GZIP is not set +# CONFIG_RD_BZIP2 is not set +# CONFIG_RD_LZMA is not set +# CONFIG_RD_LZO is not set +# CONFIG_RD_LZ4 is not set +CONFIG_CC_OPTIMIZE_FOR_SIZE=y +CONFIG_PERF_EVENTS=y +# CONFIG_COMPAT_BRK is not set +CONFIG_JUMP_LABEL=y +CONFIG_MODULES=y +CONFIG_MODULE_UNLOAD=y +CONFIG_MODULE_SIG=y +CONFIG_MODULE_SIG_FORCE=y +CONFIG_MODULE_SIG_SHA512=y +CONFIG_PARTITION_ADVANCED=y +# CONFIG_IOSCHED_DEADLINE is not set +# CONFIG_PPC_PSERIES is not set +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y +CONFIG_CPU_IDLE=y +CONFIG_HZ_100=y +CONFIG_KEXEC=y +CONFIG_IRQ_ALL_CPUS=y +CONFIG_NUMA=y +# CONFIG_COMPACTION is not set +# CONFIG_MIGRATION is not set +# CONFIG_BOUNCE is not set +CONFIG_PPC_64K_PAGES=y +CONFIG_SCHED_SMT=y +CONFIG_CMDLINE_BOOL=y +CONFIG_CMDLINE="console=tty0 console=hvc0 powersave=off" +# CONFIG_SECCOMP is not set +CONFIG_NET=y +CONFIG_PACKET=y +CONFIG_UNIX=y +CONFIG_INET=y +CONFIG_IP_MULTICAST=y +CONFIG_NET_IPIP=y +CONFIG_SYN_COOKIES=y +# CONFIG_INET_XFRM_MODE_TRANSPORT is not set +# CONFIG_INET_XFRM_MODE_TUNNEL is not set +# CONFIG_INET_XFRM_MODE_BEET is not set +# CONFIG_IPV6 is not set +CONFIG_DNS_RESOLVER=y +# CONFIG_WIRELESS is not set +CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEVTMPFS_MOUNT=y +CONFIG_MTD=m +CONFIG_MTD_POWERNV_FLASH=m +CONFIG_BLK_DEV_LOOP=y +CONFIG_BLK_DEV_RAM=y +CONFIG_BLK_DEV_RAM_SIZE=65536 +CONFIG_VIRTIO_BLK=m +CONFIG_BLK_DEV_NVME=m +CONFIG_EEPROM_AT24=y +# CONFIG_CXL is not set +CONFIG_BLK_DEV_SD=m +CONFIG_BLK_DEV_SR=m +CONFIG_BLK_DEV_SR_VENDOR=y +CONFIG_CHR_DEV_SG=m +CONFIG_SCSI_CONSTANTS=y +CONFIG_SCSI_SCAN_ASYNC=y +CONFIG_SCSI_FC_ATTRS=y +CONFIG_SCSI_CXGB3_ISCSI=m +CONFIG_SCSI_CXGB4_ISCSI=m +CONFIG_SCSI_BNX2_ISCSI=m +CONFIG_BE2ISCSI=m +CONFIG_SCSI_AACRAID=m +CONFIG_MEGARAID_NEWGEN=y +CONFIG_MEGARAID_MM=m +CONFIG_MEGARAID_MAILBOX=m +CONFIG_MEGARAID_SAS=m +CONFIG_SCSI_MPT2SAS=m +CONFIG_SCSI_IPR=m +# CONFIG_SCSI_IPR_TRACE is not set +# CONFIG_SCSI_IPR_DUMP is not set +CONFIG_SCSI_QLA_FC=m +CONFIG_SCSI_QLA_ISCSI=m +CONFIG_SCSI_LPFC=m +CONFIG_SCSI_VIRTIO=m +CONFIG_SCSI_DH=y +CONFIG_SCSI_DH_ALUA=m +CONFIG_ATA=y +CONFIG_SATA_AHCI=y +# CONFIG_ATA_SFF is not set +CONFIG_MD=y +CONFIG_BLK_DEV_MD=m +CONFIG_MD_LINEAR=m +CONFIG_MD_RAID0=m +CONFIG_MD_RAID1=m +CONFIG_MD_RAID10=m +CONFIG_MD_RAID456=m +CONFIG_MD_MULTIPATH=m +CONFIG_MD_FAULTY=m +CONFIG_BLK_DEV_DM=m +CONFIG_DM_CRYPT=m +CONFIG_DM_SNAPSHOT=m +CONFIG_DM_MIRROR=m +CONFIG_DM_ZERO=m +CONFIG_DM_MULTIPATH=m +CONFIG_ACENIC=m +CONFIG_ACENIC_OMIT_TIGON_I=y +CONFIG_TIGON3=y +CONFIG_BNX2X=m +CONFIG_CHELSIO_T1=y +CONFIG_BE2NET=m +CONFIG_S2IO=m +CONFIG_E100=m +CONFIG_E1000=m +CONFIG_E1000E=m +CONFIG_IXGB=m +CONFIG_IXGBE=m +CONFIG_MLX4_EN=m +CONFIG_MLX5_CORE=m +CONFIG_MLX5_CORE_EN=y +CONFIG_MYRI10GE=m +CONFIG_QLGE=m +CONFIG_NETXEN_NIC=m +CONFIG_SFC=m +# CONFIG_USB_NET_DRIVERS is not set +# CONFIG_WLAN is not set +CONFIG_INPUT_EVDEV=y +CONFIG_INPUT_MISC=y +# CONFIG_SERIO_SERPORT is not set +# CONFIG_DEVMEM is not set +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_IPMI_HANDLER=y +CONFIG_IPMI_DEVICE_INTERFACE=y +CONFIG_IPMI_POWERNV=y +CONFIG_HW_RANDOM=y +CONFIG_RAW_DRIVER=y +CONFIG_MAX_RAW_DEVS=1024 +CONFIG_TCG_TIS_I2C_NUVOTON=y +# CONFIG_I2C_COMPAT is not set +CONFIG_I2C_CHARDEV=y +# CONFIG_I2C_HELPER_AUTO is not set +CONFIG_DRM=y +CONFIG_DRM_RADEON=y +CONFIG_DRM_AST=m +CONFIG_FIRMWARE_EDID=y +CONFIG_FB_MODE_HELPERS=y +CONFIG_FB_OF=y +CONFIG_FB_MATROX=y +CONFIG_FB_MATROX_MILLENIUM=y +CONFIG_FB_MATROX_MYSTIQUE=y +CONFIG_FB_MATROX_G=y +# CONFIG_LCD_CLASS_DEVICE is not set +# CONFIG_BACKLIGHT_GENERIC is not set +# CONFIG_VGA_CONSOLE is not set +CONFIG_LOGO=y +CONFIG_USB_HIDDEV=y +CONFIG_USB=y +CONFIG_USB_MON=y +CONFIG_USB_XHCI_HCD=y +CONFIG_USB_EHCI_HCD=y +# CONFIG_USB_EHCI_HCD_PPC_OF is not set +CONFIG_USB_OHCI_HCD=y +CONFIG_USB_STORAGE=y +CONFIG_RTC_CLASS=y
Re: [PATCH 5/5] powernv:idle: Disable LOSE_FULL_CONTEXT states when stop-api fails.
On Sat, Jul 08, 2017 at 07:05:26PM +1000, Nicholas Piggin wrote: > On Fri, 7 Jul 2017 23:07:10 +0530 > Gautham R Shenoywrote: > > > On Fri, Jul 07, 2017 at 01:29:16AM +1000, Nicholas Piggin wrote: > > > On Wed, 5 Jul 2017 22:08:16 +0530 > > > "Gautham R. Shenoy" wrote: > > > > > > > From: "Gautham R. Shenoy" > > > > > > > > Currently, we use the opal call opal_slw_set_reg() to inform the that > > > > the Sleep-Winkle Engine (SLW) to restore the contents of some of the > > > > Hypervisor state on wakeup from deep idle states that lose full > > > > hypervisor context (characterized by the flag > > > > OPAL_PM_LOSE_FULL_CONTEXT). > > > > > > > > However, the current code has a bug in that if opal_slw_set_reg() > > > > fails, we don't disable the use of these deep states (winkle on > > > > POWER8, stop4 onwards on POWER9). > > > > > > > > This patch fixes this bug by ensuring that if the the sleep winkle > > > > engine is unable to restore the hypervisor states in > > > > pnv_save_sprs_for_deep_states(), then we mark as invalid the states > > > > which lose full context. > > > > > > > > As a side-effect, since supported_cpuidle_states in > > > > pnv_probe_idle_states() consists of flags of only the valid states, > > > > this patch will ensure that no other subsystem in the kernel can use > > > > the states which lose full context on stop-api failures. > > > > > > Looks good. Is there something minimal we can do for stable here? > > > > > > Aside question, do we need to restore LPCR at all with the SLW engine? > > > It gets set up again when by the idle wakeup code. > > > > > > > > > > And does POWER9 really need MSR and PSSCR restored by SLW? (going a bit > > > off topic here, I'm just curious) > > > > MSR is needed to be restored so that we wakeup with the right > > endianness and with the IR,DR disabled. > > And POWER8 does not require this? > > > PSSCR is set to a value so that in case of a special wakeup for a > > deep-stop, the SLW can program the core to go back to the stop level > > provided by the PSSCR value via the stop-api. > > It always restores to deepest stop? Is there any way to restore to the > achieved stop level? Maybe there is no usefulness for that. That would have been ideal. But there's no way to achieve that at the moment. The alternative is to have call the stop-api with psscr set to the desired stop level before every stop entry. This is will consume additional cycles which is what we are trying to avoid. So we are currently setting the psscr value to deepest stop via stop api as a compromise, because then on wakeup, we end up restoring more than what would typically be required, but that's still ok since we would be erring on the side of caution. Programming the PSSCR to any other value might have safety concerns. Eg: Suppose a core which was in stop11 got woken up by a special wakeup and if the psscr programmed via stop API was stop4 then the firmware will put the core in stop4. Now, since stop4 doesn't lose timebase and stop11 does, in the aforementioned case TB would have gone out of sync in the duration that the core was in stop11. Thus, when the core wakes up in the kernel in stop4, the kernel won't resync the TB which is a problematic. > > Thanks, > Nick >
Re: [RFC 1/4] libnvdimm: add to_{nvdimm,nd_region}_dev()
On Tue, Jul 11, 2017 at 9:53 AM, Dan Williamswrote: > On Tue, Jun 27, 2017 at 3:28 AM, Oliver O'Halloran wrote: >> struct device contains the ->of_node pointer so that devices can be >> assoicated with the device-tree node that created them on DT platforms. >> libnvdimm hides the struct device for regions and nvdimm devices inside >> of an opaque structure so this patch adds accessors for each to allow >> the of_nvdimm driver to set the of_node pointer. > > I'd rather go the other way and pass in the of_node to the bus and > dimm registration routines. It's a generic property of the device so > we should handle it like other generic device properties that get set > at initialization time like 'attr_groups' in nvdimm_bus_descriptor, or > a new parameter to nvdimm_create(). Sure. I just figured it would be preferable to keep firmware specific details inside the firmware driver rather than adding #ifdef CONFIG_OF around the place. Do you have any objections to making nvdimm_create() take a descriptor structure rather than adding a parameter?
Re: [PATCH v2 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()
Hi Cyril, [auto build test ERROR on powerpc/next] [also build test ERROR on v4.12 next-20170710] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Cyril-Bur/Allow-opal-async-waiters-to-get-interrupted/20170710-095504 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-allmodconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=powerpc All errors (new ones prefixed by >>): >> ERROR: ".opal_error_code" [drivers/mtd/devices/powernv_flash.ko] undefined! --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()
On Mon, Jul 10, 2017 at 9:50 PM, Michael Ellermanwrote: > Oliver O'Halloran writes: > >> The workaround for the CELL timebase bug does not correctly mark cr0 as >> being clobbered. This can result in GCC making some poor^W completely >> broken optimisations. > > Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle > breakage that no one could pin down :E Dumb luck probably. The workaround is inside a feature section which depends on CPU_FTR_CELL_TB_BUG so you would only ever see a problem when actually running on Cell (can't be that many people...). CR0 being volatile across function calls probably helped mask it too. > I'll tag it for stable. Good idea. > Your change log is not entirely fair, it's not GCC's fault, we changed > register state without telling it, so it's on us :) I'll reword it a > bit here. Yeah that was probably a bit harsh. > > cheers > >> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h >> index 7e50e47375d6..a3b6575c7842 100644 >> --- a/arch/powerpc/include/asm/reg.h >> +++ b/arch/powerpc/include/asm/reg.h >> @@ -1303,7 +1303,7 @@ static inline void msr_check_and_clear(unsigned long >> bits) >> " .llong 0\n" \ >> ".previous" \ >> : "=r" (rval) \ >> - : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL)); \ >> + : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL) : "cr0"); >> \ >> rval;}) >> #else >> #define mftb() ({unsigned long rval; \ >> -- >> 2.9.4
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
On 7/11/2017 10:28 AM, Michael Ellerman wrote: "Jin, Yao"writes: On 7/10/2017 9:46 PM, Peter Zijlstra wrote: On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote: PERF_BR_INT is triggered by instruction "int" . PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 transition). So your "PERF_BR_INT" is a system call? The "INT" thing has indeed been used as system call mechanism (typically INT 80). But these days we have special purpose syscall instructions. It could maybe be compared to the PPC "Unconditional TRAP with immediate" where you use the immediate value as an index into a handler vector. And PERF_BR_IRQ is not an interrupt request (as its name suggests), not what we call an "external interrupt" either; instead it is every interrupt that is not a system call? It is actual interrupts, but also faults, traps and all the other exceptions not caused by "INT" I think. Yes. It's interrupt, traps, faults. If from is in the user space and to is in the kernel, it indicates the ring3 -> ring0 transition. If the from instruction is not syscall or other ring transition instruction, it should be interrupt, traps and faults. That's how we get the PERF_BR_IRQ on x86. Anyway, maybe we just use a minimum but the most common set of branch types now, it could be a good start and acceptable on all architectures. PERF_BR_COND= 1,/* conditional */ PERF_BR_UNCOND= 2,/* unconditional */ PERF_BR_IND= 3,/* indirect */ PERF_BR_CALL= 4,/* call */ PERF_BR_IND_CALL= 5,/* indirect call */ PERF_BR_RET= 6,/* return */ That would be fine by me, if you're sick of talking about it and just want to get it merged :) :) I think you could expand it a bit, this list would cover the vast bulk of branch types for us: PERF_BR_COND /* Conditional */ PERF_BR_UNCOND /* Unconditional */ PERF_BR_IND /* Indirect */ PERF_BR_CALL /* Function call */ PERF_BR_IND_CALL /* Indirect function call */ PERF_BR_RET /* Function return */ PERF_BR_SYSCALL /* Syscall */ PERF_BR_SYSRET /* Syscall return */ PERF_BR_COND_CALL/* Conditional function call */ PERF_BR_COND_RET /* Conditional function return */ cheers OK, accept! Use 4 bits for above branch types and we can reserve 5 for potential future types. Thanks Jin Yao
Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()
Segher Boessenkoolwrites: > Hi! > > On Mon, Jul 10, 2017 at 09:50:06PM +1000, Michael Ellerman wrote: >> Oliver O'Halloran writes: >> >> > The workaround for the CELL timebase bug does not correctly mark cr0 as >> > being clobbered. This can result in GCC making some poor^W completely >> > broken optimisations. >> >> Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle >> breakage that no one could pin down :E > > GCC does not use cr0 before it has used cr7, cr5, cr6, cr1 (unless some > instruction _requires_ cr0, like record form ("dot") instructions, which > until recently were disabled when targetting Cell), so it isn't too easy > to hit the problem here. Maybe Oliver used a very new GCC? Or he was > unlucky. Aha. I think he actually hit the bug elsewhere and then noticed it here also? And yeah I think it was a dot instruction that triggered it. >> I'll tag it for stable. >> >> Your change log is not entirely fair, it's not GCC's fault, we changed >> register state without telling it, so it's on us :) I'll reword it a >> bit here. > > Yep, GCC works fine here, GIGO etc. It isn't feasible to make GCC warn > for most inline asm problems, either (we do not parse the mnemonics in > the asm; this is a fundamental part of the design; you must express all > constraints as, well, constraints). Ack. It's a terrible terrible interface, but that's the price we pay for wanting to insert arbitrary bits of asm inside an otherwise compiled language :) When I get the time I might go through our inline asm and convert all the cmpwi x,z to cmpwi cr0,x,z, so at least the cr0 usage is a little more obvious. cheers
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
"Jin, Yao"writes: > On 7/10/2017 9:46 PM, Peter Zijlstra wrote: >> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote: >> PERF_BR_INT is triggered by instruction "int" . PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 transition). >>> So your "PERF_BR_INT" is a system call? >> The "INT" thing has indeed been used as system call mechanism (typically >> INT 80). But these days we have special purpose syscall instructions. >> >> It could maybe be compared to the PPC "Unconditional TRAP with >> immediate" where you use the immediate value as an index into a handler >> vector. >> >>> And PERF_BR_IRQ is not an interrupt request (as its name suggests), >>> not what we call an "external interrupt" either; instead it is every >>> interrupt that is not a system call? >> It is actual interrupts, but also faults, traps and all the other >> exceptions not caused by "INT" I think. >> > Yes. It's interrupt, traps, faults. If from is in the user space and to > is in the kernel, it indicates the ring3 -> ring0 transition. > > If the from instruction is not syscall or other ring transition > instruction, it should be interrupt, traps and faults. That's how we get > the PERF_BR_IRQ on x86. > > Anyway, maybe we just use a minimum but the most common set of branch > types now, it could be a good start and acceptable on all architectures. > > PERF_BR_COND= 1,/* conditional */ > PERF_BR_UNCOND= 2,/* unconditional */ > PERF_BR_IND= 3,/* indirect */ > PERF_BR_CALL= 4,/* call */ > PERF_BR_IND_CALL= 5,/* indirect call */ > PERF_BR_RET= 6,/* return */ That would be fine by me, if you're sick of talking about it and just want to get it merged :) I think you could expand it a bit, this list would cover the vast bulk of branch types for us: PERF_BR_COND /* Conditional */ PERF_BR_UNCOND/* Unconditional */ PERF_BR_IND /* Indirect */ PERF_BR_CALL /* Function call */ PERF_BR_IND_CALL /* Indirect function call */ PERF_BR_RET /* Function return */ PERF_BR_SYSCALL /* Syscall */ PERF_BR_SYSRET/* Syscall return */ PERF_BR_COND_CALL /* Conditional function call */ PERF_BR_COND_RET /* Conditional function return */ cheers
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
Segher Boessenkoolwrites: > Hi! > > On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote: >> 1. We all agree these definitions: >> >> +PERF_BR_COND= 1,/* conditional */ >> +PERF_BR_UNCOND = 2,/* unconditional */ >> +PERF_BR_IND = 3,/* indirect */ >> +PERF_BR_CALL= 4,/* call */ >> +PERF_BR_IND_CALL= 5,/* indirect call */ >> +PERF_BR_RET = 6,/* return */ >> +PERF_BR_SYSCALL = 7,/* syscall */ >> +PERF_BR_SYSRET = 8,/* syscall return */ >> +PERF_BR_IRET= 11, /* return from interrupt */ > > Do we? It does not map very well to PowerPC branch types. I think they map well enough to the types of branches that are actually used in practice. To represent the full range of possibilities we'd need to switch to a bitmap of flags, ie. COND, IND, CALL, RET, SYSCALL, INT, etc. But it would need more than 4 bits and I don't think there's that much added value in being able to represent all the bizarre combinations. But maybe that is the best option as it makes the API more flexible and means we don't have to get the list of branches correct up front? I ran some quick numbers on a kernel I had here (powernv w/gcc 7): Type Percent - cond 40.92% beq (79166) bne (57379) ble (10411) bgt (9587) blt (6248) bge (3704) bdnz (1251) bdz (353) bns (30) bdnzf (2) bdnzt (1) uncond14.89% b (61182) indirect 0.10% bctr (418) call 33.33% bl (136926) ind call 1.44% bctrl (5912) return9.23% blr (37943) = 99.91% If we add cond call/return that covers another 0.08% taking us to 99.99% of branches. I know future compilers and or different code might use a different distribution, but I doubt it will change all that much. Maybe cond could be broken down further, but the only really meaningful sub category I can think of is the decrementing type, and those are quite rare. cheers
Re: [PATCH] tty: Fix TIOCGPTPEER ioctl definition
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag because it doesn't copy anything from/to userspace to access the argument. Acked-by: Aleksa SaraiOops, I misunderstood what _IOR means semantically. TIL -- thanks! -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/
[PATCH] tty: Fix TIOCGPTPEER ioctl definition
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag because it doesn't copy anything from/to userspace to access the argument. Fixes: 54ebbfb1 ("tty: add TIOCGPTPEER ioctl") Signed-off-by: Gleb Fotengauer-Malinovskiy--- arch/alpha/include/uapi/asm/ioctls.h | 2 +- arch/mips/include/uapi/asm/ioctls.h| 2 +- arch/parisc/include/uapi/asm/ioctls.h | 2 +- arch/powerpc/include/uapi/asm/ioctls.h | 2 +- arch/sh/include/uapi/asm/ioctls.h | 2 +- arch/sparc/include/uapi/asm/ioctls.h | 2 +- arch/xtensa/include/uapi/asm/ioctls.h | 2 +- include/uapi/asm-generic/ioctls.h | 2 +- 8 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/alpha/include/uapi/asm/ioctls.h b/arch/alpha/include/uapi/asm/ioctls.h index ff67b837..1cd7dc7 100644 --- a/arch/alpha/include/uapi/asm/ioctls.h +++ b/arch/alpha/include/uapi/asm/ioctls.h @@ -100,7 +100,7 @@ #define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */ #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */ #define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */ -#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */ +#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */ #define TIOCSERCONFIG 0x5453 #define TIOCSERGWILD 0x5454 diff --git a/arch/mips/include/uapi/asm/ioctls.h b/arch/mips/include/uapi/asm/ioctls.h index 68e19b6..1609cb0 100644 --- a/arch/mips/include/uapi/asm/ioctls.h +++ b/arch/mips/include/uapi/asm/ioctls.h @@ -91,7 +91,7 @@ #define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */ #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */ #define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */ -#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */ +#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */ /* I hope the range from 0x5480 on is free ... */ #define TIOCSCTTY 0x5480 /* become controlling tty */ diff --git a/arch/parisc/include/uapi/asm/ioctls.h b/arch/parisc/include/uapi/asm/ioctls.h index 674c68a..d0e3321 100644 --- a/arch/parisc/include/uapi/asm/ioctls.h +++ b/arch/parisc/include/uapi/asm/ioctls.h @@ -60,7 +60,7 @@ #define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */ #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */ #define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */ -#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */ +#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */ #define FIONCLEX 0x5450 /* these numbers need to be adjusted. */ #define FIOCLEX0x5451 diff --git a/arch/powerpc/include/uapi/asm/ioctls.h b/arch/powerpc/include/uapi/asm/ioctls.h index bfd609a..e3b1046 100644 --- a/arch/powerpc/include/uapi/asm/ioctls.h +++ b/arch/powerpc/include/uapi/asm/ioctls.h @@ -100,7 +100,7 @@ #define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */ #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */ #define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */ -#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */ +#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */ #define TIOCSERCONFIG 0x5453 #define TIOCSERGWILD 0x5454 diff --git a/arch/sh/include/uapi/asm/ioctls.h b/arch/sh/include/uapi/asm/ioctls.h index eec7901..787bac9 100644 --- a/arch/sh/include/uapi/asm/ioctls.h +++ b/arch/sh/include/uapi/asm/ioctls.h @@ -93,7 +93,7 @@ #define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */ #define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */ #define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */ -#define TIOCGPTPEER_IOR('T', 0x41, int) /* Safely open the slave */ +#define TIOCGPTPEER_IO('T', 0x41) /* Safely open the slave */ #define TIOCSERCONFIG _IO('T', 83) /* 0x5453 */ #define TIOCSERGWILD _IOR('T', 84, int) /* 0x5454 */ diff --git a/arch/sparc/include/uapi/asm/ioctls.h b/arch/sparc/include/uapi/asm/ioctls.h index 6d27398..f5df72b 100644 --- a/arch/sparc/include/uapi/asm/ioctls.h +++ b/arch/sparc/include/uapi/asm/ioctls.h @@ -88,7 +88,7 @@ #define TIOCGPTN _IOR('t', 134, unsigned int) /* Get Pty Number */ #define TIOCSPTLCK _IOW('t', 135, int) /* Lock/unlock PTY */ #define TIOCSIG_IOW('t', 136, int) /* Generate signal on Pty slave */ -#define TIOCGPTPEER_IOR('t', 137, int) /* Safely open the slave */ +#define TIOCGPTPEER_IO('t', 137) /* Safely open the slave */ /* Little f */ #define FIOCLEX_IO('f', 1) diff --git a/arch/xtensa/include/uapi/asm/ioctls.h b/arch/xtensa/include/uapi/asm/ioctls.h index 98b004e..47d82c0 100644 --- a/arch/xtensa/include/uapi/asm/ioctls.h +++ b/arch/xtensa/include/uapi/asm/ioctls.h @@ -105,7 +105,7 @@ #define TIOCGPKT _IOR('T', 0x38,
Re: [PATCH v2 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async
On Mon, 2017-07-10 at 14:05 +, David Laight wrote: > From: Cyril Bur > > Sent: 10 July 2017 02:31 > > This patch adds an _interruptible version of opal_async_wait_response(). > > This is useful when a long running OPAL call is performed on behalf of a > > userspace thread, for example, the opal_flash_{read,write,erase} > > functions performed by the powernv-flash MTD driver. > > > > It is foreseeable that these functions would take upwards of two minutes > > causing the wait_event() to block long enough to cause hung task > > warnings. Furthermore, wait_event_interruptible() is preferable as > > otherwise there is no way for signals to stop the process which is going > > to be confusing in userspace. > > ISTM that if you are doing (something like) a flash full device erase > (that really can take minutes) it isn't actually an interruptible > operation - the flash chip will still be busy. > So allowing the user process be interrupted just leaves a big mess. > Agreed. > OTOH the 'hung task' warning isn't the only problem with uninterruptible > sleeps - the processes also count towards the 'load average'. > Some software believes the 'load average' is a meaningful value. > Yes, and because the read write and erase ops are actually calls into firmware which in some cases completely emulates flash we can mitigate the mess of allowing the process to be interrupted that you mention above. > It would be more generally useful for tasks to be able to sleep > uninterruptibly without counting towards the 'load average' or triggering > the 'task stuck' warning. > (I've code somewhere that sleeps interruptibly unless there is a signal > pending when it sleeps uninterruptibly.) > I'm not sure what you mean here, if I understand correctly, this is what I'm doing. In the patch to the powernv_flash driver which uses opal_async_wait_response_interruptible() I essentially do the interruptible and if it breaks early the driver determines if it is safe to return to userspace otherwise it sleeps uninterruptibly (hopefully not for long). I was hesitant to put that logic here and prefered to leave it up to the caller to make the decision as to what to do. > WRT flash erases, 'whole device' erases aren't significantly quicker > than sector by sector erases. Yes, I'm rewriting some of the userspace tools we have to do everything sector by sector. MTD interfaces allow big reads/writes and erases so we should still handle it as best we can. > The latter can be interrupted between sectors. > I'm not sure you'd want to do writes than lock down enough kernel > memory to take even a second to complete. > The MTD core actually chunks them up before passing them to the to backing driver so I don't think holding too much memory is a problem. Because the time spent in OPAL servicing flash calls is hard to determine and might not even be that related to the size of the operation, even smaller ops might still spend 'time' in OPAL. Cyril > David >
Re: [RFC 3/4] powerpc: Add pmem API support
On Tue, Jun 27, 2017 at 3:28 AM, Oliver O'Halloranwrote: > Adds powerpc64 implementations of: > > memcpy_flushcache() > arch_wb_cache_pmem() > arch_invalidate_pmem() > > Which form the architecture-specific portition of the persistent memory > API. These functions provide cache-management primitives for the DAX > drivers and libNVDIMM. > Ok, now that we have commit 0aed55af8834 "x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations" upstream that changes the model for how an architecture advertises pmem-specific cache management routines. CONFIG_ARCH_HAS_PMEM_API causes the pmem driver to try to link to these helpers rather than dummy fallbacks: arch_wb_cache_pmem arch_invalidate_pmem and CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE indicates that the arch has all the needed "flushcache" apis: __copy_from_user_flushcache memcpy_page_flushcache memcpy_flushcache
Re: [RFC 1/4] libnvdimm: add to_{nvdimm,nd_region}_dev()
On Tue, Jun 27, 2017 at 3:28 AM, Oliver O'Halloranwrote: > struct device contains the ->of_node pointer so that devices can be > assoicated with the device-tree node that created them on DT platforms. > libnvdimm hides the struct device for regions and nvdimm devices inside > of an opaque structure so this patch adds accessors for each to allow > the of_nvdimm driver to set the of_node pointer. I'd rather go the other way and pass in the of_node to the bus and dimm registration routines. It's a generic property of the device so we should handle it like other generic device properties that get set at initialization time like 'attr_groups' in nvdimm_bus_descriptor, or a new parameter to nvdimm_create().
Re: Today's linux-next build fail on powerpc
On Mon, 2017-07-10 at 18:17 +0300, Andy Shevchenko wrote: > On Mon, Jul 10, 2017 at 5:37 PM, Abdul Haleem >wrote: > > On Fri, 2017-07-07 at 19:36 +0300, Andy Shevchenko wrote: > >> On Thu, Jul 6, 2017 at 9:00 AM, Abdul Haleem > >> wrote: > >> > next-20170705 fails to build on powerpc with below errors. > >> I had sent a fix yesterday. Had you chance to test it? > > > drivers/i2c/busses/i2c-designware-platdrv.c:331:6: error: implicit > > declaration of function > > ‘i2c_detect_slave_mode’ [-Werror=implicit-function-declaration] > > if (i2c_detect_slave_mode(>dev)) > > ^ > > CC [M] fs/jffs2/scan.o > > > > Please point me to the correct patch you are referring to ? > > commit 8f1a357d41a22009150cf404b5aa5876efdb59b1 > >i2c: Provide a stub for i2c_detect_slave_mode() > Thanks Andy for the fix, the patch fixes the build errors. Machine booted with no issues. Reported-and-tested-by: Abdul Haleem -- Regard's Abdul Haleem IBM Linux Technology Centre
Re: [RFC v5 00/38] powerpc: Memory Protection Keys
On Sun, Jul 09, 2017 at 11:05:44PM -0700, Ram Pai wrote: > On Mon, Jul 10, 2017 at 11:13:23AM +0530, Anshuman Khandual wrote: > > On 07/06/2017 02:51 AM, Ram Pai wrote: . > > > do you have data points to show the difference in > > performance between this version and the last one where > > we skipped the bits from PTE and directly programmed the > > HPTE entries looking into VMA bits. > > No. I dont. I am hoping you can help me out with this. Anshuman, The last version where we skipped the PTE bits is guaranteed to be bad/horrible. For one it has a bug, since it accesses the vma without a lock. And even if we did take a lock, it will slow down the page-hash path un-acceptably. So there is no point measuring the performance of that design. I think the number we want to measure is -- the performance with the current design and comparing that to the performance without memkey feature. We want to find if there is any degradation by adding this feature. RP
Re: [PATCH] ptrace: Add compat PTRACE_{G,S}ETSIGMASK handlers
On 06/29, James Morse wrote: > > compat_ptrace_request() lacks handlers for PTRACE_{G,S}ETSIGMASK, > instead using those in ptrace_request(). The compat variant should > read a compat_sigset_t from userspace instead of ptrace_request()s > sigset_t. Acked-by: Oleg Nesterov
Re: Today's linux-next build fail on powerpc
On Mon, Jul 10, 2017 at 5:37 PM, Abdul Haleemwrote: > On Fri, 2017-07-07 at 19:36 +0300, Andy Shevchenko wrote: >> On Thu, Jul 6, 2017 at 9:00 AM, Abdul Haleem >> wrote: >> > next-20170705 fails to build on powerpc with below errors. >> I had sent a fix yesterday. Had you chance to test it? > drivers/i2c/busses/i2c-designware-platdrv.c:331:6: error: implicit > declaration of function > ‘i2c_detect_slave_mode’ [-Werror=implicit-function-declaration] > if (i2c_detect_slave_mode(>dev)) > ^ > CC [M] fs/jffs2/scan.o > > Please point me to the correct patch you are referring to ? commit 8f1a357d41a22009150cf404b5aa5876efdb59b1 i2c: Provide a stub for i2c_detect_slave_mode() -- With Best Regards, Andy Shevchenko
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
Hi Peter, On Mon, Jul 10, 2017 at 03:46:58PM +0200, Peter Zijlstra wrote: > On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote: > > > > PERF_BR_INT is triggered by instruction "int" . > > > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 > > > transition). > > > > So your "PERF_BR_INT" is a system call? > > The "INT" thing has indeed been used as system call mechanism (typically > INT 80). But these days we have special purpose syscall instructions. > > It could maybe be compared to the PPC "Unconditional TRAP with > immediate" where you use the immediate value as an index into a handler > vector. If we would do that, yes :-) (We just generate a SIGTRAP instead). > > And PERF_BR_IRQ is not an interrupt request (as its name suggests), > > not what we call an "external interrupt" either; instead it is every > > interrupt that is not a system call? > > It is actual interrupts, but also faults, traps and all the other > exceptions not caused by "INT" I think. Ah, right, exceptions == interrupts for PowerPC, more terminological confusion :-) Segher
Re: Today's linux-next build fail on powerpc
On Fri, 2017-07-07 at 19:36 +0300, Andy Shevchenko wrote: > On Thu, Jul 6, 2017 at 9:00 AM, Abdul Haleem >wrote: > > Hi Luis, > > > > next-20170705 fails to build on powerpc with below errors. > > Hi, > > I had sent a fix yesterday. Had you chance to test it? > > Are these the one you are referring to ? [v1,2/2] ACPI / boot: Don't define unused variables [v1,1/2] ACPI / boot: Correct address space of __acpi_map_table() Applying above patches did not fix the build failure. drivers/i2c/busses/i2c-designware-platdrv.c:331:6: error: implicit declaration of function ‘i2c_detect_slave_mode’ [-Werror=implicit-function-declaration] if (i2c_detect_slave_mode(>dev)) ^ CC [M] fs/jffs2/scan.o Please point me to the correct patch you are referring to ? -- Regard's Abdul Haleem IBM Linux Technology Centre
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
On 7/10/2017 9:46 PM, Peter Zijlstra wrote: On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote: PERF_BR_INT is triggered by instruction "int" . PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 transition). So your "PERF_BR_INT" is a system call? The "INT" thing has indeed been used as system call mechanism (typically INT 80). But these days we have special purpose syscall instructions. It could maybe be compared to the PPC "Unconditional TRAP with immediate" where you use the immediate value as an index into a handler vector. And PERF_BR_IRQ is not an interrupt request (as its name suggests), not what we call an "external interrupt" either; instead it is every interrupt that is not a system call? It is actual interrupts, but also faults, traps and all the other exceptions not caused by "INT" I think. Yes. It's interrupt, traps, faults. If from is in the user space and to is in the kernel, it indicates the ring3 -> ring0 transition. If the from instruction is not syscall or other ring transition instruction, it should be interrupt, traps and faults. That's how we get the PERF_BR_IRQ on x86. Anyway, maybe we just use a minimum but the most common set of branch types now, it could be a good start and acceptable on all architectures. PERF_BR_COND= 1,/* conditional */ PERF_BR_UNCOND= 2,/* unconditional */ PERF_BR_IND= 3,/* indirect */ PERF_BR_CALL= 4,/* call */ PERF_BR_IND_CALL= 5,/* indirect call */ PERF_BR_RET= 6,/* return */ Thanks Jin Yao
RE: [PATCH v2 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async
From: Cyril Bur > Sent: 10 July 2017 02:31 > This patch adds an _interruptible version of opal_async_wait_response(). > This is useful when a long running OPAL call is performed on behalf of a > userspace thread, for example, the opal_flash_{read,write,erase} > functions performed by the powernv-flash MTD driver. > > It is foreseeable that these functions would take upwards of two minutes > causing the wait_event() to block long enough to cause hung task > warnings. Furthermore, wait_event_interruptible() is preferable as > otherwise there is no way for signals to stop the process which is going > to be confusing in userspace. ISTM that if you are doing (something like) a flash full device erase (that really can take minutes) it isn't actually an interruptible operation - the flash chip will still be busy. So allowing the user process be interrupted just leaves a big mess. OTOH the 'hung task' warning isn't the only problem with uninterruptible sleeps - the processes also count towards the 'load average'. Some software believes the 'load average' is a meaningful value. It would be more generally useful for tasks to be able to sleep uninterruptibly without counting towards the 'load average' or triggering the 'task stuck' warning. (I've code somewhere that sleeps interruptibly unless there is a signal pending when it sleeps uninterruptibly.) WRT flash erases, 'whole device' erases aren't significantly quicker than sector by sector erases. The latter can be interrupted between sectors. I'm not sure you'd want to do writes than lock down enough kernel memory to take even a second to complete. David
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote: > > PERF_BR_INT is triggered by instruction "int" . > > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 > > transition). > > So your "PERF_BR_INT" is a system call? The "INT" thing has indeed been used as system call mechanism (typically INT 80). But these days we have special purpose syscall instructions. It could maybe be compared to the PPC "Unconditional TRAP with immediate" where you use the immediate value as an index into a handler vector. > And PERF_BR_IRQ is not an interrupt request (as its name suggests), > not what we call an "external interrupt" either; instead it is every > interrupt that is not a system call? It is actual interrupts, but also faults, traps and all the other exceptions not caused by "INT" I think.
Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()
Hi! On Mon, Jul 10, 2017 at 09:50:06PM +1000, Michael Ellerman wrote: > Oliver O'Halloranwrites: > > > The workaround for the CELL timebase bug does not correctly mark cr0 as > > being clobbered. This can result in GCC making some poor^W completely > > broken optimisations. > > Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle > breakage that no one could pin down :E GCC does not use cr0 before it has used cr7, cr5, cr6, cr1 (unless some instruction _requires_ cr0, like record form ("dot") instructions, which until recently were disabled when targetting Cell), so it isn't too easy to hit the problem here. Maybe Oliver used a very new GCC? Or he was unlucky. > I'll tag it for stable. > > Your change log is not entirely fair, it's not GCC's fault, we changed > register state without telling it, so it's on us :) I'll reword it a > bit here. Yep, GCC works fine here, GIGO etc. It isn't feasible to make GCC warn for most inline asm problems, either (we do not parse the mnemonics in the asm; this is a fundamental part of the design; you must express all constraints as, well, constraints). Segher
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
Hi, Following branch types should be common enough, right? + PERF_BR_COND= 1,/* conditional */ + PERF_BR_UNCOND = 2,/* unconditional */ + PERF_BR_IND = 3,/* indirect */ + PERF_BR_CALL= 4,/* call */ + PERF_BR_IND_CALL= 5,/* indirect call */ + PERF_BR_RET = 6,/* return */ I decide to only define these types in this patch set. For other more arch-related branch type, we can add it in future. Is this OK? Thanks Jin Yao On 7/10/2017 9:10 PM, Segher Boessenkool wrote: Hi! On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote: 1. We all agree these definitions: + PERF_BR_COND= 1,/* conditional */ + PERF_BR_UNCOND = 2,/* unconditional */ + PERF_BR_IND = 3,/* indirect */ + PERF_BR_CALL= 4,/* call */ + PERF_BR_IND_CALL= 5,/* indirect call */ + PERF_BR_RET = 6,/* return */ + PERF_BR_SYSCALL = 7,/* syscall */ + PERF_BR_SYSRET = 8,/* syscall return */ + PERF_BR_IRET= 11, /* return from interrupt */ Do we? It does not map very well to PowerPC branch types. 2. I wish to keep following definitions for x86. + PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */ + PERF_BR_INT = 10, /* sw interrupt */ PERF_BR_INT is triggered by instruction "int" . PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 transition). So your "PERF_BR_INT" is a system call? And PERF_BR_IRQ is not an interrupt request (as its name suggests), not what we call an "external interrupt" either; instead it is every interrupt that is not a system call? It also does not follow the lines of "software caused interrupt" vs. the rest. 4. I'd like to add following types for powerpc. PERF_BR_COND_CALL /* Conditional call */ PERF_BR_COND_RET/* Condition return */ Almost all PowerPC branches have a "conditional" version (only "syscall" and "sysret/iret" do not -- and those last two are the same, just like PERF_BR_INT seems to be the same as PERF_BR_SYSCALL). So how should those PERF_BR_* be used? It cannot be used in an architecture-neutral interface the way you define it now. Segher
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
Hi! On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote: > 1. We all agree these definitions: > > + PERF_BR_COND= 1,/* conditional */ > + PERF_BR_UNCOND = 2,/* unconditional */ > + PERF_BR_IND = 3,/* indirect */ > + PERF_BR_CALL= 4,/* call */ > + PERF_BR_IND_CALL= 5,/* indirect call */ > + PERF_BR_RET = 6,/* return */ > + PERF_BR_SYSCALL = 7,/* syscall */ > + PERF_BR_SYSRET = 8,/* syscall return */ > + PERF_BR_IRET= 11, /* return from interrupt */ Do we? It does not map very well to PowerPC branch types. > 2. I wish to keep following definitions for x86. > > + PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */ > + PERF_BR_INT = 10, /* sw interrupt */ > > PERF_BR_INT is triggered by instruction "int" . > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 > transition). So your "PERF_BR_INT" is a system call? And PERF_BR_IRQ is not an interrupt request (as its name suggests), not what we call an "external interrupt" either; instead it is every interrupt that is not a system call? It also does not follow the lines of "software caused interrupt" vs. the rest. > 4. I'd like to add following types for powerpc. > > PERF_BR_COND_CALL /* Conditional call */ > PERF_BR_COND_RET/* Condition return */ Almost all PowerPC branches have a "conditional" version (only "syscall" and "sysret/iret" do not -- and those last two are the same, just like PERF_BR_INT seems to be the same as PERF_BR_SYSCALL). So how should those PERF_BR_* be used? It cannot be used in an architecture-neutral interface the way you define it now. Segher
Re: [PATCH] POWER9 PMU interrupt after idle workaround
On Monday 10 July 2017 11:49 AM, Nicholas Piggin wrote: POWER9 DD2 can see spurious PMU interrupts after state-loss idle in some conditions. A solution is to save and reload MMCR0 over state-loss idle. Acked-by: Madhavan SrinivasanSigned-off-by: Nicholas Piggin --- arch/powerpc/kernel/idle_book3s.S | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S index 5adb390e773b..516ebef905c0 100644 --- a/arch/powerpc/kernel/idle_book3s.S +++ b/arch/powerpc/kernel/idle_book3s.S @@ -30,6 +30,7 @@ * Use unused space in the interrupt stack to save and restore * registers for winkle support. */ +#define _MMCR0 GPR0 #define _SDR1 GPR3 #define _PTCR GPR3 #define _RPR GPR4 @@ -272,6 +273,14 @@ power_enter_stop: b pnv_wakeup_noloss .Lhandle_esl_ec_set: + /* +* POWER9 DD2 can incorrectly set PMAO when waking up after a +* state-loss idle. Saving and restoring MMCR0 over idle is a +* workaround. +*/ + mfspr r4,SPRN_MMCR0 + std r4,_MMCR0(r1) + /* * Check if the requested state is a deep idle state. */ @@ -450,10 +459,14 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) pnv_restore_hyp_resource_arch300: /* * Workaround for POWER9, if we lost resources, the ERAT -* might have been mixed up and needs flushing. +* might have been mixed up and needs flushing. We also need +* to reload MMCR0 (see comment above). */ blt cr3,1f PPC_INVALIDATE_ERAT + ld r1,PACAR1(r13) + ld r4,_MMCR0(r1) + mtspr SPRN_MMCR0,r4 1: /* * POWER ISA 3. Use PSSCR to determine if we
Re: [PATCH] powerpc/asm: Mark cr0 as clobbered in mftb()
Oliver O'Halloranwrites: > The workaround for the CELL timebase bug does not correctly mark cr0 as > being clobbered. This can result in GCC making some poor^W completely > broken optimisations. Fruit 'n oats, how did we never notice that? Luck I guess. Or subtle breakage that no one could pin down :E I'll tag it for stable. Your change log is not entirely fair, it's not GCC's fault, we changed register state without telling it, so it's on us :) I'll reword it a bit here. cheers > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h > index 7e50e47375d6..a3b6575c7842 100644 > --- a/arch/powerpc/include/asm/reg.h > +++ b/arch/powerpc/include/asm/reg.h > @@ -1303,7 +1303,7 @@ static inline void msr_check_and_clear(unsigned long > bits) > " .llong 0\n" \ > ".previous" \ > : "=r" (rval) \ > - : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL)); \ > + : "i" (CPU_FTR_CELL_TB_BUG), "i" (SPRN_TBRL) : "cr0"); \ > rval;}) > #else > #define mftb() ({unsigned long rval; \ > -- > 2.9.4
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
Hi Michael, Please let me summarize for the new branch type definitions. 1. We all agree these definitions: + PERF_BR_COND= 1,/* conditional */ + PERF_BR_UNCOND = 2,/* unconditional */ + PERF_BR_IND = 3,/* indirect */ + PERF_BR_CALL= 4,/* call */ + PERF_BR_IND_CALL= 5,/* indirect call */ + PERF_BR_RET = 6,/* return */ + PERF_BR_SYSCALL = 7,/* syscall */ + PERF_BR_SYSRET = 8,/* syscall return */ + PERF_BR_IRET= 11, /* return from interrupt */ 2. I wish to keep following definitions for x86. + PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */ + PERF_BR_INT = 10, /* sw interrupt */ PERF_BR_INT is triggered by instruction "int" . PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 transition). 3. I can drop PERF_BR_FAR_BRANCH 4. I'd like to add following types for powerpc. PERF_BR_COND_CALL /* Conditional call */ PERF_BR_COND_RET/* Condition return */ If you agree these new definitions, I will prepare the new patch. Thanks Jin Yao On 7/10/2017 6:32 PM, Michael Ellerman wrote: "Jin, Yao"writes: On 7/10/2017 2:05 PM, Michael Ellerman wrote: Jin Yao writes: It is often useful to know the branch types while analyzing branch data. For example, a call is very different from a conditional branch. ... To keep consistent on kernel and userspace and make the classification more common, the patch adds the common branch type classification in perf_event.h. Most of the code and doc uses "branch" but then a few these are called "jump". Can we just stick with "branch"? PERF_BR_NONE : unknown PERF_BR_JCC : conditional jump PERF_BR_JMP : jump PERF_BR_IND_JMP : indirect jump eg: PERF_BR_COND: conditional branch PERF_BR_UNCOND : unconditional branch PERF_BR_IND : indirect branch Call and jump are all branches. If we want to figure out which one is jump and which one is call, we need the detail branch type definitions. Yeah I'm not saying we don't need the different types, I'm saying I'd rather we just called them "branch" not "jump". Just because "jump" can mean different things on different arches. For example, if we only say "PERF_BR_IND", we could not know if it's an indirect jump or indirect call. Yes we can, PERF_BR_IND is an indirect branch, which is not a call, because if it was a call then it would be PERF_BR_IND_CALL. PERF_BR_CALL : call PERF_BR_IND_CALL : indirect call PERF_BR_RET : return PERF_BR_SYSCALL : syscall PERF_BR_SYSRET: syscall return PERF_BR_IRQ : hw interrupt/trap/fault PERF_BR_INT : sw interrupt I'm not sure what that means, I'm guessing on x86 it means someone executed "int" ? PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt. OK, but I still don't know what that means :) What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ? PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call (direct call and indirect call) and return. Yep makes sense. PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return. Yep OK. Is that sufficiently useful to use up a bit? I think we only have 3 free? Do you means 3 bits? Each bit stands for one branch type? I guess what you mean is: PERF_BR_COND: conditional branch PERF_BR_UNCOND : unconditional branch PERF_BR_IND : indirect branch But 3 branch types are not enough for us. What I meant was you're using 4 bits for the type, so you have 16 possible values, and you've defined 13 of them. Meaning there are only 3 types free. So we should try to only define branch types that are really useful, and keep some free for future use. Maybe PERF_BR_INT is really common on x86 and so it's important to count it, but like I said above I don't know what it is. PERF_BR_IRET : return from interrupt PERF_BR_FAR_BRANCH: not generic far branch type What is a "not generic far branch" ? I don't know what that would mean on powerpc for example. It's reserved for future using I think. OK so let's not put it in the Linux API until it's defined? I think the only thing we have on powerpc that's commonly used and that isn't covered above is branches that decrement a loop counter and then branch based on the result. ... Sorry, I'm not familiar with powerpc arch. Or could you add the branch type which powerpc needs? These are good: + PERF_BR_COND= 1,/* conditional */ + PERF_BR_UNCOND = 2,/* unconditional */ + PERF_BR_IND = 3,/* indirect */ + PERF_BR_CALL= 4,/* call */ + PERF_BR_IND_CALL= 5,/* indirect call */ + PERF_BR_RET = 6,/* return */ These we wouldn't use currently, but
Re: issue with kernel 4.12.rc6 addnote -kernel dont build
luigi burdowrites: > Hi all, kernel here is not building. Did it just stop working? That code is from 2014? > attached file will explain better than my poor english. > > Host Machine BE Qoriq e5500 16GB ram What userspace are you running? Is it 32-bit ? > HOSTCC arch/powerpc/boot/addnote ^^ What compiler is that? Building with V=1 should show you. > arch/powerpc/boot/addnote.c: In function ‘main’: > arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width of > type [-Wshift-count-overflow] > #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \ These warnings seem like they're correct, but that code should never actually run on 32-bit so it should be harmless. > mv: cannot move 'arch/powerpc/boot/.addnote.tmp' to > 'arch/powerpc/boot/.addnote.cmd': No such file or directory > scripts/Makefile.host:107: recipe for target 'arch/powerpc/boot/addnote' > failed > make[1]: *** [arch/powerpc/boot/addnote] Error 1 But they're only warnings, so they shouldn't be breaking the build AFAICS. cheers
Re: [PATCH] powerpc/powernv/idle: Put pnv_cpu_offline behind HOTPLUG_CPU
Joel Stanleywrites: > In commit 900612315788 ("powerpc/powernv/smp: Add busy-wait loop as fall back > for CPU-Hotplug") the idle code uses generic_check_cpu_restart(), but that > function is not available when CONFIG_HOTPLUG_CPU is disabled. > > arch/powerpc/platforms/powernv/idle.c: In function ‘pnv_cpu_offline’: > arch/powerpc/platforms/powernv/idle.c:286:11: error: implicit declaration of > function ‘generic_check_cpu_restart’ [-Werror=implicit-function-declaration] >while (!generic_check_cpu_restart(cpu)) { >^ > > The callers of pnv_cpu_offline are behind CONFIG_HOTPLUG_CPU, so fix the build > error by putting this code behind the same gard. > > Fixes: 900612315788 ("powerpc/powernv/smp: Add busy-wait loop as fall back > for CPU-Hotplug"). > Cc: # 4.12 > Signed-off-by: Joel Stanley This was fixed upstream in commit 67d204180886, but I forgot to tag it for stable. https://git.kernel.org/torvalds/c/67d204180886 I've just sent mail to stable asking for it to be backported to 4.12. cheers
Re: [PATCH 1/2] powerpc/mm/radix: Properly clear process table entry
Benjamin Herrenschmidtwrites: > On Mon, 2017-07-10 at 13:51 +1000, Michael Ellerman wrote: >> Benjamin Herrenschmidt writes: >> >> > On radix, the process table entry we want to clear when >> > destroying a context is entry 0, not entry 1 >> >> .. but has no real world consequence? Or causes the system to instantly >> catch fire? > > It has no *immediate* consequence on P9 , but it can cause other bugs > to become worse (such as the KVM issue I'm chasing). OK, I'll flag it for stable then to be safe. cheers
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
"Jin, Yao"writes: > On 7/10/2017 2:05 PM, Michael Ellerman wrote: >> Jin Yao writes: >> >>> It is often useful to know the branch types while analyzing branch >>> data. For example, a call is very different from a conditional branch. >>> ... >>> To keep consistent on kernel and userspace and make the classification >>> more common, the patch adds the common branch type classification >>> in perf_event.h. >> >> Most of the code and doc uses "branch" but then a few these are called >> "jump". Can we just stick with "branch"? >> >>> PERF_BR_NONE : unknown >>> PERF_BR_JCC : conditional jump >>> PERF_BR_JMP : jump >>> PERF_BR_IND_JMP : indirect jump >> eg: >> >> PERF_BR_COND: conditional branch >> PERF_BR_UNCOND : unconditional branch >> PERF_BR_IND : indirect branch > > Call and jump are all branches. If we want to figure out which one is > jump and which one is call, we need the detail branch type definitions. Yeah I'm not saying we don't need the different types, I'm saying I'd rather we just called them "branch" not "jump". Just because "jump" can mean different things on different arches. > For example, if we only say "PERF_BR_IND", we could not know if it's an > indirect jump or indirect call. Yes we can, PERF_BR_IND is an indirect branch, which is not a call, because if it was a call then it would be PERF_BR_IND_CALL. >>> PERF_BR_CALL : call >>> PERF_BR_IND_CALL : indirect call >>> PERF_BR_RET : return >>> PERF_BR_SYSCALL : syscall >>> PERF_BR_SYSRET: syscall return >>> PERF_BR_IRQ : hw interrupt/trap/fault >>> PERF_BR_INT : sw interrupt >> I'm not sure what that means, I'm guessing on x86 it means someone >> executed "int" ? > > PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt. OK, but I still don't know what that means :) What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ? > PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call > (direct call and indirect call) and return. Yep makes sense. > PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return. Yep OK. >> Is that sufficiently useful to use up a bit? I think we only have 3 >> free? > > Do you means 3 bits? Each bit stands for one branch type? I guess what > you mean is: > > PERF_BR_COND: conditional branch > PERF_BR_UNCOND : unconditional branch > PERF_BR_IND : indirect branch > > But 3 branch types are not enough for us. What I meant was you're using 4 bits for the type, so you have 16 possible values, and you've defined 13 of them. Meaning there are only 3 types free. So we should try to only define branch types that are really useful, and keep some free for future use. Maybe PERF_BR_INT is really common on x86 and so it's important to count it, but like I said above I don't know what it is. >>> PERF_BR_IRET : return from interrupt >>> PERF_BR_FAR_BRANCH: not generic far branch type >> What is a "not generic far branch" ? >> >> I don't know what that would mean on powerpc for example. > > It's reserved for future using I think. OK so let's not put it in the Linux API until it's defined? >> I think the only thing we have on powerpc that's commonly used and that >> isn't covered above is branches that decrement a loop counter and then >> branch based on the result. ... > > Sorry, I'm not familiar with powerpc arch. Or could you add the branch > type which powerpc needs? These are good: + PERF_BR_COND= 1,/* conditional */ + PERF_BR_UNCOND = 2,/* unconditional */ + PERF_BR_IND = 3,/* indirect */ + PERF_BR_CALL= 4,/* call */ + PERF_BR_IND_CALL= 5,/* indirect call */ + PERF_BR_RET = 6,/* return */ These we wouldn't use currently, but make sense: + PERF_BR_SYSCALL = 7,/* syscall */ + PERF_BR_SYSRET = 8,/* syscall return */ + PERF_BR_IRET= 11, /* return from interrupt */ These I'm not so sure about, I don't really know what they would map to for us: + PERF_BR_IRQ = 9,/* hw interrupt/trap/fault */ + PERF_BR_INT = 10, /* sw interrupt */ And sounds like this should be dropped for now: + PERF_BR_FAR_BRANCH = 12, /* not generic far branch type */ The branch types you haven't covered which might be useful for us are: PERF_BR_COND_CALL /* Conditional call */ PERF_BR_COND_RET/* Condition return */ cheers
Re: [PATCH] ptrace: Add compat PTRACE_{G,S}ETSIGMASK handlers
On Thu, Jun 29, 2017 at 05:26:37PM +0100, James Morse wrote: > compat_ptrace_request() lacks handlers for PTRACE_{G,S}ETSIGMASK, > instead using those in ptrace_request(). The compat variant should > read a compat_sigset_t from userspace instead of ptrace_request()s > sigset_t. > > While compat_sigset_t is the same size as sigset_t, it is defined as > 2xu32, instead of a single u64. On a big-endian CPU this means that > compat_sigset_t is passed to user-space using middle-endianness, > where the least-significant u32 is written most significant byte > first. > > If ptrace_request()s code is used userspace will read the most > significant u32 where it expected the least significant. > > Instead of duplicating ptrace_request()s code as a special case in > the arch code, handle it here. Hi James, I tested arm64/ilp32 on top of, and everything is fine. Yury Acked-by: Yury Norov> CC: Yury Norov > CC: Andrey Vagin > Reported-by: Zhou Chengming > Signed-off-by: James Morse > Fixes: 29000caecbe87 ("ptrace: add ability to get/set signal-blocked mask") > --- > LTP test case here: > https://lists.linux.it/pipermail/ltp/2017-June/004932.html > > kernel/ptrace.c | 52 > 1 file changed, 40 insertions(+), 12 deletions(-) > > diff --git a/kernel/ptrace.c b/kernel/ptrace.c > index 8d2c10714530..a5bebb6713e8 100644 > --- a/kernel/ptrace.c > +++ b/kernel/ptrace.c > @@ -843,6 +843,22 @@ static int ptrace_regset(struct task_struct *task, int > req, unsigned int type, > EXPORT_SYMBOL_GPL(task_user_regset_view); > #endif > > +static int ptrace_setsigmask(struct task_struct *child, sigset_t *new_set) > +{ > + sigdelsetmask(new_set, sigmask(SIGKILL)|sigmask(SIGSTOP)); > + > + /* > + * Every thread does recalc_sigpending() after resume, so > + * retarget_shared_pending() and recalc_sigpending() are not > + * called here. > + */ > + spin_lock_irq(>sighand->siglock); > + child->blocked = *new_set; > + spin_unlock_irq(>sighand->siglock); > + > + return 0; > +} > + > int ptrace_request(struct task_struct *child, long request, > unsigned long addr, unsigned long data) > { > @@ -914,18 +930,7 @@ int ptrace_request(struct task_struct *child, long > request, > break; > } > > - sigdelsetmask(_set, sigmask(SIGKILL)|sigmask(SIGSTOP)); > - > - /* > - * Every thread does recalc_sigpending() after resume, so > - * retarget_shared_pending() and recalc_sigpending() are not > - * called here. > - */ > - spin_lock_irq(>sighand->siglock); > - child->blocked = new_set; > - spin_unlock_irq(>sighand->siglock); > - > - ret = 0; > + ret = ptrace_setsigmask(child, _set); > break; > } > > @@ -1149,7 +1154,9 @@ int compat_ptrace_request(struct task_struct *child, > compat_long_t request, > compat_ulong_t addr, compat_ulong_t data) > { > compat_ulong_t __user *datap = compat_ptr(data); > + compat_sigset_t set32; > compat_ulong_t word; > + sigset_t new_set; > siginfo_t siginfo; > int ret; > > @@ -1189,6 +1196,27 @@ int compat_ptrace_request(struct task_struct *child, > compat_long_t request, > else > ret = ptrace_setsiginfo(child, ); > break; > + case PTRACE_GETSIGMASK: > + if (addr != sizeof(compat_sigset_t)) > + return -EINVAL; > + > + sigset_to_compat(, >blocked); > + > + if (copy_to_user(datap, , sizeof(set32))) > + return -EFAULT; > + > + ret = 0; > + break; > + case PTRACE_SETSIGMASK: > + if (addr != sizeof(compat_sigset_t)) > + return -EINVAL; > + > + if (copy_from_user(, datap, sizeof(compat_sigset_t))) > + return -EFAULT; > + > + sigset_from_compat(_set, ); > + ret = ptrace_setsigmask(child, _set); > + break; > #ifdef CONFIG_HAVE_ARCH_TRACEHOOK > case PTRACE_GETREGSET: > case PTRACE_SETREGSET: > -- > 2.11.0
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
On 7/10/2017 2:05 PM, Michael Ellerman wrote: Hi Jin Yao, Sorry I haven't commented until now, but it got lost in the flood of patches. Never mind, it's no problem. :) Just a few nit-picks below ... Jin Yaowrites: It is often useful to know the branch types while analyzing branch data. For example, a call is very different from a conditional branch. Currently we have to look it up in binary while the binary may later not be available and even the binary is available but user has to take some time. It is very useful for user to check it directly in perf report. Perf already has support for disassembling the branch instruction to get the x86 branch type. To keep consistent on kernel and userspace and make the classification more common, the patch adds the common branch type classification in perf_event.h. Most of the code and doc uses "branch" but then a few these are called "jump". Can we just stick with "branch"? PERF_BR_NONE : unknown PERF_BR_JCC : conditional jump PERF_BR_JMP : jump PERF_BR_IND_JMP : indirect jump eg: PERF_BR_COND: conditional branch PERF_BR_UNCOND : unconditional branch PERF_BR_IND : indirect branch Call and jump are all branches. If we want to figure out which one is jump and which one is call, we need the detail branch type definitions. For example, if we only say "PERF_BR_IND", we could not know if it's an indirect jump or indirect call. PERF_BR_CALL : call PERF_BR_IND_CALL : indirect call PERF_BR_RET : return PERF_BR_SYSCALL : syscall PERF_BR_SYSRET: syscall return PERF_BR_IRQ : hw interrupt/trap/fault PERF_BR_INT : sw interrupt I'm not sure what that means, I'm guessing on x86 it means someone executed "int" ? PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt. PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call (direct call and indirect call) and return. PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return. Is that sufficiently useful to use up a bit? I think we only have 3 free? Do you means 3 bits? Each bit stands for one branch type? I guess what you mean is: PERF_BR_COND: conditional branch PERF_BR_UNCOND : unconditional branch PERF_BR_IND : indirect branch But 3 branch types are not enough for us. PERF_BR_IRET : return from interrupt PERF_BR_FAR_BRANCH: not generic far branch type What is a "not generic far branch" ? I don't know what that would mean on powerpc for example. It's reserved for future using I think. I think the only thing we have on powerpc that's commonly used and that isn't covered above is branches that decrement a loop counter and then branch based on the result. It might be nice if we could separate those out from other conditional branches. Whether it's worth using a bit for I'm not sure. Do other arches have something similar? Those branches do tend to be "backward conditional", so that may be sufficient. But backward conditional also includes if bodies that have been moved out of line and then branch back to the main body of the function. cheers Sorry, I'm not familiar with powerpc arch. Or could you add the branch type which powerpc needs? For backward conditional and forward conditional, we compute them in userspace according to the from/to addresses. Thanks Jin Yao
Re: [RESEND PATCH v5 00/16] eeprom: at24: Add OF device ID table
Hello Wolfram, On Thu, Jun 15, 2017 at 8:54 PM, Javier Martinez Canillaswrote: > > This series is a follow-up to patch [0] that added an OF device ID table > to the at24 EEPROM driver. As you suggested [1], this version instead of > adding entries for every used tuple, only adds a single > entry for each chip type using the "atmel" vendor as a generic fallback. > > The first patch documents in the DT binding what's the correct vendor to > use and what are the ones that are being deprecated. The second one adds > the OF device ID table for the at24 driver and the next patches use this > vendor in the compatible string to each DTS that defines a compatible I2C > EEPROM device node. > > Patches can be applied independently since the DTS changes without driver > changes are no-op and the OF table won't be used without the DTS changes. > > [0]: https://lkml.org/lkml/2017/3/14/589 > [1]: https://lkml.org/lkml/2017/3/15/99 > Are you planning to pick this series? It has been in the list for months and were resent many times... Best regards, Javier
[PATCH] soc/qbman: Simplify bman_release()
Get the affine portal only once for the complete transaction. Signed-off-by: Sebastian Huber--- drivers/soc/fsl/qbman/bman.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c index a3d6d7cfa929..0af2069a6b7e 100644 --- a/drivers/soc/fsl/qbman/bman.c +++ b/drivers/soc/fsl/qbman/bman.c @@ -723,26 +723,23 @@ int bman_release(struct bman_pool *pool, const struct bm_buffer *bufs, u8 num) DPAA_ASSERT(num > 0 && num <= 8); - do { + while (1) { p = get_affine_portal(); local_irq_save(irqflags); avail = bm_rcr_get_avail(>p); if (avail < 2) update_rcr_ci(p, avail); r = bm_rcr_start(>p); - local_irq_restore(irqflags); - put_affine_portal(); if (likely(r)) break; + local_irq_restore(irqflags); + put_affine_portal(); + if (unlikely(--timeout == 0)) + return -ETIMEDOUT; udelay(1); - } while (--timeout); - - if (unlikely(!timeout)) - return -ETIMEDOUT; + } - p = get_affine_portal(); - local_irq_save(irqflags); /* * we can copy all but the first entry, as this can trigger badness * with the valid-bit -- 2.12.3
Re: [PATCH 2/2] powerpc/mm/radix: Synchronize updates to the process table
On Mon, 2017-07-10 at 14:40 +1000, Nicholas Piggin wrote: > On Fri, 07 Jul 2017 16:12:16 -0500 > Benjamin Herrenschmidtwrote: > > > When writing to the process table, we need to ensure the store is > > visible to a subsequent access by the MMU. We assume we never have > > the PID active while doing the update, so a ptesync/isync pair > > should hopefully be a big enough hammer for our purpose. > > > > Do we need this if it's going from invalid->valid? No. While there is no valid bit in radix, I checked with HW and they will not cache an entry that has an invalid RTS field. We should ensure this gets architected for future impl. though. > > > Signed-off-by: Benjamin Herrenschmidt > > --- > > > > Note: Architecturally, we also need to use a tlbie(l) with RIC=2 > > to flush the process table cache. However this is (very) expensive > > and we know that POWER9 will invalidate its cache when hitting the > > mtpid instruction. > > > > To be safe, we should add the tlbie for any ARCH300 processor we > > don't know about though. (Aneesh, Nick do we need a ftr bit ?) > > Good question, I'm not sure. Aside from this particular thing, it > seems like a good idea in general to add implementation specific > tests into the ftr framework. > > We could add the PVR into it so we don't have to pollute FTR bits. > The POWER9_DD1 bit for example could just be a PVR mask and cmp. Reading the PVR isn't necessarily cheap though, we may want to cache it. > > > > > arch/powerpc/mm/mmu_context_book3s64.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/arch/powerpc/mm/mmu_context_book3s64.c > > b/arch/powerpc/mm/mmu_context_book3s64.c > > index 9404b5e..e3e2803 100644 > > --- a/arch/powerpc/mm/mmu_context_book3s64.c > > +++ b/arch/powerpc/mm/mmu_context_book3s64.c > > @@ -138,6 +138,14 @@ static int radix__init_new_context(struct mm_struct > > *mm) > > rts_field = radix__get_tree_size(); > > process_tb[index].prtb0 = cpu_to_be64(rts_field | __pa(mm->pgd) | > > RADIX_PGD_INDEX_SIZE); > > > > + /* > > +* Order the above store with subsequent update of the PID > > +* register (at which point HW can start loading/caching > > +* the entry) and the corresponding load by the MMU from > > +* the L2 cache. > > +*/ > > + asm volatile("ptesync;isync" : : : "memory"); > > + > > mm->context.npu_context = NULL; > > > > return index; > >
[PATCH] POWER9 PMU interrupt after idle workaround
POWER9 DD2 can see spurious PMU interrupts after state-loss idle in some conditions. A solution is to save and reload MMCR0 over state-loss idle. Signed-off-by: Nicholas Piggin--- arch/powerpc/kernel/idle_book3s.S | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S index 5adb390e773b..516ebef905c0 100644 --- a/arch/powerpc/kernel/idle_book3s.S +++ b/arch/powerpc/kernel/idle_book3s.S @@ -30,6 +30,7 @@ * Use unused space in the interrupt stack to save and restore * registers for winkle support. */ +#define _MMCR0 GPR0 #define _SDR1 GPR3 #define _PTCR GPR3 #define _RPR GPR4 @@ -272,6 +273,14 @@ power_enter_stop: b pnv_wakeup_noloss .Lhandle_esl_ec_set: + /* +* POWER9 DD2 can incorrectly set PMAO when waking up after a +* state-loss idle. Saving and restoring MMCR0 over idle is a +* workaround. +*/ + mfspr r4,SPRN_MMCR0 + std r4,_MMCR0(r1) + /* * Check if the requested state is a deep idle state. */ @@ -450,10 +459,14 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300) pnv_restore_hyp_resource_arch300: /* * Workaround for POWER9, if we lost resources, the ERAT -* might have been mixed up and needs flushing. +* might have been mixed up and needs flushing. We also need +* to reload MMCR0 (see comment above). */ blt cr3,1f PPC_INVALIDATE_ERAT + ld r1,PACAR1(r13) + ld r4,_MMCR0(r1) + mtspr SPRN_MMCR0,r4 1: /* * POWER ISA 3. Use PSSCR to determine if we -- 2.11.0
Re: [RFC v5 00/38] powerpc: Memory Protection Keys
On Mon, Jul 10, 2017 at 11:13:23AM +0530, Anshuman Khandual wrote: > On 07/06/2017 02:51 AM, Ram Pai wrote: > > Memory protection keys enable applications to protect its > > address space from inadvertent access or corruption from > > itself. > > > > The overall idea: > > > > A process allocates a key and associates it with > > an address range withinits address space. > > The process then can dynamically set read/write > > permissions on the key without involving the > > kernel. Any code that violates the permissions > > of the address space; as defined by its associated > > key, will receive a segmentation fault. > > > > This patch series enables the feature on PPC64 HPTE > > platform. > > > > ISA3.0 section 5.7.13 describes the detailed specifications. > > > > > > Testing: > > This patch series has passed all the protection key > > tests available in the selftests directory. > > The tests are updated to work on both x86 and powerpc. > > > > version v5: > > (1) reverted back to the old design -- store the > > key in the pte, instead of bypassing it. > > The v4 design slowed down the hash page path. > > (2) detects key violation when kernel is told to > > access user pages. > > (3) further refined the patches into smaller consumable > > units > > (4) page faults handlers captures the faulting key > > from the pte instead of the vma. This closes a > > race between where the key update in the vma and > > a key fault caused cause by the key programmed > > in the pte. > > (5) a key created with access-denied should > > also set it up to deny write. Fixed it. > > (6) protection-key number is displayed in smaps > > the x86 way. > > Hello Ram, > > This patch series has now grown a lot. Do you have this > hosted some where for us to pull and test it out ? BTW https://github.com/rampai/memorykeys.git branch memkey.v5.3 > do you have data points to show the difference in > performance between this version and the last one where > we skipped the bits from PTE and directly programmed the > HPTE entries looking into VMA bits. No. I dont. I am hoping you can help me out with this. RP
Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
Hi Jin Yao, Sorry I haven't commented until now, but it got lost in the flood of patches. Just a few nit-picks below ... Jin Yaowrites: > It is often useful to know the branch types while analyzing branch > data. For example, a call is very different from a conditional branch. > > Currently we have to look it up in binary while the binary may later > not be available and even the binary is available but user has to take > some time. It is very useful for user to check it directly in perf > report. > > Perf already has support for disassembling the branch instruction > to get the x86 branch type. > > To keep consistent on kernel and userspace and make the classification > more common, the patch adds the common branch type classification > in perf_event.h. Most of the code and doc uses "branch" but then a few these are called "jump". Can we just stick with "branch"? > PERF_BR_NONE : unknown > PERF_BR_JCC : conditional jump > PERF_BR_JMP : jump > PERF_BR_IND_JMP : indirect jump eg: PERF_BR_COND: conditional branch PERF_BR_UNCOND : unconditional branch PERF_BR_IND : indirect branch > PERF_BR_CALL : call > PERF_BR_IND_CALL : indirect call > PERF_BR_RET : return > PERF_BR_SYSCALL : syscall > PERF_BR_SYSRET: syscall return > PERF_BR_IRQ : hw interrupt/trap/fault > PERF_BR_INT : sw interrupt I'm not sure what that means, I'm guessing on x86 it means someone executed "int" ? Is that sufficiently useful to use up a bit? I think we only have 3 free? > PERF_BR_IRET : return from interrupt > PERF_BR_FAR_BRANCH: not generic far branch type What is a "not generic far branch" ? I don't know what that would mean on powerpc for example. I think the only thing we have on powerpc that's commonly used and that isn't covered above is branches that decrement a loop counter and then branch based on the result. It might be nice if we could separate those out from other conditional branches. Whether it's worth using a bit for I'm not sure. Do other arches have something similar? Those branches do tend to be "backward conditional", so that may be sufficient. But backward conditional also includes if bodies that have been moved out of line and then branch back to the main body of the function. cheers
Re: [RFC v5 34/38] procfs: display the protection-key number associated with a vma
On Mon, Jul 10, 2017 at 08:37:28AM +0530, Anshuman Khandual wrote: > On 07/06/2017 02:52 AM, Ram Pai wrote: > > Display the pkey number associated with the vma in smaps of a task. > > The key will be seen as below: > > > > ProtectionKey: 0 > > > > Signed-off-by: Ram Pai> > --- > > arch/powerpc/kernel/setup_64.c |8 > > 1 files changed, 8 insertions(+), 0 deletions(-) > > > > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > > index f35ff9d..ebc82b3 100644 > > --- a/arch/powerpc/kernel/setup_64.c > > +++ b/arch/powerpc/kernel/setup_64.c > > @@ -37,6 +37,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -745,3 +746,10 @@ static int __init disable_hardlockup_detector(void) > > } > > early_initcall(disable_hardlockup_detector); > > #endif > > + > > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS > > Why not for X86 protection keys ? hmm.. I dont understand the comment. -- Ram Pai
Re: [RFC v5 38/38] Documentation: PowerPC specific updates to memory protection keys
On Mon, Jul 10, 2017 at 08:37:04AM +0530, Anshuman Khandual wrote: > On 07/06/2017 02:52 AM, Ram Pai wrote: > > Add documentation updates that capture PowerPC specific changes. > > > > Signed-off-by: Ram Pai> > --- > > Documentation/vm/protection-keys.txt | 85 > > ++ > > 1 files changed, 65 insertions(+), 20 deletions(-) > > > > diff --git a/Documentation/vm/protection-keys.txt > > b/Documentation/vm/protection-keys.txt > > index b643045..d50b6ab 100644 > > --- a/Documentation/vm/protection-keys.txt > > +++ b/Documentation/vm/protection-keys.txt > > @@ -1,21 +1,46 @@ > > -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature > > -which will be found on future Intel CPUs. > > +Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature > > found in > > +new generation of intel CPUs and on PowerPC 7 and higher CPUs. > > > > Memory Protection Keys provides a mechanism for enforcing page-based > > -protections, but without requiring modification of the page tables > > -when an application changes protection domains. It works by > > -dedicating 4 previously ignored bits in each page table entry to a > > -"protection key", giving 16 possible keys. > > - > > -There is also a new user-accessible register (PKRU) with two separate > > -bits (Access Disable and Write Disable) for each key. Being a CPU > > -register, PKRU is inherently thread-local, potentially giving each > > -thread a different set of protections from every other thread. > > - > > -There are two new instructions (RDPKRU/WRPKRU) for reading and writing > > -to the new register. The feature is only available in 64-bit mode, > > -even though there is theoretically space in the PAE PTEs. These > > -permissions are enforced on data access only and have no effect on > > +protections, but without requiring modification of the page tables when an > > +application changes protection domains. > > + > > + > > +On Intel: > > + > > + It works by dedicating 4 previously ignored bits in each page table > > + entry to a "protection key", giving 16 possible keys. > > + > > + There is also a new user-accessible register (PKRU) with two separate > > + bits (Access Disable and Write Disable) for each key. Being a CPU > > + register, PKRU is inherently thread-local, potentially giving each > > + thread a different set of protections from every other thread. > > + > > + There are two new instructions (RDPKRU/WRPKRU) for reading and writing > > + to the new register. The feature is only available in 64-bit mode, > > + even though there is theoretically space in the PAE PTEs. These > > + permissions are enforced on data access only and have no effect on > > + instruction fetches. > > + > > + > > +On PowerPC: > > + > > + It works by dedicating 5 page table entry bits to a "protection key", > > + giving 32 possible keys. > > + > > + There is a user-accessible register (AMR) with two separate bits; > > + Access Disable and Write Disable, for each key. Being a CPU > > + register, AMR is inherently thread-local, potentially giving each > > + thread a different set of protections from every other thread. NOTE: > > + Disabling read permission does not disable write and vice-versa. > > We can only enable/disable entire access or write. Then how > read permission can be changed with protection keys directly ? Good catch. On powerpc there is a disable read and disable write. They both can be combined to disable access. Will fix the error. Read it as 'Access Read' . thanks. RP