Re: [PATCH] powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe
On Mon, 29 Mar 2021 17:01:03 +0530, Vaibhav Jain wrote: > In case an nvdimm is found to be unarmed during probe then set its > NDD_UNARMED flag before nvdimm_create(). This would enforce a > read-only access to the ndimm region. Presently even if an nvdimm is > unarmed its not marked as read-only on ppc64 guests. > > The patch updates papr_scm_nvdimm_init() to force query of nvdimm > health via __drc_pmem_query_health() and if nvdimm is found to be > unarmed then set the nvdimm flag ND_UNARMED for nvdimm_create(). Applied to powerpc/next. [1/1] powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe https://git.kernel.org/powerpc/c/adb68c38d8d49a3d60805479c558649bb2182473 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v2] powerpc/mm: Add cond_resched() while removing hpte mappings
On Sun, 4 Apr 2021 22:01:48 +0530, Vaibhav Jain wrote: > While removing large number of mappings from hash page tables for > large memory systems as soft-lockup is reported because of the time > spent inside htap_remove_mapping() like one below: > > watchdog: BUG: soft lockup - CPU#8 stuck for 23s! > > NIP plpar_hcall+0x38/0x58 > LR pSeries_lpar_hpte_invalidate+0x68/0xb0 > Call Trace: > 0x1fff000 (unreliable) > pSeries_lpar_hpte_removebolted+0x9c/0x230 > hash__remove_section_mapping+0xec/0x1c0 > remove_section_mapping+0x28/0x3c > arch_remove_memory+0xfc/0x150 > devm_memremap_pages_release+0x180/0x2f0 > devm_action_release+0x30/0x50 > release_nodes+0x28c/0x300 > device_release_driver_internal+0x16c/0x280 > unbind_store+0x124/0x170 > drv_attr_store+0x44/0x60 > sysfs_kf_write+0x64/0x90 > kernfs_fop_write+0x1b0/0x290 > __vfs_write+0x3c/0x70 > vfs_write+0xd4/0x270 > ksys_write+0xdc/0x130 > system_call+0x5c/0x70 > > [...] Applied to powerpc/next. [1/1] powerpc/mm: Add cond_resched() while removing hpte mappings https://git.kernel.org/powerpc/c/a5d6a3e73acbd619dd5b7b831762b755f9e2db80 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v3] powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall
On Mon, 29 Mar 2021 13:36:43 -0400, Shivaprasad G Bhat wrote: > Add support for ND_REGION_ASYNC capability if the device tree > indicates 'ibm,hcall-flush-required' property in the NVDIMM node. > Flush is done by issuing H_SCM_FLUSH hcall to the hypervisor. > > If the flush request failed, the hypervisor is expected to > to reflect the problem in the subsequent nvdimm H_SCM_HEALTH call. > > [...] Applied to powerpc/next. [1/1] powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall https://git.kernel.org/powerpc/c/75b7c05ebf902632f7f540c3eb0a8945c2d74aab cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v3] powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall
"Aneesh Kumar K.V" writes: > Shivaprasad G Bhat writes: > >> Add support for ND_REGION_ASYNC capability if the device tree >> indicates 'ibm,hcall-flush-required' property in the NVDIMM node. >> Flush is done by issuing H_SCM_FLUSH hcall to the hypervisor. >> >> If the flush request failed, the hypervisor is expected to >> to reflect the problem in the subsequent nvdimm H_SCM_HEALTH call. >> >> This patch prevents mmap of namespaces with MAP_SYNC flag if the >> nvdimm requires an explicit flush[1]. >> >> References: >> [1] >> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c > > > Reviewed-by: Aneesh Kumar K.V Do we need an ack from nvdimm folks on this? Or is it entirely powerpc internal (seems like it from the diffstat)? cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH] powerpc/papr_scm: Add PAPR command family to pass-through command-set
On Mon, 14 Sep 2020 02:49:04 +0530, Vaibhav Jain wrote: > Add NVDIMM_FAMILY_PAPR to the list of valid 'dimm_family_mask' > acceptable by papr_scm. This is needed as since commit > 92fe2aa859f5 ("libnvdimm: Validate command family indices") libnvdimm > performs a validation of 'nd_cmd_pkg.nd_family' received as part of > ND_CMD_CALL processing to ensure only known command families can use > the general ND_CMD_CALL pass-through functionality. > > [...] Applied to powerpc/next. [1/1] powerpc/papr_scm: Add PAPR command family to pass-through command-set https://git.kernel.org/powerpc/c/13135b461cf205941308984bd3271ec7d403dc40 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v2] powerpc/papr_scm: Fix warning triggered by perf_stats_show()
On Sat, 12 Sep 2020 13:44:51 +0530, Vaibhav Jain wrote: > A warning is reported by the kernel in case perf_stats_show() returns > an error code. The warning is of the form below: > > papr_scm ibm,persistent-memory:ibm,pmemory@4411: > Failed to query performance stats, Err:-10 > dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count > fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count > > [...] Applied to powerpc/next. [1/1] powerpc/papr_scm: Fix warning triggered by perf_stats_show() https://git.kernel.org/powerpc/c/ca78ef2f08ccfa29b711d644964cdf9d7ace15e5 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v2] powerpc/papr_scm: Fix warning triggered by perf_stats_show()
Vaibhav Jain writes: > A warning is reported by the kernel in case perf_stats_show() returns > an error code. The warning is of the form below: > > papr_scm ibm,persistent-memory:ibm,pmemory@4411: > Failed to query performance stats, Err:-10 > dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count > fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count > > On investigation it looks like that the compiler is silently truncating the > return value of drc_pmem_query_stats() from 'long' to 'int', since the > variable used to store the return code 'rc' is an 'int'. This > truncated value is then returned back as a 'ssize_t' back from > perf_stats_show() to 'dev_attr_show()' which thinks of it as a large > unsigned number and triggers this warning.. > > To fix this we update the type of variable 'rc' from 'int' to > 'ssize_t' that prevents the compiler from truncating the return value > of drc_pmem_query_stats() and returning correct signed value back from > perf_stats_show(). > > Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance >stats from PHYP') Please don't word wrap the Fixes tag it breaks b4. I've fixed it up this time. cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v2] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
On Mon, 7 Sep 2020 16:35:40 +0530, Vaibhav Jain wrote: > The newly introduced 'perf_stats' attribute uses the default access > mode of 0444 letting non-root users access performance stats of an > nvdimm and potentially force the kernel into issuing large number of > expensive HCALLs. Since the information exposed by this attribute > cannot be cached hence its better to ward of access to this attribute > from users who don't need to access these performance statistics. > > [...] Applied to powerpc/fixes. [1/1] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute https://git.kernel.org/powerpc/c/0460534b532e5518c657c7d6492b9337d975eaa3 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends
David Hildenbrand writes: > On 09.09.20 09:17, Greg Kroah-Hartman wrote: >> On Tue, Sep 08, 2020 at 10:10:08PM +0200, David Hildenbrand wrote: >>> We soon want to pass flags, e.g., to mark added System RAM resources. >>> mergeable. Prepare for that. >> >> What are these random "flags", and how do we know what should be passed >> to them? >> >> Why not make this an enumerated type so that we know it all works >> properly, like the GPF_* flags are? Passing around a random unsigned >> long feels very odd/broken... > > Agreed, an enum (mhp_flags) seems to give a better hint what can > actually be passed. Thanks! You probably know this but ... Just using a C enum doesn't get you any type safety. You can get some checking via sparse by using __bitwise, which is what gfp_t does. You don't actually have to use an enum for that, it works with #defines also. Or you can wrap the flag in a struct, the way atomic_t does, and then the compiler will prevent passing plain integers in place of your custom type. cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
"Aneesh Kumar K.V" writes: > On 8/13/20 10:04 AM, Vaibhav Jain wrote: >> The newly introduced 'perf_stats' attribute uses the default access >> mode of 0444 letting non-root users access performance stats of an >> nvdimm and potentially force the kernel into issuing large number of >> expensive HCALLs. Since the information exposed by this attribute >> cannot be cached hence its better to ward of access to this attribute >> from users who don't need to access these performance statistics. >> >> Hence this patch adds check in perf_stats_show() to only let users >> that are 'perfmon_capable()' to read the nvdimm performance >> statistics. >> >> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance stats from >> PHYP') >> Reported-by: Aneesh Kumar K.V >> Signed-off-by: Vaibhav Jain >> --- >> arch/powerpc/platforms/pseries/papr_scm.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >> b/arch/powerpc/platforms/pseries/papr_scm.c >> index f439f0dfea7d1..36c51bf8af9a8 100644 >> --- a/arch/powerpc/platforms/pseries/papr_scm.c >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >> @@ -792,6 +792,10 @@ static ssize_t perf_stats_show(struct device *dev, >> struct nvdimm *dimm = to_nvdimm(dev); >> struct papr_scm_priv *p = nvdimm_provider_data(dimm); >> >> +/* Allow access only to perfmon capable users */ >> +if (!perfmon_capable()) >> +return -EACCES; >> + > > An access check is usually done in open(). This is the read callback IIUC. Yes. Otherwise an unprivileged user can open the file, and then trick a suid program into reading from it. cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH] powerpc/papr_scm: Make access mode of 'perf_stats' attribute file to '0400'
Vaibhav Jain writes: > The newly introduced 'perf_stats' attribute uses the default access > mode of 0444 letting non-root users access performance stats of an > nvdimm and potentially force the kernel into issuing large number of > expensive HCALLs. Since the information exposed by this attribute > cannot be cached hence its better to ward of access to this attribute > from non-root users. > > Hence this patch updates the access-mode of 'perf_stats' sysfs > attribute file to 0400 to make it only readable to root-users. Or should we ratelimit it? Fixes: ?? > Reported-by: Aneesh Kumar K.V > Signed-off-by: Vaibhav Jain cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v4 0/2] powerpc/papr_scm: add support for reporting NVDIMM 'life_used_percentage' metric
On Fri, 31 Jul 2020 12:11:51 +0530, Vaibhav Jain wrote: > Changes since v3[1]: > > * Fixed a rebase issue pointed out by Aneesh in first patch in the series. > > [1] > https://lore.kernel.org/linux-nvdimm/20200730121303.134230-1-vaib...@linux.ibm.com Applied to powerpc/next. [1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP https://git.kernel.org/powerpc/c/2d02bf835e5731de632c8a13567905fa7c0da01c [2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric https://git.kernel.org/powerpc/c/af0870c4e75655b1931d0a5ffde2f448a2794362 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v7 0/7] Support new pmem flush and sync instructions for POWER
On Wed, 1 Jul 2020 12:52:28 +0530, Aneesh Kumar K.V wrote: > This patch series enables the usage os new pmem flush and sync instructions > on POWER > architecture. POWER10 introduces two new variants of dcbf instructions > (dcbstps and dcbfps) > that can be used to write modified locations back to persistent storage. > Additionally, > POWER10 also introduce phwsync and plwsync which can be used to establish > order of these > writes to persistent storage. > > This series exposes these instructions to the rest of the kernel. The existing > dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate > synchronization with OpenCAPI-hosted persistent storage. Hence the new > instructions > are added as a variant of the old ones that old hardware won't differentiate. > > [...] Applied to powerpc/next. [1/7] powerpc/pmem: Restrict papr_scm to P8 and above. https://git.kernel.org/powerpc/c/c83040192f3763b243ece26073d61a895b4a230f [2/7] powerpc/pmem: Add new instructions for persistent storage and sync https://git.kernel.org/powerpc/c/32db09d992ddc7d145595cff49cccfe14e018266 [3/7] powerpc/pmem: Add flush routines using new pmem store and sync instruction https://git.kernel.org/powerpc/c/d358042793183a57094dac45a44116e1165ac593 [4/7] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier https://git.kernel.org/powerpc/c/3e79f082ebfc130360bcee23e4dd74729dcafdf4 [5/7] powerpc/pmem: Update ppc64 to use the new barrier instruction. https://git.kernel.org/powerpc/c/76e6c73f33d4e1cc4de4f25c0bf66d59e42113c4 [6/7] powerpc/pmem: Avoid the barrier in flush routines https://git.kernel.org/powerpc/c/436499ab868f1a9e497cfdbf641affe8a122c571 [7/7] powerpc/pmem: Initialize pmem device on newer hardware https://git.kernel.org/powerpc/c/8c26ab72663b4affc31e47cdf77d61d0172d1033 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [GIT PULL] libnvdimm for v5.8-rc2
Dan Williams writes: > Hi Linus, please pull from: > > git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm > tags/libnvdimm-for-5.8-rc2 > > ...to receive a feature (papr_scm health retrieval) and a fix (sysfs > attribute visibility) for v5.8. > > Vaibhav explains in the merge commit below why missing v5.8 would be > painful and I agreed to try a -rc2 pull because only cosmetics kept > this out of -rc1 and his initial versions were posted in more than > enough time for v5.8 consideration. > > === > These patches are tied to specific features that were committed to > customers in upcoming distros releases (RHEL and SLES) whose time-lines > are tied to 5.8 kernel release. > > Being able to track the health of an nvdimm is critical for our > customers that are running workloads leveraging papr-scm nvdimms. > Missing the 5.8 kernel would mean missing the distro timelines and > shifting forward the availability of this feature in distro kernels by > at least 6 months. > === > > I notice that these do not have an ack from Michael, but I had been > assuming that he was deferring this to a libnvdimm subsystem decision > ever since v7 back at the end of May where he said "I don't have > strong opinions about the user API, it's really up to the nvdimm > folks." [1] Yeah, sorry for not providing an actual ack, I didn't realise you were planning to send it for 5.8. The arch parts of that series are pretty boring plumbing of hypervisor calls, so the important details were all the libnvdimm related issues IMO. So please consider this a belated ack and thanks for getting it merged. cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Vaibhav Jain writes: > Hi Ira, Mpe and Aneesh, > > Vaibhav Jain writes: > >> Michael Ellerman writes: >> >>> Ira Weiny writes: >>>> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: >>>>> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm >>>>> modules and add the command family to the white list of NVDIMM command >>>>> sets. Also advertise support for ND_CMD_CALL for the dimm >>>>> command mask and implement necessary scaffolding in the module to >>>>> handle ND_CMD_CALL ioctl and PDSM requests that we receive. >>> ... >>>>> + * >>>>> + * Payload Version: >>>>> + * >>>>> + * A 'payload_version' field is present in PDSM header that indicates a >>>>> specific >>>>> + * version of the structure present in PDSM Payload for a given PDSM >>>>> command. >>>>> + * This provides backward compatibility in case the PDSM Payload >>>>> structure >>>>> + * evolves and different structures are supported by 'papr_scm' and >>>>> 'libndctl'. >>>>> + * >>>>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the >>>>> version >>>>> + * of the payload struct it supports via 'payload_version' field. The >>>>> 'papr_scm' >>>>> + * module when servicing the PDSM envelope checks the 'payload_version' >>>>> and then >>>>> + * uses 'payload struct version' == MIN('payload_version field', >>>>> + * 'max payload-struct-version supported by papr_scm') to service the >>>>> PDSM. >>>>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of >>>>> payload >>>>> + * struct in returned 'payload_version' field. >>>> >>>> FWIW many people believe using a size rather than version is more >>>> sustainable. >>>> It is expected that new payload structures are larger (more features) than >>>> the >>>> previous payload structure. >>>> >>>> I can't find references at the moment through. >>> >>> I think clone_args is a good modern example: >>> >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88 >> >> Thank Ira and Mpe for pointing this out. I looked into how clone3 sycall >> handles clone_args and few differences came out: >> >> * Unlike clone_args that are always transferred in one direction from >> user-space to kernel, payload contents of pdsms are transferred in both >> directions. Having a single version number makes it easier for >> user-space and kernel to determine what data will be exchanged. >> >> * For PDSMs, the version number is negotiated between libndctl and >> kernel. For example in case kernel only supports an older version of >> a structure, its free to send a lower version number back to >> libndctl. Such negotiations doesnt happen with clone3 syscall. > > If you are ok with the explaination above please let me know. I will > quickly spin off a v8 addressing your review comments. I don't have strong opinions about the user API, it's really up to the nvdimm folks. cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v4 1/2] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
Hi Dan, Sorry one minor nit below. Dan Williams writes: > diff --git a/tools/testing/selftests/powerpc/copyloops/.gitignore > b/tools/testing/selftests/powerpc/copyloops/.gitignore > index ddaf140b8255..1152bcc819fe 100644 > --- a/tools/testing/selftests/powerpc/copyloops/.gitignore > +++ b/tools/testing/selftests/powerpc/copyloops/.gitignore > @@ -12,4 +12,4 @@ memcpy_p7_t1 > copyuser_64_exc_t0 > copyuser_64_exc_t1 > copyuser_64_exc_t2 > -memcpy_mcsafe_64 > +copy_mc_to_user Should be: +copy_mc_64 Otherwise the powerpc bits look good to me. Acked-by: Michael Ellerman (powerpc) cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Vaibhav Jain writes: > Thanks for reviewing this this patch Ira. My responses below: > Ira Weiny writes: >> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote: >>> Implement support for fetching nvdimm health information via >>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair >>> of 64-bit big-endian integers, bitwise-and of which is then stored in >>> 'struct papr_scm_priv' and subsequently partially exposed to >>> user-space via newly introduced dimm specific attribute >>> 'papr/flags'. Since the hcall is costly, the health information is >>> cached and only re-queried, 60s after the previous successful hcall. ... >>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >>> b/arch/powerpc/platforms/pseries/papr_scm.c >>> index f35592423380..142636e1a59f 100644 >>> --- a/arch/powerpc/platforms/pseries/papr_scm.c >>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >>> @@ -39,6 +78,15 @@ struct papr_scm_priv { >>> struct resource res; >>> struct nd_region *region; >>> struct nd_interleave_set nd_set; >>> + >>> + /* Protect dimm health data from concurrent read/writes */ >>> + struct mutex health_mutex; >>> + >>> + /* Last time the health information of the dimm was updated */ >>> + unsigned long lasthealth_jiffies; >>> + >>> + /* Health information for the dimm */ >>> + u64 health_bitmap; >> >> I wonder if this should be typed big endian as you mention that it is in the >> commit message? > This was discussed in an earlier review of the patch series at > https://lore.kernel.org/linux-nvdimm/878sjetcis@mpe.ellerman.id.au > > Even though health bitmap is returned in big endian format (For ex > value 0xC00 indicates bits 0,1 set), its value is never > used. Instead only test for specific bits being set in the register is > done. This has already caused a lot of confusion, so let me try and clear it up. I will probably fail :) The value is not big endian. It's returned in a GPR (a register), from the hypervisor. The ordering of bytes in a register is not dependent on what endian we're executing in. It's true that the hypervisor will have been running big endian, and when it returns to us we will now be running little endian. But the value is unchanged, it was 0xC00 in the GPR while the HV was running and it's still 0xC00 when we return to Linux. You can see this in mambo, see below for an example. _However_, the specification of the bits in the bitmap value uses MSB 0 ordering, as is traditional for IBM documentation. That means the most significant bit, aka. the left most bit, is called "bit 0". See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering That is the opposite numbering from what most people use, and in particular what most code in Linux uses, which is that bit 0 is the least significant bit. Which is where the confusion comes in. It's not that the bytes are returned in a different order, it's that the bits are numbered differently in the IBM documentation. The way to fix this kind of thing is to read the docs, and convert all the bits into correct numbering (LSB=0), and then throw away the docs ;) cheers In mambo we can set a breakpoint just before the kernel enters skiboot, towards the end of __opal_call. The kernel is running LE and skiboot runs BE. systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc00c1744 breakpoint set at [0:0:0]: 0xc00c1744 (0x000C1744) Enc:0x2402004C : hrfid Then run: systemsim-p9 [~/skiboot/skiboot/external/mambo] c [0:0:0]: 0xC00C1744 (0x000C1744) Enc:0x2402004C : hrfid INFO: 121671618: (121671618): ** Execution stopped: user (tcl), ** 121671618: ** finished running 121671618 instructions ** And we stop there, on an hrfid that we haven't executed yet. We can print r0, to see the OPAL token: systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0 0x0019 ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25). And we can print the MSR: systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr 0x92001033 64-bit mode (SF): 0x1 [64-bit mode] Hypervisor State (HV): 0x1 Vector Available (VEC): 0x1 Machine Check Interrupt Enable (ME): 0x1 Instruction Relocate (IR): 0x1 Data Relocate (DR): 0x1 Recoverable Interrupt (RI): 0x1 Little-Endian Mode (LE): 0x1 [little-endian] ie. we're little endian. We then step one instruction: systemsim-p9 [~/skiboot/skiboot/external/mambo] s [0:0:0]: 0x30002BF0 (0x30002BF0) Enc:0x7D9FFAA6 : mfspr r12,PIR Now we're in skiboot. Print the MSR again: systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr 0x92001002 64-bit mode (SF): 0x1 [64-bit mode] Hypervisor State (HV): 0x1 Vector Available (VEC): 0x1 Machine Check Interrupt Enable (ME):
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
Ira Weiny writes: > On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: >> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm >> modules and add the command family to the white list of NVDIMM command >> sets. Also advertise support for ND_CMD_CALL for the dimm >> command mask and implement necessary scaffolding in the module to >> handle ND_CMD_CALL ioctl and PDSM requests that we receive. ... >> + * >> + * Payload Version: >> + * >> + * A 'payload_version' field is present in PDSM header that indicates a >> specific >> + * version of the structure present in PDSM Payload for a given PDSM >> command. >> + * This provides backward compatibility in case the PDSM Payload structure >> + * evolves and different structures are supported by 'papr_scm' and >> 'libndctl'. >> + * >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the >> version >> + * of the payload struct it supports via 'payload_version' field. The >> 'papr_scm' >> + * module when servicing the PDSM envelope checks the 'payload_version' and >> then >> + * uses 'payload struct version' == MIN('payload_version field', >> + * 'max payload-struct-version supported by papr_scm') to service the PDSM. >> + * After servicing the PDSM, 'papr_scm' put the negotiated version of >> payload >> + * struct in returned 'payload_version' field. > > FWIW many people believe using a size rather than version is more sustainable. > It is expected that new payload structures are larger (more features) than the > previous payload structure. > > I can't find references at the moment through. I think clone_args is a good modern example: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v3 1/2] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
Hi Dan, Just a couple of minor things ... Dan Williams writes: > In reaction to a proposal to introduce a memcpy_mcsafe_fast() > implementation Linus points out that memcpy_mcsafe() is poorly named > relative to communicating the scope of the interface. Specifically what > addresses are valid to pass as source, destination, and what faults / > exceptions are handled. Of particular concern is that even though x86 > might be able to handle the semantics of copy_mc_to_user() with its > common copy_user_generic() implementation other archs likely need / want > an explicit path for this case: ... > diff --git a/arch/powerpc/include/asm/uaccess.h > b/arch/powerpc/include/asm/uaccess.h > index 0969285996cb..dcbbcbf3552c 100644 > --- a/arch/powerpc/include/asm/uaccess.h > +++ b/arch/powerpc/include/asm/uaccess.h > @@ -348,6 +348,32 @@ do { > \ > extern unsigned long __copy_tofrom_user(void __user *to, > const void __user *from, unsigned long size); > > +#ifdef CONFIG_ARCH_HAS_COPY_MC > +extern unsigned long __must_check We try not to add extern in headers anymore. > +copy_mc_generic(void *to, const void *from, unsigned long size); > + > +static inline unsigned long __must_check > +copy_mc_to_kernel(void *to, const void *from, unsigned long size) > +{ > + return copy_mc_generic(to, from, size); > +} > +#define copy_mc_to_kernel copy_mc_to_kernel > + > +static inline unsigned long __must_check > +copy_mc_to_user(void __user *to, const void *from, unsigned long n) > +{ > + if (likely(check_copy_size(from, n, true))) { > + if (access_ok(to, n)) { > + allow_write_to_user(to, n); > + n = copy_mc_generic((void *)to, from, n); > + prevent_write_to_user(to, n); > + } > + } > + > + return n; > +} > +#endif Otherwise that looks fine. ... > diff --git a/tools/testing/selftests/powerpc/copyloops/Makefile > b/tools/testing/selftests/powerpc/copyloops/Makefile > index 0917983a1c78..959817e7567c 100644 > --- a/tools/testing/selftests/powerpc/copyloops/Makefile > +++ b/tools/testing/selftests/powerpc/copyloops/Makefile > @@ -45,9 +45,9 @@ $(OUTPUT)/memcpy_p7_t%: memcpy_power7.S $(EXTRA_SOURCES) > -D SELFTEST_CASE=$(subst memcpy_p7_t,,$(notdir $@)) \ > -o $@ $^ > > -$(OUTPUT)/memcpy_mcsafe_64: memcpy_mcsafe_64.S $(EXTRA_SOURCES) > +$(OUTPUT)/copy_mc: copy_mc.S $(EXTRA_SOURCES) > $(CC) $(CPPFLAGS) $(CFLAGS) \ > - -D COPY_LOOP=test_memcpy_mcsafe \ > + -D COPY_LOOP=test_copy_mc \ This needs a fixup: diff --git a/tools/testing/selftests/powerpc/copyloops/Makefile b/tools/testing/selftests/powerpc/copyloops/Makefile index 959817e7567c..b4eb5c4c6858 100644 --- a/tools/testing/selftests/powerpc/copyloops/Makefile +++ b/tools/testing/selftests/powerpc/copyloops/Makefile @@ -47,7 +47,7 @@ $(OUTPUT)/memcpy_p7_t%: memcpy_power7.S $(EXTRA_SOURCES) $(OUTPUT)/copy_mc: copy_mc.S $(EXTRA_SOURCES) $(CC) $(CPPFLAGS) $(CFLAGS) \ - -D COPY_LOOP=test_copy_mc \ + -D COPY_LOOP=test_copy_mc_generic \ -o $@ $^ $(OUTPUT)/copyuser_64_exc_t%: copyuser_64.S exc_validate.c ../harness.c \ > -o $@ $^ > > $(OUTPUT)/copyuser_64_exc_t%: copyuser_64.S exc_validate.c ../harness.c \ > diff --git a/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S > b/tools/testing/selftests/powerpc/copyloops/copy_mc.S > similarity index 100% > rename from tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S > rename to tools/testing/selftests/powerpc/copyloops/copy_mc.S This file is a symlink to the file in arch/powerpc/lib, so the name of the link needs updating, as well as the target. Also is there a reason you dropped the "_64"? It would make most sense to keep it I think, as then the file in selftests and the file in arch/ have the same name. If you want to keep the copy_mc.S name it needs the diff below. Though as I said I think it would be better to use copy_mc_64.S. cheers diff --git a/tools/testing/selftests/powerpc/copyloops/copy_mc.S b/tools/testing/selftests/powerpc/copyloops/copy_mc.S new file mode 12 index ..dcbe06d500fb --- /dev/null +++ b/tools/testing/selftests/powerpc/copyloops/copy_mc.S @@ -0,0 +1 @@ +../../../../../arch/powerpc/lib/copy_mc_64.S \ No newline at end of file diff --git a/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S b/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S deleted file mode 12 index f0feef3062f6.. --- a/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S +++ /dev/null @@ -1 +0,0 @@ -../../../../../arch/powerpc/lib/memcpy_mcsafe_64.S \ No newline at end of file -- 2.25.1 ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to
Re: [PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
Borislav Petkov writes: > On Fri, May 08, 2020 at 04:19:19PM +0530, Vaibhav Jain wrote: >> 'seq_buf' provides a very useful abstraction for writing to a string >> buffer without needing to worry about it over-flowing. However even >> though the API has been stable for couple of years now its stills not >> exported to external modules limiting its usage. >> >> Hence this patch proposes update to 'seq_buf.c' to mark >> seq_buf_printf() which is part of the seq_buf API to be exported to >> external GPL modules. This symbol will be used in later parts of this > > What is "external GPL modules"? A module that has MODULE_LICENSE("GPL") ? cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v6 3/4] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
Vaibhav Jain writes: > Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm > modules and add the command family to the white list of NVDIMM command > sets. Also advertise support for ND_CMD_CALL for the dimm > command mask and implement necessary scaffolding in the module to > handle ND_CMD_CALL ioctl and PDSM requests that we receive. > > The layout of the PDSM request as we expect from libnvdimm/libndctl is > described in newly introduced uapi header 'papr_scm_pdsm.h' which > defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used > to communicate the PDSM request via member > 'nd_pkg_papr_scm->nd_command' and size of payload that need to be > sent/received for servicing the PDSM. > > A new function is_cmd_valid() is implemented that reads the args to > papr_scm_ndctl() and performs sanity tests on them. A new function > papr_scm_service_pdsm() is introduced and is called from > papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL > command from libnvdimm. > > Cc: Dan Williams > Cc: Michael Ellerman > Cc: "Aneesh Kumar K . V" > Signed-off-by: Vaibhav Jain > --- > Changelog > > v5..v6 : > * Changed the usage of the term DSM to PDSM to distinguish it from the > ACPI term [ Dan Williams ] > * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct > to reflect the new terminology. > * Updated the patch description and title to reflect the new terminology. > * Squashed patch to introduce new command family in 'ndctl.h' with > this patch [ Dan Williams ] > * Updated the papr_scm_pdsm method starting index from 0x1 to 0x0 > [ Dan Williams ] > * Removed redundant license text from the papr_scm_psdm.h file. > [ Dan Williams ] > * s/envelop/envelope/ at various places [ Dan Williams ] > * Added '__packed' attribute to command package header to gaurd > against different compiler adding paddings between the fields. > [ Dan Williams] > * Converted various pr_debug to dev_debug [ Dan Williams ] > > v4..v5 : > * None > > v3..v4 : > * None > > v2..v3 : > * Updated the patch prefix to 'ndctl/uapi' [Aneesh] > > v1..v2 : > * None > --- > arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 153 ++ > arch/powerpc/platforms/pseries/papr_scm.c | 101 +++- > include/uapi/linux/ndctl.h| 1 + > 3 files changed, 249 insertions(+), 6 deletions(-) > create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > > diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > new file mode 100644 > index ..ec48b5c7fc18 > --- /dev/null > +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > @@ -0,0 +1,153 @@ > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ > +/* > + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl > + * > + * (C) Copyright IBM 2020 > + * > + * Author: Vaibhav Jain > + */ > + > +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ > +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ > + > +#include > + > +#ifdef __KERNEL__ > +#include > +#else > +#include > +#endif We shouldn't be checking __KERNEL__ in uapi headers, something has gone wrong if this is necessary. > + > +/* > + * PDSM Envelope: > + * > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via > + * 'envelopes' which consists of a header and user-defined payload sections. > + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a > + * payload following it and offset of which relative to the struct is > provided > + * by 'nd_pdsm_cmd_pkg.payload_offset'. * > + * > + * +-+-+---+ > + * | 64-Bytes | 8-Bytes | Max 184-Bytes | > + * +-+-+---+ > + * | nd_pdsm_cmd_pkg | | > + * |-+ | | > + * | nd_cmd_pkg | | | > + * +-+-+---+ > + * | nd_family | | | > + * | nd_size_out | cmd_status || > + * | nd_size_in | payload_version | PAYLOAD | > + * | nd_command | payload_offset ->| > + * | nd_fw_size | || > + * +-+-+---+ > + * > + * PDSM Header: > + * > + * The header is defined as 's
Re: [PATCH v6 2/4] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Vaibhav Jain writes: > Implement support for fetching nvdimm health information via > H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair > of 64-bit big-endian integers, bitwise-and of which is then stored in > 'struct papr_scm_priv' and subsequently partially exposed to > user-space via newly introduced dimm specific attribute > 'papr/flags'. Since the hcall is costly, the health information is > cached and only re-queried, 60s after the previous successful hcall. > > The patch also adds a new asm header named 'papr_scm.h' describing the > interface between PHYP and guest kernel. A documentation text > describing flags reported by the the new sysfs attribute 'papr/flags' > is also introduced at Documentation/ABI/testing/sysfs-bus-papr-scm. > > [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for > PAPR hcalls") > > Cc: Dan Williams > Cc: Michael Ellerman > Cc: "Aneesh Kumar K . V" > Signed-off-by: Vaibhav Jain > --- > Changelog > > v5..v6 : > * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags' > [Dan Williams] > * Include documentation for 'papr/flags' attr [Dan Williams] > * Change flag 'save_fail' to 'flush_fail' [Dan Williams] > * Caching of health bitmap to reduce expensive hcalls [Dan Williams] > * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe] > * Replaced two __be64 integers from papr_scm_priv to a single u64 > integer [Mpe] > * Updated patch description to reflect the changes made in this > version. > * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from > flags_show() [Dan Williams] > > v4..v5 : > * None > > v3..v4 : > * None > > v2..v3 : > * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for >NVDIMM unarmed [Aneesh] > > v1..v2 : > * New patch in the series. > --- > Documentation/ABI/testing/sysfs-bus-papr-scm | 27 > arch/powerpc/include/asm/papr_scm.h | 49 > arch/powerpc/platforms/pseries/papr_scm.c| 126 ++- > 3 files changed, 200 insertions(+), 2 deletions(-) > create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm > create mode 100644 arch/powerpc/include/asm/papr_scm.h > > diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm > b/Documentation/ABI/testing/sysfs-bus-papr-scm > new file mode 100644 > index ..001e4d34ab5c > --- /dev/null > +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm > @@ -0,0 +1,27 @@ > +What:/sys/bus/nd/devices/nmemX/papr/flags > +Date:Apr, 2020 > +KernelVersion: v5.8 > +Contact: linuxppc-dev , > linux-nvdimm@lists.01.org, > +Description: > + (RO) Report flags indicating various states of a > + papr-scm NVDIMM device. Each flag maps to a one or > + more bits set in the dimm-health-bitmap retrieved in > + response to H_SCM_HEALTH hcall. The details of the bit > + flags returned in response to this hcall is available > + at 'Documentation/powerpc/papr_hcalls.rst' . Below are > + the flags reported in this sysfs file: > + > + * "not_armed" : Indicating that nvdimm contents will not NVDIMM? > + survive a power cycle. > + * "flush_fail" : Indicating that nvdimm contents > + couldn't be flushed during last > + shutdown event. > + * "restore_fail": Indicating that nvdimm contents > + couldn't be restored during dimm DIMM? > + initialization. > + * "encrypted" : Dimm contents are encrypted. > + * "smart_notify": There is health event for the nvdimm. > + * "scrubbed": Indicating that contents of the > + nvdimm have been scrubbed. > + * "locked" : Indicating that nvdimm contents cant > + be modified until next power cycle. Some of these strings are not very meaningful to me, I would choose different values. But I assume you are using these because they are defined somewhere else, where is that? > diff --git a/arch/powerpc/include/asm/papr_scm.h > b/arch/powerpc/include/asm/papr_scm.h > new file mode 100644 > index ..b51c048e906a > --- /dev/null > +++ b/arch/powerpc/include/asm/papr_scm.h > @@ -0,0 +1,49 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * Structures and defines needed to manage nvdimms for spapr guests. > + */ > +#ifndef _AS
Re: [PATCH v6 1/4] powerpc: Document details on H_SCM_HEALTH hcall
Vaibhav Jain writes: > Add documentation to 'papr_hcalls.rst' describing the bitmap flags > that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM > specification. > > Cc: Dan Williams > Cc: Michael Ellerman > Cc: "Aneesh Kumar K . V" > Signed-off-by: Vaibhav Jain > --- > Changelog: > > v5..v6 > * New patch in the series > --- > Documentation/powerpc/papr_hcalls.rst | 43 --- > 1 file changed, 39 insertions(+), 4 deletions(-) > > diff --git a/Documentation/powerpc/papr_hcalls.rst > b/Documentation/powerpc/papr_hcalls.rst > index 3493631a60f8..9a5ba5eaf323 100644 > --- a/Documentation/powerpc/papr_hcalls.rst > +++ b/Documentation/powerpc/papr_hcalls.rst > @@ -220,13 +220,48 @@ from the LPAR memory. > **H_SCM_HEALTH** > > | Input: drcIndex > -| Out: *health-bitmap, health-bit-valid-bitmap* > +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)* > | Return Value: *H_Success, H_Parameter, H_Hardware* > > Given a DRC Index return the info on predictive failure and overall health of > -the NVDIMM. The asserted bits in the health-bitmap indicate a single > predictive > -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are > -valid. > +the NVDIMM. The asserted bits in the health-bitmap indicate one or more > states > +(described in table below) of the NVDIMM and health-bit-valid-bitmap indicate > +which bits in health-bitmap are valid. > + > +Health Bitmap Flags: > + > ++--+---+ > +| Bit | Definition > | > ++==+===+ > +| 00 | SCM device is unable to persist memory contents. > | > +| | If the system is powered down, nothing will be saved. > | > ++--+---+ Are these correct bit numbers or backward IBM big endian bit numbers? ie. which bit is LSB? cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v4 03/25] powerpc/powernv: Map & release OpenCAPI LPC memory
Benjamin Herrenschmidt writes: > On Wed, 2020-04-01 at 01:48 -0700, Dan Williams wrote: >> > >> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size) >> > +{ >> > + struct pci_controller *hose = pci_bus_to_host(pdev->bus); >> > + struct pnv_phb *phb = hose->private_data; >> >> Is calling the local variable 'hose' instead of 'host' on purpose? > > Haha that's funny :-) > > It's an ld usage that comes iirc from sparc ? or maybe alpha ? Yeah it was alpha, I found it in the history tree: https://github.com/mpe/linux-fullhistory/blob/1928de59ba4209dc5e9f2cef63560c09ba0df73b/arch/alpha/kernel/mcpcia.c And airlied found an old manual which confirms it: The TIOP module interfaces the AlphaServer 8000 system bus to four I/O channels, called "hoses." https://www.hpl.hp.com/hpjournal/dtj/vol7num1/vol7num1art4.pdf So at least now we know where it comes from. It's also used widely in mips, microblaze, sh and a little bit in drm. cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v5 1/4] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Vaibhav Jain writes: > Thanks for reviewing this patch Mpe, > Michael Ellerman writes: >> Vaibhav Jain writes: ... >> >>> + /* Check for various masks in bitmap and set the buffer */ >>> + if (health & PAPR_SCM_DIMM_UNARMED_MASK) >>> + rc += sprintf(buf, "not_armed "); >> >> I know buf is "big enough" but using sprintf() in 2020 is a bit ... :) >> >> seq_buf is a pretty thin wrapper over a buffer you can use to make this >> cleaner and also handles overflow for you. >> >> See eg. show_user_instructions() for an example. > > Unfortunatly seq_buf_printf() is still not an exported symbol hence not > usable in external modules. Send a patch? :) cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v5 1/4] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Vaibhav Jain writes: > Implement support for fetching nvdimm health information via > H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair > of 64-bit big-endian integers which are then stored in 'struct > papr_scm_priv' and subsequently partially exposed to user-space via > newly introduced dimm specific attribute 'papr_flags'. Also a new asm > header named 'papr-scm.h' is added that describes the interface > between PHYP and guest kernel. > > Following flags are reported via 'papr_flags' sysfs attribute contents > of which are space separated string flags indicating various nvdimm > states: > > * "not_armed": Indicating that nvdimm contents wont survive a power > cycle. > * "save_fail": Indicating that nvdimm contents couldn't be flushed > during last shutdown event. > * "restore_fail": Indicating that nvdimm contents couldn't be restored > during dimm initialization. > * "encrypted": Dimm contents are encrypted. > * "smart_notify": There is health event for the nvdimm. > * "scrubbed" : Indicating that contents of the nvdimm have been > scrubbed. > * "locked" : Indicating that nvdimm contents cant be modified > until next power cycle. > > [1]: commit 58b278f568f0 ("powerpc: Provide initial documentation for > PAPR hcalls") > > Signed-off-by: Vaibhav Jain > --- > Changelog: > > v4..v5 : None > > v3..v4 : None > > v2..v3 : Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for >NVDIMM unarmed [Aneesh] > > v1..v2 : New patch in the series. > --- > arch/powerpc/include/asm/papr_scm.h | 48 ++ > arch/powerpc/platforms/pseries/papr_scm.c | 105 +- > 2 files changed, 151 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/include/asm/papr_scm.h > > diff --git a/arch/powerpc/include/asm/papr_scm.h > b/arch/powerpc/include/asm/papr_scm.h > new file mode 100644 > index ..868d3360f56a > --- /dev/null > +++ b/arch/powerpc/include/asm/papr_scm.h > @@ -0,0 +1,48 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * Structures and defines needed to manage nvdimms for spapr guests. > + */ > +#ifndef _ASM_POWERPC_PAPR_SCM_H_ > +#define _ASM_POWERPC_PAPR_SCM_H_ > + > +#include > +#include > + > +/* DIMM health bitmap bitmap indicators */ > +/* SCM device is unable to persist memory contents */ > +#define PAPR_SCM_DIMM_UNARMEDPPC_BIT(0) Please don't use PPC_BIT, it's just unncessary obfuscation for folks who are reading the code without access to the docs (ie. more or less everyone other than you :) > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > b/arch/powerpc/platforms/pseries/papr_scm.c > index 0b4467e378e5..aaf2e4ab1f75 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -14,6 +14,7 @@ > #include > > #include > +#include > > #define BIND_ANY_ADDR (~0ul) > > @@ -39,6 +40,13 @@ struct papr_scm_priv { > struct resource res; > struct nd_region *region; > struct nd_interleave_set nd_set; > + > + /* Protect dimm data from concurrent access */ > + struct mutex dimm_mutex; > + > + /* Health information for the dimm */ > + __be64 health_bitmap; > + __be64 health_bitmap_valid; It's much less error prone to store the data in CPU endian and do the endian conversion only at the point where the data either comes from or goes to firmware. That would also mean you can define flags above without needing PPC_BIT because they'll be in CPU endian too. > @@ -144,6 +152,35 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p) > return drc_pmem_bind(p); > } > > +static int drc_pmem_query_health(struct papr_scm_priv *p) > +{ > + unsigned long ret[PLPAR_HCALL_BUFSIZE]; > + int64_t rc; Use kernel types please, ie. s64, or just long. > + rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index); > + if (rc != H_SUCCESS) { > + dev_err(>pdev->dev, > + "Failed to query health information, Err:%lld\n", rc); > + return -ENXIO; > + } > + > + /* Protect modifications to papr_scm_priv with the mutex */ > + rc = mutex_lock_interruptible(>dimm_mutex); > + if (rc) > + return rc; > + > + /* Store the retrieved health information in dimm platform data */ > + p->health_bitmap = ret[0]; > + p->health_bitmap_valid = ret[1]; > + > + dev_dbg(>pdev->dev, > + "Queried dimm health info. Bitmap:0x%016llx Mask:0x%016llx\n", > + be64_to_cpu(p->health_bitmap), > + be64_to_cpu(p->health_bitmap_valid)); > + > + mutex_unlock(>dimm_mutex); > + return 0; > +} > > static int papr_scm_meta_get(struct papr_scm_priv *p, >struct nd_cmd_get_config_data_hdr *hdr) > @@ -304,6 +341,67 @@ static inline int
Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices
"Oliver O'Halloran" writes: > On Thu, Apr 2, 2020 at 2:42 PM Michael Ellerman wrote: >> "Alastair D'Silva" writes: >> >> -Original Message- >> >> From: Dan Williams >> >> >> >> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva >> >> wrote: >> >> > >> >> > *snip* >> >> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke >> >> platform firmware services? What's Skiboot? >> > >> > Yes, OPAL is the interface to firmware for POWER. Skiboot is the >> > open-source (and only) implementation of OPAL. >> >> https://github.com/open-power/skiboot >> >> In particular the tokens for calls are defined here: >> >> https://github.com/open-power/skiboot/blob/master/include/opal-api.h#L220 >> >> And you can grep for the token to find the implementation: >> >> https://github.com/open-power/skiboot/blob/master/hw/npu2-opencapi.c#L2328 > > I'm not sure I'd encourage anyone to read npu2-opencapi.c. I find it > hard enough to follow even with access to the workbooks. Compared to certain firmwares that run on certain other platforms it's actually pretty readable code ;) > There's an OPAL call API reference here: > http://open-power.github.io/skiboot/doc/opal-api/index.html Even better. cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
RE: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices
"Alastair D'Silva" writes: >> -Original Message- >> From: Dan Williams >> >> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva >> wrote: >> > >> > This series adds support for OpenCAPI Persistent Memory devices on >> > bare metal (arch/powernv), exposing them as nvdimms so that we can >> > make use of the existing infrastructure. There already exists a driver >> > for the same devices abstracted through PowerVM (arch/pseries): >> > arch/powerpc/platforms/pseries/papr_scm.c >> > >> > These devices are connected via OpenCAPI, and present as LPC (lowest >> coherence point) memory to the system, practically, that means that >> memory on these cards could be treated as conventional, cache-coherent >> memory. >> > >> > Since the devices are connected via OpenCAPI, they are not enumerated >> via ACPI. Instead, OpenCAPI links present as pseudo-PCI bridges, with >> devices below them. >> > >> > This series introduces a driver that exposes the memory on these cards as >> nvdimms, with each card getting it's own bus. This is somewhat complicated >> by the fact that the cards do not have out of band persistent storage for >> metadata, so 1 SECTION_SIZE's (see SPARSEMEM) worth of storage is carved >> out of the top of the card storage to implement the ndctl_config_* calls. >> >> Is it really tied to section-size? Can't that change based on the configured >> page-size? It's not clear to me why that would be the choice, but I'll dig >> into >> the implementation. >> > > I had tried using PAGE_SIZE, but ran into problems carving off just 1 page > and handing it to the kernel, while leaving the rest as pmem. That was a > while ago though, so maybe I should retry it. > >> > The driver is not responsible for configuring the NPU (NVLink Processing >> Unit) BARs to map the LPC memory from the card into the system's physical >> address space, instead, it requests this to be done via OPAL calls (typically >> implemented by Skiboot). >> >> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke >> platform firmware services? What's Skiboot? >> > > Yes, OPAL is the interface to firmware for POWER. Skiboot is the open-source > (and only) implementation of OPAL. https://github.com/open-power/skiboot In particular the tokens for calls are defined here: https://github.com/open-power/skiboot/blob/master/include/opal-api.h#L220 And you can grep for the token to find the implementation: https://github.com/open-power/skiboot/blob/master/hw/npu2-opencapi.c#L2328 cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v4 1/5] mm/memremap_pages: Introduce memremap_compat_align()
Dan Williams writes: > The "sub-section memory hotplug" facility allows memremap_pages() users > like libnvdimm to compensate for hardware platforms like x86 that have a > section size larger than their hardware memory mapping granularity. The > compensation that sub-section support affords is being tolerant of > physical memory resources shifting by units smaller (64MiB on x86) than > the memory-hotplug section size (128 MiB). Where the platform > physical-memory mapping granularity is limited by the number and > capability of address-decode-registers in the memory controller. > > While the sub-section support allows memremap_pages() to operate on > sub-section (2MiB) granularity, the Power architecture may still > require 16MiB alignment on "!radix_enabled()" platforms. > > In order for libnvdimm to be able to detect and manage this per-arch > limitation, introduce memremap_compat_align() as a common minimum > alignment across all driver-facing memory-mapping interfaces, and let > Power override it to 16MiB in the "!radix_enabled()" case. > > The assumption / requirement for 16MiB to be a viable > memremap_compat_align() value is that Power does not have platforms > where its equivalent of address-decode-registers never hardware remaps a > persistent memory resource on smaller than 16MiB boundaries. Note that I > tried my best to not add a new Kconfig symbol, but header include > entanglements defeated the #ifndef memremap_compat_align design pattern > and the need to export it defeats the __weak design pattern for arch > overrides. > > Based on an initial patch by Aneesh. > > Link: > http://lore.kernel.org/r/capcyv4gbgnp95apyabcsocea50tqj9b5h__83vgngjq3oug...@mail.gmail.com > Reported-by: Aneesh Kumar K.V > Reported-by: Jeff Moyer > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Reviewed-by: Aneesh Kumar K.V > Signed-off-by: Dan Williams > --- > arch/powerpc/Kconfig |1 + > arch/powerpc/mm/ioremap.c | 21 + > drivers/nvdimm/pfn_devs.c |2 +- > include/linux/memremap.h |8 > include/linux/mmzone.h|1 + > lib/Kconfig |3 +++ > mm/memremap.c | 23 +++ > 7 files changed, 58 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 497b7d0b2d7e..e6ffe905e2b9 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -122,6 +122,7 @@ config PPC > select ARCH_HAS_GCOV_PROFILE_ALL > select ARCH_HAS_KCOV > select ARCH_HAS_HUGEPD if HUGETLB_PAGE > + select ARCH_HAS_MEMREMAP_COMPAT_ALIGN > select ARCH_HAS_MMIOWB if PPC64 > select ARCH_HAS_PHYS_TO_DMA > select ARCH_HAS_PMEM_API > diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c > index fc669643ce6a..b1a0aebe8c48 100644 > --- a/arch/powerpc/mm/ioremap.c > +++ b/arch/powerpc/mm/ioremap.c > @@ -2,6 +2,7 @@ > > #include > #include > +#include > #include > #include > > @@ -97,3 +98,23 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t > offset, unsigned long size, > > return NULL; > } > + > +#ifdef CONFIG_ZONE_DEVICE > +/* > + * Override the generic version in mm/memremap.c. > + * > + * With hash translation, the direct-map range is mapped with just one > + * page size selected by htab_init_page_sizes(). Consult > + * mmu_psize_defs[] to determine the minimum page size alignment. > +*/ > +unsigned long memremap_compat_align(void) > +{ > + unsigned int shift = mmu_psize_defs[mmu_linear_psize].shift; > + > + if (radix_enabled()) > + return SUBSECTION_SIZE; > + return max(SUBSECTION_SIZE, 1UL << shift); > + > +} > +EXPORT_SYMBOL_GPL(memremap_compat_align); > +#endif LGTM. Acked-by: Michael Ellerman (powerpc) cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH 2/5] mm/memremap_pages: Introduce memremap_compat_align()
Dan Williams writes: > The "sub-section memory hotplug" facility allows memremap_pages() users > like libnvdimm to compensate for hardware platforms like x86 that have a > section size larger than their hardware memory mapping granularity. The > compensation that sub-section support affords is being tolerant of > physical memory resources shifting by units smaller (64MiB on x86) than > the memory-hotplug section size (128 MiB). Where the platform > physical-memory mapping granularity is limited by the number and > capability of address-decode-registers in the memory controller. > > While the sub-section support allows memremap_pages() to operate on > sub-section (2MiB) granularity, the Power architecture may still > require 16MiB alignment on "!radix_enabled()" platforms. > > In order for libnvdimm to be able to detect and manage this per-arch > limitation, introduce memremap_compat_align() as a common minimum > alignment across all driver-facing memory-mapping interfaces, and let > Power override it to 16MiB in the "!radix_enabled()" case. > > The assumption / requirement for 16MiB to be a viable > memremap_compat_align() value is that Power does not have platforms > where its equivalent of address-decode-registers never hardware remaps a > persistent memory resource on smaller than 16MiB boundaries. > > Based on an initial patch by Aneesh. > > Link: > http://lore.kernel.org/r/capcyv4gbgnp95apyabcsocea50tqj9b5h__83vgngjq3oug...@mail.gmail.com > Reported-by: Aneesh Kumar K.V > Reported-by: Jeff Moyer > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Michael Ellerman > Signed-off-by: Dan Williams > --- > arch/powerpc/include/asm/io.h | 10 ++ > drivers/nvdimm/pfn_devs.c |2 +- > include/linux/io.h| 23 +++ > include/linux/mmzone.h|1 + > 4 files changed, 35 insertions(+), 1 deletion(-) The powerpc change here looks fine to me. Acked-by: Michael Ellerman (powerpc) cheers > diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h > index a63ec938636d..0fa2dc483008 100644 > --- a/arch/powerpc/include/asm/io.h > +++ b/arch/powerpc/include/asm/io.h > @@ -734,6 +734,16 @@ extern void __iomem * __ioremap_at(phys_addr_t pa, void > *ea, > unsigned long size, pgprot_t prot); > extern void __iounmap_at(void *ea, unsigned long size); > > +#ifdef CONFIG_SPARSEMEM > +static inline unsigned long memremap_compat_align(void) > +{ > + if (radix_enabled()) > + return SUBSECTION_SIZE; > + return (1UL << mmu_psize_defs[mmu_linear_psize].shift); > +} > +#define memremap_compat_align memremap_compat_align > +#endif > + > /* > * When CONFIG_PPC_INDIRECT_PIO is set, we use the generic iomap > implementation > * which needs some additional definitions here. They basically allow PIO ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH v2 16/18] powerpc/papr_scm: Switch to numa_map_to_online_node()
Dan Williams writes: > Now that the core exports numa_map_to_online_node() switch to that > instead of the locally coded duplicate. > > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Michael Ellerman Acked-by: Michael Ellerman cheers > Cc: "Oliver O'Halloran" > Reported-by: "Aneesh Kumar K.V" > Signed-off-by: Dan Williams > --- > arch/powerpc/platforms/pseries/papr_scm.c | 21 + > 1 file changed, 1 insertion(+), 20 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > b/arch/powerpc/platforms/pseries/papr_scm.c > index 33aa59e666e5..ef81515f3b6a 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -284,25 +284,6 @@ int papr_scm_ndctl(struct nvdimm_bus_descriptor > *nd_desc, struct nvdimm *nvdimm, > return 0; > } > > -static inline int papr_scm_node(int node) > -{ > - int min_dist = INT_MAX, dist; > - int nid, min_node; > - > - if ((node == NUMA_NO_NODE) || node_online(node)) > - return node; > - > - min_node = first_online_node; > - for_each_online_node(nid) { > - dist = node_distance(node, nid); > - if (dist < min_dist) { > - min_dist = dist; > - min_node = nid; > - } > - } > - return min_node; > -} > - > static int papr_scm_nvdimm_init(struct papr_scm_priv *p) > { > struct device *dev = >pdev->dev; > @@ -347,7 +328,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) > > memset(_desc, 0, sizeof(ndr_desc)); > target_nid = dev_to_node(>pdev->dev); > - online_nid = papr_scm_node(target_nid); > + online_nid = numa_map_to_online_node(target_nid); > ndr_desc.numa_node = online_nid; > ndr_desc.target_node = target_nid; > ndr_desc.res = >res; ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH] mm/nvdimm: Add is_ioremap_addr and use that to check ioremap address
"Aneesh Kumar K.V" writes: > Andrew Morton writes: > >> On Mon, 1 Jul 2019 19:10:38 +0530 "Aneesh Kumar K.V" >> wrote: >> >>> Architectures like powerpc use different address range to map ioremap >>> and vmalloc range. The memunmap() check used by the nvdimm layer was >>> wrongly using is_vmalloc_addr() to check for ioremap range which fails for >>> ppc64. This result in ppc64 not freeing the ioremap mapping. The side effect >>> of this is an unbind failure during module unload with papr_scm nvdimm >>> driver >> >> The patch applies to 5.1. Does it need a Fixes: and a Cc:stable? > > Actually, we want it to be backported to an older kernel possibly one > that added papr-scm driver, b5beae5e224f ("powerpc/pseries: Add driver > for PAPR SCM regions"). But that doesn't apply easily. It does apply > without conflicts to 5.0 Don't worry about where it applies or doesn't, just tag it with the correct Fixes: and stable versions and then if it doesn't backport cleanly then we deal with that later. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 2/5] mm/resource: move HMM pr_debug() deeper into resource code
Dave Hansen writes: > On 1/25/19 1:18 PM, Bjorn Helgaas wrote: >> On Thu, Jan 24, 2019 at 5:21 PM Dave Hansen >> wrote: >>> diff -puN kernel/resource.c~move-request_region-check kernel/resource.c >>> --- a/kernel/resource.c~move-request_region-check 2019-01-24 >>> 15:13:14.453199539 -0800 >>> +++ b/kernel/resource.c 2019-01-24 15:13:14.458199539 -0800 >>> @@ -1123,6 +1123,16 @@ struct resource * __request_region(struc >>> conflict = __request_resource(parent, res); >>> if (!conflict) >>> break; >>> + /* >>> +* mm/hmm.c reserves physical addresses which then >>> +* become unavailable to other users. Conflicts are >>> +* not expected. Be verbose if one is encountered. >>> +*/ >>> + if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) { >>> + pr_debug("Resource conflict with unaddressable " >>> +"device memory at %#010llx !\n", >>> +(unsigned long long)start); >> >> I don't object to the change, but are you really OK with this being a >> pr_debug() message that is only emitted when enabled via either the >> dynamic debug mechanism or DEBUG being defined? From the comments, it >> seems more like a KERN_INFO sort of message. > > I left it consistent with the original message that was in the code. > I'm happy to change it, though, if the consumers of it (Jerome, > basically) want something different. At least using pr_debug() doesn't match the comment, ie. the comment says "Be verbose" but pr_debug() is silent by default. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 1/5] mm/resource: return real error codes from walk failures
Dave Hansen writes: > On 1/25/19 1:02 PM, Bjorn Helgaas wrote: >>> @@ -453,7 +453,7 @@ int walk_system_ram_range(unsigned long >>> unsigned long flags; >>> struct resource res; >>> unsigned long pfn, end_pfn; >>> - int ret = -1; >>> + int ret = -EINVAL; >> Can you either make a similar change to the powerpc version of >> walk_system_ram_range() in arch/powerpc/mm/mem.c or explain why it's >> not needed? It *seems* like we'd want both versions of >> walk_system_ram_range() to behave similarly in this respect. > > Sure. A quick grep shows powerpc being the only other implementation. Ugh gross, why are we reimplementing it? ... Oh right, memblock vs iomem. We should fix that one day :/ > I'll just add this hunk: > >> diff -puN >> arch/powerpc/mm/mem.c~memory-hotplug-walk_system_ram_range-returns-neg-1 >> arch/powerpc/mm/mem.c >> --- >> a/arch/powerpc/mm/mem.c~memory-hotplug-walk_system_ram_range-returns-neg-1 >> 2019-01-25 12:57:00.04446 -0800 >> +++ b/arch/powerpc/mm/mem.c 2019-01-25 12:58:13.215004263 -0800 >> @@ -188,7 +188,7 @@ walk_system_ram_range(unsigned long star >> struct memblock_region *reg; >> unsigned long end_pfn = start_pfn + nr_pages; >> unsigned long tstart, tend; >> - int ret = -1; >> + int ret = -EINVAL; > > I'll also dust off the ol' cross-compiler and make sure I didn't > fat-finger anything. Modern Fedora & Ubuntu have packaged cross toolchains. Otherwise there's the kernel.org ones, or bootlin has versions with libc if you need it. Patch looks fine. That value could only get to userspace if we have no memory, which would be interesting. Acked-by: Michael Ellerman (powerpc) cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [v3,1/2] powerpc/pseries: PAPR persistent memory support
On Sun, 2018-10-14 at 23:18:27 UTC, Oliver O'Halloran wrote: > This patch implements support for discovering storage class memory > devices at boot and for handling hotplug of new regions via RTAS > hotplug events. > > Signed-off-by: Oliver O'Halloran Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/4c5d87db497832c493ed296157bd17 cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 2/2] powerpc/pseries: Add driver for PAPR SCM regions
Dan Williams writes: > On Sat, Oct 13, 2018 at 5:08 AM Michael Ellerman wrote: >> >> Dan Williams writes: >> > On Tue, Oct 9, 2018 at 11:21 PM Oliver O'Halloran wrote: >> >> >> >> Adds a driver that implements support for enabling and accessing PAPR >> >> SCM regions. Unfortunately due to how the PAPR interface works we can't >> >> use the existing of_pmem driver (yet) because: >> >> >> ... >> >> + >> >> +static int papr_scm_nvdimm_init(struct papr_scm_priv *p) >> >> +{ >> >> + struct device *dev = >pdev->dev; >> >> + struct nd_mapping_desc mapping; >> >> + struct nd_region_desc ndr_desc; >> >> + unsigned long dimm_flags; >> >> + >> >> + p->bus_desc.ndctl = papr_scm_ndctl; >> >> + p->bus_desc.module = THIS_MODULE; >> >> + p->bus_desc.of_node = p->pdev->dev.of_node; >> >> + p->bus_desc.attr_groups = bus_attr_groups; >> >> + p->bus_desc.provider_name = kstrdup(p->pdev->name, GFP_KERNEL); >> >> + >> >> + if (!p->bus_desc.provider_name) >> >> + return -ENOMEM; >> >> + >> >> + p->bus = nvdimm_bus_register(NULL, >bus_desc); >> >> + if (!p->bus) { >> >> + dev_err(dev, "Error creating nvdimm bus %pOF\n", p->dn); >> >> + return -ENXIO; >> >> + } >> >> + >> >> + dimm_flags = 0; >> >> + set_bit(NDD_ALIASING, _flags); >> >> + >> >> + p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_groups, >> >> + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, >> >> NULL); >> > >> > Looks good, although I'm just about to push out commits that change >> > this function signature to take a 'security_ops' pointer. If you need >> > a stable branch to base this on, let me know. >> ... >> > >> > Other than that looks ok to me: >> > >> > Acked-by: Dan Williams >> > >> > ...just the matter of what to do about function signature change. >> >> Yeah that's a bit of a bother. >> >> The ideal for me would be that you put the commit that changes the >> signature by itself in a branch based on 4.19-rc3 (or earlier), and then >> we could both just merge that. >> >> But not sure if that will work with whatever else you're trying to sync >> up with. > > How about this, I'll move the new signature to an 'advanced': > > __nvdimm_create() > > ...and make the existing: > >nvdimm_create() > > ...a simple wrapper around the new functionality. That way no matter > what merge order we should be ok. Yep that's much easier, thanks. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 2/2] powerpc/pseries: Add driver for PAPR SCM regions
Dan Williams writes: > On Tue, Oct 9, 2018 at 11:21 PM Oliver O'Halloran wrote: >> >> Adds a driver that implements support for enabling and accessing PAPR >> SCM regions. Unfortunately due to how the PAPR interface works we can't >> use the existing of_pmem driver (yet) because: >> ... >> + >> +static int papr_scm_nvdimm_init(struct papr_scm_priv *p) >> +{ >> + struct device *dev = >pdev->dev; >> + struct nd_mapping_desc mapping; >> + struct nd_region_desc ndr_desc; >> + unsigned long dimm_flags; >> + >> + p->bus_desc.ndctl = papr_scm_ndctl; >> + p->bus_desc.module = THIS_MODULE; >> + p->bus_desc.of_node = p->pdev->dev.of_node; >> + p->bus_desc.attr_groups = bus_attr_groups; >> + p->bus_desc.provider_name = kstrdup(p->pdev->name, GFP_KERNEL); >> + >> + if (!p->bus_desc.provider_name) >> + return -ENOMEM; >> + >> + p->bus = nvdimm_bus_register(NULL, >bus_desc); >> + if (!p->bus) { >> + dev_err(dev, "Error creating nvdimm bus %pOF\n", p->dn); >> + return -ENXIO; >> + } >> + >> + dimm_flags = 0; >> + set_bit(NDD_ALIASING, _flags); >> + >> + p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_groups, >> + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL); > > Looks good, although I'm just about to push out commits that change > this function signature to take a 'security_ops' pointer. If you need > a stable branch to base this on, let me know. ... > > Other than that looks ok to me: > > Acked-by: Dan Williams > > ...just the matter of what to do about function signature change. Yeah that's a bit of a bother. The ideal for me would be that you put the commit that changes the signature by itself in a branch based on 4.19-rc3 (or earlier), and then we could both just merge that. But not sure if that will work with whatever else you're trying to sync up with. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 2/2] powerpc/pseries: Add driver for PAPR SCM regions
Oliver O'Halloran writes: > Adds a driver that implements support for enabling and accessing PAPR > SCM regions. Unfortunately due to how the PAPR interface works we can't > use the existing of_pmem driver (yet) because: > > a) The guest is required to use the H_SCM_BIND_MEM h-call to add > add the SCM region to it's physical address space, and > b) There is currently no mechanism for relating a bare of_pmem region > to the backing DIMM (or not-a-DIMM for our case). > > Both of these are easily handled by rolling the functionality into a > seperate driver so here we are... > > Signed-off-by: Oliver O'Halloran > --- > The alternative implementation here is that we have the pseries code > do the h-calls and craft a pmem-region@ node based on that. > However, that doesn't solve b) and mpe has expressed his dislike of > adding new stuff to the DT at runtime so i'd say that's a non-starter. Hmm, from memory you yelled something at me across the office about that, so my response may not have been entirely well considered. I'm not quite sure what the state of the OF overlays support is, but that would be The Right Way to do that sort of modification to the device tree at runtime. If we merged this and then later got the of_pmem driver to work for us would there be any user-visible change? cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v3 3/4] doc/devicetree: Persistent memory region bindings
Oliver O'Halloran <ooh...@gmail.com> writes: > Add device-tree binding documentation for the nvdimm region driver. > > Cc: devicet...@vger.kernel.org > Signed-off-by: Oliver O'Halloran <ooh...@gmail.com> > --- > v2: Changed name from nvdimm-region to pmem-region. > Cleaned up the example binding and fixed the overlapping regions. > Added support for multiple regions in a single reg. > v3: Removed platform bus boilerplate from the example. > Changed description of the volatile and reg properties > to make them more clear. Thanks. Acked-by: Michael Ellerman <m...@ellerman.id.au> cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v3 1/3] mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize()
Dan Williams <dan.j.willi...@intel.com> writes: > The current powerpc definition of vma_mmu_pagesize() open codes looking > up the page size via hstate. It is identical to the generic > vma_kernel_pagesize() implementation. > > Now, vma_kernel_pagesize() is growing support for determining the > page size of Device-DAX vmas in addition to the existing Hugetlbfs page > size determination. > > Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it > would automatically benefit from any new vma-type support that is added > to vma_kernel_pagesize(). However, the powerpc vma_mmu_pagesize() is > prevented from calling vma_kernel_pagesize() due to a circular header > dependency that requires vma_mmu_pagesize() to be defined before > including . > > Break this circular dependency by defining the default > vma_mmu_pagesize() as a __weak symbol to be overridden by the powerpc > version. > > Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> > Cc: Paul Mackerras <pau...@samba.org> > Cc: Michael Ellerman <m...@ellerman.id.au> > Signed-off-by: Dan Williams <dan.j.willi...@intel.com> > --- > arch/powerpc/include/asm/hugetlb.h |6 -- > arch/powerpc/mm/hugetlbpage.c |5 + > mm/hugetlb.c |8 +++- > 3 files changed, 4 insertions(+), 15 deletions(-) This looks OK to me. I was worried switching to a weak symbol would mean it doesn't get inlined, but it's not inlined today anyway! Acked-by: Michael Ellerman <m...@ellerman.id.au> (powerpc) cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
Matthew Wilcoxwrites: > On Mon, Dec 18, 2017 at 01:35:47PM -0700, Ross Zwisler wrote: >> What I'm hoping to do with this series is to just provide a sysfs >> representation of the HMAT so that applications can know which NUMA nodes to >> select with existing utilities like numactl. This series does not currently >> alter any kernel behavior, it only provides a sysfs interface. >> >> Say for example you had a system with some high bandwidth memory (HBM), and >> you wanted to use it for a specific application. You could use the sysfs >> representation of the HMAT to figure out which memory target held your HBM. >> You could do this by looking at the local bandwidth values for the various >> memory targets, so: >> >> # grep . /sys/devices/system/hmat/mem_tgt*/local_init/write_bw_MBps >> /sys/devices/system/hmat/mem_tgt2/local_init/write_bw_MBps:81920 >> /sys/devices/system/hmat/mem_tgt3/local_init/write_bw_MBps:40960 >> /sys/devices/system/hmat/mem_tgt4/local_init/write_bw_MBps:40960 >> /sys/devices/system/hmat/mem_tgt5/local_init/write_bw_MBps:40960 >> >> and look for the one that corresponds to your HBM speed. (These numbers are >> made up, but you get the idea.) > > Presumably ACPI-based platforms will not be the only ones who have the > ability to expose different bandwidth memories in the future. I think > we need a platform-agnostic way ... right, PowerPC people? Yes! I don't have any detail at hand but will try and rustle something up. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 1/2] mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct
Michal Hocko <mho...@kernel.org> writes: > On Thu 07-12-17 19:30:55, Dan Williams wrote: >> When device-dax is operating in huge-page mode we want it to behave like >> hugetlbfs and report the MMU page mapping size that is being enforced by >> the vma. Similar to commit 31383c6865a5 "mm, hugetlbfs: introduce >> ->split() to vm_operations_struct" it would be messy to teach >> vma_mmu_pagesize() about device-dax page mapping sizes in the same >> (hstate) way that hugetlbfs communicates this attribute. Instead, these >> patches introduce a new ->pagesize() vm operation. >> >> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> >> Cc: Paul Mackerras <pau...@samba.org> >> Cc: Michael Ellerman <m...@ellerman.id.au> >> Reported-by: Jane Chu <jane@oracle.com> >> Signed-off-by: Dan Williams <dan.j.willi...@intel.com> > > My build battery choked on the following > In file included from drivers/infiniband/core/umem_odp.c:41:0: > ./include/linux/hugetlb.h: In function 'vma_kernel_pagesize': > ./include/linux/hugetlb.h:262:32: error: dereferencing pointer to incomplete > type > if (vma->vm_ops && vma->vm_ops->pagesize) > ^ > ./include/linux/hugetlb.h:263:21: error: dereferencing pointer to incomplete > type >return vma->vm_ops->pagesize(vma); > > I thought that adding #include into linux/hugetlb.h would > be sufficient but then it failed for powerpc defconfig which overrides > vma_kernel_pagesize > In file included from ./include/linux/hugetlb.h:452:0, > from arch/powerpc/mm/hugetlbpage.c:14: > ./arch/powerpc/include/asm/hugetlb.h:131:26: error: redefinition of > 'vma_mmu_pagesize' > #define vma_mmu_pagesize vma_mmu_pagesize > ^ > arch/powerpc/mm/hugetlbpage.c:563:15: note: in expansion of macro > 'vma_mmu_pagesize' > unsigned long vma_mmu_pagesize(struct vm_area_struct *vma) >^ > In file included from arch/powerpc/mm/hugetlbpage.c:14:0: > ./include/linux/hugetlb.h:275:29: note: previous definition of > 'vma_mmu_pagesize' was here > static inline unsigned long vma_mmu_pagesize(struct vm_area_struct *vma) > > So it looks this needs something more laborous. This builds for me. cheers diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h index 14c9d44f355b..3cc6ca1bdaf2 100644 --- a/arch/powerpc/include/asm/hugetlb.h +++ b/arch/powerpc/include/asm/hugetlb.h @@ -123,6 +123,7 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr, * to override the version in mm/hugetlb.c */ #define vma_mmu_pagesize vma_mmu_pagesize +unsigned long vma_mmu_pagesize(struct vm_area_struct *vma); /* * If the arch doesn't supply something else, assume that hugepage diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index a9b9083c5e49..c6a2e577e842 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -568,10 +568,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma) if (!radix_enabled()) return 1UL << mmu_psize_to_shift(psize); #endif - if (!is_vm_hugetlb_page(vma)) - return PAGE_SIZE; - - return huge_page_size(hstate_vma(vma)); + return vma_kernel_pagesize(vma); } static inline bool is_power_of_4(unsigned long x) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6e3696c7b35a..fe7b74325856 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -394,10 +394,6 @@ static inline unsigned long huge_page_size(struct hstate *h) return (unsigned long)PAGE_SIZE << h->order; } -extern unsigned long vma_kernel_pagesize(struct vm_area_struct *vma); - -extern unsigned long vma_mmu_pagesize(struct vm_area_struct *vma); - static inline unsigned long huge_page_mask(struct hstate *h) { return h->mask; @@ -430,6 +426,30 @@ static inline unsigned int blocks_per_huge_page(struct hstate *h) #include +/* + * Return the size of the pages allocated when backing a VMA. In the majority + * cases this will be same size as used by the page table entries. + */ +static inline unsigned long vma_kernel_pagesize(struct vm_area_struct *vma) +{ + if (vma->vm_ops && vma->vm_ops->pagesize) + return vma->vm_ops->pagesize(vma); + return PAGE_SIZE; +} + +/* + * Return the page size being used by the MMU to back a VMA. In the majority + * of cases, the page size used by the kernel matches the MMU size. On + * architectures where it differs, an architecture-specific version of this + * function is required. + */ +#ifndef vma_mmu_pagesize +static inline unsigned long vma_mmu_pagesize(struct vm_area_struct
Re: [PATCH 2/3] libnvdimm: Add a device-tree interface
Rob Herringwrites: > On Thu, Nov 16, 2017 at 04:51:30AM +1100, Oliver O'Halloran wrote: ... >> diff --git a/drivers/nvdimm/of_nvdimm.c b/drivers/nvdimm/of_nvdimm.c >> new file mode 100644 >> index ..765cbbae8bcb >> --- /dev/null >> +++ b/drivers/nvdimm/of_nvdimm.c >> @@ -0,0 +1,202 @@ >> +/* >> + * Copyright 2017, IBM Corporation >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License as published by >> + * the Free Software Foundation; either version 2 of the License, or >> + * (at your option) any later version. >> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + * >> + * You should have received a copy of the GNU General Public License >> + * along with this program; if not, you can access it online at >> + * http://www.gnu.org/licenses/gpl-2.0.html. > > This can be SPDX tag. Specifically: // SPDX-License-Identifier: GPL-2.0+ /* * Copyright 2017, IBM Corporation */ ... cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 02/15] mm, dax: introduce pfn_t_special()
Dan Williamswrites: > In support of removing the VM_MIXEDMAP indication from DAX VMAs, > introduce pfn_t_special() for drivers to indicate that _PAGE_SPECIAL > should be used for DAX ptes. This also helps identify drivers like > dccssblk that only want to use DAX in a read-only fashion without > get_user_pages() support. > > Ideally we could delete axonram and dcssblk DAX support, but if we need > to keep it better make it explicit that axonram and dcssblk only support > a sub-set of DAX due to missing _PAGE_DEVMAP support. I sent a patch to remove axonram (sorry meant to Cc you): http://patchwork.ozlabs.org/patch/833588/ Will see if there's any feedback. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
Oliver O'Halloranwrites: > On Wed, Apr 12, 2017 at 11:53 AM, Balbir Singh wrote: >> >> The API also does not clear linear_map_hash_slots[] under DEBUG_PAGEALLOC > > I'm not sure what API you're referring to here. The tracking for > linear_map_hash_slots[] is agnostic of mmu_hash_ops so we shouldn't be > touching it here. It also looks like DEBUG_PAGEALLOC is a bit broken > with hotplugged memory anyway so I think that's a fix for a different > patch. It's old and has probably bit-rotted, so I'm happy for it to be fixed up later if necessary. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH 6/9] powerpc, mm: Enable ZONE_DEVICE on powerpc
Stephen Rothwellwrites: > Hi Oliver, > > On Wed, 12 Apr 2017 03:42:30 +1000 Oliver O'Halloran wrote: >> >> Flip the switch. Running around and screaming "IT'S ALIVE" is optional, >> but recommended. >> >> Signed-off-by: Oliver O'Halloran >> --- >> mm/Kconfig | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/mm/Kconfig b/mm/Kconfig >> index 43d000e44424..d696af58f97f 100644 >> --- a/mm/Kconfig >> +++ b/mm/Kconfig >> @@ -724,7 +724,7 @@ config ZONE_DEVICE >> depends on MEMORY_HOTPLUG >> depends on MEMORY_HOTREMOVE >> depends on SPARSEMEM_VMEMMAP >> -depends on X86_64 #arch_add_memory() comprehends device memory >> +depends on (X86_64 || PPC_BOOK3S_64) #arch_add_memory() comprehends >> device memory >> > > That's fine, but at what point do we create > CONFIG_ARCH_HAVE_ZONE_DEVICE, replace the "depends on > " above with "depends on ARCH_HAVE_ZONE_DEVICE" and > select that from the appropriate places? You mean CONFIG_HAVE_ZONE_DEVICE :) A patch to do that, and update x86, would be a good precursor to this series. It could probably go in right now, and be in place for when this series lands. cheers ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm