Re: [Virtio-fs] [PATCH v3 2/3] dax: Add a wakeup mode parameter to put_unlocked_entry()

2021-04-20 Thread Greg Kurz
On Mon, 19 Apr 2021 17:36:35 -0400
Vivek Goyal  wrote:

> As of now put_unlocked_entry() always wakes up next waiter. In next
> patches we want to wake up all waiters at one callsite. Hence, add a
> parameter to the function.
> 
> This patch does not introduce any change of behavior.
> 
> Suggested-by: Dan Williams 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/dax.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 00978d0838b1..f19d76a6a493 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -275,11 +275,12 @@ static void wait_entry_unlocked(struct xa_state *xas, 
> void *entry)
>   finish_wait(wq, );
>  }
>  
> -static void put_unlocked_entry(struct xa_state *xas, void *entry)
> +static void put_unlocked_entry(struct xa_state *xas, void *entry,
> +enum dax_entry_wake_mode mode)
>  {
>   /* If we were the only waiter woken, wake the next one */

With this change, the comment is no longer accurate since the
function can now wake all waiters if passed mode == WAKE_ALL.
Also, it paraphrases the code which is simple enough, so I'd
simply drop it.

This is minor though and it shouldn't prevent this fix to go
forward.

Reviewed-by: Greg Kurz 

>   if (entry && !dax_is_conflict(entry))
> - dax_wake_entry(xas, entry, WAKE_NEXT);
> + dax_wake_entry(xas, entry, mode);
>  }
>  
>  /*
> @@ -633,7 +634,7 @@ struct page *dax_layout_busy_page_range(struct 
> address_space *mapping,
>   entry = get_unlocked_entry(, 0);
>   if (entry)
>   page = dax_busy_page(entry);
> - put_unlocked_entry(, entry);
> + put_unlocked_entry(, entry, WAKE_NEXT);
>   if (page)
>   break;
>   if (++scanned % XA_CHECK_SCHED)
> @@ -675,7 +676,7 @@ static int __dax_invalidate_entry(struct address_space 
> *mapping,
>   mapping->nrexceptional--;
>   ret = 1;
>  out:
> - put_unlocked_entry(, entry);
> + put_unlocked_entry(, entry, WAKE_NEXT);
>   xas_unlock_irq();
>   return ret;
>  }
> @@ -954,7 +955,7 @@ static int dax_writeback_one(struct xa_state *xas, struct 
> dax_device *dax_dev,
>   return ret;
>  
>   put_unlocked:
> - put_unlocked_entry(xas, entry);
> + put_unlocked_entry(xas, entry, WAKE_NEXT);
>   return ret;
>  }
>  
> @@ -1695,7 +1696,7 @@ dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, 
> unsigned int order)
>   /* Did we race with someone splitting entry or so? */
>   if (!entry || dax_is_conflict(entry) ||
>   (order == 0 && !dax_is_pte_entry(entry))) {
> - put_unlocked_entry(, entry);
> + put_unlocked_entry(, entry, WAKE_NEXT);
>   xas_unlock_irq();
>   trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
> VM_FAULT_NOPAGE);
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [Virtio-fs] [PATCH v3 1/3] dax: Add an enum for specifying dax wakup mode

2021-04-20 Thread Greg Kurz
On Mon, 19 Apr 2021 17:36:34 -0400
Vivek Goyal  wrote:

> Dan mentioned that he is not very fond of passing around a boolean true/false
> to specify if only next waiter should be woken up or all waiters should be
> woken up. He instead prefers that we introduce an enum and make it very
> explicity at the callsite itself. Easier to read code.
> 
> This patch should not introduce any change of behavior.
> 
> Suggested-by: Dan Williams 
> Signed-off-by: Vivek Goyal 
> ---

Reviewed-by: Greg Kurz 

>  fs/dax.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index b3d27fdc6775..00978d0838b1 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -144,6 +144,16 @@ struct wait_exceptional_entry_queue {
>   struct exceptional_entry_key key;
>  };
>  
> +/**
> + * enum dax_entry_wake_mode: waitqueue wakeup toggle
> + * @WAKE_NEXT: entry was not mutated
> + * @WAKE_ALL: entry was invalidated, or resized
> + */
> +enum dax_entry_wake_mode {
> + WAKE_NEXT,
> + WAKE_ALL,
> +};
> +
>  static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas,
>   void *entry, struct exceptional_entry_key *key)
>  {
> @@ -182,7 +192,8 @@ static int wake_exceptional_entry_func(wait_queue_entry_t 
> *wait,
>   * The important information it's conveying is whether the entry at
>   * this index used to be a PMD entry.
>   */
> -static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all)
> +static void dax_wake_entry(struct xa_state *xas, void *entry,
> +enum dax_entry_wake_mode mode)
>  {
>   struct exceptional_entry_key key;
>   wait_queue_head_t *wq;
> @@ -196,7 +207,7 @@ static void dax_wake_entry(struct xa_state *xas, void 
> *entry, bool wake_all)
>* must be in the waitqueue and the following check will see them.
>*/
>   if (waitqueue_active(wq))
> - __wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, );
> + __wake_up(wq, TASK_NORMAL, mode == WAKE_ALL ? 0 : 1, );
>  }
>  
>  /*
> @@ -268,7 +279,7 @@ static void put_unlocked_entry(struct xa_state *xas, void 
> *entry)
>  {
>   /* If we were the only waiter woken, wake the next one */
>   if (entry && !dax_is_conflict(entry))
> - dax_wake_entry(xas, entry, false);
> + dax_wake_entry(xas, entry, WAKE_NEXT);
>  }
>  
>  /*
> @@ -286,7 +297,7 @@ static void dax_unlock_entry(struct xa_state *xas, void 
> *entry)
>   old = xas_store(xas, entry);
>   xas_unlock_irq(xas);
>   BUG_ON(!dax_is_locked(old));
> - dax_wake_entry(xas, entry, false);
> + dax_wake_entry(xas, entry, WAKE_NEXT);
>  }
>  
>  /*
> @@ -524,7 +535,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
>  
>   dax_disassociate_entry(entry, mapping, false);
>   xas_store(xas, NULL);   /* undo the PMD join */
> - dax_wake_entry(xas, entry, true);
> + dax_wake_entry(xas, entry, WAKE_ALL);
>   mapping->nrexceptional--;
>   entry = NULL;
>   xas_set(xas, index);
> @@ -937,7 +948,7 @@ static int dax_writeback_one(struct xa_state *xas, struct 
> dax_device *dax_dev,
>   xas_lock_irq(xas);
>   xas_store(xas, entry);
>   xas_clear_mark(xas, PAGECACHE_TAG_DIRTY);
> - dax_wake_entry(xas, entry, false);
> + dax_wake_entry(xas, entry, WAKE_NEXT);
>  
>   trace_dax_writeback_one(mapping->host, index, count);
>   return ret;
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [RFC Qemu PATCH v2 1/2] spapr: drc: Add support for async hcalls at the drc level

2020-12-21 Thread Greg Kurz
On Mon, 21 Dec 2020 13:08:53 +0100
Greg Kurz  wrote:

> Hi Shiva,
> 
> On Mon, 30 Nov 2020 09:16:39 -0600
> Shivaprasad G Bhat  wrote:
> 
> > The patch adds support for async hcalls at the DRC level for the
> > spapr devices. To be used by spapr-scm devices in the patch/es to follow.
> > 
> > Signed-off-by: Shivaprasad G Bhat 
> > ---
> 
> The overall idea looks good but I think you should consider using
> a thread pool to implement it. See below.
> 

Some more comments.

> >  hw/ppc/spapr_drc.c |  149 
> > 
> >  include/hw/ppc/spapr_drc.h |   25 +++
> >  2 files changed, 174 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 77718cde1f..4ecd04f686 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -15,6 +15,7 @@
> >  #include "qapi/qmp/qnull.h"
> >  #include "cpu.h"
> >  #include "qemu/cutils.h"
> > +#include "qemu/guest-random.h"
> >  #include "hw/ppc/spapr_drc.h"
> >  #include "qom/object.h"
> >  #include "migration/vmstate.h"
> > @@ -421,6 +422,148 @@ void spapr_drc_detach(SpaprDrc *drc)
> >  spapr_drc_release(drc);
> >  }
> >  
> > +
> > +/*
> > + * @drc : device DRC targetting which the async hcalls to be made.
> > + *
> > + * All subsequent requests to run/query the status should use the
> > + * unique token returned here.
> > + */
> > +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc)
> > +{
> > +Error *err = NULL;
> > +uint64_t token;
> > +SpaprDrcDeviceAsyncHCallState *tmp, *next, *state;
> > +
> > +state = g_malloc0(sizeof(*state));
> > +state->pending = true;
> > +
> > +qemu_mutex_lock(>async_hcall_states_lock);
> > +retry:
> > +if (qemu_guest_getrandom(, sizeof(token), ) < 0) {
> > +error_report_err(err);
> > +g_free(state);
> > +qemu_mutex_unlock(>async_hcall_states_lock);
> > +return 0;
> > +}
> > +
> > +if (!token) /* Token should be non-zero */
> > +goto retry;
> > +
> > +if (!QLIST_EMPTY(>async_hcall_states)) {
> > +QLIST_FOREACH_SAFE(tmp, >async_hcall_states, node, next) {
> > +if (tmp->continue_token == token) {
> > +/* If the token already in use, get a new one */
> > +goto retry;
> > +}
> > +}
> > +}
> > +
> > +state->continue_token = token;
> > +QLIST_INSERT_HEAD(>async_hcall_states, state, node);
> > +
> > +qemu_mutex_unlock(>async_hcall_states_lock);
> > +
> > +return state->continue_token;
> > +}
> > +
> > +static void *spapr_drc_async_hcall_runner(void *opaque)
> > +{
> > +int response = -1;

It feels wrong since the return value of func() is supposed to be
opaque to this function. And anyway it isn't needed since response
is always set a few lines below.

> > +SpaprDrcDeviceAsyncHCallState *state = opaque;
> > +
> > +/*
> > + * state is freed only after this thread finishes(after 
> > pthread_join()),
> > + * don't worry about it becoming NULL.
> > + */
> > +
> > +response = state->func(state->data);
> > +
> > +state->hcall_ret = response;
> > +state->pending = 0;

s/0/false/

> > +
> > +return NULL;
> > +}
> > +
> > +/*
> > + * @drc  : device DRC targetting which the async hcalls to be made.
> > + * token : The continue token to be used for tracking as recived from
> > + * spapr_drc_get_new_async_hcall_token
> > + * @func() : the worker function which needs to be executed asynchronously
> > + * @data : data to be passed to the asynchronous function. Worker is 
> > supposed
> > + * to free/cleanup the data that is passed here
> 
> It'd be cleaner to pass a completion callback and have free/cleanup handled 
> there.
> 
> > + */
> > +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> > +   SpaprDrcAsyncHcallWorkerFunc *func, void 
> > *data)
> > +{
> > +SpaprDrcDeviceAsyncHCallState *state;
> > +
> > +qemu_mutex_lock(>async_hcall_states_lock);
> > +QLIST_FOREACH(state, >async_hcall_states, node) {
> > +if (state->continue_token == token) {
> > +   

Re: [RFC Qemu PATCH v2 0/2] spapr: nvdimm: Asynchronus flush hcall support

2020-12-21 Thread Greg Kurz
On Mon, 30 Nov 2020 09:16:14 -0600
Shivaprasad G Bhat  wrote:

> The nvdimm devices are expected to ensure write persistent during power
> failure kind of scenarios.
> 
> The libpmem has architecture specific instructions like dcbf on power
> to flush the cache data to backend nvdimm device during normal writes.
> 
> Qemu - virtual nvdimm devices are memory mapped. The dcbf in the guest
> doesn't traslate to actual flush to the backend file on the host in case
> of file backed vnvdimms. This is addressed by virtio-pmem in case of x86_64
> by making asynchronous flushes.
> 
> On PAPR, issue is addressed by adding a new hcall to
> request for an explicit asynchronous flush requests from the guest ndctl
> driver when the backend nvdimm cannot ensure write persistence with dcbf
> alone. So, the approach here is to convey when the asynchronous flush is
> required in a device tree property. The guest makes the hcall when the
> property is found, instead of relying on dcbf.
> 
> The first patch adds the necessary asynchronous hcall support infrastructure
> code at the DRC level. Second patch implements the hcall using the
> infrastructure.
> 
> Hcall semantics are in review and not final.
> 
> A new device property sync-dax is added to the nvdimm device. When the 
> sync-dax is off(default), the asynchronous hcalls will be called.
> 
> With respect to save from new qemu to restore on old qemu, having the
> sync-dax by default off(when not specified) causes IO errors in guests as
> the async-hcall would not be supported on old qemu. The new hcall
> implementation being supported only on the new  pseries machine version,
> the current machine version checks may be sufficient to prevent
> such migration. Please suggest what should be done.
> 

First, all requests that are still not completed from the guest POV,
ie. the hcall hasn't returned H_SUCCESS yet, are state that we should
migrate in theory. In this case, I guess we rather want to drain all
pending requests on the source in some pre-save handler.

Then, as explained in another mail, you should enforce stable behavior
for existing machine types with some hw_compat magic.

> The below demonstration shows the map_sync behavior with sync-dax on & off.
> (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c)
> 
> The pmem0 is from nvdimm with With sync-dax=on, and pmem1 is from nvdimm with 
> syn-dax=off, mounted as
> /dev/pmem0 on /mnt1 type xfs 
> (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/pmem1 on /mnt2 type xfs 
> (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
> 
> [root@atest-guest ~]# ./mapsync /mnt1/newfile> When sync-dax=off
> [root@atest-guest ~]# ./mapsync /mnt2/newfile> when sync-dax=on
> Failed to mmap  with Operation not supported
> 
> ---
> v1 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg06330.html
> Changes from v1
>   - Fixed a missed-out unlock
>   - using QLIST_FOREACH instead of QLIST_FOREACH_SAFE while generating 
> token
> 
> Shivaprasad G Bhat (2):
>   spapr: drc: Add support for async hcalls at the drc level
>   spapr: nvdimm: Implement async flush hcalls
> 
> 
>  hw/mem/nvdimm.c|1
>  hw/ppc/spapr_drc.c |  146 
> 
>  hw/ppc/spapr_nvdimm.c  |   79 
>  include/hw/mem/nvdimm.h|   10 +++
>  include/hw/ppc/spapr.h |3 +
>  include/hw/ppc/spapr_drc.h |   25 
>  6 files changed, 263 insertions(+), 1 deletion(-)
> 
> --
> Signature
> 
> 
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [RFC Qemu PATCH v2 2/2] spapr: nvdimm: Implement async flush hcalls

2020-12-21 Thread Greg Kurz
On Mon, 30 Nov 2020 09:17:24 -0600
Shivaprasad G Bhat  wrote:

> When the persistent memory beacked by a file, a cpu cache flush instruction
> is not sufficient to ensure the stores are correctly flushed to the media.
> 
> The patch implements the async hcalls for flush operation on demand from the
> guest kernel.
> 
> The device option sync-dax is by default off and enables explicit asynchronous
> flush requests from guest. It can be disabled by setting syn-dax=on.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  hw/mem/nvdimm.c |1 +
>  hw/ppc/spapr_nvdimm.c   |   79 
> +++
>  include/hw/mem/nvdimm.h |   10 ++
>  include/hw/ppc/spapr.h  |3 +-
>  4 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index 03c2201b56..37a4db0135 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -220,6 +220,7 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, 
> const void *buf,
>  
>  static Property nvdimm_properties[] = {
>  DEFINE_PROP_BOOL(NVDIMM_UNARMED_PROP, NVDIMMDevice, unarmed, false),
> +DEFINE_PROP_BOOL(NVDIMM_SYNC_DAX_PROP, NVDIMMDevice, sync_dax, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index a833a63b5e..557e36aa98 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -22,6 +22,7 @@
>   * THE SOFTWARE.
>   */
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include "qapi/error.h"
>  #include "hw/ppc/spapr_drc.h"
>  #include "hw/ppc/spapr_nvdimm.h"
> @@ -155,6 +156,11 @@ static int spapr_dt_nvdimm(SpaprMachineState *spapr, 
> void *fdt,
>   "operating-system")));
>  _FDT(fdt_setprop(fdt, child_offset, "ibm,cache-flush-required", NULL, 
> 0));
>  
> +if (!nvdimm->sync_dax) {

So this is done unconditionally for all machine types. This means that a
guest started on a newer QEMU cannot be migrated to an older QEMU. This
is annoying because people legitimately expect an existing machine type
to be migratable to any QEMU version that supports it.

This means that something like the following should be added in hw_compat_5_2[]
to fix the property for pre-6.0 machine types:

{ "nvdimm", "sync-dax", "on" },

> +_FDT(fdt_setprop(fdt, child_offset, "ibm,async-flush-required",
> + NULL, 0));
> +}
> +
>  return child_offset;
>  }
>  
> @@ -370,6 +376,78 @@ static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, 
> SpaprMachineState *spapr,
>  return H_SUCCESS;
>  }
>  
> +typedef struct SCMAsyncFlushData {
> +int fd;
> +uint64_t token;
> +} SCMAsyncFlushData;
> +
> +static int flush_worker_cb(void *opaque)
> +{
> +int ret = H_SUCCESS;
> +SCMAsyncFlushData *req_data = opaque;
> +
> +/* flush raw backing image */
> +if (qemu_fdatasync(req_data->fd) < 0) {
> +error_report("papr_scm: Could not sync nvdimm to backend file: %s",
> + strerror(errno));
> +ret = H_HARDWARE;
> +}
> +
> +g_free(req_data);
> +
> +return ret;
> +}
> +
> +static target_ulong h_scm_async_flush(PowerPCCPU *cpu, SpaprMachineState 
> *spapr,
> +  target_ulong opcode, target_ulong 
> *args)
> +{
> +int ret;
> +uint32_t drc_index = args[0];
> +uint64_t continue_token = args[1];
> +SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +PCDIMMDevice *dimm;
> +HostMemoryBackend *backend = NULL;
> +SCMAsyncFlushData *req_data = NULL;
> +
> +if (!drc || !drc->dev ||
> +spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +return H_PARAMETER;
> +}
> +
> +if (continue_token != 0) {
> +ret = spapr_drc_get_async_hcall_status(drc, continue_token);
> +if (ret == H_BUSY) {
> +args[0] = continue_token;
> +return H_LONG_BUSY_ORDER_1_SEC;
> +}
> +
> +return ret;
> +}
> +
> +dimm = PC_DIMM(drc->dev);
> +backend = MEMORY_BACKEND(dimm->hostmem);
> +
> +req_data = g_malloc0(sizeof(SCMAsyncFlushData));
> +req_data->fd = memory_region_get_fd(>mr);
> +
> +continue_token = spapr_drc_get_new_async_hcall_token(drc);
> +if (!continue_token) {
> +g_free(req_data);
> +return H_P2;
> +}
> +req_data->token = continue_token;
> +
> +spapr_drc_run_async_hcall(drc, continue_token, _worker_cb, 
> req_data);
> +
> +ret = spapr_drc_get_async_hcall_status(drc, continue_token);
> +if (ret == H_BUSY) {
> +args[0] = req_data->token;
> +return ret;
> +}
> +
> +return ret;
> +}
> +
>  static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState 
> *spapr,
>   target_ulong opcode, target_ulong *args)
>  {
> @@ -486,6 +564,7 @@ static void spapr_scm_register_types(void)
>  spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
> 

Re: [RFC Qemu PATCH v2 1/2] spapr: drc: Add support for async hcalls at the drc level

2020-12-21 Thread Greg Kurz
Hi Shiva,

On Mon, 30 Nov 2020 09:16:39 -0600
Shivaprasad G Bhat  wrote:

> The patch adds support for async hcalls at the DRC level for the
> spapr devices. To be used by spapr-scm devices in the patch/es to follow.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---

The overall idea looks good but I think you should consider using
a thread pool to implement it. See below.

>  hw/ppc/spapr_drc.c |  149 
> 
>  include/hw/ppc/spapr_drc.h |   25 +++
>  2 files changed, 174 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 77718cde1f..4ecd04f686 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -15,6 +15,7 @@
>  #include "qapi/qmp/qnull.h"
>  #include "cpu.h"
>  #include "qemu/cutils.h"
> +#include "qemu/guest-random.h"
>  #include "hw/ppc/spapr_drc.h"
>  #include "qom/object.h"
>  #include "migration/vmstate.h"
> @@ -421,6 +422,148 @@ void spapr_drc_detach(SpaprDrc *drc)
>  spapr_drc_release(drc);
>  }
>  
> +
> +/*
> + * @drc : device DRC targetting which the async hcalls to be made.
> + *
> + * All subsequent requests to run/query the status should use the
> + * unique token returned here.
> + */
> +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc)
> +{
> +Error *err = NULL;
> +uint64_t token;
> +SpaprDrcDeviceAsyncHCallState *tmp, *next, *state;
> +
> +state = g_malloc0(sizeof(*state));
> +state->pending = true;
> +
> +qemu_mutex_lock(>async_hcall_states_lock);
> +retry:
> +if (qemu_guest_getrandom(, sizeof(token), ) < 0) {
> +error_report_err(err);
> +g_free(state);
> +qemu_mutex_unlock(>async_hcall_states_lock);
> +return 0;
> +}
> +
> +if (!token) /* Token should be non-zero */
> +goto retry;
> +
> +if (!QLIST_EMPTY(>async_hcall_states)) {
> +QLIST_FOREACH_SAFE(tmp, >async_hcall_states, node, next) {
> +if (tmp->continue_token == token) {
> +/* If the token already in use, get a new one */
> +goto retry;
> +}
> +}
> +}
> +
> +state->continue_token = token;
> +QLIST_INSERT_HEAD(>async_hcall_states, state, node);
> +
> +qemu_mutex_unlock(>async_hcall_states_lock);
> +
> +return state->continue_token;
> +}
> +
> +static void *spapr_drc_async_hcall_runner(void *opaque)
> +{
> +int response = -1;
> +SpaprDrcDeviceAsyncHCallState *state = opaque;
> +
> +/*
> + * state is freed only after this thread finishes(after pthread_join()),
> + * don't worry about it becoming NULL.
> + */
> +
> +response = state->func(state->data);
> +
> +state->hcall_ret = response;
> +state->pending = 0;
> +
> +return NULL;
> +}
> +
> +/*
> + * @drc  : device DRC targetting which the async hcalls to be made.
> + * token : The continue token to be used for tracking as recived from
> + * spapr_drc_get_new_async_hcall_token
> + * @func() : the worker function which needs to be executed asynchronously
> + * @data : data to be passed to the asynchronous function. Worker is supposed
> + * to free/cleanup the data that is passed here

It'd be cleaner to pass a completion callback and have free/cleanup handled 
there.

> + */
> +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> +   SpaprDrcAsyncHcallWorkerFunc *func, void 
> *data)
> +{
> +SpaprDrcDeviceAsyncHCallState *state;
> +
> +qemu_mutex_lock(>async_hcall_states_lock);
> +QLIST_FOREACH(state, >async_hcall_states, node) {
> +if (state->continue_token == token) {
> +state->func = func;
> +state->data = data;
> +qemu_thread_create(>thread, "sPAPR Async HCALL",
> +   spapr_drc_async_hcall_runner, state,
> +   QEMU_THREAD_JOINABLE);

qemu_thread_create() exits on failure, it shouldn't be called on
a guest triggerable path, eg. a buggy guest could call it up to
the point that pthread_create() returns EAGAIN.

Please use a thread pool (see thread_pool_submit_aio()). This takes care
of all the thread housekeeping for you in a safe way, and it provides a
completion callback API. The implementation could then be just about
having two lists: one for pending requests (fed here) and one for
completed requests (fed by the completion callback).

> +break;
> +}
> +}
> +qemu_mutex_unlock(>async_hcall_states_lock);
> +}
> +
> +/*
> + * spapr_drc_finish_async_hcalls
> + *  Waits for all pending async requests to complete
> + *  thier execution and free the states
> + */
> +static void spapr_drc_finish_async_hcalls(SpaprDrc *drc)
> +{
> +SpaprDrcDeviceAsyncHCallState *state, *next;
> +
> +if (QLIST_EMPTY(>async_hcall_states)) {
> +return;
> +}
> +
> +qemu_mutex_lock(>async_hcall_states_lock);
> +QLIST_FOREACH_SAFE(state, >async_hcall_states, node, next) {

Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

2020-04-02 Thread Greg Kurz
On Thu, 02 Apr 2020 21:06:01 +1100
Michael Ellerman  wrote:

> "Oliver O'Halloran"  writes:
> > On Thu, Apr 2, 2020 at 2:42 PM Michael Ellerman  wrote:
> >> "Alastair D'Silva"  writes:
> >> >> -Original Message-
> >> >> From: Dan Williams 
> >> >>
> >> >> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva 
> >> >> wrote:
> >> >> >
> >> >> > *snip*
> >> >> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
> >> >> platform firmware services? What's Skiboot?
> >> >
> >> > Yes, OPAL is the interface to firmware for POWER. Skiboot is the 
> >> > open-source (and only) implementation of OPAL.
> >>
> >>   https://github.com/open-power/skiboot
> >>
> >> In particular the tokens for calls are defined here:
> >>
> >>   https://github.com/open-power/skiboot/blob/master/include/opal-api.h#L220
> >>
> >> And you can grep for the token to find the implementation:
> >>
> >>   
> >> https://github.com/open-power/skiboot/blob/master/hw/npu2-opencapi.c#L2328
> >
> > I'm not sure I'd encourage anyone to read npu2-opencapi.c. I find it
> > hard enough to follow even with access to the workbooks.
> 
> Compared to certain firmwares that run on certain other platforms it's
> actually pretty readable code ;)
> 

Forth rocks ! ;-)

> > There's an OPAL call API reference here:
> > http://open-power.github.io/skiboot/doc/opal-api/index.html
> 
> Even better.
> 
> cheers
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v3 04/27] ocxl: Remove unnecessary externs

2020-02-26 Thread Greg Kurz
On Wed, 26 Feb 2020 22:15:23 +0800
'Baoquan He'  wrote:

> On 02/26/20 at 10:01am, Greg Kurz wrote:
> > On Wed, 26 Feb 2020 19:26:34 +1100
> > "Alastair D'Silva"  wrote:
> > 
> > > > -Original Message-
> > > > From: Baoquan He 
> > > > Sent: Wednesday, 26 February 2020 7:15 PM
> > > > To: Alastair D'Silva 
> > > > Cc: alast...@d-silva.org; Aneesh Kumar K . V
> > > > ; Oliver O'Halloran ;
> > > > Benjamin Herrenschmidt ; Paul Mackerras
> > > > ; Michael Ellerman ; Frederic
> > > > Barrat ; Andrew Donnellan ;
> > > > Arnd Bergmann ; Greg Kroah-Hartman
> > > > ; Dan Williams ;
> > > > Vishal Verma ; Dave Jiang
> > > > ; Ira Weiny ; Andrew Morton
> > > > ; Mauro Carvalho Chehab
> > > > ; David S. Miller ;
> > > > Rob Herring ; Anton Blanchard ;
> > > > Krzysztof Kozlowski ; Mahesh Salgaonkar
> > > > ; Madhavan Srinivasan
> > > > ; Cédric Le Goater ; Anju T
> > > > Sudhakar ; Hari Bathini
> > > > ; Thomas Gleixner ; Greg
> > > > Kurz ; Nicholas Piggin ; Masahiro
> > > > Yamada ; Alexey Kardashevskiy
> > > > ; linux-ker...@vger.kernel.org; linuxppc-
> > > > d...@lists.ozlabs.org; linux-nvdimm@lists.01.org; linux...@kvack.org
> > > > Subject: Re: [PATCH v3 04/27] ocxl: Remove unnecessary externs
> > > > 
> > > > On 02/21/20 at 02:26pm, Alastair D'Silva wrote:
> > > > > From: Alastair D'Silva 
> > > > >
> > > > > Function declarations don't need externs, remove the existing ones so
> > > > > they are consistent with newer code
> > > > >
> > > > > Signed-off-by: Alastair D'Silva 
> > > > > ---
> > > > >  arch/powerpc/include/asm/pnv-ocxl.h | 32 
> > > > > ++---
> > > > >  include/misc/ocxl.h |  6 +++---
> > > > >  2 files changed, 18 insertions(+), 20 deletions(-)
> > > > >
> > > > > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h
> > > > > b/arch/powerpc/include/asm/pnv-ocxl.h
> > > > > index 0b2a6707e555..b23c99bc0c84 100644
> > > > > --- a/arch/powerpc/include/asm/pnv-ocxl.h
> > > > > +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> > > > > @@ -9,29 +9,27 @@
> > > > >  #define PNV_OCXL_TL_BITS_PER_RATE   4
> > > > >  #define PNV_OCXL_TL_RATE_BUF_SIZE
> > > > ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
> > > > >
> > > > > -extern int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16
> > > > *enabled,
> > > > > - u16 *supported);
> > > > 
> > > > It works w or w/o extern when declare functions. Searching 'extern'
> > > > under include can find so many functions with 'extern' adding. Do we 
> > > > have
> > > a
> > > > explicit standard if we should add or remove 'exter' in function
> > > declaration?
> > > > 
> > > > I have no objection to this patch, just want to make clear so that I can
> > > handle
> > > > it w/o confusion.
> > > > 
> > > > Thanks
> > > > Baoquan
> > > > 
> > > 
> > > For the OpenCAPI driver, we have settled on not having 'extern' on
> > > functions.
> > > 
> > > I don't think I've seen a standard that supports or refutes this, but it
> > > does not value add.
> > > 
> > 
> > FWIW this is a warning condition for checkpatch:
> > 
> > $ ./scripts/checkpatch.pl --strict -f include/misc/ocxl.h
> 
> Good to know, thanks.
> 
> I didn't know checkpatch.pl can run on header file directly. Tried to
> check patch with '--strict -f', the below info doesn't appear. But it

Hmm... -f is to check a source file, not a patch... What did you try
exactly ?

> does give out below information when run on header file.
> 
> > 
> > [...]
> > 
> > CHECK: extern prototypes should be avoided in .h files
> > #176: FILE: include/misc/ocxl.h:176:
> > +extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);
> > 
> > [...]
> > 
> 
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v3 04/27] ocxl: Remove unnecessary externs

2020-02-26 Thread Greg Kurz
On Wed, 26 Feb 2020 19:26:34 +1100
"Alastair D'Silva"  wrote:

> > -Original Message-
> > From: Baoquan He 
> > Sent: Wednesday, 26 February 2020 7:15 PM
> > To: Alastair D'Silva 
> > Cc: alast...@d-silva.org; Aneesh Kumar K . V
> > ; Oliver O'Halloran ;
> > Benjamin Herrenschmidt ; Paul Mackerras
> > ; Michael Ellerman ; Frederic
> > Barrat ; Andrew Donnellan ;
> > Arnd Bergmann ; Greg Kroah-Hartman
> > ; Dan Williams ;
> > Vishal Verma ; Dave Jiang
> > ; Ira Weiny ; Andrew Morton
> > ; Mauro Carvalho Chehab
> > ; David S. Miller ;
> > Rob Herring ; Anton Blanchard ;
> > Krzysztof Kozlowski ; Mahesh Salgaonkar
> > ; Madhavan Srinivasan
> > ; Cédric Le Goater ; Anju T
> > Sudhakar ; Hari Bathini
> > ; Thomas Gleixner ; Greg
> > Kurz ; Nicholas Piggin ; Masahiro
> > Yamada ; Alexey Kardashevskiy
> > ; linux-ker...@vger.kernel.org; linuxppc-
> > d...@lists.ozlabs.org; linux-nvdimm@lists.01.org; linux...@kvack.org
> > Subject: Re: [PATCH v3 04/27] ocxl: Remove unnecessary externs
> > 
> > On 02/21/20 at 02:26pm, Alastair D'Silva wrote:
> > > From: Alastair D'Silva 
> > >
> > > Function declarations don't need externs, remove the existing ones so
> > > they are consistent with newer code
> > >
> > > Signed-off-by: Alastair D'Silva 
> > > ---
> > >  arch/powerpc/include/asm/pnv-ocxl.h | 32 ++---
> > >  include/misc/ocxl.h |  6 +++---
> > >  2 files changed, 18 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h
> > > b/arch/powerpc/include/asm/pnv-ocxl.h
> > > index 0b2a6707e555..b23c99bc0c84 100644
> > > --- a/arch/powerpc/include/asm/pnv-ocxl.h
> > > +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> > > @@ -9,29 +9,27 @@
> > >  #define PNV_OCXL_TL_BITS_PER_RATE   4
> > >  #define PNV_OCXL_TL_RATE_BUF_SIZE
> > ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
> > >
> > > -extern int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16
> > *enabled,
> > > - u16 *supported);
> > 
> > It works w or w/o extern when declare functions. Searching 'extern'
> > under include can find so many functions with 'extern' adding. Do we have
> a
> > explicit standard if we should add or remove 'exter' in function
> declaration?
> > 
> > I have no objection to this patch, just want to make clear so that I can
> handle
> > it w/o confusion.
> > 
> > Thanks
> > Baoquan
> > 
> 
> For the OpenCAPI driver, we have settled on not having 'extern' on
> functions.
> 
> I don't think I've seen a standard that supports or refutes this, but it
> does not value add.
> 

FWIW this is a warning condition for checkpatch:

$ ./scripts/checkpatch.pl --strict -f include/misc/ocxl.h

[...]

CHECK: extern prototypes should be avoided in .h files
#176: FILE: include/misc/ocxl.h:176:
+extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);

[...]
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v2 05/27] powerpc: Map & release OpenCAPI LPC memory

2020-01-20 Thread Greg Kurz
On Tue, 21 Jan 2020 17:46:12 +1100
Andrew Donnellan  wrote:

> On 3/12/19 2:46 pm, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > This patch adds platform support to map & release LPC memory.
> 
> Might want to explain what LPC is.
> 
> Otherwise:
> 
> Reviewed-by: Andrew Donnellan 
> 
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/include/asm/pnv-ocxl.h   |  2 ++
> >   arch/powerpc/platforms/powernv/ocxl.c | 42 +++
> >   2 files changed, 44 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
> > b/arch/powerpc/include/asm/pnv-ocxl.h
> > index 7de82647e761..f8f8ffb48aa8 100644
> > --- a/arch/powerpc/include/asm/pnv-ocxl.h
> > +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> > @@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void 
> > *platform_data, int pe_handle)
> >   
> >   extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
> >   extern void pnv_ocxl_free_xive_irq(u32 irq);
> > +extern u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
> > +extern void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
> 
> nit: I don't think these need to be extern?
> 
> 

And even if they were, as verified by checkpatch:

"extern prototypes should be avoided in .h files"
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org