Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On 04/10/2018 10:55 AM, Cornelia Huck wrote: > On Tue, 10 Apr 2018 10:16:39 +0800 > Dong Jia Shiwrote: > >> Does the following effect make sense? >> >> # tracer: nop >> # >> # _-=> irqs-off >> # / _=> need-resched >> #| / _---=> hardirq/softirq >> #|| / _--=> preempt-depth >> #||| / delay >> # TASK-PID CPU# TIMESTAMP FUNCTION >> # | | | | | >> qemu-system-s39-4252 [006] 231.457214: vfio_ccw_cp_prefetch: >> schid=0.0.013f errno=0 >> qemu-system-s39-4252 [006] 231.457222: vfio_ccw_fsm_io_helper: >> schid=0.0.013f errno=0 >> qemu-system-s39-4252 [006] 231.457223: vfio_ccw_io_fctl: >> schid=0.0.013f fctl=4 errno=0 >> ... ... > > I would likely find this useful for following a code path and making > sure the right things are called. > > We certainly want error conditions traced as well (although the code > has been working too well for me to trigger that easily :) > Looks interesting. The approach is to trace (all) exits from selected functions, or? It is an interesting approach. Function entry could probably be traced with the function tracer (if we should ever need that, although relating the two unambiguously may not be possible -- I don't know). I'm still not completely in clear how do we want to do error reporting. Using traces as means of error reporting smells like abuse to me. @Dong Jia: could you help me get an overview? I'm thinking of something like a matrix of type: error | handler | action (propagate as / report / try recover / discard silently) I'm mostly interested in what gets reported and if there is stuff that should be reported. Other than that I'm in favor. And having traces for tracking error condition is clearly better than having nothing. Regards, Halil
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On 04/10/2018 10:55 AM, Cornelia Huck wrote: > On Tue, 10 Apr 2018 10:16:39 +0800 > Dong Jia Shi wrote: > >> Does the following effect make sense? >> >> # tracer: nop >> # >> # _-=> irqs-off >> # / _=> need-resched >> #| / _---=> hardirq/softirq >> #|| / _--=> preempt-depth >> #||| / delay >> # TASK-PID CPU# TIMESTAMP FUNCTION >> # | | | | | >> qemu-system-s39-4252 [006] 231.457214: vfio_ccw_cp_prefetch: >> schid=0.0.013f errno=0 >> qemu-system-s39-4252 [006] 231.457222: vfio_ccw_fsm_io_helper: >> schid=0.0.013f errno=0 >> qemu-system-s39-4252 [006] 231.457223: vfio_ccw_io_fctl: >> schid=0.0.013f fctl=4 errno=0 >> ... ... > > I would likely find this useful for following a code path and making > sure the right things are called. > > We certainly want error conditions traced as well (although the code > has been working too well for me to trigger that easily :) > Looks interesting. The approach is to trace (all) exits from selected functions, or? It is an interesting approach. Function entry could probably be traced with the function tracer (if we should ever need that, although relating the two unambiguously may not be possible -- I don't know). I'm still not completely in clear how do we want to do error reporting. Using traces as means of error reporting smells like abuse to me. @Dong Jia: could you help me get an overview? I'm thinking of something like a matrix of type: error | handler | action (propagate as / report / try recover / discard silently) I'm mostly interested in what gets reported and if there is stuff that should be reported. Other than that I'm in favor. And having traces for tracking error condition is clearly better than having nothing. Regards, Halil
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Tue, 10 Apr 2018 10:16:39 +0800 Dong Jia Shiwrote: > Does the following effect make sense? > > # tracer: nop > # > # _-=> irqs-off > # / _=> need-resched > #| / _---=> hardirq/softirq > #|| / _--=> preempt-depth > #||| / delay > # TASK-PID CPU# TIMESTAMP FUNCTION > # | | | | | > qemu-system-s39-4252 [006] 231.457214: vfio_ccw_cp_prefetch: > schid=0.0.013f errno=0 > qemu-system-s39-4252 [006] 231.457222: vfio_ccw_fsm_io_helper: > schid=0.0.013f errno=0 > qemu-system-s39-4252 [006] 231.457223: vfio_ccw_io_fctl: > schid=0.0.013f fctl=4 errno=0 > ... ... I would likely find this useful for following a code path and making sure the right things are called. We certainly want error conditions traced as well (although the code has been working too well for me to trigger that easily :)
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Tue, 10 Apr 2018 10:16:39 +0800 Dong Jia Shi wrote: > Does the following effect make sense? > > # tracer: nop > # > # _-=> irqs-off > # / _=> need-resched > #| / _---=> hardirq/softirq > #|| / _--=> preempt-depth > #||| / delay > # TASK-PID CPU# TIMESTAMP FUNCTION > # | | | | | > qemu-system-s39-4252 [006] 231.457214: vfio_ccw_cp_prefetch: > schid=0.0.013f errno=0 > qemu-system-s39-4252 [006] 231.457222: vfio_ccw_fsm_io_helper: > schid=0.0.013f errno=0 > qemu-system-s39-4252 [006] 231.457223: vfio_ccw_io_fctl: > schid=0.0.013f fctl=4 errno=0 > ... ... I would likely find this useful for following a code path and making sure the right things are called. We certainly want error conditions traced as well (although the code has been working too well for me to trigger that easily :)
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Tue, 27 Mar 2018 17:27:54 +0200 Halil Pasicwrote: > On 03/27/2018 12:07 PM, Cornelia Huck wrote: > > On Tue, 27 Mar 2018 15:51:14 +0800 > > Dong Jia Shi wrote: > > > >> * Cornelia Huck [2018-03-26 15:59:02 +0200]: > >> > >> [...] > >> > @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > > io_region->ret_code = cp_prefetch(>cp); > if (io_region->ret_code) { > + > trace_vfio_ccw_cp_prefetch_failed(get_schid(private), > + > io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > /* Start channel program and wait for I/O interrupt. */ > io_region->ret_code = fsm_io_helper(private); > if (io_region->ret_code) { > +trace_vfio_ccw_ssch_failed(get_schid(private), > + io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { > /* XXX: Handle halt. */ > io_region->ret_code = -EOPNOTSUPP; > +trace_vfio_ccw_halt(get_schid(private)); > goto err_out; > } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { > /* XXX: Handle clear. */ > io_region->ret_code = -EOPNOTSUPP; > +trace_vfio_ccw_clear(get_schid(private)); > goto err_out; > >>> > >>> Hmmm perhaps better to just trace the function (start/halt/clear) > >>> in any case? > >>> > >> I agree trace the function in any case is good. @Halil, opinion? > > See below. I don't really understand the question. Trace the function > means, trace when it was requested on a subch, or trace the outcome > of the request? Seems the question got amended though. Both tracing the functions per se and tracing their outcome has its benefits IME. > > >> > >> But the traces for cp_prefetch() and fsm_io_helper() should also be > >> kept, since they are helpful to debug problem. So I tend to trace the > >> following in any case: > >> - cp_prefetch() > >> - fsm_io_helper() > >> - start > >> - halt > >> - clear > > > > OK, I was unclear :) I'd argue to keep the others, just replace the > > halt/clear tracing with tracing the function. > > I'm a bit confused. > > My idea was the following: Prior to this patch we had a kind of OK > possibility to trace what we consider the expected and good scenario > using the function tracer and the normal cio stuff. But what I wanted is > to verify that my fix works (problem occurs but is handled more > appropriately) and I've found it difficult to trace this. So the idea was > to introduce trace points which tell us what went wrong. The idea is to > benefit diagnostic of unrecoverable failures and get an idea how often > are we doing extra work recovering recoverable failures. > > In this sense halt and clear is something that does not work currently. > When we get proper halt and clear, these trace points were supposed to > become obsolete and get removed. I guess the implementation will > eventually issue csch() and hsch() for the underlying subchannel and and > we should be able to trace those (see drivers/s390/cio/ioasm.c) Tracing what userspace expects us to do has its benefits as well (i.e. do we get mainly ssch? unexpected amounts of csch? etc.). I find it useful to be able to get this information from the vfio-ccw layer already. > > Now this is the tricky part. I've read some used to see trace points as > part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU > this is not a concern any more -- but my confidence on this is pretty > low. I'd not worry about that. FWIW, for kvm/s390 I tried to do the following: - one set of tracepoints for things that are mandated by the architecture and therefore expected to be stable - and another set of tracepoints for implementation details that have a good change of changing > > So IMHO we have two questions to answer: > > * Do we want static trace points (events) for undesirable or concerning > stuff happened (e.g. translation failed, a function we hope we can live > without was supposed to be executed)? > > * Do we want static trace points (events) coverage for the normal path > (that is beyond what cio and the function tracer
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Tue, 27 Mar 2018 17:27:54 +0200 Halil Pasic wrote: > On 03/27/2018 12:07 PM, Cornelia Huck wrote: > > On Tue, 27 Mar 2018 15:51:14 +0800 > > Dong Jia Shi wrote: > > > >> * Cornelia Huck [2018-03-26 15:59:02 +0200]: > >> > >> [...] > >> > @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > > io_region->ret_code = cp_prefetch(>cp); > if (io_region->ret_code) { > + > trace_vfio_ccw_cp_prefetch_failed(get_schid(private), > + > io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > /* Start channel program and wait for I/O interrupt. */ > io_region->ret_code = fsm_io_helper(private); > if (io_region->ret_code) { > +trace_vfio_ccw_ssch_failed(get_schid(private), > + io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { > /* XXX: Handle halt. */ > io_region->ret_code = -EOPNOTSUPP; > +trace_vfio_ccw_halt(get_schid(private)); > goto err_out; > } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { > /* XXX: Handle clear. */ > io_region->ret_code = -EOPNOTSUPP; > +trace_vfio_ccw_clear(get_schid(private)); > goto err_out; > >>> > >>> Hmmm perhaps better to just trace the function (start/halt/clear) > >>> in any case? > >>> > >> I agree trace the function in any case is good. @Halil, opinion? > > See below. I don't really understand the question. Trace the function > means, trace when it was requested on a subch, or trace the outcome > of the request? Seems the question got amended though. Both tracing the functions per se and tracing their outcome has its benefits IME. > > >> > >> But the traces for cp_prefetch() and fsm_io_helper() should also be > >> kept, since they are helpful to debug problem. So I tend to trace the > >> following in any case: > >> - cp_prefetch() > >> - fsm_io_helper() > >> - start > >> - halt > >> - clear > > > > OK, I was unclear :) I'd argue to keep the others, just replace the > > halt/clear tracing with tracing the function. > > I'm a bit confused. > > My idea was the following: Prior to this patch we had a kind of OK > possibility to trace what we consider the expected and good scenario > using the function tracer and the normal cio stuff. But what I wanted is > to verify that my fix works (problem occurs but is handled more > appropriately) and I've found it difficult to trace this. So the idea was > to introduce trace points which tell us what went wrong. The idea is to > benefit diagnostic of unrecoverable failures and get an idea how often > are we doing extra work recovering recoverable failures. > > In this sense halt and clear is something that does not work currently. > When we get proper halt and clear, these trace points were supposed to > become obsolete and get removed. I guess the implementation will > eventually issue csch() and hsch() for the underlying subchannel and and > we should be able to trace those (see drivers/s390/cio/ioasm.c) Tracing what userspace expects us to do has its benefits as well (i.e. do we get mainly ssch? unexpected amounts of csch? etc.). I find it useful to be able to get this information from the vfio-ccw layer already. > > Now this is the tricky part. I've read some used to see trace points as > part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU > this is not a concern any more -- but my confidence on this is pretty > low. I'd not worry about that. FWIW, for kvm/s390 I tried to do the following: - one set of tracepoints for things that are mandated by the architecture and therefore expected to be stable - and another set of tracepoints for implementation details that have a good change of changing > > So IMHO we have two questions to answer: > > * Do we want static trace points (events) for undesirable or concerning > stuff happened (e.g. translation failed, a function we hope we can live > without was supposed to be executed)? > > * Do we want static trace points (events) coverage for the normal path > (that is beyond what cio and the function tracer already give us)? What > benefit do we expect if we do want these? (E.g.
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On 03/27/2018 12:07 PM, Cornelia Huck wrote: > On Tue, 27 Mar 2018 15:51:14 +0800 > Dong Jia Shiwrote: > >> * Cornelia Huck [2018-03-26 15:59:02 +0200]: >> >> [...] >> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private, io_region->ret_code = cp_prefetch(>cp); if (io_region->ret_code) { + trace_vfio_ccw_cp_prefetch_failed(get_schid(private), +io_region->ret_code); cp_free(>cp); goto err_out; } @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private, /* Start channel program and wait for I/O interrupt. */ io_region->ret_code = fsm_io_helper(private); if (io_region->ret_code) { + trace_vfio_ccw_ssch_failed(get_schid(private), + io_region->ret_code); cp_free(>cp); goto err_out; } @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private, } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { /* XXX: Handle halt. */ io_region->ret_code = -EOPNOTSUPP; + trace_vfio_ccw_halt(get_schid(private)); goto err_out; } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { /* XXX: Handle clear. */ io_region->ret_code = -EOPNOTSUPP; + trace_vfio_ccw_clear(get_schid(private)); goto err_out; >>> >>> Hmmm perhaps better to just trace the function (start/halt/clear) >>> in any case? >>> >> I agree trace the function in any case is good. @Halil, opinion? See below. I don't really understand the question. Trace the function means, trace when it was requested on a subch, or trace the outcome of the request? Seems the question got amended though. >> >> But the traces for cp_prefetch() and fsm_io_helper() should also be >> kept, since they are helpful to debug problem. So I tend to trace the >> following in any case: >> - cp_prefetch() >> - fsm_io_helper() >> - start >> - halt >> - clear > > OK, I was unclear :) I'd argue to keep the others, just replace the > halt/clear tracing with tracing the function. I'm a bit confused. My idea was the following: Prior to this patch we had a kind of OK possibility to trace what we consider the expected and good scenario using the function tracer and the normal cio stuff. But what I wanted is to verify that my fix works (problem occurs but is handled more appropriately) and I've found it difficult to trace this. So the idea was to introduce trace points which tell us what went wrong. The idea is to benefit diagnostic of unrecoverable failures and get an idea how often are we doing extra work recovering recoverable failures. In this sense halt and clear is something that does not work currently. When we get proper halt and clear, these trace points were supposed to become obsolete and get removed. I guess the implementation will eventually issue csch() and hsch() for the underlying subchannel and and we should be able to trace those (see drivers/s390/cio/ioasm.c) Now this is the tricky part. I've read some used to see trace points as part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU this is not a concern any more -- but my confidence on this is pretty low. So IMHO we have two questions to answer: * Do we want static trace points (events) for undesirable or concerning stuff happened (e.g. translation failed, a function we hope we can live without was supposed to be executed)? * Do we want static trace points (events) coverage for the normal path (that is beyond what cio and the function tracer already give us)? What benefit do we expect if we do want these? (E.g. performance evaluation, better debugging especially when multiple virtualized subchannels used.) > >> } [..] +DECLARE_EVENT_CLASS(vfio_ccw_notsupp, + TP_PROTO(struct subchannel_id schid), + TP_ARGS(schid), + + TP_STRUCT__entry( + __field_struct(struct subchannel_id, schid) + ), + + TP_fast_assign( + __entry->schid = schid; + ), + + TP_printk("(schid 0.%x.%04X) request not supported", + __entry->schid.ssid, __entry->schid.sch_no) +); >>> >>> Especially as I don't plan to leave this unsupported for too long :) Sounds great! I don't insist. Especially not if our linux guest tells us what went wrong. @Dong Jia: What would happen should the guest for some reason try to do a clear or a halt (e.g. we make it fail here, guest retries a couple of times and then panicks or gives up on the device)? >>> >>> Just tracing the
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On 03/27/2018 12:07 PM, Cornelia Huck wrote: > On Tue, 27 Mar 2018 15:51:14 +0800 > Dong Jia Shi wrote: > >> * Cornelia Huck [2018-03-26 15:59:02 +0200]: >> >> [...] >> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private, io_region->ret_code = cp_prefetch(>cp); if (io_region->ret_code) { + trace_vfio_ccw_cp_prefetch_failed(get_schid(private), +io_region->ret_code); cp_free(>cp); goto err_out; } @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private, /* Start channel program and wait for I/O interrupt. */ io_region->ret_code = fsm_io_helper(private); if (io_region->ret_code) { + trace_vfio_ccw_ssch_failed(get_schid(private), + io_region->ret_code); cp_free(>cp); goto err_out; } @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private, } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { /* XXX: Handle halt. */ io_region->ret_code = -EOPNOTSUPP; + trace_vfio_ccw_halt(get_schid(private)); goto err_out; } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { /* XXX: Handle clear. */ io_region->ret_code = -EOPNOTSUPP; + trace_vfio_ccw_clear(get_schid(private)); goto err_out; >>> >>> Hmmm perhaps better to just trace the function (start/halt/clear) >>> in any case? >>> >> I agree trace the function in any case is good. @Halil, opinion? See below. I don't really understand the question. Trace the function means, trace when it was requested on a subch, or trace the outcome of the request? Seems the question got amended though. >> >> But the traces for cp_prefetch() and fsm_io_helper() should also be >> kept, since they are helpful to debug problem. So I tend to trace the >> following in any case: >> - cp_prefetch() >> - fsm_io_helper() >> - start >> - halt >> - clear > > OK, I was unclear :) I'd argue to keep the others, just replace the > halt/clear tracing with tracing the function. I'm a bit confused. My idea was the following: Prior to this patch we had a kind of OK possibility to trace what we consider the expected and good scenario using the function tracer and the normal cio stuff. But what I wanted is to verify that my fix works (problem occurs but is handled more appropriately) and I've found it difficult to trace this. So the idea was to introduce trace points which tell us what went wrong. The idea is to benefit diagnostic of unrecoverable failures and get an idea how often are we doing extra work recovering recoverable failures. In this sense halt and clear is something that does not work currently. When we get proper halt and clear, these trace points were supposed to become obsolete and get removed. I guess the implementation will eventually issue csch() and hsch() for the underlying subchannel and and we should be able to trace those (see drivers/s390/cio/ioasm.c) Now this is the tricky part. I've read some used to see trace points as part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU this is not a concern any more -- but my confidence on this is pretty low. So IMHO we have two questions to answer: * Do we want static trace points (events) for undesirable or concerning stuff happened (e.g. translation failed, a function we hope we can live without was supposed to be executed)? * Do we want static trace points (events) coverage for the normal path (that is beyond what cio and the function tracer already give us)? What benefit do we expect if we do want these? (E.g. performance evaluation, better debugging especially when multiple virtualized subchannels used.) > >> } [..] +DECLARE_EVENT_CLASS(vfio_ccw_notsupp, + TP_PROTO(struct subchannel_id schid), + TP_ARGS(schid), + + TP_STRUCT__entry( + __field_struct(struct subchannel_id, schid) + ), + + TP_fast_assign( + __entry->schid = schid; + ), + + TP_printk("(schid 0.%x.%04X) request not supported", + __entry->schid.ssid, __entry->schid.sch_no) +); >>> >>> Especially as I don't plan to leave this unsupported for too long :) Sounds great! I don't insist. Especially not if our linux guest tells us what went wrong. @Dong Jia: What would happen should the guest for some reason try to do a clear or a halt (e.g. we make it fail here, guest retries a couple of times and then panicks or gives up on the device)? >>> >>> Just tracing the function is useful now and will stay useful in the
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Tue, 27 Mar 2018 15:51:14 +0800 Dong Jia Shiwrote: > * Cornelia Huck [2018-03-26 15:59:02 +0200]: > > [...] > > > > @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private > > > *private, > > > > > > io_region->ret_code = cp_prefetch(>cp); > > > if (io_region->ret_code) { > > > + trace_vfio_ccw_cp_prefetch_failed(get_schid(private), > > > + io_region->ret_code); > > > cp_free(>cp); > > > goto err_out; > > > } > > > @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private > > > *private, > > > /* Start channel program and wait for I/O interrupt. */ > > > io_region->ret_code = fsm_io_helper(private); > > > if (io_region->ret_code) { > > > + trace_vfio_ccw_ssch_failed(get_schid(private), > > > +io_region->ret_code); > > > cp_free(>cp); > > > goto err_out; > > > } > > > @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private > > > *private, > > > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { > > > /* XXX: Handle halt. */ > > > io_region->ret_code = -EOPNOTSUPP; > > > + trace_vfio_ccw_halt(get_schid(private)); > > > goto err_out; > > > } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { > > > /* XXX: Handle clear. */ > > > io_region->ret_code = -EOPNOTSUPP; > > > + trace_vfio_ccw_clear(get_schid(private)); > > > goto err_out; > > > > Hmmm perhaps better to just trace the function (start/halt/clear) > > in any case? > > > I agree trace the function in any case is good. @Halil, opinion? > > But the traces for cp_prefetch() and fsm_io_helper() should also be > kept, since they are helpful to debug problem. So I tend to trace the > following in any case: > - cp_prefetch() > - fsm_io_helper() > - start > - halt > - clear OK, I was unclear :) I'd argue to keep the others, just replace the halt/clear tracing with tracing the function. > > > > } > > > > > > diff --git a/drivers/s390/cio/vfio_ccw_trace.h > > > b/drivers/s390/cio/vfio_ccw_trace.h > > > new file mode 100644 > > > index ..edd3321cd919 > > > --- /dev/null > > > +++ b/drivers/s390/cio/vfio_ccw_trace.h > > > @@ -0,0 +1,86 @@ > > > +/* SPDX-License-Identifier: GPL-2.0 > > > + * Tracepoints for vfio_ccw driver > > > + * > > > + * Copyright IBM Corp. 2018 > > > + * > > > + * Author(s): Dong Jia Shi > > > + *Halil Pasic > > > + */ > > > + > > > + > > > +#undef TRACE_SYSTEM > > > +#define TRACE_SYSTEM vfio_ccw > > > + > > > +#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ) > > > +#define _VFIO_CCW_TRACE_ > > > + > > > +#include > > > + > > > +TRACE_EVENT(vfio_ccw_cp_prefetch_failed, > > > + TP_PROTO(struct subchannel_id schid, int errno), > > > + TP_ARGS(schid, errno), > > > + > > > + TP_STRUCT__entry( > > > + __field_struct(struct subchannel_id, schid) > > > + __field(int, errno) > > > + ), > > > + > > > + TP_fast_assign( > > > + __entry->schid = schid; > > > + __entry->errno = errno; > > > + ), > > > + > > > + TP_printk("(schid 0.%x.%04X) translation failed (errno: %d)", > > > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno) > > > +); > > > + > > > +TRACE_EVENT(vfio_ccw_ssch_failed, > > > + TP_PROTO(struct subchannel_id schid, int errno), > > > + TP_ARGS(schid, errno), > > > + > > > + TP_STRUCT__entry( > > > + __field_struct(struct subchannel_id, schid) > > > + __field(int, errno) > > > + ), > > > + > > > + TP_fast_assign( > > > + __entry->schid = schid; > > > + __entry->errno = errno; > > > + ), > > > + > > > + TP_printk("(schid 0.%x.%04X) ssch failed (errno: %d)", > > > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno) > > > +); > > > + > > > +DECLARE_EVENT_CLASS(vfio_ccw_notsupp, > > > + TP_PROTO(struct subchannel_id schid), > > > + TP_ARGS(schid), > > > + > > > + TP_STRUCT__entry( > > > + __field_struct(struct subchannel_id, schid) > > > + ), > > > + > > > + TP_fast_assign( > > > + __entry->schid = schid; > > > + ), > > > + > > > + TP_printk("(schid 0.%x.%04X) request not supported", > > > + __entry->schid.ssid, __entry->schid.sch_no) > > > +); > > > > Especially as I don't plan to leave this unsupported for too long :) > > > > Just tracing the function is useful now and will stay useful in the > > future. > If we agree with ideas given above, we could: > 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno > 2. DEFINE_EVENT: >vfio_ccw_fam_io_helper >vfio_ccw_cp_prefetch >vfio_ccw_io_start >vfio_ccw_io_clear >vfio_ccw_io_halt Use a
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Tue, 27 Mar 2018 15:51:14 +0800 Dong Jia Shi wrote: > * Cornelia Huck [2018-03-26 15:59:02 +0200]: > > [...] > > > > @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private > > > *private, > > > > > > io_region->ret_code = cp_prefetch(>cp); > > > if (io_region->ret_code) { > > > + trace_vfio_ccw_cp_prefetch_failed(get_schid(private), > > > + io_region->ret_code); > > > cp_free(>cp); > > > goto err_out; > > > } > > > @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private > > > *private, > > > /* Start channel program and wait for I/O interrupt. */ > > > io_region->ret_code = fsm_io_helper(private); > > > if (io_region->ret_code) { > > > + trace_vfio_ccw_ssch_failed(get_schid(private), > > > +io_region->ret_code); > > > cp_free(>cp); > > > goto err_out; > > > } > > > @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private > > > *private, > > > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { > > > /* XXX: Handle halt. */ > > > io_region->ret_code = -EOPNOTSUPP; > > > + trace_vfio_ccw_halt(get_schid(private)); > > > goto err_out; > > > } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { > > > /* XXX: Handle clear. */ > > > io_region->ret_code = -EOPNOTSUPP; > > > + trace_vfio_ccw_clear(get_schid(private)); > > > goto err_out; > > > > Hmmm perhaps better to just trace the function (start/halt/clear) > > in any case? > > > I agree trace the function in any case is good. @Halil, opinion? > > But the traces for cp_prefetch() and fsm_io_helper() should also be > kept, since they are helpful to debug problem. So I tend to trace the > following in any case: > - cp_prefetch() > - fsm_io_helper() > - start > - halt > - clear OK, I was unclear :) I'd argue to keep the others, just replace the halt/clear tracing with tracing the function. > > > > } > > > > > > diff --git a/drivers/s390/cio/vfio_ccw_trace.h > > > b/drivers/s390/cio/vfio_ccw_trace.h > > > new file mode 100644 > > > index ..edd3321cd919 > > > --- /dev/null > > > +++ b/drivers/s390/cio/vfio_ccw_trace.h > > > @@ -0,0 +1,86 @@ > > > +/* SPDX-License-Identifier: GPL-2.0 > > > + * Tracepoints for vfio_ccw driver > > > + * > > > + * Copyright IBM Corp. 2018 > > > + * > > > + * Author(s): Dong Jia Shi > > > + *Halil Pasic > > > + */ > > > + > > > + > > > +#undef TRACE_SYSTEM > > > +#define TRACE_SYSTEM vfio_ccw > > > + > > > +#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ) > > > +#define _VFIO_CCW_TRACE_ > > > + > > > +#include > > > + > > > +TRACE_EVENT(vfio_ccw_cp_prefetch_failed, > > > + TP_PROTO(struct subchannel_id schid, int errno), > > > + TP_ARGS(schid, errno), > > > + > > > + TP_STRUCT__entry( > > > + __field_struct(struct subchannel_id, schid) > > > + __field(int, errno) > > > + ), > > > + > > > + TP_fast_assign( > > > + __entry->schid = schid; > > > + __entry->errno = errno; > > > + ), > > > + > > > + TP_printk("(schid 0.%x.%04X) translation failed (errno: %d)", > > > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno) > > > +); > > > + > > > +TRACE_EVENT(vfio_ccw_ssch_failed, > > > + TP_PROTO(struct subchannel_id schid, int errno), > > > + TP_ARGS(schid, errno), > > > + > > > + TP_STRUCT__entry( > > > + __field_struct(struct subchannel_id, schid) > > > + __field(int, errno) > > > + ), > > > + > > > + TP_fast_assign( > > > + __entry->schid = schid; > > > + __entry->errno = errno; > > > + ), > > > + > > > + TP_printk("(schid 0.%x.%04X) ssch failed (errno: %d)", > > > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno) > > > +); > > > + > > > +DECLARE_EVENT_CLASS(vfio_ccw_notsupp, > > > + TP_PROTO(struct subchannel_id schid), > > > + TP_ARGS(schid), > > > + > > > + TP_STRUCT__entry( > > > + __field_struct(struct subchannel_id, schid) > > > + ), > > > + > > > + TP_fast_assign( > > > + __entry->schid = schid; > > > + ), > > > + > > > + TP_printk("(schid 0.%x.%04X) request not supported", > > > + __entry->schid.ssid, __entry->schid.sch_no) > > > +); > > > > Especially as I don't plan to leave this unsupported for too long :) > > > > Just tracing the function is useful now and will stay useful in the > > future. > If we agree with ideas given above, we could: > 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno > 2. DEFINE_EVENT: >vfio_ccw_fam_io_helper >vfio_ccw_cp_prefetch >vfio_ccw_io_start >vfio_ccw_io_clear >vfio_ccw_io_halt Use a vfio_ccw_io_fctl tracepoint instead? > 3. add trace points in coresponding places > > > > > Another idea:
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Wed, 21 Mar 2018 03:08:22 +0100 Dong Jia Shiwrote: > From: Halil Pasic > > Add some tracepoints so we can inspect what is not working as is should. Tracepoints are certainly helpful (we can add more later). > > Signed-off-by: Halil Pasic > Signed-off-by: Dong Jia Shi > --- > drivers/s390/cio/Makefile | 1 + > drivers/s390/cio/vfio_ccw_fsm.c | 13 ++ > drivers/s390/cio/vfio_ccw_trace.h | 86 > +++ > 3 files changed, 100 insertions(+) > create mode 100644 drivers/s390/cio/vfio_ccw_trace.h > > diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile > index a070ef0efe65..f230516abb96 100644 > --- a/drivers/s390/cio/Makefile > +++ b/drivers/s390/cio/Makefile > @@ -5,6 +5,7 @@ > > # The following is required for define_trace.h to find ./trace.h > CFLAGS_trace.o := -I$(src) > +CFLAGS_vfio_ccw_fsm.o := -I$(src) > > obj-y += airq.o blacklist.o chsc.o cio.o css.o chp.o idset.o isc.o \ > fcx.o itcw.o crw.o ccwreq.o trace.o ioasm.o > diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c > index c30420c517b1..7ed27f90d741 100644 > --- a/drivers/s390/cio/vfio_ccw_fsm.c > +++ b/drivers/s390/cio/vfio_ccw_fsm.c > @@ -13,6 +13,9 @@ > #include "ioasm.h" > #include "vfio_ccw_private.h" > > +#define CREATE_TRACE_POINTS > +#include "vfio_ccw_trace.h" > + > static int fsm_io_helper(struct vfio_ccw_private *private) > { > struct subchannel *sch; > @@ -105,6 +108,10 @@ static void fsm_disabled_irq(struct vfio_ccw_private > *private, >*/ > cio_disable_subchannel(sch); > } > +inline struct subchannel_id get_schid(struct vfio_ccw_private *p) > +{ > + return p->sch->schid; > +} > > /* > * Deal with the ccw command request from the userspace. > @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > > io_region->ret_code = cp_prefetch(>cp); > if (io_region->ret_code) { > + trace_vfio_ccw_cp_prefetch_failed(get_schid(private), > + io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > /* Start channel program and wait for I/O interrupt. */ > io_region->ret_code = fsm_io_helper(private); > if (io_region->ret_code) { > + trace_vfio_ccw_ssch_failed(get_schid(private), > +io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { > /* XXX: Handle halt. */ > io_region->ret_code = -EOPNOTSUPP; > + trace_vfio_ccw_halt(get_schid(private)); > goto err_out; > } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { > /* XXX: Handle clear. */ > io_region->ret_code = -EOPNOTSUPP; > + trace_vfio_ccw_clear(get_schid(private)); > goto err_out; Hmmm perhaps better to just trace the function (start/halt/clear) in any case? > } > > diff --git a/drivers/s390/cio/vfio_ccw_trace.h > b/drivers/s390/cio/vfio_ccw_trace.h > new file mode 100644 > index ..edd3321cd919 > --- /dev/null > +++ b/drivers/s390/cio/vfio_ccw_trace.h > @@ -0,0 +1,86 @@ > +/* SPDX-License-Identifier: GPL-2.0 > + * Tracepoints for vfio_ccw driver > + * > + * Copyright IBM Corp. 2018 > + * > + * Author(s): Dong Jia Shi > + *Halil Pasic > + */ > + > + > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM vfio_ccw > + > +#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ) > +#define _VFIO_CCW_TRACE_ > + > +#include > + > +TRACE_EVENT(vfio_ccw_cp_prefetch_failed, > + TP_PROTO(struct subchannel_id schid, int errno), > + TP_ARGS(schid, errno), > + > + TP_STRUCT__entry( > + __field_struct(struct subchannel_id, schid) > + __field(int, errno) > + ), > + > + TP_fast_assign( > + __entry->schid = schid; > + __entry->errno = errno; > + ), > + > + TP_printk("(schid 0.%x.%04X) translation failed (errno: %d)", > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno) > +); > + > +TRACE_EVENT(vfio_ccw_ssch_failed, > + TP_PROTO(struct subchannel_id schid, int errno), > + TP_ARGS(schid, errno), > + > + TP_STRUCT__entry( > + __field_struct(struct subchannel_id, schid) > + __field(int, errno) > + ), > + > +
Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
On Wed, 21 Mar 2018 03:08:22 +0100 Dong Jia Shi wrote: > From: Halil Pasic > > Add some tracepoints so we can inspect what is not working as is should. Tracepoints are certainly helpful (we can add more later). > > Signed-off-by: Halil Pasic > Signed-off-by: Dong Jia Shi > --- > drivers/s390/cio/Makefile | 1 + > drivers/s390/cio/vfio_ccw_fsm.c | 13 ++ > drivers/s390/cio/vfio_ccw_trace.h | 86 > +++ > 3 files changed, 100 insertions(+) > create mode 100644 drivers/s390/cio/vfio_ccw_trace.h > > diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile > index a070ef0efe65..f230516abb96 100644 > --- a/drivers/s390/cio/Makefile > +++ b/drivers/s390/cio/Makefile > @@ -5,6 +5,7 @@ > > # The following is required for define_trace.h to find ./trace.h > CFLAGS_trace.o := -I$(src) > +CFLAGS_vfio_ccw_fsm.o := -I$(src) > > obj-y += airq.o blacklist.o chsc.o cio.o css.o chp.o idset.o isc.o \ > fcx.o itcw.o crw.o ccwreq.o trace.o ioasm.o > diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c > index c30420c517b1..7ed27f90d741 100644 > --- a/drivers/s390/cio/vfio_ccw_fsm.c > +++ b/drivers/s390/cio/vfio_ccw_fsm.c > @@ -13,6 +13,9 @@ > #include "ioasm.h" > #include "vfio_ccw_private.h" > > +#define CREATE_TRACE_POINTS > +#include "vfio_ccw_trace.h" > + > static int fsm_io_helper(struct vfio_ccw_private *private) > { > struct subchannel *sch; > @@ -105,6 +108,10 @@ static void fsm_disabled_irq(struct vfio_ccw_private > *private, >*/ > cio_disable_subchannel(sch); > } > +inline struct subchannel_id get_schid(struct vfio_ccw_private *p) > +{ > + return p->sch->schid; > +} > > /* > * Deal with the ccw command request from the userspace. > @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > > io_region->ret_code = cp_prefetch(>cp); > if (io_region->ret_code) { > + trace_vfio_ccw_cp_prefetch_failed(get_schid(private), > + io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > /* Start channel program and wait for I/O interrupt. */ > io_region->ret_code = fsm_io_helper(private); > if (io_region->ret_code) { > + trace_vfio_ccw_ssch_failed(get_schid(private), > +io_region->ret_code); > cp_free(>cp); > goto err_out; > } > @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private > *private, > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { > /* XXX: Handle halt. */ > io_region->ret_code = -EOPNOTSUPP; > + trace_vfio_ccw_halt(get_schid(private)); > goto err_out; > } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { > /* XXX: Handle clear. */ > io_region->ret_code = -EOPNOTSUPP; > + trace_vfio_ccw_clear(get_schid(private)); > goto err_out; Hmmm perhaps better to just trace the function (start/halt/clear) in any case? > } > > diff --git a/drivers/s390/cio/vfio_ccw_trace.h > b/drivers/s390/cio/vfio_ccw_trace.h > new file mode 100644 > index ..edd3321cd919 > --- /dev/null > +++ b/drivers/s390/cio/vfio_ccw_trace.h > @@ -0,0 +1,86 @@ > +/* SPDX-License-Identifier: GPL-2.0 > + * Tracepoints for vfio_ccw driver > + * > + * Copyright IBM Corp. 2018 > + * > + * Author(s): Dong Jia Shi > + *Halil Pasic > + */ > + > + > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM vfio_ccw > + > +#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ) > +#define _VFIO_CCW_TRACE_ > + > +#include > + > +TRACE_EVENT(vfio_ccw_cp_prefetch_failed, > + TP_PROTO(struct subchannel_id schid, int errno), > + TP_ARGS(schid, errno), > + > + TP_STRUCT__entry( > + __field_struct(struct subchannel_id, schid) > + __field(int, errno) > + ), > + > + TP_fast_assign( > + __entry->schid = schid; > + __entry->errno = errno; > + ), > + > + TP_printk("(schid 0.%x.%04X) translation failed (errno: %d)", > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno) > +); > + > +TRACE_EVENT(vfio_ccw_ssch_failed, > + TP_PROTO(struct subchannel_id schid, int errno), > + TP_ARGS(schid, errno), > + > + TP_STRUCT__entry( > + __field_struct(struct subchannel_id, schid) > + __field(int, errno) > + ), > + > + TP_fast_assign( > + __entry->schid = schid; > + __entry->errno = errno; > + ), > + > + TP_printk("(schid 0.%x.%04X) ssch failed (errno: %d)", > +