Re: [RFCv4,19/21] media: vim2m: add request support
On Wed, Mar 14, 2018 at 10:25 PM, Paul Kocialkowskiwrote: > Hi, > > On Tue, 2018-03-13 at 19:24 +0900, Alexandre Courbot wrote: >> On Fri, Mar 9, 2018 at 11:35 PM, Paul Kocialkowski >> wrote: >> > Hi, >> > >> > On Thu, 2018-03-08 at 22:48 +0900, Alexandre Courbot wrote: >> > > Hi Paul! >> > > >> > > Thanks a lot for taking the time to try this! I am also working on >> > > getting it to work with an actual driver, but you apparently found >> > > rough edges that I missed. >> > > >> > > On Thu, Mar 8, 2018 at 1:37 AM, Paul Kocialkowski >> > > wrote: >> > > > On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: > > [...] > >> > > > > +static int vim2m_request_submit(struct media_request *req, >> > > > > + struct media_request_entity_data >> > > > > *_data) >> > > > > +{ >> > > > > + struct v4l2_request_entity_data *data; >> > > > > + >> > > > > + data = to_v4l2_entity_data(_data); >> > > > >> > > > We need to call v4l2_m2m_try_schedule here so that m2m >> > > > scheduling >> > > > can >> > > > happen when only 2 buffers were queued and no other action was >> > > > taken >> > > > from usespace. In that scenario, m2m scheduling currently >> > > > doesn't >> > > > happen. >> > > >> > > I don't think I understand the sequence of events that results in >> > > v4l2_m2m_try_schedule() not being called. Do you mean something >> > > like: >> > > >> > > * >> > > * QBUF on output queue with request set >> > > * QBUF on capture queue >> > > * SUBMIT_REQUEST >> > > >> > > ? >> > > >> > > The call to vb2_request_submit() right after should trigger >> > > v4l2_m2m_try_schedule(), since the buffers associated to the >> > > request >> > > will enter the vb2 queue and be passed to the m2m framework, which >> > > will then call v4l2_m2m_try_schedule(). Or maybe you are thinking >> > > about a different sequence of events? >> > >> > This is indeed the sequence of events that I'm seeing, but the >> > scheduling call simply did not happen on vb2_request_submit. I >> > suppose I will need to investigate some more to find out exactly >> > why. >> > >> > IIRC, the m2m qbuf function is called (and fails to schedule) when >> > the >> > ioctl happens, not when the task is submitted. >> > >> > This issue is seen with vim2m as well as the rencently-submitted >> > sunxi- >> > cedrus driver (with the in-driver calls to v4l2_m2m_try_schedule >> > removed, obviously). If needs be, I could provide a standalone test >> > program to reproduce it. >> >> If you have a standalone program that can reproduce this on vim2m, >> then I would like to see it indeed, if only to understand what I have >> missed. > > You can find the test file for this use case at: > https://gist.github.com/paulkocialkowski/4cfa350e1bbe8e3bf714480bba83ea72 Thanks, I have been able to see what the issue was. One indeed needs to call v4l2_m2m_try_schedule() when the request is queued, since the driver qbuf() hook does not do it automatically.
Re: [RFCv4,19/21] media: vim2m: add request support
Hi, On Tue, 2018-03-13 at 19:24 +0900, Alexandre Courbot wrote: > On Fri, Mar 9, 2018 at 11:35 PM, Paul Kocialkowski >wrote: > > Hi, > > > > On Thu, 2018-03-08 at 22:48 +0900, Alexandre Courbot wrote: > > > Hi Paul! > > > > > > Thanks a lot for taking the time to try this! I am also working on > > > getting it to work with an actual driver, but you apparently found > > > rough edges that I missed. > > > > > > On Thu, Mar 8, 2018 at 1:37 AM, Paul Kocialkowski > > > wrote: > > > > On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: [...] > > > > > +static int vim2m_request_submit(struct media_request *req, > > > > > + struct media_request_entity_data > > > > > *_data) > > > > > +{ > > > > > + struct v4l2_request_entity_data *data; > > > > > + > > > > > + data = to_v4l2_entity_data(_data); > > > > > > > > We need to call v4l2_m2m_try_schedule here so that m2m > > > > scheduling > > > > can > > > > happen when only 2 buffers were queued and no other action was > > > > taken > > > > from usespace. In that scenario, m2m scheduling currently > > > > doesn't > > > > happen. > > > > > > I don't think I understand the sequence of events that results in > > > v4l2_m2m_try_schedule() not being called. Do you mean something > > > like: > > > > > > * > > > * QBUF on output queue with request set > > > * QBUF on capture queue > > > * SUBMIT_REQUEST > > > > > > ? > > > > > > The call to vb2_request_submit() right after should trigger > > > v4l2_m2m_try_schedule(), since the buffers associated to the > > > request > > > will enter the vb2 queue and be passed to the m2m framework, which > > > will then call v4l2_m2m_try_schedule(). Or maybe you are thinking > > > about a different sequence of events? > > > > This is indeed the sequence of events that I'm seeing, but the > > scheduling call simply did not happen on vb2_request_submit. I > > suppose I will need to investigate some more to find out exactly > > why. > > > > IIRC, the m2m qbuf function is called (and fails to schedule) when > > the > > ioctl happens, not when the task is submitted. > > > > This issue is seen with vim2m as well as the rencently-submitted > > sunxi- > > cedrus driver (with the in-driver calls to v4l2_m2m_try_schedule > > removed, obviously). If needs be, I could provide a standalone test > > program to reproduce it. > > If you have a standalone program that can reproduce this on vim2m, > then I would like to see it indeed, if only to understand what I have > missed. You can find the test file for this use case at: https://gist.github.com/paulkocialkowski/4cfa350e1bbe8e3bf714480bba83ea72 Cheers! -- Paul Kocialkowski, Bootlin (formerly Free Electrons) Embedded Linux and kernel engineering https://bootlin.com signature.asc Description: This is a digitally signed message part
Re: [RFCv4,19/21] media: vim2m: add request support
On Fri, Mar 9, 2018 at 11:35 PM, Paul Kocialkowskiwrote: > Hi, > > On Thu, 2018-03-08 at 22:48 +0900, Alexandre Courbot wrote: >> Hi Paul! >> >> Thanks a lot for taking the time to try this! I am also working on >> getting it to work with an actual driver, but you apparently found >> rough edges that I missed. >> >> On Thu, Mar 8, 2018 at 1:37 AM, Paul Kocialkowski >> wrote: >> > Hi, >> > >> > First off, I'd like to take the occasion to say thank-you for your >> > work. >> > This is a major piece of plumbing that is required for me to add >> > support >> > for the Allwinner CedarX VPU hardware in upstream Linux. Other >> > drivers, >> > such as tegra-vde (that was recently merged in staging) are also >> > badly >> > in need of this API. >> > >> > I have a few comments based on my experience integrating this >> > request >> > API with the Cedrus VPU driver (and the associated libva backend), >> > that >> > also concern the vim2m driver. >> > >> > On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: >> > > Set the necessary ops for supporting requests in vim2m. >> > > >> > > Signed-off-by: Alexandre Courbot >> > > --- >> > > drivers/media/platform/Kconfig | 1 + >> > > drivers/media/platform/vim2m.c | 75 >> > > ++ >> > > 2 files changed, 76 insertions(+) >> > > >> > > diff --git a/drivers/media/platform/Kconfig >> > > b/drivers/media/platform/Kconfig >> > > index 614fbef08ddc..09be0b5f9afe 100644 >> > > --- a/drivers/media/platform/Kconfig >> > > +++ b/drivers/media/platform/Kconfig >> > >> > [...] >> > >> > > +static int vim2m_request_submit(struct media_request *req, >> > > + struct media_request_entity_data >> > > *_data) >> > > +{ >> > > + struct v4l2_request_entity_data *data; >> > > + >> > > + data = to_v4l2_entity_data(_data); >> > >> > We need to call v4l2_m2m_try_schedule here so that m2m scheduling >> > can >> > happen when only 2 buffers were queued and no other action was taken >> > from usespace. In that scenario, m2m scheduling currently doesn't >> > happen. >> >> I don't think I understand the sequence of events that results in >> v4l2_m2m_try_schedule() not being called. Do you mean something like: >> >> * >> * QBUF on output queue with request set >> * QBUF on capture queue >> * SUBMIT_REQUEST >> >> ? >> >> The call to vb2_request_submit() right after should trigger >> v4l2_m2m_try_schedule(), since the buffers associated to the request >> will enter the vb2 queue and be passed to the m2m framework, which >> will then call v4l2_m2m_try_schedule(). Or maybe you are thinking >> about a different sequence of events? > > This is indeed the sequence of events that I'm seeing, but the > scheduling call simply did not happen on vb2_request_submit. I suppose I will > need to investigate some more to find out exactly why. > > IIRC, the m2m qbuf function is called (and fails to schedule) when the > ioctl happens, not when the task is submitted. > > This issue is seen with vim2m as well as the rencently-submitted sunxi- > cedrus driver (with the in-driver calls to v4l2_m2m_try_schedule > removed, obviously). If needs be, I could provide a standalone test > program to reproduce it. If you have a standalone program that can reproduce this on vim2m, then I would like to see it indeed, if only to understand what I have missed. Thanks, Alex.
Re: [RFCv4,19/21] media: vim2m: add request support
On 12.03.2018 15:32, Alexandre Courbot wrote: > On Mon, Mar 12, 2018 at 5:15 PM, Tomasz Figawrote: >> Hi Paul, Dmitry, >> >> On Mon, Mar 12, 2018 at 5:10 PM, Paul Kocialkowski >> wrote: >>> Hi, >>> >>> On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: Hello, On 07.03.2018 19:37, Paul Kocialkowski wrote: > Hi, > > First off, I'd like to take the occasion to say thank-you for your > work. > This is a major piece of plumbing that is required for me to add > support > for the Allwinner CedarX VPU hardware in upstream Linux. Other > drivers, > such as tegra-vde (that was recently merged in staging) are also > badly > in need of this API. Certainly it would be good to have a common UAPI. Yet I haven't got my hands on trying to implement the V4L interface for the tegra-vde driver, but I've taken a look at Cedrus driver and for now I've one question: Would it be possible (or maybe already is) to have a single IOCTL that takes input/output buffers with codec parameters, processes the request(s) and returns to userspace when everything is done? Having 5 context switches for a single frame decode (like Cedrus VAAPI driver does) looks like a bit of overhead. >>> >>> The V4L2 interface exposes ioctls for differents actions and I don't >>> think there's a combined ioctl for this. The request API was introduced >>> precisely because we need to have consistency between the various ioctls >>> needed for each frame. Maybe one single (atomic) ioctl would have worked >>> too, but that's apparently not how the V4L2 API was designed. >>> >>> I don't think there is any particular overhead caused by having n ioctls >>> instead of a single one. At least that would be very surprising IMHO. >> >> Well, there is small syscall overhead, which normally shouldn't be >> very painful, although with all the speculative execution hardening, >> can't be sure of anything anymore. :) >> >> Hans and Alex can correct me if I'm wrong, but I believe there is a >> more atomic-like API being planned, which would only need one IOCTL to >> do everything. However, that would be a more serious change to the >> V4L2 interfaces, so should be decoupled from Request API itself. > > Indeed, we discussed the possibility to setup and submit requests in > one syscall, similarly (at least in spirit) to the DRM atomic API. > > This has only been discussed though, and as a feature to consider > *after* the request API is merged for codecs (as the more complex > camera use-cases would benefit more from it). As Tomasz mentioned, the > overhead of ioctls is somehow negligible compared to the workload of > the encoding/decoding itself, although I suppose it can still add up. > The main advantage I can see for this is a simpler and less > error-prone setup of requests for user-space. Indeed, atomic API should be nicer from a userspace perspective and a bit more optimal for at least some of workloads. Would be good to have it.
Re: [RFCv4,19/21] media: vim2m: add request support
On Mon, Mar 12, 2018 at 5:15 PM, Tomasz Figawrote: > Hi Paul, Dmitry, > > On Mon, Mar 12, 2018 at 5:10 PM, Paul Kocialkowski > wrote: >> Hi, >> >> On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: >>> Hello, >>> >>> On 07.03.2018 19:37, Paul Kocialkowski wrote: >>> > Hi, >>> > >>> > First off, I'd like to take the occasion to say thank-you for your >>> > work. >>> > This is a major piece of plumbing that is required for me to add >>> > support >>> > for the Allwinner CedarX VPU hardware in upstream Linux. Other >>> > drivers, >>> > such as tegra-vde (that was recently merged in staging) are also >>> > badly >>> > in need of this API. >>> >>> Certainly it would be good to have a common UAPI. Yet I haven't got my >>> hands on >>> trying to implement the V4L interface for the tegra-vde driver, but >>> I've taken a >>> look at Cedrus driver and for now I've one question: >>> >>> Would it be possible (or maybe already is) to have a single IOCTL that >>> takes input/output buffers with codec parameters, processes the >>> request(s) and returns to userspace when everything is done? Having 5 >>> context switches for a single frame decode (like Cedrus VAAPI driver >>> does) looks like a bit of overhead. >> >> The V4L2 interface exposes ioctls for differents actions and I don't >> think there's a combined ioctl for this. The request API was introduced >> precisely because we need to have consistency between the various ioctls >> needed for each frame. Maybe one single (atomic) ioctl would have worked >> too, but that's apparently not how the V4L2 API was designed. >> >> I don't think there is any particular overhead caused by having n ioctls >> instead of a single one. At least that would be very surprising IMHO. > > Well, there is small syscall overhead, which normally shouldn't be > very painful, although with all the speculative execution hardening, > can't be sure of anything anymore. :) > > Hans and Alex can correct me if I'm wrong, but I believe there is a > more atomic-like API being planned, which would only need one IOCTL to > do everything. However, that would be a more serious change to the > V4L2 interfaces, so should be decoupled from Request API itself. Indeed, we discussed the possibility to setup and submit requests in one syscall, similarly (at least in spirit) to the DRM atomic API. This has only been discussed though, and as a feature to consider *after* the request API is merged for codecs (as the more complex camera use-cases would benefit more from it). As Tomasz mentioned, the overhead of ioctls is somehow negligible compared to the workload of the encoding/decoding itself, although I suppose it can still add up. The main advantage I can see for this is a simpler and less error-prone setup of requests for user-space.
Re: [RFCv4,19/21] media: vim2m: add request support
On 12.03.2018 11:29, Tomasz Figa wrote: > On Mon, Mar 12, 2018 at 5:25 PM, Paul Kocialkowski >wrote: >> Hi, >> >> On Mon, 2018-03-12 at 17:15 +0900, Tomasz Figa wrote: >>> Hi Paul, Dmitry, >>> >>> On Mon, Mar 12, 2018 at 5:10 PM, Paul Kocialkowski >>> wrote: Hi, On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: > Hello, > > On 07.03.2018 19:37, Paul Kocialkowski wrote: >> Hi, >> >> First off, I'd like to take the occasion to say thank-you for >> your >> work. >> This is a major piece of plumbing that is required for me to add >> support >> for the Allwinner CedarX VPU hardware in upstream Linux. Other >> drivers, >> such as tegra-vde (that was recently merged in staging) are also >> badly >> in need of this API. > > Certainly it would be good to have a common UAPI. Yet I haven't > got my > hands on > trying to implement the V4L interface for the tegra-vde driver, > but > I've taken a > look at Cedrus driver and for now I've one question: > > Would it be possible (or maybe already is) to have a single IOCTL > that > takes input/output buffers with codec parameters, processes the > request(s) and returns to userspace when everything is done? > Having 5 > context switches for a single frame decode (like Cedrus VAAPI > driver > does) looks like a bit of overhead. The V4L2 interface exposes ioctls for differents actions and I don't think there's a combined ioctl for this. The request API was introduced precisely because we need to have consistency between the various ioctls needed for each frame. Maybe one single (atomic) ioctl would have worked too, but that's apparently not how the V4L2 API was designed. I don't think there is any particular overhead caused by having n ioctls instead of a single one. At least that would be very surprising IMHO. >>> >>> Well, there is small syscall overhead, which normally shouldn't be >>> very painful, although with all the speculative execution hardening, >>> can't be sure of anything anymore. :) >> >> Oh, my mistake then, I had it in mind that it is not really something >> noticeable. Hopefully, it won't be a limiting factor in our cases. > > With typical frame rates achievable by hardware codecs, I doubt that > it would be a limiting factor. We're using a similar API (a WiP > version of pre-Request API prototype from long ago) in Chrome OS > already without any performance issues. Thank you very much for the answers! The syscalls overhead is miserable in comparison to the rest of decoding, though I wanted to clarify whether there is a way to avoid it. Atomic API sounds like something that would suit well for that.
Re: [RFCv4,19/21] media: vim2m: add request support
On Mon, Mar 12, 2018 at 5:25 PM, Paul Kocialkowskiwrote: > Hi, > > On Mon, 2018-03-12 at 17:15 +0900, Tomasz Figa wrote: >> Hi Paul, Dmitry, >> >> On Mon, Mar 12, 2018 at 5:10 PM, Paul Kocialkowski >> wrote: >> > Hi, >> > >> > On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: >> > > Hello, >> > > >> > > On 07.03.2018 19:37, Paul Kocialkowski wrote: >> > > > Hi, >> > > > >> > > > First off, I'd like to take the occasion to say thank-you for >> > > > your >> > > > work. >> > > > This is a major piece of plumbing that is required for me to add >> > > > support >> > > > for the Allwinner CedarX VPU hardware in upstream Linux. Other >> > > > drivers, >> > > > such as tegra-vde (that was recently merged in staging) are also >> > > > badly >> > > > in need of this API. >> > > >> > > Certainly it would be good to have a common UAPI. Yet I haven't >> > > got my >> > > hands on >> > > trying to implement the V4L interface for the tegra-vde driver, >> > > but >> > > I've taken a >> > > look at Cedrus driver and for now I've one question: >> > > >> > > Would it be possible (or maybe already is) to have a single IOCTL >> > > that >> > > takes input/output buffers with codec parameters, processes the >> > > request(s) and returns to userspace when everything is done? >> > > Having 5 >> > > context switches for a single frame decode (like Cedrus VAAPI >> > > driver >> > > does) looks like a bit of overhead. >> > >> > The V4L2 interface exposes ioctls for differents actions and I don't >> > think there's a combined ioctl for this. The request API was >> > introduced >> > precisely because we need to have consistency between the various >> > ioctls >> > needed for each frame. Maybe one single (atomic) ioctl would have >> > worked >> > too, but that's apparently not how the V4L2 API was designed. >> > >> > I don't think there is any particular overhead caused by having n >> > ioctls >> > instead of a single one. At least that would be very surprising >> > IMHO. >> >> Well, there is small syscall overhead, which normally shouldn't be >> very painful, although with all the speculative execution hardening, >> can't be sure of anything anymore. :) > > Oh, my mistake then, I had it in mind that it is not really something > noticeable. Hopefully, it won't be a limiting factor in our cases. With typical frame rates achievable by hardware codecs, I doubt that it would be a limiting factor. We're using a similar API (a WiP version of pre-Request API prototype from long ago) in Chrome OS already without any performance issues.
Re: [RFCv4,19/21] media: vim2m: add request support
Hi, On Mon, 2018-03-12 at 17:15 +0900, Tomasz Figa wrote: > Hi Paul, Dmitry, > > On Mon, Mar 12, 2018 at 5:10 PM, Paul Kocialkowski >wrote: > > Hi, > > > > On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: > > > Hello, > > > > > > On 07.03.2018 19:37, Paul Kocialkowski wrote: > > > > Hi, > > > > > > > > First off, I'd like to take the occasion to say thank-you for > > > > your > > > > work. > > > > This is a major piece of plumbing that is required for me to add > > > > support > > > > for the Allwinner CedarX VPU hardware in upstream Linux. Other > > > > drivers, > > > > such as tegra-vde (that was recently merged in staging) are also > > > > badly > > > > in need of this API. > > > > > > Certainly it would be good to have a common UAPI. Yet I haven't > > > got my > > > hands on > > > trying to implement the V4L interface for the tegra-vde driver, > > > but > > > I've taken a > > > look at Cedrus driver and for now I've one question: > > > > > > Would it be possible (or maybe already is) to have a single IOCTL > > > that > > > takes input/output buffers with codec parameters, processes the > > > request(s) and returns to userspace when everything is done? > > > Having 5 > > > context switches for a single frame decode (like Cedrus VAAPI > > > driver > > > does) looks like a bit of overhead. > > > > The V4L2 interface exposes ioctls for differents actions and I don't > > think there's a combined ioctl for this. The request API was > > introduced > > precisely because we need to have consistency between the various > > ioctls > > needed for each frame. Maybe one single (atomic) ioctl would have > > worked > > too, but that's apparently not how the V4L2 API was designed. > > > > I don't think there is any particular overhead caused by having n > > ioctls > > instead of a single one. At least that would be very surprising > > IMHO. > > Well, there is small syscall overhead, which normally shouldn't be > very painful, although with all the speculative execution hardening, > can't be sure of anything anymore. :) Oh, my mistake then, I had it in mind that it is not really something noticeable. Hopefully, it won't be a limiting factor in our cases. > Hans and Alex can correct me if I'm wrong, but I believe there is a > more atomic-like API being planned, which would only need one IOCTL to > do everything. However, that would be a more serious change to the > V4L2 interfaces, so should be decoupled from Request API itself. > > Best regards, > Tomasz -- Paul Kocialkowski, Bootlin (formerly Free Electrons) Embedded Linux and kernel engineering https://bootlin.com signature.asc Description: This is a digitally signed message part
Re: [RFCv4,19/21] media: vim2m: add request support
Hi Paul, Dmitry, On Mon, Mar 12, 2018 at 5:10 PM, Paul Kocialkowskiwrote: > Hi, > > On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: >> Hello, >> >> On 07.03.2018 19:37, Paul Kocialkowski wrote: >> > Hi, >> > >> > First off, I'd like to take the occasion to say thank-you for your >> > work. >> > This is a major piece of plumbing that is required for me to add >> > support >> > for the Allwinner CedarX VPU hardware in upstream Linux. Other >> > drivers, >> > such as tegra-vde (that was recently merged in staging) are also >> > badly >> > in need of this API. >> >> Certainly it would be good to have a common UAPI. Yet I haven't got my >> hands on >> trying to implement the V4L interface for the tegra-vde driver, but >> I've taken a >> look at Cedrus driver and for now I've one question: >> >> Would it be possible (or maybe already is) to have a single IOCTL that >> takes input/output buffers with codec parameters, processes the >> request(s) and returns to userspace when everything is done? Having 5 >> context switches for a single frame decode (like Cedrus VAAPI driver >> does) looks like a bit of overhead. > > The V4L2 interface exposes ioctls for differents actions and I don't > think there's a combined ioctl for this. The request API was introduced > precisely because we need to have consistency between the various ioctls > needed for each frame. Maybe one single (atomic) ioctl would have worked > too, but that's apparently not how the V4L2 API was designed. > > I don't think there is any particular overhead caused by having n ioctls > instead of a single one. At least that would be very surprising IMHO. Well, there is small syscall overhead, which normally shouldn't be very painful, although with all the speculative execution hardening, can't be sure of anything anymore. :) Hans and Alex can correct me if I'm wrong, but I believe there is a more atomic-like API being planned, which would only need one IOCTL to do everything. However, that would be a more serious change to the V4L2 interfaces, so should be decoupled from Request API itself. Best regards, Tomasz
Re: [RFCv4,19/21] media: vim2m: add request support
Hi, On Sun, 2018-03-11 at 22:42 +0300, Dmitry Osipenko wrote: > Hello, > > On 07.03.2018 19:37, Paul Kocialkowski wrote: > > Hi, > > > > First off, I'd like to take the occasion to say thank-you for your > > work. > > This is a major piece of plumbing that is required for me to add > > support > > for the Allwinner CedarX VPU hardware in upstream Linux. Other > > drivers, > > such as tegra-vde (that was recently merged in staging) are also > > badly > > in need of this API. > > Certainly it would be good to have a common UAPI. Yet I haven't got my > hands on > trying to implement the V4L interface for the tegra-vde driver, but > I've taken a > look at Cedrus driver and for now I've one question: > > Would it be possible (or maybe already is) to have a single IOCTL that > takes input/output buffers with codec parameters, processes the > request(s) and returns to userspace when everything is done? Having 5 > context switches for a single frame decode (like Cedrus VAAPI driver > does) looks like a bit of overhead. The V4L2 interface exposes ioctls for differents actions and I don't think there's a combined ioctl for this. The request API was introduced precisely because we need to have consistency between the various ioctls needed for each frame. Maybe one single (atomic) ioctl would have worked too, but that's apparently not how the V4L2 API was designed. I don't think there is any particular overhead caused by having n ioctls instead of a single one. At least that would be very surprising IMHO. > > I have a few comments based on my experience integrating this > > request API with the Cedrus VPU driver (and the associated libva > > backend), that also concern the vim2m driver. > > > > On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: > > > Set the necessary ops for supporting requests in vim2m. > > > > > > Signed-off-by: Alexandre Courbot> > > --- > > > drivers/media/platform/Kconfig | 1 + > > > drivers/media/platform/vim2m.c | 75 > > > ++ > > > 2 files changed, 76 insertions(+) > > > > > > diff --git a/drivers/media/platform/Kconfig > > > b/drivers/media/platform/Kconfig > > > index 614fbef08ddc..09be0b5f9afe 100644 > > > --- a/drivers/media/platform/Kconfig > > > +++ b/drivers/media/platform/Kconfig > > > > [...] > > > > > +static int vim2m_request_submit(struct media_request *req, > > > + struct media_request_entity_data > > > *_data) > > > +{ > > > + struct v4l2_request_entity_data *data; > > > + > > > + data = to_v4l2_entity_data(_data); > > > > We need to call v4l2_m2m_try_schedule here so that m2m scheduling > > can > > happen when only 2 buffers were queued and no other action was taken > > from usespace. In that scenario, m2m scheduling currently doesn't > > happen. > > > > However, this requires access to the m2m context, which is not easy > > to > > get from req or _data. I'm not sure that some container_of magic > > would > > even do the trick here. > > > > > + return vb2_request_submit(data); > > > > vb2_request_submit does not lock the associated request mutex > > although > > it accesses the associated queued buffers list, which I believe this > > mutex is supposed to protect. > > > > We could either wrap this call with media_request_lock(req) and > > media_request_unlock(req) or have the lock in the function itself, > > which > > would require passing it the req pointer. > > > > The latter would probably be safer for future use of the function. > > > > > +} > > > + > > > +static const struct media_request_entity_ops > > > vim2m_request_entity_ops > > > = { > > > + .data_alloc = vim2m_entity_data_alloc, > > > + .data_free = v4l2_request_entity_data_free, > > > + .submit = vim2m_request_submit, > > > +}; > > > + > > > /* > > > * File operations > > > */ > > > @@ -900,6 +967,9 @@ static int vim2m_open(struct file *file) > > > ctx->dev = dev; > > > hdl = >hdl; > > > v4l2_ctrl_handler_init(hdl, 4); > > > + v4l2_request_entity_init(>req_entity, > > > _request_entity_ops, > > > + >dev->vfd); > > > + ctx->fh.entity = >req_entity.base; > > > v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_HFLIP, > > > 0, 1, > > > 1, 0); > > > v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_VFLIP, > > > 0, 1, > > > 1, 0); > > > v4l2_ctrl_new_custom(hdl, _ctrl_trans_time_msec, > > > NULL); > > > @@ -999,6 +1069,9 @@ static int vim2m_probe(struct platform_device > > > *pdev) > > > if (!dev) > > > return -ENOMEM; > > > > > > + v4l2_request_mgr_init(>req_mgr, >vfd, > > > + _request_ops); > > > + > > > spin_lock_init(>irqlock); > > > > > > ret = v4l2_device_register(>dev, >v4l2_dev); > > > @@ -1012,6 +1085,7 @@ static int vim2m_probe(struct > > > platform_device > > > *pdev) > > > vfd = >vfd; > > > vfd->lock = >dev_mutex; > > > vfd->v4l2_dev = >v4l2_dev; > > > + vfd->req_mgr = >req_mgr.base; > > > > > > ret =
Re: [RFCv4,19/21] media: vim2m: add request support
Hello, On 07.03.2018 19:37, Paul Kocialkowski wrote: > Hi, > > First off, I'd like to take the occasion to say thank-you for your work. > This is a major piece of plumbing that is required for me to add support > for the Allwinner CedarX VPU hardware in upstream Linux. Other drivers, > such as tegra-vde (that was recently merged in staging) are also badly > in need of this API. Certainly it would be good to have a common UAPI. Yet I haven't got my hands on trying to implement the V4L interface for the tegra-vde driver, but I've taken a look at Cedrus driver and for now I've one question: Would it be possible (or maybe already is) to have a single IOCTL that takes input/output buffers with codec parameters, processes the request(s) and returns to userspace when everything is done? Having 5 context switches for a single frame decode (like Cedrus VAAPI driver does) looks like a bit of overhead. > I have a few comments based on my experience integrating this request > API with the Cedrus VPU driver (and the associated libva backend), that > also concern the vim2m driver. > > On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: >> Set the necessary ops for supporting requests in vim2m. >> >> Signed-off-by: Alexandre Courbot>> --- >> drivers/media/platform/Kconfig | 1 + >> drivers/media/platform/vim2m.c | 75 >> ++ >> 2 files changed, 76 insertions(+) >> >> diff --git a/drivers/media/platform/Kconfig >> b/drivers/media/platform/Kconfig >> index 614fbef08ddc..09be0b5f9afe 100644 >> --- a/drivers/media/platform/Kconfig >> +++ b/drivers/media/platform/Kconfig > > [...] > >> +static int vim2m_request_submit(struct media_request *req, >> +struct media_request_entity_data >> *_data) >> +{ >> +struct v4l2_request_entity_data *data; >> + >> +data = to_v4l2_entity_data(_data); > > We need to call v4l2_m2m_try_schedule here so that m2m scheduling can > happen when only 2 buffers were queued and no other action was taken > from usespace. In that scenario, m2m scheduling currently doesn't > happen. > > However, this requires access to the m2m context, which is not easy to > get from req or _data. I'm not sure that some container_of magic would > even do the trick here. > >> +return vb2_request_submit(data); > > vb2_request_submit does not lock the associated request mutex although > it accesses the associated queued buffers list, which I believe this > mutex is supposed to protect. > > We could either wrap this call with media_request_lock(req) and > media_request_unlock(req) or have the lock in the function itself, which > would require passing it the req pointer. > > The latter would probably be safer for future use of the function. > >> +} >> + >> +static const struct media_request_entity_ops vim2m_request_entity_ops >> = { >> +.data_alloc = vim2m_entity_data_alloc, >> +.data_free = v4l2_request_entity_data_free, >> +.submit = vim2m_request_submit, >> +}; >> + >> /* >> * File operations >> */ >> @@ -900,6 +967,9 @@ static int vim2m_open(struct file *file) >> ctx->dev = dev; >> hdl = >hdl; >> v4l2_ctrl_handler_init(hdl, 4); >> +v4l2_request_entity_init(>req_entity, >> _request_entity_ops, >> + >dev->vfd); >> +ctx->fh.entity = >req_entity.base; >> v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_HFLIP, 0, 1, >> 1, 0); >> v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_VFLIP, 0, 1, >> 1, 0); >> v4l2_ctrl_new_custom(hdl, _ctrl_trans_time_msec, NULL); >> @@ -999,6 +1069,9 @@ static int vim2m_probe(struct platform_device >> *pdev) >> if (!dev) >> return -ENOMEM; >> >> +v4l2_request_mgr_init(>req_mgr, >vfd, >> + _request_ops); >> + >> spin_lock_init(>irqlock); >> >> ret = v4l2_device_register(>dev, >v4l2_dev); >> @@ -1012,6 +1085,7 @@ static int vim2m_probe(struct platform_device >> *pdev) >> vfd = >vfd; >> vfd->lock = >dev_mutex; >> vfd->v4l2_dev = >v4l2_dev; >> +vfd->req_mgr = >req_mgr.base; >> >> ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0); >> if (ret) { >> @@ -1054,6 +1128,7 @@ static int vim2m_remove(struct platform_device >> *pdev) >> del_timer_sync(>timer); >> video_unregister_device(>vfd); >> v4l2_device_unregister(>v4l2_dev); >> +v4l2_request_mgr_free(>req_mgr); >> >> return 0; >> } > -- Dmitry
Re: [RFCv4,19/21] media: vim2m: add request support
Hi, On Thu, 2018-03-08 at 22:48 +0900, Alexandre Courbot wrote: > Hi Paul! > > Thanks a lot for taking the time to try this! I am also working on > getting it to work with an actual driver, but you apparently found > rough edges that I missed. > > On Thu, Mar 8, 2018 at 1:37 AM, Paul Kocialkowski >wrote: > > Hi, > > > > First off, I'd like to take the occasion to say thank-you for your > > work. > > This is a major piece of plumbing that is required for me to add > > support > > for the Allwinner CedarX VPU hardware in upstream Linux. Other > > drivers, > > such as tegra-vde (that was recently merged in staging) are also > > badly > > in need of this API. > > > > I have a few comments based on my experience integrating this > > request > > API with the Cedrus VPU driver (and the associated libva backend), > > that > > also concern the vim2m driver. > > > > On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: > > > Set the necessary ops for supporting requests in vim2m. > > > > > > Signed-off-by: Alexandre Courbot > > > --- > > > drivers/media/platform/Kconfig | 1 + > > > drivers/media/platform/vim2m.c | 75 > > > ++ > > > 2 files changed, 76 insertions(+) > > > > > > diff --git a/drivers/media/platform/Kconfig > > > b/drivers/media/platform/Kconfig > > > index 614fbef08ddc..09be0b5f9afe 100644 > > > --- a/drivers/media/platform/Kconfig > > > +++ b/drivers/media/platform/Kconfig > > > > [...] > > > > > +static int vim2m_request_submit(struct media_request *req, > > > + struct media_request_entity_data > > > *_data) > > > +{ > > > + struct v4l2_request_entity_data *data; > > > + > > > + data = to_v4l2_entity_data(_data); > > > > We need to call v4l2_m2m_try_schedule here so that m2m scheduling > > can > > happen when only 2 buffers were queued and no other action was taken > > from usespace. In that scenario, m2m scheduling currently doesn't > > happen. > > I don't think I understand the sequence of events that results in > v4l2_m2m_try_schedule() not being called. Do you mean something like: > > * > * QBUF on output queue with request set > * QBUF on capture queue > * SUBMIT_REQUEST > > ? > > The call to vb2_request_submit() right after should trigger > v4l2_m2m_try_schedule(), since the buffers associated to the request > will enter the vb2 queue and be passed to the m2m framework, which > will then call v4l2_m2m_try_schedule(). Or maybe you are thinking > about a different sequence of events? This is indeed the sequence of events that I'm seeing, but the scheduling call simply did not happen on vb2_request_submit. I suppose I will need to investigate some more to find out exactly why. IIRC, the m2m qbuf function is called (and fails to schedule) when the ioctl happens, not when the task is submitted. This issue is seen with vim2m as well as the rencently-submitted sunxi- cedrus driver (with the in-driver calls to v4l2_m2m_try_schedule removed, obviously). If needs be, I could provide a standalone test program to reproduce it. > > However, this requires access to the m2m context, which is not easy > > to > > get from req or _data. I'm not sure that some container_of magic > > would > > even do the trick here. > > data_->entity will give you a pointer to the media_request_entity, > which is part of vim2m_ctx. You can thus get the m2m context by doing > container_of(data_->entity, struct vim2m_ctx, req_entity). See > vim2m_entity_data_alloc() for an example. Excellent, that's exactly what I was looking for! Thanks a lot and see the related patch I submitted earlier today. > > > + return vb2_request_submit(data); > > > > vb2_request_submit does not lock the associated request mutex > > although > > it accesses the associated queued buffers list, which I believe this > > mutex is supposed to protect. > > After a request is submitted, the data protected by the mutex can only > be accessed by the driver when it processes the request. It cannot be > modified concurrently, so I think we are safe here. Fait enough, that should be enough then. I've dropped this change. Cheers, Paul > I am also wondering whether the ioctl locking doesn't make the request > locking redundant. Request information can only be modified and > accessed through ioctls until it is submitted, and after that there > are no concurrent accesses. I need to think a bit more about it > though. > > Cheers, > Alex. > > > > > We could either wrap this call with media_request_lock(req) and > > media_request_unlock(req) or have the lock in the function itself, > > which > > would require passing it the req pointer. > > > > The latter would probably be safer for future use of the function. > > > > > +} > > > + > > > +static const struct media_request_entity_ops > > > vim2m_request_entity_ops > > > = { > > > + .data_alloc = vim2m_entity_data_alloc, > > > + .data_free
Re: [RFCv4,19/21] media: vim2m: add request support
Hi Paul! Thanks a lot for taking the time to try this! I am also working on getting it to work with an actual driver, but you apparently found rough edges that I missed. On Thu, Mar 8, 2018 at 1:37 AM, Paul Kocialkowskiwrote: > Hi, > > First off, I'd like to take the occasion to say thank-you for your work. > This is a major piece of plumbing that is required for me to add support > for the Allwinner CedarX VPU hardware in upstream Linux. Other drivers, > such as tegra-vde (that was recently merged in staging) are also badly > in need of this API. > > I have a few comments based on my experience integrating this request > API with the Cedrus VPU driver (and the associated libva backend), that > also concern the vim2m driver. > > On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: >> Set the necessary ops for supporting requests in vim2m. >> >> Signed-off-by: Alexandre Courbot >> --- >> drivers/media/platform/Kconfig | 1 + >> drivers/media/platform/vim2m.c | 75 >> ++ >> 2 files changed, 76 insertions(+) >> >> diff --git a/drivers/media/platform/Kconfig >> b/drivers/media/platform/Kconfig >> index 614fbef08ddc..09be0b5f9afe 100644 >> --- a/drivers/media/platform/Kconfig >> +++ b/drivers/media/platform/Kconfig > > [...] > >> +static int vim2m_request_submit(struct media_request *req, >> + struct media_request_entity_data >> *_data) >> +{ >> + struct v4l2_request_entity_data *data; >> + >> + data = to_v4l2_entity_data(_data); > > We need to call v4l2_m2m_try_schedule here so that m2m scheduling can > happen when only 2 buffers were queued and no other action was taken > from usespace. In that scenario, m2m scheduling currently doesn't > happen. I don't think I understand the sequence of events that results in v4l2_m2m_try_schedule() not being called. Do you mean something like: * * QBUF on output queue with request set * QBUF on capture queue * SUBMIT_REQUEST ? The call to vb2_request_submit() right after should trigger v4l2_m2m_try_schedule(), since the buffers associated to the request will enter the vb2 queue and be passed to the m2m framework, which will then call v4l2_m2m_try_schedule(). Or maybe you are thinking about a different sequence of events? > > However, this requires access to the m2m context, which is not easy to > get from req or _data. I'm not sure that some container_of magic would > even do the trick here. data_->entity will give you a pointer to the media_request_entity, which is part of vim2m_ctx. You can thus get the m2m context by doing container_of(data_->entity, struct vim2m_ctx, req_entity). See vim2m_entity_data_alloc() for an example. > >> + return vb2_request_submit(data); > > vb2_request_submit does not lock the associated request mutex although > it accesses the associated queued buffers list, which I believe this > mutex is supposed to protect. After a request is submitted, the data protected by the mutex can only be accessed by the driver when it processes the request. It cannot be modified concurrently, so I think we are safe here. I am also wondering whether the ioctl locking doesn't make the request locking redundant. Request information can only be modified and accessed through ioctls until it is submitted, and after that there are no concurrent accesses. I need to think a bit more about it though. Cheers, Alex. > > We could either wrap this call with media_request_lock(req) and > media_request_unlock(req) or have the lock in the function itself, which > would require passing it the req pointer. > > The latter would probably be safer for future use of the function. > >> +} >> + >> +static const struct media_request_entity_ops vim2m_request_entity_ops >> = { >> + .data_alloc = vim2m_entity_data_alloc, >> + .data_free = v4l2_request_entity_data_free, >> + .submit = vim2m_request_submit, >> +}; >> + >> /* >> * File operations >> */ >> @@ -900,6 +967,9 @@ static int vim2m_open(struct file *file) >> ctx->dev = dev; >> hdl = >hdl; >> v4l2_ctrl_handler_init(hdl, 4); >> + v4l2_request_entity_init(>req_entity, >> _request_entity_ops, >> + >dev->vfd); >> + ctx->fh.entity = >req_entity.base; >> v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_HFLIP, 0, 1, >> 1, 0); >> v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_VFLIP, 0, 1, >> 1, 0); >> v4l2_ctrl_new_custom(hdl, _ctrl_trans_time_msec, NULL); >> @@ -999,6 +1069,9 @@ static int vim2m_probe(struct platform_device >> *pdev) >> if (!dev) >> return -ENOMEM; >> >> + v4l2_request_mgr_init(>req_mgr, >vfd, >> + _request_ops); >> + >> spin_lock_init(>irqlock); >> >> ret = v4l2_device_register(>dev, >v4l2_dev); >> @@ -1012,6 +1085,7 @@ static int vim2m_probe(struct platform_device >> *pdev) >> vfd = >vfd; >> vfd->lock =
Re: [RFCv4,19/21] media: vim2m: add request support
Hi, First off, I'd like to take the occasion to say thank-you for your work. This is a major piece of plumbing that is required for me to add support for the Allwinner CedarX VPU hardware in upstream Linux. Other drivers, such as tegra-vde (that was recently merged in staging) are also badly in need of this API. I have a few comments based on my experience integrating this request API with the Cedrus VPU driver (and the associated libva backend), that also concern the vim2m driver. On Tue, 2018-02-20 at 13:44 +0900, Alexandre Courbot wrote: > Set the necessary ops for supporting requests in vim2m. > > Signed-off-by: Alexandre Courbot> --- > drivers/media/platform/Kconfig | 1 + > drivers/media/platform/vim2m.c | 75 > ++ > 2 files changed, 76 insertions(+) > > diff --git a/drivers/media/platform/Kconfig > b/drivers/media/platform/Kconfig > index 614fbef08ddc..09be0b5f9afe 100644 > --- a/drivers/media/platform/Kconfig > +++ b/drivers/media/platform/Kconfig [...] > +static int vim2m_request_submit(struct media_request *req, > + struct media_request_entity_data > *_data) > +{ > + struct v4l2_request_entity_data *data; > + > + data = to_v4l2_entity_data(_data); We need to call v4l2_m2m_try_schedule here so that m2m scheduling can happen when only 2 buffers were queued and no other action was taken from usespace. In that scenario, m2m scheduling currently doesn't happen. However, this requires access to the m2m context, which is not easy to get from req or _data. I'm not sure that some container_of magic would even do the trick here. > + return vb2_request_submit(data); vb2_request_submit does not lock the associated request mutex although it accesses the associated queued buffers list, which I believe this mutex is supposed to protect. We could either wrap this call with media_request_lock(req) and media_request_unlock(req) or have the lock in the function itself, which would require passing it the req pointer. The latter would probably be safer for future use of the function. > +} > + > +static const struct media_request_entity_ops vim2m_request_entity_ops > = { > + .data_alloc = vim2m_entity_data_alloc, > + .data_free = v4l2_request_entity_data_free, > + .submit = vim2m_request_submit, > +}; > + > /* > * File operations > */ > @@ -900,6 +967,9 @@ static int vim2m_open(struct file *file) > ctx->dev = dev; > hdl = >hdl; > v4l2_ctrl_handler_init(hdl, 4); > + v4l2_request_entity_init(>req_entity, > _request_entity_ops, > + >dev->vfd); > + ctx->fh.entity = >req_entity.base; > v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_HFLIP, 0, 1, > 1, 0); > v4l2_ctrl_new_std(hdl, _ctrl_ops, V4L2_CID_VFLIP, 0, 1, > 1, 0); > v4l2_ctrl_new_custom(hdl, _ctrl_trans_time_msec, NULL); > @@ -999,6 +1069,9 @@ static int vim2m_probe(struct platform_device > *pdev) > if (!dev) > return -ENOMEM; > > + v4l2_request_mgr_init(>req_mgr, >vfd, > + _request_ops); > + > spin_lock_init(>irqlock); > > ret = v4l2_device_register(>dev, >v4l2_dev); > @@ -1012,6 +1085,7 @@ static int vim2m_probe(struct platform_device > *pdev) > vfd = >vfd; > vfd->lock = >dev_mutex; > vfd->v4l2_dev = >v4l2_dev; > + vfd->req_mgr = >req_mgr.base; > > ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0); > if (ret) { > @@ -1054,6 +1128,7 @@ static int vim2m_remove(struct platform_device > *pdev) > del_timer_sync(>timer); > video_unregister_device(>vfd); > v4l2_device_unregister(>v4l2_dev); > + v4l2_request_mgr_free(>req_mgr); > > return 0; > } -- Paul Kocialkowski, Bootlin (formerly Free Electrons) Embedded Linux and kernel engineering https://bootlin.com signature.asc Description: This is a digitally signed message part