On Thu, Apr 27, 2023 at 11:20 PM Alexander Gordeev <alexander.gord...@opensynergy.com> wrote: > > On 26.04.23 07:52, Alexandre Courbot wrote: > > On Mon, Apr 24, 2023 at 4:52 PM Alexander Gordeev > > <alexander.gord...@opensynergy.com> wrote: > >> > >> On 21.04.23 18:01, Alexander Gordeev wrote: > >> > >>> On 21.04.23 06:02, Alexandre Courbot wrote: > >>> > >>>> * I am still not convinced that V4L2 is lacking from a security > >>>> perspective. It would take just one valid example to change my mind > >>>> (and no, the way the queues are named is not valid). And btw, if it > >>>> really introduces security issues, then this makes it invalid for > >>>> inclusion in virtio entirely, just not OpSy's hypervisor. > >>> > >>> > >>> I'd like to start with this and then answer everything else later. > >>> > >>> Let's compare VIRTIO_VIDEO_CMD_RESOURCE_QUEUE with > >>> VIDIOC_QBUF+VIDIOC_DQBUF. Including the parameters, of course. First, > >>> let's compare the word count to get a very rough estimate of complexity. > >>> I counted 585 words for VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, including the > >>> parameters. VIDIOC_QBUF+VIDIOC_DQBUF are defined together and take 1206 > >>> words, they both use struct v4l2_buffer as a parameter. The struct takes > >>> 2716 words to be described. So the whole thing takes 3922 words. This is > >>> 6.7 times more, than VIRTIO_VIDEO_CMD_RESOURCE_QUEUE. If we check the > >>> definitions of the structs, it is also very obvious, that V4L2 UAPI is > >>> almost like an order of magnitude more complex. > >> > >> > >> I think, it is best to add all the steps necessary to reproduce my > >> calculations just in case. > >> > >> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE is doing essentially the same thing as > >> VIDIOC_QBUF+VIDIOC_DQBUF, so we're comparing apples to apples (if we don't > >> forget to compare their parameters too). > >> > >> To get the word count for the VIRTIO_VIDEO_CMD_RESOURCE_QUEUE I opened the > >> rendered PDF of video section only from the first email in this thread. > >> Here is the link: > >> https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing > >> . Then I scrolled to page 11 and copied everything related a text file. > >> This is around two pages in the PDF. Then I removed page numbers from the > >> copied text and used 'wc -w' to count words. > >> > >> To get the word count for VIDIOC_QBUF+VIDIOC_DQBUF I opened this link: > >> https://docs.kernel.org/userspace-api/media/v4l/vidioc-qbuf.html . Then I > >> selected all the text except table of contents and did followed the same > >> procedure. > >> > >> To get the word count for struct v4l2_buffer and other types, that are > >> referenced from it, I opened this link: > >> https://docs.kernel.org/userspace-api/media/v4l/buffer.html#struct-v4l2-buffer > >> . Then I selected all the text except the table of contents and the text > >> above struct v4l2_buffer definition. The rest is the same. > >> > >> Also it's quite obvious if you look at them how much bigger struct > >> v4l2_buffer (including the referenced types) is compared to struct > >> virtio_video_resource_queue. > > > > You are comparing not the complexity of the structures but the > > verbosity of their documentation, which are written in a different > > style, format, and by different people. > > I agree to some extent. At least this benchmark is simple and it > provokes to actually go and look at the definitions, which IMO should be > enough to already see the difference. What could be a better benchmark? > Maybe counting the number of various fields and flags and enum cases, > that one has to read through?
I give you another point of comparison literally 5 lines down my email. > > > And the V4L2 page also > > contains the description of memory types, which is part of another > > section in the virtio-video spec. > > You mean only enum v4l2_memory? Or anything else too? > > > There is no way to draw a meaningful > > conclusion from this. > > > > If you want to compare, do it with how the structures are actually > > used. Here is how you would queue an input buffer with virtio-video: > > > > struct virtio_video_resource_queue queue_buf = { > > .cmd_type = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, > > .stream_id = 42, > > .queue_type = VIRTIO_VIDEO_QUEUE_TYPE_INPUT, > > .resource_id = 1, > > .timestamp = 0x10, > > .data_sizes = { > > [0] = 0x1000, > > }, > > }; > > > > Now the same with virtio-v4l2: > > > > struct virtio_v4l2_queue_buf queue_buf = { > > .cmd = VIRTIO_V4L2_CMD_IOCTL, > > .code = VIDIOC_QBUF, > > .session_id = 42, > > .buffer.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, > > .buffer.index = 1, > > .buffer.timestamp.tv_usec = 0x10, > > .buffer.memory = V4L2_MEMORY_MMAP, > > .planes = { > > [0] = { .bytesused = 0x1000 }, > > } > > }; > > > > In both cases, you pass a structure with some members set, and the > > rest to 0. The host receives basically the same thing - it's the same > > data! The only difference is how it is laid out. > > How do I know this from the text? How do I verify, that this is correct? > Can I be sure this code is going to work or not between any device and > any driver? > > With virtio-video you can just go to the spec/header file, take the > struct definition and simply set all the fields. > > With V4L2 UAPI I don't see any other way except going through the whole > buffer.html file, filtering potentially irrelevant stuff, then doing > some trials, and maybe looking into the device or driver code. Maybe > also asking you for an advice? The relevant fields for each decoding operation are clearly detailed in https://www.kernel.org/doc/html/v5.18/userspace-api/media/v4l/dev-decoder.html, which I already linked to multiple times. > I think it is worth trying to imagine you're a newcomer in the project. > Or a security auditor. > > So I think we can't draw conclusions about the spec quality from these > code samples. > > > Also as mentioned by Bart, the apparent simplicity of > > VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, which does not require a dequeue > > operation as the dequeue is sent with its response, is actually a > > fallacy: that design choice makes the specification simpler, but at > > the cost of more complexity on the device side and the potential to > > starve on command descriptors. By contrast, adopting the V4L2 model > > resulted in simpler code on both sides and no possibility to deadlock. > > That point could be addressed by revising the virtio-video spec, but > > then you get even closer to V4L2. > > Hmm, I understand, thanks. Well, I can fix this in the spec. I have some > ideas. I think I'd better describe them in an answer to Bart's email. > > >> Do we agree now, that V4L2 UAPI is not only marginally more complex? > > > > No, and I rest my case on the code samples above. > > > >>> > >>> > >>> Also please read: > >>> > >>> https://medium.com/starting-up-security/evidence-of-absence-8148958da092 > >>> > >> > >> This reference probably needs a clarification. You argued, that V4L2 has a > >> good track record so far. Here is the quote: > >> > >>> FWIW V4L2 has been in use for a looong time (including in Chromebooks > >>> that do secure playback) and I am not aware of fundamental security > >>> issues with it. > >> > >> But absence of found major security issues doesn't tell us much about the > >> number of not found ones. Absence of evidence is not an evidence of > >> absence. At the link above a former Director of Security at Facebook > >> shares his thoughts about what could be a good evidence of absence of > >> major security problems. > > > > You are just FUDing now. > > These are well established security concepts AFAIK. Are you gaslighting? > > > The exact same argument could be made about > > virtio-video. If you want to borrow from the security world, I can ask > > you how going with virtio-video when V4L2 exists is different from > > rolling your own crypto. > > It is very different because obviously neither of them is a crypto. > > >> So a bug bounty program with high premiums that covers V4L2 would be a > >> better argument in favor of *already written code* in my opinion. Not for > >> new code. Also probably it is also an argument in favor of the spec, that > >> is the V4L2 UAPI. Like that it is polished enough. Not so sure about that > >> though. > >> > >> There actually are several bug bounty programs, that cover the kernel. > >> These are Google's kctf, ZDI's pwn2own, and zerodium AFAIK. However the > >> premiums are not even close to the ones mentioned in my reference. Anyway > >> this means, that using *the existing V4L2 code in the kernel* is probably > >> OK. But this creates some limitations if we want the actual code to still > >> be covered with these bug bounties, right? This means, that the host OS > >> has to be Linux and the actual hardware has to be exposed through a stable > >> V4L2 driver, that is in mainline for some time, and there has to be no or > >> little processing on top. For us this is not possible unfortunately. In > >> the end both things could be secure: > >> > >> 1. V4L2 pass through can be secure because of the bug bounty programs and > >> a lot of attention to the kernel in general. > >> 2. For the new code this doesn't work, so the spec should be as simple and > >> device-centric as possible. Because, all other things being equal, there > >> are fewer errors in simpler programs. So defining a subset of V4L2 UAPI > >> including the data types looks like a good idea to me. The stateful > >> decoder interface, that you point to, does not define a subset in the data > >> types. > > > > Wouldn't you have exactly the same problem by using a new guest-host > > protocol? > > I think this is far-fetched. Especially if our new protocol is > intentionally kept close to the reference protocol with a few things > made better and simpler. That would be my preferred route. > > > Are you going to start a bounty program for virtio-video to > > get an assurance that it is secure? > > I'm not sure it is possible because the device is proprietary. At least > the driver is going to be covered hopefully. So for the device we're > going to rely on simplicity, tests and audits. But who knows. I'm not > the one to make this decision. > > >> This is basically my reasoning. > >> > >> Also these two specs don't need to compete with each other. They have > >> different limitations and they are for different audiences. If you check > >> the XKCD's comic, it is about competing standards. > > > > They allow exactly the same thing (virtualization of video > > decoding/encoding) and there isn't any use-case of virtio-video that > > could not be covered equally well by V4L2. Reinventing a new video > > specification is pointless and will lead to unneeded fragmentation. > > I don't agree with these statements. > > > We should also expect reticence from the Linux community to upstream > > two virtual video drivers that basically do the same thing. > > This is hypothetical. > > -- > Alexander Gordeev > Senior Software Engineer > > OpenSynergy GmbH > Rotherstr. 20, 10245 Berlin > > Phone: +49 30 60 98 54 0 - 88 > Fax: +49 (30) 60 98 54 0 - 99 > EMail: alexander.gord...@opensynergy.com > > www.opensynergy.com > > Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B > Geschäftsführer/Managing Director: Régis Adjamah > > Please mind our privacy > notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> > pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 > DSGVO finden Sie > hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/> --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org