Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
> > I doubt you can handle pci memory bars like regular ram when it comes to
> > dma and iommu support. There is a reason we have p2pdma in the first
> > place ...
>
> The thing is that such bars would be actually backed by regular host
> RAM. Do we really need the complexity of real PCI bar handling for
> that?

Well, taking shortcuts because of virtualization-specific assumptions
already caused problems in the past. See the messy iommu handling we
have in virtio-pci for example. So I don't feel like going the "we know
it's just normal pages, so let's simplify things" route.

Besides that, hostmap isn't important for secure buffers, we wouldn't
allow the guest to map them anyway ;)

cheers,
  Gerd
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi,

> That could be still a guest physical address. Like on a bare metal
> system with TrustZone, there could be physical memory that is not
> accessible to the CPU.

Hmm. Yes, maybe. We could use the dma address of the (first page of the)
guest buffer. In case of a secure buffer that the guest has no access to,
the guest buffer would be unused, but it would at least make sure that
things don't crash in case someone tries to map & access the buffer.

The host should be able to figure out the corresponding host buffer from
the guest buffer address.

When running drm-misc-next you should be able to test whether that'll
actually work without any virtio-gpu driver changes.

cheers,
  Gerd
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Thu, Oct 17, 2019 at 4:44 PM Gerd Hoffmann wrote:
>
> Hi,
>
> > > Also note that the guest manages the address space, so the host can't
> > > simply allocate guest page addresses.
> >
> > Is this really true? I'm not an expert in this area, but on a bare
> > metal system it's the hardware or firmware that sets up the various
> > physical address allocations on a hardware level and most of the time
> > most of the addresses are already pre-assigned in hardware (like the
> > DRAM base, various IOMEM spaces, etc.).
>
> Yes, the firmware does it. Same in a VM, ovmf or seabios (which runs
> inside the guest) typically does it. And sometimes the linux kernel
> too.
>
> > I think that means that we could have a reserved region that could be
> > used by the host for dynamic memory hot-plug-like operation. The
> > reference to memory hot-plug here is fully intentional, we could even
> > use this feature of Linux to get struct pages for such memory if we
> > really wanted.
>
> We try to avoid such quirks whenever possible. Negotiating such things
> between qemu and firmware can be done if really needed (and actually is
> done for memory hotplug support), but it's an extra interface which
> needs maintenance.
>
> > > Mapping host virtio-gpu resources
> > > into guest address space is planned, it'll most likely use a pci memory
> > > bar to reserve some address space. The host can map resources into that
> > > pci bar, on guest request.
> >
> > Sounds like a viable option too. Do you have a pointer to some
> > description on how this would work on both host and guest side?
>
> Some early code:
> https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-memory-v2
> https://git.kraxel.org/cgit/linux/log/?h=drm-virtio-memory-v2
>
> Branches have other stuff too, look for "hostmem" commits.
>
> Not much code yet beyond creating a pci bar on the host and detecting
> presence in the guest.
>
> On the host side qemu would create subregions inside the hostmem memory
> region for the resources.
>
> On the guest side we can ioremap stuff, like vram.
>
> > > Hmm, well, pci memory bars are *not* backed by pages. Maybe we can use
> > > Documentation/driver-api/pci/p2pdma.rst though. With that we might be
> > > able to lookup buffers using device and dma address, without explicitly
> > > creating some identifier. Not investigated yet in detail.
> >
> > Not backed by pages as in "struct page", but those are still regular
> > pages of the physical address space.
>
> Well, maybe not. Host gem object could live in device memory, and if we
> map them into the guest ...

That's an interesting scenario, but in that case would we still want
to map it into the guest? I think in such a case we may need to have
some shadow buffer in regular RAM and that's already implemented in
virtio-gpu.

>
> > That said, currently the sg_table interface is only able to describe
> > physical memory using struct page pointers. It's been a long standing
> > limitation affecting even bare metal systems, so perhaps it's just the
> > right time to make them possible to use some other identifiers, like
> > PFNs?
>
> I doubt you can handle pci memory bars like regular ram when it comes to
> dma and iommu support. There is a reason we have p2pdma in the first
> place ...

The thing is that such bars would be actually backed by regular host
RAM. Do we really need the complexity of real PCI bar handling for
that?

Best regards,
Tomasz
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Thu, Oct 17, 2019 at 4:19 PM Gerd Hoffmann wrote:
>
> Hi,
>
> > That said, Chrome OS would use a similar model, except that we don't
> > use ION. We would likely use minigbm backed by virtio-gpu to allocate
> > appropriate secure buffers for us and then import them to the V4L2
> > driver.
>
> What exactly is a "secure buffer"? I guess a gem object where read
> access is not allowed, only scanout to display? Who enforces this?
> The hardware? Or the kernel driver?

In general, it's a buffer which can be accessed only by a specific set
of entities. The set depends on the use case and the level of security
you want to achieve. In Chrome OS we at least want to make such
buffers completely inaccessible for the guest, enforced by the VMM,
for example by not installing corresponding memory into the guest
address space (and not allowing transfers if the virtio-gpu shadow
buffer model is used).

Beyond that, the host memory itself could be further protected by some
hardware mechanisms or another hypervisor running above the host OS,
like in the ARM TrustZone model. That shouldn't matter for a VM guest,
though.

>
> It might make sense for virtio-gpu to know that concept, to allow guests
> to ask for secure buffers.
>
> And of course we'll need some way to pass around identifiers for these
> (and maybe other) buffers (from virtio-gpu device via guest drivers to
> virtio-vdec device). virtio-gpu guest driver could generate a uuid for
> that, attach it to the dma-buf and also notify the host so qemu can
> maintain a uuid -> buffer lookup table.

That could be still a guest physical address. Like on a bare metal
system with TrustZone, there could be physical memory that is not
accessible to the CPU.

Best regards,
Tomasz
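As an illustration of the VMM-enforced notion of a "secure buffer" described in this message, here is a minimal sketch. It is only a toy model: the `Vmm` class, its methods, and the dictionary layout are hypothetical names made up for this example and do not correspond to any real VMM or virtio-gpu interface. It shows the two enforcement points mentioned above: secure memory is never installed into the guest address space, and shadow-buffer transfers to the guest are refused.

```python
class Vmm:
    """Hypothetical VMM model; names and structure are illustrative only."""

    def __init__(self):
        self._guest_map = {}  # guest physical address -> buffer installed there

    def create_buffer(self, gpa, payload, secure):
        buf = {"gpa": gpa, "data": bytes(payload), "secure": secure}
        if not secure:
            # Only non-secure memory is installed into the guest address space.
            self._guest_map[gpa] = buf
        return buf

    def guest_read(self, gpa):
        buf = self._guest_map.get(gpa)
        if buf is None:
            raise PermissionError("no memory installed at this guest address")
        return buf["data"]

    def transfer_from_host(self, buf):
        # Shadow buffer model: copying host contents to the guest is refused
        # for secure buffers.
        if buf["secure"]:
            raise PermissionError("transfers are not allowed for secure buffers")
        return buf["data"]


vmm = Vmm()
plain = vmm.create_buffer(0x1000, b"ui frame", secure=False)
secret = vmm.create_buffer(0x2000, b"decoded frame", secure=True)
```

In this model the secure buffer still has a guest physical address (0x2000), which matches the point that the address can serve as an identifier even when the guest cannot access the memory behind it.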
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Tue, Oct 15, 2019 at 11:06 PM Dmitry Morozov wrote:
>
> Hello Gerd,
>
> On Tuesday, 15 October 2019 09:54:22 CEST Gerd Hoffmann wrote:
> > On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> > > On Monday, 14 October 2019 14:34:43 CEST Gerd Hoffmann wrote:
> > > > Hi,
> > > >
> > > > > My take on this (for a decoder) would be to allocate memory for output
> > > > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > > > provide those to the device using virtio. The device side then uses the
> > > > > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > > > > sure about that, it's just an idea.
> > > >
> > > > Virtualization aside, how does the complete video decoding workflow
> > > > work? I assume along the lines of ...
> > > >
> > > > (1) allocate buffer for decoded video frames (from ion).
> > > > (2) export those buffers as dma-buf.
> > > > (3) import dma-buf to video decoder.
> > > > (4) import dma-buf to gpu.
> > > >
> > > > ... to establish buffers shared between video decoder and gpu?
> > > >
> > > > Then feed the video stream into the decoder, which decodes into the ion
> > > > buffers? Ask the gpu to scanout the ion buffers to show the video?
> > > >
> > > > cheers,
> > > >
> > > > Gerd
> > >
> > > Yes, exactly.
> > >
> > > [decoder]
> > > 1) Input buffers are allocated using VIDIOC_*BUFS.
> >
> > Ok.
> >
> > > 2) Output buffers are allocated in a guest specific manner (ION, gbm).
> >
> > Who decides whether ION or gbm is used? The phrase "secure ION heap"
> > used above sounds like using ION is required for decoding drm-protected
> > content.
>
> I mention the secure ION heap to address this Chrome OS related point:
> > 3) protected content decoding: the memory for decoded video frames
> > must not be accessible to the guest at all
>
> There was an RFC to implement a secure memory allocation framework, but
> apparently it was not accepted: https://lwn.net/Articles/661549/.
>
> In case of Android, it allocates GPU buffers for output frames, so it is the
> gralloc implementation that decides how to allocate memory. It can use some
> dedicated ION heap or can use libgbm. It can also be some proprietary
> implementation.
>
> >
> > So, do we have to worry about ION here? Or can we just use gbm?
>
> If we replace vendor specific code in the Android guest and provide a way to
> communicate metadata for buffer allocations from the device to the driver, we
> can use gbm. In the PC world it might be easier.
>
> >
> > [ Note: don't know much about ion, other than that it is used by
> > android, is in staging right now and patches to move it
> > out of staging are floating around @ dri-devel ]

Chrome OS has cros_gralloc, which is an open source implementation of
gralloc on top of minigbm (which itself is built on top of the Linux
DRM interfaces). It's not limited to Chrome OS and I believe Intel
also uses it for their native Android setups.

With that, we could completely disregard ION, but I feel like it's not
a core problem here. Whoever wants to use ION should be still able to
do so if they back the allocations with guest pages or memory coming
from the host using some other interface and it can be described using
an identifier compatible with what we're discussing here.

Best regards,
Tomasz
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi,

> > Also note that the guest manages the address space, so the host can't
> > simply allocate guest page addresses.
>
> Is this really true? I'm not an expert in this area, but on a bare
> metal system it's the hardware or firmware that sets up the various
> physical address allocations on a hardware level and most of the time
> most of the addresses are already pre-assigned in hardware (like the
> DRAM base, various IOMEM spaces, etc.).

Yes, the firmware does it. Same in a VM, ovmf or seabios (which runs
inside the guest) typically does it. And sometimes the linux kernel
too.

> I think that means that we could have a reserved region that could be
> used by the host for dynamic memory hot-plug-like operation. The
> reference to memory hot-plug here is fully intentional, we could even
> use this feature of Linux to get struct pages for such memory if we
> really wanted.

We try to avoid such quirks whenever possible. Negotiating such things
between qemu and firmware can be done if really needed (and actually is
done for memory hotplug support), but it's an extra interface which
needs maintenance.

> > Mapping host virtio-gpu resources
> > into guest address space is planned, it'll most likely use a pci memory
> > bar to reserve some address space. The host can map resources into that
> > pci bar, on guest request.
>
> Sounds like a viable option too. Do you have a pointer to some
> description on how this would work on both host and guest side?

Some early code:
https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-memory-v2
https://git.kraxel.org/cgit/linux/log/?h=drm-virtio-memory-v2

Branches have other stuff too, look for "hostmem" commits.

Not much code yet beyond creating a pci bar on the host and detecting
presence in the guest.

On the host side qemu would create subregions inside the hostmem memory
region for the resources.

On the guest side we can ioremap stuff, like vram.

> > Hmm, well, pci memory bars are *not* backed by pages. Maybe we can use
> > Documentation/driver-api/pci/p2pdma.rst though. With that we might be
> > able to lookup buffers using device and dma address, without explicitly
> > creating some identifier. Not investigated yet in detail.
>
> Not backed by pages as in "struct page", but those are still regular
> pages of the physical address space.

Well, maybe not. Host gem object could live in device memory, and if we
map them into the guest ...

> That said, currently the sg_table interface is only able to describe
> physical memory using struct page pointers. It's been a long standing
> limitation affecting even bare metal systems, so perhaps it's just the
> right time to make them possible to use some other identifiers, like
> PFNs?

I doubt you can handle pci memory bars like regular ram when it comes to
dma and iommu support. There is a reason we have p2pdma in the first
place ...

cheers,
  Gerd
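The hostmem idea in this message (a reserved window of guest address space, with the host creating one subregion per resource on guest request) can be pictured with a small allocator. This is only a sketch under stated assumptions: `HostmemWindow`, `map_resource`, and `unmap_resource` are hypothetical names, the first-fit placement policy is an arbitrary choice for the example, and nothing here reflects how the linked qemu branches actually manage subregions.

```python
class HostmemWindow:
    """Hypothetical model of a reserved address window (e.g. a PCI memory BAR)."""

    def __init__(self, base, size, align=4096):
        self.base = base
        self.size = size
        self.align = align
        self._regions = {}  # resource id -> (offset, length) inside the window

    def map_resource(self, resource_id, length):
        # Round the length up to the alignment so subregions stay page aligned.
        length = -(-length // self.align) * self.align
        offset = 0
        # First fit: walk existing subregions in address order, take the first gap.
        for start, used in sorted(self._regions.values()):
            if start - offset >= length:
                break
            offset = start + used
        if offset + length > self.size:
            raise MemoryError("hostmem window exhausted")
        self._regions[resource_id] = (offset, length)
        return self.base + offset  # address the guest would ioremap

    def unmap_resource(self, resource_id):
        del self._regions[resource_id]


window = HostmemWindow(base=0xC000_0000, size=0x10000)
addr_a = window.map_resource(1, 0x3000)
addr_b = window.map_resource(2, 100)      # rounded up to one 4 KiB page
window.unmap_resource(1)
addr_c = window.map_resource(3, 0x2000)   # reuses the gap left by resource 1
```

The point of the model is that the guest owns the placement of the window itself (it programs the BAR), while the host only decides what goes where inside it.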
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi,

> That said, Chrome OS would use a similar model, except that we don't
> use ION. We would likely use minigbm backed by virtio-gpu to allocate
> appropriate secure buffers for us and then import them to the V4L2
> driver.

What exactly is a "secure buffer"? I guess a gem object where read
access is not allowed, only scanout to display? Who enforces this?
The hardware? Or the kernel driver?

It might make sense for virtio-gpu to know that concept, to allow guests
to ask for secure buffers.

And of course we'll need some way to pass around identifiers for these
(and maybe other) buffers (from virtio-gpu device via guest drivers to
virtio-vdec device). virtio-gpu guest driver could generate a uuid for
that, attach it to the dma-buf and also notify the host so qemu can
maintain a uuid -> buffer lookup table.

cheers,
  Gerd
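The uuid -> buffer lookup table proposed in this message could look roughly like the sketch below. All names here (`HostBufferRegistry`, `GuestDmaBuf`, `export_for_sharing`, the string used as a host handle) are hypothetical and exist only for illustration; the thread leaves the actual protocol details open.

```python
import uuid


class HostBufferRegistry:
    """Hypothetical host-side (VMM) table mapping uuid -> host buffer handle."""

    def __init__(self):
        self._buffers = {}

    def register(self, buffer_id, host_handle):
        # Called when the guest driver notifies the host about a new identifier.
        if buffer_id in self._buffers:
            raise ValueError("identifier already registered")
        self._buffers[buffer_id] = host_handle

    def lookup(self, buffer_id):
        # Called by another device model (e.g. a decoder) to resolve an identifier.
        return self._buffers.get(buffer_id)

    def unregister(self, buffer_id):
        self._buffers.pop(buffer_id, None)


class GuestDmaBuf:
    """Stand-in for a guest dma-buf that can carry an attached identifier."""

    def __init__(self, resource_handle):
        self.resource_handle = resource_handle  # device-specific, not shareable
        self.shared_id = None


def export_for_sharing(dmabuf, registry, host_handle):
    # The guest driver generates the uuid, attaches it to the dma-buf,
    # and notifies the host so the table stays in sync.
    dmabuf.shared_id = uuid.uuid4()
    registry.register(dmabuf.shared_id, host_handle)
    return dmabuf.shared_id


registry = HostBufferRegistry()
buf = GuestDmaBuf(resource_handle=7)
shared_id = export_for_sharing(buf, registry, host_handle="host-dmabuf-7")
```

A second device model would then receive only `shared_id` from its guest driver and call `registry.lookup(shared_id)` to reach the same host buffer, without knowing the virtio-gpu resource handle.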
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
> Hmm, the cross-device buffer sharing framework I have in mind would
> basically be a buffer registry. virtio-gpu would create buffers as
> usual, create an identifier somehow (details to be hashed out), attach
> the identifier to the dma-buf so it can be used as outlined above.

Using physical addresses to identify buffers is using the guest
physical address space as the buffer registry. Especially if every
device should be able to operate in isolation, then each virtio
protocol will have some way to allocate buffers that are accessible to
the guest and host. This requires guest physical addresses, and the
guest physical address of the start of the buffer can serve as the
unique identifier for the buffer in both the guest and the host. Even
with buffers that are only accessible to the host, I think it's
reasonable to allocate guest physical addresses since the pages still
exist (in the same way physical addresses for secure physical memory
make sense).

This approach also sidesteps the need for explicit registration. With
explicit registration, either there would need to be some centralized
buffer exporter device or each protocol would need to have its own
export function. Using guest physical addresses means that buffers get
a unique identifier during creation. For example, in the virtio-gpu
protocol, buffers would get this identifier through
VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING, or through
VIRTIO_GPU_CMD_RESOURCE_CREATE_V2 with impending additions to resource
creation.

> Also note that the guest manages the address space, so the host can't
> simply allocate guest page addresses. Mapping host virtio-gpu resources
> into guest address space is planned, it'll most likely use a pci memory
> bar to reserve some address space. The host can map resources into that
> pci bar, on guest request.
>
> > - virtio-gpu driver could then create a regular DMA-buf object for
> > such memory, because it's just backed by pages (even though they may
> > not be accessible to the guest; just like in the case of TrustZone
> > memory protection on bare metal systems),
>
> Hmm, well, pci memory bars are *not* backed by pages. Maybe we can use
> Documentation/driver-api/pci/p2pdma.rst though. With that we might be
> able to lookup buffers using device and dma address, without explicitly
> creating some identifier. Not investigated yet in detail.

For the linux guest implementation, mapping a dma-buf doesn't
necessarily require actual pages. The exporting driver's map_dma_buf
function just needs to provide a sg_table with populated dma_address
fields, it doesn't actually need to populate the sg_table with pages.
At the very least, there are places such as i915_gem_stolen.c and
(some situations of) videobuf-dma-sg.c that take this approach.

Cheers,
David
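To contrast with an explicit registry, the "guest physical address space as the buffer registry" idea from this message can be sketched as a table keyed by the start address of each buffer. This is an assumption-laden toy: `GpaBufferTable` and `attach_backing` are hypothetical names (the latter only loosely echoes VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING), and the overlap check stands in for the fact that distinct buffers occupy distinct address ranges.

```python
PAGE_SIZE = 4096


class GpaBufferTable:
    """Hypothetical host-side view: buffers keyed by start guest physical address."""

    def __init__(self):
        self._by_start = {}  # start gpa -> (size, host buffer)

    def attach_backing(self, start_gpa, size, host_buffer):
        if start_gpa % PAGE_SIZE != 0:
            raise ValueError("start address must be page aligned")
        # Distinct buffers cannot share addresses, which is what makes the
        # start address usable as a unique identifier.
        for other_start, (other_size, _) in self._by_start.items():
            if start_gpa < other_start + other_size and other_start < start_gpa + size:
                raise ValueError("range overlaps an existing buffer")
        self._by_start[start_gpa] = (size, host_buffer)

    def lookup(self, start_gpa):
        entry = self._by_start.get(start_gpa)
        return None if entry is None else entry[1]


table = GpaBufferTable()
table.attach_backing(0x1_0000_0000, 2 * PAGE_SIZE, "gpu-resource-A")
```

No separate export step appears here: the identifier exists as soon as the buffer has backing, which is the property the message argues for.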
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Mon, Oct 14, 2019 at 9:19 PM Gerd Hoffmann wrote: > > > > Well. I think before even discussing the protocol details we need a > > > reasonable plan for buffer handling. I think using virtio-gpu buffers > > > should be an optional optimization and not a requirement. Also the > > > motivation for that should be clear (Let the host decoder write directly > > > to virtio-gpu resources, to display video without copying around the > > > decoded framebuffers from one device to another). > > > > Just to make sure we're on the same page, what would the buffers come > > from if we don't use this optimization? > > > > I can imagine a setup like this; > > 1) host device allocates host memory appropriate for usage with host > > video decoder, > > 2) guest driver allocates arbitrary guest pages for storage > > accessible to the guest software, > > 3) guest userspace writes input for the decoder to guest pages, > > 4) guest driver passes the list of pages for the input and output > > buffers to the host device > > 5) host device copies data from input guest pages to host buffer > > 6) host device runs the decoding > > 7) host device copies decoded frame to output guest pages > > 8) guest userspace can access decoded frame from those pages; back to 3 > > > > Is that something you have in mind? > > I don't have any specific workflow in mind. > > If you want display the decoded video frames you want use dma-bufs shared > by video decoder and gpu, right? So the userspace application (video > player probably) would create the buffers using one of the drivers, > export them as dma-buf, then import them into the other driver. Just > like you would do on physical hardware. So, when using virtio-gpu > buffers: > > (1) guest app creates buffers using virtio-gpu. > (2) guest app exports virtio-gpu buffers buffers as dma-buf. > (3) guest app imports the dma-bufs into virtio-vdec. > (4) guest app asks the virtio-vdec driver to write the decoded > frames into the dma-bufs. 
> (5) guest app asks the virtio-gpu driver to display the decoded > frame. > > The guest video decoder driver passes the dma-buf pages to the host, and > it is the host driver's job to fill the buffer. How this is done > exactly might depend on hardware capabilities (whenever a host-allocated > bounce buffer is needed or whenever the hardware can decode directly to > the dma-buf passed by the guest driver) and is an implementation detail. > > Now, with cross-device sharing added the virtio-gpu would attach some > kind of identifier to the dma-buf, virtio-vdec could fetch the > identifier and pass it to the host too, and the host virtio-vdec device > can use the identifier to get a host dma-buf handle for the (virtio-gpu) > buffer. Ask the host video decoder driver to import the host dma-buf. > If it all worked fine it can ask the host hardware to decode directly to > the host virtio-gpu resource. > Agreed. > > > Referencing virtio-gpu buffers needs a better plan than just re-using > > > virtio-gpu resource handles. The handles are device-specific. What if > > > there are multiple virtio-gpu devices present in the guest? > > > > > > I think we need a framework for cross-device buffer sharing. One > > > possible option would be to have some kind of buffer registry, where > > > buffers can be registered for cross-device sharing and get a unique > > > id (a uuid maybe?). Drivers would typically register buffers on > > > dma-buf export. > > > > This approach could possibly let us handle this transparently to > > importers, which would work for guest kernel subsystems that rely on > > the ability to handle buffers like native memory (e.g. having a > > sgtable or DMA address) for them. > > > > How about allocating guest physical addresses for memory corresponding > > to those buffers? 
On the virtio-gpu example, that could work like > > this: > > - by default a virtio-gpu buffer has only a resource handle, > > - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the > > virtio-gpu device export the buffer to a host framework (inside the > > VMM) that would allocate guest page addresses for it, which the > > command would return in a response to the guest, > > Hmm, the cross-device buffer sharing framework I have in mind would > basically be a buffer registry. virtio-gpu would create buffers as > usual, create a identifier somehow (details to be hashed out), attach > the identifier to the dma-buf so it can be used as outlined above. > > Also note that the guest manages the address space, so the host can't > simply allocate guest page addresses. Is this really true? I'm not an expert in this area, but on a bare metal system it's the hardware or firmware that sets up the various physical address allocations on a hardware level and most of the time most of the addresses are already pre-assigned in hardware (like the DRAM base, various IOMEM spaces, etc.). I think that means that we could have a reserved region that co
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Fri, Oct 11, 2019 at 5:54 PM Dmitry Morozov wrote: > > Hi Tomasz, > > On Mittwoch, 9. Oktober 2019 05:55:45 CEST Tomasz Figa wrote: > > On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov > > > > wrote: > > > Hi Tomasz, > > > > > > On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote: > > > > Hi Dmitry, > > > > > > > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov > > > > > > > > wrote: > > > > > Hello, > > > > > > > > > > We at OpenSynergy are also working on an abstract paravirtualized > > > > > video > > > > > streaming device that operates input and/or output data buffers and > > > > > can be > > > > > used as a generic video decoder/encoder/input/output device. > > > > > > > > > > We would be glad to share our thoughts and contribute to the > > > > > discussion. > > > > > Please see some comments regarding buffer allocation inline. > > > > > > > > > > Best regards, > > > > > Dmitry. > > > > > > > > > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote: > > > > > > Hi Gerd, > > > > > > > > > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann > wrote: > > > > > > > Hi, > > > > > > > > > > > > > > > Our prototype implementation uses [4], which allows the > > > > > > > > virtio-vdec > > > > > > > > device to use buffers allocated by virtio-gpu device. > > > > > > > > > > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157 > > > > > > > > > > > > First of all, thanks for taking a look at this RFC and for valuable > > > > > > feedback. Sorry for the late reply. > > > > > > > > > > > > For reference, Keiichi is working with me and David Stevens on > > > > > > accelerated video support for virtual machines and integration with > > > > > > other virtual devices, like virtio-gpu for rendering or our > > > > > > currently-downstream virtio-wayland for display (I believe there is > > > > > > ongoing work to solve this problem in upstream too). > > > > > > > > > > > > > Well. 
I think before even discussing the protocol details we need > > > > > > > a > > > > > > > reasonable plan for buffer handling. I think using virtio-gpu > > > > > > > buffers > > > > > > > should be an optional optimization and not a requirement. Also > > > > > > > the > > > > > > > motivation for that should be clear (Let the host decoder write > > > > > > > directly > > > > > > > to virtio-gpu resources, to display video without copying around > > > > > > > the > > > > > > > decoded framebuffers from one device to another). > > > > > > > > > > > > Just to make sure we're on the same page, what would the buffers > > > > > > come > > > > > > from if we don't use this optimization? > > > > > > > > > > > > I can imagine a setup like this; > > > > > > > > > > > > 1) host device allocates host memory appropriate for usage with > > > > > > host > > > > > > > > > > > > video decoder, > > > > > > > > > > > > 2) guest driver allocates arbitrary guest pages for storage > > > > > > > > > > > > accessible to the guest software, > > > > > > > > > > > > 3) guest userspace writes input for the decoder to guest pages, > > > > > > 4) guest driver passes the list of pages for the input and output > > > > > > > > > > > > buffers to the host device > > > > > > > > > > > > 5) host device copies data from input guest pages to host buffer > > > > > > 6) host device runs the decoding > > > > > > 7) host device copies decoded frame to output guest pages > > > > > > 8) guest userspace can access decoded frame from those pages; back > > > > > > to 3 > > > > > > > > > > > > Is that something you have in mind? > > > > > > > > > > While GPU side allocations can be useful (especially in case of > > > > > decoder), > > > > > it could be more practical to stick to driver side allocations. This > > > > > is > > > > > also due to the fact that paravirtualized encoders and cameras are not > > > > > necessarily require a GPU device. 
> > > > > > > > > > Also, the v4l2 framework already features convenient helpers for CMA > > > > > and > > > > > SG > > > > > allocations. The buffers can be used in the same manner as in > > > > > virtio-gpu: > > > > > buffers are first attached to an already allocated buffer/resource > > > > > descriptor and then are made available for processing by the device > > > > > using > > > > > a dedicated command from the driver. > > > > > > > > First of all, thanks a lot for your input. This is a relatively new > > > > area of virtualization and we definitely need to collect various > > > > possible perspectives in the discussion. > > > > > > > > From Chrome OS point of view, there are several aspects for which the > > > > guest side allocation doesn't really work well: > > > > 1) host-side hardware has a lot of specific low level allocation > > > > requirements, like alignments, paddings, address space limitations and > > > > so on, which is not something that can be (easily) taught to the guest > > > > OS, > > > > > > I couldn't agree more. There are some changes by Greg to add support for > > > querying GPU buffer met
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hello Gerd,
On Tuesday, 15 October 2019 09:54:22 CEST Gerd Hoffmann wrote:
> On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> > On Monday, 14 October 2019 14:34:43 CEST Gerd Hoffmann wrote:
> > > Hi,
> > >
> > > > My take on this (for a decoder) would be to allocate memory for output
> > > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > > provide those to the device using virtio. The device side then uses the
> > > > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > > > sure about that, it's just an idea.
> > >
> > > Virtualization aside, how does the complete video decoding workflow
> > > work? I assume along the lines of ...
> > >
> > > (1) allocate buffer for decoded video frames (from ion).
> > > (2) export those buffers as dma-buf.
> > > (3) import dma-buf to video decoder.
> > > (4) import dma-buf to gpu.
> > >
> > > ... to establish buffers shared between video decoder and gpu?
> > >
> > > Then feed the video stream into the decoder, which decodes into the ion
> > > buffers? Ask the gpu to scanout the ion buffers to show the video?
> > >
> > > cheers,
> > >
> > > Gerd
> >
> > Yes, exactly.
> >
> > [decoder]
> > 1) Input buffers are allocated using VIDIOC_*BUFS.
>
> Ok.
>
> > 2) Output buffers are allocated in a guest specific manner (ION, gbm).
>
> Who decides whether ION or gbm is used? The phrase "secure ION heap"
> used above sounds like using ION is required for decoding drm-protected
> content.
I mention the secure ION heap to address this Chrome OS related point:
> 3) protected content decoding: the memory for decoded video frames
> must not be accessible to the guest at all
There was an RFC to implement a secure memory allocation framework, but
apparently it was not accepted: https://lwn.net/Articles/661549/.
In case of Android, it allocates GPU buffers for output frames, so it is the
gralloc implementation that decides how to allocate memory. It can use some
dedicated ION heap or can use libgbm. It can also be some proprietary
implementation.
>
> So, do we have to worry about ION here? Or can we just use gbm?
If we replace vendor specific code in the Android guest and provide a way to
communicate metadata for buffer allocations from the device to the driver, we
can use gbm. In the PC world it might be easier.
>
> [ Note: don't know much about ion, other than that it is used by
> android, is in staging right now and patches to move it
> out of staging are floating around @ dri-devel ]
>
> > 3) Both input and output buffers are exported as dma-bufs.
> > 4) The backing storage of both inputs and outputs is made available to the
> > device.
> > 5) Decoder hardware writes to output buffers directly.
>
> As expected.
>
> > 6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.
>
> Ok. So, virtio-gpu has support for dma-buf exports (in drm-misc-next,
> should land upstream in kernel 5.5). dma-buf imports are not that
> simple unfortunately. When using the gbm allocation route dma-buf
> exports are good enough though.
>
> The virtio-gpu resources have both a host buffer and a guest buffer [1].
> Data can be copied using the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST
> ioctls. The dma-buf export will export the guest buffer (which lives
> in guest ram).
>
> It would make sense for the decoded video to go directly to the host
> buffer though. First because we want to avoid copying the video frames for
> performance reasons, and second because we might not be able to copy
> video frames (drm ...).
>
> This is where the buffer registry idea comes in. Attach a (host)
> identifier to (guest) dma-bufs, which then allows host device emulation
> to share buffers, i.e. virtio-vdec device emulation could decode to a
> dma-buf it got from virtio-gpu device emulation.
Yes. Also, as I mentioned above, in case of gbm the buffers can already
originate from the GPU.
Best regards,
Dmitry.
>
> Alternatively we could use virtual ION (or whatever it becomes after
> de-staging) for buffer management, with both virtio-vdec and virtio-gpu
> importing dma-bufs from virtual ION on both guest and host side.
>
> cheers,
> Gerd
>
> [1] support for shared buffers is in progress.
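The difference between decoding into the guest buffer and decoding directly into the host buffer, as discussed in this message, can be shown with a toy model of a resource that has two backing stores. This is a sketch only: `Resource` and the two decode functions are hypothetical, and `transfer_to_host` / `transfer_from_host` merely mimic the direction of the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST ioctls as plain memory copies.

```python
class Resource:
    """Hypothetical model of a virtio-gpu resource with two backing stores."""

    def __init__(self, size):
        self.guest_buffer = bytearray(size)  # lives in guest RAM (what dma-buf export exposes)
        self.host_buffer = bytearray(size)   # lives on the host (what gets displayed)
        self.copies = 0

    def transfer_to_host(self):
        self.host_buffer[:] = self.guest_buffer
        self.copies += 1

    def transfer_from_host(self):
        self.guest_buffer[:] = self.host_buffer
        self.copies += 1


def decode_via_guest_buffer(res, frame):
    # The decoder fills the exported guest buffer; a copy is then needed
    # before the host side can display the frame.
    res.guest_buffer[:len(frame)] = frame
    res.transfer_to_host()


def decode_direct_to_host(res, frame):
    # With a shared identifier the host decoder can write the host buffer
    # directly, so no copy is involved.
    res.host_buffer[:len(frame)] = frame


frame = b"\x01\x02\x03\x04"
res_copy = Resource(len(frame))
res_direct = Resource(len(frame))
decode_via_guest_buffer(res_copy, frame)
decode_direct_to_host(res_direct, frame)
```

Both paths end with the same host buffer contents, but only the first one pays for a copy, and for protected content that copy through guest RAM may not be permitted at all.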
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
>
> On Monday, 14 October 2019 14:34:43 CEST Gerd Hoffmann wrote:
> > Hi,
> >
> > > My take on this (for a decoder) would be to allocate memory for output
> > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > provide those to the device using virtio. The device side then uses the
> > > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > > sure about that, it's just an idea.
> >
> > Virtualization aside, how does the complete video decoding workflow
> > work? I assume along the lines of ...
> >
> > (1) allocate buffer for decoded video frames (from ion).
> > (2) export those buffers as dma-buf.
> > (3) import dma-buf to video decoder.
> > (4) import dma-buf to gpu.
> >
> > ... to establish buffers shared between video decoder and gpu?
> >
> > Then feed the video stream into the decoder, which decodes into the ion
> > buffers? Ask the gpu to scanout the ion buffers to show the video?
> >
> > cheers,
> > Gerd
>
> Yes, exactly.
>
> [decoder]
> 1) Input buffers are allocated using VIDIOC_*BUFS.
Ok.
> 2) Output buffers are allocated in a guest specific manner (ION, gbm).
Who decides whether ION or gbm is used? The phrase "secure ION heap"
used above sounds like using ION is required for decoding drm-protected
content.
So, do we have to worry about ION here? Or can we just use gbm?
[ Note: don't know much about ion, other than that it is used by
android, is in staging right now and patches to move it
out of staging are floating around @ dri-devel ]
> 3) Both input and output buffers are exported as dma-bufs.
> 4) The backing storage of both inputs and outputs is made available to the
> device.
> 5) Decoder hardware writes to output buffers directly.
As expected.
> 6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.
Ok. So, virtio-gpu has support for dma-buf exports (in drm-misc-next,
should land upstream in kernel 5.5). dma-buf imports are not that
simple unfortunately. When using the gbm allocation route dma-buf
exports are good enough though.
The virtio-gpu resources have both a host buffer and a guest buffer[1].
Data can be copied using the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST
ioctls. The dma-buf export will export the guest buffer (which lives
in guest ram).
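To make the two-buffer model above concrete, here is a toy sketch in Python.
It is only a model under stated assumptions: the class and method names are
hypothetical and are not the virtio-gpu UAPI. The two transfer methods merely
stand in for the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST ioctls, and the
export hands out the guest buffer, as described.

```python
# Toy model only: names below are hypothetical, not the virtio-gpu UAPI.
class Resource:
    """A resource with separate guest and host backing."""

    def __init__(self, size):
        self.guest = bytearray(size)  # lives in guest RAM; exposed by dma-buf export
        self.host = bytearray(size)   # owned by the host side
        self.copies = 0               # number of explicit transfers performed

    def transfer_to_host(self):
        # Stands in for DRM_IOCTL_VIRTGPU_TRANSFER_TO_HOST: guest -> host copy.
        self.host[:] = self.guest
        self.copies += 1

    def transfer_from_host(self):
        # Stands in for DRM_IOCTL_VIRTGPU_TRANSFER_FROM_HOST: host -> guest copy.
        self.guest[:] = self.host
        self.copies += 1

    def export_dmabuf(self):
        # The export hands out the guest buffer, not the host one.
        return memoryview(self.guest)


res = Resource(4)
dmabuf = res.export_dmabuf()
dmabuf[:] = b"\x01\x02\x03\x04"   # an importer writes through the dma-buf
host_before = bytes(res.host)     # the host buffer has not seen the data yet
res.transfer_to_host()            # an explicit copy makes it visible to the host
```

In this model every frame that has to reach the host buffer costs one
explicit copy.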
It would make sense for the decoded video to go directly to the host
buffer though. First because we want to avoid copying the video frames for
performance reasons, and second because we might not be able to copy
video frames (drm ...).
This is where the buffer registry idea comes in. Attach a (host)
identifier to (guest) dma-bufs, which then allows host device emulations
to share buffers, i.e. virtio-vdec device emulation could decode to a
dma-buf it got from virtio-gpu device emulation.
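A minimal sketch of how such a registry might look, in Python. Everything
here is an assumption made for illustration: BufferRegistry, GuestDmaBuf and
the choice of a uuid as identifier are hypothetical, and nothing below is an
existing qemu or kernel interface.

```python
import uuid

# Hypothetical sketch of the registry idea; none of these names are a real API.
class BufferRegistry:
    """Host-side map from a shareable identifier to a host buffer."""

    def __init__(self):
        self._buffers = {}

    def register(self, host_buffer):
        # Called when a device exports a buffer; the id travels with the
        # guest dma-buf.
        buf_id = uuid.uuid4()
        self._buffers[buf_id] = host_buffer
        return buf_id

    def lookup(self, buf_id):
        # Called by another device emulation that received the id from its
        # guest driver.
        return self._buffers[buf_id]


class GuestDmaBuf:
    """Guest-side handle that carries the host identifier."""

    def __init__(self, buf_id):
        self.buf_id = buf_id


registry = BufferRegistry()

# virtio-gpu device emulation: create a host buffer, register it on export.
gpu_host_buffer = bytearray(8)
dmabuf = GuestDmaBuf(registry.register(gpu_host_buffer))

# virtio-vdec device emulation: resolve the id, decode straight into the buffer.
target = registry.lookup(dmabuf.buf_id)
target[:] = b"FRAME000"  # placeholder for decoded frame data
```

The point of the lookup is that the decoder ends up writing into the very
same host object the gpu emulation owns, so no frame copy is needed.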
Alternatively we could use virtual ION (or whatever it becomes after
de-staging) for buffer management, with both virtio-vdec and virtio-gpu
importing dma-bufs from virtual ION on both guest and host side.
cheers,
Gerd
[1] support for shared buffers is in progress.
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
> Hi,
>
> > My take on this (for a decoder) would be to allocate memory for output
> > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > provide those to the device using virtio. The device side then uses the
> > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > sure about that, it's just an idea.
>
> Virtualization aside, how does the complete video decoding workflow
> work? I assume along the lines of ...
>
> (1) allocate buffer for decoded video frames (from ion).
> (2) export those buffers as dma-buf.
> (3) import dma-buf to video decoder.
> (4) import dma-buf to gpu.
>
> ... to establish buffers shared between video decoder and gpu?
>
> Then feed the video stream into the decoder, which decodes into the ion
> buffers? Ask the gpu to scanout the ion buffers to show the video?
>
> cheers,
> Gerd
Yes, exactly.
[decoder]
1) Input buffers are allocated using VIDIOC_*BUFS.
2) Output buffers are allocated in a guest specific manner (ION, gbm).
3) Both input and output buffers are exported as dma-bufs.
4) The backing storage of both inputs and outputs is made available to the
device.
5) Decoder hardware writes to output buffers directly.
6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.
Best regards,
Dmitry
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi,
> My take on this (for a decoder) would be to allocate memory for output
> buffers from a secure ION heap, import in the v4l2 driver, and then to
> provide those to the device using virtio. The device side then uses the
> dmabuf framework to make the buffers accessible for the hardware. I'm not
> sure about that, it's just an idea.
Virtualization aside, how does the complete video decoding workflow
work? I assume along the lines of ...
(1) allocate buffer for decoded video frames (from ion).
(2) export those buffers as dma-buf.
(3) import dma-buf to video decoder.
(4) import dma-buf to gpu.
... to establish buffers shared between video decoder and gpu?
Then feed the video stream into the decoder, which decodes into the ion
buffers? Ask the gpu to scanout the ion buffers to show the video?
cheers,
Gerd
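The four steps above can be modelled roughly as follows. This is only a
Python sketch under stated assumptions: Heap, Decoder, Gpu and export_dmabuf
are hypothetical stand-ins, not the kernel dma-buf, ION or gbm interfaces. A
memoryview plays the role of the dma-buf, since it shares the underlying
memory instead of copying it.

```python
# Hypothetical model of dma-buf style sharing; not the kernel dma-buf API.
class Heap:
    """Allocator standing in for ION or gbm."""

    def alloc(self, size):
        return bytearray(size)


def export_dmabuf(buffer):
    # (2) Export: hand out a shareable view of the same memory, no copy.
    return memoryview(buffer)


class Decoder:
    def import_dmabuf(self, dmabuf):
        # (3) The decoder keeps a reference to the shared memory.
        self.out = dmabuf

    def decode(self, bitstream):
        # Pretend decoding: write "pixels" directly into the shared buffer.
        self.out[: len(bitstream)] = bitstream


class Gpu:
    def import_dmabuf(self, dmabuf):
        # (4) The gpu imports the same buffer.
        self.fb = dmabuf

    def scanout(self):
        # Read what the decoder wrote, without any intermediate copy.
        return bytes(self.fb)


frame = Heap().alloc(4)          # (1) allocate
dmabuf = export_dmabuf(frame)    # (2) export
decoder, gpu = Decoder(), Gpu()
decoder.import_dmabuf(dmabuf)    # (3) import into the decoder
gpu.import_dmabuf(dmabuf)        # (4) import into the gpu
decoder.decode(b"\x10\x20\x30\x40")
shown = gpu.scanout()
```

Both importers see the same backing memory, which is the property the
workflow relies on.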
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
> > Well. I think before even discussing the protocol details we need a
> > reasonable plan for buffer handling. I think using virtio-gpu buffers
> > should be an optional optimization and not a requirement. Also the
> > motivation for that should be clear (Let the host decoder write directly
> > to virtio-gpu resources, to display video without copying around the
> > decoded framebuffers from one device to another).
>
> Just to make sure we're on the same page, what would the buffers come
> from if we don't use this optimization?
>
> I can imagine a setup like this;
> 1) host device allocates host memory appropriate for usage with host
> video decoder,
> 2) guest driver allocates arbitrary guest pages for storage
> accessible to the guest software,
> 3) guest userspace writes input for the decoder to guest pages,
> 4) guest driver passes the list of pages for the input and output
> buffers to the host device
> 5) host device copies data from input guest pages to host buffer
> 6) host device runs the decoding
> 7) host device copies decoded frame to output guest pages
> 8) guest userspace can access decoded frame from those pages; back to 3
>
> Is that something you have in mind?
I don't have any specific workflow in mind.
If you want to display the decoded video frames you want to use dma-bufs
shared by video decoder and gpu, right? So the userspace application
(video player probably) would create the buffers using one of the
drivers, export them as dma-buf, then import them into the other
driver. Just like you would do on physical hardware. So, when using
virtio-gpu buffers:
(1) guest app creates buffers using virtio-gpu.
(2) guest app exports virtio-gpu buffers as dma-buf.
(3) guest app imports the dma-bufs into virtio-vdec.
(4) guest app asks the virtio-vdec driver to write the decoded
frames into the dma-bufs.
(5) guest app asks the virtio-gpu driver to display the decoded
frame.
The guest video decoder driver passes the dma-buf pages to the host, and
it is the host driver's job to fill the buffer. How this is done
exactly might depend on hardware capabilities (whether a host-allocated
bounce buffer is needed or whether the hardware can decode directly to
the dma-buf passed by the guest driver) and is an implementation detail.
Now, with cross-device sharing added the virtio-gpu would attach some
kind of identifier to the dma-buf, virtio-vdec could fetch the
identifier and pass it to the host too, and the host virtio-vdec device
can use the identifier to get a host dma-buf handle for the (virtio-gpu)
buffer. It would then ask the host video decoder driver to import the
host dma-buf. If that all works it can ask the host hardware to decode
directly to the host virtio-gpu resource.
> > Referencing virtio-gpu buffers needs a better plan than just re-using
> > virtio-gpu resource handles. The handles are device-specific. What if
> > there are multiple virtio-gpu devices present in the guest?
> >
> > I think we need a framework for cross-device buffer sharing. One
> > possible option would be to have some kind of buffer registry, where
> > buffers can be registered for cross-device sharing and get a unique
> > id (a uuid maybe?). Drivers would typically register buffers on
> > dma-buf export.
>
> This approach could possibly let us handle this transparently to
> importers, which would work for guest kernel subsystems that rely on
> the ability to handle buffers like native memory (e.g. having a
> sgtable or DMA address) for them.
>
> How about allocating guest physical addresses for memory corresponding
> to those buffers?
> On the virtio-gpu example, that could work like
> this:
> - by default a virtio-gpu buffer has only a resource handle,
> - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> virtio-gpu device export the buffer to a host framework (inside the
> VMM) that would allocate guest page addresses for it, which the
> command would return in a response to the guest,
Hmm, the cross-device buffer sharing framework I have in mind would
basically be a buffer registry. virtio-gpu would create buffers as
usual, create an identifier somehow (details to be hashed out), attach
the identifier to the dma-buf so it can be used as outlined above.
Also note that the guest manages the address space, so the host can't
simply allocate guest page addresses. Mapping host virtio-gpu resources
into guest address space is planned, it'll most likely use a pci memory
bar to reserve some address space. The host can map resources into that
pci bar, on guest request.
> - virtio-gpu driver could then create a regular DMA-buf object for
> such memory, because it's just backed by pages (even though they may
> not be accessible to the guest; just like in the case of TrustZone
> memory protection on bare metal systems),
Hmm, well, pci memory bars are *not* backed by pages. Maybe we can use
Documentation/driver-api/pci/p2pdma.rst though. With that we might be
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi Tomasz, On Mittwoch, 9. Oktober 2019 05:55:45 CEST Tomasz Figa wrote: > On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov > > wrote: > > Hi Tomasz, > > > > On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote: > > > Hi Dmitry, > > > > > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov > > > > > > wrote: > > > > Hello, > > > > > > > > We at OpenSynergy are also working on an abstract paravirtualized > > > > video > > > > streaming device that operates input and/or output data buffers and > > > > can be > > > > used as a generic video decoder/encoder/input/output device. > > > > > > > > We would be glad to share our thoughts and contribute to the > > > > discussion. > > > > Please see some comments regarding buffer allocation inline. > > > > > > > > Best regards, > > > > Dmitry. > > > > > > > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote: > > > > > Hi Gerd, > > > > > > > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann wrote: > > > > > > Hi, > > > > > > > > > > > > > Our prototype implementation uses [4], which allows the > > > > > > > virtio-vdec > > > > > > > device to use buffers allocated by virtio-gpu device. > > > > > > > > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157 > > > > > > > > > > First of all, thanks for taking a look at this RFC and for valuable > > > > > feedback. Sorry for the late reply. > > > > > > > > > > For reference, Keiichi is working with me and David Stevens on > > > > > accelerated video support for virtual machines and integration with > > > > > other virtual devices, like virtio-gpu for rendering or our > > > > > currently-downstream virtio-wayland for display (I believe there is > > > > > ongoing work to solve this problem in upstream too). > > > > > > > > > > > Well. I think before even discussing the protocol details we need > > > > > > a > > > > > > reasonable plan for buffer handling. I think using virtio-gpu > > > > > > buffers > > > > > > should be an optional optimization and not a requirement. 
Also > > > > > > the > > > > > > motivation for that should be clear (Let the host decoder write > > > > > > directly > > > > > > to virtio-gpu resources, to display video without copying around > > > > > > the > > > > > > decoded framebuffers from one device to another). > > > > > > > > > > Just to make sure we're on the same page, what would the buffers > > > > > come > > > > > from if we don't use this optimization? > > > > > > > > > > I can imagine a setup like this; > > > > > > > > > > 1) host device allocates host memory appropriate for usage with > > > > > host > > > > > > > > > > video decoder, > > > > > > > > > > 2) guest driver allocates arbitrary guest pages for storage > > > > > > > > > > accessible to the guest software, > > > > > > > > > > 3) guest userspace writes input for the decoder to guest pages, > > > > > 4) guest driver passes the list of pages for the input and output > > > > > > > > > > buffers to the host device > > > > > > > > > > 5) host device copies data from input guest pages to host buffer > > > > > 6) host device runs the decoding > > > > > 7) host device copies decoded frame to output guest pages > > > > > 8) guest userspace can access decoded frame from those pages; back > > > > > to 3 > > > > > > > > > > Is that something you have in mind? > > > > > > > > While GPU side allocations can be useful (especially in case of > > > > decoder), > > > > it could be more practical to stick to driver side allocations. This > > > > is > > > > also due to the fact that paravirtualized encoders and cameras are not > > > > necessarily require a GPU device. > > > > > > > > Also, the v4l2 framework already features convenient helpers for CMA > > > > and > > > > SG > > > > allocations. 
The buffers can be used in the same manner as in > > > > virtio-gpu: > > > > buffers are first attached to an already allocated buffer/resource > > > > descriptor and then are made available for processing by the device > > > > using > > > > a dedicated command from the driver. > > > > > > First of all, thanks a lot for your input. This is a relatively new > > > area of virtualization and we definitely need to collect various > > > possible perspectives in the discussion. > > > > > > From Chrome OS point of view, there are several aspects for which the > > > guest side allocation doesn't really work well: > > > 1) host-side hardware has a lot of specific low level allocation > > > requirements, like alignments, paddings, address space limitations and > > > so on, which is not something that can be (easily) taught to the guest > > > OS, > > > > I couldn't agree more. There are some changes by Greg to add support for > > querying GPU buffer metadata. Probably those changes could be integrated > > with 'a framework for cross-device buffer sharing' (something that Greg > > mentioned earlier in the thread and that would totally make sense). > > Did you mean one of Gerd's proposals? > > I think we need some
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov wrote: > > Hi Tomasz, > > On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote: > > Hi Dmitry, > > > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov > > > > wrote: > > > Hello, > > > > > > We at OpenSynergy are also working on an abstract paravirtualized video > > > streaming device that operates input and/or output data buffers and can be > > > used as a generic video decoder/encoder/input/output device. > > > > > > We would be glad to share our thoughts and contribute to the discussion. > > > Please see some comments regarding buffer allocation inline. > > > > > > Best regards, > > > Dmitry. > > > > > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote: > > > > Hi Gerd, > > > > > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann wrote: > > > > > Hi, > > > > > > > > > > > Our prototype implementation uses [4], which allows the virtio-vdec > > > > > > device to use buffers allocated by virtio-gpu device. > > > > > > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157 > > > > > > > > First of all, thanks for taking a look at this RFC and for valuable > > > > feedback. Sorry for the late reply. > > > > > > > > For reference, Keiichi is working with me and David Stevens on > > > > accelerated video support for virtual machines and integration with > > > > other virtual devices, like virtio-gpu for rendering or our > > > > currently-downstream virtio-wayland for display (I believe there is > > > > ongoing work to solve this problem in upstream too). > > > > > > > > > Well. I think before even discussing the protocol details we need a > > > > > reasonable plan for buffer handling. I think using virtio-gpu buffers > > > > > should be an optional optimization and not a requirement. 
Also the > > > > > motivation for that should be clear (Let the host decoder write > > > > > directly > > > > > to virtio-gpu resources, to display video without copying around the > > > > > decoded framebuffers from one device to another). > > > > > > > > Just to make sure we're on the same page, what would the buffers come > > > > from if we don't use this optimization? > > > > > > > > I can imagine a setup like this; > > > > > > > > 1) host device allocates host memory appropriate for usage with host > > > > > > > > video decoder, > > > > > > > > 2) guest driver allocates arbitrary guest pages for storage > > > > > > > > accessible to the guest software, > > > > > > > > 3) guest userspace writes input for the decoder to guest pages, > > > > 4) guest driver passes the list of pages for the input and output > > > > > > > > buffers to the host device > > > > > > > > 5) host device copies data from input guest pages to host buffer > > > > 6) host device runs the decoding > > > > 7) host device copies decoded frame to output guest pages > > > > 8) guest userspace can access decoded frame from those pages; back to 3 > > > > > > > > Is that something you have in mind? > > > > > > While GPU side allocations can be useful (especially in case of decoder), > > > it could be more practical to stick to driver side allocations. This is > > > also due to the fact that paravirtualized encoders and cameras are not > > > necessarily require a GPU device. > > > > > > Also, the v4l2 framework already features convenient helpers for CMA and > > > SG > > > allocations. The buffers can be used in the same manner as in virtio-gpu: > > > buffers are first attached to an already allocated buffer/resource > > > descriptor and then are made available for processing by the device using > > > a dedicated command from the driver. > > > > First of all, thanks a lot for your input. 
This is a relatively new > > area of virtualization and we definitely need to collect various > > possible perspectives in the discussion. > > > > From Chrome OS point of view, there are several aspects for which the > > guest side allocation doesn't really work well: > > 1) host-side hardware has a lot of specific low level allocation > > requirements, like alignments, paddings, address space limitations and > > so on, which is not something that can be (easily) taught to the guest > > OS, > I couldn't agree more. There are some changes by Greg to add support for > querying GPU buffer metadata. Probably those changes could be integrated with > 'a framework for cross-device buffer sharing' (something that Greg mentioned > earlier in the thread and that would totally make sense). > Did you mean one of Gerd's proposals? I think we need some clarification there, as it's not clear to me whether the framework is host-side, guest-side or both. The approach I suggested would rely on a host-side framework and guest-side wouldn't need any special handling for sharing, because the memory would behave as on bare metal. However allocation would still need some special API to express high level buffer parameters and delegate the exact allocation requirements to the host. Currently virtio-gpu already has such interfac
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi Tomasz, On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote: > Hi Dmitry, > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov > > wrote: > > Hello, > > > > We at OpenSynergy are also working on an abstract paravirtualized video > > streaming device that operates input and/or output data buffers and can be > > used as a generic video decoder/encoder/input/output device. > > > > We would be glad to share our thoughts and contribute to the discussion. > > Please see some comments regarding buffer allocation inline. > > > > Best regards, > > Dmitry. > > > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote: > > > Hi Gerd, > > > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann wrote: > > > > Hi, > > > > > > > > > Our prototype implementation uses [4], which allows the virtio-vdec > > > > > device to use buffers allocated by virtio-gpu device. > > > > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157 > > > > > > First of all, thanks for taking a look at this RFC and for valuable > > > feedback. Sorry for the late reply. > > > > > > For reference, Keiichi is working with me and David Stevens on > > > accelerated video support for virtual machines and integration with > > > other virtual devices, like virtio-gpu for rendering or our > > > currently-downstream virtio-wayland for display (I believe there is > > > ongoing work to solve this problem in upstream too). > > > > > > > Well. I think before even discussing the protocol details we need a > > > > reasonable plan for buffer handling. I think using virtio-gpu buffers > > > > should be an optional optimization and not a requirement. Also the > > > > motivation for that should be clear (Let the host decoder write > > > > directly > > > > to virtio-gpu resources, to display video without copying around the > > > > decoded framebuffers from one device to another). > > > > > > Just to make sure we're on the same page, what would the buffers come > > > from if we don't use this optimization? 
> > > > > > I can imagine a setup like this; > > > > > > 1) host device allocates host memory appropriate for usage with host > > > > > > video decoder, > > > > > > 2) guest driver allocates arbitrary guest pages for storage > > > > > > accessible to the guest software, > > > > > > 3) guest userspace writes input for the decoder to guest pages, > > > 4) guest driver passes the list of pages for the input and output > > > > > > buffers to the host device > > > > > > 5) host device copies data from input guest pages to host buffer > > > 6) host device runs the decoding > > > 7) host device copies decoded frame to output guest pages > > > 8) guest userspace can access decoded frame from those pages; back to 3 > > > > > > Is that something you have in mind? > > > > While GPU side allocations can be useful (especially in case of decoder), > > it could be more practical to stick to driver side allocations. This is > > also due to the fact that paravirtualized encoders and cameras are not > > necessarily require a GPU device. > > > > Also, the v4l2 framework already features convenient helpers for CMA and > > SG > > allocations. The buffers can be used in the same manner as in virtio-gpu: > > buffers are first attached to an already allocated buffer/resource > > descriptor and then are made available for processing by the device using > > a dedicated command from the driver. > > First of all, thanks a lot for your input. This is a relatively new > area of virtualization and we definitely need to collect various > possible perspectives in the discussion. > > From Chrome OS point of view, there are several aspects for which the > guest side allocation doesn't really work well: > 1) host-side hardware has a lot of specific low level allocation > requirements, like alignments, paddings, address space limitations and > so on, which is not something that can be (easily) taught to the guest > OS, I couldn't agree more. 
There are some changes by Greg to add support for querying GPU buffer metadata. Probably those changes could be integrated with 'a framework for cross-device buffer sharing' (something that Greg mentioned earlier in the thread and that would totally make sense). > 2) allocation system is designed to be centralized, like Android > gralloc, because there is almost never a case when a buffer is to be > used only with 1 specific device. 99% of the cases are pipelines like > decoder -> GPU/display, camera -> encoder + GPU/display, GPU -> > encoder and so on, which means that allocations need to take into > account multiple hardware constraints. > 3) protected content decoding: the memory for decoded video frames > must not be accessible to the guest at all This looks like a valid use case. Would it also be possible for instance to allocate mem from a secure ION heap on the guest and then to provide the sgt to the device? We don't necessarily need to map that sgt for guest access. Best regards, Dmitry. > > Th
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi Dmitry, On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov wrote: > > Hello, > > We at OpenSynergy are also working on an abstract paravirtualized video > streaming device that operates input and/or output data buffers and can be > used > as a generic video decoder/encoder/input/output device. > > We would be glad to share our thoughts and contribute to the discussion. > Please see some comments regarding buffer allocation inline. > > Best regards, > Dmitry. > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote: > > Hi Gerd, > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann wrote: > > > Hi, > > > > > > > Our prototype implementation uses [4], which allows the virtio-vdec > > > > device to use buffers allocated by virtio-gpu device. > > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157 > > > > First of all, thanks for taking a look at this RFC and for valuable > > feedback. Sorry for the late reply. > > > > For reference, Keiichi is working with me and David Stevens on > > accelerated video support for virtual machines and integration with > > other virtual devices, like virtio-gpu for rendering or our > > currently-downstream virtio-wayland for display (I believe there is > > ongoing work to solve this problem in upstream too). > > > > > Well. I think before even discussing the protocol details we need a > > > reasonable plan for buffer handling. I think using virtio-gpu buffers > > > should be an optional optimization and not a requirement. Also the > > > motivation for that should be clear (Let the host decoder write directly > > > to virtio-gpu resources, to display video without copying around the > > > decoded framebuffers from one device to another). > > > > Just to make sure we're on the same page, what would the buffers come > > from if we don't use this optimization? 
> >
> > I can imagine a setup like this;
> > 1) host device allocates host memory appropriate for usage with host
> > video decoder,
> > 2) guest driver allocates arbitrary guest pages for storage
> > accessible to the guest software,
> > 3) guest userspace writes input for the decoder to guest pages,
> > 4) guest driver passes the list of pages for the input and output
> > buffers to the host device
> > 5) host device copies data from input guest pages to host buffer
> > 6) host device runs the decoding
> > 7) host device copies decoded frame to output guest pages
> > 8) guest userspace can access decoded frame from those pages; back to 3
> >
> > Is that something you have in mind?
> While GPU side allocations can be useful (especially in case of decoder), it
> could be more practical to stick to driver side allocations. This is also due
> to the fact that paravirtualized encoders and cameras are not necessarily
> require a GPU device.
>
> Also, the v4l2 framework already features convenient helpers for CMA and SG
> allocations. The buffers can be used in the same manner as in virtio-gpu:
> buffers are first attached to an already allocated buffer/resource descriptor
> and then are made available for processing by the device using a dedicated
> command from the driver.
First of all, thanks a lot for your input. This is a relatively new
area of virtualization and we definitely need to collect various
possible perspectives in the discussion.
From the Chrome OS point of view, there are several aspects for which
guest side allocation doesn't really work well:
1) host-side hardware has a lot of specific low level allocation
requirements, like alignments, paddings, address space limitations and
so on, which is not something that can be (easily) taught to the guest
OS,
2) allocation system is designed to be centralized, like Android
gralloc, because there is almost never a case when a buffer is to be
used only with 1 specific device. 99% of the cases are pipelines like
decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
encoder and so on, which means that allocations need to take into
account multiple hardware constraints.
3) protected content decoding: the memory for decoded video frames
must not be accessible to the guest at all
That said, the common desktop Linux model is based on allocating from the
producer device (which is why videobuf2 has allocation capability) and
we definitely need to consider this model, even if we just think about
Linux V4L2 compliance. That's why I'm suggesting the unified memory
handling based on guest physical addresses, which would handle both
guest-allocated and host-allocated memory.
Best regards,
Tomasz
> >
> > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > virtio-gpu resource handles. The handles are device-specific. What if
> > > there are multiple virtio-gpu devices present in the guest?
> > >
> > > I think we need a framework for cross-device buffer sharing. One
> > > possible option would be to have some kind of buffer registry, where
> > > buffers can be registered for cross-device sharing and get a unique
> > > id (
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hello,
We at OpenSynergy are also working on an abstract paravirtualized video
streaming device that operates on input and/or output data buffers and can
be used as a generic video decoder/encoder/input/output device.
We would be glad to share our thoughts and contribute to the discussion.
Please see some comments regarding buffer allocation inline.
Best regards,
Dmitry.
On Saturday, 5 October 2019 08:08:12 CEST Tomasz Figa wrote:
> Hi Gerd,
>
> On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann wrote:
> > Hi,
> >
> > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > device to use buffers allocated by virtio-gpu device.
> > >
> > > [4] https://lkml.org/lkml/2019/9/12/157
>
> First of all, thanks for taking a look at this RFC and for valuable
> feedback. Sorry for the late reply.
>
> For reference, Keiichi is working with me and David Stevens on
> accelerated video support for virtual machines and integration with
> other virtual devices, like virtio-gpu for rendering or our
> currently-downstream virtio-wayland for display (I believe there is
> ongoing work to solve this problem in upstream too).
>
> > Well. I think before even discussing the protocol details we need a
> > reasonable plan for buffer handling. I think using virtio-gpu buffers
> > should be an optional optimization and not a requirement. Also the
> > motivation for that should be clear (Let the host decoder write directly
> > to virtio-gpu resources, to display video without copying around the
> > decoded framebuffers from one device to another).
>
> Just to make sure we're on the same page, where would the buffers come
> from if we don't use this optimization?
>
> I can imagine a setup like this:
> 1) host device allocates host memory appropriate for usage with host
> video decoder,
> 2) guest driver allocates arbitrary guest pages for storage
> accessible to the guest software,
> 3) guest userspace writes input for the decoder to guest pages,
> 4) guest driver passes the list of pages for the input and output
> buffers to the host device
> 5) host device copies data from input guest pages to host buffer
> 6) host device runs the decoding
> 7) host device copies decoded frame to output guest pages
> 8) guest userspace can access decoded frame from those pages; back to 3
>
> Is that something you have in mind?
While GPU side allocations can be useful (especially in the case of a
decoder), it could be more practical to stick to driver side allocations.
This is also because paravirtualized encoders and cameras do not
necessarily require a GPU device.
Also, the v4l2 framework already features convenient helpers for CMA and SG
allocations. The buffers can be used in the same manner as in virtio-gpu:
buffers are first attached to an already allocated buffer/resource
descriptor and then are made available for processing by the device using a
dedicated command from the driver.
>
> > Referencing virtio-gpu buffers needs a better plan than just re-using
> > virtio-gpu resource handles. The handles are device-specific. What if
> > there are multiple virtio-gpu devices present in the guest?
> >
> > I think we need a framework for cross-device buffer sharing. One
> > possible option would be to have some kind of buffer registry, where
> > buffers can be registered for cross-device sharing and get a unique
> > id (a uuid maybe?). Drivers would typically register buffers on
> > dma-buf export.
>
> This approach could possibly let us handle this transparently to
> importers, which would work for guest kernel subsystems that rely on
> the ability to handle buffers like native memory (e.g. having a
> sgtable or DMA address) for them.
>
> How about allocating guest physical addresses for memory corresponding
> to those buffers? In the virtio-gpu example, that could work like
> this:
> - by default a virtio-gpu buffer has only a resource handle,
> - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> virtio-gpu device export the buffer to a host framework (inside the
> VMM) that would allocate guest page addresses for it, which the
> command would return in a response to the guest,
> - virtio-gpu driver could then create a regular DMA-buf object for
> such memory, because it's just backed by pages (even though they may
> not be accessible to the guest; just like in the case of TrustZone
> memory protection on bare metal systems),
> - any consumer would be able to handle such a buffer like regular
> guest memory, passing low-level scatter-gather tables to the host as
> buffer descriptors - this would nicely integrate with the basic case
> without buffer sharing, as described above.
>
> Another interesting side effect of the above approach would be the
> ease of integration with virtio-iommu. If the virtio master device is
> put behind a virtio-iommu, the guest page addresses become the input
> to iommu page tables and IOVA addresses go to the host via the virtio
> master
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi Gerd,
On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann wrote:
>
> Hi,
>
> > Our prototype implementation uses [4], which allows the virtio-vdec
> > device to use buffers allocated by virtio-gpu device.
>
> > [4] https://lkml.org/lkml/2019/9/12/157
First of all, thanks for taking a look at this RFC and for valuable
feedback. Sorry for the late reply.
For reference, Keiichi is working with me and David Stevens on
accelerated video support for virtual machines and integration with
other virtual devices, like virtio-gpu for rendering or our
currently-downstream virtio-wayland for display (I believe there is
ongoing work to solve this problem in upstream too).
>
> Well. I think before even discussing the protocol details we need a
> reasonable plan for buffer handling. I think using virtio-gpu buffers
> should be an optional optimization and not a requirement. Also the
> motivation for that should be clear (Let the host decoder write directly
> to virtio-gpu resources, to display video without copying around the
> decoded framebuffers from one device to another).
Just to make sure we're on the same page, where would the buffers come
from if we don't use this optimization?
I can imagine a setup like this:
1) host device allocates host memory appropriate for usage with host
video decoder,
2) guest driver allocates arbitrary guest pages for storage
accessible to the guest software,
3) guest userspace writes input for the decoder to guest pages,
4) guest driver passes the list of pages for the input and output
buffers to the host device
5) host device copies data from input guest pages to host buffer
6) host device runs the decoding
7) host device copies decoded frame to output guest pages
8) guest userspace can access decoded frame from those pages; back to 3
Is that something you have in mind?
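For illustration, the two copy steps (5 and 7) of the flow above could look roughly like the C sketch below. It is only a sketch under stated assumptions: `struct guest_page`, `VDEC_PAGE_SIZE`, `gather_from_pages()` and `scatter_to_pages()` are hypothetical names that do not come from the RFC, and the guest pages are modelled as plain arrays already mapped on the host side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VDEC_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Hypothetical stand-in for one guest page as mapped by the host device. */
struct guest_page {
    uint8_t data[VDEC_PAGE_SIZE];
};

/*
 * Step 5: gather up to len bytes from the (possibly non-contiguous) guest
 * pages into a contiguous host buffer. Returns the number of bytes copied,
 * which is less than len if the page list is too short.
 */
static size_t gather_from_pages(const struct guest_page *pages, size_t npages,
                                uint8_t *host_buf, size_t len)
{
    size_t copied = 0;

    assert(pages != NULL && host_buf != NULL);
    for (size_t i = 0; i < npages && copied < len; i++) {
        size_t chunk = len - copied;

        if (chunk > VDEC_PAGE_SIZE)
            chunk = VDEC_PAGE_SIZE;
        memcpy(host_buf + copied, pages[i].data, chunk);
        copied += chunk;
    }
    return copied;
}

/*
 * Step 7: scatter up to len bytes of a decoded frame from the host buffer
 * back into the guest's output pages. Returns the number of bytes copied.
 */
static size_t scatter_to_pages(struct guest_page *pages, size_t npages,
                               const uint8_t *host_buf, size_t len)
{
    size_t copied = 0;

    assert(pages != NULL && host_buf != NULL);
    for (size_t i = 0; i < npages && copied < len; i++) {
        size_t chunk = len - copied;

        if (chunk > VDEC_PAGE_SIZE)
            chunk = VDEC_PAGE_SIZE;
        memcpy(pages[i].data, host_buf + copied, chunk);
        copied += chunk;
    }
    return copied;
}
```

The sketch makes the cost of this baseline visible: every frame crosses memory twice, which is what the virtio-gpu buffer sharing optimization is meant to avoid.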
>
> Referencing virtio-gpu buffers needs a better plan than just re-using
> virtio-gpu resource handles. The handles are device-specific. What if
> there are multiple virtio-gpu devices present in the guest?
>
> I think we need a framework for cross-device buffer sharing. One
> possible option would be to have some kind of buffer registry, where
> buffers can be registered for cross-device sharing and get a unique
> id (a uuid maybe?). Drivers would typically register buffers on
> dma-buf export.
This approach could possibly let us handle this transparently to
importers, which would work for guest kernel subsystems that rely on
the ability to handle buffers like native memory (e.g. having a
sgtable or DMA address) for them.
How about allocating guest physical addresses for memory corresponding
to those buffers? In the virtio-gpu example, that could work like
this:
- by default a virtio-gpu buffer has only a resource handle,
- VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
virtio-gpu device export the buffer to a host framework (inside the
VMM) that would allocate guest page addresses for it, which the
command would return in a response to the guest,
- virtio-gpu driver could then create a regular DMA-buf object for
such memory, because it's just backed by pages (even though they may
not be accessible to the guest; just like in the case of TrustZone
memory protection on bare metal systems),
- any consumer would be able to handle such a buffer like regular
guest memory, passing low-level scatter-gather tables to the host as
buffer descriptors - this would nicely integrate with the basic case
without buffer sharing, as described above.
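To make the last point concrete, here is a rough C sketch of how a consumer might turn a list of guest page addresses into a scatter-gather list to pass to the host. Everything in it is an assumption made for illustration: `struct sg_entry`, `SG_PAGE_SIZE` and `build_sg_list()` are hypothetical and are neither the kernel's scatterlist API nor part of any virtio protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Hypothetical buffer descriptor entry: one physically contiguous range. */
struct sg_entry {
    uint64_t addr; /* guest physical (or IOVA) start address */
    uint32_t len;  /* length of the range in bytes */
};

/*
 * Build a scatter-gather list from page addresses, merging pages that are
 * adjacent in the address space into a single entry.
 * Returns the number of entries written, or 0 if out[] is too small.
 */
static size_t build_sg_list(const uint64_t *page_addrs, size_t npages,
                            struct sg_entry *out, size_t max_entries)
{
    size_t n = 0;

    assert(page_addrs != NULL && out != NULL);
    for (size_t i = 0; i < npages; i++) {
        /* Extend the previous entry if this page directly follows it. */
        if (n > 0 && out[n - 1].addr + out[n - 1].len == page_addrs[i]) {
            out[n - 1].len += SG_PAGE_SIZE;
            continue;
        }
        if (n == max_entries)
            return 0;
        out[n].addr = page_addrs[i];
        out[n].len = SG_PAGE_SIZE;
        n++;
    }
    return n;
}
```

With exported buffers represented as guest page addresses, the same routine would serve both ordinary guest memory and buffers exported by virtio-gpu, which is the integration benefit described above.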
Another interesting side effect of the above approach would be the
ease of integration with virtio-iommu. If the virtio master device is
put behind a virtio-iommu, the guest page addresses become the input
to iommu page tables and IOVA addresses go to the host via the virtio
master device protocol, inside the low-level scatter-gather tables.
What do you think?
Best regards,
Tomasz
>
> Another option would be to pass around both buffer handle and buffer
> owner, i.e. instead of "u32 handle" have something like this:
>
> struct buffer_reference {
>     enum device_type; /* pci, virtio-mmio, ... */
>     union device_address {
>         struct pci_address pci_addr;
>         u64 virtio_mmio_addr;
>         [ ... ]
>     };
>     u64 device_buffer_handle; /* device-specific, virtio-gpu could use
>                                  resource ids here */
> };
>
> cheers,
> Gerd
>
Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
Hi,
> Our prototype implementation uses [4], which allows the virtio-vdec
> device to use buffers allocated by virtio-gpu device.
> [4] https://lkml.org/lkml/2019/9/12/157
Well. I think before even discussing the protocol details we need a
reasonable plan for buffer handling. I think using virtio-gpu buffers
should be an optional optimization and not a requirement. Also the
motivation for that should be clear (Let the host decoder write directly
to virtio-gpu resources, to display video without copying around the
decoded framebuffers from one device to another).
Referencing virtio-gpu buffers needs a better plan than just re-using
virtio-gpu resource handles. The handles are device-specific. What if
there are multiple virtio-gpu devices present in the guest?
I think we need a framework for cross-device buffer sharing. One
possible option would be to have some kind of buffer registry, where
buffers can be registered for cross-device sharing and get a unique
id (a uuid maybe?). Drivers would typically register buffers on
dma-buf export.
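A buffer registry of this kind could be sketched in C roughly as follows. This is a toy model under loud assumptions: the names `struct buffer_registry`, `registry_register()` and `registry_lookup()` are hypothetical, the table is fixed-size, and the 16-byte id is filled from a serial counter purely as a placeholder where a real implementation would presumably generate a proper UUID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REGISTRY_SLOTS 16 /* arbitrary capacity, for illustration only */

/* 16-byte identifier, the size of a UUID. */
struct buffer_id {
    uint8_t bytes[16];
};

struct registry_entry {
    struct buffer_id id;
    void *buffer; /* opaque pointer to the exporter's buffer object */
    int in_use;
};

struct buffer_registry {
    struct registry_entry slots[REGISTRY_SLOTS];
    uint64_t next_serial;
};

/*
 * Register a buffer for cross-device sharing.
 * On success stores the new id in *id_out and returns 0; returns -1 if the
 * registry is full.
 */
static int registry_register(struct buffer_registry *reg, void *buffer,
                             struct buffer_id *id_out)
{
    assert(reg != NULL && id_out != NULL);
    for (size_t i = 0; i < REGISTRY_SLOTS; i++) {
        struct registry_entry *e = &reg->slots[i];
        uint64_t serial;

        if (e->in_use)
            continue;
        /* Placeholder id: a serial number, NOT a real random UUID. */
        serial = ++reg->next_serial;
        memset(e->id.bytes, 0, sizeof(e->id.bytes));
        memcpy(e->id.bytes, &serial, sizeof(serial));
        e->buffer = buffer;
        e->in_use = 1;
        *id_out = e->id;
        return 0;
    }
    return -1;
}

/* Resolve an id to the registered buffer, or NULL if the id is unknown. */
static void *registry_lookup(const struct buffer_registry *reg,
                             const struct buffer_id *id)
{
    assert(reg != NULL && id != NULL);
    for (size_t i = 0; i < REGISTRY_SLOTS; i++) {
        const struct registry_entry *e = &reg->slots[i];

        if (e->in_use && memcmp(e->id.bytes, id->bytes, sizeof(id->bytes)) == 0)
            return e->buffer;
    }
    return NULL;
}
```

An exporting driver would call something like `registry_register()` on dma-buf export, and an importing device would resolve the id with `registry_lookup()` without needing to know which device owns the buffer.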
Another option would be to pass around both buffer handle and buffer
owner, i.e. instead of "u32 handle" have something like this:
struct buffer_reference {
    enum device_type; /* pci, virtio-mmio, ... */
    union device_address {
        struct pci_address pci_addr;
        u64 virtio_mmio_addr;
        [ ... ]
    };
    u64 device_buffer_handle; /* device-specific, virtio-gpu could use
                                 resource ids here */
};
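A compilable variant of this idea might look like the sketch below. The member names `type` and `addr`, the layout of `struct pci_address`, and the helper `buffer_reference_equal()` are assumptions added for illustration; the members elided by "[ ... ]" above are simply left out rather than guessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum device_type {
    DEVICE_TYPE_PCI,
    DEVICE_TYPE_VIRTIO_MMIO
};

/* Assumed layout; the proposal does not define struct pci_address. */
struct pci_address {
    uint16_t domain;
    uint8_t bus;
    uint8_t slot;
    uint8_t function;
};

struct buffer_reference {
    enum device_type type; /* selects the valid member of addr */
    union {
        struct pci_address pci_addr;
        uint64_t virtio_mmio_addr;
    } addr;
    uint64_t device_buffer_handle; /* device-specific, e.g. a resource id */
};

/*
 * Two references name the same buffer only if both the owning device and
 * the device-specific handle match.
 */
static bool buffer_reference_equal(const struct buffer_reference *a,
                                   const struct buffer_reference *b)
{
    assert(a != NULL && b != NULL);
    if (a->type != b->type ||
        a->device_buffer_handle != b->device_buffer_handle)
        return false;

    switch (a->type) {
    case DEVICE_TYPE_PCI:
        return a->addr.pci_addr.domain == b->addr.pci_addr.domain &&
               a->addr.pci_addr.bus == b->addr.pci_addr.bus &&
               a->addr.pci_addr.slot == b->addr.pci_addr.slot &&
               a->addr.pci_addr.function == b->addr.pci_addr.function;
    case DEVICE_TYPE_VIRTIO_MMIO:
        return a->addr.virtio_mmio_addr == b->addr.virtio_mmio_addr;
    }
    return false;
}
```

The comparison shows why the owner is part of the reference: with two virtio-gpu devices in the guest, the same handle value on different devices refers to different buffers.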
cheers,
Gerd
