Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread Gerd Hoffmann
> > I doubt you can handle pci memory bars like regular ram when it comes to
> > dma and iommu support.  There is a reason we have p2pdma in the first
> > place ...
> 
> The thing is that such bars would be actually backed by regular host
> RAM. Do we really need the complexity of real PCI bar handling for
> that?

Well, taking shortcuts because of virtualization-specific assumptions
already caused problems in the past.  See the messy iommu handling we
have in virtio-pci for example.

So I don't feel like going the "we know it's just normal pages, so let's
simplify things" route.

Besides that, hostmap isn't important for secure buffers; we wouldn't
allow the guest to map them anyway ;)

cheers,
  Gerd



Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread Gerd Hoffmann
  Hi,

> That could be still a guest physical address. Like on a bare metal
> system with TrustZone, there could be physical memory that is not
> accessible to the CPU.

Hmm.  Yes, maybe.  We could use the dma address of the (first page of
the) guest buffer.  In case of a secure buffer the guest has no access,
so the guest buffer would be unused, but it would at least make sure
that things don't crash in case someone tries to map & access the
buffer.

The host should be able to figure out the corresponding host buffer from
the guest buffer address.

When running drm-misc-next you should be able to test whether that'll
actually work without any virtio-gpu driver changes.

cheers,
  Gerd



Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread Tomasz Figa
On Thu, Oct 17, 2019 at 4:44 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > > Also note that the guest manages the address space, so the host can't
> > > simply allocate guest page addresses.
> >
> > Is this really true? I'm not an expert in this area, but on a bare
> > metal system it's the hardware or firmware that sets up the various
> > physical address allocations on a hardware level and most of the time
> > most of the addresses are already pre-assigned in hardware (like the
> > DRAM base, various IOMEM spaces, etc.).
>
> Yes, the firmware does it.  Same in a VM, ovmf or seabios (which runs
> inside the guest) typically does it.  And sometimes the linux kernel
> too.
>
> > I think that means that we could have a reserved region that could be
> > used by the host for dynamic memory hot-plug-like operation. The
> > reference to memory hot-plug here is fully intentional, we could even
> > use this feature of Linux to get struct pages for such memory if we
> > really wanted.
>
> We try to avoid such quirks whenever possible.  Negotiating such things
> between qemu and firmware can be done if really needed (and actually is
> done for memory hotplug support), but it's an extra interface which
> needs maintenance.
>
> > > Mapping host virtio-gpu resources
> > > into guest address space is planned, it'll most likely use a pci memory
> > > bar to reserve some address space.  The host can map resources into that
> > > pci bar, on guest request.
> >
> > Sounds like a viable option too. Do you have a pointer to some
> > description on how this would work on both host and guest side?
>
> Some early code:
>   https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-memory-v2
>   https://git.kraxel.org/cgit/linux/log/?h=drm-virtio-memory-v2
>
> Branches have other stuff too, look for "hostmem" commits.
>
> Not much code yet beyond creating a pci bar on the host and detecting
> presence in the guest.
>
> On the host side qemu would create subregions inside the hostmem memory
> region for the resources.
>
> On the guest side we can ioremap stuff, like vram.
>
> > > Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> > > Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> > > able to lookup buffers using device and dma address, without explicitly
> > > creating some identifier.  Not investigated yet in detail.
> >
> > Not backed by pages as in "struct page", but those are still regular
> > pages of the physical address space.
>
> Well, maybe not.  Host gem object could live in device memory, and if we
> map them into the guest ...

That's an interesting scenario, but in that case would we still want
to map it into the guest? I think in such a case we may need to have a
shadow buffer in regular RAM, and that's already implemented in
virtio-gpu.

>
> > That said, currently the sg_table interface is only able to describe
> > physical memory using struct page pointers.  It's been a long standing
> > limitation affecting even bare metal systems, so perhaps it's just the
> > right time to make it possible for them to use some other identifiers, like
> > PFNs?
>
> I doubt you can handle pci memory bars like regular ram when it comes to
> dma and iommu support.  There is a reason we have p2pdma in the first
> place ...

The thing is that such bars would be actually backed by regular host
RAM. Do we really need the complexity of real PCI bar handling for
that?

Best regards,
Tomasz


Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread Tomasz Figa
On Thu, Oct 17, 2019 at 4:19 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > That said, Chrome OS would use a similar model, except that we don't
> > use ION. We would likely use minigbm backed by virtio-gpu to allocate
> > appropriate secure buffers for us and then import them to the V4L2
> > driver.
>
> What exactly is a "secure buffer"?  I guess a gem object where read
> access is not allowed, only scanout to display?  Who enforces this?
> The hardware?  Or the kernel driver?

In general, it's a buffer which can be accessed only by a specific set
of entities. The set depends on the use case and the level of security
you want to achieve. In Chrome OS we at least want to make such
buffers completely inaccessible for the guest, enforced by the VMM,
for example by not installing corresponding memory into the guest
address space (and not allowing transfers if the virtio-gpu shadow
buffer model is used).

Beyond that, the host memory itself could be further protected by some
hardware mechanisms or another hypervisor running above the host OS,
like in the ARM TrustZone model. That shouldn't matter for a VM guest,
though.
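The VMM-enforced model described above can be illustrated with a toy sketch: a secure buffer still gets a guest-physical range for identification, but the VMM simply never installs a host mapping for it, so guest accesses can never reach the decoded frames. All names and types here are made up for illustration, not taken from any real VMM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the policy above: a secure buffer is identified by a
 * guest-physical address, but no host mapping is ever installed for
 * it (e.g. no EPT/stage-2 entry is created).  Hypothetical names. */

struct buffer {
    uint64_t guest_addr;   /* guest-physical base, used as identifier */
    void    *host_mem;     /* host backing storage */
    bool     secure;       /* if true, never mapped into the guest */
};

/* Returns the host mapping to install for a guest access, or NULL. */
static void *vmm_map_for_guest(const struct buffer *b)
{
    if (b->secure)
        return NULL;       /* guest faults instead of reading frames */
    return b->host_mem;
}
```

A transfer-based shadow-buffer scheme would enforce the same policy at the transfer commands instead of at the mapping.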

>
> It might make sense for virtio-gpu to know that concept, to allow guests
> to ask for secure buffers.
>
> And of course we'll need some way to pass around identifiers for these
> (and maybe other) buffers (from virtio-gpu device via guest drivers to
> virtio-vdec device).  virtio-gpu guest driver could generate a uuid for
> that, attach it to the dma-buf and also notify the host so qemu can
> maintain a uuid -> buffer lookup table.

That could be still a guest physical address. Like on a bare metal
system with TrustZone, there could be physical memory that is not
accessible to the CPU.

Best regards,
Tomasz


Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread Tomasz Figa
On Tue, Oct 15, 2019 at 11:06 PM Dmitry Morozov wrote:
>
> Hello Gerd,
>
> On Dienstag, 15. Oktober 2019 09:54:22 CEST Gerd Hoffmann wrote:
> > On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> > > On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
> > > >   Hi,
> > > >
> > > > > My take on this (for a decoder) would be to allocate memory for output
> > > > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > > > provide those to the device using virtio. The device side then uses
> > > > > the
> > > > > dmabuf framework to make the buffers accessible for the hardware. I'm
> > > > > not
> > > > > sure about that, it's just an idea.
> > > >
> > > > Virtualization aside, how does the complete video decoding workflow
> > > > work?  I assume along the lines of ...
> > > >
> > > >   (1) allocate buffer for decoded video frames (from ion).
> > > >   (2) export those buffers as dma-buf.
> > > >   (3) import dma-buf to video decoder.
> > > >   (4) import dma-buf to gpu.
> > > >
> > > > ... to establish buffers shared between video decoder and gpu?
> > > >
> > > > Then feed the video stream into the decoder, which decodes into the ion
> > > > buffers?  Ask the gpu to scanout the ion buffers to show the video?
> > > >
> > > > cheers,
> > > >
> > > >   Gerd
> > >
> > > Yes, exactly.
> > >
> > > [decoder]
> > > 1) Input buffers are allocated using  VIDIOC_*BUFS.
> >
> > Ok.
> >
> > > 2) Output buffers are allocated in a guest specific manner (ION, gbm).
> >
> > Who decides whether ION or gbm is used?  The phrase "secure ION heap"
> > used above sounds like using ION is required for decoding drm-protected
> > content.
>
> I mention the secure ION heap to address this Chrome OS related point:
> > 3) protected content decoding: the memory for decoded video frames
> > must not be accessible to the guest at all
>
> There was an RFC to implement a secure memory allocation framework, but
> apparently it was not accepted: https://lwn.net/Articles/661549/.
>
> In case of Android, it allocates GPU buffers for output frames, so it is the
> gralloc implementation who decides how to allocate memory. It can use some
> dedicated ION heap or can use libgbm. It can also be some proprietary
> implementation.
>
> >
> > So, do we have to worry about ION here?  Or can we just use gbm?
>
> If we replace vendor specific code in the Android guest and provide a way to
> communicate metadata for buffer allocations from the device to the driver, we
> can use gbm. In the PC world it might be easier.
>
> >
> > [ Note: don't know much about ion, other than that it is used by
> > android, is in staging right now and patches to move it
> > out of staging are floating around @ dri-devel ]

Chrome OS has cros_gralloc, which is an open source implementation of
gralloc on top of minigbm (which itself is built on top of the Linux
DRM interfaces). It's not limited to Chrome OS and I believe Intel
also uses it for their native Android setups. With that, we could
completely disregard ION, but I feel like it's not a core problem
here. Whoever wants to use ION should be still able to do so if they
back the allocations with guest pages or memory coming from the host
using some other interface and it can be described using an identifier
compatible with what we're discussing here.

Best regards,
Tomasz


Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread Gerd Hoffmann
  Hi,

> > Also note that the guest manages the address space, so the host can't
> > simply allocate guest page addresses.
> 
> Is this really true? I'm not an expert in this area, but on a bare
> metal system it's the hardware or firmware that sets up the various
> physical address allocations on a hardware level and most of the time
> most of the addresses are already pre-assigned in hardware (like the
> DRAM base, various IOMEM spaces, etc.).

Yes, the firmware does it.  Same in a VM, ovmf or seabios (which runs
inside the guest) typically does it.  And sometimes the linux kernel
too.

> I think that means that we could have a reserved region that could be
> used by the host for dynamic memory hot-plug-like operation. The
> reference to memory hot-plug here is fully intentional, we could even
> use this feature of Linux to get struct pages for such memory if we
> really wanted.

We try to avoid such quirks whenever possible.  Negotiating such things
between qemu and firmware can be done if really needed (and actually is
done for memory hotplug support), but it's an extra interface which
needs maintenance.

> > Mapping host virtio-gpu resources
> > into guest address space is planned, it'll most likely use a pci memory
> > bar to reserve some address space.  The host can map resources into that
> > pci bar, on guest request.
> 
> Sounds like a viable option too. Do you have a pointer to some
> description on how this would work on both host and guest side?

Some early code:
  https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-memory-v2
  https://git.kraxel.org/cgit/linux/log/?h=drm-virtio-memory-v2

Branches have other stuff too, look for "hostmem" commits.

Not much code yet beyond creating a pci bar on the host and detecting
presence in the guest.

On the host side qemu would create subregions inside the hostmem memory
region for the resources.

On the guest side we can ioremap stuff, like vram.
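The host-side bookkeeping sketched above (qemu creating subregions inside the hostmem bar on guest request) amounts to carving offsets out of a fixed-size region. A trivial bump allocator shows the idea; real code would also track frees and use qemu's memory-region API. The bar size and all names here are assumptions for illustration.

```c
#include <stdint.h>

/* Sketch: carve page-aligned subregions out of a pci memory bar so
 * host resources can be mapped at stable bar offsets.  Hypothetical
 * names; not qemu code. */

#define HOSTMEM_BAR_SIZE  (256ULL << 20)   /* assumed 256 MiB bar */
#define PAGE_ALIGN(x)     (((x) + 0xfffULL) & ~0xfffULL)

struct hostmem_bar {
    uint64_t next_free;    /* offset of first unused byte */
};

/* Returns the bar offset for the new subregion, or -1 on overflow. */
static int64_t hostmem_map_resource(struct hostmem_bar *bar, uint64_t size)
{
    uint64_t off = PAGE_ALIGN(bar->next_free);
    if (off + PAGE_ALIGN(size) > HOSTMEM_BAR_SIZE)
        return -1;
    bar->next_free = off + PAGE_ALIGN(size);
    return (int64_t)off;
}
```

The guest would then ioremap (bar base + returned offset) to reach the resource.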

> > Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> > Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> > able to lookup buffers using device and dma address, without explicitly
> > creating some identifier.  Not investigated yet in detail.
> 
> Not backed by pages as in "struct page", but those are still regular
> pages of the physical address space.

Well, maybe not.  Host gem object could live in device memory, and if we
map them into the guest ...

> That said, currently the sg_table interface is only able to describe
> physical memory using struct page pointers.  It's been a long standing
> limitation affecting even bare metal systems, so perhaps it's just the
> right time to make it possible for them to use some other identifiers,
> like PFNs?

I doubt you can handle pci memory bars like regular ram when it comes to
dma and iommu support.  There is a reason we have p2pdma in the first
place ...

cheers,
  Gerd



Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread Gerd Hoffmann
  Hi,

> That said, Chrome OS would use a similar model, except that we don't
> use ION. We would likely use minigbm backed by virtio-gpu to allocate
> appropriate secure buffers for us and then import them to the V4L2
> driver.

What exactly is a "secure buffer"?  I guess a gem object where read
access is not allowed, only scanout to display?  Who enforces this?
The hardware?  Or the kernel driver?

It might make sense for virtio-gpu to know that concept, to allow guests
to ask for secure buffers.

And of course we'll need some way to pass around identifiers for these
(and maybe other) buffers (from virtio-gpu device via guest drivers to
virtio-vdec device).  virtio-gpu guest driver could generate a uuid for
that, attach it to the dma-buf and also notify the host so qemu can
maintain a uuid -> buffer lookup table.
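The uuid -> buffer lookup table suggested above could be as simple as the following sketch: the guest driver attaches a uuid to the dma-buf and notifies the host; the host records the mapping so another device emulation (e.g. the vdec) can resolve the uuid back to a host-side buffer. The fixed-size linear table and all names are stand-ins, not actual qemu code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal uuid -> buffer registry, as one possible shape of the
 * lookup table described above.  Hypothetical names throughout. */

#define UUID_LEN   16
#define TABLE_SIZE 64

struct uuid_entry {
    uint8_t uuid[UUID_LEN];
    void   *buffer;        /* host-side resource, opaque here */
};

static struct uuid_entry table[TABLE_SIZE];
static size_t table_used;

/* Called when the guest notifies the host of a new uuid. */
static int uuid_register(const uint8_t uuid[UUID_LEN], void *buffer)
{
    if (table_used >= TABLE_SIZE)
        return -1;
    memcpy(table[table_used].uuid, uuid, UUID_LEN);
    table[table_used].buffer = buffer;
    table_used++;
    return 0;
}

/* Called by another device emulation to resolve a shared buffer. */
static void *uuid_lookup(const uint8_t uuid[UUID_LEN])
{
    for (size_t i = 0; i < table_used; i++)
        if (!memcmp(table[i].uuid, uuid, UUID_LEN))
            return table[i].buffer;
    return NULL;
}
```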

cheers,
  Gerd



Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-17 Thread David Stevens
> Hmm, the cross-device buffer sharing framework I have in mind would
> basically be a buffer registry.  virtio-gpu would create buffers as
> usual, create a identifier somehow (details to be hashed out), attach
> the identifier to the dma-buf so it can be used as outlined above.

Using physical addresses to identify buffers is using the guest
physical address space as the buffer registry. Especially if every
device should be able to operate in isolation, then each virtio
protocol will have some way to allocate buffers that are accessible to
the guest and host. This requires guest physical addresses, and the
guest physical address of the start of the buffer can serve as the
unique identifier for the buffer in both the guest and the host. Even
with buffers that are only accessible to the host, I think it's
reasonable to allocate guest physical addresses since the pages still
exist (in the same way physical addresses for secure physical memory
make sense).

This approach also sidesteps the need for explicit registration. With
explicit registration, either there would need to be some centralized
buffer exporter device or each protocol would need to have its own
export function. Using guest physical addresses means that buffers get
a unique identifier during creation. For example, in the virtio-gpu
protocol, buffers would get this identifier through
VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING, or through
VIRTIO_GPU_CMD_RESOURCE_CREATE_V2 with impending additions to resource
creation.
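The scheme above can be sketched concretely: the entry layout below mirrors the guest-address/length scatter list that virtio-gpu's attach-backing command carries, and the guest-physical address of the first entry doubles as the buffer's identifier on both sides. The struct and function names are illustrative, not copied from the virtio spec.

```c
#include <stdint.h>

/* Illustration of "first-page guest physical address as buffer id".
 * Each entry describes one contiguous chunk of the buffer's backing
 * storage in guest-physical address space. */

struct mem_entry {
    uint64_t addr;     /* guest-physical address of this chunk */
    uint32_t length;
    uint32_t padding;
};

/* The buffer id is the first entry's guest-physical address; since
 * two distinct buffers cannot start at the same page, it is unique. */
static uint64_t buffer_id(const struct mem_entry *entries)
{
    return entries[0].addr;
}
```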

> Also note that the guest manages the address space, so the host can't
> simply allocate guest page addresses.  Mapping host virtio-gpu resources
> into guest address space is planned, it'll most likely use a pci memory
> bar to reserve some address space.  The host can map resources into that
> pci bar, on guest request.
>
> >  - virtio-gpu driver could then create a regular DMA-buf object for
> > such memory, because it's just backed by pages (even though they may
> > not be accessible to the guest; just like in the case of TrustZone
> > memory protection on bare metal systems),
>
> Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> able to lookup buffers using device and dma address, without explicitly
> creating some identifier.  Not investigated yet in detail.

For the linux guest implementation, mapping a dma-buf doesn't
necessarily require actual pages. The exporting driver's map_dma_buf
function just needs to provide an sg_table with populated dma_address
fields; it doesn't actually need to populate the sg_table with pages.
At the very least, there are places such as i915_gem_stolen.c and
(some situations of) videobuf-dma-sg.c that take this approach.
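The shape of such a scatter list can be modeled as below. The struct is a cut-down stand-in for the kernel's scatterlist (this does not build against kernel headers); the point is only that the dma_address field is filled in while the page pointer stays NULL, which is roughly what i915's stolen-memory exporter does.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace model of a page-less scatter list segment.  Stand-in
 * types; real code would use struct scatterlist / sg_dma_address(). */

typedef uint64_t dma_addr_t;

struct scatterlist {
    void        *page;         /* stays NULL: no struct page backing */
    dma_addr_t   dma_address;  /* what DMA-capable importers consume */
    unsigned int dma_length;
};

/* Fill one segment pointing at a bar-backed region. */
static void sg_set_bar_region(struct scatterlist *sg,
                              dma_addr_t base, unsigned int len)
{
    sg->page = NULL;           /* importer must not expect a page */
    sg->dma_address = base;
    sg->dma_length = len;
}
```

An importer that only ever reads dma_address and dma_length works fine with such a table; one that calls sg_page() would not.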

Cheers,
David


Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-16 Thread Tomasz Figa
On Mon, Oct 14, 2019 at 9:19 PM Gerd Hoffmann  wrote:
>
> > > Well.  I think before even discussing the protocol details we need a
> > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > should be an optional optimization and not a requirement.  Also the
> > > motivation for that should be clear (Let the host decoder write directly
> > > to virtio-gpu resources, to display video without copying around the
> > > decoded framebuffers from one device to another).
> >
> > Just to make sure we're on the same page, what would the buffers come
> > from if we don't use this optimization?
> >
> > I can imagine a setup like this:
> >  1) host device allocates host memory appropriate for usage with host
> > video decoder,
> >  2) guest driver allocates arbitrary guest pages for storage
> > accessible to the guest software,
> >  3) guest userspace writes input for the decoder to guest pages,
> >  4) guest driver passes the list of pages for the input and output
> > buffers to the host device
> >  5) host device copies data from input guest pages to host buffer
> >  6) host device runs the decoding
> >  7) host device copies decoded frame to output guest pages
> >  8) guest userspace can access decoded frame from those pages; back to 3
> >
> > Is that something you have in mind?
>
> I don't have any specific workflow in mind.
>
> If you want display the decoded video frames you want use dma-bufs shared
> by video decoder and gpu, right?  So the userspace application (video
> player probably) would create the buffers using one of the drivers,
> export them as dma-buf, then import them into the other driver.  Just
> like you would do on physical hardware.  So, when using virtio-gpu
> buffers:
>
>   (1) guest app creates buffers using virtio-gpu.
>   (2) guest app exports virtio-gpu buffers buffers as dma-buf.
>   (3) guest app imports the dma-bufs into virtio-vdec.
>   (4) guest app asks the virtio-vdec driver to write the decoded
>   frames into the dma-bufs.
>   (5) guest app asks the virtio-gpu driver to display the decoded
>   frame.
>
> The guest video decoder driver passes the dma-buf pages to the host, and
> it is the host driver's job to fill the buffer.  How this is done
> exactly might depend on hardware capabilities (whether a host-allocated
> bounce buffer is needed or whether the hardware can decode directly to
> the dma-buf passed by the guest driver) and is an implementation detail.
>
> Now, with cross-device sharing added the virtio-gpu would attach some
> kind of identifier to the dma-buf, virtio-vdec could fetch the
> identifier and pass it to the host too, and the host virtio-vdec device
> can use the identifier to get a host dma-buf handle for the (virtio-gpu)
> buffer.  Ask the host video decoder driver to import the host dma-buf.
> If it all worked fine it can ask the host hardware to decode directly to
> the host virtio-gpu resource.
>

Agreed.

> > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > there are multiple virtio-gpu devices present in the guest?
> > >
> > > I think we need a framework for cross-device buffer sharing.  One
> > > possible option would be to have some kind of buffer registry, where
> > > buffers can be registered for cross-device sharing and get a unique
> > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > dma-buf export.
> >
> > This approach could possibly let us handle this transparently to
> > importers, which would work for guest kernel subsystems that rely on
> > the ability to handle buffers like native memory (e.g. having a
> > sgtable or DMA address) for them.
> >
> > How about allocating guest physical addresses for memory corresponding
> > to those buffers? On the virtio-gpu example, that could work like
> > this:
> >  - by default a virtio-gpu buffer has only a resource handle,
> >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > virtio-gpu device export the buffer to a host framework (inside the
> > VMM) that would allocate guest page addresses for it, which the
> > command would return in a response to the guest,
>
> Hmm, the cross-device buffer sharing framework I have in mind would
> basically be a buffer registry.  virtio-gpu would create buffers as
> usual, create a identifier somehow (details to be hashed out), attach
> the identifier to the dma-buf so it can be used as outlined above.
>
> Also note that the guest manages the address space, so the host can't
> simply allocate guest page addresses.

Is this really true? I'm not an expert in this area, but on a bare
metal system it's the hardware or firmware that sets up the various
physical address allocations on a hardware level and most of the time
most of the addresses are already pre-assigned in hardware (like the
DRAM base, various IOMEM spaces, etc.).

I think that means that we could have a reserved region that could be
used by the host for dynamic memory hot-plug-like operation. The
reference to memory hot-plug here is fully intentional, we could even
use this feature of Linux to get struct pages for such memory if we
really wanted.

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-16 Thread Tomasz Figa
On Fri, Oct 11, 2019 at 5:54 PM Dmitry Morozov wrote:
>
> Hi Tomasz,
>
> On Mittwoch, 9. Oktober 2019 05:55:45 CEST Tomasz Figa wrote:
> > On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov wrote:
> > > Hi Tomasz,
> > >
> > > On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote:
> > > > Hi Dmitry,
> > > >
> > > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov wrote:
> > > > > Hello,
> > > > >
> > > > > We at OpenSynergy are also working on an abstract paravirtualized
> > > > > video
> > > > > streaming device that operates input and/or output data buffers and
> > > > > can be
> > > > > used as a generic video decoder/encoder/input/output device.
> > > > >
> > > > > We would be glad to share our thoughts and contribute to the
> > > > > discussion.
> > > > > Please see some comments regarding buffer allocation inline.
> > > > >
> > > > > Best regards,
> > > > > Dmitry.
> > > > >
> > > > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > > > > > Hi Gerd,
> > > > > >
> > > > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann wrote:
> > > > > > >   Hi,
> > > > > > >
> > > > > > > > Our prototype implementation uses [4], which allows the
> > > > > > > > virtio-vdec
> > > > > > > > device to use buffers allocated by virtio-gpu device.
> > > > > > > >
> > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > > > >
> > > > > > First of all, thanks for taking a look at this RFC and for valuable
> > > > > > feedback. Sorry for the late reply.
> > > > > >
> > > > > > For reference, Keiichi is working with me and David Stevens on
> > > > > > accelerated video support for virtual machines and integration with
> > > > > > other virtual devices, like virtio-gpu for rendering or our
> > > > > > currently-downstream virtio-wayland for display (I believe there is
> > > > > > ongoing work to solve this problem in upstream too).
> > > > > >
> > > > > > > Well.  I think before even discussing the protocol details we need
> > > > > > > a
> > > > > > > reasonable plan for buffer handling.  I think using virtio-gpu
> > > > > > > buffers
> > > > > > > should be an optional optimization and not a requirement.  Also
> > > > > > > the
> > > > > > > motivation for that should be clear (Let the host decoder write
> > > > > > > directly
> > > > > > > to virtio-gpu resources, to display video without copying around
> > > > > > > the
> > > > > > > decoded framebuffers from one device to another).
> > > > > >
> > > > > > Just to make sure we're on the same page, what would the buffers
> > > > > > come
> > > > > > from if we don't use this optimization?
> > > > > >
> > > > > > I can imagine a setup like this:
> > > > > >
> > > > > >  1) host device allocates host memory appropriate for usage with
> > > > > >  host
> > > > > >
> > > > > > video decoder,
> > > > > >
> > > > > >  2) guest driver allocates arbitrary guest pages for storage
> > > > > >
> > > > > > accessible to the guest software,
> > > > > >
> > > > > >  3) guest userspace writes input for the decoder to guest pages,
> > > > > >  4) guest driver passes the list of pages for the input and output
> > > > > >
> > > > > > buffers to the host device
> > > > > >
> > > > > >  5) host device copies data from input guest pages to host buffer
> > > > > >  6) host device runs the decoding
> > > > > >  7) host device copies decoded frame to output guest pages
> > > > > >  8) guest userspace can access decoded frame from those pages; back
> > > > > >  to 3
> > > > > >
> > > > > > Is that something you have in mind?
> > > > >
> > > > > While GPU side allocations can be useful (especially in case of
> > > > > decoder),
> > > > > it could be more practical to stick to driver side allocations. This
> > > > > is
> > > > > also due to the fact that paravirtualized encoders and cameras do not
> > > > > necessarily require a GPU device.
> > > > >
> > > > > Also, the v4l2 framework already features convenient helpers for CMA
> > > > > and
> > > > > SG
> > > > > allocations. The buffers can be used in the same manner as in
> > > > > virtio-gpu:
> > > > > buffers are first attached to an already allocated buffer/resource
> > > > > descriptor and then are made available for processing by the device
> > > > > using
> > > > > a dedicated command from the driver.
> > > >
> > > > First of all, thanks a lot for your input. This is a relatively new
> > > > area of virtualization and we definitely need to collect various
> > > > possible perspectives in the discussion.
> > > >
> > > > From Chrome OS point of view, there are several aspects for which the
> > > > guest side allocation doesn't really work well:
> > > > 1) host-side hardware has a lot of specific low level allocation
> > > > requirements, like alignments, paddings, address space limitations and
> > > > so on, which is not something that can be (easily) taught to the guest
> > > > OS,
> > >
> > > I couldn't agree more. There are some changes by Greg to add support for
> > > querying GPU buffer met

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-15 Thread Dmitry Morozov
Hello Gerd,

On Dienstag, 15. Oktober 2019 09:54:22 CEST Gerd Hoffmann wrote:
> On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> > On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
> > >   Hi,
> > >   
> > > > My take on this (for a decoder) would be to allocate memory for output
> > > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > > provide those to the device using virtio. The device side then uses
> > > > the
> > > > dmabuf framework to make the buffers accessible for the hardware. I'm
> > > > not
> > > > sure about that, it's just an idea.
> > > 
> > > Virtualization aside, how does the complete video decoding workflow
> > > work?  I assume along the lines of ...
> > > 
> > >   (1) allocate buffer for decoded video frames (from ion).
> > >   (2) export those buffers as dma-buf.
> > >   (3) import dma-buf to video decoder.
> > >   (4) import dma-buf to gpu.
> > > 
> > > ... to establish buffers shared between video decoder and gpu?
> > > 
> > > Then feed the video stream into the decoder, which decodes into the ion
> > > buffers?  Ask the gpu to scanout the ion buffers to show the video?
> > > 
> > > cheers,
> > > 
> > >   Gerd
> > 
> > Yes, exactly.
> > 
> > [decoder]
> > 1) Input buffers are allocated using  VIDIOC_*BUFS.
> 
> Ok.
> 
> > 2) Output buffers are allocated in a guest specific manner (ION, gbm).
> 
> Who decides whether ION or gbm is used?  The phrase "secure ION heap"
> used above sounds like using ION is required for decoding drm-protected
> content.

I mention the secure ION heap to address this Chrome OS related point:
> 3) protected content decoding: the memory for decoded video frames
> must not be accessible to the guest at all

There was an RFC to implement a secure memory allocation framework, but 
apparently it was not accepted: https://lwn.net/Articles/661549/.

In case of Android, it allocates GPU buffers for output frames, so it is the 
gralloc implementation who decides how to allocate memory. It can use some 
dedicated ION heap or can use libgbm. It can also be some proprietary 
implementation.

> 
> So, do we have to worry about ION here?  Or can we just use gbm?

If we replace vendor specific code in the Android guest and provide a way to 
communicate metadata for buffer allocations from the device to the driver, we
can use gbm. In the PC world it might be easier.

> 
> [ Note: don't know much about ion, other than that it is used by
> android, is in staging right now and patches to move it
> out of staging are floating around @ dri-devel ]
> 
> > 3) Both input and output buffers are exported as dma-bufs.
> > 4) The backing storage of both inputs and outputs is made available to the
> > device.
> > 5) Decoder hardware writes to output buffers directly.
> 
> As expected.
> 
> > 6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.
> 
> Ok.  So, virtio-gpu has support for dma-buf exports (in drm-misc-next,
> should land upstream in kernel 5.5).  dma-buf imports are not that
> simple unfortunately.  When using the gbm allocation route dma-buf
> exports are good enough though.
>
> The virtio-gpu resources have both a host buffer and a guest buffer [1].
> Data can be copied using the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST
> ioctls.  The dma-buf export will export the guest buffer (which lives
> in guest ram).
> 
> It would make sense for the decoded video to go directly to the host
> buffer though.  First because we want avoid copying the video frames for
> performance reasons, and second because we might not be able to copy
> video frames (drm ...).
> 
> This is where the buffer registry idea comes in.  Attach a (host)
> identifier to (guest) dma-bufs, which then allows host device emulation
> to share buffers, i.e. virtio-vdec device emulation could decode to a
> dma-buf it got from virtio-gpu device emulation.

Yes. Also, as I mentioned above, in case of gbm the buffers already can 
originate from GPU.

Best regards,
Dmitry.

> 
> Alternatively we could use virtual ION (or whatever it becomes after
> de-staging) for buffer management, with both virtio-vdec and virtio-gpu
> importing dma-bufs from virtual ION on both guest and host side.
> 
> cheers,
>   Gerd
> 
> [1] support for shared buffers is in progress.





Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-15 Thread Gerd Hoffmann
On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> 
> On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > My take on this (for a decoder) would be to allocate memory for output
> > > buffers from a secure ION heap, import it in the v4l2 driver, and then to
> > > provide those to the device using virtio. The device side then uses the
> > > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > > sure about that, it's just an idea.
> > 
> > Virtualization aside, how does the complete video decoding workflow
> > work?  I assume along the lines of ...
> > 
> >   (1) allocate buffer for decoded video frames (from ion).
> >   (2) export those buffers as dma-buf.
> >   (3) import dma-buf to video decoder.
> >   (4) import dma-buf to gpu.
> > 
> > ... to establish buffers shared between video decoder and gpu?
> > 
> > Then feed the video stream into the decoder, which decodes into the ion
> > buffers?  Ask the gpu to scanout the ion buffers to show the video?
> > 
> > cheers,
> >   Gerd
> 
> Yes, exactly.
> 
> [decoder]
> 1) Input buffers are allocated using  VIDIOC_*BUFS.

Ok.

> 2) Output buffers are allocated in a guest specific manner (ION, gbm).

Who decides whether ION or gbm is used?  The phrase "secure ION heap"
used above sounds like using ION is required for decoding drm-protected
content.

So, do we have to worry about ION here?  Or can we just use gbm?

[ Note: don't know much about ion, other than that it is used by
android, is in staging right now and patches to move it
out of staging are floating around @ dri-devel ]

> 3) Both input and output buffers are exported as dma-bufs.
> 4) The backing storage of both inputs and outputs is made available to the 
> device.
> 5) Decoder hardware writes to output buffers directly.

As expected.

> 6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.

Ok.  So, virtio-gpu has support for dma-buf exports (in drm-misc-next,
should land upstream in kernel 5.5).  dma-buf imports are not that
simple unfortunately.  When using the gbm allocation route dma-buf
exports are good enough though.

The virtio-gpu resources have both a host buffer and a guest buffer [1].
Data can be copied using the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST
ioctls.  The dma-buf export will export the guest buffer (which lives
in guest ram).

It would make sense for the decoded video to go directly to the host
buffer though.  First because we want to avoid copying the video frames for
performance reasons, and second because we might not be able to copy
video frames (drm ...).

This is where the buffer registry idea comes in.  Attach a (host)
identifier to (guest) dma-bufs, which then allows host device emulations to
share buffers, i.e. virtio-vdec device emulation could decode to a
dma-buf it got from virtio-gpu device emulation.
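That registry idea can be sketched as a toy model (illustrative Python, not part of any proposed spec; the class and all names below are made up):

```python
import uuid

# Toy model of the cross-device buffer registry: the VMM keeps a table
# mapping an identifier (a UUID here), attached to a guest dma-buf, to the
# corresponding host-side buffer object.

class BufferRegistry:
    def __init__(self):
        self._buffers = {}

    def register(self, host_buffer):
        """Called by e.g. virtio-gpu emulation when the guest exports a dma-buf."""
        buf_id = uuid.uuid4()
        self._buffers[buf_id] = host_buffer
        return buf_id

    def lookup(self, buf_id):
        """Called by e.g. virtio-vdec emulation to resolve the shared buffer."""
        return self._buffers[buf_id]

registry = BufferRegistry()

# virtio-gpu emulation registers its host resource ...
gpu_resource = bytearray(4096)          # stand-in for a host GPU buffer
shared_id = registry.register(gpu_resource)

# ... the guest passes shared_id along to virtio-vdec, whose emulation
# resolves it and can decode directly into the same host buffer (zero-copy).
decode_target = registry.lookup(shared_id)
assert decode_target is gpu_resource
```

The point of the model is only that both device emulations end up holding a reference to the same host buffer, without either guest driver knowing anything about the other device's resource handles.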

Alternatively we could use virtual ION (or whatever it becomes after
de-staging) for buffer management, with both virtio-vdec and virtio-gpu
importing dma-bufs from virtual ION on both guest and host side.

cheers,
  Gerd

[1] support for shared buffers is in progress.


Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-14 Thread Dmitry Morozov


On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
>   Hi,
> 
> > My take on this (for a decoder) would be to allocate memory for output
> > buffers from a secure ION heap, import it in the v4l2 driver, and then to
> > provide those to the device using virtio. The device side then uses the
> > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > sure about that, it's just an idea.
> 
> Virtualization aside, how does the complete video decoding workflow
> work?  I assume along the lines of ...
> 
>   (1) allocate buffer for decoded video frames (from ion).
>   (2) export those buffers as dma-buf.
>   (3) import dma-buf to video decoder.
>   (4) import dma-buf to gpu.
> 
> ... to establish buffers shared between video decoder and gpu?
> 
> Then feed the video stream into the decoder, which decodes into the ion
> buffers?  Ask the gpu to scanout the ion buffers to show the video?
> 
> cheers,
>   Gerd

Yes, exactly.

[decoder]
1) Input buffers are allocated using  VIDIOC_*BUFS.
2) Output buffers are allocated in a guest specific manner (ION, gbm).
3) Both input and output buffers are exported as dma-bufs.
4) The backing storage of both inputs and outputs is made available to the 
device.
5) Decoder hardware writes to output buffers directly.
6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.

Best regards,
Dmitry 




Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-14 Thread Gerd Hoffmann
  Hi,

> My take on this (for a decoder) would be to allocate memory for output
> buffers from a secure ION heap, import it in the v4l2 driver, and then to
> provide those to the device using virtio. The device side then uses the
> dmabuf framework to make the buffers accessible for the hardware. I'm not
> sure about that, it's just an idea.

Virtualization aside, how does the complete video decoding workflow
work?  I assume along the lines of ...

  (1) allocate buffer for decoded video frames (from ion).
  (2) export those buffers as dma-buf.
  (3) import dma-buf to video decoder.
  (4) import dma-buf to gpu.

... to establish buffers shared between video decoder and gpu?

Then feed the video stream into the decoder, which decodes into the ion
buffers?  Ask the gpu to scanout the ion buffers to show the video?

cheers,
  Gerd



Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-14 Thread Gerd Hoffmann
> > Well.  I think before even discussing the protocol details we need a
> > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > should be an optional optimization and not a requirement.  Also the
> > motivation for that should be clear (Let the host decoder write directly
> > to virtio-gpu resources, to display video without copying around the
> > decoded framebuffers from one device to another).
> 
> Just to make sure we're on the same page, what would the buffers come
> from if we don't use this optimization?
> 
> I can imagine a setup like this;
>  1) host device allocates host memory appropriate for usage with host
> video decoder,
>  2) guest driver allocates arbitrary guest pages for storage
> accessible to the guest software,
>  3) guest userspace writes input for the decoder to guest pages,
>  4) guest driver passes the list of pages for the input and output
> buffers to the host device
>  5) host device copies data from input guest pages to host buffer
>  6) host device runs the decoding
>  7) host device copies decoded frame to output guest pages
>  8) guest userspace can access decoded frame from those pages; back to 3
> 
> Is that something you have in mind?

I don't have any specific workflow in mind.

If you want to display the decoded video frames you want to use dma-bufs
shared by the video decoder and the gpu, right?  So the userspace
application (video player probably) would create the buffers using one of
the drivers, export them as dma-buf, then import them into the other
driver.  Just like you would do on physical hardware.  So, when using
virtio-gpu buffers:

  (1) guest app creates buffers using virtio-gpu.
  (2) guest app exports virtio-gpu buffers buffers as dma-buf.
  (3) guest app imports the dma-bufs into virtio-vdec.
  (4) guest app asks the virtio-vdec driver to write the decoded
  frames into the dma-bufs.
  (5) guest app asks the virtio-gpu driver to display the decoded
  frame.

The guest video decoder driver passes the dma-buf pages to the host, and
it is the host driver's job to fill the buffer.  How this is done
exactly might depend on hardware capabilities (whether a host-allocated
bounce buffer is needed or whether the hardware can decode directly to
the dma-buf passed by the guest driver) and is an implementation detail.

Now, with cross-device sharing added the virtio-gpu would attach some
kind of identifier to the dma-buf, virtio-vdec could fetch the
identifier and pass it to the host too, and the host virtio-vdec device
can use the identifier to get a host dma-buf handle for the (virtio-gpu)
buffer.  Ask the host video decoder driver to import the host dma-buf.
If it all worked fine it can ask the host hardware to decode directly to
the host virtio-gpu resource.
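The host-side "implementation detail" mentioned above (direct decode versus bounce buffer) could look roughly like this; a purely illustrative sketch, not real device-emulation code, with all function names made up:

```python
# If a shared host buffer was resolved (e.g. via the buffer registry), the
# decode result goes straight into it; otherwise the device decodes into a
# host bounce buffer and copies the result back to the guest's pages.

PAGE = 4096

def hw_decode(bitstream):
    # Stand-in for the real hardware decoder: identity "decode" for the model.
    return bytes(bitstream)

def write_to_guest_pages(guest_pages, data):
    """Scatter a contiguous decode result back into the guest's page list."""
    for i, page in enumerate(guest_pages):
        chunk = data[i * PAGE:(i + 1) * PAGE]
        page[:len(chunk)] = chunk

def handle_decode(bitstream, guest_pages, shared_host_buffer=None):
    decoded = hw_decode(bitstream)
    if shared_host_buffer is not None:
        # Zero-copy path: result lands in the shared virtio-gpu resource.
        shared_host_buffer[:len(decoded)] = decoded
    else:
        # Fallback path: bounce buffer, then copy into guest pages.
        write_to_guest_pages(guest_pages, decoded)
    return decoded

guest_pages = [bytearray(PAGE) for _ in range(2)]
frame = bytes(range(16)) * 16            # 256-byte "frame"
handle_decode(frame, guest_pages)
assert bytes(guest_pages[0][:256]) == frame
```

Either way the guest driver's view is the same, which is what makes the choice an implementation detail of the host device.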

> > Referencing virtio-gpu buffers needs a better plan than just re-using
> > virtio-gpu resource handles.  The handles are device-specific.  What if
> > there are multiple virtio-gpu devices present in the guest?
> >
> > I think we need a framework for cross-device buffer sharing.  One
> > possible option would be to have some kind of buffer registry, where
> > buffers can be registered for cross-device sharing and get a unique
> > id (a uuid maybe?).  Drivers would typically register buffers on
> > dma-buf export.
> 
> This approach could possibly let us handle this transparently to
> importers, which would work for guest kernel subsystems that rely on
> the ability to handle buffers like native memory (e.g. having a
> sgtable or DMA address) for them.
> 
> How about allocating guest physical addresses for memory corresponding
> to those buffers? On the virtio-gpu example, that could work like
> this:
>  - by default a virtio-gpu buffer has only a resource handle,
>  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> virtio-gpu device export the buffer to a host framework (inside the
> VMM) that would allocate guest page addresses for it, which the
> command would return in a response to the guest,

Hmm, the cross-device buffer sharing framework I have in mind would
basically be a buffer registry.  virtio-gpu would create buffers as
usual, create an identifier somehow (details to be hashed out), and attach
the identifier to the dma-buf so it can be used as outlined above.

Also note that the guest manages the address space, so the host can't
simply allocate guest page addresses.  Mapping host virtio-gpu resources
into guest address space is planned, it'll most likely use a pci memory
bar to reserve some address space.  The host can map resources into that
pci bar, on guest request.
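The "reserve address space with a pci memory bar, map resources into it on guest request" scheme could be modeled like this (a toy bump allocator; the bar base, size, and allocation policy are all assumptions for illustration):

```python
# Toy model: a pci memory bar reserves a fixed guest-physical window, and
# the host maps resources into page-aligned slots of that window on request,
# returning the guest physical address of each mapping.

PAGE = 4096

class HostmapBar:
    def __init__(self, base, size):
        self.base, self.size = base, size
        self.next = 0
        self.mappings = {}   # resource_id -> (bar_offset, length)

    def map_resource(self, resource_id, length):
        length = (length + PAGE - 1) // PAGE * PAGE   # page-align
        if self.next + length > self.size:
            raise MemoryError("bar address space exhausted")
        off = self.next
        self.next += length
        self.mappings[resource_id] = (off, length)
        return self.base + off   # guest physical address of the mapping

bar = HostmapBar(base=0x8000_0000, size=64 * 1024 * 1024)
gpa = bar.map_resource(resource_id=5, length=1920 * 1080 * 4)
assert gpa == 0x8000_0000
```

Note the guest still owns its address space: it only sees addresses inside the window it already assigned to the bar, so the host never has to allocate guest page addresses on its own.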

>  - virtio-gpu driver could then create a regular DMA-buf object for
> such memory, because it's just backed by pages (even though they may
> not be accessible to the guest; just like in the case of TrustZone
> memory protection on bare metal systems),

Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
Documentation/driver-api/pci/p2pdma.rst though.  With that we might be

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-11 Thread Dmitry Morozov
Hi Tomasz,

On Mittwoch, 9. Oktober 2019 05:55:45 CEST Tomasz Figa wrote:
> On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov
> 
>  wrote:
> > Hi Tomasz,
> > 
> > On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote:
> > > Hi Dmitry,
> > > 
> > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> > > 
> > >  wrote:
> > > > Hello,
> > > > 
> > > > We at OpenSynergy are also working on an abstract paravirtualized
> > > > video
> > > > streaming device that operates input and/or output data buffers and
> > > > can be
> > > > used as a generic video decoder/encoder/input/output device.
> > > > 
> > > > We would be glad to share our thoughts and contribute to the
> > > > discussion.
> > > > Please see some comments regarding buffer allocation inline.
> > > > 
> > > > Best regards,
> > > > Dmitry.
> > > > 
> > > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > > > > Hi Gerd,
> > > > > 
> > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann  
wrote:
> > > > > >   Hi,
> > > > > >   
> > > > > > > Our prototype implementation uses [4], which allows the
> > > > > > > virtio-vdec
> > > > > > > device to use buffers allocated by virtio-gpu device.
> > > > > > > 
> > > > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > > > 
> > > > > First of all, thanks for taking a look at this RFC and for valuable
> > > > > feedback. Sorry for the late reply.
> > > > > 
> > > > > For reference, Keiichi is working with me and David Stevens on
> > > > > accelerated video support for virtual machines and integration with
> > > > > other virtual devices, like virtio-gpu for rendering or our
> > > > > currently-downstream virtio-wayland for display (I believe there is
> > > > > ongoing work to solve this problem in upstream too).
> > > > > 
> > > > > > Well.  I think before even discussing the protocol details we need
> > > > > > a
> > > > > > reasonable plan for buffer handling.  I think using virtio-gpu
> > > > > > buffers
> > > > > > should be an optional optimization and not a requirement.  Also
> > > > > > the
> > > > > > motivation for that should be clear (Let the host decoder write
> > > > > > directly
> > > > > > to virtio-gpu resources, to display video without copying around
> > > > > > the
> > > > > > decoded framebuffers from one device to another).
> > > > > 
> > > > > Just to make sure we're on the same page, what would the buffers
> > > > > come
> > > > > from if we don't use this optimization?
> > > > > 
> > > > > I can imagine a setup like this;
> > > > > 
> > > > >  1) host device allocates host memory appropriate for usage with
> > > > >  host
> > > > > 
> > > > > video decoder,
> > > > > 
> > > > >  2) guest driver allocates arbitrary guest pages for storage
> > > > > 
> > > > > accessible to the guest software,
> > > > > 
> > > > >  3) guest userspace writes input for the decoder to guest pages,
> > > > >  4) guest driver passes the list of pages for the input and output
> > > > > 
> > > > > buffers to the host device
> > > > > 
> > > > >  5) host device copies data from input guest pages to host buffer
> > > > >  6) host device runs the decoding
> > > > >  7) host device copies decoded frame to output guest pages
> > > > >  8) guest userspace can access decoded frame from those pages; back
> > > > >  to 3
> > > > > 
> > > > > Is that something you have in mind?
> > > > 
> > > > While GPU side allocations can be useful (especially in case of
> > > > decoder),
> > > > it could be more practical to stick to driver side allocations. This
> > > > is
> > > > also due to the fact that paravirtualized encoders and cameras do not
> > > > necessarily require a GPU device.
> > > > 
> > > > Also, the v4l2 framework already features convenient helpers for CMA
> > > > and
> > > > SG
> > > > allocations. The buffers can be used in the same manner as in
> > > > virtio-gpu:
> > > > buffers are first attached to an already allocated buffer/resource
> > > > descriptor and then are made available for processing by the device
> > > > using
> > > > a dedicated command from the driver.
> > > 
> > > First of all, thanks a lot for your input. This is a relatively new
> > > area of virtualization and we definitely need to collect various
> > > possible perspectives in the discussion.
> > > 
> > > From Chrome OS point of view, there are several aspects for which the
> > > guest side allocation doesn't really work well:
> > > 1) host-side hardware has a lot of specific low level allocation
> > > requirements, like alignments, paddings, address space limitations and
> > > so on, which is not something that can be (easily) taught to the guest
> > > OS,
> > 
> > I couldn't agree more. There are some changes by Greg to add support for
> > querying GPU buffer metadata. Probably those changes could be integrated
> > with 'a framework for cross-device buffer sharing' (something that Greg
> > mentioned earlier in the thread and that would totally make sense).
> 
> Did you mean one of Gerd's proposals?
> 
> I think we need some

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-08 Thread Tomasz Figa
On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov
 wrote:
>
> Hi Tomasz,
>
> On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote:
> > Hi Dmitry,
> >
> > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> >
> >  wrote:
> > > Hello,
> > >
> > > We at OpenSynergy are also working on an abstract paravirtualized video
> > > streaming device that operates input and/or output data buffers and can be
> > > used as a generic video decoder/encoder/input/output device.
> > >
> > > We would be glad to share our thoughts and contribute to the discussion.
> > > Please see some comments regarding buffer allocation inline.
> > >
> > > Best regards,
> > > Dmitry.
> > >
> > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > > > Hi Gerd,
> > > >
> > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann  wrote:
> > > > >   Hi,
> > > > >
> > > > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > > > device to use buffers allocated by virtio-gpu device.
> > > > > >
> > > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > >
> > > > First of all, thanks for taking a look at this RFC and for valuable
> > > > feedback. Sorry for the late reply.
> > > >
> > > > For reference, Keiichi is working with me and David Stevens on
> > > > accelerated video support for virtual machines and integration with
> > > > other virtual devices, like virtio-gpu for rendering or our
> > > > currently-downstream virtio-wayland for display (I believe there is
> > > > ongoing work to solve this problem in upstream too).
> > > >
> > > > > Well.  I think before even discussing the protocol details we need a
> > > > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > > > should be an optional optimization and not a requirement.  Also the
> > > > > motivation for that should be clear (Let the host decoder write
> > > > > directly
> > > > > to virtio-gpu resources, to display video without copying around the
> > > > > decoded framebuffers from one device to another).
> > > >
> > > > Just to make sure we're on the same page, what would the buffers come
> > > > from if we don't use this optimization?
> > > >
> > > > I can imagine a setup like this;
> > > >
> > > >  1) host device allocates host memory appropriate for usage with host
> > > >
> > > > video decoder,
> > > >
> > > >  2) guest driver allocates arbitrary guest pages for storage
> > > >
> > > > accessible to the guest software,
> > > >
> > > >  3) guest userspace writes input for the decoder to guest pages,
> > > >  4) guest driver passes the list of pages for the input and output
> > > >
> > > > buffers to the host device
> > > >
> > > >  5) host device copies data from input guest pages to host buffer
> > > >  6) host device runs the decoding
> > > >  7) host device copies decoded frame to output guest pages
> > > >  8) guest userspace can access decoded frame from those pages; back to 3
> > > >
> > > > Is that something you have in mind?
> > >
> > > While GPU side allocations can be useful (especially in case of decoder),
> > > it could be more practical to stick to driver side allocations. This is
> > > also due to the fact that paravirtualized encoders and cameras do not
> > > necessarily require a GPU device.
> > >
> > > Also, the v4l2 framework already features convenient helpers for CMA and
> > > SG
> > > allocations. The buffers can be used in the same manner as in virtio-gpu:
> > > buffers are first attached to an already allocated buffer/resource
> > > descriptor and then are made available for processing by the device using
> > > a dedicated command from the driver.
> >
> > First of all, thanks a lot for your input. This is a relatively new
> > area of virtualization and we definitely need to collect various
> > possible perspectives in the discussion.
> >
> > From Chrome OS point of view, there are several aspects for which the
> > guest side allocation doesn't really work well:
> > 1) host-side hardware has a lot of specific low level allocation
> > requirements, like alignments, paddings, address space limitations and
> > so on, which is not something that can be (easily) taught to the guest
> > OS,
> I couldn't agree more. There are some changes by Greg to add support for
> querying GPU buffer metadata. Probably those changes could be integrated with
> 'a framework for cross-device buffer sharing' (something that Greg mentioned
> earlier in the thread and that would totally make sense).
>

Did you mean one of Gerd's proposals?

I think we need some clarification there, as it's not clear to me
whether the framework is host-side, guest-side or both. The approach I
suggested would rely on a host-side framework and guest-side wouldn't
need any special handling for sharing, because the memory would behave
as on bare metal.

However allocation would still need some special API to express high
level buffer parameters and delegate the exact allocation requirements
to the host. Currently virtio-gpu already has such interfac

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-07 Thread Dmitry Morozov
Hi Tomasz,

On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote:
> Hi Dmitry,
> 
> On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> 
>  wrote:
> > Hello,
> > 
> > We at OpenSynergy are also working on an abstract paravirtualized video
> > streaming device that operates input and/or output data buffers and can be
> > used as a generic video decoder/encoder/input/output device.
> > 
> > We would be glad to share our thoughts and contribute to the discussion.
> > Please see some comments regarding buffer allocation inline.
> > 
> > Best regards,
> > Dmitry.
> > 
> > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > > Hi Gerd,
> > > 
> > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann  wrote:
> > > >   Hi,
> > > >   
> > > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > > device to use buffers allocated by virtio-gpu device.
> > > > > 
> > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > 
> > > First of all, thanks for taking a look at this RFC and for valuable
> > > feedback. Sorry for the late reply.
> > > 
> > > For reference, Keiichi is working with me and David Stevens on
> > > accelerated video support for virtual machines and integration with
> > > other virtual devices, like virtio-gpu for rendering or our
> > > currently-downstream virtio-wayland for display (I believe there is
> > > ongoing work to solve this problem in upstream too).
> > > 
> > > > Well.  I think before even discussing the protocol details we need a
> > > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > > should be an optional optimization and not a requirement.  Also the
> > > > motivation for that should be clear (Let the host decoder write
> > > > directly
> > > > to virtio-gpu resources, to display video without copying around the
> > > > decoded framebuffers from one device to another).
> > > 
> > > Just to make sure we're on the same page, what would the buffers come
> > > from if we don't use this optimization?
> > > 
> > > I can imagine a setup like this;
> > > 
> > >  1) host device allocates host memory appropriate for usage with host
> > > 
> > > video decoder,
> > > 
> > >  2) guest driver allocates arbitrary guest pages for storage
> > > 
> > > accessible to the guest software,
> > > 
> > >  3) guest userspace writes input for the decoder to guest pages,
> > >  4) guest driver passes the list of pages for the input and output
> > > 
> > > buffers to the host device
> > > 
> > >  5) host device copies data from input guest pages to host buffer
> > >  6) host device runs the decoding
> > >  7) host device copies decoded frame to output guest pages
> > >  8) guest userspace can access decoded frame from those pages; back to 3
> > > 
> > > Is that something you have in mind?
> > 
> > While GPU side allocations can be useful (especially in case of decoder),
> > it could be more practical to stick to driver side allocations. This is
> > also due to the fact that paravirtualized encoders and cameras do not
> > necessarily require a GPU device.
> > 
> > Also, the v4l2 framework already features convenient helpers for CMA and
> > SG
> > allocations. The buffers can be used in the same manner as in virtio-gpu:
> > buffers are first attached to an already allocated buffer/resource
> > descriptor and then are made available for processing by the device using
> > a dedicated command from the driver.
> 
> First of all, thanks a lot for your input. This is a relatively new
> area of virtualization and we definitely need to collect various
> possible perspectives in the discussion.
> 
> From Chrome OS point of view, there are several aspects for which the
> guest side allocation doesn't really work well:
> 1) host-side hardware has a lot of specific low level allocation
> requirements, like alignments, paddings, address space limitations and
> so on, which is not something that can be (easily) taught to the guest
> OS,
I couldn't agree more. There are some changes by Greg to add support for 
querying GPU buffer metadata. Probably those changes could be integrated with 
'a framework for cross-device buffer sharing' (something that Greg mentioned 
earlier in the thread and that would totally make sense).

> 2) allocation system is designed to be centralized, like Android
> gralloc, because there is almost never a case when a buffer is to be
> used only with 1 specific device. 99% of the cases are pipelines like
> decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
> encoder and so on, which means that allocations need to take into
> account multiple hardware constraints.
> 3) protected content decoding: the memory for decoded video frames
> must not be accessible to the guest at all
This looks like a valid use case. Would it also be possible, for instance, to 
allocate memory from a secure ION heap on the guest and then to provide the sgt 
to the device? We don't necessarily need to map that sgt for guest access.

Best regards,
Dmitry.

> 
> Th

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-07 Thread Tomasz Figa
Hi Dmitry,

On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
 wrote:
>
> Hello,
>
> We at OpenSynergy are also working on an abstract paravirtualized video
> streaming device that operates input and/or output data buffers and can be 
> used
> as a generic video decoder/encoder/input/output device.
>
> We would be glad to share our thoughts and contribute to the discussion.
> Please see some comments regarding buffer allocation inline.
>
> Best regards,
> Dmitry.
>
> On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > Hi Gerd,
> >
> > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann  wrote:
> > >   Hi,
> > >
> > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > device to use buffers allocated by virtio-gpu device.
> > > >
> > > > [4] https://lkml.org/lkml/2019/9/12/157
> >
> > First of all, thanks for taking a look at this RFC and for valuable
> > feedback. Sorry for the late reply.
> >
> > For reference, Keiichi is working with me and David Stevens on
> > accelerated video support for virtual machines and integration with
> > other virtual devices, like virtio-gpu for rendering or our
> > currently-downstream virtio-wayland for display (I believe there is
> > ongoing work to solve this problem in upstream too).
> >
> > > Well.  I think before even discussing the protocol details we need a
> > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > should be an optional optimization and not a requirement.  Also the
> > > motivation for that should be clear (Let the host decoder write directly
> > > to virtio-gpu resources, to display video without copying around the
> > > decoded framebuffers from one device to another).
> >
> > Just to make sure we're on the same page, what would the buffers come
> > from if we don't use this optimization?
> >
> > I can imagine a setup like this;
> >  1) host device allocates host memory appropriate for usage with host
> > video decoder,
> >  2) guest driver allocates arbitrary guest pages for storage
> > accessible to the guest software,
> >  3) guest userspace writes input for the decoder to guest pages,
> >  4) guest driver passes the list of pages for the input and output
> > buffers to the host device
> >  5) host device copies data from input guest pages to host buffer
> >  6) host device runs the decoding
> >  7) host device copies decoded frame to output guest pages
> >  8) guest userspace can access decoded frame from those pages; back to 3
> >
> > Is that something you have in mind?
> While GPU side allocations can be useful (especially in case of decoder), it
> could be more practical to stick to driver side allocations. This is also due
> to the fact that paravirtualized encoders and cameras do not necessarily
> require a GPU device.
>
> Also, the v4l2 framework already features convenient helpers for CMA and SG
> allocations. The buffers can be used in the same manner as in virtio-gpu:
> buffers are first attached to an already allocated buffer/resource descriptor 
> and
> then are made available for processing by the device using a dedicated command
> from the driver.

First of all, thanks a lot for your input. This is a relatively new
area of virtualization and we definitely need to collect various
possible perspectives in the discussion.

From Chrome OS point of view, there are several aspects for which the
guest side allocation doesn't really work well:
1) host-side hardware has a lot of specific low level allocation
requirements, like alignments, paddings, address space limitations and
so on, which is not something that can be (easily) taught to the guest
OS,
2) allocation system is designed to be centralized, like Android
gralloc, because there is almost never a case when a buffer is to be
used only with 1 specific device. 99% of the cases are pipelines like
decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
encoder and so on, which means that allocations need to take into
account multiple hardware constraints.
3) protected content decoding: the memory for decoded video frames
must not be accessible to the guest at all

That said, the common desktop Linux model is based on allocating from the
producer device (which is why videobuf2 has allocation capability) and
we definitely need to consider this model, even if we just think about
Linux V4L2 compliance. That's why I'm suggesting the unified memory
handling based on guest physical addresses, which would handle both
guest-allocated and host-allocated memory.
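A minimal sketch of that unification (illustrative only; the types and addresses below are invented to show the shape of the idea, not a proposed API):

```python
# Both guest-allocated pages and host-allocated buffers (exposed through a
# reserved guest-physical region) are described the same way, so an importing
# driver can build an sg-table without caring who allocated the memory.

from dataclasses import dataclass

@dataclass
class GpaRange:
    gpa: int
    length: int
    guest_accessible: bool   # False e.g. for protected-content buffers

def sg_table(ranges):
    """An importer only needs address/length pairs, whatever the origin."""
    return [(r.gpa, r.length) for r in ranges]

# Guest-allocated buffer: ordinary guest RAM, CPU-accessible.
guest_buf = GpaRange(gpa=0x4000_0000, length=0x10000, guest_accessible=True)
# Host-allocated buffer: lives behind a reserved region; the guest CPU may
# have no access (the TrustZone-like protected-content case), but it still
# has a guest physical address.
host_buf = GpaRange(gpa=0x8000_0000, length=0x10000, guest_accessible=False)

# Same representation for both, so the decoder driver imports them alike.
assert sg_table([guest_buf, host_buf]) == [(0x4000_0000, 0x10000),
                                           (0x8000_0000, 0x10000)]
```

This mirrors the bare-metal situation described above: memory the CPU cannot touch is still just physical memory as far as DMA descriptors are concerned.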

Best regards,
Tomasz

> >
> > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > there are multiple virtio-gpu devices present in the guest?
> > >
> > > I think we need a framework for cross-device buffer sharing.  One
> > > possible option would be to have some kind of buffer registry, where
> > > buffers can be registered for cross-device sharing and get a unique
> > > id (

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-07 Thread Dmitry Morozov
Hello,

We at OpenSynergy are also working on an abstract paravirtualized video 
streaming device that operates input and/or output data buffers and can be used 
as a generic video decoder/encoder/input/output device.

We would be glad to share our thoughts and contribute to the discussion. 
Please see some comments regarding buffer allocation inline.

Best regards,
Dmitry.

On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> Hi Gerd,
> 
> On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann  wrote:
> >   Hi,
> >   
> > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > device to use buffers allocated by virtio-gpu device.
> > > 
> > > [4] https://lkml.org/lkml/2019/9/12/157
> 
> First of all, thanks for taking a look at this RFC and for valuable
> feedback. Sorry for the late reply.
> 
> For reference, Keiichi is working with me and David Stevens on
> accelerated video support for virtual machines and integration with
> other virtual devices, like virtio-gpu for rendering or our
> currently-downstream virtio-wayland for display (I believe there is
> ongoing work to solve this problem in upstream too).
> 
> > Well.  I think before even discussing the protocol details we need a
> > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > should be an optional optimization and not a requirement.  Also the
> > motivation for that should be clear (Let the host decoder write directly
> > to virtio-gpu resources, to display video without copying around the
> > decoded framebuffers from one device to another).
> 
> Just to make sure we're on the same page, what would the buffers come
> from if we don't use this optimization?
> 
> I can imagine a setup like this:
>  1) host device allocates host memory appropriate for usage with host
> video decoder,
>  2) guest driver allocates arbitrary guest pages for storage
> accessible to the guest software,
>  3) guest userspace writes input for the decoder to guest pages,
>  4) guest driver passes the list of pages for the input and output
> buffers to the host device
>  5) host device copies data from input guest pages to host buffer
>  6) host device runs the decoding
>  7) host device copies decoded frame to output guest pages
>  8) guest userspace can access decoded frame from those pages; back to 3
> 
> Is that something you have in mind?
While GPU-side allocations can be useful (especially in the case of a 
decoder), it could be more practical to stick to driver-side allocations. 
This is also because paravirtualized encoders and cameras do not 
necessarily require a GPU device.

Also, the v4l2 framework already features convenient helpers for CMA and SG 
allocations. The buffers can be used in the same manner as in virtio-gpu: 
buffers are first attached to an already allocated buffer/resource 
descriptor and then made available for processing by the device using a 
dedicated command from the driver.
> 
> > Referencing virtio-gpu buffers needs a better plan than just re-using
> > virtio-gpu resource handles.  The handles are device-specific.  What if
> > there are multiple virtio-gpu devices present in the guest?
> > 
> > I think we need a framework for cross-device buffer sharing.  One
> > possible option would be to have some kind of buffer registry, where
> > buffers can be registered for cross-device sharing and get a unique
> > id (a uuid maybe?).  Drivers would typically register buffers on
> > dma-buf export.
> 
> This approach could possibly let us handle this transparently to
> importers, which would work for guest kernel subsystems that rely on
> the ability to handle buffers like native memory (e.g. having a
> sgtable or DMA address) for them.
> 
> How about allocating guest physical addresses for memory corresponding
> to those buffers? On the virtio-gpu example, that could work like
> this:
>  - by default a virtio-gpu buffer has only a resource handle,
>  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> virtio-gpu device export the buffer to a host framework (inside the
> VMM) that would allocate guest page addresses for it, which the
> command would return in a response to the guest,
>  - virtio-gpu driver could then create a regular DMA-buf object for
> such memory, because it's just backed by pages (even though they may
> not be accessible to the guest; just like in the case of TrustZone
> memory protection on bare metal systems),
>  - any consumer would be able to handle such buffer like a regular
> guest memory, passing low-level scatter-gather tables to the host as
> buffer descriptors - this would nicely integrate with the basic case
> without buffer sharing, as described above.
> 
> Another interesting side effect of the above approach would be the
> ease of integration with virtio-iommu. If the virtio master device is
> put behind a virtio-iommu, the guest page addresses become the input
> to iommu page tables and IOVA addresses go to the host via the virtio
> master

Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-10-04 Thread Tomasz Figa
Hi Gerd,

On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > Our prototype implementation uses [4], which allows the virtio-vdec
> > device to use buffers allocated by virtio-gpu device.
>
> > [4] https://lkml.org/lkml/2019/9/12/157

First of all, thanks for taking a look at this RFC and for valuable
feedback. Sorry for the late reply.

For reference, Keiichi is working with me and David Stevens on
accelerated video support for virtual machines and integration with
other virtual devices, like virtio-gpu for rendering or our
currently-downstream virtio-wayland for display (I believe there is
ongoing work to solve this problem in upstream too).

>
> Well.  I think before even discussing the protocol details we need a
> reasonable plan for buffer handling.  I think using virtio-gpu buffers
> should be an optional optimization and not a requirement.  Also the
> motivation for that should be clear (Let the host decoder write directly
> to virtio-gpu resources, to display video without copying around the
> decoded framebuffers from one device to another).

Just to make sure we're on the same page, what would the buffers come
from if we don't use this optimization?

I can imagine a setup like this:
 1) host device allocates host memory appropriate for usage with host
video decoder,
 2) guest driver allocates arbitrary guest pages for storage
accessible to the guest software,
 3) guest userspace writes input for the decoder to guest pages,
 4) guest driver passes the list of pages for the input and output
buffers to the host device
 5) host device copies data from input guest pages to host buffer
 6) host device runs the decoding
 7) host device copies decoded frame to output guest pages
 8) guest userspace can access decoded frame from those pages; back to 3

Is that something you have in mind?
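For illustration only, the copy-based flow in steps 3-8 can be modeled in a
few lines of C. All names here are hypothetical stand-ins, and the "decode"
step is a trivial byte transform rather than a real codec:

```c
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical model of the copy-based path described above. */
static unsigned char guest_input[PAGE_SIZE];  /* steps 2-3: guest pages holding input */
static unsigned char host_buffer[PAGE_SIZE];  /* step 1: host-allocated decoder memory */
static unsigned char guest_output[PAGE_SIZE]; /* steps 7-8: guest pages holding output */

/* Step 6 stand-in: a trivial byte transform instead of real decoding. */
static void host_decode(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0xff;
}

void decode_one_frame(void)
{
    memcpy(host_buffer, guest_input, PAGE_SIZE);  /* step 5: copy input in  */
    host_decode(host_buffer, PAGE_SIZE);          /* step 6: decode         */
    memcpy(guest_output, host_buffer, PAGE_SIZE); /* step 7: copy output out */
}
```

The two memcpy() calls are exactly the per-frame overhead that the
virtio-gpu buffer-sharing optimization discussed in this thread would avoid.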

>
> Referencing virtio-gpu buffers needs a better plan than just re-using
> virtio-gpu resource handles.  The handles are device-specific.  What if
> there are multiple virtio-gpu devices present in the guest?
>
> I think we need a framework for cross-device buffer sharing.  One
> possible option would be to have some kind of buffer registry, where
> buffers can be registered for cross-device sharing and get a unique
> id (a uuid maybe?).  Drivers would typically register buffers on
> dma-buf export.

This approach could possibly let us handle this transparently to
importers, which would work for guest kernel subsystems that rely on
the ability to handle buffers like native memory (e.g. having a
sgtable or DMA address) for them.

How about allocating guest physical addresses for memory corresponding
to those buffers? On the virtio-gpu example, that could work like
this:
 - by default a virtio-gpu buffer has only a resource handle,
 - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
virtio-gpu device export the buffer to a host framework (inside the
VMM) that would allocate guest page addresses for it, which the
command would return in a response to the guest,
 - virtio-gpu driver could then create a regular DMA-buf object for
such memory, because it's just backed by pages (even though they may
not be accessible to the guest; just like in the case of TrustZone
memory protection on bare metal systems),
 - any consumer would be able to handle such buffer like a regular
guest memory, passing low-level scatter-gather tables to the host as
buffer descriptors - this would nicely integrate with the basic case
without buffer sharing, as described above.
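As a sketch of what such a VIRTIO_GPU_RESOURCE_EXPORT exchange might look
like on the wire (these structures and field names are invented for
illustration; no such command exists in the virtio-gpu specification as of
this thread):

```c
#include <stdint.h>

/* All fields little-endian on the wire, per virtio convention. */
typedef uint32_t le32;
typedef uint64_t le64;

/* Hypothetical request: export an existing virtio-gpu resource so that
 * the VMM assigns it a guest physical address range. */
struct virtio_gpu_resource_export {
    le32 resource_id; /* existing virtio-gpu resource handle */
    le32 padding;
};

/* Hypothetical response carrying the assigned guest address range,
 * which the guest driver could then wrap in a regular DMA-buf. */
struct virtio_gpu_resource_export_resp {
    le64 guest_addr;  /* guest physical address chosen by the VMM */
    le64 size;        /* length of the exported range, in bytes */
};
```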

Another interesting side effect of the above approach would be the
ease of integration with virtio-iommu. If the virtio master device is
put behind a virtio-iommu, the guest page addresses become the input
to iommu page tables and IOVA addresses go to the host via the virtio
master device protocol, inside the low-level scatter-gather tables.

What do you think?

Best regards,
Tomasz

>
> Another option would be to pass around both buffer handle and buffer
> owner, i.e. instead of "u32 handle" have something like this:
>
> struct buffer_reference {
>     enum device_type type;    /* pci, virtio-mmio, ... */
>     union {
>         struct pci_address pci_addr;
>         u64 virtio_mmio_addr;
>         /* ... */
>     } device_address;
>     u64 device_buffer_handle; /* device-specific, virtio-gpu could use
>                                  resource ids here */
> };
>
> cheers,
>   Gerd
>


Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-09-23 Thread Gerd Hoffmann
  Hi,

> Our prototype implementation uses [4], which allows the virtio-vdec
> device to use buffers allocated by virtio-gpu device.

> [4] https://lkml.org/lkml/2019/9/12/157

Well.  I think before even discussing the protocol details we need a
reasonable plan for buffer handling.  I think using virtio-gpu buffers
should be an optional optimization and not a requirement.  Also the
motivation for that should be clear (Let the host decoder write directly
to virtio-gpu resources, to display video without copying around the
decoded framebuffers from one device to another).

Referencing virtio-gpu buffers needs a better plan than just re-using
virtio-gpu resource handles.  The handles are device-specific.  What if
there are multiple virtio-gpu devices present in the guest?

I think we need a framework for cross-device buffer sharing.  One
possible option would be to have some kind of buffer registry, where
buffers can be registered for cross-device sharing and get a unique
id (a uuid maybe?).  Drivers would typically register buffers on
dma-buf export.
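A minimal user-space sketch of that registry idea (all names are
hypothetical; a real implementation would live in the VMM or a host daemon
and hand out proper UUIDs, e.g. per RFC 4122):

```c
#include <stdint.h>
#include <string.h>

#define MAX_SHARED_BUFFERS 64

/* One registry entry: a device-local handle plus its cross-device id. */
struct shared_buffer {
    uint8_t  uuid[16];       /* unique id handed out on registration */
    uint64_t device_handle;  /* device-local handle, e.g. a resource id */
    int      in_use;
};

static struct shared_buffer registry[MAX_SHARED_BUFFERS];
static uint64_t next_serial;

/* Register a device-local buffer; returns 0 on success, -1 if full.
 * Toy uuid: a serial number in the first 8 bytes, zeros elsewhere. */
int register_buffer(uint64_t device_handle, uint8_t uuid_out[16])
{
    for (int i = 0; i < MAX_SHARED_BUFFERS; i++) {
        if (!registry[i].in_use) {
            registry[i].in_use = 1;
            registry[i].device_handle = device_handle;
            memset(registry[i].uuid, 0, 16);
            uint64_t s = ++next_serial;
            memcpy(registry[i].uuid, &s, sizeof(s));
            memcpy(uuid_out, registry[i].uuid, 16);
            return 0;
        }
    }
    return -1;
}

/* Resolve a uuid back to a device handle; returns 0 on success. */
int lookup_buffer(const uint8_t uuid[16], uint64_t *device_handle)
{
    for (int i = 0; i < MAX_SHARED_BUFFERS; i++) {
        if (registry[i].in_use && !memcmp(registry[i].uuid, uuid, 16)) {
            *device_handle = registry[i].device_handle;
            return 0;
        }
    }
    return -1;
}
```

In this model a driver would call register_buffer() at dma-buf export time
and pass the returned id to other virtio devices, which resolve it via
lookup_buffer().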

Another option would be to pass around both buffer handle and buffer
owner, i.e. instead of "u32 handle" have something like this:

struct buffer_reference {
    enum device_type type;    /* pci, virtio-mmio, ... */
    union {
        struct pci_address pci_addr;
        u64 virtio_mmio_addr;
        /* ... */
    } device_address;
    u64 device_buffer_handle; /* device-specific, virtio-gpu could use
                                 resource ids here */
};

cheers,
  Gerd



Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-09-19 Thread Keiichi Watanabe
I shared PDF version of this RFC on Google drive:
https://drive.google.com/drive/folders/1hed-mTVI7dj0M_iab4DTfx5kPMLoeX8R

On Thu, Sep 19, 2019 at 8:15 PM Keiichi Watanabe  wrote:
>
> Hi Hans,
> Thank you for your feedback.
>
> On Thu, Sep 19, 2019 at 6:53 PM Hans Verkuil  wrote:
> >
> > Hi Keiichi,
> >
> > On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> > > [Resending because of some issues with sending to virtio-dev. Sorry for 
> > > the noise.]
> > >
> > > This patch proposes virtio specification for new virtio video decode
> > > device.
> > > This device provides the functionality of hardware accelerated video
> > > decoding from encoded video contents provided by the guest into frame
> > > buffers accessible by the guest.
> > >
> > > We have a prototype implementation for VMs on Chrome OS:
> > > * virtio-vdec device in crosvm [1]
> > > * virtio-vdec driver in Linux kernel v4.19 [2]
> > >   - This driver follows V4L2 stateful video decoder API [3].
> > >
> > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > device to use buffers allocated by virtio-gpu device.
> > >
> > > Any feedback would be greatly appreciated. Thank you.
> >
> > I'm not a virtio expert, but as I understand it the virtio-vdec driver
> > looks like a regular v4l2 stateful decoder device to the guest, while
> > on the host there is a driver (or something like that) that maps the
> > virtio-vdec requests to the actual decoder hardware, right?
> >
> > What concerns me a bit (but there may be good reasons for this) is that
> > this virtio driver is so specific for stateful decoders.
> >
>
> We aim to design a platform-independent interface. The virtio-vdec protocol
> should eventually be workable regardless of API, OS, and platform.
>
> Our prototype virtio-vdec device translates the virtio-vdec protocol to
> Chrome's video decode acceleration API instead of talking to hardware
> decoders directly. This Chrome API is an abstraction layer over multiple
> decoder APIs on Linux, such as V4L2 stateful, V4L2 slice, and Intel's VAAPI.
>
> That is to say, the guest driver translates the V4L2 stateful API to
> virtio-vdec and the host device translates virtio-vdec to Chrome's API.
> So I could say that this is already more general than a mere V4L2 stateful
> API wrapper, at least.
>
> I'd appreciate it if you could let me know if any parts are still specific
> to V4L2.
>
>
> > How does this scale to stateful encoders? Stateless codecs? Other M2M
> > devices like deinterlacers or colorspace converters? What about webcams?
> >
>
> We're designing virtio protocol for encoder as well, but we are at an early
> stage. So, I'm not sure if we should/can handle decoder and encoder in one
> protocol. I don't have any plans for other media devices.
>
>
> > In other words, I would like to see a bigger picture here.
> >
> > Note that there is also an effort for Xen to expose webcams to a guest:
> >
> > https://www.spinics.net/lists/linux-media/msg148629.html
> >
>
> Good to know. Thanks.
>
> > This may or may not be of interest. This was an RFC only, and I haven't
> > seen any follow-up patches with actual code.
> >
> > There will be a half-day meeting of media developers during the ELCE
> > in October about codecs. I know Alexandre and Tomasz will be there.
> > It might be a good idea to discuss this in more detail if needed.
> >
>
> Sounds good. They are closely working with me.
>
> Regards,
> Keiichi
>
> > Regards,
> >
> > Hans
> >
> > >
> > > [1] 
> > > https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
> > > [2] 
> > > https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
> > > [3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html 
> > > (to be merged to Linux 5.4)
> > > [4] https://lkml.org/lkml/2019/9/12/157
> > >
> > > Signed-off-by: Keiichi Watanabe 
> > > ---
> > >  content.tex |   1 +
> > >  virtio-vdec.tex | 750 
> > >  2 files changed, 751 insertions(+)
> > >  create mode 100644 virtio-vdec.tex
> > >
> > > diff --git a/content.tex b/content.tex
> > > index 37a2190..b57d4a9 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing 
> > > Requirements}\label{sec:Device
> > >  \input{virtio-input.tex}
> > >  \input{virtio-crypto.tex}
> > >  \input{virtio-vsock.tex}
> > > +\input{virtio-vdec.tex}
> > >
> > >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >
> > > diff --git a/virtio-vdec.tex b/virtio-vdec.tex
> > > new file mode 100644
> > > index 000..d117129
> > > --- /dev/null
> > > +++ b/virtio-vdec.tex
> > > @@ -0,0 +1,750 @@
> > > +\section{Video Decode Device}
> > > +\label{sec:Device Types / Video Decode Device}
> > > +
> > > +virtio-vdec is a virtio based video decoder. This device provides the
> > > +functionality of hardware acc

Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-09-19 Thread Keiichi Watanabe
Hi Hans,
Thank you for your feedback.

On Thu, Sep 19, 2019 at 6:53 PM Hans Verkuil  wrote:
>
> Hi Keiichi,
>
> On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> > [Resending because of some issues with sending to virtio-dev. Sorry for the 
> > noise.]
> >
> > This patch proposes virtio specification for new virtio video decode
> > device.
> > This device provides the functionality of hardware accelerated video
> > decoding from encoded video contents provided by the guest into frame
> > buffers accessible by the guest.
> >
> > We have a prototype implementation for VMs on Chrome OS:
> > * virtio-vdec device in crosvm [1]
> > * virtio-vdec driver in Linux kernel v4.19 [2]
> >   - This driver follows V4L2 stateful video decoder API [3].
> >
> > Our prototype implementation uses [4], which allows the virtio-vdec
> > device to use buffers allocated by virtio-gpu device.
> >
> > Any feedback would be greatly appreciated. Thank you.
>
> I'm not a virtio expert, but as I understand it the virtio-vdec driver
> looks like a regular v4l2 stateful decoder device to the guest, while
> on the host there is a driver (or something like that) that maps the
> virtio-vdec requests to the actual decoder hardware, right?
>
> What concerns me a bit (but there may be good reasons for this) is that
> this virtio driver is so specific for stateful decoders.
>

We aim to design a platform-independent interface. The virtio-vdec protocol
should eventually be workable regardless of API, OS, and platform.

Our prototype virtio-vdec device translates the virtio-vdec protocol to
Chrome's video decode acceleration API instead of talking to hardware
decoders directly. This Chrome API is an abstraction layer over multiple
decoder APIs on Linux, such as V4L2 stateful, V4L2 slice, and Intel's VAAPI.

That is to say, the guest driver translates the V4L2 stateful API to
virtio-vdec and the host device translates virtio-vdec to Chrome's API.
So I could say that this is already more general than a mere V4L2 stateful
API wrapper, at least.

I'd appreciate it if you could let me know if any parts are still specific
to V4L2.


> How does this scale to stateful encoders? Stateless codecs? Other M2M
> devices like deinterlacers or colorspace converters? What about webcams?
>

We're designing virtio protocol for encoder as well, but we are at an early
stage. So, I'm not sure if we should/can handle decoder and encoder in one
protocol. I don't have any plans for other media devices.


> In other words, I would like to see a bigger picture here.
>
> Note that there is also an effort for Xen to expose webcams to a guest:
>
> https://www.spinics.net/lists/linux-media/msg148629.html
>

Good to know. Thanks.

> This may or may not be of interest. This was an RFC only, and I haven't
> seen any follow-up patches with actual code.
>
> There will be a half-day meeting of media developers during the ELCE
> in October about codecs. I know Alexandre and Tomasz will be there.
> It might be a good idea to discuss this in more detail if needed.
>

Sounds good. They are closely working with me.

Regards,
Keiichi

> Regards,
>
> Hans
>
> >
> > [1] 
> > https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
> > [2] 
> > https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
> > [3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to 
> > be merged to Linux 5.4)
> > [4] https://lkml.org/lkml/2019/9/12/157
> >
> > Signed-off-by: Keiichi Watanabe 
> > ---
> >  content.tex |   1 +
> >  virtio-vdec.tex | 750 
> >  2 files changed, 751 insertions(+)
> >  create mode 100644 virtio-vdec.tex
> >
> > diff --git a/content.tex b/content.tex
> > index 37a2190..b57d4a9 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing 
> > Requirements}\label{sec:Device
> >  \input{virtio-input.tex}
> >  \input{virtio-crypto.tex}
> >  \input{virtio-vsock.tex}
> > +\input{virtio-vdec.tex}
> >
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >
> > diff --git a/virtio-vdec.tex b/virtio-vdec.tex
> > new file mode 100644
> > index 000..d117129
> > --- /dev/null
> > +++ b/virtio-vdec.tex
> > @@ -0,0 +1,750 @@
> > +\section{Video Decode Device}
> > +\label{sec:Device Types / Video Decode Device}
> > +
> > +virtio-vdec is a virtio based video decoder. This device provides the
> > +functionality of hardware accelerated video decoding from encoded
> > +video contents provided by the guest into frame buffers accessible by
> > +the guest.
> > +
> > +\subsection{Device ID}
> > +\label{sec:Device Types / Video Decode Device / Device ID}
> > +
> > +28
> > +
> > +\subsection{Virtqueues}
> > +\label{sec:Device Types / Video Decode Device / Virtqueues}
> > +
> > +\begin{description}
> > +\item[0] outq - queue for sending requests from t

Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-09-19 Thread Hans Verkuil
Hi Keiichi,

On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> [Resending because of some issues with sending to virtio-dev. Sorry for the 
> noise.]
> 
> This patch proposes virtio specification for new virtio video decode
> device.
> This device provides the functionality of hardware accelerated video
> decoding from encoded video contents provided by the guest into frame
> buffers accessible by the guest.
> 
> We have a prototype implementation for VMs on Chrome OS:
> * virtio-vdec device in crosvm [1]
> * virtio-vdec driver in Linux kernel v4.19 [2]
>   - This driver follows V4L2 stateful video decoder API [3].
> 
> Our prototype implementation uses [4], which allows the virtio-vdec
> device to use buffers allocated by virtio-gpu device.
> 
> Any feedback would be greatly appreciated. Thank you.

I'm not a virtio expert, but as I understand it the virtio-vdec driver
looks like a regular v4l2 stateful decoder device to the guest, while
on the host there is a driver (or something like that) that maps the
virtio-vdec requests to the actual decoder hardware, right?

What concerns me a bit (but there may be good reasons for this) is that
this virtio driver is so specific for stateful decoders.

How does this scale to stateful encoders? Stateless codecs? Other M2M
devices like deinterlacers or colorspace converters? What about webcams?

In other words, I would like to see a bigger picture here.

Note that there is also an effort for Xen to expose webcams to a guest:

https://www.spinics.net/lists/linux-media/msg148629.html

This may or may not be of interest. This was an RFC only, and I haven't
seen any follow-up patches with actual code.

There will be a half-day meeting of media developers during the ELCE
in October about codecs. I know Alexandre and Tomasz will be there.
It might be a good idea to discuss this in more detail if needed.

Regards,

Hans

> 
> [1] 
> https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
> [2] 
> https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
> [3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to 
> be merged to Linux 5.4)
> [4] https://lkml.org/lkml/2019/9/12/157
> 
> Signed-off-by: Keiichi Watanabe 
> ---
>  content.tex |   1 +
>  virtio-vdec.tex | 750 
>  2 files changed, 751 insertions(+)
>  create mode 100644 virtio-vdec.tex
> 
> diff --git a/content.tex b/content.tex
> index 37a2190..b57d4a9 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing 
> Requirements}\label{sec:Device
>  \input{virtio-input.tex}
>  \input{virtio-crypto.tex}
>  \input{virtio-vsock.tex}
> +\input{virtio-vdec.tex}
> 
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> 
> diff --git a/virtio-vdec.tex b/virtio-vdec.tex
> new file mode 100644
> index 000..d117129
> --- /dev/null
> +++ b/virtio-vdec.tex
> @@ -0,0 +1,750 @@
> +\section{Video Decode Device}
> +\label{sec:Device Types / Video Decode Device}
> +
> +virtio-vdec is a virtio based video decoder. This device provides the
> +functionality of hardware accelerated video decoding from encoded
> +video contents provided by the guest into frame buffers accessible by
> +the guest.
> +
> +\subsection{Device ID}
> +\label{sec:Device Types / Video Decode Device / Device ID}
> +
> +28
> +
> +\subsection{Virtqueues}
> +\label{sec:Device Types / Video Decode Device / Virtqueues}
> +
> +\begin{description}
> +\item[0] outq - queue for sending requests from the driver to the
> +  device
> +\item[1] inq - queue for sending requests from the device to the
> +  driver
> +\end{description}
> +
> +Each queue is used uni-directionally. outq is used to send requests
> +from the driver to the device (i.e., guest requests) and inq is used
> +to send requests in the other direction (i.e., host requests).
> +
> +\subsection{Feature bits}
> +\label{sec:Device Types / Video Decode Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}
> +\label{sec:Device Types / Video Decode Device / Device configuration layout}
> +
> +None.
> +
> +\subsection{Device Requirements: Device Initialization}
> +\label{sec:Device Types / Video Decode Device / Device Requirements: Device 
> Initialization}
> +
> +The virtqueues are initialized.
> +
> +\subsection{Device Operation}
> +\label{sec:Device Types / Video Decode Device / Device Operation}
> +
> +\subsubsection{Video Buffers}
> +\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
> +
> +A virtio-vdec driver and a device use two types of video buffers:
> +\emph{bitstream buffer} and \emph{frame buffer}. A bitstream buffer
> +contains encoded video stream data. This buffer is similar to an
> +OUTPUT buffer for Video for Linux Two (V4L2) API. A frame buff

[PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

2019-09-19 Thread Keiichi Watanabe
[Resending because of some issues with sending to virtio-dev. Sorry for the 
noise.]

This patch proposes virtio specification for new virtio video decode
device.
This device provides the functionality of hardware accelerated video
decoding from encoded video contents provided by the guest into frame
buffers accessible by the guest.

We have a prototype implementation for VMs on Chrome OS:
* virtio-vdec device in crosvm [1]
* virtio-vdec driver in Linux kernel v4.19 [2]
  - This driver follows V4L2 stateful video decoder API [3].

Our prototype implementation uses [4], which allows the virtio-vdec
device to use buffers allocated by virtio-gpu device.

Any feedback would be greatly appreciated. Thank you.

[1] 
https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
[2] 
https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
[3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to be 
merged to Linux 5.4)
[4] https://lkml.org/lkml/2019/9/12/157

Signed-off-by: Keiichi Watanabe 
---
 content.tex |   1 +
 virtio-vdec.tex | 750 
 2 files changed, 751 insertions(+)
 create mode 100644 virtio-vdec.tex

diff --git a/content.tex b/content.tex
index 37a2190..b57d4a9 100644
--- a/content.tex
+++ b/content.tex
@@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing 
Requirements}\label{sec:Device
 \input{virtio-input.tex}
 \input{virtio-crypto.tex}
 \input{virtio-vsock.tex}
+\input{virtio-vdec.tex}

 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}

diff --git a/virtio-vdec.tex b/virtio-vdec.tex
new file mode 100644
index 000..d117129
--- /dev/null
+++ b/virtio-vdec.tex
@@ -0,0 +1,750 @@
+\section{Video Decode Device}
+\label{sec:Device Types / Video Decode Device}
+
+virtio-vdec is a virtio based video decoder. This device provides the
+functionality of hardware accelerated video decoding from encoded
+video contents provided by the guest into frame buffers accessible by
+the guest.
+
+\subsection{Device ID}
+\label{sec:Device Types / Video Decode Device / Device ID}
+
+28
+
+\subsection{Virtqueues}
+\label{sec:Device Types / Video Decode Device / Virtqueues}
+
+\begin{description}
+\item[0] outq - queue for sending requests from the driver to the
+  device
+\item[1] inq - queue for sending requests from the device to the
+  driver
+\end{description}
+
+Each queue is used uni-directionally. outq is used to send requests
+from the driver to the device (i.e., guest requests) and inq is used
+to send requests in the other direction (i.e., host requests).
+
+\subsection{Feature bits}
+\label{sec:Device Types / Video Decode Device / Feature bits}
+
+There are currently no feature bits defined for this device.
+
+\subsection{Device configuration layout}
+\label{sec:Device Types / Video Decode Device / Device configuration layout}
+
+None.
+
+\subsection{Device Requirements: Device Initialization}
+\label{sec:Device Types / Video Decode Device / Device Requirements: Device 
Initialization}
+
+The virtqueues are initialized.
+
+\subsection{Device Operation}
+\label{sec:Device Types / Video Decode Device / Device Operation}
+
+\subsubsection{Video Buffers}
+\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
+
+A virtio-vdec driver and a device use two types of video buffers:
+\emph{bitstream buffer} and \emph{frame buffer}. A bitstream buffer
+contains encoded video stream data. This buffer is similar to an
+OUTPUT buffer for Video for Linux Two (V4L2) API. A frame buffer
+contains decoded video frame data, like a CAPTURE buffer for the V4L2 API.
+The driver and the device share these buffers, and each buffer is
+identified by a unique integer called a \emph{resource handle}.
+
+\subsubsection{Guest Request}
+
+The driver queues requests to the outq virtqueue. The device MAY
+process requests out-of-order. All requests on outq use the following
+structure:
+
+\begin{lstlisting}
+enum virtio_vdec_guest_req_type {
+VIRTIO_VDEC_GUEST_REQ_UNDEFINED = 0,
+
+/* Global */
+VIRTIO_VDEC_GUEST_REQ_QUERY = 0x0100,
+
+/* Per instance */
+VIRTIO_VDEC_GUEST_REQ_OPEN = 0x0200,
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT,
+VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER,
+VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO,
+VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
+VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER,
+VIRTIO_VDEC_GUEST_REQ_DRAIN,
+VIRTIO_VDEC_GUEST_REQ_FLUSH,
+VIRTIO_VDEC_GUEST_REQ_CLOSE,
+};
+
+struct virtio_vdec_guest_req {
+le32 type;
+le32 instance_id;
+union {
+struct virtio_vdec_guest_req_open open;
+struct virtio_vdec_guest_req_set_buffer_count set_buffer_count;
+struct virtio_vdec_guest_req_register_buffer register_buffer;
+struct virtio_vdec_guest