Re: [Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

2016-12-31 Thread Ben Widawsky

On 16-12-29 17:34:19, Ben Widawsky wrote:

On 16-12-06 13:34:02, Paulo Zanoni wrote:

2016-12-01 20:09 GMT-02:00 Ben Widawsky :

From: Ben Widawsky 

This patch series ultimately adds support within the i965 driver for
Renderbuffer Decompression with GBM. In short, this feature reduces memory
bandwidth by allowing the GPU to work with losslessly compressed data and having
that compression scheme understood by the display engine for decompression. The
display engine will decompress on the fly and scan out the image.

Quoting from the final patch, the bandwidth savings on a SKL GT4 with a 19x10
display running kmscube:

Without compression:
   Read bandwidth: 603.91 MiB/s
   Write bandwidth: 615.28 MiB/s

With compression:
   Read bandwidth: 259.34 MiB/s
   Write bandwidth: 337.83 MiB/s
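The relative savings follow directly from the numbers above. Here is a small C sketch of that arithmetic. The helper names are illustrative and do not come from the series.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage reduction going from `before` to `after` (same units, e.g. MiB/s). */
static double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Print the savings implied by the SKL GT4 kmscube figures quoted above. */
static void print_kmscube_savings(void)
{
	const double rd_off = 603.91, wr_off = 615.28; /* without compression */
	const double rd_on = 259.34, wr_on = 337.83;   /* with compression */

	printf("read:  %.1f%% less\n", pct_reduction(rd_off, rd_on));
	printf("write: %.1f%% less\n", pct_reduction(wr_off, wr_on));
	printf("total: %.1f%% less\n",
	       pct_reduction(rd_off + wr_off, rd_on + wr_on));
}
```

print_kmscube_savings() prints 57.1% less read traffic, 45.1% less write traffic and 51.0% less combined traffic.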


The hardware achieves these savings by maintaining an auxiliary buffer
containing "opaque" compression information. It's opaque in the sense that the
details of the low-level compression scheme are not needed, but knowledge of the overall
layout of the compressed data is required. The auxiliary buffer is created by
the driver on behalf of the client when requested. That buffer needs to be
passed along wherever the main image's buffer goes.

The overall strategy is that the buffer/surface is created with a list of
modifiers. The list of modifiers the hardware is capable of using will come from
a new kernel API that is aware of the hardware and general constraints. A client
will request the list of modifiers and pass it directly back in during buffer
creation (potentially the client can prune the list, but as of now there is no
reason to.) This new API is being developed by Kristian. I did not get far
enough to play with that.

For EGL, a similar mechanism would exist whereby when importing a buffer into
EGL, one would provide a modifier and probably a pointer to the auxiliary data
upon import (import might therefore require multiple dma-buf fds), but for i965
and Intel, this wouldn't be necessary.

Here is a brief description of the series:
1-6 Adds support in GBM for per plane functions where necessary. This is
required because the kernel expects the auxiliary buffer to be passed along as a
plane. It has its own offset and stride, and the client shouldn't need to
calculate those.
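Here is why per-plane getters help. When the auxiliary data is a second plane of the same buffer, the client only has to ask for each plane's offset and stride and does not need to derive them. The sketch below is loosely modelled on that idea. `sketch_bo` and the `sketch_bo_get_*` functions are hypothetical stand-ins, not the GBM entry points the series exports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PLANES 4 /* KMS framebuffers carry at most four planes */

/* Hypothetical stand-in for a buffer object holding several planes. */
struct sketch_plane {
	uint32_t offset; /* byte offset of the plane inside the BO */
	uint32_t stride; /* bytes per row of the plane */
};

struct sketch_bo {
	int plane_count;
	struct sketch_plane planes[SKETCH_MAX_PLANES];
};

static int sketch_bo_get_plane_count(const struct sketch_bo *bo)
{
	assert(bo != NULL);
	return bo->plane_count;
}

/* Out-of-range plane indices return 0 rather than reading past the array. */
static uint32_t sketch_bo_get_stride_for_plane(const struct sketch_bo *bo, int plane)
{
	assert(bo != NULL);
	if (plane < 0 || plane >= bo->plane_count || plane >= SKETCH_MAX_PLANES)
		return 0;
	return bo->planes[plane].stride;
}

static uint32_t sketch_bo_get_offset(const struct sketch_bo *bo, int plane)
{
	assert(bo != NULL);
	if (plane < 0 || plane >= bo->plane_count || plane >= SKETCH_MAX_PLANES)
		return 0;
	return bo->planes[plane].offset;
}
```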

7-9 Adds support in GBM to understand modifiers. When creating a buffer or
surface, the client is expected to pass in a list of modifiers that the driver
will optimally choose from. As a result of this, the GBM APIs need to support
modifiers.
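Here is a minimal sketch of the "driver will optimally choose" step. It assumes the driver walks its own preference order and returns the first modifier the client also offered. The preference order and `choose_modifier()` are assumptions made for illustration. The modifier values mirror the kernel's drm_fourcc.h, and 0x0100000000000004 is the CCS value this thread relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in the kernel's drm_fourcc.h: vendor in the top byte (INTEL = 0x01). */
#define MOD_LINEAR      0x0000000000000000ULL /* DRM_FORMAT_MOD_LINEAR */
#define MOD_X_TILED     0x0100000000000001ULL /* I915_FORMAT_MOD_X_TILED */
#define MOD_Y_TILED     0x0100000000000002ULL /* I915_FORMAT_MOD_Y_TILED */
#define MOD_Y_TILED_CCS 0x0100000000000004ULL /* fourcc_mod_code(INTEL, 4) */
#define MOD_INVALID     0x00ffffffffffffffULL /* DRM_FORMAT_MOD_INVALID */

/* Hypothetical driver preference, best first. */
static const uint64_t driver_pref[] = {
	MOD_Y_TILED_CCS, MOD_Y_TILED, MOD_X_TILED, MOD_LINEAR,
};

/* Return the most preferred modifier that also appears in the client's
 * list, or MOD_INVALID if the two lists share nothing. */
static uint64_t choose_modifier(const uint64_t *client_mods, size_t count)
{
	assert(client_mods != NULL || count == 0);
	for (size_t i = 0; i < sizeof(driver_pref) / sizeof(driver_pref[0]); i++)
		for (size_t j = 0; j < count; j++)
			if (client_mods[j] == driver_pref[i])
				return driver_pref[i];
	return MOD_INVALID;
}
```

Because the client passes in the kernel-advertised list unchanged, a kernel that does not advertise CCS simply means CCS is never picked.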

10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the
kernel. With the previous patches in place, it's easy to support this too.

13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the
existing code for MCS buffers, these patches create an MCS for the scanout
buffer. The trickery here is that a single BO contains both the main surface and
the auxiliary data. Previously, auxiliary data always lived in its own BO.
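With both surfaces in a single BO, something has to decide where the auxiliary data starts. This sketch assumes the Y-tiled main surface sits at offset 0 and the aux data follows on the next 4 KiB boundary. That placement, the `layout_main_plus_aux()` helper and the caller-supplied aux size are hypothetical and are not taken from the i965 patches. The real CCS size and alignment rules come from the hardware documentation. Only the Y-tile geometry (128 bytes wide, 32 rows tall) is a hardware fact.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of a; a must be a power of two. */
static uint64_t align_pot(uint64_t v, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (v + a - 1) & ~(a - 1);
}

struct main_aux_layout {
	uint64_t stride;     /* main surface bytes per row */
	uint64_t main_size;  /* main surface bytes, starting at offset 0 */
	uint64_t aux_offset; /* start of aux data inside the same BO */
	uint64_t bo_size;    /* total allocation */
};

/* Hypothetical placement: Y-tiled main surface first, aux data on the
 * next 4 KiB boundary, aux size supplied by the caller. */
static struct main_aux_layout layout_main_plus_aux(uint32_t width, uint32_t height,
						   uint32_t cpp, uint64_t aux_size)
{
	struct main_aux_layout l;

	l.stride = align_pot((uint64_t)width * cpp, 128); /* Y tile: 128 bytes wide */
	l.main_size = l.stride * align_pot(height, 32);  /* Y tile: 32 rows tall */
	l.aux_offset = align_pot(l.main_size, 4096);
	l.bo_size = l.aux_offset + align_pot(aux_size, 4096);
	return l;
}
```

For a 1920x1080 XRGB8888 surface this gives a 7680-byte stride and puts the aux data at byte offset 8355840.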

27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and
realize the bandwidth savings that come with it.

This was tested using kmscube
(https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation
is missing support for GET_PLANE2 - which is currently being worked on by
Kristian.

Upstream plan:


First of all, I'd like to point out that I haven't really been following
this feature closely, so maybe my questions are irrelevant to this
series. But still, I feel I have to point these things out since maybe
they are relevant. Please tell me if I'm not talking about the same
thing as you are.

The main question is: where's the matching i915.ko series? Shouldn't
that be step 0 in your upstream plan?



Ville is working on it. All patches except the last can be merged without kernel
support. That is assuming that we agree upon the general solution, using the
modifiers and having both buffers be part of the same BO. There is also a
requisite series from Kristian which will allow the client to query per plane
modifiers.



I guess this is a lie actually. I depend on fourcc_mod_code(INTEL, 4) being
Y-tiled CCS modifier. I can figure out a way to defer this until the last patch.
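For reference, this is how a modifier such as fourcc_mod_code(INTEL, 4) is encoded: the vendor ID goes in the top 8 bits and the vendor-specific value in the low 56 bits. The macros mirror the kernel's include/uapi/drm/drm_fourcc.h, with uint64_t in place of __u64. The value 4 was later defined upstream as I915_FORMAT_MOD_Y_TILED_CCS. The `mod_vendor()` and `mod_value()` helpers are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors include/uapi/drm/drm_fourcc.h, with uint64_t in place of __u64. */
#define DRM_FORMAT_MOD_VENDOR_NONE  0
#define DRM_FORMAT_MOD_VENDOR_INTEL 0x01
#define fourcc_mod_code(vendor, val) \
	((((uint64_t)DRM_FORMAT_MOD_VENDOR_##vendor) << 56) | ((val) & 0x00ffffffffffffffULL))

#define I915_FORMAT_MOD_X_TILED     fourcc_mod_code(INTEL, 1)
#define I915_FORMAT_MOD_Y_TILED     fourcc_mod_code(INTEL, 2)
#define I915_FORMAT_MOD_Y_TILED_CCS fourcc_mod_code(INTEL, 4)

/* Compile-time check that the CCS value lands where this series expects. */
static_assert(I915_FORMAT_MOD_Y_TILED_CCS == 0x0100000000000004ULL,
	      "Y-tiled CCS is vendor 0x01, value 4");

/* Illustrative helpers to pull the two fields back apart. */
static unsigned mod_vendor(uint64_t mod) { return (unsigned)(mod >> 56); }
static uint64_t mod_value(uint64_t mod) { return mod & 0x00ffffffffffffffULL; }
```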


I do recall seeing BSpec text containing "do this thing if render
decompression is enabled" and, at that time, our code wasn't
implementing those instructions. AFAIU, the Kernel didn't really have
support for render decompression, so its specific bits were just
ignored. I was assuming that whoever implemented the feature would add
all the necessary bits, especially since we didn't seem to have any
sort of "if (has_render_decompression(dev_priv))" to call. I am 100%
sure there's such an example in the Gen 9 Watermarks instructions, but
I'm sure I saw more somewhere else (Display WA page?). And remember:
missing watermarks workarounds equals flickering screens.

Is this relevant to your series? How will Mesa be 

Re: [Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

2016-12-29 Thread Ben Widawsky

On 16-12-06 13:34:02, Paulo Zanoni wrote:

2016-12-01 20:09 GMT-02:00 Ben Widawsky :

From: Ben Widawsky 

This patch series ultimately adds support within the i965 driver for
Renderbuffer Decompression with GBM. In short, this feature reduces memory
bandwidth by allowing the GPU to work with losslessly compressed data and having
that compression scheme understood by the display engine for decompression. The
display engine will decompress on the fly and scan out the image.

Quoting from the final patch, the bandwidth savings on a SKL GT4 with a 19x10
display running kmscube:

Without compression:
Read bandwidth: 603.91 MiB/s
Write bandwidth: 615.28 MiB/s

With compression:
Read bandwidth: 259.34 MiB/s
Write bandwidth: 337.83 MiB/s


The hardware achieves these savings by maintaining an auxiliary buffer
containing "opaque" compression information. It's opaque in the sense that the
details of the low-level compression scheme are not needed, but knowledge of the overall
layout of the compressed data is required. The auxiliary buffer is created by
the driver on behalf of the client when requested. That buffer needs to be
passed along wherever the main image's buffer goes.

The overall strategy is that the buffer/surface is created with a list of
modifiers. The list of modifiers the hardware is capable of using will come from
a new kernel API that is aware of the hardware and general constraints. A client
will request the list of modifiers and pass it directly back in during buffer
creation (potentially the client can prune the list, but as of now there is no
reason to.) This new API is being developed by Kristian. I did not get far
enough to play with that.

For EGL, a similar mechanism would exist whereby when importing a buffer into
EGL, one would provide a modifier and probably a pointer to the auxiliary data
upon import (import might therefore require multiple dma-buf fds), but for i965
and Intel, this wouldn't be necessary.

Here is a brief description of the series:
1-6 Adds support in GBM for per plane functions where necessary. This is
required because the kernel expects the auxiliary buffer to be passed along as a
plane. It has its own offset and stride, and the client shouldn't need to
calculate those.

7-9 Adds support in GBM to understand modifiers. When creating a buffer or
surface, the client is expected to pass in a list of modifiers that the driver
will optimally choose from. As a result of this, the GBM APIs need to support
modifiers.

10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the
kernel. With the previous patches in place, it's easy to support this too.

13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the
existing code for MCS buffers, these patches create an MCS for the scanout
buffer. The trickery here is that a single BO contains both the main surface and
the auxiliary data. Previously, auxiliary data always lived in its own BO.

27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and
realize the bandwidth savings that come with it.

This was tested using kmscube
(https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation
is missing support for GET_PLANE2 - which is currently being worked on by
Kristian.

Upstream plan:


First of all, I'd like to point out that I haven't really been following
this feature closely, so maybe my questions are irrelevant to this
series. But still, I feel I have to point these things out since maybe
they are relevant. Please tell me if I'm not talking about the same
thing as you are.

The main question is: where's the matching i915.ko series? Shouldn't
that be step 0 in your upstream plan?



Ville is working on it. All patches except the last can be merged without kernel
support. That is assuming that we agree upon the general solution, using the
modifiers and having both buffers be part of the same BO. There is also a
requisite series from Kristian which will allow the client to query per plane
modifiers.


I do recall seeing BSpec text containing "do this thing if render
decompression is enabled" and, at that time, our code wasn't
implementing those instructions. AFAIU, the Kernel didn't really have
support for render decompression, so its specific bits were just
ignored. I was assuming that whoever implemented the feature would add
all the necessary bits, especially since we didn't seem to have any
sort of "if (has_render_decompression(dev_priv))" to call. I am 100%
sure there's such an example in the Gen 9 Watermarks instructions, but
I'm sure I saw more somewhere else (Display WA page?). And remember:
missing watermarks workarounds equals flickering screens.

Is this relevant to your series? How will Mesa be able to detect that
the Kernel it's running on contains the necessary Render Decompression
checks/WAs/code it needs? How can the Kernel detect that Render
Decompression is in use and start doing the 

Re: [Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

2016-12-06 Thread Paulo Zanoni
2016-12-01 20:09 GMT-02:00 Ben Widawsky :
> From: Ben Widawsky 
>
> This patch series ultimately adds support within the i965 driver for
> Renderbuffer Decompression with GBM. In short, this feature reduces memory
> bandwidth by allowing the GPU to work with losslessly compressed data and having
> that compression scheme understood by the display engine for decompression. The
> display engine will decompress on the fly and scan out the image.
>
> Quoting from the final patch, the bandwidth savings on a SKL GT4 with a 19x10
> display running kmscube:
>
> Without compression:
> Read bandwidth: 603.91 MiB/s
> Write bandwidth: 615.28 MiB/s
>
> With compression:
> Read bandwidth: 259.34 MiB/s
> Write bandwidth: 337.83 MiB/s
>
>
> The hardware achieves these savings by maintaining an auxiliary buffer
> containing "opaque" compression information. It's opaque in the sense that the
> details of the low-level compression scheme are not needed, but knowledge of the overall
> layout of the compressed data is required. The auxiliary buffer is created by
> the driver on behalf of the client when requested. That buffer needs to be
> passed along wherever the main image's buffer goes.
>
> The overall strategy is that the buffer/surface is created with a list of
> modifiers. The list of modifiers the hardware is capable of using will come from
> a new kernel API that is aware of the hardware and general constraints. A client
> will request the list of modifiers and pass it directly back in during buffer
> creation (potentially the client can prune the list, but as of now there is no
> reason to.) This new API is being developed by Kristian. I did not get far
> enough to play with that.
>
> For EGL, a similar mechanism would exist whereby when importing a buffer into
> EGL, one would provide a modifier and probably a pointer to the auxiliary data
> upon import (import might therefore require multiple dma-buf fds), but for i965
> and Intel, this wouldn't be necessary.
>
> Here is a brief description of the series:
> 1-6 Adds support in GBM for per plane functions where necessary. This is
> required because the kernel expects the auxiliary buffer to be passed along as a
> plane. It has its own offset and stride, and the client shouldn't need to
> calculate those.
>
> 7-9 Adds support in GBM to understand modifiers. When creating a buffer or
> surface, the client is expected to pass in a list of modifiers that the driver
> will optimally choose from. As a result of this, the GBM APIs need to support
> modifiers.
>
> 10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the
> kernel. With the previous patches in place, it's easy to support this too.
>
> 13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the
> existing code for MCS buffers, these patches create an MCS for the scanout
> buffer. The trickery here is that a single BO contains both the main surface and
> the auxiliary data. Previously, auxiliary data always lived in its own BO.
>
> 27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and
> realize the bandwidth savings that come with it.
>
> This was tested using kmscube
> (https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation
> is missing support for GET_PLANE2 - which is currently being worked on by
> Kristian.
>
> Upstream plan:

First of all, I'd like to point out that I haven't really been following
this feature closely, so maybe my questions are irrelevant to this
series. But still, I feel I have to point these things out since maybe
they are relevant. Please tell me if I'm not talking about the same
thing as you are.

The main question is: where's the matching i915.ko series? Shouldn't
that be step 0 in your upstream plan?

I do recall seeing BSpec text containing "do this thing if render
decompression is enabled" and, at that time, our code wasn't
implementing those instructions. AFAIU, the Kernel didn't really have
support for render decompression, so its specific bits were just
ignored. I was assuming that whoever implemented the feature would add
all the necessary bits, especially since we didn't seem to have any
sort of "if (has_render_decompression(dev_priv))" to call. I am 100%
sure there's such an example in the Gen 9 Watermarks instructions, but
I'm sure I saw more somewhere else (Display WA page?). And remember:
missing watermarks workarounds equals flickering screens.

Is this relevant to your series? How will Mesa be able to detect that
the Kernel it's running on contains the necessary Render Decompression
checks/WAs/code it needs? How can the Kernel detect that Render
Decompression is in use and start doing the things it should do?

Thanks,
Paulo


> 1. All of the patches up through 26 should be mergeable today after review.
> 2. After 1-12 land, client support of Y-tiling should be achievable. Modesetting
> driver can probably

Re: [Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

2016-12-02 Thread Rob Clark
On Thu, Dec 1, 2016 at 5:09 PM, Ben Widawsky wrote:
> When Kristian's interface is ready, kmscube can be modified to make use of it.
>
> Rob: are you interested in a PR for kmscube?

sure, from a quick look seems like it should be backwards compatible..
probably we should set up a git tree on fd.o for kmscube

It does make me realize that I do need to figure out what to do w/ the
atomic/fences branches.. maybe I should just make a legacy branch
which sticks with the legacy APIs for hw that doesn't support atomic
and old kernels.  Otherwise I guess kmscube maybe needs to get split
into more than one file to keep it from being too much of a mess ;-)

btw, interesting that you went the route of an extra plane for
"metadata"..  I have something similar w/ a5xx, and was assuming I'd
just have to go single-plane with well known formula for calculating
offset of color data from aux data, to avoid confusing dri2/dri3 too
badly.

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

2016-12-02 Thread Daniel Stone
Hey Ben,
Sorry I didn't get to testing this before now; have been tied up with
all manner of stuff.

On 1 December 2016 at 22:09, Ben Widawsky  wrote:
> The overall strategy is that the buffer/surface is created with a list of
> modifiers. The list of modifiers the hardware is capable of using will come from
> a new kernel API that is aware of the hardware and general constraints. A client
> will request the list of modifiers and pass it directly back in during buffer
> creation (potentially the client can prune the list, but as of now there is no
> reason to.) This new API is being developed by Kristian. I did not get far
> enough to play with that.
>
> For EGL, a similar mechanism would exist whereby when importing a buffer into
> EGL, one would provide a modifier and probably a pointer to the auxiliary data
> upon import (import might therefore require multiple dma-buf fds), but for i965
> and Intel, this wouldn't be necessary.

Right, we have EGL_EXT_image_dma_buf_import_modifiers; Varad has a
series on the list already for this which just needs some reviews
(ahem).
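On the EGL side, EGL_EXT_image_dma_buf_import_modifiers passes each plane's 64-bit modifier as two 32-bit attributes, EGL_DMA_BUF_PLANEn_MODIFIER_LO_EXT and _HI_EXT, because EGL attribute values are 32-bit EGLints. Below is a sketch of the split and its inverse. The function names are illustrative, and the arithmetic needs no EGL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 64-bit modifier into the values for the
 * EGL_DMA_BUF_PLANEn_MODIFIER_LO_EXT / _HI_EXT attribute pair. */
static void split_modifier(uint64_t mod, uint32_t *lo, uint32_t *hi)
{
	assert(lo != NULL && hi != NULL);
	*lo = (uint32_t)(mod & 0xffffffffu);
	*hi = (uint32_t)(mod >> 32);
}

/* Inverse, as an importer would reassemble the modifier. */
static uint64_t join_modifier(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

When building the attribute list, each half is then stored as an EGLint value after its corresponding attribute name.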

> Here is a brief description of the series:
> 1-6 Adds support in GBM for per plane functions where necessary. This is
> required because the kernel expects the auxiliary buffer to be passed along as a
> plane. It has its own offset and stride, and the client shouldn't need to
> calculate those.

This is missing gbm_bo_get_handle_for_plane(); as you say, a lot of
other hardware tends to use separate buffers rather than
adjacent/offset. So adding that would be nice. Having
gbm_bo_get_plane_count() is really nice though, since it allows us to
have a completely agnostic client (i.e. I don't have to have a map
inside Weston with every exotic format/modifier combination).
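The value of gbm_bo_get_plane_count() is that a compositor can stay format-agnostic by looping over however many planes the buffer reports. The sketch below fills the four-slot handle, pitch, offset and modifier arrays that the kernel's struct drm_mode_fb_cmd2 uses. `plane_info`, `fb_planes` and `fill_fb_planes()` are hypothetical. A per-plane handle stands in for the gbm_bo_get_handle_for_plane() requested above. The sketch applies one modifier to every plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-plane description a buffer allocator could report. */
struct plane_info {
	uint32_t handle; /* GEM handle backing this plane (may repeat across planes) */
	uint32_t pitch;
	uint32_t offset;
};

/* Same four-slot shape as the arrays in the kernel's struct drm_mode_fb_cmd2. */
struct fb_planes {
	uint32_t handles[4], pitches[4], offsets[4];
	uint64_t modifier[4];
};

/* Copy `count` planes into the fixed arrays; unused slots stay zero.
 * Returns 0 on success, -1 if the plane count is out of range. */
static int fill_fb_planes(struct fb_planes *fb, const struct plane_info *planes,
			  int count, uint64_t modifier)
{
	assert(fb != NULL);
	if (count < 1 || count > 4 || planes == NULL)
		return -1;
	memset(fb, 0, sizeof(*fb));
	for (int i = 0; i < count; i++) {
		fb->handles[i] = planes[i].handle;
		fb->pitches[i] = planes[i].pitch;
		fb->offsets[i] = planes[i].offset;
		fb->modifier[i] = modifier;
	}
	return 0;
}
```

With Intel's single-BO layout, both planes carry the same handle and differ only in offset and pitch. Hardware that uses separate buffers would report different handles, and the loop does not change.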

> 7-9 Adds support in GBM to understand modifiers. When creating a buffer or
> surface, the client is expected to pass in a list of modifiers that the driver
> will optimally choose from. As a result of this, the GBM APIs need to support
> modifiers.

This bit seems good, and like a reasonable fit for the draft of
GETPLANE2 which is kicking around.

> 10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the
> kernel. With the previous patches in place, it's easy to support this too.

And it works! \o/

> 13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the
> existing code for MCS buffers, these patches create an MCS for the scanout
> buffer. The trickery here is that a single BO contains both the main surface and
> the auxiliary data. Previously, auxiliary data always lived in its own BO.
>
> 27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and
> realize the bandwidth savings that come with it.

I've not rebuilt my kernel to test the new CCS bits, so I haven't tested this.

> This was tested using kmscube
> (https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation
> is missing support for GET_PLANE2 - which is currently being worked on by
> Kristian.

There's also a Weston branch here:
https://git.collabora.com/cgit/user/daniels/weston.git/log/?h=wip/2016-11/gbm-planes-modifiers

This works with Y-tiling for me, but with the same need for
GET_PLANE2; also the branch as-is will provoke a segfault inside
gbm_dri_bo_get_modifier(), which ends up calling intel_query_image()
with image == NULL, when using cursor images. To get it to succeed,
you need to shove an early 'return -1' inside
drm_output_init_cursor_egl() so we fall back to software (well OK, GL)
cursors.
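The crash described above comes from querying an image that isn't there. One plausible way to make such a getter safe is to treat a BO without an image as having no known modifier. This is only a sketch under that assumption. The structs and `sketch_bo_get_modifier()` are hypothetical and are neither Mesa's gbm_dri_bo_get_modifier() nor its eventual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOD_INVALID 0x00ffffffffffffffULL /* DRM_FORMAT_MOD_INVALID */

/* Hypothetical stand-ins: a BO may or may not have a driver image behind it
 * (this sketch assumes, e.g., a cursor BO allocated without one). */
struct sketch_image { uint64_t modifier; };
struct sketch_gbm_bo { struct sketch_image *image; };

static uint64_t sketch_bo_get_modifier(const struct sketch_gbm_bo *bo)
{
	assert(bo != NULL);    /* a NULL BO is a caller bug */
	if (bo->image == NULL) /* no image: report "unknown" instead of crashing */
		return MOD_INVALID;
	return bo->image->modifier;
}
```

A caller such as a compositor can then treat MOD_INVALID as "no explicit modifier" and fall back to its legacy path.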

The branch is broken with multihead, but that's the branch it's based
on being broken/WIP, not a result of these patches.

> Upstream plan:
> 1. All of the patches up through 26 should be mergeable today after review.
> 2. After 1-12 land, client support of Y-tiling should be achievable. Modesetting
> driver can probably be updated as can things like Weston. Clients assuming a new
> enough kernel should be able to blindly set the Y-tiled modifier.
> 3. Once kernel and libdrm support for CCS modifiers lands, patch 27 can land. However,
> CCS isn't usable yet; it is only available as a prototype.
> 4. Kristian's GET_PLANE2 interface needs to be solidified and land.
> 5. Clients will utilize #3 and #4 to use CCS.
> 6. Protocol work, EGL, Wayland, DRIX - etc

Wayland has modifier support already; there are patches out for review
for Weston to support this via the EGL extension above, as well as
inside KMS (part of the atomic branch).

> When Kristian's interface is ready, kmscube can be modified to make use of it.

And I'll modify Weston to use it as well.

Thanks for this, and sorry for the tardy review.

Cheers,
Daniel


[Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

2016-12-01 Thread Ben Widawsky
From: Ben Widawsky 

This patch series ultimately adds support within the i965 driver for
Renderbuffer Decompression with GBM. In short, this feature reduces memory
bandwidth by allowing the GPU to work with losslessly compressed data and having
that compression scheme understood by the display engine for decompression. The
display engine will decompress on the fly and scan out the image.

Quoting from the final patch, the bandwidth savings on a SKL GT4 with a 19x10
display running kmscube:

Without compression:
Read bandwidth: 603.91 MiB/s
Write bandwidth: 615.28 MiB/s

With compression:
Read bandwidth: 259.34 MiB/s
Write bandwidth: 337.83 MiB/s


The hardware achieves these savings by maintaining an auxiliary buffer
containing "opaque" compression information. It's opaque in the sense that the
details of the low-level compression scheme are not needed, but knowledge of the overall
layout of the compressed data is required. The auxiliary buffer is created by
the driver on behalf of the client when requested. That buffer needs to be
passed along wherever the main image's buffer goes.

The overall strategy is that the buffer/surface is created with a list of
modifiers. The list of modifiers the hardware is capable of using will come from
a new kernel API that is aware of the hardware and general constraints. A client
will request the list of modifiers and pass it directly back in during buffer
creation (potentially the client can prune the list, but as of now there is no
reason to.) This new API is being developed by Kristian. I did not get far
enough to play with that.

For EGL, a similar mechanism would exist whereby when importing a buffer into
EGL, one would provide a modifier and probably a pointer to the auxiliary data
upon import (import might therefore require multiple dma-buf fds), but for i965
and Intel, this wouldn't be necessary.

Here is a brief description of the series:
1-6 Adds support in GBM for per plane functions where necessary. This is
required because the kernel expects the auxiliary buffer to be passed along as a
plane. It has its own offset and stride, and the client shouldn't need to
calculate those.

7-9 Adds support in GBM to understand modifiers. When creating a buffer or
surface, the client is expected to pass in a list of modifiers that the driver
will optimally choose from. As a result of this, the GBM APIs need to support
modifiers.

10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the
kernel. With the previous patches in place, it's easy to support this too.

13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the
existing code for MCS buffers, these patches create an MCS for the scanout
buffer. The trickery here is that a single BO contains both the main surface and
the auxiliary data. Previously, auxiliary data always lived in its own BO.

27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and
realize the bandwidth savings that come with it.

This was tested using kmscube
(https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation
is missing support for GET_PLANE2 - which is currently being worked on by
Kristian.

Upstream plan:
1. All of the patches up through 26 should be mergeable today after review.
2. After 1-12 land, client support of Y-tiling should be achievable. Modesetting
driver can probably be updated as can things like Weston. Clients assuming a new
enough kernel should be able to blindly set the Y-tiled modifier.
3. Once kernel and libdrm support for CCS modifiers lands, patch 27 can land. However,
CCS isn't usable yet; it is only available as a prototype.
4. Kristian's GET_PLANE2 interface needs to be solidified and land.
5. Clients will utilize #3 and #4 to use CCS.
6. Protocol work, EGL, Wayland, DRIX - etc

When Kristian's interface is ready, kmscube can be modified to make use of it.

Rob: are you interested in a PR for kmscube?

Definition of terms:
Renderbuffer Decompression - In the ARM world, this is AFBC. Having the graphics
driver utilize lossless surface compression for the scanout buffer and sending
those surfaces, compressed, to the kernel (via KMS) for the display engine to
directly consume.

Renderbuffer Compression - Utilizing compressed surfaces for many buffer types
(scanout, textures, whatever), and decompressing (i.e. resolving) those surfaces
before passing them along.

Ben Widawsky (27):
  gbm: Move getters to match order in header file (trivial)
  gbm: Fix width height getters return type (trivial)
  gbm: Export a plane getter function
  gbm: Create a gbm_device getter for stride
  gbm: Export a per plane getter for stride
  gbm: Export a per plane getter for offset
  i965/dri: Store the screen associated with the image
  dri: Add an image creation with modifiers
  gbm: Introduce modifiers into surface/bo creation
  i965: Handle Y-tile modifier
  gbm: Get modifiers from DRI
  i965: Bring back always Y-tiled on SKL+
  i965: