subject:"\[Intel\-gfx\] \[PATCH\] drm\/i915\: Remove temporary allocation of dma addresses when rotating"

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-11-15 Thread Chris Wilson

Quoting Tvrtko Ursulin (2017-11-15 09:18:00)
> 
> On 14/11/2017 18:14, Chris Wilson wrote:
> > Honestly I think the page_iter cache is useful and likely to already
> > exist or be used shortly after a portion of the object is rotated.
> 
> How come? I thought CPU access to framebuffers is atypical nowadays.

But not entirely forgotten. It's the first, and sometimes, last thing a
client wires up. (Just yesterday mesa gained a patch for yet another
access path. Great.)

I also think it is of wider utility for the when the buffer gets reused
or new views. Your argument applies better to the wider aspect of the
futility of keeping caches around that may never be reused. And that is
worth improving, even more so if we can measure an impact.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-11-15 Thread Tvrtko Ursulin



On 14/11/2017 18:14, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2017-02-27 14:31:17)


On 27/02/2017 10:21, Chris Wilson wrote:

On Mon, Feb 27, 2017 at 10:14:12AM +, Tvrtko Ursulin wrote:


On 27/02/2017 10:06, Chris Wilson wrote:

On Mon, Feb 27, 2017 at 09:55:10AM +, Tvrtko Ursulin wrote:


On 22/02/2017 08:44, Chris Wilson wrote:

I also think that's an argument for improving the general cache rather
than arguing against using it.


Well I wasn't concerned about the cache per se, but about whether it
is completely appropriate (best choice) to use it in this particular
case.

Because as I said before, for 1920x1080x32 we are talking about a
16KiB extremely short lived temporary allocation, vs the similar
size for the sg radix cache. But radix cache sticks around the the
lifetime of obj->mm.pages and it wouldn't otherwise be there since
AFAICS in practice no one really touches frame buffers in a way to
trigger its creation.

Those amounts of memory are not a concern, but again, is the
simplification of the code worth the conceptual downsides mentioned
above? Even if we considered 4K frame buffers, when both allocations
go to ~64KiB, would that change anything? I am not sure, probably
not for me.

So I am still unsure that we should go with this change.


Again, the complaint you have here are general concerns about caching
the mapping. Avoiding using the cache instead of improving the cache
seems the wrong approach.


Depends what kind of improvments to the cache you have in mind. If
you are thinking about size then I disagree, I think it is efficient
enough already. But if you are thinking about the lifetime
management then it is obvious from all that I have written so far
that I would agree with that. Since the core of my "complaint" is
the lifetime mismatch, and not the size.

For lifetime I am not sure what you could do. Exposing the size of
it, with maybe some other bits attached the the object, to the
shrinker I think doesn't make much sense since the sizes are so
small compared to the backing store sizes.

Perhaps you could add an explicit reset of the cache after the
rotation is done with it, but then the only remaining benefit will
be avoiding greater than zero order allocations. I say the only
one.. it would still be a good one. Just have no idea if this level
of cache usage would satisfy you!

Perhaps you could say what kind of optimisation you have in mind to
save me guessing? :)


I was thinking you would like an inactivity timer. Or we could have a
separate shrinker, as that's the principal cache management system.


I thought about the shrinker myself. Even wrote some code to more
accurately size the objects as part of the existing passes. But as I
said the contribution of anything object and not backing store is so
small that, even though it would conceptually be more correct and
perhaps avoid some marginal over-shrinking, I am not sure it is worth
doing it. Assuming of course that I got the sizing of the radix tree
correct! I just hacked something up based on some debug dumping code
from radix-tree.c.

So the complication is there is no API to get the size of the radix tree
(or the scatter list table) and we would have to add something, either
internally to i915, or try and upstream it.

Or we avoid that with your timer idea and just purge all caches which
haven't been used in a while. Maybe from idle work or something.


Tempting. I like hooking into mark_idle/park more than adding a new
timer, and we already have the precedent of using that to initiate a
cache flush.

What's the impact of us keeping pages pinned when idle -- (a lot) more
work in the shrinker. Let's see where the cost-beneift lies.
  

But for this immediate patch, would you be happy with adding and
exporting i915_gem_object_reset_page_iter and calling it after rotation
is done with accessing the pages? Benefit would be avoidance of
drm_malloc_gfp if that bothers you most.


Honestly I think the page_iter cache is useful and likely to already
exist or be used shortly after a portion of the object is rotated.


How come? I thought CPU access to framebuffers is atypical nowadays.

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-11-14 Thread Chris Wilson

Quoting Tvrtko Ursulin (2017-02-27 14:31:17)
> 
> On 27/02/2017 10:21, Chris Wilson wrote:
> > On Mon, Feb 27, 2017 at 10:14:12AM +, Tvrtko Ursulin wrote:
> >>
> >> On 27/02/2017 10:06, Chris Wilson wrote:
> >>> On Mon, Feb 27, 2017 at 09:55:10AM +, Tvrtko Ursulin wrote:
> 
>  On 22/02/2017 08:44, Chris Wilson wrote:
> > I also think that's an argument for improving the general cache rather
> > than arguing against using it.
> 
>  Well I wasn't concerned about the cache per se, but about whether it
>  is completely appropriate (best choice) to use it in this particular
>  case.
> 
>  Because as I said before, for 1920x1080x32 we are talking about a
>  16KiB extremely short lived temporary allocation, vs the similar
>  size for the sg radix cache. But radix cache sticks around the the
>  lifetime of obj->mm.pages and it wouldn't otherwise be there since
>  AFAICS in practice no one really touches frame buffers in a way to
>  trigger its creation.
> 
>  Those amounts of memory are not a concern, but again, is the
>  simplification of the code worth the conceptual downsides mentioned
>  above? Even if we considered 4K frame buffers, when both allocations
>  go to ~64KiB, would that change anything? I am not sure, probably
>  not for me.
> 
>  So I am still unsure that we should go with this change.
> >>>
> >>> Again, the complaint you have here are general concerns about caching
> >>> the mapping. Avoiding using the cache instead of improving the cache
> >>> seems the wrong approach.
> >>
> >> Depends what kind of improvments to the cache you have in mind. If
> >> you are thinking about size then I disagree, I think it is efficient
> >> enough already. But if you are thinking about the lifetime
> >> management then it is obvious from all that I have written so far
> >> that I would agree with that. Since the core of my "complaint" is
> >> the lifetime mismatch, and not the size.
> >>
> >> For lifetime I am not sure what you could do. Exposing the size of
> >> it, with maybe some other bits attached the the object, to the
> >> shrinker I think doesn't make much sense since the sizes are so
> >> small compared to the backing store sizes.
> >>
> >> Perhaps you could add an explicit reset of the cache after the
> >> rotation is done with it, but then the only remaining benefit will
> >> be avoiding greater than zero order allocations. I say the only
> >> one.. it would still be a good one. Just have no idea if this level
> >> of cache usage would satisfy you!
> >>
> >> Perhaps you could say what kind of optimisation you have in mind to
> >> save me guessing? :)
> >
> > I was thinking you would like an inactivity timer. Or we could have a
> > separate shrinker, as that's the principal cache management system.
> 
> I thought about the shrinker myself. Even wrote some code to more 
> accurately size the objects as part of the existing passes. But as I 
> said the contribution of anything object and not backing store is so 
> small that, even though it would conceptually be more correct and 
> perhaps avoid some marginal over-shrinking, I am not sure it is worth 
> doing it. Assuming of course that I got the sizing of the radix tree 
> correct! I just hacked something up based on some debug dumping code 
> from radix-tree.c.
> 
> So the complication is there is no API to get the size of the radix tree 
> (or the scatter list table) and we would have to add something, either 
> internally to i915, or try and upstream it.
> 
> Or we avoid that with your timer idea and just purge all caches which 
> haven't been used in a while. Maybe from idle work or something.

Tempting. I like hooking into mark_idle/park more than adding a new
timer, and we already have the precedent of using that to initiate a
cache flush.

What's the impact of us keeping pages pinned when idle -- (a lot) more
work in the shrinker. Let's see where the cost-beneift lies.
 
> But for this immediate patch, would you be happy with adding and 
> exporting i915_gem_object_reset_page_iter and calling it after rotation 
> is done with accessing the pages? Benefit would be avoidance of 
> drm_malloc_gfp if that bothers you most.

Honestly I think the page_iter cache is useful and likely to already
exist or be used shortly after a portion of the object is rotated.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-27 Thread Tvrtko Ursulin



On 27/02/2017 10:21, Chris Wilson wrote:

On Mon, Feb 27, 2017 at 10:14:12AM +, Tvrtko Ursulin wrote:


On 27/02/2017 10:06, Chris Wilson wrote:

On Mon, Feb 27, 2017 at 09:55:10AM +, Tvrtko Ursulin wrote:


On 22/02/2017 08:44, Chris Wilson wrote:

I also think that's an argument for improving the general cache rather
than arguing against using it.


Well I wasn't concerned about the cache per se, but about whether it
is completely appropriate (best choice) to use it in this particular
case.

Because as I said before, for 1920x1080x32 we are talking about a
16KiB extremely short lived temporary allocation, vs the similar
size for the sg radix cache. But radix cache sticks around the the
lifetime of obj->mm.pages and it wouldn't otherwise be there since
AFAICS in practice no one really touches frame buffers in a way to
trigger its creation.

Those amounts of memory are not a concern, but again, is the
simplification of the code worth the conceptual downsides mentioned
above? Even if we considered 4K frame buffers, when both allocations
go to ~64KiB, would that change anything? I am not sure, probably
not for me.

So I am still unsure that we should go with this change.


Again, the complaint you have here are general concerns about caching
the mapping. Avoiding using the cache instead of improving the cache
seems the wrong approach.


Depends what kind of improvments to the cache you have in mind. If
you are thinking about size then I disagree, I think it is efficient
enough already. But if you are thinking about the lifetime
management then it is obvious from all that I have written so far
that I would agree with that. Since the core of my "complaint" is
the lifetime mismatch, and not the size.

For lifetime I am not sure what you could do. Exposing the size of
it, with maybe some other bits attached the the object, to the
shrinker I think doesn't make much sense since the sizes are so
small compared to the backing store sizes.

Perhaps you could add an explicit reset of the cache after the
rotation is done with it, but then the only remaining benefit will
be avoiding greater than zero order allocations. I say the only
one.. it would still be a good one. Just have no idea if this level
of cache usage would satisfy you!

Perhaps you could say what kind of optimisation you have in mind to
save me guessing? :)


I was thinking you would like an inactivity timer. Or we could have a
separate shrinker, as that's the principal cache management system.


I thought about the shrinker myself. Even wrote some code to more 
accurately size the objects as part of the existing passes. But as I 
said the contribution of anything object and not backing store is so 
small that, even though it would conceptually be more correct and 
perhaps avoid some marginal over-shrinking, I am not sure it is worth 
doing it. Assuming of course that I got the sizing of the radix tree 
correct! I just hacked something up based on some debug dumping code 
from radix-tree.c.


So the complication is there is no API to get the size of the radix tree 
(or the scatter list table) and we would have to add something, either 
internally to i915, or try and upstream it.


Or we avoid that with your timer idea and just purge all caches which 
haven't been used in a while. Maybe from idle work or something.


But for this immediate patch, would you be happy with adding and 
exporting i915_gem_object_reset_page_iter and calling it after rotation 
is done with accessing the pages? Benefit would be avoidance of 
drm_malloc_gfp if that bothers you most.


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-27 Thread Joonas Lahtinen

On ma, 2017-02-27 at 10:21 +, Chris Wilson wrote:
> On Mon, Feb 27, 2017 at 10:14:12AM +, Tvrtko Ursulin wrote:
> > Perhaps you could say what kind of optimisation you have in mind to
> > save me guessing? :)
> 
> I was thinking you would like an inactivity timer. Or we could have a
> separate shrinker, as that's the principal cache management system.

My vote goes for merging and addressing the seen problems in the
caching logic.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-27 Thread Chris Wilson

On Mon, Feb 27, 2017 at 10:14:12AM +, Tvrtko Ursulin wrote:
> 
> On 27/02/2017 10:06, Chris Wilson wrote:
> >On Mon, Feb 27, 2017 at 09:55:10AM +, Tvrtko Ursulin wrote:
> >>
> >>On 22/02/2017 08:44, Chris Wilson wrote:
> >>>I also think that's an argument for improving the general cache rather
> >>>than arguing against using it.
> >>
> >>Well I wasn't concerned about the cache per se, but about whether it
> >>is completely appropriate (best choice) to use it in this particular
> >>case.
> >>
> >>Because as I said before, for 1920x1080x32 we are talking about a
> >>16KiB extremely short lived temporary allocation, vs the similar
> >>size for the sg radix cache. But radix cache sticks around the the
> >>lifetime of obj->mm.pages and it wouldn't otherwise be there since
> >>AFAICS in practice no one really touches frame buffers in a way to
> >>trigger its creation.
> >>
> >>Those amounts of memory are not a concern, but again, is the
> >>simplification of the code worth the conceptual downsides mentioned
> >>above? Even if we considered 4K frame buffers, when both allocations
> >>go to ~64KiB, would that change anything? I am not sure, probably
> >>not for me.
> >>
> >>So I am still unsure that we should go with this change.
> >
> >Again, the complaint you have here are general concerns about caching
> >the mapping. Avoiding using the cache instead of improving the cache
> >seems the wrong approach.
> 
> Depends what kind of improvments to the cache you have in mind. If
> you are thinking about size then I disagree, I think it is efficient
> enough already. But if you are thinking about the lifetime
> management then it is obvious from all that I have written so far
> that I would agree with that. Since the core of my "complaint" is
> the lifetime mismatch, and not the size.
> 
> For lifetime I am not sure what you could do. Exposing the size of
> it, with maybe some other bits attached the the object, to the
> shrinker I think doesn't make much sense since the sizes are so
> small compared to the backing store sizes.
> 
> Perhaps you could add an explicit reset of the cache after the
> rotation is done with it, but then the only remaining benefit will
> be avoiding greater than zero order allocations. I say the only
> one.. it would still be a good one. Just have no idea if this level
> of cache usage would satisfy you!
> 
> Perhaps you could say what kind of optimisation you have in mind to
> save me guessing? :)

I was thinking you would like an inactivity timer. Or we could have a
separate shrinker, as that's the principal cache management system.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-27 Thread Tvrtko Ursulin



On 27/02/2017 10:06, Chris Wilson wrote:

On Mon, Feb 27, 2017 at 09:55:10AM +, Tvrtko Ursulin wrote:


On 22/02/2017 08:44, Chris Wilson wrote:

I also think that's an argument for improving the general cache rather
than arguing against using it.


Well I wasn't concerned about the cache per se, but about whether it
is completely appropriate (best choice) to use it in this particular
case.

Because as I said before, for 1920x1080x32 we are talking about a
16KiB extremely short lived temporary allocation, vs the similar
size for the sg radix cache. But radix cache sticks around the the
lifetime of obj->mm.pages and it wouldn't otherwise be there since
AFAICS in practice no one really touches frame buffers in a way to
trigger its creation.

Those amounts of memory are not a concern, but again, is the
simplification of the code worth the conceptual downsides mentioned
above? Even if we considered 4K frame buffers, when both allocations
go to ~64KiB, would that change anything? I am not sure, probably
not for me.

So I am still unsure that we should go with this change.


Again, the complaint you have here are general concerns about caching
the mapping. Avoiding using the cache instead of improving the cache
seems the wrong approach.


Depends what kind of improvments to the cache you have in mind. If you 
are thinking about size then I disagree, I think it is efficient enough 
already. But if you are thinking about the lifetime management then it 
is obvious from all that I have written so far that I would agree with 
that. Since the core of my "complaint" is the lifetime mismatch, and not 
the size.


For lifetime I am not sure what you could do. Exposing the size of it, 
with maybe some other bits attached the the object, to the shrinker I 
think doesn't make much sense since the sizes are so small compared to 
the backing store sizes.


Perhaps you could add an explicit reset of the cache after the rotation 
is done with it, but then the only remaining benefit will be avoiding 
greater than zero order allocations. I say the only one.. it would still 
be a good one. Just have no idea if this level of cache usage would 
satisfy you!


Perhaps you could say what kind of optimisation you have in mind to save 
me guessing? :)


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-27 Thread Chris Wilson

On Mon, Feb 27, 2017 at 09:55:10AM +, Tvrtko Ursulin wrote:
> 
> On 22/02/2017 08:44, Chris Wilson wrote:
> >I also think that's an argument for improving the general cache rather
> >than arguing against using it.
> 
> Well I wasn't concerned about the cache per se, but about whether it
> is completely appropriate (best choice) to use it in this particular
> case.
> 
> Because as I said before, for 1920x1080x32 we are talking about a
> 16KiB extremely short lived temporary allocation, vs the similar
> size for the sg radix cache. But radix cache sticks around the the
> lifetime of obj->mm.pages and it wouldn't otherwise be there since
> AFAICS in practice no one really touches frame buffers in a way to
> trigger its creation.
> 
> Those amounts of memory are not a concern, but again, is the
> simplification of the code worth the conceptual downsides mentioned
> above? Even if we considered 4K frame buffers, when both allocations
> go to ~64KiB, would that change anything? I am not sure, probably
> not for me.
> 
> So I am still unsure that we should go with this change.

Again, the complaint you have here are general concerns about caching
the mapping. Avoiding using the cache instead of improving the cache
seems the wrong approach.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-27 Thread Tvrtko Ursulin



On 22/02/2017 08:44, Chris Wilson wrote:

On Wed, Feb 22, 2017 at 08:29:06AM +, Tvrtko Ursulin wrote:


On 21/02/2017 15:01, Joonas Lahtinen wrote:

On pe, 2017-02-17 at 15:10 +, Chris Wilson wrote:

The object already stores (computed on the fly) the index to dma address
so use it instead of reallocating a large temporary array every time we
bind a rotated framebuffer.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 





+rotate_pages(struct drm_i915_gem_object *obj,
+const struct intel_rotation_plane_info *p,
 struct sg_table *st, struct scatterlist *sg)
{
unsigned int column, row;
-   unsigned int src_idx;

-   for (column = 0; column < width; column++) {
-   src_idx = stride * (height - 1) + column;
-   for (row = 0; row < height; row++) {
-   st->nents++;
+   for (column = 0; column < p->width; column++) {
+   unsigned long src_idx =
+   p->stride * (p->height - 1) + column + p->offset;
+   for (row = 0; row < p->height; row++) {
+   struct scatterlist *src;
+   unsigned int n;
+
+   src = i915_gem_object_get_sg(obj, src_idx, );


i915_gem_object_get_sg has variable names obj, n, *offset, so I'd be
little concerned of sidetracking reader. Rename n into offset?


Or use i915_gem_object_get_dma_address in the sg_dma_adress_assignment 
directly.





+   src_idx -= p->stride;
+
/* We don't need the pages, but need to initialize
 * the entries so the sg list can be happily traversed.
 * The only thing we need are DMA addresses.
 */
sg_set_page(sg, NULL, PAGE_SIZE, 0);
-   sg_dma_address(sg) = in[offset + src_idx];
+   sg_dma_address(sg) = sg_dma_address(src) + n*PAGE_SIZE;
sg_dma_len(sg) = PAGE_SIZE;
-   sg = sg_next(sg);
-   src_idx -= stride;


I'm not sure why moving this line, might as well hoist all these to the
for() line.


+   sg = __sg_next(sg);
+
+   st->nents++;
}
}

@@ -3074,62 +3079,30 @@ static noinline struct sg_table *
intel_rotate_pages(struct intel_rotation_info *rot_info,
   struct drm_i915_gem_object *obj)
{
-   const unsigned long n_pages = obj->base.size / PAGE_SIZE;
-   unsigned int size = intel_rotation_info_size(rot_info);
-   struct sgt_iter sgt_iter;
-   dma_addr_t dma_addr;
-   unsigned long i;
-   dma_addr_t *page_addr_list;
-   struct sg_table *st;
+   const unsigned int size = intel_rotation_info_size(rot_info);


This is only used once, just inline it.

Reviewed-by: Joonas Lahtinen 

Could use an A-b from Tvrtko.


I did not like it in another thread, well, better say I was
concerned about the increased memory use by the radix tree which
would then stick around for the obj->pages lifetime (long time for a
framebuffer I thought). While the temporary array allocations here
are not that big and very temporary.

I guess someone needs to bite the bullet and try and figure out how
exactly big is the radix tree for some mixes of more or less
coalesced sg entries.


I also think that's an argument for improving the general cache rather
than arguing against using it.


Well I wasn't concerned about the cache per se, but about whether it is 
completely appropriate (best choice) to use it in this particular case.


Because as I said before, for 1920x1080x32 we are talking about a 16KiB 
extremely short lived temporary allocation, vs the similar size for the 
sg radix cache. But radix cache sticks around the the lifetime of 
obj->mm.pages and it wouldn't otherwise be there since AFAICS in 
practice no one really touches frame buffers in a way to trigger its 
creation.


Those amounts of memory are not a concern, but again, is the 
simplification of the code worth the conceptual downsides mentioned 
above? Even if we considered 4K frame buffers, when both allocations go 
to ~64KiB, would that change anything? I am not sure, probably not for me.


So I am still unsure that we should go with this change.

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-22 Thread Chris Wilson

On Wed, Feb 22, 2017 at 08:29:06AM +, Tvrtko Ursulin wrote:
> 
> On 21/02/2017 15:01, Joonas Lahtinen wrote:
> >On pe, 2017-02-17 at 15:10 +, Chris Wilson wrote:
> >>The object already stores (computed on the fly) the index to dma address
> >>so use it instead of reallocating a large temporary array every time we
> >>bind a rotated framebuffer.
> >>
> >>Signed-off-by: Chris Wilson 
> >>Cc: Matthew Auld 
> >>Cc: Joonas Lahtinen 
> >>Cc: Tvrtko Ursulin 
> >
> >
> >
> >>+rotate_pages(struct drm_i915_gem_object *obj,
> >>+const struct intel_rotation_plane_info *p,
> >> struct sg_table *st, struct scatterlist *sg)
> >> {
> >>unsigned int column, row;
> >>-   unsigned int src_idx;
> >>
> >>-   for (column = 0; column < width; column++) {
> >>-   src_idx = stride * (height - 1) + column;
> >>-   for (row = 0; row < height; row++) {
> >>-   st->nents++;
> >>+   for (column = 0; column < p->width; column++) {
> >>+   unsigned long src_idx =
> >>+   p->stride * (p->height - 1) + column + p->offset;
> >>+   for (row = 0; row < p->height; row++) {
> >>+   struct scatterlist *src;
> >>+   unsigned int n;
> >>+
> >>+   src = i915_gem_object_get_sg(obj, src_idx, );
> >
> >i915_gem_object_get_sg has variable names obj, n, *offset, so I'd be
> >little concerned of sidetracking reader. Rename n into offset?
> >
> >>+   src_idx -= p->stride;
> >>+
> >>/* We don't need the pages, but need to initialize
> >> * the entries so the sg list can be happily traversed.
> >> * The only thing we need are DMA addresses.
> >> */
> >>sg_set_page(sg, NULL, PAGE_SIZE, 0);
> >>-   sg_dma_address(sg) = in[offset + src_idx];
> >>+   sg_dma_address(sg) = sg_dma_address(src) + n*PAGE_SIZE;
> >>sg_dma_len(sg) = PAGE_SIZE;
> >>-   sg = sg_next(sg);
> >>-   src_idx -= stride;
> >
> >I'm not sure why moving this line, might as well hoist all these to the
> >for() line.
> >
> >>+   sg = __sg_next(sg);
> >>+
> >>+   st->nents++;
> >>}
> >>}
> >>
> >>@@ -3074,62 +3079,30 @@ static noinline struct sg_table *
> >> intel_rotate_pages(struct intel_rotation_info *rot_info,
> >>   struct drm_i915_gem_object *obj)
> >> {
> >>-   const unsigned long n_pages = obj->base.size / PAGE_SIZE;
> >>-   unsigned int size = intel_rotation_info_size(rot_info);
> >>-   struct sgt_iter sgt_iter;
> >>-   dma_addr_t dma_addr;
> >>-   unsigned long i;
> >>-   dma_addr_t *page_addr_list;
> >>-   struct sg_table *st;
> >>+   const unsigned int size = intel_rotation_info_size(rot_info);
> >
> >This is only used once, just inline it.
> >
> >Reviewed-by: Joonas Lahtinen 
> >
> >Could use an A-b from Tvrtko.
> 
> I did not like it in another thread, well, better say I was
> concerned about the increased memory use by the radix tree which
> would then stick around for the obj->pages lifetime (long time for a
> framebuffer I thought). While the temporary array allocations here
> are not that big and very temporary.
> 
> I guess someone needs to bite the bullet and try and figure out how
> exactly big is the radix tree for some mixes of more or less
> coalesced sg entries.

I also think that's an argument for improving the general cache rather
than arguing against using it.
-Chris
> 

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-22 Thread Tvrtko Ursulin



On 21/02/2017 15:01, Joonas Lahtinen wrote:

On pe, 2017-02-17 at 15:10 +, Chris Wilson wrote:

The object already stores (computed on the fly) the index to dma address
so use it instead of reallocating a large temporary array every time we
bind a rotated framebuffer.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 





+rotate_pages(struct drm_i915_gem_object *obj,
+const struct intel_rotation_plane_info *p,
 struct sg_table *st, struct scatterlist *sg)
 {
unsigned int column, row;
-   unsigned int src_idx;

-   for (column = 0; column < width; column++) {
-   src_idx = stride * (height - 1) + column;
-   for (row = 0; row < height; row++) {
-   st->nents++;
+   for (column = 0; column < p->width; column++) {
+   unsigned long src_idx =
+   p->stride * (p->height - 1) + column + p->offset;
+   for (row = 0; row < p->height; row++) {
+   struct scatterlist *src;
+   unsigned int n;
+
+   src = i915_gem_object_get_sg(obj, src_idx, );


i915_gem_object_get_sg has variable names obj, n, *offset, so I'd be
little concerned of sidetracking reader. Rename n into offset?


+   src_idx -= p->stride;
+
/* We don't need the pages, but need to initialize
 * the entries so the sg list can be happily traversed.
 * The only thing we need are DMA addresses.
 */
sg_set_page(sg, NULL, PAGE_SIZE, 0);
-   sg_dma_address(sg) = in[offset + src_idx];
+   sg_dma_address(sg) = sg_dma_address(src) + n*PAGE_SIZE;
sg_dma_len(sg) = PAGE_SIZE;
-   sg = sg_next(sg);
-   src_idx -= stride;


I'm not sure why moving this line, might as well hoist all these to the
for() line.


+   sg = __sg_next(sg);
+
+   st->nents++;
}
}

@@ -3074,62 +3079,30 @@ static noinline struct sg_table *
 intel_rotate_pages(struct intel_rotation_info *rot_info,
   struct drm_i915_gem_object *obj)
 {
-   const unsigned long n_pages = obj->base.size / PAGE_SIZE;
-   unsigned int size = intel_rotation_info_size(rot_info);
-   struct sgt_iter sgt_iter;
-   dma_addr_t dma_addr;
-   unsigned long i;
-   dma_addr_t *page_addr_list;
-   struct sg_table *st;
+   const unsigned int size = intel_rotation_info_size(rot_info);


This is only used once, just inline it.

Reviewed-by: Joonas Lahtinen 

Could use an A-b from Tvrtko.


I did not like it in another thread, well, better say I was concerned 
about the increased memory use by the radix tree which would then stick 
around for the obj->pages lifetime (long time for a framebuffer I 
thought). While the temporary array allocations here are not that big 
and very temporary.


I guess someone needs to bite the bullet and try and figure out how 
exactly big is the radix tree for some mixes of more or less coalesced 
sg entries.


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-21 Thread Joonas Lahtinen

On pe, 2017-02-17 at 15:10 +, Chris Wilson wrote:
> The object already stores (computed on the fly) the index to dma address
> so use it instead of reallocating a large temporary array every time we
> bind a rotated framebuffer.
> 
> Signed-off-by: Chris Wilson 
> Cc: Matthew Auld 
> Cc: Joonas Lahtinen 
> Cc: Tvrtko Ursulin 



> +rotate_pages(struct drm_i915_gem_object *obj,
> +  const struct intel_rotation_plane_info *p,
>    struct sg_table *st, struct scatterlist *sg)
>  {
>   unsigned int column, row;
> - unsigned int src_idx;
>  
> - for (column = 0; column < width; column++) {
> - src_idx = stride * (height - 1) + column;
> - for (row = 0; row < height; row++) {
> - st->nents++;
> + for (column = 0; column < p->width; column++) {
> + unsigned long src_idx =
> + p->stride * (p->height - 1) + column + p->offset;
> + for (row = 0; row < p->height; row++) {
> + struct scatterlist *src;
> + unsigned int n;
> +
> + src = i915_gem_object_get_sg(obj, src_idx, );

i915_gem_object_get_sg has variable names obj, n, *offset, so I'd be
little concerned of sidetracking reader. Rename n into offset?

> + src_idx -= p->stride;
> +
>   /* We don't need the pages, but need to initialize
>    * the entries so the sg list can be happily traversed.
>    * The only thing we need are DMA addresses.
>    */
>   sg_set_page(sg, NULL, PAGE_SIZE, 0);
> - sg_dma_address(sg) = in[offset + src_idx];
> + sg_dma_address(sg) = sg_dma_address(src) + n*PAGE_SIZE;
>   sg_dma_len(sg) = PAGE_SIZE;
> - sg = sg_next(sg);
> - src_idx -= stride;

I'm not sure why moving this line, might as well hoist all these to the
for() line.

> + sg = __sg_next(sg);
> +
> + st->nents++;
>   }
>   }
>  
> @@ -3074,62 +3079,30 @@ static noinline struct sg_table *
>  intel_rotate_pages(struct intel_rotation_info *rot_info,
>      struct drm_i915_gem_object *obj)
>  {
> - const unsigned long n_pages = obj->base.size / PAGE_SIZE;
> - unsigned int size = intel_rotation_info_size(rot_info);
> - struct sgt_iter sgt_iter;
> - dma_addr_t dma_addr;
> - unsigned long i;
> - dma_addr_t *page_addr_list;
> - struct sg_table *st;
> + const unsigned int size = intel_rotation_info_size(rot_info);

This is only used once, just inline it.

Reviewed-by: Joonas Lahtinen 

Could use an A-b from Tvrtko.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-17 Thread Chris Wilson

The object already stores (computed on the fly) the index to dma address
so use it instead of reallocating a large temporary array every time we
bind a rotated framebuffer.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 77 -
 1 file changed, 25 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 47a38272f54c..848dbb926fd1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3043,27 +3043,32 @@ void i915_gem_restore_gtt_mappings(struct 
drm_i915_private *dev_priv)
 }
 
 static struct scatterlist *
-rotate_pages(const dma_addr_t *in, unsigned int offset,
-unsigned int width, unsigned int height,
-unsigned int stride,
+rotate_pages(struct drm_i915_gem_object *obj,
+const struct intel_rotation_plane_info *p,
 struct sg_table *st, struct scatterlist *sg)
 {
unsigned int column, row;
-   unsigned int src_idx;
 
-   for (column = 0; column < width; column++) {
-   src_idx = stride * (height - 1) + column;
-   for (row = 0; row < height; row++) {
-   st->nents++;
+   for (column = 0; column < p->width; column++) {
+   unsigned long src_idx =
+   p->stride * (p->height - 1) + column + p->offset;
+   for (row = 0; row < p->height; row++) {
+   struct scatterlist *src;
+   unsigned int n;
+
+   src = i915_gem_object_get_sg(obj, src_idx, );
+   src_idx -= p->stride;
+
/* We don't need the pages, but need to initialize
 * the entries so the sg list can be happily traversed.
 * The only thing we need are DMA addresses.
 */
sg_set_page(sg, NULL, PAGE_SIZE, 0);
-   sg_dma_address(sg) = in[offset + src_idx];
+   sg_dma_address(sg) = sg_dma_address(src) + n*PAGE_SIZE;
sg_dma_len(sg) = PAGE_SIZE;
-   sg = sg_next(sg);
-   src_idx -= stride;
+   sg = __sg_next(sg);
+
+   st->nents++;
}
}
 
@@ -3074,62 +3079,30 @@ static noinline struct sg_table *
 intel_rotate_pages(struct intel_rotation_info *rot_info,
   struct drm_i915_gem_object *obj)
 {
-   const unsigned long n_pages = obj->base.size / PAGE_SIZE;
-   unsigned int size = intel_rotation_info_size(rot_info);
-   struct sgt_iter sgt_iter;
-   dma_addr_t dma_addr;
-   unsigned long i;
-   dma_addr_t *page_addr_list;
-   struct sg_table *st;
+   const unsigned int size = intel_rotation_info_size(rot_info);
struct scatterlist *sg;
+   struct sg_table *st;
+   unsigned long i;
int ret = -ENOMEM;
 
-   /* Allocate a temporary list of source pages for random access. */
-   page_addr_list = drm_malloc_gfp(n_pages,
-   sizeof(dma_addr_t),
-   GFP_TEMPORARY);
-   if (!page_addr_list)
-   return ERR_PTR(ret);
-
-   /* Allocate target SG list. */
st = kmalloc(sizeof(*st), GFP_KERNEL);
if (!st)
-   goto err_st_alloc;
+   goto err;
 
ret = sg_alloc_table(st, size, GFP_KERNEL);
if (ret)
-   goto err_sg_alloc;
-
-   /* Populate source page list from the object. */
-   i = 0;
-   for_each_sgt_dma(dma_addr, sgt_iter, obj->mm.pages)
-   page_addr_list[i++] = dma_addr;
+   goto err;
 
-   GEM_BUG_ON(i != n_pages);
st->nents = 0;
sg = st->sgl;
-
-   for (i = 0 ; i < ARRAY_SIZE(rot_info->plane); i++) {
-   sg = rotate_pages(page_addr_list, rot_info->plane[i].offset,
- rot_info->plane[i].width, 
rot_info->plane[i].height,
- rot_info->plane[i].stride, st, sg);
-   }
-
-   DRM_DEBUG_KMS("Created rotated page mapping for object size %zu (%ux%u 
tiles, %u pages)\n",
- obj->base.size, rot_info->plane[0].width, 
rot_info->plane[0].height, size);
-
-   drm_free_large(page_addr_list);
+   for (i = 0 ; i < ARRAY_SIZE(rot_info->plane); i++)
+   sg = rotate_pages(obj, _info->plane[i], st, sg);
+   GEM_BUG_ON(st->nents != size);
 
return st;
 
-err_sg_alloc:
+err:
kfree(st);
-err_st_alloc:
-   drm_free_large(page_addr_list);
-
-   DRM_DEBUG_KMS("Failed to create rotated mapping for object size %zu!

[Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

2017-02-17 Thread Chris Wilson

The object already stores (computed on the fly) the index to dma address
so use it instead of reallocating a large temporary array every time we
bind a rotated framebuffer.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 77 -
 1 file changed, 25 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 47a38272f54c..848dbb926fd1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3043,27 +3043,32 @@ void i915_gem_restore_gtt_mappings(struct 
drm_i915_private *dev_priv)
 }
 
 static struct scatterlist *
-rotate_pages(const dma_addr_t *in, unsigned int offset,
-unsigned int width, unsigned int height,
-unsigned int stride,
+rotate_pages(struct drm_i915_gem_object *obj,
+const struct intel_rotation_plane_info *p,
 struct sg_table *st, struct scatterlist *sg)
 {
unsigned int column, row;
-   unsigned int src_idx;
 
-   for (column = 0; column < width; column++) {
-   src_idx = stride * (height - 1) + column;
-   for (row = 0; row < height; row++) {
-   st->nents++;
+   for (column = 0; column < p->width; column++) {
+   unsigned long src_idx =
+   p->stride * (p->height - 1) + column + p->offset;
+   for (row = 0; row < p->height; row++) {
+   struct scatterlist *src;
+   unsigned int n;
+
+   src = i915_gem_object_get_sg(obj, src_idx, );
+   src_idx -= p->stride;
+
/* We don't need the pages, but need to initialize
 * the entries so the sg list can be happily traversed.
 * The only thing we need are DMA addresses.
 */
sg_set_page(sg, NULL, PAGE_SIZE, 0);
-   sg_dma_address(sg) = in[offset + src_idx];
+   sg_dma_address(sg) = sg_dma_address(src) + n*PAGE_SIZE;
sg_dma_len(sg) = PAGE_SIZE;
-   sg = sg_next(sg);
-   src_idx -= stride;
+   sg = __sg_next(sg);
+
+   st->nents++;
}
}
 
@@ -3074,62 +3079,30 @@ static noinline struct sg_table *
 intel_rotate_pages(struct intel_rotation_info *rot_info,
   struct drm_i915_gem_object *obj)
 {
-   const unsigned long n_pages = obj->base.size / PAGE_SIZE;
-   unsigned int size = intel_rotation_info_size(rot_info);
-   struct sgt_iter sgt_iter;
-   dma_addr_t dma_addr;
-   unsigned long i;
-   dma_addr_t *page_addr_list;
-   struct sg_table *st;
+   const unsigned int size = intel_rotation_info_size(rot_info);
struct scatterlist *sg;
+   struct sg_table *st;
+   unsigned long i;
int ret = -ENOMEM;
 
-   /* Allocate a temporary list of source pages for random access. */
-   page_addr_list = drm_malloc_gfp(n_pages,
-   sizeof(dma_addr_t),
-   GFP_TEMPORARY);
-   if (!page_addr_list)
-   return ERR_PTR(ret);
-
-   /* Allocate target SG list. */
st = kmalloc(sizeof(*st), GFP_KERNEL);
if (!st)
-   goto err_st_alloc;
+   goto err;
 
ret = sg_alloc_table(st, size, GFP_KERNEL);
if (ret)
-   goto err_sg_alloc;
-
-   /* Populate source page list from the object. */
-   i = 0;
-   for_each_sgt_dma(dma_addr, sgt_iter, obj->mm.pages)
-   page_addr_list[i++] = dma_addr;
+   goto err;
 
-   GEM_BUG_ON(i != n_pages);
st->nents = 0;
sg = st->sgl;
-
-   for (i = 0 ; i < ARRAY_SIZE(rot_info->plane); i++) {
-   sg = rotate_pages(page_addr_list, rot_info->plane[i].offset,
- rot_info->plane[i].width, 
rot_info->plane[i].height,
- rot_info->plane[i].stride, st, sg);
-   }
-
-   DRM_DEBUG_KMS("Created rotated page mapping for object size %zu (%ux%u 
tiles, %u pages)\n",
- obj->base.size, rot_info->plane[0].width, 
rot_info->plane[0].height, size);
-
-   drm_free_large(page_addr_list);
+   for (i = 0 ; i < ARRAY_SIZE(rot_info->plane); i++)
+   sg = rotate_pages(obj, _info->plane[i], st, sg);
+   GEM_BUG_ON(st->nents != size);
 
return st;
 
-err_sg_alloc:
+err:
kfree(st);
-err_st_alloc:
-   drm_free_large(page_addr_list);
-
-   DRM_DEBUG_KMS("Failed to create rotated mapping for object size %zu!

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

Re: [Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

[Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

[Intel-gfx] [PATCH] drm/i915: Remove temporary allocation of dma addresses when rotating

14 matches

Site Navigation

Mail list logo

Footer information