Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-11-02 Thread Muthukumar, Aravindan
> > >On 10/24/2017 02:42 AM, Muthukumar, Aravindan wrote:
> > >>> -Original Message-
> > >>> From: Ian Romanick [mailto:i...@freedesktop.org]
> > >>> Sent: Friday, October 20, 2017 9:51 AM
> > >>> To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; mesa-
> > >>> d...@lists.freedesktop.org
> > >>> Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; Tamminen, Eero T
> > >>> <eero.t.tammi...@intel.com>
> > >>> Subject: Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index
> > >>> calculation
> > >>>
> > >>> On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote:
> > >>>> From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> > >>>>
> > >>>> Avoiding the loop which was running with O(n) complexity.
> > >>>> Now the complexity has been reduced to O(1)
> > >>>>
> > >>>> Algorithm calculates the index using matrix method.
> > >>>> Matrix arrangement is as below:
> > >>>> Assuming PAGE_SIZE is 4096.
> > >>>>
> > >>>>  1*4096   2*40963*40964*4096
> > >>>>  5*4096   6*40967*40968*4096
> > >>>
> > >>> I think adding one more row to this chart would make it more clear.
> > >>> The two rows shown also follow a simpler pattern, and that made
> > >>> some of the complexity below seem confusing.
> > >>>
> > >>>10*4096  12*4096   14*4096   16*4096
> > >>>
> > >>>>   ...  ...   ...   ...
> > >>>>   ...  ...   ...   ...
> > >>>>   ...  ...   ...   max_cache_size
> > >>>>
> > >>>> From this matrix its cleary seen that every row
> > >>>clearly
> > >>>
> > >>>> follows the below way:
> > >>>>  ...   ...   ...n
> > >>>>n+(1/4)n  n+(1/2)n  n+(3/4)n2n
> > >>>>
> > >>>> Row is calulated as log2(size/PAGE_SIZE) Column is calculated as
> > >>>> converting the difference between the elements to fit into power
> > >>>> size of two and indexing it.
> > >>>>
> > >>>> Final Index is (row*4)+(col-1)
> > >>>>
> > >>>> Tested with Intel Mesa CI.
> > >>>>
> > >>>> Improves performance of 3d Mark on Broxton.
> > >>>> Analyzed using Compare Perf Analyser:
> > >>>> Average : 201.2 +/- 65.4836 (n=20)
> > >>>
> > >>> Is 201 the improvement or the absolute score?  Do not quote
> > >>> absolute
> > scores.
> > >>>
> > >>>> Percentage : 0.705966% +/- 0.229767% (n=20)
> > >>>
> > >>> Eero: Can you reproduce this result on BXT or other platforms?
> > >>> Just
> > curious...
> > >>>
> > >>>> v2: Review comments regarding cosmetics and asserts implemented
> > >>>>
> > >>>> Signed-off-by: Aravindan Muthukumar
> > >>>> <aravindan.muthuku...@intel.com>
> > >>>> Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com>
> > >>>> Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com>
> > >>>> ---
> > >>>>  src/mesa/drivers/dri/i965/brw_bufmgr.c | 46
> > >>>> --
> > >>>>  1 file changed, 39 insertions(+), 7 deletions(-)
> > >>>>
> > >>>> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >>>> b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >>>> index 8017219..8013ccb 100644
> > >>>> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >>>> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >>>> @@ -87,6 +87,8 @@
> > >>>>
> > >>>>  #define memclear(s) memset(, 0, sizeof(s))
> > >>>>
> > >>>> +#define PAGE_SIZE 4096
> > >>>> +
> > >>>>  #define FILE_DEBUG_FLAG DEBUG_BUFMGR
> > >>>>
> > >>>>  static inline int
> > >>>> @@ -181,19 +183,45 @@ bo_tile

Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-10-27 Thread Ian Romanick
On 10/24/2017 02:42 AM, Muthukumar, Aravindan wrote:
>> -Original Message-
>> From: Ian Romanick [mailto:i...@freedesktop.org]
>> Sent: Friday, October 20, 2017 9:51 AM
>> To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; mesa-
>> d...@lists.freedesktop.org
>> Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; Tamminen, Eero T
>> <eero.t.tammi...@intel.com>
>> Subject: Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
>>
>> On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote:
>>> From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
>>>
>>> Avoiding the loop which was running with O(n) complexity.
>>> Now the complexity has been reduced to O(1)
>>>
>>> Algorithm calculates the index using matrix method.
>>> Matrix arrangement is as below:
>>> Assuming PAGE_SIZE is 4096.
>>>
>>>  1*4096   2*40963*40964*4096
>>>  5*4096   6*40967*40968*4096
>>
>> I think adding one more row to this chart would make it more clear.  The two
>> rows shown also follow a simpler pattern, and that made some of the
>> complexity below seem confusing.
>>
>>10*4096  12*4096   14*4096   16*4096
>>
>>>   ...  ...   ...   ...
>>>   ...  ...   ...   ...
>>>   ...  ...   ...   max_cache_size
>>>
>>> From this matrix its cleary seen that every row
>>clearly
>>
>>> follows the below way:
>>>  ...   ...   ...n
>>>n+(1/4)n  n+(1/2)n  n+(3/4)n2n
>>>
>>> Row is calulated as log2(size/PAGE_SIZE) Column is calculated as
>>> converting the difference between the elements to fit into power size
>>> of two and indexing it.
>>>
>>> Final Index is (row*4)+(col-1)
>>>
>>> Tested with Intel Mesa CI.
>>>
>>> Improves performance of 3d Mark on Broxton.
>>> Analyzed using Compare Perf Analyser:
>>> Average : 201.2 +/- 65.4836 (n=20)
>>
>> Is 201 the improvement or the absolute score?  Do not quote absolute scores.
>>
>>> Percentage : 0.705966% +/- 0.229767% (n=20)
>>
>> Eero: Can you reproduce this result on BXT or other platforms?  Just 
>> curious...
>>
>>> v2: Review comments regarding cosmetics and asserts implemented
>>>
>>> Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
>>> Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com>
>>> Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_bufmgr.c | 46
>>> --
>>>  1 file changed, 39 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c
>>> b/src/mesa/drivers/dri/i965/brw_bufmgr.c
>>> index 8017219..8013ccb 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
>>> @@ -87,6 +87,8 @@
>>>
>>>  #define memclear(s) memset(, 0, sizeof(s))
>>>
>>> +#define PAGE_SIZE 4096
>>> +
>>>  #define FILE_DEBUG_FLAG DEBUG_BUFMGR
>>>
>>>  static inline int
>>> @@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t
>> pitch, uint32_t tiling)
>>> return ALIGN(pitch, tile_width);
>>>  }
>>>
>>> +static inline int
>>> +ilog2_round_up(int value)
>>> +{
>>> +   assert(value != 0);
>>> +   return 32 - __builtin_clz(value - 1); }
>>> +
>>> +/*
>>> + * This function finds the correct bucket fit for the input size.
>>> + * The function works with O(1) complexity when the requested size
>>> + * was queried instead of iterating the size through all the buckets.
>>> + */
>>>  static struct bo_cache_bucket *
>>>  bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)  {
>>> -   int i;
>>> +   int index = -1;
>>
>> I don't see how execution could get to that test without index being set 
>> again,
>> so it should be safe to remove the initialization.
>>
> 
> We will skip this initialization since it doesn’t impact the index result.
> 
>   
>>> +   int row, col = 0;
>>> +   int pages, pages_log2;
>>
>> Move the declarations of row, col, pages, and pages_log2 into the else case,
>>

Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-10-24 Thread Muthukumar, Aravindan
> -Original Message-
> From: Ian Romanick [mailto:i...@freedesktop.org]
> Sent: Friday, October 20, 2017 9:51 AM
> To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; mesa-
> d...@lists.freedesktop.org
> Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; Tamminen, Eero T
> <eero.t.tammi...@intel.com>
> Subject: Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
> 
> On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote:
> > From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> >
> > Avoiding the loop which was running with O(n) complexity.
> > Now the complexity has been reduced to O(1)
> >
> > Algorithm calculates the index using matrix method.
> > Matrix arrangement is as below:
> > Assuming PAGE_SIZE is 4096.
> >
> >  1*4096   2*40963*40964*4096
> >  5*4096   6*40967*40968*4096
> 
> I think adding one more row to this chart would make it more clear.  The two
> rows shown also follow a simpler pattern, and that made some of the
> complexity below seem confusing.
> 
>10*4096  12*4096   14*4096   16*4096
> 
> >   ...  ...   ...   ...
> >   ...  ...   ...   ...
> >   ...  ...   ...   max_cache_size
> >
> > From this matrix its cleary seen that every row
>clearly
> 
> > follows the below way:
> >  ...   ...   ...n
> >n+(1/4)n  n+(1/2)n  n+(3/4)n2n
> >
> > Row is calulated as log2(size/PAGE_SIZE) Column is calculated as
> > converting the difference between the elements to fit into power size
> > of two and indexing it.
> >
> > Final Index is (row*4)+(col-1)
> >
> > Tested with Intel Mesa CI.
> >
> > Improves performance of 3d Mark on Broxton.
> > Analyzed using Compare Perf Analyser:
> > Average : 201.2 +/- 65.4836 (n=20)
> 
> Is 201 the improvement or the absolute score?  Do not quote absolute scores.
> 
> > Percentage : 0.705966% +/- 0.229767% (n=20)
> 
> Eero: Can you reproduce this result on BXT or other platforms?  Just 
> curious...
> 
> > v2: Review comments regarding cosmetics and asserts implemented
> >
> > Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> > Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com>
> > Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_bufmgr.c | 46
> > --
> >  1 file changed, 39 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > index 8017219..8013ccb 100644
> > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > @@ -87,6 +87,8 @@
> >
> >  #define memclear(s) memset(, 0, sizeof(s))
> >
> > +#define PAGE_SIZE 4096
> > +
> >  #define FILE_DEBUG_FLAG DEBUG_BUFMGR
> >
> >  static inline int
> > @@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t
> pitch, uint32_t tiling)
> > return ALIGN(pitch, tile_width);
> >  }
> >
> > +static inline int
> > +ilog2_round_up(int value)
> > +{
> > +   assert(value != 0);
> > +   return 32 - __builtin_clz(value - 1); }
> > +
> > +/*
> > + * This function finds the correct bucket fit for the input size.
> > + * The function works with O(1) complexity when the requested size
> > + * was queried instead of iterating the size through all the buckets.
> > + */
> >  static struct bo_cache_bucket *
> >  bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)  {
> > -   int i;
> > +   int index = -1;
> 
> I don't see how execution could get to that test without index being set 
> again,
> so it should be safe to remove the initialization.
> 

We will skip this initialization since it doesn’t impact the index result.

  
> > +   int row, col = 0;
> > +   int pages, pages_log2;
> 
> Move the declarations of row, col, pages, and pages_log2 into the else case,
> initialize them with the only assignment to them, and make them const.
> 
> Since all of these values, including index, get values derived from size, I 
> believe
> the should all be unsigned.  In that case, remove the index >= 0 test below.
> 

We will move as well as initialize the above variables in the respective else 
part. 

> >
> > -   for (i = 0; i < bufmgr->num_buckets; i++) {
&

Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-10-22 Thread Marathe, Yogesh
Ian, 

Rest all review comments noted, we'll get back, Thanks.

>-Original Message-
>From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of
>Ian Romanick
>Sent: Friday, October 20, 2017 9:51 AM
>To: Muthukumar, Aravindan ; mesa-
>> Average : 201.2 +/- 65.4836 (n=20)
>
>Is 201 the improvement or the absolute score?  Do not quote absolute scores.
>

No, this is not an absolute score this is the difference patch made and the
deviation reported in requested format.

-Yogesh.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-10-19 Thread Ian Romanick
On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote:
> From: Aravindan Muthukumar 
> 
> Avoiding the loop which was running with O(n) complexity.
> Now the complexity has been reduced to O(1)
> 
> Algorithm calculates the index using matrix method.
> Matrix arrangement is as below:
> Assuming PAGE_SIZE is 4096.
> 
>  1*4096   2*40963*40964*4096
>  5*4096   6*40967*40968*4096

I think adding one more row to this chart would make it more clear.  The
two rows shown also follow a simpler pattern, and that made some of the
complexity below seem confusing.

   10*4096  12*4096   14*4096   16*4096

>   ...  ...   ...   ...
>   ...  ...   ...   ...
>   ...  ...   ...   max_cache_size
> 
> From this matrix its cleary seen that every row
   clearly

> follows the below way:
>  ...   ...   ...n
>n+(1/4)n  n+(1/2)n  n+(3/4)n2n
> 
> Row is calulated as log2(size/PAGE_SIZE)
> Column is calculated as converting the difference
> between the elements to fit into power size of two
> and indexing it.
> 
> Final Index is (row*4)+(col-1)
> 
> Tested with Intel Mesa CI.
> 
> Improves performance of 3d Mark on Broxton.
> Analyzed using Compare Perf Analyser:
> Average : 201.2 +/- 65.4836 (n=20)

Is 201 the improvement or the absolute score?  Do not quote absolute scores.

> Percentage : 0.705966% +/- 0.229767% (n=20)

Eero: Can you reproduce this result on BXT or other platforms?  Just
curious...

> v2: Review comments regarding cosmetics and asserts implemented
> 
> Signed-off-by: Aravindan Muthukumar 
> Signed-off-by: Kedar Karanje 
> Reviewed-by: Yogesh Marathe 
> ---
>  src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 
> --
>  1 file changed, 39 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c 
> b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> index 8017219..8013ccb 100644
> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> @@ -87,6 +87,8 @@
>  
>  #define memclear(s) memset(, 0, sizeof(s))
>  
> +#define PAGE_SIZE 4096
> +
>  #define FILE_DEBUG_FLAG DEBUG_BUFMGR
>  
>  static inline int
> @@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t 
> pitch, uint32_t tiling)
> return ALIGN(pitch, tile_width);
>  }
>  
> +static inline int
> +ilog2_round_up(int value)
> +{
> +   assert(value != 0);
> +   return 32 - __builtin_clz(value - 1);
> +}
> +
> +/*
> + * This function finds the correct bucket fit for the input size.
> + * The function works with O(1) complexity when the requested size
> + * was queried instead of iterating the size through all the buckets.
> + */
>  static struct bo_cache_bucket *
>  bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)
>  {
> -   int i;
> +   int index = -1;

I don't see how execution could get to that test without index being set
again, so it should be safe to remove the initialization.

> +   int row, col = 0;
> +   int pages, pages_log2;

Move the declarations of row, col, pages, and pages_log2 into the else
case, initialize them with the only assignment to them, and make them const.

Since all of these values, including index, get values derived from
size, I believe the should all be unsigned.  In that case, remove the
index >= 0 test below.

>  
> -   for (i = 0; i < bufmgr->num_buckets; i++) {
> -  struct bo_cache_bucket *bucket = >cache_bucket[i];
> -  if (bucket->size >= size) {
> - return bucket;
> -  }
> +   /* condition for size less  than 4*4096 (16KB) page size */
 ^   ^ Remove one space here.
 |
 Capitalize and add a period at the end of the sentence.

> +   if(size <= 4 * PAGE_SIZE) {
^ Add a space here.

> +  index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;;
  Remove extra trailing semicolon. ^

> +   } else {
> +  /* Number of pages of page size */
> +  pages = DIV_ROUND_UP(size, PAGE_SIZE);
> +  pages_log2 = ilog2_round_up(pages) - 1;

Isn't this going to be the same result as _mesa_logbase2(pages)?

> +
> +  /* Finding the row and column of the matrix */
> +  row = pages_log2 - 1;
> +  col = DIV_ROUND_UP((pages - (1 << pages_log2)),
> +(1 << (pages_log2 - 2)));
   ^^
   This should  |
   be aligned   |
   here. ---+

Removing some of the extra parenthesis would also make it easier to read.

So... I feel like this function doesn't need to special case size < 16k
like this.  I haven't benchmarked it, but the following is about 30
bytes shorter and avoids the conditional branch.  I also think it's
easier to understand.

   const unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE);

   /* Row  

Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-10-16 Thread Muthukumar, Aravindan
Hi Jason,
Please review and let us know if any other changes need to be done for 
this patch.
We have incorporated all the review comments in the second version of the patch.

Thanks,
Aravindan

> -Original Message-
> From: Muthukumar, Aravindan
> Sent: Tuesday, October 3, 2017 10:04 AM
> To: Marathe, Yogesh <yogesh.mara...@intel.com>; Ekstrand, Jason
> <jason.ekstr...@intel.com>; Palli, Tapani <tapani.pa...@intel.com>; Ian
> Romanick <i...@freedesktop.org>; Emil Velikov <emil.l.veli...@gmail.com>
> Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; mesa-
> d...@lists.freedesktop.org
> Subject: RE: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
> 
> Hi Reviewers,
>   Please review and provide the comments on the second version of the
> patch.
> 
> Thanks,
> Aravindan
> 
> > -Original Message-
> > From: Marathe, Yogesh
> > Sent: Friday, September 22, 2017 8:41 AM
> > To: Ekstrand, Jason <jason.ekstr...@intel.com>; Palli, Tapani
> > <tapani.pa...@intel.com>; Ian Romanick <i...@freedesktop.org>; Emil
> > Velikov <emil.l.veli...@gmail.com>
> > Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje,
> > Kedar <kedar.j.kara...@intel.com>; mesa-dev@lists.freedesktop.org
> > Subject: RE: [Mesa-dev] [PATCH v2] i965 : optimized bucket index
> > calculation
> >
> > + all v1 reviewers. Does this look ok?
> >
> > -Yogesh.
> >
> > >-Original Message-
> > >From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> > >Behalf Of aravindan.muthuku...@intel.com
> > >Sent: Thursday, September 14, 2017 12:13 PM
> > >To: mesa-dev@lists.freedesktop.org
> > >Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J
> > >Karanje, Kedar <kedar.j.kara...@intel.com>
> > >Subject: [Mesa-dev] [PATCH v2] i965 : optimized bucket index
> > >calculation
> > >
> > >From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> > >
> > >Avoiding the loop which was running with O(n) complexity.
> > >Now the complexity has been reduced to O(1)
> > >
> > >Algorithm calculates the index using matrix method.
> > >Matrix arrangement is as below:
> > >Assuming PAGE_SIZE is 4096.
> > >
> > > 1*4096   2*40963*40964*4096
> > > 5*4096   6*40967*40968*4096
> > >  ...  ...   ...   ...
> > >  ...  ...   ...   ...
> > >  ...  ...   ...   max_cache_size
> > >
> > >From this matrix its cleary seen that every row follows the below way:
> > > ...   ...   ...n
> > >   n+(1/4)n  n+(1/2)n  n+(3/4)n2n
> > >
> > >Row is calulated as log2(size/PAGE_SIZE) Column is calculated as
> > >converting the difference between the elements to fit into power size
> > >of two
> > and indexing it.
> > >
> > >Final Index is (row*4)+(col-1)
> > >
> > >Tested with Intel Mesa CI.
> > >
> > >Improves performance of 3d Mark on Broxton.
> > >Analyzed using Compare Perf Analyser:
> > >Average : 201.2 +/- 65.4836 (n=20)
> > >Percentage : 0.705966% +/- 0.229767% (n=20)
> > >
> > >v2: Review comments regarding cosmetics and asserts implemented
> > >
> > >Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> > >Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com>
> > >Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com>
> > >---
> > > src/mesa/drivers/dri/i965/brw_bufmgr.c | 46
> > >--
> > > 1 file changed, 39 insertions(+), 7 deletions(-)
> > >
> > >diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >index 8017219..8013ccb 100644
> > >--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > >@@ -87,6 +87,8 @@
> > >
> > > #define memclear(s) memset(, 0, sizeof(s))
> > >
> > >+#define PAGE_SIZE 4096
> > >+
> > > #define FILE_DEBUG_FLAG DEBUG_BUFMGR
> > >
> > > static inline int
> > >@@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr,
> > >uint32_t pitch, uint32_t tiling)
> > >return ALIGN(pitch, tile_width);
> > > }
> > >
> > >+static inline 

Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-10-02 Thread Muthukumar, Aravindan
Hi Reviewers,
Please review and provide the comments on the second version of the 
patch.

Thanks,
Aravindan

> -Original Message-
> From: Marathe, Yogesh
> Sent: Friday, September 22, 2017 8:41 AM
> To: Ekstrand, Jason <jason.ekstr...@intel.com>; Palli, Tapani
> <tapani.pa...@intel.com>; Ian Romanick <i...@freedesktop.org>; Emil Velikov
> <emil.l.veli...@gmail.com>
> Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje,
> Kedar <kedar.j.kara...@intel.com>; mesa-dev@lists.freedesktop.org
> Subject: RE: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
> 
> + all v1 reviewers. Does this look ok?
> 
> -Yogesh.
> 
> >-Original Message-
> >From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> >Behalf Of aravindan.muthuku...@intel.com
> >Sent: Thursday, September 14, 2017 12:13 PM
> >To: mesa-dev@lists.freedesktop.org
> >Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje,
> >Kedar <kedar.j.kara...@intel.com>
> >Subject: [Mesa-dev] [PATCH v2] i965 : optimized bucket index
> >calculation
> >
> >From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> >
> >Avoiding the loop which was running with O(n) complexity.
> >Now the complexity has been reduced to O(1)
> >
> >Algorithm calculates the index using matrix method.
> >Matrix arrangement is as below:
> >Assuming PAGE_SIZE is 4096.
> >
> > 1*4096   2*40963*40964*4096
> > 5*4096   6*40967*40968*4096
> >  ...  ...   ...   ...
> >  ...  ...   ...   ...
> >  ...  ...   ...   max_cache_size
> >
> >From this matrix its cleary seen that every row follows the below way:
> > ...   ...   ...n
> >   n+(1/4)n  n+(1/2)n  n+(3/4)n2n
> >
> >Row is calulated as log2(size/PAGE_SIZE) Column is calculated as
> >converting the difference between the elements to fit into power size of two
> and indexing it.
> >
> >Final Index is (row*4)+(col-1)
> >
> >Tested with Intel Mesa CI.
> >
> >Improves performance of 3d Mark on Broxton.
> >Analyzed using Compare Perf Analyser:
> >Average : 201.2 +/- 65.4836 (n=20)
> >Percentage : 0.705966% +/- 0.229767% (n=20)
> >
> >v2: Review comments regarding cosmetics and asserts implemented
> >
> >Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> >Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com>
> >Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com>
> >---
> > src/mesa/drivers/dri/i965/brw_bufmgr.c | 46
> >--
> > 1 file changed, 39 insertions(+), 7 deletions(-)
> >
> >diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> >b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> >index 8017219..8013ccb 100644
> >--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> >+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> >@@ -87,6 +87,8 @@
> >
> > #define memclear(s) memset(, 0, sizeof(s))
> >
> >+#define PAGE_SIZE 4096
> >+
> > #define FILE_DEBUG_FLAG DEBUG_BUFMGR
> >
> > static inline int
> >@@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t
> >pitch, uint32_t tiling)
> >return ALIGN(pitch, tile_width);
> > }
> >
> >+static inline int
> >+ilog2_round_up(int value)
> >+{
> >+   assert(value != 0);
> >+   return 32 - __builtin_clz(value - 1); }
> >+
> >+/*
> >+ * This function finds the correct bucket fit for the input size.
> >+ * The function works with O(1) complexity when the requested size
> >+ * was queried instead of iterating the size through all the buckets.
> >+ */
> > static struct bo_cache_bucket *
> > bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)  {
> >-   int i;
> >+   int index = -1;
> >+   int row, col = 0;
> >+   int pages, pages_log2;
> >
> >-   for (i = 0; i < bufmgr->num_buckets; i++) {
> >-  struct bo_cache_bucket *bucket = >cache_bucket[i];
> >-  if (bucket->size >= size) {
> >- return bucket;
> >-  }
> >+   /* condition for size less  than 4*4096 (16KB) page size */
> >+   if(size <= 4 * PAGE_SIZE) {
> >+  index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;;
> >+   } else {
> >+  /* Number of pages of page size */
> >+  pages = DIV_ROUND_UP(size, PAGE_SIZE);
> >+  pages_log2 = ilog2_round_up(page

Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-09-21 Thread Marathe, Yogesh
+ all v1 reviewers. Does this look ok?

-Yogesh.

>-Original Message-
>From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of
>aravindan.muthuku...@intel.com
>Sent: Thursday, September 14, 2017 12:13 PM
>To: mesa-dev@lists.freedesktop.org
>Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje,
>Kedar <kedar.j.kara...@intel.com>
>Subject: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
>
>From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
>
>Avoiding the loop which was running with O(n) complexity.
>Now the complexity has been reduced to O(1)
>
>Algorithm calculates the index using matrix method.
>Matrix arrangement is as below:
>Assuming PAGE_SIZE is 4096.
>
> 1*4096   2*40963*40964*4096
> 5*4096   6*40967*40968*4096
>  ...  ...   ...   ...
>  ...  ...   ...   ...
>  ...  ...   ...   max_cache_size
>
>From this matrix its cleary seen that every row follows the below way:
> ...   ...   ...n
>   n+(1/4)n  n+(1/2)n  n+(3/4)n2n
>
>Row is calulated as log2(size/PAGE_SIZE) Column is calculated as converting the
>difference between the elements to fit into power size of two and indexing it.
>
>Final Index is (row*4)+(col-1)
>
>Tested with Intel Mesa CI.
>
>Improves performance of 3d Mark on Broxton.
>Analyzed using Compare Perf Analyser:
>Average : 201.2 +/- 65.4836 (n=20)
>Percentage : 0.705966% +/- 0.229767% (n=20)
>
>v2: Review comments regarding cosmetics and asserts implemented
>
>Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
>Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com>
>Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com>
>---
> src/mesa/drivers/dri/i965/brw_bufmgr.c | 46
>--
> 1 file changed, 39 insertions(+), 7 deletions(-)
>
>diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c
>b/src/mesa/drivers/dri/i965/brw_bufmgr.c
>index 8017219..8013ccb 100644
>--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
>+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
>@@ -87,6 +87,8 @@
>
> #define memclear(s) memset(, 0, sizeof(s))
>
>+#define PAGE_SIZE 4096
>+
> #define FILE_DEBUG_FLAG DEBUG_BUFMGR
>
> static inline int
>@@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t
>pitch, uint32_t tiling)
>return ALIGN(pitch, tile_width);
> }
>
>+static inline int
>+ilog2_round_up(int value)
>+{
>+   assert(value != 0);
>+   return 32 - __builtin_clz(value - 1); }
>+
>+/*
>+ * This function finds the correct bucket fit for the input size.
>+ * The function works with O(1) complexity when the requested size
>+ * was queried instead of iterating the size through all the buckets.
>+ */
> static struct bo_cache_bucket *
> bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)  {
>-   int i;
>+   int index = -1;
>+   int row, col = 0;
>+   int pages, pages_log2;
>
>-   for (i = 0; i < bufmgr->num_buckets; i++) {
>-  struct bo_cache_bucket *bucket = >cache_bucket[i];
>-  if (bucket->size >= size) {
>- return bucket;
>-  }
>+   /* condition for size less  than 4*4096 (16KB) page size */
>+   if(size <= 4 * PAGE_SIZE) {
>+  index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;;
>+   } else {
>+  /* Number of pages of page size */
>+  pages = DIV_ROUND_UP(size, PAGE_SIZE);
>+  pages_log2 = ilog2_round_up(pages) - 1;
>+
>+  /* Finding the row and column of the matrix */
>+  row = pages_log2 - 1;
>+  col = DIV_ROUND_UP((pages - (1 << pages_log2)),
>+(1 << (pages_log2 - 2)));
>+
>+  /* Using the calculated row and column to index into the matrix */
>+  index = (row << 2) + (col - 1);
>}
>
>-   return NULL;
>+   /* Checking the error condition */
>+   return (index >= 0 && index < bufmgr->num_buckets) ?
>+  (>cache_bucket[index]) : NULL;
> }
>
> int
>@@ -1239,6 +1267,10 @@ add_bucket(struct brw_bufmgr *bufmgr, int size)
>list_inithead(>cache_bucket[i].head);
>bufmgr->cache_bucket[i].size = size;
>bufmgr->num_buckets++;
>+
>+   assert(bucket_for_size(bufmgr, size) == >cache_bucket[i]);
>+   assert(bucket_for_size(bufmgr, size - 2048) == >cache_bucket[i]);
>+   assert(bucket_for_size(bufmgr, size + 1) !=
>+ >cache_bucket[i]);
> }
>
> static void
>--
>2.7.4
>
>___
>mesa-dev mailing list
>mesa-dev@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation

2017-09-14 Thread aravindan . muthukumar
From: Aravindan Muthukumar 

Avoiding the loop which was running with O(n) complexity.
Now the complexity has been reduced to O(1)

Algorithm calculates the index using matrix method.
Matrix arrangement is as below:
Assuming PAGE_SIZE is 4096.

 1*4096   2*40963*40964*4096
 5*4096   6*40967*40968*4096
  ...  ...   ...   ...
  ...  ...   ...   ...
  ...  ...   ...   max_cache_size

From this matrix its cleary seen that every row
follows the below way:
 ...   ...   ...n
   n+(1/4)n  n+(1/2)n  n+(3/4)n2n

Row is calulated as log2(size/PAGE_SIZE)
Column is calculated as converting the difference
between the elements to fit into power size of two
and indexing it.

Final Index is (row*4)+(col-1)

Tested with Intel Mesa CI.

Improves performance of 3d Mark on Broxton.
Analyzed using Compare Perf Analyser:
Average : 201.2 +/- 65.4836 (n=20)
Percentage : 0.705966% +/- 0.229767% (n=20)

v2: Review comments regarding cosmetics and asserts implemented

Signed-off-by: Aravindan Muthukumar 
Signed-off-by: Kedar Karanje 
Reviewed-by: Yogesh Marathe 
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 --
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c 
b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index 8017219..8013ccb 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -87,6 +87,8 @@
 
 #define memclear(s) memset(, 0, sizeof(s))
 
+#define PAGE_SIZE 4096
+
 #define FILE_DEBUG_FLAG DEBUG_BUFMGR
 
 static inline int
@@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t pitch, 
uint32_t tiling)
return ALIGN(pitch, tile_width);
 }
 
+static inline int
+ilog2_round_up(int value)
+{
+   assert(value != 0);
+   return 32 - __builtin_clz(value - 1);
+}
+
+/*
+ * This function finds the correct bucket fit for the input size.
+ * The function works with O(1) complexity when the requested size
+ * was queried instead of iterating the size through all the buckets.
+ */
 static struct bo_cache_bucket *
 bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)
 {
-   int i;
+   int index = -1;
+   int row, col = 0;
+   int pages, pages_log2;
 
-   for (i = 0; i < bufmgr->num_buckets; i++) {
-  struct bo_cache_bucket *bucket = >cache_bucket[i];
-  if (bucket->size >= size) {
- return bucket;
-  }
+   /* condition for size less  than 4*4096 (16KB) page size */
+   if(size <= 4 * PAGE_SIZE) {
+  index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;;
+   } else {
+  /* Number of pages of page size */
+  pages = DIV_ROUND_UP(size, PAGE_SIZE);
+  pages_log2 = ilog2_round_up(pages) - 1;
+
+  /* Finding the row and column of the matrix */
+  row = pages_log2 - 1;
+  col = DIV_ROUND_UP((pages - (1 << pages_log2)),
+(1 << (pages_log2 - 2)));
+
+  /* Using the calculated row and column to index into the matrix */
+  index = (row << 2) + (col - 1);
}
 
-   return NULL;
+   /* Checking the error condition */
+   return (index >= 0 && index < bufmgr->num_buckets) ?
+  (>cache_bucket[index]) : NULL;
 }
 
 int
@@ -1239,6 +1267,10 @@ add_bucket(struct brw_bufmgr *bufmgr, int size)
list_inithead(>cache_bucket[i].head);
bufmgr->cache_bucket[i].size = size;
bufmgr->num_buckets++;
+
+   assert(bucket_for_size(bufmgr, size) == >cache_bucket[i]);
+   assert(bucket_for_size(bufmgr, size - 2048) == >cache_bucket[i]);
+   assert(bucket_for_size(bufmgr, size + 1) != >cache_bucket[i]);
 }
 
 static void
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev