Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
> > >On 10/24/2017 02:42 AM, Muthukumar, Aravindan wrote: > > >>> -Original Message- > > >>> From: Ian Romanick [mailto:i...@freedesktop.org] > > >>> Sent: Friday, October 20, 2017 9:51 AM > > >>> To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; mesa- > > >>> d...@lists.freedesktop.org > > >>> Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; Tamminen, Eero T > > >>> <eero.t.tammi...@intel.com> > > >>> Subject: Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index > > >>> calculation > > >>> > > >>> On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote: > > >>>> From: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > > >>>> > > >>>> Avoiding the loop which was running with O(n) complexity. > > >>>> Now the complexity has been reduced to O(1) > > >>>> > > >>>> Algorithm calculates the index using matrix method. > > >>>> Matrix arrangement is as below: > > >>>> Assuming PAGE_SIZE is 4096. > > >>>> > > >>>> 1*4096 2*40963*40964*4096 > > >>>> 5*4096 6*40967*40968*4096 > > >>> > > >>> I think adding one more row to this chart would make it more clear. > > >>> The two rows shown also follow a simpler pattern, and that made > > >>> some of the complexity below seem confusing. > > >>> > > >>>10*4096 12*4096 14*4096 16*4096 > > >>> > > >>>> ... ... ... ... > > >>>> ... ... ... ... > > >>>> ... ... ... max_cache_size > > >>>> > > >>>> From this matrix its cleary seen that every row > > >>>clearly > > >>> > > >>>> follows the below way: > > >>>> ... ... ...n > > >>>>n+(1/4)n n+(1/2)n n+(3/4)n2n > > >>>> > > >>>> Row is calulated as log2(size/PAGE_SIZE) Column is calculated as > > >>>> converting the difference between the elements to fit into power > > >>>> size of two and indexing it. > > >>>> > > >>>> Final Index is (row*4)+(col-1) > > >>>> > > >>>> Tested with Intel Mesa CI. > > >>>> > > >>>> Improves performance of 3d Mark on Broxton. > > >>>> Analyzed using Compare Perf Analyser: > > >>>> Average : 201.2 +/- 65.4836 (n=20) > > >>> > > >>> Is 201 the improvement or the absolute score? Do not quote > > >>> absolute > > scores. > > >>> > > >>>> Percentage : 0.705966% +/- 0.229767% (n=20) > > >>> > > >>> Eero: Can you reproduce this result on BXT or other platforms? > > >>> Just > > curious... > > >>> > > >>>> v2: Review comments regarding cosmetics and asserts implemented > > >>>> > > >>>> Signed-off-by: Aravindan Muthukumar > > >>>> <aravindan.muthuku...@intel.com> > > >>>> Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com> > > >>>> Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com> > > >>>> --- > > >>>> src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 > > >>>> -- > > >>>> 1 file changed, 39 insertions(+), 7 deletions(-) > > >>>> > > >>>> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >>>> b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >>>> index 8017219..8013ccb 100644 > > >>>> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >>>> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >>>> @@ -87,6 +87,8 @@ > > >>>> > > >>>> #define memclear(s) memset(, 0, sizeof(s)) > > >>>> > > >>>> +#define PAGE_SIZE 4096 > > >>>> + > > >>>> #define FILE_DEBUG_FLAG DEBUG_BUFMGR > > >>>> > > >>>> static inline int > > >>>> @@ -181,19 +183,45 @@ bo_tile
Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
On 10/24/2017 02:42 AM, Muthukumar, Aravindan wrote: >> -Original Message- >> From: Ian Romanick [mailto:i...@freedesktop.org] >> Sent: Friday, October 20, 2017 9:51 AM >> To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; mesa- >> d...@lists.freedesktop.org >> Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; Tamminen, Eero T >> <eero.t.tammi...@intel.com> >> Subject: Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation >> >> On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote: >>> From: Aravindan Muthukumar <aravindan.muthuku...@intel.com> >>> >>> Avoiding the loop which was running with O(n) complexity. >>> Now the complexity has been reduced to O(1) >>> >>> Algorithm calculates the index using matrix method. >>> Matrix arrangement is as below: >>> Assuming PAGE_SIZE is 4096. >>> >>> 1*4096 2*40963*40964*4096 >>> 5*4096 6*40967*40968*4096 >> >> I think adding one more row to this chart would make it more clear. The two >> rows shown also follow a simpler pattern, and that made some of the >> complexity below seem confusing. >> >>10*4096 12*4096 14*4096 16*4096 >> >>> ... ... ... ... >>> ... ... ... ... >>> ... ... ... max_cache_size >>> >>> From this matrix its cleary seen that every row >>clearly >> >>> follows the below way: >>> ... ... ...n >>>n+(1/4)n n+(1/2)n n+(3/4)n2n >>> >>> Row is calulated as log2(size/PAGE_SIZE) Column is calculated as >>> converting the difference between the elements to fit into power size >>> of two and indexing it. >>> >>> Final Index is (row*4)+(col-1) >>> >>> Tested with Intel Mesa CI. >>> >>> Improves performance of 3d Mark on Broxton. >>> Analyzed using Compare Perf Analyser: >>> Average : 201.2 +/- 65.4836 (n=20) >> >> Is 201 the improvement or the absolute score? Do not quote absolute scores. >> >>> Percentage : 0.705966% +/- 0.229767% (n=20) >> >> Eero: Can you reproduce this result on BXT or other platforms? Just >> curious... >> >>> v2: Review comments regarding cosmetics and asserts implemented >>> >>> Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com> >>> Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com> >>> Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com> >>> --- >>> src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 >>> -- >>> 1 file changed, 39 insertions(+), 7 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c >>> b/src/mesa/drivers/dri/i965/brw_bufmgr.c >>> index 8017219..8013ccb 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c >>> +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c >>> @@ -87,6 +87,8 @@ >>> >>> #define memclear(s) memset(, 0, sizeof(s)) >>> >>> +#define PAGE_SIZE 4096 >>> + >>> #define FILE_DEBUG_FLAG DEBUG_BUFMGR >>> >>> static inline int >>> @@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t >> pitch, uint32_t tiling) >>> return ALIGN(pitch, tile_width); >>> } >>> >>> +static inline int >>> +ilog2_round_up(int value) >>> +{ >>> + assert(value != 0); >>> + return 32 - __builtin_clz(value - 1); } >>> + >>> +/* >>> + * This function finds the correct bucket fit for the input size. >>> + * The function works with O(1) complexity when the requested size >>> + * was queried instead of iterating the size through all the buckets. >>> + */ >>> static struct bo_cache_bucket * >>> bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size) { >>> - int i; >>> + int index = -1; >> >> I don't see how execution could get to that test without index being set >> again, >> so it should be safe to remove the initialization. >> > > We will skip this initialization since it doesn’t impact the index result. > > >>> + int row, col = 0; >>> + int pages, pages_log2; >> >> Move the declarations of row, col, pages, and pages_log2 into the else case, >>
Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
> -Original Message- > From: Ian Romanick [mailto:i...@freedesktop.org] > Sent: Friday, October 20, 2017 9:51 AM > To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; mesa- > d...@lists.freedesktop.org > Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; Tamminen, Eero T > <eero.t.tammi...@intel.com> > Subject: Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation > > On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote: > > From: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > > > > Avoiding the loop which was running with O(n) complexity. > > Now the complexity has been reduced to O(1) > > > > Algorithm calculates the index using matrix method. > > Matrix arrangement is as below: > > Assuming PAGE_SIZE is 4096. > > > > 1*4096 2*40963*40964*4096 > > 5*4096 6*40967*40968*4096 > > I think adding one more row to this chart would make it more clear. The two > rows shown also follow a simpler pattern, and that made some of the > complexity below seem confusing. > >10*4096 12*4096 14*4096 16*4096 > > > ... ... ... ... > > ... ... ... ... > > ... ... ... max_cache_size > > > > From this matrix its cleary seen that every row >clearly > > > follows the below way: > > ... ... ...n > >n+(1/4)n n+(1/2)n n+(3/4)n2n > > > > Row is calulated as log2(size/PAGE_SIZE) Column is calculated as > > converting the difference between the elements to fit into power size > > of two and indexing it. > > > > Final Index is (row*4)+(col-1) > > > > Tested with Intel Mesa CI. > > > > Improves performance of 3d Mark on Broxton. > > Analyzed using Compare Perf Analyser: > > Average : 201.2 +/- 65.4836 (n=20) > > Is 201 the improvement or the absolute score? Do not quote absolute scores. > > > Percentage : 0.705966% +/- 0.229767% (n=20) > > Eero: Can you reproduce this result on BXT or other platforms? Just > curious... > > > v2: Review comments regarding cosmetics and asserts implemented > > > > Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > > Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com> > > Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com> > > --- > > src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 > > -- > > 1 file changed, 39 insertions(+), 7 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > index 8017219..8013ccb 100644 > > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > @@ -87,6 +87,8 @@ > > > > #define memclear(s) memset(, 0, sizeof(s)) > > > > +#define PAGE_SIZE 4096 > > + > > #define FILE_DEBUG_FLAG DEBUG_BUFMGR > > > > static inline int > > @@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t > pitch, uint32_t tiling) > > return ALIGN(pitch, tile_width); > > } > > > > +static inline int > > +ilog2_round_up(int value) > > +{ > > + assert(value != 0); > > + return 32 - __builtin_clz(value - 1); } > > + > > +/* > > + * This function finds the correct bucket fit for the input size. > > + * The function works with O(1) complexity when the requested size > > + * was queried instead of iterating the size through all the buckets. > > + */ > > static struct bo_cache_bucket * > > bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size) { > > - int i; > > + int index = -1; > > I don't see how execution could get to that test without index being set > again, > so it should be safe to remove the initialization. > We will skip this initialization since it doesn’t impact the index result. > > + int row, col = 0; > > + int pages, pages_log2; > > Move the declarations of row, col, pages, and pages_log2 into the else case, > initialize them with the only assignment to them, and make them const. > > Since all of these values, including index, get values derived from size, I > believe > the should all be unsigned. In that case, remove the index >= 0 test below. > We will move as well as initialize the above variables in the respective else part. > > > > - for (i = 0; i < bufmgr->num_buckets; i++) { &
Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
Ian, Rest all review comments noted, we'll get back, Thanks. >-Original Message- >From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of >Ian Romanick >Sent: Friday, October 20, 2017 9:51 AM >To: Muthukumar, Aravindan; mesa- >> Average : 201.2 +/- 65.4836 (n=20) > >Is 201 the improvement or the absolute score? Do not quote absolute scores. > No, this is not an absolute score this is the difference patch made and the deviation reported in requested format. -Yogesh. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
On 09/13/2017 11:43 PM, aravindan.muthuku...@intel.com wrote: > From: Aravindan Muthukumar> > Avoiding the loop which was running with O(n) complexity. > Now the complexity has been reduced to O(1) > > Algorithm calculates the index using matrix method. > Matrix arrangement is as below: > Assuming PAGE_SIZE is 4096. > > 1*4096 2*40963*40964*4096 > 5*4096 6*40967*40968*4096 I think adding one more row to this chart would make it more clear. The two rows shown also follow a simpler pattern, and that made some of the complexity below seem confusing. 10*4096 12*4096 14*4096 16*4096 > ... ... ... ... > ... ... ... ... > ... ... ... max_cache_size > > From this matrix its cleary seen that every row clearly > follows the below way: > ... ... ...n >n+(1/4)n n+(1/2)n n+(3/4)n2n > > Row is calulated as log2(size/PAGE_SIZE) > Column is calculated as converting the difference > between the elements to fit into power size of two > and indexing it. > > Final Index is (row*4)+(col-1) > > Tested with Intel Mesa CI. > > Improves performance of 3d Mark on Broxton. > Analyzed using Compare Perf Analyser: > Average : 201.2 +/- 65.4836 (n=20) Is 201 the improvement or the absolute score? Do not quote absolute scores. > Percentage : 0.705966% +/- 0.229767% (n=20) Eero: Can you reproduce this result on BXT or other platforms? Just curious... > v2: Review comments regarding cosmetics and asserts implemented > > Signed-off-by: Aravindan Muthukumar > Signed-off-by: Kedar Karanje > Reviewed-by: Yogesh Marathe > --- > src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 > -- > 1 file changed, 39 insertions(+), 7 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > b/src/mesa/drivers/dri/i965/brw_bufmgr.c > index 8017219..8013ccb 100644 > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > @@ -87,6 +87,8 @@ > > #define memclear(s) memset(, 0, sizeof(s)) > > +#define PAGE_SIZE 4096 > + > #define FILE_DEBUG_FLAG DEBUG_BUFMGR > > static inline int > @@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t > pitch, uint32_t tiling) > return ALIGN(pitch, tile_width); > } > > +static inline int > +ilog2_round_up(int value) > +{ > + assert(value != 0); > + return 32 - __builtin_clz(value - 1); > +} > + > +/* > + * This function finds the correct bucket fit for the input size. > + * The function works with O(1) complexity when the requested size > + * was queried instead of iterating the size through all the buckets. > + */ > static struct bo_cache_bucket * > bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size) > { > - int i; > + int index = -1; I don't see how execution could get to that test without index being set again, so it should be safe to remove the initialization. > + int row, col = 0; > + int pages, pages_log2; Move the declarations of row, col, pages, and pages_log2 into the else case, initialize them with the only assignment to them, and make them const. Since all of these values, including index, get values derived from size, I believe the should all be unsigned. In that case, remove the index >= 0 test below. > > - for (i = 0; i < bufmgr->num_buckets; i++) { > - struct bo_cache_bucket *bucket = >cache_bucket[i]; > - if (bucket->size >= size) { > - return bucket; > - } > + /* condition for size less than 4*4096 (16KB) page size */ ^ ^ Remove one space here. | Capitalize and add a period at the end of the sentence. > + if(size <= 4 * PAGE_SIZE) { ^ Add a space here. > + index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;; Remove extra trailing semicolon. ^ > + } else { > + /* Number of pages of page size */ > + pages = DIV_ROUND_UP(size, PAGE_SIZE); > + pages_log2 = ilog2_round_up(pages) - 1; Isn't this going to be the same result as _mesa_logbase2(pages)? > + > + /* Finding the row and column of the matrix */ > + row = pages_log2 - 1; > + col = DIV_ROUND_UP((pages - (1 << pages_log2)), > +(1 << (pages_log2 - 2))); ^^ This should | be aligned | here. ---+ Removing some of the extra parenthesis would also make it easier to read. So... I feel like this function doesn't need to special case size < 16k like this. I haven't benchmarked it, but the following is about 30 bytes shorter and avoids the conditional branch. I also think it's easier to understand. const unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE); /* Row
Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
Hi Jason, Please review and let us know if any other changes need to be done for this patch. We have incorporated all the review comments in the second version of the patch. Thanks, Aravindan > -Original Message- > From: Muthukumar, Aravindan > Sent: Tuesday, October 3, 2017 10:04 AM > To: Marathe, Yogesh <yogesh.mara...@intel.com>; Ekstrand, Jason > <jason.ekstr...@intel.com>; Palli, Tapani <tapani.pa...@intel.com>; Ian > Romanick <i...@freedesktop.org>; Emil Velikov <emil.l.veli...@gmail.com> > Cc: J Karanje, Kedar <kedar.j.kara...@intel.com>; mesa- > d...@lists.freedesktop.org > Subject: RE: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation > > Hi Reviewers, > Please review and provide the comments on the second version of the > patch. > > Thanks, > Aravindan > > > -Original Message- > > From: Marathe, Yogesh > > Sent: Friday, September 22, 2017 8:41 AM > > To: Ekstrand, Jason <jason.ekstr...@intel.com>; Palli, Tapani > > <tapani.pa...@intel.com>; Ian Romanick <i...@freedesktop.org>; Emil > > Velikov <emil.l.veli...@gmail.com> > > Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje, > > Kedar <kedar.j.kara...@intel.com>; mesa-dev@lists.freedesktop.org > > Subject: RE: [Mesa-dev] [PATCH v2] i965 : optimized bucket index > > calculation > > > > + all v1 reviewers. Does this look ok? > > > > -Yogesh. > > > > >-Original Message- > > >From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On > > >Behalf Of aravindan.muthuku...@intel.com > > >Sent: Thursday, September 14, 2017 12:13 PM > > >To: mesa-dev@lists.freedesktop.org > > >Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J > > >Karanje, Kedar <kedar.j.kara...@intel.com> > > >Subject: [Mesa-dev] [PATCH v2] i965 : optimized bucket index > > >calculation > > > > > >From: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > > > > > >Avoiding the loop which was running with O(n) complexity. > > >Now the complexity has been reduced to O(1) > > > > > >Algorithm calculates the index using matrix method. > > >Matrix arrangement is as below: > > >Assuming PAGE_SIZE is 4096. > > > > > > 1*4096 2*40963*40964*4096 > > > 5*4096 6*40967*40968*4096 > > > ... ... ... ... > > > ... ... ... ... > > > ... ... ... max_cache_size > > > > > >From this matrix its cleary seen that every row follows the below way: > > > ... ... ...n > > > n+(1/4)n n+(1/2)n n+(3/4)n2n > > > > > >Row is calulated as log2(size/PAGE_SIZE) Column is calculated as > > >converting the difference between the elements to fit into power size > > >of two > > and indexing it. > > > > > >Final Index is (row*4)+(col-1) > > > > > >Tested with Intel Mesa CI. > > > > > >Improves performance of 3d Mark on Broxton. > > >Analyzed using Compare Perf Analyser: > > >Average : 201.2 +/- 65.4836 (n=20) > > >Percentage : 0.705966% +/- 0.229767% (n=20) > > > > > >v2: Review comments regarding cosmetics and asserts implemented > > > > > >Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > > >Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com> > > >Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com> > > >--- > > > src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 > > >-- > > > 1 file changed, 39 insertions(+), 7 deletions(-) > > > > > >diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >index 8017219..8013ccb 100644 > > >--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > > >@@ -87,6 +87,8 @@ > > > > > > #define memclear(s) memset(, 0, sizeof(s)) > > > > > >+#define PAGE_SIZE 4096 > > >+ > > > #define FILE_DEBUG_FLAG DEBUG_BUFMGR > > > > > > static inline int > > >@@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, > > >uint32_t pitch, uint32_t tiling) > > >return ALIGN(pitch, tile_width); > > > } > > > > > >+static inline
Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
Hi Reviewers, Please review and provide the comments on the second version of the patch. Thanks, Aravindan > -Original Message- > From: Marathe, Yogesh > Sent: Friday, September 22, 2017 8:41 AM > To: Ekstrand, Jason <jason.ekstr...@intel.com>; Palli, Tapani > <tapani.pa...@intel.com>; Ian Romanick <i...@freedesktop.org>; Emil Velikov > <emil.l.veli...@gmail.com> > Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje, > Kedar <kedar.j.kara...@intel.com>; mesa-dev@lists.freedesktop.org > Subject: RE: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation > > + all v1 reviewers. Does this look ok? > > -Yogesh. > > >-Original Message- > >From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On > >Behalf Of aravindan.muthuku...@intel.com > >Sent: Thursday, September 14, 2017 12:13 PM > >To: mesa-dev@lists.freedesktop.org > >Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje, > >Kedar <kedar.j.kara...@intel.com> > >Subject: [Mesa-dev] [PATCH v2] i965 : optimized bucket index > >calculation > > > >From: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > > > >Avoiding the loop which was running with O(n) complexity. > >Now the complexity has been reduced to O(1) > > > >Algorithm calculates the index using matrix method. > >Matrix arrangement is as below: > >Assuming PAGE_SIZE is 4096. > > > > 1*4096 2*40963*40964*4096 > > 5*4096 6*40967*40968*4096 > > ... ... ... ... > > ... ... ... ... > > ... ... ... max_cache_size > > > >From this matrix its cleary seen that every row follows the below way: > > ... ... ...n > > n+(1/4)n n+(1/2)n n+(3/4)n2n > > > >Row is calulated as log2(size/PAGE_SIZE) Column is calculated as > >converting the difference between the elements to fit into power size of two > and indexing it. > > > >Final Index is (row*4)+(col-1) > > > >Tested with Intel Mesa CI. > > > >Improves performance of 3d Mark on Broxton. > >Analyzed using Compare Perf Analyser: > >Average : 201.2 +/- 65.4836 (n=20) > >Percentage : 0.705966% +/- 0.229767% (n=20) > > > >v2: Review comments regarding cosmetics and asserts implemented > > > >Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > >Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com> > >Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com> > >--- > > src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 > >-- > > 1 file changed, 39 insertions(+), 7 deletions(-) > > > >diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c > >b/src/mesa/drivers/dri/i965/brw_bufmgr.c > >index 8017219..8013ccb 100644 > >--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c > >+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c > >@@ -87,6 +87,8 @@ > > > > #define memclear(s) memset(, 0, sizeof(s)) > > > >+#define PAGE_SIZE 4096 > >+ > > #define FILE_DEBUG_FLAG DEBUG_BUFMGR > > > > static inline int > >@@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t > >pitch, uint32_t tiling) > >return ALIGN(pitch, tile_width); > > } > > > >+static inline int > >+ilog2_round_up(int value) > >+{ > >+ assert(value != 0); > >+ return 32 - __builtin_clz(value - 1); } > >+ > >+/* > >+ * This function finds the correct bucket fit for the input size. > >+ * The function works with O(1) complexity when the requested size > >+ * was queried instead of iterating the size through all the buckets. > >+ */ > > static struct bo_cache_bucket * > > bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size) { > >- int i; > >+ int index = -1; > >+ int row, col = 0; > >+ int pages, pages_log2; > > > >- for (i = 0; i < bufmgr->num_buckets; i++) { > >- struct bo_cache_bucket *bucket = >cache_bucket[i]; > >- if (bucket->size >= size) { > >- return bucket; > >- } > >+ /* condition for size less than 4*4096 (16KB) page size */ > >+ if(size <= 4 * PAGE_SIZE) { > >+ index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;; > >+ } else { > >+ /* Number of pages of page size */ > >+ pages = DIV_ROUND_UP(size, PAGE_SIZE); > >+ pages_log2 = ilog2_round_up(page
Re: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
+ all v1 reviewers. Does this look ok? -Yogesh. >-Original Message- >From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of >aravindan.muthuku...@intel.com >Sent: Thursday, September 14, 2017 12:13 PM >To: mesa-dev@lists.freedesktop.org >Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>; J Karanje, >Kedar <kedar.j.kara...@intel.com> >Subject: [Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation > >From: Aravindan Muthukumar <aravindan.muthuku...@intel.com> > >Avoiding the loop which was running with O(n) complexity. >Now the complexity has been reduced to O(1) > >Algorithm calculates the index using matrix method. >Matrix arrangement is as below: >Assuming PAGE_SIZE is 4096. > > 1*4096 2*40963*40964*4096 > 5*4096 6*40967*40968*4096 > ... ... ... ... > ... ... ... ... > ... ... ... max_cache_size > >From this matrix its cleary seen that every row follows the below way: > ... ... ...n > n+(1/4)n n+(1/2)n n+(3/4)n2n > >Row is calulated as log2(size/PAGE_SIZE) Column is calculated as converting the >difference between the elements to fit into power size of two and indexing it. > >Final Index is (row*4)+(col-1) > >Tested with Intel Mesa CI. > >Improves performance of 3d Mark on Broxton. >Analyzed using Compare Perf Analyser: >Average : 201.2 +/- 65.4836 (n=20) >Percentage : 0.705966% +/- 0.229767% (n=20) > >v2: Review comments regarding cosmetics and asserts implemented > >Signed-off-by: Aravindan Muthukumar <aravindan.muthuku...@intel.com> >Signed-off-by: Kedar Karanje <kedar.j.kara...@intel.com> >Reviewed-by: Yogesh Marathe <yogesh.mara...@intel.com> >--- > src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 >-- > 1 file changed, 39 insertions(+), 7 deletions(-) > >diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c >b/src/mesa/drivers/dri/i965/brw_bufmgr.c >index 8017219..8013ccb 100644 >--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c >+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c >@@ -87,6 +87,8 @@ > > #define memclear(s) memset(, 0, sizeof(s)) > >+#define PAGE_SIZE 4096 >+ > #define FILE_DEBUG_FLAG DEBUG_BUFMGR > > static inline int >@@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t >pitch, uint32_t tiling) >return ALIGN(pitch, tile_width); > } > >+static inline int >+ilog2_round_up(int value) >+{ >+ assert(value != 0); >+ return 32 - __builtin_clz(value - 1); } >+ >+/* >+ * This function finds the correct bucket fit for the input size. >+ * The function works with O(1) complexity when the requested size >+ * was queried instead of iterating the size through all the buckets. >+ */ > static struct bo_cache_bucket * > bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size) { >- int i; >+ int index = -1; >+ int row, col = 0; >+ int pages, pages_log2; > >- for (i = 0; i < bufmgr->num_buckets; i++) { >- struct bo_cache_bucket *bucket = >cache_bucket[i]; >- if (bucket->size >= size) { >- return bucket; >- } >+ /* condition for size less than 4*4096 (16KB) page size */ >+ if(size <= 4 * PAGE_SIZE) { >+ index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;; >+ } else { >+ /* Number of pages of page size */ >+ pages = DIV_ROUND_UP(size, PAGE_SIZE); >+ pages_log2 = ilog2_round_up(pages) - 1; >+ >+ /* Finding the row and column of the matrix */ >+ row = pages_log2 - 1; >+ col = DIV_ROUND_UP((pages - (1 << pages_log2)), >+(1 << (pages_log2 - 2))); >+ >+ /* Using the calculated row and column to index into the matrix */ >+ index = (row << 2) + (col - 1); >} > >- return NULL; >+ /* Checking the error condition */ >+ return (index >= 0 && index < bufmgr->num_buckets) ? >+ (>cache_bucket[index]) : NULL; > } > > int >@@ -1239,6 +1267,10 @@ add_bucket(struct brw_bufmgr *bufmgr, int size) >list_inithead(>cache_bucket[i].head); >bufmgr->cache_bucket[i].size = size; >bufmgr->num_buckets++; >+ >+ assert(bucket_for_size(bufmgr, size) == >cache_bucket[i]); >+ assert(bucket_for_size(bufmgr, size - 2048) == >cache_bucket[i]); >+ assert(bucket_for_size(bufmgr, size + 1) != >+ >cache_bucket[i]); > } > > static void >-- >2.7.4 > >___ >mesa-dev mailing list >mesa-dev@lists.freedesktop.org >https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] i965 : optimized bucket index calculation
From: Aravindan MuthukumarAvoiding the loop which was running with O(n) complexity. Now the complexity has been reduced to O(1) Algorithm calculates the index using matrix method. Matrix arrangement is as below: Assuming PAGE_SIZE is 4096. 1*4096 2*40963*40964*4096 5*4096 6*40967*40968*4096 ... ... ... ... ... ... ... ... ... ... ... max_cache_size From this matrix its cleary seen that every row follows the below way: ... ... ...n n+(1/4)n n+(1/2)n n+(3/4)n2n Row is calulated as log2(size/PAGE_SIZE) Column is calculated as converting the difference between the elements to fit into power size of two and indexing it. Final Index is (row*4)+(col-1) Tested with Intel Mesa CI. Improves performance of 3d Mark on Broxton. Analyzed using Compare Perf Analyser: Average : 201.2 +/- 65.4836 (n=20) Percentage : 0.705966% +/- 0.229767% (n=20) v2: Review comments regarding cosmetics and asserts implemented Signed-off-by: Aravindan Muthukumar Signed-off-by: Kedar Karanje Reviewed-by: Yogesh Marathe --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 46 -- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index 8017219..8013ccb 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -87,6 +87,8 @@ #define memclear(s) memset(, 0, sizeof(s)) +#define PAGE_SIZE 4096 + #define FILE_DEBUG_FLAG DEBUG_BUFMGR static inline int @@ -181,19 +183,45 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t pitch, uint32_t tiling) return ALIGN(pitch, tile_width); } +static inline int +ilog2_round_up(int value) +{ + assert(value != 0); + return 32 - __builtin_clz(value - 1); +} + +/* + * This function finds the correct bucket fit for the input size. + * The function works with O(1) complexity when the requested size + * was queried instead of iterating the size through all the buckets. + */ static struct bo_cache_bucket * bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size) { - int i; + int index = -1; + int row, col = 0; + int pages, pages_log2; - for (i = 0; i < bufmgr->num_buckets; i++) { - struct bo_cache_bucket *bucket = >cache_bucket[i]; - if (bucket->size >= size) { - return bucket; - } + /* condition for size less than 4*4096 (16KB) page size */ + if(size <= 4 * PAGE_SIZE) { + index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;; + } else { + /* Number of pages of page size */ + pages = DIV_ROUND_UP(size, PAGE_SIZE); + pages_log2 = ilog2_round_up(pages) - 1; + + /* Finding the row and column of the matrix */ + row = pages_log2 - 1; + col = DIV_ROUND_UP((pages - (1 << pages_log2)), +(1 << (pages_log2 - 2))); + + /* Using the calculated row and column to index into the matrix */ + index = (row << 2) + (col - 1); } - return NULL; + /* Checking the error condition */ + return (index >= 0 && index < bufmgr->num_buckets) ? + (>cache_bucket[index]) : NULL; } int @@ -1239,6 +1267,10 @@ add_bucket(struct brw_bufmgr *bufmgr, int size) list_inithead(>cache_bucket[i].head); bufmgr->cache_bucket[i].size = size; bufmgr->num_buckets++; + + assert(bucket_for_size(bufmgr, size) == >cache_bucket[i]); + assert(bucket_for_size(bufmgr, size - 2048) == >cache_bucket[i]); + assert(bucket_for_size(bufmgr, size + 1) != >cache_bucket[i]); } static void -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev