Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-06 Thread Daniel Vetter
On Tue, Jan 06, 2015 at 03:15:39PM +0800, Xiang, Haihao wrote:
 On Mon, 2015-01-05 at 23:03 -0800, Kenneth Graunke wrote:
  On Tuesday, January 06, 2015 02:39:36 PM Xiang, Haihao wrote:
   On Mon, 2015-01-05 at 21:54 -0800, Kenneth Graunke wrote:
On Tuesday, January 06, 2015 01:11:53 PM Xiang, Haihao wrote:
 
 Hi Kenneth,
 
 How did you test OSD ? I can't reproduce the issue you mentioned, OSD
 works well for me when using mplayer-vaapi with the latest
 libva/libva-intel-driver master branch.
 
 I tried your patch, what surprised me is OSD still works well after
 applying your patch. It seems your patch didn't disable the palette.
 
 Thanks
 Haihao

I ran:

mplayer -osdlevel 3 -vo vaapi big_buck_bunny_720p_stereo.ogg

For me, the OSD text is solid green, with hard edges.
   
   The OSD text is white for me when using mplayer -osdlevel 3 -vo vaapi
   xxx. If possible, could you update your mplayer ?
  
  Huh.  I'm using the Arch Linux package of mplayer-vaapi 36265-13,
  which seems to be the most recent subversion commit ID.  I've never seen
  white text on my Haswell system - it seems to be consistently dark green.
 
If you use -vo gl or -vo xv, the OSD is solid white text with a black
border around it.  I presume that it's supposed to be white with vaapi as
well, but I guess I'm not entirely sure.

It's possible that the optimization doesn't affect the palette as long as
you never use sample_c with the paletted textures.
   
   I verified the palette takes effect in the following way:
   
   1. Only support P8A8 format in the driver
   
   2. ran the above command and I saw white OSD text
   
   3. Only support P4A4 format in the driver and don't use
   3DSTATE_SAMPLER_PALETTE_LOAD0 to load the value to the texture palette,
   so the palette keeps unchanged. 
   
   4. ran the above command and I saw black OSD text.
   
   5. Load the right value to the texture palette and ran the above command
   again, I saw white OSD text.
   
   Hence I think sample_c with the paletted textures is used in the driver.
  
  That sounds like the palette is actually working, then.  Great :)
  
  I doubt that libva would use sample_c - sampling with a shadow comparison?
  It looks like it just uses sample and sample+killpix.
 
 You are right, libva driver doesn't use sample_c message. 
 
  I'm pretty sure the sample_c optimization just uses the palette memory as
  storage for some stuff, so it's quite possible it just works if you're
  only using sample and sample+killpix.
 
 Thanks for the explanation, it makes sense to me.

Thanks for digging into this some more, I've added the above discussion in
a quote to the commit message.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Xiang, Haihao
On Mon, 2015-01-05 at 21:54 -0800, Kenneth Graunke wrote:
 On Tuesday, January 06, 2015 01:11:53 PM Xiang, Haihao wrote:
  
  Hi Kenneth,
  
  How did you test OSD ? I can't reproduce the issue you mentioned, OSD
  works well for me when using mplayer-vaapi with the latest
  libva/libva-intel-driver master branch.
  
  I tried your patch, what surprised me is OSD still works well after
  applying your patch. It seems your patch didn't disable the palette.
  
  Thanks
  Haihao
 
 I ran:
 
 mplayer -osdlevel 3 -vo vaapi big_buck_bunny_720p_stereo.ogg
 
 For me, the OSD text is solid green, with hard edges.

The OSD text is white for me when using mplayer -osdlevel 3 -vo vaapi
xxx. If possible, could you update your mplayer ?

 
 If you use -vo gl or -vo xv, the OSD is solid white text with a black
 border around it.  I presume that it's supposed to be white with vaapi as
 well, but I guess I'm not entirely sure.
 
 It's possible that the optimization doesn't affect the palette as long as
 you never use sample_c with the paletted textures.


I verified that the palette takes effect in the following way:

1. Only support the P8A8 format in the driver.

2. Ran the above command and saw white OSD text.

3. Only support the P4A4 format in the driver and don't use
3DSTATE_SAMPLER_PALETTE_LOAD0 to load the values into the texture palette,
so the palette stays unchanged.

4. Ran the above command and saw black OSD text.

5. Loaded the right values into the texture palette and ran the above
command again; I saw white OSD text.

Hence I think sample_c with the paletted textures is used in the driver.
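
As a rough illustration of why steps 3 to 5 distinguish a working palette from a
missing one, here is a minimal software model of a paletted lookup. It is only a
sketch: the texel layout (palette index in the high nibble, alpha in the low
nibble), the XRGB palette entry layout, and the name sample_p4a4 are assumptions
made for this example, and an all-zero palette stands in for "never loaded".

```c
#include <stdint.h>

#define PALETTE_ENTRIES 16

struct rgba {
    uint8_t r, g, b, a;
};

/*
 * Hypothetical model of sampling one P4A4 texel.
 * Assumed layout: high nibble = palette index, low nibble = alpha.
 * Assumed palette entry layout: 0x00RRGGBB.
 */
static struct rgba sample_p4a4(const uint32_t palette[PALETTE_ENTRIES],
                               uint8_t texel)
{
    uint32_t xrgb = palette[texel >> 4];   /* colour comes from the palette */
    uint8_t alpha4 = texel & 0x0f;         /* alpha comes from the texel */
    struct rgba out = {
        .r = (uint8_t)((xrgb >> 16) & 0xff),
        .g = (uint8_t)((xrgb >> 8) & 0xff),
        .b = (uint8_t)(xrgb & 0xff),
        .a = (uint8_t)(alpha4 * 17),       /* expand 4-bit alpha to 8 bits */
    };
    return out;
}
```

With a zeroed palette every index resolves to black, which matches the black OSD
text in step 4; once entry 15 holds 0x00ffffff the same texel resolves to white,
as in step 5.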


 
 --Ken




Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Xiang, Haihao
On Tue, 2015-01-06 at 14:39 +0800, Xiang, Haihao wrote:
 On Mon, 2015-01-05 at 21:54 -0800, Kenneth Graunke wrote:
  On Tuesday, January 06, 2015 01:11:53 PM Xiang, Haihao wrote:
   
   Hi Kenneth,
   
   How did you test OSD ? I can't reproduce the issue you mentioned, OSD
   works well for me when using mplayer-vaapi with the latest
   libva/libva-intel-driver master branch.
   
   I tried your patch, what surprised me is OSD still works well after
   applying your patch. It seems your patch didn't disable the palette.
   
   Thanks
   Haihao
  
  I ran:
  
  mplayer -osdlevel 3 -vo vaapi big_buck_bunny_720p_stereo.ogg
  
  For me, the OSD text is solid green, with hard edges.
 
 The OSD text is white for me when using mplayer -osdlevel 3 -vo vaapi
 xxx. If possible, could you update your mplayer ?
 
  
  If you use -vo gl or -vo xv, the OSD is solid white text with a black
  border around it.  I presume that it's supposed to be white with vaapi as
  well, but I guess I'm not entirely sure.
  
  It's possible that the optimization doesn't affect the palette as long as
  you never use sample_c with the paletted textures.
 
 
 I verified the palette takes effect in the following way:
 
 1. Only support P8A8 format in the driver
 
 2. ran the above command and I saw white OSD text
 
 3. Only support P4A4 format in the driver and don't use
 3DSTATE_SAMPLER_PALETTE_LOAD0 to load the value to the texture palette,
 so the palette keeps unchanged. 
 
 4. ran the above command and I saw black OSD text.
 
 5. Load the right value to the texture palette and ran the above command
 again, I saw white OSD text.
 
 Hence I think sample_c with the paletted textures is used in the driver.

Sorry, the libva driver doesn't use the sample_c message; I meant that the
paletted texture is used.  However, according to the doc, the palette is
disabled in fast mode.


 
  
  --Ken
 
 


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Kenneth Graunke
On Tuesday, January 06, 2015 02:39:36 PM Xiang, Haihao wrote:
 On Mon, 2015-01-05 at 21:54 -0800, Kenneth Graunke wrote:
  On Tuesday, January 06, 2015 01:11:53 PM Xiang, Haihao wrote:
   
   Hi Kenneth,
   
   How did you test OSD ? I can't reproduce the issue you mentioned, OSD
   works well for me when using mplayer-vaapi with the latest
   libva/libva-intel-driver master branch.
   
   I tried your patch, what surprised me is OSD still works well after
   applying your patch. It seems your patch didn't disable the palette.
   
   Thanks
   Haihao
  
  I ran:
  
  mplayer -osdlevel 3 -vo vaapi big_buck_bunny_720p_stereo.ogg
  
  For me, the OSD text is solid green, with hard edges.
 
 The OSD text is white for me when using mplayer -osdlevel 3 -vo vaapi
 xxx. If possible, could you update your mplayer ?

Huh.  I'm using the Arch Linux package of mplayer-vaapi 36265-13,
which seems to be the most recent subversion commit ID.  I've never seen
white text on my Haswell system - it seems to be consistently dark green.

  If you use -vo gl or -vo xv, the OSD is solid white text with a black
  border around it.  I presume that it's supposed to be white with vaapi as
  well, but I guess I'm not entirely sure.
  
  It's possible that the optimization doesn't affect the palette as long as
  you never use sample_c with the paletted textures.
 
 I verified the palette takes effect in the following way:
 
 1. Only support P8A8 format in the driver
 
 2. ran the above command and I saw white OSD text
 
 3. Only support P4A4 format in the driver and don't use
 3DSTATE_SAMPLER_PALETTE_LOAD0 to load the value to the texture palette,
 so the palette keeps unchanged. 
 
 4. ran the above command and I saw black OSD text.
 
 5. Load the right value to the texture palette and ran the above command
 again, I saw white OSD text.
 
 Hence I think sample_c with the paletted textures is used in the driver.

That sounds like the palette is actually working, then.  Great :)

I doubt that libva would use sample_c - sampling with a shadow comparison?
It looks like it just uses sample and sample+killpix.

I'm pretty sure the sample_c optimization just uses the palette memory as
storage for some stuff, so it's quite possible it just works if you're
only using sample and sample+killpix.

--Ken



Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Xiang, Haihao
On Mon, 2015-01-05 at 23:03 -0800, Kenneth Graunke wrote:
 On Tuesday, January 06, 2015 02:39:36 PM Xiang, Haihao wrote:
  On Mon, 2015-01-05 at 21:54 -0800, Kenneth Graunke wrote:
   On Tuesday, January 06, 2015 01:11:53 PM Xiang, Haihao wrote:

Hi Kenneth,

How did you test OSD ? I can't reproduce the issue you mentioned, OSD
works well for me when using mplayer-vaapi with the latest
libva/libva-intel-driver master branch.

I tried your patch, what surprised me is OSD still works well after
applying your patch. It seems your patch didn't disable the palette.

Thanks
Haihao
   
   I ran:
   
   mplayer -osdlevel 3 -vo vaapi big_buck_bunny_720p_stereo.ogg
   
   For me, the OSD text is solid green, with hard edges.
  
  The OSD text is white for me when using mplayer -osdlevel 3 -vo vaapi
  xxx. If possible, could you update your mplayer ?
 
 Huh.  I'm using the Arch Linux package of mplayer-vaapi 36265-13,
 which seems to be the most recent subversion commit ID.  I've never seen
 white text on my Haswell system - it seems to be consistently dark green.

   If you use -vo gl or -vo xv, the OSD is solid white text with a black
   border around it.  I presume that it's supposed to be white with vaapi as
   well, but I guess I'm not entirely sure.
   
   It's possible that the optimization doesn't affect the palette as long as
   you never use sample_c with the paletted textures.
  
  I verified the palette takes effect in the following way:
  
  1. Only support P8A8 format in the driver
  
  2. ran the above command and I saw white OSD text
  
  3. Only support P4A4 format in the driver and don't use
  3DSTATE_SAMPLER_PALETTE_LOAD0 to load the value to the texture palette,
  so the palette keeps unchanged. 
  
  4. ran the above command and I saw black OSD text.
  
  5. Load the right value to the texture palette and ran the above command
  again, I saw white OSD text.
  
  Hence I think sample_c with the paletted textures is used in the driver.
 
 That sounds like the palette is actually working, then.  Great :)
 
 I doubt that libva would use sample_c - sampling with a shadow comparison?
 It looks like it just uses sample and sample+killpix.

You are right, libva driver doesn't use sample_c message. 

 I'm pretty sure the sample_c optimization just uses the palette memory as
 storage for some stuff, so it's quite possible it just works if you're
 only using sample and sample+killpix.

Thanks for the explanation, it makes sense to me.

 
 --Ken




Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Daniel Vetter
On Wed, Dec 31, 2014 at 04:23:00PM -0800, Kenneth Graunke wrote:
 Haswell significantly improved the performance of sampler_c messages,
 but the optimization appears to be off by default.  Later platforms
 remove this bit, and apparently always enable the optimization.
 
 Improves performance in Counter Strike: Global Offensive by 18%
 at default settings on Iris Pro.
 
 This may break sampling of paletted formats (P8/A8P8/P8A8).  It's
 unclear whether it affects sampling of paletted formats in general,
 or just the sample_c message (which is never used).
 
 While libva does have support for using paletted formats (primarily
 for OSDs), that support appears to have been broken for at least a
 year, so I couldn't observe a regression from this.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  drivers/gpu/drm/i915/i915_reg.h | 1 +
  drivers/gpu/drm/i915/intel_pm.c | 4 ++++
  2 files changed, 5 insertions(+)
 
 Resubmitting the patch to unconditionally enable this.  I tried to get
 libva-intel to use paletted formats, and observe a regression...but the
 only thing I found that used it was mplayer's OSD (on screen display).
 Even without my patch, the colors were totally wrong with that, and
 according to a few distro wikis, that's been the case for over a year.
 
 If libva's code for paletted formats /is/ broken, they could always add
 code to disable this bit using the command validator when fixing it.
 
 Could we try merging this, and back it out if someone reports a
 regression?  I haven't observed any problems.  It's also been quite
 stable.

Yeah makes sense. When resending please incorporate review feedback
(Ville dug out the wa name), I've done that. And I've pasted the
additional detail about the libva saga, just for reference (since no one
will remember that it's mplayer's OSD which uses this 2 months down the
road).

Also please cc libva mailing lists next time around as an fyi. Done that
too.

Queued for -next, thanks for the patch.
-Daniel

 
 diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
 index 40ca873..0f32fd1a 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -6167,6 +6167,7 @@ enum punit_power_well {
  #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
  
  #define HALF_SLICE_CHICKEN3  0xe184
 +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
  #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
  #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
  
 diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
 index 7d99a9c..17e84dc 100644
 --- a/drivers/gpu/drm/i915/intel_pm.c
 +++ b/drivers/gpu/drm/i915/intel_pm.c
 @@ -5974,6 +5974,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
      I915_WRITE(GEN7_GT_MODE,
                 _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4));
  
 +    /* Make sample_c messages faster. */
 +    I915_WRITE(HALF_SLICE_CHICKEN3,
 +               _MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE));
 +
      /* WaSwitchSolVfFArbitrationPriority:hsw */
      I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
  
 -- 
 2.2.1
 


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Kenneth Graunke
On Monday, January 05, 2015 02:19:15 PM Daniel Vetter wrote:
 On Wed, Dec 31, 2014 at 04:23:00PM -0800, Kenneth Graunke wrote:
  Haswell significantly improved the performance of sampler_c messages,
  but the optimization appears to be off by default.  Later platforms
  remove this bit, and apparently always enable the optimization.
  
  Improves performance in Counter Strike: Global Offensive by 18%
  at default settings on Iris Pro.
  
  This may break sampling of paletted formats (P8/A8P8/P8A8).  It's
  unclear whether it affects sampling of paletted formats in general,
  or just the sample_c message (which is never used).
  
  While libva does have support for using paletted formats (primarily
  for OSDs), that support appears to have been broken for at least a
  year, so I couldn't observe a regression from this.
  
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  ---
   drivers/gpu/drm/i915/i915_reg.h | 1 +
   drivers/gpu/drm/i915/intel_pm.c | 4 ++++
   2 files changed, 5 insertions(+)
  
  Resubmitting the patch to unconditionally enable this.  I tried to get
  libva-intel to use paletted formats, and observe a regression...but the
  only thing I found that used it was mplayer's OSD (on screen display).
  Even without my patch, the colors were totally wrong with that, and
  according to a few distro wikis, that's been the case for over a year.
  
  If libva's code for paletted formats /is/ broken, they could always add
  code to disable this bit using the command validator when fixing it.
  
  Could we try merging this, and back it out if someone reports a
  regression?  I haven't observed any problems.  It's also been quite
  stable.
 
 Yeah makes sense. When resending please incorporate review feedback
 (Ville dug out the wa name), I've done that. And I've pasted the
 additional detail about the libva saga, just for reference (since no one
 will remember that it's mplayer's OSD which uses this 2 months down the
 road).
 
 Also please cc libva mailing lists next time around as an fyi. Done that
 too.
 
 Queued for -next, thanks for the patch.
 -Daniel

Oh, sorry, I missed that in the review.

Thanks, Daniel!

--Ken



Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Xiang, Haihao

Hi Kenneth,

How did you test the OSD? I can't reproduce the issue you mentioned; the OSD
works well for me when using mplayer-vaapi with the latest
libva/libva-intel-driver master branch.

I tried your patch, and what surprised me is that the OSD still works well
after applying it. It seems your patch didn't disable the palette.

Thanks
Haihao


 On Monday, January 05, 2015 02:19:15 PM Daniel Vetter wrote:
  On Wed, Dec 31, 2014 at 04:23:00PM -0800, Kenneth Graunke wrote:
   Haswell significantly improved the performance of sampler_c messages,
   but the optimization appears to be off by default.  Later platforms
   remove this bit, and apparently always enable the optimization.
   
   Improves performance in Counter Strike: Global Offensive by 18%
   at default settings on Iris Pro.
   
   This may break sampling of paletted formats (P8/A8P8/P8A8).  It's
   unclear whether it affects sampling of paletted formats in general,
   or just the sample_c message (which is never used).
   
   While libva does have support for using paletted formats (primarily
   for OSDs), that support appears to have been broken for at least a
   year, so I couldn't observe a regression from this.
   
   Signed-off-by: Kenneth Graunke kenn...@whitecape.org
   ---
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_pm.c | 4 ++++
2 files changed, 5 insertions(+)
   
   Resubmitting the patch to unconditionally enable this.  I tried to get
   libva-intel to use paletted formats, and observe a regression...but the
   only thing I found that used it was mplayer's OSD (on screen display).
   Even without my patch, the colors were totally wrong with that, and
   according to a few distro wikis, that's been the case for over a year.
   
   If libva's code for paletted formats /is/ broken, they could always add
   code to disable this bit using the command validator when fixing it.
   
   Could we try merging this, and back it out if someone reports a
   regression?  I haven't observed any problems.  It's also been quite
   stable.
  
  Yeah makes sense. When resending please incorporate review feedback
  (Ville dug out the wa name), I've done that. And I've pasted the
  additional detail about the libva saga, just for reference (since no one
  will remember that it's mplayer's OSD which uses this 2 months down the
  road).
  
  Also please cc libva mailing lists next time around as an fyi. Done that
  too.
  
  Queued for -next, thanks for the patch.
  -Daniel
 
 Oh, sorry, I missed that in the review.
 
 Thanks, Daniel!
 
 --Ken


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-05 Thread Kenneth Graunke
On Tuesday, January 06, 2015 01:11:53 PM Xiang, Haihao wrote:
 
 Hi Kenneth,
 
 How did you test OSD ? I can't reproduce the issue you mentioned, OSD
 works well for me when using mplayer-vaapi with the latest
 libva/libva-intel-driver master branch.
 
 I tried your patch, what surprised me is OSD still works well after
 applying your patch. It seems your patch didn't disable the palette.
 
 Thanks
 Haihao

I ran:

mplayer -osdlevel 3 -vo vaapi big_buck_bunny_720p_stereo.ogg

For me, the OSD text is solid green, with hard edges.

If you use -vo gl or -vo xv, the OSD is solid white text with a black
border around it.  I presume that it's supposed to be white with vaapi as
well, but I guess I'm not entirely sure.

It's possible that the optimization doesn't affect the palette as long as
you never use sample_c with the paletted textures.

--Ken



Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2015-01-04 Thread shuang . he
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
PNV              363/364            363/364
ILK       -1     364/366            363/366
SNB       +2-1   443/450            444/450
IVB              496/498            496/498
BYT              288/289            288/289
HSW       +2-6   542/564            538/564
BDW              415/417            415/417
-Detailed-
Platform  Test  drm-intel-nightly  Series Applied
*ILK  igt_gem_fenced_exec_thrash_no-spare-fences-busy  PASS(2, M37)  DMESG_WARN(1, M37)
*SNB  igt_kms_flip_modeset-vs-vblank-race  DMESG_WARN(2, M35)  PASS(1, M35)
 SNB  igt_kms_flip_modeset-vs-vblank-race-interruptible  DMESG_WARN(2, M35M22)PASS(1, M35)  DMESG_WARN(1, M35)
 SNB  igt_kms_plane_plane-position-hole-pipe-B-plane-1  DMESG_WARN(1, M35)PASS(2, M35M22)  PASS(1, M35)
*HSW  igt_kms_flip_flip-vs-expired-vblank  PASS(2, M40)  DMESG_WARN(1, M40)
*HSW  igt_kms_flip_flip-vs-expired-vblank-interruptible  PASS(2, M40)  DMESG_WARN(1, M40)
*HSW  igt_kms_flip_nonexisting-fb  PASS(2, M40)  DMESG_WARN(1, M40)
*HSW  igt_kms_flip_nonexisting-fb-interruptible  PASS(2, M40)  DMESG_WARN(1, M40)
*HSW  igt_kms_flip_single-buffer-flip-vs-dpms-off-vs-modeset  PASS(3, M40M19)  DMESG_WARN(1, M40)
*HSW  igt_kms_plane_plane-panning-bottom-right-pipe-C-plane-1  TIMEOUT(2, M40)PASS(1, M19)  PASS(1, M40)
*HSW  igt_kms_plane_plane-position-hole-pipe-A-plane-2  PASS(2, M40)  DMESG_WARN(1, M40)
 HSW  igt_pm_rpm_modeset-non-lpsp-stress-no-wait  NSPT(1, M19)DMESG_WARN(1, M40)PASS(1, M40)  PASS(1, M40)
Note: You need to pay more attention to lines starting with '*'


[Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-12-31 Thread Kenneth Graunke
Haswell significantly improved the performance of sampler_c messages,
but the optimization appears to be off by default.  Later platforms
remove this bit, and apparently always enable the optimization.

Improves performance in Counter Strike: Global Offensive by 18%
at default settings on Iris Pro.

This may break sampling of paletted formats (P8/A8P8/P8A8).  It's
unclear whether it affects sampling of paletted formats in general,
or just the sample_c message (which is never used).

While libva does have support for using paletted formats (primarily
for OSDs), that support appears to have been broken for at least a
year, so I couldn't observe a regression from this.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 drivers/gpu/drm/i915/i915_reg.h | 1 +
 drivers/gpu/drm/i915/intel_pm.c | 4 ++++
 2 files changed, 5 insertions(+)

Resubmitting the patch to unconditionally enable this.  I tried to get
libva-intel to use paletted formats, and observe a regression...but the
only thing I found that used it was mplayer's OSD (on screen display).
Even without my patch, the colors were totally wrong with that, and
according to a few distro wikis, that's been the case for over a year.

If libva's code for paletted formats /is/ broken, they could always add
code to disable this bit using the command validator when fixing it.

Could we try merging this, and back it out if someone reports a
regression?  I haven't observed any problems.  It's also been quite
stable.

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 40ca873..0f32fd1a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6167,6 +6167,7 @@ enum punit_power_well {
 #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
 
 #define HALF_SLICE_CHICKEN3  0xe184
+#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
 #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
 #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 7d99a9c..17e84dc 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5974,6 +5974,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
     I915_WRITE(GEN7_GT_MODE,
                _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4));
 
+    /* Make sample_c messages faster. */
+    I915_WRITE(HALF_SLICE_CHICKEN3,
+               _MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE));
+
     /* WaSwitchSolVfFArbitrationPriority:hsw */
     I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
 
-- 
2.2.1
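
For readers unfamiliar with masked registers, the following standalone sketch
models what the write in the patch above is understood to do: the upper 16 bits
of the written value act as a per-bit write enable for the lower 16 bits, so
setting bit 9 leaves the other chicken bits alone. The helper names
(masked_bit_enable, masked_bit_disable, apply_masked_write) are hypothetical and
chosen for this example; they are not the kernel's macros, and apply_masked_write
is only a software model of the assumed hardware behaviour.

```c
#include <stdint.h>

/* Bit 9 of HALF_SLICE_CHICKEN3, as defined in the patch. */
#define SAMPLE_C_PERFORMANCE_BIT (1u << 9)

/* Build a write that sets `bit`: mask in bits 31:16, value in bits 15:0. */
static inline uint32_t masked_bit_enable(uint32_t bit)
{
    return (bit << 16) | bit;
}

/* Build a write that clears `bit`: mask set, value left at zero. */
static inline uint32_t masked_bit_disable(uint32_t bit)
{
    return bit << 16;
}

/* Software model: only bits selected by the mask half are updated. */
static inline uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
    uint32_t mask = write >> 16;
    uint32_t value = write & 0xffffu;

    return (reg & ~mask) | (value & mask);
}
```

Under this model, enabling the bit on a register that currently reads 0x0102
yields 0x0302, and a later masked disable returns it to 0x0102 without a
read-modify-write cycle.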



Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-11-11 Thread Daniel Vetter
On Fri, Nov 07, 2014 at 10:46:31AM -0800, Matt Turner wrote:
 On Wed, Oct 29, 2014 at 2:12 PM, Kenneth Graunke kenn...@whitecape.org wrote:
  Haswell significantly improved the performance of sampler_c messages,
  but the optimization appears to be off by default.  Later platforms
  remove this bit, and apparently always enable the optimization.
 
  Improves performance in Counter Strike: Global Offensive by 18%
  at default settings on Iris Pro.  No Piglit regressions.
 
 Discussion seems to have ended, but Mesa really needs this.
 
 So, Kernel Team, how are we going to make sure Mesa gets this 18%
 performance improvement?

Someone writes the patches+tests and everything, gets it all reviewed and
I'll merge it.

I'd prefer if we could do this with the cmd parser, but that thing seems
to be eternally stuck. So a new execbuf flag is imo ok too. execbuf
already tracks a few other lonesome bits and makes sure userspace actually
gets the setting it requests, so should work out.
-Daniel


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-11-07 Thread Matt Turner
On Wed, Oct 29, 2014 at 2:12 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 Haswell significantly improved the performance of sampler_c messages,
 but the optimization appears to be off by default.  Later platforms
 remove this bit, and apparently always enable the optimization.

 Improves performance in Counter Strike: Global Offensive by 18%
 at default settings on Iris Pro.  No Piglit regressions.

Discussion seems to have ended, but Mesa really needs this.

So, Kernel Team, how are we going to make sure Mesa gets this 18%
performance improvement?


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-11-04 Thread Xiang, Haihao
On Mon, 2014-11-03 at 13:48 +0100, Daniel Vetter wrote:
 On Fri, Oct 31, 2014 at 11:27:33AM +0200, Ville Syrjälä wrote:
  On Thu, Oct 30, 2014 at 12:57:04PM -0700, Kenneth Graunke wrote:
   Before we get too much further...we should check if libva is actually
   broken.  I don't know if this means the sampler palette completely doesn't
   work, or if it just means sample_c doesn't work with the palette.  If it's
   the latter, we're probably fine, because I doubt libva uses sample_c.
 
  Yeah if we wouldn't break any existing userspace I guess we could just
  flip the switch in the kernel. If anyone later wants to start doing
  something that no longer works they'd have to deal with disabling the
  bit using an LRI.
 
 It very much looks like libva uses palettes, since it supports C8 and C4
 image formats (well some crazy fourcc nonsense, but meh). And it does so
 on all generations supported by the libva driver, i.e. including hsw afaict.
 
 Cc'ing people and lists with more clue who should be able to tell whether
 it's not just there but actually works ...

Yes, libva driver uses sample (0) and palette on HSW

Thanks
Haihao


 -Daniel




Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-11-03 Thread Daniel Vetter
On Fri, Oct 31, 2014 at 11:27:33AM +0200, Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 12:57:04PM -0700, Kenneth Graunke wrote:
  Before we get too much further...we should check if libva is actually
  broken.  I don't know if this means the sampler palette completely doesn't
  work, or if it just means sample_c doesn't work with the palette.  If it's
  the latter, we're probably fine, because I doubt libva uses sample_c.

 Yeah if we wouldn't break any existing userspace I guess we could just
 flip the switch in the kernel. If anyone later wants to start doing
 something that no longer works they'd have to deal with disabling the
 bit using an LRI.

It very much looks like libva uses palettes, since it supports C8 and C4
image formats (well some crazy fourcc nonsense, but meh). And it does so
on all generations supported by the libva driver, i.e. including hsw afaict.

Cc'ing people and lists with more clue who should be able to tell whether
it's not just there but actually works ...
-Daniel
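
The LRI route Ville mentions in the quoted text could look roughly like the
sketch below: userspace that needs the palette emits an MI_LOAD_REGISTER_IMM
into its batch to clear the bit again. This is an illustration under
assumptions, not code from any driver: the function name and batch layout are
hypothetical, the command encoding (opcode 0x22, length field 1 for a single
register) is assumed from the usual MI command format, and whether the kernel
would let such a write through depends on the command parser discussed in this
thread.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_SLICE_CHICKEN3       0xe184u
#define HSW_SAMPLE_C_PERFORMANCE  (1u << 9)

/* Assumed encoding of MI_LOAD_REGISTER_IMM for one register/value pair. */
#define MI_LRI_ONE_REG ((0x22u << 23) | 1u)

/*
 * Hypothetical helper: append an LRI that clears the sample_c fast-mode bit.
 * Returns the number of dwords written to `batch` (always 3).
 */
size_t emit_disable_sample_c_fast_mode(uint32_t *batch)
{
    size_t n = 0;

    batch[n++] = MI_LRI_ONE_REG;                 /* command header */
    batch[n++] = HALF_SLICE_CHICKEN3;            /* register offset */
    batch[n++] = HSW_SAMPLE_C_PERFORMANCE << 16; /* mask set, value clear */

    return n;
}
```

The third dword uses the masked-register convention: only the write-enable half
carries the bit, so the hardware clears bit 9 and leaves the rest of the
register untouched.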


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-11-03 Thread Dave Gordon
On 30/10/14 19:26, Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
 On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
 On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
 On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
 Haswell significantly improved the performance of sampler_c messages,
 but the optimization appears to be off by default.  Later platforms
 remove this bit, and apparently always enable the optimization.

 Improves performance in Counter Strike: Global Offensive by 18%
 at default settings on Iris Pro.  No Piglit regressions.

 Nice. We need more bits like this ;)


 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  drivers/gpu/drm/i915/i915_reg.h | 1 +
  drivers/gpu/drm/i915/intel_pm.c | 4 ++++
  2 files changed, 5 insertions(+)

 diff --git a/drivers/gpu/drm/i915/i915_reg.h 
 b/drivers/gpu/drm/i915/i915_reg.h
 index 77fce96..340821a 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -5952,6 +5952,7 @@ enum punit_power_well {
  #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
  
  #define HALF_SLICE_CHICKEN3  0xe184
 +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
  #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
  #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
  
 diff --git a/drivers/gpu/drm/i915/intel_pm.c 
 b/drivers/gpu/drm/i915/intel_pm.c
 index 7a69eba..50c72a7 100644
 --- a/drivers/gpu/drm/i915/intel_pm.c
 +++ b/drivers/gpu/drm/i915/intel_pm.c
 @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct 
 drm_device *dev)
 I915_WRITE(GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
  
 +   /* Make sample_c messages faster. */

 I found a name for it in the w/a database.

 WaSampleCChickenBitEnable:hsw

 Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

 Oh actually it says palette won't work when this bit is on. I'm assuming
 that's the texture palette. Do we have any use of that anywhere?

 That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed 
 formats aren't used by Mesa or xf86-video-intel, but it looks like they
 might be used by libva.

 Can someone confirm that libva does use the sampler palette?

 If they do, what do we do about it?

 I suppose the best option then would be to use an LRI from a batch,
 which means the register would need to be added to the cmd parser
 white list. This is one of the context saved registers so doing the
 LRI just once per context should be enough.

 I don't like that solution.  For one, it's impossible - you can't LRI from 
 userspace batches, even if you add it to the kernel command parser's 
 whitelist, because the hardware scanner is still enabled.  Given that I've 
 been waiting two years for this capability, I want to find a more immediate 
 solution.
 
 Ah. I've somehow convinced myself the cmd parser might actually be doing
 something besides just eating CPU cycles these days. But I guess not.
 

 Another option is to have some sort of execbuf flag...maybe a 3D/Media usage 
 flag.  If set to 3D, write 0x6000200...if media, write 0x600.  Or 
 something specific.  I do hate adding more junk to the execbuf path, though.

 Other ideas?
 
 Fast vs. slow flag? :)
 
 More seriously, one somewhat crappy option would be to initialize that
 bit to 1 for all explicit contexts, and then have the kernel always turn
 it off before executing something with the default context. It's not
 unlike how we imagined the RS stuff would work since old userspace
 doesn't know to turn RS off when using the default context.
 
 But if we would do something like this, I think it would be nice if we
 could just add it as a temporary hack and potentially drop it if we
 manage to lose the restore inhibit flag and someone finishes the
 cmd parser. So maybe Mesa could also try to set it from a batch,
 which currently would be a nop or rejected by the cmd parser, but
 when the cmd parser starts to work it might actually do something?
 
 But always initializing it to 1 for all explicit context in the kernel
 obviously requires that libva doesn't use an explicit context itself.
 In case it does, another idea would be to just add a hack to the cmd
 parser to emit the LRI on the ring when it encounters this register.
 Not very pretty, but should work somewhat. It wouldn't allow mid-batch
 changes to the register value though, but enough for setting it once
 for each context in Mesa. And of course we'd still need the kernel to
 turn it off before any default context will see it.

This looks like a use for the proposed DRM_I915_GEM_CONTEXT_CREATE2
ioctl, which (so far) allows setting of scheduling priority for batches
submitted in the new context, and/or declaring that the context will be
used for GPGPU batches. We 

Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-31 Thread Ville Syrjälä
On Thu, Oct 30, 2014 at 12:57:04PM -0700, Kenneth Graunke wrote:
 On Thursday, October 30, 2014 09:26:01 PM Ville Syrjälä wrote:
  On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
   On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
 On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
  On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
   On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
Haswell significantly improved the performance of sampler_c messages,
but the optimization appears to be off by default.  Later platforms
remove this bit, and apparently always enable the optimization.

Improves performance in Counter Strike: Global Offensive by 18%
at default settings on Iris Pro.  No Piglit regressions.
   
   Nice. We need more bits like this ;)
   

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 drivers/gpu/drm/i915/i915_reg.h | 1 +
 drivers/gpu/drm/i915/intel_pm.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h 
 b/drivers/gpu/drm/i915/i915_reg.h
index 77fce96..340821a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5952,6 +5952,7 @@ enum punit_power_well {
 #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
 
 #define HALF_SLICE_CHICKEN3  0xe184
+#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
 #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
 #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c 
 b/drivers/gpu/drm/i915/intel_pm.c
index 7a69eba..50c72a7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
I915_WRITE(GEN7_GT_MODE,
   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
 
+   /* Make sample_c messages faster. */
   
   I found a name for it in the w/a database.
   
   WaSampleCChickenBitEnable:hsw
   
   Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com
  
  Oh actually it says palette won't work when this bit is on. I'm assuming
  that's the texture palette. Do we have any use of that anywhere?
 
 That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed 
 formats aren't used by Mesa or xf86-video-intel, but it looks like they might 
 be used by libva.
 
 Can someone confirm that libva does use the sampler palette?
 
 If they do, what do we do about it?

I suppose the best option then would be to use an LRI from a batch,
which means the register would need to be added to the cmd parser
white list. This is one of the context saved registers so doing the
LRI just once per context should be enough.
   
   I don't like that solution.  For one, it's impossible - you can't LRI from 
   userspace batches, even if you add it to the kernel command parser's 
   whitelist, because the hardware scanner is still enabled.  Given that I've 
   been waiting two years for this capability, I want to find a more immediate 
   solution.
  
  Ah. I've somehow convinced myself the cmd parser might actually be doing
  something besides just eating CPU cycles these days. But I guess not.
  
   
   Another option is to have some sort of execbuf flag...maybe a 3D/Media usage 
   flag.  If set to 3D, write 0x6000200...if media, write 0x600.  Or 
   something specific.  I do hate adding more junk to the execbuf path, though.
   
   Other ideas?
  
  Fast vs. slow flag? :)
  
  More seriously, one somewhat crappy option would be to initialize that
  bit to 1 for all explicit contexts, and then have the kernel always turn
  it off before executing something with the default context. It's not
  unlike how we imagined the RS stuff would work since old userspace
  doesn't know to turn RS off when using the default context.
 
 Interesting idea - that might work.  We don't need mid-batch changes either.
 
 I don't think HALF_SLICE_CHICKEN3 is part of the logical context, FWIW.

Oh but it is. Lots of chickens like to nest in the context.

 
 Before we get too much further...we should check if libva is actually broken. 
  
 I don't know if this means the sampler palette completely doesn't work, or if 
 it just means sample_c doesn't work with the palette.  If it's the latter, 
 we're probably fine, because I doubt libva uses sample_c.

Yeah if we wouldn't break any existing userspace I guess we could just
flip the switch in the kernel. If anyone 

Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-31 Thread Jani Nikula

[off-topic]

On Fri, 31 Oct 2014, Ville Syrjälä ville.syrj...@linux.intel.com wrote:
 On Thu, Oct 30, 2014 at 12:57:04PM -0700, Kenneth Graunke wrote:
 I don't think HALF_SLICE_CHICKEN3 is part of the logical context, FWIW.

 Oh but it is. Lots of chickens like to nest in the context.

In the context of lunch, HALF_SLICE_CHICKEN3 is indeed part of the
context. But it has certainly ceased to nest.


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-30 Thread Ville Syrjälä
On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
 Haswell significantly improved the performance of sampler_c messages,
 but the optimization appears to be off by default.  Later platforms
 remove this bit, and apparently always enable the optimization.
 
 Improves performance in Counter Strike: Global Offensive by 18%
 at default settings on Iris Pro.  No Piglit regressions.

Nice. We need more bits like this ;)

 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  drivers/gpu/drm/i915/i915_reg.h | 1 +
  drivers/gpu/drm/i915/intel_pm.c | 4 ++++
  2 files changed, 5 insertions(+)
 
 diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
 index 77fce96..340821a 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -5952,6 +5952,7 @@ enum punit_power_well {
  #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
  
  #define HALF_SLICE_CHICKEN3  0xe184
 +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
  #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
  #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
  
 diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
 index 7a69eba..50c72a7 100644
 --- a/drivers/gpu/drm/i915/intel_pm.c
 +++ b/drivers/gpu/drm/i915/intel_pm.c
 @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct 
 drm_device *dev)
   I915_WRITE(GEN7_GT_MODE,
  GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
  
 + /* Make sample_c messages faster. */

I found a name for it in the w/a database.

WaSampleCChickenBitEnable:hsw

Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

 + I915_WRITE(HALF_SLICE_CHICKEN3,
 +_MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE));
 +
   /* WaSwitchSolVfFArbitrationPriority:hsw */
   I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
  
 -- 
 2.1.2
 

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-30 Thread Ville Syrjälä
On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
 On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
  Haswell significantly improved the performance of sampler_c messages,
  but the optimization appears to be off by default.  Later platforms
  remove this bit, and apparently always enable the optimization.
  
  Improves performance in Counter Strike: Global Offensive by 18%
  at default settings on Iris Pro.  No Piglit regressions.
 
 Nice. We need more bits like this ;)
 
  
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  ---
   drivers/gpu/drm/i915/i915_reg.h | 1 +
   drivers/gpu/drm/i915/intel_pm.c | 4 ++++
   2 files changed, 5 insertions(+)
  
  diff --git a/drivers/gpu/drm/i915/i915_reg.h 
  b/drivers/gpu/drm/i915/i915_reg.h
  index 77fce96..340821a 100644
  --- a/drivers/gpu/drm/i915/i915_reg.h
  +++ b/drivers/gpu/drm/i915/i915_reg.h
  @@ -5952,6 +5952,7 @@ enum punit_power_well {
   #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
   
   #define HALF_SLICE_CHICKEN3  0xe184
  +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
   #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
   #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
   
  diff --git a/drivers/gpu/drm/i915/intel_pm.c 
  b/drivers/gpu/drm/i915/intel_pm.c
  index 7a69eba..50c72a7 100644
  --- a/drivers/gpu/drm/i915/intel_pm.c
  +++ b/drivers/gpu/drm/i915/intel_pm.c
  @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct 
  drm_device *dev)
  I915_WRITE(GEN7_GT_MODE,
 GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
   
  +   /* Make sample_c messages faster. */
 
 I found a name for it in the w/a database.
 
 WaSampleCChickenBitEnable:hsw
 
 Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

Oh actually it says palette won't work when this bit is on. I'm assuming
that's the texture palette. Do we have any use of that anywhere?

 
  +   I915_WRITE(HALF_SLICE_CHICKEN3,
  +  _MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE));
  +
  /* WaSwitchSolVfFArbitrationPriority:hsw */
  I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
   
  -- 
  2.1.2
  
 
 -- 
 Ville Syrjälä
 Intel OTC

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-30 Thread Kenneth Graunke
On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
  On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
   Haswell significantly improved the performance of sampler_c messages,
   but the optimization appears to be off by default.  Later platforms
   remove this bit, and apparently always enable the optimization.
   
   Improves performance in Counter Strike: Global Offensive by 18%
   at default settings on Iris Pro.  No Piglit regressions.
  
  Nice. We need more bits like this ;)
  
   
   Signed-off-by: Kenneth Graunke kenn...@whitecape.org
   ---
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_pm.c | 4 ++++
2 files changed, 5 insertions(+)
   
   diff --git a/drivers/gpu/drm/i915/i915_reg.h 
b/drivers/gpu/drm/i915/i915_reg.h
   index 77fce96..340821a 100644
   --- a/drivers/gpu/drm/i915/i915_reg.h
   +++ b/drivers/gpu/drm/i915/i915_reg.h
   @@ -5952,6 +5952,7 @@ enum punit_power_well {
#define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)

#define HALF_SLICE_CHICKEN3  0xe184
   +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
#define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
#define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)

   diff --git a/drivers/gpu/drm/i915/intel_pm.c 
b/drivers/gpu/drm/i915/intel_pm.c
   index 7a69eba..50c72a7 100644
   --- a/drivers/gpu/drm/i915/intel_pm.c
   +++ b/drivers/gpu/drm/i915/intel_pm.c
   @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct 
drm_device *dev)
 I915_WRITE(GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);

   + /* Make sample_c messages faster. */
  
  I found a name for it in the w/a database.
  
  WaSampleCChickenBitEnable:hsw
  
  Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com
 
 Oh actually it says palette won't work when this bit is on. I'm assuming
 that's the texture palette. Do we have any use of that anywhere?

That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed 
formats aren't used by Mesa or xf86-video-intel, but it looks like they might 
be used by libva.

Can someone confirm that libva does use the sampler palette?

If they do, what do we do about it?

--Ken



Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-30 Thread Ville Syrjälä
On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
 On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
  On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
   On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
Haswell significantly improved the performance of sampler_c messages,
but the optimization appears to be off by default.  Later platforms
remove this bit, and apparently always enable the optimization.

Improves performance in Counter Strike: Global Offensive by 18%
at default settings on Iris Pro.  No Piglit regressions.
   
   Nice. We need more bits like this ;)
   

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 drivers/gpu/drm/i915/i915_reg.h | 1 +
 drivers/gpu/drm/i915/intel_pm.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h 
 b/drivers/gpu/drm/i915/i915_reg.h
index 77fce96..340821a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5952,6 +5952,7 @@ enum punit_power_well {
 #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
 
 #define HALF_SLICE_CHICKEN3  0xe184
+#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
 #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
 #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c 
 b/drivers/gpu/drm/i915/intel_pm.c
index 7a69eba..50c72a7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct 
 drm_device *dev)
I915_WRITE(GEN7_GT_MODE,
   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
 
+   /* Make sample_c messages faster. */
   
   I found a name for it in the w/a database.
   
   WaSampleCChickenBitEnable:hsw
   
   Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com
  
  Oh actually it says palette won't work when this bit is on. I'm assuming
  that's the texture palette. Do we have any use of that anywhere?
 
 That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed 
 formats aren't used by Mesa or xf86-video-intel, but it looks like they might 
 be used by libva.
 
 Can someone confirm that libva does use the sampler palette?
 
 If they do, what do we do about it?

I suppose the best option then would be to use an LRI from a batch,
which means the register would need to be added to the cmd parser
white list. This is one of the context saved registers so doing the
LRI just once per context should be enough.

Well that's assuming libva doesn't use the default context. I'm getting
another itch to drop the restore inhibit flag for default contexts.
That would actually make it possible to do these sort of things without
risking breakage to existing userspace. But I think Chris is going to
scream unless the patch comes with performance data that shows it
doesn't hurt too much.
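
For reference, the LRI suggested above is a three-dword MI_LOAD_REGISTER_IMM
packet. The sketch below only illustrates the encoding, using the MI_INSTR and
MI_LOAD_REGISTER_IMM definitions from i915_reg.h and the register and bit from
the patch. emit_sample_c_lri and MASKED_BIT_ENABLE (a stand-in for the kernel's
_MASKED_BIT_ENABLE) are hypothetical names, not existing kernel, Mesa, or libva
code, and the sketch says nothing about whether the hardware or the cmd parser
would actually accept the write from a userspace batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command encoding as defined in i915_reg.h. */
#define MI_INSTR(opcode, flags)   (((uint32_t)(opcode) << 23) | (flags))
#define MI_LOAD_REGISTER_IMM(x)   MI_INSTR(0x22, 2 * (x) - 1)

/* Register offset and bit taken from the patch under discussion. */
#define HALF_SLICE_CHICKEN3       0xe184
#define HSW_SAMPLE_C_PERFORMANCE  (1 << 9)

/* Stand-in for the kernel's _MASKED_BIT_ENABLE(): mask in the high half. */
#define MASKED_BIT_ENABLE(a)      ((uint32_t)(((a) << 16) | (a)))

/*
 * Hypothetical helper: append one single-register LRI that sets the
 * sample_c bit. Returns the new number of dwords in the batch.
 */
static size_t emit_sample_c_lri(uint32_t *batch, size_t n, size_t capacity)
{
	assert(n + 3 <= capacity);
	batch[n++] = MI_LOAD_REGISTER_IMM(1);  /* header: one register follows */
	batch[n++] = HALF_SLICE_CHICKEN3;      /* register offset */
	batch[n++] = MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE);
	return n;
}
```

Called on an empty three-dword buffer, this would produce 0x11000001, 0xe184,
0x02000200. Because the register is context saved, a packet like this would
only need to be emitted once per context.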

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-30 Thread Kenneth Graunke
On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
  On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
   On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
 Haswell significantly improved the performance of sampler_c messages,
 but the optimization appears to be off by default.  Later platforms
 remove this bit, and apparently always enable the optimization.
 
 Improves performance in Counter Strike: Global Offensive by 18%
 at default settings on Iris Pro.  No Piglit regressions.

Nice. We need more bits like this ;)

 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  drivers/gpu/drm/i915/i915_reg.h | 1 +
  drivers/gpu/drm/i915/intel_pm.c | 4 ++++
  2 files changed, 5 insertions(+)
 
 diff --git a/drivers/gpu/drm/i915/i915_reg.h 
  b/drivers/gpu/drm/i915/i915_reg.h
 index 77fce96..340821a 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -5952,6 +5952,7 @@ enum punit_power_well {
  #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
  
  #define HALF_SLICE_CHICKEN3  0xe184
 +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
  #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
  #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
  
 diff --git a/drivers/gpu/drm/i915/intel_pm.c 
  b/drivers/gpu/drm/i915/intel_pm.c
 index 7a69eba..50c72a7 100644
 --- a/drivers/gpu/drm/i915/intel_pm.c
 +++ b/drivers/gpu/drm/i915/intel_pm.c
 @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct 
  drm_device *dev)
   I915_WRITE(GEN7_GT_MODE,
  GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
  
 + /* Make sample_c messages faster. */

I found a name for it in the w/a database.

WaSampleCChickenBitEnable:hsw

Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com
   
   Oh actually it says palette won't work when this bit is on. I'm assuming
   that's the texture palette. Do we have any use of that anywhere?
  
  That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed 
  formats aren't used by Mesa or xf86-video-intel, but it looks like they might 
  be used by libva.
  
  Can someone confirm that libva does use the sampler palette?
  
  If they do, what do we do about it?
 
 I suppose the best option then would be to use an LRI from a batch,
 which means the register would need to be added to the cmd parser
 white list. This is one of the context saved registers so doing the
 LRI just once per context should be enough.

I don't like that solution.  For one, it's impossible - you can't LRI from 
userspace batches, even if you add it to the kernel command parser's 
whitelist, because the hardware scanner is still enabled.  Given that I've 
been waiting two years for this capability, I want to find a more immediate 
solution.

Another option is to have some sort of execbuf flag...maybe a 3D/Media usage 
flag.  If set to 3D, write 0x6000200...if media, write 0x600.  Or 
something specific.  I do hate adding more junk to the execbuf path, though.
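
To make the two values concrete: the patch writes HALF_SLICE_CHICKEN3 through
_MASKED_BIT_ENABLE, so the upper 16 bits of a write select which of the lower
16 bits change. A minimal sketch of that encoding follows, assuming only the
single HSW_SAMPLE_C_PERFORMANCE bit from the patch. The helper names are
illustrative, and apply_masked_write is a software model of the register, not
a driver function.

```c
#include <assert.h>
#include <stdint.h>

#define HSW_SAMPLE_C_PERFORMANCE  (1u << 9)

/* Same arithmetic as the kernel's _MASKED_BIT_ENABLE(). */
static uint32_t masked_bit_enable(uint32_t bits)
{
	assert(bits <= 0xffffu);  /* only the low 16 bits carry data */
	return (bits << 16) | bits;
}

/* Same arithmetic as the kernel's _MASKED_BIT_DISABLE(). */
static uint32_t masked_bit_disable(uint32_t bits)
{
	assert(bits <= 0xffffu);
	return bits << 16;
}

/* Model of a masked register: only bits named in the mask are updated. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;

	return ((reg & ~mask) | (write & mask)) & 0xffffu;
}
```

Under this model a "3D" write for the one bit would be 0x02000200 and a
"media" write 0x02000000. The 0x6000200 quoted above decodes as a mask over
bits 9 and 10 with only bit 9 set; nothing here says what bit 10 is, so the
sketch leaves it out.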

Other ideas?

 Well that's assuming libva doesn't use the default context. I'm getting
 another itch to drop the restore inhibit flag for default contexts.
 That would actually make it possible to do these sort of things without
 risking breakage to existing userspace. But I think Chris is going to
 scream unless the patch comes with performance data that shows it
 doesn't hurt too much.

I suppose it wouldn't affect Mesa much, since we never use the default context 
on Gen6+.  But otherwise I'd probably want to see the data, like Chris...



Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-30 Thread Ville Syrjälä
On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
 On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
  On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
   On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
 On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
  Haswell significantly improved the performance of sampler_c messages,
  but the optimization appears to be off by default.  Later platforms
  remove this bit, and apparently always enable the optimization.
  
  Improves performance in Counter Strike: Global Offensive by 18%
  at default settings on Iris Pro.  No Piglit regressions.
 
 Nice. We need more bits like this ;)
 
  
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  ---
   drivers/gpu/drm/i915/i915_reg.h | 1 +
   drivers/gpu/drm/i915/intel_pm.c | 4 ++++
   2 files changed, 5 insertions(+)
  
  diff --git a/drivers/gpu/drm/i915/i915_reg.h 
   b/drivers/gpu/drm/i915/i915_reg.h
  index 77fce96..340821a 100644
  --- a/drivers/gpu/drm/i915/i915_reg.h
  +++ b/drivers/gpu/drm/i915/i915_reg.h
  @@ -5952,6 +5952,7 @@ enum punit_power_well {
   #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)
   
   #define HALF_SLICE_CHICKEN3  0xe184
  +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
   #define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
   #define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)
   
  diff --git a/drivers/gpu/drm/i915/intel_pm.c 
   b/drivers/gpu/drm/i915/intel_pm.c
  index 7a69eba..50c72a7 100644
  --- a/drivers/gpu/drm/i915/intel_pm.c
  +++ b/drivers/gpu/drm/i915/intel_pm.c
  @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct 
   drm_device *dev)
  I915_WRITE(GEN7_GT_MODE,
 GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
   
  +   /* Make sample_c messages faster. */
 
 I found a name for it in the w/a database.
 
 WaSampleCChickenBitEnable:hsw
 
 Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

Oh actually it says palette won't work when this bit is on. I'm assuming
that's the texture palette. Do we have any use of that anywhere?
   
   That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed 
   formats aren't used by Mesa or xf86-video-intel, but it looks like they might 
   be used by libva.
   
   Can someone confirm that libva does use the sampler palette?
   
   If they do, what do we do about it?
  
  I suppose the best option then would be to use an LRI from a batch,
  which means the register would need to be added to the cmd parser
  white list. This is one of the context saved registers so doing the
  LRI just once per context should be enough.
 
 I don't like that solution.  For one, it's impossible - you can't LRI from 
 userspace batches, even if you add it to the kernel command parser's 
 whitelist, because the hardware scanner is still enabled.  Given that I've 
 been waiting two years for this capability, I want to find a more immediate 
 solution.

Ah. I've somehow convinced myself the cmd parser might actually be doing
something besides just eating CPU cycles these days. But I guess not.

 
 Another option is to have some sort of execbuf flag...maybe a 3D/Media usage 
 flag.  If set to 3D, write 0x6000200...if media, write 0x600.  Or 
 something specific.  I do hate adding more junk to the execbuf path, though.
 
 Other ideas?

Fast vs. slow flag? :)

More seriously, one somewhat crappy option would be to initialize that
bit to 1 for all explicit contexts, and then have the kernel always turn
it off before executing something with the default context. It's not
unlike how we imagined the RS stuff would work since old userspace
doesn't know to turn RS off when using the default context.

But if we would do something like this, I think it would be nice if we
could just add it as a temporary hack and potentially drop it if we
manage to lose the restore inhibit flag and someone finishes the
cmd parser. So maybe Mesa could also try to set it from a batch,
which currently would be a nop or rejected by the cmd parser, but
when the cmd parser starts to work it might actually do something?

But always initializing it to 1 for all explicit context in the kernel
obviously requires that libva doesn't use an explicit context itself.
In case it does, another idea would be to just add a hack to the cmd
parser to emit the LRI on the ring when it encounters this register.
Not very pretty, but should work somewhat. It wouldn't allow mid-batch
changes to the register value though, but enough for setting it once
for each context in Mesa. And of course we'd still need the kernel to
turn it off before any default context will see it.
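
The policy described above could be sketched as follows, assuming a
per-context flag that tells the default context apart from explicit ones.
struct ctx_sketch and sample_c_value_for are hypothetical names, not existing
i915 code; the returned values use the same masked-write encoding as the
patch (mask in the upper 16 bits, data in the lower 16).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HSW_SAMPLE_C_PERFORMANCE  (1u << 9)

/* Hypothetical stand-in for the kernel's context object. */
struct ctx_sketch {
	bool is_default;  /* true for the default context */
};

/*
 * Hypothetical policy: the masked value to write to HALF_SLICE_CHICKEN3
 * before running work from this context.
 */
static uint32_t sample_c_value_for(const struct ctx_sketch *ctx)
{
	uint32_t mask = HSW_SAMPLE_C_PERFORMANCE << 16;

	assert(ctx != NULL);
	if (ctx->is_default)
		return mask;  /* bit off: old userspace may rely on the palette */
	return mask | HSW_SAMPLE_C_PERFORMANCE;  /* bit on: explicit context */
}
```

This only works as intended if palette users such as libva stay on the
default context, which is the open question in this thread.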

 
  Well that's assuming libva doesn't use the 

Re: [Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-30 Thread Kenneth Graunke
On Thursday, October 30, 2014 09:26:01 PM Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
  On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
   On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
 On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
  On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
   Haswell significantly improved the performance of sampler_c messages,
   but the optimization appears to be off by default.  Later platforms
   remove this bit, and apparently always enable the optimization.
   
   Improves performance in Counter Strike: Global Offensive by 18%
   at default settings on Iris Pro.  No Piglit regressions.
  
  Nice. We need more bits like this ;)
  
   
   Signed-off-by: Kenneth Graunke kenn...@whitecape.org
   ---
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_pm.c | 4 ++++
2 files changed, 5 insertions(+)
   
   diff --git a/drivers/gpu/drm/i915/i915_reg.h 
b/drivers/gpu/drm/i915/i915_reg.h
   index 77fce96..340821a 100644
   --- a/drivers/gpu/drm/i915/i915_reg.h
   +++ b/drivers/gpu/drm/i915/i915_reg.h
   @@ -5952,6 +5952,7 @@ enum punit_power_well {
#define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE  (1 << 6)

#define HALF_SLICE_CHICKEN3  0xe184
   +#define   HSW_SAMPLE_C_PERFORMANCE   (1<<9)
#define   GEN8_CENTROID_PIXEL_OPT_DIS  (1<<8)
#define   GEN8_SAMPLER_POWER_BYPASS_DIS  (1<<1)

   diff --git a/drivers/gpu/drm/i915/intel_pm.c 
b/drivers/gpu/drm/i915/intel_pm.c
   index 7a69eba..50c72a7 100644
   --- a/drivers/gpu/drm/i915/intel_pm.c
   +++ b/drivers/gpu/drm/i915/intel_pm.c
   @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
 I915_WRITE(GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);

   + /* Make sample_c messages faster. */
  
  I found a name for it in the w/a database.
  
  WaSampleCChickenBitEnable:hsw
  
  Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com
 
 Oh actually it says palette won't work when this bit is on. I'm assuming
 that's the texture palette. Do we have any use of that anywhere?

That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed 
formats aren't used by Mesa or xf86-video-intel, but it looks like they might 
be used by libva.

Can someone confirm that libva does use the sampler palette?

If they do, what do we do about it?
   
   I suppose the best option then would be to use an LRI from a batch,
   which means the register would need to be added to the cmd parser
   white list. This is one of the context saved registers so doing the
   LRI just once per context should be enough.
  
  I don't like that solution.  For one, it's impossible - you can't LRI from
  userspace batches, even if you add it to the kernel command parser's
  whitelist, because the hardware scanner is still enabled.  Given that I've
  been waiting two years for this capability, I want to find a more immediate
  solution.
 
 Ah. I've somehow convinced myself the cmd parser might actually be doing
 something besides just eating CPU cycles these days. But I guess not.
 
  
  Another option is to have some sort of execbuf flag...maybe a 3D/Media usage
  flag.  If set to 3D, write 0x6000200...if media, write 0x600.  Or
  something specific.  I do hate adding more junk to the execbuf path, though.
  
  Other ideas?
 
 Fast vs. slow flag? :)
 
 More seriously, one somewhat crappy option would be to initialize that
 bit to 1 for all explicit contexts, and then have the kernel always turn
 it off before executing something with the default context. It's not
 unlike how we imagined the RS stuff would work since old userspace
 doesn't know to turn RS off when using the default context.
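
A rough sketch of the per-context idea above, for illustration only. It
is not i915 code: struct hyp_context, sample_c_write_for() and
SAMPLE_C_PERFORMANCE_BIT are hypothetical names, and bit 9 is assumed to
be the sample_c performance bit. It only computes the masked-register
value the kernel would write before running a batch for a given context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the sample_c performance bit is bit 9 of the register. */
#define SAMPLE_C_PERFORMANCE_BIT (1u << 9)

/* Hypothetical stand-in for a GPU context; not a real i915 structure. */
struct hyp_context {
	bool is_default; /* true for the implicit default context */
};

/*
 * Return the masked-register value to write before executing work
 * for ctx. The upper 16 bits select which bits are written, the lower
 * 16 bits carry the new values.
 */
static inline uint32_t sample_c_write_for(const struct hyp_context *ctx)
{
	uint32_t mask = SAMPLE_C_PERFORMANCE_BIT << 16;

	assert(ctx);
	if (ctx->is_default)
		return mask; /* select the bit, write 0: optimization off */
	return mask | SAMPLE_C_PERFORMANCE_BIT; /* select the bit, write 1 */
}
```

Under this assumption old userspace, which only ever uses the default
context, would keep the current behaviour, and only explicit contexts
would get the faster sample_c path.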

Interesting idea - that might work.  We don't need mid-batch changes either.

I don't think HALF_SLICE_CHICKEN3 is part of the logical context, FWIW.

Before we get too much further...we should check if libva is actually broken.  
I don't know if this means the sampler palette completely doesn't work, or if 
it just means sample_c doesn't work with the palette.  If it's the latter, 
we're probably fine, because I doubt libva uses sample_c.

--Ken

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

2014-10-29 Thread Kenneth Graunke
Haswell significantly improved the performance of sampler_c messages,
but the optimization appears to be off by default.  Later platforms
remove this bit, and apparently always enable the optimization.

Improves performance in Counter Strike: Global Offensive by 18%
at default settings on Iris Pro.  No Piglit regressions.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 drivers/gpu/drm/i915/i915_reg.h | 1 +
 drivers/gpu/drm/i915/intel_pm.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 77fce96..340821a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5952,6 +5952,7 @@ enum punit_power_well {
 #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE	(1 << 6)
 
 #define HALF_SLICE_CHICKEN3		0xe184
+#define   HSW_SAMPLE_C_PERFORMANCE	(1<<9)
 #define   GEN8_CENTROID_PIXEL_OPT_DIS	(1<<8)
 #define   GEN8_SAMPLER_POWER_BYPASS_DIS	(1<<1)
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 7a69eba..50c72a7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
 	I915_WRITE(GEN7_GT_MODE,
 		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
 
+	/* Make sample_c messages faster. */
+	I915_WRITE(HALF_SLICE_CHICKEN3,
+		   _MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE));
+
 	/* WaSwitchSolVfFArbitrationPriority:hsw */
 	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
 
-- 
2.1.2
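
For readers unfamiliar with masked registers, here is a small
standalone sketch of what the _MASKED_BIT_ENABLE() write in the patch
amounts to. The helper names below are hypothetical stand-ins modeled
on the i915 macros, and apply_masked_write() is only a software model
of the hardware behaviour, assuming the usual layout where the upper 16
bits act as a per-bit write enable for the lower 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for _MASKED_BIT_ENABLE(): select bits, set them. */
static inline uint32_t masked_bit_enable(uint32_t bits)
{
	assert((bits >> 16) == 0); /* only the low 16 bits carry values */
	return (bits << 16) | bits;
}

/* Hypothetical stand-in for _MASKED_BIT_DISABLE(): select bits, clear them. */
static inline uint32_t masked_bit_disable(uint32_t bits)
{
	assert((bits >> 16) == 0);
	return bits << 16;
}

/*
 * Software model of a masked register write: bits whose mask bit is set
 * take the new value, all other bits keep their old contents.
 */
static inline uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;
	uint32_t value = write & 0xffffu;

	return (reg & ~mask & 0xffffu) | (value & mask);
}
```

With bit 9 as the sample_c performance bit, masked_bit_enable(1u << 9)
yields 0x02000200, and applying it leaves every other bit in the
register untouched, so no read-modify-write is needed.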
