Re: [Intel-gfx] [PATCH 07/10] drm/i915: print Gen5+ CPU poison interrupts

2013-02-14 Thread Paulo Zanoni
Hi

2013/2/9 Ben Widawsky b...@bwidawsk.net:
 On Fri, 8 Feb 2013 11:42:39 -0800
 Jesse Barnes jbar...@virtuousgeek.org wrote:

 On Fri,  8 Feb 2013 17:35:18 -0200
 Paulo Zanoni przan...@gmail.com wrote:

  From: Paulo Zanoni paulo.r.zan...@intel.com
 
  On ILK/SNB all we need to do is to enable the poison bit, but on
  IVB/HSW we need to enable the CPU error interrupt register, which is
  responsible not only for poison interrupts, but also other things.
  This includes the unclaimed register interrupt, so on the IVB irq
  handler we now need to: (i) check whether the interrupt was
  triggered by an unclaimed register and (ii) mask the error
  interrupt bit so we don't risk generating unclaimed register
  interrupts form inside the interrupt handler.
 
  Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com
  ---

 OTOH there's nothing the user can do about it... so we might do a
 WARN_ONCE or something here instead.  But even then, I'm not sure
 there's much *we* can do about these, as they indicate a corruption in
 the communication between the CPU and PCH.


 I agree with Jesse. I wouldn't bother with these. Even a WARN_ONCE
 isn't helpful since the backtrace wouldn't really be meaningful.

Why isn't it helpful? Right now we don't even know whether this
problem happens or not, we're completely blind to a possible problem
that may be affecting us in some specific cases and we don't even
know. Knowing that it happens and how often it happens is IMHO
certainly better than closing our eyes and pretending it doesn't
exist.


 If OTOH, you wanted to save away this information into error state; I
 could get behind that.




-- 
Paulo Zanoni
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 07/10] drm/i915: print Gen5+ CPU poison interrupts

2013-02-14 Thread Ben Widawsky
On Thu, 14 Feb 2013 18:35:32 -0200
Paulo Zanoni przan...@gmail.com wrote:

 Hi
 
 2013/2/9 Ben Widawsky b...@bwidawsk.net:
  On Fri, 8 Feb 2013 11:42:39 -0800
  Jesse Barnes jbar...@virtuousgeek.org wrote:
 
  On Fri,  8 Feb 2013 17:35:18 -0200
  Paulo Zanoni przan...@gmail.com wrote:
 
   From: Paulo Zanoni paulo.r.zan...@intel.com
  
   On ILK/SNB all we need to do is to enable the poison bit, but
   on IVB/HSW we need to enable the CPU error interrupt register,
   which is responsible not only for poison interrupts, but also
   other things. This includes the unclaimed register interrupt,
   so on the IVB irq handler we now need to: (i) check whether the
   interrupt was triggered by an unclaimed register and (ii) mask
   the error interrupt bit so we don't risk generating unclaimed
   register interrupts form inside the interrupt handler.
  
   Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com
   ---
 
  OTOH there's nothing the user can do about it... so we might do a
  WARN_ONCE or something here instead.  But even then, I'm not sure
  there's much *we* can do about these, as they indicate a
  corruption in the communication between the CPU and PCH.
 
 
  I agree with Jesse. I wouldn't bother with these. Even a WARN_ONCE
  isn't helpful since the backtrace wouldn't really be meaningful.
 
 Why isn't it helpful? Right now we don't even know whether this
 problem happens or not, we're completely blind to a possible problem
 that may be affecting us in some specific cases and we don't even
 know. Knowing that it happens and how often it happens is IMHO
 certainly better than closing our eyes and pretending it doesn't
 exist.
 

I suppose you're right. I'm strongly of the opinion that we won't
ever see this error because the system will crap out before we'd be
able to get that info - of course I cannot prove that, and I don't
know enough about what exactly poison means. I just think it sucks that
we have yet another gen specific thing which has TBD value. I certainly
won't nak it, and of course if it proves useful, I'll be most
apologetic.

As for the WARN being unhelpful, it's the same problem again. You're
getting the notifications via interrupt, so a backtrace is useless on
IVB/HSW. Perhaps it makes sense on ILK/SNB. Reinventing a do this
once macro isn't worthwhile either, so I guess WARN_ON with the
assumption that we ignore the backtrace is fine on IVB/HSW.

 
  If OTOH, you wanted to save away this information into error state;
  I could get behind that.
 
 
 
 

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 07/10] drm/i915: print Gen5+ CPU poison interrupts

2013-02-09 Thread Ben Widawsky
On Fri, 8 Feb 2013 11:42:39 -0800
Jesse Barnes jbar...@virtuousgeek.org wrote:

 On Fri,  8 Feb 2013 17:35:18 -0200
 Paulo Zanoni przan...@gmail.com wrote:
 
  From: Paulo Zanoni paulo.r.zan...@intel.com
  
  On ILK/SNB all we need to do is to enable the poison bit, but on
  IVB/HSW we need to enable the CPU error interrupt register, which is
  responsible not only for poison interrupts, but also other things.
  This includes the unclaimed register interrupt, so on the IVB irq
  handler we now need to: (i) check whether the interrupt was
  triggered by an unclaimed register and (ii) mask the error
  interrupt bit so we don't risk generating unclaimed register
  interrupts form inside the interrupt handler.
  
  Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com
  ---
 
 OTOH there's nothing the user can do about it... so we might do a
 WARN_ONCE or something here instead.  But even then, I'm not sure
 there's much *we* can do about these, as they indicate a corruption in
 the communication between the CPU and PCH.
 

I agree with Jesse. I wouldn't bother with these. Even a WARN_ONCE
isn't helpful since the backtrace wouldn't really be meaningful.

If OTOH, you wanted to save away this information into error state; I
could get behind that.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 07/10] drm/i915: print Gen5+ CPU poison interrupts

2013-02-08 Thread Paulo Zanoni
From: Paulo Zanoni paulo.r.zan...@intel.com

On ILK/SNB all we need to do is to enable the poison bit, but on
IVB/HSW we need to enable the CPU error interrupt register, which is
responsible not only for poison interrupts, but also other things.
This includes the unclaimed register interrupt, so on the IVB irq
handler we now need to: (i) check whether the interrupt was triggered by an
unclaimed register and (ii) mask the error interrupt bit so we don't
risk generating unclaimed register interrupts form inside the
interrupt handler.

Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com
---
 drivers/gpu/drm/i915/i915_irq.c |   49 ---
 drivers/gpu/drm/i915/i915_reg.h |5 ++--
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 10aec0e..703a08a 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -665,6 +665,17 @@ static void ibx_irq_handler(struct drm_device *dev, u32 
pch_iir)
DRM_DEBUG_DRIVER(PCH transcoder A underrun interrupt\n);
 }
 
+static void err_int_handler(struct drm_device *dev)
+{
+   drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev-dev_private;
+   u32 err_int = I915_READ(GEN7_ERR_INT);
+
+   if (err_int  ERR_INT_POISON)
+   DRM_ERROR(Poison interrupt\n);
+
+   I915_WRITE(GEN7_ERR_INT, err_int);
+}
+
 static void serr_int_handler(struct drm_device *dev)
 {
drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev-dev_private;
@@ -715,16 +726,33 @@ static irqreturn_t ivybridge_irq_handler(int irq, void 
*arg)
 {
struct drm_device *dev = (struct drm_device *) arg;
drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev-dev_private;
-   u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier;
+   u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier, de_imr;
irqreturn_t ret = IRQ_NONE;
int i;
 
atomic_inc(dev_priv-irq_received);
 
+   /* We get interrupts on unclaimed registers, so check for this before we
+* do any I915_{READ,WRITE}. */
+   if (IS_HASWELL(dev) 
+   (I915_READ_NOTRACE(FPGA_DBG)  FPGA_DBG_RM_NOCLAIM)) {
+   DRM_ERROR(Unclaimed register before interrupt\n);
+   I915_WRITE_NOTRACE(FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
+   }
+
/* disable master interrupt before clearing iir  */
de_ier = I915_READ(DEIER);
I915_WRITE(DEIER, de_ier  ~DE_MASTER_IRQ_CONTROL);
 
+   /* On Haswell, also mask ERR_INT because we don't want to risk
+* generating unclaimed register interrupts from inside the interrupt
+* handler. */
+   de_imr = I915_READ(DEIMR);
+   if (IS_HASWELL(dev)) {
+   I915_WRITE(DEIMR, de_imr | DE_ERR_INT_IVB);
+   POSTING_READ(DEIMR);
+   }
+
sde_ier = I915_READ(SDEIER);
I915_WRITE(SDEIER, 0);
POSTING_READ(SDEIER);
@@ -738,6 +766,9 @@ static irqreturn_t ivybridge_irq_handler(int irq, void *arg)
 
de_iir = I915_READ(DEIIR);
if (de_iir) {
+   if (de_iir  DE_ERR_INT_IVB)
+   err_int_handler(dev);
+
if (de_iir  DE_AUX_CHANNEL_A_IVB)
dp_aux_irq_handler(dev);
 
@@ -775,6 +806,11 @@ static irqreturn_t ivybridge_irq_handler(int irq, void 
*arg)
ret = IRQ_HANDLED;
}
 
+   if (IS_HASWELL(dev)) {
+   I915_WRITE(DEIMR, de_imr);
+   POSTING_READ(DEIMR);
+   }
+
I915_WRITE(DEIER, de_ier);
POSTING_READ(DEIER);
I915_WRITE(SDEIER, sde_ier);
@@ -837,6 +873,9 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
if (de_iir  DE_PIPEB_VBLANK)
drm_handle_vblank(dev, 1);
 
+   if (de_iir  DE_POISON)
+   DRM_ERROR(Poison interrupt\n);
+
if (de_iir  DE_PLANEA_FLIP_DONE) {
intel_prepare_page_flip(dev, 0);
intel_finish_page_flip_plane(dev, 0);
@@ -1985,7 +2024,7 @@ static int ironlake_irq_postinstall(struct drm_device 
*dev)
/* enable kind of interrupts always enabled */
u32 display_mask = DE_MASTER_IRQ_CONTROL | DE_GSE | DE_PCH_EVENT |
   DE_PLANEA_FLIP_DONE | DE_PLANEB_FLIP_DONE |
-  DE_AUX_CHANNEL_A;
+  DE_AUX_CHANNEL_A | DE_POISON;
u32 render_irqs;
 
dev_priv-irq_mask = ~display_mask;
@@ -2035,12 +2074,14 @@ static int ivybridge_irq_postinstall(struct drm_device 
*dev)
DE_PLANEC_FLIP_DONE_IVB |
DE_PLANEB_FLIP_DONE_IVB |
DE_PLANEA_FLIP_DONE_IVB |
-   DE_AUX_CHANNEL_A_IVB;
+   DE_AUX_CHANNEL_A_IVB |
+   DE_ERR_INT_IVB;
u32 render_irqs;
 
dev_priv-irq_mask = ~display_mask;
 
/* should always can generate irq */
+   I915_WRITE(GEN7_ERR_INT, 

Re: [Intel-gfx] [PATCH 07/10] drm/i915: print Gen5+ CPU poison interrupts

2013-02-08 Thread Jesse Barnes
On Fri,  8 Feb 2013 17:35:18 -0200
Paulo Zanoni przan...@gmail.com wrote:

 From: Paulo Zanoni paulo.r.zan...@intel.com
 
 On ILK/SNB all we need to do is to enable the poison bit, but on
 IVB/HSW we need to enable the CPU error interrupt register, which is
 responsible not only for poison interrupts, but also other things.
 This includes the unclaimed register interrupt, so on the IVB irq
 handler we now need to: (i) check whether the interrupt was triggered by an
 unclaimed register and (ii) mask the error interrupt bit so we don't
 risk generating unclaimed register interrupts form inside the
 interrupt handler.
 
 Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com
 ---

OTOH there's nothing the user can do about it... so we might do a
WARN_ONCE or something here instead.  But even then, I'm not sure
there's much *we* can do about these, as they indicate a corruption in
the communication between the CPU and PCH.

-- 
Jesse Barnes, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 07/10] drm/i915: print Gen5+ CPU poison interrupts

2013-02-08 Thread Paulo Zanoni
Hi

2013/2/8 Jesse Barnes jbar...@virtuousgeek.org:
 On Fri,  8 Feb 2013 17:35:18 -0200
 Paulo Zanoni przan...@gmail.com wrote:

 From: Paulo Zanoni paulo.r.zan...@intel.com

 On ILK/SNB all we need to do is to enable the poison bit, but on
 IVB/HSW we need to enable the CPU error interrupt register, which is
 responsible not only for poison interrupts, but also other things.
 This includes the unclaimed register interrupt, so on the IVB irq
 handler we now need to: (i) check whether the interrupt was triggered by an
 unclaimed register and (ii) mask the error interrupt bit so we don't
 risk generating unclaimed register interrupts form inside the
 interrupt handler.

 Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com
 ---

 OTOH there's nothing the user can do about it... so we might do a
 WARN_ONCE or something here instead.

Well, so far I haven't seen the message. If we conclude it happens
*too much*, then we can use WARN_ONCE.

 But even then, I'm not sure
 there's much *we* can do about these, as they indicate a corruption in
 the communication between the CPU and PCH.

At least if we get the message we may be able to understand and/or
reproduce the problems. So far we don't even know whether the problem
is happening or not... And when there's a display bug, we don't know
if it's caused by poison.


 --
 Jesse Barnes, Intel Open Source Technology Center



-- 
Paulo Zanoni
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 07/10] drm/i915: print Gen5+ CPU poison interrupts

2013-02-08 Thread Jesse Barnes
On Fri, 8 Feb 2013 17:54:23 -0200
Paulo Zanoni przan...@gmail.com wrote:

 Hi
 
 2013/2/8 Jesse Barnes jbar...@virtuousgeek.org:
  On Fri,  8 Feb 2013 17:35:18 -0200
  Paulo Zanoni przan...@gmail.com wrote:
 
  From: Paulo Zanoni paulo.r.zan...@intel.com
 
  On ILK/SNB all we need to do is to enable the poison bit, but on
  IVB/HSW we need to enable the CPU error interrupt register, which is
  responsible not only for poison interrupts, but also other things.
  This includes the unclaimed register interrupt, so on the IVB irq
  handler we now need to: (i) check whether the interrupt was triggered by an
  unclaimed register and (ii) mask the error interrupt bit so we don't
  risk generating unclaimed register interrupts form inside the
  interrupt handler.
 
  Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com
  ---
 
  OTOH there's nothing the user can do about it... so we might do a
  WARN_ONCE or something here instead.
 
 Well, so far I haven't seen the message. If we conclude it happens
 *too much*, then we can use WARN_ONCE.
 
  But even then, I'm not sure
  there's much *we* can do about these, as they indicate a corruption in
  the communication between the CPU and PCH.
 
 At least if we get the message we may be able to understand and/or
 reproduce the problems. So far we don't even know whether the problem
 is happening or not... And when there's a display bug, we don't know
 if it's caused by poison.

Ok I guess the DRM_ERROR won't hurt if/until we see reports.  Then we
can dig in and see if keeping the message makes sense or not.

-- 
Jesse Barnes, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx