Re: NVIDIA GPU fallen off the bus after exiting s2idle

2021-05-20 Thread Chris Chiu
On Thu, May 6, 2021 at 5:46 PM Rafael J. Wysocki  wrote:
>
> On Tue, May 4, 2021 at 10:08 AM Chris Chiu  wrote:
> >
> > Hi,
> > We have some Intel laptops (11th generation CPU) with NVIDIA GPU
> > suffering the same GPU falling off the bus problem while exiting
> > s2idle with external display connected. These laptops connect the
> > external display via the HDMI/DisplayPort on a USB Type-C interfaced
> > dock. If we enter and exit s2idle with the dock connected, the NVIDIA
> > GPU (confirmed on 10de:24b6 and 10de:25b8) and the PCIe port can come
> > back to D0 w/o problem. If we enter the s2idle, disconnect the dock,
> > then exit the s2idle, both external display and the panel will remain
> > with no output. The dmesg as follows shows the "nvidia :01:00.0:
> > can't change power state from D3cold to D0 (config space
> > inaccessible)" due to the following ACPI error
> > [ 154.446781]
> > [ 154.446783]
> > [ 154.446783] Initialized Local Variables for Method [IPCS]:
> > [ 154.446784] Local0: 9863e365  Integer 09C5
> > [ 154.446790]
> > [ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments
> > defined for method invocation)
> > [ 154.446792] Arg0: 25568fbd  Integer 00AC
> > [ 154.446795] Arg1: 9ef30e76  Integer 
> > [ 154.446798] Arg2: fdf820f0  Integer 0010
> > [ 154.446801] Arg3: 9fc2a088  Integer 0001
> > [ 154.446804] Arg4: 3a3418f7  Integer 0001
> > [ 154.446807] Arg5: 20c4b87c  Integer 
> > [ 154.446810] Arg6: 8b965a8a  Integer 
> > [ 154.446813]
> > [ 154.446815] ACPI Error: Aborting method \IPCS due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446824] ACPI Error: Aborting method \MCUI due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446829] ACPI Error: Aborting method \SPCX due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due
> > to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446860] acpi device:02: Failed to change power state to D0
> > [ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0
> > for parent in (unknown)
>
> If I were to guess, I would say that AML tries to access memory that
> is not accessible while suspended, probably PCI config space.
>
> > The IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON
> > which we expect it to prepare everything before bringing back the
> > NVIDIA GPU but it's stuck in the infinite loop as described below.
> > Please refer to
> > https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 for
> > the full DSDT.dsl.
>
> The DSDT alone may not be sufficient.
>
> Can you please create a bug entry at bugzilla.kernel.org for this
> issue and attach the full output of acpidump from one of the affected
> machines to it?  And please let me know the number of the bug.
>
> Also please attach the output of dmesg including a suspend-resume
> cycle including dock disconnection while suspended and the ACPI
> messages quoted below.
>
> >     While (One)
> >     {
> >         If ((!IBSY || (IERR == One)))
> >         {
> >             Break
> >         }
> >
> >         If ((Local0 > TMOV))
> >         {
> >             RPKG [Zero] = 0x03
> >             Return (RPKG) /* \IPCS.RPKG */
> >         }
> >
> >         Sleep (One)
> >         Local0++
> >     }
> >
> > And the upstream PCIe port of NVIDIA seems to become inaccessible due
> > to the messages as follows.
> > [ 292.746508] pcieport :00:01.0: waiting 100 ms for downstream
> > link, after activation
> > [ 292.882296] pci :01:00.0: waiting additional 100 ms to become 
> > accessible
> > [ 316.876997] pci :01:00.0: can't change power state from D3cold
> > to D0 (config space inaccessible)

NVIDIA GPU fallen off the bus after exiting s2idle

2021-05-04 Thread Chris Chiu
Hi,
We have some Intel laptops (11th-generation CPUs) with NVIDIA GPUs
that all hit the same "GPU has fallen off the bus" problem when
exiting s2idle. These laptops drive the external display through the
HDMI/DisplayPort output of a dock attached via USB Type-C. If we enter
and exit s2idle with the dock connected, the NVIDIA GPU (confirmed on
10de:24b6 and 10de:25b8) and its PCIe port come back to D0 without
problems. If we enter s2idle, disconnect the dock, and then exit
s2idle, neither the external display nor the internal panel shows any
output. dmesg shows "nvidia :01:00.0: can't change power state from
D3cold to D0 (config space inaccessible)", caused by the following
ACPI error:
[ 154.446781]
[ 154.446783]
[ 154.446783] Initialized Local Variables for Method [IPCS]:
[ 154.446784] Local0: 9863e365  Integer 09C5
[ 154.446790]
[ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments
defined for method invocation)
[ 154.446792] Arg0: 25568fbd  Integer 00AC
[ 154.446795] Arg1: 9ef30e76  Integer 
[ 154.446798] Arg2: fdf820f0  Integer 0010
[ 154.446801] Arg3: 9fc2a088  Integer 0001
[ 154.446804] Arg4: 3a3418f7  Integer 0001
[ 154.446807] Arg5: 20c4b87c  Integer 
[ 154.446810] Arg6: 8b965a8a  Integer 
[ 154.446813]
[ 154.446815] ACPI Error: Aborting method \IPCS due to previous error
(AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446824] ACPI Error: Aborting method \MCUI due to previous error
(AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446829] ACPI Error: Aborting method \SPCX due to previous error
(AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to
previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to
previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to
previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due
to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446860] acpi device:02: Failed to change power state to D0
[ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0
for parent in (unknown)

IPCS is the last method called from \_SB.PC00.PEG1.PG01._ON. We expect
it to prepare everything needed to bring the NVIDIA GPU back, but it
gets stuck in the polling loop shown below until ACPICA aborts it with
AE_AML_LOOP_TIMEOUT. Please refer to
https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 for
the full DSDT.dsl.
    While (One)
    {
        If ((!IBSY || (IERR == One)))
        {
            Break
        }

        If ((Local0 > TMOV))
        {
            RPKG [Zero] = 0x03
            Return (RPKG) /* \IPCS.RPKG */
        }

        Sleep (One)
        Local0++
    }
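
To make the control flow of the loop above easier to reason about,
here is a minimal Python model of it. This is only a sketch: the names
ipcs_wait, read_ibsy, read_ierr and sleep_ms are hypothetical and are
not part of the firmware or the kernel. It mirrors the three exits
visible in the ASL (IBSY clear, IERR set, Local0 > TMOV) and does not
model the ACPICA loop timeout that aborts the method in the log.

```python
def ipcs_wait(read_ibsy, read_ierr, tmov, sleep_ms=lambda ms: None):
    """Model of the IPCS polling loop; returns (exit_reason, iterations)."""
    local0 = 0
    while True:
        # If ((!IBSY || (IERR == One))) { Break }
        if not read_ibsy() or read_ierr() == 1:
            return ("done", local0)
        # If ((Local0 > TMOV)) { RPKG [Zero] = 0x03; Return (RPKG) }
        if local0 > tmov:
            return ("timeout", local0)
        sleep_ms(1)   # Sleep (One)
        local0 += 1   # Local0++


# A device that never clears IBSY can only leave the loop through TMOV.
reason, count = ipcs_wait(lambda: True, lambda: 0, tmov=10)
print(reason, count)
```

Whether the TMOV exit could have fired before the ACPICA abort on the
affected machines depends on the value of TMOV in this DSDT, which is
not quoted here.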

The upstream PCIe port of the NVIDIA GPU also seems to become
inaccessible, judging from the following messages:
[ 292.746508] pcieport :00:01.0: waiting 100 ms for downstream
link, after activation
[ 292.882296] pci :01:00.0: waiting additional 100 ms to become accessible
[ 316.876997] pci :01:00.0: can't change power state from D3cold
to D0 (config space inaccessible)

IPCS is part of the Intel reference code, and we don't know why the
loop never finishes just because the dock is unplugged while the
system is in s2idle. Can anyone from Intel suggest what is happening
here?

One more thing worth mentioning: if we unplug the display cable from
the dock before entering s2idle, the NVIDIA GPU comes back without
problems even if we disconnect the dock before exiting s2idle. Here is
the lspci information,
https://gist.github.com/mschiu77/0bfc439d15d52d20de0129b1b2a86dc4, and
the dmesg log with ACPI trace_state enabled and dynamic debug on for
drivers/pci/pci.c and drivers/acpi/device_pm.c, covering the whole
s2idle enter/exit cycle with the IPCS timeout.

Any suggestion would be appreciated. Thanks.

Chris
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


TGL: : No video output on external monitor after unplug and re-plug the cable

2021-04-28 Thread Chris Chiu
We found another bug after the fix for
https://gitlab.freedesktop.org/drm/intel/-/issues/2538. As in #2538,
the external monitor is connected via the WD19's HDMI/DisplayPort
output. However, the monitor is only detected and shows output the
very first time we power on the WD19 dock. If we unplug the cable and
plug it in again, the monitor seems to be detected but there is no
video output.

When we power on the WD19 dock with the cable connected to the
monitor, the drm kernel log shows the following:

 i915 :00:02.0: [drm:intel_get_hpd_pins.isra.0 [i915]] hotplug
event received, stat 0x0001, dig 0x008a, pins 0x0200, long
0x0200
 i915 :00:02.0: [drm:intel_hpd_irq_handler [i915]] digital hpd on
[ENCODER:292:DDI D] - long
 i915 :00:02.0: [drm:intel_hpd_irq_handler [i915]] Received HPD
interrupt on PIN 9 - cnt: 10
 i915 :00:02.0: [drm:intel_dp_hpd_pulse [i915]] got hpd irq on
[ENCODER:292:DDI D] - long
 i915 :00:02.0: [drm:i915_hotplug_work_func [i915]] running
encoder hotplug functions
 i915 :00:02.0: [drm:i915_hotplug_work_func [i915]] Connector DP-1
(pin 9) received hotplug event. (retry 0)
 i915 :00:02.0: [drm:intel_dp_detect [i915]] [CONNECTOR:293:DP-1]
 i915 :00:02.0: [drm:intel_power_well_enable [i915]] enabling TC cold off
 i915 :00:02.0: [drm:tgl_tc_cold_request [i915]] TC cold block succeeded
 i915 :00:02.0: [drm:__intel_tc_port_lock [i915]] Port D/TC#1: TC
port mode reset (tbt-alt -> dp-alt)
 i915 :00:02.0: [drm:intel_power_well_enable [i915]] enabling AUX D TC1
 i915 :00:02.0: [drm:drm_dp_dpcd_read [drm_kms_helper]] AUX D/port
D: 0xf AUX -> (ret=  8) 14 1e 40 55 02 00 00 00
 i915 :00:02.0: [drm:intel_dp_lttpr_init [i915]] LTTPR common
capabilities: 14 1e 40 55 02 00 00 00

When I then replug the cable, intel_power_well_enable() in
intel_dp_aux_xfer() reports enabling the "DC off" power domain instead
of AUX D TC1. After that, the log is flooded with
"i915 :00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX D/port D: timeout
(status 0x7d4003ff)" and there is no video output.
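
To quantify the flood of AUX timeouts in a captured log, a small
filter like the one below may help. It is only a sketch: the regular
expression is derived from the single message format quoted above, and
the function name count_aux_timeouts is made up for illustration.

```python
import re

# Matches i915 AUX timeout debug lines of the form quoted in this message.
AUX_TIMEOUT = re.compile(
    r"\[drm:intel_dp_aux_xfer \[i915\]\] (?P<aux>AUX \S+)/port (?P<port>\S+): "
    r"timeout \(status (?P<status>0x[0-9a-fA-F]+)\)"
)


def count_aux_timeouts(lines):
    """Return {(aux channel, port, status): count} for AUX timeout lines."""
    counts = {}
    for line in lines:
        match = AUX_TIMEOUT.search(line)
        if match:
            key = (match["aux"], match["port"], match["status"])
            counts[key] = counts.get(key, 0) + 1
    return counts


sample = [
    "i915 :00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX D/port D: timeout (status 0x7d4003ff)",
    "i915 :00:02.0: [drm:intel_power_well_enable [i915]] enabling DC off",
    "i915 :00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX D/port D: timeout (status 0x7d4003ff)",
]
print(count_aux_timeouts(sample))
```

Passing the lines of the journal log instead of the sample list gives
a per-port count of the timeouts and of the status values they report.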

I filed a bug at
https://gitlab.freedesktop.org/drm/intel/-/issues/3407 and uploaded
the journal log captured with the kernel boot parameter
"drm.debug=0x10e".

Can anyone suggest what happens at the replug, and what we can do to
identify the cause? Thanks.

Chris


Fails to launch X on laptop with Ryzen 7 4700U

2020-02-03 Thread Chris Chiu
We are working with new laptops that have the Ryzen 7 4700U.

X fails to launch, so I can only access the machine via the virtual
terminal. I tried the latest mainline kernel and the kernel from
https://cgit.freedesktop.org/~agd5f/linux, but had no luck. I also
booted the kernel with the parameter amdgpu.exp_hw_support=1, but then
the system freezes after loading amdgpu and I can't even switch to the
virtual terminal.

I posted the bug description and related information at
https://gitlab.freedesktop.org/drm/amd/issues/1031.

Please kindly advise what I should do next.

Thanks
Chris


Unexpected screen flicker during i915 initialization

2019-10-31 Thread Chris Chiu
Hi guys,
We have 2 laptops, ASUS Z406MA and Acer TravelMate B118, both
powered by the same Intel N5000 Gemini Lake CPU. On the Acer laptop,
the panel blinks once during boot, which never happens on the ASUS
laptop. That caught my attention and I found the difference between
them, but I need help with more information.

The major difference happens in bxt_sanitize_cdclk() on the
following condition check.
        if (cdctl == expected)
                /* All well; nothing to sanitize */
                return;

On the problematic Acer laptop, the value of cdctl is 0x27a, while
it is 0x278 on the ASUS machine. Because 0x27a is not equal to the
expected value 0x278, it gets sanitized by assigning -1 to
dev_priv->cdclk.hw.vco. The subsequent bxt_set_cdclk() then forces a
full PLL disable and enable, and that is the flicker (blink) we
observed during boot.
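
The bit that differs between the two register values can be found with
a quick XOR. The snippet below is only an illustration using the two
values quoted above; the helper name differing_bits is made up, and
bits are counted from zero as the kernel's BIT() macro does.

```python
def differing_bits(a, b):
    """Return the zero-based positions of the bits that differ between a and b."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if diff & (1 << bit)]


acer_cdctl = 0x27A   # value read on the Acer laptop
expected = 0x278     # expected value, also read on the ASUS laptop

print(hex(acer_cdctl ^ expected))            # 0x2
print(differing_bits(acer_cdctl, expected))  # [1]
```

So the only difference is mask 0x2, which is BIT(1) in kernel notation.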

However, I can't find the definition of the CDCLK_CTL bit that causes
this difference (0x27a ^ 0x278 = 0x2, i.e. BIT(1) counting from zero).
Can anyone suggest what exactly the problem is and how we should deal
with it? Thanks.

Chris

Re: Unexpected screen flicker during i915 initialization

2019-10-31 Thread Chris Chiu
On Wed, Oct 30, 2019 at 6:25 PM Chris Chiu  wrote:
>
> Hi guys,
> We have 2 laptops, ASUS Z406MA and Acer TravelMate B118, both
> powered by the same Intel N5000 Gemini Lake CPU. On the Acer laptop, the
> panel will blink once during boot which never happens on the ASUS
> laptop. It caught my attention and I find the difference between them
> but I need help for more information,

Sorry, I forgot to mention that the problem was reproduced on the
latest kernel 5.3.

Chris

Re: [BUG] i915 HDMI connector status is connected after disconnection

2018-09-21 Thread Chris Chiu
On Wed, Sep 19, 2018 at 8:08 PM, Jani Nikula  wrote:
> On Wed, 19 Sep 2018, Chris Chiu  wrote:
>> I tried to add a slight delay in the hotplug work as follows
>>
>> --- a/drivers/gpu/drm/i915/intel_hotplug.c
>> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
>> @@ -378,6 +378,8 @@ static void do_i915_hotplug_check(struct work_struct 
>> *work,
>>
>> spin_unlock_irq(&dev_priv->irq_lock);
>>
>> +   msleep(100);
>> +
>> drm_connector_list_iter_begin(dev, &conn_iter);
>> drm_for_each_connector_iter(connector, &conn_iter) {
>> intel_connector = to_intel_connector(connector);
>>
>> It does work in most cases, but still fail to update the status if I
>> unplug the HDMI
>> cable very slow. I basically pull the HDMI cable in loose connected
>> state first, and
>> hold in that state ~1 second, totally unplug after that. The status in
>> sysfs will report
>> connected as it used to. There was no problem when I tried the patch
>> https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8
>>
>> I'll try to modify this patch a little bit and send upstream for
>> discussion later. Please
>> advise if any. Thanks.
>
> Please let's not add excessive msleeps in work functions.
>
> My idea was more along the lines of making the hotplug function run in a
> delayed work. After a chat with Ville, below is what I came up with.
>
> Please let me know how it works. Feel free to toy with the
> delay. However, 1-2 seconds or more seems too much.
>
> BR,
> Jani.
>

Thanks for the patch. It works in most cases on my problematic
laptops. After lots of experiments, e.g. pulling the cable out at
different paces and varying the delay from 300 to 800 msec, a longer
delay makes no significant difference, so 300 msec is good enough for
most cases. It at least updates the status correctly, with a visible
quick display blink when disconnecting HDMI. On other machines that
normally have no such problem, pulling the HDMI cable out slowly also
results in the same problem, so I'd say the test result is OK for me.
Thanks.
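
The effect of the delay can be pictured with a toy model of the race
described in this thread (HPD disconnecting slightly before DDC). This
is purely illustrative: the function name and the millisecond values
in the loop are assumptions, not measurements. Only the 300 msec delay
and the roughly 1 second slow pull come from the discussion.

```python
def status_after_unplug(ddc_drop_ms, hotplug_delay_ms):
    """Connector status reported after an unplug, in a simplified model.

    The HPD interrupt fires at t = 0. The DDC pins lose contact at
    t = ddc_drop_ms. Detection runs at t = hotplug_delay_ms and reports
    "connected" if the EDID is still readable over DDC at that moment.
    """
    edid_still_readable = hotplug_delay_ms < ddc_drop_ms
    return "connected" if edid_still_readable else "disconnected"


for ddc_drop_ms in (50, 250, 1000):
    for delay_ms in (0, 300):
        print(ddc_drop_ms, delay_ms, status_after_unplug(ddc_drop_ms, delay_ms))
```

In this model a 300 msec delay covers any unplug where DDC drops within
300 msec of the HPD interrupt, while a cable held loosely for about a
second still ends up reported as connected, which matches what I saw.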

Chris

>
>
> From 72515b3e856171e52e96bb74796774f595a7f418 Mon Sep 17 00:00:00 2001
> From: Jani Nikula 
> Date: Tue, 18 Sep 2018 13:12:34 +0300
> Subject: [PATCH] drm/i915: delay hotplug scheduling
> Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
> Cc: Jani Nikula 
>
> On some systems we get the hotplug irq on unplug before the DDC pins
> have been disconnected, resulting in connector status remaining
> connected. Since previous attempts at using hotplug live status before
> DDC have failed, the only viable option is to reduce the window for
> mistakes. Delay the hotplug processing.
>
> Signed-off-by: Jani Nikula 
> ---
>  drivers/gpu/drm/i915/i915_drv.h  |  2 +-
>  drivers/gpu/drm/i915/intel_hotplug.c | 15 ++-
>  2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7d4daa7412f1..27f579abddae 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -286,7 +286,7 @@ enum hpd_pin {
>  #define HPD_STORM_DEFAULT_THRESHOLD 5
>
>  struct i915_hotplug {
> -   struct work_struct hotplug_work;
> +   struct delayed_work hotplug_work;
>
> struct {
> unsigned long last_jiffies;
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c 
> b/drivers/gpu/drm/i915/intel_hotplug.c
> index 648a13c6043c..3af64daa5cfc 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -110,6 +110,8 @@ enum hpd_pin intel_hpd_pin_default(struct 
> drm_i915_private *dev_priv,
> }
>  }
>
> +#define HOTPLUG_DELAY_MS   300
> +
>  #define HPD_STORM_DETECT_PERIOD    1000
>  #define HPD_STORM_REENABLE_DELAY   (2 * 60 * 1000)
>
> @@ -319,7 +321,8 @@ static void i915_digport_work_func(struct work_struct 
> *work)
> spin_lock_irq(&dev_priv->irq_lock);
> dev_priv->hotplug.event_bits |= old_bits;
> spin_unlock_irq(&dev_priv->irq_lock);
> -   schedule_work(&dev_priv->hotplug.hotplug_work);
> +   schedule_delayed_work(&dev_priv->hotplug.hotplug_work,
> + msecs_to_jiffies(HOTPLUG_DELAY_MS));
> }
>  }
>
> @@ -329,7 +332,7 @@ static void i915_digport_work_func(struct work_struct 
> *work)
>  static void i915_hotplug_work_func(struct work_struct *work)
>  {
> struct drm_i915_private *dev_priv =
> -   container_of(work, struct drm_i915_private, 
> hotplug.hotplug

[PATCH] drm/i915: re-check the hotplug with a delayed work

2018-09-20 Thread Chris Chiu
I have a few ASUS laptops, X705FD (Intel i7-8565), X560UD (Intel
i5-8250U) and X530UN (Intel i7-8550U), that share the same problem:
the HDMI connector status stays 'connected' even after the HDMI cable
has been unplugged. The status in sysfs then never changes until we
run 'xrandr' to reprobe the devices. As a result, the audio output
path cannot switch correctly based on the connector status.

This commit kicks off a delayed work when the status remains unchanged
after the first hotplug event is handled, since that first check may
not happen at the right time in some special cases.

Signed-off-by: Chris Chiu 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_hotplug.c | 35 +++
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d51d8574a679..78e2cf09cc10 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -286,6 +286,7 @@ struct i915_hotplug {
} stats[HPD_NUM_PINS];
u32 event_bits;
struct delayed_work reenable_work;
+   struct delayed_work recheck_work;
 
struct intel_digital_port *irq_port[I915_MAX_PORTS];
u32 long_port_mask;
diff --git a/drivers/gpu/drm/i915/intel_hotplug.c 
b/drivers/gpu/drm/i915/intel_hotplug.c
index 43aa92beff2a..089a24588ec8 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -349,14 +349,15 @@ static void i915_digport_work_func(struct work_struct 
*work)
}
 }
 
+#define HPD_RECHECK_DELAY      (2 * 1000)
+
 /*
  * Handle hotplug events outside the interrupt handler proper.
  */
-static void i915_hotplug_work_func(struct work_struct *work)
+static void do_i915_hotplug_check(struct work_struct *work,
+  struct drm_i915_private *dev_priv,
+  struct drm_device *dev, bool do_recheck)
 {
-   struct drm_i915_private *dev_priv =
-   container_of(work, struct drm_i915_private, hotplug.hotplug_work);
-   struct drm_device *dev = &dev_priv->drm;
struct intel_connector *intel_connector;
struct intel_encoder *intel_encoder;
struct drm_connector *connector;
@@ -396,8 +397,31 @@ static void i915_hotplug_work_func(struct work_struct 
*work)
 
if (changed)
drm_kms_helper_hotplug_event(dev);
+   else if (do_recheck) {
+   spin_lock_irq(&dev_priv->irq_lock);
+   dev_priv->hotplug.event_bits |= hpd_event_bits;
+   spin_unlock_irq(&dev_priv->irq_lock);
+   schedule_delayed_work(&dev_priv->hotplug.recheck_work, msecs_to_jiffies(HPD_RECHECK_DELAY));
+   }
 }
 
+static void i915_hotplug_work_func(struct work_struct *work)
+{
+   struct drm_i915_private *dev_priv =
+   container_of(work, struct drm_i915_private, hotplug.hotplug_work);
+   struct drm_device *dev = &dev_priv->drm;
+
+   do_i915_hotplug_check(work, dev_priv, dev, true);
+}
+
+static void i915_hotplug_recheck_func(struct work_struct *work)
+{
+   struct drm_i915_private *dev_priv =
+   container_of(work, struct drm_i915_private, hotplug.recheck_work.work);
+   struct drm_device *dev = &dev_priv->drm;
+
+   do_i915_hotplug_check(work, dev_priv, dev, false);
+}
 
 /**
  * intel_hpd_irq_handler - main hotplug irq handler
@@ -619,6 +643,8 @@ void intel_hpd_init_work(struct drm_i915_private *dev_priv)
INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
  intel_hpd_irq_storm_reenable_work);
+   INIT_DELAYED_WORK(&dev_priv->hotplug.recheck_work,
+ i915_hotplug_recheck_func);
 }
 
 void intel_hpd_cancel_work(struct drm_i915_private *dev_priv)
@@ -635,6 +661,7 @@ void intel_hpd_cancel_work(struct drm_i915_private 
*dev_priv)
cancel_work_sync(&dev_priv->hotplug.hotplug_work);
cancel_work_sync(&dev_priv->hotplug.poll_init_work);
cancel_delayed_work_sync(&dev_priv->hotplug.reenable_work);
+   cancel_delayed_work_sync(&dev_priv->hotplug.recheck_work);
 }
 
 bool intel_hpd_disable(struct drm_i915_private *dev_priv, enum hpd_pin pin)
-- 
2.11.0
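
For readers who want the control flow of the patch above without the
kernel plumbing, here is a rough user-space model in Python. It is a
sketch, not the driver code: hotplug_check and handle_hotplug are
hypothetical names, and the list of booleans stands in for whether each
pass over the connectors sees a status change.

```python
HPD_RECHECK_DELAY_MS = 2 * 1000  # same value as HPD_RECHECK_DELAY in the patch


def hotplug_check(status_changed, do_recheck):
    """Decide what to do after one pass over the connectors.

    Returns "send_event" when a connector status changed, "schedule_recheck"
    when nothing changed on the first pass, and "nothing" when nothing
    changed on the recheck pass.
    """
    if status_changed:
        return "send_event"
    if do_recheck:
        return "schedule_recheck"
    return "nothing"


def handle_hotplug(probe_results):
    """Run the first pass and, if needed, one recheck; return the actions taken.

    probe_results is an iterable of booleans: whether each successive
    pass sees a connector status change.
    """
    results = iter(probe_results)
    actions = [hotplug_check(next(results), do_recheck=True)]
    if actions[0] == "schedule_recheck":
        # In the patch this second pass runs HPD_RECHECK_DELAY ms later.
        actions.append(hotplug_check(next(results), do_recheck=False))
    return actions


print(handle_hotplug([False, True]))   # unplug missed at first, caught on recheck
print(handle_hotplug([True]))
print(handle_hotplug([False, False]))
```

The recheck pass is called with do_recheck=False, so at most one extra
probe is scheduled per hotplug event.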



Re: [BUG] i915 HDMI connector status is connected after disconnection

2018-09-20 Thread Chris Chiu
On Tue, Sep 11, 2018 at 6:25 PM, Chris Chiu  wrote:
> On Fri, Aug 24, 2018 at 11:04 PM, Jani Nikula  wrote:
>> On Wed, 22 Aug 2018, Chris Chiu  wrote:
>>> On Fri, Jul 6, 2018 at 2:44 PM, Chris Chiu  wrote:
>>>> On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä
>>>>  wrote:
>>>>> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>>>>>> Hi,
>>>>>> We have few ASUS laptops X705FD (The new Whiskey Lake), X560UD (intel
>>>>>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>>>>> the HDMI connector status stays connected even the HDMI cable has been
>>>>>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>>>>> checking the status while plug/unplug the HDMI, it shows
>>>>>> "disconnected" before plug in HDMI cable, then switch to "connected"
>>>>>> after plugin, and still stay "connected" after unplug. This would
>>>>>> cause the audio output path cannot correctly switch from HDMI to
>>>>>> internal speaker after unplugging the HDMI.
>>>>>>
>>>>>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>>>>> present. The full "dmesg" log is here.
>>>>>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>>>>
>>>>>> The HDMI cable is plugged in at ~26th second.
>>>>>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>>> audio support"
>>>>>> then unplug the HDMI at ~73th second.
>>>>>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>>> audio support"
>>>>>>
>>>>>> Please advise what I can do to fix this. Thanks
>>>>>
>>>>> Pull the cable out faster?
>>>>>
>>>>> I presume this is the same old case of hpd disconnecting slightly
>>>>> before ddc and we still manage to read the EDID when processing
>>>>> the hpd irq. We kinda tried to fix that with the live status
>>>>> check but that thing failed spectacularly.
>>>>>
>>>>> --
>>>>> Ville Syrjälä
>>>>> Intel
>>>
>>> There's a patch https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8.
>>> And I verified on the X705FD/X560UD which were easy to reproduce, the patch
>>> works as expected. Can anyone kindly give comments about this patch?
>>> We can do anything to help fix this issue upstream. Thanks
>>
>> Seems like a hack. Should look into hw based debouncing or a slight
>> delay in the hotplug work processing I think.
>>
>> BR,
>> Jani.
>>

I tried to add a slight delay in the hotplug work as follows

--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -378,6 +378,8 @@ static void do_i915_hotplug_check(struct work_struct *work,

spin_unlock_irq(&dev_priv->irq_lock);

+   msleep(100);
+
drm_connector_list_iter_begin(dev, &conn_iter);
drm_for_each_connector_iter(connector, &conn_iter) {
intel_connector = to_intel_connector(connector);

It does work in most cases, but it still fails to update the status
if I unplug the HDMI cable very slowly. I basically pull the HDMI
cable into a loosely connected state first, hold it in that state for
~1 second, and then unplug it completely. The status in sysfs then
reports connected as it used to. There was no such problem when I
tried the patch at
https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8

I'll try to modify this patch a little and send it upstream for
discussion later. Please advise if you have any comments. Thanks.

Chris

> So you're suggesting to add a slight delay directly in 
> i915_hotplug_work_func()?
> And any suggestion about the 'hw based' debouncing? Maybe some examples
> that I can refer to?
>
> Thanks
>
>>>
>>> Chris
>>>
>>>> Thanks for the suggestion. I tried pulling the cable out faster, the status
>>>> shows correctly. I also tried branch drm-tip of
>>>> https://cgit.freedesktop.org/drm/drm-tip
>>>> but the symptom persists.
>>>>
>>>> Anything I can help here? Or any old commit/patch I can try to do some
>>>> experiments?
>>>>
>>>> Chris
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center


Re: [BUG] i915 HDMI connector status is connected after disconnection

2018-09-12 Thread Chris Chiu
On Fri, Aug 24, 2018 at 11:04 PM, Jani Nikula  wrote:
> On Wed, 22 Aug 2018, Chris Chiu  wrote:
>> On Fri, Jul 6, 2018 at 2:44 PM, Chris Chiu  wrote:
>>> On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä
>>>  wrote:
>>>> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>>>>> Hi,
>>>>> We have few ASUS laptops X705FD (The new Whiskey Lake), X560UD (intel
>>>>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>>>> the HDMI connector status stays connected even the HDMI cable has been
>>>>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>>>> checking the status while plug/unplug the HDMI, it shows
>>>>> "disconnected" before plug in HDMI cable, then switch to "connected"
>>>>> after plugin, and still stay "connected" after unplug. This would
>>>>> cause the audio output path cannot correctly switch from HDMI to
>>>>> internal speaker after unplugging the HDMI.
>>>>>
>>>>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>>>> present. The full "dmesg" log is here.
>>>>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>>>
>>>>> The HDMI cable is plugged in at ~26th second.
>>>>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> audio support"
>>>>> then unplug the HDMI at ~73th second.
>>>>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> audio support"
>>>>>
>>>>> Please advise what I can do to fix this. Thanks
>>>>
>>>> Pull the cable out faster?
>>>>
>>>> I presume this is the same old case of hpd disconnecting slightly
>>>> before ddc and we still manage to read the EDID when processing
>>>> the hpd irq. We kinda tried to fix that with the live status
>>>> check but that thing failed spectacularly.
>>>>
>>>> --
>>>> Ville Syrjälä
>>>> Intel
>>
>> There's a patch https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8.
>> And I verified on the X705FD/X560UD which were easy to reproduce, the patch
>> works as expected. Can anyone kindly give comments about this patch?
>> We can do anything to help fix this issue upstream. Thanks
>
> Seems like a hack. Should look into hw based debouncing or a slight
> delay in the hotplug work processing I think.
>
> BR,
> Jani.
>
So you're suggesting adding a slight delay directly in
i915_hotplug_work_func()? And do you have any suggestions about the
'hw based' debouncing? Maybe some examples that I can refer to?

Thanks

>>
>> Chris
>>
>>> Thanks for the suggestion. I tried pulling the cable out faster, the status
>>> shows correctly. I also tried branch drm-tip of
>>> https://cgit.freedesktop.org/drm/drm-tip
>>> but the symptom persists.
>>>
>>> Anything I can help here? Or any old commit/patch I can try to do some
>>> experiments?
>>>
>>> Chris
>
> --
> Jani Nikula, Intel Open Source Graphics Center


Re: [BUG] i915 HDMI connector status is connected after disconnection

2018-08-22 Thread Chris Chiu
On Fri, Jul 6, 2018 at 2:44 PM, Chris Chiu  wrote:
> On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä
>  wrote:
>> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>>> Hi,
>>> We have few ASUS laptops X705FD (The new Whiskey Lake), X560UD (intel
>>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>> the HDMI connector status stays connected even the HDMI cable has been
>>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>> checking the status while plug/unplug the HDMI, it shows
>>> "disconnected" before plug in HDMI cable, then switch to "connected"
>>> after plugin, and still stay "connected" after unplug. This would
>>> cause the audio output path cannot correctly switch from HDMI to
>>> internal speaker after unplugging the HDMI.
>>>
>>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>> present. The full "dmesg" log is here.
>>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>
>>> The HDMI cable is plugged in at ~26th second.
>>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> audio support"
>>> then unplug the HDMI at ~73th second.
>>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> audio support"
>>>
>>> Please advise what I can do to fix this. Thanks
>>
>> Pull the cable out faster?
>>
>> I presume this is the same old case of hpd disconnecting slightly
>> before ddc and we still manage to read the EDID when processing
>> the hpd irq. We kinda tried to fix that with the live status
>> check but that thing failed spectacularly.
>>
>> --
>> Ville Syrjälä
>> Intel

There's a patch at https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8.
I verified it on the X705FD/X560UD, where the problem is easy to
reproduce, and the patch works as expected. Can anyone kindly comment
on this patch? We are happy to do anything that helps fix this issue
upstream. Thanks.

Chris

> Thanks for the suggestion. I tried pulling the cable out faster, the status
> shows correctly. I also tried branch drm-tip of
> https://cgit.freedesktop.org/drm/drm-tip
> but the symptom persists.
>
> Anything I can help here? Or any old commit/patch I can try to do some
> experiments?
>
> Chris


Re: [BUG] i915 HDMI connector status is connected after disconnection

2018-07-06 Thread Chris Chiu
On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä
 wrote:
> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>> Hi,
>> We have few ASUS laptops X705FD (The new Whiskey Lake), X560UD (intel
>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>> the HDMI connector status stays connected even the HDMI cable has been
>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>> checking the status while plug/unplug the HDMI, it shows
>> "disconnected" before plug in HDMI cable, then switch to "connected"
>> after plugin, and still stay "connected" after unplug. This would
>> cause the audio output path cannot correctly switch from HDMI to
>> internal speaker after unplugging the HDMI.
>>
>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>> present. The full "dmesg" log is here.
>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>
>> The HDMI cable is plugged in at ~26th second.
>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>> audio support"
>> then unplug the HDMI at ~73rd second.
>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>> audio support"
>>
>> Please advise what I can do to fix this. Thanks
>
> Pull the cable out faster?
>
> I presume this is the same old case of hpd disconnecting slightly
> before ddc and we still manage to read the EDID when processing
> the hpd irq. We kinda tried to fix that with the live status
> check but that thing failed spectacularly.
>
> --
> Ville Syrjälä
> Intel

Thanks for the suggestion. When I pull the cable out faster, the status
is reported correctly. I also tried the drm-tip branch of
https://cgit.freedesktop.org/drm/drm-tip
but the symptom persists.

Is there anything I can do to help here? Or is there an older commit or patch
I could experiment with?

Chris


[BUG] i915 HDMI connector status is connected after disconnection

2018-07-05 Thread Chris Chiu
Hi,
We have a few ASUS laptops, X705FD (the new Whiskey Lake), X560UD (Intel
i5-8250U) and X530UN (Intel i7-8550U), that share the same problem: the
HDMI connector status stays "connected" even after the HDMI cable has been
unplugged. Watching "/sys/class/drm/card0-HDMI-A-1/status" while plugging
and unplugging the cable, it shows "disconnected" before the cable is
plugged in, switches to "connected" after plugging in, and still says
"connected" after unplugging. As a result, the audio output path cannot
switch from HDMI back to the internal speaker after the cable is
unplugged.

I then tried to verify with the latest kernel, 4.18.0-rc3+, and the bug is
still present. The full "dmesg" log is here:
https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1

The HDMI cable was plugged in at the ~26th second:
"[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
audio support"
and unplugged at the ~73rd second:
"[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
audio support"

Please advise what I can do to fix this. Thanks

Chris
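
For anyone who wants to repeat the sysfs check described above without watching the file by hand, here is a minimal sketch. It assumes the connector path from the report (card and connector names differ between machines), and the function names read_status and watch_status are made up for illustration. It only reads the status file and records each change, so it does not force a reprobe.

```python
import time
from pathlib import Path

# Connector path taken from the report above; the card and connector
# names are machine-specific, so treat this default as an assumption.
CONNECTOR = Path("/sys/class/drm/card0-HDMI-A-1")


def read_status(connector: Path = CONNECTOR) -> str:
    """Return the connector status the kernel currently reports."""
    return (connector / "status").read_text().strip()


def watch_status(connector: Path = CONNECTOR, seconds: float = 60.0,
                 interval: float = 0.5):
    """Poll the status file; record (elapsed seconds, status) on each change."""
    changes = []
    last = None
    start = time.monotonic()
    while time.monotonic() - start < seconds:
        status = read_status(connector)
        if status != last:
            # Only log transitions, e.g. disconnected -> connected.
            changes.append((time.monotonic() - start, status))
            last = status
        time.sleep(interval)
    return changes
```

Calling watch_status() while plugging and unplugging the cable should show whether the status ever returns to "disconnected" on its own.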


Re: [BUG] i915 HDMI connector status is connected after disconnection

2018-07-05 Thread Chris Chiu
On Thu, Jul 5, 2018 at 9:18 PM, Jani Nikula  wrote:
> On Thu, 05 Jul 2018, Chris Chiu  wrote:
>> On Thu, Jul 5, 2018 at 5:37 PM, Jani Nikula  wrote:
>>> On Thu, 05 Jul 2018, Chris Wilson  wrote:
>>>> Quoting Jani Nikula (2018-07-05 09:58:57)
>>>>> On Thu, 05 Jul 2018, Chris Chiu  wrote:
>>>>> > Hi,
>>>>> > We have a few ASUS laptops, X705FD (the new Whiskey Lake), X560UD (intel
>>>>> > i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>>>> > the HDMI connector status stays connected even the HDMI cable has been
>>>>> > unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>>>> > checking the status while plug/unplug the HDMI, it shows
>>>>> > "disconnected" before plug in HDMI cable, then switch to "connected"
>>>>> > after plugin, and still stay "connected" after unplug. This would
>>>>> > cause the audio output path cannot correctly switch from HDMI to
>>>>> > internal speaker after unplugging the HDMI.
>>>>> >
>>>>> > I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>>>> > present. The full "dmesg" log is here.
>>>>> > https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>>> >
>>>>> > The HDMI cable is plugged in at ~26th second.
>>>>> > "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> > audio support"
>>>>> > then unplug the HDMI at ~73rd second.
>>>>> > "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> > audio support"
>>>>> >
>>>>> > Please advise what I can do to fix this. Thanks
>>>>>
>>>>> Seems rather odd. Please file a bug report at [1]. Attach the dmesg on
>>>>> the bug. Please attach 'xrandr --verbose' output before and after
>>>>> unplugging on the bug.
>>>>
>>>> Note that 'xrandr --verbose' will trigger a reprobe of the devices,
>>>> papering over any missed probe following hotplug.  I would suggest
>>>> preceding with 'xrandr --current --verbose'.
>>>>
>>>> If all you are doing is checking status, you need to 'echo detect >
>>>> status' to trigger a reprobe after hotplug.
>>
>> It's interesting that reprobe triggered by 'xrandr --verbose' after unplug will
>> get the status back to "disconnected". But if I just do 'xrandr --current --verbose'
>> before and after unplugging the cable, the output shows the same status
>> 'connected'.
>>
>> Here's the output of 'xrandr --verbose' before unplugging HDMI
>> https://gist.github.com/mschiu77/ea2e843078297f344596243418dcdaf7
>>
>> And the output of 'xrandr --current --verbose' after unplugging the cable
>> https://gist.github.com/mschiu77/55756c0801046d49cd9bc3f87712b079
>>
>> Then do 'xrandr --verbose' to trigger reprobe, the output
>> https://gist.github.com/mschiu77/72e6ab5438cbe64443300fc4fd71770c
>>
>> It means that the HDMI unplug not detected by the driver?
>
> Please do file the bug, and attach the information there. People go on
> vacations, the pastebins will go away, and the memory of all of this
> will fade.
>

Sorry, I forgot to mention it here. I've filed the bug at
https://bugs.freedesktop.org/show_bug.cgi?id=107125

Thanks

> BR,
> Jani.
>
>>
>> Chris
>>
>>>
>>> I was curious about the logs seemingly indicating that we can read the
>>> EDID even after the user says they've unplugged the cable. The updating
>>> of sysfs status attribute is another matter.
>>>
>>> BR,
>>> Jani.
>>>
>>>
>>> --
>>> Jani Nikula, Intel Open Source Graphics Center
>
> --
> Jani Nikula, Intel Open Source Graphics Center


Re: [BUG] i915 HDMI connector status is connected after disconnection

2018-07-05 Thread Chris Chiu
On Thu, Jul 5, 2018 at 5:37 PM, Jani Nikula  wrote:
> On Thu, 05 Jul 2018, Chris Wilson  wrote:
>> Quoting Jani Nikula (2018-07-05 09:58:57)
>>> On Thu, 05 Jul 2018, Chris Chiu  wrote:
>>> > Hi,
>>> > We have a few ASUS laptops, X705FD (the new Whiskey Lake), X560UD (intel
>>> > i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>> > the HDMI connector status stays connected even the HDMI cable has been
>>> > unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>> > checking the status while plug/unplug the HDMI, it shows
>>> > "disconnected" before plug in HDMI cable, then switch to "connected"
>>> > after plugin, and still stay "connected" after unplug. This would
>>> > cause the audio output path cannot correctly switch from HDMI to
>>> > internal speaker after unplugging the HDMI.
>>> >
>>> > I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>> > present. The full "dmesg" log is here.
>>> > https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>> >
>>> > The HDMI cable is plugged in at ~26th second.
>>> > "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> > audio support"
>>> > then unplug the HDMI at ~73rd second.
>>> > "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> > audio support"
>>> >
>>> > Please advise what I can do to fix this. Thanks
>>>
>>> Seems rather odd. Please file a bug report at [1]. Attach the dmesg on
>>> the bug. Please attach 'xrandr --verbose' output before and after
>>> unplugging on the bug.
>>
>> Note that 'xrandr --verbose' will trigger a reprobe of the devices,
>> papering over any missed probe following hotplug.  I would suggest
>> preceding with 'xrandr --current --verbose'.
>>
>> If all you are doing is checking status, you need to 'echo detect >
>> status' to trigger a reprobe after hotplug.

It's interesting that the reprobe triggered by 'xrandr --verbose' after the
unplug gets the status back to "disconnected". But if I only run
'xrandr --current --verbose' before and after unplugging the cable, the
output shows the same status, 'connected'.

Here's the output of 'xrandr --verbose' before unplugging HDMI
https://gist.github.com/mschiu77/ea2e843078297f344596243418dcdaf7

And the output of 'xrandr --current --verbose' after unplugging the cable
https://gist.github.com/mschiu77/55756c0801046d49cd9bc3f87712b079

Then I ran 'xrandr --verbose' to trigger a reprobe; the output:
https://gist.github.com/mschiu77/72e6ab5438cbe64443300fc4fd71770c

Does this mean that the HDMI unplug is not detected by the driver?

Chris
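
The difference between a passive read and a forced reprobe, as discussed above, can also be checked straight from sysfs. The sketch below is only an illustration under assumptions: the connector path is the one from the original report, writing "detect" to the status file mirrors the 'echo detect > status' suggestion quoted above and needs root, and the helper names (force_reprobe, classify, check) are hypothetical.

```python
from pathlib import Path

# Connector path from the original report; machine-specific assumption.
CONNECTOR = Path("/sys/class/drm/card0-HDMI-A-1")


def read_status(connector: Path = CONNECTOR) -> str:
    """Passive read: returns whatever status the kernel last recorded."""
    return (connector / "status").read_text().strip()


def force_reprobe(connector: Path = CONNECTOR) -> None:
    """Same effect as 'echo detect > status'; needs root on a real system."""
    (connector / "status").write_text("detect\n")


def classify(cached: str, reprobed: str) -> str:
    """Compare the status seen before and after a forced reprobe."""
    if cached == reprobed:
        return "consistent"
    if cached == "connected" and reprobed == "disconnected":
        return "missed unplug"
    if cached == "disconnected" and reprobed == "connected":
        return "missed plug"
    return "changed"


def check(connector: Path = CONNECTOR) -> str:
    """Read the cached status, force a reprobe, read again, and compare."""
    cached = read_status(connector)
    force_reprobe(connector)
    return classify(cached, read_status(connector))
```

A result of "missed unplug" from check() after pulling the cable would match the behaviour reported here: the cached status stays "connected" until something forces a reprobe.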

>
> I was curious about the logs seemingly indicating that we can read the
> EDID even after the user says they've unplugged the cable. The updating
> of sysfs status attribute is another matter.
>
> BR,
> Jani.
>
>
> --
> Jani Nikula, Intel Open Source Graphics Center


Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)

2018-02-09 Thread Chris Chiu
On Thu, Feb 1, 2018 at 9:13 PM, Chris Chiu <c...@endlessm.com> wrote:
> On Thu, Feb 1, 2018 at 12:08 AM, Harry Wentland <harry.wentl...@amd.com> wrote:
>> On 2018-01-31 09:31 AM, Chris Chiu wrote:
>>> Hi,
>>> We are working with new laptops that have the AMD Raven Ridge
>>> chipset with this `/proc/cpuinfo`
>>> https://gist.github.com/mschiu77/b06dba574e89b9a30cf4c450eaec49bc
>>>
>>> With the latest kernel 4.15, there're lots of different
>>> panics/oops during boot so no chance to get into X. It also happens
>>> during shutdown. Then I tried to build kernel from
>>> git://people.freedesktop.org/~agd5f/linux on branch
>>> amd-staging-drm-next with head on commit "drm: Fix trailing semicolon"
>>> and update the linux-firmware. Things seem to get better, only 1 oops
>>> observed. Here's the oops
>>> https://gist.github.com/mschiu77/1a68f27272b24775b2040acdb474cdd3.
>>
>> Hi Chris,
>>
>> what are the steps to reproduce this oops?
>>
>> Does it reproduce all the time or is it intermittent?
>>
>> Can you send a dmesg with amdgpu.dc_log=1, in addition to drm.debug=0xe?
>>
>> Thanks,
>> Harry
>>
>
> I did nothing special to reproduce the oops. Boot and sometimes it
> just shows blank screen but still responds to magic sysrq. So I reboot
> and take the journal log.
>
> It's intermittent, I ran into it 2 times during 13 reboots.
> The logs are listed as follows
> https://gist.github.com/mschiu77/9307d1ca0acd046cc6817f8cad63d79c
> https://gist.github.com/mschiu77/fa81110f93428721f017cb9fbfd06fbe
>
> One more log here. It enters X OK but after few minutes the display
> went black and only a mouse cursor left. But the mouse cursor can't
> even move. So I do a sysrq reboot again.
> The last error is
> ""
> [  636.312759] endless kernel:
> [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
> [CRTC:41:crtc-0] flip_done timed out
> [  646.552344] endless kernel:
> [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR*
> [CRTC:41:crtc-0] flip_done timed out
> ""
> full log here 
> https://gist.github.com/mschiu77/c8696e5fefb17bb1c53598214fb4e382
>
> Only 4 times I can login X, blank screen or hangs w/o responding to
> magic sysrq for the rest. I took a picture of the only panic although
> I think it's not about amdgpu. It's here.
> https://pasteboard.co/H5CUvxk.jpg
>
> Hope they can be helpful.
>
> Chris
>
>>> However, I still get stuck on the following messages during boot very
>>> often
>>> ""
>>> [4.998241] endless kernel: [drm] amdgpu kernel modesetting enabled.
>>> [4.998288] endless kernel: checking generic (e000 7f) vs
>>> hw (e000 1000)
>>> [4.998289] endless kernel: fb: switching to amdgpudrmfb from EFI VGA
>>> ""
>>> I turned on drm.debug=0xe while booting, but no more information at this point.
>>> Anything I can do at this point?
>>>
>>> And there's 1 more information may be helpful. Sometimes the
>>> system boots OK with the blank screen, I can't switch to virtual
>>> console, but it did respond to the magic sys-rq key. The dmesg with
>>> drm.debug=0xe is here
>>> https://gist.github.com/mschiu77/291e47b1f07dc52be9461c55c820464c.
>>>
>>> I'm pretty sure it's due to the amdgpu driver. Because when I boot
>>> with my own kernel which disables the amdgpu driver, all these
>>> symptoms went away. Please suggest anything I can do for this. Thanks
>>>
>>> Chris
>>> ___
>>> amd-gfx mailing list
>>> amd-...@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>

Gentle ping, cheers.

Chris


Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)

2018-02-01 Thread Chris Chiu
On Thu, Feb 1, 2018 at 12:08 AM, Harry Wentland <harry.wentl...@amd.com> wrote:
> On 2018-01-31 09:31 AM, Chris Chiu wrote:
>> Hi,
>> We are working with new laptops that have the AMD Raven Ridge
>> chipset with this `/proc/cpuinfo`
>> https://gist.github.com/mschiu77/b06dba574e89b9a30cf4c450eaec49bc
>>
>> With the latest kernel 4.15, there're lots of different
>> panics/oops during boot so no chance to get into X. It also happens
>> during shutdown. Then I tried to build kernel from
>> git://people.freedesktop.org/~agd5f/linux on branch
>> amd-staging-drm-next with head on commit "drm: Fix trailing semicolon"
>> and update the linux-firmware. Things seem to get better, only 1 oops
>> observed. Here's the oops
>> https://gist.github.com/mschiu77/1a68f27272b24775b2040acdb474cdd3.
>
> Hi Chris,
>
> what are the steps to reproduce this oops?
>
> Does it reproduce all the time or is it intermittent?
>
> Can you send a dmesg with amdgpu.dc_log=1, in addition to drm.debug=0xe?
>
> Thanks,
> Harry
>

I did nothing special to reproduce the oops. I just boot, and sometimes it
shows a blank screen but still responds to magic SysRq. So I reboot and
take the journal log.

It's intermittent: I ran into it 2 times in 13 reboots.
The logs are here:
https://gist.github.com/mschiu77/9307d1ca0acd046cc6817f8cad63d79c
https://gist.github.com/mschiu77/fa81110f93428721f017cb9fbfd06fbe

One more log here. It entered X OK, but after a few minutes the display
went black with only a mouse cursor left, and the cursor could not be
moved. So I did a SysRq reboot again.
The last error is
""
[  636.312759] endless kernel:
[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:41:crtc-0] flip_done timed out
[  646.552344] endless kernel:
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR*
[CRTC:41:crtc-0] flip_done timed out
""
full log here https://gist.github.com/mschiu77/c8696e5fefb17bb1c53598214fb4e382

I could log in to X only 4 times; the rest were a blank screen or hangs
that did not respond to magic SysRq. I took a picture of the only panic,
although I think it is not related to amdgpu. It's here:
https://pasteboard.co/H5CUvxk.jpg

Hope these are helpful.

Chris
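
When going through several journal logs like the ones linked above, it can help to pull out the "flip_done timed out" records automatically. The following is a rough sketch, not a tool used in this thread. It assumes each record sits on a single line (as dmesg prints it, unlike the wrapped quote above) and that the line has the same layout as the two errors quoted here; the function name find_flip_timeouts is made up.

```python
import re

# Pattern modelled on the two error lines quoted above; other kernels or
# log formats may differ, so treat it as an assumption.
FLIP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*?"
    r"\[drm:(?P<func>\w+) \[\w+\]\] \*ERROR\* "
    r"\[CRTC:(?P<crtc_id>\d+):(?P<crtc_name>[\w-]+)\] flip_done timed out"
)


def find_flip_timeouts(lines):
    """Return (timestamp, function, crtc id, crtc name) for each matching line."""
    hits = []
    for line in lines:
        m = FLIP_RE.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("func"),
                         int(m.group("crtc_id")), m.group("crtc_name")))
    return hits
```

Feeding it the lines of a saved log gives the timestamps and CRTC of each timeout, which makes it easier to compare the failing boots with the good ones.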

>> However, I still get stuck on the following messages during boot very
>> often
>> ""
>> [4.998241] endless kernel: [drm] amdgpu kernel modesetting enabled.
>> [4.998288] endless kernel: checking generic (e000 7f) vs
>> hw (e000 1000)
>> [4.998289] endless kernel: fb: switching to amdgpudrmfb from EFI VGA
>> ""
>> I turned on drm.debug=0xe while booting, but no more information at this point.
>> Anything I can do at this point?
>>
>> And there's 1 more information may be helpful. Sometimes the
>> system boots OK with the blank screen, I can't switch to virtual
>> console, but it did respond to the magic sys-rq key. The dmesg with
>> drm.debug=0xe is here
>> https://gist.github.com/mschiu77/291e47b1f07dc52be9461c55c820464c.
>>
>> I'm pretty sure it's due to the amdgpu driver. Because when I boot
>> with my own kernel which disables the amdgpu driver, all these
>> symptoms went away. Please suggest anything I can do for this. Thanks
>>
>> Chris
>> ___
>> amd-gfx mailing list
>> amd-...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>


amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)

2018-01-31 Thread Chris Chiu
Hi,
We are working with new laptops that have the AMD Raven Ridge
chipset, with this `/proc/cpuinfo`:
https://gist.github.com/mschiu77/b06dba574e89b9a30cf4c450eaec49bc

With the latest kernel, 4.15, there are lots of different
panics/oopses during boot, so there is no chance to get into X. It also
happens during shutdown. I then built a kernel from
git://people.freedesktop.org/~agd5f/linux, branch
amd-staging-drm-next, with HEAD at commit "drm: Fix trailing semicolon",
and updated linux-firmware. Things seem to get better; only 1 oops was
observed. Here's the oops:
https://gist.github.com/mschiu77/1a68f27272b24775b2040acdb474cdd3

However, I still very often get stuck at the following messages during
boot:
""
[4.998241] endless kernel: [drm] amdgpu kernel modesetting enabled.
[4.998288] endless kernel: checking generic (e000 7f) vs
hw (e000 1000)
[4.998289] endless kernel: fb: switching to amdgpudrmfb from EFI VGA
""
I turned on drm.debug=0xe while booting, but got no more information at this point.
Is there anything else I can do here?

One more piece of information that may be helpful: sometimes the
system boots OK but with a blank screen. I can't switch to a virtual
console, but it does respond to the magic SysRq key. The dmesg with
drm.debug=0xe is here:
https://gist.github.com/mschiu77/291e47b1f07dc52be9461c55c820464c

I'm pretty sure it's due to the amdgpu driver, because when I boot
with my own kernel that disables the amdgpu driver, all these
symptoms go away. Please suggest anything I can try. Thanks

Chris


Kernel panic on nouveau during boot on NVIDIA NV118 (GM108)

2017-06-03 Thread Chris Chiu
We are working with a new desktop that has the NVIDIA NV118
chipset.

During boot, the display becomes unusable at the point where the
nouveau driver loads. We have reproduced this on 4.8, 4.11 and Linux
master (4.12-rc3).

Dmesg log is attached.

Is this a known issue? Anything we can do to help?

Thanks


Bristol195i_dmesg.log
Description: Binary data


[PATCH] drm/nouveau: fix unknown chipset for GTX 1060

2016-12-12 Thread Chris Chiu
The nouveau driver reports "unknown chipset (136000a1)" for the GTX 1060,
so the screen only gets VGA resolution. With the same chipset definition
as nv134, it shows Full HD. This commit copies the fields of
nv134_chipset into a new nv136_chipset for the GTX 1060.

Signed-off-by: Chris Chiu 
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/base.c | 29 +++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
index 7218a06..7c6eece 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
@@ -2209,6 +2209,34 @@ nv134_chipset = {
    .fifo = gp100_fifo_new,
 };

+static const struct nvkm_device_chip
+nv136_chipset = {
+   .name = "GP104",
+   .bar = gf100_bar_new,
+   .bios = nvkm_bios_new,
+   .bus = gf100_bus_new,
+   .devinit = gm200_devinit_new,
+   .fb = gp104_fb_new,
+   .fuse = gm107_fuse_new,
+   .gpio = gk104_gpio_new,
+   .i2c = gm200_i2c_new,
+   .ibus = gm200_ibus_new,
+   .imem = nv50_instmem_new,
+   .ltc = gp100_ltc_new,
+   .mc = gp100_mc_new,
+   .mmu = gf100_mmu_new,
+   .pci = gp100_pci_new,
+   .timer = gk20a_timer_new,
+   .top = gk104_top_new,
+   .ce[0] = gp104_ce_new,
+   .ce[1] = gp104_ce_new,
+   .ce[2] = gp104_ce_new,
+   .ce[3] = gp104_ce_new,
+   .disp = gp104_disp_new,
+   .dma = gf119_dma_new,
+   .fifo = gp100_fifo_new,
+};
+
 static int
 nvkm_device_event_ctor(struct nvkm_object *object, void *data, u32 size,
   struct nvkm_notify *notify)
@@ -2644,6 +2672,7 @@ nvkm_device_ctor(const struct nvkm_device_func *func,
    case 0x12b: device->chip = &nv12b_chipset; break;
    case 0x130: device->chip = &nv130_chipset; break;
    case 0x134: device->chip = &nv134_chipset; break;
+   case 0x136: device->chip = &nv136_chipset; break;
    default:
        nvdev_error(device, "unknown chipset (%08x)\n", boot0);
        goto done;
-- 
2.1.4
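
As a side note on where the 0x136 in the new case label comes from: the error message prints the raw boot0 register value (136000a1), and the switch works on a chipset ID derived from it. The sketch below illustrates that mapping in Python. The bit masks are an assumption based on how nouveau decodes boot0 for NV10 and later parts; they are not shown in this patch. The KNOWN table is a hypothetical stand-in for the switch statement and lists only the two chipsets named in this patch.

```python
def decode_boot0(boot0: int):
    """Split a boot0 value into (chipset, chiprev).

    Masks are assumed from nouveau's device constructor for NV10+ parts.
    """
    chipset = (boot0 & 0x1FF00000) >> 20
    chiprev = boot0 & 0x000000FF
    return chipset, chiprev


# Hypothetical stand-in for the switch in nvkm_device_ctor(); only the
# two chipsets named in this patch are listed.
KNOWN = {0x134: "nv134_chipset", 0x136: "nv136_chipset"}


def lookup(boot0: int):
    """Return the chipset table name for boot0, or None if it is unknown."""
    chipset, _ = decode_boot0(boot0)
    return KNOWN.get(chipset)
```

With the value from the error message, decode_boot0(0x136000a1) gives chipset 0x136 and revision 0xa1, which is why the new case is needed before the default branch.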