Re: NVIDIA GPU fallen off the bus after exiting s2idle
On Thu, May 6, 2021 at 5:46 PM Rafael J. Wysocki wrote:
>
> On Tue, May 4, 2021 at 10:08 AM Chris Chiu wrote:
> >
> > Hi,
> > We have some Intel laptops (11th generation CPU) with NVIDIA GPU
> > suffering the same GPU falling off the bus problem while exiting
> > s2idle with external display connected. These laptops connect the
> > external display via the HDMI/DisplayPort on a USB Type-C interfaced
> > dock. If we enter and exit s2idle with the dock connected, the NVIDIA
> > GPU (confirmed on 10de:24b6 and 10de:25b8) and the PCIe port can come
> > back to D0 w/o problem. If we enter the s2idle, disconnect the dock,
> > then exit the s2idle, both external display and the panel will remain
> > with no output. The dmesg as follows shows the "nvidia :01:00.0:
> > can't change power state from D3cold to D0 (config space
> > inaccessible)" due to the following ACPI error
> > [ 154.446781]
> > [ 154.446783]
> > [ 154.446783] Initialized Local Variables for Method [IPCS]:
> > [ 154.446784] Local0: 9863e365 Integer 09C5
> > [ 154.446790]
> > [ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments
> > defined for method invocation)
> > [ 154.446792] Arg0: 25568fbd Integer 00AC
> > [ 154.446795] Arg1: 9ef30e76 Integer
> > [ 154.446798] Arg2: fdf820f0 Integer 0010
> > [ 154.446801] Arg3: 9fc2a088 Integer 0001
> > [ 154.446804] Arg4: 3a3418f7 Integer 0001
> > [ 154.446807] Arg5: 20c4b87c Integer
> > [ 154.446810] Arg6: 8b965a8a Integer
> > [ 154.446813]
> > [ 154.446815] ACPI Error: Aborting method \IPCS due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446824] ACPI Error: Aborting method \MCUI due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446829] ACPI Error: Aborting method \SPCX due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due
> > to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446860] acpi device:02: Failed to change power state to D0
> > [ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0
> > for parent in (unknown)
>
> If I were to guess, I would say that AML tries to access memory that
> is not accessible while suspended, probably PCI config space.
>
> > The IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON
> > which we expect it to prepare everything before bringing back the
> > NVIDIA GPU but it's stuck in the infinite loop as described below.
> > Please refer to
> > https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 for
> > the full DSDT.dsl.
>
> The DSDT alone may not be sufficient.
>
> Can you please create a bug entry at bugzilla.kernel.org for this
> issue and attach the full output of acpidump from one of the affected
> machines to it? And please let me know the number of the bug.
>
> Also please attach the output of dmesg including a suspend-resume
> cycle including dock disconnection while suspended and the ACPI
> messages quoted below.
>
> > While (One)
> > {
> >     If ((!IBSY || (IERR == One)))
> >     {
> >         Break
> >     }
> >
> >     If ((Local0 > TMOV))
> >     {
> >         RPKG [Zero] = 0x03
> >         Return (RPKG) /* \IPCS.RPKG */
> >     }
> >
> >     Sleep (One)
> >     Local0++
> > }
> >
> > And the upstream PCIe port of NVIDIA seems to become inaccessible due
> > to the messages as follows.
> > [ 292.746508] pcieport :00:01.0: waiting 100 ms for downstream
> > link, after activation
> > [ 292.882296] pci :01:00.0: waiting additional 100 ms to become
> > accessible
> > [ 316.876997] pci :01:00.0: can't change power state from D3cold
> > to D0 (config space inaccessible)
NVIDIA GPU fallen off the bus after exiting s2idle
Hi,
We have some Intel laptops (11th generation CPU) with NVIDIA GPUs that
suffer from the same "GPU has fallen off the bus" problem when exiting
s2idle with an external display connected. These laptops connect the
external display via the HDMI/DisplayPort on a USB Type-C dock. If we
enter and exit s2idle with the dock connected, the NVIDIA GPU (confirmed
on 10de:24b6 and 10de:25b8) and the PCIe port come back to D0 without
problems. If we enter s2idle, disconnect the dock, and then exit s2idle,
both the external display and the internal panel remain without output.
The dmesg below shows "nvidia :01:00.0: can't change power state from
D3cold to D0 (config space inaccessible)", caused by the following ACPI
error:

[ 154.446781]
[ 154.446783]
[ 154.446783] Initialized Local Variables for Method [IPCS]:
[ 154.446784] Local0: 9863e365 Integer 09C5
[ 154.446790]
[ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments defined for method invocation)
[ 154.446792] Arg0: 25568fbd Integer 00AC
[ 154.446795] Arg1: 9ef30e76 Integer
[ 154.446798] Arg2: fdf820f0 Integer 0010
[ 154.446801] Arg3: 9fc2a088 Integer 0001
[ 154.446804] Arg4: 3a3418f7 Integer 0001
[ 154.446807] Arg5: 20c4b87c Integer
[ 154.446810] Arg6: 8b965a8a Integer
[ 154.446813]
[ 154.446815] ACPI Error: Aborting method \IPCS due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446824] ACPI Error: Aborting method \MCUI due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446829] ACPI Error: Aborting method \SPCX due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446860] acpi device:02: Failed to change power state to D0
[ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)

IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON, and we
expect it to prepare everything before the NVIDIA GPU is brought back,
but it gets stuck in the loop shown below. Please refer to
https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 for
the full DSDT.dsl.

    While (One)
    {
        If ((!IBSY || (IERR == One)))
        {
            Break
        }

        If ((Local0 > TMOV))
        {
            RPKG [Zero] = 0x03
            Return (RPKG) /* \IPCS.RPKG */
        }

        Sleep (One)
        Local0++
    }

The upstream PCIe port of the NVIDIA GPU also seems to become
inaccessible, judging by the following messages:

[ 292.746508] pcieport :00:01.0: waiting 100 ms for downstream link, after activation
[ 292.882296] pci :01:00.0: waiting additional 100 ms to become accessible
[ 316.876997] pci :01:00.0: can't change power state from D3cold to D0 (config space inaccessible)

IPCS is part of the Intel Reference Code, and we don't really know why
the loop never ends just because we unplug the dock while the system is
still in s2idle. Can anyone from Intel suggest what happens here?

One more thing worth mentioning: if we unplug the display cable from the
dock before entering s2idle, the NVIDIA GPU comes back without problems
even if we disconnect the dock before exiting s2idle.

Here's the lspci information
https://gist.github.com/mschiu77/0bfc439d15d52d20de0129b1b2a86dc4 and
the dmesg log with ACPI trace_state enabled and dynamic debug on for
drivers/pci/pci.c, drivers/acpi/device_pm.c for the whole s2idle
enter/exit with IPCS timeout.

Any suggestion would be appreciated. Thanks.

Chris
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
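For readers less familiar with ASL, the wait loop from \IPCS quoted in
this message can be modelled in plain userspace C. This is only a sketch
under stated assumptions, not the firmware code: struct ipc_state,
ipc_busy(), ipc_wait() and the enum names are hypothetical, the counter
is assumed to start at zero, and busy_polls_left merely stands in for
the hardware clearing IBSY. What happens after the ASL "Break" is not
shown in the excerpt, so the IPC_DONE/IPC_ERROR split is a guess. The
model shows that the loop has three exits: IBSY clear, IERR set, or the
iteration counter exceeding TMOV.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the IPC registers polled by \IPCS. */
struct ipc_state {
    unsigned int busy_polls_left; /* polls until IBSY reads as clear */
    bool error;                   /* models IERR == One */
};

enum ipc_result { IPC_DONE, IPC_ERROR, IPC_TIMEOUT };

/* Models reading IBSY: busy until the remaining poll budget is used up. */
static bool ipc_busy(struct ipc_state *s)
{
    if (s->busy_polls_left == 0)
        return false;
    s->busy_polls_left--;
    return true;
}

/* Models the While (One) loop; tmov plays the role of TMOV. */
static enum ipc_result ipc_wait(struct ipc_state *s, unsigned int tmov)
{
    unsigned int count = 0; /* plays the role of Local0 (start value assumed) */

    for (;;) {
        /* ASL: If ((!IBSY || (IERR == One))) { Break } */
        if (!ipc_busy(s) || s->error)
            return s->error ? IPC_ERROR : IPC_DONE;
        /* ASL: If ((Local0 > TMOV)) { RPKG [Zero] = 0x03; Return (RPKG) } */
        if (count > tmov)
            return IPC_TIMEOUT;
        /* ASL: Sleep (One) is left out of this model. */
        count++;
    }
}
```

In this model a device that never clears its busy flag still leaves the
loop once count exceeds tmov, so the time the real loop can spend
depends on the value of TMOV and on how long each Sleep (One) actually
takes on the affected machine.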
TGL: No video output on external monitor after unplugging and re-plugging the cable
We found another bug after the fix of
https://gitlab.freedesktop.org/drm/intel/-/issues/2538. The external
monitor is again connected via the WD19's HDMI/DisplayPort, just as in
#2538. However, the monitor is only detected and shows output the very
first time we power on the WD19 dock. If we unplug the cable and plug it
in again, the monitor seems to be detected but there is no video output.

When we power on the WD19 dock with the cable connected to the monitor,
the drm kernel log shows the following:

i915 :00:02.0: [drm:intel_get_hpd_pins.isra.0 [i915]] hotplug event received, stat 0x0001, dig 0x008a, pins 0x0200, long 0x0200
i915 :00:02.0: [drm:intel_hpd_irq_handler [i915]] digital hpd on [ENCODER:292:DDI D] - long
i915 :00:02.0: [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 9 - cnt: 10
i915 :00:02.0: [drm:intel_dp_hpd_pulse [i915]] got hpd irq on [ENCODER:292:DDI D] - long
i915 :00:02.0: [drm:i915_hotplug_work_func [i915]] running encoder hotplug functions
i915 :00:02.0: [drm:i915_hotplug_work_func [i915]] Connector DP-1 (pin 9) received hotplug event. (retry 0)
i915 :00:02.0: [drm:intel_dp_detect [i915]] [CONNECTOR:293:DP-1]
i915 :00:02.0: [drm:intel_power_well_enable [i915]] enabling TC cold off
i915 :00:02.0: [drm:tgl_tc_cold_request [i915]] TC cold block succeeded
i915 :00:02.0: [drm:__intel_tc_port_lock [i915]] Port D/TC#1: TC port mode reset (tbt-alt -> dp-alt)
i915 :00:02.0: [drm:intel_power_well_enable [i915]] enabling AUX D TC1
i915 :00:02.0: [drm:drm_dp_dpcd_read [drm_kms_helper]] AUX D/port D: 0xf AUX -> (ret= 8) 14 1e 40 55 02 00 00 00
i915 :00:02.0: [drm:intel_dp_lttpr_init [i915]] LTTPR common capabilities: 14 1e 40 55 02 00 00 00

When I then replug the cable, the intel_power_well_enable() in
intel_dp_aux_xfer() logs "enabling DC off" instead of "enabling AUX D
TC1". After that, the log is flooded with

i915 :00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX D/port D: timeout (status 0x7d4003ff)

and there is no video output.

I filed a bug at https://gitlab.freedesktop.org/drm/intel/-/issues/3407
and also uploaded the journal log taken with the kernel boot parameter
"drm.debug=0x10e". Can anyone suggest what happens at the replug? What
can we do to identify the cause?

Thanks
Chris
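As a side note on the "drm.debug=0x10e" parameter mentioned in this
message, the mask can be decoded into its debug categories with a few
lines of C. This is a small illustrative sketch: the bit values are
quoted from memory of include/drm/drm_print.h in kernels of that period
and should be checked against the tree actually in use, and the table
and function names (drm_debug_cats, drm_debug_decode) are made up for
this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Debug category bits as recalled from include/drm/drm_print.h
 * (an assumption; verify against your kernel source). */
static const struct {
    unsigned int bit;
    const char *name;
} drm_debug_cats[] = {
    { 0x001, "CORE" },   { 0x002, "DRIVER" }, { 0x004, "KMS" },
    { 0x008, "PRIME" },  { 0x010, "ATOMIC" }, { 0x020, "VBL" },
    { 0x040, "STATE" },  { 0x080, "LEASE" },  { 0x100, "DP" },
};

/* Write the names of all categories set in mask into buf, separated by
 * spaces. Returns the number of category names written. */
static int drm_debug_decode(unsigned int mask, char *buf, size_t len)
{
    size_t used = 0;
    int found = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(drm_debug_cats) / sizeof(drm_debug_cats[0]); i++) {
        if (!(mask & drm_debug_cats[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         found ? " " : "", drm_debug_cats[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* drop the partial name and stop */
            break;
        }
        used += (size_t)n;
        found++;
    }
    return found;
}
```

With the table above, 0x10e selects the DRIVER, KMS, PRIME and DP
categories, which is consistent with the drm_dp_dpcd_read lines
appearing in the quoted log.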
Fails to launch X on laptop with Ryzen 7 4700U
We are working with new laptops that have the Ryzen 7 4700U. X fails to
launch, so I can only access the system via the virtual terminal. I
tried the latest mainline kernel and the kernel from
https://cgit.freedesktop.org/~agd5f/linux, but had no luck. I also
booted the kernel with the parameter amdgpu.exp_hw_support=1, but then
the system freezes after loading amdgpu and I can't even switch to the
virtual terminal.

I posted the bug description and related information at
https://gitlab.freedesktop.org/drm/amd/issues/1031. Please kindly advise
what I should do next.

Thanks
Chris
Unexpected screen flicker during i915 initialization
Hi guys,
We have 2 laptops, an ASUS Z406MA and an Acer TravelMate B118, both
powered by the same Intel N5000 Gemini Lake CPU. On the Acer laptop the
panel blinks once during boot, which never happens on the ASUS laptop.
That caught my attention, and I found the difference between them, but
I need help with more information.

The major difference shows up in bxt_sanitize_cdclk(), at the following
condition check:

	if (cdctl == expected)
		/* All well; nothing to sanitize */
		return;

On the problematic Acer laptop the value of cdctl is 0x27a, while on the
ASUS machine it is 0x278. Because 0x27a does not equal the expected
value 0x278, the state is sanitized by assigning -1 to
dev_priv->cdclk.hw.vco. The subsequent bxt_set_cdclk() then forces a
full PLL disable and enable, and that is the flicker (blink) we observe
during boot. However, I can't find the definition of the CDCLK_CTL bit
that causes this difference (0x27a ^ 0x278 = 0x2, i.e. BIT(1)).

Can anyone suggest what exactly the problem is and how we should deal
with it? Thanks.

Chris
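To make the register comparison in this message concrete, the differing
bit can be found by XORing the two quoted values. The helper names below
(reg_diff, lowest_bit) are made up for this illustration and are not
i915 functions. For the values in the mail, 0x27a ^ 0x278 is 0x002,
which is BIT(1) in the kernel's zero-based BIT() numbering (the second
bit when counting from one).

```c
#include <stdint.h>

/* Return a mask of the bits that differ between two register values. */
static uint32_t reg_diff(uint32_t actual, uint32_t expected)
{
    return actual ^ expected;
}

/* Index of the lowest set bit, using zero-based numbering as in the
 * kernel's BIT(n) macro; returns -1 if mask is 0. */
static int lowest_bit(uint32_t mask)
{
    for (int n = 0; n < 32; n++)
        if (mask & (UINT32_C(1) << n))
            return n;
    return -1;
}
```

Calling lowest_bit(reg_diff(0x27a, 0x278)) identifies the single bit
that makes the "cdctl == expected" check fail on the Acer machine; what
that bit means in CDCLK_CTL is the open question of the mail and is not
answered by this sketch.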
Re: Unexpected screen flicker during i915 initialization
On Wed, Oct 30, 2019 at 6:25 PM Chris Chiu wrote:
>
> Hi guys,
> We have 2 laptops, ASUS Z406MA and Acer TravelMate B118, both
> powered by the same Intel N5000 GemniLake CPU. On the Acer laptop, the
> panel will blink once during boot which never happens on the ASUS
> laptop. It caught my attention and I find the difference between them
> but I need help for more information,

Sorry, I forgot to mention that the problem was reproduced on the latest
kernel 5.3.

Chris
Re: [BUG] i915 HDMI connector status is connected after disconnection
On Wed, Sep 19, 2018 at 8:08 PM, Jani Nikula wrote:
> On Wed, 19 Sep 2018, Chris Chiu wrote:
>> I tried to add a slight delay in the hotplug work as follows
>>
>> --- a/drivers/gpu/drm/i915/intel_hotplug.c
>> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
>> @@ -378,6 +378,8 @@ static void do_i915_hotplug_check(struct work_struct
>> *work,
>>
>>  	spin_unlock_irq(&dev_priv->irq_lock);
>>
>> +	msleep(100);
>> +
>>  	drm_connector_list_iter_begin(dev, &conn_iter);
>>  	drm_for_each_connector_iter(connector, &conn_iter) {
>>  		intel_connector = to_intel_connector(connector);
>>
>> It does work in most cases, but still fail to update the status if I
>> unplug the HDMI
>> cable very slow. I basically pull the HDMI cable in loose connected
>> state first, and
>> hold in that state ~1 second, totally unplug after that. The status in
>> sysfs will report
>> connected as it used to. There was no problem when I tried the patch
>> https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8
>>
>> I'll try to modify this patch a little bit and send upstream for
>> discussion later. Please
>> advise if any. Thanks.
>
> Please let's not add excessive msleeps in work functions.
>
> My idea was more along the lines of making the hotplug function run in a
> delayed work. After a chat with Ville, below is what I came up with.
>
> Please let me know how it works. Feel free to toy with the
> delay. However, 1-2 seconds or more seems too much.
>
> BR,
> Jani.
>

Thanks for the patch. It works in most cases on my problematic laptops.
After lots of experiments, e.g. pulling the cable out at different
speeds and varying the delay from 300 to 800 msec, a longer delay makes
no significant difference, so 300 msec is good enough for most cases. It
at least updates the status correctly, with a quick visible display
blink when disconnecting HDMI. For comparison, on other machines that
do not normally have this problem, pulling the HDMI cable out slowly
also results in the same problem. I'd say the test result is OK for me.
Thanks.

Chris

>
>
> From 72515b3e856171e52e96bb74796774f595a7f418 Mon Sep 17 00:00:00 2001
> From: Jani Nikula
> Date: Tue, 18 Sep 2018 13:12:34 +0300
> Subject: [PATCH] drm/i915: delay hotplug scheduling
> Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
> Cc: Jani Nikula
>
> On some systems we get the hotplug irq on unplug before the DDC pins
> have been disconnected, resulting in connector status remaining
> connected. Since previous attempts at using hotplug live status before
> DDC have failed, the only viable option is to reduce the window for
> mistakes. Delay the hotplug processing.
>
> Signed-off-by: Jani Nikula
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 +-
> drivers/gpu/drm/i915/intel_hotplug.c | 15 ++-
> 2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7d4daa7412f1..27f579abddae 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -286,7 +286,7 @@ enum hpd_pin {
>  #define HPD_STORM_DEFAULT_THRESHOLD 5
>
>  struct i915_hotplug {
> -	struct work_struct hotplug_work;
> +	struct delayed_work hotplug_work;
>
>  	struct {
>  		unsigned long last_jiffies;
> diff --git a/drivers/gpu/drm/i915/intel_hotplug.c
> b/drivers/gpu/drm/i915/intel_hotplug.c
> index 648a13c6043c..3af64daa5cfc 100644
> --- a/drivers/gpu/drm/i915/intel_hotplug.c
> +++ b/drivers/gpu/drm/i915/intel_hotplug.c
> @@ -110,6 +110,8 @@ enum hpd_pin intel_hpd_pin_default(struct
> drm_i915_private *dev_priv,
>  	}
>  }
>
> +#define HOTPLUG_DELAY_MS 300
> +
>  #define HPD_STORM_DETECT_PERIOD		1000
>  #define HPD_STORM_REENABLE_DELAY	(2 * 60 * 1000)
>
> @@ -319,7 +321,8 @@ static void i915_digport_work_func(struct work_struct
> *work)
>  		spin_lock_irq(&dev_priv->irq_lock);
>  		dev_priv->hotplug.event_bits |= old_bits;
>  		spin_unlock_irq(&dev_priv->irq_lock);
> -		schedule_work(&dev_priv->hotplug.hotplug_work);
> +		schedule_delayed_work(&dev_priv->hotplug.hotplug_work,
> +				      msecs_to_jiffies(HOTPLUG_DELAY_MS));
>  	}
>  }
>
> @@ -329,7 +332,7 @@ static void i915_digport_work_func(struct work_struct
> *work)
>  static void i915_hotplug_work_func(struct work_struct *work)
>  {
>  	struct drm_i915_private *dev_priv =
> -		container_of(work, struct drm_i915_private,
> hotplug.hotplug
[PATCH] drm/i915: re-check the hotplug with a delayed work
I have a few ASUS laptops, X705FD (Intel i7-8565), X560UD (Intel
i5-8250U) and X530UN (Intel i7-8550U), that share the same problem. The
HDMI connector status stays 'connected' even after the HDMI cable has
been unplugged. The status in sysfs then never changes until we run
'xrandr' to reprobe the devices. This also means the audio output path
cannot switch correctly based on the connector status.

This commit kicks off a delayed work when the status remains unchanged
in the first hotplug event handling, since that first check may come at
the wrong moment in some special cases.

Signed-off-by: Chris Chiu
---
 drivers/gpu/drm/i915/i915_drv.h      |  1 +
 drivers/gpu/drm/i915/intel_hotplug.c | 35 +++
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d51d8574a679..78e2cf09cc10 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -286,6 +286,7 @@ struct i915_hotplug {
 	} stats[HPD_NUM_PINS];
 	u32 event_bits;
 	struct delayed_work reenable_work;
+	struct delayed_work recheck_work;
 	struct intel_digital_port *irq_port[I915_MAX_PORTS];

 	u32 long_port_mask;
diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c
index 43aa92beff2a..089a24588ec8 100644
--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -349,14 +349,15 @@ static void i915_digport_work_func(struct work_struct *work)
 	}
 }

+#define HPD_RECHECK_DELAY	(2 * 1000)
+
 /*
  * Handle hotplug events outside the interrupt handler proper.
  */
-static void i915_hotplug_work_func(struct work_struct *work)
+static void do_i915_hotplug_check(struct work_struct *work,
+				  struct drm_i915_private *dev_priv,
+				  struct drm_device *dev, bool do_recheck)
 {
-	struct drm_i915_private *dev_priv =
-		container_of(work, struct drm_i915_private, hotplug.hotplug_work);
-	struct drm_device *dev = &dev_priv->drm;
 	struct intel_connector *intel_connector;
 	struct intel_encoder *intel_encoder;
 	struct drm_connector *connector;
@@ -396,8 +397,31 @@ static void i915_hotplug_work_func(struct work_struct *work)

 	if (changed)
 		drm_kms_helper_hotplug_event(dev);
+	else if (do_recheck) {
+		spin_lock_irq(&dev_priv->irq_lock);
+		dev_priv->hotplug.event_bits |= hpd_event_bits;
+		spin_unlock_irq(&dev_priv->irq_lock);
+		schedule_delayed_work(&dev_priv->hotplug.recheck_work, msecs_to_jiffies(HPD_RECHECK_DELAY));
+	}
 }

+static void i915_hotplug_work_func(struct work_struct *work)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(work, struct drm_i915_private, hotplug.hotplug_work);
+	struct drm_device *dev = &dev_priv->drm;
+
+	do_i915_hotplug_check(work, dev_priv, dev, true);
+}
+
+static void i915_hotplug_recheck_func(struct work_struct *work)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(work, struct drm_i915_private, hotplug.recheck_work.work);
+	struct drm_device *dev = &dev_priv->drm;
+
+	do_i915_hotplug_check(work, dev_priv, dev, false);
+}
 /**
  * intel_hpd_irq_handler - main hotplug irq handler
@@ -619,6 +643,8 @@ void intel_hpd_init_work(struct drm_i915_private *dev_priv)
 	INIT_WORK(&dev_priv->hotplug.poll_init_work, i915_hpd_poll_init_work);
 	INIT_DELAYED_WORK(&dev_priv->hotplug.reenable_work,
 			  intel_hpd_irq_storm_reenable_work);
+	INIT_DELAYED_WORK(&dev_priv->hotplug.recheck_work,
+			  i915_hotplug_recheck_func);
 }

 void intel_hpd_cancel_work(struct drm_i915_private *dev_priv)
@@ -635,6 +661,7 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv)
 	cancel_work_sync(&dev_priv->hotplug.hotplug_work);
 	cancel_work_sync(&dev_priv->hotplug.poll_init_work);
 	cancel_delayed_work_sync(&dev_priv->hotplug.reenable_work);
+	cancel_delayed_work_sync(&dev_priv->hotplug.recheck_work);
 }

 bool intel_hpd_disable(struct drm_i915_private *dev_priv, enum hpd_pin pin)
-- 
2.11.0
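The control flow this patch introduces can be summarised with a small
userspace model. It is a sketch only and not kernel code: hotplug_decide(),
hotplug_flow() and the probe callback are hypothetical names, and the
delayed work is modelled as an immediate second call rather than a real
timer. It shows that the first check may schedule exactly one recheck,
and that the recheck itself never schedules another.

```c
#include <stdbool.h>

enum hpd_action { HPD_NOTIFY, HPD_RECHECK, HPD_NONE };

/* Models the decision at the end of do_i915_hotplug_check(). */
static enum hpd_action hotplug_decide(bool changed, bool do_recheck)
{
    if (changed)
        return HPD_NOTIFY;  /* patch: drm_kms_helper_hotplug_event(dev) */
    if (do_recheck)
        return HPD_RECHECK; /* patch: schedule_delayed_work(recheck_work) */
    return HPD_NONE;
}

/* Runs the first check and, if needed, the single delayed recheck.
 * probe() is a hypothetical callback that reports whether the connector
 * status differs from the cached one. Returns true if userspace would
 * be notified. */
static bool hotplug_flow(bool (*probe)(void *ctx), void *ctx)
{
    enum hpd_action action = hotplug_decide(probe(ctx), true);

    if (action == HPD_RECHECK)
        action = hotplug_decide(probe(ctx), false); /* after HPD_RECHECK_DELAY */
    return action == HPD_NOTIFY;
}

/* Example probe: reports a change only on the Nth call, N stored in *ctx. */
static bool probe_changes_on_nth(void *ctx)
{
    int *calls_left = ctx;

    return --(*calls_left) == 0;
}
```

With the example probe, a change visible on the first or second probe
leads to a notification, while a change that only becomes visible on a
third probe is missed, which mirrors the "slow unplug" limitation
discussed in the thread.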
Re: [BUG] i915 HDMI connector status is connected after disconnection
On Tue, Sep 11, 2018 at 6:25 PM, Chris Chiu wrote:
> On Fri, Aug 24, 2018 at 11:04 PM, Jani Nikula wrote:
>> On Wed, 22 Aug 2018, Chris Chiu wrote:
>>> On Fri, Jul 6, 2018 at 2:44 PM, Chris Chiu wrote:
>>>> On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä
>>>> wrote:
>>>>> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>>>>>> Hi,
>>>>>> We have few ASUS laptops X705FD (The new WiskyLake), X560UD (intel
>>>>>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>>>>> the HDMI connector status stays connected even the HDMI cable has been
>>>>>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>>>>> checking the status while plug/unplug the HDMI, it shows
>>>>>> "disconnected" before plug in HDMI cable, then switch to "connected"
>>>>>> after plugin, and still stay "connected" after unplug. This would
>>>>>> cause the audio output path cannot correctly switch from HDMI to
>>>>>> internal speaker after unplugging the HDMI.
>>>>>>
>>>>>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>>>>> present. The full "dmesg" log is here.
>>>>>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>>>>
>>>>>> The HDMI cable is plugged in at ~26th second.
>>>>>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>>> audio support"
>>>>>> then unplug the HDMI at ~73th second.
>>>>>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>>> audio support"
>>>>>>
>>>>>> Please advise what I can do to fix this. Thanks
>>>>>
>>>>> Pull the cable out faster?
>>>>>
>>>>> I presume this is the same old case of hpd disconnecting slightly
>>>>> before ddc and we still manage to read the EDID when processing
>>>>> the hpd irq. We kinda tried to fix that with the live status
>>>>> check but that thing failed spectacularly.
>>>>>
>>>>> --
>>>>> Ville Syrjälä
>>>>> Intel
>>>
>>> There's a patch https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8.
>>> And I verified on the X705FD/X560UD which were easy to reproduce, the patch
>>> works as expected. Can anyone kindly give comments about this patch?
>>> We can do anything to help fix this issue upstream. Thanks
>>
>> Seems like a hack. Should look into hw based debouncing or a slight
>> delay in the hotplug work processing I think.
>>
>> BR,
>> Jani.
>>

I tried to add a slight delay in the hotplug work as follows:

--- a/drivers/gpu/drm/i915/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/intel_hotplug.c
@@ -378,6 +378,8 @@ static void do_i915_hotplug_check(struct work_struct *work,

 	spin_unlock_irq(&dev_priv->irq_lock);

+	msleep(100);
+
 	drm_connector_list_iter_begin(dev, &conn_iter);
 	drm_for_each_connector_iter(connector, &conn_iter) {
 		intel_connector = to_intel_connector(connector);

It works in most cases, but still fails to update the status if I unplug
the HDMI cable very slowly. Basically, I first pull the HDMI cable into
a loosely connected state, hold it in that state for ~1 second, and then
unplug it completely. The status in sysfs then reports connected, as it
used to. There was no such problem when I tried the patch
https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8

I'll try to modify this patch a little bit and send it upstream for
discussion later. Please advise if you have any comments. Thanks.

Chris

> So you're suggesting to add a slight delay directly in
> i915_hotplug_work_func()?
> And any suggestion about the 'hw based' debouncing? Maybe some examples
> that I can refer to?
>
> Thanks
>
>>>
>>> Chris
>>>
>>>> Thanks for the suggestion. I tried pulling the cable out faster, the status
>>>> shows correctly. I also tried branch drm-tip of
>>>> https://cgit.freedesktop.org/drm/drm-tip
>>>> but the symptom persists.
>>>>
>>>> Anything I can help here? Or any old commit/patch I can try to do some
>>>> experiments?
>>>>
>>>> Chris
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
Re: [BUG] i915 HDMI connector status is connected after disconnection
On Fri, Aug 24, 2018 at 11:04 PM, Jani Nikula wrote:
> On Wed, 22 Aug 2018, Chris Chiu wrote:
>> On Fri, Jul 6, 2018 at 2:44 PM, Chris Chiu wrote:
>>> On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä
>>> wrote:
>>>> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>>>>> Hi,
>>>>> We have few ASUS laptops X705FD (The new WiskyLake), X560UD (intel
>>>>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>>>> the HDMI connector status stays connected even the HDMI cable has been
>>>>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>>>> checking the status while plug/unplug the HDMI, it shows
>>>>> "disconnected" before plug in HDMI cable, then switch to "connected"
>>>>> after plugin, and still stay "connected" after unplug. This would
>>>>> cause the audio output path cannot correctly switch from HDMI to
>>>>> internal speaker after unplugging the HDMI.
>>>>>
>>>>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>>>> present. The full "dmesg" log is here.
>>>>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>>>
>>>>> The HDMI cable is plugged in at ~26th second.
>>>>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> audio support"
>>>>> then unplug the HDMI at ~73th second.
>>>>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> audio support"
>>>>>
>>>>> Please advise what I can do to fix this. Thanks
>>>>
>>>> Pull the cable out faster?
>>>>
>>>> I presume this is the same old case of hpd disconnecting slightly
>>>> before ddc and we still manage to read the EDID when processing
>>>> the hpd irq. We kinda tried to fix that with the live status
>>>> check but that thing failed spectacularly.
>>>>
>>>> --
>>>> Ville Syrjälä
>>>> Intel
>>
>> There's a patch https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8.
>> And I verified on the X705FD/X560UD which were easy to reproduce, the patch
>> works as expected. Can anyone kindly give comments about this patch?
>> We can do anything to help fix this issue upstream. Thanks
>
> Seems like a hack. Should look into hw based debouncing or a slight
> delay in the hotplug work processing I think.
>
> BR,
> Jani.
>

So you're suggesting adding a slight delay directly in
i915_hotplug_work_func()?
And do you have any suggestions about the 'hw based' debouncing? Maybe
some examples that I can refer to?

Thanks

>>
>> Chris
>>
>>> Thanks for the suggestion. I tried pulling the cable out faster, the status
>>> shows correctly. I also tried branch drm-tip of
>>> https://cgit.freedesktop.org/drm/drm-tip
>>> but the symptom persists.
>>>
>>> Anything I can help here? Or any old commit/patch I can try to do some
>>> experiments?
>>>
>>> Chris
>
> --
> Jani Nikula, Intel Open Source Graphics Center
Re: [BUG] i915 HDMI connector status is connected after disconnection
On Fri, Jul 6, 2018 at 2:44 PM, Chris Chiu wrote:
> On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä
> wrote:
>> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>>> Hi,
>>> We have few ASUS laptops X705FD (The new WiskyLake), X560UD (intel
>>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>> the HDMI connector status stays connected even the HDMI cable has been
>>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>> checking the status while plug/unplug the HDMI, it shows
>>> "disconnected" before plug in HDMI cable, then switch to "connected"
>>> after plugin, and still stay "connected" after unplug. This would
>>> cause the audio output path cannot correctly switch from HDMI to
>>> internal speaker after unplugging the HDMI.
>>>
>>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>> present. The full "dmesg" log is here.
>>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>
>>> The HDMI cable is plugged in at ~26th second.
>>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> audio support"
>>> then unplug the HDMI at ~73th second.
>>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> audio support"
>>>
>>> Please advise what I can do to fix this. Thanks
>>
>> Pull the cable out faster?
>>
>> I presume this is the same old case of hpd disconnecting slightly
>> before ddc and we still manage to read the EDID when processing
>> the hpd irq. We kinda tried to fix that with the live status
>> check but that thing failed spectacularly.
>>
>> --
>> Ville Syrjälä
>> Intel

There's a patch at https://bugs.freedesktop.org/show_bug.cgi?id=107125#c8.
I verified it on the X705FD/X560UD, on which the problem is easy to
reproduce, and the patch works as expected. Can anyone kindly comment on
this patch? We can do anything to help fix this issue upstream. Thanks

Chris

> Thanks for the suggestion. I tried pulling the cable out faster, the status
> shows correctly. I also tried branch drm-tip of
> https://cgit.freedesktop.org/drm/drm-tip
> but the symptom persists.
>
> Anything I can help here? Or any old commit/patch I can try to do some
> experiments?
>
> Chris
Re: [BUG] i915 HDMI connector status is connected after disconnection
On Thu, Jul 5, 2018 at 10:40 PM, Ville Syrjälä wrote:
> On Thu, Jul 05, 2018 at 03:58:36PM +0800, Chris Chiu wrote:
>> Hi,
>> We have few ASUS laptops X705FD (The new WiskyLake), X560UD (intel
>> i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>> the HDMI connector status stays connected even the HDMI cable has been
>> unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>> checking the status while plug/unplug the HDMI, it shows
>> "disconnected" before plug in HDMI cable, then switch to "connected"
>> after plugin, and still stay "connected" after unplug. This would
>> cause the audio output path cannot correctly switch from HDMI to
>> internal speaker after unplugging the HDMI.
>>
>> I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>> present. The full "dmesg" log is here.
>> https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>
>> The HDMI cable is plugged in at ~26th second.
>> "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>> audio support"
>> then unplug the HDMI at ~73th second.
>> "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>> audio support"
>>
>> Please advise what I can do to fix this. Thanks
>
> Pull the cable out faster?
>
> I presume this is the same old case of hpd disconnecting slightly
> before ddc and we still manage to read the EDID when processing
> the hpd irq. We kinda tried to fix that with the live status
> check but that thing failed spectacularly.
>
> --
> Ville Syrjälä
> Intel

Thanks for the suggestion. When I pull the cable out faster, the status
is reported correctly. I also tried the drm-tip branch of
https://cgit.freedesktop.org/drm/drm-tip
but the symptom persists.

Is there anything I can help with here? Or is there any old commit/patch
I can try for some experiments?

Chris
[BUG] i915 HDMI connector status is connected after disconnection
Hi,
We have a few ASUS laptops, X705FD (the new Whiskey Lake), X560UD (Intel
i5-8250U) and X530UN (Intel i7-8550U), that share the same problem: the
HDMI connector status stays "connected" even after the HDMI cable has
been unplugged. Checking "/sys/class/drm/card0-HDMI-A-1/status" while
plugging and unplugging the HDMI cable, it shows "disconnected" before
the cable is plugged in, switches to "connected" after plugging in, and
still stays "connected" after unplugging. As a result, the audio output
path cannot correctly switch from HDMI to the internal speaker after
the HDMI cable is unplugged.

I then tried to verify with the latest kernel, 4.18.0-rc3+, and the bug
is still present. The full "dmesg" log is here:
https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1

The HDMI cable is plugged in at the ~26th second:
"[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic audio support"
and then unplugged at the ~73rd second:
"[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic audio support"

Please advise what I can do to fix this. Thanks

Chris
Re: [BUG] i915 HDMI connector status is connected after disconnection
On Thu, Jul 5, 2018 at 9:18 PM, Jani Nikula wrote:
> On Thu, 05 Jul 2018, Chris Chiu wrote:
>> On Thu, Jul 5, 2018 at 5:37 PM, Jani Nikula wrote:
>>> On Thu, 05 Jul 2018, Chris Wilson wrote:
>>>> Quoting Jani Nikula (2018-07-05 09:58:57)
>>>>> On Thu, 05 Jul 2018, Chris Chiu wrote:
>>>>> > Hi,
>>>>> > We have few ASUS laptops X705FD (The new Whiskey Lake), X560UD (intel
>>>>> > i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>>>> > the HDMI connector status stays connected even the HDMI cable has been
>>>>> > unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>>>> > checking the status while plug/unplug the HDMI, it shows
>>>>> > "disconnected" before plug in HDMI cable, then switch to "connected"
>>>>> > after plugin, and still stay "connected" after unplug. This would
>>>>> > cause the audio output path cannot correctly switch from HDMI to
>>>>> > internal speaker after unplugging the HDMI.
>>>>> >
>>>>> > I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>>>> > present. The full "dmesg" log is here.
>>>>> > https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>>>> >
>>>>> > The HDMI cable is plugged in at ~26th second.
>>>>> > "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> > audio support"
>>>>> > then unplug the HDMI at ~73rd second.
>>>>> > "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>>>> > audio support"
>>>>> >
>>>>> > Please advise what I can do to fix this. Thanks
>>>>>
>>>>> Seems rather odd. Please file a bug report at [1]. Attach the dmesg on
>>>>> the bug. Please attach 'xrandr --verbose' output before and after
>>>>> unplugging on the bug.
>>>>
>>>> Note that 'xrandr --verbose' will trigger a reprobe of the devices,
>>>> papering over any missed probe following hotplug. I would suggest
>>>> preceding with 'xrandr --current --verbose'.
>>>>
>>>> If all you are doing is checking status, you need to 'echo detect >
>>>> status' to trigger a reprobe after hotplug.
>>
>> It's interesting that reprobe triggered by 'xrandr --verbose' after
>> unplug will get the status back to "disconnected". But if I just do
>> 'xrandr --current --verbose' before and after unplugging the cable,
>> the output shows the same status 'connected'.
>>
>> Here's the output of 'xrandr --verbose' before unplugging HDMI
>> https://gist.github.com/mschiu77/ea2e843078297f344596243418dcdaf7
>>
>> And the output of 'xrandr --current --verbose' after unplugging the cable
>> https://gist.github.com/mschiu77/55756c0801046d49cd9bc3f87712b079
>>
>> Then do 'xrandr --verbose' to trigger reprobe, the output
>> https://gist.github.com/mschiu77/72e6ab5438cbe64443300fc4fd71770c
>>
>> It means that the HDMI unplug not detected by the driver?
>
> Please do file the bug, and attach the information there. People go on
> vacations, the pastebins will go away, and the memory of all of this
> will fade.
>

Sorry, I forgot to mention it here. I've reported the bug as follows:
https://bugs.freedesktop.org/show_bug.cgi?id=107125

Thanks

> BR,
> Jani.
>
>>
>> Chris
>>
>>>
>>> I was curious about the logs seemingly indicating that we can read the
>>> EDID even after the user says they've unplugged the cable. The updating
>>> of sysfs status attribute is another matter.
>>>
>>> BR,
>>> Jani.
>>>
>>>
>>> --
>>> Jani Nikula, Intel Open Source Graphics Center
>
> --
> Jani Nikula, Intel Open Source Graphics Center
Re: [BUG] i915 HDMI connector status is connected after disconnection
On Thu, Jul 5, 2018 at 5:37 PM, Jani Nikula wrote:
> On Thu, 05 Jul 2018, Chris Wilson wrote:
>> Quoting Jani Nikula (2018-07-05 09:58:57)
>>> On Thu, 05 Jul 2018, Chris Chiu wrote:
>>> > Hi,
>>> > We have few ASUS laptops X705FD (The new Whiskey Lake), X560UD (intel
>>> > i5-8250U), X530UN (intel i7-8550U) share the same problem, which is
>>> > the HDMI connector status stays connected even the HDMI cable has been
>>> > unplugged. Look into the "/sys/class/drm/card0-HDMI-A-1/status" for
>>> > checking the status while plug/unplug the HDMI, it shows
>>> > "disconnected" before plug in HDMI cable, then switch to "connected"
>>> > after plugin, and still stay "connected" after unplug. This would
>>> > cause the audio output path cannot correctly switch from HDMI to
>>> > internal speaker after unplugging the HDMI.
>>> >
>>> > I then try to verify with the latest kernel 4.18.0-rc3+, the bug still
>>> > present. The full "dmesg" log is here.
>>> > https://gist.github.com/mschiu77/d761d7c5cf191b7868d4d7788ae087f1
>>> >
>>> > The HDMI cable is plugged in at ~26th second.
>>> > "[ 26.214371] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> > audio support"
>>> > then unplug the HDMI at ~73rd second.
>>> > "[ 73.328361] [drm:drm_detect_monitor_audio [drm]] Monitor has basic
>>> > audio support"
>>> >
>>> > Please advise what I can do to fix this. Thanks
>>>
>>> Seems rather odd. Please file a bug report at [1]. Attach the dmesg on
>>> the bug. Please attach 'xrandr --verbose' output before and after
>>> unplugging on the bug.
>>
>> Note that 'xrandr --verbose' will trigger a reprobe of the devices,
>> papering over any missed probe following hotplug. I would suggest
>> preceding with 'xrandr --current --verbose'.
>>
>> If all you are doing is checking status, you need to 'echo detect >
>> status' to trigger a reprobe after hotplug.

It's interesting that a reprobe triggered by 'xrandr --verbose' after
unplugging gets the status back to "disconnected". But if I just run
'xrandr --current --verbose' before and after unplugging the cable, the
output shows the same status, 'connected'.

Here's the output of 'xrandr --verbose' before unplugging HDMI:
https://gist.github.com/mschiu77/ea2e843078297f344596243418dcdaf7

And the output of 'xrandr --current --verbose' after unplugging the cable:
https://gist.github.com/mschiu77/55756c0801046d49cd9bc3f87712b079

Then I ran 'xrandr --verbose' to trigger a reprobe; the output:
https://gist.github.com/mschiu77/72e6ab5438cbe64443300fc4fd71770c

Does this mean that the HDMI unplug is not detected by the driver?

Chris

>
> I was curious about the logs seemingly indicating that we can read the
> EDID even after the user says they've unplugged the cable. The updating
> of sysfs status attribute is another matter.
>
> BR,
> Jani.
>
>
> --
> Jani Nikula, Intel Open Source Graphics Center
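The sysfs check and the 'echo detect > status' reprobe discussed in this thread can be sketched in a few lines of Python. This is only a rough illustration, not something posted on the list: the connector directory is taken from the path quoted in the report, the helper names are made up for the example, and writing to the attribute normally needs root.

```python
from pathlib import Path
from typing import Tuple

# Assumed connector directory, taken from the path quoted in the report.
DEFAULT_CONNECTOR = Path("/sys/class/drm/card0-HDMI-A-1")


def read_status(connector: Path = DEFAULT_CONNECTOR) -> str:
    """Return the cached connector status, e.g. 'connected' or 'disconnected'."""
    return (connector / "status").read_text().strip()


def force_reprobe(connector: Path = DEFAULT_CONNECTOR) -> None:
    """Ask the kernel to re-detect the connector, like 'echo detect > status'."""
    (connector / "status").write_text("detect\n")


def status_before_and_after_reprobe(
    connector: Path = DEFAULT_CONNECTOR,
) -> Tuple[str, str]:
    """Read the cached status, force a reprobe, then read the status again."""
    cached = read_status(connector)
    force_reprobe(connector)
    return cached, read_status(connector)
```

On an affected machine, comparing the two values returned by `status_before_and_after_reprobe()` right after unplugging would show whether only the cached status is stale or whether the forced detect still reports the cable as present.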
Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)
On Thu, Feb 1, 2018 at 9:13 PM, Chris Chiu <c...@endlessm.com> wrote:
> On Thu, Feb 1, 2018 at 12:08 AM, Harry Wentland <harry.wentl...@amd.com> wrote:
>> On 2018-01-31 09:31 AM, Chris Chiu wrote:
>>> Hi,
>>> We are working with new laptops that have the AMD Raven Ridge
>>> chipset with this `/proc/cpuinfo`
>>> https://gist.github.com/mschiu77/b06dba574e89b9a30cf4c450eaec49bc
>>>
>>> With the latest kernel 4.15, there're lots of different
>>> panics/oops during boot so no chance to get into X. It also happens
>>> during shutdown. Then I tried to build kernel from
>>> git://people.freedesktop.org/~agd5f/linux on branch
>>> amd-staging-drm-next with head on commit "drm: Fix trailing semicolon"
>>> and update the linux-firmware. Things seem to get better, only 1 oops
>>> observed. Here's the oops
>>> https://gist.github.com/mschiu77/1a68f27272b24775b2040acdb474cdd3.
>>
>> Hi Chris,
>>
>> what are the steps to reproduce this oops?
>>
>> Does it reproduce all the time or is it intermittent?
>>
>> Can you send a dmesg with amdgpu.dc_log=1, in addition to drm.debug=0xe?
>>
>> Thanks,
>> Harry
>>
>
> I did nothing special to reproduce the oops. Boot and sometimes it
> just shows blank screen but still responds to magic sysrq. So I reboot
> and take the journal log.
>
> It's intermittent, I ran into it 2 times during 13 reboots.
> The logs are listed as follows
> https://gist.github.com/mschiu77/9307d1ca0acd046cc6817f8cad63d79c
> https://gist.github.com/mschiu77/fa81110f93428721f017cb9fbfd06fbe
>
> One more log here. It enters X OK but after few minutes the display
> went black and only a mouse cursor left. But the mouse cursor can't
> even move. So I do a sysrq reboot again.
> The last error is
> ""
> [ 636.312759] endless kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:41:crtc-0] flip_done timed out
> [ 646.552344] endless kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:41:crtc-0] flip_done timed out
> ""
> full log here
> https://gist.github.com/mschiu77/c8696e5fefb17bb1c53598214fb4e382
>
> Only 4 times I can login X, blank screen or hangs w/o responding to
> magic sysrq for the rest. I took a picture of the only panic although
> I think it's not about amdgpu.
> It's here.
> https://pasteboard.co/H5CUvxk.jpg
>
> Hope they can be helpful.
>
> Chris
>
>>> However, I still get stuck on the following messages during boot very
>>> often
>>> ""
>>> [4.998241] endless kernel: [drm] amdgpu kernel modesetting enabled.
>>> [4.998288] endless kernel: checking generic (e000 7f) vs hw (e000 1000)
>>> [4.998289] endless kernel: fb: switching to amdgpudrmfb from EFI VGA
>>> ""
>>> I turned on drm.debug=0xe while booting, but no more information at this
>>> point.
>>> Anything I can do at this point?
>>>
>>> And there's 1 more information may be helpful. Sometimes the
>>> system boots OK with the blank screen, I can't switch to virtual
>>> console, but it did respond to the magic sys-rq key. The dmesg with
>>> drm.debug=0xe is here
>>> https://gist.github.com/mschiu77/291e47b1f07dc52be9461c55c820464c.
>>>
>>> I'm pretty sure it's due to the amdgpu driver. Because when I boot
>>> with my own kernel which disables the amdgpu driver, all these
>>> symptoms went away. Please suggest anything I can do for this. Thanks
>>>
>>> Chris
>>> ___
>>> amd-gfx mailing list
>>> amd-...@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>

Gentle ping, cheers.

Chris
Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)
On Thu, Feb 1, 2018 at 12:08 AM, Harry Wentland <harry.wentl...@amd.com> wrote:
> On 2018-01-31 09:31 AM, Chris Chiu wrote:
>> Hi,
>> We are working with new laptops that have the AMD Raven Ridge
>> chipset with this `/proc/cpuinfo`
>> https://gist.github.com/mschiu77/b06dba574e89b9a30cf4c450eaec49bc
>>
>> With the latest kernel 4.15, there're lots of different
>> panics/oops during boot so no chance to get into X. It also happens
>> during shutdown. Then I tried to build kernel from
>> git://people.freedesktop.org/~agd5f/linux on branch
>> amd-staging-drm-next with head on commit "drm: Fix trailing semicolon"
>> and update the linux-firmware. Things seem to get better, only 1 oops
>> observed. Here's the oops
>> https://gist.github.com/mschiu77/1a68f27272b24775b2040acdb474cdd3.
>
> Hi Chris,
>
> what are the steps to reproduce this oops?
>
> Does it reproduce all the time or is it intermittent?
>
> Can you send a dmesg with amdgpu.dc_log=1, in addition to drm.debug=0xe?
>
> Thanks,
> Harry
>

I did nothing special to reproduce the oops. I just boot, and sometimes
it shows a blank screen but still responds to magic sysrq, so I reboot
and take the journal log.

It's intermittent; I ran into it 2 times during 13 reboots.
The logs are listed as follows:
https://gist.github.com/mschiu77/9307d1ca0acd046cc6817f8cad63d79c
https://gist.github.com/mschiu77/fa81110f93428721f017cb9fbfd06fbe

One more log here. It entered X OK, but after a few minutes the display
went black with only a mouse cursor left, and the mouse cursor couldn't
even move. So I did a sysrq reboot again.
The last error is
""
[ 636.312759] endless kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:41:crtc-0] flip_done timed out
[ 646.552344] endless kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:41:crtc-0] flip_done timed out
""
The full log is here:
https://gist.github.com/mschiu77/c8696e5fefb17bb1c53598214fb4e382

Only 4 times could I log in to X; the rest were a blank screen or hangs
without responding to magic sysrq. I took a picture of the only panic,
although I think it's not about amdgpu. It's here:
https://pasteboard.co/H5CUvxk.jpg

Hope they can be helpful.

Chris

>> However, I still get stuck on the following messages during boot very
>> often
>> ""
>> [4.998241] endless kernel: [drm] amdgpu kernel modesetting enabled.
>> [4.998288] endless kernel: checking generic (e000 7f) vs hw (e000 1000)
>> [4.998289] endless kernel: fb: switching to amdgpudrmfb from EFI VGA
>> ""
>> I turned on drm.debug=0xe while booting, but no more information at this
>> point.
>> Anything I can do at this point?
>>
>> And there's 1 more information may be helpful. Sometimes the
>> system boots OK with the blank screen, I can't switch to virtual
>> console, but it did respond to the magic sys-rq key. The dmesg with
>> drm.debug=0xe is here
>> https://gist.github.com/mschiu77/291e47b1f07dc52be9461c55c820464c.
>>
>> I'm pretty sure it's due to the amdgpu driver. Because when I boot
>> with my own kernel which disables the amdgpu driver, all these
>> symptoms went away. Please suggest anything I can do for this. Thanks
>>
>> Chris
>> ___
>> amd-gfx mailing list
>> amd-...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
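For triaging journal logs like the ones in this message, a small filter for the flip_done timeouts might look like the following. It is a rough sketch only: the regular expression is an assumption fitted to the two log lines quoted here, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Pattern fitted to the two quoted log lines; other kernels may format
# the message differently, so treat this as an assumption.
FLIP_TIMEOUT = re.compile(
    r"\[\s*(\d+\.\d+)\]\s.*\[CRTC:(\d+):([^\]]+)\] flip_done timed out"
)


def find_flip_timeouts(lines: Iterable[str]) -> List[Tuple[float, int, str]]:
    """Return (timestamp, crtc_id, crtc_name) for each flip_done timeout line."""
    hits = []
    for line in lines:
        match = FLIP_TIMEOUT.search(line)
        if match:
            hits.append((float(match.group(1)), int(match.group(2)), match.group(3)))
    return hits
```

Feeding it the lines of a saved journal (for example `open(path).read().splitlines()`) gives a quick list of when each timeout happened and on which CRTC.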
amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)
Hi,
We are working with new laptops that have the AMD Raven Ridge
chipset, with this `/proc/cpuinfo`:
https://gist.github.com/mschiu77/b06dba574e89b9a30cf4c450eaec49bc

With the latest kernel, 4.15, there are lots of different panics/oopses
during boot, so there is no chance to get into X. It also happens
during shutdown. I then tried building a kernel from
git://people.freedesktop.org/~agd5f/linux on branch
amd-staging-drm-next, with head on commit "drm: Fix trailing semicolon",
and updated linux-firmware. Things seem to get better, with only 1 oops
observed. Here's the oops:
https://gist.github.com/mschiu77/1a68f27272b24775b2040acdb474cdd3

However, I still very often get stuck on the following messages during
boot:
""
[4.998241] endless kernel: [drm] amdgpu kernel modesetting enabled.
[4.998288] endless kernel: checking generic (e000 7f) vs hw (e000 1000)
[4.998289] endless kernel: fb: switching to amdgpudrmfb from EFI VGA
""
I turned on drm.debug=0xe while booting, but there is no more
information at this point. Is there anything I can do at this point?

There's one more piece of information that may be helpful. Sometimes
the system boots OK with a blank screen; I can't switch to a virtual
console, but it does respond to the magic sysrq key. The dmesg with
drm.debug=0xe is here:
https://gist.github.com/mschiu77/291e47b1f07dc52be9461c55c820464c

I'm pretty sure it's due to the amdgpu driver, because when I boot my
own kernel with the amdgpu driver disabled, all these symptoms go
away. Please suggest anything I can do about this. Thanks

Chris
Kernel panic on nouveau during boot on NVIDIA NV118 (GM108)
We are working with a new desktop that has the NVIDIA NV118 chipset.

During boot, the display becomes unusable at the point where the nouveau
driver loads. We have reproduced this on 4.8, 4.11 and linux master
(4.12-rc3).

Dmesg log is attached.

Is this a known issue? Is there anything we can do to help?

Thanks

Bristol195i_dmesg.log
Description: Binary data
[PATCH] drm/nouveau: fix unknown chipset for GTX 1060
Nouveau driver shows unknown chipset (136000a1) for GTX 1060, so it
only gives VGA resolution on screen. Use the same chipset as nv134
then it shows FullHD.

This commit copies fields from nv134_chipset to nv136_chipset for
GTX 1060.

Signed-off-by: Chris Chiu
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/base.c | 29 +++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
index 7218a06..7c6eece 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
@@ -2209,6 +2209,34 @@ nv134_chipset = {
 	.fifo = gp100_fifo_new,
 };
 
+static const struct nvkm_device_chip
+nv136_chipset = {
+	.name = "GP104",
+	.bar = gf100_bar_new,
+	.bios = nvkm_bios_new,
+	.bus = gf100_bus_new,
+	.devinit = gm200_devinit_new,
+	.fb = gp104_fb_new,
+	.fuse = gm107_fuse_new,
+	.gpio = gk104_gpio_new,
+	.i2c = gm200_i2c_new,
+	.ibus = gm200_ibus_new,
+	.imem = nv50_instmem_new,
+	.ltc = gp100_ltc_new,
+	.mc = gp100_mc_new,
+	.mmu = gf100_mmu_new,
+	.pci = gp100_pci_new,
+	.timer = gk20a_timer_new,
+	.top = gk104_top_new,
+	.ce[0] = gp104_ce_new,
+	.ce[1] = gp104_ce_new,
+	.ce[2] = gp104_ce_new,
+	.ce[3] = gp104_ce_new,
+	.disp = gp104_disp_new,
+	.dma = gf119_dma_new,
+	.fifo = gp100_fifo_new,
+};
+
 static int
 nvkm_device_event_ctor(struct nvkm_object *object, void *data, u32 size,
 		       struct nvkm_notify *notify)
@@ -2644,6 +2672,7 @@ nvkm_device_ctor(const struct nvkm_device_func *func,
 		case 0x12b: device->chip = &nv12b_chipset; break;
 		case 0x130: device->chip = &nv130_chipset; break;
 		case 0x134: device->chip = &nv134_chipset; break;
+		case 0x136: device->chip = &nv136_chipset; break;
 		default:
 			nvdev_error(device, "unknown chipset (%08x)\n", boot0);
 			goto done;
-- 
2.1.4
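To see why the driver ends up looking for a 0x136 case with this card, the boot0 value from the commit message can be decoded by hand. The sketch below assumes the NV10-and-later layout, where the chipset id sits in bits 20 to 28 of boot0 and the revision in the low byte; as far as I can tell that is what nvkm_device_ctor() does, but the mask is not part of the patch. The function names and the KNOWN_CHIPSETS set (just the three case labels visible in the hunk) are for illustration only.

```python
# Case labels visible in the patch context; the real switch has many more.
KNOWN_CHIPSETS = {0x12B, 0x130, 0x134}


def decode_boot0(boot0):
    """Split boot0 into (chipset, revision), assuming the NV10+ layout."""
    chipset = (boot0 & 0x1FF00000) >> 20  # bits 20..28 hold the chipset id
    chiprev = boot0 & 0x000000FF          # low byte holds the revision
    return chipset, chiprev


def is_known(chipset, known=KNOWN_CHIPSETS):
    """Mimic the switch statement: True if there is a case for this chipset."""
    return chipset in known
```

With the value from the error message, `decode_boot0(0x136000A1)` yields chipset 0x136, which has no case label before the patch; adding 0x136 to the set is the Python analogue of the new `case 0x136:` line.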