Re: [Nouveau] [PATCH v2] ALSA: hda: Continue to probe when codec probe fails
On Mon, Apr 12, 2021 at 4:01 PM Aaron Plattner wrote: > > On 4/12/21 12:36 PM, Roy Spliet wrote: > > Hello Aaron, > > > > Thanks for your insights. A follow-up query and some observations > > in-line. > > > > Op 12-04-2021 om 20:06 schreef Aaron Plattner: > >> On 4/10/21 1:48 PM, Roy Spliet wrote: > >>> Op 10-04-2021 om 20:23 schreef Lukas Wunner: > On Sat, Apr 10, 2021 at 04:51:27PM +0100, Roy Spliet wrote: > > Can I ask someone with more > > technical knowledge of snd_hda_intel and vgaswitcheroo to > > brainstorm about > > the possible challenges of nouveau taking matters into its own > > hand rather > > than keeping this PCI quirk around? > > It sounds to me like the HDA is not powered if no cable is plugged in. > What is reponsible then for powering it up or down, firmware code on > the GPU or in the host's BIOS? > >>> > >>> Sometimes the BIOS, but definitely unconditionally the PCI quirk > >>> code: > >>> https://github.com/torvalds/linux/blob/master/drivers/pci/quirks.c#L5289 > >>> > >>> > >>> (CC Aaron Plattner) > >> > >> My basic understanding is that the audio function stops responding > >> whenever the graphics function is powered off. So the requirement > >> here is that the audio driver can't try to talk to the audio function > >> while the graphics function is asleep, and must trigger a graphics > >> function wakeup before trying to communicate with the audio function. > > > > I believe that vgaswitcheroo takes care of this for us. > > > >> I think there are also requirements about the audio function needing > >> to be awake when the graphics driver is updating the ELD, but I'm not > >> sure. > >> > >> This is harder on Windows because the audio driver lives in its own > >> little world doing its own thing but on Linux we can do better. > >> > Ideally, we should try to find out how to control HDA power from the > operating system rather than trying to cooperate with whatever > firmware > is doing. 
If we have that capability, the OS should power the HDA up > and down as it sees fit. > >> > >> After system boot, I don't think there's any firmware involved, but > >> I'm not super familiar with the low-level details and it's possible > >> the situation changed since I last looked at it. > >> > >> I think the problem with having nouveau write this quirk is that the > >> kernel will need to re-probe the PCI device to notice that it has > >> suddenly become a multi-function device with an audio function, and > >> hotplug the audio driver. I originally looked into trying to do that > >> but it was tricky because the PCI subsystem didn't really have a > >> mechanism for a single-function device to become a multi-function > >> device on the fly and it seemed easier to enable it early on during > >> bus enumeration. That way the kernel sees both functions all the time > >> without anything else having to be special about this configuration. > > > > Right, so for a little more context: a while ago I noticed that my > > laptop (lucky me, Asus K501UB) has a 940M with HDA but no codec. Seems > > legit, given how this GPU has no displays attached; they're all hooked > > up to the Intel integrated GPU. That threw off the snd_hda_intel > > mid-probe, and as a result didn't permit runpm, keeping the entire > > GPU, PCIe bus and thus the CPU package awake. A bit of hackerly later > > we decided to continue probing without a codec, and now my laptop is > > happy, but... > > What is the PCI class of the GPU in your system? If it has no display > outputs it's probably 0x302 ("3D Controller") rather than 0x300 ("VGA > Controller"). Looking at the code it looks like this workaround is being > applied to both but maybe it should be restricted to just VGA controllers. That was a comment I had back when the quirk was being implemented, but helpfully there are some of these devices running around which say "3D Controller" but still have displays attached to them. 
Lukas probably remembers more specifics. -ilia
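
For anyone following along who has not decoded these class values before, here is a minimal sketch of the 0x300 vs 0x302 distinction discussed above. It only illustrates how the class register splits into fields; the helper names are made up for this example and it is not the code in drivers/pci/quirks.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Class codes from the PCI class-code list; the macro names are local to this sketch. */
#define BASE_CLASS_DISPLAY 0x03u
#define CLASS_DISPLAY_VGA  0x0300u /* "VGA compatible controller" */
#define CLASS_DISPLAY_3D   0x0302u /* "3D controller" */

/* Both classes share the display base class, which is why a base-class match catches both. */
static_assert((CLASS_DISPLAY_VGA >> 8) == BASE_CLASS_DISPLAY, "VGA is a display class");
static_assert((CLASS_DISPLAY_3D >> 8) == BASE_CLASS_DISPLAY, "3D is a display class");

/*
 * class_rev is the 32-bit config-space register at offset 0x08:
 * base class (31:24), subclass (23:16), prog-if (15:8), revision (7:0).
 */
static uint16_t class_of(uint32_t class_rev)
{
    return (uint16_t)(class_rev >> 16);
}

static bool is_display(uint32_t class_rev)
{
    return (class_rev >> 24) == BASE_CLASS_DISPLAY;
}

static bool is_vga_controller(uint32_t class_rev)
{
    return class_of(class_rev) == CLASS_DISPLAY_VGA;
}

static bool is_3d_controller(uint32_t class_rev)
{
    return class_of(class_rev) == CLASS_DISPLAY_3D;
}
```

Restricting a workaround to is_vga_controller() would skip every device that reports 0x0302, which is exactly the case the reply warns about: some "3D Controller" parts still have displays attached.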
Re: [Nouveau] [PATCH] drm/nouveau/kms/nv50-: Check plane size for cursors, not fb size
On Thu, Mar 18, 2021 at 5:56 PM Lyude Paul wrote:
>
> Found this while trying to make some changes to the kms_cursor_crc test.
> curs507a_acquire checks that the width and height of the cursor framebuffer
> are equal (asyw->image.{w,h}). This is actually wrong though, as we only
> want to be concerned that the actual width/height of the plane are the
> same. It's fine if we scan out from an fb that's slightly larger than the
> cursor plane (in fact, some igt tests actually do this).

How so? The scanout engine expects the data to be packed. Height can be
larger, but width has to match.

-ilia
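
To make the "packed" point concrete, here is a rough sketch of the constraint as stated in the reply. The function names are hypothetical and this is not what curs507a_acquire does; it only shows why extra rows are harmless while extra columns are not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pixel index of (x, y) in a tightly packed image that is `width` pixels wide. */
static size_t packed_index(uint32_t x, uint32_t y, uint32_t width)
{
    assert(x < width);
    return (size_t)y * width + x;
}

/*
 * Constraint paraphrased from the reply: the engine reads rows back to back
 * using the cursor width, so the fb width must equal the plane width.
 * Additional rows past the plane height are simply never read.
 */
static bool cursor_fb_ok(uint32_t plane_w, uint32_t plane_h,
                         uint32_t fb_w, uint32_t fb_h)
{
    return fb_w == plane_w && fb_h >= plane_h;
}
```

With a 64-wide plane on a 128-wide fb, the engine would look for row 1 at pixel index 64, while the fb actually stores it at index 128, so every row after the first would be read from the wrong place.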
Re: [PATCH 2/3] drm/nouveau/kms/nv50-: Report max cursor size to userspace
On Tue, Feb 23, 2021 at 11:23 AM Alex Riesen wrote:
>
> Alex Riesen, Tue, Feb 23, 2021 16:51:26 +0100:
> > Ilia Mirkin, Tue, Feb 23, 2021 16:46:52 +0100:
> > > I'd recommend using xf86-video-nouveau in any case, but some distros
> >
> > I would like try this out. Do you know how to force the xorg server to
> > choose this driver instead of modesetting?
>
> Found that myself (a Device section with Driver set to "nouveau"):
>
> $ xrandr --listproviders
> Providers: number : 1
> Provider 0: id: 0x68 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 4 outputs: 5 associated providers: 0 name:nouveau
>
> And yes, the cursor looks good in v5.11 even without reverting the commit.

FWIW it's not immediately apparent to me what grave error modesetting is
committing in setting the cursor. The logic looks perfectly reasonable.
It's not trying to be fancy with rendering the cursor/etc. The one thing
is that it's using drmModeSetCursor2 which sets the hotspot at the same
time. But internally inside nouveau I think it should work out to the
same thing. Perhaps setting the hotspot, or something in that path,
doesn't quite work for 256x256? [Again, no clue what that might be.]

It might also be worthwhile just testing if the 256x256 cursor works
quite the way one would want. If you're interested, grab libdrm, there's
a test called 'modetest', which has an option to enable a moving cursor
(-c iirc). It's hard-coded to 64x64, so you'll have to modify it there
too (and probably change the pattern from plain gray to any one of the
other ones).

Cheers,

-ilia
Re: [PATCH 2/3] drm/nouveau/kms/nv50-: Report max cursor size to userspace
On Tue, Feb 23, 2021 at 10:36 AM Alex Riesen wrote: > > Ilia Mirkin, Tue, Feb 23, 2021 15:56:21 +0100: > > On Tue, Feb 23, 2021 at 9:26 AM Alex Riesen > > wrote: > > > Lyude Paul, Tue, Jan 19, 2021 02:54:13 +0100: > > > > diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c > > > > b/drivers/gpu/drm/nouveau/dispnv50/disp.c > > > > index c6367035970e..5f4f09a601d4 100644 > > > > --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c > > > > +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c > > > > @@ -2663,6 +2663,14 @@ nv50_display_create(struct drm_device *dev) > > > > else > > > > nouveau_display(dev)->format_modifiers = > > > > disp50xx_modifiers; > > > > > > > > + if (disp->disp->object.oclass >= GK104_DISP) { > > > > + dev->mode_config.cursor_width = 256; > > > > + dev->mode_config.cursor_height = 256; > > > > + } else { > > > > + dev->mode_config.cursor_width = 64; > > > > + dev->mode_config.cursor_height = 64; > > > > + } > > > > + > > > > /* create crtc objects to represent the hw heads */ > > > > if (disp->disp->object.oclass >= GV100_DISP) > > > > crtcs = nvif_rd32(>object, 0x610060) & 0xff; > > > > > > This change broke X cursor in my setup, and reverting the commit restores > > > it. > > > > > > Dell Precision M4800, issue ~2014 with GK106GLM [Quadro K2100M] (rev a1). > > > libdrm 2.4.91-1 (Debian 10.8 stable). > > > There are no errors or warnings in Xorg logs nor in the kernel log. > > > > Could you confirm which ddx is driving the nvidia hw? You can find > > this out by running "xrandr --listproviders", or also in the xorg log. > > xrandr(1) does not seem to list much: > > $ xrandr --listproviders > Providers: number : 1 > Provider 0: id: 0x48 cap: 0xf, Source Output, Sink Output, Source Offload, > Sink Offload crtcs: 4 outputs: 5 associated providers: 0 name:modesetting Thanks - this is what I was looking for. name:modesetting, i.e. the modesetting ddx driver. I checked nouveau source, and it seems like it uses a 64x64 cursor no matter what. 
Not sure what the modesetting ddx does. I'd recommend using xf86-video-nouveau in any case, but some distros have decided to explicitly force modesetting in preference to nouveau. Oh well. (And regardless, the regression should be addressed somehow, but it's also good to understand what the problem is.)

Can you confirm what the problem with the cursor is?

-ilia
Re: [PATCH 2/3] drm/nouveau/kms/nv50-: Report max cursor size to userspace
On Tue, Feb 23, 2021 at 9:26 AM Alex Riesen wrote: > > Lyude Paul, Tue, Jan 19, 2021 02:54:13 +0100: > > diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c > > b/drivers/gpu/drm/nouveau/dispnv50/disp.c > > index c6367035970e..5f4f09a601d4 100644 > > --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c > > +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c > > @@ -2663,6 +2663,14 @@ nv50_display_create(struct drm_device *dev) > > else > > nouveau_display(dev)->format_modifiers = disp50xx_modifiers; > > > > + if (disp->disp->object.oclass >= GK104_DISP) { > > + dev->mode_config.cursor_width = 256; > > + dev->mode_config.cursor_height = 256; > > + } else { > > + dev->mode_config.cursor_width = 64; > > + dev->mode_config.cursor_height = 64; > > + } > > + > > /* create crtc objects to represent the hw heads */ > > if (disp->disp->object.oclass >= GV100_DISP) > > crtcs = nvif_rd32(>object, 0x610060) & 0xff; > > This change broke X cursor in my setup, and reverting the commit restores it. > > Dell Precision M4800, issue ~2014 with GK106GLM [Quadro K2100M] (rev a1). > libdrm 2.4.91-1 (Debian 10.8 stable). > There are no errors or warnings in Xorg logs nor in the kernel log. Hi Alex, Could you confirm which ddx is driving the nvidia hw? You can find this out by running "xrandr --listproviders", or also in the xorg log. Thanks, -ilia
Re: [Nouveau] [RFC v3 05/10] drm/i915/dpcd_bl: Cleanup intel_dp_aux_vesa_enable_backlight() a bit
On Fri, Feb 5, 2021 at 6:45 PM Lyude Paul wrote: > > Get rid of the extraneous switch case in here, and just open code > edp_backlight_mode as we only ever use it once. > > Signed-off-by: Lyude Paul > --- > .../gpu/drm/i915/display/intel_dp_aux_backlight.c | 15 ++- > 1 file changed, 2 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c > b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c > index c37ccc8538cb..95e3e344cf40 100644 > --- a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c > +++ b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c > @@ -382,7 +382,7 @@ intel_dp_aux_vesa_enable_backlight(const struct > intel_crtc_state *crtc_state, > struct intel_dp *intel_dp = intel_attached_dp(connector); > struct drm_i915_private *i915 = dp_to_i915(intel_dp); > struct intel_panel *panel = >panel; > - u8 dpcd_buf, new_dpcd_buf, edp_backlight_mode; > + u8 dpcd_buf, new_dpcd_buf; > u8 pwmgen_bit_count = panel->backlight.edp.vesa.pwmgen_bit_count; > > if (drm_dp_dpcd_readb(_dp->aux, > @@ -393,12 +393,8 @@ intel_dp_aux_vesa_enable_backlight(const struct > intel_crtc_state *crtc_state, > } > > new_dpcd_buf = dpcd_buf; > - edp_backlight_mode = dpcd_buf & DP_EDP_BACKLIGHT_CONTROL_MODE_MASK; > > - switch (edp_backlight_mode) { > - case DP_EDP_BACKLIGHT_CONTROL_MODE_PWM: > - case DP_EDP_BACKLIGHT_CONTROL_MODE_PRESET: > - case DP_EDP_BACKLIGHT_CONTROL_MODE_PRODUCT: > + if ((dpcd_buf & DP_EDP_BACKLIGHT_CONTROL_MODE_MASK) != > DP_EDP_BACKLIGHT_CONTROL_MODE_MASK) { You probably meant != MODE_DPCD? 
> new_dpcd_buf &= ~DP_EDP_BACKLIGHT_CONTROL_MODE_MASK; > new_dpcd_buf |= DP_EDP_BACKLIGHT_CONTROL_MODE_DPCD; > > @@ -406,13 +402,6 @@ intel_dp_aux_vesa_enable_backlight(const struct > intel_crtc_state *crtc_state, >pwmgen_bit_count) != 1) > drm_dbg_kms(>drm, > "Failed to write aux pwmgen bit count\n"); > - > - break; > - > - /* Do nothing when it is already DPCD mode */ > - case DP_EDP_BACKLIGHT_CONTROL_MODE_DPCD: > - default: > - break; > } > > if (panel->backlight.edp.vesa.pwm_freq_pre_divider) { > -- > 2.29.2 > > ___ > Nouveau mailing list > nouv...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau
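
To spell out the "!= MODE_DPCD" comment in the review above, here is a small sketch comparing the removed switch, the condition as posted, and the suggested condition. The numeric values are how I recall them from include/drm/drm_dp_helper.h (an assumption worth double-checking against the tree), and the short macro and function names are local to this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values of the DP_EDP_BACKLIGHT_CONTROL_MODE_* definitions. */
#define MODE_MASK    (3u << 0)
#define MODE_PWM     (0u << 0)
#define MODE_PRESET  (1u << 0)
#define MODE_DPCD    (2u << 0)
#define MODE_PRODUCT (3u << 0)

/* The crux: comparing against the mask is really comparing against PRODUCT. */
static_assert(MODE_PRODUCT == MODE_MASK, "PRODUCT mode has the same bits as the mask");

/* Behaviour of the switch that the patch removes. */
static bool needs_switch_original(uint8_t dpcd_buf)
{
    switch (dpcd_buf & MODE_MASK) {
    case MODE_PWM:
    case MODE_PRESET:
    case MODE_PRODUCT:
        return true;  /* rewrite the mode field to DPCD */
    case MODE_DPCD:
    default:
        return false; /* already in DPCD mode, nothing to do */
    }
}

/* Condition as written in the posted patch. */
static bool needs_switch_as_posted(uint8_t dpcd_buf)
{
    return (dpcd_buf & MODE_MASK) != MODE_MASK;
}

/* Condition suggested in the review. */
static bool needs_switch_suggested(uint8_t dpcd_buf)
{
    return (dpcd_buf & MODE_MASK) != MODE_DPCD;
}
```

Under those assumed values, the posted condition skips panels that start in PRODUCT mode and needlessly rewrites panels already in DPCD mode, whereas the suggested condition matches the old switch for all four modes.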
Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
On Tue, Dec 29, 2020 at 10:52 AM Marc MERLIN wrote:
>
> On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this every 2 seconds, mostly forever.
> >
> > The gpu suspends with runtime pm. And then gets woken up for some
> > reason (could be something quite silly, like lspci, or could be
> > something explicitly checking connectors, etc). Repeat.
>
> Ah, fair point. Could it be powertop even?
> How would I go towards tracing that?
> Sounds like this would be a problem with all chips if userspace is able
> to wake them up every second or two with a probe. Now I wonder what
> broken userspace I have that could be doing this.

Well, it's a theory. Some userspace helpfully prevents the GPU from
suspending entirely by messing with the attached audio device;
unfortunately I don't remember its name. It's very common and meant to
help... oh well.

> > Display offload usually requires acceleration -- the copies are done
> > using the DMA engine. Please make sure that you have firmware
> > available (and a new enough mesa). The errors suggest that you don't
> > have firmware available at the time that nouveau loads. Depending on
> > your setup, that might mean the firmware has to be built into the
> > kernel, or available in initramfs. (Or just regular filesystem if you
> > don't use a complicated boot sequence. But many people go with distro
> > defaults, which do have this complexity.)
>
> Hi Ilia, thanks for your answer.
>
> Do you think that could be a reason why the boot would hang for 2 full
> minutes at every boot ever since I upgraded to 5.5?

I'd have to check, but I'm guessing TU104 acceleration became a thing in
5.5. I would also not be very surprised if the code didn't handle
failure extremely gracefully - there definitely have been problems with
that in the past.
> > Also, without wanting to sound like a full newbie, where is that > firmware you're talking about? In my kernel source? > > Here's what I do have: > sauron:/usr/local/bin# dpkggrep nouveau > libdrm-nouveau2:amd64 install > xserver-xorg-video-nouveau install > > no nouveau-firmware package in debian: > sauron:/usr/local/bin# apt-cache search nouveau > bumblebee - NVIDIA Optimus support for Linux > libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services > -- runtime > xfonts-jmk - Jim Knoble's character-cell fonts for X > xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver > > No firmware file on my disk: > sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ > /lib/firmware/ |grep nouveau > /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau > /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko > sauron:/usr/local/bin# > > The kernel module is in my initrd: > sauron:/usr/local/bin# dd > if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528 skip=1 | > gunzip | cpio -tdv | grep nouveau > drwxr-xr-x 1 root root0 Nov 30 15:40 > usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau > -rw-r--r-- 1 root root 3691385 Nov 30 15:35 > usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko > 17+1 records in > 17+1 records out > 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s I think that gets you out of "full newbie" land... > > What am I supposed to do/check next? > > Note that ultimately I only need nouveau not to hang my boot 2mn and do > PM so that the nvidia chip goes to sleep since I don't use it. 
I'm not extremely familiar with Debian packaging, but the firmware is
provided by NVIDIA and shipped as part of linux-firmware:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia

This needs to be available at /lib/firmware/nvidia when nouveau loads.
Based on your email above, it's most likely that it would load from the
initrd - so make sure it's in there.

Of course now that I read your email a bit more carefully, it seems your
issue is with the "saving config space" messages. I'm not sure I've seen
those before. Perhaps you have some sort of debug enabled. I'd find
where in the kernel they are being produced, and what the conditions for
it are. But the failure to load firmware isn't great -- not 100% sure if
it impacts runpm or not.

I just double-checked, TU10x accel came in via
afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
Initial TU10x support came in v5.0. So that doesn't line up with your
timeline. Anyways, I'd definitely sort the firmware situation out, but
it may not be the cause of your problem.

Cheers,

-ilia
Re: [PATCH 5.10 064/717] drm/edid: Fix uninitialized variable in drm_cvt_modes()
Hi Greg, Linus had to apply a fixup for this patch. Please ensure that it's in your patch list: commit d652d5f1eeeb06046009f4fcb9b4542249526916 Author: Linus Torvalds Date: Thu Dec 17 09:27:57 2020 -0800 drm/edid: fix objtool warning in drm_cvt_modes() It does not appear to have a Fixes tag, so may not have been picked up by your automated tooling. Cheers, -ilia On Mon, Dec 28, 2020 at 9:01 AM Greg Kroah-Hartman wrote: > > From: Lyude Paul > > [ Upstream commit 991fcb77f490390bcad89fa67d95763c58cdc04c ] > > Noticed this when trying to compile with -Wall on a kernel fork. We > potentially don't set width here, which causes the compiler to complain > about width potentially being uninitialized in drm_cvt_modes(). So, let's > fix that. > > Changes since v1: > * Don't emit an error as this code isn't reachable, just mark it as such > Changes since v2: > * Remove now unused variable > > Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage") > Signed-off-by: Lyude Paul > Reviewed-by: Ilia Mirkin > Link: > https://patchwork.freedesktop.org/patch/msgid/20201105235703.1328115-1-ly...@redhat.com > Signed-off-by: Sasha Levin > --- > drivers/gpu/drm/drm_edid.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c > index 631125b46e04c..b84efd538a702 100644 > --- a/drivers/gpu/drm/drm_edid.c > +++ b/drivers/gpu/drm/drm_edid.c > @@ -3114,6 +3114,8 @@ static int drm_cvt_modes(struct drm_connector > *connector, > case 0x0c: > width = height * 15 / 9; > break; > + default: > + unreachable(); > } > > for (j = 1; j < 5; j++) { > -- > 2.27.0 > > >
Re: [Nouveau] 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
On Sun, Dec 27, 2020 at 12:03 PM Marc MERLIN wrote:
>
> This started with 5.5 and hasn't gotten better since then, despite some
> reports I tried to send.
>
> As per my previous message:
> I have a Thinkpad P70 with hybrid graphics.
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
> that one works fine, I can use i915 for the main screen, and nouveau to
> display on the external ports (external ports are only wired to nvidia
> chip, so it's impossible to use them without turning the nvidia chip
> on).
>
> I now got a newer P73 also with the same hybrid graphics (setup as such
> in the bios). It runs fine with i915, and I don't need to use external
> display with nouveau for now (it almost works, but I only see the mouse
> cursor on the external screen, no window or anything else can get
> displayed, very weird).
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)

Display offload usually requires acceleration -- the copies are done
using the DMA engine. Please make sure that you have firmware available
(and a new enough mesa). The errors suggest that you don't have firmware
available at the time that nouveau loads. Depending on your setup, that
might mean the firmware has to be built into the kernel, or available in
initramfs. (Or just regular filesystem if you don't use a complicated
boot sequence. But many people go with distro defaults, which do have
this complexity.)

> after boot, when it gets the right trigger (not sure which ones), it
> loops on this every 2 seconds, mostly forever.

The gpu suspends with runtime pm. And then gets woken up for some reason
(could be something quite silly, like lspci, or could be something
explicitly checking connectors, etc). Repeat.

Cheers,

-ilia
Re: [PATCH v8 4/4] NOTFORMERGE: drm/logicvc: Add plane colorkey support
FWIW this is something I added, hoping it was going to get used at some
point, but I never followed up with support in xf86-video-nouveau for
Xv. At this point, I'm not sure I ever will. I encoded the "enabled"
part into the value with a high bit (1<<24) -- not sure that was such a
great idea.

All NVIDIA hardware supports colorkey (and not actual alpha, until the
very latest GPUs - Volta/Turing families), although my implementation
only covers the pre-G80 series (i.e. DX9 and earlier GPUs - pre-2008).
Should a determination of usefulness be reached, it would be easy to
implement on the remainder though.

Cheers,

-ilia

On Wed, Dec 23, 2020 at 5:20 PM Simon Ser wrote:
>
> nouveau already has something for colorkey:
> https://drmdb.emersion.fr/properties/4008636142/colorkey
>
> I know this is marked "not for merge", but it would be nice to discuss
> with them and come up with a standardized property.
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
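
As an illustration of the "enabled bit folded into the value" encoding described in the reply above, here is a minimal sketch. Only the (1<<24) enable bit comes from the message; treating the low 24 bits as the key colour, and all of the names, are assumptions made for the example rather than the actual nouveau property code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define COLORKEY_ENABLE (1u << 24)   /* enable bit, as described in the message */
#define COLORKEY_MASK   0x00ffffffu  /* assumption: low 24 bits carry the key colour */

/* Pack an enable flag and a 24-bit key into a single property value. */
static uint32_t colorkey_pack(bool enabled, uint32_t key)
{
    assert((key & ~COLORKEY_MASK) == 0); /* key must fit in 24 bits */
    return (enabled ? COLORKEY_ENABLE : 0u) | key;
}

static bool colorkey_enabled(uint32_t value)
{
    return (value & COLORKEY_ENABLE) != 0;
}

static uint32_t colorkey_key(uint32_t value)
{
    return value & COLORKEY_MASK;
}
```

The awkward part of this kind of encoding is that a generic client reading the property cannot tell the flag from the colour without knowing the layout, which may be one reason a standardized property would want them separate.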
Re: [Nouveau] [PATCH v2] ALSA: hda: Continue to probe when codec probe fails
On Mon, Dec 21, 2020 at 11:33 AM Kai-Heng Feng wrote: > > [+Cc nouveau] > > On Fri, Dec 18, 2020 at 4:06 PM Takashi Iwai wrote: > [snip] > > > Quite possibly the system doesn't power up HDA controller when there's > > > no external monitor. > > > So when it's connected to external monitor, it's still needed for HDMI > > > audio. > > > Let me ask the user to confirm this. > > > > Yeah, it's the basic question whether the HD-audio is supposed to work > > on this machine at all. If yes, the current approach we take makes > > less sense - instead we should rather make the HD-audio controller > > working. > > Yea, confirmed that the Nvidia HDA works when HDMI is connected prior boot. > > > > > - The second problem is that pci_enable_device() ignores the error > > > > returned from pci_set_power_state() if it's -EIO. And the > > > > inaccessible access error returns -EIO, although it's rather a fatal > > > > problem. So the driver believes as the PCI device gets enabled > > > > properly. > > > > > > This was introduced in 2005, by Alan's 11f3859b1e85 ("[PATCH] PCI: Fix > > > regression in pci_enable_device_bars") to fix UHCI controller. > > > > > > > > > > > - The third problem is that HD-audio driver blindly believes the > > > > codec_mask read from the register even if it's a read failure as I > > > > already showed. > > > > > > This approach has least regression risk. > > > > Yes, but it assumes that HD-audio is really non-existent. > > I really don't know any good approach to address this. > On Windows, HDA PCI is "hidden" until HDMI cable is plugged, then the > driver will flag the magic bit to make HDA audio appear on the PCI > bus. > IIRC the current approach is to make nouveau and device link work. 
I don't have the full context of this discussion, but the kernel
force-enables the HDA subfunction nowadays, irrespective of nouveau or
nvidia or whatever:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/quirks.c?h=v5.10#n5267

Cheers,

-ilia
Re: [Nouveau] Nouveau video --- [ cut here ] ----- crash dump 5.10.0-rc6
Unfortunately this isn't a crash, but rather a warning that things are timing out. By the time you get this, the display is most likely hung. Was there anything before this, e.g. an error state dump perhaps? What GPU are you using, what displays, and how are they connected? What kind of userspace is running here? X or a Wayland compositor (or something else entirely)? On Thu, Dec 3, 2020 at 12:13 AM Dave Airlie wrote: > > cc'ing Ben + nouveau > > On Thu, 3 Dec 2020 at 14:59, bob wrote: > > > > Hello. I have a crash dump for: > > > > $ uname -a > > Linux freedom 5.10.0-rc6 #1 SMP Sun Nov 29 17:26:13 MST 2020 x86_64 > > x86_64 x86_64 GNU/Linux > > > > Occasionally when this dumps it likes to lock up the computer, but I > > caught it this time. > > > > Also video likes to flicker a lot. Nouveau has been iffy since kernel > > 5.8.0. > > > > This isn't the only dump, it dumped probably 50 times. If you are > > really desperate for all of it, > > > > reply to me directly as I'm not on the mailing list. Here is one of them. 
> > > > [39019.426580] [ cut here ] > > [39019.426589] WARNING: CPU: 6 PID: 14136 at > > drivers/gpu/drm/nouveau/dispnv50/disp.c:211 nv50_dmac_wait+0x1e1/0x230 > > [39019.426590] Modules linked in: mt2131 s5h1409 fuse tda8290 tuner > > cx25840 rt2800usb rt2x00usb rt2800lib snd_hda_codec_analog > > snd_hda_codec_generic ledtrig_audio rt2x00lib binfmt_misc > > intel_powerclamp coretemp cx23885 mac80211 tda18271 altera_stapl > > videobuf2_dvb m88ds3103 tveeprom cx2341x dvb_core rc_core i2c_mux > > snd_hda_codec_hdmi videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 > > snd_hda_intel videobuf2_common snd_intel_dspcfg kvm_intel snd_hda_codec > > videodev snd_hda_core kvm mc snd_hwdep snd_pcm_oss snd_mixer_oss > > irqbypass snd_pcm cfg80211 snd_seq_dummy snd_seq_midi snd_seq_oss > > snd_seq_midi_event snd_rawmidi snd_seq intel_cstate snd_seq_device > > serio_raw snd_timer input_leds nfsd libarc4 snd asus_atk0110 i7core_edac > > soundcore i5500_temp auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc > > parport_pc ppdev lp parport ip_tables x_tables btrfs blake2b_generic > > libcrc32c xor zstd_compress raid6_pq dm_mirror dm_region_hash dm_log > > pata_acpi pata_marvell hid_generic usbhid hid psmouse firewire_ohci > > [39019.426650] firewire_core crc_itu_t i2c_i801 ahci sky2 libahci > > i2c_smbus lpc_ich > > [39019.426658] CPU: 6 PID: 14136 Comm: kworker/u16:0 Tainted: GW > > I 5.10.0-rc6 #1 > > [39019.426659] Hardware name: System manufacturer System Product > > Name/P6T DELUXE, BIOS 220909/21/2010 > > [39019.426662] Workqueue: events_unbound nv50_disp_atomic_commit_work > > [39019.426665] RIP: 0010:nv50_dmac_wait+0x1e1/0x230 > > [39019.426667] Code: 8d 48 04 48 89 4a 68 c7 00 00 00 00 20 49 8b 46 38 > > 41 c7 86 20 01 00 00 00 00 00 00 49 89 46 68 e8 e4 fc ff ff e9 76 fe ff > > ff <0f> 0b b8 92 ff ff ff e9 ed fe ff ff 49 8b be 80 00 00 00 e8 c7 fc > > [39019.426668] RSP: 0018:b79d028ebd48 EFLAGS: 00010282 > > [39019.426670] RAX: ff92 RBX: 000d RCX: > > > > [39019.426671] 
RDX: ff92 RSI: b79d028ebc88 RDI: > > b79d028ebd28 > > [39019.426671] RBP: b79d028ebd48 R08: R09: > > b79d028ebc58 > > [39019.426672] R10: 0030 R11: 11c4 R12: > > fffb > > [39019.426673] R13: a05fc1ebd368 R14: a05fc1ebd3a8 R15: > > a05fc2425000 > > [39019.426675] FS: () GS:a061f3d8() > > knlGS: > > [39019.426676] CS: 0010 DS: ES: CR0: 80050033 > > [39019.426677] CR2: 7fb2d58e CR3: 00026280a000 CR4: > > 06e0 > > [39019.426678] Call Trace: > > [39019.426685] base827c_image_set+0x2f/0x1d0 > > [39019.426687] nv50_wndw_flush_set+0x89/0x1c0 > > [39019.426688] nv50_disp_atomic_commit_tail+0x4e7/0x7e0 > > [39019.426693] process_one_work+0x1d4/0x370 > > [39019.426695] worker_thread+0x4a/0x3b0 > > [39019.426697] ? process_one_work+0x370/0x370 > > [39019.426699] kthread+0xfe/0x140 > > [39019.426701] ? kthread_park+0x90/0x90 > > [39019.426704] ret_from_fork+0x22/0x30 > > [39019.426706] ---[ end trace d512d675211c738c ]--- > > [39021.426751] [ cut here ] > > > > > > Thanks in advance, > > > > Bob > > > ___ > Nouveau mailing list > nouv...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [PATCH v3] drm/edid: Fix uninitialized variable in drm_cvt_modes()
On Thu, Nov 5, 2020 at 6:57 PM Lyude Paul wrote: > > Noticed this when trying to compile with -Wall on a kernel fork. We > potentially > don't set width here, which causes the compiler to complain about width > potentially being uninitialized in drm_cvt_modes(). So, let's fix that. > > Changes since v1: > * Don't emit an error as this code isn't reachable, just mark it as such > Changes since v2: > * Remove now unused variable > > Signed-off-by: Lyude Paul > > Cc: # v5.9+ > Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage") > Signed-off-by: Lyude Paul For the very little it's worth, Reviewed-by: Ilia Mirkin > --- > drivers/gpu/drm/drm_edid.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c > index 631125b46e04..b84efd538a70 100644 > --- a/drivers/gpu/drm/drm_edid.c > +++ b/drivers/gpu/drm/drm_edid.c > @@ -3114,6 +3114,8 @@ static int drm_cvt_modes(struct drm_connector > *connector, > case 0x0c: > width = height * 15 / 9; > break; > + default: > + unreachable(); > } > > for (j = 1; j < 5; j++) { > -- > 2.28.0 >
Re: [PATCH v2] drm/edid: Fix uninitialized variable in drm_cvt_modes()
On Tue, Nov 3, 2020 at 5:15 PM Lyude Paul wrote: > > Noticed this when trying to compile with -Wall on a kernel fork. We > potentially > don't set width here, which causes the compiler to complain about width > potentially being uninitialized in drm_cvt_modes(). So, let's fix that. > > Changes since v1: > * Don't emit an error as this code isn't reachable, just mark it as such > > Signed-off-by: Lyude Paul > > Cc: # v5.9+ > Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage") > Signed-off-by: Lyude Paul > --- > drivers/gpu/drm/drm_edid.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c > index 631125b46e04..0643b98c6383 100644 > --- a/drivers/gpu/drm/drm_edid.c > +++ b/drivers/gpu/drm/drm_edid.c > @@ -3094,6 +3094,7 @@ static int drm_cvt_modes(struct drm_connector > *connector, > > for (i = 0; i < 4; i++) { > int width, height; > + u8 cvt_aspect_ratio; > > cvt = &(timing->data.other_data.data.cvt[i]); > > @@ -3101,7 +3102,8 @@ static int drm_cvt_modes(struct drm_connector > *connector, > continue; > > height = (cvt->code[0] + ((cvt->code[1] & 0xf0) << 4) + 1) * > 2; > - switch (cvt->code[1] & 0x0c) { > + cvt_aspect_ratio = cvt->code[1] & 0x0c; The temp var doesn't do anything now right? Previously you were using it in the print, but now you can drop these two hunks, I think? -ilia > + switch (cvt_aspect_ratio) { > case 0x00: > width = height * 4 / 3; > break; > @@ -3114,6 +3116,8 @@ static int drm_cvt_modes(struct drm_connector > *connector, > case 0x0c: > width = height * 15 / 9; > break; > + default: > + unreachable(); > } > > for (j = 1; j < 5; j++) { > -- > 2.28.0 >
Re: [PATCH] drm/edid: Fix uninitialized variable in drm_cvt_modes()
On Tue, Nov 3, 2020 at 2:47 PM Lyude Paul wrote: > > Sorry! Thought I had responded to this but apparently not, comments down below > > On Thu, 2020-10-22 at 14:04 -0400, Ilia Mirkin wrote: > > On Thu, Oct 22, 2020 at 12:55 PM Lyude Paul wrote: > > > > > > Noticed this when trying to compile with -Wall on a kernel fork. We > > > potentially > > > don't set width here, which causes the compiler to complain about width > > > potentially being uninitialized in drm_cvt_modes(). So, let's fix that. > > > > > > Signed-off-by: Lyude Paul > > > > > > Cc: # v5.9+ > > > Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage") > > > Signed-off-by: Lyude Paul > > > --- > > > drivers/gpu/drm/drm_edid.c | 8 +++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c > > > index 631125b46e04..2da158ffed8e 100644 > > > --- a/drivers/gpu/drm/drm_edid.c > > > +++ b/drivers/gpu/drm/drm_edid.c > > > @@ -3094,6 +3094,7 @@ static int drm_cvt_modes(struct drm_connector > > > *connector, > > > > > > for (i = 0; i < 4; i++) { > > > int width, height; > > > + u8 cvt_aspect_ratio; > > > > > > cvt = &(timing->data.other_data.data.cvt[i]); > > > > > > @@ -3101,7 +3102,8 @@ static int drm_cvt_modes(struct drm_connector > > > *connector, > > > continue; > > > > > > height = (cvt->code[0] + ((cvt->code[1] & 0xf0) << 4) + > > > 1) * > > > 2; > > > - switch (cvt->code[1] & 0x0c) { > > > + cvt_aspect_ratio = cvt->code[1] & 0x0c; > > > + switch (cvt_aspect_ratio) { > > > case 0x00: > > > width = height * 4 / 3; > > > break; > > > @@ -3114,6 +3116,10 @@ static int drm_cvt_modes(struct drm_connector > > > *connector, > > > case 0x0c: > > > width = height * 15 / 9; > > > break; > > > + default: > > > > What value would cvt->code[1] have such that this gets hit? > > > > Or is this a "compiler is broken, so let's add more code" situation? 
> > If so, perhaps the code added could just be enough to silence the > > compiler (unreachable, etc)? > > I mean, this information comes from the EDID which inherently means it's > coming > from an untrusted source so the value could be literally anything as long as > the > EDID has a valid checksum. Note (assuming I'm understanding this code > correctly): > > drm_add_edid_modes() → add_cvt_modes() → drm_for_each_detailed_block() → > do_cvt_mode() → drm_cvt_modes() > > So afaict this isn't a broken compiler but a legitimate uninitialized > variable. The value can be anything, but it has to be something. The switch is on "unknown & 0x0c", so only 4 cases are possible, which are enumerated in the switch. -ilia
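The point about the mask can be checked exhaustively. The following is a minimal sketch, not the kernel code: `aspect_bits_covered()` and `count_uncovered()` are hypothetical helpers introduced only for illustration. They show that for every possible byte value, `code & 0x0c` lands on one of the four enumerated cases, so a `default:` branch can never run even when the EDID contents are untrusted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if (code & 0x0c) matches one of the four enumerated cases. */
bool aspect_bits_covered(uint8_t code)
{
	switch (code & 0x0c) {
	case 0x00:
	case 0x04:
	case 0x08:
	case 0x0c:
		return true;
	default:
		/* Never taken: the mask keeps only bits 2 and 3. */
		return false;
	}
}

/* Tries every possible byte value and counts how many reach default. */
int count_uncovered(void)
{
	int missed = 0;

	for (int v = 0; v <= 0xff; v++)
		if (!aspect_bits_covered((uint8_t)v))
			missed++;
	return missed;
}
```

Since no input reaches the default branch, the compiler warning is about what the compiler can prove, not about a value the EDID can actually supply.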
Re: [PATCH] drm/edid: Fix uninitialized variable in drm_cvt_modes()
On Thu, Oct 22, 2020 at 12:55 PM Lyude Paul wrote: > > Noticed this when trying to compile with -Wall on a kernel fork. We > potentially > don't set width here, which causes the compiler to complain about width > potentially being uninitialized in drm_cvt_modes(). So, let's fix that. > > Signed-off-by: Lyude Paul > > Cc: # v5.9+ > Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage") > Signed-off-by: Lyude Paul > --- > drivers/gpu/drm/drm_edid.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c > index 631125b46e04..2da158ffed8e 100644 > --- a/drivers/gpu/drm/drm_edid.c > +++ b/drivers/gpu/drm/drm_edid.c > @@ -3094,6 +3094,7 @@ static int drm_cvt_modes(struct drm_connector > *connector, > > for (i = 0; i < 4; i++) { > int width, height; > + u8 cvt_aspect_ratio; > > cvt = &(timing->data.other_data.data.cvt[i]); > > @@ -3101,7 +3102,8 @@ static int drm_cvt_modes(struct drm_connector > *connector, > continue; > > height = (cvt->code[0] + ((cvt->code[1] & 0xf0) << 4) + 1) * > 2; > - switch (cvt->code[1] & 0x0c) { > + cvt_aspect_ratio = cvt->code[1] & 0x0c; > + switch (cvt_aspect_ratio) { > case 0x00: > width = height * 4 / 3; > break; > @@ -3114,6 +3116,10 @@ static int drm_cvt_modes(struct drm_connector > *connector, > case 0x0c: > width = height * 15 / 9; > break; > + default: What value would cvt->code[1] have such that this gets hit? Or is this a "compiler is broken, so let's add more code" situation? If so, perhaps the code added could just be enough to silence the compiler (unreachable, etc)? -ilia
Re: [Nouveau] nouveau broken on Riva TNT2 in 5.9.0-rc8: GPU not supported on big-endian
On Fri, Oct 9, 2020 at 5:54 PM Karol Herbst wrote: > > On Fri, Oct 9, 2020 at 11:35 PM Ondrej Zary wrote: > > > > Hello, > > I'm testing 5.9.0-rc8 and found that Riva TNT2 stopped working: > > [0.00] Linux version 5.9.0-rc8+ (zary@gsql) (gcc (Debian 8.3.0-6) > > 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #326 SMP Fri Oct 9 22:31:40 > > CEST 2020 > > ... > > [ 14.771464] nouveau :01:00.0: GPU not supported on big-endian > > [ 14.771782] nouveau: probe of :01:00.0 failed with error -38 > > > > big-endian? WTF? The machine is x86. > > > > mhh, we reworked the endianess checks a bit and apparently that broke > something... I will give it some thoughts, but could you be so kind > and create an mmiotrace under 5.9 with nouveau? You won't need to > start X or anything while doing it. Just enable the trace and modprobe > nouveau and collect the trace. Looks like nvkm_device_endianness unconditionally reads out 0x4. I don't think that reg is there pre-NV11. At least NV4, NV5, NV10 and maybe NV15 (which is logically pre-NV11) don't support big-endian mode. Not sure about NV1A, which was the IGP of the series and IIRC logically pre-NV11 as well (but clearly could only be used with x86 chips, since it was part of the motherboard). Aha, it's documented in rnndb: https://github.com/envytools/envytools/blob/master/rnndb/bus/pmc.xml -ilia
Re: [PATCH] drm/nouveau/kms/nv50-: Fix clock checking algorithm in nv50_dp_mode_valid()
On Fri, Sep 25, 2020 at 6:08 PM Lyude Paul wrote: > > On Tue, 2020-09-22 at 17:22 -0400, Ilia Mirkin wrote: > > On Tue, Sep 22, 2020 at 5:14 PM Lyude Paul wrote: > > > On Tue, 2020-09-22 at 17:10 -0400, Ilia Mirkin wrote: > > > > Can we use 6bpc on arbitrary DP monitors, or is there a capability for > > > > it? Maybe only use 6bpc if display_info.bpc == 6 and otherwise use 8? > > > > > > I don't think that display_info.bpc actually implies a minimum bpc, only a > > > maximum bpc iirc (Ville would know the answer to this one). The other > > > thing > > > to > > > note here is that we want to assume the lowest possible bpc here since > > > we're > > > only concerned if the mode passed to ->mode_valid can be set under -any- > > > conditions (including those that require lowering the bpc beyond it's > > > maximum > > > value), so we definitely do want to always use 6bpc here even once we get > > > support for optimizing the bpc based on the available display bandwidth. > > > > Yeah, display_info is the max bpc. But would an average monitor > > support 6bpc? And if it does, does the current link training code even > > try that when display_info.bpc != 6? > > So I did confirm that 6bpc support is mandatory for DP, so yes-6 bpc will > always > work. > > But also, your second comment doesn't really apply here. So: to be clear, > we're > not really concerned here about whether nouveau will actually use 6bpc or not. > In truth I'm not actually sure either if we have any code that uses 6bpc (iirc > we don't), since we don't current optimize bpc. I think it's very possible for > us to use 6bpc for eDP displays if I recall though, but I'm not sure on that. > > But that's also not the point of this code. ->mode_valid() is only used in two > situations in DRM modesetting: when probing connector modes, and when checking > if a mode is valid or not during the atomic check for atomic modesetting. 
Its > purpose is only to reject display modes that are physically impossible to set > in > hardware due to static hardware constraints. Put another way, we only check > the > given mode against constraints which will always remain constant regardless of > the rest of the display state. An example of a static constraint would be the > max pixel clock supported by the hardware, since on sensible hardware this > never > changes. A dynamic constraint would be something like how much bandwidth is > currently unused on an MST topology, since that value is entirely dependent on > the rest of the display state. > > So - with that said, bpc is technically a dynamic constraint because while a > sink and source both likely have their own bpc limits, any bpc which is equal > or > below that limit can be used depending on what the driver decides - which will > be based on the max_bpc property, and additionally for MST displays it will > also > depend on the available bandwidth on the topology. The only non-dynamic thing > about bpc is that at a minimum, it will be 6 - so any mode that doesn't fit on > the link with a bpc of 6 is guaranteed to be a mode that we'll never be able > to > set and therefore want to prune. > > So, even if we're not using 6 in the majority of situations, I'm fairly > confident it's the right value here. It's also what i915 does as well (and > they > previously had to fix a bug that was the result of assuming a minimum of 8bpc > instead of 6). Here's the situation I'm trying to avoid: 1. Mode is considered "OK" from a bandwidth perspective @6bpc 2. Modesetting logic never tries 6bpc and the modeset fails As long as the two bits of logic agree on what is settable, I'm happy. Cheers, -ilia
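The static check described above can be sketched as follows. This is an illustration under stated assumptions, not the nouveau implementation: `dp_mode_fits_at_min_bpc()` is a hypothetical helper, and the units (pixel clock in kHz, per-lane data rate in kB/s) are assumed so that `clock * bpp / 8` and `link_nr * link_bw` are directly comparable, mirroring the arithmetic in the quoted patch.

```c
#include <stdbool.h>

/* Lowest bits per pixel a DP link must carry: 6 bpc * 3 components. */
#define MIN_DP_BPP 18u

/*
 * mode_clock_khz: pixel clock in kHz.
 * link_nr:        number of lanes.
 * link_bw:        per-lane data rate in kB/s (assumed unit; 270000 would
 *                 correspond to a 2.7 Gbps lane after 8b/10b coding).
 */
bool dp_mode_fits_at_min_bpc(unsigned int mode_clock_khz,
			     unsigned int link_nr, unsigned int link_bw)
{
	unsigned long long max_rate = (unsigned long long)link_nr * link_bw;
	unsigned long long needed =
		(unsigned long long)mode_clock_khz * MIN_DP_BPP / 8;

	/* A mode that does not fit even at 6 bpc can never be set. */
	return needed <= max_rate;
}
```

With these assumed units, a 148500 kHz mode needs 334125 kB/s at 6 bpc, so it is pruned on a single 270000 kB/s lane but kept on two. Whether the modeset path later actually tries 6 bpc is a separate question, which is the concern raised in the reply.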
Re: [PATCH] drm/nouveau/kms/nv50-: Fix clock checking algorithm in nv50_dp_mode_valid()
On Tue, Sep 22, 2020 at 5:14 PM Lyude Paul wrote: > > On Tue, 2020-09-22 at 17:10 -0400, Ilia Mirkin wrote: > > Can we use 6bpc on arbitrary DP monitors, or is there a capability for > > it? Maybe only use 6bpc if display_info.bpc == 6 and otherwise use 8? > > I don't think that display_info.bpc actually implies a minimum bpc, only a > maximum bpc iirc (Ville would know the answer to this one). The other thing to > note here is that we want to assume the lowest possible bpc here since we're > only concerned if the mode passed to ->mode_valid can be set under -any- > conditions (including those that require lowering the bpc beyond it's maximum > value), so we definitely do want to always use 6bpc here even once we get > support for optimizing the bpc based on the available display bandwidth. Yeah, display_info is the max bpc. But would an average monitor support 6bpc? And if it does, does the current link training code even try that when display_info.bpc != 6? -ilia
Re: [PATCH] drm/nouveau/kms/nv50-: Fix clock checking algorithm in nv50_dp_mode_valid()
Can we use 6bpc on arbitrary DP monitors, or is there a capability for it? Maybe only use 6bpc if display_info.bpc == 6 and otherwise use 8? On Tue, Sep 22, 2020 at 5:06 PM Lyude Paul wrote: > > While I thought I had this correct (since it actually did reject modes > like I expected during testing), Ville Syrjala from Intel pointed out > that the logic here isn't correct. max_clock refers to the max symbol > rate supported by the encoder, so limiting clock to ds_clock using max() > doesn't make sense. Additionally, we want to check against 6bpc for the > time being since that's the minimum possible bpc here, not the reported > bpc from the connector. See: > > https://lists.freedesktop.org/archives/dri-devel/2020-September/280276.html > > For more info. > > So, let's rewrite this using Ville's advice. > > Signed-off-by: Lyude Paul > Fixes: 409d38139b42 ("drm/nouveau/kms/nv50-: Use downstream DP clock limits > for mode validation") > Cc: Ville Syrjälä > Cc: Lyude Paul > Cc: Ben Skeggs > --- > drivers/gpu/drm/nouveau/nouveau_dp.c | 23 +-- > 1 file changed, 13 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dp.c > b/drivers/gpu/drm/nouveau/nouveau_dp.c > index 7b640e05bd4cd..24c81e423d349 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dp.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dp.c > @@ -231,23 +231,26 @@ nv50_dp_mode_valid(struct drm_connector *connector, >const struct drm_display_mode *mode, >unsigned *out_clock) > { > - const unsigned min_clock = 25000; > - unsigned max_clock, ds_clock, clock; > + const unsigned int min_clock = 25000; > + unsigned int max_clock, ds_clock, clock; > + const u8 bpp = 18; /* 6 bpc */ > enum drm_mode_status ret; > > if (mode->flags & DRM_MODE_FLAG_INTERLACE && !outp->caps.dp_interlace) > return MODE_NO_INTERLACE; > > max_clock = outp->dp.link_nr * outp->dp.link_bw; > - ds_clock = drm_dp_downstream_max_dotclock(outp->dp.dpcd, > - outp->dp.downstream_ports); > - if (ds_clock) > - max_clock = min(max_clock, 
ds_clock); > - > - clock = mode->clock * (connector->display_info.bpc * 3) / 10; > - ret = nouveau_conn_mode_clock_valid(mode, min_clock, max_clock, > - ); > + clock = mode->clock * bpp / 8; > + if (clock > max_clock) > + return MODE_CLOCK_HIGH; > + > + ds_clock = drm_dp_downstream_max_dotclock(outp->dp.dpcd, > outp->dp.downstream_ports); > + if (ds_clock && mode->clock > ds_clock) > + return MODE_CLOCK_HIGH; > + > + ret = nouveau_conn_mode_clock_valid(mode, min_clock, max_clock, > ); > if (out_clock) > *out_clock = clock; > + > return ret; > } > -- > 2.26.2 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
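The reordered checks in the patch above can be sketched in isolation. This is a hedged illustration, not the driver code: `dp_mode_status()` and its enum are hypothetical stand-ins, and the units are assumed to be comparable as in the patch. It shows the two independent limits: the link rate is compared against the mode's data rate at 18 bpp, while the downstream port limit is compared against the raw pixel clock.

```c
enum sketch_status { SKETCH_MODE_OK, SKETCH_MODE_CLOCK_HIGH };

/*
 * mode_clock: pixel clock of the mode.
 * max_clock:  link_nr * link_bw, the total link rate.
 * ds_clock:   downstream port dotclock limit, 0 if none is reported.
 */
enum sketch_status dp_mode_status(unsigned int mode_clock,
				  unsigned int max_clock,
				  unsigned int ds_clock)
{
	/* Data rate at the minimum of 6 bpc (18 bpp). */
	unsigned long long clock = (unsigned long long)mode_clock * 18 / 8;

	if (clock > max_clock)
		return SKETCH_MODE_CLOCK_HIGH;

	/* The downstream limit applies to the pixel clock, not the link rate. */
	if (ds_clock && mode_clock > ds_clock)
		return SKETCH_MODE_CLOCK_HIGH;

	return SKETCH_MODE_OK;
}
```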
Re: [Nouveau] 2dd4d163cd9c ("drm/nouveau: remove open-coded version of remove_conflicting_pci_framebuffers()")
Hi Boris, There was a fixup to that patch that you'll also have to revert first -- 7dbbdd37f2ae7dd4175ba3f86f4335c463b18403. I guess there's some subtle difference between the old open-coded logic and the helper, they were supposed to be identical. Cheers, -ilia On Thu, Jun 18, 2020 at 4:09 PM Borislav Petkov wrote: > > Hi, > > my test box won't boot 5.8-rc1 all the way but stops at > > ... > fb0: switching to nouveaufb from EFI VGA > <-- EOF > > I've bisected it to the commit in $Subject, see below. Unfortunately, it > doesn't revert cleanly so I can't really do the final test of reverting > it ontop of 5.8-rc1 to confirm that this one is really causing it. > > Any ideas? > > GPU is: > > [5.678614] fb0: switching to nouveaufb from EFI VGA > [5.685577] Console: switching to colour dummy device 80x25 > [5.691865] nouveau :03:00.0: NVIDIA GT218 (0a8c00b1) > [5.814409] nouveau :03:00.0: bios: version 70.18.83.00.08 > [5.823559] nouveau :03:00.0: fb: 512 MiB DDR3 > [6.096680] [TTM] Zone kernel: Available graphics memory: 8158364 KiB > [6.103327] [TTM] Zone dma32: Available graphics memory: 2097152 KiB > [6.109951] [TTM] Initializing pool allocator > [6.114405] [TTM] Initializing DMA pool allocator > [6.119256] nouveau :03:00.0: DRM: VRAM: 512 MiB > [6.124285] nouveau :03:00.0: DRM: GART: 1048576 MiB > [6.129677] nouveau :03:00.0: DRM: TMDS table version 2.0 > [6.135534] nouveau :03:00.0: DRM: DCB version 4.0 > [6.140755] nouveau :03:00.0: DRM: DCB outp 00: 02000360 > [6.147273] nouveau :03:00.0: DRM: DCB outp 01: 02000362 00020010 > [6.153782] nouveau :03:00.0: DRM: DCB outp 02: 028003a6 0f220010 > [6.160292] nouveau :03:00.0: DRM: DCB outp 03: 01011380 > [6.166810] nouveau :03:00.0: DRM: DCB outp 04: 08011382 00020010 > [6.173306] nouveau :03:00.0: DRM: DCB outp 05: 088113c6 0f220010 > [6.179829] nouveau :03:00.0: DRM: DCB conn 00: 00101064 > [6.185553] nouveau :03:00.0: DRM: DCB conn 01: 00202165 > [6.196145] nouveau :03:00.0: DRM: MM: using COPY for buffer copies > 
[6.233659] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). > [6.311939] nouveau :03:00.0: DRM: allocated 1920x1080 fb: 0x7, bo > (ptrval) > [6.320736] fbcon: nouveaudrmfb (fb0) is primary device > [6.392722] tsc: Refined TSC clocksource calibration: 3591.346 MHz > [6.392788] clocksource: tsc: mask: 0x max_cycles: > 0x33c46403b59, max_idle_ns: 440795293818 ns > [6.392930] clocksource: Switched to clocksource tsc > [6.509946] Console: switching to colour frame buffer device 240x67 > [6.546287] nouveau :03:00.0: fb0: nouveaudrmfb frame buffer device > [6.555021] [drm] Initialized nouveau 1.3.1 20120801 for :03:00.0 on > minor 0 > > Thx. > > git bisect start > # good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7 > git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162 > # bad: [b3a9e3b9622ae10064826dccb4f7a52bd88c7407] Linux 5.8-rc1 > git bisect bad b3a9e3b9622ae10064826dccb4f7a52bd88c7407 > # bad: [ee01c4d72adffb7d424535adf630f2955748fa8b] Merge branch 'akpm' > (patches from Andrew) > git bisect bad ee01c4d72adffb7d424535adf630f2955748fa8b > # bad: [16d91548d1057691979de4686693f0ff92f46000] Merge tag 'xfs-5.8-merge-8' > of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux > git bisect bad 16d91548d1057691979de4686693f0ff92f46000 > # good: [cfa3b8068b09f25037146bfd5eed041b78878bee] Merge tag 'for-linus-hmm' > of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma > git bisect good cfa3b8068b09f25037146bfd5eed041b78878bee > # good: [3fd911b69b3117e03181262fc19ae6c3ef6962ce] Merge tag > 'drm-misc-next-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into > drm-next > git bisect good 3fd911b69b3117e03181262fc19ae6c3ef6962ce > # bad: [1966391fa576e1fb2701be8bcca197d8f72737b7] mm/migrate.c: > attach_page_private already does the get_page > git bisect bad 1966391fa576e1fb2701be8bcca197d8f72737b7 > # good: [43c8546bcd854806736d8a635a0d696504dd4c21] drm/amdgpu: Add a UAPI > flag for user to call mem_sync > git bisect good 
43c8546bcd854806736d8a635a0d696504dd4c21 > # good: [6cf991611bc72c077f0cc64e23987341ad7ef41e] Merge tag > 'drm-intel-next-2020-05-15' of git://anongit.freedesktop.org/drm/drm-intel > into drm-next > git bisect good 6cf991611bc72c077f0cc64e23987341ad7ef41e > # bad: [dc455f4c888365595c0a13da445e092422d55b8d] drm/nouveau/dispnv50: fix > runtime pm imbalance on error > git bisect bad dc455f4c888365595c0a13da445e092422d55b8d > # bad: [2dd4d163cd9c15432524aa9863155bc03a821361] drm/nouveau: remove > open-coded version of remove_conflicting_pci_framebuffers() > git bisect bad 2dd4d163cd9c15432524aa9863155bc03a821361 > # good: [c41219fda6e04255c44d37fd2c0d898c1c46abf1] Merge tag > 'drm-intel-next-fixes-2020-05-20' of >
Re: [Nouveau] NVIDIA GP107 (137000a1) - acr: failed to load firmware
On Thu, Jun 4, 2020 at 12:04 PM Zeno Davatz wrote: > > Thank you, Ilia > > On Thu, Jun 4, 2020 at 5:25 PM Ilia Mirkin wrote: > > > There's a lot more firmware files than that ... everything in the > > gp107 directory. Also this would only be necessary if nouveau is built > > into the kernel. The files just have to be available whenever nouveau > > is loaded -- if it's built in, that means the firmware has to be baked > > into the kernel too. If it's loaded from initrd, that means the > > firmware has to be in initrd. If it's loaded after boot, then the > > firmware has to be available after boot. > > For the time being I got it working by removing all nouveau selections > in "make menuconfig" and by emerging "x11-drivers/nvidia-drivers" > Version 440.82. > > Back on the latest Linux Kernel. Feels great ;). > > Linux zenogentoo 5.7.0 #84 SMP Thu Jun 4 17:47:15 CEST 2020 x86_64 > Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux Not sure why you bother asking questions when you're just going to dump nouveau anyways. This is the second time I've answered your questions on this very topic, I think it'll be the last too. Cheers, -ilia
Re: [Nouveau] NVIDIA GP107 (137000a1) - acr: failed to load firmware
On Thu, Jun 4, 2020 at 11:16 AM Zeno Davatz wrote: > > On Thu, Jun 4, 2020 at 4:36 PM Ilia Mirkin wrote: > > > > Starting with kernel 5.6, loading nouveau without firmware (for GPUs > > where it is required, such as yours) got broken. > > > > You are loading nouveau without firmware, so it fails. > > > > The firmware needs to be available to the kernel at the time of nouveau > > loading. > > Ok, I am now trying this: > > /usr/src/linux> grep FIRMWARE /usr/src/linux/.config > CONFIG_FIRMWARE_MEMMAP=y > # CONFIG_GOOGLE_FIRMWARE is not set > CONFIG_PREVENT_FIRMWARE_BUILD=y > CONFIG_EXTRA_FIRMWARE="nvidia/gp107/gr/sw_nonctx.bin" > # CONFIG_CYPRESS_FIRMWARE is not set > # CONFIG_DRM_LOAD_EDID_FIRMWARE is not set > # CONFIG_FIRMWARE_EDID is not set > # CONFIG_TEST_FIRMWARE is not set There's a lot more firmware files than that ... everything in the gp107 directory. Also this would only be necessary if nouveau is built into the kernel. The files just have to be available whenever nouveau is loaded -- if it's built in, that means the firmware has to be baked into the kernel too. If it's loaded from initrd, that means the firmware has to be in initrd. If it's loaded after boot, then the firmware has to be available after boot. Cheers, -ilia
Re: [Nouveau] NVIDIA GP107 (137000a1) - acr: failed to load firmware
Starting with kernel 5.6, loading nouveau without firmware (for GPUs
where it is required, such as yours) got broken.

You are loading nouveau without firmware, so it fails.

The firmware needs to be available to the kernel at the time of nouveau
loading.

Cheers,

  -ilia

On Thu, Jun 4, 2020 at 10:24 AM Zeno Davatz wrote:
>
> Hi
>
> With Kernel 5.7 I am still getting this, while booting:
>
> ~> uname -a
> Linux zenogentoo 5.7.0 #80 SMP Thu Jun 4 16:10:03 CEST 2020 x86_64
> Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux
> ~> dmesg |grep nouveau
> [0.762872] nouveau :05:00.0: NVIDIA GP107 (137000a1)
> [0.875311] nouveau :05:00.0: bios: version 86.07.42.00.4a
> [0.875681] nouveau :05:00.0: acr: failed to load firmware
> [0.875780] nouveau :05:00.0: acr: failed to load firmware
> [0.875881] nouveau :05:00.0: acr ctor failed, -2
> [0.875980] nouveau: probe of :05:00.0 failed with error -2
>
> Old thread is here: https://lkml.org/lkml/2020/4/3/775
>
> My Linux-Firmware is: linux-firmware-20200421
>
> This used to work fine with Kernel 5.5.
>
> Please CC me for replies.
>
> best
> Zeno
> ___
> Nouveau mailing list
> nouv...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nouveau: add fbdev dependency
Isn't this already fixed by https://cgit.freedesktop.org/drm/drm/commit/?id=7dbbdd37f2ae7dd4175ba3f86f4335c463b18403 On Wed, May 27, 2020 at 9:43 AM Arnd Bergmann wrote: > > Calling directly into the fbdev stack only works when the > fbdev layer is built into the kernel as well, or both are > loadable modules: > > drivers/gpu/drm/nouveau/nouveau_drm.o: in function `nouveau_drm_probe': > nouveau_drm.c:(.text+0x1f90): undefined reference to > `remove_conflicting_pci_framebuffers' > > The change seems to have been intentional, so add an explicit > dependency here but allow it to still be compiled if FBDEV > is completely disabled. > > Fixes: 2dd4d163cd9c ("drm/nouveau: remove open-coded version of > remove_conflicting_pci_framebuffers()") > Signed-off-by: Arnd Bergmann > --- > drivers/gpu/drm/nouveau/Kconfig | 1 + > drivers/gpu/drm/nouveau/nouveau_drm.c | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig > index 980ed09bd7f6..8c640f003358 100644 > --- a/drivers/gpu/drm/nouveau/Kconfig > +++ b/drivers/gpu/drm/nouveau/Kconfig > @@ -18,6 +18,7 @@ config DRM_NOUVEAU > select THERMAL if ACPI && X86 > select ACPI_VIDEO if ACPI && X86 > select SND_HDA_COMPONENT if SND_HDA_CORE > + depends on FBDEV || !FBDEV > help > Choose this option for open-source NVIDIA support. > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c > b/drivers/gpu/drm/nouveau/nouveau_drm.c > index eb10c80ed853..e8560444ab57 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > @@ -697,7 +697,8 @@ static int nouveau_drm_probe(struct pci_dev *pdev, > nvkm_device_del(); > > /* Remove conflicting drivers (vesafb, efifb etc). 
*/ > - ret = remove_conflicting_pci_framebuffers(pdev, "nouveaufb"); > + if (IS_ENABLED(CONFIG_FBDEV)) > + ret = remove_conflicting_pci_framebuffers(pdev, "nouveaufb"); > if (ret) > return ret; > > -- > 2.26.2
Re: [PATCH v3 3/5] drm/nouveau/kms/gv100-: Add support for interlaced modes
On Mon, May 11, 2020 at 6:42 PM Lyude Paul wrote: > diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c > b/drivers/gpu/drm/nouveau/nouveau_connector.c > index 43bcbb6d73c4..6dae00da5d7e 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_connector.c > +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c > @@ -1065,7 +1065,7 @@ nouveau_connector_mode_valid(struct drm_connector > *connector, > return get_slave_funcs(encoder)->mode_valid(encoder, mode); > case DCB_OUTPUT_DP: > if (mode->flags & DRM_MODE_FLAG_INTERLACE && > - !nv_encoder->dp.caps.interlace) > + !nv_encoder->caps.dp_interlace) > return MODE_NO_INTERLACE; > > max_clock = nv_encoder->dp.link_nr; You probably meant for this hunk to go into an earlier patch. -ilia
Re: [PATCH v2 1/4] drm/komeda: Add a new helper drm_color_ctm_s31_32_to_qm_n()
On Mon, Oct 14, 2019 at 9:16 PM james qian wang (Arm Technology China) wrote:
> On Mon, Oct 14, 2019 at 11:58:48AM -0400, Ilia Mirkin wrote:
> > On Fri, Oct 11, 2019 at 1:43 AM james qian wang (Arm Technology China)
> > wrote:
> > > + *
> > > + * Convert and clamp S31.32 sign-magnitude to Qm.n 2's complement.
> > > + */
> > > +uint64_t drm_color_ctm_s31_32_to_qm_n(uint64_t user_input,
> > > +                                      uint32_t m, uint32_t n)
> > > +{
> > > +       u64 mag = (user_input & ~BIT_ULL(63)) >> (32 - n);
> > > +       bool negative = !!(user_input & BIT_ULL(63));
> > > +       s64 val;
> > > +
> > > +       /* the range of signed 2s complement is [-2^n+m, 2^n+m - 1] */
> >
> > This implies that n = 32, m = 0 would actually yield a 33-bit 2's
> > complement number. Is that what you meant?
>
> Yes, since m doesn't include the sign bit. So a Q0.32 is a 33-bit value.

This goes counter to what the Wikipedia page says
[ https://en.wikipedia.org/wiki/Q_(number_format) ]
(reformatted slightly for text-only consumption):

"""
For example, a Q15.1 format number:
- requires 15+1 = 16 bits
- its range is [-2^14, 2^14 - 2^-1] = [-16384.0, +16383.5] =
  [0x8000, 0x8001 ... 0xFFFF, 0x0000, 0x0001 ... 0x7FFE, 0x7FFF]
- its resolution is 2^-1 = 0.5
"""

This suggests that the proper way to represent a standard 32-bit 2's
complement integer would be Q32.0.

  -ilia
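The disagreement comes down to whether m counts the sign bit. A small sketch of both conventions follows; the helper names are hypothetical and this is not the proposed DRM helper. It computes the total width under each convention and the value range of a two's complement number of a given width with n fractional bits.

```c
#include <stdint.h>

/* Total bits when m already counts the sign bit (the quoted Wikipedia text). */
unsigned int q_bits_sign_in_m(unsigned int m, unsigned int n)
{
	return m + n;
}

/* Total bits when the sign bit is extra (the convention in the patch reply). */
unsigned int q_bits_sign_extra(unsigned int m, unsigned int n)
{
	return 1 + m + n;
}

/*
 * Smallest and largest values of a two's complement number that is `bits`
 * wide with n fractional bits. Valid for 1 <= bits <= 63 and n <= 62.
 */
double q_min(unsigned int bits, unsigned int n)
{
	int64_t raw = -((int64_t)1 << (bits - 1));

	return (double)raw / (double)((int64_t)1 << n);
}

double q_max(unsigned int bits, unsigned int n)
{
	int64_t raw = ((int64_t)1 << (bits - 1)) - 1;

	return (double)raw / (double)((int64_t)1 << n);
}
```

Under the sign-in-m convention, Q15.1 is 16 bits wide with range [-16384.0, +16383.5], matching the quoted text, and Q32.0 is an ordinary 32-bit integer. Under the sign-extra convention, Q0.32 needs 33 bits.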
Re: [PATCH 13/36] drm/nouveau: use bpp instead of cpp for drm_format_info
On Mon, Sep 23, 2019 at 8:56 AM Sandy Huang wrote: > > cpp[BytePerPlane] can't describe the 10bit data format correctly, > So we use bpp[BitPerPlane] to instead cpp. > > Signed-off-by: Sandy Huang > --- > drivers/gpu/drm/nouveau/dispnv04/crtc.c | 7 --- > drivers/gpu/drm/nouveau/dispnv50/base507c.c | 4 ++-- > drivers/gpu/drm/nouveau/dispnv50/ovly507e.c | 2 +- > 3 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c > b/drivers/gpu/drm/nouveau/dispnv04/crtc.c > index f22f010..59d2f07 100644 > --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c > +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c > @@ -874,11 +874,12 @@ nv04_crtc_do_mode_set_base(struct drm_crtc *crtc, > > /* Update the framebuffer location. */ > regp->fb_start = nv_crtc->fb.offset & ~3; > - regp->fb_start += (y * drm_fb->pitches[0]) + (x * > drm_fb->format->cpp[0]); > + regp->fb_start += (y * drm_fb->pitches[0]) + > + (x * drm_fb->format->bpp[0] / 8); > nv_set_crtc_base(dev, nv_crtc->index, regp->fb_start); > > /* Update the arbitration parameters. 
*/ > - nouveau_calc_arb(dev, crtc->mode.clock, drm_fb->format->cpp[0] * 8, > + nouveau_calc_arb(dev, crtc->mode.clock, drm_fb->format->bpp[0], > _burst, _lwm); > > regp->CRTC[NV_CIO_CRE_FF_INDEX] = arb_burst; > @@ -1238,7 +1239,7 @@ nv04_crtc_page_flip(struct drm_crtc *crtc, struct > drm_framebuffer *fb, > > /* Initialize a page flip struct */ > *s = (struct nv04_page_flip_state) > - { { }, event, crtc, fb->format->cpp[0] * 8, fb->pitches[0], > + { { }, event, crtc, fb->format->bpp[0], fb->pitches[0], > new_bo->bo.offset }; > > /* Keep vblanks on during flip, for the target crtc of this flip */ > diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c > b/drivers/gpu/drm/nouveau/dispnv50/base507c.c > index d5e295c..59883bd0 100644 > --- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c > +++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c > @@ -190,12 +190,12 @@ base507c_acquire(struct nv50_wndw *wndw, struct > nv50_wndw_atom *asyw, > return ret; > > if (!wndw->func->ilut) { > - if ((asyh->base.cpp != 1) ^ (fb->format->cpp[0] != 1)) > + if (asyh->base.cpp != 1 ^ fb->format->bpp[0] != 8) Please leave the parens in. Even if it works out to the same thing (don't know), ^ vs != ordering isn't fresh in many people's minds (mine included). 
> asyh->state.color_mgmt_changed = true; > } > > asyh->base.depth = fb->format->depth; > - asyh->base.cpp = fb->format->cpp[0]; > + asyh->base.cpp = fb->format->bpp[0] / 8; > asyh->base.x = asyw->state.src.x1 >> 16; > asyh->base.y = asyw->state.src.y1 >> 16; > asyh->base.w = asyw->state.fb->width; > diff --git a/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c > b/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c > index cc41766..c6c2e0b 100644 > --- a/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c > +++ b/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c > @@ -135,7 +135,7 @@ ovly507e_acquire(struct nv50_wndw *wndw, struct > nv50_wndw_atom *asyw, > if (ret) > return ret; > > - asyh->ovly.cpp = fb->format->cpp[0]; > + asyh->ovly.cpp = fb->format->bpp[0] / 8; > return 0; > } > > -- > 2.7.4 > > > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
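On the precedence question raised in the review: in C, `!=` binds more tightly than `^`, so the two spellings evaluate identically. The sketch below uses hypothetical function names and plain integers in place of the driver fields. The pragma is there only because compilers may warn about the unparenthesized form (GCC does so under `-Wparentheses`), which supports keeping the parentheses for readability.

```c
#include <stdbool.h>

/* The form the review asks to keep: intent is explicit. */
bool changed_with_parens(int base_cpp, int fb_bpp)
{
	return (base_cpp != 1) ^ (fb_bpp != 8);
}

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"
/* The form from the patch: != is evaluated before ^, so the result matches. */
bool changed_without_parens(int base_cpp, int fb_bpp)
{
	return base_cpp != 1 ^ fb_bpp != 8;
}
#pragma GCC diagnostic pop
```

Both functions return true exactly when one side is an 8-bit format and the other is not.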
Re: [PATCH 4.19 092/190] drm/nouveau: Dont WARN_ON VCPI allocation failures
On Fri, Sep 13, 2019 at 11:01 AM Sasha Levin wrote: > > On Fri, Sep 13, 2019 at 03:54:56PM +0100, Greg Kroah-Hartman wrote: > >On Fri, Sep 13, 2019 at 10:46:27AM -0400, Sasha Levin wrote: > >> On Fri, Sep 13, 2019 at 09:33:36AM -0400, Ilia Mirkin wrote: > >> > Hi Greg, > >> > > >> > This feels like it's missing a From: line. > >> > > >> > commit b513a18cf1d705bd04efd91c417e79e4938be093 > >> > Author: Lyude Paul > >> > Date: Mon Jan 28 16:03:50 2019 -0500 > >> > > >> >drm/nouveau: Don't WARN_ON VCPI allocation failures > >> > > >> > Is this an artifact of your notification-of-patches process and I > >> > never noticed before, or was the patch ingested incorrectly? > >> > >> It was always like this for patches that came through me. Greg's script > >> generates an explicit "From:" line in the patch, but I never saw the > >> value in that since git does the right thing by looking at the "From:" > >> line in the mail header. > >> > >> The right thing is being done in stable-rc and for the releases. For > >> your example here, this is how it looks like in the stable-rc tree: > >> > >> commit bdcc885be68289a37d0d063cd94390da81fd8178 > >> Author: Lyude Paul > >> AuthorDate: Mon Jan 28 16:03:50 2019 -0500 > >> Commit: Greg Kroah-Hartman > >> CommitDate: Fri Sep 13 14:05:29 2019 +0100 > >> > >>drm/nouveau: Don't WARN_ON VCPI allocation failures > > > >Yeah, we should fix your scripts to put the explicit From: line in here > >as we are dealing with patches in this format and it causes confusion at > >times (like now.) It's not the first time and that's why I added those > >lines to the patches. > > Heh, didn't think anyone cared about this scenario for the stable-rc > patches. > > I'll go add it. > > But... why do you actually care? Just a hygiene thing. Everyone else sends patches the normal way, with accurate attribution. Why should stable be different? (I was surprised to see Greg contributing to nouveau when I first saw the patch. 
But then realized it was the stable ingestion notification.) -ilia
Re: [PATCH 4.19 092/190] drm/nouveau: Dont WARN_ON VCPI allocation failures
Hi Greg, This feels like it's missing a From: line. commit b513a18cf1d705bd04efd91c417e79e4938be093 Author: Lyude Paul Date: Mon Jan 28 16:03:50 2019 -0500 drm/nouveau: Don't WARN_ON VCPI allocation failures Is this an artifact of your notification-of-patches process and I never noticed before, or was the patch ingested incorrectly? Cheers, -ilia On Fri, Sep 13, 2019 at 9:16 AM Greg Kroah-Hartman wrote: > > [ Upstream commit b513a18cf1d705bd04efd91c417e79e4938be093 ] > > This is much louder then we want. VCPI allocation failures are quite > normal, since they will happen if any part of the modesetting process is > interrupted by removing the DP MST topology in question. So just print a > debugging message on VCPI failures instead. > > Signed-off-by: Lyude Paul > Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 > multi-stream") > Cc: Ben Skeggs > Cc: dri-de...@lists.freedesktop.org > Cc: nouv...@lists.freedesktop.org > Cc: # v4.10+ > Signed-off-by: Ben Skeggs > Signed-off-by: Sasha Levin > --- > drivers/gpu/drm/nouveau/dispnv50/disp.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c > b/drivers/gpu/drm/nouveau/dispnv50/disp.c > index f889d41a281fa..5e01bfb69d7a3 100644 > --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c > +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c > @@ -759,7 +759,8 @@ nv50_msto_enable(struct drm_encoder *encoder) > > slots = drm_dp_find_vcpi_slots(>mgr, mstc->pbn); > r = drm_dp_mst_allocate_vcpi(>mgr, mstc->port, mstc->pbn, > slots); > - WARN_ON(!r); > + if (!r) > + DRM_DEBUG_KMS("Failed to allocate VCPI\n"); > > if (!mstm->links++) > nv50_outp_acquire(mstm->outp); > -- > 2.20.1 > > > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Nouveau] [PATCH 20/22] mm: move hmm_vma_fault to nouveau
On Wed, Jul 3, 2019 at 1:49 PM Ralph Campbell wrote:
> On 6/30/19 11:20 PM, Christoph Hellwig wrote:
> > hmm_vma_fault is marked as a legacy API to get rid of, but quite suites
> > the current nouvea flow. Move it to the only user in preparation for
>
> I didn't quite parse the phrase "quite suites the current nouvea flow."
> s/nouvea/nouveau/

As long as you're fixing typos, suites -> suits.
Re: nouveau: DRM: GPU lockup - switching to software fbcon
On Wed, Jun 19, 2019 at 1:48 AM Sergey Senozhatsky wrote:
>
> On (06/19/19 01:20), Ilia Mirkin wrote:
> > On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky wrote:
> > >
> > > On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > > > dmesg
> > > >
> > > >  nouveau :01:00.0: DRM: GPU lockup - switching to software fbcon
> > > >  nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > > >  nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
> > > >  nouveau :01:00.0: fifo: channel 5: killed
> > > >  nouveau :01:00.0: fifo: engine 6: scheduled for recovery
> > > >  nouveau :01:00.0: fifo: engine 0: scheduled for recovery
> > > >  nouveau :01:00.0: firefox[476]: channel 5 killed!
> > > >  nouveau :01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
> > > >
> > > > It lockups several times a day. Twice in just one hour today.
> > > > Can we fix this?
> > >
> > > Unusable
> >
> > Are you using a GTX 660 by any chance? You've provided rather minimal
> > system info.
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)

Quite literally the same GPU I have plugged in...

02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B
[GeForce GT 730] [10de:1287] (rev a1)

Works great here! Only other thing I can think of is that I avoid
applications with the letters "G" and "K" in their names, and I'm
using xf86-video-nouveau ddx, whereas you might be using the "modeset"
ddx with glamor.

If all else fails, just remove nouveau_dri.so and/or boot with
nouveau.noaccel=1 -- should be perfect.

Cheers,

  -ilia
Re: drm/nouveau/bios/ramcfg, setting of RON pull value
On Sat, Feb 16, 2019 at 10:02 AM Colin Ian King wrote:
>
> Hi,
>
> Static Analysis with CoverityScan as detected an issue with the setting
> of the RON pull value in function nvkm_gddr3_calc in
> drm/nouveau/bios/ramcfg.c
>
> This was introduced by commit: c25bf7b6155cb ("drm/nouveau/bios/ramcfg:
> Separate out RON pull value")
>
> CoverityScan reports the issue as follows:
>
> 84        case 0x20:
> 85                CWL = (ram->next->bios.timing[1] & 0x0f80) >> 7;
> 86                CL = (ram->next->bios.timing[1] & 0x001f) >> 0;
> 87                WR = (ram->next->bios.timing[2] & 0x007f) >> 16;
> 88                /* XXX: Get these values from the VBIOS instead */
> 89                DLL = !(ram->mr[1] & 0x1);
>
>    CID 1324005 (#1 of 1): Operands don't affect result
> (CONSTANT_EXPRESSION_RESULT)
>
> result_independent_of_operands: !(ram->mr[1] & 768) >> 8 is 0 regardless
> of the values of its operands. This occurs as the operand of assignment.
>
> 90                RON = !(ram->mr[1] & 0x300) >> 8;
> 91                break;
>
> Looking at this, I believe perhaps the correct setting could be:
>
>                 RON = !((ram->mr[1] & 0x300) >> 8);
>
> ..however I don't have the datasheet available for the H/W so I'm not
> sure if this a correct fix.

Actually looking at the code a bit, I suspect it should just be

RON = (ram->mr[1] & 0x300) >> 8;

since later on, when we recompose the MR (memory register) value, we do:

ram->mr[1] |= (RON & 0x03) << 8;

(And the whole point here is that we don't know how to get the proper
RON value for that timing table version, so we just copy whatever used
to be there in that case.)

  -ilia
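For anyone following the precedence argument in the message above, here is a
minimal standalone sketch of the three candidate expressions. It is not the
driver code: the helper names (ron_reported, ron_negated, ron_extracted,
mr_recompose) are made up for illustration, and clearing bits 8-9 before
OR-ing the field back in is an assumption, since the thread only shows the
OR step.

```c
#include <assert.h>
#include <stdint.h>

/* Expression as flagged by Coverity. '!' binds tighter than '>>', so this
 * parses as (!(mr1 & 0x300)) >> 8, i.e. (0 or 1) >> 8, which is always 0. */
static uint32_t ron_reported(uint32_t mr1)
{
	return !(mr1 & 0x300) >> 8;
}

/* Colin's suggestion: shift first, then negate. The result is only 0 or 1,
 * so the two-bit field collapses to a boolean. */
static uint32_t ron_negated(uint32_t mr1)
{
	return !((mr1 & 0x300) >> 8);
}

/* Ilia's suggestion: extract the two-bit field unchanged (values 0..3). */
static uint32_t ron_extracted(uint32_t mr1)
{
	return (mr1 & 0x300) >> 8;
}

/* Recompose step quoted in the thread. Clearing the field first is an
 * assumption made here so the helper works on any input value. */
static uint32_t mr_recompose(uint32_t mr1, uint32_t ron)
{
	mr1 &= ~UINT32_C(0x300);
	mr1 |= (ron & 0x03) << 8;
	return mr1;
}
```

With the plain extraction, feeding the result back through mr_recompose
returns the original register value, which matches the "copy whatever used
to be there" intent described above.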
Re: Nouveau module X server not starting on a NP900X5N Kaby Lake machine
On Tue, Jan 1, 2019 at 5:30 PM Jan Vlietland wrote:
>
> Hi Ilia, many thanks for answering my mail.
>
> Tonight I tried to see what happens if I generate a xorg.conf file and place
> it in /etc/X11/xorg.conf, as described here:
> https://askubuntu.com/questions/4662/where-is-the-x-org-config-file-how-do-i-configure-x-there
>
> When I do that X starts without the framebuffer error. X starts with a
> backtrace list in the shell and then stops with the error:
>
> 'Segmentation fault at address 0x0.
>
> Fatal sever error etc etc etc.

Unless you're an advanced user, you'll get the best results by not
supplying a manual xorg.conf. Generically, this indicates that you
messed something up. Without knowing precisely what the contents of
that file are, it would be difficult to say what exactly went wrong.
However I wouldn't advise this path without a good reason.

>
> Hope this helps!
>
> In fact the above is part of a much bigger issue I have with the
> machine. When I enable the i915 module (Kaby lake native video) my
> screen goed black after a while. The machine is totally stuck in that
> state. Even ssh connection is not possible. It shows no errors in the
> (saved) logs after restarting the machine.
>
> So I disabled the i915 module and try to get the nvidia card running.
> Without any luck.
>
> Thank you for inviting me for irc.freenode.net. What it the procss to
> get access?

It's an IRC network like any other. More info about the network
available at https://freenode.net/

It's open to anyone... #nouveau for nouveau, #intel-gfx for intel.

>
> I have included the full dmesg in zip format.
>
> For me it is a showstopper using the machine with Linux. I really do not
> understand that I am the only person on this planet that cannot run
> Linux on a plain vanilla Kaby lake machine.

I don't know the specifics of your laptop, but on many other GM108M
laptops, the displays are only attached to the Intel GPU. So running
without i915 is just not an option, if you want anything displayed.
You would be able to use the GM108M chip for 3D acceleration if you
chose, but it would have nothing to do with the actual display.

If your screen goes black with i915 loaded, I suspect that you'd be
better served reporting this issue to Intel.

From your logs, you also appear to have a variety of
nomodeset/.modeset=0 combinations -- these will just impede the
proper mode of operation. The i915 and nouveau drivers effectively do
nothing under those conditions.

Cheers,

  -ilia
Re: Nouveau module results in total lockups without any dmesg trace on a NP900X5N Kaby Lake machine
On Tue, Jan 1, 2019 at 4:06 PM Jan Vlietland wrote:
>
> Hi Ben, David and Daniel ,
>
> First of all happy new year. Based on advice of Greg K-H herewith a mail
> about a number of Nouveau issues with my laptop.
>
> I installed various Kali linux versions up to Linux 4.20.0-rc7
> (downloaded, compiled and installed) on a Samsung NP900X5N laptop and
> have an issue with the driver after loading.
>
> My configuration:
>
> - i7 7500
> - 16 gb / 256 gb ssd
> - nvidia 940MX (for 3D graphics)
>
> When I enable loading of the nouveau module for my Nvidia 3D card I get
> three MMIO faults:
>
> [ 35.984104] nouveau :01:00.0: bus: MMIO read of FAULT at 6013d4 [ IBUS ]
> [ 35.997510] nouveau :01:00.0: bus: MMIO read of FAULT at 10ac08 [ IBUS ]
> [ 37.551790] nouveau :01:00.0: bus: MMIO read of FAULT at 619444 [ IBUS ]
>
> I see currenty varous discussions on bugzilla: (as summarized by Bruno
> Pagani) https://bugs.freedesktop.org/show_bug.cgi?id=100423.
>
> But I do not see any confirmed solutions on the MMIO faults.
>
> The module is loaded but X server cannot run in framebuffer mode. I
> assume that the module does not provide any video memory to X to run in
> graphics mode.
>
> First of all I would like to understand what the faults impose.
> And I also would like to help you providing testing to fix the errors.

The faults are, generally, nothing to worry about, especially if they
occur infrequently. It's one bit or another of code that's poking at a
part of the GPU it shouldn't be touching.

To the best of my knowledge, 940MX (GM108) should work reasonably
well. Perhaps it would make most sense if you posted about some of
your other issues (GM108s usually have no outputs, so they are only
usable as offload devices). Feel free to join #nouveau on
irc.freenode.net to get more info as well.

Cheers,

  -ilia
Re: [Nouveau] [PATCH][next] drm/nouveau/disp: avoid potential overflow on shift of int value
On Sun, May 27, 2018 at 5:54 PM, Colin King wrote:
> From: Colin Ian King
>
> The constant values being shifted are 32 bit integers and may potentially
> overflow on the shift. Avoid this potential overflow by making them
> unsigned long long values before the shift.
>
> Detected by CoverityScan, CID#1469383, 1469400 ("Unintentional
> integer overflow")
>
> Signed-off-by: Colin Ian King
> ---
>  drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c | 2 +-
>  drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c
> index 29e6dd58ac48..99b94802ed63 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c
> @@ -52,7 +52,7 @@ void
>  gf119_disp_chan_intr(struct nv50_disp_chan *chan, bool en)
>  {
>         struct nvkm_device *device = chan->disp->base.engine.subdev.device;
> -       const u64 mask = 0x0001 << chan->chid.user;
> +       const u64 mask = 0x0001ULL << chan->chid.user;

I'm pretty sure all of these should just be u32 (below as well). The
registers that this is masking are all 32-bit, more doesn't make
sense.

>         if (!en) {
>                 nvkm_mask(device, 0x610090, mask, 0x);
>                 nvkm_mask(device, 0x6100a0, mask, 0x);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c
> index 57719f675eec..43ae3b092e43 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c
> @@ -166,7 +166,7 @@ void
>  nv50_disp_chan_intr(struct nv50_disp_chan *chan, bool en)
>  {
>         struct nvkm_device *device = chan->disp->base.engine.subdev.device;
> -       const u64 mask = 0x00010001 << chan->chid.user;
> +       const u64 mask = 0x00010001ULL << chan->chid.user;
>         const u64 data = en ? 0x0001 : 0x;
>         nvkm_mask(device, 0x610028, mask, data);
>  }
> --
> 2.17.0
>
> ___
> Nouveau mailing list
> nouv...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
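To make the u32-versus-u64 point above concrete, here is a small standalone
sketch using the 0x00010001 constant from the channv50.c hunk. The helper
names are hypothetical, and reg_mask_written() is only an assumed model of a
32-bit register write, not the real nvkm_mask(). The sketch also assumes the
channel id stays below 32.

```c
#include <assert.h>
#include <stdint.h>

/* Mask computed at the register's width. For an unsigned 32-bit operand,
 * bits shifted past bit 31 are simply discarded (well defined for chid < 32). */
static uint32_t chan_mask_u32(unsigned int chid)
{
	return (uint32_t)0x00010001 << chid;
}

/* Mask computed at 64 bits, as in the patch: nothing is lost during the
 * shift, so the upper half can end up populated. */
static uint64_t chan_mask_u64(unsigned int chid)
{
	return (uint64_t)0x00010001 << chid;
}

/* Hypothetical model of what a 32-bit register can actually take:
 * only the low 32 bits of the mask survive. */
static uint32_t reg_mask_written(uint64_t mask)
{
	return (uint32_t)mask;
}
```

Under these assumptions the extra bits kept by the 64-bit version never reach
the hardware, so both versions write the same mask. A plain signed int
constant is a different matter: shifting a set bit into or past the sign bit
is undefined behaviour in C, which an unsigned 32-bit type avoids just as
well as ULL does.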
Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE
On Sat, Apr 28, 2018 at 7:02 PM, Michel Dänzer wrote:
> On 2018-04-28 06:30 PM, Ilia Mirkin wrote:
>> On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer wrote:
>>> From: Michel Dänzer
>>>
>>> Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled)
>>> try to allocate huge pages. However, not all drivers can take advantage
>>> of huge pages, but they would incur the overhead for allocating and
>>> freeing them anyway.
>>>
>>> Now, drivers which can take advantage of huge pages need to set the new
>>> flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag
>>> no longer incur any overhead for allocating or freeing huge pages.
>>>
>>> v2:
>>> * Also guard swapping of consecutive pages in ttm_get_pages
>>> * Reword commit log, hopefully clearer now
>>>
>>> Cc: sta...@vger.kernel.org
>>> Signed-off-by: Michel Dänzer
>>
>> Both I and lots of other people, based on reports, are still seeing
>> plenty of issues with this as late as 4.16.4.
>
> "lots of other people", "plenty of issues" sounds a bit exaggerated from
> what I've seen. FWIW, while I did see the original messages myself, I
> haven't seen any since Christian's original fix (see below), neither
> with amdgpu nor radeon, even before this patch you followed up to.

Probably a half-dozen reports of it with nouveau, in addition to
another bunch of people talking about it on the bug you mention below,
along with email threads on dri-devel.

I figured I didn't have to raise my own since it was identical to the
others, and, I assumed, was being handled.

>> Admittedly I'm on nouveau, but others have reported issues with
>> radeon/amdgpu as well. It's been going on since the feature was merged
>> in v4.15, with what seems like little investigation from the authors
>> introducing the feature.
>
> That's not a fair assessment. See
> https://bugs.freedesktop.org/show_bug.cgi?id=104082#c40 and following
> comments.
>
> Christian fixed the original issue in
> d0bc0c2a31c95002d37c3cc511ffdcab851b3256 "swiotlb: suppress warning when
> __GFP_NOWARN is set". Christian did his best to try and get the fix in
> before 4.15 final, but for reasons beyond his control, it was delayed
> until 4.16-rc1 and then backported to 4.15.5.

In case it's unclear, let me state this explicitly -- I totally get
that despite best intentions, bugs get introduced. I do it myself.
What I'm having trouble with is the handling once the issue is
discovered.

>
> Unfortunately, there was an swiotlb regression (not directly related to
> Christian's work) shortly after this fix, also in 4.16-rc1, which is now
> fixed in 4.17-rc1 and will be backported to 4.16.y.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.16.5&id=2c9dacf5bfe1e45d96dfe97cb71d2b717786a7b9

This guy? Didn't help. I'm running 4.16.4 right now.

> It looks like there's at least one more bug left, but it's not clear yet
> when that was introduced, whether it's directly related to Christian's
> work, or indeed what the impact is. Let's not get ahead of ourselves.

Whether it is directly related to that work or not, the issue
persists. There are two options:

 - When declaring things fixed, no serious attempt was actually made
   at reproducing the underlying issues.
 - The authors truly can't reproduce the underlying issues users are
   seeing and are taking stabs in the dark.

Given that a number of people are reporting problems, in either
scenario, the reasonable thing is to disable the feature, and figure
out what is going on. Maybe condition it on !CONFIG_SWIOTLB.

>> We now have *two* broken releases, v4.15 and v4.16 (anything that
>> spews error messages and stack traces ad-infinitum in dmesg is, by
>> definition, broken).
>
> I haven't seen any evidence that there's still an issue in 4.15, is
> there any?

Well, I did have a late 4.15 rc kernel in addition to the 'suppress
warning' patch. Now I'm questioning my memory of whether the issue was
resolved there or not. I'm pretty sure that 'not', but no longer 100%.
Either way, I think we all agree v4.15 was broken and more importantly
was *known* to be broken well in advance of the release. A reasonable
option would have been to disable the feature until the other bits
fell into place.

>> You're putting this behind a flag now (finally),
>
> I wrote this patch because I realized due to some remark I happened to
> see you make this week on IRC that the huge page support in TTM was
> enabled for all drivers. Instead of making that kind of remark on IRC,
> it would have been more constructive, and more conducive to quick
> implementation, to suggest making the feature no
Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE
On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer wrote:
> From: Michel Dänzer
>
> Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled)
> try to allocate huge pages. However, not all drivers can take advantage
> of huge pages, but they would incur the overhead for allocating and
> freeing them anyway.
>
> Now, drivers which can take advantage of huge pages need to set the new
> flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag
> no longer incur any overhead for allocating or freeing huge pages.
>
> v2:
> * Also guard swapping of consecutive pages in ttm_get_pages
> * Reword commit log, hopefully clearer now
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Michel Dänzer

Both I and lots of other people, based on reports, are still seeing
plenty of issues with this as late as 4.16.4. Admittedly I'm on
nouveau, but others have reported issues with radeon/amdgpu as well.
It's been going on since the feature was merged in v4.15, with what
seems like little investigation from the authors introducing the
feature.

We now have *two* broken releases, v4.15 and v4.16 (anything that
spews error messages and stack traces ad-infinitum in dmesg is, by
definition, broken). You're putting this behind a flag now (finally),
but should it be enabled anywhere? Why is it being flipped on for
amdgpu by default, despite the still-existing problems?

Reverting this feature without just resetting back to the code in
v4.14 is painful, but why make Joe User suffer by enabling it while
you're still working out the kinks?

  -ilia
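The opt-in behaviour described in the quoted commit log can be sketched
roughly as below. This is not TTM code. DEMO_PAGE_FLAG_TRANSHUGE stands in
for TTM_PAGE_FLAG_TRANSHUGE (whose real value is not given in this thread),
the 512-pages-per-huge-page figure assumes 2 MiB huge pages over 4 KiB base
pages, and struct demo_alloc_plan and demo_plan_pages() are hypothetical
names used only to show the gating.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TTM_PAGE_FLAG_TRANSHUGE. */
#define DEMO_PAGE_FLAG_TRANSHUGE (1u << 0)
/* Assumption: one huge page covers 512 base pages (2 MiB / 4 KiB). */
#define DEMO_PAGES_PER_HUGE 512u

struct demo_alloc_plan {
	size_t huge_chunks; /* number of huge-page allocations to attempt */
	size_t small_pages; /* number of ordinary single-page allocations */
};

/* Split a request into huge and small allocations. Callers that do not set
 * the flag never enter the huge-page path, so they pay nothing for it. */
static struct demo_alloc_plan demo_plan_pages(unsigned int flags, size_t npages)
{
	struct demo_alloc_plan plan = { 0, npages };

	if (flags & DEMO_PAGE_FLAG_TRANSHUGE) {
		plan.huge_chunks = npages / DEMO_PAGES_PER_HUGE;
		plan.small_pages = npages % DEMO_PAGES_PER_HUGE;
	}
	return plan;
}
```

In this model a driver that leaves the flag clear gets exactly the pre-feature
behaviour, which is the property the objection above asks to be the default
until the remaining reports are understood.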
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mr...@linux.ee> wrote:
>>> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:
>>
>> NV5 in another PC (secondary card in x86-64) made the systrem crash on
>> boot, in nvkm_therm_clkgate_fini.
>
> Mind booting with nouveau.debug=trace? That should hopefully tell us
> more exactly which thing is dying. If you have a cross-compile/distcc
> setup handy, a bisect may be even more useful.

Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
somehow mis-hooked up for NV5 now. A bisect result would still make
the culprit a lot more obvious.
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos wrote:
>> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:
>
> NV5 in another PC (secondary card in x86-64) made the systrem crash on
> boot, in nvkm_therm_clkgate_fini.

Mind booting with nouveau.debug=trace? That should hopefully tell us
more exactly which thing is dying. If you have a cross-compile/distcc
setup handy, a bisect may be even more useful.

It's funny, I had a NV5 plugged into my desktop for testing, and
*just* took it out (because the box wouldn't even get to BIOS anymore
... although it was unrelated to the NV5, probably just something
mis-seated.)

  -ilia
Re: [RFC v2 3/4] drm/nouveau: Add support for BLCG on Kepler2
On Thu, Jan 25, 2018 at 10:35 PM, Lyude Paul wrote:
> Same as the previous patch, but for Kepler2 now
>
> Signed-off-by: Lyude Paul
> ---
>  drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h  | 1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/device/base.c | 8 +--
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c    | 62
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/Kbuild     | 1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.c    | 71 +++
>  5 files changed, 139 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.c
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> index adb78f7d083a..92be0e5269c6 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> @@ -75,6 +75,7 @@ int mcp89_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gf100_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gf108_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gk104_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
> +int gk110_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gk20a_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gm107_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gm200_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> index 74bd09b1c893..7590a30b7ff0 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> @@ -1812,7 +1812,7 @@ nvf0_chipset = {
>         .bus = gf100_bus_new,
>         .clk = gk104_clk_new,
>         .devinit = gf100_devinit_new,
> -       .fb = gk104_fb_new,
> +       .fb = gk110_fb_new,
>         .fuse = gf100_fuse_new,
>         .gpio = gk104_gpio_new,
>         .i2c = gk104_i2c_new,
> @@ -1850,7 +1850,7 @@ nvf1_chipset = {
>         .bus = gf100_bus_new,
>         .clk = gk104_clk_new,
>         .devinit = gf100_devinit_new,
> -       .fb = gk104_fb_new,
> +       .fb = gk110_fb_new,
>         .fuse = gf100_fuse_new,
>         .gpio = gk104_gpio_new,
>         .i2c = gk104_i2c_new,
> @@ -1888,7 +1888,7 @@ nv106_chipset = {
>         .bus = gf100_bus_new,
>         .clk = gk104_clk_new,
>         .devinit = gf100_devinit_new,
> -       .fb = gk104_fb_new,
> +       .fb = gk110_fb_new,
>         .fuse = gf100_fuse_new,
>         .gpio = gk104_gpio_new,
>         .i2c = gk104_i2c_new,
> @@ -1926,7 +1926,7 @@ nv108_chipset = {
>         .bus = gf100_bus_new,
>         .clk = gk104_clk_new,
>         .devinit = gf100_devinit_new,
> -       .fb = gk104_fb_new,
> +       .fb = gk110_fb_new,
>         .fuse = gf100_fuse_new,
>         .gpio = gk104_gpio_new,
>         .i2c = gk104_i2c_new,
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> index a38e19b61c1d..38d3328e45f1 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> @@ -22,6 +22,7 @@
>   * Authors: Ben Skeggs
>   */
>  #include "gf100.h"
> +#include "gk104.h"
>  #include "ctxgf100.h"
>
>  #include
> @@ -156,6 +157,66 @@ gk110_gr_pack_mmio[] = {
>         {}
>  };
>
> +const struct nvkm_therm_clkgate_init

These should all be static, no?

> +gk110_clkgate_blcg_init_sked_0[] = {
> +       { 0x407000, 1, 0x4041 },
> +       {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_gcc_0[] = {
> +       { 0x419020, 1, 0x0042 },
> +       { 0x419038, 1, 0x0042 },
> +       {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_l1c_0[] = {
> +       { 0x419cd4, 2, 0x4042 },
> +       {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_mp_0[] = {
> +       { 0x419fd0, 1, 0x4043 },
> +       { 0x419fd8, 1, 0x4049 },
> +       { 0x419fe0, 2, 0x4042 },
> +       { 0x419ff0, 1, 0x0046 },
> +       { 0x419ff8, 1, 0x4042 },
> +       { 0x419f90, 1, 0x4042 },
> +       {}
> +};
> +
> +const struct nvkm_therm_clkgate_pack
> +gk110_clkgate_pack[] = {
> +       { gk104_clkgate_blcg_init_main_0 },
> +       { gk104_clkgate_blcg_init_rstr2d_0 },
> +       { gk104_clkgate_blcg_init_unk_0 },
> +       { gk104_clkgate_blcg_init_gcc_0 },
> +       { gk110_clkgate_blcg_init_sked_0 },
> +       { gk104_clkgate_blcg_init_unk_1 },
> +       { gk104_clkgate_blcg_init_gpc_ctxctl_0 },
> +       { gk104_clkgate_blcg_init_gpc_unk_0 },
> +       { gk104_clkgate_blcg_init_gpc_esetup_0 },
> +       { gk104_clkgate_blcg_init_gpc_tpbus_0 },
> +       { gk104_clkgate_blcg_init_gpc_zcull_0 },
> +       { gk104_clkgate_blcg_init_gpc_tpconf_0 },
> +       {
Re: nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
On Sun, Dec 31, 2017 at 3:53 PM, Mike Galbraith <efa...@gmx.de> wrote:
> On Sun, 2017-12-31 at 13:27 -0500, Ilia Mirkin wrote:
>> On Tue, Dec 19, 2017 at 8:45 AM, Christian König
>> <ckoenig.leichtzumer...@gmail.com> wrote:
>> > On 19.12.2017 at 11:39, Michel Dänzer wrote:
>> >>
>> >> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
>> >>>
>> >>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
>> >>>>
>> >>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
>> >>>>>
>> >>>>> Greetings,
>> >>>>>
>> >>>>> Kernel bound workloads seem to trigger the below for whatever reason.
>> >>>>> I only see this when beating up NFS. There was a kworker wakeup
>> >>>>> latency issue, but with a bandaid applied to fix that up, I can still
>> >>>>> trigger this.
>> >>>>
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I have seen this one as well with my system, but I could not find an
>> >>>> easy way to trigger it for bisecting purposes. If you can trigger it
>> >>>> conveniently, a bisect would be nice!
>> >>>
>> >>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
>> >>> backup, creating memory pressure. I happen to have just finished
>> >>> bisecting, the result is:
>> >>>
>> >>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
>> >>> commit 648bc3574716400acc06f99915815f80d9563783
>> >>> Author: Christian König <christian.koe...@amd.com>
>> >>> Date: Thu Jul 6 09:59:43 2017 +0200
>> >>>
>> >>> drm/ttm: add transparent huge page support for DMA allocations v2
>> >>>
>> >>> Try to allocate huge pages when it makes sense.
>> >>>
>> >>> v2: fix comment and use ifdef
>> >>>
>> >>>
>> >> BTW, I haven't noticed any bad effects other than the dmesg splats, so
>> >> maybe it's just noise about transient failures for which there is a
>> >> proper fallback in place.
>> >
>> >
>> > Yeah, I think that is exactly what happens here.
>> >
>> > We try to allocate a huge page, but fail and so fall back to using multiple
>> > 4k pages instead.
>> >
>> > Going to send out a patch to suppress the warning.
>>
>> Hi Christian,
>>
>> Did you ever send out such a patch? I didn't see one on the list, but
>> perhaps I missed it. One definitely hasn't made it upstream yet. (I
>> just hit the issue myself with Linus's tree from last night.)
>
> Actually, that wants a bit more methinks, because while the stack dump
> goes away, you still get spammed, it just comes in smaller chunks.

OK, well this has to either be fixed or reverted. Right now it's
complaining all the time for me after like a day of uptime.

  -ilia
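The fallback described in the quoted text (try one huge page, otherwise use many 4k pages) can be pictured with a small userspace model. This is not the TTM code and does not touch the DMA API. The allocator hook, the struct, and the function names are hypothetical, and the 2 MiB huge-page size is taken from the `size=2097152` in the subject line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_PAGE_SIZE 4096u               /* 4 KiB base page */
#define HUGE_PAGE_SIZE  (2u * 1024 * 1024)  /* 2097152 bytes, as in the subject */

/* Outcome of planning one buffer allocation. */
struct alloc_plan {
	bool   used_huge;   /* true if the large allocation succeeded */
	size_t chunks;      /* number of separate allocations needed */
	size_t chunk_size;  /* size of each allocation in bytes */
};

/* Hypothetical allocator hook: reports whether a contiguous block of
 * 'size' bytes is currently available. */
typedef bool (*try_alloc_fn)(size_t size);

static struct alloc_plan
plan_allocation(size_t bytes, try_alloc_fn try_alloc)
{
	struct alloc_plan plan;

	assert(bytes > 0 && try_alloc);

	/* Opportunistic attempt. Failure here is expected under memory
	 * pressure and is not an error, so nothing is logged. */
	if (bytes % HUGE_PAGE_SIZE == 0 && try_alloc(HUGE_PAGE_SIZE)) {
		plan.used_huge = true;
		plan.chunks = bytes / HUGE_PAGE_SIZE;
		plan.chunk_size = HUGE_PAGE_SIZE;
		return plan;
	}

	/* Fallback: cover the same range with base pages, rounding up. */
	plan.used_huge = false;
	plan.chunks = (bytes + SMALL_PAGE_SIZE - 1) / SMALL_PAGE_SIZE;
	plan.chunk_size = SMALL_PAGE_SIZE;
	return plan;
}

/* Models a fragmented system: only sub-huge requests succeed. */
static bool
huge_unavailable(size_t size)
{
	return size < HUGE_PAGE_SIZE;
}

/* Models an idle system: every request succeeds. */
static bool
always_available(size_t size)
{
	(void)size;
	return true;
}
```

With `huge_unavailable`, a 2097152-byte request is planned as 512 chunks of 4096 bytes, which is the case the thread argues should stay quiet because the fallback already works.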
Re: nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
On Tue, Dec 19, 2017 at 8:45 AM, Christian König wrote:
> On 19.12.2017 at 11:39, Michel Dänzer wrote:
>>
>> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
>>>
>>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
>>>>
>>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Kernel bound workloads seem to trigger the below for whatever reason.
>>>>> I only see this when beating up NFS. There was a kworker wakeup
>>>>> latency issue, but with a bandaid applied to fix that up, I can still
>>>>> trigger this.
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I have seen this one as well with my system, but I could not find an
>>>> easy way to trigger it for bisecting purposes. If you can trigger it
>>>> conveniently, a bisect would be nice!
>>>
>>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
>>> backup, creating memory pressure. I happen to have just finished
>>> bisecting, the result is:
>>>
>>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
>>> commit 648bc3574716400acc06f99915815f80d9563783
>>> Author: Christian König
>>> Date: Thu Jul 6 09:59:43 2017 +0200
>>>
>>> drm/ttm: add transparent huge page support for DMA allocations v2
>>>
>>> Try to allocate huge pages when it makes sense.
>>>
>>> v2: fix comment and use ifdef
>>>
>>>
>> BTW, I haven't noticed any bad effects other than the dmesg splats, so
>> maybe it's just noise about transient failures for which there is a
>> proper fallback in place.
>
>
> Yeah, I think that is exactly what happens here.
>
> We try to allocate a huge page, but fail and so fall back to using multiple
> 4k pages instead.
>
> Going to send out a patch to suppress the warning.

Hi Christian,

Did you ever send out such a patch? I didn't see one on the list, but
perhaps I missed it. One definitely hasn't made it upstream yet. (I
just hit the issue myself with Linus's tree from last night.)

Thanks,

  -ilia
Re: [tip:x86/urgent] x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
On Tue, Dec 12, 2017 at 9:43 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, Dec 12, 2017 at 09:21:10AM -0500, Ilia Mirkin wrote:
>> The "thing" being mmiotrace, or the "thing" being page-unaligned addresses?
>
> mmiotrace
>
>> If the former, its primary use-case is for snooping on the NVIDIA
>> proprietary GPU driver in order to figure out how to drive the
>> underlying hardware. The driver does ioremap's to get at PCI space,
>> which mmiotrace "steals" and provides pages without a present bit set,
>> along with a fault handler. When the fault handler is hit, it
>> reinstates the faulting page, and single-steps the faulting
>> instruction
>
> At which point you have valid page-tables and another CPU can access
> that page too.
>
>> reading the before/after regs to determine what happened
>> (doesn't work universally, but enough for instructions used for PCI
>> MMIO accesses). See mmio-mod.c::pre and post (the latter is called
>> from the debug handler).
>
> And after that you only invalidate the TLBs for the CPU that took the
> initial fault, leaving possibly stale TLBs on other CPUs.
>
>
> So this 'thing' has huge gaping SMP holes in.

Sure does! Probably why the following happens when mmiotrace is enabled:

void enable_mmiotrace(void)
{
        mutex_lock(&mmiotrace_mutex);
        if (is_enabled())
                goto out;

        if (nommiotrace)
                pr_info("MMIO tracing disabled.\n");
        kmmio_init();
        enter_uniprocessor();
        spin_lock_irq(&trace_lock);
        atomic_inc(&mmiotrace_enabled);
        spin_unlock_irq(&trace_lock);
        pr_info("enabled.\n");
out:
        mutex_unlock(&mmiotrace_mutex);
}
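The window Peter describes can be shown with a toy, single-threaded model. It is only a sketch of the arm/disarm sequence, not of the real kmmio code. It does not model TLBs at all, and every name in it is made up for the example. One flag stands in for the present bit. A traced access clears the trap, "other CPU" accesses arrive while it is cleared, and then the trap is re-armed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one trapped MMIO page. */
struct traced_page {
	bool     present;   /* stands in for the present bit of the mapping */
	unsigned traced;    /* accesses that went through the fault handler */
	unsigned untraced;  /* accesses that slipped through while disarmed */
};

/* Fault path for a not-present page: record the access, then make the
 * page present so the faulting instruction can be single-stepped. */
static void
pre_handler(struct traced_page *pg)
{
	pg->traced++;
	pg->present = true;
}

/* Debug-trap path after the single step: re-arm the page. */
static void
post_handler(struct traced_page *pg)
{
	pg->present = false;
}

/* An access from a CPU that is not in the middle of a single step.
 * It is traced only if the page is currently armed. */
static void
access_page(struct traced_page *pg)
{
	if (!pg->present) {
		pre_handler(pg);
		post_handler(pg);  /* the step completes at once in this model */
	} else {
		pg->untraced++;
	}
}

/* CPU A starts a traced access; 'other_cpu_accesses' accesses from a
 * second CPU arrive before CPU A's single step has finished. */
static void
traced_access_with_interference(struct traced_page *pg,
				unsigned other_cpu_accesses)
{
	assert(!pg->present);
	pre_handler(pg);                  /* CPU A faults, page now present */
	for (unsigned i = 0; i < other_cpu_accesses; i++)
		access_page(pg);          /* second CPU sees a valid mapping */
	post_handler(pg);                 /* CPU A re-arms the page */
}
```

With `other_cpu_accesses` set to 0, which is what running on a single CPU amounts to in this model, nothing is missed. With any other value, that many accesses go unrecorded.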
Re: [tip:x86/urgent] x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
On Tue, Dec 12, 2017 at 9:04 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra wrote:
>
>> On Tue, Dec 12, 2017 at 02:55:30AM -0800, tip-bot for Karol Herbst wrote:
>> > Commit-ID: 6d60ce384d1d5ca32b595244db4077a419acc687
>> > Gitweb:
>> > https://git.kernel.org/tip/6d60ce384d1d5ca32b595244db4077a419acc687
>> > Author: Karol Herbst
>> > AuthorDate: Mon, 27 Nov 2017 08:51:39 +0100
>> > Committer: Ingo Molnar
>> > CommitDate: Mon, 11 Dec 2017 15:35:18 +0100
>> >
>> > x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
>>
>> OK, let me hijack this thread since apparently people use and care about
>> mmiotrace.
>>
>> I was recently auditing the x86 tlb flushing and ran across this
>> 'thing'. Can someone please explain to me how this is supposed to work
>> and how it's not completely broken?

The "thing" being mmiotrace, or the "thing" being page-unaligned addresses?

If the former, its primary use-case is for snooping on the NVIDIA
proprietary GPU driver in order to figure out how to drive the
underlying hardware. The driver does ioremap's to get at PCI space,
which mmiotrace "steals" and provides pages without a present bit set,
along with a fault handler. When the fault handler is hit, it
reinstates the faulting page, and single-steps the faulting
instruction reading the before/after regs to determine what happened
(doesn't work universally, but enough for instructions used for PCI
MMIO accesses). See mmio-mod.c::pre and post (the latter is called
from the debug handler).

You may be interested in reading
Documentation/trace/mmiotrace.txt::How Mmiotrace Works

Cheers,

  -ilia
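Since trapping works per page, the "page unaligned" case in the commit title comes down to simple arithmetic: a range that does not start on a page boundary can touch one more page than its size suggests. The sketch below shows only that arithmetic. It is not the fix from the commit, the helper names are invented, and it assumes 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed 4 KiB pages */
#define PAGE_BASE_MASK  (~(uintptr_t)(PAGE_SIZE_BYTES - 1))

/* Highest page-aligned address at or below addr. */
static uintptr_t
page_base(uintptr_t addr)
{
	return addr & PAGE_BASE_MASK;
}

/* Number of pages that must be trapped to cover [addr, addr + size). */
static unsigned long
pages_spanned(uintptr_t addr, unsigned long size)
{
	uintptr_t first, last;

	assert(size > 0);
	first = page_base(addr);
	last = page_base(addr + size - 1);
	return (unsigned long)((last - first) / PAGE_SIZE_BYTES) + 1;
}
```

For example, a 0x1000-byte range starting at 0x1000 spans one page, while the same size starting at 0x1800 spans two, so a tracer that only arms `size / PAGE_SIZE` pages from the unaligned start would miss the tail.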
Re: Blank console but X11 works on MCP79 - old regression since 3.8
On Sat, Nov 18, 2017 at 12:23 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org>
>> wrote:
>>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>>>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary
>>>>
>>>> <li...@rainbow-software.org> wrote:
>>>> > @@ -483,8 +483,8 @@
>>>> > nouveau :02:00.0: disp:0860: -> 0500
>>>> > nouveau :02:00.0: disp:0864:
>>>> > nouveau :02:00.0: disp:0868: -> 04000500
>>>> > -nouveau :02:00.0: disp:086c: -> 00100500
>>>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>>>> > +nouveau :02:00.0: disp:086c: -> 00100a00
>>>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>>>> > nouveau :02:00.0: disp:0874: ->
>>>> > nouveau :02:00.0: disp:0878:
>>>> > nouveau :02:00.0: disp:0880: 0500
>>>> >
>>>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>>>> > 64MB case. Why?
>>>> >
>>>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>>>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>>>> > memory.
>>>> >
>>>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>>>
>>>> OK, well that makes a *ton* of sense (8bpp being broken).
>>>>
>>>> I think the idea of bpp reduction is that when you're on your shiny
>>>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>>>> all that to a pinned fbcon - almost half of that would go to a single
>>>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>>>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>>>
>>>> The specific limits could probably use tweaking - I think they only
>>>> consider VRAM size, not the fb size.
>>>>
>>>> I guess 8bpp worked prior to the change you bisected though, so we
>>>> should figure out what we did wrong in the new code.
>>>
>>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.
>>
>> By the way, instead of booting $kernel, you can use modetest from
>> libdrm/tests. Not sure if it supports C8 though =/
>
> It didn't. But it does now - I mailed a patch to dri-devel, also (with
> slight fix) available at
>
> https://people.freedesktop.org/~imirkin/patches/0001-modetest-add-C8-support-to-generate-SMPTE-pattern.patch
>
> This works on GK208 but not on G92 (whose display unit is much closer
> to your MCP79's). You can run as
>
> ./modetest -s DVI-I-1:1920x1200@C8
>
> This should display a SMPTE pattern, and exit when you hit enter. When
> it does so, it doesn't restore fbcon, but you can switch to another
> vty to get console back.
>
> I get a white picture on G92. Now just have to figure out how to fix
> it. Someone should also test on a G80 if possible, since that takes a
> different path as well.

Someone tested out a GF100 and it had the same issue. I've since
determined that the color is that of the first entry in the LUT. With
the above program, it's (192, 192, 192) which looks white.

  -ilia
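To make the "first entry in the LUT" observation concrete: in a C8 mode each pixel byte is an index into a 256-entry colour table. If the index is effectively ignored, every pixel comes out as entry 0. The model below only illustrates that symptom. It does not claim to describe what the hardware actually does wrong, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct rgb {
	uint8_t r, g, b;
};

/* Resolve one C8 pixel through a 256-entry lookup table. When
 * 'index_honoured' is zero the pixel value is ignored and entry 0 is
 * used, which reproduces the uniform-colour symptom. */
static struct rgb
c8_lookup(const struct rgb *lut, uint8_t index, int index_honoured)
{
	assert(lut != 0);
	return lut[index_honoured ? index : 0];
}
```

With entry 0 set to (192, 192, 192) and entry 1 set to (192, 192, 0), looking up pixel value 1 gives the yellow entry when the index is honoured and the grey-white entry 0 when it is not.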
Re: Blank console but X11 works on MCP79 - old regression since 3.8
On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org>
> wrote:
>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary
>>>
>>> <li...@rainbow-software.org> wrote:
>>> > @@ -483,8 +483,8 @@
>>> > nouveau :02:00.0: disp:0860: -> 0500
>>> > nouveau :02:00.0: disp:0864:
>>> > nouveau :02:00.0: disp:0868: -> 04000500
>>> > -nouveau :02:00.0: disp:086c: -> 00100500
>>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>>> > +nouveau :02:00.0: disp:086c: -> 00100a00
>>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>>> > nouveau :02:00.0: disp:0874: ->
>>> > nouveau :02:00.0: disp:0878:
>>> > nouveau :02:00.0: disp:0880: 0500
>>> >
>>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>>> > 64MB case. Why?
>>> >
>>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>>> > memory.
>>> >
>>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>>
>>> OK, well that makes a *ton* of sense (8bpp being broken).
>>>
>>> I think the idea of bpp reduction is that when you're on your shiny
>>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>>> all that to a pinned fbcon - almost half of that would go to a single
>>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>>
>>> The specific limits could probably use tweaking - I think they only
>>> consider VRAM size, not the fb size.
>>>
>>> I guess 8bpp worked prior to the change you bisected though, so we
>>> should figure out what we did wrong in the new code.
>>
>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.
>
> By the way, instead of booting $kernel, you can use modetest from
> libdrm/tests. Not sure if it supports C8 though =/

It didn't. But it does now - I mailed a patch to dri-devel, also (with
slight fix) available at

https://people.freedesktop.org/~imirkin/patches/0001-modetest-add-C8-support-to-generate-SMPTE-pattern.patch

This works on GK208 but not on G92 (whose display unit is much closer
to your MCP79's). You can run as

./modetest -s DVI-I-1:1920x1200@C8

This should display a SMPTE pattern, and exit when you hit enter. When
it does so, it doesn't restore fbcon, but you can switch to another
vty to get console back.

I get a white picture on G92. Now just have to figure out how to fix
it. Someone should also test on a G80 if possible, since that takes a
different path as well.

  -ilia
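The "almost half" figure in the quoted bpp-reduction rationale is easy to check. The helpers below are a back-of-the-envelope sketch with made-up names. They assume a linear buffer with no pitch alignment or padding, which real drivers usually add.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one linear framebuffer, ignoring pitch alignment. */
static uint64_t
fb_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
	assert(bpp % 8 == 0);
	return (uint64_t)width * height * (bpp / 8);
}

/* Share of VRAM taken by that framebuffer, in whole percent,
 * rounded down. */
static unsigned
fb_percent_of_vram(uint32_t width, uint32_t height, uint32_t bpp,
		   uint64_t vram_bytes)
{
	assert(vram_bytes > 0);
	return (unsigned)(fb_bytes(width, height, bpp) * 100 / vram_bytes);
}
```

On a 16 MiB card (16777216 bytes), a 32bpp 1600x1200 buffer takes 7680000 bytes, about 45% of VRAM. At 1920x1200 it is about 54%, and dropping 1600x1200 to 8bpp brings it down to about 11%.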
Re: Blank console but X11 works on MCP79 - old regression since 3.8
On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org> wrote:
>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary <li...@rainbow-software.org> wrote:
>>> > @@ -483,8 +483,8 @@
>>> >  nouveau :02:00.0: disp:0860: -> 0500
>>> >  nouveau :02:00.0: disp:0864:
>>> >  nouveau :02:00.0: disp:0868: -> 04000500
>>> > -nouveau :02:00.0: disp:086c: -> 00100500
>>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>>> > +nouveau :02:00.0: disp:086c: -> 00100a00
>>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>>> >  nouveau :02:00.0: disp:0874: ->
>>> >  nouveau :02:00.0: disp:0878:
>>> >  nouveau :02:00.0: disp:0880: 0500
>>> >
>>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>>> > 64MB case. Why?
>>> >
>>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>>> > memory.
>>> >
>>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>>
>>> OK, well that makes a *ton* of sense (8bpp being broken).
>>>
>>> I think the idea of bpp reduction is that when you're on your shiny
>>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>>> all that to a pinned fbcon - almost half of that would go to a single
>>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>>
>>> The specific limits could probably use tweaking - I think they only
>>> consider VRAM size, not the fb size.
>>>
>>> I guess 8bpp worked prior to the change you bisected though, so we
>>> should figure out what we did wrong in the new code.
>>
>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.
>
> By the way, instead of booting $kernel, you can use modetest from
> libdrm/tests. Not sure if it supports C8 though =/
>
> I think the issue is this:
>
> - OUT_RING(evo, nv_crtc->lut.depth == 8 ?
> -          NV50_EVO_CRTC_CLUT_MODE_OFF :
> -          NV50_EVO_CRTC_CLUT_MODE_ON);
>
> Whereas now we always set 0xC000 (aka "ON").

In case I was being unclear, I'm talking about
https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nv50_display.c#L1808
and surrounding items. Looks like lut_clr sets 0x4000 which was
previously not used. Not sure what the difference between that and 0x
is. This is what we have in rnndb for it:

https://github.com/envytools/envytools/blob/master/rnndb/display/nv_evo.xml#L408

So bit 30 is mode, set is "high res", unset is "low res". So really
what we want is 0x8000 which leaves the LUT enabled but uses the
low-res mode? All this could use some playing-around with.

  -ilia
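
The bit reasoning in the message above ("bit 30 is mode", and a value that "leaves the LUT enabled but uses the low-res mode") can be written down as a small sketch. Everything here is an assumption for illustration: the macro and function names are made up, the hex constants in the archived message appear to have lost digits so they are not reused, and treating bit 31 as the enable bit is only a reading of the message, not something confirmed by rnndb or the driver.

```c
#include <stdint.h>

/*
 * Hypothetical names, not nouveau or rnndb identifiers.
 * Bit 30 as "mode" comes from the message; bit 31 as "enable"
 * is an assumption made for this sketch.
 */
#define SKETCH_CLUT_ENABLE   (UINT32_C(1) << 31)
#define SKETCH_CLUT_MODE_HI  (UINT32_C(1) << 30) /* set: "high res", clear: "low res" */

/* Compose a 32-bit CLUT control word from the two flags. */
static uint32_t sketch_clut_word(int enable, int high_res)
{
    uint32_t v = 0;

    if (enable)
        v |= SKETCH_CLUT_ENABLE;
    if (high_res)
        v |= SKETCH_CLUT_MODE_HI;
    return v;
}
```

Under these assumptions, "enabled, low-res" is the word with only bit 31 set, which is the combination the message suggests trying for 8bpp.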
Re: Blank console but X11 works on MCP79 - old regression since 3.8
On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org> wrote:
> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary <li...@rainbow-software.org> wrote:
>> > @@ -483,8 +483,8 @@
>> >  nouveau :02:00.0: disp:0860: -> 0500
>> >  nouveau :02:00.0: disp:0864:
>> >  nouveau :02:00.0: disp:0868: -> 04000500
>> > -nouveau :02:00.0: disp:086c: -> 00100500
>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>> > +nouveau :02:00.0: disp:086c: -> 00100a00
>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>> >  nouveau :02:00.0: disp:0874: ->
>> >  nouveau :02:00.0: disp:0878:
>> >  nouveau :02:00.0: disp:0880: 0500
>> >
>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>> > 64MB case. Why?
>> >
>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>> > memory.
>> >
>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>
>> OK, well that makes a *ton* of sense (8bpp being broken).
>>
>> I think the idea of bpp reduction is that when you're on your shiny
>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>> all that to a pinned fbcon - almost half of that would go to a single
>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>
>> The specific limits could probably use tweaking - I think they only
>> consider VRAM size, not the fb size.
>>
>> I guess 8bpp worked prior to the change you bisected though, so we
>> should figure out what we did wrong in the new code.
>
> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.

By the way, instead of booting $kernel, you can use modetest from
libdrm/tests. Not sure if it supports C8 though =/

I think the issue is this:

- OUT_RING(evo, nv_crtc->lut.depth == 8 ?
-          NV50_EVO_CRTC_CLUT_MODE_OFF :
-          NV50_EVO_CRTC_CLUT_MODE_ON);

Whereas now we always set 0xC000 (aka "ON").

  -ilia
Re: Blank console but X11 works on MCP79 - old regression since 3.8
On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary wrote:
> @@ -483,8 +483,8 @@
>  nouveau :02:00.0: disp:0860: -> 0500
>  nouveau :02:00.0: disp:0864:
>  nouveau :02:00.0: disp:0868: -> 04000500
> -nouveau :02:00.0: disp:086c: -> 00100500
> -nouveau :02:00.0: disp:0870: e900 -> 1e00
> +nouveau :02:00.0: disp:086c: -> 00100a00
> +nouveau :02:00.0: disp:0870: e900 -> e800
>  nouveau :02:00.0: disp:0874: ->
>  nouveau :02:00.0: disp:0878:
>  nouveau :02:00.0: disp:0880: 0500
>
> Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in 64MB
> case. Why?
>
> I get blank screen even with 64MB with video=1280x1024-8 kernel parameter.
> Console works with video=1280x1024-16 even with 32MB stolen memory.
>
> Conclusions: 8-bit support is broken and bpp reduction is weird.

OK, well that makes a *ton* of sense (8bpp being broken).

I think the idea of bpp reduction is that when you're on your shiny
new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
all that to a pinned fbcon - almost half of that would go to a single
32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
have at least a few fb-sized buffers for backbuffer rendering, etc.

The specific limits could probably use tweaking - I think they only
consider VRAM size, not the fb size.

I guess 8bpp worked prior to the change you bisected though, so we
should figure out what we did wrong in the new code.

  -ilia
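
The "almost half" figure in the message above is easy to check by hand. The sketch below does the arithmetic for a linear framebuffer; the helper names are invented for illustration and are not part of nouveau or fbdev, and it assumes "16MB" means 16 MiB and that the buffer has no padding between rows.

```c
#include <stdint.h>

/* Bytes for one unpadded linear framebuffer (hypothetical helper). */
static uint64_t sketch_fb_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return (uint64_t)width * height * (bpp / 8);
}

/* Share of VRAM taken by one such buffer, in whole percent, rounded down. */
static unsigned sketch_fb_percent(uint32_t width, uint32_t height,
                                  uint32_t bpp, uint64_t vram_bytes)
{
    return (unsigned)(sketch_fb_bytes(width, height, bpp) * 100 / vram_bytes);
}
```

With 16 MiB of VRAM, a 32bpp 1600x1200 buffer is 7,680,000 bytes, a bit under 46% of the total, and 1920x1200 is 9,216,000 bytes, a bit under 55%. For the 1280x1024 console discussed in this thread, 8bpp needs 1,310,720 bytes and 16bpp twice that.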
Re: Blank console but X11 works on MCP79 - old regression since 3.8
With a new kernel, mind grabbing a dmesg with drm.debug=0x1e
nouveau.debug=debug (or maybe even =trace)? Maybe also see if
fbcon/fbdev have any debug things that can be turned on? Sounds like
things are generally working, just the fbcon -> nouveaufb path seems
somehow buggered. Another thing to try would be nouveau.atomic=1

On Fri, Nov 17, 2017 at 9:26 AM, Ondrej Zary wrote:
> Hello,
> I've just been hit by this old bug which is still present in 4.14:
> https://bugs.freedesktop.org/show_bug.cgi?id=80675
>
> On MCP79 (ION), when stolen memory is set to 32MB in BIOS, console is blank
> but X11 works. When the stolen memory is increased to 64MB, console works
> fine.
>
> Bisected it to this:
>
> 4f6029da58ba9204c98e33f4f3737fe085c87a6f is the first bad commit
> commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f
> Author: Ben Skeggs
> Date: Fri Nov 16 11:54:31 2012 +1000
>
> drm/nv50-nvc0: switch to common disp impl, removing previous version
>
> Signed-off-by: Ben Skeggs
>
> It's a big change so I'm not able to do more debugging.
>
> --
> Ondrej Zary
Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure
On Wed, Aug 30, 2017 at 8:22 PM, Tim Harvey wrote:
> On Wed, Aug 30, 2017 at 3:06 PM, Andrew Lunn wrote:
>> On Wed, Aug 30, 2017 at 12:53:56PM -0700, Tim Harvey wrote:
>>> Greetings,
>>>
>>> I'm seeing RX frame errors when using the mv88e6xxx DSA driver on
>>> 4.13-rc7. The board I'm using is a GW5904 [1] which has an IMX6 FEC
>>> MAC (eth0) connected via RGMII to a MV88E6176 with its downstream
>>> P0/P1/P2/P3 to front panel RJ45's (lan1-lan4).
>>
>> Hi Tim
>>
>> Can you confirm the counter is this one:
>>
>>         /* Report late collisions as a frame error. */
>>         if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
>>                 ndev->stats.rx_frame_errors++;
>>
>> I don't see anywhere else frame errors are counted, but it would be
>> good to prove we are looking in the right place.
>>
>
> Andrew,
>
> (adding IMX FEC driver maintainer to CC)
>
> Yes, that's one of them being hit. It looks like ifconfig reports
> 'frame' as the accumulation of a few stats so here are some more
> specifics from /sys/class/net/eth0/statistics:
>
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> for i in `ls rx_*`; do echo $i:$(cat $i); done
> rx_bytes:103229
> rx_compressed:0
> rx_crc_errors:22
> rx_dropped:0
> rx_errors:22
> rx_fifo_errors:0
> rx_frame_errors:22
> rx_length_errors:22
> rx_missed_errors:0
> rx_nohandler:0
> rx_over_errors:0
> rx_packets:1174
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> ifconfig eth0
> eth0 Link encap:Ethernet HWaddr 00:D0:12:41:F3:E7
>      inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
>      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>      RX packets:1207 errors:22 dropped:0 overruns:0 frame:66
>      TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
>      collisions:0 txqueuelen:1000
>      RX bytes:106009 (103.5 KiB) TX bytes:4604 (4.4 KiB)
>
> Instrumenting fec driver I see the following getting hit:
>
> status & BD_ENET_RX_LG /* rx_length_errors: Frame too long */
> status & BD_ENET_RX_CR /* rx_crc_errors: CRC Error */
> status & BD_ENET_RX_CL /* rx_frame_errors: Collision? */
>
> Is this a frame size issue where the MV88E6176 is sending frames down
> that exceed the MTU because of headers added?

Not sure if this is relevant to you, but

https://github.com/laanwj/linux-freedreno-a2xx/commit/076b6542fa27499072ec6c3a7941c8b3c79ba1fd

was necessary to fix some MTU issues on a i.MX51. Not sure if it's
upstream yet or not.

Cheers,

  -ilia
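
The numbers quoted above fit Tim's remark that ifconfig's 'frame' is an accumulation: 22 + 22 + 22 = 66. As far as I know, the "frame" column that ifconfig reads from /proc/net/dev is the sum of rx_length_errors, rx_over_errors, rx_crc_errors and rx_frame_errors; treat that as an assumption to verify against the kernel source rather than as fact. The struct and function below are stand-ins written for this sketch, not kernel types.

```c
#include <stdint.h>

/* Stand-in for the four RX error counters involved (not a kernel struct). */
struct sketch_rx_errs {
    uint64_t length_errors;
    uint64_t over_errors;
    uint64_t crc_errors;
    uint64_t frame_errors;
};

/*
 * Assumed composition of the "frame" column shown by ifconfig:
 * the sum of the four counters above.
 */
static uint64_t sketch_frame_column(const struct sketch_rx_errs *s)
{
    return s->length_errors + s->over_errors + s->crc_errors + s->frame_errors;
}
```

If that composition is right, a single bad frame that trips the length, CRC and collision bits at once would add 3 to 'frame' while adding only 1 to 'errors', which is what the 22 versus 66 split looks like.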
Re: [PATCH][V2] drm/nouveau: perform null check on msto[i] rathern than msto
On Thu, Aug 17, 2017 at 6:03 PM, Colin King wrote:
> From: Colin Ian King
>
> The null check on the array msto is incorrect since msto is never
> null. The null check should be instead on msto[i] since this is
> being dereferenced in the call to drm_mode_connector_attach_encoder.
>
> Thanks to Emil Velikov for pointing out the mistake in my original
> fix and for suggesting the correct fix.
>
> Detected by CoverityScan, CID#1375915 ("Array compared against 0")
>
> Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2
> multi-stream")
> Signed-off-by: Colin Ian King
> ---
> drivers/gpu/drm/nouveau/nv50_display.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c
> b/drivers/gpu/drm/nouveau/nv50_display.c
> index f7b4326a4641..ed444ecd9e85 100644
> --- a/drivers/gpu/drm/nouveau/nv50_display.c
> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
> @@ -3141,7 +3141,7 @@ nv50_mstc_new(struct nv50_mstm *mstm, struct
> drm_dp_mst_port *port,
>         mstc->connector.funcs->reset(&mstc->connector);
>         nouveau_conn_attach_properties(&mstc->connector);
>
> -       for (i = 0; i < ARRAY_SIZE(mstm->msto) && mstm->msto; i++)
> +       for (i = 0; i < ARRAY_SIZE(mstm->msto) && mstm->msto[i]; i++)

Ben will have to rule on which way is correct, but another
interpretation is that it should be

for (...; i < ARRAY_SIZE; i++)
  if (mstm->msto[i])
    do_stuff()

I haven't the faintest clue whether the msto array can have "holes" or not.

>                 drm_mode_connector_attach_encoder(&mstc->connector,
> &mstm->msto[i]->encoder);
>
>         drm_object_attach_property(&mstc->connector.base,
> dev->mode_config.path_property, 0);
> --
> 2.11.0
>
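
The two loop shapes discussed above only differ when the array has a NULL entry followed by a non-NULL one. A small standalone sketch makes that concrete; the slot count and function names are made up here and have nothing to do with the real size or layout of mstm->msto.

```c
#include <stddef.h>

#define SKETCH_NSLOTS 4

/* Shape of the posted patch: stop at the first NULL slot. */
static size_t sketch_visit_until_null(void *const slots[SKETCH_NSLOTS])
{
    size_t i, visited = 0;

    for (i = 0; i < SKETCH_NSLOTS && slots[i]; i++)
        visited++;
    return visited;
}

/* Alternative reading: walk every slot and skip the NULL ones. */
static size_t sketch_visit_skip_null(void *const slots[SKETCH_NSLOTS])
{
    size_t i, visited = 0;

    for (i = 0; i < SKETCH_NSLOTS; i++)
        if (slots[i])
            visited++;
    return visited;
}
```

For an array filled from the front with no gaps, both functions visit the same entries. With a hole at index 1, the first shape visits one entry and silently ignores the rest, while the second still reaches the later ones. Which behaviour is wanted depends on whether the array can have holes, which is the open question in the message.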
Re: nouveau driver locks up with 4.11 kernel
On Mon, Aug 14, 2017 at 4:29 PM, Michal Hocko <mho...@kernel.org> wrote:
> On Mon 14-08-17 15:27:20, Ilia Mirkin wrote:
>> On Mon, Aug 14, 2017 at 3:18 PM, Michal Hocko <mho...@kernel.org> wrote:
> [...]
>> > nouveau :03:00.0: fifo: channel 6 [mpv/vo[3535]] kick timeout
>> > nouveau: mpv/vo[3535]::906f: detach gr failed, -110
>>
>> Are you using mpv in conjunction with the GL video output and
>> VDPAU-based acceleration? That will kill nouveau. For VDPAU, I
>> recommend mplayer.
>
> Well, I am using mplayer package and vo=sdl. Which video output should I

Well, according to the logs you're using "mpv", which, along with
mplayer2, is not mplayer. I recommend mplayer. Not sure what the sdl
video output does TBH, I've never used it -- perhaps mpv still manages
to use GL for that? xv and vdpau are ones to use.

[ In order to use VDPAU for decoding, you of course have to follow the
instructions at
https://nouveau.freedesktop.org/wiki/VideoAcceleration/#firmware ]

> try instead? Btw. xine seems to be using VDPAU as well, yet it doesn't
> lockup the whole X session. The videou output doesn't work properly
> either but at least I am able to kill xine and still have the session.

Happy to explain all the dirty details on IRC if you're curious. Doing
things in multiple threads kills nouveau, and mpv does precisely that.
Re: nouveau driver locks up with 4.11 kernel
On Mon, Aug 14, 2017 at 3:18 PM, Michal Hocko wrote:
> Hi,
> I am having issues with nouveau driver in 4.11 Debian distribution
> kernel. I can start X session but the screen locks up e.g. when I try to
> exit mplayer fullscreen mode. The lock is swamped with tons of
> nouveau :03:00.0: fifo: SCHED_ERROR 13 []
>
> messages and I also can see some warnings
> [ cut here ]
> WARNING: CPU: 1 PID: 3535 at
> /build/linux-J4LMtv/linux-4.11.6/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogf100.c:85
> gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
> nouveau :03:00.0: timeout
> Modules linked in: nouveau mxm_wmi wmi ttm cpufreq_powersave
> cpufreq_conservative cpufreq_userspace iptable_filter iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle ip_tables
> x_tables binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace
> fscache sunrpc i2c_sis96x hwmon_vid nf_conntrack_ftp nf_conntrack fuse
> i2c_dev thermal fan ac battery ntfs snd_intel8x0 snd_ac97_codec ac97_bus
> psmouse lp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic
> snd_hda_intel ppdev intel_powerclamp iTCO_wdt iTCO_vendor_support
> snd_hda_codec coretemp gma500_gfx snd_hda_core evdev serio_raw drm_kms_helper
> snd_hwdep snd_pcm_oss pcspkr snd_mixer_oss snd_pcm drm snd_seq_midi
> snd_seq_midi_event sg snd_rawmidi snd_seq snd_seq_device snd_timer parport_pc
> snd soundcore
> parport i2c_algo_bit shpchp lpc_ich tpm_infineon mfd_core video button ext4
> crc16 jbd2 fscrypto ecb crypto_simd cryptd aes_i586 mbcache raid10 raid456
> async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
> libcrc32c crc32c_generic raid0 multipath linear uas usb_storage dm_mod raid1
> md_mod sd_mod hid_generic usbhid hid ahci libahci libata i2c_i801 scsi_mod
> ehci_pci uhci_hcd e1000e ptp pps_core ehci_hcd usbcore usb_common
> CPU: 1 PID: 3535 Comm: mpv/vo Tainted: GW 4.11.0-1-686-pae #1
> Debian 4.11.6-1
> Hardware name: /D2500HN, BIOS
> MUCDT10N.86A.0073.2012.1101.1638 11/01/2012
> Call Trace:
> ? dump_stack+0x55/0x73
> ? __warn+0xea/0x110
> ? gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
> ? gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
> ? warn_slowpath_fmt+0x46/0x60
> ? gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
> ? gf100_fifo_gpfifo_engine_addr.isra.1+0xa0/0xa0 [nouveau]
> ? nvkm_fifo_chan_child_fini+0x4e/0x120 [nouveau]
> ? nvkm_object_del+0x58/0x90 [nouveau]
> ? ktime_get+0x4b/0x110
> ? nvkm_oproxy_fini+0x23/0x80 [nouveau]
> ? nvkm_object_fini+0x137/0x300 [nouveau]
> ? nvkm_ioctl_del+0x8c/0xa0 [nouveau]
> ? nvkm_ioctl+0x100/0x290 [nouveau]
> ? __check_object_size+0x9e/0x13c
> ? __check_object_size+0x9e/0x13c
> ? nvif_client_ioctl+0x2b/0x40 [nouveau]
> ? usif_ioctl+0x4eb/0x790 [nouveau]
> ? nouveau_drm_ioctl+0xab/0xb0 [nouveau]
> ? nouveau_pmops_resume+0x80/0x80 [nouveau]
> ? do_vfs_ioctl+0x91/0x6b0
> ? SyS_ioctl+0x60/0x70
> ? do_fast_syscall_32+0x8a/0x150
> ? entry_SYSENTER_32+0x4e/0x7c
> ---[ end trace 1bf6c731018c2e52 ]---
>
> followed by
> nouveau :03:00.0: fifo: channel 6 [mpv/vo[3535]] kick timeout
> nouveau: mpv/vo[3535]::906f: detach gr failed, -110

Are you using mpv in conjunction with the GL video output and
VDPAU-based acceleration? That will kill nouveau. For VDPAU, I
recommend mplayer.

Cheers,

  -ilia
Re: [PATCH v5 2/6] drm/bridge: Add a devm_ allocator for panel bridge.
On Fri, Aug 4, 2017 at 4:43 PM, Eric Anholt wrote:
> Laurent Pinchart writes:
>
>> Hi Eric,
>>
>> (CC'ing Daniel)
>>
>> Thank you for the patch.
>>
>> On Tuesday 18 Jul 2017 14:05:06 Eric Anholt wrote:
>>> This will let drivers reduce the error cleanup they need, in
>>> particular the "is_panel_bridge" flag.
>>>
>>> v2: Slight cleanup of remove function by Andrzej
>>
>> I just want to point out that, in the context of Daniel's work on hot-unplug,
>> 90% of the devm_* allocations are wrong and will get in the way. All DRM core
>> objects that are accessible one way or another from userspace will need to be
>> properly reference-counted and freed only when the last reference disappears,
>> which could be well after the corresponding device is removed. I believe this
>> could be one such objects :-/
>
> Sure, if you're hotplugging, your life is pain. For non-hotpluggable
> devices, like our SOC platform devices (current panel-bridge consumers),
> this still seems like an excellent simplification of memory management.

At that point you may as well make your module non-unloadable, and
return failure when trying to remove a device from management by the
driver (whatever the opposite of "probe" is, I forget). Hotplugging
doesn't only happen when physically removing, it can happen for all
kinds of reasons... and userspace may still hold references in some of
those cases.
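
The lifetime concern raised above (an object that userspace can still reach must outlive the device's removal) can be illustrated with a toy reference count. This is a plain userspace sketch with invented names, not DRM or devres code, and it is not thread-safe; in-kernel code would more likely build on struct kref.

```c
#include <stdlib.h>

/* Toy refcounted object; freed_flag lets a caller observe the free. */
struct sketch_obj {
    int refs;
    int *freed_flag;
};

/* Create with one reference, held by the "driver". Returns NULL on failure. */
static struct sketch_obj *sketch_obj_new(int *freed_flag)
{
    struct sketch_obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    o->refs = 1;
    o->freed_flag = freed_flag;
    return o;
}

/* Take an extra reference, e.g. on behalf of an open userspace handle. */
static void sketch_obj_get(struct sketch_obj *o)
{
    o->refs++;
}

/* Drop a reference; the object is freed only when the last one goes. */
static void sketch_obj_put(struct sketch_obj *o)
{
    if (--o->refs == 0) {
        *o->freed_flag = 1;
        free(o);
    }
}
```

In this model, device removal is just the driver dropping its own reference. If userspace still holds one, the memory stays valid until that reference is dropped too. A devm_-style allocation instead ties the free to device removal regardless of who else still points at the object, which is the mismatch Laurent describes.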
Re: [PATCH v5 2/6] drm/bridge: Add a devm_ allocator for panel bridge.
On Fri, Aug 4, 2017 at 4:43 PM, Eric Anholt wrote:
> Laurent Pinchart writes:
>
>> Hi Eric,
>>
>> (CC'ing Daniel)
>>
>> Thank you for the patch.
>>
>> On Tuesday 18 Jul 2017 14:05:06 Eric Anholt wrote:
>>> This will let drivers reduce the error cleanup they need, in
>>> particular the "is_panel_bridge" flag.
>>>
>>> v2: Slight cleanup of remove function by Andrzej
>>
>> I just want to point out that, in the context of Daniel's work on hot-unplug,
>> 90% of the devm_* allocations are wrong and will get in the way. All DRM core
>> objects that are accessible one way or another from userspace will need to be
>> properly reference-counted and freed only when the last reference disappears,
>> which could be well after the corresponding device is removed. I believe this
>> could be one such objects :-/
>
> Sure, if you're hotplugging, your life is pain. For non-hotpluggable
> devices, like our SOC platform devices (current panel-bridge consumers),
> this still seems like an excellent simplification of memory management.

At that point you may as well make your module non-unloadable, and return failure when trying to remove a device from management by the driver (whatever the opposite of "probe" is, I forget). Hotplugging doesn't only happen when physically removing, it can happen for all kinds of reasons... and userspace may still hold references in some of those cases.
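To make the lifetime argument concrete, here is a minimal userspace sketch in C of the reference-counting pattern Laurent describes. Every name in it (demo_obj, demo_obj_get, demo_obj_put) is hypothetical and exists only for illustration; it is not the DRM API. In the kernel, struct kref plays this role. The only point illustrated is that the object has to outlive the device while a userspace reference is still open, which a devm_-managed allocation cannot guarantee because it is released when the driver is unbound.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DRM object that userspace can reach. */
struct demo_obj {
	int refcount;     /* number of live references */
	int *freed_flag;  /* set to 1 when the object is released */
};

/* Create the object with one reference, held by the driver. */
static struct demo_obj *demo_obj_create(int *freed_flag)
{
	struct demo_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;
	obj->freed_flag = freed_flag;
	*freed_flag = 0;
	return obj;
}

/* Take an extra reference, e.g. when userspace opens a handle. */
static void demo_obj_get(struct demo_obj *obj)
{
	assert(obj->refcount > 0);
	obj->refcount++;
}

/* Drop a reference; the memory goes away only with the last one. */
static void demo_obj_put(struct demo_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0) {
		*obj->freed_flag = 1;
		free(obj);
	}
}
```

If the driver drops its reference at unbind time while a userspace handle is still open, the object stays valid until that handle is closed. A devm_ allocation would already have been freed at that point.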
Re: [Nouveau] [PATCH] drm: disable vblank only if it got previously enabled
I believe the solution is to not call drm_crtc_vblank_off for atomic modesetting in nouveau_display_fini. I think Ben's working on it. On Wed, Jul 19, 2017 at 1:25 PM, Tobias Klausmann wrote: > mimic the behavior of vblank_disable_fn(), another caller of > drm_vblank_disable_and_save(). > > This avoids oopsing, while trying to disable vblank on a not connected > display: > > [ 12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 > drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm] > [ 12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc > uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops > videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 > snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 > hid_multitouch nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp > vfat coretemp fat kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc > aesni_intel ath10k_pci snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec > crypto_simd ath glue_helper cryptd snd_hda_core mac80211 snd_hwdep snd_pcm > pcspkr r8169 cfg80211 mii snd_timer acer_wmi snd sparse_keymap wmi_bmof > idma64 hci_uart virt_dma mei_me soundcore i2c_i801 mei btbcm shpchp > intel_lpss_pci intel_pch_thermal > [ 12.768130] serdev btqca ucsi_acpi btintel typec_ucsi thermal typec > bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill intel_lpss_acpi > pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 mxm_wmi ttm > i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect sysimgblt > xhci_hcd fb_sys_fops usbcore drm i2c_hid wmi video button sg efivarfs > [ 12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted > 4.12.0-desktop-debug-drm+ #2 > [ 12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 > 03/30/2017 > [ 12.768164] Workqueue: pm pm_runtime_work > [ 12.768166] task: 889bf1627040 task.stack: 9541013e4000 > [ 
12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 > [drm] > [ 12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086 > [ 12.768183] RAX: 001c RBX: 889b4cebd000 RCX: > 0004 > [ 12.768184] RDX: 8004 RSI: 87a2d952 RDI: > > [ 12.768186] RBP: 9541013e7b90 R08: 0001 R09: > 039f > [ 12.768187] R10: c05fe530 R11: R12: > > [ 12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: > 889bf0426000 > [ 12.768190] FS: () GS:889bfec0() > knlGS: > [ 12.768191] CS: 0010 DS: ES: CR0: 80050033 > [ 12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: > 003406f0 > [ 12.768193] Call Trace: > [ 12.768198] ? enqueue_task_fair+0x64/0x600 > [ 12.768211] ? drm_get_last_vbltimestamp+0x47/0x70 [drm] > [ 12.768223] ? drm_update_vblank_count+0x65/0x240 [drm] > [ 12.768227] ? pci_pm_runtime_resume+0xa0/0xa0 > [ 12.768238] ? drm_vblank_disable_and_save+0x55/0xc0 [drm] > [ 12.768250] ? drm_crtc_vblank_off+0xa9/0x1e0 [drm] > [ 12.768253] ? pci_pm_runtime_resume+0xa0/0xa0 > [ 12.768299] ? nouveau_display_fini+0x56/0xd0 [nouveau] > [ 12.768339] ? nouveau_display_suspend+0x51/0x110 [nouveau] > [ 12.768378] ? nouveau_do_suspend+0x76/0x1c0 [nouveau] > [ 12.768413] ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau] > [ 12.768416] ? pci_pm_runtime_suspend+0x5c/0x160 > [ 12.768419] ? __rpm_callback+0xb6/0x1e0 > [ 12.768423] ? kobject_uevent_env+0x111/0x5e0 > [ 12.768425] ? pci_pm_runtime_resume+0xa0/0xa0 > [ 12.768427] ? rpm_callback+0x1f/0x70 > [ 12.768429] ? pci_pm_runtime_resume+0xa0/0xa0 > [ 12.768431] ? rpm_suspend+0x11f/0x640 > [ 12.768441] ? drm_fb_helper_hotplug_event+0x9a/0xe0 [drm_kms_helper] > [ 12.768447] ? output_poll_execute+0x17b/0x1a0 [drm_kms_helper] > [ 12.768449] ? pm_runtime_work+0x64/0xa0 > [ 12.768453] ? process_one_work+0x1db/0x410 > [ 12.768456] ? worker_thread+0x47/0x3d0 > [ 12.768459] ? process_one_work+0x410/0x410 > [ 12.768461] ? kthread+0x117/0x130 > [ 12.768463] ? kthread_create_on_node+0x40/0x40 > [ 12.768466] ? 
ret_from_fork+0x25/0x30 > [ 12.768468] Code: 80 3d 26 f3 01 00 00 0f 85 ad fd ff ff 48 8b 43 20 48 c7 > c7 31 a2 20 c0 c6 05 0e f3 01 00 01 48 8b b0 60 01 00 00 e8 75 2e ec c6 <0f> > ff e9 88 fd ff ff 31 f6 44 88 55 b0 e8 38 fa ed c6 44 0f b6 > [ 12.768508] ---[ end trace d9bb853af3659bd5 ]--- > > Signed-off-by: Tobias Klausmann > --- > drivers/gpu/drm/drm_vblank.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c > index a233a6be934a..4a21756bf2bd 100644 > --- a/drivers/gpu/drm/drm_vblank.c > +++ b/drivers/gpu/drm/drm_vblank.c > @@ -1140,8 +1140,11 @@ void
Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling
On Sun, Jul 16, 2017 at 12:43 AM, Mike Galbraith <efa...@gmx.de> wrote:
> On Sat, 2017-07-15 at 14:52 -0400, Ilia Mirkin wrote:
>>
>> OK, so this issue appears to be that we're calling
>> drm_crtc_vblank_off() on a crtc for which vblank is already disabled.
>> My guess is that this happens because the crtc is disabled.
>>
>> Not sure what the proper check is to see if vblanks are already disabled...
>
> Seems so, the below shut up suspend for both 8600 GT and GTX 980.

The modeset done by drm_atomic_helper_suspend (called before that *_fini) should already take care of disabling vblanks, I think. So the vblank_off calls only need to be made when we're not doing an atomic modeset [drm_drv_uses_atomic_modeset(dev)] -- this is all very confusing since pre-nv50 uses legacy modesets, while nv50+ has been moved to atomic, but they share a bunch of helpers =/
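A minimal sketch of the guard suggested above, written as a self-contained userspace mock in C rather than as a kernel patch. The demo_* types and functions are hypothetical stand-ins; uses_atomic plays the part of the drm_drv_uses_atomic_modeset(dev) check and demo_crtc_vblank_off() the part of drm_crtc_vblank_off(). This only shows the shape of the idea and is not the fix that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_CRTCS 2

/* Hypothetical stand-in for a CRTC and its vblank state. */
struct demo_crtc {
	bool vblank_enabled;
	int off_calls;  /* how often vblank_off was invoked on this CRTC */
};

/* Hypothetical stand-in for the DRM device. */
struct demo_dev {
	bool uses_atomic;  /* mirrors the atomic-modeset check */
	int num_crtcs;
	struct demo_crtc crtc[DEMO_MAX_CRTCS];
};

static void demo_crtc_vblank_off(struct demo_crtc *crtc)
{
	crtc->off_calls++;
	crtc->vblank_enabled = false;
}

/*
 * Teardown path: on atomic drivers the suspend helper has already
 * disabled the CRTCs (and with them vblank), so only legacy drivers
 * turn vblank off here.
 */
static void demo_display_fini(struct demo_dev *dev)
{
	assert(dev != NULL);
	assert(dev->num_crtcs <= DEMO_MAX_CRTCS);

	if (dev->uses_atomic)
		return;

	for (int i = 0; i < dev->num_crtcs; i++)
		demo_crtc_vblank_off(&dev->crtc[i]);
}
```

With this shape, the atomic (nv50+) path never calls vblank_off a second time on a CRTC that the suspend modeset has already shut down, while the legacy (pre-nv50) path behaves as before.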
Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling
On Sat, Jul 15, 2017 at 1:40 AM, Mike Galbraith wrote: > Greetings, > > box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside > kernel: master.today (v4.12-11690-gccd5d1b91f22) > > lspci -nn -d 10de: > 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce > 8600 GT] [10de:0402] (rev a1) > > abreviated dmesg: > ... > [3.720990] fb: switching to nouveaufb from VESA VGA > [3.744489] Console: switching to colour dummy device 80x25 > [3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2) > ... > [3.846963] usbcore: registered new interface driver uas > [3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12 > [ 321.450262] nouveau :01:00.0: DRM: suspending console... > [ 321.450265] nouveau :01:00.0: DRM: suspending display... > [ 321.450462] e1000e: EEE TX LPI TIMER: > [ 321.450501] br0: port 1(eth0) entered disabled state > [ 321.473838] [ cut here ] > [ 321.473863] WARNING: CPU: 1 PID: 4786 at drivers/gpu/drm/drm_vblank.c:608 > drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x330 [drm] > [ 321.473864] Modules linked in: ebtable_filter(E) ebtables(E) fuse(E) > rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) af_packet(E) > bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) > xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) > ipt_REJECT(E) iptable_raw(E) iptable_filter(E) ip6table_mangle(E) > nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) > nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) > ip6table_filter(E) ip6_tables(E) x_tables(E) saa7134_alsa(E) tda1004x(E) > saa7134_dvb(E) videobuf2_dvb(E) dvb_core(E) arc4(E) rt2800usb(E) rt2x00usb(E) > rt2800lib(E) crc_ccitt(E) rt2x00lib(E) mac80211(E) cfg80211(E) > rc_medion_x10_or2x(E) rfkill(E) ati_remote(E) tda827x(E) tda8290(E) tuner(E) > snd_hda_codec_realtek(E) saa7134(E) > [ 321.473905] snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) > snd_hwdep(E) tveeprom(E) coretemp(E) 
videobuf2_dma_sg(E) videobuf2_memops(E) > snd_hda_core(E) videobuf2_v4l2(E) kvm_intel(E) snd_pcm(E) kvm(E) > videobuf2_core(E) snd_timer(E) rc_core(E) v4l2_common(E) snd(E) videodev(E) > iTCO_wdt(E) media(E) e1000e(E) iTCO_vendor_support(E) ptp(E) pps_core(E) > shpchp(E) soundcore(E) i2c_i801(E) lpc_ich(E) mfd_core(E) irqbypass(E) > pcspkr(E) thermal(E) acpi_cpufreq(E) fan(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) > lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) > sr_mod(E) cdrom(E) sd_mod(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) > nouveau(E) wmi(E) video(E) i2c_algo_bit(E) ahci(E) drm_kms_helper(E) > syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) > firewire_ohci(E) > [ 321.473950] libata(E) firewire_core(E) crc_itu_t(E) ehci_pci(E) > serio_raw(E) ttm(E) button(E) drm(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) sg(E) > dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) > scsi_mod(E) autofs4(E) > [ 321.473966] CPU: 1 PID: 4786 Comm: kworker/u8:17 Tainted: GW E > 4.12.0.gccd5d1b-master #186 > [ 321.473968] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG > 12/26/2007 > [ 321.473972] Workqueue: events_unbound async_run_entry_fn > [ 321.473974] task: 8801daf93d40 task.stack: c90003edc000 > [ 321.473990] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x330 > [drm] > [ 321.473992] RSP: 0018:c90003edfb00 EFLAGS: 00010082 > [ 321.473994] RAX: a03e6100 RBX: 88021114 RCX: > 0001 > [ 321.473995] RDX: a01dd8c8 RSI: 0001 RDI: > a01c8023 > [ 321.473996] RBP: c90003edfb80 R08: R09: > a01b0920 > [ 321.473998] R10: a0376e60 R11: 8802131399f8 R12: > 0001 > [ 321.473999] R13: 880213139800 R14: c90003edfb94 R15: > c90003edfbd0 > [ 321.474001] FS: () GS:88022fc8() > knlGS: > [ 321.474003] CS: 0010 DS: ES: CR0: 80050033 > [ 321.474004] CR2: 7fdd82e8f810 CR3: 000214683000 CR4: > 06e0 > [ 321.474005] Call Trace: > [ 321.474068] ? 
nv50_head_vblank_put+0x22/0x50 [nouveau] > [ 321.474085] drm_get_last_vbltimestamp+0x41/0x70 [drm] > [ 321.474102] drm_update_vblank_count+0x61/0x230 [drm] > [ 321.474118] drm_vblank_disable_and_save+0x59/0xc0 [drm] > [ 321.474134] drm_crtc_vblank_off+0x1d5/0x210 [drm] > [ 321.474152] ? drm_modeset_drop_locks+0x4e/0x60 [drm] > [ 321.474203] nouveau_display_fini+0x56/0xd0 [nouveau] > [ 321.474254] nouveau_display_suspend+0x4f/0x110 [nouveau] > [ 321.474304] nouveau_do_suspend+0x7c/0x1e0 [nouveau] > [ 321.474355] nouveau_pmops_suspend+0x2d/0x70 [nouveau] > [ 321.474358] pci_pm_suspend+0x70/0x130 > [ 321.474360] ? pci_pm_resume+0x90/0x90 > [ 321.474364] dpm_run_callback+0x4d/0x150 > [
Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling
On Sat, Jul 15, 2017 at 12:14 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Sat, Jul 15, 2017 at 1:40 AM, Mike Galbraith <efa...@gmx.de> wrote:
>> Greetings,
>>
>> box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside
>> kernel: master.today (v4.12-11690-gccd5d1b91f22)
>>
>> lspci -nn -d 10de:
>> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce
>> 8600 GT] [10de:0402] (rev a1)
>>
>> abreviated dmesg:
>> ...
>> [3.720990] fb: switching to nouveaufb from VESA VGA
>> [3.744489] Console: switching to colour dummy device 80x25
>> [3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2)
>> ...
>> [3.846963] usbcore: registered new interface driver uas
>> [3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12
>> [3.870769] hid-generic 0003:04CA:002B.0002: input,hidraw1: USB HID v1.11
>> Keyboard [Liteon Wireless keyboard and mouse] on usb-:00:1d.0-1/input0
>> [3.870773] nouveau :01:00.0: bios: M0203T not found
>> [3.870774] nouveau :01:00.0: bios: M0203E not matched!
>> [3.870777] nouveau :01:00.0: fb: 256 MiB DDR2
>> [3.871168] input: Liteon Wireless keyboard and mouse as
>> /devices/pci:00/:00:1d.0/usb4/4-1/4-1:1.1/0003:04CA:002B.0003/input/input7
>> [3.896090] usb 3-2: new low-speed USB device number 3 using uhci_hcd
>> [3.919101] [TTM] Zone kernel: Available graphics memory: 3881208 kiB
>> [3.919106] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
>> [3.919110] [TTM] Initializing pool allocator
>> [3.919120] [TTM] Initializing DMA pool allocator
>> [3.919141] nouveau :01:00.0: DRM: VRAM: 256 MiB
>> [3.919146] nouveau :01:00.0: DRM: GART: 1048576 MiB
>> [3.919152] nouveau :01:00.0: DRM: TMDS table version 2.0
>> [3.919157] nouveau :01:00.0: DRM: DCB version 4.0
>> [3.919162] nouveau :01:00.0: DRM: DCB outp 00: 04000310 0028
>> [3.919167] nouveau :01:00.0: DRM: DCB outp 01: 02011300 0028
>> [3.919171] nouveau :01:00.0: DRM: DCB outp 02: 01011302 0030
>> [3.919176] nouveau :01:00.0: DRM: DCB outp 03: 02022322 00020010
>> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
>> [3.919185] nouveau :01:00.0: DRM: DCB conn 00:
>> [3.919189] nouveau :01:00.0: DRM: DCB conn 01: 1130
>> [3.919194] nouveau :01:00.0: DRM: DCB conn 02: 2261
>> [3.919198] nouveau :01:00.0: DRM: DCB conn 03: 0310
>> [3.919202] nouveau :01:00.0: DRM: DCB conn 04: 0311
>> [3.919206] nouveau :01:00.0: DRM: DCB conn 05: 0313
>> [3.919258] [ cut here ]
>> [3.919316] WARNING: CPU: 3 PID: 224 at
>> drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c:83
>> nvkm_outp_xlat.isra.0+0x26/0x80 [nouveau]
>
> The code in question is
>
> static enum nvkm_ior_proto
> nvkm_outp_xlat(struct nvkm_outp *outp, enum nvkm_ior_type *type)
> {
>         switch (outp->info.location) {
>         case 0:
>                 switch (outp->info.type) {
>                 case DCB_OUTPUT_ANALOG: *type = DAC; return CRT;
>                 case DCB_OUTPUT_TMDS  : *type = SOR; return TMDS;
>                 case DCB_OUTPUT_LVDS  : *type = SOR; return LVDS;
>                 case DCB_OUTPUT_DP    : *type = SOR; return DP;
>                 default:
>                         break;
>                 }
>                 break;
>         case 1:
>                 switch (outp->info.type) {
>                 case DCB_OUTPUT_TMDS: *type = PIOR; return TMDS;
>                 case DCB_OUTPUT_DP  : *type = PIOR; return TMDS; /* not a bug
> */
>                 default:
>                         break;
>                 }
>                 break;
>         default:
>                 break;
>         }
>         WARN_ON(1);
>         return UNKNOWN;
> }
>
> Looks like someone forgot about TV S-Video/Composite outputs (which
> existed up until the GT21x's).
>
>> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
>
> And there ya go (the type is the lowest nibble of the first dword). We
> don't support TV outputs on nv50+, so you could just add a
>
> case DCB_OUTPUT_TV: return UNKNOWN;
>
> in the location == 0 case.
>
> I don't think that's related to the issue you're seeing on suspend
> though, as the TV connector isn't created anyways, it's just an
> "annoyance" warn, and you were also seeing it on your GM20x which has
> no such thing.

Actually while this may fix things for you in the short term, this is all generic code, not chip-specific, and we do support TV outputs on pre-nv50 chips, so it needs to be fixed for real.

Ben - I'm very weak on all these concepts of OR/etc - is the right move to add a new nvkm_ior_proto/type for TV? (There's also a DCB_OUTPUT_EOL type, no clue what that is.) I guess it should get type = DAC and add a new nvkm_ior_proto for TV?

-ilia
Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling
On Sat, Jul 15, 2017 at 1:40 AM, Mike Galbraith wrote:
> Greetings,
>
> box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside
> kernel: master.today (v4.12-11690-gccd5d1b91f22)
>
> lspci -nn -d 10de:
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce
> 8600 GT] [10de:0402] (rev a1)
>
> abreviated dmesg:
> ...
> [3.720990] fb: switching to nouveaufb from VESA VGA
> [3.744489] Console: switching to colour dummy device 80x25
> [3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2)
> ...
> [3.846963] usbcore: registered new interface driver uas
> [3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12
> [3.870769] hid-generic 0003:04CA:002B.0002: input,hidraw1: USB HID v1.11
> Keyboard [Liteon Wireless keyboard and mouse] on usb-:00:1d.0-1/input0
> [3.870773] nouveau :01:00.0: bios: M0203T not found
> [3.870774] nouveau :01:00.0: bios: M0203E not matched!
> [3.870777] nouveau :01:00.0: fb: 256 MiB DDR2
> [3.871168] input: Liteon Wireless keyboard and mouse as
> /devices/pci:00/:00:1d.0/usb4/4-1/4-1:1.1/0003:04CA:002B.0003/input/input7
> [3.896090] usb 3-2: new low-speed USB device number 3 using uhci_hcd
> [3.919101] [TTM] Zone kernel: Available graphics memory: 3881208 kiB
> [3.919106] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [3.919110] [TTM] Initializing pool allocator
> [3.919120] [TTM] Initializing DMA pool allocator
> [3.919141] nouveau :01:00.0: DRM: VRAM: 256 MiB
> [3.919146] nouveau :01:00.0: DRM: GART: 1048576 MiB
> [3.919152] nouveau :01:00.0: DRM: TMDS table version 2.0
> [3.919157] nouveau :01:00.0: DRM: DCB version 4.0
> [3.919162] nouveau :01:00.0: DRM: DCB outp 00: 04000310 0028
> [3.919167] nouveau :01:00.0: DRM: DCB outp 01: 02011300 0028
> [3.919171] nouveau :01:00.0: DRM: DCB outp 02: 01011302 0030
> [3.919176] nouveau :01:00.0: DRM: DCB outp 03: 02022322 00020010
> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
> [3.919185] nouveau :01:00.0: DRM: DCB conn 00:
> [3.919189] nouveau :01:00.0: DRM: DCB conn 01: 1130
> [3.919194] nouveau :01:00.0: DRM: DCB conn 02: 2261
> [3.919198] nouveau :01:00.0: DRM: DCB conn 03: 0310
> [3.919202] nouveau :01:00.0: DRM: DCB conn 04: 0311
> [3.919206] nouveau :01:00.0: DRM: DCB conn 05: 0313
> [3.919258] [ cut here ]
> [3.919316] WARNING: CPU: 3 PID: 224 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c:83
> nvkm_outp_xlat.isra.0+0x26/0x80 [nouveau]

The code in question is

static enum nvkm_ior_proto
nvkm_outp_xlat(struct nvkm_outp *outp, enum nvkm_ior_type *type)
{
        switch (outp->info.location) {
        case 0:
                switch (outp->info.type) {
                case DCB_OUTPUT_ANALOG: *type = DAC; return CRT;
                case DCB_OUTPUT_TMDS  : *type = SOR; return TMDS;
                case DCB_OUTPUT_LVDS  : *type = SOR; return LVDS;
                case DCB_OUTPUT_DP    : *type = SOR; return DP;
                default:
                        break;
                }
                break;
        case 1:
                switch (outp->info.type) {
                case DCB_OUTPUT_TMDS: *type = PIOR; return TMDS;
                case DCB_OUTPUT_DP  : *type = PIOR; return TMDS; /* not a bug */
                default:
                        break;
                }
                break;
        default:
                break;
        }
        WARN_ON(1);
        return UNKNOWN;
}

Looks like someone forgot about TV S-Video/Composite outputs (which existed up until the GT21x's).

> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083

And there ya go (the type is the lowest nibble of the first dword). We don't support TV outputs on nv50+, so you could just add a

        case DCB_OUTPUT_TV: return UNKNOWN;

in the location == 0 case.

I don't think that's related to the issue you're seeing on suspend though, as the TV connector isn't created anyways, it's just an "annoyance" warn, and you were also seeing it on your GM20x which has no such thing.

-ilia
Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling
On Sat, Jul 15, 2017 at 1:40 AM, Mike Galbraith wrote:
> Greetings,
>
> box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside
> kernel: master.today (v4.12-11690-gccd5d1b91f22)
>
> lspci -nn -d 10de:
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce
> 8600 GT] [10de:0402] (rev a1)
>
> abreviated dmesg:
> ...
> [3.720990] fb: switching to nouveaufb from VESA VGA
> [3.744489] Console: switching to colour dummy device 80x25
> [3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2)
> ...
> [3.846963] usbcore: registered new interface driver uas
> [3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12
> [3.870769] hid-generic 0003:04CA:002B.0002: input,hidraw1: USB HID v1.11
> Keyboard [Liteon Wireless keyboard and mouse] on usb-:00:1d.0-1/input0
> [3.870773] nouveau :01:00.0: bios: M0203T not found
> [3.870774] nouveau :01:00.0: bios: M0203E not matched!
> [3.870777] nouveau :01:00.0: fb: 256 MiB DDR2
> [3.871168] input: Liteon Wireless keyboard and mouse as
> /devices/pci:00/:00:1d.0/usb4/4-1/4-1:1.1/0003:04CA:002B.0003/input/input7
> [3.896090] usb 3-2: new low-speed USB device number 3 using uhci_hcd
> [3.919101] [TTM] Zone kernel: Available graphics memory: 3881208 kiB
> [3.919106] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [3.919110] [TTM] Initializing pool allocator
> [3.919120] [TTM] Initializing DMA pool allocator
> [3.919141] nouveau :01:00.0: DRM: VRAM: 256 MiB
> [3.919146] nouveau :01:00.0: DRM: GART: 1048576 MiB
> [3.919152] nouveau :01:00.0: DRM: TMDS table version 2.0
> [3.919157] nouveau :01:00.0: DRM: DCB version 4.0
> [3.919162] nouveau :01:00.0: DRM: DCB outp 00: 04000310 0028
> [3.919167] nouveau :01:00.0: DRM: DCB outp 01: 02011300 0028
> [3.919171] nouveau :01:00.0: DRM: DCB outp 02: 01011302 0030
> [3.919176] nouveau :01:00.0: DRM: DCB outp 03: 02022322 00020010
> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
> [3.919185] nouveau :01:00.0: DRM: DCB conn 00:
> [3.919189] nouveau :01:00.0: DRM: DCB conn 01: 1130
> [3.919194] nouveau :01:00.0: DRM: DCB conn 02: 2261
> [3.919198] nouveau :01:00.0: DRM: DCB conn 03: 0310
> [3.919202] nouveau :01:00.0: DRM: DCB conn 04: 0311
> [3.919206] nouveau :01:00.0: DRM: DCB conn 05: 0313
> [3.919258] [ cut here ]
> [3.919316] WARNING: CPU: 3 PID: 224 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c:83
> nvkm_outp_xlat.isra.0+0x26/0x80 [nouveau]

The code in question is

static enum nvkm_ior_proto
nvkm_outp_xlat(struct nvkm_outp *outp, enum nvkm_ior_type *type)
{
	switch (outp->info.location) {
	case 0:
		switch (outp->info.type) {
		case DCB_OUTPUT_ANALOG: *type = DAC; return CRT;
		case DCB_OUTPUT_TMDS  : *type = SOR; return TMDS;
		case DCB_OUTPUT_LVDS  : *type = SOR; return LVDS;
		case DCB_OUTPUT_DP    : *type = SOR; return DP;
		default:
			break;
		}
		break;
	case 1:
		switch (outp->info.type) {
		case DCB_OUTPUT_TMDS: *type = PIOR; return TMDS;
		case DCB_OUTPUT_DP  : *type = PIOR; return TMDS; /* not a bug */
		default:
			break;
		}
		break;
	default:
		break;
	}
	WARN_ON(1);
	return UNKNOWN;
}

Looks like someone forgot about TV S-Video/Composite outputs (which existed up until the GT21x's).

> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083

And there ya go (the type is the lowest nibble of the first dword). We don't support TV outputs on nv50+, so you could just add a

	case DCB_OUTPUT_TV: return UNKNOWN;

in the location == 0 case.

I don't think that's related to the issue you're seeing on suspend though, as the TV connector isn't created anyways, it's just an "annoyance" warn, and you were also seeing it on your GM20x which has no such thing.

-ilia
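The "lowest nibble of the first dword" remark can be checked against the quoted dmesg with a small user-space sketch. The enum values below are assumed to mirror nouveau's DCB output type numbering (ANALOG = 0, TV = 1, TMDS = 2, LVDS = 3, DP = 6); the names dcb_type() and dcb_type_name() are hypothetical helpers for illustration, not functions in the driver.

```c
#include <stdint.h>

/* Assumed to match nouveau's DCB output type numbering. */
enum dcb_type_id {
	DCB_ANALOG = 0x0,
	DCB_TV     = 0x1,
	DCB_TMDS   = 0x2,
	DCB_LVDS   = 0x3,
	DCB_DP     = 0x6,
};

/* The output type is the lowest nibble of the first DCB dword. */
unsigned dcb_type(uint32_t dword0)
{
	return dword0 & 0xfu;
}

/* Hypothetical helper: printable name for a decoded type nibble. */
const char *dcb_type_name(unsigned type)
{
	switch (type) {
	case DCB_ANALOG: return "ANALOG";
	case DCB_TV:     return "TV";
	case DCB_TMDS:   return "TMDS";
	case DCB_LVDS:   return "LVDS";
	case DCB_DP:     return "DP";
	default:         return "other";
	}
}
```

Under these assumptions, dcb_type(0x010333f1) is 1, i.e. the TV type that nvkm_outp_xlat() has no case for, while outp 00 through 03 decode to ANALOG, ANALOG, TMDS and TMDS.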
Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Fri, Jul 14, 2017 at 11:19 AM, Tobias Klausmann wrote:
> The conversion is a nice catch, but i'd like to have a bit more context, see
> below!
>
> With a better description:
>
> Tobias Klausmann

I don't think it was meant as a serious patch. WARN_ON_ONCE should work. The fix isn't to remove all instances of WARN_ON_ONCE. The fix is to fix WARN_ON_ONCE.
Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Fri, Jul 14, 2017 at 11:15 AM, Mike Galbraith wrote:
> On Fri, 2017-07-14 at 17:10 +0200, Karol Herbst wrote:
>> Yeah, we shouldn't let the machine die. Are there more WARN_ON_ONCE
>> usage we could convert to WARN_ONCE?
>
> Shooting the messenger is generally considered uncool :)

That's never stopped it from being a popular practice...
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith wrote:
> On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
>> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
>> >
>> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
>> > too much trouble, a bisect would be pretty useful.
>>
>> Bisection seemingly went fine, but the result is odd.
>>
>> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit
>
> But it really really is bad. Looking at gitk fork in the road leading
> to it...
>
> 52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
> e4e818cc2d7c drm: make drm_panel.h self-contained - good
> 9cf8f5802f39 drm: add missing declaration to drm_blend.h - good
>
> Before the git highway splits, all is well. The lane with commits
> works fine at both ends, but e98c58e55f68 is busted. Merge arfifact?

Hmmm... that tree does not appear to have gotten a v4.12 backmerge at any point. The last backmerge from Linus as far as I can tell was v4.11-rc7. Could be an interaction with some out-of-tree change.
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, Jul 11, 2017 at 2:08 PM, Mike Galbraith wrote:
> On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
>> Some details that may be useful in analysis of the bug:
>>
>> 1. lspci -nn -d 10de:
>
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce
> GTX 980] [10de:13c0] (rev a1)
> 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio
> Controller [10de:0fbb] (rev a1
>
>> 2. What displays, if any, you have plugged into the NVIDIA board when
>> this happens?
>
> A Philips 273V, via DVI.
>
>> 3. Any boot parameters, esp relating to ACPI, PM, or related?
>
> None for those, what's there that will be unfamiliar to you are for
> patches that aren't applied.
>
> nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
> nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
> ignore_loglevel crashkernel=256M,high

OK, thanks. So in other words, a fairly standard desktop with a PCIe board plugged in. No funny business. (Laptops can create a ton of additional weirdness, which I assumed you had since you were talking about STR.)

My best guess is that gf119_head_vblank_put either has a bogus head id (should be in the 0..3 range) which causes it to do an out-of-bounds read on MMIO space, or that the MMIO mapping has already been removed by the time nouveau_display_suspend runs. Adding Ben Skeggs for additional insight.

Some display stuff did change for 4.13 for GM20x+ boards. If it's not too much trouble, a bisect would be pretty useful.

Cheers,

-ilia
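The first hypothesis in this message (a bogus head id turning into an out-of-bounds MMIO access) can be illustrated with a small user-space sketch. Everything named here is hypothetical: MAX_HEADS, HEAD_STRIDE, VBLANK_REG_BASE and head_vblank_reg() are illustrative names and values, not a statement of nouveau's register layout. The only detail taken from the message is that a valid head id lies in the 0..3 range.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_HEADS        4u         /* valid head ids are 0..3 (from the message) */
#define HEAD_STRIDE      0x800u     /* hypothetical per-head register stride */
#define VBLANK_REG_BASE  0x6100c0u  /* hypothetical register offset for head 0 */

/*
 * Compute the MMIO offset of a per-head register. An id outside 0..3 is
 * rejected instead of being multiplied into an offset that points past
 * the last head's registers.
 */
bool head_vblank_reg(int head_id, uint32_t *offset)
{
	if (head_id < 0 || (unsigned)head_id >= MAX_HEADS)
		return false;

	*offset = VBLANK_REG_BASE + (uint32_t)head_id * HEAD_STRIDE;
	return true;
}
```

Without such a check, a garbage id is simply scaled by the stride and added to the base, which is how a bad index would end up reading outside the mapped register range.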
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
Some details that may be useful in analysis of the bug:

1. lspci -nn -d 10de:
2. What displays, if any, you have plugged into the NVIDIA board when this happens?
3. Any boot parameters, esp relating to ACPI, PM, or related?

Cheers,

-ilia

On Tue, Jul 11, 2017 at 1:32 PM, Mike Galbraith wrote:
> Greetings,
>
> I met $subject in master-rt post drm merge, but taking the config
> (attached) to virgin v4.12-10624-g9967468c0a10, it's reproducible.
>
> KERNEL: vmlinux-4.12.0.g9967468-preempt.gz
> DUMPFILE: vmcore
> CPUS: 8
> DATE: Tue Jul 11 18:55:28 2017
> UPTIME: 00:02:03
> LOAD AVERAGE: 3.43, 1.39, 0.52
> TASKS: 467
> NODENAME: homer
> RELEASE: 4.12.0.g9967468-preempt
> VERSION: #155 SMP PREEMPT Tue Jul 11 18:18:11 CEST 2017
> MACHINE: x86_64 (3591 Mhz)
> MEMORY: 16 GB
> PANIC: "BUG: unable to handle kernel paging request at
> a022990f"
> PID: 4658
> COMMAND: "kworker/u16:26"
> TASK: 8803c6068f80 [THREAD_INFO: 8803c6068f80]
> CPU: 7
> STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 4658 TASK: 8803c6068f80 CPU: 7 COMMAND: "kworker/u16:26"
> #0 [c900039f76a0] machine_kexec at 810481fc
> #1 [c900039f76f0] __crash_kexec at 81109e3a
> #2 [c900039f77b0] crash_kexec at 8110adc9
> #3 [c900039f77c8] oops_end at 8101d059
> #4 [c900039f77e8] no_context at 81055ce5
> #5 [c900039f7838] do_page_fault at 81056c5b
> #6 [c900039f7860] page_fault at 81690a88
> [exception RIP: report_bug+93]
> RIP: 8167227d RSP: c900039f7918 RFLAGS: 00010002
> RAX: a0229905 RBX: a020af0f RCX: 0001
> RDX: 0907 RSI: a020af11 RDI: 98f6
> RBP: c900039f7a58 R8: 0001 R9: 03fc
> R10: 81a01906 R11: 8803f84711f8 R12: a02231fb
> R13: 0260 R14: 0004 R15: 0006
> ORIG_RAX: CS: 0010 SS: 0018
> #7 [c900039f7910] report_bug at 81672248
> #8 [c900039f7938] fixup_bug at 8101af85
> #9 [c900039f7950] do_trap at 8101b0d9
> #10 [c900039f79a0] do_error_trap at 8101b190
> #11 [c900039f7a50] invalid_op at 8169063e
> [exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335]
> RIP: a020af0f RSP: c900039f7b00 RFLAGS: 00010086
> RAX: a04fa100 RBX: 8803f9550800 RCX: 0001
> RDX: a0228a58 RSI: 0001 RDI: a022321b
> RBP: c900039f7b80 R8: R9: a020adc0
> R10: a048a1b0 R11: 8803f84711f8 R12: 0001
> R13: 8803f8471000 R14: c900039f7b94 R15: c900039f7bd0
> ORIG_RAX: CS: 0010 SS: 0018
> #12 [c900039f7b18] gf119_head_vblank_put at a04422f9 [nouveau]
> #13 [c900039f7b88] drm_get_last_vbltimestamp at a020ad91 [drm]
> #14 [c900039f7ba8] drm_update_vblank_count at a020b3e1 [drm]
> #15 [c900039f7c10] drm_vblank_disable_and_save at a020bbe9 [drm]
> #16 [c900039f7c40] drm_crtc_vblank_off at a020c3c0 [drm]
> #17 [c900039f7cb0] nouveau_display_fini at a048a4d6 [nouveau]
> #18 [c900039f7ce0] nouveau_display_suspend at a048ac4f [nouveau]
> #19 [c900039f7d00] nouveau_do_suspend at a047e5ec [nouveau]
> #20 [c900039f7d38] nouveau_pmops_suspend at a047e77d [nouveau]
> #21 [c900039f7d50] pci_pm_suspend at 813b1ff0
> #22 [c900039f7d80] dpm_run_callback at 814c4dbd
> #23 [c900039f7db8] __device_suspend at 814c5a61
> #24 [c900039f7e30] async_suspend at 814c5cfa
> #25 [c900039f7e48] async_run_entry_fn at 81091683
> #26 [c900039f7e70] process_one_work at 810882bc
> #27 [c900039f7eb0] worker_thread at 8108854a
> #28 [c900039f7f10] kthread at 8108e387
> #29 [c900039f7f50] ret_from_fork at 8168fa85
> crash> gdb list *drm_calc_vbltimestamp_from_scanoutpos+335
> 0xa020af0f is in drm_calc_vbltimestamp_from_scanoutpos
> (drivers/gpu/drm/drm_vblank.c:608).
> 603 /* If mode timing undefined, just return as no-op:
> 604 * Happens during initial modesetting of a crtc.
> 605 */
> 606 if (mode->crtc_clock == 0) {
> 607 DRM_DEBUG("crtc %u: Noop due to uninitialized
> mode.\n", pipe);
> 608 WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
> 609
> 610 return false;
> 611 }
> 612
> crash> gdb list *report_bug+93
> 0x8167227d is in report_bug (lib/bug.c:177).
> 172 return BUG_TRAP_TYPE_WARN;
> 173
> 174 /*
> 175
Re: Weird green patterns on video
On Sun, Jun 4, 2017 at 1:05 PM, Alexandre-Xavier L-L wrote:
> Hello,
>
> Someone sent me a picture of a device that he tried to add support for
> in V4L2. The device causes a kind of diagonal pattern made of green
> lines on his image. I wonder what could be causing this. Has anyone
> seen this before?
>
> The device is a the first ever model of Ion Video 2 PC that uses a TM6010
> chip.
>
> What he got: https://sebbro.nl/ION_Video2PC-TM6010_BOARD_GENERIC.png
>
> Expected result (captured from another device):
> https://sebbro.nl/VCR-reference.png
>
> The support for the device was added by adding
> { USB_DEVICE(0x15e4, 0x0140), .driver_info = TM6010_BOARD_GENERIC },
> to tm6000-cards.c.
>
> Thanks in advance for any clues.
> Alexandre-Xavier

YUV zero = RGB greenish, as you see there. From the looks of it, the pitch on the buffer is wrong, and you're showing the parts of the buffer that are left zeroed as if they were part of the visible region. (Pitch = how many bytes between lines, which is not necessarily the visible width of the buffer, as it can be rounded up to various values for various reasons.)

Hope this helps,

-ilia
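Both points in the reply (zeroed YUV showing up as green, and pitch being distinct from visible width) can be sketched in a few lines of C. This assumes BT.601 limited-range coefficients in their common integer approximation; the colourspace of the capture is not stated in the thread. The names yuv_to_rgb() and pixel_offset() are hypothetical helpers, not part of the tm6000 driver or V4L2.

```c
#include <stddef.h>
#include <stdint.h>

struct rgb {
	uint8_t r, g, b;
};

/* Clamp an intermediate result to the 0..255 range of one colour channel. */
static uint8_t clamp_u8(int v)
{
	return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* BT.601 limited-range YCbCr to RGB, integer approximation (assumed here). */
struct rgb yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v)
{
	int c = y - 16, d = u - 128, e = v - 128;
	struct rgb out = {
		clamp_u8((298 * c + 409 * e + 128) / 256),
		clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256),
		clamp_u8((298 * c + 516 * d + 128) / 256),
	};
	return out;
}

/*
 * Byte offset of pixel (x, y). Rows are `pitch` bytes apart, which may be
 * larger than the visible width times the bytes per pixel.
 */
size_t pixel_offset(size_t x, size_t y, size_t pitch, size_t bytes_per_pixel)
{
	return y * pitch + x * bytes_per_pixel;
}
```

With these coefficients, yuv_to_rgb(0, 0, 0) comes out as (0, 135, 0): red and blue clamp to zero and only green survives. For the pitch, take purely illustrative numbers: a 720-pixel row at 2 bytes per pixel is 1440 visible bytes, and if the buffer's pitch is 1472, stepping by 1440 instead drifts 32 bytes further on every line, which would smear any zeroed padding diagonally across the image.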
Re: [PATCH 1/3] drm: fourcc byteorder: drop DRM_FORMAT_BIG_ENDIAN
On Tue, May 2, 2017 at 11:06 AM, Gerd Hoffmann wrote:
> Radeon and nvidia (nv40) cards where mentioned. I'll try to summarize
> (feel free to correct me if I'm wrong).
>
> nvidia has support for 8 bit-per-color formats only on bigendian hosts.
> Not sure whenever this is a driver or hardware limitation.

Let me just summarize the NVIDIA situation.

First off, pre-nv50 and nv50+ are entirely different and unrelated beasts.

The (pre-nv50) hardware has (a few) "big endian mode" bits. Those bits are kind of unrelated to each other and control their own "domains". One of the domains is reading of the scanout fb. So as a result, the hardware can scan out XRGB, RGB565, and XRGB1555 stored in either little or big endian packings, irrespective of the "mode" that other parts of the hardware are in.

However there's the delicate little question of the GPU *generating* the data. These older GPUs don't have quite the format flexibility offered by newer hw. So only XRGB is supported, packed in whatever "mode" the whole PGRAPH unit is in. (I say this because things seem to work when rendering using the XRGB format while scanning out with the BE flag set.)

There are no APIs for controlling the endianness of each engine in nouveau, so it ends up being in "big endian" mode on BE hosts, so the GPU can only render to big-endian-packed framebuffers.

None of this applies to nv50+ hw. (Although it might in broad strokes.)

Currently the driver is exposing XRGB and ARGB formats as that's what drm_crtc_init does for it. However the ARGB format doesn't work (and shouldn't be exposed, the alpha is meaningless on a single-plane setup), and the XRGB format is assumed to be packed in cpu host endian (and the "BE" bit is set accordingly).

Hope this helps!

-ilia
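What "little or big endian packings" means for a single pixel can be shown with a short sketch. It assumes a 32-bit XRGB pixel with 8 bits per channel (the layout DRM calls XRGB8888: X in bits 31..24, R in 23..16, G in 15..8, B in 7..0); the helper names xrgb8888(), store_le32() and store_be32() are hypothetical and only illustrate byte order in memory, not anything nouveau does internally.

```c
#include <stdint.h>

/* Build the 32-bit pixel value: X (unused) | R | G | B, from high to low bits. */
uint32_t xrgb8888(uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Little-endian packing: the bytes in memory are B, G, R, X. */
void store_le32(uint8_t out[4], uint32_t px)
{
	out[0] = (uint8_t)(px & 0xff);
	out[1] = (uint8_t)((px >> 8) & 0xff);
	out[2] = (uint8_t)((px >> 16) & 0xff);
	out[3] = (uint8_t)((px >> 24) & 0xff);
}

/* Big-endian packing: the bytes in memory are X, R, G, B. */
void store_be32(uint8_t out[4], uint32_t px)
{
	out[0] = (uint8_t)((px >> 24) & 0xff);
	out[1] = (uint8_t)((px >> 16) & 0xff);
	out[2] = (uint8_t)((px >> 8) & 0xff);
	out[3] = (uint8_t)(px & 0xff);
}
```

The same pixel value therefore has two possible memory images, and the unit that writes the framebuffer and the unit that scans it out have to agree on which one is in use; that is the role the per-domain "big endian mode" bits play in the description above.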