from:"Ilia Mirkin"

Re: [Nouveau] [PATCH v2] ALSA: hda: Continue to probe when codec probe fails

2021-04-12 Thread Ilia Mirkin

On Mon, Apr 12, 2021 at 4:01 PM Aaron Plattner  wrote:
>
> On 4/12/21 12:36 PM, Roy Spliet wrote:
> > Hello Aaron,
> >
> > Thanks for your insights. A follow-up query and some observations
> > in-line.
> >
> > Op 12-04-2021 om 20:06 schreef Aaron Plattner:
> >> On 4/10/21 1:48 PM, Roy Spliet wrote:
> >>> Op 10-04-2021 om 20:23 schreef Lukas Wunner:
> >>>> On Sat, Apr 10, 2021 at 04:51:27PM +0100, Roy Spliet wrote:
> >>>>> Can I ask someone with more
> >>>>> technical knowledge of snd_hda_intel and vgaswitcheroo to
> >>>>> brainstorm about
> >>>>> the possible challenges of nouveau taking matters into its own
> >>>>> hand rather
> >>>>> than keeping this PCI quirk around?
> >>>>
> >>>> It sounds to me like the HDA is not powered if no cable is plugged in.
> >>>> What is responsible then for powering it up or down, firmware code on
> >>>> the GPU or in the host's BIOS?
> >>>
> >>> Sometimes the BIOS, but definitely unconditionally the PCI quirk
> >>> code:
> >>> https://github.com/torvalds/linux/blob/master/drivers/pci/quirks.c#L5289
> >>>
> >>>
> >>> (CC Aaron Plattner)
> >>
> >> My basic understanding is that the audio function stops responding
> >> whenever the graphics function is powered off. So the requirement
> >> here is that the audio driver can't try to talk to the audio function
> >> while the graphics function is asleep, and must trigger a graphics
> >> function wakeup before trying to communicate with the audio function.
> >
> > I believe that vgaswitcheroo takes care of this for us.
> >
> >> I think there are also requirements about the audio function needing
> >> to be awake when the graphics driver is updating the ELD, but I'm not
> >> sure.
> >>
> >> This is harder on Windows because the audio driver lives in its own
> >> little world doing its own thing but on Linux we can do better.
> >>
> >>>> Ideally, we should try to find out how to control HDA power from the
> >>>> operating system rather than trying to cooperate with whatever
> >>>> firmware
> >>>> is doing.  If we have that capability, the OS should power the HDA up
> >>>> and down as it sees fit.
> >>
> >> After system boot, I don't think there's any firmware involved, but
> >> I'm not super familiar with the low-level details and it's possible
> >> the situation changed since I last looked at it.
> >>
> >> I think the problem with having nouveau write this quirk is that the
> >> kernel will need to re-probe the PCI device to notice that it has
> >> suddenly become a multi-function device with an audio function, and
> >> hotplug the audio driver. I originally looked into trying to do that
> >> but it was tricky because the PCI subsystem didn't really have a
> >> mechanism for a single-function device to become a multi-function
> >> device on the fly and it seemed easier to enable it early on during
> >> bus enumeration. That way the kernel sees both functions all the time
> >> without anything else having to be special about this configuration.
> >
> > Right, so for a little more context: a while ago I noticed that my
> > laptop (lucky me, Asus K501UB) has a 940M with HDA but no codec. Seems
> > legit, given how this GPU has no displays attached; they're all hooked
> > up to the Intel integrated GPU. That threw off the snd_hda_intel
> > mid-probe, and as a result didn't permit runpm, keeping the entire
> > GPU, PCIe bus and thus the CPU package awake. A bit of hackery later
> > we decided to continue probing without a codec, and now my laptop is
> > happy, but...
>
> What is the PCI class of the GPU in your system? If it has no display
> outputs it's probably 0x302 ("3D Controller") rather than 0x300 ("VGA
> Controller"). Looking at the code it looks like this workaround is being
> applied to both but maybe it should be restricted to just VGA controllers.

That was a comment I had back when the quirk was being implemented,
but helpfully there are some of these devices running around which say
"3D Controller" but still have displays attached to them. Lukas
probably remembers more specifics.
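
To make the class distinction above concrete, here is a minimal sketch of
what restricting the workaround to VGA controllers would look like, and why
it would miss "3D Controller" devices that do have displays attached. The
helper names (pci_class_of, quirk_applies) are made up for illustration and
are not the kernel's quirk code; the two class values are the ones quoted
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Display-controller class codes discussed in the thread. */
#define PCI_CLASS_DISPLAY_VGA 0x0300 /* "VGA Controller" */
#define PCI_CLASS_DISPLAY_3D  0x0302 /* "3D Controller"  */

/*
 * class_rev is the 32-bit value at PCI config offset 0x08:
 * bits 31..16 hold base class + subclass, bits 15..8 the programming
 * interface, bits 7..0 the revision.
 */
uint16_t pci_class_of(uint32_t class_rev)
{
	return (uint16_t)(class_rev >> 16);
}

/* Hypothetical policy helper: would the workaround be applied? */
bool quirk_applies(uint32_t class_rev, bool vga_only)
{
	uint16_t class = pci_class_of(class_rev);

	if (class == PCI_CLASS_DISPLAY_VGA)
		return true;
	if (class == PCI_CLASS_DISPLAY_3D)
		return !vga_only; /* skipped if restricted to VGA controllers */
	return false;
}
```

With vga_only set, a 3D-class GPU that still drives an external connector
would never get its audio function enabled, which is the concern raised
above.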

  -ilia

Re: [Nouveau] [PATCH] drm/nouveau/kms/nv50-: Check plane size for cursors, not fb size

2021-03-18 Thread Ilia Mirkin

On Thu, Mar 18, 2021 at 5:56 PM Lyude Paul  wrote:
>
> Found this while trying to make some changes to the kms_cursor_crc test.
> curs507a_acquire checks that the width and height of the cursor framebuffer
> are equal (asyw->image.{w,h}). This is actually wrong though, as we only
> want to be concerned that the actual width/height of the plane are the
> same. It's fine if we scan out from an fb that's slightly larger than the
> cursor plane (in fact, some igt tests actually do this).

How so? The scanout engine expects the data to be packed. Height can
be larger, but width has to match.
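
A rough sketch of the constraint, assuming a 4-byte-per-pixel cursor and a
scanout engine that reads rows back to back. The function names are
hypothetical and this is not the nouveau check itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset at which a packed reader expects pixel (x, y) of the cursor. */
uint32_t packed_offset(uint32_t plane_w, uint32_t x, uint32_t y, uint32_t cpp)
{
	return (y * plane_w + x) * cpp;
}

/* Byte offset at which that pixel actually lives in a linear fb. */
uint32_t fb_offset(uint32_t fb_w, uint32_t x, uint32_t y, uint32_t cpp)
{
	return (y * fb_w + x) * cpp;
}

/*
 * Hypothetical acceptance check: the plane must be square (as in the
 * check described in the quoted patch), the fb width must equal the
 * plane width, and the fb may be taller than the plane.
 */
bool cursor_fb_usable(uint32_t fb_w, uint32_t fb_h,
		      uint32_t plane_w, uint32_t plane_h)
{
	if (plane_w != plane_h)
		return false;
	if (fb_w != plane_w)
		return false;
	return fb_h >= plane_h;
}
```

For a 64x64 plane on a 128-pixel-wide fb, row 1 starts at byte 512 in
memory but a packed reader looks at byte 256, so every row after the first
comes out wrong. Extra rows at the bottom are simply never read.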

  -ilia

Re: [PATCH 2/3] drm/nouveau/kms/nv50-: Report max cursor size to userspace

2021-02-23 Thread Ilia Mirkin

On Tue, Feb 23, 2021 at 11:23 AM Alex Riesen
 wrote:
>
> Alex Riesen, Tue, Feb 23, 2021 16:51:26 +0100:
> > Ilia Mirkin, Tue, Feb 23, 2021 16:46:52 +0100:
> > > I'd recommend using xf86-video-nouveau in any case, but some distros
> >
> > I would like to try this out. Do you know how to force the xorg server to
> > choose this driver instead of modesetting?
>
> Found that myself (a Device section with Driver set to "nouveau"):
>
> $ xrandr  --listproviders
> Providers: number : 1
> Provider 0: id: 0x68 cap: 0x7, Source Output, Sink Output, Source Offload 
> crtcs: 4 outputs: 5 associated providers: 0 name:nouveau
>
> And yes, the cursor looks good in v5.11 even without reverting the commit.

FWIW it's not immediately apparent to me what grave error modesetting
is committing in setting the cursor. The logic looks perfectly
reasonable. It's not trying to be fancy with rendering the cursor/etc.

The one thing is that it's using drmModeSetCursor2 which sets the
hotspot at the same time. But internally inside nouveau I think it
should work out to the same thing. Perhaps setting the hotspot, or
something in that path, doesn't quite work for 256x256? [Again, no
clue what that might be.]

It might also be worthwhile just testing if the 256x256 cursor works
quite the way one would want. If you're interested, grab libdrm,
there's a test called 'modetest', which has an option to enable a
moving cursor (-c iirc). It's hard-coded to 64x64, so you'll have to
modify it there too (and probably change the pattern from plain gray
to any one of the other ones).

Cheers,

  -ilia

Re: [PATCH 2/3] drm/nouveau/kms/nv50-: Report max cursor size to userspace

2021-02-23 Thread Ilia Mirkin

On Tue, Feb 23, 2021 at 10:36 AM Alex Riesen
 wrote:
>
> Ilia Mirkin, Tue, Feb 23, 2021 15:56:21 +0100:
> > On Tue, Feb 23, 2021 at 9:26 AM Alex Riesen  
> > wrote:
> > > Lyude Paul, Tue, Jan 19, 2021 02:54:13 +0100:
> > > > diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
> > > > b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> > > > index c6367035970e..5f4f09a601d4 100644
> > > > --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
> > > > +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> > > > @@ -2663,6 +2663,14 @@ nv50_display_create(struct drm_device *dev)
> > > >   else
> > > >   nouveau_display(dev)->format_modifiers = 
> > > > disp50xx_modifiers;
> > > >
> > > > + if (disp->disp->object.oclass >= GK104_DISP) {
> > > > + dev->mode_config.cursor_width = 256;
> > > > + dev->mode_config.cursor_height = 256;
> > > > + } else {
> > > > + dev->mode_config.cursor_width = 64;
> > > > + dev->mode_config.cursor_height = 64;
> > > > + }
> > > > +
> > > >   /* create crtc objects to represent the hw heads */
> > > >   if (disp->disp->object.oclass >= GV100_DISP)
> > > >   crtcs = nvif_rd32(&device->object, 0x610060) & 0xff;
> > >
> > > This change broke X cursor in my setup, and reverting the commit restores 
> > > it.
> > >
> > > Dell Precision M4800, issue ~2014 with GK106GLM [Quadro K2100M] (rev a1).
> > > libdrm 2.4.91-1 (Debian 10.8 stable).
> > > There are no errors or warnings in Xorg logs nor in the kernel log.
> >
> > Could you confirm which ddx is driving the nvidia hw? You can find
> > this out by running "xrandr --listproviders", or also in the xorg log.
>
> xrandr(1) does not seem to list much:
>
> $ xrandr --listproviders
> Providers: number : 1
> Provider 0: id: 0x48 cap: 0xf, Source Output, Sink Output, Source Offload, 
> Sink Offload crtcs: 4 outputs: 5 associated providers: 0 name:modesetting

Thanks - this is what I was looking for. name:modesetting, i.e. the
modesetting ddx driver.

I checked the nouveau ddx source, and it seems like it uses a 64x64 cursor no
matter what. Not sure what the modesetting ddx does.
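
For context, a simplified sketch of what the quoted patch changes from a
client's point of view. As far as I know, the DRM core answers the cursor
size capability query (DRM_CAP_CURSOR_WIDTH / DRM_CAP_CURSOR_HEIGHT) with
the driver's mode_config value and falls back to 64 when the driver left it
at zero. The function names below are made up, and the "GK104 or newer"
flag stands in for the oclass comparison in the diff:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the capability answer: driver value, else 64. */
uint64_t cursor_cap(uint32_t mode_config_value)
{
	return mode_config_value ? mode_config_value : 64;
}

/* What the quoted patch stores in mode_config.cursor_width/height. */
uint32_t nv50_cursor_size(bool gk104_or_newer)
{
	return gk104_or_newer ? 256 : 64;
}
```

So a client that sizes its cursor buffer from the capability would have
seen 64 before the patch and 256 afterwards on a GK106. That is consistent
with the regression showing up only with a ddx that honors the capability,
but it is not proof of where the bug is.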

I'd recommend using xf86-video-nouveau in any case, but some distros
have decided to explicitly force modesetting in preference to nouveau.
Oh well. (And regardless, the regression should be addressed somehow,
but it's also good to understand what the problem is.)

Can you confirm what the problem with the cursor is?

  -ilia

Re: [PATCH 2/3] drm/nouveau/kms/nv50-: Report max cursor size to userspace

2021-02-23 Thread Ilia Mirkin

On Tue, Feb 23, 2021 at 9:26 AM Alex Riesen
 wrote:
>
> Lyude Paul, Tue, Jan 19, 2021 02:54:13 +0100:
> > diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
> > b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> > index c6367035970e..5f4f09a601d4 100644
> > --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
> > +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> > @@ -2663,6 +2663,14 @@ nv50_display_create(struct drm_device *dev)
> >   else
> >   nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
> >
> > + if (disp->disp->object.oclass >= GK104_DISP) {
> > + dev->mode_config.cursor_width = 256;
> > + dev->mode_config.cursor_height = 256;
> > + } else {
> > + dev->mode_config.cursor_width = 64;
> > + dev->mode_config.cursor_height = 64;
> > + }
> > +
> >   /* create crtc objects to represent the hw heads */
> >   if (disp->disp->object.oclass >= GV100_DISP)
> >   crtcs = nvif_rd32(&device->object, 0x610060) & 0xff;
>
> This change broke X cursor in my setup, and reverting the commit restores it.
>
> Dell Precision M4800, issue ~2014 with GK106GLM [Quadro K2100M] (rev a1).
> libdrm 2.4.91-1 (Debian 10.8 stable).
> There are no errors or warnings in Xorg logs nor in the kernel log.

Hi Alex,

Could you confirm which ddx is driving the nvidia hw? You can find
this out by running "xrandr --listproviders", or also in the xorg log.

Thanks,

  -ilia

Re: [Nouveau] [RFC v3 05/10] drm/i915/dpcd_bl: Cleanup intel_dp_aux_vesa_enable_backlight() a bit

2021-02-05 Thread Ilia Mirkin

On Fri, Feb 5, 2021 at 6:45 PM Lyude Paul  wrote:
>
> Get rid of the extraneous switch case in here, and just open code
> edp_backlight_mode as we only ever use it once.
>
> Signed-off-by: Lyude Paul 
> ---
>  .../gpu/drm/i915/display/intel_dp_aux_backlight.c | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c 
> b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
> index c37ccc8538cb..95e3e344cf40 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
> @@ -382,7 +382,7 @@ intel_dp_aux_vesa_enable_backlight(const struct 
> intel_crtc_state *crtc_state,
> struct intel_dp *intel_dp = intel_attached_dp(connector);
> struct drm_i915_private *i915 = dp_to_i915(intel_dp);
> struct intel_panel *panel = &connector->panel;
> -   u8 dpcd_buf, new_dpcd_buf, edp_backlight_mode;
> +   u8 dpcd_buf, new_dpcd_buf;
> u8 pwmgen_bit_count = panel->backlight.edp.vesa.pwmgen_bit_count;
>
> if (drm_dp_dpcd_readb(&intel_dp->aux,
> @@ -393,12 +393,8 @@ intel_dp_aux_vesa_enable_backlight(const struct 
> intel_crtc_state *crtc_state,
> }
>
> new_dpcd_buf = dpcd_buf;
> -   edp_backlight_mode = dpcd_buf & DP_EDP_BACKLIGHT_CONTROL_MODE_MASK;
>
> -   switch (edp_backlight_mode) {
> -   case DP_EDP_BACKLIGHT_CONTROL_MODE_PWM:
> -   case DP_EDP_BACKLIGHT_CONTROL_MODE_PRESET:
> -   case DP_EDP_BACKLIGHT_CONTROL_MODE_PRODUCT:
> +   if ((dpcd_buf & DP_EDP_BACKLIGHT_CONTROL_MODE_MASK) != 
> DP_EDP_BACKLIGHT_CONTROL_MODE_MASK) {

You probably meant != MODE_DPCD?
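
To spell out why the comparison matters, here is a small standalone sketch
of the three variants. The mode values are reproduced from the kernel's DP
helper header so the snippet builds on its own; the function names are
made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mode field of the eDP backlight mode-set register. */
#define DP_EDP_BACKLIGHT_CONTROL_MODE_MASK    (3 << 0)
#define DP_EDP_BACKLIGHT_CONTROL_MODE_PWM     (0 << 0)
#define DP_EDP_BACKLIGHT_CONTROL_MODE_PRESET  (1 << 0)
#define DP_EDP_BACKLIGHT_CONTROL_MODE_DPCD    (2 << 0)
#define DP_EDP_BACKLIGHT_CONTROL_MODE_PRODUCT (3 << 0)

/* Behaviour of the switch statement the patch removes. */
bool rewrite_mode_old(uint8_t dpcd_buf)
{
	switch (dpcd_buf & DP_EDP_BACKLIGHT_CONTROL_MODE_MASK) {
	case DP_EDP_BACKLIGHT_CONTROL_MODE_PWM:
	case DP_EDP_BACKLIGHT_CONTROL_MODE_PRESET:
	case DP_EDP_BACKLIGHT_CONTROL_MODE_PRODUCT:
		return true;
	default: /* already in DPCD mode: nothing to do */
		return false;
	}
}

/* Condition as posted: compares against the mask itself. */
bool rewrite_mode_posted(uint8_t dpcd_buf)
{
	return (dpcd_buf & DP_EDP_BACKLIGHT_CONTROL_MODE_MASK) !=
	       DP_EDP_BACKLIGHT_CONTROL_MODE_MASK;
}

/* Condition as suggested: compares against MODE_DPCD. */
bool rewrite_mode_suggested(uint8_t dpcd_buf)
{
	return (dpcd_buf & DP_EDP_BACKLIGHT_CONTROL_MODE_MASK) !=
	       DP_EDP_BACKLIGHT_CONTROL_MODE_DPCD;
}
```

Since MODE_PRODUCT has the same value as the mask, the posted condition
skips the rewrite for PRODUCT mode and performs it for DPCD mode, the
opposite of the old switch in both cases. Comparing against MODE_DPCD
matches the old behaviour for all four modes.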

> new_dpcd_buf &= ~DP_EDP_BACKLIGHT_CONTROL_MODE_MASK;
> new_dpcd_buf |= DP_EDP_BACKLIGHT_CONTROL_MODE_DPCD;
>
> @@ -406,13 +402,6 @@ intel_dp_aux_vesa_enable_backlight(const struct 
> intel_crtc_state *crtc_state,
>pwmgen_bit_count) != 1)
> drm_dbg_kms(&i915->drm,
> "Failed to write aux pwmgen bit count\n");
> -
> -   break;
> -
> -   /* Do nothing when it is already DPCD mode */
> -   case DP_EDP_BACKLIGHT_CONTROL_MODE_DPCD:
> -   default:
> -   break;
> }
>
> if (panel->backlight.edp.vesa.pwm_freq_pre_divider) {
> --
> 2.29.2
>
> ___
> Nouveau mailing list
> nouv...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)

2020-12-29 Thread Ilia Mirkin

On Tue, Dec 29, 2020 at 10:52 AM Marc MERLIN  wrote:
>
> On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this every 2 seconds, mostly forever.
> >
> > The gpu suspends with runtime pm. And then gets woken up for some
> > reason (could be something quite silly, like lspci, or could be
> > something explicitly checking connectors, etc). Repeat.
>
> Ah, fair point.  Could it be powertop even?
> How would I go towards tracing that?
> Sounds like this would be a problem with all chips if userspace is able
> to wake them up every second or two with a probe. Now I wonder what
> broken userspace I have that could be doing this.

Well, it's a theory. Some very common userspace helpfully prevents the
GPU from suspending entirely by messing with the attached audio device;
unfortunately I don't remember its name. It's meant to help... oh well.

>
> > Display offload usually requires acceleration -- the copies are done
> > using the DMA engine. Please make sure that you have firmware
> > available (and a new enough mesa). The errors suggest that you don't
> > have firmware available at the time that nouveau loads. Depending on
> > your setup, that might mean the firmware has to be built into the
> > kernel, or available in initramfs. (Or just regular filesystem if you
> > don't use a complicated boot sequence. But many people go with distro
> > defaults, which do have this complexity.)
>
> Hi Ilia, thanks for your answer.
>
> Do you think that could be a reason why the boot would hang for 2 full 
> minutes at every
> boot ever since I upgraded to 5.5?

I'd have to check, but I'm guessing TU104 acceleration became a thing
in 5.5. I would also not be very surprised if the code didn't handle
failure extremely gracefully - there definitely have been problems
with that in the past.

>
> Also, without wanting to sound like a full newbie, where is that
> firmware you're talking about? In my kernel source?
>
> Here's what I do have:
> sauron:/usr/local/bin# dpkggrep nouveau
> libdrm-nouveau2:amd64   install
> xserver-xorg-video-nouveau  install
>
> no nouveau-firmware package in debian:
> sauron:/usr/local/bin# apt-cache search nouveau
> bumblebee - NVIDIA Optimus support for Linux
> libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services 
> -- runtime
> xfonts-jmk - Jim Knoble's character-cell fonts for X
> xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver
>
> No firmware file on my disk:
> sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ 
> /lib/firmware/ |grep nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> sauron:/usr/local/bin#
>
> The kernel module is in my initrd:
> sauron:/usr/local/bin# dd 
> if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | 
> gunzip | cpio -tdv | grep nouveau
> drwxr-xr-x   1 root root0 Nov 30 15:40 
> usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> -rw-r--r--   1 root root  3691385 Nov 30 15:35 
> usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> 17+1 records in
> 17+1 records out
> 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s

I think that gets you out of "full newbie" land...

>
> What am I supposed to do/check next?
>
> Note that ultimately I only need nouveau not to hang my boot 2mn and do
> PM so that the nvidia chip goes to sleep since I don't use it.

I'm not extremely familiar with debian packaging, but the firmware is
provided by NVIDIA and shipped as part of linux-firmware:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia

This needs to be available at /lib/firmware/nvidia when nouveau loads.
Based on your email above, it's most likely that it would load from
the initrd - so make sure it's in there.

Of course now that I read your email a bit more carefully, it seems
your issue is with the "saving config space" messages. I'm not sure
I've seen those before. Perhaps you have some sort of debug enabled.
I'd find where in the kernel they are being produced, and what the
conditions for it are. But the failure to load firmware isn't great --
not 100% sure if it impacts runpm or not.

I just double-checked, TU10x accel came in via
afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
Initial TU10x support came in v5.0. So that doesn't line up with your
timeline.

Anyways, I'd definitely sort the firmware situation out, but it may
not be the cause of your problem.

Cheers,

  -ilia

Re: [PATCH 5.10 064/717] drm/edid: Fix uninitialized variable in drm_cvt_modes()

2020-12-28 Thread Ilia Mirkin

Hi Greg,

Linus had to apply a fixup for this patch. Please ensure that it's in
your patch list:

commit d652d5f1eeeb06046009f4fcb9b4542249526916
Author: Linus Torvalds 
Date:   Thu Dec 17 09:27:57 2020 -0800

drm/edid: fix objtool warning in drm_cvt_modes()

It does not appear to have a Fixes tag, so may not have been picked up
by your automated tooling.

Cheers,

  -ilia

On Mon, Dec 28, 2020 at 9:01 AM Greg Kroah-Hartman
 wrote:
>
> From: Lyude Paul 
>
> [ Upstream commit 991fcb77f490390bcad89fa67d95763c58cdc04c ]
>
> Noticed this when trying to compile with -Wall on a kernel fork. We
> potentially don't set width here, which causes the compiler to complain
> about width potentially being uninitialized in drm_cvt_modes(). So, let's
> fix that.
>
> Changes since v1:
> * Don't emit an error as this code isn't reachable, just mark it as such
> Changes since v2:
> * Remove now unused variable
>
> Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage")
> Signed-off-by: Lyude Paul 
> Reviewed-by: Ilia Mirkin 
> Link: 
> https://patchwork.freedesktop.org/patch/msgid/20201105235703.1328115-1-ly...@redhat.com
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/drm_edid.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 631125b46e04c..b84efd538a702 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -3114,6 +3114,8 @@ static int drm_cvt_modes(struct drm_connector 
> *connector,
> case 0x0c:
> width = height * 15 / 9;
> break;
> +   default:
> +   unreachable();
> }
>
> for (j = 1; j < 5; j++) {
> --
> 2.27.0
>
>
>

Re: [Nouveau] 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)

2020-12-27 Thread Ilia Mirkin

On Sun, Dec 27, 2020 at 12:03 PM Marc MERLIN  wrote:
>
> This started with 5.5 and hasn't gotten better since then, despite some 
> reports
> I tried to send.
>
> As per my previous message:
> I have a Thinkpad P70 with hybrid graphics.
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] 
> (rev a2)
> that one works fine, I can use i915 for the main screen, and nouveau to
> display on the external ports (external ports are only wired to nvidia
> chip, so it's impossible to use them without turning the nvidia chip
> on).
>
> I now got a newer P73 also with the same hybrid graphics (setup as such
> in the bios). It runs fine with i915, and I don't need to use external
> display with nouveau for now (it almost works, but I only see the mouse
> cursor on the external screen, no window or anything else can get
> displayed, very weird).
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 
> 4000 Mobile / Max-Q] (rev a1)

Display offload usually requires acceleration -- the copies are done
using the DMA engine. Please make sure that you have firmware
available (and a new enough mesa). The errors suggest that you don't
have firmware available at the time that nouveau loads. Depending on
your setup, that might mean the firmware has to be built into the
kernel, or available in initramfs. (Or just regular filesystem if you
don't use a complicated boot sequence. But many people go with distro
defaults, which do have this complexity.)

>
>
> after boot, when it gets the right trigger (not sure which ones), it
> loops on this every 2 seconds, mostly forever.

The gpu suspends with runtime pm. And then gets woken up for some
reason (could be something quite silly, like lspci, or could be
something explicitly checking connectors, etc). Repeat.

Cheers,

  -ilia

Re: [PATCH v8 4/4] NOTFORMERGE: drm/logicvc: Add plane colorkey support

2020-12-23 Thread Ilia Mirkin

FWIW this is something I added, hoping it was going to get used at
some point, but I never followed up with support in xf86-video-nouveau
for Xv. At this point, I'm not sure I ever will. I encoded the
"enabled" part into the value with a high bit (1<<24) -- not sure that
was such a great idea. All NVIDIA hardware supports colorkey (and not
actual alpha, until the very latest GPUs - Volta/Turing families),
although my implementation only covers the pre-G80 series (i.e. DX9
and earlier GPUs - pre-2008). Should a determination of usefulness be
reached, it would be easy to implement on the remainder though.
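
A minimal sketch of that encoding. Only the enable bit (1<<24) comes from
the description above; treating the low 24 bits as the key colour, and all
of the names, are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define COLORKEY_ENABLE   (1u << 24)   /* "enabled" flag, per the text above */
#define COLORKEY_RGB_MASK 0x00ffffffu  /* assumption: key colour in bits 23..0 */

/* Pack a key colour and the enable flag into one property value. */
uint32_t colorkey_pack(uint32_t rgb, bool enabled)
{
	return (rgb & COLORKEY_RGB_MASK) | (enabled ? COLORKEY_ENABLE : 0);
}

bool colorkey_enabled(uint32_t value)
{
	return (value & COLORKEY_ENABLE) != 0;
}

uint32_t colorkey_rgb(uint32_t value)
{
	return value & COLORKEY_RGB_MASK;
}
```

The downside of folding the flag into the value is visible here: a generic
client cannot tell the flag from colour data without knowing the layout,
which is one argument for a standardized property with a separate enable.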

Cheers,

  -ilia

On Wed, Dec 23, 2020 at 5:20 PM Simon Ser  wrote:
>
> nouveau already has something for colorkey:
> https://drmdb.emersion.fr/properties/4008636142/colorkey
>
> I know this is marked "not for merge", but it would be nice to discuss
> with them and come up with a standardized property.
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [Nouveau] [PATCH v2] ALSA: hda: Continue to probe when codec probe fails

2020-12-21 Thread Ilia Mirkin

On Mon, Dec 21, 2020 at 11:33 AM Kai-Heng Feng
 wrote:
>
> [+Cc nouveau]
>
> On Fri, Dec 18, 2020 at 4:06 PM Takashi Iwai  wrote:
> [snip]
> > > Quite possibly the system doesn't power up HDA controller when there's
> > > no external monitor.
> > > So when it's connected to external monitor, it's still needed for HDMI 
> > > audio.
> > > Let me ask the user to confirm this.
> >
> > Yeah, it's the basic question whether the HD-audio is supposed to work
> > on this machine at all.  If yes, the current approach we take makes
> > less sense - instead we should rather make the HD-audio controller
> > working.
>
> Yea, confirmed that the Nvidia HDA works when HDMI is connected prior to boot.
>
> > > > - The second problem is that pci_enable_device() ignores the error
> > > >   returned from pci_set_power_state() if it's -EIO.  And the
> > > >   inaccessible access error returns -EIO, although it's rather a fatal
> > > >   problem.  So the driver believes as the PCI device gets enabled
> > > >   properly.
> > >
> > > This was introduced in 2005, by Alan's 11f3859b1e85 ("[PATCH] PCI: Fix
> > > regression in pci_enable_device_bars") to fix UHCI controller.
> > >
> > > >
> > > > - The third problem is that HD-audio driver blindly believes the
> > > >   codec_mask read from the register even if it's a read failure as I
> > > >   already showed.
> > >
> > > This approach has least regression risk.
> >
> > Yes, but it assumes that HD-audio is really non-existent.
>
> I really don't know any good approach to address this.
> On Windows, HDA PCI is "hidden" until HDMI cable is plugged, then the
> driver will flag the magic bit to make HDA audio appear on the PCI
> bus.
> IIRC the current approach is to make nouveau and device link work.

I don't have the full context of this discussion, but the kernel
force-enables the HDA subfunction nowadays, irrespective of nouveau or
nvidia or whatever:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/quirks.c?h=v5.10#n5267
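
For readers who don't want to open the link, a simplified userspace model
of what that quirk does, as I read it: it sets an enable bit in a GPU
config-space register and then re-reads the header type to see whether the
device has become multi-function. The register offset (0x488) and bit (25)
are from my reading of the linked code and should be checked against it;
the function names are made up and nothing here touches real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define NV_GPU_HDA_REG     0x488u      /* config-space offset (assumption, see above) */
#define NV_GPU_HDA_ENABLE  (1u << 25)  /* HDA enable bit (assumption, see above) */
#define HDR_TYPE_MULTIFUNC 0x80u       /* multi-function bit of the PCI header type */

/*
 * Given the current register value, return the value to write back.
 * *changed reports whether a write is needed at all.
 */
uint32_t hda_enable(uint32_t val, bool *changed)
{
	*changed = !(val & NV_GPU_HDA_ENABLE);
	return val | NV_GPU_HDA_ENABLE;
}

/* After enabling, the header type tells us if function 1 now exists. */
bool is_multifunction(uint8_t hdr_type)
{
	return (hdr_type & HDR_TYPE_MULTIFUNC) != 0;
}
```

Because this runs early during bus enumeration, the PCI core sees both
functions from the start and the audio driver binds like any other device.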

Cheers,

  -ilia

Re: [Nouveau] Nouveau video --- [ cut here ] ----- crash dump 5.10.0-rc6

2020-12-02 Thread Ilia Mirkin

Unfortunately this isn't a crash, but rather a warning that things are
timing out. By the time you get this, the display is most likely hung.

Was there anything before this, e.g. an error state dump perhaps?

What GPU are you using, what displays, and how are they connected?
What kind of userspace is running here? X or a Wayland compositor (or
something else entirely)?

On Thu, Dec 3, 2020 at 12:13 AM Dave Airlie  wrote:
>
> cc'ing Ben + nouveau
>
> On Thu, 3 Dec 2020 at 14:59, bob  wrote:
> >
> > Hello.  I have a crash dump for:
> >
> > $ uname -a
> > Linux freedom 5.10.0-rc6 #1 SMP Sun Nov 29 17:26:13 MST 2020 x86_64
> > x86_64 x86_64 GNU/Linux
> >
> > Occasionally when this dumps it likes to lock up the computer, but I
> > caught it this time.
> >
> > Also video likes to flicker a lot.   Nouveau has been iffy since kernel
> > 5.8.0.
> >
> > This isn't the only dump, it dumped probably 50 times.  If you are
> > really desperate for all of it,
> >
> > reply to me directly as I'm not on the mailing list.  Here is one of them.
> >
> > [39019.426580] ------------[ cut here ]------------
> > [39019.426589] WARNING: CPU: 6 PID: 14136 at
> > drivers/gpu/drm/nouveau/dispnv50/disp.c:211 nv50_dmac_wait+0x1e1/0x230
> > [39019.426590] Modules linked in: mt2131 s5h1409 fuse tda8290 tuner
> > cx25840 rt2800usb rt2x00usb rt2800lib snd_hda_codec_analog
> > snd_hda_codec_generic ledtrig_audio rt2x00lib binfmt_misc
> > intel_powerclamp coretemp cx23885 mac80211 tda18271 altera_stapl
> > videobuf2_dvb m88ds3103 tveeprom cx2341x dvb_core rc_core i2c_mux
> > snd_hda_codec_hdmi videobuf2_dma_sg videobuf2_memops videobuf2_v4l2
> > snd_hda_intel videobuf2_common snd_intel_dspcfg kvm_intel snd_hda_codec
> > videodev snd_hda_core kvm mc snd_hwdep snd_pcm_oss snd_mixer_oss
> > irqbypass snd_pcm cfg80211 snd_seq_dummy snd_seq_midi snd_seq_oss
> > snd_seq_midi_event snd_rawmidi snd_seq intel_cstate snd_seq_device
> > serio_raw snd_timer input_leds nfsd libarc4 snd asus_atk0110 i7core_edac
> > soundcore i5500_temp auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc
> > parport_pc ppdev lp parport ip_tables x_tables btrfs blake2b_generic
> > libcrc32c xor zstd_compress raid6_pq dm_mirror dm_region_hash dm_log
> > pata_acpi pata_marvell hid_generic usbhid hid psmouse firewire_ohci
> > [39019.426650]  firewire_core crc_itu_t i2c_i801 ahci sky2 libahci
> > i2c_smbus lpc_ich
> > [39019.426658] CPU: 6 PID: 14136 Comm: kworker/u16:0 Tainted: GW
> > I   5.10.0-rc6 #1
> > [39019.426659] Hardware name: System manufacturer System Product
> > Name/P6T DELUXE, BIOS 2209 09/21/2010
> > [39019.426662] Workqueue: events_unbound nv50_disp_atomic_commit_work
> > [39019.426665] RIP: 0010:nv50_dmac_wait+0x1e1/0x230
> > [39019.426667] Code: 8d 48 04 48 89 4a 68 c7 00 00 00 00 20 49 8b 46 38
> > 41 c7 86 20 01 00 00 00 00 00 00 49 89 46 68 e8 e4 fc ff ff e9 76 fe ff
> > ff <0f> 0b b8 92 ff ff ff e9 ed fe ff ff 49 8b be 80 00 00 00 e8 c7 fc
> > [39019.426668] RSP: 0018:b79d028ebd48 EFLAGS: 00010282
> > [39019.426670] RAX: ff92 RBX: 000d RCX:
> > 
> > [39019.426671] RDX: ff92 RSI: b79d028ebc88 RDI:
> > b79d028ebd28
> > [39019.426671] RBP: b79d028ebd48 R08:  R09:
> > b79d028ebc58
> > [39019.426672] R10: 0030 R11: 11c4 R12:
> > fffb
> > [39019.426673] R13: a05fc1ebd368 R14: a05fc1ebd3a8 R15:
> > a05fc2425000
> > [39019.426675] FS:  () GS:a061f3d8()
> > knlGS:
> > [39019.426676] CS:  0010 DS:  ES:  CR0: 80050033
> > [39019.426677] CR2: 7fb2d58e CR3: 00026280a000 CR4:
> > 06e0
> > [39019.426678] Call Trace:
> > [39019.426685]  base827c_image_set+0x2f/0x1d0
> > [39019.426687]  nv50_wndw_flush_set+0x89/0x1c0
> > [39019.426688]  nv50_disp_atomic_commit_tail+0x4e7/0x7e0
> > [39019.426693]  process_one_work+0x1d4/0x370
> > [39019.426695]  worker_thread+0x4a/0x3b0
> > [39019.426697]  ? process_one_work+0x370/0x370
> > [39019.426699]  kthread+0xfe/0x140
> > [39019.426701]  ? kthread_park+0x90/0x90
> > [39019.426704]  ret_from_fork+0x22/0x30
> > [39019.426706] ---[ end trace d512d675211c738c ]---
> > [39021.426751] ------------[ cut here ]------------
> >
> >
> > Thanks in advance,
> >
> > Bob
> >

Re: [PATCH v3] drm/edid: Fix uninitialized variable in drm_cvt_modes()

2020-11-05 Thread Ilia Mirkin

On Thu, Nov 5, 2020 at 6:57 PM Lyude Paul  wrote:
>
> Noticed this when trying to compile with -Wall on a kernel fork. We 
> potentially
> don't set width here, which causes the compiler to complain about width
> potentially being uninitialized in drm_cvt_modes(). So, let's fix that.
>
> Changes since v1:
> * Don't emit an error as this code isn't reachable, just mark it as such
> Changes since v2:
> * Remove now unused variable
>
> Signed-off-by: Lyude Paul 
>
> Cc:  # v5.9+
> Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage")
> Signed-off-by: Lyude Paul 

For the very little it's worth,

Reviewed-by: Ilia Mirkin 

> ---
>  drivers/gpu/drm/drm_edid.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 631125b46e04..b84efd538a70 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -3114,6 +3114,8 @@ static int drm_cvt_modes(struct drm_connector 
> *connector,
> case 0x0c:
> width = height * 15 / 9;
> break;
> +   default:
> +   unreachable();
> }
>
> for (j = 1; j < 5; j++) {
> --
> 2.28.0
>

Re: [PATCH v2] drm/edid: Fix uninitialized variable in drm_cvt_modes()

2020-11-03 Thread Ilia Mirkin

On Tue, Nov 3, 2020 at 5:15 PM Lyude Paul  wrote:
>
> Noticed this when trying to compile with -Wall on a kernel fork. We 
> potentially
> don't set width here, which causes the compiler to complain about width
> potentially being uninitialized in drm_cvt_modes(). So, let's fix that.
>
> Changes since v1:
> * Don't emit an error as this code isn't reachable, just mark it as such
>
> Signed-off-by: Lyude Paul 
>
> Cc:  # v5.9+
> Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage")
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/drm_edid.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 631125b46e04..0643b98c6383 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -3094,6 +3094,7 @@ static int drm_cvt_modes(struct drm_connector 
> *connector,
>
> for (i = 0; i < 4; i++) {
> int width, height;
> +   u8 cvt_aspect_ratio;
>
> cvt = &(timing->data.other_data.data.cvt[i]);
>
> @@ -3101,7 +3102,8 @@ static int drm_cvt_modes(struct drm_connector 
> *connector,
> continue;
>
> height = (cvt->code[0] + ((cvt->code[1] & 0xf0) << 4) + 1) * 
> 2;
> -   switch (cvt->code[1] & 0x0c) {
> +   cvt_aspect_ratio = cvt->code[1] & 0x0c;

The temp var doesn't do anything now right? Previously you were using
it in the print, but now you can drop these two hunks, I think?
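
For reference, a standalone sketch of the decode without the temporary,
which also shows why the default case cannot be hit: the switch operand is
masked with 0x0c, so only four values are possible and all four have a
case. The 16:9 and 16:10 cases are cut off by the diff context and are
filled in from the kernel source as I recall it; the function names are
made up.

```c
#include <stdint.h>

/* Vertical size: 12-bit line count, stored as (lines / 2) - 1. */
int cvt_height(uint8_t code0, uint8_t code1)
{
	return (code0 + ((code1 & 0xf0) << 4) + 1) * 2;
}

/* Width derived from the 2-bit aspect ratio field in code[1]. */
int cvt_width(int height, uint8_t code1)
{
	switch (code1 & 0x0c) {
	case 0x00:
		return height * 4 / 3;
	case 0x04:
		return height * 16 / 9;
	case 0x08:
		return height * 16 / 10;
	case 0x0c:
		return height * 15 / 9;
	}
	return -1; /* not reachable: the mask leaves only the four cases above */
}
```

Switching directly on the masked expression keeps the hunk down to the new
default case, which is what v3 ended up doing.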

  -ilia

> +   switch (cvt_aspect_ratio) {
> case 0x00:
> width = height * 4 / 3;
> break;
> @@ -3114,6 +3116,8 @@ static int drm_cvt_modes(struct drm_connector 
> *connector,
> case 0x0c:
> width = height * 15 / 9;
> break;
> +   default:
> +   unreachable();
> }
>
> for (j = 1; j < 5; j++) {
> --
> 2.28.0
>

Re: [PATCH] drm/edid: Fix uninitialized variable in drm_cvt_modes()

2020-11-03 Thread Ilia Mirkin

On Tue, Nov 3, 2020 at 2:47 PM Lyude Paul  wrote:
>
> Sorry! Thought I had responded to this but apparently not, comments down below
>
> On Thu, 2020-10-22 at 14:04 -0400, Ilia Mirkin wrote:
> > On Thu, Oct 22, 2020 at 12:55 PM Lyude Paul  wrote:
> > >
> > > > Noticed this when trying to compile with -Wall on a kernel fork. We
> > > > potentially don't set width here, which causes the compiler to
> > > > complain about width potentially being uninitialized in
> > > > drm_cvt_modes(). So, let's fix that.
> > >
> > > Signed-off-by: Lyude Paul 
> > >
> > > Cc:  # v5.9+
> > > Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage")
> > > Signed-off-by: Lyude Paul 
> > > ---
> > >  drivers/gpu/drm/drm_edid.c | 8 +++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> > > index 631125b46e04..2da158ffed8e 100644
> > > --- a/drivers/gpu/drm/drm_edid.c
> > > +++ b/drivers/gpu/drm/drm_edid.c
> > > @@ -3094,6 +3094,7 @@ static int drm_cvt_modes(struct drm_connector
> > > *connector,
> > >
> > > for (i = 0; i < 4; i++) {
> > > int width, height;
> > > +   u8 cvt_aspect_ratio;
> > >
> > > cvt = &(timing->data.other_data.data.cvt[i]);
> > >
> > > @@ -3101,7 +3102,8 @@ static int drm_cvt_modes(struct drm_connector
> > > *connector,
> > > continue;
> > >
> > > > height = (cvt->code[0] + ((cvt->code[1] & 0xf0) << 4) + 1) * 2;
> > > -   switch (cvt->code[1] & 0x0c) {
> > > +   cvt_aspect_ratio = cvt->code[1] & 0x0c;
> > > +   switch (cvt_aspect_ratio) {
> > > case 0x00:
> > > width = height * 4 / 3;
> > > break;
> > > @@ -3114,6 +3116,10 @@ static int drm_cvt_modes(struct drm_connector
> > > *connector,
> > > case 0x0c:
> > > width = height * 15 / 9;
> > > break;
> > > +   default:
> >
> > What value would cvt->code[1] have such that this gets hit?
> >
> > Or is this a "compiler is broken, so let's add more code" situation?
> > If so, perhaps the code added could just be enough to silence the
> > compiler (unreachable, etc)?
>
> I mean, this information comes from the EDID which inherently means it's
> coming from an untrusted source so the value could be literally anything as
> long as the EDID has a valid checksum. Note (assuming I'm understanding this
> code correctly):
>
> drm_add_edid_modes() → add_cvt_modes() → drm_for_each_detailed_block() →
> do_cvt_mode() → drm_cvt_modes()
>
> So afaict this isn't a broken compiler but a legitimate uninitialized
> variable.

The value can be anything, but it has to be something. The switch is
on "unknown & 0x0c", so only 4 cases are possible, which are
enumerated in the switch.

  -ilia
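
The masking argument above can be checked exhaustively. What follows is a standalone sketch, not code from drm_edid.c: cvt_width_for() and check_all_bytes() are hypothetical names, and the 0x04 and 0x08 ratios are assumed from the upstream switch, since the hunks quoted in this thread only show the 0x00 and 0x0c cases.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the switch in drm_cvt_modes(). The 0x04 and
 * 0x08 ratios are assumptions; only 0x00 and 0x0c appear in the thread.
 * Returns -1 if no case matched.
 */
int cvt_width_for(uint8_t code1, int height)
{
    switch (code1 & 0x0c) {
    case 0x00:
        return height * 4 / 3;
    case 0x04:
        return height * 16 / 9;
    case 0x08:
        return height * 16 / 10;
    case 0x0c:
        return height * 15 / 9;
    }
    return -1;
}

/* Walk every possible byte value and confirm one of the four cases hit. */
void check_all_bytes(void)
{
    for (int v = 0; v < 256; v++)
        assert(cvt_width_for((uint8_t)v, 1080) > 0);
}
```

Calling check_all_bytes() never trips the assert, which is the point being made: whatever the EDID byte holds, "byte & 0x0c" can only be 0x00, 0x04, 0x08 or 0x0c, so a default branch is there for the compiler's benefit only.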

Re: [PATCH] drm/edid: Fix uninitialized variable in drm_cvt_modes()

2020-10-22 Thread Ilia Mirkin

On Thu, Oct 22, 2020 at 12:55 PM Lyude Paul  wrote:
>
> Noticed this when trying to compile with -Wall on a kernel fork. We
> potentially don't set width here, which causes the compiler to complain
> about width potentially being uninitialized in drm_cvt_modes(). So, let's
> fix that.
>
> Signed-off-by: Lyude Paul 
>
> Cc:  # v5.9+
> Fixes: 3f649ab728cd ("treewide: Remove uninitialized_var() usage")
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/drm_edid.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 631125b46e04..2da158ffed8e 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -3094,6 +3094,7 @@ static int drm_cvt_modes(struct drm_connector *connector,
>
> for (i = 0; i < 4; i++) {
> int width, height;
> +   u8 cvt_aspect_ratio;
>
> cvt = &(timing->data.other_data.data.cvt[i]);
>
> @@ -3101,7 +3102,8 @@ static int drm_cvt_modes(struct drm_connector *connector,
> continue;
>
> height = (cvt->code[0] + ((cvt->code[1] & 0xf0) << 4) + 1) * 2;
> -   switch (cvt->code[1] & 0x0c) {
> +   cvt_aspect_ratio = cvt->code[1] & 0x0c;
> +   switch (cvt_aspect_ratio) {
> case 0x00:
> width = height * 4 / 3;
> break;
> @@ -3114,6 +3116,10 @@ static int drm_cvt_modes(struct drm_connector *connector,
> case 0x0c:
> width = height * 15 / 9;
> break;
> +   default:

What value would cvt->code[1] have such that this gets hit?

Or is this a "compiler is broken, so let's add more code" situation?
If so, perhaps the code added could just be enough to silence the
compiler (unreachable, etc)?

  -ilia

Re: [Nouveau] nouveau broken on Riva TNT2 in 5.9.0-rc8: GPU not supported on big-endian

2020-10-09 Thread Ilia Mirkin

On Fri, Oct 9, 2020 at 5:54 PM Karol Herbst  wrote:
>
> On Fri, Oct 9, 2020 at 11:35 PM Ondrej Zary  wrote:
> >
> > Hello,
> > I'm testing 5.9.0-rc8 and found that Riva TNT2 stopped working:
> > [0.00] Linux version 5.9.0-rc8+ (zary@gsql) (gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #326 SMP Fri Oct 9 22:31:40 CEST 2020
> > ...
> > [   14.771464] nouveau :01:00.0: GPU not supported on big-endian
> > [   14.771782] nouveau: probe of :01:00.0 failed with error -38
> >
> > big-endian? WTF? The machine is x86.
> >
>
> mhh, we reworked the endianness checks a bit and apparently that broke
> something... I will give it some thought, but could you be so kind
> and create an mmiotrace under 5.9 with nouveau? You won't need to
> start X or anything while doing it. Just enable the trace and modprobe
> nouveau and collect the trace.

Looks like nvkm_device_endianness unconditionally reads out 0x4. I
don't think that reg is there pre-NV11. At least NV4, NV5, NV10 and
maybe NV15 (which is logically pre-NV11) don't support big-endian
mode. Not sure about NV1A, which was the IGP of the series and IIRC
logically pre-NV11 as well (but clearly could only be used with x86
chips, since it was part of the motherboard).

Aha, it's documented in rnndb:

https://github.com/envytools/envytools/blob/master/rnndb/bus/pmc.xml

  -ilia

Re: [PATCH] drm/nouveau/kms/nv50-: Fix clock checking algorithm in nv50_dp_mode_valid()

2020-09-25 Thread Ilia Mirkin

On Fri, Sep 25, 2020 at 6:08 PM Lyude Paul  wrote:
>
> On Tue, 2020-09-22 at 17:22 -0400, Ilia Mirkin wrote:
> > On Tue, Sep 22, 2020 at 5:14 PM Lyude Paul  wrote:
> > > On Tue, 2020-09-22 at 17:10 -0400, Ilia Mirkin wrote:
> > > > Can we use 6bpc on arbitrary DP monitors, or is there a capability for
> > > > it? Maybe only use 6bpc if display_info.bpc == 6 and otherwise use 8?
> > >
> > > I don't think that display_info.bpc actually implies a minimum bpc, only a
> > > maximum bpc iirc (Ville would know the answer to this one). The other
> > > thing to note here is that we want to assume the lowest possible bpc here
> > > since we're only concerned if the mode passed to ->mode_valid can be set
> > > under -any- conditions (including those that require lowering the bpc
> > > beyond its maximum value), so we definitely do want to always use 6bpc
> > > here even once we get support for optimizing the bpc based on the
> > > available display bandwidth.
> >
> > Yeah, display_info is the max bpc. But would an average monitor
> > support 6bpc? And if it does, does the current link training code even
> > try that when display_info.bpc != 6?
>
> So I did confirm that 6bpc support is mandatory for DP, so yes, 6bpc will
> always work.
>
> But also, your second comment doesn't really apply here. So: to be clear,
> we're not really concerned here about whether nouveau will actually use 6bpc
> or not. In truth I'm not actually sure either if we have any code that uses
> 6bpc (iirc we don't), since we don't currently optimize bpc. I think it's very
> possible for us to use 6bpc for eDP displays if I recall though, but I'm not
> sure on that.
>
> But that's also not the point of this code. ->mode_valid() is only used in two
> situations in DRM modesetting: when probing connector modes, and when checking
> if a mode is valid or not during the atomic check for atomic modesetting. Its
> purpose is only to reject display modes that are physically impossible to set
> in hardware due to static hardware constraints. Put another way, we only check
> the given mode against constraints which will always remain constant
> regardless of the rest of the display state. An example of a static constraint
> would be the max pixel clock supported by the hardware, since on sensible
> hardware this never changes. A dynamic constraint would be something like how
> much bandwidth is currently unused on an MST topology, since that value is
> entirely dependent on the rest of the display state.
>
> So - with that said, bpc is technically a dynamic constraint because while a
> sink and source both likely have their own bpc limits, any bpc which is equal
> or below that limit can be used depending on what the driver decides - which
> will be based on the max_bpc property, and additionally for MST displays it
> will also depend on the available bandwidth on the topology. The only
> non-dynamic thing about bpc is that at a minimum, it will be 6 - so any mode
> that doesn't fit on the link with a bpc of 6 is guaranteed to be a mode that
> we'll never be able to set and therefore want to prune.
>
> So, even if we're not using 6 in the majority of situations, I'm fairly
> confident it's the right value here. It's also what i915 does as well (and
> they previously had to fix a bug that was the result of assuming a minimum of
> 8bpc instead of 6).

Here's the situation I'm trying to avoid:

1. Mode is considered "OK" from a bandwidth perspective @6bpc
2. Modesetting logic never tries 6bpc and the modeset fails

As long as the two bits of logic agree on what is settable, I'm happy.

Cheers,

  -ilia
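
The two-step failure described above can be made concrete with the arithmetic from the patch under review. This is only a sketch: dp_mode_fits() and validation_disagrees() are hypothetical helpers, not nouveau functions, and the units are assumed to be whatever nouveau stores in mode->clock and outp->dp.link_bw.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the check in the quoted patch: the mode's
 * data rate at a given bpp must fit within link_nr * link_bw.
 */
bool dp_mode_fits(unsigned int mode_clock, unsigned int link_nr,
                  unsigned int link_bw, unsigned int bpp)
{
    /* Guard the multiplication below against unsigned overflow. */
    assert(bpp > 0 && mode_clock <= UINT_MAX / bpp);

    unsigned int max_clock = link_nr * link_bw;
    unsigned int clock = mode_clock * bpp / 8;

    return clock <= max_clock;
}

/*
 * True when a mode passes validation at 6bpc (18 bpp) but would not fit
 * at the bpp the modeset path actually programs.
 */
bool validation_disagrees(unsigned int mode_clock, unsigned int link_nr,
                          unsigned int link_bw, unsigned int modeset_bpp)
{
    return dp_mode_fits(mode_clock, link_nr, link_bw, 18) &&
           !dp_mode_fits(mode_clock, link_nr, link_bw, modeset_bpp);
}
```

With made-up numbers (mode_clock 200000, two lanes, link_bw 270000), the mode fits at 18 bpp but not at 24 bpp, so validation_disagrees() returns true for a modeset path that only ever tries 8bpc. That gap is the case where mode_valid() says "OK" and the modeset then fails.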

Re: [PATCH] drm/nouveau/kms/nv50-: Fix clock checking algorithm in nv50_dp_mode_valid()

2020-09-22 Thread Ilia Mirkin

On Tue, Sep 22, 2020 at 5:14 PM Lyude Paul  wrote:
>
> On Tue, 2020-09-22 at 17:10 -0400, Ilia Mirkin wrote:
> > Can we use 6bpc on arbitrary DP monitors, or is there a capability for
> > it? Maybe only use 6bpc if display_info.bpc == 6 and otherwise use 8?
>
> I don't think that display_info.bpc actually implies a minimum bpc, only a
> maximum bpc iirc (Ville would know the answer to this one). The other thing to
> note here is that we want to assume the lowest possible bpc here since we're
> only concerned if the mode passed to ->mode_valid can be set under -any-
> conditions (including those that require lowering the bpc beyond its maximum
> value), so we definitely do want to always use 6bpc here even once we get
> support for optimizing the bpc based on the available display bandwidth.

Yeah, display_info is the max bpc. But would an average monitor
support 6bpc? And if it does, does the current link training code even
try that when display_info.bpc != 6?

  -ilia

Re: [PATCH] drm/nouveau/kms/nv50-: Fix clock checking algorithm in nv50_dp_mode_valid()

2020-09-22 Thread Ilia Mirkin

Can we use 6bpc on arbitrary DP monitors, or is there a capability for
it? Maybe only use 6bpc if display_info.bpc == 6 and otherwise use 8?

On Tue, Sep 22, 2020 at 5:06 PM Lyude Paul  wrote:
>
> While I thought I had this correct (since it actually did reject modes
> like I expected during testing), Ville Syrjala from Intel pointed out
> that the logic here isn't correct. max_clock refers to the max symbol
> rate supported by the encoder, so limiting clock to ds_clock using max()
> doesn't make sense. Additionally, we want to check against 6bpc for the
> time being since that's the minimum possible bpc here, not the reported
> bpc from the connector. See:
>
> https://lists.freedesktop.org/archives/dri-devel/2020-September/280276.html
>
> For more info.
>
> So, let's rewrite this using Ville's advice.
>
> Signed-off-by: Lyude Paul 
> Fixes: 409d38139b42 ("drm/nouveau/kms/nv50-: Use downstream DP clock limits for mode validation")
> Cc: Ville Syrjälä 
> Cc: Lyude Paul 
> Cc: Ben Skeggs 
> ---
>  drivers/gpu/drm/nouveau/nouveau_dp.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dp.c b/drivers/gpu/drm/nouveau/nouveau_dp.c
> index 7b640e05bd4cd..24c81e423d349 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dp.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dp.c
> @@ -231,23 +231,26 @@ nv50_dp_mode_valid(struct drm_connector *connector,
>const struct drm_display_mode *mode,
>unsigned *out_clock)
>  {
> -   const unsigned min_clock = 25000;
> -   unsigned max_clock, ds_clock, clock;
> +   const unsigned int min_clock = 25000;
> +   unsigned int max_clock, ds_clock, clock;
> +   const u8 bpp = 18; /* 6 bpc */
> enum drm_mode_status ret;
>
> if (mode->flags & DRM_MODE_FLAG_INTERLACE && !outp->caps.dp_interlace)
> return MODE_NO_INTERLACE;
>
> max_clock = outp->dp.link_nr * outp->dp.link_bw;
> -   ds_clock = drm_dp_downstream_max_dotclock(outp->dp.dpcd,
> - outp->dp.downstream_ports);
> -   if (ds_clock)
> -   max_clock = min(max_clock, ds_clock);
> -
> -   clock = mode->clock * (connector->display_info.bpc * 3) / 10;
> -   ret = nouveau_conn_mode_clock_valid(mode, min_clock, max_clock,
> -   &clock);
> +   clock = mode->clock * bpp / 8;
> +   if (clock > max_clock)
> +   return MODE_CLOCK_HIGH;
> +
> +   ds_clock = drm_dp_downstream_max_dotclock(outp->dp.dpcd, outp->dp.downstream_ports);
> +   if (ds_clock && mode->clock > ds_clock)
> +   return MODE_CLOCK_HIGH;
> +
> +   ret = nouveau_conn_mode_clock_valid(mode, min_clock, max_clock, &clock);
> if (out_clock)
> *out_clock = clock;
> +
> return ret;
>  }
> --
> 2.26.2
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [Nouveau] 2dd4d163cd9c ("drm/nouveau: remove open-coded version of remove_conflicting_pci_framebuffers()")

2020-06-18 Thread Ilia Mirkin

Hi Boris,

There was a fixup to that patch that you'll also have to revert first
-- 7dbbdd37f2ae7dd4175ba3f86f4335c463b18403. I guess there's some
subtle difference between the old open-coded logic and the helper,
even though they were supposed to be identical.

Cheers,

  -ilia

On Thu, Jun 18, 2020 at 4:09 PM Borislav Petkov  wrote:
>
> Hi,
>
> my test box won't boot 5.8-rc1 all the way but stops at
>
> ...
> fb0: switching to nouveaufb from EFI VGA
> <-- EOF
>
> I've bisected it to the commit in $Subject, see below. Unfortunately, it
> doesn't revert cleanly so I can't really do the final test of reverting
> it ontop of 5.8-rc1 to confirm that this one is really causing it.
>
> Any ideas?
>
> GPU is:
>
> [5.678614] fb0: switching to nouveaufb from EFI VGA
> [5.685577] Console: switching to colour dummy device 80x25
> [5.691865] nouveau :03:00.0: NVIDIA GT218 (0a8c00b1)
> [5.814409] nouveau :03:00.0: bios: version 70.18.83.00.08
> [5.823559] nouveau :03:00.0: fb: 512 MiB DDR3
> [6.096680] [TTM] Zone  kernel: Available graphics memory: 8158364 KiB
> [6.103327] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
> [6.109951] [TTM] Initializing pool allocator
> [6.114405] [TTM] Initializing DMA pool allocator
> [6.119256] nouveau :03:00.0: DRM: VRAM: 512 MiB
> [6.124285] nouveau :03:00.0: DRM: GART: 1048576 MiB
> [6.129677] nouveau :03:00.0: DRM: TMDS table version 2.0
> [6.135534] nouveau :03:00.0: DRM: DCB version 4.0
> [6.140755] nouveau :03:00.0: DRM: DCB outp 00: 02000360 
> [6.147273] nouveau :03:00.0: DRM: DCB outp 01: 02000362 00020010
> [6.153782] nouveau :03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> [6.160292] nouveau :03:00.0: DRM: DCB outp 03: 01011380 
> [6.166810] nouveau :03:00.0: DRM: DCB outp 04: 08011382 00020010
> [6.173306] nouveau :03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> [6.179829] nouveau :03:00.0: DRM: DCB conn 00: 00101064
> [6.185553] nouveau :03:00.0: DRM: DCB conn 01: 00202165
> [6.196145] nouveau :03:00.0: DRM: MM: using COPY for buffer copies
> [6.233659] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [6.311939] nouveau :03:00.0: DRM: allocated 1920x1080 fb: 0x7, bo (ptrval)
> [6.320736] fbcon: nouveaudrmfb (fb0) is primary device
> [6.392722] tsc: Refined TSC clocksource calibration: 3591.346 MHz
> [6.392788] clocksource: tsc: mask: 0x max_cycles: 0x33c46403b59, max_idle_ns: 440795293818 ns
> [6.392930] clocksource: Switched to clocksource tsc
> [6.509946] Console: switching to colour frame buffer device 240x67
> [6.546287] nouveau :03:00.0: fb0: nouveaudrmfb frame buffer device
> [6.555021] [drm] Initialized nouveau 1.3.1 20120801 for :03:00.0 on minor 0
>
> Thx.
>
> git bisect start
> # good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7
> git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
> # bad: [b3a9e3b9622ae10064826dccb4f7a52bd88c7407] Linux 5.8-rc1
> git bisect bad b3a9e3b9622ae10064826dccb4f7a52bd88c7407
> # bad: [ee01c4d72adffb7d424535adf630f2955748fa8b] Merge branch 'akpm' (patches from Andrew)
> git bisect bad ee01c4d72adffb7d424535adf630f2955748fa8b
> # bad: [16d91548d1057691979de4686693f0ff92f46000] Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
> git bisect bad 16d91548d1057691979de4686693f0ff92f46000
> # good: [cfa3b8068b09f25037146bfd5eed041b78878bee] Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> git bisect good cfa3b8068b09f25037146bfd5eed041b78878bee
> # good: [3fd911b69b3117e03181262fc19ae6c3ef6962ce] Merge tag 'drm-misc-next-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
> git bisect good 3fd911b69b3117e03181262fc19ae6c3ef6962ce
> # bad: [1966391fa576e1fb2701be8bcca197d8f72737b7] mm/migrate.c: attach_page_private already does the get_page
> git bisect bad 1966391fa576e1fb2701be8bcca197d8f72737b7
> # good: [43c8546bcd854806736d8a635a0d696504dd4c21] drm/amdgpu: Add a UAPI flag for user to call mem_sync
> git bisect good 43c8546bcd854806736d8a635a0d696504dd4c21
> # good: [6cf991611bc72c077f0cc64e23987341ad7ef41e] Merge tag 'drm-intel-next-2020-05-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
> git bisect good 6cf991611bc72c077f0cc64e23987341ad7ef41e
> # bad: [dc455f4c888365595c0a13da445e092422d55b8d] drm/nouveau/dispnv50: fix runtime pm imbalance on error
> git bisect bad dc455f4c888365595c0a13da445e092422d55b8d
> # bad: [2dd4d163cd9c15432524aa9863155bc03a821361] drm/nouveau: remove open-coded version of remove_conflicting_pci_framebuffers()
> git bisect bad 2dd4d163cd9c15432524aa9863155bc03a821361
> # good: [c41219fda6e04255c44d37fd2c0d898c1c46abf1] Merge tag 'drm-intel-next-fixes-2020-05-20' of
>

Re: [Nouveau] NVIDIA GP107 (137000a1) - acr: failed to load firmware

2020-06-04 Thread Ilia Mirkin

On Thu, Jun 4, 2020 at 12:04 PM Zeno Davatz  wrote:
>
> Thank you, Ilia
>
> On Thu, Jun 4, 2020 at 5:25 PM Ilia Mirkin  wrote:
>
> > There's a lot more firmware files than that ... everything in the
> > gp107 directory. Also this would only be necessary if nouveau is built
> > into the kernel. The files just have to be available whenever nouveau
> > is loaded -- if it's built in, that means the firmware has to be baked
> > into the kernel too. If it's loaded from initrd, that means the
> > firmware has to be in initrd. If it's loaded after boot, then the
> > firmware has to be available after boot.
>
> For the time being I got it working by removing all nouveau selections
> in "make menuconfig" and by emerging "x11-drivers/nvidia-drivers"
> Version 440.82.
>
> Back on the latest Linux Kernel. Feels great ;).
>
> Linux zenogentoo 5.7.0 #84 SMP Thu Jun 4 17:47:15 CEST 2020 x86_64
> Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux

Not sure why you bother asking questions when you're just going to
dump nouveau anyways. This is the second time I've answered your
questions on this very topic, I think it'll be the last too.

Cheers,

  -ilia

Re: [Nouveau] NVIDIA GP107 (137000a1) - acr: failed to load firmware

2020-06-04 Thread Ilia Mirkin

On Thu, Jun 4, 2020 at 11:16 AM Zeno Davatz  wrote:
>
> On Thu, Jun 4, 2020 at 4:36 PM Ilia Mirkin  wrote:
> >
> > Starting with kernel 5.6, loading nouveau without firmware (for GPUs
> > where it is required, such as yours) got broken.
> >
> > You are loading nouveau without firmware, so it fails.
> >
> > The firmware needs to be available to the kernel at the time of nouveau 
> > loading.
>
> Ok, I am now trying this:
>
> /usr/src/linux> grep FIRMWARE /usr/src/linux/.config
> CONFIG_FIRMWARE_MEMMAP=y
> # CONFIG_GOOGLE_FIRMWARE is not set
> CONFIG_PREVENT_FIRMWARE_BUILD=y
> CONFIG_EXTRA_FIRMWARE="nvidia/gp107/gr/sw_nonctx.bin"
> # CONFIG_CYPRESS_FIRMWARE is not set
> # CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
> # CONFIG_FIRMWARE_EDID is not set
> # CONFIG_TEST_FIRMWARE is not set

There's a lot more firmware files than that ... everything in the
gp107 directory. Also this would only be necessary if nouveau is built
into the kernel. The files just have to be available whenever nouveau
is loaded -- if it's built in, that means the firmware has to be baked
into the kernel too. If it's loaded from initrd, that means the
firmware has to be in initrd. If it's loaded after boot, then the
firmware has to be available after boot.

Cheers,

  -ilia

Re: [Nouveau] NVIDIA GP107 (137000a1) - acr: failed to load firmware

2020-06-04 Thread Ilia Mirkin

Starting with kernel 5.6, loading nouveau without firmware (for GPUs
where it is required, such as yours) got broken.

You are loading nouveau without firmware, so it fails.

The firmware needs to be available to the kernel at the time of nouveau loading.

Cheers,

  -ilia

On Thu, Jun 4, 2020 at 10:24 AM Zeno Davatz  wrote:
>
> Hi
>
> With Kernel 5.7 I am still getting this, while booting:
>
> ~> uname -a
> Linux zenogentoo 5.7.0 #80 SMP Thu Jun 4 16:10:03 CEST 2020 x86_64
> Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux
> ~> dmesg |grep nouveau
> [0.762872] nouveau :05:00.0: NVIDIA GP107 (137000a1)
> [0.875311] nouveau :05:00.0: bios: version 86.07.42.00.4a
> [0.875681] nouveau :05:00.0: acr: failed to load firmware
> [0.875780] nouveau :05:00.0: acr: failed to load firmware
> [0.875881] nouveau :05:00.0: acr ctor failed, -2
> [0.875980] nouveau: probe of :05:00.0 failed with error -2
>
> Old thread is here: https://lkml.org/lkml/2020/4/3/775
>
> My Linux-Firmware is: linux-firmware-20200421
>
> This used to work fine with Kernel 5.5.
>
> Please CC me for replies.
>
> best
> Zeno
> ___
> Nouveau mailing list
> nouv...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH] nouveau: add fbdev dependency

2020-05-27 Thread Ilia Mirkin

Isn't this already fixed by

https://cgit.freedesktop.org/drm/drm/commit/?id=7dbbdd37f2ae7dd4175ba3f86f4335c463b18403

On Wed, May 27, 2020 at 9:43 AM Arnd Bergmann  wrote:
>
> Calling directly into the fbdev stack only works when the
> fbdev layer is built into the kernel as well, or both are
> loadable modules:
>
> drivers/gpu/drm/nouveau/nouveau_drm.o: in function `nouveau_drm_probe':
> nouveau_drm.c:(.text+0x1f90): undefined reference to 
> `remove_conflicting_pci_framebuffers'
>
> The change seems to have been intentional, so add an explicit
> dependency here but allow it to still be compiled if FBDEV
> is completely disabled.
>
> Fixes: 2dd4d163cd9c ("drm/nouveau: remove open-coded version of remove_conflicting_pci_framebuffers()")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/gpu/drm/nouveau/Kconfig   | 1 +
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
> index 980ed09bd7f6..8c640f003358 100644
> --- a/drivers/gpu/drm/nouveau/Kconfig
> +++ b/drivers/gpu/drm/nouveau/Kconfig
> @@ -18,6 +18,7 @@ config DRM_NOUVEAU
> select THERMAL if ACPI && X86
> select ACPI_VIDEO if ACPI && X86
> select SND_HDA_COMPONENT if SND_HDA_CORE
> +   depends on FBDEV || !FBDEV
> help
>   Choose this option for open-source NVIDIA support.
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index eb10c80ed853..e8560444ab57 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -697,7 +697,8 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
> nvkm_device_del();
>
> /* Remove conflicting drivers (vesafb, efifb etc). */
> -   ret = remove_conflicting_pci_framebuffers(pdev, "nouveaufb");
> +   if (IS_ENABLED(CONFIG_FBDEV))
> +   ret = remove_conflicting_pci_framebuffers(pdev, "nouveaufb");
> if (ret)
> return ret;
>
> --
> 2.26.2
>

Re: [PATCH v3 3/5] drm/nouveau/kms/gv100-: Add support for interlaced modes

2020-05-11 Thread Ilia Mirkin

On Mon, May 11, 2020 at 6:42 PM Lyude Paul  wrote:
> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
> index 43bcbb6d73c4..6dae00da5d7e 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_connector.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
> @@ -1065,7 +1065,7 @@ nouveau_connector_mode_valid(struct drm_connector *connector,
> return get_slave_funcs(encoder)->mode_valid(encoder, mode);
> case DCB_OUTPUT_DP:
> if (mode->flags & DRM_MODE_FLAG_INTERLACE &&
> -   !nv_encoder->dp.caps.interlace)
> +   !nv_encoder->caps.dp_interlace)
> return MODE_NO_INTERLACE;
>
> max_clock  = nv_encoder->dp.link_nr;

You probably meant for this hunk to go into an earlier patch.

  -ilia

Re: [PATCH v2 1/4] drm/komeda: Add a new helper drm_color_ctm_s31_32_to_qm_n()

2019-10-14 Thread Ilia Mirkin

On Mon, Oct 14, 2019 at 9:16 PM james qian wang (Arm Technology China)
 wrote:
> On Mon, Oct 14, 2019 at 11:58:48AM -0400, Ilia Mirkin wrote:
> > On Fri, Oct 11, 2019 at 1:43 AM james qian wang (Arm Technology China)
> >  wrote:
> > > + *
> > > + * Convert and clamp S31.32 sign-magnitude to Qm.n 2's complement.
> > > + */
> > > +uint64_t drm_color_ctm_s31_32_to_qm_n(uint64_t user_input,
> > > + uint32_t m, uint32_t n)
> > > +{
> > > +   u64 mag = (user_input & ~BIT_ULL(63)) >> (32 - n);
> > > +   bool negative = !!(user_input & BIT_ULL(63));
> > > +   s64 val;
> > > +
> > > +   /* the range of signed 2s complement is [-2^n+m, 2^n+m - 1] */
> >
> > This implies that n = 32, m = 0 would actually yield a 33-bit 2's
> > complement number. Is that what you meant?
>
> Yes, since m doesn't include the sign bit, a Q0.32 is a 33-bit value.

This goes counter to what the Wikipedia page says [
https://en.wikipedia.org/wiki/Q_(number_format) ]:

(reformatted slightly for text-only consumption):

"""
For example, a Q15.1 format number:

- requires 15+1 = 16 bits
- its range is [-2^14, 2^14 - 2^-1] = [-16384.0, +16383.5] = [0x8000,
0x8001 ... 0xFFFF, 0x0000, 0x0001 ... 0x7FFE, 0x7FFF]
- its resolution is 2^-1 = 0.5
"""

This suggests that the proper way to represent a standard 32-bit 2's
complement integer would be Q32.0.

  -ilia
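
To put numbers on the convention quoted above (m counts the sign bit), here is a small sketch. struct q_range and q_format_range() are hypothetical names made up for illustration; they are not part of the DRM helper under review. Under the other reading, where m excludes the sign bit, the same Qm.n label would need m + n + 1 bits, which is where the 33-bit Q0.32 comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Properties of a Qm.n format where m includes the sign bit. */
struct q_range {
    unsigned int bits;   /* total storage width: m + n */
    double min;          /* -2^(m-1) */
    double max;          /* 2^(m-1) - 2^-n */
    double resolution;   /* 2^-n */
};

struct q_range q_format_range(unsigned int m, unsigned int n)
{
    struct q_range r;

    /* Keep the shifts below well defined. */
    assert(m >= 1 && m + n <= 63);

    double half = (double)(1ULL << (m - 1));

    r.bits = m + n;
    r.resolution = 1.0 / (double)(1ULL << n);
    r.min = -half;
    r.max = half - r.resolution;
    return r;
}
```

q_format_range(15, 1) gives 16 bits, a range of -16384.0 to +16383.5 and a resolution of 0.5, matching the Q15.1 example; q_format_range(32, 0) gives the usual 32-bit two's complement integer range.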

Re: [PATCH 13/36] drm/nouveau: use bpp instead of cpp for drm_format_info

2019-09-23 Thread Ilia Mirkin

On Mon, Sep 23, 2019 at 8:56 AM Sandy Huang  wrote:
>
> cpp[BytePerPlane] can't describe the 10-bit data format correctly,
> so use bpp[BitPerPlane] instead of cpp.
>
> Signed-off-by: Sandy Huang 
> ---
>  drivers/gpu/drm/nouveau/dispnv04/crtc.c | 7 ---
>  drivers/gpu/drm/nouveau/dispnv50/base507c.c | 4 ++--
>  drivers/gpu/drm/nouveau/dispnv50/ovly507e.c | 2 +-
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index f22f010..59d2f07 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -874,11 +874,12 @@ nv04_crtc_do_mode_set_base(struct drm_crtc *crtc,
>
> /* Update the framebuffer location. */
> regp->fb_start = nv_crtc->fb.offset & ~3;
> -   regp->fb_start += (y * drm_fb->pitches[0]) + (x * drm_fb->format->cpp[0]);
> +   regp->fb_start += (y * drm_fb->pitches[0]) +
> +   (x * drm_fb->format->bpp[0] / 8);
> nv_set_crtc_base(dev, nv_crtc->index, regp->fb_start);
>
> /* Update the arbitration parameters. */
> -   nouveau_calc_arb(dev, crtc->mode.clock, drm_fb->format->cpp[0] * 8,
> +   nouveau_calc_arb(dev, crtc->mode.clock, drm_fb->format->bpp[0],
>  _burst, _lwm);
>
> regp->CRTC[NV_CIO_CRE_FF_INDEX] = arb_burst;
> @@ -1238,7 +1239,7 @@ nv04_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
>
> /* Initialize a page flip struct */
> *s = (struct nv04_page_flip_state)
> -   { { }, event, crtc, fb->format->cpp[0] * 8, fb->pitches[0],
> +   { { }, event, crtc, fb->format->bpp[0], fb->pitches[0],
>   new_bo->bo.offset };
>
> /* Keep vblanks on during flip, for the target crtc of this flip */
> diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
> index d5e295c..59883bd0 100644
> --- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
> @@ -190,12 +190,12 @@ base507c_acquire(struct nv50_wndw *wndw, struct nv50_wndw_atom *asyw,
> return ret;
>
> if (!wndw->func->ilut) {
> -   if ((asyh->base.cpp != 1) ^ (fb->format->cpp[0] != 1))
> +   if (asyh->base.cpp != 1 ^ fb->format->bpp[0] != 8)

Please leave the parens in. Even if it works out to the same thing
(don't know), ^ vs != ordering isn't fresh in many people's minds
(mine included).
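
For reference, C ranks the equality operators above bitwise XOR, so both spellings should parse the same way. The sketch below uses hypothetical function names and sample values rather than the driver's real data; compilers may still warn about the unparenthesized form, which is one more reason to keep the parens.

```c
#include <assert.h>
#include <stdbool.h>

/* Explicit grouping, as in the original driver code. */
bool flag_with_parens(int cpp, int bpp)
{
    return (cpp != 1) ^ (bpp != 8);
}

/* No parens: != binds tighter than ^, so the grouping is unchanged. */
bool flag_without_parens(int cpp, int bpp)
{
    return cpp != 1 ^ bpp != 8;
}

/* Compare both spellings over a few sample cpp/bpp combinations. */
void check_same_result(void)
{
    static const int cpps[] = { 1, 2, 4 };
    static const int bpps[] = { 8, 16, 32 };

    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            assert(flag_with_parens(cpps[i], bpps[j]) ==
                   flag_without_parens(cpps[i], bpps[j]));
}
```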

> asyh->state.color_mgmt_changed = true;
> }
>
> asyh->base.depth = fb->format->depth;
> -   asyh->base.cpp = fb->format->cpp[0];
> +   asyh->base.cpp = fb->format->bpp[0] / 8;
> asyh->base.x = asyw->state.src.x1 >> 16;
> asyh->base.y = asyw->state.src.y1 >> 16;
> asyh->base.w = asyw->state.fb->width;
> diff --git a/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c b/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c
> index cc41766..c6c2e0b 100644
> --- a/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/ovly507e.c
> @@ -135,7 +135,7 @@ ovly507e_acquire(struct nv50_wndw *wndw, struct nv50_wndw_atom *asyw,
> if (ret)
> return ret;
>
> -   asyh->ovly.cpp = fb->format->cpp[0];
> +   asyh->ovly.cpp = fb->format->bpp[0] / 8;
> return 0;
>  }
>
> --
> 2.7.4
>

Re: [PATCH 4.19 092/190] drm/nouveau: Dont WARN_ON VCPI allocation failures

2019-09-13 Thread Ilia Mirkin

On Fri, Sep 13, 2019 at 11:01 AM Sasha Levin  wrote:
>
> On Fri, Sep 13, 2019 at 03:54:56PM +0100, Greg Kroah-Hartman wrote:
> >On Fri, Sep 13, 2019 at 10:46:27AM -0400, Sasha Levin wrote:
> >> On Fri, Sep 13, 2019 at 09:33:36AM -0400, Ilia Mirkin wrote:
> >> > Hi Greg,
> >> >
> >> > This feels like it's missing a From: line.
> >> >
> >> > commit b513a18cf1d705bd04efd91c417e79e4938be093
> >> > Author: Lyude Paul 
> >> > Date:   Mon Jan 28 16:03:50 2019 -0500
> >> >
> >> >drm/nouveau: Don't WARN_ON VCPI allocation failures
> >> >
> >> > Is this an artifact of your notification-of-patches process and I
> >> > never noticed before, or was the patch ingested incorrectly?
> >>
> >> It was always like this for patches that came through me. Greg's script
> >> generates an explicit "From:" line in the patch, but I never saw the
> >> value in that since git does the right thing by looking at the "From:"
> >> line in the mail header.
> >>
> >> The right thing is being done in stable-rc and for the releases. For
> >> your example here, this is what it looks like in the stable-rc tree:
> >>
> >> commit bdcc885be68289a37d0d063cd94390da81fd8178
> >> Author: Lyude Paul 
> >> AuthorDate: Mon Jan 28 16:03:50 2019 -0500
> >> Commit: Greg Kroah-Hartman 
> >> CommitDate: Fri Sep 13 14:05:29 2019 +0100
> >>
> >>drm/nouveau: Don't WARN_ON VCPI allocation failures
> >
> >Yeah, we should fix your scripts to put the explicit From: line in here
> >as we are dealing with patches in this format and it causes confusion at
> >times (like now.)  It's not the first time and that's why I added those
> >lines to the patches.
>
> Heh, didn't think anyone cared about this scenario for the stable-rc
> patches.
>
> I'll go add it.
>
> But... why do you actually care?

Just a hygiene thing. Everyone else sends patches the normal way, with
accurate attribution. Why should stable be different?

(I was surprised to see Greg contributing to nouveau when I first saw
the patch. But then realized it was the stable ingestion
notification.)

  -ilia

Re: [PATCH 4.19 092/190] drm/nouveau: Dont WARN_ON VCPI allocation failures

2019-09-13 Thread Ilia Mirkin

Hi Greg,

This feels like it's missing a From: line.

commit b513a18cf1d705bd04efd91c417e79e4938be093
Author: Lyude Paul 
Date:   Mon Jan 28 16:03:50 2019 -0500

drm/nouveau: Don't WARN_ON VCPI allocation failures

Is this an artifact of your notification-of-patches process and I
never noticed before, or was the patch ingested incorrectly?

Cheers,

  -ilia

On Fri, Sep 13, 2019 at 9:16 AM Greg Kroah-Hartman
 wrote:
>
> [ Upstream commit b513a18cf1d705bd04efd91c417e79e4938be093 ]
>
> This is much louder then we want. VCPI allocation failures are quite
> normal, since they will happen if any part of the modesetting process is
> interrupted by removing the DP MST topology in question. So just print a
> debugging message on VCPI failures instead.
>
> Signed-off-by: Lyude Paul 
> Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 multi-stream")
> Cc: Ben Skeggs 
> Cc: dri-de...@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Cc:  # v4.10+
> Signed-off-by: Ben Skeggs 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/nouveau/dispnv50/disp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> index f889d41a281fa..5e01bfb69d7a3 100644
> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> @@ -759,7 +759,8 @@ nv50_msto_enable(struct drm_encoder *encoder)
>
> slots = drm_dp_find_vcpi_slots(&mstm->mgr, mstc->pbn);
> r = drm_dp_mst_allocate_vcpi(&mstm->mgr, mstc->port, mstc->pbn, slots);
> -   WARN_ON(!r);
> +   if (!r)
> +   DRM_DEBUG_KMS("Failed to allocate VCPI\n");
>
> if (!mstm->links++)
> nv50_outp_acquire(mstm->outp);
> --
> 2.20.1

Re: [Nouveau] [PATCH 20/22] mm: move hmm_vma_fault to nouveau

2019-07-03 Thread Ilia Mirkin

On Wed, Jul 3, 2019 at 1:49 PM Ralph Campbell  wrote:
> On 6/30/19 11:20 PM, Christoph Hellwig wrote:
> > hmm_vma_fault is marked as a legacy API to get rid of, but quite suites
> > the current nouvea flow.  Move it to the only user in preparation for
>
> I didn't quite parse the phrase "quite suites the current nouvea flow."
> s/nouvea/nouveau/

As long as you're fixing typos, suites -> suits.

Re: nouveau: DRM: GPU lockup - switching to software fbcon

2019-06-19 Thread Ilia Mirkin

On Wed, Jun 19, 2019 at 1:48 AM Sergey Senozhatsky
 wrote:
>
> On (06/19/19 01:20), Ilia Mirkin wrote:
> > On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky
> >  wrote:
> > >
> > > On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > > > dmesg
> > > >
> > > >  nouveau :01:00.0: DRM: GPU lockup - switching to software fbcon
> > > >  nouveau :01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > > >  nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
> > > >  nouveau :01:00.0: fifo: channel 5: killed
> > > >  nouveau :01:00.0: fifo: engine 6: scheduled for recovery
> > > >  nouveau :01:00.0: fifo: engine 0: scheduled for recovery
> > > >  nouveau :01:00.0: firefox[476]: channel 5 killed!
> > > >  nouveau :01:00.0: firefox[476]: failed to idle channel 5 
> > > > [firefox[476]]
> > > >
> > > > It lockups several times a day. Twice in just one hour today.
> > > > Can we fix this?
> > >
> > > Unusable
> >
> > Are you using a GTX 660 by any chance? You've provided rather minimal
> > system info.
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)

Quite literally the same GPU I have plugged in...

02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B
[GeForce GT 730] [10de:1287] (rev a1)

Works great here! Only other thing I can think of is that I avoid
applications with the letters "G" and "K" in their names, and I'm
using xf86-video-nouveau ddx, whereas you might be using the "modeset"
ddx with glamor.

If all else fails, just remove nouveau_dri.so and/or boot with
nouveau.noaccel=1 -- should be perfect.

Cheers,

  -ilia

Re: drm/nouveau/bios/ramcfg, setting of RON pull value

2019-02-16 Thread Ilia Mirkin

On Sat, Feb 16, 2019 at 10:02 AM Colin Ian King
 wrote:
>
> Hi,
>
> Static Analysis with CoverityScan has detected an issue with the setting
> of the RON pull value in function nvkm_gddr3_calc in
> drm/nouveau/bios/ramcfg.c
>
> This was introduced by commit: c25bf7b6155cb ("drm/nouveau/bios/ramcfg:
> Separate out RON pull value")
>
> CoverityScan reports the issue as follows:
>
>  84case 0x20:
>  85CWL = (ram->next->bios.timing[1] & 0x0f80) >> 7;
>  86CL  = (ram->next->bios.timing[1] & 0x001f) >> 0;
>  87WR  = (ram->next->bios.timing[2] & 0x007f) >> 16;
>  88/* XXX: Get these values from the VBIOS instead */
>  89DLL = !(ram->mr[1] & 0x1);
>
>CID 1324005 (#1 of 1): Operands don't affect result
> (CONSTANT_EXPRESSION_RESULT)
>
> result_independent_of_operands: !(ram->mr[1] & 768) >> 8 is 0 regardless
> of the values of its operands. This occurs as the operand of assignment.
>
>  90RON = !(ram->mr[1] & 0x300) >> 8;
>  91break;
>
> Looking at this, I believe perhaps the correct setting could be:
>
> RON = !((ram->mr[1] & 0x300) >> 8);
>
> ..however I don't have the datasheet available for the H/W so I'm not
> sure if this is a correct fix.

Actually looking at the code a bit, I suspect it should just be

RON = (ram->mr[1] & 0x300) >> 8;

since later on, when we recompose the MR (memory register) value, we do:

ram->mr[1] |= (RON & 0x03) << 8;

(And the whole point here is that we don't know how to get the proper
RON value for that timing table version, so we just copy whatever used
to be there in that case.)

  -ilia
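
The operator-precedence point in the message above can be checked with a
small userspace sketch. It is not the driver code: the function names
are hypothetical, and clearing the field before recomposing is an
assumption of the sketch rather than something the thread shows. It
illustrates that '!' binds tighter than '>>', so the flagged expression
always yields 0, while the plain extraction round-trips through the
quoted recomposition.

```c
#include <assert.h>
#include <stdint.h>

/* The flagged expression: '!' binds tighter than '>>', so the 0-or-1
 * result of '!' is shifted right by 8 and the value is always 0. */
static uint32_t ron_flagged(uint32_t mr1)
{
	return !(mr1 & 0x300) >> 8;
}

/* Plain extraction of bits 9:8, as suggested in the reply. */
static uint32_t ron_extract(uint32_t mr1)
{
	return (mr1 & 0x300) >> 8;
}

/* Recomposition as quoted in the reply. Clearing the field first is an
 * assumption of this sketch, not something the thread shows. */
static uint32_t mr1_recompose(uint32_t mr1, uint32_t ron)
{
	mr1 &= ~UINT32_C(0x300);
	mr1 |= (ron & 0x03) << 8;
	return mr1;
}

/* Hypothetical self-check: every 2-bit RON value survives a round trip,
 * and the flagged expression never reports anything but 0. */
static void ron_self_check(void)
{
	for (uint32_t field = 0; field < 4; field++) {
		uint32_t mr1 = UINT32_C(0x0034) | (field << 8);

		assert(ron_flagged(mr1) == 0);
		assert(ron_extract(mr1) == field);
		assert(mr1_recompose(mr1, ron_extract(mr1)) == mr1);
	}
}
```

Calling ron_self_check() from a test program exercises all four
possible field values.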

Re: Nouveau module X server not starting on a NP900X5N Kaby Lake machine

2019-01-01 Thread Ilia Mirkin

On Tue, Jan 1, 2019 at 5:30 PM Jan Vlietland  wrote:
>
> Hi Ilia, many thanks for answering my mail.
>
> Tonight I tried to see what happens if I generate a xorg.conf file and place
> it in /etc/X11/xorg.conf, as described here:
> https://askubuntu.com/questions/4662/where-is-the-x-org-config-file-how-do-i-configure-x-there
>
> When I do that X starts without the framebuffer error. X starts with a 
> backtrace list in the shell and then stops with the error:
>
> 'Segmentation fault at address 0x0.'
>
> Fatal server error etc etc etc.

Unless you're an advanced user, you'll get the best results by not
supplying a manual xorg.conf. Generically, this indicates that you
messed something up. Without knowing precisely what the contents of
that file are, it would be difficult to say what exactly went wrong.
However I wouldn't advise this path without a good reason.

>
> Hope this helps!
>
> In fact the above is part of a much bigger issue I have with the
> machine. When I enable the i915 module (Kaby lake native video) my
> screen goes black after a while. The machine is totally stuck in that
> state. Even ssh connection is not possible. It shows no errors in the
> (saved) logs after restarting the machine.
>
> So I disabled the i915 module and try to get the nvidia card running.
> Without any luck.
>
> Thank you for inviting me to irc.freenode.net. What is the process to
> get access?

It's an IRC network like any other. More info about the network
available at https://freenode.net/

It's open to anyone... #nouveau for nouveau, #intel-gfx for intel.

>
> I have included the full dmesg in zip format.
>
> For me it is a showstopper using the machine with Linux. I really do not
> understand that I am the only person on this planet that cannot run
> Linux on a plain vanilla Kaby lake machine.

I don't know the specifics of your laptop, but on many other GM108M
laptops, the displays are only attached to the Intel GPU. So running
without i915 is just not an option, if you want anything displayed.
You would be able to use the GM108M chip for 3D acceleration if you
chose, but nothing to do with actual display.

If your screen goes black with i915 loaded, I suspect that you'd be
better served reporting this issue to Intel.

From your logs, you also appear to have a variety of
nomodeset/.modeset=0 combinations -- these will just impede the proper
mode of operation. The i915 and nouveau drivers effectively do nothing
under those conditions.

Cheers,

  -ilia
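
As a rough illustration of the last point in the message above, the
sketch below scans a kernel command line string (for example the
contents of /proc/cmdline) for "nomodeset" or for any token ending in
".modeset=0". The function names are hypothetical, and treating every
"<name>.modeset=0" token the same way is an assumption, since the
message does not show which driver names were involved.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the token [tok, tok + len) is "nomodeset" or has the
 * form "<name>.modeset=0" with a non-empty <name>. */
static int token_disables_kms(const char *tok, size_t len)
{
	static const char suffix[] = ".modeset=0";
	const size_t slen = sizeof(suffix) - 1;

	if (len == strlen("nomodeset") && memcmp(tok, "nomodeset", len) == 0)
		return 1;
	return len > slen && memcmp(tok + len - slen, suffix, slen) == 0;
}

/* Walk a whitespace-separated command line without modifying it. */
static int cmdline_disables_kms(const char *cmdline)
{
	const char *p = cmdline;

	while (*p) {
		size_t len;

		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of next token */
		if (len && token_disables_kms(p, len))
			return 1;
		p += len;
	}
	return 0;
}
```

A nonzero result only says such a parameter is present on the command
line; it says nothing about why it was added in the first place.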

Re: Nouveau module results in total lockups without any dmesg trace on a NP900X5N Kaby Lake machine

2019-01-01 Thread Ilia Mirkin

On Tue, Jan 1, 2019 at 4:06 PM Jan Vlietland  wrote:
>
> Hi Ben, David and Daniel ,
>
> First of all happy new year. Based on advice of Greg K-H herewith a mail
> about a number of Nouveau issues with my laptop.
>
> I installed various Kali linux versions up to Linux 4.20.0-rc7
> (downloaded, compiled and installed) on a Samsung NP900X5N laptop and
> have an issue with the driver after loading.
>
> My configuration:
>
> - i7 7500
> - 16 gb / 256 gb ssd
> - nvidia 940MX (for 3D graphics)
>
> When I enable loading of the nouveau module for my Nvidia 3D card I get
> three MMIO faults:
>
> [   35.984104] nouveau :01:00.0: bus: MMIO read of  FAULT at 6013d4 [ IBUS ]
> [   35.997510] nouveau :01:00.0: bus: MMIO read of  FAULT at 10ac08 [ IBUS ]
> [   37.551790] nouveau :01:00.0: bus: MMIO read of  FAULT at 619444 [ IBUS ]
>
> I see currently various discussions on bugzilla (as summarized by Bruno
> Pagani): https://bugs.freedesktop.org/show_bug.cgi?id=100423.
>
> But I do not see any confirmed solutions on the MMIO faults.
>
> The module is loaded but X server cannot run in framebuffer mode. I
> assume that the module does not provide any video memory to X to run in
> graphics mode.
>
> First of all I would like to understand what the faults imply.
> And I would also like to help by testing fixes for the errors.

The faults are, generally, nothing to worry about, especially if they
occur infrequently. It's one bit or another of code that's poking at a
part of the GPU it shouldn't be touching.

To the best of my knowledge, 940MX (GM108) should work reasonably
well. Perhaps it would make most sense if you posted about some of
your other issues (usually GM108s have no outputs, so they are only
usable as offload devices). Feel free to join #nouveau on
irc.freenode.net to get more info as well.

Cheers,

  -ilia

Re: [Nouveau] [PATCH][next] drm/nouveau/disp: avoid potential overflow on shift of int value

2018-05-27 Thread Ilia Mirkin

On Sun, May 27, 2018 at 5:54 PM, Colin King  wrote:
> From: Colin Ian King 
>
> The constant values being shifted are 32 bit integers and may potentially
> overflow on the shift.  Avoid this potential overflow by making them
> unsigned long long values before the shift.
>
> Detected by CoverityScan, CID#1469383, 1469400 ("Unintentional
> integer overflow")
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c | 2 +-
>  drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c
> index 29e6dd58ac48..99b94802ed63 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/changf119.c
> @@ -52,7 +52,7 @@ void
>  gf119_disp_chan_intr(struct nv50_disp_chan *chan, bool en)
>  {
> struct nvkm_device *device = chan->disp->base.engine.subdev.device;
> -   const u64 mask = 0x0001 << chan->chid.user;
> +   const u64 mask = 0x0001ULL << chan->chid.user;

I'm pretty sure all of these should just be u32 (below as well). The
registers that this is masking are all 32-bit, more doesn't make
sense.

> if (!en) {
> nvkm_mask(device, 0x610090, mask, 0x);
> nvkm_mask(device, 0x6100a0, mask, 0x);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c
> index 57719f675eec..43ae3b092e43 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/channv50.c
> @@ -166,7 +166,7 @@ void
>  nv50_disp_chan_intr(struct nv50_disp_chan *chan, bool en)
>  {
> struct nvkm_device *device = chan->disp->base.engine.subdev.device;
> -   const u64 mask = 0x00010001 << chan->chid.user;
> +   const u64 mask = 0x00010001ULL << chan->chid.user;
> const u64 data = en ? 0x0001 : 0x;
> nvkm_mask(device, 0x610028, mask, data);
>  }
> --
> 2.17.0
>
> ___
> Nouveau mailing list
> nouv...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
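
The review comment in the message above (keep the masks 32-bit, since
the registers are) can be sketched in plain C. The helper names are
hypothetical, and the channel-id bounds are assumptions of the sketch;
the thread does not state the valid range. The point illustrated is
that shifting an unsigned 32-bit constant is well defined for every
in-range shift count, whereas shifting a plain int constant into the
sign bit is not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one bit per channel in a 32-bit register.
 * UINT32_C(1) is unsigned, so shifting into bit 31 is well defined;
 * the same shift on a plain int constant would overflow. */
static uint32_t chan_bit(unsigned int chid)
{
	assert(chid < 32);	/* assumed bound, not from the thread */
	return UINT32_C(1) << chid;
}

/* Hypothetical helper: the same bit in the low and high 16-bit halves,
 * following the 0x00010001 pattern visible in the patch. */
static uint32_t chan_bit_pair(unsigned int chid)
{
	assert(chid < 16);	/* keeps both bits inside 32 bits */
	return UINT32_C(0x00010001) << chid;
}
```

With unsigned 32-bit operands there is no need to widen to 64 bits just
to silence the overflow report, which matches the reviewer's suggestion.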

Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE

2018-04-28 Thread Ilia Mirkin

On Sat, Apr 28, 2018 at 7:02 PM, Michel Dänzer  wrote:
> On 2018-04-28 06:30 PM, Ilia Mirkin wrote:
>> On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer  wrote:
>>> From: Michel Dänzer 
>>>
>>> Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled)
>>> try to allocate huge pages. However, not all drivers can take advantage
>>> of huge pages, but they would incur the overhead for allocating and
>>> freeing them anyway.
>>>
>>> Now, drivers which can take advantage of huge pages need to set the new
>>> flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag
>>> no longer incur any overhead for allocating or freeing huge pages.
>>>
>>> v2:
>>> * Also guard swapping of consecutive pages in ttm_get_pages
>>> * Reword commit log, hopefully clearer now
>>>
>>> Cc: sta...@vger.kernel.org
>>> Signed-off-by: Michel Dänzer 
>>
>> Both I and lots of other people, based on reports, are still seeing
>> plenty of issues with this as late as 4.16.4.
>
> "lots of other people", "plenty of issues" sounds a bit exaggerated from
> what I've seen. FWIW, while I did see the original messages myself, I
> haven't seen any since Christian's original fix (see below), neither
> with amdgpu nor radeon, even before this patch you followed up to.

Probably a half-dozen reports of it with nouveau, in addition to
another bunch of people talking about it on the bug you mention below,
along with email threads on dri-devel.

I figured I didn't have to raise my own since it was identical to the
others, and, I assumed, was being handled.

>> Admittedly I'm on nouveau, but others have reported issues with
>> radeon/amdgpu as well. It's been going on since the feature was merged
>> in v4.15, with what seems like little investigation from the authors
>> introducing the feature.
>
> That's not a fair assessment. See
> https://bugs.freedesktop.org/show_bug.cgi?id=104082#c40 and following
> comments.
>
> Christian fixed the original issue in
> d0bc0c2a31c95002d37c3cc511ffdcab851b3256 "swiotlb: suppress warning when
> __GFP_NOWARN is set". Christian did his best to try and get the fix in
> before 4.15 final, but for reasons beyond his control, it was delayed
> until 4.16-rc1 and then backported to 4.15.5.

In case it's unclear, let me state this explicitly -- I totally get
that despite best intentions, bugs get introduced. I do it myself.
What I'm having trouble with is the handling once the issue is
discovered.

>
> Unfortunately, there was an swiotlb regression (not directly related to
> Christian's work) shortly after this fix, also in 4.16-rc1, which is now
> fixed in 4.17-rc1 and will be backported to 4.16.y.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.16.5=2c9dacf5bfe1e45d96dfe97cb71d2b717786a7b9

This guy? Didn't help. I'm running 4.16.4 right now.

> It looks like there's at least one more bug left, but it's not clear yet
> when that was introduced, whether it's directly related to Christian's
> work, or indeed what the impact is. Let's not get ahead of ourselves.

Whether it is directly related to that work or not, the issue
persists. There are two options:

 - When declaring things fixed, no serious attempt was actually made
at reproducing the underlying issues.
 - The authors truly can't reproduce the underlying issues users are
seeing and are taking stabs in the dark.

Given that a number of people are reporting problems, in either
scenario, the reasonable thing is to disable the feature, and figure
out what is going on. Maybe condition it on !CONFIG_SWIOTLB.

>> We now have *two* broken releases, v4.15 and v4.16 (anything that
>> spews error messages and stack traces ad-infinitum in dmesg is, by
>> definition, broken).
>
> I haven't seen any evidence that there's still an issue in 4.15, is
> there any?

Well, I did have a late 4.15 rc kernel in addition to the 'suppress
warning' patch. Now I'm questioning my memory of whether the issue was
resolved there or not. I'm pretty sure that 'not', but no longer 100%.
Either way, I think we all agree v4.15 was broken and more importantly
was *known* to be broken well in advance of the release. A reasonable
option would have been to disable the feature until the other bits
fell into place.

>> You're putting this behind a flag now (finally),
>
> I wrote this patch because I realized due to some remark I happened to
> see you make this week on IRC that the huge page support in TTM was
> enabled for all drivers. Instead of making that kind of remark on IRC,
> it would have been more constructive, and more conducive to quick
> implementation, to suggest making the feature no

Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE

2018-04-28 Thread Ilia Mirkin

On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer  wrote:
> From: Michel Dänzer 
>
> Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled)
> try to allocate huge pages. However, not all drivers can take advantage
> of huge pages, but they would incur the overhead for allocating and
> freeing them anyway.
>
> Now, drivers which can take advantage of huge pages need to set the new
> flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag
> no longer incur any overhead for allocating or freeing huge pages.
>
> v2:
> * Also guard swapping of consecutive pages in ttm_get_pages
> * Reword commit log, hopefully clearer now
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Michel Dänzer 

Both I and lots of other people, based on reports, are still seeing
plenty of issues with this as late as 4.16.4. Admittedly I'm on
nouveau, but others have reported issues with radeon/amdgpu as well.
It's been going on since the feature was merged in v4.15, with what
seems like little investigation from the authors introducing the
feature.

We now have *two* broken releases, v4.15 and v4.16 (anything that
spews error messages and stack traces ad-infinitum in dmesg is, by
definition, broken). You're putting this behind a flag now (finally),
but should it be enabled anywhere? Why is it being flipped on for
amdgpu by default, despite the still-existing problems?

Reverting this feature without just resetting back to the code in
v4.14 is painful, but why make Joe User suffer by enabling it while
you're still working out the kinks?

  -ilia
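
The opt-in scheme described in the quoted commit log can be sketched
as a small predicate. Everything here is hypothetical: the SKETCH_*
names, the flag value, and the build-time switches only stand in for
the kernel's flag and config options, whose definitions are not shown
in the thread. The extra SWIOTLB gate reflects the "!CONFIG_SWIOTLB"
idea raised elsewhere in this thread, not anything in the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real TTM_PAGE_FLAG_TRANSHUGE value is not
 * shown in the thread. */
#define SKETCH_PAGE_FLAG_TRANSHUGE (UINT32_C(1) << 0)

/* Hypothetical build-time switches standing in for kernel config
 * options. */
#ifndef SKETCH_HAVE_TRANSPARENT_HUGEPAGE
#define SKETCH_HAVE_TRANSPARENT_HUGEPAGE 1
#endif
#ifndef SKETCH_HAVE_SWIOTLB
#define SKETCH_HAVE_SWIOTLB 0
#endif

/* Decide whether an allocation should try the huge page path at all.
 * Callers that never set the flag skip the huge page path entirely. */
static bool sketch_want_huge_pages(uint32_t page_flags)
{
	if (!SKETCH_HAVE_TRANSPARENT_HUGEPAGE)
		return false;
	/* Extra gate suggested in the thread: off when SWIOTLB is built in. */
	if (SKETCH_HAVE_SWIOTLB)
		return false;
	return (page_flags & SKETCH_PAGE_FLAG_TRANSHUGE) != 0;
}
```

Under this shape, a driver that does nothing gets the pre-feature
behaviour, which is the property the objection above is asking for.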

Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Ilia Mirkin

On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mr...@linux.ee> wrote:
>>> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:
>>
>> NV5 in another PC (secondary card in x86-64) made the system crash on
>> boot, in nvkm_therm_clkgate_fini.
>
> Mind booting with nouveau.debug=trace? That should hopefully tell us
> more exactly which thing is dying. If you have a cross-compile/distcc
> setup handy, a bisect may be even more useful.

Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
somehow mis-hooked up for NV5 now. A bisect result would still make
the culprit a lot more obvious.

Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Ilia Mirkin

On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos  wrote:
>> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:
>
> NV5 in another PC (secondary card in x86-64) made the system crash on
> boot, in nvkm_therm_clkgate_fini.

Mind booting with nouveau.debug=trace? That should hopefully tell us
more exactly which thing is dying. If you have a cross-compile/distcc
setup handy, a bisect may be even more useful.

It's funny, I had a NV5 plugged into my desktop for testing, and
*just* took it out (because the box wouldn't even get to BIOS anymore
... although it was unrelated to the NV5, probably just something
mis-seated.)

  -ilia

Re: [RFC v2 3/4] drm/nouveau: Add support for BLCG on Kepler2

2018-01-25 Thread Ilia Mirkin

On Thu, Jan 25, 2018 at 10:35 PM, Lyude Paul  wrote:
> Same as the previous patch, but for Kepler2 now
>
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h  |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/device/base.c |  8 +--
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c    | 62
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/Kbuild     |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.c    | 71 +++
>  5 files changed, 139 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.c
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h 
> b/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> index adb78f7d083a..92be0e5269c6 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> @@ -75,6 +75,7 @@ int mcp89_fb_new(struct nvkm_device *, int, struct nvkm_fb 
> **);
>  int gf100_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gf108_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gk104_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
> +int gk110_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gk20a_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gm107_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gm200_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> index 74bd09b1c893..7590a30b7ff0 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> @@ -1812,7 +1812,7 @@ nvf0_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> @@ -1850,7 +1850,7 @@ nvf1_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> @@ -1888,7 +1888,7 @@ nv106_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> @@ -1926,7 +1926,7 @@ nv108_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> index a38e19b61c1d..38d3328e45f1 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> @@ -22,6 +22,7 @@
>   * Authors: Ben Skeggs 
>   */
>  #include "gf100.h"
> +#include "gk104.h"
>  #include "ctxgf100.h"
>
>  #include 
> @@ -156,6 +157,66 @@ gk110_gr_pack_mmio[] = {
> {}
>  };
>
> +const struct nvkm_therm_clkgate_init

These should all be static, no?

> +gk110_clkgate_blcg_init_sked_0[] = {
> +   { 0x407000, 1, 0x4041 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_gcc_0[] = {
> +   { 0x419020, 1, 0x0042 },
> +   { 0x419038, 1, 0x0042 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_l1c_0[] = {
> +   { 0x419cd4, 2, 0x4042 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_mp_0[] = {
> +   { 0x419fd0, 1, 0x4043 },
> +   { 0x419fd8, 1, 0x4049 },
> +   { 0x419fe0, 2, 0x4042 },
> +   { 0x419ff0, 1, 0x0046 },
> +   { 0x419ff8, 1, 0x4042 },
> +   { 0x419f90, 1, 0x4042 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_pack
> +gk110_clkgate_pack[] = {
> +   { gk104_clkgate_blcg_init_main_0 },
> +   { gk104_clkgate_blcg_init_rstr2d_0 },
> +   { gk104_clkgate_blcg_init_unk_0 },
> +   { gk104_clkgate_blcg_init_gcc_0 },
> +   { gk110_clkgate_blcg_init_sked_0 },
> +   { gk104_clkgate_blcg_init_unk_1 },
> +   { gk104_clkgate_blcg_init_gpc_ctxctl_0 },
> +   { gk104_clkgate_blcg_init_gpc_unk_0 },
> +   { gk104_clkgate_blcg_init_gpc_esetup_0 },
> +   { gk104_clkgate_blcg_init_gpc_tpbus_0 },
> +   { gk104_clkgate_blcg_init_gpc_zcull_0 },
> +   { gk104_clkgate_blcg_init_gpc_tpconf_0 },
> +   {

Re: [RFC v2 3/4] drm/nouveau: Add support for BLCG on Kepler2

2018-01-25 Thread Ilia Mirkin

On Thu, Jan 25, 2018 at 10:35 PM, Lyude Paul  wrote:
> Same as the previous patch, but for Kepler2 now
>
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h  |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/device/base.c |  8 +--
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c| 62 
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/Kbuild |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.c| 71 
> +++
>  5 files changed, 139 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/fb/gk110.c
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h 
> b/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> index adb78f7d083a..92be0e5269c6 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/fb.h
> @@ -75,6 +75,7 @@ int mcp89_fb_new(struct nvkm_device *, int, struct nvkm_fb 
> **);
>  int gf100_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gf108_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gk104_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
> +int gk110_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gk20a_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gm107_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
>  int gm200_fb_new(struct nvkm_device *, int, struct nvkm_fb **);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> index 74bd09b1c893..7590a30b7ff0 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
> @@ -1812,7 +1812,7 @@ nvf0_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> @@ -1850,7 +1850,7 @@ nvf1_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> @@ -1888,7 +1888,7 @@ nv106_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> @@ -1926,7 +1926,7 @@ nv108_chipset = {
> .bus = gf100_bus_new,
> .clk = gk104_clk_new,
> .devinit = gf100_devinit_new,
> -   .fb = gk104_fb_new,
> +   .fb = gk110_fb_new,
> .fuse = gf100_fuse_new,
> .gpio = gk104_gpio_new,
> .i2c = gk104_i2c_new,
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c 
> b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> index a38e19b61c1d..38d3328e45f1 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/gk110.c
> @@ -22,6 +22,7 @@
>   * Authors: Ben Skeggs 
>   */
>  #include "gf100.h"
> +#include "gk104.h"
>  #include "ctxgf100.h"
>
>  #include 
> @@ -156,6 +157,66 @@ gk110_gr_pack_mmio[] = {
> {}
>  };
>
> +const struct nvkm_therm_clkgate_init

These should all be static, no?

> +gk110_clkgate_blcg_init_sked_0[] = {
> +   { 0x407000, 1, 0x4041 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_gcc_0[] = {
> +   { 0x419020, 1, 0x0042 },
> +   { 0x419038, 1, 0x0042 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_l1c_0[] = {
> +   { 0x419cd4, 2, 0x4042 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_init
> +gk110_clkgate_blcg_init_gpc_mp_0[] = {
> +   { 0x419fd0, 1, 0x4043 },
> +   { 0x419fd8, 1, 0x4049 },
> +   { 0x419fe0, 2, 0x4042 },
> +   { 0x419ff0, 1, 0x0046 },
> +   { 0x419ff8, 1, 0x4042 },
> +   { 0x419f90, 1, 0x4042 },
> +   {}
> +};
> +
> +const struct nvkm_therm_clkgate_pack
> +gk110_clkgate_pack[] = {
> +   { gk104_clkgate_blcg_init_main_0 },
> +   { gk104_clkgate_blcg_init_rstr2d_0 },
> +   { gk104_clkgate_blcg_init_unk_0 },
> +   { gk104_clkgate_blcg_init_gcc_0 },
> +   { gk110_clkgate_blcg_init_sked_0 },
> +   { gk104_clkgate_blcg_init_unk_1 },
> +   { gk104_clkgate_blcg_init_gpc_ctxctl_0 },
> +   { gk104_clkgate_blcg_init_gpc_unk_0 },
> +   { gk104_clkgate_blcg_init_gpc_esetup_0 },
> +   { gk104_clkgate_blcg_init_gpc_tpbus_0 },
> +   { gk104_clkgate_blcg_init_gpc_zcull_0 },
> +   { gk104_clkgate_blcg_init_gpc_tpconf_0 },
> +   { gk104_clkgate_blcg_init_gpc_unk_1 },
> +   {
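
To illustrate the "static" remark above, here is a minimal userspace sketch, not the nouveau code. The struct layout, the reading of the middle field as a repeat count, the 4-byte stride, and the names clkgate_apply and log_write are all assumptions made up for illustration; only the table values are copied from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; field meanings are assumed. */
struct clkgate_init {
	uint32_t addr;  /* first register offset */
	uint8_t  count; /* consecutive registers to write; 0 ends the list */
	uint32_t data;  /* value written to each register */
};

struct clkgate_pack {
	const struct clkgate_init *init; /* NULL ends the pack */
};

#define REG_STRIDE 4u /* assumed register stride, illustration only */

/*
 * 'static' gives the tables internal linkage: they are only reachable
 * from this translation unit and do not add global symbols.
 */
static const struct clkgate_init blcg_init_sked_0[] = {
	{ 0x407000, 1, 0x4041 },
	{ 0 }
};

static const struct clkgate_init blcg_init_gpc_l1c_0[] = {
	{ 0x419cd4, 2, 0x4042 },
	{ 0 }
};

static const struct clkgate_pack blcg_pack[] = {
	{ blcg_init_sked_0 },
	{ blcg_init_gpc_l1c_0 },
	{ NULL }
};

/* Callback standing in for a register write. */
typedef void (*reg_write_fn)(uint32_t addr, uint32_t data, void *ctx);

/* Small recorder so the walk can be checked without hardware. */
struct write_log {
	uint32_t last_addr;
	uint32_t last_data;
	unsigned n;
};

static void log_write(uint32_t addr, uint32_t data, void *ctx)
{
	struct write_log *log = ctx;

	log->last_addr = addr;
	log->last_data = data;
	log->n++;
}

/* Walk every sentinel-terminated list in the pack; return the write count. */
static unsigned clkgate_apply(const struct clkgate_pack *pack,
			      reg_write_fn wr, void *ctx)
{
	unsigned writes = 0;

	assert(pack != NULL && wr != NULL);
	for (; pack->init; pack++) {
		for (const struct clkgate_init *i = pack->init; i->count; i++) {
			for (uint8_t n = 0; n < i->count; n++) {
				wr(i->addr + n * REG_STRIDE, i->data, ctx);
				writes++;
			}
		}
	}
	return writes;
}
```

A caller would do something like clkgate_apply(blcg_pack, log_write, &log). Since only the pack needs to be handed to the walker, the individual init lists have no reason to be visible outside the file unless another chipset reuses them (as the quoted patch does with the gk104_* lists).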

Re: nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152

2018-01-01 Thread Ilia Mirkin

On Sun, Dec 31, 2017 at 3:53 PM, Mike Galbraith <efa...@gmx.de> wrote:
> On Sun, 2017-12-31 at 13:27 -0500, Ilia Mirkin wrote:
>> On Tue, Dec 19, 2017 at 8:45 AM, Christian König
>> <ckoenig.leichtzumer...@gmail.com> wrote:
>> > Am 19.12.2017 um 11:39 schrieb Michel Dänzer:
>> >>
>> >> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
>> >>>
>> >>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
>> >>>>
>> >>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
>> >>>>>
>> >>>>> Greetings,
>> >>>>>
>> >>>>> Kernel bound workloads seem to trigger the below for whatever reason.
>> >>>>>I only see this when beating up NFS.  There was a kworker wakeup
>> >>>>> latency issue, but with a bandaid applied to fix that up, I can still
>> >>>>> trigger this.
>> >>>>
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> i have seen this one as well with my system, but i could not find an
>> >>>> easy way to trigger it for bisecting purpose. If you can trigger it
>> >>>> conveniently, a bisect would be nice!
>> >>>
>> >>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
>> >>> backup, creating memory pressure. I happen to have just finished
>> >>> bisecting, the result is:
>> >>>
>> >>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
>> >>> commit 648bc3574716400acc06f99915815f80d9563783
>> >>> Author: Christian König <christian.koe...@amd.com>
>> >>> Date:   Thu Jul 6 09:59:43 2017 +0200
>> >>>
>> >>>  drm/ttm: add transparent huge page support for DMA allocations v2
>> >>>
>> >>>  Try to allocate huge pages when it makes sense.
>> >>>
>> >>>  v2: fix comment and use ifdef
>> >>>
>> >>>
>> >> BTW, I haven't noticed any bad effects other than the dmesg splats, so
>> >> maybe it's just noise about transient failures for which there is a
>> >> proper fallback in place.
>> >
>> >
>> > Yeah, I think that is exactly what happens here.
>> >
>> > We try to allocate a huge page, but fail and so fall back to using multiple
>> > 4k pages instead.
>> >
>> > Going to send out a patch to suppress the warning.
>>
>> Hi Christian,
>>
>> Did you ever send out such a patch? I didn't see one on the list, but
>> perhaps I missed it. One definitely hasn't made it upstream yet. (I
>> just hit the issue myself with Linus's tree from last night.)
>
> Actually, that wants a bit more methinks, because while the stack dump
> goes away, you still get spammed, it just comes in smaller chunks.

OK, well this has to either be fixed or reverted. Right now it's
complaining all the time for me after like a day of uptime.

  -ilia
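
For reference, the "try a huge page, quietly fall back to 4k pages" behaviour described in the quoted text can be sketched in userspace C. This is only an illustration under stated assumptions: alloc_2m, release_pages, always_fail and the huge_alloc_fn hook are hypothetical names, malloc stands in for the real page allocators, and nothing here is the TTM or swiotlb code. Note that size=2097152 from the subject line is exactly 2 MiB, i.e. 512 pages of 4 KiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_PAGE_SIZE 4096u
#define HUGE_PAGE_SIZE  (2u * 1024u * 1024u) /* 2097152 bytes */
#define SMALL_PER_HUGE  (HUGE_PAGE_SIZE / SMALL_PAGE_SIZE) /* 512 */

struct page_set {
	void  *chunks[SMALL_PER_HUGE];
	size_t nchunks;
	bool   used_huge;
};

/* Hypothetical hook: returns NULL when no contiguous 2 MiB block is free. */
typedef void *(*huge_alloc_fn)(size_t size);

static void release_pages(struct page_set *ps)
{
	for (size_t i = 0; i < ps->nchunks; i++)
		free(ps->chunks[i]);
	ps->nchunks = 0;
	ps->used_huge = false;
}

/*
 * Try one 2 MiB allocation first. A failure there is expected under
 * memory pressure, so it is not reported; the code falls back to 512
 * small allocations and only fails if that also runs out of memory.
 */
static bool alloc_2m(struct page_set *ps, huge_alloc_fn try_huge)
{
	assert(ps != NULL);
	ps->nchunks = 0;
	ps->used_huge = false;

	void *huge = try_huge ? try_huge(HUGE_PAGE_SIZE) : NULL;
	if (huge) {
		ps->chunks[0] = huge;
		ps->nchunks = 1;
		ps->used_huge = true;
		return true;
	}

	for (size_t i = 0; i < SMALL_PER_HUGE; i++) {
		ps->chunks[i] = malloc(SMALL_PAGE_SIZE);
		if (!ps->chunks[i]) {
			release_pages(ps); /* genuine failure: undo and report */
			return false;
		}
		ps->nchunks++;
	}
	return true;
}

/* Test double that always reports "no huge page available". */
static void *always_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

The point of the shape is that the opportunistic attempt is the only place where a failure is silent; a failure of the fallback path is still a real error and is reported to the caller.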

Re: nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152

2017-12-31 Thread Ilia Mirkin

On Tue, Dec 19, 2017 at 8:45 AM, Christian König
 wrote:
> Am 19.12.2017 um 11:39 schrieb Michel Dänzer:
>>
>> On 2017-12-19 11:37 AM, Michel Dänzer wrote:
>>>
>>> On 2017-12-18 08:01 PM, Tobias Klausmann wrote:
>>>>
>>>> On 12/18/17 7:06 PM, Mike Galbraith wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Kernel bound workloads seem to trigger the below for whatever reason.
>>>>> I only see this when beating up NFS.  There was a kworker wakeup
>>>>> latency issue, but with a bandaid applied to fix that up, I can still
>>>>> trigger this.
>>>>
>>>>
>>>> Hi,
>>>>
>>>> i have seen this one as well with my system, but i could not find an
>>>> easy way to trigger it for bisecting purpose. If you can trigger it
>>>> conveniently, a bisect would be nice!
>>>
>>> I'm seeing this (with the amdgpu and radeon drivers) when restic takes a
>>> backup, creating memory pressure. I happen to have just finished
>>> bisecting, the result is:
>>>
>>> 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
>>> commit 648bc3574716400acc06f99915815f80d9563783
>>> Author: Christian König 
>>> Date:   Thu Jul 6 09:59:43 2017 +0200
>>>
>>>  drm/ttm: add transparent huge page support for DMA allocations v2
>>>
>>>  Try to allocate huge pages when it makes sense.
>>>
>>>  v2: fix comment and use ifdef
>>>
>>>
>> BTW, I haven't noticed any bad effects other than the dmesg splats, so
>> maybe it's just noise about transient failures for which there is a
>> proper fallback in place.
>
>
> Yeah, I think that is exactly what happens here.
>
> We try to allocate a huge page, but fail and so fall back to using multiple
> 4k pages instead.
>
> Going to send out a patch to suppress the warning.

Hi Christian,

Did you ever send out such a patch? I didn't see one on the list, but
perhaps I missed it. One definitely hasn't made it upstream yet. (I
just hit the issue myself with Linus's tree from last night.)

Thanks,

  -ilia

Re: [tip:x86/urgent] x86/mm/kmmio: Fix mmiotrace for page unaligned addresses

2017-12-12 Thread Ilia Mirkin

On Tue, Dec 12, 2017 at 9:43 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, Dec 12, 2017 at 09:21:10AM -0500, Ilia Mirkin wrote:
>> The "thing" being mmiotrace, or the "thing" being page-unaligned addresses?
>
> mmiotrace
>
>> If the former, its primary use-case is for snooping on the NVIDIA
>> proprietary GPU driver in order to figure out how to drive the
>> underlying hardware. The driver does ioremap's to get at PCI space,
>> which mmiotrace "steals" and provides pages without a present bit set,
>> along with a fault handler. When the fault handler is hit, it
>> reinstates the faulting page, and single-steps the faulting
>> instruction
>
> At which point you have valid page-tables and another CPU can access
> that page too.
>
>> reading the before/after regs to determine what happened
>> (doesn't work universally, but enough for instructions used for PCI
>> MMIO accesses). See mmio-mod.c::pre and post (the latter is called
>> from the debug handler).
>
> And after that you only invalidate the TLBs for the CPU that took the
> initial fault, leaving possibly stale TLBs on other CPUs.
>
>
> So this 'thing' has huge gaping SMP holes in.

Sure does! Probably why the following happens when mmiotrace is enabled:

void enable_mmiotrace(void)
{
	mutex_lock(&mmiotrace_mutex);
	if (is_enabled())
		goto out;

	if (nommiotrace)
		pr_info("MMIO tracing disabled.\n");
	kmmio_init();
	enter_uniprocessor();
	spin_lock_irq(&trace_lock);
	atomic_inc(&mmiotrace_enabled);
	spin_unlock_irq(&trace_lock);
	pr_info("enabled.\n");
out:
	mutex_unlock(&mmiotrace_mutex);
}

Re: [tip:x86/urgent] x86/mm/kmmio: Fix mmiotrace for page unaligned addresses

2017-12-12 Thread Ilia Mirkin

On Tue, Dec 12, 2017 at 9:04 AM, Ingo Molnar  wrote:
>
> * Peter Zijlstra  wrote:
>
>> On Tue, Dec 12, 2017 at 02:55:30AM -0800, tip-bot for Karol Herbst wrote:
>> > Commit-ID:  6d60ce384d1d5ca32b595244db4077a419acc687
>> > Gitweb: 
>> > https://git.kernel.org/tip/6d60ce384d1d5ca32b595244db4077a419acc687
>> > Author: Karol Herbst 
>> > AuthorDate: Mon, 27 Nov 2017 08:51:39 +0100
>> > Committer:  Ingo Molnar 
>> > CommitDate: Mon, 11 Dec 2017 15:35:18 +0100
>> >
>> > x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
>>
>> OK, let me hijack this thread since apparently people use and care about
>> mmiotrace.
>>
>> I was recently auditing the x86 tlb flushing and ran across this
>> 'thing'. Can someone please explain to me how this is supposed to work
>> and how its not completely broken?

The "thing" being mmiotrace, or the "thing" being page-unaligned addresses?

If the former, its primary use-case is for snooping on the NVIDIA
proprietary GPU driver in order to figure out how to drive the
underlying hardware. The driver does ioremap's to get at PCI space,
which mmiotrace "steals" and provides pages without a present bit set,
along with a fault handler. When the fault handler is hit, it
reinstates the faulting page, and single-steps the faulting
instruction reading the before/after regs to determine what happened
(doesn't work universally, but enough for instructions used for PCI
MMIO accesses). See mmio-mod.c::pre and post (the latter is called
from the debug handler).

You may be interested in reading
Documentation/trace/mmiotrace.txt::How Mmiotrace Works

Cheers,

  -ilia
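
The arm / fault / single-step / re-arm cycle described above can be modelled with a toy in plain C. This is a sketch under loud assumptions: there are no real page tables or debug traps here, the "present" flag merely stands in for the PTE present bit, and struct traced_page, arm, traced_read and traced_write are hypothetical names. Only pre and post echo the handler names mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_WORDS 1024 /* 4 KiB page viewed as 32-bit registers */
#define TRACE_MAX  16

struct trace_entry {
	uint32_t offset;
	uint32_t value;
	bool     is_write;
};

struct traced_page {
	uint32_t regs[PAGE_WORDS]; /* stand-in for the MMIO page */
	bool     present;          /* models the PTE present bit */
	unsigned faults;
	struct trace_entry log[TRACE_MAX];
	unsigned nlog;
};

/* Arm tracing: clear "present" so the next access faults. */
static void arm(struct traced_page *p)
{
	p->present = false;
}

/* Fault-handler side: reinstate the page so the access can be retried. */
static void pre(struct traced_page *p)
{
	p->faults++;
	p->present = true;
}

/* Debug-trap side: record what the stepped access did, then re-arm. */
static void post(struct traced_page *p, uint32_t offset, uint32_t value,
		 bool is_write)
{
	if (p->nlog < TRACE_MAX)
		p->log[p->nlog++] = (struct trace_entry){ offset, value, is_write };
	p->present = false;
}

static void traced_write(struct traced_page *p, uint32_t offset, uint32_t value)
{
	assert(offset % 4 == 0 && offset / 4 < PAGE_WORDS);
	bool faulted = !p->present;

	if (faulted)
		pre(p);
	/*
	 * Between pre() and post() the page is present. In the real
	 * mechanism this is the window in which another CPU could touch
	 * the page without being traced.
	 */
	p->regs[offset / 4] = value; /* the "single-stepped" instruction */
	if (faulted)
		post(p, offset, value, true);
}

static uint32_t traced_read(struct traced_page *p, uint32_t offset)
{
	assert(offset % 4 == 0 && offset / 4 < PAGE_WORDS);
	bool faulted = !p->present;

	if (faulted)
		pre(p);
	uint32_t value = p->regs[offset / 4];
	if (faulted)
		post(p, offset, value, false);
	return value;
}
```

In the model, an access to an armed page costs one fault and produces one log entry, and the page is unmapped again once the access has completed.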

Re: Blank console but X11 works on MCP79 - old regression since 3.8

2017-11-20 Thread Ilia Mirkin

On Sat, Nov 18, 2017 at 12:23 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org> 
>> wrote:
>>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>>>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary
>>>>
>>>> <li...@rainbow-software.org> wrote:
>>>> > @@ -483,8 +483,8 @@
>>>> >  nouveau :02:00.0: disp:0860:  -> 0500
>>>> >  nouveau :02:00.0: disp:0864: 
>>>> >  nouveau :02:00.0: disp:0868:  -> 04000500
>>>> > -nouveau :02:00.0: disp:086c:  -> 00100500
>>>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>>>> > +nouveau :02:00.0: disp:086c:  -> 00100a00
>>>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>>>> >  nouveau :02:00.0: disp:0874:  -> 
>>>> >  nouveau :02:00.0: disp:0878: 
>>>> >  nouveau :02:00.0: disp:0880: 0500
>>>> >
>>>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>>>> > 64MB case. Why?
>>>> >
>>>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>>>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>>>> > memory.
>>>> >
>>>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>>>
>>>> OK, well that makes a *ton* of sense (8bpp being broken).
>>>>
>>>> I think the idea of bpp reduction is that when you're on your shiny
>>>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>>>> all that to a pinned fbcon - almost half of that would go to a single
>>>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>>>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>>>
>>>> The specific limits could probably use tweaking - I think they only
>>>> consider VRAM size, not the fb size.
>>>>
>>>> I guess 8bpp worked prior to the change you bisected though, so we
>>>> should figure out what we did wrong in the new code.
>>>
>>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.
>>
>> By the way, instead of booting $kernel, you can use modetest from
>> libdrm/tests. Not sure if it supports C8 though =/
>
> It didn't. But it does now - I mailed a patch to dri-devel, also (with
> slight fix) available at
>
> https://people.freedesktop.org/~imirkin/patches/0001-modetest-add-C8-support-to-generate-SMPTE-pattern.patch
>
> This works on GK208 but not on G92 (whose display unit is much closer
> to your MCP79's). You can run as
>
> ./modetest -s DVI-I-1:1920x1200@C8
>
> This should display a SMPTE pattern, and exit when you hit enter. When
> it does so, it doesn't restore fbcon, but you can switch to another
> vty to get console back.
>
> I get a white picture on G92. Now just have to figure out how to fix
> it. Someone should also test on a G80 if possible, since that takes a
> different path as well.

Someone tested out a GF100 and it had the same issue.

I've since determined that the color is that of the first entry in the
LUT. With the above program, it's (192, 192, 192) which looks white.

  -ilia
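
To make the LUT observation concrete, here is a small sketch of how a C8 (8-bit indexed) pixel is meant to resolve to a colour, next to a model of the symptom reported above. The "stuck" variant is only a model of what the screen looks like, not a claim about what the hardware or driver does internally; struct rgb and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rgb {
	uint8_t r, g, b;
};

/* Intended C8 behaviour: the pixel byte is an index into a 256-entry LUT. */
static struct rgb c8_lookup(const struct rgb lut[256], uint8_t index)
{
	assert(lut != NULL);
	return lut[index];
}

/*
 * Model of the symptom: whatever the pixel byte is, the output is the
 * colour of LUT entry 0, so the whole picture is a single colour.
 */
static struct rgb c8_lookup_stuck(const struct rgb lut[256], uint8_t index)
{
	assert(lut != NULL);
	(void)index;
	return lut[0];
}
```

With entry 0 set to (192, 192, 192), as in the message, the stuck model returns light grey for every pixel, which matches a picture that "looks white".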

Re: Blank console but X11 works on MCP79 - old regression since 3.8

2017-11-17 Thread Ilia Mirkin

On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org> 
> wrote:
>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary
>>>
>>> <li...@rainbow-software.org> wrote:
>>> > @@ -483,8 +483,8 @@
>>> >  nouveau :02:00.0: disp:0860:  -> 0500
>>> >  nouveau :02:00.0: disp:0864: 
>>> >  nouveau :02:00.0: disp:0868:  -> 04000500
>>> > -nouveau :02:00.0: disp:086c:  -> 00100500
>>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>>> > +nouveau :02:00.0: disp:086c:  -> 00100a00
>>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>>> >  nouveau :02:00.0: disp:0874:  -> 
>>> >  nouveau :02:00.0: disp:0878: 
>>> >  nouveau :02:00.0: disp:0880: 0500
>>> >
>>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>>> > 64MB case. Why?
>>> >
>>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>>> > memory.
>>> >
>>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>>
>>> OK, well that makes a *ton* of sense (8bpp being broken).
>>>
>>> I think the idea of bpp reduction is that when you're on your shiny
>>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>>> all that to a pinned fbcon - almost half of that would go to a single
>>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>>
>>> The specific limits could probably use tweaking - I think they only
>>> consider VRAM size, not the fb size.
>>>
>>> I guess 8bpp worked prior to the change you bisected though, so we
>>> should figure out what we did wrong in the new code.
>>
>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.
>
> By the way, instead of booting $kernel, you can use modetest from
> libdrm/tests. Not sure if it supports C8 though =/

It didn't. But it does now - I mailed a patch to dri-devel, also (with
slight fix) available at

https://people.freedesktop.org/~imirkin/patches/0001-modetest-add-C8-support-to-generate-SMPTE-pattern.patch

This works on GK208 but not on G92 (whose display unit is much closer
to your MCP79's). You can run as

./modetest -s DVI-I-1:1920x1200@C8

This should display a SMPTE pattern, and exit when you hit enter. When
it does so, it doesn't restore fbcon, but you can switch to another
vty to get console back.

I get a white picture on G92. Now just have to figure out how to fix
it. Someone should also test on a G80 if possible, since that takes a
different path as well.

  -ilia

Re: Blank console but X11 works on MCP79 - old regression since 3.8

2017-11-17 Thread Ilia Mirkin

On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org> 
> wrote:
>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary
>>>
>>> <li...@rainbow-software.org> wrote:
>>> > @@ -483,8 +483,8 @@
>>> >  nouveau :02:00.0: disp:0860:  -> 0500
>>> >  nouveau :02:00.0: disp:0864: 
>>> >  nouveau :02:00.0: disp:0868:  -> 04000500
>>> > -nouveau :02:00.0: disp:086c:  -> 00100500
>>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>>> > +nouveau :02:00.0: disp:086c:  -> 00100a00
>>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>>> >  nouveau :02:00.0: disp:0874:  -> 
>>> >  nouveau :02:00.0: disp:0878: 
>>> >  nouveau :02:00.0: disp:0880: 0500
>>> >
>>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>>> > 64MB case. Why?
>>> >
>>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>>> > memory.
>>> >
>>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>>
>>> OK, well that makes a *ton* of sense (8bpp being broken).
>>>
>>> I think the idea of bpp reduction is that when you're on your shiny
>>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>>> all that to a pinned fbcon - almost half of that would go to a single
>>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>>
>>> The specific limits could probably use tweaking - I think they only
>>> consider VRAM size, not the fb size.
>>>
>>> I guess 8bpp worked prior to the change you bisected though, so we
>>> should figure out what we did wrong in the new code.
>>
>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.
>
> By the way, instead of booting $kernel, you can use modetest from
> libdrm/tests. Not sure if it supports C8 though =/
>
> I think the issue is this:
>
> -   OUT_RING(evo, nv_crtc->lut.depth == 8 ?
> -   NV50_EVO_CRTC_CLUT_MODE_OFF :
> -   NV50_EVO_CRTC_CLUT_MODE_ON);
>
> Whereas now we always set 0xC0000000 (aka "ON").

In case I was being unclear, I'm talking about

https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nv50_display.c#L1808

and surrounding items. Looks like lut_clr sets 0x40000000 which was
previously not used. Not sure what the difference between that and
0x00000000 is. This is what we have in rnndb for it:

https://github.com/envytools/envytools/blob/master/rnndb/display/nv_evo.xml#L408

So bit 30 is mode, set is "high res", unset is "low res". So really
what we want is 0x80000000 which leaves the LUT enabled but uses the
low-res mode?

All this could use some playing-around with.
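
To make the bit layout concrete, here's a small sketch that decodes the
values in question, read as full 32-bit words (0xC0000000, 0x80000000,
0x40000000, 0x00000000). The macro and function names are made up for
illustration and are not from nouveau or rnndb. Bit 30 as the mode bit is
what rnndb says; treating bit 31 as the LUT enable is an assumption that
follows from the reasoning in this thread, not something verified on
hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the discussion. */
#define CLUT_ENABLE   (UINT32_C(1) << 31)  /* assumed: LUT enabled */
#define CLUT_MODE_HI  (UINT32_C(1) << 30)  /* rnndb: set = "high res", unset = "low res" */

static bool clut_enabled(uint32_t v)
{
	return (v & CLUT_ENABLE) != 0;
}

static bool clut_high_res(uint32_t v)
{
	return (v & CLUT_MODE_HI) != 0;
}

/* Check that the values from the thread decode the way the text describes. */
static void clut_check_thread_values(void)
{
	/* what the new code always sets: enabled, high res */
	assert(clut_enabled(0xC0000000u) && clut_high_res(0xC0000000u));
	/* the candidate for 8bpp: enabled, low res */
	assert(clut_enabled(0x80000000u) && !clut_high_res(0x80000000u));
	/* the lut_clr value: mode bit set, enable bit clear */
	assert(!clut_enabled(0x40000000u) && clut_high_res(0x40000000u));
}
```

Under that reading, the old code's depth == 8 case and the proposed
0x80000000 differ from the current value only in bit 30.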

  -ilia

Re: Blank console but X11 works on MCP79 - old regression since 3.8

2017-11-17 Thread Ilia Mirkin

On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary <li...@rainbow-software.org> wrote:
> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote:
>> On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary
>>
>> <li...@rainbow-software.org> wrote:
>> > @@ -483,8 +483,8 @@
>> >  nouveau :02:00.0: disp:0860:  -> 0500
>> >  nouveau :02:00.0: disp:0864: 
>> >  nouveau :02:00.0: disp:0868:  -> 04000500
>> > -nouveau :02:00.0: disp:086c:  -> 00100500
>> > -nouveau :02:00.0: disp:0870: e900 -> 1e00
>> > +nouveau :02:00.0: disp:086c:  -> 00100a00
>> > +nouveau :02:00.0: disp:0870: e900 -> e800
>> >  nouveau :02:00.0: disp:0874:  -> 
>> >  nouveau :02:00.0: disp:0878: 
>> >  nouveau :02:00.0: disp:0880: 0500
>> >
>> > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in
>> > 64MB case. Why?
>> >
>> > I get blank screen even with 64MB with video=1280x1024-8 kernel
>> > parameter. Console works with video=1280x1024-16 even with 32MB stolen
>> > memory.
>> >
>> > Conclusions: 8-bit support is broken and bpp reduction is weird.
>>
>> OK, well that makes a *ton* of sense (8bpp being broken).
>>
>> I think the idea of bpp reduction is that when you're on your shiny
>> new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
>> all that to a pinned fbcon - almost half of that would go to a single
>> 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
>> have at least a few fb-sized buffers for backbuffer rendering, etc.
>>
>> The specific limits could probably use tweaking - I think they only
>> consider VRAM size, not the fb size.
>>
>> I guess 8bpp worked prior to the change you bisected though, so we
>> should figure out what we did wrong in the new code.
>
> Yes, booted 3.7 (last working kernel) and it's running in 8bpp.

By the way, instead of booting $kernel, you can use modetest from
libdrm/tests. Not sure if it supports C8 though =/

I think the issue is this:

-   OUT_RING(evo, nv_crtc->lut.depth == 8 ?
-   NV50_EVO_CRTC_CLUT_MODE_OFF :
-   NV50_EVO_CRTC_CLUT_MODE_ON);

Whereas now we always set 0xC0000000 (aka "ON").

  -ilia

Re: Blank console but X11 works on MCP79 - old regression since 3.8

2017-11-17 Thread Ilia Mirkin

On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary
 wrote:
> @@ -483,8 +483,8 @@
>  nouveau :02:00.0: disp:0860:  -> 0500
>  nouveau :02:00.0: disp:0864: 
>  nouveau :02:00.0: disp:0868:  -> 04000500
> -nouveau :02:00.0: disp:086c:  -> 00100500
> -nouveau :02:00.0: disp:0870: e900 -> 1e00
> +nouveau :02:00.0: disp:086c:  -> 00100a00
> +nouveau :02:00.0: disp:0870: e900 -> e800
>  nouveau :02:00.0: disp:0874:  -> 
>  nouveau :02:00.0: disp:0878: 
>  nouveau :02:00.0: disp:0880: 0500
>
> Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in 64MB
> case. Why?
>
> I get blank screen even with 64MB with video=1280x1024-8 kernel parameter.
> Console works with video=1280x1024-16 even with 32MB stolen memory.
>
> Conclusions: 8-bit support is broken and bpp reduction is weird.

OK, well that makes a *ton* of sense (8bpp being broken).

I think the idea of bpp reduction is that when you're on your shiny
new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating
all that to a pinned fbcon - almost half of that would go to a single
32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to
have at least a few fb-sized buffers for backbuffer rendering, etc.

The specific limits could probably use tweaking - I think they only
consider VRAM size, not the fb size.
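
The arithmetic behind "almost half", as a rough sketch. It assumes a plain
linear buffer with no pitch alignment or tiling overhead, and takes 16MB to
mean 16 MiB. The function names are hypothetical and have nothing to do
with the actual limit logic in nouveau.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one linear framebuffer, ignoring alignment and tiling. */
static uint64_t fb_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
	return (uint64_t)width * height * (bpp / 8);
}

/* Share of VRAM taken by one such buffer, in whole percent (rounded down). */
static unsigned int fb_percent_of_vram(uint32_t width, uint32_t height,
				       uint32_t bpp, uint64_t vram_bytes)
{
	assert(vram_bytes != 0);
	return (unsigned int)(fb_bytes(width, height, bpp) * 100 / vram_bytes);
}
```

With these, fb_bytes(1600, 1200, 32) is 7,680,000 bytes, which is 45% of
16 MiB, and 1920x1200 at 32bpp comes to 54%. The same 1600x1200 buffer at
8bpp is a quarter of that.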

I guess 8bpp worked prior to the change you bisected though, so we
should figure out what we did wrong in the new code.

  -ilia

Re: Blank console but X11 works on MCP79 - old regression since 3.8

2017-11-17 Thread Ilia Mirkin

With a new kernel, mind grabbing a dmesg with drm.debug=0x1e
nouveau.debug=debug (or maybe even =trace)? Maybe also see if
fbcon/fbdev have any debug things that can be turned on?
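
For reference, drm.debug is a bitmask of message categories. Here's a
minimal sketch of decoding it, assuming the DRM_UT_* bit assignments in the
kernel's drm headers around 4.14 are CORE=0x01, DRIVER=0x02, KMS=0x04,
PRIME=0x08, ATOMIC=0x10 and VBL=0x20 (worth double-checking against your
tree). The table and function are illustrative, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct dbg_cat {
	unsigned int bit;
	const char *name;
};

/* Assumed category bits; see the note above. */
static const struct dbg_cat drm_dbg_cats[] = {
	{ 0x01, "CORE" }, { 0x02, "DRIVER" }, { 0x04, "KMS" },
	{ 0x08, "PRIME" }, { 0x10, "ATOMIC" }, { 0x20, "VBL" },
};

/*
 * Write the space-separated names of the categories set in mask into buf.
 * Returns the number of names written.
 */
static int drm_debug_describe(unsigned int mask, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(drm_dbg_cats) / sizeof(drm_dbg_cats[0]); i++) {
		int n;

		if (!(mask & drm_dbg_cats[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", drm_dbg_cats[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room: stop, output is truncated */
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Under those assumptions 0x1e selects DRIVER, KMS, PRIME and ATOMIC, i.e.
everything in the table except CORE and VBL.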

Sounds like things are generally working, just the fbcon -> nouveaufb
path seems somehow buggered.

Another thing to try would be nouveau.atomic=1

On Fri, Nov 17, 2017 at 9:26 AM, Ondrej Zary  wrote:
> Hello,
> I've just been hit by this old bug which is still present in 4.14:
> https://bugs.freedesktop.org/show_bug.cgi?id=80675
>
> On MCP79 (ION), when stolen memory is set to 32MB in BIOS, console is blank
> but X11 works. When the stolen memory is increased to 64MB, console works
> fine.
>
> Bisected it to this:
>
> 4f6029da58ba9204c98e33f4f3737fe085c87a6f is the first bad commit
> commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f
> Author: Ben Skeggs 
> Date:   Fri Nov 16 11:54:31 2012 +1000
>
> drm/nv50-nvc0: switch to common disp impl, removing previous version
>
> Signed-off-by: Ben Skeggs 
>
> It's a big change so I'm not able to do more debugging.
>
> --
> Ondrej Zary

Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure

2017-08-30 Thread Ilia Mirkin

On Wed, Aug 30, 2017 at 8:22 PM, Tim Harvey  wrote:
> On Wed, Aug 30, 2017 at 3:06 PM, Andrew Lunn  wrote:
>> On Wed, Aug 30, 2017 at 12:53:56PM -0700, Tim Harvey wrote:
>>> Greetings,
>>>
>>> I'm seeing RX frame errors when using the mv88e6xxx DSA driver on
>>> 4.13-rc7. The board I'm using is a GW5904 [1] which has an IMX6 FEC
>>> MAC (eth0) connected via RGMII to a MV88E6176 with its downstream
>>> P0/P1/P2/P3 to front panel RJ45's (lan1-lan4).
>>
>> Hi Tim
>>
>> Can you confirm the counter is this one:
>>
>>/* Report late collisions as a frame error. */
>> if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
>> ndev->stats.rx_frame_errors++;
>>
>> I don't see anywhere else frame errors are counted, but it would be
>> good to prove we are looking in the right place.
>>
>
> Andrew,
>
> (adding IMX FEC driver maintainer to CC)
>
> Yes, that's one of them being hit. It looks like ifconfig reports
> 'frame' as the accumulation of a few stats so here are some more
> specifics from /sys/class/net/eth0/statistics:
>
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> for i in `ls rx_*`; do echo $i:$(cat $i); done
> rx_bytes:103229
> rx_compressed:0
> rx_crc_errors:22
> rx_dropped:0
> rx_errors:22
> rx_fifo_errors:0
> rx_frame_errors:22
> rx_length_errors:22
> rx_missed_errors:0
> rx_nohandler:0
> rx_over_errors:0
> rx_packets:1174
> root@xenial:/sys/devices/soc0/soc/210.aips-bus/2188000.ethernet/net/eth0/statistics#
> ifconfig eth0
> eth0  Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
>   inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1207 errors:22 dropped:0 overruns:0 frame:66
>   TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:106009 (103.5 KiB)  TX bytes:4604 (4.4 KiB)
>
> Instrumenting fec driver I see the following getting hit:
>
> status & BD_ENET_RX_LG /* rx_length_errors: Frame too long */
> status & BD_ENET_RX_CR  /* rx_crc_errors: CRC Error */
> status & BD_ENET_RX_CL /* rx_frame_errors: Collision? */
>
> Is this a frame size issue where the MV88E6176 is sending frames down
> that exceed the MTU because of headers added?

Not sure if this is relevant to you, but
https://github.com/laanwj/linux-freedreno-a2xx/commit/076b6542fa27499072ec6c3a7941c8b3c79ba1fd
was necessary to fix some MTU issues on a i.MX51. Not sure if it's
upstream yet or not.
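
As an aside on the numbers quoted above: the "frame" value that ifconfig
prints comes from /proc/net/dev, which, as far as I know, folds several rx
counters into one column (length + over + crc + frame errors). A sketch of
that sum, with a hypothetical struct standing in for the kernel's stats:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant rx error counters. */
struct rx_err_stats {
	uint64_t length;	/* rx_length_errors */
	uint64_t over;		/* rx_over_errors */
	uint64_t crc;		/* rx_crc_errors */
	uint64_t frame;		/* rx_frame_errors */
};

/* Assumed composition of the "frame" column shown by ifconfig. */
static uint64_t procfs_frame_column(const struct rx_err_stats *s)
{
	assert(s);
	return s->length + s->over + s->crc + s->frame;
}
```

With the sysfs values from the quoted output (22, 0, 22, 22) this gives 66,
matching frame:66. That would mean the same 22 bad frames are each being
flagged as too long, bad CRC and collision at once, rather than 66 separate
errors.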

Cheers,

  -ilia

Re: [PATCH][V2] drm/nouveau: perform null check on msto[i] rathern than msto

2017-08-17 Thread Ilia Mirkin

On Thu, Aug 17, 2017 at 6:03 PM, Colin King  wrote:
> From: Colin Ian King 
>
> The null check on the array msto is incorrect since msto is never
> null. The null check should be instead on msto[i] since this is
> being dereferenced in the call to drm_mode_connector_attach_encoder.
>
> Thanks to Emil Velikov for pointing out the mistake in my original
> fix and for suggesting the correct fix.
>
> Detected by CoverityScan, CID#1375915 ("Array compared against 0")
>
> Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 
> multi-stream")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/nouveau/nv50_display.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c 
> b/drivers/gpu/drm/nouveau/nv50_display.c
> index f7b4326a4641..ed444ecd9e85 100644
> --- a/drivers/gpu/drm/nouveau/nv50_display.c
> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
> @@ -3141,7 +3141,7 @@ nv50_mstc_new(struct nv50_mstm *mstm, struct 
> drm_dp_mst_port *port,
> mstc->connector.funcs->reset(&mstc->connector);
> nouveau_conn_attach_properties(&mstc->connector);
>
> -   for (i = 0; i < ARRAY_SIZE(mstm->msto) && mstm->msto; i++)
> +   for (i = 0; i < ARRAY_SIZE(mstm->msto) && mstm->msto[i]; i++)

Ben will have to rule on which way is correct, but another
interpretation is that it should be

for (...; i < ARRAY_SIZE; i++)
  if (mstm->msto[i])
do_stuff()

I haven't the faintest clue whether the msto array can have "holes" or not.
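
To spell out the difference between the two interpretations, here's a
standalone sketch. It's plain C, not driver code: the function names are
hypothetical and the generic pointer array is a stand-in for mstm->msto.
Both variants just count the entries they would act on.

```c
#include <assert.h>
#include <stddef.h>

/* Variant from the patch: stop at the first NULL entry. */
static size_t count_until_null(void *const *slots, size_t n)
{
	size_t i, visited = 0;

	assert(slots);
	for (i = 0; i < n && slots[i]; i++)
		visited++;
	return visited;
}

/* Alternative: walk the whole array and skip NULL entries ("holes"). */
static size_t count_skipping_null(void *const *slots, size_t n)
{
	size_t i, visited = 0;

	assert(slots);
	for (i = 0; i < n; i++)
		if (slots[i])
			visited++;
	return visited;
}
```

For an array laid out as { A, NULL, B, NULL } the first variant handles
only A while the second handles A and B. If the array is always filled from
the front with no gaps, the two behave identically.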

> drm_mode_connector_attach_encoder(&mstc->connector,
> &mstm->msto[i]->encoder);
>
> drm_object_attach_property(&mstc->connector.base,
> dev->mode_config.path_property, 0);
> --
> 2.11.0
>

Re: nouveau driver locks up with 4.11 kernel

2017-08-14 Thread Ilia Mirkin

On Mon, Aug 14, 2017 at 4:29 PM, Michal Hocko <mho...@kernel.org> wrote:
> On Mon 14-08-17 15:27:20, Ilia Mirkin wrote:
>> On Mon, Aug 14, 2017 at 3:18 PM, Michal Hocko <mho...@kernel.org> wrote:
> [...]
>> > nouveau :03:00.0: fifo: channel 6 [mpv/vo[3535]] kick timeout
>> > nouveau: mpv/vo[3535]::906f: detach gr failed, -110
>>
>> Are you using mpv in conjunction with the GL video output and
>> VDPAU-based acceleration? That will kill nouveau. For VDPAU, I
>> recommend mplayer.
>
> Well, I am using mplayer package and vo=sdl. Which video output should I

Well, according to the logs you're using "mpv", which, along with
mplayer2, is not mplayer. I recommend mplayer. Not sure what the sdl
video output does TBH, I've never used it -- perhaps mpv still manages
to use GL for that? xv and vdpau are ones to use. [ In order to use
VDPAU for decoding, you of course have to follow the instructions at
https://nouveau.freedesktop.org/wiki/VideoAcceleration/#firmware ]

> try instead? Btw. xine seems to be using VDPAU as well, yet it doesn't
> lockup the whole X session. The videou output doesn't work properly
> either but at least I am able to kill xine and still have the session.

Happy to explain all the dirty details on IRC if you're curious. Doing
things in multiple threads kills nouveau, and mpv does precisely that.

Re: nouveau driver locks up with 4.11 kernel

2017-08-14 Thread Ilia Mirkin

On Mon, Aug 14, 2017 at 3:18 PM, Michal Hocko  wrote:
> Hi,
> I am having issues with nouveau driver in 4.11 Debian distribution
> kernel. I can start X session but the screen locks up e.g. when I try to
> exit mplayer fullscreen mode. The lock is swamped with tons of
> nouveau :03:00.0: fifo: SCHED_ERROR 13 []
>
> messages and I also can see some warnings
>  ------------[ cut here ]------------
>  WARNING: CPU: 1 PID: 3535 at 
> /build/linux-J4LMtv/linux-4.11.6/drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogf100.c:85
>  gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
>  nouveau :03:00.0: timeout
>  Modules linked in: nouveau mxm_wmi wmi ttm cpufreq_powersave 
> cpufreq_conservative cpufreq_userspace iptable_filter iptable_nat 
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle ip_tables 
> x_tables binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace 
> fscache sunrpc i2c_sis96x hwmon_vid nf_conntrack_ftp nf_conntrack fuse 
> i2c_dev thermal fan ac battery ntfs snd_intel8x0 snd_ac97_codec ac97_bus 
> psmouse lp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic 
> snd_hda_intel ppdev intel_powerclamp iTCO_wdt iTCO_vendor_support 
> snd_hda_codec coretemp gma500_gfx snd_hda_core evdev serio_raw drm_kms_helper 
> snd_hwdep snd_pcm_oss pcspkr snd_mixer_oss snd_pcm drm snd_seq_midi 
> snd_seq_midi_event sg snd_rawmidi snd_seq snd_seq_device snd_timer parport_pc 
> snd soundcore
>   parport i2c_algo_bit shpchp lpc_ich tpm_infineon mfd_core video button ext4 
> crc16 jbd2 fscrypto ecb crypto_simd cryptd aes_i586 mbcache raid10 raid456 
> async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
> libcrc32c crc32c_generic raid0 multipath linear uas usb_storage dm_mod raid1 
> md_mod sd_mod hid_generic usbhid hid ahci libahci libata i2c_i801 scsi_mod 
> ehci_pci uhci_hcd e1000e ptp pps_core ehci_hcd usbcore usb_common
>  CPU: 1 PID: 3535 Comm: mpv/vo Tainted: GW   4.11.0-1-686-pae #1 
> Debian 4.11.6-1
>  Hardware name:  /D2500HN, BIOS 
> MUCDT10N.86A.0073.2012.1101.1638 11/01/2012
>  Call Trace:
>   ? dump_stack+0x55/0x73
>   ? __warn+0xea/0x110
>   ? gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
>   ? gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
>   ? warn_slowpath_fmt+0x46/0x60
>   ? gf100_fifo_gpfifo_engine_fini+0x14f/0x1d0 [nouveau]
>   ? gf100_fifo_gpfifo_engine_addr.isra.1+0xa0/0xa0 [nouveau]
>   ? nvkm_fifo_chan_child_fini+0x4e/0x120 [nouveau]
>   ? nvkm_object_del+0x58/0x90 [nouveau]
>   ? ktime_get+0x4b/0x110
>   ? nvkm_oproxy_fini+0x23/0x80 [nouveau]
>   ? nvkm_object_fini+0x137/0x300 [nouveau]
>   ? nvkm_ioctl_del+0x8c/0xa0 [nouveau]
>   ? nvkm_ioctl+0x100/0x290 [nouveau]
>   ? __check_object_size+0x9e/0x13c
>   ? __check_object_size+0x9e/0x13c
>   ? nvif_client_ioctl+0x2b/0x40 [nouveau]
>   ? usif_ioctl+0x4eb/0x790 [nouveau]
>   ? nouveau_drm_ioctl+0xab/0xb0 [nouveau]
>   ? nouveau_pmops_resume+0x80/0x80 [nouveau]
>   ? do_vfs_ioctl+0x91/0x6b0
>   ? SyS_ioctl+0x60/0x70
>   ? do_fast_syscall_32+0x8a/0x150
>   ? entry_SYSENTER_32+0x4e/0x7c
>  ---[ end trace 1bf6c731018c2e52 ]---
>
> followed by
> nouveau :03:00.0: fifo: channel 6 [mpv/vo[3535]] kick timeout
> nouveau: mpv/vo[3535]::906f: detach gr failed, -110

Are you using mpv in conjunction with the GL video output and
VDPAU-based acceleration? That will kill nouveau. For VDPAU, I
recommend mplayer.

Cheers,

  -ilia

Re: [PATCH v5 2/6] drm/bridge: Add a devm_ allocator for panel bridge.

2017-08-04 Thread Ilia Mirkin

On Fri, Aug 4, 2017 at 4:43 PM, Eric Anholt  wrote:
> Laurent Pinchart  writes:
>
>> Hi Eric,
>>
>> (CC'ing Daniel)
>>
>> Thank you for the patch.
>>
>> On Tuesday 18 Jul 2017 14:05:06 Eric Anholt wrote:
>>> This will let drivers reduce the error cleanup they need, in
>>> particular the "is_panel_bridge" flag.
>>>
>>> v2: Slight cleanup of remove function by Andrzej
>>
>> I just want to point out that, in the context of Daniel's work on hot-unplug,
>> 90% of the devm_* allocations are wrong and will get in the way. All DRM core
>> objects that are accessible one way or another from userspace will need to be
>> properly reference-counted and freed only when the last reference disappears,
>> which could be well after the corresponding device is removed. I believe this
>> could be one such objects :-/
>
> Sure, if you're hotplugging, your life is pain.  For non-hotpluggable
> devices, like our SOC platform devices (current panel-bridge consumers),
> this still seems like an excellent simplification of memory management.

At that point you may as well make your module non-unloadable, and
return failure when trying to remove a device from management by the
driver (whatever the opposite of "probe" is, I forget). Hotplugging
doesn't only happen when physically removing, it can happen for all
kinds of reasons... and userspace may still hold references in some of
those cases.
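The lifetime issue can be shown with a small standalone sketch. This is plain userspace C, not kernel code, and `panel_obj`, `obj_create`, `obj_get`, `obj_put` and `device_remove` are hypothetical names made up for the illustration. The assumption is simply that userspace may still hold a reference at the moment the device goes away:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object that both the driver and userspace can reference. */
struct panel_obj {
    int refcount;        /* number of outstanding references */
    bool device_present; /* false once the backing device is gone */
};

static int live_objects; /* objects that have not been freed yet */

static struct panel_obj *obj_create(void)
{
    struct panel_obj *obj = calloc(1, sizeof(*obj));

    if (!obj)
        return NULL;
    obj->refcount = 1; /* reference held by the driver */
    obj->device_present = true;
    live_objects++;
    return obj;
}

/* Take an extra reference, e.g. on behalf of a userspace handle. */
static void obj_get(struct panel_obj *obj)
{
    obj->refcount++;
}

/* Drop one reference; free only when the last one disappears. */
static void obj_put(struct panel_obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0) {
        live_objects--;
        free(obj);
    }
}

/* Device removal drops only the driver's own reference. */
static void device_remove(struct panel_obj *obj)
{
    obj->device_present = false;
    obj_put(obj);
}
```

If userspace took a reference with `obj_get()` before `device_remove()` runs, the object survives until the matching `obj_put()`. A devm-style allocation would instead free it at removal time no matter who still holds a pointer, which is the case being objected to above.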

Re: [Nouveau] [PATCH] drm: disable vblank only if it got previously enabled

2017-07-19 Thread Ilia Mirkin

I believe the solution is to not call drm_crtc_vblank_off for atomic
modesetting in nouveau_display_fini. I think Ben's working on it.

On Wed, Jul 19, 2017 at 1:25 PM, Tobias Klausmann
 wrote:
> mimic the behavior of vblank_disable_fn(), another caller of
> drm_vblank_disable_and_save().
>
> This avoids oopsing, while trying to disable vblank on a not connected 
> display:
>
> [   12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 
> drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
> [   12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc 
> uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops 
> videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 
> snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 
> hid_multitouch nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp 
> vfat coretemp fat kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass 
> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
> aesni_intel ath10k_pci snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec 
> crypto_simd ath glue_helper cryptd snd_hda_core mac80211 snd_hwdep snd_pcm 
> pcspkr r8169 cfg80211 mii snd_timer acer_wmi snd sparse_keymap wmi_bmof 
> idma64 hci_uart virt_dma mei_me soundcore i2c_i801 mei btbcm shpchp 
> intel_lpss_pci intel_pch_thermal
> [   12.768130]  serdev btqca ucsi_acpi btintel typec_ucsi thermal typec 
> bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill intel_lpss_acpi 
> pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 mxm_wmi ttm 
> i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect sysimgblt 
> xhci_hcd fb_sys_fops usbcore drm i2c_hid wmi video button sg efivarfs
> [   12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted 
> 4.12.0-desktop-debug-drm+ #2
> [   12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 
> 03/30/2017
> [   12.768164] Workqueue: pm pm_runtime_work
> [   12.768166] task: 889bf1627040 task.stack: 9541013e4000
> [   12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 
> [drm]
> [   12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086
> [   12.768183] RAX: 001c RBX: 889b4cebd000 RCX: 
> 0004
> [   12.768184] RDX: 8004 RSI: 87a2d952 RDI: 
> 
> [   12.768186] RBP: 9541013e7b90 R08: 0001 R09: 
> 039f
> [   12.768187] R10: c05fe530 R11:  R12: 
> 
> [   12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: 
> 889bf0426000
> [   12.768190] FS:  () GS:889bfec0() 
> knlGS:
> [   12.768191] CS:  0010 DS:  ES:  CR0: 80050033
> [   12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: 
> 003406f0
> [   12.768193] Call Trace:
> [   12.768198]  ? enqueue_task_fair+0x64/0x600
> [   12.768211]  ? drm_get_last_vbltimestamp+0x47/0x70 [drm]
> [   12.768223]  ? drm_update_vblank_count+0x65/0x240 [drm]
> [   12.768227]  ? pci_pm_runtime_resume+0xa0/0xa0
> [   12.768238]  ? drm_vblank_disable_and_save+0x55/0xc0 [drm]
> [   12.768250]  ? drm_crtc_vblank_off+0xa9/0x1e0 [drm]
> [   12.768253]  ? pci_pm_runtime_resume+0xa0/0xa0
> [   12.768299]  ? nouveau_display_fini+0x56/0xd0 [nouveau]
> [   12.768339]  ? nouveau_display_suspend+0x51/0x110 [nouveau]
> [   12.768378]  ? nouveau_do_suspend+0x76/0x1c0 [nouveau]
> [   12.768413]  ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau]
> [   12.768416]  ? pci_pm_runtime_suspend+0x5c/0x160
> [   12.768419]  ? __rpm_callback+0xb6/0x1e0
> [   12.768423]  ? kobject_uevent_env+0x111/0x5e0
> [   12.768425]  ? pci_pm_runtime_resume+0xa0/0xa0
> [   12.768427]  ? rpm_callback+0x1f/0x70
> [   12.768429]  ? pci_pm_runtime_resume+0xa0/0xa0
> [   12.768431]  ? rpm_suspend+0x11f/0x640
> [   12.768441]  ? drm_fb_helper_hotplug_event+0x9a/0xe0 [drm_kms_helper]
> [   12.768447]  ? output_poll_execute+0x17b/0x1a0 [drm_kms_helper]
> [   12.768449]  ? pm_runtime_work+0x64/0xa0
> [   12.768453]  ? process_one_work+0x1db/0x410
> [   12.768456]  ? worker_thread+0x47/0x3d0
> [   12.768459]  ? process_one_work+0x410/0x410
> [   12.768461]  ? kthread+0x117/0x130
> [   12.768463]  ? kthread_create_on_node+0x40/0x40
> [   12.768466]  ? ret_from_fork+0x25/0x30
> [   12.768468] Code: 80 3d 26 f3 01 00 00 0f 85 ad fd ff ff 48 8b 43 20 48 c7 
> c7 31 a2 20 c0 c6 05 0e f3 01 00 01 48 8b b0 60 01 00 00 e8 75 2e ec c6 <0f> 
> ff e9 88 fd ff ff 31 f6 44 88 55 b0 e8 38 fa ed c6 44 0f b6
> [   12.768508] ---[ end trace d9bb853af3659bd5 ]---
>
> Signed-off-by: Tobias Klausmann 
> ---
>  drivers/gpu/drm/drm_vblank.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> index a233a6be934a..4a21756bf2bd 100644
> --- a/drivers/gpu/drm/drm_vblank.c
> +++ b/drivers/gpu/drm/drm_vblank.c
> @@ -1140,8 +1140,11 @@ void

Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling

2017-07-16 Thread Ilia Mirkin

On Sun, Jul 16, 2017 at 12:43 AM, Mike Galbraith <efa...@gmx.de> wrote:
> On Sat, 2017-07-15 at 14:52 -0400, Ilia Mirkin wrote:
>>
>> OK, so this issue appears to be that we're calling
>> drm_crtc_vblank_off() on a crtc for which vblank is already disabled.
>> My guess is that this happens because the crtc is disabled.
>>
>> Not sure what the proper check is to see if vblanks are already disabled...
>
> Seems so, the below shut up suspend for both 8600 GT and GTX 980.

The modeset done by drm_atomic_helper_suspend (called before that
*_fini) should already take care of disabling vblanks, I think. So the
vblank_off calls only need to be made when we're not doing an atomic
modeset, i.e. when drm_drv_uses_atomic_modeset(dev) is false -- this is
all very confusing since pre-nv50 uses legacy modesets, while nv50+ has
been moved to atomic, but they share a bunch of helpers =/
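A rough standalone model of that guard, for illustration only. None of this is the actual nouveau code: `mock_dev`, `mock_crtc`, `mock_atomic_suspend`, `mock_crtc_vblank_off` and `mock_display_fini` are hypothetical stand-ins, and the `uses_atomic` flag plays the role of drm_drv_uses_atomic_modeset(dev):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the kernel structures; only what the sketch needs. */
struct mock_crtc {
    bool vblank_enabled;
    int vblank_off_calls; /* how often the fini path called vblank_off */
};

struct mock_dev {
    bool uses_atomic; /* stands in for drm_drv_uses_atomic_modeset(dev) */
    int num_crtc;
    struct mock_crtc crtc[2];
};

/* Stand-in for drm_crtc_vblank_off(): records the call, disables vblank. */
static void mock_crtc_vblank_off(struct mock_crtc *crtc)
{
    assert(crtc != NULL);
    crtc->vblank_off_calls++;
    crtc->vblank_enabled = false;
}

/* Stand-in for the atomic suspend modeset, which already disables vblanks. */
static void mock_atomic_suspend(struct mock_dev *dev)
{
    for (int i = 0; i < dev->num_crtc; i++)
        dev->crtc[i].vblank_enabled = false;
}

/* The guard: only the legacy (non-atomic) path calls vblank_off here. */
static void mock_display_fini(struct mock_dev *dev)
{
    if (dev->uses_atomic)
        return; /* the atomic suspend modeset already took care of it */

    for (int i = 0; i < dev->num_crtc; i++)
        mock_crtc_vblank_off(&dev->crtc[i]);
}
```

In this model an atomic device goes through `mock_atomic_suspend()` first, so `mock_display_fini()` never touches a crtc whose vblank is already off, while a legacy device still gets its vblank_off calls.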

Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling

2017-07-15 Thread Ilia Mirkin

On Sat, Jul 15, 2017 at 1:40 AM, Mike Galbraith  wrote:
> Greetings,
>
> box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside
> kernel: master.today (v4.12-11690-gccd5d1b91f22)
>
> lspci -nn -d 10de:
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce 
> 8600 GT] [10de:0402] (rev a1)
>
> abbreviated dmesg:
> ...
> [3.720990] fb: switching to nouveaufb from VESA VGA
> [3.744489] Console: switching to colour dummy device 80x25
> [3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2)
> ...
> [3.846963] usbcore: registered new interface driver uas
> [3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12
> [  321.450262] nouveau :01:00.0: DRM: suspending console...
> [  321.450265] nouveau :01:00.0: DRM: suspending display...
> [  321.450462] e1000e: EEE TX LPI TIMER: 
> [  321.450501] br0: port 1(eth0) entered disabled state
> [  321.473838] [ cut here ]
> [  321.473863] WARNING: CPU: 1 PID: 4786 at drivers/gpu/drm/drm_vblank.c:608 
> drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x330 [drm]
> [  321.473864] Modules linked in: ebtable_filter(E) ebtables(E) fuse(E) 
> rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) af_packet(E) 
> bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) 
> xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) 
> ipt_REJECT(E) iptable_raw(E) iptable_filter(E) ip6table_mangle(E) 
> nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
> nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) 
> ip6table_filter(E) ip6_tables(E) x_tables(E) saa7134_alsa(E) tda1004x(E) 
> saa7134_dvb(E) videobuf2_dvb(E) dvb_core(E) arc4(E) rt2800usb(E) rt2x00usb(E) 
> rt2800lib(E) crc_ccitt(E) rt2x00lib(E) mac80211(E) cfg80211(E) 
> rc_medion_x10_or2x(E) rfkill(E) ati_remote(E) tda827x(E) tda8290(E) tuner(E) 
> snd_hda_codec_realtek(E) saa7134(E)
> [  321.473905]  snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) 
> snd_hwdep(E) tveeprom(E) coretemp(E) videobuf2_dma_sg(E) videobuf2_memops(E) 
> snd_hda_core(E) videobuf2_v4l2(E) kvm_intel(E) snd_pcm(E) kvm(E) 
> videobuf2_core(E) snd_timer(E) rc_core(E) v4l2_common(E) snd(E) videodev(E) 
> iTCO_wdt(E) media(E) e1000e(E) iTCO_vendor_support(E) ptp(E) pps_core(E) 
> shpchp(E) soundcore(E) i2c_i801(E) lpc_ich(E) mfd_core(E) irqbypass(E) 
> pcspkr(E) thermal(E) acpi_cpufreq(E) fan(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) 
> lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) 
> sr_mod(E) cdrom(E) sd_mod(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) 
> nouveau(E) wmi(E) video(E) i2c_algo_bit(E) ahci(E) drm_kms_helper(E) 
> syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) 
> firewire_ohci(E)
> [  321.473950]  libata(E) firewire_core(E) crc_itu_t(E) ehci_pci(E) 
> serio_raw(E) ttm(E) button(E) drm(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) sg(E) 
> dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
> scsi_mod(E) autofs4(E)
> [  321.473966] CPU: 1 PID: 4786 Comm: kworker/u8:17 Tainted: GW   E   
> 4.12.0.gccd5d1b-master #186
> [  321.473968] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 
> 12/26/2007
> [  321.473972] Workqueue: events_unbound async_run_entry_fn
> [  321.473974] task: 8801daf93d40 task.stack: c90003edc000
> [  321.473990] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x330 
> [drm]
> [  321.473992] RSP: 0018:c90003edfb00 EFLAGS: 00010082
> [  321.473994] RAX: a03e6100 RBX: 88021114 RCX: 
> 0001
> [  321.473995] RDX: a01dd8c8 RSI: 0001 RDI: 
> a01c8023
> [  321.473996] RBP: c90003edfb80 R08:  R09: 
> a01b0920
> [  321.473998] R10: a0376e60 R11: 8802131399f8 R12: 
> 0001
> [  321.473999] R13: 880213139800 R14: c90003edfb94 R15: 
> c90003edfbd0
> [  321.474001] FS:  () GS:88022fc8() 
> knlGS:
> [  321.474003] CS:  0010 DS:  ES:  CR0: 80050033
> [  321.474004] CR2: 7fdd82e8f810 CR3: 000214683000 CR4: 
> 06e0
> [  321.474005] Call Trace:
> [  321.474068]  ? nv50_head_vblank_put+0x22/0x50 [nouveau]
> [  321.474085]  drm_get_last_vbltimestamp+0x41/0x70 [drm]
> [  321.474102]  drm_update_vblank_count+0x61/0x230 [drm]
> [  321.474118]  drm_vblank_disable_and_save+0x59/0xc0 [drm]
> [  321.474134]  drm_crtc_vblank_off+0x1d5/0x210 [drm]
> [  321.474152]  ? drm_modeset_drop_locks+0x4e/0x60 [drm]
> [  321.474203]  nouveau_display_fini+0x56/0xd0 [nouveau]
> [  321.474254]  nouveau_display_suspend+0x4f/0x110 [nouveau]
> [  321.474304]  nouveau_do_suspend+0x7c/0x1e0 [nouveau]
> [  321.474355]  nouveau_pmops_suspend+0x2d/0x70 [nouveau]
> [  321.474358]  pci_pm_suspend+0x70/0x130
> [  321.474360]  ? pci_pm_resume+0x90/0x90
> [  321.474364]  dpm_run_callback+0x4d/0x150
> [

Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling

2017-07-15 Thread Ilia Mirkin

On Sat, Jul 15, 2017 at 12:14 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> On Sat, Jul 15, 2017 at 1:40 AM, Mike Galbraith <efa...@gmx.de> wrote:
>> Greetings,
>>
>> box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside
>> kernel: master.today (v4.12-11690-gccd5d1b91f22)
>>
>> lspci -nn -d 10de:
>> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce 
>> 8600 GT] [10de:0402] (rev a1)
>>
>> abbreviated dmesg:
>> ...
>> [3.720990] fb: switching to nouveaufb from VESA VGA
>> [3.744489] Console: switching to colour dummy device 80x25
>> [3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2)
>> ...
>> [3.846963] usbcore: registered new interface driver uas
>> [3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12
>> [3.870769] hid-generic 0003:04CA:002B.0002: input,hidraw1: USB HID v1.11 
>> Keyboard [Liteon Wireless keyboard and mouse] on usb-:00:1d.0-1/input0
>> [3.870773] nouveau :01:00.0: bios: M0203T not found
>> [3.870774] nouveau :01:00.0: bios: M0203E not matched!
>> [3.870777] nouveau :01:00.0: fb: 256 MiB DDR2
>> [3.871168] input: Liteon Wireless keyboard and mouse as 
>> /devices/pci:00/:00:1d.0/usb4/4-1/4-1:1.1/0003:04CA:002B.0003/input/input7
>> [3.896090] usb 3-2: new low-speed USB device number 3 using uhci_hcd
>> [3.919101] [TTM] Zone  kernel: Available graphics memory: 3881208 kiB
>> [3.919106] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
>> [3.919110] [TTM] Initializing pool allocator
>> [3.919120] [TTM] Initializing DMA pool allocator
>> [3.919141] nouveau :01:00.0: DRM: VRAM: 256 MiB
>> [3.919146] nouveau :01:00.0: DRM: GART: 1048576 MiB
>> [3.919152] nouveau :01:00.0: DRM: TMDS table version 2.0
>> [3.919157] nouveau :01:00.0: DRM: DCB version 4.0
>> [3.919162] nouveau :01:00.0: DRM: DCB outp 00: 04000310 0028
>> [3.919167] nouveau :01:00.0: DRM: DCB outp 01: 02011300 0028
>> [3.919171] nouveau :01:00.0: DRM: DCB outp 02: 01011302 0030
>> [3.919176] nouveau :01:00.0: DRM: DCB outp 03: 02022322 00020010
>> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
>> [3.919185] nouveau :01:00.0: DRM: DCB conn 00: 
>> [3.919189] nouveau :01:00.0: DRM: DCB conn 01: 1130
>> [3.919194] nouveau :01:00.0: DRM: DCB conn 02: 2261
>> [3.919198] nouveau :01:00.0: DRM: DCB conn 03: 0310
>> [3.919202] nouveau :01:00.0: DRM: DCB conn 04: 0311
>> [3.919206] nouveau :01:00.0: DRM: DCB conn 05: 0313
>> [3.919258] [ cut here ]
>> [3.919316] WARNING: CPU: 3 PID: 224 at 
>> drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c:83 
>> nvkm_outp_xlat.isra.0+0x26/0x80 [nouveau]
>
> The code in question is
>
> static enum nvkm_ior_proto
> nvkm_outp_xlat(struct nvkm_outp *outp, enum nvkm_ior_type *type)
> {
>     switch (outp->info.location) {
>     case 0:
>         switch (outp->info.type) {
>         case DCB_OUTPUT_ANALOG: *type = DAC; return  CRT;
>         case DCB_OUTPUT_TMDS  : *type = SOR; return TMDS;
>         case DCB_OUTPUT_LVDS  : *type = SOR; return LVDS;
>         case DCB_OUTPUT_DP    : *type = SOR; return   DP;
>         default:
>             break;
>         }
>         break;
>     case 1:
>         switch (outp->info.type) {
>         case DCB_OUTPUT_TMDS: *type = PIOR; return TMDS;
>         case DCB_OUTPUT_DP  : *type = PIOR; return TMDS; /* not a bug */
>         default:
>             break;
>         }
>         break;
>     default:
>         break;
>     }
>     WARN_ON(1);
>     return UNKNOWN;
> }
>
> Looks like someone forgot about TV S-Video/Composite outputs (which
> existed up until the GT21x's).
>
>> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
>
> And there ya go (the type is the lowest nibble of the first dword). We
> don't support TV outputs on nv50+, so you could just add a
>
> case DCB_OUTPUT_TV: return UNKNOWN;
>
> in the location == 0 case.
>
> I don't think that's related to the issue you're seeing on suspend
> though, as the TV connector isn't created anyways, it's just an
> "annoyance" warn, and you were also seeing it on your GM20x which has
> no such thing.

Actually while this may fix things for you in the short term, this is
all generic code, not chip-specific, and we do support TV outputs on
pre-nv50 chips, so it needs to be fixed for real.

Ben - I'm very weak on all these concepts of OR/etc - is the right
move to add a new nvkm_ior_proto/type for TV? (There's also a
DCB_OUTPUT_EOL type, no clue what that is.) I guess it should get type
= DAC and add a new nvkm_ior_proto for TV?
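
To make the question concrete, here is a rough standalone sketch of the
translation with a TV case added. The enums are local stand-ins that only
mirror the names in the quoted function, and their numeric values are
arbitrary. The TV protocol value is hypothetical (nothing like it exists in
the driver as far as this thread shows), and pairing it with DAC is only the
guess from the paragraph above, not a confirmed design.

```c
#include <stdio.h>

/* Local stand-ins mirroring names from the quoted function. The numeric
 * values are arbitrary and are NOT taken from the driver. */
enum ior_type  { TYPE_NONE, DAC, SOR, PIOR };
enum ior_proto { UNKNOWN, CRT, TV /* hypothetical */, TMDS, LVDS, DP };
enum dcb_type  { OUT_ANALOG, OUT_TV, OUT_TMDS, OUT_LVDS, OUT_DP };

/* Same shape as the quoted translation, plus one assumed TV case. */
static enum ior_proto
outp_xlat_sketch(int location, enum dcb_type dcb, enum ior_type *type)
{
    switch (location) {
    case 0:
        switch (dcb) {
        case OUT_ANALOG: *type = DAC; return CRT;
        case OUT_TV:     *type = DAC; return TV;   /* assumption: TV rides on a DAC */
        case OUT_TMDS:   *type = SOR; return TMDS;
        case OUT_LVDS:   *type = SOR; return LVDS;
        case OUT_DP:     *type = SOR; return DP;
        default:
            break;
        }
        break;
    case 1:
        switch (dcb) {
        case OUT_TMDS: *type = PIOR; return TMDS;
        case OUT_DP:   *type = PIOR; return TMDS;  /* as in the original */
        default:
            break;
        }
        break;
    default:
        break;
    }
    /* Userspace stand-in for WARN_ON(1). */
    fprintf(stderr, "unhandled output: location %d, type %d\n", location, (int)dcb);
    return UNKNOWN;
}
```

Whether the nv50+ code paths would then need to reject the new value
explicitly is a separate question that this sketch does not address.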

  -ilia

Re: [drm/nouveau] GeForce 8600 GT boot/suspend grumbling

2017-07-15 Thread Ilia Mirkin

On Sat, Jul 15, 2017 at 1:40 AM, Mike Galbraith  wrote:
> Greetings,
>
> box: bog standard [tc]rusty old Nvidia equipped Q6600 Medion (Aldi) deskside
> kernel: master.today (v4.12-11690-gccd5d1b91f22)
>
> lspci -nn -d 10de:
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [GeForce 
> 8600 GT] [10de:0402] (rev a1)
>
> abbreviated dmesg:
> ...
> [3.720990] fb: switching to nouveaufb from VESA VGA
> [3.744489] Console: switching to colour dummy device 80x25
> [3.744966] nouveau :01:00.0: NVIDIA G84 (084200a2)
> ...
> [3.846963] usbcore: registered new interface driver uas
> [3.849938] nouveau :01:00.0: bios: version 60.84.6e.00.12
> [3.870769] hid-generic 0003:04CA:002B.0002: input,hidraw1: USB HID v1.11 
> Keyboard [Liteon Wireless keyboard and mouse] on usb-:00:1d.0-1/input0
> [3.870773] nouveau :01:00.0: bios: M0203T not found
> [3.870774] nouveau :01:00.0: bios: M0203E not matched!
> [3.870777] nouveau :01:00.0: fb: 256 MiB DDR2
> [3.871168] input: Liteon Wireless keyboard and mouse as 
> /devices/pci:00/:00:1d.0/usb4/4-1/4-1:1.1/0003:04CA:002B.0003/input/input7
> [3.896090] usb 3-2: new low-speed USB device number 3 using uhci_hcd
> [3.919101] [TTM] Zone  kernel: Available graphics memory: 3881208 kiB
> [3.919106] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
> [3.919110] [TTM] Initializing pool allocator
> [3.919120] [TTM] Initializing DMA pool allocator
> [3.919141] nouveau :01:00.0: DRM: VRAM: 256 MiB
> [3.919146] nouveau :01:00.0: DRM: GART: 1048576 MiB
> [3.919152] nouveau :01:00.0: DRM: TMDS table version 2.0
> [3.919157] nouveau :01:00.0: DRM: DCB version 4.0
> [3.919162] nouveau :01:00.0: DRM: DCB outp 00: 04000310 0028
> [3.919167] nouveau :01:00.0: DRM: DCB outp 01: 02011300 0028
> [3.919171] nouveau :01:00.0: DRM: DCB outp 02: 01011302 0030
> [3.919176] nouveau :01:00.0: DRM: DCB outp 03: 02022322 00020010
> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083
> [3.919185] nouveau :01:00.0: DRM: DCB conn 00: 
> [3.919189] nouveau :01:00.0: DRM: DCB conn 01: 1130
> [3.919194] nouveau :01:00.0: DRM: DCB conn 02: 2261
> [3.919198] nouveau :01:00.0: DRM: DCB conn 03: 0310
> [3.919202] nouveau :01:00.0: DRM: DCB conn 04: 0311
> [3.919206] nouveau :01:00.0: DRM: DCB conn 05: 0313
> [3.919258] [ cut here ]
> [3.919316] WARNING: CPU: 3 PID: 224 at 
> drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c:83 
> nvkm_outp_xlat.isra.0+0x26/0x80 [nouveau]

The code in question is

static enum nvkm_ior_proto
nvkm_outp_xlat(struct nvkm_outp *outp, enum nvkm_ior_type *type)
{
    switch (outp->info.location) {
    case 0:
        switch (outp->info.type) {
        case DCB_OUTPUT_ANALOG: *type = DAC; return  CRT;
        case DCB_OUTPUT_TMDS  : *type = SOR; return TMDS;
        case DCB_OUTPUT_LVDS  : *type = SOR; return LVDS;
        case DCB_OUTPUT_DP    : *type = SOR; return   DP;
        default:
            break;
        }
        break;
    case 1:
        switch (outp->info.type) {
        case DCB_OUTPUT_TMDS: *type = PIOR; return TMDS;
        case DCB_OUTPUT_DP  : *type = PIOR; return TMDS; /* not a bug */
        default:
            break;
        }
        break;
    default:
        break;
    }
    WARN_ON(1);
    return UNKNOWN;
}

Looks like someone forgot about TV S-Video/Composite outputs (which
existed up until the GT21x's).

> [3.919180] nouveau :01:00.0: DRM: DCB outp 04: 010333f1 00c0c083

And there ya go (the type is the lowest nibble of the first dword). We
don't support TV outputs on nv50+, so you could just add a

case DCB_OUTPUT_TV: return UNKNOWN;

in the location == 0 case.
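
For anyone who wants to check the nibble by hand, a small standalone sketch
of the decode follows. The helper names are made up for illustration. The
type codes are the DCB output type values as I recall them from the driver
headers, so treat anything other than TV = 1 (which follows from the log
line above) as an assumption to double-check.

```c
#include <stdint.h>

/* DCB output type codes; values other than TV are recalled from memory
 * and should be verified against the driver's DCB header. */
enum dcb_type_sketch {
    SKETCH_ANALOG = 0x0,
    SKETCH_TV     = 0x1,
    SKETCH_TMDS   = 0x2,
    SKETCH_LVDS   = 0x3,
    SKETCH_DP     = 0x6
};

/* The output type is the lowest nibble of the first dword of the entry. */
static unsigned dcb_entry_type(uint32_t first_dword)
{
    return first_dword & 0xf;
}

/* Human-readable name for a decoded type (hypothetical helper). */
static const char *dcb_type_name(unsigned type)
{
    switch (type) {
    case SKETCH_ANALOG: return "ANALOG";
    case SKETCH_TV:     return "TV";
    case SKETCH_TMDS:   return "TMDS";
    case SKETCH_LVDS:   return "LVDS";
    case SKETCH_DP:     return "DP";
    default:            return "other";
    }
}
```

Feeding it the five "DCB outp" first dwords from the log gives ANALOG,
ANALOG, TMDS, TMDS and TV, in that order.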

I don't think that's related to the issue you're seeing on suspend
though, as the TV connector isn't created anyways, it's just an
"annoyance" warn, and you were also seeing it on your GM20x which has
no such thing.

  -ilia

Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-14 Thread Ilia Mirkin

On Fri, Jul 14, 2017 at 11:19 AM, Tobias Klausmann wrote:
> The conversion is a nice catch, but i'd like to have a bit more context, see
> below!
>
> With a better description:
>
> Tobias Klausmann 

I don't think it was meant as a serious patch. WARN_ON_ONCE should
work. The fix isn't to remove all instances of WARN_ON_ONCE. The fix
is to fix WARN_ON_ONCE.
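
For reference, the behaviour everyone expects from a warn-once check can be
sketched in plain userspace C as below. This is not the kernel macro: the
names are invented, it is a statement rather than an expression (the kernel
version also yields the condition's value), and the counter exists only so
the behaviour can be observed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually emitted (illustration only). */
static int warn_count_sketch;

/* Warn the first time cond is true at this call site, then stay silent.
 * Execution always continues afterwards; the check must never be fatal. */
#define WARN_ON_ONCE_SKETCH(cond)                                   \
    do {                                                            \
        static bool warned_;                                        \
        if ((cond) && !warned_) {                                   \
            warned_ = true;                                         \
            warn_count_sketch++;                                    \
            fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
        }                                                           \
    } while (0)

/* Hypothetical caller shaped like the check in the backtrace: an
 * uninitialized mode is reported once and then treated as a no-op. */
static bool mode_is_usable(int crtc_clock)
{
    if (crtc_clock == 0) {
        WARN_ON_ONCE_SKETCH(true);
        return false;
    }
    return true;
}
```

Calling mode_is_usable(0) repeatedly prints a single warning and keeps
returning false, which is all the check is supposed to do.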

Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-14 Thread Ilia Mirkin

On Fri, Jul 14, 2017 at 11:15 AM, Mike Galbraith  wrote:
> On Fri, 2017-07-14 at 17:10 +0200, Karol Herbst wrote:
>> Yeah, we shouldn't let the machine die. Are there more WARN_ON_ONCE
>> usage we could convert to WARN_ONCE?
>
> Shooting the messenger is generally considered uncool :)

That's never stopped it from being a popular practice...

Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-12 Thread Ilia Mirkin

On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith <efa...@gmx.de> wrote:
> On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
>> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
>> >
>> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
>> > too much trouble, a bisect would be pretty useful.
>>
>> Bisection seemingly went fine, but the result is odd.
>>
>> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit
>
> But it really really is bad.  Looking at gitk fork in the road leading
> to it...
>
> 52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
> e4e818cc2d7c drm: make drm_panel.h self-contained - good
> 9cf8f5802f39 drm: add missing declaration to drm_blend.h  - good
>
> Before the git highway splits, all is well.  The lane with commits
> works fine at both ends, but e98c58e55f68 is busted.  Merge artifact?

Hmmm... that tree does not appear to have gotten a v4.12 backmerge at
any point. The last backmerge from Linus as far as I can tell was
v4.11-rc7. Could be an interaction with some out-of-tree change.

Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-11 Thread Ilia Mirkin

On Tue, Jul 11, 2017 at 2:08 PM, Mike Galbraith <efa...@gmx.de> wrote:
> On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
>> Some details that may be useful in analysis of the bug:
>>
>> 1. lspci -nn -d 10de:
>
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce 
> GTX 980] [10de:13c0] (rev a1)
> 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio 
> Controller [10de:0fbb] (rev a1)
>
>> 2. What displays, if any, you have plugged into the NVIDIA board when
>> this happens?
>
> A Philips 273V, via DVI.
>
>> 3. Any boot parameters, esp relating to ACPI, PM, or related?
>
> None for those, what's there that will be unfamiliar to you are for
> patches that aren't applied.
>
> nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
> nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
> ignore_loglevel crashkernel=256M,high

OK, thanks. So in other words, a fairly standard desktop with a PCIe
board plugged in. No funny business. (Laptops can create a ton of
additional weirdness, which I assumed you had since you were talking
about STR.)

My best guess is that gf119_head_vblank_put either has a bogus head id
(should be in the 0..3 range) which causes it to do an out-of-bounds
read on MMIO space, or that the MMIO mapping has already been removed
by the time nouveau_display_suspend runs. Adding Ben Skeggs for
additional insight.
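
To illustrate the first possibility: per-head registers are typically
addressed as a base plus head times a stride, so a bogus head id lands
somewhere unexpected. A minimal sketch of the kind of range check that would
catch it is below. The function name is made up, and the base and stride
values are illustrative assumptions, not something verified against the
hardware docs.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_HEADS   4u        /* head ids are expected in 0..3 */
#define SKETCH_HEAD_BASE   0x6100c0u /* illustrative base (assumption) */
#define SKETCH_HEAD_STRIDE 0x800u    /* illustrative stride (assumption) */

/* Compute the per-head register offset, refusing out-of-range head ids
 * instead of letting them index past the intended register block. */
static bool head_reg_offset(unsigned head, uint32_t *offset)
{
    if (head >= SKETCH_MAX_HEADS)
        return false;
    *offset = SKETCH_HEAD_BASE + head * SKETCH_HEAD_STRIDE;
    return true;
}
```

A check like this would not help with the second possibility (the mapping
already being gone), since the offset itself would be perfectly valid.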

Some display stuff did change for 4.13 for GM20x+ boards. If it's not
too much trouble, a bisect would be pretty useful.

Cheers,

  -ilia

Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-11 Thread Ilia Mirkin

Some details that may be useful in analysis of the bug:

1. lspci -nn -d 10de:
2. What displays, if any, you have plugged into the NVIDIA board when
this happens?
3. Any boot parameters, esp relating to ACPI, PM, or related?

Cheers,

  -ilia

On Tue, Jul 11, 2017 at 1:32 PM, Mike Galbraith  wrote:
> Greetings,
>
> I met $subject in master-rt post drm merge, but taking the config
> (attached) to virgin v4.12-10624-g9967468c0a10, it's reproducible.
>
>   KERNEL: vmlinux-4.12.0.g9967468-preempt.gz
> DUMPFILE: vmcore
> CPUS: 8
> DATE: Tue Jul 11 18:55:28 2017
>   UPTIME: 00:02:03
> LOAD AVERAGE: 3.43, 1.39, 0.52
>TASKS: 467
> NODENAME: homer
>  RELEASE: 4.12.0.g9967468-preempt
>  VERSION: #155 SMP PREEMPT Tue Jul 11 18:18:11 CEST 2017
>  MACHINE: x86_64  (3591 Mhz)
>   MEMORY: 16 GB
>PANIC: "BUG: unable to handle kernel paging request at 
> a022990f"
>  PID: 4658
>  COMMAND: "kworker/u16:26"
> TASK: 8803c6068f80  [THREAD_INFO: 8803c6068f80]
>  CPU: 7
>STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 4658   TASK: 8803c6068f80  CPU: 7   COMMAND: "kworker/u16:26"
>  #0 [c900039f76a0] machine_kexec at 810481fc
>  #1 [c900039f76f0] __crash_kexec at 81109e3a
>  #2 [c900039f77b0] crash_kexec at 8110adc9
>  #3 [c900039f77c8] oops_end at 8101d059
>  #4 [c900039f77e8] no_context at 81055ce5
>  #5 [c900039f7838] do_page_fault at 81056c5b
>  #6 [c900039f7860] page_fault at 81690a88
> [exception RIP: report_bug+93]
> RIP: 8167227d  RSP: c900039f7918  RFLAGS: 00010002
> RAX: a0229905  RBX: a020af0f  RCX: 0001
> RDX: 0907  RSI: a020af11  RDI: 98f6
> RBP: c900039f7a58   R8: 0001   R9: 03fc
> R10: 81a01906  R11: 8803f84711f8  R12: a02231fb
> R13: 0260  R14: 0004  R15: 0006
> ORIG_RAX:   CS: 0010  SS: 0018
>  #7 [c900039f7910] report_bug at 81672248
>  #8 [c900039f7938] fixup_bug at 8101af85
>  #9 [c900039f7950] do_trap at 8101b0d9
> #10 [c900039f79a0] do_error_trap at 8101b190
> #11 [c900039f7a50] invalid_op at 8169063e
> [exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335]
> RIP: a020af0f  RSP: c900039f7b00  RFLAGS: 00010086
> RAX: a04fa100  RBX: 8803f9550800  RCX: 0001
> RDX: a0228a58  RSI: 0001  RDI: a022321b
> RBP: c900039f7b80   R8:    R9: a020adc0
> R10: a048a1b0  R11: 8803f84711f8  R12: 0001
> R13: 8803f8471000  R14: c900039f7b94  R15: c900039f7bd0
> ORIG_RAX:   CS: 0010  SS: 0018
> #12 [c900039f7b18] gf119_head_vblank_put at a04422f9 [nouveau]
> #13 [c900039f7b88] drm_get_last_vbltimestamp at a020ad91 [drm]
> #14 [c900039f7ba8] drm_update_vblank_count at a020b3e1 [drm]
> #15 [c900039f7c10] drm_vblank_disable_and_save at a020bbe9 [drm]
> #16 [c900039f7c40] drm_crtc_vblank_off at a020c3c0 [drm]
> #17 [c900039f7cb0] nouveau_display_fini at a048a4d6 [nouveau]
> #18 [c900039f7ce0] nouveau_display_suspend at a048ac4f [nouveau]
> #19 [c900039f7d00] nouveau_do_suspend at a047e5ec [nouveau]
> #20 [c900039f7d38] nouveau_pmops_suspend at a047e77d [nouveau]
> #21 [c900039f7d50] pci_pm_suspend at 813b1ff0
> #22 [c900039f7d80] dpm_run_callback at 814c4dbd
> #23 [c900039f7db8] __device_suspend at 814c5a61
> #24 [c900039f7e30] async_suspend at 814c5cfa
> #25 [c900039f7e48] async_run_entry_fn at 81091683
> #26 [c900039f7e70] process_one_work at 810882bc
> #27 [c900039f7eb0] worker_thread at 8108854a
> #28 [c900039f7f10] kthread at 8108e387
> #29 [c900039f7f50] ret_from_fork at 8168fa85
> crash> gdb list *drm_calc_vbltimestamp_from_scanoutpos+335
> 0xa020af0f is in drm_calc_vbltimestamp_from_scanoutpos 
> (drivers/gpu/drm/drm_vblank.c:608).
> 603 /* If mode timing undefined, just return as no-op:
> 604  * Happens during initial modesetting of a crtc.
> 605  */
> 606 if (mode->crtc_clock == 0) {
> 607 DRM_DEBUG("crtc %u: Noop due to uninitialized 
> mode.\n", pipe);
> 608 WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
> 609
> 610 return false;
> 611 }
> 612
> crash> gdb list *report_bug+93
> 0x8167227d is in report_bug (lib/bug.c:177).
> 172 return BUG_TRAP_TYPE_WARN;
> 173
> 174

Re: Weird green patterns on video

2017-06-04 Thread Ilia Mirkin

On Sun, Jun 4, 2017 at 1:05 PM, Alexandre-Xavier L-L  wrote:
> Hello,
>
> Someone sent me a picture of a device that he tried to add support for
> in V4L2. The device causes a kind of diagonal pattern made of green
> lines on his image. I wonder what could be causing this. Has anyone
> seen this before?
>
> The device is the first ever model of Ion Video 2 PC that uses a TM6010
> chip.
>
> What he got: https://sebbro.nl/ION_Video2PC-TM6010_BOARD_GENERIC.png
>
> Expected result (captured from another device):
> https://sebbro.nl/VCR-reference.png
>
> The support for the device was added by adding
> { USB_DEVICE(0x15e4, 0x0140), .driver_info = TM6010_BOARD_GENERIC },
> to tm6000-cards.c.
>
> Thanks in advance for any clues.
> Alexandre-Xavier

YUV zero = RGB greenish, as you see there. From the looks of it, the
pitch on the buffer is wrong, and you're showing the parts of the
buffer that are left zeroed as if they were part of the visible
region. (Pitch = how many bytes between lines, which is not
necessarily the visible width of the buffer, as it can be rounded up
to various values for various reasons.)
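
Both points can be checked with a few lines of C. The sketch below assumes
full-range BT.601 coefficients for the YUV to RGB step (the device may well
use a different matrix or range, but all-zero YUV comes out green either
way) and a packed format for the addressing step. All names are made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct rgb8 { uint8_t r, g, b; };

/* Clamp to 0..255 and round to nearest. */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* Full-range BT.601 YCbCr -> RGB (assumed matrix). */
static struct rgb8 yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v)
{
    double cb = u - 128.0;
    double cr = v - 128.0;
    struct rgb8 out;

    out.r = clamp_u8(y + 1.402 * cr);
    out.g = clamp_u8(y - 0.344136 * cb - 0.714136 * cr);
    out.b = clamp_u8(y + 1.772 * cb);
    return out;
}

/* Byte offset of pixel (x, y): rows are 'pitch' bytes apart, which may be
 * larger than width * bytes_per_pixel. */
static size_t pixel_offset(size_t x, size_t y, size_t pitch,
                           size_t bytes_per_pixel)
{
    return y * pitch + x * bytes_per_pixel;
}
```

With these coefficients, yuv_to_rgb(0, 0, 0) clamps red and blue to zero and
leaves only green. If the consumer steps through the buffer with a different
pitch than the producer used, each row starts a little further off than the
last, which is what produces the diagonal bands.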

Hope this helps,

  -ilia

Re: [PATCH 1/3] drm: fourcc byteorder: drop DRM_FORMAT_BIG_ENDIAN

2017-05-02 Thread Ilia Mirkin

On Tue, May 2, 2017 at 11:06 AM, Gerd Hoffmann  wrote:
> Radeon and nvidia (nv40) cards were mentioned.  I'll try to summarize
> (feel free to correct me if I'm wrong).
>
> nvidia has support for 8 bit-per-color formats only on big-endian hosts.
> Not sure whether this is a driver or hardware limitation.

Let me just summarize the NVIDIA situation. First off, pre-nv50 and
nv50+ are entirely different and unrelated beasts.

The (pre-nv50) hardware has (a few) "big endian mode" bits. Those bits
are kind of unrelated to each other and control their own "domains".
One of the domains is reading of the scanout fb. So as a result, the
hardware can scan out XRGB8888, RGB565, and XRGB1555 stored in either
little or big endian packings, irrespective of the "mode" that other
parts of the hardware are in.

However there's the delicate little question of the GPU *generating*
the data. These older GPUs don't have quite the format flexibility
offered by newer hw. So only XRGB8888 is supported, packed in whatever
"mode" the whole PGRAPH unit is in. (I say this because things seem to
work when rendering using the XRGB8888 format while scanning out with
the BE flag set.)

There are no APIs for controlling the endianness of each engine in
nouveau, so it ends up being in "big endian" mode on BE hosts, so the
GPU can only render to big-endian-packed framebuffers.

None of this applies to nv50+ hw. (Although it might in broad strokes.)

Currently the driver is exposing XRGB8888 and ARGB8888 formats as
that's what drm_crtc_init does for it. However the ARGB8888 format
doesn't work (and shouldn't be exposed, the alpha is meaningless on a
single-plane setup), and the XRGB8888 format is assumed to be packed
in cpu host endian (and the "BE" bit is set accordingly).
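
To make "packed in little or big endian" concrete, here is a minimal C sketch of how one 32-bit XRGB pixel with 8 bits per channel (value 0xXXRRGGBB) lands in memory under each packing. The function names are hypothetical and this is not nouveau code; it only illustrates the byte order the scanout "BE" bit would have to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store 0xXXRRGGBB with the least significant byte first: B, G, R, X. */
static inline void store_xrgb_le(uint8_t out[4], uint32_t px)
{
    assert(out != NULL);
    out[0] = (uint8_t)(px & 0xff);
    out[1] = (uint8_t)((px >> 8) & 0xff);
    out[2] = (uint8_t)((px >> 16) & 0xff);
    out[3] = (uint8_t)((px >> 24) & 0xff);
}

/* Store 0xXXRRGGBB with the most significant byte first: X, R, G, B. */
static inline void store_xrgb_be(uint8_t out[4], uint32_t px)
{
    assert(out != NULL);
    out[0] = (uint8_t)((px >> 24) & 0xff);
    out[1] = (uint8_t)((px >> 16) & 0xff);
    out[2] = (uint8_t)((px >> 8) & 0xff);
    out[3] = (uint8_t)(px & 0xff);
}
```

A CPU writing the pixel as a native 32-bit word produces the first layout on a little-endian host and the second on a big-endian host, which is why a framebuffer assumed to be in host endian needs the scanout side set to the same packing.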

Hope this helps!

  -ilia
