Re: [regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?

2019-01-02 Thread Pavel Machek
Hi!

> > > So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
> > > If you are unwilling to do that, it is very difficult to help you
> > > more.
> > 
> > Website says I have to read and agree to two different pieces of
> > legalesee, and I'd need to keep track of yet another password... so
> > you can "communicate" with me.
> > 
> > But you can already communicate with me, over email.
> 
> I've listed all the reasons why our bug handling process is what it is.
> 
> If registering to the Bugzilla is too much of an effort for you, then I
> won't be able to help you further on this.

Actually I did register at the bugzilla. Only useful help there
was that CONFIG_DRM_I915_DEBUG_GEM might be useful. Unfortunately that
one seems to make it panic() and impossible to get anything useful.

https://bugs.freedesktop.org/show_bug.cgi?id=109175

Best regards,

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?

2019-01-02 Thread Joonas Lahtinen
Quoting Pavel Machek (2018-12-27 10:34:39)
> Hi!
> 
> > > > > If you think it is useful, I can try to update my machine to
> > > > > linux-next.
> > > > 
> > > > linux-next is closer to drm-tip, so it's better. Do you have some
> > > > specific reason for not wanting to run drm-tip (but linux-next is still
> > > > ok)?
> > > 
> > > I already have build/update scripts for -next, and I trust -next not
> > > to store screenshots of my desktop in my master boot record :-).
> > > 
> > > Anyway, it does happen with -next. This time, chromiums were running,
> > > and crash happened minute? after I exited flightgear. It can be seen
> > > in the logs.
> > > 
> > > Oh and I might want to mention -- machine was rather deep in swap this
> > > time, as in "mouse jumping when starting fgfs" and "could feel the
> > > chromium being swapped back in". I might have had this situation
> > > before, and just powercycled the machine "because it is so deep in
> > > swap that it will not recover".
> > > 
> > > top says:
> > > 
> > > top - 19:18:24 up 2 days,  8:03,  2 users,  load average: 3.02, 3.45,
> > > 3.21
> > > Tasks: 141 total,   1 running,  86 sleeping,   0 stopped,   2 zombie
> > > %Cpu(s): 18.8 us,  7.6 sy,  3.0 ni, 68.4 id,  1.3 wa,  0.0 hi,  0.9
> > > si,  0.0 st
> > > KiB Mem:   5967968 total,   663244 used,  5304724 free,48876
> > > buffers
> > > KiB Swap:  1681428 total,   170904 used,  1510524 free.   446280
> > > cached Mem
> > > 
> > > but of course that memory is free once everything died.
> > > 
> > > Any ideas? Should I go back to v4.19 to see if it happens there, too?
> > 
> > linux-next includes very much the same code as drm-tip. There's nobody
> > magically reviewing the code more than it is reviewed for inclusion into
> > drm-tip, when it is fed into linux-next. So thinking linux-next would be
> > some way safer is an illusion.
> > 
> > It sounds like having memory pressure expedites the corruption, which
> > should make it easier to reproduce and thus fix.
> > 
> > So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
> > If you are unwilling to do that, it is very difficult to help you
> > more.
> 
> Website says I have to read and agree to two different pieces of
> legalesee, and I'd need to keep track of yet another password... so
> you can "communicate" with me.
> 
> But you can already communicate with me, over email.

I've listed all the reasons why our bug handling process is what it is.

If registering to the Bugzilla is too much of an effort for you, then I
won't be able to help you further on this.

Regards, Joonas

> I verified v4.19 is stable -- it worked ok for way more than two days
> it usually takes to crash.
> 
> Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


[regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?

2018-12-27 Thread Pavel Machek
Hi!

> > > > If you think it is useful, I can try to update my machine to
> > > > linux-next.
> > > 
> > > linux-next is closer to drm-tip, so it's better. Do you have some
> > > specific reason for not wanting to run drm-tip (but linux-next is still
> > > ok)?
> > 
> > I already have build/update scripts for -next, and I trust -next not
> > to store screenshots of my desktop in my master boot record :-).
> > 
> > Anyway, it does happen with -next. This time, chromiums were running,
> > and crash happened minute? after I exited flightgear. It can be seen
> > in the logs.
> > 
> > Oh and I might want to mention -- machine was rather deep in swap this
> > time, as in "mouse jumping when starting fgfs" and "could feel the
> > chromium being swapped back in". I might have had this situation
> > before, and just powercycled the machine "because it is so deep in
> > swap that it will not recover".
> > 
> > top says:
> > 
> > top - 19:18:24 up 2 days,  8:03,  2 users,  load average: 3.02, 3.45,
> > 3.21
> > Tasks: 141 total,   1 running,  86 sleeping,   0 stopped,   2 zombie
> > %Cpu(s): 18.8 us,  7.6 sy,  3.0 ni, 68.4 id,  1.3 wa,  0.0 hi,  0.9
> > si,  0.0 st
> > KiB Mem:   5967968 total,   663244 used,  5304724 free,48876
> > buffers
> > KiB Swap:  1681428 total,   170904 used,  1510524 free.   446280
> > cached Mem
> > 
> > but of course that memory is free once everything died.
> > 
> > Any ideas? Should I go back to v4.19 to see if it happens there, too?
> 
> linux-next includes very much the same code as drm-tip. There's nobody
> magically reviewing the code more than it is reviewed for inclusion into
> drm-tip, when it is fed into linux-next. So thinking linux-next would be
> some way safer is an illusion.
> 
> It sounds like having memory pressure expedites the corruption, which
> should make it easier to reproduce and thus fix.
> 
> So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
> If you are unwilling to do that, it is very difficult to help you
> more.

Website says I have to read and agree to two different pieces of
legalesee, and I'd need to keep track of yet another password... so
you can "communicate" with me.

But you can already communicate with me, over email.

I verified v4.19 is stable -- it worked ok for way more than two days
it usually takes to crash.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?

2018-12-13 Thread Joonas Lahtinen
Quoting Pavel Machek (2018-12-12 20:29:02)
> Hi!
> 
> > > > > > > > There's one similar for nouveau in Bugzilla, but it seems like 
> > > > > > > > a genuine
> > > > > > > > memory corruption (1 bit flipped):
> > > > > > > > 
> > > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > > > > > 
> > > > > > > > Any extra information would be of use :)
> > > > > > > > 
> > > > > > > > Regards, Joonas
> > > > > > > > 
> > > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > > > > information in one consolidated place:
> > > > > > > > 
> > > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > > > > > 
> > > > > > > I prefer email... certainly for bugs that can't be reproduced.
> > > > > > 
> > > > > > By adding it to the Bugzilla it may be recognized by somebody else
> > > > > > who is experiencing a similar issue. Internet points are not 
> > > > > > deducted
> > > > > > for submitting bugs in good faith, even if they get closed as
> > > > > > NOTABUG.
> > > > 
> > > > Well, your documentation suggests you'll deduce my internet points:
> > > > 
> > > >   Before filing the bug, please try to reproduce your issue with the
> > > >   latest kernel. Use the latest drm-tip branch from
> > > >   http://cgit.freedesktop.org/drm-tip and build as instructed on our
> > > >   Build Guide.
> > > > 
> > > > :-)
> > > 
> > > I'd prefer not to run drm-tip. I'll update to 2.6.20-rc5+ and see if
> > > it re-appears (but it takes long time to reproduce :-().
> > 
> > If we can or can not reproduce the issue with drm-tip, is a very useful
> > datapoint for us. If we can not reproduce, it'll be possible to bisect
> > which commit fixed it, and backport that. On the other hand, if it's
> > still reproducible, we know we're not spending time on something we
> > already fixed, and the priority gets a bump.
> 
> bisect ... is not practical on something that takes 2 days to reproduce.
> 
> > > If you think it is useful, I can try to update my machine to
> > > linux-next.
> > 
> > linux-next is closer to drm-tip, so it's better. Do you have some
> > specific reason for not wanting to run drm-tip (but linux-next is still
> > ok)?
> 
> I already have build/update scripts for -next, and I trust -next not
> to store screenshots of my desktop in my master boot record :-).
> 
> Anyway, it does happen with -next. This time, chromiums were running,
> and crash happened minute? after I exited flightgear. It can be seen
> in the logs.
> 
> Oh and I might want to mention -- machine was rather deep in swap this
> time, as in "mouse jumping when starting fgfs" and "could feel the
> chromium being swapped back in". I might have had this situation
> before, and just powercycled the machine "because it is so deep in
> swap that it will not recover".
> 
> top says:
> 
> top - 19:18:24 up 2 days,  8:03,  2 users,  load average: 3.02, 3.45,
> 3.21
> Tasks: 141 total,   1 running,  86 sleeping,   0 stopped,   2 zombie
> %Cpu(s): 18.8 us,  7.6 sy,  3.0 ni, 68.4 id,  1.3 wa,  0.0 hi,  0.9
> si,  0.0 st
> KiB Mem:   5967968 total,   663244 used,  5304724 free,48876
> buffers
> KiB Swap:  1681428 total,   170904 used,  1510524 free.   446280
> cached Mem
> 
> but of course that memory is free once everything died.
> 
> Any ideas? Should I go back to v4.19 to see if it happens there, too?

linux-next includes very much the same code as drm-tip. There's nobody
magically reviewing the code more than it is reviewed for inclusion into
drm-tip, when it is fed into linux-next. So thinking linux-next would be
some way safer is an illusion.

It sounds like having memory pressure expedites the corruption, which
should make it easier to reproduce and thus fix.

So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
If you are unwilling to do that, it is very difficult to help you more.

Regards, Joonas

> 
> 
> Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?

2018-12-12 Thread Pavel Machek
Hi!

> > > > > > > There's one similar for nouveau in Bugzilla, but it seems like a 
> > > > > > > genuine
> > > > > > > memory corruption (1 bit flipped):
> > > > > > > 
> > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > > > > 
> > > > > > > Any extra information would be of use :)
> > > > > > > 
> > > > > > > Regards, Joonas
> > > > > > > 
> > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > > > information in one consolidated place:
> > > > > > > 
> > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > > > > 
> > > > > > I prefer email... certainly for bugs that can't be reproduced.
> > > > > 
> > > > > By adding it to the Bugzilla it may be recognized by somebody else
> > > > > who is experiencing a similar issue. Internet points are not deducted
> > > > > for submitting bugs in good faith, even if they get closed as
> > > > > NOTABUG.
> > > 
> > > Well, your documentation suggests you'll deduce my internet points:
> > > 
> > >   Before filing the bug, please try to reproduce your issue with the
> > >   latest kernel. Use the latest drm-tip branch from
> > >   http://cgit.freedesktop.org/drm-tip and build as instructed on our
> > >   Build Guide.
> > > 
> > > :-)
> > 
> > I'd prefer not to run drm-tip. I'll update to 2.6.20-rc5+ and see if
> > it re-appears (but it takes long time to reproduce :-().
> 
> If we can or can not reproduce the issue with drm-tip, is a very useful
> datapoint for us. If we can not reproduce, it'll be possible to bisect
> which commit fixed it, and backport that. On the other hand, if it's
> still reproducible, we know we're not spending time on something we
> already fixed, and the priority gets a bump.

bisect ... is not practical on something that takes 2 days to reproduce.

> > If you think it is useful, I can try to update my machine to
> > linux-next.
> 
> linux-next is closer to drm-tip, so it's better. Do you have some
> specific reason for not wanting to run drm-tip (but linux-next is still
> ok)?

I already have build/update scripts for -next, and I trust -next not
to store screenshots of my desktop in my master boot record :-).

Anyway, it does happen with -next. This time, chromiums were running,
and crash happened minute? after I exited flightgear. It can be seen
in the logs.

Oh and I might want to mention -- machine was rather deep in swap this
time, as in "mouse jumping when starting fgfs" and "could feel the
chromium being swapped back in". I might have had this situation
before, and just powercycled the machine "because it is so deep in
swap that it will not recover".

top says:

top - 19:18:24 up 2 days,  8:03,  2 users,  load average: 3.02, 3.45,
3.21
Tasks: 141 total,   1 running,  86 sleeping,   0 stopped,   2 zombie
%Cpu(s): 18.8 us,  7.6 sy,  3.0 ni, 68.4 id,  1.3 wa,  0.0 hi,  0.9
si,  0.0 st
KiB Mem:   5967968 total,   663244 used,  5304724 free,48876
buffers
KiB Swap:  1681428 total,   170904 used,  1510524 free.   446280
cached Mem

but of course that memory is free once everything died.

Any ideas? Should I go back to v4.19 to see if it happens there, too?


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


delme.gz
Description: application/gzip


signature.asc
Description: Digital signature