Re: 4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot

2018-09-22 Thread Dan Ziemba
On Sat, 2018-09-22 at 18:31 -0400, Dan Ziemba wrote:
> On Sat, 2018-09-22 at 07:21 -0300, Mauro Carvalho Chehab wrote:
> > On Thu, 20 Sep 2018 00:07:09 -0400
> > Dan Ziemba wrote:
> > 
> > > I reported this on bugzilla also a few days ago, but I'm not sure if
> > > that is actually the right place to report, so copying to the mailing
> > > list...
> > 
> > I saw a report on BZ, but haven't had time yet to dig into it. These
> > days, it is usually better to report via the ML.
> >  
> > > 
> > > Starting with the first 4.18 RC kernel, my system experiences general
> > > protection faults leading to kernel panic shortly after the login
> > > prompt appears on most boots.  Occasionally that doesn't happen and
> > > instead numerous other seemingly random stack traces are printed (bad
> > > page map, scheduling while atomic, null pointer deref, etc), but either
> > > way the system is unusable.  This bug remains up through the latest
> > > mainline kernel 4.19-rc2.
> > > 
> > > Booting with my USB ATSC tv tuner disconnected prevents the bug from
> > > happening.
> > > 
> > > 
> > > Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:
> > > 
> > > 1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs
> > > 
> > > 
> > > Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
> > > the bug for me.
> > 
> > There's something really weird about it: that patch changes code that
> > is only called when the device is streaming. It shouldn't be causing a
> > GPF/kernel panic depending on whether the machine was booted with or
> > without it.
> 
> It hadn't occurred to me to try disabling my tv software.  When I
> disable tvheadend so it doesn't start at boot, the crash does not happen
> until I later start it manually.  I believe it does some scanning
> through the channels at start up to update EPG data.
> 
> > 
> > Perhaps it is a side effect of some changes in the USB subsystem?
> > There are some changes happening there that change some locks.
> > 
> > I see one minor issue there: it is using GFP_ATOMIC instead
> > of GFP_KERNEL.
> > 
> > Could you please try to change this line:
> > 
> > stream->buf_list[stream->buf_num] = kzalloc(size, GFP_ATOMIC);
> > 
> > to
> > 
> > stream->buf_list[stream->buf_num] = kzalloc(size, GFP_KERNEL);
> 
> I'll give this a try now.

I built from mainline HEAD, currently 4.19rc4.r209.g10dc890d4228, and
was able to reproduce the bug before any code changes. Stack trace from
the one test is attached.

I then rebuilt with the above line changed, but the problem continues. 
Stack traces from two tests are attached.  First one was a null pointer
deref instead of a general protection fault, but I have seen that before
as well.

I have noticed that with this newer kernel version (with and without
code change), the crash does not always happen immediately after
starting tvheadend.  A few times, I have been able to tune in a channel
and watch for a few seconds.  Then the crash would happen after
flipping through 3 or 4 channels.  

> 
> > 
> > Also, it would be great if you could post the GPF logs.
> 
> It's difficult to capture much, since the system often locks up without
> syncing to disk.  The stack traces appear pretty random to me, but I
> have attached two examples I captured by tailing dmesg over ssh while
> starting tvheadend. In the first, there was actually not a complete
> lock up, so the log is complete.  For the second one, there was a complete
> lockup and quite a bit more printed on the local console that didn't
> make it through the network.
> 
> > 
> > > 
> > > 
> > > My DVB hardware uses driver mxl111sf:
> > > 
> > > Bus 002 Device 003: ID 2040:c61b Hauppauge
> > > Device Descriptor:
> > >   bLength                18
> > >   bDescriptorType         1
> > >   bcdUSB               2.00
> > >   bDeviceClass            0
> > >   bDeviceSubClass         0
> > >   bDeviceProtocol         0
> > >   bMaxPacketSize0        64
> > >   idVendor           0x2040 Hauppauge
> > >   idProduct          0xc61b
> > >   bcdDevice            0.00
> > >   iManufacturer           1 Hauppauge
> > >   iProduct                2 WinTV Aero-M
> > > 
> > > Other system info:
> > > 
> > > Arch Linux x86_64
> > > Intel i7-3770
> > > 16 GB ram
> > > 
> > > Bugzilla:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=201055
> > > 
> > > Arch bug:
> > > https://bugs.archlinux.org/task/59990
> > > 
> > > 
> > > Thanks,
> > > Dan Ziemba
> > > 
> > > 
> > 
> > 
> > 
> > Thanks,
> > Mauro
syslog:warn  : [   57.773807] systemd-journald[337]: File /var/log/journal/9ebf93d137434ec68b05472bb8d498ab/user-1337.journal corrupted or uncleanly shut down, renaming and replacing.
kern  :err   : [   59.912749] usb 4-1.5: dvb_usb_v2: 2nd usb_bulk_msg() failed=-110
kern  :err   : [   59.912816] error writing addr: 0x8d, mask: 0x01, data: 0x01, retrying...
kern  :warn  : [   60.260210] usb 
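
A note on the `failed=-110` lines in the log above: kernel functions conventionally return a negative errno value on failure, and on Linux errno 110 is ETIMEDOUT, so these lines indicate that the bulk transfer timed out. A minimal Python sketch for decoding such values follows. `describe_kernel_ret` is a hypothetical helper name, not part of any kernel or dvb-usb tooling, and errno numbering is platform-specific, so it should be run on Linux to match the kernel's values.

```python
import errno
import os


def describe_kernel_ret(ret: int) -> str:
    """Describe a kernel-style return value, where failures are -errno."""
    if ret >= 0:
        return "success"
    code = -ret
    # errno.errorcode maps errno numbers to symbolic names for the
    # platform this script runs on (Linux numbering when run on Linux).
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # -110 is the value printed as "failed=-110" in the log above.
    print(describe_kernel_ret(-110))
```

The symbolic name only identifies the kind of failure (here, a timeout talking to the device); it does not by itself explain the later general protection fault.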

Re: 4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot

2018-09-22 Thread Dan Ziemba
On Sat, 2018-09-22 at 07:21 -0300, Mauro Carvalho Chehab wrote:
> On Thu, 20 Sep 2018 00:07:09 -0400
> Dan Ziemba wrote:
> 
> > I reported this on bugzilla also a few days ago, but I'm not sure if
> > that is actually the right place to report, so copying to the mailing
> > list...
> 
> I saw a report on BZ, but haven't had time yet to dig into it. These
> days, it is usually better to report via the ML.
>  
> > 
> > Starting with the first 4.18 RC kernel, my system experiences general
> > protection faults leading to kernel panic shortly after the login
> > prompt appears on most boots.  Occasionally that doesn't happen and
> > instead numerous other seemingly random stack traces are printed (bad
> > page map, scheduling while atomic, null pointer deref, etc), but either
> > way the system is unusable.  This bug remains up through the latest
> > mainline kernel 4.19-rc2.
> > 
> > Booting with my USB ATSC tv tuner disconnected prevents the bug from
> > happening.
> > 
> > 
> > Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:
> > 
> > 1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs
> > 
> > 
> > Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
> > the bug for me.
> 
> There's something really weird about it: that patch changes code that
> is only called when the device is streaming. It shouldn't be causing a
> GPF/kernel panic depending on whether the machine was booted with or
> without it.

It hadn't occurred to me to try disabling my tv software.  When I
disable tvheadend so it doesn't start at boot, the crash does not happen
until I later start it manually.  I believe it does some scanning
through the channels at start up to update EPG data.

> 
> Perhaps it is a side effect of some changes in the USB subsystem?
> There are some changes happening there that change some locks.
> 
> I see one minor issue there: it is using GFP_ATOMIC instead
> of GFP_KERNEL.
> 
> Could you please try to change this line:
> 
>   stream->buf_list[stream->buf_num] = kzalloc(size, GFP_ATOMIC);
> 
> to
> 
>   stream->buf_list[stream->buf_num] = kzalloc(size, GFP_KERNEL);

I'll give this a try now.

> 
> Also, it would be great if you could post the GPF logs.

It's difficult to capture much, since the system often locks up without
syncing to disk.  The stack traces appear pretty random to me, but I
have attached two examples I captured by tailing dmesg over ssh while
starting tvheadend. In the first, there was actually not a complete
lock up, so the log is complete.  For the second one, there was a complete
lockup and quite a bit more printed on the local console that didn't
make it through the network.

> 
> > 
> > 
> > My DVB hardware uses driver mxl111sf:
> > 
> > Bus 002 Device 003: ID 2040:c61b Hauppauge
> > Device Descriptor:
> >   bLength                18
> >   bDescriptorType         1
> >   bcdUSB               2.00
> >   bDeviceClass            0
> >   bDeviceSubClass         0
> >   bDeviceProtocol         0
> >   bMaxPacketSize0        64
> >   idVendor           0x2040 Hauppauge
> >   idProduct          0xc61b
> >   bcdDevice            0.00
> >   iManufacturer           1 Hauppauge
> >   iProduct                2 WinTV Aero-M
> > 
> > Other system info:
> > 
> > Arch Linux x86_64
> > Intel i7-3770
> > 16 GB ram
> > 
> > Bugzilla:
> > https://bugzilla.kernel.org/show_bug.cgi?id=201055
> > 
> > Arch bug:
> > https://bugs.archlinux.org/task/59990
> > 
> > 
> > Thanks,
> > Dan Ziemba
> > 
> > 
> 
> 
> 
> Thanks,
> Mauro
kern  :notice: [  410.089420] audit: type=1130 audit(1537653893.759:73): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=tvheadend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
kern  :err   : [  412.638173] usb 4-1.5: dvb_usb_v2: 2nd usb_bulk_msg() failed=-110
kern  :err   : [  412.638229] error writing addr: 0x8d, mask: 0x01, data: 0x01, retrying...
kern  :warn  : [  412.985663] usb 4-1.5: DVB: adapter 0 frontend 0 frequency 0 out of range (5400..85800)
kern  :err   : [  415.198280] usb 4-1.5: dvb_usb_v2: 2nd usb_bulk_msg() failed=-110
kern  :err   : [  415.198342] error writing addr: 0x8d, mask: 0x01, data: 0x01, retrying...
kern  :warn  : [  429.186180] general protection fault:  [#1] PREEMPT SMP PTI
kern  :warn  : [  429.186280] CPU: 2 PID: 288 Comm: md1_raid6 Not tainted 4.18.9-arch1-1-ARCH #1
kern  :warn  : [  429.186328] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
kern  :warn  : [  429.186398] RIP: 0010:memcpy_erms+0x6/0x10
kern  :warn  : [  429.186427] Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1  a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
kern  :warn  : [  429.186588] RSP: 0018:a38c03be7a70 EFLAGS: 00010206
kern  :warn  : [  429.186625] RAX: 900d75115000 

Re: 4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot

2018-09-22 Thread Mauro Carvalho Chehab
On Thu, 20 Sep 2018 00:07:09 -0400
Dan Ziemba wrote:

> I reported this on bugzilla also a few days ago, but I'm not sure if
> that is actually the right place to report, so copying to the mailing
> list...

I saw a report on BZ, but haven't had time yet to dig into it. These
days, it is usually better to report via the ML.
 
> 
> Starting with the first 4.18 RC kernel, my system experiences general
> protection faults leading to kernel panic shortly after the login
> prompt appears on most boots.  Occasionally that doesn't happen and
> instead numerous other seemingly random stack traces are printed (bad
> page map, scheduling while atomic, null pointer deref, etc), but either
> way the system is unusable.  This bug remains up through the latest
> mainline kernel 4.19-rc2.
> 
> Booting with my USB ATSC tv tuner disconnected prevents the bug from
> happening.
> 
> 
> Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:
> 
> 1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs
> 
> 
> Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
> the bug for me.  

There's something really weird about it: that patch changes code that
is only called when the device is streaming. It shouldn't be causing a
GPF/kernel panic depending on whether the machine was booted with or
without it.

Perhaps it is a side effect of some changes in the USB subsystem?
There are some changes happening there that change some locks.

I see one minor issue there: it is using GFP_ATOMIC instead
of GFP_KERNEL.

Could you please try to change this line:

stream->buf_list[stream->buf_num] = kzalloc(size, GFP_ATOMIC);

to

stream->buf_list[stream->buf_num] = kzalloc(size, GFP_KERNEL);

Also, it would be great if you could post the GPF logs.

> 
> 
> My DVB hardware uses driver mxl111sf:
> 
> Bus 002 Device 003: ID 2040:c61b Hauppauge
> Device Descriptor:
>   bLength                18
>   bDescriptorType         1
>   bcdUSB               2.00
>   bDeviceClass            0
>   bDeviceSubClass         0
>   bDeviceProtocol         0
>   bMaxPacketSize0        64
>   idVendor           0x2040 Hauppauge
>   idProduct          0xc61b
>   bcdDevice            0.00
>   iManufacturer           1 Hauppauge
>   iProduct                2 WinTV Aero-M
> 
> Other system info:
> 
> Arch Linux x86_64
> Intel i7-3770
> 16 GB ram
> 
> Bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=201055
> 
> Arch bug:
> https://bugs.archlinux.org/task/59990
> 
> 
> Thanks,
> Dan Ziemba
> 
> 



Thanks,
Mauro


4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot

2018-09-19 Thread Dan Ziemba
I reported this on bugzilla also a few days ago, but I'm not sure if
that is actually the right place to report, so copying to the mailing
list...


Starting with the first 4.18 RC kernel, my system experiences general
protection faults leading to kernel panic shortly after the login
prompt appears on most boots.  Occasionally that doesn't happen and
instead numerous other seemingly random stack traces are printed (bad
page map, scheduling while atomic, null pointer deref, etc), but either
way the system is unusable.  This bug remains up through the latest
mainline kernel 4.19-rc2.

Booting with my USB ATSC tv tuner disconnected prevents the bug from
happening.


Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:

1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs


Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
the bug for me.  


My DVB hardware uses driver mxl111sf:

Bus 002 Device 003: ID 2040:c61b Hauppauge
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x2040 Hauppauge
  idProduct          0xc61b
  bcdDevice            0.00
  iManufacturer           1 Hauppauge
  iProduct                2 WinTV Aero-M

Other system info:

Arch Linux x86_64
Intel i7-3770
16 GB ram

Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=201055

Arch bug:
https://bugs.archlinux.org/task/59990


Thanks,
Dan Ziemba