Re: dvb usb issues since kernel 4.9
On Fri, 26 Jan 2018 17:37:39 -0200
Mauro Carvalho Chehab wrote:

> On Fri, 26 Jan 2018 12:17:37 -0200
> Mauro Carvalho Chehab wrote:
>
> [...]
>
> Btw,
>
> Just in case, I also applied all recent pending dwc2 patches I found on
> linux-usb (even trivial unrelated ones) at:
>
> https://git.linuxtv.org/mchehab/experimental.git/log/?h=dwc2_patches
>
> No difference. ISOCH is still broken.
>
> If anyone wants to see the full logs, they are here:
> https://pastebin.com/XJYyTwPv

Someone pointed out to me in private that changing the DWC2 BRCM profile to enable uframe_sched might help.

So, I wrote this patch:

    https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=19abf0026b7bf1bd44aa9d2add9f958935760ded

and applied it on top of this branch:

    https://git.linuxtv.org/mchehab/experimental.git/log/?h=v4.15%2bmedia%2bdwc2

It is based on vanilla Kernel 4.15. I applied:

- all media -next patches that will be sent to Kernel 4.16-rc1;
- DWC2 patches submitted by Gregor to the linux-usb ML;
- Linus' softirq test patch:
  https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=ccf833fd4a5b99c3d3cf2c09c065670f74a230a7
- a DT patch that enables VCIQ (needed by some GPU drivers):
  https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=fd4e9ca6f41d35b6234c30fa29937141e0c09570
- a few debug patches like this one:
  https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=f50669c18394f5b5674630e2ebf78a06b023626f

I didn't notice any difference.
Re: dvb usb issues since kernel 4.9
On Fri, 26 Jan 2018 12:17:37 -0200
Mauro Carvalho Chehab wrote:

> Hi Alan,
>
> [...]
>
> This device doesn't support bulk for DVB, just ISOCH. The drivers work
> fine on x86.
>
> [...]

Btw,

Just in case, I also applied all recent pending dwc2 patches I found on linux-usb (even trivial unrelated ones) at:

    https://git.linuxtv.org/mchehab/experimental.git/log/?h=dwc2_patches

No difference. ISOCH is still broken.

If anyone wants to see the full logs, they are here:

    https://pastebin.com/XJYyTwPv

Cheers,
Mauro
Re: dvb usb issues since kernel 4.9
Hi Alan,

On Mon, 8 Jan 2018 14:15:35 -0500 (EST)
Alan Stern wrote:

> On Mon, 8 Jan 2018, Linus Torvalds wrote:
>
> > Can somebody tell which softirq it is that dvb/usb cares about?
>
> I don't know about the DVB part. The USB part is a little difficult to
> analyze, mostly because the bug reports I've seen are mostly from
> people running non-vanilla kernels.

I suspect that the main reason people use non-vanilla Kernels is that, among other bugs, the upstream dwc2 driver has serious trouble handling ISOCH traffic.

I tested with Kernel 4.15-rc7 from this git tree:

    https://git.linuxtv.org/mchehab/experimental.git/log/?h=softirq_fixup

(i.e. with the softirq bug partially reverted by Linus' patch, and the DWC2 deferred probe fixed)

The device is a PCTV 461e, which uses the em28xx driver plus a Montage frontend (which is the same one used on dvbsky hardware, except for em28xx).

This device doesn't support bulk for DVB, just ISOCH. The drivers work fine on x86.

Using a test signal at a bit rate of 56698.4 Kbits/s, this is what happens when capturing less than one second of data:

    $ dvbv5-zap -c ~/dvb_channel.conf "tv brasil" -l universal -X 100 -m -t2
    Using LNBf UNIVERSAL
        Universal, Europe
        Freqs : 10800 to 11800 MHz, LO: 9750 MHz
        Freqs : 11600 to 12700 MHz, LO: 10600 MHz
    using demux 'dvb0.demux0'
    reading channels from file '/home/mchehab/dvb_channel.conf'
    tuning to 11468000 Hz
           (0x00) Signal= -33.90dBm
    Lock   (0x1f) Signal= -33.90dBm C/N= 30.28dB postBER= 2.33x10^-6
    dvb_dev_set_bufsize: buffer set to 6160384
    dvb_set_pesfilter to 0x2000
    354.08s: Starting capture
    354.73s: only read 59220 bytes
    354.73s: Stopping capture

    [ 354.000827] dwc2 3f980000.usb: DWC OTG HCD EP DISABLE: bEndpointAddress=0x84, ep->hcpriv=116f41b2
    [ 354.000859] dwc2 3f980000.usb: DWC OTG HCD EP RESET: bEndpointAddress=0x84
    [ 354.010744] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
    ... (hundreds of thousands of Frame Overrun messages)
    [ 354.660857] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
    [ 354.660935] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
    [ 354.660959] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
    [ 354.660966] dwc2 3f980000.usb: urb->status = 0
    [ 354.660992] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
    [ 354.661001] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
    [ 354.661008] dwc2 3f980000.usb: urb->status = 0
    [ 354.661054] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
    [ 354.661065] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
    [ 354.661072] dwc2 3f980000.usb: urb->status = 0
    [ 354.661107] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
    [ 354.661120] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
    [ 354.661127] dwc2 3f980000.usb: urb->status = 0
    [ 354.661146] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
    [ 354.661158] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
    [ 354.661165] dwc2 3f980000.usb: urb->status = 0

The Kernel was compiled with:

    CONFIG_USB_DWC2=y
    CONFIG_USB_DWC2_HOST=y
    # CONFIG_USB_DWC2_PERIPHERAL is not set
    # CONFIG_USB_DWC2_DUAL_ROLE is not set
    # CONFIG_USB_DWC2_PCI is not set
    CONFIG_USB_DWC2_DEBUG=y
    # CONFIG_USB_DWC2_VERBOSE is not set
    # CONFIG_USB_DWC2_TRACK_MISSED_SOFS is not set
    CONFIG_USB_DWC2_DEBUG_PERIODIC=y

For reference, this is the lsusb output for the PCTV USB hardware:

    $ lsusb -v -d 2013:0258

    Bus 001 Device 005: ID 2013:0258 PCTV Systems
    Couldn't open device, some information will be missing
    Device Descriptor:
      bLength                18
      bDescriptorType         1
      bcdUSB               2.00
      bDeviceClass            0 (Defined at Interface level)
      bDeviceSubClass         0
      bDeviceProtocol         0
      bMaxPacketSize0        64
      idVendor           0x2013 PCTV Systems
      idProduct          0x0258
      bcdDevice            1.00
      iManufacturer           3
      iProduct                1
      iSerial                 2
      bNumConfigurations      1
      Configuration Descriptor:
        bLength                 9
        bDescriptorType         2
        wTotalLength           41
        bNumInterfaces          1
        bConfigurationValue     1
        iConfiguration          0
        bmAttributes         0x80
          (Bus Powered)
        MaxPower              500mA
        Interface Descriptor:
          bLength                 9
          bDescriptorType         4
          bInterfaceNumber        0
          bAlternateSetting       0
          bNumEndpoints           1
          bInterfaceClass       255 Vendor Specific Class
          bInterfaceSubClass      0
          bInterfaceProtocol      0
          iInterface              0
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x84  EP 4 IN
            bmAttributes            1
              Transfer Type            Isochronous
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0000  1x 0 bytes
            bInterva
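To put the "only read 59220 bytes" line in perspective, the shortfall can be worked out from the figures in the dvbv5-zap output. The sketch below is only an illustration: it assumes that 1 Kbit means 1000 bits and that the capture (filter 0x2000) was meant to receive the whole transport stream at the stated bit rate. The function and variable names are made up for this example.

```python
# Figures copied from the dvbv5-zap output in the message above.
BITRATE_KBIT_S = 56698.4   # test signal bit rate
CAPTURE_START_S = 354.08   # "Starting capture" timestamp
CAPTURE_STOP_S = 354.73    # "Stopping capture" timestamp
BYTES_READ = 59220         # "only read 59220 bytes"


def expected_bytes(bitrate_kbit_s, seconds):
    """Bytes a full-rate capture would deliver (assumes 1 Kbit = 1000 bits)."""
    return bitrate_kbit_s * 1000 / 8 * seconds


duration = CAPTURE_STOP_S - CAPTURE_START_S
expected = expected_bytes(BITRATE_KBIT_S, duration)
fraction = BYTES_READ / expected

print(f"capture window: {duration:.2f} s")
print(f"expected about {expected:.0f} bytes, got {BYTES_READ} ({fraction:.1%})")
```

Under those assumptions, only a little over one percent of the expected data reached userspace during the capture window.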
Re: dvb usb issues since kernel 4.9
On Sat, 13 Jan 2018 07:09:20 -0200
Mauro Carvalho Chehab wrote:

> On Fri, 12 Jan 2018 13:48:46 -0800
> Eric Dumazet wrote:
>
> > On Fri, 2018-01-12 at 19:13 -0200, Mauro Carvalho Chehab wrote:
> > >
> > >
> > > The .config file used to build the Kernel is at:
> > > https://pastebin.com/wpZghann
> > >
> >
> > Hi Mauro
> >
> > Any chance you can try CONFIG_HZ_1000=y, CONFIG_HZ=1000 ?

It actually made it a lot worse!

Without Linus' patch (or a revert of the softirq patch), a 4-minute capture got all these errors:

Jan 13 10:41:41 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 1)
Jan 13 10:41:42 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: MPEG2AUDIO @ #1912 Continuity counter error (total 1)
Jan 13 10:42:14 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 3)
Jan 13 10:42:47 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 4)
Jan 13 10:42:58 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 5)
Jan 13 10:42:58 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: MPEG2AUDIO @ #1912 Continuity counter error (total 2)
Jan 13 10:43:34 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 9)
Jan 13 10:43:37 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: MPEG2AUDIO @ #1912 Continuity counter error (total 5)
Jan 13 10:44:00 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 12)
Jan 13 10:44:29 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 13)

Thanks,
Mauro
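When comparing runs like the one above, it can help to reduce the tvheadend log to the last reported error total per stream. The following is a minimal sketch that assumes the log lines keep exactly the layout shown in this message; the regular expression and the function name are made up for this example and are not part of tvheadend.

```python
import re

# Matches lines such as:
# "... tvheadend[226]: TS: <mux>: H264 @ #1911 Continuity counter error (total 13)"
LINE_RE = re.compile(
    r"tvheadend\[\d+\]: TS: (?P<mux>.+?): (?P<stream>\S+) @ #(?P<pid>\d+) "
    r"Continuity counter error \(total (?P<total>\d+)\)"
)


def last_totals(lines):
    """Return {(stream type, PID): last reported error total}."""
    totals = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            key = (match["stream"], int(match["pid"]))
            totals[key] = int(match["total"])
    return totals


# Three of the lines from the message above.
SAMPLE = [
    "Jan 13 10:41:41 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 1)",
    "Jan 13 10:43:37 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: MPEG2AUDIO @ #1912 Continuity counter error (total 5)",
    "Jan 13 10:44:29 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 13)",
]

for (stream, pid), total in sorted(last_totals(SAMPLE).items()):
    print(f"{stream} @ #{pid}: {total} continuity errors")
```

Feeding it the whole log gives one number per stream for each kernel configuration under test.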
Re: dvb usb issues since kernel 4.9
On Fri, 12 Jan 2018 13:48:46 -0800
Eric Dumazet wrote:

> On Fri, 2018-01-12 at 19:13 -0200, Mauro Carvalho Chehab wrote:
> >
> >
> > The .config file used to build the Kernel is at:
> > https://pastebin.com/wpZghann
> >
>
> Hi Mauro
>
> Any chance you can try CONFIG_HZ_1000=y, CONFIG_HZ=1000 ?

I can run such a test to satisfy your curiosity, but that doesn't sound like the right fix.

See, almost all TVs and set-top boxes (STB) run Linux nowadays and usually come with ARM CPUs designed to "just do their job" (e.g. CPUs with low clocks). There, low power consumption is a must.

This bug very likely affects those devices once they migrate to Kernel 4.9+.

Changing from NO_HZ to HZ=1000 on TVs/STBs will for sure have bad side effects on those types of devices, increasing power consumption. Not to mention that this would be very bad environmentally, as TV unit sales alone are on the order of 230 million units per year [1].

[1] https://www.statista.com/statistics/461316/global-tv-unit-sales/

Thanks,
Mauro
Re: dvb usb issues since kernel 4.9
On Fri, 2018-01-12 at 19:13 -0200, Mauro Carvalho Chehab wrote:
>
>
> The .config file used to build the Kernel is at:
> https://pastebin.com/wpZghann
>

Hi Mauro

Any chance you can try CONFIG_HZ_1000=y, CONFIG_HZ=1000 ?

Thanks.
Re: dvb usb issues since kernel 4.9
On Tue, 9 Jan 2018 09:48:47 -0800
Linus Torvalds wrote:

> On Tue, Jan 9, 2018 at 9:27 AM, Eric Dumazet wrote:
> >
> > So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has
> > shown up multiple times in various 'regressions'
> > simply because it could surface the problem more often.
> > But even if you revert it, you can still make the faulty
> > driver/subsystem misbehave by adding more stress to the cpu handling
> > the IRQ.
>
> ..but that's always true. People sometimes live on the edge - often by
> design (ie hardware has been designed/selected to be the crappiest
> possible that still work).
>
> That doesn't change anything. A patch that takes "bad things can
> happen" to "bad things DO happen" is a bad patch.
>
> > Maybe the answer is to tune the kernel for small latencies at the
> > price of small throughput (situation before the patch)
>
> Generally we always want to tune for latency. Throughput is "easy",
> but almost never interesting.
>
> Sure, people do batch jobs. And yes, people often _benchmark_
> throughput, because it's easy to benchmark. It's much harder to
> benchmark latency, even when it's often much more important.
>
> A prime example is the SSD benchmarks in the last few years - they
> improved _dramatically_ when people noticed that the real problem was
> latency, not the idiotic maximum big-block bandwidth numbers that have
> almost zero impact on most people.
>
> Put another way: we already have a very strong implicit bias towards
> bandwidth just because it's easier to see and measure.
>
> That means that we generally should strive to have a explicit bias
> towards optimizing for latency when that choice comes up. Just to
> balance things out (and just to not take the easy way out: bandwidth
> can often be improved by adding more layers of buffering and bigger
> buffers, and that often ends up really hurting latency).
>
> > 1) Revert the patch
>
> Well, we can revert it only partially - limiting it to just networking
> for example.
>
> Just saying "act the way you used to for tasklets" already seems to
> have fixed the issue in DVB.
>
> > 2) get rid of ksoftirqd since it adds unexpected latencies.
>
> We can't get rid of it entirely, since the synchronous softirq code
> can cause problems too. It's why we have that "maximum of ten
> synchronous events" in __do_softirq().
>
> And we don't *want* to get rid of it.
>
> We've _always_ had that small-scale "at some point we can't do it
> synchronously any more".
>
> That is a small-scale "don't have horrible latency for _other_ things"
> protection. So it's about latency too, it's just about protecting
> latency of the rest of the system.
>
> The problem with commit 4cd13c21b207 is that it turns the small-scale
> latency issues in softirq handling (they get larger latencies for lots
> of hardware interrupts or even from non-preemptible kernel code) into
> the _huge_ scale latency of scheduling, and does so in a racy way too.
>
> > 3) Let applications that expect to have high throughput make sure to
> > pin their threads on cpus that are not processing IRQ.
> > (And make sure to not use irqbalance, and setup IRQ cpu affinities)
>
> The only people that really deal in "throughput only" tend to be the
> HPC people, and they already do things like this.
>
> (The other end of the spectrum is the realtime people that have
> extreme latency requirements, who do things like that for the reverse
> reason: keeping one or more CPU's reserved for the particular
> low-latency realtime job).

Ok, it took me some time (and a faster microSD) to be sure that the data loss wasn't due to bad storage performance, but I now have some test results.

In summary, the ksoftirqd commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") is indeed causing data loss. In my tests, it generates at least one continuity error every 1-5 minutes.

Either reverting it or applying Linus' proposal of partially reverting it fixes the issue.

Increasing the number of URBs doesn't seem to help.

I'm enclosing the dirty details below.

Linus/Eric,

Now that I have an environment set up, I can test any other alternative that would fix the UDP packet flood attack without breaking the softirq handling code.

Regards,
Mauro

---

All tests below were done on a Raspberry Pi3 with a 32GB SanDisk Extreme U3 microSD card and a DVBSky S960C DVB-S2 tuner with an external power supply, connected to a TCP/IP network via Ethernet (which uses USB on the RPi). It also has a serial cable connected to it.

It was installed with LibreELEC 8.2.2, using the tvheadend backend.

I'm recording one MPEG-TS service/"channel" composed of one audio and one video stream. The total traffic collected by tvheadend was about 4 Mbits/s (audio+video+EPG tables). It is part of a 58 Mbits/s MPEG Transport Stream, with 23 TV services/"channels" on it.

While handling this issue, I found one unrelated bug, fixed in this patch:

https:
Re: dvb usb issues since kernel 4.9
On Tue, 9 Jan 2018 10:58:30 -0800 Linus Torvalds wrote:

> So I really think "you can use up 90% of CPU time with a UDP packet
> flood from the same network" is very very very different - and
> honestly not at all as important - as "you want to be able to use a
> USB DVB receiver and watch/record TV".
>
> Because that whole "UDP packet flood from the same network" really is
> something you _fundamentally_ have other mitigations for.
>
> I bet that whole commit was introduced because of a benchmark test,
> rather than real life. No?

I believe this has happened in real life, in the form of DNS servers not being able to recover after a long outage, where DNS TTLs had timed out, causing legitimate traffic to overload the DNS servers. The goodput in answers/sec from the DNS servers was too low when bringing them online again. (Based on a talk over beer at NetDevConf with a guy who claimed they ran DNS for AWS.)

The commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") tries to address a fundamental problem that the network stack has when interacting with softirq in overload situations. (Maybe we can come up with a better solution?)

Before this commit, when an application runs on the same CPU as softirq, the kernel has a bad "drop off a cliff" behavior once it goes above the saturation point.

This is confirmed in the CloudFlare blogpost [1], which used a kernel that predates this commit. From [1], section "A note on NUMA performance":

Quote:"
 1. Run receiver on another CPU, but on the same NUMA node as the RX
    queue. The performance as we saw above is around 360kpps.

 2. With receiver on exactly same CPU as the RX queue we can get up to
    ~430kpps. But it creates high variability. The performance drops down
    to zero if the NIC is overwhelmed with packets."

The behavior problem here is "performance drops down to zero if the NIC is overwhelmed with packets". That is a bad way to handle overload, not only when attacked, but also when bringing a service online after an outage.

What essentially happens is that:

 1. softirq NAPI enqueues 64 packets into the socket.
 2. the application dequeues 1 packet and invokes local_bh_enable()
 3. causing softirq to run in the app's timeslice, again enqueueing 64 packets
 4. the app only sees a goodput of 1/128 of the packets

That is essentially what Eric solved with his commit: it avoids step (3), where local_bh_enable() invokes softirq, if ksoftirqd is already running.

Maybe we can come up with a better solution? (I do agree this was too big a hammer, affecting other use-cases.)

[1] https://blog.cloudflare.com/how-to-receive-a-million-packets/

p.s. Regarding quote [1] point "1.": after Paolo Abeni optimized the UDP code, that statement is no longer true. It is now (significantly) faster to run/pin your UDP application on another CPU than the RX-CPU.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
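The 1/128 figure from the four steps above can be reproduced with a toy model. This is only a sketch of the accounting, not of the kernel: the socket capacity, the function names, and the idea that every softirq run always finds a full batch of 64 packets waiting are assumptions made for the illustration.

```python
from collections import deque

NAPI_BUDGET = 64  # packets enqueued per softirq run, as in step 1


def simulate(app_reads, socket_capacity=256):
    """Toy model: count packets that arrive vs. packets the app receives.

    socket_capacity is a hypothetical receive-queue limit; packets that
    arrive while the queue is full are dropped.
    """
    queue = deque()
    arrived = 0
    delivered = 0

    def softirq_run():
        nonlocal arrived
        for _ in range(NAPI_BUDGET):
            arrived += 1
            if len(queue) < socket_capacity:
                queue.append(arrived)
            # else: socket queue full, the packet is dropped

    for _ in range(app_reads):
        softirq_run()      # step 1: softirq enqueues 64 packets
        queue.popleft()    # step 2: the application dequeues 1 packet
        delivered += 1
        softirq_run()      # step 3: local_bh_enable() runs softirq again
    return delivered, arrived


delivered, arrived = simulate(1000)
print(f"goodput: {delivered}/{arrived} = 1/{arrived // delivered}")
```

In this model every application read is paired with two softirq runs of 64 packets each, which is where the 1/128 ratio in step 4 comes from. Everything beyond the queue limit is dropped.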
Re: dvb usb issues since kernel 4.9
On Tue, 2018-01-09 at 22:26 +0100, Jesper Dangaard Brouer wrote:
>
> I've previously experienced that you can be affected by the scheduler
> granularity, which is adjustable (with CONFIG_SCHED_DEBUG=y):
>
> $ grep -H . /proc/sys/kernel/sched_*_granularity_ns
> /proc/sys/kernel/sched_min_granularity_ns:2250000
> /proc/sys/kernel/sched_wakeup_granularity_ns:3000000
>
> The above numbers were confirmed on the RPi2 (see[2]). With commit
> 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), I expect/assume that
> softirq processing latency is bounded by the sched_wakeup_granularity_ns,
> which with 3 ms is not good enough for their use-case.

Note of caution wrt twiddling sched_wakeup_granularity_ns: it must remain < sched_latency_ns/2, else you effectively disable wakeup preemption completely, turning CFS into a tick granularity scheduler.

	-Mike
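The constraint Mike describes is easy to check before changing the tunable. The sketch below assumes the tunables are exposed under /proc/sys/kernel (which, per the message above, needs CONFIG_SCHED_DEBUG=y on kernels of this era) and that a sched_latency_ns file sits next to the two granularity files. Both helper names are hypothetical.

```python
from pathlib import Path


def wakeup_preemption_ok(wakeup_granularity_ns, latency_ns):
    """True if sched_wakeup_granularity_ns stays below sched_latency_ns / 2."""
    return wakeup_granularity_ns < latency_ns / 2


def read_sched_tunable(name, base=Path("/proc/sys/kernel")):
    """Read an integer tunable, or return None if it is not available."""
    try:
        return int((base / name).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    wakeup = read_sched_tunable("sched_wakeup_granularity_ns")
    latency = read_sched_tunable("sched_latency_ns")
    if wakeup is None or latency is None:
        print("scheduler tunables not available on this system")
    elif wakeup_preemption_ok(wakeup, latency):
        print(f"ok: {wakeup} ns < {latency} ns / 2")
    else:
        print(f"warning: {wakeup} ns >= {latency} ns / 2")
```

Calling wakeup_preemption_ok() with a candidate value before writing it avoids ending up in the "tick granularity scheduler" case by accident.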
Re: Re: dvb usb issues since kernel 4.9
On Tue, Jan 9, 2018 at 10:58 AM, Linus Torvalds wrote:
> On Tue, Jan 9, 2018 at 9:57 AM, Eric Dumazet wrote:
>>
>> Your patch considers TASKLET_SOFTIRQ being a candidate for 'immediate
>> handling', but TCP Small queues heavily use TASKLET,
>> so as far as I am concerned a revert would have the same effect.
>
> Does it actually?
>
> TCP ends up dropping packets outside of the window etc, so flooding a
> machine with TCP packets and causing some further processing up the
> stack sounds very different from the basic packet flooding thing that
> happens with NET_RX_SOFTIRQ.
>
> Also, honestly, the kinds of people who really worry about flooding
> tend to have packet filtering in the receive path etc.
>
> So I really think "you can use up 90% of CPU time with a UDP packet
> flood from the same network" is very very very different - and
> honestly not at all as important - as "you want to be able to use a
> USB DVB receiver and watch/record TV".
>
> Because that whole "UDP packet flood from the same network" really is
> something you _fundamentally_ have other mitigations for.
>
> I bet that whole commit was introduced because of a benchmark test,
> rather than real life. No?
>
> In contrast, now people are complaining about real loads not working.
>
> Linus

I said that a revert was fine, maybe I was not clear.

Clearly we cannot touch anything scheduler related without breaking someone's workload or assumptions about how the system behaved at some point.

Your patch won't solve other workloads that might have been impacted by my patch, so in one year (or next week), we will have to cope with another device driver not using tasklets but still relying on immediate softirq processing.

Apparently, we have to live with the softirq model forever, or switch to RT kernels.

Note that we have no mitigation for something that involves a flood of valid packets that no firewall can drop (without dropping legitimate packets).

The 'benchmark' here is not really the trigger, only a tool validating an idea/patch.
Re: dvb usb issues since kernel 4.9
On Tue, 9 Jan 2018 15:42:35 -0200 Mauro Carvalho Chehab wrote:

> Em Mon, 8 Jan 2018 11:51:04 -0800 Linus Torvalds
> escreveu:
>
[...]
> Patch makes sense to me, although I was not able to test it myself.

The patch also makes sense to me. I've done some basic testing with it on my high-end Broadwell system (that I use for 100Gbit/s testing). As expected, the network overload case still works, as NET_RX_SOFTIRQ is not matched.

> I set a RPi3 machine here with vanilla Kernel 4.14.11 running a
> standard raspbian distribution (with elevator=deadline).

I found a Raspberry Pi Model B+ (BCM2835, I think) that I loaded the LibreELEC distro on. One of the guys even created an image for me with a specific kernel [1] (that I just upgraded the system with).

[1] https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=77031#post77031

> My plan is to do more tests along this week, and try to tweak a little
> bit both userspace and kernelspace, in order to see if I can get
> better results.

I've previously experienced that you can be affected by the scheduler granularity, which is adjustable (with CONFIG_SCHED_DEBUG=y):

 $ grep -H . /proc/sys/kernel/sched_*_granularity_ns
 /proc/sys/kernel/sched_min_granularity_ns:2250000
 /proc/sys/kernel/sched_wakeup_granularity_ns:3000000

The above numbers were confirmed on the RPi2 (see [2]). With commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), I expect/assume that softirq processing latency is bounded by the sched_wakeup_granularity_ns, which at 3 ms is not good enough for their use-case.

Thus, if you manage to reproduce the case, try to see if adjusting this can mitigate the issue...

Their system has a non-preempt kernel; should they use PREEMPT?

 LibreELEC:~ # uname -a
 Linux LibreELEC 4.14.10 #1 SMP Tue Jan 9 17:35:03 GMT 2018 armv7l GNU/Linux

[2] https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=76999#post76999

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Re: Re: dvb usb issues since kernel 4.9
On Tue, Jan 9, 2018 at 9:57 AM, Eric Dumazet wrote:
>
> Your patch considers TASKLET_SOFTIRQ being a candidate for 'immediate
> handling', but TCP Small queues heavily use TASKLET,
> so as far as I am concerned a revert would have the same effect.

Does it actually?

TCP ends up dropping packets outside of the window etc, so flooding a machine with TCP packets and causing some further processing up the stack sounds very different from the basic packet flooding thing that happens with NET_RX_SOFTIRQ.

Also, honestly, the kinds of people who really worry about flooding tend to have packet filtering in the receive path etc.

So I really think "you can use up 90% of CPU time with a UDP packet flood from the same network" is very very very different - and honestly not at all as important - as "you want to be able to use a USB DVB receiver and watch/record TV".

Because that whole "UDP packet flood from the same network" really is something you _fundamentally_ have other mitigations for.

I bet that whole commit was introduced because of a benchmark test, rather than real life. No?

In contrast, now people are complaining about real loads not working.

              Linus
Re: Re: dvb usb issues since kernel 4.9
On Tue, Jan 9, 2018 at 9:48 AM, Linus Torvalds wrote:
> On Tue, Jan 9, 2018 at 9:27 AM, Eric Dumazet wrote:
>>
>> So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has
>> shown up multiple times in various 'regressions'
>> simply because it could surface the problem more often.
>> But even if you revert it, you can still make the faulty
>> driver/subsystem misbehave by adding more stress to the cpu handling
>> the IRQ.
>
> ..but that's always true. People sometimes live on the edge - often by
> design (ie hardware has been designed/selected to be the crappiest
> possible that still work).
>
> That doesn't change anything. A patch that takes "bad things can
> happen" to "bad things DO happen" is a bad patch.

I was expecting that people could get a chance to fix the root cause, instead of trying to keep the status quo.

Strangely, it took 18 months for someone to complain enough and 'bisect to this commit'.

Your patch considers TASKLET_SOFTIRQ a candidate for 'immediate handling', but TCP Small Queues heavily use TASKLET, so as far as I am concerned a revert would have the same effect.
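The partial revert being discussed can be pictured as a small decision function. The thread only states that TASKLET_SOFTIRQ is treated as a candidate for immediate handling and that NET_RX_SOFTIRQ is not matched. Everything else in the sketch below is an assumption made for illustration: the bit numbers follow the kernel's softirq enum ordering as I recall it, including HI_SOFTIRQ in the mask is a guess, and the function name is hypothetical. It is a toy model in Python, not the actual patch.

```python
# Softirq bit numbers (assumed to match the kernel's enum ordering).
HI_SOFTIRQ = 0
TIMER_SOFTIRQ = 1
NET_TX_SOFTIRQ = 2
NET_RX_SOFTIRQ = 3
TASKLET_SOFTIRQ = 6

# Softirqs that should keep the old "run right away" behaviour.
# TASKLET_SOFTIRQ comes from the thread; HI_SOFTIRQ is an assumption.
HANDLE_NOW_MASK = (1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ)


def defer_to_ksoftirqd(pending, ksoftirqd_running):
    """Decide whether pending softirqs are left for ksoftirqd.

    pending is a bitmask of raised softirqs. Deferral only happens when
    ksoftirqd is already running and no "handle now" softirq is pending.
    """
    if pending & HANDLE_NOW_MASK:
        return False
    return ksoftirqd_running


# Network RX under load: still deferred, so the overload case is unchanged.
print(defer_to_ksoftirqd(1 << NET_RX_SOFTIRQ, ksoftirqd_running=True))
# A tasklet is pending: handled immediately, as before the commit.
print(defer_to_ksoftirqd(1 << TASKLET_SOFTIRQ, ksoftirqd_running=True))
```

Eric's objection maps onto the second case: anything that raises a tasklet, including code in the network stack, gets the immediate path again.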
Re: dvb usb issues since kernel 4.9
On Tue, Jan 9, 2018 at 9:42 AM, Mauro Carvalho Chehab wrote:
>
> On my preliminar tests, writing to a file on an ext4 partition at a
> USB stick loses data up to the point to make it useless (1/4 of the data
> is lost!). However, writing to a class 10 microSD card is doable.

Note that most USB sticks are horrible crap. They can have write latencies counted in _seconds_.

You can cause VM issues and various other non-hardware stalls with them, simply because something gets stuck waiting for a page writeout that should take a few ms on any reasonable hardware, but ends up taking half a second or more.

For example, even really well-written software that tries to do things like threaded write-behind to smooth out the IO will be _totally_ screwed by the USB stick behavior (where you might write a few MB at high speeds, and then the next write - however small - takes a second because the stupid USB stick does a synchronous garbage collection). Suddenly all that clever software that tried to keep things moving along smoothly without any hiccups, and tried hard to make the USB bus have a nice constant load, can't do anything at all about the crap hardware.

So when testing writes to USB sticks, I'm not convinced you're actually testing any USB bus limitations or even really any other hardware limitations than the USB stick itself.

             Linus
Re: Re: dvb usb issues since kernel 4.9
On Tue, Jan 9, 2018 at 9:27 AM, Eric Dumazet wrote:
>
> So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has
> shown up multiple times in various 'regressions'
> simply because it could surface the problem more often.
> But even if you revert it, you can still make the faulty
> driver/subsystem misbehave by adding more stress to the cpu handling
> the IRQ.

..but that's always true. People sometimes live on the edge - often by design (ie hardware has been designed/selected to be the crappiest possible that still works).

That doesn't change anything. A patch that takes "bad things can happen" to "bad things DO happen" is a bad patch.

> Maybe the answer is to tune the kernel for small latencies at the
> price of small throughput (situation before the patch)

Generally we always want to tune for latency. Throughput is "easy", but almost never interesting.

Sure, people do batch jobs. And yes, people often _benchmark_ throughput, because it's easy to benchmark. It's much harder to benchmark latency, even when it's often much more important.

A prime example is the SSD benchmarks in the last few years - they improved _dramatically_ when people noticed that the real problem was latency, not the idiotic maximum big-block bandwidth numbers that have almost zero impact on most people.

Put another way: we already have a very strong implicit bias towards bandwidth just because it's easier to see and measure.

That means that we generally should strive to have an explicit bias towards optimizing for latency when that choice comes up. Just to balance things out (and just to not take the easy way out: bandwidth can often be improved by adding more layers of buffering and bigger buffers, and that often ends up really hurting latency).

> 1) Revert the patch

Well, we can revert it only partially - limiting it to just networking for example.

Just saying "act the way you used to for tasklets" already seems to have fixed the issue in DVB.
> 2) get rid of ksoftirqd since it adds unexpected latencies.

We can't get rid of it entirely, since the synchronous softirq code can cause problems too. It's why we have that "maximum of ten synchronous events" in __do_softirq().

And we don't *want* to get rid of it.

We've _always_ had that small-scale "at some point we can't do it synchronously any more".

That is a small-scale "don't have horrible latency for _other_ things" protection. So it's about latency too, it's just about protecting latency of the rest of the system.

The problem with commit 4cd13c21b207 is that it turns the small-scale latency issues in softirq handling (they get larger latencies for lots of hardware interrupts or even from non-preemptible kernel code) into the _huge_ scale latency of scheduling, and does so in a racy way too.

> 3) Let applications that expect to have high throughput make sure to
> pin their threads on cpus that are not processing IRQ.
> (And make sure to not use irqbalance, and setup IRQ cpu affinities)

The only people that really deal in "throughput only" tend to be the HPC people, and they already do things like this.

(The other end of the spectrum is the realtime people that have extreme latency requirements, who do things like that for the reverse reason: keeping one or more CPUs reserved for the particular low-latency realtime job).

Linus
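The "maximum of ten synchronous events" protection can be illustrated with a small user-space model. This is only a sketch: `softirq_passes()` and `busy_passes` are hypothetical names, the constant mirrors the kernel's MAX_SOFTIRQ_RESTART as I understand it, and the 2 ms time limit and the need_resched() check that __do_softirq() also applies are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror MAX_SOFTIRQ_RESTART in kernel/softirq.c. */
#define MAX_RESTART 10

/*
 * Hypothetical model: new softirqs are raised during each of the first
 * busy_passes passes. Returns the number of synchronous passes made and
 * sets *deferred when the remaining work is handed to ksoftirqd.
 */
int softirq_passes(int busy_passes, bool *deferred)
{
    int max_restart = MAX_RESTART;
    int passes = 0;

    assert(busy_passes >= 0 && deferred != NULL);
    for (;;) {
        passes++;
        if (passes > busy_passes) {   /* nothing pending any more: done */
            *deferred = false;
            return passes;
        }
        if (--max_restart == 0) {     /* restart budget spent: wake ksoftirqd */
            *deferred = true;
            return passes;
        }
    }
}
```

In this model a short burst is handled entirely in the synchronous path, and only a sustained flood spills over to the thread, which is the small-scale protection described above.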
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018 11:51:04 -0800 Linus Torvalds wrote:

> On Mon, Jan 8, 2018 at 11:15 AM, Alan Stern wrote:
> >
> > Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the
> > giveback_urb_bh member of struct usb_hcd. See usb_hcd_giveback_urb()
> > in drivers/usb/core/hcd.c; the calls are
> >
> >	else if (high_prio_bh)
> >		tasklet_hi_schedule(&bh->bh);
> >	else
> >		tasklet_schedule(&bh->bh);
> >
> > As it turns out, high_prio_bh gets set for interrupt and isochronous
> > URBs but not for bulk and control URBs. The DVB driver in question
> > uses bulk transfers.
>
> Ok, so we could try out something like the appended?
>
> NOTE! I have not tested this at all. It LooksObvious(tm), but...
>
> Linus
>
>  kernel/softirq.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 2f5e87f1bae2..97b080956fea 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -79,12 +79,16 @@ static void wakeup_softirqd(void)
>
>  /*
>   * If ksoftirqd is scheduled, we do not want to process pending softirqs
> - * right now. Let ksoftirqd handle this at its own rate, to get fairness.
> + * right now. Let ksoftirqd handle this at its own rate, to get fairness,
> + * unless we're doing some of the synchronous softirqs.
>   */
> -static bool ksoftirqd_running(void)
> +#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
> +static bool ksoftirqd_running(unsigned long pending)
>  {
>  	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
>
> +	if (pending & SOFTIRQ_NOW_MASK)
> +		return false;
>  	return tsk && (tsk->state == TASK_RUNNING);
>  }
>
> @@ -325,7 +329,7 @@ asmlinkage __visible void do_softirq(void)
>
>  	pending = local_softirq_pending();
>
> -	if (pending && !ksoftirqd_running())
> +	if (pending && !ksoftirqd_running(pending))
>  		do_softirq_own_stack();
>
>  	local_irq_restore(flags);
> @@ -352,7 +356,7 @@ void irq_enter(void)
>
>  static inline void invoke_softirq(void)
>  {
> -	if (ksoftirqd_running())
> +	if (ksoftirqd_running(local_softirq_pending()))
>  		return;
>
>  	if (!force_irqthreads) {

Hi Linus,

Patch makes sense to me, although I was not able to test it myself. I set up an RPi3 machine here with vanilla Kernel 4.14.11 running a standard raspbian distribution (with elevator=deadline).

Right now, I'm trying to reproduce the bug with dvbv5-zap. I may eventually do more tests on some other slow machines.

Usually, applications like tvheadend record just one channel. So, instead of a ~58 Mbits/s payload, they typically use ~11 Mbits/s for an HD channel. This is usually filtered by hardware. Here, I'm recording the entire TS, in order to make it easier to reproduce the issue. So, I'm forcing a condition that is usually worse than real use cases (at least for HD - I don't have any DVB stream here with a 4K channel).

From what I checked so far, with a vanilla upstream Kernel on RPi3, just receiving a DVB stream - or receiving it and writing to /dev/null - works with or without your patch.

The problem starts to happen when there is concurrency with writes.

In my preliminary tests, writing to a file on an ext4 partition on a USB stick loses data to the point of making it useless (1/4 of the data is lost!). However, writing to a class 10 microSD card is doable.

If you're curious enough, this is what I'm doing (these are the results when using a class 10 microSD card):

$ FILE=/tmp/out.ts; for i in $(seq 1 6); do echo "step $i"; rm $FILE 2>/dev/null; dvbv5-zap -l universal -c ~/vivo-channels.conf NBR -o $FILE -P -t60 2>&1|grep -E "(buffer|received)"; du $FILE 2>/dev/null; done
step 1
Setting buffer length to 725
buffer overrun
buffer overrun
buffer overrun
buffer overrun
buffer overrun
buffer overrun
buffer overrun
received 347504652 bytes (5656 Kbytes/sec)
339368 /tmp/out.ts
step 2
Setting buffer length to 725
buffer overrun
received 408995880 bytes (6656 Kbytes/sec)
399416 /tmp/out.ts
step 3
Setting buffer length to 725
received 412999716 bytes (6722 Kbytes/sec)
403328 /tmp/out.ts
step 4
Setting buffer length to 725
buffer overrun
received 415564788 bytes (6763 Kbytes/sec)
405832 /tmp/out.ts
step 5
Setting buffer length to 725
received 412999716 bytes (6722 Kbytes/sec)
403324 /tmp/out.ts
step 6
Setting buffer length to 725
received 408366080 bytes (6646 Kbytes/sec)
398796 /tmp/out.ts

My plan is to do more tests during this week, and try to tweak both userspace and kernelspace a little, in order to see if I can get better results.

Thanks,
Mauro
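The rate figures in the run above are consistent with bytes / seconds / 1024 over the 60-second capture (-t60). A minimal sketch of that arithmetic follows; `kbytes_per_sec()` and `shortfall_permille()` are hypothetical helper names, and using a run without overruns (step 3) as the reference is an assumption, since the stream rate itself may vary between runs.

```c
#include <assert.h>
#include <stdint.h>

/* Average rate in Kbytes/sec (1 Kbyte = 1024 bytes) for one capture. */
uint64_t kbytes_per_sec(uint64_t bytes, unsigned int seconds)
{
    assert(seconds > 0);
    return bytes / seconds / 1024;
}

/* Shortfall of one run against a reference run, in tenths of a percent. */
unsigned int shortfall_permille(uint64_t bytes, uint64_t reference_bytes)
{
    assert(reference_bytes > 0 && bytes <= reference_bytes);
    return (unsigned int)((reference_bytes - bytes) * 1000 / reference_bytes);
}
```

With the numbers above, step 1 (seven overruns) comes out roughly 15.8% short of step 3, while the runs with zero or one overrun stay within about 1% of each other.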
Re: Re: dvb usb issues since kernel 4.9
On Tue, Jan 9, 2018 at 8:51 AM, Josef Griebichler wrote: > Hi Linus, > > your patch works very good for me and others (please see > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=77006#post77006). > No errors in recordings any more. > The patch was also tested on x86_64 (Revo 3700) with positive effect. > I agree with the forum poster, that there's still an issue when recording and > watching livetv at same time. I also get audio dropouts and audio is out of > sync. > According to user smp kernel 4.9.73 with your patch on rpi and according to > user jahutchi kernel 4.11.12 on x86_64 have no such issues. > I don't know if this dropouts are related to this topic. > > If of any help I could provide perf output on raspberry with libreelec and > tvheadend. > Sorry to come late to the party. It seems problem comes from some piece of hardware/driver having some precise timing prereq, and opportunistic use of softirq/tasklet (instead maybe of hard irq handlers ) While it is true that softirq might do the job in most cases, we already have cases where this can be easily defeated, say if one cpu has suddenly to handle multiple sources of interrupts for various devices. NET_RX can easily lock the cpu for 10ms (on HZ=100 builds) So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has shown up multiple times in various 'regressions' simply because it could surface the problem more often. But even if you revert it, you can still make the faulty driver/subsystem misbehave by adding more stress to the cpu handling the IRQ. Note that networking lacks fine control of its softirq processing. Some people found/complained that relying more on ksoftirqd was potentially adding tail latencies. Maybe the answer is to tune the kernel for small latencies at the price of small throughput (situation before the patch) 1) Revert the patch 2) get rid of ksoftirqd since it adds unexpected latencies. 
3) Let applications that expect to have high throughput make sure to pin their threads on cpus that are not processing IRQ. (And make sure to not use irqbalance, and setup IRQ cpu affinities)
Aw: Re: dvb usb issues since kernel 4.9
Hi Linus,

your patch works very well for me and others (please see https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=77006#post77006).
No errors in recordings any more.
The patch was also tested on x86_64 (Revo 3700) with positive effect.
I agree with the forum poster that there's still an issue when recording and watching livetv at the same time. I also get audio dropouts and audio is out of sync.
According to user smp, kernel 4.9.73 with your patch on rpi, and according to user jahutchi, kernel 4.11.12 on x86_64, have no such issues.
I don't know if these dropouts are related to this topic.

If it is of any help, I could provide perf output on raspberry with libreelec and tvheadend.

Regards,
Josef

Sent: Monday, 8 January 2018 at 23:16
From: "Jesper Dangaard Brouer"
To: "Peter Zijlstra"
Cc: "Josef Griebichler", "Mauro Carvalho Chehab", "Alan Stern", "Greg Kroah-Hartman", linux-...@vger.kernel.org, "Eric Dumazet", "Rik van Riel", "Paolo Abeni", "Hannes Frederic Sowa", linux-kernel, netdev, "Jonathan Corbet", LMML, "David Miller", torva...@linux-foundation.org
Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018 22:44:27 +0100 Peter Zijlstra wrote:

> On Mon, Jan 08, 2018 at 10:31:09PM +0100, Jesper Dangaard Brouer wrote:
> > I did expected the issue to get worse, when you load the Pi with
> > network traffic, as now the softirq time-budget have to be shared
> > between networking and USB/DVB. Thus, I guess you are running TCP and
> > USB/mpeg2ts on the same CPU (why when you have 4 CPUs?...)
>
> Isn't networking also over USB on the Pi ?

Darn, that is true. Looking at the dmesg output in http://ix.io/DOg:

[ 0.405942] usbcore: registered new interface driver smsc95xx
[ 5.821104] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

I don't know enough about USB... is it possible to control which CPU handles the individual USB ports, or on some other level (than ports)?

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018 22:44:27 +0100 Peter Zijlstra wrote: > On Mon, Jan 08, 2018 at 10:31:09PM +0100, Jesper Dangaard Brouer wrote: > > I did expected the issue to get worse, when you load the Pi with > > network traffic, as now the softirq time-budget have to be shared > > between networking and USB/DVB. Thus, I guess you are running TCP and > > USB/mpeg2ts on the same CPU (why when you have 4 CPUs?...) > > Isn't networking also over USB on the Pi ? Darn, that is true. Looking at the dmesg output in http://ix.io/DOg: [0.405942] usbcore: registered new interface driver smsc95xx [5.821104] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1 I don't know enough about USB... is it possible to control which CPU handles the individual USB ports, or on some other level (than ports)? -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: dvb usb issues since kernel 4.9
On Mon, Jan 08, 2018 at 10:31:09PM +0100, Jesper Dangaard Brouer wrote: > I did expected the issue to get worse, when you load the Pi with > network traffic, as now the softirq time-budget have to be shared > between networking and USB/DVB. Thus, I guess you are running TCP and > USB/mpeg2ts on the same CPU (why when you have 4 CPUs?...) Isn't networking also over USB on the Pi ?
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018 17:26:10 +0100 "Josef Griebichler" wrote:

> I tried your mentioned patch but unfortunately no real improvement for me.
> dmesg http://ix.io/DOg
> tvheadend service log http://ix.io/DOi
>
> Errors during recording are still there.

Are you _also_ recording the stream on the Raspberry Pi?

It seems to me that you are expecting too much from this small device.

> Errors increase if there is additional tcp load on raspberry.

I did expect the issue to get worse when you load the Pi with network traffic, as now the softirq time budget has to be shared between networking and USB/DVB. Thus, I guess you are running TCP and USB/mpeg2ts on the same CPU (why, when you have 4 CPUs?...)

If you expect/want to get stable performance out of such a small box, then you (or LibreELEC) need to tune the box for this usage. And it does not have to be that complicated. The first step is to move IRQ handling for the NIC to a different CPU than the one handling the USB port that carries the DVB signal (/proc/irq/*/smp_affinity_list). And then pin the userspace process (taskset) to another CPU than the one handling USB-softirq.

> Unfortunately there's no usbmon or tshark on libreelec so I can't
> provide further logs.

Do you have perf or trace-cmd on the box? Maybe we could come up with some kernel functions to trace, to measure/show the latency spikes?

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
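The tuning step above amounts to writing a CPU list that leaves out the CPU busy with USB into /proc/irq/<N>/smp_affinity_list, and giving taskset a similar list for the userspace process. The sketch below only builds such a list; `cpu_list_excluding()` is a hypothetical helper, and both the IRQ number and which CPU actually handles the USB softirq are assumptions that would have to be checked in /proc/interrupts on the box.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes a comma-separated list of CPUs 0..ncpus-1, skipping busy_cpu,
 * into out. Returns the string length, or -1 if the list would be empty
 * or does not fit into len bytes.
 */
int cpu_list_excluding(int ncpus, int busy_cpu, char *out, size_t len)
{
    size_t used = 0;

    assert(ncpus > 0 && out != NULL && len > 0);
    out[0] = '\0';
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu == busy_cpu)
            continue;
        /* Prefix every entry except the first with a comma. */
        int n = snprintf(out + used, len - used, "%s%d",
                         used ? "," : "", cpu);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return used ? (int)strlen(out) : -1;
}
```

For a four-CPU Pi with USB handled on CPU 0 (an assumption), this yields "1,2,3". Whether the split helps at all depends on the NIC and the DVB device not sharing the same interrupt source.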
Re: Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018 12:35:08 -0500 (EST) Alan Stern wrote:

> On Mon, 8 Jan 2018, Josef Griebichler wrote:
>
> > No I can't sorry. There's no sat connection near to my workstation.
>
> Can we ask the person who made this post:
> https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965
>
> to run the test? The post says that the testing was done on an x86_64
> machine.

More than 5 years ago I used to play a lot with IPTV multicast MPEG2-TS streams (I implemented the wireshark mp2ts drop detection, and an out-of-tree netfilter kernel module to detect drops[1]). The web-site is dead, but archive.org has a copy[2].

Let me quote my own Lab-setup documentation[3].

You don't need a live IPTV MPEG2TS signal, you can simply generate your own using VLC:

 $ vlc ~/Videos/test_video.mkv -I rc --sout '#duplicate{dst=std{access=udp,mux=ts,dst=239.254.1.1:5500}}'

Viewing your own signal: You can view your own generated signal, again, by using VLC.

 $ vlc udp/ts://@239.254.1.1:5500

I hope the vlc syntax is still valid. And remember to join the multicast channels, if you don't have an application requesting the stream, as described in [4].

[1] https://github.com/netoptimizer/IPTV-Analyzer
[2] http://web.archive.org/web/20150328200122/http://www.iptv-analyzer.org:80/wiki/index.php/Main_Page
[3] http://web.archive.org/web/20150329095538/http://www.iptv-analyzer.org:80/wiki/index.php/Lab_Setup
[4] http://web.archive.org/web/20150328234459/http://www.iptv-analyzer.org:80/wiki/index.php/Multicast_Signal_on_Linux

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Re: dvb usb issues since kernel 4.9
On Mon, Jan 8, 2018 at 11:15 AM, Alan Stern wrote:
>
> Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the
> giveback_urb_bh member of struct usb_hcd. See usb_hcd_giveback_urb()
> in drivers/usb/core/hcd.c; the calls are
>
>	else if (high_prio_bh)
>		tasklet_hi_schedule(&bh->bh);
>	else
>		tasklet_schedule(&bh->bh);
>
> As it turns out, high_prio_bh gets set for interrupt and isochronous
> URBs but not for bulk and control URBs. The DVB driver in question
> uses bulk transfers.

Ok, so we could try out something like the appended?

NOTE! I have not tested this at all. It LooksObvious(tm), but...

Linus

 kernel/softirq.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 2f5e87f1bae2..97b080956fea 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -79,12 +79,16 @@ static void wakeup_softirqd(void)
 
 /*
  * If ksoftirqd is scheduled, we do not want to process pending softirqs
- * right now. Let ksoftirqd handle this at its own rate, to get fairness.
+ * right now. Let ksoftirqd handle this at its own rate, to get fairness,
+ * unless we're doing some of the synchronous softirqs.
  */
-static bool ksoftirqd_running(void)
+#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
+static bool ksoftirqd_running(unsigned long pending)
 {
 	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
 
+	if (pending & SOFTIRQ_NOW_MASK)
+		return false;
 	return tsk && (tsk->state == TASK_RUNNING);
 }
 
@@ -325,7 +329,7 @@ asmlinkage __visible void do_softirq(void)
 
 	pending = local_softirq_pending();
 
-	if (pending && !ksoftirqd_running())
+	if (pending && !ksoftirqd_running(pending))
 		do_softirq_own_stack();
 
 	local_irq_restore(flags);
@@ -352,7 +356,7 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (ksoftirqd_running())
+	if (ksoftirqd_running(local_softirq_pending()))
 		return;
 
 	if (!force_irqthreads) {
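The effect of the patch on the deferral decision can be tried out in user space. The following is a sketch, not kernel code: `defer_to_ksoftirqd()` is a hypothetical stand-in for the patched ksoftirqd_running(), its `thread_running` argument replaces the `tsk && tsk->state == TASK_RUNNING` test, and the softirq numbering is reproduced from include/linux/interrupt.h of that era as I recall it.

```c
#include <assert.h>
#include <stdbool.h>

/* Softirq numbering, assumed to match the 4.x kernels discussed here. */
enum {
    HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

#define SOFTIRQ_NOW_MASK ((1UL << HI_SOFTIRQ) | (1UL << TASKLET_SOFTIRQ))

/*
 * Returns true when pending softirqs are left for ksoftirqd, false when
 * they are processed right away in the interrupt-exit path.
 */
bool defer_to_ksoftirqd(unsigned long pending, bool thread_running)
{
    assert(pending < (1UL << NR_SOFTIRQS));
    if (pending & SOFTIRQ_NOW_MASK)
        return false;           /* tasklets pending: never defer */
    return thread_running;      /* otherwise defer only if the thread runs */
}
```

Note that a single pending tasklet makes the whole pending set run synchronously in this model, including NET_RX work that happens to be pending at the same time.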
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018, Linus Torvalds wrote:

> Can somebody tell which softirq it is that dvb/usb cares about?

I don't know about the DVB part. The USB part is a little difficult to analyze, mostly because the bug reports I've seen are mostly from people running non-vanilla kernels. For example, Josef is using a Raspberry Pi 3B with a non-standard USB host controller driver: dwc_otg_hcd is built into raspbian in place of the normal dwc2_hsotg driver.

Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the giveback_urb_bh member of struct usb_hcd. See usb_hcd_giveback_urb() in drivers/usb/core/hcd.c; the calls are

	else if (high_prio_bh)
		tasklet_hi_schedule(&bh->bh);
	else
		tasklet_schedule(&bh->bh);

As it turns out, high_prio_bh gets set for interrupt and isochronous URBs but not for bulk and control URBs. The DVB driver in question uses bulk transfers.

xhci-hcd, on the other hand, does not use these tasklets (it doesn't set the HCD_BH bit in the hc_driver's .flags member).

Alan Stern
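To make the mapping concrete, here is a small user-space sketch of the choice described above. The enum and function names are hypothetical and not the kernel's; the only facts relied on are that interrupt and isochronous URBs take the high-priority path, and that tasklet_hi_schedule() raises HI_SOFTIRQ while tasklet_schedule() raises TASKLET_SOFTIRQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the four USB transfer types. */
enum xfer_type { XFER_CONTROL, XFER_ISOC, XFER_BULK, XFER_INT };

/* Which softirq ends up running the URB giveback tasklet. */
enum softirq_kind { KIND_HI, KIND_TASKLET };

enum softirq_kind giveback_softirq(enum xfer_type type)
{
    bool high_prio_bh;

    assert((int)type >= 0 && (int)type <= (int)XFER_INT);
    /* Interrupt and isochronous URBs use the high-priority tasklet. */
    high_prio_bh = (type == XFER_ISOC || type == XFER_INT);
    return high_prio_bh ? KIND_HI : KIND_TASKLET;
}
```

So for a bulk-only DVB driver on dwc2_hsotg or ehci-hcd, the softirq in question would be TASKLET_SOFTIRQ, whereas an isochronous-only device would land on HI_SOFTIRQ.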
Re: dvb usb issues since kernel 4.9
On Mon, Jan 8, 2018 at 9:55 AM, Ingo Molnar wrote: > > as I doubt we have enough time to root-case this properly. Well, it's not like this is a new issue, and we don't have to get it fixed for 4.15. It's been around since 4.9, it's not a "have to suddenly fix it this week" issue. I just think that people should plan on having to maybe revert it and mark the revert for stable. But if the USB or DVB layers can instead just make the packet queue a bit deeper and not react so badly to the latency of a single softirq, that would obviously be a good thing in general, and maybe fix this issue. So I'm not saying that the revert is inevitable either. But I have to say that that commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") was a pretty damn big hammer, and entirely ignored the "softirqs can have latency concerns" issue. So I do feel like the UDP packet storm thing might want a somewhat more directed fix than that huge hammer of trying to move softirqs aggressively into the softirq thread. This is not that different from threaded irqs. And while you can set the "thread every irq" flag, that would be largely insane to do by default and in general. So instead, people do it either for specific irqs (ie "request_threaded_irq()") or they have a way to opt out of it (IRQF_NO_THREAD). I _suspect_ that the softirq thing really just wants the same thing. Have the networking case maybe set the "prefer threaded" flag just for networking, if it's less latency-sensitive for softirq handling than In fact, even for networking, there are separate TX/RX softirqs, maybe networking would only set it for the RX case? Or maybe even trigger it only for cases where things queue up and it goes into a "polling mode" (like NAPI already does). Of course, I don't even know _which_ softirq it is that the DVB case has issues with. Maybe it's the same NET_RX case? 
But looking at that offending commit, I do note (for example), that we literally have things like tasklet[_hi]_schedule() that might have been explicitly expected to just run the tasklet at a fairly low latency (maybe instead of a workqueue exactly because it doesn't need to sleep and wants lower latency). So saying "just because softirqd is possibly already woken up, let's delay all those tasklets etc" does really seem very wrong to me. Can somebody tell which softirq it is that dvb/usb cares about? Linus
Re: dvb usb issues since kernel 4.9
* Linus Torvalds wrote:

> On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> wrote:
> >
> > On Sat, 6 Jan 2018 16:04:16 +0100
> > "Josef Griebichler" wrote:
> >>
> >> the causing commit has been identified.
> >> After reverting commit
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> >> its working again.
> >
> > Just replying to me won't magically fix this. The ones that were involved on
> > this patch should also be c/c, plus USB people. Just added them.
>
> Actually, you seem to have added an odd subset of the people involved.
>
> For example, Ingo - who actually committed that patch - wasn't on the cc.
>
> I do think we need to simply revert that patch. It's very simple: it
> has been reported to lead to actual problems for people, and we don't
> fix one problem and then say "well, it fixed something else" when
> something breaks.
>
> When something breaks, we either unbreak it, or we revert the change
> that caused the breakage.
>
> It's really that simple. That's what "no regressions" means. We don't
> accept changes that cause regressions. This one did.

Yeah, absolutely - for the revert:

Acked-by: Ingo Molnar

as I doubt we have enough time to root-cause this properly.

Thanks,

	Ingo
Re: Aw: Re: Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018, Josef Griebichler wrote:

> No I can't sorry. There's no sat connection near to my workstation.

Can we ask the person who made this post:
https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965

to run the test? The post says that the testing was done on an x86_64 machine.

> Sent: Monday, 8 January 2018 at 17:31
> From: "Alan Stern"
> To: "Josef Griebichler"
> Cc: "Mauro Carvalho Chehab", "Greg Kroah-Hartman", linux-...@vger.kernel.org, "Eric Dumazet", "Rik van Riel", "Paolo Abeni", "Hannes Frederic Sowa", "Jesper Dangaard Brouer", linux-kernel, netdev, "Jonathan Corbet", LMML, "Peter Zijlstra", "David Miller", torva...@linux-foundation.org
> Subject: Re: Aw: Re: dvb usb issues since kernel 4.9
> On Mon, 8 Jan 2018, Josef Griebichler wrote:
> Hi Maro,
>
> I tried your mentioned patch but unfortunately no real improvement for me.
> dmesg http://ix.io/DOg
> tvheadend service log http://ix.io/DOi
> Errors during recording are still there.
> Errors increase if there is additional tcp load on raspberry.
>
> Unfortunately there's no usbmon or tshark on libreelec so I can't provide further logs.
> Can you try running the same test on an x86_64 system?
> Alan Stern

It appears that you are using a non-standard kernel. The vanilla kernel does not include any "dwc_otg_hcd" driver.

Alan Stern
Aw: Re: Re: dvb usb issues since kernel 4.9
No, I can't, sorry. There's no sat connection near my workstation.

Sent: Monday, 8 January 2018 at 17:31
From: "Alan Stern"
To: "Josef Griebichler"
Cc: "Mauro Carvalho Chehab", "Greg Kroah-Hartman", linux-...@vger.kernel.org, "Eric Dumazet", "Rik van Riel", "Paolo Abeni", "Hannes Frederic Sowa", "Jesper Dangaard Brouer", linux-kernel, netdev, "Jonathan Corbet", LMML, "Peter Zijlstra", "David Miller", torva...@linux-foundation.org
Subject: Re: Aw: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018, Josef Griebichler wrote:

> Hi Maro,
>
> I tried your mentioned patch but unfortunately no real improvement for me.
> dmesg http://ix.io/DOg
> tvheadend service log http://ix.io/DOi
> Errors during recording are still there.
> Errors increase if there is additional tcp load on raspberry.
>
> Unfortunately there's no usbmon or tshark on libreelec so I can't provide further logs.

Can you try running the same test on an x86_64 system?

Alan Stern
Re: Aw: Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018, Josef Griebichler wrote: > Hi Maro, > > I tried your mentioned patch but unfortunately no real improvement for me. > dmesg http://ix.io/DOg > tvheadend service log http://ix.io/DOi > Errors during recording are still there. > Errors increase if there is additional tcp load on raspberry. > > Unfortunately there's no usbmon or tshark on libreelec so I can't provide > further logs. Can you try running the same test on an x86_64 system? Alan Stern
Aw: Re: dvb usb issues since kernel 4.9
Hi Maro, I tried your mentioned patch but unfortunately no real improvement for me. dmesg http://ix.io/DOg tvheadend service log http://ix.io/DOi Errors during recording are still there. Errors increase if there is additional tcp load on raspberry. Unfortunately there's no usbmon or tshark on libreelec so I can't provide further logs. Regards, Josef > On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote: > > > > > It seems that the original patch were designed to solve some IRQ issues > > > > with network cards with causes data losses on high traffic. However, > > > > it is also causing bad effects on sustained high bandwidth demands > > > > required by DVB cards, at least on some USB host drivers. > > > > > > > > Alan/Greg/Eric/David: > > > > > > > > Any ideas about how to fix it without causing regressions to > > > > network? > > > > > > It would be good to know what hardware was involved on the x86 system > > > and to have some timing data. Can we see the output from lsusb and > > > usbmon, running on a vanilla kernel that gets plenty of video glitches? > > > > From Josef's report, and from the BZ, the affected hardware seems > > to be based on Montage Technology M88DS3103/M88TS2022 chipset. > > What type of USB host controller does the x86_64 system use? EHCI or > xHCI? I'll let Josef answer this. > > > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c, > > with shares a USB implementation that is used by a lot more drivers. > > The URB handling code is at: > > > > drivers/media/usb/dvb-usb-v2/usb_urb.c > > > > This particular driver allocates 8 buffers with 4096 bytes each > > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP. > > > > This become a popular USB hardware nowadays. I have one S960c > > myself, so I can send you the lsusb from it. You should notice, however, > > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit > > rates. 
Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps > > of payload after removing URB headers. > > You mentioned earlier that the driver uses bulk transfers. In USB-2.0, > the maximum possible payload data transfer rate using bulk transfers is > 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And > even this is possible only if almost nothing else is using the bus at > the same time. No, I said 58 Mbits/s (not bytes). On DVB-C and DVB-S2 specs, AFAIKT, there's no hard limit for the maximum payload data rate, although industry seems to limit it to be around 60 Mbits/s. On those standards, the maximal bit rate is defined by the modulation type and by the channel symbol rate. To give you a practical example, my DVB-S2 provider modulates each transponder with 8/PSK (3 bits/symbol), and define channels with a symbol rate of 30 Mbauds/s. So, it could, theoretically, transport a MPEG-TS stream up to 90 Mbits/s (minus headers and guard intervals). In practice, the streams there are transmitted with 58,026.5 Kbits/s. > > A 10 minutes record with the > > entire data (with typically contains 5-10 channels) can easily go > > above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if > > a usbmon dump would be useful. > > It might not be helpful at all. However, I'm not interested in the > payload data (which would be unintelligible to me anyway) but rather > the timing of URB submissions and completions. A usbmon trace which > didn't keep much of the payload data would only require on the order of > 50 MB per minute -- and Josef said that glitches usually would show up > within a minute or so. Yeah, this could help. Josef, You can get it with wireshark/tshark or tcpdump. 
See: https://technolinchpin.wordpress.com/2015/10/23/usb-bus-sniffers-for-linux-system/ https://wiki.wireshark.org/CaptureSetup/USB > > I'm enclosing the lsusb from a S960C device, with is based on those > > Montage chipsets: > > What I wanted to see was the output from "lsusb" on the affected > system, not the output from "lsusb -v -s B:D" on your system. > > > > Overall, this may be a very difficult problem to solve. The > > > 4cd13c21b207 commit was intended to improve throughput at the cost of > > > increased latency. But then what do you do when the latency becomes > > > too high for the video subsystem to handle? > > > > Latency can't be too high, otherwise frames will be dropped. > > Yes, that's the whole point. > > > Even if the Kernel itself doesn't drop, if the delay goes higher > > than a certain threshold, userspace will need to drop, as it > > should be presenting audio and video on real time. Yet, typically, > > userspace will delay it by one or two seconds, with would mean > > 1500-3500 buffers, with I suspect it is a lot more than the hardware > > limits. So I suspect that the hardware starves free buffers a way > > before userspace, as media hardware don't have unlimited buffers > > inside them, as they assume that the Kernel/userspace will be fast > > enough to sustain bit rates up to 66 Mbps of pay
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote: > > Let find the root-cause of this before reverting, as this will hurt the > > networking use-case. > > > > I want to see if the increase buffer will solve the issue (the current > > buffer of 0.63 ms seem too small). > > For TV, high latency has mainly two practical consequences: > > 1) it increases the time to switch channels. MPEG-TS based transmissions >usually takes some time to start showing the channel contents. Adding >more buffers make it worse; > > 2) specially when watching sports, a higher latency means that you'll know >that your favorite team made a score when your neighbors start >celebrating... seeing the actual event only after them. > > So, the lower, the merrier, but I think that 5 ms would be acceptable. That value 65 for the number of buffers was calculated based on a misunderstanding of the actual bandwidth requirement. Still increasing the number of buffers shouldn't hurt, and it's worth trying. But there is another misunderstanding here which needs to be cleared up. Adding more buffers does _not_ increase latency; it increases capacity. Making each buffer larger _would_ increase latency, but that's not what I proposed. Going through this more explicitly... Suppose you receive 8 KB of data every ms, and suppose you have four 8-KB buffers. Then the latency is 1 ms, because that's how long you have to wait for the first buffer to be filled up after you submit an I/O request. (The driver does _not_ need to wait for all four buffers to be filled before it can start displaying the data in the first buffer.) The capacity would be 4 ms, because that's how much data your buffers can store. If you end up waiting longer than 4 ms before ksoftirqd gets around to processing any of the data, then some data will inevitably get lost. That's why the way to deal with the delays caused by deferring softirqs to ksoftirqd is to add more buffers (and not make the buffers larger than they already are). 
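The distinction between latency and capacity can be checked with the numbers from the example above (8 KB arriving per ms, four 8-KB buffers). This is a minimal sketch of the arithmetic only; `latency_us()` and `capacity_us()` are hypothetical helper names and nothing here models the actual URB handling.

```c
#include <assert.h>
#include <stdint.h>

/* Time to fill one buffer, in microseconds: the latency. */
uint64_t latency_us(uint64_t buf_bytes, uint64_t bytes_per_ms)
{
    assert(bytes_per_ms > 0);
    return buf_bytes * 1000 / bytes_per_ms;
}

/* Time until every buffer is full, in microseconds: the capacity. */
uint64_t capacity_us(unsigned int nbufs, uint64_t buf_bytes,
                     uint64_t bytes_per_ms)
{
    return (uint64_t)nbufs * latency_us(buf_bytes, bytes_per_ms);
}
```

Doubling the number of buffers from four to eight doubles the capacity to 8 ms while the latency stays at 1 ms; doubling the buffer size instead would raise the latency to 2 ms.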
> > I would also like to see experiments with adjusting the sched
> > priority of the kthreads and/or the userspace prog (e.g. use a command
> > like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)').
>
> If this fixes the issue, we'll need to do something inside the Kernel
> to change the priority, as TV userspace apps should not run as root. Not
> sure where such a change should be done (USB? media?).

It would be interesting to try this, but I agree that it's not likely
to be a practical solution.  Anyway, shouldn't ksoftirqd already be
running with very high priority?

> > Are we really sure that the regression is caused by 4cd13c21b207
> > ("softirq: Let ksoftirqd do its job")? The forum thread also reports
> > that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
> > overflow in get_next_timer_interrupt")
> > https://git.kernel.org/torvalds/c/34f41c0316ed

That is a good point.  It's hard to see how the issues in the two
commits could be related, but who knows?

> I'll see if I can mount a test scenario here in order to try to reproduce
> the reported bug. I suspect that I won't be able to reproduce it on my
> "standard" i7core-based test machine, even with KPTI enabled.

If you're using the same sort of hardware as Josef, under similar
circumstances, the buggy behavior should be the same.  If not, there
must be something else going on that we're not aware of.

> > It makes me suspicious that this fix changes things...
> > After this fix, I suspect that changing the sched priorities will fix
> > the remaining glitches.
> >
> > > It is hard to foresee the consequences of the softirq changes for other
> > > devices, though.
> >
> > Yes, it is hard to foresee, I can only cover networking.
> >
> > For networking, if reverting this, we will (again) open the kernel for
> > an easy DDoS vector with UDP packets.
> > As mentioned in the commit desc,
> > before you could easily cause softirq to take all the CPU time from the
> > application, resulting in very low "good-put" in the UDP-app. (That's why
> > it was so easy to DDoS DNS servers before...)
> >
> > With the softirqd patch in place, ksoftirqd is scheduled fairly between
> > other applications running on the same CPU. But in some cases this is
> > not what you want, so as the commit also mentions, the admin can now
> > more easily tune process scheduling parameters if needed, to adjust for
> > such use-cases (it was not really an admin choice before).
>
> Can't the ksoftirq patch be modified to only apply to the networking
> IRQ handling? That sounds less likely to affect unrelated subsystems[1].

That might work.  Or more generally, allow drivers to specify which
softirq sources should be deferred to ksoftirqd and which should not.

Alan Stern

> [1] Actually, DVB drivers can also implement networking for satellite
> based Internet, but, in this case, the top half is implemented inside
> the DVB core, as the IP traffic should be filt
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote:

> On Sun, 7 Jan 2018 10:41:37 -0500 (EST),
> Alan Stern wrote:
>
> > On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:
> >
> > > > > It seems that the original patch was designed to solve some IRQ
> > > > > issues with network cards which cause data losses on high traffic.
> > > > > However, it is also causing bad effects on sustained high bandwidth
> > > > > demands required by DVB cards, at least on some USB host drivers.
> > > > >
> > > > > Alan/Greg/Eric/David:
> > > > >
> > > > > Any ideas about how to fix it without causing regressions to
> > > > > network?
> > > >
> > > > It would be good to know what hardware was involved on the x86 system
> > > > and to have some timing data. Can we see the output from lsusb and
> > > > usbmon, running on a vanilla kernel that gets plenty of video glitches?
> > >
> > > From Josef's report, and from the BZ, the affected hardware seems
> > > to be based on Montage Technology M88DS3103/M88TS2022 chipset.
> >
> > What type of USB host controller does the x86_64 system use? EHCI or
> > xHCI?
>
> I'll let Josef answer this.
>
> > > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> > > which shares a USB implementation that is used by a lot more drivers.
> > > The URB handling code is at:
> > >
> > > drivers/media/usb/dvb-usb-v2/usb_urb.c
> > >
> > > This particular driver allocates 8 buffers with 4096 bytes each
> > > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> > >
> > > This has become popular USB hardware nowadays. I have one S960c
> > > myself, so I can send you the lsusb from it. You should notice, however,
> > > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> > > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> > > of payload after removing URB headers.
> >
> > You mentioned earlier that the driver uses bulk transfers.
> > In USB-2.0,
> > the maximum possible payload data transfer rate using bulk transfers is
> > 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And
> > even this is possible only if almost nothing else is using the bus at
> > the same time.
>
> No, I said 58 Mbits/s (not bytes).

Well, what you actually _wrote_ was "58 Mpps of payload" (see above),
and I couldn't tell how to interpret that.  :-)

58 Mb/s is obviously almost 8 times less than the full USB bus
bandwidth.

> On DVB-C and DVB-S2 specs, AFAIKT, there's no hard limit for the maximum
> payload data rate, although industry seems to limit it to be around
> 60 Mbits/s. On those standards, the maximal bit rate is defined by the
> modulation type and by the channel symbol rate.
>
> To give you a practical example, my DVB-S2 provider modulates each
> transponder with 8/PSK (3 bits/symbol), and defines channels with a
> symbol rate of 30 Mbaud. So, it could, theoretically, transport
> an MPEG-TS stream up to 90 Mbits/s (minus headers and guard intervals).
> In practice, the streams there are transmitted with 58,026.5 Kbits/s.

Okay.  This is 58 Kb/ms or 7.25 KB/ms.  So your scheme of eight 4-KB
buffers gives a latency of 0.57 ms with a total capacity of 4.5 ms,
which is a lot better than what I was thinking.

> > In any case, you might be able to attack the problem simply by using
> > more than 8 buffers. With just eight 4096-byte buffers, the total
> > pipeline capacity is only about 0.62 ms (at the maximum possible
> > transfer rate). Increasing the number of buffers to 65 would give a
> > capacity of 5 ms, which is probably a lot better suited for situations
> > where completions are handled by the ksoftirqd thread.
>
> Increasing it to 65 shouldn't be hard. Not sure, however, if the hardware
> will actually fill the 65 buffers, but it is worth trying.

Given the new information, 65 would be overkill.  But going from 8 to
16 might help.
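For anyone wanting to redo the numbers above for a different transponder, here is a small Python sketch of the unit conversion. It assumes decimal units for the bit rate (1 kbit/s = 1000 bit/s) and 4096-byte buffers as in the dvb-usb-v2 driver described in this thread; the helper names are invented for the example.

```python
def kbits_per_s_to_bytes_per_ms(kbits_per_s):
    """1 kbit/s = 1000 bit/s = 1 bit/ms, and there are 8 bits per byte."""
    return kbits_per_s / 8.0


def latency_ms(buffer_bytes, rate_bytes_per_ms):
    """Time needed to fill a single buffer at the given rate."""
    return buffer_bytes / rate_bytes_per_ms


def capacity_ms(num_buffers, buffer_bytes, rate_bytes_per_ms):
    """Time needed to fill every buffer at the given rate."""
    return num_buffers * buffer_bytes / rate_bytes_per_ms


rate = kbits_per_s_to_bytes_per_ms(58_026.5)   # about 7253 bytes per ms
print(f"rate:             {rate:.1f} bytes/ms")
print(f"latency:          {latency_ms(4096, rate):.3f} ms")
print(f"capacity (8 buf): {capacity_ms(8, 4096, rate):.2f} ms")
print(f"capacity (16 buf):{capacity_ms(16, 4096, rate):.2f} ms")
```

With the exact 58,026.5 kbit/s figure the latency comes out near 0.56 ms and the eight-buffer capacity near 4.5 ms, in line with the rounded values quoted above; sixteen buffers give roughly 9 ms.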
> > > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> > > in order to revert the kernel logic to prioritize latency instead of
> > > throughput.
> >
> > It can't be done without pervasive changes to the USB subsystem, which
> > I would greatly prefer to avoid. Besides, this wouldn't really solve
> > the problem. Decreasing the latency for one device will cause it to be
> > increased for others.
>
> If there is TV streaming traffic on a USB bus, it means that the
> user wants to either watch and/or record a TV program. In such a
> use case, low latency is highly desired for the TV capture
> (and display, if the GPU is USB), even if it means a higher latency for
> other traffic.

Not if the other traffic is also a TV capture.  :-)

It might make sense to classify softirq sources as "high priority" or
"low priority", and only defer the "low priority" work to ksoftirqd.

Alan Stern
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018 12:59:10 +0100, Jesper Dangaard Brouer wrote:

> On Mon, 8 Jan 2018 08:02:00 -0200
> Mauro Carvalho Chehab wrote:
>
> > Hi Linus,
> >
> > On Sun, 7 Jan 2018 13:23:39 -0800,
> > Linus Torvalds wrote:
> >
> > > On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> > > wrote:
> > > >
> > > > On Sat, 6 Jan 2018 16:04:16 +0100,
> > > > "Josef Griebichler" wrote:
> > > >>
> > > >> the causing commit has been identified.
> > > >> After reverting commit
> > > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > > >> it's working again.
> > > >
> > > > Just replying to me won't magically fix this. The ones that were
> > > > involved on this patch should also be c/c, plus USB people. Just
> > > > added them.
> > >
> > > Actually, you seem to have added an odd subset of the people involved.
> > >
> > > For example, Ingo - who actually committed that patch - wasn't on the cc.
> >
> > Sorry, my fault. I forgot to add him to it.
> >
> > > I do think we need to simply revert that patch. It's very simple: it
> > > has been reported to lead to actual problems for people, and we don't
> > > fix one problem and then say "well, it fixed something else" when
> > > something breaks.
> > >
> > > When something breaks, we either unbreak it, or we revert the change
> > > that caused the breakage.
> > >
> > > It's really that simple. That's what "no regressions" means. We don't
> > > accept changes that cause regressions. This one did.
> >
> > Yeah, we should either unbreak or revert it. In the specific case of
> > media devices, Alan came with a proposal of increasing the number of
> > buffers. This is a one-line change, and increasing the capture delay from
> > 0.63 ms to 5 ms in this specific case (Digital TV) shouldn't do much
> > harm. So, I guess it would be worth trying it before reverting the patch.
>
> Let's find the root cause of this before reverting, as this will hurt the
> networking use-case.
>
> I want to see if the increased buffer will solve the issue (the current
> buffer of 0.63 ms seems too small).

For TV, high latency has mainly two practical consequences:

1) it increases the time to switch channels. MPEG-TS based transmissions
   usually take some time to start showing the channel contents. Adding
   more buffers makes it worse;

2) especially when watching sports, a higher latency means that you'll know
   that your favorite team scored when your neighbors start
   celebrating... seeing the actual event only after them.

So, the lower, the merrier, but I think that 5 ms would be acceptable.

> I would also like to see experiments with adjusting the sched
> priority of the kthreads and/or the userspace prog (e.g. use a command
> like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)').

If this fixes the issue, we'll need to do something inside the Kernel
to change the priority, as TV userspace apps should not run as root. Not
sure where such a change should be done (USB? media?).

> Are we really sure that the regression is caused by 4cd13c21b207
> ("softirq: Let ksoftirqd do its job")? The forum thread also reports
> that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
> overflow in get_next_timer_interrupt")
> https://git.kernel.org/torvalds/c/34f41c0316ed

I'll see if I can mount a test scenario here in order to try to reproduce
the reported bug. I suspect that I won't be able to reproduce it on my
"standard" i7core-based test machine, even with KPTI enabled.

> It makes me suspicious that this fix changes things...
> After this fix, I suspect that changing the sched priorities will fix
> the remaining glitches.
>
> > It is hard to foresee the consequences of the softirq changes for other
> > devices, though.
>
> Yes, it is hard to foresee, I can only cover networking.
>
> For networking, if reverting this, we will (again) open the kernel for
> an easy DDoS vector with UDP packets. As mentioned in the commit desc,
> before you could easily cause softirq to take all the CPU time from the
> application, resulting in very low "good-put" in the UDP-app. (That's why
> it was so easy to DDoS DNS servers before...)
>
> With the softirqd patch in place, ksoftirqd is scheduled fairly between
> other applications running on the same CPU. But in some cases this is
> not what you want, so as the commit also mentions, the admin can now
> more easily tune process scheduling parameters if needed, to adjust for
> such use-cases (it was not really an admin choice before).

Can't the ksoftirq patch be modified to only apply to the networking
IRQ handling? That sounds less likely to affect unrelated subsystems[1].

[1] Actually, DVB drivers can also implement networking for satellite
based Internet, but, in this case, the top half is implemented inside
the DVB core, as the IP traffic should be filtered out of an MPEG-TS
stream. Not sure if
Re: dvb usb issues since kernel 4.9
On Mon, 8 Jan 2018 08:02:00 -0200
Mauro Carvalho Chehab wrote:

> Hi Linus,
>
> On Sun, 7 Jan 2018 13:23:39 -0800,
> Linus Torvalds wrote:
>
> > On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> > wrote:
> > >
> > > On Sat, 6 Jan 2018 16:04:16 +0100,
> > > "Josef Griebichler" wrote:
> > >>
> > >> the causing commit has been identified.
> > >> After reverting commit
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > >> it's working again.
> > >
> > > Just replying to me won't magically fix this. The ones that were involved
> > > on this patch should also be c/c, plus USB people. Just added them.
> >
> > Actually, you seem to have added an odd subset of the people involved.
> >
> > For example, Ingo - who actually committed that patch - wasn't on the cc.
>
> Sorry, my fault. I forgot to add him to it.
>
> > I do think we need to simply revert that patch. It's very simple: it
> > has been reported to lead to actual problems for people, and we don't
> > fix one problem and then say "well, it fixed something else" when
> > something breaks.
> >
> > When something breaks, we either unbreak it, or we revert the change
> > that caused the breakage.
> >
> > It's really that simple. That's what "no regressions" means. We don't
> > accept changes that cause regressions. This one did.
>
> Yeah, we should either unbreak or revert it. In the specific case of
> media devices, Alan came with a proposal of increasing the number of
> buffers. This is a one-line change, and increasing the capture delay from
> 0.63 ms to 5 ms in this specific case (Digital TV) shouldn't do much
> harm. So, I guess it would be worth trying it before reverting the patch.

Let's find the root cause of this before reverting, as this will hurt the
networking use-case.

I want to see if the increased buffer will solve the issue (the current
buffer of 0.63 ms seems too small).
I would also like to see experiments with adjusting the sched
priority of the kthreads and/or the userspace prog (e.g. use a command
like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)').

Are we really sure that the regression is caused by 4cd13c21b207
("softirq: Let ksoftirqd do its job")? The forum thread also reports
that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
overflow in get_next_timer_interrupt")

 https://git.kernel.org/torvalds/c/34f41c0316ed

It makes me suspicious that this fix changes things...
After this fix, I suspect that changing the sched priorities will fix
the remaining glitches.

> It is hard to foresee the consequences of the softirq changes for other
> devices, though.

Yes, it is hard to foresee, I can only cover networking.

For networking, if reverting this, we will (again) open the kernel for
an easy DDoS vector with UDP packets. As mentioned in the commit desc,
before you could easily cause softirq to take all the CPU time from the
application, resulting in very low "good-put" in the UDP-app. (That's why
it was so easy to DDoS DNS servers before...)

With the softirqd patch in place, ksoftirqd is scheduled fairly between
other applications running on the same CPU. But in some cases this is
not what you want, so as the commit also mentions, the admin can now
more easily tune process scheduling parameters if needed, to adjust for
such use-cases (it was not really an admin choice before).

> For example, we didn't have any reports about this issue affecting cameras.
> Most cameras use ISOC nowadays, but some only provide bulk transfers.
> We usually try to use the minimum number of buffers possible, as
> increasing latency on cameras can be very annoying, especially on
> videoconference applications.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
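The chrt experiment suggested in the message above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: it targets the per-CPU ksoftirqd threads (the message's own example targets a userspace program called udp_sink), the helper names find_pids_by_comm and set_fifo_priority are hypothetical, and actually changing the policy needs root or CAP_SYS_NICE. As written the script only lists what it would change.

```python
import os
from pathlib import Path


def find_pids_by_comm(prefix, proc_root="/proc"):
    """Return the PIDs whose /proc/<pid>/comm starts with `prefix`."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    pids = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/sys, and similar entries
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm.startswith(prefix):
            pids.append(int(entry.name))
    return sorted(pids)


def set_fifo_priority(pid, priority):
    """Rough equivalent of `chrt --fifo -p <priority> <pid>` (Linux only)."""
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))


# Read-only by default: print the calls instead of making them.
for pid in find_pids_by_comm("ksoftirqd/"):
    print(f"would run: set_fifo_priority({pid}, 10)")
```

To run the experiment for real, call set_fifo_priority(pid, 10) as root for each listed PID and check whether the glitches change.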
Re: dvb usb issues since kernel 4.9
Hi Linus,

On Sun, 7 Jan 2018 13:23:39 -0800, Linus Torvalds wrote:

> On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> wrote:
> >
> > On Sat, 6 Jan 2018 16:04:16 +0100,
> > "Josef Griebichler" wrote:
> >>
> >> the causing commit has been identified.
> >> After reverting commit
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> >> it's working again.
> >
> > Just replying to me won't magically fix this. The ones that were involved on
> > this patch should also be c/c, plus USB people. Just added them.
>
> Actually, you seem to have added an odd subset of the people involved.
>
> For example, Ingo - who actually committed that patch - wasn't on the cc.

Sorry, my fault. I forgot to add him to it.

> I do think we need to simply revert that patch. It's very simple: it
> has been reported to lead to actual problems for people, and we don't
> fix one problem and then say "well, it fixed something else" when
> something breaks.
>
> When something breaks, we either unbreak it, or we revert the change
> that caused the breakage.
>
> It's really that simple. That's what "no regressions" means. We don't
> accept changes that cause regressions. This one did.

Yeah, we should either unbreak or revert it. In the specific case of
media devices, Alan came with a proposal of increasing the number of
buffers. This is a one-line change, and increasing the capture delay from
0.63 ms to 5 ms in this specific case (Digital TV) shouldn't do much
harm. So, I guess it would be worth trying it before reverting the patch.

It is hard to foresee the consequences of the softirq changes for other
devices, though.

For example, we didn't have any reports about this issue affecting cameras.
Most cameras use ISOC nowadays, but some only provide bulk transfers.
We usually try to use the minimum number of buffers possible, as
increasing latency on cameras can be very annoying, especially on
videoconference applications.

Thanks,
Mauro
Re: dvb usb issues since kernel 4.9
On Sun, 7 Jan 2018 10:41:37 -0500 (EST), Alan Stern wrote:

> On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:
>
> > > > It seems that the original patch was designed to solve some IRQ issues
> > > > with network cards which cause data losses on high traffic. However,
> > > > it is also causing bad effects on sustained high bandwidth demands
> > > > required by DVB cards, at least on some USB host drivers.
> > > >
> > > > Alan/Greg/Eric/David:
> > > >
> > > > Any ideas about how to fix it without causing regressions to
> > > > network?
> > >
> > > It would be good to know what hardware was involved on the x86 system
> > > and to have some timing data. Can we see the output from lsusb and
> > > usbmon, running on a vanilla kernel that gets plenty of video glitches?
> >
> > From Josef's report, and from the BZ, the affected hardware seems
> > to be based on Montage Technology M88DS3103/M88TS2022 chipset.
>
> What type of USB host controller does the x86_64 system use? EHCI or
> xHCI?

I'll let Josef answer this.

> > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> > which shares a USB implementation that is used by a lot more drivers.
> > The URB handling code is at:
> >
> > drivers/media/usb/dvb-usb-v2/usb_urb.c
> >
> > This particular driver allocates 8 buffers with 4096 bytes each
> > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> >
> > This has become popular USB hardware nowadays. I have one S960c
> > myself, so I can send you the lsusb from it. You should notice, however,
> > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> > of payload after removing URB headers.
>
> You mentioned earlier that the driver uses bulk transfers. In USB-2.0,
> the maximum possible payload data transfer rate using bulk transfers is
> 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s).
> And even this is possible only if almost nothing else is using the bus
> at the same time.

No, I said 58 Mbits/s (not bytes).

On DVB-C and DVB-S2 specs, AFAIKT, there's no hard limit for the maximum
payload data rate, although industry seems to limit it to be around
60 Mbits/s. On those standards, the maximal bit rate is defined by the
modulation type and by the channel symbol rate.

To give you a practical example, my DVB-S2 provider modulates each
transponder with 8/PSK (3 bits/symbol), and defines channels with a
symbol rate of 30 Mbaud. So, it could, theoretically, transport
an MPEG-TS stream up to 90 Mbits/s (minus headers and guard intervals).
In practice, the streams there are transmitted with 58,026.5 Kbits/s.

> > A 10 minutes record with the
> > entire data (which typically contains 5-10 channels) can easily go
> > above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if
> > a usbmon dump would be useful.
>
> It might not be helpful at all. However, I'm not interested in the
> payload data (which would be unintelligible to me anyway) but rather
> the timing of URB submissions and completions. A usbmon trace which
> didn't keep much of the payload data would only require on the order of
> 50 MB per minute -- and Josef said that glitches usually would show up
> within a minute or so.

Yeah, this could help.

Josef,

You can get it with wireshark/tshark or tcpdump. See:

	https://technolinchpin.wordpress.com/2015/10/23/usb-bus-sniffers-for-linux-system/
	https://wiki.wireshark.org/CaptureSetup/USB

> > I'm enclosing the lsusb from a S960C device, which is based on those
> > Montage chipsets:
>
> What I wanted to see was the output from "lsusb" on the affected
> system, not the output from "lsusb -v -s B:D" on your system.
>
> > > Overall, this may be a very difficult problem to solve. The
> > > 4cd13c21b207 commit was intended to improve throughput at the cost of
> > > increased latency.
> > > But then what do you do when the latency becomes
> > > too high for the video subsystem to handle?
> >
> > Latency can't be too high, otherwise frames will be dropped.
>
> Yes, that's the whole point.
>
> > Even if the Kernel itself doesn't drop, if the delay goes higher
> > than a certain threshold, userspace will need to drop, as it
> > should be presenting audio and video in real time. Yet, typically,
> > userspace will delay it by one or two seconds, which would mean
> > 1500-3500 buffers, which I suspect is a lot more than the hardware
> > limits. So I suspect that the hardware runs out of free buffers way
> > before userspace, as media hardware doesn't have unlimited buffers
> > inside, as it assumes that the Kernel/userspace will be fast
> > enough to sustain bit rates up to 66 Mbps of payload.
>
> The timing information would tell us how large the latency is.
>
> In any case, you might be able to attack the problem simply by using
> more than 8 buffers. With just eight 4096-byte buffers, the total
> pipel
Re: dvb usb issues since kernel 4.9
On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab wrote:
>
> On Sat, 6 Jan 2018 16:04:16 +0100,
> "Josef Griebichler" wrote:
>>
>> the causing commit has been identified.
>> After reverting commit
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
>> it's working again.
>
> Just replying to me won't magically fix this. The ones that were involved on
> this patch should also be c/c, plus USB people. Just added them.

Actually, you seem to have added an odd subset of the people involved.

For example, Ingo - who actually committed that patch - wasn't on the cc.

I do think we need to simply revert that patch. It's very simple: it
has been reported to lead to actual problems for people, and we don't
fix one problem and then say "well, it fixed something else" when
something breaks.

When something breaks, we either unbreak it, or we revert the change
that caused the breakage.

It's really that simple. That's what "no regressions" means. We don't
accept changes that cause regressions. This one did.

              Linus
Aw: Re: dvb usb issues since kernel 4.9
Hi,

here I provide lsusb from my affected hardware (technotrend s2-4600).
http://ix.io/DLY

With this hardware I had errors when recording with tvheadend. Live TV was
OK; only channel switching sometimes caused problems. Please see the attached
tvheadend service logs.

I also provide dmesg (libreelec on rpi3 with kernel 4.14.10 with revert of
the mentioned commit).
http://ix.io/DM2

Regards
Josef

Sent: Sunday, 07 January 2018 at 16:41
From: "Alan Stern"
To: "Mauro Carvalho Chehab"
Cc: "Josef Griebichler", "Greg Kroah-Hartman", linux-...@vger.kernel.org, "Eric Dumazet", "Rik van Riel", "Paolo Abeni", "Hannes Frederic Sowa", "Jesper Dangaard Brouer", linux-kernel, netdev, "Jonathan Corbet", LMML, "Peter Zijlstra", "David Miller", torva...@linux-foundation.org
Subject: Re: dvb usb issues since kernel 4.9

On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:

> > > It seems that the original patch was designed to solve some IRQ issues
> > > with network cards which cause data losses on high traffic. However,
> > > it is also causing bad effects on sustained high bandwidth demands
> > > required by DVB cards, at least on some USB host drivers.
> > >
> > > Alan/Greg/Eric/David:
> > >
> > > Any ideas about how to fix it without causing regressions to
> > > network?
> >
> > It would be good to know what hardware was involved on the x86 system
> > and to have some timing data. Can we see the output from lsusb and
> > usbmon, running on a vanilla kernel that gets plenty of video glitches?
>
> From Josef's report, and from the BZ, the affected hardware seems
> to be based on Montage Technology M88DS3103/M88TS2022 chipset.

What type of USB host controller does the x86_64 system use? EHCI or
xHCI?

> The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> which shares a USB implementation that is used by a lot more drivers.
Re: dvb usb issues since kernel 4.9
On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:

> > > It seems that the original patch was designed to solve some IRQ issues
> > > with network cards which cause data losses on high traffic. However,
> > > it is also causing bad effects on sustained high bandwidth demands
> > > required by DVB cards, at least on some USB host drivers.
> > >
> > > Alan/Greg/Eric/David:
> > >
> > > Any ideas about how to fix it without causing regressions to
> > > network?
> >
> > It would be good to know what hardware was involved on the x86 system
> > and to have some timing data. Can we see the output from lsusb and
> > usbmon, running on a vanilla kernel that gets plenty of video glitches?
>
> From Josef's report, and from the BZ, the affected hardware seems
> to be based on Montage Technology M88DS3103/M88TS2022 chipset.

What type of USB host controller does the x86_64 system use? EHCI or
xHCI?

> The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> which shares a USB implementation that is used by a lot more drivers.
> The URB handling code is at:
>
> drivers/media/usb/dvb-usb-v2/usb_urb.c
>
> This particular driver allocates 8 buffers with 4096 bytes each
> for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
>
> This has become popular USB hardware nowadays. I have one S960c
> myself, so I can send you the lsusb from it. You should notice, however,
> that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> of payload after removing URB headers.

You mentioned earlier that the driver uses bulk transfers.  In USB-2.0,
the maximum possible payload data transfer rate using bulk transfers is
53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s).  And
even this is possible only if almost nothing else is using the bus at
the same time.
> A 10 minutes record with the
> entire data (which typically contains 5-10 channels) can easily go
> above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if
> a usbmon dump would be useful.

It might not be helpful at all.  However, I'm not interested in the
payload data (which would be unintelligible to me anyway) but rather
the timing of URB submissions and completions.  A usbmon trace which
didn't keep much of the payload data would only require on the order of
50 MB per minute -- and Josef said that glitches usually would show up
within a minute or so.

> I'm enclosing the lsusb from a S960C device, which is based on those
> Montage chipsets:

What I wanted to see was the output from "lsusb" on the affected
system, not the output from "lsusb -v -s B:D" on your system.

> > Overall, this may be a very difficult problem to solve. The
> > 4cd13c21b207 commit was intended to improve throughput at the cost of
> > increased latency. But then what do you do when the latency becomes
> > too high for the video subsystem to handle?
>
> Latency can't be too high, otherwise frames will be dropped.

Yes, that's the whole point.

> Even if the Kernel itself doesn't drop, if the delay goes higher
> than a certain threshold, userspace will need to drop, as it
> should be presenting audio and video in real time. Yet, typically,
> userspace will delay it by one or two seconds, which would mean
> 1500-3500 buffers, which I suspect is a lot more than the hardware
> limits. So I suspect that the hardware runs out of free buffers way
> before userspace, as media hardware doesn't have unlimited buffers
> inside, as it assumes that the Kernel/userspace will be fast
> enough to sustain bit rates up to 66 Mbps of payload.

The timing information would tell us how large the latency is.

In any case, you might be able to attack the problem simply by using
more than 8 buffers.
With just eight 4096-byte buffers, the total pipeline capacity is only about 0.62 ms (at the maximum possible transfer rate). Increasing the number of buffers to 65 would give a capacity of 5 ms, which is probably a lot better suited for situations where completions are handled by the ksoftirqd thread. > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP, > in order to revert the kernel logic to prioritize latency instead of > throughput. It can't be done without pervasive changes to the USB subsystem, which I would greatly prefer to avoid. Besides, this wouldn't really solve the problem. Decreasing the latency for one device will cause it to be increased for others. Alan Stern
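The buffer arithmetic in the message above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the breakdown of the 53248 bytes/ms figure as 13 bulk packets of 512 bytes in each of the 8 microframes per millisecond is an assumption about where that number comes from, not something stated in the message.

```python
import math

# Ceiling quoted in the message for USB 2.0 bulk payload, in bytes per ms.
# Assumed breakdown: 13 packets x 512 bytes per microframe, 8 microframes/ms.
MAX_BULK_BYTES_PER_MS = 13 * 512 * 8


def pipeline_capacity_ms(num_buffers, buffer_bytes=4096,
                         rate_bytes_per_ms=MAX_BULK_BYTES_PER_MS):
    """Time (ms) the queued buffers can absorb at the given transfer rate."""
    return num_buffers * buffer_bytes / rate_bytes_per_ms


def buffers_needed(target_ms, buffer_bytes=4096,
                   rate_bytes_per_ms=MAX_BULK_BYTES_PER_MS):
    """Smallest buffer count whose capacity covers target_ms."""
    return math.ceil(target_ms * rate_bytes_per_ms / buffer_bytes)


if __name__ == "__main__":
    print(f"8 buffers:  {pipeline_capacity_ms(8):.2f} ms")
    print(f"65 buffers: {pipeline_capacity_ms(65):.2f} ms")
    print(f"buffers for 5 ms: {buffers_needed(5)}")
```

At lower real-world bit rates the same number of buffers covers proportionally more time, so passing a measured rate as rate_bytes_per_ms gives a less pessimistic estimate.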
Re: dvb usb issues since kernel 4.9
On Sat, 6 Jan 2018 16:44:20 -0500 (EST), Alan Stern wrote:

> On Sat, 6 Jan 2018, Mauro Carvalho Chehab wrote:
>
> > Hi Josef,
> >
> > On Sat, 6 Jan 2018 16:04:16 +0100, "Josef Griebichler" wrote:
> >
> > > Hi,
> > >
> > > the causing commit has been identified.
> > > After reverting commit
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > > it's working again.
> >
> > Just replying to me won't magically fix this. The ones that were involved in
> > this patch should also be c/c, plus USB people. Just added them.
> >
> > > Please have a look into the thread
> > > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13
> > > here are several users acknowledging the revert solves their issues with
> > > usb dvb cards.
> >
> > I read the entire (long) thread there. In order to make it easier for the
> > others, from what I understand, the problem happens on both x86 and arm,
> > although almost all comments there are mentioning tests with raspbian
> > Kernel (which uses a different USB host driver than the upstream one).
> >
> > It happens when watching digital TV DVB-C channels, which usually means
> > a sustained bit rate of 11 MBps to 54 MBps.
> >
> > The reports mention the dvbsky, which uses USB URB bulk transfers.
> > Every several minutes (5 to 10 mins), the stream suffers "glitches"
> > caused by frame losses.
> >
> > The part of the thread that contains the bisect is at:
> >
> > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965
> >
> > It indirectly mentions another comment on the thread which points
> > to:
> >     https://github.com/raspberrypi/linux/issues/2134
> >
> > There, it says that this fixes part of the issues:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4
> >
> > but it affects URB packet losses to a lesser extent.
> >
> > The main issue is really the logic changes in the core softirq logic.
> >
> > Using Kernel 4.14.10 on a Raspberry Pi 3 with the 4cd13c2 commit reverted
> > fixed the issue.
> >
> > Josef, is the above right? Anything else to mention? Does the
> > same issue also happen on x86 with vanilla Kernel 4.14.10?
> >
> > -
> >
> > It seems that the original patch was designed to solve some IRQ issues
> > with network cards that cause data losses on high traffic. However,
> > it is also causing bad effects on sustained high bandwidth demands
> > required by DVB cards, at least on some USB host drivers.
> >
> > Alan/Greg/Eric/David:
> >
> > Any ideas about how to fix it without causing regressions to
> > network?
>
> It would be good to know what hardware was involved on the x86 system
> and to have some timing data. Can we see the output from lsusb and
> usbmon, running on a vanilla kernel that gets plenty of video glitches?

From Josef's report, and from the BZ, the affected hardware seems
to be based on the Montage Technology M88DS3103/M88TS2022 chipset.

The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
which shares a USB implementation that is used by a lot more drivers.
The URB handling code is at:

    drivers/media/usb/dvb-usb-v2/usb_urb.c

This particular driver allocates 8 buffers with 4096 bytes each
for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.

This has become popular USB hardware nowadays. I have one S960c
myself, so I can send you the lsusb from it. You should notice, however,
that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
of payload after removing URB headers. A 10-minute recording with the
entire data (which typically contains 5-10 channels) can easily go
above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if
a usbmon dump would be useful.

I'm enclosing the lsusb from a S960C device, which is based on those
Montage chipsets:

Bus 002 Device 007: ID 0572:960c Conexant Systems (Rockwell), Inc. DVBSky S960C DVB-S2 tuner
Couldn't open device, some information will be missing
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x0572 Conexant Systems (Rockwell), Inc.
  idProduct          0x960c DVBSky S960C DVB-S2 tuner
  bcdDevice            0.00
  iManufacturer           1
  iProduct                2
  iSerial                 3
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength          219
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          4
    bmAttributes         0x80
      (Bus Powered)
    MaxPower
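On the question of whether a usbmon dump would be useful: the submission and completion timing of each URB can be pulled out of a usbmon text trace without looking at the payload. The following is a minimal sketch in Python, assuming the text format described in the kernel's usbmon documentation (URB tag, timestamp in microseconds, event type S/C/E, then an address word such as Bi:1:005:4). The function name and the sample lines are made up for illustration and do not come from a real capture; timestamp wrap-around is not handled.

```python
def urb_latencies(lines, address_prefix="Bi:"):
    """Yield (address, latency_us) for each submit/complete pair.

    Only the first four whitespace-separated fields of each line are used:
    URB tag, timestamp (us), event type, address word.
    """
    pending = {}  # URB tag -> timestamp of the most recent submission
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        tag, stamp, event, address = fields[:4]
        if not address.startswith(address_prefix):
            continue  # keep only bulk-in traffic by default
        if event == "S":
            pending[tag] = int(stamp)
        elif event == "C" and tag in pending:
            yield address, int(stamp) - pending.pop(tag)
        elif event == "E":
            pending.pop(tag, None)  # submission failed, nothing to pair


# Made-up sample lines, for illustration only.
SAMPLE = """\
ffff8800a1b2c300 1000000 S Bi:1:005:4 -115 4096 <
ffff8800a1b2c3c0 1000010 S Bi:1:005:4 -115 4096 <
ffff8800a1b2c300 1000615 C Bi:1:005:4 0 4096 = 47400011
ffff8800a1b2c3c0 1001240 C Bi:1:005:4 0 4096 = 47400012
"""

if __name__ == "__main__":
    for address, latency_us in urb_latencies(SAMPLE.splitlines()):
        print(f"{address}: {latency_us} us")
```

A real trace would be read line by line from a file instead of SAMPLE; because the function is a generator, it does not need to hold a multi-gigabyte capture in memory.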
Re: dvb usb issues since kernel 4.9
On Sat, 6 Jan 2018, Mauro Carvalho Chehab wrote:

> Hi Josef,
>
> On Sat, 6 Jan 2018 16:04:16 +0100, "Josef Griebichler" wrote:
>
> > Hi,
> >
> > the causing commit has been identified.
> > After reverting commit
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > it's working again.
>
> Just replying to me won't magically fix this. The ones that were involved in
> this patch should also be c/c, plus USB people. Just added them.
>
> > Please have a look into the thread
> > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13
> > here are several users acknowledging the revert solves their issues with usb
> > dvb cards.
>
> I read the entire (long) thread there. In order to make it easier for the
> others, from what I understand, the problem happens on both x86 and arm,
> although almost all comments there are mentioning tests with raspbian
> Kernel (which uses a different USB host driver than the upstream one).
>
> It happens when watching digital TV DVB-C channels, which usually means
> a sustained bit rate of 11 MBps to 54 MBps.
>
> The reports mention the dvbsky, which uses USB URB bulk transfers.
> Every several minutes (5 to 10 mins), the stream suffers "glitches"
> caused by frame losses.
>
> The part of the thread that contains the bisect is at:
>
> https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965
>
> It indirectly mentions another comment on the thread which points
> to:
>     https://github.com/raspberrypi/linux/issues/2134
>
> There, it says that this fixes part of the issues:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4
>
> but it affects URB packet losses to a lesser extent.
>
> The main issue is really the logic changes in the core softirq logic.
>
> Using Kernel 4.14.10 on a Raspberry Pi 3 with the 4cd13c2 commit reverted
> fixed the issue.
>
> Josef, is the above right? Anything else to mention? Does the
> same issue also happen on x86 with vanilla Kernel 4.14.10?
>
> -
>
> It seems that the original patch was designed to solve some IRQ issues
> with network cards that cause data losses on high traffic. However,
> it is also causing bad effects on sustained high bandwidth demands
> required by DVB cards, at least on some USB host drivers.
>
> Alan/Greg/Eric/David:
>
> Any ideas about how to fix it without causing regressions to
> network?

It would be good to know what hardware was involved on the x86 system
and to have some timing data. Can we see the output from lsusb and
usbmon, running on a vanilla kernel that gets plenty of video glitches?

Overall, this may be a very difficult problem to solve. The
4cd13c21b207 commit was intended to improve throughput at the cost of
increased latency. But then what do you do when the latency becomes
too high for the video subsystem to handle?

Alan Stern
Re: Re: dvb usb issues since kernel 4.9
Hi,

thanks for adding the people involved!

Yes, arm and x86 are affected. The bisect was not done by me; it was done on
an x86_64 machine with a mainline kernel, not the raspbian kernel
(https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965).
In the mentioned post you also find the bisect log.
I'm using a dvb-s/s2 usb tv card (technotrend s2-4600 with firmware
dvb-fe-ds3103.fw, components M88DS3103, M88TS2022), so not only dvb-c is
affected.
Yes, kernel 4.14.10 with the mentioned commit reverted works on the rpi3 as
kernel 4.8 did before.

I hope this is of some help.

Regards,
Josef

Hi Josef,

On Sat, 6 Jan 2018 16:04:16 +0100, "Josef Griebichler" wrote:

> Hi,
>
> the causing commit has been identified.
> After reverting commit
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> it's working again.

Just replying to me won't magically fix this. The ones that were involved in
this patch should also be c/c, plus USB people. Just added them.

> Please have a look into the thread
> https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13
> here are several users acknowledging the revert solves their issues with usb
> dvb cards.

I read the entire (long) thread there. In order to make it easier for the
others, from what I understand, the problem happens on both x86 and arm,
although almost all comments there are mentioning tests with raspbian
Kernel (which uses a different USB host driver than the upstream one).

It happens when watching digital TV DVB-C channels, which usually means
a sustained bit rate of 11 MBps to 54 MBps.

The reports mention the dvbsky, which uses USB URB bulk transfers.
Every several minutes (5 to 10 mins), the stream suffers "glitches"
caused by frame losses.

The part of the thread that contains the bisect is at:

https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965

It indirectly mentions another comment on the thread which points
to:
    https://github.com/raspberrypi/linux/issues/2134

There, it says that this fixes part of the issues:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4

but it affects URB packet losses to a lesser extent.

The main issue is really the logic changes in the core softirq logic.

Using Kernel 4.14.10 on a Raspberry Pi 3 with the 4cd13c2 commit reverted
fixed the issue.

Josef, is the above right? Anything else to mention? Does the
same issue also happen on x86 with vanilla Kernel 4.14.10?

-

It seems that the original patch was designed to solve some IRQ issues
with network cards that cause data losses on high traffic. However,
it is also causing bad effects on sustained high bandwidth demands
required by DVB cards, at least on some USB host drivers.

Alan/Greg/Eric/David:

Any ideas about how to fix it without causing regressions to
network?

Regards,
Mauro

> Sent: Sunday, 17 December 2017 at 14:27
> From: "Mauro Carvalho Chehab"
> To: "Sean Young"
> Cc: "Josef Griebichler" , lcaumo...@gmail.com,
> gre...@linuxfoundation.org, linux-media@vger.kernel.org,
> linux-...@vger.kernel.org
> Subject: Re: dvb usb issues since kernel 4.9
> On Sun, 17 Dec 2017 12:06:37 + Sean Young wrote:
>
> > Hi Josef,
>
> On Sun, 17 Dec 2017 11:19:38 +0100, "Josef Griebichler" wrote:
>
> > > Hello Mr. Caumont,
> > >
> > > since the switch to kernel 4.9 there are several users who have issues with
> > > their usb dvb cards.
> > > Some get artifacts when watching livetv, I'm getting discontinuity errors
> > > in tvheadend when recording.
> > > I'm using the latest test build of LibreELEC with kernel 4.14.6 but the
> > > issues are still there.
> > > There's a LibreELEC forum thread for this topic
> > > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/
> > > and also an open kernel bug
> > > https://bugzilla.kernel.org/show_bug.cgi?id=197835
> > >
> > > This is my dmesg
Re: dvb usb issues since kernel 4.9
Hi Josef,

On Sat, 6 Jan 2018 16:04:16 +0100, "Josef Griebichler" wrote:

> Hi,
>
> the causing commit has been identified.
> After reverting commit
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> it's working again.

Just replying to me won't magically fix this. The ones that were involved in
this patch should also be c/c, plus USB people. Just added them.

> Please have a look into the thread
> https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13
> here are several users acknowledging the revert solves their issues with usb
> dvb cards.

I read the entire (long) thread there. In order to make it easier for the
others, from what I understand, the problem happens on both x86 and arm,
although almost all comments there are mentioning tests with raspbian
Kernel (which uses a different USB host driver than the upstream one).

It happens when watching digital TV DVB-C channels, which usually means
a sustained bit rate of 11 MBps to 54 MBps.

The reports mention the dvbsky, which uses USB URB bulk transfers.
Every several minutes (5 to 10 mins), the stream suffers "glitches"
caused by frame losses.

The part of the thread that contains the bisect is at:

https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965

It indirectly mentions another comment on the thread which points
to:
    https://github.com/raspberrypi/linux/issues/2134

There, it says that this fixes part of the issues:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4

but it affects URB packet losses to a lesser extent.

The main issue is really the logic changes in the core softirq logic.

Using Kernel 4.14.10 on a Raspberry Pi 3 with the 4cd13c2 commit reverted
fixed the issue.

Josef, is the above right? Anything else to mention? Does the
same issue also happen on x86 with vanilla Kernel 4.14.10?

-

It seems that the original patch was designed to solve some IRQ issues
with network cards that cause data losses on high traffic. However,
it is also causing bad effects on sustained high bandwidth demands
required by DVB cards, at least on some USB host drivers.

Alan/Greg/Eric/David:

Any ideas about how to fix it without causing regressions to
network?

Regards,
Mauro

> Sent: Sunday, 17 December 2017 at 14:27
> From: "Mauro Carvalho Chehab"
> To: "Sean Young"
> Cc: "Josef Griebichler" , lcaumo...@gmail.com,
> gre...@linuxfoundation.org, linux-media@vger.kernel.org,
> linux-...@vger.kernel.org
> Subject: Re: dvb usb issues since kernel 4.9
> On Sun, 17 Dec 2017 12:06:37 + Sean Young wrote:
>
> > Hi Josef,
>
> On Sun, 17 Dec 2017 11:19:38 +0100, "Josef Griebichler" wrote:
>
> > > Hello Mr. Caumont,
> > >
> > > since the switch to kernel 4.9 there are several users who have issues with
> > > their usb dvb cards.
> > > Some get artifacts when watching livetv, I'm getting discontinuity errors
> > > in tvheadend when recording.
> > > I'm using the latest test build of LibreELEC with kernel 4.14.6 but the
> > > issues are still there.
> > > There's a LibreELEC forum thread for this topic
> > > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/
> > > and also an open kernel bug
> > > https://bugzilla.kernel.org/show_bug.cgi?id=197835
> > >
> > > This is my dmesg http://sprunge.us/WRIE
> > > and tvh service log http://sprunge.us/bEiE
> > >
> > > I saw in the kernel changelog that you made an improvement/change for dvb
> > > and usb (commit 9a11204d2b26324636ff54f8d28095ed5dd17e91)
> > >
> > > Is there anything that can be done to improve our situation or are we
> > > forced to stay with kernel 4.8?
> > >
> > > Thanks for support!
> > >
> > > Josef
> >
> > Between kernel v4.8 and v4.9 there are many changes, and it is unlikely that
> > commit 9a11204d2b26324636ff54f8d28095ed5dd17e91 is responsible for this.
>
> Let me add linux-media@vger.kernel.org and linux-...@vger.kernel.org ML.
>
> Josef, please be sure that your e-mailer won't be sending e-mails with
> HTML tags in them, otherwise the ML server will automatically drop them.
>
> > What would be really helpful is if you could find out which commit did
> > cause a regression. This can be done by bisecting the kernel. There are
> > various guides to this:
Re: dvb usb issues since kernel 4.9
On Sun, 17 Dec 2017 12:06:37 + Sean Young wrote:

> Hi Josef,

On Sun, 17 Dec 2017 11:19:38 +0100, "Josef Griebichler" wrote:

> > Hello Mr. Caumont,
> >
> > since the switch to kernel 4.9 there are several users who have issues with
> > their usb dvb cards.
> > Some get artifacts when watching livetv, I'm getting discontinuity errors
> > in tvheadend when recording.
> > I'm using the latest test build of LibreELEC with kernel 4.14.6 but the issues
> > are still there.
> > There's a LibreELEC forum thread for this topic
> > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/
> > and also an open kernel bug
> > https://bugzilla.kernel.org/show_bug.cgi?id=197835
> >
> > This is my dmesg http://sprunge.us/WRIE
> > and tvh service log http://sprunge.us/bEiE
> >
> > I saw in the kernel changelog that you made an improvement/change for dvb and
> > usb (commit 9a11204d2b26324636ff54f8d28095ed5dd17e91)
> >
> > Is there anything that can be done to improve our situation or are we
> > forced to stay with kernel 4.8?
> >
> > Thanks for support!
> >
> > Josef
>
> Between kernel v4.8 and v4.9 there are many changes, and it is unlikely that
> commit 9a11204d2b26324636ff54f8d28095ed5dd17e91 is responsible for this.

Let me add linux-media@vger.kernel.org and linux-...@vger.kernel.org ML.

Josef, please be sure that your e-mailer won't be sending e-mails with
HTML tags in them, otherwise the ML server will automatically drop them.

> What would be really helpful is if you could find out which commit did
> cause a regression. This can be done by bisecting the kernel. There are
> various guides to this:
>
> https://wiki.ubuntu.com/Kernel/KernelBisection
> or
> https://wiki.archlinux.org/index.php/Bisecting_bugs
>
> Once the commit has been identified we can work together to narrow it down
> to the exact change, and then work together on a fix.

Yeah, we need more data in order to start tracking it.

I suspect, however, that a simple git bisect may not work in this case,
due to the USB changes that forbid DMA on stack, which were added in
Kernel 4.9, if the card Josef is using was affected by such a change.
Probably, he'll need to disable CONFIG_VMAP_STACK in the middle of the
bisect (e. g. when the patch that implements it is added), or to
cherry-pick any needed DMA fixup patch on top of Kernel 4.8 before
starting the bisect.

It is also worth mentioning which USB host controller is used, and
which media driver, as this could be an issue there.

That said, from the bug report, it seems that this is happening on
RPi3. Could you please test it also on a PC? That will help to identify
whether the bug is in RPi's host driver or in the media drivers.

With regards to RPi3, there are actually two different drivers for it:
one used on the Raspbian Kernel, and another one upstream. They're
completely different ones. What driver are you using?

Thanks,
Mauro