Re: [syzbot] WARNING in ieee802154_del_seclevel

2021-04-01 Thread Alan Stern
On Wed, Mar 31, 2021 at 02:03:08PM -0700, syzbot wrote:
> syzbot has bisected this issue to:
> 
> commit 416dacb819f59180e4d86a5550052033ebb6d72c
> Author: Alan Stern 
> Date:   Wed Aug 21 17:27:12 2019 +
> 
> HID: hidraw: Fix invalid read in hidraw_ioctl
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=127430fcd0
> start commit:   6e5a03bc ethernet/netronome/nfp: Fix a use after free in n..
> git tree:   net
> final oops: https://syzkaller.appspot.com/x/report.txt?x=117430fcd0
> console output: https://syzkaller.appspot.com/x/log.txt?x=167430fcd0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> dashboard link: https://syzkaller.appspot.com/bug?extid=fbf4fc11a819824e027b
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13bfe45ed0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1188e31ad0
> 
> Reported-by: syzbot+fbf4fc11a819824e0...@syzkaller.appspotmail.com
> Fixes: 416dacb819f5 ("HID: hidraw: Fix invalid read in hidraw_ioctl")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

It seems likely that the bisection ran off the rails here.  This commit 
could not have caused a problem, although it may have revealed a 
pre-existing problem that previously was hidden.

By the way, what happened to the annotated stack dumps that syzkaller 
used to provide in its bug reports?

Alan Stern


Re: WARNING in port100_send_frame_async/usb_submit_urb

2020-12-02 Thread Alan Stern
On Tue, Dec 01, 2020 at 01:21:27AM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:c84e1efa Merge tag 'asm-generic-fixes-5.10-2' of git://git..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14a9856550
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7be70951fca93701
> dashboard link: https://syzkaller.appspot.com/bug?extid=dbec6695a6565a9c6bc0
> compiler:   clang version 11.0.0 
> (https://github.com/llvm/llvm-project.git 
> ca2dcbd030eadbf0aa9b660efe864ff08af6e18b)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17c607f150
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+dbec6695a6565a9c6...@syzkaller.appspotmail.com
> 
> usb 1-1: string descriptor 0 read error: -32
> [ cut here ]
> URB 5c26bc1e submitted while active
> WARNING: CPU: 0 PID: 5 at drivers/usb/core/urb.c:378 
> usb_submit_urb+0xf57/0x1510 drivers/usb/core/urb.c:378
> Modules linked in:
> CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Workqueue: usb_hub_wq hub_event
> RIP: 0010:usb_submit_urb+0xf57/0x1510 drivers/usb/core/urb.c:378
> Code: 5c 41 5d 41 5e 41 5f 5d e9 76 5b ff ff e8 f1 e8 04 fc c6 05 25 0e 8b 07 
> 01 48 c7 c7 a0 b7 5b 8a 4c 89 e6 31 c0 e8 89 07 d5 fb <0f> 0b e9 20 f1 ff ff 
> e8 cd e8 04 fc eb 05 e8 c6 e8 04 fc bb a6 ff
> RSP: 0018:c9ca6ec8 EFLAGS: 00010246
> RAX: cf72e284cb303700 RBX: 888021723708 RCX: 888011108000
> RDX:  RSI: 8000 RDI: 
> RBP: 0cc0 R08: 815d29f2 R09: ed1017383ffc
> R10: ed1017383ffc R11:  R12: 888021723700
> R13: dc00 R14: 888012cfa458 R15: 11100259f489
> FS:  () GS:8880b9c0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 56157313d160 CR3: 1e22c000 CR4: 001506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  port100_send_frame_async+0x1ea/0x390 drivers/nfc/port100.c:780
>  port100_send_cmd_async+0x6c7/0x950 drivers/nfc/port100.c:876
>  port100_send_cmd_sync drivers/nfc/port100.c:916 [inline]
>  port100_set_command_type drivers/nfc/port100.c:987 [inline]
>  port100_probe+0xd4f/0x1600 drivers/nfc/port100.c:1567

I don't understand this driver very well.  It looks like the problem 
stems from the fact that port100_send_frame_async() submits two URBs, 
but port100_send_cmd_sync() only waits for one of them to complete.  The 
other URB may then still be active when the driver tries to reuse it.
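
Roughly, the hazard looks like the sketch below.  This is illustrative
only -- it is not the actual port100 code, and the names are made up --
but it shows how submitting two URBs while waiting for only one
completion lets the second URB still be in flight when it gets
resubmitted:

#include <linux/usb.h>
#include <linux/completion.h>

/* Illustrative only: not the real port100 driver code. */
static int send_frame_then_reuse(struct urb *out_urb, struct urb *in_urb,
				 struct completion *cmd_done)
{
	int rc;

	rc = usb_submit_urb(out_urb, GFP_KERNEL);	/* URB #1 */
	if (rc)
		return rc;
	rc = usb_submit_urb(in_urb, GFP_KERNEL);	/* URB #2 */
	if (rc)
		return rc;

	/* Waits for only one of the two completions... */
	wait_for_completion(cmd_done);

	/*
	 * ...so URB #2 may still be active here, and resubmitting it
	 * triggers the "URB ... submitted while active" warning.
	 */
	return usb_submit_urb(in_urb, GFP_KERNEL);
}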

Maybe someone who's more familiar with the port100 driver can fix the 
problem.

Alan Stern


Re: [PATCH v2] net: usb: usbnet: update __usbnet_{read|write}_cmd() to use new API

2020-10-29 Thread Alan Stern
On Thu, Oct 29, 2020 at 08:46:59PM +0530, Anant Thazhemadam wrote:
> I had a v2 prepared and ready but was told to wait for a week before sending 
> it in,
> since usb_control_msg_{send|recv}() that were being used were not present in 
> the
> networking tree at the time, and all the trees would be converged by then.
> So, just to be on the safer side, I waited for two weeks.
> I checked the net tree, and found the APIs there too (defined in usb.h).
> 
> However the build seems to fail here,
>     
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org/thread/O2BERGN7SYYC6LNOOKNUGPS2IJLDWYT7/
> 
> I'm not entirely sure at this point why this is happening, and would 
> appreciate it if
> someone could take the time to tell me if and how this might be an issue with 
> my
> patch.

It's happening because the tree you used for building did not include 
the declarations of usb_control_msg_{send|recv}().

It's hard to tell exactly what that tree does contain, because the 
github.com links embedded in the web page you mention above don't work.

Alan Stern


Re: [PATCH] net: usb: Fix uninit-was-stored issue in asix_read_cmd()

2020-08-25 Thread Alan Stern
On Tue, Aug 25, 2020 at 08:51:35AM +0200, Greg Kroah-Hartman wrote:
> At first glance, I think this can all be cleaned up, but it will take a
> bit of tree-wide work.  I agree, we need a "read this message and error
> if the whole thing is not there", as well as a "send this message and
> error if the whole thing was not sent", and also a way to handle
> stack-provided data, which seems to be the primary reason subsystems
> wrap this call (they want to make it easier on their drivers to use it.)
> 
> Let me think about this in more detail, but maybe something like:
>   usb_control_msg_read()
>   usb_control_msg_send()
> is a good first step (as the caller knows this) and stack provided data
> would be allowed, and it would return an error if the whole message was
> not read/sent properly.  That way we can start converting everything
> over to a sane, and checkable, api and remove a bunch of wrapper
> functions as well.

Suggestion: _read and _send are not a natural pair.  Consider instead
_read and _write.  _recv and _send don't feel right either, because in
both cases the host sends the control message -- the difference lies
in who sends the data.
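
For illustration, a write-direction wrapper with the semantics Greg
describes might look roughly like the sketch below.  The name and exact
signature are hypothetical (this is not the usb_control_msg_{send|recv}()
API that was eventually merged); the point is the bounce buffer, which
allows stack-provided data, and the short-transfer check:

#include <linux/usb.h>
#include <linux/slab.h>

/* Hypothetical sketch -- not the real usb_control_msg_send()/_recv(). */
static int usb_control_msg_write_sketch(struct usb_device *dev, __u8 request,
					__u8 requesttype, __u16 value,
					__u16 index, const void *data,
					__u16 size, int timeout)
{
	void *buf = NULL;
	int ret;

	if (size) {
		/* Bounce buffer: callers may pass data on the stack. */
		buf = kmemdup(data, size, GFP_KERNEL);
		if (!buf)
			return -ENOMEM;
	}

	ret = usb_control_msg(dev, usb_sndctrlpipe(dev, 0), request,
			      requesttype, value, index, buf, size, timeout);
	kfree(buf);

	if (ret < 0)
		return ret;
	if (ret != size)
		return -EREMOTEIO;	/* whole message was not sent */
	return 0;
}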

Alan Stern


Re: [PATCH 8/8] net/iucv: Use the new device_to_pm() helper to access struct dev_pm_ops

2020-05-26 Thread Alan Stern
On Tue, May 26, 2020 at 05:19:07PM +0200, Rafael J. Wysocki wrote:
> On Tue, May 26, 2020 at 5:07 PM Krzysztof Wilczyński  wrote:
> >
> > Hello Greg,
> >
> > [...]
> > > It's "interesting" how using your new helper doesn't actually make the
> > > code smaller.  Perhaps it isn't a good helper function?
> >
> > The idea for the helper was inspired by the comment Dan made to Bjorn
> > about Bjorn's change, as per:
> >
> >   https://lore.kernel.org/driverdev-devel/20191016135002.GA24678@kadam/
> >
> > It looked like a good idea to try to reduce the following:
> >
> >   dev->driver && dev->driver->pm && dev->driver->pm->prepare
> >
> > Into something more succinct.  Albeit, given the feedback from yourself
> > and Rafael, I gather that this helper is not really a good addition.
> 
> IMO it could be used for reducing code duplication like you did in the
> PCI code, but not necessarily in the other places where the code in
> question is not exactly duplicated.

The code could be a little more succinct, although it wouldn't fit every 
usage.  For example,

#define pm_do_callback(dev, method) \
	((dev)->driver && (dev)->driver->pm && (dev)->driver->pm->method ? \
		(dev)->driver->pm->method(dev) : 0)

Then the usage is something like:

ret = pm_do_callback(dev, prepare);

Would this be an overall improvement?

Alan Stern


Re: [PATCH RFC 0/4] barriers using data dependency

2019-01-03 Thread Alan Stern
On Wed, 2 Jan 2019, Michael S. Tsirkin wrote:

> On Wed, Jan 02, 2019 at 04:36:40PM -0500, Alan Stern wrote:
> > On Wed, 2 Jan 2019, Michael S. Tsirkin wrote:
> > 
> > > So as explained in Documentation/memory-barriers.txt e.g.
> > > a load followed by a store require a full memory barrier,
> > > to avoid store being ordered before the load.
> > > Similarly load-load requires a read memory barrier.
> > > 
> > > Thinking about it, we can actually create a data dependency
> > > by mixing the first loaded value into the pointer being
> > > accessed.
> > > 
> > > This adds an API for this and uses it in virtio.
> > > 
> > > Written over the holiday and build tested only so far.
> > 
> > You are using the terminology from memory-barriers.txt, referring to
> > the new dependency you create as a data dependency.  However,
> > tools/memory-model/* uses a more precise name, calling it an address
> > dependency.  Could you change the comments in the patches to use this
> > name instead?
> 
> Sure, sounds good. While I'm at it, should memory-barriers.txt be
> switched over too?

If you want to take care of that, great!  I never seem to get around to 
doing it.

> > > This patchset is also suboptimal on e.g. x86 where e.g. smp_rmb is a nop.
> > 
> > This should be easy to fix with an architecture-specific override.
> > 
> > Alan Stern
> 
> Absolutely. It does however mean that we'll need several
> variants: mb/rmb, smp/dma/virt/mandatory.
> 
> I am still trying to decide whether it's good since it documents the
> kind of barrier that we are trying to use - or bad since it's more
> verbose and makes you choose one where they are all pretty cheap.

How many places can these things be used?  My guess is not very many,
or at least, there aren't very many different _types_ of usage.  So
start only with variants you know will be used.  More can be added
later if we want.

Alan Stern



Re: [PATCH RFC 0/4] barriers using data dependency

2019-01-02 Thread Alan Stern
On Wed, 2 Jan 2019, Michael S. Tsirkin wrote:

> So as explained in Documentation/memory-barriers.txt e.g.
> a load followed by a store require a full memory barrier,
> to avoid store being ordered before the load.
> Similarly load-load requires a read memory barrier.
> 
> Thinking about it, we can actually create a data dependency
> by mixing the first loaded value into the pointer being
> accessed.
> 
> This adds an API for this and uses it in virtio.
> 
> Written over the holiday and build tested only so far.

You are using the terminology from memory-barriers.txt, referring to
the new dependency you create as a data dependency.  However,
tools/memory-model/* uses a more precise name, calling it an address
dependency.  Could you change the comments in the patches to use this
name instead?

> This patchset is also suboptimal on e.g. x86 where e.g. smp_rmb is a nop.

This should be easy to fix with an architecture-specific override.

Alan Stern

> Sending out for early feedback/flames.
> 
> Michael S. Tsirkin (4):
>   include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
>   include/linux/compiler.h: allow memory operands
>   barriers: convert a control to a data dependency
>   virtio: use dependent_ptr_mb
> 
>  Documentation/memory-barriers.txt | 20 
>  arch/alpha/include/asm/barrier.h  |  1 +
>  drivers/virtio/virtio_ring.c  |  6 --
>  include/asm-generic/barrier.h | 18 ++
>  include/linux/compiler-clang.h|  5 ++---
>  include/linux/compiler-gcc.h  |  4 
>  include/linux/compiler-intel.h|  4 +---
>  include/linux/compiler.h  |  8 +++-
>  8 files changed, 53 insertions(+), 13 deletions(-)



Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Linus Torvalds wrote:

> Can somebody tell which softirq it is that dvb/usb cares about?

I don't know about the DVB part.  The USB part is a little difficult to
analyze, largely because the bug reports I've seen are mostly from
people running non-vanilla kernels.  For example, Josef is using a
Raspberry Pi 3B with a non-standard USB host controller driver:
dwc_otg_hcd is built into raspbian in place of the normal dwc2_hsotg
driver.

Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the 
giveback_urb_bh member of struct usb_hcd.  See usb_hcd_giveback_urb() 
in drivers/usb/core/hcd.c; the calls are

	else if (high_prio_bh)
		tasklet_hi_schedule(&bh->bh);
	else
		tasklet_schedule(&bh->bh);

As it turns out, high_prio_bh gets set for interrupt and isochronous
URBs but not for bulk and control URBs.  The DVB driver in question
uses bulk transfers.

xhci-hcd, on the other hand, does not use these tasklets (it doesn't
set the HCD_BH bit in the hc_driver's .flags member).
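
For reference, the routing decision is roughly the sketch below
(paraphrased, not a verbatim copy of usb_hcd_giveback_urb() in
drivers/usb/core/hcd.c):

#include <linux/usb/hcd.h>

/* Paraphrased sketch of the tasklet routing described above. */
static void queue_giveback_sketch(struct usb_hcd *hcd, struct urb *urb)
{
	struct giveback_urb_bh *bh;
	bool high_prio_bh;

	/* Isochronous and interrupt URBs get the high-priority tasklet. */
	if (usb_pipeisoc(urb->pipe) || usb_pipeint(urb->pipe)) {
		bh = &hcd->high_prio_bh;
		high_prio_bh = true;
	} else {
		/* Bulk and control URBs -- the DVB driver's case. */
		bh = &hcd->low_prio_bh;
		high_prio_bh = false;
	}

	if (high_prio_bh)
		tasklet_hi_schedule(&bh->bh);
	else
		tasklet_schedule(&bh->bh);
}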

Alan Stern



Re: Aw: Re: Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Josef Griebichler wrote:

> No I can't sorry. There's no sat connection near to my workstation.

Can we ask the person who made this post:

https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965

to run the test?  The post says that the testing was done on an x86_64 
machine.

> Sent: Monday, 08 January 2018 at 17:31
> From: "Alan Stern" 
> To: "Josef Griebichler" 
> Cc: "Mauro Carvalho Chehab" , "Greg Kroah-Hartman" 
> , linux-...@vger.kernel.org, "Eric Dumazet" 
> , "Rik van Riel" , "Paolo Abeni" 
> , "Hannes Frederic Sowa" , "Jesper 
> Dangaard Brouer" , linux-kernel 
> , netdev , "Jonathan 
> Corbet" , LMML , "Peter 
> Zijlstra" , "David Miller" , 
> torva...@linux-foundation.org
> Subject: Re: Aw: Re: dvb usb issues since kernel 4.9
> On Mon, 8 Jan 2018, Josef Griebichler wrote:
> > Hi Maro,
> > 
> > I tried your mentioned patch but unfortunately no real improvement for me.
> > dmesg http://ix.io/DOg
> > tvheadend service log http://ix.io/DOi
> > Errors during recording are still there.
> > Errors increase if there is additional tcp load on raspberry.
> > 
> > Unfortunately there's no usbmon or tshark on libreelec so I can't provide
> > further logs.
> 
> Can you try running the same test on an x86_64 system?
> 
> Alan Stern

It appears that you are using a non-standard kernel.  The vanilla 
kernel does not include any "dwc_otg_hcd" driver.

Alan Stern



Re: Aw: Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Josef Griebichler wrote:

> Hi Maro,
> 
> I tried your mentioned patch but unfortunately no real improvement for me.
> dmesg http://ix.io/DOg
> tvheadend service log http://ix.io/DOi
> Errors during recording are still there.
> Errors increase if there is additional tcp load on raspberry.
> 
> Unfortunately there's no usbmon or tshark on libreelec so I can't provide 
> further logs.

Can you try running the same test on an x86_64 system?

Alan Stern



Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote:

> > Let find the root-cause of this before reverting, as this will hurt the
> > networking use-case.
> > 
> > I want to see if the increase buffer will solve the issue (the current
> > buffer of 0.63 ms seem too small).
> 
> For TV, high latency has mainly two practical consequences:
> 
> 1) it increases the time to switch channels. MPEG-TS based transmissions
>usually takes some time to start showing the channel contents. Adding
>more buffers make it worse;
> 
> 2) specially when watching sports, a higher latency means that you'll know
>that your favorite team made a score when your neighbors start
>celebrating... seeing the actual event only after them.
> 
> So, the lower, the merrier, but I think that 5 ms would be acceptable.

That value 65 for the number of buffers was calculated based on a
misunderstanding of the actual bandwidth requirement.  Still, increasing
the number of buffers shouldn't hurt, and it's worth trying.

But there is another misunderstanding here which needs to be cleared 
up.  Adding more buffers does _not_ increase latency; it increases 
capacity.  Making each buffer larger _would_ increase latency, but 
that's not what I proposed.

Going through this more explicitly...  Suppose you receive 8 KB of data
every ms, and suppose you have four 8-KB buffers.  Then the latency is
1 ms, because that's how long you have to wait for the first buffer to
be filled up after you submit an I/O request.  (The driver does _not_
need to wait for all four buffers to be filled before it can start
displaying the data in the first buffer.)  The capacity would be 4 ms,
because that's how much data your buffers can store.  If you end up
waiting longer than 4 ms before ksoftirqd gets around to processing any
of the data, then some data will inevitably get lost.

That's why the way to deal with the delays caused by deferring softirqs
to ksoftirqd is to add more buffers (and not make the buffers larger
than they already are).
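
To put numbers on it, here is a trivial stand-alone illustration of the
same arithmetic (nothing kernel-specific, just the example figures from
above):

#include <stdio.h>

int main(void)
{
	double rate_kb_per_ms = 8.0;	/* data arrives at 8 KB per ms */
	double buf_kb = 8.0;		/* each buffer holds 8 KB */
	int nbufs = 4;			/* four buffers are queued */

	double latency_ms = buf_kb / rate_kb_per_ms;	/* time to fill one buffer */
	double capacity_ms = nbufs * latency_ms;	/* how long data can pile up */

	printf("latency  = %.2f ms\n", latency_ms);	/* 1.00 ms */
	printf("capacity = %.2f ms\n", capacity_ms);	/* 4.00 ms */
	return 0;
}

Adding buffers grows capacity_ms; only making each buffer bigger would
grow latency_ms.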

> > I would also like to see experiments with adjusting adjust the sched
> > priority of the kthread's and/or the userspace prog. (e.g use command
> > like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)' ).
> 
> If this fixes the issue, we'll need to do something inside the Kernel
> to change the priority, as TV userspace apps should not run as root. Not
> sure where such change should be done (USB? media?).

It would be interesting to try this, but I agree that it's not likely 
to be a practical solution.  Anyway, shouldn't ksoftirqd already be 
running with very high priority?

> > Are we really sure that the regression is cause by 4cd13c21b207
> > ("softirq: Let ksoftirqd do its job"), the forum thread also report
> > that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
> > overflow in get_next_timer_interrupt")
> >  https://git.kernel.org/torvalds/c/34f41c0316ed

That is a good point.  It's hard to see how the issues in the two 
commits could be related, but who knows?

> I'll see if I can mount a test scenario here in order to try reproduce
> the reported bug. I suspect that I won't be able to reproduce it on my
> "standard" i7core-based test machine, even with KPTI enabled.

If you're using the same sort of hardware as Josef, under similar 
circumstances, the buggy behavior should be the same.  If not, there
must be something else going on that we're not aware of.

> > It makes me suspicious that this fix changes things...
> > After this fix, I suspect that changing the sched priorities, will fix
> > the remaining glitches.
> > 
> > 
> > > It is hard to foresee the consequences of the softirq changes for other
> > > devices, though.  
> > 
> > Yes, it is hard to foresee, I can only cover networking.
> > 
> > For networking, if reverting this, we will (again) open the kernel for
> > an easy DDoS vector with UDP packets.  As mentioned in the commit desc,
> > before you could easily cause softirq to take all the CPU time from the
> > application, resulting in very low "good-put" in the UDP-app. (That's why
> > it was so easy to DDoS DNS servers before...)
> > 
> > With the softirqd patch in place, ksoftirqd is scheduled fairly between
> > other applications running on the same CPU.  But in some cases this is
> > not what you want, so as the also commit mentions, the admin can now
> > more easily tune process scheduling parameters if needed, to adjust for
> > such use-cases (it was not really an admin choice before).
> 
> Can't the ksoftirq patch be modified to only apply to the networking
>

Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote:

> On Sun, 7 Jan 2018 10:41:37 -0500 (EST)
> Alan Stern  wrote:
> 
> > On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:
> > 
> > > > > It seems that the original patch were designed to solve some IRQ 
> > > > > issues
> > > > > with network cards with causes data losses on high traffic. However,
> > > > > it is also causing bad effects on sustained high bandwidth demands
> > > > > required by DVB cards, at least on some USB host drivers.
> > > > > 
> > > > > Alan/Greg/Eric/David:
> > > > > 
> > > > > Any ideas about how to fix it without causing regressions to
> > > > > network?
> > > > 
> > > > It would be good to know what hardware was involved on the x86 system
> > > > and to have some timing data.  Can we see the output from lsusb and
> > > > usbmon, running on a vanilla kernel that gets plenty of video glitches? 
> > > >  
> > > 
> > > From Josef's report, and from the BZ, the affected hardware seems
> > > to be based on Montage Technology M88DS3103/M88TS2022 chipset.  
> > 
> > What type of USB host controller does the x86_64 system use?  EHCI or
> > xHCI?
> 
> I'll let Josef answer this.
> 
> > 
> > > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> > > with shares a USB implementation that is used by a lot more drivers.
> > > The URB handling code is at:
> > > 
> > >   drivers/media/usb/dvb-usb-v2/usb_urb.c
> > > 
> > > This particular driver allocates 8 buffers with 4096 bytes each
> > > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> > > 
> > > This become a popular USB hardware nowadays. I have one S960c
> > > myself, so I can send you the lsusb from it. You should notice, however,
> > > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> > > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> > > of payload after removing URB headers.  
> > 
> > You mentioned earlier that the driver uses bulk transfers.  In USB-2.0,
> > the maximum possible payload data transfer rate using bulk transfers is
> > 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s).  And
> > even this is possible only if almost nothing else is using the bus at
> > the same time.
> 
> No, I said 58 Mbits/s (not bytes).

Well, what you actually _wrote_ was "58 Mpps of payload" (see above),
and I couldn't tell how to interpret that.  :-)

58 Mb/s is obviously almost 8 times less than the full USB bus 
bandwidth.

> On DVB-C and DVB-S2 specs, AFAIKT, there's no hard limit for the maximum
> payload data rate, although industry seems to limit it to be around
> 60 Mbits/s. On those standards, the maximal bit rate is defined by the
> modulation type and by the channel symbol rate.
> 
> To give you a practical example, my DVB-S2 provider modulates each
> transponder with 8/PSK (3 bits/symbol), and define channels with a
> symbol rate of 30 Mbauds/s. So, it could, theoretically, transport
> a MPEG-TS stream up to 90 Mbits/s (minus headers and guard intervals).
> In practice, the streams there are transmitted with 58,026.5 Kbits/s.

Okay.  This is 58 Kb/ms or 7.25 KB/ms.  So your scheme of eight 4-KB 
buffers gives a latency of 0.57 ms with a total capacity of 4.5 ms, 
which is a lot better than what I was thinking.

> > In any case, you might be able to attack the problem simply by using
> > more than 8 buffers.  With just eight 4096-byte buffers, the total
> > pipeline capacity is only about 0.62 ms (at the maximum possible
> > transfer rate).  Increasing the number of buffers to 65 would give a
> > capacity of 5 ms, which is probably a lot better suited for situations
> > where completions are handled by the ksoftirqd thread.
> 
> Increasing it to 65 shouldn't be hard. Not sure, however, if the hardware
> will actually fill the 65 buffers, but it is worth to try.

Given the new information, 65 would be overkill.  But going from 8 to 
16 might help.

> > > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> > > in order to revert the kernel logic to prioritize latency instead of
> > > throughput.  
> > 
> > It can't be done without pervasive changes to the USB subsystem, which
> > I would greatly prefer to avoid.  Besides, this wouldn't really solve
> > the problem.  Decreasing the latency for one device will cause it to be
> > increased for others.
> 
> If there is a TV streaming traffic at a USB bus, it means that the
> user wants to either watch and/or record a TV program. On such
> usecase scenario, a low latency is highly desired for the TV capture
> (and display, if the GPU is USB), even it means a higher latency for
> other traffic.

Not if the other traffic is also a TV capture.  :-)

It might make sense to classify softirq sources as "high priority" or 
"low priority", and only defer the "low priority" work to ksoftirqd.

Alan Stern



Re: dvb usb issues since kernel 4.9

2018-01-07 Thread Alan Stern
On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:

> > > It seems that the original patch were designed to solve some IRQ issues
> > > with network cards with causes data losses on high traffic. However,
> > > it is also causing bad effects on sustained high bandwidth demands
> > > required by DVB cards, at least on some USB host drivers.
> > > 
> > > Alan/Greg/Eric/David:
> > > 
> > > Any ideas about how to fix it without causing regressions to
> > > network?  
> > 
> > It would be good to know what hardware was involved on the x86 system
> > and to have some timing data.  Can we see the output from lsusb and
> > usbmon, running on a vanilla kernel that gets plenty of video glitches?
> 
> From Josef's report, and from the BZ, the affected hardware seems
> to be based on Montage Technology M88DS3103/M88TS2022 chipset.

What type of USB host controller does the x86_64 system use?  EHCI or
xHCI?

> The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> with shares a USB implementation that is used by a lot more drivers.
> The URB handling code is at:
> 
>   drivers/media/usb/dvb-usb-v2/usb_urb.c
> 
> This particular driver allocates 8 buffers with 4096 bytes each
> for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> 
> This become a popular USB hardware nowadays. I have one S960c
> myself, so I can send you the lsusb from it. You should notice, however,
> that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> of payload after removing URB headers.

You mentioned earlier that the driver uses bulk transfers.  In USB-2.0,
the maximum possible payload data transfer rate using bulk transfers is
53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s).  And
even this is possible only if almost nothing else is using the bus at
the same time.

> A 10 minutes record with the
> entire data (with typically contains 5-10 channels) can easily go
> above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if
> a usbmon dump would be useful.

It might not be helpful at all.  However, I'm not interested in the 
payload data (which would be unintelligible to me anyway) but rather 
the timing of URB submissions and completions.  A usbmon trace which 
didn't keep much of the payload data would only require on the order of 
50 MB per minute -- and Josef said that glitches usually would show up 
within a minute or so.

> I'm enclosing the lsusb from a S960C device, with is based on those
> Montage chipsets:

What I wanted to see was the output from "lsusb" on the affected
system, not the output from "lsusb -v -s B:D" on your system.

> > Overall, this may be a very difficult problem to solve.  The
> > 4cd13c21b207 commit was intended to improve throughput at the cost of
> > increased latency.  But then what do you do when the latency becomes
> > too high for the video subsystem to handle?
> 
> Latency can't be too high, otherwise frames will be dropped.

Yes, that's the whole point.

> Even if the Kernel itself doesn't drop, if the delay goes higher
> than a certain threshold, userspace will need to drop, as it
> should be presenting audio and video on real time. Yet, typically,
> userspace will delay it by one or two seconds, with would mean
> 1500-3500 buffers, with I suspect it is a lot more than the hardware
> limits. So I suspect that the hardware starves free buffers a way 
> before userspace, as media hardware don't have unlimited buffers
> inside them, as they assume that the Kernel/userspace will be fast
> enough to sustain bit rates up to 66 Mbps of payload.

The timing information would tell us how large the latency is.

In any case, you might be able to attack the problem simply by using
more than 8 buffers.  With just eight 4096-byte buffers, the total
pipeline capacity is only about 0.62 ms (at the maximum possible
transfer rate).  Increasing the number of buffers to 65 would give a
capacity of 5 ms, which is probably a lot better suited for situations
where completions are handled by the ksoftirqd thread.

> Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> in order to revert the kernel logic to prioritize latency instead of
> throughput.

It can't be done without pervasive changes to the USB subsystem, which
I would greatly prefer to avoid.  Besides, this wouldn't really solve
the problem.  Decreasing the latency for one device will cause it to be
increased for others.

Alan Stern



Re: dvb usb issues since kernel 4.9

2018-01-06 Thread Alan Stern
On Sat, 6 Jan 2018, Mauro Carvalho Chehab wrote:

> Hi Josef,
> 
> On Sat, 6 Jan 2018 16:04:16 +0100
> "Josef Griebichler"  wrote:
> 
> > Hi,
> > 
> > the causing commit has been identified.
> > After reverting commit 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > its working again.
> 
> Just replying to me won't magically fix this. The ones that were involved on
> this patch should also be c/c, plus USB people. Just added them.
> 
> > Please have a look into the thread 
> > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13
> > here are several users aknowledging the revert solves their issues with usb 
> > dvb cards.
> 
> I read the entire (long) thread there. In order to make easier for the
> others, from what I understand, the problem happens on both x86 and arm,
> although almost all comments there are mentioning tests with raspbian
> Kernel (with uses a different USB host driver than the upstream one).
> 
> It happens when watching digital TV DVB-C channels, with usually means
> a sustained bit rate of 11 MBps to 54 MBps.
> 
> The reports mention the dvbsky, with uses USB URB bulk transfers.
> On every several minutes (5 to 10 mins), the stream suffer "glitches"
> caused by frame losses.
> 
> The part of the thread that contains the bisect is at:
>   
> https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965
> 
> It indirectly mentions another comment on the thread with points
> to:
>   https://github.com/raspberrypi/linux/issues/2134
> 
> There, it says that this fix part of the issues:
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4
> 
> but it affects URB packet losses on a lesser extend.
> 
> The main issue is really the logic changes a the core softirq logic.
> 
> Using Kernel 4.14.10 on a Raspberry Pi 3 with 4cd13c2 commit reverted
> fixed the issue. 
> 
> Joseph, is the above right? Anything else to mention? Does the
> same issue affect also on x86 with vanilla Kernel 4.14.10?
> 
> -
> 
> It seems that the original patch were designed to solve some IRQ issues
> with network cards with causes data losses on high traffic. However,
> it is also causing bad effects on sustained high bandwidth demands
> required by DVB cards, at least on some USB host drivers.
> 
> Alan/Greg/Eric/David:
> 
> Any ideas about how to fix it without causing regressions to
> network?

It would be good to know what hardware was involved on the x86 system
and to have some timing data.  Can we see the output from lsusb and
usbmon, running on a vanilla kernel that gets plenty of video glitches?

Overall, this may be a very difficult problem to solve.  The
4cd13c21b207 commit was intended to improve throughput at the cost of
increased latency.  But then what do you do when the latency becomes
too high for the video subsystem to handle?

Alan Stern



Re: [PATCH v2 0/9] Remove spin_unlock_wait()

2017-07-08 Thread Alan Stern
Pardon me for barging in, but I found this whole interchange extremely 
confusing...

On Sat, 8 Jul 2017, Ingo Molnar wrote:

> * Paul E. McKenney  wrote:
> 
> > On Sat, Jul 08, 2017 at 10:35:43AM +0200, Ingo Molnar wrote:
> > > 
> > > * Manfred Spraul  wrote:
> > > 
> > > > Hi Ingo,
> > > > 
> > > > On 07/07/2017 10:31 AM, Ingo Molnar wrote:
> > > > > 
> > > > > There's another, probably just as significant advantage: 
> > > > > queued_spin_unlock_wait()
> > > > > is 'read-only', while spin_lock()+spin_unlock() dirties the lock 
> > > > > cache line. On
> > > > > any bigger system this should make a very measurable difference - if
> > > > > spin_unlock_wait() is ever used in a performance critical code path.
> > > > At least for ipc/sem:
> > > > Dirtying the cacheline (in the slow path) allows to remove a smp_mb() 
> > > > in the
> > > > hot path.
> > > > So for sem_lock(), I either need a primitive that dirties the cacheline 
> > > > or
> > > > sem_lock() must continue to use spin_lock()/spin_unlock().

This statement doesn't seem to make sense.  Did Manfred mean to write 
"smp_mb()" instead of "spin_lock()/spin_unlock()"?

> > > Technically you could use spin_trylock()+spin_unlock() and avoid the lock 
> > > acquire 
> > > spinning on spin_unlock() and get very close to the slow path performance 
> > > of a 
> > > pure cacheline-dirtying behavior.

This is even more confusing.  Did Ingo mean to suggest using 
"spin_trylock()+spin_unlock()" in place of "spin_lock()+spin_unlock()" 
could provide the desired ordering guarantee without delaying other 
CPUs that may try to acquire the lock?  That seems highly questionable.

> > > But adding something like spin_barrier(), which purely dirties the lock 
> > > cacheline, 
> > > would be even faster, right?
> > 
> > Interestingly enough, the arm64 and powerpc implementations of
> > spin_unlock_wait() were very close to what it sounds like you are
> > describing.
> 
> So could we perhaps solve all our problems by defining the generic version 
> thusly:
> 
> void spin_unlock_wait(spinlock_t *lock)
> {
> 	if (spin_trylock(lock))
> 		spin_unlock(lock);
> }

How could this possibly be a generic version of spin_unlock_wait()?  
It does nothing at all (with no ordering properties) if some other CPU
currently holds the lock, whereas the real spin_unlock_wait() would
wait until the other CPU released the lock (or possibly longer).

And if no other CPU currently holds the lock, this has exactly the same
performance properties as spin_lock()+spin_unlock(), so what's the
advantage?
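
For reference, the behavior I mean by "the real spin_unlock_wait()" is
roughly the sketch below -- approximately how the old
arch_spin_unlock_wait() implementations were written, leaving aside the
extra memory-ordering guarantees they also had to provide:

#include <linux/spinlock.h>

/* Rough sketch of the semantics, not any particular arch implementation. */
static inline void spin_unlock_wait_sketch(spinlock_t *lock)
{
	/* Spin until whoever currently holds the lock releases it. */
	while (spin_is_locked(lock))
		cpu_relax();
}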

Alan Stern

> ... and perhaps rename it to spin_barrier() [or whatever proper name there 
> would 
> be]?
> 
> Architectures can still optimize it, to remove the small window where the 
> lock is 
> held locally - as long as the ordering is at least as strong as the generic 
> version.
> 
> This would have various advantages:
> 
>  - semantics are well-defined
> 
>  - the generic implementation is already pretty well optimized (no spinning)
> 
>  - it would make it usable for the IPC performance optimization
> 
>  - architectures could still optimize it to eliminate the window where the 
> lock is
>held locally - if there's such instructions available.
> 
> Was this proposed before, or am I missing something?
> 
> Thanks,
> 
>   Ingo



Re: [PATCH v2 0/9] Remove spin_unlock_wait()

2017-07-06 Thread Alan Stern
On Thu, 6 Jul 2017, Peter Zijlstra wrote:

> On Thu, Jul 06, 2017 at 12:49:12PM -0400, Alan Stern wrote:
> > On Thu, 6 Jul 2017, Paul E. McKenney wrote:
> > 
> > > On Thu, Jul 06, 2017 at 06:10:47PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Jul 06, 2017 at 08:21:10AM -0700, Paul E. McKenney wrote:
> > > > > And yes, there are architecture-specific optimizations for an
> > > > > empty spin_lock()/spin_unlock() critical section, and the current
> > > > > arch_spin_unlock_wait() implementations show some of these 
> > > > > optimizations.
> > > > > But I expect that performance benefits would need to be demonstrated 
> > > > > at
> > > > > the system level.
> > > > 
> > > > I do in fact contended there are any optimizations for the exact
> > > > lock+unlock semantics.
> > > 
> > > You lost me on this one.
> > > 
> > > > The current spin_unlock_wait() is weaker. Most notably it will not (with
> > > > exception of ARM64/PPC for other reasons) cause waits on other CPUs.
> > > 
> > > Agreed, weaker semantics allow more optimizations.  So use cases needing
> > > only the weaker semantics should more readily show performance benefits.
> > > But either way, we need compelling use cases, and I do not believe that
> > > any of the existing spin_unlock_wait() calls are compelling.  Perhaps I
> > > am confused, but I am not seeing it for any of them.
> > 
> > If somebody really wants the full spin_unlock_wait semantics and
> > doesn't want to interfere with other CPUs, wouldn't synchronize_sched()
> > or something similar do the job?  It wouldn't be as efficient as
> > lock+unlock, but it also wouldn't affect other CPUs.
> 
> So please don't do that. That'll create massive pain for RT. Also I
> don't think it works. The whole point was that spin_unlock_wait() is
> _cheaper_ than lock()+unlock(). If it gets to be more expensive there is
> absolutely no point in using it.

Of course; that is obvious.

I was making a rhetorical point: You should not try to justify
spin_unlock_wait() on the basis that it doesn't cause waits on other
CPUs.

Alan Stern



Re: [PATCH v2 0/9] Remove spin_unlock_wait()

2017-07-06 Thread Alan Stern
On Thu, 6 Jul 2017, Paul E. McKenney wrote:

> On Thu, Jul 06, 2017 at 06:10:47PM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 06, 2017 at 08:21:10AM -0700, Paul E. McKenney wrote:
> > > And yes, there are architecture-specific optimizations for an
> > > empty spin_lock()/spin_unlock() critical section, and the current
> > > arch_spin_unlock_wait() implementations show some of these optimizations.
> > > But I expect that performance benefits would need to be demonstrated at
> > > the system level.
> > 
> > I do in fact contended there are any optimizations for the exact
> > lock+unlock semantics.
> 
> You lost me on this one.
> 
> > The current spin_unlock_wait() is weaker. Most notably it will not (with
> > exception of ARM64/PPC for other reasons) cause waits on other CPUs.
> 
> Agreed, weaker semantics allow more optimizations.  So use cases needing
> only the weaker semantics should more readily show performance benefits.
> But either way, we need compelling use cases, and I do not believe that
> any of the existing spin_unlock_wait() calls are compelling.  Perhaps I
> am confused, but I am not seeing it for any of them.

If somebody really wants the full spin_unlock_wait semantics and
doesn't want to interfere with other CPUs, wouldn't synchronize_sched()
or something similar do the job?  It wouldn't be as efficient as
lock+unlock, but it also wouldn't affect other CPUs.

Alan Stern



Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair

2017-07-03 Thread Alan Stern
On Mon, 3 Jul 2017, Paul E. McKenney wrote:

> On Mon, Jul 03, 2017 at 10:39:49AM -0400, Alan Stern wrote:
> > On Sat, 1 Jul 2017, Manfred Spraul wrote:
> > 
> > > As we want to remove spin_unlock_wait() and replace it with explicit
> > > spin_lock()/spin_unlock() calls, we can use this to simplify the
> > > locking.
> > > 
> > > In addition:
> > > - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
> > > - The new code avoids the backwards loop.
> > > 
> > > Only slightly tested, I did not manage to trigger calls to
> > > nf_conntrack_all_lock().
> > > 
> > > Fixes: b16c29191dc8
> > > Signed-off-by: Manfred Spraul 
> > > Cc: 
> > > Cc: Sasha Levin 
> > > Cc: Pablo Neira Ayuso 
> > > Cc: netfilter-de...@vger.kernel.org
> > > ---
> > >  net/netfilter/nf_conntrack_core.c | 44 
> > > +--
> > >  1 file changed, 24 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/net/netfilter/nf_conntrack_core.c 
> > > b/net/netfilter/nf_conntrack_core.c
> > > index e847dba..1193565 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -96,19 +96,24 @@ static struct conntrack_gc_work conntrack_gc_work;
> > >  
> > >  void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
> > >  {
> > > + /* 1) Acquire the lock */
> > >   spin_lock(lock);
> > > - while (unlikely(nf_conntrack_locks_all)) {
> > > - spin_unlock(lock);
> > >  
> > > - /*
> > > -  * Order the 'nf_conntrack_locks_all' load vs. the
> > > -  * spin_unlock_wait() loads below, to ensure
> > > -  * that 'nf_conntrack_locks_all_lock' is indeed held:
> > > -  */
> > > - smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> > > - spin_unlock_wait(&nf_conntrack_locks_all_lock);
> > > - spin_lock(lock);
> > > - }
> > > + /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
> > > + if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
> > > + return;
> > 
> > As far as I can tell, this read does not need to have ACQUIRE
> > semantics.
> > 
> > You need to guarantee that two things can never happen:
> > 
> > (1) We read nf_conntrack_locks_all == false, and this routine's
> > critical section for nf_conntrack_locks[i] runs after the
> > (empty) critical section for that lock in 
> > nf_conntrack_all_lock().
> > 
> > (2) We read nf_conntrack_locks_all == true, and this routine's 
> > critical section for nf_conntrack_locks_all_lock runs before 
> > the critical section in nf_conntrack_all_lock().
> > 
> > In fact, neither one can happen even if smp_load_acquire() is replaced
> > with READ_ONCE().  The reason is simple enough, using this property of
> > spinlocks:
> > 
> > If critical section CS1 runs before critical section CS2 (for 
> > the same lock) then: (a) every write coming before CS1's
> > spin_unlock() will be visible to any read coming after CS2's
> > spin_lock(), and (b) no write coming after CS2's spin_lock()
> > will be visible to any read coming before CS1's spin_unlock().
> > 
> > Thus for (1), assuming the critical sections run in the order mentioned
> > above, since nf_conntrack_all_lock() writes to nf_conntrack_locks_all
> > before releasing nf_conntrack_locks[i], and since nf_conntrack_lock()
> > acquires nf_conntrack_locks[i] before reading nf_conntrack_locks_all,
> > by (a) the read will always see the write.
> > 
> > Similarly for (2), since nf_conntrack_all_lock() acquires 
> > nf_conntrack_locks_all_lock before writing to nf_conntrack_locks_all, 
> > and since nf_conntrack_lock() reads nf_conntrack_locks_all before 
> > releasing nf_conntrack_locks_all_lock, by (b) the read cannot see the 
> > write.
> 
> And the Linux kernel memory model (https://lwn.net/Articles/718628/
> and https://lwn.net/Articles/720550/) agrees with Alan.  Here is
> a litmus test, which emulates spin_lock() with xchg_acquire() and
> spin_unlock() with smp_store_release():
> 
> 
> 
> C C-ManfredSpraul-L1G1xchgnr.litmus
> 
> (* Expected result: Never.  *)
> 
> {
> }
> 
> P0(int *nfcla,

Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair

2017-07-03 Thread Alan Stern
On Mon, 3 Jul 2017, Manfred Spraul wrote:

> >>> + /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
> >>> + if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
> >>> + return;
> >> As far as I can tell, this read does not need to have ACQUIRE
> >> semantics.
> >>
> >> You need to guarantee that two things can never happen:
> >>
> >>  (1) We read nf_conntrack_locks_all == false, and this routine's
> >>critical section for nf_conntrack_locks[i] runs after the
> >>(empty) critical section for that lock in
> >>nf_conntrack_all_lock().
> >>
> >>  (2) We read nf_conntrack_locks_all == true, and this routine's
> >>critical section for nf_conntrack_locks_all_lock runs before
> >>the critical section in nf_conntrack_all_lock().
> I was looking at nf_conntrack_all_unlock:
> There is a smp_store_release() - which memory barrier does this pair with?
> 
> nf_conntrack_all_unlock()
>  
>  smp_store_release(a, false)
>  spin_unlock(b);
> 
> nf_conntrack_lock()
>  spin_lock(c);
>  xx=read_once(a)
>  if (xx==false)
>  return
>  

Ah, I see your point.  Yes, I did wonder about what would happen when
nf_conntrack_locks_all was set back to false.  But I didn't think about
it any further, because the relevant code wasn't in your patch.

> I tried to pair the memory barriers:
> nf_conntrack_all_unlock() contains a smp_store_release().
> What does that pair with?

You are right, this does need to be smp_load_acquire() after all.  
Perhaps the preceding comment should mention that it pairs with the 
smp_store_release() from an earlier invocation of 
nf_conntrack_all_unlock().

(Alternatively, you could make nf_conntrack_all_unlock() do a
lock+unlock on all the locks in the array, just like
nf_conntrack_all_lock().  But of course, that would be a lot less
efficient.)

Alan Stern



Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair

2017-07-03 Thread Alan Stern
On Sat, 1 Jul 2017, Manfred Spraul wrote:

> As we want to remove spin_unlock_wait() and replace it with explicit
> spin_lock()/spin_unlock() calls, we can use this to simplify the
> locking.
> 
> In addition:
> - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
> - The new code avoids the backwards loop.
> 
> Only slightly tested, I did not manage to trigger calls to
> nf_conntrack_all_lock().
> 
> Fixes: b16c29191dc8
> Signed-off-by: Manfred Spraul 
> Cc: 
> Cc: Sasha Levin 
> Cc: Pablo Neira Ayuso 
> Cc: netfilter-de...@vger.kernel.org
> ---
>  net/netfilter/nf_conntrack_core.c | 44 
> +--
>  1 file changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c 
> b/net/netfilter/nf_conntrack_core.c
> index e847dba..1193565 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -96,19 +96,24 @@ static struct conntrack_gc_work conntrack_gc_work;
>  
>  void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
>  {
> + /* 1) Acquire the lock */
>   spin_lock(lock);
> - while (unlikely(nf_conntrack_locks_all)) {
> - spin_unlock(lock);
>  
> - /*
> -  * Order the 'nf_conntrack_locks_all' load vs. the
> -  * spin_unlock_wait() loads below, to ensure
> -  * that 'nf_conntrack_locks_all_lock' is indeed held:
> -  */
> - smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> - spin_unlock_wait(&nf_conntrack_locks_all_lock);
> - spin_lock(lock);
> - }
> + /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
> + if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
> + return;

As far as I can tell, this read does not need to have ACQUIRE
semantics.

You need to guarantee that two things can never happen:

(1) We read nf_conntrack_locks_all == false, and this routine's
critical section for nf_conntrack_locks[i] runs after the
(empty) critical section for that lock in 
nf_conntrack_all_lock().

(2) We read nf_conntrack_locks_all == true, and this routine's 
critical section for nf_conntrack_locks_all_lock runs before 
the critical section in nf_conntrack_all_lock().

In fact, neither one can happen even if smp_load_acquire() is replaced
with READ_ONCE().  The reason is simple enough, using this property of
spinlocks:

If critical section CS1 runs before critical section CS2 (for 
the same lock) then: (a) every write coming before CS1's
spin_unlock() will be visible to any read coming after CS2's
spin_lock(), and (b) no write coming after CS2's spin_lock()
will be visible to any read coming before CS1's spin_unlock().

Thus for (1), assuming the critical sections run in the order mentioned
above, since nf_conntrack_all_lock() writes to nf_conntrack_locks_all
before releasing nf_conntrack_locks[i], and since nf_conntrack_lock()
acquires nf_conntrack_locks[i] before reading nf_conntrack_locks_all,
by (a) the read will always see the write.

Similarly for (2), since nf_conntrack_all_lock() acquires 
nf_conntrack_locks_all_lock before writing to nf_conntrack_locks_all, 
and since nf_conntrack_lock() reads nf_conntrack_locks_all before 
releasing nf_conntrack_locks_all_lock, by (b) the read cannot see the 
write.

Alan Stern

> +
> + /* fast path failed, unlock */
> + spin_unlock(lock);
> +
> + /* Slow path 1) get global lock */
> + spin_lock(&nf_conntrack_locks_all_lock);
> +
> + /* Slow path 2) get the lock we want */
> + spin_lock(lock);
> +
> + /* Slow path 3) release the global lock */
> + spin_unlock(&nf_conntrack_locks_all_lock);
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_lock);
>  
> @@ -149,18 +154,17 @@ static void nf_conntrack_all_lock(void)
>   int i;
>  
>   spin_lock(&nf_conntrack_locks_all_lock);
> - nf_conntrack_locks_all = true;
>  
> - /*
> -  * Order the above store of 'nf_conntrack_locks_all' against
> -  * the spin_unlock_wait() loads below, such that if
> -  * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
> -  * we must observe nf_conntrack_locks[] held:
> -  */
> - smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> + nf_conntrack_locks_all = true;
>  
>   for (i = 0; i < CONNTRACK_LOCKS; i++) {
> - spin_unlock_wait(&nf_conntrack_locks[i]);
> + spin_lock(&nf_conntrack_locks[i]);
> +
> + /* This spin_unlock provides the "release" to ensure that
> +  * nf_conntrack_locks_all==true is visible to everyone that
> +  * acquired spin_lock(&nf_conntrack_locks[]).
> +  */
> + spin_unlock(&nf_conntrack_locks[i]);
>   }
>  }





Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Alan Stern
On Fri, 30 Jun 2017, Oleg Nesterov wrote:

> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>^
> 
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> 
> > > it has some problems, but still...
> > >
> > > The code above looks strange for me. If we are going to repeat this 
> > > pattern
> > > the perhaps we should add a helper for lock+unlock and name it 
> > > unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch!  May I have your Signed-off-by?
> 
> Only if you promise to replace all RCU flavors with a single simple 
> implementation
> based on rwlock ;)
> 
> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu(() and
> sometimes it makes sense.

If it looks like synchronize_rcu(), why not actually use 
synchronize_rcu()?

Alan Stern

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.



Re: [PATCH] r8152: Allocate interrupt buffer as part of struct r8152

2017-01-31 Thread Alan Stern
On Tue, 31 Jan 2017, Guenter Roeck wrote:

> When unloading the r8152 driver using the 'unbind' sysfs attribute
> in a system with KASAN enabled, the following error message is seen
> on a regular basis.

...

> The two-byte allocation in conjunction with code analysis suggests that
> the interrupt buffer has been overwritten. Added instrumentation in the
> driver shows that the interrupt handler is called after RTL8152_UNPLUG
> was set, and that this event is associated with the error message above.
> This suggests that there are situations where the interrupt buffer is used
> after it has been freed.
> 
> To avoid the problem, allocate the interrupt buffer as part of struct
> r8152.
> 
> Cc: Hayes Wang 
> Signed-off-by: Guenter Roeck 
> ---
> The problem is seen in chromeos-4.4, but there is not reason to believe
> that it does not occur with the upstream kernel. It is still seen in
> chromeos-4.4 after all patches from upstream and linux-next have been
> applied to the driver.
> 
> While relatively simple, I am not really convinced that this is the best
> (or even an acceptable) solution for this problem. I am open to suggestions
> for a better fix.

The proper approach is to keep the allocation as it is, but _before_
deallocating the buffer, make sure that the interrupt buffer won't be
accessed any more.  This may involve calling usb_kill_urb(), or
synchronize_irq(), or something similar.
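
Something along these lines, where the names are made up for
illustration (they are not the actual struct r8152 fields):

#include <linux/usb.h>
#include <linux/slab.h>

/* Hypothetical container -- not the real struct r8152 layout. */
struct intr_ctx {
	struct urb *intr_urb;
	void *intr_buff;
};

/*
 * Minimal sketch of the teardown ordering suggested above: guarantee
 * that the interrupt URB can no longer run (and so can no longer touch
 * the buffer) before the buffer is freed.
 */
static void intr_teardown(struct intr_ctx *ctx)
{
	usb_kill_urb(ctx->intr_urb);	/* blocks until the URB is inactive */
	usb_free_urb(ctx->intr_urb);
	kfree(ctx->intr_buff);		/* now no completion can touch it */
	ctx->intr_buff = NULL;
}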

Alan Stern



Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-10 Thread Alan Stern
On Thu, 10 Nov 2016, Kai-Heng Feng wrote:

> Hi,
> 
> On Wed, Nov 9, 2016 at 8:32 PM, Bjørn Mork  wrote:
> > Oliver Neukum  writes:
> >
> >> On Tue, 2016-11-08 at 13:44 -0500, Alan Stern wrote:
> >>
> >>> These problems could very well be caused by running at SuperSpeed
> >>> (USB-3) instead of high speed (USB-2).
> 
> Yes, it's running at SuperSpeed, on a Kabylake laptop.
> 
> It does not have this issue on a Broadwell laptop, also running at SuperSpeed.
> 
> >>>
> >>> Is there any way to test what happens when the device is attached to
> >>> the computer by a USB-2 cable?  That would prevent it from operating at
> >>> SuperSpeed.
> 
> I recall old Intel PCH can change the USB host from XHCI to EHCI,
> newer PCH does not have this option.
> 
> Is there a way to force XHCI run at HighSpeed?

Yes, like I said above: Use a USB-2 cable instead of a USB-3 cable.

Alan Stern



Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-08 Thread Alan Stern
On Tue, 8 Nov 2016, Bjørn Mork wrote:

> Alan Stern  writes:
> 
> > On Tue, 8 Nov 2016, Kai-Heng Feng wrote:
> >
> >> Hi,
> >> 
> >> On Mon, Nov 7, 2016 at 7:02 PM, Oliver Neukum  wrote:
> >> > On Fri, 2016-11-04 at 17:57 +0800, Kai-Heng Feng wrote:
> >> >> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
> >> >> [9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
> >> >>
> >> >> This can be solved by increase its pm usage counter.
> >> >>
> >> >> Signed-off-by: Kai-Heng Feng 
> >> >
> >> > For the record:
> >> >
> >> > NAK. This fixes a symptom. If this patch helps something is broken in
> >> > device core. We need to find that.
> >> >
> >> 
> >> Please check attached dmesg with usbcore.dyndbg="+p".
> >
> > The log shows that the device went into suspend _before_ the cdc_mbim 
> > driver was probed, not during the probe.  Then just before the probe 
> > was started, the USB core tried to resume the device and the resume 
> > failed.
> >
> > The log shows a bunch of other problems with this device:
> >
> > [3.862253] usb 2-2: config 1 has an invalid interface number: 12 but max is 1
> > [3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
> > [3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
> > [3.862255] usb 2-2: config 1 has no interface number 0
> > [3.862256] usb 2-2: config 1 has no interface number 1
> 
> These messages are completely harmless and normal for Sierra Wireless
> devices.  They use the interface number to identify the type of
> function, causing this mismatch between the number of interfaces and the
> inteface numbers. Boy, that looks weird in writing :)
> 
> Ref this discussion we had a few years ago:
> http://www.spinics.net/lists/linux-usb/msg77499.html
> 
> No, I didn't expect you to remember that :)

You're right; I didn't remember it.  But seeing those messages again in
the mailing list archives, they do look a little familiar.

> > [8.295180] usb 2-2: Disable of device-initiated U1 failed.
> > [8.295322] usb 2-2: Disable of device-initiated U2 failed.
> >
> > I get the impression that the device won't work properly with runtime 
> > PM at all.
> 
> I suspect the device is an EM7455?  If so, then it does work fine with
> runtime PM, as long as we're talking USB2.  Not sure about USB3 runtime
> PM though.  Cannot test it. The Lenovo laptop I got with one of these
> modems has disabled the USB3 link on the m.2 modem slot for some reason.

These problems could very well be caused by running at SuperSpeed
(USB-3) instead of high speed (USB-2).

Is there any way to test what happens when the device is attached to 
the computer by a USB-2 cable?  That would prevent it from operating at 
SuperSpeed.

The main point, however, is that the proposed patch doesn't seem to
address the true problem, which is that the device gets suspended
between probes.  The patch only tries to prevent it from being
suspended during a probe -- which is already prevented by the USB core.

Alan Stern



Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-08 Thread Alan Stern
On Tue, 8 Nov 2016, Kai-Heng Feng wrote:

> Hi,
> 
> On Mon, Nov 7, 2016 at 7:02 PM, Oliver Neukum  wrote:
> > On Fri, 2016-11-04 at 17:57 +0800, Kai-Heng Feng wrote:
> >> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
> >> [9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
> >>
> >> This can be solved by increasing its pm usage counter.
> >>
> >> Signed-off-by: Kai-Heng Feng 
> >
> > For the record:
> >
> > NAK. This fixes a symptom. If this patch helps something is broken in
> > device core. We need to find that.
> >
> 
> Please check attached dmesg with usbcore.dyndbg="+p".

The log shows that the device went into suspend _before_ the cdc_mbim 
driver was probed, not during the probe.  Then just before the probe 
was started, the USB core tried to resume the device and the resume 
failed.

The log shows a bunch of other problems with this device:

[3.862253] usb 2-2: config 1 has an invalid interface number: 12 but max is 1
[3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
[3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
[3.862255] usb 2-2: config 1 has no interface number 0
[3.862256] usb 2-2: config 1 has no interface number 1
...
[8.295180] usb 2-2: Disable of device-initiated U1 failed.
[8.295322] usb 2-2: Disable of device-initiated U2 failed.

I get the impression that the device won't work properly with runtime 
PM at all.

Alan Stern



Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-04 Thread Alan Stern
On Fri, 4 Nov 2016, Kai-Heng Feng wrote:

> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
> [9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
> 
> This can be solved by increasing its pm usage counter.

This should not be needed.  The USB core increments the PM usage 
counter of a device before probing its interfaces.

Alan Stern

> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/net/usb/usbnet.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index d5071e3..f77b4bf 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -1674,12 +1674,15 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
>   net->watchdog_timeo = TX_TIMEOUT_JIFFIES;
>   net->ethtool_ops = &usbnet_ethtool_ops;
>  
> + if (usb_autopm_get_interface(dev->intf) < 0)
> + goto out1;
> +
>   // allow device-specific bind/init procedures
>   // NOTE net->name still not usable ...
>   if (info->bind) {
>   status = info->bind (dev, udev);
>   if (status < 0)
> - goto out1;
> + goto out2;
>  
>   // heuristic:  "usb%d" for links we know are two-host,
>   // else "eth%d" when there's reasonable doubt.  userspace
> @@ -1772,6 +1775,8 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
>  out3:
>   if (info->unbind)
>   info->unbind (dev, udev);
> +out2:
> + usb_autopm_put_interface(dev->intf);
>  out1:
>   /* subdrivers must undo all they did in bind() if they
>* fail it, but we may fail later and a deferred kevent
> 



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-22 Thread Alan Stern
On Tue, 22 Mar 2016, Oliver Neukum wrote:

> On Tue, 2016-03-22 at 10:21 -0400, Alan Stern wrote:
> > I don't see any point in resuming the device just in order to collect 
> > operating statistics.  If it was already suspended then it wasn't 
> > operating, so there will be no statistics to collect.
> 
> Indeed. In that case the point is moot. But it is correct to ask
> the core whether the device is autosuspended at that point rather
> than keep a private flag if you can.

That's why we have pm_runtime_status_suspended().

> All that is relevant only if the upper layers ask for information
> that the driver cannot provide without resuming the device.
> Those are fundamentally different issues.

Of course.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-22 Thread Alan Stern
On Tue, 22 Mar 2016, Oliver Neukum wrote:

> On Mon, 2016-03-21 at 15:30 -0400, Alan Stern wrote:
> > On Mon, 21 Mar 2016, Oliver Neukum wrote:
> > 
> 
> > > We have an autosuspend timeout because we think that IO, if it will
> > > come at all, is likeliest to come soon. If, however, the IO is
> > > periodic that heuristics is false.
> > > To save most power the driver must either decide that the interval
> > > is too short or suspend immediately. So if we are lucky enough
> > > to have the frequency in the kernel, we should use that information.
> > 
> > The autosuspend timeout is set by userspace.  The kernel may assign a
> 
> Thus it should apply to all IO originating in user space.
> But only to that IO.

Fair enough.

> > default value, but the user can always override it.  Given this, I 
> > don't see how the kernel can use frequency information (and I'm not 
> > sure where that information would come from in the first place).
> 
> It can ignore internal IO for the purpose of the timeout.
> If such IO is performed while the device is active, don't
> alter the timer.

Come to think of it, we don't.  If pm_runtime_get_sync() and then
pm_runtime_put() are called while the device is already at full power, 
the PM core doesn't update the last_busy time.  So if the driver 
doesn't update it either, the statistics collection won't interfere 
with autosuspend (except when it races with the autosuspend timer 
expiration).
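
A minimal sketch of that query path (invented example_* names, not any real
driver's code; uses <linux/pm_runtime.h>):

static void example_query_counters(struct example_priv *priv,
                                   struct rtnl_link_stats64 *stats)
{
        struct device *dev = &priv->udev->dev;

        pm_runtime_get_sync(dev);       /* resumes only if the device was suspended */
        example_read_hw_counters(priv, stats);
        /*
         * Deliberately no pm_runtime_mark_last_busy() here: a query made
         * while the device was already awake must not push back the
         * autosuspend timer.
         */
        pm_runtime_put(dev);
}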

> Otherwise resume the device and look at
> the provided hint and suspend again immediately if the period is long
> enough.

I don't see any point in resuming the device just in order to collect 
operating statistics.  If it was already suspended then it wasn't 
operating, so there will be no statistics to collect.

> If IO is generated periodically in the kernel, the kernel must know that
> period.

Alan Stern



RE: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016 woojung@microchip.com wrote:

> > > > But this leaves open the issue that querying the device too often will
> > > > prevent it from going into autosuspend.  It seems to me that the best
> > > > way to deal with this is to make sure that the autosuspend timeout is
> > > > shorter than the interval between queries, not to make the querying
> > > > conditional on !runtime_auto.
> > >
> > > Short autosuspend timeout can affect performance. For instance our
> > > experiments showed that shorter than 10sec timeout made Ethernet
> > > performance degrade because of wakeup delays.
> > > So, just putting shorter timeout may have some side effects.
> > 
> > Sure.  This just means that you need a long statistics interval --
> > longer than the autosuspend timeout.  That's why I suggested making the
> > interval adjustable.
> 
> What do you mean statistics interval?
> Interval calling ndo_get_stats64 or another thread/timer or else getting 
> statistics?

The time between calls to ndo_get_stats64, since that's the routine 
which queries the device.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016, Oliver Neukum wrote:

> On Mon, 2016-03-21 at 14:24 -0400, Alan Stern wrote:
> > On Mon, 21 Mar 2016, Oliver Neukum wrote:
> > 
> > > On Mon, 2016-03-21 at 10:57 -0400, Alan Stern wrote:
> > > 
> > > > One possible solution is to export a sysfs parameter to prevent 
> > > > statistics collection (or more generally, to change the interval at 
> > > > which it occurs).
> > > 
> > > Surely, not performing a task can hardly be beaten in terms of power
> > > consumption. That is not meant to be flippant, but I think these
> > > issues are orthogonal. The question of how much to do doesn't
> > > solve the question of doing efficiently what we do.
> > 
> > In other words, what's the best way to collect the statistics without 
> > interfering with runtime PM, right?
> 
> Yes.
> 
> > If the device is suspended, presumably we know there's nothing to
> > collect -- especially if we already collected the statistics at the
> > time the device got suspended.  Hence my suggestion to avoid querying 
> > the device while it is suspended.
> 
> That is perfectly alright if we just collect statistics.
> As a generic mechanism it is bad. Think about the polling
> for media detection.

True.  Here I'm talking specifically about collecting statistics.  
Media detection has its own requirements.

> > But this leaves open the issue that querying the device too often will 
> > prevent it from going into autosuspend.  It seems to me that the best 
> > way to deal with this is to make sure that the autosuspend timeout is 
> > shorter than the interval between queries, not to make the querying 
> > conditional on !runtime_auto.
> [..]
> > > If we know when the next activity will come, why not pass this
> > > information down?
> 
> We have an autosuspend timeout because we think that IO, if it will
> come at all, is likeliest to come soon. If, however, the IO is
> periodic that heuristics is false.
> To save most power the driver must either decide that the interval
> is too short or suspend immediately. So if we are lucky enough
> to have the frequency in the kernel, we should use that information.

The autosuspend timeout is set by userspace.  The kernel may assign a 
default value, but the user can always override it.  Given this, I 
don't see how the kernel can use frequency information (and I'm not 
sure where that information would come from in the first place).

Alan Stern



RE: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016 woojung@microchip.com wrote:

> > But this leaves open the issue that querying the device too often will
> > prevent it from going into autosuspend.  It seems to me that the best
> > way to deal with this is to make sure that the autosuspend timeout is
> > shorter than the interval between queries, not to make the querying
> > conditional on !runtime_auto.
> 
> Short autosuspend timeout can affect performance. For instance our 
> experiments showed that
> shorter than 10sec timeout made Ethernet performance degrade because of 
> wakeup delays.
> So, just putting shorter timeout may have some side effects.

Sure.  This just means that you need a long statistics interval --
longer than the autosuspend timeout.  That's why I suggested making the
interval adjustable.

Alternatively, you could automatically set the statistics interval to
the autosuspend timeout (or 0 if autosuspend is disabled) plus some
fixed value.
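
As a rough sketch (names and the margin are invented; the delay is read
straight out of dev->power.autosuspend_delay, which is one possible way to
get at it, not necessarily the preferred one):

#define EXAMPLE_STATS_MARGIN_MS 5000

static unsigned long example_stats_interval(struct device *dev)
{
        int delay_ms = dev->power.autosuspend_delay;

        if (delay_ms < 0)       /* a negative delay means autosuspend is prevented */
                delay_ms = 0;
        return msecs_to_jiffies(delay_ms + EXAMPLE_STATS_MARGIN_MS);
}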

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016, Oliver Neukum wrote:

> On Mon, 2016-03-21 at 10:57 -0400, Alan Stern wrote:
> 
> > One possible solution is to export a sysfs parameter to prevent 
> > statistics collection (or more generally, to change the interval at 
> > which it occurs).
> 
> Surely, not performing a task can hardly be beaten in terms of power
> consumption. That is not meant to be flippant, but I think these
> issues are orthogonal. The question of how much to do doesn't
> solve the question of doing efficiently what we do.

In other words, what's the best way to collect the statistics without 
interfering with runtime PM, right?

If the device is suspended, presumably we know there's nothing to
collect -- especially if we already collected the statistics at the
time the device got suspended.  Hence my suggestion to avoid querying 
the device while it is suspended.

But this leaves open the issue that querying the device too often will 
prevent it from going into autosuspend.  It seems to me that the best 
way to deal with this is to make sure that the autosuspend timeout is 
shorter than the interval between queries, not to make the querying 
conditional on !runtime_auto.

> > But checking the runtime_auto flag is probably not a great idea.  Even
> > if it isn't set, collecting statistics is likely to wake up a device
> > that otherwise would have remained suspended.
> > 
> > Perhaps the best solution is to collect the statistics only when the 
> > device is not suspended or is about to suspend.
> 
> If we know when the next activity will come, why not pass this
> information down?

I don't follow.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016, Oliver Neukum wrote:

> On Sun, 2016-03-20 at 11:43 +0100, Geert Uytterhoeven wrote:
> > If CONFIG_PM=n:
> > 
> > drivers/net/usb/lan78xx.c: In function ‘lan78xx_get_stats64’:
> > drivers/net/usb/lan78xx.c:3274: error: ‘struct dev_pm_info’ has no
> > member named ‘runtime_auto’
> > 
> > If PM is disabled, the runtime_auto flag is not available, but auto
> > suspend is not enabled anyway.  Hence protect the check for
> > runtime_auto
> > by #ifdef CONFIG_PM to fix this.
> > 
> > Fixes: a59f8c5b048dc938 ("lan78xx: add ndo_get_stats64")
> > Reported-by: Guenter Roeck 
> > Signed-off-by: Geert Uytterhoeven 
> > ---
> > Alternatively, we can add a dev_pm_runtime_auto_is_enabled() wrapper
> > to
> > include/linux/pm.h, which always return false if CONFIG_PM is
> > disabled.
> 
> That is what we do for almost everything else in include/pm_runtime.h
> So it is the best solution to add the stub.

Guenter's question about whether drivers should try to access
runtime_auto in the first place was a good one.  A similar problem
arises in the block layer: When a block device has removable media, the
block layer probes at 1-second intervals looking for media changes.  
This defeats autosuspend in the same way as we're talking about here.

One possible solution is to export a sysfs parameter to prevent 
statistics collection (or more generally, to change the interval at 
which it occurs).

But checking the runtime_auto flag is probably not a great idea.  Even
if it isn't set, collecting statistics is likely to wake up a device
that otherwise would have remained suspended.

Perhaps the best solution is to collect the statistics only when the 
device is not suspended or is about to suspend.
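
Sketched with invented names (not actual lan78xx code), that could look
roughly like:

static void example_fill_stats(struct example_priv *priv,
                               struct rtnl_link_stats64 *stats)
{
        struct device *dev = &priv->udev->dev;

        if (pm_runtime_status_suspended(dev)) {
                /* Device is asleep: report the counters cached at suspend time. */
                *stats = priv->cached_stats;
                return;
        }
        example_read_hw_counters(priv, stats);
}

static int example_runtime_suspend(struct device *dev)
{
        struct example_priv *priv = dev_get_drvdata(dev);

        /* "About to suspend": take one last snapshot of the counters. */
        example_read_hw_counters(priv, &priv->cached_stats);
        example_hw_power_down(priv);
        return 0;
}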

Alan Stern



Re: [PATCH v2] r8152: fix lockup when runtime PM is enabled

2015-12-24 Thread Alan Stern
On Thu, 24 Dec 2015, Oliver Neukum wrote:

> On Thu, 2015-12-24 at 10:14 -0500, Alan Stern wrote:
> > On Thu, 24 Dec 2015, Oliver Neukum wrote:
> > 
> > > On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
> 
> > > But you cannot keep that setting if the system goes down
> > > or any broadcast packet would resume the whole system.
> > > Yet you cannot just disable remote wake up, as WoL packages
> > > still must trigger a remote wake up.
> > 
> > This means that sometimes you want to avoid losing packets and other 
> > times you do want to lose packets.  That is a policy decision, and 
> > therefore it should be made by the user, not the kernel.
> 
> Indeed it is and there is a tool for this with a defined
> interface called "ethtool"

No; ethtool affects the wakeup setting for system suspend, but not
for runtime suspend.  I was referring to something that would specify 
the setting for both cases.  But perhaps that doesn't make sense, 
because you never want to drop relevant packets during runtime suspend.  
If you did, you would run "ifconfig down" instead.

> > the PM core aren't adequate for what the driver needs.  The PM core 
> > distinguishes between wakeup enabled or disabled; it doesn't 
> > distinguish among different levels of wakeup.
> 
> True and sanely it cannot. We could only distinguish between drivers
> which need their devices to be resumed before the system suspends and
> the rest.
> Or we tell driver coders to use the notifier chains.

"Resume before system suspend" sounds like a reasonable thing to
implement, for devices that have multiple levels of wakeup settings.  
Would you like to post a proposal on linux-pm for this?

Alan Stern



Re: [PATCH v2] r8152: fix lockup when runtime PM is enabled

2015-12-24 Thread Alan Stern
On Thu, 24 Dec 2015, Oliver Neukum wrote:

> On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
> 
> > I don't understand why the wakeup conditions are different.  It seems
> > to me that the choice of which packets will generate a wakeup ought to
> > depend on the user's selection, not on the kind of suspend.  For
> > instance, if the user says that only a magic packet should cause a
> > wakeup then that should be true for both runtime suspend and system
> > suspend.
> > 
> > To put it another way, as far as the device is concerned a suspend is
> > just a suspend -- there's no difference between a runtime suspend and a
> > system suspend.
> 
> This literally true, but the host and the driver care.
> If we autosuspend a running network device, any packet
> (maybe filtered for MAC) should cause a remote wake up,
> else we'd lose packets.

That's also true during system suspend.

> But you cannot keep that setting if the system goes down
> or any broadcast packet would resume the whole system.
> Yet you cannot just disable remote wake up, as WoL packages
> still must trigger a remote wake up.

This means that sometimes you want to avoid losing packets and other 
times you do want to lose packets.  That is a policy decision, and 
therefore it should be made by the user, not the kernel.

> So there are drivers which must change settings on devices
> as the system goes to sleep, even if their devices have
> already been autosuspended. We could use the notifier chains
> for that. But can this solution be called elegant?

Instead of the driver trying to do this automatically, you could rely 
on userspace telling the driver which packets should cause a wakeup.  
The setting could be updated immediately before and after each system 
suspend.

I admit this is more awkward than having the driver make a choice based 
on the type of suspend.  This is a case where the resources provided by 
the PM core aren't adequate for what the driver needs.  The PM core 
distinguishes between wakeup enabled or disabled; it doesn't 
distinguish among different levels of wakeup.
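
For what it's worth, the notifier-chain route mentioned above could be
sketched like this (all names invented; the wakeup-filter helper and its
arguments are purely hypothetical):

static int example_pm_notifier(struct notifier_block *nb,
                               unsigned long event, void *unused)
{
        struct example_priv *priv = container_of(nb, struct example_priv, pm_nb);

        switch (event) {
        case PM_SUSPEND_PREPARE:
                /* System sleep: only the user-selected WoL events may wake us. */
                example_set_wakeup_filter(priv, priv->wol_filter);
                break;
        case PM_POST_SUSPEND:
                /* Back on runtime PM: any relevant packet should wake the link. */
                example_set_wakeup_filter(priv, EXAMPLE_WAKE_ANY_PACKET);
                break;
        }
        return NOTIFY_DONE;
}

/* registered once at probe time with register_pm_notifier(&priv->pm_nb) */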

Alan Stern



RE: [PATCH v2] r8152: fix lockup when runtime PM is enabled

2015-12-23 Thread Alan Stern
On Wed, 23 Dec 2015, Hayes Wang wrote:

> Oliver Neukum [mailto:oneu...@suse.de]
> > Sent: Wednesday, December 23, 2015 4:20 PM
> [...]
> > No, step (2) does not exist. Calls to suspend() and [reset_]resume()
> > always balance. Usually a driver shouldn't care about system suspend.
> > The way the driver is currently coded will also fail for Port-Power Off.
> 
> It is different with Windows. The Windows would resume the device before
> system suspend, if the system suspend follows the autosuspend.
> 
> Would this be a problem? After system suspend, the device may wake up
> the system when receiving any packet, not only magic packet. The wake
> events are different for system suspend and autosuspend. However, I
> couldn't change the wake event, because the autosuspend occurs first,
> and the suspend() is only called once.

I don't understand why the wakeup conditions are different.  It seems
to me that the choice of which packets will generate a wakeup ought to
depend on the user's selection, not on the kind of suspend.  For
instance, if the user says that only a magic packet should cause a
wakeup then that should be true for both runtime suspend and system
suspend.

To put it another way, as far as the device is concerned a suspend is
just a suspend -- there's no difference between a runtime suspend and a
system suspend.

Alan Stern



Re: [PATCH] Adding Reset resume support for CDC-EEM driver.

2015-12-21 Thread Alan Stern
On Mon, 21 Dec 2015, Vikas Bansal wrote:

> Pre-Condition
> At the time of reset resume of a USB device, both self and bus powered 
> devices might go to a low power state or power off state depending on the 
> acceptable suspend time power of the system.
> In case the device experiences a power glitch or completely powers off during 
> suspend-resume, the device will lose its internal state and hence it'll again 
> need a set interface from class driver on reset resume operation.
> 
> Issue 
> So far our experiments were based on a USB gadget working on cdc_eem 
> protocol. 
> We have seen that device is unable to continue the packet transfer on bulk 
> endpoints after the reset resume operation.
> 
> Solution
> We have added a .reset_resume function for cdc_eem protocol which sends a set 
> interface command on the Control endpoint. And calls the existing 
> usbnet_resume thereafter

You know, the USB core already issues a Set-Interface request on the
control endpoint whenever a reset-resume occurs (unless the interface
was using altsetting 0 beforehand).  Issuing another Set-Interface
shouldn't make any difference.

Alan Stern



Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern
On Mon, 24 Aug 2015, Alan Stern wrote:

> On Mon, 24 Aug 2015, David Miller wrote:
> 
> > From: Eugene Shatokhin 
> > Date: Wed, 19 Aug 2015 14:59:01 +0300
> > 
> > > So the following might be possible, although unlikely:
> > > 
> > > CPU0 CPU1
> > >  clear_bit: read dev->flags
> > >  clear_bit: clear EVENT_RX_KILL in the read value
> > > 
> > > dev->flags=0;
> > > 
> > >  clear_bit: write updated dev->flags
> > > 
> > > As a result, dev->flags may become non-zero again.
> > 
> > Is this really possible?
> > 
> > Stores really are "atomic" in the sense that they do their update
> > in one indivisible operation.
> 
> Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
> call it now.
> 
> > Atomic operations like clear_bit also will behave that way.
> 
> Are you certain about that?  I couldn't find any mention of it in
> Documentation/atomic_ops.txt.
> 
> In theory, an architecture could implement atomic bit operations using 
> a spinlock to insure atomicity.  I don't know if any architectures do 
> this, but if they do then the scenario above could arise.

Now that I see this in writing, I realize it's not possible after all.  
clear_bit() et al. will work with a single unsigned long, which doesn't
leave any place for spinlocks or other mechanisms.  I was thinking of 
atomic_t.

So never mind...

Alan Stern



Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern
On Mon, 24 Aug 2015, David Miller wrote:

> From: Eugene Shatokhin 
> Date: Wed, 19 Aug 2015 14:59:01 +0300
> 
> > So the following might be possible, although unlikely:
> > 
> > CPU0 CPU1
> >  clear_bit: read dev->flags
> >  clear_bit: clear EVENT_RX_KILL in the read value
> > 
> > dev->flags=0;
> > 
> >  clear_bit: write updated dev->flags
> > 
> > As a result, dev->flags may become non-zero again.
> 
> Is this really possible?
> 
> Stores really are "atomic" in the sense that they do their update
> in one indivisible operation.

Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
call it now.

> Atomic operations like clear_bit also will behave that way.

Are you certain about that?  I couldn't find any mention of it in
Documentation/atomic_ops.txt.

In theory, an architecture could implement atomic bit operations using 
a spinlock to insure atomicity.  I don't know if any architectures do 
this, but if they do then the scenario above could arise.

Alan Stern



Re: [PATCH] net: fec: fix initial runtime PM refcount

2015-08-04 Thread Alan Stern
On Tue, 4 Aug 2015, Uwe Kleine-König wrote:

> Hello,
> 
> On Tue, Aug 04, 2015 at 10:20:55AM -0400, Alan Stern wrote:
> > In that case, adding a call pm_runtime_get_noresume() is the right 
> > thing to do.
> Is this an ack for Lucas' patch?

Yes:

Acked-by: Alan Stern 

Alan Stern



Re: [PATCH] net: fec: fix initial runtime PM refcount

2015-08-04 Thread Alan Stern
On Tue, 4 Aug 2015, Lucas Stach wrote:

> On Mon, 3 Aug 2015 at 14:28 -0400, Alan Stern wrote:
> > On Mon, 3 Aug 2015, Uwe Kleine-König wrote:
> > 
> > > Hello,
> > > 
> > > I have no clue about runtime-pm, but I added a few people to Cc: who
> > > should know better ...
> > > 
> > > Best regards
> > > Uwe
> > > 
> > > On Mon, Aug 03, 2015 at 06:15:54PM +0200, Andrew Lunn wrote:
> > > > On Mon, Aug 03, 2015 at 05:50:11PM +0200, Lucas Stach wrote:
> > > > > The clocks are initially active and thus the device is marked active.
> > > > > This still keeps the PM refcount at 0, the 
> > > > > pm_runtime_put_autosuspend()
> > > > > call at the end of probe then leaves us with an invalid refcount of 
> > > > > -1,
> > > > > which in turn leads to the device staying in suspended state even 
> > > > > though
> > > > > netdev open had been called.
> > > > > 
> > > > > Fix this by initializing the refcount to be coherent with the initial
> > > > > device status.
> > > > > 
> > > > > Fixes:
> > > > > 8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
> > > > > 
> > > > > Signed-off-by: Lucas Stach 
> > > > > ---
> > > > > Please apply this as a fix for 4.2
> > > > > ---
> > > > >  drivers/net/ethernet/freescale/fec_main.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> > > > > index 32e3807c650e..271bb5862346 100644
> > > > > --- a/drivers/net/ethernet/freescale/fec_main.c
> > > > > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > > > > @@ -3433,6 +3433,7 @@ fec_probe(struct platform_device *pdev)
> > > > >  
> > > > >   pm_runtime_set_autosuspend_delay(&pdev->dev, FEC_MDIO_PM_TIMEOUT);
> > > > >   pm_runtime_use_autosuspend(&pdev->dev);
> > > > > + pm_runtime_get_noresume(&pdev->dev);
> > > > >   pm_runtime_set_active(&pdev->dev);
> > > > >   pm_runtime_enable(&pdev->dev);
> > > > 
> > > > This might work, but is it the correct fix?
> > 
> > It looks reasonable to me.  It might also make sense to move all of
> > that pm_runtime_* stuff to the end of the probe routine.  Notice that
> > they don't get undone if register_netdev() fails.
> > 
> Unfortunately we can not move RPM enabling to the end of probe, as the
> MDIO read/write functions that rely on RPM are already called while we
> are still in the middle of the probe function.

In that case, adding a call pm_runtime_get_noresume() is the right 
thing to do.

> I agree that we need better error handling here, but that comment
> applies to the whole FEC probe function. I think that this might be
> invasive enough to justify a delay to the next merge window, not really
> material for the late RCs.

That's entirely up to you, of course.

Alan Stern



Re: [PATCH] net: fec: fix initial runtime PM refcount

2015-08-03 Thread Alan Stern
On Mon, 3 Aug 2015, Uwe Kleine-König wrote:

> Hello,
> 
> I have no clue about runtime-pm, but I added a few people to Cc: who
> should know better ...
> 
> Best regards
> Uwe
> 
> On Mon, Aug 03, 2015 at 06:15:54PM +0200, Andrew Lunn wrote:
> > On Mon, Aug 03, 2015 at 05:50:11PM +0200, Lucas Stach wrote:
> > > The clocks are initially active and thus the device is marked active.
> > > This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
> > > call at the end of probe then leaves us with an invalid refcount of -1,
> > > which in turn leads to the device staying in suspended state even though
> > > netdev open had been called.
> > > 
> > > Fix this by initializing the refcount to be coherent with the initial
> > > device status.
> > > 
> > > Fixes:
> > > 8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
> > > 
> > > Signed-off-by: Lucas Stach 
> > > ---
> > > Please apply this as a fix for 4.2
> > > ---
> > >  drivers/net/ethernet/freescale/fec_main.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> > > index 32e3807c650e..271bb5862346 100644
> > > --- a/drivers/net/ethernet/freescale/fec_main.c
> > > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > > @@ -3433,6 +3433,7 @@ fec_probe(struct platform_device *pdev)
> > >  
> > >   pm_runtime_set_autosuspend_delay(&pdev->dev, FEC_MDIO_PM_TIMEOUT);
> > >   pm_runtime_use_autosuspend(&pdev->dev);
> > > + pm_runtime_get_noresume(&pdev->dev);
> > >   pm_runtime_set_active(&pdev->dev);
> > >   pm_runtime_enable(&pdev->dev);
> > 
> > This might work, but is it the correct fix?

It looks reasonable to me.  It might also make sense to move all of
that pm_runtime_* stuff to the end of the probe routine.  Notice that
they don't get undone if register_netdev() fails.

> > Documentation/power/runtime_pm.txt says:
> > 
> > 534 In addition to that, the initial runtime PM status of all devices is
> > 535 'suspended', but it need not reflect the actual physical state of the 
> > device.
> > 536 Thus, if the device is initially active (i.e. it is able to process 
> > I/O), its
> > 537 runtime PM status must be changed to 'active', with the help of
> > 538 pm_runtime_set_active(), before pm_runtime_enable() is called for the 
> > device.
> > 
> > At the point we call the pm_runtime_ functions above, all the clocks
> > are ticking. So according to the documentation pm_runtime_set_active()
> > is the right thing to do. But it makes no mention of have to call
> > pm_runtime_get_noresume(). I would of expected pm_runtime_set_active()
> > to set the count to the correct value.

pm_runtime_set_active() doesn't change the usage count.  All it does is 
set the runtime PM status to "active".

A call to pm_runtime_get_noresume() (or something similar) is necessary
to balance the call to pm_runtime_put_autosuspend() at the end of the
probe routine.  Both the _get_ and the _put_ should be present or
neither should be.

For instance, an alternative way to accomplish the same result is to
replace pm_runtime_put_autosuspend() with pm_runtime_autosuspend().  
The only difference is that the usage counter would not be elevated
during the register_netdev() call, so in theory the device could be
suspended while that routine is running.  But if all the pm_runtime_*
calls were moved to the end of the probe function, even that couldn't
happen.
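
A stripped-down sketch of the balanced sequence (invented names, not the
actual fec code; it assumes the clocks are already running when probe is
entered):

static int example_probe(struct platform_device *pdev)
{
        /* ... hardware is already powered and clocked at this point ... */

        pm_runtime_set_autosuspend_delay(&pdev->dev, 1000);
        pm_runtime_use_autosuspend(&pdev->dev);
        pm_runtime_get_noresume(&pdev->dev);    /* usage count 0 -> 1 */
        pm_runtime_set_active(&pdev->dev);      /* status only; count unchanged */
        pm_runtime_enable(&pdev->dev);

        /* ... register_netdev() and friends run with the count held ... */

        pm_runtime_mark_last_busy(&pdev->dev);
        pm_runtime_put_autosuspend(&pdev->dev); /* usage count 1 -> 0 */
        return 0;
}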

Alan Stern



Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

2015-05-21 Thread Alan Stern
On Thu, 21 May 2015, Takashi Iwai wrote:

> At Thu, 21 May 2015 11:26:17 -0400 (EDT),
> Alan Stern wrote:
> > 
> > On Thu, 21 May 2015, Takashi Iwai wrote:
> > 
> > > At Thu, 21 May 2015 10:18:08 -0400 (EDT),
> > > Alan Stern wrote:
> > > > 
> > > > On Thu, 21 May 2015, Takashi Iwai wrote:
> > > > 
> > > > > Then avoiding the failed firmware is no solution, indeed.
> > > > > If it's a new probe, it should be never executed during resume.
> > > > 
> > > > Can you expand this comment?  What's wrong with probing during resume?
> > > 
> > > Well, if the probe requires the access to a user-space file, it can't
> > > be done during resume.  That's the very problem we're seeing now.
> > > The firmware loader can't help much alone if it's a new device
> > > object.
> > 
> > But the same thing happens during early boot, if the driver is built 
> > into the kernel.  When the probe occurs, userspace isn't up and running 
> > yet, so the firmware loader can't do anything.
> > 
> > Why should probe during resume be any worse than probe during early 
> > boot?
> 
> The early boot has initrd, so the files can be there.  But the resume
> has no way to fetch the file except for cached data.

I suppose USB could delay re-probing until userspace is running again,
if we knew when that was.  But it would be awkward and prone to races.  
It also would leave a user-visible window of time during which the 
device does not exist, which we want to avoid.  (This may not matter 
for bluetooth, but it does matter for other kinds of devices.)

I would prefer to solve this problem in a different way, if possible.

Alan Stern



Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

2015-05-21 Thread Alan Stern
On Thu, 21 May 2015, Marcel Holtmann wrote:

> Hi Alan,
> 
> >> Then avoiding the failed firmware is no solution, indeed.
> >> If it's a new probe, it should be never executed during resume.
> > 
> > Can you expand this comment?  What's wrong with probing during resume?
> > 
> > The USB stack does carry out probes during resume under certain
> > circumstances.  A driver lacking a reset_resume callback is one of
> > those circumstances.
> 
> in case the platform kills the power to the USB lines, we can never
> do anything about this. I do not want to hack around this in the
> driver.
> 
> What are the cases where we should implement reset_resume and would
> it really help here. Since the btusb.ko driver implements
> suspend/resume support, would reset_resume ever be called?

One of those cases is exactly what you have been talking about: when
the platform kills power to the USB lines during suspend.  The driver's
reset_resume routine will be called during resume, as opposed to the
probe routine being called.  Therefore the driver will be able to tell
that this is not a new device instance.

The other cases are less likely to occur: a device is unable to resume 
normally and requires a reset before it will start working again, or 
something else goes wrong along those lines.
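
For reference, wiring up such a callback is one more member in the
usb_driver structure (the names below are illustrative, not btusb's
actual code):

static struct usb_driver example_driver = {
        .name           = "example",
        .probe          = example_probe,
        .disconnect     = example_disconnect,
        .suspend        = example_suspend,
        .resume         = example_resume,
        .reset_resume   = example_reset_resume, /* called instead of probe after a reset-resume */
        .id_table       = example_id_table,
        .supports_autosuspend = 1,
};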

> However I get the feeling someone needs to go back and see if the
> device is the same one and just gets probed again or if it is a new
> one from the USB host stack perspective.

That can be done easily enough by enabling usbcore debugging before 
carrying out the system suspend:

echo 'module usbcore =p' >/debug/dynamic_debug/control

The debugging information in the kernel log will tell just what 
happened.


On Thu, 21 May 2015, Takashi Iwai wrote:

> At Thu, 21 May 2015 10:18:08 -0400 (EDT),
> Alan Stern wrote:
> > 
> > On Thu, 21 May 2015, Takashi Iwai wrote:
> > 
> > > Then avoiding the failed firmware is no solution, indeed.
> > > If it's a new probe, it should be never executed during resume.
> > 
> > Can you expand this comment?  What's wrong with probing during resume?
> 
> Well, if the probe requires the access to a user-space file, it can't
> be done during resume.  That's the very problem we're seeing now.
> The firmware loader can't help much alone if it's a new device
> object.

But the same thing happens during early boot, if the driver is built 
into the kernel.  When the probe occurs, userspace isn't up and running 
yet, so the firmware loader can't do anything.

Why should probe during resume be any worse than probe during early 
boot?

> > The USB stack does carry out probes during resume under certain
> > circumstances.  A driver lacking a reset_resume callback is one of
> > those circumstances.
> 
> So, having a proper reset_resume in btusb would help in the end?

It might, depending on how the driver is written.  I don't know enough 
about the details of btusb to say.

Alan Stern



Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

2015-05-21 Thread Alan Stern
On Thu, 21 May 2015, Takashi Iwai wrote:

> Then avoiding the failed firmware is no solution, indeed.
> If it's a new probe, it should be never executed during resume.

Can you expand this comment?  What's wrong with probing during resume?

The USB stack does carry out probes during resume under certain
circumstances.  A driver lacking a reset_resume callback is one of
those circumstances.

Alan Stern



Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

2008-01-02 Thread Alan Stern
On Tue, 1 Jan 2008, Greg KH wrote:

> On Mon, Dec 31, 2007 at 11:26:43AM -0800, Greg KH wrote:
> > On Mon, Dec 31, 2007 at 12:49:52PM -0500, Alan Stern wrote:
> > > On Sun, 30 Dec 2007, Greg KH wrote:
> > > 
> > > > > It looks like Greg misused the debugfs API -- which is ironic, because
> > > > > he wrote debugfs in the first place!  :-)

> Ok, no, I didn't write that patch, I'm getting very confused here.
> 
> In 2.6.24-rc6 there is no usage of debugfs in the ohci driver.
> 
> In the -mm tree there is a patch, from Tony Jones, that moves some debug
> code out of sysfs and into debugfs where it belongs.  It does it for
> both the ehci and ohci USB host controller drivers, and this is the code
> that is incorrect if CONFIG_DEBUGFS is not enabled.

My mistake; I got the impression you had written that new code rather
than Tony.  BTW, I don't recall ever seeing Tony's patch announced on
linux-usb or linux-usb-devel.  Did I simply miss it?

> So, for the 2.6.24 release, nothing needs to be changed, all is good,
> and there is no regression.
> 
> Right?  Or am I still confused about this whole thing?

Correct.  The problem exists only in -mm and your development tree.

> I will go fix up Tony's patches in the -mm tree that do not handle the
> error return values from debugfs properly, but that is not such a rush,
> as Tony is on vacation for a few weeks, and those patches are going to
> be going in only after 2.6.24 is out.

The fix I posted earlier in this thread should simply be merged into 
Tony's patches.

Alan Stern



Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

2007-12-31 Thread Alan Stern
On Sun, 30 Dec 2007, Greg KH wrote:

> > It looks like Greg misused the debugfs API -- which is ironic, because
> > he wrote debugfs in the first place!  :-)
> 
> Oh crap, sorry, I did mess that up :(
> 
> > Let me know if this patch fixes the problem.  If it does, I'll submit 
> > it to Greg with all the proper accoutrements.
> 
> This isn't going to work if CONFIG_DEBUGFS is not enabled either :(

Why not?  The debugging files won't be created, true, but the driver
should work just fine regardless.  This is exactly the way uhci-hcd
behaves, and it hasn't caused any problems.

> > Index: 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
> > ===
> > --- 2.6.24-rc6-mm1.orig/drivers/usb/host/ohci-hcd.c
> > +++ 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
> > @@ -1067,14 +1067,8 @@ static int __init ohci_hcd_mod_init(void
> >  
> >  #ifdef DEBUG
> > ohci_debug_root = debugfs_create_dir("ohci", NULL);
> > -   if (!ohci_debug_root || IS_ERR(ohci_debug_root)) {
> > -   if (!ohci_debug_root)
> > -   retval = -ENOENT;
> > -   else
> > -   retval = PTR_ERR(ohci_debug_root);
> > -
> > -   goto error_debug;
> > -   }
> > +   if (!ohci_debug_root)
> > +   return -ENOENT;
> 
> It needs to check for ERR_PTR(-ENODEV) which is the return value if
> debugfs is not enabled, and if so, just ignore things.

No.  That is the mistake you made before.  If debugfs isn't enabled 
then the driver should just ignore things, yes -- and in particular it 
should ignore the fact that the return code is ERR_PTR(-ENODEV).  No 
extra checking is required.

> If NULL is returned, or anything else, it's a real error.

If NULL is returned, it's a real error, agreed.  But if anything else
is returned then the call was successful as far as the driver is 
concerned.

> So, try something like:
>   if (!ohci_debug_root) {
>   retval = -ENOENT;
>   goto error_debug;
>   }
>   if (IS_ERR(ohci_debug_root) && PTR_ERR(ohci_debug_root) != -ENODEV) {
>   retval = PTR_ERR(ohci_debug_root);
>   goto error_debug;
>   }
> 
> and let me know of that works for you.

Greg, it sounds like you have forgotten the rationale you originally 
used for specifying the return codes in debugfs!  The idea was to 
return non-NULL if CONFIG_DEBUGFS was disabled, so that drivers could 
pretend the calls had succeeded and not need to worry about matters 
beyond their control.

Only a NULL return indicates a genuine error.  The kerneldoc for 
debugfs_create_dir says this very plainly.

> Although the combination of CONFIG_USB_DEBUG and not CONFIG_DEBUGFS is
> strange, we could just enable CONFIG_DEBUGFS is USB_DEBUG is enabled and
> then simplify this logic a bunch at the same time.

Most distributions enable CONFIG_DEBUGFS by default, don't they?  So 
this problem only comes up when people make up their own configs.  
Having USB_DEBUG enabled and DEBUGFS disabled seems like a perfectly 
reasonable combination to me.  I wouldn't change any aspect of the 
configuration logic.

Alan Stern



Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

2007-12-30 Thread Alan Stern
On Sun, 30 Dec 2007, Ingo Molnar wrote:

> * Andreas Mohr <[EMAIL PROTECTED]> wrote:
> 
> > (yes, that's all there is, despite CONFIG_USB_DEBUG being set)
> > 
> > The LED of a usb stick isn't active either, for obvious reasons.
> > 
> > And keep in mind that this is a (relatively old) OHCI-only machine... 
> > (which had the 2.6.19 lsmod showing ohci-hcd just fine and working 
> > fine with WLAN USB)
> > 
> > Now pondering whether to try -rc6 proper or whether to revert specific 
> > guilty-looking USB changes... And wondering how to properly elevate 
> > this issue (prompt Greg about it, new thread, bug #, ...?)

It looks like Greg misused the debugfs API -- which is ironic, because
he wrote debugfs in the first place!  :-)

Let me know if this patch fixes the problem.  If it does, I'll submit 
it to Greg with all the proper accoutrements.

Alan Stern


Index: 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
===
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ohci-hcd.c
+++ 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
@@ -1067,14 +1067,8 @@ static int __init ohci_hcd_mod_init(void
 
 #ifdef DEBUG
ohci_debug_root = debugfs_create_dir("ohci", NULL);
-   if (!ohci_debug_root || IS_ERR(ohci_debug_root)) {
-   if (!ohci_debug_root)
-   retval = -ENOENT;
-   else
-   retval = PTR_ERR(ohci_debug_root);
-
-   goto error_debug;
-   }
+   if (!ohci_debug_root)
+   return -ENOENT;
 #endif
 
 #ifdef PS3_SYSTEM_BUS_DRIVER
@@ -1142,7 +1136,6 @@ static int __init ohci_hcd_mod_init(void
 #ifdef DEBUG
debugfs_remove(ohci_debug_root);
ohci_debug_root = NULL;
- error_debug:
 #endif
 
return retval;
Index: 2.6.24-rc6-mm1/drivers/usb/host/ohci-dbg.c
===
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ohci-dbg.c
+++ 2.6.24-rc6-mm1/drivers/usb/host/ohci-dbg.c
@@ -813,30 +813,29 @@ static inline void create_debug_files (s
struct device *dev = bus->dev;
 
ohci->debug_dir = debugfs_create_dir(bus->bus_name, ohci_debug_root);
-   if (!ohci->debug_dir || IS_ERR(ohci->debug_dir)) {
-   ohci->debug_dir = NULL;
-   goto done;
-   }
+   if (!ohci->debug_dir)
+   return;
 
ohci->debug_async = debugfs_create_file("async", S_IRUGO,
ohci->debug_dir, dev,
&debug_async_fops);
-   if (!ohci->debug_async || IS_ERR(ohci->debug_async))
+   if (!ohci->debug_async)
goto async_error;
 
ohci->debug_periodic = debugfs_create_file("periodic", S_IRUGO,
   ohci->debug_dir, dev,
   &debug_periodic_fops);
-   if (!ohci->debug_periodic || IS_ERR(ohci->debug_periodic))
+   if (!ohci->debug_periodic)
goto periodic_error;
 
ohci->debug_registers = debugfs_create_file("registers", S_IRUGO,
ohci->debug_dir, dev,
&debug_registers_fops);
-   if (!ohci->debug_registers || IS_ERR(ohci->debug_registers))
+   if (!ohci->debug_registers)
goto registers_error;
 
-   goto done;
+   ohci_dbg(ohci, "created debug files\n");
+   return;
 
 registers_error:
debugfs_remove(ohci->debug_periodic);
@@ -847,10 +846,6 @@ periodic_error:
 async_error:
debugfs_remove(ohci->debug_dir);
ohci->debug_dir = NULL;
-done:
-   return;
-
-   ohci_dbg (ohci, "created debug files\n");
 }
 
 static inline void remove_debug_files (struct ohci_hcd *ohci)
Index: 2.6.24-rc6-mm1/drivers/usb/host/ehci-hcd.c
===
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ehci-hcd.c
+++ 2.6.24-rc6-mm1/drivers/usb/host/ehci-hcd.c
@@ -1019,14 +1019,8 @@ static int __init ehci_hcd_init(void)
 
 #ifdef DEBUG
ehci_debug_root = debugfs_create_dir("ehci", NULL);
-   if (!ehci_debug_root || IS_ERR(ehci_debug_root)) {
-   if (!ehci_debug_root)
-   retval = -ENOENT;
-   else
-   retval = PTR_ERR(ehci_debug_root);
-
-   return retval;
-   }
+   if (!ehci_debug_root)
+   return -ENOENT;
 #endif
 
 #ifdef PLATFORM_DRIVER
Index: 2.6.24-rc6-mm1/drivers/usb/host/ehci-dbg.c
===
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ehci-dbg.c
+++ 2.

Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-27 Thread Alan Stern
On Sat, 27 Oct 2007, Maxim Levitsky wrote:

> > Use del_timer_sync().  It guarantees that when it returns, the timer 
> > will be stopped and the timer routine will no longer be running on any 
> > CPU.
> > 
> Even if the timer re-enables itself, are you sure?

Last time I looked at the source code, that's what it did.  I'll look
again...  Yep, it still does.  It checks to see if the timer routine is
currently running; if so then it waits a little while and tries again.

Alan Stern



Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-27 Thread Alan Stern
On Fri, 26 Oct 2007, Maxim Levitsky wrote:

> > > Looking through the dmfe code, I noticed yet another possible race.
> > > A race between the .suspend, and a timer that serves both as a watchdog, 
> > > and link state detector.
> > > Again I need to prevent it from running during the suspend/resume, but 
> > > how?
> > > 
> > > I can use del_timer in .suspend, and mod_timer in .resume, but that 
> > > doesn't protect against
> > > race with already running timer.
> > > I can use del_timer_sync, but it states that it is useless if timer 
> > > re-enables itself, and I agree with that.
> > > In dmfe case the timer does re-enable itself.
> > 
> > That comment isn't right.  del_timer_sync works perfectly well even if
> > the timer routine re-enables itself, provided it stops doing so after a
> > small number of iterations.
> Thanks for the info.  But
> due to the "don't access the hardware while powered-off" rule, I must know
> that the timer isn't running, and won't be.
> So what function to use (if possible) to be sure that the timer won't run 
> anymore?
> (Taking in the account the fact that it re-enables itself)

Use del_timer_sync().  It guarantees that when it returns, the timer 
will be stopped and the timer routine will no longer be running on any 
CPU.

Alan Stern



Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-25 Thread Alan Stern
On Thu, 25 Oct 2007, Maxim Levitsky wrote:

> Hi,
> 
> Recently, trying to fix saa7134 suspend/resume problems I found that there 
> is a race between IRQ handler and .suspend , and that I cant let driver 
> access the device
> while its in D3 since it can lock up some systems.
> 
> Now I am looking to fix those issues in two drivers that have my 
> .suspend/.resume routines.
> the saa7134 capture chip and dmfe, the davicom network driver.
> 
> Looking through the dmfe code, I noticed yet another possible race.
> A race between the .suspend, and a timer that serves both as a watchdog, and 
> link state detector.
> Again I need to prevent it from running during the suspend/resume, but how?
> 
> I can use del_timer in .suspend, and mod_timer in .resume, but that doesn't 
> protect against
> race with already running timer.
> I can use del_timer_sync, but it states that it is useless if timer 
> re-enables itself, and I agree with that.
> In dmfe case the timer does re-enable itself.

That comment isn't right.  del_timer_sync works perfectly well even if
the timer routine re-enables itself, provided it stops doing so after a
small number of iterations.

> I can put checks in the timer for ->insuspend, and don't re enable it if set,
> but that opens a new can of worms with memory barriers, etc...

You don't have to worry about any of that stuff.  Just check the 
insuspend flag and don't re-enable the timer if it is set.  Even 
without any memory barriers, the timer routine won't iterate more than 
once or twice.
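
A rough sketch of that pattern (invented names, and written against the
current timer API for brevity rather than the one dmfe actually uses):

static void example_watchdog_fn(struct timer_list *t)
{
        struct example_priv *priv = from_timer(priv, t, watchdog);

        if (priv->insuspend)
                return;                 /* stop re-arming once suspend has begun */

        example_check_link(priv);
        mod_timer(&priv->watchdog, jiffies + HZ);
}

static int example_suspend(struct device *dev)
{
        struct example_priv *priv = dev_get_drvdata(dev);

        priv->insuspend = 1;
        del_timer_sync(&priv->watchdog);        /* stopped and no longer running anywhere */

        /* safe to power the hardware down from here on */
        return 0;
}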

Alan Stern



Questions about IPsec and Netfilter

2007-05-10 Thread Alan Stern
I've got a few questions about the relationship between the IPsec 
implementation and Netfilter.

Q1: At what points during packet processing do the IPsec transformations 
occur?  In particular, which netfilter hooks do they come before and 
after?  And likewise, which routing operations do they come before and 
after?

Q2: When a packet using IPsec tunnel mode is encapsulated or 
de-encapsulated, does the newly-formed packet return to some earlier point 
in the stack for further netfilter processing or routing?  What about 
transport mode?

Q3: How can iptables rules determine whether they are dealing with a 
packet which has been de-encapsulated from (or encapsulated within) an 
IPsec wrapper?

Q4: Is it true that NAT-Traversal isn't implemented for transport mode?

In RFC 2401 (Security Architecture for the Internet Protocol), section 5
includes this text:

   As mentioned in Section 4.4.1 "The Security Policy Database (SPD)",
   the SPD must be consulted during the processing of all traffic
   (INBOUND and OUTBOUND), including non-IPsec traffic.  If no policy is
   found in the SPD that matches the packet (for either inbound or
   outbound traffic), the packet MUST be discarded.

But on Linux systems, by default the SPD is normally empty (as shown by
"setkey -DP") and all packets are allowed to pass unhindered.

Q5: Isn't this a violation of the RFC?  Or is there some implicit policy 
entry which accepts all packets without applying any security association?


Thanks for any answers.  I may think up more questions later...

Alan Stern



Runtime power management for network interfaces

2006-07-25 Thread Alan Stern
During a Power Management session at the Ottawa Linux Symposium, it was
generally agreed that network interface drivers ought to automatically
suspend their devices (if possible) whenever:

(1) The interface is ifconfig'ed down, or

(2) No link is available.

Presumably (1) should be easy enough to implement.  (2) might or might not
be feasible, depending on how much WOL support is available.  (It might
not be feasible at all for wireless networking.)  Still, there can be no
question that it would be a Good Thing for laptops to power-down their
ethernet controllers when the network cable is unplugged.

Has any progress been made in this direction?  If not, a natural approach 
would be to start with a reference implementation in one driver which 
could then be copied to other drivers.

Any suggestions?

Alan Stern
