from:"Tejun Heo"

Re: [PATCH] net: pegasus: Remove deprecated create_singlethread_workqueue

2016-08-31 Thread Tejun Heo

On Tue, Aug 30, 2016 at 10:02:47PM +0530, Bhaktipriya Shridhar wrote:
> The workqueue "pegasus_workqueue" queues a single work item per pegasus
> instance and hence it doesn't require execution ordering. Hence,
> alloc_workqueue has been used to replace the deprecated
> create_singlethread_workqueue instance.
> 
> The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
> memory pressure since it's a network driver.
> 
> Since there are fixed number of work items, explicit concurrency
> limit is unnecessary here.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] whci: Remove deprecated create_singlethread_workqueue

2016-08-15 Thread Tejun Heo

On Sat, Aug 13, 2016 at 09:20:40PM +0530, Bhaktipriya Shridhar wrote:
> alloc_ordered_workqueue replaces the deprecated
> create_singlethread_workqueue.
> 
> The workqueue "workqueue" has multiple workitems which may require
> ordering. Hence, a dedicated ordered workqueue has been used.
> Since the workqueue is not being used on a memory reclaim path,
> WQ_MEM_RECLAIM has not been set.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] usb: host: u132-hcd: Remove deprecated create_singlethread_workqueue

2016-08-02 Thread Tejun Heo

Hello,

On Tue, Aug 02, 2016 at 03:29:48PM +0200, Oliver Neukum wrote:
> Apparently if that is the requirement USB will have to define its own
> set of flags to use in such contexts. But still the calls as currently
> executed work. Taking away WQ_MEM_RECLAIM would create danger of
> introducing a regression. The issue with __GFP_DIRECT_RECLAIM already
> exists and can be fixed.

Alright, let's go with WQ_MEM_RECLAIM then.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] usb: host: u132-hcd: Remove deprecated create_singlethread_workqueue

2016-08-01 Thread Tejun Heo

Hello,

On Mon, Aug 01, 2016 at 03:50:36PM +0200, Michal Hocko wrote:
> > All that would do is deferring the deadlock, right?  I'm not sure it
> > makes a lot of sense to protect an IO path against memory pressure
> > half-way.  It either can be depended during memory reclaim or it
> > can't. 
> 
> Completely agreed! If the rescuer thread can block on a memory
> allocation be it GFP_NOIO or others it is basically useless.
...
> > Can MM people please chime in?  The question is about USB stoage
> > devices and memory reclaim.  USB doesn't guarantee forward progress
> > under memory pressure but tries a best-effort attempt with GFP_NOIO
> > and ATOMIC.  Is this the right thing to do?
> 
> If any real IO depends on those devices then this is not sufficient and
> they need some form of guarantee for progress (aka mempool).

Oliver, Alan, what do you think?  If USB itself can't operate without
allocating memory during transactions, whatever USB storage drivers
are doing isn't all that meaningful.  Can we proceed with the
workqueue patches?  Also, it could be that the only thing GFP_NOIO and
GFP_ATOMIC are doing is increasing the chance of IO failures under
memory pressure.  Maybe it'd be a good idea to reconsider the
approach?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] usb: lvstest: Remove deprecated create_singlethread_workqueue

2016-07-29 Thread Tejun Heo

On Thu, Jul 28, 2016 at 02:15:11PM +0530, Bhaktipriya Shridhar wrote:
> The workqueue has a single work item(>rh_work) and hence
> doesn't require ordering. Also, it is not being used on a memory
> reclaim path. Hence, the singlethreaded workqueue has been replaced
> with the use of system_wq.
> 
> System workqueues have been able to handle high level of concurrency
> for a long time now and hence it's not required to have a singlethreaded
> workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
> created with create_singlethread_workqueue(), system_wq allows multiple
> work items to overlap executions even on the same CPU; however, a
> per-cpu workqueue doesn't have any CPU locality or global ordering
> guarantee unless the target CPU is explicitly specified and thus the
> increase of local concurrency shouldn't make any difference.
> 
> The work item has been flushed in lvs_rh_disconnect() to ensure that
> there are no pending tasks while disconnecting the driver.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] usb: dwc2: Remove deprecated create_singlethread_workqueue

2016-07-29 Thread Tejun Heo

On Thu, Jul 28, 2016 at 01:57:29PM +0530, Bhaktipriya Shridhar wrote:
> alloc_ordered_workqueue replaces the deprecated
> create_singlethread_workqueue.
> 
> There are multiple work items on the work queue, which require
> ordering. Hence, an ordered workqueue has been used.
> 
> The workqueue "wq_otg" is not being used on a memory reclaim path.
> Hence, WQ_MEM_RECLAIM has not been set.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] usb: host: u132-hcd: Remove deprecated create_singlethread_workqueue

2016-07-29 Thread Tejun Heo

Hello, Alan.

On Wed, Jul 27, 2016 at 04:45:19PM -0400, Alan Stern wrote:
> > Hmm... That doesn't really make them dependable during memory reclaim.
> 
> True.  But it does mean that they can't cause a deadlock by waiting
> indefinitely for some other memory to be paged out to the very device
> they are on the access pathway for.
> 
> > What happens when those allocations fail?
> 
> The same thing that happens when any allocation fails -- the original
> I/O request fails with -ENOMEM or the equivalent.  In the case of 
> usb-storage, this is likely to trigger error recovery, which will need 
> to allocate memory of its own...  A bad situation to get into.

All that would do is deferring the deadlock, right?  I'm not sure it
makes a lot of sense to protect an IO path against memory pressure
half-way.  It either can be depended during memory reclaim or it
can't.  The use of GFP_NOIO / ATOMIC is probably increases the chance
of IO errors under moderate memory pressure too when there are
dependable memory backing devices (and there better be) which can push
things forward if called upon.

Can MM people please chime in?  The question is about USB stoage
devices and memory reclaim.  USB doesn't guarantee forward progress
under memory pressure but tries a best-effort attempt with GFP_NOIO
and ATOMIC.  Is this the right thing to do?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] usb: host: u132-hcd: Remove deprecated create_singlethread_workqueue

2016-07-27 Thread Tejun Heo

Hello, Alan.

On Wed, Jul 27, 2016 at 02:54:51PM -0400, Alan Stern wrote:
> > Hmm... I didn't know the whole USB stack could operate without
> > allocating memory.  Does usb stack have mempools and stuff all the way
> > through?
> 
> No -- the USB stack does need to allocate memory in order to operate.  
> But it is careful to use GFP_NOIO or GFP_ATOMIC for allocations that
> might be on the block-device path.

Hmm... That doesn't really make them dependable during memory reclaim.
What happens when those allocations fail?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] usb: host: u132-hcd: Remove deprecated create_singlethread_workqueue

2016-07-27 Thread Tejun Heo

Hello, Oliver.

On Wed, Jul 27, 2016 at 11:29:56AM +0200, Oliver Neukum wrote:
> On Wed, 2016-07-27 at 14:50 +0530, Bhaktipriya Shridhar wrote:
> > The workqueue "workqueue" has multiple workitems which may require
> > ordering. Hence, a dedicated ordered workqueue has been used.
> > Since the workqueue is not being used on a memory reclaim path,
> > WQ_MEM_RECLAIM has not been set.
> 
> That is incorrect. The work queue is used by the HCD to handle
> TDs, which are parts of basic IO. The HCD in turn is used by
> usb-storage and uas, which are block drivers and those are obviously
> used on the memory reclaim path.

Hmm... I didn't know the whole USB stack could operate without
allocating memory.  Does usb stack have mempools and stuff all the way
through?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] USB: appledisplay: Remove deprecated create_singlethread_workqueue

2016-07-26 Thread Tejun Heo

On Tue, Jul 26, 2016 at 10:19:18PM +0530, Bhaktipriya Shridhar wrote:
> The workqueue "wq" is involved in controlling the brightness of an
> Apple Cinema Display over USB.
> 
> It has a single work item(>work) per appledisplay and hence
> doesn't require ordering. Also, it is not being used on a memory
> reclaim path.
> 
> Hence, the singlethreaded workqueue has been replaced with the use of
> system_wq.
> 
> System workqueues have been able to handle high level of concurrency
> for a long time now and hence it's not required to have a singlethreaded
> workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
> created with create_singlethread_workqueue(), system_wq allows multiple
> work items to overlap executions even on the same CPU; however, a
> per-cpu workqueue doesn't have any CPU locality or global ordering
> guarantee unless the target CPU is explicitly specified and thus the
> increase of local concurrency shouldn't make any difference.
> 
> The work item is self-requeueing and needs to wait for the in-flight
> work item to finish before proceeding with destruction.
> Hence, it has been sync cancelled in appledisplay_disconnect().
> This also ensures that there are no pending tasks while disconnecting
> the driver.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] USB: appledisplay: Remove deprecated create_singlethread_workqueue

2016-07-26 Thread Tejun Heo

On Tue, Jul 26, 2016 at 11:11:18AM +0530, Bhaktipriya Shridhar wrote:
> The workqueue "wq" is involved in controlling the brightness of an
> Apple Cinema Display over USB.
> 
> It has a single work item(>work) per appledisplay and hence
> doesn't require ordering. Also, it is not being used on a memory
> reclaim path.
> 
> Hence, the singlethreaded workqueue has been replaced with the use of
> system_wq.
> 
> System workqueues have been able to handle high level of concurrency
> for a long time now and hence it's not required to have a singlethreaded
> workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
> created with create_singlethread_workqueue(), system_wq allows multiple
> work items to overlap executions even on the same CPU; however, a
> per-cpu workqueue doesn't have any CPU locality or global ordering
> guarantee unless the target CPU is explicitly specified and thus the
> increase of local concurrency shouldn't make any difference.
> 
> The work item is self-requeueing and needs to wait for the in-flight
> work item to finish before proceeding with destruction.
> Hence, it has been sync cancelled in appledisplay_disconnect().
> This also ensures that there are no pending tasks while disconnecting the
> driver.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>
...
> @@ -159,7 +158,7 @@ static int appledisplay_bl_update_status(struct 
> backlight_device *bd)
>   pdata->msgdata, 2,
>       ACD_USB_TIMEOUT);
>   mutex_unlock(>sysfslock);
> -
> +

???

Other than that,

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] usb: ftdi-elan: Remove deprecated create_singlethread_workqueue

2016-07-26 Thread Tejun Heo

On Tue, Jul 26, 2016 at 10:47:20AM +0530, Bhaktipriya Shridhar wrote:
> The status workqueue is involved in initializing the Uxxx and polling
> the Uxxx until a supported PCMCIA CardBus device is detected.
> It then starts the command and respond workqueues and then loads the
> module that handles the device, after which it just polls the Uxxx
> looking for card ejects.
> 
> The command and respond workqueues are involved in implementing a command
> sequencer for communicating with the firmware on the other side of
> the FTDI chip in the Uxxx.
> 
> These workqueues have only a single work item each and hence they do not
> require ordering. Also, none of the above workqueues are being used on a
> memory recliam path. Hence, the singlethreaded workqueues have been
> replaced with the use of system_wq.
> 
> System workqueues have been able to handle high level of concurrency
> for a long time now and hence it's not required to have a singlethreaded
> workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
> created with create_singlethread_workqueue(), system_wq allows multiple
> work items to overlap executions even on the same CPU; however, a
> per-cpu workqueue doesn't have any CPU locality or global ordering
> guarantee unless the target CPU is explicitly specified and thus the
> increase of local concurrency shouldn't make any difference.
> 
> The work items have been sync cancelled because they are self-requeueing
> and need to wait for the in-flight work item to finish before proceeding
> with destruction. Hence, they have been sync cancelled in
> ftdi_status_cancel_work(), ftdi_command_cancel_work() and
> ftdi_response_cancel_work(). These functions are called in
> ftdi_elan_exit() to ensure that there are no pending work items while
> disconnecting the driver.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-19 Thread Tejun Heo

Hello, Jan, Alan.

On Tue, Mar 15, 2016 at 10:25:43AM +0100, Jan Kara wrote:
> > The kernel does suspend device drivers; that is, it invokes their
> > suspend callbacks.  But it doesn't "freeze" them in any sense.  Once a 
> > driver has been suspended, it assumes it won't receive any I/O requests 
> > until it has been resumed.  Therefore the kernel first has to prevent 
> > all the upper layers from generating such requests and/or sending them 
> > to the low-level drivers.
> 
> OK, so Tejun and you should talk together because you both seem to want
> something else... If I understand it right, Tejun wants suspended devices
> to just queue requests that have been submitted after these devices were
> suspended and complete them once they are resumed...

Yeah, I suppose that's why we have the code base we do now.  I don't
think freezing kernel threads is the right mechanism to plug IO
devices during suspend.  It's way too error-prone and causes a
dependency nightmare as it acts essentially as a system-wide lock.

More complex drivers already plug themselves which are necessary no
matter what as upper layers or some kthreads aren't the only sources
of commands to devices.  We can plug at block layer for IOs coming
down from higher layers.  We can even provide a mechanism to plug
certain kthreads if necessary but they should be contained in the
driver - e.g. the suspend callback specifically blocking certain
specific kthreads - instead of the vague "the system is generally
stopped now and it seems to work most of the time" that we're doing
now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-11 Thread Tejun Heo

Hello, Jan.

On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > Ugh... that's nasty.  I wonder whether the right thing to do is making
> > writeback workers non-freezable.  IOs are supposed to be blocked from
> > lower layer anyway.  Jan, what do you think?
> 
> Well no, at least currently IO is not blocked in lower layers AFAIK - for
> that you'd need to freeze block devices & filesystems and there are issues

At least libata does and I think SCSI does too, but yeah, there
probably are drivers which depend on block layer blocking IOs, which
btw is a pretty fragile way to go about as upper layers might not be
the only source of activities.

> with that (Jiri Kosina was the last one which was trying to make this work
> AFAIR). And I think you need to stop writeback (and generally any IO) to be
> generated so that it doesn't interact in a strange way with device drivers
> being frozen. So IMO until suspend freezes filesystems & devices properly
> you have to freeze writeback workqueue.

I still think the right thing to do is plugging that block layer or
low level drivers.  It's like we're trying to plug multiple sources
when we can plug the point where they come together anyway.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-02 Thread Tejun Heo

Hello,

(cc'ing Jan)

On Fri, Feb 26, 2016 at 02:19:20PM +0800, Peter Chen wrote:
> On Thu, Feb 25, 2016 at 05:01:12PM -0500, Tejun Heo wrote:
> > Hello, Peter.
> > 
> > On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > > > You might want to complain to the block-layer people about this.  I 
> > > > don't know if anything can be done to fix it.
> > > > 
> > > > Or maybe flush_work and flush_delayed_work can be changed to avoid 
> > > > blocking if the workqueue is frozen.  Tejun?
> > > > 
> > > 
> > > I have a patch to show the root cause of this issue.
> > > 
> > > http://www.spinics.net/lists/linux-usb/msg136815.html
> > 
> > I don't get it.  Why would it deadlock?  Shouldn't things get rolling
> > once the workqueues are thawed?
> 
> The workqueue writeback can't be thawed due to driver's resume
> (dpm_complete) is lock nested, and can't be finished.

Ugh... that's nasty.  I wonder whether the right thing to do is making
writeback workers non-freezable.  IOs are supposed to be blocked from
lower layer anyway.  Jan, what do you think?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-02-25 Thread Tejun Heo

Hello, Peter.

On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > You might want to complain to the block-layer people about this.  I 
> > don't know if anything can be done to fix it.
> > 
> > Or maybe flush_work and flush_delayed_work can be changed to avoid 
> > blocking if the workqueue is frozen.  Tejun?
> > 
> 
> I have a patch to show the root cause of this issue.
> 
> http://www.spinics.net/lists/linux-usb/msg136815.html

I don't get it.  Why would it deadlock?  Shouldn't things get rolling
once the workqueues are thawed?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 02/23] usb-gadget: use per-attribute show and store methods

2015-09-30 Thread Tejun Heo

On Wed, Sep 30, 2015 at 11:32:19AM -0500, Felipe Balbi wrote:
> On Wed, Sep 30, 2015 at 12:20:46PM -0400, Tejun Heo wrote:
> > On Wed, Sep 30, 2015 at 11:19:25AM -0500, Felipe Balbi wrote:
> > > On Mon, Sep 28, 2015 at 03:35:14PM +0200, Christoph Hellwig wrote:
> > > > On Sun, Sep 27, 2015 at 10:50:53AM -0500, Felipe Balbi wrote:
> > > > > this (and the other helper below) could be macros just fine.
> > > > 
> > > > They could, but they shouldn't.  Inlines are always preferable over
> > > > function-like macros.
> > > 
> > > says who ? And why ?
> > 
> > Documentation/CodingStyle
> 
> container_of() is type-safe, what is an inline function bringing as benefit ?

It's a general preference.  Because there's enough benefit to going
with inline functions and there's extra benefit to be gained from
having consistent style of code and documentation, as a general rule,
we prefer inline functions over macros.  If you have specific
technical arguments why macro is better, sure; otherwise, follow the
conventions for consistency if for nothing else.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 02/23] usb-gadget: use per-attribute show and store methods

2015-09-30 Thread Tejun Heo

On Wed, Sep 30, 2015 at 11:19:25AM -0500, Felipe Balbi wrote:
> On Mon, Sep 28, 2015 at 03:35:14PM +0200, Christoph Hellwig wrote:
> > On Sun, Sep 27, 2015 at 10:50:53AM -0500, Felipe Balbi wrote:
> > > this (and the other helper below) could be macros just fine.
> > 
> > They could, but they shouldn't.  Inlines are always preferable over
> > function-like macros.
> 
> says who ? And why ?

Documentation/CodingStyle

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 02/23] usb-gadget: use per-attribute show and store methods

2015-09-30 Thread Tejun Heo

On Wed, Sep 30, 2015 at 11:43:01AM -0500, Felipe Balbi wrote:
> Seems like there are *ton* of uses of container_of() wrapped within a simple
> macro. What convention are you talking about, again ?

The convention of using inline functions over macros when possible.
We don't do that all the time but that's where we wanna be in general.

> And again, what benefit is an inline function bringing in this specific
> case ? As for a technical reason, we know the macro definition will be
> copied Verbatim into the caller body. GCC might decide to not inline
> those helpers (unlikely, but could).

That's really grasping at straws.  Let's not go there.

For simple stuff like container_of(), a more valid reason would be
"it's shorter and sweeter and AFAICS doesn't have any known downsides
of using macros" but this ultimately is a bike-shedding problem.

You asked where and why we said we prefer inline functions so that's
the answer (aside from individual technical advantages).  It's not an
absolute rule but we're better off if we try to keep things consistent
if possible.  As for this specific case, I don't know.  I might do the
macro too just because it's less typing and I don't really care but at
the same time I'd just switch to inline if somebody points it out.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 01/23] configfs: add show and store methods to struct configfs_attribute

2015-09-25 Thread Tejun Heo

Hello, Christoph.

On Fri, Sep 25, 2015 at 06:49:38AM -0700, Christoph Hellwig wrote:
> Add methods to struct configfs_attribute to directly show and store
> attributes without adding boilerplate code to every user.  In addition
> to the methods this also adds 3 helper macros to define read/write,
> read-only and write-only attributes with a single line of code.

This looks great to me.  It doesn't look like the whole attribute
thing played out as hoped.  It'd be great if we can move away from it.

Thanks a lot for the work!

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv3 1/4] phy: phy-core: Make GENERIC_PHY an invisible option

2015-05-29 Thread Tejun Heo

On Fri, May 29, 2015 at 06:07:18PM +0530, Kishon Vijay Abraham I wrote:
 Tejun, Maxime, Sylwester, Kyungmin
 
 On Thursday 23 April 2015 04:34 AM, Arun Ramamurthy wrote:
 Most of the phy providers use select to enable GENERIC_PHY. Since select
 is only recommended when the config is not visible, GENERIC_PHY is changed
 an invisible option. To maintain consistency, all phy providers are changed
 to select GENERIC_PHY and all non-phy drivers use depends on when the
 phy framework is explicity required. USB_MUSB_OMAP2PLUS has a cyclic
 dependency, so it is left as select.
 
 Signed-off-by: Arun Ramamurthy arun.ramamur...@broadcom.com
 
 Need your ACK for this patch.

For ATA part,

 Acked-by: Tejun Heo t...@kernel.org

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 24/32] usb: use %*pb[l] to print bitmaps including cpumasks and nodemasks

2015-01-24 Thread Tejun Heo

printk and friends can now formap bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

* drivers/uwb/drp.c::uwb_drp_handle_alien_drp() was formatting mas.bm
  into a buffer but never used it.  Removed.

This patch is dependent on the following two patches.

 lib/vsprintf: implement bitmap printing through '%*pb[l]'
 cpumask, nodemask: implement cpumask/nodemask_pr_args()

Please wait till the forementioned patches are merged to mainline
before applying to subsystem trees.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: linux-usb@vger.kernel.org
---
 drivers/usb/host/whci/debug.c  |  7 ++-
 drivers/usb/wusbcore/reservation.c |  5 ++---
 drivers/usb/wusbcore/wa-rpipe.c|  5 ++---
 drivers/usb/wusbcore/wusbhc.c  |  7 ++-
 drivers/uwb/drp.c  |  2 --
 drivers/uwb/uwb-debug.c| 16 +---
 6 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/drivers/usb/host/whci/debug.c b/drivers/usb/host/whci/debug.c
index ba61dae..774b89d 100644
--- a/drivers/usb/host/whci/debug.c
+++ b/drivers/usb/host/whci/debug.c
@@ -86,17 +86,14 @@ static void qset_print(struct seq_file *s, struct whc_qset 
*qset)
 static int di_print(struct seq_file *s, void *p)
 {
struct whc *whc = s-private;
-   char buf[72];
int d;
 
for (d = 0; d  whc-n_devices; d++) {
struct di_buf_entry *di = whc-di_buf[d];
 
-   bitmap_scnprintf(buf, sizeof(buf),
-(unsigned long *)di-availability_info, 
UWB_NUM_MAS);
-
seq_printf(s, DI[%d]\n, d);
-   seq_printf(s,   availability: %s\n, buf);
+   seq_printf(s,   availability: %*pb\n,
+  UWB_NUM_MAS, (unsigned long *)di-availability_info);
seq_printf(s,   %c%c key idx: %d dev addr: %d\n,
   (di-addr_sec_info  WHC_DI_SECURE) ? 'S' : ' ',
   (di-addr_sec_info  WHC_DI_DISABLE) ? 'D' : ' ',
diff --git a/drivers/usb/wusbcore/reservation.c 
b/drivers/usb/wusbcore/reservation.c
index d5efd0f..7b1b2e2 100644
--- a/drivers/usb/wusbcore/reservation.c
+++ b/drivers/usb/wusbcore/reservation.c
@@ -49,14 +49,13 @@ static void wusbhc_rsv_complete_cb(struct uwb_rsv *rsv)
struct wusbhc *wusbhc = rsv-pal_priv;
struct device *dev = wusbhc-dev;
struct uwb_mas_bm mas;
-   char buf[72];
 
dev_dbg(dev, %s: state = %d\n, __func__, rsv-state);
switch (rsv-state) {
case UWB_RSV_STATE_O_ESTABLISHED:
uwb_rsv_get_usable_mas(rsv, mas);
-   bitmap_scnprintf(buf, sizeof(buf), mas.bm, UWB_NUM_MAS);
-   dev_dbg(dev, established reservation: %s\n, buf);
+   dev_dbg(dev, established reservation: %*pb\n,
+   UWB_NUM_MAS, mas.bm);
wusbhc_bwa_set(wusbhc, rsv-stream, mas);
break;
case UWB_RSV_STATE_NONE:
diff --git a/drivers/usb/wusbcore/wa-rpipe.c b/drivers/usb/wusbcore/wa-rpipe.c
index a80c5d2..c7ecdbe 100644
--- a/drivers/usb/wusbcore/wa-rpipe.c
+++ b/drivers/usb/wusbcore/wa-rpipe.c
@@ -496,10 +496,9 @@ void wa_rpipes_destroy(struct wahc *wa)
struct device *dev = wa-usb_iface-dev;
 
if (!bitmap_empty(wa-rpipe_bm, wa-rpipes)) {
-   char buf[256];
WARN_ON(1);
-   bitmap_scnprintf(buf, sizeof(buf), wa-rpipe_bm, wa-rpipes);
-   dev_err(dev, BUG: pipes not released on exit: %s\n, buf);
+   dev_err(dev, BUG: pipes not released on exit: %*pb\n,
+   wa-rpipes, wa-rpipe_bm);
}
kfree(wa-rpipe_bm);
 }
diff --git a/drivers/usb/wusbcore/wusbhc.c b/drivers/usb/wusbcore/wusbhc.c
index 3e1ba51..94f401a 100644
--- a/drivers/usb/wusbcore/wusbhc.c
+++ b/drivers/usb/wusbcore/wusbhc.c
@@ -496,11 +496,8 @@ static void __exit wusbcore_exit(void)
 {
clear_bit(0, wusb_cluster_id_table);
if (!bitmap_empty(wusb_cluster_id_table, CLUSTER_IDS)) {
-   char buf[256];
-   bitmap_scnprintf(buf, sizeof(buf), wusb_cluster_id_table,
-CLUSTER_IDS);
-   printk(KERN_ERR BUG: WUSB Cluster IDs not released 
-  on exit: %s\n, buf);
+   printk(KERN_ERR BUG: WUSB Cluster IDs not released on exit: 
%*pb\n,
+  CLUSTER_IDS, wusb_cluster_id_table);
WARN_ON(1);
}
usb_unregister_notify(wusb_usb_notifier);
diff --git a/drivers/uwb/drp.c b/drivers/uwb/drp.c
index 05b7bd7..8fc1b78 100644
--- a/drivers/uwb/drp.c
+++ b/drivers/uwb/drp.c
@@ -619,11 +619,9 @@ static void uwb_drp_handle_alien_drp(struct uwb_rc *rc, 
struct uwb_ie_drp

Re: [PATCH v4 00/11] scsi: fix module reference mismatch for scsi host

2015-01-20 Thread Tejun Heo

Hello, Akinobu.

On Tue, Jan 20, 2015 at 11:57:37PM +0900, Akinobu Mita wrote:
 The reason I didn't move sht from the core driver to the LLDDs for
 fixing ufs and ums-* in the first place is to avoid exporting many
 symbols for callbacks in sht.  But I realized that we can do it
 without that many exported symbols by creating a single function that
 returns a kmemdup()ed sht with a few change including -module.

Hmmm, libata already exports most of the necessary symbols.  libahci
or platform drivers might have to export more but that shouldn't be
much.  For libata, pushing sht's to the leaf drivers would make far
more sense as sht's already get inherited and modified along the
hierarchy.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 00/11] scsi: fix module reference mismatch for scsi host

2015-01-19 Thread Tejun Heo

On Mon, Jan 19, 2015 at 12:05:58AM +0900, Akinobu Mita wrote:
 While accessing a scsi_device, the use count of the underlying LLDD
 module is incremented.  The module reference is retrieved through
 .module field of struct scsi_host_template.
 
 This mapping between scsi_device and underlying LLDD module works well
 except some drivers which consist with the core driver and the actual
 LLDDs and scsi_host_template is defined in the core driver.  In these
 cases, the actual LLDDs can be unloaded even if the scsi_device is
 being accessed.
 
 This patch series fixes the module reference mismatch problem for
 ufs, usb-storage, esp_scsi, ahci_platform, and pata_platform drivers
 by moving owner module reference field from struct scsi_host_template
 to struct Scsi_Host and allowing the LLDDs to set their correct module
 reference.

Hmmm, can't we just move sht definitions to actual LLDDs?  libata can
do that easily.  sht, as its name implies, is the template for
creating the scsi_hosts of a given type.  We're now just moving module
ownership from sht definition site to whatever callsite the actual
instance is being created which can also be wrapped in a separate
layer requiring explicit propagation.  Why not just propagate sht's
directly?  What's the difference?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 4/4] usb: hub: rename khubd to hub_wq in documentation and comments

2014-09-18 Thread Tejun Heo

On Thu, Sep 18, 2014 at 06:15:02PM +0200, Petr Mládek wrote:
 The easiest solution would be to allocate the work queue with
 the flag WQ_UNBOUND and max_active = 1. It will force serialization
 of all work items.

Please use alloc_ordered_workqueue() for that purpose.  WQ_UNBOUND +
max_active == 1 no longer guarantees single thread of execution.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 4/4] usb: hub: rename khubd to hub_wq in documentation and comments

2014-09-17 Thread Tejun Heo

Hello, Alan, Petr.

On Wed, Sep 17, 2014 at 01:36:26PM -0400, Alan Stern wrote:
  -   /* If khubd ever becomes multithreaded, this will need a lock */
  +   /* If hub_wq ever becomes multithreaded, this will need a lock */
  if (udev-wusb) {
  devnum = udev-portnum + 1;
  BUG_ON(test_bit(devnum, bus-devmap.devicemap));
 
 You probably didn't notice when changing this comment.  But in fact,
 workqueues _are_ multithreaded.  Therefore you need to add a lock to 
 this routine.

Haven't read the code but if this function is called from a single
work_struct, workqueue guarantees that there's only single thread of
execution at any given time.  A work item is never executed
concurrently no matter what.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/4] usb: hub: convert khubd into workqueue

2014-09-12 Thread Tejun Heo

On Fri, Sep 12, 2014 at 02:21:05PM +0200, Petr Mladek wrote:
 There is no need to have separate kthread for handling USB hub events.
 It is more elegant to use the workqueue framework.
 
 The workqueue is allocated as unbound, cpu intensive, and freezable.

I'd just go with WQ_FREEZABLE.  As a general rule of thumb, please
only use attributes which are strictly necessary.  If the default
behavior turns out to be not so good for typical wq usages (and AFAICS
khubd seems fairly typical), we should be changing the default
behavior.  Each usage deviating complicates the long-term prospect.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-05-02 Thread Tejun Heo

On Mon, Apr 28, 2014 at 08:39:47AM +0800, Li Zhong wrote:
 Yes, maybe try to get the module reference is not bad before writing to
 driver attributes, as it doesn't make much sense to really call the
 callbacks for the driver attribute if the driver is being unload.

Please don't do that spuriously.  Active protection is the primary
mechanism for that sort of protection and adding spurious things just
make them confusing.

 And after we get the reference, it is safe for us to break the active.
 But if we don't have such real cases(lockdep warnings), we actually
 don't need to break it. 

Yeah, for cases where active protection should be broken, other
measures should be taken to prevent the underlying data structure /
code from going away.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-24 Thread Tejun Heo

On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
 On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
  cc'ing Li Zhong who's working on a simliar issue in the following
  thread and quoting whole body.
  
http://thread.gmane.org/gmane.linux.kernel/1680706
  
  Li, this is another variation of the same problem.  Maybe this can be
  covered by your work too?
 
 It seems to me that it is about write something to driver attribute, and
 driver unloading. If so, maybe it's not easy to reuse the help functions
 created for device attribute, and device removing.
 
 But I guess the idea to break the active protection could still be
 applied here:
 
 Maybe we could try_module_get() here (like the other option suggested by
 Johan?), and break active protection if we could get the module,
 something like below? 

I don't get why try_module_get() matters here.  We can't call into
-store if the object at hand is already destroyed and the underlying
module can't go away if the target device is still alive.
try_module_get() doesn't actually protect the object.  Why does that
matter?  This is self removal, right?  Can you please take a look at
kernfs_remove_self()?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] USB: serial: fix sysfs-attribute removal deadlock

2014-04-23 Thread Tejun Heo

cc'ing Li Zhong who's working on a simliar issue in the following
thread and quoting whole body.

  http://thread.gmane.org/gmane.linux.kernel/1680706

Li, this is another variation of the same problem.  Maybe this can be
covered by your work too?

Thanks.

On Wed, Apr 23, 2014 at 11:32:19AM +0200, Johan Hovold wrote:
 Fix driver new_id sysfs-attribute removal deadlock by making sure to
 not hold any locks that the attribute operations grab when removing the
 attribute.
 
 Specifically, usb_serial_deregister holds the table mutex when
 deregistering the driver, which includes removing the new_id attribute.
 This can lead to a deadlock as writing to new_id increments the
 attribute's active count before trying to grab the same mutex in
 usb_serial_probe.
 
 The deadlock can easily be triggered by inserting a sleep in
 usb_serial_deregister and writing the id of an unbound device to new_id
 during module unload.
 
 As the table mutex (in this case) is used to prevent subdriver unload
 during probe, it should be sufficient to only hold the lock while
 manipulating the usb-serial driver list during deregister. A racing
 probe will then either fail to find a matching subdriver or fail to get
 the corresponding module reference.
 
 Since v3.15-rc1 this also triggers the following lockdep warning:
 
 ==
 [ INFO: possible circular locking dependency detected ]
 3.15.0-rc2 #123 Tainted: GW
 ---
 modprobe/190 is trying to acquire lock:
  (s_active#4){.+}, at: [c0167aa0] kernfs_remove_by_name_ns+0x4c/0x94
 
 but task is already holding lock:
  (table_lock){+.+.+.}, at: [bf004d84] usb_serial_deregister+0x3c/0x78 
 [usbserial]
 
 which lock already depends on the new lock.
 
 the existing dependency chain (in reverse order) is:
 
 - #1 (table_lock){+.+.+.}:
[c0075f84] __lock_acquire+0x1694/0x1ce4
[c0076de8] lock_acquire+0xb4/0x154
[c03af3cc] _raw_spin_lock+0x4c/0x5c
[c02bbc24] usb_store_new_id+0x14c/0x1ac
[bf007eb4] new_id_store+0x68/0x70 [usbserial]
[c025f568] drv_attr_store+0x30/0x3c
[c01690e0] sysfs_kf_write+0x5c/0x60
[c01682c0] kernfs_fop_write+0xd4/0x194
[c010881c] vfs_write+0xbc/0x198
[c0108e4c] SyS_write+0x4c/0xa0
[c000f880] ret_fast_syscall+0x0/0x48
 
 - #0 (s_active#4){.+}:
[c03a7a28] print_circular_bug+0x68/0x2f8
[c0076218] __lock_acquire+0x1928/0x1ce4
[c0076de8] lock_acquire+0xb4/0x154
[c0166b70] __kernfs_remove+0x254/0x310
[c0167aa0] kernfs_remove_by_name_ns+0x4c/0x94
[c0169fb8] remove_files.isra.1+0x48/0x84
[c016a2fc] sysfs_remove_group+0x58/0xac
[c016a414] sysfs_remove_groups+0x34/0x44
[c02623b8] driver_remove_groups+0x1c/0x20
[c0260e9c] bus_remove_driver+0x3c/0xe4
[c026235c] driver_unregister+0x38/0x58
[bf007fb4] usb_serial_bus_deregister+0x84/0x88 [usbserial]
[bf004db4] usb_serial_deregister+0x6c/0x78 [usbserial]
[bf005330] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
[bf016618] usb_serial_module_exit+0x14/0x1c [sierra]
[c009d6cc] SyS_delete_module+0x184/0x210
[c000f880] ret_fast_syscall+0x0/0x48
 
 other info that might help us debug this:
 
  Possible unsafe locking scenario:
 
CPU0CPU1

   lock(table_lock);
lock(s_active#4);
lock(table_lock);
   lock(s_active#4);
 
  *** DEADLOCK ***
 
 1 lock held by modprobe/190:
  #0:  (table_lock){+.+.+.}, at: [bf004d84] usb_serial_deregister+0x3c/0x78 
 [usbserial]
 
 stack backtrace:
 CPU: 0 PID: 190 Comm: modprobe Tainted: GW 3.15.0-rc2 #123
 [c0015e10] (unwind_backtrace) from [c0013728] (show_stack+0x20/0x24)
 [c0013728] (show_stack) from [c03a9a54] (dump_stack+0x24/0x28)
 [c03a9a54] (dump_stack) from [c03a7cac] (print_circular_bug+0x2ec/0x2f8)
 [c03a7cac] (print_circular_bug) from [c0076218] 
 (__lock_acquire+0x1928/0x1ce4)
 [c0076218] (__lock_acquire) from [c0076de8] (lock_acquire+0xb4/0x154)
 [c0076de8] (lock_acquire) from [c0166b70] (__kernfs_remove+0x254/0x310)
 [c0166b70] (__kernfs_remove) from [c0167aa0] 
 (kernfs_remove_by_name_ns+0x4c/0x94)
 [c0167aa0] (kernfs_remove_by_name_ns) from [c0169fb8] 
 (remove_files.isra.1+0x48/0x84)
 [c0169fb8] (remove_files.isra.1) from [c016a2fc] 
 (sysfs_remove_group+0x58/0xac)
 [c016a2fc] (sysfs_remove_group) from [c016a414] 
 (sysfs_remove_groups+0x34/0x44)
 [c016a414] (sysfs_remove_groups) from [c02623b8] 
 (driver_remove_groups+0x1c/0x20)
 [c02623b8] (driver_remove_groups) from [c0260e9c] 
 (bus_remove_driver+0x3c/0xe4)
 [c0260e9c] (bus_remove_driver) from [c026235c] 
 (driver_unregister+0x38/0x58)
 [c026235c] (driver_unregister) from [bf007fb4] 
 (usb_serial_bus_deregister+0x84/0x88 [usbserial])
 [bf007fb4]

Re: [PATCH libata/for-3.15-fixes] libata: drop COMPILE_TEST from AHCI_XGENE

2014-04-03 Thread Tejun Heo

Hello,

On Thu, Apr 03, 2014 at 04:32:24PM +0200, Bartlomiej Zolnierkiewicz wrote:
  So, apparently, this isn't enough as this would allow enabling
  PHY_XGENE regardless of HAS_IOMEM or OF.  From kconfig-language.txt,
 
 PHY_XGENE has the following dependencies:
 
   depends on HAS_IOMEM  OF  (ARM64 || COMPILE_TEST)
 
 So it should be OK.
...
 Please note that ARM64 implies that HAS_IOMEM and OF are _always_
 selected.  This is the reason why removing || COMPILE_TEST from
 AHCI_XGENE dependencies is sufficient to fix the issue (though
 I have to admit that relying indirectly on ARM64 selects is a bit
 hacky).

This being okay is so not obvious.  Let's either float all
dependencies to the higher level drivers or just use depends on.  I
really dislike mixing the two and playing games with the subtle
combinations.  For now, I'm making it all depends on.  Please feel
free to improve on it as necessary.

 Hmmm, the alternative idea is to remove PHY_XGENE reference from
 AHCI_XGENE altogether and add:
 
   default y if AHCI_XGENE
 
 to PHY_XGENE instead (which seems to be not as hacky as solution
 with select PHY_XGENE and also more user-friendly than depends
 on PHY_XGENE one).

The config dependencies should be indicative the actual dependency.
AHCI_XGENE is useless without PHY_XGENE; then, the config dependency
*must* represent that.  That's the priority.

It is true that users probably won't care about PHY_XGENE or any PHY
drivers probably, but, again, let's please not mix depends and selects
in contrived manner to hit the just right mixture of the two.  Let's
please do it systematically.  Float all actual deps to the surface to
the options that the users configuring care about and propagate down
to phy drivers using select.  Our config system *really* doesn't work
with mixed depends and selects.  I mean, think about the suggested
solution.  It works only because there's no actual linkage dependency
between the two, ARM64 implies HAS_IOMEM  OF, AND we omitted
COMPILE_TEST for a completely obscure reason.  I don't think that's
acceptable.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH libata/for-3.15-fixes] libata: drop COMPILE_TEST from AHCI_XGENE

2014-04-02 Thread Tejun Heo

On Wed, Apr 02, 2014 at 11:53:57AM -0400, Tejun Heo wrote:
  config AHCI_XGENE
   tristate APM X-Gene 6.0Gbps AHCI SATA host controller support
 - depends on ARM64 || COMPILE_TEST
 + depends on ARM64
   select PHY_XGENE

So, an alternative would be doing

select PHY_XGENE if ARM64

butI don't think it'd be worthwhile to go above and beyond for
COMPILE_TEST and it might cause other linkage issues down the road.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH libata/for-3.15-fixes] libata: drop COMPILE_TEST from AHCI_XGENE

2014-04-02 Thread Tejun Heo

On Wed, Apr 02, 2014 at 11:53:57AM -0400, Tejun Heo wrote:
 Applied the following patch to libata/for-3.15-fixes.

 Thanks.
 --- 8 ---
 From 9c23f2cf7f6e107e85eef57fdf3049a93b6e157c Mon Sep 17 00:00:00 2001
 From: Tejun Heo t...@kernel.org
 Date: Wed, 2 Apr 2014 11:47:04 -0400

 AHCI_XGENE is only applicable on ARM64 but it can also be enabled for
 compile testing; however, AHCI_XGENE selects PHY_XGENE which has other
 arch specific dependencies.  This leads to the following warning when
 enabling it on other archs for compile testing.

   warning: (AHCI_XGENE) selects PHY_XGENE which has unmet direct
   dependencies (HAS_IOMEM  OF  (ARM64 || COMPILE_TEST))

 Let's drop COMPILE_TEST from AHCI_XGENE.

 Signed-off-by: Tejun Heo t...@kernel.org
 Reported-by: Linus Torvalds torva...@linux-foundation.org
 Cc: Loc Ho l...@apm.com
 Cc: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com

So, apparently, this isn't enough as this would allow enabling
PHY_XGENE regardless of HAS_IOMEM or OF.  From kconfig-language.txt,

   Note:
select should be used with care. select will force
a symbol to a value without visiting the dependencies.
By abusing select you are able to select a symbol FOO even
if FOO depends on BAR that is not set.
In general use select only for non-visible symbols
(no prompts anywhere) and for symbols with no dependencies.
That will limit the usefulness but on the other hand avoid
the illegal configurations all over.

We can add all the necessary dependencies to AHCI_XGENE but I think
the the right thing to do is turning it into a proper dependency.
Will prep another patch.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH libata/for-3.15-fixes] libata: make AHCI_XGENE depend on PHY_XGENE

2014-04-02 Thread Tejun Heo

AHCI_XGENE is only applicable on ARM64 but it can also be enabled for
compile testing; however, AHCI_XGENE selects PHY_XGENE which has other
arch specific dependencies.  This leads to the following warning when
enabling it on other archs for compile testing.

  warning: (AHCI_XGENE) selects PHY_XGENE which has unmet direct
  dependencies (HAS_IOMEM  OF  (ARM64 || COMPILE_TEST))

Selecting a config option which itself has dependencies can easily
lead to broken configurations.  For now, let's just make AHCI_XGENE
depend on PHY_XGENE which has all the necessary dependencies already.

Signed-off-by: Tejun Heo t...@kernel.org
Reported-by: Linus Torvalds torva...@linux-foundation.org
Cc: Loc Ho l...@apm.com
Cc: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
Cc: Kishon Vijay Abraham I kis...@ti.com
---
 drivers/ata/Kconfig |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index 20e03a7..2e4da3b 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -134,8 +134,7 @@ config AHCI_SUNXI
 
 config AHCI_XGENE
tristate APM X-Gene 6.0Gbps AHCI SATA host controller support
-   depends on ARM64 || COMPILE_TEST
-   select PHY_XGENE
+   depends on PHY_XGENE
help
 This option enables support for APM X-Gene SoC SATA host controller.
 
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

2014-02-22 Thread Tejun Heo

Hello,

If this is actually safe, let's do it from the get-go.

Thanks!
--- 8 ---
PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

usb_hub-init_work is multiplexed with multiple work functions;
however, the work item is never queued while in-flight, so we can
simply use INIT_DELAYED_WORK() before each queueing.

It would probably be best to route this with other related updates
through the workqueue tree.

Lightly tested.

v2: Greg and Alan confirm that the work item is never queued while
in-flight.  Simply use INIT_DELAYED_WORK().

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: Alan Stern st...@rowland.harvard.edu
Cc: linux-usb@vger.kernel.org
---
 drivers/usb/core/hub.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1040,7 +1040,7 @@ static void hub_activate(struct usb_hub
 */
if (type == HUB_INIT) {
delay = hub_power_on(hub, false);
-   PREPARE_DELAYED_WORK(hub-init_work, hub_init_func2);
+   INIT_DELAYED_WORK(hub-init_work, hub_init_func2);
schedule_delayed_work(hub-init_work,
msecs_to_jiffies(delay));
 
@@ -1194,7 +1194,7 @@ static void hub_activate(struct usb_hub
 
/* Don't do a long sleep inside a workqueue routine */
if (type == HUB_INIT2) {
-   PREPARE_DELAYED_WORK(hub-init_work, hub_init_func3);
+   INIT_DELAYED_WORK(hub-init_work, hub_init_func3);
schedule_delayed_work(hub-init_work,
msecs_to_jiffies(delay));
return; /* Continues at init3: below */
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

2014-02-22 Thread Tejun Heo

Hey, Alan.

On Sat, Feb 22, 2014 at 06:03:04PM -0500, Alan Stern wrote:
   then a single init work could be queued to
  the system_unbound_wq which doesn't care about running times.
 
 This sort of thing sounds like the best approach.  Tejun, do you want
 to rewrite the patch, getting rid of the hub_init_func3 and HUB_INIT3
 business entirely?  Or would you like me to do it?

I'll doing the minimal patch in this series and then following up with
the update is probably better.  I can put the patches in a separate
branch so that it can be easily pulled.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/9] usb: don't use PREPARE_DELAYED_WORK

2014-02-21 Thread Tejun Heo

On Fri, Feb 21, 2014 at 10:06:05AM -0500, Alan Stern wrote:
  I think it should be fine to use INIT_DELAYED_WORK(), but Alan would
  know best.  Alan?
 
 That's right; INIT_DELAYED_WORK() should be fine.  Provided there's no 
 problem doing it from within the previous work routine.

Yeah, that's completely fine.  Once a work item starts execution,
workqueue doesn't care about it at all - even freeing and recycling it
is fine.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/9] usb: don't use PREPARE_DELAYED_WORK

2014-02-20 Thread Tejun Heo

PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

usb_hub-init_work is multiplexed with multiple work functions.
Introduce hub_init_workfn() which invokes usb_hub-init_workfn and
always use it as the work function and update the users to set the
-init_workfn field instead of overriding the work function using
PREPARE_DELAYED_WORK().

It looks like that the work items are never queued while in-flight, so
simply using INIT_DELAYED_WORK() before each queueing could be enough.
This patch performs equivalent conversion just in case but we probably
wanna clean it up later if that's the case.

It would probably be best to route this with other related updates
through the workqueue tree.

Lightly tested.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: linux-usb@vger.kernel.org
---
 drivers/usb/core/hub.c | 13 ++---
 drivers/usb/core/hub.h |  1 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 64ea219..2bc61c0 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1040,7 +1040,7 @@ static void hub_activate(struct usb_hub *hub, enum 
hub_activation_type type)
 */
if (type == HUB_INIT) {
delay = hub_power_on(hub, false);
-   PREPARE_DELAYED_WORK(hub-init_work, hub_init_func2);
+   hub-init_workfn = hub_init_func2;
schedule_delayed_work(hub-init_work,
msecs_to_jiffies(delay));
 
@@ -1194,7 +1194,7 @@ static void hub_activate(struct usb_hub *hub, enum 
hub_activation_type type)
 
/* Don't do a long sleep inside a workqueue routine */
if (type == HUB_INIT2) {
-   PREPARE_DELAYED_WORK(hub-init_work, hub_init_func3);
+   hub-init_workfn = hub_init_func3;
schedule_delayed_work(hub-init_work,
msecs_to_jiffies(delay));
return; /* Continues at init3: below */
@@ -1634,6 +1634,13 @@ static void hub_disconnect(struct usb_interface *intf)
kref_put(hub-kref, hub_release);
 }
 
+static void hub_init_workfn(struct work_struct *work)
+{
+   struct usb_hub *hub = container_of(to_delayed_work(work),
+  struct usb_hub, init_work);
+   hub-init_workfn(work);
+}
+
 static int hub_probe(struct usb_interface *intf, const struct usb_device_id 
*id)
 {
struct usb_host_interface *desc;
@@ -1728,7 +1735,7 @@ descriptor_error:
hub-intfdev = intf-dev;
hub-hdev = hdev;
INIT_DELAYED_WORK(hub-leds, led_work);
-   INIT_DELAYED_WORK(hub-init_work, NULL);
+   INIT_DELAYED_WORK(hub-init_work, hub_init_workfn);
usb_get_intf(intf);
 
usb_set_intfdata (intf, hub);
diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
index df629a3..ef81463 100644
--- a/drivers/usb/core/hub.h
+++ b/drivers/usb/core/hub.h
@@ -72,6 +72,7 @@ struct usb_hub {
unsignedhas_indicators:1;
u8  indicator[USB_MAXCHILDREN];
struct delayed_work leds;
+   work_func_t init_workfn;
struct delayed_work init_work;
struct usb_port **ports;
 };
-- 
1.8.5.3

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH driver-core-linus] kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag

2014-01-29 Thread Tejun Heo

kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set
before performing lockdep annotations and ends up feeding
uninitialized lockdep_map to lockdep triggering warning like the
following on USB stick hotunplug.

 usb 1-2: USB disconnect, device number 2
 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
  880065ca7f60 88013a4ffa08 81cfb6bd 0002
  88013a4ffac8 810f8530 88013a4fc710 0002
  8801 82a3db50 0001 88013a4fc710
 Call Trace:
  [81cfb6bd] dump_stack+0x4e/0x7a
  [810f8530] __lock_acquire+0x1910/0x1e70
  [810f931a] lock_acquire+0x9a/0x1d0
  [8127c75e] kernfs_deactivate+0xee/0x130
  [8127d4c8] kernfs_addrm_finish+0x38/0x60
  [8127d701] kernfs_remove_by_name_ns+0x51/0xa0
  [8127b4f1] remove_files.isra.1+0x41/0x80
  [8127b7e7] sysfs_remove_group+0x47/0xa0
  [8127b873] sysfs_remove_groups+0x33/0x50
  [8177d66d] device_remove_attrs+0x4d/0x80
  [8177e25e] device_del+0x12e/0x1d0
  [819722c2] usb_disconnect+0x122/0x1a0
  [819749b5] hub_thread+0x3c5/0x1290
  [810c6a6d] kthread+0xed/0x110
  [81d0a56c] ret_from_fork+0x7c/0xb0

Fix it by making kernfs_deactivate() perform lockdep annotations only
if KERNFS_LOCKDEP is set.

Signed-off-by: Tejun Heo t...@kernel.org
Reported-by: Fabio Estevam feste...@gmail.com
Reported-by: Alan Stern st...@rowland.harvard.edu
---
 fs/kernfs/dir.c |   12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 5104cf5..bd6e18b 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -187,19 +187,23 @@ static void kernfs_deactivate(struct kernfs_node *kn)
 
kn-u.completion = (void *)wait;
 
-   rwsem_acquire(kn-dep_map, 0, 0, _RET_IP_);
+   if (kn-flags  KERNFS_LOCKDEP)
+   rwsem_acquire(kn-dep_map, 0, 0, _RET_IP_);
/* atomic_add_return() is a mb(), put_active() will always see
 * the updated kn-u.completion.
 */
v = atomic_add_return(KN_DEACTIVATED_BIAS, kn-active);
 
if (v != KN_DEACTIVATED_BIAS) {
-   lock_contended(kn-dep_map, _RET_IP_);
+   if (kn-flags  KERNFS_LOCKDEP)
+   lock_contended(kn-dep_map, _RET_IP_);
wait_for_completion(wait);
}
 
-   lock_acquired(kn-dep_map, _RET_IP_);
-   rwsem_release(kn-dep_map, 1, _RET_IP_);
+   if (kn-flags  KERNFS_LOCKDEP) {
+   lock_acquired(kn-dep_map, _RET_IP_);
+   rwsem_release(kn-dep_map, 1, _RET_IP_);
+   }
 }
 
 /**
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Tejun Heo

Hello, guys.

(cc'ing Greg)

On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
 On Fri, 13 Dec 2013, Sarah Sharp wrote:
 
   Given the way things work now, I suspect these warnings are truly 
   harmless.  We could simply get rid of the WARN in sysfs_remove_group.
   
   The alternative is to call device_del for SCSI targets earlier on, such 
   as when their hosts are unregistered.  I don't know how James would 
   feel about this approach.  It would be difficult because targets use 
   their own reference counts instead of relying on the usual device 
   refcounting mechanism.
  
  Thanks for looking into this.  I think just getting rid of the WARN
  would be sufficient.  Can you make a patch for that?
 
 Easily.  The downside is that there would no longer be any warning 
 when someone tries to remove a wrong subdirectory by mistake.
 
  The patch still won't help with the UAS issues with
  scsi_init_shared_tag_map though.
 
 I wasn't clear on the reason for that problem.  Does it also arise from 
 late device_del for scsi_target?  I could try to change the way that 
 works, if anybody (Hans?) would like to test it.

While the recent sysfs changes made this issue more visible, Greg
wants to make sure that devices are removed from leaf up in all cases
and keep the warning to ensure that.  Would there be a way fix SCSI
removal ordering?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [3.8-rc3 - 3.8-rc4 regression] Re: [PATCH] module, async: async_synchronize_full() on module init iff async is used

2013-12-03 Thread Tejun Heo

Hello,

On Tue, Dec 03, 2013 at 08:28:43AM -0600, Josh Hunt wrote:
 You're right. Thanks for pointing this out. I did not realize there
 was a bug in the init script. The version of initramfs-tools I was
 using had the following bug:
 https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1215911
 
 Updating to 0.99ubuntu13.4 of initramfs-tools resolved my boot hangs.
 
 I did try using the workaround as suggested by Linus. In my setup the
 dm_init() code was hit, however it still appeared to be too late at
 times. I also tried moving the call to async_synchronize_full() above
 the for loop and it still had the same issue (patch attached.) Out of
 around 10 reboot tests it failed to find root 1 or 2 times.
 
 The ubuntu scripts don't ever actually call do_mount() if it can't
 find the device. It seems to rely on some udev functionality to tell
 it when the device is present, and if that fails it just bails out.
 
 This change has introduced a regression. However, I only noticed it
 b/c my init script had a bug which caused it not to wait around for
 the device to appear.

Hmmm so, read the bug report, digged and asked around a bit.
Here's the root problem - ubuntu's initramfs uses a tool to wait for
the root device which uses libudev to listen for the device event;
unfortunately, its rx buffer is not set large enough and the receiver
isn't fast enough, which means that netlink broadcast messages from
the kernel can overrun the buffer.  When that happens, it sets an
error on the socket, so the next recv fails with -ENOBUFS.  If that
happens, the wait for root aborts immediately and initramfs proceeds
to mount non-existent root device.

The only thing which changes by these patches is the timing of events.
The problem likely wasn't as exposed before because things were slow
enough so that either the messages could be consumed fast enough or
there's enough delay between libata module load and the root device
wait hiding the bug in the wait logic.

So, yeah, it's a full blown timing bug.  I'm not sure what we can do
to work around from kernel side except for randomly slowing things
down or forcefully enlarging rx buffer size.  There really is no
interlocking to take advantage of. :(

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1 1/2] lib/scatterlist: export sg_miter_skip()

2013-11-29 Thread Tejun Heo

On Tue, Nov 26, 2013 at 12:43:37PM +0800, Ming Lei wrote:
 sg_copy_buffer() can't meet demand for some drrivers(such usb
 mass storage), so we have to use the sg_miter_* APIs to access
 sg buffer, then need export sg_miter_skip() for these drivers.
 
 The API is needed for converting to sg_miter_* APIs in USB storage
 driver for accessing sg buffer.
 
 Acked-by: Andrew Morton a...@linux-foundation.org
 Cc: FUJITA Tomonori fujita.tomon...@lab.ntt.co.jp
 Cc: Tejun Heo t...@kernel.org
 Cc: Jens Axboe ax...@kernel.dk
 Signed-off-by: Ming Lei ming@canonical.com

Reviewed-by: Tejun Heo t...@kernel.org

I suppose this should go through -mm?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [3.8-rc3 - 3.8-rc4 regression] Re: [PATCH] module, async: async_synchronize_full() on module init iff async is used

2013-11-26 Thread Tejun Heo

Hello,

On Tue, Nov 26, 2013 at 04:12:41PM -0600, Josh Hunt wrote:
 I should have clarified that I'm not using dm/md in my setup. I know
 the modules are getting loaded in the log I attached, but root is not
 a md/dm device.

Can you please still try it?  The init script is broken and we're now
just trying to restore just enough of the old behavior so that the
issue is not exposed.  The boot script in use seems to load md/dm
modules after storage drivers and use their termination as the signal
for storage ready, so it could be a good enough bandaid even if
you're not using dm/md.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 39/51] DMA-API: others: use dma_set_coherent_mask()

2013-09-20 Thread Tejun Heo

On Fri, Sep 20, 2013 at 12:11:38AM +0100, Russell King wrote:
 The correct way for a driver to specify the coherent DMA mask is
 not to directly access the field in the struct device, but to use
 dma_set_coherent_mask().  Only arch and bus code should access this
 member directly.
 
 Convert all direct write accesses to using the correct API.
 
 Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk

Acked-by: Tejun Heo t...@kernel.org

The patch is pretty widely spread.  I don't mind how it gets routed
but what's the plan?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 39/51] DMA-API: others: use dma_set_coherent_mask()

2013-09-20 Thread Tejun Heo

On Fri, Sep 20, 2013 at 07:16:52AM -0500, Tejun Heo wrote:
 On Fri, Sep 20, 2013 at 12:11:38AM +0100, Russell King wrote:
  The correct way for a driver to specify the coherent DMA mask is
  not to directly access the field in the struct device, but to use
  dma_set_coherent_mask().  Only arch and bus code should access this
  member directly.
  
  Convert all direct write accesses to using the correct API.
  
  Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk
 
 Acked-by: Tejun Heo t...@kernel.org
 
 The patch is pretty widely spread.  I don't mind how it gets routed
 but what's the plan?

Hmm... maybe hte pata_ixp4xx_cf.c part should be moved to the one
which updates pata_octeon_cf.c?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 39/51] DMA-API: others: use dma_set_coherent_mask()

2013-09-20 Thread Tejun Heo

Hey,

On Fri, Sep 20, 2013 at 03:00:18PM +0100, Russell King - ARM Linux wrote:
 Another would be if subsystem maintainers are happy that I carry them,
 I can add the acks, and then later on towards the end of the cycle,
 provide a branch subsystem maintainers could pull.
 
 Or... if you can think of something easier...

I'm happy with the latter method and it's likely that you'll end up
carrying at least some of the patches through your tree anyway.
Please feel free to add my acks to all libata related patches and
carry them through your tree.

Thanks and have fun routing.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [3.8-rc3 - 3.8-rc4 regression] Re: [PATCH] module, async: async_synchronize_full() on module init iff async is used

2013-08-12 Thread Tejun Heo

Hello, Jonathan.

On Mon, Aug 12, 2013 at 12:04:11AM -0700, Jonathan Nieder wrote:
 My laptop fails to boot[1] with the message 'Volume group data not
 found'.  Bisects to v3.8-rc4~17 (the above commit).  Reverting that
 commit on top of current master (d92581fcad18, 2013-08-10) produces
 a working kernel.  dmesg output from that working kernel attached.
 More details, including .config, at [2].
 
 Any ideas for tracking this down?

Which initrd / boot script are you using?  It looks like lvm assemble
scripts are running before sdX are detected leading to volume assembly
failure.  Before the patch, any module loading would end up
synchronizing async probes but after the patch modprobe invocations
which don't schedule them won't be.  Does your boot script happen to
run multiple modprobes in parallel and proceed to configure lvm
without waiting for modprobes of libata drivers to finish?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

Hello,

On Wed, Jul 31, 2013 at 11:44:34AM +0200, Uwe Kleine-König wrote:
   OK, so the possible problem is that remove is called while the irq is
   still active. That means you have to assert that all resources the irq

If your driver destruction path is running while your irq handler is
still running, it's a crappy / broken driver.  You need a deactivation
step whether you're using devm or not.  IRQs can be shared and the
device should be in a quiesced state before the driver detaches
itself.  Note that you can queue deactivation routine using devm.  For
an example, please take a look at
drivers/ata/libata-core.c::ata_host_start().

   handler is using (e.g. ioremap, clk_prepare_enable) are only freed
   *after* the irq is done. For ioremap that means it must be done using
   devm_ioremap_resource. For a clock it's not that easy because the irq
   handler has to assert that a used clk is kept prepared which can only be
   done using clk_prepare which in turn is not allowed in an irq handler.
  
   Hmm. So the only possible fixes are
 - devm* can be told to also care about clk_disable_unprepare
 - after disabling irqs in the remove callback wait for all
   active irqs to be done. (i.e. call synchronize_irq(irq))
 - don't use devm_request_irq

Again, the right thing to do is having a proper deactivation step.
This is nothing devm can do automatically.  There's no way for it to
find out that the device is actually quiesced.  Let's say it waits for
the current instance of irq handler to finish.  How would it know that
it won't start again between the flushing of the current instance and
irq deregistration.  Add an explicit deactivation step using
devres_alloc().

  I'm not sure that devm_ guarantees any ordering in the cleanups it does
  so I'd not like to rely on the first option either, if there were some
  guarantee of that it'd help.  The nice thing about explicitly freeing
  the IRQ is that you can tell that all this stuff is safe by inspection.
 devm_* at least uses list_for_each_entry_reverse
 (drivers/base/devres.c:release_nodes()). Without this guarantee devm_
 would not make much sense IMHO.

devm guarantees that the destruction callbacks are called in the
reverse order of registration.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

On Wed, Jul 31, 2013 at 12:28:53PM +0200, Uwe Kleine-König wrote:
 Well, you cannot avoid assuming that the irq is still active when your
 driver's remove callback is called. But I agree about crappyness at the
 end of the destruction path. The problem is that crap is as easy as:
 
   probe(..)
   {
   clk = devm_get_clk(...);
   clk_prepare_enable(clk);
   writel(1, base + IRQENABLE);
   devm_request_irq(...);
   }
 
   remove(..)
   {
   writel(0, base + IRQENABLE);
   clk_disable_unprepare(clk);
   }
 
 and I think there are more and more drivers doing that.

Oh, so, the problem is that the driver is mixing devm and non-devm
resource management and ending up messing the order of shutdown?
Well, the obvious solution is using devm for clk too.  devm does
provide constructs to build custom destruction sequence so that such
calls can be mixed but it's a bit of hassle and mostly meant for
driver midlayers (libata, firewire, usb and so on) so that they can
provide nicely wrapped init/exit helpers for low level drivers.

In general, partial devm conversion can easily lead to subtle shutdown
order problems and it's best to go all the way.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

Hello,

On Wed, Jul 31, 2013 at 12:18:53PM +0100, Mark Brown wrote:
 I'm not sure I understand how this relates the problem.  The main issue
 here is that for the shared IRQ case quiescing the device doesn't make
 any difference since one of the other users of the interrupt could cause
 the interrupt handler to be called regardless of what the hardware is
 doing.  This means that we need to guarantee that anything the interrupt
 handler relies on has not been deallocated before the interrupt handler
 is unregistered.

Yeah, if all resources are allocated using devm - note that you can
hook in non-devm resources using devres_alloc() - all resources which
would be necessary for the interrupt handler would have been allocated
before the irq was allocated, right?  And thus they'll of course
released after the IRQ is freed.  The problem arises when devm and
non-devm releases are mixed as non-devm ones would happen before all
devm ones messing up the release sequencing.

 OK, that's helpful.  It'd be good to document this if it's something
 the API is intending to guarantee, though - devres.txt doesn't mention

Oh, it's definitely guaranteed.  Nothing would work otherwise.

 this and it's not something I'd intuitively expect to be the case.

Oops... I guess I forgot to mention that.  Care to submit a patch?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

On Wed, Jul 31, 2013 at 12:50:27PM +0100, Mark Brown wrote:
 Most things would work just fine - most of the uses of devm_ are just
 resource allocations that can safely be freed in essentially any order.
 It doesn't really matter if you free the driver's private structure
 before you free the clock that's pointing to it or whatever since
 neither has any real connection to the other.

If you have DMA / IRQ / command engine deactivations in devm path
which often is the case with full conversions, freeing any resources
including DMA areas and host private data in the wrong order is a
horrible idea.  It's worse as it won't really be noticeable in most
cases.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

Hello,

On Wed, Jul 31, 2013 at 02:27:08PM +0100, Mark Brown wrote:
 It's really only interrupts that affect most devices - if there's DMA or
 anything going on after the remove() then as you said earlier the driver
 is probably doing something wrong.

Hmmm... it depends on the specific driver is converted but if the
deactivation sequence - shutting down of command engine - is also
handled by devm as in libata and if you have non-devres resource free
in the exit path, you have the same problem.  Again, in general,
tearing things down in the order which isn't the reverse of allocation
is a bad idea.  Adhering to the order isn't that hard and will result
in far less craziness in the long term.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

On Wed, Jul 31, 2013 at 02:57:51PM +0100, Mark Brown wrote:
 That's the only API I've ever heard of doing that.  Everything else is
 just using it to do deallocation.

I'm not sure why or what you're trying to argue here but take a look
at devm_pwm_release() for example.  It calls back into low level
driver free routine.  Are you arguing that it'd be a good idea to
release pci regions before this is complete?  It's just stupid to do
any differently.  There's nothing to argue about.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

hello,

On Wed, Jul 31, 2013 at 09:55:26PM +0800, Peter Chen wrote:
 I think the main point is we should allocate managed resource which is used
 at interrupt handler before devm_request_irq, and all resources used
 at interrupt
 handler should be managed.
 
 If we use non-managed resource at interrupt handler, but using managed 
 interrupt
 handler, things still will go wrong if there is an odd (unexpected)
 interrupt after
 we finish deactivation at removal.

In general, applying devm partially isn't a good idea.  It's very easy
to get into trouble thanks to release order dependency which isn't
immediately noticeable and there have been actual bugs caused by that.
The strategies which seem to work are either

* Convert everything over to devm by wrapping deactivation in a devres
  callback too.  As long as your init sequence is sane (ie. irq
  shouldn't be request before things used by irq are ready).

* Allocate all resources using devres but shut down the execution
  engine in the remove_one().  Again, as all releases are controlled
  by devres, you won't have to worry about messing up the release
  sequencing.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] chipidea: Use devm_request_irq()

2013-07-31 Thread Tejun Heo

On Wed, Jul 31, 2013 at 04:25:23PM +0100, Mark Brown wrote:
 What I'm saying is that in essentially all the users I've seen devm is
 only being used for things like kfree() or clk_put() which aren't really
 connected in any way and can happen in any order.  This (coupled with
 the lack of documentation that this is supported) is why people are
 nervous about anything that relies on ordering with this stuff - aside
 from ATA everything is just using this for straight frees.

Yeah, sure, thank you very much for your input.  It is of course
strictly ordered and the documentation needs to be updated.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] async: fix __lowest_in_progress()

2013-01-22 Thread Tejun Heo

083b804c4d3e1e3d0eace56bdbc0f674946d2847 (async: use workqueue for
worker pool) made it possible that async jobs are moved from pending
to running out-of-order.  While pending async jobs will be queued and
dispatched for execution in the same order, nothing guarantees they'll
enter 1) move self to the running queue of async_run_entry_fn() in
the same order.

Before the conversion, async implemented its own worker pool.  An
async worker, upon being woken up, fetches the first item from the
pending list, which kept the executing lists sorted.  The conversion
to workqueue was done by adding work_struct to each async_entry and
async just schedules the work item.  The queueing and dispatching of
such work items are still in order but now each worker thread is
associated with a specific async_entry and moves that specific
async_entry to the executing list.  So, depending on which worker
reaches that point earlier, which is non-deterministic, we may end up
moving an async_entry with larger cookie before one with smaller one.

This broke __lowest_in_progress().  running-domain may not be
properly sorted and is not guaranteed to contain lower cookies than
pending list when not empty.  Fix it by ensuring sort-inserting to the
running list and always looking at both pending and running when
trying to determine the lowest cookie.

Over time, the async synchronization implementation became quite
messy.  We better restructure it such that each async_entry is linked
to two lists - one global and one per domain - and not move it when
execution starts.  There's no reason to distinguish pending and
running.  They behave the same for synchronization purposes.

v2: Description updated to better explain why it's broken.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Arjan van de Ven ar...@linux.intel.com
Cc: sta...@vger.kernel.org
---
Linus, I've updated the description to better explain why it's broken.
The code is ugly but cleanup patches are already ready, so it will be
cleaned up during 3.9-rc1.  How should this be routed?

Thanks.

 kernel/async.c |   27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,18 +86,27 @@ static atomic_t entry_count;
  */
 static async_cookie_t  __lowest_in_progress(struct async_domain *running)
 {
+   async_cookie_t first_running = next_cookie; /* infinity value */
+   async_cookie_t first_pending = next_cookie; /* ditto */
struct async_entry *entry;
 
+   /*
+* Both running and pending lists are sorted but not disjoint.
+* Take the first cookies from both and return the min.
+*/
if (!list_empty(running-domain)) {
entry = list_first_entry(running-domain, typeof(*entry), 
list);
-   return entry-cookie;
+   first_running = entry-cookie;
}
 
-   list_for_each_entry(entry, async_pending, list)
-   if (entry-running == running)
-   return entry-cookie;
+   list_for_each_entry(entry, async_pending, list) {
+   if (entry-running == running) {
+   first_pending = entry-cookie;
+   break;
+   }
+   }
 
-   return next_cookie; /* infinity value */
+   return min(first_running, first_pending);
 }
 
 static async_cookie_t  lowest_in_progress(struct async_domain *running)
@@ -118,13 +127,17 @@ static void async_run_entry_fn(struct wo
 {
struct async_entry *entry =
container_of(work, struct async_entry, work);
+   struct async_entry *pos;
unsigned long flags;
ktime_t uninitialized_var(calltime), delta, rettime;
struct async_domain *running = entry-running;
 
-   /* 1) move self to the running queue */
+   /* 1) move self to the running queue, make sure it stays sorted */
spin_lock_irqsave(async_lock, flags);
-   list_move_tail(entry-list, running-domain);
+   list_for_each_entry_reverse(pos, running-domain, list)
+   if (entry-cookie  pos-cookie)
+   break;
+   list_move_tail(entry-list, pos-list);
spin_unlock_irqrestore(async_lock, flags);
 
/* 2) run (and print duration) */
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/2] block: don't request module during elevator init

2013-01-22 Thread Tejun Heo

From 21c3c5d2800733b7a276725b8e1ae49a694adc1a Mon Sep 17 00:00:00 2001
From: Tejun Heo t...@kernel.org
Date: Tue, 22 Jan 2013 16:48:03 -0800

Block layer allows selecting an elevator which is built as a module to
be selected as system default via kernel param elevator=.  This is
achieved by automatically invoking request_module() whenever a new
block device is initialized and the elevator is not available.

This led to an interesting deadlock problem involving async and module
init.  Block device probing running off an async job invokes
request_module().  While the module is being loaded, it performs
async_synchronize_full() which ends up waiting for the async job which
is already waiting for request_module() to finish, leading to
deadlock.

Invoking request_module() from deep in block device init path is
already nasty in itself.  It seems best to avoid these situations from
the beginning by moving on-demand module loading out of block init
path.

The previous patch made sure that the default elevator module is
loaded early during boot if available.  This patch removes on-demand
loading of the default elevator from elevator init path.  As the
module would have been loaded during boot, userland-visible behavior
difference should be minimal.

For more details, please refer to the following thread.

  http://thread.gmane.org/gmane.linux.kernel/1420814

v2: The bool parameter was named @request_module which conflicted with
request_module().  This built okay w/ CONFIG_MODULES because
request_module() was defined as a macro.  W/o CONFIG_MODULES, it
causes build breakage.  Rename the parameter to @try_loading.
Reported by Fengguang.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Jens Axboe ax...@kernel.dk
Cc: Arjan van de Ven ar...@linux.intel.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Alex Riesen raa.l...@gmail.com
Cc: Fengguang Wu fengguang...@intel.com
---
Minor revision to fix build breakage on !CONFIG_MODULES reported by
Fengguang.  The patch has been queued to
wq/for-3.9-async-deadlock-fixes.

Thanks.

 block/elevator.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index c2d61d5..603b2c1 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -100,14 +100,14 @@ static void elevator_put(struct elevator_type *e)
module_put(e-elevator_owner);
 }

-static struct elevator_type *elevator_get(const char *name)
+static struct elevator_type *elevator_get(const char *name, bool try_loading)
 {
struct elevator_type *e;

spin_lock(elv_list_lock);

e = elevator_find(name);
-   if (!e) {
+   if (!e  try_loading) {
spin_unlock(elv_list_lock);
request_module(%s-iosched, name);
spin_lock(elv_list_lock);
@@ -207,25 +207,30 @@ int elevator_init(struct request_queue *q, char *name)
q-boundary_rq = NULL;

if (name) {
-   e = elevator_get(name);
+   e = elevator_get(name, true);
if (!e)
return -EINVAL;
}

+   /*
+* Use the default elevator specified by config boot param or
+* config option.  Don't try to load modules as we could be running
+* off async and request_module() isn't allowed from async.
+*/
if (!e  *chosen_elevator) {
-   e = elevator_get(chosen_elevator);
+   e = elevator_get(chosen_elevator, false);
if (!e)
printk(KERN_ERR I/O scheduler %s not found\n,
chosen_elevator);
}

if (!e) {
-   e = elevator_get(CONFIG_DEFAULT_IOSCHED);
+   e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);
if (!e) {
printk(KERN_ERR
Default I/O scheduler not found.  \
Using noop.\n);
-   e = elevator_get(noop);
+   e = elevator_get(noop, false);
}
}

@@ -967,7 +972,7 @@ int elevator_change(struct request_queue *q, const char 
*name)
return -ENXIO;

strlcpy(elevator_name, name, sizeof(elevator_name));
-   e = elevator_get(strstrip(elevator_name));
+   e = elevator_get(strstrip(elevator_name), true);
if (!e) {
printk(KERN_ERR elevator: type %s not found\n, elevator_name);
return -EINVAL;
-- 
1.8.1

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 1/2] init, block: try to load default elevator module early during boot

2013-01-22 Thread Tejun Heo

From bb813f4c933ae9f887a014483690d5f8b8ec05e1 Mon Sep 17 00:00:00 2001
From: Tejun Heo t...@kernel.org
Date: Fri, 18 Jan 2013 14:05:56 -0800

This patch adds default module loading and uses it to load the default
block elevator.  During boot, it's called right after initramfs or
initrd is made available and right before control is passed to
userland.  This ensures that as long as the modules are available in
the usual places in initramfs, initrd or the root filesystem, the
default modules are loaded as soon as possible.

This will replace the on-demand elevator module loading from elevator
init path.

v2: Fixed build breakage when !CONFIG_BLOCK.  Reported by kbuild test
robot.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Jens Axboe ax...@kernel.dk
Cc: Arjan van de Ven ar...@linux.intel.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Alex Riesen raa.l...@gmail.com
Cc: Fengguang We fengguang...@intel.com
---
Minor revision to fix build breakage on !CONFIG_BLOCK reported by
Fengguang.  The patch is queued to wq/for-3.9-async-deadlock-fixes.

Thanks.

 block/elevator.c | 16 
 include/linux/elevator.h |  5 +
 include/linux/init.h |  1 +
 init/do_mounts_initrd.c  |  3 +++
 init/initramfs.c |  8 +++-
 init/main.c  | 16 
 6 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/block/elevator.c b/block/elevator.c
index 9edba1b..c2d61d5 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -136,6 +136,22 @@ static int __init elevator_setup(char *str)
 
 __setup(elevator=, elevator_setup);
 
+/* called during boot to load the elevator chosen by the elevator param */
+void __init load_default_elevator_module(void)
+{
+   struct elevator_type *e;
+
+   if (!chosen_elevator[0])
+   return;
+
+   spin_lock(elv_list_lock);
+   e = elevator_find(chosen_elevator);
+   spin_unlock(elv_list_lock);
+
+   if (!e)
+   request_module(%s-iosched, chosen_elevator);
+}
+
 static struct kobj_type elv_ktype;
 
 static struct elevator_queue *elevator_alloc(struct request_queue *q,
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index c03af76..1866206 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -138,6 +138,7 @@ extern void elv_drain_elevator(struct request_queue *);
 /*
  * io scheduler registration
  */
+extern void __init load_default_elevator_module(void);
 extern int elv_register(struct elevator_type *);
 extern void elv_unregister(struct elevator_type *);
 
@@ -206,5 +207,9 @@ enum {
INIT_LIST_HEAD((rq)-csd.list);\
} while (0)
 
+#else /* CONFIG_BLOCK */
+
+static inline void load_default_elevator_module(void) { }
+
 #endif /* CONFIG_BLOCK */
 #endif
diff --git a/include/linux/init.h b/include/linux/init.h
index a799273..9230c94 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -161,6 +161,7 @@ extern unsigned int reset_devices;
 /* used by init/main.c */
 void setup_arch(char **);
 void prepare_namespace(void);
+void __init load_default_modules(void);
 
 extern void (*late_time_init)(void);
 
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 5e4ded5..dfe606a 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -57,6 +57,9 @@ static void __init handle_initrd(void)
sys_mkdir(/old, 0700);
sys_chdir(/old);
 
+   /* try loading default modules from initrd */
+   load_default_modules();
+
/*
 * In case that a resume from disk is carried out by linuxrc or one of
 * its children, we need to tell the freezer not to wait for us.
diff --git a/init/initramfs.c b/init/initramfs.c
index 84c6bf1..a67ef9d 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -592,7 +592,7 @@ static int __init populate_rootfs(void)
initrd_end - initrd_start);
if (!err) {
free_initrd();
-   return 0;
+   goto done;
} else {
clean_rootfs();
unpack_to_rootfs(__initramfs_start, __initramfs_size);
@@ -607,6 +607,7 @@ static int __init populate_rootfs(void)
sys_close(fd);
free_initrd();
}
+   done:
 #else
printk(KERN_INFO Unpacking initramfs...\n);
err = unpack_to_rootfs((char *)initrd_start,
@@ -615,6 +616,11 @@ static int __init populate_rootfs(void)
printk(KERN_EMERG Initramfs unpacking failed: %s\n, 
err);
free_initrd();
 #endif
+   /*
+* Try loading default modules from initramfs.  This gives
+* us a chance to load before device_initcalls.
+*/
+   load_default_modules();
}
return 0;
 }
diff --git a/init/main.c b/init/main.c
index baf1f0f..18efadb 100644
--- a/init

[PATCH 1/5] workqueue: set PF_WQ_WORKER on rescuers

2013-01-18 Thread Tejun Heo

From 111c225a5f8d872bc9327ada18d13b75edaa34be Mon Sep 17 00:00:00 2001
From: Tejun Heo t...@kernel.org
Date: Thu, 17 Jan 2013 17:16:24 -0800

PF_WQ_WORKER is used to tell scheduler that the task is a workqueue
worker and needs wq_worker_sleeping/waking_up() invoked on it for
concurrency management.  As rescuers never participate in concurrency
management, PF_WQ_WORKER wasn't set on them.

There's a need for an interface which can query whether %current is
executing a work item and if so which.  Such interface requires a way
to identify all tasks which may execute work items and PF_WQ_WORKER
will be used for that.  As all normal workers always have PF_WQ_WORKER
set, we only need to add it to rescuers.

As rescuers start with WORKER_PREP but never clear it, it's always
NOT_RUNNING and there's no need to worry about it interfering with
concurrency management even if PF_WQ_WORKER is set; however, unlike
normal workers, rescuers currently don't have its worker struct as
kthread_data().  It uses the associated workqueue_struct instead.
This is problematic as wq_worker_sleeping/waking_up() expect struct
worker at kthread_data().

This patch adds worker-rescue_wq and start rescuer kthreads with
worker struct as kthread_data and sets PF_WQ_WORKER on rescuers.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Linus Torvalds torva...@linux-foundation.org
---
This one is the same as last time.

 kernel/workqueue.c | 35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7967f34..6b99ac7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -149,6 +149,9 @@ struct worker {
 
/* for rebinding worker to CPU */
struct work_struct  rebind_work;/* L: for busy worker */
+
+   /* used only by rescuers to point to the target workqueue */
+   struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
 };
 
 struct worker_pool {
@@ -763,12 +766,20 @@ struct task_struct *wq_worker_sleeping(struct task_struct 
*task,
   unsigned int cpu)
 {
struct worker *worker = kthread_data(task), *to_wakeup = NULL;
-   struct worker_pool *pool = worker-pool;
-   atomic_t *nr_running = get_pool_nr_running(pool);
+   struct worker_pool *pool;
+   atomic_t *nr_running;
 
+   /*
+* Rescuers, which may not have all the fields set up like normal
+* workers, also reach here, let's not access anything before
+* checking NOT_RUNNING.
+*/
if (worker-flags  WORKER_NOT_RUNNING)
return NULL;
 
+   pool = worker-pool;
+   nr_running = get_pool_nr_running(pool);
+
/* this can only happen on the local cpu */
BUG_ON(cpu != raw_smp_processor_id());
 
@@ -2357,7 +2368,7 @@ sleep:
 
 /**
  * rescuer_thread - the rescuer thread function
- * @__wq: the associated workqueue
+ * @__rescuer: self
  *
  * Workqueue rescuer thread function.  There's one rescuer for each
  * workqueue which has WQ_RESCUER set.
@@ -2374,20 +2385,27 @@ sleep:
  *
  * This should happen rarely.
  */
-static int rescuer_thread(void *__wq)
+static int rescuer_thread(void *__rescuer)
 {
-   struct workqueue_struct *wq = __wq;
-   struct worker *rescuer = wq-rescuer;
+   struct worker *rescuer = __rescuer;
+   struct workqueue_struct *wq = rescuer-rescue_wq;
struct list_head *scheduled = rescuer-scheduled;
bool is_unbound = wq-flags  WQ_UNBOUND;
unsigned int cpu;
 
set_user_nice(current, RESCUER_NICE_LEVEL);
+
+   /*
+* Mark rescuer as worker too.  As WORKER_PREP is never cleared, it
+* doesn't participate in concurrency management.
+*/
+   rescuer-task-flags |= PF_WQ_WORKER;
 repeat:
set_current_state(TASK_INTERRUPTIBLE);
 
if (kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
+   rescuer-task-flags = ~PF_WQ_WORKER;
return 0;
}
 
@@ -2431,6 +2449,8 @@ repeat:
spin_unlock_irq(gcwq-lock);
}
 
+   /* rescuers should never participate in concurrency management */
+   WARN_ON_ONCE(!(rescuer-flags  WORKER_NOT_RUNNING));
schedule();
goto repeat;
 }
@@ -3266,7 +3286,8 @@ struct workqueue_struct *__alloc_workqueue_key(const char 
*fmt,
if (!rescuer)
goto err;
 
-   rescuer-task = kthread_create(rescuer_thread, wq, %s,
+   rescuer-rescue_wq = wq;
+   rescuer-task = kthread_create(rescuer_thread, rescuer, %s,
   wq-name);
if (IS_ERR(rescuer-task))
goto err;
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/5] workqueue: rename kernel/workqueue_sched.h to kernel/workqueue_internal.h

2013-01-18 Thread Tejun Heo

From ea138446e51f7bfe55cdeffa3f1dd9cafc786bd8 Mon Sep 17 00:00:00 2001
From: Tejun Heo t...@kernel.org
Date: Fri, 18 Jan 2013 14:05:55 -0800

Workqueue wants to expose more interface internal to kernel/.  Instead
of adding a new header file, repurpose kernel/workqueue_sched.h.
Rename it to workqueue_internal.h and add include protector.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Ingo Molnar mi...@redhat.com
Cc: Peter Zijlstra pet...@infradead.org
---
Ingo, Peter, changes to scheduler is trivial.  If there's no
objection, I'd like to route this with the rest of workqueue and async
changes.

Thanks.

 kernel/sched/core.c |  2 +-
 kernel/workqueue.c  |  2 +-
 kernel/workqueue_internal.h | 18 ++
 kernel/workqueue_sched.h|  9 -
 4 files changed, 20 insertions(+), 11 deletions(-)
 create mode 100644 kernel/workqueue_internal.h
 delete mode 100644 kernel/workqueue_sched.h

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257002c..c6737f4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -83,7 +83,7 @@
 #endif
 
 #include sched.h
-#include ../workqueue_sched.h
+#include ../workqueue_internal.h
 #include ../smpboot.h
 
 #define CREATE_TRACE_POINTS
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6b99ac7..b4e9206 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -43,7 +43,7 @@
 #include linux/idr.h
 #include linux/hashtable.h
 
-#include workqueue_sched.h
+#include workqueue_internal.h
 
 enum {
/*
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
new file mode 100644
index 000..b3ea6ad
--- /dev/null
+++ b/kernel/workqueue_internal.h
@@ -0,0 +1,18 @@
+/*
+ * kernel/workqueue_internal.h
+ *
+ * Workqueue internal header file.  Only to be included by workqueue and
+ * core kernel subsystems.
+ */
+#ifndef _KERNEL_WORKQUEUE_INTERNAL_H
+#define _KERNEL_WORKQUEUE_INTERNAL_H
+
+/*
+ * Scheduler hooks for concurrency managed workqueue.  Only to be used from
+ * sched.c and workqueue.c.
+ */
+void wq_worker_waking_up(struct task_struct *task, unsigned int cpu);
+struct task_struct *wq_worker_sleeping(struct task_struct *task,
+  unsigned int cpu);
+
+#endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
diff --git a/kernel/workqueue_sched.h b/kernel/workqueue_sched.h
deleted file mode 100644
index 2d10fc9..000
--- a/kernel/workqueue_sched.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * kernel/workqueue_sched.h
- *
- * Scheduler hooks for concurrency managed workqueue.  Only to be
- * included from sched.c and workqueue.c.
- */
-void wq_worker_waking_up(struct task_struct *task, unsigned int cpu);
-struct task_struct *wq_worker_sleeping(struct task_struct *task,
-  unsigned int cpu);
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/5] workqueue: move struct worker definition to workqueue_internal.h

2013-01-18 Thread Tejun Heo

From 2eaebdb33e1911c0cf3d44fd3596c42c6f502fab Mon Sep 17 00:00:00 2001
From: Tejun Heo t...@kernel.org
Date: Fri, 18 Jan 2013 14:05:55 -0800

This will be used to implement an inline function to query whether
%current is a workqueue worker and, if so, allow determining which
work item it's executing.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Linus Torvalds torva...@linux-foundation.org
---
 kernel/workqueue.c  | 32 +---
 kernel/workqueue_internal.h | 37 +
 2 files changed, 38 insertions(+), 31 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b4e9206..2ffa240 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -122,37 +122,7 @@ enum {
  * W: workqueue_lock protected.
  */
 
-struct global_cwq;
-struct worker_pool;
-
-/*
- * The poor guys doing the actual heavy lifting.  All on-duty workers
- * are either serving the manager role, on idle list or on busy hash.
- */
-struct worker {
-   /* on idle list while idle, on busy hash table while busy */
-   union {
-   struct list_headentry;  /* L: while idle */
-   struct hlist_node   hentry; /* L: while busy */
-   };
-
-   struct work_struct  *current_work;  /* L: work being processed */
-   work_func_t current_func;   /* L: current_work's fn */
-   struct cpu_workqueue_struct *current_cwq; /* L: current_work's cwq */
-   struct list_headscheduled;  /* L: scheduled works */
-   struct task_struct  *task;  /* I: worker task */
-   struct worker_pool  *pool;  /* I: the associated pool */
-   /* 64 bytes boundary on 64bit, 32 on 32bit */
-   unsigned long   last_active;/* L: last active timestamp */
-   unsigned intflags;  /* X: flags */
-   int id; /* I: worker id */
-
-   /* for rebinding worker to CPU */
-   struct work_struct  rebind_work;/* L: for busy worker */
-
-   /* used only by rescuers to point to the target workqueue */
-   struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
-};
+/* struct worker is defined in workqueue_internal.h */
 
 struct worker_pool {
struct global_cwq   *gcwq;  /* I: the owning gcwq */
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index b3ea6ad..02549fa 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -7,6 +7,43 @@
 #ifndef _KERNEL_WORKQUEUE_INTERNAL_H
 #define _KERNEL_WORKQUEUE_INTERNAL_H
 
+#include linux/workqueue.h
+
+struct global_cwq;
+struct worker_pool;
+
+/*
+ * The poor guys doing the actual heavy lifting.  All on-duty workers are
+ * either serving the manager role, on idle list or on busy hash.  For
+ * details on the locking annotation (L, I, X...), refer to workqueue.c.
+ *
+ * Only to be used in workqueue and async.
+ */
+struct worker {
+   /* on idle list while idle, on busy hash table while busy */
+   union {
+   struct list_headentry;  /* L: while idle */
+   struct hlist_node   hentry; /* L: while busy */
+   };
+
+   struct work_struct  *current_work;  /* L: work being processed */
+   work_func_t current_func;   /* L: current_work's fn */
+   struct cpu_workqueue_struct *current_cwq; /* L: current_work's cwq */
+   struct list_headscheduled;  /* L: scheduled works */
+   struct task_struct  *task;  /* I: worker task */
+   struct worker_pool  *pool;  /* I: the associated pool */
+   /* 64 bytes boundary on 64bit, 32 on 32bit */
+   unsigned long   last_active;/* L: last active timestamp */
+   unsigned intflags;  /* X: flags */
+   int id; /* I: worker id */
+
+   /* for rebinding worker to CPU */
+   struct work_struct  rebind_work;/* L: for busy worker */
+
+   /* used only by rescuers to point to the target workqueue */
+   struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
+};
+
 /*
  * Scheduler hooks for concurrency managed workqueue.  Only to be used from
  * sched.c and workqueue.c.
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/5] workqueue: implement current_is_async()

2013-01-18 Thread Tejun Heo

From 84b233adcca3cacd5cfa8013a5feda7a3db4a9af Mon Sep 17 00:00:00 2001
From: Tejun Heo t...@kernel.org
Date: Fri, 18 Jan 2013 14:05:56 -0800

This function queries whether %current is an async worker executing an
async item.  This will be used to implement warning on synchronous
request_module() from async workers.

Signed-off-by: Tejun Heo t...@kernel.org
---
 include/linux/async.h   |  1 +
 kernel/async.c  | 14 ++
 kernel/workqueue_internal.h | 11 +++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/async.h b/include/linux/async.h
index 7a24fe9..345169c 100644
--- a/include/linux/async.h
+++ b/include/linux/async.h
@@ -52,4 +52,5 @@ extern void async_synchronize_full_domain(struct async_domain 
*domain);
 extern void async_synchronize_cookie(async_cookie_t cookie);
 extern void async_synchronize_cookie_domain(async_cookie_t cookie,
struct async_domain *domain);
+extern bool current_is_async(void);
 #endif
diff --git a/kernel/async.c b/kernel/async.c
index 9d31183..d9bf2a9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -57,6 +57,8 @@ asynchronous and synchronous parts of the kernel.
 #include linux/slab.h
 #include linux/workqueue.h
 
+#include workqueue_internal.h
+
 static async_cookie_t next_cookie = 1;
 
 #define MAX_WORK   32768
@@ -337,3 +339,15 @@ void async_synchronize_cookie(async_cookie_t cookie)
async_synchronize_cookie_domain(cookie, async_running);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_cookie);
+
+/**
+ * current_is_async - is %current an async worker task?
+ *
+ * Returns %true if %current is an async worker task.
+ */
+bool current_is_async(void)
+{
+   struct worker *worker = current_wq_worker();
+
+   return worker  worker-current_func == async_run_entry_fn;
+}
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 02549fa..cc35e7e 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -8,6 +8,7 @@
 #define _KERNEL_WORKQUEUE_INTERNAL_H
 
 #include linux/workqueue.h
+#include linux/kthread.h
 
 struct global_cwq;
 struct worker_pool;
@@ -44,6 +45,16 @@ struct worker {
struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
 };
 
+/**
+ * current_wq_worker - return struct worker if %current is a workqueue worker
+ */
+static inline struct worker *current_wq_worker(void)
+{
+   if (current-flags  PF_WQ_WORKER)
+   return kthread_data(current);
+   return NULL;
+}
+
 /*
  * Scheduler hooks for concurrency managed workqueue.  Only to be used from
  * sched.c and workqueue.c.
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/5] async, kmod: warn on synchronous request_module() from async workers

2013-01-18 Thread Tejun Heo

From 4983f3b51e18d008956dd113e0ea2f252774cefc Mon Sep 17 00:00:00 2001
From: Tejun Heo t...@kernel.org
Date: Fri, 18 Jan 2013 14:05:57 -0800

Synchronous requet_module() from an async worker can lead to deadlock
because module init path may invoke async_synchronize_full().  The
async worker waits for request_module() to complete and the module
loading waits for the async task to finish.  This bug happened in the
block layer because of default elevator auto-loading.

Block layer has been updated not to do default elevator auto-loading
and it has been decided to disallow synchronous request_module() from
async workers.

Trigger WARN_ON_ONCE() on synchronous request_module() from async
workers.

For more details, please refer to the following thread.

  http://thread.gmane.org/gmane.linux.kernel/1420814

Signed-off-by: Tejun Heo t...@kernel.org
Reported-by: Alex Riesen raa.l...@gmail.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Arjan van de Ven ar...@linux.intel.com
---
 kernel/kmod.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 1c317e3..ecd42b4 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -38,6 +38,7 @@
 #include linux/suspend.h
 #include linux/rwsem.h
 #include linux/ptrace.h
+#include linux/async.h
 #include asm/uaccess.h
 
 #include trace/events/module.h
@@ -130,6 +131,14 @@ int __request_module(bool wait, const char *fmt, ...)
 #define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
static int kmod_loop_msg;
 
+   /*
+* We don't allow synchronous module loading from async.  Module
+* init may invoke async_synchronize_full() which will end up
+* waiting for this task which already is waiting for the module
+* loading to complete, leading to a deadlock.
+*/
+   WARN_ON_ONCE(wait  current_is_async());
+
va_start(args, fmt);
ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
va_end(args);
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] init, block: try to load default elevator module early during boot

2013-01-17 Thread Tejun Heo

On Thu, Jan 17, 2013 at 10:46:47AM -0800, Linus Torvalds wrote:
 On Thu, Jan 17, 2013 at 10:38 AM, Tejun Heo t...@kernel.org wrote:
 
  Oh yeah, it's coming.  I just wanted to finish something else first
  and, as turning on PF_WQ_WORKER on a rescuer thread has some chance of
  developing into an obscure difficult-to-trigger and diagnose problem,
  don't want to hurry it too much.
 
 Ok. I think I'll delay these things for 3.9 anyway, since the actual
 _problem_ people are seeing should be fixed with your other patch. So
 I guess it's not really all that critical any more.

If you're okay with it, I'll route these two and the patches to add
warning through a wq branch.  There's already a wq/for-3.9 patch which
am_i_async() can make use of, so it's gonna be easier this way.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] init, block: try to load default elevator module early during boot

2013-01-17 Thread Tejun Heo

Hello, Linus.

On Thu, Jan 17, 2013 at 10:05:53AM -0800, Linus Torvalds wrote:
 And here I was really hoping that there was a third patch in the
 series that added the warning...
 
 We don't currently have a am I an async worker helper function for
 the warning to use, which is something very much up your alley.

Oh yeah, it's coming.  I just wanted to finish something else first
and, as turning on PF_WQ_WORKER on a rescuer thread has some chance of
developing into an obscure difficult-to-trigger and diagnose problem,
don't want to hurry it too much.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] async: fix __lowest_in_progress()

2013-01-17 Thread Tejun Heo

Hello,

On Thu, Jan 17, 2013 at 10:16:50AM -0800, Linus Torvalds wrote:
 Tejun, mind explaining this one a bit more to me?
 
 If ordering matters, and we have a running queue and a pending queue,
 how could the pending queue *ever* be lower than the running one?

So, before being converted to workqueue, async spooled up its own
workers and each worker would lock and move the first pending item to
the executing list and everything was in order.

The conversion to workqueue was done by adding work_struct to each
async_entry and async just schedules the work item.  The queueing and
dispatching of such work items are still in order but now each worker
thread is associated with a specific async_entry and move that
specific async_entry to the executing list.  So, depending on which
worker reaches that point earlier, which is completely
non-deterministic, we may end up moving an async_entry with larger
cookie before one with smaller one.

 That implies that something was taken off the pending queue and put on
 the running queue out of order, right?
 
 And that in turn implies that there isn't much of a lowest ordering
 at all, so how could anybody even care about what lowest is? It seems
 to be a meaningless measure.

The execution is still lowest first as workqueue would dispatch
workers in queued order.  It just is that they can reach the
synchronization point at their own differing paces.

 So with that in mind, I don't see what semantics the first part of the
 patch fixes. Can you explain more?

The problem with the code is that it's keeping a global pending list
and domain-specific running lists.  I don't know why it developed like
this but even before workqueue conversion the code was weird.

* It has sorted per-domain running list, so looking at the first item
  is easy.

* It has sorted global pennding list, and looking for first item in a
  domain involves scanning it.

Global syncing ends up scanning all per-domain running lists and
domain syncing ends up scanning global pending list, when all we need
is each async item to be queued on two lists - global and per-domain
in-flight lists - and stay there until done.

The posted patch is minimal fix while keeping the basic operation the
same so that it doesn't disturb -stable too much.  I'll prep a patch
to redo synchronization for 3.9.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] workqueue: set PF_WQ_WORKER on rescuers

2013-01-17 Thread Tejun Heo

PF_WQ_WORKER is used to tell scheduler that the task is a workqueue
worker and needs wq_worker_sleeping/waking_up() invoked on it for
concurrency management.  As rescuers never participate in concurrency
management, PF_WQ_WORKER wasn't set on them.

There's a need for an interface which can query whether %current is
executing a work item and if so which.  Such interface requires a way
to identify all tasks which may execute work items and PF_WQ_WORKER
will be used for that.  As all normal workers always have PF_WQ_WORKER
set, we only need to add it to rescuers.

As rescuers start with WORKER_PREP but never clear it, it's always
NOT_RUNNING and there's no need to worry about it interfering with
concurrency management even if PF_WQ_WORKER is set; however, unlike
normal workers, rescuers currently don't have its worker struct as
kthread_data().  It uses the associated workqueue_struct instead.
This is problematic as wq_worker_sleeping/waking_up() expect struct
worker at kthread_data().

This patch adds worker-rescue_wq and start rescuer kthreads with
worker struct as kthread_data and sets PF_WQ_WORKER on rescuers.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Linus Torvalds torva...@linux-foundation.org
---
These three patches implement the warning on synchronous
request_module() from async.  The first two will go through wq/for-3.9
and the last one through wq/for-3.9-async-deadlock-fixes together with
the block layer updates.

Thanks.

 kernel/workqueue.c | 35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7967f34..6b99ac7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -149,6 +149,9 @@ struct worker {
 
/* for rebinding worker to CPU */
struct work_struct  rebind_work;/* L: for busy worker */
+
+   /* used only by rescuers to point to the target workqueue */
+   struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
 };
 
 struct worker_pool {
@@ -763,12 +766,20 @@ struct task_struct *wq_worker_sleeping(struct task_struct 
*task,
   unsigned int cpu)
 {
struct worker *worker = kthread_data(task), *to_wakeup = NULL;
-   struct worker_pool *pool = worker-pool;
-   atomic_t *nr_running = get_pool_nr_running(pool);
+   struct worker_pool *pool;
+   atomic_t *nr_running;
 
+   /*
+* Rescuers, which may not have all the fields set up like normal
+* workers, also reach here, let's not access anything before
+* checking NOT_RUNNING.
+*/
if (worker-flags  WORKER_NOT_RUNNING)
return NULL;
 
+   pool = worker-pool;
+   nr_running = get_pool_nr_running(pool);
+
/* this can only happen on the local cpu */
BUG_ON(cpu != raw_smp_processor_id());
 
@@ -2357,7 +2368,7 @@ sleep:
 
 /**
  * rescuer_thread - the rescuer thread function
- * @__wq: the associated workqueue
+ * @__rescuer: self
  *
  * Workqueue rescuer thread function.  There's one rescuer for each
  * workqueue which has WQ_RESCUER set.
@@ -2374,20 +2385,27 @@ sleep:
  *
  * This should happen rarely.
  */
-static int rescuer_thread(void *__wq)
+static int rescuer_thread(void *__rescuer)
 {
-   struct workqueue_struct *wq = __wq;
-   struct worker *rescuer = wq-rescuer;
+   struct worker *rescuer = __rescuer;
+   struct workqueue_struct *wq = rescuer-rescue_wq;
struct list_head *scheduled = rescuer-scheduled;
bool is_unbound = wq-flags  WQ_UNBOUND;
unsigned int cpu;
 
set_user_nice(current, RESCUER_NICE_LEVEL);
+
+   /*
+* Mark rescuer as worker too.  As WORKER_PREP is never cleared, it
+* doesn't participate in concurrency management.
+*/
+   rescuer-task-flags |= PF_WQ_WORKER;
 repeat:
set_current_state(TASK_INTERRUPTIBLE);
 
if (kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
+   rescuer-task-flags = ~PF_WQ_WORKER;
return 0;
}
 
@@ -2431,6 +2449,8 @@ repeat:
spin_unlock_irq(gcwq-lock);
}
 
+   /* rescuers should never participate in concurrency management */
+   WARN_ON_ONCE(!(rescuer-flags  WORKER_NOT_RUNNING));
schedule();
goto repeat;
 }
@@ -3266,7 +3286,8 @@ struct workqueue_struct *__alloc_workqueue_key(const char 
*fmt,
if (!rescuer)
goto err;
 
-   rescuer-task = kthread_create(rescuer_thread, wq, %s,
+   rescuer-rescue_wq = wq;
+   rescuer-task = kthread_create(rescuer_thread, rescuer, %s,
   wq-name);
if (IS_ERR(rescuer-task))
goto err;
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo

[PATCH 2/3] workqueue, async: implement work/async_current_func()

2013-01-17 Thread Tejun Heo

Implement work/async_current_func() which query whether the current
task is a workqueue or async worker respectively and, if so, return
the current function being executed along with work / async item
related information.

This will be used to implement warning on synchronous request_module()
from async workers.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Arjan van de Ven ar...@linux.intel.com
---
 include/linux/async.h |  2 ++
 include/linux/workqueue.h |  1 +
 kernel/async.c| 25 +
 kernel/workqueue.c| 22 ++
 4 files changed, 50 insertions(+)

diff --git a/include/linux/async.h b/include/linux/async.h
index 7a24fe9..6c49157 100644
--- a/include/linux/async.h
+++ b/include/linux/async.h
@@ -52,4 +52,6 @@ extern void async_synchronize_full_domain(struct async_domain 
*domain);
 extern void async_synchronize_cookie(async_cookie_t cookie);
 extern void async_synchronize_cookie_domain(async_cookie_t cookie,
struct async_domain *domain);
+extern async_func_ptr *async_current_func(void **datap,
+ async_cookie_t *cookiep);
 #endif
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2b58905..984fbef 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -428,6 +428,7 @@ extern void workqueue_set_max_active(struct 
workqueue_struct *wq,
 extern bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq);
 extern unsigned int work_cpu(struct work_struct *work);
 extern unsigned int work_busy(struct work_struct *work);
+extern work_func_t work_current_func(struct work_struct **workp);
 
 /*
  * Like above, but uses del_timer() instead of del_timer_sync(). This means,
diff --git a/kernel/async.c b/kernel/async.c
index 9d31183..ed1eda0 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -337,3 +337,28 @@ void async_synchronize_cookie(async_cookie_t cookie)
async_synchronize_cookie_domain(cookie, async_running);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_cookie);
+
+/**
+ * async_current_func - determine the async entry %current is executing
+ * @datap: optional out param for the data
+ * @cookiep: optional out param for the cookie
+ *
+ * Determine whether %current is an async worker executing an async_entry
+ * and if so return the async function and, if @cookiep is not %NULL, the
+ * cookie.  If %current isn't executing an async_entry, %NULL is returned.
+ */
+async_func_ptr *async_current_func(void **datap, async_cookie_t *cookiep)
+{
+   struct work_struct *work;
+   struct async_entry *entry;
+
+   if (work_current_func(work) != async_run_entry_fn)
+   return NULL;
+
+   entry = container_of(work, struct async_entry, work);
+   if (datap)
+   *datap = entry-data;
+   if (cookiep)
+   *cookiep = entry-cookie;
+   return entry-func;
+}
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6b99ac7..9fc1549 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3492,6 +3492,28 @@ unsigned int work_busy(struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(work_busy);
 
+/**
+ * work_current_func - determine the work fn and item %current is executing
+ * @workp: optional out param for the current work item
+ *
+ * Determine whether %current is a kworker executing a work item and if so
+ * return the work function and, if @workp is not %NULL, the work item.  If
+ * %current isn't executing a work item, %NULL is returned.
+ */
+work_func_t work_current_func(struct work_struct **workp)
+{
+   struct worker *worker;
+
+   /* am I a kworker? */
+   if (!(current-flags  PF_WQ_WORKER))
+   return NULL;
+
+   worker = kthread_data(current);
+   if (workp)
+   *workp = worker-current_work;
+   return worker-current_func;
+}
+
 /*
  * CPU hotplug.
  *
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] async, kmod: warn on synchronous request_module() from async workers

2013-01-17 Thread Tejun Heo

Synchronous requet_module() from an async worker can lead to deadlock
because module init path may invoke async_synchronize_full().  The
async worker waits for request_module() to complete and the module
loading waits for the async task to finish.  This bug happened in the
block layer because of default elevator auto-loading.

Block layer has been updated not to do default elevator auto-loading
and it has been decided to disallow synchronous request_module() from
async workers.

Trigger WARN_ON_ONCE() on synchronous request_module() from async
workers.

For more details, please refer to the following thread.

  http://thread.gmane.org/gmane.linux.kernel/1420814

Signed-off-by: Tejun Heo t...@kernel.org
Reported-by: Alex Riesen raa.l...@gmail.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Arjan van de Ven ar...@linux.intel.com
---
Linus, please note that I dropped system_state == SYSTEM_RUNNING
condition from WARN_ON_ONCE() as the deadlock can happen during system
init too - e.g. libata probing block device using async making block
layer try to load default elevator from initramfs.

Thanks.

 kernel/kmod.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 1c317e3..028287e 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -38,6 +38,7 @@
 #include linux/suspend.h
 #include linux/rwsem.h
 #include linux/ptrace.h
+#include linux/async.h
 #include asm/uaccess.h
 
 #include trace/events/module.h
@@ -130,6 +131,14 @@ int __request_module(bool wait, const char *fmt, ...)
 #define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
static int kmod_loop_msg;
 
+   /*
+* We don't allow synchronous module loading from async.  Module
+* init may invoke async_synchronize_full() which will end up
+* waiting for this task which already is waiting for the module
+* loading to complete, leading to a deadlock.
+*/
+   WARN_ON_ONCE(wait  async_current_func(NULL, NULL));
+
va_start(args, fmt);
ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
va_end(args);
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] workqueue, async: implement work/async_current_func()

2013-01-17 Thread Tejun Heo

Hello, Linus.

On Thu, Jan 17, 2013 at 06:47:48PM -0800, Linus Torvalds wrote:
 On Thu, Jan 17, 2013 at 5:25 PM, Tejun Heo t...@kernel.org wrote:
  Implement work/async_current_func() which query whether the current
  task is a workqueue or async worker respectively and, if so, return
  the current function being executed along with work / async item
  related information.
 
 So why the odd interface? The only user of it calls it with a

Yeah, I was doing something else in async and arguing between that and
current_is_async() and ended up keeping it as it was consistent with
the workqueue counterpart.

 NULL/NULL pair of arguments, and in general it's just way too complex
 to be an exported function at all. I *suspect* you chose that complex
 interface because you feel you may have some use for it inside of the
 async code itself, but why isn't that then not totally private to
 there?
 
 IOW, why isn't the interface just
 
static struct worker *current_worker(void)
{
   if (current-flags  PF_WQ_WORKER)
  return kthread_data(current);
   return NULL;
}

I'd prefer to keep struct worker inside workqueue.c, so how about
keeping the workqueue part and make async part current_is_async()?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] workqueue, async: implement work/async_current_func()

2013-01-17 Thread Tejun Heo

On Thu, Jan 17, 2013 at 07:18:26PM -0800, Linus Torvalds wrote:
 I'm not sure I understand what you mean? Do you mean trying to limit
 work_current_func() to only be accessible to the async code? You'd
 have to make some kind of private header file under kernel/ for that,
 but I guess that would work fine. We already do something similar
 inside filesystems etc, where they have their own local headers.

Yeap, and I'm unsure whether it's worth introducing a new internal
header file.

 Yes, yes, some globally optimizing compiler could sort it all out, but
 I'd personally be inclined to just move all the structure definitions
 into kernel/worker.h, and make the code be inline functions. The only
 actual current *user* would also be in the kernel/ subdirectory, and
 we don't know if we'd ever want to really expand it past there.
 
 Hmm?

If we're gonna make it kernel/ internal thing with internal header, we
definitely can go all the way.  It's a bit meh because the code path
involved is very cold.  Hmm... I'll make it that way.  I like how it
keeps the thing apparently internal.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-16 Thread Tejun Heo

Hello, Alan.

On Tue, Jan 15, 2013 at 11:01:15PM -0500, Alan Stern wrote:
  The current domain implementation is somewhere inbetween.  It's not
  completely simplistic system and at the same time not developed enough
  to do properly stacked flushing.
 
 I like your idea of chronological synchronization: Insist that anybody
 who wants to flush async jobs must get a cookie, and then only allow
 them to wait for async jobs started after the cookie was issued.
 
 I don't know if this is possible with the current implementation.  It 
 would require changing every call to async_synchronize_*(), and in a 
 nontrivial way.  But it might provide a proper solution to all these 
 problems.

The problem here is that flush everything which comes before me is
used to order async jobs.  e.g. after async jobs probe the hardware
they order themselves by flushing before registering them, so unless
we build accurate flushing dependencies, those dependencies will reach
beyond the time window we're interested in and bring in deadlocks.

And, as Linus pointed it out, tracking dependency through
request_module() is tricky no matter what we do.  I think it can be
done by matching the ones calling request_module() and the ones
actually loading modules but it's gonna be nasty.

There aren't too many which use async anyway so changing stuff
shouldn't be too difficult but I think the simpicity or dumbness is
one of major attractions of async, so it'd be nice to keep things that
way and the PF_USED_ASYNC hack seems to be able to hold things
together for now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] async: fix __lowest_in_progress()

2013-01-16 Thread Tejun Heo

083b804c4d3e1e3d0eace56bdbc0f674946d2847 (async: use workqueue for
worker pool) made it possible that async jobs are moved from pending
to running out-of-order.  While pending async jobs will be queued and
dispatched for execution in the same order, nothing guarantees they'll
enter 1) move self to the running queue of async_run_entry_fn() in
the same order.

This broke __lowest_in_progress().  running-domain may not be
properly sorted and is not guaranteed to contain lower cookies than
pending list when not empty.  Fix it by ensuring sort-inserting to the
running list and always looking at both pending and running when
trying to determine the lowest cookie.

Over time, the async synchronization implementation became quite
messy.  We better restructure it such that each async_entry is linked
to two lists - one global and one per domain - and not move it when
execution starts.  There's no reason to distinguish pending and
running.  They behave the same for synchronization purposes.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Arjan van de Ven ar...@linux.intel.com
Cc: sta...@vger.kernel.org
---
And here's the fix for the breakage I mentioned earlier.  It wouldn't
happen often in the wild and the effect of it happening wouldn't be
critical for modern distros but it's still kinda surprising nobody
noticed this.

We definitely need to rewrite async synchronization.  It was already
messy and this makes it worse and there's no reason to be messy here.

Thanks.

 kernel/async.c |   27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,18 +86,27 @@ static atomic_t entry_count;
  */
 static async_cookie_t  __lowest_in_progress(struct async_domain *running)
 {
+   async_cookie_t first_running = next_cookie; /* infinity value */
+   async_cookie_t first_pending = next_cookie; /* ditto */
struct async_entry *entry;
 
+   /*
+* Both running and pending lists are sorted but not disjoint.
+* Take the first cookies from both and return the min.
+*/
if (!list_empty(running-domain)) {
entry = list_first_entry(running-domain, typeof(*entry), 
list);
-   return entry-cookie;
+   first_running = entry-cookie;
}
 
-   list_for_each_entry(entry, async_pending, list)
-   if (entry-running == running)
-   return entry-cookie;
+   list_for_each_entry(entry, async_pending, list) {
+   if (entry-running == running) {
+   first_pending = entry-cookie;
+   break;
+   }
+   }
 
-   return next_cookie; /* infinity value */
+   return min(first_running, first_pending);
 }
 
 static async_cookie_t  lowest_in_progress(struct async_domain *running)
@@ -118,13 +127,17 @@ static void async_run_entry_fn(struct wo
 {
struct async_entry *entry =
container_of(work, struct async_entry, work);
+   struct async_entry *pos;
unsigned long flags;
ktime_t uninitialized_var(calltime), delta, rettime;
struct async_domain *running = entry-running;
 
-   /* 1) move self to the running queue */
+   /* 1) move self to the running queue, make sure it stays sorted */
spin_lock_irqsave(async_lock, flags);
-   list_move_tail(entry-list, running-domain);
+   list_for_each_entry_reverse(pos, running-domain, list)
+   if (entry-cookie  pos-cookie)
+   break;
+   list_move_tail(entry-list, pos-list);
spin_unlock_irqrestore(async_lock, flags);
 
/* 2) run (and print duration) */
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-16 Thread Tejun Heo

Hello, Alan.

On Wed, Jan 16, 2013 at 12:01:53PM -0500, Alan Stern wrote:
  The problem here is that flush everything which comes before me is
  used to order async jobs.  e.g. after async jobs probe the hardware
  they order themselves by flushing before registering them, so unless
 
 I don't fully understand this example.  What is the point -- to make 
 sure that asynchronously probed devices are registered in the order of 
 their discovery?

People still want devices to be numbered to their physical ports and
so on, so we keep the registeration order the same as natural
(whatever that means) hardware order.

 If so, here's how to do it safely: Start up the async jobs in reverse
 order of discovery.  Have each job acquire a cookie when it starts.  
 Then each job needs to wait only for tasks that started after its
 cookie was issued.

It's a bit clumsy but yeah I guess it could work.

  There aren't too many which use async anyway so changing stuff
  shouldn't be too difficult but I think the simpicity or dumbness is
  one of major attractions of async, so it'd be nice to keep things that
  way and the PF_USED_ASYNC hack seems to be able to hold things
  together for now.
 
 Nesting won't matter for the chronological approach.  I really think 
 you should consider it more fully.  It's not a hack, and it doesn't 
 need to be complicated.

There is benefit to the current dumb implementation in that drivers
can use it without thinking too much, but yeah it could be that the
flushing range limit isn't too much of restriction on top.  I don't
know.  At this point, I'd prefer to remove request_module() from
elevator init path for the problem at hand.  If we need something more
involved, changing cookie usage rules definitely seems like an option.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] init, block: try to load default elevator module early during boot

2013-01-16 Thread Tejun Heo

This patch adds default module loading and uses it to load the default
block elevator.  During boot, it's called right after initramfs or
initrd is made available and right before control is passed to
userland.  This ensures that as long as the modules are available in
the usual places in initramfs, initrd or the root filesystem, the
default modules are loaded as soon as possible.

This will replace the on-demand elevator module loading from elevator
init path.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Jens Axboe ax...@kernel.dk
Cc: Arjan van de Ven ar...@linux.intel.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Alex Riesen raa.l...@gmail.com
---
 block/elevator.c |   16 
 include/linux/elevator.h |1 +
 include/linux/init.h |1 +
 init/do_mounts_initrd.c  |3 +++
 init/initramfs.c |8 +++-
 init/main.c  |   16 
 6 files changed, 44 insertions(+), 1 deletion(-)

--- a/block/elevator.c
+++ b/block/elevator.c
@@ -136,6 +136,22 @@ static int __init elevator_setup(char *s
 
 __setup(elevator=, elevator_setup);
 
+/* called during boot to load the elevator chosen by the elevator param */
+void __init load_default_elevator_module(void)
+{
+   struct elevator_type *e;
+
+   if (!chosen_elevator[0])
+   return;
+
+   spin_lock(elv_list_lock);
+   e = elevator_find(chosen_elevator);
+   spin_unlock(elv_list_lock);
+
+   if (!e)
+   request_module(%s-iosched, chosen_elevator);
+}
+
 static struct kobj_type elv_ktype;
 
 static struct elevator_queue *elevator_alloc(struct request_queue *q,
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -138,6 +138,7 @@ extern void elv_drain_elevator(struct re
 /*
  * io scheduler registration
  */
+extern void __init load_default_elevator_module(void);
 extern int elv_register(struct elevator_type *);
 extern void elv_unregister(struct elevator_type *);
 
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -161,6 +161,7 @@ extern unsigned int reset_devices;
 /* used by init/main.c */
 void setup_arch(char **);
 void prepare_namespace(void);
+void __init load_default_modules(void);
 
 extern void (*late_time_init)(void);
 
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -57,6 +57,9 @@ static void __init handle_initrd(void)
sys_mkdir(/old, 0700);
sys_chdir(/old);
 
+   /* try loading default modules from initrd */
+   load_default_modules();
+
/*
 * In case that a resume from disk is carried out by linuxrc or one of
 * its children, we need to tell the freezer not to wait for us.
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -592,7 +592,7 @@ static int __init populate_rootfs(void)
initrd_end - initrd_start);
if (!err) {
free_initrd();
-   return 0;
+   goto done;
} else {
clean_rootfs();
unpack_to_rootfs(__initramfs_start, __initramfs_size);
@@ -607,6 +607,7 @@ static int __init populate_rootfs(void)
sys_close(fd);
free_initrd();
}
+   done:
 #else
printk(KERN_INFO Unpacking initramfs...\n);
err = unpack_to_rootfs((char *)initrd_start,
@@ -615,6 +616,11 @@ static int __init populate_rootfs(void)
printk(KERN_EMERG Initramfs unpacking failed: %s\n, 
err);
free_initrd();
 #endif
+   /*
+* Try loading default modules from initramfs.  This gives
+* us a chance to load before device_initcalls.
+*/
+   load_default_modules();
}
return 0;
 }
--- a/init/main.c
+++ b/init/main.c
@@ -70,6 +70,8 @@
 #include linux/perf_event.h
 #include linux/file.h
 #include linux/ptrace.h
+#include linux/blkdev.h
+#include linux/elevator.h
 
 #include asm/io.h
 #include asm/bugs.h
@@ -794,6 +796,17 @@ static void __init do_pre_smp_initcalls(
do_one_initcall(*fn);
 }
 
+/*
+ * This function requests modules which should be loaded by default and is
+ * called twice right after initrd is mounted and right before init is
+ * exec'd.  If such modules are on either initrd or rootfs, they will be
+ * loaded before control is passed to userland.
+ */
+void __init load_default_modules(void)
+{
+   load_default_elevator_module();
+}
+
 static int run_init_process(const char *init_filename)
 {
argv_init[0] = init_filename;
@@ -900,4 +913,7 @@ static void __init kernel_init_freeable(
 * we're essentially up and running. Get rid of the
 * initmem segments and start the user-mode stuff..
 */
+
+   /* rootfs is available now, try loading default modules */
+   load_default_modules();
 }
--
To unsubscribe from this list: send the line unsubscribe linux

[PATCH 2/2] block: don't request module during elevator init

2013-01-16 Thread Tejun Heo

Block layer allows selecting an elevator which is built as a module to
be selected as system default via kernel param elevator=.  This is
achieved by automatically invoking request_module() whenever a new
block device is initialized and the elevator is not available.

This led to an interesting deadlock problem involving async and module
init.  Block device probing running off an async job invokes
request_module().  While the module is being loaded, it performs
async_synchronize_full() which ends up waiting for the async job which
is already waiting for request_module() to finish, leading to
deadlock.

Invoking request_module() from deep in block device init path is
already nasty in itself.  It seems best to avoid these situations from
the beginning by moving on-demand module loading out of block init
path.

The previous patch made sure that the default elevator module is
loaded early during boot if available.  This patch removes on-demand
loading of the default elevator from elevator init path.  As the
module would have been loaded during boot, userland-visible behavior
difference should be minimal.

For more details, please refer to the following thread.

  http://thread.gmane.org/gmane.linux.kernel/1420814

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Jens Axboe ax...@kernel.dk
Cc: Arjan van de Ven ar...@linux.intel.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Alex Riesen raa.l...@gmail.com
---
 block/elevator.c |   19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

--- a/block/elevator.c
+++ b/block/elevator.c
@@ -100,14 +100,14 @@ static void elevator_put(struct elevator
module_put(e-elevator_owner);
 }
 
-static struct elevator_type *elevator_get(const char *name)
+static struct elevator_type *elevator_get(const char *name, bool 
request_module)
 {
struct elevator_type *e;
 
spin_lock(elv_list_lock);
 
e = elevator_find(name);
-   if (!e) {
+   if (!e  request_module) {
spin_unlock(elv_list_lock);
request_module(%s-iosched, name);
spin_lock(elv_list_lock);
@@ -207,25 +207,30 @@ int elevator_init(struct request_queue *
q-boundary_rq = NULL;
 
if (name) {
-   e = elevator_get(name);
+   e = elevator_get(name, true);
if (!e)
return -EINVAL;
}
 
+   /*
+* Use the default elevator specified by config boot param or
+* config option.  Don't try to load modules as we could be running
+* off async and request_module() isn't allowed from async.
+*/
if (!e  *chosen_elevator) {
-   e = elevator_get(chosen_elevator);
+   e = elevator_get(chosen_elevator, false);
if (!e)
printk(KERN_ERR I/O scheduler %s not found\n,
chosen_elevator);
}
 
if (!e) {
-   e = elevator_get(CONFIG_DEFAULT_IOSCHED);
+   e = elevator_get(CONFIG_DEFAULT_IOSCHED, false);
if (!e) {
printk(KERN_ERR
Default I/O scheduler not found.  \
Using noop.\n);
-   e = elevator_get(noop);
+   e = elevator_get(noop, false);
}
}
 
@@ -967,7 +972,7 @@ int elevator_change(struct request_queue
return -ENXIO;
 
strlcpy(elevator_name, name, sizeof(elevator_name));
-   e = elevator_get(strstrip(elevator_name));
+   e = elevator_get(strstrip(elevator_name), true);
if (!e) {
printk(KERN_ERR elevator: type %s not found\n, elevator_name);
return -EINVAL;
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-15 Thread Tejun Heo

Hello, Linus.

On Tue, Jan 15, 2013 at 09:36:57AM -0800, Linus Torvalds wrote:
 Tejun, comments? You can see the whole thread on lkml, but the basic
 problem is that the module loading doing the unconditional
 async_synchronize_full() has caused problems, because we have
 
  - load module A
- module A does per-controller async discovery of its devices (eg
 scsi or ata probing)
- in the async thread, it initializes somethign that needs another
 module B (in this case the default IO scheduler module)
   - modprobe for B loads the IO scheduler module successfully
   at the end of the module load, it does
 async_synchronize_full() to make sure load_module won't return before
 the module is ready
   *DEADLOCK*, because the async_synchronize_full() thing
 actually waits for not the module B async code (it didn't have any),
 but for the module *A* async code, which is waiting for module B to
 finish.

I think the root problem here, apart from request_module() from block
- which is a bit nasty but making that part completely async would too
be quite nasty albeit in a different way - is that
async_synchronize_full() is way too indescriminate.  It's something
only suitable for things like the end of system init.

I'm wondering whether what we need is a rudimentray nesting like the
following.

finished_loading()
{
blah blah;

cookie = async_current_cookie();

do init calls;

async_synchronize_upto(cookie);

blah blah;
}

The nesting here would be an approximation as the dependency recorded
here is chronological.  I *suspect* this should be safe unless the
module is doing something weird.  Need to think more about it.  One
way or the other, I think what we need is some form of scoping for
flushing async ops.

BTW, the current synchronization is broken - cookie isn't transferred
to running-domain in queueing order but __lowest_in_progress()
assumes that.  I think I broke that while converting it to workqueue.

Anyways, working on it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-15 Thread Tejun Heo

Hello, Alan.

On Tue, Jan 15, 2013 at 01:20:58PM -0500, Alan Stern wrote:
 It may not be so easy.  When the SCSI async thread probes the new disk, 
 it has to do I/O.  So it needs to use a scheduler.
 
 But maybe it could use a built-in trivial scheduler until the proper 
 one is loaded.  Then the loading could be asynchronous.

It can be done.  Noop is always built-in and block IO can do IOs with
noop.  The problem here is that request_module() is done synchronously
during evelator_init().  We can punt that to a work item so that the
elevator is switched on load completion.  There are some nastiness
involved tho - if module probing returns before elevator switch
happens, the userland can observe elevator being switched after some
indetermined short period of time, which can, for example, break
scripts adjusting elevator knobs and etc...

I *think* it'll be best to allow scoped synchronization of async ops.
Looking into it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-15 Thread Tejun Heo

Hello, Linus

Will continue on another reply but this one is relevant so...

On Tue, Jan 15, 2013 at 10:18:45AM -0800, Linus Torvalds wrote:
 Tejun, is there a good way for code to see I'm running in async
 context? Then we could do something like

Almost.  With a bit of modification we can ask whether current is a
kworker, reach struct worker_struct via kthread_data() if so and then
test worker-current_func against the async workfn.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-15 Thread Tejun Heo

cc'ing Arjan.  Arjan, the original thread can be read from

  http://thread.gmane.org/gmane.linux.kernel/1420814

Hello, again.

On Tue, Jan 15, 2013 at 12:18:01PM -0800, Linus Torvalds wrote:
 I think that is a good solution if it works, but look out: we need to
 synchronize across *all* domains, not just the default one.  The sd.c
 code, for example, uses its own scsi_sd_probe_domain for example,
 and we *do* want to synchronize with it.
 
 Can you do that with your suggested interface (ie it would have to be
 a *global* sequence number).

So, I've been thinking about it for a while now and it looks like
async is cutting too many corners to implement any sane stackable
flushing scheme on top.  There simply isn't much information to
determine who should wait for what.

I've thought of two workarounds.  Both suck.

A. Try to detect deadlock conditions from synchronize().  If deadlock
   condition involving other async jobs are detected, whine about it
   and then skip.  Ignore deadlock condition on self (should solve
   this particular case).

   Detecting deadlock condition isn't difficult if there are only
   global synchronizations; unfortunately, fragmented dependencies via
   domain-local synchronization makes this non-trivial.

   We can still do ignore-self thing mostly trivially tho.  This will
   at least work around the problem at hand.

B. The ranged synchronization I first suggested.  The problem with
   this is that it's a common practice for a given async job to try to
   flush anything which comes before it.  This can introduce spurious
   synchronization dependencies which can then lead to deadlocks.

   These conditions can be detected and ignored, at least only
   considering global synchronizations.  The problem here is that
   those deadlock conditions will occur under normal usage and thus
   should be ignored silently, which basically makes synchronization
   silently ignore and finish successfully even if there are
   legitimate deadlocks which should be investigated.

For now, I'm gonna implement simple I'm not gonna wait for myself
self-deadlock avoidance.  If this needs any more sophistication, I
think we better reimplement it so that we can explicitly match up and
track who's gonna wait for what instead of throwing everything into a
single cookie space and then try to work back from there.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-15 Thread Tejun Heo

Hello, Arjan.

On Tue, Jan 15, 2013 at 04:25:54PM -0800, Arjan van de Ven wrote:
 async fundamentally had the concept of a monotonic increasing number,
 and that you could always wait for everyone before me.
 then people (like me) wanted exceptions to what everyone means ;-(
 I'm ok with going back to a single space and simplify the world.

If we want (or need) finer grained operation, we'll probably have to
head the other direction, so that we can definitively tell that an
async operation belongs to domains system, module load A and B, so
that each waiter knows what to wait for.

The current domain implementation is somewhere inbetween.  It's not
completely simplistic system and at the same time not developed enough
to do properly stacked flushing.

 the module wait case is tricky, and I wonder if there's deadlocks
 lurking even without async.

I don't think so.  It's really an async job waiting for itself.
Working around just this case is mostly trivial (working on patches
now) but it really is putting kludges on top of shaky foundation.
Maybe this is the extent of complexity that we need to go given the
rather limited use cases of async.  Let's hope so.  I think we'll have
to reimplement synchronization scheme if we have to go further.

 at some point in the past we had the concept of request a module
 but don't wait for it, and I wonder if that is what should have
 been used here.

We actually want to wait for it as it creates a userland visible
behavior difference otherwise.  It's just that async's way of waiting
is too ham-fisted to be used properly in more complex scenarios.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: USB device cannot be reconnected and khubd blocked for more than 120 seconds

2013-01-15 Thread Tejun Heo

On Tue, Jan 15, 2013 at 04:36:34PM -0800, Linus Torvalds wrote:
 The thing is, the module loading in particular is not necessarily
 happening in the same context as what *started* the module loading. A
 module loader will request the module from user space, and then later
 user space - through possibly a totally unrelated process - will
 finish it. So there is no myself. There's not even necessarily any
 relationship that the kernel even knows about, because the module
 loading request can have gone from usermode_helper over something like
 dbus to systemd.
 
 See?

Right.  Gees, there's even no way to link them.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] module, async: async_synchronize_full() on module init iff async is used

2013-01-15 Thread Tejun Heo

Hello, Linus.

On Tue, Jan 15, 2013 at 07:00:31PM -0800, Linus Torvalds wrote:
 That said, maybe we could just make the rule be that you can't pick a
 default IO scheduler that is modular.

This is definitely much more preferable but it would affect use case
where everything is built modular and the elevator is selected via
kernel param.  This is way outside the usual usage and we can warn
about the new behavior but it still is an observable behavior change.
Do you think this would be okay?

 And I *would* like to see the warning we discussed. Maybe there are
 other situations that can trigger this?
 
 Because something like that
 
 WARN_ON_ONCE(wait  i_am_async()  system_state == SYSTEM_RUNNING);
 
 in kernel/kmod.c (__request_module()) still sounds like a good idea to
 verify that this is the only thing that triggers it (of course, we'd
 need to somehow avoid the warning for the known case with the known
 workaround).

And then this warning can be added without introducing
request_module_but_dont_warn_about_being_called_from_async().

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 22/25] usb/at91_udc: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo

There's no need to test whether a (delayed) work item in pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from at91_udc.  Only compile tested.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Andrew Victor li...@maxim.org.za
Cc: Nicolas Ferre nicolas.fe...@atmel.com
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Felipe Balbi ba...@ti.com
Cc: linux-usb@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/usb/gadget/at91_udc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/at91_udc.c b/drivers/usb/gadget/at91_udc.c
index f4a21f6..e81d8a2 100644
--- a/drivers/usb/gadget/at91_udc.c
+++ b/drivers/usb/gadget/at91_udc.c
@@ -1621,8 +1621,7 @@ static void at91_vbus_timer(unsigned long data)
 * bus such as i2c or spi which may sleep, so schedule some work
 * to read the vbus gpio
 */
-   if (!work_pending(udc-vbus_timer_work))
-   schedule_work(udc-vbus_timer_work);
+   schedule_work(udc-vbus_timer_work);
 }
 
 static int at91_start(struct usb_gadget *gadget,
-- 
1.8.0.2

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

84 matches

Mail list logo