Re: [PATCH v6 22/22] of/platform: Defer probes of registered devices

2015-10-24 Thread Rafael J. Wysocki
On Thursday, October 22, 2015 04:27:10 PM Scott Wood wrote:
> On Thu, 2015-10-22 at 15:04 +0200, Tomeu Vizoso wrote:
> > On 22 October 2015 at 00:51, Scott Wood  wrote:
> > > On Wed, 2015-10-21 at 08:44 -0500, Rob Herring wrote:
> > > > On Wed, Oct 21, 2015 at 12:54 AM, Scott Wood 
> > > > wrote:
> > > > > On Mon, 2015-09-21 at 16:03 +0200, Tomeu Vizoso wrote:
> > > > > > Instead of trying to match and probe platform and AMBA devices right
> > > > > > after each is registered, delay their probes until 
> > > > > > device_initcall_sync.
> > > > > > 
> > > > > > This means that devices will start probing once all built-in drivers
> > > > > > have registered, and after all platform and AMBA devices from the DT
> > > > > > have been registered already.
> > > > > > 
> > > > > > This allows us to prevent deferred probes by probing dependencies on
> > > > > > demand.
> > > > > > 
> > > > > > Signed-off-by: Tomeu Vizoso 
> > > > > > ---
> > > > > > 
> > > > > > Changes in v4:
> > > > > > - Also defer probes of AMBA devices registered from the DT as they 
> > > > > > can
> > > > > >   also request resources.
> > > > > > 
> > > > > >  drivers/of/platform.c | 11 ---
> > > > > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > > > 
> > > > > This breaks arch/powerpc/sysdev/fsl_pci.c.  The PCI bus is an OF 
> > > > > platform
> > > > > device, and it must be probed before pcibios_init() which is a
> > > > > subsys_initcall(), or else the PCI bus never gets scanned.
> > > > 
> > > > Thanks for the report. This is probably getting dropped, but it could
> > > > be disabled for PPC.
> > > 
> > > I don't think that adding another arbitrary arch difference would be the
> > > right solution.
> > 
> > I think Rob meant temporarily disable it while things get fixed. At
> > least, 
> 
> So, what is the permanent fix for the swiotlb issue (or more generally, the 
> inability to have a late_initcall that runs after non-module, non-hotplug 
> platform devices have been probed)?
> 
> > I don't see any reason why PPC wouldn't benefit from this
> > series.
> 
> It's not clear to me what the benefit of this is at all, much less for PPC.   
> What is the fundamental problem with deferred probes?  In the cover letter 
> you say this change saves 2.3 seconds, but where is that time being consumed? 
>  Are the drivers taking too long in their probe function trying to initialize 
> and then deferring, rather than checking for dependencies up front?  Or are 
> there really so many devices and such a pessimal ordering that most of the 
> time is spent iterating through and reordering the list, with each defer 
> happening quickly?
> 
> Even if something different does need to be done at this level, forcing all 
> OF platform devices to be probed at the late_initcall level seems quite 
> intrusive.

Totally agreed.

> You limited it to OF because people complained that other things 
> will break.

Right.

And I'm not sure why that was regarded as a good enough reason to do it.

> Things still broke.

Yes, they did.

> Surely there's a better way to address the 
> problem.  Can't the delay be requested by drivers that might otherwise need 
> to defer (which could be done incrementally, focusing on the worst 
> performance problems), rather than enabling it for everything?

Well, I was suggesting to use an opt-in flag there, but I'm not sure if Tomeu
took that into consideration.

In any case, probing is just one aspect of a deeper issue, which is that
we have no way to represent functional dependencies between devices.

Thanks,
Rafael

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-24 Thread Boqun Feng
On Sat, Oct 24, 2015 at 12:26:27PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 22, 2015 at 08:07:16PM +0800, Boqun Feng wrote:
> > On Wed, Oct 21, 2015 at 09:48:25PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 21, 2015 at 12:35:23PM -0700, Paul E. McKenney wrote:
> > > > > > > > I ask this because I recall Peter once bought up a discussion:
> > > > > > > > 
> > > > > > > > https://lkml.org/lkml/2015/8/26/596
> > > 
> > > > > So a full barrier on one side of these operations is enough, I think.
> > > > > IOW, there is no need to strengthen these operations.
> > > > 
> > > > Do we need to also worry about other futex use cases?
> > > 
> > > Worry, always!
> > > 
> > > But yes, there is one more specific usecase, which is that of a
> > > condition variable.
> > > 
> > > When we go sleep on a futex, we might want to assume visibility of the
> > > stores done by the thread that woke us by the time we wake up.
> > > 
> > 
> > But the thing is futex atomics in PPC are already RELEASE(pc)+ACQUIRE
> > and imply a full barrier, is an RELEASE(sc) semantics really needed
> > here?
> 
> For this, no, the current code should be fine I think.
> 
> > Further more, is this condition variable visibility guaranteed by other
> > part of futex? Because in futex_wake_op:
> > 
> > futex_wake_op()
> >   ...
> >   double_unlock_hb(hb1, hb2);  <- RELEASE(pc) barrier here.
> >   wake_up_q(_q);
> > 
> > and in futex_wait():
> > 
> > futex_wait()
> >   ...
> >   futex_wait_queue_me(hb, , to); <- schedule() here
> >   ...
> >   unqueue_me()
> > drop_futex_key_refs(>key);
> > iput()/mmdrop(); <- a full barrier
> >   
> > 
> > The RELEASE(pc) barrier pairs with the full barrier, therefore the
> > userspace wakee can observe the condition variable modification.
> 
> Right, futexes are a pain; and I think we all agreed we didn't want to
> go rely on implementation details unless we absolutely _have_ to.
> 

Agreed.

Besides, after I have read why futex_wake_op(the caller of
futex_atomic_op_inuser()) is introduced, I think your worries are quite
reasonable. I thought the futex_atomic_op_inuser() only operated on
futex related variables, but it turns out it can actually operate any
userspace variable if userspace code likes, therefore we don't have
control of all memory ordering guarantee of the variable. So if PPC
doesn't provide a full barrier at user<->kernel boundries, we should
make futex_atomic_op_inuser() fully ordered.


Still looking into futex_atomic_cmpxchg_inatomic() ...

> > > And.. aside from the thoughts I outlined in the email referenced above,
> > > there is always the chance people accidentally rely on the strong
> > > ordering on their x86 CPU and find things come apart when ran on their
> > > ARM/MIPS/etc..
> > > 
> > > There are a fair number of people who use the raw futex call and we have
> > > 0 visibility into many of them. The assumed and accidental ordering
> > > guarantees will forever remain a mystery.
> > > 
> > 
> > Understood. That's truely a potential problem. Considering not all the
> > architectures imply a full barrier at user<->kernel boundries, maybe we
> > can use one bit in the opcode of the futex system call to indicate
> > whether userspace treats futex as fully ordered. Like:
> > 
> > #define FUTEX_ORDER_SEQ_CST  0
> > #define FUTEX_ORDER_RELAXED  64 (bit 7 and bit 8 are already used)
> 
> Not unless there's an actual performance problem with any of this.
> Futexes are painful enough as is.

Make sense, and we still have choices like modifying the userspace code
if there is actually a performance problem ;-)

Regards,
Boqun


signature.asc
Description: PGP signature
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-24 Thread Peter Zijlstra
On Thu, Oct 22, 2015 at 08:07:16PM +0800, Boqun Feng wrote:
> On Wed, Oct 21, 2015 at 09:48:25PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 21, 2015 at 12:35:23PM -0700, Paul E. McKenney wrote:
> > > > > > > I ask this because I recall Peter once bought up a discussion:
> > > > > > > 
> > > > > > > https://lkml.org/lkml/2015/8/26/596
> > 
> > > > So a full barrier on one side of these operations is enough, I think.
> > > > IOW, there is no need to strengthen these operations.
> > > 
> > > Do we need to also worry about other futex use cases?
> > 
> > Worry, always!
> > 
> > But yes, there is one more specific usecase, which is that of a
> > condition variable.
> > 
> > When we go sleep on a futex, we might want to assume visibility of the
> > stores done by the thread that woke us by the time we wake up.
> > 
> 
> But the thing is futex atomics in PPC are already RELEASE(pc)+ACQUIRE
> and imply a full barrier, is an RELEASE(sc) semantics really needed
> here?

For this, no, the current code should be fine I think.

> Further more, is this condition variable visibility guaranteed by other
> part of futex? Because in futex_wake_op:
> 
>   futex_wake_op()
> ...
> double_unlock_hb(hb1, hb2);  <- RELEASE(pc) barrier here.
> wake_up_q(_q);
> 
> and in futex_wait():
> 
>   futex_wait()
> ...
> futex_wait_queue_me(hb, , to); <- schedule() here
> ...
> unqueue_me()
>   drop_futex_key_refs(>key);
>   iput()/mmdrop(); <- a full barrier
> 
> 
> The RELEASE(pc) barrier pairs with the full barrier, therefore the
> userspace wakee can observe the condition variable modification.

Right, futexes are a pain; and I think we all agreed we didn't want to
go rely on implementation details unless we absolutely _have_ to.

> > And.. aside from the thoughts I outlined in the email referenced above,
> > there is always the chance people accidentally rely on the strong
> > ordering on their x86 CPU and find things come apart when ran on their
> > ARM/MIPS/etc..
> > 
> > There are a fair number of people who use the raw futex call and we have
> > 0 visibility into many of them. The assumed and accidental ordering
> > guarantees will forever remain a mystery.
> > 
> 
> Understood. That's truely a potential problem. Considering not all the
> architectures imply a full barrier at user<->kernel boundries, maybe we
> can use one bit in the opcode of the futex system call to indicate
> whether userspace treats futex as fully ordered. Like:
> 
> #define FUTEX_ORDER_SEQ_CST  0
> #define FUTEX_ORDER_RELAXED  64 (bit 7 and bit 8 are already used)

Not unless there's an actual performance problem with any of this.
Futexes are painful enough as is.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev