Hi Masami,
> > > OK, then I'll push this to for-next at this moment.
> > > Please share if you have a good idea for the batch interface which can be
> > > backported. I guess it should involve updating userspace changes too.
> >
> > Did you (or anyone else) need anything more from me on this one so that it
>
Hi Masami,
> > > Which is why I was asking to land this patch as is, as it relieves the
> > > scalability pains in production and is easy to backport to old
> > > kernels. And then we can work on batched APIs and switch to per-CPU rw
> > > semaphore.
>
> OK, then I'll push this to for-next at thi
> > Things to note about the results:
> >
> > - The results are slightly variable, so don't get too caught up on
> > individual thread counts; it's the trend that is important.
> > - In terms of throughput with this specific benchmark a *very* macro view
> > is that the RW spinlock provides 40-6
> > > > Given the discussion around per-cpu rw semaphore and need for
> > > > (internal) batched attachment API for uprobes, do you think you can
> > > > apply this patch as is for now? We can then gain initial improvements
> > > > in scalability that are also easy to backport, and Jonathan will wo
> > Masami,
> >
> > Given the discussion around per-cpu rw semaphore and need for
> > (internal) batched attachment API for uprobes, do you think you can
> > apply this patch as is for now? We can then gain initial improvements
> > in scalability that are also easy to backport, and Jonathan will w
> > > Have you considered/measured per-CPU RW semaphores?
> >
> > No, I hadn't, but thanks hugely for suggesting it! In initial measurements,
> > it seems to be between 20-100% faster than the RW spinlocks! Apologies for
> > all the exclamation marks, but I'm very excited. I'll do some more testing
> >
Hi Ingo,
> > This change has been tested against production workloads that exhibit
> > significant contention on the spinlock and an almost order of magnitude
> > reduction for mean uprobe execution time is observed (28 -> 3.5 microsecs).
>
> Have you considered/measured per-CPU RW semaphores?
N
Hi Masami,
> > This change has been tested against production workloads that exhibit
> > significant contention on the spinlock and an almost order of magnitude
> > reduction for mean uprobe execution time is observed (28 -> 3.5 microsecs).
>
> Looks good to me.
>
> Acked-by: Masami Hiramatsu (G