Re: [RFC] perf: fix building for ARCv1

2016-10-18 Thread Vineet Gupta
On 10/19/2015 02:35 AM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:28:43AM +, Vineet Gupta wrote:
>> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
>>> Vineet Gupta  writes:
>>>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>>>> C just by itself.
>>> It matters when you access the perf ring buffer which is updated by kernel.
>> That's part of the problem. The issue is with atomic_* APIs proliferation in perf
>> user space code which assumes native atomic r-m-w support which is not always
>> true. So I think we still need a feature detection mechanism and if absent leave
>> the ball in arch court by calling arch_atomic_* which can use creative or half
>> working measures so perf will work to some extent at least and not bomb outright.
>>
>> Also can u please elaborate a bit on "simulate them in C" - u mean just simple
>> unprotected LD, OP, ST or do u fancy usage of futex etc?
> Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
> IRQ-disabled load-op-store sequence.

So I got around to addressing this - now that someone is actually trying to use
NPTL (which uses llock/scond) on an ARC700 lacking those instructions. However,
given that we are going this route, FWIW the ARM kernel got rid of this syscall with
db695c0509d6ec ("ARM: remove user cmpxchg syscall") citing some security hole.
Even if we were to disregard that, the code at the time had some open-coded MM
trickery, which I'd rather not replicate. My use case is simple - I only need to
support UP config - and a simple {get,put}_user would suffice, given that it can
potentially take a TLB refill miss or, worse still, a full page fault. I'm going
to cook up a patch to add that syscall, but wanted to get some thoughts ahead of
that.

-Vineet
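
For readers following along, here is a rough user-space model of how such an
atomic-assist call might be consumed. It is only a sketch under assumptions:
arc_cmpxchg_assist() and assisted_add_and_fetch() are hypothetical names, no
syscall number or ABI is implied, and the plain C body merely stands in for
what a UP kernel would do with get_user()/put_user() while nothing else can
interleave.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the proposed atomic-assist syscall.
 * In a UP kernel the load/compare/store would run where nothing else can
 * interleave with it; here it is plain C so the sketch runs in user space.
 * Returns the value found at *addr; the store happened only if that value
 * equals 'expected'.
 */
static int arc_cmpxchg_assist(int *addr, int expected, int newval)
{
	int old;

	assert(addr != NULL);
	old = *addr;			/* get_user() in a kernel version */
	if (old == expected)
		*addr = newval;		/* put_user() in a kernel version */
	return old;
}

/* add-and-fetch built on top of the assist call with the usual retry loop */
static int assisted_add_and_fetch(int *addr, int delta)
{
	int old, seen;

	do {
		old = *addr;
		seen = arc_cmpxchg_assist(addr, old, old + delta);
	} while (seen != old);	/* somebody changed *addr in between: retry */

	return old + delta;
}
```

The point of the retry loop is that only the compare-and-store step needs
kernel help; every other r-m-w operation can be layered on top of it.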


Re: [RFC] perf: fix building for ARCv1

2016-02-09 Thread Vineet Gupta
On Friday 05 February 2016 09:40 PM, a...@redhat.com wrote:
> Em Fri, Feb 05, 2016 at 11:18:52AM +, Noam Camus escreveu:
>> Well here for EZchip I also see the:
>> undefined reference to `__sync_add_and_fetch_4'
>> undefined reference to `__sync_sub_and_fetch_4'
> 
> Yeah, because there is no: tools/arch/arc/include/asm/atomic.h, can't
> you guys adapt arch/arc/include/asm/atomic.h to use in userspace?

Sure - however we need to support 3 variants: LLSC, !LLSC, EZCHIP

If needed, latter 2 could be done using a new atomic assist syscall

I presume kernel Kconfig items are no go in this header so this diversity
management needs to use toolchain defined macros e.g. __ezchip__


> 
> - Arnaldo
>  
>> This is since at file tools/include/asm/atomic.h we use the generic 
>> implementation
>> If for ARC I could use just like x86 my own header file then functions like:
>> atomic_inc()
>> atomic_dec_and_test()
>> Are easy to implement and you may see an example for such atomic methods in 
>> my patch set for the new platform.
>>
>> You however wants to use some GCC flag -matomic which I assume somehow will 
>> implement the above __sync*.
>> I can't find the implementation but if it uses LLSC then it won't work for 
>> me since I am not supporting LLSC.
> 
>> So seem that either I have my own header at kernel or that I need to
>> change the GCC implementation for __sync* to use my atomic
>> instructions.  I am personally tend to the x86 solution and not the
>> generic one since changing GCC will require to have new compiler
>> dependency.



Re: [RFC] perf: fix building for ARCv1

2016-02-05 Thread Noam Camus



>From: Vineet Gupta 
>Sent: Thursday, February 4, 2016 6:13 AM
>Noam, what's the atomic story for EZChip. Do you support such things for user
>space in GNU tools. If -atomic is added to perf user space builds are you guys OK!

Well here for EZchip I also see the:
undefined reference to `__sync_add_and_fetch_4'
undefined reference to `__sync_sub_and_fetch_4'

This is because tools/include/asm/atomic.h uses the generic implementation.
If ARC could use its own header file, just like x86 does, then functions like:
atomic_inc()
atomic_dec_and_test()
are easy to implement, and you may see an example of such atomic methods in my
patch set for the new platform.

You, however, want to use some GCC flag -matomic, which I assume will somehow
implement the above __sync*.
I can't find the implementation, but if it uses LLSC then it won't work for me
since I am not supporting LLSC.

So it seems that either I have my own header in the kernel or I need to change the
GCC implementation of __sync* to use my atomic instructions.
I personally tend toward the x86 solution and not the generic one, since changing
GCC would introduce a new compiler dependency.
 
-Noam
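
As a point of reference, wrappers built on the GCC __sync builtins look
roughly like the sketch below. This is an illustration in the spirit of the
generic implementation discussed here, not a copy of any tree's header; the
atomic_t layout is an assumption. When the toolchain cannot expand the
builtins inline, GCC emits calls to out-of-line helpers such as
__sync_add_and_fetch_4, which is where the undefined references above come
from.

```c
#include <assert.h>
#include <stddef.h>

/* assumed layout, mirroring the usual kernel-style atomic_t */
typedef struct {
	int counter;
} atomic_t;

static inline int atomic_read(const atomic_t *v)
{
	assert(v != NULL);
	/* volatile access so the compiler re-reads memory every time */
	return *(volatile const int *)&v->counter;
}

static inline void atomic_inc(atomic_t *v)
{
	/* inline instructions if the target has them, else a libcall */
	__sync_add_and_fetch(&v->counter, 1);
}

static inline int atomic_dec_and_test(atomic_t *v)
{
	/* true only for the caller that drops the counter to zero */
	return __sync_sub_and_fetch(&v->counter, 1) == 0;
}
```

An arch-specific header would keep the same three signatures and only swap
the bodies for whatever instructions the platform really provides.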


Re: [RFC] perf: fix building for ARCv1

2016-02-03 Thread Vineet Gupta
+CC Noam

On Wednesday 03 February 2016 09:50 PM, Alexey Brodkin wrote:
>> I agree with the current solution to add -atomic to for arc700 builds.
>> > Although making that default for arc700 tools will be better but that will 
>> > not fix
>> > things before next release of tools etc.
>> > 
>> > But we *do* need to improve generic solution
>> > 1. Add atomics detection in perf to add fall back arch stubs
>> > 2. ARC needs to add syscall for facilitating atomic r-m-w !
> So the most recent ARC GNU tools (2015.12) were just released yesterday
> and still atomics are disabled by default for ARCv1.
> That means perf will continue to fail on building for now.
> 
> Do you think we may apply my initial fix to 4.5 while it is in development and
> then to stable trees as well?

Noam, what's the atomic story for EZChip? Do you support such things for user
space in GNU tools? If -matomic is added to perf user space builds, are you guys
OK?

-Vineet


Re: [RFC] perf: fix building for ARCv1

2016-02-03 Thread Alexey Brodkin
Hi Vineet,

On Fri, 2015-10-30 at 06:19 +, Vineet Gupta wrote:
> On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote:
> > Hi Vineet,
> > 
> > On Tue, 2015-10-20 at 10:45 +, Vineet Gupta wrote:
> > > On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
> > > > > > Can we use existing syscall(s) - again this is what our good old 
> > > > > > pthread library
> > > > > > code did.
> > > > > > 
> > > > > > static void __pthread_acquire(int * spinlock)
> > > > > > {
> > > > > >   int cnt = 0;
> > > > > >   struct timespec tm;
> > > > > > 
> > > > > >   READ_MEMORY_BARRIER();
> > > > > > 
> > > > > >   while (testandset(spinlock)) {   < atomic EXchange
> > > > > > if (cnt < 50) {
> > > > > >   sched_yield();
> > > > > >   cnt++;
> > > > > > } else {
> > > > > >   tm.tv_sec = 0;
> > > > > >   tm.tv_nsec = 201;
> > > > > >   nanosleep(&tm, ((void *)0));
> > > > > >   cnt = 0;
> > > > > > }
> > > > > >   }
> > > > *shudder* that is quite horrible.
> > > > 
> > > > This means all your 'atomics' are broken for anything SCHED_FIFO and the
> > > > like. You simply _cannot_ run a realtime system.
> > > The code above is from uClibc old threading library which we don't use 
> > > anymore.
> > > The NPTL version doesn't have all of this song-n-dance and relies on 
> > > futexes. The
> > > change we are talking about is only for the atomics in perf itself. I do
> > > understand your POV though.
> > > 
> > > > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> > > > test-and-set control dependency.)
> > > Absolutely and in this case it will have to be added both inside the loop 
> > > and one
> > > at the end to cover both the scenarios !
> > > 
> > I'm wondering what are our plans for now?
> > Are we going to accept proposed fix just for ARC in 4.4 (and to all stables 
> > then)
> > or we'll try to come up with more general solution?
> 
> I agree with the current solution to add -atomic to for arc700 builds.
> Although making that default for arc700 tools will be better but that will 
> not fix
> things before next release of tools etc.
> 
> But we *do* need to improve generic solution
> 1. Add atomics detection in perf to add fall back arch stubs
> 2. ARC needs to add syscall for facilitating atomic r-m-w !

So the most recent ARC GNU tools (2015.12) were just released yesterday
and still atomics are disabled by default for ARCv1.
That means perf will continue to fail on building for now.

Do you think we may apply my initial fix to 4.5 while it is in development and
then to stable trees as well?

-Alexey

Re: [RFC] perf: fix building for ARCv1

2015-10-29 Thread Vineet Gupta
On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> Perf uses atomic options and so it is required to have atomics enabled
> in toolchain.
>
> In case of ARC atomics are enabled by default for ARCv2 but disabled for
> ARCv1. Now we explicitly enable atomics for either ARC architecture
> version so perf could be successfully built.
>
> Currently on attempt to build perf for ARCv1 you'll see tons of:
> ->8-
> undefined reference to `__sync_add_and_fetch_4'
> ->8-
>
> Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> execution once "llock" instruction is attempted to be executed.
>
> Cc: Vineet Gupta 
> Cc: Peter Zijlstra 
> Cc: Arnaldo Carvalho de Melo 
> Signed-off-by: Alexey Brodkin 

Acked-by: Vineet Gupta 


> ---
>  tools/perf/config/Makefile | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..dc7c0a8 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
>    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
>  endif
>  
> +# Additional ARCH settings for ARC
> +ifeq ($(ARCH),arc)
> +  CFLAGS += -matomic
> +endif
> +
>  ifeq ($(NO_PERF_REGS),0)
>    $(call detected,CONFIG_PERF_REGS)
>  endif

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] perf: fix building for ARCv1

2015-10-29 Thread Vineet Gupta
On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote:
> Hi Vineet,
>
> On Tue, 2015-10-20 at 10:45 +, Vineet Gupta wrote:
>> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
>>>> Can we use existing syscall(s) - again this is what our good old pthread library
>>>> code did.
>>>>
>>>> static void __pthread_acquire(int * spinlock)
>>>> {
>>>>   int cnt = 0;
>>>>   struct timespec tm;
>>>>
>>>>   READ_MEMORY_BARRIER();
>>>>
>>>>   while (testandset(spinlock)) {   <-- atomic EXchange
>>>>     if (cnt < 50) {
>>>>       sched_yield();
>>>>       cnt++;
>>>>     } else {
>>>>       tm.tv_sec = 0;
>>>>       tm.tv_nsec = 201;
>>>>       nanosleep(&tm, ((void *)0));
>>>>       cnt = 0;
>>>>     }
>>>>   }
>>> *shudder* that is quite horrible.
>>>
>>> This means all your 'atomics' are broken for anything SCHED_FIFO and the
>>> like. You simply _cannot_ run a realtime system.
>> The code above is from uClibc old threading library which we don't use 
>> anymore.
>> The NPTL version doesn't have all of this song-n-dance and relies on 
>> futexes. The
>> change we are talking about is only for the atomics in perf itself. I do
>> understand your POV though.
>>
>>> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
>>> test-and-set control dependency.)
>> Absolutely and in this case it will have to be added both inside the loop 
>> and one
>> at the end to cover both the scenarios !
>>
> I'm wondering what are our plans for now?
> Are we going to accept proposed fix just for ARC in 4.4 (and to all stables 
> then)
> or we'll try to come up with more general solution?

I agree with the current solution to add -matomic for arc700 builds.
Making that the default for arc700 tools would be better, but that will not fix
things before the next release of tools etc.

But we *do* need to improve the generic solution:
1. Add atomics detection in perf to add fallback arch stubs
2. ARC needs to add a syscall for facilitating atomic r-m-w!

-Vineet


Re: [RFC] perf: fix building for ARCv1

2015-10-29 Thread Alexey Brodkin
Hi Vineet,

On Tue, 2015-10-20 at 10:45 +, Vineet Gupta wrote:
> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
> > > > Can we use existing syscall(s) - again this is what our good old 
> > > > pthread library
> > > > code did.
> > > > 
> > > > static void __pthread_acquire(int * spinlock)
> > > > {
> > > >   int cnt = 0;
> > > >   struct timespec tm;
> > > > 
> > > >   READ_MEMORY_BARRIER();
> > > > 
> > > >   while (testandset(spinlock)) {   < atomic EXchange
> > > > if (cnt < 50) {
> > > >   sched_yield();
> > > >   cnt++;
> > > > } else {
> > > >   tm.tv_sec = 0;
> > > >   tm.tv_nsec = 201;
> > > >   nanosleep(&tm, ((void *)0));
> > > >   cnt = 0;
> > > > }
> > > >   }
> > *shudder* that is quite horrible.
> > 
> > This means all your 'atomics' are broken for anything SCHED_FIFO and the
> > like. You simply _cannot_ run a realtime system.
> 
> The code above is from uClibc old threading library which we don't use 
> anymore.
> The NPTL version doesn't have all of this song-n-dance and relies on futexes. 
> The
> change we are talking about is only for the atomics in perf itself. I do
> understand your POV though.
> 
> > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> > test-and-set control dependency.)
> 
> Absolutely and in this case it will have to be added both inside the loop and 
> one
> at the end to cover both the scenarios !
> 

I'm wondering what are our plans for now?
Are we going to accept proposed fix just for ARC in 4.4 (and to all stables 
then)
or we'll try to come up with more general solution?

-Alexey

Re: [RFC] perf: fix building for ARCv1

2015-10-20 Thread Vineet Gupta
On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
>> > Can we use existing syscall(s) - again this is what our good old pthread 
>> > library
>> > code did.
>> > 
>> > static void __pthread_acquire(int * spinlock)
>> > {
>> >   int cnt = 0;
>> >   struct timespec tm;
>> > 
>> >   READ_MEMORY_BARRIER();
>> > 
>> >   while (testandset(spinlock)) {   < atomic EXchange
>> > if (cnt < 50) {
>> >   sched_yield();
>> >   cnt++;
>> > } else {
>> >   tm.tv_sec = 0;
>> >   tm.tv_nsec = 201;
>> >   nanosleep(&tm, ((void *)0));
>> >   cnt = 0;
>> > }
>> >   }
> *shudder* that is quite horrible.
>
> This means all your 'atomics' are broken for anything SCHED_FIFO and the
> like. You simply _cannot_ run a realtime system.

The code above is from uClibc's old threading library, which we don't use anymore.
The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
change we are talking about is only for the atomics in perf itself. I do
understand your POV though.

> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> test-and-set control dependency.)

Absolutely, and in this case it will have to be added both inside the loop and
one at the end to cover both scenarios!



Re: [RFC] perf: fix building for ARCv1

2015-10-20 Thread Peter Zijlstra
On Tue, Oct 20, 2015 at 08:00:46AM +, Vineet Gupta wrote:
> On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> > On Mon, Oct 19, 2015 at 09:46:35AM +, Vineet Gupta wrote:
> >> On ARC we could use the atomic EXchange to implement a user space only 
> >> binary
> >> semaphore - these atomic ops will be small duration so it is OK to spin 
> >> wait for a
> >> little bit. That's how the old pthread library worked for ARC w/o any 
> >> atomic support.
> > That has the obvious problem of lock-holder-preemption and the horrible
> > performance issues that result from that.
> >
> > I think the syscall at least has deterministic behaviour, whereas that
> > userspace spin loop has this abysmal worst case thing.
> 
> I don't have issue with adding the syscall per-se. But that comes with it's 
> own
> headaches of ABI change - more importantly it requires several things to 
> match,
> libc, kernel...  It would be easier if change was confined to say perf.

OTOH fixing all those would get you a 'sane' system :-)

> Can we use existing syscall(s) - again this is what our good old pthread 
> library
> code did.
> 
> static void __pthread_acquire(int * spinlock)
> {
>   int cnt = 0;
>   struct timespec tm;
> 
>   READ_MEMORY_BARRIER();
> 
>   while (testandset(spinlock)) {   <-- atomic EXchange
>     if (cnt < 50) {
>       sched_yield();
>       cnt++;
>     } else {
>       tm.tv_sec = 0;
>       tm.tv_nsec = 201;
>       nanosleep(&tm, ((void *)0));
>       cnt = 0;
>     }
>   }

*shudder* that is quite horrible.

This means all your 'atomics' are broken for anything SCHED_FIFO and the
like. You simply _cannot_ run a realtime system.

(also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
test-and-set control dependency.)

But it's your arch..
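
To make the barrier-placement remark concrete, here is a small sketch of an
exchange-based lock where the acquire fence comes after the successful
test-and-set. It uses C11 <stdatomic.h> rather than the uClibc macros from
the quoted code, and spin_acquire()/spin_release() are illustrative names
only. It still yields in the loop, so it does nothing about the SCHED_FIFO
problem described above.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static void spin_acquire(atomic_int *lock)
{
	assert(lock != NULL);

	/* atomic exchange: a previous value of 0 means we took the lock */
	while (atomic_exchange_explicit(lock, 1, memory_order_relaxed) != 0)
		sched_yield();

	/*
	 * Acquire fence AFTER the successful test-and-set, so accesses in
	 * the critical section cannot be reordered before taking the lock.
	 */
	atomic_thread_fence(memory_order_acquire);
}

static void spin_release(atomic_int *lock)
{
	assert(lock != NULL);

	/* make critical-section accesses visible before the lock is freed */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(lock, 0, memory_order_relaxed);
}
```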


Re: [RFC] perf: fix building for ARCv1

2015-10-20 Thread Vineet Gupta
On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:46:35AM +, Vineet Gupta wrote:
>> On ARC we could use the atomic EXchange to implement a user space only binary
>> semaphore - these atomic ops will be small duration so it is OK to spin wait 
>> for a
>> little bit. That's how the old pthread library worked for ARC w/o any atomic 
>> support.
> That has the obvious problem of lock-holder-preemption and the horrible
> performance issues that result from that.
>
> I think the syscall at least has deterministic behaviour, whereas that
> userspace spin loop has this abysmal worst case thing.

I don't have an issue with adding the syscall per se. But that comes with its own
headaches of ABI change - more importantly it requires several things to match,
libc, kernel...  It would be easier if the change was confined to, say, perf.

Can we use existing syscall(s) - again this is what our good old pthread library
code did.

static void __pthread_acquire(int * spinlock)
{
  int cnt = 0;
  struct timespec tm;

  READ_MEMORY_BARRIER();

  while (testandset(spinlock)) {   /* <-- atomic EXchange */
    if (cnt < 50) {
      sched_yield();
      cnt++;
    } else {
      tm.tv_sec = 0;
      tm.tv_nsec = 201;
      nanosleep(&tm, ((void *)0));
      cnt = 0;
    }
  }
}



Re: [RFC] perf: fix building for ARCv1

2015-10-19 Thread Vineet Gupta
On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:46:35AM +, Vineet Gupta wrote:
>> > On ARC we could use the atomic EXchange to implement a user space only 
>> > binary
>> > semaphore - these atomic ops will be small duration so it is OK to spin 
>> > wait for a
>> > little bit. That's how the old pthread library worked for ARC w/o any 
>> > atomic support.
> That has the obvious problem of lock-holder-preemption and the horrible
> performance issues that result from that.
>
> I think the syscall at least has deterministic behaviour, whereas that
> userspace spin loop has this abysmal worst case thing.

I agree - we can add that syscall trivially and use it based on build time
feature detection for atomics!


Re: [RFC] perf: fix building for ARCv1

2015-10-19 Thread Peter Zijlstra
On Mon, Oct 19, 2015 at 09:46:35AM +, Vineet Gupta wrote:
> On ARC we could use the atomic EXchange to implement a user space only binary
> semaphore - these atomic ops will be small duration so it is OK to spin wait 
> for a
> little bit. That's how the old pthread library worked for ARC w/o any atomic 
> support.

That has the obvious problem of lock-holder-preemption and the horrible
performance issues that result from that.

I think the syscall at least has deterministic behaviour, whereas that
userspace spin loop has this abysmal worst case thing.


Re: [RFC] perf: fix building for ARCv1

2015-10-19 Thread Vineet Gupta
On Monday 19 October 2015 03:05 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:28:43AM +, Vineet Gupta wrote:
>> > On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
>>> > > Vineet Gupta  writes:
>>>> > >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>>>> > >> C just by itself.
>>> > > It matters when you access the perf ring buffer which is updated by 
>>> > > kernel.
>> > 
>> > That's part of the problem. The issue is with atomic_* APIs proliferation 
>> > in perf
>> > user space code which assumes native atomix r-m-w support which is not 
>> > always
>> > true. So I think we still need a feature detection mechanism and if absent 
>> > leave
>> > the ball in arch court by calling arch_atomic_* which can use creative or 
>> > half
>> > working measures so perf will work to some extent atleast and not bomb 
>> > outright.
>> > 
>> > Also can u please elaborate a bit on "simulate them in C" - u mean just 
>> > simple
>> > unprotected LD, OP, ST or do u fancy usage of futex etc?
> Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
> IRQ-disabled load-op-store sequence.

Yeah I remember seeing some syscall like that in ARM.

On ARC we could use the atomic EXchange to implement a user space only binary
semaphore - these atomic ops will be of short duration so it is OK to spin wait for
a little bit. That's how the old pthread library worked for ARC w/o any atomic
support.



Re: [RFC] perf: fix building for ARCv1

2015-10-19 Thread Peter Zijlstra
On Mon, Oct 19, 2015 at 09:28:43AM +, Vineet Gupta wrote:
> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
> > Vineet Gupta  writes:
> >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate 
> >> them in
> >> C just by itself.
> > It matters when you access the perf ring buffer which is updated by kernel.
> 
> That's part of the problem. The issue is with atomic_* APIs proliferation in 
> perf
> user space code which assumes native atomix r-m-w support which is not always
> true. So I think we still need a feature detection mechanism and if absent 
> leave
> the ball in arch court by calling arch_atomic_* which can use creative or half
> working measures so perf will work to some extent atleast and not bomb 
> outright.
> 
> Also can u please elaborate a bit on "simulate them in C" - u mean just simple
> unprotected LD, OP, ST or do u fancy usage of futex etc?

Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
IRQ-disabled load-op-store sequence.




Re: [RFC] perf: fix building for ARCv1

2015-10-19 Thread Vineet Gupta
On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
> Vineet Gupta  writes:
>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate 
>> them in
>> C just by itself.
> It matters when you access the perf ring buffer which is updated by kernel.

That's part of the problem. The issue is with atomic_* APIs proliferation in perf
user space code which assumes native atomic r-m-w support which is not always
true. So I think we still need a feature detection mechanism and if absent leave
the ball in arch court by calling arch_atomic_* which can use creative or half
working measures so perf will work to some extent at least and not bomb outright.

Also can u please elaborate a bit on "simulate them in C" - u mean just simple
unprotected LD, OP, ST or do u fancy usage of futex etc?

> Also perf is now multi threaded to some degree.




Re: [RFC] perf: fix building for ARCv1

2015-10-18 Thread Andi Kleen
Vineet Gupta  writes:
>
> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate 
> them in
> C just by itself.

It matters when you access the perf ring buffer which is updated by kernel.
Also perf is now multi threaded to some degree.

-Andi


Re: [RFC] perf: fix building for ARCv1

2015-10-18 Thread Vineet Gupta
On Monday 19 October 2015 04:45 AM, Andi Kleen wrote:
> Alexey Brodkin  writes:
>> So the best we may do is to implement detection of atomics in the toolchain 
>> and if there's no atomics hard stop with
>> perf building.
> If your target is single cpu only you can always simulate them in C.
>
> -Andi

But this is user space - so IMHO UP/SMP doesn't matter and we can't simulate them
in C just by itself.

-Vineet


Re: [RFC] perf: fix building for ARCv1

2015-10-18 Thread Andi Kleen
Alexey Brodkin  writes:
>
> So the best we may do is to implement detection of atomics in the toolchain 
> and if there's no atomics hard stop with
> perf building.

If your target is single cpu only you can always simulate them in C.

-Andi
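
For illustration, "simulating them in C" could mean something like the sketch
below: an unprotected load, op, store sequence. The helper name is made up
for this example. As the replies in this thread point out, in user space this
is not a real atomic even on a single CPU, because the task can be preempted
between the load and the store, and the kernel updates the perf ring buffer
independently.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
	volatile int counter;
} atomic_t;

/*
 * NOT atomic: three separate steps. Only correct if nothing else can touch
 * v->counter between the load and the store.
 */
static inline int atomic_add_return_nonatomic(atomic_t *v, int i)
{
	int tmp;

	assert(v != NULL);
	tmp = v->counter;	/* LD */
	tmp += i;		/* OP */
	v->counter = tmp;	/* ST */
	return tmp;
}
```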


Re: [RFC] perf: fix building for ARCv1

2015-10-18 Thread Alexey Brodkin
Hi Vineet,

Looks like this time atomics are a must. And that really sucks!

See these commits that introduce usage of atomic_xxx() all around perf and the
tools it uses:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f812d3045c2385ac16237e68b156859c4005526e
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3a7c489c7fd2463e3b2c3a2179c7be879dd9cb4
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7143849a5d6a5c623d81790d92f0033507c5b14f
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=59a51c1dc9fbb3fb4af928b852d7b35df83edd74
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e1ed3a5b87ed6759e16ec93f16aae83d2cc77ca2

and that's the one that introduced usage of the following generic gcc's atomics
(__sync_add_and_fetch/__sync_sub_and_fetch):
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da6d8567512df11e0473b710c07de87efde5709c

So the best we may do is to implement detection of atomics in the toolchain and,
if there are no atomics, hard stop with perf building.

-Alexey

On Sat, 2015-10-17 at 14:19 +, Vineet Gupta wrote:
> On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> > Perf uses atomic options and so it is required to have atomics enabled
> > in toolchain.
> > 
> > In case of ARC atomics are enabled by default for ARCv2 but disabled for
> > ARCv1. Now we explicitly enable atomics for either ARC architecture
> > version so perf could be successfully built.
> > 
> > Currently on attempt to build perf for ARCv1 you'll see tons of:
> > ->8-
> > undefined reference to `__sync_add_and_fetch_4'
> > ->8-
> > 
> > Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> > execution once "llock" instruction is attempted to be executed.
> 
> Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o
> that support.
> So what we are saying is that any arch (or a configuration thereof) which 
> doesn't
> support atomic r-m-w can't even build perf now - that sucks !
> 
> A better way would be to do feature test for __sync_xyz and make atomic_xxx
> wrappers call __sync_xyz) vs. an empty stub.
> So atleast such arches can build and do "some" perf work !
> 
> -Vineet
> 
> > Cc: Vineet Gupta 
> > Cc: Peter Zijlstra 
> > Cc: Arnaldo Carvalho de Melo 
> > Signed-off-by: Alexey Brodkin 
> > ---
> >  tools/perf/config/Makefile | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> > index 38a0853..dc7c0a8 100644
> > --- a/tools/perf/config/Makefile
> > +++ b/tools/perf/config/Makefile
> > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
> >    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
> >  endif
> >  
> > +# Additional ARCH settings for ARC
> > +ifeq ($(ARCH),arc)
> > +  CFLAGS += -matomic
> > +endif
> > +
> >  ifeq ($(NO_PERF_REGS),0)
> >    $(call detected,CONFIG_PERF_REGS)
> >  endif
> 
> 

Re: [RFC] perf: fix building for ARCv1

2015-10-17 Thread Vineet Gupta
On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> Perf uses atomic options and so it is required to have atomics enabled
> in toolchain.
>
> In case of ARC atomics are enabled by default for ARCv2 but disabled for
> ARCv1. Now we explicitly enable atomics for either ARC architecture
> version so perf could be successfully built.
>
> Currently on attempt to build perf for ARCv1 you'll see tons of:
> ->8-
> undefined reference to `__sync_add_and_fetch_4'
> ->8-
>
> Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> execution once "llock" instruction is attempted to be executed.

Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o
that support.
So what we are saying is that any arch (or a configuration thereof) which doesn't
support atomic r-m-w can't even build perf now - that sucks!

A better way would be to do a feature test for __sync_xyz and make the atomic_xxx
wrappers call __sync_xyz() vs. an empty stub.
So at least such arches can build and do "some" perf work!

-Vineet
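
One possible shape for such a wrapper is sketched below. It is not what the
thread settled on: it keys off GCC's predefined macro
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 instead of a perf build-time feature test,
and PERF_HAVE_NATIVE_ATOMICS is a hypothetical name introduced only for this
example. The fallback branch is a plain non-atomic stub, i.e. the "half
working" option that lets the tool build.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical switch; a perf feature test could define it instead */
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
#define PERF_HAVE_NATIVE_ATOMICS 1
#else
#define PERF_HAVE_NATIVE_ATOMICS 0
#endif

typedef struct {
	int counter;
} atomic_t;

static inline int atomic_inc_return(atomic_t *v)
{
	assert(v != NULL);
#if PERF_HAVE_NATIVE_ATOMICS
	/* toolchain can expand the builtin inline */
	return __sync_add_and_fetch(&v->counter, 1);
#else
	/*
	 * Fallback stub: plain, non-atomic increment. An arch could
	 * substitute its own arch_atomic_* helper here.
	 */
	return ++v->counter;
#endif
}
```

Either branch keeps the same signature, so callers in perf would not need to
know which one was compiled in.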

> Cc: Vineet Gupta 
> Cc: Peter Zijlstra 
> Cc: Arnaldo Carvalho de Melo 
> Signed-off-by: Alexey Brodkin 
> ---
>  tools/perf/config/Makefile | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..dc7c0a8 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
>    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
>  endif
>  
> +# Additional ARCH settings for ARC
> +ifeq ($(ARCH),arc)
> +  CFLAGS += -matomic
> +endif
> +
>  ifeq ($(NO_PERF_REGS),0)
>    $(call detected,CONFIG_PERF_REGS)
>  endif
