[v8-dev] Re: Support higher CPU profiler sampling rate on posix systems (issue 21101002)

yurys Tue, 30 Jul 2013 05:34:39 -0700

On 2013/07/30 09:58:31, Benedikt Meurer wrote:

On 2013/07/30 08:52:25, Yury Semikhatsky wrote:
> https://codereview.chromium.org/21101002/diff/1/src/cpu-profiler.cc
> File src/cpu-profiler.cc (right):
>
>

https://codereview.chromium.org/21101002/diff/1/src/cpu-profiler.cc#newcode175

> src/cpu-profiler.cc:175: ProcessEventsAndYield();
> On 2013/07/30 06:41:04, Benedikt Meurer wrote:
> > I think there are two severe issues with this loop, first of allrunning_

is

> not
> > volatile and there's no memory barrier for running_. And YieldCPU()should
not
> > be used. Maybe unrelated to the change at hand, but how about usingproper
> > synchronization primitives (i.e. monitors/condition variables or

semaphores)

> > instead?
>
> The profiler code tries to avoid usage of synchronization primitives to
minimize
> effect it imposes on the thread that is being profiled. In particularboth> queues used to pass code events and tick samples from the VM thread tothe
> processing one are lock-free (I didn't see any measurements that would

justify

> this approach though). Adding a mutex/semaphore here may have anoticeable> performance impact so I'd prefer to stay with a more light-weightsolution.

> believe changing running_ into Atomic32 and using NoBarrier_Load toread it
here
> should be enough, wdyt?

Using monitors or semaphores here would avoid both the busy waiting andthe
synchronization issue.

Correct. With current implementation using monitors/semaphors may makesense.But note that we cannot use monitors/semaphors to access sample buffer assignalhandler may be called when the monitor is acquired by some thread and thenwe

would end up with a dead lock.

One of the aspects of this change is to send SIGPROF signal on the profileeventprocessing thread and eventually get rid of SamplerThread. We need to dothat

because:

1) We don't want the processing thread to fall behind the samples buffer.Onesimple solution for this is not to send next signal before we haveprocessed atleast one sample. On the other hand we'd like to be able to send them at afixed

rate (100us), so  we need to wait until some timeout expires.

2) OS::Sleep used on the sampler thread on some systems will be limited bythescheduler quantum and won't work for very short intervals so we cannotsleep for

100us.

> Also can you elaborate on why you think that YieldCPU should not beused?
Among
> other things I believe that a side-effect of using it here may be thatit
> implies a memory barrier and makes this code work correctly:)

It _may_ indeed act as a memory barrier, but there's no warranty.YieldCPU()

is

used to reduce the negative effects of busy waiting. Even with YieldCPU(),
you're still burning one core with your loop most of the time.

Our assumption is that with a high sampling rate we can afford allocatingoneCPU core to the profile event processor and the sampling rate would belimitedby the performance of that thread. So we intentionally want to have thecore be

busy exclusively with the profile events processing.

> Anyways I'm going to address this in a separate CL as all the raisedissues
are
> applicable to the current implementation.

Yeah, it's not really related to this CL. So this is LGTM.



https://codereview.chromium.org/21101002/

--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

---You received this message because you are subscribed to the Google Groups "v8-dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

[v8-dev] Re: Support higher CPU profiler sampling rate on posix systems (issue 21101002)

Reply via email to