On 2013/07/30 09:58:31, Benedikt Meurer wrote:
On 2013/07/30 08:52:25, Yury Semikhatsky wrote:
> https://codereview.chromium.org/21101002/diff/1/src/cpu-profiler.cc
> File src/cpu-profiler.cc (right):
>
>
https://codereview.chromium.org/21101002/diff/1/src/cpu-profiler.cc#newcode175
> src/cpu-profiler.cc:175: ProcessEventsAndYield();
> On 2013/07/30 06:41:04, Benedikt Meurer wrote:
> > I think there are two severe issues with this loop, first of all running_
is
> not
> > volatile and there's no memory barrier for running_. And YieldCPU() should
not
> > be used. Maybe unrelated to the change at hand, but how about using proper
> > synchronization primitives (i.e. monitors/condition variables or
semaphores)
> > instead?
>
> The profiler code tries to avoid usage of synchronization primitives to
minimize
> effect it imposes on the thread that is being profiled. In particular both > queues used to pass code events and tick samples from the VM thread to the
> processing one are lock-free (I didn't see any measurements that would
justify
> this approach though). Adding a mutex/semaphore here may have a noticeable > performance impact so I'd prefer to stay with a more light-weight solution.
I
> believe changing running_ into Atomic32 and using NoBarrier_Load to read it
here
> should be enough, wdyt?

Using monitors or semaphores here would avoid both the busy waiting and the
synchronization issue.

Correct. With current implementation using monitors/semaphors may make sense. But note that we cannot use monitors/semaphors to access sample buffer as signal handler may be called when the monitor is acquired by some thread and then we
would end up with a dead lock.

One of the aspects of this change is to send SIGPROF signal on the profile event processing thread and eventually get rid of SamplerThread. We need to do that
because:

1) We don't want the processing thread to fall behind the samples buffer. One simple solution for this is not to send next signal before we have processed at least one sample. On the other hand we'd like to be able to send them at a fixed
rate (100us), so  we need to wait until some timeout expires.

2) OS::Sleep used on the sampler thread on some systems will be limited by the scheduler quantum and won't work for very short intervals so we cannot sleep for
100us.


> Also can you elaborate on why you think that YieldCPU should not be used?
Among
> other things I believe that a side-effect of using it here may be that it
> implies a memory barrier and makes this code work correctly:)

It _may_ indeed act as a memory barrier, but there's no warranty. YieldCPU()
is
used to reduce the negative effects of busy waiting. Even with YieldCPU(),
you're still burning one core with your loop most of the time.

Our assumption is that with a high sampling rate we can afford allocating one CPU core to the profile event processor and the sampling rate would be limited by the performance of that thread. So we intentionally want to have the core be
busy exclusively with the profile events processing.


> Anyways I'm going to address this in a separate CL as all the raised issues
are
> applicable to the current implementation.

Yeah, it's not really related to this CL. So this is LGTM.


https://codereview.chromium.org/21101002/

--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to