On Wed, Oct 20, 2010 at 1:13 PM, Peter Zijlstra <pet...@infradead.org> wrote: > On Tue, 2010-10-19 at 21:03 +0200, Stephane Eranian wrote: > >> >> Ok, I missed that. But I don't understand why you need the lock to >> >> udpate the time. The lower-level clock is lockless if I recall. Can't you >> >> use an atomic ops in update_context_time()? >> > >> > atomic ops would slow down those code paths, also, I don't think you can >> > fully get the ordering between ->tstamp_$foo and ->total_time_$foo just >> > right. >> > >> >> I don't get that. Could you give an example? > > Take update_context_time(), it has: > > now = perf_clock(); > ctx->time += now - ctx->timestamp; > ctx->timestamp = now; > > If you interleave two of those you get: > > ctx->timestamp = T0; > > > now = perf_clock(); /* T1 */ > ctx->time += now - ctx->timestamp; > now = perf_clock(); /* T2 */ > ctx->time += now - ctx->timestamp; > ctx->timestamp = now; > ctx->timestamp = now; > > > So at this point you would expect timestamp = T2 and time += T2-T0. > > Except that: time += T1 - T0 + T2 - T0 != T2 - T0 and > timestamp = T1 > > You can of course write it as something like x86_perf_event_update(), > but then there's trying to keep total_time_running and > total_time_enabled in sync. > I think it is more complicated than x86_perf_event_update() because, this time, you have to update 2 fields, i.e., ctx->time, ctx->timestamp. The difficulty is to backtrack after you've successfully and atomically set the first one. You cannot just subtract/replace what you've changed because it was visible by whoever interrupted you. So it is already too late.
It may be better to try another approach just for PERF_SAMPLE_READ with its own version of ctx->time. What about if on event_sched_in() you were snapshotting ctx->time. Then in the perf_output_read_event(), you'd have to compute the time delta and add it to this private version of ctx->time and store that in the sample. > >> > Not sure, but barring 64bit atomics for all these, 32bit archs and NMI >> > are going to be 'interesting' >> > >> >> Every sample needs to be correct, otherwise you run the risk of introducing >> bias. >> >> I think if the tradeoffs is correctness vs. speed, I'd choose correctness. > > Well, yes, but it sucks, esp. since its only relevant to > PERF_SAMPLE_READ. > ------------------------------------------------------------------------------ Download new Adobe(R) Flash(R) Builder(TM) 4 The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly Flex(R) Builder(TM)) enable the development of rich applications that run across multiple browsers and platforms. Download your free trials today! http://p.sf.net/sfu/adobe-dev2dev _______________________________________________ perfmon2-devel mailing list perfmon2-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/perfmon2-devel