* Frank Ch. Eigler ([EMAIL PROTECTED]) wrote:
> Hi -
>
> On Fri, Jan 18, 2008 at 10:55:27PM -0500, Steven Rostedt wrote:
> > [...]
> > > All this complexity is to be justified by keeping the raw prev/next
> > > pointers from being sent to a naive tracer? It seems to me way out of
> > > proportion
Hi -
On Fri, Jan 18, 2008 at 10:55:27PM -0500, Steven Rostedt wrote:
> [...]
> > All this complexity is to be justified by keeping the raw prev/next
> > pointers from being sent to a naive tracer? It seems to me way out of
> > proportion.
>
> Damn, and I just blew away all my marker code for something like this ;-)
On Fri, 18 Jan 2008, Frank Ch. Eigler wrote:
>
> All this complexity is to be justified by keeping the raw prev/next
> pointers from being sent to a naive tracer? It seems to me way out of
> proportion.
Damn, and I just blew away all my marker code for something like this ;-)
Actually, you jus
Hi -
On Fri, Jan 18, 2008 at 06:19:29PM -0500, Mathieu Desnoyers wrote:
> [...]
> Almost.. I would add :
>
> static int trace_switch_to_enabled;
>
> > static inline trace_switch_to(struct task_struct *prev,
> > struct task_struct *next)
> > {
> if (likely(!trace_switch_to_enabled))
Hi -
On Fri, Jan 18, 2008 at 05:49:19PM -0500, Steven Rostedt wrote:
> [...]
> > But I have not seen a lot of situations where that kind of glue-code was
> > needed, so I think it makes sense to keep markers simple to use and
> > efficient for the common case.
> >
> > Then, in this glue-code, we can put trace_mark() and calls to in-kern
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
> On Fri, 18 Jan 2008, Mathieu Desnoyers wrote:
> >
> > But I have not seen a lot of situations where that kind of glue-code was
> > needed, so I think it makes sense to keep markers simple to use and
> > efficient for the common case.
> >
> > Then, in this glue-code, we can put trace_mark() and calls to in-kern
On Fri, 18 Jan 2008, Mathieu Desnoyers wrote:
>
> But I have not seen a lot of situations where that kind of glue-code was
> needed, so I think it makes sense to keep markers simple to use and
> efficient for the common case.
>
> Then, in this glue-code, we can put trace_mark() and calls to in-kern
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
> On Thu, 17 Jan 2008, Frank Ch. Eigler wrote:
>
> > Hi -
> >
> > On Thu, Jan 17, 2008 at 03:08:33PM -0500, Steven Rostedt wrote:
> > > [...]
> > > + trace_mark(kernel_sched_schedule,
> > > + "prev_pid %d next_pid %d prev_state %ld
On Thu, 17 Jan 2008, Frank Ch. Eigler wrote:
> Hi -
>
> On Thu, Jan 17, 2008 at 03:08:33PM -0500, Steven Rostedt wrote:
> > [...]
> > + trace_mark(kernel_sched_schedule,
> > + "prev_pid %d next_pid %d prev_state %ld",
> > + prev->pid, next->pid, prev->state);
> >
Hi -
On Thu, Jan 17, 2008 at 03:08:33PM -0500, Steven Rostedt wrote:
> [...]
> + trace_mark(kernel_sched_schedule,
> + "prev_pid %d next_pid %d prev_state %ld",
> + prev->pid, next->pid, prev->state);
> [...]
> But...
>
> Tracers that want to do a bit more work,
> > > >
> > > > One thing I want to clear up. The major difference between this
> > > > latency_tracer and LTTng is what we consider fast paths. The latency
> > > > tracer is recording things like enabling and disabling interrupts,
> > > > preempt
> > > > count changes, or simply profiling all fu
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
>
> Or could we map a per-thread page that would contradict this
> "definition" ?
Over my dead body.
It's been done before. Many times. It's horrible, and means that you need
to flush the TLB on context switches between threads and cannot share th
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
> On Thu, 17 Jan 2008, Paul Mackerras wrote:
> >
> > It's very hard to do a per-thread counter in the VDSO, since threads
> > in the same process see the same memory, by definition. You'd have to
> > have an array of counters and have some way for each thread to know
> > which entry to read.
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
> On Thu, 17 Jan 2008, Paul Mackerras wrote:
> >
> > It's very hard to do a per-thread counter in the VDSO, since threads
> > in the same process see the same memory, by definition. You'd have to
> > have an array of counters and have some way for each thread to know
> > which entry to read.
>
> Crazy ideas :
>
> Could we do something along the lines of the thread local storage ?
>
> Or could we map a per-thread page that would contradict this
> "definition" ?
When working on lguest64, I implemented a "per CPU" shadow page. That the
process of a guest running on one real CPU, could n
* Paul Mackerras ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers writes:
>
> > Sorry for self-reply, but I thought, in the past, of a way to make this
> > possible.
> >
> > It would imply the creation of a new vsyscall : vgetschedperiod
> >
> > It would read a counter that would increment each time the thread is
> > scheduled out (or in).
On Thu, 17 Jan 2008, Paul Mackerras wrote:
>
> It's very hard to do a per-thread counter in the VDSO, since threads
> in the same process see the same memory, by definition. You'd have to
> have an array of counters and have some way for each thread to know
> which entry to read. Also you'd have
Mathieu Desnoyers writes:
> Sorry for self-reply, but I thought, in the past, of a way to make this
> possible.
>
> It would imply the creation of a new vsyscall : vgetschedperiod
>
> It would read a counter that would increment each time the thread is
> scheduled out (or in). It would be a per thread counter (not a per cpu
> counter) so we can deal appropriately
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> It would imply the creation of a new vsyscall : vgetschedperiod
>
> It would read a counter that would increment each time the thread is
> scheduled out (or in). It would be a per thread counter (not a per cpu
> counter) so we can deal appropriately
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> * john stultz ([EMAIL PROTECTED]) wrote:
> >
> > On Wed, 2008-01-16 at 18:33 -0500, Steven Rostedt wrote:
> > > Thanks John for doing this!
> > >
> > > (comments imbedded)
> > >
> > > On Wed, 16 Jan 2008, john stultz wrote:
> > > > + int num = !cs->base_num;
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> >
> > Yep. clocksource_get_cycles() ended up not being as useful as a helper
> > function (I was hoping the arch vsyscall implementations could use it,
> > but they've done too much optimization - although that may reflect a
> > need up the chain to
* john stultz ([EMAIL PROTECTED]) wrote:
>
> On Wed, 2008-01-16 at 18:33 -0500, Steven Rostedt wrote:
> > Thanks John for doing this!
> >
> > (comments imbedded)
> >
> > On Wed, 16 Jan 2008, john stultz wrote:
> > > + int num = !cs->base_num;
> > > + cycle_t offset = (now - cs->base[!num].cycle_base_last);
* john stultz ([EMAIL PROTECTED]) wrote:
> On Wed, 2008-01-16 at 18:39 -0500, Mathieu Desnoyers wrote:
> > I would disable preemption in clocksource_get_basecycles. We would not
> > want to be scheduled out while we hold a pointer to the old array
> > element.
> >
> > > + int num = cs->base_num;
>
On Wed, 2008-01-16 at 18:33 -0500, Steven Rostedt wrote:
> Thanks John for doing this!
>
> (comments imbedded)
>
> On Wed, 16 Jan 2008, john stultz wrote:
> > + int num = !cs->base_num;
> > + cycle_t offset = (now - cs->base[!num].cycle_base_last);
> > + offset &= cs->mask;
> > + cs->bas
* john stultz ([EMAIL PROTECTED]) wrote:
>
> On Wed, 2008-01-16 at 18:39 -0500, Mathieu Desnoyers wrote:
> > * john stultz ([EMAIL PROTECTED]) wrote:
> > >
> > > On Wed, 2008-01-16 at 14:36 -0800, john stultz wrote:
> > > > On Jan 16, 2008 6:56 AM, Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> >
On Wed, 2008-01-16 at 18:39 -0500, Mathieu Desnoyers wrote:
> I would disable preemption in clocksource_get_basecycles. We would not
> want to be scheduled out while we hold a pointer to the old array
> element.
>
> > + int num = cs->base_num;
>
> Since you deal with base_num in a shared manner
* Linus Torvalds ([EMAIL PROTECTED]) wrote:
>
>
> On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> >
> > > + int num = !cs->base_num;
> > > + cycle_t offset = (now - cs->base[!num].cycle_base_last);
> >
> > !0 is not necessarily 1.
>
> Incorrect.
>
Hrm, *digging in my mailbox*, ah, here it is
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
>
> > + int num = !cs->base_num;
> > + cycle_t offset = (now - cs->base[!num].cycle_base_last);
>
> !0 is not necessarily 1.
Incorrect.
!0 _is_ necessarily 1. It's how all C logical operators work. If you find
a compiler that turns !x into any
On Wed, 16 Jan 2008, Steven Rostedt wrote:
> On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> >
> > !0 is not necessarily 1. This is why I use cpu_synth->index ? 0 : 1 in
>
> How about simply "cpu_synth->index ^ 1"? Seems the best choice if you ask
> me, if all you are doing is changing it from 1
On Wed, 2008-01-16 at 18:39 -0500, Mathieu Desnoyers wrote:
> * john stultz ([EMAIL PROTECTED]) wrote:
> >
> > On Wed, 2008-01-16 at 14:36 -0800, john stultz wrote:
> > > On Jan 16, 2008 6:56 AM, Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> > > > If you really want a seqlock-free algorithm (I
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
>
> > - cycle_t offset = (now - cs->cycle_last) & cs->mask;
> > + /* First update the monotonic base portion.
> > +* The dual array update method allows for lock-free reading.
> > +*/
> > + int num = !cs->base_num;
> > + cycle_t offset = (now - cs->base[!num].cycle_base_last);
* john stultz ([EMAIL PROTECTED]) wrote:
>
> On Wed, 2008-01-16 at 14:36 -0800, john stultz wrote:
> > On Jan 16, 2008 6:56 AM, Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> > > If you really want a seqlock-free algorithm (I _do_ want this for
> > > tracing!) :) maybe going in the RCU direction
Thanks John for doing this!
(comments imbedded)
On Wed, 16 Jan 2008, john stultz wrote:
>
> On Wed, 2008-01-16 at 14:36 -0800, john stultz wrote:
>
> Completely un-tested, but it builds, so I figured I'd send it out for
> review.
heh, ok, I'll take it and run it.
>
> I'm not super sure the up
On Wed, 2008-01-16 at 14:36 -0800, john stultz wrote:
> On Jan 16, 2008 6:56 AM, Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> > If you really want a seqlock-free algorithm (I _do_ want this for
> > tracing!) :) maybe going in the RCU direction could help (I refer to my
> > RCU-based 32-to-64 bits lockless timestamp counter extension, which
On Jan 16, 2008 6:56 AM, Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> If you really want a seqlock-free algorithm (I _do_ want this for
> tracing!) :) maybe going in the RCU direction could help (I refer to my
> RCU-based 32-to-64 bits lockless timestamp counter extension, which
> could be turne
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> >
> > In-other-words, latency_tracer is LTTng-lite ;-)
> >
>
> If LTTng is already ported to your specific kernel, the learning-curve
> is not big at all. Here is what the latency_tracer over LTTng guide
> could look like :
>
> Well, once you have LTTng in your kernel and have compiled and installed
> the ltt-control and lttv packages
Mathieu Desnoyers wrote:
> If LTTng is already ported to your specific kernel, the learning-curve
> is not big at all. Here is what the latency_tracer over LTTng guide
> could look like :
>
> Well, once you have LTTng in your kernel and have compiled and installed
> the ltt-control and lttv packages
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
...
> >
> > > >
> > > > - Disable preemption at the read-side :
> > > > it makes sure the pointer I get will point to a data structure that
> > > > will never change while I am in the preempt disabled code. (see *)
> > > > - I use per-cpu data to
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> * Steven Rostedt ([EMAIL PROTECTED]) wrote:
> >
> > Yeah, but if we replace the loop with a seq lock, then it would work.
> > albeit, more cacheline bouncing (caused by writes). (maybe not, see below)
> >
>
> Yes, but then you would trigger a deadlock
Steven Rostedt wrote:
> grmble. Then how do you trace preempt_disable? As my tracer does that
> (see the last patch in the series).
One way is to make a tracer_preempt_disable() and tracer_preempt_enable(),
both of which would be 'notrace'. You could probably optimize them
as well. The standard
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> * Steven Rostedt ([EMAIL PROTECTED]) wrote:
> >
> >
>
> > One thing I want to clear up. The major difference between this
> > latency_tracer and LTTng is what we consider fast paths. The latency
> > tracer is recording things like enabling and disabling interrupts,
> > preempt count changes,
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
>
> On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> > > No, there's probably issues there too, but no need to worry about it,
> > > since I already showed that allowing for clocksource_accumulate to happen
> > > inside the get_monotonic_cycles loop is already flawed.
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> > No, there's probably issues there too, but no need to worry about it,
> > since I already showed that allowing for clocksource_accumulate to happen
> > inside the get_monotonic_cycles loop is already flawed.
> >
>
> Yep, I just re-read through you
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
> On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> > Hrm, I will reply to the rest of this email in a separate mail, but
> > there is another concern, simpler than memory ordering, that just hit
> > me :
> >
> > If we have CPU A calling clocksource_accumulate while CPU B is calling
> > get_monotonic_cycles, but
On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> Hrm, I will reply to the rest of this email in a separate mail, but
> there is another concern, simpler than memory ordering, that just hit
> me :
>
> If we have CPU A calling clocksource_accumulate while CPU B is calling
> get_monotonic_cycles, but
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
> [ CC'd Daniel Walker, since he had problems with this code ]
>
> On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > I agree with you that I don't see how the compiler could reorder this.
> > So we forget about compiler barriers. Also, the clock source used is a
> > synchronized clock source
[ CC'd Daniel Walker, since he had problems with this code ]
On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
>
> I agree with you that I don't see how the compiler could reorder this.
> So we forget about compiler barriers. Also, the clock source used is a
> synchronized clock source (get_cycles_sync)
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
> On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > Ok, but what actually insures that the clock->cycle_* reads won't be
> > reordered across the clocksource_read() ?
>
>
>
> Hmm, interesting. I didn't notice that clocksource_read() is a static
> inline. I was thinking that since it was passing a pointer to
>
On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
>
> Ok, but what actually insures that the clock->cycle_* reads won't be
> reordered across the clocksource_read() ?
Hmm, interesting. I didn't notice that clocksource_read() is a static
inline. I was thinking that since it was passing a pointer to
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
>
>
> On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> > >
> > > Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
> > > ---
> > > include/linux/clocksource.h |3 ++
> > > kernel/time/timekeeping.c | 48
> > > +++
On Tue, 15 Jan 2008, Steven Rostedt wrote:
>
> Also, it just occurred to me that this is an old patch. I thought I
> renamed cycle_raw to cycle_monotonic. But I must have lost that patch :-/
Ah, I changed this in the -rt patch queue, and never moved the patch back
here.
-- Steve
On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
> > ---
> > include/linux/clocksource.h |3 ++
> > kernel/time/timekeeping.c | 48
> >
> > 2 files changed, 51 insertions(+)
> >
> > Index
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
> The latency tracer needs a way to get an accurate time
> without grabbing any locks. Locks themselves might call
> the latency tracer and cause at best a slowdown.
>
> This patch adds get_monotonic_cycles that returns cycles
> from a reliable clock source in a monotonic fashion.
On Wed, 2008-01-09 at 18:29 -0500, Steven Rostedt wrote:
> +cycle_t notrace get_monotonic_cycles(void)
> +{
> + cycle_t cycle_now, cycle_delta, cycle_raw, cycle_last;
> +
> + do {
> + /*
> +* cycle_raw and cycle_last can change on
> +* anot
The latency tracer needs a way to get an accurate time
without grabbing any locks. Locks themselves might call
the latency tracer and cause at best a slowdown.
This patch adds get_monotonic_cycles that returns cycles
from a reliable clock source in a monotonic fashion.
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>