(Re-send -- only sent to Darryl the first time (oops!))
First, you've pointed out an error in my earlier mail. I definitely
meant `unlock; syscall; lock' in my implementation idea, but got
confused. I mention it inline, but some of the answers above it might
not make sense, so I wanted to clarify up front.
Darryl Miles <[EMAIL PROTECTED]> writes:
> tom fogal wrote:
> > Darryl Miles <[EMAIL PROTECTED]> writes:
> >> tom fogal wrote:
> > It doesn't; I wrote this before I understood your `acquire a
> > VG-internal lock at syscall entry' implementation, and didn't come
> > back and edit.
> >
> > I don't see how that's not a scheduler. Each thread essentially has a
> > valgrind-specific state -- whether or not you'd like it to artificially
> > suspend -- and valgrind must choose at various points whether to pause
> > or delay the thread based on that information.
>
> System calls would be left to run, i.e. any thread that wanted to jump
> into a system call should be left to do so (T1) but in the act of doing
> this at least one thread that was artificially blocked in user-space (in
> valgrind scheduler code) would be released to run (T2).
Yes, but *which* thread? You must somehow choose `T2' from a set of
threads. I call that a scheduler.
> On the return from syscall of that first thread (T1) it would by default
> be intercepted and a simple check made.
[snip]
I still maintain that with the vg_runnable_thread lock, you don't
need to check anything -- just acquire and release the lock and let
the kernel worry about the rest.
> I'm not averse to calling the valgrind thread management a "scheduler"
> it's just not a parallel scheduler to the one in the kernel,
I did not mean to imply parallel in relation to the kernel.
Originally, I meant parallel with respect to the application, since it
would be possible for two threads of this managed application to be
running this scheduling code at the same time. You wouldn't want
thread A to decide thread B should run while thread B decides thread
A should run.
It's a moot point now; the locking-based implementation bypasses the
need for any sort of scheduling logic inside valgrind, and I'm taking
it as a given that you're going to go with that implementation at this
point.
> >> Note that you could then obtain/influence threading control with the
> >> vg_scheduler_pre_client_hook() and the vg_scheduler_post_client_hook().
> >
> > Yes.. but you don't need to. The locks around the syscall already
> > assure you that only one thread can run.
>
[snip]
>
> "The locks around syscalls" I'm not sure on that terminology, I don't
> intend to make syscalls mutually exclusive, far from it. My design
> states the opposite of this.
>
> The terminology I have used was to "intercept" syscalls, i.e. allow vg
> to do something before and after (aka wrapper).
See the next answer, but ...
I never meant to imply these were different. I still don't understand
how what you label a wrapper differs from the `lock around syscalls'
idea. I'm just saying that the wrapper code *will be* an unlock/lock
pair (not lock/unlock, as I mistakenly wrote earlier; see below) --
that is, giving the implementation instead of the abstraction.
> The design I hold up
> for discussion allows all threads to be inside the kernel at the same
> time, we deal with serializing application/client code on the "return
> from syscall". We don't need to serialize kernel calls nor put locks
> around them -- that would cause deadlocks. I'm not sure where this
> misunderstanding comes from; I don't think it's from me.
Ahh, it sounds like you want something more of an `unlock; syscall;
lock' type of implementation. I was being dumb in the earlier mail;
in retrospect, you'd obviously want to unlock first and lock after. I
somehow managed to convince myself that my locking scheme did that,
when it blatantly didn't.
(Case in point -- threads are too hard! ;)
> The mutexes you proposed were there to protect tiny fragments of
> application/client code so that only one thread of application/client
> code was running at any one time. They are not there to do anything in
> relation to syscalls; in fact, during syscall entry you'd need to release
> that lock to allow another user-space thread to run. I still maintain my
> technique is better and the logical progression once you try your mutex
> approach that way and find out performance sucks too much.
Your technique has no proposed implementation, and a couple of us have
spoken up now saying that having valgrind do scheduling is a big
no-no.
The purpose of using the mutex at every syscall is just to provide a
convenient place to limit the parallelism via this lock. The
alternative is per-instruction, or just per-basic-block, both of which
are likely to give terrible performance.
There's nothing particularly special about syscalls. I just figured
they'd be easy since valgrind already has some smarts for wrapping
them.
> > Well, it doesn't add it to an internal data structure, as far as I
> > know .. just reports it immediately. Yes though, seems like most of
> > the pieces you want are already there.
>
> This is where you STM sounds good, a transaction log (or journal) of all
> access. This
Didn't finish?
> I'm proposing an internal data structure would be added for this
> functionality, I understand it may not currently work that way.
Sure. This is the hash table I think I've brought up once now.
Though thinking about it again, a linked list or an array sounds like
a better data structure, since there is an implied ordering, and
fast random access is not needed/relevant.
Anyway, choice of data structure is your concern, do what you want
<g>.
> > It's just that jumping to timeouts normally means mucking with
> > signals, which I gather is painful in something like valgrind. I
> > could be wrong there.
>
> futex() doesn't use signals, but allows a thread to go to sleep with the
> ability to receive a wakeup-now event.
>
> nanosleep() also allows for a delay. Without use of signals.
>
> alarm() in many versions of Unix requires the use of signals, of which the
> SIGALRM might be something the application/client is using so needs
> special care.
I think POSIX allows sleep (and probably *sleep) to be implemented via
SIGALRM. Unverified.
> >> Now if ALL code is instrumented under emulation then there may not be a
> >> need to intercept syscalls if we can ensure control is passed back to
> >> valgrind BEFORE application/client code.
> >
> > I don't think this is possible, for the reasons mentioned at the very
> > top of this email -- we can't know when/where a context switch will
> > `jump back to', and the code is already translated at that point.
>
> Ah I don't understand the correlation between needing to know when a
> context switch occurs, ensuring valgrind has CPU control at the time we
> need it to and the instrumentation/translation.
Okay.. here's a quick summary of how /I/ think valgrind works.
Hopefully a valgrind hacker can correct me, at least as is relevant to
this discussion. Say we have this code from a user:
.LCFI2:
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
cmpl $2, -20(%rbp)
jne .L2
movl $42, -8(%rbp)
movl $19, -4(%rbp)
Valgrind probably gets this in two chunks, because there are two basic
blocks here (movl through jne, and the two final movls). So, upon first
entering the function with these basic blocks, or maybe upon first
loading in this chunk of code, valgrind translates it. I'm not
really sure when/how, but that's not important.
Anyway, this gets changed into VG's VEX instruction set. Then VG adds
instrumentation:
VEX-ish
mov reg edi to argument 1
mov reg esi to argument 3
compare two to argument 3
if flags says they're equal, jump to L2
instrumented VEX:
mov reg edi to argument 1
mov reg esi to argument 3
check shadow bits of argument 3
if flags says they're equal, jump to L2
(I'm skipping the second BB.) Finally, VG translates it back to Intel
asm:
.LCFI2-modified:
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
cmpl $0, $shadow # I've got no clue, but you get the idea
pushl -20(%rbp)
calll vg-invalid-shadow
addl 8, %rsp
cmpl $2, -20(%rbp)
jne .L2
Now, eventually, the CPU starts to execute .LCFI2-modified. Say it
gets through the movq (the second instruction). Then it gets
context-switched. Then the process comes back and starts to compare 0
with the shadow bits.
How can we know we got context switched after the movq and came back
at the compare? There's no notification of this. But what if some
other thread came in and did something which means we don't want this
thread to run anymore? In your original proposal, we'd want to run
some sort of scheduler here. Things don't work that way though --
it's not like we jump to valgrind upon return from a context switch,
and then valgrind jumps into the user code. Rather, the user code is
`valgrindified', and the modified code is what gets executed.
This is why I think of valgrind as more of a JIT than an emulator. It
doesn't really have complete control `all the time' -- it just gets to
control what will happen in the future. That's subtly different, but
very important to the kind of thing you want to do.
That's why I was thinking you might need to add extra instrumentation
around every instruction -- you can't know /where/ a context switch is
going to hit you, and you want to run this little scheduling thing
every time that happens. Even then, you're still screwed, because by
adding instructions to check this.. you need to check between those
instructions too, since they can't check and call the scheduler
atomically.
Thus, I don't currently see how this could possibly work without
having a thread hold an exclusive lock while running application code.
The logical place to grab that lock (to me) is upon return from every
call that valgrind can possibly wrap.
-tom
_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users