tom fogal wrote:
> One thing I neglected to think about earlier is how one would do a
> `Can I still run?' check.  There is no notification to a user process
> when a context switch comes back.
> 
> Remember that valgrind instruments instructions.

You mean a context switch occurred due to timeslice exhaustion while 
execution was in user-space.  So the scenario you describe refers to 
the resumption of user-space execution.

Why does it need to know?  I cannot see the problem; my strategy works 
just fine in the situation you describe.  Maybe you are confusing 
resumption of execution in user-space due to timeslice exhaustion with 
resumption of execution in user-space due to a syscall return.  The 
syscall return case I do have things to say about (as previously written).


To add: since everything runs under emulation, it is true to say the 
user-space part of the thread spends most of its time executing valgrind 
code as opposed to simulated application/client code.  While it is 
executing valgrind code, VG has control.

Thinking through your example above, there would be an issue if valgrind 
allowed a process to run an infinite loop of pure application/client 
code, i.e. a CPU-bound application that never makes a 
syscall/libc/valgrind/etc. function call.  If I understand correctly, 
when everything is instrumented this never happens: there is always a 
point where valgrind has control.  If not, then simply inject a 
checkpoint/safe-point during instrumentation.
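To make the injected checkpoint idea concrete, here is a minimal sketch of 
what the instrumenter could emit on loop back-edges.  All the names 
(vg_yield_requested, vg_safe_point, vg_scheduler_hook) are made up for 
illustration; none of this is real valgrind API:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag set by the tool's scheduler when it wants control
 * back from a CPU-bound client loop. */
static atomic_int vg_yield_requested = 0;
static int vg_handler_calls = 0;

/* What the tool would do once it regains control. */
static void vg_scheduler_hook(void)
{
    vg_handler_calls++;
    atomic_store(&vg_yield_requested, 0);   /* clear the latch */
}

/* The injected safe-point poll: a cheap test-and-branch emitted on loop
 * back-edges, so even a pure CPU-bound client loop periodically hands
 * control back to the tool. */
static inline void vg_safe_point(void)
{
    if (atomic_load_explicit(&vg_yield_requested, memory_order_relaxed))
        vg_scheduler_hook();
}
```

A client loop would then effectively become `for (;;) { work(); vg_safe_point(); }`, 
with the poll costing one relaxed load in the common case.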


My only concern so far has been syscalls.  When any thread makes a 
syscall you never know how long it is going to take or whether it will 
block, so you treat the situation as if syscalls always block, even when 
you know they don't.



> I had a long diatribe about how you essentially want the JVM's safe
> points and how ridiculously difficult it will be, as you'd have to
> write a parallel scheduler.  Which I still think is true, in your
> general case of the choosing which thread to run.

It depends what you mean by a scheduler.  Anything that manipulates what 
gets to run could be called a scheduler.  The points I've put forward 
propose to create artificial waits, artificial temporary suspension, and 
artificial low priority.  "Artificial" meaning that from the 
application/client code perspective the observable result is as-if real.

I certainly wouldn't call what's proposed a parallel scheduler; it can't 
make something that's not runnable (due to being blocked in the kernel) 
runnable!  The kernel can.

But it can put something that is running to sleep, and that's a big 
enough wrench/spanner to achieve the goal.  I also think you 
misinterpreted something, as I wasn't placing any emphasis on difficulty 
levels; you're adding that meaning yourself.

Your challenges have been great to hear and a good sounding board, but 
nothing has come up that wasn't already thought about or that seems a 
show-stopper.



> You really don't need that functionality though.  All you want is to
> guarantee that <= 1 thread is running, at all times.
> 
> A solution to your issue -- introduce a new mutex, within valgrind.
> Wrap every function you possibly can.  Acquire the lock before the
> function, and release it after the function.

Your idea of adding a mutex lock around everything may work.  I 
wouldn't be wrapping anything myself; I would be expecting valgrind to 
instrument the application/client in that way on my behalf.

It also won't be on a function-call basis or a line-of-code basis as 
your example shows.  It would be on a unit-of-work basis (like your 
notes lean towards), where a unit is only allowed to contain a single 
load, store or load-and-store to memory, or a single function call/jump. 
So yes, a single asm insn might need a lock around it.


So code actually becomes:

acquire(&vg_runnable_thread);
vg_scheduler_pre_client_hook();

/* run some application/client code work */
sys_call();

vg_scheduler_post_client_hook();
release(&vg_runnable_thread);


Note that you could then obtain/influence threading control via 
vg_scheduler_pre_client_hook() and vg_scheduler_post_client_hook().
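As a sanity check of the shape above, here is a compilable sketch of that 
serialisation scheme, using a pthread mutex as the "at most one client 
thread runs" lock.  The names (vg_runnable_thread, vg_run_client_unit, the 
hooks) are all invented for illustration, not real valgrind internals:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical global "at most one client thread runs" lock. */
static pthread_mutex_t vg_runnable_thread = PTHREAD_MUTEX_INITIALIZER;
static int vg_units_run = 0;

static void vg_scheduler_pre_client_hook(void)  { /* pick/park threads here */ }
static void vg_scheduler_post_client_hook(void) { vg_units_run++; }

/* Run one unit of client work (a single load/store or call) while
 * holding the serialisation lock, bracketed by the scheduler hooks. */
static void vg_run_client_unit(void (*unit)(void *), void *arg)
{
    pthread_mutex_lock(&vg_runnable_thread);
    vg_scheduler_pre_client_hook();
    unit(arg);                               /* the client work */
    vg_scheduler_post_client_hook();
    pthread_mutex_unlock(&vg_runnable_thread);
}

/* Demo unit standing in for one instrumented client instruction. */
static int client_counter = 0;
static void demo_unit(void *arg) { (void)arg; client_counter++; }
```

The pre/post hooks are exactly where the tool could bend scheduling: park 
the current thread, raise another, or log the unit into an audit trail.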


You can obviously understand the major performance penalty for your 
scheme.  Now if you think about optimizing it for a moment...


I think you arrive at the solution I've been discussing: valgrind 
already does cool stuff watching memory in a byte-granular way, and hey, 
it's only syscalls that are a problem in having valgrind bend the 
scheduling rules of the kernel.

So why not intercept system calls in a not-so-different way to your 
example above?  In fact we can do away with the acquire()/release() stuff 
and roll it into the vg_scheduler_xxxx() stuff whilst wrapping all syscalls.
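Rolled together, a wrapped syscall could look something like the sketch 
below: since we must treat every syscall as if it may block, the wrapper 
drops the serialisation lock before entering the kernel and takes it back 
on return.  vg_big_lock and vg_wrapped_syscall3 are hypothetical names, 
and the raw syscall(2) usage is Linux-specific:

```c
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical global serialisation lock, held while client code runs. */
static pthread_mutex_t vg_big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Wrap a three-argument syscall: release the lock so another client
 * thread may run while we are (possibly) blocked in the kernel, then
 * reacquire it once the kernel returns to us. */
static long vg_wrapped_syscall3(long nr, long a1, long a2, long a3)
{
    pthread_mutex_unlock(&vg_big_lock);   /* post-hook: let others run */
    long ret = syscall(nr, a1, a2, a3);
    pthread_mutex_lock(&vg_big_lock);     /* pre-hook: become the runner */
    return ret;
}
```

This is the "treat every syscall as if it blocks" rule from above made 
mechanical: non-blocking syscalls pay a lock round-trip, blocking ones 
stop monopolising the run permission.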

I agree with your approach but maybe I'm a little further down the road 
on it.



> In this scheme, you don't get to control which thread runs.  But I
> think Julian stressed, and I stress now, that you really don't want to
> do that.

Well, you can control threads if you want, as my version above explains.


But yes, I think I've concluded on that point.  It is not absolute 
control that is wanted; it is the ability to bend the existing scheduling 
rules in the direction of exposing more bugs.

Add to this the desire for an audit trail of memory accesses between 
point A and point B; the audit must match up 100% with the 
SMP/CPU/memory view of a real-world system.



> It sounds like, if you wanted to do this, you *would* need to lock at
> every instruction, since you'd need some kind of global hash table or
> other data structure which would maintain this list.

Ah yes, now you see.  Yes, a memory access would involve asking the 
questions: are we interested in this extent?  Do we log this access? 
Is this access an application error?

Doesn't memcheck already do a lot of this stuff?  That was the point of 
using valgrind: to leverage it.  I'm just adding a threading twist to it.
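The three questions can be sketched as a single per-access check.  This is 
only a toy model of the idea (vg_extent, vg_audit_access and the 
writer_holds_lock parameter are all invented here, standing in for 
whatever lock-held state the tool would track, memcheck-style):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watched address range, in the spirit of byte-granular
 * shadow state. */
typedef struct { uintptr_t base; size_t len; } vg_extent;

static bool vg_in_extent(const vg_extent *e, uintptr_t addr, size_t len)
{
    return addr >= e->base && addr + len <= e->base + e->len;
}

static int vg_log_count = 0;

/* The three questions asked on every client load/store: are we
 * interested in this extent?  do we log this access?  is it an
 * application error?  Returns false when the access is a bug. */
static bool vg_audit_access(const vg_extent *watched, uintptr_t addr,
                            size_t len, bool holds_guarding_lock)
{
    if (!vg_in_extent(watched, addr, len))
        return true;                /* not interested: access is fine */
    vg_log_count++;                 /* log it into the audit trail */
    return holds_guarding_lock;     /* unguarded access -> report error */
}
```

The last question is the threading twist: an access to a guarded extent 
made without holding the guarding lock is exactly the kind of bug the 
audit is meant to expose.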



> By the way, have you heard of software transactional memory? ``STM''.
> I'm not sure if there are any open-source systems which implement it.
> However, an STM system must keep exactly this, except they of course
> do it to provide better parallelism than the absurdity that is
> threads.

No, that's a new one for me; I will research it as you suggest.  I 
disagree though, threads aren't absurd :)



> Thinking about those gives me the idea that you could probably avoid
> this wrapping in regions of code which could provably not r/w a shared
> variable, perhaps by knowing that the thread holds no locks.

Euh.  Just because a thread holds no locks doesn't mean that the thread 
won't violate the multi-threaded design by reading from or writing to 
memory it shouldn't.  The reason it shouldn't is because that location 
is guarded by a lock (which it didn't take before the access).

A memory access audit will provide that info.



> If you only need a hint, not a hard guarantee, change your wrappers to
> set and restore the thread priority instead of grab a lock.  I imagine
> that'd be much cheaper.

There is no feedback/assurance from the kernel (or whatever scheduler 
you are using) about which threads have been given the opportunity to run.

We want to offer a roll-call to all threads; some may run, some may 
not, but they were all offered the opportunity.



I disagree that it's hard to implement a timeout; there are many 
mechanisms to do that.  Don't forget it's the kernel that actually 
implements the timeout; a futex() would probably do the trick and 
provide a mechanism to be woken up to boot.
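For illustration, here is a minimal (Linux-only) sketch of parking a 
thread on a futex with a kernel-implemented timeout.  vg_timed_park is an 
invented name; the futex word layout is just the bare syscall:

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Park the caller on *word while it still equals `expected`, for at
 * most `timeout` (relative).  The kernel implements the timeout; any
 * other thread can wake us early with FUTEX_WAKE on the same word.
 * Returns 0 on wake, -1 with errno == ETIMEDOUT on timeout. */
static long vg_timed_park(uint32_t *word, uint32_t expected,
                          const struct timespec *timeout)
{
    return syscall(SYS_futex, word, FUTEX_WAIT, expected, timeout, NULL, 0);
}
```

So a parked thread needs no polling at all: it either times out (the 
roll-call moves on) or is woken the moment the tool flips the futex word 
and calls FUTEX_WAKE.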


You misunderstand the point of system call interception.  The point is to 
pass control back to valgrind: not to modify the "error return code of a 
system call" but to modify the "exit from system call and return to 
caller" assembly code.  One is a code as in a variable, the other is code 
as in a program.

A system call may be implemented as a function call, jump point, call 
gate, software interrupt, whatever... but before that moment there is 
assembly code around it to set up the stack/data correctly, and assembly 
code after it where execution resumes that might deal with storing 
errno.  We just want to "wrapper" that (aka intercept it).


Now, if ALL code is instrumented under emulation, then there may be no 
need to intercept syscalls if we can ensure control is passed back to 
valgrind BEFORE application/client code runs.  This would then check a 
thread-specific memory location to see if there is an outstanding 
interception event (a latching flag) for this thread, and if there is, 
it calls a function.  This function then works out what the nature of 
the interception is and deals with it.
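That resumption-path check might look like the sketch below: a 
thread-local latching event, consumed and dispatched just before control 
re-enters client code.  The event kinds and every vg_* name here are 
hypothetical placeholders:

```c
#include <assert.h>

/* Hypothetical per-thread interception state: a latching event kind,
 * examined on every path that resumes client code. */
typedef enum { VG_EV_NONE, VG_EV_SUSPEND, VG_EV_AUDIT_DUMP } vg_event;

static __thread vg_event vg_pending_event = VG_EV_NONE;
static int vg_suspends = 0, vg_dumps = 0;

/* Work out the nature of the interception and deal with it. */
static void vg_handle_event(vg_event ev)
{
    switch (ev) {
    case VG_EV_SUSPEND:    vg_suspends++; break;  /* park this thread */
    case VG_EV_AUDIT_DUMP: vg_dumps++;    break;  /* flush audit trail */
    default:               break;
    }
}

/* Called before re-entering client code: consume the latch, then
 * dispatch.  The common case is one load and a not-taken branch. */
static void vg_check_interception(void)
{
    vg_event ev = vg_pending_event;
    if (ev != VG_EV_NONE) {
        vg_pending_event = VG_EV_NONE;   /* latch consumed */
        vg_handle_event(ev);
    }
}
```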


You completely miss the point that GDB has no support for byte-granular 
memory checking.  That is the main feature of valgrind looking to be 
leveraged.  The application/client program needs to communicate with the 
byte-granular memory checker, whatever and wherever that is.  I think 
gdb watchpoints only work on naturally-sized memory locations (4/8 
bytes), so they don't seem suited to the goals.  Watchpoints can have 
hardware assistance, but only in limited numbers.


STM-like may certainly be the way; I'll see what I can find down that avenue.



Darryl

_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users
