tom fogal wrote:
> Darryl Miles <[EMAIL PROTECTED]> writes:
> 
> You missed his question -- valgrind sees (essentially):
> 
>         read(42, &mybuf, 1024);
> 
> Is this system call going to block?  Or are we going to return to user
> code without any context switches?  It is impossible to know a priori,
> unless we can peek into the kernel.
> 
> Anyway I don't think this question is relevant anymore -- with the
> lock approach, you don't care whether things block.

No system calls will be blocked or otherwise impeded by the valgrind 
scheduler.

I'm not sure about this use of the term "with the lock" in relation to 
syscalls.  There is no lock around syscalls, only around 
application/client code; this is to ensure only one application/client 
thread is running at any one time.


So, from the above, and in answer to your direct question: it does not 
matter if the read() syscall (or any other syscall) blocks, since that 
does not break the design.

Context switches occur as and when they occur; again, it does not matter 
where or when.  The design I propose does not break if a context switch 
occurs at the wrong moment, nor from not knowing when a context switch 
occurred.



>>> Do you have any concise fragments of code illustrating what problem
>>> it is you are really trying to solve?
>> "Concise" might seem a little subjective.
> 
> Look at it from a developer point of view.  Think of a stupid, 10 line
> program that you don't actually want to debug because it's easy, and
> show the output you'd want to get from valgrind on that program.
> 
> I'll make up an example:
> 
>         T1                T2               T3             T4
>     acquire(&x)       acquire(&x)       read(abc)      write(abc)
>     write(abc)        read(abc)         y = abc*18     z =42/abc
>     release(&x)       release(&x)       ... whatever   ..
> 
> Darryl (I think) wants to see something like:
> 
>     while in T1's C.S. `x', the variable `abc':
>         saw a write from T1 of size whatever
>         saw a write from T4 of size whatever
>         saw a read from T3 of size whatever
>         saw a read from T4 of size whatever

the last line should be "saw a read from T2 of size whatever", but yes, 
your example is understood.

> (of course, that's just one possible ordering that I'm making up)
> 
> Where those `saw a' lines define a total ordering of the memory access
> to `abc'.

Yes yes yes the audit of memory access.

Where the scheduler comes into play is that code is placed in T1 
during the "acquire(&x)" to basically instruct valgrind:

"Hey, I've got my application/client code lock with 'acquire(&x)' now, 
but I want you to try as hard as you can to run T2, T3, T4 as far as 
they will go for a little while, then come back to me."

So T1 is artificially put to sleep just after the lock acquisition. 
This moment would also atomically enable the earmarked areas of memory 
the developer is interested in for this event.

Valgrind then might resume T2.  T2 will then attempt the same 
acquire(&x); in doing so it will make the futex() syscall to try to 
gain the lock and, if unsuccessful, go to sleep.  Now, we know from the 
example it will be unsuccessful, but in line with my rules on 
scheduling, because T2 is making a syscall it will in the process 
wake up T3.  T2 will continue into the kernel and ultimately be put to 
sleep.

Valgrind has resumed T3.  T3 proceeds to make the read() syscall and in 
the process wakes T4 and enters the syscall.

Valgrind has resumed T4.  T4 proceeds to make the write() syscall, finds 
there are no other threads to wake, and enters the syscall.

To recap, at this point in time all 4 threads are inside the kernel: T1 
is blocked sleeping on a semaphore/timeout, T2 is blocked indefinitely 
waiting to acquire the lock, T3 is doing a read() and T4 is doing a 
write().

Let's say T3 returns from its syscall first.  When it does, it checks 
whether it can continue to run application/client code by checking that 
nothing else is running (in userspace).  It finds nothing is running, so 
it proceeds to execute "y = abc*18".

Now T4 returns from its syscall.  It also checks whether it can continue 
to run application/client code, but it finds T3 is already running, so 
it suspends itself in the valgrind scheduler and goes to sleep.

Now T1 returns from its syscall, due to the timeout.  This timeout was 
imposed by valgrind to artificially reduce T1's scheduling priority and 
force the other threads to run first.  T1 returns, finds T3 already 
running, and so, like T4, suspends itself.

T3 then does something to relinquish the single-thread-at-a-time lock, 
and T1 takes over and completes.  I'm sure you can fill in the rest of 
the picture.

Obviously the exact order of events might change over many runs; you 
indicate this understanding with your comment "one possible ordering", 
and the same is true of the above: it is one possible ordering.

You can also presume a starting state in which only T1 is running.



You can then see how, perhaps using HPET timers and other weighting 
factors, you could run a thread for a period of time before deciding to 
artificially put it to sleep, then select another thread which has not 
yet run and wake that one up.



I'm sure you appreciate that to get a sequential log of ordered memory 
accesses out of valgrind, you're going to need to serialize 
application/client code to simulate one-thread-at-a-time execution.



I'm also happy to depart from the "report on all memory information for 
the entire life of the process" paradigm, replacing it with a more 
specific selection of memory between an arbitrary point A and point B 
during runtime execution.

If the "entire life of the process" design goal proves too 
difficult/costly to achieve, it is more useful to throw it out (a point 
A and point B should do for most people) than to throw out the "all 
memory information" design goal.  I.e. it is more likely in some 
circumstance that you'll want to see all memory accesses between two 
points in time than all accesses to a specific piece of memory over the 
life of the process; the second case isn't as useful.

Ideally the design also needs to allow for overlapping point As and 
point Bs (in the same thread or in different threads).  So it is like a 
multitude of sets, all being watched by different watcher contexts that 
are set up and torn down.


Darryl


_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users