Hi,

I wanted to measure how much time our application spends in situations like 
waiting for a locked mutex, IO, etc. - things that instruction count can hardly 
measure. Callgrind has an option --collect-systime that looks to be what I 
want. 
However, I have problems interpreting the results.

First of all the app is multithreaded. But I am interested only in one thread 
that does the important work. The results suggested that a lot of time is spent 
in pthread_cond_signal in a function responsible for some logging stuff. When I 
turned off logging and measured the time (without valgrind) it didn't seem to 
help to speed up the app. I might do my time measurement in a wrong way but I 
would like know more about collecting systime. The valgrind manual does not say 
much about it.

How does it work with threads? When I have 3 threads that all sleep 1 second 
should the time collected be 1 or 3? (1. The whole process sleeps 1 second and 
this is somehow distributed among the threads or 2. Each thread sleeps 1 second 
and the result is a sum) When run with --separate-threads=yes, should each 
thread show 1 spent in sleep?
>From what I saw (see below) it looks like case 1. But I don't understand how 
>the 
individual threads contributed to the result.

I tried to understand what is going on using a small sample:
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

void* child_fn ( void* arg )
{
        printf("Thread %d started\n", (int)(arg));
        sleep(1);
        printf("Thread %d finished\n", (int)(arg));
        return NULL;
}

int main ( void )
{
        const int th_cnt(2);
        pthread_t threads[th_cnt];
        printf("Main thread started\n");
        for (int i1 = 0; i1 < th_cnt; i1++)
                pthread_create(&(threads[i1]), NULL, child_fn, (void*)i1);
        printf("Main thread started other threads\n");
        sleep(4);
        printf("Main thread finished sleeping\n");
        for (int i1 = 0; i1 < th_cnt; i1++)
                pthread_join(threads[i1], NULL);
        printf("Main thread joined other threads\n");
        return 0;
}

Run in callgrind, with output:
"Main thread started
Main thread started other threads
Thread 0 started
Thread 1 started
Thread 0 finished
Thread 1 finished
Main thread finished sleeping
Main thread joined other threads"
In KCacheGrind the result (for sysTime) is a disconnected graph. 5010 spend in 
start_thread (called twice) and no callees from start_thread. Another component 
of the graph contains the main function together with sleep, both having cost 
1003 (called once). child_fn function (called twice) has weight 0.

With output:
"Main thread started
Thread 0 started
Main thread started other threads
Thread 1 started
Thread 1 finished
Thread 0 finished
Main thread finished sleeping
Main thread joined other threads"
It looks completely different. start_thread now takes 6025, calls child_fn 
(1008), that calls sleep (1008). Function main takes only 2.

When I removed "sleep(4);" line from the main thread, one possible output was:
"Main thread started
Thread 0 started
Thread 1 started
Main thread started other threads
Main thread finished sleeping
Thread 0 finished
Thread 1 finished
Main thread joined other threads"
In KCacheGrind:
main (2005) -> pthread_join (1003) -> _dl_sysinfo_int80 (2005)
start_thread (1000)
child_fn(0)

BTW: Maybe the outputs are not so reliable (and relevant) as I assume due to 
buffering and synchronization.

Thanks.


------------------------------------------------------------------------------
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Reply via email to