Hi, I wanted to measure how much time our application spends in situations like waiting for a locked mutex, IO, etc. - things that instruction count can hardly measure. Callgrind has an option --collect-systime that looks to be what I want. However, I have problems interpreting the results.
First of all the app is multithreaded. But I am interested only in one thread that does the important work. The results suggested that a lot of time is spent in pthread_cond_signal in a function responsible for some logging stuff. When I turned off logging and measured the time (without valgrind) it didn't seem to help to speed up the app. I might do my time measurement in a wrong way but I would like know more about collecting systime. The valgrind manual does not say much about it. How does it work with threads? When I have 3 threads that all sleep 1 second should the time collected be 1 or 3? (1. The whole process sleeps 1 second and this is somehow distributed among the threads or 2. Each thread sleeps 1 second and the result is a sum) When run with --separate-threads=yes, should each thread show 1 spent in sleep? >From what I saw (see below) it looks like case 1. But I don't understand how >the individual threads contributed to the result. I tried to understand what is going on using a small sample: #include <pthread.h> #include <unistd.h> #include <stdio.h> void* child_fn ( void* arg ) { printf("Thread %d started\n", (int)(arg)); sleep(1); printf("Thread %d finished\n", (int)(arg)); return NULL; } int main ( void ) { const int th_cnt(2); pthread_t threads[th_cnt]; printf("Main thread started\n"); for (int i1 = 0; i1 < th_cnt; i1++) pthread_create(&(threads[i1]), NULL, child_fn, (void*)i1); printf("Main thread started other threads\n"); sleep(4); printf("Main thread finished sleeping\n"); for (int i1 = 0; i1 < th_cnt; i1++) pthread_join(threads[i1], NULL); printf("Main thread joined other threads\n"); return 0; } Run in callgrind, with output: "Main thread started Main thread started other threads Thread 0 started Thread 1 started Thread 0 finished Thread 1 finished Main thread finished sleeping Main thread joined other threads" In KCacheGrind the result (for sysTime) is a disconnected graph. 5010 spend in start_thread (called twice) and no callees from start_thread. Another component of the graph contains the main function together with sleep, both having cost 1003 (called once). child_fn function (called twice) has weight 0. With output: "Main thread started Thread 0 started Main thread started other threads Thread 1 started Thread 1 finished Thread 0 finished Main thread finished sleeping Main thread joined other threads" It looks completely different. start_thread now takes 6025, calls child_fn (1008), that calls sleep (1008). Function main takes only 2. When I removed "sleep(4);" line from the main thread, one possible output was: "Main thread started Thread 0 started Thread 1 started Main thread started other threads Main thread finished sleeping Thread 0 finished Thread 1 finished Main thread joined other threads" In KCacheGrind: main (2005) -> pthread_join (1003) -> _dl_sysinfo_int80 (2005) start_thread (1000) child_fn(0) BTW: Maybe the outputs are not so reliable (and relevant) as I assume due to buffering and synchronization. Thanks. ------------------------------------------------------------------------------ _______________________________________________ Valgrind-users mailing list Valgrind-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/valgrind-users