Hi, On Wednesday 05 May 2010, Vojtech Fried wrote: > I wanted to measure how much time our application spends in situations like > waiting for a locked mutex, IO, etc. - things that instruction count can > hardly > measure. Callgrind has an option --collect-systime that looks to be what I > want. > However, I have problems interpreting the results.
Hmm.. perhaps it is better to mark "--collect-systime" as experimental, as it usually just tells you nothing. It tries to measure real time spent in any system calls. However, Callgrind heavily perturbates process time because of its use of dynamic instrumentation (this time never is measured), so the system time measured very probably is bogus (e.g. slowing-down the process by Valgrind can hide much of the latencies of system calls, which would be there if run natively). So in the end, you can not trust Valgrinds profiling tools if there is lot of I/O done in a program. Similarily, Valgrind can not really measure any waiting time in parallel applications (at least not by taking real time). "--collect-systime" could be useful to estimate the number of, and the worst-case overhead of system calls. Josef > > First of all the app is multithreaded. But I am interested only in one thread > that does the important work. The results suggested that a lot of time is > spent > in pthread_cond_signal in a function responsible for some logging stuff. When > I > turned off logging and measured the time (without valgrind) it didn't seem to > help to speed up the app. I might do my time measurement in a wrong way but I > would like know more about collecting systime. The valgrind manual does not > say > much about it. > > How does it work with threads? When I have 3 threads that all sleep 1 second > should the time collected be 1 or 3? (1. The whole process sleeps 1 second > and > this is somehow distributed among the threads or 2. Each thread sleeps 1 > second > and the result is a sum) When run with --separate-threads=yes, should each > thread show 1 spent in sleep? > >>From what I saw (see below) it looks like case 1. But I don't understand > >>how the > individual threads contributed to the result. > > I tried to understand what is going on using a small sample: > #include <pthread.h> > #include <unistd.h> > #include <stdio.h> > > void* child_fn ( void* arg ) > { > printf("Thread %d started\n", (int)(arg)); > sleep(1); > printf("Thread %d finished\n", (int)(arg)); > return NULL; > } > > int main ( void ) > { > const int th_cnt(2); > pthread_t threads[th_cnt]; > printf("Main thread started\n"); > for (int i1 = 0; i1 < th_cnt; i1++) > pthread_create(&(threads[i1]), NULL, child_fn, (void*)i1); > printf("Main thread started other threads\n"); > sleep(4); > printf("Main thread finished sleeping\n"); > for (int i1 = 0; i1 < th_cnt; i1++) > pthread_join(threads[i1], NULL); > printf("Main thread joined other threads\n"); > return 0; > } > > Run in callgrind, with output: > "Main thread started > Main thread started other threads > Thread 0 started > Thread 1 started > Thread 0 finished > Thread 1 finished > Main thread finished sleeping > Main thread joined other threads" > In KCacheGrind the result (for sysTime) is a disconnected graph. 5010 spend > in > start_thread (called twice) and no callees from start_thread. Another > component > of the graph contains the main function together with sleep, both having cost > 1003 (called once). child_fn function (called twice) has weight 0. > > With output: > "Main thread started > Thread 0 started > Main thread started other threads > Thread 1 started > Thread 1 finished > Thread 0 finished > Main thread finished sleeping > Main thread joined other threads" > It looks completely different. start_thread now takes 6025, calls child_fn > (1008), that calls sleep (1008). Function main takes only 2. > > When I removed "sleep(4);" line from the main thread, one possible output was: > "Main thread started > Thread 0 started > Thread 1 started > Main thread started other threads > Main thread finished sleeping > Thread 0 finished > Thread 1 finished > Main thread joined other threads" > In KCacheGrind: > main (2005) -> pthread_join (1003) -> _dl_sysinfo_int80 (2005) > start_thread (1000) > child_fn(0) > > BTW: Maybe the outputs are not so reliable (and relevant) as I assume due to > buffering and synchronization. > > Thanks. > > > ------------------------------------------------------------------------------ > _______________________________________________ > Valgrind-users mailing list > Valgrind-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/valgrind-users > ------------------------------------------------------------------------------ _______________________________________________ Valgrind-users mailing list Valgrind-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/valgrind-users