Performance gain is caused by the fact that for standard API you do function call and additional parameter correctness check.
Artifitial test for sigaddset() performance (attached) provides
the following results:
compiler is gcc 2.95.2 (or 2.95.1), CFLAGS -O2
Platform
Linux HP10.20 AIX4.1
sizeof(sigset_t) 128
32 8
NSIG
64 35
64
Time fast
2.63 0.75 5.73
Time orig.
6.44 4.34 23.51
Ratio
2.45 5.79 4.10
For Linux performance gain is lower but still significant - about 2.5 times.
In regard to portability, Pth anyway choose context switch implementation
on 'configure' stage.
May be it is worth to substitute at the same time standard functions
for sigmask with macros for UNIX'es and leave it as is for Windows and
other non-Unix platforms. This will be well localized and will allow generic
implementation on arbitrary platform while improve performance on all UNIX'es.
Vladimir
Chris Leishman wrote:
On Sun, Sep 10, 2000 at 09:23:29PM +0200, Ralf S. Engelschall wrote:
> On Sun, Sep 10, 2000, Vladimir Kondratiev wrote:
>
> > I analyzed Pth performance for application that performs lots of context
> > switch. I found huge amount of time spent in just 2 functions :
> > sigismember() and sigdelset(). This functions are called from
> > pth_sched_eventmanager in loop for each thread and each signal.
> > pth_sched_eventmanager itself is called for each context switch. This
> > provides O(n**2) calls for functions mentioned, where n is number of
> > threads.
>
> Hmmmm... sorry, this complexity analysis seems not quite correct to me. An
> O(N^2) complexity you only can get if for every thread in an outer loop
> another inner loop exists which again performs an operation which is again
> directly proportional to the number of total threads. But that's not the case
> for Pth - both the way you explained it above and as far as I know details.
> I'm convinced that the complexity of the event manager is just O(n)
> - which is the expected one and should be acceptable.
>Well - you are both well on the way to being right. The code, as Vladimir
said, does a "loop for each thread and each signal". It is:for (t = pth_pqueue_head(&pth_WQ); t != NULL;
t = pth_pqueue_walk(&pth_WQ, t, PTH_WALK_NEXT)) {/* determine signals we block */
for (sig = 1; sig < PTH_NSIG; sig++)
if (!sigismember(&(t->mctx.sigs), sig))
sigdelset(&pth_sigblock, sig);So - the complexity is actually O(n*m), when n is the number of threads, and m
is the number of signals. Vladimir was confused I believe when he said n*n.Of course, since PTH_NSIG is constant (on my box it's 64), the complexity
reduces to O(64*n) - which we all know is the same as O(n). So as Ralph
correctly said, the complexity is O(n) in the number of threads.Chris
______________________________________________________________________
GNU Portable Threads (Pth) http://www.gnu.org/software/pth/
User Support Mailing List [EMAIL PROTECTED]
Automated List Manager (Majordomo) [EMAIL PROTECTED]
#include <stdio.h>
#include <signal.h>
#include <time.h>
/* Return a mask that includes the bit for SIG only. */
# define __sigmask(sig) \
(((unsigned long int) 1) << (((sig) - 1) % (8 * sizeof (unsigned long int))))
/* Return the word index for SIG. */
# define __sigword(sig) (((sig) - 1) / (8 * sizeof (unsigned long int)))
#define fast_sigaddset(ss,i) \
((unsigned long int*)ss)[__sigword(i)] |= __sigmask(i)
#define fast_sigdelset(ss,i) \
((unsigned long int*)ss)[__sigword(i)] &= ~(__sigmask(i))
int main() {
sigset_t ss;
int szSigSet=sizeof(sigset_t);
int i,j;
double tFast,tOrig;
{
clock_t start,stop;
sigemptyset(&ss);
start=clock();
for (j=0;j<1000000;j++) {
for (i=1;i<NSIG;i++) {
fast_sigaddset(&ss,i);
}
}
stop=clock();
tFast=((double)(stop-start))/CLOCKS_PER_SEC;
sigemptyset(&ss);
start=clock();
for (j=0;j<1000000;j++) {
for (i=1;i<NSIG;i++) {
sigaddset(&ss,i);
}
}
stop=clock();
tOrig=((double)(stop-start))/CLOCKS_PER_SEC;
}
printf("sizeof(sigset_t): %d\n", szSigSet);
printf("NSIG: %d\n",NSIG);
printf("Fast: %.2lf sec\n", tFast);
printf("Ordinal: %.2lf sec\n", tOrig);
printf("Ratio: %.2lf\n", tOrig/tFast);
return 0;
}
