Re: scheduler performance

Vladimir Kondratiev Sun, 10 Sep 2000 23:25:59 -0700

My assumption was number of context switches is O(n). This is true for my particular application. I understand this may be not the case for others.
Performance gain is caused by the fact that for standard API you do function call and additional parameter correctness check.

Artifitial test for sigaddset() performance (attached) provides the following results:
compiler is gcc 2.95.2 (or 2.95.1), CFLAGS -O2

Platform            Linux   HP10.20 AIX4.1
sizeof(sigset_t)     128       32        8
NSIG                  64       35       64
Time fast           2.63     0.75     5.73
Time orig.          6.44     4.34    23.51
Ratio               2.45     5.79     4.10

For Linux performance gain is lower but still significant - about 2.5 times.

In regard to portability, Pth anyway choose context switch implementation on 'configure' stage.
May be it is worth to substitute at the same time standard functions for sigmask with macros for UNIX'es and leave it as is for Windows and other non-Unix platforms. This will be well localized and will allow generic implementation on arbitrary platform while improve performance on all UNIX'es.

Vladimir

Chris Leishman wrote:

On Sun, Sep 10, 2000 at 09:23:29PM +0200, Ralf S. Engelschall wrote:
> On Sun, Sep 10, 2000, Vladimir Kondratiev wrote:
>
> > I analyzed Pth performance for application that performs lots of context
> > switch. I found huge amount of time spent in just 2 functions :
> > sigismember() and sigdelset(). This functions are called from
> > pth_sched_eventmanager in loop for each thread and each signal.
> > pth_sched_eventmanager itself is called for each context switch. This
> > provides O(n**2) calls for functions mentioned, where n is number of
> > threads.
>
> Hmmmm... sorry, this complexity analysis seems not quite correct to me. An
> O(N^2) complexity you only can get if for every thread in an outer loop
> another inner loop exists which again performs an operation which is again
> directly proportional to the number of total threads. But that's not the case
> for Pth - both the way you explained it above and as far as I know details.
> I'm convinced that the complexity of the event manager is just O(n)
> - which is the expected one and should be acceptable.
>
Well - you are both well on the way to being right. The code, as Vladimir
said, does a "loop for each thread and each signal". It is:
for (t = pth_pqueue_head(&pth_WQ); t != NULL;
     t = pth_pqueue_walk(&pth_WQ, t, PTH_WALK_NEXT)) {
    /* determine signals we block */
    for (sig = 1; sig < PTH_NSIG; sig++)
        if (!sigismember(&(t->mctx.sigs), sig))
            sigdelset(&pth_sigblock, sig);
So - the complexity is actually O(n*m), when n is the number of threads, and m
is the number of signals. Vladimir was confused I believe when he said n*n.
Of course, since PTH_NSIG is constant (on my box it's 64), the complexity
reduces to O(64*n) - which we all know is the same as O(n). So as Ralph
correctly said, the complexity is O(n) in the number of threads.
Chris
______________________________________________________________________
GNU Portable Threads (Pth)            http://www.gnu.org/software/pth/
User Support Mailing List                            [EMAIL PROTECTED]
Automated List Manager (Majordomo)           [EMAIL PROTECTED]

#include <stdio.h>
#include <signal.h>
#include <time.h>

/* Return a mask that includes the bit for SIG only.  */
# define __sigmask(sig) \
  (((unsigned long int) 1) << (((sig) - 1) % (8 * sizeof (unsigned long int))))

  /* Return the word index for SIG.  */
# define __sigword(sig) (((sig) - 1) / (8 * sizeof (unsigned long int)))

#define fast_sigaddset(ss,i) \
((unsigned long int*)ss)[__sigword(i)] |= __sigmask(i)

#define fast_sigdelset(ss,i) \
((unsigned long int*)ss)[__sigword(i)] &= ~(__sigmask(i))

int main() {
    sigset_t ss;
    int szSigSet=sizeof(sigset_t);
    int i,j;
    double tFast,tOrig;
    {
        clock_t start,stop;
        sigemptyset(&ss);
        start=clock();
        for (j=0;j<1000000;j++) {
            for (i=1;i<NSIG;i++) {
                fast_sigaddset(&ss,i);
            }
        }
        stop=clock();
        tFast=((double)(stop-start))/CLOCKS_PER_SEC;
        sigemptyset(&ss);
        start=clock();
        for (j=0;j<1000000;j++) {
            for (i=1;i<NSIG;i++) {
                sigaddset(&ss,i);
            }
        }
        stop=clock();
        tOrig=((double)(stop-start))/CLOCKS_PER_SEC;
    }
    printf("sizeof(sigset_t): %d\n", szSigSet);
    printf("NSIG:    %d\n",NSIG);
    printf("Fast:    %.2lf sec\n", tFast);
    printf("Ordinal: %.2lf sec\n", tOrig);
    printf("Ratio:   %.2lf\n", tOrig/tFast);
    return 0;
}

Re: scheduler performance

Reply via email to