On Sun, Sep 10, 2000, Vladimir Kondratiev wrote:

> I analyzed Pth performance for application that performs lots of context
> switch. I found huge amount of time spent in just 2 functions :
> sigismember() and sigdelset(). This functions are called from
> pth_sched_eventmanager in loop for each thread and each signal.
> pth_sched_eventmanager itself is called for each context switch. This
> provides O(n**2) calls for functions mentioned, where n is number of
> threads.

Hmmmm... sorry, this complexity analysis seems not quite correct to me. An
O(N^2) complexity you only can get if for every thread in an outer loop
another inner loop exists which again performs an operation which is again
directly proportional to the number of total threads. But that's not the case
for Pth - both the way you explained it above and as far as I know details.
I'm convinced that the complexity of the event manager is just O(n)
- which is the expected one and should be acceptable.

> Functions mentioned do signal mask operations. Actually they do
> set/clean/test one bit in bit vector that represents signal mask.
> Besides this, for each bit to be changed/queried we have function call
> and check for parameter correctness.

That's correct. But there is no _PORTABLE_ way to overcome this bitmask
fiddling (see above).

> I did fix (attached) to inline bit operations directly into the code. I
> took code from glibc 2.1.3 and slightly modified it. I leave signal set
> operations as macros, not inline functions and I removed extra
> operations for sigismember(). Originally sigismember() was implemented
> as 'return sig[word] & mask ? 1:0'; I changed it to 'return sig[word] &
> mask' only.
> I removed all references to sidaddset(), sigdelset() and sigismember(),
> replacing it by macros. Macros itself resides in 'fastsig.h' file. All
> actual calls renamed from xxx() to fast_xxx() to make sure macros, not
> functions, are used.

Independent of details, what you've done can be considered both
questionable and unportable, because:

1. IEEE Std1003.1-1988 (``POSIX'') standardized the sigemptyset,
   sigfillset, sigaddset, sigdelset and sigismember functions and
   the corresponding sigset_t type. But POSIX intentionally did not
   specify how this API has to be implemented. Your approach depends on
   a particular implementation technique (a bit mask) of sigset_t and
   this way is highly unportable between different platforms (although
   in practice most platforms certainly share the same implementation).
   But practice is not allowed to count here.

2. Even on platforms where your fastsig.h stuff works as expected
   (that is where the implementation technique matches), it is questionable
   whether the inline macros really speed up processing dramatically.  First,
   most platforms implement the API as macros anyway. And even on others, the
   overhead of a function call (not a system call!) is not such dramatically.
   Especially not, because I think the complexity above is just O(n) and not
   O(n^2). So you certainly see some performance gain on one platform or
   another, but as a side-effect you horribly suffer from decreased
   portability.

> Fix is against 1.3.5. It should cause no problem to adopt it for other
> releases.
> 
> This fix improves performance for my test by factor 3 for 70 threads on
> HP 10.20.
> I tested this fix on 2 platforms: HP 10.20 and Linux 2.2.14. I did also
> test to make sure my macros do exactly the same as corresponded
> functions do.

On Linux you should see no difference, because you said it is derived
from glibc's code. And on HPUX I guess the vendor implementation is just
brain-dead in some way and this way you see a performance increase.

> I will be happy if you incorporate fix I attached into future releases.

Thanks for your efforts and feedback, but I have to admit that even if your
patch proves a performance increase on some platforms, it cannot be acceptable
for inclusion into the official Pth version because of the mentioned
portability aspect.

So, if you want to optimize Pth's scheduler, you have to do it
differently. You at least cannot replace the usage of standardized
APIs by hard-coded optimized versions. You have to stick with the
standardized APIs. But you nevertheless can do lots of other things.
One thing is to change the scheduler so it no longer must use some API
functions such a lot of times. Typical techniques to achieve this is
caching of function call results, using faster alternative (but still
standardized and portable) functions, etc. Please feel free to make more
suggestions in this optimization direction.

Greetings,
                                       Ralf S. Engelschall
                                       [EMAIL PROTECTED]
                                       www.engelschall.com
______________________________________________________________________
GNU Portable Threads (Pth)            http://www.gnu.org/software/pth/
User Support Mailing List                            [EMAIL PROTECTED]
Automated List Manager (Majordomo)           [EMAIL PROTECTED]

Reply via email to