OK, my compiler, gcc 2.95 does the optimization at -O2, so I ran the
tests here with the attached program, testing a constant of 256 and a
variable 'j' equal to 256.

I found a 1.7% slowdown with the variable compared to the constant.
The 256 value is rather large.  For a length of 32, the difference was
14%!

So, seems I should back out the palloc0 usage and let the MemSet inline
at each location.  Does that seems like the direction to head?

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > I have applied this patch to inline MemSet for newNode.
> > > I will make another commit to make more general use of palloc0.
> > 
> > Refresh my memory on why either of these was a good idea?
> 
> You are talking about the use of palloc0 in place of palloc/MemSet(0),
> not the use of palloc0 in newNode, right?
> 
> > > This was already discussed on hackers.
> > 
> > And this was not the approach agreed to, IIRC.  What you've done has
> > eliminated the possibility of optimizing away the controlling tests
> > in MemSet.
> 
> I see now:
> 
>       
>http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200210120006.g9C06i502964%40candle.pha.pa.us
> 
> I thought someone had tested that there was little/no performance
> difference between these two statements:
> 
>       MemSet(ptr, 0, 256)
> 
> vs.
> 
>       i = 256;
>       MemSet(ptr, 0, i)
> 
> However, seeing no email reply, I now assume no test was done.
> 
> Neil, can you run such a test and let us know.  It has to be with a
> compiler that optimizes out the MemSet length test when the len is a
> constant.  I can't remember what compiler version/settings that was, but
> it may have been -O3 gcc 3.20.
> 
> I can back out my changes, but it would be easier to see if there is a
> benefit before doing that.  Personally, I am going to be surprised if a
> single conditional tests in MemSet (which is not in the assignment loop)
> will make any measurable difference.
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   [EMAIL PROTECTED]               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#define INT_ALIGN_MASK (sizeof(int) - 1)
#include <stdlib.h>

/*
 * MemSet
 *      Exactly the same as standard library function memset(), but considerably
 *      faster for zeroing small word-aligned structures (such as parsetree nodes).
 *      This has to be a macro because the main point is to avoid function-call
 *      overhead.   However, we have also found that the loop is faster than
 *      native libc memset() on some platforms, even those with assembler
 *      memset() functions.  More research needs to be done, perhaps with
 *      platform-specific MEMSET_LOOP_LIMIT values or tests in configure.
 *
 *      bjm 2002-10-08
 */
#define MemSet(start, val, len) \
        do \
        { \
                int * _start = (int *) (start); \
                int             _val = (val); \
                size_t  _len = (len); \
\
                if ((((long) _start) & INT_ALIGN_MASK) == 0 && \
                        (_len & INT_ALIGN_MASK) == 0 && \
                        _val == 0 && \
                        _len <= MEMSET_LOOP_LIMIT) \
                { \
                        int * _stop = (int *) ((char *) _start + _len); \
                        while (_start < _stop) \
                                *_start++ = 0; \
                } \
                else \
                        memset((char *) _start, _val, _len); \
        } while (0)

#define MEMSET_LOOP_LIMIT  1024

int
main(int argc, char **argv)
{
        int i,j;
        char *buf;

        buf = malloc(256);

        j = 32;
        for (i = 0; i < 1000000; i++)
                MemSet(buf, 0, j /* choose 256 or j here */);
        return 0;
}
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Reply via email to