Rocco Altier wrote:
> I wanted to chime in that I also see this speedup from using XLC 6.0
> (IBM's cc), even in 32bit mode.  I have tested on AIX 5.2 and 5.1.
> 
> I think this would be good to include in the regular release.  
> 
> Not sure how many people are running older versions of AIX that would
> want a new version of postgres.
> 

OK, attached patch applied that turns off MemSet on AIX.  If we need to
tweak it for AIX versions, let us know.  I added a constant test in the
macro that should allow the optimizer to call memset() directly rather
than doing our MemSet comparisons, and the optimizer should eliminate
the constant test for non-AIX builds.
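
To illustrate the constant test described above, here is a minimal sketch written as a function rather than the real macro. The names sketch_memset, SKETCH_LOOP_LIMIT and SKETCH_ALIGN_MASK are hypothetical stand-ins, not identifiers from c.h. The extra "!= 0" comparison matters because "len <= 0" alone is still true at run time for a zero length, so without it the compiler could not discard the loop branch when the limit is 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the configure-supplied limit.
 * Building with -DSKETCH_LOOP_LIMIT=0 mimics the AIX template setting.
 */
#ifndef SKETCH_LOOP_LIMIT
#define SKETCH_LOOP_LIMIT 1024
#endif

#define SKETCH_ALIGN_MASK (sizeof(int32_t) - 1)

/*
 * Zero a buffer with an inline int32 loop when it is aligned, a multiple
 * of the word size, and no longer than the limit; otherwise call memset().
 * Returns 1 if the loop ran and 0 if memset() was used, so a caller can
 * observe which path was taken.
 */
static int
sketch_memset(void *start, int val, size_t len)
{
    if ((((uintptr_t) start) & SKETCH_ALIGN_MASK) == 0 &&
        (len & SKETCH_ALIGN_MASK) == 0 &&
        val == 0 &&
        len <= SKETCH_LOOP_LIMIT &&
        /* Compile-time constant: false when the limit is 0. */
        SKETCH_LOOP_LIMIT != 0)
    {
        int32_t *p = (int32_t *) start;
        int32_t *stop = (int32_t *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
        return 1;
    }

    memset(start, val, len);
    return 0;
}
```

With the default limit, an aligned 64-byte zero fill takes the loop path, while a non-zero fill value or an odd length falls through to memset().  With the limit defined as 0, every call should end up in memset().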

---------------------------------------------------------------------------


>       -rocco
> 
> 
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] 
> > [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> > Sent: Wednesday, February 01, 2006 12:11 PM
> > To: Seneca Cunningham
> > Cc: Martijn van Oosterhout; pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Some platform-specific MemSet research
> > 
> > 
> > 
> > My guess is that there is some really fast assembler for memory copy
> > on AIX, and only libc memset() has it.  If you want, we can make
> > MEMSET_LOOP_LIMIT in c.h a configure value, and allow template/aix to
> > set it to zero, causing memset() to be always used.
> > 
> > Are you prepared to make this optimization decision for all AIX users
> > using gcc, or only for certain versions?
> > 
> > ---------------------------------------------------------------------------
> > 
> > Seneca Cunningham wrote:
> > > Martijn van Oosterhout wrote:
> > > > On Tue, Jan 24, 2006 at 05:24:28PM -0500, Seneca Cunningham wrote:
> > > > 
> > > >>After reading the post on -patches proposing that MemSet be changed to
> > > >>use long instead of int32 on the grounds that a pair of x86-64 linux
> > > >>boxes took less time to execute the long code 64*10^6 times[1], I took a
> > > >>look at how the testcode performed on AIX with gcc.  While the switch to
> > > >>long did result in a minor performance improvement, dropping the
> > > >>MemSetLoop in favour of the native memset resulted in the tests taking
> > > >>~25% the time as the MemSetLoop-like int loop. The 32-bit linux system I
> > > >>ran the expanded tests on showed that for the buffer size range that
> > > >>postgres can use the looping MemSet instead of memset (size <= 1024
> > > >>bytes), MemSet generally had better performance.
> > > > 
> > > > 
> > > > Could you please check the asm output to see what's going on. We've had
> > > > tests like these produce odd results in the past because the compiler
> > > > optimised away stuff that didn't have any effect. Since every memset
> > > > after the first is a no-op, you want to make sure it's still actually
> > > > doing the work...
> > > 
> > > Well, on both linux and AIX, all 30 of the 64000000 iterations loops
> > > from the source exist (10 int, 10 long, 10 memset).  According to my
> > > understanding of the assembler, memset itself is only called for values
> > > >= 64 bytes on both platforms and the memset is called in each iteration.
> > > 
> > > The assembler for the 64 byte loops, with prepended line number, first
> > > loop MemSetLoop int-variant, second loop memset, third loop MemSetLoop
> > > long-variant:
> > > 
> > > 64-bit AIX:
> > > 
> > >     419     addi 3,1,112
> > >     420     li 4,0
> > >     421     bl .gettimeofday
> > >     422     nop
> > >     423     lis 10,0x3d0
> > >     424     cmpld 6,26,16
> > >     425     li 11,0
> > >     426     ori 10,10,36864
> > >     427 L..41:
> > >     428     bge 6,L..42
> > >     429     mr 9,26
> > >     430     li 0,0
> > >     431 L..44:
> > >     432     stw 0,0(9)
> > >     433     addi 9,9,4
> > >     434     cmpld 7,16,9
> > >     435     bgt 7,L..44
> > >     436 L..42:
> > >     437     addi 0,11,1
> > >     438     extsw 11,0
> > >     439     cmpw 7,11,10
> > >     440     bne+ 7,L..41
> > >     441     li 4,0
> > >     442     mr 3,22
> > >     443     lis 25,0x3d0
> > >     444     li 28,0
> > >     445     bl .gettimeofday
> > >     446     nop
> > >     447     li 4,64
> > >     448     addi 5,1,112
> > >     449     ld 3,LC..9(2)
> > >     450     mr 6,22
> > >     451     ori 25,25,36864
> > >     452     bl .print_time
> > >     453     addi 3,1,112
> > >     454     li 4,0
> > >     455     bl .gettimeofday
> > >     456     nop
> > >     457 L..46:
> > >     458     mr 3,26
> > >     459     li 4,0
> > >     460     li 5,64
> > >     461     bl .memset
> > >     462     nop
> > >     463     addi 0,28,1
> > >     464     extsw 28,0
> > >     465     cmpw 7,28,25
> > >     466     bne+ 7,L..46
> > >     467     li 4,0
> > >     468     mr 3,22
> > >     469     bl .gettimeofday
> > >     470     nop
> > >     471     li 4,64
> > >     472     addi 5,1,112
> > >     473     ld 3,LC..11(2)
> > >     474     mr 6,22
> > >     475     bl .print_time
> > >     476     addi 3,1,112
> > >     477     li 4,0
> > >     478     bl .gettimeofday
> > >     479     nop
> > >     480     lis 10,0x3d0
> > >     481     cmpld 6,26,16
> > >     482     li 11,0
> > >     483     ori 10,10,36864
> > >     484 L..48:
> > >     485     bge 6,L..49
> > >     486     mr 9,26
> > >     487     li 0,0
> > >     488 L..51:
> > >     489     std 0,0(9)
> > >     490     addi 9,9,8
> > >     491     cmpld 7,9,16
> > >     492     blt 7,L..51
> > >     493 L..49:
> > >     494     addi 0,11,1
> > >     495     extsw 11,0
> > >     496     cmpw 7,11,10
> > >     497     bne+ 7,L..48
> > >     498     li 4,0
> > >     499     mr 3,22
> > >     500     bl .gettimeofday
> > >     501     nop
> > >     502     li 4,64
> > >     503     addi 5,1,112
> > >     504     ld 3,LC..13(2)
> > >     505     mr 6,22
> > >     506     bl .print_time
> > > 
> > > 
> > > 32-bit Linux:
> > > 
> > >     387     popl    %ecx
> > >     388     popl    %edi
> > >     389     pushl   $0
> > >     390     leal    -20(%ebp), %edx
> > >     391     pushl   %edx
> > >     392     call    gettimeofday
> > >     393     xorl    %edx, %edx
> > >     394     addl    $16, %esp
> > >     395 .L41:
> > >     396     movl    -4160(%ebp), %eax
> > >     397     cmpl    %eax, -4144(%ebp)
> > >     398     jae .L42
> > >     399     movl    -4144(%ebp), %eax
> > >     400 .L44:
> > >     401     movl    $0, (%eax)
> > >     402     addl    $4, %eax
> > >     403     cmpl    %eax, -4160(%ebp)
> > >     404     ja  .L44
> > >     405 .L42:
> > >     406     incl    %edx
> > >     407     cmpl    $64000000, %edx
> > >     408     jne .L41
> > >     409     subl    $8, %esp
> > >     410     pushl   $0
> > >     411     leal    -28(%ebp), %edx
> > >     412     pushl   %edx
> > >     413     call    gettimeofday
> > >     414     leal    -28(%ebp), %eax
> > >     415     movl    %eax, (%esp)
> > >     416     leal    -20(%ebp), %ecx
> > >     417     movl    $64, %edx
> > >     418     movl    $.LC5, %eax
> > >     419     call    print_time
> > >     420     popl    %eax
> > >     421     popl    %edx
> > >     422     pushl   $0
> > >     423     leal    -20(%ebp), %edx
> > >     424     pushl   %edx
> > >     425     call    gettimeofday
> > >     426     xorl    %edi, %edi
> > >     427     addl    $16, %esp
> > >     428 .L46:
> > >     429     pushl   %eax
> > >     430     pushl   $64
> > >     431     pushl   $0
> > >     432     movl    -4144(%ebp), %ecx
> > >     433     pushl   %ecx
> > >     434     call    memset
> > >     435     incl    %edi
> > >     436     addl    $16, %esp
> > >     437     cmpl    $64000000, %edi
> > >     438     jne .L46
> > >     439     subl    $8, %esp
> > >     440     pushl   $0
> > >     441     leal    -28(%ebp), %eax
> > >     442     pushl   %eax
> > >     443     call    gettimeofday
> > >     444     leal    -28(%ebp), %edx
> > >     445     movl    %edx, (%esp)
> > >     446     leal    -20(%ebp), %ecx
> > >     447     movl    $64, %edx
> > >     448     movl    $.LC6, %eax
> > >     449     call    print_time
> > >     450     popl    %eax
> > >     451     popl    %edx
> > >     452     pushl   $0
> > >     453     leal    -20(%ebp), %eax
> > >     454     pushl   %eax
> > >     455     call    gettimeofday
> > >     456     xorl    %edx, %edx
> > >     457     addl    $16, %esp
> > >     458 .L48:
> > >     459     movl    -4160(%ebp), %eax
> > >     460     cmpl    %eax, -4144(%ebp)
> > >     461     jae .L49
> > >     462     movl    -4144(%ebp), %eax
> > >     463 .L51:
> > >     464     movl    $0, (%eax)
> > >     465     addl    $4, %eax
> > >     466     cmpl    -4160(%ebp), %eax
> > >     467     jb  .L51
> > >     468 .L49:
> > >     469     incl    %edx
> > >     470     cmpl    $64000000, %edx
> > >     471     jne .L48
> > >     472     subl    $8, %esp
> > >     473     pushl   $0
> > >     474     leal    -28(%ebp), %edx
> > >     475     pushl   %edx
> > >     476     call    gettimeofday
> > >     477     leal    -28(%ebp), %eax
> > >     478     movl    %eax, (%esp)
> > >     479     leal    -20(%ebp), %ecx
> > >     480     movl    $64, %edx
> > >     481     movl    $.LC7, %eax
> > >     482     call    print_time
> > > 
> > > -- 
> > > Seneca Cunningham
> > > [EMAIL PROTECTED]
> > > 
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 5: don't forget to increase your free space map settings
> > > 
> > 
> > -- 
> >   Bruce Momjian                        |  http://candle.pha.pa.us
> >   pgman@candle.pha.pa.us               |  (610) 359-1001
> >   +  If your life is a hard drive,     |  13 Roberts Road
> >   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> > 
> > 
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: configure
===================================================================
RCS file: /cvsroot/pgsql/configure,v
retrieving revision 1.473
diff -c -c -r1.473 configure
*** configure   17 Jan 2006 23:52:27 -0000      1.473
--- configure   3 Feb 2006 13:48:02 -0000
***************
*** 21516,21521 ****
--- 21516,21532 ----
  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
  
  
+ # If not set in template file, set bytes to use libc memset()
+ if test x"$MEMSET_LOOP_LIMIT" = x"" ; then
+   MEMSET_LOOP_LIMIT=1024
+ fi
+ 
+ cat >>confdefs.h <<_ACEOF
+ #define MEMSET_LOOP_LIMIT ${MEMSET_LOOP_LIMIT}
+ _ACEOF
+ 
+ 
+ 
  if test "$enable_nls" = yes ; then
  
    echo "$as_me:$LINENO: checking for library containing gettext" >&5
Index: configure.in
===================================================================
RCS file: /cvsroot/pgsql/configure.in,v
retrieving revision 1.443
diff -c -c -r1.443 configure.in
*** configure.in        17 Jan 2006 23:52:30 -0000      1.443
--- configure.in        3 Feb 2006 13:48:06 -0000
***************
*** 1249,1254 ****
--- 1249,1261 ----
  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
  
  
+ # If not set in template file, set bytes to use libc memset()
+ if test x"$MEMSET_LOOP_LIMIT" = x"" ; then
+   MEMSET_LOOP_LIMIT=1024
+ fi
+ AC_DEFINE_UNQUOTED(MEMSET_LOOP_LIMIT, ${MEMSET_LOOP_LIMIT}, [Define bytes to use libc memset().])
+ 
+ 
  if test "$enable_nls" = yes ; then
    PGAC_CHECK_GETTEXT
  fi
Index: src/include/c.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/c.h,v
retrieving revision 1.194
diff -c -c -r1.194 c.h
*** src/include/c.h     5 Jan 2006 03:01:37 -0000       1.194
--- src/include/c.h     3 Feb 2006 13:48:09 -0000
***************
*** 614,622 ****
   *    overhead.       However, we have also found that the loop is faster than
   *    native libc memset() on some platforms, even those with assembler
   *    memset() functions.  More research needs to be done, perhaps with
!  *    platform-specific MEMSET_LOOP_LIMIT values or tests in configure.
!  *
!  *    bjm 2002-10-08
   */
  #define MemSet(start, val, len) \
        do \
--- 614,620 ----
   *    overhead.       However, we have also found that the loop is faster than
   *    native libc memset() on some platforms, even those with assembler
   *    memset() functions.  More research needs to be done, perhaps with
!  *    MEMSET_LOOP_LIMIT tests in configure.
   */
  #define MemSet(start, val, len) \
        do \
***************
*** 629,635 ****
                if ((((long) _vstart) & INT_ALIGN_MASK) == 0 && \
                        (_len & INT_ALIGN_MASK) == 0 && \
                        _val == 0 && \
!                       _len <= MEMSET_LOOP_LIMIT) \
                { \
                        int32 *_start = (int32 *) _vstart; \
                        int32 *_stop = (int32 *) ((char *) _start + _len); \
--- 627,638 ----
                if ((((long) _vstart) & INT_ALIGN_MASK) == 0 && \
                        (_len & INT_ALIGN_MASK) == 0 && \
                        _val == 0 && \
!                       _len <= MEMSET_LOOP_LIMIT && \
!                       /* \
!                        *      If MEMSET_LOOP_LIMIT == 0, optimizer should find \
!                        *      the whole "if" false at compile time. \
!                        */ \
!                       MEMSET_LOOP_LIMIT != 0) \
                { \
                        int32 *_start = (int32 *) _vstart; \
                        int32 *_stop = (int32 *) ((char *) _start + _len); \
***************
*** 640,647 ****
                        memset(_vstart, _val, _len); \
        } while (0)
  
- #define MEMSET_LOOP_LIMIT  1024
- 
  /*
   * MemSetAligned is the same as MemSet except it omits the test to see if
   * "start" is word-aligned.  This is okay to use if the caller knows a-priori
--- 643,648 ----
***************
*** 657,663 ****
  \
                if ((_len & INT_ALIGN_MASK) == 0 && \
                        _val == 0 && \
!                       _len <= MEMSET_LOOP_LIMIT) \
                { \
                        int32 *_stop = (int32 *) ((char *) _start + _len); \
                        while (_start < _stop) \
--- 658,665 ----
  \
                if ((_len & INT_ALIGN_MASK) == 0 && \
                        _val == 0 && \
!                       _len <= MEMSET_LOOP_LIMIT && \
!                       MEMSET_LOOP_LIMIT != 0) \
                { \
                        int32 *_stop = (int32 *) ((char *) _start + _len); \
                        while (_start < _stop) \
***************
*** 679,684 ****
--- 681,687 ----
  #define MemSetTest(val, len) \
        ( ((len) & INT_ALIGN_MASK) == 0 && \
        (len) <= MEMSET_LOOP_LIMIT && \
+       MEMSET_LOOP_LIMIT != 0 && \
        (val) == 0 )
  
  #define MemSetLoop(start, val, len) \
Index: src/include/pg_config.h.in
===================================================================
RCS file: /cvsroot/pgsql/src/include/pg_config.h.in,v
retrieving revision 1.90
diff -c -c -r1.90 pg_config.h.in
*** src/include/pg_config.h.in  17 Jan 2006 23:52:31 -0000      1.90
--- src/include/pg_config.h.in  3 Feb 2006 13:48:12 -0000
***************
*** 576,581 ****
--- 576,584 ----
  /* Define as the maximum alignment requirement of any C data type. */
  #undef MAXIMUM_ALIGNOF
  
+ /* Define bytes to use libc memset(). */
+ #undef MEMSET_LOOP_LIMIT
+ 
  /* Define to the address where bug reports for this package should be sent. */
  #undef PACKAGE_BUGREPORT
  
Index: src/template/aix
===================================================================
RCS file: /cvsroot/pgsql/src/template/aix,v
retrieving revision 1.17
diff -c -c -r1.17 aix
*** src/template/aix    2 Dec 2004 18:11:40 -0000       1.17
--- src/template/aix    3 Feb 2006 13:48:13 -0000
***************
*** 8,10 ****
--- 8,14 ----
        ;;
    esac
  fi
+ 
+ # native memset() is faster, 2006-02-03
+ # XLC 6.0, (IBM's cc), tested on AIX 5.2 and 5.1
+ MEMSET_LOOP_LIMIT=0
