Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-11 Thread Mark Kirkwood

Mark Kirkwood wrote:

Looks to me like -O2 makes the difference very small (on this 
platform/gcc combo) - is 5/169 worth doing?


Ahem - misunderstood your comment here, sorry.


Qingqing Zhou wrote:


compiled with gcc testbuf.c. I tried -O2 actually, and it turns out 
that

the timing is reduced a lot so not believable.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-11 Thread Andrew Dunstan



Gavin Sherry wrote:


Or more than one hardware architecture (which you didn't even say what
you tested...)
   



Well, he tested on SunOS (!) and Linux -- I presume that's two
architectures. 
 



Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or 
look at a buildfarm solaris build info)


So what he said he tested could mean Solaris/x86 + Linux x86 or 
Solaris/Sparc + Linux Sparc  among other possible combos. Given the 
results I assume they are at least not the same box ;-)


cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-11 Thread Gavin Sherry
On Thu, 11 Aug 2005, Andrew Dunstan wrote:



 Gavin Sherry wrote:

 Or more than one hardware architecture (which you didn't even say what
 you tested...)
 
 
 
 Well, he tested on SunOS (!) and Linux -- I presume that's two
 architectures.
 
 

 Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or
 look at a buildfarm solaris build info)

True. But my previous experience in university environments is that SunOS
usually refers to SunOS 2.6 -- and the performance indicates old hardware.

The thing is, compilser optimised versions of the test reveal very little
difference in performance. This may be because the compiler is very good
at optimising sequential annd predictable access to the array. Instead, we
should mimic what we see in the real world: random access.

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-11 Thread Joshua D. Drake

Hello,

With gcc-4 on Ubuntu AMD Athlon 2600 Barton:


[EMAIL PROTECTED]:~$ ./a.out
duration round 1 of array method: 0.539 ms
duration round 2 of array method: 0.529 ms
duration round 3 of array method: 0.530 ms
duration round 1 of mul method: 0.258 ms
duration round 2 of mul method: 5.563 ms
duration round 3 of mul method: 0.258 ms
duration round 1 of shift method: 0.240 ms
duration round 2 of shift method: 0.248 ms
duration round 3 of shift method: 0.240 ms
[EMAIL PROTECTED]:~$

Sincerely,

Joshua D. Drake


Qingqing Zhou wrote:


Tom Lane [EMAIL PROTECTED] writes:
 


Also, I would like to see the actual test code.  I wonder whether what
you measured is the ability of the compiler to optimize references to
successive elements of an array inside a loop; that has little or
nothing to do with the typical usage of BufferGetBlock().

   



The source code is attached.

compiled with gcc testbuf.c. I tried -O2 actually, and it turns out that
the timing is reduced a lot so not believable.
---

/*
* testbuf.c
*/
#include stdio.h
#include sys/file.h
#include sys/param.h
#include sys/stat.h
#include sys/time.h
#include unistd.h
#include fcntl.h

#define BLCKSZ  8192
#define NBuffers 8

typedef void* Block;

int main(void)
{
int  i, round, method;
Block k, start;
struct timeval start_t, stop_t;
long usecs;
Block *array = (Block *) calloc(NBuffers, sizeof(Block));

start = (Block)0xff3386;
for (i = 0; i  NBuffers; i++)
 array[i] = start + BLCKSZ*i;

for (method = 0; method  3; method ++)
{
 start = (Block)0xff3386;
 for (round = 0; round  3; round ++)
 {
  gettimeofday(start_t, NULL);
  if (method == 0)
  {
   for (i = 0; i  NBuffers; i++)
k = array[i];
  }
  if (method == 1)
  {
   for (i = 0; i  NBuffers; i++)
k = start + i*BLCKSZ;
  }
  if (method == 2)
  {
   for (i = 0; i  NBuffers; i++)
k = start + (i13);
  }
  gettimeofday(stop_t, NULL);

  if (stop_t.tv_usec  start_t.tv_usec)
  {
stop_t.tv_sec--;
stop_t.tv_usec += 100;
  }

  usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 100
+ (long) (stop_t.tv_usec - start_t.tv_usec);

  fprintf (stdout, duration round %d of %s method: %ld.%03ld ms\n,
  round + 1,
  method==0?array:method==1?mul:shift,
  (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
(stop_t.tv_usec - start_t.tv_usec) / 1000),
  (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
 }
}
}



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match
 




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-11 Thread Qingqing Zhou

Tom Lane [EMAIL PROTECTED] writes:

 This is definitely going to tell us a lot more about the compiler's loop
 strength reduction algorithms than it is going to tell about the time to
 evaluate any one of these expressions in isolation.  What's worse, the
 compiler would be quite within its rights to detect that k isn't used
 anywhere, and optimize the loop away completely.


Yes, it is possible. A safter way is to examine the assembling code. And I
found there is no such optimization in Linux/gcc3.2.

 What I would suggest is first to fill another array with some large
 number of buffer numbers (randomly chosen from 1..N) and then time loops
 of the form

 for (i = 0; i  arraysize; i++)
 {
 bn = buffernumbers[i];
 bufferlocs[i] = BufferGetBlock(bn);
 }

 for all three possible definitions of BufferGetBlock (where both arrays
 are global variables, just to be sure the compiler doesn't think it can
 discard the store).  This should give some numbers worth trusting.


I've patched the code according to your suggestion. Result is:

SunOS/gcc 3.2
verify: 1308873472 :duration round 1 of array method: 8.906 ms
verify: 1308873472 :duration round 2 of array method: 8.775 ms
verify: 1308873472 :duration round 3 of array method: 8.761 ms
verify: 1308873472 :duration round 1 of mul method: 6.454 ms
verify: 1308873472 :duration round 2 of mul method: 6.545 ms
verify: 1308873472 :duration round 3 of mul method: 6.535 ms
verify: 1308873472 :duration round 1 of shift method: 6.534 ms
verify: 1308873472 :duration round 2 of shift method: 6.597 ms
verify: 1308873472 :duration round 3 of shift method: 6.540 ms

Linux/gcc 3.2
verify: -2141277440 :duration round 1 of array method: 2.385 ms
verify: -2141277440 :duration round 2 of array method: 2.116 ms
verify: -2141277440 :duration round 3 of array method: 2.057 ms
verify: -2141277440 :duration round 1 of mul method: 0.943 ms
verify: -2141277440 :duration round 2 of mul method: 0.983 ms
verify: -2141277440 :duration round 3 of mul method: 0.840 ms
verify: -2141277440 :duration round 1 of shift method: 0.864 ms
verify: -2141277440 :duration round 2 of shift method: 0.931 ms
verify: -2141277440 :duration round 3 of shift method: 0.968 ms


The test file is attached:
---
/*
 * buftest.c
 */
#include stdio.h
#include stdlib.h
#include sys/file.h
#include sys/time.h
#include unistd.h

#define BLCKSZ  8192
#define NBuffers 8

typedef void* Block;

Block *array = NULL;
Block *reslt = NULL;
int  *bufnm = NULL;

static int grand(int min, int max)
{
return (min + (int) (max * 1.0 * rand() / (RAND_MAX + 1.0)));
}

int main(void)
{
 int  n, i, round, method;
 Block start;
 struct timeval start_t, stop_t;
 long usecs;

 srandom(getpid());

 array = (Block *) calloc(NBuffers, sizeof(Block));
 reslt = (Block *) calloc(NBuffers, sizeof(Block));
 bufnm = (int   *) calloc(NBuffers, sizeof(int));

 start = (Block)0xff3386;
 for (i = 0; i  NBuffers; i++)
  array[i] = start + BLCKSZ*i;
 for (i = 0; i  NBuffers; i++)
  bufnm[i] = grand(0, NBuffers);

 for (method = 0; method  3; method ++)
 {
  start = (Block)0xff3386;
  for (round = 0; round  3; round ++)
  {
   gettimeofday(start_t, NULL);
   if (method == 0)
   {
for (i = 0; i  NBuffers; i++)
{
 n = bufnm[i];
 reslt[i] = array[n];
}

   }
   if (method == 1)
   {
for (i = 0; i  NBuffers; i++)
{
 n = bufnm[i];
 reslt[i] = start + n*BLCKSZ;
}
   }
   if (method == 2)
   {
for (i = 0; i  NBuffers; i++)
{
 n = bufnm[i];
 reslt[i] = start + (n13);
}
   }
   gettimeofday(stop_t, NULL);

   usecs = 0;
   for (i = 0; i NBuffers; i++)
usecs += (long) reslt[i];
   fprintf(stdout, verify: %ld :, usecs );

   if (stop_t.tv_usec  start_t.tv_usec)
   {
 stop_t.tv_sec--;
 stop_t.tv_usec += 100;
   }

   usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 100
 + (long) (stop_t.tv_usec - start_t.tv_usec);

   fprintf (stdout, duration round %d of %s method: %ld.%03ld ms\n,
   round + 1,
   method==0?array:method==1?mul:shift,
   (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
 (stop_t.tv_usec - start_t.tv_usec) / 1000),
   (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
  }
 }

 free(array);
 free(reslt);
 free(bufnm);

 return 0;
}




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-11 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 I've patched the code according to your suggestion. Result is:
 [ snip ]

OK, that test seems a little more believable.  One point you didn't
consider is that on 64-bit machines, the integer bufnum really has
to be coerced to size_t to avoid overflow if the buffer array exceeds
2Gb.  (Which we don't support today, but might well by the end of
day tomorrow, seeing that there's a patch in the queue about it.)
But I ran the test case with the extra coercion on an IA64 machine at
Red Hat, and got substantially the same results as you did: the array
method is just slower.  Another consideration is that the array is
competing for L2 cache --- the test program can't really show that,
since it has no other use for L2 cache, but in the context of the real
database I suspect this is at least as much of a win as shaving a few
nanoseconds off the BufferGetBlock macro itself.

So ... patch applied, and thanks for the good idea!

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] remove BufferBlockPointers for speed and space

2005-08-10 Thread Gavin Sherry
On Thu, 11 Aug 2005, Qingqing Zhou wrote:

 It is said that the BufferBlockPointers is used to speedup the
 BufferGetBlock() macro. I compared three ways of getting block pointers.
 I.e., existing method (arrary method), calculating block pointer by adding
 base addr and offset*blockid method (mul method) and optimizing mul method
 by using bit shift (shift method). All of them calculate the block pointer
 8 times (i.e., the BufferBlockPointers array is of size 8), and each
 take 3 rounds.

 The result is:

 SunOS/gcc 3.2
 duration round 1 of array method: 4.179 ms
 duration round 2 of array method: 4.160 ms
 duration round 3 of array method: 4.143 ms
 duration round 1 of mul method: 3.311 ms
 duration round 2 of mul method: 3.233 ms
 duration round 3 of mul method: 3.233 ms
 duration round 1 of shift method: 3.554 ms
 duration round 2 of shift method: 3.235 ms
 duration round 3 of shift method: 3.233 ms

 Linux/gcc 3.2
 duration round 1 of array method: 0.422 ms
 duration round 2 of array method: 0.324 ms
 duration round 3 of array method: 0.354 ms
 duration round 1 of mul method: 0.271 ms
 duration round 2 of mul method: 0.248 ms
 duration round 3 of mul method: 0.304 ms
 duration round 1 of shift method: 0.322 ms
 duration round 2 of shift method: 0.239 ms
 duration round 3 of shift method: 0.265 ms

 We can conclude that:
 (1) mul or shift are definitely better than array method;
 (2) mul and shift are comparable;

Do you have results for more recent gcc releases?

Thanks,

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend