In the last couple of days, I've been running a lot of DBT-2 tests and smaller microbenchmarks with different bgwriter settings and experimental patches, but I have not been able to produce a repeatable test case where any of the bgwriter configurations performs better than not having bgwriter at all.

I encountered a strange phenomenon that I don't understand. I ran a small test case with DELETEs in random order, using an index, on a ~300MB table, with shared_buffers smaller than that. I expected that to be dominated by the speed at which postgres can swap pages in and out of the shared buffer cache, but surprisingly the test starts to block on write I/O, even though the table fits completely in the OS cache. I was able to reproduce the phenomenon with a simple C program that writes 8k blocks in random order to a fixed-size file. I've attached it along with the output of running it on my test server. The output shows how the writes start to block periodically after a while. I was able to reproduce the problem on my laptop as well. Can anyone explain what's going on?

Anyone out there have a repeatable test case where bgwriter helps?

  Heikki Linnakangas

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>

int main(int argc, char **argv)
{
  int fd;
  off_t len;
  char buf[8192] = {0};
  int i;
  int size;
  struct timeval begin_t;

  if (argc != 3)
  {
    printf("Usage: writetest <filename> <size in MB>\n");
    exit(1);
  }

  fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, S_IWUSR | S_IRUSR);
  if (fd == -1)
  {
    perror("open");
    exit(1);
  }
  size = atoi(argv[2]) * 1024 * 1024;

  /* Fill the file sequentially up to the requested size. */
  for(i=0; i < size;)
  {
    ssize_t n = write(fd, buf, sizeof(buf));

    if (n < 0)
    {
      perror("write");
      exit(1);
    }
    i += n;
  }

  len = i;

  /* Overwrite 8k blocks in random order; report elapsed time per 40000 blocks. */
  gettimeofday(&begin_t, NULL);
  for(i = 0; i < 10000000; i++)
  {
    lseek(fd, ((random() % (len / sizeof(buf)))) * sizeof(buf), SEEK_SET);
    write(fd, buf, sizeof(buf));
    if(i % 40000 == 0)
    {
      struct timeval t;
      long msecs;

      gettimeofday(&t, NULL);
      msecs = (t.tv_sec - begin_t.tv_sec) * 1000 +(t.tv_usec - begin_t.tv_usec) / 1000;
      printf("%d blocks written, time=%ld ms\n", i, msecs);
      begin_t = t;
    }
  }
  return 0;
}

./writetest /mnt/data/writetest-data 80
0 blocks written, time=0 ms
40000 blocks written, time=251 ms
80000 blocks written, time=241 ms
120000 blocks written, time=241 ms
160000 blocks written, time=241 ms
200000 blocks written, time=242 ms
240000 blocks written, time=242 ms
280000 blocks written, time=241 ms
320000 blocks written, time=241 ms
360000 blocks written, time=242 ms
400000 blocks written, time=241 ms
440000 blocks written, time=241 ms
480000 blocks written, time=241 ms
520000 blocks written, time=242 ms
560000 blocks written, time=241 ms
600000 blocks written, time=241 ms
640000 blocks written, time=242 ms
680000 blocks written, time=242 ms
720000 blocks written, time=242 ms
760000 blocks written, time=241 ms
800000 blocks written, time=242 ms
840000 blocks written, time=4579 ms
880000 blocks written, time=244 ms
920000 blocks written, time=242 ms
960000 blocks written, time=4752 ms
1000000 blocks written, time=241 ms
1040000 blocks written, time=4618 ms
1080000 blocks written, time=242 ms
1120000 blocks written, time=4614 ms
1160000 blocks written, time=246 ms
1200000 blocks written, time=243 ms
1240000 blocks written, time=4619 ms
1280000 blocks written, time=242 ms
1320000 blocks written, time=242 ms
1360000 blocks written, time=4605 ms
1400000 blocks written, time=242 ms
