Re: [HACKERS] File system performance and pg_xlog

2001-05-07 Thread Trond Eivind Glomsrød

Marko Kreen [EMAIL PROTECTED] writes:

 On Sat, May 05, 2001 at 10:10:33PM -0400, mlw wrote:
  I think it is simpler problem than that. Postgres, with fsync enabled, does a
  lot of work trying to maintain data integrity. It is logical to conclude that a
  file system that does as little as possible would almost always perform better.
  Regardless of what the file system does, eventually it writes blocks of data to
  sectors on a disk.
 
 But there's more: when PostgreSQL 'uses a fs' today, it also gets
 all the caching/optimizing algorithms in the OS kernel 'for free'.
 
  Many databases use their own data volume management. I am not suggesting that
  anyone create a new file system, but after performing some tests, I am really
  starting to see why products like Oracle manage their own table spaces.
  
  If one looks at the FAT file system with an open mind and a clear understanding
  of how it will be used, some small modifications may make it the functional
  equivalent of a managed table space volume, at least under Linux.
 
 Are you talking about a new in-kernel fs?  Let's see, how many
 OSes does PostgreSQL support today?

If you're using raw devices on Linux and get a win there, it's a win
for Postgresql on Linux. This is important for everyone using it on
this platform (probably a big chunk of the users). And who uses all
the new features and performance enhancements done in other ways?

It all comes down to whether it would actually give a performance boost,
how much work it is, and whether someone wants to do it.
 

-- 
Trond Eivind Glomsrød
Red Hat, Inc.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] File system performance and pg_xlog

2001-05-07 Thread Tom Lane

[EMAIL PROTECTED] (Trond Eivind Glomsrød) writes:
 If you're using raw devices on Linux and get a win there, it's a win
 for Postgresql on Linux. ...
 It all comes down to whether it would actually give a performance boost,
 how much work it is, and whether someone wants to do it.

No, those are not the only considerations.  If the feature is not
portable then we also have to consider how much of a headache it'll be
to maintain in parallel with a more portable approach.  We might reject
such a feature even if it's a clear win for Linux, if it creates enough
problems elsewhere.  Postgres is *not* a Linux-only application, and I
trust it never will be.

regards, tom lane

PS: that's not meant to reject the idea out-of-hand; perhaps the
benefits will prove to be so large that we will want to do it
anyway.  I'm just trying to counter what appears to be a narrowly
platform-centric view of the issues.




Re: [HACKERS] File system performance and pg_xlog

2001-05-07 Thread Trond Eivind Glomsrød

Bruce Momjian [EMAIL PROTECTED] writes:

 That is a major issue for people running performance tests.  For
 example, XFS may be slow on 2.2 kernels but not 2.4 kernels.

XFS is 2.4-only, AFAIK: even the installer modifications SGI made to
Red Hat Linux 7 (which ships with a 2.2 kernel) include
installing a 2.4pre kernel, AFAIR.
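Since XFS needs a 2.4 kernel, a benchmark script could check the running kernel before trying it. This is a minimal sketch, assuming the usual MAJOR.MINOR.PATCH[-EXTRA] release string printed by uname -r; kernel_at_least is a hypothetical helper, and the optional second argument exists only so a release string can be passed in explicitly.

```shell
#!/bin/sh
# kernel_at_least MAJOR.MINOR [RELEASE] (hypothetical helper)
# Succeeds if RELEASE (default: the running kernel, from uname -r)
# is at least MAJOR.MINOR. Assumes a MAJOR.MINOR.PATCH[-EXTRA] format.
kernel_at_least() {
    awk -v want="$1" -v have="${2:-$(uname -r)}" 'BEGIN {
        split(want, w, ".")
        split(have, h, /[.-]/)
        # a different major number decides the comparison on its own
        if (h[1] + 0 != w[1] + 0)
            exit !(h[1] + 0 > w[1] + 0)
        # same major number: compare the minor number
        exit !(h[2] + 0 >= w[2] + 0)
    }'
}
```

For example, kernel_at_least 2.4 2.2.16-2 fails, while kernel_at_least 2.4 with no second argument checks the machine it runs on.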

-- 
Trond Eivind Glomsrød
Red Hat, Inc.




[HACKERS] File system performance and pg_xlog

2001-05-05 Thread mlw

A small debate started over bad performance on ReiserFS, and I pondered the
likely advantages of raw device access. It also occurred to me that the FAT
file system is about as close to a managed raw device as one could get. So I
did some tests.

The hardware:

A PII system running Linux 7.0 with kernel 2.2.16-2.
256 MB RAM.
IDE home hard disk.
Adaptec 2740 with two SCSI drives:
  a 9 GB Seagate ST19171W as /dev/sda1, mounted as /sda1
  a 4 GB Seagate ST15150W as /dev/sdb1, mounted as /sdb1
/sda1 has an ext2 file system and holds the base directory via a symlink.
/sdb1 has either an ext2 or a FAT file system and holds pg_xlog via a symlink.
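The symlink arrangement described above (base on /sda1, pg_xlog on /sdb1) can be set up roughly as follows. This is a minimal sketch, not the exact commands used for the test: relocate_xlog is a hypothetical helper, and it assumes the postmaster is stopped and that DATADIR/pg_xlog is still an ordinary directory.

```shell
#!/bin/sh
# relocate_xlog DATADIR TARGETDIR (hypothetical helper)
# Moves DATADIR/pg_xlog onto another file system and leaves a symlink behind.
# Assumes the postmaster is stopped. Refuses to run if pg_xlog is already a
# symlink or if TARGETDIR/pg_xlog already exists.
relocate_xlog() {
    if [ ! -d "$1/pg_xlog" ] || [ -L "$1/pg_xlog" ] || [ -e "$2/pg_xlog" ]; then
        return 1
    fi
    # across file systems, mv copies the directory and then removes the original
    mv "$1/pg_xlog" "$2/pg_xlog" || return 1
    ln -s "$2/pg_xlog" "$1/pg_xlog"
}
```

Usage would be along the lines of relocate_xlog /path/to/data /sdb1, where the data directory path is only an example.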


In a clean Postgres environment, I initialized pgbench as:
./pgbench -i -s 10 -d pgbench

I used this script (test.sh) to produce the results:

#!/bin/sh
# force a checkpoint before each run, then run 32 transactions per client
# with 1 through 8 concurrent clients
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 1"
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 2"
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 3"
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 4"
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 5"
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 6"
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 7"
psql -U mohawk pgbench -c "checkpoint"
su mohawk -c "./pgbench -d pgbench -t 32 -c 8"

(My postgres user is mohawk)
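The unrolled script can also be written as a loop. This is a sketch, not the script actually used: run_series and the PGBENCH_USER and RUN variables are hypothetical. Setting RUN=echo prints each command instead of executing it, which is a cheap way to check the sequence before a real run.

```shell
#!/bin/sh
# Loop form of the checkpoint + pgbench sequence (hypothetical sketch).
# PGBENCH_USER defaults to the postgres user from the original test.
# RUN is empty for a real run; RUN=echo only prints the commands.
PGBENCH_USER=${PGBENCH_USER:-mohawk}

run_series() {
    for clients in 1 2 3 4 5 6 7 8; do
        # force a checkpoint so each run starts from a clean state
        $RUN psql -U "$PGBENCH_USER" pgbench -c "checkpoint"
        # 32 transactions per client, with an increasing number of clients
        $RUN su "$PGBENCH_USER" -c "./pgbench -d pgbench -t 32 -c $clients"
    done
}
```

A real run would then be run_series > ext2.log, and RUN=echo run_series shows the sixteen commands it would execute.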

I had to modify xlog.c to use rename() instead of link(), since FAT does not
support hard links. I also had to explicitly set ownership of the FAT file
system to the postgres user at mount time.
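For context on the xlog.c change: installing a file with link() followed by unlink() fails if the target name already exists, whereas rename() silently replaces an existing target, and on FAT link() cannot succeed at all. The sketch below illustrates that trade-off with the shell equivalents (ln and mv). It is not the xlog.c code, and install_file is a hypothetical helper.

```shell
#!/bin/sh
# install_file SRC DST (hypothetical helper)
# Prefer link-then-unlink, which never overwrites an existing DST.
# If the hard link fails and DST does not exist, assume the file system has
# no hard links (as with FAT) and fall back to mv, a rename on the same
# file system. The existence check and the mv are not atomic; sketch only.
install_file() {
    if ln "$1" "$2" 2>/dev/null; then
        rm -f "$1"
    elif [ ! -e "$2" ]; then
        mv "$1" "$2"
    else
        return 1    # DST already exists: refuse to clobber it
    fi
}
```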

I ran the script twice:

./test.sh > ext2.log

(Then I rebuilt a fresh database and reformatted sdb1 as FAT.)
./test.sh > fat.log
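The reformat-and-mount step might look like the following. mkfs.vfat (from dosfstools) and the vfat uid= and gid= mount options are real; the helper name prepare_fat_xlog, its arguments, and giving every file to the postgres user's own uid/gid are assumptions. FAT stores no Unix owner, so ownership has to come from the mount options. Setting RUN=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# prepare_fat_xlog DEVICE MOUNTPOINT OWNER (hypothetical helper)
# Formats DEVICE as FAT and mounts it so that every file appears to be
# owned by OWNER. RUN=echo prints the commands instead of running them.
prepare_fat_xlog() {
    uid=$(id -u "$3") || return 1
    gid=$(id -g "$3") || return 1
    $RUN mkfs.vfat "$1" || return 1
    $RUN mount -t vfat -o "uid=$uid,gid=$gid" "$1" "$2"
}
```

For the test setup described here, the call would be prepare_fat_xlog /dev/sdb1 /sdb1 mohawk, run as root.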

Here is a diff of the two runs:

--- ext2.log Sat May  5 12:58:07 2001
+++ fat.log Sat May  5 12:58:07 2001
@@ -5,8 +5,8 @@
 number of clients: 1
 number of transactions per client: 32
 number of transactions actually processed: 32/32
-tps = 18.697006(including connections establishing)
-tps = 19.193225(excluding connections establishing)
+tps = 37.439512(including connections establishing)
+tps = 39.710461(excluding connections establishing)
 CHECKPOINT
 pghost: (null) pgport: (null) nclients: 2 nxacts: 32 dbName: pgbench
 transaction type: TPC-B (sort of)
@@ -14,8 +14,8 @@
 number of clients: 2
 number of transactions per client: 32
 number of transactions actually processed: 64/64
-tps = 32.444226(including connections establishing)
-tps = 33.499452(excluding connections establishing)
+tps = 44.782177(including connections establishing)
+tps = 46.799328(excluding connections establishing)
 CHECKPOINT
 pghost: (null) pgport: (null) nclients: 3 nxacts: 32 dbName: pgbench
 transaction type: TPC-B (sort of)
@@ -23,8 +23,8 @@
 number of clients: 3
 number of transactions per client: 32
 number of transactions actually processed: 96/96
-tps = 43.042861(including connections establishing)
-tps = 44.816086(excluding connections establishing)
+tps = 55.416117(including connections establishing)
+tps = 58.057013(excluding connections establishing)
 CHECKPOINT
 pghost: (null) pgport: (null) nclients: 4 nxacts: 32 dbName: pgbench
 transaction type: TPC-B (sort of)
@@ -32,8 +32,8 @@
 number of clients: 4
 number of transactions per client: 32
 number of transactions actually processed: 128/128
-tps = 46.033959(including connections establishing)
-tps = 47.681683(excluding connections establishing)
+tps = 61.752368(including connections establishing)
+tps = 64.796970(excluding connections establishing)
 CHECKPOINT
 pghost: (null) pgport: (null) nclients: 5 nxacts: 32 dbName: pgbench
 transaction type: TPC-B (sort of)
@@ -41,8 +41,8 @@
 number of clients: 5
 number of transactions per client: 32
 number of transactions actually processed: 160/160
-tps = 49.980258(including connections establishing)
-tps = 51.874653(excluding connections establishing)
+tps = 63.124090(including connections establishing)
+tps = 67.225563(excluding connections establishing)
 CHECKPOINT
 pghost: (null) pgport: (null) nclients: 6 nxacts: 32 dbName: pgbench
 transaction type: TPC-B (sort of)
@@ -50,8 +50,8 @@
 number of clients: 6
 number of transactions per client: 32
 number of transactions actually processed: 192/192
-tps = 51.800192(including connections establishing)
-tps = 53.752739(excluding connections establishing)
+tps = 65.452545(including connections establishing)
+tps = 68.741933(excluding connections establishing)
 CHECKPOINT
 pghost: (null) pgport: (null) nclients: 7 nxacts: 32 dbName: pgbench
 transaction type: TPC-B (sort of)
@@ -59,8 +59,8 @@
 number of clients: 7
 number of transactions per client: 32
 number of transactions actually processed: 224/224
-tps = 52.652660(including connections establishing)
-tps = 54.616802(excluding connections establishing)
+tps = 66.525419(including connections establishing)
+tps = 69.727409(excluding connections establishing)
 CHECKPOINT
 pghost: (null) pgport: (null) nclients: 

Re: [HACKERS] File system performance and pg_xlog

2001-05-05 Thread Marko Kreen

On Sat, May 05, 2001 at 01:09:38PM -0400, mlw wrote:
 A small debate started with bad performance on ReiserFS. I pondered the likely
 advantages to raw device access. It also occurred to me that the FAT file system
 is about as close to a managed raw device as one could get. So I did some
 tests:

 /sdb1 is either an ext2 or FAT file system used as pg_xlog with a symlink.

One little thought: does mounting ext2 with 'noatime' make any
difference?  AFAIK FAT does not have the concept of atime, so wouldn't
that be a fairer comparison?  Just a thought.
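To try this, the ext2 log partition can be remounted with mount -o remount,noatime /sdb1 and the active options confirmed from /proc/mounts. A small sketch of that check; has_noatime is a hypothetical helper, and its optional second argument exists only so it can be pointed at a file other than /proc/mounts.

```shell
#!/bin/sh
# has_noatime MOUNTPOINT [MOUNTS_FILE] (hypothetical helper)
# Succeeds if MOUNTPOINT is listed in MOUNTS_FILE (default /proc/mounts)
# with the noatime option. Fields follow the /proc/mounts layout:
# device, mount point, type, options, dump, pass.
has_noatime() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```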

-- 
marko





Re: [HACKERS] File system performance and pg_xlog

2001-05-05 Thread mlw

Marko Kreen wrote:
 
 On Sat, May 05, 2001 at 01:09:38PM -0400, mlw wrote:
  A small debate started with bad performance on ReiserFS. I pondered the likely
  advantages to raw device access. It also occurred to me that the FAT file system
  is about as close to a managed raw device as one could get. So I did some
  tests:
 
  /sdb1 is either an ext2 or FAT file system used as pg_xlog with a symlink.
 
 One little thought: does mounting ext2 with 'noatime' make any
 difference?  AFAIK FAT does not have the concept of atime, so wouldn't
 that be a fairer comparison?  Just a thought.
 
 --
 marko

I don't know, and I haven't tried that, but I suspect that it won't make much
difference. 

While I do not think anyone would seriously consider using FAT for xlog in a
production environment (I'd have trouble considering it myself), the
numbers do say something about the nature of WAL. A bunch of files, all the
same size, is practically what FAT does best. Plus there is no real overhead.
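To make the "all the same size" point concrete: in PostgreSQL 7.1, WAL segment files are 16 MB each and are zero-filled when created, so the file system mostly sees fixed-size files written sequentially. The sketch below only imitates that access pattern with dd; make_segment is a hypothetical helper, and the 8 KB write size is an assumption chosen to match the PostgreSQL block size.

```shell
#!/bin/sh
# make_segment PATH [SIZE_BYTES] (hypothetical helper)
# Creates a zero-filled file of a fixed size (default 16 MB),
# written sequentially in 8 KB blocks.
make_segment() {
    size=${2:-16777216}
    if [ $((size % 8192)) -ne 0 ]; then
        return 1    # keep the sketch simple: whole 8 KB blocks only
    fi
    dd if=/dev/zero of="$1" bs=8192 count=$((size / 8192)) 2>/dev/null
}
```

Timing a few calls such as make_segment /sdb1/testseg on each candidate file system gives a rough feel for how it handles this pattern.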

The very reasons FAT is a POS file system are the same reasons it would
work great for WAL, with the only caveats being that fsync must be implemented
and the application (postgres) must maintain its own data integrity.

Oddly enough, I did not see any performance improvement using FAT for the
base directory. That may be due to the PostgreSQL block size vs. the FAT
cluster size, fragmentation, and so on. If I get some time I will investigate
it a bit more.

Clearly not everyone would be interested in this. PG seems to be used for
everything from a small personal db, to a system component db (like on a web
box), to a full-blown stand-alone server. The first two may not be
interested in this sort of thing, but the last category, the full-blown server,
would certainly want to squeeze as much out of the system as possible.

I think a pgfs could easily be a derivative of FAT, or even FAT with some
ioctls. It is simple, it is fast, and it does not attempt to do things postgres
doesn't need.




Re: [HACKERS] File system performance and pg_xlog

2001-05-05 Thread mlw

Marko Kreen wrote:
 
 On Sat, May 05, 2001 at 06:43:51PM -0400, mlw wrote:
  Marko Kreen wrote:
   On Sat, May 05, 2001 at 01:09:38PM -0400, mlw wrote:
A small debate started with bad performance on ReiserFS. I pondered the likely
advantages to raw device access. It also occurred to me that the FAT file system
is about as close to a managed raw device as one could get. So I did some
tests:
 
  I think a pgfs could easily be a derivative of FAT, or even FAT with some
  ioctls. It is simple, it is fast, and it does not attempt to do things postgres
  doesn't need.
 
 Well, my opinion too is that it is a waste of resources to try to
 implement a PostgreSQL-specific filesystem.  Since you already showed
 that there are noticeable differences between filesystems,
 the Right Thing would be to make a FAQ/web-page/knowledge-base
 of comments on different filesystems from the point of view of a DB
 (PostgreSQL) server.
 
 Also users will have different priorities:
 reliability/speed-of-reads/speed-of-writes (I mean different
 users rank them differently), so it should say that
 this fs is good for this but bad for that, etc.  It is good
 to put this part of the db on this fs but not that part...
 Suggestions on mount flags to use...

I think it is simpler problem than that. Postgres, with fsync enabled, does a
lot of work trying to maintain data integrity. It is logical to conclude that a
file system that does as little as possible would almost always perform better.
Regardless of what the file system does, eventually it writes blocks of data to
sectors on a disk.

Many databases use their own data volume management. I am not suggesting that
anyone create a new file system, but after performing some tests, I am really
starting to see why products like Oracle manage their own table spaces.

If one looks at the FAT file system with an open mind and a clear understanding
of how it will be used, some small modifications may make it the functional
equivalent of a managed table space volume, at least under Linux.

Some of the benchmark numbers are hovering around 20% improvement! That's
nothing to sneeze at. I have a database loader that does a select nextval(..),
followed by a begin, a series of inserts, and a commit.
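The loader pattern described here commits once per set, and with fsync enabled each COMMIT has to wait for the WAL to reach disk; that is why the file system under pg_xlog shows up directly in sets per second. The sketch below generates one such set as SQL that could be piped to psql. The table items, its columns, the sequence name, and using currval() to reuse the fetched id are all assumptions, not details of the actual loader.

```shell
#!/bin/sh
# emit_set SEQUENCE VALUE... (hypothetical helper)
# Prints one loader "set": fetch an id, then insert each VALUE inside a
# single transaction. Table "items" and its columns are made-up names.
emit_set() {
    sequence=$1
    shift
    echo "SELECT nextval('$sequence');"
    echo "BEGIN;"
    for v in "$@"; do
        # double any single quotes so the value is a valid SQL string literal
        esc=$(printf '%s' "$v" | sed "s/'/''/g")
        printf "INSERT INTO items (set_id, val) VALUES (currval('%s'), '%s');\n" "$sequence" "$esc"
    done
    echo "COMMIT;"
}
```

For example, emit_set loader_seq alpha beta | psql pgbench would load one set of two rows, paying for one WAL flush at COMMIT.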

With xlog on a FAT file system, I can get 53-60 sets per second. With xlog
sitting on ext2, I can get 40-45 sets per second (with the same data). These are
not insignificant improvements and should be examined, if not from a Postgres
development perspective, then at least from a deployment perspective.
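For reference, the relative gains can be computed directly from the tps figures (including connection establishment) in the pgbench diff earlier in the thread. Worked by hand, they come to roughly +100% for 1 client, +38% for 2, +29% for 3, +34% for 4, and about +26% for 5 to 7 clients; the loader midpoints (42.5 vs. 56.5 sets per second) work out to roughly +33%. A small awk-based sketch for doing the arithmetic; pct_gain is a hypothetical helper.

```shell
#!/bin/sh
# pct_gain OLD NEW (hypothetical helper)
# Prints the relative change from OLD to NEW as a percentage,
# rounded to one decimal place.
pct_gain() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}
```

For example, pct_gain 32.444226 44.782177 prints 38.0 for the two-client run.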

 
 There already exist bazillion filesystems, _some_ of them should
 be usable for PostgreSQL too :)

I agree.


-- 
I'm not offering myself as an example; every life evolves by its own laws.

http://www.mohawksoft.com
