> If your load is primarily big update statements, maybe so...

It is. Maybe we are anomalous here.

> I don't think I buy that claim.  We don't normally fsync the log file
> except at transaction commit (and read-only transactions don't generate
> any commit record, so they don't cause an fsync).  If a single
> transaction is generating lots of log data, it doesn't have to wait for
> that data to hit disk before it can do more stuff.

I have glanced at the code, and I agree that reads do not generate fsync() calls. That said, having watched my mirrored RAID 5 arrays hit 2,000 iops with an average request size of 4 KB during a recent batch update under Pg, I still think Pg may be fsync()-ing more often than it needs to.
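Tom's point about commit-only syncing can be shown with a toy sketch (pure illustration in Python, not Pg internals — the file, record size, and counts are all made up): the batching win comes from fsync()-ing once per "commit" instead of once per write.

```python
import os
import tempfile
import time

def write_log(n_records, sync_every_write):
    """Append n_records of 8 KB each to a scratch log file.

    If sync_every_write is True, fsync after every write (the pathological
    case); otherwise fsync only once at the end, like a commit record.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.time()
        for _ in range(n_records):
            os.write(fd, b"x" * 8192)   # one 8 KB log write
            if sync_every_write:
                os.fsync(fd)            # force to disk immediately
        os.fsync(fd)                    # the commit-time fsync
        return time.time() - start
    finally:
        os.close(fd)
        os.unlink(path)

# On real disks the per-write variant should be dramatically slower,
# since each fsync waits on a physical write.
t_commit_only = write_log(200, sync_every_write=False)
t_per_write = write_log(200, sync_every_write=True)
```

On a write-through disk the gap here is usually an order of magnitude or more; on a machine whose kernel absorbs the syncs it shrinks, which is exactly the distinction Tom draws below.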

Since I haven't digested all of the code, I am speaking a bit out of turn.

> But having said that --- on some platforms our default WAL sync method
> is open_datasync, which could result in the sort of behavior you are
> talking about.  Try experimenting with the other possible values of
> wal_sync_method to see if you like them better.

I will have to check that. I am running Sparc Solaris 8.
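For reference, the setting Tom mentions lives in postgresql.conf; a sketch of the values to benchmark (which values are actually available is platform-dependent, so whether Solaris 8 offers all of them is an assumption on my part):

```
# postgresql.conf -- candidate WAL sync methods to benchmark
# (not every value is available on every platform)
wal_sync_method = fdatasync   # alternatives: fsync, open_sync, open_datasync
```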

> That probably gets you into a situation where no I/O is really happening
> at all, it's just being absorbed by kernel disk buffers.

Few things would please me more.

> Unfortunately
> that doesn't have a lot to do with the performance you can get if you
> want to be sure you don't lose data ...

I am not sure those two goals are as mutually exclusive as they appear here.

>
> BTW, one thing you can do to reduce the WAL I/O volume in Postgres is
> to increase the inter-checkpoint interval (there are two settings to
> increase, one time-based and one volume-based).  The first write of a
> given data page after a checkpoint dumps the whole page into WAL, as a
> safety measure to deal with partial page writes during power failures.
> So right after a checkpoint the WAL volume goes way up.  With a longer
> interval between checkpoints you don't pay that price as often.

I did that and it helped tremendously. Without proper tuning, I just made the numbers pretty large:

shared_buffers = 100000
sort_mem = 131072
vacuum_mem = 65536
wal_buffers = 8192
checkpoint_segments = 32
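Of the two inter-checkpoint settings Tom mentions, checkpoint_segments above is the volume-based one; the time-based counterpart is checkpoint_timeout. A sketch of the pair together (assuming the default 16 MB WAL segment size, with the timeout in seconds):

```
checkpoint_segments = 32    # volume-based: checkpoint every 32 x 16 MB of WAL
checkpoint_timeout = 900    # time-based: at most 15 minutes between checkpoints
```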

Thanks for your feedback.

Sincerely,
Marty

Tom Lane wrote:
Marty Scholes <[EMAIL PROTECTED]> writes:

My experience with Oracle (and now limited experience with Pg) is that the major choke point in performance is not the CPU or read I/O, it is the log performance of big update and select statements.


If your load is primarily big update statements, maybe so...


Essentially, the data is written twice: first to the log and then the data files. This would be ok except the transaction is regularly frozen while the log files sync to disk with a bunch of tiny (8KB for Oracle and Pg) write requests.


I don't think I buy that claim.  We don't normally fsync the log file
except at transaction commit (and read-only transactions don't generate
any commit record, so they don't cause an fsync).  If a single
transaction is generating lots of log data, it doesn't have to wait for
that data to hit disk before it can do more stuff.

But having said that --- on some platforms our default WAL sync method
is open_datasync, which could result in the sort of behavior you are
talking about.  Try experimenting with the other possible values of
wal_sync_method to see if you like them better.


If a transaction will do large updates or inserts, why don't we just log the parsed statements in the WAL instead of the individual data blocks that have changed?


As already pointed out, this would not give enough information to
reproduce the database state.


Some informal testing suggests that we get a factor of 8 improvement in speed here if we completely disable fsync() in large updates under Pg.


That probably gets you into a situation where no I/O is really happening
at all, it's just being absorbed by kernel disk buffers.  Unfortunately
that doesn't have a lot to do with the performance you can get if you
want to be sure you don't lose data ...

BTW, one thing you can do to reduce the WAL I/O volume in Postgres is
to increase the inter-checkpoint interval (there are two settings to
increase, one time-based and one volume-based).  The first write of a
given data page after a checkpoint dumps the whole page into WAL, as a
safety measure to deal with partial page writes during power failures.
So right after a checkpoint the WAL volume goes way up.  With a longer
interval between checkpoints you don't pay that price as often.

regards, tom lane



