Hello Tomas,

I do agree that fprintf is not cheap; when profiling pgbench it is often
the #1 item. But the impact on the measurements is quite small. For
example, with a small database (scale 10) and read-only 30-second runs
(single client), I get this:

  no logging: 18672 18792 18667 18518 18613 18547
with logging: 18170 18093 18162 18273 18307 18234

So on average, that's 18634 vs. 18206, i.e. less than 2.5% difference.
And with more expensive transactions (larger scale, writes, ...) the
difference will be much smaller.

I did some testing with a scale-10 database, prepared "SELECT only" transactions, 200-second runs (plenty of them), and local socket connections, on the largest host I have available:

  pgbench -P 10 -T 200 -S -M prepared -j $c -c $c ...

I think that this CPU-bound benchmark is close to a worst possible case for the detailed per-transaction log.

I also implemented a quick-and-dirty version of a merged log based on a shared file handle (append mode + sprintf + fputs).
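For reference, the shared-handle idea can be sketched roughly as below. This is only an illustration under my stated assumptions, not pgbench's actual code: the names (merge_log, open_merge_log, log_transaction) and the line format are made up, and I use snprintf rather than sprintf for safety. The point is that each thread formats a complete line into a private buffer and then emits it with a single fputs; POSIX stdio locks the stream internally, so whole lines from different threads are not interleaved.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch only: a single log file shared by all threads,
 * opened once in append mode.  Function and variable names are
 * hypothetical, not pgbench's. */

static FILE *merge_log;

void open_merge_log(const char *path)
{
    merge_log = fopen(path, "a");   /* append mode, shared handle */
    if (merge_log == NULL)
    {
        perror("fopen");
        exit(1);
    }
}

void log_transaction(int client_id, long latency_us, long epoch_s, long usec)
{
    char line[128];

    /* format the whole line privately, then one fputs on the shared
     * stream, so concurrent threads cannot interleave partial lines */
    snprintf(line, sizeof(line), "%d %ld %ld %ld\n",
             client_id, latency_us, epoch_s, usec);
    fputs(line, merge_log);
}
```

The one-fputs-per-line discipline is what makes the shared handle workable without explicit locking in the caller.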

The results are as follows:

 * 1 thread 33 runs median tps (average is consistent):
 - no logging:        22062
 - separate logging:  19360  (-12.2%)
 - merged logging:    19326  (-12.4%, not significantly different from separate logging)

Note that the impact of logging is much larger than with your tests.
The underlying fprintf comes from gnu libc 2.19.

The worst overhead I could trigger is with 12 threads:

 * 12 threads 35 runs median tps (average is consistent)
 - no logging:       155917
 - separate logging: 124695 (-20.0%)
 - merged logging:   119349 (-23.5%)

My conclusion from these figures is that although the direct merged-logging approach adds some overhead, that overhead is small relative to the cost of detailed logging itself: it adds 3.5% on top of a 20% logging overhead with 12 threads. Other tests, even with more threads, did not yield larger absolute or relative overheads. So while the direct merge approach does add overhead, it is a small addition to an already bad situation, which suggests that using a detailed log on a CPU-bound pgbench run is a bad idea to begin with.

For a more realistic test, I ran "simple update" transactions, which involve actual disk writes. They ran at around 840 tps with 24 threads. The logging overhead appears to be under 1%, and there is no significant difference between separate and merged logging across the 20 runs.

So my overall conclusion is:

(1) The simple thread-shared file approach would spare pgbench some heavy merge-sort post-processing code, at a reasonable cost.

(2) The feature would not be available under thread emulation with this approach, but I do not see that as a particular issue, since I think thread emulation is pretty much dead code and a maintenance burden at this point.

(3) Optimizing doLog away from its current fprintf-based implementation may be worthwhile.
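One plausible direction, sketched below under my own assumptions (the names LogBuffer, log_line, log_flush and the buffer size are illustrative, not an actual doLog patch): accumulate formatted lines in a per-thread buffer and flush them with a single fwrite when the buffer fills, so the stream lock and write path are hit once per buffer rather than once per field.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of batched logging: lines are formatted with one
 * snprintf each and appended to a per-thread buffer, which is written
 * out in bulk with fwrite().  Sizes and names are illustrative. */

#define LOG_BUF_SIZE 8192

typedef struct
{
    char   buf[LOG_BUF_SIZE];
    size_t used;
} LogBuffer;

static void log_flush(LogBuffer *lb, FILE *out)
{
    if (lb->used > 0)
    {
        fwrite(lb->buf, 1, lb->used, out);
        lb->used = 0;
    }
}

static void log_line(LogBuffer *lb, FILE *out,
                     int client_id, long latency_us)
{
    char line[128];
    int  n = snprintf(line, sizeof(line), "%d %ld\n",
                      client_id, latency_us);

    if (lb->used + (size_t) n > LOG_BUF_SIZE)
        log_flush(lb, out);     /* make room before appending */
    memcpy(lb->buf + lb->used, line, (size_t) n);
    lb->used += (size_t) n;
}
```

Whether this actually beats glibc's internal stdio buffering would of course need measuring; it mainly saves the per-call locking and format parsing of repeated fprintf calls.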

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers