On Thu, Sep 26, 2013 at 01:41:01PM +0200, Fabien COELHO wrote:
> >I don't get it; why is taking the time just after pthread_create() more sane
> >than taking it just before pthread_create()?
> 
> Thread create time seems to be expensive as well, maybe up to 0.1
> seconds under some conditions (?). Under --rate, this create delay
> means that throttling is lagging behind schedule by about that time,
> so all the first transactions are trying to catch up.

threadRun() already initializes throttle_trigger with a fresh timestamp.
Please detail how the problem remains despite that.

> >typically far more expensive than pthread_create().  The patch for threaded
> >pgbench made the decision to account for pthread_create() as though it were
> >part of establishing the connection.  You're proposing to not account for it
> >all.  Both of those designs are reasonable to me, but I do not comprehend the
> >benefit you anticipate from switching from one to the other.
> >
> >>-j 800 vs -j 100 : ISTM that if you create more threads, the time delay
> >>incurred is cumulative, so the strangeness of the result should worsen.
> >
> >Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before
> >calling pthread_create().  However, thread 0 is a special case; we set its
> >start time first and actually start it last.  Your observation of cumulative
> >delay fits those facts.
> 
> Yep, that must be thread 0 which has a very large delay. I think it
> is simpler that each thread records its start time when it has
> started, without exception.
> 
> > Initializing the thread-0 start time later, just before calling
> >its threadRun(), should clear this anomaly without changing other
> >aspects of the measurement.
> 
> Always taking the thread start time when the thread is started does
> solve the issue as well, and it is homogeneous for all cases, so the
> solution I suggest seems reasonable and simple.

To exercise the timing semantics before and after your proposed change, I
added a "sleep(1);" before the pthread_create() call.  Here are the results
with and without "-j", with and without pgbench-measurements-v5.patch:

$ echo 'select 1' >test.sql

# just the sleep(1) addition
$ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
tps = 6784.410104 (including connections establishing)
tps = 7094.701854 (excluding connections establishing)
0.03user 0.07system 0:00.60elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k

$ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
tps = 1224.327010 (including connections establishing)
tps = 2274.160899 (excluding connections establishing)
0.02user 0.03system 0:03.27elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k

# w/ pgbench-measurements-v5.patch
$ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
tps = 6792.393877 (including connections establishing)
tps = 7207.142278 (excluding connections establishing)
0.08user 0.06system 0:00.60elapsed 23%CPU (0avgtext+0avgdata 0maxresident)k

$ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
tps = 1212.040409 (including connections establishing)
tps = 1214.728830 (excluding connections establishing)
0.09user 0.06system 0:03.31elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k


Rather than, as I supposed before, excluding the cost of thread start
entirely, pgbench-measurements-v5.patch has us count pthread_create() as part
of the main runtime.  I now see the cumulative delay you mentioned, but
pgbench-measurements-v5.patch does not fix it.  The problem is centered on the
fact that pgbench.c:main() calculates a single total_time and models each
thread as having run for that entire period.  If pthread_create() is slow,
reality diverges considerably from that model; some threads start notably
late, and other threads finish notably early.  The threadRun() runtime
intervals in the last test run above are actually something like this:

thread 1: 1.0s - 1.3s
thread 2: 2.0s - 2.3s
thread 3: 3.0s - 3.3s
thread 0: 3.0s - 3.3s

Current pgbench instead models every thread as having run 0.0s - 3.3s, hence
the numbers reported.  To make the numbers less surprising, we could axe the
global total_time=end_time-start_time and instead accumulate total_time on a
per-thread basis, just as we now accumulate conn_time on a per-thread basis.
That ceases charging threads for time spent not-yet-running or
already-finished, but it can add its own inaccuracy.  Performance during a
period in which some clients have yet to start is not interchangeable with
performance when all clients are running.  pthread_create() slowness would
actually make the workload seem to perform better.

An alternate strategy would be to synchronize the actual start of command
issuance across threads.  All threads would start and make their database
connections, then signal readiness.  Once the first thread notices that every
other thread is ready, it would direct them to actually start issuing queries.
This might minimize the result skew problem of the first strategy.

A third strategy is to just add a comment and write this off as one of the
several artifacts of short benchmark runs.

Opinions, other ideas?

> >While pondering this area of the code, it occurs to me --
> >shouldn't we initialize the throttle rate trigger later, after
> >establishing connections and sending startup queries?  As it
> >stands, we build up a schedule deficit during those tasks.  Was
> >that deliberate?
> 
> On the principle, I agree with you.
> 
> The connection creation time is another thing, but it depends on the
> options set. Under some options the connection is opened and closed
> for every transaction, so there is no point in excluding it from the
> measurement or the scheduling, and I want to avoid having to
> distinguish those cases.

That's fair enough.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
