Attached is an update that I think sorts out all of the documentation
concerns. I also broke this section into paragraphs now that it's getting
so long.
The only code change is that this now labels the controversial lag here
as "average rate limit schedule lag". That way if someone wants to
Greg Smith wrote:
Thanks. I didn't look at the code, but while trying to read the docs:
+ <para>
+  High rate limit schedule lag values, that is values not small with
+  respect to the actual transaction latency, indicate that something is
+  amiss in the throttling
Hello Greg,
Thanks for the improvement!
I have a small reservation about the "finish/end time schedule" in the second
paragraph, or maybe there is something that I do not understand. There is
no schedule for finishing anything, only start times are scheduled, so I
wish the text could avoid
Hello Alvaro,
Very minor update with V19 here, to reflect Alvaro's comments. The
tricky part now reads like this:
High rate limit schedule lag values, that is lag values that are large
compared to the actual transaction latency, indicate that something is
amiss in the throttling process. High schedule
On Mon, Jul 22, 2013 at 01:49:39PM -0400, Greg Smith wrote:
Very minor update with V19 here, to reflect Alvaro's comments. The
tricky part now reads like this:
High rate limit schedule lag values,
High values of the rate limit schedule lag measurement?
that is lag values that are large
Greg,
Yes, I already took a look at it briefly. The updates move in the
right direction, but I can edit them usefully before commit. I'll
have that done by tomorrow and send out a new version. I'm hopeful
that v18 will finally be the one that everyone likes.
Have you done it?
--
Tatsuo
Hello Tatsuo,
I think I'm starting to understand what's going on. Suppose there are
n transactions to be issued by pgbench, and it decides each scheduled
time d(0), d(1) ... d(n). Actually the schedule d(i) (which is stored in
st->until) is decided by the following code:
int64 wait =
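For context, here is a minimal self-contained sketch of the kind of
computation being discussed (next_wait and the surrounding scaffolding are
illustrative, not the patch's exact code): each wait is drawn from an
exponential distribution by inverse transform sampling, so the scheduled
start times d(i) form a Poisson arrival process at the target rate.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Draw one wait, exponentially distributed around throttle_delay (us). */
    static int64_t
    next_wait(double throttle_delay)
    {
        /* uniform u in (0, 1]; -log(u) has mean ~1, capped at log(1000) */
        double u = (rand() % 1000 + 1) / 1000.0;
        return (int64_t) (throttle_delay * -log(u));
    }

    int
    main(void)
    {
        double  throttle_delay = 1000.0;   /* 1 ms between starts => ~1000 tps */
        int64_t scheduled = 0;             /* analogue of st->until */

        for (int i = 0; i < 5; i++)
        {
            scheduled += next_wait(throttle_delay);
            printf("d(%d) = %lld us\n", i, (long long) scheduled);
        }
        return 0;
    }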
Hello Greg,
The lag computation was not the interesting part of this feature to me. As I
said before, I considered it more of a debugging level thing than a number
people would analyze as much as you did. I understand why you don't like it
though. If the reference time was moved forward
Fabien,
Hello again Tatsuo,
For your information, included is a patch against git master head to
implement the lag in the way I proposed. With the patch, I get more
consistent numbers on Linux (and Mac OS X).
I must disagree with your proposal: at least, it does not provide the
Hello Tatsuo,
I think the current measurement method will cause plenty of confusion if
it's not accompanied by a detailed explanation. Could you please provide
a documentation update?
Please find a v17 proposal with updated and extended documentation,
focused on clarifying the lag measure and its
On 7/18/13 6:45 PM, Tatsuo Ishii wrote:
I'm not a native English speaker either... Greg, could you please
review the document?
Yes, I already took a look at it briefly. The updates move in the
right direction, but I can edit them usefully before commit. I'll have
that done by tomorrow and
Great, thanks for your help!
--
Fabien.
To clarify what state this is all in: Fabien's latest
pgbench-throttle-v15.patch is the ready-for-committer version. The
last two revisions are just tweaking the comments at this point, and
his version is more correct than my last one.
Got it. I will take care of this.
Please find
Hello Tatsuo,
Now I have a question regarding the function.
./pgbench -p 5433 -S -T 10 -R 1 test
tps = 7133.745911 (including connections establishing)
What does average rate limit lag mean? From the manual:
[...]
So in my understanding the number shows the delay time before *each*
On 7/17/13 2:31 AM, Tatsuo Ishii wrote:
./pgbench -p 5433 -S -T 10 -R 1 test
average rate limit lag: 862.534 ms (max 2960.913 ms)
tps = 7133.745911 (including connections establishing)
tps = 7135.130810 (excluding connections establishing)
What does average rate limit lag mean?
The whole
Thanks for the detailed explanations. I now understand the function.
Good. I've looked into the documentation. I'm not sure how I could improve
it significantly without adding a lot of text which would also add a lot
of confusion to the casual reader.
I'm going to test your patches on Mac
The whole concept of lag with the rate limit is complicated.
I must agree on that point; its interpretation is subtle.
At one point I thought this should be a debugging detail, rather than
exposing the user to it. The problem is that if you do that, you might
not notice that your limit
tps = 9818.741060 (including connections establishing)
So I thought I could squeeze 1 TPS from my box.
Then I tried with -R 5000 tps.
number of transactions actually processed: 1510640
average rate limit lag: 0.304 ms (max 19.101 ms)
tps = 5035.409397 (including connections establishing)
Hello Tatsuo,
The lag is reasonable, although not too good. One transaction is
about 1.2 ms, the lag is much smaller than that, and you are at about
50% of the maximum load. I've got similar figures on my box for such
settings. It improves if you reduce the number of clients.
No, 5000 TPS
Fabien,
On 7/17/13 9:16 PM, Tatsuo Ishii wrote:
Now suppose we have 3 transactions and each has the following values:
d(0) = 10
d(1) = 20
d(2) = 30
t(0) = 100
t(1) = 110
t(2) = 120
That says pgbench expects a duration of 10 for each
transaction. Actually, the first transaction runs slowly for some
reason
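To make the arithmetic concrete, here is a tiny check of that example,
assuming d(i) are the scheduled start times, t(i) the actual start times,
and lag is measured the way the patch defines it (actual minus scheduled
start):

    #include <stdio.h>

    int
    main(void)
    {
        int d[] = {10, 20, 30};     /* scheduled start times */
        int t[] = {100, 110, 120};  /* actual start times */
        int sum = 0;

        for (int i = 0; i < 3; i++)
        {
            int lag = t[i] - d[i];  /* 90 for every transaction */
            sum += lag;
            printf("lag(%d) = %d\n", i, lag);
        }
        printf("average lag = %d\n", sum / 3);  /* 90 */
        return 0;
    }

Under that definition every transaction is 90 late, so the average lag is
also 90; under a definition relative to the previous finish time the
numbers would come out differently, which is the crux of the disagreement.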
On 7/17/13 11:34 PM, Tatsuo Ishii wrote:
My example is for the 1-client case. So concurrent clients are not the
issue here.
Sorry about that, with your clarification I see what you were trying
to explain now. The code initializes the target time like this:
thread->throttle_trigger = INSTR_TIME_GET_MICROSEC(start);
And then each time a transaction fires, it advances the reference time
forward based on the
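Here is a runnable sketch of that mechanism (the clock, the waits, and the
1.5 ms transaction time are made up for illustration): the reference time
only ever moves forward by the sampled waits, so when transactions run
longer than the schedule allows, the lag accumulates instead of silently
resetting.

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int64_t now = 0;                 /* pretend clock, microseconds */
        int64_t throttle_trigger = now;  /* thread->throttle_trigger analogue */
        int64_t waits[] = {1000, 1200, 800};

        for (int i = 0; i < 3; i++)
        {
            throttle_trigger += waits[i];   /* scheduled start of this txn */
            if (now < throttle_trigger)
                now = throttle_trigger;     /* sleep until the schedule is due */
            printf("txn %d: lag = %lld us\n", i,
                   (long long) (now - throttle_trigger));
            now += 1500;                    /* the transaction takes 1.5 ms */
        }
        return 0;
    }

With 1.5 ms transactions against ~1 ms scheduled gaps, the printed lags
grow (0, 300, 1000 us): exactly the "something is amiss in the throttling
process" case the documentation tries to describe.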
To clarify what state this is all in: Fabien's latest
pgbench-throttle-v15.patch is the ready-for-committer version. The
last two revisions are just tweaking the comments at this point, and his
version is more correct than my last one.
My little pgbench-delay-finish-v1.patch is a brand
* ISTM that the impact of the chosen 1000 should appear somewhere.
I don't have a problem with that, but I didn't see that the little table you
included was enough to do that. I think if someone knows how this type of
random generation works, they don't need the comment to analyze the
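For what it's worth, the impact of the chosen 1000 can be shown in a few
lines (my arithmetic, not the patch's comment): with u drawn from
{1/1000 ... 1}, the multiplier -log(u) is capped at log(1000) ~= 6.9, and
its mean falls slightly short of 1, so the achieved rate is biased a
little above the target.

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double mean = 0.0;

        for (int k = 1; k <= 1000; k++)
            mean += -log(k / 1000.0);
        mean /= 1000.0;

        printf("max multiplier  = %.3f\n", -log(1 / 1000.0));  /* ~6.908 */
        printf("mean multiplier = %.6f\n", mean);              /* ~0.9956 */
        return 0;
    }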
On 7/13/13 12:13 PM, Fabien COELHO wrote:
My 0.02€: if it means adding complexity to the pgbench code, I think
that it is not worth it. The point of pgbench is to look at a steady
state, not to end in the most graceful possible way as far as measures
are concerned.
That's how some people use
Hello Greg,
But we don't have to argue about that because it isn't. The attached new
patch seems to fix the latency spikes at the end, with -2 lines of new code!
Hmmm... that doesn't look like too much complexity :-)
With that resolved I did a final pass across the rate limit code too,
On 7/14/13 2:48 PM, Fabien COELHO wrote:
You attached my v13. Could you send your v14?
Correct patch (and the little one from me again) attached this time.
--
Greg Smith    2ndQuadrant US    g...@2ndquadrant.com    Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support
Hello Greg,
Correct patch (and the little one from me again) attached this time.
Please find an updated v15 with only comment changes:
* The new comment about is_throttled was inverted wrt the meaning of the
variable, at least to my understanding.
* ISTM that the impact of the chosen
On 6/30/13 2:04 AM, Fabien COELHO wrote:
My guess is the OS. PQfinish or select are system calls that
present opportunities to switch context. I think that the OS is passing
time with other processes on the same host, especially postgres
backends, when it is not with the client.
I went
Hello Greg,
There's a refactoring possible here that seems to make this whole class of
problem go away. If I change pgbench so that PQfinish isn't called for any
client until *all* of the clients are actually finished with transactions,
the whole issue goes away.
Sure. If the explanation
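A sketch of that refactoring (Client and the loop are invented
scaffolding, not pgbench's actual structures): clients are only marked
done as they finish, and the PQfinish() calls are deferred until every
client is done, so no client's measured latency absorbs another's
disconnect.

    #include <stdbool.h>
    #include <stdio.h>

    #define NCLIENTS 4

    typedef struct { bool done; /* PGconn *con; ... */ } Client;

    int
    main(void)
    {
        Client clients[NCLIENTS] = {{false}};
        int    remaining = NCLIENTS;

        while (remaining > 0)
        {
            for (int i = 0; i < NCLIENTS; i++)
            {
                if (clients[i].done)
                    continue;
                /* ... run one transaction step; once the last commit returns: */
                clients[i].done = true;     /* but do NOT PQfinish() here */
                remaining--;
            }
        }

        for (int i = 0; i < NCLIENTS; i++)
            printf("PQfinish(client %d)\n", i);  /* deferred disconnects */
        return 0;
    }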
On 06/29/2013 04:11 PM, Greg Smith wrote:
I need to catch up with revisions done to this feature since I started
instrumenting my copy more heavily. I hope I can get this ready for
commit by Monday. I've certainly beaten on the feature for long enough
now.
Greg, any progress? Haven't seen
[...] Why? I don't know exactly why, but I am sure that pgbench isn't
doing anything weird. It's either libpq acting funny, or the OS.
My guess is the OS. PQfinish or select are system calls that
present opportunities to switch context. I think that the OS is passing
time with other
On 6/22/13 12:54 PM, Fabien COELHO wrote:
After some poking around, and pursuing various red herrings, I resorted
to measuring the delay for calling PQfinish(), which is really the only
special thing going on at the end of a pgbench run...
This wasn't what I was seeing, but it's related. I've
Please find attached a v13 which fixes conflicts introduced by the long
options patch committed by
So my argued conclusion is that the issue is somewhere within PQfinish(),
possibly in interaction with what pgbench is doing, but is *NOT* related in
any way to the throttling patch, as it predates it. Greg stumbled upon it
because he looked at latencies.
An additional thought:
Yet another thought, hopefully final on this subject.
I think that the probability of a context switch is higher when calling
PQfinish than in other parts of pgbench because it contains system calls
(e.g. closing the network connection) where the kernel is likely to
Dear Robert and Greg,
I think so. If it doesn't get fixed now, it's not likely to get fixed
later. And the fact that nobody understands why it's happening is
kinda worrisome...
Possibly, but I think that it is not my fault :-)
So, I spent some time on performance debugging...
First, I
Please find attached a v12, which under timer_exceeded interrupts clients
which are being throttled instead of waiting for the end of the
transaction, as the transaction has not started yet.
I've also changed the log format that I used for debugging the apparent
latency issue:
x y z
Oops, I forgot the attachment. Here it is!
--
Fabien.
On Wed, Jun 19, 2013 at 2:42 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
number of transactions actually processed: 301921
Just a thought before spending too much time on this subtle issue.
The patch worked reasonably for 301900 transactions in your above run, and
the few last ones, less
I'm still getting the same sort of pauses waiting for input with your v11.
Alas.
number of transactions actually processed: 301921
Just a thought before spending too much time on this subtle issue.
The patch worked reasonably for 301900 transactions in your above run,
and the few last ones, less than the number of clients, show strange
latency figures which suggest
I'm still getting the same sort of pauses waiting for input with your
v11. This is a pretty frustrating problem; I've spent about two days so
far trying to narrow down how it happens. I've attached the test
program I'm using. It seems related to my picking a throttled rate
that's close to
Hello Greg,
I've done some more testing with the test patch. I have not seen any
spike at the end of the throttled run.
The attached version 11 patch does ensure that throttle-added sleeps are
not included in latency measures (-r) and that throttling is performed
right at the beginning of
On 6/12/13 3:19 AM, Fabien COELHO wrote:
If you are still worried: if you run the very same command without
throttling and measure the same latency, does the same thing happen at
the end? My guess is that it should be yes. If it is no, I'll try out
pgbench-tools.
It looks like it happens
I don't have this resolved yet, but I think I've identified the cause.
Updating here mainly so Fabien doesn't duplicate my work trying to track
this down. I'm going to keep banging at this until it's resolved now
that I got this far.
Here's a slow transaction:
1371226017.568515 client 1
pgbench already has a \sleep command, and the way that delay is
handled happens inside threadRun() instead. The pausing of the rate
limit throttle needs to operate in the same place.
It does operate in the same place. The throttling is performed by
inserting a sleep first thing when
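A minimal runnable sketch of that \sleep-style mechanism (st_sleeping,
st_until, and the polling step are invented stand-ins for the pgbench
client state): the client never blocks; it records a wake-up time and
threadRun() simply skips it until the clock passes that time.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static bool    st_sleeping = false;
    static int64_t st_until = 0;

    /* First thing before a new transaction: schedule the throttle sleep. */
    static void
    throttle_client(int64_t scheduled_start)
    {
        st_sleeping = true;
        st_until = scheduled_start;
    }

    int
    main(void)
    {
        int64_t now = 0;                /* pretend clock, microseconds */

        throttle_client(2000);
        while (st_sleeping)
        {
            if (now >= st_until)
            {
                st_sleeping = false;    /* due: the transaction can start */
                printf("txn starts at %lld us\n", (long long) now);
            }
            else
                now += 500;             /* threadRun() polls again later */
        }
        return 0;
    }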
Hello Greg,
I think that the weirdness really comes from the way transaction times
are measured, their interactions with throttling, and latent bugs in the
code.
One issue is that the throttling time was included in the measure, but not
the first time because txn_begin is not set at the
On 6/14/13 3:50 PM, Fabien COELHO wrote:
I think that the weirdness really comes from the way transaction times
are measured, their interactions with throttling, and latent bugs in the
code.
measurement times, no; interactions with throttling, no. If it was either of
those I'd have finished this off days ago. Latent bugs,
Did you look at the giant latency spikes at the end of the test run I
submitted the graph for? I wanted to nail down what was causing those
before worrying about the startup timing.
If you are still worried: if you run the very same command without
throttling and measure the same latency,
On 6/10/13 6:02 PM, Fabien COELHO wrote:
- the tps is global, with a mutex to share the global stochastic process
- there is an adaptation for the fork emulation
- I do not know whether this works with Win32 pthread stuff.
Instead of this complexity, can we just split the TPS input per
Well, the mutex impact is very localized in the code. The complexity is
Submission 10:
- per-thread throttling instead of a global process with a mutex.
this avoids a mutex, and the process is shared between clients
of a given thread.
- ISTM that the thread start time should be initialized at the
beginning of threadRun instead of in the loop *before*
On 6/11/13 4:11 PM, Fabien COELHO wrote:
- ISTM that the thread start time should be initialized at the
beginning of threadRun instead of in the loop *before* thread creation,
otherwise the thread creation delays are incorporated in the
performance measure, but ISTM that the point of pgbench is not to
measure
Hello Greg,
Thanks for this very detailed review and the suggestions!
I'll submit a new patch
Question 1: should it report the maximum lag encountered?
I haven't found the lag measurement to be very useful yet, outside of
debugging the feature itself. Accordingly I don't see a reason to
Here is submission v9 based on your v8 version.
- the tps is global, with a mutex to share the global stochastic process
- there is an adaptation for the fork emulation
- I do not know whether this works with Win32 pthread stuff.
- reduced multiplier ln(100) -> ln(1000)
- avg max
On 6/1/13 5:00 AM, Fabien COELHO wrote:
Question 1: should it report the maximum lag encountered?
I haven't found the lag measurement to be very useful yet, outside of
debugging the feature itself. Accordingly I don't see a reason to add
even more statistics about the number outside of
New submission for the next commit fest.
This new version also reports the average lag time, i.e. the delay between
scheduled and actual transaction start times. This may help detect whether
things went smoothly, or if at some point some delay was introduced because
of the load and some