Re: [HACKERS] pgbench regression test failure
Robert Haas writes:
> On Tue, Nov 21, 2017 at 3:28 PM, Tom Lane wrote:
>> or just
>> ... latency limit: 33/33 (100.000 %)

> Oh, yeah.  That last one sounds good; no reason to print the same
> value more than once.

Sold; I'll go make it so.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
On Tue, Nov 21, 2017 at 3:28 PM, Tom Lane wrote:
> Seems like a good idea, but the way you've written it is inconsistent
> with the "n/m" notation used just above.  I'd suggest
>
>     ... latency limit: 33 (33/33, 100.000 %)
>
> or just
>
>     ... latency limit: 33/33 (100.000 %)

Oh, yeah.  That last one sounds good; no reason to print the same
value more than once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pgbench regression test failure
Robert Haas writes:
> On Mon, Nov 20, 2017 at 1:40 PM, Tom Lane wrote:
>> I dunno, it just looks odd to me that when we've set up a test case in
>> which every one of the transactions is guaranteed to exceed the latency
>> limit, that it doesn't say that they all did.  I don't particularly buy
>> your assumption that the percentages should sum.  Anybody else have an
>> opinion there?

> I agree with you, but I don't think either approach is free from
> possible confusion.  I think it would help to show the numerator and
> the denominator explicitly, e.g.:
>
> number of clients: 1
> number of threads: 1
> number of transactions per client: 100
> number of transactions actually processed: 33/100
> number of transactions skipped: 67 (67.000 %)
> number of transactions above the 1.0 ms latency limit: 33 (33 of 33, 100.000 %)
>
> (My proposed change is in the last line.)

Seems like a good idea, but the way you've written it is inconsistent
with the "n/m" notation used just above.  I'd suggest

    ... latency limit: 33 (33/33, 100.000 %)

or just

    ... latency limit: 33/33 (100.000 %)

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
On Mon, Nov 20, 2017 at 1:40 PM, Tom Lane wrote:
> I dunno, it just looks odd to me that when we've set up a test case in
> which every one of the transactions is guaranteed to exceed the latency
> limit, that it doesn't say that they all did.  I don't particularly buy
> your assumption that the percentages should sum.  Anybody else have an
> opinion there?

I agree with you, but I don't think either approach is free from
possible confusion.  I think it would help to show the numerator and
the denominator explicitly, e.g.:

number of clients: 1
number of threads: 1
number of transactions per client: 100
number of transactions actually processed: 33/100
number of transactions skipped: 67 (67.000 %)
number of transactions above the 1.0 ms latency limit: 33 (33 of 33, 100.000 %)

(My proposed change is in the last line.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pgbench regression test failure
Hello Tom,

>>> 2. ISTM that we should report that 100% of the transactions were
>>> above the latency limit, not 33%; that is, the appropriate base
>>> for the "number of transactions above the latency limit" percentage
>>> is the number of actual transactions not the number of scheduled
>>> transactions.

>> Hmmm. Allow me to disagree.

> I dunno, it just looks odd to me that when we've set up a test case in
> which every one of the transactions is guaranteed to exceed the latency
> limit, that it doesn't say that they all did.  I don't particularly buy
> your assumption that the percentages should sum.

This is a side effect. The reason for me is that the user asked for some
transactions, and the results should be given relative to what was asked.

> Anybody else have an opinion there?

Good question.

>>> I also noticed that if I specify "-f sleep-100.sql" more than once,
>>> the per-script TPS reports are out of line.  This is evidently because
>>> that calculation isn't excluding skipped xacts; but if we're going to
>>> define tps as excluding skipped xacts, surely we should do so there too.

>> I do not think that we should exclude skipped xacts.

> Uh ... why not?

Because I totally misinterpreted your sentence. Indeed, the skipped
transactions need to be subtracted from the count. This is yet another
bug.

>>> but I find that unduly optimistic.  It should really read more like
>>> "if no transactions were executed, at best we'll get some platform-
>>> dependent spelling of NaN.  At worst we'll get a SIGFPE."

>> Hmmm. Alas you must be right about spelling. There has been no report
>> of a SIGFPE issue, so I would not bother with that.

> The core issue here really is that you're assuming IEEE float
> arithmetic.  We have not gone as far as deciding that Postgres will
> only run on IEEE hardware, and I don't want to start in pgbench,
> especially not in seldom-exercised corner cases.

Hmmm. It has already started for some years without complaint.

Do as you feel about NaN. I can only say that I do not like much having
zero stand for undefined.
-- Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> 1. The per-script stats shouldn't be printed at all if there's
>> only one script.  They're redundant with the overall stats.

> Indeed.
> I think the output should tend to be the same for possible automatic
> processing, whether there is one script or more, even at the price of
> some redundancy.
> Obviously this is highly debatable.

I think that ship has already sailed.  It's certainly silly that the code
prints *only* per-script latency stats, and not all the per-script stats,
when there's just one script.  To me the answer is to make the latency
stats conform to the rest, not make the printout even more redundant.
None of this output was designed for machine-friendliness.

(Maybe there is an argument for a "--machine-readable-output" switch
that would dump the data in some more-machine-friendly format.  Though
I'm sure we'd have lots of debates about exactly what that is...)

>> 2. ISTM that we should report that 100% of the transactions were
>> above the latency limit, not 33%; that is, the appropriate base
>> for the "number of transactions above the latency limit" percentage
>> is the number of actual transactions not the number of scheduled
>> transactions.

> Hmmm. Allow me to disagree.

I dunno, it just looks odd to me that when we've set up a test case in
which every one of the transactions is guaranteed to exceed the latency
limit, that it doesn't say that they all did.  I don't particularly buy
your assumption that the percentages should sum.  Anybody else have an
opinion there?

>> I also noticed that if I specify "-f sleep-100.sql" more than once,
>> the per-script TPS reports are out of line.  This is evidently because
>> that calculation isn't excluding skipped xacts; but if we're going to
>> define tps as excluding skipped xacts, surely we should do so there too.

> I do not think that we should exclude skipped xacts.

Uh ... why not?

>> I'm also still exceedingly unhappy about the NaN business.
>> You have this comment in printSimpleStats:
>> 	/* print NaN if no transactions where executed */
>> but I find that unduly optimistic.  It should really read more like
>> "if no transactions were executed, at best we'll get some platform-
>> dependent spelling of NaN.  At worst we'll get a SIGFPE."

> Hmmm. Alas you must be right about spelling. There has been no report of
> a SIGFPE issue, so I would not bother with that.

The core issue here really is that you're assuming IEEE float arithmetic.
We have not gone as far as deciding that Postgres will only run on IEEE
hardware, and I don't want to start in pgbench, especially not in
seldom-exercised corner cases.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
Hello Tom,

Thanks for having a look at this bug fix.

> So we fixed the reported TPS rate, which was nowhere near reality, and
> the per-script stats are sane now.  Good so far, but this still has two
> problems IMO:
>
> 1. The per-script stats shouldn't be printed at all if there's
> only one script.  They're redundant with the overall stats.

Indeed.

I think the output should tend to be the same for possible automatic
processing, whether there is one script or more, even at the price of
some redundancy. Obviously this is highly debatable.

> 2. ISTM that we should report that 100% of the transactions were
> above the latency limit, not 33%; that is, the appropriate base
> for the "number of transactions above the latency limit" percentage
> is the number of actual transactions not the number of scheduled
> transactions.

Hmmm. Allow me to disagree. If the user asked for something, then this is
the reference to take and the whole report should be relative to that.
Also, changing this makes the report figures not add up:

before:

  number of transactions skipped: 417 (90.260 %)
  number of transactions above the 1.0 ms latency limit: 45 (9.740 %)

Both percents above are on the same reference, fine. They can be added:
100% of the transactions we required had issues.

after:

  number of transactions skipped: 457 (90.855 %)
  number of transactions above the 1.0 ms latency limit: 46 (100.000 %)

The percents do not add up anymore. I do not think that this is an
improvement.

> I also noticed that if I specify "-f sleep-100.sql" more than once,
> the per-script TPS reports are out of line.  This is evidently because
> that calculation isn't excluding skipped xacts; but if we're going to
> define tps as excluding skipped xacts, surely we should do so there too.

I do not think that we should exclude skipped xacts.

> I'm also still exceedingly unhappy about the NaN business.
> You have this comment in printSimpleStats:
> 	/* print NaN if no transactions where executed */
> but I find that unduly optimistic.  It should really read more like
> "if no transactions were executed, at best we'll get some platform-
> dependent spelling of NaN.  At worst we'll get a SIGFPE."

Hmmm. Alas you must be right about spelling. There has been no report of
a SIGFPE issue, so I would not bother with that.

> I do not think that a pedantic argument about NaN being the "correct"
> answer justifies taking such portability risks, even if I bought that
> argument, which I don't particularly.

ISTM that the portability risk, i.e. the SIGFPE issue, would require a
non IEEE 754 conforming box, and there has been no complaint yet about
that.

> It's also inconsistent with this basic decision in printResults:
>
> 	/* Remaining stats are nonsensical if we failed to execute any xacts */
> 	if (total->cnt <= 0)
> 		return;

According to praise, this decision was by Tom Lane. It is about the whole
execution not executing anything, which may be seen as a special error
case, whereas a progress report without xact execution within some period
can be perfectly normal.

> If we thought NaN was the "correct" answer for no xacts then we'd just
> bull ahead and print NaNs all over the place.

Why not, but for me the issue wrt the final report is distinct.

> I think we would be best advised to deal with this by printing zeroes
> in the progress-report case, and by skipping the output altogether in
> printSimpleStats.  (Or we could make printSimpleStats print zeroes;
> I'm indifferent to that choice.)

I disagree with printing a zero if the measure is undefined. Zero does
not mean undefined. "Zero means zero":-)

Maybe we could replace platform-specific printing of "NaN" with something
else, either a deterministic "NaN" or some other string, eg "-" or "?" or
whatever. The benefit of sticking to "NaN", or some platform-specific
NaN, is that processing the output can retrieve it as a float.

So basically I would have stayed with the current version, which seems
consistent to me, but I agree that it is debatable.

I'm not sure it is worth being "exceedingly unhappy" about it, because
these are special corner cases anyway, not likely to be seen without
somehow looking for it or having pretty strange pgbench runs.

> Attached is an updated patch addressing these points.  Comments?

I somehow disagree with your points above, for various reasons. However,
you are a committer, and I think it best that the bug is fixed even if I
happen to disagree with unrelated changes.

As an academic, I probably have a number of silly ideas about a lot of
things. This is part of the job description:-)

-- 
Fabien.
Re: [HACKERS] pgbench regression test failure
Hello Steve,

>> Here is the hopefully right version, which passes tests here.

> This version seems fine. I think it is ready for a committer.

Ok, thanks for the debug!

-- 
Fabien.
Re: [HACKERS] pgbench regression test failure
On Mon, 13 Nov 2017, Fabien COELHO wrote:

> Hello Steve,
>
>>>  	printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
>>> -		   total->cnt - total->skipped, nxacts * nclients);
>>> +		   total->cnt, nxacts * nclients);
>>
>> I think you want ntx instead of total->cnt here.
>
> Indeed... and this is also what my git branch contains... I just sent
> the wrong version, sorry:-(
>
>> The same fix is also needed in the else branch.
>
> Here is the hopefully right version, which passes tests here.

This version seems fine. I think it is ready for a committer.
Re: [HACKERS] pgbench regression test failure
Hello Steve,

>> 	printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
>> -		   total->cnt - total->skipped, nxacts * nclients);
>> +		   total->cnt, nxacts * nclients);

> I think you want ntx instead of total->cnt here.

Indeed... and this is also what my git branch contains... I just sent
the wrong version, sorry:-(

> The same fix is also needed in the else branch.

Here is the hopefully right version, which passes tests here.

-- 
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ec56a74..1a5c04c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2613,7 +2613,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 		doLog(thread, st, agg, skipped, latency, lag);
 
 	/* XXX could use a mutex here, but we choose not to */
-	if (per_script_stats)
+	if (per_script_stats || latency_limit)
 		accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
 }
 
@@ -3556,11 +3556,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	double		time_include,
 				tps_include,
 				tps_exclude;
+	int64		ntx = total->cnt - total->skipped;
 
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
-	tps_include = total->cnt / time_include;
-	tps_exclude = total->cnt / (time_include -
-					(INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+
+	/* tps is about actually executed transactions */
+	tps_include = ntx / time_include;
+	tps_exclude = ntx /
+		(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
 
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
@@ -3573,13 +3576,13 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   total->cnt - total->skipped, nxacts * nclients);
+			   ntx, nxacts * nclients);
 	}
 	else
 	{
 		printf("duration: %d s\n", duration);
 		printf("number of transactions actually processed: " INT64_FORMAT "\n",
-			   total->cnt);
+			   ntx);
 	}
 
 	/* Remaining stats are nonsensical if we failed to execute any xacts */
@@ -4706,7 +4709,8 @@ threadRun(void *arg)
 			{
 				/* generate and show report */
 				StatsData	cur;
-				int64		run = now - last_report;
+				int64		run = now - last_report,
+							ntx;
 				double		tps,
 							total_run,
 							latency,
@@ -4721,7 +4725,7 @@ threadRun(void *arg)
 				 * XXX: No locking. There is no guarantee that we get an
 				 * atomic snapshot of the transaction count and latencies, so
 				 * these figures can well be off by a small amount. The
-				 * progress is report's purpose is to give a quick overview of
+				 * progress report's purpose is to give a quick overview of
 				 * how the test is going, so that shouldn't matter too much.
 				 * (If a read from a 64-bit integer is not atomic, you might
 				 * get a "torn" read and completely bogus latencies though!)
@@ -4735,15 +4739,14 @@ threadRun(void *arg)
 					cur.skipped += thread[i].stats.skipped;
 				}
 
+				/* we count only actually executed transactions */
+				ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
 				total_run = (now - thread_start) / 1000000.0;
-				tps = 1000000.0 * (cur.cnt - last.cnt) / run;
-				latency = 0.001 * (cur.latency.sum - last.latency.sum) /
-					(cur.cnt - last.cnt);
-				sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2)
-					/ (cur.cnt - last.cnt);
+				tps = 1000000.0 * ntx / run;
+				latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
+				sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / ntx;
 				stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
-				lag = 0.001 * (cur.lag.sum - last.lag.sum) /
-					(cur.cnt - last.cnt);
+				lag = 0.001 * (cur.lag.sum - last.lag.sum) / ntx;
 
 				if (progress_timestamp)
 				{
@@ -4760,6 +4763,7 @@ threadRun(void *arg)
 							 (long) tv.tv_sec, (long) (tv.tv_usec / 1000));
 				}
 				else
+					/* round seconds are expected, but the thread may be late */
 					snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
 
 				fprintf(stderr,