Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> I find it a little annoying that there is no change in tests; it means
>> that the format is not checked at all :-(
>
> Yeah. Perhaps it's a little bit hard to perform this kind of test in the
> TAP tests?

Not really. I'll look into it.

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>>> Ok. Fine with me. Possibly at some point there was the idea that there
>>> could be other failures counted, but there are none. Also, there have
>>> been questions about the failures-detailed option, or whether the
>>> reports should always be detailed, and the result may be some kind of
>>> not convincing compromise.
>>
>> Attached is the patch to remove the column.
>
> Patch applies cleanly. Compilation ok. Global and local "make check" ok.
> Doc build ok.

Thank you for reviewing. Patch pushed.

> I find it a little annoying that there is no change in tests; it means
> that the format is not checked at all :-(

Yeah. Perhaps it's a little bit hard to perform this kind of test in the TAP tests?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> Ok. Fine with me. Possibly at some point there was the idea that there
>> could be other failures counted, but there are none. Also, there have
>> been questions about the failures-detailed option, or whether the
>> reports should always be detailed, and the result may be some kind of
>> not convincing compromise.
>
> Attached is the patch to remove the column.

Patch applies cleanly. Compilation ok. Global and local "make check" ok. Doc build ok.

I find it a little annoying that there is no change in tests; it means that the format is not checked at all :-(

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> One thing that this doesn't fix is that the existing text appears
>> to suggest that the "failures" column is something different from
>> the sum of the serialization_failures and deadlock_failures
>> columns, which it's obvious from the code is not so. If this isn't
>> a code bug then I think we ought to just drop that column entirely,
>> because it's redundant.
>
> Ok. Fine with me. Possibly at some point there was the idea that there
> could be other failures counted, but there are none. Also, there have
> been questions about the failures-detailed option, or whether the
> reports should always be detailed, and the result may be some kind of
> not convincing compromise.

Attached is the patch to remove the column.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 387a836287..dbae4e2321 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2591,15 +2591,6 @@ END;
 - - failures - - - number of transactions that ended with a failed SQL command - - - - serialization_failures
@@ -2629,8 +2620,8 @@ END;
 pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test
-1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0
-1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e63cea56a1..f8bcb1ab6d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4498,7 +4498,6 @@ doLog(TState *thread, CState *st,
 	int64		skipped = 0;
 	int64		serialization_failures = 0;
 	int64		deadlock_failures = 0;
-	int64		serialization_or_deadlock_failures = 0;
 	int64		retried = 0;
 	int64		retries = 0;
@@ -4540,9 +4539,7 @@ doLog(TState *thread, CState *st,
 		serialization_failures = agg->serialization_failures;
 		deadlock_failures = agg->deadlock_failures;
 	}
-	serialization_or_deadlock_failures = serialization_failures + deadlock_failures;
-	fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
-			serialization_or_deadlock_failures,
+	fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
 			serialization_failures,
 			deadlock_failures);
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

Fabien COELHO writes:
> While looking at the html output, the "pgbench" command line just below wraps strangely:
>    pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
>    --latency-limit=10 --failures-detailed --max-tries=10 test
> ISTM that there should be no nl in the pgbench … section, although maybe
> it would trigger a complaint in the pdf format.

PDF wraps that text where it wants to anyway, so I removed the newline.

regards, tom lane
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

Hello Tom,

> The buildfarm is still complaining about the synopsis being too wide
> for PDF format. I think what we ought to do is give up on using a
> <synopsis> for log lines at all, and instead convert the documentation
> into a tabular list of fields. Proposal attached, which also fixes a
> couple of outright errors.

Looks ok. Html doc generation ok.

While looking at the html output, the "pgbench" command line just below wraps strangely:

pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
--latency-limit=10 --failures-detailed --max-tries=10 test

ISTM that there should be no nl in the pgbench … section, although maybe it would trigger a complaint in the pdf format.

> One thing that this doesn't fix is that the existing text appears
> to suggest that the "failures" column is something different from
> the sum of the serialization_failures and deadlock_failures
> columns, which it's obvious from the code is not so. If this isn't
> a code bug then I think we ought to just drop that column entirely,
> because it's redundant.

Ok. Fine with me. Possibly at some point there was the idea that there could be other failures counted, but there are none. Also, there have been questions about the failures-detailed option, or whether the reports should always be detailed, and the result may be some kind of not convincing compromise.

> (BTW, now that I've read this stuff I am quite horrified by how
> the non-aggregated log format has been mangled for error retries,
> and will probably be submitting a proposal to change that.
> But that's a different issue.)

Indeed, any improvement is welcome!

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

> The buildfarm is still complaining about the synopsis being too
> wide for PDF format. I think what we ought to do is give up on
> using a <synopsis> for log lines at all, and instead convert the
> documentation into a tabular list of fields. Proposal attached,
> which also fixes a couple of outright errors.

I once thought about that too. Looks good to me.

> One thing that this doesn't fix is that the existing text appears
> to suggest that the "failures" column is something different from
> the sum of the serialization_failures and deadlock_failures
> columns, which it's obvious from the code is not so. If this isn't
> a code bug then I think we ought to just drop that column entirely,
> because it's redundant.

+1.

> (BTW, now that I've read this stuff I am quite horrified by how
> the non-aggregated log format has been mangled for error retries,
> and will probably be submitting a proposal to change that.
> But that's a different issue.)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

Tatsuo Ishii writes:
> Patch pushed.

Thanks. The buildfarm is still complaining about the synopsis being too wide for PDF format. I think what we ought to do is give up on using a <synopsis> for log lines at all, and instead convert the documentation into a tabular list of fields. Proposal attached, which also fixes a couple of outright errors.

One thing that this doesn't fix is that the existing text appears to suggest that the "failures" column is something different from the sum of the serialization_failures and deadlock_failures columns, which it's obvious from the code is not so. If this isn't a code bug then I think we ought to just drop that column entirely, because it's redundant.

(BTW, now that I've read this stuff I am quite horrified by how the non-aggregated log format has been mangled for error retries, and will probably be submitting a proposal to change that. But that's a different issue.)

regards, tom lane

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9ba26e5e86..d12cbaa8ab 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2289,33 +2289,95 @@ END;
 - The format of the log is: - - -client_id transaction_no time script_no time_epoch time_us schedule_lag retries - - - where - client_id indicates which client session ran the transaction, - transaction_no counts how many transactions have been - run by that session, - time is the total elapsed transaction time in microseconds, - script_no identifies which script file was used (useful when - multiple scripts were specified with -f or -b), - and time_epoch/time_us are a - Unix-epoch time stamp and an offset - in microseconds (suitable for creating an ISO 8601 - time stamp with fractional seconds) showing when - the transaction completed. - The schedule_lag field is the difference between the - transaction's scheduled start time, and the time it actually started, in - microseconds. It is only present when the --rate option is used.
+ Each line in a log file describes one transaction. + It contains the following space-separated fields: + + + + client_id + + + identifies the client session that ran the transaction + + + + + + transaction_no + + + counts how many transactions have been run by that session + + + + + + time + + + transaction's elapsed time, in microseconds + + + + + + script_no + + + identifies the script file that was used for the transaction + (useful when multiple scripts are specified + with -f or -b) + + + + + + time_epoch + + + transaction's completion time, as a Unix-epoch time stamp + + + + + + time_us + + + fractional-second part of transaction's completion time, in + microseconds + + + + + + schedule_lag + + + difference between the transaction's scheduled start time and the + time it actually started, in microseconds + (present only if --rate is specified) + + + + + + retries + + + count of retries after serialization or deadlock errors during the + transaction + (present only if --max-tries is not equal to one) + + + + + + + When both --rate and --latency-limit are used, the time for a skipped transaction will be reported as skipped. - retries is the sum of all retries after the - serialization or deadlock errors during the current script execution. It is - present only if the --max-tries option is not equal to 1. If the transaction ends with a failure, its time will be reported as failed. 
If you use the --failures-detailed option, the @@ -2398,66 +2460,171 @@ END; With the --aggregate-interval option, a different - format is used for the log files: - - -interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency -sum_lag sum_lag_2 min_lag max_lag skipped -retried retries failures serialization_failures deadlock_failures - - - where - interval_start is the start of the interval (as a Unix - epoch time stamp), - num_transactions is the number of transactions - within the interval, - sum_latency is the sum of the transaction - latencies within the interval, - sum_latency_2 is the sum of squares of the - transaction latencies within the interval, - min_latency is the minimum latency within the interval, - and - max_latency is the maximum latency within the interval, - failures is the number of transactions that ended - with a failed SQL command within the interval. - - - The next fields, - sum_lag, sum_lag_2, min_lag, - and max_lag, only meaningful if the --rate - option is used. Otherwise, they are all 0.0. - They
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> I would suggest to reorder the last chunk to:
>>
>>    ... retried retries failures serfail dlfail
>>
>> because I intend to add connection failures handling at some point,
>> and it would make more sense to add the corresponding count at the end
>> with other fails.
>
> Ok, I have adjusted the patch. V2 patch attached.

Patch pushed. Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

Hi Fabien,

> Hello Tatsuo-san,
>
>> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
>> sum_lag sum_lag_2 min_lag max_lag skipped
>> failures serialization_failures deadlock_failures retried retries
>
> I would suggest to reorder the last chunk to:
>
>    ... retried retries failures serfail dlfail
>
> because I intend to add connection failures handling at some point,
> and it would make more sense to add the corresponding count at the end
> with other fails.

Ok, I have adjusted the patch. V2 patch attached.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..d1818ff316 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
 format is used for the log files:
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries
+interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
+sum_lag sum_lag_2 min_lag max_lag skipped
+retried retries failures serialization_failures deadlock_failures
 where
@@ -2417,41 +2419,55 @@ END;
 and max_latency is the maximum latency within the interval, failures is the number of transactions that ended - with a failed SQL command within the interval. If you use option - --failures-detailed, instead of the sum of all failed - transactions you will get more detailed statistics for the failed - transactions grouped by the following types: - serialization_failures is the number of - transactions that got a serialization error and were not retried after this, - deadlock_failures is the number of transactions - that got a deadlock error and were not retried after this. + with a failed SQL command within the interval.
+ + The next fields, sum_lag, sum_lag_2, min_lag, - and max_lag, are only present if the --rate - option is used. + and max_lag, only meaningful if the --rate + option is used. Otherwise, they are all 0.0. They provide statistics about the time each transaction had to wait for the previous one to finish, i.e., the difference between each transaction's scheduled start time and the time it actually started. The next field, skipped, - is only present if the --latency-limit option is used, too. + is only meaningful if the --latency-limit option is used, too. Otherwise it is 0. It counts the number of transactions skipped because they would have started too late. - The retried and retries - fields are present only if the --max-tries option is not - equal to 1. They report the number of retried transactions and the sum of all - retries after serialization or deadlock errors within the interval. - Each transaction is counted in the interval when it was committed. + + + The retried + and retries fields are only meaningful if + the --max-tries option is not equal to 1. Otherwise they + are 0. They report the number of retried transactions and the sum of all + retries after serialization or deadlock errors within the interval. Each + transaction is counted in the interval when it was committed. + + + failures is the sum of all failed transactions. + If --failures-detailed is specified, instead of the sum of + all failed transactions you will get more detailed statistics for the + failed transactions grouped by the following types: + serialization_failures is the number of + transactions that got a serialization error and were not retried after this, + deadlock_failures is the number of transactions + that got a deadlock error and were not retried after this. + If --failures-detailed is not + specified, serialization_failures + and deadlock_failures are always 0. 
- Here is some example output: + Here is some example output with following options: -1345828501 5601 1542744 483552416 61 2573 0 -1345828503 7884 1979812 565806736 60 1479 0 -1345828505 7208 1979422 567277552 59 1391 0 -1345828507 7685 1980268 569784714 60 1398 0 -1345828509 7073 1979779 573489941 236 1411 0 - +pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 +--latency-limit=10 --failures-detailed --max-tries=10 test + + + +1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0 +1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0 + + Notice that while the plain (unaggregated) log file shows which script diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index acf3e56413..4d4b979e4f 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -4494,6 +4494,17 @@ doLog(TState *thread,
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

Hello Tatsuo-san,

> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
> sum_lag sum_lag_2 min_lag max_lag skipped
> failures serialization_failures deadlock_failures retried retries

I would suggest to reorder the last chunk to:

   ... retried retries failures serfail dlfail

because I intend to add connection failures handling at some point, and it would make more sense to add the corresponding count at the end with the other fails.

--
Fabien Coelho - CRI, MINES ParisTech
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> I think it's easier to just say "if feature X is not enabled, then
>> columns XYZ are always zeroes".
>
> Ok, I will come up with a patch in this direction.

Please find attached a patch for this. With the patch, the log line is as follows (actually with no line foldings, of course):

interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
sum_lag sum_lag_2 min_lag max_lag skipped
failures serialization_failures deadlock_failures retried retries

I updated the doc as well:
- fold the log line using line feeds to avoid an error in rendering the PDF. I did not use because it does not enhance the HTML output.
- split the explanation of the log output into multiple paragraphs to enhance readability.
- replace the example output with one where the full options are specified.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..e1f98ae228 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
 format is used for the log files:
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries
+interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
+sum_lag sum_lag_2 min_lag max_lag skipped
+failures serialization_failures deadlock_failures retried retries
 where
@@ -2417,41 +2419,55 @@ END;
 and max_latency is the maximum latency within the interval, failures is the number of transactions that ended - with a failed SQL command within the interval.
If you use option - --failures-detailed, instead of the sum of all failed - transactions you will get more detailed statistics for the failed - transactions grouped by the following types: - serialization_failures is the number of - transactions that got a serialization error and were not retried after this, - deadlock_failures is the number of transactions - that got a deadlock error and were not retried after this. + with a failed SQL command within the interval. + + The next fields, sum_lag, sum_lag_2, min_lag, - and max_lag, are only present if the --rate - option is used. + and max_lag, only meaningful if the --rate + option is used. Otherwise, they are all 0.0. They provide statistics about the time each transaction had to wait for the previous one to finish, i.e., the difference between each transaction's scheduled start time and the time it actually started. The next field, skipped, - is only present if the --latency-limit option is used, too. + is only meaningful if the --latency-limit option is used, too. Otherwise it is 0. It counts the number of transactions skipped because they would have started too late. - The retried and retries - fields are present only if the --max-tries option is not - equal to 1. They report the number of retried transactions and the sum of all - retries after serialization or deadlock errors within the interval. - Each transaction is counted in the interval when it was committed. + + + failures is the sum of all failed transactions. + If --failures-detailed is specified, instead of the sum of + all failed transactions you will get more detailed statistics for the + failed transactions grouped by the following types: + serialization_failures is the number of + transactions that got a serialization error and were not retried after this, + deadlock_failures is the number of transactions + that got a deadlock error and were not retried after this. 
+ If --failures-detailed is not + specified, serialization_failures + and deadlock_failures are always 0. + + + The retried + and retries fields are only meaningful if + the --max-tries option is not equal to 1. Otherwise they + are 0. They report the number of retried transactions and the sum of all + retries after serialization or deadlock errors within the interval. Each + transaction is counted in the interval when it was committed. - Here is some example output: + Here is some example output with following options: -1345828501 5601 1542744 483552416 61 2573 0 -1345828503 7884 1979812 565806736 60 1479 0 -1345828505 7208 1979422 567277552 59 1391 0 -1345828507 7685 1980268 569784714 60 1398 0 -1345828509 7073 1979779 573489941 236 1411 0 - +pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 +--latency-limit=10 --failures-detailed --max-tries=10 test + + + +1649033235 5398 26186948 170797073304 1051 15347 2471626 6290343560 0 8261 0 3925 3925 0 7524 29534 +1649033245 5651 27345519 210270761637 1011 67780 2480555 6835066067 0 496 3839 3839 0 7533 30118 + + Notice that while the plain (unaggregated) log file shows
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Alvaro Herrera writes: > I think it's easier to just say "if feature X is not enabled, then > columns XYZ are always zeroes". +1, that's pretty much what I was thinking. regards, tom lane
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> My 0.02€:
>>
>> I agree that it would be better to have a more deterministic aggregated log
>> format.
>>
>> ISTM that it should skip failures and lags if no fancy options have been
>> selected, i.e.:
>>
>> [ fails ... retries [ sum_lag ... [ skipped ] ] ?
>
> I think it's easier to just say "if feature X is not enabled, then
> columns XYZ are always zeroes".

Ok, I will come up with a patch in this direction.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

On 2022-Apr-03, Fabien COELHO wrote:

>> What about this? (a log line is not actually folded)
>> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
>> failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
>
> My 0.02€:
>
> I agree that it would be better to have a more deterministic aggregated log
> format.
>
> ISTM that it should skip failures and lags if no fancy options have been
> selected, i.e.:
>
> [ fails ... retries [ sum_lag ... [ skipped ] ] ?

I think it's easier to just say "if feature X is not enabled, then columns XYZ are always zeroes".

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>>> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag,
>>> skipped, retried retries?
>>
>> What about this? (a log line is not actually folded)
>> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
>> failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
>
> My 0.02€:
>
> I agree that it would be better to have a more deterministic
> aggregated log format.
>
> ISTM that it should skip failures and lags if no fancy options have
> been selected, i.e.:
>
> [ fails ... retries [ sum_lag ... [ skipped ] ] ?
>
> Alternatively, as the failure stuff is added to the format, maybe it
> could be at the end:
>
> [ sum_lag ... [ skipped [ fails ... retries ] ] ] ?

I like this one.

interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
[ sum_lag sum_lag_2 min_lag max_lag [ skipped [ failures serialization_failures deadlock_failures retried retries ] ] ]

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag,
>> skipped, retried retries?
>
> What about this? (a log line is not actually folded)
>
> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
> failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]

My 0.02€:

I agree that it would be better to have a more deterministic aggregated log format.

ISTM that it should skip failures and lags if no fancy options have been selected, i.e.:

[ fails ... retries [ sum_lag ... [ skipped ] ] ?

Alternatively, as the failure stuff is added to the format, maybe it could be at the end:

[ sum_lag ... [ skipped [ fails ... retries ] ] ] ?

> failures:
>   always 0 (if --max-tries is 1, the default)
>   sum of serialization_failures and deadlock_failures (if --max-tries is not 1)
>
> serialization_failures and deadlock_failures:
>   always 0 (if --max-tries is 1, the default)
>   0 or more (if --max-tries is not 1)
>
> retried and retries:
>   always 0 (if --max-tries is 1, the default)
>   0 or more (if --max-tries is not 1)
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

>> I think the problem is not merely one of documentation, but one of
>> bad design. Up to now it was possible to tell what was what from
>> counting the number of columns in the output; but with this design,
>> that is impossible. That should be fixed. The first thing you have
>> got to do is drop the alternation { failures | serialization_failures
>> deadlock_failures }. That doesn't make any sense in the first place:
>> counting serialization and deadlock failures doesn't make it impossible
>> for other errors to occur. It'd probably make the most sense to have
>> three columns always, serialization, deadlock and total.
>
> +1.
>
>> Now maybe
>> that change alone is sufficient, but I'm not convinced, because the
>> multiple options at the end of the line mean we will never again be
>> able to add any more columns without reintroducing ambiguity. I
>> would be happier if the syntax diagram were such that columns could
>> only be dropped from right to left.
>
> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag,
> skipped, retried retries?

What about this? (a log line is not actually folded)

interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]

failures:
  always 0 (if --max-tries is 1, the default)
  sum of serialization_failures and deadlock_failures (if --max-tries is not 1)

serialization_failures and deadlock_failures:
  always 0 (if --max-tries is 1, the default)
  0 or more (if --max-tries is not 1)

retried and retries:
  always 0 (if --max-tries is 1, the default)
  0 or more (if --max-tries is not 1)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag,
> skipped, retried retries?
>
> Anyway, now that the current CF is closing, it will not be possible to
> change the logging design soon. Or can we change the logging design even
> after the CF is closed?

My 0.02€: I'm not sure how the official guidelines are to be interpreted in that case, but if the design is to be changed, ISTM that it is better to do it before a release instead of letting the release out with one format and changing it in the next release?

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

> Alvaro Herrera writes:
>>> After:
>>> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
>>> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
>
>> I think that the explanatory paragraph is way too long now, particularly
>> since it explains --failures-detailed starting in the middle. Also, the
>> example output doesn't include the failures-detailed mode.
>
> I think the problem is not merely one of documentation, but one of
> bad design. Up to now it was possible to tell what was what from
> counting the number of columns in the output; but with this design,
> that is impossible. That should be fixed. The first thing you have
> got to do is drop the alternation { failures | serialization_failures
> deadlock_failures }. That doesn't make any sense in the first place:
> counting serialization and deadlock failures doesn't make it impossible
> for other errors to occur. It'd probably make the most sense to have
> three columns always, serialization, deadlock and total.

+1.

> Now maybe
> that change alone is sufficient, but I'm not convinced, because the
> multiple options at the end of the line mean we will never again be
> able to add any more columns without reintroducing ambiguity. I
> would be happier if the syntax diagram were such that columns could
> only be dropped from right to left.

Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, skipped, retried retries?

Anyway, now that the current CF is closing, it will not be possible to change the logging design soon. Or can we change the logging design even after the CF is closed?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

> Hello,
>
> On 2022-Mar-27, Tatsuo Ishii wrote:
>
>> After:
>> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
>> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
>
> You're showing an indentation, but looking at the HTML output there is
> no such. Is the HTML processor eating leading whitespace or something
> like that?

I just copied from my web browser screen (Firefox 98.0.2 on Ubuntu 20).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Alvaro Herrera writes: >> After: >> interval_start num_transactions sum_latency sum_latency_2 min_latency >> max_latency >> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 >> min_lag max_lag [ skipped ] ] [ retried retries ] > I think that the explanatory paragraph is way too long now, particularly > since it explains --failures-detailed starting in the middle. Also, the > example output doesn't include the failures-detailed mode. I think the problem is not merely one of documentation, but one of bad design. Up to now it was possible to tell what was what from counting the number of columns in the output; but with this design, that is impossible. That should be fixed. The first thing you have got to do is drop the alternation { failures | serialization_failures deadlock_failures }. That doesn't make any sense in the first place: counting serialization and deadlock failures doesn't make it impossible for other errors to occur. It'd probably make the most sense to have three columns always, serialization, deadlock and total. Now maybe that change alone is sufficient, but I'm not convinced, because the multiple options at the end of the line mean we will never again be able to add any more columns without reintroducing ambiguity. I would be happier if the syntax diagram were such that columns could only be dropped from right to left. regards, tom lane
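Tom Lane's point about disambiguating by column count can be made concrete. The sketch below is illustrative only (not pgbench code): it assumes the stricter scheme he proposes, with the three failure columns always present and the optional trailing groups droppable only from right to left, so the column count alone determines which fields a line contains.

```python
# Leading columns that would always be present under the proposed scheme.
FIXED = ["interval_start", "num_transactions", "sum_latency",
         "sum_latency_2", "min_latency", "max_latency",
         "serialization_failures", "deadlock_failures", "failures"]

# Optional trailing groups; under the proposal they may only be
# dropped from right to left, never from the middle.
OPTIONAL_GROUPS = [
    ["sum_lag", "sum_lag_2", "min_lag", "max_lag"],
    ["skipped"],
    ["retried", "retries"],
]

def parse_line(line):
    """Map one aggregated-log line to named fields by column count alone."""
    cols = line.split()
    names = list(FIXED)
    for group in OPTIONAL_GROUPS:
        if len(cols) > len(names):
            names.extend(group)
    if len(cols) != len(names):
        raise ValueError("unexpected column count: %d" % len(cols))
    return dict(zip(names, (int(c) for c in cols)))
```

With this rule a 15-column line (retried/retries present but skipped missing) is simply invalid, which is exactly the ambiguity the right-to-left restriction removes.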
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello, On 2022-Mar-27, Tatsuo Ishii wrote: > After: > interval_start num_transactions sum_latency sum_latency_2 min_latency > max_latency > { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 > min_lag max_lag [ skipped ] ] [ retried retries ] You're showing an indentation, but looking at the HTML output there is no such indentation. Is the HTML processor eating leading whitespace or something like that? I think that the explanatory paragraph is way too long now, particularly since it explains --failures-detailed starting in the middle. Also, the example output doesn't include the failures-detailed mode. I suggest that this should be broken down even more; first to explain the output without failures-detailed, including an example, and then the output with failures-detailed, and an example of that. Something like this, perhaps: Aggregated Logging With the --aggregate-interval option, a different format is used for the log files (note that the actual log line is not folded). interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency failures [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ] where interval_start is the start of the interval (as a Unix epoch time stamp), num_transactions is the number of transactions within the interval, sum_latency is the sum of the transaction latencies within the interval, sum_latency_2 is the sum of squares of the transaction latencies within the interval, min_latency is the minimum latency within the interval, and max_latency is the maximum latency within the interval. failures is the number of transactions that ended with a failed SQL command within the interval. The next fields, sum_lag, sum_lag_2, min_lag, and max_lag, are only present if the --rate option is used. They provide statistics about the time each transaction had to wait for the previous one to finish, i.e., the difference between each transaction's scheduled start time and the time it actually started.
The next field, skipped, is only present if the --latency-limit option is used, too. It counts the number of transactions skipped because they would have started too late. The retried and retries fields are present only if the --max-tries option is not equal to 1. They report the number of retried transactions and the sum of all retries after serialization or deadlock errors within the interval. Each transaction is counted in the interval when it was committed. Notice that while the plain (unaggregated) log file shows which script was used for each transaction, the aggregated log does not. Therefore if you need per-script data, you need to aggregate the data on your own. Here is some example output: 1345828501 5601 1542744 483552416 61 2573 0 1345828503 7884 1979812 565806736 60 1479 0 1345828505 7208 1979422 567277552 59 1391 0 1345828507 7685 1980268 569784714 60 1398 0 1345828509 7073 1979779 573489941 236 1411 0 If you use option --failures-detailed, instead of the sum of all failed transactions you will get more detailed statistics for the failed transactions: interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency serialization_failures deadlock_failures [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ] This is similar to the above, but here the single 'failures' figure is replaced by serialization_failures which is the number of transactions that got a serialization error and were not retried after this, deadlock_failures which is the number of transactions that got a deadlock error and were not retried after this. The other fields are as above. Here is some example output: [example with detailed failures] -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "If you have nothing to say, maybe you need just the right tool to help you not say it." (New York Times, about Microsoft PowerPoint)
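The sum-of-latencies and sum-of-squares columns exist so that per-interval statistics can be derived without storing every individual sample. A small sketch (illustrative, not pgbench code) using the first example line above:

```python
import math

def interval_stats(num_transactions, sum_latency, sum_latency_2):
    """Derive the mean and population standard deviation of latency
    for one interval from the aggregate columns alone."""
    mean = sum_latency / num_transactions
    # E[X^2] - (E[X])^2; clamp at 0 to guard against rounding noise.
    variance = sum_latency_2 / num_transactions - mean * mean
    return mean, math.sqrt(max(variance, 0.0))

# First example line above: 1345828501 5601 1542744 483552416 61 2573 0
mean, stddev = interval_stats(5601, 1542744, 483552416)
```

Running this on that line gives a mean latency of about 275 and a standard deviation of about 102 (in whatever unit the log records latency in).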
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>>> > Even applying this patch, "make postgres-A4.pdf" raises the warning on my >>> > machine. After some investigation, I found that the previous document had a >>> > break >>> > after 'num_transactions', but it has been removed due to this commit. >>> >>> Yes, your patch removed "". >>> >>> > So, >>> > I would like to put this back as it was. I attached the patch. >>> >>> This produces errors. Needs ";" postfix? >> >> Oops. Yes, it needs ';'. Also, I found another "" dropped. >> I attached the fixed patch. > > The basic problem with this patch is that it may solve the issue with PDF > generation but does not solve the issue with HTML generation. The > PDF manual of pgbench has a ridiculously long line, which Tom Lane I meant "HTML manual" here. > complained about too: > > interval_start num_transactions sum_latency sum_latency_2 min_latency > max_latency { failures | serialization_failures deadlock_failures } [ > sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ] > > Why can't we use just line feeds instead of ? Although it's not > a command usage, the SELECT manual already uses line feeds to > nicely break into multiple lines of command usage. > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> > Even applying this patch, "make postgres-A4.pdf" raises the warning on my >> > machine. After some investigation, I found that the previous document had a >> > break >> > after 'num_transactions', but it has been removed due to this commit. >> >> Yes, your patch removed "". >> >> > So, >> > I would like to put this back as it was. I attached the patch. >> >> This produces errors. Needs ";" postfix? > > Oops. Yes, it needs ';'. Also, I found another "" dropped. > I attached the fixed patch. The basic problem with this patch is that it may solve the issue with PDF generation but does not solve the issue with HTML generation. The PDF manual of pgbench has a ridiculously long line, which Tom Lane complained about too: interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ] Why can't we use just line feeds instead of ? Although it's not a command usage, the SELECT manual already uses line feeds to nicely break into multiple lines of command usage. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Mon, 28 Mar 2022 12:17:13 +0900 (JST) Tatsuo Ishii wrote: > > Even applying this patch, "make postgres-A4.pdf" arises the warning on my > > machine. After some investigations, I found that previous document had a > > break > > after 'num_transactions', but it has been removed due to this commit. > > Yes, your patch removed "". > > > So, > > I would like to get back this as it was. I attached the patch. > > This produces errors. Needs ";" postfix? Oops. Yes, it needs ';'. Also, I found another "" dropped. I attached the fixed patch. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ebdb4b3f46..b16a5b9b7b 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -2401,7 +2401,7 @@ END; format is used for the log files: -interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries +interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries where
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
> Even applying this patch, "make postgres-A4.pdf" raises the warning on my > machine. After some investigation, I found that the previous document had a break > after 'num_transactions', but it has been removed due to this commit. Yes, your patch removed "". > So, > I would like to put this back as it was. I attached the patch. This produces errors. Needs ";" postfix? ref/pgbench.sgml:2404: parser error : EntityRef: expecting ';' le>interval_start num_transactions ^ ref/pgbench.sgml:2781: parser error : chunk is not well balanced ^ reference.sgml:251: parser error : Failure to process entity pgbench ^ reference.sgml:251: parser error : Entity 'pgbench' not defined ^ reference.sgml:296: parser error : chunk is not well balanced ^ postgres.sgml:240: parser error : Failure to process entity reference ^ postgres.sgml:240: parser error : Entity 'reference' not defined ^ make: *** [Makefile:135: html-stamp] Error 1 Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Sun, 27 Mar 2022 15:28:41 +0900 (JST) Tatsuo Ishii wrote: > > This patch has caused the PDF documentation to fail to build cleanly: > > > > [WARN] FOUserAgent - The contents of fo:block line 1 exceed the available > > area in the inline-progression direction by more than 50 points. (See > > position 125066:375) > > > > It's complaining about this: > > > > > > interval_start > > num_transactions > > sum_latency > > sum_latency_2 > > min_latency > > max_latency { > > failures | > > serialization_failures > > deadlock_failures } > > sum_lag sum_lag_2 > > min_lag max_lag > > skipped > > retried > > retries > > > > > > which runs much too wide in HTML format too, even though that toolchain > > doesn't tell you so. > > Yeah. > > > We could silence the warning by inserting an arbitrary line break or two, > > or refactoring the syntax description into multiple parts. Either way > > seems to create a risk of confusion. > > I think we can fold the line nicely. Here is the rendered image. > > Before: > interval_start num_transactions sum_latency sum_latency_2 min_latency > max_latency { failures | serialization_failures deadlock_failures } [ sum_lag > sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ] > > After: > interval_start num_transactions sum_latency sum_latency_2 min_latency > max_latency > { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 > min_lag max_lag [ skipped ] ] [ retried retries ] > > Note that before it was like this: > > interval_start num_transactions sum_latency sum_latency_2 min_latency > max_latency [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] > > So newly added items are "{ failures | serialization_failures > deadlock_failures }" and " [ retried retries ]". > > > TBH, I think the *real* problem is that the complexity of this log format > > has blown past "out of hand". Can't we simplify it? Who is really going > > to use all these numbers? 
I pity the poor sucker who tries to write a > > log analysis tool that will handle all the variants. > > Well, the extra logging items above only appear when the retry feature > is enabled. For those who do not use the feature, the only new logging > item is "failures". For those who use the feature, the extra logging > items are apparently necessary. For example, if we write an application > using repeatable read or serializable transaction isolation mode, > retrying failed transactions due to serialization errors is an essential > technique. Also the retry rate of transactions will deeply affect the > performance, and in such use cases the newly added items will be > precious information. I would suggest leaving the log items as they are. > > Patch attached. Even applying this patch, "make postgres-A4.pdf" raises the warning on my machine. After some investigation, I found that the previous document had a break after 'num_transactions', but it has been removed due to this commit. So, I would like to put this back as it was. I attached the patch. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ebdb4b3f46..4437d5ef53 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -2401,7 +2401,7 @@ END; format is used for the log files: -interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries +interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries where
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
> This patch has caused the PDF documentation to fail to build cleanly: > > [WARN] FOUserAgent - The contents of fo:block line 1 exceed the available > area in the inline-progression direction by more than 50 points. (See > position 125066:375) > > It's complaining about this: > > > interval_start > num_transactions > sum_latency > sum_latency_2 > min_latency max_latency > { failures | > serialization_failures > deadlock_failures } > sum_lag sum_lag_2 > min_lag max_lag > skipped > retried > retries > > > which runs much too wide in HTML format too, even though that toolchain > doesn't tell you so. Yeah. > We could silence the warning by inserting an arbitrary line break or two, > or refactoring the syntax description into multiple parts. Either way > seems to create a risk of confusion. I think we can fold the line nicely. Here is the rendered image. Before: interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ] After: interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ] Note that before it was like this: interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] So newly added items are "{ failures | serialization_failures deadlock_failures }" and " [ retried retries ]". > TBH, I think the *real* problem is that the complexity of this log format > has blown past "out of hand". Can't we simplify it? Who is really going > to use all these numbers? I pity the poor sucker who tries to write a > log analysis tool that will handle all the variants. Well, the extra logging items above only appear when the retry feature is enabled. 
For those who do not use the feature, the only new logging item is "failures". For those who use the feature, the extra logging items are apparently necessary. For example, if we write an application using repeatable read or serializable transaction isolation mode, retrying failed transactions due to serialization errors is an essential technique. Also the retry rate of transactions will deeply affect the performance, and in such use cases the newly added items will be precious information. I would suggest leaving the log items as they are. Patch attached. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ebdb4b3f46..b65b813ebe 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -2398,10 +2398,11 @@ END; With the --aggregate-interval option, a different - format is used for the log files: + format is used for the log files (note that the actual log line is not folded). -interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries + interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency + { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skippedretried retries where
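The retry technique being discussed, rerunning the whole transaction when it fails with a serialization error until some maximum number of tries is reached, can be sketched as follows. This is only an illustration of the idea behind --max-tries, not pgbench's implementation; SerializationFailure and the stub transaction are stand-ins for a real database error and workload.

```python
class SerializationFailure(Exception):
    """Stand-in for a database serialization error (SQLSTATE 40001)."""

def run_with_retries(txn, max_tries=10):
    # Rerun the whole transaction on a serialization failure,
    # giving up after max_tries attempts.
    retries = 0
    for _ in range(max_tries):
        try:
            return txn(), retries
        except SerializationFailure:
            retries += 1
    raise SerializationFailure("permanent failure after %d tries" % max_tries)

# Stub transaction: fails twice with a serialization error, then commits.
attempts = {"n": 0}
def txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

result, retries = run_with_retries(txn, max_tries=10)
```

Here the transaction commits on the third attempt, so it counts as one retried transaction with two retries, which is exactly the distinction the per-interval retried/retries columns report.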
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Tatsuo Ishii writes: > Thanks. Patch pushed. This patch has caused the PDF documentation to fail to build cleanly: [WARN] FOUserAgent - The contents of fo:block line 1 exceed the available area in the inline-progression direction by more than 50 points. (See position 125066:375) It's complaining about this: interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } sum_lag sum_lag_2 min_lag max_lag skipped retried retries which runs much too wide in HTML format too, even though that toolchain doesn't tell you so. We could silence the warning by inserting an arbitrary line break or two, or refactoring the syntax description into multiple parts. Either way seems to create a risk of confusion. TBH, I think the *real* problem is that the complexity of this log format has blown past "out of hand". Can't we simplify it? Who is really going to use all these numbers? I pity the poor sucker who tries to write a log analysis tool that will handle all the variants. regards, tom lane
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> Oops. Thanks. New patch attached. Test has passed on my machine. > > This patch works for me. I think it is ok to use \N instead of \gN. Thanks. Patch pushed. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
> I reproduced the failure on another machine with perl 5.8.8, > and I can confirm that this patch fixes it. Thank you for the test. I have pushed the patch. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Fri, 25 Mar 2022 09:14:00 +0900 (JST) Tatsuo Ishii wrote: > > Note that the \\g2 just above also needs to be changed. > > Oops. Thanks. New patch attached. Test has passed on my machine. This patch works for me. I think it is ok to use \N instead of \gN. Regards, Yugo Nagata -- Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Tatsuo Ishii writes: > Oops. Thanks. New patch attached. Test has passed on my machine. I reproduced the failure on another machine with perl 5.8.8, and I can confirm that this patch fixes it. regards, tom lane
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
> Note that the \\g2 just above also needs to be changed. Oops. Thanks. New patch attached. Test has passed on my machine. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index 60cae1e843..ca71f968dc 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -1222,9 +1222,9 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read"; # delta variable in the next try my $err_pattern = "(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*" - . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; " + . "client \\2 got an error in command 3 \\(SQL\\) of script 0; " . "ERROR: could not serialize access due to concurrent update\\b.*" - . "\\g1"; + . "\\1"; $node->pgbench( "-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Tatsuo Ishii writes: > I don't see a reason to use "\gN" either. Actually after applying > attached patch, my machine is still happy with pgbench test. Note that the \\g2 just above also needs to be changed. regards, tom lane
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> My machine (Ubuntu 20) did not complain either. Maybe a perl version >> difference? Anyway, the fix pushed. Let's see how prairiedog feels. > > Still not happy. After some digging in man pages, I believe the > problem is that its old version of Perl does not understand "\gN" > backreferences. Is there a good reason to be using that rather > than the traditional "\N" backref notation? I don't see a reason to use "\gN" either. Actually, after applying the attached patch, my machine is still happy with the pgbench test. Yugo? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index 60cae1e843..22a23489e8 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -1224,7 +1224,7 @@ my $err_pattern = "(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*" . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; " . "ERROR: could not serialize access due to concurrent update\\b.*" - . "\\g1"; + . "\\1"; $node->pgbench( "-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Tatsuo Ishii writes: >> My hoary animal prairiedog doesn't like this [1]: > My machine (Ubuntu 20) did not complain either. Maybe perl version > difference? Any way, the fix pushed. Let's see how prairiedog feels. Still not happy. After some digging in man pages, I believe the problem is that its old version of Perl does not understand "\gN" backreferences. Is there a good reason to be using that rather than the traditional "\N" backref notation? regards, tom lane
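The effect of the backreferences (checking that the same UPDATE, with the same random delta, is sent again by the same client after the serialization error) can be illustrated outside Perl. The sketch below uses Python's re module, which, like older Perls, supports only the traditional \N notation in patterns; the log text is fabricated for the example.

```python
import re

# Fabricated fragment of pgbench --verbose-errors output: the same
# client retries the identical UPDATE after the serialization error.
log = (
    "client 0 sending UPDATE xy SET y = y + 42\n"
    "client 0 got an error in command 3 (SQL) of script 0; "
    "ERROR: could not serialize access due to concurrent update\n"
    "client 0 sending UPDATE xy SET y = y + 42\n"
)

# \1 must repeat the whole captured UPDATE (same delta), and \2 the
# client number -- the same checks the TAP test's pattern performs.
pattern = (r"(client (0|1) sending UPDATE xy SET y = y \+ -?\d+\b).*"
           r"client \2 got an error in command 3 \(SQL\) of script 0; "
           r"ERROR: could not serialize access due to concurrent update\b.*"
           r"\1")
m = re.search(pattern, log, re.S)
```

If the retried UPDATE carried a different delta, \1 would fail to match and the search would return None, which is how the test detects a broken retry.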
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> My hoary animal prairiedog doesn't like this [1]: >> >> # Failed test 'concurrent update with retrying stderr /(?s-xim:client >> (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not >> serialize access due to concurrent update\\b.*\\g1)/' >> # at t/001_pgbench_with_server.pl line 1229. >> # 'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 >> nclients: 2 nxacts: 1 dbName: postgres >> ... >> # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR: >> could not serialize access due to concurrent update >> ... >> # ' >> # doesn't match '(?s-xim:client (0|1) got an error in command 3 >> \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent >> update\\b.*\\g1)' >> # Looks like you failed 1 test of 425. >> >> I'm not sure what the "\\b.*\\g1" part of this regex is meant to >> accomplish, but it seems to be assuming more than it should >> about the output format of TAP messages. > > I have edited the test code from the original patch by mistake, but > I did not realize it because the test somehow works on my machine without any > errors. > > I attached a patch to fix the test as it was in the original patch, where > backreferences are used to check that the same query is retried. My machine (Ubuntu 20) did not complain either. Maybe a perl version difference? Anyway, the fix pushed. Let's see how prairiedog feels. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Wed, 23 Mar 2022 14:26:54 -0400 Tom Lane wrote: > Tatsuo Ishii writes: > > The patch Pushed. Thank you! > > My hoary animal prairiedog doesn't like this [1]: > > # Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) > got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize > access due to concurrent update\\b.*\\g1)/' > # at t/001_pgbench_with_server.pl line 1229. > # 'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: > 2 nxacts: 1 dbName: postgres > ... > # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR: > could not serialize access due to concurrent update > ... > # ' > # doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) > of script 0; ERROR: could not serialize access due to concurrent > update\\b.*\\g1)' > # Looks like you failed 1 test of 425. > > I'm not sure what the "\\b.*\\g1" part of this regex is meant to > accomplish, but it seems to be assuming more than it should > about the output format of TAP messages. I have edited the test code from the original patch by mistake, but I did not realize it because the test somehow works on my machine without any errors. I attached a patch to fix the test as it was in the original patch, where backreferences are used to check that the same query is retried. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index d173ceae7a..3eb5905e5a 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -1222,7 +1222,8 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read"; # Check that we have a serialization error and the same random value of the # delta variable in the next try my $err_pattern = -"client (0|1) got an error in command 3 \\(SQL\\) of script 0; " + "(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*" + . 
"client \\g2 got an error in command 3 \\(SQL\\) of script 0; " . "ERROR: could not serialize access due to concurrent update\\b.*" . "\\g1";
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Tatsuo Ishii writes: > The patch Pushed. Thank you! My hoary animal prairiedog doesn't like this [1]: # Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)/' # at t/001_pgbench_with_server.pl line 1229. # 'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 2 nxacts: 1 dbName: postgres ... # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR: could not serialize access due to concurrent update ... # ' # doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)' # Looks like you failed 1 test of 425. I'm not sure what the "\\b.*\\g1" part of this regex is meant to accomplish, but it seems to be assuming more than it should about the output format of TAP messages. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2022-03-23%2013%3A21%3A44
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> I attached the updated patch. I also fixed the following paragraph which I >> had >> forgotten to fix in the previous patch. >> >> The first seven lines report some of the most important parameter settings. >> The sixth line reports the maximum number of tries for transactions with >> serialization or deadlock errors > > Thank you for the updated patch. I think the patches look good and now > they're ready for commit. If there's no objection, I would like to > commit/push the patches. The patch pushed. Thank you! Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> I've checked other places referring to , and found >> that "xreflabel"s are used in such tags. So, I'll fix it >> in this style. > > I attached the updated patch. I also fixed the following paragraph which I had > forgotten to fix in the previous patch. > > The first seven lines report some of the most important parameter settings. > The sixth line reports the maximum number of tries for transactions with > serialization or deadlock errors Thank you for the updated patch. I think the patches look good and now they're ready for commit. If there's no objection, I would like to commit/push the patches. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Tue, 22 Mar 2022 09:08:15 +0900 Yugo NAGATA wrote: > Hi Ishii-san, > > On Sun, 20 Mar 2022 09:52:06 +0900 (JST) > Tatsuo Ishii wrote: > > > Hi Yugo, > > > > I have looked into the patch and I noticed that > linkend=... endterm=...> is used in pgbench.sgml. e.g. > > > > > > > > AFAIK this is the only place where "endterm" is used. In other places > > "link" tag is used instead: > > Thank you for pointing it out. > > I've checked other places referring to , and found > that "xreflabel"s are used in such tags. So, I'll fix it > in this style. I attached the updated patch. I also fixed the following paragraph which I had forgotten to fix in the previous patch. The first seven lines report some of the most important parameter settings. The sixth line reports the maximum number of tries for transactions with serialization or deadlock errors Regards, Yugo Nagata -- Yugo NAGATA >From b9f993e81836c4379478809da0e690023c319038 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Wed, 26 May 2021 16:58:36 +0900 Subject: [PATCH v18 1/2] Pgbench errors: use the Variables structure for client variables This is most important when it is used to reset client variables during the repeating of transactions after serialization/deadlock failures. Don't allocate Variable structs one by one. Instead, add a constant margin each time it overflows. --- src/bin/pgbench/pgbench.c | 163 +++--- 1 file changed, 100 insertions(+), 63 deletions(-) diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 000ffc4a5c..ab2c5dfc5f 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -289,6 +289,12 @@ const char *progname; volatile bool timer_exceeded = false; /* flag from signal handler */ +/* + * We don't want to allocate variables one by one; for efficiency, add a + * constant margin each time it overflows. + */ +#define VARIABLES_ALLOC_MARGIN 8 + /* * Variable definitions. 
@@ -306,6 +312,24 @@ typedef struct
 	PgBenchValue value;			/* actual variable's value */
 } Variable;
 
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+	Variable   *vars;			/* array of variable definitions */
+	int			nvars;			/* number of variables */
+
+	/*
+	 * The maximum number of variables that we can currently store in 'vars'
+	 * without having to reallocate more space. We must always have max_vars >=
+	 * nvars.
+	 */
+	int			max_vars;
+
+	bool		vars_sorted;	/* are variables sorted by name? */
+} Variables;
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -460,9 +484,7 @@ typedef struct
 	int			command;		/* command number in script */
 
 	/* client variables */
-	Variable   *variables;		/* array of variable definitions */
-	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
+	Variables	variables;
 
 	/* various times about current transaction in microseconds */
 	pg_time_usec_t txn_scheduled;	/* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
 
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
 {
 	Variable	key;
 
 	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (variables->nvars <= 0)
 		return NULL;
 
 	/* Sort if we have to */
-	if (!st->vars_sorted)
+	if (!variables->vars_sorted)
 	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+		qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
 			  compareVariableNames);
-		st->vars_sorted = true;
+		variables->vars_sorted = true;
 	}
 
 	/* Now we can search */
 	key.name = name;
 	return (Variable *) bsearch((void *) &key,
-								(void *) st->variables,
-								st->nvariables,
+								(void *) variables->vars,
+								variables->nvars,
 								sizeof(Variable),
 								compareVariableNames);
 }
 
 /* Get the value of a variable, in string form; returns NULL if unknown */
 static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
 {
 	Variable   *var;
 	char		stringform[64];
 
-	var = lookupVariable(st, name);
+	var = lookupVariable(variables, name);
 	if (var == NULL)
 		return NULL;			/* not found */
 
@@ -1562,21 +1584,37 @@ valid_variable_name(const char *name)
 	return true;
 }
 
+/*
+ * Make sure there is enough space for 'needed' more variable in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+	/* total number of variables required now */
+	needed += variables->nvars;
+
+	if (variables->max_vars < needed)
+	{
+		variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+		variables->vars = (Variable *)
+			pg_realloc(variables->vars, variables->max_vars *
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
> Tatsuo Ishii wrote:
>
>> > Hi Yugo,
>> >
>> > I tested with serialization error scenario by setting:
>> > default_transaction_isolation = 'repeatable read'
>> > The result was:
>> >
>> > $ pgbench -t 10 -c 10 --max-tries=10 test
>> > transaction type:
>> > scaling factor: 10
>> > query mode: simple
>> > number of clients: 10
>> > number of threads: 1
>> > maximum number of tries: 10
>> > number of transactions per client: 10
>> > number of transactions actually processed: 100/100
>> > number of failed transactions: 0 (0.000%)
>> > number of transactions retried: 35 (35.000%)
>> > total number of retries: 74
>> > latency average = 5.306 ms
>> > initial connection time = 15.575 ms
>> > tps = 1884.516810 (without initial connection time)
>> >
>> > I had a hard time understanding what those numbers mean:
>> > number of transactions retried: 35 (35.000%)
>> > total number of retries: 74
>> >
>> > It seems "total number of retries" matches with the number of ERRORs
>> > reported in PostgreSQL. Good. What I am not sure about is "number of
>> > transactions retried". What does this mean?
>>
>> Oh, ok. I see it now. It turned out that "number of transactions
>> retried" does not actually mean the number of transactions
>> retried. Suppose pgbench executes the following in a session:
>>
>> BEGIN;    -- transaction A starts
>> :
>> (ERROR)
>> ROLLBACK; -- transaction A aborts
>>
>> (retry)
>>
>> BEGIN;    -- transaction B starts
>> :
>> (ERROR)
>> ROLLBACK; -- transaction B aborts
>>
>> (retry)
>>
>> BEGIN;    -- transaction C starts
>> :
>> END;      -- finally succeeds
>>
>> In this case "total number of retries:" = 2 and "number of
>> transactions retried:" = 1. In this patch transactions A, B and C are
>> regarded as the "same" transaction, so the retried transaction count
>> becomes 1. But it's confusing to use the word "transaction" here
>> because A, B and C are different transactions. I would think it's
>> better to use different language instead of "transaction", something
>> like "cycle"? i.e.
>>
>> number of cycles retried: 35 (35.000%)

I realized that the same argument can be applied even to "number of
transactions actually processed" because with the retry feature, a
"transaction" could comprise multiple transactions. But if we go forward
and replace those "transactions" with "cycles" (or whatever) altogether,
it could probably bring enough confusion to users who have been using
pgbench. Probably we should give up changing the language and instead
redefine "transaction" when the retry feature is enabled, along the lines
of "when the retry feature is enabled, each transaction can consist of
multiple retried transactions."

> In the original patch by Marina Polyakova it was "number of retried",
> but I changed it to "number of transactions retried" because I felt
> it was confusing with "number of retries". I chose the word "transaction"
> because a transaction ends in any one of successful commit, skip, or
> failure, after possible retries.

Ok.

> Well, I agree that it is somewhat confusing wording. If we can find a
> nice word to resolve the confusion, I don't mind changing the word.
> Maybe we can use "executions" as well as "cycles". However, I am not sure
> that the situation is improved by using such a word because what such a
> word exactly means still seems to be unclear for users.
>
> Another idea is to instead report only "the number of successfully
> retried transactions", which does not include "failed transactions",
> that is, transactions that failed after retries, like this:
>
> number of transactions actually processed: 100/100
> number of failed transactions: 0 (0.000%)
> number of successfully retried transactions: 35 (35.000%)
> total number of retries: 74
>
> The meaning is clear and there seems to be no confusion.

Thank you for the suggestion. But I think it would be better to leave it
as it is for the reason I mentioned above.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
Tatsuo Ishii wrote:

> > Hi Yugo,
> >
> > I tested with serialization error scenario by setting:
> > default_transaction_isolation = 'repeatable read'
> > The result was:
> >
> > $ pgbench -t 10 -c 10 --max-tries=10 test
> > transaction type:
> > scaling factor: 10
> > query mode: simple
> > number of clients: 10
> > number of threads: 1
> > maximum number of tries: 10
> > number of transactions per client: 10
> > number of transactions actually processed: 100/100
> > number of failed transactions: 0 (0.000%)
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> > latency average = 5.306 ms
> > initial connection time = 15.575 ms
> > tps = 1884.516810 (without initial connection time)
> >
> > I had a hard time understanding what those numbers mean:
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> >
> > It seems "total number of retries" matches with the number of ERRORs
> > reported in PostgreSQL. Good. What I am not sure about is "number of
> > transactions retried". What does this mean?
>
> Oh, ok. I see it now. It turned out that "number of transactions
> retried" does not actually mean the number of transactions
> retried. Suppose pgbench executes the following in a session:
>
> BEGIN;    -- transaction A starts
> :
> (ERROR)
> ROLLBACK; -- transaction A aborts
>
> (retry)
>
> BEGIN;    -- transaction B starts
> :
> (ERROR)
> ROLLBACK; -- transaction B aborts
>
> (retry)
>
> BEGIN;    -- transaction C starts
> :
> END;      -- finally succeeds
>
> In this case "total number of retries:" = 2 and "number of
> transactions retried:" = 1. In this patch transactions A, B and C are
> regarded as the "same" transaction, so the retried transaction count
> becomes 1. But it's confusing to use the word "transaction" here
> because A, B and C are different transactions. I would think it's
> better to use different language instead of "transaction", something
> like "cycle"? i.e.
> > number of cycles retried: 35 (35.000%)

In the original patch by Marina Polyakova it was "number of retried",
but I changed it to "number of transactions retried" because I felt
it was confusing with "number of retries". I chose the word "transaction"
because a transaction ends in any one of successful commit, skip, or
failure, after possible retries.

Well, I agree that it is somewhat confusing wording. If we can find a
nice word to resolve the confusion, I don't mind changing the word.
Maybe we can use "executions" as well as "cycles". However, I am not sure
that the situation is improved by using such a word because what such a
word exactly means still seems to be unclear for users.

Another idea is to instead report only "the number of successfully
retried transactions", which does not include "failed transactions",
that is, transactions that failed after retries, like this:

number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of successfully retried transactions: 35 (35.000%)
total number of retries: 74

The meaning is clear and there seems to be no confusion.

Regards,
Yugo Nagata
--
Yugo NAGATA
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hi Ishii-san,

On Sun, 20 Mar 2022 09:52:06 +0900 (JST)
Tatsuo Ishii wrote:

> Hi Yugo,
>
> I have looked into the patch and I noticed that linkend=... endterm=...>
> is used in pgbench.sgml. e.g.
>
> AFAIK this is the only place where "endterm" is used. In other places
> the "link" tag is used instead:

Thank you for pointing it out. I've checked other places with similar
references, and found that "xreflabel"s are used in such tags. So, I'll
fix it in this style.

Regards,
Yugo Nagata
--
Yugo NAGATA
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> Hi Yugo,
>
> I tested with serialization error scenario by setting:
> default_transaction_isolation = 'repeatable read'
> The result was:
>
> $ pgbench -t 10 -c 10 --max-tries=10 test
> transaction type:
> scaling factor: 10
> query mode: simple
> number of clients: 10
> number of threads: 1
> maximum number of tries: 10
> number of transactions per client: 10
> number of transactions actually processed: 100/100
> number of failed transactions: 0 (0.000%)
> number of transactions retried: 35 (35.000%)
> total number of retries: 74
> latency average = 5.306 ms
> initial connection time = 15.575 ms
> tps = 1884.516810 (without initial connection time)
>
> I had a hard time understanding what those numbers mean:
> number of transactions retried: 35 (35.000%)
> total number of retries: 74
>
> It seems "total number of retries" matches with the number of ERRORs
> reported in PostgreSQL. Good. What I am not sure about is "number of
> transactions retried". What does this mean?

Oh, ok. I see it now. It turned out that "number of transactions
retried" does not actually mean the number of transactions
retried. Suppose pgbench executes the following in a session:

BEGIN;    -- transaction A starts
:
(ERROR)
ROLLBACK; -- transaction A aborts

(retry)

BEGIN;    -- transaction B starts
:
(ERROR)
ROLLBACK; -- transaction B aborts

(retry)

BEGIN;    -- transaction C starts
:
END;      -- finally succeeds

In this case "total number of retries:" = 2 and "number of
transactions retried:" = 1. In this patch transactions A, B and C are
regarded as the "same" transaction, so the retried transaction count
becomes 1. But it's confusing to use the word "transaction" here
because A, B and C are different transactions. I would think it's
better to use different language instead of "transaction", something
like "cycle"? i.e.

number of cycles retried: 35 (35.000%)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hi Yugo,

I tested with serialization error scenario by setting:
default_transaction_isolation = 'repeatable read'
The result was:

$ pgbench -t 10 -c 10 --max-tries=10 test
transaction type:
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of transactions retried: 35 (35.000%)
total number of retries: 74
latency average = 5.306 ms
initial connection time = 15.575 ms
tps = 1884.516810 (without initial connection time)

I had a hard time understanding what those numbers mean:
number of transactions retried: 35 (35.000%)
total number of retries: 74

It seems "total number of retries" matches with the number of ERRORs
reported in PostgreSQL. Good. What I am not sure about is "number of
transactions retried". What does this mean?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hi Yugo,

I have looked into the patch and I noticed that is used in pgbench.sgml,
e.g.

AFAIK this is the only place where "endterm" is used. In other places the
"link" tag is used instead:

Failures and Serialization/Deadlock Retries

Note that the rendered result is identical. Do we want to use the link
tag as well?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Fabien,

On Sat, 12 Mar 2022 15:54:54 +0100 (CET)
Fabien COELHO wrote:

> Hello Yugo-san,
>
> About Pgbench error handling v16:

Thank you for your review! I attached the updated patches.

> This patch set needs a minor rebase because of 506035b0. Otherwise, the
> patch compiles, and global and local "make check" are ok. Doc generation
> is ok.

I rebased it.

> ## About v16-2

> English: "he will be aborted" -> "it will be aborted".

Fixed.

> I'm still not sure I like the "failure detailed" option, ISTM that the
> report could be always detailed. That would remove some complexity and I
> do not think that people executing a bench with error handling would
> mind having the details. No big deal.

I didn't change it because I think those who don't expect any failures
when using a well-designed script may not need details of failures. I
think reporting such details will be required only for benchmarks where
failures are expected.

> printVerboseErrorMessages: I'd make the buffer static and initialized
> only once so that there is no significant malloc/free cycle involved
> when calling the function.

OK. I fixed printVerboseErrorMessages to use a static variable.

> advanceConnectionState: I'd really prefer not to add new variables (res,
> status) in the loop scope, and only declare them when actually needed in
> the state branches, so as to avoid any unwanted interaction between
> states.

I fixed it to declare the variables in the case statement blocks.

> typo: "fullowing" -> "following"

Fixed.

> Pipeline cleaning: the advance function is already so long, I'd put that
> in a separate function and call it.

Ok. I made a new function "discardUntilSync" for the pipeline cleaning.

> I think that the report should not remove data when they are 0,
> otherwise it makes it harder to script around it (in failures_detailed
> on line 6284).

I fixed it to always report both serialization and deadlock failures,
even when they are 0.
Regards,
Yugo Nagata
--
Yugo NAGATA

>From b4360d3c03013c86e1e62247f1c3c1378aacc38d Mon Sep 17 00:00:00 2001
From: Yugo Nagata
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v17 1/2] Pgbench errors: use the Variables structure for
 client variables

This is most important when it is used to reset client variables during
the repeating of transactions after serialization/deadlock failures.

Don't allocate Variable structs one by one. Instead, add a constant
margin each time it overflows.
---
 src/bin/pgbench/pgbench.c | 163 +++---
 1 file changed, 100 insertions(+), 63 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 000ffc4a5c..ab2c5dfc5f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -289,6 +289,12 @@ const char *progname;
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN	8
+
 /*
  * Variable definitions.
  *
@@ -306,6 +312,24 @@ typedef struct
 	PgBenchValue value;			/* actual variable's value */
 } Variable;
 
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+	Variable   *vars;			/* array of variable definitions */
+	int			nvars;			/* number of variables */
+
+	/*
+	 * The maximum number of variables that we can currently store in 'vars'
+	 * without having to reallocate more space. We must always have max_vars >=
+	 * nvars.
+	 */
+	int			max_vars;
+
+	bool		vars_sorted;	/* are variables sorted by name? */
+} Variables;
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -460,9 +484,7 @@ typedef struct
 	int			command;		/* command number in script */
 
 	/* client variables */
-	Variable   *variables;		/* array of variable definitions */
-	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
+	Variables	variables;
 
 	/* various times about current transaction in microseconds */
 	pg_time_usec_t txn_scheduled;	/* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
 
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
 {
 	Variable	key;
 
 	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (variables->nvars <= 0)
 		return NULL;
 
 	/* Sort if we have to */
-	if (!st->vars_sorted)
+	if (!variables->vars_sorted)
 	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+		qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
 			  compareVariableNames);
-		st->vars_sorted = true;
+		variables->vars_sorted = true;
 	}
 
 	/* Now we can search */
 	key.name = name;
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Yugo-san,

About Pgbench error handling v16:

This patch set needs a minor rebase because of 506035b0. Otherwise, the
patch compiles, and global and local "make check" are ok. Doc generation
is ok.

This patch is in good shape, the code and comments are clear. Some minor
remarks below, including typos and a few small suggestions.

## About v16-1

This refactoring patch adds a struct for managing pgbench variables,
instead of mixing fields into the client state (CState) struct. Patch
compiles, global and local "make check" are both ok. Although this patch
is not necessary to add the feature, I'm fine with it as it improves
pgbench source code readability.

## About v16-2

This last patch adds handling of serialization and deadlock errors to
pgbench transactions. This feature is desirable because it enlarges
performance testing options, and makes pgbench behave more like a
database client application. Possible future extensions enabled by this
patch include handling disconnection errors by trying to reconnect, for
instance.

The documentation is clear and well written, at least for my non-native
speaker eyes and ears.

English: "he will be aborted" -> "it will be aborted".

I'm fine with renaming --report-latencies to --report-per-command as the
latter is clearer about what the option does.

I'm still not sure I like the "failure detailed" option, ISTM that the
report could be always detailed. That would remove some complexity and I
do not think that people executing a bench with error handling would mind
having the details. No big deal.

printVerboseErrorMessages: I'd make the buffer static and initialized
only once so that there is no significant malloc/free cycle involved when
calling the function.

advanceConnectionState: I'd really prefer not to add new variables (res,
status) in the loop scope, and only declare them when actually needed in
the state branches, so as to avoid any unwanted interaction between
states.
typo: "fullowing" -> "following"

Pipeline cleaning: the advance function is already so long, I'd put that
in a separate function and call it.

I think that the report should not remove data when they are 0, otherwise
it makes it harder to script around it (in failures_detailed on line
6284).

The tests cover the different cases. I tried to suggest a simpler
approach in a previous round, but it seems not so simple to do so. They
could be simplified later, if possible.

--
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Tatsuo-san, It seems the patch is ready for committer except below. Do you guys want to do more on below? I'm planning a new review of this significant patch, possibly over the next week-end, or the next. -- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hi Yugo and Fabien,

It seems the patch is ready for committer except below. Do you guys want
to do more on below?

>> # TESTS
>>
>> I suggested to simplify the tests by using conditionals & sequences. You
>> reported that you got stuck. Hmmm.
>>
>> I tried again my tests which worked fine when started with 2 clients,
>> otherwise they get stuck because the first client waits for the other
>> one which does not exist (the point is to generate deadlocks and other
>> errors). Maybe this is your issue?
>
> That seems to be right. It got stuck when I used the -T option rather
> than -t; it was because, I guess, the number of transactions on each
> thread was different.
>
>> Could you try with:
>>
>>    psql < deadlock_prep.sql
>>    pgbench -t 4 -c 2 -f deadlock.sql
>>    # note: each deadlock detection takes 1 second
>>
>>    psql < deadlock_prep.sql
>>    pgbench -t 10 -c 2 -f serializable.sql
>>    # very quick 50% serialization errors
>
> That works. However, it still hangs when --max-tries = 2, so maybe we
> cannot use it for testing the retry feature.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Fabien,

Thank you so much for your review. Sorry for the late reply. I had
stopped working on it due to other jobs, but I came back again. I
attached the updated patch. I would appreciate it if you could review
this again.

On Mon, 19 Jul 2021 20:04:23 +0200 (CEST)
Fabien COELHO wrote:

> # About pgbench error handling v15
>
> Patches apply cleanly. Compilation, global and local tests ok.
>
> - v15.1: refactoring is a definite improvement.
>   Good, even if it is not very useful (see below).

Ok, we don't need to save variables in order to implement the retry
feature in pgbench as you suggested. Well, should we completely separate
these two patches, and should I fix v15.2 to not rely on v15.1?

> While restructuring, maybe predefined variables could be made readonly
> so that a script which would update them would fail, which would be a
> good thing. This is probably material for an independent patch.

Yes, it should be an independent patch.

> - v15.2: see detailed comments below
>
> # Doc
>
> Doc build is ok.
>
> ISTM that the "number of tries" line would be better placed between the
> #threads and #transactions lines. What do you think?

Agreed. Fixed.

> Aggregate logging description: "{ failures | ... }" seems misleading
> because it suggests we have one or the other, whereas it can also be
> empty. I suggest: "{ | failures | ... }" to show the empty case.

The description is correct because either "failures" or both of
serialization_failures and deadlock_failures should appear in aggregate
logging. If "failures" were printed only when some transaction failed,
each line in the aggregate logging could have a different number of
columns, which would make it difficult to parse the results.

> I'd wonder whether the number of tries is set too high, though, ISTM
> that an application should give up before 100?

Indeed, max-tries=100 seems too high for a practical system.
Also, I noticed that the sum of the latencies of each command (= 15.839
ms) is significantly larger than the latency average (= 10.870 ms)
because the "per commands" results in the documentation were fixed. So, I
retook a measurement on my machine for more accurate documentation. I
used max-tries=10.

> Minor editing:
>
> "there're" -> "there are".
>
> "the --time" -> "the --time option".

Fixed.

> "The latency for failed transactions and commands is not computed
> separately." is unclear, please use a positive sentence to tell what is
> true instead of what is not and the reader has to guess. Maybe: "The
> latency figures include failed transactions which have reached the
> maximum number of tries or the transaction latency limit.".

I'm not the original author of this description, but I guess this means
"The latency is measured only for successful transactions and commands,
but not for failed transactions or commands."

> "The main report contains the number of failed transactions if it is
> non-zero." ISTM that this is a pain for scripts which would like to
> process these reports' data, because the data may or may not be there.
> I'm sure to write such scripts, which explains my concern:-)

I agree with you. I fixed the behavior to always report the number of
failed transactions, regardless of whether it is non-zero.

> "If the total number of retried transactions is non-zero…" should it
> rather be "not one", because zero means unlimited retries?

I guess that this means the actual number of retried transactions, not
max-tries, so "non-zero" was correct. However, for the same reason as
above, I fixed the behavior to always report the retry statistics,
regardless of the actual retry numbers.

> # FEATURES
>
> Copying variables: ISTM that we should not need to save the variables
> states… no clearing, no copying should be needed. The restarted
> transaction simply overrides the existing variables, which is what the
> previous version was doing anyway. The scripts should write their own
> variables before using them, and if they don't then it is the user's
> problem. This is important for performance, because it means that after
> a client has executed all scripts once the variable array is stable and
> does not incur significant maintenance costs. The only thing that needs
> saving for retry is the pseudo-random generator state. This suggests
> simplifying or removing "RetryState".

Yes. Saving the variables state is not necessary because we retry the
whole script. It was necessary in the initial patch because it planned to
retry one transaction included in the script. I removed RetryState and
copyVariables.

> # CODE
>
> In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and
> could be dealt with in the default case. We are only interested in
> serialization/deadlocks, which are fatal errors?

We need PGRES_NONFATAL_ERROR to save st->estatus. It is used outside
readCommandResponse to determine whether we should abort or not.

> doRetry: for consistency,
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> I attached the updated patch.

# About pgbench error handling v15

Patches apply cleanly. Compilation, global and local tests ok.

- v15.1: refactoring is a definite improvement.
  Good, even if it is not very useful (see below).

  While restructuring, maybe predefined variables could be made readonly
  so that a script which would update them would fail, which would be a
  good thing. This is probably material for an independent patch.

- v15.2: see detailed comments below

# Doc

Doc build is ok.

ISTM that the "number of tries" line would be better placed between the
#threads and #transactions lines. What do you think?

Aggregate logging description: "{ failures | ... }" seems misleading
because it suggests we have one or the other, whereas it can also be
empty. I suggest: "{ | failures | ... }" to show the empty case.

Having a full example with retries in the doc is a good thing, and
illustrates in passing that running with a number of clients on a small
scale does not make much sense because of the contention on
tellers/branches. I'd wonder whether the number of tries is set too high,
though, ISTM that an application should give up before 100? I like that
the feature is also limited by latency limit.

Minor editing:

"there're" -> "there are".

"the --time" -> "the --time option".

The overall English seems good, but I'm not a native speaker. As I
already said, a native speaker proofreading would be nice. From a
technical writing point of view, maybe the documentation could be
improved a bit, but I'm not at ease on that subject. Some comments:

"The latency for failed transactions and commands is not computed
separately." is unclear, please use a positive sentence to tell what is
true instead of what is not and the reader has to guess. Maybe: "The
latency figures include failed transactions which have reached the
maximum number of tries or the transaction latency limit.".

"The main report contains the number of failed transactions if it is
non-zero."
ISTM that this is a pain for scripts which would like to process these
reports' data, because the data may or may not be there. I'm sure to
write such scripts, which explains my concern:-)

"If the total number of retried transactions is non-zero…" should it
rather be "not one", because zero means unlimited retries?

The section describing the various types of errors that can occur is a
good addition.

Option "--report-latencies" changed to "--report-per-commands": I'm fine
with this change.

# FEATURES

--failures-detailed: I'm not convinced that this option should not always
be on, but this is not very important, so let it be.

--verbose-errors: I still think this is only for debugging, but let it
be.

Copying variables: ISTM that we should not need to save the variables
states… no clearing, no copying should be needed. The restarted
transaction simply overrides the existing variables, which is what the
previous version was doing anyway. The scripts should write their own
variables before using them, and if they don't then it is the user's
problem. This is important for performance, because it means that after a
client has executed all scripts once the variable array is stable and
does not incur significant maintenance costs. The only thing that needs
saving for retry is the pseudo-random generator state. This suggests
simplifying or removing "RetryState".

# CODE

The semantics of "cnt" is changed. Ok, the overall counters and their
relationships make sense, and it simplifies the reporting code. Good.

In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and
could be dealt with in the default case. We are only interested in
serialization/deadlocks, which are fatal errors?

doRetry: for consistency, given the assert, ISTM that it should return
false if duration has expired, by testing end_time or timer_exceeded.

checkTransactionStatus: this function does several things at once with 2
booleans, which makes it not very clear to me. Maybe it would be clearer
if it would just return an enum (in trans, not in trans, conn error,
other error). Another reason to do that is that on connection error
pgbench could try to reconnect, which would be an interesting later
extension, so let's pave the way for that. Also, I do not think that the
function should print out a message, it should be the caller's decision
to do that.

verbose_errors: there is more or less repeated code under RETRY and
FAILURE, which should be factored out in a separate function. The
advanceConnection function is long enough. Once this is done, there is no
need for a getLatencyUsed function.

I'd put cleaning up the pipeline in a function. I do not understand why
the pipeline mode is not exited in all cases, the code checks for the
pipeline status twice in a few lines. I'd put this cleanup in the sync
function as well, and report to the caller (advanceConnectionState) if
there was an error, which would be managed there.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello,

I attached the updated patch.

On Tue, 13 Jul 2021 15:50:52 +0900
Yugo NAGATA wrote:

> >> > I'm a little hesitant about how to count and report such
> >> > transactions, unfinished because of the bench timeout, though. Not
> >> > counting them seems to be the best option.
>
> > I will fix it to finish the benchmark when the time is over during
> > retrying, that is, change the state to CSTATE_FINISHED instead of
> > CSTATE_ERROR in such cases.

Done. (I wrote CSTATE_ERROR, but correctly it is CSTATE_FAILURE.)

Now, once the timer has expired during retrying a failed transaction,
pgbench never starts a new transaction for retry. If the transaction
succeeds, it will be counted in the result. Otherwise, if the transaction
fails again, it is not counted.

In addition, I fixed it to work well with pipeline mode. Previously,
pipeline mode was not sufficiently considered, and ROLLBACK was not sent
correctly. I fixed it to handle errors in pipeline mode properly, and now
it works.

Regards,
Yugo Nagata
--
Yugo NAGATA

>From 1cd3519f3a1cfbffe7bcce35fd6f12da566625b1 Mon Sep 17 00:00:00 2001
From: Yugo Nagata
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v15 2/2] Pgbench errors and serialization/deadlock retries

A client's run is aborted in case of a serious error, for example, the
connection with the database server was lost or the end of the script was
reached without completing the last transaction. In addition, if an
execution of an SQL or meta command fails for reasons other than
serialization or deadlock errors, the client is aborted. Otherwise, if an
SQL command fails with a serialization or deadlock error, the current
transaction is rolled back, which also includes setting the client
variables as they were before the run of this transaction (it is assumed
that one transaction script contains only one transaction).
Transactions with serialization or deadlock errors are repeated after rollbacks until they complete successfully or reach the maximum number of tries (specified by the --max-tries option) or the maximum time of tries (specified by the --latency-limit option). These options can be combined; moreover, you cannot use an unlimited number of tries (--max-tries=0) without the --latency-limit option or the --time option. By default the option --max-tries is set to 1, and transactions with serialization/deadlock errors are not retried. If the last transaction run fails, this transaction will be reported as failed, and the client variables will be set as they were before the first run of this transaction.

If there are retries and/or failures, their statistics are printed in the progress output, in the transaction / aggregation logs, and at the end with the other results (overall and for each script). Retries and failures are also printed per command, with average latencies, if you use the appropriate benchmarking option (--report-per-command, -r). If you want to group failures by basic types (serialization failures / deadlock failures), use the option --failures-detailed. If you want to distinguish all errors and failures (errors without retrying) by type, including which limit for retries was violated and how far it was exceeded for the serialization/deadlock failures, use the option --verbose-errors.
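The retry rule described in this commit message can be sketched roughly as follows (a simplified illustration of the documented behaviour, not the patch's actual code; the function and parameter names are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified sketch of "can we retry this failed transaction?":
 * retries stop when the maximum number of tries has been used up
 * (max_tries > 0), when the latency limit has been exceeded, or when
 * the benchmark duration (-T) has expired.  max_tries == 0 means an
 * unlimited number of tries, which per the description above is only
 * allowed together with --latency-limit or --time.
 */
static bool
canRetry(uint32_t tries, uint32_t max_tries,
         bool latency_limit_exceeded, bool duration_expired)
{
    if (max_tries > 0 && tries >= max_tries)
        return false;           /* used up the allowed tries */
    if (latency_limit_exceeded || duration_expired)
        return false;           /* some time bound cuts the retrying off */
    return true;
}
```

With the default --max-tries=1, a transaction that has already been tried once is never retried, matching the "not retried by default" behaviour stated above.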
---
 doc/src/sgml/ref/pgbench.sgml                | 402 +++-
 src/bin/pgbench/pgbench.c                    | 974 +--
 src/bin/pgbench/t/001_pgbench_with_server.pl | 217 -
 src/bin/pgbench/t/002_pgbench_no_server.pl   |  10 +
 src/fe_utils/conditional.c                   |  16 +-
 src/include/fe_utils/conditional.h           |   2 +
 6 files changed, 1505 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..0a3ccb3c92 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@
 number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+maximum number of tries: 1
 latency average = 11.013 ms
 latency stddev = 7.351 ms
 initial connection time = 45.758 ms
@@ -65,11 +66,14 @@
 tps = 896.967014 (without initial connection time)

   The first six lines report some of the most important parameter
-  settings. The next line reports the number of transactions completed
+  settings. The seventh line reports the number of transactions completed
   and intended (the latter being just the product of number of clients
   and number of transactions per client); these will be equal unless the run
-  failed before completion. (In -T mode, only the actual
-  number of transactions is printed.)
+  failed before completion or some SQL command(s) failed. (In
+  -T mode, only the actual number of transactions is printed.)
+  The next line reports the maximum number of tries for transactions with
+  serialization or deadlock errors (see for more information).
   The last line reports the number of transactions per second.
@@
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Tue, 13 Jul 2021 14:35:00 +0900 (JST) Tatsuo Ishii wrote:
> >> > I would tend to agree with this behavior, that is not to start any new
> >> > transaction or transaction attempt once -T has expired.
> >
> > That is the behavior in the latest patch. Once -T has expired, any new
> > transaction or retry does not start.
>
> Actually v14 has not changed the behavior in this regard as explained
> in a different email:

Right. Both v13 and v14 don't start any new transaction or retry once -T has expired.

> >> > I'm a little hesitant about how to count and report such unfinished
> >> > because of bench timeout transactions, though. Not counting them seems
> >> > to be the best option.
> >>
> >> I agree.
> >
> > I also agree. Although I couldn't get an answer about what he thinks the
> > actual harm for users due to termination of retrying by the -T option is,
> > I guess he just complained about reporting the termination of retrying as
> > failures. Therefore, I will fix to finish the benchmark when the time is
> > over during retrying, that is, change the state to CSTATE_FINISHED instead
> > of CSTATE_ERROR in such cases.
>
> I guess Fabien wanted it differently. Suppose "-c 10 and -T 30" and we
> have 100 successful transactions by time 25. At time 25 pgbench starts the
> next benchmark cycle and by time 30 there are 10 failing transactions
> (because they are retrying). pgbench stops the execution at time 30.
> According to your proposal (change the state to CSTATE_FINISHED
> instead of CSTATE_ERROR) the total number of successful transactions will
> be 100 + 10 = 110, right?

No. The last failed transaction is not counted because CSTATE_END_TX is bypassed, so please don't worry.

> Also actually I have explained the harm a number of times but you have
> kept on ignoring it because "it's subtle". My request has been pretty
> simple.
>
> > number of failed transactions: 9 (0.916%)
>
> I don't like this and want to have the failed transactions to be 0.
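A hedged sketch of the accounting rule just described (the state names mirror pgbench's CSTATE_* constants; the helper itself is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced set of client states, named after pgbench's CSTATE_* constants. */
typedef enum
{
    CSTATE_END_TX,      /* transaction ended normally: it gets counted */
    CSTATE_FINISHED,    /* -T expired during a retry: CSTATE_END_TX bypassed */
    CSTATE_ABORTED      /* client aborted on a serious error */
} ClientState;

/*
 * Sketch: a transaction contributes to the success/failure totals only
 * when the client passes through CSTATE_END_TX.  A client moved straight
 * to CSTATE_FINISHED because the timer expired while retrying is counted
 * neither as a success nor as a failure, which is why the "100 + 10 = 110"
 * scenario above does not occur.
 */
static bool
isCounted(ClientState next_state)
{
    return next_state == CSTATE_END_TX;
}
```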
> Who wants a benchmark result having errors?

I was asking you because I would like to confirm what you really complained about: whether the problem is that a retrying transaction is terminated by the -T option, or that pgbench reports it as the number of failed transactions? But now I understand it is the latter: you don't want to count the termination of retrying as failures. Thanks.

Regards,
Yugo Nagata
--
Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> > I would tend to agree with this behavior, that is not to start any new
>> > transaction or transaction attempt once -T has expired.
>
> That is the behavior in the latest patch. Once -T has expired, any new
> transaction or retry does not start.

Actually v14 has not changed the behavior in this regard, as explained in a different email:

> $ pgbench -p 11000 -c 10 -T 10 --max-tries=0 test
> pgbench (15devel, server 13.3)
> starting vacuum...end.
> transaction type:
> scaling factor: 1
> query mode: simple
> number of clients: 10
> number of threads: 1
> duration: 10 s
> number of transactions actually processed: 974
> number of failed transactions: 9 (0.916%)
> number of transactions retried: 651 (66.226%)
> total number of retries: 8482
> latency average = 101.317 ms (including failures)
> initial connection time = 44.440 ms
> tps = 97.796487 (without initial connection time)

>> > I'm a little hesitant about how to count and report such unfinished
>> > because of bench timeout transactions, though. Not counting them seems
>> > to be the best option.
>>
>> I agree.
>
> I also agree. Although I couldn't get an answer about what he thinks the
> actual harm for users due to termination of retrying by the -T option is,
> I guess he just complained about reporting the termination of retrying as
> failures. Therefore, I will fix to finish the benchmark when the time is
> over during retrying, that is, change the state to CSTATE_FINISHED instead
> of CSTATE_ERROR in such cases.

I guess Fabien wanted it differently. Suppose "-c 10 and -T 30" and we have 100 successful transactions by time 25. At time 25 pgbench starts the next benchmark cycle and by time 30 there are 10 failing transactions (because they are retrying). pgbench stops the execution at time 30. According to your proposal (change the state to CSTATE_FINISHED instead of CSTATE_ERROR) the total number of successful transactions will be 100 + 10 = 110, right?
I guess Fabien wants the number to be 100 rather than 110. Fabien, please correct me if you think differently.

Also, actually I have explained the harm a number of times but you have kept on ignoring it because "it's subtle". My request has been pretty simple.

> number of failed transactions: 9 (0.916%)

I don't like this and want the failed transactions to be 0. Who wants a benchmark result having errors?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Tue, 13 Jul 2021 13:00:49 +0900 (JST) Tatsuo Ishii wrote:
> >>> Or, we should terminate the last cycle of benchmark regardless it is
> >>> retrying or not if -T expires. This will make pgbench behaves much
> >>> more consistent.
> >
> > I would tend to agree with this behavior, that is not to start any new
> > transaction or transaction attempt once -T has expired.

That is the behavior in the latest patch. Once -T has expired, any new transaction or retry does not start. IIUC, Ishii-san's proposal was to change pgbench's behavior when -T has expired so that any running transactions are terminated immediately, regardless of retrying. I am not sure we should do that in this patch. If we would like this change, it should be done in another patch as an improvement of the -T option.

> > I'm a little hesitant about how to count and report such unfinished
> > because of bench timeout transactions, though. Not counting them seems
> > to be the best option.
>
> I agree.

I also agree. Although I couldn't get an answer about what he thinks the actual harm for users due to termination of retrying by the -T option is, I guess he just complained about reporting the termination of retrying as failures. Therefore, I will fix to finish the benchmark when the time is over during retrying, that is, change the state to CSTATE_FINISHED instead of CSTATE_ERROR in such cases.

Regards,
Yugo Nagata
--
Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>>> Or, we should terminate the last cycle of benchmark regardless it is
>>> retrying or not if -T expires. This will make pgbench behaves much
>>> more consistent.
>
> I would tend to agree with this behavior, that is not to start any new
> transaction or transaction attempt once -T has expired.
>
> I'm a little hesitant about how to count and report such unfinished
> because of bench timeout transactions, though. Not counting them seems
> to be the best option.

I agree.

>> Hmmm, indeed this might make the behaviour a bit more consistent, but I am
>> not sure such a behavioural change benefits users.
>
> The user benefit would be that if they asked for a 100s benchmark,
> pgbench makes a reasonable effort not to overshoot that?

Right.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello,

>> Of course, users themselves should be careful of problematic scripts, but
>> it would be better that pgbench itself avoids problems beforehand if it can.
>
> Or, we should terminate the last cycle of benchmark regardless it is
> retrying or not if -T expires. This will make pgbench behaves much
> more consistent.

I would tend to agree with this behavior, that is not to start any new transaction or transaction attempt once -T has expired.

I'm a little hesitant about how to count and report such unfinished-because-of-bench-timeout transactions, though. Not counting them seems to be the best option.

>> Hmmm, indeed this might make the behaviour a bit more consistent, but I am
>> not sure such a behavioural change benefits users.

The user benefit would be that if they asked for a 100s benchmark, pgbench makes a reasonable effort not to overshoot that?

--
Fabien.
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
I have played with the v14 patch. I previously complained that pgbench always reported 9 errors (actually the number is always the number specified by "-c" minus 1 in my case).

$ pgbench -p 11000 -c 10 -T 10 --max-tries=0 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type:
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 974
number of failed transactions: 9 (0.916%)
number of transactions retried: 651 (66.226%)
total number of retries: 8482
latency average = 101.317 ms (including failures)
initial connection time = 44.440 ms
tps = 97.796487 (without initial connection time)

To reduce the number of errors I provided "--max-tries=9000" because pgbench reported 8482 retries.

$ pgbench -p 11000 -c 10 -T 10 --max-tries=9000 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type:
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 1133
number of failed transactions: 9 (0.788%)
number of transactions retried: 755 (66.112%)
total number of retries: 9278
maximum number of tries: 9000
latency average = 88.570 ms (including failures)
initial connection time = 23.384 ms
tps = 112.015219 (without initial connection time)

Unfortunately this didn't work. Still 9 errors, because pgbench terminated the last round of the run.

Then I gave up using -T and switched to -t. The number of transactions for the -t option was calculated from the total number of transactions actually processed (1133) / number of clients (10) = 11.33. I rounded up 11.33 to 12, then multiplied by the number of clients (10) and got 120. The result:

$ pgbench -p 11000 -c 10 -t 120 --max-tries=9000 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type:
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 120
number of transactions actually processed: 1200/1200
number of transactions retried: 675 (56.250%)
total number of retries: 8524
maximum number of tries: 9000
latency average = 93.777 ms
initial connection time = 14.120 ms
tps = 106.635908 (without initial connection time)

Finally I was able to get a result without any errors. This is not a super simple way to obtain pgbench results without errors, but probably I can live with it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
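The by-hand derivation of a -t value from a prior -T run boils down to a ceiling division of the total processed transactions by the number of clients. A generic helper (illustrative only, not part of pgbench):

```c
#include <assert.h>

/*
 * Derive a -t (transactions per client) value from a previous -T run:
 * divide the total number of processed transactions by the number of
 * clients, rounding up so that at least as many transactions are run.
 */
static int
txPerClient(int total_processed, int nclients)
{
    /* ceiling division for positive operands */
    return (total_processed + nclients - 1) / nclients;
}
```

For example, 974 processed transactions over 10 clients rounds up to 98 per client, so "-t 98" would run at least as many transactions as the earlier timed run.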
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Fabien, I attached the updated patch (v14)!

On Wed, 30 Jun 2021 17:33:24 +0200 (CEST) Fabien COELHO wrote:
> >> --report-latencies -> --report-per-command: should we keep supporting
> >> the previous option?
> >
> > Ok. Although now the option is not only for latencies, considering users who
> > are using the existing option, I'm fine with this. I got this back to the
> > previous name.
>
> Hmmm. I liked the new name! My point was whether we need to support the
> old one as well for compatibility, or whether we should not bother. I'm
> still wondering. As I think that the new name is better, I'd suggest to
> keep it.

Ok. I misunderstood it. I returned the option name to --report-per-command. If we keep --report-latencies, I can imagine the following choices:

- use --report-latencies to print only latency information
- use --report-latencies as an alias of --report-per-command for compatibility and remove it at an appropriate timing (that is, treat it as deprecated)

Among these, I prefer the latter because ISTM we would not need many options for reporting information per command. However, actually, I wonder whether we have to keep the previous one at all if we plan to remove it eventually.

> >> --failures-detailed: if we bother to run with handling failures, should
> >> it always be on?
> >
> > If we print other failures that cannot be retried in future, it could print
> > a lot of lines and might annoy some users who don't need details of failures.
> > Moreover, some users would always need information on detailed failures in
> > the log, and others would need only the total numbers of failures.
>
> Ok.
>
> > Currently we handle only serialization and deadlock failures, so the number
> > of lines printed and the number of columns of logging is not large even under
> > --failures-detailed, but if we have a chance to handle other failures in
> > future, ISTM adding this option makes sense considering users who would like
> > simple outputs.
>
> Hmmm. What kind of failures could be managed with retries? I guess that on
> a connection failure we can try to reconnect, but otherwise it is less
> clear that other failures make sense to retry.

Indeed, there would be few failures that we should retry, and I cannot imagine others than serialization, deadlock, and connection failures for now. However, considering reporting the number of failed transactions and their causes in future, as you said

> Given that we manage errors, ISTM that we should not necessarily
> stop on other not retried errors, but rather count/report them and
> possibly proceed.

, we could define a few more kinds of failures. At least we can consider meta-command and other SQL command errors in addition to serialization, deadlock, and connection failures. So the total number of kinds of failures would be five at least, and always reporting all of them would result in a lot of lines and columns in the logging.

> >> --debug-errors: I'm not sure we should want a special debug mode for that,
> >> I'd consider integrating it with the standard debug, or just for
> >> development.
> >
> > I think --debug is a debug option for telling users pgbench's internal
> > behaviors, that is, which client is doing what. On the other hand,
> > --debug-errors is for telling users what error caused a retry or a failure
> > in detail. For users who are not interested in pgbench's internal behavior
> > (sending a command, receiving a result, ...) but interested in actual errors
> > raised during running the script, this option seems useful.
>
> Ok. Then this is not really about debug per se, but a verbosity setting?

I think so.

> Maybe --verbose-errors would make more sense? I'm unsure. I'll think about
> it.

Agreed. This seems more proper than the previous one, so I changed the name to --verbose-errors.

> > Sorry, I couldn't understand your suggestion. Is this about the order of
> > case statements or pg_log_error?
>
> My sentence got mixed up. My point was about the case order, so that they
> are put in a more logical order when reading all the cases.

Ok. Considering the logical order, I moved WAIT_ROLLBACK_RESULT between ERROR and RETRY, because WAIT_ROLLBACK_RESULT comes after the ERROR state, and RETRY comes after ERROR or WAIT_ROLLBACK_RESULT.

> >> Currently, ISTM that the retry on error mode is implicitly always on.
> >> Do we want that? I'd say yes, but maybe people could disagree.
> >
> > The default value of max-tries is 1, so the retry on error is off.
> > Failed transactions are retried only when the user wants it and
> > specifies a valid value to max-tries.
>
> Ok. My point is that we do not stop on such errors, whereas before ISTM
> that we would have stopped, so somehow the default behavior has changed
> and the previous behavior cannot be reinstated with an option. Maybe that
> is not bad, but this is a behavioral change which needs to be documented
> and argumented.

I understood. Indeed, there is a behavioural
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Wed, 07 Jul 2021 21:50:16 +0900 (JST) Tatsuo Ishii wrote:
> >> Well, "that's very little, let's ignore it" is not technically a right
> >> direction IMO.
> >
> > Hmmm, it seems to me these failures are ignorable because, with regard to
> > failures due to -T, they occur only in the last transaction of each client
> > and do not affect the results such as TPS and latency of successfully
> > processed transactions. (although I am not sure in what sense you use the
> > word "technically"...)
>
> "My application button does not respond once in 100 times. It's just a
> 1% error rate. You should ignore it." I would say this attitude is not
> technically correct.

I cannot understand what you want to say. Can reporting the number of transactions that failed intentionally be treated the same as the error rate of your application's button?

> > However, maybe I am missing something. Could you please tell me what you
> > think the actual harm for users about failures due to -D is?
>
> I don't know why you are referring to -D.

Sorry. It's just a typo, as you can imagine. I am asking you what you think the actual harm for users due to termination of retrying by the -T option is.

> >> That's not necessarily true in practice. By the time when -T is about to
> >> expire, transactions are all finished in finite time as you can see in
> >> the result I showed. So it's reasonable that the very last cycle of
> >> the benchmark will finish in finite time as well.
> >
> > Your script may finish in finite time, but others may not.
>
> That's why I said "practically". In other words, "in most cases the
> scenario will finish in finite time".

Sure.

> > Indeed, it is possible that an execution of a query takes a long or
> > infinite time. However, its cause would be a problematic query in the
> > custom script or other problems occurring on the server side. These are
> > not problems of pgbench, and pgbench itself can't control either. On the
> > other hand, the unlimited number of tries is a behaviour specified by the
> > pgbench option, so I think pgbench itself should internally avoid problems
> > caused by its behaviour. That is, if max-tries=0 could cause infinite or
> > much longer benchmark time than the user expected due to too many retries,
> > I think pgbench should avoid it.
>
> I would say that's the user's responsibility to avoid infinite running
> benchmarking. Remember, pgbench is a tool for serious users, not for
> novice users.

Of course, users themselves should be careful of problematic scripts, but it would be better that pgbench itself avoids problems beforehand if it can.

> Or, we should terminate the last cycle of benchmark regardless it is
> retrying or not if -T expires. This will make pgbench behaves much
> more consistent.

Hmmm, indeed this might make the behaviour a bit more consistent, but I am not sure such a behavioural change benefits users.

Regards,
Yugo Nagata
--
Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
>> Well, "that's very little, let's ignore it" is not technically a right
>> direction IMO.
>
> Hmmm, it seems to me these failures are ignorable because, with regard to
> failures due to -T, they occur only in the last transaction of each client
> and do not affect the results such as TPS and latency of successfully
> processed transactions. (although I am not sure in what sense you use the
> word "technically"...)

"My application button does not respond once in 100 times. It's just a 1% error rate. You should ignore it." I would say this attitude is not technically correct.

> However, maybe I am missing something. Could you please tell me what you
> think the actual harm for users about failures due to -D is?

I don't know why you are referring to -D.

>> That's not necessarily true in practice. By the time when -T is about to
>> expire, transactions are all finished in finite time as you can see in
>> the result I showed. So it's reasonable that the very last cycle of
>> the benchmark will finish in finite time as well.
>
> Your script may finish in finite time, but others may not.

That's why I said "practically". In other words, "in most cases the scenario will finish in finite time".

> Indeed, it is possible that an execution of a query takes a long or
> infinite time. However, its cause would be a problematic query in the
> custom script or other problems occurring on the server side. These are
> not problems of pgbench, and pgbench itself can't control either. On the
> other hand, the unlimited number of tries is a behaviour specified by the
> pgbench option, so I think pgbench itself should internally avoid problems
> caused by its behaviour. That is, if max-tries=0 could cause infinite or
> much longer benchmark time than the user expected due to too many retries,
> I think pgbench should avoid it.

I would say that's the user's responsibility to avoid infinite running benchmarking. Remember, pgbench is a tool for serious users, not for novice users.

Or, we should terminate the last cycle of benchmark regardless of whether it is retrying or not if -T expires. This will make pgbench behave much more consistently.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Wed, 07 Jul 2021 16:11:23 +0900 (JST) Tatsuo Ishii wrote:
> > Indeed, as Ishii-san pointed out, some users might not want to terminate
> > retrying transactions due to -T. However, the actual negative effect is
> > only printing the number of failed transactions. The other results that
> > users want to know, such as tps, are almost unaffected because they are
> > measured for transactions processed successfully. Actually, the percentage
> > of failed transactions is very little, only 0.347%.
>
> Well, "that's very little, let's ignore it" is not technically a right
> direction IMO.

Hmmm, it seems to me these failures are ignorable because, with regard to failures due to -T, they occur only in the last transaction of each client and do not affect the results such as TPS and latency of successfully processed transactions. (although I am not sure in what sense you use the word "technically"...)

However, maybe I am missing something. Could you please tell me what you think the actual harm for users about failures due to -D is?

> > In the existing behaviour, running transactions are never terminated due
> > to the -T option. However, ISTM that this is based on an assumption that
> > the latency of each transaction is small and that a time when we can
> > finish the benchmark will come soon. On the other hand, when transactions
> > can be retried unlimitedly, it may take a longer time than expected, and
> > we cannot guarantee that this will finish successfully in limited time.
> > Therefore, terminating the benchmark by giving up retrying the transaction
> > after time expiration seems reasonable under unlimited retries.
>
> That's not necessarily true in practice. By the time when -T is about to
> expire, transactions are all finished in finite time as you can see in
> the result I showed. So it's reasonable that the very last cycle of
> the benchmark will finish in finite time as well.

Your script may finish in finite time, but others may not. However, considering only serialization and deadlock errors, most transactions would finish in finite time eventually. In the previous version of the patch, errors other than serialization or deadlock could be retried, and that easily caused unlimited retrying. Now, only the two kinds of errors can be retried; nevertheless, it is unclear to me whether we can assume that retrying will finish in finite time. If we can assume it, maybe we can remove the restriction that --max-tries=0 must be used with --latency-limit or -T.

> Of course if a benchmark cycle takes infinite time, this will be a
> problem. However, the same thing can be said of non-retry benchmarks.
> Theoretically it is possible that *one* benchmark cycle takes forever.
> In this case the only solution will be just hitting ^C to terminate
> pgbench. Why can't we have the same assumption in the --max-tries=0 case?

Indeed, it is possible that an execution of a query takes a long or infinite time. However, its cause would be a problematic query in the custom script or other problems occurring on the server side. These are not problems of pgbench, and pgbench itself can't control either. On the other hand, the unlimited number of tries is a behaviour specified by the pgbench option, so I think pgbench itself should internally avoid problems caused by its behaviour. That is, if max-tries=0 could cause infinite or much longer benchmark time than the user expected due to too many retries, I think pgbench should avoid it.

> > In the sense that we don't terminate running transactions forcibly, this
> > doesn't change the existing behaviour.
>
> This statement seems to be depending on your personal assumption.

Ok. If we regard a transaction as still running even when it is being retried after an error, termination of the retry may imply terminating the running transaction forcibly.

> I still don't understand why you think that the --max-tries non 0 case
> will *certainly* finish in finite time whereas the --max-tries=0 case
> will not.

I just mean that --max-tries greater than zero will prevent pgbench from retrying a transaction forever.

Regards,
Yugo Nagata
--
Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
> Indeed, as Ishii-san pointed out, some users might not want to terminate
> retrying transactions due to -T. However, the actual negative effect is
> only printing the number of failed transactions. The other results that
> users want to know, such as tps, are almost unaffected because they are
> measured for transactions processed successfully. Actually, the percentage
> of failed transactions is very little, only 0.347%.

Well, "that's very little, let's ignore it" is not technically a right direction IMO.

> In the existing behaviour, running transactions are never terminated due
> to the -T option. However, ISTM that this is based on an assumption that
> the latency of each transaction is small and that a time when we can
> finish the benchmark will come soon. On the other hand, when transactions
> can be retried unlimitedly, it may take a longer time than expected, and
> we cannot guarantee that this will finish successfully in limited time.
> Therefore, terminating the benchmark by giving up retrying the transaction
> after time expiration seems reasonable under unlimited retries.

That's not necessarily true in practice. By the time when -T is about to expire, transactions are all finished in finite time as you can see in the result I showed. So it's reasonable that the very last cycle of the benchmark will finish in finite time as well.

Of course if a benchmark cycle takes infinite time, this will be a problem. However, the same thing can be said of non-retry benchmarks. Theoretically it is possible that *one* benchmark cycle takes forever. In this case the only solution will be just hitting ^C to terminate pgbench. Why can't we have the same assumption in the --max-tries=0 case?

> In the sense that we don't terminate running transactions forcibly, this
> doesn't change the existing behaviour.

This statement seems to be depending on your personal assumption.

I still don't understand why you think that the --max-tries non 0 case will *certainly* finish in finite time whereas the --max-tries=0 case will not.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Ishii-san, On Fri, 02 Jul 2021 09:25:03 +0900 (JST) Tatsuo Ishii wrote: > I have found an interesting result from patched pgbench (I have set > the isolation level to REPEATABLE READ): > > $ pgbench -p 11000 -c 10 -T 30 --max-tries=0 test > pgbench (15devel, server 13.3) > starting vacuum...end. > transaction type: > scaling factor: 1 > query mode: simple > number of clients: 10 > number of threads: 1 > duration: 30 s > number of transactions actually processed: 2586 > number of failed transactions: 9 (0.347%) > ~ > number of transactions retried: 1892 (72.909%) > total number of retries: 21819 > latency average = 115.551 ms (including failures) > initial connection time = 35.268 ms > tps = 86.241799 (without initial connection time) > > I ran pgbench with 10 concurrent sessions. In this case pgbench always > reports 9 failed transactions regardless the setting of -T > option. This is because at the end of a pgbench session, only 1 out of > 10 transaction succeeded but 9 transactions failed due to > serialization error without any chance to retry because -T expires. > > This is a little bit disappointed because I wanted to see a result of > all transactions succeeded with retries. I tried -t instead of -T but > -t cannot be used with --max-tries=0. > > Also I think this behavior is somewhat inconsistent with existing > behavior of pgbench. When pgbench runs without --max-tries option, > pgbench continues to run transactions even after -T expires: > > $ time pgbench -p 11000 -T 10 -f pgbench.sql test > pgbench (15devel, server 13.3) > starting vacuum...end. 
> transaction type: pgbench.sql > scaling factor: 1 > query mode: simple > number of clients: 1 > number of threads: 1 > duration: 10 s > number of transactions actually processed: 2 > maximum number of tries: 1 > latency average = 7009.006 ms > initial connection time = 8.045 ms > tps = 0.142674 (without initial connection time) > > real 0m14.067s > user 0m0.010s > sys 0m0.004s > > $ cat pgbench.sql > SELECT pg_sleep(7); > > So pgbench does not stop transactions after 10 seconds have passed but > waits for the last transaction to complete. If we want to be consistent with this > behavior when --max-tries=0, shouldn't we retry until the last > transaction finishes? I changed the previous patch so that the -T option can terminate a retrying transaction, and so that we can specify --max-tries=0 without --latency-limit if we have -T, according to the following comment. > Doc says "you cannot use an infinite number of retries without > latency-limit..." > > Why should this be forbidden? At least if the -T timeout takes precedence and > shortens the execution, ISTM that there could be good reason to test that. > Maybe it could be blocked only under -t if this would lead to a non-ending > run. Indeed, as Ishii-san pointed out, some users might not want to terminate retrying transactions due to -T. However, the actual negative effect is only printing the number of failed transactions. The other results that users want to know, such as tps, are almost unaffected because they are measured for transactions processed successfully. Actually, the percentage of failed transactions is very small, only 0.347%. In the existing behaviour, running transactions are never terminated due to the -T option. However, ISTM that this would be based on an assumption that the latency of each transaction is small and that the moment when we can finish the benchmark would come soon.
On the other hand, when transactions can be retried unlimitedly, it may take much longer than expected, and we cannot guarantee that this would finish successfully in limited time. Therefore, terminating the benchmark by giving up retrying the transaction after time expiration seems reasonable under unlimited retries. In the sense that we don't terminate running transactions forcibly, this doesn't change the existing behaviour. If you don't want to print the number of transactions failed due to -T, we can forbid using -T without latency-limit under max-tries=0 to avoid a possible never-ending benchmark. In that case, users have to limit the number of transaction retries by specifying latency-limit or max-tries (>0). However, if some users would like to benchmark while simply allowing unlimited retries, using -T and max-tries=0 seems the most straightforward way, so I think it is better that they can be used together. Regards, Yugo Nagata -- Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Ishii-san, On Thu, 01 Jul 2021 09:03:42 +0900 (JST) Tatsuo Ishii wrote: > > v13 patches gave a compiler warning... > > > > $ make >/dev/null > > pgbench.c: In function ‘commandError’: > > pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable] > > const Command *command = sql_script[st->use_file].commands[st->command]; > > ^~~ Hmm, we'll get the warning when --enable-cassert is not specified. I'll fix it. > There is a typo in the doc (more over -> moreover). > > >of all transaction tries; more over, you cannot use an unlimited > > number > > of all transaction tries; moreover, you cannot use an unlimited number > Thanks. I'll fix. Regards, Yugo Nagata -- Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
I have found an interesting result from patched pgbench (I have set the isolation level to REPEATABLE READ): $ pgbench -p 11000 -c 10 -T 30 --max-tries=0 test pgbench (15devel, server 13.3) starting vacuum...end. transaction type: scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 30 s number of transactions actually processed: 2586 number of failed transactions: 9 (0.347%) ~ number of transactions retried: 1892 (72.909%) total number of retries: 21819 latency average = 115.551 ms (including failures) initial connection time = 35.268 ms tps = 86.241799 (without initial connection time) I ran pgbench with 10 concurrent sessions. In this case pgbench always reports 9 failed transactions regardless of the setting of the -T option. This is because at the end of a pgbench session, only 1 out of 10 transactions succeeded but 9 transactions failed due to serialization errors without any chance to retry because -T expires. This is a little bit disappointing because I wanted to see a result in which all transactions succeeded with retries. I tried -t instead of -T but -t cannot be used with --max-tries=0. Also I think this behavior is somewhat inconsistent with the existing behavior of pgbench. When pgbench runs without the --max-tries option, pgbench continues to run transactions even after -T expires: $ time pgbench -p 11000 -T 10 -f pgbench.sql test pgbench (15devel, server 13.3) starting vacuum...end. transaction type: pgbench.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 10 s number of transactions actually processed: 2 maximum number of tries: 1 latency average = 7009.006 ms initial connection time = 8.045 ms tps = 0.142674 (without initial connection time) real 0m14.067s user 0m0.010s sys 0m0.004s $ cat pgbench.sql SELECT pg_sleep(7); So pgbench does not stop transactions after 10 seconds have passed but waits for the last transaction to complete.
If we want to be consistent with this behavior when --max-tries=0, shouldn't we retry until the last transaction finishes? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
> v13 patches gave a compiler warning... > > $ make >/dev/null > pgbench.c: In function ‘commandError’: > pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable] > const Command *command = sql_script[st->use_file].commands[st->command]; > ^~~ There is a typo in the doc (more over -> moreover). >of all transaction tries; more over, you cannot use an unlimited number of all transaction tries; moreover, you cannot use an unlimited number Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
> I attached the patch updated according with your suggestion. v13 patches gave a compiler warning... $ make >/dev/null pgbench.c: In function ‘commandError’: pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable] const Command *command = sql_script[st->use_file].commands[st->command]; ^~~ Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Yugo-san, Thanks for the update! Patch seems to apply cleanly with "git apply", but does not compile on my host: "undefined reference to `conditional_stack_reset'". However it works better when using the "patch". I'm wondering why git apply fails silently… Hmm, I don't know why your compilation fails... I can apply and compile successfully using git. Hmmm. Strange! Given that we manage errors, ISTM that we should not necessarily stop on other not retried errors, but rather count/report them and possibly proceed. Eg with something like: [...] We could count the fail, rollback if necessary, and go on. What do you think? Maybe such behavior would deserve an option. This feature to count failures that could occur at runtime seems nice. However, as discussed in [1], I think it is better to focus only on failures that can be retried in this patch, and introduce the feature to handle other failures in a separate patch. Ok. --report-latencies -> --report-per-command: should we keep supporting the previous option? Ok. Although now the option is not only for latencies, considering users who are using the existing option, I'm fine with this. I reverted this to the previous name. Hmmm. I liked the new name! My point was whether we need to support the old one as well for compatibility, or whether we should not bother. I'm still wondering. As I think that the new name is better, I'd suggest to keep it. --failures-detailed: if we bother to run with handling failures, should it always be on? If we print other failures that cannot be retried in future, it could print a lot of lines and might make some users who don't need details of failures annoyed. Moreover, some users would always need detailed failure information in the log, and others would need only total numbers of failures. Ok.
Currently we handle only serialization and deadlock failures, so the number of lines printed and the number of columns of logging is not large even under --failures-detailed, but if we have a chance to handle other failures in future, ISTM adding this option makes sense considering users who would like simple outputs. Hmmm. What kind of failures could be managed with retries? I guess that on a connection failure we can try to reconnect, but otherwise it is less clear that other failures make sense to retry. --debug-errors: I'm not sure we should want a special debug mode for that, I'd consider integrating it with the standard debug, or just for development. I think --debug is a debug option for telling users pgbench's internal behaviors, that is, which client is doing what. On the other hand, --debug-errors is for telling users what error caused a retry or a failure in detail. For users who are not interested in pgbench's internal behavior (sending a command, receiving a result, ... ) but interested in actual errors raised while running a script, this option seems useful. Ok. Then this is not really about debug per se, but a verbosity setting? Maybe --verbose-errors would make more sense? I'm unsure. I'll think about it. Also, should it use pg_log_debug? If we use pg_log_debug, the message is printed only under --debug. Therefore, I changed it to use pg_log_info instead of pg_log_error or fprintf. Ok, pg_log_info seems right. Tries vs retries: I'm at odds with having tries & retries and + 1 here and there to handle that, which is a little bit confusing. I'm wondering whether we could only count "tries" and adjust to report what we want later? I changed it to use "tries" instead of "retries" in CState. However, we still use "retries" in StatsData and Command because the number of retries is printed in the final result. Is it less confusing than before? I'm going to think about it. advanceConnectionState: ISTM that ERROR should logically be before others which lead to it.
Sorry, I couldn't understand your suggestion. Is this about the order of case statements or pg_log_error? My sentence got mixed up. My point was about the case order, so that they are put in a more logical order when reading all the cases. Currently, ISTM that the retry on error mode is implicitly always on. Do we want that? I'd say yes, but maybe people could disagree. The default value of max-tries is 1, so the retry on error is off. Failed transactions are retried only when the user wants it and specifies a valid value for max-tries. Ok. My point is that we do not stop on such errors, whereas before ISTM that we would have stopped, so somehow the default behavior has changed and the previous behavior cannot be reinstated with an option. Maybe that is not bad, but this is a behavioral change which needs to be documented and argued for. See the attached files for generating deadlocks reliably (start with 2 clients). What do you think? The PL/pgSQL one is minimal; it is really client-code oriented. Sorry, but I cannot find the attached file. Sorry. Attached to this mail. The serialization stuff
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Fabien, On Sat, 26 Jun 2021 12:15:38 +0200 (CEST) Fabien COELHO wrote: > > Hello Yugo-san, > > # About v12.2 > > ## Compilation > > Patch seems to apply cleanly with "git apply", but does not compile on my > host: "undefined reference to `conditional_stack_reset'". > > However it works better when using the "patch". I'm wondering why git > apply fails silently… Hmm, I don't know why your compilation fails... I can apply and compile successfully using git. > When compiling there are warnings about "pg_log_fatal", which does not > expect a FILE* on pgbench.c:4453. Remove the "stderr" argument. Ok. > Global and local checks ok. > > > number of transactions failed: 324 (3.240%) > > ... > > number of transactions retried: 5629 (56.290%) > > number of total retries: 103299 > > I'd suggest: "number of failed transactions". "total number of retries" or > just "number of retries"? Ok. I changed these to use "number of failed transactions" and "total number of retries". > ## Feature > > The overall code structure changes to implement the feature seem > reasonable to me, as we are at the 12th iteration of the patch. > > Comments below are somehow about details and asking questions > about choices, and commenting… > > ## Documentation > > There is a lot of documentation, which is good. I'll review these > separately. It looks good, but having a native English speaker/writer > would really help! > > Some output examples do not correspond to actual output for > the current version. In particular, there is always one TPS figure > given now, instead of the confusing two shown before. Fixed. > ## Comments > > transactinos -> transactions. Fixed. > ## Code > > By default max_tries = 0. Should not the initialization be 1, > as the documentation argues that it is the default? Ok. I fixed the default value to 1. > Counter comments, missing + in the formula on the skipped line. Fixed.
> Given that we manage errors, ISTM that we should not necessarily > stop on other not retried errors, but rather count/report them and > possibly proceed. Eg with something like: > >-- server side random fail >DO LANGUAGE plpgsql $$ >BEGIN > IF RANDOM() < 0.1 THEN >RAISE EXCEPTION 'unlucky!'; > END IF; >END; >$$; > > Or: > >-- client side random fail >BEGIN; >\if random(1, 10) <= 1 >SELECT 1 +; >\else >SELECT 2; >\endif >COMMIT; > > We could count the fail, rollback if necessary, and go on. What do you think? > Maybe such behavior would deserve an option. This feature to count failures that could occur at runtime seems nice. However, as discussed in [1], I think it is better to focus only on failures that can be retried in this patch, and introduce the feature to handle other failures in a separate patch. [1] https://www.postgresql.org/message-id/alpine.DEB.2.21.1809121519590.13887%40lancre > --report-latencies -> --report-per-command: should we keep supporting > the previous option? Ok. Although now the option is not only for latencies, considering users who are using the existing option, I'm fine with this. I reverted this to the previous name. > --failures-detailed: if we bother to run with handling failures, should > it always be on? If we print other failures that cannot be retried in future, it could print a lot of lines and might make some users who don't need details of failures annoyed. Moreover, some users would always need detailed failure information in the log, and others would need only total numbers of failures. Currently we handle only serialization and deadlock failures, so the number of lines printed and the number of columns of logging is not large even under --failures-detailed, but if we have a chance to handle other failures in future, ISTM adding this option makes sense considering users who would like simple outputs.
> --debug-errors: I'm not sure we should want a special debug mode for that, > I'd consider integrating it with the standard debug, or just for development. I think --debug is a debug option for telling users the pgbench's internal behaviors, that is, which client is doing what. On other hand, --debug-errors is for telling users what error caused a retry or a failure in detail. For users who are not interested in pgbench's internal behavior (sending a command, receiving a result, ... ) but interested in actual errors raised during running script, this option seems useful. > Also, should it use pg_log_debug? If we use pg_log_debug, the message is printed only under --debug. Therefore, I fixed to use pg_log_info instead of pg_log_error or fprintf. > doRetry: I'd separate the 3 no retries options instead of mixing max_tries and > timer_exceeeded, for clarity. Ok. I fixed to separate them. > Tries vs retries: I'm at odds with having tries & retries and + 1 here > and there to handle that, which is a little bit confusing. I'm wondering > whether > we could only count "tries" and adjust
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Yugo-san, # About v12.2 ## Compilation Patch seems to apply cleanly with "git apply", but does not compile on my host: "undefined reference to `conditional_stack_reset'". However it works better when using the "patch". I'm wondering why git apply fails silently… When compiling there are warnings about "pg_log_fatal", which does not expect a FILE* on pgbench.c:4453. Remove the "stderr" argument. Global and local checks ok. number of transactions failed: 324 (3.240%) ... number of transactions retried: 5629 (56.290%) number of total retries: 103299 I'd suggest: "number of failed transactions". "total number of retries" or just "number of retries"? ## Feature The overall code structure changes to implement the feature seem reasonable to me, as we are at the 12th iteration of the patch. Comments below are somehow about details and asking questions about choices, and commenting… ## Documentation There is a lot of documentation, which is good. I'll review these separately. It looks good, but having a native English speaker/writer would really help! Some output examples do not correspond to actual output for the current version. In particular, there is always one TPS figure given now, instead of the confusing two shown before. ## Comments transactinos -> transactions. ## Code By default max_tries = 0. Should not the initialization be 1, as the documentation argues that it is the default? Counter comments, missing + in the formula on the skipped line. Given that we manage errors, ISTM that we should not necessarily stop on other not retried errors, but rather count/report them and possibly proceed. Eg with something like: -- server side random fail DO LANGUAGE plpgsql $$ BEGIN IF RANDOM() < 0.1 THEN RAISE EXCEPTION 'unlucky!'; END IF; END; $$; Or: -- client side random fail BEGIN; \if random(1, 10) <= 1 SELECT 1 +; \else SELECT 2; \endif COMMIT; We could count the fail, rollback if necessary, and go on. What do you think? Maybe such behavior would deserve an option.
--report-latencies -> --report-per-command: should we keep supporting the previous option? --failures-detailed: if we bother to run with handling failures, should it always be on? --debug-errors: I'm not sure we should want a special debug mode for that, I'd consider integrating it with the standard debug, or just for development. Also, should it use pg_log_debug? doRetry: I'd separate the 3 no retries options instead of mixing max_tries and timer_exceeded, for clarity. Tries vs retries: I'm at odds with having tries & retries and + 1 here and there to handle that, which is a little bit confusing. I'm wondering whether we could only count "tries" and adjust to report what we want later? advanceConnectionState: ISTM that ERROR should logically be before others which lead to it. Variables management: it looks expensive, with copying and freeing variable arrays. I'm wondering whether we should think of something more clever. Well, that would be for some other patch. "Accumulate the retries" -> "Count (re)tries"? Currently, ISTM that the retry on error mode is implicitly always on. Do we want that? I'd say yes, but maybe people could disagree. ## Tests There are tests, good! I'm wondering whether something simpler could be devised to trigger serialization or deadlock errors, eg with a SEQUENCE and an \if. See the attached files for generating deadlocks reliably (start with 2 clients). What do you think? The PL/pgSQL one is minimal; it is really client-code oriented. Given that deadlocks are detected about every second, the test runs would take some time. Let it be for now. -- Fabien.
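For readers without the attachment, a typical way to provoke deadlocks reliably with two clients is to have each transaction lock the same two rows in a random order. This is an illustrative sketch in the spirit of the attached script, not the attachment itself; the table name `dl` and the setup line are made up:

```sql
-- Hypothetical pgbench script for provoking deadlocks with two clients.
-- One-time setup: CREATE TABLE dl(id int PRIMARY KEY); INSERT INTO dl VALUES (1), (2);
\set a random(1, 2)
\set b 3 - :a
BEGIN;
UPDATE dl SET id = id WHERE id = :a;
SELECT pg_sleep(0.1);   -- widen the window for the opposite lock order
UPDATE dl SET id = id WHERE id = :b;
COMMIT;
```

Run with something like `pgbench -f deadlock.sql -c 2 -T 10 --max-tries=10 test`: with two clients regularly taking the row locks in opposite orders, one transaction is cancelled with a deadlock error about once per deadlock_timeout, which the retry logic can then pick up.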
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Yugo-san, I'm wondering whether we could use "vars" instead of "variables" as a struct field name and function parameter name, so that it is shorter and more distinct from the type name "Variables". What do you think? The struct "Variables" has a field named "vars" which is an array of "Variable" type. I guess this is the reason why "variables" is used instead of "vars" as the name of a "Variables" type variable, so that we can tell whether a variable's type is Variable or Variables. Also, in order to refer to the field, we would use vars->vars[vars->nvars] and there are nested "vars". Could this confuse a code reader? Hmmm… Probably. Let's keep "variables" then. -- Fabien.
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Wed, 23 Jun 2021 10:38:43 +0200 (CEST) Fabien COELHO wrote: > > Hello Yugo-san: > > # About v12.1 > > This is a refactoring patch, which creates a separate structure for > holding variables. This will become handy in the next patch. There is also > a benefit from a software engineering point of view, so it has merit on > its own. > ## Compilation > > Patch applies cleanly, compiles, global & local checks pass. > > ## About the code > > Fine. > > I'm wondering whether we could use "vars" instead of "variables" as a > struct field name and function parameter name, so that it is shorter and > more distinct from the type name "Variables". What do you think? The struct "Variables" has a field named "vars" which is an array of "Variable" type. I guess this is the reason why "variables" is used instead of "vars" as the name of a "Variables" type variable, so that we can tell whether a variable's type is Variable or Variables. Also, in order to refer to the field, we would use vars->vars[vars->nvars] and there are nested "vars". Could this confuse a code reader? > ## About comments > > Remove the comment on enlargeVariables about "It is assumed …" the issue > of trying MAXINT vars is more than remote and is not worth mentioning. In > the same function, remove the comments about MARGIN, it is already on the > macro declaration, once is enough. Sure. I'll remove them. Regards, Yugo Nagata -- Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Yugo-san: # About v12.1 This is a refactoring patch, which creates a separate structure for holding variables. This will become handy in the next patch. There is also a benefit from a software engineering point of view, so it has merit on its own. ## Compilation Patch applies cleanly, compiles, global & local checks pass. ## About the code Fine. I'm wondering whether we could use "vars" instead of "variables" as a struct field name and function parameter name, so that it is shorter and more distinct from the type name "Variables". What do you think? ## About comments Remove the comment on enlargeVariables about "It is assumed …"; the issue of trying MAXINT vars is more than remote and is not worth mentioning. In the same function, remove the comments about MARGIN, it is already on the macro declaration, once is enough. -- Fabien.
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Fabien, On Tue, 22 Jun 2021 20:03:58 +0200 (CEST) Fabien COELHO wrote: > > Hello Yugo-san, > > Thanks a lot for continuing this work started by Marina! > > I'm planning to review it for the July CF. I've just added an entry there: > > https://commitfest.postgresql.org/33/3194/ Thanks! -- Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Yugo-san, Thanks a lot for continuing this work started by Marina! I'm planning to review it for the July CF. I've just added an entry there: https://commitfest.postgresql.org/33/3194/ -- Fabien.
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hi hackers, On Mon, 24 May 2021 11:29:10 +0900 Yugo NAGATA wrote: > Hi hackers, > > On Tue, 10 Mar 2020 09:48:23 +1300 > Thomas Munro wrote: > > > On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO wrote: > > > >> Thank you very much! I'm going to send a new patch set until the end of > > > >> this week (I'm sorry I was very busy in the release of Postgres Pro > > > >> 11...). > > > > > > > > Is anyone interested in rebasing this, and summarising what needs to > > > > be done to get it in? It's arguably a bug or at least quite > > > > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard > > > > that a couple of forks already ship Marina's patch set. > > I got interested in this and am now looking into the patch and the past > discussion. > If no one else is going to do it and there are no objections, I would like to rebase > this. Is that okay? I rebased and fixed the previous patches (v11) written by Marina Polyakova, and attached the revised version (v12). v12-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures). v12-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch - the main patch for handling client errors and repetition of transactions with serialization/deadlock failures (see the detailed description in the file). These are the revised versions of v11-0002 and v11-0003. v11-0001 (for the RandomState structure) is not included because this has already been committed (40923191944). v11-0004 (for a separate error reporting function) is not included either because pgbench now uses common logging APIs (30a3e772b40). In addition to rebasing on master, I updated the patch according to the review from Fabien COELHO [1] and the discussions after it. Also, I added some other fixes through my review of the previous patch.
[1] https://www.postgresql.org/message-id/alpine.DEB.2.21.1809081450100.10506%40lancre Following are fixes according to Fabien's review. > * Features > As far as the actual retry feature is concerned, I'd say we are nearly > there. However I have an issue with changing the behavior on meta command > and other sql errors, which I find not desirable. ... > I do not think that these changes of behavior are desirable. Meta command and > miscellaneous SQL errors should result in immediately aborting the whole run, > because the client test code itself could not run correctly or the SQL sent > was somehow wrong, which is also the client's fault, and the server > performance bench does not make much sense in such conditions. > > ISTM that the focus of this patch should only be to handle some server > runtime errors that can be retried, but not to change pgbench behavior on > other kinds of errors. If these are to be changed, ISTM that it would be a > distinct patch and would require some discussion, and possibly an option > to enable it or not if some use case emerges. As far as this patch is concerned, > I'd suggest to leave that out. Previously, all SQL and meta command errors could be retried, but I changed this so that only serialization & deadlock errors can be retried. > Doc says "you cannot use an infinite number of retries without > latency-limit..." > > Why should this be forbidden? At least if the -T timeout takes precedence and > shortens the execution, ISTM that there could be good reason to test that. > Maybe it could be blocked only under -t if this would lead to a non-ending > run. I changed this to allow --max-tries to be used with the -T option even if latency-limit is not used. > As "--print-errors" is really for debug, maybe it could be named > "--debug-errors". I'm not sure that having "--debug" implying this option > is useful: As there are two distinct options, the user may be allowed > to trigger one or the other as they wish? print-errors was renamed to debug-errors.
> makeVariableValue error message is not for debug, but must be kept in all > cases, and the false returned must result in an immediate abort. Same thing > about > lookupCreateVariable, an invalid name is a user error which warrants an > immediate > abort. Same thing again about coerce* functions or evalStandardFunc... > Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable. "DEBUG_ERRORS" messages unrelated to serialization & deadlock errors were removed. > sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff > is > really about the transaction script, not dealing with errors, esp as there is > no > significant advantage in preparing a "ROLLBACK" statement which is short and > has > no parameters. I'd suggest to remove this function and just issue > PQsendQuery("ROLLBACK;") in all cases. Now, we just issue PQsendQuery("ROLLBACK;"). > In copyVariables, I'd simplify > > + if (source_var->svalue == NULL) > + dest_var->svalue = NULL; > + else > + dest_var->svalue =