Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hi hackers,

On Tue, 10 Mar 2020 09:48:23 +1300 Thomas Munro wrote:
> On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO wrote:
> > >> Thank you very much! I'm going to send a new patch set until the end of
> > >> this week (I'm sorry I was very busy in the release of Postgres Pro
> > >> 11...).
> > >
> > > Is anyone interested in rebasing this, and summarising what needs to
> > > be done to get it in? It's arguably a bug or at least quite
> > > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > > that a couple of forks already ship Marina's patch set.

I got interested in this and am now looking into the patch and the past
discussion. If no one else will do it and there are no objections, I would
like to rebase this. Is that okay?

Regards,
Yugo NAGATA

> > I'm a reviewer on this patch, that I find a good thing (tm), and which was
> > converging to a reasonable and simple enough addition, IMHO.
> >
> > If I proceed in place of Marina, who is going to do the reviews?
>
> Hi Fabien,
>
> Cool. I'll definitely take it for a spin if you post a fresh patch
> set. Any place that we arbitrarily don't support SERIALIZABLE, I
> consider a bug, so I'd like to commit this if we can agree it's ready.
> It sounds like it's actually in pretty good shape.

--
Yugo NAGATA
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO wrote:
> >> Thank you very much! I'm going to send a new patch set until the end of
> >> this week (I'm sorry I was very busy in the release of Postgres Pro
> >> 11...).
> >
> > Is anyone interested in rebasing this, and summarising what needs to
> > be done to get it in? It's arguably a bug or at least quite
> > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > that a couple of forks already ship Marina's patch set.
>
> I'm a reviewer on this patch, that I find a good thing (tm), and which was
> converging to a reasonable and simple enough addition, IMHO.
>
> If I proceed in place of Marina, who is going to do the reviews?

Hi Fabien,

Cool. I'll definitely take it for a spin if you post a fresh patch
set. Any place that we arbitrarily don't support SERIALIZABLE, I
consider a bug, so I'd like to commit this if we can agree it's ready.
It sounds like it's actually in pretty good shape.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Thomas,

>> Thank you very much! I'm going to send a new patch set until the end of
>> this week (I'm sorry I was very busy in the release of Postgres Pro
>> 11...).
>
> Is anyone interested in rebasing this, and summarising what needs to
> be done to get it in? It's arguably a bug or at least quite
> unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> that a couple of forks already ship Marina's patch set.

I'm a reviewer on this patch, that I find a good thing (tm), and which was
converging to a reasonable and simple enough addition, IMHO.

If I proceed in place of Marina, who is going to do the reviews?

--
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Mon, Mar 9, 2020 at 10:00 AM Marina Polyakova wrote:
> On 2018-11-16 22:59, Alvaro Herrera wrote:
> > On 2018-Sep-05, Marina Polyakova wrote:
> >
> >> v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> >> - a patch for the RandomState structure (this is used to reset a
> >> client's random seed during the repeating of transactions after
> >> serialization/deadlock failures).
> >
> > Pushed this one with minor stylistic changes (the most notable of which
> > is the move of initRandomState to where the rest of the random generator
> > infrastructure is, instead of in a totally random place). Thanks,
>
> Thank you very much! I'm going to send a new patch set until the end of
> this week (I'm sorry I was very busy in the release of Postgres Pro
> 11...).

Is anyone interested in rebasing this, and summarising what needs to
be done to get it in? It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> Feel free to update a patch status to "needs review" yourself after
> submitting a new version that in your opinion responds to a reviewer's
> comments.

Sure, I will do that. But I will not switch any of my patches to "Ready".

AFAICR the concerns were mostly about imprecise comments in the code, and
a few questions that I answered.

--
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 2018-Nov-19, Fabien COELHO wrote:

> Hello Alvaro,
>
> > I also think that the pgbench_error() patch should go in before the main
> > one. It seems a bit pointless to introduce code using a bad API only to
> > fix the API together with all the new callers immediately afterwards.
>
> I'm not that keen on this part of the patch, because ISTM that it
> introduces significant and possibly costly malloc/free cycles when
> handling errors, which do not currently exist in pgbench.

Oh, I wasn't aware of that.

> Related to Marina's patch (triggered by reviewing the patches), I have
> submitted a refactoring patch which aims at cleaning up the internal state
> machine, so that additions and checking that all is well is simpler.
>
> https://commitfest.postgresql.org/20/1754/

Let me look at this one.

> It has been reviewed, I think I answered the reviewer's concerns, but the
> reviewer did not update the patch state on the cf app, so I do not know
> whether he is unsatisfied or if it was just forgotten.

Feel free to update a patch status to "needs review" yourself after
submitting a new version that in your opinion responds to a reviewer's
comments.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Alvaro,

> I also think that the pgbench_error() patch should go in before the main
> one. It seems a bit pointless to introduce code using a bad API only to
> fix the API together with all the new callers immediately afterwards.

I'm not that keen on this part of the patch, because ISTM that it
introduces significant and possibly costly malloc/free cycles when handling
errors, which do not currently exist in pgbench. Previously an error was
basically the end of the script, but with the feature being introduced by
Marina some errors are handled, in which case we end up paying these costs
in the test loop. Also, refactoring error handling is not necessary for the
new feature. That is why I advised moving it away and possibly keeping it
for later.

Related to Marina's patch (triggered by reviewing the patches), I have
submitted a refactoring patch which aims at cleaning up the internal state
machine, so that additions and checking that all is well is simpler.

https://commitfest.postgresql.org/20/1754/

It has been reviewed, I think I answered the reviewer's concerns, but the
reviewer did not update the patch state on the cf app, so I do not know
whether he is unsatisfied or if it was just forgotten.

--
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 2018-Nov-19, Marina Polyakova wrote:
> On 2018-11-16 22:59, Alvaro Herrera wrote:
> > On 2018-Sep-05, Marina Polyakova wrote:
> >
> > > v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> > > - a patch for the RandomState structure (this is used to reset a
> > > client's random seed during the repeating of transactions after
> > > serialization/deadlock failures).
> >
> > Pushed this one with minor stylistic changes (the most notable of which
> > is the move of initRandomState to where the rest of the random generator
> > infrastructure is, instead of in a totally random place). Thanks,
>
> Thank you very much! I'm going to send a new patch set until the end of
> this week (I'm sorry I was very busy in the release of Postgres Pro
> 11...).

Great, thanks.

I also think that the pgbench_error() patch should go in before the main
one. It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 2018-11-16 22:59, Alvaro Herrera wrote:
> On 2018-Sep-05, Marina Polyakova wrote:
>
> > v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> > - a patch for the RandomState structure (this is used to reset a
> > client's random seed during the repeating of transactions after
> > serialization/deadlock failures).
>
> Pushed this one with minor stylistic changes (the most notable of which
> is the move of initRandomState to where the rest of the random generator
> infrastructure is, instead of in a totally random place). Thanks,

Thank you very much! I'm going to send a new patch set until the end of
this week (I'm sorry I was very busy in the release of Postgres Pro 11...).

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 2018-Sep-05, Marina Polyakova wrote:

> v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> - a patch for the RandomState structure (this is used to reset a client's
> random seed during the repeating of transactions after
> serialization/deadlock failures).

Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place). Thanks,

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Wed, Sep 12, 2018 at 06:12:29PM +0300, Marina Polyakova wrote:
> The discussion about this has become entangled from the beginning, because
> as I wrote in [1] at first I misread your original proposal...

The last emails are about the last reviews of Fabien, which have remained
unanswered for the last couple of weeks. I am marking this patch as
returned with feedback for now.

--
Michael
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 12-09-2018 17:04, Fabien COELHO wrote:
> Hello Marina,
>
> > You can get other errors that cannot happen for only one client if you
> > use shell commands in meta commands:
> > Or if you use untrusted procedural languages in SQL expressions (see
> > the used file in the attachments):
> > Or if you try to create a function and perhaps replace an existing one:
>
> Sure. Indeed there can be shell errors, perl errors, create function
> conflicts... I do not understand what your point is wrt these.
>
> I'm mostly saying that your patch should focus on implementing the retry
> feature when appropriate, and avoid changing the behavior (error
> displayed, abort or not) on features unrelated to serialization &
> deadlock errors.
>
> Maybe there are inconsistencies, and "bugs"/"features" worth fixing, but
> if so that should be a separate patch, if possible, and if these are bugs
> they could be backpatched.
>
> For now I'm still convinced that pgbench should keep on aborting on
> "\set" or SQL syntax errors, and show clear error messages on these, and
> your examples have not changed my mind on that point.
>
> > > I'm fine with renaming the field if it makes things clearer. They are
> > > all counters, so naming them "cnt" or "total_cnt" does not help much.
> > > Maybe "succeeded" or "success" to show what is really counted?
> >
> > Perhaps renaming StatsData.cnt is better than just adding a comment to
> > this field. But IMO we have the same problem (they are all counters, so
> > naming them "cnt" or "total_cnt" does not help much) for CState.cnt,
> > which cannot be named in the same way because it also includes skipped
> > and failed transactions.
>
> Hmmm. CState's cnt seems only used to implement -t anyway? I'm okay if it
> has a different name, esp if it has different semantics.

Ok!

> I think I was arguing only about cnt in StatsData.

The discussion about this has become entangled from the beginning, because
as I wrote in [1] at first I misread your original proposal...

[1] https://www.postgresql.org/message-id/d318cdee8f96de6b1caf2ce684ffe4db%40postgrespro.ru

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

> You can get other errors that cannot happen for only one client if you
> use shell commands in meta commands:
> Or if you use untrusted procedural languages in SQL expressions (see the
> used file in the attachments):
> Or if you try to create a function and perhaps replace an existing one:

Sure. Indeed there can be shell errors, perl errors, create function
conflicts... I do not understand what your point is wrt these.

I'm mostly saying that your patch should focus on implementing the retry
feature when appropriate, and avoid changing the behavior (error displayed,
abort or not) on features unrelated to serialization & deadlock errors.

Maybe there are inconsistencies, and "bugs"/"features" worth fixing, but if
so that should be a separate patch, if possible, and if these are bugs they
could be backpatched.

For now I'm still convinced that pgbench should keep on aborting on "\set"
or SQL syntax errors, and show clear error messages on these, and your
examples have not changed my mind on that point.

> > I'm fine with renaming the field if it makes things clearer. They are
> > all counters, so naming them "cnt" or "total_cnt" does not help much.
> > Maybe "succeeded" or "success" to show what is really counted?
>
> Perhaps renaming StatsData.cnt is better than just adding a comment to
> this field. But IMO we have the same problem (they are all counters, so
> naming them "cnt" or "total_cnt" does not help much) for CState.cnt,
> which cannot be named in the same way because it also includes skipped
> and failed transactions.

Hmmm. CState's cnt seems only used to implement -t anyway? I'm okay if it
has a different name, esp if it has different semantics.

I think I was arguing only about cnt in StatsData.

--
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-09-2018 18:29, Fabien COELHO wrote:
> Hello Marina,
>
> > Hmm, but we can say the same for serialization or deadlock errors that
> > were not retried (the client test code itself could not run correctly
> > or the SQL sent was somehow wrong, which is also the client's fault),
> > can't we?
>
> I think not.
>
> If a client asks for something "legal", but some other client in parallel
> happens to make an incompatible change which results in a serialization
> or deadlock error, the clients are not responsible for the raised errors,
> it is just that they happen to ask for something incompatible at the same
> time. So there is no user error per se, but the server is reporting its
> (temporary) inability to process what was asked for. For these errors,
> retrying is fine. If the client was alone, there would be no such errors,
> you cannot deadlock with yourself. This is really an isolation issue
> linked to parallel execution.

You can get other errors that cannot happen for only one client if you use
shell commands in meta commands:

starting vacuum...end.
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 6.953 ms
tps = 287.630161 (including connections establishing)
tps = 303.232242 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.636  0  BEGIN;
         1.497  0  \setshell var mkdir my_directory && echo 1
         0.007  0  \sleep 1 us
         1.465  0  \setshell var rmdir my_directory && echo 1
         1.622  0  END;

starting vacuum...end.
mkdir: cannot create directory 'my_directory': File exists
mkdir: could not read result of shell command
client 1 got an error in command 1 (setshell) of script 0; execution of
meta-command failed
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 19/20
number of failures: 1 (5.000%)
number of meta-command failures: 1 (5.000%)
maximum number of tries: 1
latency average = 11.782 ms (including failures)
tps = 161.269033 (including connections establishing)
tps = 167.733278 (excluding connections establishing)
statement latencies in milliseconds and failures:
         2.731  0  BEGIN;
         2.909  1  \setshell var mkdir my_directory && echo 1
         0.231  0  \sleep 1 us
         2.366  0  \setshell var rmdir my_directory && echo 1
         2.664  0  END;

Or if you use untrusted procedural languages in SQL expressions (see the
used file in the attachments):

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
client 1 got an error in command 0 (SQL) of script 0; ERROR:  could not
create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block
client 1 got an error in command 0 (SQL) of script 0; ERROR:  could not
create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 18/20
number of failures: 2 (10.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 2 (10.000%)
maximum number of tries: 1
latency average = 3.282 ms (including failures)
tps = 548.437196 (including connections establishing)
tps = 637.662753 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.566  2  DO $$

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 2.760 ms
tps = 724.746078 (including connections establishing)
tps = 853.131985 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.893  0  DO $$

Or if you try to create a function and perhaps replace an existing one:

starting vacuum...end.
client 0
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

> Hmm, but we can say the same for serialization or deadlock errors that
> were not retried (the client test code itself could not run correctly or
> the SQL sent was somehow wrong, which is also the client's fault), can't
> we?

I think not.

If a client asks for something "legal", but some other client in parallel
happens to make an incompatible change which results in a serialization or
deadlock error, the clients are not responsible for the raised errors, it
is just that they happen to ask for something incompatible at the same
time. So there is no user error per se, but the server is reporting its
(temporary) inability to process what was asked for. For these errors,
retrying is fine. If the client was alone, there would be no such errors,
you cannot deadlock with yourself. This is really an isolation issue
linked to parallel execution.

> Why not handle client errors that can occur (but they may also not occur)
> the same way? (For example, always abort the client, or conversely do not
> make aborts in these cases.) Here's an example of such an error:
>
> client 5 got an error in command 1 (SQL) of script 0; ERROR:  division
> by zero

This is an interesting case. For me we must stop the script because the
client is asking for something "stupid", and retrying the same won't change
the outcome, the division will still be by zero. It is the client's
responsibility not to ask for something stupid, the bench script is buggy,
it should not submit illegal SQL queries. This is quite different from
submitting something legal which happens to fail.

> Maybe we can limit the number of failures in one statement, and abort the
> client if this limit is exceeded?...

I think this is quite debatable, and that the best option is to leave this
point out of the current patch, so that we could have retry on
serialization/deadlock errors. Then you can submit another patch for a
feature about other errors if you feel that there is a use case for going
on in some cases.

I think that the previous behavior made sense, and that changing it should
only be considered as an option. As it involves discussing and is not
obvious, later is better.

> To get a clue about the actual issue you can use the options
> --failures-detailed (to find out whether this is a serialization failure
> / deadlock failure / other SQL failure / meta command failure) and/or
> --print-errors (to get the complete error message).

Yep, but for me it should have stopped immediately, as it did before.

> If you use the option --latency-limit, the time of tries will be limited
> regardless of the use of the option -t. Therefore ISTM that an unlimited
> number of tries can be used only if the time of tries is limited by the
> options -T and/or -L.

Indeed, I'm ok with forbidding unlimited retries when under -t.

> > I'm not sure that having "--debug" implying this option is useful: As
> > there are two distinct options, the user may be allowed to trigger one
> > or the other as they wish?
>
> I'm not sure that the main debugging output will give a good clue of
> what's happened without full messages about errors, retries and
> failures...

I'm more arguing about letting the user decide what they want.

> These lines are quite long - do you suggest to wrap them this way?

Sure, if it is too long, then wrap.

> > Function getTransactionStatus name does not seem to correspond fully
> > to what the function does. There is a passthru case which should be
> > either avoided or clearly commented.
>
> I don't quite understand you - do you mean that in fact this function
> finds out whether we are in a (failed) transaction block or not? Or do
> you mean that the case of PQTRANS_INTRANS is also ok?...

The former: although the function is named "getTransactionStatus", it does
not really return the "status" of the transaction (aka PQstatus()?).

> I tried not to break the limit of 80 characters, but if you think that
> this is better, I'll change it.

Hmmm. 80 columns, indeed...

> > I'd insist in a comment that "cnt" does not include "skipped"
> > transactions (anymore).
>
> If you mean CState.cnt I'm not sure if this is practically useful because
> the code uses only the sum of all client transactions including skipped
> and failed... Maybe we can rename this field to nxacts or total_cnt?

I'm fine with renaming the field if it makes things clearer. They are all
counters, so naming them "cnt" or "total_cnt" does not help much. Maybe
"succeeded" or "success" to show what is really counted?

--
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-09-2018 16:47, Marina Polyakova wrote:
> On 08-09-2018 16:03, Fabien COELHO wrote:
> > Hello Marina,
> >
> > I'd insist in a comment that "cnt" does not include "skipped"
> > transactions (anymore).
>
> If you mean CState.cnt I'm not sure if this is practically useful because
> the code uses only the sum of all client transactions including skipped
> and failed... Maybe we can rename this field to nxacts or total_cnt?

Sorry, I misread your proposal for the first time. Ok!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 08-09-2018 16:03, Fabien COELHO wrote:
> Hello Marina,
>
> > v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
> > - the main patch for handling client errors and repetition of
> > transactions with serialization/deadlock failures (see the detailed
> > description in the file).
>
> About patch v11-3.
>
> Patch applies cleanly on top of the other two. Compiles, global and local
> "make check" are ok.

:-)

> * Features
>
> As far as the actual retry feature is concerned, I'd say we are nearly
> there. However I have an issue with changing the behavior on meta command
> and other sql errors, which I find not desirable.
>
> When a meta-command fails, before the patch the command is aborted and
> there is a convenient error message:
>
>   sh> pgbench -T 10 -f bad-meta.sql
>   bad-meta.sql:1: unexpected function name (false) in command "set"
>   [...] \set i false + 1 [...]
>
> After the patch it is simply counted, pgbench loops on the same error
> till the time is completed, and there is no clue about the actual issue:
>
>   sh> pgbench -T 10 -f bad-meta.sql
>   starting vacuum...end.
>   transaction type: bad-meta.sql
>   duration: 10 s
>   number of transactions actually processed: 0
>   number of failures: 27993953 (100.000%)
>   ...
>
> Same thing about SQL errors, an immediate abort...
>
>   sh> pgbench -T 10 -f bad-sql.sql
>   starting vacuum...end.
>   client 0 aborted in command 0 of script 0; ERROR:  syntax error at or
>   near ";"
>   LINE 1: SELECT 1 + ;
>   ...
>
> ... is turned into counting without aborting nor error messages, so that
> there is no clue that the user was asking for something bad.
>
>   sh> pgbench -T 10 -f bad-sql.sql
>   starting vacuum...end.
>   transaction type: bad-sql.sql
>   scaling factor: 1
>   query mode: simple
>   number of clients: 1
>   number of threads: 1
>   duration: 10 s
>   number of transactions actually processed: 0
>   number of failures: 274617 (100.000%)
>   # no clue that there was a syntax error in the script
>
> I do not think that these changes of behavior are desirable. Meta command
> and miscellaneous SQL errors should result in immediately aborting the
> whole run, because the client test code itself could not run correctly or
> the SQL sent was somehow wrong, which is also the client's fault, and the
> server performance bench does not make much sense in such conditions.
>
> ISTM that the focus of this patch should only be to handle some server
> runtime errors that can be retried, but not to change pgbench behavior on
> other kinds of errors. If these are to be changed, ISTM that it would be
> a distinct patch and would require some discussion, and possibly an
> option to enable it or not if some use case emerges. As far as this patch
> is concerned, I'd suggest to leave that out.

...

> The following remarks are linked to the change of behavior discussed
> above: makeVariableValue error message is not for debug, but must be kept
> in all cases, and the false returned must result in an immediate abort.
> Same thing about lookupCreateVariable, an invalid name is a user error
> which warrants an immediate abort. Same thing again about coerce*
> functions or evalStandardFunc... Basically, most/all added "debug_level
> >= DEBUG_ERRORS" are not desirable.

Hmm, but we can say the same for serialization or deadlock errors that
were not retried (the client test code itself could not run correctly or
the SQL sent was somehow wrong, which is also the client's fault), can't
we? Why not handle client errors that can occur (but they may also not
occur) the same way? (For example, always abort the client, or conversely
do not make aborts in these cases.) Here's an example of such an error:

starting vacuum...end.
transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2500/2500
maximum number of tries: 1
latency average = 0.375 ms
tps = 26695.292848 (including connections establishing)
tps = 27489.678525 (excluding connections establishing)
statement latencies in milliseconds and failures:
         0.001  0  \set divider random(-1000, 1000)
         0.245  0  SELECT 1 / :divider;

starting vacuum...end.
client 5 got an error in command 1 (SQL) of script 0; ERROR:  division by
zero
client 0 got an error in command 1 (SQL) of script 0; ERROR:  division by
zero
client 7 got an error in command 1 (SQL) of script 0; ERROR:  division by
zero
transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2497/2500
number of failures: 3 (0.120%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 3 (0.120%)
maximum number of tries: 1
latency average = 0.579 ms (including failures)
tps = 17240.662547 (including connections establishing)
tps = 17862.090137 (excluding connections establishing)
statement latencies
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 08-09-2018 10:17, Fabien COELHO wrote:
> Hello Marina,

Hello, Fabien!

> About the first two preparatory patches.
>
> > v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> > - a patch for the RandomState structure (this is used to reset a
> > client's random seed during the repeating of transactions after
> > serialization/deadlock failures).
>
> Same version as the previous one, which was ok. Still applies, compiles,
> passes tests. Fine with me.
>
> > v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch
> > - a patch for the Variables structure (this is used to reset client
> > variables during the repeating of transactions after
> > serialization/deadlock failures).
>
> Simpler version, applies cleanly on top of previous patch, compiles and
> global & local "make check" are ok. Fine with me as well.

Glad to hear it :)

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina, v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch - the main patch for handling client errors and repetition of transactions with serialization/deadlock failures (see the detailed description in the file). About patch v11-3. Patch applies cleanly on top of the other two. Compiles, global and local "make check" are ok. * Features As far as the actual retry feature is concerned, I'd say we are nearly there. However I have an issue with changing the behavior on meta command and other sql errors, which I find not desirable. When a meta-command fails, before the patch the command is aborted and there is a convenient error message: sh> pgbench -T 10 -f bad-meta.sql bad-meta.sql:1: unexpected function name (false) in command "set" [...] \set i false + 1 [...] After the patch it is simply counted, pgbench loops on the same error till the time is completed, and there are no clue about the actual issue: sh> pgbench -T 10 -f bad-meta.sql starting vacuum...end. transaction type: bad-meta.sql duration: 10 s number of transactions actually processed: 0 number of failures: 27993953 (100.000%) ... Same thing about SQL errors, an immediate abort... sh> pgbench -T 10 -f bad-sql.sql starting vacuum...end. client 0 aborted in command 0 of script 0; ERROR: syntax error at or near ";" LINE 1: SELECT 1 + ; ... is turned into counting without aborting nor error messages, so that there is no clue that the user was asking for something bad. sh> pgbench -T 10 -f bad-sql.sql starting vacuum...end. transaction type: bad-sql.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 10 s number of transactions actually processed: 0 number of failures: 274617 (100.000%) # no clue that there was a syntax error in the script I do not think that these changes of behavior are desirable. 
Meta command and miscellaneous SQL errors should result in immediately aborting the whole run, because the client test code itself could not run correctly or the SQL sent was somehow wrong, which is also the client's fault, and the server performance bench does not make much sense in such conditions. ISTM that the focus of this patch should only be to handle some server runtime errors that can be retried, but not to change pgbench behavior on other kinds of errors. If these are to be changed, ISTM that it would be a distinct patch and would require some discussion, and possibly an option to enable it or not if some use case emerges. As far as this patch is concerned, I'd suggest to leave that out. Doc says "you cannot use an infinite number of retries without latency-limit..." Why should this be forbidden? At least if -T timeout takes precedence and shortens the execution, ISTM that there could be good reason to test that. Maybe it could be blocked only under -t if this would lead to a non-ending run. As "--print-errors" is really for debug, maybe it could be named "--debug-errors". I'm not sure that having "--debug" implying this option is useful: As there are two distinct options, the user may be allowed to trigger one or the other as they wish? * Code The following remarks are linked to the change of behavior discussed above: makeVariableValue error message is not for debug, but must be kept in all cases, and the false returned must result in an immediate abort. Same thing about lookupCreateVariable, an invalid name is a user error which warrants an immediate abort. Same thing again about coerce* functions or evalStandardFunc... Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable. sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff is really about the transaction script, not dealing with errors, esp as there is no significant advantage in preparing a "ROLLBACK" statement which is short and has no parameters. 
I'd suggest to remove this function and just issue PQsendQuery("ROLLBACK;") in all cases. In copyVariables, I'd simplify + if (source_var->svalue == NULL) + dest_var->svalue = NULL; + else + dest_var->svalue = pg_strdup(source_var->svalue); as: dest_var->value = (source_var->svalue == NULL) ? NULL : pg_strdup(source_var->svalue); + if (sqlState) -> if (sqlState != NULL) ? Function getTransactionStatus name does not seem to correspond fully to what the function does. There is a passthru case which should be either avoided or clearly commented. About: - commandFailed(st, "SQL", "perhaps the backend died while processing"); + clientAborted(st, + "perhaps the backend died while processing"); keep on one line? About: + if (doRetry(st, )) + st->state = CSTATE_RETRY; + else + st->state = CSTATE_FAILURE; -> st->state = doRetry(st, ) ? CSTATE_RETRY : CSTATE_FAILURE; * Comments "There're different types..." -> "There are different types..." "after the errors and"... -> "after errors and"... "the default
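[Editor's sketch] The NULL-aware copy simplification suggested above could look as follows. The `Variable` struct and `dup_str` helper here are simplified stand-ins invented for illustration, not pgbench's actual definitions (pgbench's `pg_strdup` additionally aborts on out-of-memory):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for pgbench's Variable; only the string value field. */
typedef struct
{
    char   *svalue;     /* string value, may be NULL */
} Variable;

/* Local strdup replacement (pg_strdup also checks for OOM; omitted here). */
static char *
dup_str(const char *s)
{
    size_t  n = strlen(s) + 1;
    char   *copy = malloc(n);

    memcpy(copy, s, n);
    return copy;
}

/* The suggested compact form of the NULL-aware copy in copyVariables. */
static void
copy_svalue(Variable *dest_var, const Variable *source_var)
{
    dest_var->svalue = (source_var->svalue == NULL) ?
        NULL : dup_str(source_var->svalue);
}
```

The conditional expression keeps the NULL case explicit while avoiding the four-line if/else.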
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina, About the first two preparatory patches. v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures). Same version as the previous one, which was ok. Still applies, compiles, passes tests. Fine with me. v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures). Simpler version, applies cleanly on top of previous patch, compiles and global & local "make check" are ok. Fine with me as well. -- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 17-08-2018 14:04, Fabien COELHO wrote: ... Or perhaps we can use a more detailed failure status so for each type of failure we always know the command name (argument "cmd") and whether the client is aborted. Something like this (but in comparison with the first variant ISTM overly complicated): I agree. I do not think that it would be useful given that the same thing is done on all meta-command error cases in the end. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 17-08-2018 10:49, Fabien COELHO wrote: Hello Marina, Detailed -r report. I understand from the doc that the retry number on the detailed per-statement report is to identify at what point errors occur? Probably this is more or less always at the same point on a given script, so that the most interesting feature is to report the number of retries at the script level. This may depend on various factors.. for example: [...] 21.239 5 36 UPDATE xy3 SET y = y + :delta WHERE x = :x3; 21.360 5 39 UPDATE xy2 SET y = y + :delta WHERE x = :x2; Ok, not always the same point, and you confirm that it identifies where the error is raised which leads to a retry. Yes, I confirm this. I'll try to write more clearly about this in the documentation... So we can write something like this: All the transactions are divided into several types depending on their execution. Firstly, they can be divided into transactions that we started to execute, and transactions which were skipped (it was too late to execute them). Secondly, running transactions fall into 2 main types: is there any command that got a failure during the last execution of the transaction script or not? Thus Here is an attempt at having a more precise and shorter version, not sure it is much better than yours, though: """ Transactions are counted depending on their execution and outcome. First a transaction may have started or not: skipped transactions occur under --rate and --latency-limit when the client is too late to execute them. Secondly, a started transaction may ultimately succeed or fail on some error, possibly after some retries when --max-tries is not one. Thus """ Thank you! I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another" does not make sense yet. Maybe firstly put a general group, and then special cases?... I understand it more as a catch all default "none of the above" case. Ok! 
commandFailed: I'm not thrilled by the added boolean, which is partially redundant with the second argument. Do you mean that it is partially redundant with the argument "cmd" and that, for example, meta command errors never cause the client to abort? Yes. And also I'm not sure we should want this boolean at all. Perhaps we can use a separate function to print the messages about client's abortion, something like this (it is assumed that all abortions happen when processing SQL commands): static void clientAborted(CState *st, const char *message) { pgbench_error(..., "client %d aborted in command %d (SQL) of script %d; %s\n", st->id, st->command, st->use_file, message); } Or perhaps we can use a more detailed failure status so for each type of failure we always know the command name (argument "cmd") and whether the client is aborted. Something like this (but in comparison with the first variant ISTM overly complicated): /* * For the failures during script execution. */ typedef enum FailureStatus { NO_FAILURE = 0, /* * Failures in meta commands. In these cases the failed transaction is * terminated. */ META_SET_FAILURE, META_SETSHELL_FAILURE, META_SHELL_FAILURE, META_SLEEP_FAILURE, META_IF_FAILURE, META_ELIF_FAILURE, /* * Failures in SQL commands. In cases of serialization/deadlock failures a * failed transaction is re-executed from the very beginning if possible; * otherwise the failed transaction is terminated. */ SERIALIZATION_FAILURE, DEADLOCK_FAILURE, OTHER_SQL_FAILURE, /* other failures in SQL commands that are not * listed by themselves above */ /* * Failures while processing SQL commands. In this case the client is * aborted. */ SQL_CONNECTION_FAILURE } FailureStatus; [...] If in such cases one command is placed on several lines, ISTM that the code is more understandable if curly brackets are used... Hmmm. 
Such basic style changes are avoided because they break backpatching, so we try to avoid gratuitous changes unless there is a strong added value, which does not seem to be the case here. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina, Detailed -r report. I understand from the doc that the retry number on the detailed per-statement report is to identify at what point errors occur? Probably this is more or less always at the same point on a given script, so that the most interesting feature is to report the number of retries at the script level. This may depend on various factors.. for example: [...] 21.239 5 36 UPDATE xy3 SET y = y + :delta WHERE x = :x3; 21.360 5 39 UPDATE xy2 SET y = y + :delta WHERE x = :x2; Ok, not always the same point, and you confirm that it identifies where the error is raised which leads to a retry. And you can always get the number of retries at the script level from the main report (if only one script is used) or from the report for each script (if multiple scripts are used). Ok. ISTM that "skipped" transactions are NOT "successful" so there is a problem with comments. I believe that your formulas are probably right, it has more to do with what is "success". For cnt decomposition, ISTM that "other transactions" are really "directly successful transactions". I agree with you, but I also think that skipped transactions should not be considered errors. I'm ok with having a special category for them in the explanations, which is neither success nor error. So we can write something like this: All the transactions are divided into several types depending on their execution. Firstly, they can be divided into transactions that we started to execute, and transactions which were skipped (it was too late to execute them). Secondly, running transactions fall into 2 main types: is there any command that got a failure during the last execution of the transaction script or not? Thus Here is an attempt at having a more precise and shorter version, not sure it is much better than yours, though: """ Transactions are counted depending on their execution and outcome. 
First a transaction may have started or not: skipped transactions occur under --rate and --latency-limit when the client is too late to execute them. Secondly, a started transaction may ultimately succeed or fail on some error, possibly after some retries when --max-tries is not one. Thus """ the number of all transactions = skipped (it was too late to execute them) + cnt (the number of successful transactions) + ecnt (the number of failed transactions). A successful transaction can have several unsuccessful tries before a successful run. Thus cnt (the number of successful transactions) = retried (they got a serialization or a deadlock failure(s), but were successfully retried from the very beginning) + directly successful transactions (they were successfully completed on the first try). The above description is clearer to me. I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another" does not make sense yet. Maybe firstly put a general group, and then special cases?... I understand it more as a catch-all default "none of the above" case. In TState, field "uint32 retries": maybe it would be simpler to count "tries", which can be compared directly to max tries set in the option? If you mean retries in CState - on the one hand, yes, on the other hand, statistics always use the number of retries... Ok. The automaton skips to FAILURE on every possible error. I'm wondering whether it could do so only on SQL errors, because other fails will lead to ABORTED anyway? If there is no good reason to skip to FAILURE from some errors, I'd suggest to keep the previous behavior. Maybe the good reason is to do some counting, but this means that on eg metacommand errors now the script would loop over instead of aborting, which does not look like a desirable change of behavior. 
Even in the case of meta command errors we must prepare for CSTATE_END_TX and the execution of the next script: if necessary, clear the conditional stack and roll back the current transaction block. Seems ok. commandFailed: I'm not thrilled by the added boolean, which is partially redundant with the second argument. Do you mean that it is partially redundant with the argument "cmd" and that, for example, meta command errors never cause the client to abort? Yes. And also I'm not sure we should want this boolean at all. [...] If in such cases one command is placed on several lines, ISTM that the code is more understandable if curly brackets are used... Hmmm. Such basic style changes are avoided because they break backpatching, so we try to avoid gratuitous changes unless there is a strong added value, which does not seem to be the case here. -- Fabien.
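[Editor's sketch] The counting identities quoted in the message above (all transactions = skipped + successful + failed; successful = retried + directly successful) can be written as trivial accessors. The struct and field names below are illustrative, not pgbench's actual statistics structures:

```c
#include <assert.h>

/* Illustrative accounting for the transaction categories discussed above. */
typedef struct
{
    long    skipped;    /* too late to start under --rate / --latency-limit */
    long    cnt;        /* transactions that ultimately succeeded */
    long    ecnt;       /* transactions that ultimately failed */
    long    retried;    /* successes that needed at least one retry */
} TxStats;

/* all transactions = skipped + successful + failed */
static long
total_transactions(const TxStats *s)
{
    return s->skipped + s->cnt + s->ecnt;
}

/* successful = retried-then-successful + directly successful */
static long
directly_successful(const TxStats *s)
{
    return s->cnt - s->retried;
}
```

For example, with 10 skipped, 80 successful (of which 5 retried) and 10 failed transactions, the total is 100 and 75 succeeded on the first try.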
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 15-08-2018 11:50, Fabien COELHO wrote: Hello Marina, Hello! v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch - the main patch for handling client errors and repetition of transactions with serialization/deadlock failures (see the detailed description in the file). Patch applies cleanly. It allows retrying a script (considered as a transaction) on serializable and deadlock errors, which is a very interesting extension but also impacts pgbench significantly. I'm waiting for the feature to be right before checking in full the documentation and tests. There are still some issues to resolve before checking that. Anyway, tests look reasonable. Taking advantage of transaction control from PL/pgsql is a good use of this new feature. :-) A few comments about the doc. According to the documentation, the feature is triggered by --max-tries and --latency-limit. I disagree with the latter, because it means that having latency limit without retrying is not supported anymore. Maybe you can allow an "unlimited" max-tries, say with special value zero, and the latency limit does its job if set, over all tries. Doc: "error in meta commands" -> "meta command errors", for homogeneity with other cases? ... Doc: "never occur.." -> "never occur", or eventually "...". Doc: "Directly client errors" -> "Direct client errors". ... inTransactionBlock: I disagree with any function other than doCustom changing the client state, because it makes understanding the state machine harder. There is already one exception to that (threadRun) that I wish to remove. All state changes must be performed explicitly in doCustom. ... PQexec("ROLLBACK"): you are inserting a synchronous command, for which the thread will have to wait for the result, in the middle of a framework which takes great care to use only asynchronous stuff so that one thread can manage several clients efficiently. You cannot call PQexec there. 
From where I sit, I'd suggest to sendQuery("ROLLBACK"), then switch to a new state CSTATE_WAIT_ABORT_RESULT which would be similar to CSTATE_WAIT_RESULT, but on success would skip to RETRY or ABORT instead of proceeding to the next command. ... memcpy(&(st->retry_state.random_state), &(st->random_state), sizeof(RandomState)); Is there a problem with "st->retry_state.random_state = st->random_state;" instead of memcpy? ISTM that simple assignments work in C. Idem in the reverse copy under RETRY. Thank you, I'll fix this. Detailed -r report. I understand from the doc that the retry number on the detailed per-statement report is to identify at what point errors occur? Probably this is more or less always at the same point on a given script, so that the most interesting feature is to report the number of retries at the script level. This may depend on various factors.. for example: transaction type: pgbench_test_serialization.sql scaling factor: 1 query mode: simple number of clients: 2 number of threads: 1 duration: 10 s number of transactions actually processed: 266 number of errors: 10 (3.623%) number of serialization errors: 10 (3.623%) number of retried: 75 (27.174%) number of retries: 75 maximum number of tries: 2 latency average = 72.734 ms (including errors) tps = 26.501162 (including connections establishing) tps = 26.515082 (excluding connections establishing) statement latencies in milliseconds, errors and retries: 0.012 0 0 \set delta random(-5000, 5000) 0.001 0 0 \set x1 random(1, 10) 0.001 0 0 \set x3 random(1, 2) 0.001 0 0 \set x2 random(1, 1) 19.837 0 0 UPDATE xy1 SET y = y + :delta WHERE x = :x1; 21.239 5 36 UPDATE xy3 SET y = y + :delta WHERE x = :x3; 21.360 5 39 UPDATE xy2 SET y = y + :delta WHERE x = :x2; And you can always get the number of retries at the script level from the main report (if only one script is used) or from the report for each script (if multiple scripts are used). 
I'm still in favor of asserting that the sql connection is idle (no tx in progress) at the beginning and/or end of a script, and report a user error if not, instead of writing complex caveats. If someone has a use-case for that, then maybe it can be changed, but I cannot see any in a benchmarking context, and I can see how easy it is to have a buggy script with this allowed. I do not think that the RETRIES_ENABLED macro is a good thing. I'd suggest to write the condition four times. Ok! ISTM that "skipped" transactions are NOT "successful" so there is a problem with comments. I believe that your formulas are probably right, it has more to do with what is "success". For cnt decomposition, ISTM that "other transactions" are really "directly successful transactions". I agree with you, but I also think that skipped transactions should not be considered errors.
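[Editor's sketch] Fabien's suggested asynchronous flow for handling a failure - send ROLLBACK without blocking, wait for its result in a dedicated state, then decide between retry and abort - can be illustrated with a toy transition function. The state names echo pgbench's CState machine, but everything here is a simplified stand-in, not pgbench code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of pgbench's client states, for illustration only. */
typedef enum
{
    TOY_FAILURE,            /* an error was detected in the script */
    TOY_WAIT_ABORT_RESULT,  /* ROLLBACK sent asynchronously, awaiting result */
    TOY_RETRY,              /* rerun the script from the very beginning */
    TOY_ABORTED             /* give up on this client */
} ToyState;

/* On failure, conceptually sendQuery("ROLLBACK") and move to a waiting
 * state instead of calling the blocking PQexec. */
static ToyState
on_failure(void)
{
    return TOY_WAIT_ABORT_RESULT;
}

/* Once the ROLLBACK result arrives, decide between retrying and aborting,
 * based on whether the error is retryable and the limits allow another try. */
static ToyState
on_rollback_result(bool rollback_ok, bool can_retry)
{
    if (!rollback_ok)
        return TOY_ABORTED;
    return can_retry ? TOY_RETRY : TOY_ABORTED;
}
```

This keeps one thread free to service other clients while the ROLLBACK is in flight, which is the whole point of the objection to PQexec.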
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina, v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch - the main patch for handling client errors and repetition of transactions with serialization/deadlock failures (see the detailed description in the file). Patch applies cleanly. It allows retrying a script (considered as a transaction) on serializable and deadlock errors, which is a very interesting extension but also impacts pgbench significantly. I'm waiting for the feature to be right before checking in full the documentation and tests. There are still some issues to resolve before checking that. Anyway, tests look reasonable. Taking advantage of transaction control from PL/pgsql is a good use of this new feature. A few comments about the doc. According to the documentation, the feature is triggered by --max-tries and --latency-limit. I disagree with the latter, because it means that having latency limit without retrying is not supported anymore. Maybe you can allow an "unlimited" max-tries, say with special value zero, and the latency limit does its job if set, over all tries. Doc: "error in meta commands" -> "meta command errors", for homogeneity with other cases? Detailed -r report. I understand from the doc that the retry number on the detailed per-statement report is to identify at what point errors occur? Probably this is more or less always at the same point on a given script, so that the most interesting feature is to report the number of retries at the script level. Doc: "never occur.." -> "never occur", or eventually "...". Doc: "Directly client errors" -> "Direct client errors". I'm still in favor of asserting that the sql connection is idle (no tx in progress) at the beginning and/or end of a script, and report a user error if not, instead of writing complex caveats. If someone has a use-case for that, then maybe it can be changed, but I cannot see any in a benchmarking context, and I can see how easy it is to have a buggy script with this allowed. 
I do not think that the RETRIES_ENABLED macro is a good thing. I'd suggest to write the condition four times. ISTM that "skipped" transactions are NOT "successful" so there is a problem with comments. I believe that your formulas are probably right, it has more to do with what is "success". For cnt decomposition, ISTM that "other transactions" are really "directly successful transactions". I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another" does not make sense yet. I'd suggest to name it "OTHER_SQL_FAILURE". In TState, field "uint32 retries": maybe it would be simpler to count "tries", which can be compared directly to max tries set in the option? ErrorLevel: I have already commented about this in the review of 10.2. I'm not sure of the LOG -> DEBUG_FAIL changes. I do not understand the name "DEBUG_FAIL", as it is not related to debug, they just seem to be internal errors. META_ERROR maybe? inTransactionBlock: I disagree with any function other than doCustom changing the client state, because it makes understanding the state machine harder. There is already one exception to that (threadRun) that I wish to remove. All state changes must be performed explicitly in doCustom. The automaton skips to FAILURE on every possible error. I'm wondering whether it could do so only on SQL errors, because other fails will lead to ABORTED anyway? If there is no good reason to skip to FAILURE from some errors, I'd suggest to keep the previous behavior. Maybe the good reason is to do some counting, but this means that on eg metacommand errors now the script would loop over instead of aborting, which does not look like a desirable change of behavior. PQexec("ROLLBACK"): you are inserting a synchronous command, for which the thread will have to wait for the result, in the middle of a framework which takes great care to use only asynchronous stuff so that one thread can manage several clients efficiently. You cannot call PQexec there. 
From where I sit, I'd suggest to sendQuery("ROLLBACK"), then switch to a new state CSTATE_WAIT_ABORT_RESULT which would be similar to CSTATE_WAIT_RESULT, but on success would skip to RETRY or ABORT instead of proceeding to the next command. ISTM that it would be more logical to only get into RETRY if there is a retry, i.e. move the test RETRY/ABORT in FAILURE. For that, instead of "canRetry", maybe you want "doRetry", which tells that a retry is possible (the error is serializable or deadlock) and that the current parameters allow it (timeout, max retries). * Minor C style comments: if / else if / else if ... on *_FAILURE: I'd suggest a switch. The following line removal does not seem useful, I'd have kept it: stats->cnt++; - if (skipped) copyVariables: I'm not convinced that source_vars & nvars variables are that useful. memcpy(&(st->retry_state.random_state), &(st->random_state), sizeof(RandomState)); Is there a problem with "st->retry_state.random_state = st->random_state;" instead of memcpy? ISTM that simple assignments work in C. Idem in the reverse copy under RETRY.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 12-08-2018 12:14, Fabien COELHO wrote: Hello Marina, Hello, Fabien! v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures). This patch adds an explicit structure to manage Variables, which is useful to reset these on pgbench script retries, which is the purpose of the whole patch series. About part 3: Patch applies cleanly, On 12-08-2018 12:17, Fabien COELHO wrote: About part 3: Patch applies cleanly, I forgot: compiles, global & local "make check" are ok. I'm glad to hear it :-) * typo in comments: "varaibles" I'm sorry, I'll fix it. * About enlargeVariables: multiple INT_MAX error handling looks strange, especially as this code can never be triggered because pgbench would be dead long before having allocated INT_MAX variables. So I would not bother to add such checks. ... I'm not sure that the size_t casts here and there are useful for any practical values likely to be encountered by pgbench. Looking at the code of the functions, for example, ParseScript and psql_scan_setup, where the integer variable is used for the size of the entire script - ISTM that you are right.. Therefore size_t casts will also be removed. ISTM that if something is amiss it will fail in pg_realloc anyway. IIUC, if physical RAM is not enough, this may depend on the size of the swap. Also I do not like the ExpBuf stuff, as usual. The exponential allocation seems overkill. I'd simply add a constant number of slots, with a simple rule: /* reallocated with a margin */ if (max_vars < needed) max_vars = needed + 8; So in the end the function should be much simpler. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
About part 3: Patch applies cleanly, I forgot: compiles, global & local "make check" are ok. -- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina, v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures). This patch adds an explicit structure to manage Variables, which is useful to reset these on pgbench script retries, which is the purpose of the whole patch series. About part 3: Patch applies cleanly, * typo in comments: "varaibles" * About enlargeVariables: multiple INT_MAX error handling looks strange, especially as this code can never be triggered because pgbench would be dead long before having allocated INT_MAX variables. So I would not bother to add such checks. ISTM that if something is amiss it will fail in pg_realloc anyway. Also I do not like the ExpBuf stuff, as usual. I'm not sure that the size_t casts here and there are useful for any practical values likely to be encountered by pgbench. The exponential allocation seems overkill. I'd simply add a constant number of slots, with a simple rule: /* reallocated with a margin */ if (max_vars < needed) max_vars = needed + 8; So in the end the function should be much simpler. -- Fabien.
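[Editor's sketch] The constant-margin growth rule quoted above (as opposed to exponential ExpBuf-style doubling) could be wrapped in a function like this. The array type and names are invented for illustration; pgbench's real structure holds Variable slots and uses pg_realloc, which aborts on allocation failure:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative variables array; pgbench's real structure differs. */
typedef struct
{
    int    *vars;       /* stand-in for the Variable slots */
    int     max_vars;   /* currently allocated slots */
} VarArray;

/* Grow only when needed, with a small fixed margin instead of exponential
 * doubling.  A plain realloc stands in for pg_realloc, so the real code
 * would not need the NULL check either. */
static void
enlarge_sketch(VarArray *a, int needed)
{
    if (a->max_vars < needed)
    {
        /* reallocated with a margin */
        a->max_vars = needed + 8;
        a->vars = realloc(a->vars, a->max_vars * sizeof(int));
    }
}
```

Asking for 3 slots on an empty array allocates 11; asking for 5 afterwards changes nothing; asking for 20 grows it to 28.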
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 10-08-2018 17:19, Arthur Zakirov wrote: On Fri, Aug 10, 2018 at 04:46:04PM +0300, Marina Polyakova wrote: > +1 from me to keep initial name "pgbench_error". "pgbench_log" for new > function looks nice to me. I think it is better than just "log", > because "log" may conflict with natural logarithmic function (see "man 3 > log"). Do you think that pgbench_log (or another whose name speaks only about logging) will look good, for example, with FATAL? Because this means that the logging function also processes errors and calls exit(1) if necessary.. Yes, why not. "_log" just means that you want to log some message with the specified log level. Moreover those messages sometimes aren't error: pgbench_error(LOG, "starting vacuum..."); "pgbench_log" is already used as the default filename prefix for transaction logging. > I agree with Fabien. Calling pgbench_error() inside pgbench_error() > could be dangerous. I think "fmt" checking could be removed, or we may > use Assert() I would like not to use Assert in this case because IIUC they are mostly used for testing. I'd vote to remove this check at all. I don't see any place where it is possible to call pgbench_error() passing empty "fmt". pgbench_error(..., "%s", PQerrorMessage(con)); ? -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Fri, Aug 10, 2018 at 04:46:04PM +0300, Marina Polyakova wrote: > > +1 from me to keep initial name "pgbench_error". "pgbench_log" for new > > function looks nice to me. I think it is better than just "log", > > because "log" may conflict with natural logarithmic function (see "man 3 > > log"). > > Do you think that pgbench_log (or another whose name speaks only about > logging) will look good, for example, with FATAL? Because this means that > the logging function also processes errors and calls exit(1) if necessary.. Yes, why not. "_log" just means that you want to log some message with the specified log level. Moreover those messages sometimes aren't error: pgbench_error(LOG, "starting vacuum..."); > > I agree with Fabien. Calling pgbench_error() inside pgbench_error() > > could be dangerous. I think "fmt" checking could be removed, or we may > > use Assert() > > I would like not to use Assert in this case because IIUC they are mostly > used for testing. I'd vote to remove this check at all. I don't see any place where it is possible to call pgbench_error() passing empty "fmt". -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 10-08-2018 15:53, Arthur Zakirov wrote: On Thu, Aug 09, 2018 at 06:17:22PM +0300, Marina Polyakova wrote: > * ErrorLevel > > If ErrorLevel is used for things which are not errors, its name should > not include "Error"? Maybe "LogLevel"? On the one hand, this sounds better for me too. On the other hand, will not this be in some kind of conflict with error level codes in elog.h?.. I think it shouldn't because those error levels are backends levels. pgbench is a client side utility with its own code, it shares some code with libpq and other utilities, but elog.h isn't one of them. I agree with you on this :) I just meant that maybe it would be better to call this group in the same way because they are used in general for the same purpose?.. > This point also suggest that maybe "pgbench_error" is misnamed as well > (ok, I know I suggested it in place of ereport, but e stands for error > there), as it is called on errors, but is also on other things. Maybe > "pgbench_log"? Or just simply "log" or "report", as it is really an > local function, which does not need a prefix? That would mean that > "pgbench_simple_error", which is indeed called on errors, could keep > its initial name "pgbench_error", and be called on errors. About the name "log" - we already have the function doLog, so perhaps the name "report" will be better.. But like with ErrorLevel will not this be in some kind of conflict with ereport which is also used for the levels DEBUG... / LOG / INFO? +1 from me to keep initial name "pgbench_error". "pgbench_log" for new function looks nice to me. I think it is better than just "log", because "log" may conflict with natural logarithmic function (see "man 3 log"). Do you think that pgbench_log (or another whose name speaks only about logging) will look good, for example, with FATAL? Because this means that the logging function also processes errors and calls exit(1) if necessary.. > pgbench_error calls pgbench_error. Hmmm, why not. I agree with Fabien. 
Calling pgbench_error() inside pgbench_error() could be dangerous. I think "fmt" checking could be removed, or we may use Assert() I would like not to use Assert in this case because IIUC they are mostly used for testing. or fprintf()+exit(1) at least. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Thu, Aug 09, 2018 at 06:17:22PM +0300, Marina Polyakova wrote: > > * ErrorLevel > > > > If ErrorLevel is used for things which are not errors, its name should > > not include "Error"? Maybe "LogLevel"? > > On the one hand, this sounds better for me too. On the other hand, will not > this be in some kind of conflict with error level codes in elog.h?.. I think it shouldn't because those error levels are backends levels. pgbench is a client side utility with its own code, it shares some code with libpq and other utilities, but elog.h isn't one of them. > > This point also suggest that maybe "pgbench_error" is misnamed as well > > (ok, I know I suggested it in place of ereport, but e stands for error > > there), as it is called on errors, but is also on other things. Maybe > > "pgbench_log"? Or just simply "log" or "report", as it is really an > > local function, which does not need a prefix? That would mean that > > "pgbench_simple_error", which is indeed called on errors, could keep > > its initial name "pgbench_error", and be called on errors. > > About the name "log" - we already have the function doLog, so perhaps the > name "report" will be better.. But like with ErrorLevel will not this be in > some kind of conflict with ereport which is also used for the levels > DEBUG... / LOG / INFO? +1 from me to keep initial name "pgbench_error". "pgbench_log" for new function looks nice to me. I think it is better than just "log", because "log" may conflict with natural logarithmic function (see "man 3 log"). > > pgbench_error calls pgbench_error. Hmmm, why not. I agree with Fabien. Calling pgbench_error() inside pgbench_error() could be dangerous. I think "fmt" checking could be removed, or we may use Assert() or fprintf()+exit(1) at least. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
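[Editor's sketch] Arthur's fprintf()+exit(1) fallback - report the bad format directly instead of recursing into pgbench_error() - might look like the toy below. The function and level names are hypothetical, not the patch's actual API; the validity check is factored out so that it cannot recurse:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical log levels, loosely mirroring the patch's ErrorLevel. */
typedef enum { LV_DEBUG, LV_LOG, LV_FATAL } ToyLogLevel;

/* Format validity check, factored out so the logger never calls itself. */
static bool
fmt_is_valid(const char *fmt)
{
    return fmt != NULL && fmt[0] != '\0';
}

/* Logger that falls back to fprintf()+exit(1) on a bad format string
 * instead of reporting the problem through itself. */
static void
toy_log(ToyLogLevel level, const char *fmt, ...)
{
    va_list ap;

    if (!fmt_is_valid(fmt))
    {
        fprintf(stderr, "toy_log: empty format string\n");
        exit(1);
    }
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    if (level == LV_FATAL)
        exit(1);
}
```

Whether the empty-format check is worth keeping at all is exactly the open question in the thread; the point of the sketch is only that, if kept, it must not re-enter the logger.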
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 10-08-2018 11:33, Fabien COELHO wrote:
> Hello Marina,
>
>>> I'd suggest to let lookupCreateVariable, putVariable* as they are, call pgbench_error with a level which does not stop the execution, and abort if necessary from the callers with a "aborted because of putVariable/eval/... error" message, as it was done before.
>>
>> There's one more problem: if this is a client failure, an error message inside any of these functions should be printed at the level DEBUG_FAILS; otherwise it should be printed at the level LOG. Or do you suggest using the error level as an argument for these functions?
>
> No. I suggest that the called function does only one simple thing, probably "DEBUG", and that the *caller* prints a message if it is unhappy about the failure of the called function, as it is currently done. This allows to provide context as well from the caller, eg "setting variable %s failed while ". The user can rerun under debug for precision if they need it.

Ok!

> I'm still not over enthusiastic with these changes, and still think that it should be an independent patch, not submitted together with the "retry on error" feature.

In the next version I will put the error patch last, so it will be possible to compare the "retry on error" feature with it and without it, and let the committer decide how it is better)

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

>> I'd suggest to let lookupCreateVariable, putVariable* as they are, call pgbench_error with a level which does not stop the execution, and abort if necessary from the callers with a "aborted because of putVariable/eval/... error" message, as it was done before.
>
> There's one more problem: if this is a client failure, an error message inside any of these functions should be printed at the level DEBUG_FAILS; otherwise it should be printed at the level LOG. Or do you suggest using the error level as an argument for these functions?

No. I suggest that the called function does only one simple thing, probably "DEBUG", and that the *caller* prints a message if it is unhappy about the failure of the called function, as it is currently done. This allows to provide context as well from the caller, eg "setting variable %s failed while ". The user can rerun under debug for precision if they need it.

I'm still not over enthusiastic with these changes, and still think that it should be an independent patch, not submitted together with the "retry on error" feature.

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 09-08-2018 12:28, Fabien COELHO wrote:
> Hello Marina,

Hello!

>> v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch - a patch for a separate error reporting function (this is used to report client failures that do not cause aborts and this depends on the level of debugging).
>
> Patch applies cleanly, compiles, global & local make check ok.

:-)

> This patch improves/homogenizes logging & error reporting in pgbench, in preparation for another patch which will manage transaction restarts in some cases. However ISTM that it is not as necessary as the previous one, i.e. we could do without it to get the desired feature, so I see it more as a refactoring done "in passing", and I'm wondering whether it is really worth it because it adds some new complexity, so I'm not sure of the net benefit.

We discussed this starting with [1]:

> IMO this patch is more controversial than the other ones. It is not really related to the aim of the patch series, which could do without, couldn't it? I'd suggest that it should be an independent submission, unrelated to the pgbench error management patch.

I suppose that this is related; because of my patch there may be a lot of such code (see v7 in [1]):

-        fprintf(stderr,
-                "malformed variable \"%s\" value: \"%s\"\n",
-                var->name, var->svalue);
+        if (debug_level >= DEBUG_FAILS)
+        {
+            fprintf(stderr,
+                    "malformed variable \"%s\" value: \"%s\"\n",
+                    var->name, var->svalue);
+        }

-    if (debug)
+    if (debug_level >= DEBUG_ALL)
         fprintf(stderr, "client %d sending %s\n", st->id, sql);

> I'm not sure that debug messages needs to be kept after debug, if it is about debugging pgbench itself. That is debatable.

AFAICS it is not about debugging pgbench itself, but about more detailed information that can be used to understand what exactly happened during its launch. 
In the case of errors this helps to distinguish between failures or errors by type (including which limit for retries was violated and how far it was exceeded for the serialization/deadlock errors). That's why it was suggested to make the error function which hides all these things (see [2]):

> There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with corresponding fprintf(stderr..) I think it's time to do it like in the main code, wrap with some function like log(level, msg).

> Yep. I did not write that, but I agree with an "elog" suggestion to switch
>
>     if (...) { fprintf(...); exit/abort/continue/... }
>
> to a simpler:
>
>     elog(level, ...)

> Anyway, I still have quite a few comments/suggestions on this version.

Thank you very much for them!

>> * ErrorLevel
>>
>> If ErrorLevel is used for things which are not errors, its name should not include "Error"? Maybe "LogLevel"?

On the one hand, this sounds better for me too. On the other hand, will not this be in some kind of conflict with error level codes in elog.h?..

    /* Error level codes */
    #define DEBUG5      10      /* Debugging messages, in categories of
                                 * decreasing detail. */
    #define DEBUG4      11
    ...

> I'm at odds with the proposed levels. ISTM that pgbench internal errors which warrant an immediate exit should be dubbed "FATAL",

Ok!

> which would leave the "ERROR" name for... errors, eg SQL errors. I'd suggest to use an INFO level for the PGBENCH_DEBUG function, and to keep LOG for main program messages, so that all use cases are separate. Or, maybe the distinction between LOG/INFO is unclear so info is not necessary.

The messages of the errors in SQL and meta commands are printed only if the option --debug-fails is used so I'm not sure that they should have a higher error level than main program messages (ERROR vs LOG). About an INFO level for the PGBENCH_DEBUG function - ISTM that some main program messages such as "dropping old tables...\n" or ..." tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n" can also use it.. 
About all use cases being separate - in the current version the level LOG also includes messages about abortions of the clients.

> I'm unsure about the "log_min_messages" variable name, I'd suggest "log_level". I do not see the asserts on LOG >= log_min_messages as useful, because the level can only be LOG or DEBUG anyway.

Ok!

> This point also suggests that maybe "pgbench_error" is misnamed as well (ok, I know I suggested it in place of ereport, but e stands for error there), as it is called on errors, but is also on other things. Maybe "pgbench_log"? Or just simply "log" or "report", as it is really a local function, which does not need a prefix? That would
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

> v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch - a patch for a separate error reporting function (this is used to report client failures that do not cause aborts and this depends on the level of debugging).

Patch applies cleanly, compiles, global & local make check ok.

This patch improves/homogenizes logging & error reporting in pgbench, in preparation for another patch which will manage transaction restarts in some cases. However ISTM that it is not as necessary as the previous one, i.e. we could do without it to get the desired feature, so I see it more as a refactoring done "in passing", and I'm wondering whether it is really worth it because it adds some new complexity, so I'm not sure of the net benefit.

Anyway, I still have quite a few comments/suggestions on this version.

* ErrorLevel

If ErrorLevel is used for things which are not errors, its name should not include "Error"? Maybe "LogLevel"?

I'm at odds with the proposed levels. ISTM that pgbench internal errors which warrant an immediate exit should be dubbed "FATAL", which would leave the "ERROR" name for... errors, eg SQL errors. I'd suggest to use an INFO level for the PGBENCH_DEBUG function, and to keep LOG for main program messages, so that all use cases are separate. Or, maybe the distinction between LOG/INFO is unclear so info is not necessary.

I'm unsure about the "log_min_messages" variable name, I'd suggest "log_level". I do not see the asserts on LOG >= log_min_messages as useful, because the level can only be LOG or DEBUG anyway.

This point also suggests that maybe "pgbench_error" is misnamed as well (ok, I know I suggested it in place of ereport, but e stands for error there), as it is called on errors, but is also on other things. Maybe "pgbench_log"? Or just simply "log" or "report", as it is really a local function, which does not need a prefix? 
That would mean that "pgbench_simple_error", which is indeed called on errors, could keep its initial name "pgbench_error", and be called on errors. Alternatively, the debug/logging code could be left as it is (i.e. direct print to stderr) and the function only called when there is some kind of error, in which case it could be named with "error" in its name (or elog/ereport...).

* PQExpBuffer

I still do not see a positive value from importing PQExpBuffer complexity and cost into pgbench, as the resulting code is not very readable and it adds malloc/free cycles, so I'd try to avoid using PQExpBuf as much as possible. ISTM that all usages could be avoided in the patch, and most should be avoided even if ExpBuffer is imported because it is really useful somewhere.

- to call pgbench_error from pgbench_simple_error, you can do a pgbench_log_va(level, format, va_list) version called both from pgbench_error & pgbench_simple_error.

- for PGBENCH_DEBUG function, do separate calls per type, the very small partial code duplication is worth avoiding ExpBuf IMO.

- for doCustom debug: I'd just let the printf as it is, with a comment, as it is really very internal stuff for debug. Or I'd just snprintf a something in a static buffer.

- for syntax_error: it should terminate, so it should call pgbench_error(FATAL, ...). Idem, I'd either keep the printf then call pgbench_error(FATAL, "syntax error found\n") for a final message, or snprintf in a static buffer.

- for listAvailableScript: I'd simply call "pgbench_error(LOG" several times, once per line. I see building a string with a format (printfExpBuf..) and then calling the pgbench_error function with just a "%s" format on the result as not very elegant, because the second format is somehow hacked around.

* bool client

I'm unconvinced by this added boolean just to switch the level on encountered errors. 
I'd suggest to let lookupCreateVariable, putVariable* as they are, call pgbench_error with a level which does not stop the execution, and abort if necessary from the callers with a "aborted because of putVariable/eval/... error" message, as it was done before. pgbench_error calls pgbench_error. Hmmm, why not. -- Fabien.
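The pgbench_log_va() suggestion in the review above - one va_list worker shared by both entry points instead of duplicated formatting code - could look like this. The names and the int return value (character count, for illustration) are assumptions, not the patch's actual signatures:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define PG_LOG    1
#define PG_FATAL  2

/* Shared worker: both wrappers delegate their va_list here. */
static int
pgbench_log_va(int level, const char *fmt, va_list ap)
{
	(void) level;				/* a real version would filter on level */
	return vfprintf(stderr, fmt, ap);	/* returns characters written */
}

/* Full entry point: formats, then exits on fatal levels. */
static int
pgbench_error(int level, const char *fmt, ...)
{
	va_list		ap;
	int			n;

	va_start(ap, fmt);
	n = pgbench_log_va(level, fmt, ap);
	va_end(ap);
	if (level >= PG_FATAL)
		exit(1);
	return n;
}

/* Simpler entry point, reusing the same worker instead of duplicating it. */
static int
pgbench_simple_error(const char *fmt, ...)
{
	va_list		ap;
	int			n;

	va_start(ap, fmt);
	n = pgbench_log_va(PG_LOG, fmt, ap);
	va_end(ap);
	return n;
}
```

The point of the indirection is that only the variadic-to-va_list hop is duplicated; all formatting policy lives in one place.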
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 07-08-2018 19:21, Fabien COELHO wrote:
> Hello Marina,

Hello, Fabien!

>> v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).
>
> About this v10 part 1:
>
> Patch applies cleanly, compiles, global & local make check both ok.
>
> The random state is cleanly separated so that it will be easy to reset it on client error handling. ISTM that the pgbench side is deterministic with the separation of the seeds for different uses. Code is clean, comments are clear.

:-)

> I'm wondering what is the rationale for the "xseed" field name? In particular, what does the "x" stand for?

I called it "...seed" instead of "data" because perhaps the "data" is too general a name for use here (but I'm not entirely sure what Alvaro Herrera meant in [1], see my answer in [2]). I called it "xseed" to combine it with the arguments of the functions _dorand48 / pg_erand48 / pg_jrand48 in the file erand48.c. IIUC they use a linear congruential generator and perhaps "xseed" means the sequence with the name X of pseudorandom values of size 48 bits (X_0, X_1, ... X_n) where X_0 is the seed / the start value.

[1] https://www.postgresql.org/message-id/20180711180417.3ytmmwmonsr5lra7@alvherre.pgsql

> LGTM, though I'd rename the random_state struct members so that it wouldn't look as confusing. Maybe that's just me.

[2] https://www.postgresql.org/message-id/cb2cde10e4e7a10a38b48e9cae8fbd28%40postgrespro.ru

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

> v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).

About this v10 part 1:

Patch applies cleanly, compiles, global & local make check both ok.

The random state is cleanly separated so that it will be easy to reset it on client error handling. ISTM that the pgbench side is deterministic with the separation of the seeds for different uses. Code is clean, comments are clear.

I'm wondering what is the rationale for the "xseed" field name? In particular, what does the "x" stand for?

-- Fabien.
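The "xseed" naming comes from the erand48() family, where three unsigned shorts hold the generator's current X_n value of the 48-bit linear congruential sequence. A small sketch of the per-purpose random-state idea, using POSIX erand48() in place of pgbench's pg_erand48() wrapper (function names here are illustrative):

```c
#define _XOPEN_SOURCE 600		/* for erand48() */
#include <assert.h>
#include <stdlib.h>

/* Each use gets its own generator state, so uses stay independent. */
typedef struct RandomState
{
	unsigned short xseed[3];	/* current X_n of the 48-bit LCG */
} RandomState;

static void
initRandomState(RandomState *rs,
				unsigned short s0, unsigned short s1, unsigned short s2)
{
	rs->xseed[0] = s0;
	rs->xseed[1] = s1;
	rs->xseed[2] = s2;
}

/* Advances only this state; other RandomStates are unaffected. */
static double
getRand(RandomState *rs)
{
	return erand48(rs->xseed);
}
```

Resetting a client's RandomState to its initial xseed before a retry replays exactly the same pseudorandom sequence, which is what makes retried transactions deterministic.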
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
>> The point is to avoid building the message with dynamic allocation and so on if in the end it is not used.
>
> Ok! About avoidance - I'm afraid there's one more piece of debugging code with the same problem:

Indeed. I'd like to avoid all instances, so that PQExpBufferData is not needed anywhere, if possible. If not possible, then too bad, but I'd prefer to make do with formatted prints only, for simplicity.

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-07-2018 22:34, Fabien COELHO wrote:
>> can we try something like this? PGBENCH_ERROR_START(DEBUG_FAIL) { PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
>
> Argh, no? I was thinking of something much more trivial:
>
>     pgbench_error(DEBUG, "message format %d %s...", 12, "hello world");
>
> If you really need some complex dynamic buffer, and I would prefer that you avoid that, then the fallback is:
>
>     if (level >= DEBUG)
>     {
>         initPQstuff();
>         ...
>         pgbench_error(DEBUG, "fixed message... %s\n", msg);
>         freePQstuff();
>     }
>
> The point is to avoid building the message with dynamic allocation and so on if in the end it is not used.

Ok! About avoidance - I'm afraid there's one more piece of debugging code with the same problem:

    else if (command->type == META_COMMAND)
    {
        ...
        initPQExpBuffer(&errmsg_buf);
        printfPQExpBuffer(&errmsg_buf, "client %d executing \\%s",
                          st->id, argv[0]);
        for (i = 1; i < argc; i++)
            appendPQExpBuffer(&errmsg_buf, " %s", argv[i]);
        appendPQExpBufferChar(&errmsg_buf, '\n');
        ereport(ELEVEL_DEBUG, (errmsg("%s", errmsg_buf.data)));
        termPQExpBuffer(&errmsg_buf);

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
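Fabien's alternative for this spot - skip PQExpBuffer and format into a plain caller-supplied buffer, only under a debug-level guard - might look roughly like this. The function name and buffer size are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a meta-command debug line ("client N executing \cmd args...")
 * into a caller-supplied buffer, truncating instead of allocating.
 */
static const char *
formatMetaCommand(char *buf, size_t buflen, int client_id,
				  char *const *argv, int argc)
{
	size_t		off;

	off = snprintf(buf, buflen, "client %d executing \\%s",
				   client_id, argv[0]);
	for (int i = 1; i < argc && off < buflen; i++)
		off += snprintf(buf + off, buflen - off, " %s", argv[i]);
	return buf;
}
```

The caller would build the line only inside an `if (level >= DEBUG)` check, so non-debug runs pay neither the formatting nor any malloc/free cycle.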
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-07-2018 21:04, Alvaro Herrera wrote:
> Just a quick skim while refreshing what were those error reporting API changes about ...

Thank you!

> On 2018-May-21, Marina Polyakova wrote:
>> v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).
>
> LGTM, though I'd rename the random_state struct members so that it wouldn't look as confusing. Maybe that's just me.

IIUC, do you like "xseed" instead of "data"?

     typedef struct RandomState
     {
    -    unsigned short data[3];
    +    unsigned short xseed[3];
     } RandomState;

Or do you want to rename "random_state" in the structures RetryState / CState / TState? Thanks to Fabien Coelho's comments in [1], TState can contain several RandomStates for different purposes, something like this:

    /*
     * Thread state
     */
    typedef struct
    {
        ...

        /*
         * Separate randomness for each thread. Each thread option uses its
         * own random state to make all of them independent of each other and
         * therefore deterministic at the thread level.
         */
        RandomState choose_script_rs;  /* random state for selecting a script */
        RandomState throttling_rs;     /* random state for transaction throttling */
        RandomState sampling_rs;       /* random state for log sampling */

        ...
    } TState;

>> v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures).
>
> Please don't allocate Variable structs one by one. First time allocate some decent number (say 8) and then enlarge by duplicating size. That way you save realloc overhead. We use this technique everywhere else, no reason to do different here. Other than that, LGTM.

Ok! 
[1] https://www.postgresql.org/message-id/alpine.DEB.2.21.1806090810090.5307%40lancre

> While reading your patch, it occurs to me that a run is not deterministic at the thread level under throttling and sampling, because the random state is solicited differently depending on when transaction ends. This suggests that maybe each thread random_state use should have its own random state.

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-07-2018 20:49, Alvaro Herrera wrote:
> On 2018-Jul-11, Marina Polyakova wrote:
>> can we try something like this?
>>
>> PGBENCH_ERROR_START(DEBUG_FAIL)
>> {
>>     PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
>>                   st->id, st->retries + 1);
>>     if (max_tries)
>>         PGBENCH_ERROR("/%d", max_tries);
>>     if (latency_limit)
>>     {
>>         PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
>>                       getLatencyUsed(st, ));
>>     }
>>     PGBENCH_ERROR(")\n");
>> }
>> PGBENCH_ERROR_END();
>
> I didn't quite understand what these PGBENCH_ERROR() functions/macros are supposed to do. Care to explain?

It is used only to print a string with the given arguments to stderr. Probably it might be just the function pgbench_error and not a macro..

P.S. This is my mistake, I did not think that PGBENCH_ERROR_END does not know the elevel for calling exit(1) if the elevel >= ERROR.

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> can we try something like this? PGBENCH_ERROR_START(DEBUG_FAIL) { PGBENCH_ERROR("client %d repeats the failed transaction (try %d",

Argh, no? I was thinking of something much more trivial:

    pgbench_error(DEBUG, "message format %d %s...", 12, "hello world");

If you really need some complex dynamic buffer, and I would prefer that you avoid that, then the fallback is:

    if (level >= DEBUG)
    {
        initPQstuff();
        ...
        pgbench_error(DEBUG, "fixed message... %s\n", msg);
        freePQstuff();
    }

The point is to avoid building the message with dynamic allocation and so on if in the end it is not used.

-- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Just a quick skim while refreshing what were those error reporting API changes about ...

On 2018-May-21, Marina Polyakova wrote:

> v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
> - a patch for the RandomState structure (this is used to reset a client's
> random seed during the repeating of transactions after
> serialization/deadlock failures).

LGTM, though I'd rename the random_state struct members so that it wouldn't look as confusing. Maybe that's just me.

> v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
> - a patch for the Variables structure (this is used to reset client
> variables during the repeating of transactions after serialization/deadlock
> failures).

Please don't allocate Variable structs one by one. First time allocate some decent number (say 8) and then enlarge by duplicating size. That way you save realloc overhead. We use this technique everywhere else, no reason to do different here. Other than that, LGTM.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
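The growth policy Álvaro describes - start at a small fixed capacity and double on overflow, so n insertions trigger only O(log n) reallocs - might be sketched like this (struct and field names are illustrative, not the patch's):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
	char	   *name;
	char	   *svalue;
} Variable;

typedef struct
{
	Variable   *vars;			/* array of variables */
	int			nvars;			/* number in use */
	int			max_vars;		/* allocated capacity */
} Variables;

/* Reserve one slot, growing the array by doubling (8, 16, 32, ...). */
static Variable *
addVariableSlot(Variables *v)
{
	if (v->nvars >= v->max_vars)
	{
		int			newmax = (v->max_vars == 0) ? 8 : v->max_vars * 2;
		Variable   *newvars = realloc(v->vars, newmax * sizeof(Variable));

		if (newvars == NULL)
			return NULL;		/* caller reports the error */
		v->vars = newvars;
		v->max_vars = newmax;
	}
	return &v->vars[v->nvars++];
}
```

With per-element allocation, inserting n variables costs n reallocs; with doubling, the amortized cost per insertion is constant.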
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 2018-Jul-11, Marina Polyakova wrote:

> can we try something like this?
>
> PGBENCH_ERROR_START(DEBUG_FAIL)
> {
>     PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
>                   st->id, st->retries + 1);
>     if (max_tries)
>         PGBENCH_ERROR("/%d", max_tries);
>     if (latency_limit)
>     {
>         PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
>                       getLatencyUsed(st, ));
>     }
>     PGBENCH_ERROR(")\n");
> }
> PGBENCH_ERROR_END();

I didn't quite understand what these PGBENCH_ERROR() functions/macros are supposed to do. Care to explain?

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-07-2018 16:24, Fabien COELHO wrote: Hello Marina,

* -d/--debug: I'm not in favor in requiring a mandatory text argument on this option. As you wrote in [1], adding an additional option is also a bad idea: Hey, I'm entitled to some internal contradictions:-)

... and discussions will continue forever %-)

I'm sceptical of the "--debug-fails" options. ISTM that --debug is already there and should just be reused. I was thinking that you could just use the existing --debug, not change its syntax. My point was that --debug exists, and you could just print the messages when under --debug.

Now I understand you better, thanks. I think it will be useful to receive only messages about failures, because they and progress reports can be lost in many other debug messages such as "client %d sending ..." / "client %d executing ..." / "client %d receiving".

Maybe it's better to use an optional argument/arguments for compatibility (--debug[=fails] or --debug[=NUM])? But if we use the numbers, now I can see only 2 levels, and there's no guarantee that they will not change..

Optional arguments to options (!) are not really clean things, so I'd like to avoid going onto this path, esp. as I cannot see any other instance in pgbench or elsewhere in postgres,

AFAICS they are used in pg_waldump (option --stats[=record]) and in psql (option --help[=topic]).

and I personally consider these as a bad idea. So if absolutely necessary, a new option is still better than changing --debug syntax. If not necessary, then it is better:-)

Ok!

* I'm reserved about the whole ereport thing, see comments in other messages. Thank you, I'll try to implement the error reporting in the way you suggested. Dunno if it is a good idea either. The committer word is the good one in the end :-)

I agree with you that ereport has good reasons to be non-trivial in the backend and it does not have the same in pgbench..

* doCustom changes. On CSTATE_FAILURE, the next command is possibly started. 
Although there is some consistency with the previous point, I think that it totally breaks the state automaton where now a command can start while the whole transaction is in failing state anyway. There was no point in starting it in the first place. So, for me, the FAILURE state should record/count the failure, then skip to RETRY if a retry is decided, else proceed to ABORT. Nothing else. This is much clearer that way. Then RETRY should reinstate the global state and proceed to start the *first* command again. <...> It is unclear to me why backslash command errors are turned to FAILURE instead of ABORTED: there is no way they are going to be retried, so maybe they should/could skip directly to ABORTED?

So do you propose to execute the command "ROLLBACK" without calculating its latency etc. if we are in a failed transaction and clear the conditional stack after each failure? Also just to be clear: do you want to have the state CSTATE_ABORTED for client abortion and another state for interrupting the current transaction?

I do not understand what "interrupting the current transaction" means. A transaction is either committed or rollbacked, I do not know about "interrupted".

I mean that IIUC the server usually only reports the error and you must manually send the command "END" or "ROLLBACK" to rollback a failed transaction.

When it is rollbacked, probably some stats will be collected in passing, I'm fine with that. If there is an error in a pgbench script, the transaction is aborted, which means for me that the script execution is stopped where it was, and either it is restarted from the beginning (retry) or counted as failure (not retry, just aborted, really). If by interrupted you mean that one script begins a transaction and another ends it, as I said in the review I think that this strange case should be forbidden, so that all the code and documentation trying to manage that can be removed.

Ok! 
The current RETRY state does memory allocations to generate a message with buffer allocation and so on. This looks like a costly and useless operation. If the user required "retries", then this is normal behavior, the retries are counted and will be printed out in the final report, and there is no point in printing out every single one of them. Maybe you want that debugging, but then costly operations should be guarded.

I think we need these debugging messages because, for example,

Debugging message should cost only when under debug. When not under debug, there should be no debugging message, and there should be no cost for building and discarding such messages in the executed code path beyond testing whether program is under debug.

if you use the option --latency-limit, we will never know in advance whether the serialization/deadlock failure will be retried or not.

ISTM that it will be shown in the final report. If I want debug, I ask for --debug, otherwise I think that the
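The automaton Fabien sketches - FAILURE only records the failure and then branches, RETRY reruns the script from its first command, otherwise the transaction is aborted - reduces to a small decision function. This is an illustrative model, not the patch's actual code; the state names and the retry-limit rule (with 0 meaning "unlimited") are assumptions:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	CSTATE_RETRY,				/* rerun the script from its first command */
	CSTATE_ABORTED				/* count the failure and give up */
} FailureNext;

/*
 * On CSTATE_FAILURE: never start the next command of a failing
 * transaction; either retry the whole script or abort it.
 * max_tries == 0 is taken to mean "unlimited tries" here.
 */
static FailureNext
afterFailure(bool retryable, int tries_done, int max_tries)
{
	if (retryable && (max_tries == 0 || tries_done < max_tries))
		return CSTATE_RETRY;
	return CSTATE_ABORTED;
}
```

Keeping the branch this small is the point of the review comment: no command of a failed transaction starts in FAILURE, and any per-retry reporting can sit behind a debug guard outside this decision.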
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

* -d/--debug: I'm not in favor in requiring a mandatory text argument on this option.

As you wrote in [1], adding an additional option is also a bad idea:

Hey, I'm entitled to some internal contradictions:-)

I'm sceptical of the "--debug-fails" options. ISTM that --debug is already there and should just be reused.

I was thinking that you could just use the existing --debug, not change its syntax. My point was that --debug exists, and you could just print the messages when under --debug.

Maybe it's better to use an optional argument/arguments for compatibility (--debug[=fails] or --debug[=NUM])? But if we use the numbers, now I can see only 2 levels, and there's no guarantee that they will not change..

Optional arguments to options (!) are not really clean things, so I'd like to avoid going onto this path, esp. as I cannot see any other instance in pgbench or elsewhere in postgres, and I personally consider these as a bad idea. So if absolutely necessary, a new option is still better than changing --debug syntax. If not necessary, then it is better:-)

* I'm reserved about the whole ereport thing, see comments in other messages.

Thank you, I'll try to implement the error reporting in the way you suggested.

Dunno if it is a good idea either. The committer word is the good one in the end :-)

Thank you, I'll fix this. I'm sorry, I'll fix this.

You do not have to thank me or being sorry on every comment I do, once for the former is enough, and there is no need for the latter.

* doCustom changes. On CSTATE_FAILURE, the next command is possibly started. Although there is some consistency with the previous point, I think that it totally breaks the state automaton where now a command can start while the whole transaction is in failing state anyway. There was no point in starting it in the first place. So, for me, the FAILURE state should record/count the failure, then skip to RETRY if a retry is decided, else proceed to ABORT. Nothing else. 
This is much clearer that way. Then RETRY should reinstate the global state and proceed to start the *first* command again. <...> It is unclear to me why backslash command errors are turned to FAILURE instead of ABORTED: there is no way they are going to be retried, so maybe they should/could skip directly to ABORTED?

So do you propose to execute the command "ROLLBACK" without calculating its latency etc. if we are in a failed transaction and clear the conditional stack after each failure? Also just to be clear: do you want to have the state CSTATE_ABORTED for client abortion and another state for interrupting the current transaction?

I do not understand what "interrupting the current transaction" means. A transaction is either committed or rollbacked, I do not know about "interrupted". When it is rollbacked, probably some stats will be collected in passing, I'm fine with that. If there is an error in a pgbench script, the transaction is aborted, which means for me that the script execution is stopped where it was, and either it is restarted from the beginning (retry) or counted as failure (not retry, just aborted, really). If by interrupted you mean that one script begins a transaction and another ends it, as I said in the review I think that this strange case should be forbidden, so that all the code and documentation trying to manage that can be removed.

The current RETRY state does memory allocations to generate a message with buffer allocation and so on. This looks like a costly and useless operation. If the user required "retries", then this is normal behavior, the retries are counted and will be printed out in the final report, and there is no point in printing out every single one of them. Maybe you want that debugging, but then costly operations should be guarded.

I think we need these debugging messages because, for example,

Debugging message should cost only when under debug. 
When not under debug, there should be no debugging message, and there should be no cost for building and discarding such messages in the executed code path beyond testing whether program is under debug.

if you use the option --latency-limit, we will never know in advance whether the serialization/deadlock failure will be retried or not.

ISTM that it will be shown in the final report. If I want debug, I ask for --debug, otherwise I think that the command should do what it was asked for, i.e. run scripts, collect performance statistics and show them at the end.

In particular, when running with retries is enabled, the user is expecting deadlock/serialization errors, so that they are not "errors" as such for them. They also help to understand which limit of retries was violated or how close we were to these limits during the execution of a specific transaction. But I agree with you that they are costly and can be skipped if the failure type is never retried. Maybe it is better to split them
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 09-07-2018 16:05, Fabien COELHO wrote: Hello Marina,

Hello, Fabien!

Here is a review for the last part of your v9 version.

Thank you very much for this!

Patch does not "git apply" (may anymore):

    error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
    error: doc/src/sgml/ref/pgbench.sgml: patch does not apply

Sorry, I'll send a new version soon.

However I could get it to apply with the "patch" command. Then patch compiles, global & pgbench "make check" are ok.

:-)

Feature
===

The patch adds the ability to restart transactions (i.e. the full script) on some errors, which is a good thing as it allows to exercise postgres performance in more realistic scenarios.

* -d/--debug: I'm not in favor in requiring a mandatory text argument on this option. It is not practical, the user has to remember it, and it is a change. I'm sceptical of the overall debug handling changes. Maybe we could have multiple -d which lead to higher debug level, but I'm not sure that it can be made to work for this case and still be compatible with the previous behavior. Maybe you need a specific option for your purpose, eg "--debug-retry"?

As you wrote in [1], adding an additional option is also a bad idea:

I'm sceptical of the "--debug-fails" options. ISTM that --debug is already there and should just be reused.

Maybe it's better to use an optional argument/arguments for compatibility (--debug[=fails] or --debug[=NUM])? But if we use the numbers, now I can see only 2 levels, and there's no guarantee that they will not change..

Code
===

* The implementation is less complex than the previous submission, which is a good thing. I'm not sure that all the remaining complexity is still fully needed.

* I'm reserved about the whole ereport thing, see comments in other messages.

Thank you, I'll try to implement the error reporting in the way you suggested.

Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me. In particular, the "CLIENT" part is not very useful. 
If the distinction makes sense, I would have kept "LOG" for the initial one and added other ones for ABORT and PGBENCH, maybe. Ok!

* There are no comments about "retries" in the StatData, CState and Command structures.

* Also, for StatData, I would like to understand the logic between cnt, skipped, retries, retried, errors, ... so clear information about the expected invariant, if any, would be welcome. One has to go into the code to understand how these fields relate to one another. <...>

* How "errors" differs from "ecnt" is unclear to me. Thank you, I'll fix this.

* commandFailed: I think that it should be kept much simpler. In particular, having errors on errors does not help much: on ELEVEL_FATAL, it ignores the actual reported error and generates another error of the same level, so that the initial issue is hidden. Even if these are can't-happen cases, hiding the origin if it occurs looks unhelpful. Just print it directly, and maybe abort if you think that it is a can't-happen case. Oh, thanks, my mistake(

* copyRandomState: just use sizeof(RandomState) instead of making assumptions about the contents of the struct. Also, this function looks pretty useless, why not just do a plain assignment?

* copyVariables: lacks comments to explain that the destination is cleaned up and so on. The cleanup phase could probably be in a distinct function, so that the code would be clearer. Maybe the function variable names are too long. Thank you, I'll fix this.

if (current_source->svalue) in the context of a guard for a strdup, maybe: if (current_source->svalue != NULL) I'm sorry, I'll fix this.

* I do not understand the comments on the CState enum: "First, remember the failure in CSTATE_FAILURE. Then process other commands of the failed transaction if any" Why would other commands be processed at all if the transaction is aborted? For me any error must lead to the rollback and possible retry of the transaction. This comment needs to be clarified.
It should also say that on FAILURE, it will go either to RETRY or ABORTED. See below my comments about doCustom. It is unclear to me why there could be several failures within a transaction, as I would have thought that it would be aborted on the first one.

* I do not understand the purpose of first_failure. The comment should explain why it would need to be remembered. From my point of view, I'm not fully convinced that it should. <...>

* executeCondition: this hides client automaton state changes which were clearly visible beforehand in the switch, and the different handling of if & elif is also hidden. I'm against this unnecessary restructuring and hiding such information; all state changes should be clearly seen in the state switch so that it is easier to understand and follow. I do not see why touching the conditional stack on internal errors (evaluateExpr failure) brings anything, the whole transaction will be aborted anyway.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

v9-0004-Pgbench-errors-and-serialization-deadlock-retries.patch - the main patch for handling client errors and repetition of transactions with serialization/deadlock failures (see the detailed description in the file).

Here is a review for the last part of your v9 version.

Patch does not "git apply" (may anymore): error: patch failed: doc/src/sgml/ref/pgbench.sgml:513 error: doc/src/sgml/ref/pgbench.sgml: patch does not apply However I could get it to apply with the "patch" command. Then patch compiles, global & pgbench "make check" are ok.

Feature ===

The patch adds the ability to restart transactions (i.e. the full script) on some errors, which is a good thing as it allows to exercise postgres performance in more realistic scenarios.

* -d/--debug: I'm not in favor of requiring a mandatory text argument on this option. It is not practical, the user has to remember it, and it is a change. I'm sceptical of the overall debug handling changes. Maybe we could have multiple -d which lead to higher debug levels, but I'm not sure that it can be made to work for this case and still be compatible with the previous behavior. Maybe you need a specific option for your purpose, eg "--debug-retry"?

Code

* The implementation is less complex than the previous submission, which is a good thing. I'm not sure that all the remaining complexity is still fully needed.

* I'm reserved about the whole ereport thing, see comments in other messages.

Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me. In particular, the "CLIENT" part is not very useful. If the distinction makes sense, I would have kept "LOG" for the initial one and added other ones for ABORT and PGBENCH, maybe.

* There are no comments about "retries" in the StatData, CState and Command structures.

* Also, for StatData, I would like to understand the logic between cnt, skipped, retries, retried, errors, ... so clear information about the expected invariant, if any, would be welcome.
One has to go into the code to understand how these fields relate to one another.

* "errors_in_failed_tx" is some subcounter of "errors", for a special case. Why it is there fails me [I finally understood, and I think it should be removed, see end of review]. If we wanted to distinguish, then we should distinguish homogeneously: maybe just count the different error types, eg have things like "deadlock_errors", "serializable_errors", "other_errors", "internal_pgbench_errors" which would be orthogonal to one another, and "errors" could be recomputed from these.

* How "errors" differs from "ecnt" is unclear to me.

* FailureStatus states are not homogeneously named. I'd suggest to use *_FAILURE for all cases. The miscellaneous case should probably be the last. I do not understand the distinction between ANOTHER_FAILURE & IN_FAILED_SQL_TRANSACTION. Why should it be needed? [again, see end of review]

* I do not understand the comments on the CState enum: "First, remember the failure in CSTATE_FAILURE. Then process other commands of the failed transaction if any" Why would other commands be processed at all if the transaction is aborted? For me any error must lead to the rollback and possible retry of the transaction. This comment needs to be clarified. It should also say that on FAILURE, it will go either to RETRY or ABORTED. See below my comments about doCustom. It is unclear to me why there could be several failures within a transaction, as I would have thought that it would be aborted on the first one.

* I do not understand the purpose of first_failure. The comment should explain why it would need to be remembered. From my point of view, I'm not fully convinced that it should.

* commandFailed: I think that it should be kept much simpler. In particular, having errors on errors does not help much: on ELEVEL_FATAL, it ignores the actual reported error and generates another error of the same level, so that the initial issue is hidden.
Even if these are can't-happen cases, hiding the origin if it occurs looks unhelpful. Just print it directly, and maybe abort if you think that it is a can't-happen case.

* copyRandomState: just use sizeof(RandomState) instead of making assumptions about the contents of the struct. Also, this function looks pretty useless, why not just do a plain assignment?

* copyVariables: lacks comments to explain that the destination is cleaned up and so on. The cleanup phase could probably be in a distinct function, so that the code would be clearer. Maybe the function variable names are too long. if (current_source->svalue) in the context of a guard for a strdup, maybe: if (current_source->svalue != NULL)

* executeCondition: this hides client automaton state changes which were clearly visible beforehand in the switch, and the different handling of if & elif is also hidden. I'm against this unnecessary restructuring and hiding such information; all state changes should be clearly seen in the state switch so that it is easier to understand and follow.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Alvaro,

For context: in the backend, elog() is only used for internal messages (i.e. "can't-happen" conditions), and ereport() is used for user-facing messages. There are many things ereport() has that elog() doesn't, such as additional message fields (HINT, DETAIL, etc) that I think could have some use in pgbench as well. If you use elog() then you can't have that. [...]

Ok. Then forget elog, but I'm pretty against having a kind of ereport which looks greatly overkill to me, because: (1) the syntax is pretty heavy, and does not look like a function. (2) the implementation allocates a string buffer for the message; this is greatly overkill for pgbench which only needs to print to stderr once. This makes sense server-side because the generated message may be output several times (eg stderr, file logging, to the client), and the implementation has to work with cpp implementations which do not handle varargs (and maybe other reasons).

So I would be in favor of having just a simpler error function. Incidentally, one already exists, "pgbench_error", and could be improved, extended, replaced. There is also "syntax_error".

One thing that just came to mind is that pgbench uses some src/fe_utils stuff. I hope having ereport() doesn't cause a conflict with that ...

Currently ereport does not exist client-side. I do not think that this patch is the right moment to decide to do that. Also, there are some "elog" in libpq, but they are out with a "#ifndef FRONTEND".

BTW I think abort() is not the right thing, as it'll cause core dumps if enabled. Why not just exit(1)? Yes, I agree and already reported that.

Conclusion: My current opinion is that I'm pretty against bringing "ereport" to the front-end on this specific pgbench patch. I agree with you that "elog" would be misleading there as well, for the arguments you developed above. I'd suggest to have just one clean and simple pgbench-internal function to handle errors and possibly exit, debug...
Something like:

    void pgb_error(FATAL, "error %d raised", 12);

Implemented as:

    void pgb_error(int/enum XXX level, const char *format, ...)
    {
        test level and maybe return immediately (eg debug);
        print to stderr;
        exit/abort/return depending;
    }

Then if some advanced error handling is introduced for front-end programs, possibly through some macros, then it would be time to improve upon that. -- Fabien.
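[Editor's note] Fabien's pseudocode above could be fleshed out roughly as follows. This is a minimal, hypothetical sketch: the names pgb_elevel, pgb_min_level and the message counter are invented here for illustration, and the real patch would choose its own levels and exit policy.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error levels; a real patch would reuse pgbench's names. */
typedef enum { PGB_DEBUG, PGB_LOG, PGB_ERROR, PGB_FATAL } pgb_elevel;

static pgb_elevel pgb_min_level = PGB_LOG; /* e.g. lowered to PGB_DEBUG by -d */
static int pgb_messages_emitted = 0;       /* only to make the sketch observable */

static void
pgb_error(pgb_elevel level, const char *fmt, ...)
{
    va_list ap;

    /* test level and maybe return immediately (eg debug) */
    if (level < pgb_min_level)
        return;

    /* print to stderr */
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    pgb_messages_emitted++;

    /* exit/abort/return depending; exit(1) rather than abort(), per Alvaro */
    if (level >= PGB_FATAL)
        exit(1);
}
```

A call like pgb_error(PGB_LOG, "error %d raised", 12) then prints one line to stderr, while PGB_DEBUG messages are silently dropped unless debugging was requested.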
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 13-06-2018 22:44, Fabien COELHO wrote: Hello Marina,

I suppose that this is related; because of my patch there may be a lot of such code (see v7 in [1]):

-        fprintf(stderr,
-                "malformed variable \"%s\" value: \"%s\"\n",
-                var->name, var->svalue);
+        if (debug_level >= DEBUG_FAILS)
+        {
+            fprintf(stderr,
+                    "malformed variable \"%s\" value: \"%s\"\n",
+                    var->name, var->svalue);
+        }

-    if (debug)
+    if (debug_level >= DEBUG_ALL)
         fprintf(stderr, "client %d sending %s\n", st->id, sql);

I'm not sure that debug messages need to be kept after debug, if it is about debugging pgbench itself. That is debatable. AFAICS it is not about debugging pgbench itself, but about more detailed information that can be used to understand what exactly happened during its launch. In the case of errors this helps to distinguish between failures or errors by type (including which limit for retries was violated and how far it was exceeded for the serialization/deadlock errors).

The code adapts/duplicates existing server-side "ereport" stuff and brings it to the frontend, where the logging needs are somehow quite different. I'd prefer to avoid duplication and/or have some code sharing. I was recommended to use the same interface in [3]: On elog/errstart: we already have a convention for what ereport() calls look like; I suggest to use that instead of inventing your own. The "elog" interface already exists, it is not an invention. "ereport" is a hack which is somehow necessary in some cases. I prefer a simple function call if possible for the purpose, and ISTM that this is the case. That is a lot of complication which is justified server-side where logging requirements are special, but in this case I see it as overkill. I think we need ereport() if we want to make detailed error messages (see examples in [1])..

If it really needs to be duplicated, I'd suggest to put all this stuff in separate files.
If we want to do that, I think that it would belong to fe_utils, where it could/should be used by all front-end programs. I'll try to do it..

Dunno. If you only need one "elog" function which prints a message to stderr and decides whether to abort/exit/whatever, maybe it can just be kept in pgbench. If there are several complicated functions and macros, better with a file. So I'd say it depends. So my current view is that if you only need an "elog" function, it is simpler to add it to "pgbench.c". Thank you!

For logging purposes, ISTM that the "elog" macro interface is nicer, closer to the existing "fprintf(stderr", as it would not introduce the additional parentheses hack for "rest". I was also recommended to use ereport() instead of elog() in [3]: Probably. Are you hoping that advice from different reviewers will be consistent? That seems optimistic:-) To make the patch committable there should be no objections to it..

[1] https://www.postgresql.org/message-id/c89fcc380a19380260b5ea463efc1416%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 13-06-2018 22:59, Alvaro Herrera wrote: For context: in the backend, elog() is only used for internal messages (i.e. "can't-happen" conditions), and ereport() is used for user-facing messages. There are many things ereport() has that elog() doesn't, such as additional message fields (HINT, DETAIL, etc) that I think could have some use in pgbench as well. If you use elog() then you can't have that.

AFAIU originally it was not supposed that the pgbench error messages have these fields, so will it be good to change the final output to stderr?.. For example:

-    fprintf(stderr, "%s", PQerrorMessage(con));
-    fprintf(stderr, "(ignoring this error and continuing anyway)\n");
+    ereport(LOG,
+            (errmsg("Ignoring the server error and continuing anyway"),
+             errdetail("%s", PQerrorMessage(con))));

-    fprintf(stderr, "%s", PQerrorMessage(con));
-    if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
-    {
-        fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n", PQdb(con));
-    }
-
-    exit(1);
+    ereport(ERROR,
+            (errmsg("Server error"),
+             errdetail("%s", PQerrorMessage(con)),
+             sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0 ?
+             errhint("Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
+                     PQdb(con)) : 0));

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 2018-Jun-13, Fabien COELHO wrote: > > > > With that, is there a need for elog()? In the backend we have > > > > it because $HISTORY but there's no need for that here -- I > > > > propose to lose elog() and use only ereport everywhere. > > See commit 8a07ebb3c172 which turns some ereport into elog...

For context: in the backend, elog() is only used for internal messages (i.e. "can't-happen" conditions), and ereport() is used for user-facing messages. There are many things ereport() has that elog() doesn't, such as additional message fields (HINT, DETAIL, etc) that I think could have some use in pgbench as well. If you use elog() then you can't have that.

Another difference is that in the backend, elog() messages are never translated, while ereport() messages are translated. Since pgbench is translatable I think it would be best to keep those things in sync, to avoid confusion. (Although of course you could do it differently in pgbench than in the backend.)

One thing that just came to mind is that pgbench uses some src/fe_utils stuff. I hope having ereport() doesn't cause a conflict with that ...

BTW I think abort() is not the right thing, as it'll cause core dumps if enabled. Why not just exit(1)?

-- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

I suppose that this is related; because of my patch there may be a lot of such code (see v7 in [1]):

-        fprintf(stderr,
-                "malformed variable \"%s\" value: \"%s\"\n",
-                var->name, var->svalue);
+        if (debug_level >= DEBUG_FAILS)
+        {
+            fprintf(stderr,
+                    "malformed variable \"%s\" value: \"%s\"\n",
+                    var->name, var->svalue);
+        }

-    if (debug)
+    if (debug_level >= DEBUG_ALL)
         fprintf(stderr, "client %d sending %s\n", st->id, sql);

I'm not sure that debug messages need to be kept after debug, if it is about debugging pgbench itself. That is debatable. That's why it was suggested to make the error function which hides all these things (see [2]): There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with corresponding fprintf(stderr..) I think it's time to do it like in the main code, wrap with some function like log(level, msg). Yep. I did not write that, but I agree with an "elog" suggestion to switch if (...) { fprintf(...); exit/abort/continue/... } to a simpler: elog(level, ...)

Moreover, it changes pgbench's current behavior, which might be admissible, but should be discussed clearly. The semantics of the existing code is changed: the FATAL level calls abort() and replaces existing exit(1) calls. Maybe you want an ERROR level as well. Oh, thanks, I agree with you. And I do not want to change the program exit code without good reasons, but I'm sorry I may not know all the pros and cons in this matter.. Or did you also mean other changes? AFAICR I meant switching exit to abort in some cases.

The code adapts/duplicates existing server-side "ereport" stuff and brings it to the frontend, where the logging needs are somehow quite different. I'd prefer to avoid duplication and/or have some code sharing. I was recommended to use the same interface in [3]: On elog/errstart: we already have a convention for what ereport() calls look like; I suggest to use that instead of inventing your own. The "elog" interface already exists, it is not an invention.
"ereport" is a hack which is somehow necessary in some cases. I prefer a simple function call if possible for the purpose, and ISTM that this is the case.

If it really needs to be duplicated, I'd suggest to put all this stuff in separate files. If we want to do that, I think that it would belong to fe_utils, where it could/should be used by all front-end programs. I'll try to do it.. Dunno. If you only need one "elog" function which prints a message to stderr and decides whether to abort/exit/whatever, maybe it can just be kept in pgbench. If there are several complicated functions and macros, better with a file. So I'd say it depends.

For logging purposes, ISTM that the "elog" macro interface is nicer, closer to the existing "fprintf(stderr", as it would not introduce the additional parentheses hack for "rest". I was also recommended to use ereport() instead of elog() in [3]: Probably. Are you hoping that advice from different reviewers will be consistent? That seems optimistic:-)

With that, is there a need for elog()? In the backend we have it because $HISTORY but there's no need for that here -- I propose to lose elog() and use only ereport everywhere. See commit 8a07ebb3c172 which turns some ereport into elog...

My 0.02€: maybe you just want to turn

    fprintf(stderr, format, ...);
    // then possibly exit or abort depending...

into

    elog(level, format, ...);

which maybe would exit or abort depending on level, and possibly not actually report under some levels and/or some conditions. For that, it could be enough to just provide a nice "elog" function. I agree that elog() can be coded in this way. To use ereport() I need a structure to store the error level as a condition to exit. Yep.

That is a lot of complication which is justified server-side where logging requirements are special, but in this case I see it as overkill. So my current view is that if you only need an "elog" function, it is simpler to add it to "pgbench.c". -- Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 10-06-2018 10:38, Fabien COELHO wrote: Hello Marina, Hello!

v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch - a patch for the ereport() macro (this is used to report client failures that do not cause aborts and this depends on the level of debugging). ISTM that abort() is called under FATAL. If you mean abortion of the client, this is not an abortion of the main program.

- implementation: if possible, use the local ErrorData structure during the errstart()/errmsg()/errfinish() calls. Otherwise use a static variable protected by a mutex if necessary. To do all of this export the function appendPQExpBufferVA from libpq.

This patch applies cleanly on top of the other ones (there are minimal interactions), compiles cleanly, global & pgbench "make check" are ok. :-)

IMO this patch is more controversial than the other ones. It is not really related to the aim of the patch series, which could do without, couldn't it? I'd suggest that it should be an independent submission, unrelated to the pgbench error management patch. I suppose that this is related; because of my patch there may be a lot of such code (see v7 in [1]):

-        fprintf(stderr,
-                "malformed variable \"%s\" value: \"%s\"\n",
-                var->name, var->svalue);
+        if (debug_level >= DEBUG_FAILS)
+        {
+            fprintf(stderr,
+                    "malformed variable \"%s\" value: \"%s\"\n",
+                    var->name, var->svalue);
+        }

-    if (debug)
+    if (debug_level >= DEBUG_ALL)
         fprintf(stderr, "client %d sending %s\n", st->id, sql);

That's why it was suggested to make the error function which hides all these things (see [2]): There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with corresponding fprintf(stderr..) I think it's time to do it like in the main code, wrap with some function like log(level, msg).

Moreover, it changes pgbench's current behavior, which might be admissible, but should be discussed clearly. The semantics of the existing code is changed: the FATAL level calls abort() and replaces existing exit(1) calls.
Maybe you want an ERROR level as well. Oh, thanks, I agree with you. And I do not want to change the program exit code without good reasons, but I'm sorry I may not know all the pros and cons in this matter.. Or did you also mean other changes?

The code adapts/duplicates existing server-side "ereport" stuff and brings it to the frontend, where the logging needs are somehow quite different. I'd prefer to avoid duplication and/or have some code sharing. I was recommended to use the same interface in [3]: On elog/errstart: we already have a convention for what ereport() calls look like; I suggest to use that instead of inventing your own.

If it really needs to be duplicated, I'd suggest to put all this stuff in separate files. If we want to do that, I think that it would belong to fe_utils, where it could/should be used by all front-end programs. I'll try to do it..

I do not understand why names are changed, eg ELEVEL_FATAL instead of FATAL. ISTM that part of the point of the move would be to be homogeneous, which suggests that the same names should be reused. Ok!

For logging purposes, ISTM that the "elog" macro interface is nicer, closer to the existing "fprintf(stderr", as it would not introduce the additional parentheses hack for "rest". I was also recommended to use ereport() instead of elog() in [3]: With that, is there a need for elog()? In the backend we have it because $HISTORY but there's no need for that here -- I propose to lose elog() and use only ereport everywhere.

I see no actual value in creating on the fly a dynamic buffer through plenty of macros and functions as the end result is just to print the message out to stderr in the end. errfinishImpl: fprintf(stderr, "%s", error->message.data); This looks like overkill.
From reading the code, this does not look like an improvement:

    fprintf(stderr, "invalid socket: %s", PQerrorMessage(st->con));

vs

    ereport(ELEVEL_LOG, (errmsg("invalid socket: %s", PQerrorMessage(st->con))));

The whole complexity of the server-side interface only makes sense because of the TRY/CATCH stuff and complex logging requirements (eg several outputs) in the backend. The patch adds quite some code and complexity without clear added value that I can see.

My 0.02€: maybe you just want to turn

    fprintf(stderr, format, ...);
    // then possibly exit or abort depending...

into

    elog(level, format, ...);

which maybe would exit or abort depending on level, and possibly not actually report under some levels and/or some conditions. For that, it could be enough to just provide a nice "elog" function. I agree that elog() can be coded in this way.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch - a patch for the ereport() macro (this is used to report client failures that do not cause aborts and this depends on the level of debugging). ISTM that abort() is called under FATAL.

- implementation: if possible, use the local ErrorData structure during the errstart()/errmsg()/errfinish() calls. Otherwise use a static variable protected by a mutex if necessary. To do all of this export the function appendPQExpBufferVA from libpq.

This patch applies cleanly on top of the other ones (there are minimal interactions), compiles cleanly, global & pgbench "make check" are ok.

IMO this patch is more controversial than the other ones. It is not really related to the aim of the patch series, which could do without, couldn't it? Moreover, it changes pgbench's current behavior, which might be admissible, but should be discussed clearly. I'd suggest that it should be an independent submission, unrelated to the pgbench error management patch.

The code adapts/duplicates existing server-side "ereport" stuff and brings it to the frontend, where the logging needs are somehow quite different. I'd prefer to avoid duplication and/or have some code sharing. If it really needs to be duplicated, I'd suggest to put all this stuff in separate files. If we want to do that, I think that it would belong to fe_utils, where it could/should be used by all front-end programs.

I do not understand why names are changed, eg ELEVEL_FATAL instead of FATAL. ISTM that part of the point of the move would be to be homogeneous, which suggests that the same names should be reused.

For logging purposes, ISTM that the "elog" macro interface is nicer, closer to the existing "fprintf(stderr", as it would not introduce the additional parentheses hack for "rest".
I see no actual value in creating on the fly a dynamic buffer through plenty of macros and functions as the end result is just to print the message out to stderr in the end. errfinishImpl: fprintf(stderr, "%s", error->message.data); This looks like overkill.

From reading the code, this does not look like an improvement:

    fprintf(stderr, "invalid socket: %s", PQerrorMessage(st->con));

vs

    ereport(ELEVEL_LOG, (errmsg("invalid socket: %s", PQerrorMessage(st->con))));

The whole complexity of the server-side interface only makes sense because of the TRY/CATCH stuff and complex logging requirements (eg several outputs) in the backend. The patch adds quite some code and complexity without clear added value that I can see.

The semantics of the existing code is changed: the FATAL level calls abort() and replaces existing exit(1) calls. Maybe you want an ERROR level as well.

My 0.02€: maybe you just want to turn

    fprintf(stderr, format, ...);
    // then possibly exit or abort depending...

into

    elog(level, format, ...);

which maybe would exit or abort depending on level, and possibly not actually report under some levels and/or some conditions. For that, it could be enough to just provide a nice "elog" function.

In conclusion, which you can disagree with because maybe I have missed something... anyway I currently think that: - it should be an independent submission - possibly at "fe_utils" level - possibly just a nice "elog" function is enough, if so just do that. -- Fabien.
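[Editor's note] For readers unfamiliar with the "parentheses hack" Fabien mentions: pre-C99 cpp has no variadic macros, so the backend's ereport() takes the whole list of auxiliary calls as one parenthesized argument. A stripped-down sketch of the shape (all names and the fixed buffer are simplifications invented here; the real backend errmsg()/errfinish() do far more, and the patch under review buffered into a dynamic PQExpBuffer):

```c
#include <stdarg.h>
#include <stdio.h>

enum elevel { ELEVEL_LOG, ELEVEL_FATAL };

static char err_buf[1024];      /* the reviewed patch used a dynamic buffer */

/* Format the message into the buffer; returns a dummy int so the call
 * can appear inside the parenthesized "rest" expression. */
static int
errmsg(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(err_buf, sizeof(err_buf), fmt, ap);
    va_end(ap);
    return 0;
}

/* Print the accumulated message; a real implementation would also
 * exit or abort here on ELEVEL_FATAL. */
static void
errfinish(int level)
{
    (void) level;
    fprintf(stderr, "%s\n", err_buf);
}

/* "rest" arrives as a single macro argument thanks to the extra
 * parentheses at the call site. */
#define ereport(level, rest) \
    do { (void) (rest); errfinish(level); } while (0)
```

A call then looks like ereport(ELEVEL_LOG, (errmsg("invalid socket: %s", "closed"))); the extra machinery is exactly what the review argues is not worth it when the end result is one fprintf to stderr.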
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 09-06-2018 16:31, Fabien COELHO wrote: Hello Marina, Hello!

v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures).

About this second patch: This extracts the variable-holding structure, so that it is somehow easier to reset them to their initial state on transaction failures, the management of which is the ultimate aim of this patch series. It is also cleaner this way.

Patch applies cleanly on top of the previous one (there are no real interactions with it). It compiles cleanly. Global & pgbench "make check" are both ok. :-)

The structure typedef does not need a name. "typedef struct { } V...". Ok!

I tend to disagree with naming things after their type, eg "array". I'd suggest "vars" instead. "nvariables" could be "nvars" for consistency with that and "vars_sorted", and because "foo.variables->nvariables" starts looking heavy. I'd suggest to put the "Variables" type declaration just after the "Variable" type declaration in the file. Thank you, I agree and I'll fix all this.

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 09-06-2018 9:55, Fabien COELHO wrote: Hello Marina, Hello!

v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).

A few comments about this first patch. Thank you very much!

Patch applies cleanly, compiles, global & pgbench "make check" ok. I'm mostly ok with the changes, which cleanly separate the different uses of random between threads (script choice, throttle delay, sampling...) and client (random*() calls). Glad to hear it :)

This change is necessary so that a client can restart a transaction deterministically (at the client level at least), which is the ultimate aim of the patch series.

A few remarks: The RandomState struct is 6 bytes, which will induce some padding when used. This is life and pre-existing. No problem. ISTM that the struct itself does not need a name, ie. "typedef struct { ... } RandomState" is enough. Ok!

There could be clear comments, say in the TState and CState structs, about what randomness is impacted (i.e. script choices, etc.). Thank you, I'll add them.

getZipfianRand, computeHarmonicZipfian: The "thread" parameter was justified because it was used for two fields. As the random state is separated, I'd suggest that the other argument should be a zipfcache pointer. I agree with you and I will change it.

While reading your patch, it occurs to me that a run is not deterministic at the thread level under throttling and sampling, because the random state is solicited differently depending on when transactions end. This suggests that maybe each use of the thread's random_state should have its own random state. Thank you, I'll fix this.

In passing, and totally unrelated to this patch: I've always been a little puzzled about why a quite small 48-bit internal state random generator is used.
I understand the need for pg to have a portable & state-controlled thread-safe random generator, but why this particular small one fails me. The source code (src/port/erand48.c, copyright in 1993...) looks optimized for 16-bit architectures, which is probably pretty inefficient to run on 64-bit architectures. Maybe this could be updated with something more consistent with today's processors, providing more quality at a lower cost. This sounds interesting, thanks! *went to look for a multiplier and a summand that are large enough and are mutually prime..*

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
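[Editor's note] For reference, the drand48 family that erand48() belongs to is a 48-bit linear congruential generator, x' = (a*x + c) mod 2^48, with the standard parameters a = 0x5DEECE66D and c = 0xB; the 1993-era code carries the state in three 16-bit words, which is what makes it look dated on 64-bit machines. A sketch of the same recurrence in a single 64-bit word (illustrative only, not a drop-in replacement for src/port/erand48.c):

```c
#include <stdint.h>

#define ERAND48_A    0x5DEECE66DULL        /* standard drand48 multiplier */
#define ERAND48_C    0xBULL                /* standard drand48 addend */
#define ERAND48_MASK ((1ULL << 48) - 1)    /* keep 48 bits of state */

/* Advance the LCG one step and return a double uniformly in [0, 1). */
static double
erand48_u64(uint64_t *state)
{
    *state = (ERAND48_A * *state + ERAND48_C) & ERAND48_MASK;
    return (double) *state / (double) (1ULL << 48);
}
```

The determinism that RandomState relies on falls out directly: resetting the state word to its saved value replays exactly the same sequence, which is what lets a client retry a failed transaction with the same random choices.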
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

>> v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
>> - a patch for the Variables structure (this is used to reset client
>> variables during the repeating of transactions after
>> serialization/deadlock failures).

About this second patch:

This extracts the variable-holding structure, so that it is somehow easier to reset variables to their initial state on transaction failures, the management of which is the ultimate aim of this patch series. It is also cleaner this way.

Patch applies cleanly on top of the previous one (there are no real interactions with it). It compiles cleanly. Global & pgbench "make check" are both ok.

The structure typedef does not need a name: "typedef struct { } V...".

I tend to disagree with naming things after their type, eg "array". I'd suggest "vars" instead. "nvariables" could be "nvars" for consistency with that and "vars_sorted", and because "foo.variables->nvariables" starts looking heavy.

I'd suggest to put the "Variables" type declaration just after the "Variable" type declaration in the file.

-- 
Fabien.
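A minimal sketch of the suggested shape, with the member names "vars", "nvars" and "vars_sorted" as proposed above (the Variable element type is reduced to a name/value pair for illustration, and the restore helper is a hypothetical stand-in for what the retry logic needs):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative element type: a named variable with a string value. */
typedef struct
{
    char       *name;
    char       *svalue;
} Variable;

/* Anonymous struct typedef, as suggested in the review. */
typedef struct
{
    Variable   *vars;           /* array of variables */
    int         nvars;          /* number of variables */
    bool        vars_sorted;    /* are variables sorted by name? */
} Variables;

/* Restore a client's variables from a snapshot taken before the
 * transaction, as needed when retrying after a serialization or
 * deadlock failure (a shallow copy is enough for this sketch). */
static void
restoreVariables(Variables *dst, const Variables *snapshot)
{
    free(dst->vars);
    dst->vars = malloc(sizeof(Variable) * snapshot->nvars);
    memcpy(dst->vars, snapshot->vars, sizeof(Variable) * snapshot->nvars);
    dst->nvars = snapshot->nvars;
    dst->vars_sorted = snapshot->vars_sorted;
}
```

Grouping the array, its length, and the sorted flag into one struct is what makes this snapshot/restore cycle a one-liner at each retry site instead of three separate assignments.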
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

>> v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
>> - a patch for the RandomState structure (this is used to reset a
>> client's random seed during the repeating of transactions after
>> serialization/deadlock failures).

A few comments about this first patch.

Patch applies cleanly, compiles, global & pgbench "make check" ok.

I'm mostly ok with the changes, which cleanly separate the different uses of random between threads (script choice, throttle delay, sampling...) and clients (random*() calls). This change is necessary so that a client can restart a transaction deterministically (at the client level at least), which is the ultimate aim of the patch series.

A few remarks:

The RandomState struct is 6 bytes, which will induce some padding when used. This is life and pre-existing. No problem.

ISTM that the struct itself does not need a name, ie. "typedef struct { ... } RandomState" is enough.

There could be clear comments, say in the TState and CState structs, about what randomness is impacted (i.e. script choices, etc.).

getZipfianRand, computeHarmonicZipfian: the "thread" parameter was justified because it was used for two fields. As the random state is separated, I'd suggest that the other argument should be a zipfcache pointer.

While reading your patch, it occurs to me that a run is not deterministic at the thread level under throttling and sampling, because the random state is solicited differently depending on when a transaction ends. This suggests that maybe each thread random_state use should have its own random state.

In passing, and totally unrelated to this patch: I've always been a little puzzled about why a quite small 48-bit internal state random generator is used. I understand the need for pg to have a portable & state-controlled thread-safe random generator, but why this particular small one escapes me. The source code (src/port/erand48.c, copyright 1993...) looks optimized for 16-bit architectures, which is probably pretty inefficient to run on 64-bit architectures. Maybe this could be updated with something more consistent with today's processors, providing more quality at a lower cost.

-- 
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello,

Fabien COELHO wrote:
>> Looking over the diff, I find that this patch tries to do too much
>> and needs to be split up.
>
> Yep, I agree that it would help the reviewing process. On the other
> hand I have bad memories about maintaining dependent patches which
> interfere significantly.

Sure. I suggest not posting these patches separately -- instead, post them as a series of commits in a single email, attaching files from "git format-patch".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Alvaro,

>> I think that I'll have time for a round of review in the first half
>> of July. Providing a rebased patch before then would be nice.
>
> Note that even in the absence of a rebased patch, you can apply it to
> an older checkout if you have some limited window of time for a
> review.

Yes, sure. I'd like to bring this feature to be committable, so it will have to be rebased at some point anyway.

> Looking over the diff, I find that this patch tries to do too much
> and needs to be split up.

Yep, I agree that it would help the reviewing process. On the other hand I have bad memories about maintaining dependent patches which interfere significantly. Maybe that won't be the case with this feature.

Thanks for the advice.

-- 
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Fabien COELHO wrote:
> I think that I'll have time for a round of review in the first half
> of July. Providing a rebased patch before then would be nice.

Note that even in the absence of a rebased patch, you can apply it to an older checkout if you have some limited window of time for a review.

Looking over the diff, I find that this patch tries to do too much and needs to be split up. At a minimum there is a preliminary patch that introduces the error reporting stuff (errstart etc); there are other thread-related changes (for example to the random generation functions) that probably belong in a separate one too. Not sure if there are other smaller patches hidden inside the rest.

On elog/errstart: we already have a convention for what ereport() calls look like; I suggest to use that instead of inventing your own. With that, is there a need for elog()? In the backend we have it because $HISTORY, but there's no need for that here -- I propose to lose elog() and use only ereport() everywhere. Also, I don't see that you need errmsg_internal() at all; let's lose it too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
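To illustrate the point about a single reporting entry point, a hypothetical front-end sketch might look like the following. The level names and the function are invented for this sketch; they are not the backend's ereport machinery, only the shape being argued for, where one call takes an error level so no separate elog() or errmsg_internal() is needed:

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative severity levels for a front-end reporter. */
typedef enum { PGB_DEBUG, PGB_LOG, PGB_ERROR, PGB_FATAL } ErrorLevel;

/* Does this level terminate the program? */
static bool
error_level_is_fatal(ErrorLevel elevel)
{
    return elevel >= PGB_FATAL;
}

/* Single entry point: format a message to stderr and exit on fatal
 * levels, so call sites never need a second reporting function. */
static void
pgbench_error(ErrorLevel elevel, const char *fmt, ...)
{
    va_list     ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);

    if (error_level_is_fatal(elevel))
        exit(1);
}
```

A call site then reads like `pgbench_error(PGB_LOG, "client %d got a serialization failure\n", id);` regardless of severity, which is the uniformity the review asks for.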
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

FYI the v8 patch does not apply anymore, mostly because of a recent perl reindentation.

I think that I'll have time for a round of review in the first half of July. Providing a rebased patch before then would be nice.

-- 
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Wed, 04 Apr 2018 16:07:25 +0300 Marina Polyakova wrote:
> Hello, hackers!
>
> Here is the seventh version of the patch for error handling and
> retrying of transactions with serialization/deadlock failures in
> pgbench (based on the commit
> a08dc711952081d63577fc182fcf955958f70add). I added the option
> --max-tries-time, which is an implementation of Fabien Coelho's
> proposal in [1]: a transaction with a serialization or deadlock
> failure can be retried if the total time of all its tries is less
> than this limit (in ms). This option can be combined with the option
> --max-tries. But if none of them are used, failed transactions are
> not retried at all.
>
> Also:
> * Now when the first failure occurs in the transaction it is always
> reported as a failure, since only after the remaining commands of
> this transaction are executed do we find out whether we can try
> again or not. Therefore the messages about retrying or ending the
> failed transaction were added to the "fails" debugging level, so you
> can distinguish failures (which are retried) from errors (which are
> not retried).
> * Fix the report on the latency average, because the total time
> includes time for both errors and successful transactions.
> * Code cleanup (including tests).
>
> [1]
> https://www.postgresql.org/message-id/alpine.DEB.2.20.1803292134380.16472%40lancre
>
>> Maybe the max retry should rather be expressed in time rather than
>> number of attempts, or both approaches could be implemented?

Hi, I did a little review of your patch. It seems to work as expected, and documentation and tests are there. Still, I have a few comments.

There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with corresponding fprintf(stderr..) calls. I think it's time to do it like in the main code and wrap them with some function like log(level, msg).

In the CSTATE_RETRY state, used_time is used only in printing but is calculated more often than needed.

In my opinion, Debuglevel should be renamed to DebugLevel, which looks nicer; also there is DEBUGLEVEl (where the last letter is in lower case), which is very confusing.

I have checked the overall functionality of this patch, but haven't checked any special cases yet.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
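The suggested wrapper for the repeated "if (debug_level >= DEBUG_FAILS)" checks could be sketched as below (names are illustrative, not from the patch; the function returns whether it printed, purely to make the filtering observable):

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative debug levels, ordered by verbosity. */
typedef enum { DEBUG_NO, DEBUG_FAILS, DEBUG_ALL } DebugLevel;

static DebugLevel debug_level = DEBUG_FAILS;

/* Print a debug message iff the current debug level admits it, so
 * call sites no longer repeat the level check; returns whether the
 * message was actually emitted. */
static bool
pgbench_debug(DebugLevel level, const char *fmt, ...)
{
    va_list     ap;

    if (level > debug_level)
        return false;           /* filtered out at this level */

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return true;
}
```

Each `if (debug_level >= DEBUG_FAILS) fprintf(stderr, ...)` site then collapses to a single `pgbench_debug(DEBUG_FAILS, ...)` call.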
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> Conception of the max-retry option seems strange to me. If the number
> of retries reaches the max-retry option, then we just increment the
> counter of failed transactions and try again (possibly with different
> random numbers). At the end we should distinguish the number of
> errored transactions and failed transactions; to find this difference
> the documentation suggests rerunning pgbench with debugging on.
>
> Maybe I didn't catch the idea, but it seems to me max-tries should be
> removed. On a transaction serialization or deadlock error, pgbench
> should increment the counter of failed transactions, reset the
> conditional stack, variables, etc. (but not the random generator) and
> then start a new transaction from the first line of the script.

ISTM that the idea is that the client application should give up at some point and report an error to the end user, kind of a "timeout" on trying, and that max-retry would implement this logic of giving up: the transaction which was intended, represented by a given initial random generator state, could not be committed even after some iterations.

Maybe the max retry should rather be expressed in time rather than number of attempts, or both approaches could be implemented?

But there is a logic of retrying the same (try again what the client wanted) vs retrying something different (another client need is served).

-- 
Fabien.
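The giving-up logic under discussion, with both an attempt limit and a time limit, can be sketched as a pure predicate. Names and parameters are illustrative, and treating 0 as "no limit" is an assumption of this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a transaction that hit a serialization/deadlock
 * failure may be tried again.  It is retried with the SAME initial
 * random-generator state, until either the attempt budget (max_tries)
 * or the time budget (max_tries_time_us) is exhausted; 0 means "no
 * limit" for either knob. */
static bool
canRetryTransaction(int tries_done, int max_tries,
                    int64_t used_time_us, int64_t max_tries_time_us)
{
    if (max_tries > 0 && tries_done >= max_tries)
        return false;           /* attempt budget exhausted */
    if (max_tries_time_us > 0 && used_time_us >= max_tries_time_us)
        return false;           /* time budget exhausted */
    return true;
}
```

Expressing both limits through one predicate keeps the "retry the same transaction" and "give up and count a failure" paths in a single place, whichever knob the user sets.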
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Conception of the max-retry option seems strange to me. If the number of retries reaches the max-retry option, then we just increment the counter of failed transactions and try again (possibly with different random numbers). At the end we should distinguish the number of errored transactions and failed transactions; to find this difference the documentation suggests rerunning pgbench with debugging on.

Maybe I didn't catch the idea, but it seems to me max-tries should be removed. On a transaction serialization or deadlock error, pgbench should increment the counter of failed transactions, reset the conditional stack, variables, etc. (but not the random generator) and then start a new transaction from the first line of the script.

Marina Polyakova wrote:
> On 26-03-2018 18:53, Fabien COELHO wrote:
>> Hello Marina,
>
> Hello!
>
>>> Many thanks to both of you! I'm working on a patch in this
>>> direction..
>
>>>> I think that the best approach for now is simply to reset (command
>>>> zero, random generator) and start over the whole script, without
>>>> attempting to be more intelligent. The limitations should be
>>>> clearly documented (one transaction per script), though. That
>>>> would be a significant enhancement already.
>>>
>>> I'm not sure that we can always do this, because we can get new
>>> errors until we finish the failed transaction block, and we need to
>>> destroy the conditional stack..
>>
>> Sure. I'm suggesting, so as to simplify, that on failures the retry
>> would always restart from the beginning of the script by resetting
>> everything, indeed including the conditional stack, the random
>> generator state, the variable values, and so on. This means
>> enforcing somehow that one script is one transaction. If the user
>> does not do that, it would be their decision and the result becomes
>> unpredictable on errors (eg some sub-transactions could be executed
>> more than once). Then if more is needed, that could be for another
>> patch.
>
> Here is the fifth version of the patch for pgbench (based on the
> commit 4b9094eb6e14dfdbed61278ea8e51cc846e43579) where I tried to
> implement these ideas, thanks to your comments and those of Teodor
> Sigaev.
>
> Since we may need to execute commands to complete a failed
> transaction block, the script is now always executed completely. If
> there is a serialization/deadlock failure which can be retried, the
> script is executed again with the same random state and array of
> variables as before its first run. Meta command errors as well as all
> SQL errors do not cause the client to abort. The first failure in the
> current script execution determines whether the script run will be
> retried or not, so only such failures (which have a retry) or errors
> (which are not retried) are reported.
>
> I tried to make fixes in accordance with your previous reviews ([1],
> [2], [3]):
>
>> I'm unclear about the example added in the documentation. There are
>> 71% errors, but 100% of transactions are reported as processed. If
>> there were errors, then it is not a success, so the transactions
>> were not processed? To me it looks inconsistent. Also, while
>> testing, it seems that failed transactions are counted in tps, which
>> I think is not appropriate.
>>
>> About the feature:
>>
>> sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
>>       ./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
>> starting vacuum...end.
>> progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed  # NOT 10845.8 TPS...
>> progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
>> progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
>> ...
>> number of transactions actually processed: 32028  # NO!
>> number of errors: 30969 (96.694 %)
>> latency average = 2.833 ms
>> latency stddev = 1.508 ms
>> tps = 10666.720870 (including connections establishing)  # NO
>> tps = 10683.034369 (excluding connections establishing)  # NO
>> ...
>>
>> For me this is all wrong. I think that the tps report is about
>> transactions that succeeded, not mere attempts. I cannot say that a
>> transaction which aborted was "actually processed"... as it was not.
>
> Fixed
>
>> The order of reported elements is not logical:
>>
>> maximum number of transaction tries: 100
>> scaling factor: 10
>> query mode: prepared
>> number of clients: 4
>> number of threads: 2
>> duration: 3 s
>> number of transactions actually processed: 967
>> number of errors: 152 (15.719 %)
>> latency average = 9.630 ms
>> latency stddev = 13.366 ms
>> number of transactions retried: 623 (64.426 %)
>> number of retries: 32272
>>
>> I would suggest to group everything about error handling in one
>> block, eg something like:
>>
>> scaling factor: 10
>> query mode: prepared
>> number of clients: 4
>> number of threads: 2
>> duration: 3 s
>> number of transactions actually processed: 967
>> number of errors: 152 (15.719 %)
>> number of transactions retried: 623 (64.426 %)
>> number of retries: 32272
>> maximum number of transaction tries: 100
>> latency average = 9.630 ms
>> latency stddev = 13.366 ms
>
> Fixed
>
>> Also, the percent character should be stuck to its number: 15.719%,
>> to have the style more homogeneous (although there seems to be
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hello Marina,

>> Many thanks to both of you! I'm working on a patch in this
>> direction..

>>> I think that the best approach for now is simply to reset (command
>>> zero, random generator) and start over the whole script, without
>>> attempting to be more intelligent. The limitations should be
>>> clearly documented (one transaction per script), though. That would
>>> be a significant enhancement already.
>>
>> I'm not sure that we can always do this, because we can get new
>> errors until we finish the failed transaction block, and we need to
>> destroy the conditional stack..

Sure. I'm suggesting, so as to simplify, that on failures the retry would always restart from the beginning of the script by resetting everything, indeed including the conditional stack, the random generator state, the variable values, and so on. This means enforcing somehow that one script is one transaction. If the user does not do that, it would be their decision and the result becomes unpredictable on errors (eg some sub-transactions could be executed more than once). Then if more is needed, that could be for another patch.

-- 
Fabien.
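The reset-and-restart approach described above can be sketched as follows. Field and function names are illustrative, not the patch's actual ones; the point is only that the client snapshots its state before the first try and restores it on a retryable failure:

```c
/* Illustrative 48-bit random generator state, as three 16-bit words. */
typedef struct
{
    unsigned short xseed[3];
} RandomState;

/* Illustrative slice of a pgbench client's state. */
typedef struct
{
    int         command;        /* index of the current script command */
    RandomState cs_func_rs;     /* state used by the random*() functions */
    RandomState retry_state;    /* snapshot taken at script start */
} CState;

/* Restart the script from command zero with the same randomness as
 * the first try, as if the failed attempt had never happened. */
static void
startRetry(CState *st)
{
    st->cs_func_rs = st->retry_state;
    st->command = 0;
}
```

Restoring the random state along with the command index is what makes the retry replay "the transaction which was intended" rather than a fresh random one; the variable values and conditional stack would be restored the same way.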
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 25-03-2018 15:23, Fabien COELHO wrote:
>> Hm, I took a look on both threads about the patch and it seems to me
>> now it's overcomplicated. With recently committed enhancements of
>> pgbench (\if, \when) it becomes close to impossible to retry a
>> transaction in case of failure. So, the initial approach to just
>> roll back such a transaction looks more attractive.
>
> Yep.

Many thanks to both of you! I'm working on a patch in this direction..

> I think that the best approach for now is simply to reset (command
> zero, random generator) and start over the whole script, without
> attempting to be more intelligent. The limitations should be clearly
> documented (one transaction per script), though. That would be a
> significant enhancement already.

I'm not sure that we can always do this, because we can get new errors until we finish the failed transaction block, and we need to destroy the conditional stack..

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> Hm, I took a look on both threads about the patch and it seems to me
> now it's overcomplicated. With recently committed enhancements of
> pgbench (\if, \when) it becomes close to impossible to retry a
> transaction in case of failure. So, the initial approach to just roll
> back such a transaction looks more attractive.

Yep.

I think that the best approach for now is simply to reset (command zero, random generator) and start over the whole script, without attempting to be more intelligent. The limitations should be clearly documented (one transaction per script), though. That would be a significant enhancement already.

-- 
Fabien.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
> I suggest a patch where pgbench client sessions are not disconnected
> because of serialization or deadlock failures and these failures are
> mentioned in reports. In detail:
> - a transaction with one of these failures continues to run normally,
>   but its result is rolled back;
> - if there were such failures during script execution, this
>   "transaction" is marked appropriately in the logs;
> - the numbers of "transactions" with these failures are printed in
>   progress reports, in aggregation logs, and at the end with the
>   other results (in total and for each script);

Hm, I took a look on both threads about the patch and it seems to me now it's overcomplicated. With recently committed enhancements of pgbench (\if, \when) it becomes close to impossible to retry a transaction in case of failure. So, the initial approach to just roll back such a transaction looks more attractive.

-- 
Teodor Sigaev
E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/