Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2021-05-23 Thread Yugo NAGATA
Hi hackers,

On Tue, 10 Mar 2020 09:48:23 +1300
Thomas Munro wrote:

> On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO wrote:
> > >> Thank you very much! I'm going to send a new patch set by the end of
> > >> this week (I'm sorry, I was very busy with the release of Postgres Pro
> > >> 11...).
> > >
> > > Is anyone interested in rebasing this, and summarising what needs to
> > > be done to get it in?  It's arguably a bug or at least quite
> > > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > > that a couple of forks already ship Marina's patch set.

I got interested in this and am now looking into the patch and the past
discussion.
If no one else is going to do it and there are no objections, I would like
to rebase this. Is that okay?

Regards,
Yugo NAGATA

> >
> > I'm a reviewer on this patch, which I find a good thing (tm), and which was
> > converging to a reasonable and simple enough addition, IMHO.
> >
> > If I proceed in place of Marina, who is going to do the reviews?
> 
> Hi Fabien,
> 
> Cool.  I'll definitely take it for a spin if you post a fresh patch
> set.  Any place that we arbitrarily don't support SERIALIZABLE, I
> consider a bug, so I'd like to commit this if we can agree it's ready.
> It sounds like it's actually in pretty good shape.


-- 
Yugo NAGATA 




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2020-03-09 Thread Thomas Munro
On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO wrote:
> >> Thank you very much! I'm going to send a new patch set by the end of
> >> this week (I'm sorry, I was very busy with the release of Postgres Pro
> >> 11...).
> >
> > Is anyone interested in rebasing this, and summarising what needs to
> > be done to get it in?  It's arguably a bug or at least quite
> > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > that a couple of forks already ship Marina's patch set.
>
> I'm a reviewer on this patch, which I find a good thing (tm), and which was
> converging to a reasonable and simple enough addition, IMHO.
>
> If I proceed in place of Marina, who is going to do the reviews?

Hi Fabien,

Cool.  I'll definitely take it for a spin if you post a fresh patch
set.  Any place that we arbitrarily don't support SERIALIZABLE, I
consider a bug, so I'd like to commit this if we can agree it's ready.
It sounds like it's actually in pretty good shape.




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2020-03-09 Thread Fabien COELHO



Hello Thomas,


Thank you very much! I'm going to send a new patch set by the end of
this week (I'm sorry, I was very busy with the release of Postgres Pro
11...).


Is anyone interested in rebasing this, and summarising what needs to
be done to get it in?  It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.


I'm a reviewer on this patch, which I find a good thing (tm), and which was
converging to a reasonable and simple enough addition, IMHO.


If I proceed in place of Marina, who is going to do the reviews?

--
Fabien.




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2020-03-08 Thread Thomas Munro
On Mon, Mar 9, 2020 at 10:00 AM Marina Polyakova wrote:
> On 2018-11-16 22:59, Alvaro Herrera wrote:
> > On 2018-Sep-05, Marina Polyakova wrote:
> >
> >> v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> >> - a patch for the RandomState structure (this is used to reset a
> >> client's
> >> random seed during the repeating of transactions after
> >> serialization/deadlock failures).
> >
> > Pushed this one with minor stylistic changes (the most notable of which
> > is the move of initRandomState to where the rest of the random generator
> > infrastructure is, instead of in a totally random place).  Thanks,
>
> Thank you very much! I'm going to send a new patch set by the end of
> this week (I'm sorry, I was very busy with the release of Postgres Pro
> 11...).

Is anyone interested in rebasing this, and summarising what needs to
be done to get it in?  It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Fabien COELHO




Feel free to update the patch status to "needs review" yourself after
submitting a new version that in your opinion responds to a reviewer's
comments.


Sure, I'll do that. But I will not switch any of my patches to "Ready". AFAICR 
the concerns were mostly about imprecise comments in the code, and a few 
questions that I answered.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Alvaro Herrera
On 2018-Nov-19, Fabien COELHO wrote:

> 
> Hello Alvaro,
> 
> > I also think that the pgbench_error() patch should go in before the main
> > one.  It seems a bit pointless to introduce code using a bad API only to
> > fix the API together with all the new callers immediately afterwards.
> 
> I'm not that keen on this part of the patch, because ISTM that it introduces
> significant and possibly costly malloc/free cycles when handling errors,
> which do not currently exist in pgbench.

Oh, I wasn't aware of that.

> Related to Marina's patch (triggered by reviewing the patches), I have
> submitted a refactoring patch which aims at cleaning up the internal state
> machine, so that making additions and checking that all is well are simpler.
> 
>   https://commitfest.postgresql.org/20/1754/

Let me look at this one.

> It has been reviewed, I think I answered the reviewer's concerns, but the
> reviewer did not update the patch state on the cf app, so I do not know
> whether he is unsatisfied or if it was just forgotten.

Feel free to update the patch status to "needs review" yourself after
submitting a new version that in your opinion responds to a reviewer's
comments.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Fabien COELHO



Hello Alvaro,


I also think that the pgbench_error() patch should go in before the main
one.  It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.


I'm not that keen on this part of the patch, because ISTM that it introduces 
significant and possibly costly malloc/free cycles when handling errors, 
which do not currently exist in pgbench.


Previously an error was basically the end of the script, but with the 
feature being introduced by Marina some errors are handled, in which case 
we end up paying these costs in the test loop. Also, refactoring 
error handling is not necessary for the new feature. That is why I advised 
moving it out and possibly keeping it for later.


Related to Marina's patch (triggered by reviewing the patches), I have 
submitted a refactoring patch which aims at cleaning up the internal state 
machine, so that making additions and checking that all is well are simpler.


https://commitfest.postgresql.org/20/1754/

It has been reviewed, I think I answered the reviewer's concerns, but the 
reviewer did not update the patch state on the cf app, so I do not know 
whether he is unsatisfied or if it was just forgotten.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Alvaro Herrera
On 2018-Nov-19, Marina Polyakova wrote:

> On 2018-11-16 22:59, Alvaro Herrera wrote:
> > On 2018-Sep-05, Marina Polyakova wrote:
> > 
> > > v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> > > - a patch for the RandomState structure (this is used to reset a
> > > client's
> > > random seed during the repeating of transactions after
> > > serialization/deadlock failures).
> > 
> > Pushed this one with minor stylistic changes (the most notable of which
> > is the move of initRandomState to where the rest of the random generator
> > infrastructure is, instead of in a totally random place).  Thanks,
> 
> Thank you very much! I'm going to send a new patch set by the end of this
> week (I'm sorry, I was very busy with the release of Postgres Pro 11...).

Great, thanks.

I also think that the pgbench_error() patch should go in before the main
one.  It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Marina Polyakova

On 2018-11-16 22:59, Alvaro Herrera wrote:

On 2018-Sep-05, Marina Polyakova wrote:


v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after
serialization/deadlock failures).

Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place).  Thanks,


Thank you very much! I'm going to send a new patch set by the end of 
this week (I'm sorry, I was very busy with the release of Postgres Pro 
11...).


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-16 Thread Alvaro Herrera
On 2018-Sep-05, Marina Polyakova wrote:

> v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> - a patch for the RandomState structure (this is used to reset a client's
> random seed during the repeating of transactions after
> serialization/deadlock failures).

Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place).  Thanks,

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-10-01 Thread Michael Paquier
On Wed, Sep 12, 2018 at 06:12:29PM +0300, Marina Polyakova wrote:
> The discussion about this has become entangled from the beginning, because
> as I wrote in [1] at first I misread your original proposal...

The last emails are about the last reviews of Fabien, which have remained
unanswered for the last couple of weeks.  I am marking this patch as
returned with feedback for now.
--
Michael




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-12 Thread Marina Polyakova

On 12-09-2018 17:04, Fabien COELHO wrote:

Hello Marina,

You can get other errors that cannot happen for only one client if you 
use shell commands in meta commands:


Or if you use untrusted procedural languages in SQL expressions (see 
the used file in the attachments):


Or if you try to create a function and perhaps replace an existing 
one:


Sure. Indeed there can be shell errors, perl errors, create function
conflicts... I do not understand what your point is wrt these.

I'm mostly saying that your patch should focus on implementing the
retry feature when appropriate, and avoid changing the behavior (error
displayed, abort or not) on features unrelated to serialization &
deadlock errors.

Maybe there are inconsistencies, and "bug"/"feature" worth fixing, but
if so that should be a separate patch, if possible, and if these are
bugs they could be backpatched.

For now I'm still convinced that pgbench should keep on aborting on
"\set" or SQL syntax errors, and show clear error messages on these,
and your examples have not changed my mind on that point.


I'm fine with renaming the field if it makes things clearer. They are
all counters, so naming them "cnt" or "total_cnt" does not help much.
Maybe "succeeded" or "success" to show what is really counted?


Perhaps renaming StatsData.cnt is better than just adding a comment 
to this field. But IMO we have the same problem (they are all 
counters, so naming them "cnt" or "total_cnt" does not help much) for 
CState.cnt, which cannot be named in the same way because it also 
includes skipped and failed transactions.


Hmmm. CState's cnt seems only used to implement -t anyway? I'm okay if
it has a different name, especially if it has different semantics.


Ok!


I think
I was arguing only about cnt in StatsData.


The discussion about this has become entangled from the beginning, 
because as I wrote in [1] at first I misread your original proposal...


[1] 
https://www.postgresql.org/message-id/d318cdee8f96de6b1caf2ce684ffe4db%40postgrespro.ru


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-12 Thread Fabien COELHO



Hello Marina,

You can get other errors that cannot happen for only one client if you use 
shell commands in meta commands:


Or if you use untrusted procedural languages in SQL expressions (see the used 
file in the attachments):



Or if you try to create a function and perhaps replace an existing one:


Sure. Indeed there can be shell errors, perl errors, create function 
conflicts... I do not understand what your point is wrt these.


I'm mostly saying that your patch should focus on implementing the retry 
feature when appropriate, and avoid changing the behavior (error 
displayed, abort or not) on features unrelated to serialization & deadlock 
errors.


Maybe there are inconsistencies, and "bug"/"feature" worth fixing, but if 
so that should be a separate patch, if possible, and if these are bugs 
they could be backpatched.


For now I'm still convinced that pgbench should keep on aborting on "\set" 
or SQL syntax errors, and show clear error messages on these, and your 
examples have not changed my mind on that point.



I'm fine with renaming the field if it makes things clearer. They are
all counters, so naming them "cnt" or "total_cnt" does not help much.
Maybe "succeeded" or "success" to show what is really counted?


Perhaps renaming StatsData.cnt is better than just adding a comment to 
this field. But IMO we have the same problem (they are all counters, so 
naming them "cnt" or "total_cnt" does not help much) for CState.cnt, which 
cannot be named in the same way because it also includes skipped and failed 
transactions.


Hmmm. CState's cnt seems only used to implement -t anyway? I'm okay if it 
has a different name, especially if it has different semantics. I think I was 
arguing only about cnt in StatsData.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-12 Thread Marina Polyakova

On 11-09-2018 18:29, Fabien COELHO wrote:

Hello Marina,

Hmm, but we can say the same for serialization or deadlock errors that 
were not retried (the client test code itself could not run correctly 
or the SQL sent was somehow wrong, which is also the client's fault), 
can't we?


I think not.

If a client asks for something "legal", but some other client in
parallel happens to make an incompatible change which results in a
serialization or deadlock error, the clients are not responsible for
the raised errors, it is just that they happen to ask for something
incompatible at the same time. So there is no user error per se, but
the server is reporting its (temporary) inability to process what was
asked for. For these errors, retrying is fine. If the client was
alone, there would be no such errors, you cannot deadlock with
yourself. This is really an isolation issue linked to parallel
execution.


You can get other errors that cannot happen for only one client if you 
use shell commands in meta commands:


starting vacuum...end.
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 6.953 ms
tps = 287.630161 (including connections establishing)
tps = 303.232242 (excluding connections establishing)
statement latencies in milliseconds and failures:
 1.636   0  BEGIN;
 1.497   0  \setshell var mkdir my_directory && echo 1
 0.007   0  \sleep 1 us
 1.465   0  \setshell var rmdir my_directory && echo 1
 1.622   0  END;

starting vacuum...end.
mkdir: cannot create directory ‘my_directory’: File exists
mkdir: could not read result of shell command
client 1 got an error in command 1 (setshell) of script 0; execution of meta-command failed

transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 19/20
number of failures: 1 (5.000%)
number of meta-command failures: 1 (5.000%)
maximum number of tries: 1
latency average = 11.782 ms (including failures)
tps = 161.269033 (including connections establishing)
tps = 167.733278 (excluding connections establishing)
statement latencies in milliseconds and failures:
 2.731   0  BEGIN;
 2.909   1  \setshell var mkdir my_directory && echo 1
 0.231   0  \sleep 1 us
 2.366   0  \setshell var rmdir my_directory && echo 1
 2.664   0  END;

Or if you use untrusted procedural languages in SQL expressions (see the 
used file in the attachments):


starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
client 1 got an error in command 0 (SQL) of script 0; ERROR:  could not create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block

client 1 got an error in command 0 (SQL) of script 0; ERROR:  could not create the directory "my_directory": File exists at line 3.
CONTEXT:  PL/Perl anonymous code block

transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 18/20
number of failures: 2 (10.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 2 (10.000%)
maximum number of tries: 1
latency average = 3.282 ms (including failures)
tps = 548.437196 (including connections establishing)
tps = 637.662753 (excluding connections establishing)
statement latencies in milliseconds and failures:
 1.566   2  DO $$

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 2.760 ms
tps = 724.746078 (including connections establishing)
tps = 853.131985 (excluding connections establishing)
statement latencies in milliseconds and failures:
 1.893   0  DO $$

Or if you try to create a function and perhaps replace an existing one:

starting vacuum...end.
client 0 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Fabien COELHO



Hello Marina,

Hmm, but we can say the same for serialization or deadlock errors that were 
not retried (the client test code itself could not run correctly or the SQL 
sent was somehow wrong, which is also the client's fault), can't we?


I think not.

If a client asks for something "legal", but some other client in parallel 
happens to make an incompatible change which results in a serialization or 
deadlock error, the clients are not responsible for the raised errors, it 
is just that they happen to ask for something incompatible at the same 
time. So there is no user error per se, but the server is reporting its 
(temporary) inability to process what was asked for. For these errors, 
retrying is fine. If the client was alone, there would be no such errors, 
you cannot deadlock with yourself. This is really an isolation issue 
linked to parallel execution.


Why not handle client errors that can occur (but they may also not 
occur) the same way? (For example, always abort the client, or 
conversely never abort in these cases.) Here's an example of such an 
error:



client 5 got an error in command 1 (SQL) of script 0; ERROR:  division by zero


This is an interesting case. For me we must stop the script because the 
client is asking for something "stupid", and retrying the same won't 
change the outcome, the division will still be by zero. It is the client's 
responsibility not to ask for something stupid: the bench script is buggy, 
it should not submit illegal SQL queries. This is quite different from 
submitting something legal which happens to fail.


Maybe we can limit the number of failures in one statement, and abort the 
client if this limit is exceeded?...


I think this is quite debatable, and that the best option is to leave 
this point out of the current patch, so that we could have retry on 
serialization/deadlock errors.


Then you can submit another patch for a feature about other errors if you 
feel that there is a use case for continuing in some cases. I think that the 
previous behavior made sense, and that changing it should only be 
considered as an option. As it involves discussion and is not obvious, 
later is better.


To get a clue about the actual issue you can use the options 
--failures-detailed (to find out whether this is a serialization failure 
/ deadlock failure / other SQL failure / meta command failure) and/or 
--print-errors (to get the complete error message).


Yep, but for me it should have stopped immediately, as it did before.

If you use the option --latency-limit, the time of tries will be limited 
regardless of the use of the option -t. Therefore ISTM that an unlimited 
number of tries can be used only if the time of tries is limited by the 
options -T and/or -L.


Indeed, I'm ok with forbidding unlimited retries when under -t.


I'm not sure that having "--debug" imply this option
is useful: as there are two distinct options, the user may be allowed
to trigger one or the other as they wish?


I'm not sure that the main debugging output will give a good clue of what's 
happened without full messages about errors, retries and failures...


I'm arguing more for letting the user decide what they want.


These lines are quite long - do you suggest to wrap them this way?


Sure, if it is too long, then wrap.

Function getTransactionStatus name does not seem to correspond fully to 
what the function does. There is a passthru case which should be either 
avoided or clearly commented.


I don't quite understand you - do you mean that in fact this function finds 
out whether we are in a (failed) transaction block or not? Or do you mean 
that the case of PQTRANS_INTRANS is also ok?...


The former: although the function is named "getTransactionStatus", it does 
not really return the "status" of the transaction (aka PQstatus()?).
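
For reference, libpq itself exposes the transaction state of a connection
through PQtransactionStatus(). A minimal sketch of the kind of check being
discussed, assuming plain libpq (the helper name is ours, not pgbench's):

#include <libpq-fe.h>
#include <stdbool.h>

/* Is the connection inside a transaction block, and is that block failed? */
static bool
inTxBlock(const PGconn *conn, bool *failed)
{
	PGTransactionStatusType ts = PQtransactionStatus(conn);

	*failed = (ts == PQTRANS_INERROR);	/* inside a failed transaction block */
	return (ts == PQTRANS_INTRANS || ts == PQTRANS_INERROR);
}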


I tried not to break the limit of 80 characters, but if you think that this 
is better, I'll change it.


Hmmm. 80 columns, indeed...


I'd insist on a comment that "cnt" does not include "skipped" transactions
(anymore).


If you mean CState.cnt I'm not sure if this is practically useful because the 
code uses only the sum of all client transactions including skipped and 
failed... Maybe we can rename this field to nxacts or total_cnt?


I'm fine with renaming the field if it makes things clearer. They are all 
counters, so naming them "cnt" or "total_cnt" does not help much. Maybe 
"succeeded" or "success" to show what is really counted?


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Marina Polyakova

On 11-09-2018 16:47, Marina Polyakova wrote:

On 08-09-2018 16:03, Fabien COELHO wrote:

Hello Marina,
I'd insist on a comment that "cnt" does not include "skipped" transactions
(anymore).


If you mean CState.cnt I'm not sure if this is practically useful
because the code uses only the sum of all client transactions
including skipped and failed... Maybe we can rename this field to
nxacts or total_cnt?


Sorry, I misread your proposal the first time. Ok!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Marina Polyakova

On 08-09-2018 16:03, Fabien COELHO wrote:

Hello Marina,


v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of 
transactions with serialization/deadlock failures (see the detailed 
description in the file).


About patch v11-3.

Patch applies cleanly on top of the other two. Compiles, global and local
"make check" are ok.


:-)


* Features

As far as the actual retry feature is concerned, I'd say we are nearly
there. However I have an issue with changing the behavior on meta
command and other SQL errors, which I find undesirable.

When a meta-command fails, before the patch the command is aborted and
there is a convenient error message:

  sh> pgbench -T 10 -f bad-meta.sql
  bad-meta.sql:1: unexpected function name (false) in command "set" [...]
  \set i false + 1 [...]

After the patch it is simply counted, pgbench loops on the same error
till the time is completed, and there is no clue about the actual
issue:

  sh> pgbench -T 10 -f bad-meta.sql
  starting vacuum...end.
  transaction type: bad-meta.sql
  duration: 10 s
  number of transactions actually processed: 0
  number of failures: 27993953 (100.000%)
  ...

Same thing about SQL errors, an immediate abort...

  sh> pgbench -T 10 -f bad-sql.sql
  starting vacuum...end.
  client 0 aborted in command 0 of script 0; ERROR:  syntax error at or near ";"
  LINE 1: SELECT 1 + ;

... is turned into counting without aborting or error messages, so
that there is no clue that the user was asking for something bad.

  sh> pgbench -T 10 -f bad-sql.sql
  starting vacuum...end.
  transaction type: bad-sql.sql
  scaling factor: 1
  query mode: simple
  number of clients: 1
  number of threads: 1
  duration: 10 s
  number of transactions actually processed: 0
  number of failures: 274617 (100.000%)
  # no clue that there was a syntax error in the script

I do not think that these changes of behavior are desirable. Meta command and
miscellaneous SQL errors should result in immediately aborting the whole run,
because the client test code itself could not run correctly or the SQL sent
was somehow wrong, which is also the client's fault, and the server
performance bench does not make much sense in such conditions.

ISTM that the focus of this patch should only be to handle some server
runtime errors that can be retried, but not to change pgbench behavior
on other kinds of errors. If these are to be changed, ISTM that it
would be a distinct patch and would require some discussion, and
possibly an option to enable it or not if some use case emerges. As far
as this patch is concerned, I'd suggest to leave that out.

...
The following remarks are linked to the change of behavior discussed above:
makeVariableValue error message is not for debug, but must be kept in all
cases, and the false returned must result in an immediate abort. Same thing
about lookupCreateVariable, an invalid name is a user error which warrants
an immediate abort. Same thing again about coerce* functions or
evalStandardFunc... Basically, most/all added "debug_level >= DEBUG_ERRORS"
are not desirable.


Hmm, but we can say the same for serialization or deadlock errors that 
were not retried (the client test code itself could not run correctly or 
the SQL sent was somehow wrong, which is also the client's fault), can't 
we? Why not handle client errors that can occur (but they may also not 
occur) the same way? (For example, always abort the client, or 
conversely never abort in these cases.) Here's an example of such an 
error:


starting vacuum...end.
transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2500/2500
maximum number of tries: 1
latency average = 0.375 ms
tps = 26695.292848 (including connections establishing)
tps = 27489.678525 (excluding connections establishing)
statement latencies in milliseconds and failures:
 0.001   0  \set divider random(-1000, 1000)
 0.245   0  SELECT 1 / :divider;

starting vacuum...end.
client 5 got an error in command 1 (SQL) of script 0; ERROR:  division by zero

client 0 got an error in command 1 (SQL) of script 0; ERROR:  division by zero

client 7 got an error in command 1 (SQL) of script 0; ERROR:  division by zero


transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2497/2500
number of failures: 3 (0.120%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 3 (0.120%)
maximum number of tries: 1
latency average = 0.579 ms (including failures)
tps = 17240.662547 (including connections establishing)
tps = 17862.090137 (excluding connections establishing)
statement latencies 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-11 Thread Marina Polyakova

On 08-09-2018 10:17, Fabien COELHO wrote:

Hello Marina,


Hello, Fabien!


About the two first preparatory patches.


v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a 
client's random seed during the repeating of transactions after 
serialization/deadlock failures).


Same version as the previous one, which was ok. Still applies,
compiles, passes tests. Fine with me.


v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).


Simpler version, applies cleanly on top of previous patch, compiles
and global & local "make check" are ok. Fine with me as well.


Glad to hear it :)

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-08 Thread Fabien COELHO



Hello Marina,


v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of transactions 
with serialization/deadlock failures (see the detailed description in the 
file).


About patch v11-3.

Patch applies cleanly on top of the other two. Compiles, global and local
"make check" are ok.

* Features

As far as the actual retry feature is concerned, I'd say we are nearly 
there. However I have an issue with changing the behavior on meta command 
and other SQL errors, which I find undesirable.


When a meta-command fails, before the patch the command is aborted and 
there is a convenient error message:


  sh> pgbench -T 10 -f bad-meta.sql
  bad-meta.sql:1: unexpected function name (false) in command "set" [...]
  \set i false + 1 [...]

After the patch it is simply counted, pgbench loops on the same error till 
the time is completed, and there is no clue about the actual issue:


  sh> pgbench -T 10 -f bad-meta.sql
  starting vacuum...end.
  transaction type: bad-meta.sql
  duration: 10 s
  number of transactions actually processed: 0
  number of failures: 27993953 (100.000%)
  ...

Same thing about SQL errors, an immediate abort...

  sh> pgbench -T 10 -f bad-sql.sql
  starting vacuum...end.
  client 0 aborted in command 0 of script 0; ERROR:  syntax error at or near ";"
  LINE 1: SELECT 1 + ;

... is turned into counting without aborting or error messages, so that 
there is no clue that the user was asking for something bad.


  sh> pgbench -T 10 -f bad-sql.sql
  starting vacuum...end.
  transaction type: bad-sql.sql
  scaling factor: 1
  query mode: simple
  number of clients: 1
  number of threads: 1
  duration: 10 s
  number of transactions actually processed: 0
  number of failures: 274617 (100.000%)
  # no clue that there was a syntax error in the script

I do not think that these changes of behavior are desirable. Meta command and
miscellaneous SQL errors should result in immediately aborting the whole run,
because the client test code itself could not run correctly or the SQL sent
was somehow wrong, which is also the client's fault, and the server 
performance bench does not make much sense in such conditions.


ISTM that the focus of this patch should only be to handle some server 
runtime errors that can be retried, but not to change pgbench behavior on 
other kinds of errors. If these are to be changed, ISTM that it would be a 
distinct patch and would require some discussion, and possibly an option 
to enable it or not if some use case emerges. As far as this patch is 
concerned, I'd suggest to leave that out.



Doc says "you cannot use an infinite number of retries without latency-limit..."

Why should this be forbidden? At least if -T timeout takes precedence and
shortens the execution, ISTM that there could be a good reason to test that.
Maybe it could be blocked only under -t if this would lead to a never-ending
run.


As "--print-errors" is really for debug, maybe it could be named
"--debug-errors". I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?


* Code

The following remarks are linked to the change of behavior discussed above:
makeVariableValue error message is not for debug, but must be kept in all
cases, and the false returned must result in an immediate abort. Same thing
about lookupCreateVariable, an invalid name is a user error which warrants
an immediate abort. Same thing again about coerce* functions or
evalStandardFunc... Basically, most/all added "debug_level >= DEBUG_ERRORS"
are not desirable.

sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff is
really about the transaction script, not dealing with errors, especially as there is no
significant advantage in preparing a "ROLLBACK" statement which is short and has
no parameters. I'd suggest to remove this function and just issue
PQsendQuery("ROLLBACK;") in all cases.
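
For illustration, that suggestion amounts to the plain asynchronous libpq
call, with the result collected later from the wait state (a sketch, not
the patch's code; st->con is the per-client connection):

	/* queue the rollback without blocking the thread */
	if (PQsendQuery(st->con, "ROLLBACK;") == 0)
	{
		/* could not even queue the command: report it and abort the client */
	}
	/* the PGresult is consumed later via PQgetResult() in the wait state */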

In copyVariables, I'd simplify

 + if (source_var->svalue == NULL)
 +   dest_var->svalue = NULL;
 + else
 +   dest_var->svalue = pg_strdup(source_var->svalue);

as:

  dest_var->svalue = (source_var->svalue == NULL) ? NULL :
      pg_strdup(source_var->svalue);

 + if (sqlState)   ->   if (sqlState != NULL) ?


Function getTransactionStatus name does not seem to correspond fully to what the
function does. There is a passthru case which should be either avoided or
clearly commented.


About:

 - commandFailed(st, "SQL", "perhaps the backend died while processing");
 + clientAborted(st,
 +  "perhaps the backend died while processing");

keep on one line?


About:

 + if (doRetry(st, ))
 +   st->state = CSTATE_RETRY;
 + else
 +   st->state = CSTATE_FAILURE;

-> st->state = doRetry(st, ) ? CSTATE_RETRY : CSTATE_FAILURE;


* Comments

"There're different types..." -> "There are different types..."

"after the errors and"... -> "after errors and"...

"the default 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-09-08 Thread Fabien COELHO



Hello Marina,

About the two first preparatory patches.


v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a client's 
random seed during the repeating of transactions after serialization/deadlock 
failures).


Same version as the previous one, which was ok. Still applies, compiles, 
passes tests. Fine with me.



v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client variables 
during the repeating of transactions after serialization/deadlock failures).


Simpler version, applies cleanly on top of previous patch, compiles and 
global & local "make check" are ok. Fine with me as well.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-17 Thread Marina Polyakova

On 17-08-2018 14:04, Fabien COELHO wrote:

...
Or perhaps we can use a more detailed failure status so for each type 
of failure we always know the command name (argument "cmd") and 
whether the client is aborted. Something like this (but in comparison 
with the first variant ISTM overly complicated):


I agree. I do not think that it would be useful given that the same
thing is done on all meta-command error cases in the end.


Ok!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-17 Thread Marina Polyakova

On 17-08-2018 10:49, Fabien COELHO wrote:

Hello Marina,

Detailed -r report. I understand from the doc that the retry number 
on the detailed per-statement report is to identify at what point 
errors occur? Probably this is more or less always at the same point 
on a given script, so that the most interesting feature is to report 
the number of retries at the script level.


This may depend on various factors... for example:
[...]
   21.239   5  36  UPDATE xy3 SET y = y + :delta WHERE x = :x3;
   21.360   5  39  UPDATE xy2 SET y = y + :delta WHERE x = :x2;


Ok, not always the same point, and you confirm that it identifies
where the error is raised which leads to a retry.


Yes, I confirm this. I'll try to write more clearly about this in the 
documentation...



So we can write something like this:


All the transactions are divided into several types depending on their 
execution. Firstly, they can be divided into transactions that we 
started to execute, and transactions which were skipped (it was too 
late to execute them). Secondly, running transactions fall into 2 main 
types: is there any command that got a failure during the last 
execution of the transaction script or not? Thus


Here is an attempt at having a more precise and shorter version, not
sure it is much better than yours, though:

"""
Transactions are counted depending on their execution and outcome. First
a transaction may have started or not: skipped transactions occur
under --rate and --latency-limit when the client is too late to
execute them. Secondly, a started transaction may ultimately succeed
or fail on some error, possibly after some retries when --max-tries is
not one. Thus
"""


Thank you!

I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, 
otherwise "another" does not make sense yet.


Maybe firstly put a general group, and then special cases?...


I understand it more as a catch all default "none of the above" case.


Ok!

commandFailed: I'm not thrilled by the added boolean, which is partially
redundant with the second argument.


Do you mean that it is partially redundant with the argument "cmd" 
and that, for example, meta command errors never cause the client to 
abort?


Yes. And also I'm not sure we should want this boolean at all.


Perhaps we can use a separate function to print the messages about 
client's abortion, something like this (it is assumed that all abortions 
happen when processing SQL commands):


static void
clientAborted(CState *st, const char *message)
{
	pgbench_error(...,
				  "client %d aborted in command %d (SQL) of script %d; %s\n",
				  st->id, st->command, st->use_file, message);
}

Or perhaps we can use a more detailed failure status so for each type of 
failure we always know the command name (argument "cmd") and whether the 
client is aborted. Something like this (but in comparison with the first 
variant ISTM overly complicated):


/*
 * For the failures during script execution.
 */
typedef enum FailureStatus
{
	NO_FAILURE = 0,

	/*
	 * Failures in meta commands. In these cases the failed transaction is
	 * terminated.
	 */
	META_SET_FAILURE,
	META_SETSHELL_FAILURE,
	META_SHELL_FAILURE,
	META_SLEEP_FAILURE,
	META_IF_FAILURE,
	META_ELIF_FAILURE,

	/*
	 * Failures in SQL commands. In cases of serialization/deadlock failures
	 * a failed transaction is re-executed from the very beginning if
	 * possible; otherwise the failed transaction is terminated.
	 */
	SERIALIZATION_FAILURE,
	DEADLOCK_FAILURE,
	OTHER_SQL_FAILURE,			/* other failures in SQL commands that are
								 * not listed by themselves above */

	/*
	 * Failures while processing SQL commands. In this case the client is
	 * aborted.
	 */
	SQL_CONNECTION_FAILURE
} FailureStatus;


[...]
If in such cases one command is placed on several lines, ISTM that the 
code is more understandable if curly brackets are used...


Hmmm. Such basic style changes are avoided because they break
backpatching, so we try to avoid gratuitous changes unless there is a
strong added value, which does not seem to be the case here.


Ok!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-17 Thread Fabien COELHO



Hello Marina,

Detailed -r report. I understand from the doc that the retry number on 
the detailed per-statement report is to identify at what point errors 
occur? Probably this is more or less always at the same point on a 
given script, so that the most interesting feature is to report the 
number of retries at the script level.


This may depend on various factors... for example:
[...]
   21.239   5  36  UPDATE xy3 SET y = y + :delta WHERE x = :x3;
   21.360   5  39  UPDATE xy2 SET y = y + :delta WHERE x = :x2;


Ok, not always the same point, and you confirm that it identifies where 
the error is raised which leads to a retry.


And you can always get the number of retries at the script level from the 
main report (if only one script is used) or from the report for each script 
(if multiple scripts are used).


Ok.

ISTM that "skipped" transactions are NOT "successful" so there are a 
problem with comments. I believe that your formula are probably right, 
it has more to do with what is "success". For cnt decomposition, ISTM 
that "other transactions" are really "directly successful 
transactions".


I agree with you, but I also think that skipped transactions should not be 
considered errors.


I'm ok with having a special category for them in the explanations, which 
is neither success nor error.



So we can write something like this:


All the transactions are divided into several types depending on their 
execution. Firstly, they can be divided into transactions that we started to 
execute, and transactions which were skipped (it was too late to execute 
them). Secondly, running transactions fall into 2 main types: is there any 
command that got a failure during the last execution of the transaction 
script or not? Thus


Here is an attempt at having a more precise and shorter version, not sure 
it is much better than yours, though:


"""
Transactions are counted depending on their execution and outcome. First
a transaction may have started or not: skipped transactions occur under 
--rate and --latency-limit when the client is too late to execute them. 
Secondly, a started transaction may ultimately succeed or fail on some 
error, possibly after some retries when --max-tries is not one. Thus

"""


the number of all transactions =
 skipped (it was too late to execute them) +
 cnt (the number of successful transactions) +
 ecnt (the number of failed transactions).

A successful transaction can have several unsuccessful tries before a
successful run. Thus

cnt (the number of successful transactions) =
 retried (they got a serialization or a deadlock failure(s), but were
  successfully retried from the very beginning) +
 directly successful transactions (they were successfully completed on
the first try).


The above description is clearer to me.
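
As a sanity check of these formulas against the sample run quoted further
down in this digest (266 processed, 10 errors, 75 retried): skipped (0) +
cnt (266) + ecnt (10) = 276 started transactions, which indeed gives the
reported 10/276 = 3.623% errors and 75/276 = 27.174% retried, and
cnt = 266 = 75 retried + 191 directly successful transactions.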

I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise 
"another" does not make sense yet.


Maybe firstly put a general group, and then special cases?...


I understand it more as a catch all default "none of the above" case.

In TState, field "uint32 retries": maybe it would be simpler to count 
"tries", which can be compared directly to max tries set in the option?


If you mean retries in CState - on the one hand, yes, on the other hand, 
statistics always use the number of retries...


Ok.


The automaton skips to FAILURE on every possible error. I'm wondering 
whether it could do so only on SQL errors, because other fails will 
lead to ABORTED anyway? If there is no good reason to skip to FAILURE 
from some errors, I'd suggest to keep the previous behavior. Maybe the 
good reason is to do some counting, but this means that on eg 
metacommand errors now the script would loop over instead of aborting, 
which does not look like a desirable change of behavior.


Even in the case of meta command errors we must prepare for CSTATE_END_TX and 
the execution of the next script: if necessary, clear the conditional stack 
and roll back the current transaction block.


Seems ok.


commandFailed: I'm not thrilled by the added boolean, which is partially
redundant with the second argument.


Do you mean that it is partially redundant with the argument "cmd" and that, 
for example, meta command errors never cause the client to abort?


Yes. And also I'm not sure we should want this boolean at all.


[...]
If in such cases one command is placed on several lines, ISTM that the code 
is more understandable if curly brackets are used...


Hmmm. Such basic style changes are avoided because they break 
backpatching, so we try to avoid gratuitous changes unless there is a 
strong added value, which does not seem to be the case here.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-16 Thread Marina Polyakova

On 15-08-2018 11:50, Fabien COELHO wrote:

Hello Marina,


Hello!


v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of 
transactions with serialization/deadlock failures (see the detailed 
description in the file).


Patch applies cleanly.

It allows retrying a script (considered as a transaction) on
serializable and deadlock errors, which is a very interesting
extension but also impacts pgbench significantly.

I'm waiting for the feature to be right before checking in full the
documentation and tests. There are still some issues to resolve before
checking that.

Anyway, tests look reasonable. Taking advantage of transaction
control from PL/pgsql is a good use of this new feature.


:-)


A few comments about the doc.

According to the documentation, the feature is triggered by --max-tries and
--latency-limit. I disagree with the latter, because it means that having a
latency limit without retrying is not supported anymore.

Maybe you can allow an "unlimited" max-tries, say with special value zero,
and the latency limit does its job if set, over all tries.

Doc: "error in meta commands" -> "meta command errors", for homogeneity 
with

other cases?
...
Doc: "never occur.." -> "never occur", or eventually "...".

Doc: "Directly client errors" -> "Direct client errors".
...
inTransactionBlock: I disagree with any function other than doCustom changing
the client state, because it makes understanding the state machine harder.
There is already one exception to that (threadRun) that I wish to remove. All
state changes must be performed explicitly in doCustom.
...
PQexec("ROOLBACK"): you are inserting a synchronous command, for which 
the
thread will have to wait for the result, in a middle of a framework 
which

takes great care to use only asynchronous stuff so that one thread can
manage several clients efficiently. You cannot call PQexec there.
From where I sit, I'd suggest to sendQuery("ROLLBACK"), then switch to
a new state CSTATE_WAIT_ABORT_RESULT which would be similar to
CSTATE_WAIT_RESULT, but on success would skip to RETRY or ABORT instead
of proceeding to the next command.
...
  memcpy(&(st->retry_state.random_state), &(st->random_state),
         sizeof(RandomState));

Is there a problem with "st->retry_state.random_state = st->random_state;"
instead of memcpy? ISTM that simple assignments work in C. Idem in the
reverse copy under RETRY.


Thank you, I'll fix this.
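
A tiny sketch of why the assignment works: in C, struct assignment copies
every member, arrays included, so it is equivalent to the memcpy above
(the RandomState layout here is only an assumption for illustration):

	typedef struct RandomState
	{
		unsigned short xseed[3];
	} RandomState;

	/* save the seed before the first try ... */
	st->retry_state.random_state = st->random_state;
	/* ... and restore it when the transaction is retried */
	st->random_state = st->retry_state.random_state;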

Detailed -r report. I understand from the doc that the retry number on the
detailed per-statement report is to identify at what point errors occur?
Probably this is more or less always at the same point on a given script,
so that the most interesting feature is to report the number of retries at
the script level.


This may depend on various factors... for example:

transaction type: pgbench_test_serialization.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
duration: 10 s
number of transactions actually processed: 266
number of errors: 10 (3.623%)
number of serialization errors: 10 (3.623%)
number of retried: 75 (27.174%)
number of retries: 75
maximum number of tries: 2
latency average = 72.734 ms (including errors)
tps = 26.501162 (including connections establishing)
tps = 26.515082 (excluding connections establishing)
statement latencies in milliseconds, errors and retries:
 0.012   0   0  \set delta random(-5000, 5000)
 0.001   0   0  \set x1 random(1, 10)
 0.001   0   0  \set x3 random(1, 2)
 0.001   0   0  \set x2 random(1, 1)
19.837   0   0  UPDATE xy1 SET y = y + :delta WHERE x = :x1;
21.239   5  36  UPDATE xy3 SET y = y + :delta WHERE x = :x3;
21.360   5  39  UPDATE xy2 SET y = y + :delta WHERE x = :x2;


And you can always get the number of retries at the script level from 
the main report (if only one script is used) or from the report for each 
script (if multiple scripts are used).


I'm still in favor of asserting that the SQL connection is idle (no tx in
progress) at the beginning and/or end of a script, and report a user error
if not, instead of writing complex caveats.

If someone has a use-case for that, then maybe it can be changed, but I
cannot see any in a benchmarking context, and I can see how easy it is
to have a buggy script with this allowed.

I do not think that the RETRIES_ENABLED macro is a good thing. I'd suggest
to write the condition four times.


Ok!

ISTM that "skipped" transactions are NOT "successful" so there are a 
problem
with comments. I believe that your formula are probably right, it has 
more to do
with what is "success". For cnt decomposition, ISTM that "other 
transactions"

are really "directly successful transactions".


I agree with you, but I also think that skipped transactions should not 
be 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-15 Thread Fabien COELHO



Hello Marina,


v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of transactions 
with serialization/deadlock failures (see the detailed description in the 
file).


Patch applies cleanly.

It allows retrying a script (considered as a transaction) on serializable 
and deadlock errors, which is a very interesting extension but also 
impacts pgbench significantly.


I'm waiting for the feature to be right before checking in full the 
documentation and tests. There are still some issues to resolve before
checking that.

Anyway, tests look reasonable. Taking advantage of transaction control 
from PL/pgsql is a good use of this new feature.


A few comments about the doc.

According to the documentation, the feature is triggered by --max-tries and
--latency-limit. I disagree with the latter, because it means that having a
latency limit without retrying is not supported anymore.

Maybe you can allow an "unlimited" max-tries, say with special value zero,
and the latency limit does its job if set, over all tries.

Doc: "error in meta commands" -> "meta command errors", for homogeneity with
other cases?

Detailed -r report. I understand from the doc that the retry number on the
detailed per-statement report is to identify at what point errors occur?
Probably this is more or less always at the same point on a given script,
so that the most interesting feature is to report the number of retries at the
script level.

Doc: "never occur.." -> "never occur", or eventually "...".

Doc: "Directly client errors" -> "Direct client errors".


I'm still in favor of asserting that the SQL connection is idle (no tx in
progress) at the beginning and/or end of a script, and report a user error
if not, instead of writing complex caveats.

If someone has a use-case for that, then maybe it can be changed, but I
cannot see any in a benchmarking context, and I can see how easy it is
to have a buggy script with this allowed.

I do not think that the RETRIES_ENABLED macro is a good thing. I'd suggest
to write the condition four times.

ISTM that "skipped" transactions are NOT "successful" so there are a problem
with comments. I believe that your formula are probably right, it has more to do
with what is "success". For cnt decomposition, ISTM that "other transactions"
are really "directly successful transactions".

I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another"
does not make sense yet. I'd suggest to name it "OTHER_SQL_FAILURE".

In TState, field "uint32 retries": maybe it would be simpler to count "tries",
which can be compared directly to max tries set in the option?

ErrorLevel: I have already commented on this in the review of 10.2. I'm not sure of
the LOG -> DEBUG_FAIL changes. I do not understand the name "DEBUG_FAIL", as it
is not related to debug; they just seem to be internal errors. META_ERROR maybe?

inTransactionBlock: I disagree with any function other than doCustom changing
the client state, because it makes understanding the state machine harder. There
is already one exception to that (threadRun) that I wish to remove. All state
changes must be performed explicitly in doCustom.

The automaton skips to FAILURE on every possible error. I'm wondering whether
it could do so only on SQL errors, because other fails will lead to ABORTED
anyway? If there is no good reason to skip to FAILURE from some errors, I'd
suggest to keep the previous behavior. Maybe the good reason is to do some
counting, but this means that on eg metacommand errors now the script would
loop over instead of aborting, which does not look like a desirable change
of behavior.

PQexec("ROOLBACK"): you are inserting a synchronous command, for which the
thread will have to wait for the result, in a middle of a framework which
takes great care to use only asynchronous stuff so that one thread can
manage several clients efficiently. You cannot call PQexec there.

From where I sit, I'd suggest to sendQuery("ROLLBACK"), then switch to

a new state CSTATE_WAIT_ABORT_RESULT which would be similar to
CSTATE_WAIT_RESULT, but on success would skip to RETRY or ABORT instead
of proceeding to the next command.
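
A rough sketch of that suggested transition in doCustom's switch, eliding
the PQconsumeInput()/PQisBusy() polling that the real wait state performs
(doRetry and its argument are assumptions here, not the patch's actual
code):

		case CSTATE_WAIT_ABORT_RESULT:
			{
				/* like CSTATE_WAIT_RESULT, but we were waiting for a ROLLBACK */
				PGresult   *res = PQgetResult(st->con);

				if (PQresultStatus(res) != PGRES_COMMAND_OK)
					st->state = CSTATE_ABORTED;	/* the ROLLBACK itself failed */
				else
					st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_ABORTED;
				PQclear(res);
			}
			break;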

ISTM that it would be more logical to only get into RETRY if there is a retry,
i.e. move the test RETRY/ABORT in FAILURE. For that, instead of "canRetry",
maybe you want "doRetry", which tells that a retry is possible (the error
is serializable or deadlock) and that the current parameters allow it
(timeout, max retries).
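
A hedged sketch of such a predicate (SQLSTATE 40001 is serialization_failure
and 40P01 is deadlock_detected; the fields and the overLatencyLimit helper
are assumptions, not the patch's actual code):

static bool
doRetry(CState *st, instr_time *now)
{
	/* only serialization and deadlock failures are worth retrying */
	if (strcmp(st->sqlstate, "40001") != 0 &&
		strcmp(st->sqlstate, "40P01") != 0)
		return false;

	/* ... and only if the run parameters still allow another try */
	if (max_tries != 0 && st->tries >= max_tries)
		return false;
	if (latency_limit != 0 && overLatencyLimit(st, now))
		return false;

	return true;
}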


* Minor C style comments:

if / else if / else if ... on *_FAILURE: I'd suggest a switch.

The following line removal does not seem useful, I'd have kept it:

  stats->cnt++;
 -
  if (skipped)

copyVariables: I'm not convinced that source_vars & nvars variables are that
useful.

  memcpy(&(st->retry_state.random_state), &(st->random_state), 
sizeof(RandomState));

Is there a problem with "st->retry_state.random_state = st->random_state;"
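
(For reference: in C a plain structure assignment copies embedded arrays as
well, so the two forms are equivalent; a minimal standalone illustration:)

	typedef struct RandomState
	{
		unsigned short xseed[3];
	} RandomState;

	RandomState a = {{1, 2, 3}};
	RandomState b;

	b = a;	/* copies the whole xseed array, same effect as the memcpy */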

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-13 Thread Marina Polyakova

On 12-08-2018 12:14, Fabien COELHO wrote:

Hello Marina,


Hello, Fabien!


v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).


This patch adds an explicit structure to manage Variables, which is
useful to reset these on pgbench script retries, which is the purpose
of the whole patch series.

About part 3:

Patch applies cleanly,


On 12-08-2018 12:17, Fabien COELHO wrote:

About part 3:

Patch applies cleanly,


I forgot: compiles, global & local "make check" are ok.


I'm glad to hear it :-)


* typo in comments: "varaibles"


I'm sorry, I'll fix it.


* About enlargeVariables:

multiple INT_MAX error handling looks strange, especially as this code
can never be triggered because pgbench would be dead long before
having allocated INT_MAX variables. So I would not bother to add such
checks.
...
I'm not sure that the size_t cast here and there are useful for any
practical values likely to be encountered by pgbench.


Looking at the code of functions such as ParseScript and psql_scan_setup, 
where an integer variable is used for the size of the entire script - ISTM 
that you are right.. Therefore the size_t casts will also be removed.



ISTM that if something is amiss it will fail in pg_realloc anyway.


IIUC, if physical RAM is not enough, this may depend on the size of the 
swap.



Also I do not like the ExpBuf stuff, as usual.



The exponential allocation seems overkill. I'd simply add a constant
number of slots, with a simple rule:

/* reallocated with a margin */
if (max_vars < needed) max_vars = needed + 8;

So in the end the function should be much simpler.


Ok!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-12 Thread Fabien COELHO




About part 3:

Patch applies cleanly,


I forgot: compiles, global & local "make check" are ok.

--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-12 Thread Fabien COELHO



Hello Marina,


v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client variables 
during the repeating of transactions after serialization/deadlock failures).


This patch adds an explicit structure to manage Variables, which is useful 
to reset these on pgbench script retries, which is the purpose of the 
whole patch series.


About part 3:

Patch applies cleanly,

* typo in comments: "varaibles"

* About enlargeVariables:

multiple INT_MAX error handling looks strange, especially as this code can 
never be triggered because pgbench would be dead long before having 
allocated INT_MAX variables. So I would not bother to add such checks.


ISTM that if something is amiss it will fail in pg_realloc anyway. Also I 
do not like the ExpBuf stuff, as usual.


I'm not sure that the size_t cast here and there are useful for any 
practical values likely to be encountered by pgbench.


The exponential allocation seems overkill. I'd simply add a constant 
number of slots, with a simple rule:


  /* reallocated with a margin */
  if (max_vars < needed) max_vars = needed + 8;

So in the end the function should be much simpler.
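
A minimal sketch of the resulting function under that rule (the Variables
field names and the pg_realloc-based growth are assumptions for illustration):

	static void
	enlargeVariables(Variables *variables, int needed)
	{
		if (variables->max_vars < needed)
		{
			/* reallocate with a constant margin */
			variables->max_vars = needed + 8;
			variables->vars = (Variable *)
				pg_realloc(variables->vars,
						   variables->max_vars * sizeof(Variable));
		}
	}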

--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-10 Thread Marina Polyakova

On 10-08-2018 17:19, Arthur Zakirov wrote:

On Fri, Aug 10, 2018 at 04:46:04PM +0300, Marina Polyakova wrote:

> +1 from me to keep initial name "pgbench_error". "pgbench_log" for new
> function looks nice to me. I think it is better than just "log",
> because "log" may conflict with natural logarithmic function (see "man 3
> log").

Do you think that pgbench_log (or another whose name speaks only about
logging) will look good, for example, with FATAL? Because this means that
the logging function also processes errors and calls exit(1) if necessary..


Yes, why not. "_log" just means that you want to log some message with
the specified log level. Moreover those messages sometimes aren't 
error:


pgbench_error(LOG, "starting vacuum...");


"pgbench_log" is already used as the default filename prefix for 
transaction logging.



> I agree with Fabien. Calling pgbench_error() inside pgbench_error()
> could be dangerous. I think "fmt" checking could be removed, or we may
> use Assert()

I would like not to use Assert in this case because IIUC they are mostly
used for testing.


I'd vote to remove this check at all. I don't see any place where it is
possible to call pgbench_error() passing empty "fmt".


pgbench_error(..., "%s", PQerrorMessage(con)); ?

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-10 Thread Arthur Zakirov
On Fri, Aug 10, 2018 at 04:46:04PM +0300, Marina Polyakova wrote:
> > +1 from me to keep initial name "pgbench_error". "pgbench_log" for new
> > function looks nice to me. I think it is better than just "log",
> > because "log" may conflict with natural logarithmic function (see "man 3
> > log").
> 
> Do you think that pgbench_log (or another whose name speaks only about
> logging) will look good, for example, with FATAL? Because this means that
> the logging function also processes errors and calls exit(1) if necessary..

Yes, why not. "_log" just means that you want to log some message with
the specified log level. Moreover those messages sometimes aren't error:

pgbench_error(LOG, "starting vacuum...");

> > I agree with Fabien. Calling pgbench_error() inside pgbench_error()
> > could be dangerous. I think "fmt" checking could be removed, or we may
> > use Assert()
> 
> I would like not to use Assert in this case because IIUC they are mostly
> used for testing.

I'd vote to remove this check at all. I don't see any place where it is
possible to call pgbench_error() passing empty "fmt".

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-10 Thread Marina Polyakova

On 10-08-2018 15:53, Arthur Zakirov wrote:

On Thu, Aug 09, 2018 at 06:17:22PM +0300, Marina Polyakova wrote:

> * ErrorLevel
>
> If ErrorLevel is used for things which are not errors, its name should
> not include "Error"? Maybe "LogLevel"?

On the one hand, this sounds better to me too. On the other hand, will not
this be in some kind of conflict with error level codes in elog.h?..


I think it shouldn't because those error levels are backends levels.
pgbench is a client side utility with its own code, it shares some code
with libpq and other utilities, but elog.h isn't one of them.


I agree with you on this :) I just meant that maybe it would be better 
to call this group in the same way because they are used in general for 
the same purpose?..



> This point also suggest that maybe "pgbench_error" is misnamed as well
> (ok, I know I suggested it in place of ereport, but e stands for error
> there), as it is called on errors, but is also on other things. Maybe
> "pgbench_log"? Or just simply "log" or "report", as it is really an
> local function, which does not need a prefix? That would mean that
> "pgbench_simple_error", which is indeed called on errors, could keep
> its initial name "pgbench_error", and be called on errors.

About the name "log" - we already have the function doLog, so perhaps the
name "report" will be better.. But like with ErrorLevel will not this be in
some kind of conflict with ereport which is also used for the levels
DEBUG... / LOG / INFO?


+1 from me to keep initial name "pgbench_error". "pgbench_log" for new
function looks nice to me. I think it is better than just "log",
because "log" may conflict with natural logarithmic function (see "man 
3

log").


Do you think that pgbench_log (or another whose name speaks only about 
logging) will look good, for example, with FATAL? Because this means 
that the logging function also processes errors and calls exit(1) if 
necessary..



> pgbench_error calls pgbench_error. Hmmm, why not.


I agree with Fabien. Calling pgbench_error() inside pgbench_error()
could be dangerous. I think "fmt" checking could be removed, or we may
use Assert()


I would like not to use Assert in this case because IIUC they are mostly 
used for testing.



or fprintf()+exit(1) at least.


Ok!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-10 Thread Arthur Zakirov
On Thu, Aug 09, 2018 at 06:17:22PM +0300, Marina Polyakova wrote:
> > * ErrorLevel
> > 
> > If ErrorLevel is used for things which are not errors, its name should
> > not include "Error"? Maybe "LogLevel"?
> 
> On the one hand, this sounds better to me too. On the other hand, will not
> this be in some kind of conflict with error level codes in elog.h?..

I think it shouldn't because those error levels are backends levels.
pgbench is a client side utility with its own code, it shares some code
with libpq and other utilities, but elog.h isn't one of them.

> > This point also suggest that maybe "pgbench_error" is misnamed as well
> > (ok, I know I suggested it in place of ereport, but e stands for error
> > there), as it is called on errors, but is also on other things. Maybe
> > "pgbench_log"? Or just simply "log" or "report", as it is really an
> > local function, which does not need a prefix? That would mean that
> > "pgbench_simple_error", which is indeed called on errors, could keep
> > its initial name "pgbench_error", and be called on errors.
> 
> About the name "log" - we already have the function doLog, so perhaps the
> name "report" will be better.. But like with ErrorLevel will not this be in
> some kind of conflict with ereport which is also used for the levels
> DEBUG... / LOG / INFO?

+1 from me to keep initial name "pgbench_error". "pgbench_log" for new
function looks nice to me. I think it is better than just "log",
because "log" may conflict with natural logarithmic function (see "man 3
log").

> > pgbench_error calls pgbench_error. Hmmm, why not.

I agree with Fabien. Calling pgbench_error() inside pgbench_error()
could be dangerous. I think "fmt" checking could be removed, or we may
use Assert() or fprintf()+exit(1) at least.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-10 Thread Marina Polyakova

On 10-08-2018 11:33, Fabien COELHO wrote:

Hello Marina,


I'd suggest to let lookupCreateVariable, putVariable* as they are,
call pgbench_error with a level which does not stop the execution, and
abort if necessary from the callers with an "aborted because of
putVariable/eval/... error" message, as it was done before.


There's one more problem: if this is a client failure, an error 
message inside any of these functions should be printed at the level 
DEBUG_FAILS; otherwise it should be printed at the level LOG. Or do 
you suggest using the error level as an argument for these functions?


No. I suggest that the called function does only one simple thing,
probably "DEBUG", and that the *caller* prints a message if it is
unhappy about the failure of the called function, as it is currently
done. This allows to provide context as well from the caller, eg
"setting variable %s failed while ". The user
call rerun under debug for precision if they need it.


Ok!


I'm still not over-enthusiastic with these changes, and still think
that it should be an independent patch, not submitted together with
the "retry on error" feature.


In the next version I will put the error patch last, so it will be 
possible to compare the "retry on error" feature with it and without it, 
and let the committer decide how it is better)


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-10 Thread Fabien COELHO



Hello Marina,


I'd suggest to let lookupCreateVariable, putVariable* as they are,
call pgbench_error with a level which does not stop the execution, and
abort if necessary from the callers with an "aborted because of
putVariable/eval/... error" message, as it was done before.


There's one more problem: if this is a client failure, an error message 
inside any of these functions should be printed at the level DEBUG_FAILS; 
otherwise it should be printed at the level LOG. Or do you suggest using the 
error level as an argument for these functions?


No. I suggest that the called function does only one simple thing, 
probably "DEBUG", and that the *caller* prints a message if it is unhappy 
about the failure of the called function, as it is currently done. This 
allows to provide context as well from the caller, eg "setting variable %s 
failed while ". The user can rerun under debug for 
precision if they need it.
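
As a sketch of that division of labor (the function and level names follow
the thread's discussion and are otherwise assumptions, not actual patch code):

	/* callee: does one simple thing and reports failure to the caller */
	static bool
	putVariableValue(Variables *vars, const char *name, const char *value)
	{
		Variable   *var = lookupCreateVariable(vars, name);

		if (var == NULL)
		{
			pgbench_error(DEBUG, "could not create variable \"%s\"\n", name);
			return false;
		}
		var->svalue = pg_strdup(value);
		return true;
	}

	/* caller: prints the context and aborts if necessary */
	if (!putVariableValue(&st->variables, "gid", value))
	{
		pgbench_error(LOG, "client %d aborted because of putVariable error\n",
					  st->id);
		st->state = CSTATE_ABORTED;
	}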


I'm still not over-enthusiastic with these changes, and still think that 
it should be an independent patch, not submitted together with the "retry 
on error" feature.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-09 Thread Marina Polyakova

On 09-08-2018 12:28, Fabien COELHO wrote:

Hello Marina,


Hello!


v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to 
report client failures that do not cause an abort and this depends on 
the level of debugging).


Patch applies cleanly, compiles, global & local make check ok.


:-)


This patch improves/homogenizes logging & error reporting in pgbench,
in preparation for another patch which will manage transaction
restarts in some cases.

However ISTM that it is not as necessary as the previous one, i.e. we
could do without it to get the desired feature, so I see it more as a
refactoring done "in passing", and I'm wondering whether it is really
worth it because it adds some new complexity, so I'm not sure of the
net benefit.


We discussed this starting with [1]:


IMO this patch is more controversial than the other ones.

It is not really related to the aim of the patch series, which could
do without, couldn't it?


I'd suggest that it should be an independent submission, unrelated to
the pgbench error management patch.


I suppose that this is related; because of my patch there may be a lot
of such code (see v7 in [1]):

-   fprintf(stderr,
-   "malformed variable \"%s\" value: \"%s\"\n",
-   var->name, var->svalue);
+   if (debug_level >= DEBUG_FAILS)
+   {
+   fprintf(stderr,
+   "malformed variable \"%s\" value: \"%s\"\n",
+   var->name, var->svalue);
+   }

-   if (debug)
+   if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);


I'm not sure that debug messages need to be kept after debug, if it is
about debugging pgbench itself. That is debatable.


AFAICS it is not about debugging pgbench itself, but about more detailed
information that can be used to understand what exactly happened during
its launch. In the case of errors this helps to distinguish between
failures or errors by type (including which limit for retries was
violated and how far it was exceeded for the serialization/deadlock
errors).

That's why it was suggested to make the error function which hides all
these things (see [2]):

There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with
corresponding fprintf(stderr..). I think it's time to do it like in the
main code, wrap with some function like log(level, msg).


Yep. I did not write that, but I agree with an "elog" suggestion to switch


   if (...) { fprintf(...); exit/abort/continue/... }

to a simpler:

   elog(level, ...)



Anyway, I still have quite a few comments/suggestions on this version.


Thank you very much for them!


* ErrorLevel

If ErrorLevel is used for things which are not errors, its name should
not include "Error"? Maybe "LogLevel"?


On the one hand, this sounds better to me too. On the other hand, will 
not this be in some kind of conflict with error level codes in elog.h?..


/* Error level codes */
#define DEBUG5  10  /* Debugging messages, in categories of
 * decreasing detail. */
#define DEBUG4  11
...


I'm at odds with the proposed levels. ISTM that pgbench internal
errors which warrant an immediate exit should be dubbed "FATAL",


Ok!


which
would leave the "ERROR" name for... errors, eg SQL errors. I'd suggest
to use an INFO level for the PGBENCH_DEBUG function, and to keep LOG
for main program messages, so that all use cases are separate. Or,
maybe the distinction between LOG/INFO is unclear so info is not
necessary.


The messages of the errors in SQL and meta commands are printed only if 
the option --debug-fails is used, so I'm not sure that they should have a 
higher error level than main program messages (ERROR vs LOG). About an 
INFO level for the PGBENCH_DEBUG function - ISTM that some main program 
messages such as "dropping old tables...\n" or ..." tuples (%d%%) done 
(elapsed %.2f s, remaining %.2f s)\n" could also use it.. As for all use 
cases being separate - in the current version the level LOG also includes 
messages about the abortion of clients.


I'm unsure about the "log_min_messages" variable name, I'd suggest 
"log_level".


I do not see the asserts on LOG >= log_min_messages as useful, because
the level can only be LOG or DEBUG anyway.


Ok!


This point also suggests that maybe "pgbench_error" is misnamed as well
(ok, I know I suggested it in place of ereport, but e stands for error
there), as it is called on errors, but is also on other things. Maybe
"pgbench_log"? Or just simply "log" or "report", as it is really a local
function, which does not need a prefix? That would mean that
"pgbench_simple_error", which is indeed called on errors, could keep its
initial name "pgbench_error", and be called on errors.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-09 Thread Fabien COELHO



Hello Marina,


v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to report 
client failures that do not cause an abort and this depends on the level of 
debugging).


Patch applies cleanly, compiles, global & local make check ok.

This patch improves/homogenizes logging & error reporting in pgbench, in 
preparation for another patch which will manage transaction restarts in 
some cases.


However ISTM that it is not as necessary as the previous one, i.e. we 
could do without it to get the desired feature, so I see it more as a 
refactoring done "in passing", and I'm wondering whether it is 
really worth it because it adds some new complexity, so I'm not sure of 
the net benefit.


Anyway, I still have quite a few comments/suggestions on this version.

* ErrorLevel

If ErrorLevel is used for things which are not errors, its name should not 
include "Error"? Maybe "LogLevel"?


I'm at odds with the proposed levels. ISTM that pgbench internal errors 
which warrant an immediate exit should be dubbed "FATAL", which would 
leave the "ERROR" name for... errors, eg SQL errors. I'd suggest to use an 
INFO level for the PGBENCH_DEBUG function, and to keep LOG for main 
program messages, so that all use cases are separate. Or, maybe the 
distinction between LOG/INFO is unclear so info is not necessary.


I'm unsure about the "log_min_messages" variable name, I'd suggest 
"log_level".


I do not see the asserts on LOG >= log_min_messages as useful, because the 
level can only be LOG or DEBUG anyway.


This point also suggests that maybe "pgbench_error" is misnamed as well 
(ok, I know I suggested it in place of ereport, but e stands for error 
there), as it is called on errors, but is also on other things. Maybe 
"pgbench_log"? Or just simply "log" or "report", as it is really a local 
function, which does not need a prefix? That would mean that 
"pgbench_simple_error", which is indeed called on errors, could keep its 
initial name "pgbench_error", and be called on errors.


Alternatively, the debug/logging code could be left as it is (i.e. direct 
print to stderr) and the function only called when there is some kind of 
error, in which case it could be named with "error" in its name (or 
elog/ereport...).


* PQExpBuffer

I still do not see a positive value from importing PQExpBuffer complexity 
and cost into pgbench, as the resulting code is not very readable and it 
adds malloc/free cycles, so I'd try to avoid using PQExpBuf as much as 
possible. ISTM that all usages could be avoided in the patch, and most 
should be avoided even if ExpBuffer is imported because it is really 
useful somewhere.


- to call pgbench_error from pgbench_simple_error, you can do a 
pgbench_log_va(level, format, va_list) version called both from 
pgbench_error & pgbench_simple_error.


- for PGBENCH_DEBUG function, do separate calls per type, the 
very small partial code duplication is worth avoiding ExpBuf IMO.


- for doCustom debug: I'd just let the printf as it is, with a comment, as 
it is really very internal stuff for debug. Or I'd just snprintf 
something in a static buffer.


- for syntax_error: it should terminate, so it should call
pgbench_error(FATAL, ...). Idem, I'd either keep the printf then call
pgbench_error(FATAL, "syntax error found\n") for a final message,
or snprintf in a static buffer.

- for listAvailableScript: I'd simply call "pgbench_error(LOG" several 
time, once per line.


I see building a string with a format (printfExpBuf..) and then calling 
the pgbench_error function with just a "%s" format on the result as not 
very elegant, because the second format is somehow hacked around.


* bool client

I'm unconvinced by this added boolean just to switch the level on 
encountered errors.


I'd suggest to let lookupCreateVariable, putVariable* as they are, call 
pgbench_error with a level which does not stop the execution, and abort if 
necessary from the callers with an "aborted because of putVariable/eval/... 
error" message, as it was done before.


pgbench_error calls pgbench_error. Hmmm, why not.

--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-08 Thread Marina Polyakova

On 07-08-2018 19:21, Fabien COELHO wrote:

Hello Marina,


Hello, Fabien!


v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a 
client's random seed during the repeating of transactions after 
serialization/deadlock failures).


About this v10 part 1:

Patch applies cleanly, compile, global & local make check both ok.

The random state is cleanly separated so that it will be easy to reset
it on client error handling. ISTM that the pgbench side is deterministic
with the separation of the seeds for different uses.

Code is clean, comments are clear.


:-)


I'm wondering what is the rationale for the "xseed" field name? In
particular, what does the "x" stand for?


I called it "...seed" instead of "data" because perhaps the "data" is 
too general a name for use here (but I'm not entirely sure what Alvaro 
Herrera meant in [1], see my answer in [2]). I called it "xseed" to 
combine it with the arguments of the functions _dorand48 / pg_erand48 / 
pg_jrand48 in the file erand48.c. IIUC they use a linear congruential 
generator and perhaps "xseed" means the sequence with the name X of 
pseudorandom values of size 48 bits (X_0, X_1, ... X_n) where X_0 is the 
seed / the start value.


[1] 
https://www.postgresql.org/message-id/20180711180417.3ytmmwmonsr5lra7@alvherre.pgsql



LGTM, though I'd rename the random_state struct members so that it
wouldn't look as confusing.  Maybe that's just me.


[2] 
https://www.postgresql.org/message-id/cb2cde10e4e7a10a38b48e9cae8fbd28%40postgrespro.ru


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-07 Thread Fabien COELHO



Hello Marina,


v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a client's 
random seed during the repeating of transactions after serialization/deadlock 
failures).


About this v10 part 1:

Patch applies cleanly, compile, global & local make check both ok.

The random state is cleanly separated so that it will be easy to reset it 
on client error handling ISTM that the pgbench side is deterministic with

the separation of the seeds for different uses.

Code is clean, comments are clear.

I'm wondering what is the rationale for the "xseed" field name? In 
particular, what does the "x" stand for?


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-12 Thread Fabien COELHO




The point is to avoid building the message with dynamic allocation when
in the end it is not used.


Ok! About avoidance - I'm afraid there's one more piece of debugging code 
with the same problem:


Indeed. I'd like to avoid all instances, so that PQExpBufferData is not 
needed anywhere, if possible. If not possible, then too bad, but I'd 
prefer to make do with formatted prints only, for simplicity.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-12 Thread Marina Polyakova

On 11-07-2018 22:34, Fabien COELHO wrote:

can we try something like this?

PGBENCH_ERROR_START(DEBUG_FAIL)
{
PGBENCH_ERROR("client %d repeats the failed transaction (try %d",


Argh, no? I was thinking of something much more trivial:

   pgbench_error(DEBUG, "message format %d %s...", 12, "hello world");

If you really need some complex dynamic buffer, and I would prefer
that you avoid that, then the fallback is:

   if (level >= DEBUG)
   {
  initPQstuff();
  ...
  pgbench_error(DEBUG, "fixed message... %s\n", msg);
  freePQstuff();
   }

The point is to avoid building the message with dynamic allocation when
in the end it is not used.


Ok! About avoidance - I'm afraid there's one more piece of debugging 
code with the same problem:


else if (command->type == META_COMMAND)
{
...
initPQExpBuffer(&errmsg_buf);
printfPQExpBuffer(&errmsg_buf, "client %d executing \\%s",
  st->id, argv[0]);
for (i = 1; i < argc; i++)
appendPQExpBuffer(&errmsg_buf, " %s", argv[i]);
appendPQExpBufferChar(&errmsg_buf, '\n');
ereport(ELEVEL_DEBUG, (errmsg("%s", errmsg_buf.data)));
termPQExpBuffer(&errmsg_buf);
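
(For comparison, a sketch of the same debug output without PQExpBuffer, in
the direction suggested elsewhere in the thread; the level guard reuses the
patch's debug_level / DEBUG_ALL names:)

	if (debug_level >= DEBUG_ALL)
	{
		fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
		for (i = 1; i < argc; i++)
			fprintf(stderr, " %s", argv[i]);
		fputc('\n', stderr);
	}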

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-12 Thread Marina Polyakova

On 11-07-2018 21:04, Alvaro Herrera wrote:

Just a quick skim while refreshing what were those error reporting API
changes about ...


Thank you!


On 2018-May-21, Marina Polyakova wrote:


v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after
serialization/deadlock failures).


LGTM, though I'd rename the random_state struct members so that it
wouldn't look as confusing.  Maybe that's just me.


IIUC, do you like "xseed" instead of "data"?

 typedef struct RandomState
 {
-   unsigned short data[3];
+   unsigned short xseed[3];
 } RandomState;

Or do you want to rename "random_state" in the structures RetryState / 
CState / TState? Thanks to Fabien Coelho's comments in [1], TState can 
contain several RandomStates for different purposes, something like 
this:


/*
 * Thread state
 */
typedef struct
{
...
	/*
	 * Separate randomness for each thread. Each thread option uses its own
	 * random state to make all of them independent of each other and
	 * therefore deterministic at the thread level.
	 */
	RandomState choose_script_rs;	/* random state for selecting a script */
	RandomState throttling_rs;	/* random state for transaction throttling */
	RandomState sampling_rs;	/* random state for log sampling */
...
} TState;


v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after serialization/deadlock
failures).


Please don't allocate Variable structs one by one.  First time allocate
some decent number (say 8) and then enlarge by duplicating size.  That
way you save realloc overhead.  We use this technique everywhere else,
no reason to do different here.  Other than that, LGTM.


Ok!

[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.21.1806090810090.5307%40lancre


While reading your patch, it occurs to me that a run is not deterministic
at the thread level under throttling and sampling, because the random
state is solicited differently depending on when a transaction ends. This
suggests that maybe each thread random_state use should have its own
random state.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-12 Thread Marina Polyakova

On 11-07-2018 20:49, Alvaro Herrera wrote:

On 2018-Jul-11, Marina Polyakova wrote:


can we try something like this?

PGBENCH_ERROR_START(DEBUG_FAIL)
{
PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
  st->id, st->retries + 1);
if (max_tries)
PGBENCH_ERROR("/%d", max_tries);
if (latency_limit)
{
PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
  getLatencyUsed(st, &now));
}
PGBENCH_ERROR(")\n");
}
PGBENCH_ERROR_END();


I didn't quite understand what these PGBENCH_ERROR() functions/macros
are supposed to do.  Care to explain?


It is used only to print a string with the given arguments to stderr. 
Probably it might be just the function pgbench_error and not a macro..


P.S. This is my mistake, I did not think that PGBENCH_ERROR_END does not 
know the elevel for calling exit(1) if the elevel >= ERROR.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-11 Thread Fabien COELHO





can we try something like this?

PGBENCH_ERROR_START(DEBUG_FAIL)
{
PGBENCH_ERROR("client %d repeats the failed transaction (try %d",


Argh, no? I was thinking of something much more trivial:

   pgbench_error(DEBUG, "message format %d %s...", 12, "hello world");

If you really need some complex dynamic buffer, and I would prefer 
that you avoid that, then the fallback is:


   if (level >= DEBUG)
   {
  initPQstuff();
  ...
  pgbench_error(DEBUG, "fixed message... %s\n", msg);
  freePQstuff();
   }

The point is to avoid building the message with dynamic allocation when
in the end it is not used.

--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-11 Thread Alvaro Herrera
Just a quick skim while refreshing what were those error reporting API
changes about ...

On 2018-May-21, Marina Polyakova wrote:

> v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
> - a patch for the RandomState structure (this is used to reset a client's
> random seed during the repeating of transactions after
> serialization/deadlock failures).

LGTM, though I'd rename the random_state struct members so that it
wouldn't look as confusing.  Maybe that's just me.

> v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
> - a patch for the Variables structure (this is used to reset client
> variables during the repeating of transactions after serialization/deadlock
> failures).

Please don't allocate Variable structs one by one.  First time allocate
some decent number (say 8) and then enlarge by duplicating size.  That
way you save realloc overhead.  We use this technique everywhere else,
no reason to do different here.  Other than that, LGTM.
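
A sketch of that allocation pattern (the Variables field names are
assumptions for illustration):

	/* make room for one more variable, growing geometrically */
	if (variables->nvars >= variables->max_vars)
	{
		if (variables->max_vars == 0)
			variables->max_vars = 8;	/* first time: a decent number */
		else
			variables->max_vars *= 2;	/* then enlarge by duplicating size */
		variables->vars = (Variable *)
			pg_realloc(variables->vars,
					   variables->max_vars * sizeof(Variable));
	}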

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-11 Thread Alvaro Herrera
On 2018-Jul-11, Marina Polyakova wrote:

> can we try something like this?
> 
> PGBENCH_ERROR_START(DEBUG_FAIL)
> {
>   PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
> st->id, st->retries + 1);
>   if (max_tries)
>   PGBENCH_ERROR("/%d", max_tries);
>   if (latency_limit)
>   {
>   PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
> getLatencyUsed(st, &now));
>   }
>   PGBENCH_ERROR(")\n");
> }
> PGBENCH_ERROR_END();

I didn't quite understand what these PGBENCH_ERROR() functions/macros
are supposed to do.  Care to explain?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-11 Thread Marina Polyakova

On 11-07-2018 16:24, Fabien COELHO wrote:

Hello Marina,

* -d/--debug: I'm not in favor in requiring a mandatory text argument 
on this option.


As you wrote in [1], adding an additional option is also a bad idea:


Hey, I'm entitled to some internal contradictions:-)


... and discussions will continue forever %-)

I'm sceptical of the "--debug-fails" options. ISTM that --debug is 
already there and should just be reused.


I was thinking that you could just use the existing --debug, not
change its syntax. My point was that --debug exists, and you could
just print
the messages when under --debug.


Now I understand you better, thanks. I think it will be useful to 
receive only messages about failures, because they and progress reports 
can be lost in many other debug messages such as "client %d sending ..." 
/ "client %d executing ..." / "client %d receiving".


Maybe it's better to use an optional argument/arguments for 
compatibility (--debug[=fails] or --debug[=NUM])? But if we use the 
numbers, now I can see only 2 levels, and there's no guarantee that 
they will not change..


Optional arguments to options (!) are not really clean things, so I'd
like to avoid going onto this path, esp. as I cannot see any other
instance in pgbench or elsewhere in postgres,


AFAICS they are used in pg_waldump (option --stats[=record]) and in psql 
(option --help[=topic]).
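
(Such options rely on getopt_long's optional_argument; a purely illustrative
sketch of what --debug[=fails] could look like, with the level names taken
from the patch and everything else assumed:)

	static const struct option long_options[] = {
		{"debug", optional_argument, NULL, 'd'},
		{NULL, 0, NULL, 0}
	};

	/* inside the option-parsing switch: */
	case 'd':
		if (optarg == NULL)
			debug_level = DEBUG_ALL;	/* plain --debug */
		else if (strcmp(optarg, "fails") == 0)
			debug_level = DEBUG_FAILS;	/* --debug=fails */
		else
		{
			fprintf(stderr, "invalid --debug argument: \"%s\"\n", optarg);
			exit(1);
		}
		break;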



and I personally
consider these as a bad idea.



So if absolutely necessary, a new option is still better than changing
--debug syntax. If not necessary, then it is better:-)


Ok!


* I'm reserved about the whole ereport thing, see comments in other
messages.


Thank you, I'll try to implement the error reporting in the way you 
suggested.


Dunno if it is a good idea either. The committer's word is the good one
in the end :-)


I agree with you that ereport has good reasons to be non-trivial in the 
backend and it does not have the same in pgbench..



* doCustom changes.




On CSTATE_FAILURE, the next command is possibly started. Although 
there is some consistency with the previous point, I think that it 
totally breaks the state automaton where now a command can start 
while the whole transaction is in failing state anyway. There was no 
point in starting it in the first place.


So, for me, the FAILURE state should record/count the failure, then skip
to RETRY if a retry is decided, else proceed to ABORT. Nothing else.
This is much clearer that way.

Then RETRY should reinstate the global state and proceed to start the 
*first* command again.

<...>

It is unclear to me why backslash command errors are turned to FAILURE
instead of ABORTED: there is no way they are going to be retried, so
maybe they should/could skip directly to ABORTED?


So do you propose to execute the command "ROLLBACK" without 
calculating its latency etc. if we are in a failed transaction and 
clear the conditional stack after each failure?


Also just to be clear: do you want to have the state CSTATE_ABORTED 
for client abortion and another state for interrupting the current 
transaction?


I do not understand what "interrupting the current transaction" means.
A transaction is either committed or rolled back, I do not know about
"interrupted".


I mean that IIUC the server usually only reports the error and you must 
manually send the command "END" or "ROLLBACK" to roll back a failed 
transaction.



When it is rolled back, probably some stats will be
collected in passing, I'm fine with that.

If there is an error in a pgbench script, the transaction is aborted,
which means for me that the script execution is stopped where it was,
and either it is restarted from the beginning (retry) or counted as
failure (not retry, just aborted, really).

If by interrupted you mean that one script begins a transaction and
another ends it, as I said in the review I think that this strange
case should be forbidden, so that all the code and documentation
trying to
manage that can be removed.


Ok!


The current RETRY state does memory allocations to generate a message
with buffer allocation and so on. This looks like a costly and useless
operation. If the user required "retries", then this is normal behavior,
the retries are counted and will be printed out in the final report,
and there is no point in printing out every single one of them.
Maybe you want that debugging, but then costly operations should be guarded.


I think we need these debugging messages because, for example,


Debugging messages should cost only when under debug. When not under
debug, there should be no debugging message, and there should be no
cost for building and discarding such messages in the executed code
path beyond testing whether the program is under debug.

if you use the option --latency-limit, we will never know in advance
whether the serialization/deadlock failure will be retried or not.


ISTM that it will be shown in the final report. If I want debug, I ask for
--debug, otherwise I think that the command should do what it was asked
for, i.e. run scripts, collect performance statistics and show them at
the end.
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-11 Thread Fabien COELHO


Hello Marina,

* -d/--debug: I'm not in favor in requiring a mandatory text argument on 
this option.


As you wrote in [1], adding an additional option is also a bad idea:


Hey, I'm entitled to some internal contradictions:-)

I'm sceptical of the "--debug-fails" options. ISTM that --debug is 
already there and should just be reused.


I was thinking that you could just use the existing --debug, not change 
its syntax. My point was that --debug exists, and you could just print
the messages when under --debug.

Maybe it's better to use an optional argument/arguments for compatibility 
(--debug[=fails] or --debug[=NUM])? But if we use the numbers, now I can see 
only 2 levels, and there's no guarantee that they will not change..


Optional arguments to options (!) are not really clean things, so I'd like 
to avoid going onto this path, esp. as I cannot see any other instance in 
pgbench or elsewhere in postgres, and I personally consider these as a bad 
idea.


So if absolutely necessary, a new option is still better than changing 
--debug syntax. If not necessary, then it is better:-)



* I'm reserved about the whole ereport thing, see comments in other
messages.


Thank you, I'll try to implement the error reporting in the way you 
suggested.


Dunno if it is a good idea either. The committer's word is the good one in 
the end :-)



Thank you, I'll fix this.
I'm sorry, I'll fix this.


You do not have to thank me or be sorry on every comment I make; once of 
the former is enough, and there is no need for the latter.



* doCustom changes.




On CSTATE_FAILURE, the next command is possibly started. Although there 
is some consistency with the previous point, I think that it totally 
breaks the state automaton where now a command can start while the 
whole transaction is in failing state anyway. There was no point in 
starting it in the first place.


So, for me, the FAILURE state should record/count the failure, then skip
to RETRY if a retry is decided, else proceed to ABORT. Nothing else.
This is much clearer that way.

Then RETRY should reinstate the global state and proceed to start the 
*first* command again.

<...>

It is unclear to me why backslash command errors are turned to FAILURE
instead of ABORTED: there is no way they are going to be retried, so
maybe they should/could skip directly to ABORTED?


So do you propose to execute the command "ROLLBACK" without calculating its 
latency etc. if we are in a failed transaction and clear the conditional 
stack after each failure?


Also just to be clear: do you want to have the state CSTATE_ABORTED for 
client abortion and another state for interrupting the current transaction?


I do not understand what "interrupting the current transaction" means. A 
transaction is either committed or rolled back, I do not know about 
"interrupted". When it is rolled back, probably some stats will be 
collected in passing, I'm fine with that.


If there is an error in a pgbench script, the transaction is aborted, 
which means for me that the script execution is stopped where it was, and 
either it is restarted from the beginning (retry) or counted as failure 
(not retry, just aborted, really).


If by interrupted you mean that one script begins a transaction and 
another ends it, as I said in the review I think that this strange case 
should be forbidden, so that all the code and documentation trying to
manage that can be removed.


The current RETRY state does memory allocations to generate a message
with buffer allocation and so on. This looks like a costly and useless
operation. If the user required "retries", then this is normal behavior,
the retries are counted and will be printed out in the final report,
and there is no point in printing out every single one of them.
Maybe you want that debugging, but then costly operations should be 
guarded.


I think we need these debugging messages because, for example,


Debugging messages should cost only when under debug. When not under debug, 
there should be no debugging message, and there should be no cost for 
building and discarding such messages in the executed code path beyond
testing whether the program is under debug.

if you use the option --latency-limit, we will never know in advance 
whether the serialization/deadlock failure will be retried or not.


ISTM that it will be shown in the final report. If I want debug, I ask for 
--debug, otherwise I think that the command should do what it was asked 
for, i.e. run scripts, collect performance statistics and show them at the 
end.


In particular, when running with retries is enabled, the user is expecting 
deadlock/serialization errors, so that they are not "errors" as such for
them.

They also help to understand which limit of retries was violated or how 
close we were to these limits during the execution of a specific 
transaction. But I agree with you that they are costly and can be 
skipped if the failure type is never retried. Maybe it is better to 
split them 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-11 Thread Marina Polyakova

On 09-07-2018 16:05, Fabien COELHO wrote:

Hello Marina,


Hello, Fabien!


Here is a review for the last part of your v9 version.


Thank you very much for this!


Patch does not "git apply" (may anymore):
  error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
  error: doc/src/sgml/ref/pgbench.sgml: patch does not apply


Sorry, I'll send a new version soon.


However I could get it to apply with the "patch" command.

Then patch compiles, global & pgbench "make check" are ok.


:-)


Feature
===

The patch adds the ability to restart transactions (i.e. the full script)
on some errors, which is a good thing as it allows to exercise postgres
performance in more realistic scenarios.

* -d/--debug: I'm not in favor of requiring a mandatory text argument on this
option. It is not practical, the user has to remember it, and it is a change.
I'm sceptical of the overall debug handling changes. Maybe we could have
multiple -d which lead to higher debug level, but I'm not sure that it can be
made to work for this case and still be compatible with the previous behavior.

Maybe you need a specific option for your purpose, eg "--debug-retry"?


As you wrote in [1], adding an additional option is also a bad idea:


I'm sceptical of the "--debug-fails" options. ISTM that --debug is
already there and should just be reused.


Maybe it's better to use an optional argument/arguments for 
compatibility (--debug[=fails] or --debug[=NUM])? But if we use the 
numbers, now I can see only 2 levels, and there's no guarantee that they 
will not change..



Code


* The implementation is less complex than the previous submission,
which is a good thing. I'm not sure that all the remaining complexity
is still fully needed.

* I'm reserved about the whole ereport thing, see comments in other
messages.


Thank you, I'll try to implement the error reporting in the way you 
suggested.



Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me.
In particular, the "CLIENT" part is not very useful. If the
distinction makes sense, I would have kept "LOG" for the initial one and
added other ones for ABORT and PGBENCH, maybe.


Ok!


* There are no comments about "retries" in StatData, CState and Command
structures.

* Also, for StatData, I would like to understand the logic between cnt,
skipped, retries, retried, errors, ... so clear information about the
expected invariant, if any, would be welcome. One has to go into the code
to understand how these fields relate one to the other.

<...>

* How "errors" differs from "ecnt" is unclear to me.


Thank you, I'll fix this.


* commandFailed: I think that it should be kept much simpler. In
particular, having errors on errors does not help much: on
ELEVEL_FATAL, it ignores the actual reported error and generates
another error of the same level, so that the initial issue is hidden.
Even if these are can't-happen cases, hiding the origin if it occurs
looks unhelpful. Just print it directly, and maybe abort if you think
that it is a can't-happen case.


Oh, thanks, my mistake(

* copyRandomState: just use sizeof(RandomState) instead of making 
assumptions about the contents of the struct. Also, this function looks 
pretty useless, why not just do a plain assignment?

* copyVariables: lacks comments to explain that the destination is 
cleaned up and so on. The cleanup phase could probably be in a distinct 
function, so that the code would be clearer. Maybe the function variable 
names are too long.


Thank you, I'll fix this.


  if (current_source->svalue)

in the context of a guard for a strdup, maybe:

  if (current_source->svalue != NULL)


I'm sorry, I'll fix this.

* I do not understand the comments on CState enum: "First, remember the 
failure in CSTATE_FAILURE. Then process other commands of the failed 
transaction if any". Why would other commands be processed at all if the 
transaction is aborted?

For me any error must lead to the rollback and possible retry of the
transaction. This comment needs to be clarified. It should also say
that on FAILURE, it will go either to RETRY or ABORTED. See below my
comments about doCustom.

It is unclear to me why there could be several failures within a
transaction, as I would have thought that it would be aborted on the
first one.

* I do not understand the purpose of first_failure. The comment should 
explain why it would need to be remembered. From my point of view, I'm 
not fully convinced that it should.

<...>

* executeCondition: this hides client automaton state changes which were
clearly visible beforehand in the switch, and the different handling of
if & elif is also hidden.

I'm against this unnecessary restructuring and hiding such information:
all state changes should be clearly seen in the state switch so that it
is easier to understand and follow.

I do not see why touching the conditional stack on internal errors
(evaluateExpr failure) brings anything, the whole transaction will be
aborted anyway.


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-09 Thread Fabien COELHO



Hello Marina,


v9-0004-Pgbench-errors-and-serialization-deadlock-retries.patch
- the main patch for handling client errors and repetition of transactions 
with serialization/deadlock failures (see the detailed description in the 
file).


Here is a review for the last part of your v9 version.

Patch does not "git apply" (may anymore):
  error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
  error: doc/src/sgml/ref/pgbench.sgml: patch does not apply

However I could get it to apply with the "patch" command.

Then patch compiles, global & pgbench "make check" are ok.

Feature
===

The patch adds the ability to restart transactions (i.e. the full script)
on some errors, which is a good thing as it allows to exercise postgres
performance in more realistic scenarios.

* -d/--debug: I'm not in favor of requiring a mandatory text argument on this
option. It is not practical, the user has to remember it, and it is a change.
I'm sceptical of the overall debug handling changes. Maybe we could have
multiple -d which lead to higher debug level, but I'm not sure that it can be
made to work for this case and still be compatible with the previous behavior.
Maybe you need a specific option for your purpose, eg "--debug-retry"?


Code


* The implementation is less complex than the previous submission, which 
is a good thing. I'm not sure that all the remaining complexity is still 
fully needed.


* I'm reserved about the whole ereport thing, see comments in other
messages.

Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me.
In particular, the "CLIENT" part is not very useful. If the
distinction makes sense, I would have kept "LOG" for the initial one and
added other ones for ABORT and PGBENCH, maybe.

* There are no comments about "retries" in StatData, CState and Command
structures.

* Also, for StatData, I would like to understand the logic between cnt,
skipped, retries, retried, errors, ... so clear information about the
expected invariant, if any, would be welcome. One has to go into the code to
understand how these fields relate one to the other.

* "errors_in_failed_tx" is some subcounter of "errors", for a special 
case. Why it is there escaped me [I finally understood, and I think it 
should be removed, see end of review]. If we wanted to distinguish, then 
we should distinguish homogeneously: maybe just count the different error 
types, eg have things like "deadlock_errors", "serializable_errors", 
"other_errors", "internal_pgbench_errors" which would be orthogonal one to 
the other, and "errors" could be recomputed from these.


* How "errors" differs from "ecnt" is unclear to me.

* FailureStatus states are not homogeneously named. I'd suggest to use 
*_FAILURE for all cases. The miscellaneous case should probably be the 
last. I do not understand the distinction between ANOTHER_FAILURE & 
IN_FAILED_SQL_TRANSACTION. Why should it be needed? [again, see end of 
review]


* I do not understand the comments on CState enum: "First, remember the failure
in CSTATE_FAILURE. Then process other commands of the failed transaction if any"
Why would other commands be processed at all if the transaction is aborted?
For me any error must lead to the rollback and possible retry of the
transaction. This comment needs to be clarified. It should also say
that on FAILURE, it will go either to RETRY or ABORTED. See below my 
comments about doCustom.


It is unclear to me why there could be several failures within a 
transaction, as I would have thought that it would be aborted on the first 
one.


* I do not understand the purpose of first_failure. The comment should explain
why it would need to be remembered. From my point of view, I'm not fully
convinced that it should.

* commandFailed: I think that it should be kept much simpler. In 
particular, having errors on errors does not help much: on ELEVEL_FATAL, 
it ignores the actual reported error and generates another error of the 
same level, so that the initial issue is hidden. Even if these are 
can't-happen cases, hiding the origin if it occurs looks unhelpful. Just print 
it directly, and maybe abort if you think that it is a can't happen case.


* copyRandomState: just use sizeof(RandomState) instead of making assumptions
about the contents of the struct. Also, this function looks pretty useless,
why not just do a plain assignment?

* copyVariables: lacks comments to explain that the destination is cleaned up
and so on. The cleanup phase could probably be in a distinct function, so that
the code would be clearer. Maybe the function variable names are too long.

  if (current_source->svalue)

in the context of a guard for a strdup, maybe:

  if (current_source->svalue != NULL)

* executeCondition: this hides client automaton state changes which were
clearly visible beforehand in the switch, and the different handling of
if & elif is also hidden.

I'm against this unnecessary restructuring and hiding such information:
all state changes should be clearly seen in the state switch so that it
is easier to understand and follow.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-07-08 Thread Fabien COELHO



Hello Alvaro,


For context: in the backend, elog() is only used for internal messages
(i.e. "can't-happen" conditions), and ereport() is used for user-facing
messages.  There are many things ereport() has that elog() doesn't, such
as additional message fields (HINT, DETAIL, etc) that I think could have
some use in pgbench as well.  If you use elog() then you can't have that.
[...]


Ok. Then forget elog, but I'm pretty against having a kind of ereport 
which looks greatly overkill to me, because:


 (1) the syntax is pretty heavy, and does not look like a function.

 (2) the implementation allocates a string buffer for the message;
 this is greatly overkill for pgbench which only needs to print
 to stderr once.

This makes sense server-side because the generated message may be output 
several times (eg stderr, file logging, to the client), and the 
implementation has to work with cpp implementations which do not handle 
varargs (and maybe other reasons).


So I would be in favor of having just a simpler error function. 
Incidentally, one already exists "pgbench_error" and could be improved, 
extended, replaced. There is also "syntax_error".



One thing that just came to mind is that pgbench uses some src/fe_utils
stuff.  I hope having ereport() doesn't cause a conflict with that ...


Currently ereport does not exists client-side. I do not think that this 
patch is the right moment to decide to do that. Also, there are some 
"elog" in libpq, but they are out with a "#ifndef FRONTEND".



BTW I think abort() is not the right thing, as it'll cause core dumps if
enabled.  Why not just exit(1)?


Yes, I agree and already reported that.

Conclusion:

My current opinion is that I'm pretty against bringing "ereport" to the 
front-end on this specific pgbench patch. I agree with you that "elog" 
would be misleading there as well, for the arguments you developed above.


I'd suggest to have just one clean and simple pgbench internal function to 
handle errors and possibly exit, debug... Something like


  void pgb_error(FATAL, "error %d raised", 12);

Implemented as

  void pgb_error(int/enum XXX level, const char * format, ...)
  {
 test level and maybe return immediately (eg debug);
 print to stderr;
 exit/abort/return depending;
  }
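
A self-contained sketch of that function (the level enum and the threshold
variable are assumptions, following the pseudocode above):

	#include <stdarg.h>
	#include <stdio.h>
	#include <stdlib.h>

	typedef enum { PGB_DEBUG, PGB_LOG, PGB_ERROR, PGB_FATAL } PgbLevel;

	static PgbLevel min_level = PGB_LOG;	/* raised/lowered by -d etc. */

	static void
	pgb_error(PgbLevel level, const char *format, ...)
	{
		va_list		args;

		/* test level and maybe return immediately (eg debug) */
		if (level < min_level)
			return;

		/* print to stderr */
		va_start(args, format);
		vfprintf(stderr, format, args);
		va_end(args);

		/* exit depending on severity; exit(1) rather than abort() */
		if (level >= PGB_FATAL)
			exit(1);
	}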

Then if some advanced error handling is introduced for front-end programs, 
possibly through some macros, then it would be time to improve upon that.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-14 Thread Marina Polyakova

On 13-06-2018 22:44, Fabien COELHO wrote:

Hello Marina,

I suppose that this is related; because of my patch there may be a lot 
of such code (see v7 in [1]):


-   fprintf(stderr,
-   "malformed variable \"%s\" value: \"%s\"\n",
-   var->name, var->svalue);
+   if (debug_level >= DEBUG_FAILS)
+   {
+   fprintf(stderr,
+   "malformed variable \"%s\" value: \"%s\"\n",
+   var->name, var->svalue);
+   }

-   if (debug)
+   if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);


I'm not sure that debug messages need to be kept after debug, if it
is about debugging pgbench itself. That is debatable.


AFAICS it is not about debugging pgbench itself, but about more detailed 
information that can be used to understand what exactly happened during 
its run. In the case of errors this helps to distinguish between 
failures and errors by type (including which limit for retries was 
violated and how far it was exceeded for the serialization/deadlock 
errors).



The code adapts/duplicates existing server-side "ereport" stuff and
brings it to the frontend, where the logging needs are somehow quite
different.

I'd prefer to avoid duplication and/or have some code sharing.


I was recommended to use the same interface in [3]:

On elog/errstart: we already have a convention for what ereport() 
calls look like; I suggest to use that instead of inventing your 
own.


The "elog" interface already exists, it is not an invention. "ereport"
is a hack which is somehow necessary in some cases. I prefer a simple
function call if possible for the purpose, and ISTM that this is the
case.



That is a lot of complication which is justified server-side,
where logging requirements are special, but in this case I see it as
overkill.


I think we need ereport() if we want to make detailed error messages 
(see examples in [1])..


If it really needs to be duplicated, I'd suggest to put all this 
stuff in separate files. If we want to do that, I think that it 
would belong to fe_utils, where it could/should be used by all 
front-end programs.


I'll try to do it..


Dunno. If you only need one "elog" function which prints a message to
stderr and decides whether to abort/exit/whatever, maybe it can just
be kept in pgbench. If there are several complicated functions and
macros, better with a file. So I'd say it depends.



So my current view is that if you only need an "elog" function, it is
simpler to add it to "pgbench.c".


Thank you!


For logging purposes, ISTM that the "elog" macro interface is nicer,
closer to the existing "fprintf(stderr", as it would not introduce the
additional parentheses hack for "rest".


I was also recommended to use ereport() instead of elog() in [3]:


Probably. Are you hoping that advice from different reviewers should
be consistent? That seems optimistic:-)


To make the patch committable there should be no objection to it..

[1] 
https://www.postgresql.org/message-id/c89fcc380a19380260b5ea463efc1416%40postgrespro.ru


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-14 Thread Marina Polyakova

On 13-06-2018 22:59, Alvaro Herrera wrote:

For context: in the backend, elog() is only used for internal messages
(i.e. "can't-happen" conditions), and ereport() is used for user-facing
messages.  There are many things ereport() has that elog() doesn't, 
such
as additional message fields (HINT, DETAIL, etc) that I think could 
have
some use in pgbench as well.  If you use elog() then you can't have 
that.


AFAIU it was not originally intended that the pgbench error messages 
would have these fields, so would it be good to change the final output 
to stderr?.. For example:


-   fprintf(stderr, "%s", PQerrorMessage(con));
-   fprintf(stderr, "(ignoring this error and continuing anyway)\n");
+   ereport(LOG,
+           (errmsg("Ignoring the server error and continuing anyway"),
+            errdetail("%s", PQerrorMessage(con))));

-   fprintf(stderr, "%s", PQerrorMessage(con));
-   if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
-   {
-       fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n", PQdb(con));
-   }
-
-   exit(1);
+   ereport(ERROR,
+           (errmsg("Server error"),
+            errdetail("%s", PQerrorMessage(con)),
+            sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0 ?
+            errhint("Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
+                    PQdb(con)) : 0));

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-13 Thread Alvaro Herrera
On 2018-Jun-13, Fabien COELHO wrote:

> > > > With that, is there a need for elog()?  In the backend we have
> > > > it because $HISTORY but there's no need for that here -- I
> > > > propose to lose elog() and use only ereport everywhere.
> 
> See commit 8a07ebb3c172 which turns some ereport into elog...

For context: in the backend, elog() is only used for internal messages
(i.e. "can't-happen" conditions), and ereport() is used for user-facing
messages.  There are many things ereport() has that elog() doesn't, such
as additional message fields (HINT, DETAIL, etc) that I think could have
some use in pgbench as well.  If you use elog() then you can't have that.

Another difference is that in the backend, elog() messages are never
translated, while ereport() message are translated.  Since pgbench is
translatable I think it would be best to keep those things in sync, to
avoid confusion. (Although of course you could do it differently in
pgbench than backend.)
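
For illustration, the two conventions look roughly like this in backend 
code (a sketch only; "relname" and "state" are placeholders, and the extra 
parentheses around the auxiliary calls are part of the convention):

  /* user-facing, translated, with auxiliary fields: */
  ereport(ERROR,
          (errcode(ERRCODE_UNDEFINED_TABLE),
           errmsg("relation \"%s\" does not exist", relname),
           errhint("Perhaps you need to do initialization (\"pgbench -i\").")));

  /* internal "can't happen" condition, never translated: */
  elog(ERROR, "unexpected state: %d", state);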

One thing that just came to mind is that pgbench uses some src/fe_utils
stuff.  I hope having ereport() doesn't cause a conflict with that ...

BTW I think abort() is not the right thing, as it'll cause core dumps if
enabled.  Why not just exit(1)?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-13 Thread Fabien COELHO


Hello Marina,

I suppose that this is related; because of my patch there may be a lot of 
such code (see v7 in [1]):


-   fprintf(stderr,
-           "malformed variable \"%s\" value: \"%s\"\n",
-           var->name, var->svalue);
+   if (debug_level >= DEBUG_FAILS)
+   {
+       fprintf(stderr,
+               "malformed variable \"%s\" value: \"%s\"\n",
+               var->name, var->svalue);
+   }

-   if (debug)
+   if (debug_level >= DEBUG_ALL)
        fprintf(stderr, "client %d sending %s\n", st->id, sql);


I'm not sure that debug messages need to be kept after debugging, if it is 
about debugging pgbench itself. That is debatable.


That's why it was suggested to make an error function which hides all these 
things (see [2]):


There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with 
corresponding fprintf(stderr..) I think it's time to do it like in the 
main code, wrap with some function like log(level, msg).


Yep. I did not write that, but I agree with an "elog" suggestion to switch

  if (...) { fprintf(...); exit/abort/continue/... }

to a simpler:

  elog(level, ...)

Moreover, it changes pgbench's current behavior, which might be 
admissible, but should be discussed clearly.



The semantics of the existing code is changed: the FATAL level calls
abort() and replaces existing exit(1) calls. Maybe you want an ERROR
level as well.


Oh, thanks, I agree with you. And I do not want to change the program exit 
code without good reasons, but I'm sorry I may not know all pros and cons in 
this matter..


Or did you also mean other changes?


AFAICR I meant switching exit to abort in some cases.


The code adapts/duplicates existing server-side "ereport" stuff and
brings it to the frontend, where the logging needs are somehow quite
different.

I'd prefer to avoid duplication and/or have some code sharing.


I was recommended to use the same interface in [3]:

On elog/errstart: we already have a convention for what ereport() 
calls look like; I suggest to use that instead of inventing your own.


The "elog" interface already exists, it is not an invention. "ereport" is 
a hack which is somehow necessary in some cases. I prefer a simple 
function call if possible for the purpose, and ISTM that this is the case.


If it really needs to be duplicated, I'd suggest to put all this stuff 
in separate files. If we want to do that, I think that it would belong 
to fe_utils, where it could/should be used by all front-end 
programs.


I'll try to do it..


Dunno. If you only need one "elog" function which prints a message to 
stderr and decides whether to abort/exit/whatever, maybe it can just be 
kept in pgbench. If there are several complicated functions and 
macros, better with a file. So I'd say it depends.



For logging purposes, ISTM that the "elog" macro interface is nicer,
closer to the existing "fprintf(stderr", as it would not introduce the
additional parentheses hack for "rest".


I was also recommended to use ereport() instead of elog() in [3]:


Probably. Are you hoping that advice from different reviewers should be 
consistent? That seems optimistic:-)


With that, is there a need for elog()?  In the backend we have it 
because $HISTORY but there's no need for that here -- I propose to 
lose elog() and use only ereport everywhere.


See commit 8a07ebb3c172 which turns some ereport into elog...


My 0.02€: maybe you just want to turn

  fprintf(stderr, format, ...);
  // then possibly exit or abort depending...

into

  elog(level, format, ...);

which maybe would exit or abort depending on level, and possibly not
actually report under some levels and/or some conditions. For that, it
could be enough to just provide a nice "elog" function.


I agree that elog() can be coded in this way. To use ereport() I need a 
structure to store the error level as a condition to exit.


Yep. That is a lot of complication which is justified server-side, where 
logging requirements are special, but in this case I see it as overkill.


So my current view is that if you only need an "elog" function, it is 
simpler to add it to "pgbench.c".


--
Fabien.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-13 Thread Marina Polyakova

On 10-06-2018 10:38, Fabien COELHO wrote:

Hello Marina,


Hello!


v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch
- a patch for the ereport() macro (this is used to report client 
failures that do not cause an abort and this depends on the level of 
debugging).


ISTM that abort() is called under FATAL.


If you mean the abort of the client, this is not an abort of the main 
program.


- implementation: if possible, use the local ErrorData structure 
during the errstart()/errmsg()/errfinish() calls. Otherwise use a 
static variable protected by a mutex if necessary. To do all of this 
export the function appendPQExpBufferVA from libpq.


This patch applies cleanly on top of the other ones (there are minimal
interactions), compiles cleanly, global & pgbench "make check" are ok.


:-)


IMO this patch is more controversial than the other ones.

It is not really related to the aim of the patch series, which could
do without, couldn't it?



I'd suggest that it should be an independent submission, unrelated to
the pgbench error management patch.


I suppose that this is related; because of my patch there may be a lot 
of such code (see v7 in [1]):


-   fprintf(stderr,
-           "malformed variable \"%s\" value: \"%s\"\n",
-           var->name, var->svalue);
+   if (debug_level >= DEBUG_FAILS)
+   {
+       fprintf(stderr,
+               "malformed variable \"%s\" value: \"%s\"\n",
+               var->name, var->svalue);
+   }

-   if (debug)
+   if (debug_level >= DEBUG_ALL)
        fprintf(stderr, "client %d sending %s\n", st->id, sql);

That's why it was suggested to make an error function which hides all 
these things (see [2]):


There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with 
corresponding fprintf(stderr..) I think it's time to do it like in the 
main code, wrap with some function like log(level, msg).



Moreover, it changes pgbench's current
behavior, which might be admissible, but should be discussed clearly.



The semantics of the existing code is changed: the FATAL level calls
abort() and replaces existing exit(1) calls. Maybe you want an ERROR
level as well.


Oh, thanks, I agree with you. And I do not want to change the program 
exit code without good reasons, but I'm sorry I may not know all pros 
and cons in this matter..


Or did you also mean other changes?


The code adapts/duplicates existing server-side "ereport" stuff and
brings it to the frontend, where the logging needs are somehow quite
different.

I'd prefer to avoid duplication and/or have some code sharing.


I was recommended to use the same interface in [3]:

On elog/errstart: we already have a convention for what ereport() calls 
look like; I suggest to use that instead of inventing your own.



If it
really needs to be duplicated, I'd suggest to put all this stuff in
separate files. If we want to do that, I think that it would belong
to fe_utils, where it could/should be used by all front-end
programs.


I'll try to do it..


I do not understand why names are changed, eg ELEVEL_FATAL instead of
FATAL. ISTM that part of the point of the move would be to be
homogeneous, which suggests that the same names should be reused.


Ok!


For logging purposes, ISTM that the "elog" macro interface is nicer,
closer to the existing "fprintf(stderr", as it would not introduce the
additional parentheses hack for "rest".


I was also recommended to use ereport() instead of elog() in [3]:

With that, is there a need for elog()?  In the backend we have it 
because $HISTORY but there's no need for that here -- I propose to lose 
elog() and use only ereport everywhere.



I see no actual value in creating on the fly a dynamic buffer through
plenty of macros and functions, as the end result is just to print the
message out to stderr in the end.

  errfinishImpl: fprintf(stderr, "%s", error->message.data);

This looks like overkill. From reading the code, this does not look
like an improvement:

  fprintf(stderr, "invalid socket: %s", PQerrorMessage(st->con));

vs

  ereport(ELEVEL_LOG, (errmsg("invalid socket: %s", PQerrorMessage(st->con))));


The whole complexity of the server-side interface only makes sense
because of the TRY/CATCH stuff and complex logging requirements (eg several
outputs) in the backend. The patch adds quite some code and complexity
without clear added value that I can see.



My 0.02€: maybe you just want to turn

  fprintf(stderr, format, ...);
  // then possibly exit or abort depending...

into

  elog(level, format, ...);

which maybe would exit or abort depending on level, and possibly not
actually report under some levels and/or some conditions. For that, it
could be enough to just provide a nice "elog" function.


I 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-10 Thread Fabien COELHO


Hello Marina,


v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch
- a patch for the ereport() macro (this is used to report client failures 
that do not cause an abort and this depends on the level of debugging).


ISTM that abort() is called under FATAL.

- implementation: if possible, use the local ErrorData structure during the 
errstart()/errmsg()/errfinish() calls. Otherwise use a static variable 
protected by a mutex if necessary. To do all of this export the function 
appendPQExpBufferVA from libpq.


This patch applies cleanly on top of the other ones (there are minimal 
interactions), compiles cleanly, global & pgbench "make check" are ok.


IMO this patch is more controversial than the other ones.

It is not really related to the aim of the patch series, which could do 
without, couldn't it? Moreover, it changes pgbench's current behavior, which 
might be admissible, but should be discussed clearly.


I'd suggest that it should be an independent submission, unrelated to the 
pgbench error management patch.


The code adapts/duplicates existing server-side "ereport" stuff and brings 
it to the frontend, where the logging needs are somehow quite different.


I'd prefer to avoid duplication and/or have some code sharing. If it 
really needs to be duplicated, I'd suggest to put all this stuff in 
separate files. If we want to do that, I think that it would belong to 
fe_utils, where it could/should be used by all front-end programs.


I do not understand why names are changed, eg ELEVEL_FATAL instead of 
FATAL. ISTM that part of the point of the move would be to be homogeneous, 
which suggests that the same names should be reused.


For logging purposes, ISTM that the "elog" macro interface is nicer, 
closer to the existing "fprintf(stderr", as it would not introduce the 
additional parentheses hack for "rest".


I see no actual value in creating on the fly a dynamic buffer through 
plenty of macros and functions, as the end result is just to print the message 
out to stderr in the end.


  errfinishImpl: fprintf(stderr, "%s", error->message.data);

This looks like overkill. From reading the code, this does not look
like an improvement:

  fprintf(stderr, "invalid socket: %s", PQerrorMessage(st->con));

vs

  ereport(ELEVEL_LOG, (errmsg("invalid socket: %s", PQerrorMessage(st->con))));

The whole complexity of the server-side interface only makes sense because 
of the TRY/CATCH stuff and complex logging requirements (eg several outputs) in 
the backend. The patch adds quite some code and complexity without clear 
added value that I can see.


The semantics of the existing code is changed: the FATAL level calls 
abort() and replaces existing exit(1) calls. Maybe you want an ERROR level 
as well.


My 0.02€: maybe you just want to turn

  fprintf(stderr, format, ...);
  // then possibly exit or abort depending...

into

  elog(level, format, ...);

which maybe would exit or abort depending on level, and possibly not 
actually report under some levels and/or some conditions. For that, it 
could be enough to just provide a nice "elog" function.


In conclusion, which you can disagree with because maybe I have missed 
something... anyway I currently think that:


 - it should be an independent submission

 - possibly at "fe_utils" level

 - possibly just a nice "elog" function is enough, if so just do that.

--
Fabien.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-09 Thread Marina Polyakova

On 09-06-2018 16:31, Fabien COELHO wrote:

Hello Marina,


Hello!


v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).


About this second patch:

This extracts the variable holding structure, so that it is somehow
easier to reset them to their initial state on transaction failures,
the management of which is the ultimate aim of this patch series.

It is also cleaner this way.

Patch applies cleanly on top of the previous one (there is no real
interactions with it). It compiles cleanly. Global & pgbench "make
check" are both ok.


:-)


The structure typedef does not need a name. "typedef struct { } V...".


Ok!


I tend to disagree with naming things after their type, eg "array".
I'd suggest "vars" instead. "nvariables" could be "nvars" for
consistency with that and "vars_sorted", and because
"foo.variables->nvariables" starts looking heavy.

I'd suggest to put the "Variables" type declaration just after the "Variable"
type declaration in the file.


Thank you, I agree and I'll fix all this.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-09 Thread Marina Polyakova

On 09-06-2018 9:55, Fabien COELHO wrote:

Hello Marina,


Hello!


v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a 
client's random seed during the repeating of transactions after 
serialization/deadlock failures).


A few comments about this first patch.


Thank you very much!


Patch applies cleanly, compiles, global & pgbench "make check" ok.

I'm mostly ok with the changes, which cleanly separate the different
use of random between threads (script choice, throttle delay,
sampling...) and client (random*() calls).


Glad to hear it :)


This change is necessary so that a client can restart a transaction
deterministically (at the client level at least), which is the
ultimate aim of the patch series.

A few remarks:

The RandomState struct is 6 bytes, which will induce some padding when
used. This is life and pre-existing. No problem.

ISTM that the struct itself does not need a name, ie. "typedef struct
{ ... } RandomState" is enough.


Ok!


There could be clear comments, say in the TState and CState structs,
about what randomness is impacted (i.e. script choices, etc.).


Thank you, I'll add them.


getZipfianRand, computeHarmonicZipfian: The "thread" parameter was
justified because it was used for two fields. As the random state is
separated, I'd suggest that the other argument should be a zipfcache
pointer.


I agree with you and I will change it.


While reading your patch, it occurs to me that a run is not
deterministic at the thread level under throttling and sampling,
because the random state is solicited differently depending on when a
transaction ends. This suggests that maybe each use of the thread's
random_state should have its own random state.


Thank you, I'll fix this.


In passing, and totally unrelated to this patch:

I've always been a little puzzled about why a quite small 48-bit
internal state random generator is used. I understand the need for pg
to have a portable & state-controlled thread-safe random generator,
but why this particular small one was chosen escapes me. The source code
(src/port/erand48.c, copyright in 1993...) looks optimized for 16-bit
architectures, which is probably pretty inefficient to run on 64-bit
architectures. Maybe this could be updated with something more
consistent with today's processors, providing more quality at a lower
cost.


This sounds interesting, thanks!
*went to look for a multiplier and a summand that are large enough and 
are mutually prime..*


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-09 Thread Fabien COELHO



Hello Marina,


v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client variables 
during the repeating of transactions after serialization/deadlock failures).


About this second patch:

This extracts the variable holding structure, so that it is somehow easier 
to reset them to their initial state on transaction failures, the 
management of which is the ultimate aim of this patch series.


It is also cleaner this way.

Patch applies cleanly on top of the previous one (there is no real 
interactions with it). It compiles cleanly. Global & pgbench "make check" 
are both ok.


The structure typedef does not need a name. "typedef struct { } V...".

I tend to disagree with naming things after their type, eg "array". I'd 
suggest "vars" instead. "nvariables" could be "nvars" for consistency with 
that and "vars_sorted", and because "foo.variables->nvariables" starts 
looking heavy.


I'd suggest to put the "Variables" type declaration just after the "Variable" type 
declaration in the file.
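
Something along these lines, as a sketch only (assuming the existing 
Variable type; this is not the actual patch):

  typedef struct
  {
      Variable   *vars;        /* array of variable definitions */
      int         nvars;       /* number of variables in the array */
      bool        vars_sorted; /* is the array sorted by name? */
  } Variables;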


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-06-09 Thread Fabien COELHO



Hello Marina,


v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a client's 
random seed during the repeating of transactions after serialization/deadlock 
failures).


A few comments about this first patch.

Patch applies cleanly, compiles, global & pgbench "make check" ok.

I'm mostly ok with the changes, which cleanly separate the different use 
of random between threads (script choice, throttle delay, sampling...) and 
client (random*() calls).


This change is necessary so that a client can restart a transaction 
deterministically (at the client level at least), which is the ultimate 
aim of the patch series.


A few remarks:

The RandomState struct is 6 bytes, which will induce some padding when 
used. This is life and pre-existing. No problem.
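
Presumably the struct boils down to the erand48-style 48-bit seed, 
something like this (a sketch, hence the 6 bytes):

  typedef struct
  {
      unsigned short xseed[3];  /* 3 x 16 bits = 48-bit generator state */
  } RandomState;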


ISTM that the struct itself does not need a name, ie. "typedef struct { 
... } RandomState" is enough.


There could be clear comments, say in the TState and CState structs, about 
what randomness is impacted (i.e. script choices, etc.).


getZipfianRand, computeHarmonicZipfian: The "thread" parameter was 
justified because it was used for two fields. As the random state is 
separated, I'd suggest that the other argument should be a zipfcache 
pointer.


While reading your patch, it occurs to me that a run is not deterministic 
at the thread level under throttling and sampling, because the random 
state is solicited differently depending on when a transaction ends. This 
suggests that maybe each use of the thread's random_state should have its own 
random state.


In passing, and totally unrelated to this patch:

I've always been a little puzzled about why a quite small 48-bit internal 
state random generator is used. I understand the need for pg to have a 
portable & state-controlled thread-safe random generator, but why this 
particular small one was chosen escapes me. The source code (src/port/erand48.c, 
copyright in 1993...) looks optimized for 16-bit architectures, which is 
probably pretty inefficient to run on 64-bit architectures. Maybe this 
could be updated with something more consistent with today's processors, 
providing more quality at a lower cost.
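
For context, the rand48 family is a plain 48-bit linear congruential 
generator; one step with the standard constants is just the following 
(a sketch: src/port/erand48.c computes the same thing with three 16-bit 
limbs, which is what makes it look tuned for 16-bit machines):

  #include <stdint.h>

  /* one step of the classic rand48 LCG: x' = (a*x + c) mod 2^48 */
  static uint64_t
  rand48_step(uint64_t x)
  {
      return (UINT64_C(0x5DEECE66D) * x + UINT64_C(0xB))
             & ((UINT64_C(1) << 48) - 1);
  }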


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-05-08 Thread Alvaro Herrera
Hello,

Fabien COELHO wrote:

> > Looking over the diff, I find that this patch tries to do too much and
> > needs to be split up.
> 
> Yep, I agree that it would help the reviewing process. On the other hand I
> have bad memories about maintaining dependent patches which interfere
> significantly.

Sure.  I suggest not posting these patches separately -- instead, post
as a series of commits in a single email, attaching files from "git
format-patch".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-05-08 Thread Fabien COELHO


Hello Alvaro,


I think that I'll have time for a round of review in the first half of July.
Providing a rebased patch before then would be nice.



Note that even in the absence of a rebased patch, you can apply to an
older checkout if you have some limited window of time for a review.


Yes, sure. I'd like to bring this feature to a committable state, so it will 
have to be rebased at some point anyway.



Looking over the diff, I find that this patch tries to do too much and
needs to be split up.


Yep, I agree that it would help the reviewing process. On the other hand I 
have bad memories about maintaining dependent patches which interfere 
significantly. Maybe that will not be the case with this feature.


Thanks for the advice.

--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-05-08 Thread Alvaro Herrera
Fabien COELHO wrote:

> I think that I'll have time for a round of review in the first half of July.
> Providing a rebased patch before then would be nice.

Note that even in the absence of a rebased patch, you can apply to an
older checkout if you have some limited window of time for a review.

Looking over the diff, I find that this patch tries to do too much and
needs to be split up.  At a minimum there is a preliminary patch that
introduces the error reporting stuff (errstart etc); there are other
thread-related changes (for example to the random generation functions)
that probably belong in a separate one too.  Not sure if there are other
smaller patches hidden inside the rest.

On elog/errstart: we already have a convention for what ereport() calls
look like; I suggest to use that instead of inventing your own.  With
that, is there a need for elog()?  In the backend we have it because
$HISTORY but there's no need for that here -- I propose to lose elog()
and use only ereport everywhere.  Also, I don't see that you need
errmsg_internal() at all; let's lose it too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-05-08 Thread Fabien COELHO


Hello Marina,

FYI the v8 patch does not apply anymore, mostly because of a recent perl 
reindentation.


I think that I'll have time for a round of review in the first half of 
July. Providing a rebased patch before then would be nice.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-04-05 Thread Ildus Kurbangaliev
On Wed, 04 Apr 2018 16:07:25 +0300
Marina Polyakova  wrote:

> Hello, hackers!
> 
> Here there's a seventh version of the patch for error handling and 
> retrying of transactions with serialization/deadlock failures in
> pgbench (based on the commit
> a08dc711952081d63577fc182fcf955958f70add). I added the option
> --max-tries-time which is an implementation of Fabien Coelho's
> proposal in [1]: the transaction with serialization or deadlock
> failure can be retried if the total time of all its tries is less
> than this limit (in ms). This option can be combined with the option
> --max-tries. But if none of them are used, failed transactions are
> not retried at all.
> 
> Also:
> * Now when the first failure occurs in the transaction it is always 
> reported as a failure since only after the remaining commands of this 
> transaction are executed we find out whether we can try again or not. 
> Therefore add the messages about retrying or ending the failed 
> transaction to the "fails" debugging level so you can distinguish 
> failures (which are retried) and errors (which are not retried).
> * Fix a report on the latency average because the total time includes 
> time for both errors and successful transactions.
> * Code cleanup (including tests).
> 
> [1] 
> https://www.postgresql.org/message-id/alpine.DEB.2.20.1803292134380.16472%40lancre
> 
> > Maybe the max retry should rather be expressed in time rather than
> > number of attempts, or both approaches could be implemented?  
> 

Hi, I did a little review of your patch. It seems to work as
expected, documentation and tests are there. Still I have a few comments.

There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with a
corresponding fprintf(stderr...). I think it's time to do it like in the
main code: wrap them with some function like log(level, msg).

In the CSTATE_RETRY state, used_time is used only in printing but is
calculated more often than needed.

In my opinion Debuglevel should be renamed to DebugLevel, which looks
nicer; there is also DEBUGLEVEl (where the last letter is in lower case),
which is very confusing.

I have checked the overall functionality of this patch, but haven't checked
any special cases yet.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-29 Thread Fabien COELHO


The conception of the max-retry option seems strange to me. If the number of retries 
reaches the max-retry option, then we just increment the counter of failed 
transactions and try again (possibly, with different random numbers). At the 
end we should distinguish the number of error transactions and failed transactions; 
to find this difference the documentation suggests rerunning pgbench with 
debugging on.


Maybe I didn't catch the idea, but it seems to me max-tries should be 
removed. On a transaction serialization or deadlock error pgbench should 
increment the counter of failed transactions, reset the conditional stack, 
variables, etc. (but not the random generator) and then start a new 
transaction from the first line of the script.


ISTM that the idea is that the client application should give up 
at some point and report an error to the end user, kind of a "timeout" on 
trying, and that max-retry would implement this logic of giving up: the 
transaction which was intended, represented by a given initial random 
generator state, could not be committed even after some number of attempts.


Maybe the max retry should rather be expressed in time rather than number 
of attempts, or both approaches could be implemented? But there is a logic 
of retrying the same (try again what the client wanted) vs retrying 
something different (another client need is served).


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-29 Thread Teodor Sigaev
The conception of the max-retry option seems strange to me. If the number of retries 
reaches the max-retry option, then we just increment the counter of failed transactions 
and try again (possibly, with different random numbers). At the end we should 
distinguish the number of error transactions and failed transactions; to find this 
difference the documentation suggests rerunning pgbench with debugging on.


Maybe I didn't catch the idea, but it seems to me max-tries should be removed. 
On a transaction serialization or deadlock error pgbench should increment the counter 
of failed transactions, reset the conditional stack, variables, etc. (but not the 
random generator) and then start a new transaction from the first line of the script.


Marina Polyakova wrote:

On 26-03-2018 18:53, Fabien COELHO wrote:

Hello Marina,


Hello!


Many thanks to both of you! I'm working on a patch in this direction..


I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.


I'm not sure that we can always do this, because we can get new errors until 
we finish the failed transaction block, and we need to destroy the conditional 
stack..


Sure. I'm suggesting, so as to simplify, that on failures the retry
would always restart from the beginning of the script by resetting
everything, indeed including the conditional stack, the random
generator state, the variable values, and so on.

This means enforcing somehow that one script is one transaction.

If the user does not do that, it would be their decision and the
result becomes unpredictable on errors (eg some sub-transactions could
be executed more than once).

Then if more is needed, that could be for another patch.


Here is the fifth version of the patch for pgbench (based on the commit 
4b9094eb6e14dfdbed61278ea8e51cc846e43579) where I tried to implement these 
ideas, thanks to your comments and those of Teodor Sigaev. Since we may need to 
execute commands to complete a failed transaction block, the script is now 
always executed completely. If there is a serialization/deadlock failure which 
can be retried, the script is executed again with the same random state and 
array of variables as before its first run. Meta command errors as well as all 
SQL errors do not cause the client to abort. The first failure in the 
current script execution determines whether the script run will be retried or 
not, so only such failures (they have a retry) or errors (they are not retried) 
are reported.


I tried to make fixes in accordance with your previous reviews ([1], [2], [3]):


I'm unclear about the example added in the documentation. There
are 71% errors, but 100% of transactions are reported as processed. If
there were errors, then it is not a success, so the transactions were
not processed? To me it looks inconsistent. Also, while testing, it seems
that failed transactions are counted in tps, which I think is not
appropriate:


About the feature:

 sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
   ./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
 starting vacuum...end.
 progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed
 # NOT 10845.8 TPS...
 progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
 progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
 ...
 number of transactions actually processed: 32028 # NO!
 number of errors: 30969 (96.694 %)
 latency average = 2.833 ms
 latency stddev = 1.508 ms
 tps = 10666.720870 (including connections establishing) # NO
 tps = 10683.034369 (excluding connections establishing) # NO
 ...

For me this is all wrong. I think that the tps report is about
transactions that succeeded, not mere attempts. I cannot say that a
transaction which aborted was "actually processed"... as it was not.


Fixed


The order of reported elements is not logical:

 maximum number of transaction tries: 100
 scaling factor: 10
 query mode: prepared
 number of clients: 4
 number of threads: 2
 duration: 3 s
 number of transactions actually processed: 967
 number of errors: 152 (15.719 %)
 latency average = 9.630 ms
 latency stddev = 13.366 ms
 number of transactions retried: 623 (64.426 %)
 number of retries: 32272

I would suggest to group everything about error handling in one block,
eg something like:

 scaling factor: 10
 query mode: prepared
 number of clients: 4
 number of threads: 2
 duration: 3 s
 number of transactions actually processed: 967
 number of errors: 152 (15.719 %)
 number of transactions retried: 623 (64.426 %)
 number of retries: 32272
 maximum number of transaction tries: 100
 latency average = 9.630 ms
 latency stddev = 13.366 ms


Fixed


Also, the percent character should be stuck to its number: 15.719% to have
the style more homogeneous (although there seems to be 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-26 Thread Fabien COELHO


Hello Marina,


Many thanks to both of you! I'm working on a patch in this direction..


I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.


I'm not sure that we can always do this, because we can get new errors until 
we finish the failed transaction block, and we need to destroy the conditional 
stack..


Sure. I'm suggesting, so as to simplify, that on failures the retry would 
always restart from the beginning of the script by resetting everything, 
indeed including the conditional stack, the random generator state, the 
variable values, and so on.


This means enforcing somehow that one script is one transaction.

If the user does not do that, it would be their decision and the result 
becomes unpredictable on errors (eg some sub-transactions could be 
executed more than once).


Then if more is needed, that could be for another patch.
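
In pseudo-C, the reset-and-restart idea amounts to something like this 
(an illustrative sketch only; the field and function names are invented, 
not taken from the patch):

  #include <string.h>

  typedef struct
  {
      int            command;         /* index of the current script command */
      unsigned short random_state[3]; /* erand48-style seed */
      unsigned short retry_seed[3];   /* seed saved when the script started */
  } Client;

  static void
  restart_script(Client *st)
  {
      /* same seed => same random draws on the retry */
      memcpy(st->random_state, st->retry_seed, sizeof(st->retry_seed));
      /* back to command zero; variables and the conditional stack
         would be reset here as well */
      st->command = 0;
  }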

--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-26 Thread Marina Polyakova

On 25-03-2018 15:23, Fabien COELHO wrote:
Hm, I took a look on both threads about the patch and it seems to me now 
it's overcomplicated. With recently committed enhancements of pgbench 
(\if, \when) it becomes close to impossible to retry a transaction in 
case of failure. So, the initial approach of just rolling back such a 
transaction looks more attractive.


Yep.


Many thanks to both of you! I'm working on a patch in this direction..


I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.


I'm not sure that we can always do this, because we can get new errors 
until we finish the failed transaction block, and we need to destroy the 
conditional stack..


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-25 Thread Fabien COELHO


Hm, I took a look on both threads about the patch and it seems to me now it's 
overcomplicated. With recently committed enhancements of pgbench (\if, 
\when) it becomes close to impossible to retry a transaction in case of 
failure. So, the initial approach of just rolling back such a transaction 
looks more attractive.


Yep.

I think that the best approach for now is simply to reset (command zero, 
random generator) and start over the whole script, without attempting to 
be more intelligent. The limitations should be clearly documented (one 
transaction per script), though. That would be a significant enhancement 
already.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-23 Thread Teodor Sigaev
I suggest a patch where pgbench client sessions are not disconnected because of 
serialization or deadlock failures and these failures are mentioned in reports. 
In detail:
- a transaction with one of these failures continues to run normally, but its result 
is rolled back;

- if there were these failures during script execution this "transaction" is marked 
appropriately in logs;
- the numbers of "transactions" with these failures are printed in progress reports, in 
aggregation logs and at the end with other results (all and for each script);
Hm, I took a look on both threads about the patch and it seems to me now it's 
overcomplicated. With recently committed enhancements of pgbench (\if, \when) it 
becomes close to impossible to retry a transaction in case of failure. So, the initial 
approach of just rolling back such a transaction looks more attractive.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/