Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-19 Thread Fabien COELHO




I find it a little annoying that there is no change in tests; it
means that the format is not checked at all :-(


Yeah. Perhaps it's a little bit hard to perform this kind of test in
the TAP tests?


Not really. I'll look into it.
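
For instance, a check along these lines might do (a sketch only, using
Test::More as the existing pgbench TAP tests do; the sample line is copied
from the documentation example, and the count of 15 columns assumes the
aggregate format after the redundant "failures" column was removed):

  use strict;
  use warnings;
  use Test::More tests => 1;

  # One aggregate-log line: interval_start num_transactions sum_latency
  # sum_latency_2 min_latency max_latency sum_lag sum_lag_2 min_lag max_lag
  # skipped retried retries serialization_failures deadlock_failures --
  # 15 space-separated integers in total.
  my $log_line = "1650260552 5178 26171317 177284491527 1136 44462"
    . " 2647617 7321113867 0 9866 64 7564 28340 4148 0";
  like($log_line, qr/^\d+( \d+){14}$/,
    'aggregate log line has 15 integer columns');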

--
Fabien.




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-19 Thread Tatsuo Ishii
>>> Ok. Fine with me. Possibly at some point there was the idea that there
>>> could be other failures counted, but there are none. Also, there have
>>> been questions about the failures-detailed option, or whether the
>>> reports should always be detailed, and the result may be some kind of
>>> unconvincing compromise.
>>
>> Attached is the patch to remove the column.
> 
> Patch applies cleanly. Compilation ok. Global and local "make check"
> ok.
> Doc build ok.

Thank you for reviewing. Patch pushed.

> I find it a little annoying that there is no change in tests; it
> means that the format is not checked at all :-(

Yeah. Perhaps it's a little bit hard to perform this kind of test in
the TAP tests?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-19 Thread Fabien COELHO




Ok. Fine with me. Possibly at some point there was the idea that there
could be other failures counted, but there are none. Also, there have
been questions about the failures-detailed option, or whether the
reports should always be detailed, and the result may be some kind of
unconvincing compromise.


Attached is the patch to remove the column.


Patch applies cleanly. Compilation ok. Global and local "make check" ok.
Doc build ok.

I find it a little annoying that there is no change in tests; it means 
that the format is not checked at all :-(


--
Fabien.




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-17 Thread Tatsuo Ishii
>> One thing that this doesn't fix is that the existing text appears
>> to suggest that the "failures" column is something different from
>> the sum of the serialization_failures and deadlock_failures
>> columns, which it's obvious from the code is not so.  If this isn't
>> a code bug then I think we ought to just drop that column entirely,
>> because it's redundant.
> 
> Ok. Fine with me. Possibly at some point there was the idea that there
> could be other failures counted, but there are none. Also, there have
> been questions about the failures-detailed option, or whether the
> reports should always be detailed, and the result may be some kind of
> unconvincing compromise.

Attached is the patch to remove the column.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 387a836287..dbae4e2321 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2591,15 +2591,6 @@ END;
  
 
 
-
- failures
- 
-  
-   number of transactions that ended with a failed SQL command
-  
- 
-
-
 
  serialization_failures
  
@@ -2629,8 +2620,8 @@ END;
 
 pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test
 
-1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0
-1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
 
   
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e63cea56a1..f8bcb1ab6d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4498,7 +4498,6 @@ doLog(TState *thread, CState *st,
 			int64		skipped = 0;
 			int64		serialization_failures = 0;
 			int64		deadlock_failures = 0;
-			int64		serialization_or_deadlock_failures = 0;
 			int64		retried = 0;
 			int64		retries = 0;
 
@@ -4540,9 +4539,7 @@ doLog(TState *thread, CState *st,
 serialization_failures = agg->serialization_failures;
 deadlock_failures = agg->deadlock_failures;
 			}
-			serialization_or_deadlock_failures = serialization_failures + deadlock_failures;
-			fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
-	serialization_or_deadlock_failures,
+			fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
 	serialization_failures,
 	deadlock_failures);
 


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-10 Thread Tom Lane
Fabien COELHO  writes:
> While looking at the HTML output, the "pgbench" command line just below 
> wraps strangely:

>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
>--latency-limit=10 --failures-detailed --max-tries=10 test

> ISTM that there should be no newline in the pgbench … 
> section, although maybe it would trigger a complaint in the PDF format.

PDF wraps that text where it wants to anyway, so I removed the newline.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-10 Thread Fabien COELHO


Hello Tom,


The buildfarm is still complaining about the synopsis being too
wide for PDF format.  I think what we ought to do is give up on
using a <synopsis> for log lines at all, and instead convert the
documentation into a tabular list of fields.  Proposal attached,
which also fixes a couple of outright errors.


Looks ok. Html doc generation ok.

While looking at the HTML output, the "pgbench" command line just below 
wraps strangely:


  pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
  --latency-limit=10 --failures-detailed --max-tries=10 test

ISTM that there should be no newline in the pgbench … 
section, although maybe it would trigger a complaint in the PDF format.



One thing that this doesn't fix is that the existing text appears
to suggest that the "failures" column is something different from
the sum of the serialization_failures and deadlock_failures
columns, which it's obvious from the code is not so.  If this isn't
a code bug then I think we ought to just drop that column entirely,
because it's redundant.


Ok. Fine with me. Possibly at some point there was the idea that there 
could be other failures counted, but there are none. Also, there have been 
questions about the failures-detailed option, or whether the reports 
should always be detailed, and the result may be some kind of 
unconvincing compromise.



(BTW, now that I've read this stuff I am quite horrified by how
the non-aggregated log format has been mangled for error retries,
and will probably be submitting a proposal to change that.
But that's a different issue.)


Indeed, any improvement is welcome!

--
Fabien.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-09 Thread Tatsuo Ishii
> The buildfarm is still complaining about the synopsis being too
> wide for PDF format.  I think what we ought to do is give up on
> using a <synopsis> for log lines at all, and instead convert the
> documentation into a tabular list of fields.  Proposal attached,
> which also fixes a couple of outright errors.

I once thought about that too. Looks good to me.

> One thing that this doesn't fix is that the existing text appears
> to suggest that the "failures" column is something different from
> the sum of the serialization_failures and deadlock_failures
> columns, which it's obvious from the code is not so.  If this isn't
> a code bug then I think we ought to just drop that column entirely,
> because it's redundant.

+1.

> (BTW, now that I've read this stuff I am quite horrified by how
> the non-aggregated log format has been mangled for error retries,
> and will probably be submitting a proposal to change that.
> But that's a different issue.)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-09 Thread Tom Lane
Tatsuo Ishii  writes:
> Patch pushed. Thanks.

The buildfarm is still complaining about the synopsis being too
wide for PDF format.  I think what we ought to do is give up on
using a <synopsis> for log lines at all, and instead convert the
documentation into a tabular list of fields.  Proposal attached,
which also fixes a couple of outright errors.

One thing that this doesn't fix is that the existing text appears
to suggest that the "failures" column is something different from
the sum of the serialization_failures and deadlock_failures
columns, which it's obvious from the code is not so.  If this isn't
a code bug then I think we ought to just drop that column entirely,
because it's redundant.

(BTW, now that I've read this stuff I am quite horrified by how
the non-aggregated log format has been mangled for error retries,
and will probably be submitting a proposal to change that.
But that's a different issue.)

regards, tom lane

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9ba26e5e86..d12cbaa8ab 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2289,33 +2289,95 @@ END;
   
 
   
-   The format of the log is:
-
-
-client_id transaction_no time script_no time_epoch time_us  schedule_lag   retries 
-
-
-   where
-   client_id indicates which client session ran the transaction,
-   transaction_no counts how many transactions have been
-   run by that session,
-   time is the total elapsed transaction time in microseconds,
-   script_no identifies which script file was used (useful when
-   multiple scripts were specified with -f or -b),
-   and time_epoch/time_us are a
-   Unix-epoch time stamp and an offset
-   in microseconds (suitable for creating an ISO 8601
-   time stamp with fractional seconds) showing when
-   the transaction completed.
-   The schedule_lag field is the difference between the
-   transaction's scheduled start time, and the time it actually started, in
-   microseconds. It is only present when the --rate option is used.
+   Each line in a log file describes one transaction.
+   It contains the following space-separated fields:
+
+   
+
+ client_id
+ 
+  
+   identifies the client session that ran the transaction
+  
+ 
+
+
+
+ transaction_no
+ 
+  
+   counts how many transactions have been run by that session
+  
+ 
+
+
+
+ time
+ 
+  
+   transaction's elapsed time, in microseconds
+  
+ 
+
+
+
+ script_no
+ 
+  
+   identifies the script file that was used for the transaction
+   (useful when multiple scripts are specified
+   with -f or -b)
+  
+ 
+
+
+
+ time_epoch
+ 
+  
+   transaction's completion time, as a Unix-epoch time stamp
+  
+ 
+
+
+
+ time_us
+ 
+  
+   fractional-second part of transaction's completion time, in
+   microseconds
+  
+ 
+
+
+
+ schedule_lag
+ 
+  
+   difference between the transaction's scheduled start time and the
+   time it actually started, in microseconds
+   (present only if --rate is specified)
+  
+ 
+
+
+
+ retries
+ 
+  
+   count of retries after serialization or deadlock errors during the
+   transaction
+   (present only if --max-tries is not equal to one)
+  
+ 
+
+   
+  
+
+  
When both --rate and --latency-limit are used,
the time for a skipped transaction will be reported as
skipped.
-   retries is the sum of all retries after the
-   serialization or deadlock errors during the current script execution. It is
-   present only if the --max-tries option is not equal to 1.
If the transaction ends with a failure, its time
will be reported as failed. If you use the
--failures-detailed option, the
@@ -2398,66 +2460,171 @@ END;
 
   
With the --aggregate-interval option, a different
-   format is used for the log files:
-
-
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
-sum_lag sum_lag_2 min_lag max_lag skipped
-retried retries failures serialization_failures deadlock_failures
-
-
-   where
-   interval_start is the start of the interval (as a Unix
-   epoch time stamp),
-   num_transactions is the number of transactions
-   within the interval,
-   sum_latency is the sum of the transaction
-   latencies within the interval,
-   sum_latency_2 is the sum of squares of the
-   transaction latencies within the interval,
-   min_latency is the minimum latency within the interval,
-   and
-   max_latency is the maximum latency within the interval,
-   failures is the number of transactions that ended
-   with a failed SQL command within the interval.
-  
-  
-   The next fields,
-   sum_lag, sum_lag_2, min_lag,
-   and max_lag, only meaningful if the --rate
-   option is used. Otherwise, they are all 0.0.
-   They 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-05 Thread Tatsuo Ishii
>> I would suggest reordering the last chunk to:
>> 
>>... retried retries failures serfail dlfail
>> 
>> because I intend to add connection failure handling at some point,
>> and it would make more sense to add the corresponding count at the end
>> with the other failures.
> 
> Ok, I have adjusted the patch. V2 patch attached.

Patch pushed. Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-04 Thread Tatsuo Ishii
Hi Fabien,

> Hello Tatsuo-san,
> 
>> interval_start num_transactions sum_latency sum_latency_2 min_latency
>> max_latency
>> sum_lag sum_lag_2 min_lag max_lag skipped
>> failures serialization_failures deadlock_failures retried retries
> 
> I would suggest reordering the last chunk to:
> 
>... retried retries failures serfail dlfail
> 
> because I intend to add connection failure handling at some point,
> and it would make more sense to add the corresponding count at the end
> with the other failures.

Ok, I have adjusted the patch. V2 patch attached.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..d1818ff316 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
format is used for the log files:
 
 
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
+interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
+sum_lag sum_lag_2 min_lag max_lag skipped
+retried retries failures serialization_failures deadlock_failures
 
 
where
@@ -2417,41 +2419,55 @@ END;
and
max_latency is the maximum latency within the interval,
failures is the number of transactions that ended
-   with a failed SQL command within the interval. If you use option
-   --failures-detailed, instead of the sum of all failed
-   transactions you will get more detailed statistics for the failed
-   transactions grouped by the following types:
-   serialization_failures is the number of
-   transactions that got a serialization error and were not retried after this,
-   deadlock_failures is the number of transactions
-   that got a deadlock error and were not retried after this.
+   with a failed SQL command within the interval.
+  
+  
The next fields,
sum_lag, sum_lag_2, min_lag,
-   and max_lag, are only present if the --rate
-   option is used.
+   and max_lag, only meaningful if the --rate
+   option is used. Otherwise, they are all 0.0.
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
The next field, skipped,
-   is only present if the --latency-limit option is used, too.
+   is only meaningful if the --latency-limit option is used, too. Otherwise it is 0.
It counts the number of transactions skipped because they would have
started too late.
-   The retried and retries
-   fields are present only if the --max-tries option is not
-   equal to 1. They report the number of retried transactions and the sum of all
-   retries after serialization or deadlock errors within the interval.
-   Each transaction is counted in the interval when it was committed.
+  
+  
+   The retried
+   and retries fields are only meaningful if
+   the --max-tries option is not equal to 1. Otherwise they
+   are 0. They report the number of retried transactions and the sum of all
+   retries after serialization or deadlock errors within the interval.  Each
+   transaction is counted in the interval when it was committed.
+  
+  
+   failures is the sum of all failed transactions.
+   If --failures-detailed is specified, instead of the sum of
+   all failed transactions you will get more detailed statistics for the
+   failed transactions grouped by the following types:
+   serialization_failures is the number of
+   transactions that got a serialization error and were not retried after this,
+   deadlock_failures is the number of transactions
+   that got a deadlock error and were not retried after this.
+   If --failures-detailed is not
+   specified, serialization_failures
+   and deadlock_failures are always 0.
   
 
   
-   Here is some example output:
+   Here is some example output with following options:
 
-1345828501 5601 1542744 483552416 61 2573 0
-1345828503 7884 1979812 565806736 60 1479 0
-1345828505 7208 1979422 567277552 59 1391 0
-1345828507 7685 1980268 569784714 60 1398 0
-1345828509 7073 1979779 573489941 236 1411 0
-
+pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
+--latency-limit=10 --failures-detailed --max-tries=10 test
+
+
+
+1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0
+1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0
+
+  
 
   
Notice that while the plain (unaggregated) log file shows which script
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index acf3e56413..4d4b979e4f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4494,6 +4494,17 @@ doLog(TState *thread, 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-04 Thread Fabien COELHO



Hello Tatsuo-san,


interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
sum_lag sum_lag_2 min_lag max_lag skipped
failures serialization_failures deadlock_failures retried retries


I would suggest reordering the last chunk to:

   ... retried retries failures serfail dlfail

because I intend to add connection failure handling at some point, and it 
would make more sense to add the corresponding count at the end with the 
other failures.


--
Fabien Coelho - CRI, MINES ParisTech




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-03 Thread Tatsuo Ishii
>> I think it's easier to just say "if feature X is not enabled, then
>> columns XYZ are always zeroes".
> 
> Ok, I will come up with a patch in this direction.

Please find attached the patch for this.

With the patch, the log line is as follows (the actual log line is not 
folded, of course):

interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
sum_lag sum_lag_2 min_lag max_lag skipped
failures serialization_failures deadlock_failures retried retries

I updated the doc as well:

- fold the log line using line feeds to avoid an error in rendering the
  PDF. I did not use &zwsp; because it does not enhance the HTML output.

- split the explanation of the log output into multiple paragraphs to
  enhance readability.

- replace the example output with one where the full set of options is specified.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..e1f98ae228 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
format is used for the log files:
 
 
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
+interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
+sum_lag sum_lag_2 min_lag max_lag skipped
+failures serialization_failures deadlock_failures retried retries
 
 
where
@@ -2417,41 +2419,55 @@ END;
and
max_latency is the maximum latency within the interval,
failures is the number of transactions that ended
-   with a failed SQL command within the interval. If you use option
-   --failures-detailed, instead of the sum of all failed
-   transactions you will get more detailed statistics for the failed
-   transactions grouped by the following types:
-   serialization_failures is the number of
-   transactions that got a serialization error and were not retried after this,
-   deadlock_failures is the number of transactions
-   that got a deadlock error and were not retried after this.
+   with a failed SQL command within the interval.
+  
+  
The next fields,
sum_lag, sum_lag_2, min_lag,
-   and max_lag, are only present if the --rate
-   option is used.
+   and max_lag, only meaningful if the --rate
+   option is used. Otherwise, they are all 0.0.
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
The next field, skipped,
-   is only present if the --latency-limit option is used, too.
+   is only meaningful if the --latency-limit option is used, too. Otherwise it is 0.
It counts the number of transactions skipped because they would have
started too late.
-   The retried and retries
-   fields are present only if the --max-tries option is not
-   equal to 1. They report the number of retried transactions and the sum of all
-   retries after serialization or deadlock errors within the interval.
-   Each transaction is counted in the interval when it was committed.
+  
+  
+   failures is the sum of all failed transactions.
+   If --failures-detailed is specified, instead of the sum of
+   all failed transactions you will get more detailed statistics for the
+   failed transactions grouped by the following types:
+   serialization_failures is the number of
+   transactions that got a serialization error and were not retried after this,
+   deadlock_failures is the number of transactions
+   that got a deadlock error and were not retried after this.
+   If --failures-detailed is not
+   specified, serialization_failures
+   and deadlock_failures are always 0.
+  
+  
+   The retried
+   and retries fields are only meaningful if
+   the --max-tries option is not equal to 1. Otherwise they
+   are 0. They report the number of retried transactions and the sum of all
+   retries after serialization or deadlock errors within the interval.  Each
+   transaction is counted in the interval when it was committed.
   
 
   
-   Here is some example output:
+   Here is some example output with following options:
 
-1345828501 5601 1542744 483552416 61 2573 0
-1345828503 7884 1979812 565806736 60 1479 0
-1345828505 7208 1979422 567277552 59 1391 0
-1345828507 7685 1980268 569784714 60 1398 0
-1345828509 7073 1979779 573489941 236 1411 0
-
+pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
+--latency-limit=10 --failures-detailed --max-tries=10 test
+
+
+
+1649033235 5398 26186948 170797073304 1051 15347 2471626 6290343560 0 8261 0 3925 3925 0 7524 29534
+1649033245 5651 27345519 210270761637 1011 67780 2480555 6835066067 0  496 3839 3839 0 7533 30118
+
+  
 
   
Notice that while the plain (unaggregated) log file shows 

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-03 Thread Tom Lane
Alvaro Herrera  writes:
> I think it's easier to just say "if feature X is not enabled, then
> columns XYZ are always zeroes".

+1, that's pretty much what I was thinking.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-03 Thread Tatsuo Ishii
>> My 0.02€:
>> 
>> I agree that it would be better to have a more deterministic aggregated log
>> format.
>> 
>> ISTM that it should skip failures and lags if no fancy options have been
>> selected, i.e.:
>> 
>>   [ fails ... retries [ sum_lag ... [ skipped ] ] ?
> 
> I think it's easier to just say "if feature X is not enabled, then
> columns XYZ are always zeroes".

Ok, I will come up with a patch in this direction.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-03 Thread Alvaro Herrera
On 2022-Apr-03, Fabien COELHO wrote:

> > What about this? (a log line is not actually folded)
> > interval_start num_transactions sum_latency sum_latency_2 min_latency 
> > max_latency
> > failures serialization_failures deadlock_failures retried retries [ sum_lag 
> > sum_lag_2 min_lag max_lag [ skipped ] ]
> 
> My 0.02€:
> 
> I agree that it would be better to have a more deterministic aggregated log
> format.
> 
> ISTM that it should skip failures and lags if no fancy options have been
> selected, i.e.:
> 
>   [ fails ... retries [ sum_lag ... [ skipped ] ] ?

I think it's easier to just say "if feature X is not enabled, then
columns XYZ are always zeroes".

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-03 Thread Tatsuo Ishii
>>> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag,
>>> skipped, retried retries?
>>
>> What about this? (a log line is not actually folded)
>> interval_start num_transactions sum_latency sum_latency_2 min_latency
>> max_latency
>> failures serialization_failures deadlock_failures retried retries [
>> sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
> 
> My 0.02€:
> 
> I agree that it would be better to have a more deterministic
> aggregated log format.
> 
> ISTM that it should skip failures and lags if no fancy options have
> been selected, i.e.:
> 
>   [ fails ... retries [ sum_lag ... [ skipped ] ] ?
> 
> Alternatively, as the failure stuff is added to the format, maybe it
> could be at the end:
> 
>   [ sum_lag ... [ skipped [ fails ... retries ] ] ] ?

I like this one.

interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
[sum_lag sum_lag_2 min_lag max_lag [ skipped [
failures serialization_failures deadlock_failures retried retries ] ] ]

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-03 Thread Fabien COELHO


Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, 
skipped, retried retries?


What about this? (a log line is not actually folded)
interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
failures serialization_failures deadlock_failures retried retries [ sum_lag 
sum_lag_2 min_lag max_lag [ skipped ] ]


My 0.02€:

I agree that it would be better to have a more deterministic 
aggregated log format.


ISTM that it should skip failures and lags if no fancy options have been 
selected, i.e.:


  [ fails ... retries [ sum_lag ... [ skipped ] ] ?

Alternatively, as the failure stuff is added to the format, maybe it could 
be at the end:


  [ sum_lag ... [ skipped [ fails ... retries ] ] ] ?



failures:
always 0 (if --max-tries is 1, the default)
sum of serialization_failures and deadlock_failures (if --max-tries is not 1)

serialization_failures and deadlock_failures:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)

retried and retries:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



--
Fabien.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-04-03 Thread Tatsuo Ishii
>> I think the problem is not merely one of documentation, but one of
>> bad design.  Up to now it was possible to tell what was what from
>> counting the number of columns in the output; but with this design,
>> that is impossible.  That should be fixed.  The first thing you have
>> got to do is drop the alternation { failures | serialization_failures
>> deadlock_failures }.  That doesn't make any sense in the first place:
>> counting serialization and deadlock failures doesn't make it impossible
>> for other errors to occur.  It'd probably make the most sense to have
>> three columns always, serialization, deadlock and total.
> 
> +1.
> 
>> Now maybe
>> that change alone is sufficient, but I'm not convinced, because the
>> multiple options at the end of the line mean we will never again be
>> able to add any more columns without reintroducing ambiguity.  I
>> would be happier if the syntax diagram were such that columns could
>> only be dropped from right to left.
> 
> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, skipped, 
> retried retries?

What about this? (a log line is not actually folded)
interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
failures serialization_failures deadlock_failures retried retries [ sum_lag 
sum_lag_2 min_lag max_lag [ skipped ] ]

failures:
always 0 (if --max-tries is 1, the default)
sum of serialization_failures and deadlock_failures (if --max-tries is not 1)

serialization_failures and deadlock_failures:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)

retried and retries:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)
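
A consumer could sanity-check a line against these rules. A minimal sketch
(the column order follows the format proposed above, i.e. failures is
column 7 and retries is column 11; the sample line is hypothetical):

  use strict;
  use warnings;

  # Columns 7..11 are: failures serialization_failures deadlock_failures
  # retried retries.  failures must equal the sum of the two detailed
  # counters, and with --max-tries=1 (the default) all five must be 0.
  my @cols = split ' ', "1345828501 5601 1542744 483552416 61 2573 0 0 0 0 0";
  my ($failures, $ser, $dl, $retried, $retries) = @cols[6 .. 10];
  die "failures != serialization + deadlock\n" if $failures != $ser + $dl;
  die "unexpected failures/retries with --max-tries=1\n"
    if $failures || $retried || $retries;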

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-31 Thread Fabien COELHO



Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, 
skipped, retried retries?


Anyway, now that the current CF is closing, it will not be possible to
change the logging design soon. Or can we change the logging design
even after the CF is closed?


My 0.02€: I'm not sure how the official guidelines are to be interpreted 
in that case, but if the design is to be changed, ISTM that it is better 
to do it before a release instead of letting the release go out with one 
format and changing it in the next release?


--
Fabien.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-31 Thread Tatsuo Ishii
> Alvaro Herrera  writes:
>>> After:
>>> interval_start num_transactions sum_latency sum_latency_2 min_latency 
>>> max_latency
>>> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 
>>> min_lag max_lag [ skipped ] ] [ retried retries ]
> 
>> I think that the explanatory paragraph is way too long now, particularly
>> since it explains --failures-detailed starting in the middle.  Also, the
>> example output doesn't include the failures-detailed mode.
> 
> I think the problem is not merely one of documentation, but one of
> bad design.  Up to now it was possible to tell what was what from
> counting the number of columns in the output; but with this design,
> that is impossible.  That should be fixed.  The first thing you have
> got to do is drop the alternation { failures | serialization_failures
> deadlock_failures }.  That doesn't make any sense in the first place:
> counting serialization and deadlock failures doesn't make it impossible
> for other errors to occur.  It'd probably make the most sense to have
> three columns always, serialization, deadlock and total.

+1.

> Now maybe
> that change alone is sufficient, but I'm not convinced, because the
> multiple options at the end of the line mean we will never again be
> able to add any more columns without reintroducing ambiguity.  I
> would be happier if the syntax diagram were such that columns could
> only be dropped from right to left.

Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, skipped, 
retried retries?

Anyway, now that the current CF is closing, it will not be possible to
change the logging design soon. Or can we change the logging design
even after the CF is closed?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-28 Thread Tatsuo Ishii
> Hello,
> 
> On 2022-Mar-27, Tatsuo Ishii wrote:
> 
>> After:
>> interval_start num_transactions sum_latency sum_latency_2 min_latency 
>> max_latency
>>   { failures | serialization_failures deadlock_failures } [ sum_lag 
>> sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
> 
> You're showing an indentation, but looking at the HTML output there is
> none.  Is the HTML processor eating leading whitespace or something
> like that?

I just copied it from my web browser screen (Firefox 98.0.2 on Ubuntu 20).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-28 Thread Tom Lane
Alvaro Herrera  writes:
>> After:
>> interval_start num_transactions sum_latency sum_latency_2 min_latency 
>> max_latency
>> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 
>> min_lag max_lag [ skipped ] ] [ retried retries ]

> I think that the explanatory paragraph is way too long now, particularly
> since it explains --failures-detailed starting in the middle.  Also, the
> example output doesn't include the failures-detailed mode.

I think the problem is not merely one of documentation, but one of
bad design.  Up to now it was possible to tell what was what from
counting the number of columns in the output; but with this design,
that is impossible.  That should be fixed.  The first thing you have
got to do is drop the alternation { failures | serialization_failures
deadlock_failures }.  That doesn't make any sense in the first place:
counting serialization and deadlock failures doesn't make it impossible
for other errors to occur.  It'd probably make the most sense to have
three columns always, serialization, deadlock and total.  Now maybe
that change alone is sufficient, but I'm not convinced, because the
multiple options at the end of the line mean we will never again be
able to add any more columns without reintroducing ambiguity.  I
would be happier if the syntax diagram were such that columns could
only be dropped from right to left.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-28 Thread Alvaro Herrera
Hello,

On 2022-Mar-27, Tatsuo Ishii wrote:

> After:
> interval_start num_transactions sum_latency sum_latency_2 min_latency 
> max_latency
>   { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 
> min_lag max_lag [ skipped ] ] [ retried retries ]

You're showing an indentation, but looking at the HTML output there is
none.  Is the HTML processor eating leading whitespace or something
like that?

I think that the explanatory paragraph is way too long now, particularly
since it explains --failures-detailed starting in the middle.  Also, the
example output doesn't include the failures-detailed mode.  I suggest
that this should be broken down even more; first to explain the output
without failures-detailed, including an example, and then the output
with failures-detailed, and an example of that.  Something like this,
perhaps:

Aggregated Logging
With the --aggregate-interval option, a different format is used for the log 
files (note that the actual log line is not folded).

  interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
  failures [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]

where interval_start is the start of the interval (as a Unix epoch time stamp), 
num_transactions is the number of transactions within the interval, sum_latency 
is the sum of the transaction latencies within the interval, sum_latency_2 is 
the sum of squares of the transaction latencies within the interval, 
min_latency is the minimum latency within the interval, and max_latency is the 
maximum latency within the interval; failures is the number of transactions 
that ended with a failed SQL command within the interval.

The next fields, sum_lag, sum_lag_2, min_lag, and max_lag, are only present if 
the --rate option is used. They provide statistics about the time each 
transaction had to wait for the previous one to finish, i.e., the difference 
between each transaction's scheduled start time and the time it actually 
started. The next field, skipped, is only present if the --latency-limit option 
is used, too. It counts the number of transactions skipped because they would 
have started too late. The retried and retries fields are present only if the 
--max-tries option is not equal to 1. They report the number of retried 
transactions and the sum of all retries after serialization or deadlock errors 
within the interval. Each transaction is counted in the interval when it was 
committed.

Notice that while the plain (unaggregated) log file shows which script was used 
for each transaction, the aggregated log does not. Therefore if you need 
per-script data, you need to aggregate the data on your own.

Here is some example output:

1345828501 5601 1542744 483552416 61 2573 0
1345828503 7884 1979812 565806736 60 1479 0
1345828505 7208 1979422 567277552 59 1391 0
1345828507 7685 1980268 569784714 60 1398 0
1345828509 7073 1979779 573489941 236 1411 0

If you use option --failures-detailed, instead of the sum of all failed 
transactions you will get more detailed statistics for the failed transactions:

  interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
  serialization_failures deadlock_failures [ sum_lag sum_lag_2 min_lag max_lag 
[ skipped ] ] [ retried retries ]

This is similar to the above, but here the single 'failures' figure is replaced 
by serialization_failures, the number of transactions that got a serialization 
error and were not retried after it, and deadlock_failures, the number of 
transactions that got a deadlock error and were not retried after it. The other 
fields are as above. Here is some example output:

[example with detailed failures]
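
(As an aside, sum_latency and sum_latency_2 are exactly what is needed to
recover the per-interval mean and standard deviation of the latency; a
minimal sketch, assuming the seven-column format of the first example above
and latencies in microseconds:)

  use strict;
  use warnings;

  # Read aggregate-log lines on stdin and print per-interval mean and
  # standard deviation, using Var(X) = E[X^2] - E[X]^2.
  while (my $line = <STDIN>) {
      my ($start, $n, $sum, $sum2) = split ' ', $line;
      next unless $n;
      my $mean   = $sum / $n;
      my $stddev = sqrt($sum2 / $n - $mean * $mean);
      printf "%d: n=%d mean=%.1f us stddev=%.1f us\n",
          $start, $n, $mean, $stddev;
  }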

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."   (New York Times, about Microsoft PowerPoint)




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-28 Thread Tatsuo Ishii
>>> > Even applying this patch, "make postgres-A4.pdf" raises the warning on my
>>> > machine. After some investigations, I found that the previous document had
>>> > a break after 'num_transactions', but it has been removed by this commit.
>>> 
>>> Yes, your patch removed "&zwsp;".
>>> 
>>> > So, I would like to put this back as it was. I attached the patch.
>>> 
>>> This produces errors. Needs ";" postfix?
>> 
>> Oops. Yes, it needs ';'. Also, I found another "&zwsp;" dropped.
>> I attached the fixed patch.
> 
> The basic problem with this patch is that it may solve the issue with PDF
> generation but does not solve the issue with HTML generation. The
> PDF manual of pgbench has a ridiculously long line, which Tom Lane
I meant "HTML manual" here.

> complained about too:
> 
> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
> 
> Why can't we just use line feeds instead of &zwsp;? Although it's not
> a command usage, the SELECT manual already uses line feeds to nicely
> break command usage into multiple lines.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-28 Thread Tatsuo Ishii
>> > Even applying this patch, "make postgres-A4.pdf" raises the warning on my
>> > machine. After some investigations, I found that the previous document had
>> > a break after 'num_transactions', but it has been removed by this commit.
>> 
>> Yes, your patch removed "&zwsp;".
>> 
>> > So, I would like to put this back as it was. I attached the patch.
>> 
>> This produces errors. Needs ";" postfix?
> 
> Oops. Yes, it needs ';'. Also, I found another "&zwsp;" dropped.
> I attached the fixed patch.

The basic problem with this patch is that it may solve the issue with PDF
generation but does not solve the issue with HTML generation. The
PDF manual of pgbench has a ridiculously long line, which Tom Lane
complained about too:

interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]

Why can't we just use line feeds instead of &zwsp;? Although it's not
a command usage, the SELECT manual already uses line feeds to nicely
break command usage into multiple lines.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-27 Thread Yugo NAGATA
On Mon, 28 Mar 2022 12:17:13 +0900 (JST)
Tatsuo Ishii  wrote:

> > Even applying this patch, "make postgres-A4.pdf" raises the warning on my
> > machine. After some investigations, I found that the previous document had
> > a break after 'num_transactions', but it has been removed by this commit.
> 
> Yes, your patch removed "&zwsp;".
> 
> > So, I would like to put this back as it was. I attached the patch.
> 
> This produces errors. Needs ";" postfix?

Oops. Yes, it needs ';'. Also, I found another "&zwsp;" dropped.
I attached the fixed patch.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..b16a5b9b7b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,7 @@ END;
format is used for the log files:
 
 
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
+interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
 
 
where


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-27 Thread Tatsuo Ishii
> Even applying this patch, "make postgres-A4.pdf" raises the warning on my
> machine. After some investigations, I found that the previous document had a
> break after 'num_transactions', but it has been removed by this commit.

Yes, your patch removed "&zwsp;".

> So, I would like to put this back as it was. I attached the patch.

This produces errors. Needs ";" postfix?

ref/pgbench.sgml:2404: parser error : EntityRef: expecting ';'
le>interval_start num_transactions
   ^
ref/pgbench.sgml:2781: parser error : chunk is not well balanced

^
reference.sgml:251: parser error : Failure to process entity pgbench
   
^
reference.sgml:251: parser error : Entity 'pgbench' not defined
   
^
reference.sgml:296: parser error : chunk is not well balanced

^
postgres.sgml:240: parser error : Failure to process entity reference
 
^
postgres.sgml:240: parser error : Entity 'reference' not defined
 
^
make: *** [Makefile:135: html-stamp] Error 1

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-27 Thread Yugo NAGATA
On Sun, 27 Mar 2022 15:28:41 +0900 (JST)
Tatsuo Ishii  wrote:

> > This patch has caused the PDF documentation to fail to build cleanly:
> > 
> > [WARN] FOUserAgent - The contents of fo:block line 1 exceed the available 
> > area in the inline-progression direction by more than 50 points. (See 
> > position 125066:375)
> > 
> > It's complaining about this:
> > 
> > 
> > interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
> > 
> > which runs much too wide in HTML format too, even though that toolchain
> > doesn't tell you so.
> 
> Yeah.
> 
> > We could silence the warning by inserting an arbitrary line break or two,
> > or refactoring the syntax description into multiple parts.  Either way
> > seems to create a risk of confusion.
> 
> I think we can fold the line nicely. Here is the rendered image.
> 
> Before:
> interval_start num_transactions sum_latency sum_latency_2 min_latency 
> max_latency { failures | serialization_failures deadlock_failures } [ sum_lag 
> sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
> 
> After:
> interval_start num_transactions sum_latency sum_latency_2 min_latency 
> max_latency
>   { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 
> min_lag max_lag [ skipped ] ] [ retried retries ]
> 
> Note that before it was like this:
> 
> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
> 
> So the newly added items are "{ failures | serialization_failures
> deadlock_failures }" and "[ retried retries ]".
> 
> > TBH, I think the *real* problem is that the complexity of this log format
> > has blown past "out of hand".  Can't we simplify it?  Who is really going
> > to use all these numbers?  I pity the poor sucker who tries to write a
> > log analysis tool that will handle all the variants.
> 
> Well, the extra logging items above only appear when the retry feature
> is enabled. For those who do not use the feature, the only new logging
> item is "failures". For those who use the feature, the extra logging
> items are clearly necessary. For example, if we write an application
> using the repeatable read or serializable transaction isolation mode,
> retrying transactions that failed due to serialization errors is an
> essential technique. Also, the retry rate of transactions deeply affects
> performance, and in such use cases the newly added items are precious
> information. I would suggest leaving the log items as they are.
> 
> Patch attached.

Even applying this patch, "make postgres-A4.pdf" raises the warning on my
machine. After some investigations, I found that the previous document had a break
after 'num_transactions', but it has been removed by this commit. So,
I would like to put this back as it was. I attached the patch.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..4437d5ef53 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,7 @@ END;
format is used for the log files:
 
 
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
+interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
 
 
where


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-27 Thread Tatsuo Ishii
> This patch has caused the PDF documentation to fail to build cleanly:
> 
> [WARN] FOUserAgent - The contents of fo:block line 1 exceed the available 
> area in the inline-progression direction by more than 50 points. (See 
> position 125066:375)
> 
> It's complaining about this:
> 
> 
> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
> 
> which runs much too wide in HTML format too, even though that toolchain
> doesn't tell you so.

Yeah.

> We could silence the warning by inserting an arbitrary line break or two,
> or refactoring the syntax description into multiple parts.  Either way
> seems to create a risk of confusion.

I think we can fold the line nicely. Here is the rendered image.

Before:
interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency { failures | serialization_failures deadlock_failures } [ sum_lag 
sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]

After:
interval_start num_transactions sum_latency sum_latency_2 min_latency 
max_latency
  { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 
min_lag max_lag [ skipped ] ] [ retried retries ]

Note that before it was like this:

interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]

So the newly added items are "{ failures | serialization_failures deadlock_failures }" and "[ retried retries ]".

> TBH, I think the *real* problem is that the complexity of this log format
> has blown past "out of hand".  Can't we simplify it?  Who is really going
> to use all these numbers?  I pity the poor sucker who tries to write a
> log analysis tool that will handle all the variants.

Well, the extra logging items above only appear when the retry feature
is enabled. For those who do not use the feature, the only new logging
item is "failures". For those who use the feature, the extra logging
items are clearly necessary. For example, if we write an application
using the repeatable read or serializable transaction isolation mode,
retrying transactions that failed due to serialization errors is an
essential technique. Also, the retry rate of transactions deeply affects
performance, and in such use cases the newly added items are precious
information. I would suggest leaving the log items as they are.
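
To illustrate the kind of retry loop involved (a sketch only, assuming DBI
with DBD::Pg and the xy table from the TAP test; pgbench --max-tries
automates exactly this pattern):

  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '',
                         { RaiseError => 1, AutoCommit => 0 });
  for my $try (1 .. 10) {
      my $ok = eval {
          $dbh->do('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE');
          $dbh->do('UPDATE xy SET y = y + 1 WHERE x = 1');
          $dbh->commit;
          1;
      };
      last if $ok;
      my $state = $dbh->state;    # SQLSTATE of the failure
      $dbh->rollback;
      # Retry only serialization (40001) and deadlock (40P01) failures.
      die $@ unless $state eq '40001' or $state eq '40P01';
  }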

Patch attached.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..b65b813ebe 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2398,10 +2398,11 @@ END;
 
   
With the --aggregate-interval option, a different
-   format is used for the log files:
+   format is used for the log files (note that the actual log line is not folded).
 
 
-interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
+  interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
+  { failures | serialization_failures deadlock_failures }  sum_lag sum_lag_2 min_lag max_lag  skippedretried retries 
 
 
where


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-26 Thread Tom Lane
Tatsuo Ishii  writes:
> Thanks. Patch pushed.

This patch has caused the PDF documentation to fail to build cleanly:

[WARN] FOUserAgent - The contents of fo:block line 1 exceed the available area 
in the inline-progression direction by more than 50 points. (See position 
125066:375)

It's complaining about this:


interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]

which runs much too wide in HTML format too, even though that toolchain
doesn't tell you so.

We could silence the warning by inserting an arbitrary line break or two,
or refactoring the syntax description into multiple parts.  Either way
seems to create a risk of confusion.

TBH, I think the *real* problem is that the complexity of this log format
has blown past "out of hand".  Can't we simplify it?  Who is really going
to use all these numbers?  I pity the poor sucker who tries to write a
log analysis tool that will handle all the variants.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tatsuo Ishii
>> Oops. Thanks. New patch attached. Test has passed on my machine.
> 
> This patch works for me. I think it is ok to use \N instead of \gN.

Thanks. Patch pushed.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tatsuo Ishii
> I reproduced the failure on another machine with perl 5.8.8,
> and I can confirm that this patch fixes it.

Thank you for the test. I have pushed the patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Yugo NAGATA
On Fri, 25 Mar 2022 09:14:00 +0900 (JST)
Tatsuo Ishii  wrote:

> > Note that the \\g2 just above also needs to be changed.
> 
> Oops. Thanks. New patch attached. Test has passed on my machine.

This patch works for me. I think it is ok to use \N instead of \gN.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tom Lane
Tatsuo Ishii  writes:
> Oops. Thanks. New patch attached. Test has passed on my machine.

I reproduced the failure on another machine with perl 5.8.8,
and I can confirm that this patch fixes it.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tatsuo Ishii
> Note that the \\g2 just above also needs to be changed.

Oops. Thanks. New patch attached. Test has passed on my machine.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 60cae1e843..ca71f968dc 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1222,9 +1222,9 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
 # delta variable in the next try
 my $err_pattern =
 	"(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
-  . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
+  . "client \\2 got an error in command 3 \\(SQL\\) of script 0; "
   . "ERROR:  could not serialize access due to concurrent update\\b.*"
-  . "\\g1";
+  . "\\1";
 
 $node->pgbench(
 	"-n -c 2 -t 1 -d --verbose-errors --max-tries 2",


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tom Lane
Tatsuo Ishii  writes:
> I don't see a reason to use "\gN" either. Actually, after applying the
> attached patch, my machine is still happy with the pgbench test.

Note that the \\g2 just above also needs to be changed.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tatsuo Ishii
>> My machine (Ubuntu 20) did not complain either. Maybe a Perl version
>> difference?  Anyway, the fix is pushed. Let's see how prairiedog feels.
> 
> Still not happy.  After some digging in man pages, I believe the
> problem is that its old version of Perl does not understand "\gN"
> backreferences.  Is there a good reason to be using that rather
> than the traditional "\N" backref notation?

I don't see a reason to use "\gN" either. Actually, after applying the
attached patch, my machine is still happy with the pgbench test.

Yugo?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 60cae1e843..22a23489e8 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1224,7 +1224,7 @@ my $err_pattern =
 	"(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
   . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
   . "ERROR:  could not serialize access due to concurrent update\\b.*"
-  . "\\g1";
+  . "\\1";
 
 $node->pgbench(
 	"-n -c 2 -t 1 -d --verbose-errors --max-tries 2",


Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tom Lane
Tatsuo Ishii  writes:
>> My hoary animal prairiedog doesn't like this [1]:

> My machine (Ubuntu 20) did not complain either. Maybe a Perl version
> difference?  Anyway, the fix is pushed. Let's see how prairiedog feels.

Still not happy.  After some digging in man pages, I believe the
problem is that its old version of Perl does not understand "\gN"
backreferences.  Is there a good reason to be using that rather
than the traditional "\N" backref notation?

regards, tom lane
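
For anyone hitting the same portability issue: the \g1 notation was only
added in Perl 5.10, and a 5.8-era Perl apparently passes \g through as a
literal, so the pattern quietly fails to match, which is exactly the
buildfarm symptom. A minimal sketch (the test string is illustrative):

  my $log = "client 0 sending UPDATE ... client 0 got an error";
  print "\\1 matched\n"  if $log =~ /client (\d+).*client \1 got/;   # any Perl
  print "\\g1 matched\n" if $log =~ /client (\d+).*client \g1 got/;  # Perl >= 5.10 only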




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-23 Thread Tatsuo Ishii
>> My hoary animal prairiedog doesn't like this [1]:
>> 
>> #   Failed test 'concurrent update with retrying stderr /(?s-xim:client 
>> (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR:  could not 
>> serialize access due to concurrent update\\b.*\\g1)/'
>> #   at t/001_pgbench_with_server.pl line 1229.
>> #   'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 
>> nclients: 2 nxacts: 1 dbName: postgres
>> ...
>> # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR:  
>> could not serialize access due to concurrent update
>> ...
>> # '
>> # doesn't match '(?s-xim:client (0|1) got an error in command 3 
>> \\(SQL\\) of script 0; ERROR:  could not serialize access due to concurrent 
>> update\\b.*\\g1)'
>> # Looks like you failed 1 test of 425.
>> 
>> I'm not sure what the "\\b.*\\g1" part of this regex is meant to
>> accomplish, but it seems to be assuming more than it should
>> about the output format of TAP messages.
> 
> I had edited the test code from the original patch by mistake, but
> I did not notice it because the test somehow works on my machine
> without any errors.
> 
> I attached a patch to restore the test as it was in the original patch,
> where backreferences are used to check that the same query is retried.

My machine (Ubuntu 20) did not complain either. Maybe a Perl version
difference?  Anyway, the fix is pushed. Let's see how prairiedog feels.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-23 Thread Yugo NAGATA
On Wed, 23 Mar 2022 14:26:54 -0400
Tom Lane  wrote:

> Tatsuo Ishii  writes:
> > Patch pushed. Thank you!
> 
> My hoary animal prairiedog doesn't like this [1]:
> 
> #   Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) 
> got an error in command 3 \\(SQL\\) of script 0; ERROR:  could not serialize 
> access due to concurrent update\\b.*\\g1)/'
> #   at t/001_pgbench_with_server.pl line 1229.
> #   'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 
> 2 nxacts: 1 dbName: postgres
> ...
> # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR:  
> could not serialize access due to concurrent update
> ...
> # '
> # doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) 
> of script 0; ERROR:  could not serialize access due to concurrent 
> update\\b.*\\g1)'
> # Looks like you failed 1 test of 425.
> 
> I'm not sure what the "\\b.*\\g1" part of this regex is meant to
> accomplish, but it seems to be assuming more than it should
> about the output format of TAP messages.

I had edited the test code from the original patch by mistake, but
I did not notice it because the test somehow works on my machine
without any errors.

I attached a patch to restore the test as it was in the original patch,
where backreferences are used to check that the same query is retried.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index d173ceae7a..3eb5905e5a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1222,7 +1222,8 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
 # Check that we have a serialization error and the same random value of the
 # delta variable in the next try
 my $err_pattern =
-"client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+	"(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
+  . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
   . "ERROR:  could not serialize access due to concurrent update\\b.*"
   . "\\g1";
 


Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-23 Thread Tom Lane
Tatsuo Ishii  writes:
> Patch pushed. Thank you!

My hoary animal prairiedog doesn't like this [1]:

#   Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) 
got an error in command 3 \\(SQL\\) of script 0; ERROR:  could not serialize 
access due to concurrent update\\b.*\\g1)/'
#   at t/001_pgbench_with_server.pl line 1229.
#   'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 2 
nxacts: 1 dbName: postgres
...
# pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR:  could 
not serialize access due to concurrent update
...
# '
# doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) 
of script 0; ERROR:  could not serialize access due to concurrent 
update\\b.*\\g1)'
# Looks like you failed 1 test of 425.

I'm not sure what the "\\b.*\\g1" part of this regex is meant to
accomplish, but it seems to be assuming more than it should
about the output format of TAP messages.

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2022-03-23%2013%3A21%3A44




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-23 Thread Tatsuo Ishii
>> I attached the updated patch. I also fixed the following paragraph which I 
>> had
>> forgotten to fix in the previous patch.
>> 
>>  The first seven lines report some of the most important parameter settings.
>>  The sixth line reports the maximum number of tries for transactions with
>>  serialization or deadlock errors
> 
> Thank you for the updated patch. I think the patches look good and now
> it's ready for commit. If there's no objection, I would like to
> commit/push the patches.

Patch pushed. Thank you!

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-22 Thread Tatsuo Ishii
>> I've checked other places using <xref> referring to <refsect2>, and found
>> that "xreflabel"s are used in such <refsect2> tags. So, I'll fix it
>> in this style.
> 
> I attached the updated patch. I also fixed the following paragraph which I had
> forgotten to fix in the previous patch.
> 
>  The first seven lines report some of the most important parameter settings.
>  The sixth line reports the maximum number of tries for transactions with
>  serialization or deadlock errors

Thank you for the updated patch. I think the patches look good and now
it's ready for commit. If there's no objection, I would like to
commit/push the patches.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-22 Thread Yugo NAGATA
On Tue, 22 Mar 2022 09:08:15 +0900
Yugo NAGATA  wrote:

> Hi Ishii-san,
> 
> On Sun, 20 Mar 2022 09:52:06 +0900 (JST)
> Tatsuo Ishii  wrote:
> 
> > Hi Yugo,
> > 
> > I have looked into the patch and I noticed that <xref linkend=... endterm=...> is used in pgbench.sgml. e.g.
> > 
> > 
> > 
> > AFAIK this is the only place where "endterm" is used. In other places
> > "link" tag is used instead:
> 
> Thank you for pointing out it. 
> 
> I've checked other places using <xref> referring to <refsect2>, and found
> that "xreflabel"s are used in such <refsect2> tags. So, I'll fix it
> in this style.

I attached the updated patch. I also fixed the following paragraph which I had
forgotten to fix in the previous patch.

 The first seven lines report some of the most important parameter settings.
 The sixth line reports the maximum number of tries for transactions with
 serialization or deadlock errors

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From b9f993e81836c4379478809da0e690023c319038 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v18 1/2] Pgbench errors: use the Variables structure for
 client variables

This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.

Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
 src/bin/pgbench/pgbench.c | 163 +++---
 1 file changed, 100 insertions(+), 63 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 000ffc4a5c..ab2c5dfc5f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -289,6 +289,12 @@ const char *progname;
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN	8
+
 /*
  * Variable definitions.
  *
@@ -306,6 +312,24 @@ typedef struct
 	PgBenchValue value;			/* actual variable's value */
 } Variable;
 
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+	Variable   *vars;			/* array of variable definitions */
+	int			nvars;			/* number of variables */
+
+	/*
+	 * The maximum number of variables that we can currently store in 'vars'
+	 * without having to reallocate more space. We must always have max_vars >=
+	 * nvars.
+	 */
+	int			max_vars;
+
+	bool		vars_sorted;	/* are variables sorted by name? */
+} Variables;
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -460,9 +484,7 @@ typedef struct
 	int			command;		/* command number in script */
 
 	/* client variables */
-	Variable   *variables;		/* array of variable definitions */
-	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
+	Variables   variables;
 
 	/* various times about current transaction in microseconds */
 	pg_time_usec_t txn_scheduled;	/* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
 
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
 {
 	Variable	key;
 
 	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (variables->nvars <= 0)
 		return NULL;
 
 	/* Sort if we have to */
-	if (!st->vars_sorted)
+	if (!variables->vars_sorted)
 	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+		qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
 			  compareVariableNames);
-		st->vars_sorted = true;
+		variables->vars_sorted = true;
 	}
 
 	/* Now we can search */
 	key.name = name;
	return (Variable *) bsearch((void *) &key,
-(void *) st->variables,
-st->nvariables,
+(void *) variables->vars,
+variables->nvars,
 sizeof(Variable),
 compareVariableNames);
 }
 
 /* Get the value of a variable, in string form; returns NULL if unknown */
 static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
 {
 	Variable   *var;
 	char		stringform[64];
 
-	var = lookupVariable(st, name);
+	var = lookupVariable(variables, name);
 	if (var == NULL)
 		return NULL;			/* not found */
 
@@ -1562,21 +1584,37 @@ valid_variable_name(const char *name)
 	return true;
 }
 
+/*
+ * Make sure there is enough space for 'needed' more variable in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+	/* total number of variables required now */
+	needed += variables->nvars;
+
+	if (variables->max_vars < needed)
+	{
+		variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+		variables->vars = (Variable *)
+			pg_realloc(variables->vars, variables->max_vars * 

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-21 Thread Tatsuo Ishii
> On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
> Tatsuo Ishii  wrote:
> 
>> > Hi Yugo,
>> > 
>> > I tested with serialization error scenario by setting:
>> > default_transaction_isolation = 'repeatable read'
>> > The result was:
>> > 
>> > $ pgbench -t 10 -c 10 --max-tries=10 test
> > transaction type: <builtin: TPC-B (sort of)>
>> > scaling factor: 10
>> > query mode: simple
>> > number of clients: 10
>> > number of threads: 1
>> > maximum number of tries: 10
>> > number of transactions per client: 10
>> > number of transactions actually processed: 100/100
>> > number of failed transactions: 0 (0.000%)
>> > number of transactions retried: 35 (35.000%)
>> > total number of retries: 74
>> > latency average = 5.306 ms
>> > initial connection time = 15.575 ms
>> > tps = 1884.516810 (without initial connection time)
>> > 
>> > I had a hard time understanding what those numbers mean:
>> > number of transactions retried: 35 (35.000%)
>> > total number of retries: 74
>> > 
>> > It seems "total number of retries" matches the number of ERRORs
>> > reported in PostgreSQL. Good. What I am not sure about is "number of
>> > transactions retried". What does this mean?
>> 
>> Oh, ok. I see it now. It turned out that "number of transactions
>> retried" does not actually mean the number of transactions
>> retried. Suppose pgbench executes the following in a session:
>> 
>> BEGIN;   -- transaction A starts
>> :
>> (ERROR)
>> ROLLBACK; -- transaction A aborts
>> 
>> (retry)
>> 
>> BEGIN;   -- transaction B starts
>> :
>> (ERROR)
>> ROLLBACK; -- transaction B aborts
>> 
>> (retry)
>> 
>> BEGIN;   -- transaction C starts
>> :
>> END; -- finally succeeds
>> 
>> In this case "total number of retries:" = 2 and "number of
>> transactions retried:" = 1. In this patch transactions A, B and C are
>> regarded as the "same" transaction, so the retried transaction count
>> becomes 1. But it's confusing to use the word "transaction" here
>> because A, B and C are different transactions. I would think it's
>> better to use a different word instead of "transaction", something
>> like "cycle"? i.e.
>> 
>> number of cycles retried: 35 (35.000%)

I realized that the same argument can be applied even to "number of
transactions actually processed" because, with the retry feature, a
"transaction" could comprise multiple transactions.

But if we go forward and replace those "transactions" with "cycles"
(or whatever) altogether, it could bring enough confusion to users who
have been using pgbench. Probably we should give up changing the
language and instead redefine "transaction" when the retry feature is
enabled, along the lines of "when the retry feature is enabled, each
transaction can consist of multiple retried transactions."

> In the original patch by Marina Polyakova it was "number of retried",
> but I changed it to "number of transactions retried" because I felt
> it was confusing with "number of retries". I chose the word "transaction"
> because a transaction ends in any one of successful commit, skip, or
> failure, after possible retries.

Ok.

> Well, I agree that it is somewhat confusing wording. If we can find a
> nice word to resolve the confusion, I don't mind changing the word.
> Maybe we can use "executions" as well as "cycles". However, I am not sure
> that the situation is improved by using such a word, because what such a
> word exactly means may still be unclear to users.
> 
> Another idea is to instead report only "the number of successfully
> retried transactions", which does not include "failed transactions",
> that is, transactions that failed after retries, like this:
> 
>  number of transactions actually processed: 100/100
>  number of failed transactions: 0 (0.000%)
>  number of successfully retried transactions: 35 (35.000%)
>  total number of retries: 74 
> 
> The meaning is clear and there seems to be no confusion.

Thank you for the suggestion. But I think it would be better to leave it
as it is for the reason I mentioned above.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
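
To make the relationship between the two counters concrete with the numbers
from the run above: 35 of the 100 transaction cycles needed at least one
retry (35/100 = 35.000%), and those 35 cycles accumulated 74 retries in
total, i.e. about 74/35 ≈ 2.1 retries per retried cycle on average. The two
figures therefore answer different questions: how many cycles were affected
at all, and how much total retry work was done.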




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-21 Thread Yugo NAGATA
On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
Tatsuo Ishii  wrote:

> > Hi Yugo,
> > 
> > I tested with serialization error scenario by setting:
> > default_transaction_isolation = 'repeatable read'
> > The result was:
> > 
> > $ pgbench -t 10 -c 10 --max-tries=10 test
> > transaction type: <builtin: TPC-B (sort of)>
> > scaling factor: 10
> > query mode: simple
> > number of clients: 10
> > number of threads: 1
> > maximum number of tries: 10
> > number of transactions per client: 10
> > number of transactions actually processed: 100/100
> > number of failed transactions: 0 (0.000%)
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> > latency average = 5.306 ms
> > initial connection time = 15.575 ms
> > tps = 1884.516810 (without initial connection time)
> > 
> > I had a hard time understanding what those numbers mean:
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> > 
> > It seems "total number of retries" matches the number of ERRORs
> > reported in PostgreSQL. Good. What I am not sure about is "number of
> > transactions retried". What does this mean?
> 
> Oh, ok. I see it now. It turned out that "number of transactions
> retried" does not actually mean the number of transactions
> retried. Suppose pgbench executes the following in a session:
> 
> BEGIN;-- transaction A starts
> :
> (ERROR)
> ROLLBACK; -- transaction A aborts
> 
> (retry)
> 
> BEGIN;-- transaction B starts
> :
> (ERROR)
> ROLLBACK; -- transaction B aborts
> 
> (retry)
> 
> BEGIN;-- transaction C starts
> :
> END;  -- finally succeeds
> 
> In this case "total number of retries:" = 2 and "number of
> transactions retried:" = 1. In this patch transactions A, B and C are
> regarded as the "same" transaction, so the retried transaction count
> becomes 1. But it's confusing to use the word "transaction" here
> because A, B and C are different transactions. I would think it's
> better to use a different word instead of "transaction", something
> like "cycle"? i.e.
> 
> number of cycles retried: 35 (35.000%)

In the original patch by Marina Polyakova it was "number of retried",
but I changed it to "number of transactions retried" because I felt
it was confusing with "number of retries". I chose the word "transaction"
because a transaction ends in any one of successful commit, skip, or
failure, after possible retries.

Well, I agree that it is somewhat confusing wording. If we can find a
nice word to resolve the confusion, I don't mind changing the word.
Maybe we can use "executions" as well as "cycles". However, I am not sure
that the situation is improved by using such a word, because what such a
word exactly means may still be unclear to users.

Another idea is to instead report only "the number of successfully
retried transactions", which does not include "failed transactions",
that is, transactions that failed after retries, like this:

 number of transactions actually processed: 100/100
 number of failed transactions: 0 (0.000%)
 number of successfully retried transactions: 35 (35.000%)
 total number of retries: 74 

The meaning is clear and there seems to be no confusion.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-21 Thread Yugo NAGATA
Hi Ishii-san,

On Sun, 20 Mar 2022 09:52:06 +0900 (JST)
Tatsuo Ishii  wrote:

> Hi Yugo,
> 
> I have looked into the patch and I noticed that <xref linkend=... endterm=...> is used in pgbench.sgml. e.g.
> 
> 
> 
> AFAIK this is the only place where "endterm" is used. In other places
> "link" tag is used instead:

Thank you for pointing out it. 

I've checked other places using <xref> referring to <refsect2>, and found
that "xreflabel"s are used in such <refsect2> tags. So, I'll fix it
in this style.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
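
For illustration, the style in question puts an xreflabel on the target
section so that a plain <xref> renders the section title; a sketch (the id
and label are reconstructed from the section name, not copied from the
patch):

  <refsect2 id="failures-and-retries"
            xreflabel="Failures and Serialization/Deadlock Retries">
   <title>Failures and Serialization/Deadlock Retries</title>
   ...
  </refsect2>

  <!-- elsewhere in pgbench.sgml; renders the same text as a <link> -->
  <xref linkend="failures-and-retries"/>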




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-20 Thread Tatsuo Ishii
> Hi Yugo,
> 
> I tested with serialization error scenario by setting:
> default_transaction_isolation = 'repeatable read'
> The result was:
> 
> $ pgbench -t 10 -c 10 --max-tries=10 test
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 10
> query mode: simple
> number of clients: 10
> number of threads: 1
> maximum number of tries: 10
> number of transactions per client: 10
> number of transactions actually processed: 100/100
> number of failed transactions: 0 (0.000%)
> number of transactions retried: 35 (35.000%)
> total number of retries: 74
> latency average = 5.306 ms
> initial connection time = 15.575 ms
> tps = 1884.516810 (without initial connection time)
> 
> I had a hard time understanding what those numbers mean:
> number of transactions retried: 35 (35.000%)
> total number of retries: 74
> 
> It seems "total number of retries" matches the number of ERRORs
> reported in PostgreSQL. Good. What I am not sure about is "number of
> transactions retried". What does this mean?

Oh, ok. I see it now. It turned out that "number of transactions
retried" does not actually mean the number of transactions
retried. Suppose pgbench executes the following in a session:

BEGIN;  -- transaction A starts
:
(ERROR)
ROLLBACK; -- transaction A aborts

(retry)

BEGIN;  -- transaction B starts
:
(ERROR)
ROLLBACK; -- transaction B aborts

(retry)

BEGIN;  -- transaction C starts
:
END;-- finally succeeds

In this case "total number of retries:" = 2 and "number of
transactions retried:" = 1. In this patch transactions A, B and C are
regarded as the "same" transaction, so the retried transaction count
becomes 1. But it's confusing to use the word "transaction" here
because A, B and C are different transactions. I would think it's
better to use a different word instead of "transaction", something
like "cycle"? i.e.

number of cycles retried: 35 (35.000%)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-19 Thread Tatsuo Ishii
Hi Yugo,

I tested with serialization error scenario by setting:
default_transaction_isolation = 'repeatable read'
The result was:

$ pgbench -t 10 -c 10 --max-tries=10 test
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of transactions retried: 35 (35.000%)
total number of retries: 74
latency average = 5.306 ms
initial connection time = 15.575 ms
tps = 1884.516810 (without initial connection time)

I had a hard time understanding what those numbers mean:
number of transactions retried: 35 (35.000%)
total number of retries: 74

It seems "total number of retries" matches the number of ERRORs
reported in PostgreSQL. Good. What I am not sure about is "number of
transactions retried". What does this mean?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-19 Thread Tatsuo Ishii
Hi Yugo,

I have looked into the patch and I noticed that <xref linkend=... endterm=...> is used in pgbench.sgml. e.g.



AFAIK this is the only place where "endterm" is used. In other places
"link" tag is used instead:

<link linkend="...">Failures and Serialization/Deadlock Retries</link>

Note that the rendered result is identical. Do we want to use the link tag as 
well?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-19 Thread Yugo NAGATA
Hello Fabien,

On Sat, 12 Mar 2022 15:54:54 +0100 (CET)
Fabien COELHO  wrote:

> Hello Yugo-san,
> 
> About Pgbench error handling v16:

Thank you for your review! I attached the updated patches.
 
> This patch set needs a minor rebase because of 506035b0. Otherwise, patch 
> compiles, global and local "make check" are ok. Doc generation is ok.

I rebased it.

> ## About v16-2
 
> English: "he will be aborted" -> "it will be aborted".

Fixed.

> I'm still not sure I like the "failure detailed" option, ISTM that the report
> could be always detailed. That would remove some complexity and I do not think
> that people executing a bench with error handling would mind having the 
> details.
> No big deal.

I didn't change it because I think those who don't expect any failures,
using a well-designed script, may not need details of failures. I think
reporting such details will be required only for benchmarks where failures
are expected.

> printVerboseErrorMessages: I'd make the buffer static and initialized only 
> once
> so that there is no significant malloc/free cycle involved when calling the 
> function.

OK. I fixed printVerboseErrorMessages to use a static variable.
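
The resulting shape is roughly the following (a sketch using libpq's
PQExpBuffer, with the signature simplified for illustration; the actual
function takes the client state and may differ in details):

  #include <stdio.h>
  #include <pqexpbuffer.h>

  static void
  printVerboseErrorMessages(const char *sql_error)
  {
      static PQExpBuffer buf = NULL;

      if (buf == NULL)
          buf = createPQExpBuffer();   /* allocated once per process */
      else
          resetPQExpBuffer(buf);       /* cheap reuse on later calls */

      appendPQExpBuffer(buf, "client error: %s\n", sql_error);
      fputs(buf->data, stderr);
  }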

> advanceConnectionState: I'd really prefer not to add new variables (res, 
> status)
> in the loop scope, and only declare them when actually needed in the state 
> branches,
> so as to avoid any unwanted interaction between states.

I fixed it to declare the variables in the case statement blocks.

> typo: "fullowing" -> "following"

fixed.

> Pipeline cleaning: the advance function is already so long, I'd put
> that in a separate function and call it.

Ok. I made a new function "discardUntilSync" for the pipeline cleaning.
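
For reference, the general shape of such a helper with libpq's pipeline API
is roughly the following (a sketch with error handling elided; the function
in the patch may differ in details):

  #include <assert.h>
  #include <stdbool.h>
  #include <libpq-fe.h>

  /*
   * After an error in pipeline mode, queue a sync point and discard
   * results until the PGRES_PIPELINE_SYNC marker arrives, leaving the
   * connection usable again for the retry.
   */
  static bool
  discardUntilSync(PGconn *conn)
  {
      if (!PQpipelineSync(conn))
          return false;               /* could not queue the sync point */

      for (;;)
      {
          PGresult   *res = PQgetResult(conn);

          if (res != NULL && PQresultStatus(res) == PGRES_PIPELINE_SYNC)
          {
              PQclear(res);
              res = PQgetResult(conn);    /* a NULL follows the sync */
              assert(res == NULL);
              break;
          }
          PQclear(res);               /* PQclear(NULL) is a no-op */
      }
      return true;
  }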

> I think that the report should not remove data when they are 0, otherwise it 
> makes
> it harder to script around it (in failures_detailed on line 6284).

I fixed it to always report both serialization and deadlock failures, even
when they are 0.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From b4360d3c03013c86e1e62247f1c3c1378aacc38d Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v17 1/2] Pgbench errors: use the Variables structure for
 client variables

This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.

Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
 src/bin/pgbench/pgbench.c | 163 +++---
 1 file changed, 100 insertions(+), 63 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 000ffc4a5c..ab2c5dfc5f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -289,6 +289,12 @@ const char *progname;
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN	8
+
 /*
  * Variable definitions.
  *
@@ -306,6 +312,24 @@ typedef struct
 	PgBenchValue value;			/* actual variable's value */
 } Variable;
 
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+	Variable   *vars;			/* array of variable definitions */
+	int			nvars;			/* number of variables */
+
+	/*
+	 * The maximum number of variables that we can currently store in 'vars'
+	 * without having to reallocate more space. We must always have max_vars >=
+	 * nvars.
+	 */
+	int			max_vars;
+
+	bool		vars_sorted;	/* are variables sorted by name? */
+} Variables;
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -460,9 +484,7 @@ typedef struct
 	int			command;		/* command number in script */
 
 	/* client variables */
-	Variable   *variables;		/* array of variable definitions */
-	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
+	Variables   variables;
 
 	/* various times about current transaction in microseconds */
 	pg_time_usec_t txn_scheduled;	/* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
 
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
 {
 	Variable	key;
 
 	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (variables->nvars <= 0)
 		return NULL;
 
 	/* Sort if we have to */
-	if (!st->vars_sorted)
+	if (!variables->vars_sorted)
 	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+		qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
 			  compareVariableNames);
-		st->vars_sorted = true;
+		variables->vars_sorted = true;
 	}
 
 	/* Now we can search */
 	key.name = name;
 	

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-12 Thread Fabien COELHO



Hello Yugo-san,

About Pgbench error handling v16:

This patch set needs a minor rebase because of 506035b0. Otherwise, patch 
compiles, global and local "make check" are ok. Doc generation is ok.


This patch is in good shape, the code and comments are clear.
Some minor remarks below, including typos and a few small suggestions.


## About v16-1

This refactoring patch adds a struct for managing pgbench variables, instead of
mixing fields into the client state (CState) struct.

Patch compiles, global and local "make check" are both ok.

Although this patch is not necessary to add the feature, I'm fine with it as
improves pgbench source code readability.


## About v16-2

This last patch adds handling of serialization and deadlock errors to pgbench
transactions. This feature is desirable because it enlarge performance testing
options, and makes pgbench behave more like a database client application.

Possible future extensions enabled by this patch include handling
disconnection errors by trying to reconnect, for instance.

The documentation is clear and well written, at least for my non-native speaker
eyes and ears.

English: "he will be aborted" -> "it will be aborted".

I'm fine with renaming --report-latencies to --report-per-command as the later
is clearer about what the options does.

I'm still not sure I like the "failure detailed" option, ISTM that the report
could be always detailed. That would remove some complexity and I do not think
that people executing a bench with error handling would mind having the details.
No big deal.

printVerboseErrorMessages: I'd make the buffer static and initialized only once
so that there is no significant malloc/free cycle involved when calling the 
function.

advanceConnectionState: I'd really prefer not to add new variables (res, status)
in the loop scope, and only declare them when actually needed in the state 
branches,
so as to avoid any unwanted interaction between states.

typo: "fullowing" -> "following"

Pipeline cleaning: the advance function is already so long, I'd put that in a
separate function and call it.

I think that the report should not remove data when they are 0, otherwise it 
makes
it harder to script around it (in failures_detailed on line 6284).

The tests cover the different cases. I tried to suggest a simpler approach
in a previous round, but it seems not so simple to do so. They could be
simplified later, if possible.


--
Fabien.




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-02 Thread Fabien COELHO



Hello Tatsuo-san,

It seems the patch is ready for committer except for the items below. Do
you guys want to do more on them?


I'm planning a new review of this significant patch, possibly over the 
next week-end, or the next.


--
Fabien.




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-03-02 Thread Tatsuo Ishii
Hi Yugo and Fabien,

It seems the patch is ready for committer except for the items below. Do
you guys want to do more on them?

>> # TESTS
>> 
>> I suggested to simplify the tests by using conditionals & sequences. You 
>> reported that you got stuck. Hmmm.
>> 
>> I tried again my tests which worked fine when started with 2 clients, 
>> otherwise they get stuck because the first client waits for the other one 
>> which does not exists (the point is to generate deadlocks and other 
>> errors). Maybe this is your issue?
> 
> That seems to be right. It got stuck when I used the -T option rather
> than -t; I guess it was because the number of transactions on each thread
> was different.
> 
>> Could you try with:
>> 
>>psql < deadlock_prep.sql
>>pgbench -t 4 -c 2 -f deadlock.sql
>># note: each deadlock detection takes 1 second
>> 
>>psql < deadlock_prep.sql
>>pgbench -t 10 -c 2 -f serializable.sql
>># very quick 50% serialization errors
> 
> That works. However, it still hangs when --max-tries = 2,
> so I don't think we can use it for testing the retry
> feature.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
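
The scripts themselves were not posted in the thread; a hypothetical pair
that reproduces the behaviour described above could look like this (table,
column and file names are illustrative):

  -- deadlock_prep.sql, run once via psql
  DROP TABLE IF EXISTS dl;
  CREATE TABLE dl (id int PRIMARY KEY, v int);
  INSERT INTO dl VALUES (1, 0), (2, 0);

  -- deadlock.sql, run with "pgbench -t 4 -c 2 -f deadlock.sql":
  -- the two clients lock the rows in opposite order (pgbench sets
  -- :client_id automatically), so the attempts deadlock and one client
  -- is rolled back after deadlock_timeout (1s by default), which is
  -- why each deadlock detection takes about one second.
  BEGIN;
  UPDATE dl SET v = v + 1 WHERE id = 1 + :client_id % 2;
  SELECT pg_sleep(0.5);
  UPDATE dl SET v = v + 1 WHERE id = 2 - :client_id % 2;
  END;

  -- serializable.sql, run under default_transaction_isolation =
  -- 'repeatable read': concurrent updates of the same row quickly
  -- produce "could not serialize access due to concurrent update".
  BEGIN;
  UPDATE dl SET v = v + 1 WHERE id = 1;
  END;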




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-02-18 Thread Yugo NAGATA
Hello Fabien,

Thank you so much for your review. 

Sorry for the late reply. I had stopped working on it due to other
jobs, but I have come back to it again. I attached the updated patch. I
would appreciate it if you could review this again.

On Mon, 19 Jul 2021 20:04:23 +0200 (CEST)
Fabien COELHO  wrote:

> # About pgbench error handling v15
> 
> Patches apply cleanly. Compilation, global and local tests ok.
> 
>   - v15.1: refactoring is a definite improvement.
> Good, even if it is not very useful (see below).

Ok, we don't need to save variables in order to implement the
retry feature in pgbench as you suggested. Well, should we completely
separate these two patches, and should I fix v15.2 to not rely on v15.1?

> While restructuring, maybe predefined variables could be make readonly
> so that a script which would update them would fail, which would be a
> good thing. Maybe this is probably material for an independent patch.

Yes, it should be for an independent patch.

>   - v15.2: see detailed comments below
> 
> # Doc
> 
> Doc build is ok.
> 
> ISTM that the "number of tries" line would be better placed between the
> #threads and #transactions lines. What do you think?

Agreed. Fixed.

> Aggregate logging description: "{ failures | ... }" seems misleading 
> because it suggests we have one or the other, whereas it can also be 
> empty. I suggest: "{ | failures | ... }" to show the empty case.

The description is correct because either "failures" or "both of
serialization_failures and deadlock_failures" should appear in aggregate
logging. If "failures" were printed only when a transaction failed,
each line in the aggregate log could have a different number of columns,
which would make it difficult to parse the results.

> I'd wonder whether the number of tries is set too high, 
> though, ISTM that an application should give up before 100? 

Indeed, max-tries=100 seems too high for a practical system.

Also, I noticed that the sum of the latencies of each command (= 15.839 ms)
is significantly larger than the latency average (= 10.870 ms)
because the "per command" results in the documentation were fixed.

So, I retook a measurement on my machine for more accurate documentation. I
used max-tries=10.

> Minor editing:
> 
> "there're" -> "there are".
> 
> "the --time" -> "the --time option".

Fixed.

> "The latency for failed transactions and commands is not computed 
> separately." is unclear,
> please use a positive sentence to tell what is true instead of what is not 
> and the reader
> has to guess. Maybe: "The latency figures include failed transactions which 
> have reached
> the maximum number of tries or the transaction latency limit.".

I'm not the original author of this description, but I guess this means "The 
latency is
measured only for successful transactions and commands but not for failed 
transactions
or commands.".

> "The main report contains the number of failed transactions if it is 
> non-zero." ISTM that
> this is a pain for scripts which would like to process these reports data, 
> because the data
> may or may not be there. I'm sure to write such scripts, which explains my 
> concern:-)

I agree with you. I fixed the behavior to always report the number of
failed transactions regardless of whether it is non-zero or not.

> "If the total number of retried transactions is non-zero…" should it rather 
> be "not one",
> because zero means unlimited retries?

I guess that this means the actual number of retried transactions, not
max-tries, so "non-zero" was correct. However, for the same reason as
above, I fixed the behavior to always report the retry statistics
regardless of the actual retry numbers.

> 
> # FEATURES
 
> Copying variables: ISTM that we should not need to save the variables 
> states… no clearing, no copying should be needed. The restarted 
> transaction simply overrides the existing variables which is what the 
> previous version was doing anyway. The scripts should write their own 
> variables before using them, and if they don't then it is the user 
> problem. This is important for performance, because it means that after a 
> client has executed all scripts once the variable array is stable and does 
> not incur significant maintenance costs. The only thing that needs saving 
> for retry is the pseudo-random generator state. This suggests simplifying 
> or removing "RetryState".

Yes. Saving the variables' state is not necessary because we retry the
whole script. It was necessary in the initial patch because it
planned to retry a single transaction included in the script. I removed
RetryState and copyVariables.
 
> # CODE
 
> In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and 
> could be dealt with by the default case. We are only interested in 
> serialization/deadlocks which are fatal errors?

We need PGRES_NONFATAL_ERROR to save st->estatus. It is used outside
readCommandResponse to determine whether we should abort or not.

> doRetry: for consistency, 

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-19 Thread Fabien COELHO



I attached the updated patch.


# About pgbench error handling v15

Patches apply cleanly. Compilation, global and local tests ok.

 - v15.1: refactoring is a definite improvement.
   Good, even if it is not very useful (see below).

   While restructuring, maybe predefined variables could be make readonly
   so that a script which would update them would fail, which would be a
   good thing. Maybe this is probably material for an independent patch.

 - v15.2: see detailed comments below

# Doc

Doc build is ok.

ISTM that the "number of tries" line would be better placed between the 
#threads and #transactions lines. What do you think?


Aggregate logging description: "{ failures | ... }" seems misleading 
because it suggests we have one or the other, whereas it can also be 
empty. I suggest: "{ | failures | ... }" to show the empty case.


Having a full example with retries in the doc is a good thing, and 
illustrates in passing that running with a number of clients on a small 
scale does not make much sense because of the contention on 
tellers/branches. I'd wonder whether the number of tries is set too high, 
though, ISTM that an application should give up before 100? I like that 
the feature is also limited by the latency limit.


Minor editing:

"there're" -> "there are".

"the --time" -> "the --time option".

The overall English seems good, but I'm not a native speaker. As I already
said, proofreading by a native speaker would be nice.


From a technical writing point of view, maybe the documentation could be 
improved a bit,

but I'm not a ease on that subject. Some comments:

"The latency for failed transactions and commands is not computed separately." 
is unclear,
please use a positive sentence to tell what is true instead of what is not and 
the reader
has to guess. Maybe: "The latency figures include failed transactions which 
have reached
the maximum number of tries or the transaction latency limit.".

"The main report contains the number of failed transactions if it is non-zero." 
ISTM that
this is a pain for scripts which would like to process these reports data, 
because the data
may or may not be there. I'm sure to write such scripts, which explains my 
concern:-)

"If the total number of retried transactions is non-zero…" should it rather be "not 
one",
because zero means unlimited retries?

The section describing the various type of errors that can occur is a good 
addition.

Option "--report-latencies" changed to "--report-per-commands": I'm fine with 
this change.

# FEATURES

--failures-detailed: I'm not convinced that this option should not always be 
on, but
this is not very important, so let it be.

--verbose-errors: I still think this is only for debugging, but let it be.

Copying variables: ISTM that we should not need to save the variables 
states… no clearing, no copying should be needed. The restarted 
transaction simply overrides the existing variables which is what the 
previous version was doing anyway. The scripts should write their own 
variables before using them, and if they don't then it is the user 
problem. This is important for performance, because it means that after a 
client has executed all scripts once the variable array is stable and does 
not incur significant maintenance costs. The only thing that needs saving 
for retry is the pseudo-random generator state. This suggests simplifying 
or removing "RetryState".


# CODE

The semantics of "cnt" is changed. Ok, the overall counters and their 
relationships make sense, and it simplifies the reporting code. Good.


In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and 
could be dealt with by the default case. We are only interested in 
serialization/deadlocks which are fatal errors?


doRetry: for consistency, given the assert, ISTM that it should return 
false if duration has expired, by testing end_time or timer_exceeded.


checkTransactionStatus: this function does several things at once with 2 
booleans, which make it not very clear to me. Maybe it would be clearer if 
it would just return an enum (in trans, not in trans, conn error, other 
error). Another reason to do that is that on connection error pgbench 
could try to reconnect, which would be an interesting later extension, so 
let's pave the way for that.  Also, I do not think that the function 
should print out a message, it should be the caller decision to do that.


verbose_errors: there is more or less repeated code under RETRY and 
FAILURE, which should be factored out in a separate function. The 
advanceConnectionFunction is long enough. Once this is done, there is no 
need for a getLatencyUsed function.


I'd put cleaning up the pipeline in a function. I do not understand why 
the pipeline mode is not exited in all cases; the code checks for the 
pipeline status twice in a few lines. I'd put this cleanup in the sync 
function as well, report to the caller (advanceConnectionState) if there 
was an error, which would be managed there.
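
To make the pseudo-random-state point concrete, a self-contained sketch of
the idea (the struct and variable names are illustrative, not the patch's):

  #include <stdio.h>
  #include <stdlib.h>

  /* pgbench keeps a per-client erand48-style state; snapshotting it when
   * the script starts and restoring it before a retry makes the retried
   * run see exactly the same random values. */
  typedef struct { unsigned short xseed[3]; } RandomState;

  int main(void)
  {
      RandomState rs = {{1, 2, 3}};
      RandomState snapshot = rs;          /* taken at script start */

      printf("first try: %f\n", erand48(rs.xseed));
      rs = snapshot;                      /* restored on retry */
      printf("retry:     %f\n", erand48(rs.xseed));  /* same value */
      return 0;
  }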

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-15 Thread Yugo NAGATA
Hello,

I attached the updated patch.

On Tue, 13 Jul 2021 15:50:52 +0900
Yugo NAGATA  wrote:
 
> > >> > I'm a little hesitant about how to count and report such unfinished
> > >> > because of bench timeout transactions, though. Not counting them seems
> > >> > to be the best option.

> > > I will fix to finish the benchmark when the time is over during retrying, 
> > > that is,
> > > change the state to CSTATE_FINISHED instead of CSTATE_ERROR in such cases.

Done.
(I wrote CSTATE_ERROR, but correctly it is CSTATE_FAILURE.)
 
Now, once the timer has expired while retrying a failed transaction, pgbench
never starts a new transaction for retry. If the transaction succeeds, it
will be counted in the result. Otherwise, if the transaction fails again, it
is not counted.


In addition, I fixed it to work well with pipeline mode. Previously, pipeline
mode was not considered enough and ROLLBACK was not sent correctly. I fixed
it to handle errors in pipeline mode properly, and now it works.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From 1cd3519f3a1cfbffe7bcce35fd6f12da566625b1 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v15 2/2] Pgbench errors and serialization/deadlock retries

Client's run is aborted in case of a serious error, for example, the
connection with the database server was lost or the end of the script was
reached without completing the last transaction. In addition, if an execution of SQL
or meta command fails for reasons other than serialization or deadlock errors,
the client is aborted. Otherwise, if an SQL fails with serialization or
deadlock errors, the current transaction is rolled back which also
includes setting the client variables as they were before the run of this
transaction (it is assumed that one transaction script contains only one
transaction).

Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option).  These options can be combined
together; moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the option
--max-tries is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, this transaction will be
reported as failed, and the client variables will be set as they were before
the first run of this transaction.

If there are retries and/or failures, their statistics are printed in the
progress reports, in the transaction / aggregation logs, and at the end with
other results (all and for each script). Also retries and failures are printed
per-command with average latencies if you use the appropriate benchmarking
option (--report-per-command, -r). If you want to group failures by basic types
(serialization failures / deadlock failures), use the option --failures-detailed.

If you want to distinguish all errors and failures (errors without retrying) by
type including which limit for retries was violated and how far it was exceeded
for the serialization/deadlock failures, use the options --verbose-errors.
---
 doc/src/sgml/ref/pgbench.sgml| 402 +++-
 src/bin/pgbench/pgbench.c| 974 +--
 src/bin/pgbench/t/001_pgbench_with_server.pl | 217 -
 src/bin/pgbench/t/002_pgbench_no_server.pl   |  10 +
 src/fe_utils/conditional.c   |  16 +-
 src/include/fe_utils/conditional.h   |   2 +
 6 files changed, 1505 insertions(+), 116 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..0a3ccb3c92 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+maximum number of tries: 1
 latency average = 11.013 ms
 latency stddev = 7.351 ms
 initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)
 
 
   The first six lines report some of the most important parameter
-  settings.  The next line reports the number of transactions completed
+  settings.  The seventh line reports the number of transactions completed
   and intended (the latter being just the product of number of clients
   and number of transactions per client); these will be equal unless the run
-  failed before completion.  (In -T mode, only the actual
-  number of transactions is printed.)
+  failed before completion or some SQL command(s) failed.  (In
+  -T mode, only the actual number of transactions is printed.)
+  The next line reports the maximum number of tries for transactions with
+  serialization or deadlock errors (see  for more information).
   The last line reports the number of transactions per second.
  
 
@@ 

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-13 Thread Yugo NAGATA
On Tue, 13 Jul 2021 14:35:00 +0900 (JST)
Tatsuo Ishii  wrote:

> >> > I would tend to agree with this behavior, that is not to start any new
> >> > transaction or transaction attempt once -T has expired.
> > 
> > That is the behavior in the latest patch. Once -T has expired, no new
> > transaction or retry is started.
> 
> Actually v14 has not changed the behavior in this regard as explained
> in different email:

Right. Both v13 and v14 don't start any new transaction or retry once
-T has expired.

> >> > I'm a little hesitant about how to count and report such unfinished
> >> > because of bench timeout transactions, though. Not counting them seems
> >> > to be the best option.
> >> 
> >> I agree.
> > 
> > I also agree. Although I couldn't get an answer about what he thinks the
> > actual harm for users due to termination of retrying by the -T option is,
> > I guess he just complained about reporting the termination of retrying as
> > failures. Therefore, I will fix it to finish the benchmark when the time
> > is over during retrying, that is, change the state to CSTATE_FINISHED
> > instead of CSTATE_ERROR in such cases.
> 
> I guess Fabien wanted it differently. Suppose "-c 10 and -T 30" and we
> have 100 successful transactions by time 25. At time 25 pgbench starts
> the next benchmark cycle, and by time 30 there are 10 failing transactions
> (because they are retrying). pgbench stops the execution at time
> 30. According to your proposal (change the state to CSTATE_FINISHED
> instead of CSTATE_ERROR) the total number of successful transactions will
> be 100 + 10 = 110, right?

No. The last failed transaction is not counted because CSTATE_END_TX is
bypassed, so please don't worry.

> Also, actually I have explained the harm a number of times but you have
> kept on ignoring it because "it's subtle". My request has been pretty
> simple.
> 
> > number of failed transactions: 9 (0.916%)
> 
> I don't like this and want the failed transactions to be 0.
> Who wants a benchmark result having errors?

I was asking you because I would like to confirm what you really complained
about: whether the problem is that a retrying transaction is terminated by
the -T option, or that pgbench reports it as the number of failed
transactions. But now I understand it is the latter: you don't want to count
the termination of retrying as failures. Thanks.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-12 Thread Tatsuo Ishii
>> > I would tend to agree with this behavior, that is not to start any new
>> > transaction or transaction attempt once -T has expired.
> 
> That is the behavior in the latest patch. Once -T has expired, no new
> transaction or retry is started.

Actually v14 has not changed the behavior in this regard as explained
in different email:

> $ pgbench -p 11000 -c 10  -T 10  --max-tries=0 test
> pgbench (15devel, server 13.3)
> starting vacuum...end.
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 1
> query mode: simple
> number of clients: 10
> number of threads: 1
> duration: 10 s
> number of transactions actually processed: 974
> number of failed transactions: 9 (0.916%)
> number of transactions retried: 651 (66.226%)
> total number of retries: 8482
> latency average = 101.317 ms (including failures)
> initial connection time = 44.440 ms
> tps = 97.796487 (without initial connection time)

>> > I'm a little hesitant about how to count and report transactions left
>> > unfinished because of the bench timeout, though. Not counting them seems
>> > to be the best option.
>> 
>> I agree.
> 
> I also agree. Although I couldn't get an answer about what he thinks the
> actual harm for users due to termination of retrying by the -T option is,
> I guess he just complained about reporting the termination of retrying as
> failures. Therefore, I will fix it to finish the benchmark when the time
> is over during retrying, that is, change the state to CSTATE_FINISHED
> instead of CSTATE_ERROR in such cases.

I guess Fabien wanted it differently. Suppose "-c 10 and -T 30" and we
have 100 successful transactions by time 25. At time 25 pgbench starts
the next benchmark cycle, and by time 30 there are 10 failing transactions
(because they are retrying). pgbench stops the execution at time
30. According to your proposal (change the state to CSTATE_FINISHED
instead of CSTATE_ERROR) the total number of successful transactions will
be 100 + 10 = 110, right? I guess Fabien wants the number to
be 100 rather than 110.

Fabien,
Please correct me if you think differently.

Also, actually I have explained the harm a number of times but you have
kept on ignoring it because "it's subtle". My request has been pretty
simple.

> number of failed transactions: 9 (0.916%)

I don't like this and want the failed transactions to be 0.
Who wants a benchmark result having errors?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-12 Thread Yugo NAGATA
On Tue, 13 Jul 2021 13:00:49 +0900 (JST)
Tatsuo Ishii  wrote:

> >>> Or, we should terminate the last cycle of the benchmark, regardless of
> >>> whether it is retrying or not, if -T expires. This will make pgbench
> >>> behave much more consistently.
> > 
> > I would tend to agree with this behavior, that is not to start any new
> > transaction or transaction attempt once -T has expired.

That is the behavior in the latest patch. Once -T has expired, no new
transaction or retry is started.

IIUC, Ishii-san's proposal was to change pgbench's behavior when -T has
expired to terminate any running transactions immediately, regardless of
retrying. I am not sure we should do it in this patch. If we would like
this change, it should be done in another patch as an improvement of the
-T option.

> > I'm a little hesitant about how to count and report transactions left
> > unfinished because of the bench timeout, though. Not counting them seems
> > to be the best option.
> 
> I agree.

I also agree. Although I couldn't get an answer about what he thinks the
actual harm for users due to termination of retrying by the -T option is,
I guess he just complained about reporting the termination of retrying as
failures. Therefore, I will fix it to finish the benchmark when the time
is over during retrying, that is, change the state to CSTATE_FINISHED
instead of CSTATE_ERROR in such cases.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-12 Thread Tatsuo Ishii
>>> Or, we should terminate the last cycle of the benchmark, regardless of
>>> whether it is retrying or not, if -T expires. This will make pgbench
>>> behave much more consistently.
> 
> I would tend to agree with this behavior, that is not to start any new
> transaction or transaction attempt once -T has expired.
> 
> I'm a little hesitant about how to count and report transactions left
> unfinished because of the bench timeout, though. Not counting them seems
> to be the best option.

I agree.

>> Hmmm, indeed this might make the behaviour a bit more consistent, but I
>> am not sure such a behavioural change benefits users.
> 
> The user benefit would be that if they asked for a 100s benchmark,
> pgbench makes a reasonable effort not to overshoot that?

Right.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-10 Thread Fabien COELHO



Hello,


Of course, users themselves should be careful of problematic scripts, but it
would be better if pgbench itself avoided problems beforehand where it can.


Or, we should terminate the last cycle of benchmark regardless of whether it
is retrying or not if -T expires. This will make pgbench behave much more
consistently.


I would tend to agree with this behavior, that is, not to start any new
transaction or transaction attempt once -T has expired.


I'm a little hesitant about how to count and report such transactions left
unfinished because of the bench timeout, though. Not counting them seems to
be the best option.



Hmmm, indeed this might make the behaviour a bit more consistent, but I am
not sure such a behavioural change benefits users.


The user benefit would be that if they asked for a 100s benchmark, pgbench
makes a reasonable effort not to overshoot that?


--
Fabien.




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-10 Thread Tatsuo Ishii
I have played with the v14 patch. I previously complained that pgbench
always reported 9 errors (actually, the number is always the number
specified by "-c" minus 1 in my case).

$ pgbench -p 11000 -c 10  -T 10  --max-tries=0 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 974
number of failed transactions: 9 (0.916%)
number of transactions retried: 651 (66.226%)
total number of retries: 8482
latency average = 101.317 ms (including failures)
initial connection time = 44.440 ms
tps = 97.796487 (without initial connection time)

To reduce the number of errors I provided "--max-tries=9000" because
pgbench reported 8482 retries.

$ pgbench -p 11000 -c 10  -T 10 --max-tries=9000 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 1133
number of failed transactions: 9 (0.788%)
number of transactions retried: 755 (66.112%)
total number of retries: 9278
maximum number of tries: 9000
latency average = 88.570 ms (including failures)
initial connection time = 23.384 ms
tps = 112.015219 (without initial connection time)

Unfortunately this didn't work. Still 9 errors, because pgbench
terminated the last round of the run.

Then I gave up using -T and switched to -t. The number of transactions
for the -t option was calculated from the total number of transactions
actually processed (1133) divided by the number of clients (10) = 113.3.
I rounded 113.3 up to 120 and used that as the -t value. The result:

$ pgbench -p 11000 -c 10  -t 120 --max-tries=9000 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 120
number of transactions actually processed: 1200/1200
number of transactions retried: 675 (56.250%)
total number of retries: 8524
maximum number of tries: 9000
latency average = 93.777 ms
initial connection time = 14.120 ms
tps = 106.635908 (without initial connection time)

Finally I was able to get a result without any errors.  This is not a
super simple way to obtain pgbench results without errors, but
probably I can live with it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-07 Thread Yugo NAGATA
Hello Fabien,

I attached the updated patch (v14)!

On Wed, 30 Jun 2021 17:33:24 +0200 (CEST)
Fabien COELHO  wrote:

> >> --report-latencies -> --report-per-command: should we keep supporting
> >> the previous option?
> >
> > Ok. Although now the option is not only for latencies, considering users
> > who are using the existing option, I'm fine with this. I changed this back
> > to the previous name.
> 
> Hmmm. I liked the new name! My point was whether we need to support the
> old one as well for compatibility, or whether we should not bother. I'm
> still wondering. As I think that the new name is better, I'd suggest
> keeping it.

Ok. I misunderstood it. I returned the option name to report-per-command.

If we keep report-latencies, I can imagine the following choices:
- use report-latencies to print only latency information
- use report-latencies as an alias of report-per-command for compatibility
  and remove it at an appropriate time (that is, treat it as deprecated)

Among these, I prefer the latter because ISTM we would not need many options
for reporting information per command. However, actually, I wonder whether we
have to keep the previous one at all if we plan to remove it eventually.

> >> --failures-detailed: if we bother to run with handling failures, should
> >> it always be on?
> >
> > If we print other failures that cannot be retried in future, it could
> > print a lot of lines and might annoy some users who don't need details of
> > failures. Moreover, some users would always need detailed failure
> > information in the log, and others would need only total numbers of
> > failures.
> 
> Ok.
> 
> > Currently we handle only serialization and deadlock failures, so the
> > number of lines printed and the number of columns of logging is not large
> > even under --failures-detailed, but if we have a chance to handle other
> > failures in future, ISTM adding this option makes sense considering users
> > who would like simple outputs.
> 
> Hmmm. What kind of failures could be managed with retries? I guess that on 
> a connection failure we can try to reconnect, but otherwise it is less 
> clear that other failures make sense to retry.

Indeed, there would be few failures that we should retry, and I cannot
imagine any other than serialization, deadlock, and connection failures for
now. However, considering reporting the number of failed transactions and
their causes in future, as you said

> Given that we manage errors, ISTM that we should not necessarily
> stop on other not retried errors, but rather count/report them and
> possibly proceed. 

, we could define a few more kinds of failures. At least we can consider
meta-command and other SQL command errors in addition to serialization,
deadlock, and connection failures. So, the total number of kinds of failures
would be at least five, and always reporting all of them would result in a
lot of lines and columns in the logging.

> >> --debug-errors: I'm not sure we should want a special debug mode for that,
> >> I'd consider integrating it with the standard debug, or just for 
> >> development.
> >
> > I think --debug is a debug option for telling users about pgbench's
> > internal behavior, that is, which client is doing what. On the other hand,
> > --debug-errors is for telling users what error caused a retry or a failure
> > in detail. For users who are not interested in pgbench's internal behavior
> > (sending a command, receiving a result, ...) but interested in actual
> > errors raised while running the script, this option seems useful.
> 
> Ok. Then this is not really about debug per se, but a verbosity setting?

I think so.

> Maybe --verbose-errors would make more sense? I'm unsure. I'll think about 
> it.

Agreed. This seems more appropriate than the previous one, so I changed the
name to --verbose-errors.

> > Sorry, I couldn't understand your suggestion. Is this about the order of 
> > case
> > statements or pg_log_error?
> 
> My sentence got mixed up. My point was about the case order, so that they 
> are put in a more logical order when reading all the cases.

Ok. Considering the logical order, I moved WAIT_ROLLBACK_RESULT to between
ERROR and RETRY, because WAIT_ROLLBACK_RESULT comes after the ERROR state,
and RETRY comes after ERROR or WAIT_ROLLBACK_RESULT.
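
In other words, the case labels now appear in the order in which the states
are entered -- a minimal sketch of the ordering only, the bodies are not the
actual code:

    switch (st->state)
    {
        case CSTATE_ERROR:
            /* a command failed; issue ROLLBACK if a transaction is open */
            break;
        case CSTATE_WAIT_ROLLBACK_RESULT:
            /* wait until the ROLLBACK completes */
            break;
        case CSTATE_RETRY:
            /* decide whether the failed transaction can be retried */
            break;
        default:
            break;
    }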

> >> Currently, ISTM that the retry on error mode is implicitly always on.
> >> Do we want that? I'd say yes, but maybe people could disagree.
> >
> > The default value of max-tries is 1, so the retry on error is off.
> 
> > Failed transactions are retried only when the user wants it and
> > specifies a valid value to max-tries.
> 
> Ok. My point is that we do not stop on such errors, whereas before ISTM
> that we would have stopped, so somehow the default behavior has changed
> and the previous behavior cannot be reinstated with an option. Maybe that
> is not bad, but this is a behavioral change which needs to be documented
> and argued.

I understood. Indeed, there is a behavioural 

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-07 Thread Yugo NAGATA
On Wed, 07 Jul 2021 21:50:16 +0900 (JST)
Tatsuo Ishii  wrote:

> >> Well, "that's very little, let's ignore it" is not technically a right
> >> direction IMO.
> > 
> > Hmmm, it seems to me these failures are ignorable because, with regard to
> > failures due to -T, they occur only in the last transaction of each client
> > and do not affect the results such as TPS and latency of successfully
> > processed transactions. (although I am not sure in what sense you use the
> > word "technically"...)
> 
> "My application button does not respond once in 100 times. It's just
> 1% error rate. You should ignore it." I would say this attitude is not
> technically correct.

I cannot understand what you want to say. Can reporting the number of
transactions that failed intentionally be treated the same as the error rate
of your application's button?

> > However, maybe I am missing something. Could you please tell me what you
> > think the actual harm for users from failures due to -D is?
> 
> I don't know why you are referring to -D.

Sorry. It's just a typo as you can imagine.
I am asking you what you think the actual harm for users due to termination
of retrying by the -T option is.

> >> That's not necessarily true in practice. By the time -T is about to
> >> expire, transactions have all finished in finite time, as you can see in
> >> the result I showed. So it's reasonable that the very last cycle of
> >> the benchmark will finish in finite time as well.
> > 
> > Your script may finish in finite time, but others may not.
> 
> That's why I said "practically". In other words "in most cases the
> scenario will finish in finite time".

Sure.

> > Indeed, it is possible that the execution of a query takes a long or
> > infinite time. However, the cause would be a problematic query in the
> > custom script or other problems occurring on the server side. These are
> > not problems of pgbench, and pgbench itself can't control them either. On
> > the other hand, the unlimited number of tries is a behaviour specified by
> > a pgbench option, so I think pgbench itself should internally avoid
> > problems caused by its own behaviour. That is, if max-tries=0 could cause
> > an infinite or much longer benchmark time than the user expected due to
> > too many retries, I think pgbench should avoid it.
> 
> I would say it's the user's responsibility to avoid infinitely running
> benchmarks. Remember, pgbench is a tool for serious users, not for
> novice users.

Of course, users themselves should be careful of problematic scripts, but it
would be better if pgbench itself avoided problems beforehand where it can.
 
> Or, we should terminate the last cycle of benchmark regardless of whether
> it is retrying or not if -T expires. This will make pgbench behave much
> more consistently.

Hmmm, indeed this might make the behaviour a bit more consistent, but I am
not sure such a behavioural change benefits users.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-07 Thread Tatsuo Ishii
>> Well, "that's very little, let's ignore it" is not technically a right
>> direction IMO.
> 
> Hmmm, it seems to me these failures are ignorable because, with regard to
> failures due to -T, they occur only in the last transaction of each client
> and do not affect the results such as TPS and latency of successfully
> processed transactions. (although I am not sure in what sense you use the
> word "technically"...)

"My application button does not respond once in 100 times. It's just
1% error rate. You should ignore it." I would say this attitude is not
technically correct.

> However, maybe I am missing something. Could you please tell me what you
> think the actual harm for users from failures due to -D is?

I don't know why you are referring to -D.

>> That's not necessarily true in practice. By the time -T is about to
>> expire, transactions have all finished in finite time, as you can see in
>> the result I showed. So it's reasonable that the very last cycle of
>> the benchmark will finish in finite time as well.
> 
> Your script may finish in finite time, but others may not.

That's why I said "practically". In other words "in most cases the
scenario will finish in finite time".

> Indeed, it is possible that the execution of a query takes a long or
> infinite time. However, the cause would be a problematic query in the
> custom script or other problems occurring on the server side. These are not
> problems of pgbench, and pgbench itself can't control them either. On the
> other hand, the unlimited number of tries is a behaviour specified by a
> pgbench option, so I think pgbench itself should internally avoid problems
> caused by its own behaviour. That is, if max-tries=0 could cause an
> infinite or much longer benchmark time than the user expected due to too
> many retries, I think pgbench should avoid it.

I would say it's the user's responsibility to avoid infinitely running
benchmarks. Remember, pgbench is a tool for serious users, not for
novice users.

Or, we should terminate the last cycle of benchmark regardless of whether it
is retrying or not if -T expires. This will make pgbench behave much more
consistently.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-07 Thread Yugo NAGATA
On Wed, 07 Jul 2021 16:11:23 +0900 (JST)
Tatsuo Ishii  wrote:

> > Indeed, as Ishii-san pointed out, some users might not want to terminate
> > retrying transactions due to -T. However, the actual negative effect is
> > only printing the number of failed transactions. The other results that
> > users want to know, such as tps, are almost unaffected because they are
> > measured over successfully processed transactions. Actually, the
> > percentage of failed transactions is very small, only 0.347%.
> 
> Well, "that's very little, let's ignore it" is not technically a right
> direction IMO.

Hmmm, it seems to me these failures are ignorable because, with regard to
failures due to -T, they occur only in the last transaction of each client
and do not affect the results such as TPS and latency of successfully
processed transactions. (although I am not sure in what sense you use the
word "technically"...)

However, maybe I am missing something. Could you please tell me what you
think the actual harm for users from failures due to -D is?

> > In the existing behaviour, running transactions are never terminated due
> > to the -T option. However, ISTM that this is based on an assumption that
> > the latency of each transaction is small and that the moment when we can
> > finish the benchmark will come soon. On the other hand, when transactions
> > can be retried unlimitedly, it may take much longer than expected, and we
> > cannot guarantee that this will finish successfully in limited time.
> > Therefore, terminating the benchmark by giving up retrying the
> > transaction after time expiration seems reasonable under unlimited
> > retries.
> 
> That's not necessarily true in practice. By the time -T is about to
> expire, transactions have all finished in finite time, as you can see in
> the result I showed. So it's reasonable that the very last cycle of
> the benchmark will finish in finite time as well.

Your script may finish in finite time, but others may not. However,
considering only serialization and deadlock errors, almost all transactions
would finish in finite time eventually. In the previous version of the
patch, errors other than serialization or deadlock could be retried, and
that easily caused unlimited retrying. Now, only those two kinds of errors
can be retried; nevertheless, it is unclear to me whether we can assume that
retrying will finish in finite time. If we can assume it, maybe we can
remove the restriction that --max-tries=0 must be used with
--latency-limit or -T.

> Of course if a benchmark cycle takes infinite time, this will be a
> problem. However, the same thing can be said of non-retry
> benchmarks. Theoretically it is possible that *one* benchmark cycle
> takes forever. In this case the only solution will be just hitting ^C
> to terminate pgbench. Why can't we have the same assumption in the
> --max-tries=0 case?

Indeed, it is possible that the execution of a query takes a long or infinite
time. However, the cause would be a problematic query in the custom script
or other problems occurring on the server side. These are not problems of
pgbench, and pgbench itself can't control them either. On the other hand, the
unlimited number of tries is a behaviour specified by a pgbench option,
so I think pgbench itself should internally avoid problems caused by its own
behaviour. That is, if max-tries=0 could cause an infinite or much longer
benchmark time than the user expected due to too many retries, I think
pgbench should avoid it.

> > In the sense that we don't terminate running transactions forcibly,
> > this doesn't change the existing behaviour.
> 
> This statement seems to depend on your personal assumption.

Ok. If we regard a transaction as still running even when it is being
retried after an error, terminating the retry may imply terminating the
running transaction forcibly.

> I still don't understand why you think that --max-tries non 0 case
> will *certainly* finish in finite time whereas --max-tries=0 case will
> not.

I just mean that --max-tries greater than zero will prevent pgbench from 
retrying a
transaction forever.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-07 Thread Tatsuo Ishii
> Indeed, as Ishii-san pointed out, some users might not want to terminate
> retrying transactions due to -T. However, the actual negative effect is
> only printing the number of failed transactions. The other results that
> users want to know, such as tps, are almost unaffected because they are
> measured over successfully processed transactions. Actually, the
> percentage of failed transactions is very small, only 0.347%.

Well, "that's very little, let's ignore it" is not technically a right
direction IMO.

> In the existing behaviour, running transactions are never terminated due to
> the -T option. However, ISTM that this is based on an assumption that the
> latency of each transaction is small and that the moment when we can finish
> the benchmark will come soon. On the other hand, when transactions can be
> retried unlimitedly, it may take much longer than expected, and we cannot
> guarantee that this will finish successfully in limited time. Therefore,
> terminating the benchmark by giving up retrying the transaction after time
> expiration seems reasonable under unlimited retries.

That's not necessarily true in practice. By the time -T is about to
expire, transactions have all finished in finite time, as you can see in
the result I showed. So it's reasonable that the very last cycle of
the benchmark will finish in finite time as well.

Of course if a benchmark cycle takes infinite time, this will be a
problem. However, the same thing can be said of non-retry
benchmarks. Theoretically it is possible that *one* benchmark cycle
takes forever. In this case the only solution will be just hitting ^C
to terminate pgbench. Why can't we have the same assumption in the
--max-tries=0 case?

> In the sense that we don't terminate running transactions forcibly,
> this doesn't change the existing behaviour.

This statement seems to depend on your personal assumption.

I still don't understand why you think that --max-tries non 0 case
will *certainly* finish in finite time whereas --max-tries=0 case will
not.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-06 Thread Yugo NAGATA
Hello Ishii-san,

On Fri, 02 Jul 2021 09:25:03 +0900 (JST)
Tatsuo Ishii  wrote:

> I have found an interesting result from patched pgbench (I have set
> the isolation level to REPEATABLE READ):
> 
> $ pgbench -p 11000 -c 10  -T 30  --max-tries=0 test
> pgbench (15devel, server 13.3)
> starting vacuum...end.
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 1
> query mode: simple
> number of clients: 10
> number of threads: 1
> duration: 30 s
> number of transactions actually processed: 2586
> number of failed transactions: 9 (0.347%)
> ~
> number of transactions retried: 1892 (72.909%)
> total number of retries: 21819
> latency average = 115.551 ms (including failures)
> initial connection time = 35.268 ms
> tps = 86.241799 (without initial connection time)
> 
> I ran pgbench with 10 concurrent sessions. In this case pgbench always
> reports 9 failed transactions regardless of the setting of the -T
> option. This is because at the end of a pgbench session, only 1 out of
> 10 transactions succeeded, while 9 transactions failed due to
> serialization errors without any chance to retry because -T expired.
> 
> This is a little bit disappointing because I wanted to see a result where
> all transactions succeeded with retries.  I tried -t instead of -T but
> -t cannot be used with --max-tries=0.
> 
> Also I think this behavior is somewhat inconsistent with the existing
> behavior of pgbench. When pgbench runs without the --max-tries option,
> pgbench continues to run transactions even after -T expires:
> 
> $ time pgbench -p 11000 -T 10 -f pgbench.sql test
> pgbench (15devel, server 13.3)
> starting vacuum...end.
> transaction type: pgbench.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 10 s
> number of transactions actually processed: 2
> maximum number of tries: 1
> latency average = 7009.006 ms
> initial connection time = 8.045 ms
> tps = 0.142674 (without initial connection time)
> 
> real  0m14.067s
> user  0m0.010s
> sys   0m0.004s
> 
> $ cat pgbench.sql
> SELECT pg_sleep(7);
> 
> So pgbench does not stop transactions after 10 seconds have passed but
> waits for the last transaction to complete. If we are to be consistent with
> this behavior when --max-tries=0, shouldn't we retry until the last
> transaction finishes?

I changed the previous patch so that the -T option can terminate
a retrying transaction and so that we can specify --max-tries=0 without
--latency-limit if we have -T, according to the following comment.

> Doc says "you cannot use an infinite number of retries without
> latency-limit..."
> 
> Why should this be forbidden? At least if the -T timeout takes precedence
> and shortens the execution, ISTM that there could be good reason to test
> that. Maybe it could be blocked only under -t if this would lead to a
> non-ending run.

Indeed, as Ishii-san pointed out, some users might not want to terminate
retrying transactions due to -T. However, the actual negative effect is only
printing the number of failed transactions. The other results that users want
to know, such as tps, are almost unaffected because they are measured over
successfully processed transactions. Actually, the percentage of failed
transactions is very small, only 0.347%.

In the existing behaviour, running transactions are never terminated due to
the -T option. However, ISTM that this is based on an assumption that the
latency of each transaction is small and that the moment when we can finish
the benchmark will come soon. On the other hand, when transactions can be
retried unlimitedly, it may take much longer than expected, and we cannot
guarantee that this will finish successfully in limited time. Therefore,
terminating the benchmark by giving up retrying the transaction after time
expiration seems reasonable under unlimited retries. In the sense that we
don't terminate running transactions forcibly, this doesn't change the
existing behaviour.

If you don't want to print the number of transactions failed due to -T, we
can forbid using -T without latency-limit under max-tries=0 to avoid a
possible never-ending benchmark. In this case, users have to limit the number
of transaction retries by specifying latency-limit or max-tries (>0).
However, if some users would like to benchmark simply allowing unlimited
retries, using -T and max-tries=0 seems the most straightforward way, so I
think it is better that they can be used together.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-05 Thread Yugo NAGATA
Hello Ishii-san,

On Thu, 01 Jul 2021 09:03:42 +0900 (JST)
Tatsuo Ishii  wrote:

> > v13 patches gave a compiler warning...
> > 
> > $ make >/dev/null
> > pgbench.c: In function ‘commandError’:
> > pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable]
> >   const Command *command = sql_script[st->use_file].commands[st->command];
> >  ^~~

Hmm, we'll get the warning when --enable-cassert is not specified.
I'll fix it.
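
(One conventional fix -- assuming the variable is only read inside an
Assert() -- would be the PG_USED_FOR_ASSERTS_ONLY marker:

    const Command *command PG_USED_FOR_ASSERTS_ONLY =
        sql_script[st->use_file].commands[st->command];

or simply folding the expression into the Assert() call itself.)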

> There is a typo in the doc (more over ->  moreover).
> 
> >of all transaction tries; more over, you cannot use an unlimited 
> > number
> 
> of all transaction tries; moreover, you cannot use an unlimited number
> 

Thanks. I'll fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-07-01 Thread Tatsuo Ishii
I have found an interesting result from patched pgbench (I have set
the isolation level to REPEATABLE READ):

$ pgbench -p 11000 -c 10  -T 30  --max-tries=0 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2586
number of failed transactions: 9 (0.347%)
~
number of transactions retried: 1892 (72.909%)
total number of retries: 21819
latency average = 115.551 ms (including failures)
initial connection time = 35.268 ms
tps = 86.241799 (without initial connection time)
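
(For reference, the isolation level was set outside pgbench; one way to do
it, not necessarily what I did here, is to change the database default
before the run:

    ALTER DATABASE test SET default_transaction_isolation TO 'repeatable read';

so that every pgbench connection to the test database runs under that level.)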

I ran pgbench with 10 concurrent sessions. In this case pgbench always
reports 9 failed transactions regardless of the setting of the -T
option. This is because at the end of a pgbench session, only 1 out of
10 transactions succeeded, while 9 transactions failed due to
serialization errors without any chance to retry because -T expired.

This is a little bit disappointing because I wanted to see a result where
all transactions succeeded with retries.  I tried -t instead of -T but
-t cannot be used with --max-tries=0.

Also I think this behavior is somewhat inconsistent with the existing
behavior of pgbench. When pgbench runs without the --max-tries option,
pgbench continues to run transactions even after -T expires:

$ time pgbench -p 11000 -T 10 -f pgbench.sql test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: pgbench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 2
maximum number of tries: 1
latency average = 7009.006 ms
initial connection time = 8.045 ms
tps = 0.142674 (without initial connection time)

real  0m14.067s
user  0m0.010s
sys   0m0.004s

$ cat pgbench.sql
SELECT pg_sleep(7);

So pgbench does not stop transactions after 10 seconds have passed but
waits for the last transaction to complete. If we are to be consistent with
this behavior when --max-tries=0, shouldn't we retry until the last
transaction finishes?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-30 Thread Tatsuo Ishii
> v13 patches gave a compiler warning...
> 
> $ make >/dev/null
> pgbench.c: In function ‘commandError’:
> pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable]
>   const Command *command = sql_script[st->use_file].commands[st->command];
>  ^~~

There is a typo in the doc (more over ->  moreover).

>of all transaction tries; more over, you cannot use an unlimited number

of all transaction tries; moreover, you cannot use an unlimited number

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-30 Thread Tatsuo Ishii
> I attached the patch updated according to your suggestion.

v13 patches gave a compiler warning...

$ make >/dev/null
pgbench.c: In function ‘commandError’:
pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable]
  const Command *command = sql_script[st->use_file].commands[st->command];
 ^~~

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-30 Thread Fabien COELHO


Hello Yugo-san,

Thanks for the update!


Patch seems to apply cleanly with "git apply", but does not compile on my
host: "undefined reference to `conditional_stack_reset'".

However it works better when using the "patch". I'm wondering why git
apply fails silently…


Hmm, I don't know why your compilation fails... I can apply and compile
successfully using git.


Hmmm. Strange!

Given that we manage errors, ISTM that we should not necessarily stop
on other not retried errors, but rather count/report them and possibly
proceed.  Eg with something like: [...] We could count the failure,
rollback if necessary, and go on.  What do you think? Maybe such
behavior would deserve an option.


This feature to count failures that could occur at runtime seems nice.
However, as discussed in [1], I think it is better to focus only on failures
that can be retried in this patch, and introduce the feature to handle other
failures in a separate patch.


Ok.


--report-latencies -> --report-per-command: should we keep supporting
the previous option?


Ok. Although now the option is not only for latencies, considering users who
are using the existing option, I'm fine with this. I changed this back to the
previous name.


Hmmm. I liked the new name! My point was whether we need to support the
old one as well for compatibility, or whether we should not bother. I'm
still wondering. As I think that the new name is better, I'd suggest
keeping it.



--failures-detailed: if we bother to run with handling failures, should
it always be on?


If we print other failures that cannot be retried in future, it could print
a lot of lines and might annoy some users who don't need details of failures.
Moreover, some users would always need detailed failure information in the
log, and others would need only total numbers of failures.


Ok.


Currently we handle only serialization and deadlock failures, so the number
of lines printed and the number of columns of logging is not large even under
--failures-detailed, but if we have a chance to handle other failures in
future, ISTM adding this option makes sense considering users who would like
simple outputs.


Hmmm. What kind of failures could be managed with retries? I guess that on 
a connection failure we can try to reconnect, but otherwise it is less 
clear that other failures make sense to retry.



--debug-errors: I'm not sure we should want a special debug mode for that,
I'd consider integrating it with the standard debug, or just for development.


I think --debug is a debug option for telling users about pgbench's internal
behavior, that is, which client is doing what. On the other hand,
--debug-errors is for telling users what error caused a retry or a failure in
detail. For users who are not interested in pgbench's internal behavior
(sending a command, receiving a result, ...) but interested in actual errors
raised while running the script, this option seems useful.


Ok. Then this is not really about debug per se, but a verbosity setting?
Maybe --verbose-errors would make more sense? I'm unsure. I'll think about 
it.



Also, should it use pg_log_debug?


If we use pg_log_debug, the message is printed only under --debug.
Therefore, I fixed it to use pg_log_info instead of pg_log_error or fprintf.


Ok, pg_log_info seems right.


Tries vs retries: I'm at odds with having tries & retries and + 1 here
and there to handle that, which is a little bit confusing. I'm wondering whether
we could only count "tries" and adjust to report what we want later?


I fixed it to use "tries" instead of "retries" in CState. However, we still
use "retries" in StatsData and Command because the number of retries is
printed in the final result. Is it less confusing than before?


I'm going to think about it.


advanceConnectionState: ISTM that ERROR should logically be before others which
lead to it.


Sorry, I couldn't understand your suggestion. Is this about the order of case
statements or pg_log_error?


My sentence got mixed up. My point was about the case order, so that they 
are put in a more logical order when reading all the cases.



Currently, ISTM that the retry on error mode is implicitly always on.
Do we want that? I'd say yes, but maybe people could disagree.


The default value of max-tries is 1, so the retry on error is off.



Failed transactions are retried only when the user wants it and
specifies a valid value to max-tries.


Ok. My point is that we do not stop on such errors, whereas before ISTM
that we would have stopped, so somehow the default behavior has changed
and the previous behavior cannot be reinstated with an option. Maybe that
is not bad, but this is a behavioral change which needs to be documented
and argued.


See the attached files for generating deadlocks reliably (start with 2
clients). What do you think? The PL/pgSQL is minimal; it is really
client-code oriented.


Sorry, but I cannot find the attached file.


Sorry. Attached to this mail. The serialization stuff 

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-30 Thread Yugo NAGATA
Hello Fabien,

On Sat, 26 Jun 2021 12:15:38 +0200 (CEST)
Fabien COELHO  wrote:

> 
> Hello Yugo-san,
> 
> # About v12.2
> 
> ## Compilation
> 
> Patch seems to apply cleanly with "git apply", but does not compile on my 
> host: "undefined reference to `conditional_stack_reset'".
> 
> However it works better when using the "patch". I'm wondering why git 
> apply fails silently…

Hmm, I don't know why your compilation fails... I can apply and compile
successfully using git.

> When compiling there are warnings about "pg_log_fatal", which does not 
> expect a FILE* on pgbench.c:4453. Remove the "stderr" argument.

Ok.

> Global and local checks ok.
> 
> > number of transactions failed: 324 (3.240%)
> > ...
> > number of transactions retried: 5629 (56.290%)
> > number of total retries: 103299
> 
> I'd suggest: "number of failed transactions". "total number of retries" or 
> just "number of retries"?

Ok. I fixed it to use "number of failed transactions" and "total number of
retries".

> ## Feature
> 
> The overall code structure changes to implement the feature seem
> reasonable to me, as we are at the 12th iteration of the patch.
> 
> Comments below are somehow about details and asking questions
> about choices, and commenting…
> 
> ## Documentation
> 
> There is a lot of documentation, which is good. I'll review these
> separately. It looks good, but having a native English speaker/writer
> would really help!
> 
> Some output examples do not correspond to actual output for
> the current version. In particular, there is always one TPS figure
> given now, instead of the confusing two shown before.

Fixed.

> ## Comments
> 
> transactinos -> transactions.

Fixed.

> ## Code
> 
> By default max_tries = 0. Should not the initialization be 1,
> as the documentation argues that it is the default?

Ok. I fixed the default value to 1.

> Counter comments, missing + in the formula on the skipped line.

Fixed.

> Given that we manage errors, ISTM that we should not necessarily
> stop on other not retried errors, but rather count/report them and
> possibly proceed.  Eg with something like:
> 
>-- server side random fail
>DO LANGUAGE plpgsql $$
>BEGIN
>  IF RANDOM() < 0.1 THEN
>RAISE EXCEPTION 'unlucky!';
>  END IF;
>END;
>$$;
> 
> Or:
> 
>-- client side random fail
>BEGIN;
>\if random(1, 10) <= 1
>SELECT 1 +;
>\else
>SELECT 2;
>\endif
>COMMIT;
> 
> We could count the failure, rollback if necessary, and go on.  What do you think?
> Maybe such behavior would deserve an option.

This feature to count failures that could occur at runtime seems nice.
However, as discussed in [1], I think it is better to focus only on failures
that can be retried in this patch, and introduce the feature to handle other
failures in a separate patch.

[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.21.1809121519590.13887%40lancre

> --report-latencies -> --report-per-command: should we keep supporting
> the previous option?

Ok. Although now the option is not only for latencies, considering users who
are using the existing option, I'm fine with this. I changed this back to the
previous name.

> --failures-detailed: if we bother to run with handling failures, should
> it always be on?

If we print other failures that cannot be retried in future, it could print
a lot of lines and might annoy some users who don't need details of failures.
Moreover, some users would always need detailed failure information in the
log, and others would need only total numbers of failures.

Currently we handle only serialization and deadlock failures, so the number
of lines printed and the number of columns of logging is not large even under
--failures-detailed, but if we have a chance to handle other failures in
future, ISTM adding this option makes sense considering users who would like
simple outputs.
 
> --debug-errors: I'm not sure we should want a special debug mode for that,
> I'd consider integrating it with the standard debug, or just for development.

I think --debug is a debug option for telling users about pgbench's internal
behavior, that is, which client is doing what. On the other hand,
--debug-errors is for telling users what error caused a retry or a failure in
detail. For users who are not interested in pgbench's internal behavior
(sending a command, receiving a result, ...) but interested in actual errors
raised while running the script, this option seems useful.

> Also, should it use pg_log_debug?

If we use pg_log_debug, the message is printed only under --debug.
Therefore, I fixed it to use pg_log_info instead of pg_log_error or fprintf.
 
> doRetry: I'd separate the 3 no-retry conditions instead of mixing
> max_tries and timer_exceeded, for clarity.

Ok. I fixed it to separate them.
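
To sketch what the separated checks look like (names as in the patch; this
is an illustration rather than the exact code):

    /* return true if the failed transaction should be tried again */
    static bool
    doRetry(CState *st, pg_time_usec_t *now)
    {
        /* no retry once the maximum number of tries is reached */
        if (max_tries && st->tries >= max_tries)
            return false;

        /* no retry if the --latency-limit budget is already spent */
        if (latency_limit && *now - st->txn_scheduled > latency_limit)
            return false;

        /* no retry once the -T duration has expired */
        if (timer_exceeded)
            return false;

        return true;
    }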
 
> Tries vs retries: I'm at odds with having tries & retries and + 1 here
> and there to handle that, which is a little bit confusing. I'm wondering 
> whether
> we could only count "tries" and adjust 

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-26 Thread Fabien COELHO


Hello Yugo-san,

# About v12.2

## Compilation

Patch seems to apply cleanly with "git apply", but does not compile on my 
host: "undefined reference to `conditional_stack_reset'".


However it works better when using the "patch". I'm wondering why git 
apply fails silently…


When compiling there are warnings about "pg_log_fatal", which does not 
expect a FILE* on pgbench.c:4453. Remove the "stderr" argument.


Global and local checks ok.


number of transactions failed: 324 (3.240%)
...
number of transactions retried: 5629 (56.290%)
number of total retries: 103299


I'd suggest: "number of failed transactions". "total number of retries" or 
just "number of retries"?


## Feature

The overall code structure changes to implement the feature seem
reasonable to me, as we are at the 12th iteration of the patch.


Comments below are somehow about details and asking questions
about choices, and commenting…

## Documentation

There is a lot of documentation, which is good. I'll review these
separately. It looks good, but having a native English speaker/writer
would really help!

Some output examples do not correspond to actual output for
the current version. In particular, there is always one TPS figure
given now, instead of the confusing two shown before.

## Comments

transactinos -> transactions.

## Code

By default max_tries = 0. Should not the initialization be 1,
as the documentation argues that it is the default?

Counter comments, missing + in the formula on the skipped line.

Given that we manage errors, ISTM that we should not necessarily
stop on other not retried errors, but rather count/report them and
possibly proceed.  Eg with something like:

  -- server side random fail
  DO LANGUAGE plpgsql $$
  BEGIN
IF RANDOM() < 0.1 THEN
  RAISE EXCEPTION 'unlucky!';
END IF;
  END;
  $$;

Or:

  -- client side random fail
  BEGIN;
  \if random(1, 10) <= 1
  SELECT 1 +;
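  -- (the trailing "+" above is deliberate: invalid SQL, so the command fails)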
  \else
  SELECT 2;
  \endif
  COMMIT;

We could count the failure, rollback if necessary, and go on.  What do you think?
Maybe such behavior would deserve an option.

--report-latencies -> --report-per-command: should we keep supporting
the previous option?

--failures-detailed: if we bother to run with handling failures, should
it always be on?

--debug-errors: I'm not sure we should want a special debug mode for that,
I'd consider integrating it with the standard debug, or just for development.
Also, should it use pg_log_debug?

doRetry: I'd separate the 3 no-retry conditions instead of mixing max_tries
and timer_exceeded, for clarity.

Tries vs retries: I'm at odds with having tries & retries and + 1 here
and there to handle that, which is a little bit confusing. I'm wondering whether
we could only count "tries" and adjust to report what we want later?

advanceConnectionState: ISTM that ERROR should logically be before others which
lead to it.

Variables management: it looks expensive, with copying and freeing variable 
arrays.
I'm wondering whether we should think of something more clever. Well, that 
would be
for some other patch.

"Accumulate the retries" -> "Count (re)tries"?

Currently, ISTM that the retry on error mode is implicitly always on.
Do we want that? I'd say yes, but maybe people could disagree.

## Tests

There are tests, good!

I'm wondering whether something simpler could be devised to trigger
serialization or deadlock errors, eg with a SEQUENCE and an \if.

See the attached files for generating deadlocks reliably (start with 2 clients).
What do you think? The PL/pgSQL is minimal; it is really client-code
oriented.


Given that deadlocks are detected about every second, the test runs
would take some time. Let it be for now.
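
To illustrate the SEQUENCE and \if idea above -- only a sketch, untested,
assuming "CREATE SEQUENCE seq;" was run beforehand -- the sequence parity
makes the two clients take the two row locks in opposite orders, so
deadlocks arise quickly:

    -- deadlock trigger, run with 2 clients
    BEGIN;
    SELECT nextval('seq') % 2 AS odd \gset
    \if :odd
      UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1;
      UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 2;
    \else
      UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 2;
      UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1;
    \endif
    COMMIT;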

--
Fabien.

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-26 Thread Fabien COELHO


Hello Yugo-san,


I'm wondering whether we could use "vars" instead of "variables" as a
struct field name and function parameter name, so that it is shorter and
more distinct from the type name "Variables". What do you think?


The struct "Variables" has a field named "vars" which is an array of
"Variable" type. I guess this is a reason why "variables" is used instead
of "vars" as a name of "Variables" type variable so that we could know
a variable's type is Variable or Variables.  Also, in order to refer to
the field, we would use

vars->vars[vars->nvars]

and there are nested "vars". Could this confuse a code reader?


Hmmm… Probably. Let's keep "variables" then.

--
Fabien.

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-25 Thread Yugo NAGATA
On Wed, 23 Jun 2021 10:38:43 +0200 (CEST)
Fabien COELHO  wrote:

> 
> Hello Yugo-san:
> 
> # About v12.1
> 
> This is a refactoring patch, which creates a separate structure for 
> holding variables. This will become handy in the next patch. There is also 
> a benefit from a software engineering point of view, so it has merit on 
> its own.

> ## Compilation
> 
> Patch applies cleanly, compiles, global & local checks pass.
> 
> ## About the code
> 
> Fine.
> 
> I'm wondering whether we could use "vars" instead of "variables" as a 
> struct field name and function parameter name, so that it is shorter and
> more distinct from the type name "Variables". What do you think?

The struct "Variables" has a field named "vars" which is an array of
"Variable" type. I guess this is a reason why "variables" is used instead
of "vars" as a name of "Variables" type variable so that we could know
a variable's type is Variable or Variables.  Also, in order to refer to
the field, we would use

 vars->vars[vars->nvars]

and there are nested "vars". Could this confuse a code reader?


> ## About comments
> 
> Remove the comment on enlargeVariables about "It is assumed …"; the issue
> of trying MAXINT vars is more than remote and is not worth mentioning. In
> the same function, remove the comments about MARGIN; it is already on the
> macro declaration, once is enough.

Sure. I'll remove them.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-23 Thread Fabien COELHO


Hello Yugo-san:

# About v12.1

This is a refactoring patch, which creates a separate structure for 
holding variables. This will become handy in the next patch. There is also 
a benefit from a software engineering point of view, so it has merit on 
its own.


## Compilation

Patch applies cleanly, compiles, global & local checks pass.

## About the code

Fine.

I'm wondering whether we could use "vars" instead of "variables" as a 
struct field name and function parameter name, so that it is shorter and
more distinct from the type name "Variables". What do you think?


## About comments

Remove the comment on enlargeVariables about "It is assumed …"; the issue
of trying MAXINT vars is more than remote and is not worth mentioning. In
the same function, remove the comments about MARGIN; it is already on the
macro declaration, once is enough.


--
Fabien.

Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-22 Thread Yugo NAGATA
Hello Fabien,

On Tue, 22 Jun 2021 20:03:58 +0200 (CEST)
Fabien COELHO  wrote:

> 
> Hello Yugo-san,
> 
> Thanks a lot for continuing this work started by Marina!
> 
> I'm planning to review it for the July CF. I've just added an entry there:
> 
>   https://commitfest.postgresql.org/33/3194/

Thanks!

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-22 Thread Fabien COELHO



Hello Yugo-san,

Thanks a lot for continuing this work started by Marina!

I'm planning to review it for the July CF. I've just added an entry there:

https://commitfest.postgresql.org/33/3194/

--
Fabien.




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-22 Thread Yugo NAGATA
Hi hackers,

On Mon, 24 May 2021 11:29:10 +0900
Yugo NAGATA  wrote:

> Hi hackers,
> 
> On Tue, 10 Mar 2020 09:48:23 +1300
> Thomas Munro  wrote:
> 
> > On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO  wrote:
> > > >> Thank you very much! I'm going to send a new patch set until the end of
> > > >> this week (I'm sorry I was very busy in the release of Postgres Pro
> > > >> 11...).
> > > >
> > > > Is anyone interested in rebasing this, and summarising what needs to
> > > > be done to get it in?  It's arguably a bug or at least quite
> > > > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > > > that a couple of forks already ship Marina's patch set.
> 
I got interested in this and am now looking into the patch and the past
discussion. If no one else will do it and there are no objections, I would
like to rebase this. Is that okay?

I rebased and fixed the previous patches (v11) written by Marina Polyakova,
and attached the revised version (v12).

v12-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).

v12-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of 
transactions with serialization/deadlock failures (see the detailed 
description in the file).

These are the revised versions of v11-0002 and v11-0003. v11-0001
(for the RandomState structure) is not included because it has already
been committed (40923191944). v11-0004 (for a separate error reporting
function) is not included either, because pgbench now uses common logging
APIs (30a3e772b40).

In addition to rebasing on master, I updated the patch according to the
review from Fabien COELHO [1] and the discussions after it. Also, I added
some other fixes found while reviewing the previous patch.

[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.21.1809081450100.10506%40lancre

Following are fixes according to Fabien's review.

> * Features

> As far as the actual retry feature is concerned, I'd say we are nearly 
> there. However I have an issue with changing the behavior on meta command 
> and other sql errors, which I find not desirable.
...
> I do not think that these changes of behavior are desirable. Meta command and
> miscellaneous SQL errors should result in immediately aborting the whole run,
> because the client test code itself could not run correctly or the SQL sent
> was somehow wrong, which is also the client's fault, and the server 
> performance bench does not make much sense in such conditions.
> 
> ISTM that the focus of this patch should only be to handle some server
> runtime errors that can be retried, but not to change pgbench behavior on
> other kinds of errors. If these are to be changed, ISTM that it would be a
> distinct patch and would require some discussion, and possibly an option
> to enable it or not if some use case emerges. As far as this patch is
> concerned, I'd suggest leaving that out.

Previously, all SQL and meta command errors could be retried, but I changed
it to allow only serialization & deadlock errors to be retried.
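
In code terms, the check now boils down to something like this (a sketch;
names as in the patch):

    /* only serialization (SQLSTATE 40001) and deadlock (40P01)
     * failures are candidates for a retry */
    static bool
    canRetryError(EStatus estatus)
    {
        return (estatus == ESTATUS_SERIALIZATION_ERROR ||
                estatus == ESTATUS_DEADLOCK_ERROR);
    }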

> Doc says "you cannot use an infinite number of retries without
> latency-limit..."
> 
> Why should this be forbidden? At least if the -T timeout takes precedence
> and shortens the execution, ISTM that there could be good reason to test
> that. Maybe it could be blocked only under -t if this would lead to a
> non-ending run.

I fixed it to allow using --max-tries with the -T option even if
latency-limit is not used.

> As "--print-errors" is really for debug, maybe it could be named
> "--debug-errors". I'm not sure that having "--debug" implying this option
> is useful: As there are two distinct options, the user may be allowed
> to trigger one or the other as they wish?

print-errors was renamed to debug-errors.

> makeVariableValue error message is not for debug, but must be kept in all
> cases, and the false returned must result in an immediate abort. Same thing 
> about
> lookupCreateVariable, an invalid name is a user error which warrants an 
> immediate
> abort. Same thing again about coerce* functions or evalStandardFunc...
> Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable.

"DEBUG_ERRORS" messages unrelated to serialization & deadlock errors were 
removed.

> sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff 
> is
> really about the transaction script, not dealing with errors, esp as there is 
> no
> significant advantage in preparing a "ROLLBACK" statement which is short and 
> has
> no parameters. I'd suggest to remove this function and just issue
> PQsendQuery("ROLLBACK;") in all cases.

Now, we just issue PQsendQuery("ROLLBACK;").
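
(i.e., the error path is reduced to roughly this -- a sketch, with the
surrounding state handling omitted:

    if (!PQsendQuery(st->con, "ROLLBACK;"))
    {
        pg_log_error("client %d failed to send ROLLBACK", st->id);
        st->state = CSTATE_ABORTED;
    }
)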

> In copyVariables, I'd simplify
>
>  + if (source_var->svalue == NULL)
>  +   dest_var->svalue = NULL;
>  + else
>  +   dest_var->svalue =