Re: seems like a bug in pgbench -R

2019-07-26 Thread Tom Lane
Fabien COELHO  writes:
>> |...] So I'll mark this ready for committer.

> Ok, thanks for the review.

LGTM, pushed.

regards, tom lane




RE: seems like a bug in pgbench -R

2019-07-25 Thread Fabien COELHO



Hello Yoshikazu,


|...] So I'll mark this ready for committer.


Ok, thanks for the review.

--
Fabien.




RE: seems like a bug in pgbench -R

2019-07-24 Thread Imai, Yoshikazu
On Wed, July 24, 2019 at 7:02 PM, Fabien COELHO wrote:
> > but I have one question. Is it better adding any check like if(maxsock
> > != -1) before the select?
> >
> > else/* no explicit delay, select without timeout */
> > {
> >nsocks = select(maxsock + 1, _mask, NULL, NULL, NULL); }
> 
> I think that it is not necessary because this case cannot happen: If some
> clients are still running (remains > 0), they are either sleeping, in
> which case there would be a timeout, or they are waiting for something
> from the server, otherwise the script could be advanced further so there
> would be something else to do for the thread.

Ah, I understand.

> We could check this by adding "Assert(maxsock != -1);" before this select,
> but I would not do that for a released version.

Yeah I also imagined that we can use Assert, but ah, it's released version.
I got it. Thanks for telling that.

So I'll mark this ready for committer.

--
Yoshikazu Imai





RE: seems like a bug in pgbench -R

2019-07-24 Thread Fabien COELHO



Hello Yoshikazu,


I could not reproduce this issue on head, but I confirm on 11.2.


I could reproduce the stuck on 11.4.


Attached is a fix to apply on pg11.


I confirm the stuck doesn't happen after applying your patch.


Ok, thanks for the feedback.


+   /* under throttling we may have finished the last client above 
*/
+   if (remains == 0)
+   break;


If there are only CSTATE_WAIT_RESULT, CSTATE_SLEEP or CSTATE_THROTTLE clients,
a thread needs to wait the results or sleep. In that logic, there are the case
that a thread tried to wait the results when there are no clients wait the
results, and this causes the issue. This is happened when there are only
CSTATE_THROTLE clients and pgbench timeout is occured. Those clients will be
finished and "remains" will be 0.

I confirmed above codes prevent such a case.


Yep.


I almost think this is ready for committer,


Almost great, then.

but I have one question. Is it better adding any check like if(maxsock 
!= -1) before the select?


else/* no explicit delay, select without timeout */
{
   nsocks = select(maxsock + 1, _mask, NULL, NULL, NULL);
}


I think that it is not necessary because this case cannot happen: If some 
clients are still running (remains > 0), they are either sleeping, in 
which case there would be a timeout, or they are waiting for something 
from the server, otherwise the script could be advanced further so there 
would be something else to do for the thread.


We could check this by adding "Assert(maxsock != -1);" before this select, 
but I would not do that for a released version.


--
Fabien.




RE: seems like a bug in pgbench -R

2019-07-24 Thread Imai, Yoshikazu
Hi Fabien,

On Fri, Mar 15, 2019 at 4:17 PM, Fabien COELHO wrote:
> >> echo 'select 1' > select.sql
> >> 
> >> while /bin/true; do
> >>  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
> >>  date;
> >> done;
> >
> > Indeed. I'll look at it over the weekend.
> >
> >> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
> >> one of the other commits touching this part of the code.
> 
> I could not reproduce this issue on head, but I confirm on 11.2.

I could reproduce the stuck on 11.4.

On Sat, Mar 16, 2019 at 10:14 AM, Fabien COELHO wrote:
> Attached is a fix to apply on pg11.

I confirm the stuck doesn't happen after applying your patch.

It passes make check-world.

This change seems not to affect performance, so I didn't do any performance
test.

> + /* under throttling we may have finished the last client above 
> */
> + if (remains == 0)
> + break;

If there are only CSTATE_WAIT_RESULT, CSTATE_SLEEP or CSTATE_THROTTLE clients,
a thread needs to wait the results or sleep. In that logic, there are the case
that a thread tried to wait the results when there are no clients wait the
results, and this causes the issue. This is happened when there are only
CSTATE_THROTLE clients and pgbench timeout is occured. Those clients will be
finished and "remains" will be 0.

I confirmed above codes prevent such a case.


I almost think this is ready for committer, but I have one question.

Is it better adding any check like if(maxsock != -1) before the select?


else/* no explicit delay, select without timeout */
{
nsocks = select(maxsock + 1, _mask, NULL, NULL, NULL);
}

--
Yoshikazu Imai




Re: seems like a bug in pgbench -R

2019-03-16 Thread Fabien COELHO


Hello Tomas,


So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
one of the other commits touching this part of the code.


I could not reproduce this issue on head, but I confirm on 11.2.



AFAICS on head it's fixed by 3bac77c48f166b9024a5ead984df73347466ae12


Thanks for the information.

I pinpointed the exact issue in one go: no surprise given that the patch 
was motivated by cleaning up this kind of external state machine changes 
which I thought doubtful and error prone.


Attached is a fix to apply on pg11.

--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 9eaa192239..e2093cdaca 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5762,6 +5762,10 @@ threadRun(void *arg)
 			}
 		}
 
+		/* under throttling we may have finished the last client above */
+		if (remains == 0)
+			break;
+
 		/* also wake up to print the next progress report on time */
 		if (progress && min_usec > 0 && thread->tid == 0)
 		{


Re: seems like a bug in pgbench -R

2019-03-15 Thread Tomas Vondra
On 3/15/19 5:16 PM, Fabien COELHO wrote:
> 
>>> echo 'select 1' > select.sql
>>>
>>> while /bin/true; do
>>>  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
>>>  date;
>>> done;
>>
>> Indeed. I'll look at it over the weekend.
>>
>>> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
>>> one of the other commits touching this part of the code.
> 
> I could not reproduce this issue on head, but I confirm on 11.2.
> 

AFAICS on head it's fixed by 3bac77c48f166b9024a5ead984df73347466ae12

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: seems like a bug in pgbench -R

2019-03-15 Thread Fabien COELHO




echo 'select 1' > select.sql

while /bin/true; do
 pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
 date;
done;


Indeed. I'll look at it over the weekend.


So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
one of the other commits touching this part of the code.


I could not reproduce this issue on head, but I confirm on 11.2.

--
Fabien.



Re: seems like a bug in pgbench -R

2019-03-15 Thread Fabien COELHO




echo 'select 1' > select.sql

while /bin/true; do
 pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
 date;
done;


Indeed. I'll look at it over the weekend.


So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
one of the other commits touching this part of the code.


Yep, possibly.

--
Fabien.



seems like a bug in pgbench -R

2019-03-14 Thread Tomas Vondra
Hi,

There seems to be a bug in pgbench when used with '-R' option, resulting
in stuck pgbench process. Reproducing it is pretty easy:

echo 'select 1' > select.sql

while /bin/true; do
  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
  date;
done;

I do get a stuck pgbench within a few rounds (~10-20), but YMMV. It gets
stuck like this (this is on REL_11_STABLE):

0x7f3b1814fcb3 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x7fff9d668ec0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x0)
at ../sysdeps/unix/sysv/linux/select.c:41
41  return SYSCALL_CANCEL (select, nfds, readfds, writefds, exceptfds,
(gdb) bt
#0  0x7f3b1814fcb3 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x7fff9d668ec0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x0)
at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x0040834d in threadRun (arg=arg@entry=0x1c9f370) at
pgbench.c:5810
#2  0x00403e0a in main (argc=, argv=) at pgbench.c:5583

All the connections are already closed at this point (there is nothing
in pg_stat_activity and log_disconnections=on confirms that), and then
there's this:

(gdb) p remains
$1 = 0
(gdb) p nstate
$1 = 1
(gdb) p state[0]
$2 = {con = 0x0, id = 0, state = CSTATE_FINISHED, cstack = 0xf21b10,
use_file = 0, command = 1, variables = 0xf2b0b0, nvariables = 4,
vars_sorted = false, txn_scheduled = 699536519281, sleep_until = 0,
txn_begin = {tv_sec = 699536, tv_nsec = 518478603}, stmt_begin = {tv_sec
= 0,  tv_nsec = 0}, prepared = {false }, cnt = 132,
ecnt = 0}

So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
one of the other commits touching this part of the code.

cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services