Hi Fabien,

On Fri, Mar 15, 2019 at 4:17 PM, Fabien COELHO wrote:
> >> echo 'select 1' > select.sql
> >>
> >> while /bin/true; do
> >>   pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
> >>   date;
> >> done;
> >
> > Indeed. I'll look at it over the weekend.
> >
> >> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
> >> one of the other commits touching this part of the code.
>
> I could not reproduce this issue on head, but I confirm on 11.2.
I could reproduce the hang on 11.4.

On Sat, Mar 16, 2019 at 10:14 AM, Fabien COELHO wrote:
> Attached is a fix to apply on pg11.

I confirm that the hang no longer happens after applying your patch.
It passes make check-world.

This change does not seem to affect performance, so I didn't run any
performance tests.

> +        /* under throttling we may have finished the last client above */
> +        if (remains == 0)
> +            break;

If only CSTATE_WAIT_RESULT, CSTATE_SLEEP or CSTATE_THROTTLE clients
remain, a thread needs to wait for results or sleep. In that logic,
there is a case where a thread tries to wait for results even though no
client is waiting for results, and this causes the issue. This happens
when there are only CSTATE_THROTTLE clients and the pgbench timeout
occurs. Those clients are then finished and "remains" becomes 0.
I confirmed that the code above prevents such a case.

I think this is almost ready for committer, but I have one question.
Would it be better to add a check like if (maxsock != -1) before the
select?

    else    /* no explicit delay, select without timeout */
    {
        nsocks = select(maxsock + 1, &input_mask, NULL, NULL, NULL);
    }

--
Yoshikazu Imai