Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

2019-02-23 Thread Gunther

Hi,

I am using an SQL queue for distributing work to massively parallel 
workers. Workers run on servers with 12 parallel threads each. One of 
those worker sets handles 7 transactions per second. If I add a second 
one, for 24 parallel workers, it scales to 14/s. With a third, for 36 
parallel workers, I reach 21/s. If I try a fourth set, 48 workers, I 
end up in trouble. But that isn't even my main problem; the real 
problem is that within a short time the performance deteriorates, and 
it looks like that is because the queue index deteriorates and needs a 
REINDEX.


The queue table is essentially this:

CREATE TABLE Queue (
  jobId bigint,
  action text,
  pending boolean,
  result text
);

the dequeue operation is essentially this:

BEGIN

SELECT jobId, action
  FROM Queue
  WHERE pending
  FOR UPDATE SKIP LOCKED

which is a wonderful concept with the SKIP LOCKED.
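For reference, a complete single-job dequeue in this style would typically look like the sketch below; the ORDER BY and LIMIT are assumptions for illustration, not part of the original post:

```sql
-- Grab one pending job, skipping any row already locked by another worker.
BEGIN;

SELECT jobId, action
  FROM Queue
 WHERE pending
 ORDER BY jobId          -- assumed: roughly FIFO dispatch
 LIMIT 1                 -- assumed: one job per worker per transaction
 FOR UPDATE SKIP LOCKED;
```

The LIMIT matters: without it, one worker would lock every currently pending row in a single transaction.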

Then I perform the action and finally:

UPDATE Queue
   SET pending = false,
   result = ?
 WHERE jobId = ?

COMMIT

I thought to keep my index tight, I would define it like this:

CREATE UNIQUE INDEX Queue_idx_pending ON Queue(jobId) WHERE pending;

so that only pending jobs are in that index.

When a job is done, follow up work is often inserted into the Queue as 
pending, thus adding to that index.


Below is the performance chart.

The blue line at the bottom is the db server.


You can see the orange line is the first worker server with 12 threads. 
It settled into a steady state of 7/s, ran with 90% CPU for some 30 min, 
and then the database CPU% started climbing. I tried to rebuild the 
indexes on the queue and got stuck: no jobs were processing, yet the 
exclusive lock could not be obtained for a long time. So I shut down the 
worker server. With the database quiet I could fix the messed-up indexes 
and restart. Soon I added a second worker server (green line), starting 
around 19:15. Once settled in, they were pulling 14/s together, but you 
can see that in just 15 min the db server CPU% climbed again to over 40% 
and the performance of the workers dropped, their load falling to 30%. 
At around 19:30 I stopped them all, REINDEXed the queue table, and then 
started 3 worker servers simultaneously. They settled in at 21/s, but in 
just 10 min the deterioration happened again. Again I stopped them all, 
REINDEXed, and this time started 4 worker servers (48 threads). Now even 
5 min was not enough to see them settle into a decent 28/s transaction 
rate; I guess they might have reached that for a minute or two, only for 
the index to deteriorate again. I did another stop, then started only 2 
servers, and again the index soon deteriorated.


Clearly that index is deteriorating quickly, in about 10,000 transactions.

BTW: when I said 7/s, it is in reality about 4 times as many 
transactions, because of the follow-up jobs that also get added to this 
queue. So 10 k transactions may really be 30 or 40 k transactions before 
the index deteriorates.


Do you have any suggestions for how I can avoid this index 
deterioration problem smoothly?


I figured I might just pause all workers briefly to run the REINDEX 
Queue command, but the problem is that while the transaction volume is 
large, some jobs may take minutes to process. In that case we would need 
to wait minutes to quiet the database, with 47 workers sitting idle 
waiting for the 48th to finish so that the index can be rebuilt!


Of course I tried to resolve the issue with vacuumdb --analyze (just in 
case the autovacuum doesn't act in time), and that doesn't help. 
vacuumdb --full --analyze would probably help, but can't be used because 
it requires an exclusive table lock.
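Worth noting: unlike VACUUM FULL, a plain manual VACUUM takes only a SHARE UPDATE EXCLUSIVE lock, which does not block the workers' reads and writes, so something like the following can run while the queue stays live (a sketch against the table from this thread):

```sql
-- Plain VACUUM reclaims dead index and heap entries without an
-- exclusive lock; VERBOSE reports how much it removed.
VACUUM (VERBOSE, ANALYZE) Queue;
```

It will not shrink an already-bloated index back down the way REINDEX does, but run often enough it may keep the bloat from accumulating in the first place.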


I tried to just create a new index with the same definition:

CREATE UNIQUE INDEX Queue_idx2_pending ON Queue(jobId) WHERE pending;
DROP INDEX Queue_idx_pending;
ANALYZE Queue;

but with that I got completely stuck with two indexes, unable to remove 
either of them because of those locking issues. And REINDEX gives me a 
deadlock error outright.


I am looking for a way to manage that index so that it does not deteriorate.

Maybe if I was not defining it with

... WHERE pending;

then it would only grow, but never shrink. Maybe that helps somehow? I 
doubt it though. Adding to an index also causes deterioration, and most 
of the rows would be irrelevant because they would be past work. It 
would be nicer if there was another, smoother way.


regards,
-Gunther





Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

2019-02-23 Thread Peter Geoghegan
On Sat, Feb 23, 2019 at 1:06 PM Gunther  wrote:
> I thought to keep my index tight, I would define it like this:
>
> CREATE UNIQUE INDEX Queue_idx_pending ON Queue(jobId) WHERE pending;
>
> so that only pending jobs are in that index.
>
> When a job is done, follow up work is often inserted into the Queue as 
> pending, thus adding to that index.

How many distinct jobIds are there in play, roughly? Would you say
that there are many fewer distinct Jobs than distinct entries in the
index/table? Is the number of jobs fixed at a fairly low number, that
doesn't really grow as the workload needs to scale up?

-- 
Peter Geoghegan



Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

2019-02-23 Thread Gunther

On 2/23/2019 16:13, Peter Geoghegan wrote:

> On Sat, Feb 23, 2019 at 1:06 PM Gunther  wrote:
>> I thought to keep my index tight, I would define it like this:
>>
>> CREATE UNIQUE INDEX Queue_idx_pending ON Queue(jobId) WHERE pending;
>>
>> so that only pending jobs are in that index.
>>
>> When a job is done, follow up work is often inserted into the Queue as pending,
>> thus adding to that index.
>
> How many distinct jobIds are there in play, roughly? Would you say
> that there are many fewer distinct Jobs than distinct entries in the
> index/table? Is the number of jobs fixed at a fairly low number, that
> doesn't really grow as the workload needs to scale up?


Jobs start on another, external queue, there were about 200,000 of them 
waiting when I started the run.


When the SQL Queue is empty, the workers pick one job from the external 
queue and add it to the SQL queue.


When that happens, 2 more jobs are immediately created on that queue. 
Let's call them phase 1 a and b.


When phase 1 a has been worked off, another follow-up job is created. 
Let's call it phase 2.


When phase 2 has been worked off, a final phase 3 job is created.

When that is worked off, nothing new is created, and the next item is 
pulled from the external queue and added to the SQL queue.


So this means, each of the 200,000 items add (up to) 4 jobs onto the 
queue during their processing.


But since these 200,000 items are on an external queue, the SQL queue 
itself is not stuffed full at all. It only grows slowly, and in the main 
index, where we have only the pending jobs, there are probably only 
about 20 entries at any given point in time. When I said 7 jobs per 
second, that meant 7/s across all of these 3+1 phases simultaneously, 
i.e., 28 jobs per second. And at that rate it takes a little less than 
30 min for the index to deteriorate. I.e., once about 50,000 queue 
entries have been processed through that index, it has deteriorated to 
become nearly unusable until it is rebuilt.
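One way to quantify this deterioration, rather than inferring it from the CPU graphs, is the pgstattuple contrib module; a sketch, assuming the extension can be installed on this server:

```sql
-- pgstattuple ships with PostgreSQL's contrib modules.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density falling and leaf_fragmentation rising over a run
-- would confirm that the partial index is bloating.
SELECT avg_leaf_density, leaf_fragmentation
  FROM pgstatindex('queue_idx_pending');
```

Sampling this every few minutes during a run would show exactly how fast the index degrades relative to the transaction rate.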


thanks,
-Gunther





Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

2019-02-23 Thread David Rowley
On Sun, 24 Feb 2019 at 10:06, Gunther  wrote:
> I am using an SQL queue for distributing work to massively parallel workers. 
> Workers come in servers with 12 parallel threads. One of those worker sets 
> handles 7 transactions per second. If I add a second one, for 24 parallel 
> workers, it scales to 14 /s. Even a third, for 36 parallel workers, I can add 
> to reach 21 /s. If I try a fourth set, 48 workers, I end up in trouble. But 
> that isn't even so much my problem rather than the fact that in short time, 
> the performance will deteriorate, and it looks like that is because the queue 
> index deteriorates and needs a REINDEX.

It sounds very much like autovacuum is simply unable to keep up with
the rate at which the table is being updated. Please be aware that, by
default, autovacuum is configured to run fairly slowly so as not to
saturate low-end machines.

vacuum_cost_limit / autovacuum_vacuum_cost_limit control how many
"points" the vacuum process can accumulate before it pauses for
autovacuum_vacuum_cost_delay / vacuum_cost_delay.

Additionally, after an auto-vacuum run completes it will wait for
autovacuum_naptime before checking again if any tables require some
attention.
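As a sketch of what checking and loosening that throttling could look like (the specific value is illustrative, not a recommendation):

```sql
-- Check the current autovacuum throttling settings.
SHOW autovacuum_vacuum_cost_limit;   -- -1 means "fall back to vacuum_cost_limit"
SHOW vacuum_cost_limit;              -- default 200
SHOW autovacuum_vacuum_cost_delay;   -- default 20ms in this era of PostgreSQL
SHOW autovacuum_naptime;             -- default 1min

-- Raise the limit so vacuum does more work per delay cycle,
-- then reload the configuration (no restart needed).
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();
```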

I think you should be monitoring how many autovacuum workers are
busy during your runs. If you find that the "queue" table is being
vacuumed almost constantly, then you'll likely want to increase
vacuum_cost_limit / autovacuum_vacuum_cost_limit. You could get an
idea of how often this table is being auto-vacuumed by setting
log_autovacuum_min_duration to 0 and checking the logs. Another way
to check would be to sample what SELECT query FROM pg_stat_activity
WHERE query LIKE 'autovacuum%'; returns. You may find that all of the
workers are busy most of the time. If so, that indicates that the
cost limits need to be raised.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

2019-02-23 Thread Justin Pryzby
Some ideas:

You could ALTER TABLE SET (fillfactor=50) to try to maximize use of HOT updates
(check pg_stat_user_indexes for the HOT update counts).

You could also ALTER TABLE SET autovacuum parameters for more aggressive 
vacuuming.
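The first two suggestions combined might look like the following sketch; the specific numbers are illustrative assumptions, not tested recommendations:

```sql
-- Lower fillfactor to leave room for HOT updates on each page, and make
-- per-table autovacuum far more aggressive than the global defaults.
ALTER TABLE Queue SET (
  fillfactor = 50,
  autovacuum_vacuum_scale_factor = 0.0,
  autovacuum_vacuum_threshold = 1000,  -- vacuum after ~1000 dead rows
  autovacuum_vacuum_cost_delay = 0     -- no throttling for this table
);

-- fillfactor only affects newly written pages; existing pages are only
-- rewritten by a VACUUM FULL (which needs an exclusive lock, so it
-- would have to happen during a worker pause).
```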

You could recreate indices using the CONCURRENTLY trick
(CREATE INDEX CONCURRENTLY new; DROP old; ALTER .. RENAME;)
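Spelled out against the index from this thread, the lock-free rebuild could look like this (REINDEX CONCURRENTLY only arrived in v12, so on the versions current at the time this manual dance is the alternative):

```sql
-- Build a replacement index without blocking writers.
CREATE UNIQUE INDEX CONCURRENTLY queue_idx_pending_new
    ON Queue(jobId) WHERE pending;

-- Drop the bloated one, again without an exclusive lock.
DROP INDEX CONCURRENTLY queue_idx_pending;

-- Restore the original name.
ALTER INDEX queue_idx_pending_new RENAME TO queue_idx_pending;
```

Note that CREATE INDEX CONCURRENTLY can fail and leave an INVALID index behind if it deadlocks with the workers, in which case it must be dropped and retried.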

On Sat, Feb 23, 2019 at 04:05:51PM -0500, Gunther wrote:
> I am using an SQL queue for distributing work to massively parallel workers.
> Workers come in servers with 12 parallel threads. One of those worker sets
> handles 7 transactions per second. If I add a second one, for 24 parallel
> workers, it scales to 14 /s. Even a third, for 36 parallel workers, I can
> add to reach 21 /s. If I try a fourth set, 48 workers, I end up in trouble.
> But that isn't even so much my problem rather than the fact that in short
> time, the performance will deteriorate, and it looks like that is because
> the queue index deteriorates and needs a REINDEX.
> [... rest of original message quoted above, trimmed ...]

Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

2019-02-23 Thread Corey Huinker
On Sun, Feb 24, 2019 at 2:04 AM Justin Pryzby  wrote:

> Some ideas:
>
> You could ALTER TABLE SET (fillfactor=50) to try to maximize use of HOT
> indices
> during UPDATEs (check pg_stat_user_indexes).
>
> You could also ALTER TABLE SET autovacuum parameters for more aggressive
> vacuuming.
>
> You could recreate indices using the CONCURRENTLY trick
> (CREATE INDEX CONCURRENTLY new; DROP old; ALTER .. RENAME;)
>

I have basically the same issue with a table. Each new row enters the table
with an active=true kind of flag. The row is updated a lot until a business
condition expires; then it is updated to active=false, and after that the
row is almost never updated.

We also used a partial index, to good effect, but also had/have an issue
where the index bloats and performs worse rather quickly, only to recover a
bit after an autovacuum pass completes.

Lowering the fillfactor isn't a good solution because 99%+ of the table is
"cold".

One manual VACUUM FREEZE, coupled with lowering the vacuum sensitivity on
that one table, helps quite a bit by increasing the frequency and shortening
the runtimes of autovacuums, but it's not a total solution.

My next step is to partition the table on the "active" boolean flag, which
eliminates the need for the partial indexes and allows a different
fillfactor for each partition (50 for true, 100 for false). This should
also aid vacuum speed and make re-indexing the hot partition much faster.
However, we have to upgrade to v11 first to enable row migration, so I
can't yet report on how much of a solution that is.
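A sketch of that partitioning plan, using the queue schema from earlier in the thread adapted to an active flag (v11+ syntax; names and fillfactor values are illustrative):

```sql
-- Parent table partitioned on the boolean flag.
CREATE TABLE queue (
  jobId   bigint,
  action  text,
  active  boolean NOT NULL,
  result  text
) PARTITION BY LIST (active);

-- Hot partition: low fillfactor to leave room for HOT updates.
CREATE TABLE queue_active PARTITION OF queue
  FOR VALUES IN (true) WITH (fillfactor = 50);

-- Cold partition: rows here are almost never updated again.
CREATE TABLE queue_done PARTITION OF queue
  FOR VALUES IN (false) WITH (fillfactor = 100);

-- An UPDATE ... SET active = false then moves the row from queue_active
-- to queue_done; this cross-partition row migration requires v11+.
```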