On Sat, Feb 23, 2019 at 4:06 PM Gunther <r...@gusw.net> wrote:

> Hi,
>
> I am using an SQL queue for distributing work to massively parallel
> workers.
>
You should look into specialized queueing software.

...

> I figured I might just pause all workers briefly to schedule the REINDEX
> Queue command, but the problem with this is that while the transaction
> volume is large, some jobs may take minutes to process, and in that case we
> need to wait minutes to quiet the database, with 47 workers sitting
> idle while waiting for the 48th to finish so that the index can be
> rebuilt!
>
The jobs that take minutes are themselves the problem.  They prevent tuples
from being cleaned up, meaning all the other jobs need to grovel through
the detritus every time they need to claim a new row.  If you got those
long running jobs to end, you probably wouldn't even need to reindex--the
problem would go away on its own as the dead-to-all tuples get cleaned up.

Locking a tuple and leaving the transaction open for minutes is going to
cause no end of trouble on a highly active system.  You should look at a
three-state method where the tuple can be pending/claimed/finished, rather
than pending/locked/finished.  That way the process commits immediately
after claiming the tuple, and then records the outcome in another
transaction once it is done processing.  You will need a way to detect
processes that failed after claiming a row but before finishing, but
implementing that is going to be easier than all of this re-indexing stuff
you are trying to do now.  You would claim the row by updating a field in
it to have something distinctive about the process, like its hostname and
pid, so you can figure out if it is still running when it comes time to
clean up apparently forgotten entries.
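The pattern above can be sketched roughly as follows. This is a minimal
illustration using Python's stdlib sqlite3 rather than PostgreSQL (on
PostgreSQL you would typically add FOR UPDATE SKIP LOCKED to the claim
query so workers don't contend for the same row); the table name "jobs"
and its columns are assumptions for this example, not anything from the
original setup:

```python
import os
import socket
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id         INTEGER PRIMARY KEY,
        state      TEXT NOT NULL DEFAULT 'pending',  -- pending / claimed / finished
        claimed_by TEXT                              -- hostname:pid of the claimant
    )
""")
conn.executemany("INSERT INTO jobs (id) VALUES (?)", [(i,) for i in range(3)])
conn.commit()

# Something distinctive about this process, so a cleanup pass can later
# check whether the claimant is still alive.
worker_id = f"{socket.gethostname()}:{os.getpid()}"

def claim_one(conn, worker_id):
    """Claim one pending job and commit immediately, so no lock or open
    transaction persists while the (possibly minutes-long) work runs."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE state = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.commit()
        return None
    cur = conn.execute(
        "UPDATE jobs SET state = 'claimed', claimed_by = ? "
        "WHERE id = ? AND state = 'pending'",
        (worker_id, row[0]),
    )
    conn.commit()  # transaction ends here, before the work starts
    # If another worker got there first, rowcount is 0 and we claimed nothing.
    return row[0] if cur.rowcount == 1 else None

def finish(conn, job_id):
    """Record the outcome in a second, separate transaction."""
    conn.execute("UPDATE jobs SET state = 'finished' WHERE id = ?", (job_id,))
    conn.commit()

job = claim_one(conn, worker_id)
# ... do the long-running work here, with no transaction open ...
finish(conn, job)
```

A reaper process would periodically look for rows stuck in 'claimed',
parse the hostname:pid out of claimed_by, and reset the row to 'pending'
if that process is no longer running.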

Cheers,

Jeff
