This particular db is on 9.3.15.  Recently we had a serious performance
degradation related to a batch job that creates 4-5 temp tables and 5
indexes.  It is a really badly written job but what really confuses us is
that this job has been running for years with no issue remotely approaching
this one.  We are also using pgpool.

The job would kick off with 20-30 of similar queries running at once.  The
thing normally takes only 30ms or so to run - it only operates on 1
customer at a time (yes, it's horribly written).  All of a sudden the
cluster started thrashing and performance seriously degraded.  We tried a
number of things with no success:

   - Analyzed the whole database
   - Turned off full logging
   - Turned off synchronous commit
   - Vacuumed several of the catalog tables
   - Checked if we had an abnormal high amount of traffic this time - we
   didn't
   - No abnormal disk/network issues (we would have seen much larger issues
   if that had been the case)
   - Tried turning down the number of app nodes running

What ended up completely resolving the issue was converting the query to
use ctes instead of temp tables.  That means we avoided the disk writing
and the catalog churn, and useless indexes.  However, we are baffled as to
why this could make such a big difference when we had no issue like this
before, and we have seen no systematic performance degradation in our
system.

Any insights would be greatly appreciated, as we are concerned not knowing
the root cause.

Thanks,
Jeremy

Reply via email to