Tom Lane wrote:

> It's also worth noting that work_mem is temporarily set to maintenance_work_mem, which you didn't tell us the value of:

It's left at the default (16384, i.e. 16 MB).

This would be OK if that were all it used for this type of thing.

> My recollection is that hash join chooses hash table partitions partly
> on the basis of the estimated number of input rows.  Since the estimate
> was way off, the actual table size got out of hand a bit :-(

A bit!!

The really worrying part is that a normal(ish) query also exhibited the same behaviour. My concern is that if the stats get slightly out of date, so that the estimate is off as in this case, a few backends each trying to grab this much RAM will make the server grind to a halt.
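
To make the concern concrete, here is a rough sketch of the effect. It is a simplified model of a hash join that sizes its partitions from the row estimate, not PostgreSQL's actual code. The function names, the 64-byte row width and the row counts are all made up for illustration; only the 16384 kB limit comes from this thread.

```python
import math


def choose_num_batches(estimated_rows, row_bytes, work_mem_bytes):
    """Pick a power-of-two batch count so each batch's ESTIMATED share fits in memory.

    Hypothetical model: the choice depends only on the planner's row estimate.
    """
    estimated_bytes = estimated_rows * row_bytes
    if estimated_bytes <= work_mem_bytes:
        return 1
    needed = math.ceil(estimated_bytes / work_mem_bytes)
    # Round up to the next power of two.
    return 1 << (needed - 1).bit_length()


def per_batch_bytes(actual_rows, row_bytes, num_batches):
    """Memory one batch really needs, assuming rows spread evenly over batches."""
    return math.ceil(actual_rows / num_batches) * row_bytes


WORK_MEM = 16384 * 1024  # 16384 kB, the setting mentioned in this thread
ROW_BYTES = 64           # assumed average row width (illustrative only)

# Stale stats: the planner expects 1,000 rows, but 50 million turn up.
batches = choose_num_batches(1_000, ROW_BYTES, WORK_MEM)
actual = per_batch_bytes(50_000_000, ROW_BYTES, batches)
print(f"batches={batches}, memory needed = {actual / WORK_MEM:.0f}x the limit")

# With an accurate estimate, the same model splits the work so each batch fits.
good_batches = choose_num_batches(50_000_000, ROW_BYTES, WORK_MEM)
good_actual = per_batch_bytes(50_000_000, ROW_BYTES, good_batches)
print(f"batches={good_batches}, fits={good_actual <= WORK_MEM}")
```

In this model nothing re-checks the limit once the batch count is chosen, so each backend that hits the bad estimate overshoots by the same factor.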

Is this a fixable bug? To me it seems like a fairly high-priority, makes-the-server-go-away type of bug.

If you need the test data, I could zip up the two tables and send them somewhere.


