Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-03-07, Tom Lane
Thomas Munro writes: > I have been wondering about a couple of different worst case execution > strategies that would be better than throwing our hands up and > potentially exploding memory once we detect that further partitioning > is not going to help, if we still manage to reach that case despite ...
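One strategy in that spirit (not necessarily the one being proposed here) is to keep memory bounded by joining an unsplittable, duplicate-heavy bucket a work_mem-sized chunk at a time and rescanning the matching outer tuples for each chunk, trading extra outer scans for a hard memory cap. A toy standalone sketch of that shape, with invented data and no PostgreSQL executor internals:

/* Toy sketch only -- not PostgreSQL code.  Shows chunked processing of
 * an inner bucket whose tuples all share one join key, so batch/bucket
 * splitting cannot spread them out. */
#include <stdio.h>

#define CHUNK 4              /* pretend work_mem only fits 4 inner tuples */

int
main(void)
{
    /* Inner bucket: many tuples that all share one join key. */
    int inner[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int ninner = 10;
    /* Outer tuples carrying that same key. */
    int outer[] = {100, 200, 300};
    int nouter = 3;

    for (int start = 0; start < ninner; start += CHUNK)
    {
        int end = (start + CHUNK < ninner) ? start + CHUNK : ninner;

        /* "Rescan" the outer side against just this chunk of the bucket. */
        for (int o = 0; o < nouter; o++)
            for (int i = start; i < end; i++)
                printf("join output: outer=%d inner=%d\n", outer[o], inner[i]);
    }
    return 0;
}

Memory stays at CHUNK inner tuples throughout; the cost is one extra pass over the relevant outer tuples per chunk.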

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-19, Robert Haas
On Thu, Feb 16, 2017 at 8:13 PM, Thomas Munro wrote: > Obviously there are vanishing returns here as we add more defences > making it increasingly unlikely that we hit "fail" mode. But it > bothers me that hash joins in general are not 100% guaranteed to be > able to complete unless you have infinite ...

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Thomas Munro
On Fri, Feb 17, 2017 at 11:13 AM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Feb 16, 2017 at 3:51 PM, Tom Lane wrote: >>> No, it'd be the *most* common MCV, because we're concerned about the >>> worst-case (largest) bucket size. But that's good, really, because the >>> highest MCV frequency ...

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Tom Lane
Robert Haas writes: > On Thu, Feb 16, 2017 at 3:51 PM, Tom Lane wrote: >> No, it'd be the *most* common MCV, because we're concerned about the >> worst-case (largest) bucket size. But that's good, really, because the >> highest MCV frequency will be the one we have most statistical >> confidence ...
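With made-up numbers, the worst-case arithmetic being described looks like this (the 0.2 frequency, row count, and tuple width are purely illustrative):

    worst-case bucket bytes ~= (highest MCV frequency) * (inner rows) * (tuple width)
                            ~= 0.2 * 5,000,000 * 48 bytes ~= 48 MB

which dwarfs a default 4MB work_mem even though the average bucket for the same relation could look tiny.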

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Robert Haas
On Thu, Feb 16, 2017 at 3:51 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Feb 16, 2017 at 2:38 PM, Tom Lane wrote: >>> I initially thought about driving the shutoff strictly from the estimate >>> of the MCV frequency, without involving the more general ndistinct >>> computation that estimate_hash_bucketsize ...

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Tom Lane
Robert Haas writes: > On Thu, Feb 16, 2017 at 2:38 PM, Tom Lane wrote: >> I initially thought about driving the shutoff strictly from the estimate >> of the MCV frequency, without involving the more general ndistinct >> computation that estimate_hash_bucketsize does. I'm not sure how much >> that ...
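To see how far apart the two quantities under discussion can be, here is a rough standalone sketch of both estimates. It is not the real estimate_hash_bucketsize() logic; every name and constant in it is invented for illustration:

/* Rough sketch of the two estimates being weighed against each other;
 * all names and numbers are made up. */
#include <stdio.h>

/* Average bucket population if rows were spread over ndistinct values
 * (capped by the number of buckets actually available). */
static double
avg_bucket_rows(double inner_rows, double ndistinct, double nbuckets)
{
    double slots = (ndistinct < nbuckets) ? ndistinct : nbuckets;

    return inner_rows / (slots > 0 ? slots : 1.0);
}

/* Worst-case bucket population: every row bearing the most common
 * value lands in the same bucket no matter how many buckets exist. */
static double
worst_bucket_rows(double inner_rows, double max_mcv_freq)
{
    return inner_rows * max_mcv_freq;
}

int
main(void)
{
    /* Hypothetical inner side: 1M rows, 50k distinct keys, but the
     * single most common key accounts for 5% of the rows. */
    double rows = 1e6, ndistinct = 5e4, nbuckets = 1 << 17;
    double mcv = 0.05;

    printf("average bucket: %.1f rows\n", avg_bucket_rows(rows, ndistinct, nbuckets));
    printf("worst-case bucket: %.0f rows\n", worst_bucket_rows(rows, mcv));
    return 0;
}

With those made-up numbers the average bucket looks harmless (20 rows) while the worst-case bucket holds 50,000 rows, which is exactly the gap an MCV-based check is aimed at.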

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Robert Haas
On Thu, Feb 16, 2017 at 2:38 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Feb 16, 2017 at 2:02 PM, Tom Lane wrote: >>> This might be overly aggressive, because it will pretty much shut off >>> any attempt to use hash joining on a large inner relation unless we >>> have statistics for it ...

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Tom Lane
Robert Haas writes: > On Thu, Feb 16, 2017 at 2:02 PM, Tom Lane wrote: >> This might be overly aggressive, because it will pretty much shut off >> any attempt to use hash joining on a large inner relation unless we >> have statistics for it (and those stats are favorable). But having >> seen this ...

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Peter Geoghegan
On Thu, Feb 16, 2017 at 11:11 AM, Robert Haas wrote: > I do think that's worrying, but on the other hand it seems like this > solution could disable many hash joins that would actually be fine. I > don't think the largest ndistinct estimates we ever generate are very > large, and therefore this s...

Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Robert Haas
On Thu, Feb 16, 2017 at 2:02 PM, Tom Lane wrote: > The planner doesn't currently worry about work_mem restrictions when > planning a hash join, figuring that the executor should be able to > subdivide the data arbitrarily finely by splitting buckets at runtime. > However there's a thread here: > ...

[HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-16, Tom Lane
The planner doesn't currently worry about work_mem restrictions when planning a hash join, figuring that the executor should be able to subdivide the data arbitrarily finely by splitting buckets at runtime. However there's a thread here: https://www.postgresql.org/message-id/flat/CACw4T0p4Lzd6Vpwpt ...
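To make the shape of the proposal concrete, here is a standalone sketch of the kind of planner-side guard being described: estimate the largest possible bucket from the highest MCV frequency and refuse (or heavily penalize) a hash join whose worst-case bucket alone would exceed work_mem. It is not the actual PostgreSQL planner code; all names and numbers are invented:

/* Illustrative sketch only -- not the real costing code.  The point is
 * the arithmetic: rows carrying the most common inner key all hash to
 * the same bucket, and splitting batches/buckets cannot spread them. */
#include <stdbool.h>
#include <stdio.h>

/* Estimated bytes needed by the largest possible hash bucket. */
static double
worst_bucket_bytes(double inner_rows, double max_mcv_freq, double tuple_bytes)
{
    return inner_rows * max_mcv_freq * tuple_bytes;
}

/* The proposed sanity check: is the worst-case bucket still within
 * the memory budget the executor is allowed to use? */
static bool
hash_join_looks_safe(double inner_rows, double max_mcv_freq,
                     double tuple_bytes, double work_mem_bytes)
{
    return worst_bucket_bytes(inner_rows, max_mcv_freq, tuple_bytes)
           <= work_mem_bytes;
}

int
main(void)
{
    /* Hypothetical numbers: 10M inner rows, most common key = 10% of
     * them, ~64 bytes per hashed tuple, work_mem = 4MB. */
    double inner_rows = 10e6;
    double max_mcv_freq = 0.10;
    double tuple_bytes = 64.0;
    double work_mem_bytes = 4.0 * 1024 * 1024;

    printf("worst-case bucket: %.0f bytes, safe: %s\n",
           worst_bucket_bytes(inner_rows, max_mcv_freq, tuple_bytes),
           hash_join_looks_safe(inner_rows, max_mcv_freq, tuple_bytes,
                                work_mem_bytes) ? "yes" : "no");
    return 0;
}

With these sample numbers the worst-case bucket alone needs roughly 64MB, so a 4MB work_mem hash join would be ruled out at plan time instead of the executor discovering at runtime that no amount of bucket splitting can help.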