Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-05-25 Thread David Rowley
On Wed, 26 May 2021 at 14:19, Andy Fan wrote: > I just checked the latest code, looks like we didn't improve this situation > except > that we introduced a GUC to control it. Am I missing something? I don't > have a > suggestion though. Various extra caching was done to help speed it up.

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-05-25 Thread Andy Fan
On Fri, Mar 12, 2021 at 8:31 AM David Rowley wrote: > Thanks for these suggestions. > > On Mon, 22 Feb 2021 at 14:21, Justin Pryzby wrote: > > > > On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote: > > > To summarise here, the planner performance gets a fair bit worse with > > > the

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-04-01 Thread David Rowley
On Thu, 1 Apr 2021 at 23:41, houzj.f...@fujitsu.com wrote: > > > I've attached the updated patch. I'll let the CFbot grab this to ensure > > it's > > happy with it before I go looking to push it again. > > Hi, > > I took a look into the patch and noticed some minor things. > > 1. > +

RE: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-04-01 Thread houzj.f...@fujitsu.com
> I've attached the updated patch. I'll let the CFbot grab this to ensure it's > happy with it before I go looking to push it again. Hi, I took a look into the patch and noticed some minor things. 1. + case T_ResultCache: + ptype = "ResultCache"; +

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-31 Thread David Rowley
On Wed, 31 Mar 2021 at 05:34, Zhihong Yu wrote: > > Hi, > In paraminfo_get_equal_hashops(), > > + /* Reject if there are any volatile functions */ > + if (contain_volatile_functions(expr)) > + { > > You can move the above code to just ahead of: > > + if (IsA(expr, Var)) >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-30 Thread Zhihong Yu
Hi, In paraminfo_get_equal_hashops(), + /* Reject if there are any volatile functions */ + if (contain_volatile_functions(expr)) + { You can move the above code to just ahead of: + if (IsA(expr, Var)) + var_relids = bms_make_singleton(((Var *) expr)->varno);

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-28 Thread Zhihong Yu
Hi, For show_resultcache_info() + if (rcstate->shared_info != NULL) + { The negated condition can be used with a return. This way, the loop can be unindented. + * ResultCache nodes are intended to sit above a parameterized node in the + * plan tree in order to cache results from them.

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-11 Thread David Rowley
On Fri, 12 Mar 2021 at 14:59, Tom Lane wrote: > > David Rowley writes: > > The 0001 patch adds a has_volatile bool field to RestrictInfo and sets > > it when building the RestrictInfo. > > I'm -1 on doing it exactly that way, because you're expending > the cost of those lookups without certainty

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-11 Thread Tom Lane
David Rowley writes: > On Tue, 23 Feb 2021 at 18:43, Tom Lane wrote: >> I doubt it's that bad. We could cache such info in RestrictInfo >> for quals, or PathTarget for tlists, without much new notational >> overhead. That doesn't cover everything the planner deals with >> of course, but it

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-11 Thread David Rowley
On Tue, 23 Feb 2021 at 14:22, Andy Fan wrote: > > On Mon, Feb 22, 2021 at 9:21 AM Justin Pryzby wrote: >> - Maybe this should be integrated into nestloop rather than being a separate >>plan node. That means that it could be dynamically enabled during >>execution, maybe after a few

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-11 Thread David Rowley
Thanks for these suggestions. On Mon, 22 Feb 2021 at 14:21, Justin Pryzby wrote: > > On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote: > > To summarise here, the planner performance gets a fair bit worse with > > the patched code. With master, summing the average planning time over >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-03-04 Thread Ibrar Ahmed
On Tue, Feb 23, 2021 at 10:44 AM Tom Lane wrote: > Andres Freund writes: > > We could add a wrapper node around "planner expressions" that stores > > metadata about them during planning, without those properties leaking > > over expressions used at other times. E.g. having > >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-02-22 Thread Tom Lane
Andres Freund writes: > We could add a wrapper node around "planner expressions" that stores > metadata about them during planning, without those properties leaking > over expressions used at other times. E.g. having > preprocess_expression() return a PlannerExpr that that points to the >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-02-22 Thread Andres Freund
Hi, On 2021-02-22 20:51:17 -0500, Tom Lane wrote: > Andres Freund writes: > > Except for the annoying issue that that we pervasively use Lists as > > expressions, I'd argue that we should actually cache "subtree > > volatility" in Expr nodes, similar to the way we use OpExpr.opfuncid > > etc.

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-02-22 Thread Tom Lane
Andres Freund writes: > Except for the annoying issue that that we pervasively use Lists as > expressions, I'd argue that we should actually cache "subtree > volatility" in Expr nodes, similar to the way we use OpExpr.opfuncid > etc. That'd allow us to make contain_volatile_functions() very cheap

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-02-22 Thread Andres Freund
Hi, On 2021-02-16 23:15:51 +1300, David Rowley wrote: > There might be gains to be had by checking the parse once rather than > having to call contains_volatile_functions in the various places we do > call it. I think both the parallel safety and volatile checks could > then be done in the same

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-02-22 Thread Andy Fan
On Mon, Feb 22, 2021 at 9:21 AM Justin Pryzby wrote: > On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote: > > To summarise here, the planner performance gets a fair bit worse with > > the patched code. With master, summing the average planning time over > > each of the queries

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-02-21 Thread Justin Pryzby
On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote: > To summarise here, the planner performance gets a fair bit worse with > the patched code. With master, summing the average planning time over > each of the queries resulted in a total planning time of 765.7 ms. > After patching, that

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-02-21 Thread Andy Fan
On Tue, Feb 16, 2021 at 6:16 PM David Rowley wrote: > On Wed, 3 Feb 2021 at 19:51, David Rowley wrote: > > I've attached a spreadsheet with the results of each of the tests. > > > > The attached file v13_costing_hacks.patch.txt is the quick and dirty > > patch I put together to run test 5. > >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-01-28 Thread David Rowley
Thanks for having a look at this. I've taken most of your suggestions. The things quoted below are just the ones I didn't agree with or didn't understand. On Thu, 28 Jan 2021 at 18:43, Justin Pryzby wrote: > > On Tue, Dec 08, 2020 at 08:15:52PM +1300, David Rowley wrote: > > +typedef struct

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-10 Thread David Rowley
Thanks a lot for testing this patch. It's good to see it run through a benchmark that exercises quite a few join problems. On Fri, 11 Dec 2020 at 05:44, Konstantin Knizhnik wrote: > For most queries performance is the same, some queries are executed > faster but > one query is 150 times slower:

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-09 Thread David Rowley
On Tue, 8 Dec 2020 at 20:15, David Rowley wrote: > I've attached another patchset that addresses some comments left by > Zhihong Yu over on [1]. The version number got bumped to v12 instead > of v11 as I still have a copy of the other version of the patch which > I made some changes to and

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-07 Thread Zhihong Yu
>> just removing the logic that has the soft upper limit and just have it do cache evictions after each allocation after the cache first fills Yeah - having one fewer limit would simplify the code. Cheers On Mon, Dec 7, 2020 at 5:27 PM David Rowley wrote: > On Mon, 7 Dec 2020 at 14:25,

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-07 Thread David Rowley
On Mon, 7 Dec 2020 at 14:25, Zhihong Yu wrote: > > > + /* Make a guess at a good size when we're not given a valid size. */ > > + if (size == 0) > > + size = 1024; > > > > Should the default size be logged ? > > > I'm not too sure if I know what you mean here. Should it be a power of >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-06 Thread Zhihong Yu
> + /* Make a guess at a good size when we're not given a valid size. */ > + if (size == 0) > + size = 1024; > > Should the default size be logged ? > I'm not too sure if I know what you mean here. Should it be a power of > 2? It is. Or do you mean I shouldn't use a magic number?

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-06 Thread David Rowley
On Sat, 5 Dec 2020 at 16:51, Zhihong Yu wrote: > > There are two blocks with almost identical code (second occurrence in > cache_store_tuple): > > + if (rcstate->mem_used > rcstate->mem_upperlimit) > + { > It would be nice if the code can be extracted to a method and shared. It's true,

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-06 Thread David Rowley
Thanks for having a look at this. On Sat, 5 Dec 2020 at 14:08, Zhihong Yu wrote: > +#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0 > > I think it would be safer if the comparison is enclosed in parentheses (in > case the macro appears in composite condition). That seems fair.

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-06 Thread David Rowley
On Sun, 6 Dec 2020 at 03:52, Andy Fan wrote: > > On Fri, Dec 4, 2020 at 10:41 PM David Rowley wrote: >> >> I also >> noticed that the code I'd written to build the cache lookup expression >> included a step to deform the outer tuple. This was unnecessary and >> slowed down the expression

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-06 Thread David Rowley
Thanks for this review. I somehow missed addressing what's mentioned here for the v10 patch. Comments below. On Mon, 23 Nov 2020 at 02:21, Andy Fan wrote: > 1. modified src/include/utils/selfuncs.h > @@ -70,9 +70,9 @@ > * callers to provide further details about some assumptions which were

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-05 Thread Andy Fan
Thanks for working on the new version. On Fri, Dec 4, 2020 at 10:41 PM David Rowley wrote: > > I also > noticed that the code I'd written to build the cache lookup expression > included a step to deform the outer tuple. This was unnecessary and > slowed down the expression evaluation. > > I

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-04 Thread Zhihong Yu
There are two blocks with almost identical code (second occurrence in cache_store_tuple): + if (rcstate->mem_used > rcstate->mem_upperlimit) + { It would be nice if the code can be extracted to a method and shared. node->rc_status = RC_END_OF_SCAN;

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-26 Thread Andy Fan
On Fri, Nov 27, 2020 at 8:10 AM David Rowley wrote: > Thanks for having another look at this. > > > On Sun, Nov 22, 2020 at 9:21 PM Andy Fan > wrote: > > add 2 more comments. > > > > 1. I'd suggest adding Assert(false); in RC_END_OF_SCAN case to make the > error clearer. > > > > case

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-26 Thread David Rowley
Thanks for having another look at this. > On Sun, Nov 22, 2020 at 9:21 PM Andy Fan wrote: > add 2 more comments. > > 1. I'd suggest adding Assert(false); in RC_END_OF_SCAN case to make the > error clearer. > > case RC_END_OF_SCAN: > /* > * We've already returned NULL for this scan, but just in

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-22 Thread Andy Fan
On Sun, Nov 22, 2020 at 9:21 PM Andy Fan wrote: > Hi David: > > I did a review on the v8, it looks great to me. Here are some tiny > things noted, > just FYI. > > 1. modified src/include/utils/selfuncs.h > @@ -70,9 +70,9 @@ > * callers to provide further details about some assumptions

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-22 Thread Andy Fan
Hi David: I did a review on the v8, it looks great to me. Here are some tiny things noted, just FYI. 1. modified src/include/utils/selfuncs.h @@ -70,9 +70,9 @@ * callers to provide further details about some assumptions which were made * during the estimation. */ -#define

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-11 Thread David Rowley
On Tue, 10 Nov 2020 at 12:55, David Rowley wrote: > > On Tue, 10 Nov 2020 at 12:49, Tom Lane wrote: > > > > Alvaro Herrera writes: > > > Are you taking into account the possibility that generated machine code > > > is a small percent slower out of mere bad luck? I remember someone > > >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-09 Thread David Rowley
On Tue, 10 Nov 2020 at 15:38, Andy Fan wrote: > While I have interest about what caused the tiny difference, I admit that > what direction > this patch should go is more important. Not sure if anyone is convinced that > v8 and v9 have a similar performance. The current data show it is

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-09 Thread Andy Fan
On Tue, Nov 10, 2020 at 7:55 AM David Rowley wrote: > On Tue, 10 Nov 2020 at 12:49, Tom Lane wrote: > > > > Alvaro Herrera writes: > > > Are you taking into account the possibility that generated machine code > > > is a small percent slower out of mere bad luck? I remember someone > > >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-09 Thread Peter Geoghegan
On Mon, Nov 9, 2020 at 3:49 PM Tom Lane wrote: > Alvaro Herrera writes: > > Are you taking into account the possibility that generated machine code > > is a small percent slower out of mere bad luck? I remember someone > > suggesting that they can make code 2% faster or so by inserting random >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-09 Thread David Rowley
On Tue, 10 Nov 2020 at 12:49, Tom Lane wrote: > > Alvaro Herrera writes: > > Are you taking into account the possibility that generated machine code > > is a small percent slower out of mere bad luck? I remember someone > > suggesting that they can make code 2% faster or so by inserting random

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-09 Thread Tom Lane
Alvaro Herrera writes: > Are you taking into account the possibility that generated machine code > is a small percent slower out of mere bad luck? I remember someone > suggesting that they can make code 2% faster or so by inserting random > no-op instructions in the binary, or something like

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-09 Thread Alvaro Herrera
On 2020-Nov-10, David Rowley wrote: > On Mon, 9 Nov 2020 at 16:29, Andy Fan wrote: > > However I believe v9 > > should be no worse than v8 all the time, Is there any theory to explain > > your result? > > Nothing jumps out at me from looking at profiles. The only thing I > noticed was the

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-09 Thread David Rowley
On Mon, 9 Nov 2020 at 16:29, Andy Fan wrote: > I think either version is OK for me and I like this patch overall. That's good to know. Thanks. > However I believe v9 > should be no worse than v8 all the time, Is there any theory to explain > your result? Nothing jumps out at me from looking

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread Andy Fan
On Mon, Nov 9, 2020 at 10:07 AM David Rowley wrote: > On Mon, 9 Nov 2020 at 03:52, Andy Fan wrote: > > then I did a perf on the 2 version, Is it possible that you called > tts_minimal_clear twice in > > the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple called > tts_minimal_clear

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread David Rowley
On Mon, 9 Nov 2020 at 03:52, Andy Fan wrote: > then I did a perf on the 2 version, Is it possible that you called > tts_minimal_clear twice in > the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple called > tts_minimal_clear > on the same slot. > > With the following changes:

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread Andy Fan
On Mon, Nov 2, 2020 at 3:44 PM David Rowley wrote: > On Tue, 20 Oct 2020 at 22:30, David Rowley wrote: > > > > So far benchmarking shows there's still a regression from the v8 > > version of the patch. This is using count(*). An earlier test [1] did > > show speedups when we needed to deform

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread Andy Fan
On Fri, Nov 6, 2020 at 6:13 AM David Rowley wrote: > On Mon, 2 Nov 2020 at 20:43, David Rowley wrote: > > > > On Tue, 20 Oct 2020 at 22:30, David Rowley wrote: > > I did some further tests this time with some tuple deforming. Again, > > it does seem that v9 is slower than v8. > > > > Graphs

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-05 Thread David Rowley
On Mon, 2 Nov 2020 at 20:43, David Rowley wrote: > > On Tue, 20 Oct 2020 at 22:30, David Rowley wrote: > I did some further tests this time with some tuple deforming. Again, > it does seem that v9 is slower than v8. > > Graphs attached > > Looking at profiles, I don't really see any obvious

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-09-02 Thread Alvaro Herrera
On 2020-Sep-02, David Rowley wrote: > v7 (Separate Result Cache node) > Query 1: > Execution Time: 894.003 ms > > Query 2: > Execution Time: 854.950 ms > v7 + hacks_V3 (caching done in Nested Loop) > Query 1: > Execution Time: 770.470 ms > > Query 2 > Execution Time: 779.181 ms Wow, this

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-09-01 Thread David Rowley
On Sat, 29 Aug 2020 at 02:54, David Rowley wrote: > > On Wed, 26 Aug 2020 at 03:52, Andres Freund wrote: > > There'll be a significant reduction in increase in performance. > > So I did a very rough-cut change to the patch to have the caching be > part of Nested Loop. It can be applied on top

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-31 Thread David Rowley
Thanks for chipping in here. On Mon, 31 Aug 2020 at 17:57, Thomas Munro wrote: > I wonder what it would take to overcome the overheads of the separate > Result Cache node, with techniques to step out of the way or something > like that. So far it looks like there are more overheads to having

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-30 Thread Thomas Munro
On Sat, Aug 29, 2020 at 3:33 AM Robert Haas wrote: > I think David's points elsewhere on the thread about ProjectSet and > Materialize nodes are interesting. Indeed, I'm now finding it very difficult to look past the similarity with: postgres=# explain select count(*) from t t1 cross join t t2;

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-28 Thread Robert Haas
On Wed, Aug 19, 2020 at 6:58 PM Alvaro Herrera wrote: > On 2020-Aug-19, David Rowley wrote: > > Andres' suggestion: > > regression=# explain (analyze, costs off, timing off, summary off) > > select count(*) from tenk1 t1 inner join tenk1 t2 on > > t1.twenty=t2.unique1; > >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-28 Thread David Rowley
On Sat, 29 Aug 2020 at 02:54, David Rowley wrote: > I'm open to ideas to make the comparison fairer. While on that, it's not just queries that don't require the cached tuple to be deformed that are slower. Here's a couple of example that do requite both patches to deform the cached tuple: Some

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-28 Thread David Rowley
On Wed, 26 Aug 2020 at 03:52, Andres Freund wrote: > > On 2020-08-25 20:48:37 +1200, David Rowley wrote: > > However, given the correct planner choice, there will never be > > a gross slowdown due to having the extra node. > > There'll be a significant reduction in increase in performance. So I

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-26 Thread Andy Fan
On Wed, Aug 26, 2020 at 8:14 AM David Rowley wrote: > On Wed, 26 Aug 2020 at 05:18, Andy Fan wrote: > > > > > > On Tue, Aug 25, 2020 at 11:53 PM Andres Freund > wrote: > >> > >> On 2020-08-25 20:48:37 +1200, David Rowley wrote: > >> > Also, just in case anyone is misunderstanding this Andres'

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread David Rowley
On Wed, 26 Aug 2020 at 05:18, Andy Fan wrote: > > > On Tue, Aug 25, 2020 at 11:53 PM Andres Freund wrote: >> >> On 2020-08-25 20:48:37 +1200, David Rowley wrote: >> > Also, just in case anyone is misunderstanding this Andres' argument. >> > It's entirely based on the performance impact of having

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Andy Fan
On Tue, Aug 25, 2020 at 11:53 PM Andres Freund wrote: > Hi, > > On 2020-08-25 20:48:37 +1200, David Rowley wrote: > > On Tue, 25 Aug 2020 at 08:26, Andres Freund wrote: > > > While I'm against introducing a separate node for the caching, I'm > *not* > > > against displaying a different node

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Andy Fan
On Tue, Aug 25, 2020 at 11:53 PM Andres Freund wrote: > Hi, > > On 2020-08-25 20:48:37 +1200, David Rowley wrote: > > On Tue, 25 Aug 2020 at 08:26, Andres Freund wrote: > > > While I'm against introducing a separate node for the caching, I'm > *not* > > > against displaying a different node

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Andres Freund
Hi, On 2020-08-25 20:48:37 +1200, David Rowley wrote: > On Tue, 25 Aug 2020 at 08:26, Andres Freund wrote: > > While I'm against introducing a separate node for the caching, I'm *not* > > against displaying a different node type when caching is > > present. E.g. it'd be perfectly reasonable from

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread Gavin Flower
On 25/08/2020 20:48, David Rowley wrote: On Tue, 25 Aug 2020 at 08:26, Andres Freund wrote: While I'm against introducing a separate node for the caching, I'm *not* against displaying a different node type when caching is present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-25 Thread David Rowley
On Tue, 25 Aug 2020 at 08:26, Andres Freund wrote: > While I'm against introducing a separate node for the caching, I'm *not* > against displaying a different node type when caching is > present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached > Nested Loop' join and a plain

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-24 Thread Andres Freund
Hi, On 2020-08-19 18:58:11 -0400, Alvaro Herrera wrote: > On 2020-Aug-19, David Rowley wrote: > > > Andres' suggestion: > > > > regression=# explain (analyze, costs off, timing off, summary off) > > select count(*) from tenk1 t1 inner join tenk1 t2 on > > t1.twenty=t2.unique1; > >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-19 Thread Pavel Stehule
čt 20. 8. 2020 v 0:04 odesílatel David Rowley napsal: > On Wed, 19 Aug 2020 at 16:18, Pavel Stehule > wrote: > > > > > > > > st 19. 8. 2020 v 5:48 odesílatel David Rowley > napsal: > >> Current method: > >> > >> regression=# explain (analyze, costs off, timing off, summary off) > >> select

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-19 Thread David Rowley
On Thu, 20 Aug 2020 at 10:58, Alvaro Herrera wrote: > On the performance aspect, I wonder what the overhead is, particularly > considering Tom's point of making these nodes more expensive for cases > with no caching. It's likely small. I've not written any code but only thought about it and I

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-19 Thread Alvaro Herrera
On 2020-Aug-19, David Rowley wrote: > Andres' suggestion: > > regression=# explain (analyze, costs off, timing off, summary off) > select count(*) from tenk1 t1 inner join tenk1 t2 on > t1.twenty=t2.unique1; > QUERY PLAN >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-19 Thread David Rowley
On Wed, 19 Aug 2020 at 16:18, Pavel Stehule wrote: > > > > st 19. 8. 2020 v 5:48 odesílatel David Rowley napsal: >> Current method: >> >> regression=# explain (analyze, costs off, timing off, summary off) >> select twenty, (select count(*) from tenk1 t2 where t1.twenty = >> t2.twenty) from tenk1

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-19 Thread David Rowley
On Wed, 19 Aug 2020 at 16:23, Tom Lane wrote: > > David Rowley writes: > > I don't object to making the change. I just object to making it only > > to put it back again later when someone else speaks up that they'd > > prefer to keep nodes modular and not overload them in obscure ways. > > > So

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-18 Thread Tom Lane
David Rowley writes: > I don't object to making the change. I just object to making it only > to put it back again later when someone else speaks up that they'd > prefer to keep nodes modular and not overload them in obscure ways. > So other input is welcome. Is it too weird to overload SubPlan

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-18 Thread Pavel Stehule
st 19. 8. 2020 v 5:48 odesílatel David Rowley napsal: > On Tue, 18 Aug 2020 at 21:42, David Rowley wrote: > > > > On Tue, 11 Aug 2020 at 17:44, Andres Freund wrote: > > > > > > Hi, > > > > > > On 2020-08-11 17:23:42 +1200, David Rowley wrote: > > > > On Tue, 11 Aug 2020 at 12:21, Andres Freund

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-18 Thread David Rowley
On Tue, 18 Aug 2020 at 21:42, David Rowley wrote: > > On Tue, 11 Aug 2020 at 17:44, Andres Freund wrote: > > > > Hi, > > > > On 2020-08-11 17:23:42 +1200, David Rowley wrote: > > > On Tue, 11 Aug 2020 at 12:21, Andres Freund wrote: > > > > > > > > On 2020-07-09 10:25:14 +1200, David Rowley

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-18 Thread David Rowley
On Tue, 11 Aug 2020 at 17:44, Andres Freund wrote: > > Hi, > > On 2020-08-11 17:23:42 +1200, David Rowley wrote: > > On Tue, 11 Aug 2020 at 12:21, Andres Freund wrote: > > > > > > On 2020-07-09 10:25:14 +1200, David Rowley wrote: > > > > On Thu, 9 Jul 2020 at 04:53, Andres Freund wrote: > > > >

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-10 Thread Andres Freund
Hi, On 2020-08-11 17:23:42 +1200, David Rowley wrote: > On Tue, 11 Aug 2020 at 12:21, Andres Freund wrote: > > > > On 2020-07-09 10:25:14 +1200, David Rowley wrote: > > > On Thu, 9 Jul 2020 at 04:53, Andres Freund wrote: > > > > I'm not convinced it's a good idea to introduce a separate

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-10 Thread David Rowley
On Tue, 11 Aug 2020 at 12:21, Andres Freund wrote: > > On 2020-07-09 10:25:14 +1200, David Rowley wrote: > > On Thu, 9 Jul 2020 at 04:53, Andres Freund wrote: > > > I'm not convinced it's a good idea to introduce a separate executor node > > > for this. There's a fair bit of overhead in them,

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-10 Thread Andres Freund
Hi, On 2020-07-09 10:25:14 +1200, David Rowley wrote: > On Thu, 9 Jul 2020 at 04:53, Andres Freund wrote: > > > > On 2020-05-20 23:44:27 +1200, David Rowley wrote: > > > I've attached a patch which implements this. The new node type is > > > called "Result Cache". I'm not particularly wedded

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-06 Thread David Rowley
On Thu, 6 Aug 2020 at 05:44, Andres Freund wrote: > > Andres, or anyone, any objections to me pushing 0002? > > I think it'd be good to add a warning that, unless one is very careful, > no other hashtable modifications are allowed between lookup and > modification. E.g. something like > a =

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-05 Thread David Rowley
On Thu, 6 Aug 2020 at 08:13, Robert Haas wrote: > > This is cool work; I am going to bikeshed on the name for a minute. I > don't think Result Cache is terrible, but I have two observations: Thanks > 1. It might invite confusion with a feature of some other database > systems where they cache

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-05 Thread Peter Geoghegan
On Wed, May 20, 2020 at 4:44 AM David Rowley wrote: > Does it seem like something we might want for PG14? Minor terminology issue: "Hybrid Hash Join" is a specific hash join algorithm which is unrelated to what you propose to do here. I hope that confusion can be avoided, possibly by not using

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-05 Thread Robert Haas
On Wed, May 20, 2020 at 7:44 AM David Rowley wrote: > I've attached a patch which implements this. The new node type is > called "Result Cache". I'm not particularly wedded to keeping that > name, but if I change it, I only want to do it once. I've got a few > other names I mind, but I don't

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-05 Thread Andres Freund
Hi, On 2020-08-04 10:05:25 +1200, David Rowley wrote: > I'd like to push the 0002 patch quite soon as I think it's an > improvement to simplehash.h regardless of if we get Result Cache. It > reuses the SH_LOOKUP function for deletes. Also, if we ever get around > to giving up performing a lookup

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-07-08 Thread David Rowley
On Thu, 9 Jul 2020 at 04:53, Andres Freund wrote: > > On 2020-05-20 23:44:27 +1200, David Rowley wrote: > > I've attached a patch which implements this. The new node type is > > called "Result Cache". I'm not particularly wedded to keeping that > > name, but if I change it, I only want to do it

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-07-08 Thread Andres Freund
Hi, On 2020-05-20 23:44:27 +1200, David Rowley wrote: > I've attached a patch which implements this. The new node type is > called "Result Cache". I'm not particularly wedded to keeping that > name, but if I change it, I only want to do it once. I've got a few > other names I mind, but I don't

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-06-29 Thread David Rowley
On Fri, 12 Jun 2020 at 16:10, Andy Fan wrote: > I first tried to see if we can have a row estimation before the subplan > is created and it looks very complex. The subplan was created during > preprocess_qual_conditions, at that time, we even didn't create the base > RelOptInfo , to say nothing

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-06-11 Thread Andy Fan
On Wed, Jun 3, 2020 at 10:36 AM Andy Fan wrote: > >> Thanks for running those tests. I had a quick look at the results and >> I think to say that all 4 are better is not quite right. One is >> actually a tiny bit slower and one is only faster due to a plan >> change. >> >> > Yes.. Thanks for

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-06-02 Thread Andy Fan
> > > Thanks for running those tests. I had a quick look at the results and > I think to say that all 4 are better is not quite right. One is > actually a tiny bit slower and one is only faster due to a plan > change. > > Yes.. Thanks for pointing it out. > Q18 uses a result cache for 2 x

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-06-02 Thread David Rowley
On Tue, 2 Jun 2020 at 21:05, Andy Fan wrote: > Today I tested the correctness & performance of this patch based on TPC-H > workload, the environment is setup based on [1]. Correctness is tested by > storing the result into another table when this feature is not introduced and > then enable this

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-05-25 Thread David Rowley
On Fri, 22 May 2020 at 12:12, Andy Fan wrote: > Actually I am not sure about what does the "parameterized sub plan" mean (I > treat is a SubPlan Node), so please correct me if I misunderstand you:) > Because > the inner plan in nest loop not a SubPlan node actually. so if bind the > facility

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-05-21 Thread Andy Fan
> My question is whether it should be added as an optional facility of a > parameterised sub plan, rather than an always-needed full-strength node. > That way the choice of whether to use it can happen at execution time once > we notice that we've been called too many times. > > Actually I am not

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-05-21 Thread David Rowley
On Thu, 21 May 2020 at 00:56, Simon Riggs wrote: > I thought the main reason to do this was the case when the nested loop > subplan was significantly underestimated and we realize during execution that > we should have built a hash table. So including this based on cost alone > seems to miss a

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-05-20 Thread Simon Riggs
On Wed, 20 May 2020 at 12:44, David Rowley wrote: > Hackers, > > Over on [1], Heikki mentioned about the usefulness of caching results > from parameterized subplans so that they could be used again for > subsequent scans which have the same parameters as a previous scan. > On [2], I mentioned