Re: Trouble with hashagg spill I/O pattern and costing

2020-06-05 Thread Tomas Vondra
On Fri, Jun 05, 2020 at 06:51:34PM -0400, Alvaro Herrera wrote: On 2020-Jun-06, Tomas Vondra wrote: On Fri, Jun 05, 2020 at 05:19:43PM -0400, Alvaro Herrera wrote: > Is this patch the only thing missing before this open item can be > considered closed? I've already pushed this as

Re: Trouble with hashagg spill I/O pattern and costing

2020-06-05 Thread Alvaro Herrera
On 2020-Jun-06, Tomas Vondra wrote: > On Fri, Jun 05, 2020 at 05:19:43PM -0400, Alvaro Herrera wrote: > > > Is this patch the only thing missing before this open item can be > > considered closed? > > I've already pushed this as 4cad2534da6d17067d98cf04be2dfc1bda8f2cd0, > sorry for not

Re: Trouble with hashagg spill I/O pattern and costing

2020-06-05 Thread Tomas Vondra
On Fri, Jun 05, 2020 at 05:19:43PM -0400, Alvaro Herrera wrote: Hello Is this patch the only thing missing before this open item can be considered closed? I've already pushed this as 4cad2534da6d17067d98cf04be2dfc1bda8f2cd0, sorry for not mentioning it in this thread explicitly. regards

Re: Trouble with hashagg spill I/O pattern and costing

2020-06-05 Thread Alvaro Herrera
Hello Is this patch the only thing missing before this open item can be considered closed? Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-29 Thread Jeff Davis
On Fri, 2020-05-29 at 15:04 +0200, Tomas Vondra wrote: > Ah, right. Yeah, we only need to check for AGG_HASH here. Moreover, AGG_MIXED probably does not need the tlist tweak, because the input should be pre-sorted as with AGG_SORTED. > And we should probably do similar check in the

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-29 Thread Tomas Vondra
On Thu, May 28, 2020 at 06:14:55PM -0700, Jeff Davis wrote: On Thu, 2020-05-28 at 20:57 +0200, Tomas Vondra wrote: Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and create_groupingsets_plan. Looks good, except one question: Why would aggstrategy ever be MIXED when in

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-28 Thread Jeff Davis
On Thu, 2020-05-28 at 20:57 +0200, Tomas Vondra wrote: > Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and create_groupingsets_plan. Looks good, except one question: Why would aggstrategy ever be MIXED when in create_agg_path? Wouldn't that only happen in

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-28 Thread Jeff Davis
On Tue, 2020-05-26 at 17:40 -0700, Jeff Davis wrote: > On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: > > Yeah. I agree prefetching is definitely out of v13 scope. It might be interesting to try how useful it would be, if you're willing to spend some time on a prototype.

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-28 Thread Tomas Vondra
On Wed, May 27, 2020 at 11:07:04AM +0200, Tomas Vondra wrote: On Tue, May 26, 2020 at 06:42:50PM -0700, Melanie Plageman wrote: On Tue, May 26, 2020 at 5:40 PM Jeff Davis wrote: On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: As for the tlist fix, I think that's mostly ready too -

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-27 Thread Tomas Vondra
On Tue, May 26, 2020 at 06:42:50PM -0700, Melanie Plageman wrote: On Tue, May 26, 2020 at 5:40 PM Jeff Davis wrote: On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: > > As for the tlist fix, I think that's mostly ready too - the one thing we should do is probably only doing it for

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Melanie Plageman
On Tue, May 26, 2020 at 5:40 PM Jeff Davis wrote: > On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: > > As for the tlist fix, I think that's mostly ready too - the one thing we should do is probably only doing it for AGG_HASHED. For AGG_SORTED it's not really

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Jeff Davis
On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: > Yeah. I agree prefetching is definitely out of v13 scope. It might be interesting to try how useful it would be, if you're willing to spend some time on a prototype. I think a POC would be pretty quick; I'll see if I can hack something

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Tomas Vondra
On Tue, May 26, 2020 at 11:40:07AM -0700, Jeff Davis wrote: On Tue, 2020-05-26 at 16:15 +0200, Tomas Vondra wrote: I'm not familiar with logtape internals but IIRC the blocks are linked by each block having a pointer to the prev/next block, which means we can't prefetch more than one block

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Jeff Davis
On Tue, 2020-05-26 at 16:15 +0200, Tomas Vondra wrote: > I'm not familiar with logtape internals but IIRC the blocks are linked by each block having a pointer to the prev/next block, which means we can't prefetch more than one block ahead I think. But maybe I'm wrong, or maybe fetching

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-25 Thread Thomas Munro
On Tue, May 26, 2020 at 10:59 AM Tomas Vondra wrote: > On Mon, May 25, 2020 at 12:49:45PM -0700, Jeff Davis wrote: > >Do you think the difference in IO patterns is due to a difference in > >handling reads vs. writes in the kernel? Or do you think that 128 > >blocks is not enough to amortize the

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-25 Thread Tomas Vondra
On Mon, May 25, 2020 at 12:49:45PM -0700, Jeff Davis wrote: On Mon, 2020-05-25 at 14:17 +0200, Tomas Vondra wrote: It's still ~2x slower than the sort, so presumably we'll need to tweak the costing somehow. One thing to think about is that the default random_page_cost is only 4X

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-25 Thread Tomas Vondra
On Mon, May 25, 2020 at 11:36:42AM -0700, Jeff Davis wrote: On Mon, 2020-05-25 at 04:10 +0200, Tomas Vondra wrote: algorithm master prealloc tlist prealloc-tlist -- hash 1365 437 368 213

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-25 Thread Jeff Davis
On Mon, 2020-05-25 at 14:17 +0200, Tomas Vondra wrote: > It's still ~2x slower than the sort, so presumably we'll need to tweak the costing somehow. One thing to think about is that the default random_page_cost is only 4X seq_page_cost. We know that's complete fiction, but it's meant to paper

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-25 Thread Jeff Davis
On Mon, 2020-05-25 at 04:10 +0200, Tomas Vondra wrote: > algorithm master prealloc tlist prealloc-tlist > -- > hash 1365 437 368 213 > sort 226 214 224 215 > > The

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-25 Thread Tomas Vondra
On Mon, May 25, 2020 at 04:10:45AM +0200, Tomas Vondra wrote: ... parallel queries And now the fun begins ... 1) small one (SSD, max_parallel_workers_per_gather = 2) algorithm master tlist prealloc prealloc+tlist --

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-24 Thread Tomas Vondra
On Thu, May 21, 2020 at 11:41:22PM +0200, Tomas Vondra wrote: On Thu, May 21, 2020 at 02:16:37PM -0700, Jeff Davis wrote: On Thu, 2020-05-21 at 21:13 +0200, Tomas Vondra wrote: 2) We could make it self-tuning, by increasing the number of blocks we pre-allocate. So every time we exhaust the

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Thu, May 21, 2020 at 02:16:37PM -0700, Jeff Davis wrote: On Thu, 2020-05-21 at 21:13 +0200, Tomas Vondra wrote: 2) We could make it self-tuning, by increasing the number of blocks we pre-allocate. So every time we exhaust the range, we double the number of blocks (with a reasonable maximum,

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Jeff Davis
On Thu, 2020-05-21 at 21:13 +0200, Tomas Vondra wrote: > 2) We could make it self-tuning, by increasing the number of blocks we pre-allocate. So every time we exhaust the range, we double the number of blocks (with a reasonable maximum, like 1024 or so). Or we might just increment it by 32,

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Thu, May 21, 2020 at 12:40:23PM -0700, Jeff Davis wrote: On Thu, 2020-05-21 at 21:13 +0200, Tomas Vondra wrote: 1) Instead of assigning the pages one by one, we can easily extend the API to allow getting a range of blocks, so that we don't need to call ltsGetFreeBlock in a loop. Instead we

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Jeff Davis
On Thu, 2020-05-21 at 21:13 +0200, Tomas Vondra wrote: > 1) Instead of assigning the pages one by one, we can easily extend the API to allow getting a range of blocks, so that we don't need to call ltsGetFreeBlock in a loop. Instead we could call ltsGetFreeBlockRange with the requested

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Thu, May 21, 2020 at 12:04:19PM -0700, Jeff Davis wrote: On Thu, 2020-05-21 at 20:54 +0200, Tomas Vondra wrote: The last column is master with the tlist tweak alone - it's better than hashagg on master alone, but it's not nearly as good as with both tlist and prealloc patches. Right, I

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Tue, May 19, 2020 at 09:15:40PM -0700, Jeff Davis wrote: On Tue, 2020-05-19 at 19:53 +0200, Tomas Vondra wrote: And is there a way to pre-allocate larger chunks? Presumably we could assign the blocks to tape in larger chunks (e.g. 128kB, i.e. 16 x 8kB) instead of just single block. I

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Jeff Davis
On Thu, 2020-05-21 at 20:54 +0200, Tomas Vondra wrote: > The last column is master with the tlist tweak alone - it's better than hashagg on master alone, but it's not nearly as good as with both tlist and prealloc patches. Right, I certainly think we should do the prealloc change, as

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Thu, May 21, 2020 at 08:34:05PM +0200, Tomas Vondra wrote: On Thu, May 21, 2020 at 11:19:01AM -0700, Jeff Davis wrote: ... I think we should do the pre-allocation patch too. I haven't tried yet but I believe the tlist fix alone won't do nearly as well. I've done some measurements on the

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Thu, May 21, 2020 at 11:19:01AM -0700, Jeff Davis wrote: On Thu, 2020-05-21 at 16:30 +0200, Tomas Vondra wrote: OK, it seems the attached trivial fix (simply changing CP_LABEL_TLIST to CP_SMALL_TLIST) addresses this for me. Great! There were a couple plan changes where it introduced a

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Jeff Davis
On Thu, 2020-05-21 at 16:30 +0200, Tomas Vondra wrote: > OK, it seems the attached trivial fix (simply changing CP_LABEL_TLIST to CP_SMALL_TLIST) addresses this for me. Great! There were a couple plan changes where it introduced a Subquery Scan. I'm not sure that I understand why it's doing
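The gist of the fix described here is a one-flag change in create_agg_plan() (and, per later messages, create_groupingsets_plan()). The pseudocode sketch below illustrates the idea only; it is not the exact committed hunk (commit 4cad2534 also restricts the narrowing, e.g. to hashed strategies, as discussed later in the thread):

```
/* create_agg_plan(), sketch: request a narrow ("small") target list
 * below a hash aggregate so that spilled tuples carry only the columns
 * actually needed, instead of the full labeled tlist. */
flags = CP_LABEL_TLIST;
if (best_path->aggstrategy == AGG_HASHED)
    flags = CP_SMALL_TLIST;      /* previously CP_LABEL_TLIST unconditionally */
subplan = create_plan_recurse(root, best_path->subpath, flags);
```

Smaller spilled tuples mean proportionally less spill I/O, which is where the ~5x "tlist" speedup reported earlier in the thread comes from.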

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Robert Haas
On Thu, May 21, 2020 at 10:45 AM Tomas Vondra wrote: > So the pre-allocation makes it 10x faster, and the tlist tweak makes it > 5x faster. Not bad, I guess. That is pretty great stuff, Tomas. FWIW, I agree that CP_SMALL_TLIST seems like the right thing here. -- Robert Haas EnterpriseDB:

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Thu, May 21, 2020 at 02:12:55AM +0200, Tomas Vondra wrote: ... I agree that's pretty nice. I wonder how far would we need to go before reaching a plateau. I'll try this on the other machine with temporary tablespace on SATA, but that'll take longer. OK, I've managed to get some numbers

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Thu, May 21, 2020 at 03:41:22PM +0200, Tomas Vondra wrote: On Tue, May 19, 2020 at 05:12:02PM +0200, Tomas Vondra wrote: ... The problem is that the hashagg plan runs in ~1400 seconds, while the groupagg only takes ~360. And per explain analyze, the difference really is in the aggregation

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-21 Thread Tomas Vondra
On Tue, May 19, 2020 at 05:12:02PM +0200, Tomas Vondra wrote: ... The problem is that the hashagg plan runs in ~1400 seconds, while the groupagg only takes ~360. And per explain analyze, the difference really is in the aggregation - if we subtract the seqscan, the sort+groupagg takes about

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-20 Thread Tomas Vondra
On Tue, May 19, 2020 at 09:15:40PM -0700, Jeff Davis wrote: On Tue, 2020-05-19 at 19:53 +0200, Tomas Vondra wrote: And is there a way to pre-allocate larger chunks? Presumably we could assign the blocks to tape in larger chunks (e.g. 128kB, i.e. 16 x 8kB) instead of just single block. I

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-19 Thread Jeff Davis
On Tue, 2020-05-19 at 19:53 +0200, Tomas Vondra wrote: > > And is there a way to pre-allocate larger chunks? Presumably we could assign the blocks to tape in larger chunks (e.g. 128kB, i.e. 16 x 8kB) instead of just single block. I haven't seen anything like that in tape.c, though ... It

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-19 Thread Tomas Vondra
On Tue, May 19, 2020 at 09:27:34AM -0700, Jeff Davis wrote: On Tue, 2020-05-19 at 17:12 +0200, Tomas Vondra wrote: I think there are two related problems - with costing and with excessive I/O due to using logical tapes. Thank you for the detailed analysis. I am still digesting this

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-19 Thread Jeff Davis
On Tue, 2020-05-19 at 17:12 +0200, Tomas Vondra wrote: > I think there are two related problems - with costing and with excessive I/O due to using logical tapes. Thank you for the detailed analysis. I am still digesting this information. > This kinda makes me question whether logical tapes