Re: Merge David and Goliath tables efficiently

2023-06-20 Thread nicolas paris
> This is absolutely expected. If you partition by hash (id, part_key), you can't join on (id) and expect partitionwise join to work. To quote the enable_partitionwise_join documentation [1]: Enables or disables the query planner's use of partitionwise join, which allows a jo…
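A minimal sketch of the mismatch being described, assuming illustrative tables t1/t2 and a two-column hash partition key (none of these identifiers come from the thread's actual schema):

    SET enable_partitionwise_join = on;

    CREATE TABLE t1 (id bigint, part_key int) PARTITION BY HASH (id, part_key);
    CREATE TABLE t1_p0 PARTITION OF t1 FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    CREATE TABLE t2 (id bigint, part_key int) PARTITION BY HASH (id, part_key);
    CREATE TABLE t2_p0 PARTITION OF t2 FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES WITH (MODULUS 2, REMAINDER 1);

    -- Rows with the same id can land in different partitions depending on
    -- part_key, so this join cannot be done partition by partition:
    EXPLAIN SELECT * FROM t1 JOIN t2 USING (id);

    -- Joining on the full partition key lets the planner join matching
    -- partitions pairwise:
    EXPLAIN SELECT * FROM t1 JOIN t2 USING (id, part_key);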

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread nicolas paris
> But you wrote that in both cases the query is: That was indeed yet another typo; I hope to do better in the future. > I'm willing to continue to investigate, but only if you prepare a reproducer. Thanks for your starter script. Please find attached two scripts which now illustrate two trouble…

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread nicolas paris
> IMHO the thing that breaks it is the ORDER BY in the merge, which likely acts as an optimization fence and prevents all sorts of smart things including the partitionwise join. I'd bet that if Nicolas replaces MERGE INTO "goliath" ca USING (SELECT * FROM "david" ORDER BY "list_id"…
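A hedged sketch of the rewrite being suggested: reference "david" directly instead of wrapping it in an ordered subquery, so the ORDER BY fence disappears. The "payload" column and the exact WHEN clauses are assumptions for illustration:

    MERGE INTO "goliath" ca
    USING "david" d                      -- no ORDER BY subquery around "david"
    ON ca."list_id" = d."list_id"
    WHEN MATCHED THEN
      UPDATE SET "payload" = d."payload"
    WHEN NOT MATCHED THEN
      INSERT ("list_id", "payload") VALUES (d."list_id", d."payload");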

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread nicolas paris
On Mon, 2023-06-19 at 13:34 +0200, Tomas Vondra wrote: > On 6/18/23 22:57, nicolas paris wrote: > > ... > Well, I kinda doubt you have 17023331531230 rows (not even physically possible with a 2TB disk), so that's immediately s…
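A quick sanity check of that row-count claim (pure arithmetic, not from the thread): 2 TB spread over ~1.7e13 rows would leave about 0.12 bytes per row, while every PostgreSQL heap tuple costs roughly 24 bytes of header plus a 4-byte line pointer before any user data:

    SELECT 2e12 / 17023331531230.0 AS bytes_per_row;  -- ≈ 0.117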

Re: Merge David and Goliath tables efficiently

2023-06-18 Thread nicolas paris
> I assume 2TO is 2TB? Yes, 2TB. > I don't understand why you're sharing just a part of the plan. As for the nested loop plan, what I shared is the full plan. Actually it is repeated many times, since the 2M rows are batched by 500 rows. I add it again: Merge on goliath_23 ca (cost=2139.75..11077.17 rows=…
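For context, a hypothetical sketch of the batching pattern being described (500-row slices merged one loop iteration at a time); the key and payload columns are assumptions, and OFFSET-based slicing is just the simplest way to show the idea:

    DO $$
    DECLARE
      skip_n     bigint := 0;
      batch_size constant bigint := 500;
      n          bigint;
    BEGIN
      LOOP
        MERGE INTO goliath g
        USING (SELECT * FROM david
               ORDER BY list_id
               OFFSET skip_n LIMIT batch_size) d
        ON g.list_id = d.list_id
        WHEN MATCHED THEN UPDATE SET payload = d.payload
        WHEN NOT MATCHED THEN
          INSERT (list_id, payload) VALUES (d.list_id, d.payload);
        GET DIAGNOSTICS n = ROW_COUNT;   -- rows merged in this slice
        EXIT WHEN n = 0;                 -- past the end of the staging data
        skip_n := skip_n + batch_size;
      END LOOP;
    END $$;

Note the ORDER BY here exists only to make the OFFSET slices stable; it is the same construct the message above flags as an optimization fence for the single big MERGE.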

Re: Merge David and Goliath tables efficiently

2023-06-17 Thread nicolas paris
> > My interpretation reading the query plan is: well-sized small batches of upserts leverage the indexes, while the regular join chooses a sequential scan, including sorting and hashing, which takes forever in time and resources, including disk. > You may be right, but it's hard to tell…

Merge David and Goliath tables efficiently

2023-06-17 Thread Nicolas Paris
In my use case I have a 2-billion-row / 1 TB table. I have around 2 million rows of daily data to upsert, with say 50% inserts, based on the primary key, into a freshly analyzed table. I have tested multiple strategies to merge the data, all based on a first stage that copies the 2M dataset into a staging unlogged / index…
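A minimal sketch of that staging stage, assuming the "goliath" target and "list_id" key named later in the thread; the file path and LIKE options are placeholders:

    CREATE UNLOGGED TABLE staging (LIKE "goliath" INCLUDING ALL);  -- copies indexes too
    COPY staging FROM '/path/to/daily_batch.csv' (FORMAT csv);     -- placeholder path
    ANALYZE staging;
    -- ... then merge staging into "goliath" (see the MERGE sketches above) ...
    DROP TABLE staging;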

tablespace to benefit from ssd ?

2020-02-18 Thread Nicolas PARIS
… I have read that tablespaces introduce maintenance overhead and complicate replication. But on the other hand I have this SSD disk ready for something. Any recommendation? -- nicolas paris
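For reference, a hedged sketch of what using that SSD as a tablespace looks like; the mount point and object names are illustrative:

    CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';           -- placeholder path
    CREATE INDEX orders_id_idx ON orders (id) TABLESPACE fast_ssd;   -- hypothetical table
    -- Sort/temp spill can also be pointed there via postgresql.conf:
    --   temp_tablespaces = 'fast_ssd'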

Re: PostgreSQL VS MongoDB: a use case comparison

2018-12-12 Thread Nicolas Paris
On Tue, Nov 20, 2018 at 08:34:20AM -0500, Stephen Frost wrote: > > Anyway, bringing data from JSON to a relational model is off topic for the current discussion, since we are actually questioning if Postgres is a good replacement for Mongo when handling JSON data. > This narrow viewpo…

LEFT JOIN LATERAL optimisation at plan time

2018-09-18 Thread Nicolas Paris
Hi, For a traditional LEFT JOIN, when the SELECT list does not reference any column of a joined relation that is provably unique on the join key, the planner removes the join. E.g.: SELECT a, b --,c FROM table1 LEFT JOIN (SELECT a, c FROM table2 GROUP BY a) joined USING (a) However, this behavior is not the same for LATERAL JOINs…
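An assumed lateral counterpart to the example above, to make the contrast concrete: even though joined.c is never referenced and the subquery yields at most one row per t1.a, the lateral subquery is still planned and executed for every row of table1 rather than being removed:

    SELECT t1.a, t1.b
    FROM table1 t1
    LEFT JOIN LATERAL (
      SELECT t2.a, max(t2.c) AS c
      FROM table2 t2
      WHERE t2.a = t1.a
      GROUP BY t2.a
    ) joined ON true;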