Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
On Sat, 4 Mar 2023, Alban Hertroys wrote: >> But isn’t that the same as with a regular LEFT JOIN? > >Similar, but not the same, I’d say. > >I do now notice that I made some copying errors there, I was a bit >nauseous at that time. np, I’m under pollen attack currently so also not at my best. >That should have read: > >>> select jsonb_build_object('opening_times’, >>> jsonb_agg(obj >>> ORDER BY >>> obj->>'weekday’, >>> obj->>'from_hour’, >>> obj->>'to_hour') >>> ) >>> from cot >>> cross join lateral jsonb_build_object( >>> 'weekday', cot.weekday, >>> 'from_hour', cot.from_hour, >>> 'to_hour', cot.to_hour) obj > >The lateral join applies the function to each row returned from the >left side of the join and enriches that row with the function result. I >used a cross join because there is no join condition to apply to the >lateral, otherwise you could also use an inner join on true. Okay, so I got it now. But that’s still identical to the LEFT JOIN which I’m using in that example, because… >A left join wouldn’t make much sense here, unless the function could … I have a LEFT JOIN already and can just use the CTE there, so I don’t have to add an extra lateral join. But good to know for the future/when I don’t have that. >return NULL - for example if it were a function marked as STRICT and >some of the input parameter values (from the table) could be NULL. OK. >You need a sub-select, which in turn creates its own result set. It’s >up to the planner whether the left or the right side gets executed >first, after which the results of the other side of the join get merged >to this, or whether this can all be collected in one go. That’s up to >the query planner to decide though, and it could be right. OK, but that shouldn’t make a difference here as it needs to run over all rows of the cgwa table anyway (possibly reduced by filtering on users). While not the case here, I see that for other entries the lateral join would cause more work: for the “qualification” kinds of tables, for example, the individual qualification table has very few rows (these are entered by the site admin), but the m:n connection table (whatever the correct name for these is) has a lot because many of the users have many of these qualifications. If I use a CTE to add a JSON object to the individual qualification table first, it doesn’t run on each qualification multiple times; if I use a lateral join, it possibly, modulo planner optimisations, runs the jsonb_build_object function many times per qualification despite them all giving the same result. And, even if the optimisations catch that, it’s mentally not the same. >In my experience, lateral joins go well with the jsonb functions. They >tend to reduce code repetition when referencing object members, such as >in your case. Right. In my case I can get the same by adding a CTE instead though, and it’s hard to see which is better, performance-wise. This is a lot to take in, and I r̲e̲a̲l̲l̲y̲ appreciate the detailed explanations given alongside ☻ bye, //mirabilos -- 15:41⎜ Somebody write a testsuite for helloworld :-)
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 3 Mar 2023, at 20:32, Thorsten Glaser wrote: > > On Fri, 3 Mar 2023, Alban Hertroys wrote: > >> You can rewrite that into something like this: >> >> select jsonb_build_object('opening_times’, >> obj >> ORDER BY >> obj->>'weekday’, >> obj->>'from_hour’, >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_agg(jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj > > But isn’t that the same as with a regular LEFT JOIN? Similar, but not the same, I’d say. I do now notice that I made some copying errors there, I was a bit nauseous at that time. That should have read: >> select jsonb_build_object('opening_times’, >> jsonb_agg(obj >> ORDER BY >> obj->>'weekday’, >> obj->>'from_hour’, >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj The lateral join applies the function to each row returned from the left side of the join and enriches that row with the function result. I used a cross join because there is no join condition to apply to the lateral, otherwise you could also use an inner join on true. I think you could also have used an implicit Cartesian product (using ‘,’ for the join), and that in that case the lateral would be implied. I prefer explicit notation though. A left join wouldn’t make much sense here, unless the function could return NULL - for example if it were a function marked as STRICT and some of the input parameter values (from the table) could be NULL. >>> cgwaj AS ( >>> SELECT cgwa.id AS id, jsonb_build_object( >>> 'weekday', cgwa.weekday, >>> 'forenoon', cgwa.forenoon, >>> 'afternoon', cgwa.afternoon, >>> 'evening', cgwa.evening) AS obj >>> FROM core_generalworkavailability cgwa > > plus There are some differences. You need a sub-select, which in turn creates its own result set. It’s up to the planner whether the left or the right side gets executed first, after which the results of the other side of the join get merged to this, or whether this can all be collected in one go. That’s up to the query planner to decide though, and it could be right. >>> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > > With the addition that I can aggregate… You can do so in both situations, but I guess that confusion stems from my copy/paste mistake. In my experience, lateral joins go well with the jsonb functions. They tend to reduce code repetition when referencing object members, such as in your case. Regards, Alban Hertroys -- There is always an exception to always.
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
On Fri, 3 Mar 2023, Alban Hertroys wrote: >You can rewrite that into something like this: > >select jsonb_build_object('opening_times’, > obj > ORDER BY > obj->>'weekday’, > obj->>'from_hour’, > obj->>'to_hour') >) >from cot >cross join lateral jsonb_agg(jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) obj But isn’t that the same as with a regular LEFT JOIN? >>cgwaj AS ( >> SELECT cgwa.id AS id, jsonb_build_object( >> 'weekday', cgwa.weekday, >> 'forenoon', cgwa.forenoon, >> 'afternoon', cgwa.afternoon, >> 'evening', cgwa.evening) AS obj >> FROM core_generalworkavailability cgwa plus >> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id With the addition that I can aggregate… bye, //mirabilos -- 15:41⎜ Somebody write a testsuite for helloworld :-)
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 3 Mar 2023, at 0:02, Thorsten Glaser wrote: > > On Tue, 28 Feb 2023, Alban Hertroys wrote: > >> Perhaps you can use a lateral cross join to get the result of >> jsonb_build_object as a jsonb value to pass around? > > I don’t see how. (But then I’ve not yet worked with lateral JOINs.) You posted this bit: > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) You can rewrite that into something like this: select jsonb_build_object('opening_times’, obj ORDER BY obj->>'weekday’, obj->>'from_hour’, obj->>'to_hour') ) from cot cross join lateral jsonb_agg(jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) obj That’s off the top of my head and I did leave out the DISTINCT. Just to show the concept here. A bit of experimenting and reading should get you there, I’m keeping $work waiting :P (…) > WITH >cgwaj AS ( > SELECT cgwa.id AS id, jsonb_build_object( > 'weekday', cgwa.weekday, > 'forenoon', cgwa.forenoon, > 'afternoon', cgwa.afternoon, > 'evening', cgwa.evening) AS obj > FROM core_generalworkavailability cgwa >), >-- … same for opening times > SELECT cp.email, …, > -- … > jsonb_build_object('possible_work_times', COALESCE( > jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday', > cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon', > cgwaj.obj->>'evening') > FILTER (WHERE cgwaj.id IS NOT NULL))) || > -- … >FROM core_person cp > -- … > LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id > LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > -- … > > That is, add a CTE for each m:n-attached table whose “value” is > an object, not a single field, keep the id field; LEFT JOIN that > (instead of the original table), then we have a field to use in > ORDER BY. > > I think. I’ve not yet tried it (I don’t have access to that DB > normally, I was just helping out). > > This avoids sub-SELECTs in the sense of needing to run one for > each user row, because the innermost JSON object building needs > to be done for each (connected (if the query is not filtering on > specific users)) row of the “property table”, anyway. (And even > if filtered, that can be passed down.) > > bye, > //mirabilos > -- > Solange man keine schmutzigen Tricks macht, und ich meine *wirklich* > schmutzige Tricks, wie bei einer doppelt verketteten Liste beide > Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz > hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r > Alban Hertroys -- There is always an exception to always.
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
On Tue, 28 Feb 2023, Alban Hertroys wrote: >Perhaps you can use a lateral cross join to get the result of >jsonb_build_object as a jsonb value to pass around? I don’t see how. (But then I’ve not yet worked with lateral JOINs.) But I c̲a̲n̲ just generate the objects first, I t̲h̲i̲n̲k̲, given one of them corresponds to exactly one of the rows of an m:n-linked table and nothing else. Something like… WITH cgwaj AS ( SELECT cgwa.id AS id, jsonb_build_object( 'weekday', cgwa.weekday, 'forenoon', cgwa.forenoon, 'afternoon', cgwa.afternoon, 'evening', cgwa.evening) AS obj FROM core_generalworkavailability cgwa ), -- … same for opening times SELECT cp.email, …, -- … jsonb_build_object('possible_work_times', COALESCE( jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday', cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon', cgwaj.obj->>'evening') FILTER (WHERE cgwaj.id IS NOT NULL))) || -- … FROM core_person cp -- … LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id -- … That is, add a CTE for each m:n-attached table whose “value” is an object, not a single field, keep the id field; LEFT JOIN that (instead of the original table), then we have a field to use in ORDER BY. I think. I’ve not yet tried it (I don’t have access to that DB normally, I was just helping out). This avoids sub-SELECTs in the sense of needing to run one for each user row, because the innermost JSON object building needs to be done for each (connected (if the query is not filtering on specific users)) row of the “property table”, anyway. (And even if filtered, that can be passed down.) bye, //mirabilos -- Solange man keine schmutzigen Tricks macht, und ich meine *wirklich* schmutzige Tricks, wie bei einer doppelt verketteten Liste beide Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
> On 28 Feb 2023, at 3:54, Thorsten Glaser wrote: (…) >> Well, that may be what you want, but it's not what you wrote in >> the query. Follow David's advice and do > […] >> I'm pretty sure that this will only incur one evaluation of the >> common subexpression, so even though it's tedious to type it's not >> inefficient. > > Thanks. But I fear it’s not as simple as you wrote. More like: > > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) > > Isn’t that more like it? Perhaps you can use a lateral cross join to get the result of jsonb_build_object as a jsonb value to pass around? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
On Mon, 27 Feb 2023, David G. Johnston wrote: >Consider this then as a jumping point to a more precise query form: […] >the basic concept holds - produce single rows in subqueries then join those >various single rows together to produce your desired json output. Ouch. I’ll have to read up and experiment with that, I guess. But wouldn’t I rather then do a CTA for each referenced table that does the aggregating and GROUP BY person_id, then join that (which has single rows for each user row) to users? >--BAD cross joining going on with multiple one-to-many relationships >--GOOD, only joining one-to-one relationships Not quite. This is many-to-many created by Django… On Mon, 27 Feb 2023, Tom Lane wrote: >Sure, I was just trying to explain the rule. Aaah, okay. Sorry, I misunderstood that, and the… general direction of the replies, then. >Well, that may be what you want, but it's not what you wrote in >the query. Follow David's advice and do […] >I'm pretty sure that this will only incur one evaluation of the >common subexpression, so even though it's tedious to type it's not >inefficient. Thanks. But I fear it’s not as simple as you wrote. More like: jsonb_build_object('opening_times', jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) ORDER BY jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)->>'weekday', jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)->>'from_hour', jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)->>'to_hour') ) Isn’t that more like it? (Though I guess at that point I could just drop at least the to_hour fallback sort, hoping nobody inserts overlapping hours with indentical start times…) Now that I see that, it sort of re-raises a question I had during developing this but didn’t pursue. How about, instead of creating a JSONB object here, I create a (not-JSON) ARRAY['weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour] in the aggregate function. The array is something I could naturally use to sort (its elements already are in sort key order), and its elements *also* are in the order jsonb_build_object expects its arguments (assuming I can do something like passing the array to it instead of “spreading” the arguments). Pseudocode: … hmm, not that easy. The aggregate would return e.g. this… ARRAY[['weekday',1,…],['weekday',2,…]] … and there’s no array_map function that could be used to pass each inner array, one by one, to jsonb_build_object; converting the outer array to JSON gets us json_array_elements (same problem), other ways I can think of also don’t seem to have matching functions (jq has… interesting ones). As I write this, I fear that won’t fly because nōn-JSON arrays cannot contain mixed types (text and numbers) in PostgreSQL… at least I ran into that at some point in the past vaguely remembering… Oh well, //mirabilos -- Solange man keine schmutzigen Tricks macht, und ich meine *wirklich* schmutzige Tricks, wie bei einer doppelt verketteten Liste beide Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r