Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-03-04 Thread Thorsten Glaser
On Sat, 4 Mar 2023, Alban Hertroys wrote:

>> But isn’t that the same as with a regular LEFT JOIN?
>
>Similar, but not the same, I’d say.
>
>I do now notice that I made some copying errors there, I was a bit
>nauseous at that time.

np, I’m under pollen attack currently so also not at my best.

>That should have read:
>
>>> select jsonb_build_object('opening_times’,
>>> jsonb_agg(obj
>>> ORDER BY
>>> obj->>'weekday’,
>>> obj->>'from_hour’,
>>> obj->>'to_hour')
>>> )
>>> from cot
>>> cross join lateral jsonb_build_object(
>>> 'weekday', cot.weekday,
>>> 'from_hour', cot.from_hour,
>>> 'to_hour', cot.to_hour) obj
>
>The lateral join applies the function to each row returned from the
>left side of the join and enriches that row with the function result. I
>used a cross join because there is no join condition to apply to the
>lateral, otherwise you could also use an inner join on true.

Okay, so I got it now. But that’s still identical to the LEFT JOIN
which I’m using in that example, because…

>A left join wouldn’t make much sense here, unless the function could

… I have a LEFT JOIN already and can just use the CTE there,
so I don’t have to add an extra lateral join.

But good to know for the future/when I don’t have that.

>return NULL - for example if it were a function marked as STRICT and
>some of the input parameter values (from the table) could be NULL.

OK.

>You need a sub-select, which in turn creates its own result set. It’s
>up to the planner whether the left or the right side gets executed
>first, after which the results of the other side of the join get merged
>to this, or whether this can all be collected in one go. That’s up to
>the query planner to decide though, and it could be right.

OK, but that shouldn’t make a difference here as it needs to run over
all rows of the cgwa table anyway (possibly reduced by filtering on
users).

While not the case here, I see that for other entries the lateral
join would cause more work: for the “qualification” kinds of tables,
for example, the individual qualification table has very few rows
(these are entered by the site admin), but the m:n connection table
(whatever the correct name for these is) has a lot because many of
the users have many of these qualifications. If I use a CTE to add
a JSON object to the individual qualification table first, it doesn’t
run on each qualification multiple times; if I use a lateral join,
it possibly, modulo planner optimisations, runs the jsonb_build_object
function many times per qualification despite them all giving the same
result. And, even if the optimisations catch that, it’s mentally not
the same.

>In my experience, lateral joins go well with the jsonb functions. They
>tend to reduce code repetition when referencing object members, such as
>in your case.

Right. In my case I can get the same by adding a CTE instead though,
and it’s hard to see which is better, performance-wise.

This is a lot to take in, and I r̲e̲a̲l̲l̲y̲ appreciate the detailed
explanations given alongside ☻

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-03-04 Thread Alban Hertroys


> On 3 Mar 2023, at 20:32, Thorsten Glaser  wrote:
> 
> On Fri, 3 Mar 2023, Alban Hertroys wrote:
> 
>> You can rewrite that into something like this:
>> 
>> select jsonb_build_object('opening_times’,
>>  obj
>>  ORDER BY
>>  obj->>'weekday’,
>>  obj->>'from_hour’,
>>  obj->>'to_hour')
>> )
>> from cot
>> cross join lateral jsonb_agg(jsonb_build_object(
>>  'weekday', cot.weekday,
>>  'from_hour', cot.from_hour,
>>  'to_hour', cot.to_hour) obj
> 
> But isn’t that the same as with a regular LEFT JOIN?

Similar, but not the same, I’d say.

I do now notice that I made some copying errors there, I was a bit nauseous at 
that time.
That should have read:

>> select jsonb_build_object('opening_times’,
>> jsonb_agg(obj
>>  ORDER BY
>>  obj->>'weekday’,
>>  obj->>'from_hour’,
>>  obj->>'to_hour')
>> )
>> from cot
>> cross join lateral jsonb_build_object(
>>  'weekday', cot.weekday,
>>  'from_hour', cot.from_hour,
>>  'to_hour', cot.to_hour) obj


The lateral join applies the function to each row returned from the left side 
of the join and enriches that row with the function result.
I used a cross join because there is no join condition to apply to the lateral, 
otherwise you could also use an inner join on true.

I think you could also have used an implicit Cartesian product (using ‘,’ for 
the join), and that in that case the lateral would be implied. I prefer 
explicit notation though.

A left join wouldn’t make much sense here, unless the function could return 
NULL - for example if it were a function marked as STRICT and some of the input 
parameter values (from the table) could be NULL.


>>>   cgwaj AS (
>>> SELECT cgwa.id AS id, jsonb_build_object(
>>> 'weekday', cgwa.weekday,
>>> 'forenoon', cgwa.forenoon,
>>> 'afternoon', cgwa.afternoon,
>>> 'evening', cgwa.evening) AS obj
>>> FROM core_generalworkavailability cgwa
> 
> plus

There are some differences.

You need a sub-select, which in turn creates its own result set. It’s up to the 
planner whether the left or the right side gets executed first, after which the 
results of the other side of the join get merged to this, or whether this can 
all be collected in one go. That’s up to the query planner to decide though, 
and it could be right.

>>> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id
> 
> With the addition that I can aggregate…

You can do so in both situations, but I guess that confusion stems from my 
copy/paste mistake.

In my experience, lateral joins go well with the jsonb functions. They tend to 
reduce code repetition when referencing object members, such as in your case.

Regards,

Alban Hertroys
--
There is always an exception to always.








Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-03-03 Thread Thorsten Glaser
On Fri, 3 Mar 2023, Alban Hertroys wrote:

>You can rewrite that into something like this:
>
>select jsonb_build_object('opening_times’,
>   obj
>   ORDER BY
>   obj->>'weekday’,
>   obj->>'from_hour’,
>   obj->>'to_hour')
>)
>from cot
>cross join lateral jsonb_agg(jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour) obj

But isn’t that the same as with a regular LEFT JOIN?

>>cgwaj AS (
>>  SELECT cgwa.id AS id, jsonb_build_object(
>>  'weekday', cgwa.weekday,
>>  'forenoon', cgwa.forenoon,
>>  'afternoon', cgwa.afternoon,
>>  'evening', cgwa.evening) AS obj
>>  FROM core_generalworkavailability cgwa

plus

>>  LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id

With the addition that I can aggregate…

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)




Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-03-03 Thread Alban Hertroys



> On 3 Mar 2023, at 0:02, Thorsten Glaser  wrote:
> 
> On Tue, 28 Feb 2023, Alban Hertroys wrote:
> 
>> Perhaps you can use a lateral cross join to get the result of
>> jsonb_build_object as a jsonb value to pass around?
> 
> I don’t see how. (But then I’ve not yet worked with lateral JOINs.)

You posted this bit:

> jsonb_build_object('opening_times',
>   jsonb_agg(DISTINCT jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)
>   ORDER BY
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'weekday',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'from_hour',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'to_hour')
> )


You can rewrite that into something like this:

select jsonb_build_object('opening_times’,
obj
ORDER BY
obj->>'weekday’,
obj->>'from_hour’,
obj->>'to_hour')
)
from cot
cross join lateral jsonb_agg(jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour) obj

That’s off the top of my head and I did leave out the DISTINCT. Just to show 
the concept here. A bit of experimenting and reading should get you there, I’m 
keeping $work waiting :P

(…)

> WITH
>cgwaj AS (
>   SELECT cgwa.id AS id, jsonb_build_object(
>   'weekday', cgwa.weekday,
>   'forenoon', cgwa.forenoon,
>   'afternoon', cgwa.afternoon,
>   'evening', cgwa.evening) AS obj
>   FROM core_generalworkavailability cgwa
>),
>-- … same for opening times
> SELECT cp.email, …,
>   -- …
>   jsonb_build_object('possible_work_times', COALESCE(
>   jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday',
>   cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon',
>   cgwaj.obj->>'evening')
>   FILTER (WHERE cgwaj.id IS NOT NULL))) ||
>   -- …
>FROM core_person cp
>   -- …
>   LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id
>   LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id
>   -- …
> 
> That is, add a CTE for each m:n-attached table whose “value” is
> an object, not a single field, keep the id field; LEFT JOIN that
> (instead of the original table), then we have a field to use in
> ORDER BY.
> 
> I think. I’ve not yet tried it (I don’t have access to that DB
> normally, I was just helping out).
> 
> This avoids sub-SELECTs in the sense of needing to run one for
> each user row, because the innermost JSON object building needs
> to be done for each (connected (if the query is not filtering on
> specific users)) row of the “property table”, anyway. (And even
> if filtered, that can be passed down.)
> 
> bye,
> //mirabilos
> -- 
> Solange man keine schmutzigen Tricks macht, und ich meine *wirklich*
> schmutzige Tricks, wie bei einer doppelt verketteten Liste beide
> Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz
> hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r
> 

Alban Hertroys
--
There is always an exception to always.








Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-03-02 Thread Thorsten Glaser
On Tue, 28 Feb 2023, Alban Hertroys wrote:

>Perhaps you can use a lateral cross join to get the result of
>jsonb_build_object as a jsonb value to pass around?

I don’t see how. (But then I’ve not yet worked with lateral JOINs.)

But I c̲a̲n̲ just generate the objects first, I t̲h̲i̲n̲k̲, given one
of them corresponds to exactly one of the rows of an m:n-linked
table and nothing else. Something like…

WITH
cgwaj AS (
SELECT cgwa.id AS id, jsonb_build_object(
'weekday', cgwa.weekday,
'forenoon', cgwa.forenoon,
'afternoon', cgwa.afternoon,
'evening', cgwa.evening) AS obj
FROM core_generalworkavailability cgwa
),
-- … same for opening times
SELECT cp.email, …,
-- …
jsonb_build_object('possible_work_times', COALESCE(
jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday',
cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon',
cgwaj.obj->>'evening')
FILTER (WHERE cgwaj.id IS NOT NULL))) ||
-- …
FROM core_person cp
-- …
LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id
LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id
-- …

That is, add a CTE for each m:n-attached table whose “value” is
an object, not a single field, keep the id field; LEFT JOIN that
(instead of the original table), then we have a field to use in
ORDER BY.

I think. I’ve not yet tried it (I don’t have access to that DB
normally, I was just helping out).

This avoids sub-SELECTs in the sense of needing to run one for
each user row, because the innermost JSON object building needs
to be done for each (connected (if the query is not filtering on
specific users)) row of the “property table”, anyway. (And even
if filtered, that can be passed down.)

bye,
//mirabilos
-- 
Solange man keine schmutzigen Tricks macht, und ich meine *wirklich*
schmutzige Tricks, wie bei einer doppelt verketteten Liste beide
Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz
hervorragend.   -- Andreas Bogk über boehm-gc in d.a.s.r





Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-02-28 Thread Alban Hertroys



> On 28 Feb 2023, at 3:54, Thorsten Glaser  wrote:

(…)

>> Well, that may be what you want, but it's not what you wrote in
>> the query.  Follow David's advice and do
> […]
>> I'm pretty sure that this will only incur one evaluation of the
>> common subexpression, so even though it's tedious to type it's not
>> inefficient.
> 
> Thanks. But I fear it’s not as simple as you wrote. More like:
> 
> jsonb_build_object('opening_times',
>   jsonb_agg(DISTINCT jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)
>   ORDER BY
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'weekday',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'from_hour',
>   jsonb_build_object(
>   'weekday', cot.weekday,
>   'from_hour', cot.from_hour,
>   'to_hour', cot.to_hour)->>'to_hour')
> )
> 
> Isn’t that more like it?


Perhaps you can use a lateral cross join to get the result of 
jsonb_build_object as a jsonb value to pass around?


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y

2023-02-27 Thread Thorsten Glaser
On Mon, 27 Feb 2023, David G. Johnston wrote:

>Consider this then as a jumping point to a more precise query form:
[…]
>the basic concept holds - produce single rows in subqueries then join those
>various single rows together to produce your desired json output.

Ouch. I’ll have to read up and experiment with that, I guess.
But wouldn’t I rather then do a CTA for each referenced table
that does the aggregating and GROUP BY person_id, then join that
(which has single rows for each user row) to users?

>--BAD cross joining going on with multiple one-to-many relationships
>--GOOD, only joining one-to-one relationships

Not quite. This is many-to-many created by Django…


On Mon, 27 Feb 2023, Tom Lane wrote:

>Sure, I was just trying to explain the rule.

Aaah, okay. Sorry, I misunderstood that, and the… general direction
of the replies, then.

>Well, that may be what you want, but it's not what you wrote in
>the query.  Follow David's advice and do
[…]
>I'm pretty sure that this will only incur one evaluation of the
>common subexpression, so even though it's tedious to type it's not
>inefficient.

Thanks. But I fear it’s not as simple as you wrote. More like:

jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)
ORDER BY
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'weekday',
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'from_hour',
jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)->>'to_hour')
)

Isn’t that more like it?

(Though I guess at that point I could just drop at least
the to_hour fallback sort, hoping nobody inserts overlapping
hours with indentical start times…)

Now that I see that, it sort of re-raises a question I had
during developing this but didn’t pursue.

How about, instead of creating a JSONB object here, I create
a (not-JSON) ARRAY['weekday', cot.weekday, 'from_hour',
cot.from_hour, 'to_hour', cot.to_hour] in the aggregate function.
The array is something I could naturally use to sort (its elements
already are in sort key order), and its elements *also* are in
the order jsonb_build_object expects its arguments (assuming I can
do something like passing the array to it instead of “spreading”
the arguments). Pseudocode:

… hmm, not that easy. The aggregate would return e.g. this…

ARRAY[['weekday',1,…],['weekday',2,…]]

… and there’s no array_map function that could be used to pass
each inner array, one by one, to jsonb_build_object; converting
the outer array to JSON gets us json_array_elements (same problem),
other ways I can think of also don’t seem to have matching functions
(jq has… interesting ones).

As I write this, I fear that won’t fly because nōn-JSON arrays
cannot contain mixed types (text and numbers) in PostgreSQL… at
least I ran into that at some point in the past vaguely remembering…

Oh well,
//mirabilos
-- 
Solange man keine schmutzigen Tricks macht, und ich meine *wirklich*
schmutzige Tricks, wie bei einer doppelt verketteten Liste beide
Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz
hervorragend.   -- Andreas Bogk über boehm-gc in d.a.s.r