Hi Michael, On Fri, Nov 8, 2024 at 8:52 AM Michael Paquier <[email protected]> wrote: > > On Thu, Nov 07, 2024 at 10:06:04PM +0800, Junwang Zhao wrote: > > I'm afraid this can not be achieved in my current implementation, a simple > > case is: > > > > SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[]); > > {Abc,bar,bbc,CCC,foo,NULL} > > SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C"); > > {Abc,CCC,bar,bbc,foo,NULL} > > > > SELECT array_sort(a COLLATE "C") FROM (VALUES > > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C"), > > ('{foo,bar,null,CCC,Abc,bbc}'::text[])) v(a); > > {Abc,CCC,bar,bbc,foo,NULL} > > {Abc,CCC,bar,bbc,foo,NULL} > > > > Maybe add some documents to specify this? > > So, if I use that: > CREATE COLLATION case_sensitive (provider = icu, locale = ''); > =# SELECT array_sort('{Abc,CCC,bar,bbc,foo,NULL}'::text[] > COLLATE "case_sensitive"); > array_sort > ---------------------------- > {Abc,bar,bbc,CCC,foo,NULL} > (1 row) > =# SELECT array_sort('{Abc,CCC,bar,bbc,foo,NULL}'::text[] > COLLATE "C"); > array_sort > ---------------------------- > {Abc,CCC,bar,bbc,foo,NULL} > (1 row) > > What takes priority is the collation defined with the array_sort, > which is fine: > =# SELECT array_sort(a collate "case_sensitive") FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C" )) v(a); > array_sort > ---------------------------- > {Abc,bar,bbc,CCC,foo,NULL} > {Abc,bar,bbc,CCC,foo,NULL} > (2 rows) > =# SELECT array_sort(a collate "C") FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive" > )) v(a); > array_sort > ---------------------------- > {Abc,CCC,bar,bbc,foo,NULL} > {Abc,CCC,bar,bbc,foo,NULL} > (2 rows) > > The case where the collation is defined in the set of values is a bit > more troubling to me, as it depends on what the values want to be > applied, still that's OK because the collation applied is the one > coming from the set of values: > =# SELECT array_sort(a) FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive" > )) v(a); > array_sort > ---------------------------- > {Abc,bar,bbc,CCC,foo,NULL} > {Abc,bar,bbc,CCC,foo,NULL} > (2 rows) > =# SELECT array_sort(a) FROM > (VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]), > ('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C" )) v(a); > array_sort > ---------------------------- > {Abc,CCC,bar,bbc,foo,NULL} > {Abc,CCC,bar,bbc,foo,NULL} > (2 rows) > > I am wondering if there are more fancy cases where the saved cache > could force a state that would lead to puzzling results, say with > different collations that should be applied. I'd recommend to > research that more, to reflect that in the docs and to add tests that > show what we should expect in these cases within 0001 because this > new function is mimicking in the context of a function execution > multiple query clauses where restrictions are applied when analyzing > the query, close to the parser. > > For example, UNION and UNION ALL require a common collation when > processing a set of expressions related to them, which would be OK. > Perhaps I lack some imagination to be able to break things. > -- > Michael
While trying to come up with more test cases, it comes to me if the
PG_GET_COLLATION() has already done the work to give array_sort
the right collation oid? I did not pass the typentry->typcollation but
PG_GET_COLLATION() to tuplesort_begin_datum.
I tried:
CREATE COLLATION case_sensitive (provider = icu, locale = '');
create table t1(a int, b text[] COLLATE "C");
create table t2(a int, b text[] COLLATE "case_sensitive");
insert into t1 values (1, '{foo,bar,null,CCC,Abc,bbc}'::text[]);
insert into t2 values (2, '{foo,bar,null,CCC,Abc,bbc}'::text[]);
select array_sort(b) from t1;
select array_sort(b) from t2;
Set breakpoint to see the collation oid, they all give the expected result.
For the following cases:
SELECT array_sort(a) FROM
(VALUES ('{foo,bar,null,CCC,Abc,bbc}'::text[]),
('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE
"case_sensitive" )) v(a);
WITH t AS (select '{foo,bar,null,CCC,Abc,bbc}'::text[] a UNION ALL
select '{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive"
a) SELECT array_sort(a) from t;
The collation seems to have been decided in select_common_collation of
the transform phase.
For:
WITH t AS (select '{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C" a
UNION ALL select
'{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sensitive"a)
SELECT array_sort(a) from t;
ERROR: collation mismatch between explicit collations "C" and "case_sensitive"
LINE 2: '{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "case_sens...
merge_collation_state gives out an ERROR since its explicit collation.
But for implicit collation,
select_common_collation sets InvalidOid to context.collation, so the
following works:
select b from t1 UNION ALL select b from t2;
But since the context has the InvalidOid as collation,
PG_GET_COLLATION() in arrary_sort
got InvalidOid, the following errors:
WITH t3 AS (select b from t1 UNION ALL select b from t2) select
array_sort(b) from t3;
ERROR: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.
The error message comes from tuplesort_begin_datum's call stack, we
can do explicit COLLATE to make it work:
WITH t3 AS (select b from t1 UNION ALL select b from t2) select
array_sort(b collate "C") from t3;
Based on the above analysis, I think it's ok to pass PG_GET_COLLATION()
to tuplesort_begin_datum.
PFA v14 with Robert's comment addressed.
--
Regards
Junwang Zhao
v14-0001-general-purpose-array_sort.patch
Description: Binary data
v14-0002-support-sort-order-and-nullsfirst-flag.patch
Description: Binary data
