Re: Why do indexes and sorts use the database collation?

2023-11-17 Thread Jeff Davis
On Mon, 2023-11-13 at 14:12 -0800, Andres Freund wrote: > Why on earth are we solving this by having multiple pg_collation > entries for > exactly the same collation, instead of normalizing the collation-name > during > lookup by adding the relevant encoding name if not explicitly > specified?  It

Re: Why do indexes and sorts use the database collation?

2023-11-15 Thread Jeff Davis
On Tue, 2023-11-14 at 13:01 +0100, Tomas Vondra wrote: > Presumably we'd no generate incorrect > results, but we'd not be able use an index, causing performance > issues. Couldn't use the index for its pathkeys or range scans, but could use it for equality. > AFAICS this is a trade-off between

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Jeff Davis
On Wed, 2023-11-15 at 00:52 +0100, Matthias van de Meent wrote: > That doesn't really answer the question for me. Why would you have a > primary key that has different collation rules (which include > equality > rules) The equality rules for all deterministic collations are the same: if the bytes

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Matthias van de Meent
On Wed, 15 Nov 2023 at 00:28, Jeff Davis wrote: > > On Tue, 2023-11-14 at 14:47 -0500, Tom Lane wrote: > > Why should that ever be different from the column's own declared > > collation? > > Because an index with the "C" collation is more efficient in terms of > building/maintaining/searching the

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Jeff Davis
On Tue, 2023-11-14 at 14:47 -0500, Tom Lane wrote: > Why should that ever be different from the column's own declared > collation? Because an index with the "C" collation is more efficient in terms of building/maintaining/searching the index, and it also doesn't carry risks of corrupting your PK

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Tom Lane
Jeff Davis writes: > On Tue, 2023-11-14 at 17:15 +0100, Peter Eisentraut wrote: >> The problem is that the user has no way to declare whether they just >> want this. > We should add a way to declare that a primary key should create an > index in a particular collation. Why should that ever be

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Jeff Davis
On Tue, 2023-11-14 at 17:15 +0100, Peter Eisentraut wrote: > On 14.11.23 02:58, Jeff Davis wrote: > > If the user just wants PK/FK constraints, and equality lookups, > > then an > > index with the "C" collation makes a lot of sense to serve those > > purposes. > > The problem is that the user has

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Peter Eisentraut
On 14.11.23 02:58, Jeff Davis wrote: If the user just wants PK/FK constraints, and equality lookups, then an index with the "C" collation makes a lot of sense to serve those purposes. The problem is that the user has no way to declare whether they just want this. The default assumption is

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Peter Eisentraut
On 13.11.23 17:49, Jeff Davis wrote: On Mon, 2023-11-13 at 13:43 +0100, Peter Eisentraut wrote: On 11.11.23 01:03, Jeff Davis wrote: But the database collation is always deterministic, So far! Yeah, if we did that, clearly the index collation would need to match that of the database to be

Re: Why do indexes and sorts use the database collation?

2023-11-14 Thread Tomas Vondra
On 11/14/23 02:58, Jeff Davis wrote: > On Mon, 2023-11-13 at 22:36 +0100, Tomas Vondra wrote: >> Yeah. I don't quite agree with the initial argument that not >> specifying >> the collation explicitly in CREATE TABLE or a query means the user >> does >> not care about the collation. > > I

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Laurenz Albe
On Mon, 2023-11-13 at 22:36 +0100, Tomas Vondra wrote: > Perhaps we could allow the PK index to have a different collation, say > by supporting something like this: > >   ALTER TABLE distributors ADD PRIMARY KEY (dist_id COLLATE "C"); An appealing idea! While at it, we could add an INCLUDE

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Jeff Davis
On Mon, 2023-11-13 at 22:36 +0100, Tomas Vondra wrote: > Yeah. I don't quite agree with the initial argument that not > specifying > the collation explicitly in CREATE TABLE or a query means the user > does > not care about the collation. I didn't argue that the user doesn't care about collation

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Jeff Davis
On Mon, 2023-11-13 at 10:02 -0800, Andres Freund wrote: > > Inequalities and ORDER BYs can't benefit from an index with a > > different > > collation, but lots of indexes don't need that. > > But we don't know whether the index is used for that. That will be hard to quantify, but perhaps we can

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Andres Freund
Hi, On 2023-11-14 00:02:13 +0100, Tomas Vondra wrote: > On 11/13/23 23:12, Andres Freund wrote: > > On 2023-11-13 22:36:24 +0100, Tomas Vondra wrote: > >> ISTM it's about how complex the rules implemented by the collation are, > >> so I agree the cost should be a feature of collations not

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Tomas Vondra
On 11/13/23 23:12, Andres Freund wrote: > Hi, > > On 2023-11-13 22:36:24 +0100, Tomas Vondra wrote: >> I don't think we can just arbitrarily override the default because we >> happen to think "C" is going to be faster. If we could prove that using >> "C" is going to produce exactly the same

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Andres Freund
Hi, On 2023-11-13 22:36:24 +0100, Tomas Vondra wrote: > I don't think we can just arbitrarily override the default because we > happen to think "C" is going to be faster. If we could prove that using > "C" is going to produce exactly the same results as for the implicit > collation (for a given

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Tomas Vondra
On 11/13/23 19:02, Andres Freund wrote: > Hi, > > On 2023-11-11 23:19:55 -0800, Jeff Davis wrote: >> On Fri, 2023-11-10 at 17:19 -0800, Andres Freund wrote: >>> I guess you are arguing that the user didn't intend to create an >>> index here? >> >> No, obviously the user should expect an index

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Andres Freund
Hi, On 2023-11-11 23:19:55 -0800, Jeff Davis wrote: > On Fri, 2023-11-10 at 17:19 -0800, Andres Freund wrote: > > I guess you are arguing that the user didn't intend to create an > > index here? > > No, obviously the user should expect an index when a primary key is > created. But that doesn't

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Jeff Davis
On Mon, 2023-11-13 at 13:43 +0100, Peter Eisentraut wrote: > On 11.11.23 01:03, Jeff Davis wrote: > > But the database collation is always deterministic, > > So far! Yeah, if we did that, clearly the index collation would need to match that of the database to be useful. What are the main

Re: Why do indexes and sorts use the database collation?

2023-11-13 Thread Peter Eisentraut
On 11.11.23 01:03, Jeff Davis wrote: But the database collation is always deterministic, So far!

Re: Why do indexes and sorts use the database collation?

2023-11-11 Thread Jeff Davis
On Fri, 2023-11-10 at 17:19 -0800, Andres Freund wrote: > I guess you are arguing that the user didn't intend to create an > index here? No, obviously the user should expect an index when a primary key is created. But that doesn't mean that it necessarily needs to be ordered according to the

Re: Why do indexes and sorts use the database collation?

2023-11-10 Thread Andres Freund
Hi, On 2023-11-10 16:03:16 -0800, Jeff Davis wrote: > An "en_US" user doing: > >CREATE TABLE foo(t TEXT PRIMARY KEY); > > is providing no indication that they want an index tailored to their > locale. Yet we are creating the index with the "en_US" collation and > therefore imposing huge