On Fri, 2025-06-06 at 15:47 -0700, Jeff Davis wrote:
> > > * Force the environment variables LC_COLLATE=C and LC_CTYPE=C
> > > unconditionally, and pg_perm_setlocale() them
> >
> > Currently that would be a regression for some people, because
> > when LC_CTYPE=C, the FTS parser produces substandard results with
> > characters beyond ASCII.
>
> In the other thread, I posted a patch:
>
> https://www.postgresql.org/message-id/a1396f17f462ee6561820f755caaf2d12eb9fd15.camel%40j-davis.com
>
> for the callers that rely on datctype (regardless of datlocprovider),
> they access the locale_t through a global, and use the "_l" variants.
>
> There should be no behavior change, and we still need to set
> LC_CTYPE, so you are right that it's not a solution yet. I think it
> moves us in the right direction, though.

I'm not sure of the history here, but it looks like the reason full
text search doesn't use collation is that neither tsvector nor tsquery
is a collatable type. Is that something that can ever be corrected, or
are we just stuck with the current behavior forever?

Even if it's not a collatable type, it should use the database
collation rather than going straight to libc. Again, is that something
that can ever be fixed, or are we stuck with libc semantics for full
text search permanently, even if you initialize the cluster with a
different provider?

Regards,
	Jeff Davis