Re: Reorganize collation lookup time and place

2019-02-01 Thread Peter Geoghegan
On Fri, Feb 1, 2019 at 7:27 AM Peter Eisentraut
 wrote:
> This patch will probably make a reappearance when we consider making ICU
> available as the database-wide default collation.

Somebody recently reported that an ICU locale was over 2000x faster
than the equivalent glibc locale with their query due to supporting
the abbreviated keys optimization:

https://postgr.es/m/CACd=f9cO5OcmAeBK1M5a3+t8BO7E91Ki0WaLAxFm=fbbon_...@mail.gmail.com

We really should get around to supporting ICU for database-wide
collations sooner rather than later. I knew that there were cases
where glibc performs abysmally, but I didn't think it could ever be a
difference of more than two orders of magnitude until recently.

-- 
Peter Geoghegan



Re: Reorganize collation lookup time and place

2019-02-01 Thread Peter Eisentraut
On 31/01/2019 13:45, Andres Freund wrote:
> So, what's the plan here? Should the CF entry be closed?

Set to RwF.

This patch will probably make a reappearance when we consider making ICU
available as the database-wide default collation.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reorganize collation lookup time and place

2019-01-31 Thread Andres Freund
Hi,

On 2018-12-13 14:50:53 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-12-13 15:05:40 +0100, Peter Eisentraut wrote:
> >> On 12/12/2018 18:56, Andres Freund wrote:
> >>> Why isn't this integrated into fmgr_info()?
> 
> >> fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
> >> stuff in pg_collation.  There is no overlap between them.
> 
> > It looks up stuff necessary for calling a function, that doesn't fit
> > looking up the collation necessary to do so too badly. A lot of the the
> > changes you made are rote changes to each caller, taking the collation
> > oid and expanding it with pg_newlocale_from_collation().
> 
> I'm not very thrilled with the idea of changing every single caller of
> InitFunctionCallInfoData and related functions, especially when exactly
> zero functional change ensues.  We should work harder on avoiding that
> code churn; extension developers will thank us.
> 
> >> There is also not necessarily a one-to-one correspondence between
> >> function and collation lookup calls.  For example, in some cases you
> >> need to look up a sorting and a hashing functions, but only one
> >> collation for both.
> 
> > That seems rare enough not to matter, performancewise.
> 
> I think it potentially does matter, but I also agree that it's not
> the common case.  Could we perhaps keep the API for the existing
> functions the same, and introduce new functions alongside them
> to be used by the small number of places where it matters?
> 
> (I've not looked at Peter's patch yet, so maybe I'm talking through
> my hat.  But we should set a pretty high value on avoiding code
> churn in stuff as widely used as the fmgr interfaces.)

So, what's the plan here? Should the CF entry be closed?

Greetings,

Andres Freund



Re: Reorganize collation lookup time and place

2018-12-13 Thread Andres Freund
On 2018-12-13 14:50:53 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-12-13 15:05:40 +0100, Peter Eisentraut wrote:
> >> On 12/12/2018 18:56, Andres Freund wrote:
> >>> Why isn't this integrated into fmgr_info()?
> 
> >> fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
> >> stuff in pg_collation.  There is no overlap between them.
> 
> > It looks up stuff necessary for calling a function, that doesn't fit
> > looking up the collation necessary to do so too badly. A lot of the the
> > changes you made are rote changes to each caller, taking the collation
> > oid and expanding it with pg_newlocale_from_collation().
> 
> I'm not very thrilled with the idea of changing every single caller of
> InitFunctionCallInfoData and related functions, especially when exactly
> zero functional change ensues.  We should work harder on avoiding that
> code churn; extension developers will thank us.

I am thinking of moving to commit 0001 (but not 0002) of
https://www.postgresql.org/message-id/20181009191802.ppt6lqcvkpjvkm76%40alap3.anarazel.de
which'd break some (but fewer) of the same place that'd be affected by
this.  The amount of memory saved for plpgsql and other places with lots
of expressoins is quite noticable, and it's a prerequisite for better
code generation and caching for JIT.  As that patch doesn't affect
DirectFunctionCall* etc, I'm not sure it's a sufficient argument to go
full breakage here, however.

Independent of that it still seems like putting stuff onto
every function caller that they don't need to deal with.

Greetings,

Andres Freund



Re: Reorganize collation lookup time and place

2018-12-13 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-13 15:05:40 +0100, Peter Eisentraut wrote:
>> On 12/12/2018 18:56, Andres Freund wrote:
>>> Why isn't this integrated into fmgr_info()?

>> fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
>> stuff in pg_collation.  There is no overlap between them.

> It looks up stuff necessary for calling a function, that doesn't fit
> looking up the collation necessary to do so too badly. A lot of the the
> changes you made are rote changes to each caller, taking the collation
> oid and expanding it with pg_newlocale_from_collation().

I'm not very thrilled with the idea of changing every single caller of
InitFunctionCallInfoData and related functions, especially when exactly
zero functional change ensues.  We should work harder on avoiding that
code churn; extension developers will thank us.

>> There is also not necessarily a one-to-one correspondence between
>> function and collation lookup calls.  For example, in some cases you
>> need to look up a sorting and a hashing functions, but only one
>> collation for both.

> That seems rare enough not to matter, performancewise.

I think it potentially does matter, but I also agree that it's not
the common case.  Could we perhaps keep the API for the existing
functions the same, and introduce new functions alongside them
to be used by the small number of places where it matters?

(I've not looked at Peter's patch yet, so maybe I'm talking through
my hat.  But we should set a pretty high value on avoiding code
churn in stuff as widely used as the fmgr interfaces.)

regards, tom lane



Re: Reorganize collation lookup time and place

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 15:05:40 +0100, Peter Eisentraut wrote:
> On 12/12/2018 18:56, Andres Freund wrote:
> >> This makes the collation lookup more similar to the function lookup.  In
> >> many cases they now happen right next to each other.
> >> pg_newlocale_from_collation() becomes analogous to fmgr_info() and
> >> pg_locale_t to FmgrInfo *.
> > Why isn't this integrated into fmgr_info()?
> 
> fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
> stuff in pg_collation.  There is no overlap between them.

It looks up stuff necessary for calling a function, that doesn't fit
looking up the collation necessary to do so too badly. A lot of the the
changes you made are rote changes to each caller, taking the collation
oid and expanding it with pg_newlocale_from_collation().  It seems not
necessary to force every external user to do rote changes like:

fmgr_info(opexpr->opfuncid, finfo);
fmgr_info_set_expr((Node *) node, finfo);
InitFunctionCallInfoData(*fcinfo, finfo, 2,
-   
 opexpr->inputcollid, NULL, NULL);
+   
 pg_newlocale_from_collation(opexpr->inputcollid), NULL, NULL);
...
-   h = FunctionCall1Coll(array_extra_data->hash, DEFAULT_COLLATION_OID, d);
+   h = FunctionCall1Coll(array_extra_data->hash, 
pg_newlocale_from_collation(DEFAULT_COLLATION_OID), d);


A lot of the new pg_newlocale_from_collation() calls go a fair bit over
the customary line length limit.

As it stands I have another patch that wants to change the function call
interface, including for extensions:
https://www.postgresql.org/message-id/20180605172952.x34m5uz6ju6enaem%40alap3.anarazel.de
so perhaps we can just bite the bullet and do both.

But I'm somewhat doubtful it's useful to expose having to lookup
pg_newlocale_from_collation() at the callsites to every single caller of
functions in the codebase. You say that's a separate conern of
initializing function calls, for me it's unnecessarily exposing details
that seem likely to change.


> There is also not necessarily a one-to-one correspondence between
> function and collation lookup calls.  For example, in some cases you
> need to look up a sorting and a hashing functions, but only one
> collation for both.

That seems rare enough not to matter, performancewise.

Greetings,

Andres Freund



Re: Reorganize collation lookup time and place

2018-12-13 Thread Peter Eisentraut
On 12/12/2018 18:56, Andres Freund wrote:
>> This makes the collation lookup more similar to the function lookup.  In
>> many cases they now happen right next to each other.
>> pg_newlocale_from_collation() becomes analogous to fmgr_info() and
>> pg_locale_t to FmgrInfo *.
> Why isn't this integrated into fmgr_info()?

fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
stuff in pg_collation.  There is no overlap between them.

There is also not necessarily a one-to-one correspondence between
function and collation lookup calls.  For example, in some cases you
need to look up a sorting and a hashing functions, but only one
collation for both.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reorganize collation lookup time and place

2018-12-12 Thread Andres Freund
Hi,

On 2018-12-10 16:12:54 +0100, Peter Eisentraut wrote:
> Attached is a patch that reorganizes how and where collations are looked up.
> 
> Until now, a fmgr C function that makes use of collation information
> (for example varstr_cmp(), str_toupper()) gets passed the collation OID,
> looks up the collation with pg_newlocale_from_collation(), and then does
> something with it.
> 
> With this change, the lookup is moved earlier, typically during executor
> initialization.  The fmgr functions receive the locale pointer that is
> the result of that lookup.
> 
> This makes the collation lookup more similar to the function lookup.  In
> many cases they now happen right next to each other.
> pg_newlocale_from_collation() becomes analogous to fmgr_info() and
> pg_locale_t to FmgrInfo *.

Why isn't this integrated into fmgr_info()?

Greetings,

Andres Freund



Re: Reorganize collation lookup time and place

2018-12-12 Thread Andreas Karlsson

On 12/10/18 4:12 PM, Peter Eisentraut wrote:

Attached is a patch that reorganizes how and where collations are looked up.

Until now, a fmgr C function that makes use of collation information
(for example varstr_cmp(), str_toupper()) gets passed the collation OID,
looks up the collation with pg_newlocale_from_collation(), and then does
something with it.

With this change, the lookup is moved earlier, typically during executor
initialization.  The fmgr functions receive the locale pointer that is
the result of that lookup.


Sounds like a great plan. I too feel that having the look ups there 
makes more logical sense.


But when taking a look at your patch I got a segfault when running 
initdb. See the stack trace below. My compiler is "gcc (Debian 8.2.0-9) 
8.2.0" and the locale when running initdb is glibc's "en_US.utf8".


#0  __GI___strcoll_l (s1=0x55d64555d998 "({CONST :consttype 1009 
:consttypmod -1 :constcollid 100 :constlen -1 :constbyval false 
:constisnull false :location 160 :constvalue 16 [ 64 0 0 0 0 0 0 0 0 0 0 
0 25 0 0 0 ]})", s2=0x55d64555ddb0 "({CONST :consttype 1009 :consttypmod 
-1 :constcollid 100 :constlen -1 :constbyval false :constisnull false 
:location 161 :constvalue 16 [ 64 0 0 0 0 0 0 0 0 0 0 0 25 0 0 0 ]})", 
l=0x0) at strcoll_l.c:259
#1  0x55d644c4d607 in varstrfastcmp_locale (x=94378777346072, 
y=94378777347120, ssup=0x7ffea6a64990) at varlena.c:2185
#2  0x55d644958dad in ApplySortComparator (ssup=0x7ffea6a64990, 
isNull2=false, datum2=, isNull1=false, datum1=out>) at ../../../src/include/utils/sortsupport.h:224
#3  compare_scalars (a=, b=, 
arg=0x7ffea6a64980) at analyze.c:2784
#4  0x55d644cba494 in qsort_arg (a=a@entry=0x55d6456ee5f0, 
n=n@entry=14, es=es@entry=16, cmp=cmp@entry=0x55d644958d82 
, arg=arg@entry=0x7ffea6a64980) at qsort_arg.c:140
#5  0x55d64495b833 in compute_scalar_stats (stats=0x55d6456c6c88, 
fetchfunc=0x55d6449597f1 , samplerows=2904, 
totalrows=2904) at analyze.c:2360
#6  0x55d64495cca3 in do_analyze_rel 
(onerel=onerel@entry=0x7f3424bdbd50, options=options@entry=2, 
params=params@entry=0x7ffea6a64d90, va_cols=va_cols@entry=0x0, 
acquirefunc=0x55d64495901b , relpages=77, 
inh=false, in_outer_xact=false, elevel=13) at analyze.c:527
#7  0x55d64495d2ec in analyze_rel (relid=, 
relation=0x0, options=options@entry=2, 
params=params@entry=0x7ffea6a64d90, va_cols=0x0, 
in_outer_xact=, bstrategy=0x55d6455413a8) at analyze.c:258
#8  0x55d6449d61b1 in vacuum (options=2, relations=0x55d645541520, 
params=params@entry=0x7ffea6a64d90, bstrategy=, 
bstrategy@entry=0x0, isTopLevel=) at vacuum.c:357
#9  0x55d6449d644c in ExecVacuum 
(vacstmt=vacstmt@entry=0x55d645442e50, isTopLevel=isTopLevel@entry=true) 
at vacuum.c:141
#10 0x55d644b5ec9d in standard_ProcessUtility (pstmt=0x55d6454431a0, 
queryString=0x55d645442468 "ANALYZE;\n", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x55d644f50b00 , completionTag=0x7ffea6a650d0 "") at 
utility.c:670
#11 0x55d644b5f1cc in ProcessUtility 
(pstmt=pstmt@entry=0x55d6454431a0, queryString=, 
context=, params=, queryEnv=out>, dest=dest@entry=0x55d644f50b00 , 
completionTag=0x7ffea6a650d0 "") at utility.c:360
#12 0x55d644b5b582 in PortalRunUtility 
(portal=portal@entry=0x55d645434428, pstmt=pstmt@entry=0x55d6454431a0, 
isTopLevel=isTopLevel@entry=true, 
setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x55d644f50b00 , 
completionTag=completionTag@entry=0x7ffea6a650d0 "") at pquery.c:1175
#13 0x55d644b5c1fd in PortalRunMulti 
(portal=portal@entry=0x55d645434428, isTopLevel=isTopLevel@entry=true, 
setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x55d644f50b00 , 
altdest=altdest@entry=0x55d644f50b00 , 
completionTag=completionTag@entry=0x7ffea6a650d0 "") at pquery.c:1321
#14 0x55d644b5cfbb in PortalRun (portal=portal@entry=0x55d645434428, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, 
run_once=run_once@entry=true, dest=dest@entry=0x55d644f50b00 
, altdest=altdest@entry=0x55d644f50b00 , 
completionTag=0x7ffea6a650d0 "") at pquery.c:796
#15 0x55d644b591b8 in exec_simple_query 
(query_string=query_string@entry=0x55d645442468 "ANALYZE;\n") at 
postgres.c:1215
#16 0x55d644b5b0dd in PostgresMain (argc=, 
argv=, dbname=, username=) 
at postgres.c:4256

#17 0x55d644a3abf9 in main (argc=10, argv=0x55d6453c3490) at main.c:224