Re: [HACKERS] [PROPOSAL][PROTOTYPE] Individual options for each index column: Opclass options

2016-09-16 Thread Robert Haas
On Tue, May 24, 2016 at 10:06 AM, Nikolay Shaplov
 wrote:
> So adding options for opclass seems to be really good idea.
> To see how it works you should do the following:
>
> # create extension intarray ;
> # create table test (i int[]);
> # create table test2 (i int[]);
> # create index ON test USING GIST (i USING gist__intbig_ops WITH OPTIONS
> (sig_len_int=22) );
> # create index ON test2 USING GIST (i USING gist__intbig_ops WITH OPTIONS
> (sig_len_int=120) );

I think supporting syntax of this type would be a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PROPOSAL][PROTOTYPE] Individual options for each index column: Opclass options

2016-05-24 Thread Nikolay Shaplov

Hi! 

There was an interesting idea. Now it is possible to add in runtime options 
for relation, when you are creating a new extension with new access method.

But if you add a custom operator class, you have no tool to tune it's 
behavior. But in some cases it would be nice to have such tool. For example in 
intarray extension for gist__intbig_ops there is an option SIGLENINT that 
defines the length of the signature that will be used while creating index. Now 
it is #defined in the code, but theoretically one can tune index 
size/performance by manually setting SIGLENINT value that is more suitable for 
his data.

Moreover the authors of bloom extension have to create workaround for 
customizing each column behavior: they sets number of bits per column by 
setting reloptions col1 — col16. This is not handy. I would seven say a little 
bit ugly.

So adding options for opclass seems to be really good idea.

As far as in postgres index is a relation, index column is an attribute in 
this relation and each column might have only one opclass, this brings us to 
conclusion that when we as speaking about options of an opclass in the index 
column, then we can actually speak about an options of an attribute.

If we look into pg_attribute table, we will see, that there is already 
attoptions attribute there. So we can use it. Just add a way to add them in 
runtime and teach opclass to use it.

The general idea is the following:

Each access method should have amattoptions function, that works similar to  
amoptions function (parses and validate text[] of attoptions into internal 
data structure) but it takes an additional argument -- the number of an 
attribute we are processing.

amattoptions do this job by calling a specific support function of an opclass, 
that knows how to parse it (and do it in a similar way as amoptions function 
of an access method)

the result of amattoptions should be cached as attoptions are cached now or 
similar way. (In current version of prototype I do it wrong, but should change 
it later)

All the support functions of the access method should accept parsed attoptions 
in their argument lists, and use it if necessary. In the prototype I've 
implemented if for gist opclass, and added attoptions as a last argument of 
all support functions. So it is even backward compatible as all funtions I 
have not touched just ignores this last argument.

The implementation of an access method should get attoptions from cache, if 
they are there, or from am->amattoptions if not (in the prototype it is done 
using get_index_attribute_options function) and pass it to the support 
functions.

In the prototype I've added a member to GISTSTATE structure for 
storing parsed attoptions, and fill in initGISTstate and used it all over the 
code. In other implementations solutions might be different.

So... The prototype I wrote is in attachment. It is really dirty. Some things 
are not done yet. Some things should be changed. But never the less work, and 
it shows what I am going to do better than all explanations.

To see how it works you should do the following:

# create extension intarray ;
# create table test (i int[]);
# create table test2 (i int[]);
# create index ON test USING GIST (i USING gist__intbig_ops WITH OPTIONS 
(sig_len_int=22) ); 
# create index ON test2 USING GIST (i USING gist__intbig_ops WITH OPTIONS 
(sig_len_int=120) ); 

# select attoptions from pg_attribute where attrelid = 'test_i_idx'::regclass 
OR attrelid = 'test2_i_idx'::regclass;

attoptions 
---
 {sig_len_int=120}
 {sig_len_int=22}
(2 rows)

and if you uncomment development warning output from the code you will see 
that these values are really used through the code.

Speaking of the syntax, I've added WITH OPTIONS keywords for specifying 
attoptions, not just WITH, that most of us would expect.  I did it because 
index_elem, the node that is used for definition of an index column is used not 
only in CREATE INDEX expression, but also in CREATE TABLE CONSTRAINT EXCLUDE. 
Because CREATE TABLE CONSTRAINT EXCLUDE actually creates an index, and one can 
specify there full specification of index element there. But right after this 
you should write "WITH operator" clause. 

So if I will try to use WITH keyword for attoptions, a syntax parser will be 
confused, as it cat not distinguish one WITH keyword from another. So decided 
to use WITH OTPTIONS keywords for attoptions. 

There might be some tricks that will allow to use WITH in both places but I 
think "WITH OPTIONS" might make SQL code more readable, because an expression 
with two "WITH" in a row will be hard to read for human too ;-)

So we come to the last part of this message, that is most important for me for 
the moment:

One of the issues that I've solved while writing this patch, was the 
following: attoptions and reloptions are should use almost the same code for 
options parsing. For reloptions this code is written, it is good, but you