On 09.11.2018 2:27, Tom Lane wrote:
I wrote:
The bigger picture here, and the reason for my skepticism about having
any intelligence in the enabling logic, is that there is no scenario
in which this code can be smarter than the user about what to do.
We have no insight today, and are unlikely to have any in future, about
whether a specific index expression is many-to-one or not.
Hmm ... actually, I take that back. Since we're only interested in this
for expression indexes, we can expect that statistics will be available
for the expression index, at least for tables that have been around
long enough that UPDATE performance is really an exciting topic.
So you could imagine pulling up the stadistinct numbers for the index
column(s) and the underlying column(s), and enabling the optimization
when their ratio is less than $something. Figuring out how to merge
numbers for multiple columns might be tricky, but it's not going to be
completely fact-free. (I still think that the cost-estimate logic is
quite bogus, however.)
Another issue in all this is the cost of doing this work over again
after any relcache flush. Maybe we could move the responsibility
into ANALYZE?
BTW, the existing code appears to be prepared to enable this logic
if *any* index column is an expression, but surely we should do so
only if they are *all* expressions?
regards, tom lane
From my point of view "auto" value should be default, otherwise it has
not so much sense.
If somebody decides to switch on this optimization for some particular
index, then it will set it to "on", not "auto".
So I agree with your previous opinion, that if this optimization is
disabled by default, then it is enough to have boolean parameter.
Concerning muticolumn indexes: why we should apply this optimization
only if *all* of index columns are expressions?
Assume very simple example: we have some kind of document storage
represented by the following table:
create table document(owner integer, name text, last_updated
timestamp, description json);
So there are some static document attributes (name, date,...) and some
dynamic, stored in json field.
Consider that most frequently users will search among their own documents.
So we may create index like this:
create index by_title on documents(owner,(description->>'title'));
Document description may include many attributes which are updated quite
frequently, like "comments", "keywords",...
But "title" is rarely changed, so this optimization will be very useful
for such index.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company