On 2010-12-21 14:26, t...@fuzzy.cz wrote:
Why not auto-create indices for some limited period after database load
(copy?  any large number of inserts from a single connection?), track
those
that actually get re-used and remove the rest?   Would this not provide
a better out-of-the-box experience for neophytes?

Say you have a table with several columns (A,B,C), and the query is using
some of them. What indexes would you create? One index on every column? A
multi-column index on all columns? Indexs for each combination of columns?

One single-column index, on the first index-worthy column appearing.
Keep it simple.    Maybe, while you're doing that full-table-scan. gather
stats on all the indexable columns for later reference, to guide choice of
which column to index later.

There really is no automatic way to solve this puzzle using a single
query. Indexing strategy is a very tough design discipline, and it
requires a complex knowledge of the workload. One slow query does not mean
the index should be created - what if that was just an ad-hoc query and
will not be executed ever again?

Then the index you just built gets automatically dropped, as I said above.

I really don't want to work with products that try to be smarter than me
(and I admit I'm pretty dumb from time to time) and act rather randomly
due to this 'artificial intelligence'. I've already been there and I don't
want to repeat this experience.

Then, since you're not a neophyte, leave the feature turned off.   But don't
deny the possibility of using it to actual neophytes.


as in b): Creating an index is quite expensiv

How much more so than doing that full-table-scan plus sort, which your
query is doing anyway?

A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
maintenance_work_mem) etc. So imagine a few users, building indices on a
big table simultaneously

Pffthht.   One simple trylock, used only by the auto-indexer.


Building an index is just one side of the problem - maintenance of the
indexes is another thing. Each index has an impact on write operations
(INSERT/UPDATE) and may cause that HOT actually does not work (thus
causing unnecessary bloat of the index).

This is worth discussing.    How much help does the DBMS currently give
the DBA in evaluating these tradeoffs?    Could we do better, given an
actual installation and workload?

And the indexes may need lot of
space on a disk.

By all means require limits as well as a "don't do that" switch.


But the real show stopper is probably locking. Building an index takes a
write lock on a table, effectively blocking writes. Sure, you can use a
'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
disadvantages of that (see
http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).

The only disadvantages I see there are a) the leftover "invalid" index - which 
feels like a
bug; why is it not auto-dropped?   and b) the second scan, which implies more 
total work
and a wish to background that portion after completing the query triggering
the auto-index.

Don't forget I suggested doing this only for a limited time after DB creation.  
Possibly another
reason to turn it off should be any manual index creation, as that implies that 
the DBA
knows about indexing.

I see in another thread you suggest merely placing hints in the log.   That's a 
fine
first step - but I'll then be wanting to auto-parse that log to auto-create....

Cheers,
    Jeremy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to