On 06/06/15 04:07, deavid wrote:
There are several use cases where an index would be useful, but adding it would slow down inserts and updates too much. For example, when we have 10 million rows in a table with frequent updates, we need several indexes to speed up selects, but they slow down updates a lot, especially when there are 10 or more of them. Another case is indexes for text search, which are used only for user searches and aren't that important, so we want to have them, but we don't want the overhead they add to every write on the table. I know there are approaches that already solve some of these problems in some ways (table partitioning, partial indexes, etc.), but I don't feel they are the solution to every problem of this kind.
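For instance (table and column names made up), a partial index and an expression index for text search are the kind of partial mitigations I mean:

    -- A partial index only covers the rows the queries actually need,
    -- so updates to other rows don't have to maintain it.
    CREATE INDEX orders_open_customer_idx
        ON orders (customer_id)
        WHERE status = 'open';

    -- A GIN expression index keeps text search usable without changing the
    -- table layout, but every insert and non-HOT update still has to maintain it.
    CREATE INDEX documents_body_fts_idx
        ON documents
        USING gin (to_tsvector('english', body));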

Some people have already asked for "delayed write" indexes, but the idea gets discarded because the index could get out of sync and omit results, which is unacceptable. I think that could be fixed in several ways, so we could have a fast and reliable index (though maybe not as fast on selects).

Since I don't know every internal of Postgres, it seems simpler to share the idea here and ask which things can or cannot be done.

Let's imagine there is a new type of index called "weird_btree", where we trade off simplicity for speed. In almost every mode we rely on VACUUM to put the index back into an optimal state.

Mode 1: on "aminsert", mark the index as INVALID. So if you modify the table, you need to run REINDEX / CREATE INDEX CONCURRENTLY before the index can be used by a SELECT again. This is almost the same as CREATE INDEX CONCURRENTLY; the main difference is that you don't have to remember to drop the index before writing. (I don't see much benefit here.)
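With today's tools, recovering from that state would look roughly like this (index and table names invented; a plain REINDEX would take a stronger lock):

    -- Rebuild the stale index without blocking writers:
    CREATE INDEX CONCURRENTLY orders_customer_idx_new ON orders (customer_id);
    DROP INDEX CONCURRENTLY orders_customer_idx;   -- the old, invalid index
    ALTER INDEX orders_customer_idx_new RENAME TO orders_customer_idx;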

Mode 2: on "aminsert", put the new entry in a plain, unordered list instead of the btree. Appending to a list should be faster than inserting into a big btree, and we'll know later which entries are still missing from the index.

Mode 2.a: on index scan (amrescan, amgettuple), pretend the list comes after the btree and return its entries as well, out of order. We'd have to tell Postgres that the index isn't sorted, so it has to recheck every row.

Mode 2.b: mark the index invalid instead. On the next VACUUM, sort the list and insert it into the btree in a bulk operation; if that succeeds, mark the index valid again.
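As far as I know, GIN's fast-update pending list already behaves a lot like modes 2 and 2.b: new entries are appended to an unordered pending list, scans have to read that list too, and VACUUM merges it into the main structure. For example (names made up):

    CREATE INDEX documents_body_fts_idx
        ON documents
        USING gin (to_tsvector('english', body))
        WITH (fastupdate = on);   -- defer the real index work to a pending list

    -- The pending list gets folded into the main GIN structure here:
    VACUUM documents;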

Mode 3: on "aminsert", put the new entry into a second btree, leaving the first one untouched. Because the second btree is new it will be small, so writes should be faster. On an index scan, read tuples from both at the same time (like a merge sort). On VACUUM, merge the second btree into the first. In this mode the index output stays sorted and no recheck is needed.
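The closest user-level analogue of mode 3 I can think of is a small, separately indexed "delta" table that is merged with the big one at read time and folded back in later; just a sketch with invented names:

    CREATE TABLE measurements       (id bigint, ts timestamptz, val numeric);
    CREATE TABLE measurements_delta (LIKE measurements);
    CREATE INDEX ON measurements       (ts);   -- big, stable btree
    CREATE INDEX ON measurements_delta (ts);   -- small btree, cheap to write

    -- Reads look at both, like scanning the two btrees at the same time:
    SELECT * FROM measurements       WHERE ts >= now() - interval '1 day'
    UNION ALL
    SELECT * FROM measurements_delta WHERE ts >= now() - interval '1 day';

    -- Periodic merge step, the equivalent of the merge on VACUUM above:
    WITH moved AS (DELETE FROM measurements_delta RETURNING *)
    INSERT INTO measurements SELECT * FROM moved;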

Does anyone think this would be an interesting feature for PostgreSQL?
Did I miss something?

PS: Maybe it's also possible to take advantage of clustering and have indexes whose entries are ranges of TIDs; but I'm not sure whether that is too exotic, or whether it would make a difference.
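The BRIN (block range) indexes being added in 9.5 seem to go in that direction, if I understand them correctly: each entry summarises a whole range of blocks instead of pointing at individual tuples, so the index stays tiny and cheap to maintain, at the cost of rechecking the rows inside each matching range. For example:

    -- Works best when created_at is correlated with the physical row order:
    CREATE INDEX events_created_brin ON events USING brin (created_at);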

Sincerely,
David.
How about a hybrid indexing system with two parts:

(1) The existing index structure, which is checked first and is mostly optimised for read speed. If there are only a few inserts/updates and the system is not heavily loaded, it gets modified immediately. The thresholds for "too busy" and "few enough changes" could be configurable.

(2) An overflow index optimised for writing, possibly in memory and not backed by permanent storage. A crash would require a complete index rebuild, but only when there were entries in it (or at least more than some configurable threshold, to allow for cases where some missing index entries are acceptable).

So when the index is needed for a query, part 1 is checked first, and then part 2 if necessary.

Have a background process that moves entries from part 2 to part 1 when the system is less busy.
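A crude user-level sketch of that split, using an unlogged table as the "fast to write, lost on a crash" overflow (all names invented; it assumes documents(doc_id, body) already carries the read-optimised GIN index):

    -- Unlogged: not WAL-logged, so writes are cheap, but the contents are
    -- truncated after a crash and would have to be repopulated.
    CREATE UNLOGGED TABLE search_overflow (doc_id bigint, body text);
    CREATE INDEX search_overflow_fts
        ON search_overflow
        USING gin (to_tsvector('english', body));

    -- Queries have to consult both the main table (part 1) and the overflow (part 2):
    SELECT doc_id FROM documents
     WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'example')
    UNION
    SELECT doc_id FROM search_overflow
     WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'example');

    -- A background job would move rows from search_overflow into documents
    -- during quiet periods, so that the main index picks them up.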


Cheers,
Gavin




