On 15 September 2013 01:14, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Hi, > > Here's a reviewable version of what I've dubbed Minmax indexes. Some > people said they would like to use some other name for this feature, but > I have yet to hear usable ideas, so for now I will keep calling them > this way. I'm open to proposals, but if you pick something that cannot > be abbreviated "mm" I might have you prepare a rebased version which > renames the files and structs. > > The implementation here has been simplified from what I originally > proposed at 20130614222805.gz5...@eldon.alvh.no-ip.org -- in particular, > I noticed that there's no need to involve aggregate functions at all; we > can just use inequality operators. So the pg_amproc entries are gone; > only the pg_amop entries are necessary. > > I've somewhat punted on the question of doing resummarization separately > from vacuuming. Right now, resummarization (as well as other necessary > index cleanup) takes place in amvacuumcleanup. This is not optimal; I > have stated elsewhere that I'd like to create separate maintenance > actions that can be carried out by autovacuum. That would be useful > both for Minmax indexes and GIN indexes (pending insertion list); maybe > others. That's not part of this patch, however. > > The design of this stuff is in the file "minmax-proposal" at the top of > the tree. That file is up to date, though it still contains some open > questions that were present in the original proposal. (I have not fixed > some bogosities pointed out by Noah, for instance. I will do that > shortly.) In a final version, that file would be applied as > src/backend/access/minmax/README, most likely. > > One area on which I needed to modify core code is IndexBuildHeapScan. I > needed a version that was able to scan only a certain range of pages, > not the entire table, so I introduced a new IndexBuildHeapRangeScan, and > added a quick "heap_scansetlimits" function. I haven't tested that this > works outside of the HeapRangeScan thingy, so it's probably completely > bogus; I'm open to suggestions if people think this should be > implemented differently. In any case, keeping that implementation > together with vanilla IndexBuildHeapScan makes a lot of sense. > > One thing still to tackle is when to mark ranges as unsummarized. Right > now, any new tuple on a page range would cause a new index entry to be > created and a new revmap update. This would cause huge index bloat if, > say, a page is emptied and vacuumed and filled with new tuples with > increasing values outside the original range; each new tuple would > create a new index tuple. I have two ideas about this (1. mark range as > unsummarized if 3rd time we touch the same page range; 2. vacuum the > affected index page if it's full, so we can maintain the index always up > to date without causing unduly bloat), but I haven't implemented > anything yet. > > The "amcostestimate" routine is completely bogus; right now it returns > constant 0, meaning the index is always chosen if it exists. > > There are opclasses for int4, numeric and text. The latter doesn't work > at all, because collation info is not passed down at all. I will have > to figure that out (even if I find unlikely that minmax indexes have any > usefulness on top of text columns). I admit that numeric hasn't been > tested, and it's quite likely that they won't work; mainly because of > lack of some datumCopy() calls, about which the code contains some > /* XXX */ lines. I think this should be relatively straightforward. > Ideally, the final version of this patch would contain opclasses for all > supported datatypes (i.e. the same that have got btree opclasses). > > I have messed up the opclass information, as evidenced by failures in > opr_sanity regression test. I will research that later. > > > There's working contrib/pageinspect support; pg_xlogdump (and wal_debug) > seems to work sanely too. > This patch compiles cleanly under -Werror. > > The research leading to these results has received funding from the > European Union's Seventh Framework Programme (FP7/2007-2013) under > grant agreement n° 318633 Thanks for the patch, but I seem to have immediately hit a snag: pgbench=# CREATE INDEX minmaxtest ON pgbench_accounts USING minmax (aid); PANIC: invalid xlog record length 0 -- Thom