Hello, this patch enables planner to be couscious of inter-column correlation.
Sometimes two or more columns in a table has some correlation which brings underestimate, which leads to wrong join method and ends with slow execution. Tomas Vondra is now working on heavily-equipped multivariate statistics for OLAP usage. In contrast, this is a lightly implemented solution which calculates only the ratio between a rows estimated by current method and a actual row number. I think this doesn't conflict with his work except the grammar part. This would apply fewer cases but I suppose still in many cases the correlated colums would be in simple proportional relationship, so this can help the cases. The previous discussion is https://wiki.postgresql.org/wiki/Cross_Columns_Stats http://www.postgresql.org/message-id/4d0ba4d5.8080...@fuzzy.cz This patch is covers only the type A (Discrete values and equality conditions) but I think it is usable in many cases seen in the field. So I'd like to repropose for the latest version of PostgreSQL. - design outline Provide new system catalog pg_mvcoefficient to store the information required to do this. A user can instruct planner to correct the wrong estimation caused by inter-column correlation by registering the columns in pg_mvcoefficient using new DDL ALTER TABLE... ADD STATISTICS. Analyzing of the target table also stores the 'multivariate coefficient' calculated by using the following formula into pg_mvcoefficient. mv_coef(c1, c2, ..) = ndistinct(c1 * c2 * ...) / (ndistinct(c1) * ndistinct(c2) * ...) In clauselist_selectivity, planner corrects the estimate if given clauselist has equivalence-classes-compatible clauses for required columns at the top-level. - Example The attached perl script gentbl.pl generates test data resembles some tables in DBT-3 benchmark. > $ perl gentbl.pl | psql postgres > =# EXPLAIN ANALYZE SELECT * FROM t1 WHERE a = 1 AND b = 2501; ... > Seq Scan on t1 (cost=0.00..653.00 rows=1 width=12) (actual > time=0.021..6.348 rows=8 loops=1) This doesn't have no harm but in a join case, > =# EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b; > Hash Join (cost=122.00..855.32 rows=32 width=24) > (actual time=2.009..29.208 rows=32000 loops=1) The correlation between a and b makes the estimate too small. Then register correlation setting. > =# ALTER TABLE t1 ADD STATISTICS (mvndistinct) ON (a, b); > =# ANALYZE t1; Then the estimate will be corrected. > =# EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b; > Hash Join (cost=122.00..855.32 rows=32000 width=24) > (actual time=1.907..29.025 rows=32000 loops=1) - Known limitations The coefficient calculated by this feature is applicble only for conjunctions of simple var-exprs on merge-joinable operator. The coefficient is applied regardless of whether the base estimate has been calculated using MCV, so estimates for non-join cases on the columns which has MCV can rather become inaccurate. Uniform correlation is assumed so some extent of correlation ununiformity would lead to wrong estimation. This patch set doesn't contain any document yet. - Patche Files This patch consists of the following files. - 0001-New-system-catalog-pg_mvcoefficient.patch Adds new system catalog pg_mvcoefficient. - 0002-Analyze-part-for-multivariate-coefficient.patch Analyze part of multivariate coefficient. - 0003-Make-use-of-multivariate-coefficeient-in-estimation-.patch Planner part to make it use the multivariate coefficient. - 0004-Syntactical-part-of-multivariate-coefficient.patch Add new DDL to define mv coefficient columns. The four files above are essential. The two following files are experimental patch to add mvcattrs to index columns. One of them adds a new opclass for int2vector of btree but it would be overkill. - 0005-Add-btree-operator-class-for-int2vector.patch Add btree operator class for int2vector. - 0006-Use-modified-index-of-pg_mvcoefficient.patch Use modified index of pg_mvcoefficient. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
>From ce2beeba56b96af4f9289be53646f83e807637dc Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 6 Aug 2015 16:49:04 +0900 Subject: [PATCH 1/6] New system catalog pg_mvcoefficient. --- src/backend/catalog/Makefile | 3 +- src/backend/utils/cache/syscache.c | 12 +++++ src/include/catalog/indexing.h | 3 ++ src/include/catalog/pg_mvcoefficient.h | 74 ++++++++++++++++++++++++++++++ src/include/utils/syscache.h | 1 + src/test/regress/expected/sanity_check.out | 1 + 6 files changed, 93 insertions(+), 1 deletion(-) create mode 100644 src/include/catalog/pg_mvcoefficient.h diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile index 25130ec..4ce1653 100644 --- a/src/backend/catalog/Makefile +++ b/src/backend/catalog/Makefile @@ -33,7 +33,8 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\ pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \ pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \ pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \ - pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \ + pg_cast.h pg_enum.h pg_mvcoefficient.h pg_namespace.h pg_conversion.h \ + pg_depend.h \ pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \ pg_authid.h pg_auth_members.h pg_shdepend.h pg_shdescription.h \ pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \ diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c index efce7b9..a895405 100644 --- a/src/backend/utils/cache/syscache.c +++ b/src/backend/utils/cache/syscache.c @@ -43,6 +43,7 @@ #include "catalog/pg_foreign_server.h" #include "catalog/pg_foreign_table.h" #include "catalog/pg_language.h" +#include "catalog/pg_mvcoefficient.h" #include "catalog/pg_namespace.h" #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" @@ -501,6 +502,17 @@ static const struct cachedesc cacheinfo[] = { }, 4 }, + {MvCoefficientRelationId, /* MVCOEFFICIENT */ + MvCoefficientIndexId, + 1, + { + Anum_pg_mvcoefficient_mvcreloid, + 0, + 0, + 0 + }, + 4 + }, {NamespaceRelationId, /* NAMESPACENAME */ NamespaceNameIndexId, 1, diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h index c38958d..fb28747 100644 --- a/src/include/catalog/indexing.h +++ b/src/include/catalog/indexing.h @@ -173,6 +173,9 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops)); #define LargeObjectMetadataOidIndexId 2996 +DECLARE_UNIQUE_INDEX(pg_mvcoefficient_index, 3578, on pg_mvcoefficient using btree(mvcreloid oid_ops)); +#define MvCoefficientIndexId 3578 + DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops)); #define NamespaceNameIndexId 2684 DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops)); diff --git a/src/include/catalog/pg_mvcoefficient.h b/src/include/catalog/pg_mvcoefficient.h new file mode 100644 index 0000000..5936070 --- /dev/null +++ b/src/include/catalog/pg_mvcoefficient.h @@ -0,0 +1,74 @@ +/*------------------------------------------------------------------------- + * + * pg_mvcoefficient.h + * definition of the system multivariate coefficient relation + * (pg_mvcoefficient) along with the relation's initial contents. + * + * Copyright (c) 2015, PostgreSQL Global Development Group + * + * src/include/catalog/pg_mvcoefficient.h + * + * NOTES + * the genbki.pl script reads this file and generates .bki + * information from the DATA() statements. + * + * XXX do NOT break up DATA() statements into multiple lines! + * the scripts are not as smart as you might think... + * + *------------------------------------------------------------------------- + */ +#ifndef PG_MVCOEFFICIENT_H +#define PG_MVCOEFFICIENT_H + +#include "catalog/genbki.h" +#include "nodes/pg_list.h" + +/* ---------------- + * pg_mvcoefficient definition. cpp turns this into + * typedef struct FormData_pg_mvcoefficient + * ---------------- + */ +#define MvCoefficientRelationId 3577 + +/* This values is arbitrary number, but 6 seems already too much. */ +#define MVCOEF_MAX_COLS 6 + +CATALOG(pg_mvcoefficient,3577) BKI_WITHOUT_OIDS +{ + Oid mvcreloid; /* OID of target relation */ + int16 mvcnattrs; /* number of mv columns */ + float8 mvccoefficient; /* multivariate distinctness coefficient */ + + /* + * variable-length fields start here, but we allow direct access to + * mvcattrs. + */ + int2vector mvcattrs; /* Column numbers */ +} FormData_pg_mvcoefficient; + +/* ---------------- + * Form_pg_mvcoefficient corresponds to a pointer to a tuple with the + * format of pg_mvcoefficient relation. + * ---------------- + */ +typedef FormData_pg_mvcoefficient *Form_pg_mvcoefficient; + +/* ---------------- + * compiler constants for pg_mvcoefficient + * ---------------- +< */ +#define Natts_pg_mvcoefficient 4 +#define Anum_pg_mvcoefficient_mvcreloid 1 +#define Anum_pg_mvcoefficient_mvcnattrs 2 +#define Anum_pg_mvcoefficient_mvccoefficient 3 +#define Anum_pg_mvcoefficient_mvcattrs 4 + +/* ---------------- + * pg_mvcoefficient has no initial contents + * ---------------- + */ + +/* + * prototypes for functions in pg_enum.c + */ +#endif /* PG_MVCOEFFICIENT_H */ diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h index 18404e2..1f48d61 100644 --- a/src/include/utils/syscache.h +++ b/src/include/utils/syscache.h @@ -66,6 +66,7 @@ enum SysCacheIdentifier INDEXRELID, LANGNAME, LANGOID, + MVCOEFFICIENT, NAMESPACENAME, NAMESPACEOID, OPERNAMENSP, diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out index eb0bc88..7c77796 100644 --- a/src/test/regress/expected/sanity_check.out +++ b/src/test/regress/expected/sanity_check.out @@ -113,6 +113,7 @@ pg_inherits|t pg_language|t pg_largeobject|t pg_largeobject_metadata|t +pg_mvcoefficient|t pg_namespace|t pg_opclass|t pg_operator|t -- 1.8.3.1
>From 04207bbe8a5800fcc2c76cdacb77798639113b80 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 6 Aug 2015 17:12:28 +0900 Subject: [PATCH 2/6] Analyze part for multivariate coefficient. Calculates multivarite coefficients on ANALYZE and stores them into pg_mvcoefficient. This code is considering multiple definitions for one relation. (However definition part registers no more than one definition per relation.) It calculates the following number ndistinct(A * B * ...) ---------------------------------- ndistint(A) * ndistinct(B) * ... --- src/backend/commands/analyze.c | 402 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 401 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 861048f..45f7cf5 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -27,6 +27,7 @@ #include "catalog/indexing.h" #include "catalog/pg_collation.h" #include "catalog/pg_inherits_fn.h" +#include "catalog/pg_mvcoefficient.h" #include "catalog/pg_namespace.h" #include "commands/dbcommands.h" #include "commands/tablecmds.h" @@ -46,6 +47,7 @@ #include "utils/acl.h" #include "utils/attoptcache.h" #include "utils/datum.h" +#include "utils/fmgroids.h" #include "utils/guc.h" #include "utils/lsyscache.h" #include "utils/memutils.h" @@ -56,6 +58,10 @@ #include "utils/timestamp.h" #include "utils/tqual.h" +#ifndef CHAR_BIT +#include <limits.h> +#endif + /* Per-index data for ANALYZE */ typedef struct AnlIndexData @@ -96,7 +102,11 @@ static void update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats); static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); - +static float4 compute_mv_ndistinct(int nattrs, + VacAttrStats **colstats, + AnalyzeAttrFetchFunc fetchfunc, + int samplerows, + double totalrows); /* * analyze_rel() -- analyze one relation @@ -281,6 +291,128 @@ analyze_rel(Oid relid, RangeVar *relation, int options, LWLockRelease(ProcArrayLock); } +/* Compute and store multivariate distinctness */ +static void +update_mv_distinct(Oid reloid, VacAttrStats **vacattrstats, + int attr_cnt, int numrows, double totalrows) +{ + ScanKeyData scankey; + SysScanDesc sysscan; + Relation mvcrel; + HeapTuple oldtup, newtup; + + mvcrel = heap_open(MvCoefficientRelationId, RowExclusiveLock); + + ScanKeyInit(&scankey, + Anum_pg_mvcoefficient_mvcreloid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(reloid)); + sysscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true, + NULL, 1, &scankey); + oldtup = systable_getnext(sysscan); + + while (HeapTupleIsValid(oldtup)) + { + VacAttrStats *colstats[MVCOEF_MAX_COLS]; + int ncols; + double nd_cartesian, nd, ncol_totalrows; + Datum values[Natts_pg_mvcoefficient]; + bool nulls[Natts_pg_mvcoefficient]; + bool replaces[Natts_pg_mvcoefficient]; + int i; + float4 nd_coef; + + Form_pg_mvcoefficient mvc = + (Form_pg_mvcoefficient) GETSTRUCT (oldtup); + + Assert(mvc->mvcnattrs > 1 && mvc->mvcnattrs <= MVCOEF_MAX_COLS); + ncols = mvc->mvcnattrs; + + /* find vacattrstats entry for required columns */ + for (i = 0 ; i < ncols ; i++) + { + int j; + + for(j = 0 ; j < attr_cnt ; j++) + { + if (mvc->mvcattrs.values[i] == vacattrstats[j]->tupattnum) + { + colstats[i] = vacattrstats[j]; + break; + } + } + + /* required column is not in vacattrstats, give up */ + if (j == attr_cnt) + break; + } + + /* This mvdistinct is not computable with this vacattrstats. */ + if (i != ncols) + { + oldtup = systable_getnext(sysscan); + continue; + } + + /* + * compute cartesian ndistinct = + * ndistinct(col0) * ndistinct(col1) ... + * + * This calculation is performed with distinct ratio to preserve + * precision. + */ + nd_cartesian = 1.0; + for (i = 0 ; i < ncols ; i++) + { + double t = -colstats[i]->stadistinct; + + /* ignore colums where stadistinct is "unknown" */ + if (t == 0.0) continue; + + if (t < 0) + t = -t / totalrows; + + nd_cartesian *= t; + } + + nd_coef = 1.0; + + /* compute ndistinct(col0 * col1 * ... ) */ + nd = compute_mv_ndistinct(i, colstats, + std_fetch_func, numrows, totalrows); + + ncol_totalrows = pow(totalrows, ncols); + if (!isinf(ncol_totalrows)) /* This won't be happen? */ + nd_coef = nd / nd_cartesian / ncol_totalrows; + else + nd_coef = 0.0; + + /* Update statistics record */ + for (i = 0; i < Natts_pg_mvcoefficient ; ++i) + { + nulls[i] = false; + replaces[i] = false; + } + values[Anum_pg_mvcoefficient_mvccoefficient - 1] = + Float8GetDatum(nd_coef); + replaces[Anum_pg_mvcoefficient_mvccoefficient - 1] = true; + newtup = heap_modify_tuple(oldtup, + RelationGetDescr(mvcrel), + values, + nulls, + replaces); + simple_heap_update(mvcrel, &oldtup->t_self, newtup); + + CatalogUpdateIndexes(mvcrel, newtup); + + oldtup = systable_getnext(sysscan); + } + + systable_endscan(sysscan); + heap_close(mvcrel, RowExclusiveLock); +} + + /* * do_analyze_rel() -- analyze one relation, recursively or not * @@ -538,6 +670,10 @@ do_analyze_rel(Relation onerel, int options, VacuumParams *params, MemoryContextResetAndDeleteChildren(col_context); } + /* update multivariate distinctness if requested */ + update_mv_distinct(onerel->rd_id, vacattrstats, attr_cnt, + numrows, totalrows); + if (hasindex) compute_index_stats(onerel, totalrows, indexdata, nindexes, @@ -1676,6 +1812,16 @@ typedef struct int tupno; /* position index for tuple it came from */ } ScalarItem; +#define VI_UBITS (CHAR_BIT * sizeof(unsigned)) +#define VI_BSET(bm,an) ((bm)[(an)/VI_UBITS] |= (unsigned)1 << ((an)%VI_UBITS)) +#define VI_ISBSET(bm,an) ((bm)[(an)/VI_UBITS] & ((unsigned)1 << ((an)%VI_UBITS))) +typedef struct +{ + unsigned nulls[MVCOEF_MAX_COLS / VI_UBITS + 1]; /* null bitmap */ + Datum *datums; /* Values list */ + int tupno; /* position index for tuple it came from */ +} VectorItem; + typedef struct { int count; /* # of duplicates */ @@ -1698,6 +1844,7 @@ static void compute_scalar_stats(VacAttrStatsP stats, int samplerows, double totalrows); static int compare_scalars(const void *a, const void *b, void *arg); +static int compare_mv_scalars(const void *a, const void *b, void *arg); static int compare_mcvs(const void *a, const void *b); @@ -2628,6 +2775,220 @@ compute_scalar_stats(VacAttrStatsP stats, } /* + * compute_mv_ndistinct() -- compute multicolumn distinctness + */ +static float4 +compute_mv_ndistinct(int nattrs, + VacAttrStats **colstats, + AnalyzeAttrFetchFunc fetchfunc, + int samplerows, + double totalrows) +{ + int i; + int null_cnt = 0; + int nonnull_cnt = 0; + int toowide_cnt = 0; + bool is_varlena[MVCOEF_MAX_COLS]; + SortSupportData ssup[MVCOEF_MAX_COLS]; + Datum *datums; + VectorItem *values; + int values_cnt = 0; + int *tupnoLink; + StdAnalyzeData *mystats[MVCOEF_MAX_COLS]; + float4 fndistinct; + + Assert (nattrs > 1 && nattrs <= MVCOEF_MAX_COLS); + + /* set up row storage for sorting */ + datums = (Datum*) palloc(samplerows * nattrs * sizeof(Datum)); + values = (VectorItem *) palloc0(samplerows * sizeof(VectorItem)); + tupnoLink = (int *) palloc(samplerows * sizeof(int)); + + for (i = 0 ; i < samplerows ; i++) + values[i].datums = &datums[i * nattrs]; + + memset(ssup, 0, sizeof(ssup)); + for (i = 0 ; i < nattrs ; i++) + { + VacAttrStats *vas = colstats[i]; + + is_varlena[i] = + !vas->attrtype->typbyval && vas->attrtype->typlen == -1; + mystats[i] = + (StdAnalyzeData*) vas->extra_data; + + ssup[i].ssup_cxt = CurrentMemoryContext; + /* We always use the default collation for statistics */ + ssup[i].ssup_collation = DEFAULT_COLLATION_OID; + /* Don't use abbreviated key conversion for now. See + compute_scalar_stats. */ + /* either will do for sort order and nulls_first */ + PrepareSortSupportFromOrderingOp(mystats[i]->ltopr, &ssup[i]); + } + ssup[nattrs].ssup_cxt = NULL; + + /* Initial scan to find sortable values */ + for (i = 0; i < samplerows; i++) + { + Datum colvalues[nattrs]; + bool colnulls[nattrs]; + bool all_is_null = true; + bool toowide = false; + int j; + + vacuum_delay_point(); + + for (j = 0 ; j < nattrs ; j++) + { + colnulls[nattrs] = false; + + colvalues[j] = fetchfunc(colstats[j], i, &colnulls[j]); + + /* + * Check for null/nonnull. Don't abandon this row if null is + * contained + */ + if (colnulls[j]) + continue; + + all_is_null = false; + + /* + * If it's a variable-width field, detoast it now to avoid + * repeated detoasting during the comparisons. If at least one of + * the columns is too long, exclude this row from accurate + * counting. + */ + if (is_varlena[j]) + { + if (toast_raw_datum_size(colvalues[j]) > WIDTH_THRESHOLD) + { + toowide = true; + break; + } + colvalues[j] = PointerGetDatum(PG_DETOAST_DATUM(colvalues[j])); + } + } + if (all_is_null) + { + null_cnt++; + continue; + } + nonnull_cnt++; + + if (toowide) + { + toowide_cnt++; + continue; + } + + /* Add it to the list to be sorted */ + for (j = 0 ; j < nattrs ; j++) + { + if (colnulls[j]) + VI_BSET(values[values_cnt].nulls, j); + else + values[values_cnt].datums[j] = colvalues[j]; + } + + values[values_cnt].tupno = values_cnt; + tupnoLink[values_cnt] = values_cnt; + values_cnt++; + } + + /* We can only compute real stats if we found some sortable values. */ + if (values_cnt > 0) + { + int ndistinct, /* # distinct values in sample */ + nmultiple, /* # that appear multiple times */ + dups_cnt; + CompareScalarsContext cxt; + + /* Sort the collected values */ + cxt.ssup = ssup; + cxt.tupnoLink = tupnoLink; + qsort_arg((void *) values, values_cnt, sizeof(VectorItem), + compare_mv_scalars, (void *) &cxt); + + ndistinct = 0; + nmultiple = 0; + dups_cnt = 0; + for (i = 0; i < values_cnt; i++) + { + int tupno = values[i].tupno; + + dups_cnt++; + if (tupnoLink[tupno] == tupno) + { + /* Reached end of duplicates of this value */ + ndistinct++; + if (dups_cnt > 1) + nmultiple++; + + dups_cnt = 0; + } + } + + if (nmultiple == 0) + { + /* If we found no repeated values, assume it's a unique column */ + fndistinct = totalrows; + } + else if (toowide_cnt == 0 && nmultiple == ndistinct) + { + /* + * Every value in the sample appeared more than once. Assume the + * columns have just these values. + */ + fndistinct = (float4)ndistinct; + } + else + { + /*---------- + * Estimate the number of distinct values using the estimator. + * See compute_scalar_stats for the algorithm. + *---------- + */ + int f1 = ndistinct - nmultiple + toowide_cnt; + int d = f1 + nmultiple; + double numer, + denom, + stadistinct; + + numer = (double) samplerows *(double) d; + + denom = (double) (samplerows - f1) + + (double) f1 *(double) samplerows / totalrows; + + stadistinct = numer / denom; + /* Clamp to sane range in case of roundoff error */ + if (stadistinct < (double) d) + stadistinct = (double) d; + if (stadistinct > totalrows) + stadistinct = totalrows; + fndistinct = floor(stadistinct + 0.5); + } + } + else if (nonnull_cnt > 0) + { + /* We found some non-null values, but they were all too wide */ + Assert(nonnull_cnt == toowide_cnt); + + /* Assume all too-wide values are distinct, so all rows are unique */ + fndistinct = totalrows; + } + else if (null_cnt > 0) + { + /* We found only all-null rows; assume the rows are entirely null */ + fndistinct = 0.0; /* "unknown" */ + } + + /* We don't need to bother cleaning up any of our temporary palloc's */ + return fndistinct; +} + + +/* * qsort_arg comparator for sorting ScalarItems * * Aside from sorting the items, we update the tupnoLink[] array @@ -2664,6 +3025,45 @@ compare_scalars(const void *a, const void *b, void *arg) return ta - tb; } +static int +compare_mv_scalars(const void *a, const void *b, void *arg) +{ + CompareScalarsContext *cxt = (CompareScalarsContext *) arg; + VectorItem *va = (VectorItem*)a; + VectorItem *vb = (VectorItem*)b; + Datum da, db; + int ta, tb; + int compare; + int i; + + for (i = 0 ; cxt->ssup[i].ssup_cxt ; i++) + { + bool da_isnull = VI_ISBSET(va->nulls, i); + bool db_isnull = VI_ISBSET(vb->nulls, i); + da = va->datums[i]; + db = vb->datums[i]; + + compare = ApplySortComparator(da, da_isnull, db, db_isnull, &cxt->ssup[i]); + if (compare != 0) + return compare; + } + + /* + * The two datums are equal, so update cxt->tupnoLink[]. + */ + ta = va->tupno; + tb = vb->tupno; + if (cxt->tupnoLink[ta] < tb) + cxt->tupnoLink[ta] = tb; + if (cxt->tupnoLink[tb] < ta) + cxt->tupnoLink[tb] = ta; + + /* + * For equal datums, sort by tupno + */ + return ta - tb; +} + /* * qsort comparator for sorting ScalarMCVItems by position */ -- 1.8.3.1
>From 19f5733988c75732a6fd4612c822fa99f05ed9ab Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 6 Aug 2015 17:31:20 +0900 Subject: [PATCH 3/6] Make use of multivariate coefficeient in estimation of clauselist. This is rather simple but walks on whole clauselist to collect column information.. --- src/backend/optimizer/path/clausesel.c | 100 +++++++++++++++++++++++++++++++++ 1 file changed, 100 insertions(+) diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index dcac1c1..b28c271 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -14,8 +14,15 @@ */ #include "postgres.h" +#include "access/genam.h" +#include "access/heapam.h" +#include "access/htup_details.h" +#include "catalog/indexing.h" +#include "catalog/pg_class.h" #include "catalog/pg_operator.h" +#include "catalog/pg_mvcoefficient.h" #include "nodes/makefuncs.h" +#include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" @@ -43,6 +50,96 @@ static void addRangeClause(RangeQueryClause **rqlist, Node *clause, bool varonleft, bool isLTsel, Selectivity s2); +static bool +collect_collist_walker(Node *node, Bitmapset **colsetlist) +{ + if (node == NULL) + return false; + if (IsA(node, Var)) + { + Var *var = (Var*)node; + + if (AttrNumberIsForUserDefinedAttr(var->varattno)) + colsetlist[var->varno] = + bms_add_member(colsetlist[var->varno], var->varattno); + } + return expression_tree_walker(node, collect_collist_walker, + (void*)colsetlist); +} + +/* Find multivariate distinctness coefficient for clauselist */ +static double +find_mv_coeffeicient(PlannerInfo *root, List *clauses) +{ + int relid; + ListCell *l; + Bitmapset **colsetlist = NULL; + double mv_coef = 1.0; + + /* Collect columns this clauselist on */ + colsetlist = (Bitmapset**) + palloc0(root->simple_rel_array_size * sizeof(Bitmapset*)); + + foreach(l, clauses) + { + RestrictInfo *rti = (RestrictInfo *) lfirst(l); + + /* Consider only EC-related clauses */ + if (rti->left_ec && (rti->left_ec == rti->right_ec)) + { + if (IsA(rti, RestrictInfo)) + collect_collist_walker((Node*)rti->clause, colsetlist); + } + } + + /* Find pg_mv_coefficient entries match this columlist */ + for (relid = 1 ; relid < root->simple_rel_array_size ; relid++) + { + Relation mvcrel; + SysScanDesc sscan; + ScanKeyData skeys[1]; + HeapTuple tuple; + + /* at least two colums required for on rel*/ + if (bms_num_members(colsetlist[relid]) < 2) continue; + + /* tables other than ordinary ones have no mv statistics */ + if (root->simple_rte_array[relid]->rtekind != RTE_RELATION || + root->simple_rte_array[relid]->relkind != RELKIND_RELATION) + continue; + + ScanKeyInit(&skeys[0], + Anum_pg_mvcoefficient_mvcreloid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(root->simple_rte_array[relid]->relid)); + + mvcrel = heap_open(MvCoefficientRelationId, AccessShareLock); + sscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true, + NULL, 1, skeys); + while (HeapTupleIsValid(tuple = systable_getnext(sscan))) + { + int i; + Bitmapset *mvccols = NULL; + Form_pg_mvcoefficient mvc = + (Form_pg_mvcoefficient) GETSTRUCT (tuple); + + for (i = 0 ; i < mvc->mvcnattrs ; i++) + mvccols = bms_add_member(mvccols, mvc->mvcattrs.values[i]); + + if (!bms_is_subset(mvccols, colsetlist[relid])) + continue; + + /* Prefer smaller one */ + if (mvc->mvccoefficient > 0 && mvc->mvccoefficient < mv_coef) + mv_coef = mvc->mvccoefficient; + } + systable_endscan(sscan); + heap_close(mvcrel, AccessShareLock); + } + + return mv_coef; +} + /**************************************************************************** * ROUTINES TO COMPUTE SELECTIVITIES ****************************************************************************/ @@ -200,6 +297,9 @@ clauselist_selectivity(PlannerInfo *root, s1 = s1 * s2; } + /* Try multivariate distinctness correction for clauses */ + s1 /= find_mv_coeffeicient(root, clauses); + /* * Now scan the rangequery pair list. */ -- 1.8.3.1
>From 693c9b0f67a0fabc38212e436453a29f9ae275c5 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 6 Aug 2015 16:59:47 +0900 Subject: [PATCH 4/6] Syntactical part of multivariate coefficient. This adds new two syntax ALTER TABLE <name> (ADD|DROP) STATISTICS (<type>,..) ON (<col>,..) To avoid shift/reduce conflict caused by adding these syntaxes, two existing rules are splitted into two rules each. They simply add or remove one definition tuple into/from pg_mvcoefficeint and currently only one definition for one relation is allowed. The tuples are removed on deletion of the target relation or covering columns. --- src/backend/catalog/heap.c | 49 ++++++++++ src/backend/commands/tablecmds.c | 191 +++++++++++++++++++++++++++++++++++++++ src/backend/parser/gram.y | 46 +++++++++- src/include/catalog/heap.h | 1 + src/include/nodes/parsenodes.h | 2 + 5 files changed, 287 insertions(+), 2 deletions(-) diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index d04e94d..32fafb7 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -46,6 +46,7 @@ #include "catalog/pg_constraint.h" #include "catalog/pg_foreign_table.h" #include "catalog/pg_inherits.h" +#include "catalog/pg_mvcoefficient.h" #include "catalog/pg_namespace.h" #include "catalog/pg_statistic.h" #include "catalog/pg_tablespace.h" @@ -2647,6 +2648,51 @@ cookConstraint(ParseState *pstate, } +void +RemoveMvStatistics(Oid relid, AttrNumber attnum) +{ + Relation pgmvcoef; + SysScanDesc scan; + ScanKeyData key[1]; + HeapTuple tuple; + + pgmvcoef = heap_open(MvCoefficientRelationId, RowExclusiveLock); + + ScanKeyInit(&key[0], + Anum_pg_mvcoefficient_mvcreloid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(relid)); + + scan = systable_beginscan(pgmvcoef, MvCoefficientIndexId, true, + NULL, 1, key); + + while (HeapTupleIsValid(tuple = systable_getnext(scan))) + { + if (attnum != 0) + { + /* Check if this attnum is contained in this mvc entry */ + int i; + Form_pg_mvcoefficient mvc = + (Form_pg_mvcoefficient) GETSTRUCT (tuple); + + for (i = 0 ; i < mvc->mvcnattrs ; i++) + { + if (mvc->mvcattrs.values[i] == attnum) + break; + } + + /* This mvcoef entry has no relation with this attribute set */ + if (i == mvc->mvcnattrs) + continue; + } + + simple_heap_delete(pgmvcoef, &tuple->t_self); + } + + systable_endscan(scan); + + heap_close(pgmvcoef, RowExclusiveLock); +} /* * RemoveStatistics --- remove entries in pg_statistic for a rel or column * @@ -2690,6 +2736,9 @@ RemoveStatistics(Oid relid, AttrNumber attnum) systable_endscan(scan); heap_close(pgstatistic, RowExclusiveLock); + + /* Remove mutivariate coefficient entry */ + RemoveMvStatistics(relid, attnum); } diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 126b119..843a14e 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -36,6 +36,7 @@ #include "catalog/pg_inherits.h" #include "catalog/pg_inherits_fn.h" #include "catalog/pg_namespace.h" +#include "catalog/pg_mvcoefficient.h" #include "catalog/pg_opclass.h" #include "catalog/pg_tablespace.h" #include "catalog/pg_trigger.h" @@ -402,6 +403,7 @@ static bool ATPrepChangePersistence(Relation rel, bool toLogged); static void ATPrepSetTableSpace(AlteredTableInfo *tab, Relation rel, char *tablespacename, LOCKMODE lockmode); static void ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode); +static void ATExecAddDropMvStatistics(Relation rel, List *param, bool drop); static void ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation, LOCKMODE lockmode); @@ -3015,6 +3017,8 @@ AlterTableGetLockLevel(List *cmds) * applies: we don't currently allow concurrent catalog * updates. */ + case AT_AddMvStatistics: + case AT_DropMvStatistics: case AT_SetStatistics: /* Uses MVCC in getTableAttrs() */ case AT_ClusterOn: /* Uses MVCC in getIndexes() */ case AT_DropCluster: /* Uses MVCC in getIndexes() */ @@ -3175,6 +3179,8 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd, break; case AT_SetOptions: /* ALTER COLUMN SET ( options ) */ case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */ + case AT_AddMvStatistics: + case AT_DropMvStatistics: ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_INDEX | ATT_FOREIGN_TABLE); /* This command never recurses */ pass = AT_PASS_MISC; @@ -3591,6 +3597,12 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel, * Nothing to do here; Phase 3 does the work */ break; + case AT_AddMvStatistics: /* ADD STATISTICS */ + ATExecAddDropMvStatistics(rel, (List *)cmd->def, false); + break; + case AT_DropMvStatistics: /* DROP STATISTICS */ + ATExecAddDropMvStatistics(rel, (List *)cmd->def, true); + break; case AT_SetRelOptions: /* SET (...) */ case AT_ResetRelOptions: /* RESET (...) */ case AT_ReplaceRelOptions: /* replace entire option list */ @@ -9300,6 +9312,185 @@ ATPrepSetTableSpace(AlteredTableInfo *tab, Relation rel, char *tablespacename, L tab->newTableSpace = tablespaceId; } +static int +cmpint16(const void *p1, const void *p2) +{ + int a = *(int16 *)p1; + int b = *(int16 *)p2; + + return a - b; +} + +static void +ATExecAddDropMvStatistics(Relation rel, List * params, bool drop) +{ + Oid relid = RelationGetRelid(rel); + int ncols = 0; + int16 colids[MVCOEF_MAX_COLS]; + Relation mvcrel; + HeapTuple oldtup, newtup; + ScanKeyData scankey; + SysScanDesc sysscan; + Datum values[Natts_pg_mvcoefficient]; + bool nulls[Natts_pg_mvcoefficient]; + bool replaces[Natts_pg_mvcoefficient]; + List *stattypes = (List*) linitial(params); + List *colnames = (List*) lsecond(params); + char *stattype = strVal(linitial(stattypes)); + ListCell *lc; + int i; + + if (list_length(stattypes) > 1) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("multiple stat types specfied"))); + + if(strcasecmp(stattype, "mvndistinct") != 0) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("unknown multivariate stat type: %s", stattype))); + + if (colnames && list_length(colnames) < 2) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("at least two columns needed"))); + + if (list_length(colnames) > MVCOEF_MAX_COLS) + ereport(ERROR, + (errcode(ERRCODE_TOO_MANY_COLUMNS), + errmsg("number of columns (%d) exceeds limit (%d)", + list_length(colnames), MVCOEF_MAX_COLS))); + + foreach (lc, colnames) + { + char *attname = strVal(lfirst (lc)); + HeapTuple atttup; + int16 colid; + + atttup = SearchSysCacheAttName(relid, attname); + if (!HeapTupleIsValid(atttup)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_COLUMN), + errmsg("column \"%s\" of relation \"%s\" does not exist", + attname, get_rel_name(relid)))); + colid = ((Form_pg_attribute) GETSTRUCT(atttup))->attnum; + ReleaseSysCache(atttup); + + if (!AttrNumberIsForUserDefinedAttr(colid)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_COLUMN), + errmsg("system columns cannot be specified"))); + + for (i = 0 ; i < ncols ; i++) + { + if (colids[i] == colid) + ereport(ERROR, + (errcode(ERRCODE_DUPLICATE_COLUMN), + errmsg("column \"%s\" specified more than once", + attname))); + } + + colids[ncols++] = colid; + } + + mvcrel = heap_open(MvCoefficientRelationId, RowExclusiveLock); + + ScanKeyInit(&scankey, + Anum_pg_mvcoefficient_mvcreloid, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(relid)); + sysscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true, + NULL, 1, &scankey); + + /* colum list is required to be sorted in comparison */ + qsort(colids, ncols, sizeof(int16), cmpint16); + + if (drop) + { + bool dropped = false; + + while (HeapTupleIsValid(oldtup = systable_getnext(sysscan))) + { + Form_pg_mvcoefficient mvc = + (Form_pg_mvcoefficient) GETSTRUCT (oldtup); + int i; + + /* Fast exit, number of columns don't match */ + if (mvc->mvcnattrs != ncols) + continue; + + /* + * Check for individual columns. This assumes that both of the + * column vector in mvc and colids are sorted. + */ + for (i = 0 ; i < ncols ; i++) + { + if (mvc->mvcattrs.values[i] != colids[i]) + break; + } + + /* + * Remove it if match. Unconditionally do this for the case of + * ncols == 0. + */ + if (i == ncols) + { + simple_heap_delete(mvcrel, &oldtup->t_self); + dropped = true; + } + } + + if (!dropped) + ereport(ERROR, + (errcode(ERRCODE_NO_DATA_FOUND), + errmsg("no matching statistics found"))); + } + else + { + int2vector *colvec; + + oldtup = systable_getnext(sysscan); + + for (i = 0; i < Natts_pg_mvcoefficient ; ++i) + { + nulls[i] = false; + values[i] = false; + replaces[i] = false; + } + + colvec = buildint2vector(colids, ncols); + + values[Anum_pg_mvcoefficient_mvcreloid - 1] = ObjectIdGetDatum(relid); + values[Anum_pg_mvcoefficient_mvcnattrs - 1] = Int16GetDatum(ncols); + values[Anum_pg_mvcoefficient_mvccoefficient - 1] = Float8GetDatum(1.0); + values[Anum_pg_mvcoefficient_mvcattrs - 1] = PointerGetDatum(colvec); + + if (HeapTupleIsValid(oldtup)) + { + replaces[Anum_pg_mvcoefficient_mvcnattrs - 1] = true; + replaces[Anum_pg_mvcoefficient_mvccoefficient - 1] = true; + replaces[Anum_pg_mvcoefficient_mvcattrs - 1] = true; + newtup = heap_modify_tuple(oldtup, RelationGetDescr(mvcrel), + values, nulls, replaces); + simple_heap_update(mvcrel, &oldtup->t_self, newtup); + + elog(NOTICE, + "multivariate distinctness for relation %s is cleard", + get_rel_name(relid)); + } + else + { + newtup = heap_form_tuple(RelationGetDescr(mvcrel), values, nulls); + simple_heap_insert(mvcrel, newtup); + } + CatalogUpdateIndexes(mvcrel, newtup); + } + + systable_endscan(sysscan); + + heap_close(mvcrel, RowExclusiveLock); +} + /* * Set, reset, or replace reloptions. */ diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 1efc6d6..a637df0 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -2033,7 +2033,7 @@ alter_table_cmd: $$ = (Node *)n; } /* ALTER TABLE <name> DROP [COLUMN] IF EXISTS <colname> [RESTRICT|CASCADE] */ - | DROP opt_column IF_P EXISTS ColId opt_drop_behavior + | DROP COLUMN IF_P EXISTS ColId opt_drop_behavior { AlterTableCmd *n = makeNode(AlterTableCmd); n->subtype = AT_DropColumn; @@ -2042,8 +2042,17 @@ alter_table_cmd: n->missing_ok = TRUE; $$ = (Node *)n; } + | DROP IF_P EXISTS ColId opt_drop_behavior + { + AlterTableCmd *n = makeNode(AlterTableCmd); + n->subtype = AT_DropColumn; + n->name = $4; + n->behavior = $5; + n->missing_ok = TRUE; + $$ = (Node *)n; + } /* ALTER TABLE <name> DROP [COLUMN] <colname> [RESTRICT|CASCADE] */ - | DROP opt_column ColId opt_drop_behavior + | DROP COLUMN ColId opt_drop_behavior { AlterTableCmd *n = makeNode(AlterTableCmd); n->subtype = AT_DropColumn; @@ -2052,6 +2061,15 @@ alter_table_cmd: n->missing_ok = FALSE; $$ = (Node *)n; } + | DROP ColId opt_drop_behavior + { + AlterTableCmd *n = makeNode(AlterTableCmd); + n->subtype = AT_DropColumn; + n->name = $2; + n->behavior = $3; + n->missing_ok = FALSE; + $$ = (Node *)n; + } /* * ALTER TABLE <name> ALTER [COLUMN] <colname> [SET DATA] TYPE <typename> * [ USING <expression> ] @@ -2087,6 +2105,14 @@ alter_table_cmd: n->def = $2; $$ = (Node *)n; } + /* ALTER TABLE <name> ADD STATISTICS (...) ON ( <cols> ) */ + | ADD_P STATISTICS '(' name_list ')' ON '(' columnList ')' + { + AlterTableCmd *n = makeNode(AlterTableCmd); + n->subtype = AT_AddMvStatistics; + n->def = (Node*) list_make2((Node*) $4, (Node*) $8); + $$ = (Node *)n; + } /* ALTER TABLE <name> ALTER CONSTRAINT ... */ | ALTER CONSTRAINT name ConstraintAttributeSpec { @@ -2130,6 +2156,22 @@ alter_table_cmd: n->missing_ok = FALSE; $$ = (Node *)n; } + /* ALTER TABLE <name> DROP STATISTICS (...) ON ( <cols> ) */ + | DROP STATISTICS '(' name_list ')' ON '(' columnList ')' + { + AlterTableCmd *n = makeNode(AlterTableCmd); + n->subtype = AT_DropMvStatistics; + n->def = (Node*) list_make2((Node*) $4, (Node*) $8); + $$ = (Node *)n; + } + /* ALTER TABLE <name> DROP STATISTICS (...) */ + | DROP STATISTICS '(' name_list ')' + { + AlterTableCmd *n = makeNode(AlterTableCmd); + n->subtype = AT_DropMvStatistics; + n->def = (Node*) list_make2((Node*) $4, NULL); + $$ = (Node *)n; + } /* ALTER TABLE <name> SET WITH OIDS */ | SET WITH OIDS { diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h index e6ac394..30d2acd 100644 --- a/src/include/catalog/heap.h +++ b/src/include/catalog/heap.h @@ -119,6 +119,7 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum, DropBehavior behavior, bool complain, bool internal); extern void RemoveAttrDefaultById(Oid attrdefId); extern void RemoveStatistics(Oid relid, AttrNumber attnum); +extern void RemoveMvStatistics(Oid relid, AttrNumber attnum); extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno, bool relhasoids); diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index f0dcd2f..479b8bc 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1513,6 +1513,8 @@ typedef enum AlterTableType AT_ReplicaIdentity, /* REPLICA IDENTITY */ AT_EnableRowSecurity, /* ENABLE ROW SECURITY */ AT_DisableRowSecurity, /* DISABLE ROW SECURITY */ + AT_AddMvStatistics, /* ADD STATISTICS (...) ON (...) */ + AT_DropMvStatistics, /* DROP STATISTICS (...) ON (...) */ AT_GenericOptions /* OPTIONS (...) */ } AlterTableType; -- 1.8.3.1
>From 6e71473e8ea615bd5f047d48c7a99e3f7e47940a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 6 Aug 2015 21:11:55 +0900 Subject: [PATCH 5/6] Add btree operator class for int2vector. Add new opclass for use for pg_mvcoefficient. --- src/backend/access/nbtree/nbtcompare.c | 24 ++++++++++++++++++++++ src/backend/utils/adt/int.c | 37 ++++++++++++++++++++++++++++++++++ src/backend/utils/cache/syscache.c | 10 ++++----- src/include/catalog/pg_amop.h | 10 +++++++++ src/include/catalog/pg_amproc.h | 1 + src/include/catalog/pg_opclass.h | 1 + src/include/catalog/pg_operator.h | 12 ++++++++++- src/include/catalog/pg_opfamily.h | 1 + src/include/catalog/pg_proc.h | 8 ++++++++ src/include/utils/builtins.h | 6 ++++++ src/include/utils/syscache.h | 2 +- 11 files changed, 105 insertions(+), 7 deletions(-) diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c index 8ce8009..baa28c0 100644 --- a/src/backend/access/nbtree/nbtcompare.c +++ b/src/backend/access/nbtree/nbtcompare.c @@ -308,6 +308,30 @@ btoidvectorcmp(PG_FUNCTION_ARGS) } Datum +btint2vectorcmp(PG_FUNCTION_ARGS) +{ + int2vector *a = (int2vector *) PG_GETARG_POINTER(0); + int2vector *b = (int2vector *) PG_GETARG_POINTER(1); + int i; + + /* We arbitrarily choose to sort first by vector length */ + if (a->dim1 != b->dim1) + PG_RETURN_INT32(a->dim1 - b->dim1); + + for (i = 0; i < a->dim1; i++) + { + if (a->values[i] != b->values[i]) + { + if (a->values[i] > b->values[i]) + PG_RETURN_INT32(1); + else + PG_RETURN_INT32(-1); + } + } + PG_RETURN_INT32(0); +} + +Datum btcharcmp(PG_FUNCTION_ARGS) { char a = PG_GETARG_CHAR(0); diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c index 1a91b29..063d85a 100644 --- a/src/backend/utils/adt/int.c +++ b/src/backend/utils/adt/int.c @@ -269,6 +269,43 @@ int2vectoreq(PG_FUNCTION_ARGS) PG_RETURN_BOOL(memcmp(a->values, b->values, a->dim1 * sizeof(int16)) == 0); } +Datum +int2vectorne(PG_FUNCTION_ARGS) +{ + PG_RETURN_BOOL(!int2vectoreq(fcinfo)); +} + +Datum +int2vectorlt(PG_FUNCTION_ARGS) +{ + int32 cmp = DatumGetInt32(btint2vectorcmp(fcinfo)); + + PG_RETURN_BOOL(cmp < 0); +} + +Datum +int2vectorle(PG_FUNCTION_ARGS) +{ + int32 cmp = DatumGetInt32(btint2vectorcmp(fcinfo)); + + PG_RETURN_BOOL(cmp <= 0); +} + +Datum +int2vectorge(PG_FUNCTION_ARGS) +{ + int32 cmp = DatumGetInt32(btint2vectorcmp(fcinfo)); + + PG_RETURN_BOOL(cmp >= 0); +} + +Datum +int2vectorgt(PG_FUNCTION_ARGS) +{ + int32 cmp = DatumGetInt32(btint2vectorcmp(fcinfo)); + + PG_RETURN_BOOL(cmp > 0); +} /***************************************************************************** * PUBLIC ROUTINES * diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c index a895405..e41268e 100644 --- a/src/backend/utils/cache/syscache.c +++ b/src/backend/utils/cache/syscache.c @@ -502,16 +502,16 @@ static const struct cachedesc cacheinfo[] = { }, 4 }, - {MvCoefficientRelationId, /* MVCOEFFICIENT */ - MvCoefficientIndexId, - 1, + {MvCoefficientRelationId, /* MVCOEFRELIDATTRS */ + MvCoefficientIndexId, + 2, { Anum_pg_mvcoefficient_mvcreloid, - 0, + Anum_pg_mvcoefficient_mvcattrs, 0, 0 }, - 4 + 8 }, {NamespaceRelationId, /* NAMESPACENAME */ NamespaceNameIndexId, diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h index da5fe9d..23eb849 100644 --- a/src/include/catalog/pg_amop.h +++ b/src/include/catalog/pg_amop.h @@ -185,6 +185,16 @@ DATA(insert ( 1991 30 30 4 s 648 403 0 )); DATA(insert ( 1991 30 30 5 s 646 403 0 )); /* + * btree int2vector_ops + */ + +DATA(insert ( 3321 22 22 1 s 3317 403 0 )); +DATA(insert ( 3321 22 22 2 s 3319 403 0 )); +DATA(insert ( 3321 22 22 3 s 386 403 0 )); +DATA(insert ( 3321 22 22 4 s 3320 403 0 )); +DATA(insert ( 3321 22 22 5 s 3318 403 0 )); + +/* * btree float_ops */ diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h index b57d6e6..72e3557 100644 --- a/src/include/catalog/pg_amproc.h +++ b/src/include/catalog/pg_amproc.h @@ -142,6 +142,7 @@ DATA(insert ( 3626 3614 3614 1 3622 )); DATA(insert ( 3683 3615 3615 1 3668 )); DATA(insert ( 3901 3831 3831 1 3870 )); DATA(insert ( 4033 3802 3802 1 4044 )); +DATA(insert ( 3321 22 22 1 3327 )); /* hash */ diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h index e7b3148..37075ff 100644 --- a/src/include/catalog/pg_opclass.h +++ b/src/include/catalog/pg_opclass.h @@ -167,6 +167,7 @@ DATA(insert ( 403 bpchar_pattern_ops PGNSP PGUID 2097 1042 f 0 )); DATA(insert ( 403 money_ops PGNSP PGUID 2099 790 t 0 )); DATA(insert ( 405 bool_ops PGNSP PGUID 2222 16 t 0 )); DATA(insert ( 405 bytea_ops PGNSP PGUID 2223 17 t 0 )); +DATA(insert ( 403 int2vector_ops PGNSP PGUID 3321 22 t 0 )); DATA(insert ( 405 int2vector_ops PGNSP PGUID 2224 22 t 0 )); DATA(insert ( 403 tid_ops PGNSP PGUID 2789 27 t 0 )); DATA(insert ( 405 xid_ops PGNSP PGUID 2225 28 t 0 )); diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h index 26c9d4e..7b11942 100644 --- a/src/include/catalog/pg_operator.h +++ b/src/include/catalog/pg_operator.h @@ -154,8 +154,18 @@ DATA(insert OID = 389 ( "!!" PGNSP PGUID l f f 0 20 1700 0 0 numeric_fac DESCR("deprecated, use ! instead"); DATA(insert OID = 385 ( "=" PGNSP PGUID b f t 29 29 16 385 0 cideq eqsel eqjoinsel )); DESCR("equal"); -DATA(insert OID = 386 ( "=" PGNSP PGUID b f t 22 22 16 386 0 int2vectoreq eqsel eqjoinsel )); +DATA(insert OID = 386 ( "=" PGNSP PGUID b t t 22 22 16 386 3316 int2vectoreq eqsel eqjoinsel )); DESCR("equal"); +DATA(insert OID = 3316 ( "<>" PGNSP PGUID b f f 22 22 16 3316 386 int2vectorne neqsel neqjoinsel )); +DESCR("not equal"); +DATA(insert OID = 3317 ( "<" PGNSP PGUID b f f 22 22 16 3318 3320 int2vectorlt scalarltsel scalarltjoinsel )); +DESCR("less than"); +DATA(insert OID = 3318 ( ">" PGNSP PGUID b f f 22 22 16 3317 3319 int2vectorgt scalargtsel scalargtjoinsel )); +DESCR("greater than"); +DATA(insert OID = 3319 ( "<=" PGNSP PGUID b f f 22 22 16 3320 3318 int2vectorle scalarltsel scalarltjoinsel )); +DESCR("less than or equal"); +DATA(insert OID = 3320 ( ">=" PGNSP PGUID b f f 22 22 16 3319 3317 int2vectorge scalargtsel scalargtjoinsel )); +DESCR("greater than or equal"); DATA(insert OID = 387 ( "=" PGNSP PGUID b t f 27 27 16 387 402 tideq eqsel eqjoinsel )); DESCR("equal"); diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h index acbc100..eb9e62b 100644 --- a/src/include/catalog/pg_opfamily.h +++ b/src/include/catalog/pg_opfamily.h @@ -87,6 +87,7 @@ DATA(insert OID = 1983 ( 405 interval_ops PGNSP PGUID )); DATA(insert OID = 1984 ( 403 macaddr_ops PGNSP PGUID )); DATA(insert OID = 1985 ( 405 macaddr_ops PGNSP PGUID )); DATA(insert OID = 1986 ( 403 name_ops PGNSP PGUID )); +DATA(insert OID = 3321 ( 403 int2vector_ops PGNSP PGUID )); #define NAME_BTREE_FAM_OID 1986 DATA(insert OID = 1987 ( 405 name_ops PGNSP PGUID )); DATA(insert OID = 1988 ( 403 numeric_ops PGNSP PGUID )); diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index ddf7c67..f3da6d6 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -536,6 +536,12 @@ DESCR("convert int2 to int4"); DATA(insert OID = 314 ( int2 PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 21 "23" _null_ _null_ _null_ _null_ _null_ i4toi2 _null_ _null_ _null_ )); DESCR("convert int4 to int2"); DATA(insert OID = 315 ( int2vectoreq PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "22 22" _null_ _null_ _null_ _null_ _null_ int2vectoreq _null_ _null_ _null_ )); +DATA(insert OID = 3322 ( int2vectorne PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "22 22" _null_ _null_ _null_ _null_ _null_ int2vectorne _null_ _null_ _null_ )); +DATA(insert OID = 3323 ( int2vectorlt PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "22 22" _null_ _null_ _null_ _null_ _null_ int2vectorlt _null_ _null_ _null_ )); +DATA(insert OID = 3324 ( int2vectorle PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "22 22" _null_ _null_ _null_ _null_ _null_ int2vectorle _null_ _null_ _null_ )); +DATA(insert OID = 3325 ( int2vectorge PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "22 22" _null_ _null_ _null_ _null_ _null_ int2vectorge _null_ _null_ _null_ )); +DATA(insert OID = 3326 ( int2vectorgt PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "22 22" _null_ _null_ _null_ _null_ _null_ int2vectorgt _null_ _null_ _null_ )); + DATA(insert OID = 316 ( float8 PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 701 "23" _null_ _null_ _null_ _null_ _null_ i4tod _null_ _null_ _null_ )); DESCR("convert int4 to float8"); DATA(insert OID = 317 ( int4 PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 23 "701" _null_ _null_ _null_ _null_ _null_ dtoi4 _null_ _null_ _null_ )); @@ -644,6 +650,8 @@ DATA(insert OID = 3134 ( btoidsortsupport PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 DESCR("sort support"); DATA(insert OID = 404 ( btoidvectorcmp PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "30 30" _null_ _null_ _null_ _null_ _null_ btoidvectorcmp _null_ _null_ _null_ )); DESCR("less-equal-greater"); +DATA(insert OID = 3327 ( btint2vectorcmp PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "22 22" _null_ _null_ _null_ _null_ _null_ btint2vectorcmp _null_ _null_ _null_ )); +DESCR("less-equal-greater"); DATA(insert OID = 357 ( btabstimecmp PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "702 702" _null_ _null_ _null_ _null_ _null_ btabstimecmp _null_ _null_ _null_ )); DESCR("less-equal-greater"); DATA(insert OID = 358 ( btcharcmp PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "18 18" _null_ _null_ _null_ _null_ _null_ btcharcmp _null_ _null_ _null_ )); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index fc1679e..89f231a 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -187,6 +187,11 @@ extern Datum int2vectorout(PG_FUNCTION_ARGS); extern Datum int2vectorrecv(PG_FUNCTION_ARGS); extern Datum int2vectorsend(PG_FUNCTION_ARGS); extern Datum int2vectoreq(PG_FUNCTION_ARGS); +extern Datum int2vectorne(PG_FUNCTION_ARGS); +extern Datum int2vectorlt(PG_FUNCTION_ARGS); +extern Datum int2vectorle(PG_FUNCTION_ARGS); +extern Datum int2vectorge(PG_FUNCTION_ARGS); +extern Datum int2vectorgt(PG_FUNCTION_ARGS); extern Datum int4in(PG_FUNCTION_ARGS); extern Datum int4out(PG_FUNCTION_ARGS); extern Datum int4recv(PG_FUNCTION_ARGS); @@ -310,6 +315,7 @@ extern Datum btfloat48cmp(PG_FUNCTION_ARGS); extern Datum btfloat84cmp(PG_FUNCTION_ARGS); extern Datum btoidcmp(PG_FUNCTION_ARGS); extern Datum btoidvectorcmp(PG_FUNCTION_ARGS); +extern Datum btint2vectorcmp(PG_FUNCTION_ARGS); extern Datum btabstimecmp(PG_FUNCTION_ARGS); extern Datum btreltimecmp(PG_FUNCTION_ARGS); extern Datum bttintervalcmp(PG_FUNCTION_ARGS); diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h index 1f48d61..f5ce307 100644 --- a/src/include/utils/syscache.h +++ b/src/include/utils/syscache.h @@ -66,7 +66,7 @@ enum SysCacheIdentifier INDEXRELID, LANGNAME, LANGOID, - MVCOEFFICIENT, + MVCOEFRELIDATTRS, NAMESPACENAME, NAMESPACEOID, OPERNAMENSP, -- 1.8.3.1
>From 86d867084e6c44f9d7c732f2e315b4dcd34fd711 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 6 Aug 2015 21:15:08 +0900 Subject: [PATCH 6/6] Use modified index of pg_mvcoefficient. Add column vector to the index of pg_mvcoefficient and make use of it on DDL operation. --- src/backend/commands/tablecmds.c | 74 ++++++++++++++-------------------------- src/include/catalog/indexing.h | 2 +- 2 files changed, 27 insertions(+), 49 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 843a14e..dd37aa8 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -9327,9 +9327,11 @@ ATExecAddDropMvStatistics(Relation rel, List * params, bool drop) Oid relid = RelationGetRelid(rel); int ncols = 0; int16 colids[MVCOEF_MAX_COLS]; + int2vector *colvec = NULL; Relation mvcrel; HeapTuple oldtup, newtup; - ScanKeyData scankey; + int nscankeys; + ScanKeyData scankey[2]; SysScanDesc sysscan; Datum values[Natts_pg_mvcoefficient]; bool nulls[Natts_pg_mvcoefficient]; @@ -9395,62 +9397,38 @@ ATExecAddDropMvStatistics(Relation rel, List * params, bool drop) mvcrel = heap_open(MvCoefficientRelationId, RowExclusiveLock); - ScanKeyInit(&scankey, + nscankeys = 1; + ScanKeyInit(&scankey[0], Anum_pg_mvcoefficient_mvcreloid, BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid)); - sysscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true, - NULL, 1, &scankey); - - /* colum list is required to be sorted in comparison */ - qsort(colids, ncols, sizeof(int16), cmpint16); + if (ncols > 0) + { + qsort(colids, ncols, sizeof(int16), cmpint16); + colvec = buildint2vector(colids, ncols); + ScanKeyInit(&scankey[1], + Anum_pg_mvcoefficient_mvcattrs, + BTEqualStrategyNumber, F_INT2VECTOREQ, + PointerGetDatum(colvec)); + nscankeys = 2; + } + sysscan = systable_beginscan(mvcrel, MvCoefficientIndexId, true, + NULL, nscankeys, scankey); if (drop) { - bool dropped = false; - + /* Two or more tuples may be returned for the drop case */ while (HeapTupleIsValid(oldtup = systable_getnext(sysscan))) { - Form_pg_mvcoefficient mvc = - (Form_pg_mvcoefficient) GETSTRUCT (oldtup); - int i; - - /* Fast exit, number of columns don't match */ - if (mvc->mvcnattrs != ncols) - continue; - - /* - * Check for individual columns. This assumes that both of the - * column vector in mvc and colids are sorted. - */ - for (i = 0 ; i < ncols ; i++) - { - if (mvc->mvcattrs.values[i] != colids[i]) - break; - } - - /* - * Remove it if match. Unconditionally do this for the case of - * ncols == 0. - */ - if (i == ncols) - { - simple_heap_delete(mvcrel, &oldtup->t_self); - dropped = true; - } + simple_heap_delete(mvcrel, &oldtup->t_self); } - - if (!dropped) - ereport(ERROR, - (errcode(ERRCODE_NO_DATA_FOUND), - errmsg("no matching statistics found"))); } else { - int2vector *colvec; - + /* At most one tuple is fetched for the add case */ oldtup = systable_getnext(sysscan); + /* Create data for the new tuple */ for (i = 0; i < Natts_pg_mvcoefficient ; ++i) { nulls[i] = false; @@ -9458,8 +9436,7 @@ ATExecAddDropMvStatistics(Relation rel, List * params, bool drop) replaces[i] = false; } - colvec = buildint2vector(colids, ncols); - + Assert(colvec != NULL); values[Anum_pg_mvcoefficient_mvcreloid - 1] = ObjectIdGetDatum(relid); values[Anum_pg_mvcoefficient_mvcnattrs - 1] = Int16GetDatum(ncols); values[Anum_pg_mvcoefficient_mvccoefficient - 1] = Float8GetDatum(1.0); @@ -9467,6 +9444,7 @@ ATExecAddDropMvStatistics(Relation rel, List * params, bool drop) if (HeapTupleIsValid(oldtup)) { + /* Update existing tuple */ replaces[Anum_pg_mvcoefficient_mvcnattrs - 1] = true; replaces[Anum_pg_mvcoefficient_mvccoefficient - 1] = true; replaces[Anum_pg_mvcoefficient_mvcattrs - 1] = true; @@ -9480,14 +9458,14 @@ ATExecAddDropMvStatistics(Relation rel, List * params, bool drop) } else { - newtup = heap_form_tuple(RelationGetDescr(mvcrel), values, nulls); + /* Insert new tuple */ + newtup = heap_form_tuple(RelationGetDescr(mvcrel), + values, nulls); simple_heap_insert(mvcrel, newtup); } CatalogUpdateIndexes(mvcrel, newtup); } - systable_endscan(sysscan); - heap_close(mvcrel, RowExclusiveLock); } diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h index fb28747..16ad43d 100644 --- a/src/include/catalog/indexing.h +++ b/src/include/catalog/indexing.h @@ -173,7 +173,7 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops)); #define LargeObjectMetadataOidIndexId 2996 -DECLARE_UNIQUE_INDEX(pg_mvcoefficient_index, 3578, on pg_mvcoefficient using btree(mvcreloid oid_ops)); +DECLARE_UNIQUE_INDEX(pg_mvcoefficient_index, 3578, on pg_mvcoefficient using btree(mvcreloid oid_ops, mvcattrs int2vector_ops)); #define MvCoefficientIndexId 3578 DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops)); -- 1.8.3.1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers