Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

The patch looks fine to me. Changes are clear and all functions are commented properly. So I marked it "Ready for committer".

The new status of this patch is: Ready for Committer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Covering + unique indexes.
01.04.2017 02:31, Peter Geoghegan:

* index_truncate_tuple() should have as an argument the number of attributes. No need to "#include utils/rel.h" that way.

Will fix.

* I think that we should store this (the number of attributes), and use it directly when comparing, per my remarks to Tom over on that other thread. We should also use the free bit within IndexTupleData.t_info, to indicate that the IndexTuple was truncated, just to make it clear to everyone that might care that that's how these truncated IndexTuples need to be represented. Doing this would have no real impact on your patch, because for you this will be 100% redundant. It will help external tools, and perhaps another, more general suffix truncation patch that comes in the future. We should try very hard to have a future-proof on-disk representation. I think that this is quite possible.

To be honest, I think that it would overcomplicate the patch, because this exact patch has nothing to do with suffix truncation. Although, we can add any necessary flags if this work is continued in the future.

* I suggest adding a "can't happen" defensive check + error that checks that the tuple returned by index_truncate_tuple() is sized <= the original. This cannot be allowed to ever happen. (Not that I think it will.)

There is already an assertion:

Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup));

Do you think it is not enough?

* I see a small bug. You forgot to teach _bt_findsplitloc() about truncation. It does this currently, which you did not update:

/*
 * The first item on the right page becomes the high key of the left page;
 * therefore it counts against left space as well as right space.
 */
leftfree -= firstrightitemsz;

I think that this accounting needs to be fixed.

Could you explain what's wrong with this accounting? We may expect to take more space on the left page than will actually be taken after high key truncation, but I don't see any problem here.

* Not sure about one thing.
What's the reason for this change?

- /* Log left page */
- if (!isleaf)
- {
-     /*
-      * We must also log the left page's high key, because the right
-      * page's leftmost key is suppressed on non-leaf levels. Show it
-      * as belonging to the left page buffer, so that it is not stored
-      * if XLogInsert decides it needs a full-page image of the left
-      * page.
-      */
-     itemid = PageGetItemId(origpage, P_HIKEY);
-     item = (IndexTuple) PageGetItem(origpage, itemid);
-     XLogRegisterBufData(0, (char *) item, MAXALIGN(IndexTupleSize(item)));
- }
+ /*
+  * We must also log the left page's high key, because the right
+  * page's leftmost key is suppressed on non-leaf levels. Show it
+  * as belonging to the left page buffer, so that it is not stored
+  * if XLogInsert decides it needs a full-page image of the left
+  * page.
+  */
+ itemid = PageGetItemId(origpage, P_HIKEY);
+ item = (IndexTuple) PageGetItem(origpage, itemid);
+ XLogRegisterBufData(0, (char *) item, MAXALIGN(IndexTupleSize(item)));

Is this related to the problem that you mentioned to me that you'd fixed when we spoke in person earlier today? You said something about WAL logging, but I don't recall any details. I don't remember seeing this change in prior versions. Anyway, whatever the reason for doing this on the leaf level now, the comments should be updated to explain it.

This change is related to the bug described in this message:
https://www.postgresql.org/message-id/20170330192706.GA2565%40e733.localdomain
After the fix it is not reproducible. I will update the comments in the next patch.

* Speaking of WAL-logging, I think that there is another bug in btree_xlog_split(). You didn't change this existing code at all:

/*
 * On leaf level, the high key of the left page is equal to the first key
 * on the right page.
 */
if (isleaf)
{
    ItemId hiItemId = PageGetItemId(rpage, P_FIRSTDATAKEY(ropaque));

    left_hikey = PageGetItem(rpage, hiItemId);
    left_hikeysz = ItemIdGetLength(hiItemId);
}

It seems like this was missed when you changed WAL-logging, since you do something for this on the logging side, but not here, on the replay side. No?

I changed it. Now we always use the high key saved in the xlog, so this code isn't needed anymore and can be deleted. Thank you for the notice. I will send an updated patch today.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
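On the Assert vs. "can't happen" check discussed above: an Assert vanishes in production builds, while a defensive check also fires there. A minimal sketch of that check, using a toy stand-in for IndexTupleData rather than the real PostgreSQL headers (the mask and flag values below are illustrative, modeled on t_info keeping the size in its low 13 bits):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for IndexTupleData: t_info keeps the tuple size in the
 * low 13 bits, leaving 3 flag bits; one free bit could mark a truncated
 * tuple, as suggested in the review.  Illustrative only. */
#define TOY_SIZE_MASK  0x1FFF
#define TOY_TRUNC_FLAG 0x2000   /* hypothetical "truncated" flag bit */

typedef struct ToyIndexTuple
{
    uint16_t t_info;
} ToyIndexTuple;

static size_t
toy_tuple_size(const ToyIndexTuple *itup)
{
    return itup->t_info & TOY_SIZE_MASK;
}

/* Defensive check in the spirit of the review: a truncated tuple must
 * never be larger than the original.  Unlike an Assert, this path also
 * exists in production builds; in the backend it would be an
 * elog(ERROR, ...) rather than a return code. */
static int
toy_check_truncated(const ToyIndexTuple *olditup, const ToyIndexTuple *newitup)
{
    if (toy_tuple_size(newitup) > toy_tuple_size(olditup))
    {
        fprintf(stderr, "truncated index tuple larger than original\n");
        return 0;
    }
    return 1;
}
```

The check is cheap enough to keep unconditionally, which is the usual argument for "can't happen" errors over assertions in on-disk code paths.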
Re: [HACKERS] WIP: Covering + unique indexes.
31.03.2017 20:57, Robert Haas:

On Fri, Mar 31, 2017 at 1:40 PM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:

First of all, I want to thank you and Robert for reviewing this patch. Your expertise in postgres subsystems is really necessary for features like this. I just wonder why you didn't share your thoughts and doubts until the "last call".

I haven't done any significant technical work other than review patches in 14 months, and in the last several months I've often worked 10 and 12 hour days to get more review done. I think at one level you've got a fair complaint here - it's hard to get things committed, and this patch probably didn't get as much attention as it deserved. It's not so easy to know how to fix that. I'm pretty sure "tell Andres and Robert to work harder" isn't it.

*off-topic*
No complaints from me; I understand how difficult reviewing is and highly appreciate your work. The problem is that not all developers are qualified enough to do a review. I've tried to make a course about postgres internals, something like "Deep dive into the postgres codebase for hackers", and it turned out to be really helpful for new developers. So I wonder, maybe we could write some tips for new reviewers and testers as well.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
31.03.2017 20:47, Andres Freund:

Maybe it would be better to modify index_form_tuple() to accept a new argument with a number of attributes, then you can just Assert that this number is never higher than the number of attributes in the TupleDesc.

Good point. I agree that this function is a bit strange. I have to set tupdesc->natts to support compatibility with index_form_tuple(). I didn't want to add either a new field to the tupledesc or a new parameter to index_form_tuple(), because they are used widely. But I hadn't considered the possibility of index_form_tuple() failure. Fixed in this version of the patch. Now it creates a copy of the tupledesc to pass to index_form_tuple().

That seems like it'd actually be a noticeable increase in memory allocator overhead. I think we should just add (as just proposed in a separate thread) an _extended version of it that allows specifying the number of columns.

The function is called not that often - only once per page split for indexes with included columns - so it doesn't look like dramatic overhead. That is why I decided that a wrapper function would be more appropriate than refactoring all index_form_tuple() calls. But index_form_tuple_extended() looks like a better solution.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
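The "_extended" approach avoids both the tupledesc copy and churn at existing call sites: the old entry point becomes a one-line wrapper around a variant that takes an explicit attribute count. A sketch of that pattern with toy types (this is not the real index_form_tuple() signature, just the wrapper shape being discussed):

```c
#include <assert.h>

/* Toy stand-in for TupleDesc; illustrative only. */
typedef struct ToyTupleDesc
{
    int natts;
} ToyTupleDesc;

/* New variant: build a "tuple" from only the first natts attributes.
 * Asserting natts <= desc->natts gives the sanity check Andres asks
 * for, without copying the descriptor. */
static int
toy_form_tuple_extended(const ToyTupleDesc *desc, int natts)
{
    assert(natts > 0 && natts <= desc->natts);
    return natts;               /* stands in for the built tuple's width */
}

/* Old entry point stays a thin wrapper, so the many existing callers
 * need no changes. */
static int
toy_form_tuple(const ToyTupleDesc *desc)
{
    return toy_form_tuple_extended(desc, desc->natts);
}
```

The wrapper costs nothing at runtime (it will typically be inlined), which is why it beats both the descriptor copy and a tree-wide refactoring of callers.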
Re: [HACKERS] WIP: Covering + unique indexes.
30.03.2017 19:49, Robert Haas:

On Thu, Mar 30, 2017 at 11:26 AM, Teodor Sigaev <teo...@sigaev.ru> wrote:

I had a look at the patch and played with it; it seems to look fine. I split it into two patches: core changes (+ bloom index fix) and btree itself. All docs are left in the first patch - I'm too lazy to rewrite documentation which is changed in the second patch. Any objection from reviewers to push both patches?

Has this really had enough review and testing? The last time it was pushed, it didn't go too well. And laziness is not a very good excuse for not dividing up patches properly.

Well, I don't know how we can estimate the quality of the review or testing. The patch was reviewed by many people. Here are those who marked themselves as reviewers on this and previous commitfests: Stephen Frost (sfrost), Andrew Dunstan (adunstan), Aleksander Alekseev (a.alekseev), Amit Kapila (amitkapila), Andrey Borodin (x4m), Peter Geoghegan (pgeoghegan), David Rowley (davidrowley). For me that looks serious enough. These people, as well as many others, shared their thoughts on this topic and pointed out various mistakes. I fixed all the issues as soon as I could, and I'm not going to disappear when it is committed. Personally, I always thought that we have Alpha and Beta releases for integration testing.

Speaking of the feature itself, it has been included in our fork of PostgreSQL 9.6 since it was released, and as far as I know there were no complaints from users. That makes me believe that there are no critical bugs there, though there may be conflicts with some other features of v10.0.

It seems highly surprising to me that CheckIndexCompatible() only gets a one line change in this patch. That seems unlikely to be correct.

What makes you think so? CheckIndexCompatible() only cares about possible opclass changes. For covering indexes, opclasses are only applicable to indnkeyatts, and that is exactly what was changed in this patch. Do you think it needs some other changes?
Has anybody done some testing of this patch with the WAL consistency checker? Like, create some tables with indexes that have INCLUDE columns, set up a standby, enable consistency checking, pound the master, and see if the standby bails?

Good point. I missed this feature; I wish someone had mentioned this issue a bit earlier. And as Alexander's test shows, there is indeed a problem with my patch. I'll fix it and send an updated patch.

Has anybody tested this patch with amcheck? Does it break amcheck?

Yes, it breaks amcheck. Amcheck should be patched in order to work with covering indexes. We've discussed it with Peter before and I even wrote a small patch. I'll attach it in the following message.

A few minor comments:

-foreach(lc, constraint->keys)
+else foreach(lc, constraint->keys)

That doesn't look like a reasonable way of formatting the code.

+/* Here is some code duplication. But we do need it. */

That is not a very informative comment.

+* NOTE It is not crutial for reliability in present,

Spelling, punctuation.

Will be fixed as well.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
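The standby test Robert sketches relies on the wal_consistency_checking parameter added in PostgreSQL 10, which makes replay re-verify full-page-consistent WAL records on the standby. A minimal primary-side configuration for exercising btree records could look like this (a sketch; the values are illustrative):

```ini
# postgresql.conf on the primary (sketch)
wal_level = replica                  # required for running a standby anyway
wal_consistency_checking = 'btree'   # re-check btree WAL records on replay
```

With that set, pounding the master with INCLUDE-column index workloads should make an inconsistent replay image fail loudly on the standby instead of passing silently.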
Re: [HACKERS] pg_serial early wraparound
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Hi, I've tried to review this patch, but it seems that I miss something essential. You claim that SLRUs now support five digit segment names, while in slru.h at current master I see the following:

 * Note: slru.c currently assumes that segment file names will be four hex
 * digits.  This sets a lower bound on the segment size (64K transactions
 * for 32-bit TransactionIds).
 */
#define SLRU_PAGES_PER_SEGMENT	32

/* Maximum length of an SLRU name */
#define SLRU_MAX_NAME_LENGTH	32

Could you please clarify the idea of the patch? Is it still relevant?

I've also run your test script. pg_clog was renamed to pg_xact, so it needs to be changed accordingly:

echo "Contents of pg_clog:"
ls $PGDATA/pg_xact/

The test shows a failed assertion:

== setting next xid to 1073741824 =
Transaction log reset
waiting for server to start
2017-03-24 17:05:19.897 MSK [1181] LOG: listening on IPv4 address "127.0.0.1", port 5432
2017-03-24 17:05:19.981 MSK [1181] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-03-24 17:05:20.081 MSK [1183] LOG: database system was shut down at 2017-03-24 17:05:19 MSK
2017-03-24 17:05:20.221 MSK [1181] LOG: database system is ready to accept connections
done
server started
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template0"
vacuumdb: vacuuming database "template1"
TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(oldestXact, ShmemVariableCache->oldestXid))", File: "clog.c", Line: 669)
vacuumdb: vacuuming of database "template1" failed: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
2017-03-24 17:05:21.541 MSK [1181] LOG: server process (PID 1202) was terminated by signal 6: Aborted
2017-03-24 17:05:21.541 MSK [1181] DETAIL: Failed process was running: VACUUM (FREEZE);

The new status of this patch is: Waiting on Author
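The four-hex-digit limit quoted from slru.h is just arithmetic: with the default 8 kB block size, one clog page covers 32768 transactions (4 per byte), a 32-page segment covers 1048576, so every segment number for a 32-bit xid fits in four hex digits. A sketch of that segment-name arithmetic (constants assume the default BLCKSZ of 8192; this is a model of the naming scheme, not the slru.c code itself):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One clog page covers BLCKSZ * 4 transactions (2 status bits each);
 * one segment covers 32 such pages. */
#define CLOG_XACTS_PER_PAGE    (8192 * 4)
#define SLRU_PAGES_PER_SEGMENT 32

/* Derive the four-hex-digit clog segment file name for a given xid. */
static void
clog_segment_name(uint32_t xid, char *buf, size_t buflen)
{
    uint32_t pageno = xid / CLOG_XACTS_PER_PAGE;
    uint32_t segno = pageno / SLRU_PAGES_PER_SEGMENT;

    /* four hex digits: only segment numbers up to 0xFFFF fit,
     * hence the "lower bound on the segment size" in the comment */
    snprintf(buf, buflen, "%04X", (unsigned) segno);
}
```

For the xid 1073741824 that the test script sets, this yields segment "0400", and even the maximum 32-bit xid lands in segment "0FFF", well inside the four-digit namespace.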
[HACKERS] Re: Declarative partitioning optimization for large amount of partitions
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

The patch looks good to me. It applies cleanly, passes all the tests and eliminates the bottleneck described in the first message. And as I can see from the thread, there were no objections from others, except a few minor comments about code style, which are fixed in the last version of the patch. So I mark it "Ready for committer".

The new status of this patch is: Ready for Committer
Re: [HACKERS] WIP: Covering + unique indexes.
Patch rebased to the current master is in attachments. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit 497d52b713dd8f926b465ddf22f21db7229b12e3 Author: Anastasia <a.lubennik...@postgrespro.ru> Date: Tue Mar 21 12:58:13 2017 +0300 include_columns_10.0_v4.patch diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index c1e9089..5c80e3b 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); static void deleteConnection(const char *name); -static char **get_pkey_attnames(Relation rel, int16 *numatts); +static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts); static char **get_text_array_contents(ArrayType *array, int *numitems); static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals); static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals); @@ -1480,7 +1480,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey); Datum dblink_get_pkey(PG_FUNCTION_ARGS) { - int16 numatts; + int16 indnkeyatts; char **results; FuncCallContext *funcctx; int32 call_cntr; @@ -1506,7 +1506,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS) rel = get_rel_from_relname(PG_GETARG_TEXT_PP(0), AccessShareLock, ACL_SELECT); /* get the array of attnums */ - results = get_pkey_attnames(rel, ); + results = get_pkey_attnames(rel, ); relation_close(rel, AccessShareLock); @@ -1526,9 +1526,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS) attinmeta = TupleDescGetAttInMetadata(tupdesc); funcctx->attinmeta = attinmeta; - if ((results != NULL) && (numatts > 0)) + if ((results != NULL) && (indnkeyatts > 0)) { - funcctx->max_calls = numatts; + funcctx->max_calls = indnkeyatts; /* got results, keep track of them */ funcctx->user_fctx = results; @@ -2016,10 +2016,10 @@ 
dblink_fdw_validator(PG_FUNCTION_ARGS) * get_pkey_attnames * * Get the primary key attnames for the given relation. - * Return NULL, and set numatts = 0, if no primary key exists. + * Return NULL, and set indnkeyatts = 0, if no primary key exists. */ static char ** -get_pkey_attnames(Relation rel, int16 *numatts) +get_pkey_attnames(Relation rel, int16 *indnkeyatts) { Relation indexRelation; ScanKeyData skey; @@ -2029,8 +2029,8 @@ get_pkey_attnames(Relation rel, int16 *numatts) char **result = NULL; TupleDesc tupdesc; - /* initialize numatts to 0 in case no primary key exists */ - *numatts = 0; + /* initialize indnkeyatts to 0 in case no primary key exists */ + *indnkeyatts = 0; tupdesc = rel->rd_att; @@ -2051,12 +2051,12 @@ get_pkey_attnames(Relation rel, int16 *numatts) /* we're only interested if it is the primary key */ if (index->indisprimary) { - *numatts = index->indnatts; - if (*numatts > 0) + *indnkeyatts = index->indnkeyatts; + if (*indnkeyatts > 0) { -result = (char **) palloc(*numatts * sizeof(char *)); +result = (char **) palloc(*indnkeyatts * sizeof(char *)); -for (i = 0; i < *numatts; i++) +for (i = 0; i < *indnkeyatts; i++) result[i] = SPI_fname(tupdesc, index->indkey.values[i]); } break; diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 1241108..943e410 100644 --- a/contrib/tcn/tcn.c +++ b/contrib/tcn/tcn.c @@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS) /* we're only interested if it is the primary key and valid */ if (index->indisprimary && IndexIsValid(index)) { - int numatts = index->indnatts; + int indnkeyatts = index->indnkeyatts; - if (numatts > 0) + if (indnkeyatts > 0) { int i; @@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) appendStringInfoCharMacro(payload, ','); appendStringInfoCharMacro(payload, operation); -for (i = 0; i < numatts; i++) +for (i = 0; i < indnkeyatts; i++) { int colno = index->indkey.values[i]; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 
df0435c..e196e20 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3620,6 +3620,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns (as opposed to "included" columns). + + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index ac51258..f9539e9 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -114,6 +114,8 @@ typedef struct IndexAmRoutine boolamcanparallel; /* type of data stored in index, or InvalidOi
Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

As I can see, this bugfix was already discussed and reviewed. All mentioned issues are fixed in the latest version of the patch, so I mark it "Ready for committer". Thank you for fixing it!

The new status of this patch is: Ready for Committer
Re: [HACKERS] btree_gin and btree_gist for enums
28.02.2017 00:22, Andrew Dunstan:

OK, here's the whole series of patches.
Patch 1 adds the CallerFInfoFunctionCall{1,2} functions.
Patch 2 adds btree_gist support for their use for non-varlena types
Patch 3 does the same for varlena types (not required for patch 4, but better to be consistent, I think.)
Patch 4 adds enum support to btree_gist
Patch 5 adds enum support to btree_gin

cheers

andrew

Thank you for implementing the feature! These patches have some merge conflicts with a recent commit:

commit c7a9fa399d557c6366222e90b35db31e45d25678
Author: Stephen Frost <sfr...@snowman.net>
Date: Wed Mar 15 11:16:25 2017 -0400

    Add support for EUI-64 MAC addresses as macaddr8

Patch 0002 also needs to be updated to consider macaddr8; I attached the diff. A complete patch (including the 0002_addition fix) rebased to the current master is attached as well. It works as expected, and the code itself looks clear and well documented. The only issue I've found is a make check failure in the contrib/btree_gin subdirectory:

test numeric ... ok
test enum ... /bin/sh: 1: cannot open /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/sql/enum.sql: No such file
diff: /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out: No such file or directory
diff command failed with status 512: diff "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/expected/enum.out" "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out" > "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out.diff"

Please add a regression test for btree_gin, and this patch can be considered "Ready for committer".
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/btree_gist/btree_macaddr8.c b/contrib/btree_gist/btree_macaddr8.c index 13238ef..96afbcd 100644 --- a/contrib/btree_gist/btree_macaddr8.c +++ b/contrib/btree_gist/btree_macaddr8.c @@ -28,37 +28,37 @@ PG_FUNCTION_INFO_V1(gbt_macad8_same); static bool -gbt_macad8gt(const void *a, const void *b) +gbt_macad8gt(const void *a, const void *b, FmgrInfo *flinfo) { return DatumGetBool(DirectFunctionCall2(macaddr8_gt, PointerGetDatum(a), PointerGetDatum(b))); } static bool -gbt_macad8ge(const void *a, const void *b) +gbt_macad8ge(const void *a, const void *b, FmgrInfo *flinfo) { return DatumGetBool(DirectFunctionCall2(macaddr8_ge, PointerGetDatum(a), PointerGetDatum(b))); } static bool -gbt_macad8eq(const void *a, const void *b) +gbt_macad8eq(const void *a, const void *b, FmgrInfo *flinfo) { return DatumGetBool(DirectFunctionCall2(macaddr8_eq, PointerGetDatum(a), PointerGetDatum(b))); } static bool -gbt_macad8le(const void *a, const void *b) +gbt_macad8le(const void *a, const void *b, FmgrInfo *flinfo) { return DatumGetBool(DirectFunctionCall2(macaddr8_le, PointerGetDatum(a), PointerGetDatum(b))); } static bool -gbt_macad8lt(const void *a, const void *b) +gbt_macad8lt(const void *a, const void *b, FmgrInfo *flinfo) { return DatumGetBool(DirectFunctionCall2(macaddr8_lt, PointerGetDatum(a), PointerGetDatum(b))); } static int -gbt_macad8key_cmp(const void *a, const void *b) +gbt_macad8key_cmp(const void *a, const void *b, FmgrInfo *flinfo) { mac8KEY*ia = (mac8KEY *) (((const Nsrt *) a)->t); mac8KEY*ib = (mac8KEY *) (((const Nsrt *) b)->t); @@ -142,7 +142,7 @@ gbt_macad8_consistent(PG_FUNCTION_ARGS) key.upper = (GBT_NUMKEY *) >upper; PG_RETURN_BOOL( - gbt_num_consistent(, (void *) query, , GIST_LEAF(entry), ) + gbt_num_consistent(, (void *) query, , GIST_LEAF(entry), , fcinfo->flinfo) ); } @@ -154,7 +154,7 @@ gbt_macad8_union(PG_FUNCTION_ARGS) void *out = 
palloc0(sizeof(mac8KEY)); *(int *) PG_GETARG_POINTER(1) = sizeof(mac8KEY); - PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, )); + PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, , fcinfo->flinfo)); } @@ -184,7 +184,7 @@ gbt_macad8_picksplit(PG_FUNCTION_ARGS) PG_RETURN_POINTER(gbt_num_picksplit( (GistEntryVector *) PG_GETARG_POINTER(0), (GIST_SPLITVEC *) PG_GETARG_POINTER(1), - + , fcinfo->flinfo )); } @@ -195,6 +195,6 @@ gbt_macad8_same(PG_FUNCTION_ARGS) mac8KEY*b2 = (mac8KEY *) PG_GETARG_POINTER(1); bool *result = (bool *) PG_GETARG_POINTER(2); - *result = gbt_num_same((void *) b1, (void *) b2, ); + *result = gbt_num_same((void *) b1, (void *) b2, , fcinfo->flinfo); PG_RETURN_POINTER(result); } commit 50f4a4efcb9ed3ae5e0378676459c6d1ce455bd2 Author: Anastasia <a.lubennik...@postgrespro.ru> Date: Tue Mar 21 10:11:37 2017 +0300 complete btree_gin_gist_enum patch diff --git a/contrib/btree_gin/Makefile b/contrib/btree_gin/Makefile index f22e
Re: [HACKERS] Backend crash on non-exclusive backup cancel
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

As I see, this bugfix works as expected and the patch is small and clear, so I marked it "Ready for committer". Anyway, it would be great if David could also have a look at the patch. And again, thank you for fixing this issue!

The new status of this patch is: Ready for Committer
Re: [HACKERS] Backend crash on non-exclusive backup cancel
As far as I understand, two bugs of pg_stop_backup() were discussed in this thread. Thanks to the clear descriptions above, I easily reproduced both of them.

BUG#1: Server crashes on assertion on a call of pg_stop_backup(false) after an interrupted call of pg_stop_backup(false).
TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)", File: "xlog.c", Line: 10747)

BUG#2: Failure to start an exclusive backup with the same name, if the previous exclusive backup was stopped in another session.

Both problems seem to be fixed by the patch "backup-session-locks-fixes.patch".

Speaking of the patch itself, I have a question: shouldn't we also update sessionBackupState in pg_stop_backup_callback() along with XLogCtl->Insert.exclusiveBackupState?

And a couple of minor notes:

1)
+ * Routines to starting stop, and get status of a base backup
Probably should be:
+ * Routines to start, stop and get status of a base backup
And also this comment should be moved below the enum.

2)
This is used in parallel of the shared memory status
s/ in parallel of/ in parallel with

The new status of this patch is: Waiting on Author
Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements
13.03.2017 11:53, Artur Zakirov:

On 15.02.2017 20:54, Anastasia Lubennikova wrote:

Done.

I have gotten the error that AlterUserMappingStmt doesn't have if_not_exists (translated from Russian):

gram.y: In function 'base_yyparse':
gram.y:4918:7: error: 'AlterUserMappingStmt {aka struct AlterUserMappingStmt}' has no member named 'if_not_exists'
n->if_not_exists = false;
^~

After applying the CREATE USER patch, in gram.y I have:

AlterUserMappingStmt: ALTER USER MAPPING FOR auth_ident SERVER name alter_generic_options
{
    AlterUserMappingStmt *n = makeNode(AlterUserMappingStmt);
    n->user = $5;
    n->servername = $7;
    n->options = $8;
    n->if_not_exists = false;
    $$ = (Node *) n;
}
| CREATE USER MAPPING IF_P NOT EXISTS FOR auth_ident SERVER name create_generic_options
{
    CreateUserMappingStmt *n = makeNode(CreateUserMappingStmt);
    n->user = $8;
    n->servername = $10;
    n->options = $11;
    n->if_not_exists = true;
    $$ = (Node *) n;
}
;

Here the ALTER USER MAPPING and CREATE USER MAPPING rules were mixed up.

Thanks for catching that. It was caused by a conflict when applying the patch. Updated versions of both patches are attached.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_server.sgml b/doc/src/sgml/ref/create_server.sgml
index 734c6c9..6777679 100644
--- a/doc/src/sgml/ref/create_server.sgml
+++ b/doc/src/sgml/ref/create_server.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
-CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
+CREATE SERVER [IF NOT EXISTS] server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
 FOREIGN DATA WRAPPER fdw_name [ OPTIONS ( option 'value' [, ... ] ) ]
@@ -56,6 +56,19 @@ CREATE SERVER server_name [ TYPE '< Parameters

+IF NOT EXISTS
+
+ Do not throw an error if a server with the same name already exists.
+ A notice is issued in this case. 
Note that there is no guarantee that + the existing server is anything like the one that would have been + created. + + + + server_name diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c index d5d40e6..41b2c01 100644 --- a/src/backend/commands/foreigncmds.c +++ b/src/backend/commands/foreigncmds.c @@ -879,12 +879,25 @@ CreateForeignServer(CreateForeignServerStmt *stmt) /* * Check that there is no other foreign server by this name. + * Do nothing if IF NOT EXISTS was enforced. */ if (GetForeignServerByName(stmt->servername, true) != NULL) - ereport(ERROR, -(errcode(ERRCODE_DUPLICATE_OBJECT), - errmsg("server \"%s\" already exists", - stmt->servername))); + { + if (stmt->if_not_exists) + { + ereport(NOTICE, + (errcode(ERRCODE_DUPLICATE_OBJECT), + errmsg("foreign server \"%s\" already exists, skipping", + stmt->servername))); + heap_close(rel, RowExclusiveLock); + return InvalidObjectAddress; + } + else + ereport(ERROR, + (errcode(ERRCODE_DUPLICATE_OBJECT), + errmsg("foreign server \"%s\" already exists", + stmt->servername))); + } /* * Check that the FDW exists and that we have USAGE on it. 
Also get the diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index e7acc2d..da67b51 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -4621,6 +4621,19 @@ CreateForeignServerStmt: CREATE SERVER name opt_type opt_foreign_server_version n->version = $5; n->fdwname = $9; n->options = $10; + n->if_not_exists = false; + $$ = (Node *) n; +} +| CREATE SERVER IF_P NOT EXISTS name opt_type opt_foreign_server_version + FOREIGN DATA_P WRAPPER name create_generic_options +{ + CreateForeignServerStmt *n = makeNode(CreateForeignServerStmt); + n->servername = $6; + n->servertype = $7; + n->version = $8; + n->fdwname = $12; + n->options = $13; + n->if_not_exists = true; $$ = (Node *) n; } ; diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index a44d217..804436b 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -2151,6 +2151,7 @@ typedef struct CreateForeignServerStmt char *servertype; /* optional server type */ char *version; /* optional server version */ char *fdwname; /* FDW name */ + bool if_not_exists; /* just do nothing if it already exists? */ List *opt
Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements
13.02.2017 19:34, Andrew Dunstan: On 01/13/2017 08:36 AM, Anastasia Lubennikova wrote: I implemented IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements for one of our customers. I think other users can also find it useful for scripting and automated tasks. The patches themselves are small and simple. Documentation is updated as well. This looks good and useful. Please add some regression tests. Done. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/ref/create_server.sgml b/doc/src/sgml/ref/create_server.sgml index 734c6c9..6777679 100644 --- a/doc/src/sgml/ref/create_server.sgml +++ b/doc/src/sgml/ref/create_server.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ] +CREATE SERVER [IF NOT EXISTS] server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ] FOREIGN DATA WRAPPER fdw_name [ OPTIONS ( option 'value' [, ... ] ) ] @@ -56,6 +56,19 @@ CREATE SERVER server_name [ TYPE '< Parameters + + +IF NOT EXISTS + + + Do not throw an error if a server with the same name already exists. + A notice is issued in this case. Note that there is no guarantee that + the existing server is anything like the one that would have been + created. + + + + server_name diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c index 6ff8b69..b4ae5de 100644 --- a/src/backend/commands/foreigncmds.c +++ b/src/backend/commands/foreigncmds.c @@ -883,12 +883,25 @@ CreateForeignServer(CreateForeignServerStmt *stmt) /* * Check that there is no other foreign server by this name. + * Do nothing if IF NOT EXISTS was enforced. 
*/ if (GetForeignServerByName(stmt->servername, true) != NULL) - ereport(ERROR, -(errcode(ERRCODE_DUPLICATE_OBJECT), - errmsg("server \"%s\" already exists", - stmt->servername))); + { + if (stmt->if_not_exists) + { + ereport(NOTICE, + (errcode(ERRCODE_DUPLICATE_OBJECT), + errmsg("foreign server \"%s\" already exists, skipping", + stmt->servername))); + heap_close(rel, RowExclusiveLock); + return InvalidObjectAddress; + } + else + ereport(ERROR, + (errcode(ERRCODE_DUPLICATE_OBJECT), + errmsg("foreign server \"%s\" already exists", + stmt->servername))); + } /* * Check that the FDW exists and that we have USAGE on it. Also get the diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index a4edea0..d007468 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -4655,6 +4655,19 @@ CreateForeignServerStmt: CREATE SERVER name opt_type opt_foreign_server_version n->version = $5; n->fdwname = $9; n->options = $10; + n->if_not_exists = false; + $$ = (Node *) n; +} +| CREATE SERVER IF_P NOT EXISTS name opt_type opt_foreign_server_version + FOREIGN DATA_P WRAPPER name create_generic_options +{ + CreateForeignServerStmt *n = makeNode(CreateForeignServerStmt); + n->servername = $6; + n->servertype = $7; + n->version = $8; + n->fdwname = $12; + n->options = $13; + n->if_not_exists = true; $$ = (Node *) n; } ; diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 07a8436..704bc6b 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -2113,6 +2113,7 @@ typedef struct CreateForeignServerStmt char *servertype; /* optional server type */ char *version; /* optional server version */ char *fdwname; /* FDW name */ + bool if_not_exists; /* just do nothing if it already exists? 
*/ List *options; /* generic options to server */ } CreateForeignServerStmt; diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out index 3a9fb8f..17f9f40 100644 --- a/src/test/regress/expected/foreign_data.out +++ b/src/test/regress/expected/foreign_data.out @@ -283,7 +283,9 @@ ERROR: foreign-data wrapper "foo" does not exist CREATE FOREIGN DATA WRAPPER foo OPTIONS ("test wrapper" 'true'); CREATE SERVER s1 FOREIGN DATA WRAPPER foo; CREATE SERVER s1 FOREIGN DATA WRAPPER foo; -- ERROR -ERROR: server "s1" already exists +ERROR: foreign server "s1" already exists +CREATE SERVER IF NOT EXISTS s1 FOREIGN DATA WRAPPER foo; -- No ERROR, just NOTICE +NOTICE: foreign server "s1" already exists, skipping CREATE SERVER s2 FOREIGN DATA WRAPPER foo OPTIONS (host 'a', dbname 'b'); CREATE SERVER s3 TYPE 'oracle' FOREIGN DATA WRAPPER foo; CREATE SERVER s4 TYPE 'oracle' FOREIGN DATA WRAPPER foo OPTIONS (host 'a', dbname 'b'); diff --git a/src/test/regress/sql
[HACKERS] Cast jsonb to numeric, int, float, bool
Now the simplest way to extract booleans and numbers from json/jsonb is to cast it to text and then cast to the appropriate type: postgres=# select 'true'::jsonb::text::bool; bool -- t postgres=# select '1.0'::jsonb::text::numeric; numeric - 1.0 This patch implements direct casts from jsonb numeric (jbvNumeric) to numeric, int4 and float8, and from jsonb bool (jbvBool) to bool. postgres=# select 'true'::jsonb::bool; bool -- t postgres=# select '1.0'::jsonb::numeric; numeric - 1.0 Waiting for your feedback. If you find it useful, I can also add support of json and other types, such as smallint and bigint. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c index b9bf18f..4bbe81c 100644 --- a/src/backend/utils/adt/jsonb.c +++ b/src/backend/utils/adt/jsonb.c @@ -1939,3 +1939,129 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS) PG_RETURN_POINTER(out); } + +Datum +jsonb_numeric(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. 
+ */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvNumeric) + PG_RETURN_NUMERIC(v.val.numeric); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json numeric"))); +} + +Datum +jsonb_int4(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. + */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvNumeric) + PG_RETURN_INT32(DatumGetInt32(DirectFunctionCall1(numeric_int4, + NumericGetDatum(v.val.numeric)))); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json numeric"))); +} + +Datum +jsonb_float8(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. 
+ */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvNumeric) + PG_RETURN_FLOAT8(DatumGetFloat8(DirectFunctionCall1(numeric_float8, + NumericGetDatum(v.val.numeric)))); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json numeric"))); +} + +Datum +jsonb_bool(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. + */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvBool) + PG_RETURN_BOOL(v.val.boolean); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json boolean"))); +} \ No newline at end of file diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h index 80a40ab..0646f99 100644 --- a/src/include/catalog/pg_cast.h +++ b/src/include/catalog/pg_cast.h @@ -377,5 +377,9 @@ DATA(insert ( 1700 1700 1703 i f )); /* json to/from jsonb */ DATA(insert ( 114 3802 0 a i )); DATA(insert ( 3802 114 0 a i )); +DATA(insert ( 3802 1700 774 e f )); +DATA(insert ( 3802 23 775 e f )); +DATA(insert ( 3802 701 776 e f )); +DATA(insert ( 3802 16 777 e f )); #endif /* PG_CAST_H */ diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index 31c828a..ff7da5b 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -2364,6 +2364,15 @@ DESCR("convert int2 to numeric"); DATA(insert OID = 1783
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
28.12.2016 23:43, Claudio Freire: Attached v4 patches with the requested fixes. Sorry for being late, but the tests took a lot of time. create table t1 as select i, md5(random()::text) from generate_series(0,4) as i; create index md5_idx ON t1(md5); update t1 set md5 = md5((random() * (100 + 500))::text); vacuum; Patched vacuum used 2.9Gb of memory and vacuumed the index in one pass, while the old version took three passes (1GB+1GB+0.9GB). Vacuum duration results: vanilla: LOG: duration: 4359006.327 ms statement: vacuum verbose t1; patched: LOG: duration: 3076827.378 ms statement: vacuum verbose t1; We can see a 30% vacuum speedup. I should note that this case can be considered favorable to vanilla vacuum: the table is not that big, it has just one index, and the disk used is a fast FusionIO drive. We can expect even more gain on slower disks. Thank you again for the patch. Hope to see it in 10.0. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] GSoC 2017
I'm ready to be a mentor. 10.01.2017 12:53, Alexander Korotkov: Hi all! In 2016 the PostgreSQL project didn't get into the GSoC program. In my understanding, the reasons for that are the following. 1. We did a last-minute submission of our application to GSoC. 2. In 2016 the GSoC application form for mentoring organizations was changed. In particular, it required more detailed information about possible projects. As a result, we didn't manage to make a good enough application that time. Thus, our application was declined. See [1] and [2] for details. I think the right way to manage this in 2017 would be to start collecting the required information in advance. According to the GSoC 2017 timeline [3], mentoring organizations can submit their applications from January 19 to February 9. Thus, now is a good time to start collecting project ideas and make a call for mentors. Also, we need to decide who would be our admin this year. In sum, we have the following questions: 1. What project ideas do we have? 2. Who is going to be a mentor this year? 3. Who is going to be the project admin this year? BTW, I'm ready to be a mentor this year. I'm also open to being an admin if needed. [1] https://www.postgresql.org/message-id/flat/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww%40mail.gmail.com [3] https://developers.google.com/open-source/gsoc/timeline -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Parallel Index Scans
27.12.2016 17:33, Amit Kapila: On Fri, Dec 23, 2016 at 6:42 PM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: 22.12.2016 07:19, Amit Kapila: On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova <lubennikov...@gmail.com> wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed Hi, thank you for the patch. Results are very promising. Do you see any drawbacks of this feature or something that requires more testing? I think you can focus on the handling of array scan keys for testing. In general, one of my colleagues has shown interest in testing this patch and I think he has tested it as well but never posted his findings. I will request him to share his findings and what kind of tests he has done, if any. Please check the code related to buffer locking and pinning once again. I got a warning. Here are the steps to reproduce it (the config is default except "autovacuum = off"): pgbench -i -s 100 test pgbench -c 10 -T 120 test SELECT count(aid) FROM pgbench_accounts WHERE aid > 1000 AND aid < 90 AND bid > 800 AND bid < 900; WARNING: buffer refcount leak: [8297] (rel=base/12289/16459, blockNum=2469, flags=0x9380, refcount=1 1) count A similar problem occurred while testing the "parallel index only scan" patch, and Rafia has included the fix in her patch [1]; it ideally should be included in this patch, so I have copied the fix from her patch. Apart from that, I observed that a similar problem can happen for backward scans, so I fixed that as well. I confirm that this problem is solved. But I'm trying to find the worst cases for this feature. And I suppose we should test parallel index scans with concurrent insertions. The more parallel readers we have, the higher the concurrency. 
I doubt that it can significantly decrease performance, because the number of parallel readers is not that big. I am not sure if such a test is meaningful for this patch, because parallelism is generally used for large data reads, and in such cases there are generally not many concurrent writes. I didn't find any case of noticeable performance degradation, so I set the status to "Ready for committer". Thank you for this patch. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements
I implemented IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements for one of our customers. I think other users can also find it useful for scripting and automated tasks. The patches themselves are small and simple. Documentation is updated as well. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit cf20fa8c9f29310d8f67e5b198a9eb908d903431 Author: Anastasia <a.lubennik...@postgrespro.ru> Date: Fri Jan 13 16:22:01 2017 +0300 Add [IF NOT EXISTS] option to CREATE SERVER statement diff --git a/doc/src/sgml/ref/create_server.sgml b/doc/src/sgml/ref/create_server.sgml index 734c6c9..6777679 100644 --- a/doc/src/sgml/ref/create_server.sgml +++ b/doc/src/sgml/ref/create_server.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ] +CREATE SERVER [IF NOT EXISTS] server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ] FOREIGN DATA WRAPPER fdw_name [ OPTIONS ( option 'value' [, ... ] ) ] @@ -56,6 +56,19 @@ CREATE SERVER server_name [ TYPE '< Parameters + + +IF NOT EXISTS + + + Do not throw an error if a server with the same name already exists. + A notice is issued in this case. Note that there is no guarantee that + the existing server is anything like the one that would have been + created. + + + + server_name diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c index 06b4bc3..0b0114c 100644 --- a/src/backend/commands/foreigncmds.c +++ b/src/backend/commands/foreigncmds.c @@ -883,12 +883,25 @@ CreateForeignServer(CreateForeignServerStmt *stmt) /* * Check that there is no other foreign server by this name. + * Do nothing if IF NOT EXISTS was enforced. 
*/ if (GetForeignServerByName(stmt->servername, true) != NULL) - ereport(ERROR, -(errcode(ERRCODE_DUPLICATE_OBJECT), - errmsg("server \"%s\" already exists", - stmt->servername))); + { + if (stmt->if_not_exists) + { + ereport(NOTICE, + (errcode(ERRCODE_DUPLICATE_OBJECT), + errmsg("foreign server \"%s\" already exists, skipping", + stmt->servername))); + heap_close(rel, RowExclusiveLock); + return InvalidObjectAddress; + } + else + ereport(ERROR, + (errcode(ERRCODE_DUPLICATE_OBJECT), + errmsg("foreign server \"%s\" already exists", + stmt->servername))); + } /* * Check that the FDW exists and that we have USAGE on it. Also get the diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 9eef550..ab4b3b1 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -4621,6 +4621,19 @@ CreateForeignServerStmt: CREATE SERVER name opt_type opt_foreign_server_version n->version = $5; n->fdwname = $9; n->options = $10; + n->if_not_exists = false; + $$ = (Node *) n; +} +| CREATE SERVER IF_P NOT EXISTS name opt_type opt_foreign_server_version + FOREIGN DATA_P WRAPPER name create_generic_options +{ + CreateForeignServerStmt *n = makeNode(CreateForeignServerStmt); + n->servername = $6; + n->servertype = $7; + n->version = $8; + n->fdwname = $12; + n->options = $13; + n->if_not_exists = true; $$ = (Node *) n; } ; diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 7ceaa22..8a79ec0 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -2075,6 +2075,7 @@ typedef struct CreateForeignServerStmt char *servertype; /* optional server type */ char *version; /* optional server version */ char *fdwname; /* FDW name */ + bool if_not_exists; /* just do nothing if it already exists? 
*/ List *options; /* generic options to server */ } CreateForeignServerStmt; commit 4237a23d6674da774b0772ffc953e5f499c04803 Author: Anastasia <a.lubennik...@postgrespro.ru> Date: Fri Jan 13 16:23:53 2017 +0300 Add [IF NOT EXISTS] option to CREATE USER MAPPING statement diff --git a/doc/src/sgml/ref/create_user_mapping.sgml b/doc/src/sgml/ref/create_user_mapping.sgml index 44fe302..680906b 100644 --- a/doc/src/sgml/ref/create_user_mapping.sgml +++ b/doc/src/sgml/ref/create_user_mapping.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -CREATE USER MAPPING FOR { user_name | USER | CURRENT_USER | PUBLIC } +CREATE USER MAPPING [IF NOT EXISTS] FOR { user_name | USER | CURRENT_USER | PUBLIC } SERVER server_name [ OPTIONS ( option 'value' [ , ... ] ) ] @@ -49,6 +49,18 @@ CREATE USER MAPPING FOR { user_name Parameters + +IF NOT EXISTS + + + Do not throw an error if a user mapping with the same name already exists. + A notice is issued in this case. Note that
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
27.12.2016 20:14, Claudio Freire: On Tue, Dec 27, 2016 at 10:41 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, vacrelstats=0x1ef6e00) at vacuumlazy.c:1417 1417 tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]); (gdb) bt #0 0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, vacrelstats=0x1ef6e00) at vacuumlazy.c:1417 #1 0x00693dfe in lazy_scan_heap (onerel=0x1ec2360, options=9, vacrelstats=0x1ef6e00, Irel=0x1ef7168, nindexes=2, aggressive=1 '\001') at vacuumlazy.c:1337 #2 0x00691e66 in lazy_vacuum_rel (onerel=0x1ec2360, options=9, params=0x7ffe0f866310, bstrategy=0x1f1c4a8) at vacuumlazy.c:290 #3 0x0069191f in vacuum_rel (relid=1247, relation=0x0, options=9, params=0x7ffe0f866310) at vacuum.c:1418 Those line numbers don't match my code. Which commit are you based on? My tree is (currently) based on 71f996d22125eb6cfdbee6094f44370aa8dec610 Hm, my branch is based on 71f996d22125eb6cfdbee6094f44370aa8dec610 as well. I merely applied patches 0001-Vacuum-prefetch-buffers-on-backward-scan-v3.patch and 0002-Vacuum-allow-using-more-than-1GB-work-mem-v3.patch, then ran configure and make as usual. Am I doing something wrong? Anyway, I found the problem that had caused the segfault. for (segindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; tupindex = 0, segindex++) { DeadTuplesSegment *seg = &(vacrelstats->dead_tuples.dead_tuples[segindex]); int num_dead_tuples = seg->num_dead_tuples; while (tupindex < num_dead_tuples) ... You rely on the value of tupindex here, while during the very first pass the 'tupindex' variable may contain any garbage. 
And it happened that on my system there was a negative value, as I found by inspecting the core dump: (gdb) info locals num_dead_tuples = 5 tottuples = 0 tupindex = -1819017215 Which leads to a failure in the next line: tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]); The solution is to move this assignment inside the cycle. I've read the second patch. 1. What is the reason to inline vac_cmp_itemptr()? 2. +#define LAZY_MIN_TUPLES Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData)) What does 128<<20 mean? Why not 1<<27? I'd ask you to replace it with a named constant, or at least add a comment. I'll share my results of performance testing in a few days. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
27.12.2016 16:54, Alvaro Herrera: Anastasia Lubennikova wrote: I ran configure using the following set of flags: ./configure --enable-tap-tests --enable-cassert --enable-debug --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer" And then ran make check. Here is the stacktrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, vacrelstats=0x1ef6e00) at vacuumlazy.c:1417 1417 tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]); This doesn't make sense, since the patch removes the "tupindex" variable in that function. Sorry, I didn't get what you mean. In 0002-Vacuum-allow-using-more-than-1GB-work-mem-v3.patch -tupindex = 0; -while (tupindex < vacrelstats->num_dead_tuples) +segindex = 0; +tottuples = 0; +for (segindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; tupindex = 0, segindex++) { the variable is still there, it was just moved into the loop. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
23.12.2016 22:54, Claudio Freire: On Fri, Dec 23, 2016 at 1:39 PM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: I found the reason. I configured postgres with CFLAGS="-O0" and it causes a segfault at initdb. It works fine and passes tests with default configure flags, but I'm pretty sure that we should fix the segfault before testing the feature. If you need it, I'll send a core dump. I just ran it with CFLAGS="-O0" and it passes all checks too: CFLAGS='-O0' ./configure --enable-debug --enable-cassert make clean && make -j8 && make check-world A stacktrace and a thorough description of your build environment would be helpful to understand why it breaks on your system. I ran configure using the following set of flags: ./configure --enable-tap-tests --enable-cassert --enable-debug --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer" And then ran make check. Here is the stacktrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, vacrelstats=0x1ef6e00) at vacuumlazy.c:1417 1417 tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]); (gdb) bt #0 0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, vacrelstats=0x1ef6e00) at vacuumlazy.c:1417 #1 0x00693dfe in lazy_scan_heap (onerel=0x1ec2360, options=9, vacrelstats=0x1ef6e00, Irel=0x1ef7168, nindexes=2, aggressive=1 '\001') at vacuumlazy.c:1337 #2 0x00691e66 in lazy_vacuum_rel (onerel=0x1ec2360, options=9, params=0x7ffe0f866310, bstrategy=0x1f1c4a8) at vacuumlazy.c:290 #3 0x0069191f in vacuum_rel (relid=1247, relation=0x0, options=9, params=0x7ffe0f866310) at vacuum.c:1418 #4 0x00690122 in vacuum (options=9, relation=0x0, relid=0, params=0x7ffe0f866310, va_cols=0x0, bstrategy=0x1f1c4a8, isTopLevel=1 '\001') at vacuum.c:320 #5 0x0068fd0b in vacuum (options=-1652367447, relation=0x0, relid=3324614038, params=0x1f11bf0, va_cols=0xb59f63, bstrategy=0x1f1c620, isTopLevel=0 '\000') at vacuum.c:150 #6 0x00852993 in standard_ProcessUtility 
(parsetree=0x1f07e60, queryString=0x1f07468 "VACUUM FREEZE;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0xea5cc0 , completionTag=0x7ffe0f866750 "") at utility.c:669 #7 0x008520da in standard_ProcessUtility (parsetree=0x401ef6cd8, queryString=0x18 <error: Cannot access memory at address 0x18>, context=PROCESS_UTILITY_TOPLEVEL, params=0x68, dest=0x9e5d62 <AllocSetFree+60>, completionTag=0x7ffe0f8663f0 "`~\360\001") at utility.c:360 #8 0x00851161 in PortalRunMulti (portal=0x7ffe0f866750, isTopLevel=0 '\000', setHoldSnapshot=-39 '\331', dest=0x851161 <PortalRunMulti+19>, altdest=0x7ffe0f8664f0, completionTag=0x1f07e60 "\341\002") at pquery.c:1219 #9 0x00851374 in PortalRunMulti (portal=0x1f0a488, isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0xea5cc0 , altdest=0xea5cc0 , completionTag=0x7ffe0f866750 "") at pquery.c:1345 #10 0x00850889 in PortalRun (portal=0x1f0a488, count=9223372036854775807, isTopLevel=1 '\001', dest=0xea5cc0 , altdest=0xea5cc0 , completionTag=0x7ffe0f866750 "") at pquery.c:824 #11 0x0084a4dc in exec_simple_query (query_string=0x1f07468 "VACUUM FREEZE;\n") at postgres.c:1113 #12 0x0084e960 in PostgresMain (argc=10, argv=0x1e60a50, dbname=0x1e823b0 "template1", username=0x1e672a0 "anastasia") at postgres.c:4091 #13 0x006f967e in init_locale (categoryname=0x100 , category=32766, locale=0xa004692f0 <error: Cannot access memory at address 0xa004692f0>) at main.c:310 #14 0x7f1e5f463830 in __libc_start_main (main=0x6f93e1 <main+85>, argc=10, argv=0x7ffe0f866a78, init=, fini=, rtld_fini=, stack_end=0x7ffe0f866a68) at ../csu/libc-start.c:291 #15 0x00469319 in _start () The core file is quite big, so I didn't attach it to the mail. You can download it here: core dump file <https://drive.google.com/open?id=0B-7gUWL5Lg_gX3VlSXBaZzlKTlk>. Here are some notes about the first patch: 1. prefetchBlkno = blkno & ~0x1f; prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0; I don't get why we need these tricks. How does it differ from: prefetchBlkno = (blkno > 32) ? blkno - 32 : 0; 2. 
Why do we decrease prefetchBlkno twice? Here: +prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0; And here: if (prefetchBlkno >= 32) +prefetchBlkno -= 32; I'll inspect the second patch in a few days and write questions about it. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
22.12.2016 21:18, Claudio Freire: On Thu, Dec 22, 2016 at 12:22 PM, Claudio Freire <klaussfre...@gmail.com> wrote: On Thu, Dec 22, 2016 at 12:15 PM, Anastasia Lubennikova <lubennikov...@gmail.com> wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: not tested Spec compliant: not tested Documentation: not tested Hi, I haven't read through the thread yet, just tried to apply the patch and run tests. And it seems that the last attached version is outdated now. It doesn't apply to master, and after I tried to fix the merge conflict, it segfaults at initdb. I'll rebase when I get some time to do it and post an updated version Attached rebased patches. I called them both v3 to be consistent. I'm not sure how you ran it, but this works fine for me: ./configure --enable-debug --enable-cassert make clean make check ... after a while ... === All 168 tests passed. === I reckon the above is equivalent to installcheck, but doesn't require sudo or actually installing the server, so installcheck should work assuming the install went ok. I found the reason. I configured postgres with CFLAGS="-O0" and it causes a segfault at initdb. It works fine and passes tests with default configure flags, but I'm pretty sure that we should fix the segfault before testing the feature. If you need it, I'll send a core dump. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Index Scans
22.12.2016 07:19, Amit Kapila: On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova <lubennikov...@gmail.com> wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed Hi, thank you for the patch. Results are very promising. Do you see any drawbacks of this feature or something that requires more testing? I think you can focus on the handling of array scan keys for testing. In general, one of my colleagues has shown interest in testing this patch and I think he has tested as well but never posted his findings. I will request him to share his findings and what kind of tests he has done, if any. Check please code related to buffer locking and pinning once again. I got the warning. Here are the steps to reproduce it: Except "autovacuum = off" config is default. pgbench -i -s 100 test pgbench -c 10 -T 120 test SELECT count(aid) FROM pgbench_accounts WHERE aid > 1000 AND aid < 90 AND bid > 800 AND bid < 900; WARNING: buffer refcount leak: [8297] (rel=base/12289/16459, blockNum=2469, flags=0x9380, refcount=1 1) count --- 0 (1 row) postgres=# select 16459::regclass; regclass --- pgbench_accounts_pkey 2. Don't you mind to rename 'amestimateparallelscan' to let's say 'amparallelscan_spacerequired' or something like this? Sure, I am open to other names, but IMHO, lets keep "estimate" in the name to keep it consistent with other parallel stuff. Refer execParallel.c to see how widely this word is used. As far as I understand there is nothing to estimate, we know this size for sure. I guess that you've chosen this name because of 'heap_parallelscan_estimate'. But now it looks similar to 'amestimate' which refers to indexscan cost for optimizer. That leads to the next question. Do you mean 'amcostestimate'? If you want we can rename it amparallelscanestimate to be consistent with amcostestimate. 
I think that 'amparallelscanestimate' seems less ambiguous than amestimateparallelscan. But it's up to you. There are enough comments to understand the purpose of this field. And why do you call it pageStatus? What does it have to do with page? During scan this tells us whether next page is available for scan. Another option could be to name it as scanStatus, but not sure if that is better. Do you think if we add a comment like "indicates whether next page is available for scan" for this variable then it will be clear? Yes, I think it describes the flag better. 5. Comment for _bt_parallel_seize() says: "False indicates that we have reached the end of scan for current scankeys and for that we return block number as P_NONE." What is the reason to check (blkno == P_NONE) after checking (status == false) in _bt_first() (see code below)? If comment is correct we'll never reach _bt_parallel_done() + blkno = _bt_parallel_seize(scan, ); + if (status == false) + { + BTScanPosInvalidate(so->currPos); + return false; + } + else if (blkno == P_NONE) + { + _bt_parallel_done(scan); + BTScanPosInvalidate(so->currPos); + return false; + } The first time master backend or worker hits last page (calls this API), it will return P_NONE and after that any worker tries to fetch next page, it will return status as false. I think we can expand a comment to explain it clearly. Let me know, if you need more clarification, I can explain it in detail. Got it, I think you can add this explanation to the comment for _bt_parallel_seize(). 6. 
To avoid code duplication, I would wrap this into the function + /* initialize moreLeft/moreRight appropriately for scan direction */ + if (ScanDirectionIsForward(dir)) + { + so->currPos.moreLeft = false; + so->currPos.moreRight = true; + } + else + { + so->currPos.moreLeft = true; + so->currPos.moreRight = false; + } + so->numKilled = 0; /* just paranoia */ + so->markItemIndex = -1; /* ditto */ Okay, I think we can write a separate function (probably inline function) for above. And after that we can also get rid of _bt_parallel_readpage() which only bring another level of indirection to the code. See, this function is responsible for multiple actions like initializing moreLeft/moreRight positions, reading next page, dropping the lock/pin. So replicating all these actions in the caller will make the code in caller less readable as compared to now. Consider this point and let me know your view
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: not tested Spec compliant: not tested Documentation:not tested Hi, I haven't read through the thread yet, just tried to apply the patch and run tests. And it seems that the last attached version is outdated now. It doesn't apply to the master and after I've tried to fix merge conflict, it segfaults at initdb. So, looking forward to a new version for review. The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Index Scans
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed

Hi, thank you for the patch. Results are very promising. Do you see any drawbacks of this feature or something that requires more testing? I'm willing to do a review. I haven't done benchmarks yet, but I've read the patch and here are some notes and questions about it. I saw the discussion about parameters in the thread above, and I agree that we'd better concentrate on the patch itself and add them later if necessary.

1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of the xs_temp_snap flag?

+ if (scan->xs_temp_snap)
+     UnregisterSnapshot(scan->xs_snapshot);

I must say that I'm quite new to all this parallel stuff. If you give me a link where I can read about snapshots for parallel workers, my review will be more helpful. Anyway, it would be great to have more comments about it in the code.

2. Would you mind renaming 'amestimateparallelscan' to, let's say, 'amparallelscan_spacerequired' or something like that? As far as I understand there is nothing to estimate: we know this size for sure. I guess you chose this name because of 'heap_parallelscan_estimate', but now it looks similar to 'amestimate', which refers to index scan cost for the optimizer. That leads to the next question.

3. Are there any changes in cost estimation? I didn't find related changes in the patch. A parallel scan is expected to be faster, and the optimizer definitely should know that.

4.

+ uint8 ps_pageStatus; /* state of scan, see below */

There is no description below. I'd make the comment more helpful: /* state of scan. See possible flag values in nbtree.h */ And why do you call it pageStatus? What does it have to do with a page?

5.
The comment for _bt_parallel_seize() says: "False indicates that we have reached the end of scan for current scankeys and for that we return block number as P_NONE." What is the reason to check (blkno == P_NONE) after checking (status == false) in _bt_first() (see code below)? If the comment is correct we'll never reach _bt_parallel_done():

+ blkno = _bt_parallel_seize(scan, &status);
+ if (status == false)
+ {
+     BTScanPosInvalidate(so->currPos);
+     return false;
+ }
+ else if (blkno == P_NONE)
+ {
+     _bt_parallel_done(scan);
+     BTScanPosInvalidate(so->currPos);
+     return false;
+ }

6. To avoid code duplication, I would wrap this into a function:

+ /* initialize moreLeft/moreRight appropriately for scan direction */
+ if (ScanDirectionIsForward(dir))
+ {
+     so->currPos.moreLeft = false;
+     so->currPos.moreRight = true;
+ }
+ else
+ {
+     so->currPos.moreLeft = true;
+     so->currPos.moreRight = false;
+ }
+ so->numKilled = 0;      /* just paranoia */
+ so->markItemIndex = -1; /* ditto */

And after that we can also get rid of _bt_parallel_readpage(), which only brings another level of indirection to the code.

7. Just a couple of typos I've noticed:

* Below flags are used indicate the state of parallel scan.
* Below flags are used TO indicate the state of parallel scan.

* On success, release lock and pin on buffer on success.
* On success, release lock and pin on buffer.

8. I didn't find a description of the feature in the documentation. Probably we need to add a paragraph to the "Parallel Query" chapter. I will send another review of performance by the end of the week.

The new status of this patch is: Waiting on Author
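The two-step check questioned in point 5 can be made concrete with a toy model of the page hand-off (all names and the simplified state are assumptions for illustration; the real code coordinates workers through shared memory): the first caller past the end gets status=true with blkno=P_NONE and is the one responsible for finishing the scan, while every later caller gets status=false because the scan is already over. Both branches are therefore reachable.

```c
#include <assert.h>
#include <stdbool.h>

#define P_NONE 0 /* "no page" sentinel, as in nbtree */

/* Toy shared state for one parallel B-tree scan. */
typedef struct ToyParallelScan
{
    unsigned next_page; /* next page to hand out; P_NONE when exhausted */
    bool done;          /* set once some worker has observed the end */
} ToyParallelScan;

/*
 * Hand the next page to a worker.  Mimics the two distinct outcomes the
 * review asks about: the FIRST worker to run past the end gets
 * status=true with blkno=P_NONE (and marks the scan done), while every
 * LATER caller gets status=false, meaning the scan is already finished.
 */
static unsigned
toy_parallel_seize(ToyParallelScan *scan, bool *status)
{
    if (scan->done)
    {
        *status = false; /* scan already over: caller just gives up */
        return P_NONE;
    }
    *status = true;
    if (scan->next_page == P_NONE)
    {
        scan->done = true; /* first worker past the end reports it */
        return P_NONE;
    }
    return scan->next_page--; /* hand out pages 3, 2, 1, ... */
}
```

Under this model the caller must check status first ("someone else already finished the scan") and only then blkno ("I am the one who just hit the end, so I must call the done routine"), which matches the order of checks in the quoted _bt_first() hunk.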
[HACKERS] Question about pg_control usage
Hi, hackers! I am examining various global variables in ShmemVariableCache, pg_control and pg_controldata. To force and debug xid wraparound, I've implemented a function that allows setting the nextXid value in the control file manually, but it doesn't work as expected. After an investigation I found that, while loading global variables in StartupXLOG(), we read the checkPoint location from ControlFile and then read all the information from the checkpoint record itself; we never use the nextXid stored in ControlFile. What is the purpose of having a CheckPoint stored in the control file then? I thought that it allows us to start up the server after a correct shutdown without any xlog, as said in the comment in pg_control.h:

 * Body of CheckPoint XLOG records. This is declared here because we keep
 * a copy of the latest one in pg_control for possible disaster recovery.
 * Changing this struct requires a PG_CONTROL_VERSION bump.

But as far as I can see, we never use this data. Could you please explain why we don't use the information from the control file in StartupXLOG()? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
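The behavior described above can be illustrated with a toy model (the structs and function here are simplifications invented for this sketch, not the real ControlFileData/StartupXLOG code): startup follows the checkpoint *location* from the control file into the WAL and takes nextXid from the checkpoint record it finds there, so patching the copy kept inside the control file has no visible effect.

```c
#include <assert.h>

typedef unsigned TransactionId;

/* Toy version of the CheckPoint record body. */
typedef struct ToyCheckPoint
{
    TransactionId nextXid;
} ToyCheckPoint;

/* Toy pg_control: a pointer to the latest checkpoint plus a copy of it. */
typedef struct ToyControlFile
{
    int checkPoint;               /* "location" = index into the WAL array */
    ToyCheckPoint checkPointCopy; /* copy kept for possible disaster recovery */
} ToyControlFile;

/*
 * Toy startup: follow control->checkPoint into the WAL and take nextXid
 * from the checkpoint RECORD there.  The copy inside the control file is
 * ignored, which is why editing it manually changes nothing at startup.
 */
static TransactionId
toy_startup_next_xid(const ToyControlFile *control, const ToyCheckPoint *wal)
{
    return wal[control->checkPoint].nextXid;
}
```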
Re: [HACKERS] FSM corruption leading to errors
[3] https://www.postgresql.org/message-id/AMSPR06MB504CD8FE8AA30D4B7C958AAE39E0%40AMSPR06MB504.eurprd06.prod.outlook.com -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services Could you please add the patches to commitfest? I'm going to test them and write a review in a few days. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
03.10.2016 15:29, Anastasia Lubennikova: 03.10.2016 05:22, Michael Paquier: On Tue, Sep 27, 2016 at 12:17 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: Ok, I'll write it in a few days. Marked as returned with feedback per last emails exchanged. The only complaint about this patch was a lack of README, which is fixed now (see the attachment). So, I added it to new CF, marked as ready for committer. One more fix for pg_upgrade. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index d4f9090..99735ce 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); static void deleteConnection(const char *name); -static char **get_pkey_attnames(Relation rel, int16 *numatts); +static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts); static char **get_text_array_contents(ArrayType *array, int *numitems); static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals); static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals); @@ -1483,7 +1483,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey); Datum dblink_get_pkey(PG_FUNCTION_ARGS) { - int16 numatts; + int16 indnkeyatts; char **results; FuncCallContext *funcctx; int32 call_cntr; @@ -1509,7 +1509,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS) rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT); /* get the array of attnums */ - results = get_pkey_attnames(rel, ); + results = get_pkey_attnames(rel, ); relation_close(rel, AccessShareLock); @@ -1529,9 +1529,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS) attinmeta = TupleDescGetAttInMetadata(tupdesc); funcctx->attinmeta = attinmeta; - if ((results != NULL) && 
(numatts > 0)) + if ((results != NULL) && (indnkeyatts > 0)) { - funcctx->max_calls = numatts; + funcctx->max_calls = indnkeyatts; /* got results, keep track of them */ funcctx->user_fctx = results; @@ -2021,10 +2021,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS) * get_pkey_attnames * * Get the primary key attnames for the given relation. - * Return NULL, and set numatts = 0, if no primary key exists. + * Return NULL, and set indnkeyatts = 0, if no primary key exists. */ static char ** -get_pkey_attnames(Relation rel, int16 *numatts) +get_pkey_attnames(Relation rel, int16 *indnkeyatts) { Relation indexRelation; ScanKeyData skey; @@ -2034,8 +2034,8 @@ get_pkey_attnames(Relation rel, int16 *numatts) char **result = NULL; TupleDesc tupdesc; - /* initialize numatts to 0 in case no primary key exists */ - *numatts = 0; + /* initialize indnkeyatts to 0 in case no primary key exists */ + *indnkeyatts = 0; tupdesc = rel->rd_att; @@ -2056,12 +2056,12 @@ get_pkey_attnames(Relation rel, int16 *numatts) /* we're only interested if it is the primary key */ if (index->indisprimary) { - *numatts = index->indnatts; - if (*numatts > 0) + *indnkeyatts = index->indnkeyatts; + if (*indnkeyatts > 0) { -result = (char **) palloc(*numatts * sizeof(char *)); +result = (char **) palloc(*indnkeyatts * sizeof(char *)); -for (i = 0; i < *numatts; i++) +for (i = 0; i < *indnkeyatts; i++) result[i] = SPI_fname(tupdesc, index->indkey.values[i]); } break; diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 7352b29..c024cf3 100644 --- a/contrib/tcn/tcn.c +++ b/contrib/tcn/tcn.c @@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS) /* we're only interested if it is the primary key and valid */ if (index->indisprimary && IndexIsValid(index)) { - int numatts = index->indnatts; + int indnkeyatts = index->indnkeyatts; - if (numatts > 0) + if (indnkeyatts > 0) { int i; @@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) appendStringInfoCharMacro(payload, ','); 
appendStringInfoCharMacro(payload, operation); -for (i = 0; i < numatts; i++) +for (i = 0; i < indnkeyatts; i++) { int colno = index->indkey.values[i]; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 29738b0..9c7f9e5 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3564,6 +3564,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns in contrast with "included" columns. + + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/inde
Re: [HACKERS] WIP: Covering + unique indexes.
03.10.2016 05:22, Michael Paquier: On Tue, Sep 27, 2016 at 12:17 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: Ok, I'll write it in a few days. Marked as returned with feedback per last emails exchanged. The only complaint about this patch was a lack of README, which is fixed now (see the attachment). So, I added it to new CF, marked as ready for committer. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index d4f9090..99735ce 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); static void deleteConnection(const char *name); -static char **get_pkey_attnames(Relation rel, int16 *numatts); +static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts); static char **get_text_array_contents(ArrayType *array, int *numitems); static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals); static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals); @@ -1483,7 +1483,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey); Datum dblink_get_pkey(PG_FUNCTION_ARGS) { - int16 numatts; + int16 indnkeyatts; char **results; FuncCallContext *funcctx; int32 call_cntr; @@ -1509,7 +1509,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS) rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT); /* get the array of attnums */ - results = get_pkey_attnames(rel, ); + results = get_pkey_attnames(rel, ); relation_close(rel, AccessShareLock); @@ -1529,9 +1529,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS) attinmeta = TupleDescGetAttInMetadata(tupdesc); funcctx->attinmeta = attinmeta; - if ((results != NULL) && (numatts > 0)) + if ((results != NULL) && (indnkeyatts > 0)) { - 
funcctx->max_calls = numatts; + funcctx->max_calls = indnkeyatts; /* got results, keep track of them */ funcctx->user_fctx = results; @@ -2021,10 +2021,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS) * get_pkey_attnames * * Get the primary key attnames for the given relation. - * Return NULL, and set numatts = 0, if no primary key exists. + * Return NULL, and set indnkeyatts = 0, if no primary key exists. */ static char ** -get_pkey_attnames(Relation rel, int16 *numatts) +get_pkey_attnames(Relation rel, int16 *indnkeyatts) { Relation indexRelation; ScanKeyData skey; @@ -2034,8 +2034,8 @@ get_pkey_attnames(Relation rel, int16 *numatts) char **result = NULL; TupleDesc tupdesc; - /* initialize numatts to 0 in case no primary key exists */ - *numatts = 0; + /* initialize indnkeyatts to 0 in case no primary key exists */ + *indnkeyatts = 0; tupdesc = rel->rd_att; @@ -2056,12 +2056,12 @@ get_pkey_attnames(Relation rel, int16 *numatts) /* we're only interested if it is the primary key */ if (index->indisprimary) { - *numatts = index->indnatts; - if (*numatts > 0) + *indnkeyatts = index->indnkeyatts; + if (*indnkeyatts > 0) { -result = (char **) palloc(*numatts * sizeof(char *)); +result = (char **) palloc(*indnkeyatts * sizeof(char *)); -for (i = 0; i < *numatts; i++) +for (i = 0; i < *indnkeyatts; i++) result[i] = SPI_fname(tupdesc, index->indkey.values[i]); } break; diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 7352b29..c024cf3 100644 --- a/contrib/tcn/tcn.c +++ b/contrib/tcn/tcn.c @@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS) /* we're only interested if it is the primary key and valid */ if (index->indisprimary && IndexIsValid(index)) { - int numatts = index->indnatts; + int indnkeyatts = index->indnkeyatts; - if (numatts > 0) + if (indnkeyatts > 0) { int i; @@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) appendStringInfoCharMacro(payload, ','); appendStringInfoCharMacro(payload, operation); -for (i = 0; i < numatts; 
i++) +for (i = 0; i < indnkeyatts; i++) { int colno = index->indkey.values[i]; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 322d8d6..0983c93 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3564,6 +3564,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns in contrast with "included" columns. + + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 40f201b..92bef4a 100644 --- a/doc/src/sgml/indexam.sgml +++ b/d
Re: [HACKERS] sequences and pg_upgrade
23.09.2016 21:06, Peter Eisentraut: Here is an updated patch set. Compared to the initial set, I have changed pg_dump's sorting priorities so that sequence data is always after table data. This would otherwise have introduced a problem because sortDataAndIndexObjectsBySize() only considers consecutive DO_TABLE_DATA entries. Also, I have removed the separate --sequence-data switch from pg_dump and made it implicit in --binary-upgrade. (So the previous patches 0002 and 0003 have been combined, because it's no longer a separate feature.) The patches are good, no complaints. But again, I have the same question. I was confused about why we always dump sequence data, because I'd overlooked the --sequence-data key. I'd rather keep this option explicit, because it's quite non-intuitive behaviour... /* dump sequence data even in schema-only mode */ -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
24.09.2016 15:36, Amit Kapila: On Wed, Sep 21, 2016 at 6:51 PM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: 20.09.2016 08:21, Amit Kapila: On Tue, Sep 6, 2016 at 10:18 PM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: 28.08.2016 09:13, Amit Kapila: The problem seems really tricky, but the answer is simple. We store included columns unordered. It was mentioned somewhere in this thread. Is there any fundamental problem in storing them in an ordered way? I mean to say, you need to store all the column values on the leaf page anyway, so why can't we find the exact location for the complete key? Basically, use the truncated key to reach the leaf level and then use the complete key to find the exact location to store the key. I might be missing something here, but if we can store them in an ordered fashion, we can use them even for queries containing ORDER BY (where ORDER BY contains included columns). I'd say that the reason for not using included columns in any operations which require comparisons is that they don't have an opclass. Let's go back to the example of points. This data type doesn't have any opclass for B-tree, for fundamental reasons. And we cannot apply _bt_compare() and others to this attribute, so we don't include it in the scan key.

create table t (i int, i2 int, p point);
create index idx1 on t (i) including (i2);
create index idx2 on t (i) including (p);
create index idx3 on t (i) including (i2, p);
create index idx4 on t (i) including (p, i2);

You can keep tuples ordered in idx1, but not in idx2; partially ordered in idx3, but not in idx4. At the very beginning of this thread [1], I suggested using the opclass where possible — exactly the same idea you're thinking about. But after a short discussion, we came to the conclusion that it would require many additional checks and would be too complicated, at least for the initial patch.
Let me give you an example:

create table t (i int, p point);
create index idx on t (i) including (p);

The "point" data type doesn't have any opclass for btree. Should we insert (0, '(0,2)') before (0, '(1,1)') or after? We have no idea what the "correct order" for this attribute is. So the answer is "it doesn't matter". When searching in the index, we know that only key attrs are ordered, so only they can be used in the scan key. Other columns are filtered after retrieving the data.

explain select i,p from t where i = 0 and p <@ circle '((0,0),2)';
                             QUERY PLAN
---------------------------------------------------------------------
 Index Only Scan using idx on t  (cost=0.14..4.20 rows=1 width=20)
   Index Cond: (i = 0)
   Filter: (p <@ '<(0,0),2>'::circle)

I think the reason for using Filter here is that we don't keep included columns in scan keys. Can't we think of having them in the scan key, but use only key columns to reach the leaf level and then use the complete scan key at the leaf level? What should I add to the README (or to the documentation) to make it more understandable? Maybe add the data representation — e.g., that only leaf pages contain all the columns — and how the scan works. I think you can see if you can extend the "Notes About Data Representation" and/or "Other Things That Are Handy to Know" sections in the existing README. Ok, I'll write it in a few days. [1] https://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] File system operations.
Hi, hackers! Do we have any standard set of functions to operate with files? I mean link(), copy(), rename(), unlink(), ftruncate() and so on. Mostly I worry about the first two, since they have different implementations on Windows. I found a couple of functions in src/backend/storage/file/copydir.c, and some more in src/bin/pg_upgrade/file.c along with a set of defines in src/bin/pg_upgrade/pg_upgrade.h. Obviously md.c and fd.c contain some functions. Do we have any policy for using them? There is a comment in fd.c:

 * For this scheme to work, most (if not all) routines throughout the
 * server should use these interfaces instead of calling the C library
 * routines (e.g., open(2) and fopen(3)) themselves. Otherwise, we
 * may find ourselves short of real file descriptors anyway.

and even more in fd.h:

 * File {Close, Read, Write, Seek, Tell, Sync}
 * {Path Name Open, Allocate, Free} File
 *
 * These are NOT JUST RENAMINGS OF THE UNIX ROUTINES.
 * Use them for all file activity...

but I see that, for example, pg_open_tzfile(), get_controlfile() and logfile_open() call open()/fopen() directly. The same is true of other C library functions. Am I missing something important, or is it simply legacy/overlooked code? What do you think about moving stuff from pg_upgrade/file.c to storage/file/ to allow reuse of this code? I think it'll be really helpful for developers of contrib modules like backup managers. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] sequences and pg_upgrade
15.09.2016 15:29, Peter Eisentraut: On 9/14/16 8:52 AM, Anastasia Lubennikova wrote: Could you clarify, please, why do we dump sequence in schemaOnly mode? + if (dopt.schemaOnly && dopt.sequence_data) + getSequenceData(, tblinfo, numTables, dopt.oids); The point of this patch is that with the new option, you can dump sequence data (but not table data) alongside the schema. This would be used by pg_upgrade for the reasons described at the beginning of the thread. Oh, thank you. Now I see. Somehow I thought that it *always* dumps sequence data in schema-only mode, which is definitely not true. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
One more update. I added ORDER BY clause to regression tests. It was done as a separate bugfix patch by Tom Lane some time ago, but it definitely should be included into the patch. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index d4f9090..99735ce 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); static void deleteConnection(const char *name); -static char **get_pkey_attnames(Relation rel, int16 *numatts); +static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts); static char **get_text_array_contents(ArrayType *array, int *numitems); static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals); static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals); @@ -1483,7 +1483,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey); Datum dblink_get_pkey(PG_FUNCTION_ARGS) { - int16 numatts; + int16 indnkeyatts; char **results; FuncCallContext *funcctx; int32 call_cntr; @@ -1509,7 +1509,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS) rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT); /* get the array of attnums */ - results = get_pkey_attnames(rel, ); + results = get_pkey_attnames(rel, ); relation_close(rel, AccessShareLock); @@ -1529,9 +1529,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS) attinmeta = TupleDescGetAttInMetadata(tupdesc); funcctx->attinmeta = attinmeta; - if ((results != NULL) && (numatts > 0)) + if ((results != NULL) && (indnkeyatts > 0)) { - funcctx->max_calls = numatts; + funcctx->max_calls = indnkeyatts; /* got results, keep track of them */ funcctx->user_fctx = results; @@ -2021,10 +2021,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS) * 
get_pkey_attnames * * Get the primary key attnames for the given relation. - * Return NULL, and set numatts = 0, if no primary key exists. + * Return NULL, and set indnkeyatts = 0, if no primary key exists. */ static char ** -get_pkey_attnames(Relation rel, int16 *numatts) +get_pkey_attnames(Relation rel, int16 *indnkeyatts) { Relation indexRelation; ScanKeyData skey; @@ -2034,8 +2034,8 @@ get_pkey_attnames(Relation rel, int16 *numatts) char **result = NULL; TupleDesc tupdesc; - /* initialize numatts to 0 in case no primary key exists */ - *numatts = 0; + /* initialize indnkeyatts to 0 in case no primary key exists */ + *indnkeyatts = 0; tupdesc = rel->rd_att; @@ -2056,12 +2056,12 @@ get_pkey_attnames(Relation rel, int16 *numatts) /* we're only interested if it is the primary key */ if (index->indisprimary) { - *numatts = index->indnatts; - if (*numatts > 0) + *indnkeyatts = index->indnkeyatts; + if (*indnkeyatts > 0) { -result = (char **) palloc(*numatts * sizeof(char *)); +result = (char **) palloc(*indnkeyatts * sizeof(char *)); -for (i = 0; i < *numatts; i++) +for (i = 0; i < *indnkeyatts; i++) result[i] = SPI_fname(tupdesc, index->indkey.values[i]); } break; diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 7352b29..c024cf3 100644 --- a/contrib/tcn/tcn.c +++ b/contrib/tcn/tcn.c @@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS) /* we're only interested if it is the primary key and valid */ if (index->indisprimary && IndexIsValid(index)) { - int numatts = index->indnatts; + int indnkeyatts = index->indnkeyatts; - if (numatts > 0) + if (indnkeyatts > 0) { int i; @@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) appendStringInfoCharMacro(payload, ','); appendStringInfoCharMacro(payload, operation); -for (i = 0; i < numatts; i++) +for (i = 0; i < indnkeyatts; i++) { int colno = index->indkey.values[i]; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 322d8d6..0983c93 100644 --- 
a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3564,6 +3564,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns in contrast with "included" columns. + + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 40f201b..92bef4a 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -110,6 +110,8 @@ typedef struct IndexAmRoutine boolamclusterable; /* does AM handle predicate locks? */ boolampredlocks; +/* does AM s
Re: [HACKERS] sequences and pg_upgrade
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, failed

Thank you for the patch. As I see there are no objections in the discussion, and all the patches look clear. Could you clarify, please, why do we dump sequence data in schema-only mode?

+ if (dopt.schemaOnly && dopt.sequence_data)
+     getSequenceData(, tblinfo, numTables, dopt.oids);

Example:

postgres=# create table t(i serial, data text);
postgres=# insert into t(data) values ('aaa');

pg_dump -d postgres --sequence-data --schema-only > ../reviews/dump_pg

Then restore it into newdb and add a new value:

newdb=# insert into t(data) values ('aaa');
INSERT 0 1
newdb=# select * from t;
 i | data
---+------
 2 | aaa

I'm not an experienced user, but I thought that when doing a dump/restore of a database schema we reset all the data. Why should the table in a newly created (via pg_restore) database have a non-default sequence value? I also did some other tests and all of them passed. One more thing to do is documentation for the new option. You should update the help() function in pg_dump.c and also add some notes to pg_dump.sgml and probably pgupgrade.sgml.

The new status of this patch is: Waiting on Author
Re: [HACKERS] WIP: Covering + unique indexes.
28.08.2016 09:13, Amit Kapila: On Mon, Aug 15, 2016 at 8:15 PM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:

@@ -590,7 +622,14 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
  if (last_off == P_HIKEY)
  {
      Assert(state->btps_minkey == NULL);
-     state->btps_minkey = CopyIndexTuple(itup);
+     /*
+      * Truncate the tuple that we're going to insert
+      * into the parent page as a downlink
+      */
+     if (indnkeyatts != indnatts && P_ISLEAF(pageop))
+         state->btps_minkey = index_truncate_tuple(wstate->index, itup);
+     else
+         state->btps_minkey = CopyIndexTuple(itup);

It seems that the above code always ensures that for leaf pages the high key is a truncated tuple. What is less clear is, if that is true, why you need to re-ensure it again for the old page in the code below. Thank you for the question — the investigation took a long time. As far as I understand, the code above only applies to the first tuple of each level, while the code you have quoted below truncates high keys for all other pages. There is a comment that clarifies the situation:

 * If the new item is the first for its page, stash a copy for later. Note
 * this will only happen for the first item on a level; on later pages,
 * the first item for a page is copied from the prior page in the code
 * above.

So the patch is correct. We could go further and remove this index_truncate_tuple() call, because the first tuple of any inner (or root) page doesn't need any key at all: it simply points to the leftmost page of the level below. But it's not a bug, because truncating one tuple per level doesn't add any considerable overhead. So I want to leave the patch in its current state.

@@ -510,6 +513,8 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
{
..
+     if (indnkeyatts != indnatts && P_ISLEAF(opageop))
+     {
+         /*
+          * It's essential to truncate the high key here.
+          * The purpose is not just to save more space on this particular page,
+          * but to keep the whole b-tree structure consistent. Subsequent insertions
+          * assume that the hikey is already truncated, and so they should not
+          * worry about it when copying the high key into the parent page
+          * as a downlink.
+          * NOTE: It is not crucial for reliability at present,
+          * but maybe it will be in the future.
+          */
+         keytup = index_truncate_tuple(wstate->index, oitup);
+
+         /* delete "wrong" high key, insert keytup as P_HIKEY. */
+         PageIndexTupleDelete(opage, P_HIKEY);
+
+         if (!_bt_pgaddtup(opage, IndexTupleSize(keytup), keytup, P_HIKEY))
+             elog(ERROR, "failed to rewrite compressed item in index \"%s\"",
+                  RelationGetRelationName(wstate->index));
+     }
..
..

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
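The truncation being discussed can be sketched with a toy model (the struct and function here are illustrative stand-ins, not the real variable-length IndexTuple code): keep only the leading key attributes and drop the included ones, which is also why the truncated copy can never be larger than the original tuple.

```c
#include <assert.h>
#include <string.h>

/* Toy index tuple: a counted array of fixed-width attribute values. */
typedef struct ToyIndexTuple
{
    int natts;
    int values[8];
} ToyIndexTuple;

/*
 * Toy version of index_truncate_tuple(): build a copy of the tuple that
 * keeps only the first nkeyatts (key) attributes, dropping the included
 * ones.  Because it only removes attributes, the truncated copy can
 * never be larger than the original — the invariant the high key /
 * downlink code relies on.
 */
static ToyIndexTuple
toy_truncate_tuple(const ToyIndexTuple *itup, int nkeyatts)
{
    ToyIndexTuple newitup;

    newitup.natts = nkeyatts;
    memset(newitup.values, 0, sizeof(newitup.values));
    memcpy(newitup.values, itup->values, (size_t) nkeyatts * sizeof(int));
    return newitup;
}
```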
Re: [HACKERS] Refactoring of heapam code.
06.09.2016 07:44, Pavan Deolasee: On Mon, Aug 8, 2016 at 3:13 PM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> wrote: Thank you for the review, I'll fix these problems in the final version. Posting the first message, I intended to raise the discussion; patches were attached mainly to illustrate the problem and to be more concrete. I started looking at the first patch posted above, but it seems you'll rewrite these patches after the discussion on the heapam refactoring ideas you posted here https://wiki.postgresql.org/wiki/HeapamRefactoring <https://wiki.postgresql.org/wiki/HeapamRefactoring> concludes. Thank you for the review. Some comments anyway on the first patch: 1. It does not apply cleanly with git-apply — many whitespace errors. I'll fix all merge issues and attach it to the following message. 2. I don't understand the difference between PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seem to do exactly the same thing and can be used interchangeably. The only difference between these two macros is that PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple; it only checks header fields (see HeapTupleHeaderData). I divided it into two separate functions while I was working on page compression, because I wanted to avoid unnecessary decompression calls. These names are just markers to make the code easier to read and improve. 3. While I like the idea of using separate interfaces to get a heap/index tuple from a page, maybe they can internally use a common interface instead of duplicating what PageGetItem() does already. I'm not sure I get it right: do you suggest leaving PageGetItem() and writing PageGetItemHeap() and PageGetItemIndex() as wrappers on it? That sounds reasonable while we have a similar layout for both heap and index pages. In any case, it'll be easy to separate them when necessary. 4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or something like that?
I don't feel like doing that, because HeapTuple is a different structure. What we get from the page is a HeapTupleHeaderData structure followed by the user's data.

5. If we do this, we should probably have corresponding wrapper functions/macros for the remaining callers of PageGetItem(). There are some calls of PageGetItem() left in BRIN and SP-GiST indexes; I can write simple wrappers for them. I left them just because they were out of interest for my compression prototype.

I also looked at the refactoring design doc. Looks like a fine approach to me, but the API will need much more elaborate discussions. I am not sure if the interfaces as presented are enough to support everything that even heapam can do today. What features of heapam do you think could be unsupportable in this API? Maybe I've just missed them. My point here is that heapam is too complicated for many simpler use cases: read-only tables and append-only tables do not need many MVCC-related tricks like vacuum and complicated locking, and tables with fixed-length attributes can use a more optimal page layout. And so on. I suggest refactoring that will allow us to develop new heap-like access methods. For the first version, they must have methods to "heapify" a tuple, i.e. turn the internal representation of the data into a regular HeapTuple, for example by putting some stubs into the MVCC fields. Of course this approach has its disadvantages, such as performance issues. It definitely won't be enough to write column storage or to implement other great data structures. But it allows us not to depend on the Executor's code. And much more thought will be required to ensure we don't miss out things that a new primary access method may need. Do you have any particular ideas for new access methods? A few points regarding how the proposed API maps to heapam: - How do we support parallel scans on heap? As far as I understand, parallel heap scan requires one more API function, heap_beginscan_parallel().
It uses the same API function - heap_getnext(), but in addition to an ordinary seq scan it selects the page to read in a synchronized manner. - Does the primary AM need to support locking of rows? That's a good question. I'd say it should be an option. - Would there be a separate API to interpret the PAM tuple itself? (something that htup_details.h does today). It seems natural that every PAM will have its own interpretation of the tuple structure, and we would like to hide that inside the PAM implementation. As I already wrote, for the first prototype I'd use a function to "heapify" the tuple and not care about executor issues at all. You can find a more detailed discussion of this question in another thread [1]. - There are many upper modules that need access to system attributes (xmin, xmax for starters). How do you plan to handle that? You think we can provide enough abstraction so that the callers don't need to know
Re: [HACKERS] Optimization for lazy_scan_heap
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: not tested Documentation: not tested

Hi, I haven't tested the performance yet, but the patch itself looks pretty good and is a reasonable improvement. I have a question about removing the comment. It seems to be a really tricky moment. How do we know that the all-frozen block hasn't changed since the moment we checked it?

-* Tricky, tricky. If this is in aggressive vacuum, the page
-* must have been all-frozen at the time we checked whether it
-* was skippable, but it might not be any more. We must be
-* careful to count it as a skipped all-frozen page in that
-* case, or else we'll think we can't update relfrozenxid and
-* relminmxid. If it's not an aggressive vacuum, we don't
-* know whether it was all-frozen, so we have to recheck; but
-* in this case an approximate answer is OK.
+* We know that there are n_skipped pages by the visibilitymap scan we
+* did just before.
 */

I'm going to test the performance this week. I wonder if you could send a test script or describe the steps to test it? The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
es forwards (say a batch of tuples in a columnar storage AM has column values spread across many buffers; they must all be kept pinned until the scan has moved past the whole set of tuples). But I'm not really sure that this is a great design. Frankly, I doubt that it's realistic to implement columnar storage just as a variant of pluggable storage. It requires a lot of changes in the executor and optimizer and so on, which are hardly compatible with the existing tuple-oriented model. However, I'm not so good in this area, so if you feel that it's possible, go ahead. I welcome comments on these ideas. My patch for this is nowhere near completion yet; expect things to change for items that I've overlooked, but I hope I didn't overlook anything major. If things are handwavy, it is probably because I haven't fully figured them out yet. Thank you again for beginning the big project. Looking forward to the prototype. I think it will make the discussion more concrete and useful. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
09.08.2016 19:45, Andrew Borodin: Here is a new version of the patch; it now includes recommendations from Anastasia Lubennikova. I've investigated the anomalous index size decrease. The most probable explanation turned out to be true. The cube extension, like some others, uses Guttman's polynomial-time split algorithm. It is known to generate "needle-like" MBBs (MBRs) for sorted data due to imbalanced splits (splitting 100 tuples as 98 to 2). Due to the previous throw-to-the-end behavior of GiST, this imbalance was further amplified (most of the inserts were going to the bigger part after a split). So GiST inserts were extremely slow for sorted data. There is no need to do exact sorting to trigger this behavior. This behavior can be defused by implementing a small-m restriction in the split (AFAIR this is described in the original R-tree paper from '84), which prescribes doing a split into parts no smaller than m, where m is around 20% of a page's capacity (in number of tuples). After application of this patch, performance for sorted data is roughly the same as performance for randomized data.

Thank you for the explanation. Now it's clear to me. I did some more testing and found nothing special. The declared feature is implemented correctly. It passes all regression tests and improves performance. I still have a few minor nitpicks about the patch style. You can find the style guide at https://www.postgresql.org/docs/9.6/static/source.html

1) remove extra whitespace in README
2) add whitespace: if(ntup == 1)
3) fix comments in accordance with coding conventions
/* In case of single tuple update GiST calls overwrite
 * In all other cases function gistplacetopage deletes
 * old tuples and place updated at the end
 * */
+/* Normally here was following assertion
+ * Assert(ItemIdHasStorage(ii));
+ * This assertion was introduced in PageIndexTupleDelete
+ * But if this function will be used from BRIN index
+ * this assertion will fail. Thus, here we do not check that
+ * item has the storage.
+ * */
4) remove unrelated changes
-data += sizeof(OffsetNumber) * xldata->ntodelete;
+data += sizeof(OffsetNumber) *xldata->ntodelete;
5) If the comment is correct, maxalignment is not necessary.
+/* tuples on a page are always maxaligned */
+oldsize = MAXALIGN(oldsize);
6) I'd rather use alignednewsize here.
+ItemIdSetNormal(tupid, offset + size_diff, newsize);

After the cleanup you can change the status to "Ready for Committer" without waiting for the response.

If data is randomized, this effect of the Guttman poly-time split does not come into play; the test from the start of the thread will show no statistically confident improvement from the patch. To prove the performance increase in the randomized case, I've composed a modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1 This test with five runs shows the following: Before patch: Insert Time AVG 78 seconds, STD 9.5. After patch: Insert Time AVG 68 seconds, STD 7.7. This is a marginal but statistically significant performance improvement. For sorted data, performance is improved by a factor of 3. Best regards, Andrey Borodin, Octonica & Ural Federal University. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Pluggable storage
alone. This is somewhat messy in the case of heap_multi_insert because it returns several items; I think it's acceptable to return an array of ItemPointers in the same order as the input tuples. This works fine for the only caller, which is COPY in batch mode. The other routines don't really care where the TID is returned, AFAICS. Additional noteworthy items: i) Speculative insertion: the speculative insertion token is no longer installed directly in the heap tuple by the executor (of course). Instead, the token becomes part of the slot. When the tuple_insert method is called, the insertion routine is in charge of setting the token from the slot into the storage tuple. The executor is in charge of calling method->speculative_finish() / abort() once the insertion has been confirmed by the indexes. ii) execTuples has additional accessors for tuples-in-slot, such as ExecFetchSlotTuple and friends. I expect some of them to return abstract StorageTuples, others HeapTuples or MinimalTuples (possibly wrapped in Datum), depending on the callers. We might be able to cut down on these later; my first cut will try to avoid API changes to keep fallout to a minimum. iii) All tuples need to be identifiable by ItemPointers. Storages that have different requirements will need careful additional thought across the board. iv) System catalogs cannot use pluggable storage. We continue to use heap_open etc. in the DDL code, in order not to make this more invasive than it already is. We may lift this restriction later for specific catalogs, as needed. v) Currently, one Buffer may be associated with one HeapTuple living in a slot; when the slot is cleared, the buffer pin is released. My current patch moves the buffer pin to inside the heapam-based storage AM, and the buffer is released by the ->slot_clear_tuple method.
The rationale for doing this is that some storage AMs might want to keep several buffers pinned at once, for example, and must release those pins not individually but in batches as the scan moves forward (say a batch of tuples in a columnar storage AM has column values spread across many buffers; they must all be kept pinned until the scan has moved past the whole set of tuples). But I'm not really sure that this is a great design. I welcome comments on these ideas. My patch for this is nowhere near completion yet; expect things to change for items that I've overlooked, but I hope I didn't overlook anything major. If things are handwavy, it is probably because I haven't fully figured them out yet. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Refactoring of heapam code.
08.08.2016 12:43, Anastasia Lubennikova: 08.08.2016 03:51, Michael Paquier: On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: Anastasia Lubennikova wrote: So there is a couple of patches. They do not cover all mentioned problems, but I'd like to get a feedback before continuing. I agree that we could improve things in this general area, but I do not endorse any particular changes you propose in these patches; I haven't reviewed your patches. Well, I am interested in the topic. And just had a look at the patches proposed.

+ /*
+  * Split PageGetItem into set of different macros
+  * in order to make code more readable.
+  */
+#define PageGetItemHeap(page, itemId) \
+( \
+	AssertMacro(PageIsValid(page)), \
+	AssertMacro(ItemIdHasStorage(itemId)), \
+	(HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)

Placing into bufpage.h macros that are specific to heap and indexes is not that much a good idea. I'd suggest having the heap ones in heapam.h, and the index ones in a more generic place, such as src/access/common, as already mentioned by Alvaro.

+ onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);

Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate) pfree(bistate); } - /* * heap_insert - insert tuple into a heap

Those patches have some noise. Patch 0002 is doing two things: reorganizing the order of the routines in heapam.c and moving some routines from heapam.c to hio.c. Those two could be separate patches. If the final result is to be able to extend custom AMs for tables, we should have a structure like src/access/common/index.c, src/access/common/table.c and src/access/common/relation.c for example, and have headers dedicated to them. But yes, there is definitely a lot of room for refactoring as we are aiming, at least I guess so, at having more various code paths for access methods. Thank you for the review, I'll fix these problems in the final version.
Posting the first message, I intended to raise the discussion. Patches were attached mainly to illustrate the problem and to be more concrete. Thank you for the attention and feedback. Since there are no objections to the idea in general, I'll write an exhaustive README about the current state of the code and then propose an API design to discuss details. Stay tuned. Here is the promised design draft. https://wiki.postgresql.org/wiki/HeapamRefactoring Looking forward to your comments. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
14.08.2016 20:11, Andrey Borodin: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, failed Spec compliant: tested, passed Documentation: tested, passed

Hi hackers! I've read the patch and here is my code review.

==PURPOSE
I've used this feature from time to time with MS SQL. From my experience, INCLUDE is a 'sugar on top' feature. Some MS SQL classes do not even mention INCLUDE despite it being there since 2005 (though classes do not mention lots of important things, so it's not that valuable an indicator). But those who use it, use it whenever possible. For example, the system view with recommended indices rarely lists one without INCLUDE columns. So, this feature is very important from the perspective of converting MS SQL DBAs to PostgreSQL. This is how I see it.

Thank you for the review, I hope this feature will be useful for many people.

SUGGESTIONS==
0. Index build is broken. This script https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql SEGFAULTs and may cause a situation when you cannot insert anything into the table (I think dropping the index would help, but I didn't test this).

Thank you for reporting. That was a bug caused by high key truncation, which occurs when the index has more than 3 levels. Fixed. See the attached file.

1. I think the MS SQL syntax INCLUDE instead of INCLUDING would be better (for the purpose listed above).

I've chosen this particular name to avoid using a new keyword. We already have INCLUDING in Postgres in the context of inheritance, which will never intersect with covering indexes. I'm sure it won't be a big problem for migration from MS SQL.

2. Empty line added in ruleutils.c. Is it for a reason?

No, just a missed line. Fixed.

3. Now we have indnatts and indnkeyatts instead of indnatts. I think it is worth considering renaming indnatts to something different from the old name. Someone somewhere could still suppose it's a number of keys.
I agree that the naming became vague after this patch. I've already suggested replacing "indkeys[]" with a more specific name, and AFAIR there was no reaction, so I didn't do that. But I'm not sure about your suggestion regarding indnatts. Old queries (and old indexes) can still use it correctly. I don't see a reason to break compatibility for all users. Those who will use this new feature should ensure that their queries on pg_index behave as expected.

PERFORMANCE==
Due to suggestion number 0 I could not measure the performance of index build. The index crashes when there are more than 1.1 million rows in a table. The performance test script is here https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql The test scenario is the following: 1. Create table, then create index, then add data. 2. Make a query touching data in INCLUDING columns. This scenario is tested against a table with: A. An index that does not contain the touched columns, just the PK. B. An index with all columns in the index. C. An index with the PK in keys and INCLUDING all other columns. Tests were executed 5 times on an Ubuntu VM under Hyper-V, i5 2500 CPU, 16 GB of RAM, SSD disk.

Time to insert 10M rows: A. AVG 110 seconds STD 4.8 B. AVG 121 seconds STD 2.0 C. AVG 111 seconds STD 5.7 Inserts into an INCLUDING index are almost as fast as inserts into an index without extra columns.

Time to run the SELECT query: A. AVG 2864 ms STD 794 B. AVG 2329 ms STD 84 C. AVG 2293 ms STD 58 Selects with INCLUDING columns are almost as fast as with the full index.

Index size (deterministic measure, STD = 0): A. 317 MB B. 509 MB C. 399 MB The index size is in the middle between the full index and the minimal index. I think these numbers agree with expectations for the feature.

CONCLUSION==
This patch brings a useful and important feature. The build must be repaired; my other suggestions are only suggestions. Best regards, Andrey Borodin, Octonica & Ural Federal University.
The new status of this patch is: Waiting on Author -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 9c8e308..891325d 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); static void deleteConnection(const char *name); -static char **get_pkey_attnames(Relation rel, int16 *numatts); +static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts); static char **get_text_array_contents(ArrayType *array, int *numitems); static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals); static char *get_sql_delete(Relation rel, int *pkattnums,
Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
06.08.2016 19:51, Andrew Borodin: Anastasia, thank you for your attentive code examination. 2016-08-05 21:19 GMT+05:00 Anastasia Lubennikova <a.lubennik...@postgrespro.ru>: First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize? Although, I'm quite sure that it was already aligned somewhere before. I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary. I'd rather add the check: (offset + size_diff < pd_lower). Actually, that's a tricky question. There may be different code expectations. 1. If we expect that a not-maxaligned tuple may be placed by any other routine, we should remove the check (size_diff != MAXALIGN(size_diff)), since it will fail for a not-maxaligned tuple. 2. If we expect that no one should use PageIndexTupleOverwrite with not-maxaligned tuples, then the current checks are OK: we will break execution if we find non-maxaligned tuples. With an almost correct message. I suggest that this check may be a debug-only assertion: in production code the routine will work with not-maxaligned tuples if they already reside on the page; in a dev build it will inform the developer that something is going wrong. Is this behavior Postgres-style compliant? I agree that the pd_lower check makes sense. Pointing out this comparison, I worried about a possible call of MAXALIGN on a negative size_diff value, although I'm not sure whether it can cause any problems. Tuples on a page are always maxaligned. But, as far as I recall, itemid->lp_len can contain a non-maxaligned value. So, I'd suggest you perform MAXALIGN(oldsize) before computing size_diff: size_diff = oldsize - alignednewsize; Since both arguments are maxaligned, the check of size_diff is not needed. BTW, I'm very surprised that it improves performance so much. And also the size is reduced significantly: 89 MB against 289 MB without the patch. Could you explain in detail why that happens? Size reduction is unexpected for me. There might be 4 plausible explanations. I'll list them ordered by descending probability: 1.
Before this patch, every update was throwing the recently updated tuple to the end of a page. Thus, in somehow-ordered data, the recent best fit will be discovered last. This can cause an increase of MBB overlap in a spatial index and a slightly higher tree. But a 3-times size decrease is unlikely. How did you obtain those results? Can I look at a test case? Nothing special, I've just run the test you provided in this thread. And checked the index size via select pg_size_pretty(pg_relation_size('idx')); 2. A bug in PageIndexTupleDelete causing unused space emersion. I've searched for it, unsuccessfully. 3. A bug in PageIndexTupleOverwrite. I cannot imagine the nature of such a bug. Maybe we are not placing something not very important and big on a page? 4. Magic. I do not see what one should do with the R-tree to change its size by a factor of 3. The first three explanations are not much better than the fourth, actually. Those 89 MB, they do not include WAL, right? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Refactoring of heapam code.
08.08.2016 03:51, Michael Paquier: On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: Anastasia Lubennikova wrote: So there is a couple of patches. They do not cover all mentioned problems, but I'd like to get a feedback before continuing. I agree that we could improve things in this general area, but I do not endorse any particular changes you propose in these patches; I haven't reviewed your patches. Well, I am interested in the topic. And just had a look at the patches proposed.

+ /*
+  * Split PageGetItem into set of different macros
+  * in order to make code more readable.
+  */
+#define PageGetItemHeap(page, itemId) \
+( \
+	AssertMacro(PageIsValid(page)), \
+	AssertMacro(ItemIdHasStorage(itemId)), \
+	(HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)

Placing into bufpage.h macros that are specific to heap and indexes is not that much a good idea. I'd suggest having the heap ones in heapam.h, and the index ones in a more generic place, such as src/access/common, as already mentioned by Alvaro.

+ onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);

Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate) pfree(bistate); } - /* * heap_insert - insert tuple into a heap

Those patches have some noise. Patch 0002 is doing two things: reorganizing the order of the routines in heapam.c and moving some routines from heapam.c to hio.c. Those two could be separate patches. If the final result is to be able to extend custom AMs for tables, we should have a structure like src/access/common/index.c, src/access/common/table.c and src/access/common/relation.c for example, and have headers dedicated to them. But yes, there is definitely a lot of room for refactoring as we are aiming, at least I guess so, at having more various code paths for access methods. Thank you for the review, I'll fix these problems in the final version. Posting the first message, I intended to raise the discussion.
Patches were attached mainly to illustrate the problem and to be more concrete. Thank you for the attention and feedback. Since there are no objections to the idea in general, I'll write an exhaustive README about the current state of the code and then propose an API design to discuss details. Stay tuned. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Refactoring of heapam code.
05.08.2016 20:56, Alvaro Herrera: Anastasia Lubennikova wrote: Working on page compression and some other issues related to access methods, I found out that the code related to heap looks too complicated. Much more complicated than it should be. Since I anyway got into this area, I want to suggest a set of improvements. Hm. I'm hacking on heapam.c pretty heavily ATM. Your first patch causes no problem I think, or if it does it's probably pretty minor, so that's okay. I'm unsure about the second one -- I think it's okay too, because it mostly seems to be about moving stuff from heapam.c to hio.c and shuffling some code around that I don't think I'm modifying. I'm sure these patches will not cause any problems. The second one is mostly about moving unrelated stuff from heapam.c to hio.c and renaming a couple of functions in heap.c. Anyway, they are not a final solution, just a proof of concept. By the way, I thought about looking at the mentioned patch and probably reviewing it, but I didn't find any of your patches in the current commitfest. Could you point out the thread? Also I think that it's quite strange to have a function heap_create() that is called from index_create(). It has absolutely nothing to do with the heap access method. Agreed. But changing its name while keeping it in heapam.c does not really improve things enough. I'd rather have it moved elsewhere that's not specific to "heaps" (somewhere in access/common perhaps). However, renaming the functions would break third-party code for no good reason. I propose that we only rename things if we also break them for other reasons (say because the API changes in a fundamental way). Yes, I agree that it should be moved to another file. Just to be more accurate: it's not in heapam.c now, it is in "src/backend/catalog/heap.c", which requires many more changes than I did. Concerning the third-party code: it's a good observation and we definitely should keep it in mind.
Nevertheless, I doubt that there are many external calls to these functions. And I hope this thread will end up with fundamental changes introducing a clear API for all the mentioned problems. One more thing that requires refactoring is the "RELKIND_RELATION" name. We have a type of relation called "relation"... I don't see us fixing this bit, really. Historical baggage and all that. For example, a lot of SQL queries use "WHERE relkind = 'r'". If we change the name, we should probably also change the relkind constant, and that would break the queries. If we change the name and do not change the constant, then we have to have a comment "we call them RELKIND_TABLE but the char is r because it was called RELATION previously", which isn't any better. Good point. I'd rather change it to RELKIND_BASIC_RELATION or something like that, which is more specific and still goes well with the 'r' constant. But minor changes like that can wait until we clarify the API in general. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
05.08.2016 19:41, Robert Haas: 2. This inserts additional code in a bunch of really low-level places like heap_hot_search_buffer, heap_update, heap_delete, etc. I think what you've basically done here is create a new, in-memory heap AM and then, because we don't have an API for adding new storage managers, you've bolted it onto the existing heapam layer. That's certainly a reasonable approach for creating a PoC, but I think we actually need a real API here. Otherwise, when the next person comes along and wants to add a third heap implementation, they've got to modify all of these same places again. I don't think this code is reasonably maintainable in this form. As I can see, you recommend cleaning up the API of the storage management code. I strongly agree that it's high time to do it. So, I started the discussion about refactoring and improving the API of heapam and heap relations. You can find it on the commitfest: https://commitfest.postgresql.org/10/700/ I'll be glad to see your thoughts on the thread. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
30.07.2016 14:00, Andrew Borodin: Here is the BRIN-compatible version of the patch. Now the function PageIndexTupleOverwrite consumes the tuple size as a parameter; this behavior is coherent with PageAddItem. Also, I had to remove the assertion that the item has storage in the loop that adjusts line pointers. It would be great if someone could check that place (the commented-out Assert(ItemIdHasStorage(ii)); in PageIndexTupleOverwrite). I suspect that there might be a case when linp's should not be adjusted.

Hi, I reviewed the code and I have a couple of questions about the following code:

/* may have negative size here if new tuple is larger */
size_diff = oldsize - alignednewsize;
offset = ItemIdGetOffset(tupid);
if (offset < phdr->pd_upper ||
    (offset + size_diff) > phdr->pd_special ||
    offset != MAXALIGN(offset) ||
    size_diff != MAXALIGN(size_diff))
    ereport(ERROR,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("corrupted item offset: offset = %u, size = %u",
                    offset, (unsigned int) size_diff)));

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize? Although, I'm quite sure that it was already aligned somewhere before. I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary. I'd rather add the check: (offset + size_diff < pd_lower). Besides that moment, the patch seems pretty good. BTW, I'm very surprised that it improves performance so much. And also the size is reduced significantly: 89 MB against 289 MB without the patch. Could you explain in detail why that happens? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Refactoring of heapam code.
05.08.2016 16:30, Kevin Grittner: On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: They can be logically separated into three categories: "primary storage" - r, S, t, v. They store data and visibility information. The only implementation is heapam.c. "secondary index" - i. They store some data and pointers to primary storage. Various existing AMs, managed by the AM API. "virtual relations" - c, f, m. They have no physical storage, only entries in caches and catalogs. Materialized views (relkind == "m") have physical storage, and may have indexes. Good point. I confused the letters for views (virtual relations) and materialized views (primary storage). It should be: "primary storage" - r, S, t, m. They store data and visibility information. The only implementation is heapam.c. "secondary index" - i. They store some data and pointers to primary storage. Various existing AMs, managed by the AM API. "virtual relations" - c, f, v. They have no physical storage, only entries in caches and catalogs. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] Refactoring of heapam code.
Working on page compression and some other issues related to access methods, I found out that the code related to heap looks too complicated, much more complicated than it should be. Since I got into this area anyway, I want to suggest a set of improvements. There are a number of problems I see in the existing code:

Problem 1. Heap != Relation. File heapam.c has a note: * This file contains the heap_ routines which implement * the POSTGRES heap access method used for all POSTGRES * relations. This statement is wrong, since we also have index relations that are implemented using other access methods. Also, I think it's quite strange to have a function heap_create() that is called from index_create(). It has absolutely nothing to do with the heap access method. And so on, and so on. One more thing that requires refactoring is the "RELKIND_RELATION" name. We have a type of relation called "relation"...

Problem 2. Some functions are shared between the heap and index access methods. Maybe that used to be handy someday, but now, I think, it's time to separate them, because IndexTuples and HeapTuples have really little in common.

Problem 3. The word "heap" is used in many different contexts. Heap - a storage where we have tuples and visibility information. Generally speaking, it can be implemented over any data structure, not just a plain table as it is done now. Heap - an access method that provides a set of routines for plain table storage, such as insert, delete, update, gettuple, vacuum and so on. We use this access method not only for users' tables.
There are many types of relations (pg_class.h):

#define RELKIND_RELATION       'r' /* ordinary table */
#define RELKIND_INDEX          'i' /* secondary index */
#define RELKIND_SEQUENCE       'S' /* sequence object */
#define RELKIND_TOASTVALUE     't' /* for out-of-line values */
#define RELKIND_VIEW           'v' /* view */
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
#define RELKIND_FOREIGN_TABLE  'f' /* foreign table */
#define RELKIND_MATVIEW        'm' /* materialized view */

They can be logically separated into three categories: "primary storage" - r, S, t, v. They store data and visibility information. The only implementation is heapam.c. "secondary index" - i. They store some data and pointers to primary storage. Various existing AMs, managed by the AM API. "virtual relations" - c, f, m. They have no physical storage, only entries in caches and catalogs. Now, for some reason, we sometimes use the name "heap" for both "primary storage" and "virtual relations". See src/backend/catalog/heap.c. But some functions work only with "primary storage". See pgstat_relation(). And we constantly have to check relkind everywhere. I'd like this to be clear from the code interface and naming. So here is a couple of patches. They do not cover all the mentioned problems, but I'd like to get feedback before continuing.

Patch 1: There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item from the given page. It is used for both heap and index tuples. It doesn't seem a problem until you try to find anything in this code. The first patch, "PageGetItem_refactoring.patch", is intended to fix that. It is just renaming: (IndexTuple) PageGetItem ---> PageGetItemIndex (HeapTupleHeader) PageGetItem ---> PageGetItemHeap Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple, SpGistDeadTuple) still use PageGetItem. I also changed functions that do not access the tuple's data, but only need the HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.
I do not insist on these particular names, by the way. Patch 2: heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names and outdated comments. The patch is intended to improve them. Fun fact: I found several "check it later" comments unchanged since the "Postgres95 1.01 Distribution - Virgin Sources" commit. I wonder if we can find a better name for relation_drop_with_catalog(), since, again, it's not about all kinds of relations. We use another function to drop index relations. I hope these changes will be useful for future development. The suggested patches are mostly about renaming and rearrangement of functions between files, so, if nobody has conceptual objections, all I need from reviewers is an attentive look to find typos, grammar mistakes and overlooked areas. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c index 4983bbb..257b609 100644 --- a/contrib/pageinspect/btreefuncs.c +++ b/contrib/pageinspect/btreefuncs.c
Re: [HACKERS] Leaking memory in text_overlay function
I found this thread in the open CF. As I see, the discussion ended with a decision to reject the patch. I've just changed the status to "Rejected". 02.07.2016 18:11, Dirk Rudolph: Well, that's good to know. It just seemed curious that, in my case, I only saw this in this particular function. Anyway, I will consider handling the memory the same way, if this is the recommendation. Many thanks. /Closed On Sat, Jul 2, 2016 at 4:45 PM, Tom Lane <t...@sss.pgh.pa.us <mailto:t...@sss.pgh.pa.us>> wrote: Dirk Rudolph <dirk.rudo...@netcentric.biz <mailto:dirk.rudo...@netcentric.biz>> writes: > while implementing my own C function, I noticed that some memory is not freed by the text_overlay function in varlena.c By and large, that's intentional. SQL-called functions normally run in short-lived memory contexts, so any memory they don't bother to pfree will be cleaned up automatically at the end of the tuple cycle. If we did not follow that approach, there would need to be many thousands more explicit pfree calls than there are today, and the system would be net slower overall, because multiple retail pfree's are slower than a MemoryContextReset. There are places where it's important to avoid leaking memory, certainly. But I don't think this is one of them, unless you can demonstrate a scenario with query-lifespan or worse leakage. > Particularly I mean both substrings allocated by text_substring (according to the documentation of text_substring, "The result is always a freshly palloc'd datum.") and one of the concatenation results. I'm aware of the MemoryContext being deleted in my case, but IMHO builtin functions should be memory safe. That is not the project policy.
regards, tom lane -- Dirk Rudolph | Senior Software Engineer Netcentric AG M: +41 79 642 37 11 D: +49 174 966 84 34 dirk.rudo...@netcentric.biz <mailto:dirk.rudo...@netcentric.biz> | www.netcentric.biz <http://www.netcentric.biz/> -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] Processes and caches in postgresql
Hi, hackers. I have a couple of questions about processes. I found the EXEC_BACKEND flag while reading the code. As I understand it, it exists because we have to emulate fork() on WIN32, and it also allows debugging the same behavior on Linux. Is that right? Are there any other use cases?

Another question is about the "--fork" argument (see the code below). I didn't find it in the documentation, so I'm a bit confused. I wonder if it exists only for internal purposes?

#ifdef EXEC_BACKEND
    if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
        SubPostmasterMain(argc, argv);  /* does not return */
#endif

And last, but not least: do we have any presentations/articles/READMEs/whatever about the caches (src/backend/utils/cache/*) in postgresql? I found nothing besides the comments in the code. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
Attached version has a fix of pg_dump suggested by Stephen Frost in the -committers thread. http://postgresql.nabble.com/pgsql-CREATE-INDEX-INCLUDING-column-td5897653.html Sooner or later, I'd like to see this patch finished. For now, it has two outstanding complaints: - support of expressions as included columns. Frankly, I don't understand why it's a problem of this patch. The patch is already big enough, and it will be much easier to add expression support in a following patch, after the first one is stable. I wonder if someone has objections to that? Yes, it's a kind of delayed feature. But should we wait for every patch until it is entirely complete? - lack of review and testing. Obviously, I did as much testing as I could. So, if reviewers have any concerns about the patch, I'm looking forward to seeing them. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 9c8e308..891325d 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); static void deleteConnection(const char *name); -static char **get_pkey_attnames(Relation rel, int16 *numatts); +static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts); static char **get_text_array_contents(ArrayType *array, int *numitems); static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals); static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals); @@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey); Datum dblink_get_pkey(PG_FUNCTION_ARGS) { - int16 numatts; + int16 indnkeyatts; char **results; FuncCallContext *funcctx; int32 call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS) rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT); /* get the array of attnums */ - results = get_pkey_attnames(rel, &numatts); + results = get_pkey_attnames(rel, &indnkeyatts); relation_close(rel, AccessShareLock); @@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS) attinmeta = TupleDescGetAttInMetadata(tupdesc); funcctx->attinmeta = attinmeta; - if ((results != NULL) && (numatts > 0)) + if ((results != NULL) && (indnkeyatts > 0)) { - funcctx->max_calls = numatts; + funcctx->max_calls = indnkeyatts; /* got results, keep track of them */ funcctx->user_fctx = results; @@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS) * get_pkey_attnames * * Get the primary key attnames for the given relation. - * Return NULL, and set numatts = 0, if no primary key exists. + * Return NULL, and set indnkeyatts = 0, if no primary key exists. */ static char ** -get_pkey_attnames(Relation rel, int16 *numatts) +get_pkey_attnames(Relation rel, int16 *indnkeyatts) { Relation indexRelation; ScanKeyData skey; @@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts) char **result = NULL; TupleDesc tupdesc; - /* initialize numatts to 0 in case no primary key exists */ - *numatts = 0; + /* initialize indnkeyatts to 0 in case no primary key exists */ + *indnkeyatts = 0; tupdesc = rel->rd_att; @@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts) /* we're only interested if it is the primary key */ if (index->indisprimary) { - *numatts = index->indnatts; - if (*numatts > 0) + *indnkeyatts = index->indnkeyatts; + if (*indnkeyatts > 0) { -result = (char **) palloc(*numatts * sizeof(char *)); +result = (char **) palloc(*indnkeyatts * sizeof(char *)); -for (i = 0; i < *numatts; i++) +for (i = 0; i < *indnkeyatts; i++) result[i] = SPI_fname(tupdesc, index->indkey.values[i]); } break; diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c index 7352b29..142730a 100644 --- a/contrib/tcn/tcn.c +++
b/contrib/tcn/tcn.c @@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS) /* we're only interested if it is the primary key and valid */ if (index->indisprimary && IndexIsValid(index)) { - int numatts = index->indnatts; + int indnkeyatts = index->indnkeyatts; - if (numatts > 0) + if (indnkeyatts > 0) { int i; @@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS) appendStringInfoCharMacro(payload, ','); appendStringInfoCharMacro(payload, operation); -for (i = 0; i < numatts; i++) +for (i = 0; i < indnkeyatts; i++) { int colno = index->indkey.values[i]; diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index d6b60db..3
Re: [HACKERS] WIP: Covering + unique indexes.
08.04.2016 15:45, Anastasia Lubennikova: 08.04.2016 15:06, Teodor Sigaev: On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote: Personally, I like documenting assertions, and will sometimes write assertions that the compiler could easily optimize away. Maybe going *that* far is more a matter of personal style, but I think an assertion about the new index tuple size being <= the old one is just a good idea. It's not about a problem in your code at all. You should make index_truncate_tuple()/index_reform_tuple() promise to always do this in its comments/contract with caller as part of this, IMV. Some notices: - index_truncate_tuple(Relation idxrel, IndexTuple olditup, int indnatts, int indnkeyatts) Why do we need indnatts/indnkeyatts? They are present in the idxrel struct already. - Following the code where index_truncate_tuple() is called, it should never be called in the case where indnatts == indnkeyatts. So, indnkeyatts should be strictly less than indnatts; pls, change the assertion. If they are equal, this function becomes a complicated variant of CopyIndexTuple(). Good point. These attributes seem to have been there since previous versions of the function. But now they are definitely unnecessary. Updated patch is attached. One more improvement: a note about expressions in the documentation. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index b5f67af..61a21a9 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -161,6 +161,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] INCLUDING clause, which can slightly reduce the size of the index, due to storing included attributes only in leaf index pages. Currently, only the B-tree access method supports this feature. +Expressions as included columns are not supported since they cannot be used +in index-only scan.
Re: [HACKERS] amcheck (B-Tree integrity checking tool)
ake it as simple as possible despite significant effort. It's hard to describe these things in an accessible way. Maybe someone can try and understand what I've said there, and let me know how I might have fallen short. I have used a new term, "cousin page", in this explanation (cf. sibling page). This seems like useful terminology that makes these difficult explanations a bit more accessible. I'm not an expert in btree locking races, but I tested the patch on different workloads and it works as promised. I didn't find mistakes in the code logic either. As a reviewer, I don't have any objections to marking it "Ready for Committer". The only note I'd like to add is about its compatibility with covering indexes. We already discussed it in the thread https://commitfest.postgresql.org/9/433/ The tiny attached patch fixes this issue. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c index 66d1f4f..91b2bc5 100644 --- a/contrib/amcheck/amcheck.c +++ b/contrib/amcheck/amcheck.c @@ -1118,6 +1118,9 @@ invariant_key_less_than_equal_offset(BtreeCheckState *state, ScanKey key, int16 natts = state->rel->rd_rel->relnatts; int32 cmp; +#if (PG_VERSION_NUM >= 90600) + natts = state->rel->rd_index->indnkeyatts; +#endif cmp = _bt_compare(state->rel, natts, key, state->target, upperbound); return cmp <= 0; @@ -1139,6 +1142,9 @@ invariant_key_greater_than_equal_offset(BtreeCheckState *state, ScanKey key, int16 natts = state->rel->rd_rel->relnatts; int32 cmp; +#if (PG_VERSION_NUM >= 90600) + natts = state->rel->rd_index->indnkeyatts; +#endif cmp = _bt_compare(state->rel, natts, key, state->target, lowerbound); return cmp >= 0; @@ -1164,6 +1170,9 @@ invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state, int16 natts = state->rel->rd_rel->relnatts; int32 cmp; +#if (PG_VERSION_NUM >= 90600) + natts = state->rel->rd_index->indnkeyatts; +#endif cmp =
_bt_compare(state->rel, natts, key, nontarget, upperbound); return cmp <= 0;
Re: [HACKERS] WIP: Covering + unique indexes.
08.04.2016 15:06, Teodor Sigaev: On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote: Personally, I like documenting assertions, and will sometimes write assertions that the compiler could easily optimize away. Maybe going *that* far is more a matter of personal style, but I think an assertion about the new index tuple size being <= the old one is just a good idea. It's not about a problem in your code at all. You should make index_truncate_tuple()/index_reform_tuple() promise to always do this in its comments/contract with caller as part of this, IMV. Some notices: - index_truncate_tuple(Relation idxrel, IndexTuple olditup, int indnatts, int indnkeyatts) Why do we need indnatts/indnkeyatts? They are present in the idxrel struct already. - Following the code where index_truncate_tuple() is called, it should never be called in the case where indnatts == indnkeyatts. So, indnkeyatts should be strictly less than indnatts; pls, change the assertion. If they are equal, this function becomes a complicated variant of CopyIndexTuple(). Good point. These attributes seem to have been there since previous versions of the function. But now they are definitely unnecessary.
Updated patch is attached -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
06.04.2016 23:52, Peter Geoghegan: On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote: Personally, I like documenting assertions, and will sometimes write assertions that the compiler could easily optimize away. Maybe going *that* far is more a matter of personal style, but I think an assertion about the new index tuple size being <= the old one is just a good idea. It's not about a problem in your code at all. You should make index_truncate_tuple()/index_reform_tuple() promise to always do this in its comments/contract with caller as part of this, IMV. The mentioned issues are fixed. The patch is attached. I'd like to remind you that the commitfest will be closed very soon, so I'd like to get your final resolution on the patch. Not having it in the 9.6 release would be very disappointing. I agree that b-tree is a crucial subsystem. But it seems to me that we lack improvements in this area not only because of the algorithm's complexity, but also because of a lack of enthusiasts willing to work on it and struggle through endless discussions. But that's off-topic here. Attention to these development difficulties will be one of the messages of my pgcon talk. You know, we lost a lot of time discussing various b-tree problems. Besides that, I am sure that the patch is really in good shape. It doesn't have any open problems left to fix. And possible subtle bugs can be found at the testing stage of the release. Looking forward to your reply.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
06.04.2016 16:15, Anastasia Lubennikova: 06.04.2016 03:05, Peter Geoghegan: * There is some stray whitespace within RelationGetIndexAttrBitmap(). I think you should have updated it with code, though. I don't think it's necessary for HOT updates to work, but I think it could be necessary so that we don't need to get a row lock that blocks non-conflict foreign key locking (see heap_update() callers). I think you need to be careful for non-key columns within the loop in RelationGetIndexAttrBitmap(), basically, because it seems to still go through all columns. UPSERT also must call this code, FWIW. * I think that a similar omission is also made for the replica identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed on how this patch interacts with logical decoding, I guess. Good point. Indexes are everywhere in the code. I missed that RelationGetIndexAttrBitmap() is used not only for REINDEX. I'll discuss it with Teodor and send an updated patch tomorrow. As promised, the updated patch is attached. But I'm not an expert in this area, so it needs a critical look.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
b; ==12310== Invalid read of size 4 ==12310==at 0x656615: match_clause_to_indexcol (indxpath.c:2226) ==12310==by 0x656615: match_clause_to_index (indxpath.c:2144) ==12310==by 0x656DBC: match_clauses_to_index (indxpath.c:2115) ==12310==by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026) ==12310==by 0x658054: create_index_paths (indxpath.c:269) ==12310==by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649) ==12310==by 0x64D1DB: set_rel_pathlist (allpaths.c:427) ==12310==by 0x64D93B: set_base_rel_pathlists (allpaths.c:299) ==12310==by 0x64D93B: make_one_rel (allpaths.c:170) ==12310==by 0x66876C: query_planner (planmain.c:246) ==12310==by 0x669FBA: grouping_planner (planner.c:1666) ==12310==by 0x66D0C9: subquery_planner (planner.c:751) ==12310==by 0x66D3DA: standard_planner (planner.c:300) ==12310==by 0x66D714: planner (planner.c:170) ==12310==by 0x6FD692: pg_plan_query (postgres.c:798) ==12310==by 0x59082D: ExplainOneQuery (explain.c:350) ==12310== Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd ==12310==at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==12310==by 0x81B7FA: AllocSetAlloc (aset.c:853) ==12310==by 0x81D257: palloc (mcxt.c:907) ==12310==by 0x4B6F65: RelationGetIndexScan (genam.c:94) ==12310==by 0x4C135D: btbeginscan (nbtree.c:431) ==12310==by 0x4B7A5C: index_beginscan_internal (indexam.c:279) ==12310==by 0x4B7C5A: index_beginscan (indexam.c:222) ==12310==by 0x4B73D1: systable_beginscan (genam.c:379) ==12310==by 0x7E8CF9: ScanPgRelation (relcache.c:341) ==12310==by 0x7EB3C4: RelationBuildDesc (relcache.c:951) ==12310==by 0x7ECD35: RelationIdGetRelation (relcache.c:1800) ==12310==by 0x4A4D37: relation_open (heapam.c:1118) ==12310== { Memcheck:Addr4 fun:match_clause_to_indexcol fun:match_clause_to_index fun:match_clauses_to_index fun:match_restriction_clauses_to_index fun:create_index_paths fun:set_plain_rel_pathlist fun:set_rel_pathlist fun:set_base_rel_pathlists fun:make_one_rel 
   fun:query_planner
   fun:grouping_planner
   fun:subquery_planner
   fun:standard_planner
   fun:planner
   fun:pg_plan_query
   fun:ExplainOneQuery
}

Separately, I tried "make installcheck-tests TESTS=index_including" from Postgres + Valgrind, with Valgrind's --track-origins option enabled (as it was above). I recommend installing Valgrind, and making sure that the patch shows no errors. I didn't actually find a Valgrind issue from just using your regression tests (nor did I find an issue from separately running the regression tests with CLOBBER_CACHE_ALWAYS, FWIW).

Thank you for the advice. This was another spot where index->ncolumns needed to be replaced with index->nkeycolumns, in match_clause_to_index. Fixed. I also fixed a couple of typos in the documentation. Thank you again for the detailed review.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 	rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock,
							   ACL_SELECT);

 	/* get the array of attnums */
-	results = get_pkey_attnames(rel, &numatts);
+	results = get_pkey_attnames(rel, &indnkeyatts);

 	relation_close(rel, AccessShareLock);
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary
Re: [HACKERS] WIP: Covering + unique indexes.
05.04.2016 01:48, Peter Geoghegan: On Mon, Mar 21, 2016 at 9:53 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: It's a bit more complicated to add it to the index creation algorithm. There's a trick with a "high key".

/*
 * We copy the last item on the page into the new page, and then
 * rearrange the old page so that the 'last item' becomes its high key
 * rather than a true data item. There had better be at least two
 * items on the page already, else the page would be empty of useful
 * data.
 */
/*
 * Move 'last' into the high key position on opage
 */

To be consistent with the other steps of the algorithm (all high keys must be truncated tuples), I had to update this high key in place: delete the old one, and insert a truncated high key. Hmm. But the high key comparing equal to the Scankey gives insertion the choice of where to put its IndexTuple (it can go on the page with the high key, or its right-sibling, according only to considerations about fillfactor, etc). Is this changed? Does it not matter? Why not? Is it just not worth it? I would say this is changed, but it doesn't matter. Performing any search in the btree (including choosing a suitable page for insertion), we use only key attributes. We assume that included columns are stored in the index unordered. A simple example: create table tbl(id int, data int); create index idx on tbl (id) including (data); A SELECT query does not consider included columns in the scan key. It selects all tuples satisfying the condition on the key column, and only after that applies a filter to remove wrong rows from the result. If the key attribute doesn't satisfy the query condition, there are no more tuples to return and we can interrupt the scan. You can find more explanations in the attached SQL script, which contains queries to receive detailed information about the index structure using pageinspect. The right-most page on every level has no high key.
But you say those pages have an "imaginary" *positive* infinity high key, just as internal pages have (non-imaginary) minus infinity downlinks as their first item/downlink. So tuples in a (say) leaf page are always bound by the downlink lower bound in the parent, while their own high key is an upper bound. Either (and, rarely, both) could be (positive or negative) infinity. Maybe you now see why I talked about special _bt_compare() logic for this. I proposed special logic that is similar to the existing minus infinity thing _bt_compare() does (although _bt_binsrch(), an important caller of _bt_compare(), also does special things for the internal vs. leaf case, so I'm not sure any new special logic must go in _bt_compare()). In my view, it's the correct way to fix this problem, because the caller is responsible for passing proper arguments to the function. Of course I will add a check into _bt_compare(), but I'd rather make it an assertion (see the patch attached). I see what you mean, but I think we need to decide what to do about the key space when leaf high keys are truncated. I do think that truncating the high key was the right idea, though, and it nicely illustrates that nothing special should happen in upper levels. Suffix truncation should only happen when leaf pages are split, generally speaking. As I said, the high key is very similar to the downlinks, in that both bound the things that go on each page. Together with downlinks they represent *discrete* ranges *unambiguously*, so INCLUDING truncation needs to make it clear which page new items go on. As I said, _bt_binsrch() already takes special actions for internal pages, making sure to return the item that is < the scankey, not <= the scankey, which is only allowed for leaf pages. (See README, from "Lehman and Yao assume that the key range for a subtree S is described by Ki < v <= Ki+1 where Ki and Ki+1 are the adjacent keys in the parent page...").
To give a specific example, I worry about the case where two sibling downlinks in a parent page are distinct, but, per the specific-to-Postgres "Ki <= v <= Ki+1" thing (which differs from the classic L&Y invariant), some tuples whose attributes all match the right downlink end up in the left child page, not the right child page. I worry that since _bt_findsplitloc() doesn't consider this (for example), the split point doesn't *reliably* and unambiguously divide the key space between the new halves of a page being split. I think the "Ki <= v <= Ki+1"/_bt_binsrch() thing might save you in common cases where all downlink attributes are distinct, so maybe that simpler case is okay. But to be even more specific, what about the more complicated case where the downlinks *are* fully _bt_compare()-wise equal? This could happen, even though they're constrained to be unique in leaf pages, due to bloat. Unique indexes aren't special here; they just make it far less likely that this w
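The key-only search with included-column post-filtering that Anastasia describes above can be sketched outside PostgreSQL like this. This is a toy model only, not btree code: all names are invented, an "index" is just a sorted list of (key, included) pairs, and the point is simply that the descent and the stop condition look at the key column alone, while the included column is checked afterwards as a filter.

```python
import bisect

def covering_index_scan(index, key, include_filter):
    """Toy covering-index scan: ordered search on the key column only,
    then a post-filter on the (unordered) included column."""
    keys = [k for k, _ in index]
    lo = bisect.bisect_left(keys, key)      # descend using key attributes only
    results = []
    for k, included in index[lo:]:
        if k != key:
            break                           # key no longer matches: stop the scan
        if include_filter(included):
            results.append((k, included))   # filter applied to the included column
    return results

idx = sorted([(1, 'a'), (2, 'x'), (2, 'y'), (3, 'b')])
print(covering_index_scan(idx, 2, lambda d: d == 'y'))  # -> [(2, 'y')]
```

Note that the filter never influences where the scan starts or stops, which is why included attributes need not participate in the page-level key space at all.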
Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)
17.03.2016 06:27, Vitaly Burovoy: On 2016-03-15, David Steele <da...@pgmasters.net> wrote: On 3/4/16 2:56 PM, Vitaly Burovoy wrote: On 3/4/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: I think that you should update the documentation. At least the description of epoch on this page: http://www.postgresql.org/docs/devel/static/functions-datetime.html Thank you very much for pointing out where it is located (I saw only "to_timestamp(TEXT, TEXT)"). I'll think about how to update it. Vitaly, have you decided how to update this yet? Yes, there are three options: * remove the mention of how to get timestamptz from a UNIX stamp; * leave a note on how to get timestamptz and add a note that such encapsulation existed prior to 9.6; * replace it with the proposed current behavior (without interval). I decided to implement the third option (but the result there has a time zone, which can differ, so the result may not be exactly the same for a particular user). If a committer decides to do it differently, he is free to choose one from the list above or to do something else. 3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice abbreviation, but it seems slightly confusing to me. It doesn't matter to me what it is called; it is short enough and reflects the type it is applied to. What would the best name be for it? Anastasia, any suggestions for a better name, or just leave it as is? I'm not in favor of the "4", either. I think I would prefer JULIAN_MAXYEAR_STAMP. It turns out that Tom has changed almost one third of timestamp.h, and now the constant is named "TIMESTAMP_END_JULIAN". I've rebased the patch onto the current master (5db5146) and changed it according to the new timestamp.h. Now it passes with both versions (integer and float timestamps). I deleted the test for the upper boundary for integer timestamps and tweaked the comments a little. I decided to delete the hints about the minimum and maximum allowed values because no other type has such a hint. Please find attached a new version of the patch.
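The behavior under discussion — passing ±Infinity through to_timestamp(float8) — can be modeled as follows. This is a hedged sketch only: the real implementation works on PostgreSQL's internal timestamp representation, and the function name and epoch constant below are illustrative assumptions, not the patch's code. The one fact it leans on is that 2000-01-01 (PostgreSQL's internal epoch) is 946,684,800 seconds after the Unix epoch.

```python
import math

# Assumed constant: seconds between 1970-01-01 and 2000-01-01 (30*365 + 7 leap days) * 86400.
UNIX_TO_POSTGRES_EPOCH = 946684800.0

def to_timestamp_float(seconds):
    """Toy model of to_timestamp(float8) with infinity support."""
    if math.isnan(seconds):
        raise ValueError("timestamp cannot be NaN")
    if math.isinf(seconds):
        # +/-Infinity passes straight through as the special infinite timestamp
        return seconds
    # finite input: shift from the Unix epoch to the internal 2000-01-01 epoch
    return seconds - UNIX_TO_POSTGRES_EPOCH
```

The interesting part is only the two special-case branches: infinities map to infinite timestamps instead of raising an out-of-range error, while NaN remains invalid.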
I think I should write something as a reviewer. I read the patch again and I don't see any issues with it. It applies to master and works as expected. So I'll mark it "Ready for Committer". -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
25.03.2016 01:12, Peter Geoghegan: On Thu, Mar 24, 2016 at 7:17 AM, Robert Haas <robertmh...@gmail.com> wrote: I really like this idea, and the performance results seem impressive, but I think we should push this out to 9.7. A btree patch that didn't have WAL support until two and a half weeks into the final CommitFest just doesn't seem to me like a good candidate. First, as a general matter, if a patch isn't code-complete at the start of a CommitFest, it's reasonable to say that it should be reviewed but not necessarily committed in that CommitFest. You're right. Frankly, I thought that someone would help me with the patch, but I had to finish it myself. *off-topic* I wonder if we could add a new flag to the commitfest, something like "Needs assistance", which would be used to mark big and complicated patches in progress, while "Needs review" would mean that the patch is almost ready and only requires a final review. This patch has had some review, but I'm not sure how deep that review is, and I think it's had no code review at all of the WAL logging changes, which were submitted only a week ago, well after the CF deadline. Second, the btree AM is a particularly poor place to introduce possibly destabilizing changes. Everybody depends on it, all the time, for everything. And despite new tools like amcheck, it's not a particularly easy thing to debug. Regrettably, I must agree. I don't see a plausible path to commit for this patch in the ongoing CF. I think that Anastasia did an excellent job here, and I wish I could have been of greater help sooner. Nevertheless, it would be unwise to commit this given the maturity of the code. There have been very few instances of performance improvements to the B-Tree code for as long as I've been interested, because it's so hard, and the standard is so high.
The only examples I can think of from the last few years are Kevin's commit 2ed5b87f96 and Tom's commit 1a77f8b63d, both of which were far less invasive, and Simon's commit c7111d11b1, which we just outright reverted from 9.5 due to subtle bugs (and even that was significantly less invasive than this patch). Improving nbtree is something that requires several rounds of expert review, and that's something that's in short supply for the B-Tree code in particular. I think that a new testing strategy is needed to make this easier, and I hope to get that going with amcheck. I need help with formalizing a "testing first" approach for improving the B-Tree code, because I think it's the only way that we can move forward with projects like this. It's *incredibly* hard to push forward patches like this given our current, limited testing strategy. Unfortunately, I must agree. This patch seems unlikely to reach a final version before the feature freeze. I'll move it to the future commitfest. Anyway, it means that now we have more time to improve the patch. If you have any ideas related to this patch, like prefix/suffix compression, I'll be glad to discuss them. The same goes for any other ideas for B-tree optimization. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
19.03.2016 08:00, Peter Geoghegan: On Fri, Mar 18, 2016 at 5:15 AM, David Steele <da...@pgmasters.net> wrote: It looks like this patch should be marked "needs review" and I have done so. Uh, no it shouldn't. I've posted an extensive review on the original design thread. See CF entry: https://commitfest.postgresql.org/9/433/ Marked "Waiting on Author". Thanks to David; I'd missed these letters at first. I'll answer here.

* You truncate (remove suffix attributes -- the "included" attributes) within _bt_insertonpg():

- right_item = CopyIndexTuple(item);
+ indnatts = IndexRelationGetNumberOfAttributes(rel);
+ indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
+ if (indnatts != indnkeyatts)
+ {
+     right_item = index_reform_tuple(rel, item, indnatts, indnkeyatts);
+     right_item_sz = IndexTupleDSize(*right_item);
+     right_item_sz = MAXALIGN(right_item_sz);
+ }
+ else
+     right_item = CopyIndexTuple(item);

  ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY);

I suggest that you do this within _bt_insert_parent(), instead, iff the original target page is known to be a leaf page. That's where it needs to happen for conventional suffix truncation, which has special considerations when determining which attributes are safe to truncate (or even which byte in the first distinguishing attribute it is okay to truncate past). I agree that _bt_insertonpg() is not the right place for truncation. Furthermore, I've noticed that all internal keys are simply copies of the "high keys" from the leaf pages, which is pretty logical. Therefore, if we have already truncated the tuple when it became a high key, we do not need the same truncation within _bt_insert_parent() or any other function. So the only thing to worry about is the high-key truncation. I rewrote the code. Now only _bt_split() cares about truncation. It's a bit more complicated to add it to the index creation algorithm. There's a trick with a "high key".
/*
 * We copy the last item on the page into the new page, and then
 * rearrange the old page so that the 'last item' becomes its high key
 * rather than a true data item. There had better be at least two
 * items on the page already, else the page would be empty of useful
 * data.
 */
/*
 * Move 'last' into the high key position on opage
 */

To be consistent with the other steps of the algorithm (all high keys must be truncated tuples), I had to update this high key in place: delete the old one, and insert a truncated high key. The very same logic I use to truncate the posting list of a compressed tuple in the "btree_compression" patch. [1] I hope both patches will be accepted, and then I'll thoroughly merge them.

* I think the comparison logic may have a bug. Does this work with amcheck? Maybe it works with bt_index_check(), but not bt_index_parent_check()? I think that you need to make sure that _bt_compare() knows about this, too. That's because it isn't good enough to let a truncated internal IndexTuple compare equal to a scankey when non-truncated attributes are equal.

It is a very important issue. But I don't think there's a bug there. I've read the amcheck sources thoroughly and found that the problem appears in invariant_key_less_than_equal_nontarget_offset():

static bool
invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
							Page nontarget, ScanKey key, OffsetNumber upperbound)
{
	int16	natts = state->rel->rd_rel->relnatts;
	int32	cmp;

	cmp = _bt_compare(state->rel, natts, key, nontarget, upperbound);

	return cmp <= 0;
}

It uses a scankey, made with _bt_mkscankey(), which uses only key attributes, but calls _bt_compare() with the wrong keysz. If we use nkeyatts = state->rel->rd_index->relnatts; instead of natts, all the checks pass successfully. Same for invariant_key_greater_than_equal_offset() and invariant_key_less_than_equal_nontarget_offset().
In my view, it's the correct way to fix this problem, because the caller is responsible for passing proper arguments to the function. Of course I will add a check into _bt_compare(), but I'd rather make it an assertion (see the patch attached). I'll add a flag to distinguish regular and truncated tuples, but it will not be used in this patch. Please comment if I've missed something. As you've already mentioned, neither high keys nor tuples on internal pages use "itup->t_tid.ip_posid", so I'll take one bit of it. It will definitely require changes in future work on suffix truncation or something like that, but IMHO for now it's enough. Do you have any objections or comments? [1] https://commitfest.postgresql.org/9/494/ -- Anastasia Lubennikova Pos
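The keysz point above — that a scan key built from key attributes only must also be *compared* using only that many attributes — can be shown with a toy model. This is hypothetical illustration code, not the real _bt_compare(): tuples are plain Python tuples, and a truncated high key simply lacks the trailing included attributes.

```python
def bt_compare_sketch(scankey, itup, keysz):
    """Compare a scan key to an index tuple using only the first `keysz`
    (key) attributes, so truncated tuples without INCLUDING attributes
    can still be compared. Toy model of the keysz contract."""
    for i in range(keysz):
        if scankey[i] < itup[i]:
            return -1
        if scankey[i] > itup[i]:
            return 1
    return 0   # equal on all key attributes; included attributes are ignored

# A truncated high key (5,) compares equal to the key part of a full tuple:
print(bt_compare_sketch((5,), (5, 'included-data'), keysz=1))
```

Passing a keysz that counts the included attributes (the bug described above) would make the loop read past the end of a truncated tuple — which is exactly why the caller, not the compare function, must supply the number of *key* attributes.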
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Please find the new version of the patch attached. Now it has WAL functionality. A detailed description of the feature can be found in the README draft: https://goo.gl/50O8Q0 This patch is pretty complicated, so I ask everyone who is interested in this feature to help with reviewing and testing it. I will be grateful for any feedback. But please don't complain about code style; it is still work in progress.

Next things I'm going to do:
1. More debugging and testing. I'm going to attach a couple of SQL scripts for testing in my next message.
2. Fix NULLs processing.
3. Add a flag to pg_index that allows enabling/disabling compression for each particular index.
4. Recheck locking considerations. I tried to write the code to be as non-invasive as possible, but we need to make sure that the algorithm is still correct.
5. Change BTMaxItemSize.
6. Bring back microvacuum functionality.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..72acc0f 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,8 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
+#include "catalog/catalog.h"
+#include "utils/datum.h"

 typedef struct
 {
@@ -82,6 +84,7 @@
 static bool _bt_pgaddtup(Page page, Size itemsize, IndexTuple itup, OffsetNumber itup_off);
 static bool _bt_isequal(TupleDesc itupdesc, Page page, OffsetNumber offnum, int keysz, ScanKey scankey);
+
 static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
@@ -113,6 +116,11 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	Page		page;
+	TupleDesc	itupdesc;
+	int			nipd;
+	IndexTuple	olditup;
+	Size		sizetoadd;

 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
@@ -190,6 +198,7 @@ top:
 	if (checkUnique != UNIQUE_CHECK_EXISTING)
 	{
+		bool	updposting = false;
 		/*
 		 * The only conflict predicate locking cares about for indexes is when
 		 * an index tuple insert conflicts with an existing lock. Since the
@@ -201,7 +210,42 @@ top:
 		/* do the insertion */
 		_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup, stack, heapRel);
-		_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
+
+		/*
+		 * Decide, whether we can apply compression
+		 */
+		page = BufferGetPage(buf);
+
+		if (!IsSystemRelation(rel)
+			&& !rel->rd_index->indisunique
+			&& offset != InvalidOffsetNumber
+			&& offset <= PageGetMaxOffsetNumber(page))
+		{
+			itupdesc = RelationGetDescr(rel);
+			sizetoadd = sizeof(ItemPointerData);
+			olditup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+
+			if (_bt_isbinaryequal(itupdesc, olditup,
+								  rel->rd_index->indnatts, itup))
+			{
+				if (!BtreeTupleIsPosting(olditup))
+				{
+					nipd = 1;
+					sizetoadd = sizetoadd * 2;
+				}
+				else
+					nipd = BtreeGetNPosting(olditup);
+
+				if ((IndexTupleSize(olditup) + sizetoadd) <= BTMaxItemSize(page)
+					&& PageGetFreeSpace(page) > sizetoadd)
+					updposting = true;
+			}
+		}
+
+		if (updposting)
+			_bt_pgupdtup(rel, buf, offset, itup, olditup, nipd);
+		else
+			_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
 	}
 	else
 	{
@@ -1042,6 +1086,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 	itemid = PageGetItemId(origpage, P_HIKEY);
 	itemsz = ItemIdGetLength(itemid);
 	item = (IndexTuple) PageGetItem(origpage, itemid);
+
 	if (PageAddItem(rightpage, (Item) item, itemsz, rightoff, false, false) == InvalidOffsetNumber)
 	{
@@ -1072,13 +1117,39 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 		itemsz = ItemIdGetLength(itemid);
 		item = (IndexTuple) PageGetItem(origpage, itemid);
 	}
-	if (PageAddItem(leftpage, (Item) item, itemsz, leftoff,
+
+	if (BtreeTupleIsPosting(item))
+	{
+		Size		hikeysize = BtreeGetPostingOffset(item);
+		IndexTuple	hikey = palloc0(hikeysize);
+
+		/* Truncate posting before insert it as a hikey.
		 */
+		memcpy(hikey, item, hikeysize);
+		hikey->t_info &= ~INDEX_SIZE_MASK;
+		hikey->t_info |= hikeysize;
+		ItemPointerSet(&(hikey->t_tid), origpagenumber, P_HIKEY);
+
+		if (PageAddItem(leftpage, (Item) hikey, hikeysize, leftoff, false, false) == InvalidOffsetNumber)
+		{
+			memset(rightpage, 0, BufferGetPageSize(rbuf));
+			elog(ERROR, "failed to add hikey to the left sibling"
+				 " while splitting block %u of index \"%s\"",
+				 origpagenumber, RelationGetRelationName(rel));
+		}
+
+		pfree(hikey);
+	}
+	else
 	{
-		memset(rightpage, 0, BufferGetPageSize(rbuf));
-		elog(ERROR, "failed to add hikey to the left siblin
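The insertion path in the patch above appends a duplicate's heap TID to an existing posting list when the enlarged item still fits on the page; otherwise it falls back to an ordinary insertion. A toy Python model of that decision (all names and the size limit are stand-ins, not the real nbtree code — a "page" here is just a dict from key to a sorted list of TIDs):

```python
MAX_TIDS_PER_ITEM = 4   # stand-in for the BTMaxItemSize / free-space check

def posting_insert(page, key, tid):
    """Toy posting-list insert: merge a duplicate key's TID into the
    existing item when it fits, else report failure so the caller can
    do a normal insertion (e.g. after a page split)."""
    entry = page.get(key)
    if entry is None:
        page[key] = [tid]          # first occurrence: ordinary item
        return True
    if len(entry) < MAX_TIDS_PER_ITEM:
        entry.append(tid)          # duplicate: grow the posting list
        entry.sort()               # keep TIDs ordered within the item
        return True
    return False                   # item would exceed the size limit

page = {}
for tid in [(1, 3), (1, 1), (2, 7)]:
    posting_insert(page, 'dup', tid)
print(page['dup'])  # -> [(1, 1), (1, 3), (2, 7)]
```

The space saving comes from storing the key bytes once per distinct value instead of once per duplicate, at the cost of the extra bookkeeping visible in the split path above (a posting-list item must be truncated back to a plain key before it can serve as a high key).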
Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)
15.03.2016 22:28, David Steele: On 3/4/16 2:56 PM, Vitaly Burovoy wrote: On 3/4/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: I think that you should update the documentation. At least the description of epoch on this page: http://www.postgresql.org/docs/devel/static/functions-datetime.html Thank you very much for pointing out where it is located (I saw only "to_timestamp(TEXT, TEXT)"). I'll think about how to update it. Vitaly, have you decided how to update this yet? 3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice abbreviation, but it seems slightly confusing to me. It doesn't matter to me what it is called; it is short enough and reflects the type it is applied to. What would the best name be for it? Anastasia, any suggestions for a better name, or just leave it as is? I'm not in favor of the "4", either. I think I would prefer JULIAN_MAXYEAR_STAMP. This point is related to another patch: https://commitfest.postgresql.org/9/540/. It was added to this patch just for compatibility. Unless Tom changes the name of the macro there, I don't see any reason why we should change it in this patch. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
14.03.2016 16:02, David Steele: Hi Anastasia, On 2/18/16 12:29 PM, Anastasia Lubennikova wrote: 18.02.2016 20:18, Anastasia Lubennikova: 04.02.2016 20:16, Peter Geoghegan: On Fri, Jan 29, 2016 at 8:50 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: I fixed it in the new version (attached). Thank you for the review. At last, there is a new patch version 3.0. After some refactoring it looks much better. I described all the details of the compression in this document https://goo.gl/50O8Q0 (the same text without pictures is attached in btc_readme_1.0.txt). Consider it a rough copy of the README. It contains some notes about tricky moments of the implementation and questions about future work. Please don't hesitate to comment on it. Sorry, the previous patch was dirty. A hotfix is attached. This looks like an extremely valuable optimization for btree indexes, but unfortunately it is not getting a lot of attention. It still applies cleanly for anyone interested in reviewing. Thank you for the attention. I would be indebted to any reviewers who can just try this patch on real data and workloads (except WAL for now). The B-tree code needs a great deal of testing. It's not clear to me that you answered all of Peter's questions in [1]. I understand that you've provided a README but it may not be clear if the answers are in there (and where). I addressed in the README all the points Peter asked about. But I see that it'd be better to answer directly. Thanks for the reminder; I'll do it tomorrow. Also, at the end of the README it says: 13. Xlog. TODO. Does that mean the patch is not yet complete? Yes, you're right. Frankly speaking, I assumed that someone would help me with that stuff, but now I have almost completed it. I'll send an updated patch in my next message. I'm still doubtful about some details of the patch. I mentioned them in the README (in bold type). But they are mostly about future improvements.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
15.03.2016 03:21, Vitaly Burovoy: On 3/14/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: 14.03.2016 16:23, David Steele: On 2/25/16 4:44 PM, Vitaly Burovoy wrote: Added to the commitfest 2016-03. [CF] https://commitfest.postgresql.org/9/540/ This looks like a fairly straight-forward bug fix (the size of the patch is deceptive because there are a lot of new tests included). It applies cleanly. Anastasia, I see you have signed up to review. Do you have an idea when you will get the chance to do that? Thanks, I've read the patch thoroughly and haven't found any problems. I think that the patch is in very good shape. It fixes a bug and has an excellent set of tests. There is an issue, mentioned in the thread above:

postgres=# select
postgres-#     to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
postgres-#    ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
postgres-#    ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
  to_char  |  to_char  |  to_char
-----------+-----------+-----------
 monday    | monday    | thursday
(1 row)

From 4714-12-28 BC and backwards, detection of where a week starts is broken (because that is the boundary of isoyears -4713 and -4712). Is it worth breaking the undocumented range, or should we leave it as is? But I suppose that the behavior of undocumented dates is not essential. I'm sorry... What should I do with the "Waiting on Author" state if you don't have complaints? I was going to set "Ready for Committer", but then I noticed the message from Mark Dilger and changed my mind. Now that you have answered him, I have no objections. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
14.03.2016 16:23, David Steele: On 2/25/16 4:44 PM, Vitaly Burovoy wrote: Added to the commitfest 2016-03. [CF] https://commitfest.postgresql.org/9/540/ This looks like a fairly straightforward bug fix (the size of the patch is deceptive because there are a lot of new tests included). It applies cleanly. Anastasia, I see you have signed up to review. Do you have an idea when you will get the chance to do that?

Thanks, I've read the patch thoroughly and haven't found any problems. I think that the patch is in very good shape. It fixes a bug and has an excellent set of tests. There is an issue, mentioned in the thread above:

postgres=# select
postgres-#  to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
postgres-#  ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
postgres-#  ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
 to_char | to_char | to_char
---------+---------+----------
 monday  | monday  | thursday
(1 row)

From 4714-12-28 BC and further into the past, detection of the day a week starts on is broken (because that is the boundary of isoyears -4713 and -4712). Is it worth breaking this undocumented range, or should we leave it as is? But I suppose that the behavior of undocumented dates is not essential.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
02.03.2016 08:50, Michael Paquier: On Wed, Mar 2, 2016 at 2:10 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: 01.03.2016 19:55, Anastasia Lubennikova: It is not the final version, because it breaks pg_dump for previous versions. I need some help from hackers here. pg_dump, line 5466: if (fout->remoteVersion >= 90400) What does 'remoteVersion' mean? And what is the right way to change it? Or does it change between releases? I guess that 90400 is for 9.4 and 80200 is for 8.2, but is it really so? That is totally new to me.

Yes, you got it. That's basically PG_VERSION_NUM as compiled on the server that has been queried, in this case the server from which a dump is taken. If you are changing the system catalog layer, you would need to provide a query at least equivalent to what has been done until now for your patch, then modify pg_dump as follows:

if (fout->remoteVersion >= 90600)
{
    query = my_new_query;
}
else if (fout->remoteVersion >= 90400)
{
    query = the existing 9.4 query
}
etc.

In short, you just need to add a new block so that remote servers 9.6 and newer will have their objects dumped correctly. pg_upgrade is a good way to check the validity of pg_dump, actually; this explains why some objects are not dropped in the regression tests. Perhaps you'd want to do the same with your patch if the current test coverage of pg_dump is not enough. I have not looked at your patch, so I cannot say for sure.

Thank you for the explanation. The new version of the patch implements pg_dump support properly. Documentation related to constraints is updated. I hope the patch is in good shape now.

Brief overview for reviewers: This patch allows unique indexes to be defined on one set of columns and to include another set of columns in the INCLUDING clause, on which uniqueness is not enforced. It allows more queries to benefit from using index-only scans. Currently, only the B-tree access method supports this feature.
Syntax example:

CREATE TABLE tbl (c1 int, c2 int, c3 box);
CREATE INDEX idx ON tbl (c1) INCLUDING (c2, c3);

In contrast to key columns (c1), included columns (c2, c3) are not used in index scan keys, neither in "search" scan keys nor in "insertion" scan keys. Included columns are stored only in leaf pages, which can help to slightly reduce index size. Hence, included columns do not require any opclass for the btree access method. As you can see from the example above, it's possible to add columns of type "box" to the index.

The most common use case for this feature is the combination of a UNIQUE or PRIMARY KEY constraint on columns (a,b) and a covering index on columns (a,b,c). So, there is a new syntax for constraints.

CREATE TABLE tblu (c1 int, c2 int, c3 box, UNIQUE (c1,c2) INCLUDING (c3));

The index created for this constraint contains three columns: "tblu_c1_c2_c3_key" UNIQUE CONSTRAINT, btree (c1, c2) INCLUDING (c3)

CREATE TABLE tblpk (c1 int, c2 int, c3 box, PRIMARY KEY (c1) INCLUDING (c3));

The index created for this constraint contains two columns. Note that the NOT NULL constraint, like the unique constraint, is applied only to the key column(s).

postgres=# \d tblpk
      Table "public.tblpk"
 Column |  Type   | Modifiers
--------+---------+-----------
 c1     | integer | not null
 c2     | integer |
 c3     | box     |
Indexes:
    "tblpk_pkey" PRIMARY KEY, btree (c1) INCLUDING (c3)

Same for ALTER TABLE statements:

CREATE TABLE tblpka (c1 int, c2 int, c3 box);
ALTER TABLE tblpka ADD PRIMARY KEY (c1) INCLUDING (c3);

pg_dump is updated and seems to work fine with this kind of index. I see only one problem left (maybe I've mentioned it before). Queries like this [1] must be rewritten, because after the catalog changes, i.indkey contains both key and included attrs. One more thing to do is some refactoring of names, since "indkey" looks really confusing to me. But that could be done as a separate patch [2].
[1] https://wiki.postgresql.org/wiki/Retrieve_primary_key_columns
[2] http://www.postgresql.org/message-id/56bb7788.30...@postgrespro.ru

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknuma
Re: [HACKERS] amcheck (B-Tree integrity checking tool)
fer, copy it into local memory allocated by palloc(), and then immediately release the buffer lock and drop the pin. This is the same in all instances. There is never more than one buffer lock or pin held at a time. We do, on the other hand, have a detailed rationale for why it's okay that we don't have an ExclusiveLock on the index relation for checks that span the key space of more than one page by following right links to compare items across sibling pages. This isn't the same thing as having an explicit interlock like a pin -- our interlock is one against *recycling* by vacuum, which is based on recentGlobalXmin. This rationale requires expert review.

I'm not an expert, but I promise to take a closer look at locking. I will send another email soon.

Performance
===========

Trying to keep the tool as simple as possible, while still making it do verification that is as useful as possible, was my priority here, not performance. Still, verification completes fairly quickly. Certainly, it takes far less time than having to REINDEX the index, and it doesn't need too much memory. I think that in practice most problems that can be detected by the B-Tree checker functions will be detected with the lighter variant.

I do not see any performance issues. I'm sure that if someone wants to check whether the index is corrupted, they can certainly wait a minute :) But I have some concerns about compatibility with my patches. I've tried to call bt_index_check() over my "including" patch [1] and caught a segfault.

LOG: server process (PID 31794) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: select bt_index_check('idx');

I do hope that my patch will be accepted in 9.6, so this conflict looks really bad. I think that the error is caused by changes in page layout. To save space, nonkey attributes are truncated when btree copies the indexed value into inner pages or into the high key. You can look at the index_reform_tuple() calls.
I wonder whether you can add some check of the catalog version to your module, or does this case require more changes?

With the second patch, which implements compression [2], amcheck causes another error.

postgres=# insert into tbl select 1 from generate_series(0,5);
INSERT 0 6
postgres=# SELECT * FROM bt_page_items('idx', 1);
 itemoffset |      ctid      | itemlen | nulls | vars | data
------------+----------------+---------+-------+------+------
          1 | (2147483664,6) |      56 | f     | f    | 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 02 00 00 00 00 00 03 00 00 00 00 00 04 00 00 00 00 00 05 00 00 00 00 00 06 00 00 00 00 00
postgres=# select bt_index_check('idx');
ERROR: cannot check index "idx"
DETAIL: index is not yet ready for insertions

But I'm sure that this is a problem of my patch. So I'll fix it and try again.

[1] https://commitfest.postgresql.org/9/433/
[2] https://commitfest.postgresql.org/9/494/

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)
27.02.2016 09:57, Vitaly Burovoy: Hello, Hackers! I worked on a patch [1] that allows "EXTRACT(epoch FROM +-Inf::timestamp[tz])" to return "+-Inf::float8". There is an opposite function, "to_timestamp(float8)", which is now defined as: SELECT ('epoch'::timestamptz + $1 * '1 second'::interval)

Hi, thank you for the patches. Could you explain whether they depend on each other?

Since intervals do not support infinity values, it is impossible to do something like: SELECT to_timestamp('infinity'::float8); ... which is not good. Support for such a conversion is in the TODO list [2] (under "converting between infinity timestamp and float8").

You mention intervals here, and the TODO item definitely talks about an 'infinity' interval, while the patch and all the following discussion concern timestamps. Is it a typo, or have I misunderstood something important? I assumed that the following query would work, but it doesn't. Could you clarify that? select to_timestamp('infinity'::interval);

The proposed patch implements it. There is another patch in the 2016-03 CF [3] that implements checking of timestamp[tz] for being in the allowed range. Since it is wise to set (fix) the upper boundary of timestamp[tz]s, I've included the file "src/include/datatype/timestamp.h" from there to check that an input value and a result are in the allowed range. There are no changes in the documentation because the allowed range is the same as the officially supported one [4] (i.e. until 294277 AD).

I think that you should update the documentation. At least the description of epoch on this page: http://www.postgresql.org/docs/devel/static/functions-datetime.html "Here is how you can convert an epoch value back to a time stamp: SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 982384720.12 * INTERVAL '1 second'; (The to_timestamp function encapsulates the above conversion.)"

More thoughts about the patch:

1. When I copy a value from the hints for the min and max values (see examples below), it works fine for min, while max still leads to an error.
It comes from the check "if (seconds >= epoch_ubound)". I wonder whether you should change the hint message?

select to_timestamp(-210866803200.00);
          to_timestamp
---------------------------------
 4714-11-24 02:30:17+02:30:17 BC
(1 row)

select to_timestamp(9224318016000.00);
ERROR: UNIX epoch out of range: "9224318016000.00"
HINT: Maximal UNIX epoch value is "9224318016000.00"

2. There is a comment about JULIAN_MAXYEAR inaccuracy in timestamp.h:

 * IS_VALID_JULIAN checks the minimum date exactly, but is a bit sloppy
 * about the maximum, since it's far enough out to not be especially
 * interesting.

Maybe you can expand it? Does JULIAN_MAXYEAR4STAMPS help to avoid overflow in all possible cases? And why do we need to keep both definitions? I suppose it's a matter of backward compatibility, isn't it?

3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice abbreviation, but it seems slightly confusing to me.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
29.02.2016 18:17, Anastasia Lubennikova: 25.02.2016 21:39, Jeff Janes: As promised, here's the new version of the patch, "including_columns_4.0". I fixed all issues except some points mentioned below.

Thanks for the updated patch. I get a compiler warning:

genam.c: In function 'BuildIndexValueDescription':
genam.c:259: warning: unused variable 'tupdesc'

Thank you for the notice, I'll fix it in the next update.

Also, I can't create a primary key INCLUDING columns directly:

jjanes=# create table foobar (a int, b int, c int);
jjanes=# alter table foobar add constraint foobar_pkey primary key (a,b) including (c);
ERROR: syntax error at or near "including"

But I can get there using a circuitous route:

jjanes=# create unique index on foobar (a,b) including (c);
jjanes=# alter table foobar add constraint foobar_pkey primary key using index foobar_a_b_c_idx;

The description of the table's index knows to include the including column:

jjanes=# \d foobar
     Table "public.foobar"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer | not null
 c      | integer |
Indexes:
    "foobar_pkey" PRIMARY KEY, btree (a, b) INCLUDING (c)

Since the machinery appears to all be in place to have primary keys with INCLUDING columns, it would be nice if the syntax for adding primary keys allowed one to create them directly. Is this something for future expansion, or could it be added at the same time as the main patch?

Good point. At a quick glance, this looks easy to implement. The only problem is that there are too many places in the code which must be updated. I'll try to do it, and if there are difficulties, it's fine with me to delay this feature to future work. I found one more thing to do: pg_dump does not handle included columns now. I will fix it in the next version of the patch.

As promised, the fixed patch is attached.
It allows the following statements to be performed:

create table utbl (a int, b box);
alter table utbl add unique (a) including(b);
create table ptbl (a int, b box);
alter table ptbl add primary key (a) including(b);

And now they can be dumped/restored successfully. I used the following commands:

pg_dump --verbose -Fc postgres -f pg.dump
pg_restore -d newdb pg.dump

It is not the final version, because it breaks pg_dump for previous versions. I need some help from hackers here. pg_dump, line 5466: if (fout->remoteVersion >= 90400) What does 'remoteVersion' mean? And what is the right way to change it? Or does it change between releases? I guess that 90400 is for 9.4 and 80200 is for 8.2, but is it really so? That is totally new to me. BTW, while we are on the subject, maybe it's worth replacing these magic numbers with a set of macros?

P.S. I'll update the documentation for ALTER TABLE in the next patch.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
25.02.2016 21:39, Jeff Janes: As promised, here's the new version of the patch, "including_columns_4.0". I fixed all issues except some points mentioned below.

Thanks for the updated patch. I get a compiler warning:

genam.c: In function 'BuildIndexValueDescription':
genam.c:259: warning: unused variable 'tupdesc'

Thank you for the notice, I'll fix it in the next update.

Also, I can't create a primary key INCLUDING columns directly:

jjanes=# create table foobar (a int, b int, c int);
jjanes=# alter table foobar add constraint foobar_pkey primary key (a,b) including (c);
ERROR: syntax error at or near "including"

But I can get there using a circuitous route:

jjanes=# create unique index on foobar (a,b) including (c);
jjanes=# alter table foobar add constraint foobar_pkey primary key using index foobar_a_b_c_idx;

The description of the table's index knows to include the including column:

jjanes=# \d foobar
     Table "public.foobar"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer | not null
 c      | integer |
Indexes:
    "foobar_pkey" PRIMARY KEY, btree (a, b) INCLUDING (c)

Since the machinery appears to all be in place to have primary keys with INCLUDING columns, it would be nice if the syntax for adding primary keys allowed one to create them directly. Is this something for future expansion, or could it be added at the same time as the main patch?

Good point. At a quick glance, this looks easy to implement. The only problem is that there are too many places in the code which must be updated. I'll try to do it, and if there are difficulties, it's fine with me to delay this feature to future work. I found one more thing to do: pg_dump does not handle included columns now. I will fix it in the next version of the patch.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
18.02.2016 20:18, Anastasia Lubennikova: 04.02.2016 20:16, Peter Geoghegan: On Fri, Jan 29, 2016 at 8:50 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: I fixed it in the new version (attached). Thank you for the review. At last, there is a new patch, version 3.0. After some refactoring it looks much better. I described all the details of the compression in this document: https://goo.gl/50O8Q0 (the same text without pictures is attached in btc_readme_1.0.txt). Consider it a rough copy of the readme. It contains some notes about tricky moments of the implementation and questions about future work. Please don't hesitate to comment on it.

Sorry, the previous patch was dirty. A hotfix is attached.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

Compression. To be precise, it's not actually compression, but just an effective layout of ItemPointers on an index page.

compressed tuple = IndexTuple (with metadata in the TID field + key) + PostingList

1. A GIN index fits extremely well for really large sets of repeating keys, but on the other hand, it completely fails to handle unique keys. For btree it is essential to have good performance and concurrency in any corner case, with any number of duplicates. That's why we can't just copy the GIN implementation of item pointer compression. The first difference is that the btree algorithm performs compression (or, in other words, changes the index tuple layout) only if there's more than one tuple with a given key. This allows us to avoid the overhead of storing useless metadata for mostly distinct keys (see picture below). It seems that compression could be useful for unique indexes under heavy write/update load (because of MVCC copies), but I'm not sure whether this use case really exists. Those tuples should be deleted by microvacuum as soon as possible. Anyway, I think it's worth adding a storage parameter for btree which enables/disables compression for each particular index.
And set compression of unique indexes to off by default. System indexes do not support compression, for several reasons. First of all, because of the WIP state of the patch (debugging the system catalog isn't a big pleasure). The next reason is that I know many places in the code where hardcode or some non-obvious syscache routines are used, and I do not feel brave enough to change this code. And last but not least, I don't see good reasons to do that.

2. If the index key is very small (smaller than the metadata) and the number of duplicates is small, compression could lead to index bloat instead of an index size decrease (see picture below). I'm not sure whether it's worth handling this case separately, because it's really rare, and I consider it the DBA's job to disable compression on such indexes. But if you see any clear way to do this, it would be great.

3. For GIN indexes, if a posting list is too large, a posting tree is created. That is built on the following assumptions:
- Indexed keys are never deleted. This makes all tree algorithms much easier.
- There are always many duplicates. Otherwise, GIN becomes really inefficient.
- There is no high concurrency. In order to add a new entry into a posting tree, we hold a lock on its root, so only one backend at a time can perform insertion.
In btree we can't afford these assumptions. So we should handle big posting lists in another way: if there are too many ItemPointers to fit into a single posting list, we will just create another one. The overhead of this approach is that we have to store a duplicate of the key and metadata. This leads to the problem of big keys: if the key size is close to BTMaxItemSize, compression will give us a really small benefit, if any at all (see picture below).

4. The more item pointers fit into a single posting list, the less often we have to split it and repeat the key. Therefore, the bigger BTMaxItemSize is, the better.
The comment in nbtree.h says: "We actually need to be able to fit three items on every page, so restrict any one item to 1/3 the per-page available space." That is quite right for regular items, but if the index tuple is compressed, it already contains more than one item. Taking that into account, we can assert that BTMaxItemSize ~ 1/3 of the page size for regular items, and ~ 1/2 of the page size for compressed items. Are there any objections? I wonder if we can increase BTMaxItemSize under some other assumption? The problem I see here is that a varlena high key could be as big as the compressed tuple.

5. CREATE INDEX. _bt_load. The algorithm of btree build is the following: do the heap scan, add tuples into the spool, sort the data, insert the ordered data from the spool into leaf index pages (_bt_load), then build the inner pages and the root. The main changes are applied to the _bt_load function. While loading tuples, we do not insert them one by one; instead, we compare each tuple with the previous one, and if they are equal we put them into a posting list. If the posting list is large
Re: [HACKERS] WIP: Covering + unique indexes.
02.02.2016 15:50, Anastasia Lubennikova: 31.01.2016 11:04, David Rowley: On 27 January 2016 at 03:35, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: including_columns_3.0 is the latest version of the patch. Changes relative to the previous version are attached as a separate patch, just to ease review and debugging.

Hi, I've made another pass over the patch. There are still a couple of things that I think need to be looked at.

Thank you again. I just write here to say that I have not disappeared and I do remember about the issue. But I'm very busy this week. I'll send an updated patch next week as soon as possible.

As promised, here's the new version of the patch, "including_columns_4.0". I fixed all issues except some points mentioned below. Besides, I did some refactoring:
- use the macros IndexRelationGetNumberOfAttributes, IndexRelationGetNumberOfKeyAttributes where possible, and the macro RelationGetNumberOfAttributes. Maybe these changes are a bit unrelated, but they'll make development much easier in the future.
- rename related variables to indnatts, indnkeyatts.

I'm also not that keen on index_reform_tuple() in general. I wonder if there's a way we can just keep the Datum/isnull arrays a bit longer, and only form the tuple when needed. I've not looked into this in detail, but it does look like reforming the tuple is not going to be cheap.

It is used in splits, for example. There is no datum array there; we just move the tuple key from a child page to a parent page or something like that, and according to the INCLUDING algorithm we need to truncate nonkey attributes.

If we do need to keep this function, I think a better name might be index_trim_tuple(), and I don't think you need to pass the original length. It might make sense to Assert() that the trim length is smaller than the tuple size.

As regards the performance, I don't think that it's a big problem here. Do you suggest doing it the following way: memcpy(oldtup, newtup, newtuplength)?
I've tested it some more, and still didn't find any performance issues.

In gistrescan() there is some code:

for (attno = 1; attno <= natts; attno++)
{
    TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
                       scan->indexRelation->rd_opcintype[attno - 1],
                       -1, 0);
}

Going by RelationInitIndexAccessInfo(), rd_opcintype[] is allocated to be sized by the number of key columns, but this loop goes over the number of attribute columns. Perhaps this is not a big problem since GiST does not support INCLUDING columns, but it does seem wrong still.

GiST doesn't support the INCLUDING clause, so natts and nkeyatts are always equal. I don't see any problem here. And I think that it's extra work beyond this patch. Maybe I or someone else will add this feature to other access methods later.

Still the same. Which brings me to the fact that I've spent a bit of time trying to look for places where you've forgotten to change natts to nkeyatts. I did find this one, but I don't have much confidence that there aren't lots more places that have been forgotten. Apart from this one, how confident are you that you've found all the places? I'm getting towards being happy with the code that I see that's been changed, but I'm hesitant to mark it "Ready for committer" due to not being all that comfortable that all the code that needs to be updated has been updated. I'm not quite sure of a good way to find all these places.

I found all mentions of natts and other related variables with grep, and replaced (or expanded) them with nkeyatts where it was necessary. As mentioned before, I didn't change other AMs. I strongly agree that any changes related to btree require thorough inspection, so I'll recheck it again. But I'm almost sure that it's okay.

I rechecked everything again and fixed a couple of omissions. Thank you for being an exacting reviewer :) I don't know how to ensure that everything is okay; I have no idea what else I can do.
I was wondering about hacking the code so that each btree index created with more than one column puts all but the first column into the INCLUDING columns, then running the regression tests to see if there are any crashes. I'm really not that sure of how else to increase the confidence levels on this. Do you have ideas?

Do I understand correctly that you suggest replacing all multicolumn indexes with (1 key column) + included? I don't think it's a good idea. The INCLUDING clause brings some disadvantages. For example, included columns must be filtered after the search, while key columns can be used in the scan key directly. I already mentioned this in a test example:

explain analyze select c1, c2 from tbl where c1<1 and c3<20;

If the columns' opclasses are used, the new query plan uses them in the Index Cond:

Index Cond: ((c1 < 1) AND (c3 < 20))

Otherwise, the new query cannot use the included column in the Index Cond and uses a filter instead:

Index Cond: (c1 < 1)
Filter: (c3 < 20)
Rows Removed by Filter: 999
[HACKERS] Some refactoring of index structures.
Hi, hackers. Long story short, I'd like to do some refactoring of the code related to indexes. I am working on a patch which provides the INCLUDING columns functionality [1]. The patch has been reviewed again and again, and I fixed many issues, but we are still not sure enough that all occurrences of rd_index->indnatts are replaced with rd_index->indnkeyatts accurately. I have no idea how to do it without a thorough reading of the code, so I am doing that now. I have already replaced all these rd_index->indnatts with the macro IndexRelationGetNumberOfAttributes and rd_index->indnkeyatts with the macro IndexRelationGetNumberOfKeyAttributes.

But there are still a lot of places with vague naming. For example, indnatts, natts, ncolumns and so on for the very same rd_index->indnatts; I changed them to indnatts. Or indexStruct, index, idxForm, index_form for index->rd_index. They definitely should be unified. I'd prefer indexForm. Any objections?

One more point is indkey in FormData_pg_index <http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a>. The same goes for indkeys <http://doxygen.postgresql.org/struct__indxInfo.html#a66d6eab0efe4bcb9fcbb472d403fa981>, ii_KeyAttrNumbers <http://doxygen.postgresql.org/structIndexInfo.html#a7723798af613a780d505231bad72c0a3>, indexkeys <http://doxygen.postgresql.org/structIndexOptInfo.html#ab95b0da6ff1e527506239dab1a480b82>. With my patch, an index can have key and non-key (included) columns. Therefore indkey is not just the key now: it contains both key and included columns, and its length is not indnkeyatts as one might think, but indnatts. I suppose it's worth renaming it somehow, but I can't find the proper name. Do you have any ideas?

If you see any related changes, please let me know.

1. http://www.postgresql.org/message-id/flat/56168952.4010...@postgrespro.ru#56168952.4010...@postgrespro.ru

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
29.01.2016 20:43, Thom Brown: On 29 January 2016 at 16:50, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: 29.01.2016 19:01, Thom Brown: On 29 January 2016 at 15:47, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote: I tested this patch on x64 and ARM servers for a few hours today. The only problem I could find is that INSERT works considerably slower after applying the patch. Besides that, everything looks fine - no crashes, tests pass, memory doesn't seem to leak, etc.

Thank you for testing. I rechecked that, and insertions are really very, very slow. It seems like a bug.

Okay, now for some badness. I've restored a database containing 2 tables, one 318MB, another 24kB. The 318MB table contains 5 million rows with a sequential id column. I get a problem if I try to delete many rows from it:

# delete from contacts where id % 3 != 0 ;
WARNING: out of shared memory
WARNING: out of shared memory
WARNING: out of shared memory

I didn't manage to reproduce this. Thom, could you describe the exact steps to reproduce this issue, please?

Sure, I used my pg_rep_test tool to create a primary (pg_rep_test -r0), which creates an instance with a custom config, which is as follows:

shared_buffers = 8MB
max_connections = 7
wal_level = 'hot_standby'
cluster_name = 'primary'
max_wal_senders = 3
wal_keep_segments = 6

Then create a pgbench data set (I didn't originally use pgbench, but you can get the same results with it):

createdb -p 5530 pgbench
pgbench -p 5530 -i -s 100 pgbench

And delete some stuff:

thom@swift:~/Development/test$ psql -p 5530 pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.

➤ psql://thom@[local]:5530/pgbench

# DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
WARNING: out of shared memory
WARNING: out of shared memory
WARNING: out of shared memory
...
WARNING: out of shared memory
WARNING: out of shared memory
DELETE 667
Time: 22218.804 ms

There were 358 lines of that warning message. I don't get these messages without the patch. Thom

Thank you for this report. I tried to reproduce it, but I couldn't. Debugging will be much easier now. I hope I'll fix these issues within the next few days. BTW, I found a dumb mistake: the previous patch contained some unrelated changes. I fixed it in the new version (attached).

Thanks. Well, I've tested this latest patch, and the warnings are no longer generated. However, the index sizes show that the patch doesn't seem to be doing its job, so I'm wondering if you removed too much from it.

Huh, this patch seems to be enchanted :) It works fine for me. Did you perform "make distclean"? Anyway, I'll send a new version soon. I just write here to say that I have not disappeared and I do remember about the issue. I have even almost fixed the insert speed problem. But I'm very busy this week. I'll send an updated patch next week as soon as possible. Thank you for your attention to this work.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
31.01.2016 11:04, David Rowley: On 27 January 2016 at 03:35, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: including_columns_3.0 is the latest version of the patch. And changes regarding the previous version are attached in a separate patch, just to ease review and debugging. Hi, I've made another pass over the patch. There are still a couple of things that I think need to be looked at. Thank you again. I'm just writing to say that I haven't disappeared and I do remember the issue. But I'm very, very busy this week. I'll send an updated patch next week as soon as possible. Do we need the "b (included)" here? The key is (a) = (1). Having irrelevant details might be confusing. postgres=# create table a (a int not null, b int not null); CREATE TABLE postgres=# create unique index on a (a) including(b); CREATE INDEX postgres=# insert into a values(1,1); INSERT 0 1 postgres=# insert into a values(1,1); ERROR: duplicate key value violates unique constraint "a_a_b_idx" DETAIL: Key (a, b (included))=(1, 1) already exists. I thought that it could be strange if a user inserts two values and then sees only one of them in the error message. But now I see that you're right. I'll also look at the same functionality in other DBs and fix it. In index_reform_tuple() I find it a bit scary that you change the TupleDesc's number of attributes, then set it back again once you're finished reforming the shortened tuple. Maybe it would be better to modify index_form_tuple() to accept a new argument with a number of attributes; then you can just Assert that this number is never higher than the number of attributes in the TupleDesc. Good point. I agree that this function is a bit strange. I have to set tupdesc->natts to support compatibility with index_form_tuple(). I didn't want to add either a new field to the tupledesc or a new parameter to index_form_tuple(), because they are used widely. I'm also not that keen on index_reform_tuple() in general. 
I wonder if there's a way we can just keep the Datum/isnull arrays a bit longer, and only form the tuple when needed. I've not looked into this in detail, but it does look like reforming the tuple is not going to be cheap. It is used in splits, for example. There is no datum array; we just move the tuple key from a child page to a parent page, or something like that. And according to the INCLUDING algorithm we need to truncate non-key attributes. If we do need to keep this function, I think a better name might be index_trim_tuple(), and I don't think you need to pass the original length. It might make sense to Assert() that the trim length is smaller than the tuple size. As regards performance, I don't think it's a big problem here. Do you suggest doing it in the following way: memcpy(oldtup, newtup, newtuplength)? I will. In gistrescan() there is some code: for (attno = 1; attno <= natts; attno++) { TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL, scan->indexRelation->rd_opcintype[attno - 1], -1, 0); } Going by RelationInitIndexAccessInfo(), rd_opcintype[] is allocated to be sized by the number of key columns, but this loop goes over the number of attribute columns. Perhaps this is not a big problem since GiST does not support INCLUDING columns, but it does seem wrong still. GiST doesn't support the INCLUDING clause, so natts and nkeyatts are always equal. I don't see any problem here. And I think that would be extra work beyond this patch. Maybe I or someone else will add this feature to other access methods later. Which brings me to the fact that I've spent a bit of time trying to look for places where you've forgotten to change natts to nkeyatts. I did find this one, but I don't have much confidence that there aren't lots more places that have been forgotten. Apart from this one, how confident are you that you've found all the places? 
I'm getting towards being happy with the code that I see that's been changed, but I'm hesitant to mark it "Ready for committer" due to not being all that comfortable that all the code that needs to be updated has been updated. I'm not quite sure of a good way to find all these places. I found all mentions of natts and other related variables with grep, and replaced (or expanded) them with nkeyatts where it was necessary. As mentioned before, I didn't change other AMs. I strongly agree that any changes related to btree require thorough inspection, so I'll recheck it again. But I'm almost sure that it's okay. I was wondering if hacking the code so that each btree index created with > 1 column puts all but the first column into the INCLUDING columns, then running the regression tests to see if there are any crashes, would help. I'm really not that sure of how else to increase the confidence levels on this. Do you have ideas? Do I understand correctly that you suggest replacing all multicolumn indexes with (1 key column) + included? I don't think it's a good idea.
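David's suggestion above of passing an explicit attribute count to index_form_tuple(), instead of temporarily mutating the shared TupleDesc, can be sketched with simplified stand-in types (this is not the real PostgreSQL API; the struct and function below are illustrative only):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's TupleDesc */
typedef struct { int natts; } TupleDescData;
typedef TupleDescData *TupleDesc;

/*
 * Hypothetical variant of index_form_tuple() taking an explicit attribute
 * count: callers that truncate non-key columns pass nattrs < desc->natts
 * and never need to mutate the descriptor and restore it afterwards.
 * Attribute values are modeled as plain ints here.
 */
static size_t
index_form_tuple_n(TupleDesc desc, const int *values, int nattrs, int *out)
{
    /* The caller may only shorten the tuple, never widen it. */
    assert(nattrs > 0 && nattrs <= desc->natts);
    memcpy(out, values, (size_t) nattrs * sizeof(int));
    return (size_t) nattrs;
}
```

The Assert captures exactly the invariant David asks for: the requested attribute count can never exceed what the TupleDesc describes.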
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
28.01.2016 20:03, Thom Brown: On 28 January 2016 at 16:12, Anastasia Lubennikova <a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> wrote: 28.01.2016 18:12, Thom Brown: On 28 January 2016 at 14:06, Anastasia Lubennikova <a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> wrote: 31.08.2015 10:41, Anastasia Lubennikova: Hi, hackers! I'm going to begin work on effective storage of duplicate keys in the B-tree index. The main idea is to implement posting lists and posting trees for B-tree index pages, as is already done for GIN. In a nutshell, effective storage of duplicates in GIN is organised as follows. The index stores a single index tuple for each unique key. That index tuple points to a posting list which contains pointers to heap tuples (TIDs). If too many rows have the same key, multiple pages are allocated for the TIDs, and these constitute a so-called posting tree. You can find wonderful detailed descriptions in the gin readme <https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README> and articles <http://www.cybertec.at/gin-just-an-index-type/>. It also makes it possible to apply a compression algorithm to the posting list/tree and significantly decrease index size. Read more in this presentation (part 1) <http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>. Currently a new B-tree index tuple must be inserted for each table row that we index. This can cause a page split. Because of MVCC, even a unique index can contain duplicates. Storing duplicates in a posting list/tree helps to avoid superfluous splits. I'd like to share the progress of my work. So here is a WIP patch. It provides effective duplicate handling using posting lists, the same way GIN does it. 
The layout of the tuples on the page is changed in the following way: before: TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key with patch: TID (N item pointers, posting list offset) + key, TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid) It seems that backward compatibility works well without any changes. But I haven't tested it properly yet. Here are some test results. They are obtained by the test functions test_btbuild and test_ginbuild, which you can find in the attached sql file. i - number of distinct values in the index. So i=1 means that all rows have the same key, and i=10000000 means that all keys are different. The other columns contain the index size (MB).

i        | B-tree Old | B-tree New | GIN
1        | 214,234375 | 87,7109375 | 10,2109375
10       | 214,234375 | 87,7109375 | 10,71875
100      | 214,234375 | 87,4375    | 15,640625
1000     | 214,234375 | 86,2578125 | 31,296875
10000    | 214,234375 | 78,421875  | 104,3046875
100000   | 214,234375 | 65,359375  | 49,078125
1000000  | 214,234375 | 90,140625  | 106,8203125
10000000 | 214,234375 | 214,234375 | 534,0625

You can note that the last row contains the same index sizes for B-tree, which is quite logical - there is no compression if all the keys are distinct. The other cases look really nice to me. The next thing to say is that I haven't implemented posting list compression yet. So there is still potential to decrease the size of the compressed btree. I'm almost sure there are still some tiny bugs and missing functions, but on the whole, the patch is ready for testing. I'd like to get feedback about testing the patch on some real datasets. Any bug reports and suggestions are welcome. Here are a couple of useful queries to inspect the data inside the index pages: create extension pageinspect; select * from bt_metap('idx'); select bt.* from generate_series(1,1) as n, lateral bt_page_stats('idx', n) as bt; select n, bt.* from generate_series(1,1) as n, lateral bt_page_items('idx', n) as bt; And at last, the list of items I'm going to complete in the near future: 1. 
Add storage_parameter 'enable_compression' for the btree access method, which specifies whether the index handles duplicates. Default is 'off'. 2. Bring back microvacuum functionality for compressed indexes. 3. Improve insertion speed. Insertions became significantly slower with compressed btree, which is obviously not what we want. 4. Clean the code and comments, add related documentation.
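The space saving described above comes from storing the key once per distinct value instead of once per row. A rough standalone model of the two layouts (sizes are illustrative, not the real on-disk format, though the 6-byte item pointer matches PostgreSQL's ItemPointerData):

```c
#include <assert.h>
#include <stddef.h>

/* 6-byte heap pointer, as in PostgreSQL's ItemPointerData */
typedef struct { unsigned short bi_hi, bi_lo, ip_posid; } ItemPointerData;

/*
 * Space used by n duplicates of one key before the patch:
 * each index tuple repeats both the TID and the key.
 */
static size_t
size_before(size_t keysize, size_t n)
{
    return n * (sizeof(ItemPointerData) + keysize);
}

/*
 * After the patch: one "posting" tuple stores the key and a header TID,
 * followed by a posting list of n plain TIDs.
 */
static size_t
size_after(size_t keysize, size_t n)
{
    return sizeof(ItemPointerData) + keysize   /* header TID + key */
         + n * sizeof(ItemPointerData);        /* posting list of TIDs */
}
```

In this model, 100 duplicates of an 8-byte key shrink from 1400 bytes to 614 - the same direction as the measured results above, where the compressed B-tree is much smaller except when every key is distinct.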
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
29.01.2016 19:01, Thom Brown: On 29 January 2016 at 15:47, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote: I tested this patch on x64 and ARM servers for a few hours today. The only problem I could find is that INSERT works considerably slower after applying the patch. Besides that, everything looks fine - no crashes, tests pass, memory doesn't seem to leak, etc. Thank you for testing. I rechecked that, and insertions are really very, very slow. It seems like a bug. Okay, now for some badness. I've restored a database containing 2 tables, one 318MB, another 24kB. The 318MB table contains 5 million rows with a sequential id column. I get a problem if I try to delete many rows from it: # delete from contacts where id % 3 != 0 ; WARNING: out of shared memory WARNING: out of shared memory WARNING: out of shared memory I didn't manage to reproduce this. Thom, could you describe the exact steps to reproduce this issue, please? Sure, I used my pg_rep_test tool to create a primary (pg_rep_test -r0), which creates an instance with a custom config, which is as follows: shared_buffers = 8MB max_connections = 7 wal_level = 'hot_standby' cluster_name = 'primary' max_wal_senders = 3 wal_keep_segments = 6 Then create a pgbench data set (I didn't originally use pgbench, but you can get the same results with it): createdb -p 5530 pgbench pgbench -p 5530 -i -s 100 pgbench And delete some stuff: thom@swift:~/Development/test$ psql -p 5530 pgbench Timing is on. psql (9.6devel) Type "help" for help. ➤ psql://thom@[local]:5530/pgbench # DELETE FROM pgbench_accounts WHERE aid % 3 != 0; WARNING: out of shared memory WARNING: out of shared memory WARNING: out of shared memory WARNING: out of shared memory WARNING: out of shared memory WARNING: out of shared memory WARNING: out of shared memory ... WARNING: out of shared memory WARNING: out of shared memory DELETE 667 Time: 22218.804 ms There were 358 lines of that warning message. I don't get these messages without the patch. 
Thom Thank you for this report. I tried to reproduce it, but I couldn't. Debugging will be much easier now. I hope I'll fix these issues within the next few days. BTW, I found a dummy mistake: the previous patch contained some unrelated changes. I fixed it in the new version (attached). -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c index e3c55eb..3908cc1 100644 --- a/src/backend/access/nbtree/nbtinsert.c +++ b/src/backend/access/nbtree/nbtinsert.c @@ -24,6 +24,7 @@ #include "storage/predicate.h" #include "utils/tqual.h" +#include "catalog/catalog.h" typedef struct { @@ -60,7 +61,8 @@ static void _bt_findinsertloc(Relation rel, ScanKey scankey, IndexTuple newtup, BTStack stack, - Relation heapRel); + Relation heapRel, + bool *updposing); static void _bt_insertonpg(Relation rel, Buffer buf, Buffer cbuf, BTStack stack, IndexTuple itup, @@ -113,6 +115,7 @@ _bt_doinsert(Relation rel, IndexTuple itup, BTStack stack; Buffer buf; OffsetNumber offset; + bool updposting = false; /* we need an insertion scan key to do our search, so build one */ itup_scankey = _bt_mkscankey(rel, itup); @@ -162,8 +165,9 @@ top: { TransactionId xwait; uint32 speculativeToken; + bool fakeupdposting = false; /* Never update posting in unique index */ - offset = _bt_binsrch(rel, buf, natts, itup_scankey, false); + offset = _bt_binsrch(rel, buf, natts, itup_scankey, false, ); xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey, checkUnique, _unique, ); @@ -200,8 +204,54 @@ top: CheckForSerializableConflictIn(rel, NULL, buf); /* do the insertion */ _bt_findinsertloc(rel, , , natts, itup_scankey, itup, - stack, heapRel); - _bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false); + stack, heapRel, ); + + if (IsSystemRelation(rel)) + updposting = false; + + /* + * New tuple has the same key with tuple at the page. 
+ * Unite them into one posting. + */ + if (updposting) + { + Page page; + IndexTuple olditup, newitup; + ItemPointerData *ipd; + int nipd; + + page = BufferGetPage(buf); + olditup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset)); + + if (BtreeTupleIsPosting(olditup)) +nipd = BtreeGetNPosting(olditup); + else +nipd = 1; + + ipd = palloc0(sizeof(ItemPointerData)*(nipd + 1)); + /* copy item pointers from old tuple into ipd */ + if (BtreeTupleIsPosting(olditup)) +memcpy(ipd, BtreeGetPosting(olditup), sizeof(ItemPointerData)*nipd); + else +memcpy(ipd, olditup, sizeof(ItemPointerData)); + + /* add item pointer of the new tuple into ipd */ + memcpy(ipd+nipd, itup, sizeof(ItemPointerData)); + + /* + * Form posting tuple,
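The merge step shown in the (truncated) diff above - building a new posting array from the existing tuple's TIDs plus the incoming one - can be sketched standalone with plain C arrays instead of palloc'd IndexTuples (types and names here are simplified, not the patch's actual ones):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* simplified item pointer: block number + offset within block */
typedef struct { unsigned short blk, pos; } Tid;

/*
 * Mirror of the palloc0 + memcpy sequence in _bt_doinsert() above:
 * copy the old tuple's item pointers into a fresh array one slot
 * larger, then append the new tuple's TID at the end.
 */
static Tid *
merge_posting(const Tid *old, int nold, Tid newtid, int *nout)
{
    Tid *ipd = calloc((size_t) nold + 1, sizeof(Tid));
    if (ipd == NULL)
        return NULL;
    memcpy(ipd, old, (size_t) nold * sizeof(Tid));  /* old item pointers */
    ipd[nold] = newtid;                             /* append the new one */
    *nout = nold + 1;
    return ipd;
}
```

The real patch additionally distinguishes whether the existing tuple is already a posting tuple (BtreeTupleIsPosting) or a plain one, which only changes where the old TIDs are copied from.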
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
28.01.2016 18:12, Thom Brown: On 28 January 2016 at 14:06, Anastasia Lubennikova <a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> wrote: 31.08.2015 10:41, Anastasia Lubennikova: Hi, hackers! I'm going to begin work on effective storage of duplicate keys in the B-tree index. The main idea is to implement posting lists and posting trees for B-tree index pages, as is already done for GIN. In a nutshell, effective storage of duplicates in GIN is organised as follows. The index stores a single index tuple for each unique key. That index tuple points to a posting list which contains pointers to heap tuples (TIDs). If too many rows have the same key, multiple pages are allocated for the TIDs, and these constitute a so-called posting tree. You can find wonderful detailed descriptions in the gin readme <https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README> and articles <http://www.cybertec.at/gin-just-an-index-type/>. It also makes it possible to apply a compression algorithm to the posting list/tree and significantly decrease index size. Read more in this presentation (part 1) <http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>. Currently a new B-tree index tuple must be inserted for each table row that we index. This can cause a page split. Because of MVCC, even a unique index can contain duplicates. Storing duplicates in a posting list/tree helps to avoid superfluous splits. I'd like to share the progress of my work. So here is a WIP patch. It provides effective duplicate handling using posting lists, the same way GIN does it. The layout of the tuples on the page is changed in the following way: before: TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key with patch: TID (N item pointers, posting list offset) + key, TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid) It seems that backward compatibility works well without any changes. But I haven't tested it properly yet. 
Here are some test results. They are obtained by the test functions test_btbuild and test_ginbuild, which you can find in the attached sql file. i - number of distinct values in the index. So i=1 means that all rows have the same key, and i=10000000 means that all keys are different. The other columns contain the index size (MB).

i        | B-tree Old | B-tree New | GIN
1        | 214,234375 | 87,7109375 | 10,2109375
10       | 214,234375 | 87,7109375 | 10,71875
100      | 214,234375 | 87,4375    | 15,640625
1000     | 214,234375 | 86,2578125 | 31,296875
10000    | 214,234375 | 78,421875  | 104,3046875
100000   | 214,234375 | 65,359375  | 49,078125
1000000  | 214,234375 | 90,140625  | 106,8203125
10000000 | 214,234375 | 214,234375 | 534,0625

You can note that the last row contains the same index sizes for B-tree, which is quite logical - there is no compression if all the keys are distinct. The other cases look really nice to me. The next thing to say is that I haven't implemented posting list compression yet. So there is still potential to decrease the size of the compressed btree. I'm almost sure there are still some tiny bugs and missing functions, but on the whole, the patch is ready for testing. I'd like to get feedback about testing the patch on some real datasets. Any bug reports and suggestions are welcome. Here are a couple of useful queries to inspect the data inside the index pages: create extension pageinspect; select * from bt_metap('idx'); select bt.* from generate_series(1,1) as n, lateral bt_page_stats('idx', n) as bt; select n, bt.* from generate_series(1,1) as n, lateral bt_page_items('idx', n) as bt; And at last, the list of items I'm going to complete in the near future: 1. Add storage_parameter 'enable_compression' for the btree access method, which specifies whether the index handles duplicates. Default is 'off'. 2. Bring back microvacuum functionality for compressed indexes. 3. Improve insertion speed. Insertions became significantly slower with compressed btree, which is obviously not what we want. 4. Clean the code and comments, add related documentation. 
This doesn't apply cleanly against current git head. Have you caught up past commit 65c5fcd35? Thank you for the notice. New patch is attached. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 9673fe0..0c8e4fb 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
31.08.2015 10:41, Anastasia Lubennikova: Hi, hackers! I'm going to begin work on effective storage of duplicate keys in the B-tree index. The main idea is to implement posting lists and posting trees for B-tree index pages, as is already done for GIN. In a nutshell, effective storage of duplicates in GIN is organised as follows. The index stores a single index tuple for each unique key. That index tuple points to a posting list which contains pointers to heap tuples (TIDs). If too many rows have the same key, multiple pages are allocated for the TIDs, and these constitute a so-called posting tree. You can find wonderful detailed descriptions in the gin readme <https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README> and articles <http://www.cybertec.at/gin-just-an-index-type/>. It also makes it possible to apply a compression algorithm to the posting list/tree and significantly decrease index size. Read more in this presentation (part 1) <http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>. Currently a new B-tree index tuple must be inserted for each table row that we index. This can cause a page split. Because of MVCC, even a unique index can contain duplicates. Storing duplicates in a posting list/tree helps to avoid superfluous splits. I'd like to share the progress of my work. So here is a WIP patch. It provides effective duplicate handling using posting lists, the same way GIN does it. The layout of the tuples on the page is changed in the following way: before: TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key with patch: TID (N item pointers, posting list offset) + key, TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid) It seems that backward compatibility works well without any changes. But I haven't tested it properly yet. Here are some test results. They are obtained by the test functions test_btbuild and test_ginbuild, which you can find in the attached sql file. 
i - number of distinct values in the index. So i=1 means that all rows have the same key, and i=10000000 means that all keys are different. The other columns contain the index size (MB).

i        | B-tree Old | B-tree New | GIN
1        | 214,234375 | 87,7109375 | 10,2109375
10       | 214,234375 | 87,7109375 | 10,71875
100      | 214,234375 | 87,4375    | 15,640625
1000     | 214,234375 | 86,2578125 | 31,296875
10000    | 214,234375 | 78,421875  | 104,3046875
100000   | 214,234375 | 65,359375  | 49,078125
1000000  | 214,234375 | 90,140625  | 106,8203125
10000000 | 214,234375 | 214,234375 | 534,0625

You can note that the last row contains the same index sizes for B-tree, which is quite logical - there is no compression if all the keys are distinct. The other cases look really nice to me. The next thing to say is that I haven't implemented posting list compression yet. So there is still potential to decrease the size of the compressed btree. I'm almost sure there are still some tiny bugs and missing functions, but on the whole, the patch is ready for testing. I'd like to get feedback about testing the patch on some real datasets. Any bug reports and suggestions are welcome. Here are a couple of useful queries to inspect the data inside the index pages: create extension pageinspect; select * from bt_metap('idx'); select bt.* from generate_series(1,1) as n, lateral bt_page_stats('idx', n) as bt; select n, bt.* from generate_series(1,1) as n, lateral bt_page_items('idx', n) as bt; And at last, the list of items I'm going to complete in the near future: 1. Add storage_parameter 'enable_compression' for the btree access method, which specifies whether the index handles duplicates. Default is 'off'. 2. Bring back microvacuum functionality for compressed indexes. 3. Improve insertion speed. Insertions became significantly slower with compressed btree, which is obviously not what we want. 4. Clean the code and comments, add related documentation. 
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c index 77c2fdf..3b61e8f 100644 --- a/src/backend/access/nbtree/nbtinsert.c +++ b/src/backend/access/nbtree/nbtinsert.c @@ -24,6 +24,7 @@ #include "storage/predicate.h" #include "utils/tqual.h" +#include "catalog/catalog.h" typedef struct { @@ -60,7 +61,8 @@ static void _bt_findinsertloc(Relation rel, ScanKey scankey, IndexTuple newtup, BTStack stack, - Relation heapRel); + Relation heapRel, + bool *updposing); static void _bt_insertonpg(Relation rel, Buffer buf, Buffer cbuf, BTStack stack, IndexTuple itup, @@ -113,6 +115,7 @@ _bt_doinsert(Relation rel, IndexTuple itup, BTStack stack; Buffer buf; OffsetNumber offset; + bool updposting = false; /
Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex
22.01.2016 13:48, Aleksander Alekseev: Then, this thread became too tangled. I think it's worth writing a new message with the patch, the test script, some results and a brief overview of how it really works. It will make the following review much easier. Sure. HASHHDR represents a hash table. It could be usual or partitioned. A partitioned table is stored in shared memory and accessed by multiple processes simultaneously. To prevent data corruption, the hash table is partitioned and each process has to acquire a lock for the corresponding partition before accessing data in it. The number of the partition is determined by the lower bits of the key's hash value. The trickiest part is that dynahash knows nothing about these locks; all locking is done on the calling side. Since shared memory is pre-allocated and can't grow, we use a freeList to allocate memory in the hash table. Also we use the nentries field to count the current number of elements in the hash table. Since the hash table is used by multiple processes, these fields are protected by a spinlock, which, as it turned out, could cause lock contention and create a bottleneck. After trying a few approaches I discovered that using one spinlock per partition works quite well. Here are the latest benchmark results: http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu Note that the "no locks" solution can't be used because it doesn't guarantee that all available memory will be used in some corner cases. You can find a few more details and a test script in the first message of this thread. If you have any other questions regarding this patch, please don't hesitate to ask. InitLocks /* * Compute init/max size to request for lock hashtables. Note these * calculations must agree with LockShmemSize! */ This comment certainly requires some changes. BTW, could you explain why init_table_size is half of max_table_size? Although, I don't see any problems with your changes. -hctl->nentries = 0; -hctl->freeList = NULL; Why did you delete these two lines? 
I wonder if you should rewrite them instead? + * This particular number of partitions significantly reduces lock contention + * in partitioned hash tables, almost as if partitioned tables didn't use any + * locking at all As far as I understood, this number was obtained experimentally? Maybe you should note that in the comment. And last, but not least. +nelem_alloc = ((int) nelem) / partitions_number; +if (nelem_alloc == 0) +nelem_alloc = 1; + +for (i = 0; i < partitions_number; i++) +if (!element_alloc(hashp, nelem_alloc, i)) +ereport(ERROR, +(errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"))); It looks like you forgot to round up the result of the division. For example, if you have nelem=25 and partitions_number=6, then 25 / 6 = 4, and you allocate only 24 elements, while 1 is lost. As for performance, I did a test on my notebook. The patched version shows a small improvement. pgbench -j 1 -c 1 -f pgbench.sql -T 300 postgres: base 127 tps, patched 145 tps. pgbench -j 8 -c 8 -f pgbench.sql -T 300 postgres: base 247 tps, patched 270 tps. But I don't have any big machine to perform tests on, where the patch is supposed to make a significant difference. So I would rely on your and the other reviewers' results. Apart from the notes mentioned above, I suppose the patch is good enough to pass to a reviewer. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
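The off-by-one Anastasia points out is the classic integer-division truncation; rounding the division up fixes it. A small sketch of the corrected allocation (the function name is illustrative, not the patch's):

```c
#include <assert.h>

/*
 * Divide nelem across partitions, rounding up so that no elements are
 * lost: with nelem=25 and 6 partitions, 25/6 truncates to 4 and only
 * 24 elements get allocated, while (25 + 6 - 1)/6 = 5 allocates 30,
 * covering all 25.
 */
static int
nelem_alloc_per_partition(int nelem, int partitions)
{
    int nelem_alloc = (nelem + partitions - 1) / partitions;
    return nelem_alloc > 0 ? nelem_alloc : 1;
}
```

The invariant worth checking is nelem_alloc * partitions >= nelem, which the truncating version in the quoted diff violates.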
Re: [HACKERS] WIP: Covering + unique indexes.
25.01.2016 03:32, Jeff Janes: On Fri, Jan 22, 2016 at 7:19 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: Done. I hope that my patch is close to the commit too. Thanks for the update. I've run into this problem: create table foobar (x text, w text); create unique index foobar_pkey on foobar (x) including (w); alter table foobar add constraint foobar_pkey primary key using index foobar_pkey; ERROR: index "foobar_pkey" does not have default sorting behavior LINE 1: alter table foobar add constraint foobar_pkey primary key us... ^ DETAIL: Cannot create a primary key or unique constraint using such an index. Time: 1.577 ms If I instead define the table as create table foobar (x int, w xml); Then I can create the index and then the primary key the first time I do this in a session. But then if I drop the table and repeat the process, I get "does not have default sorting behavior" error even for this index that previously succeeded, so I think there is some kind of problem with the backend syscache or catcache. create table foobar (x int, w xml); create unique index foobar_pkey on foobar (x) including (w); alter table foobar add constraint foobar_pkey primary key using index foobar_pkey; drop table foobar ; create table foobar (x int, w xml); create unique index foobar_pkey on foobar (x) including (w); alter table foobar add constraint foobar_pkey primary key using index foobar_pkey; ERROR: index "foobar_pkey" does not have default sorting behavior LINE 1: alter table foobar add constraint foobar_pkey primary key us... ^ DETAIL: Cannot create a primary key or unique constraint using such an index. Great, I've fixed that. Thank you for the tip about cache. I've also found and fixed related bug in copying tables with indexes: create table tbl2 (like tbl including all); And there's one more tiny fix in get_pkey_attnames in dblink module. including_columns_3.0 is the latest version of patch. 
And changes regarding the previous version are attached in a separate patch, just to ease review and debugging. I've changed the size of the pg_index.indclass array. It contains indnkeyatts elements now, while pg_index.indkey still contains all attributes. And this query, Retrieve primary key columns <https://wiki.postgresql.org/wiki/Retrieve_primary_key_columns>, provides a pretty non-obvious result. Is this normal behavior here, or are some changes required? Do you know any similar queries? -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 9c8e308..ef6aee5 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -2058,7 +2058,7 @@ get_pkey_attnames(Relation rel, int16 *numatts) /* we're only interested if it is the primary key */ if (index->indisprimary) { - *numatts = index->indnatts; + *numatts = index->indnkeyatts; if (*numatts > 0) { result = (char **) palloc(*numatts * sizeof(char *)); diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 412c845..c201f8b 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3515,6 +3515,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns in contrast with "included" columns. + + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 5f7befb..aaed5a7 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -111,6 +111,8 @@ typedef struct IndexAmRoutine boolamclusterable; /* does AM handle predicate locks? */ boolampredlocks; +/* does AM support columns included with clause INCLUDING? */ +boolamcaninclude; /* type of data stored in index, or InvalidOid if variable */ Oid amkeytype; @@ -852,7 +854,8 @@ amrestrpos (IndexScanDesc scan); using unique indexes, which are indexes that disallow multiple entries with identical keys. 
An access method that supports this feature sets amcanunique true. - (At present, only b-tree supports it.) + (At present, only b-tree supports it.) Columns which are present in the + INCLUDING clause are not used to enforce uniqueness. diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index 23bbec6..09d4e6b 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -633,7 +633,8 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST); Indexes can also be used to enforce uniqueness of a column's value, or the uniqueness of the combined values of more than one column. -CREATE UNIQUE INDEX name ON table (column , ...); +CREATE UNIQUE INDEX name ON ta
Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex
30.12.2015 16:01, Aleksander Alekseev: Here is a clean version of the patch. Step 1 is an optimization. Step 2 refactors this: HTAB * ShmemInitHash(const char *name, /* table string name for shmem index */ - long init_size, /* initial table size */ + long init_size, /* initial table size XXX ignored, refactor! */ Hi, I found that the patch is still not reviewed, so I've looked it over. I haven't dived into this subject before, so my comments will be more general than relating to your investigation. Sorry if some things seem like nitpicks; I just suppose that a clean patch would be more convenient for reviewers. First of all, why not merge both patches into one? They aren't too big anyway. Then, this thread became too tangled. I think it's worth writing a new message with the patch, the test script, some results and a brief overview of how it really works. It will make the following review much easier. + /* number of entries in hash table */ + longnentries[NUM_LOCK_PARTITIONS]; + /* linked list of free elements */ + HASHELEMENT *freeList[NUM_LOCK_PARTITIONS]; I think these comments should be changed to be more informative. + if (IS_PARTITIONED(hashp->hctl)) + partitions_number = NUM_LOCK_PARTITIONS; + else + partitions_number = 1; + + nelem_alloc = ((int) nelem) / partitions_number; + if (nelem_alloc == 0) + nelem_alloc = 1; Add a comment here too. -/* Number of partitions of the shared buffer mapping hashtable */ -#define NUM_BUFFER_PARTITIONS 128 - /* Number of partitions the shared lock tables are divided into */ -#define LOG2_NUM_LOCK_PARTITIONS 4 +#define LOG2_NUM_LOCK_PARTITIONS 7 #define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS) +/* Number of partitions of the shared buffer mapping hashtable */ +#define NUM_BUFFER_PARTITIONS NUM_LOCK_PARTITIONS + I'm sure it was discussed above, but these definitions do not look obvious at all. Maybe you should explain this magic number 7 in the comment above? 
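The per-partition fields from the quoted diff can be modeled as follows (plain C, with a pthread mutex standing in for the spinlock; the real patch keeps these arrays inside HASHHDR rather than in a separate struct):

```c
#include <assert.h>
#include <pthread.h>

#define NUM_LOCK_PARTITIONS (1 << 7)   /* 2^7 = 128, the patch's magic number */

typedef struct Element { struct Element *next; } Element;

/*
 * One entry counter, one free list and one lock per partition, instead
 * of a single spinlock guarding global nentries/freeList - this is what
 * removes the contention hotspot.
 */
typedef struct
{
    long            nentries[NUM_LOCK_PARTITIONS];  /* per-partition counts */
    Element        *freeList[NUM_LOCK_PARTITIONS];  /* per-partition free lists */
    pthread_mutex_t mutex[NUM_LOCK_PARTITIONS];     /* stand-ins for spinlocks */
} PartitionedFreeLists;

/*
 * The partition is determined by the low bits of the key's hash value;
 * keeping the partition count a power of two makes this a cheap mask.
 */
static int
partition_for(unsigned int hashvalue)
{
    return (int) (hashvalue & (NUM_LOCK_PARTITIONS - 1));
}
```

Two backends whose keys hash into different partitions then touch disjoint counters, free lists and locks, so they no longer serialize on one spinlock.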
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Batch update of indexes
20.01.2016 17:55, Konstantin Knizhnik: Hi,

Hi, I'm glad to see that you're interested in this too. I think this is a good feature and it will be very useful to have. I have already mentioned some related problems and possible improvements in my presentation: http://www.slideshare.net/AnastasiaLubennikova/indexes-dont-mean-slow-inserts The last two slides concern this thread. Briefly, I've suggested thinking about an insertion buffer. Something very similar is already implemented in BRIN: it does not index the latest heap data while the number of trailing pages is less than pages_per_range.

Do you mean GIN-like usage of an insertion buffer (there it is called a "pending list")? So that we have to combine a search in the main tree with a search in the insert buffer? Actually this is what I want to avoid (because, at least in the case of GIN, the pending list causes significant performance degradation, while an up-to-date state of the full-text index is rarely required).

What I meant is more like a BRIN-like combination of an index scan and a heap scan. Maybe it could be called "deferred inserts" or a "temporary read-only index". Maybe it's similar to MySQL's insert buffer: http://dev.mysql.com/doc/refman/5.7/en/innodb-insert-buffering.html

I think it'll be clearer with an example. Please don't mind the syntax.

CREATE TABLE tbl (c1 int);
CREATE INDEX idx ON tbl (c1);
SET enable_deferred_insert(idx) = on;

At this moment, we save the last_indexed_item (its TID) somewhere in the index metapage. From that moment on, data inserted into the table doesn't touch the index. We perform some heavy inserts and then go back to normal index behavior:

SET enable_deferred_insert(idx) = off;

This command takes all the data between the last_indexed_item and the end of the table and inserts it into the index in one batch. Of course there are new problems to deal with, but it's really useful for balancing an irregular heavy write load, isn't it?
BTW, could you explain what the reason is to copy data into the pending list and then copy it again while flushing the pending list into the index? Why not read this data directly from the table? I feel that I've missed something important here.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Batch update of indexes
tion plan while it is not needed in this case: the search condition is exactly the same as the partial index condition. The optimal plan should be: Index Scan using idx1 on t (cost=0.00..4.13 rows=12263 width=0) Index Cond: (c1 < '10'::double precision)

There was a discussion of a patch for partial indexes: http://postgresql.nabble.com/PATCH-index-only-scans-with-partial-indexes-td5857568.html Since I haven't followed it closely, it seems to be still open. I think it'll be interesting to you.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
18.01.2016 01:02, David Rowley writes: On 14 January 2016 at 08:24, David Rowley <david.row...@2ndquadrant.com> wrote: I will try to review the omit_opclass_4.0.patch soon.

Hi, as promised, here's my review of the omit_opclass_4.0 patch.

Thank you again. All the mentioned points are fixed and the patches are merged. I hope it's all right now. Please check the comments one more time; I rather doubt that I wrote everything correctly.

Also this makes me think that the name ii_KeyAttrNumbers is now out-of-date, as it contains the including columns too by the looks of it. Maybe it just needs to drop the "Key" and become "ii_AttrNumbers". It would be interesting to hear what others think of that. I'm also wondering if indexkeys is still a good name for the IndexOptInfo struct member. Including columns are not really keys, but I feel renaming that might cause a fair bit of code churn, so I'd be interested to hear what others have to say.

I agree that KeyAttrNumbers and indexkeys are somewhat confusing names, but I'd like to keep them at least in this patch. It may be worth doing an "index structures refactoring" as a separate patch.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 97ef618..d17a06c 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -644,6 +644,13 @@ Does an index of this type manage fine-grained predicate locks? + + amcaninclude + bool + + Does the access method support included columns? + + amkeytype oid @@ -3714,6 +3721,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns in contrast with "included" columns.
+ + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 1c09bae..d01af17 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -767,7 +767,8 @@ amrestrpos (IndexScanDesc scan); using unique indexes, which are indexes that disallow multiple entries with identical keys. An access method that supports this feature sets pg_am.amcanunique true. - (At present, only b-tree supports it.) + (At present, only b-tree supports it.) Columns included with clause + INCLUDING aren't used to enforce uniqueness. diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index 23bbec6..09d4e6b 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -633,7 +633,8 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST); Indexes can also be used to enforce uniqueness of a column's value, or the uniqueness of the combined values of more than one column. -CREATE UNIQUE INDEX name ON table (column , ...); +CREATE UNIQUE INDEX name ON table (column , ...) +INCLUDING (column , ...); Currently, only B-tree indexes can be declared unique. @@ -642,7 +643,9 @@ CREATE UNIQUE INDEX name ON tableINCLUDING aren't used to enforce constraints (UNIQUE, + PRIMARY KEY, etc). diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index ce36a1b..8360bb6 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -23,6 +23,7 @@ PostgreSQL documentation CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) +[ INCLUDING ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( storage_parameter = value [, ... 
] ) ] [ TABLESPACE tablespace_name ] [ WHERE predicate ] @@ -139,6 +140,32 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] + INCLUDING + + +An optional INCLUDING clause allows a list of columns to be +specified which will be included in the index, in the non-key portion of +the index. Columns which are part of this clause cannot also exist in the +key columns portion of the index, and vice versa. The +INCLUDING columns exist solely to allow more queries to benefit +from index-only scans by including certain columns in the +index, the value of which would otherwise have to be obtained by reading +the table's heap. Having these columns in the INCLUDING clause +in some cases allows PostgreSQL to skip the heap read +completely. This also allows UNIQUE indexes to be defined on +one set of columns, which can include another set of co
Re: [HACKERS] WIP: Covering + unique indexes.
12.01.2016 20:47, Jeff Janes: It looks like the "covering" patch, with or without the "omit_opclass" patch, does not support expressions as included columns:

create table foobar (x text, y xml);
create index on foobar (x) including (md5(x));
ERROR: unrecognized node type: 904
create index on foobar (x) including ((y::text));
ERROR: unrecognized node type: 911

I think we would probably want it to work with those (or at least to throw a better error message).

Thank you for the notice. I couldn't fix it quickly, so I added a stub in the latest patch. I'll try to fix it and add expression support a bit later.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
13.01.2016 04:47, David Rowley: On 13 January 2016 at 06:47, Jeff Janes <jeff.ja...@gmail.com> wrote: Why is omit_opclass a separate patch? If the included columns now never participate in the index ordering, shouldn't it be an inherent property of the main patch that you can "cover" things without btree opclasses?

I don't personally think the covering_unique_4.0.patch is that close to being too big to review; I think things would make more sense if the omit_opclass_4.0.patch was included together with this.

I agree that these patches should be merged. It'll be fixed in the next updates. I kept them separate only for historical reasons; it was more convenient for me to debug them. Furthermore, I wanted to show some performance degradation caused by "omit_opclass" and give a way to reproduce it by performing the test with and without the patch.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
13.01.2016 04:27, David Rowley: I've also done some testing:

create table ab (a int, b int);
insert into ab select a,b from generate_Series(1,10) a(a), generate_series(1,1) b(b);
set enable_bitmapscan=off;
set enable_indexscan=off;
select * from ab where a = 1 and b=1;
 a | b
---+---
 1 | 1
(1 row)

set enable_indexscan = on;
select * from ab where a = 1 and b=1;
 a | b
---+---
(0 rows)

This is broken. I've not looked into why yet, but from looking at the EXPLAIN output I was a bit surprised to see b=1 as an index condition. I'd have expected a Filter maybe, but I've not looked at the EXPLAIN code to see how those are determined yet.

Hmm... Are you using both patches? And could you provide the index definition? I can't reproduce the problem, assuming the index is created by the statement CREATE INDEX idx ON ab (a) INCLUDING (b);

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
04.01.2016 11:49, David Rowley: On 2 December 2015 at 01:53, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: Finally, completed patch "covering_unique_3.0.patch" is here. It includes the functionality discussed above in the thread, regression tests and docs update. I think it's quite ready for review.

Hi Anastasia, I've maybe mentioned before that I think this is a great feature and I think it will be very useful to have, so I've signed up to review the patch, and below are the results of my first pass from reading the code. Apologies if some of the things seem like nitpicks; I've basically just listed everything I've noticed, no matter how small.

First of all, I would like to thank you for writing such a detailed review. All the mentioned style problems, comments and typos are fixed in patch v4.0.

+ An access method that supports this feature sets pg_am.amcanincluding true.

I don't think this belongs under the "Index Uniqueness Checks" title. I think the "Columns included with clause INCLUDING aren't used to enforce uniqueness." that you've added before it is a good idea, but perhaps the details of amcanincluding are best explained elsewhere.

Agree.

+This clause specifies additional columns to be appended to the set of index columns. +Included columns don't support any constraints (UNIQUE, PRMARY KEY, EXCLUSION CONSTRAINT). +These columns can improve the performance of some queries through using advantages of index-only scan +(Or so called covering indexes. Covering index is the index that +covers all columns required in the query and prevents a table access). +Besides that, included attributes are not stored in index inner pages. +It allows to decrease index size and furthermore it provides a way to extend included +columns to store atttributes without suitable opclass (not implemented yet). +This clause could be applied to both unique and nonunique indexes.
+It's possible to have non-unique covering index, which behaves as a regular +multi-column index with a bit smaller index-size. +Currently, only the B-tree access method supports this feature.

"PRMARY KEY" should be "PRIMARY KEY". I ended up rewriting this paragraph as follows. "An optional INCLUDING clause allows a list of columns to be specified which will be included in the index, in the non-key portion of the index. Columns which are part of this clause cannot also exist in the indexed columns portion of the index, and vice versa. The INCLUDING columns exist solely to allow more queries to benefit from index only scans by including certain columns in the index, the value of which would otherwise have to be obtained by reading the table's heap. Having these columns in the INCLUDING clause in some cases allows PostgreSQL to skip the heap read completely. This also allows UNIQUE indexes to be defined on one set of columns, which can include another set of columns in the INCLUDING clause, on which uniqueness is not enforced. This can also be useful for non-unique indexes as any columns which are not required for the searching or ordering of records can be defined in the INCLUDING clause, which can often reduce the size of the index." Maybe not perfect, but maybe it's an improvement?

Yes, this explanation is much better. I've just added a couple of notes.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 97ef618..d17a06c 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -644,6 +644,13 @@ Does an index of this type manage fine-grained predicate locks? + + amcaninclude + bool + + Does the access method support included columns? + + amkeytype oid @@ -3714,6 +3721,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index.
"Key columns" are ordinary + index columns in contrast with "included" columns. + + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 1c09bae..a102391 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -765,9 +765,10 @@ amrestrpos (IndexScanDesc scan); PostgreSQL enforces SQL uniqueness constraints using unique indexes, which are indexes that disallow - multiple entries with identical keys. An access method that supports this + multiple entries with identical keys. An access method that supports this feature sets pg_am.amcanunique true. - (At present, only b-tree supports it.) + Columns included wit
Re: [HACKERS] WIP: Covering + unique indexes.
for the data type. create index on tbl (a) including (b); CREATE INDEX

This functionality is provided by the attached patch "omit_opclass_4.0", which must be applied over covering_unique_4.0.patch.

I see what you were confused about; I'd had the same question at the very beginning of the discussion of this patch. Now it seems a bit clearer to me. INCLUDING columns are not used for the searching or ordering of records, so there is no need to check whether they have an opclass. INCLUDING columns perform as expected, and this agrees with other databases' behavior. This patch is complete, but it definitely isn't perfect... I found a test case to demonstrate that; see below. That's why we need the optional_opclass functionality, which will use an opclass where possible and omit it in other cases. This idea has already been described in the message Re: [PROPOSAL] Covering + unique indexes <http://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru> as "partially unique index". I suggest separating the optional_opclass task to ease the syntax discussion and subsequent review. I'll implement it in the next patch a bit later.

Test case:

1) patch covering_unique_4.0 + test_covering_unique_4.0. If included columns' opclasses are used, the new query plan is the same as the old one and has nearly the same execution time:

QUERY PLAN Index Only Scan using oldcoveringidx on oldt (cost=0.43..301.72 rows=1 width=8) (actual time=0.021..0.676 rows=6 loops=1) Index Cond: ((c1 < 1) AND (c3 < 20)) Heap Fetches: 0 Planning time: 0.101 ms Execution time: 0.697 ms (5 rows)

QUERY PLAN Index Only Scan using newidx on newt (cost=0.43..276.51 rows=1 width=8) (actual time=0.020..0.665 rows=6 loops=1) Index Cond: ((c1 < 1) AND (c3 < 20)) Heap Fetches: 0 Planning time: 0.082 ms Execution time: 0.687 ms (5 rows)

2) patch covering_unique_4.0 + patch omit_opclass_4.0 + test_covering_unique_4.0. Otherwise, the new query cannot use the included column in the Index Cond and uses a filter instead.
It slows down the query significantly. QUERY PLAN Index Only Scan using oldcoveringidx on oldt (cost=0.43..230.39 rows=1 width=8) (actual time=0.021..0.722 rows=6 loops=1) Index Cond: ((c1 < 1) AND (c3 < 20)) Heap Fetches: 0 Planning time: 0.091 ms Execution time: 0.744 ms (5 rows) QUERY PLAN Index Only Scan using newidx on newt (cost=0.43..374.68 rows=1 width=8) (actual time=0.018..2.595 rows=6 loops=1) Index Cond: (c1 < 1) Filter: (c3 < 20) Rows Removed by Filter: 9993 Heap Fetches: 0 Planning time: 0.078 ms Execution time: 2.612 ms -- Anastasia Lubennikova Postgres Professional:http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 97ef618..d17a06c 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -644,6 +644,13 @@ Does an index of this type manage fine-grained predicate locks? + + amcaninclude + bool + + Does the access method support included columns? + + amkeytype oid @@ -3714,6 +3721,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns in contrast with "included" columns. + + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 1c09bae..a102391 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -765,9 +765,10 @@ amrestrpos (IndexScanDesc scan); PostgreSQL enforces SQL uniqueness constraints using unique indexes, which are indexes that disallow - multiple entries with identical keys. An access method that supports this + multiple entries with identical keys. An access method that supports this feature sets pg_am.amcanunique true. - (At present, only b-tree supports it.) + Columns included with clause INCLUDING aren't used to enforce uniqueness. + (At present, only b-tree supports them.) diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index 23bbec6..44dddb6 100644
Re: [HACKERS] WIP: Covering + unique indexes.
03.12.2015 04:03, Robert Haas writes: On Tue, Dec 1, 2015 at 7:53 AM, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: If we don't need c4 as an index scankey, we don't need any btree opclass on it. But we still want to have it in the covering index for queries like SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan The patch "optional_opclass" completely ignores opclasses of included attributes.

OK, I don't get it. Why have an opclass here at all, even optionally?

We have no opclass on c4, and there's no need for one. But now (without the patch) it's impossible to create a covering index that contains columns with no default btree opclass:

test=# create index on tbl using btree (c1, c4);
ERROR: data type box has no default operator class for access method "btree"

ComputeIndexAttrs() processes the list of index attributes and tries to get an opclass for each of them via GetIndexOpClass(). The patch drops this check for included attributes. This makes it possible to store any datatype in a btree and take advantage of IndexOnlyScan. I hope this helps to clarify.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Covering + unique indexes.
Finally, the completed patch "covering_unique_3.0.patch" is here. It includes the functionality discussed above in the thread, regression tests and a docs update. I think it's quite ready for review.

_Future work:_ Besides that, I'd like to get feedback about the attached patch "optional_opclass_3.0.patch". It should be applied on top of covering_unique_3.0.patch. Actually, this patch is the first step to make opclasses for "included" columns optional and implement real covering indexing.

Example:
CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);
If we don't need c4 as an index scankey, we don't need any btree opclass on it. But we still want to have it in the covering index for queries like SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan

The patch "optional_opclass" completely ignores opclasses of included attributes. To see the difference, look at the explain analyze output:

explain analyze select * from tbl where c1=2 and c4 && box '(0,0,1,1)';
QUERY PLAN --- Index Only Scan using idx on tbl (cost=0.13..4.15 rows=1 width=36) (actual time=0.010..0.013 rows=1 loops=1) Index Cond: (c1 = 2) Filter: (c4 && '(1,1),(0,0)'::box)

"Index Cond" shows the index ScanKey conditions, and "Filter" is for conditions which are applied after the index scan. Anyway, it is faster than the SeqScan we had before, because IndexOnlyScan avoids extra heap fetches. As I already said, this patch is just WIP, so the included opclass is not "optional" but actually "ignored", and the following example works worse than without the patch. Please don't worry about it.
CREATE TABLE tbl2 (c1 int, c2 int); CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2); explain analyze select * from tbl2 where c1<20 and c2<5;
QUERY PLAN --- Index Only Scan using idx2 on tbl2 (cost=0.28..4.68 rows=10 width=8) (actual time=0.055..0.066 rows=9 loops=1) Index Cond: (c1 < 20) Filter: (c2 < 5)

The question is more about suitable syntax. We have two different optimizations here: 1. INCLUDED columns 2. Optional opclasses. It's logical to provide optional opclasses only for included columns. Is it OK to handle it using the same syntax and resolve all opclass conflicts while creating the index?

CREATE TABLE tbl2 (c1 int, c2 int, c4 box);
CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2, c4);
CREATE UNIQUE INDEX idx3 ON tbl2 USING btree (c1) INCLUDING (c4, c2);

Of course, the order of attributes is important. Attributes which have an opclass and want to use it in a ScanKey must come before the others. idx2 will use c2 in the IndexCond, while idx3 will not. But I think that's a job for the DBA. If you see any related changes in the planner, please mention them. I haven't explored that part of the code yet and could have missed something.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index b4ea227..4973e1b 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -644,6 +644,13 @@ Does an index of this type manage fine-grained predicate locks? + + amcanincluding + bool + + Does the access method support included columns? + + amkeytype oid @@ -3704,6 +3711,14 @@ pg_class.relnatts) + + indnkeyatts + int2 + + The number of key columns in the index. "Key columns" are ordinary + index columns in contrast with "included" columns.
+ + indisunique bool diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 1c09bae..0287c62 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -765,9 +765,11 @@ amrestrpos (IndexScanDesc scan); PostgreSQL enforces SQL uniqueness constraints using unique indexes, which are indexes that disallow - multiple entries with identical keys. An access method that supports this + multiple entries with identical keys. An access method that supports this feature sets pg_am.amcanunique true. - (At present, only b-tree supports it.) + Columns included with clause INCLUDING aren't used to enforce uniqueness. + An access method that supports this feature sets pg_am.amcanincluding true. + (At present, only b-tree supports them.) diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/i
Re: [HACKERS] WIP: Covering + unique indexes.
08.10.2015 19:31, Thom Brown writes: On 8 October 2015 at 16:18, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: Hi hackers, I'm working on a patch that allows combining covering and unique functionality for btree indexes. Previous discussion was here: 1) Proposal thread 2) Message with proposal clarification

In a nutshell, the feature allows creating an index with "key" columns and "included" columns. "Key" columns can be used as scan keys. The unique constraint relates only to "key" columns. "Included" columns may be used as scan keys if they have a suitable opclass. Both "key" and "included" columns can be returned from the index by an IndexOnlyScan. Btree is the default index and it's used everywhere, so it requires proper testing. Volunteers are welcome)

Use case: - We have a table (c1, c2, c3, c4); - We need a unique index on (c1, c2). - We would like a covering index on all columns to avoid reading heap pages.

Old way: CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2); CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4); What's wrong? The two indexes contain repeated data, which means overhead on data manipulation operations and database size.

New way: CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

The patch is attached. In 'test.sql' you can find a test with detailed comments on each step, and a comparison of the old and new indexes. The new feature has the following syntax: CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4); The keyword INCLUDING defines the "included" columns of the index. These columns aren't part of the unique constraint. They are also not stored in index inner pages, which allows decreasing the index size.

Results: 1) An additional covering index is not required anymore. 2) The new index can use an IndexOnlyScan on queries where the old index can't.
For example, explain analyze select c1, c2 from newt where c1<1 and c3<20; *more examples in 'test.sql'

Future work: To make opclasses for "included" columns optional. CREATE TABLE tbl (c1 int, c4 box); CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4); If we don't need c4 as an index scankey, we don't need any btree opclass on it. But we still want to have it in the covering index for queries like SELECT c4 FROM tbl WHERE c1=1000; SELECT * FROM tbl WHERE c1=1000;

The definition output needs a space after "INCLUDING": # SELECT pg_get_indexdef('people_first_name_last_name_email_idx'::regclass::oid); pg_get_indexdef -- CREATE UNIQUE INDEX people_first_name_last_name_email_idx ON people USING btree (first_name, last_name) INCLUDING(email) (1 row)

There is also no collation output: # CREATE UNIQUE INDEX test_idx ON people (first_name COLLATE "en_GB", last_name) INCLUDING (email COLLATE "pl_PL"); CREATE INDEX # SELECT pg_get_indexdef('test_idx'::regclass::oid); pg_get_indexdef - CREATE UNIQUE INDEX test_idx ON people USING btree (first_name COLLATE "en_GB", last_name) INCLUDING(email) (1 row)

As for functioning, it works as described: # EXPLAIN SELECT email FROM people WHERE (first_name,last_name) = ('Paul','Freeman'); QUERY PLAN -- Index Only Scan using people_first_name_last_name_email_idx on people (cost=0.28..1.40 rows=1 width=21) Index Cond: ((first_name = 'Paul'::text) AND (last_name = 'Freeman'::text)) (2 rows)

Typo: "included columns must not intersects with key columns" should be: "included columns must not intersect with key columns"

Thank you for testing. The mentioned issues are fixed.
One thing I've noticed you can do with your patch, which you haven't mentioned, is have a non-unique covering index: # CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name); CREATE INDEX # EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul'; QUERY PLAN - Index Only Scan using covering_idx on people (cost=0.28..1.44 rows=4 width=13) Index Cond: (first_name = 'Paul'::text) (2 rows) But this appears to behave as if it were a regular multi-column index, in that it will use the index for ordering rather than sort after fetching from the index. So is this really stored the same as a multi-column index? The index sizes aren't identical, so some
[HACKERS] WIP: Covering + unique indexes.
Hi hackers, I'm working on a patch that allows combining covering and unique functionality for btree indexes. _Previous discussion was here:_ 1) Proposal thread <http://www.postgresql.org/message-id/55f2ccd0.7040...@postgrespro.ru> 2) Message with proposal clarification <http://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru>

In a nutshell, the feature allows creating an index with "key" columns and "included" columns. "Key" columns can be used as scan keys. The unique constraint relates only to "key" columns. "Included" columns may be used as scan keys if they have a suitable opclass. Both "key" and "included" columns can be returned from the index by an IndexOnlyScan. Btree is the default index and it's used everywhere, so it requires proper testing. Volunteers are welcome)

_Use case:_ - We have a table (c1, c2, c3, c4); - We need a unique index on (c1, c2). - We would like a covering index on all columns to avoid reading heap pages.

Old way: CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2); CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4); What's wrong? The two indexes contain repeated data, which means overhead on data manipulation operations and database size.

New way: CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

The patch is attached. In 'test.sql' you can find a test with detailed comments on each step, and a comparison of the old and new indexes. The new feature has the following syntax: CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4); The keyword INCLUDING defines the "included" columns of the index. These columns aren't part of the unique constraint. They are also not stored in index inner pages, which allows decreasing the index size.

_Results:_ 1) An additional covering index is not required anymore. 2) The new index can use an IndexOnlyScan on queries where the old index can't.
For example, explain analyze select c1, c2 from newt where c1<1 and c3<20; *more examples in 'test.sql'

_Future work:_ To make opclasses for "included" columns optional. CREATE TABLE tbl (c1 int, c4 box); CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4); If we don't need c4 as an index scankey, we don't need any btree opclass on it. But we still want to have it in the covering index for queries like SELECT c4 FROM tbl WHERE c1=1000; SELECT * FROM tbl WHERE c1=1000;

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

test.sql Description: application/sql

diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index dc588d7..83f24c3 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -19,6 +19,7 @@
 #include "access/heapam.h"
 #include "access/itup.h"
 #include "access/tuptoaster.h"
+#include "utils/rel.h"
 
 /*
@@ -441,3 +442,30 @@ CopyIndexTuple(IndexTuple source)
 	memcpy(result, source, size);
 	return result;
 }
+
+/*
+ * Reform index tuple. Truncate nonkey (INCLUDED) attributes.
+ */
+IndexTuple
+index_reform_tuple(Relation idxrel, IndexTuple olditup, int natts, int nkeyatts)
+{
+	TupleDesc	itupdesc = RelationGetDescr(idxrel);
+	Datum		values[INDEX_MAX_KEYS];
+	bool		isnull[INDEX_MAX_KEYS];
+	IndexTuple	newitup;
+
+	Assert(natts <= INDEX_MAX_KEYS);
+	Assert(nkeyatts > 0);
+	Assert(nkeyatts <= natts);
+
+	index_deform_tuple(olditup, itupdesc, values, isnull);
+
+	/* form new tuple that will contain only key attributes */
+	itupdesc->natts = nkeyatts;
+	newitup = index_form_tuple(itupdesc, values, isnull);
+	newitup->t_tid = olditup->t_tid;
+
+	itupdesc->natts = natts;
+
+	return newitup;
+}
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..d14df12 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -108,18 +108,23 @@
 _bt_doinsert(Relation rel, IndexTuple itup,
 			 IndexUniqueCheck checkUnique, Relation heapRel)
 {
 	bool		is_unique = false;
-	int			natts = rel->rd_rel->relnatts;
+	int			nkeyatts = rel->rd_rel->relnatts;
 	ScanKey		itup_scankey;
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;

+	Assert(rel->rd_index != NULL);
+	Assert(rel->rd_index->indnatts != 0);
+	Assert(rel->rd_index->indnkeyatts != 0);
+	nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);

 top:
 	/* find the first page containing this key */
-	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+	stack = _bt_search(rel, nkeyatts, itup_scankey, false, &buf, BT_WRITE);

 	offset = InvalidOffset
Re: [HACKERS] [PATCH] Microvacuum for gist.
16.09.2015 07:30, Jeff Janes:

The commit of this patch seems to have created a bug in which updated tuples can disappear from the index while remaining in the table. It looks like the bug depends on going through a crash-recovery cycle, but I am not sure of that yet.

I've looked through the commit diff and don't see anything obviously wrong. I notice index tuples are marked dead with only a buffer content share lock, and the page is defragmented with only a buffer exclusive lock (as opposed to a super-exclusive buffer clean-up lock). But as far as I can tell, both of those should be safe on an index. Also, if that were the bug, it should happen without crash-recovery.

The test is pretty simple. I create a 10,000-row table with a unique-by-construction id column with a btree_gist index on it and a counter column, and fire single-row updates of the counter for random ids in high concurrency (8 processes running flat out). I force the server to crash frequently with simulated torn-page writes in which md.c writes a partial page and then PANICs. Eventually (1 to 3 hours) the updates start indicating they updated 0 rows. At that point, a forced table scan will find the row, but the index doesn't.

Any hints on how to proceed with debugging this? If I can't get it to reproduce the problem in the absence of crash-recovery cycles with an overnight run, then I think my next step will be to run it over hot standby and see if WAL replay in the absence of crashes might be broken as well.

I've found the bug. It happens because of mixed calls of PageIndexMultiDelete() in gistvacuumpage() and PageIndexTupleDelete() in gistRedoPageUpdateRecord(). These two functions conflict with each other. I've fixed my patch by changing the MultiDelete to a TupleDelete in gistvacuumpage(). The patch is attached. But it seems to me that it would be better to rewrite all uses of TupleDelete to MultiDelete in the gist code. I'm working on it.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 4edc5a7..1d02e1b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1478,14 +1478,20 @@ gistvacuumpage(Relation rel, Page page, Buffer buffer)
 		ItemId		itemId = PageGetItemId(page, offnum);

 		if (ItemIdIsDead(itemId))
-			deletable[ndeletable++] = offnum;
+		{
+			deletable[ndeletable] = offnum - ndeletable;
+			ndeletable++;
+		}
 	}

 	if (ndeletable > 0)
 	{
+		int			i;
+
 		START_CRIT_SECTION();

-		PageIndexMultiDelete(page, deletable, ndeletable);
+		for (i = 0; i < ndeletable; i++)
+			PageIndexTupleDelete(page, deletable[i]);

 		/*
 		 * Mark the page as not containing any LP_DEAD items. This is not

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers