Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-05 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation: not tested

The patch looks fine to me. Changes are clear and all functions are commented 
properly.
So I marked it "Ready for committer".

The new status of this patch is: Ready for Committer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-04-04 Thread Anastasia Lubennikova

01.04.2017 02:31, Peter Geoghegan:


* index_truncate_tuple() should have as an argument the number of
attributes. No need to "#include utils/rel.h" that way.

Will fix.


* I think that we should store this (the number of attributes), and
use it directly when comparing, per my remarks to Tom over on that
other thread. We should also use the free bit within
IndexTupleData.t_info, to indicate that the IndexTuple was truncated,
just to make it clear to everyone that might care that that's how
these truncated IndexTuples need to be represented.

Doing this would have no real impact on your patch, because for you
this will be 100% redundant. It will help external tools, and perhaps
another, more general suffix truncation patch that comes in the
future. We should try very hard to have a future-proof on-disk
representation. I think that this is quite possible.
To be honest, I think that it would make the patch overcomplicated, 
because this exact patch has nothing to do with suffix truncation.
Although, we can add any necessary flags if this work is continued 
in the future.
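For readers following along: the "free bit" Peter refers to is the one remaining unused bit in IndexTupleData.t_info, where the low bits hold the tuple size and two high bits are already taken. A standalone toy sketch of how such a truncation flag could work (the TOY_* constants only mirror the real masks; the actual bit assignment would be up to a future patch):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of IndexTupleData.t_info: low bits store the tuple size,
 * two high bits are already used for "has varwidth attributes" and
 * "has nulls".  One bit remains free; a truncation patch could claim
 * it as a "this tuple was truncated" marker. */
#define TOY_SIZE_MASK   0x1FFF
#define TOY_VAR_MASK    0x4000
#define TOY_NULL_MASK   0x8000
#define TOY_TRUNC_MASK  0x2000  /* hypothetical "truncated" flag */

/* Set the truncation flag without disturbing the size bits. */
static uint16_t mark_truncated(uint16_t t_info)
{
    return (uint16_t) (t_info | TOY_TRUNC_MASK);
}

/* Test the truncation flag. */
static int is_truncated(uint16_t t_info)
{
    return (t_info & TOY_TRUNC_MASK) != 0;
}
```

The point of using a dedicated bit is exactly what Peter describes: an external tool can tell a truncated tuple apart without knowing anything about the index definition.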

* I suggest adding a "can't happen" defensive check + error that
checks that the tuple returned by index_truncate_tuple() is sized <=
the original. This cannot be allowed to ever happen. (Not that I think
it will.)

There is already an assertion.
Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup));
Do you think it is not enough?
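The practical difference between the two: an Assert is compiled out of non-cassert builds, while the defensive check Peter asks for runs in production builds too. A minimal standalone illustration, with toy sizes standing in for IndexTupleSize():

```c
#include <assert.h>
#include <stddef.h>

/* Assert-style check: disappears entirely unless assertions are
 * enabled (for PostgreSQL, an --enable-cassert build). */
static void check_with_assert(size_t new_size, size_t old_size)
{
    assert(new_size <= old_size);
}

/* "Can't happen" defensive check: always active, even in production
 * builds.  Returns 0 where the real code would raise an error with
 * elog(ERROR, ...). */
static int check_defensively(size_t new_size, size_t old_size)
{
    if (new_size > old_size)
        return 0;   /* would be an error report in core code */
    return 1;
}
```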

* I see a small bug. You forgot to teach _bt_findsplitloc() about
truncation. It does this currently, which you did not update:

 /*
  * The first item on the right page becomes the high key of the left page;
  * therefore it counts against left space as well as right space.
  */
 leftfree -= firstrightitemsz;

I think that this accounting needs to be fixed.
Could you explain what's wrong with this accounting? We may expect the 
item to take more space on the left page than it actually will after 
high key truncation, but I don't see any problem here.



* Not sure about one thing. What's the reason for this change?


-   /* Log left page */
-   if (!isleaf)
-   {
-   /*
-* We must also log the left page's high key, because the right
-* page's leftmost key is suppressed on non-leaf levels.  Show it
-* as belonging to the left page buffer, so that it is not stored
-* if XLogInsert decides it needs a full-page image of the left
-* page.
-*/
-   itemid = PageGetItemId(origpage, P_HIKEY);
-   item = (IndexTuple) PageGetItem(origpage, itemid);
-   XLogRegisterBufData(0, (char *) item, 
MAXALIGN(IndexTupleSize(item)));
-   }
+   /*
+* We must also log the left page's high key, because the right
+* page's leftmost key is suppressed on non-leaf levels.  Show it
+* as belonging to the left page buffer, so that it is not stored
+* if XLogInsert decides it needs a full-page image of the left
+* page.
+*/
+   itemid = PageGetItemId(origpage, P_HIKEY);
+   item = (IndexTuple) PageGetItem(origpage, itemid);
+   XLogRegisterBufData(0, (char *) item, MAXALIGN(IndexTupleSize(item)));

Is this related to the problem that you mentioned to me that you'd
fixed when we spoke in person earlier today? You said something about
WAL logging, but I don't recall any details. I don't remember seeing
this change in prior versions.

Anyway, whatever the reason for doing this on the leaf level now, the
comments should be updated to explain it.

This change is related to the bug described in this message:
https://www.postgresql.org/message-id/20170330192706.GA2565%40e733.localdomain
After the fix it is no longer reproducible. I will update the comments in the next patch.

* Speaking of WAL-logging, I think that there is another bug in
btree_xlog_split(). You didn't change this existing code at all:

 /*
  * On leaf level, the high key of the left page is equal to the first key
  * on the right page.
  */
 if (isleaf)
 {
 ItemId  hiItemId = PageGetItemId(rpage, P_FIRSTDATAKEY(ropaque));

 left_hikey = PageGetItem(rpage, hiItemId);
 left_hikeysz = ItemIdGetLength(hiItemId);
 }

It seems like this was missed when you changed WAL-logging, since you
do something for this on the logging side, but not here, on the replay
side. No?


I changed it. Now we always use the high key saved in the xlog.
This code is not needed anymore and can be deleted. Thank you for 
noticing. I will send an updated patch today.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-31 Thread Anastasia Lubennikova

31.03.2017 20:57, Robert Haas:

On Fri, Mar 31, 2017 at 1:40 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

First of all, I want to thank you and Robert for reviewing this patch.
Your expertise in postgres subsystems is really necessary for features like
this.
I just wonder why you didn't share your thoughts and doubts until the "last
call".

I haven't done any significant technical work other than review
patches in 14 months, and in the last several months I've often worked
10 and 12 hour days to get more review done.

I think at one level you've got a fair complaint here - it's hard to
get things committed, and this patch probably didn't get as much
attention as it deserved.  It's not so easy to know how to fix that.
I'm pretty sure "tell Andres and Robert to work harder" isn't it.



*off-topic*
No complaints from me, I understand how difficult reviewing is and 
highly appreciate your work.

The problem is that not all developers are qualified enough to do a review.
I've tried to make a course about postgres internals, something like 
"Deep dive into the postgres codebase for hackers",
and it turned out to be really helpful for new developers. So I wonder 
whether we could write some tips for new reviewers and testers as well.







Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-31 Thread Anastasia Lubennikova

31.03.2017 20:47, Andres Freund:

Maybe it would be better to modify index_form_tuple() to accept a new
argument with a number of attributes, then you can just Assert that
this number is never higher than the number of attributes in the
TupleDesc.

Good point.
I agree that this function is a bit strange. I have to set
tupdesc->natts to support compatibility with index_form_tuple().
I didn't want to add either a new field to the tupledesc or a new
parameter to index_form_tuple(), because they are widely used.


But I hadn't considered the possibility of index_form_tuple() failure.
It is fixed in this version of the patch: now it creates a copy of the
tupledesc to pass to index_form_tuple().

That seems like it'd actually be a noticeable increase in memory
allocator overhead.  I think we should just add (as just proposed in
separate thread), a _extended version of it that allows to specify the
number of columns.


The function is not called that often - only once per page split for 
indexes with included columns.
That doesn't look like dramatic overhead, so I decided that a wrapper 
function would be more appropriate than refactoring all of the 
index_form_tuple() calls.

But index_form_tuple_extended() looks like a better solution.
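To make the trade-off concrete, here is a toy sketch of the two approaches (simplified stand-ins, not the real TupleDesc or index_form_tuple() signatures): copying the descriptor costs an allocation per call, while an _extended variant just passes the attribute count explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy descriptor standing in for TupleDesc; natts is the number of
 * attributes it describes. */
typedef struct ToyDesc
{
    int natts;
} ToyDesc;

/* Stand-in for index_form_tuple(): "forms" a tuple by reporting how
 * many attributes it would serialize. */
static int toy_form_tuple(const ToyDesc *desc)
{
    return desc->natts;
}

/* Approach 1 (earlier patch versions): copy the descriptor and clamp
 * natts to the key columns - one extra allocation per page split. */
static int toy_form_tuple_copy(const ToyDesc *desc, int nkeyatts)
{
    ToyDesc *copy = malloc(sizeof(ToyDesc));
    int      n;

    *copy = *desc;
    copy->natts = nkeyatts;
    n = toy_form_tuple(copy);
    free(copy);
    return n;
}

/* Approach 2 (the proposed _extended variant): take the attribute
 * count explicitly; no allocation, and the count can be validated
 * against the descriptor. */
static int toy_form_tuple_extended(const ToyDesc *desc, int nkeyatts)
{
    assert(nkeyatts <= desc->natts);
    return nkeyatts;
}
```

Both approaches produce the same result; the second simply avoids the per-call copy, which is the gist of Andres's objection.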




Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-30 Thread Anastasia Lubennikova

30.03.2017 19:49, Robert Haas:

On Thu, Mar 30, 2017 at 11:26 AM, Teodor Sigaev <teo...@sigaev.ru> wrote:

I had a look at the patch and played with it; it seems fine. I split it
into two patches: core changes (+ bloom index fix) and btree itself. All
docs are left in the first patch - I'm too lazy to rewrite documentation
which is changed in the second patch.
Any objections from reviewers to pushing both patches?

Has this really had enough review and testing?  The last time it was
pushed, it didn't go too well.  And laziness is not a very good excuse
for not dividing up patches properly.


Well,
I don't know how we can estimate the quality of the review or testing.
The patch was reviewed by many people.
Here are those who marked themselves as reviewers on this and previous 
commitfests: Stephen Frost (sfrost), Andrew Dunstan (adunstan), 
Aleksander Alekseev (a.alekseev), Amit Kapila (amitkapila), Andrey 
Borodin (x4m), Peter Geoghegan (pgeoghegan), David Rowley (davidrowley).


That looks serious enough to me. These people, as well as many others, 
shared their thoughts on this topic and pointed out various mistakes.
I fixed all the issues as soon as I could. And I'm not going to 
disappear once it is committed. Personally, I have always thought that 
we have Alpha and Beta releases for integration testing.


Speaking of the feature itself, it has been included in our fork of 
PostgreSQL 9.6 since it was released.
And as far as I know, there have been no complaints from users, which 
makes me believe that there are no critical bugs there.

There may, however, be conflicts with some other features of v10.0.


It seems highly surprising to me that CheckIndexCompatible() only gets
a one line change in this patch.  That seems unlikely to be correct.
What makes you think so? CheckIndexCompatible() only cares about 
possible opclass changes.
For covering indexes, opclasses are only applicable to indnkeyatts, and 
that is exactly what this patch changes.

Do you think it needs some other changes?


Has anybody done some testing of this patch with the WAL consistency
checker?  Like, create some tables with indexes that have INCLUDE
columns, set up a standby, enable consistency checking, pound the
master, and see if the standby bails?
Good point. I missed this feature; I wish someone had mentioned this 
issue a bit earlier.

And as Alexander's test shows, there is indeed a problem with my patch.
I'll fix it and send an updated patch.
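For anyone who wants to repeat the test Robert describes, a minimal recipe could look like the following sketch (the table, the data volume, and the INCLUDE syntax are illustrative and depend on what the final patch accepts):

```
# postgresql.conf on the primary (a standby is already streaming from it):
wal_consistency_checking = 'btree'

-- then generate plenty of page splits on a covering index:
CREATE TABLE t (a int, b int, c int);
CREATE UNIQUE INDEX ON t (a) INCLUDE (b, c);
INSERT INTO t SELECT g, g, g FROM generate_series(1, 1000000) g;
```

On any mismatch between a replayed page and the masked image from the master, the standby stops with a FATAL "inconsistent page found" message in its log.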


Has anybody tested this patch with amcheck?  Does it break amcheck?
Yes, it breaks amcheck. Amcheck should be patched in order to work with 
covering indexes.

Peter and I have discussed it before, and I even wrote a small patch.
I'll attach it in the following message.


A few minor comments:

-foreach(lc, constraint->keys)
+else foreach(lc, constraint->keys)

That doesn't look like a reasonable way of formatting the code.

+/* Here is some code duplication. But we do need it. */

That is not a very informative comment.

+* NOTE It is not crutial for reliability in present,

Spelling, punctuation.



Will be fixed as well.






Re: [HACKERS] pg_serial early wraparound

2017-03-24 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation: not tested

Hi, I've tried to review this patch, but it seems that I am missing 
something essential.
You claim that SLRUs now support five-digit segment names, while in slru.h
at current master I see the following:

 * Note: slru.c currently assumes that segment file names will be four hex
 * digits.  This sets a lower bound on the segment size (64K transactions
 * for 32-bit TransactionIds).
 */
#define SLRU_PAGES_PER_SEGMENT  32

/* Maximum length of an SLRU name */
#define SLRU_MAX_NAME_LENGTH32

Could you please clarify the idea of the patch? Is it still relevant?

I've also run your test script.
pg_clog was renamed to pg_xact, so it needs to be changed accordingly:
echo "Contents of pg_clog:"
  ls $PGDATA/pg_xact/


The test shows a failed assertion:

== setting next xid to 1073741824 =
Transaction log reset
waiting for server to start2017-03-24 17:05:19.897 MSK [1181] LOG:  
listening on IPv4 address "127.0.0.1", port 5432
2017-03-24 17:05:19.981 MSK [1181] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
2017-03-24 17:05:20.081 MSK [1183] LOG:  database system was shut down at 
2017-03-24 17:05:19 MSK
2017-03-24 17:05:20.221 MSK [1181] LOG:  database system is ready to accept 
connections
 done
server started
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template0"
vacuumdb: vacuuming database "template1"
TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(oldestXact, 
ShmemVariableCache->oldestXid))", File: "clog.c", Line: 669)
vacuumdb: vacuuming of database "template1" failed: server closed the 
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2017-03-24 17:05:21.541 MSK [1181] LOG:  server process (PID 1202) was 
terminated by signal 6: Aborted
2017-03-24 17:05:21.541 MSK [1181] DETAIL:  Failed process was running: VACUUM 
(FREEZE);

The new status of this patch is: Waiting on Author



[HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-22 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation: not tested

The patch looks good to me.
It applies cleanly, passes all the tests and eliminates the bottleneck 
described in the first message.
And as I can see from the thread, there were no objections from others,
except a few minor comments about code style, which are fixed in the last 
version of the patch.
So I mark it "Ready for committer".

The new status of this patch is: Ready for Committer



Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-21 Thread Anastasia Lubennikova

Patch rebased to the current master is in attachments.


commit 497d52b713dd8f926b465ddf22f21db7229b12e3
Author: Anastasia <a.lubennik...@postgrespro.ru>
Date:   Tue Mar 21 12:58:13 2017 +0300

include_columns_10.0_v4.patch

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index c1e9089..5c80e3b 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1480,7 +1480,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1506,7 +1506,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_PP(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1526,9 +1526,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2016,10 +2016,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2029,8 +2029,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2051,12 +2051,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 1241108..943e410 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 appendStringInfoCharMacro(payload, ',');
 appendStringInfoCharMacro(payload, operation);
 
-for (i = 0; i < numatts; i++)
+for (i = 0; i < indnkeyatts; i++)
 {
 	int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index df0435c..e196e20 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3620,6 +3620,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns (as opposed to "included" columns).
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index ac51258..f9539e9 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -114,6 +114,8 @@ typedef struct IndexAmRoutine
 boolamcanparallel;
 /* type of data stored in index, or InvalidOi

Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-03-21 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation: not tested

As I can see, this bugfix was already discussed and reviewed.
All mentioned issues are fixed in the latest version of the patch,
so I mark it "Ready for committer".
Thank you for fixing it!

The new status of this patch is: Ready for Committer



Re: [HACKERS] btree_gin and btree_gist for enums

2017-03-21 Thread Anastasia Lubennikova

28.02.2017 00:22, Andrew Dunstan:

OK, here's the whole series of patches.

Patch 1 adds the CallerFInfoFunctionCall{1,2} functions.

Patch 2 adds btree_gist support for their use for non-varlena types

Patch 3 does the same for varlena types (Not required for patch 4, but
better to be consistent, I think.)

Patch 4 adds enum support to btree_gist

Patch 5 adds enum support to btree_gin

cheers

andrew


Thank you for implementing the feature!
These patches seem to have some merge conflicts with a recent commit:

commit c7a9fa399d557c6366222e90b35db31e45d25678
Author: Stephen Frost <sfr...@snowman.net>
Date:   Wed Mar 15 11:16:25 2017 -0400

Add support for EUI-64 MAC addresses as macaddr8

Also, patch 0002 needs to be updated to take macaddr8 into account; I 
attached the diff.
A complete patch (including the 0002_addition fix) rebased to the current 
master is attached as well.

It works as expected, and the code itself looks clear and well documented.

The only issue I've found is a make check failure in the contrib/btree_gin 
subdirectory:


test numeric  ... ok
test enum ... /bin/sh: 1: cannot open 
/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/sql/enum.sql: 
No such file
diff: 
/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out: 
No such file or directory
diff command failed with status 512: diff 
"/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/expected/enum.out" 
"/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out" 
> 
"/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out.diff"



Please add the missing regression test files for btree_gin, and then this 
patch can be considered "Ready for committer".



diff --git a/contrib/btree_gist/btree_macaddr8.c b/contrib/btree_gist/btree_macaddr8.c
index 13238ef..96afbcd 100644
--- a/contrib/btree_gist/btree_macaddr8.c
+++ b/contrib/btree_gist/btree_macaddr8.c
@@ -28,37 +28,37 @@ PG_FUNCTION_INFO_V1(gbt_macad8_same);
 
 
 static bool
-gbt_macad8gt(const void *a, const void *b)
+gbt_macad8gt(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_gt, PointerGetDatum(a), PointerGetDatum(b)));
 }
 static bool
-gbt_macad8ge(const void *a, const void *b)
+gbt_macad8ge(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_ge, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8eq(const void *a, const void *b)
+gbt_macad8eq(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_eq, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8le(const void *a, const void *b)
+gbt_macad8le(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_le, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8lt(const void *a, const void *b)
+gbt_macad8lt(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_lt, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 
 static int
-gbt_macad8key_cmp(const void *a, const void *b)
+gbt_macad8key_cmp(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	mac8KEY*ia = (mac8KEY *) (((const Nsrt *) a)->t);
 	mac8KEY*ib = (mac8KEY *) (((const Nsrt *) b)->t);
@@ -142,7 +142,7 @@ gbt_macad8_consistent(PG_FUNCTION_ARGS)
	key.upper = (GBT_NUMKEY *) &kkk->upper;
 
 	PG_RETURN_BOOL(
-   gbt_num_consistent(&key, (void *) query, &strategy, GIST_LEAF(entry), &tinfo)
+   gbt_num_consistent(&key, (void *) query, &strategy, GIST_LEAF(entry), &tinfo, fcinfo->flinfo)
 		);
 }
 
@@ -154,7 +154,7 @@ gbt_macad8_union(PG_FUNCTION_ARGS)
 	void	   *out = palloc0(sizeof(mac8KEY));
 
 	*(int *) PG_GETARG_POINTER(1) = sizeof(mac8KEY);
-	PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, &tinfo));
+	PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, &tinfo, fcinfo->flinfo));
 }
 
 
@@ -184,7 +184,7 @@ gbt_macad8_picksplit(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(gbt_num_picksplit(
 	(GistEntryVector *) PG_GETARG_POINTER(0),
 	  (GIST_SPLITVEC *) PG_GETARG_POINTER(1),
-									&tinfo
+									&tinfo, fcinfo->flinfo
 		));
 }
 
@@ -195,6 +195,6 @@ gbt_macad8_same(PG_FUNCTION_ARGS)
 	mac8KEY*b2 = (mac8KEY *) PG_GETARG_POINTER(1);
 	bool	   *result = (bool *) PG_GETARG_POINTER(2);
 
-	*result = gbt_num_same((void *) b1, (void *) b2, &tinfo);
+	*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
 	PG_RETURN_POINTER(result);
 }
commit 50f4a4efcb9ed3ae5e0378676459c6d1ce455bd2
Author: Anastasia <a.lubennik...@postgrespro.ru>
Date:   Tue Mar 21 10:11:37 2017 +0300

complete btree_gin_gist_enum patch

diff --git a/contrib/btree_gin/Makefile b/contrib/btree_gin/Makefile
index f22e

Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-15 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation: tested, passed

As far as I can see, this bugfix works as expected and the patch is small 
and clear, so I marked it "Ready for committer".
Anyway, it would be great if David could also have a look at the patch.

And again, thank you for fixing this issue!

The new status of this patch is: Ready for Committer



Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-14 Thread Anastasia Lubennikova
As far as I understand, two bugs of pg_stop_backup() were discussed in 
this thread.
Thanks to the clear descriptions above, I easily reproduced both of them.

BUG#1:
The server crashes on an assertion in pg_stop_backup(false) after an 
interrupted call of pg_stop_backup(false).
TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)", File: 
"xlog.c", Line: 10747)

BUG#2:
Failure to start an exclusive backup with the same name, if the previous 
exclusive backup was stopped in another session.

Both problems seem to be fixed by the patch "backup-session-locks-fixes.patch".
Speaking of the patch itself, I have a question: shouldn't we also update 
sessionBackupState in pg_stop_backup_callback() along with 
XLogCtl->Insert.exclusiveBackupState?

And a couple of minor notes:
1) + * Routines to starting stop, and get status of a base backup
Probably should be: + * Routines to start, stop and get status of a base backup
And also this comment should be moved below the enum.

2) This is used in parallel of the shared memory status 
s/ in parallel of/ in parallel with


The new status of this patch is: Waiting on Author



Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements

2017-03-13 Thread Anastasia Lubennikova

13.03.2017 11:53, Artur Zakirov:

On 15.02.2017 20:54, Anastasia Lubennikova wrote:


Done.



I have gotten an error that AlterUserMappingStmt doesn't have 
if_not_exists (compiler output translated from Russian):



gram.y: In function 'base_yyparse':
gram.y:4918:7: error: 'AlterUserMappingStmt {aka struct 
AlterUserMappingStmt}' has no member named 'if_not_exists'

  n->if_not_exists = false;
   ^~


After applying the CREATE USER patch in gram.y I have:



AlterUserMappingStmt: ALTER USER MAPPING FOR auth_ident SERVER name 
alter_generic_options

{
AlterUserMappingStmt *n = 
makeNode(AlterUserMappingStmt);

n->user = $5;
n->servername = $7;
n->options = $8;
n->if_not_exists = false;
$$ = (Node *) n;
}
| CREATE USER MAPPING IF_P NOT EXISTS FOR auth_ident 
SERVER name create_generic_options

{
CreateUserMappingStmt *n = 
makeNode(CreateUserMappingStmt);

n->user = $8;
n->servername = $10;
n->options = $11;
n->if_not_exists = true;
$$ = (Node *) n;
}
;


Here the ALTER USER MAPPING and CREATE USER MAPPING commands got mixed up.



Thanks for catching that.
It was caused by a conflict when applying the patch.
Updated versions of both patches are attached.


diff --git a/doc/src/sgml/ref/create_server.sgml b/doc/src/sgml/ref/create_server.sgml
index 734c6c9..6777679 100644
--- a/doc/src/sgml/ref/create_server.sgml
+++ b/doc/src/sgml/ref/create_server.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
+CREATE SERVER [IF NOT EXISTS] server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
 FOREIGN DATA WRAPPER fdw_name
 [ OPTIONS ( option 'value' [, ... ] ) ]
 
@@ -56,6 +56,19 @@ CREATE SERVER server_name [ TYPE '<
   Parameters
 
   
+
+  
+IF NOT EXISTS
+
+ 
+  Do not throw an error if a server with the same name already exists.
+  A notice is issued in this case.  Note that there is no guarantee that
+  the existing server is anything like the one that would have been
+  created.
+ 
+
+   
+

 server_name
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index d5d40e6..41b2c01 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -879,12 +879,25 @@ CreateForeignServer(CreateForeignServerStmt *stmt)
 
 	/*
 	 * Check that there is no other foreign server by this name.
+	 * Do nothing if IF NOT EXISTS was enforced.
 	 */
 	if (GetForeignServerByName(stmt->servername, true) != NULL)
-		ereport(ERROR,
-(errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("server \"%s\" already exists",
-		stmt->servername)));
+	{
+		if (stmt->if_not_exists)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists, skipping",
+			stmt->servername)));
+			heap_close(rel, RowExclusiveLock);
+			return InvalidObjectAddress;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists",
+			stmt->servername)));
+	}
 
 	/*
 	 * Check that the FDW exists and that we have USAGE on it. Also get the
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e7acc2d..da67b51 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4621,6 +4621,19 @@ CreateForeignServerStmt: CREATE SERVER name opt_type opt_foreign_server_version
 	n->version = $5;
 	n->fdwname = $9;
 	n->options = $10;
+	n->if_not_exists = false;
+	$$ = (Node *) n;
+}
+| CREATE SERVER IF_P NOT EXISTS name opt_type opt_foreign_server_version
+		 FOREIGN DATA_P WRAPPER name create_generic_options
+{
+	CreateForeignServerStmt *n = makeNode(CreateForeignServerStmt);
+	n->servername = $6;
+	n->servertype = $7;
+	n->version = $8;
+	n->fdwname = $12;
+	n->options = $13;
+	n->if_not_exists = true;
 	$$ = (Node *) n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index a44d217..804436b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2151,6 +2151,7 @@ typedef struct CreateForeignServerStmt
 	char	   *servertype;		/* optional server type */
 	char	   *version;		/* optional server version */
 	char	   *fdwname;		/* FDW name */
+	bool		if_not_exists;	/* just do nothing if it already exists? */
 	List	   *opt

Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements

2017-02-15 Thread Anastasia Lubennikova

13.02.2017 19:34, Andrew Dunstan:


On 01/13/2017 08:36 AM, Anastasia Lubennikova wrote:

I implemented IF NOT EXISTS option for CREATE SERVER and CREATE USER
MAPPING statements
for one of our customers.
I think other users can also find it useful for scripting and
automated tasks.
The patches themselves are small and simple. Documentation is updated
as well.




This looks good and useful. Please add some regression tests.


Done.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_server.sgml b/doc/src/sgml/ref/create_server.sgml
index 734c6c9..6777679 100644
--- a/doc/src/sgml/ref/create_server.sgml
+++ b/doc/src/sgml/ref/create_server.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
+CREATE SERVER [IF NOT EXISTS] server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
 FOREIGN DATA WRAPPER fdw_name
 [ OPTIONS ( option 'value' [, ... ] ) ]
 
@@ -56,6 +56,19 @@ CREATE SERVER server_name [ TYPE '<
   Parameters
 
   
+
+  
+IF NOT EXISTS
+
+ 
+  Do not throw an error if a server with the same name already exists.
+  A notice is issued in this case.  Note that there is no guarantee that
+  the existing server is anything like the one that would have been
+  created.
+ 
+
+   
+

 server_name
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 6ff8b69..b4ae5de 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -883,12 +883,25 @@ CreateForeignServer(CreateForeignServerStmt *stmt)
 
 	/*
 	 * Check that there is no other foreign server by this name.
+	 * Do nothing if IF NOT EXISTS was enforced.
 	 */
 	if (GetForeignServerByName(stmt->servername, true) != NULL)
-		ereport(ERROR,
-(errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("server \"%s\" already exists",
-		stmt->servername)));
+	{
+		if (stmt->if_not_exists)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists, skipping",
+			stmt->servername)));
+			heap_close(rel, RowExclusiveLock);
+			return InvalidObjectAddress;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists",
+			stmt->servername)));
+	}
 
 	/*
 	 * Check that the FDW exists and that we have USAGE on it. Also get the
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a4edea0..d007468 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4655,6 +4655,19 @@ CreateForeignServerStmt: CREATE SERVER name opt_type opt_foreign_server_version
 	n->version = $5;
 	n->fdwname = $9;
 	n->options = $10;
+	n->if_not_exists = false;
+	$$ = (Node *) n;
+}
+| CREATE SERVER IF_P NOT EXISTS name opt_type opt_foreign_server_version
+		 FOREIGN DATA_P WRAPPER name create_generic_options
+{
+	CreateForeignServerStmt *n = makeNode(CreateForeignServerStmt);
+	n->servername = $6;
+	n->servertype = $7;
+	n->version = $8;
+	n->fdwname = $12;
+	n->options = $13;
+	n->if_not_exists = true;
 	$$ = (Node *) n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 07a8436..704bc6b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2113,6 +2113,7 @@ typedef struct CreateForeignServerStmt
 	char	   *servertype;		/* optional server type */
 	char	   *version;		/* optional server version */
 	char	   *fdwname;		/* FDW name */
+	bool		if_not_exists;	/* just do nothing if it already exists? */
 	List	   *options;		/* generic options to server */
 } CreateForeignServerStmt;
 
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 3a9fb8f..17f9f40 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -283,7 +283,9 @@ ERROR:  foreign-data wrapper "foo" does not exist
 CREATE FOREIGN DATA WRAPPER foo OPTIONS ("test wrapper" 'true');
 CREATE SERVER s1 FOREIGN DATA WRAPPER foo;
 CREATE SERVER s1 FOREIGN DATA WRAPPER foo;  -- ERROR
-ERROR:  server "s1" already exists
+ERROR:  foreign server "s1" already exists
+CREATE SERVER IF NOT EXISTS s1 FOREIGN DATA WRAPPER foo;	-- No ERROR, just NOTICE
+NOTICE:  foreign server "s1" already exists, skipping
 CREATE SERVER s2 FOREIGN DATA WRAPPER foo OPTIONS (host 'a', dbname 'b');
 CREATE SERVER s3 TYPE 'oracle' FOREIGN DATA WRAPPER foo;
 CREATE SERVER s4 TYPE 'oracle' FOREIGN DATA WRAPPER foo OPTIONS (host 'a', dbname 'b');
diff --git a/src/test/regress/sql

[HACKERS] Cast jsonb to numeric, int, float, bool

2017-02-01 Thread Anastasia Lubennikova
Now the simplest way to extract booleans and numbers from json/jsonb is
to cast it to text and then cast to the appropriate type:

postgres=# select 'true'::jsonb::text::bool;
 bool
--
 t
postgres=# select '1.0'::jsonb::text::numeric;
 numeric
-
 1.0


This patch implements direct casts from jsonb numeric (jbvNumeric) to
numeric, int4 and float8, and from jsonb bool (jbvBool) to bool.

postgres=# select 'true'::jsonb::bool;
 bool
--
 t

postgres=# select '1.0'::jsonb::numeric;
 numeric
-
 1.0


Waiting for your feedback.
If you find it useful, I can also add support for json and other types,
such as smallint and bigint.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index b9bf18f..4bbe81c 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -1939,3 +1939,129 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(out);
 }
+
+Datum
+jsonb_numeric(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_NUMERIC(v.val.numeric);
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_int4(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_INT32(DatumGetInt32(DirectFunctionCall1(numeric_int4,
+		  NumericGetDatum(v.val.numeric))));
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_float8(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_FLOAT8(DatumGetFloat8(DirectFunctionCall1(numeric_float8,
+			NumericGetDatum(v.val.numeric))));
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_bool(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvBool)
+			PG_RETURN_BOOL(v.val.boolean);
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json boolean")));
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 80a40ab..0646f99 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -377,5 +377,9 @@ DATA(insert ( 1700 1700 1703 i f ));
 /* json to/from jsonb */
DATA(insert (  114 3802 0 a i ));
DATA(insert ( 3802	114 0 a i ));
+DATA(insert ( 3802	1700   774 e f ));
+DATA(insert ( 3802	23 775 e f ));
+DATA(insert ( 3802	701776 e f ));
+DATA(insert ( 3802	16 777 e f ));
 
 #endif   /* PG_CAST_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 31c828a..ff7da5b 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2364,6 +2364,15 @@ DESCR("convert int2 to numeric");
 DATA(insert OID = 1783

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-01-19 Thread Anastasia Lubennikova

28.12.2016 23:43, Claudio Freire:

Attached v4 patches with the requested fixes.


Sorry for being late, but the tests took a lot of time.

create table t1 as select i, md5(random()::text) from generate_series(0,4) as i;
create index md5_idx ON t1(md5);
update t1 set md5 = md5((random() * (100 + 500))::text);
vacuum;

Patched vacuum used 2.9Gb of memory and vacuumed the index in one pass,
while for old version it took three passes (1GB+1GB+0.9GB).
Vacuum duration results:

vanilla:
LOG: duration: 4359006.327 ms  statement: vacuum verbose t1;
patched:
LOG: duration: 3076827.378 ms  statement: vacuum verbose t1;

We can see a 30% vacuum speedup. I should note that this case can be
considered as favorable to vanilla vacuum: the table is not that big, it
has just one index, and the disk used is a fast FusionIO. We can expect
even more gain on slower disks.


Thank you again for the patch. Hope to see it in 10.0.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GSoC 2017

2017-01-17 Thread Anastasia Lubennikova

I'm ready to be a mentor.

10.01.2017 12:53, Alexander Korotkov:

Hi all!

In 2016 the PostgreSQL project didn't get into the GSoC program.  In my
understanding, the reasons for that are the following.


1. We did last-minute submission of our application to GSoC.
2. In 2016 GSoC application form for mentoring organizations has been 
changed.  In particular, it required more detailed information about 
possible project.


As a result we didn't manage to make a good enough application that
time.  Thus, our application was declined. See [1] and [2] for details.


I think that the right way to manage this in 2017 would be to start 
collecting required information in advance. According to GSoC 2017 
timeline [3] mentoring organization can submit their applications from 
January 19 to February 9. Thus, now it's a good time to start 
collecting project ideas and make call for mentors.  Also, we need to 
decide who would be our admin this year.


In sum, we have following questions:
1. What project ideas we have?
2. Who are going to be mentors this year?
3. Who is going to be project admin this year?

BTW, I'm ready to be mentor this year.  I'm also open to be an admin 
if needed.


[1] 
https://www.postgresql.org/message-id/flat/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww%40mail.gmail.com

[3] https://developers.google.com/open-source/gsoc/timeline

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Parallel Index Scans

2017-01-13 Thread Anastasia Lubennikova

27.12.2016 17:33, Amit Kapila:

On Fri, Dec 23, 2016 at 6:42 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

22.12.2016 07:19, Amit Kapila:

On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
<lubennikov...@gmail.com> wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi, thank you for the patch.
Results are very promising. Do you see any drawbacks of this feature or
something that requires more testing?


I think you can focus on the handling of array scan keys for testing.
In general, one of my colleagues has shown interest in testing this
patch and I think he has tested as well but never posted his findings.
I will request him to share his findings and what kind of tests he has
done, if any.



Please check the code related to buffer locking and pinning once again.
I got the warning. Here are the steps to reproduce it:
Except for "autovacuum = off", the config is default.

pgbench -i -s 100 test
pgbench -c 10 -T 120 test

 SELECT count(aid) FROM pgbench_accounts
 WHERE aid > 1000 AND aid < 90 AND bid > 800 AND bid < 900;
WARNING:  buffer refcount leak: [8297] (rel=base/12289/16459, blockNum=2469,
flags=0x9380, refcount=1 1)
  count


The similar problem has occurred while testing "parallel index only
scan" patch and Rafia has included the fix in her patch [1] which
ideally should be included in this patch, so I have copied the fix
from her patch.  Apart from that, I observed that similar problem can
happen for backward scans, so fixed the same as well.


I confirm that this problem is solved.


But I'm trying to find the worst cases for this feature. And I suppose we
should test parallel index scans with
concurrent insertions. More parallel readers we have, higher the
concurrency.
I doubt that it can significantly decrease performance, because number of
parallel readers is not that big,


I am not sure if such a test is meaningful for this patch because
parallelism is generally used for large data reads and in such cases
there are generally not many concurrent writes.


I didn't find any case of noticeable performance degradation,
so I set the status to "Ready for committer".
Thank you for this patch.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



[HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements

2017-01-13 Thread Anastasia Lubennikova
I implemented the IF NOT EXISTS option for the CREATE SERVER and CREATE
USER MAPPING statements for one of our customers.
I think other users can also find it useful for scripting and automated
tasks.
The patches themselves are small and simple. Documentation is updated as
well.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit cf20fa8c9f29310d8f67e5b198a9eb908d903431
Author: Anastasia <a.lubennik...@postgrespro.ru>
Date:   Fri Jan 13 16:22:01 2017 +0300

Add [IF NOT EXISTS] option to CREATE SERVER statement

diff --git a/doc/src/sgml/ref/create_server.sgml b/doc/src/sgml/ref/create_server.sgml
index 734c6c9..6777679 100644
--- a/doc/src/sgml/ref/create_server.sgml
+++ b/doc/src/sgml/ref/create_server.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
+CREATE SERVER [IF NOT EXISTS] server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
 FOREIGN DATA WRAPPER fdw_name
 [ OPTIONS ( option 'value' [, ... ] ) ]
 
@@ -56,6 +56,19 @@ CREATE SERVER server_name [ TYPE '<
   Parameters
 
   
+
+  
+IF NOT EXISTS
+
+ 
+  Do not throw an error if a server with the same name already exists.
+  A notice is issued in this case.  Note that there is no guarantee that
+  the existing server is anything like the one that would have been
+  created.
+ 
+
+   
+

 server_name
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 06b4bc3..0b0114c 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -883,12 +883,25 @@ CreateForeignServer(CreateForeignServerStmt *stmt)
 
 	/*
 	 * Check that there is no other foreign server by this name.
+	 * Do nothing if IF NOT EXISTS was enforced.
 	 */
 	if (GetForeignServerByName(stmt->servername, true) != NULL)
-		ereport(ERROR,
-(errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("server \"%s\" already exists",
-		stmt->servername)));
+	{
+		if (stmt->if_not_exists)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists, skipping",
+			stmt->servername)));
+			heap_close(rel, RowExclusiveLock);
+			return InvalidObjectAddress;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists",
+			stmt->servername)));
+	}
 
 	/*
 	 * Check that the FDW exists and that we have USAGE on it. Also get the
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9eef550..ab4b3b1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4621,6 +4621,19 @@ CreateForeignServerStmt: CREATE SERVER name opt_type opt_foreign_server_version
 	n->version = $5;
 	n->fdwname = $9;
 	n->options = $10;
+	n->if_not_exists = false;
+	$$ = (Node *) n;
+}
+| CREATE SERVER IF_P NOT EXISTS name opt_type opt_foreign_server_version
+		 FOREIGN DATA_P WRAPPER name create_generic_options
+{
+	CreateForeignServerStmt *n = makeNode(CreateForeignServerStmt);
+	n->servername = $6;
+	n->servertype = $7;
+	n->version = $8;
+	n->fdwname = $12;
+	n->options = $13;
+	n->if_not_exists = true;
 	$$ = (Node *) n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 7ceaa22..8a79ec0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2075,6 +2075,7 @@ typedef struct CreateForeignServerStmt
 	char	   *servertype;		/* optional server type */
 	char	   *version;		/* optional server version */
 	char	   *fdwname;		/* FDW name */
+	bool		if_not_exists;	/* just do nothing if it already exists? */
 	List	   *options;		/* generic options to server */
 } CreateForeignServerStmt;
 
commit 4237a23d6674da774b0772ffc953e5f499c04803
Author: Anastasia <a.lubennik...@postgrespro.ru>
Date:   Fri Jan 13 16:23:53 2017 +0300

Add [IF NOT EXISTS] option to CREATE USER MAPPING statement

diff --git a/doc/src/sgml/ref/create_user_mapping.sgml b/doc/src/sgml/ref/create_user_mapping.sgml
index 44fe302..680906b 100644
--- a/doc/src/sgml/ref/create_user_mapping.sgml
+++ b/doc/src/sgml/ref/create_user_mapping.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE USER MAPPING FOR { user_name | USER | CURRENT_USER | PUBLIC }
+CREATE USER MAPPING [IF NOT EXISTS] FOR { user_name | USER | CURRENT_USER | PUBLIC }
 SERVER server_name
 [ OPTIONS ( option 'value' [ , ... ] ) ]
 
@@ -49,6 +49,18 @@ CREATE USER MAPPING FOR { user_name
  
   Parameters
 
+  
+IF NOT EXISTS
+
+ 
+  Do not throw an error if a user mapping with the same name already exists.
+  A notice is issued in this case.  Note that 

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-28 Thread Anastasia Lubennikova

27.12.2016 20:14, Claudio Freire:

On Tue, Dec 27, 2016 at 10:41 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
1417			tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]);
(gdb) bt
#0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
#1  0x00693dfe in lazy_scan_heap (onerel=0x1ec2360, options=9,
vacrelstats=0x1ef6e00, Irel=0x1ef7168, nindexes=2, aggressive=1 '\001')
 at vacuumlazy.c:1337
#2  0x00691e66 in lazy_vacuum_rel (onerel=0x1ec2360, options=9,
params=0x7ffe0f866310, bstrategy=0x1f1c4a8) at vacuumlazy.c:290
#3  0x0069191f in vacuum_rel (relid=1247, relation=0x0, options=9,
params=0x7ffe0f866310) at vacuum.c:1418

Those line numbers don't match my code.

Which commit are you based on?

My tree is (currently) based on 71f996d22125eb6cfdbee6094f44370aa8dec610


Hm, my branch is based on 71f996d22125eb6cfdbee6094f44370aa8dec610 as well.
I merely applied patches 0001-Vacuum-prefetch-buffers-on-backward-scan-v3.patch
and 0002-Vacuum-allow-using-more-than-1GB-work-mem-v3.patch,
then ran configure and make as usual.
Am I doing something wrong?

Anyway, I found the problem that had caused the segfault.

for (segindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; tupindex = 0, segindex++)
{
	DeadTuplesSegment *seg = &(vacrelstats->dead_tuples.dead_tuples[segindex]);
	int		num_dead_tuples = seg->num_dead_tuples;

while (tupindex < num_dead_tuples)
...

You rely on the value of tupindex here, while during the very first pass
the 'tupindex' variable may contain any garbage. And it happened that on
my system there was a negative value, as I found inspecting the core dump:

(gdb) info locals
num_dead_tuples = 5
tottuples = 0
tupindex = -1819017215

Which leads to failure in the next line
tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]);

The solution is to move this assignment inside the cycle.

I've read the second patch.

1. What is the reason to inline vac_cmp_itemptr() ?

2. +#define LAZY_MIN_TUPLES	Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))
What does 128<<20 mean? Why not 1<<27? I'd ask you to replace it with a
named constant, or at least add a comment.

I'll share my results of performance testing it in a few days.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-28 Thread Anastasia Lubennikova

27.12.2016 16:54, Alvaro Herrera:

Anastasia Lubennikova wrote:


I ran configure using following set of flags:
  ./configure --enable-tap-tests --enable-cassert --enable-debug
--enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"
And then ran make check. Here is the stacktrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360,
vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
1417			tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]);

This doesn't make sense, since the patch removes the "tupindex"
variable in that function.



Sorry, I didn't get what you mean.

In  0002-Vacuum-allow-using-more-than-1GB-work-mem-v3.patch

-tupindex = 0;
-while (tupindex < vacrelstats->num_dead_tuples)
+segindex = 0;
+tottuples = 0;
+for (segindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; tupindex = 0, segindex++)
 {

variable is still there, it was just moved to the loop.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-27 Thread Anastasia Lubennikova

23.12.2016 22:54, Claudio Freire:

On Fri, Dec 23, 2016 at 1:39 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

I found the reason. I configured postgres with CFLAGS="-O0" and it causes
a segfault on initdb.
It works fine and passes tests with default configure flags, but I'm pretty
sure that we should fix the segfault before testing the feature.
If you need it, I'll send a core dump.

I just ran it with CFLAGS="-O0" and it passes all checks too:

CFLAGS='-O0' ./configure --enable-debug --enable-cassert
make clean && make -j8 && make check-world

A stacktrace and a thorough description of your build environment
would be helpful to understand why it breaks on your system.


I ran configure using following set of flags:
 ./configure --enable-tap-tests --enable-cassert --enable-debug 
--enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"

And then ran make check. Here is the stacktrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, 
vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
1417			tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]);

(gdb) bt
#0  0x006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, 
vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
#1  0x00693dfe in lazy_scan_heap (onerel=0x1ec2360, options=9, 
vacrelstats=0x1ef6e00, Irel=0x1ef7168, nindexes=2, aggressive=1 '\001')

at vacuumlazy.c:1337
#2  0x00691e66 in lazy_vacuum_rel (onerel=0x1ec2360, options=9, 
params=0x7ffe0f866310, bstrategy=0x1f1c4a8) at vacuumlazy.c:290
#3  0x0069191f in vacuum_rel (relid=1247, relation=0x0, 
options=9, params=0x7ffe0f866310) at vacuum.c:1418
#4  0x00690122 in vacuum (options=9, relation=0x0, relid=0, 
params=0x7ffe0f866310, va_cols=0x0, bstrategy=0x1f1c4a8,

isTopLevel=1 '\001') at vacuum.c:320
#5  0x0068fd0b in vacuum (options=-1652367447, relation=0x0, 
relid=3324614038, params=0x1f11bf0, va_cols=0xb59f63,

bstrategy=0x1f1c620, isTopLevel=0 '\000') at vacuum.c:150
#6  0x00852993 in standard_ProcessUtility (parsetree=0x1f07e60, 
queryString=0x1f07468 "VACUUM FREEZE;\n",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0xea5cc0 
, completionTag=0x7ffe0f866750 "") at utility.c:669
#7  0x008520da in standard_ProcessUtility 
(parsetree=0x401ef6cd8, queryString=0x18 <error: Cannot access memory at address 0x18>,
context=PROCESS_UTILITY_TOPLEVEL, params=0x68, dest=0x9e5d62 
<AllocSetFree+60>, completionTag=0x7ffe0f8663f0 "`~\360\001")

at utility.c:360
#8  0x00851161 in PortalRunMulti (portal=0x7ffe0f866750, 
isTopLevel=0 '\000', setHoldSnapshot=-39 '\331',
dest=0x851161 <PortalRunMulti+19>, altdest=0x7ffe0f8664f0, 
completionTag=0x1f07e60 "\341\002") at pquery.c:1219
#9  0x00851374 in PortalRunMulti (portal=0x1f0a488, isTopLevel=1 
'\001', setHoldSnapshot=0 '\000', dest=0xea5cc0 ,
altdest=0xea5cc0 , completionTag=0x7ffe0f866750 "") at 
pquery.c:1345
#10 0x00850889 in PortalRun (portal=0x1f0a488, 
count=9223372036854775807, isTopLevel=1 '\001', dest=0xea5cc0 ,
altdest=0xea5cc0 , completionTag=0x7ffe0f866750 "") at 
pquery.c:824
#11 0x0084a4dc in exec_simple_query (query_string=0x1f07468 
"VACUUM FREEZE;\n") at postgres.c:1113
#12 0x0084e960 in PostgresMain (argc=10, argv=0x1e60a50, 
dbname=0x1e823b0 "template1", username=0x1e672a0 "anastasia")

at postgres.c:4091
#13 0x006f967e in init_locale (categoryname=0x100 <error: Cannot access memory at address 0x100>,
category=32766, locale=0xa004692f0 <error: Cannot access memory at address 0xa004692f0>) at main.c:310
#14 0x7f1e5f463830 in __libc_start_main (main=0x6f93e1 <main+85>, 
argc=10, argv=0x7ffe0f866a78, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7ffe0f866a68) at ../csu/libc-start.c:291

#15 0x00469319 in _start ()

core file is quite big, so I didn't attach it to the mail. You can 
download it here: core dump file 
<https://drive.google.com/open?id=0B-7gUWL5Lg_gX3VlSXBaZzlKTlk>.


Here are some notes about the first patch:

1. prefetchBlkno = blkno & ~0x1f;
prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;

I didn't get why we need these tricks. How does it differ from:
prefetchBlkno = (blkno > 32) ? blkno - 32 : 0;

2. Why do we decrease prefetchBlkno twice?

Here:
+prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;
And here:
if (prefetchBlkno >= 32)
+prefetchBlkno -= 32;


I'll inspect second patch in a few days and write questions about it.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-23 Thread Anastasia Lubennikova

22.12.2016 21:18, Claudio Freire:

On Thu, Dec 22, 2016 at 12:22 PM, Claudio Freire <klaussfre...@gmail.com> wrote:

On Thu, Dec 22, 2016 at 12:15 PM, Anastasia Lubennikova
<lubennikov...@gmail.com> wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi,
I haven't read through the thread yet, just tried to apply the patch and run 
tests.
And it seems that the last attached version is outdated now. It doesn't apply 
to the master
and after I've tried to fix merge conflict, it segfaults at initdb.


I'll rebase when I get some time to do it and post an updated version

Attached rebased patches. I called them both v3 to be consistent.

I'm not sure how you ran it, but this works fine for me:

./configure --enable-debug --enable-cassert
make clean
make check

... after a while ...

===
  All 168 tests passed.
===

I reckon the above is equivalent to installcheck, but doesn't require
sudo or actually installing the server, so installcheck should work
assuming the install went ok.


I found the reason. I configured postgres with CFLAGS="-O0" and it causes
a segfault on initdb.
It works fine and passes tests with default configure flags, but I'm
pretty sure that we should fix the segfault before testing the feature.

If you need it, I'll send a core dump.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index Scans

2016-12-23 Thread Anastasia Lubennikova

22.12.2016 07:19, Amit Kapila:

On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
<lubennikov...@gmail.com> wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi, thank you for the patch.
Results are very promising. Do you see any drawbacks of this feature or 
something that requires more testing?


I think you can focus on the handling of array scan keys for testing.
In general, one of my colleagues has shown interest in testing this
patch and I think he has tested as well but never posted his findings.
I will request him to share his findings and what kind of tests he has
done, if any.



Please check the code related to buffer locking and pinning once again.
I got the warning. Here are the steps to reproduce it:
Except for "autovacuum = off", the config is default.

pgbench -i -s 100 test
pgbench -c 10 -T 120 test

SELECT count(aid) FROM pgbench_accounts
WHERE aid > 1000 AND aid < 90 AND bid > 800 AND bid < 900;
WARNING:  buffer refcount leak: [8297] (rel=base/12289/16459, 
blockNum=2469, flags=0x9380, refcount=1 1)

 count
---
 0
(1 row)

postgres=# select 16459::regclass;
   regclass
---
 pgbench_accounts_pkey



2. Would you mind renaming 'amestimateparallelscan' to, let's say,
'amparallelscan_spacerequired' or something like this?

Sure, I am open to other names, but IMHO, lets keep "estimate" in the
name to keep it consistent with other parallel stuff. Refer
execParallel.c to see how widely this word is used.


As far as I understand there is nothing to estimate, we know this size
for sure. I guess that you've chosen this name because of 
'heap_parallelscan_estimate'.
But now it looks similar to 'amestimate' which refers to indexscan cost for 
optimizer.
That leads to the next question.


Do you mean 'amcostestimate'?  If you want we can rename it
amparallelscanestimate to be consistent with amcostestimate.


I think that 'amparallelscanestimate' seems less ambiguous than 
amestimateparallelscan.
But it's up to you. There are enough comments to understand the purpose 
of this field.



And why do you call it pageStatus? What does it have to do with page?


During scan this tells us whether next page is available for scan.
Another option could be to name it as scanStatus, but not sure if that
is better.  Do you think if we add a comment like "indicates whether
next page is available for scan" for this variable then it will be
clear?


Yes, I think it describes the flag better.

5. Comment for _bt_parallel_seize() says:
"False indicates that we have reached the end of scan for
  current scankeys and for that we return block number as P_NONE."

  What is the reason to check (blkno == P_NONE) after checking (status == false)
  in _bt_first() (see code below)? If comment is correct
  we'll never reach _bt_parallel_done()

+   blkno = _bt_parallel_seize(scan, &status);
+   if (status == false)
+   {
+   BTScanPosInvalidate(so->currPos);
+   return false;
+   }
+   else if (blkno == P_NONE)
+   {
+   _bt_parallel_done(scan);
+   BTScanPosInvalidate(so->currPos);
+   return false;
+   }


The first time master backend or worker hits last page (calls this
API), it will return P_NONE and after that any worker tries to fetch
next page, it will return status as false.  I think we can expand a
comment to explain it clearly.  Let me know, if you need more
clarification, I can explain it in detail.


Got it,
I think you can add this explanation to the comment for 
_bt_parallel_seize().

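The two-step contract described above (the first caller to hit the end of the scan gets blkno == P_NONE, every later caller gets status == false) can be modelled in miniature. This is a simplified sketch with stand-in types, not the real nbtree shared state:

```c
#include <assert.h>
#include <stdbool.h>

#define P_NONE 0

/* Simplified model of the shared parallel-scan state (not the real struct) */
typedef enum
{
	BTPARALLEL_NOT_STARTED,
	BTPARALLEL_ADVANCING,
	BTPARALLEL_DONE
} DemoPageStatus;

typedef struct
{
	DemoPageStatus pageStatus;
	unsigned int nextPage;		/* next block to hand out; P_NONE at end */
} DemoParallelScan;

/*
 * Sketch of the seize contract: the first caller to see the end of the
 * scan receives blkno == P_NONE (and is expected to finish the scan);
 * every subsequent caller gets status == false and bails out.
 */
static unsigned int
demo_parallel_seize(DemoParallelScan *scan, bool *status)
{
	*status = true;
	if (scan->pageStatus == BTPARALLEL_DONE)
	{
		*status = false;		/* scan already finished by someone else */
		return P_NONE;
	}
	if (scan->nextPage == P_NONE)
	{
		scan->pageStatus = BTPARALLEL_DONE;	/* what _bt_parallel_done() does */
		return P_NONE;			/* first worker to hit the end */
	}
	return scan->nextPage;
}
```

This is why _bt_first() has to check both conditions: only one backend ever sees P_NONE with status == true.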


6. To avoid code duplication, I would wrap this into the function

+   /* initialize moreLeft/moreRight appropriately for scan direction */
+   if (ScanDirectionIsForward(dir))
+   {
+   so->currPos.moreLeft = false;
+   so->currPos.moreRight = true;
+   }
+   else
+   {
+   so->currPos.moreLeft = true;
+   so->currPos.moreRight = false;
+   }
+   so->numKilled = 0;  /* just paranoia */
+   so->markItemIndex = -1; /* ditto */


Okay, I think we can write a separate function (probably inline
function) for above.

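For concreteness, such an inline helper might look like this. The struct and enum below are simplified stand-ins for the real BTScanOpaque/ScanDirection definitions, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for access/sdir.h and the scan-position part of BTScanOpaque */
typedef enum
{
	BackwardScanDirection = -1,
	NoMovementScanDirection = 0,
	ForwardScanDirection = 1
} ScanDirection;

#define ScanDirectionIsForward(d) ((d) == ForwardScanDirection)

typedef struct
{
	bool		moreLeft;
	bool		moreRight;
	int			numKilled;
	int			markItemIndex;
} BTScanPosDemo;

/* Hypothetical inline helper wrapping the duplicated initialization */
static inline void
bt_initialize_more_data(BTScanPosDemo *pos, ScanDirection dir)
{
	/* initialize moreLeft/moreRight appropriately for scan direction */
	if (ScanDirectionIsForward(dir))
	{
		pos->moreLeft = false;
		pos->moreRight = true;
	}
	else
	{
		pos->moreLeft = true;
		pos->moreRight = false;
	}
	pos->numKilled = 0;			/* just paranoia */
	pos->markItemIndex = -1;	/* ditto */
}
```

Both call sites would then collapse to a single call, which also keeps the "paranoia" resets in one place.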

And after that we can also get rid of _bt_parallel_readpage(), which only
brings another level of indirection to the code.


See, this function is responsible for multiple actions like
initializing moreLeft/moreRight positions, reading the next page, and dropping
the lock/pin.  So replicating all these actions in the caller will
make the caller's code less readable than it is now.  Consider
this point and let me know your view 

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-12-22 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi,
I haven't read through the thread yet, just tried to apply the patch and run 
tests.
And it seems that the last attached version is outdated now. It doesn't apply 
to master,
and after I tried to fix the merge conflict, it segfaults at initdb. 

So, looking forward to a new version for review.

The new status of this patch is: Waiting on Author



Re: [HACKERS] Parallel Index Scans

2016-12-21 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi, thank you for the patch.
Results are very promising. Do you see any drawbacks of this feature or 
something that requires more testing?
I'm willing to do a review. I haven't done benchmarks yet, but I've read the patch 
and here are some
notes and questions about it.

I saw the discussion about parameters in the thread above. And I agree that 
we'd better concentrate
on the patch itself and add them later if necessary.

1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of 
xs_temp_snap flag?

+   if (scan->xs_temp_snap)
+   UnregisterSnapshot(scan->xs_snapshot);

I must say that I'm quite new to all this parallel stuff. If you give me a 
link
on where to read about snapshots for parallel workers, my review will be more 
helpful.
Anyway, it would be great to have more comments about it in the code.

2. Would you mind renaming 'amestimateparallelscan' to, let's say, 
'amparallelscan_spacerequired'
or something like this? As far as I understand there is nothing to estimate; we 
know this size
for sure. I guess that you've chosen this name because of 
'heap_parallelscan_estimate'.
But now it looks similar to 'amestimate' which refers to indexscan cost for 
optimizer.
That leads to the next question.

3. Are there any changes in cost estimation? I didn't find related changes in 
the patch.
Parallel scan is expected to be faster, and the optimizer definitely should know 
that.

4. +uint8   ps_pageStatus;  /* state of scan, see below */
There is no description below. I'd make the comment more helpful:
/* state of scan. See possible flags values in nbtree.h */
And why do you call it pageStatus? What does it have to do with page?

5. Comment for _bt_parallel_seize() says:
"False indicates that we have reached the end of scan for
 current scankeys and for that we return block number as P_NONE."

 What is the reason to check (blkno == P_NONE) after checking (status == false)
 in _bt_first() (see code below)? If comment is correct
 we'll never reach _bt_parallel_done()

+   blkno = _bt_parallel_seize(scan, &status);
+   if (status == false)
+   {
+   BTScanPosInvalidate(so->currPos);
+   return false;
+   }
+   else if (blkno == P_NONE)
+   {
+   _bt_parallel_done(scan);
+   BTScanPosInvalidate(so->currPos);
+   return false;
+   }

6. To avoid code duplication, I would wrap this into the function

+   /* initialize moreLeft/moreRight appropriately for scan direction */
+   if (ScanDirectionIsForward(dir))
+   {
+   so->currPos.moreLeft = false;
+   so->currPos.moreRight = true;
+   }
+   else
+   {
+   so->currPos.moreLeft = true;
+   so->currPos.moreRight = false;
+   }
+   so->numKilled = 0;  /* just paranoia */
+   so->markItemIndex = -1; /* ditto */

And after that we can also get rid of _bt_parallel_readpage(), which only
brings another level of indirection to the code.

7. Just a couple of typos I've noticed:

 * Below flags are used indicate the state of parallel scan.
 * Below flags are used TO indicate the state of parallel scan.

* On success, release lock and pin on buffer on success.
* On success release lock and pin on buffer.

8. I didn't find a description of the feature in the documentation.
Probably we need to add a paragraph to the "Parallel Query" chapter. 

I will send another review of performance by the end of the week.

The new status of this patch is: Waiting on Author



[HACKERS] Question about pg_control usage

2016-10-07 Thread Anastasia Lubennikova

Hi, hackers!

I am examining various global variables in ShmemVariableCache,
pg_control and pg_controldata.  To force and debug xid wraparound,
I've implemented a function that allows setting the nextXid value in the control
file manually, but it doesn't work as expected.

After an investigation I found that, while loading global variables in 
StartupXlog(),
we read the checkPoint location from ControlFile, and then read all the 
information
from the checkPoint record, never using the nextXid stored in 
ControlFile.


What is the purpose of having CheckPoint stored in the control file then?
I thought that it allows us to start up the server after a clean shutdown
without any xlog, as the comment in pg_control.h says:

 * Body of CheckPoint XLOG records.  This is declared here because we keep
 * a copy of the latest one in pg_control for possible disaster recovery.
 * Changing this struct requires a PG_CONTROL_VERSION bump.

But as far as I see, we never use this data.

Could you please explain why we don't use the information from the control file
in StartupXlog()?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] FSM corruption leading to errors

2016-10-07 Thread Anastasia Lubennikova
[3] 
https://www.postgresql.org/message-id/AMSPR06MB504CD8FE8AA30D4B7C958AAE39E0%40AMSPR06MB504.eurprd06.prod.outlook.com



--
 Pavan Deolasee http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Could you please add the patches to commitfest?
I'm going to test them and write a review in a few days.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP: Covering + unique indexes.

2016-10-04 Thread Anastasia Lubennikova

03.10.2016 15:29, Anastasia Lubennikova:

03.10.2016 05:22, Michael Paquier:

On Tue, Sep 27, 2016 at 12:17 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

Ok, I'll write it in a few days.

Marked as returned with feedback per last emails exchanged.


The only complaint about this patch was the lack of a README,
which is fixed now (see the attachment). So I added it to the new CF,
marked as ready for committer.


One more fix for pg_upgrade.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index d4f9090..99735ce 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1483,7 +1483,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1509,7 +1509,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1529,9 +1529,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2021,10 +2021,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2034,8 +2034,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2056,12 +2056,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-				result = (char **) palloc(*numatts * sizeof(char *));
+				result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-				for (i = 0; i < *numatts; i++)
+				for (i = 0; i < *indnkeyatts; i++)
 					result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..c024cf3 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 				int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 				appendStringInfoCharMacro(payload, ',');
 				appendStringInfoCharMacro(payload, operation);
 
-				for (i = 0; i < numatts; i++)
+				for (i = 0; i < indnkeyatts; i++)
 				{
 					int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 29738b0..9c7f9e5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3564,6 +3564,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/inde

Re: [HACKERS] WIP: Covering + unique indexes.

2016-10-03 Thread Anastasia Lubennikova

03.10.2016 05:22, Michael Paquier:

On Tue, Sep 27, 2016 at 12:17 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

Ok, I'll write it in a few days.

Marked as returned with feedback per last emails exchanged.


The only complaint about this patch was the lack of a README,
which is fixed now (see the attachment). So I added it to the new CF,
marked as ready for committer.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index d4f9090..99735ce 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1483,7 +1483,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1509,7 +1509,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1529,9 +1529,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2021,10 +2021,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2034,8 +2034,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2056,12 +2056,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-				result = (char **) palloc(*numatts * sizeof(char *));
+				result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-				for (i = 0; i < *numatts; i++)
+				for (i = 0; i < *indnkeyatts; i++)
 					result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..c024cf3 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 				int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 				appendStringInfoCharMacro(payload, ',');
 				appendStringInfoCharMacro(payload, operation);
 
-				for (i = 0; i < numatts; i++)
+				for (i = 0; i < indnkeyatts; i++)
 				{
 					int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 322d8d6..0983c93 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3564,6 +3564,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 40f201b..92bef4a 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/d

Re: [HACKERS] sequences and pg_upgrade

2016-09-30 Thread Anastasia Lubennikova

23.09.2016 21:06, Peter Eisentraut:

Here is an updated patch set.  Compared to the initial set, I have
changed pg_dump's sorting priorities so that sequence data is always
after table data.  This would otherwise have introduced a problem
because sortDataAndIndexObjectsBySize() only considers consecutive
DO_TABLE_DATA entries.  Also, I have removed the separate
--sequence-data switch from pg_dump and made it implicit in
--binary-upgrade.  (So the previous patches 0002 and 0003 have been
combined, because it's no longer a separate feature.)



The patches are good, no complaints.
But again, I have the same question.
I was confused about why we always dump sequence data,
because I'd overlooked the --sequence-data key. I'd rather keep this 
option explicit,

because it's quite non-intuitive behaviour...
 /* dump sequence data even in schema-only mode */

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] WIP: Covering + unique indexes.

2016-09-26 Thread Anastasia Lubennikova


24.09.2016 15:36, Amit Kapila:

On Wed, Sep 21, 2016 at 6:51 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

20.09.2016 08:21, Amit Kapila:

On Tue, Sep 6, 2016 at 10:18 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

28.08.2016 09:13, Amit Kapila:


The problem seems really tricky, but the answer is simple.
We store included columns unordered. It was mentioned somewhere in
this thread.


Is there any fundamental problem in storing them in ordered way?  I
mean to say, you need to anyway store all the column values on leaf
page, so why can't we find the exact location for the complete key.
Basically use truncated key to reach to leaf level and then use the
complete key to find the exact location to store the key.  I might be
missing some thing here, but if we can store them in ordered fashion,
we can use them even for queries containing ORDER BY (where ORDER BY
contains included columns).



I'd say that the reason for not using included columns in any
operations which require comparisons is that they don't have an opclass.
Let's go back to the example of points.
This data type doesn't have any opclass for B-tree, for fundamental 
reasons.

And we cannot apply _bt_compare() and others to this attribute, so
we don't include it in the scan key.

create table t (i int, i2 int, p point);
create index idx1 on t (i) including (i2);
create index idx2 on t (i) including (p);
create index idx3 on t (i) including (i2, p);
create index idx4 on t (i) including (p, i2);

You can keep tuples ordered in idx1, but not in idx2; partially ordered 
in idx3, but not in idx4.


At the very beginning of this thread [1], I suggested using an opclass 
where possible.
Exactly the same idea you're thinking about. But after a short 
discussion, we came
to the conclusion that it would require many additional checks and would be 
too complicated,

at least for the initial patch.


Let me give you an example:

create table t (i int, p point);
create index on t (i) including (p);
"point" data type doesn't have any opclass for btree.
Should we insert (0, '(0,2)') before (0, '(1,1)') or after?
We have no idea what is the "correct order" for this attribute.
So the answer is "it doesn't matter". When searching in index,
we know that only key attrs are ordered, so only them can be used
in scankey. Other columns are filtered after retrieving data.

explain select i,p from t where i =0 and p <@ circle '((0,0),2)';
 QUERY PLAN
---
  Index Only Scan using idx on t  (cost=0.14..4.20 rows=1 width=20)
Index Cond: (i = 0)
Filter: (p <@ '<(0,0),2>'::circle)


I think the reason for using Filter here is that we don't keep
included columns in scan keys. Can't we think of having them in scan
keys, but use only key columns to reach the leaf level
and then use the complete scan key at the leaf level?



What should I add to README (or to documentation),
to make it more understandable?


Maybe add the data representation, like how only leaf pages contain all
the columns, and how the scan works.  I think you can see if you can
extend the "Notes About Data Representation" and/or "Other Things That Are
Handy to Know" sections in the existing README.


Ok, I'll write it in a few days.


[1] https://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[HACKERS] File system operations.

2016-09-15 Thread Anastasia Lubennikova

Hi, hackers!

Do we have any standard set of functions to operate with files?

I mean link(), copy(), rename(), unlink(), ftruncate() and so on.
Mostly I worry about the first two, since they have different
implementations on Windows.
I found a couple of functions in src/backend/storage/file/copydir.c,
some more in src/bin/pg_upgrade/file.c along with a set of defines
in src/bin/pg_upgrade/pg_upgrade.h. Obviously md.c and fd.c contain
some functions.
Do we have any policy of using them? There is a comment in fd.c

 * For this scheme to work, most (if not all) routines throughout the
 * server should use these interfaces instead of calling the C library
 * routines (e.g., open(2) and fopen(3)) themselves.  Otherwise, we
 * may find ourselves short of real file descriptors anyway.

and even more in fd.h

 *File {Close, Read, Write, Seek, Tell, Sync}
 *{Path Name Open, Allocate, Free} File
 *
 * These are NOT JUST RENAMINGS OF THE UNIX ROUTINES.
 * Use them for all file activity...

but I see that, for example, pg_open_tzfile(), get_controlfile(), and 
logfile_open()
call open()/fopen() directly. You can find the same behavior for other C 
library functions.

Am I missing something important, or is it simply legacy/overlooked code?

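The value of routing file access through a central wrapper, as fd.c's comment argues, is that bookkeeping (such as counting open descriptors so the backend never runs short of real fds) happens in one place. A toy illustration of the pattern, which is deliberately not the real fd.c API (the actual interfaces are OpenTransientFile()/CloseTransientFile() and the virtual-fd File* layer):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Central bookkeeping a bare open(2)/close(2) pair would bypass */
static int	demo_open_count = 0;

static int
demo_open_tracked(const char *path, int flags, int mode)
{
	int			fd = open(path, flags, mode);

	if (fd >= 0)
		demo_open_count++;		/* track every descriptor we hand out */
	return fd;
}

static int
demo_close_tracked(int fd)
{
	demo_open_count--;			/* release the slot before closing */
	return close(fd);
}
```

A direct open() call anywhere in the tree silently escapes this accounting, which is exactly the concern the fd.c comment raises.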
What do you think about moving stuff from pg_upgrade/file.c to storage/file/
to allow reuse of this code? I think it'll be really helpful for 
developers of contrib modules

like backup managers.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] sequences and pg_upgrade

2016-09-15 Thread Anastasia Lubennikova

15.09.2016 15:29, Peter Eisentraut:

On 9/14/16 8:52 AM, Anastasia Lubennikova wrote:

Could you clarify, please, why do we dump sequence in schemaOnly mode?
+   if (dopt.schemaOnly && dopt.sequence_data)
+   getSequenceData(&dopt, tblinfo, numTables, dopt.oids);

The point of this patch is that with the new option, you can dump
sequence data (but not table data) alongside with the schema.  This
would be used by pg_upgrade for the reasons described at the beginning
of the thread.



Oh, thank you. Now I see.
For some reason I thought that it *always* dumps sequence data in schemaOnly mode.
Which is definitely not true.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] WIP: Covering + unique indexes.

2016-09-14 Thread Anastasia Lubennikova

One more update.

I added ORDER BY clause to regression tests.
It was done as a separate bugfix patch by Tom Lane some time ago,
but it definitely should be included into the patch.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index d4f9090..99735ce 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1483,7 +1483,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1509,7 +1509,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1529,9 +1529,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2021,10 +2021,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2034,8 +2034,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2056,12 +2056,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-				result = (char **) palloc(*numatts * sizeof(char *));
+				result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-				for (i = 0; i < *numatts; i++)
+				for (i = 0; i < *indnkeyatts; i++)
 					result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..c024cf3 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 				int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 				appendStringInfoCharMacro(payload, ',');
 				appendStringInfoCharMacro(payload, operation);
 
-				for (i = 0; i < numatts; i++)
+				for (i = 0; i < indnkeyatts; i++)
 				{
 					int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 322d8d6..0983c93 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3564,6 +3564,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 40f201b..92bef4a 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -110,6 +110,8 @@ typedef struct IndexAmRoutine
 boolamclusterable;
 /* does AM handle predicate locks? */
 boolampredlocks;
+/* does AM s

Re: [HACKERS] sequences and pg_upgrade

2016-09-14 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, failed

Thank you for the patch.
As I see there are no objections in the discussion, all the patches look clear.

Could you clarify, please, why do we dump sequence in schemaOnly mode?
+   if (dopt.schemaOnly && dopt.sequence_data)
+   getSequenceData(&dopt, tblinfo, numTables, dopt.oids);

Example: 
postgres=# create table t(i serial, data text);
postgres=# insert into t(data) values ('aaa');
pg_dump -d postgres --sequence-data --schema-only > ../reviews/dump_pg

Then restore it into newdb and add new value.
newdb=# insert into t(data) values ('aaa');
INSERT 0 1
newdb=# select * from t;
 i | data 
---+--
 2 | aaa

I'm not an experienced user, but I thought that when dumping/restoring
only the schema of a database, all the data is reset. Why should the table
in a newly created (via pg_restore) database have a non-default sequence value?

I also did some other tests and all of them were passed.

One more thing to do is documentation for the new option.
You should update the help() function in pg_dump.c and also add some
notes to pg_dump.sgml and probably to pgupgrade.sgml.

The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-09-06 Thread Anastasia Lubennikova

28.08.2016 09:13, Amit Kapila:

On Mon, Aug 15, 2016 at 8:15 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:
@@ -590,7 +622,14 @@ _bt_buildadd(BTWriteState *wstate, BTPageState
*state, IndexTuple itup)
   if (last_off == P_HIKEY)
   {
   Assert(state->btps_minkey == NULL);
- state->btps_minkey = CopyIndexTuple(itup);
+ /*
+ * Truncate the tuple that we're going to insert
+ * into the parent page as a downlink
+ */

+ if (indnkeyatts != indnatts && P_ISLEAF(pageop))
+ state->btps_minkey = index_truncate_tuple(wstate->index, itup);
+ else
+ state->btps_minkey = CopyIndexTuple(itup);

It seems that the above code always ensures that for leaf pages the high key
is a truncated tuple.  What is less clear is, if that is true, why you
need to re-ensure it again for the old page in the code below:


Thank you for the question. The investigation took a long time.

As far as I understand, the code above only applies to
the first tuple of each level, while the code you have quoted below
truncates high keys for all other pages.

There is a comment that clarifies the situation:

/*
 * If the new item is the first for its page, stash a copy for later. Note
 * this will only happen for the first item on a level; on later pages,
 * the first item for a page is copied from the prior page in the code
 * above.
 */


So the patch is correct.
We could go further and remove this index_truncate_tuple() call, because
the first tuple of any inner (or root) page doesn't need a key at all:
it simply points to the leftmost page of the level below.
But it's not a bug, because truncating one tuple per level doesn't
add any considerable overhead, so I want to leave the patch in its
current state.
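As a rough illustration of the invariant discussed above, here is a toy model with fixed-size attributes (the real IndexTuple is a packed, variable-length binary structure, and ToyIndexTuple/toy_truncate_tuple are invented names): truncation keeps only the key attributes and can carry the defensive size check suggested earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of an index tuple: a counted array of fixed-size attributes.
 * The real IndexTuple is a packed, variable-length binary structure. */
typedef struct
{
	int		natts;
	int		atts[8];
} ToyIndexTuple;

/* Keep only the first nkeyatts attributes: a downlink or high key needs
 * no "included" columns.  Mirrors the spirit of index_truncate_tuple(). */
ToyIndexTuple *
toy_truncate_tuple(const ToyIndexTuple *itup, int nkeyatts)
{
	ToyIndexTuple *keytup = malloc(sizeof(ToyIndexTuple));

	assert(nkeyatts > 0 && nkeyatts <= itup->natts);
	keytup->natts = nkeyatts;
	memcpy(keytup->atts, itup->atts, nkeyatts * sizeof(int));

	/* "can't happen" check: a truncated tuple must never be bigger */
	assert(keytup->natts <= itup->natts);
	return keytup;
}
```

This is only a sketch of the idea; the actual patch operates on on-disk tuple bytes and attribute descriptors.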



@@ -510,6 +513,8 @@ _bt_buildadd(BTWriteState *wstate, BTPageState
*state, IndexTuple itup)
{
..
+ if (indnkeyatts != indnatts && P_ISLEAF(opageop))
+ {
+ /*
+ * It's essential to truncate High key here.
+ * The purpose is not just to save more space on this particular page,
+ * but to keep whole b-tree structure consistent. Subsequent insertions
+ * assume that hikey is already truncated, and so they should not
+ * worry about it, when copying the high key into the parent page
+ * as a downlink.
+ * NOTE: it is not crucial for reliability at present,
+ * but it may become so in the future.
+ */
+ keytup = index_truncate_tuple(wstate->index, oitup);
+
+ /*  delete "wrong" high key, insert keytup as P_HIKEY. */
+ PageIndexTupleDelete(opage, P_HIKEY);
+
+ if (!_bt_pgaddtup(opage, IndexTupleSize(keytup), keytup, P_HIKEY))
+ elog(ERROR, "failed to rewrite compressed item in index \"%s\"",
+ RelationGetRelationName(wstate->index));
+ }
+
..
..



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Refactoring of heapam code.

2016-09-06 Thread Anastasia Lubennikova

06.09.2016 07:44, Pavan Deolasee:




On Mon, Aug 8, 2016 at 3:13 PM, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> 
wrote:




Thank you for the review; I'll fix these problems in the final version.

Posting the first message, I intended to raise a discussion. The patches
were attached mainly to illustrate the problem and to be more
concrete.


I started looking at the first patch posted above, but it seems you'll 
rewrite these patches after discussion on the heapam refactoring ideas 
you posted here https://wiki.postgresql.org/wiki/HeapamRefactoring 
<https://wiki.postgresql.org/wiki/HeapamRefactoring> concludes.




Thank you for the review.


Some comments anyway on the first patch:

1. It does not apply cleanly with git-apply - many whitespace errors


I'll fix all merge issues and attach it to the following message.

2. I don't understand the difference 
between PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seem 
to do exactly the same thing and can be used interchangeably.


The only difference between these two macros is that
PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple;
it only checks the header fields (see HeapTupleHeaderData). I divided it into
two separate functions while I was working on page compression, because
I wanted to avoid unnecessary decompression calls. These names are
just markers that make the code easier to read and improve.
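A minimal sketch of what that separation buys (hypothetical Toy* names and layout, not the real bufpage.h definitions): both accessors compute the same address today, but the header-only variant documents that the caller never reads user data, which matters once pages can be stored compressed.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for ItemIdData and HeapTupleHeaderData */
typedef struct
{
	uint16_t	lp_off;			/* offset of the tuple on the page */
	uint16_t	lp_len;			/* length of the tuple */
} ToyItemId;

typedef struct
{
	uint32_t	t_xmin;
	uint32_t	t_xmax;
	uint16_t	t_infomask;
} ToyHeapTupleHeader;

/* Full access: header plus the user data that follows it */
#define ToyPageGetItemHeap(page, itemId) \
	((ToyHeapTupleHeader *) ((char *) (page) + (itemId)->lp_off))

/* Same address, but a promise to read header fields only, so a
 * compressed page would not need to be decompressed for this call */
#define ToyPageGetItemHeapHeaderOnly(page, itemId) \
	((ToyHeapTupleHeader *) ((char *) (page) + (itemId)->lp_off))
```

The two macros expand identically; the value is purely semantic, as described above.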

3. While I like the idea of using separate interfaces to get
heap/index tuples from a page, maybe they can internally use a common
interface instead of duplicating what PageGetItem() does already.


I'm not sure I understand. Do you suggest leaving PageGetItem() and writing
PageGetItemHeap() and PageGetItemIndex() as wrappers around it?
That sounds reasonable while we have a similar layout for both heap and
index pages.

In any case, it'll be easy to separate them when necessary.

4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or 
something like that?


I don't feel like doing that, because HeapTuple is a different structure.
What we get from the page is a HeapTupleHeaderData structure
followed by the user's data.

5. If we do this, we should probably have corresponding wrapper 
functions/macros for remaining callers of PageGetItem()



There are some calls to PageGetItem() left in the BRIN and SP-GiST indexes;
I can write simple wrappers for them.
I left them alone just because they were of no interest for my compression
prototype.



I also looked at the refactoring design doc. Looks like a fine 
approach to me, but the API will need much more elaborate discussions. 
I am not sure if the interfaces as presented are enough to support 
everything that even heapam can do today.


What features of heapam do you think could be unsupportable in this API?
Maybe I've just missed them.

My point here is that heapam is too complicated for many simpler use cases:
read-only and append-only tables do not need many MVCC-related tricks
like vacuum and complicated locking, tables with fixed-length attributes
can use a more optimal page layout, and so on.

I suggest refactoring that will allow us to develop new heap-like
access methods.
For the first version, they must have methods to "heapify" a tuple, i.e.,
turn the internal representation of the data into a regular HeapTuple, for
example by putting stubs into the MVCC fields. Of course this approach has
its disadvantages, such as performance issues.
It definitely won't be enough to write a column store or to implement
other great data structures, but it allows us not to depend on the
executor's code.

And much more thoughts will be required to ensure we don't miss out 
things that new primary access method may need.



Do you have any particular ideas for new access methods?



A few points regarding how the proposed API maps to heapam:

- How do we support parallel scans on heap?


As far as I understand, a parallel heap scan requires one more API function,
heap_beginscan_parallel(). It uses the same API function, heap_getnext(),
but in addition to an ordinary seq scan it selects the pages to read in a
synchronized manner.



- Does the primary AM need to support locking of rows?


That's a good question. I'd say it should be an option.

- Would there be separate API to interpret the PAM tuple itself? 
(something that htup_details.h does today). It seems natural that 
every PAM will have its own interpretation of tuple structure and we 
would like to hide that inside PAM implementation.


As I already wrote, for the first prototype I'd use a function to "heapify"
the tuple and not worry about executor issues at all. A more detailed
discussion of this question can be found in another thread [1].

- There are many upper modules that need access to system attributes 
(xmin, xmax for starters). How do you plan to handle that? You think 
we can provide enough abstraction so that the callers don't need to 
know 

Re: [HACKERS] Optimization for lazy_scan_heap

2016-08-24 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi,
I haven't tested the performance yet, but the patch itself looks like a good
and reasonable improvement.
I have a question about removing the comment. It seems to be a really tricky
moment: how do we know that an all-frozen block hasn't changed since the
moment we checked it?

-		 * Tricky, tricky.  If this is in aggressive vacuum, the page
-		 * must have been all-frozen at the time we checked whether it
-		 * was skippable, but it might not be any more.  We must be
-		 * careful to count it as a skipped all-frozen page in that
-		 * case, or else we'll think we can't update relfrozenxid and
-		 * relminmxid.  If it's not an aggressive vacuum, we don't
-		 * know whether it was all-frozen, so we have to recheck; but
-		 * in this case an approximate answer is OK.
+		 * We know that there are n_skipped pages by the visibilitymap
+		 * scan we did just before.
 		 */

I'm going to test the performance this week.
I wonder if you could send a test script or describe the steps to test it?

The new status of this patch is: Waiting on Author



Re: [HACKERS] Pluggable storage

2016-08-17 Thread Anastasia Lubennikova
es forwards (say a batch of
tuples in a columnar storage AM has column values spread across many
buffers; they must all be kept pinned until the scan has moved past the
whole set of tuples).  But I'm not really sure that this is a great
design.


Frankly, I doubt that it's realistic to implement columnar storage just as
a variant of pluggable storage. It requires a lot of changes in the executor,
the optimizer, and so on, which are hardly compatible with the existing
tuple-oriented model. However, I'm not that well versed in this area, so if
you feel that it's possible, go ahead.


I welcome comments on these ideas.  My patch for this is nowhere near
completion yet; expect things to change for items that I've overlooked,
but I hope I didn't overlook any major.  If things are handwavy, it is
probably because I haven't fully figured them out yet.


Thank you again for starting this big project.
I'm looking forward to the prototype; I think it will make the discussion
more concrete and useful.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-16 Thread Anastasia Lubennikova

09.08.2016 19:45, Andrew Borodin:

Here is a new version of the patch; it now includes the recommendations
from Anastasia Lubennikova.


I've investigated the anomalous index size decrease. The most probable
explanation turned out to be true.
The cube extension, like some others, uses Guttman's polynomial-time split
algorithm. It is known to generate "needle-like" MBBs (MBRs) for
sorted data due to imbalanced splits (splitting 100 tuples as 98 to
2). Due to the previous throw-to-the-end behavior of GiST, this imbalance
was further amplified (most inserts were going to the bigger part
after a split). So GiST inserts were extremely slow for sorted data.
There is no need to do exact sorting to trigger this behavior.
This behavior can be defused by implementing the small-m restriction in
the split (AFAIR this is described in the original R-tree paper from '84),
which prescribes splitting into parts no smaller than m, where m
is around 20% of the page capacity (in number of tuples). After applying
this patch, performance for sorted data is roughly the same as for
randomized data.


Thank you for the explanation. Now it's clear to me. I did some more testing
and found nothing unusual. The declared feature is implemented correctly.
It passes all regression tests and improves performance.

I still have a few minor nitpicks about the patch style.
You can find the style guide at
https://www.postgresql.org/docs/9.6/static/source.html


1) remove extra whitespace in README

2) add whitespace: if(ntup == 1)

3) fix comments in accordance with coding conventions

/* In case of single tuple update GiST calls overwrite
 * In all other cases function gistplacetopage deletes
 * old tuples and place updated at the end
 *  */


+/* Normally here was following assertion
+ * Assert(ItemIdHasStorage(ii));
+ * This assertion was introduced in PageIndexTupleDelete
+ * But if this function will be used from BRIN index
+ * this assertion will fail. Thus, here we do not check that
+ * item has the storage.
+ * */

4) remove unrelated changes
-data += sizeof(OffsetNumber) * xldata->ntodelete;
+data += sizeof(OffsetNumber) *xldata->ntodelete;

5) If the comment is correct, maxalignment is not necessary.
+/* tuples on a page are always maxaligned */
+oldsize = MAXALIGN(oldsize);

6) I'd rather use alignednewsize here.
 +ItemIdSetNormal(tupid, offset + size_diff, newsize);


After the cleanup you can change the status to "Ready for Committer"
without waiting for a response.


If the data is randomized, this effect of the Guttman poly-time split does not
occur; the test from the start of the thread will show no statistically
significant improvement from the patch.
To demonstrate the performance increase in the randomized case, I've composed
a modified test: https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
This test with five runs shows the following:
Before patch
Insert Time AVG 78 seconds STD 9.5
After patch
Insert Time AVG 68 seconds STD 7.7
This is a marginal but statistically significant performance improvement.

For sorted data, performance is improved by a factor of 3.
Best regards, Andrey Borodin, Octonica & Ural Federal University.



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Pluggable storage

2016-08-15 Thread Anastasia Lubennikova
alone.  This is somewhat messy in the
case of heap_multi_insert because it returns several items; I think it's
acceptable to return an array of ItemPointers in the same order as the
input tuples.  This works fine for the only caller, which is COPY in
batch mode.  For the other routines, they don't really care where the
TID is returned AFAICS.


Additional noteworthy items:

i) Speculative insertion: the speculative insertion token is no longer
installed directly in the heap tuple by the executor (of course).
Instead, the token becomes part of the slot.  When the tuple_insert
method is called, the insertion routine is in charge of setting the
token from the slot into the storage tuple.  Executor is in charge of
calling method->speculative_finish() / abort() once the insertion has
been confirmed by the indexes.

ii) execTuples has additional accessors for tuples-in-slot, such as
ExecFetchSlotTuple and friends.  I expect to have some of them to return
abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
wrapped in Datum), depending on callers.  We might be able to cut down
on these later; my first cut will try to avoid API changes to keep
fallout to a minimum.

iii) All tuples need to be identifiable by ItemPointers.  Storages that
have different requirements will need careful additional thought across
the board.

iv) System catalogs cannot use pluggable storage.  We continue to use
heap_open etc in the DDL code, in order not to make this more invasive
that it already is.  We may lift this restriction later for specific
catalogs, as needed.

v) Currently, one Buffer may be associated with one HeapTuple living in a
slot; when the slot is cleared, the buffer pin is released.  My current
patch moves the buffer pin to inside the heapam-based storage AM and the
buffer is released by the ->slot_clear_tuple method.  The rationale for
doing this is that some storage AMs might want to keep several buffers
pinned at once, for example, and must not to release those pins
individually but in batches as the scan moves forwards (say a batch of
tuples in a columnar storage AM has column values spread across many
buffers; they must all be kept pinned until the scan has moved past the
whole set of tuples).  But I'm not really sure that this is a great
design.
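To make the shape of the proposal concrete, here is a hypothetical sketch of such a routine table. All Toy* names are invented for illustration and are not an actual PostgreSQL API; it only mirrors the callbacks discussed above (TID-returning insert, speculative-insertion hooks, and a slot-clear callback that releases the AM's buffer pins).

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-ins for ItemPointer and TupleTableSlot */
typedef struct
{
	unsigned	block;
	unsigned	offset;
} ToyItemPointer;

typedef struct ToySlot ToySlot;

/* A storage AM exposes its operations through a table of callbacks */
typedef struct ToyStorageAmRoutine
{
	/* returns the TID of the inserted tuple, as proposed above */
	ToyItemPointer (*tuple_insert) (void *rel, ToySlot *slot);
	/* speculative-insertion completion hooks */
	void		(*speculative_finish) (void *rel, ToySlot *slot);
	void		(*speculative_abort) (void *rel, ToySlot *slot);
	/* releases whatever buffer pins the AM keeps for this slot */
	void		(*slot_clear_tuple) (ToySlot *slot);
} ToyStorageAmRoutine;

/* A trivial heap-like implementation of one callback */
ToyItemPointer
toy_heap_tuple_insert(void *rel, ToySlot *slot)
{
	ToyItemPointer tid = {1, 1};	/* pretend the tuple landed at (1,1) */

	(void) rel;
	(void) slot;
	return tid;
}
```

The point of the sketch is the shape of the interface: the executor would only see the callback table, and each AM would decide internally how many pins to hold and when slot_clear_tuple releases them.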


I welcome comments on these ideas.  My patch for this is nowhere near
completion yet; expect things to change for items that I've overlooked,
but I hope I didn't overlook any major.  If things are handwavy, it is
probably because I haven't fully figured them out yet.



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Refactoring of heapam code.

2016-08-15 Thread Anastasia Lubennikova

08.08.2016 12:43, Anastasia Lubennikova:

08.08.2016 03:51, Michael Paquier:
On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera 
<alvhe...@2ndquadrant.com> wrote:

Anastasia Lubennikova wrote:
So there are a couple of patches. They do not cover all the mentioned
problems, but I'd like to get feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.
Well, I am interested in the topic. And just had a look at the 
patches proposed.


+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
 pfree(bistate);
  }

-
  /*
   * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and move some routines from heapam.c to hio.c. Those two
could be separate patches/

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.


Thank you for the review; I'll fix these problems in the final version.

Posting the first message, I intended to raise a discussion. Patches
were attached mainly to illustrate the problem and to be more concrete.

Thank you for the attention and feedback.
Since there are no objections to the idea in general, I'll write an
exhaustive README about the current state of the code and then propose
an API design to discuss the details.

Stay tuned.



Here is the promised design draft.
https://wiki.postgresql.org/wiki/HeapamRefactoring

Looking forward to your comments.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] WIP: Covering + unique indexes.

2016-08-15 Thread Anastasia Lubennikova

14.08.2016 20:11, Andrey Borodin:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi hackers!

I've read the patch and here is my code review.

==PURPOSE
I've used this feature from time to time with MS SQL. From my experience,
INCLUDE is a 'sugar on top' feature.
Some MS SQL classes do not even mention INCLUDE despite it being there since
2005 (though classes omit lots of important things, so that's not a
particularly valuable indicator).
But those who use it, use it whenever possible. For example, system views with
recommended indices rarely list one without INCLUDE columns.
So, this feature is very important from the perspective of converting MS SQL
DBAs to PostgreSQL. This is how I see it.


Thank you for the review, I hope this feature will be useful for many 
people.



SUGGESTIONS==
0. Index build is broken. This script
https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql
SEGFAULTs and may cause a situation where you cannot insert anything into the
table (I think dropping the index would help, but I didn't test this).


Thank you for reporting. That was a bug caused by high key truncation,
which occurs when the index has more than 3 levels.
Fixed. See the attached file.


1. I think the MS SQL syntax INCLUDE instead of INCLUDING would be better
(for the purpose listed above).


I've chosen this particular name to avoid adding a new keyword. We
already have INCLUDING in Postgres in the context of inheritance, which will
never intersect with covering indexes.

I'm sure it won't be a big problem for migration from MS SQL.


2. Empty line added in ruleutils.c. Is it for a reason?


No, just a missed line.
Fixed.


3. Now we have indnatts and indnkeyatts instead of just indnatts. I think it
is worth considering renaming indnatts to something different from the old
name. Someone somewhere could still suppose it's the number of keys.


I agree that the naming became vague after this patch.
I've already suggested replacing "indkeys[]" with a more specific name, and
AFAIR there was no reaction, so I didn't do that.
But I'm not sure about your suggestion regarding indnatts. Old queries
(and old indexes) can still use it correctly. I don't see a reason to break
compatibility for all users.
Those who use this new feature should ensure that their queries against
pg_index behave as expected.


PERFORMANCE==
Due to suggestion number 0, I could not measure the performance of index
build: the index crashes when there are more than 1.1 million rows in the
table. The performance test script is here:
https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql
Test scenario is following:
1. Create table, then create index, then add data.
2. Make a query touching data in INCLUDING columns.
This scenario is tested against table with:
A. An index that does not contain the touched columns, just the PK.
B. An index with all columns as keys.
C. An index with the PK as keys, INCLUDING all other columns.

Tests were executed 5 times on Ubuntu VM under Hyper-V i5 2500 CPU, 16 Gb of 
RAM, SSD disk.
Time to insert 10M rows:
A. AVG 110 seconds STD 4.8
B. AVG 121 seconds STD 2.0
C. AVG 111 seconds STD 5.7
Inserts into the INCLUDING index are almost as fast as inserts into the index
without extra columns.

Time to run SELECT query:
A. AVG 2864 ms STD 794
B. AVG 2329 ms STD 84
C. AVG 2293 ms STD 58
Selects with INCLUDING columns are almost as fast as with the full index.

Index size (deterministic measure, STD = 0)
A. 317 MB
B. 509 MB
C. 399 MB
Index size is in the middle between full index and minimal index.

I think these numbers agree with the expectations for the feature.

CONCLUSION==
This patch brings useful and important feature. Build shall be repaired; other 
my suggestions are only suggestions.



Best regards, Andrey Borodin, Octonica & Ural Federal University.

The new status of this patch is: Waiting on Author



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums,

Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-08 Thread Anastasia Lubennikova

06.08.2016 19:51, Andrew Borodin:

Anastasia, thank you for your attentive code review.

2016-08-05 21:19 GMT+05:00 Anastasia Lubennikova <a.lubennik...@postgrespro.ru>:

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize?
Although, I'm quite sure that it was already aligned somewhere before.
I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary.
I'd rather add  the check: (offset+size_diff < pd_lower).

Actually, that's a tricky question. There may be different code expectations.
1. If we expect that a not-maxaligned tuple may be placed by any other
routine, we should remove the check (size_diff != MAXALIGN(size_diff)),
since it will fail for a not-maxaligned tuple.
2. If we expect that no one should use PageIndexTupleOverwrite with
not-maxaligned tuples, then the current checks are OK: we will break
execution if we find non-maxaligned tuples, with an almost correct
message.
I suggest that this check may be a debug-only assertion: in production
code the routine will work with not-maxaligned tuples if they already
reside on the page, while in a dev build it will inform the developer that
something is going wrong. Is this behavior Postgres-style compliant?


I agree that the pd_lower check makes sense.


In pointing out this comparison, I was worried about a possible call of
MAXALIGN on a negative size_diff value, although I'm not sure whether it
can cause any problem.


Tuples on a page are always maxaligned,
but, as far as I recall, itemid->lp_len can contain a non-maxaligned value.

So I'd suggest performing MAXALIGN(oldsize) before computing size_diff:
 size_diff = oldsize - alignednewsize;

Since both arguments are maxaligned, the check of size_diff is not needed.
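The alignment arithmetic can be checked with a standalone sketch (TOY_MAXALIGN and toy_size_diff are stand-ins invented here; the real MAXALIGN macro lives in c.h, and 8-byte maximum alignment is assumed): once both operands are maxaligned, their difference is automatically a multiple of the alignment.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXIMUM_ALIGNOF 8

/* Round len up to the next multiple of the maximum alignment */
#define TOY_MAXALIGN(len) \
	(((size_t) (len) + (TOY_MAXIMUM_ALIGNOF - 1)) & \
	 ~((size_t) (TOY_MAXIMUM_ALIGNOF - 1)))

/* If both operands are maxaligned, the difference is maxaligned too,
 * so a separate size_diff != MAXALIGN(size_diff) check is redundant. */
size_t
toy_size_diff(size_t oldsize, size_t alignednewsize)
{
	size_t		alignedold = TOY_MAXALIGN(oldsize);

	assert(alignednewsize == TOY_MAXALIGN(alignednewsize));
	assert(alignednewsize <= alignedold);	/* avoid a negative diff */
	return alignedold - alignednewsize;
}
```

Computing the difference on unsigned types also sidesteps the "MAXALIGN of a negative value" worry raised above, because the shrink-only assertion rules that case out before the subtraction.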


BTW, I'm very surprised that it improves performance so much.
The size is also reduced significantly: 89 MB against 289 MB without the
patch. Could you explain in detail why this happens?

The size reduction is unexpected for me.
There are 4 plausible explanations. I'll list them in descending order of
probability:
1. Before this patch, every update was throwing the recently updated tuple
to the end of a page. Thus, in somewhat-ordered data, the recent best fit
will be discovered last. This can increase MBB overlap in a spatial index
and make the tree slightly higher. But a threefold size decrease is
unlikely.
How did you obtain those results? Can I look at a test case?


Nothing special; I just ran the test you provided in this thread
and checked the index size via select pg_size_pretty(pg_relation_size('idx'));


2. A bug in PageIndexTupleDelete causing unused space to appear. I've
searched for it, unsuccessfully.
3. A bug in PageIndexTupleOverwrite. I cannot imagine the nature of such a
bug. Maybe we are not placing something big but unimportant on a page?
4. Magic.
I do not see what one could do to an R-tree to change its size by
a factor of 3. The first three explanations are not much better than the
fourth, actually.
Those 89 MB, they do not include WAL, right?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Refactoring of heapam code.

2016-08-08 Thread Anastasia Lubennikova

08.08.2016 03:51, Michael Paquier:

On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

Anastasia Lubennikova wrote:

So there are a couple of patches. They do not cover all the mentioned
problems, but I'd like to get feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.

Well, I am interested in the topic. And just had a look at the patches proposed.

+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
 pfree(bistate);
  }

-
  /*
   * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and move some routines from heapam.c to hio.c. Those two
could be separate patches/

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.


Thank you for the review; I'll fix these problems in the final version.

Posting the first message, I intended to raise a discussion. Patches
were attached mainly to illustrate the problem and to be more concrete.

Thank you for the attention and feedback.
Since there are no objections to the idea in general, I'll write an
exhaustive README about the current state of the code and then propose
an API design to discuss the details.

Stay tuned.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Refactoring of heapam code.

2016-08-08 Thread Anastasia Lubennikova

05.08.2016 20:56, Alvaro Herrera:

Anastasia Lubennikova wrote:

Working on page compression and some other issues related to
access methods, I found out that the code related to heap
looks too complicated. Much more complicated than it should be.
Since I anyway got into this area, I want to suggest a set of improvements.

Hm.  I'm hacking on heapam.c pretty heavily ATM.  Your first patch
causes no problem I think, or if it does it's probably pretty minor, so
that's okay.  I'm unsure about the second one -- I think it's okay too,
because it mostly seems to be about moving stuff from heapam.c to hio.c
and shuffling some code around that I don't think I'm modifying.


I'm sure these patches will not cause any problems. The second one is
mostly about moving unrelated stuff from heapam.c to hio.c and renaming
a couple of functions in heap.c.
Anyway, they are not a final solution, just a proof of concept.

By the way, I thought about looking at the mentioned patch and
probably reviewing it, but didn't find any of your patches on the
current commitfest. Could you point out the thread?


Also I think that it's quite strange to have a function heap_create()
that is called from index_create(). It has absolutely nothing to do with
the heap access method.

Agreed.  But changing its name while keeping it in heapam.c does not
really improve things enough.  I'd rather have it moved elsewhere that's
not specific to "heaps" (somewhere in access/common perhaps).  However,
renaming the functions would break third-party code for no good reason.
I propose that we only rename things if we also break it for other
reasons (say because the API changes in a fundamental way).


Yes, I agree that it should be moved to another file.
Just to be more accurate: it's not in heapam.c now, it is in
"src/backend/catalog/heap.c", which would require many more changes
than I made.

Concerning the third-party code: it's a good observation and we
definitely should keep it in mind. Nevertheless, I doubt that there are
many external calls to these functions. And I hope this thread will end
up with fundamental changes introducing a clear API for all the
mentioned problems.




One more thing that requires refactoring is "RELKIND_RELATION" name.
We have a type of relation called "relation"...

I don't see us fixing this bit, really.  Historical baggage and all
that.  For example, a lot of SQL queries use "WHERE relkind = 'r'".  If
we change the name, we should probably also change the relkind constant,
and that would break the queries.  If we change the name and do not
change the constant, then we have to have a comment "we call them
RELKIND_TABLE but the char is r because it was called RELATION
previously", which isn't any better.


Good point.
I'd rather change it to RELKIND_BASIC_RELATION or something like that,
which is more specific and still goes well with the 'r' constant. But
minor changes like that can wait until we clarify the API in general.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-08 Thread Anastasia Lubennikova

05.08.2016 19:41, Robert Haas:


2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc.  I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer.  That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here.  Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again.  I don't think this code is reasonably maintainable
in this form.


As I can see, you recommend cleaning up the API of the storage
management code. I strongly agree that it's high time to do it.

So, I started the discussion about refactoring and improving API
of heapam and heap relations.
You can find it on commitfest:
https://commitfest.postgresql.org/10/700/

I'll be glad to see your thoughts on the thread.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-05 Thread Anastasia Lubennikova

30.07.2016 14:00, Andrew Borodin:

Here is a BRIN-compatible version of the patch. Now the function
PageIndexTupleOverwrite takes the tuple size as a parameter; this
behavior is coherent with PageAddItem.
Also, I had to remove the assertion that the item has storage in the
loop that adjusts line pointers. It would be great if someone could
check that place (the commented-out Assert(ItemIdHasStorage(ii)); in
PageIndexTupleOverwrite). I suspect that there might be a case when
linp's should not be adjusted.


Hi, I reviewed the code and I have couple of questions about
following code:

/* may have negative size here if new tuple is larger */
size_diff = oldsize - alignednewsize;
offset = ItemIdGetOffset(tupid);

if (offset < phdr->pd_upper || (offset + size_diff) > phdr->pd_special ||
    offset != MAXALIGN(offset) || size_diff != MAXALIGN(size_diff))
    ereport(ERROR,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("corrupted item offset: offset = %u, size = %u",
                    offset, (unsigned int) size_diff)));

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize?
Although I'm quite sure that it was already aligned somewhere before.

I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary.
I'd rather add the check: (offset + size_diff < pd_lower).

Besides that moment, the patch seems pretty good.

BTW, I'm very surprised that it improves performance so much.
The size is also reduced significantly: 89MB against 289MB without the patch.
Could you explain in detail why this happens?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Anastasia Lubennikova

05.08.2016 16:30, Kevin Grittner:

On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:


They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Materialized views (relkind == "m") have physical storage, and may have indexes.


Good point. I confused the letters for views (virtual relations) and
materialized views (primary storage).
Should be:

"primary storage" - r, S, t, m. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, v. They have no physical storage, only entries
in caches and catalogs.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Refactoring of heapam code.

2016-08-05 Thread Anastasia Lubennikova

Working on page compression and some other issues related to
access methods, I found out that the code related to heap
looks too complicated. Much more complicated than it should be.
Since I anyway got into this area, I want to suggest a set of improvements.

There are a number of problems I see in the existing code:
Problem 1. Heap != Relation.

File heapam.c has a note:
 *  This file contains the heap_ routines which implement
 *  the POSTGRES heap access method used for all POSTGRES
 *  relations.
This statement is wrong, since we also have index relations,
that are implemented using other access methods.

Also I think that it's quite strange to have a function heap_create()
that is called from index_create(). It has absolutely nothing to do with
the heap access method.


And so on, and so on.

One more thing that requires refactoring is "RELKIND_RELATION" name.
We have a type of relation called "relation"...

Problem 2.
Some functions are shared between the heap and index access methods.
Maybe that used to be handy, but now, I think, it's time to separate
them, because IndexTuples and HeapTuples have really little in common.

Problem 3. The word "heap" is used in many different contexts.
Heap - a storage where we have tuples and visibility information.
Generally speaking, it can be implemented over any data structure,
not just a plain table as it is done now.

Heap - an access method, that provides a set of routines for plain table
storage, such as insert, delete, update, gettuple, vacuum and so on.
We use this access method not only for user's tables.

There are many types of relations (pg_class.h):
#define  RELKIND_RELATION        'r'    /* ordinary table */
#define  RELKIND_INDEX           'i'    /* secondary index */
#define  RELKIND_SEQUENCE        'S'    /* sequence object */
#define  RELKIND_TOASTVALUE      't'    /* for out-of-line values */
#define  RELKIND_VIEW            'v'    /* view */
#define  RELKIND_COMPOSITE_TYPE  'c'    /* composite type */
#define  RELKIND_FOREIGN_TABLE   'f'    /* foreign table */
#define  RELKIND_MATVIEW         'm'    /* materialized view */

They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Now, for some reason, we sometimes use the name "heap" for both
"primary storage" and "virtual relations". See src/backend/catalog/heap.c.
But some functions work only with "primary storage". See pgstat_relation().
And we constantly have to check relkind everywhere.
I'd like this to be clear from the code interface and naming.

So there is a couple of patches. They do not cover all mentioned problems,
but I'd like to get a feedback before continuing.

Patch 1:
There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
from the given page. It is used for both heap and index tuples.
It doesn't seem a problem until you try to find anything in this code.


The first patch "PageGetItem_refactoring.patch" is intended to fix it.
It is just renaming:
(IndexTuple) PageGetItem ---> PageGetItemIndex
(HeapTupleHeader) PageGetItem ---> PageGetItemHeap
Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple,
SpGistDeadTuple) still use PageGetItem.

I also changed functions that do not access tuple's data, but only need
HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.

I do not insist on these particular names, by the way.

Patch 2.
heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
and outdated comments. The patch is intended to improve it.
Fun fact: I found several "check it later" comments unchanged since
 "Postgres95 1.01 Distribution - Virgin Sources" commit.

I wonder if we can find a better name for relation_drop_with_catalog()
since, again, it's not about all kinds of relations. We use another
function to drop index relations.


I hope these changes will be useful for the future development.
Suggested patches are mostly about renaming and rearranging functions
between files, so, if nobody has conceptual objections, all I need from
reviewers is an attentive look to find typos, grammar mistakes and
overlooked areas.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 4983bbb..257b609 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c

Re: [HACKERS] Leaking memory in text_overlay function

2016-08-04 Thread Anastasia Lubennikova
I found this thread on the open CF. As I see, the discussion ended with
a decision to reject the patch.


I've just changed the status to "Rejected".


02.07.2016 18:11, Dirk Rudolph:
Well, that's good to know. I was just curious that, in my case, I only
saw this in this particular function.


Anyway, I will consider handling the memory the same way, if this is
the recommendation.


Many thanks.

/Closed

On Sat, Jul 2, 2016 at 4:45 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:


Dirk Rudolph <dirk.rudo...@netcentric.biz> writes:
> while implementing my own C function, I mentioned that some
memory is not freed by the text_overlay function in varlena.c

By and large, that's intentional.  SQL-called functions normally run
in short-lived memory contexts, so that any memory they don't
bother to
pfree will be cleaned up automatically at the end of the tuple cycle.
If we did not follow that approach, there would need to be many
thousands
more explicit pfree calls than there are today, and the system
would be
net slower overall because multiple retail pfree's are slower than a
MemoryContextReset.

There are places where it's important to avoid leaking memory,
certainly.
But I don't think this is one of them, unless you can demonstrate a
scenario with query-lifespan or worse leakage.

> Particularly I mean the both substrings allocated by
text_substring (according to the documentation of text_substring
"The result is always a freshly palloc'd datum.") and one of the
concatenation results. I’m aware of the MemoryContext being
deleted in my case but IMHO builtin functions should be memory safe.

That is not the project policy.

regards, tom lane




--

Dirk Rudolph | Senior Software Engineer

Netcentric AG

M: +41 79 642 37 11
D: +49 174 966 84 34

dirk.rudo...@netcentric.biz | www.netcentric.biz




--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



[HACKERS] Processes and caches in postgresql

2016-04-27 Thread Anastasia Lubennikova

Hi, hackers.
There's a couple of questions about processes.

I found EXEC_BACKEND flag, while reading the code.
As I understood, it exists because we have to emulate fork() on WIN32,
and it also allows debugging the same behavior on Linux.
Is that right? Are there any other use cases?

Another question is about "--fork" argument (see code below).
I didn't find it in documentation, so I'm a bit confused.
I wonder if it exists only for internal purposes?

#ifdef EXEC_BACKEND
    if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
        SubPostmasterMain(argc, argv);    /* does not return */
#endif

And last, but not least: do we have any
presentations/articles/READMEs/whatever about caches
(src/backend/utils/cache/*) in PostgreSQL?
I found nothing besides comments in the code.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-12 Thread Anastasia Lubennikova
The attached version has the pg_dump fix suggested by Stephen Frost
in the -committers thread:
http://postgresql.nabble.com/pgsql-CREATE-INDEX-INCLUDING-column-td5897653.html
Sooner or later, I'd like to see this patch finished.

For now, there are two complaints about it:
- support of expressions as included columns.
Frankly, I don't understand why this is a problem for the patch.
The patch is already big enough, and it will be much easier to add
expression support in a follow-up patch, after the first one is stable.

I wonder if someone has objections to that?
Yes, it's a kind of delayed feature. But should we hold every patch
until it is entirely complete?


- lack of review and testing.
Obviously I did as much testing as I could.
So, if reviewers have any concerns about the patch, I'm looking forward
to seeing them.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 appendStringInfoCharMacro(payload, ',');
 appendStringInfoCharMacro(payload, operation);
 
-for (i = 0; i < numatts; i++)
+for (i = 0; i < indnkeyatts; i++)
 {
 	int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d6b60db..3

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-08 Thread Anastasia Lubennikova

08.04.2016 15:45, Anastasia Lubennikova:

08.04.2016 15:06, Teodor Sigaev:

On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote:

Personally, I like documenting assertions, and will sometimes write
assertions that the compiler could easily optimize away. Maybe going
*that* far is more a matter of personal style, but I think an
assertion about the new index tuple size being <= the old one is just
a good idea. It's not about a problem in your code at all.


You should make index_truncate_tuple()/index_reform_tuple() promise to
always do this in its comments/contract with caller as part of this,
IMV.


Some notices:
- index_truncate_tuple(Relation idxrel, IndexTuple olditup, int indnatts,
                       int indnkeyatts)
  Why do we need indnatts/indnkeyatts? They are present in the idxrel
  struct already.
- following the code where index_truncate_tuple() is called, it should
  never be called in the case where indnatts == indnkeyatts. So,
  indnkeyatts should be strictly less than indnatts; please change the
  assertion. If they are equal, then this function becomes a complicated
  variant of CopyIndexTuple().


Good point. These attributes seem to have been there since previous
versions of the function.
But now they are definitely unnecessary. An updated patch is attached.


One more improvement: a note about expressions in the documentation.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index b5f67af..61a21a9 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -161,6 +161,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] INCLUDING clause, which can slightly reduce the size of the index,
 due to storing included attributes only in leaf index pages.
 Currently, only the B-tree access method supports this feature.
+Expressions as included columns are not supported since they cannot be used
+in index-only scans.

   
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-04-08 Thread Anastasia Lubennikova
make it as simple as possible despite significant
effort. It's hard to describe these things in an accessible way. Maybe
someone can try and understand what I've said there, and let me know
how I might have fallen short. I have used a new term, "cousin page",
in this explanation (cf. sibling page). This seems like useful
terminology that makes these difficult explanations a bit more
accessible.


I'm not an expert in btree locking races, but I tested the patch on
different loads and it works as promised.

I didn't find mistakes in the code logic either.

As a reviewer, I don't have any objections to marking it "Ready for Committer".
The only note I'd like to add is about its compatibility with covering
indexes. We already discussed it in the thread
https://commitfest.postgresql.org/9/433/

The tiny attached patch fixes this issue.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c
index 66d1f4f..91b2bc5 100644
--- a/contrib/amcheck/amcheck.c
+++ b/contrib/amcheck/amcheck.c
@@ -1118,6 +1118,9 @@ invariant_key_less_than_equal_offset(BtreeCheckState *state, ScanKey key,
 	int16		natts = state->rel->rd_rel->relnatts;
 	int32		cmp;
 
+#if (PG_VERSION_NUM >= 90600)
+		natts = state->rel->rd_index->indnkeyatts;
+#endif
 	cmp = _bt_compare(state->rel, natts, key, state->target, upperbound);
 
 	return cmp <= 0;
@@ -1139,6 +1142,9 @@ invariant_key_greater_than_equal_offset(BtreeCheckState *state, ScanKey key,
 	int16		natts = state->rel->rd_rel->relnatts;
 	int32		cmp;
 
+#if (PG_VERSION_NUM >= 90600)
+		natts = state->rel->rd_index->indnkeyatts;
+#endif
 	cmp = _bt_compare(state->rel, natts, key, state->target, lowerbound);
 
 	return cmp >= 0;
@@ -1164,6 +1170,9 @@ invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
 	int16		natts = state->rel->rd_rel->relnatts;
 	int32		cmp;
 
+#if (PG_VERSION_NUM >= 90600)
+		natts = state->rel->rd_index->indnkeyatts;
+#endif
 	cmp = _bt_compare(state->rel, natts, key, nontarget, upperbound);
 
 	return cmp <= 0;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-08 Thread Anastasia Lubennikova

08.04.2016 15:06, Teodor Sigaev:

On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote:

Personally, I like documenting assertions, and will sometimes write
assertions that the compiler could easily optimize away. Maybe going
*that* far is more a matter of personal style, but I think an
assertion about the new index tuple size being <= the old one is just
a good idea. It's not about a problem in your code at all.


You should make index_truncate_tuple()/index_reform_tuple() promise to
always do this in its comments/contract with caller as part of this,
IMV.


Some notices:
- index_truncate_tuple(Relation idxrel, IndexTuple olditup, int indnatts,
                       int indnkeyatts)
  Why do we need indnatts/indnkeyatts? They are present in the idxrel
  struct already.
- following the code where index_truncate_tuple() is called, it should
  never be called in the case where indnatts == indnkeyatts. So,
  indnkeyatts should be strictly less than indnatts; please change the
  assertion. If they are equal, then this function becomes a complicated
  variant of CopyIndexTuple().


Good point. These attributes seem to have been there since previous
versions of the function.
But now they are definitely unnecessary. An updated patch is attached.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(P

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-07 Thread Anastasia Lubennikova

06.04.2016 23:52, Peter Geoghegan:

On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote:

Personally, I like documenting assertions, and will sometimes write
assertions that the compiler could easily optimize away. Maybe going
*that* far is more a matter of personal style, but I think an
assertion about the new index tuple size being <= the old one is just
a good idea. It's not about a problem in your code at all.

You should make index_truncate_tuple()/index_reform_tuple() promise to
always do this in its comments/contract with caller as part of this,
IMV.



The mentioned issues are fixed. A patch is attached.

I'd like to remind you that the commitfest will be closed very soon,
so I'd like to get your final resolution about the patch.

Not having it in the 9.6 release would be very disappointing.

I agree that b-tree is a crucial subsystem. But it seems to me that we
lack improvements in this area not only because of the algorithm's
complexity, but also because of a lack of enthusiasts to work on it and
push through endless discussions.
But that's off-topic here. Attention to these development difficulties
will be one of the messages of my pgcon talk.


You know, we lost a lot of time discussing various b-tree problems.
Besides that, I am sure that the patch is really in good shape. It has
no open problems left to fix.

And possible subtle bugs can be found during the testing stage of the release.

Looking forward to your reply.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary 

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-06 Thread Anastasia Lubennikova

06.04.2016 16:15, Anastasia Lubennikova:

06.04.2016 03:05, Peter Geoghegan:

* There is some stray whitespace within RelationGetIndexAttrBitmap().
I think you should have updated it with code, though. I don't think
it's necessary for HOT updates to work, but I think it could be
necessary so that we don't need to get a row lock that blocks
non-conflict foreign key locking (see heap_update() callers). I think
you need to be careful for non-key columns within the loop in
RelationGetIndexAttrBitmap(), basically, because it seems to still go
through all columns. UPSERT also must call this code, FWIW.

* I think that a similar omission is also made for the replica
identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
on how this patch interacts with logical decoding, I guess.


Good point. Indexes are everywhere in the code.
I missed that RelationGetIndexAttrBitmap() is used not only for REINDEX.
I'll discuss it with Theodor and send an updated patch tomorrow.


As promised, the updated patch is attached.
But I'm not an expert in this area, so it needs a critical look.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 appendStringInfoCharMacro(payload, ',');
 appendStringInfoCharMacro(payload, operation);
 
-for (i = 0; i <

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-06 Thread Anastasia Lubennikova
==12310== Invalid read of size 4
==12310==at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
==12310==by 0x656615: match_clause_to_index (indxpath.c:2144)
==12310==by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
==12310==by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
==12310==by 0x658054: create_index_paths (indxpath.c:269)
==12310==by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
==12310==by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
==12310==by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
==12310==by 0x64D93B: make_one_rel (allpaths.c:170)
==12310==by 0x66876C: query_planner (planmain.c:246)
==12310==by 0x669FBA: grouping_planner (planner.c:1666)
==12310==by 0x66D0C9: subquery_planner (planner.c:751)
==12310==by 0x66D3DA: standard_planner (planner.c:300)
==12310==by 0x66D714: planner (planner.c:170)
==12310==by 0x6FD692: pg_plan_query (postgres.c:798)
==12310==by 0x59082D: ExplainOneQuery (explain.c:350)
==12310==  Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd
==12310==at 0x4C2AB80: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12310==by 0x81B7FA: AllocSetAlloc (aset.c:853)
==12310==by 0x81D257: palloc (mcxt.c:907)
==12310==by 0x4B6F65: RelationGetIndexScan (genam.c:94)
==12310==by 0x4C135D: btbeginscan (nbtree.c:431)
==12310==by 0x4B7A5C: index_beginscan_internal (indexam.c:279)
==12310==by 0x4B7C5A: index_beginscan (indexam.c:222)
==12310==by 0x4B73D1: systable_beginscan (genam.c:379)
==12310==by 0x7E8CF9: ScanPgRelation (relcache.c:341)
==12310==by 0x7EB3C4: RelationBuildDesc (relcache.c:951)
==12310==by 0x7ECD35: RelationIdGetRelation (relcache.c:1800)
==12310==by 0x4A4D37: relation_open (heapam.c:1118)
==12310==
{

Memcheck:Addr4
fun:match_clause_to_indexcol
fun:match_clause_to_index
fun:match_clauses_to_index
fun:match_restriction_clauses_to_index
fun:create_index_paths
fun:set_plain_rel_pathlist
fun:set_rel_pathlist
fun:set_base_rel_pathlists
fun:make_one_rel
fun:query_planner
fun:grouping_planner
fun:subquery_planner
fun:standard_planner
fun:planner
fun:pg_plan_query
fun:ExplainOneQuery
}

Separately, I tried "make installcheck-tests TESTS=index_including"
from Postgres + Valgrind, with Valgrind's --track-origins option
enabled (as it was above). I recommend installing Valgrind, and making
sure that the patch shows no errors. I didn't actually find a Valgrind
issue from just using your regression tests (nor did I find an issue
from separately running the regression tests with
CLOBBER_CACHE_ALWAYS, FWIW).


Thank you for advice.
Another missed replacement of index->ncolumns with index->nkeycolumns
in match_clause_to_index. Fixed.

I also fixed a couple of typos in the documentation.

Thank you again for the detailed review.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-05 Thread Anastasia Lubennikova

05.04.2016 01:48, Peter Geoghegan:

On Mon, Mar 21, 2016 at 9:53 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

It's a bit more complicated to add it into index creation algorithm.
There's a trick with a "high key".
 /*
  * We copy the last item on the page into the new page, and then
  * rearrange the old page so that the 'last item' becomes its high
key
  * rather than a true data item.  There had better be at least two
  * items on the page already, else the page would be empty of useful
  * data.
  */
 /*
  * Move 'last' into the high key position on opage
  */

To be consistent with other steps of algorithm ( all high keys must be
truncated tuples), I had to update this high key on place:
delete the old one, and insert truncated high key.

Hmm. But the high key comparing equal to the Scankey gives insertion
the choice of where to put its IndexTuple (it can go on the page with
the high key, or its right-sibling, according only to considerations
about fillfactor, etc). Is this changed? Does it not matter? Why not?
Is it just worth it?


I would say, this is changed, but it doesn't matter.
Any search in the btree (including the choice of a suitable page for
insertion) uses only key attributes.

We assume that included columns are stored in the index unordered.
Simple example.
create table tbl(id int, data int);
create index idx on tbl (id) including (data);

Select query does not consider included columns in scan key.
It selects all tuples satisfying the condition on the key column, and only
after that applies a filter to remove wrong rows from the result.
If the key attribute doesn't satisfy the query condition, there are no more
tuples to return and we can stop the scan.


You can find more explanations in the attached sql script,
which contains queries to receive detailed information about the index
structure using pageinspect.
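For readers who don't have the attached script, the queries are along these lines (a sketch, assuming the tbl/idx example above and an installed pageinspect extension; the block number is illustrative):

```sql
-- Assumes the tbl/idx example above; pageinspect must be installed.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Find the root page and tree level of the index.
SELECT root, level FROM bt_metap('idx');

-- Inspect items on a leaf page (block number 1 is illustrative).
-- For a covering index, the high key should carry only the key
-- column, while regular leaf items carry the INCLUDING column too.
SELECT itemoffset, ctid, itemlen, data
FROM bt_page_items('idx', 1);
```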



The right-most page on every level has no high-key. But you say those
pages have an "imaginary" *positive* infinity high key, just as
internal pages have (non-imaginary) minus infinity downlinks as their
first item/downlink. So tuples in a (say) leaf page are always bound
by the downlink lower bound in parent, while their own high key is an
upper bound. Either (and, rarely, both) could be (positive or
negative) infinity.

Maybe you now see why I talked about special _bt_compare() logic for
this. I proposed special logic that is similar to the existing minus
infinity thing _bt_compare() does (although _bt_binsrch(), an
important caller of _bt_compare() also does special things for
internal .vs leaf case, so I'm not sure any new special logic must go
in _bt_compare()).


In my view, it's the correct way to fix this problem, because the caller is
responsible for passing proper arguments to the function.
Of course I will add a check into bt_compare, but I'd rather make it an
assertion (see the patch attached).

I see what you mean, but I think we need to decide what to do about
the key space when leaf high keys are truncated. I do think that
truncating the high key was the right idea, though, and it nicely
illustrates that nothing special should happen in upper levels. Suffix
truncation should only happen when leaf pages are split, generally
speaking.
As I said, the high key is very similar to the downlinks, in that both
bound the things that go on each page. Together with downlinks they
represent *discrete* ranges *unambiguously*, so INCLUDING truncation
needs to make it clear which page new items go on. As I said,
_bt_binsrch() already takes special actions for internal pages, making
sure to return the item that is < the scankey, not <= the scankey
which is only allowed for leaf pages. (See README, from "Lehman and
Yao assume that the key range for a subtree S is described by Ki < v
<= Ki+1 where Ki and Ki+1 are the adjacent keys in the parent
page...").
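The leaf-vs-internal distinction described above can be modeled in a few lines (a simplified sketch of the behavior, not the actual _bt_binsrch() code; sorted single-attribute keys assumed):

```python
from bisect import bisect_left

def btree_binsrch(keys, scankey, is_leaf):
    # Leaf page: return the first offset whose key is >= scankey
    # (a candidate match or insertion point).
    # Internal page: return the last offset whose key is strictly
    # < scankey -- the downlink to descend into, per the "return
    # the item that is < the scankey" rule quoted above.
    if is_leaf:
        return bisect_left(keys, scankey)
    return bisect_left(keys, scankey) - 1

# With keys [10, 20, 30] and scankey 20: a leaf search lands on the
# equal item (offset 1), while an internal search descends left of
# it (offset 0), so equal keys are reached by walking right.
```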

To give a specific example, I worry about the case where two sibling
downlinks in a parent page are distinct, but per specific-to-Postgres
"Ki <= v <= Ki+1" thing (which differs from the classic L&Y
invariant), some tuples with all right downlink's attributes matching
end up in left child page, not right child page. I worry that since
_bt_findsplitloc() doesn't consider this (for example), the split
point doesn't *reliably* and unambiguously divide the key space
between the new halves of a page being split. I think the "Ki <= v <=
Ki+1"/_bt_binsrch() thing might save you in common cases where all
downlink attributes are distinct, so maybe that simpler case is okay.
But to be even more specific, what about the more complicated case
where the downlinks *are* fully _bt_compare()-wise equal? This could
happen even though they're constrained to be unique in leaf pages, due
to bloat. Unique indexes aren't special here; they just make it far
less likely that this w

Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-29 Thread Anastasia Lubennikova

17.03.2016 06:27, Vitaly Burovoy:

On 2016-03-15, David Steele <da...@pgmasters.net> wrote:

On 3/4/16 2:56 PM, Vitaly Burovoy wrote:

On 3/4/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:


I think that you should update documentation. At least description of
epoch on this page:
http://www.postgresql.org/docs/devel/static/functions-datetime.html

Thank you very much for pointing where it is located (I saw only
"to_timestamp(TEXT, TEXT)").
I'll think how to update it.

Vitaly, have you decided how to update this yet?

Yes, there are three versions:
* remove mentioning how to get timestamptz from UNIX stamp;
* leave a note how to get timestamptz and add a note that such
encapsulation existed prior to 9.6;
* replace to the proposed current behavior (without interval).

I decided to implement the third case (but a result there has a time
zone which can be different, so the result can be not exactly same for
a concrete user). If a committer decides to do somehow else, he is
free to choose one from the list above or to do something else.


3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice
abbreviation, but it seems slightly confusing to me.

It doesn't matter for me what it is called, it is short enough and
reflects a type on which it is applied.
What would the best name be for it?

Anastasia, any suggestions for a better name, or just leave it as is?

I'm not in favor of the "4", either.  I think I would prefer
JULIAN_MAXYEAR_STAMP.

It turns out that Tom has changed almost one third of timestamp.h and
now the constant has a name "TIMESTAMP_END_JULIAN".

I've rebased the patch to the current master (5db5146) and changed it
according to the new timestamp.h.

Now it passes both versions (integer and float timestamps).
I deleted test for the upper boundary for integer timestamps, changed
a little comments.

I decided to delete hints about minimum and maximum allowed values
because no one type has such hint.

Please find attached a new version of the patch.



I think I should write something as a reviewer.
I read the patch again and I don't see any issues with it.
It applies to master and works as expected, so I'll mark it as
"Ready for committer".
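For context, the behavior under review can be summarized as follows (a sketch of the expected post-patch behavior, not output from a committed build):

```sql
-- With the patch applied, infinities round-trip through
-- to_timestamp(float8) instead of raising an error:
SELECT to_timestamp('infinity'::float8);    -- expected: infinity
SELECT to_timestamp('-infinity'::float8);   -- expected: -infinity

-- Finite values outside the supported timestamp range still error out.
```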


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-28 Thread Anastasia Lubennikova

25.03.2016 01:12, Peter Geoghegan:

On Thu, Mar 24, 2016 at 7:17 AM, Robert Haas <robertmh...@gmail.com> wrote:

I really like this idea, and the performance results seem impressive,
but I think we should push this out to 9.7.  A btree patch that didn't
have WAL support until two and a half weeks into the final CommitFest
just doesn't seem to me like a good candidate.  First, as a general
matter, if a patch isn't code-complete at the start of a CommitFest,
it's reasonable to say that it should be reviewed but not necessarily
committed in that CommitFest.

You're right.
Frankly, I thought that someone would help me with the patch, but I had to
finish it myself.

*off-topic*
I wonder if we could add a new flag to the commitfest, something like
"Needs assistance", which would be used to mark big and complicated
patches in progress, while "Needs review" means that the patch is almost
ready and only requires a final review.



   This patch has had some review, but I'm
not sure how deep that review is, and I think it's had no code review
at all of the WAL logging changes, which were submitted only a week
ago, well after the CF deadline.  Second, the btree AM is a
particularly poor place to introduce possibly destabilizing changes.
Everybody depends on it, all the time, for everything.  And despite
new tools like amcheck, it's not a particularly easy thing to debug.

Regrettably, I must agree. I don't see a plausible path to commit for
this patch in the ongoing CF.

I think that Anastasia did an excellent job here, and I wish I could
have been of greater help sooner. Nevertheless, it would be unwise to
commit this given the maturity of the code. There have been very few
instances of performance improvements to the B-Tree code for as long
as I've been interested, because it's so hard, and the standard is so
high. The only example I can think of from the last few years is
Kevin's commit 2ed5b87f96 and Tom's commit 1a77f8b63d both of which
were far less invasive, and Simon's commit c7111d11b1, which we just
outright reverted from 9.5 due to subtle bugs (and even that was
significantly less invasive than this patch). Improving nbtree is
something that requires several rounds of expert review, and that's
something that's in short supply for the B-Tree code in particular. I
think that a new testing strategy is needed to make this easier, and I
hope to get that going with amcheck. I need help with formalizing a
"testing first" approach for improving the B-Tree code, because I
think it's the only way that we can move forward with projects like
this. It's *incredibly* hard to push forward patches like this given
our current, limited testing strategy.


Unfortunately, I must agree. This patch seems to be far from its final
version, and the feature freeze is close.

I'll move it to the future commitfest.

Anyway it means, that now we have more time to improve the patch.
If you have any ideas related to this patch, such as prefix/suffix
compression, I'll be glad to discuss them.

Same for any other ideas of B-tree optimization.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-03-28 Thread Anastasia Lubennikova

21.03.2016 19:53, Anastasia Lubennikova:

19.03.2016 08:00, Peter Geoghegan:
On Fri, Mar 18, 2016 at 5:15 AM, David Steele <da...@pgmasters.net> 
wrote:
It looks like this patch should be marked "needs review" and I have 
done so.

Uh, no it shouldn't. I've posted an extensive review on the original
design thread. See CF entry:

https://commitfest.postgresql.org/9/433/

Marked "Waiting on Author".

Thanks to David,
I've missed these letters at first.
I'll answer here.


* You truncate (remove suffix attributes -- the "included" attributes)
within _bt_insertonpg():

-   right_item = CopyIndexTuple(item);
+   indnatts = IndexRelationGetNumberOfAttributes(rel);
+   indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
+   if (indnatts != indnkeyatts)
+   {
+   right_item = index_reform_tuple(rel, item, indnatts, 
indnkeyatts);

+   right_item_sz = IndexTupleDSize(*right_item);
+   right_item_sz = MAXALIGN(right_item_sz);
+   }
+   else
+   right_item = CopyIndexTuple(item);
 ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY);

I suggest that you do this within _bt_insert_parent(), instead, iff
the original target page is known to be a leaf page. That's where it
needs to happen for conventional suffix truncation, which has special
considerations when determining which attributes are safe to truncate
(or even which byte in the first distinguishing attribute it is okay
to truncate past)


I agree that _bt_insertonpg() is not right for truncation.
Furthermore, I've noticed that all internal keys are solely the copies 
of "High keys" from the leaf pages. Which is pretty logical.
Therefore, if we have already truncated the tuple, when it became a 
High key, we do not need the same truncation within 
_bt_insert_parent() or any other function.
So the only thing to worry about is the HighKey truncation. I rewrote 
the code. Now only _bt_split cares about truncation.


It's a bit more complicated to add it into index creation algorithm.
There's a trick with a "high key".
/*
 * We copy the last item on the page into the new page, and then
 * rearrange the old page so that the 'last item' becomes its 
high key
 * rather than a true data item.  There had better be at least 
two
 * items on the page already, else the page would be empty of 
useful

 * data.
 */
/*
 * Move 'last' into the high key position on opage
 */

To be consistent with other steps of algorithm ( all high keys must be 
truncated tuples), I had to update this high key on place:

delete the old one, and insert truncated high key.
I use the very same logic to truncate the posting list of a compressed
tuple in the "btree_compression" patch. [1]
I hope both patches will be accepted, and then I'll merge them thoroughly.



* I think the comparison logic may have a bug.

Does this work with amcheck? Maybe it works with bt_index_check(), but
not bt_index_parent_check()? I think that you need to make sure that
_bt_compare() knows about this, too. That's because it isn't good
enough to let a truncated internal IndexTuple compare equal to a
scankey when non-truncated attributes are equal.


It is a very important issue. But I don't think it's a bug there.
I've read the amcheck sources thoroughly and found that the problem
appears in invariant_key_less_than_equal_nontarget_offset():


static bool
invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
   Page nontarget, ScanKey key,
   OffsetNumber upperbound)
{
	int16		natts = state->rel->rd_rel->relnatts;
	int32		cmp;

cmp = _bt_compare(state->rel, natts, key, nontarget, upperbound);

return cmp <= 0;
}

It uses a scankey made with _bt_mkscankey(), which uses only key
attributes, but calls _bt_compare() with the wrong keysz.
If we use nkeyatts = state->rel->rd_index->relnatts; instead of
natts, all the checks pass successfully.


Same for invariant_key_greater_than_equal_offset() and 
invariant_key_less_than_equal_nontarget_offset().


In my view, it's the correct way to fix this problem, because the 
caller is responsible for passing proper arguments to the function.
Of course I will add a check into bt_compare, but I'd rather make it 
an assertion (see the patch attached).


I'll add a flag to distinguish regular and truncated tuples, but it
will not be used in this patch. Please comment if I've missed
something.
As you've already mentioned, neither high keys, nor tuples on internal 
pages are using  "itup->t_tid.ip_posid", so I'll take one bit of it.


It will definitely require changes in the future works on suffix 
truncation or something like that, but IMHO for now it's enough.


Do you have any objections or comments?

[1] https://commitfest.pos

Re: [HACKERS] WIP: Covering + unique indexes.

2016-03-21 Thread Anastasia Lubennikova

19.03.2016 08:00, Peter Geoghegan:

On Fri, Mar 18, 2016 at 5:15 AM, David Steele <da...@pgmasters.net> wrote:

It looks like this patch should be marked "needs review" and I have done so.

Uh, no it shouldn't. I've posted an extensive review on the original
design thread. See CF entry:

https://commitfest.postgresql.org/9/433/

Marked "Waiting on Author".

Thanks to David,
I've missed these letters at first.
I'll answer here.


* You truncate (remove suffix attributes -- the "included" attributes)
within _bt_insertonpg():

-   right_item = CopyIndexTuple(item);
+   indnatts = IndexRelationGetNumberOfAttributes(rel);
+   indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
+   if (indnatts != indnkeyatts)
+   {
+   right_item = index_reform_tuple(rel, item, indnatts, indnkeyatts);
+   right_item_sz = IndexTupleDSize(*right_item);
+   right_item_sz = MAXALIGN(right_item_sz);
+   }
+   else
+   right_item = CopyIndexTuple(item);
 ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY);

I suggest that you do this within _bt_insert_parent(), instead, iff
the original target page is know to be a leaf page. That's where it
needs to happen for conventional suffix truncation, which has special
considerations when determining which attributes are safe to truncate
(or even which byte in the first distinguishing attribute it is okay
to truncate past)


I agree that _bt_insertonpg() is not right for truncation.
Furthermore, I've noticed that all internal keys are solely the copies 
of "High keys" from the leaf pages. Which is pretty logical.
Therefore, if we have already truncated the tuple, when it became a High 
key, we do not need the same truncation within _bt_insert_parent() or 
any other function.
So the only thing to worry about is the HighKey truncation. I rewrote 
the code. Now only _bt_split cares about truncation.


It's a bit more complicated to add it into index creation algorithm.
There's a trick with a "high key".
/*
 * We copy the last item on the page into the new page, and then
 * rearrange the old page so that the 'last item' becomes its 
high key

 * rather than a true data item.  There had better be at least two
 * items on the page already, else the page would be empty of 
useful

 * data.
 */
/*
 * Move 'last' into the high key position on opage
 */

To be consistent with other steps of algorithm ( all high keys must be 
truncated tuples), I had to update this high key on place:

delete the old one, and insert truncated high key.
I use the very same logic to truncate the posting list of a compressed tuple
in the "btree_compression" patch. [1]

I hope both patches will be accepted, and then I'll merge them thoroughly.
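The split-time high-key handling described above can be sketched as a toy model (illustrative only, not the nbtree code; a tuple here is a Python tuple whose leading nkeyatts entries are key attributes and the rest are INCLUDING attributes):

```python
def truncate_tuple(itup, nkeyatts):
    # Model of index_reform_tuple(): keep only the leading key
    # attributes; INCLUDING attributes are dropped.
    return itup[:nkeyatts]

def split_leaf(items, splitpos, nkeyatts):
    # Toy leaf-page split: the left page keeps items[:splitpos], and a
    # truncated copy of its last item becomes the new high key,
    # mirroring the "move 'last' into the high key position" step.
    left, right = items[:splitpos], items[splitpos:]
    high_key = truncate_tuple(left[-1], nkeyatts)
    return high_key, left, right

# One key attribute plus one INCLUDING attribute per tuple:
hk, left, right = split_leaf(
    [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')], splitpos=2, nkeyatts=1)
# hk == (2,): the high key keeps only the key column.
```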


* I think the comparison logic may have a bug.

Does this work with amcheck? Maybe it works with bt_index_check(), but
not bt_index_parent_check()? I think that you need to make sure that
_bt_compare() knows about this, too. That's because it isn't good
enough to let a truncated internal IndexTuple compare equal to a
scankey when non-truncated attributes are equal.


It is a very important issue. But I don't think it's a bug there.
I've read the amcheck sources thoroughly and found that the problem 
appears in invariant_key_less_than_equal_nontarget_offset():

static bool
invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
											   Page nontarget, ScanKey key,
											   OffsetNumber upperbound)
{
	int16		natts = state->rel->rd_rel->relnatts;
	int32		cmp;

	cmp = _bt_compare(state->rel, natts, key, nontarget, upperbound);

	return cmp <= 0;
}

It uses a scankey, made with _bt_mkscankey(), which uses only key 
attributes, but calls _bt_compare() with the wrong keysz.
If we will use nkeyatts = state->rel->rd_index->relnatts; instead of 
natts, all the checks pass successfully.


Same for invariant_key_greater_than_equal_offset() and 
invariant_key_less_than_equal_nontarget_offset().


In my view, it's the correct way to fix this problem, because the caller 
is responsible for passing proper arguments to the function.
Of course I will add a check into _bt_compare(), but I'd rather make it 
an assertion (see the patch attached).


I'll add a flag to distinguish regular and truncated tuples, but it will 
not be used in this patch. Please comment if I've missed something.
As you've already mentioned, neither high keys nor tuples on internal 
pages use "itup->t_tid.ip_posid", so I'll take one bit of it.


It will definitely require changes in future work on suffix 
truncation or something like that, but IMHO for now it's enough.
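For what it's worth, the spare bit really is there: OffsetNumber is a 16-bit value, and offsets on an 8 kB page stay far below 2^15, so the top bit of ip_posid is unused in high keys and internal tuples. A sketch with hypothetical names (the actual flag name and bit choice are up to the patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical "truncated tuple" flag stored in ip_posid.  Real offset
 * numbers on an 8 kB page never approach 0x8000, so the top bit is free
 * on tuples whose TID does not point at a heap item.
 */
#define DEMO_TRUNCATED_FLAG 0x8000
#define DEMO_OFFSET_MASK    0x7FFF

static uint16_t
demo_set_truncated(uint16_t ip_posid)
{
	return ip_posid | DEMO_TRUNCATED_FLAG;
}

static int
demo_is_truncated(uint16_t ip_posid)
{
	return (ip_posid & DEMO_TRUNCATED_FLAG) != 0;
}

static uint16_t
demo_get_offset(uint16_t ip_posid)
{
	return ip_posid & DEMO_OFFSET_MASK;
}
```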


Do you have any objections or comments?

[1] https://commitfest.postgresql.org/9/494/

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-19 Thread Anastasia Lubennikova
Please, find the new version of the patch attached. Now it has WAL 
functionality.


Detailed description of the feature you can find in README draft 
https://goo.gl/50O8Q0


This patch is pretty complicated, so I ask everyone who is interested in 
this feature

to help with reviewing and testing it. I will be grateful for any feedback.
But please don't complain about code style; it is still work in progress.

Next things I'm going to do:
1. More debugging and testing. I'm going to attach a couple of sql 
scripts for testing in the next message.

2. Fix NULLs processing.
3. Add a flag into pg_index that allows enabling/disabling compression 
for each particular index.
4. Recheck locking considerations. I tried to write the code as 
non-invasively as possible, but we need to make sure that the algorithm 
is still correct.

5. Change BTMaxItemSize
6. Bring back microvacuum functionality.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..72acc0f 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,8 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
 
+#include "catalog/catalog.h"
+#include "utils/datum.h"
 
 typedef struct
 {
@@ -82,6 +84,7 @@ static bool _bt_pgaddtup(Page page, Size itemsize, IndexTuple itup,
 			 OffsetNumber itup_off);
 static bool _bt_isequal(TupleDesc itupdesc, Page page, OffsetNumber offnum,
 			int keysz, ScanKey scankey);
+
 static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
 
 
@@ -113,6 +116,11 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	Page 		page;
+	TupleDesc	itupdesc;
+	int			nipd;
+	IndexTuple 	olditup;
+	Size 		sizetoadd;
 
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
@@ -190,6 +198,7 @@ top:
 
 	if (checkUnique != UNIQUE_CHECK_EXISTING)
 	{
+		bool updposting = false;
 		/*
 		 * The only conflict predicate locking cares about for indexes is when
 		 * an index tuple insert conflicts with an existing lock.  Since the
@@ -201,7 +210,42 @@ top:
 		/* do the insertion */
		_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup,
						  stack, heapRel);
-		_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
+
+		/*
+		 * Decide, whether we can apply compression
+		 */
+		page = BufferGetPage(buf);
+
+		if(!IsSystemRelation(rel)
+			&& !rel->rd_index->indisunique
+			&& offset != InvalidOffsetNumber
+			&& offset <= PageGetMaxOffsetNumber(page))
+		{
+			itupdesc = RelationGetDescr(rel);
+			sizetoadd = sizeof(ItemPointerData);
+			olditup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+
+			if(_bt_isbinaryequal(itupdesc, olditup,
+	rel->rd_index->indnatts, itup))
+			{
+				if (!BtreeTupleIsPosting(olditup))
+				{
+					nipd = 1;
+					sizetoadd = sizetoadd*2;
+				}
+				else
+					nipd = BtreeGetNPosting(olditup);
+
+				if ((IndexTupleSize(olditup) + sizetoadd) <= BTMaxItemSize(page)
+					&& PageGetFreeSpace(page) > sizetoadd)
+					updposting = true;
+			}
+		}
+
+		if (updposting)
+			_bt_pgupdtup(rel, buf, offset, itup, olditup, nipd);
+		else
+			_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
 	}
 	else
 	{
@@ -1042,6 +1086,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 		itemid = PageGetItemId(origpage, P_HIKEY);
 		itemsz = ItemIdGetLength(itemid);
 		item = (IndexTuple) PageGetItem(origpage, itemid);
+
 		if (PageAddItem(rightpage, (Item) item, itemsz, rightoff,
 		false, false) == InvalidOffsetNumber)
 		{
@@ -1072,13 +1117,39 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 		itemsz = ItemIdGetLength(itemid);
 		item = (IndexTuple) PageGetItem(origpage, itemid);
 	}
-	if (PageAddItem(leftpage, (Item) item, itemsz, leftoff,
+
+	if (BtreeTupleIsPosting(item))
+	{
+		Size hikeysize =  BtreeGetPostingOffset(item);
+		IndexTuple hikey = palloc0(hikeysize);
+
+		/* Truncate posting before insert it as a hikey. */
+		memcpy (hikey, item, hikeysize);
+		hikey->t_info &= ~INDEX_SIZE_MASK;
+		hikey->t_info |= hikeysize;
+		ItemPointerSet(&(hikey->t_tid), origpagenumber, P_HIKEY);
+
+		if (PageAddItem(leftpage, (Item) hikey, hikeysize, leftoff,
 	false, false) == InvalidOffsetNumber)
+		{
+			memset(rightpage, 0, BufferGetPageSize(rbuf));
+			elog(ERROR, "failed to add hikey to the left sibling"
+" while splitting block %u of index \"%s\"",
+origpagenumber, RelationGetRelationName(rel));
+		}
+
+		pfree(hikey);
+	}
+	else
 	{
-		memset(rightpage, 0, BufferGetPageSize(rbuf));
		elog(ERROR, "failed to add hikey to the left sibling"

Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-16 Thread Anastasia Lubennikova

15.03.2016 22:28, David Steele:

On 3/4/16 2:56 PM, Vitaly Burovoy wrote:


On 3/4/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:


I think that you should update documentation. At least description of
epoch on this page:
http://www.postgresql.org/docs/devel/static/functions-datetime.html

Thank you very much for pointing where it is located (I saw only
"to_timestamp(TEXT, TEXT)").
I'll think how to update it.

Vitaly, have you decided how to update this yet?


3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice
abbreviation, but it seems slightly confusing to me.

It doesn't matter for me what it is called, it is short enough and
reflects a type on which it is applied.
What would the best name be for it?

Anastasia, any suggestions for a better name, or just leave it as is?

I'm not in favor of the "4", either.  I think I would prefer
JULIAN_MAXYEAR_STAMP.



This point is related to another patch 
https://commitfest.postgresql.org/9/540/,

and was added to this patch just for compatibility.
If Tom doesn't change the name of the macro there, I don't see any 
reason why we should do it in this patch.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-15 Thread Anastasia Lubennikova

14.03.2016 16:02, David Steele:

Hi Anastasia,

On 2/18/16 12:29 PM, Anastasia Lubennikova wrote:

18.02.2016 20:18, Anastasia Lubennikova:

04.02.2016 20:16, Peter Geoghegan:

On Fri, Jan 29, 2016 at 8:50 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

I fixed it in the new version (attached).


Thank you for the review.
At last, there is a new patch version 3.0. After some refactoring it
looks much better.
I described all details of the compression in this document
https://goo.gl/50O8Q0 (the same text without pictures is attached in
btc_readme_1.0.txt).
Consider it as a rough copy of readme. It contains some notes about
tricky moments of implementation and questions about future work.
Please don't hesitate to comment it.


Sorry, previous patch was dirty. Hotfix is attached.


This looks like an extremely valuable optimization for btree indexes 
but unfortunately it is not getting a lot of attention. It still 
applies cleanly for anyone interested in reviewing.




Thank you for the attention.
I would be indebted to all reviewers who can just try this patch on 
real data and workload (except WAL for now).

B-tree needs very thorough testing.

It's not clear to me that you answered all of Peter's questions in 
[1].  I understand that you've provided a README but it may not be 
clear if the answers are in there (and where).


I described in the README all the points Peter asked about.
But I see that it'd be better to answer directly.
Thanks for reminding me, I'll do it tomorrow.


Also, at the end of the README it says:

13. Xlog. TODO.

Does that mean the patch is not yet complete?


Yes, you're right.
Frankly speaking, I supposed that someone would help me with that stuff,
but now I have almost completed it. I'll send the updated patch in the next letter.

I'm still doubtful about some patch details. I mentioned them in the 
readme (bold type).

But they are mostly about future improvements.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

2016-03-15 Thread Anastasia Lubennikova

15.03.2016 03:21, Vitaly Burovoy:

On 3/14/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:

14.03.2016 16:23, David Steele:

On 2/25/16 4:44 PM, Vitaly Burovoy wrote:


Added to the commitfest 2016-03.

[CF] https://commitfest.postgresql.org/9/540/

This looks like a fairly straight-forward bug fix (the size of the
patch is deceptive because there are a lot of new tests included).  It
applies cleanly.

Anastasia, I see you have signed up to review.  Do you have an idea
when you will get the chance to do that?

Thanks,

I've read the patch thoroughly and haven't found any problems. I think
that the patch is in very good shape.
It fixes a bug and has an excellent set of tests.

There is an issue, mentioned in the thread above:


postgres=# select
postgres-#  to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
  to_char  |  to_char  |  to_char
-----------+-----------+-----------
 monday    | monday    | thursday
(1 row)
since 4714-12-28 BC and further into the past, detection of when a week
starts is broken (because it is the boundary of isoyears -4713 and -4712).
Is it worth breaking the undocumented range, or should we leave it as is?

But I suppose that the behavior of undocumented dates is not essential.

I'm sorry... What should I do with the "Waiting on Author" state if you
don't have complaints?



I was going to set "Ready for Committer", but then I noticed a message 
from Mark Dilger and changed my mind.

Now that you have answered him, I have no objections.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[HACKERS] Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

2016-03-14 Thread Anastasia Lubennikova

14.03.2016 16:23, David Steele:

On 2/25/16 4:44 PM, Vitaly Burovoy wrote:


Added to the commitfest 2016-03.

[CF] https://commitfest.postgresql.org/9/540/


This looks like a fairly straight-forward bug fix (the size of the 
patch is deceptive because there are a lot of new tests included).  It 
applies cleanly.


Anastasia, I see you have signed up to review.  Do you have an idea 
when you will get the chance to do that?


Thanks,


I've read the patch thoroughly and haven't found any problems. I think 
that the patch is in very good shape.

It fixes a bug and has an excellent set of tests.

There is an issue, mentioned in the thread above:


postgres=# select
postgres-#  to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
 to_char  |  to_char  |  to_char
----------+-----------+-----------
 monday   | monday    | thursday
(1 row)



since 4714-12-28 BC and further into the past, detection of when a week
starts is broken (because it is the boundary of isoyears -4713 and -4712).
Is it worth breaking the undocumented range, or should we leave it as is?


But I suppose that the behavior of undocumented dates is not essential.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] WIP: Covering + unique indexes.

2016-03-14 Thread Anastasia Lubennikova

02.03.2016 08:50, Michael Paquier:

On Wed, Mar 2, 2016 at 2:10 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

01.03.2016 19:55, Anastasia Lubennikova:

It is not the final version, because it breaks pg_dump for previous
versions. I need some help from hackers here.
pg_dump, line 5466:
if (fout->remoteVersion >= 90400)

What does 'remoteVersion' mean? And what is the right way to change it?
Or does it change between releases?
I guess that 90400 is for 9.4 and 80200 is for 8.2, but is it really so?
That is totally new to me.

Yes, you got it. That's basically PG_VERSION_NUM as compiled on the
server that has been queried, in this case the server from which a
dump is taken. If you are changing the system catalog layer, you would
need to provide a query at least equivalent to what has been done
until now for your patch, the modify pg_dump as follows:
if (fout->remoteVersion >= 90600)
{
 query = my_new_query;
}
else if (fout->remoteVersion >= 90400)
{
 query = the existing 9.4 query
}
etc.

In short you just need to add a new block so as remote servers newer
than 9.6 will be able to dump objects correctly. pg_upgrade is a good
way to check the validity of pg_dump actually, this explains why some
objects are not dropped in the regression tests. Perhaps you'd want to
do the same with your patch if the current test coverage of pg_dump is
not enough. I have not looked at your patch so I cannot say for sure.


Thank you for the explanation.
New version of the patch implements pg_dump well.
Documentation related to constraints is updated.

I hope the patch is in good shape now. Brief overview for reviewers:

This patch allows unique indexes to be defined on one set of columns
and include another set of column in the INCLUDING clause, on which
the uniqueness is not enforced upon. It allows more queries to benefit
from using index-only scan. Currently, only the B-tree access method
supports this feature.

Syntax example:
CREATE TABLE tbl (c1 int, c2 int, c3 box);
CREATE INDEX idx ON TABLE tbl (c1) INCLUDING (c2, c3);

In contrast to key columns (c1), included columns (c2,c3) are not used
in index scankeys, neither in "search" scankeys nor in "insertion" scankeys.
Included columns are stored only in leaf pages, which helps to slightly
reduce index size. Hence, included columns do not require any opclass
for the btree access method. As you can see from the example above, it's
possible to include columns of "box" type in the index.

The most common use-case for this feature is combination of UNIQUE or
PRIMARY KEY constraint on columns (a,b) and covering index on columns 
(a,b,c).

So, there is a new syntax for constraints.

CREATE TABLE tblu (c1 int, c2 int, c3 box, UNIQUE (c1,c2) INCLUDING (c3));
The index created for this constraint contains three columns:
"tblu_c1_c2_c3_key" UNIQUE CONSTRAINT, btree (c1, c2) INCLUDING (c3)

CREATE TABLE tblpk (c1 int, c2 int, c3 box, PRIMARY KEY (c1) INCLUDING 
(c3));

The index created for this constraint contains two columns. Note that the
NOT NULL constraint, like the unique constraint, is applied only to the
key column(s).

postgres=# \d tblpk
     Table "public.tblpk"
 Column |  Type   | Modifiers
--------+---------+-----------
 c1     | integer | not null
 c2     | integer |
 c3     | box     |
Indexes:
    "tblpk_pkey" PRIMARY KEY, btree (c1) INCLUDING (c3)

Same for ALTER TABLE statements:
CREATE TABLE tblpka (c1 int, c2 int, c3 box);
ALTER TABLE tblpka ADD PRIMARY KEY (c1) INCLUDING (c3);

pg_dump is updated and seems to work fine with this kind of indexes.

I see only one problem left (maybe I've mentioned it before).
Queries like this [1] must be rewritten, because after the catalog changes,
i.indkey contains both key and included attrs.
One more thing to do is some refactoring of the names, since "indkey"
looks really confusing to me. But that could be done as a separate patch [2].


[1] https://wiki.postgresql.org/wiki/Retrieve_primary_key_columns
[2] http://www.postgresql.org/message-id/56bb7788.30...@postgrespro.ru

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknuma

Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Anastasia Lubennikova
fer, copy it into local memory allocated by
palloc(), and then immediately release the buffer lock and drop the
pin. This is the same in all instances. There is never more than one
buffer lock or pin held at a time.

We do, on the other hand, have a detailed rationale for why it's okay
that we don't have an ExclusiveLock on the index relation for checks
that span the key space of more than one page by following right links
to compare items across sibling pages. This isn't the same thing as
having an explicit interlock like a pin -- our interlock is one
against *recycling* by vacuum, which is based on recentGlobalXmin.
This rationale requires expert review.


I'm not an expert, but I promise to take a closer look at locking.
I will send another email soon.

Performance
==

Trying to keep the tool as simple as possible, while still making it
do verification that is as useful as possible was my priority here,
not performance. Still, verification completes fairly quickly.
Certainly, it takes far less time than having to REINDEX the index,
and doesn't need too much memory. I think that in practice most
problems that can be detected by the B-Tree checker functions will be
detected with the lighter variant.


I do not see any performance issues.
I'm sure that if someone wants to check whether the index is corrupted, 
he certainly can wait a minute.


But I have some concerns about compatibility with my patches.
I've tried to call bt_index_check() over my "including" patch [1] and 
caught a segfault.


LOG:  server process (PID 31794) was terminated by signal 11: 
Segmentation fault

DETAIL:  Failed process was running: select bt_index_check('idx');

I do hope that my patch will be accepted in 9.6, so this conflict looks 
really bad.
I think that the error is caused by changes in page layout. To save some 
space, nonkey attributes are truncated
when btree copies the indexed value into inner pages or into the high key. 
You can look at the index_reform_tuple() calls.


I wonder whether you can add some check of the catalog version to your 
module, or does this case require more changes?


With second patch which implements compression [2], amcheck causes 
another error.

postgres=# insert into tbl select 1 from generate_series(0,5);
INSERT 0 6
postgres=# SELECT * FROM bt_page_items('idx', 1);
 itemoffset |      ctid      | itemlen | nulls | vars | data
------------+----------------+---------+-------+------+------------------
          1 | (2147483664,6) |      56 | f     | f    | 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 02 00 00 00 00 00 03 00 00 00 00 00 04 00 00 00 00 00 05 00 00 00 00 00 06 00 00 00 00 00

postgres=# select bt_index_check('idx');
ERROR:  cannot check index "idx"
DETAIL:  index is not yet ready for insertions

But I'm sure that it's a problem with my patch, so I'll fix it and try again.

[1] https://commitfest.postgresql.org/9/433/
[2] https://commitfest.postgresql.org/9/494/

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-04 Thread Anastasia Lubennikova

27.02.2016 09:57, Vitaly Burovoy:

Hello, Hackers!

I worked on a patch[1] allows "EXTRACT(epoch FROM
+-Inf::timestamp[tz])" to return "+-Inf::float8".
There is an opposite function "to_timestamp(float8)" which now defined as:
SELECT ('epoch'::timestamptz + $1 * '1 second'::interval)


Hi,
thank you for the patches.
Could you explain whether they depend on each other?


Since intervals do not support infinity values, it is impossible to do
something like:

SELECT to_timestamp('infinity'::float8);

... which is not good.

Supporting of such converting is in the TODO list[2] (by "converting
between infinity timestamp and float8").


You mention intervals here, and the TODO item definitely speaks of an 
'infinity' interval,

while the patch and all the following discussion concern timestamps.
Is it a typo, or have I misunderstood something important?
I assumed that the following query would work, but it doesn't. Could you 
clarify that?

select to_timestamp('infinity'::interval);


Proposed patch implements it.

There is an other patch in the CF[3] 2016-03 implements checking of
timestamp[tz] for being in allowed range. Since it is wise to set
(fix) the upper boundary of timestamp[tz]s, I've included the file
"src/include/datatype/timestamp.h" from there to check that an input
value and a result are in the allowed range.

There is no changes in a documentation because allowed range is the
same as officially supported[4] (i.e. until 294277 AD).


I think that you should update the documentation. At least the 
description of epoch on this page:

http://www.postgresql.org/docs/devel/static/functions-datetime.html

Here is how you can convert an epoch value back to a time stamp:

SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 982384720.12 * INTERVAL '1 second';

(The to_timestamp function encapsulates the above conversion.)


More thoughts about the patch:

1. When I copy the value from the hints for the min and max values (see 
examples below), it works fine for min, while max still leads to an error.
It comes from the check "if (seconds >= epoch_ubound)". I wonder 
whether you should change the hint message?


select to_timestamp(-210866803200.00);
          to_timestamp
---------------------------------
 4714-11-24 02:30:17+02:30:17 BC
(1 row)


select to_timestamp(9224318016000.00);
ERROR:  UNIX epoch out of range: "9224318016000.00"
HINT:  Maximal UNIX epoch value is "9224318016000.00"
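The asymmetry comes from the upper-bound comparison being exclusive while the lower one is inclusive. A minimal model of the check, using the boundary values from the session above (the real patch derives its epoch_lbound/epoch_ubound constants from timestamp.h):

```c
#include <assert.h>

/* Boundary values taken from the examples above. */
#define EPOCH_LBOUND (-210866803200.0)
#define EPOCH_UBOUND   9224318016000.0

/*
 * Model of the range check: the lower bound is accepted, but the upper
 * bound itself is rejected, so a hint that prints EPOCH_UBOUND as the
 * "maximal" accepted value is misleading.
 */
static int
epoch_in_range(double seconds)
{
	return seconds >= EPOCH_LBOUND && seconds < EPOCH_UBOUND;
}
```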

2. There is a comment about JULIAN_MAXYEAR inaccuracy in timestamp.h:

 * IS_VALID_JULIAN checks the minimum date exactly, but is a bit sloppy
 * about the maximum, since it's far enough out to not be especially
 * interesting.

Maybe you can expand it?
- Does JULIAN_MAXYEAR4STAMPS help to avoid overflow in all possible cases?
- Why do we need to keep both definitions? I suppose it's a matter of 
backward compatibility, isn't it?


3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice 
abbreviation, but it seems slightly confusing to me.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP: Covering + unique indexes.

2016-03-01 Thread Anastasia Lubennikova


29.02.2016 18:17, Anastasia Lubennikova:

25.02.2016 21:39, Jeff Janes:
As promised, here's the new version of the patch 
"including_columns_4.0".

I fixed all issues except some points mentioned below.

Thanks for the update patch.  I get a compiler warning:

genam.c: In function 'BuildIndexValueDescription':
genam.c:259: warning: unused variable 'tupdesc'


Thank you for the notice, I'll fix it in the next update.

Also, I can't create a primary key INCLUDING columns directly:

jjanes=# create table foobar (a int, b int, c int);
jjanes=# alter table foobar add constraint foobar_pkey primary key
(a,b) including (c);
ERROR:  syntax error at or near "including"

But I can get there using a circuitous route:

jjanes=# create unique index on foobar (a,b) including (c);
jjanes=# alter table foobar add constraint foobar_pkey primary key
using index foobar_a_b_c_idx;

The description of the table's index knows to include the including 
column:


jjanes=# \d foobar
     Table "public.foobar"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer | not null
 c      | integer |
Indexes:
    "foobar_pkey" PRIMARY KEY, btree (a, b) INCLUDING (c)


Since the machinery appears to all be in place to have primary keys
with INCLUDING columns, it would be nice if the syntax for adding
primary keys allowed one to implement them directly.

Is this something for future expansion, or could it be added at the
same time as the main patch?


Good point.
At a quick glance, this looks easy to implement. The only problem is 
that there are too many places in the code which must be updated.
I'll try to do it, and if there are difficulties, it's fine with 
me to delay this feature for future work.


I found one more thing to do. pg_dump does not handle included columns 
now. I will fix it in the next version of the patch.




As promised, the fixed patch is attached. It allows performing the 
following statements:


create table utbl (a int, b box);
alter table utbl add unique (a) including(b);
create table ptbl (a int, b box);
alter table ptbl add primary key (a) including(b);

And now they can be dumped/restored successfully.
I used the following settings:
pg_dump --verbose -Fc postgres -f pg.dump
pg_restore -d newdb pg.dump

It is not the final version, because it breaks pg_dump for previous 
versions. I need some help from hackers here.

pg_dump, line 5466:
if (fout->remoteVersion >= 90400)

What does 'remoteVersion' mean? And what is the right way to change it? 
Or does it change between releases?
I guess that 90400 is for 9.4 and 80200 is for 8.2, but is it really so? 
That is totally new to me.
BTW, while we are on the subject, maybe it's worth replacing these 
magic numbers with some set of macros?


P.S. I'll update documentation for ALTER TABLE in the next patch.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] WIP: Covering + unique indexes.

2016-02-29 Thread Anastasia Lubennikova

25.02.2016 21:39, Jeff Janes:

As promised, here's the new version of the patch "including_columns_4.0".
I fixed all issues except some points mentioned below.

Thanks for the update patch.  I get a compiler warning:

genam.c: In function 'BuildIndexValueDescription':
genam.c:259: warning: unused variable 'tupdesc'


Thank you for the notice, I'll fix it in the next update.

Also, I can't create a primary key INCLUDING columns directly:

jjanes=# create table foobar (a int, b int, c int);
jjanes=# alter table foobar add constraint foobar_pkey primary key
(a,b) including (c);
ERROR:  syntax error at or near "including"

But I can get there using a circuitous route:

jjanes=# create unique index on foobar (a,b) including (c);
jjanes=# alter table foobar add constraint foobar_pkey primary key
using index foobar_a_b_c_idx;

The description of the table's index knows to include the including column:

jjanes=# \d foobar
     Table "public.foobar"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer | not null
 c      | integer |
Indexes:
    "foobar_pkey" PRIMARY KEY, btree (a, b) INCLUDING (c)


Since the machinery appears to all be in place to have primary keys
with INCLUDING columns, it would be nice if the syntax for adding
primary keys allowed one to implement them directly.

Is this something for future expansion, or could it be added at the
same time as the main patch?


Good point.
At a quick glance, this looks easy to implement. The only problem is 
that there are too many places in the code which must be updated.
I'll try to do it, and if there are difficulties, it's fine with me 
to delay this feature for future work.


I found one more thing to do. pg_dump does not handle included columns 
now. I will fix it in the next version of the patch.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-02-18 Thread Anastasia Lubennikova

18.02.2016 20:18, Anastasia Lubennikova:

04.02.2016 20:16, Peter Geoghegan:

On Fri, Jan 29, 2016 at 8:50 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

I fixed it in the new version (attached).


Thank you for the review.
At last, there is a new patch version 3.0. After some refactoring it 
looks much better.
I described all details of the compression in this document 
https://goo.gl/50O8Q0 (the same text without pictures is attached in 
btc_readme_1.0.txt).
Consider it as a rough copy of readme. It contains some notes about 
tricky moments of implementation and questions about future work.

Please don't hesitate to comment it.


Sorry, previous patch was dirty. Hotfix is attached.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Compression. To be correct, it's not actually compression, but just an effective layout of ItemPointers on an index page.
compressed tuple = IndexTuple (with metadata in the TID field, plus the key) + PostingList
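A conceptual sketch of that layout (the field names here are illustrative; the actual patch packs this metadata into the otherwise-unused 6-byte TID field of the tuple header, which BtreeGetPostingOffset() and BtreeGetNPosting() then decode):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Conceptual on-page layout of a compressed tuple:
 *
 *   [ header (metadata reusing the TID field) | key bytes | posting list ]
 *
 * The metadata must record where the posting list starts and how many
 * 6-byte ItemPointers it holds.  This struct is a demo, not the patch's.
 */
typedef struct
{
	uint16_t	posting_offset;	/* byte offset of the posting list */
	uint16_t	nposting;		/* number of 6-byte item pointers */
	uint16_t	t_info;			/* flags + total tuple size */
} DemoPostingMeta;

/* Total size: everything before the posting list, plus the list itself. */
static size_t
demo_tuple_size(const DemoPostingMeta *meta)
{
	return meta->posting_offset + (size_t) meta->nposting * 6;
}
```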


1. A GIN index fits extremely well for really large sets of repeating keys, but on the other hand, it completely fails to handle unique keys. For btree it is essential to have good performance and concurrency in any corner case, with any number of duplicates. That's why we can't just copy the GIN implementation of item pointer compression.

The first difference is that the btree algorithm performs compression (or, in other words, changes the index tuple layout) only if there's more than one tuple with a given key. This allows us to avoid the overhead of storing useless metadata for mostly distinct keys (see picture below). It seems that compression could be useful for unique indexes under heavy write/update load (because of MVCC copies), but I'm not sure whether this use case really exists: those tuples should be deleted by microvacuum as soon as possible. Anyway, I think it's worth adding a storage parameter for btree which enables/disables compression for each particular index, and setting compression of unique indexes to off by default.

System indexes do not support compression, for several reasons. First of all, because of the WIP state of the patch (debugging the system catalog isn't a big pleasure). The next reason is that I know many places in the code where hardcode or some non-obvious syscache routines are used, and I do not feel brave enough to change that code. And last but not least, I don't see good reasons to do it.

2. If the index key is very small (smaller than the metadata) and the number of duplicates is small, compression could lead to index bloat instead of an index size decrease (see picture below). I'm not sure whether it's worth handling this case separately, because it's really rare, and I consider it the DBA's job to disable compression on such indexes. But if you see any clear way to handle it automatically, that would be great.

3. For GIN indexes, if a posting list is too large, a posting tree is created. That design rests on the following assumptions:
Indexed keys are never deleted. It makes all the tree algorithms much easier.
There are always many duplicates. Otherwise, GIN becomes really inefficient.
The concurrency rate is low. In order to add a new entry into a posting tree, we hold a lock on its root, so only one backend at a time can perform insertion.

In btree we can't afford these assumptions, so we should handle big posting lists in another way: if there are too many ItemPointers to fit into a single posting list, we just create another one. The overhead of this approach is that we have to store a duplicate of the key and metadata. That leads to the problem of big keys: if the key size is close to BTMaxItemSize, compression will give us a really small benefit, if any at all (see picture below).

4. The more item pointers fit into a single posting list, the more rarely we have to split it and repeat the key. Therefore, the bigger BTMaxItemSize is, the better. The comment in nbtree.h says: "We actually need to be able to fit three items on every page, so restrict any one item to 1/3 the per-page available space." That is quite right for regular items, but a compressed index tuple already contains more than one item. Taking that into account, we can assert that BTMaxItemSize should be ~1/3 of the page size for regular items, and ~1/2 of the page size for compressed items. Are there any objections? I wonder if we can increase BTMaxItemSize under some other assumption? The problem I see here is that a varlena high key could be as big as the compressed tuple.

5. CREATE INDEX. _bt_load. The algorithm of btree build is as follows: do the heap scan, add tuples into the spool, sort the data, insert the ordered data from the spool into leaf index pages (_bt_load), then build the inner pages and the root. The main changes are applied to the _bt_load function. While loading tuples, we do not insert them one by one; instead, we compare each tuple with the previous one, and if they are equal we put them into a posting list. If the posting list is large

Re: [HACKERS] WIP: Covering + unique indexes.

2016-02-11 Thread Anastasia Lubennikova

02.02.2016 15:50, Anastasia Lubennikova:

31.01.2016 11:04, David Rowley:

On 27 January 2016 at 03:35, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

including_columns_3.0 is the latest version of patch.
And changes regarding the previous version are attached in a 
separate patch.

Just to ease the review and debug.

Hi,

I've made another pass over the patch. There's still a couple of
things that I think need to be looked at.

Thank you again.
I'm just writing to say that I haven't disappeared and I do remember 
the issue.
But I'm very busy this week. I'll send an updated patch next week 
as soon as possible.




As promised, here's the new version of the patch "including_columns_4.0".
I fixed all issues except some points mentioned below.
Besides, I did some refactoring:
- use the macros IndexRelationGetNumberOfAttributes and 
IndexRelationGetNumberOfKeyAttributes where possible, and the macro 
RelationGetNumberOfAttributes. Maybe these changes are a bit unrelated, but 
they'll make development much easier in the future.

- rename related variables to indnatts,  indnkeyatts.

I'm also not that keen on index_reform_tuple() in general. I wonder if
there's a way we can just keep the Datum/isnull arrays a bit longer,
and only form the tuple when needed. I've not looked into this in
detail, but it does look like reforming the tuple is not going to be
cheap.
It is used in splits, for example. There is no datum array; we just 
move the tuple key from a child page to a parent page, or something like that.
And according to the INCLUDING algorithm we need to truncate non-key 
attributes.

If we do need to keep this function, I think a better name might be
index_trim_tuple() and I don't think you need to pass the original
length. It might make sense to Assert() that the trim length is
smaller than the tuple size


As regards performance, I don't think that it's a big problem here.
Do you suggest doing it in the following way: memcpy(oldtup, newtup, 
newtuplength)?


I've tested it some more, and still didn't find any performance issues.


in gistrescan() there is some code:

for (attno = 1; attno <= natts; attno++)
{
TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
   scan->indexRelation->rd_opcintype[attno - 1],
   -1, 0);
}

Going by RelationInitIndexAccessInfo() rd_opcintype[] is allocated to
be sized by the number of key columns, but this loop goes over the
number of attribute columns.
Perhaps this is not a big problem since GIST does not support
INCLUDING columns, but it does seem wrong still.


GiST doesn't support the INCLUDING clause, so natts and nkeyatts are 
always equal. I don't see any problem here.
And I think that it's extra work for this patch. Maybe I or someone 
else will add this feature to other access methods later.


Still the same.

Which brings me to the fact that I've spent a bit of time trying to
look for places where you've forgotten to change natts to nkeyatts. I
did find this one, but I don't have much confidence that there's not
lots more places that have been forgotten. Apart from this one, how
confident are you that you've found all the places? I'm getting
towards being happy with the code that I see that's been changed, but
I'm hesitant to mark as "Ready for committer" due to not being all
that comfortable that all the code that needs to be updated has been
updated. I'm not quite sure of a good way to find all these places.
I found all mentions of natts and other related variables with grep, 
and replaced (or expanded) them with nkeyatts where necessary.

As mentioned before, I didn't change other AMs.
I strongly agree that any changes related to btree require thorough 
inspection, so I'll recheck it again. But I'm almost sure that it's okay.


I rechecked everything again and fixed a couple of omissions. Thank you 
for being an exacting reviewer :)
I don't know how else to ensure that everything is OK; I have no idea 
what else I can do.



I wonder whether hacking the code so that each btree index which is
created with > 1 column puts all but the first column into the
INCLUDING columns, then running the regression tests to see if there are
any crashes, would help. I'm really not sure how else to increase the
confidence levels on this. Do you have ideas?


Do I understand correctly that you suggest replacing all multicolumn 
indexes with (1 key column) + included?
I don't think it's a good idea. The INCLUDING clause brings some 
disadvantages. For example, included columns must be filtered after 
the search, while key columns can be used in the scan key directly. I 
already mentioned this in the test example:


explain analyze select c1, c2 from tbl where c1<1 and c3<20;
If the columns' opclasses are used, the new query plan uses them in Index 
Cond: ((c1 < 1) AND (c3 < 20)).
Otherwise, the new query cannot use the included column in the Index Cond 
and uses a filter instead:

Index Cond: (c1 < 1)
Filter: (c3 < 20)
Rows Removed by Filter: 999

[HACKERS] Some refactoring of index structures.

2016-02-10 Thread Anastasia Lubennikova

Hi, hackers.

Long story short, I'd like to do some refactoring of the code related to 
indexes.


I'm working on a patch which provides the INCLUDING columns functionality [1]. 
The patch was reviewed again and again, and I fixed many issues, but we still 
aren't sure enough that all occurrences of rd_index->indnatts are 
replaced with rd_index->indnkeyatts accurately. I have no idea how to 
do it without a thorough reading of the code, so that is what I am doing now.


I already replaced all these rd_index->indnatts with the macro 
IndexRelationGetNumberOfAttributes and rd_index->indnkeyatts with the 
macro IndexRelationGetNumberOfKeyAttributes. But there are still a lot 
of places with vague naming.
For example, indnatts, natts, ncolumns and so on for the very same 
rd_index->indnatts. I changed them to indnatts.
Or indexStruct, index, idxForm, index_form for index->rd_index. They 
definitely should be unified. I'd prefer indexForm. Any objections?


One more point is indkey in FormData_pg_index 
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a>. 
The same for indkeys 
<http://doxygen.postgresql.org/struct__indxInfo.html#a66d6eab0efe4bcb9fcbb472d403fa981>, 
ii_KeyAttrNumbers 
<http://doxygen.postgresql.org/structIndexInfo.html#a7723798af613a780d505231bad72c0a3>, 
and indexkeys 
<http://doxygen.postgresql.org/structIndexOptInfo.html#ab95b0da6ff1e527506239dab1a480b82>. 
With my patch an index can have key and non-key (included) columns. 
Therefore indkey is not just the key now: it contains both key and 
included columns, and its length is not indnkeyatts as one might think, 
but indnatts. I suppose it's worth renaming it somehow, but I can't 
find a proper name. Do you have any ideas?


If you see any other related changes needed, please let me know.

1. 
http://www.postgresql.org/message-id/flat/56168952.4010...@postgrespro.ru#56168952.4010...@postgrespro.ru


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-02-02 Thread Anastasia Lubennikova



29.01.2016 20:43, Thom Brown:

On 29 January 2016 at 16:50, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

29.01.2016 19:01, Thom Brown:

On 29 January 2016 at 15:47, Aleksander Alekseev
<a.aleks...@postgrespro.ru>  wrote:

I tested this patch on x64 and ARM servers for a few hours today. The
only problem I could find is that INSERT works considerably slower after
applying a patch. Beside that everything looks fine - no crashes, tests
pass, memory doesn't seem to leak, etc.

Thank you for testing. I rechecked that, and insertions are really very very
very slow. It seems like a bug.


Okay, now for some badness.  I've restored a database containing 2
tables, one 318MB, another 24kB.  The 318MB table contains 5 million
rows with a sequential id column.  I get a problem if I try to delete
many rows from it:
# delete from contacts where id % 3 != 0 ;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory

I didn't manage to reproduce this. Thom, could you describe exact steps
to reproduce this issue please?

Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
-r0), which creates an instance with a custom config, which is as
follows:

shared_buffers = 8MB
max_connections = 7
wal_level = 'hot_standby'
cluster_name = 'primary'
max_wal_senders = 3
wal_keep_segments = 6

Then create a pgbench data set (I didn't originally use pgbench, but
you can get the same results with it):

createdb -p 5530 pgbench
pgbench -p 5530 -i -s 100 pgbench

And delete some stuff:

thom@swift:~/Development/test$ psql -p 5530 pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.


   ➤ psql://thom@[local]:5530/pgbench

# DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
...
WARNING:  out of shared memory
WARNING:  out of shared memory
DELETE 667
Time: 22218.804 ms

There were 358 lines of that warning message.  I don't get these
messages without the patch.

Thom

Thank you for this report.
I tried to reproduce it, but I couldn't. Debug will be much easier now.

I hope I'll fix these issues within the next few days.

BTW, I found a silly mistake: the previous patch contained some unrelated
changes. I fixed it in the new version (attached).

Thanks.  Well I've tested this latest patch, and the warnings are no
longer generated.  However, the index sizes show that the patch
doesn't seem to be doing its job, so I'm wondering if you removed too
much from it.


Huh, this patch seems to be enchanted :) It works fine for me. Did you 
perform "make distclean"?

Anyway, I'll send a new version soon.
I'm just writing to say that I haven't disappeared and I do remember 
the issue.
I even almost fixed the insert-speed problem. But I'm very busy 
this week.

I'll send an updated patch next week as soon as possible.

Thank you for attention to this work.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-02-02 Thread Anastasia Lubennikova

31.01.2016 11:04, David Rowley:

On 27 January 2016 at 03:35, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

including_columns_3.0 is the latest version of patch.
And changes regarding the previous version are attached in a separate patch.
Just to ease the review and debug.

Hi,

I've made another pass over the patch. There's still a couple of
things that I think need to be looked at.

Thank you again.
I'm just writing to say that I haven't disappeared and I do remember 
the issue.
But I'm very busy this week. I'll send an updated patch next week 
as soon as possible.



Do we need the "b (included)" here? The key is (a) = (1). Having
irrelevant details might be confusing.

postgres=# create table a (a int not null, b int not null);
CREATE TABLE
postgres=# create unique index on a (a) including(b);
CREATE INDEX
postgres=# insert into a values(1,1);
INSERT 0 1
postgres=# insert into a values(1,1);
ERROR:  duplicate key value violates unique constraint "a_a_b_idx"
DETAIL:  Key (a, b (included))=(1, 1) already exists.
I thought that it could be strange if a user inserts two values and then 
sees only one of them in the error message.
But now I see that you're right. I'll also look at the same functionality 
in other DBs and fix it.



In index_reform_tuple() I find it a bit scary that you change the
TupleDesc's number of attributes then set it back again once you're
finished reforming the shortened tuple.
Maybe it would be better to modify index_form_tuple() to accept a new
argument with a number of attributes, then you can just Assert that
this number is never higher than the number of attributes in the
TupleDesc.

Good point.
I agree that this function is a bit strange. I have to set 
tupdesc->natts to support compatibility with index_form_tuple().
I didn't want to add either a new field to TupleDesc or a new 
parameter to index_form_tuple(), because they are used widely.

I'm also not that keen on index_reform_tuple() in general. I wonder if
there's a way we can just keep the Datum/isnull arrays a bit longer,
and only form the tuple when needed. I've not looked into this in
detail, but it does look like reforming the tuple is not going to be
cheap.
It is used in splits, for example. There is no datum array; we just move 
the tuple key from a child page to a parent page, or something like that.

And according to the INCLUDING algorithm we need to truncate non-key attributes.

If we do need to keep this function, I think a better name might be
index_trim_tuple() and I don't think you need to pass the original
length. It might make sense to Assert() that the trim length is
smaller than the tuple size


As regards performance, I don't think that it's a big problem here.
Do you suggest doing it in the following way: memcpy(oldtup, newtup, 
newtuplength)?

I will

in gistrescan() there is some code:

for (attno = 1; attno <= natts; attno++)
{
TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
   scan->indexRelation->rd_opcintype[attno - 1],
   -1, 0);
}

Going by RelationInitIndexAccessInfo() rd_opcintype[] is allocated to
be sized by the number of key columns, but this loop goes over the
number of attribute columns.
Perhaps this is not a big problem since GIST does not support
INCLUDING columns, but it does seem wrong still.


GiST doesn't support the INCLUDING clause, so natts and nkeyatts are always 
equal. I don't see any problem here.
And I think that it's extra work for this patch. Maybe I or someone 
else will add this feature to other access methods later.

Which brings me to the fact that I've spent a bit of time trying to
look for places where you've forgotten to change natts to nkeyatts. I
did find this one, but I don't have much confidence that there's not
lots more places that have been forgotten. Apart from this one, how
confident are you that you've found all the places? I'm getting
towards being happy with the code that I see that's been changed, but
I'm hesitant to mark as "Ready for committer" due to not being all
that comfortable that all the code that needs to be updated has been
updated. I'm not quite sure of a good way to find all these places.
I found all mentions of natts and other related variables with grep, and 
replaced (or expanded) them with nkeyatts where necessary.

As mentioned before, I didn't change other AMs.
I strongly agree that any changes related to btree require thorough 
inspection, so I'll recheck it again. But I'm almost sure that it's okay.



I wonder whether hacking the code so that each btree index which is
created with > 1 column puts all but the first column into the
INCLUDING columns, then running the regression tests to see if there are
any crashes, would help. I'm really not sure how else to increase the
confidence levels on this. Do you have ideas?


Do I understand correctly that you suggest replacing all multicolumn 
indexes with (1 key column) + included?
I don't think it's a good idea.

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Anastasia Lubennikova

28.01.2016 20:03, Thom Brown:
On 28 January 2016 at 16:12, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru> wrote:



28.01.2016 18:12, Thom Brown:


On 28 January 2016 at 14:06, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:


    31.08.2015 10:41, Anastasia Lubennikova:

Hi, hackers!
I'm going to begin work on effective storage of duplicate
keys in B-tree index.
The main idea is to implement posting lists and posting
trees for B-tree index pages as it's already done for GIN.

In a nutshell, effective storing of duplicates in GIN is
organised as follows.
Index stores single index tuple for each unique key. That
index tuple points to posting list which contains pointers
to heap tuples (TIDs). If too many rows have the same key,
multiple pages are allocated for the TIDs and these
constitute a so-called posting tree.
You can find wonderful detailed descriptions in gin readme

<https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README>
and articles <http://www.cybertec.at/gin-just-an-index-type/>.
It also makes possible to apply compression algorithm to
posting list/tree and significantly decrease index size.
Read more in presentation (part 1)
<http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.

Now new B-tree index tuple must be inserted for each table
row that we index.
It can possibly cause page split. Because of MVCC even
unique index could contain duplicates.
Storing duplicates in posting list/tree helps to avoid
superfluous splits.


I'd like to share the progress of my work. So here is a WIP
patch.
It provides effective duplicate handling using posting lists
the same way as GIN does it.

Layout of the tuples on the page is changed in the following way:
before:
TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) +
key, TID (ip_blkid, ip_posid) + key
with patch:
TID (N item pointers, posting list offset) + key, TID
(ip_blkid, ip_posid), TID (ip_blkid, ip_posid), TID
(ip_blkid, ip_posid)

It seems that backward compatibility works well without any
changes. But I haven't tested it properly yet.

Here are some test results. They are obtained by test
functions test_btbuild and test_ginbuild, which you can find
in attached sql file.
i - number of distinct values in the index. So i=1 means that
all rows have the same key, and i=1000 means that all
keys are different.
The other columns contain the index size (MB).

i   B-tree Old  B-tree New  GIN
1   214,234375  87,7109375  10,2109375
10  214,234375  87,7109375  10,71875
100 214,234375  87,4375 15,640625
1000    214,234375  86,2578125  31,296875
1   214,234375  78,421875   104,3046875
10  214,234375  65,359375   49,078125
100 214,234375  90,140625   106,8203125
1000    214,234375  214,234375  534,0625


You can note that the last row contains the same index sizes
for B-tree, which is quite logical: there is no compression
if all the keys are distinct.
The other cases look really nice to me.
Next thing to say is that I haven't implemented posting list
compression yet, so there is still potential to decrease the
size of the compressed btree.

I'm almost sure there are still some tiny bugs and missing
functions, but on the whole, the patch is ready for testing.
I'd like to get feedback about testing the patch on some
real datasets. Any bug reports and suggestions are welcome.

Here are a couple of useful queries to inspect the data inside
the index pages:
create extension pageinspect;
select * from bt_metap('idx');
select bt.* from generate_series(1,1) as n, lateral
bt_page_stats('idx', n) as bt;
select n, bt.* from generate_series(1,1) as n, lateral
bt_page_items('idx', n) as bt;

And at last, the list of items I'm going to complete in the
near future:
1. Add a storage parameter 'enable_compression' for the btree
access method which specifies whether the index handles
duplicates. The default is 'off'.
2. Bring back microvacuum functionality for compressed indexes.
3. Improve insertion speed. Insertions became significantly
slower with compressed btree, which is obviously not what we
want.
4

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Anastasia Lubennikova

29.01.2016 19:01, Thom Brown:

On 29 January 2016 at 15:47, Aleksander Alekseev
<a.aleks...@postgrespro.ru> wrote:

I tested this patch on x64 and ARM servers for a few hours today. The
only problem I could find is that INSERT works considerably slower after
applying a patch. Beside that everything looks fine - no crashes, tests
pass, memory doesn't seem to leak, etc.
Thank you for testing. I rechecked that, and insertions are really very 
very very slow. It seems like a bug.

Okay, now for some badness.  I've restored a database containing 2
tables, one 318MB, another 24kB.  The 318MB table contains 5 million
rows with a sequential id column.  I get a problem if I try to delete
many rows from it:
# delete from contacts where id % 3 != 0 ;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory

I didn't manage to reproduce this. Thom, could you describe exact steps
to reproduce this issue please?

Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
-r0), which creates an instance with a custom config, which is as
follows:

shared_buffers = 8MB
max_connections = 7
wal_level = 'hot_standby'
cluster_name = 'primary'
max_wal_senders = 3
wal_keep_segments = 6

Then create a pgbench data set (I didn't originally use pgbench, but
you can get the same results with it):

createdb -p 5530 pgbench
pgbench -p 5530 -i -s 100 pgbench

And delete some stuff:

thom@swift:~/Development/test$ psql -p 5530 pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.


  ➤ psql://thom@[local]:5530/pgbench

# DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
...
WARNING:  out of shared memory
WARNING:  out of shared memory
DELETE 667
Time: 22218.804 ms

There were 358 lines of that warning message.  I don't get these
messages without the patch.

Thom


Thank you for this report.
I tried to reproduce it, but I couldn't. Debug will be much easier now.

I hope I'll fix these issues within the next few days.

BTW, I found a silly mistake: the previous patch contained some unrelated 
changes. I fixed it in the new version (attached).


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..3908cc1 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,7 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
 
+#include "catalog/catalog.h"
 
 typedef struct
 {
@@ -60,7 +61,8 @@ static void _bt_findinsertloc(Relation rel,
   ScanKey scankey,
   IndexTuple newtup,
   BTStack stack,
-  Relation heapRel);
+  Relation heapRel,
+  bool *updposting);
 static void _bt_insertonpg(Relation rel, Buffer buf, Buffer cbuf,
 			   BTStack stack,
 			   IndexTuple itup,
@@ -113,6 +115,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	bool updposting = false;
 
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
@@ -162,8 +165,9 @@ top:
 	{
 		TransactionId xwait;
 		uint32		speculativeToken;
+		bool fakeupdposting = false; /* Never update posting in unique index */
 
-		offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
+		offset = _bt_binsrch(rel, buf, natts, itup_scankey, false, &fakeupdposting);
 		xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey,
 checkUnique, &is_unique, &speculativeToken);
 
@@ -200,8 +204,54 @@ top:
 		CheckForSerializableConflictIn(rel, NULL, buf);
 		/* do the insertion */
		_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup,
-		  stack, heapRel);
-		_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
+		  stack, heapRel, &updposting);
+
+		if (IsSystemRelation(rel))
+			updposting = false;
+
+		/*
+		 * New tuple has the same key with tuple at the page.
+		 * Unite them into one posting.
+		 */
+		if (updposting)
+		{
+			Page		page;
+			IndexTuple olditup, newitup;
+			ItemPointerData *ipd;
+			int nipd;
+
+			page = BufferGetPage(buf);
+			olditup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+
+			if (BtreeTupleIsPosting(olditup))
+				nipd = BtreeGetNPosting(olditup);
+			else
+				nipd = 1;
+
+			ipd = palloc0(sizeof(ItemPointerData) * (nipd + 1));
+			/* copy item pointers from old tuple into ipd */
+			if (BtreeTupleIsPosting(olditup))
+				memcpy(ipd, BtreeGetPosting(olditup), sizeof(ItemPointerData) * nipd);
+			else
+				memcpy(ipd, olditup, sizeof(ItemPointerData));
+
+			/* add item pointer of the new tuple into ipd */
+			memcpy(ipd+nipd, itup, sizeof(ItemPointerData));
+
+			/*
+			 * Form posting tuple, 

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-28 Thread Anastasia Lubennikova

28.01.2016 18:12, Thom Brown:
On 28 January 2016 at 14:06, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru> wrote:



31.08.2015 10:41, Anastasia Lubennikova:

Hi, hackers!
I'm going to begin work on effective storage of duplicate keys in
B-tree index.
The main idea is to implement posting lists and posting trees for
B-tree index pages as it's already done for GIN.

In a nutshell, effective storing of duplicates in GIN is
organised as follows.
Index stores single index tuple for each unique key. That index
tuple points to posting list which contains pointers to heap
tuples (TIDs). If too many rows have the same key, multiple
pages are allocated for the TIDs and these constitute a
so-called posting tree.
You can find wonderful detailed descriptions in gin readme

<https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README>
and articles <http://www.cybertec.at/gin-just-an-index-type/>.
It also makes possible to apply compression algorithm to posting
list/tree and significantly decrease index size. Read more in
presentation (part 1)
<http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.

Now new B-tree index tuple must be inserted for each table row
that we index.
It can possibly cause page split. Because of MVCC even unique
index could contain duplicates.
Storing duplicates in posting list/tree helps to avoid
superfluous splits.


I'd like to share the progress of my work. So here is a WIP patch.
It provides effective duplicate handling using posting lists the
same way as GIN does it.

Layout of the tuples on the page is changed in the following way:
before:
TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key,
TID (ip_blkid, ip_posid) + key
with patch:
TID (N item pointers, posting list offset) + key, TID (ip_blkid,
ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid)

It seems that backward compatibility works well without any
changes. But I haven't tested it properly yet.

Here are some test results. They are obtained by test functions
test_btbuild and test_ginbuild, which you can find in attached sql
file.
i - number of distinct values in the index. So i=1 means that all
rows have the same key, and i=1000 means that all keys are
different.
The other columns contain the index size (MB).

i   B-tree Old  B-tree New  GIN
1   214,234375  87,7109375  10,2109375
10  214,234375  87,7109375  10,71875
100 214,234375  87,4375 15,640625
1000    214,234375  86,2578125  31,296875
1   214,234375  78,421875   104,3046875
10  214,234375  65,359375   49,078125
100 214,234375  90,140625   106,8203125
1000    214,234375  214,234375  534,0625


You can note that the last row contains the same index sizes for
B-tree, which is quite logical: there is no compression if all
the keys are distinct.
The other cases look really nice to me.
Next thing to say is that I haven't implemented posting list
compression yet. So there is still potential to decrease the size of
the compressed btree.

I'm almost sure there are still some tiny bugs and missing
functions, but on the whole, the patch is ready for testing.
I'd like to get feedback about testing the patch on some real
datasets. Any bug reports and suggestions are welcome.

Here are a couple of useful queries to inspect the data inside the
index pages:
create extension pageinspect;
select * from bt_metap('idx');
select bt.* from generate_series(1,1) as n, lateral
bt_page_stats('idx', n) as bt;
select n, bt.* from generate_series(1,1) as n, lateral
bt_page_items('idx', n) as bt;

And at last, the list of items I'm going to complete in the near
future:
1. Add a storage parameter 'enable_compression' for the btree access
method which specifies whether the index handles duplicates.
The default is 'off'.
2. Bring back microvacuum functionality for compressed indexes.
3. Improve insertion speed. Insertions became significantly slower
with compressed btree, which is obviously not what we want.
4. Clean the code and comments, add related documentation.


This doesn't apply cleanly against current git head. Have you caught 
up past commit 65c5fcd35?


Thank you for the notice. New patch is attached.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9673fe0..0c8e4fb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-28 Thread Anastasia Lubennikova

31.08.2015 10:41, Anastasia Lubennikova:

Hi, hackers!
I'm going to begin work on effective storage of duplicate keys in 
B-tree index.
The main idea is to implement posting lists and posting trees for 
B-tree index pages as it's already done for GIN.


In a nutshell, effective storage of duplicates in GIN is organised as 
follows.
The index stores a single index tuple for each unique key. That index 
tuple points to a posting list, which contains pointers to heap tuples 
(TIDs). If too many rows have the same key, multiple pages are 
allocated for the TIDs, and these constitute a so-called posting tree.
You can find wonderful detailed descriptions in gin readme 
<https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README> 
and articles <http://www.cybertec.at/gin-just-an-index-type/>.
It also makes it possible to apply a compression algorithm to the 
posting list/tree and significantly decrease the index size. Read more 
in the presentation (part 1) 
<http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.


Currently a new B-tree index tuple must be inserted for each table row 
that we index.
Each insertion can possibly cause a page split. Because of MVCC, even 
a unique index can contain duplicates.

Storing duplicates in a posting list/tree helps to avoid superfluous splits.


I'd like to share the progress of my work. So here is a WIP patch.
It provides effective duplicate handling using posting lists, the same 
way as GIN does.


Layout of the tuples on the page is changed in the following way:
before:
TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key, TID 
(ip_blkid, ip_posid) + key

with patch:
TID (N item pointers, posting list offset) + key, TID (ip_blkid, 
ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid)
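
To see why this layout pays off, here is a back-of-the-envelope sketch 
of the space arithmetic. The sizes used (an 8-byte index tuple header 
that embeds the 6-byte TID, an 8-byte key, bare 6-byte TIDs in the 
posting list) are illustrative assumptions only, and alignment padding 
and page overhead are ignored:

```c
#include <assert.h>

/* Assumed sizes for illustration only -- not taken from the patch. */
#define TUPLE_OVERHEAD 8	/* IndexTuple header incl. TID and t_info */
#define KEY_SIZE       8	/* one maxaligned key attribute */
#define TID_SIZE       6	/* a bare ItemPointerData */

/* old layout: one (header + key) index tuple per duplicate */
static long
bytes_without_posting(long ndups)
{
	return ndups * (TUPLE_OVERHEAD + KEY_SIZE);
}

/* new layout: one (header + key) tuple followed by a list of TIDs */
static long
bytes_with_posting(long ndups)
{
	return TUPLE_OVERHEAD + KEY_SIZE + ndups * TID_SIZE;
}
```

For 1000 duplicates of one key this gives 16000 bytes versus 6016 
bytes, roughly the order of shrinkage seen in the test results, while a 
key with a single TID pays a small overhead, which matches the 
observation that an all-distinct index does not compress.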


It seems that backward compatibility works well without any changes. But 
I haven't tested it properly yet.


Here are some test results. They were obtained by the test functions 
test_btbuild and test_ginbuild, which you can find in the attached SQL file.
i - number of distinct values in the index. So i=1 means that all rows 
have the same key, and i=1000 means that all keys are different.

The other columns contain the index size (MB).

i   B-tree Old  B-tree New  GIN
1   214,234375  87,7109375  10,2109375
10  214,234375  87,7109375  10,71875
100 214,234375  87,4375 15,640625
1000214,234375  86,2578125  31,296875
1   214,234375  78,421875   104,3046875
10  214,234375  65,359375   49,078125
100 214,234375  90,140625   106,8203125
1000214,234375  214,234375  534,0625


You can note that the last row contains the same index sizes for B-tree, 
which is quite logical - there is no compression if all the keys are 
distinct.

The other cases look really good to me.
The next thing to say is that I haven't implemented posting list 
compression yet, so there is still potential to decrease the size of 
the compressed btree further.


I'm almost sure there are still some small bugs and missing functions, 
but on the whole the patch is ready for testing.
I'd like to get feedback from testing the patch on some real 
datasets. Any bug reports and suggestions are welcome.


Here are a couple of useful queries to inspect the data inside the index 
pages:

create extension pageinspect;
select * from bt_metap('idx');
select bt.* from generate_series(1,1) as n, lateral bt_page_stats('idx', 
n) as bt;
select n, bt.* from generate_series(1,1) as n, lateral 
bt_page_items('idx', n) as bt;


And finally, the list of items I'm going to complete in the near future:
1. Add a storage parameter 'enable_compression' for the btree access 
method, which specifies whether the index handles duplicates. The 
default is 'off'.

2. Bring back microvacuum functionality for compressed indexes.
3. Improve insertion speed. Insertions became significantly slower with 
the compressed btree, which is obviously not what we want.

4. Clean the code and comments, add related documentation.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..3b61e8f 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,7 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
 
+#include "catalog/catalog.h"
 
 typedef struct
 {
@@ -60,7 +61,8 @@ static void _bt_findinsertloc(Relation rel,
   ScanKey scankey,
   IndexTuple newtup,
   BTStack stack,
-  Relation heapRel);
+  Relation heapRel,
+  bool *updposing);
 static void _bt_insertonpg(Relation rel, Buffer buf, Buffer cbuf,
 			   BTStack stack,
 			   IndexTuple itup,
@@ -113,6 +115,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	bool updposting = false;
 
 	/

Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-01-27 Thread Anastasia Lubennikova



22.01.2016 13:48, Aleksander Alekseev:

Then, this thread became too tangled. I think it's worth writing a
new message with the patch, the test script, some results and a brief
overview of how it really works. It will make further review
much easier.

Sure.

HASHHDR represents a hash table. It can be regular or partitioned. A
partitioned table is stored in shared memory and accessed by multiple
processes simultaneously. To prevent data corruption the hash table is
partitioned, and each process has to acquire a lock for the
corresponding partition before accessing data in it. The number of the
partition is determined by the lower bits of the key's hash value. The
trickiest part is that dynahash knows nothing about these locks --- all
locking is done on the calling side.

Since shared memory is pre-allocated and can't grow, we use a freeList
to allocate memory in a hash table. We also use the nentries field to
count the current number of elements in the hash table. Since the hash
table is used by multiple processes, these fields are protected by a
spinlock, which, as it turned out, can cause lock contention and create
a bottleneck.

After trying a few approaches I discovered that using one spinlock per
partition works quite well. Here are last benchmark results:

http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu

Note that the "no locks" solution can't be used because it doesn't guarantee
that all available memory will be used in some corner cases.

You can find a few more details and a test script in the first message
of this thread. If you have any other questions regarding this patch
please don't hesitate to ask.
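
For illustration, the partition selection described above can be
sketched as follows. The constant names mirror the PostgreSQL sources,
but this is a standalone sketch, not the actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of the key's hash value select one of the partition locks. */
#define LOG2_NUM_LOCK_PARTITIONS 4
#define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS)

static inline uint32_t
lock_hash_partition(uint32_t hashcode)
{
	/* NUM_LOCK_PARTITIONS is a power of two, so masking the low bits
	 * is equivalent to hashcode % NUM_LOCK_PARTITIONS. */
	return hashcode & (NUM_LOCK_PARTITIONS - 1);
}
```

Because the mask touches only the low bits, keys that hash to different
partitions can be locked independently, which is the whole point of the
per-partition spinlock approach.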

InitLocks
/*
 * Compute init/max size to request for lock hashtables.  Note these
 * calculations must agree with LockShmemSize!
 */

This comment certainly requires some changes.
BTW, could you explain why init_table_size was half of 
max_table_size?

That said, I don't see any problems with your changes.


-hctl->nentries = 0;
-hctl->freeList = NULL;

Why did you delete these two lines? I wonder if you should rewrite them 
instead?


+ * This particular number of partitions significantly reduces lock 
contention

+ * in partitioned hash tables, almost if partitioned tables didn't use any
+ * locking at all

As far as I understood, this number was obtained experimentally? Maybe 
you should note that in the comment.



And the last, but not least.

+nelem_alloc = ((int) nelem) / partitions_number;
+if (nelem_alloc == 0)
+nelem_alloc = 1;
+
+for (i = 0; i < partitions_number; i++)
+if (!element_alloc(hashp, nelem_alloc, i))
+ereport(ERROR,
+(errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));


It looks like you forgot to round the result of the division up.
For example, if you have nelem=25 and partitions_number=6, then
25 / 6 = 4, and you allocate only 24 elements, while 1 is lost.
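
A minimal sketch of the suggested fix, using the patch's variable names
(nelem_alloc, partitions_number), would divide rounding up so that no
element is lost; this is an illustration, not the patched code itself:

```c
#include <assert.h>

static long
nelem_per_partition(long nelem, long partitions_number)
{
	/* round up instead of down, so partitions_number * nelem_alloc
	 * always covers all nelem elements */
	long nelem_alloc = (nelem + partitions_number - 1) / partitions_number;

	return nelem_alloc > 0 ? nelem_alloc : 1;
}
```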

As for performance, I ran a test on my laptop. The patched version 
shows a small performance improvement.


pgbench -j 1 -c 1 -f pgbench.sql -T 300 postgres
base: 127 tps, patched: 145 tps

pgbench -j 8 -c 8 -f pgbench.sql -T 300 postgres
base: 247 tps, patched: 270 tps

But I don't have any big machine to run the tests on, where the patch 
is supposed to make a significant difference.

So I would rely on your results and those of the other reviewers.

Apart from the notes mentioned above, I suppose the patch is good 
enough to pass on to a reviewer.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-26 Thread Anastasia Lubennikova

25.01.2016 03:32, Jeff Janes:

On Fri, Jan 22, 2016 at 7:19 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

Done. I hope that my patch is close to the commit too.


Thanks for the update.

I've run into this problem:

create table foobar (x text, w text);
create unique index foobar_pkey on foobar (x) including (w);
alter table foobar add constraint foobar_pkey primary key using index
foobar_pkey;

ERROR:  index "foobar_pkey" does not have default sorting behavior
LINE 1: alter table foobar add constraint foobar_pkey primary key us...
^
DETAIL:  Cannot create a primary key or unique constraint using such an index.
Time: 1.577 ms


If I instead define the table as
create table foobar (x int, w xml);

Then I can create the index and then the primary key the first time I
do this in a session.  But then if I drop the table and repeat the
process, I get "does not have default sorting behavior" error even for
this index that previously succeeded, so I think there is some kind of
problem with the backend syscache or catcache.

create table foobar (x int, w xml);
create unique index foobar_pkey on foobar (x) including (w);
alter table foobar add constraint foobar_pkey primary key using index
foobar_pkey;
drop table foobar ;
create table foobar (x int, w xml);
create unique index foobar_pkey on foobar (x) including (w);
alter table foobar add constraint foobar_pkey primary key using index
foobar_pkey;
ERROR:  index "foobar_pkey" does not have default sorting behavior
LINE 1: alter table foobar add constraint foobar_pkey primary key us...
^
DETAIL:  Cannot create a primary key or unique constraint using such an index.


Great, I've fixed that. Thank you for the tip about the cache.

I've also found and fixed a related bug in copying tables with indexes:
create table tbl2 (like tbl including all);
And there's one more tiny fix in get_pkey_attnames in the dblink module.

including_columns_3.0 is the latest version of the patch.
The changes relative to the previous version are attached as a separate 
patch, just to ease review and debugging.


I've changed the size of the pg_index.indclass array; it contains 
indnkeyatts elements now, while pg_index.indkey still contains all 
attributes. As a result, this query, Retrieve primary key columns 
<https://wiki.postgresql.org/wiki/Retrieve_primary_key_columns>, gives 
a pretty non-obvious result. Is that normal behavior here, or are some 
changes required? Do you know any similar queries?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..ef6aee5 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2058,7 +2058,7 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
+			*numatts = index->indnkeyatts;
 			if (*numatts > 0)
 			{
 result = (char **) palloc(*numatts * sizeof(char *));
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 412c845..c201f8b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3515,6 +3515,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 5f7befb..aaed5a7 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -111,6 +111,8 @@ typedef struct IndexAmRoutine
 boolamclusterable;
 /* does AM handle predicate locks? */
 boolampredlocks;
+/* does AM support columns included with clause INCLUDING? */
+boolamcaninclude;
 /* type of data stored in index, or InvalidOid if variable */
 Oid amkeytype;
 
@@ -852,7 +854,8 @@ amrestrpos (IndexScanDesc scan);
using unique indexes, which are indexes that disallow
multiple entries with identical keys.  An access method that supports this
feature sets amcanunique true.
-   (At present, only b-tree supports it.)
+   (At present, only b-tree supports it.) Columns which are present in the
+   INCLUDING clause are not used to enforce uniqueness.
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 23bbec6..09d4e6b 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -633,7 +633,8 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
Indexes can also be used to enforce uniqueness of a column's value,
or the uniqueness of the combined values of more than one column.
 
-CREATE UNIQUE INDEX name ON table (column , ...);
+CREATE UNIQUE INDEX name ON ta

Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-01-21 Thread Anastasia Lubennikova

30.12.2015 16:01, Aleksander Alekseev:

Here is a clean version of the patch. Step 1 is an optimization. Step 2
refactors this:

  HTAB *
  ShmemInitHash(const char *name, /* table string name for shmem index */
- long init_size,   /* initial table size */
+ long init_size,   /* initial table size XXX ignored,
  refactor! */


Hi, I found that the patch was still not reviewed, so I've looked it over.
I haven't dived into this subject before, so my comments will be more 
general rather than specific to your investigation.
Sorry if some things seem like nitpicks; I just suppose that a clean 
patch is more convenient for reviewers.


First of all, why not merge both patches into one? They aren't too big 
anyway.
Also, this thread became too tangled. I think it's worth writing a new 
message with the patch, the test script, some results and a brief 
overview of how it really works. It will make further review much easier.


+   /* number of entries in hash table */
+   longnentries[NUM_LOCK_PARTITIONS];
+   /* linked list of free elements */
+   HASHELEMENT *freeList[NUM_LOCK_PARTITIONS];

I think these comments should be changed to be more informative.

+   if (IS_PARTITIONED(hashp->hctl))
+   partitions_number = NUM_LOCK_PARTITIONS;
+   else
+   partitions_number = 1;
+
+   nelem_alloc = ((int) nelem) / partitions_number;
+   if (nelem_alloc == 0)
+   nelem_alloc = 1;

Add a comment here too.

-/* Number of partitions of the shared buffer mapping hashtable */
-#define NUM_BUFFER_PARTITIONS  128
-
 /* Number of partitions the shared lock tables are divided into */
-#define LOG2_NUM_LOCK_PARTITIONS  4
+#define LOG2_NUM_LOCK_PARTITIONS  7
 #define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)

+/* Number of partitions of the shared buffer mapping hashtable */
+#define NUM_BUFFER_PARTITIONS NUM_LOCK_PARTITIONS
+

I'm sure it was discussed above, but these definitions do not look 
obvious at all.

Maybe you should explain this magic number 7 in the comment above?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Batch update of indexes

2016-01-21 Thread Anastasia Lubennikova


20.01.2016 17:55, Konstantin Knizhnik:

Hi,


Hi, I'm glad to see that you're interested in this too.
I think this is a good feature, and it will be very useful to 
have.
I have already mentioned some related problems and possible 
improvements in my presentation:
http://www.slideshare.net/AnastasiaLubennikova/indexes-dont-mean-slow-inserts 

The last two slides relate to this thread. Briefly, I've suggested 
thinking about an insertion buffer. Something very similar is already 
implemented in BRIN: it does not index the most recent heap data while 
the number of trailing pages is less than pages_per_range.


Do you mean GIN-like usage of an insertion buffer (there it is called 
the "pending list")?
So that we have to combine a search in the main tree with a search in 
the insert buffer?
Actually, this is what I wanted to avoid (because at least in the case 
of GIN the pending list causes significant performance degradation, 
while an up-to-date state of the full-text index is rarely required).



What I meant is more like a BRIN-like combination of an index scan and 
a heap scan.

Maybe it could be called "deferred inserts" or a "temporary read-only 
index". Maybe it's similar to MySQL's insert buffer: 
http://dev.mysql.com/doc/refman/5.7/en/innodb-insert-buffering.html

I think it'll be clearer with an example. Please don't mind the syntax.

CREATE TABLE tbl (c1 int);
CREATE INDEX idx on tbl(c1);

SET enable_deferred_insert(idx) = on;
At this moment, we save the last_indexed_item (its TID) somewhere in 
the index metapage.

From that moment on, data inserted into the table doesn't touch the 
index. We perform some heavy inserts and then go back to the normal 
index behavior.

SET enable_deferred_insert(idx) = off;
This command takes all the data between the last_indexed_item and the 
end of the table and inserts it into the index in one pass.
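
To make the bookkeeping concrete, here is a toy model of the scheme.
Everything in it (the ToyTable structure, the deferred flag standing in
for enable_deferred_insert, last_indexed_item modeled as nindexed) is
hypothetical; it only illustrates indexing the tail of the table in one
batch, not a real access method:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ROWS 1024

typedef struct
{
	int		heap[MAX_ROWS];		/* table rows, append-only */
	int		nheap;
	int		index[MAX_ROWS];	/* sorted copy of the indexed rows */
	int		nindexed;			/* rows up to last_indexed_item */
	int		deferred;			/* stand-in for enable_deferred_insert */
} ToyTable;

static int
cmp_int(const void *a, const void *b)
{
	return *(const int *) a - *(const int *) b;
}

static void
toy_insert(ToyTable *t, int val)
{
	t->heap[t->nheap++] = val;
	if (!t->deferred)
	{
		/* normal path: the index keeps up with every heap insert */
		t->index[t->nindexed++] = val;
		qsort(t->index, t->nindexed, sizeof(int), cmp_int);
	}
}

static void
toy_flush(ToyTable *t)
{
	/* batch path: index everything between last_indexed_item and
	 * the end of the table, then sort once */
	for (; t->nindexed < t->nheap; t->nindexed++)
		t->index[t->nindexed] = t->heap[t->nindexed];
	qsort(t->index, t->nindexed, sizeof(int), cmp_int);
	t->deferred = 0;
}
```

The point of the batch path is that the per-row index maintenance cost
is replaced by one bulk pass over the unindexed tail, at the price of
the index lagging behind the heap until the flush.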


Of course there are new problems to deal with, but it would be really 
useful for balancing an irregular heavy write load, wouldn't it?


BTW, could you explain what the reason is to copy data into the pending 
list and then copy it again while flushing the pending list into the 
index? Why not read the data directly from the table? I feel that I've 
missed something important here.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Batch update of indexes

2016-01-20 Thread Anastasia Lubennikova
tion plan while 
it is not needed in this case: search condition is exactly the same as 
partial index condition.

Optimal plan should be:

   Index Scan using idx1 on t (cost=0.00..4.13 rows=12263 width=0)
   Index Cond: (c1 < '10'::double precision)


There was a discussion of a patch for partial indexes:
http://postgresql.nabble.com/PATCH-index-only-scans-with-partial-indexes-td5857568.html
I haven't watched it closely, but it seems to be still open. I think 
it will be interesting to you.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-19 Thread Anastasia Lubennikova



18.01.2016 01:02, David Rowley writes:
On 14 January 2016 at 08:24, David Rowley 
<david.row...@2ndquadrant.com <mailto:david.row...@2ndquadrant.com>> 
wrote:


I will try to review the omit_opclass_4.0.patch soon.


Hi, as promised, here's my review of the omit_opclass_4.0.patch patch.


Thank you again. All the points mentioned are fixed and the patches 
are merged.
I hope it's all right now. Please check the comments one more time; I 
rather doubt that I wrote everything correctly.
Also this makes me think that the name ii_KeyAttrNumbers is now 
out-of-date, as it contains
the including columns too by the looks of it. Maybe it just needs to 
drop the "Key" and become
"ii_AttrNumbers". It would be interesting to hear what others think of 
that.


I'm also wondering if indexkeys is still a good name for the 
IndexOptInfo struct member.
Including columns are not really keys, but I feel renaming that might 
cause a fair bit of code churn, so I'd be interested to hear what 
other's have to say.


I agree that KeyAttrNumbers and indexkeys are somewhat confusing names, 
but I'd like to keep them at least in this patch.

It may be worth doing the "index structures refactoring" as a separate 
patch.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcaninclude
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3714,6 +3721,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..d01af17 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -767,7 +767,8 @@ amrestrpos (IndexScanDesc scan);
using unique indexes, which are indexes that disallow
multiple entries with identical keys.  An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   (At present, only b-tree supports it.) Columns included with clause
+   INCLUDING  aren't used to enforce uniqueness.
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 23bbec6..09d4e6b 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -633,7 +633,8 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
Indexes can also be used to enforce uniqueness of a column's value,
or the uniqueness of the combined values of more than one column.
 
-CREATE UNIQUE INDEX name ON table (column , ...);
+CREATE UNIQUE INDEX name ON table (column , ...)
+INCLUDING (column , ...);
 
Currently, only B-tree indexes can be declared unique.
   
@@ -642,7 +643,9 @@ CREATE UNIQUE INDEX name ON tableINCLUDING aren't used to enforce constraints (UNIQUE,
+   PRIMARY KEY, etc).
   
 
   
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ce36a1b..8360bb6 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 
 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
 ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
+[ INCLUDING ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
 [ WITH ( storage_parameter = value [, ... ] ) ]
 [ TABLESPACE tablespace_name ]
 [ WHERE predicate ]
@@ -139,6 +140,32 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 
  
+  INCLUDING
+  
+   
+An optional INCLUDING clause allows a list of columns to be
+specified which will be included in the index, in the non-key portion of
+the index. Columns which are part of this clause cannot also exist in the
+key columns portion of the index, and vice versa. The
+INCLUDING columns exist solely to allow more queries to benefit
+from index-only scans by including certain columns in the
+index, the value of which would otherwise have to be obtained by reading
+the table's heap. Having these columns in the INCLUDING clause
+in some cases allows PostgreSQL to skip the heap read
+completely. This also allows UNIQUE indexes to be defined on
+one set of columns, which can include another set of co

Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-19 Thread Anastasia Lubennikova



12.01.2016 20:47, Jeff Janes:

It looks like the "covering" patch, with or without the "omit_opclass"
patch, does not support expressions as included columns:

create table foobar (x text, y xml);
create index on foobar (x) including  (md5(x));
ERROR:  unrecognized node type: 904
create index on foobar (x) including  ((y::text));
ERROR:  unrecognized node type: 911

I think we would probably want it to work with those (or at least to
throw a better error message).
Thank you for the notice. I couldn't fix it quickly, so I added a stub 
in the latest patch.

But I'll try to fix it and add expression support a bit later.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-13 Thread Anastasia Lubennikova



13.01.2016 04:47, David Rowley :
On 13 January 2016 at 06:47, Jeff Janes <jeff.ja...@gmail.com 
<mailto:jeff.ja...@gmail.com>> wrote:



Why is omit_opclass a separate patch?  If the included columns now
never participate in the index ordering, shouldn't it be an inherent
property of the main patch that you can "cover" things without btree
opclasses?


I don't personally think the covering_unique_4.0.patch is that close 
to being too big to review, I think things would make more sense of 
the omit_opclass_4.0.patch was included together with this.




I agree that these patches should be merged. It'll be fixed in the 
next update.
I kept them separate only for historical reasons; it was more 
convenient for me to debug them. Furthermore, I wanted to show some 
performance degradation caused by "omit_opclass" and give a way to 
reproduce it by running the test with and without the patch.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-13 Thread Anastasia Lubennikova

13.01.2016 04:27, David Rowley:

I've also done some testing:

create table ab (a int, b int);
insert into ab select a,b from generate_Series(1,10) a(a), 
generate_series(1,1) b(b);

set enable_bitmapscan=off;
set enable_indexscan=off;

select * from ab where a = 1 and b=1;
 a | b
---+---
 1 | 1
(1 row)

set enable_indexscan = on;
select * from ab where a = 1 and b=1;
 a | b
---+---
(0 rows)

This is broken. I've not looked into why yet, but from looking at the 
EXPLAIN output I was a bit surprised to see b=1 as an index condition. 
I'd have expected a Filter maybe, but I've not looked at the EXPLAIN 
code to see how those are determined yet.


Hmm... Are you using both patches?
And could you provide the index definition? I can't reproduce the 
problem assuming that the index is created by the statement

CREATE INDEX idx ON ab (a) INCLUDING (b);

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-12 Thread Anastasia Lubennikova


04.01.2016 11:49, David Rowley:
On 2 December 2015 at 01:53, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> 
wrote:


Finally, completed patch "covering_unique_3.0.patch" is here.
It includes the functionality discussed above in the thread,
regression tests and docs update.
I think it's quite ready for review.


Hi Anastasia,

I've maybe mentioned before that I think this is a great feature and I 
think it will be very useful to have, so I've signed up to review the 
patch, and below is the results of my first pass from reading the 
code. Apologies if some of the things seem like nitpicks, I've 
basically just listed everything I've noticed during, no matter how small.


First of all, I would like to thank you for writing such a detailed review.
All the style problems, comments and typos you mentioned are fixed in 
patch v4.0.
+   An access method that supports this feature sets 
pg_am.amcanincluding true.


I don't think this belongs under the "Index Uniqueness Checks" title. 
I think the "Columns included with clause INCLUDING  aren't used to 
enforce uniqueness." that you've added before it is a good idea, but 
perhaps the details of amcanincluding are best explained elsewhere.

agree


+This clause specifies additional columns to be appended to 
the set of index columns.
+Included columns don't support any constraints 
(UNIQUE, PRMARY KEY, EXCLUSION CONSTRAINT).
+These columns can improve the performance of some queries 
 through using advantages of index-only scan
+(Or so called covering indexes. 
Covering index is the index that
+covers all columns required in the query and prevents a table 
access).
+Besides that, included attributes are not stored in index 
inner pages.
+It allows to decrease index size and furthermore it provides 
a way to extend included
+columns to store atttributes without suitable opclass (not 
implemented yet).
+This clause could be applied to both unique and nonunique 
indexes.
+It's possible to have non-unique covering index, which 
behaves as a regular

+multi-column index with a bit smaller index-size.
+Currently, only the B-tree access method supports this feature.

"PRMARY KEY" should be "PRIMARY KEY". I ended up rewriting this 
paragraph as follows.


"An optional INCLUDING clause allows a list of columns to 
be specified which will be included in the index, in the non-key 
portion of the index. Columns which are part of this clause cannot 
also exist in the indexed columns portion of the index, and vice 
versa. The INCLUDING columns exist solely to allow more 
queries to benefit from index only scans by including 
certain columns in the index, the value of which would otherwise have 
to be obtained by reading the table's heap. Having these columns in 
the INCLUDING clause in some cases allows 
PostgreSQL to skip the heap read completely. This also 
allows UNIQUE indexes to be defined on one set of columns, 
which can include another set of column in the INCLUDING 
clause, on which the uniqueness is not enforced upon. This can also be 
useful for non-unique indexes as any columns which are not required 
for the searching or ordering of records can defined in the 
INCLUDING clause, which can often reduce the size of the 
index."


Maybe not perfect, but maybe it's an improvement?



Yes, this explanation is much better. I've just added a couple of notes.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcaninclude
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3714,6 +3721,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..a102391 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,10 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included wit

Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-12 Thread Anastasia Lubennikova
for the data type.

create index on tbl (a) including (b);
CREATE INDEX

This functionality is provided by the attached patch "omit_opclass_4.0", 
which must be applied over covering_unique_4.0.patch.


I see what you were confused about; I had the same question at the 
very beginning of the discussion of this patch.
Now it seems a bit clearer to me. INCLUDING columns are not used for 
the searching or ordering of records, so there is no need to check 
whether they have an opclass. INCLUDING columns perform as expected, 
and this agrees with the experience of other databases. And this part 
of the patch is complete.


But it definitely isn't perfect... I found a test case to explain that. 
See below.
That's why we need the optional_opclass functionality, which will use an 
opclass where possible and omit it in other cases.
This idea has already been described in the message Re: [PROPOSAL] 
Covering + unique indexes 
<http://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru> as 
"partially unique index".
I suggest separating the optional_opclass task to ease the syntax discussion 
and subsequent review. I'll implement it in the next patch a bit later.


Test case:
1) patch covering_unique_4.0 + test_covering_unique_4.0
If included columns' opclasses are used, the new query plan is the same as 
the old one, and both have nearly the same execution time:

 QUERY PLAN

 Index Only Scan using oldcoveringidx on oldt  (cost=0.43..301.72 
rows=1 width=8) (actual time=0.021..0.676 rows=6 loops=1)

   Index Cond: ((c1 < 1) AND (c3 < 20))
   Heap Fetches: 0
 Planning time: 0.101 ms
 Execution time: 0.697 ms
(5 rows)

 QUERY PLAN

 Index Only Scan using newidx on newt  (cost=0.43..276.51 rows=1 
width=8) (actual time=0.020..0.665 rows=6 loops=1)

   Index Cond: ((c1 < 1) AND (c3 < 20))
   Heap Fetches: 0
 Planning time: 0.082 ms
 Execution time: 0.687 ms
(5 rows)

2) patch covering_unique_4.0 + patch omit_opclass_4.0 + 
test_covering_unique_4.0
Otherwise, the new query cannot use the included column in the Index Cond 
and uses a filter instead, which slows down the query significantly.

 QUERY PLAN

 Index Only Scan using oldcoveringidx on oldt  (cost=0.43..230.39 
rows=1 width=8) (actual time=0.021..0.722 rows=6 loops=1)

   Index Cond: ((c1 < 1) AND (c3 < 20))
   Heap Fetches: 0
 Planning time: 0.091 ms
 Execution time: 0.744 ms
(5 rows)

 QUERY PLAN

 Index Only Scan using newidx on newt  (cost=0.43..374.68 rows=1 
width=8) (actual time=0.018..2.595 rows=6 loops=1)

   Index Cond: (c1 < 1)
   Filter: (c3 < 20)
   Rows Removed by Filter: 9993
   Heap Fetches: 0
 Planning time: 0.078 ms
 Execution time: 2.612 ms

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcaninclude
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3714,6 +3721,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..a102391 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,10 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included with clause INCLUDING  aren't used to enforce uniqueness.
+   (At present, only b-tree supports them.)
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 23bbec6..44dddb6 100644

Re: [HACKERS] WIP: Covering + unique indexes.

2015-12-04 Thread Anastasia Lubennikova



03.12.2015 04:03, Robert Haas wrote:

On Tue, Dec 1, 2015 at 7:53 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

If we don't need c4 as an index scankey, we don't need any btree opclass on
it.
But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan

The patch "optional_opclass" completely ignores opclasses of included
attributes.

OK, I don't get it.  Why have an opclass here at all, even optionally?


We have no opclass on c4 and there's no need for one.
But currently (without the patch) it's impossible to create a covering 
index that contains columns with no btree opclass.


test=# create index on tbl using btree (c1, c4);
ERROR:  data type box has no default operator class for access method 
"btree"


ComputeIndexAttrs() processes the list of index attributes, trying to 
get an opclass for each of them via GetIndexOpClass().
The patch drops this check for included attributes, making it possible 
to store any datatype in a btree and take advantage of IndexOnlyScan.


I hope that this helps to clarify.
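The before-and-after effect can be sketched as follows (a hedged sketch
reconstructed from the examples earlier in this thread; the error text is
the one quoted above):

```sql
CREATE TABLE tbl (c1 int, c4 box);

-- Without the patch, any btree index touching c4 is rejected, because the
-- opclass check in ComputeIndexAttrs() applies to every index column:
--   CREATE INDEX ON tbl USING btree (c1, c4);
--   ERROR:  data type box has no default operator class for access method "btree"

-- With the patch, c4 is allowed as an included column, since the opclass
-- check is skipped for INCLUDING attributes:
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

-- c4 can then be returned by an IndexOnlyScan:
SELECT c4 FROM tbl WHERE c1 = 1000;
```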

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2015-12-01 Thread Anastasia Lubennikova

Finally, the completed patch "covering_unique_3.0.patch" is here.
It includes the functionality discussed above in the thread, regression 
tests, and a docs update.

I think it's quite ready for review.

_Future work:_
Besides that, I'd like to get feedback on the attached patch 
"optional_opclass_3.0.patch".

It should be applied on top of "covering_unique_3.0.patch".

Actually, this patch is the first step toward making opclasses for 
"included" columns optional

and implementing real covering indexes.

Example:
CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass 
on it.

But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan

The patch "optional_opclass" completely ignores opclasses of included 
attributes.

To see the difference, look at the explain analyze output:

explain analyze select * from tbl where c1=2 and c4 && box '(0,0,1,1)';
  QUERY PLAN
---
 Index Only Scan using idx on tbl  (cost=0.13..4.15 rows=1 width=36) 
(actual time=0.010..0.013 rows=1 loops=1)

   Index Cond: (c1 = 2)
   Filter: (c4 && '(1,1),(0,0)'::box)

"Index Cond" shows the index ScanKey conditions, while "Filter" is for 
conditions applied after the index scan. Even so, it is faster than the 
SeqScan we had before, because IndexOnlyScan avoids extra heap fetches.


As I already said, this patch is just a WIP, so the included opclass is 
not "optional" but actually "ignored".
The following example works worse than without the patch; please don't 
worry about it.


CREATE TABLE tbl2 (c1 int, c2 int);
CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2);
explain analyze select * from tbl2 where c1<20 and c2<5;
  QUERY PLAN
---
 Index Only Scan using idx2 on tbl2  (cost=0.28..4.68 rows=10 width=8) 
(actual time=0.055..0.066 rows=9 loops=1)

   Index Cond: (c1 < 20)
   Filter: (c2 < 5)

The question is more about suitable syntax.
We have two different optimizations here:
1. INCLUDED columns
2. Optional opclasses
It's logical to provide optional opclasses only for included columns.
Is it OK to handle both with the same syntax and resolve all opclass 
conflicts at CREATE INDEX time?


CREATE TABLE tbl2 (c1 int, c2 int, c4 box);
CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2, c4);
CREATE UNIQUE INDEX idx3 ON tbl2 USING btree (c1) INCLUDING (c4, c2);

Of course, the order of attributes is important.
Attributes that have an opclass and should use it in a ScanKey must be 
placed before the others.
idx2 will use c2 in the Index Cond, while idx3 will not. But I think 
that's a job for the DBA.


If you see any related changes needed in the planner, please mention them. I 
haven't explored that part of the code yet and could have missed something.


--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index b4ea227..4973e1b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcanincluding
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3704,6 +3711,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..0287c62 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,11 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included with clause INCLUDING  aren't used to enforce uniqueness.
+   An access method that supports this feature sets pg_am.amcanincluding true.
+   (At present, only b-tree supports them.)
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/i

Re: [HACKERS] WIP: Covering + unique indexes.

2015-10-09 Thread Anastasia Lubennikova



08.10.2015 19:31, Thom Brown wrote:

On 8 October 2015 at 16:18, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

Hi hackers,

I'm working on a patch that allows combining covering and unique
functionality for btree indexes.

Previous discussion was here:
1) Proposal thread
2) Message with proposal clarification

In a nutshell, the feature allows creating an index with "key" columns and
"included" columns.
"key" columns can be used as scan keys. The unique constraint relates only to
"key" columns.
"included" columns may be used as scan keys if they have a suitable opclass.
Both "key" and "included" columns can be returned from index by
IndexOnlyScan.

Btree is the default index and it's used everywhere, so it requires proper
testing.  Volunteers are welcome)

Use case:
- We have a table (c1, c2, c3, c4);
- We need to have an unique index on (c1, c2).
- We would like to have a covering index on all columns to avoid reading
heap pages.

Old way:
CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);

What's wrong?
The two indexes contain repeated data, adding overhead to data manipulation
operations and database size.

New way:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

The patch is attached.
In 'test.sql' you can find a test with detailed comments on each step, and
comparison of old and new indexes.

The new feature has the following syntax:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
The INCLUDING keyword defines the "included" columns of the index. These
columns are not part of the unique constraint.
They are also not stored in index inner pages, which allows the index size
to be decreased.

Results:
1) Additional covering index is not required anymore.
2) New index can use IndexOnlyScan on queries, where old index can't.

For example,
explain analyze select c1, c2 from newt where c1<1 and c3<20;

*more examples in 'test.sql'

Future work:
Make opclasses for "included" columns optional.

CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass on
it.
But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000;
SELECT * FROM tbl WHERE c1=1000;

The definition output needs a space after "INCLUDING":

# SELECT 
pg_get_indexdef('people_first_name_last_name_email_idx'::regclass::oid);
  pg_get_indexdef
--
  CREATE UNIQUE INDEX people_first_name_last_name_email_idx ON people
USING btree (first_name, last_name) INCLUDING(email)
(1 row)


There is also no collation output:

# CREATE UNIQUE INDEX test_idx ON people (first_name COLLATE "en_GB",
last_name) INCLUDING (email COLLATE "pl_PL");
CREATE INDEX

# SELECT pg_get_indexdef('test_idx'::regclass::oid);
pg_get_indexdef
-
  CREATE UNIQUE INDEX test_idx ON people USING btree (first_name
COLLATE "en_GB", last_name) INCLUDING(email)
(1 row)


As for functioning, it works as described:

# EXPLAIN SELECT email FROM people WHERE (first_name,last_name) =
('Paul','Freeman');
 QUERY PLAN
--
  Index Only Scan using people_first_name_last_name_email_idx on people
  (cost=0.28..1.40 rows=1 width=21)
Index Cond: ((first_name = 'Paul'::text) AND (last_name = 'Freeman'::text))
(2 rows)


Typo:

"included columns must not intersects with key columns"

should be:

"included columns must not intersect with key columns"


Thank you for testing. The mentioned issues are fixed.


One thing I've noticed you can do with your patch, which you haven't
mentioned, is have a non-unique covering index:

# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);
CREATE INDEX

# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul';
QUERY PLAN
-
  Index Only Scan using covering_idx on people  (cost=0.28..1.44 rows=4 
width=13)
Index Cond: (first_name = 'Paul'::text)
(2 rows)

But this appears to behave as if it were a regular multi-column index,
in that it will use the index for ordering rather than sort after
fetching from the index.  So is this really stored the same as a
multi-column index?  The index sizes aren't identical, so some

[HACKERS] WIP: Covering + unique indexes.

2015-10-08 Thread Anastasia Lubennikova


Hi hackers,

I'm working on a patch that allows combining covering and unique 
functionality for btree indexes.


_Previous discussion was here:_
1) Proposal thread 
<http://www.postgresql.org/message-id/55f2ccd0.7040...@postgrespro.ru>
2) Message with proposal clarification 
<http://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru>


In a nutshell, the feature allows creating an index with "key" columns and 
"included" columns.
"key" columns can be used as scan keys. The unique constraint relates only 
to "key" columns.

"included" columns may be used as scan keys if they have a suitable opclass.
Both "key" and "included" columns can be returned from index by 
IndexOnlyScan.


Btree is the default index and it's used everywhere, so it requires 
proper testing. Volunteers are welcome)


_Use case:_
- We have a table (c1, c2, c3, c4);
- We need to have an unique index on (c1, c2).
- We would like to have a covering index on all columns to avoid reading 
heap pages.


Old way:
CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);

What's wrong?
The two indexes contain repeated data, adding overhead to data manipulation 
operations and database size.


New way:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

The patch is attached.
In 'test.sql' you can find a test with detailed comments on each step, 
and comparison of old and new indexes.


The new feature has the following syntax:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
The INCLUDING keyword defines the "included" columns of the index. These 
columns are not part of the unique constraint.
They are also not stored in index inner pages, which allows the index 
size to be decreased.
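
To make the uniqueness behavior concrete, here is a hedged sketch reusing
the newt/newidx names above (the inline comments are illustrative; the exact
error message is not quoted from the patch):

```sql
-- Uniqueness is enforced only on the key columns (c1, c2);
-- the included columns (c3, c4) may repeat freely.
CREATE TABLE newt (c1 int, c2 int, c3 int, c4 int);
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

INSERT INTO newt VALUES (1, 1, 100, 100);  -- ok
INSERT INTO newt VALUES (1, 2, 100, 100);  -- ok: only (c1, c2) must be unique
INSERT INTO newt VALUES (1, 1, 200, 200);  -- fails: duplicate key (c1, c2)
```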


_Results:_
1) Additional covering index is not required anymore.
2) New index can use IndexOnlyScan on queries, where old index can't.

For example,
explain analyze select c1, c2 from newt where c1<1 and c3<20;

*more examples in 'test.sql'

_Future work:_
Make opclasses for "included" columns optional.

CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass 
on it.

But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000;
SELECT * FROM tbl WHERE c1=1000;

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company





test.sql
Description: application/sql
diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index dc588d7..83f24c3 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -19,6 +19,7 @@
 #include "access/heapam.h"
 #include "access/itup.h"
 #include "access/tuptoaster.h"
+#include "utils/rel.h"
 
 
 /* 
@@ -441,3 +442,30 @@ CopyIndexTuple(IndexTuple source)
 	memcpy(result, source, size);
 	return result;
 }
+
+/*
+ * Reform index tuple. Truncate nonkey (INCLUDED) attributes.
+ */
+IndexTuple
+index_reform_tuple(Relation idxrel, IndexTuple olditup, int natts, int nkeyatts)
+{
+	TupleDesc   itupdesc = RelationGetDescr(idxrel);
+	Datum   values[INDEX_MAX_KEYS];
+	boolisnull[INDEX_MAX_KEYS];
+	IndexTuple	newitup;
+
+	Assert(natts <= INDEX_MAX_KEYS);
+	Assert(nkeyatts > 0);
+	Assert(nkeyatts <= natts);
+
+	index_deform_tuple(olditup, itupdesc, values, isnull);
+
+	/* form new tuple that will contain only key attributes */
+	itupdesc->natts = nkeyatts;
+	newitup = index_form_tuple(itupdesc, values, isnull);
+	newitup->t_tid = olditup->t_tid;
+
+	itupdesc->natts = natts;
+
+	return newitup;
+}
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..d14df12 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -108,18 +108,23 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 			 IndexUniqueCheck checkUnique, Relation heapRel)
 {
 	bool		is_unique = false;
-	int			natts = rel->rd_rel->relnatts;
+	int			nkeyatts = rel->rd_rel->relnatts;
 	ScanKey		itup_scankey;
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
 
+	Assert (rel->rd_index != NULL);
+	Assert(rel->rd_index->indnatts != 0);
+	Assert(rel->rd_index->indnkeyatts != 0);
+	nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
 
 top:
 	/* find the first page containing this key */
-	stack = _bt_search(rel, natts, itup_scankey, false, , BT_WRITE);
+	stack = _bt_search(rel, nkeyatts, itup_scankey, false, , BT_WRITE);
 
 	offset = InvalidOffset

Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-09-16 Thread Anastasia Lubennikova



16.09.2015 07:30, Jeff Janes:


The commit of this patch seems to have created a bug in which updated 
tuples can disappear from the index, while remaining in the table.


It looks like the bug depends on going through a crash-recovery cycle, 
but I am not sure of that yet.


I've looked through the commit diff and don't see anything obviously 
wrong.  I notice index tuples are marked dead with only a buffer 
content share lock, and the page is defragmented with only a buffer 
exclusive lock (as opposed to a super-exclusive buffer clean up 
lock).  But as far as I can tell, both of those should be safe on an 
index.  Also, if that was the bug, it should happen without 
crash-recovery.


The test is pretty simple.  I create a 10,000 row table with a 
unique-by-construction id column with a btree_gist index on it and a 
counter column, and fire single-row updates of the counter for random 
ids in high concurrency (8 processes running flat out).  I force the 
server to crash frequently with simulated torn-page writes in which 
md.c writes a partial page and then PANICs.  Eventually (1 to 3 hours) 
the updates start indicating they updated 0 rows.  At that point, a 
forced table scan will find the row, but the index doesn't.


Any hints on how to proceed with debugging this?  If I can't get it to 
reproduce the problem in the absence of crash-recovery cycles with an 
overnight run, then I think my next step will be to run it over 
hot-standby and see if WAL replay in the absence of crashes might be 
broken as well.




I've found the bug. It's caused by the mixed calls of
PageIndexMultiDelete() in gistvacuumpage() and
PageIndexTupleDelete() in gistRedoPageUpdateRecord().
These two functions conflict.

I've fixed my patch by changing MultiDelete to TupleDelete in 
gistvacuumpage(). The patch is attached.
But it seems to me that it would be better to rewrite all uses of 
TupleDelete to MultiDelete in the gist code.

I'm working on it.
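
The offset adjustment in the fixed gistvacuumpage() loop can be illustrated
with a small standalone sketch (tuple_delete and vacuum_demo are my own toy
names, not PostgreSQL functions): each single-tuple deletion compacts the
line pointer array, so later precomputed offsets must be reduced by the
number of tuples already queued, exactly as in
`deletable[ndeletable] = offnum - ndeletable`:

```c
#include <string.h>

/* Toy model of PageIndexTupleDelete(): removing the item at 1-based
 * offset `off` shifts every later item down by one, the way deleting
 * a single line pointer compacts the array on a page. */
static void tuple_delete(int *items, int *nitems, int off)
{
    memmove(&items[off - 1], &items[off],
            (*nitems - off) * sizeof(int));
    (*nitems)--;
}

/* Mirror of the fixed gistvacuumpage() loop: `dead` holds the 1-based
 * offsets found dead during the scan, in ascending order.  Each queued
 * offset is decreased by the count of items queued before it, so the
 * sequential single-tuple deletions land on the right items. */
static int vacuum_demo(int *items, int nitems, const int *dead, int ndead)
{
    int deletable[16];
    int ndeletable = 0;
    int i;

    for (i = 0; i < ndead; i++)
    {
        deletable[ndeletable] = dead[i] - ndeletable;
        ndeletable++;
    }
    for (i = 0; i < ndeletable; i++)
        tuple_delete(items, &nitems, deletable[i]);
    return nitems;
}
```

With items {10, 20, 30, 40, 50} and dead offsets {2, 4}, the adjusted
queue becomes {2, 3}, and the surviving items are {10, 30, 50}; without
the adjustment the second deletion would remove 50 instead of 40.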

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 4edc5a7..1d02e1b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1478,14 +1478,20 @@ gistvacuumpage(Relation rel, Page page, Buffer buffer)
 		ItemId		itemId = PageGetItemId(page, offnum);
 
 		if (ItemIdIsDead(itemId))
-			deletable[ndeletable++] = offnum;
+		{
+			deletable[ndeletable] = offnum - ndeletable;
+			ndeletable++;
+		}
 	}
 
 	if (ndeletable > 0)
 	{
+		int i;
+
 		START_CRIT_SECTION();
 
-		PageIndexMultiDelete(page, deletable, ndeletable);
+		for (i = 0; i < ndeletable; i++)
+			PageIndexTupleDelete(page, deletable[i]);
 
 		/*
 		 * Mark the page as not containing any LP_DEAD items.  This is not
