Re: [HACKERS] Repetitive code in RI triggers

2017-10-03 Thread Ildar Musin

Hi Maksim,

On 26.09.2017 11:51, Maksim Milyutin wrote:

On 19.09.2017 11:09, Daniel Gustafsson wrote:


Removing reviewer Maksim Milyutin from patch entry due to inactivity and
community account email bouncing.  Maksim: if you are indeed reviewing
this
patch, then please update the community account and re-add to the
patch entry.

cheers ./daniel


Daniel, thanks for noticing. I have updated my account and re-added to
the patch entry.

Ildar, your patch is conflicting with the current HEAD of master branch,
please update it.



Thank you for checking the patch out. Yes, it seems that the original code 
was reformatted, which led to merge conflicts. I've fixed that and also 
introduced some minor improvements. The new version is attached.


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index c2891e6..25cdf7d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,8 +196,9 @@ static int	ri_constraint_cache_valid_count = 0;
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
   HeapTuple old_row,
   const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict_del(TriggerData *trigdata, bool is_no_action);
-static Datum ri_restrict_upd(TriggerData *trigdata, bool is_no_action);
+static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_setnull(TriggerData *trigdata);
+static Datum ri_setdefault(FunctionCallInfo fcinfo);
 static void quoteOneName(char *buffer, const char *name);
 static void quoteRelationName(char *buffer, Relation rel);
 static void ri_GenerateQual(StringInfo buf,
@@ -605,7 +606,7 @@ RI_FKey_noaction_del(PG_FUNCTION_ARGS)
 	/*
 	 * Share code with RESTRICT case.
 	 */
-	return ri_restrict_del((TriggerData *) fcinfo->context, true);
+	return ri_restrict((TriggerData *) fcinfo->context, true);
 }
 
 /* --
@@ -630,175 +631,10 @@ RI_FKey_restrict_del(PG_FUNCTION_ARGS)
 	/*
 	 * Share code with NO ACTION case.
 	 */
-	return ri_restrict_del((TriggerData *) fcinfo->context, false);
+	return ri_restrict((TriggerData *) fcinfo->context, false);
 }
 
 /* --
- * ri_restrict_del -
- *
- *	Common code for ON DELETE RESTRICT and ON DELETE NO ACTION.
- * --
- */
-static Datum
-ri_restrict_del(TriggerData *trigdata, bool is_no_action)
-{
-	const RI_ConstraintInfo *riinfo;
-	Relation	fk_rel;
-	Relation	pk_rel;
-	HeapTuple	old_row;
-	RI_QueryKey qkey;
-	SPIPlanPtr	qplan;
-	int			i;
-
-	/*
-	 * Get arguments.
-	 */
-	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
-	trigdata->tg_relation, true);
-
-	/*
-	 * Get the relation descriptors of the FK and PK tables and the old tuple.
-	 *
-	 * fk_rel is opened in RowShareLock mode since that's what our eventual
-	 * SELECT FOR KEY SHARE will get on it.
-	 */
-	fk_rel = heap_open(riinfo->fk_relid, RowShareLock);
-	pk_rel = trigdata->tg_relation;
-	old_row = trigdata->tg_trigtuple;
-
-	switch (riinfo->confmatchtype)
-	{
-			/* --
-			 * SQL:2008 15.17 <Execution of referential actions>
-			 *	General rules 9) a) iv):
-			 *		MATCH SIMPLE/FULL
-			 *			... ON DELETE RESTRICT
-			 * --
-			 */
-		case FKCONSTR_MATCH_SIMPLE:
-		case FKCONSTR_MATCH_FULL:
-			switch (ri_NullCheck(old_row, riinfo, true))
-			{
-case RI_KEYS_ALL_NULL:
-case RI_KEYS_SOME_NULL:
-
-	/*
-	 * No check needed - there cannot be any reference to old
-	 * key if it contains a NULL
-	 */
-	heap_close(fk_rel, RowShareLock);
-	return PointerGetDatum(NULL);
-
-case RI_KEYS_NONE_NULL:
-
-	/*
-	 * Have a full qualified key - continue below
-	 */
-	break;
-			}
-
-			/*
-			 * If another PK row now exists providing the old key values, we
-			 * should not do anything.  However, this check should only be
-			 * made in the NO ACTION case; in RESTRICT cases we don't wish to
-			 * allow another row to be substituted.
-			 */
-			if (is_no_action &&
-ri_Check_Pk_Match(pk_rel, fk_rel, old_row, riinfo))
-			{
-heap_close(fk_rel, RowShareLock);
-return PointerGetDatum(NULL);
-			}
-
-			if (SPI_connect() != SPI_OK_CONNECT)
-elog(ERROR, "SPI_connect failed");
-
-			/*
-			 * Fetch or prepare a saved plan for the restrict delete lookup
-			 */
-			ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_DEL_CHECKREF);
-
-			if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
-			{
-StringInfoData querybuf;
-char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-char		attname[MAX_QUOTED_NAME_LEN];
-char		paramname[16];
-const char *querysep;
-Oid			queryoids[RI_MAX_NUMKEYS];
-
-/* --
- * The query string built is
- *	SELECT 1 FROM ONLY  x WHERE $1 = fkatt1 [AND ...]
- *		   FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding PK attributes.
- * --
- */
-initStringInfo(&querybuf);
-quoteRelationName(fkrelname, fk_rel);

Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-09-05 Thread Ildar Musin

Hi Alexander,

On 22.06.2017 18:36, Alexander Korotkov wrote:

On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov
<a.korot...@postgrespro.ru <mailto:a.korot...@postgrespro.ru>> wrote:



0002-heap-page-special-1.patch
Putting xid and multixact bases into PageHeaderData would take extra 16
bytes on index pages too.  That would be waste of space for indexes.
This is why I decided to put bases into special area of heap pages.
This patch adds special area for heap pages contaning prune xid and
magic number.  Magic number is different for regular heap page and
sequence page.


We've discussed it earlier, but it's worth mentioning here too. You have 
pd_prune_xid of type TransactionId, which is treated elsewhere as 
ShortTransactionId (see HeapPageGetPruneXid() and HeapPageSetPruneXid()) 
and hence carries 4 redundant bytes. Could you please fix it?

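To illustrate the point, here is a rough standalone sketch of what I mean by 
storing the prune xid as a 32-bit value in the special area; the type names 
below are stand-ins, not the definitions from your patch:

#include <stdint.h>

typedef uint32_t ShortTransactionId;	/* 32-bit on-page xid */

/*
 * Hypothetical heap page special area: a 2-byte magic number plus a 4-byte
 * truncated prune xid, instead of a full 8-byte TransactionId on every page.
 */
typedef struct HeapPageSpecialSketch
{
	uint16_t			pd_magic;		/* regular heap page vs sequence page */
	ShortTransactionId	pd_prune_xid;	/* what the accessors already assume */
} HeapPageSpecialSketch;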

--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: global index

2017-08-25 Thread Ildar Musin

Hi,

On 24.08.2017 22:44, Andres Freund wrote:

Hi,

On 2017-08-18 12:12:58 +0300, Ildar Musin wrote:

While we've been developing pg_pathman extension one of the most frequent
questions we got from our users was about global index support. We cannot
provide it within an extension. And I couldn't find any recent discussion
about someone implementing it. So I'm thinking about giving it a shot and
start working on a patch for postgres.


FWIW, I personally think for constraints the better approach is to make
the constraint checking code cope with having to check multiple
indexes. Initially by just checking all indexes, over the longer term
perhaps pruning the set of to-be-checked indexes based on the values in
the partition key if applicable.   The problem with creating huge global
indexes is that you give away some of the major advantages of partitioning:
- dropping partitions now is slow / leaves a lot of garbage again
- there's no way you can do this with individual partitions being remote
  or such
- there's a good chunk of locality loss in global indexes



I agree with you that garbage collection after a partition drop could be 
a major downside of the single-index scheme. On the other hand, not all 
partitioning use cases imply dropping partitions. What worries me about a 
global unique index built on multiple local indexes is the need to look 
up (almost) every index for every insert/update/FK check. In some cases 
we can reduce the number of indexes to be checked (e.g. by storing 
min/max values in the metapage; see the sketch below), but that will not 
be possible if key values are spread across the indexes evenly. And it 
can get quite expensive as the partition count grows.

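As a rough standalone illustration of the min/max idea (the structure and 
function names here are hypothetical, not part of any existing patch):

#include <stdbool.h>

/* hypothetical per-index key bounds cached in the index metapage */
typedef struct IndexKeyBounds
{
	int		min_key;
	int		max_key;
	bool	has_entries;	/* false for an empty index */
} IndexKeyBounds;

/*
 * A uniqueness or FK check could skip any local index whose cached key
 * range cannot contain the new key.
 */
static bool
index_may_contain(const IndexKeyBounds *meta, int key)
{
	if (!meta->has_entries)
		return false;
	return key >= meta->min_key && key <= meta->max_key;
}

With evenly distributed keys every index's cached range ends up covering the 
whole key domain, so this filter rejects nothing -- which is exactly the 
concern above.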

The good thing about multiple indexes is that they are more compact and 
manageable.


--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: global index

2017-08-19 Thread Ildar Musin


18/08/2017 17:40, Alvaro Herrera wrote:
> Ildar Musin wrote:
>
>> While we've been developing pg_pathman extension one of the most frequent
>> questions we got from our users was about global index support. We cannot
>> provide it within an extension. And I couldn't find any recent discussion
>> about someone implementing it. So I'm thinking about giving it a shot and
>> start working on a patch for postgres.
>>
>> One possible solution is to create an extended version of item pointer which
>> would store relation oid along with block number and position:
> I've been playing with the index code in order to allow indirect tuples,
> which are stored in a format different from IndexTupleData.
>
> I've been adding an "InMemoryIndexTuple" (maybe there's a better name)
> which internally has pointers to both IndexTupleData and
> IndirectIndexTupleData, which makes it easier to pass around the index
> tuple in either format.  It's very easy to add an OID to that struct,
> which then allows to include the OID in either an indirect index tuple
> or a regular one.
>
> Then, wherever we're using IndexTupleData in the index AM code, we would
> replace it with InMemoryIndexTuple.  This should satisfy both your use
> case and mine.
>
I found the thread about indirect indexes
(https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru%40alvherre.pgsql),
but it seems that it hasn't been updated for some time and I couldn't
find InMemoryIndexTuple in the latest patch. Is there a newer version?
Generally I think this may be a good idea.

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: global index

2017-08-19 Thread Ildar Musin


18/08/2017 17:45, Alvaro Herrera wrote:
> Erikjan Rijkers wrote:
>> On 2017-08-18 11:12, Ildar Musin wrote:
>>> Hi hackers,
>>>
>>> While we've been developing pg_pathman extension one of the most
>>> frequent questions we got from our users was about global index
>>> support. We cannot provide it within an extension. And I couldn't find
>>> any recent discussion about someone implementing it. So I'm thinking
>>> about giving it a shot and start working on a patch for postgres.
>> Sorry to be dense but what exactly is a "Global Index"?
> A global index covers all partitions of a partitioned table.  It allows
> you to have unique indexes across the partitioned table.
>
> The main disadvantage of global indexes is that you need some kind of
> cleanup after you drop a partition.  Either make partition drop wait
> until all the index pointers are removed, or you need some kind of
> after-commit cleanup process that removes them afterwards (which
> requires some assurance that they are really all gone).  You can't let
> them linger forever, or you risk a new partition that reuses the same
> OID causing the whole index to become automatically corrupt.
>
Thanks for pointing that out; I haven't thought this through yet. We could
probably keep the list of removed partitions somewhere in the catalog or
in the index itself. Usually autovacuum (or the process we start right
after a partition drop, as you suggested) would clean the index up before
OID wraparound. But if it didn't, and a user tried to add a new partition
with the same OID, then the cleanup would be forced. I think the latter is
very unlikely.

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: global index

2017-08-18 Thread Ildar Musin

Hi Chris,

On 18.08.2017 16:15, Chris Travers wrote:

I would really like to see global indexes.  It would make things a lot
easier for things like unique constraints across table inheritance trees.

On Fri, Aug 18, 2017 at 11:12 AM, Ildar Musin <i.mu...@postgrespro.ru
<mailto:i.mu...@postgrespro.ru>> wrote:

Hi hackers,

While we've been developing pg_pathman extension one of the most
frequent questions we got from our users was about global index
support. We cannot provide it within an extension. And I couldn't
find any recent discussion about someone implementing it. So I'm
thinking about giving it a shot and start working on a patch for
postgres.

One possible solution is to create an extended version of item
pointer which would store relation oid along with block number and
position:

struct ItemPointerExt
{
Oid ip_relid;
BlockIdData ip_blkid;
OffsetNumber ip_posid;
};

and use it in global index (regular index will still use old
version). This will require in-depth refactoring of existing index
nodes to make them support both versions. Accordingly, we could
replace ItemPointer with ItemPointerExt in index AM to make unified
API to access both regular and global indexes. TIDBitmap will
require refactoring as well to be able to deal with relation oids.


So, to be clear on-disk representations would be unchanged for old
indexes (ensuring that pg_upgrade would not be broken), right?


Yes, the idea is to keep old indexes untouched so that there would be no 
need for any further conversion. Global indexes, in turn, will use the 
extended TID format.





It seems to be quite an invasive patch since it requires changes in
general index routines, existing index nodes, catalog, vacuum
routines and syntax. So I'm planning to implement it step by step.
As a first prototype it could be:

* refactoring of btree index to be able to store both regular and
extended item pointers;


Do you foresee any performance implication of handling both?


It's hard to say until there is a working prototype. I think there can 
be (and most likely will be) some overhead due to the unified API (and 
hence additional conversion operations). It will require benchmarking to 
say how bad it is.



--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
<http://www.adjust.com/>
Saarbrücker Straße 37a, 10405 Berlin



--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: global index

2017-08-18 Thread Ildar Musin

Hi hackers,

While we've been developing the pg_pathman extension, one of the most 
frequent questions we got from our users was about global index support. 
We cannot provide it within an extension, and I couldn't find any recent 
discussion about someone implementing it. So I'm thinking about giving 
it a shot and starting to work on a patch for postgres.


One possible solution is to create an extended version of the item pointer 
which would store the relation OID along with the block number and position:


struct ItemPointerExt
{
Oid ip_relid;
BlockIdData ip_blkid;
OffsetNumber ip_posid;
};

and use it in global indexes (regular indexes will still use the old 
version). This will require in-depth refactoring of the existing index 
nodes to make them support both versions. Accordingly, we could replace 
ItemPointer with ItemPointerExt in the index AM API to provide unified 
access to both regular and global indexes. TIDBitmap will require 
refactoring as well to be able to deal with relation OIDs.

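For illustration only, a minimal standalone sketch of such an extended 
pointer and the kind of comparison helper that TIDBitmap-style code might 
need once relation OIDs become part of the pointer (the field types below 
are simplified stand-ins for Oid, BlockIdData and OffsetNumber):

#include <stdint.h>

typedef struct ItemPointerExtSketch
{
	uint32_t	ip_relid;	/* relation OID, distinguishes partitions */
	uint32_t	ip_blkid;	/* block number within that relation */
	uint16_t	ip_posid;	/* line pointer offset within the block */
} ItemPointerExtSketch;

/* order by relation first, then by block, then by offset */
static int
item_pointer_ext_compare(const ItemPointerExtSketch *a,
						 const ItemPointerExtSketch *b)
{
	if (a->ip_relid != b->ip_relid)
		return a->ip_relid < b->ip_relid ? -1 : 1;
	if (a->ip_blkid != b->ip_blkid)
		return a->ip_blkid < b->ip_blkid ? -1 : 1;
	if (a->ip_posid != b->ip_posid)
		return a->ip_posid < b->ip_posid ? -1 : 1;
	return 0;
}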

It seems to be quite an invasive patch since it requires changes in 
general index routines, existing index nodes, catalog, vacuum routines 
and syntax. So I'm planning to implement it step by step. As a first 
prototype it could be:


* refactoring of btree index to be able to store both regular and 
extended item pointers;

* refactoring of TIDBitmap;
* refactoring of general index routines (index_insert, index_getnext, 
etc) and indexAM api;
* catalog (add pg_index.indisglobal attribute and/or a specific relkind 
as discussed in [1] thread);

* syntax for global index definition. E.g., it could be Oracle-like syntax:

CREATE INDEX my_idx ON my_tbl (key) GLOBAL;

If it goes well, then I'll do the rest of the index types and vacuuming. 
If you have any ideas or concerns, I'll be glad to hear them.


[1] 
https://www.postgresql.org/message-id/c8fe4f6b-ff46-aae0-89e3-e936a35f0cfd%40postgrespro.ru


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Repetitive code in RI triggers

2017-04-10 Thread Ildar Musin

Hi all,

I was looking through the RI triggers code recently and noticed a few 
almost identical functions, e.g. ri_restrict_upd() and ri_restrict_del(). 
The following patch is an attempt to reduce some of the repetitive code. 
There is still room for further improvement.


Thanks,
--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 37139f9..259c988 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -199,8 +199,9 @@ static int	ri_constraint_cache_valid_count = 0;
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
   HeapTuple old_row,
   const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict_del(TriggerData *trigdata, bool is_no_action);
-static Datum ri_restrict_upd(TriggerData *trigdata, bool is_no_action);
+static Datum ri_restrict(TriggerData *trigdata, bool is_no_action, bool is_update);
+static Datum ri_setnull(TriggerData *trigdata, bool is_update);
+static Datum ri_setdefault(FunctionCallInfo fcinfo, bool is_update);
 static void quoteOneName(char *buffer, const char *name);
 static void quoteRelationName(char *buffer, Relation rel);
 static void ri_GenerateQual(StringInfo buf,
@@ -608,7 +609,7 @@ RI_FKey_noaction_del(PG_FUNCTION_ARGS)
 	/*
 	 * Share code with RESTRICT case.
 	 */
-	return ri_restrict_del((TriggerData *) fcinfo->context, true);
+	return ri_restrict((TriggerData *) fcinfo->context, true, false);
 }
 
 /* --
@@ -633,175 +634,10 @@ RI_FKey_restrict_del(PG_FUNCTION_ARGS)
 	/*
 	 * Share code with NO ACTION case.
 	 */
-	return ri_restrict_del((TriggerData *) fcinfo->context, false);
+	return ri_restrict((TriggerData *) fcinfo->context, false, false);
 }
 
 /* --
- * ri_restrict_del -
- *
- *	Common code for ON DELETE RESTRICT and ON DELETE NO ACTION.
- * --
- */
-static Datum
-ri_restrict_del(TriggerData *trigdata, bool is_no_action)
-{
-	const RI_ConstraintInfo *riinfo;
-	Relation	fk_rel;
-	Relation	pk_rel;
-	HeapTuple	old_row;
-	RI_QueryKey qkey;
-	SPIPlanPtr	qplan;
-	int			i;
-
-	/*
-	 * Get arguments.
-	 */
-	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
-	trigdata->tg_relation, true);
-
-	/*
-	 * Get the relation descriptors of the FK and PK tables and the old tuple.
-	 *
-	 * fk_rel is opened in RowShareLock mode since that's what our eventual
-	 * SELECT FOR KEY SHARE will get on it.
-	 */
-	fk_rel = heap_open(riinfo->fk_relid, RowShareLock);
-	pk_rel = trigdata->tg_relation;
-	old_row = trigdata->tg_trigtuple;
-
-	switch (riinfo->confmatchtype)
-	{
-			/* --
-			 * SQL:2008 15.17 <Execution of referential actions>
-			 *	General rules 9) a) iv):
-			 *		MATCH SIMPLE/FULL
-			 *			... ON DELETE RESTRICT
-			 * --
-			 */
-		case FKCONSTR_MATCH_SIMPLE:
-		case FKCONSTR_MATCH_FULL:
-			switch (ri_NullCheck(old_row, riinfo, true))
-			{
-case RI_KEYS_ALL_NULL:
-case RI_KEYS_SOME_NULL:
-
-	/*
-	 * No check needed - there cannot be any reference to old
-	 * key if it contains a NULL
-	 */
-	heap_close(fk_rel, RowShareLock);
-	return PointerGetDatum(NULL);
-
-case RI_KEYS_NONE_NULL:
-
-	/*
-	 * Have a full qualified key - continue below
-	 */
-	break;
-			}
-
-			/*
-			 * If another PK row now exists providing the old key values, we
-			 * should not do anything.  However, this check should only be
-			 * made in the NO ACTION case; in RESTRICT cases we don't wish to
-			 * allow another row to be substituted.
-			 */
-			if (is_no_action &&
-ri_Check_Pk_Match(pk_rel, fk_rel, old_row, riinfo))
-			{
-heap_close(fk_rel, RowShareLock);
-return PointerGetDatum(NULL);
-			}
-
-			if (SPI_connect() != SPI_OK_CONNECT)
-elog(ERROR, "SPI_connect failed");
-
-			/*
-			 * Fetch or prepare a saved plan for the restrict delete lookup
-			 */
-			ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_DEL_CHECKREF);
-
-			if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
-			{
-StringInfoData querybuf;
-char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-char		attname[MAX_QUOTED_NAME_LEN];
-char		paramname[16];
-const char *querysep;
-Oid			queryoids[RI_MAX_NUMKEYS];
-
-/* --
- * The query string built is
- *	SELECT 1 FROM ONLY  x WHERE $1 = fkatt1 [AND ...]
- *		   FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding PK attributes.
- * --
- */
-initStringInfo(&querybuf);
-quoteRelationName(fkrelname, fk_rel);
-appendStringInfo(&querybuf, "SELECT 1 FROM ONLY %s x",
- fkrelname);
-querysep = "WHERE";
-for (i = 0; i < riinfo->nkeys; i++)
-{
-	Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-	Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
-	quoteOneName(attname,
- RIAttName(fk_rel, riinfo->fk_attnums[i]));
-	sprintf(paramname, "$%d", i + 1);

Re: [HACKERS] Declarative partitioning - another take

2016-12-14 Thread Ildar Musin

Hi,

On 13.12.2016 21:10, Robert Haas wrote:

On Tue, Dec 13, 2016 at 12:22 PM, Ildar Musin <i.mu...@postgrespro.ru> wrote:

We've noticed that PartitionDispatch object is built on every INSERT query
and that it could create unnecessary overhead. Wouldn't it be better to keep
it in relcache?


You might be able to cache some of that data in the relcache, but List
*keystate is pointing to query-lifespan data, so you can't cache that.



Yes, you are right. I meant mostly the 'indexes' field. I've measured 
insert performance with perf in the case when there are a thousand 
partitions, and it seems that 34% of the time is spent in 
RelationGetPartitionDispatchInfo(), which builds this indexes array. Most 
of that time goes to acquiring locks on all partitions, which is 
unnecessary if we're inserting into just a single partition. Probably we 
could avoid this by moving at least the indexes field into the cache.


--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-13 Thread Ildar Musin

Hi hackers,

On 08.12.2016 19:44, Dmitry Ivanov wrote:

That would be fantastic.  I and my colleagues at EnterpriseDB can
surely help review; of course, maybe you and some of your colleagues
would like to help review our patches, too.


Certainly, I'll start reviewing as soon as I get familiar with the code.



Do you think this is
likely to be something where you can get something done quickly, with
the hope of getting it into v10?


Yes, I've just cleared my schedule in order to make this possible. I'll
bring in the patches ASAP.




We've noticed that the PartitionDispatch object is built on every INSERT 
query and that this could create unnecessary overhead. Wouldn't it be 
better to keep it in the relcache?


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index Onlys Scan for expressions

2016-09-07 Thread Ildar Musin

Hi Vladimir,

On 05.09.2016 16:38, Ildar Musin wrote:

Hi Vladimir,

On 03.09.2016 19:31, Vladimir Sitnikov wrote:

Ildar>The reason why this doesn't work is that '~~' operator (which is a
Ildar>synonym for 'like') isn't supported by operator class for btree.
Since
Ildar>the only operators supported by btree are <, <=, =, >=, >, you
can use
Ildar>it with queries like:

Ildar>And in 3rd query 'OFFSET' statement prevents rewriter from
Ildar>transforming the query, so it is possible to use index only scan on
Ildar>subquery and then filter the result of subquery with '~~' operator.

I'm afraid I do not follow you.
Note: query 3 is 100% equivalent of query 2, however query 3 takes 55
times less reads.
It looks like either an optimizer bug, or some missing feature in the
"index only scan" logic.

Here's quote from "query 2" (note % are at both ends):  ... where
type=42) as x where upper_vc like '%ABC%';

Note: I do NOT use "indexed scan" for the like operator. I'm very well
aware
that LIKE patterns with leading % cannot be optimized to a btree range
scan.
What I want is "use the first indexed column as index scan, then use the
second column
for filtering".

As shown in "query 2" vs "query 3", PostgreSQL cannot come up with such
a plan on its own
for some reason.

This is not a theoretical issue, but it is something that I use a lot
with Oracle DB (it just creates a good plan for "query 2").

Vladimir


Thanks, I get it now. The reason why it acts like this is that I used
match_clause_to_index() function to determine if IOS can be used with
the specified clauses. This function among other things checks if
operator matches the index opfamily. Apparently this isn't correct. I
wrote another prototype to test your case and it seems to work. But it's
not ready for public yet, I'll publish it in 1-2 days.



Here is a new patch version. I modified check_index_only_clauses() so 
that it doesn't use match_clause_to_indexcol() anymore. Instead it 
handles different types of expressions, including binary operator 
expressions, scalar array expressions, row compare expressions (e.g. 
(a,b)<(1,2)) and null tests, and tries to match each part of the 
expression to the index regardless of the operator (a rough standalone 
sketch of the idea follows below). I reproduced your example and was 
able to get an index-only scan on all queries. Could you please try the 
patch and tell me if it works for you?

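The sketch below uses simplified stand-in node types rather than the 
planner's real structures; the point is just that a clause is 
index-only-safe when every leaf of its expression tree is either a constant 
or something the index can return, no matter which operators connect them:

#include <stdbool.h>
#include <stddef.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_OP } NodeTag;

typedef struct Node
{
	NodeTag		tag;
	int			varattno;		/* for NODE_VAR: attribute number */
	struct Node *left, *right;	/* for NODE_OP: operator arguments */
} Node;

/* can the index return this attribute? (stand-in for match_index_to_operand) */
static bool
index_can_return(const int *index_attnos, int nattnos, int attno)
{
	for (int i = 0; i < nattnos; i++)
		if (index_attnos[i] == attno)
			return true;
	return false;
}

static bool
clause_is_index_only(const Node *node, const int *index_attnos, int nattnos)
{
	if (node == NULL)
		return true;
	switch (node->tag)
	{
		case NODE_CONST:
			return true;
		case NODE_VAR:
			return index_can_return(index_attnos, nattnos, node->varattno);
		case NODE_OP:
			/* the operator itself doesn't matter, only its arguments */
			return clause_is_index_only(node->left, index_attnos, nattnos) &&
				   clause_is_index_only(node->right, index_attnos, nattnos);
	}
	return false;
}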

--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb..e81f914 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
@@ -78,6 +79,14 @@ typedef struct
 	int			indexcol;		/* index column we want to match to */
 } ec_member_matches_arg;
 
+/* Context for index only expression checker */
+typedef struct
+{
+	IndexOptInfo   *index;
+	bool			match;	/* TRUE if expression matches
+			 * index */
+} check_index_only_walker_ctx;
+
 
 static void consider_index_join_clauses(PlannerInfo *root, RelOptInfo *rel,
 			IndexOptInfo *index,
@@ -130,7 +139,15 @@ static PathClauseUsage *classify_index_clause_usage(Path *path,
 static Relids get_bitmap_tree_required_outer(Path *bitmapqual);
 static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds);
 static int	find_list_position(Node *node, List **nodelist);
-static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only_targetlist(PlannerInfo *root,
+			RelOptInfo *rel,
+			IndexOptInfo *index,
+			Bitmapset *index_canreturn_attrs);
+static bool check_index_only_clauses(List *clauses,
+		 IndexOptInfo *index);
+static bool check_index_only_expr_walker(Node *node, check_index_only_walker_ctx *ctx);
+static bool check_index_only_expr(Node *expr, IndexOptInfo *index);
 static double get_loop_count(PlannerInfo *root, Index cur_relid, Relids outer_relids);
 static double adjust_rowcount_for_semijoins(PlannerInfo *root,
 			  Index cur_relid,
@@ -1020,7 +1037,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * index data retrieval anyway.
 	 */
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
-	   check_index_only(rel, index));
+	   check_index_only(root, rel, index));
 
 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
@@ -1786,7 +1803,7 @@ find_list_position(Node *node, List **nodelist)
  *		Determine whether an index-only scan is possible for this index.
  */
 static bool
-check_index_only(

Re: [HACKERS] Index Onlys Scan for expressions

2016-09-05 Thread Ildar Musin

Hi Tomas,

On 03.09.2016 14:37, Tomas Vondra wrote:

Hi Ildar,

I've looked at this patch again today to do a bit more thorough review,
and I think it's fine. There are a few comments (particularly in the new
code in check_index_only) that need improving, and also a few small
tweaks in the walkers.

Attached is a modified v5 patch with the proposed changes - it's
probably easier than explaining what the changes should/might be.

I've added an XXX comment in check_index_only_expr_walker - ISTM we're
first explicitly matching  Vars to index attributes, and then dealing
with expressions. But we loop over all index columns (including those
that can only really match Vars). Not sure if it makes any difference,
but is it worth differentiating between Var and non-Var expressions? Why
not to simply call match_index_to_operand() in both cases?



Thank you! Yes, you're right, and it's probably not worth it. I intended 
to optimize just a little bit, since we already have the attribute 
numbers that can be returned by the index. But as you have already noted 
in the comment, it is a cheap check and the difference would barely be 
noticeable.


I've also tweaked a few comments to match project code style, and moved
a few variables into the block where they are used. But the latter is
probably matter of personal taste, I guess.


regards



Thanks for that. I have some difficulties expressing myself in English, 
so it was very helpful.


Best regards,
--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index Onlys Scan for expressions

2016-09-05 Thread Ildar Musin

Hi Vladimir,

On 03.09.2016 19:31, Vladimir Sitnikov wrote:

Ildar>The reason why this doesn't work is that '~~' operator (which is a
Ildar>synonym for 'like') isn't supported by operator class for btree. Since
Ildar>the only operators supported by btree are <, <=, =, >=, >, you can use
Ildar>it with queries like:

Ildar>And in 3rd query 'OFFSET' statement prevents rewriter from
Ildar>transforming the query, so it is possible to use index only scan on
Ildar>subquery and then filter the result of subquery with '~~' operator.

I'm afraid I do not follow you.
Note: query 3 is 100% equivalent of query 2, however query 3 takes 55
times less reads.
It looks like either an optimizer bug, or some missing feature in the
"index only scan" logic.

Here's quote from "query 2" (note % are at both ends):  ... where
type=42) as x where upper_vc like '%ABC%';

Note: I do NOT use "indexed scan" for the like operator. I'm very well aware
that LIKE patterns with leading % cannot be optimized to a btree range scan.
What I want is "use the first indexed column as index scan, then use the
second column
for filtering".

As shown in "query 2" vs "query 3", PostgreSQL cannot come up with such
a plan on its own
for some reason.

This is not a theoretical issue, but it is something that I use a lot
with Oracle DB (it just creates a good plan for "query 2").

Vladimir


Thanks, I get it now. The reason it acts like this is that I used the 
match_clause_to_index() function to determine whether IOS can be used 
with the specified clauses. This function, among other things, checks 
whether the operator matches the index opfamily. Apparently this isn't 
correct. I wrote another prototype to test your case and it seems to 
work, but it's not ready for the public yet; I'll publish it in 1-2 days.


--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index Onlys Scan for expressions

2016-09-02 Thread Ildar Musin

Hi Vladimir,

On 23.08.2016 23:35, Vladimir Sitnikov wrote:

Hi,

I've tried your indexonlypatch5.patch against REL9_6_BETA3.
Here are some results.

TL;DR:
1) <> does not support
index-only scan for index (type, upper(vc) varchar_pattern_ops).
3) <<(... where type=42 offset 0) where upper_vc like '%ABC%'>> does
trigger index-only scan. IOS reduces number of buffers from 977 to 17
and that is impressive.

Can IOS be used for simple query like #1 as well?



Thanks for checking out the patch. Sorry for the delayed reply.


Here are the details.

drop table vlsi;
create table vlsi(type numeric, vc varchar(500));
insert into vlsi(type,vc) select round(x/1000),
md5('||x)||md5('||x+1)||md5(''||x+2) from generate_series(1,100) as
s(x);
create index type_vc__vlsi on vlsi(type, upper(vc) varchar_pattern_ops);
vacuum analyze vlsi;

0) Smoke test (index only scan works when selecting indexed expression):

explain (analyze, buffers) select type, upper(vc) from vlsi where type=42;

 Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97 rows=971
width=36) (actual time=0.012..0.212 rows=1000 loops=1)
   Index Cond: (type = '42'::numeric)
   Heap Fetches: 0
   Buffers: shared hit=17
 Planning time: 0.112 ms
 Execution time: 0.272 ms

1) When trying to apply "like condition", index only scan does not work.
Note: "buffers hit" becomes 977 instead of 17.

explain (analyze, buffers) select type, upper(vc) from vlsi where
type=42 and upper(vc) like '%ABC%';

 Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20
width=36) (actual time=0.069..1.343 rows=23 loops=1)
   Index Cond: (type = '42'::numeric)
   Filter: (upper((vc)::text) ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=939
 Planning time: 0.104 ms
 Execution time: 1.358 ms



The reason this doesn't work is that the '~~' operator (which is a 
synonym for 'like') isn't supported by the operator class for btree. 
Since the only operators supported by btree are <, <=, =, >= and >, you 
can use it with queries like:


explain (analyze, buffers) select type, upper(vc) from vlsi where 
type=42 and upper(vc) ~~ 'ABC%';
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..4.58 rows=1 
width=36) (actual time=0.021..0.021 rows=0 loops=1)
   Index Cond: ((type = '42'::numeric) AND ((upper((vc)::text)) ~>=~ 
'ABC'::text) AND ((upper((vc)::text)) ~<~ 'ABD'::text))

   Filter: ((upper((vc)::text)) ~~ 'ABC%'::text)
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning time: 0.214 ms
 Execution time: 0.044 ms
(7 rows)

In the case of a fixed prefix, postgres implicitly substitutes the '~~' 
operator with two range operators:


((upper((vc)::text)) ~>=~ 'ABC'::text) AND ((upper((vc)::text)) ~<~ 
'ABD'::text)


so that these conditions can be used for a btree lookup.


Mere "subquery" does not help: still no index-only scan

2) explain (analyze, buffers) select * from (select type, upper(vc)
upper_vc from vlsi where type=42) as x where upper_vc like '%ABC%';

 Index Scan using type_vc__vlsi on vlsi  (cost=0.55..1715.13 rows=20
width=36) (actual time=0.068..1.344 rows=23 loops=1)
   Index Cond: (type = '42'::numeric)
   Filter: (upper((vc)::text) ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=939
 Planning time: 0.114 ms
 Execution time: 1.357 ms

3) "offset 0" trick does help:
explain (analyze, buffers) select * from (select type, upper(vc)
upper_vc from vlsi where type=42 offset 0) as x where upper_vc like '%ABC%';

 Subquery Scan on x  (cost=0.55..80.11 rows=39 width=36) (actual
time=0.033..0.488 rows=23 loops=1)
   Filter: (x.upper_vc ~~ '%ABC%'::text)
   Rows Removed by Filter: 977
   Buffers: shared hit=17
   ->  Index Only Scan using type_vc__vlsi on vlsi  (cost=0.55..67.97
rows=971 width=36) (actual time=0.015..0.210 rows=1000 loops=1)
 Index Cond: (type = '42'::numeric)
 Heap Fetches: 0
 Buffers: shared hit=17
 Planning time: 0.086 ms
 Execution time: 0.503 ms

Vladimir


I debugged the last two queries to figure out the difference between 
them. It turned out that query 2) is transformed into the same form as 
query 1). In the 3rd query the 'OFFSET' clause prevents the rewriter from 
transforming the query, so it is possible to use an index-only scan on 
the subquery and then filter its result with the '~~' operator.


--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Index Onlys Scan for expressions

2016-08-15 Thread Ildar Musin

Hi, hackers!

There is a known issue that index-only scan (IOS) can only work with 
simple index keys based on single attributes and doesn't work with index 
expressions. In this patch I propose a solution that adds IOS support 
for index expressions. Here's an example:


create table abc(a int, b int, c int);
create index on abc ((a * 1000 + b), c);

with t1 as (select generate_series(1, 1000) as x),
 t2 as (select generate_series(0, 999) as x)
insert into abc(a, b, c)
select t1.x, t2.x, t2.x from t1, t2;
vacuum analyze;

Explain results with the patch:

explain (analyze, buffers) select a * 1000 + b + c from abc where a * 
1000 + b = 1001;

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1 
width=4) (actual time=0.032..0.033 rows=1 loops=1)

   Index Cond: (((a * 1000) + b) = 1001)
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning time: 0.184 ms
 Execution time: 0.077 ms
(6 rows)

Before the patch it was:

explain (analyze, buffers) select a * 1000 + b + c from abc where a * 
1000 + b = 1001;

                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1 
width=4) (actual time=0.039..0.041 rows=1 loops=1)

   Index Cond: (((a * 1000) + b) = 1001)
   Buffers: shared hit=4
 Planning time: 0.177 ms
 Execution time: 0.088 ms
(5 rows)

This solution has limitations though: the restriction or target 
expression tree (or a part of it) must exactly match the index 
expression. E.g. this expression will pass the check:


select a * 1000 + b + 100 from ...

but this will fail:

select 100 + a * 1000 + b from ...

because the parser groups it as:

(100 + a * 1000) + b

In this form it won't match any index key. Another case is when we 
create an index on (a+b) and then write a query like 'select b+a ...' or 
'... where b+a = smth' -- it won't match. This applies to regular index 
scans too. It's probably worth discussing a way to normalize index 
expressions and clauses and working out a more convenient way to match 
them.
Anyway, I will be grateful if you take a look at the attached patch. Any 
comments and tips are welcome.


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb..9eb0e12 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
@@ -130,7 +131,13 @@ static PathClauseUsage *classify_index_clause_usage(Path *path,
 static Relids get_bitmap_tree_required_outer(Path *bitmapqual);
 static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds);
 static int	find_list_position(Node *node, List **nodelist);
-static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only_targetlist(PlannerInfo *root,
+			RelOptInfo *rel,
+			IndexOptInfo *index,
+			Bitmapset *index_canreturn_attrs);
+static bool check_index_only_clauses(List *clauses,
+		 IndexOptInfo *index);
 static double get_loop_count(PlannerInfo *root, Index cur_relid, Relids outer_relids);
 static double adjust_rowcount_for_semijoins(PlannerInfo *root,
 			  Index cur_relid,
@@ -190,7 +197,6 @@ static List *network_prefix_quals(Node *leftop, Oid expr_op, Oid opfamily,
 static Datum string_to_datum(const char *str, Oid datatype);
 static Const *string_to_const(const char *str, Oid datatype);
 
-
 /*
  * create_index_paths()
  *	  Generate all interesting index paths for the given relation.
@@ -1020,7 +1026,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * index data retrieval anyway.
 	 */
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
-	   check_index_only(rel, index));
+	   check_index_only(root, rel, index));
 
 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
@@ -1780,13 +1786,12 @@ find_list_position(Node *node, List **nodelist)
 	return i;
 }
 
-
 /*
  * check_index_only
  *		Determine whether an index-only scan is possible for this index.
  */
 static bool
-check_index_only(RelOptInfo *rel, IndexOptInfo *index)
+check_index_only(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index)
 {
 	bool		result;
 	Bitmapset  *attrs_used = NULL;
@@ -1836,10 +1841,7 @@ check_index_only(RelOptInfo *rel, IndexOptInfo 

[HACKERS] Unused argument in PinBuffer function

2016-08-04 Thread Ildar Musin

Hi all,

I was looking through the buffer manager's code and noticed that the 
PinBuffer function has an unused 'strategy' argument. It seems that after 
the refactoring made by Alexander Korotkov and Andres Freund 
(48354581a49c30f5757c203415aa8412d85b0f70) it became useless. The 
attached patch removes it. Perhaps someone more familiar with the code 
could edit the function's comment to reflect the change?


Regards,

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 76ade37..1eaa1cb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -432,7 +432,7 @@ static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
   ForkNumber forkNum, BlockNumber blockNum,
   ReadBufferMode mode, BufferAccessStrategy strategy,
   bool *hit);
-static bool PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy);
+static bool PinBuffer(BufferDesc *buf);
 static void PinBuffer_Locked(BufferDesc *buf);
 static void UnpinBuffer(BufferDesc *buf, bool fixOwner);
 static void BufferSync(int flags);
@@ -1020,7 +1020,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 */
 		buf = GetBufferDescriptor(buf_id);
 
-		valid = PinBuffer(buf, strategy);
+		valid = PinBuffer(buf);
 
 		/* Can release the mapping lock as soon as we've pinned it */
 		LWLockRelease(newPartitionLock);
@@ -1232,7 +1232,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 			buf = GetBufferDescriptor(buf_id);
 
-			valid = PinBuffer(buf, strategy);
+			valid = PinBuffer(buf);
 
 			/* Can release the mapping lock as soon as we've pinned it */
 			LWLockRelease(newPartitionLock);
@@ -1563,7 +1563,7 @@ ReleaseAndReadBuffer(Buffer buffer,
  * some callers to avoid an extra spinlock cycle.
  */
 static bool
-PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
+PinBuffer(BufferDesc *buf)
 {
 	Buffer		b = BufferDescriptorGetBuffer(buf);
 	bool		result;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Confusing TAP tests readme file

2016-07-25 Thread Ildar Musin

Hi all,

I was checking out the TAP tests documentation and found this part of 
the src/test/perl/README file confusing:


my $ret = $node->psql('postgres', 'SELECT 1');
is($ret, '1', 'SELECT 1 returns 1');

The return value of the psql() function is psql's exit code. Hence this 
test will never pass, since psql returns 0 if the query was successfully 
executed. It was probably meant to be the safe_psql() function instead, 
which returns stdout:


my $ret = $node->safe_psql('postgres', 'SELECT 1');
is($ret, '1', 'SELECT 1 returns 1');

or else:

my ($ret, $stdout, $stderr) =
$node->psql('postgres', 'SELECT 1');
is($stdout, '1', 'SELECT 1 returns 1');

The attached patch fixes this.

Regards,
Ildar Musin

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/test/perl/README b/src/test/perl/README
index 9eae159..36d4120 100644
--- a/src/test/perl/README
+++ b/src/test/perl/README
@@ -41,7 +41,7 @@ against them and evaluate the results. For example:
 $node->init;
 $node->start;
 
-my $ret = $node->psql('postgres', 'SELECT 1');
+my $ret = $node->safe_psql('postgres', 'SELECT 1');
 is($ret, '1', 'SELECT 1 returns 1');
 
 $node->stop('fast');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-05-20 Thread Ildar Musin

Hi Amit,

On 20.05.2016 11:37, Amit Langote wrote:

Perhaps you're already aware but may I also suggest looking at how clauses
are matched to indexes?  For example, consider how
match_clauses_to_index() in src/backend/optimizer/path/indxpath.c works.

Thanks, I'll take a closer look at it.

Moreover, instead of pruning partitions in planner prep phase, might it
not be better to do that when considering paths for the (partitioned) rel?
  IOW, instead of looking at parse->jointree, we should rather be working
with rel->baserestrictinfo.  Although, that would require some revisions
to how append_rel_list, simple_rel_list, etc. are constructed and
manipulated in a given planner invocation.  Maybe it's time for that...
Again, you may have already considered these things.

Yes, you're right, this is how we did it in the pg_pathman extension. But 
for this patch it requires further consideration, and I'll do it in the future!

Could you try with the attached updated set of patches?  I changed
partition descriptor relcache code to eliminate excessive copying in
previous versions.

Thanks,
Amit
I tried your new patch and got the following results, which are quite 
close to the ones using a pointer to the PartitionDesc structure (TPS):


 # of partitions | single row | single partition
-----------------+------------+------------------
             100 |       3014 |             1024
            1000 |       2964 |             1001
            2000 |       2874 |             1000

However, I've encountered a problem: postgres occasionally crashes while 
creating partitions. Here is a function that reproduces this behaviour:


CREATE OR REPLACE FUNCTION fail()
 RETURNS void
 LANGUAGE plpgsql
AS $$
BEGIN
DROP TABLE IF EXISTS abc CASCADE;
CREATE TABLE abc (id SERIAL NOT NULL, a INT, dt TIMESTAMP) PARTITION BY 
RANGE (a);

CREATE INDEX ON abc (a);
CREATE TABLE abc_0 PARTITION OF abc FOR VALUES START (0) END (1000);
CREATE TABLE abc_1 PARTITION OF abc FOR VALUES START (1000) END (2000);
END
$$;

SELECT fail();

It doesn't happen every time, but quite often. It doesn't happen if I 
execute these commands one by one in psql. Backtrace:

#0  range_overlaps_existing_partition (key=0x7f1097504410, 
range_spec=0x1d0f400, pdesc=0x1d32200, with=0x7ffe437ead00) at 
partition.c:747
#1  0x0054c2a5 in StorePartitionBound (relid=245775, 
parentId=245770, bound=0x1d0f400) at partition.c:578
#2  0x0061bfc4 in DefineRelation (stmt=0x1d0dfe0, relkind=114 
'r', ownerId=10, typaddress=0x0) at tablecmds.c:739
#3  0x007f4473 in ProcessUtilitySlow (parsetree=0x1d1a150, 
queryString=0x1d1d940 "CREATE TABLE abc_0 PARTITION OF abc FOR VALUES 
START (0) END (1000)", context=PROCESS_UTILITY_QUERY, params=0x0, 
dest=0xdb5ca0 , completionTag=0x7ffe437eb500 "")

at utility.c:983
#4  0x007f425e in standard_ProcessUtility (parsetree=0x1d1a150, 
queryString=0x1d1d940 "CREATE TABLE abc_0 PARTITION OF abc FOR VALUES 
START (0) END (1000)", context=PROCESS_UTILITY_QUERY, params=0x0, 
dest=0xdb5ca0 ,

completionTag=0x7ffe437eb500 "") at utility.c:907
#5  0x007f3354 in ProcessUtility (parsetree=0x1d1a150, 
queryString=0x1d1d940 "CREATE TABLE abc_0 PARTITION OF abc FOR VALUES 
START (0) END (1000)", context=PROCESS_UTILITY_QUERY, params=0x0, 
dest=0xdb5ca0 , completionTag=0x7ffe437eb500 "")

at utility.c:336
#6  0x0069f8b2 in _SPI_execute_plan (plan=0x1d19cf0, 
paramLI=0x0, snapshot=0x0, crosscheck_snapshot=0x0, read_only=0 '\000', 
fire_triggers=1 '\001', tcount=0) at spi.c:2200
#7  0x0069c735 in SPI_execute_plan_with_paramlist 
(plan=0x1d19cf0, params=0x0, read_only=0 '\000', tcount=0) at spi.c:450
#8  0x7f108cc6266f in exec_stmt_execsql (estate=0x7ffe437eb8e0, 
stmt=0x1d05318) at pl_exec.c:3517
#9  0x7f108cc5e5fc in exec_stmt (estate=0x7ffe437eb8e0, 
stmt=0x1d05318) at pl_exec.c:1503
#10 0x7f108cc5e318 in exec_stmts (estate=0x7ffe437eb8e0, 
stmts=0x1d04c98) at pl_exec.c:1398
#11 0x7f108cc5e1af in exec_stmt_block (estate=0x7ffe437eb8e0, 
block=0x1d055e0) at pl_exec.c:1336
#12 0x7f108cc5c35d in plpgsql_exec_function (func=0x1cc2a90, 
fcinfo=0x1cf7f50, simple_eval_estate=0x0) at pl_exec.c:434

...

Thanks

--
Ildar Musin
i.mu...@postgrespro.ru



Re: [HACKERS] Declarative partitioning

2016-05-18 Thread Ildar Musin

Hi Amit and all,

On 18.05.2016 04:26, Amit Langote wrote:

On 2016/05/18 2:22, Tom Lane wrote:

The two ways that we've dealt with this type of hazard are to copy data
out of the relcache before using it; or to give the relcache the
responsibility of not moving a particular portion of data if it did not
change.  From memory, the latter applies to the tuple descriptor and
trigger data, but we've done most other things the first way.

It seems that tuple descriptor is reference-counted; however trigger data
is copied.  The former seems to have been done on performance grounds (I
found 06e10abc).

So for a performance-sensitive relcache data structure, refcounting is the
way to go (although done quite rarely)?  In this particular case, it is a
"partition descriptor" that could get big for a partitioned table with
partitions in hundreds or thousands.

Thanks,
Amit


Here is an experimental patch that optimizes planning time for range 
partitioned tables (it could be considered a "proof of concept"). The 
patch should be applied on top of Amit's declarative partitioning patch. 
It handles only a very special case (an often used one, though) where the 
partitioning key consists of just a single attribute and doesn't contain 
expressions.


The main idea is the following (a standalone sketch of the search is 
shown after this list):
* we look for clauses like 'VAR OP CONST' (where VAR is the partitioning 
key attribute and OP is a comparison operator);
* using binary search, we find the partition (X) that fits the CONST value;
* based on the OP operator, we determine which other partitions are also 
covered by the clause. The possible cases are:
   1. If OP is '<' or '<=', we need the partitions to the left of X 
(inclusive).
   2. If OP is '>' or '>=', we need the partitions to the right of X 
(inclusive).
   3. If OP is '=', we need only partition X.
  (For the '<' and '>' operators we also check whether the CONST value is 
equal to the lower or upper boundary respectively, and if so, X itself is 
excluded.)

For boolean expressions we evaluate the left and right sides according to 
the algorithm above and then, based on the boolean operator, take the 
intersection (for AND) or the union (for OR) of the results.

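Below is a minimal standalone sketch of the binary search part under the 
simplifying assumptions above (integer bounds, single-attribute key); the 
actual patch of course works on the partition descriptor structures rather 
than a plain array:

/* one range partition: [lower, upper) */
typedef struct RangePartition
{
	int		lower;	/* inclusive */
	int		upper;	/* exclusive */
} RangePartition;

/*
 * Binary search over partitions sorted by their bounds: return the index
 * of the partition whose range contains 'value', or -1 if there is none.
 */
static int
find_partition(const RangePartition *parts, int nparts, int value)
{
	int		lo = 0;
	int		hi = nparts - 1;

	while (lo <= hi)
	{
		int		mid = lo + (hi - lo) / 2;

		if (value < parts[mid].lower)
			hi = mid - 1;
		else if (value >= parts[mid].upper)
			lo = mid + 1;
		else
			return mid;
	}
	return -1;
}

/*
 * For 'VAR < CONST' / 'VAR <= CONST' the surviving partitions are
 * [0 .. X]; for 'VAR > CONST' / 'VAR >= CONST' they are [X .. nparts-1];
 * for 'VAR = CONST' only X itself, as described in the list above.
 */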

I ran some benchmarks on:
1. the original constraint exclusion mechanism,
2. the optimized version (this patch) and
3. the optimized version using the relation->rd_partdesc pointer instead 
of the RelationGetPartitionDesc() function (see previous discussion).


Initial conditions:

CREATE TABLE abc (id SERIAL NOT NULL, a INT, dt TIMESTAMP) PARTITION BY 
RANGE (a);

CREATE TABLE abc_1 PARTITION OF abc FOR VALUES START (0) END (1000);
CREATE TABLE abc_2 PARTITION OF abc FOR VALUES START (1000) END (2000);
...
etc
INSERT INTO %s (a) SELECT generate_series(0,  * 1000);

pgbench scripts: 
https://gist.github.com/zilder/872e634a8eeb405bd045465fc9527e53 (where 
:partitions is the number of partitions).
The first script tests fetching a single row from the partitioned table. 
Results (tps):


 # of partitions | constraint excl. | optimized | optimized (using pointer)
-----------------+------------------+-----------+---------------------------
             100 |              658 |      2906 |                      3079
            1000 |               45 |      2174 |                      3021
            2000 |               22 |      1667 |                      2919


The second script tests fetching all data from a single partition. 
Results (tps):


 # of partitions | constraint excl. | optimized | optimized (using pointer)
-----------------+------------------+-----------+---------------------------
             100 |              317 |      1001 |                      1051
            1000 |               34 |       941 |                      1023
            2000 |               15 |       813 |                      1016

The optimized version works much faster with a large number of partitions 
and degrades more slowly than constraint exclusion. But there is still a 
noticeable performance degradation from copying the PartitionDesc 
structure: with 2000 partitions the RelationGetPartitionDesc() function 
spent more than 40% of the total execution time on copying in the first 
benchmark (measured with `perf`). Using reference counting, as Amit 
suggests, would significantly decrease this degradation.


Any comments and suggestions are very welcome. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index abec2b8..cd4c727 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -47,6 +47,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+// #include "utils/rangeset.h"
 
 /*
  * Information about a single partition
@@ -77,6 +78,29 @@ typedef struct PartitionDescData
 	PartitionRangeBound **rangeuppers;	/* range upper bounds */
 } PartitionDescData;
 
+/*
+ * Context
+ */
+typedef struct walk_expr_tree_context
+{
+	OidparentId;
+	Index			index;
+	PartitionKey	key;
+	PartitionDesc	desc;
+} walk_expr_tree_context;
+
+typedef struct var_

Re: [HACKERS] Declarative partitioning

2016-05-16 Thread Ildar Musin

Hi Amit,

I'm running some experiments based on your infrastructure, trying to 
optimize SELECT queries. At some point I need to get the PartitionDesc 
for a relation, and to do so I'm using the RelationGetPartitionDesc() 
function. The problem is that this function copies relcache data, and it 
can be quite slow for large numbers (thousands) of partitions. The 
comment on the function says that we cannot use the relation->rd_partdesc 
pointer into the relcache because of the possibility of relcache 
invalidation. Could you please tell me whether relcache invalidation can 
occur during a SELECT/UPDATE/DELETE query?


Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-04-20 Thread Ildar Musin
  0x0098a744 in get_attnum (relid=32807, attname=0x0) at 
lsyscache.c:825
#8  0x0056afd2 in get_check_expr_from_partbound 
(rel=0x7f868601ca20, parent=0x7f868601b770, bound=0x26e6ac8) at 
partition.c:1427
#9  0x0056bc9e in generate_partition_check_expr 
(rel=0x7f868601ca20) at partition.c:1788
#10 0x0056bb5f in RelationGetPartitionCheckExpr 
(rel=0x7f868601ca20) at partition.c:1746
#11 0x00782b5f in get_relation_constraints (root=0x268f1b8, 
relationObjectId=32807, rel=0x26e5cd8, include_notnull=1 '\001') at 
plancat.c:1209
#12 0x00782d74 in relation_excluded_by_constraints 
(root=0x268f1b8, rel=0x26e5cd8, rte=0x268ebf0) at plancat.c:1302
#13 0x0072a18d in set_append_rel_size (root=0x268f1b8, 
rel=0x26e5690, rti=1, rte=0x268ea80) at allpaths.c:947

...

--
Ildar Musin
i.mu...@postgrespro.ru



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2016-04-19 Thread Ildar Musin

Hi Amit,

On 15.04.2016 07:35, Amit Langote wrote:
Thanks a lot for the comments. The patch set changed quite a bit since 
the last version. Once the CF entry was marked returned with feedback 
on March 22, I held off sending the new version at all. Perhaps, it 
would have been OK. Anyway here it is, if you are interested. I will 
create an entry in CF 2016-09 for the same. Also, see below replies to 
your individual comments.


Thanks for your new patch! I've tried it and discovered some strange 
behavior for partitioning by a composite key. Here is an example of my setup:


create table test(a int, b int) partition by range (a, b);
create table test_1 partition of test for values start (0, 0) end (100, 
100);
create table test_2 partition of test for values start (100, 100) end 
(200, 200);
create table test_3 partition of test for values start (200, 200) end 
(300, 300);


It's alright so far. But if we try to insert a record in which attribute 
'a' belongs to one partition and attribute 'b' belongs to another, the 
record is inserted into the partition determined by the first attribute:


insert into test(a, b) values (150, 50);

select tableoid::regclass, * from test;
 tableoid |  a  | b
--+-+
 test_2   | 150 | 50
(1 row)

I think there should be an error because the value for 'b' violates the 
range constraint for test_2. Now if we query data from 'test' and add the 
filter b < 100, the planner will exclude partitions 'test_2' (which 
actually contains the inserted row) and 'test_3' and return nothing:


select * from test where b < 100;
 a | b
---+---
(0 rows)

explain (costs off) select * from test where b < 100;
QUERY PLAN
---
 Append
   ->  Seq Scan on test
 Filter: (b < 100)
   ->  Seq Scan on test_1
 Filter: (b < 100)
(5 rows)

--
Ildar Musin
i.mu...@postgrespro.ru



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] raw output from copy

2016-03-04 Thread Ildar Musin

Hi Pavel

27/02/16 10:26, Pavel Stehule wrote:

Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com 
<mailto:pavel.steh...@gmail.com>>:


Hi,

Psql based implementation needs new infrastructure (more than few
lines)

Missing:

* binary mode support
* parametrized query support,

I am not against, but both points I proposed, and both was rejected.

So why dont use current infrastructure? Raw copy is trivial patch.


I was asked by Daniel Verite about reopening this patch in opened 
commitfest.


I am sending rebased patch

Regards

Pavel


I am new to reviewing; here is what I got. The patch applied cleanly to 
HEAD. I tried to upload and export files in psql, and it works as 
expected. All regression tests pass without problems as well. The code 
looks good to me. One line of the documentation is a little confusing to 
me:


"use any metadata - only row data in network byte order are exported"

Did you mean "only raw data in network byte order is exported"?

Also, there are two entries for this patch on the commitfest page: in the 
"miscellaneous" and "sql" sections. It's probably better to remove one of 
them to avoid confusion.


--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Declarative partitioning

2016-02-28 Thread Ildar Musin



16/02/16 07:46, Amit Langote wrote:

Hi Josh,

On 2016/02/16 11:41, Josh berkus wrote:

On 02/15/2016 04:28 PM, Amit Langote wrote:

Also, you won't see any optimizer and executor changes. Queries will still
use the same plans as existing inheritance-based partitioned tables,
although as I mentioned, constraint exclusion won't yet kick in. That will
be fixed very shortly.

We're not going to use CE for the new partitioning long-term, are we? This
is just the first version, right?

Yes. My approach in previous versions of stuffing major planner changes in
with the syntax patch was not quite proper in retrospect. So, I thought
I'd propose any major planner (and executor) changes later.

Thanks,
Amit


Hello Amit,

Thank you for your work. I'm currently working on an extension aimed at 
planner optimization for partitioned tables 
(https://github.com/postgrespro/pg_pathman). At the moment I have an 
implementation of binary search for range-partitioned tables with basic 
partitioning keys (date, timestamp, integers, etc.), and I'd like to try 
to combine your syntax and infrastructure with my binary search 
implementation.
There will likely be changes in the range syntax and the partition cache 
structure based on the discussion, so I'm looking forward to your next 
patch.


Ildar


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers