Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-08-05 Thread Andres Freund
Hi 
On 2013-08-05 08:37:57 -0700, Kevin Grittner wrote:
> Some of the issues raised by Andres and Noah have been addressed.
> These all seemed simple and non-controversial, so I've just applied
> the suggested fixes.

Cool!

> >>>   I'd even suggest using BuildIndexInfo() or such on the indexes,
> >>>   then you could use ->ii_Expressions et al instead of peeking
> >>>   into indkeys by hand.
> >>
> >> Given that the function is in a source file described as containing
> >> "code to create and destroy POSTGRES index relations" and the
> >> comments for that function say that it "stores the information
> >> about the index that's needed by FormIndexDatum, which is used for
> >> both index_build() and later insertion of individual index tuples,"
> >> and we're not going to create or destroy an index or call
> >> FormIndexDatum or insert individual index tuples from this code, it
> >> would seem to be a significant expansion of the charter of that
> >> function.  What benefits do you see to using that level?
> >
> > I'd prevent you from needing to peek into indkeys. Note that it's
> > already used in various places.
> 
> I looked at where it was and wasn't used, and continue to be
> skeptical.  For example, both techniques are used in tablecmds.c;
> the technique you suggest is used where an index is being created,
> dropped, or index tuples are being manipulated, while the
> Form_pg_index structure is being used when the definition of the
> index is being examined without directly manipulating it.  Compare
> what is being done in my code with the existing code for
> ATExecDropNotNull(), for example.

The RelationGetIndexExpressions() you mentioned in the commit sounds
like a good plan to me. Didn't remember that existed.

Don't think the ATExecDropNotNull() comparison is really valid, we need
to know more details there, but anyway, you've found something a good
bit better.

> > What I am thinking of is that you'll get a successfull REFRESH
> > CONCURRENTLY but it will later error out at COMMIT time because there
> > were constraint violations. Afaik we don't have any such behaviour for
> > existing DDL and I don't like introducing it.0
> 
> REFRESH MATERIALIZED VIEW CONCURRENTLY is definitely not DDL.  It
> is DML, and behavior should be consistent with that.  (Note that
> the definition of the matview remains exactly the same after the
> statement executes as it was before; only the data is modified.)
> Without the CONCURRENTLY clause it's in the same sort of gray area
> as TRUNCATE TABLE, where it is essentially DML, but the
> implementation details are similar to that of DDL, so it may
> sometimes be hard to avoid DDL-like behaviors, even though it would
> be best to do so.  We have no such problem here.

But there's no usecase that makes deferred checks and similar useful
afaics. And it seems to me like it certainly will confuse users that a
second RMVC fails (via CheckTableNotInUse()) because there's still a
deferred trigger queue.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-08-05 Thread Kevin Grittner
Some of the issues raised by Andres and Noah have been addressed.
These all seemed simple and non-controversial, so I've just applied
the suggested fixes.

Some issues remain, such as how best to create the temp table used
for the "diff" data, and the related simplification of the security
context swapping that might become possible with a change there.
Review of that area has raised a couple other questions that I'm
looking into.  These all probably amount to enough that I will add
the patch(es) to address them to the next CF.

Andres Freund  wrote:
> On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote:
>> Andres Freund  wrote:

>>> The loop over indexes in refresh_by_match_merge should
>>> index_open(ExclusiveLock) the indexes initially instead of
>>> searching the syscache manually. They are opened inside the
>>> loop in many cases anyway. There probably aren't any hazards
>>> currently, but I am not even sure about that. The index_open()
>>> in the loop at the very least processes the invalidation
>>> messages other backends send...

Fixed.

>>>   I'd even suggest using BuildIndexInfo() or such on the indexes,
>>>   then you could use ->ii_Expressions et al instead of peeking
>>>   into indkeys by hand.
>>
>> Given that the function is in a source file described as containing
>> "code to create and destroy POSTGRES index relations" and the
>> comments for that function say that it "stores the information
>> about the index that's needed by FormIndexDatum, which is used for
>> both index_build() and later insertion of individual index tuples,"
>> and we're not going to create or destroy an index or call
>> FormIndexDatum or insert individual index tuples from this code, it
>> would seem to be a significant expansion of the charter of that
>> function.  What benefits do you see to using that level?
>
> I'd prevent you from needing to peek into indkeys. Note that it's
> already used in various places.

I looked at where it was and wasn't used, and continue to be
skeptical.  For example, both techniques are used in tablecmds.c;
the technique you suggest is used where an index is being created,
dropped, or index tuples are being manipulated, while the
Form_pg_index structure is being used when the definition of the
index is being examined without directly manipulating it.  Compare
what is being done in my code with the existing code for
ATExecDropNotNull(), for example.

>>> [while comparison of indexed columns used OPERATOR() correctly,
>>> comparison of tid and rowvar values did not]

>> I wasn't aware that people could override the equality operators
>> for tid and RECORD

>> [example proving the possibility]

>> I think for the cases where you're comparing tids it's fine to just
>> to hardcode the operator to OPERATOR(pg_catalog.=).

Done.

>>> * I'd strongly suggest more descriptive table aliases than x, y,
>>>   d. Those make the statements rather hard to parse.
>>
>> I guess.  Those are intended to be internal, but I guess there's no
>> reason not to be more verbose in the aliases.
>
> Well, for one, other people will read that code every now and then. I am
> not 100% convinced that all the statements are correct, and it's more
> effort to do so right now. Also, those statements will show up in error
> messages.

Done.

> What I am thinking of is that you'll get a successfull REFRESH
> CONCURRENTLY but it will later error out at COMMIT time because there
> were constraint violations. Afaik we don't have any such behaviour for
> existing DDL and I don't like introducing it.

REFRESH MATERIALIZED VIEW CONCURRENTLY is definitely not DDL.  It
is DML, and behavior should be consistent with that.  (Note that
the definition of the matview remains exactly the same after the
statement executes as it was before; only the data is modified.)
Without the CONCURRENTLY clause it's in the same sort of gray area
as TRUNCATE TABLE, where it is essentially DML, but the
implementation details are similar to that of DDL, so it may
sometimes be hard to avoid DDL-like behaviors, even though it would
be best to do so.  We have no such problem here.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-23 Thread Kevin Grittner
Andres Freund  wrote:

> Hm. There seems to be more things that need some more improvement
> from a quick look.
>
> First, I have my doubts of the general modus operandy of
> CONCURRENTLY here. I think in many, many cases it would be faster
> to simply swap in the newly built heap + indexes in concurrently.
> I think I have seen that mentioned in the review while mostly
> skipping the discussion, but still. That would be easy enough as
> of Robert's mvcc patch.

Yeah, in many cases you would not want to choose to specify this
technique. The point was to have a technique which could run
without blocking while a heavy load of queries (including perhaps a
very long-running query) was executing.  If subsequent events
provide an alternative way to do that, it's worth looking at it;
but my hope was to get this out of the way to allow time to develop
incremental maintenance of matviews for the next CF.  If we spend
too much time reworking this to micro-optimize it for new patches,
incremental maintenance might not happen for 9.4.

The bigger point is perhaps syntax.  If MVCC catalogs are really at
a point where we can substitute a new heap and new indexes for a
matview without taking an AccessExclusiveLock, then we should just
make the current code do that.  I think there is still a place for
a differential update for large matviews which only need small
changes on a REFRESH, but perhaps the right term for that is not
CONCURRENTLY.

> * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance
>   be done a good bit earlier? We're executing queries before
>   that.

This can't be in effect while we are creating temp tables.  If
we're going to support a differential REFRESH technique, it must be
done more "surgically" than to wrap the whole REFRESH statement
execution.

> * The loop over indexes in refresh_by_match_merge should
>   index_open(ExclusiveLock) the indexes initially instead of
>   searching the syscache manually. They are opened inside the
>   loop in many cases anyway. There probably aren't any hazards
>   currently, but I am not even sure about that. The index_open()
>   in the loop at the very least processes the invalidation
>   messages other backends send...

Fair enough.  That seems like a simple change.

>   I'd even suggest using BuildIndexInfo() or such on the indexes,
>   then you could use ->ii_Expressions et al instead of peeking
>   into indkeys by hand.

Given that the function is in a source file described as containing
"code to create and destroy POSTGRES index relations" and the
comments for that function say that it "stores the information
about the index that's needed by FormIndexDatum, which is used for
both index_build() and later insertion of individual index tuples,"
and we're not going to create or destroy an index or call
FormIndexDatum or insert individual index tuples from this code, it
would seem to be a significant expansion of the charter of that
function.  What benefits do you see to using that level?

> * All not explicitly looked up operators should be qualified
>   using OPERATOR(). It's easy to define your own = operator for
>   tid and set the search_path to public,pg_catalog. Now, the
>   whole thing is restricted to talbe owners afaics, making this
>   decidedly less dicey, but still. But if anyobdy actually uses a
>   search_path like the above you can easily hurt them.
> * Same seems to hold true for the y = x and y.* IS DISTINCT FROM
>   x.* operations.
> * (y.*) = (x.*) also needs to do proper operator lookups.

I wasn't aware that people could override the equality operators
for tid and RECORD, so that seemed like unnecessary overhead.  I
can pull the operators from the btree AM if people can do that.

> * OpenMatViewIncrementalMaintenance() et al seem to be
>   underdocumented.

If the adjacent comment for
MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain
about what OpenMatViewIncrementalMaintenance() and
CloseMatViewIncrementalMaintenance() are for, I can add comments.
At the time I wrote it, it seemed to me to be redundant to have
such comments, and would cause people to waste more time looking at
them then would be saved by anything they could add to what the
function names and one-line function bodies conveyed; but if there
was doubt in your mind about when they should be called, I'll add
something.

Would it help to move the static functions underneath the
(documented) public function and its comment?

> * why is it even allowed that matview_maintenance_depth is > 1?
>   Unless I miss something there doesn't seem to be support for a
>   recursive refresh whatever that would mean?

I was trying to create the function in a way that would be useful
for incremental maintenance, too.  Those could conceivably recurse.
Also, this way bugs in usage are more easily detected.

> * INSERT and DELETE should also alias the table names, to make
>   sure there cannot be issues with the nested queries.

I don't see the hazard -- could you elabor

Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-22 Thread Andres Freund
On 2013-07-23 00:01:50 +0200, Andres Freund wrote:
> On 2013-07-17 10:11:28 -0400, Tom Lane wrote:
> > Kevin Grittner  writes:
> > > Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
> > 
> > The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch
> > is broken.
> 
> Jagarundi still tells that story. At least something like the following
> patch seems to be required.

Hm. There seems to be more things that need some more improvement from a
quick look.

First, I have my doubts of the general modus operandy of CONCURRENTLY
here. I think in many, many cases it would be faster to simply swap in
the newly built heap + indexes in concurrently. I think I have seen that
mentioned in the review while mostly skipping the discussion, but still.
That would be easy enough as of Robert's mvcc patch.

* Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance be
  done a good bit earlier? We're executing queries before that.
* The loop over indexes in refresh_by_match_merge should 
index_open(ExclusiveLock) the
  indexes initially instead of searching the syscache manually. They are
  opened inside the loop in many cases anyway. There probably aren't any
  hazards currently, but I am not even sure about that. The index_open()
  in the loop at the very least processes the invalidation messages
  other backends send...
  I'd even suggest using BuildIndexInfo() or such on the indexes, then
  you could use ->ii_Expressions et al instead of peeking into indkeys
  by hand.
* All not explicitly looked up operators should be qualified using
  OPERATOR(). It's easy to define your own = operator for tid and set
  the search_path to public,pg_catalog. Now, the whole thing is
  restricted to talbe owners afaics, making this decidedly less dicey,
  but still. But if anyobdy actually uses a search_path like the above
  you can easily hurt them.
* Same seems to hold true for the y = x and y.* IS DISTINCT FROM x.*
  operations.
* (y.*) = (x.*) also needs to do proper operator lookups.
* OpenMatViewIncrementalMaintenance() et al seem to be underdocumented.
* why is it even allowed that matview_maintenance_depth is > 1? Unless I
  miss something there doesn't seem to be support for a recursive
  refresh whatever that would mean?
* INSERT and DELETE should also alias the table names, to make sure there cannot
  be issues with the nested queries.
* I'd strongly suggest more descriptive table aliases than x, y,
  d. Those make the statements rather hard to parse.
* SPI_exec() is deprecated in favor of SPI_execute()
* ISTM deferred uniqueness checks and similar things should be enforced
  to run ub refresh_by_match_merge()

I think this patch/feature needs a good bit of work...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-22 Thread Andres Freund
On 2013-07-22 19:09:13 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2013-07-17 10:11:28 -0400, Tom Lane wrote:
> >> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch
> >> is broken.
> 
> > Jagarundi still tells that story.
> 
> Uh, no.  Jagarundi was perfectly happy for several build cycles after
> I committed 405a468.  The buildfarm history shows that it started
> complaining again after this change:
> 
> f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs

Huh? That's rather strange. No code from that patch is even exececuted
until the alter_table regression tests way later. So I can't see how
that patch could cause the matview error. Which is pretty clearly using
an freed relcache entry.
I think this may be a timing issue.

> > At least something like the following patch seems to be required.
> 
> This does look like a necessary change, but I suspect there is also
> something rotten in f01d1ae.

The alter table failure lateron seems to be "ERROR:  relation "tmp"
already exists", which just seems to be a consequence of incomplete
cleanup due to the earlier crashes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-22 Thread Tom Lane
Andres Freund  writes:
> On 2013-07-17 10:11:28 -0400, Tom Lane wrote:
>> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch
>> is broken.

> Jagarundi still tells that story.

Uh, no.  Jagarundi was perfectly happy for several build cycles after
I committed 405a468.  The buildfarm history shows that it started
complaining again after this change:

f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs

> At least something like the following patch seems to be required.

This does look like a necessary change, but I suspect there is also
something rotten in f01d1ae.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-22 Thread Robert Haas
On Mon, Jul 22, 2013 at 6:01 PM, Andres Freund  wrote:
> On 2013-07-17 10:11:28 -0400, Tom Lane wrote:
>> Kevin Grittner  writes:
>> > Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
>>
>> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch
>> is broken.
>
> Jagarundi still tells that story. At least something like the following
> patch seems to be required.

Thanks, I was noticing that breakage today, too.

I committed this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-22 Thread Andres Freund
On 2013-07-17 10:11:28 -0400, Tom Lane wrote:
> Kevin Grittner  writes:
> > Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
> 
> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch
> is broken.

Jagarundi still tells that story. At least something like the following
patch seems to be required.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 46149ee..09ea344 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -141,6 +141,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	List	   *actions;
 	Query	   *dataQuery;
 	Oid			tableSpace;
+	Oid			owner;
 	Oid			OIDNewHeap;
 	DestReceiver *dest;
 	bool		concurrent;
@@ -238,6 +239,8 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	else
 		tableSpace = matviewRel->rd_rel->reltablespace;
 
+	owner = matviewRel->rd_rel->relowner;
+
 	heap_close(matviewRel, NoLock);
 
 	/* Create the transient table that will receive the regenerated data. */
@@ -247,8 +250,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString,
- matviewRel->rd_rel->relowner);
+		refresh_matview_datafill(dest, dataQuery, queryString, owner);
 
 	/* Make the matview match the newly generated data. */
 	if (concurrent)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-17 Thread Tom Lane
Hitoshi Harada  writes:
> Looks like rd_indpred is not correct if index relation is fresh.
> Something like this works for me.

> -   if (indexRel->rd_indpred != NIL)
> +   if (RelationGetIndexPredicate(indexRel) != NIL)

Hm, yeah, the direct access to rd_indpred is certainly wrong.
Will apply, thanks!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-17 Thread Hitoshi Harada
On Wed, Jul 17, 2013 at 7:11 AM, Tom Lane  wrote:
> Kevin Grittner  writes:
>> Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
>
> The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch
> is broken.
>

Looks like rd_indpred is not correct if index relation is fresh.
Something like this works for me.

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index edd34ff..46149ee 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -634,7 +634,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid)

/* Skip partial indexes. */
indexRel = index_open(index->indexrelid,
RowExclusiveLock);
-   if (indexRel->rd_indpred != NIL)
+   if (RelationGetIndexPredicate(indexRel) != NIL)
{
index_close(indexRel, NoLock);
ReleaseSysCache(indexTuple);

--
Hitoshi Harada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

2013-07-17 Thread Tom Lane
Kevin Grittner  writes:
> Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch
is broken.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers