Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-11-14 Thread Michael Paquier
On Wed, Nov 15, 2017 at 12:12 PM, Masahiko Sawada  wrote:
> I think we need to check only sessionBackupState and don't need to
> check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
> can quickly return if sessionBackupState !=
> SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
> get an assertion failure when a pg_stop_backup(false) that is waiting
> for archiving is terminated while a concurrent exclusive backup is in
> progress.

I have just gone through the thread once again, and noticed that it is
actually what I suggested upthread:
https://www.postgresql.org/message-id/cab7npqtm5cdrr5y7yyfky+pvdz6dws_jkg1kstan5m95gam...@mail.gmail.com
But your v2 posted here did not do that, so it is incorrect from the start:
https://www.postgresql.org/message-id/cad21aoa+isxyl1_zxmnk9xjhyel5h6rxjttovli7fumqfmc...@mail.gmail.com

We both got a bit confused here. As do_pg_abort_backup() is only used
for non-exclusive backups (including those taken through the
replication protocol), going through the session lock for checks is
fine. Could you update your patch accordingly please?
-- 
Michael



Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-11-14 Thread Masahiko Sawada
On Wed, Nov 15, 2017 at 10:05 AM, Michael Paquier
 wrote:
> On Wed, Nov 15, 2017 at 9:06 AM, Masahiko Sawada  
> wrote:
>>> On Nov 15, 2017 2:59 AM, "Fujii Masao"  wrote:
>>> + /* Quick exit if we have done the backup */
>>> + if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
>>> + return;
>>>
>>> This quick exit seems to cause another problem. Please imagine the
>>> case where there is no exclusive backup running, a backend starts
>>> non-exclusive backup and is terminated before calling pg_stop_backup().
>>> In this case, do_pg_abort_backup() should decrement
>>> XLogCtl->Insert.nonExclusiveBackups, but, with the patch, because of
>>> the above quick exit, do_pg_abort_backup() fails to decrement that.
>>
>> Hmm, I think that in this case, because exclusiveBackupState is not
>> EXCLUSIVE_BACKUP_NONE, nonExclusiveBackups is surely decremented. Am I
>> missing something?
>
> Nah. Fujii-san is right here as exclusiveBackupState is never updated
> for non-exclusive backups.

Thank you, I was wrong.

> You need an extra check on
> sessionBackupState to make sure that even for non-exclusive backups
> the counter is correctly decremented if a non-exclusive session lock
> is held. For an exclusive backup, the session lock can be either
> SESSION_BACKUP_EXCLUSIVE if an exclusive backup is stopped on the same
> session as the start phase, or SESSION_BACKUP_NONE if the exclusive
> backup is stopped from a different session. So you'd basically need
> that:
> +   /*
> +* Quick exit if we have done the exclusive backup or if session is
> +* not keeping around a started non-exclusive backup.
> +*/
> +   if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
> +   sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
> +   return;
>
> At the same time it would be safer at startup phase to update
> sessionBackupState when holding the WAL insert lock to prevent other
> CHECK_FOR_INTERRUPTS hazards. Suggestion of patch attached.

I think we need to check only sessionBackupState and don't need to
check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
can quickly return if sessionBackupState !=
SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
get an assertion failure when a pg_stop_backup(false) that is waiting
for archiving is terminated while a concurrent exclusive backup is in
progress.
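
To make that shape concrete, here is a minimal sketch of the idea (an
editorial illustration assuming the usual xlog.c internals; this is not
the actual patch):

static void
do_pg_abort_backup(void)
{
	/*
	 * do_pg_abort_backup() is only registered for non-exclusive backups,
	 * so the session lock alone tells us whether there is anything to
	 * clean up.
	 */
	if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
		return;

	WALInsertLockAcquireExclusive();
	Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
	XLogCtl->Insert.nonExclusiveBackups--;

	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
		XLogCtl->Insert.nonExclusiveBackups == 0)
		XLogCtl->Insert.forcePageWrites = false;
	WALInsertLockRelease();
}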

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-14 Thread Amit Langote
Hi David.

On 2017/11/14 13:00, David Rowley wrote:
> On 13 November 2017 at 22:46, Amit Langote wrote:
>> On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
>>> The following uses a bit tricky bitmap operation but
>>> is straightforward as a whole.
>>>
>>> =
>>> /* fill bits from BITNUM(lower) (0-based) upward in the first word */
>>> a->words[wordnum++] |= ~(bitmapword) ((1 << BITNUM(lower)) - 1);
>>>
>>> /* fill up intermediate words */
>>> while (wordnum < uwordnum)
>>>a->words[wordnum++] = ~(bitmapword) 0;
>>>
>>> /* fill up to bit BITNUM(upper) (0-based) of the last word */
>>> a->words[wordnum++] |=
>>>  (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) + 1));
>>> =
>>
>> Considering also David's comment downthread, I will try to incorporate
>> this into bms_add_range().
> 
> I've attached an implementation of the patch using this method.

[ ... ]

> Probably just go with Kyotaro's idea (v2). I don't think this is worth
> debating, I just wanted to show it's not that clear-cut.

Thanks.  I have incorporated the v2 patch in my local repository.  I'm
still working through some of the review comments and will be able to
post a new version no later than tomorrow, including support for the new
hash partitioning.
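
For reference, a sketch of how the quoted word-fill approach could look
inside bms_add_range() (core fill only; allocation/growth of the set and
the NULL-input case, which the real function must handle, are omitted):

/*
 * Assumes 'a' already has at least WORDNUM(upper) + 1 zero-initialized
 * words; 'bms_fill_range' is an illustrative name, not committed code.
 */
static void
bms_fill_range(Bitmapset *a, int lower, int upper)
{
	int			wordnum = WORDNUM(lower);
	int			uwordnum = WORDNUM(upper);

	Assert(lower >= 0 && lower <= upper);

	if (wordnum == uwordnum)
	{
		/* lower and upper fall within the same word */
		a->words[wordnum] |=
			((~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) + 1)))
			& ~(bitmapword) ((((bitmapword) 1) << BITNUM(lower)) - 1);
		return;
	}

	/* set bits from BITNUM(lower) upward in the first word */
	a->words[wordnum++] |=
		~(bitmapword) ((((bitmapword) 1) << BITNUM(lower)) - 1);

	/* fill intermediate words entirely */
	while (wordnum < uwordnum)
		a->words[wordnum++] = ~(bitmapword) 0;

	/* set bits 0 .. BITNUM(upper) in the last word */
	a->words[wordnum] |=
		(~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) + 1));
}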

Thanks,
Amit




Re: [HACKERS] moving some partitioning code to executor

2017-11-14 Thread Amit Langote
On 2017/11/15 2:09, Alvaro Herrera wrote:
> Amit Langote wrote:
> 
>> Since that 0001 patch was committed [1], here is the rebased patch.  Will
>> add this to the November commit-fest.
> 
> Please rebase once more -- 1aba8e651ac3 seems to have broken things
> in this area pretty thoroughly.

Thanks, done.

Regards,
Amit
From 75cd3fc3524df4c674fc22f59580ed8271002c8e Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 8 Sep 2017 19:07:38 +0900
Subject: [PATCH] Move certain partitioning code to the executor

---
 src/backend/catalog/partition.c| 455 +--
 src/backend/commands/copy.c|   1 +
 src/backend/executor/Makefile  |   2 +-
 src/backend/executor/execMain.c| 266 +---
 src/backend/executor/execPartition.c   | 560 +
 src/backend/executor/nodeModifyTable.c |   1 +
 src/include/catalog/partition.h|  48 +--
 src/include/executor/execPartition.h   |  65 
 src/include/executor/executor.h|  14 +-
 9 files changed, 717 insertions(+), 695 deletions(-)
 create mode 100644 src/backend/executor/execPartition.c
 create mode 100644 src/include/executor/execPartition.h

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index cff59ed055..ce29ba2eda 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -170,8 +170,6 @@ static int32 partition_bound_cmp(PartitionKey key,
static int partition_bound_bsearch(PartitionKey key,
				   PartitionBoundInfo boundinfo,
				   void *probe, bool probe_is_bound,
				   bool *is_equal);
-static void get_partition_dispatch_recurse(Relation rel, Relation parent,
-			   List **pds, List **leaf_part_oids);
 static int get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -1530,148 +1528,6 @@ get_partition_qual_relid(Oid relid)
return result;
 }
 
-/*
- * RelationGetPartitionDispatchInfo
- * Returns information necessary to route tuples down a partition tree
- *
- * The number of elements in the returned array (that is, the number of
- * PartitionDispatch objects for the partitioned tables in the partition tree)
- * is returned in *num_parted and a list of the OIDs of all the leaf
- * partitions of rel is returned in *leaf_part_oids.
- *
- * All the relations in the partition tree (including 'rel') must have been
- * locked (using at least the AccessShareLock) by the caller.
- */
-PartitionDispatch *
-RelationGetPartitionDispatchInfo(Relation rel,
-				 int *num_parted, List **leaf_part_oids)
-{
-   List   *pdlist = NIL;
-   PartitionDispatchData **pd;
-   ListCell   *lc;
-   int i;
-
-   Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-
-   *num_parted = 0;
-   *leaf_part_oids = NIL;
-
-   get_partition_dispatch_recurse(rel, NULL, , leaf_part_oids);
-   *num_parted = list_length(pdlist);
-	pd = (PartitionDispatchData **) palloc(*num_parted *
-					   sizeof(PartitionDispatchData *));
-   i = 0;
-   foreach(lc, pdlist)
-   {
-   pd[i++] = lfirst(lc);
-   }
-
-   return pd;
-}
-
-/*
- * get_partition_dispatch_recurse
- * Recursively expand partition tree rooted at rel
- *
- * As the partition tree is expanded in a depth-first manner, we maintain two
- * global lists: of PartitionDispatch objects corresponding to partitioned
- * tables in *pds and of the leaf partition OIDs in *leaf_part_oids.
- *
- * Note that the order of OIDs of leaf partitions in leaf_part_oids matches
- * the order in which the planner's expand_partitioned_rtentry() processes
- * them.  It's not necessarily the case that the offsets match up exactly,
- * because constraint exclusion might prune away some partitions on the
- * planner side, whereas we'll always have the complete list; but unpruned
- * partitions will appear in the same order in the plan as they are returned
- * here.
- */
-static void
-get_partition_dispatch_recurse(Relation rel, Relation parent,
-			       List **pds, List **leaf_part_oids)
-{
-   TupleDesc   tupdesc = RelationGetDescr(rel);
-   PartitionDesc partdesc = RelationGetPartitionDesc(rel);
-   PartitionKey partkey = RelationGetPartitionKey(rel);
-   PartitionDispatch pd;
-   int i;
-
-   check_stack_depth();
-
-   /* Build a PartitionDispatch for this table and add it to *pds. */
-   pd = (PartitionDispatch) 

Re: [HACKERS] SQL procedures

2017-11-14 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/14/17 11:14, Tom Lane wrote:
>> ... Do we really want the existence of
>> a function foo(int) to mean that you can't create a SQL procedure named
>> foo and taking one int argument?

> Yes, that is defined that way by the SQL standard.

Meh.  OK, then it has to be one catalog.

regards, tom lane



Re: [HACKERS] Runtime Partition Pruning

2017-11-14 Thread David Rowley
On 15 November 2017 at 01:57, David Rowley  wrote:
> I think to do this you're going to have to store some sort of array
> that maps the partition index to the subpath in the Append node so you
> can correctly identify the subpath based on what you're getting back
> from get_partitions_for_keys(). Perhaps what you had worked previously
> when we were not returning a Bitmapset with that function.
>
> Once you've got that design worked out I can take another look at this.

I think this is a bit more broken than I originally mentioned above.
The code you have at the moment assumes there will be just a single
partitioned table in the hierarchy. Remember that complex partitioned
hierarchies will be flattened during set_append_rel_pathlist(), so
there may be multiple partitioned relations to search for.

A simpler way to break the patch is to have some constants in the
query to eliminate some of the partitions during planning, leaving
just a few to be eliminated during execution.

Something like:

deallocate ab_q1;
drop table if exists ab;
create table ab (a int not null, b int not null) partition by list(a);
create table ab_a1 partition of ab for values in (1);
create table ab_a2 partition of ab for values in (2);
create table ab_a3 partition of ab for values in (3);
create table ab_a4 partition of ab for values in (4);
create table ab_a5 partition of ab for values in (5);
create table ab_a6 partition of ab for values in (6);
create table ab_a7 partition of ab for values in (7);
create table ab_a8 partition of ab for values in (8);
create table ab_a9 partition of ab for values in (9);
create table ab_a10 partition of ab for values in (10);

prepare ab_q1 (int) as select * from ab where a between 4 and 5 and a = $1;

explain execute ab_q1 (4);
explain execute ab_q1 (4);
explain execute ab_q1 (4);
explain execute ab_q1 (4);
explain execute ab_q1 (4);

explain execute ab_q1 (4); -- TRAP: FailedAssertion("!(n <
list->length)", File: "src/backend/nodes/list.c", Line: 392)

So some sort of hierarchical structure of the partition hierarchy
would need to be stored in the Append node and then you'd need to
search at each level, and then somehow match the results up to the
subpaths that you have in the Append. Although, I'm still not sure
this is the best way to go about this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] SQL procedures

2017-11-14 Thread Peter Eisentraut
On 11/14/17 11:14, Tom Lane wrote:
>> The common functionality between functions and procedures is like 98%
>> [citation needed], so they definitely belong there, even more so than
>> aggregates, for example.
> No, I don't think so.  The core reason why not is that in
> 
>   SELECT foo(...) FROM ...
> 
> foo() might be either a plain function or an aggregate, so it's important
> that functions and aggregates share the same namespace.  *That* is why
> they are in the same catalog.  On the other hand, since the above syntax
> is not usable to call a SQL procedure, putting SQL procedures into pg_proc
> just creates namespacing conflicts.  Do we really want the existence of
> a function foo(int) to mean that you can't create a SQL procedure named
> foo and taking one int argument?

Yes, that is defined that way by the SQL standard.

The point about the overlap refers more to the internals.  The entire
parsing, look-up, type-resolution, DDL handling, and other things are
the same.

So for both of these reasons I think it's appropriate to use the same
catalog.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread David Rowley
On 15 November 2017 at 04:23, Alvaro Herrera  wrote:
> David Rowley wrote:
>> I'd have thought some sort of: ALTER INDEX name_of_partitioned_index
>> REPLACE INDEX FOR partitioned_tablename WITH
>> name_of_new_matching_bloat_free_index;
>>
>> ... or something along those lines, and just have an atomic swap out
>> of the index with some new one that's been created.
>
> (My intention here is to avoid scope creep.)

OK, I didn't really mean that this would be required for an initial
patch. It's something that could be invented at some later date.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread Alvaro Herrera
Hi Jesper,

Thanks for reviewing.

Jesper Pedersen wrote:

> I have been looking at the "CREATE INDEX ... ONLY" syntax, and I think it
> could cause some confusion due to the "Note" described in create_index.sgml.
> 
> An alternative, maybe crazy, could be to treat a partitioned index as one;
> e.g. all operations are on the definition.

Well, honestly that's the way I wanted partitioning to be defined, right
from the start -- my first proposal was that partitions were not to be
standalone objects.  But I was outvoted once I stepped out of the
implementation, and partitioning is now the way we have it -- each
partition its own separate object.  Until we change that view (if we
ever do), I think the only sensible decision is that the whole feature
works that way.

> Some comments. [...]

Thanks, will fix.  Except for this one:

> A small thing, for
> 
> -- test.sql --
> CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a);
> CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2,
> REMAINDER 0);
> CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2,
> REMAINDER 1);
> CREATE INDEX idx_test_a ON test (a);
> -- test.sql --
> 
> I would expect that the index names were 'test_p00_idx_test_a' and
> 'test_p01_idx_test_a'.

Hmm, the code in my patch just uses the standard naming routine for
indexes.  I'm disinclined to change it in the initial commit, but
perhaps we can change that later.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-14 Thread Peter Geoghegan
On Tue, Nov 14, 2017 at 11:02 AM, Simon Riggs  wrote:
> That's a good place to leave this for now - we're OK to make progress
> with the main feature, and we have some questions to be addressed once
> we have a cake to decorate.
>
> Thanks for your input.

Thanks for listening. I regret that it became heated, but I'm glad
that we now understand each other's perspective. I'm also glad that
you're pushing ahead with MERGE as a project, because MERGE is
certainly a compelling feature.

-- 
Peter Geoghegan



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread Jesper Pedersen

Hi,

On 11/14/2017 12:49 PM, Alvaro Herrera wrote:

Thanks, pushed.


Here's the remaining bits, rebased.



First of all, thanks for working on this.

I have been looking at the "CREATE INDEX ... ONLY" syntax, and I think 
it could cause some confusion due to the "Note" described in 
create_index.sgml.


An alternative, maybe crazy, could be to treat a partitioned index as 
one; e.g. all operations are on the definition. That way ONLY, ATTACH 
and DETACH could be eliminated. Once a partition is detached any 
partitioned indexes would be marked as a normal index, and a partition 
could only be attached if it had indexes satisfying all partition index 
definitions. Bloat could be handled by swapping the index as suggested 
by David [1]. It would be less flexible, but people always have the 
option to define indexes directly on the partitions.


Some comments.

0004's alter_index.sgml is missing the "PARTITION" keyword for both the 
ATTACH and DETACH commands.


A small thing, for

-- test.sql --
CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a);
CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 0);
CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 1);

CREATE INDEX idx_test_a ON test (a);
-- test.sql --

I would expect that the index names were 'test_p00_idx_test_a' and 
'test_p01_idx_test_a'.


psql completion for "CREATE INDEX blah ON tes" only lists the child 
tables. Completion for the ALTER INDEX commands is missing.


[1] 
https://www.postgresql.org/message-id/CAKJS1f_Wf%3DM06o8iQi72SxN%2BZpLV4c0CwYoN5xYVY4aWWX-jBQ%40mail.gmail.com


Best regards,
 Jesper



Re: [HACKERS] Transaction control in procedures

2017-11-14 Thread Peter Eisentraut
On 11/14/17 09:27, Merlin Moncure wrote:
> *) Will it be possible to do operations like this in pl/pgsql?
> 
> BEGIN
>   SELECT INTO r * FROM foo;
> 
>   START TRANSACTION;  -- perhaps we ought to have a special function
> for this instead (BEGIN is reserved, etc).
>   SET transaction_isolation TO serializable;
> ...

Eventually, I don't see why not.  Currently, it's not complete.

One detail in your example is that when you enter the procedure, you are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.

Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
have to implement a separate code path for that, but that would just be
a bit of leg work.
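
As a rough sketch of where this could eventually land in PL/pgSQL (using
the hypothetical table 'foo' from the quoted example; this syntax was not
fully working at the time of writing):

CREATE PROCEDURE demo() LANGUAGE plpgsql AS $$
DECLARE
  r record;
BEGIN
  SELECT * INTO r FROM foo;  -- runs in the implicit surrounding transaction
  COMMIT;                    -- end that transaction first ...
  START TRANSACTION;         -- ... then open a new one explicitly
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- needs the non-SPI path
  COMMIT;
END;
$$;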

>  *) Will there be any negative consequences to a procedure running
> with an unbounded run time?  For example, something like:
> 
> LOOP
>   SELECT check_for_stuff_to_do();
> 
>   IF stuff_to_do
>   THEN
> do_stuff();
>   ELSE
> PERFORM pg_sleep(1);
>   END IF;
> END LOOP;

That procedure doesn't do anything with transactions, so it's just like
a long-running function.  Otherwise, it'd just be like long-running
client code managing transactions.

> *) Will pg_cancel_backend() cancel the currently executing statement
> or the procedure? (I guess probably the procedure but I'm curious)

Same as the way it currently works.  It will raise an exception, which
will travel up the stack and eventually issue an error or be caught.  If
someone knows more specific concerns here I could look into it, but I
don't see any problem.

> *) Will long running procedures be subject to statement timeout (and
> does it apply to the entire procedure)?

See previous item.

> Will they be able to control
> statement_timeout from within the procedure itself?

The statement timeout alarm is set by the top-level execution loop, so
you can't change a statement timeout that is already in progress.  But
you could change the GUC and commit it for the next top-level statement.

> *) Will pg_stat_activity show the invoking CALL or the currently
> executing statement?  I see a strong argument for showing both of
> these things. although I understand that's out of scope here.

Not different from a function execution, i.e., top-level statement.

> If these questions (especially the first two) come down the correct
> way, then it will mean that I can stop coding in other languages
> (primarily bash) for a fairly large number of cases that I really
> think belong in the database itself.  This would really simplify
> coding; some things in bash are really awkward to get right, such as a
> mutex to guarantee single script invocation.  My only real dependency
> on the operating system environment at that point would be cron to
> step in to the background daemon process (which would immediately set
> an advisory lock).

Well, some kind of scheduler around procedures would be pretty cool at
some point.

> I'm somewhat surprised that SPI is the point of attack for this
> functionality, but if it works that's really the best case scenario
> (the only downside I can see is that the various out of core pl/s have
> to implement the interface individually).

So I tried different things, and I'll list them here to explain how
I got there.

Option zero is to not use SPI at all and implement a whole new internal
command execution system.  But that would obviously be a large amount of
work, and it would look 85% like SPI, and as it turns out it's not
necessary.

The first thing I tried was to run transaction control statements
through SPI.  That turned out to be very complicated, confusing, and
fragile, mainly because of the dichotomy between the internal
subtransaction management that the PLs do on the one hand and the
explicit transaction control statements on the other.  It was just a
giant unworkable mess.

The next thing I tried was to shut down SPI (SPI_finish) before a
transaction boundary command and restart it (SPI_connect) afterwards.
That would work in principle, but it would require a fair amount of
work in each PL, because they implicitly rely on SPI (or perhaps are
tangled up with SPI) for memory management.

The setup I finally arrived at was to implement the transaction boundary
commands as SPI API calls and let them internally make sure that only
the appropriate stuff is cleared away at transaction boundaries.  This
turned out to be the easiest and cleanest.  I have since the last patch
implemented the transaction control capabilities in PL/pgSQL, PL/Perl,
and PL/Tcl, and it was entirely trivial once the details were worked out
as I had shown in PL/Python.  I will post an updated patch with this soon.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Simon Riggs wrote:
> > On 13 November 2017 at 12:55, Alvaro Herrera  
> > wrote:
> > > Somehow I managed to include an unrelated patch as attachment.  Here's
> > > another attempt (on which I also lightly touched ddl.sgml with relevant
> > > changes).
> > 
> > Looks good. Some minor comments below.
> > 
> > 0001- Simplify
> > Seems useful as separate step; agree with everything, no further comments
> 
> Thanks, pushed.

Here's the remaining bits, rebased.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c9bb7365be8cc542660379632369bfebe5e2d64b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 26 Oct 2017 11:15:39 +0200
Subject: [PATCH v5 1/4] Get rid of copy_partition_key

Instead, allocate the objects in a temporary context right from the
start, and only make it permanent once no errors can occur anymore.

Reviewed-by: Robert Haas, Tom Lane
Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
---
 src/backend/utils/cache/relcache.c | 69 +-
 1 file changed, 8 insertions(+), 61 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 1908420d82..59280cd8bb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -262,7 +262,6 @@ static Relation AllocateRelationDesc(Form_pg_class relp);
 static void RelationParseRelOptions(Relation relation, HeapTuple tuple);
 static void RelationBuildTupleDesc(Relation relation);
 static void RelationBuildPartitionKey(Relation relation);
-static PartitionKey copy_partition_key(PartitionKey fromkey);
 static Relation RelationBuildDesc(Oid targetRelId, bool insertIt);
 static void RelationInitPhysicalAddr(Relation relation);
 static void load_critical_index(Oid indexoid, Oid heapoid);
@@ -846,6 +845,11 @@ RelationBuildPartitionKey(Relation relation)
if (!HeapTupleIsValid(tuple))
return;
 
+   partkeycxt = AllocSetContextCreate(CurTransactionContext,
+				      RelationGetRelationName(relation),
+				      ALLOCSET_SMALL_SIZES);
+   oldcxt = MemoryContextSwitchTo(partkeycxt);
+
key = (PartitionKey) palloc0(sizeof(PartitionKeyData));
 
/* Fixed-length attributes */
@@ -983,71 +987,14 @@ RelationBuildPartitionKey(Relation relation)
 
ReleaseSysCache(tuple);
 
-   /* Success --- now copy to the cache memory */
-   partkeycxt = AllocSetContextCreate(CacheMemoryContext,
-				      RelationGetRelationName(relation),
-				      ALLOCSET_SMALL_SIZES);
+   /* Success --- make the relcache point to the newly constructed key */
+   MemoryContextSetParent(partkeycxt, CacheMemoryContext);
relation->rd_partkeycxt = partkeycxt;
-   oldcxt = MemoryContextSwitchTo(relation->rd_partkeycxt);
-   relation->rd_partkey = copy_partition_key(key);
+   relation->rd_partkey = key;
MemoryContextSwitchTo(oldcxt);
 }
 
 /*
- * copy_partition_key
- *
- * The copy is allocated in the current memory context.
- */
-static PartitionKey
-copy_partition_key(PartitionKey fromkey)
-{
-   PartitionKey newkey;
-   int n;
-
-   newkey = (PartitionKey) palloc(sizeof(PartitionKeyData));
-
-   newkey->strategy = fromkey->strategy;
-   newkey->partnatts = n = fromkey->partnatts;
-
-   newkey->partattrs = (AttrNumber *) palloc(n * sizeof(AttrNumber));
-   memcpy(newkey->partattrs, fromkey->partattrs, n * sizeof(AttrNumber));
-
-   newkey->partexprs = copyObject(fromkey->partexprs);
-
-   newkey->partopfamily = (Oid *) palloc(n * sizeof(Oid));
-   memcpy(newkey->partopfamily, fromkey->partopfamily, n * sizeof(Oid));
-
-   newkey->partopcintype = (Oid *) palloc(n * sizeof(Oid));
-   memcpy(newkey->partopcintype, fromkey->partopcintype, n * sizeof(Oid));
-
-   newkey->partsupfunc = (FmgrInfo *) palloc(n * sizeof(FmgrInfo));
-   memcpy(newkey->partsupfunc, fromkey->partsupfunc, n * sizeof(FmgrInfo));
-
-   newkey->partcollation = (Oid *) palloc(n * sizeof(Oid));
-   memcpy(newkey->partcollation, fromkey->partcollation, n * sizeof(Oid));
-
-   newkey->parttypid = (Oid *) palloc(n * sizeof(Oid));
-   memcpy(newkey->parttypid, fromkey->parttypid, n * sizeof(Oid));
-
-   newkey->parttypmod = (int32 *) palloc(n * sizeof(int32));
-   memcpy(newkey->parttypmod, fromkey->parttypmod, n * sizeof(int32));
-
-   newkey->parttyplen = (int16 *) palloc(n * sizeof(int16));
-   memcpy(newkey->parttyplen, fromkey->parttyplen, n * sizeof(int16));
-
-   

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-11-14 Thread Peter Moser
2017-11-11 13:19 GMT+01:00 Robert Haas :
> This is really good input.  If the feature weren't useful, then it
> wouldn't make sense to try to figure out how to integrate it, but if
> it is, then we should try.

We are happy to hear this and will do the implementation.  Any input
regarding the implementation is much appreciated.

> I don't think that implementing a feature like this by SQL
> transformation can work.  It's certainly got the advantage of
> simplicity of implemention, but there are quite a few things that seem
> like they won't always work correctly.
> [...]
> Overall, I think that the whole approach here probably needs to be
> scrapped and rethought.  The stuff this patch is doing really belongs
> in the optimizer, not the parser, I think.  It could possibly happen
> at a relatively early stage in the optimizer so that the rest of the
> optimizer can see the results of the transformation and, well,
> optimize.  But parse time is way too early.

We create these query rewrites during the parser stage because we want
the optimizer to choose the best strategy for each rewritten subplan,
and our executor nodes to get the desired input format in the most
efficient way. Our goal was an integration that fully re-uses the
existing PostgreSQL rewrites and optimizations.

Another approach is to optimize the temporal primitives manually.
This does not reuse existing PostgreSQL optimizations automatically.

Is there a general guideline or policy as to which approach is
preferable?

Regarding all the other issues, we will look into them in detail and
report back soon.


Best regards,
Anton, Johann, Michael, Peter



Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-11-14 Thread Peter Moser
2017-10-06 19:22 GMT+02:00 Paul A Jungwirth :
> I just wanted to chime in and say that the work these people have done
> is *amazing*. I read two of their papers yesterday [1, 2], and if you
> are interested in temporal data, I encourage you to read them too. The
> first one is only 12 pages and quite readable. After that the second
> is easy because it covers a lot of the same ground but adds "scaling"
> of values when a tuple is split, and some other interesting points.
> Their contributions could be used to implement SQL:2011 syntax but go
> way beyond that.
>
> Almost every project I work on could use temporal database support,
> but there is nothing available in the Open Source world. The
> temporal_tables extension [3] offers transaction-time support, which
> is great for auditing, but it has no valid-time support (aka
> application-time or state-time). Same with Magnus Hagander's TARDIS
> approach [4], Chronomodel [5] (an extension to the Rails ORM), or any
> other project I've seen. But valid-time is the more valuable
> dimension, because it tells you the history of the thing itself (not
> just when the database was changed). Also nothing is even attempting
> full bitemporal support.
>
> The ideas behind temporal data are covered extensively in Snodgrass's
> 1999 book [6], which shows how valuable it is to handle temporal data
> in a principled way, rather than ad hoc. But that book also
> demonstrates how complex the queries become to do things like temporal
> foreign key constraints and temporal joins. I was sad to learn that
> his proposed TSQL2 was rejected as a standard back in the 90s,
> although the critiques by C. J. Date [7] have some merit. In
> particular, since TSQL2 used *statement* modifiers, some of the
> behavior was unclear or bad when using subqueries, views, and
> set-returning functions. It makes more sense to have temporal
> *operators*, so alongside inner join you have temporal inner join, and
> likewise with temporal left outer join, temporal
> union/intersection/difference, temporal aggregation, etc. (I think the
> drawbacks of TSQL2 came from pursuing an unachievable goal, which was
> to enable seamlessly converting existing non-temporal tables to
> temporal without breaking any queries.)
>
> Another unsatisfactory approach at historical data, from the industry
> rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's
> *Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions)
> is to ignore the problem and overwrite old data with new. His Type 2
> approach (make a new row) is better but loses the continuity between
> the old row and the new. Type 3 fixes that but supports only one
> change, not several. And anyway his ideas are tailored to star-schema
> designs so are not as broadly useful. Workarounds like bridge tables
> and "put the data in the fact table" are even more wedded to a
> star-schema approach. But I think his efforts do show how valuable
> historical data is, and how hard it is to handle without built-in
> support.
>
> As far as I can tell SQL:2011 avoids the statement modifier problem
> (I'm not 100% sure), but it is quite limited, mostly covering
> transaction-time semantics and not giving any way to do valid-time
> outer joins or aggregations. It is clearly an early first step.
> Unfortunately the syntax feels (to me) crippled by over-specificity,
> like it will have a hard time growing to support all the things you'd
> want to do.
>
> The research by Dignös et al shows how you can define temporal
> variants for every operator in the relational algebra, and then
> implement them by using just two transformations (ALIGN and NORMALIZE)
> combined with the existing non-temporal operators. It has a strong
> theoretical basis and avoids the TSQL2 problems with composability.
> And unlike SQL:2011 it has a great elegance and completeness I haven't
> seen anywhere else.
>
> I believe with range types the approach was to build up useful
> primitives rather than jumping straight to a less-factored full
> implementation of temporal features. (This in spite of SQL:2011
> choosing to model begin/end times as separate columns, not as ranges.
> :-) It seems to me the Dignös work follows the same philosophy. Their
> ALIGN and NORMALIZE could be used to implement SQL:2011 features, but
> they are also useful for much more. In their papers they actually
> suggest that these transformations need not be exposed to end-users,
> although it was convenient to have access to them for their own
> research. I think it'd be great if Postgres's SQL dialect supported
> them though, since SQL:2011 leaves out so much.
>
> Anyway, I wanted to thank them for their excellent work, their
> generosity, and also their perseverance. ([1] is from 2012 and was
> built against Postgres 9.0!) I hope we take their contribution
> seriously, because it would truly move Postgres's temporal support
> beyond any database on the market.


plpgsql test layout

2017-11-14 Thread Peter Eisentraut
Something that has been bothering me for a while: while we have neatly
organized test files for plperl, plpython, and pltcl, the plpgsql tests
are all in one giant file in the main test suite, which makes development
and testing of plpgsql cumbersome.  It is by far the largest test file
after numeric_big now.

One option would be to create a new test setup under src/pl/pgsql(/src)
and move some of the test material from the main test suite there.  Some
of the test cases in the main test suite are more about SPI and triggers
and such, so it makes sense to keep these in the main line.  Of course
finding the cut-off might be hard.  Or maybe we'll just start with new
stuff from now on.

Any thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] SQL procedures

2017-11-14 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/8/17 09:54, Tom Lane wrote:
>> Do procedures of this ilk belong in pg_proc at all?  It seems like a large
>> fraction of the attributes tracked in pg_proc are senseless for this
>> purpose.  A new catalog might be a better approach.

> The common functionality between functions and procedures is like 98%
> [citation needed], so they definitely belong there, even more so than
> aggregates, for example.

No, I don't think so.  The core reason why not is that in

SELECT foo(...) FROM ...

foo() might be either a plain function or an aggregate, so it's important
that functions and aggregates share the same namespace.  *That* is why
they are in the same catalog.  On the other hand, since the above syntax
is not usable to call a SQL procedure, putting SQL procedures into pg_proc
just creates namespacing conflicts.  Do we really want the existence of
a function foo(int) to mean that you can't create a SQL procedure named
foo and taking one int argument?
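
For illustration, a plain function and an aggregate are invoked with
exactly the same call syntax (hypothetical names):

CREATE FUNCTION foo(int) RETURNS int
    LANGUAGE sql AS $$ SELECT $1 + 1 $$;
CREATE AGGREGATE agg_foo(int) (SFUNC = int4pl, STYPE = int4);

SELECT foo(x) FROM generate_series(1, 3) AS t(x);      -- plain function
SELECT agg_foo(x) FROM generate_series(1, 3) AS t(x);  -- aggregate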

regards, tom lane



Re: [HACKERS] parallelize queries containing initplans

2017-11-14 Thread Robert Haas
On Tue, Nov 14, 2017 at 10:37 AM, Robert Haas  wrote:
> The problem would happen if the plan for InitPlan $1 in the above
> example itself referenced a parameter from an upper query level, and
> the value of that parameter changed, and then this section of the plan
> tree was rescanned.  I'm not sure I can write a query like that
> off-hand, but I think it's possible to do so.

OK, here's an example:

regression=# explain select * from tenk1 a where unique1 = (select
unique1 from tenk1 b where (select string4 from tenk1 c where c.ten =
a.ten order by unique1 limit 1) < b.string4 limit 1);
QUERY PLAN
--
 Seq Scan on tenk1 a  (cost=0.00..22051.31 rows=50 width=244)
   Filter: (unique1 = (SubPlan 2))
   SubPlan 2
 ->  Limit  (cost=2.01..2.16 rows=1 width=4)
   InitPlan 1 (returns $1)
 ->  Limit  (cost=0.29..2.01 rows=1 width=68)
   ->  Index Scan using tenk1_unique1 on tenk1 c
(cost=0.29..1727.20 rows=1000 width=68)
 Filter: (ten = a.ten)
   ->  Seq Scan on tenk1 b  (cost=0.00..483.00 rows= width=4)
 Filter: ($1 < string4)
(10 rows)

InitPlan 1 has got to be re-evaluated for every row in the "Seq Scan
on tenk1 a", and each iteration could return a different value for $1,
and some of those values might be wider than others -- well, not
really, because in this example string4 is actually declared as type
"name".  But if you imagine it as type "text" then you can see the
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SQL procedures

2017-11-14 Thread Peter Eisentraut
On 11/8/17 09:54, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 10/31/17 14:23, Tom Lane wrote:
>>> Why not use VOIDOID for the prorettype value?
> 
>> We need a way to distinguish functions that are callable by SELECT and
>> procedures that are callable by CALL.
> 
> Do procedures of this ilk belong in pg_proc at all?  It seems like a large
> fraction of the attributes tracked in pg_proc are senseless for this
> purpose.  A new catalog might be a better approach.

The common functionality between functions and procedures is like 98%
[citation needed], so they definitely belong there, even more so than
aggregates, for example.

> In any case, I buy none of your arguments that 0 is a better choice than a
> new pseudotype.

Well, I haven't heard any reasons for doing it differently, so I can't
judge the relative merits of either approach.  Ultimately, it would be a
minor detail as far as the code is concerned, I think.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] parallelize queries containing initplans

2017-11-14 Thread Robert Haas
On Tue, Nov 14, 2017 at 12:00 AM, Amit Kapila  wrote:
> I think this would have been a matter of concern if we use initplans
> below Gather/Gather Merge.  The patch uses initplans which are between
> current query level and root.  So, I think we don't need to reevaluate
> such parameters.  Let us try to see it via example:
>
>
>QUERY PLAN
> 
>  Aggregate  (cost=62.08..62.08 rows=1 width=8)
>InitPlan 1 (returns $1)
>  ->  Finalize Aggregate  (cost=20.64..20.65 rows=1 width=8)
>->  Gather  (cost=20.63..20.64 rows=2 width=8)
>  Workers Planned: 2
>  ->  Partial Aggregate  (cost=20.63..20.64 rows=1 width=8)
>->  Parallel Seq Scan on t3  (cost=0.00..18.50
> rows=850 width=4)
>->  Hash Join  (cost=20.75..41.42 rows=1 width=4)
>  Hash Cond: (t1.j = t2.j)
>  ->  Gather  (cost=0.00..20.63 rows=10 width=12)
>Workers Planned: 2
>Params Evaluated: $1
>->  Parallel Seq Scan on t1  (cost=0.00..20.63 rows=4 width=12)
>  Filter: (k = $1)
>  ->  Hash  (cost=20.63..20.63 rows=10 width=8)
>->  Gather  (cost=0.00..20.63 rows=10 width=8)
>  Workers Planned: 2
>  Params Evaluated: $1
>  ->  Parallel Seq Scan on t2  (cost=0.00..20.63
> rows=4 width=8)
>Filter: (k = $1)
> (20 rows)
>
>
> Now, here even if initplan would have been an undirect correlated
> plan, it wouldn't have been a problem, because we don't need to
> reevaluate it for Gather node below Hash.
>
> Am I missing something?  Do you have some test or shape of the plan in
> mind which can cause a problem?

The problem would happen if the plan for InitPlan $1 in the above
example itself referenced a parameter from an upper query level, and
the value of that parameter changed, and then this section of the plan
tree was rescanned.  I'm not sure I can write a query like that
off-hand, but I think it's possible to do so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] log_destination=file

2017-11-14 Thread Robert Haas
On Sun, Sep 10, 2017 at 5:29 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> average latency:
>
> clients patch   master
> 10  0.321   0.286
> 20  0.669   0.602
> 30  1.016   0.942
> 40  1.358   1.280
> 50  1.727   1.637

That's still a noticeable slowdown, though.  And we've had previous
reports of the overhead of logging being significant as well:

http://postgr.es/m/caclsapsa7u0gcfpojvqem6sgtekv8vnwdbfhvi+dqo+gu5g...@mail.gmail.com

I seem to recall a discussion, perhaps in-person, around the time Theo
submitted that patch where it was reported that the logging collector
could not be used on some systems he was working with because it
became a major performance bottleneck.  With each backend writing its
own messages to a file, it was tolerable, but when you tried to funnel
everything through a single process, the back-pressure slowed down the
entire system unacceptably.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SQL procedures

2017-11-14 Thread Peter Eisentraut
On 11/8/17 12:15, Merlin Moncure wrote:
> On Wed, Nov 8, 2017 at 11:03 AM, Peter Eisentraut
>  wrote:
>> On 11/8/17 11:11, Merlin Moncure wrote:
>>> On Wed, Nov 8, 2017 at 9:13 AM, Peter Eisentraut
>>>  wrote:
 I have already submitted a separate patch that addresses these questions.
>>>
>>> Maybe I'm obtuse, but I'm not seeing it? In very interested in the
>>> general approach to transaction management; if you've described it in
>>> the patch I'll read it there.  Thanks for doing this.
>>
>> https://www.postgresql.org/message-id/178d3380-0fae-2982-00d6-c43100bc8...@2ndquadrant.com
> 
> All right, thanks.  So,

Could we have this discussion in that other thread?  This thread is just
about procedures at mostly a syntax level.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread Alvaro Herrera
Simon Riggs wrote:
> On 13 November 2017 at 12:55, Alvaro Herrera  wrote:
> > Somehow I managed to include an unrelated patch as attachment.  Here's
> > another attempt (on which I also lightly touched ddl.sgml with relevant
> > changes).
> 
> Looks good. Some minor comments below.
> 
> 0001- Simplify
> Seems useful as separate step; agree with everything, no further comments

Thanks, pushed.

> Why uint16? Why not just uint?

*Shrug*.  16 bits seem plenty enough.  I changed it to bits16, which we
use in other places for bitmasks.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Transaction control in procedures

2017-11-14 Thread Merlin Moncure
On Wed, Nov 8, 2017 at 5:48 PM, Simon Riggs  wrote:
> On 31 October 2017 at 15:38, Peter Eisentraut
>  wrote:
>> Here is a patch that implements transaction control in PL/Python
>> procedures.  (This patch goes on top of "SQL procedures" patch v1.)
>
> The patch is incredibly short for such a feature, which is probably a
> good indication that it is feasible.
>
> Amazing!

I have to agree with that.  I'm really excited about this...

Some questions:
*) Will it be possible to do operations like this in pl/pgsql?

BEGIN
  SELECT INTO r * FROM foo;

  START TRANSACTION;  -- perhaps we ought to have a special function
for this instead (BEGIN is reserved, etc).
  SET transaction_isolation TO serializable;
...

 *) Will there be any negative consequences to a procedure running
with an unbounded run time?  For example, something like:

LOOP
  SELECT check_for_stuff_to_do();

  IF stuff_to_do
  THEN
do_stuff();
  ELSE
PERFORM pg_sleep(1);
  END IF;
END LOOP;

*) Will pg_cancel_backend() cancel the currently executing statement
or the procedure? (I guess probably the procedure but I'm curious)

*) Will long running procedures be subject to statement timeout (and
does it apply to the entire procedure)?  Will they be able to control
statement_timeout from within the procedure itself?

*) Will pg_stat_activity show the invoking CALL or the currently
executing statement?  I see a strong argument for showing both of
these things. although I understand that's out of scope here.

If these questions (especially the first two) come down the correct
way, then it will mean that I can stop coding in other languages
(primarily bash) for a fairly large number of cases that I really
think belong in the database itself.  This would really simplify
coding; some things in bash are really awkward to get right, such as a
mutex to guarantee single script invocation.  My only real dependency
on the operating system environment at that point would be cron to
step in to the background daemon process (which would immediately set
an advisory lock).

I'm somewhat surprised that SPI is the point of attack for this
functionality, but if it works that's really the best case scenario
(the only downside I can see is that the various out of core pl/s have
to implement the interface individually).

merlin



Re: [HACKERS] Pluggable storage

2017-11-14 Thread Amit Kapila
On Tue, Nov 14, 2017 at 4:12 PM, Alvaro Herrera  wrote:
> Hmm.  Am I reading it right that this discussion led to moving
> essentially all code from tqual.c to heapam?  Given the hard time we've
> had to get tqual.c right, it seems fundamentally misguided to me to
> require that every single storage AM reimplements all the visibility
> routines.
>
> I think that changing tqual's API (such as not passing HeapTuples
> anymore but some other more general representation) would be okay and
> should be sufficient, but this wholesale movement of code seems
> dangerous and wasteful in terms of future reimplementations that will be
> necessary.
>

I don't think the idea is to touch the existing tqual.c in any
significant way.  However, some other storage engine might need a
different way to check the visibility of tuples, so we need provision
for that.  I think for a storage engine where tuple headers no longer
contain transaction information and/or the old versions of tuples are
chained in separate storage (say undo storage), the current visibility
routines can't be used as-is.  I think the current tqual.c is quite
tightly coupled with the HeapTuple representation, so changing that or
adding more code to it for another storage engine with a different
tuple format won't be of much use.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread Alvaro Herrera
David Rowley wrote:
> On 15 November 2017 at 01:09, Alvaro Herrera  wrote:
> > if a
> > partition exists which *doesn't* have the index, restoring things this
> > way would create the index in that partition too, which is unwanted
> > because the end state is different to what was in the dumped database.
> 
> hmm, but surely all those indexes must already exist if the
> partitioned index exists. Won't we be disallowing DROP INDEX of the
> leaf partition indexes if that index is marked as being part of the
> partitioned index?

In normal cases, sure -- but you can do ALTER INDEX DETACH on a
partition, then drop the index.  This is useful for example if you want
to replace some partition's index because of bloat: create a replacement
index, detach the old one, attach the new one, drop the old one.  Now
you probably don't *want* to take a pg_dump in the middle of this
process ...  but pg_dump's charter is not to produce the output we think
would be most convenient to users most of the time, but rather to
accurately describe what is in the database.
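
Spelled out with the syntax under discussion (hypothetical names; ALTER
INDEX ... DETACH/ATTACH PARTITION as proposed in this thread):

CREATE INDEX tab_p1_a_idx2 ON tab_p1 (a);              -- replacement index
ALTER INDEX tab_a_idx DETACH PARTITION tab_p1_a_idx;   -- detach the bloated one
ALTER INDEX tab_a_idx ATTACH PARTITION tab_p1_a_idx2;  -- attach the new one
DROP INDEX tab_p1_a_idx;                               -- drop the old one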

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-11-14 Thread Magnus Hagander
On Mon, Nov 13, 2017 at 3:17 PM, Dean Rasheed 
wrote:

> On 28 October 2017 at 13:46, Pavel Stehule 
> wrote:
> > I thought about Alexander's proposal, and I am thinking it can
> > probably be best if we respect the psql design. I implemented two
> > command suffixes (supported only where it makes sense): "s" sorted by
> > size and "d" as descending
> >
> > so list of tables can be sorted with commands:
> >
> > \dt+sd (in this case, the order is not strict), so command
> > \dtsd+ is working too (same \disd+ or \di+sd)
> >
> > These two chars are acceptable. Same principle is used for \l command
> >
> > \lsd+ or \l+sd
> >
> > What do you think about it?
> >
>
> I really hate that syntax. This is going to turn into an
> incomprehensible mess, and isn't easily extended to support other
> options.
>

+1. While useful in itself, I think it's definitely a dangerous pattern to
go down, as it'll only get worse.


> I agree with people who have said they would prefer this to be
> available as a per-command option rather than as a variable that you
> have to set, but it needs a clearer syntax. I actually like Stephen's
> idea of using a user-defined SQL snippet, because that's a familiar
> syntax to people, and it avoids adding an ever-increasing number of
> options to these commands. Instead, the syntax could simply be:
>

+1 here as well. And anybody who is actually going to need this level of
control definitely will know SQL...

And if one wants to save some "standard patterns", it should be doable to
save the pattern itself in a variable and then use it with something like
"\dt :mysort" and have it expand the normal way there.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread David Rowley
On 15 November 2017 at 01:09, Alvaro Herrera  wrote:
> if a
> partition exists which *doesn't* have the index, restoring things this
> way would create the index in that partition too, which is unwanted
> because the end state is different to what was in the dumped database.

hmm, but surely all those indexes must already exist if the
partitioned index exists. Won't we be disallowing DROP INDEX of the
leaf partition indexes if that index is marked as being part of the
partitioned index?

If so, then surely this could only happen if someone manually edited
the pg_dump file to remove the CREATE INDEX statement for the leaf
partition, and if they do that, then maybe they won't be so surprised
that CREATE INDEX has to create some indexes.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Pluggable storage

2017-11-14 Thread Alvaro Herrera
Hmm.  Am I reading it right that this discussion led to moving
essentially all code from tqual.c to heapam?  Given the hard time we've
had to get tqual.c right, it seems fundamentally misguided to me to
require that every single storage AM reimplements all the visibility
routines.

I think that changing tqual's API (such as not passing HeapTuples
anymore but some other more general representation) would be okay and
should be sufficient, but this wholesale movement of code seems
dangerous and wasteful in terms of future reimplementations that will be
necessary.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-14 Thread David Rowley
On 8 October 2017 at 02:34, Robert Haas  wrote:
> However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names.  If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.

I might be arriving late to the party here, but I see mention here that
when we ATTACH a partition we'd check to see if any of that
table's indexes match the index definition required for the
partitioned table's "partitioned index", and just use those indexes
and incorporate them into the partitioned index by setting the new OID
column in pg_index (or whatever way was decided to record the
dependency)... So, if that works for ATTACH, then can't pg_dump just
create each of the partition's indexes first, then create the
partitioned table's partition index, and that command would just go
through checking each partition similar to how ATTACH would work. That
way we get to keep all the names as pg_dump would just dump the
indexes belonging to each partition like it were any other index.
Those indexes would only become a bit more special once the
partitioned index is created on the parent.

I've not read the patch yet, but reading the thread it does not seem
to have gone in this direction. Maybe I'm overlooking something?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



RE: [HACKERS][PATCH]pg_buffercache add a buffer state column, Add fuction to decode buffer state

2017-11-14 Thread Moon Insung
# I added [HACKERS] to the mail subject.

Dear Andres Freund.

Thank you for review!
> I'm disinclined to exposing state that way. It's an internal representation 
> that's not unlikely to change. Sure,
> pg_buffercache is more of a debugging / investigatory tool, but I 
> nevertheless see no reason to expose it that way.

Okay!
I'll not print (or add) the internal value directly
(and I'll be careful when creating another patch).
Thank you!

> One way around that would be to create a buffer_state type that's returned by 
> pg_buffercache and then only decoded when
> outputting.  Doing that + having a cast to an array seems like it'd provide 
> most of the needed functionality?

Is it better to output the decoded state value from the pg_buffercache view?
For example, the following output:

-
postgres=# select * from pg_buffercache where bufferid = 1; 
-[ RECORD 1 ]+---
bufferid | 1
relfilenode  | 1262
reltablespace| 1664
reldatabase  | 0
relforknumber| 0
relblocknumber   | 0
isdirty  | f
usagecount   | 5
pinning_backends | 0
buffer_state | {LOCKED,VALID,TAG_VALID,PERMANENT}
-

Is this right?
If it is correct, I'll modify the patch ASAP.
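
For what it's worth, a sketch of the decode step in SQL (function name
hypothetical; flag bit positions taken from
src/include/storage/buf_internals.h as of this writing):

CREATE FUNCTION decode_buffer_state(state int8) RETURNS text[]
LANGUAGE sql IMMUTABLE AS $$
  SELECT array_remove(ARRAY[
    CASE WHEN state & (1::int8 << 22) <> 0 THEN 'LOCKED' END,
    CASE WHEN state & (1::int8 << 23) <> 0 THEN 'DIRTY' END,
    CASE WHEN state & (1::int8 << 24) <> 0 THEN 'VALID' END,
    CASE WHEN state & (1::int8 << 25) <> 0 THEN 'TAG_VALID' END,
    CASE WHEN state & (1::int8 << 26) <> 0 THEN 'IO_IN_PROGRESS' END,
    CASE WHEN state & (1::int8 << 27) <> 0 THEN 'IO_ERROR' END,
    CASE WHEN state & (1::int8 << 28) <> 0 THEN 'JUST_DIRTIED' END,
    CASE WHEN state & (1::int8 << 29) <> 0 THEN 'PIN_COUNT_WAITER' END,
    CASE WHEN state & (1::int8 << 30) <> 0 THEN 'CHECKPOINT_NEEDED' END,
    CASE WHEN state & (1::int8 << 31) <> 0 THEN 'PERMANENT' END
  ], NULL);
$$;

-- SELECT decode_buffer_state(2203320320);
--   => {LOCKED,VALID,TAG_VALID,PERMANENT}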

Regards.
Moon.

> -Original Message-
> From: Andres Freund [mailto:and...@anarazel.de]
> Sent: Tuesday, November 14, 2017 6:07 PM
> To: Moon Insung
> Cc: 'PostgreSQL Hackers'
> Subject: Re: [PATCH]pg_buffercache add a buffer state column, Add fuction to 
> decode buffer state
> 
> Hi,
> 
> On 2017-11-14 17:57:00 +0900, Moon Insung wrote:
> > So I add a state column to pg_buffercache view so that I could print a 
> > value indicating the state of the buffer.
> > This is output as a uint32 type, and an example is shown below.
> 
> > -
> > postgres=# select * from pg_buffercache where bufferid = 1; -[ RECORD
> > 1 ]+---
> > bufferid | 1
> > relfilenode  | 1262
> > reltablespace| 1664
> > reldatabase  | 0
> > relforknumber| 0
> > relblocknumber   | 0
> > isdirty  | f
> > usagecount   | 5
> > pinning_backends | 0
> > buffer_state | 2203320320 <- it's a new column
> > -
> 
> I'm disinclined to exposing state that way. It's an internal representation 
> that's not unlikely to change. Sure,
> pg_buffercache is more of a debugging / investigatory tool, but I 
> nevertheless see no reason to expose it that way.
> 
> If we shared those flags more in a manner like you did below:
> > 1 |1262 | {LOCKED,VALID,TAG_VALID,PERMANENT}
> 
> that'd be more acceptable.  However doing that by default would have some 
> performance downsides, because we'd need to
> create these arrays for every row.
> 
> One way around that would be to create a buffer_state type that's returned by 
> pg_buffercache and then only decoded when
> outputting.  Doing that + having a cast to an array seems like it'd provide 
> most of the needed functionality?
> 
> Greetings,
> 
> Andres Freund






Re: [HACKERS] pg_basebackup --progress output for batch execution

2017-11-14 Thread Arthur Zakirov
On Fri, Nov 10, 2017 at 10:32:23AM -0300, Martín Marqués wrote:
> An example where using isatty() might fail is if you run pg_basebackup
> from a tty but redirect the output to a file, I believe that in that
> case isatty() will return true, but it's very likely that the user
> might want batch mode output.

Sorry if I misunderstood you. I think this can happen if you redirect only
standard output (stdout) to a file.
But pg_basebackup writes its messages to stderr, so you need to redirect
stderr too:

pg_basebackup -D data -X stream -R --progress --verbose &> backup

or

pg_basebackup -D data_repl -X stream -R --progress --verbose > backup 2>&1

If you redirect stderr as well, then isatty() will know that the message
output is not a tty.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PATCH]pg_buffercache add a buffer state column, Add function to decode buffer state

2017-11-14 Thread Andres Freund
Hi,

On 2017-11-14 17:57:00 +0900, Moon Insung wrote:
> So I added a state column to the pg_buffercache view so that I could print
> a value indicating the state of the buffer.
> This is output as a uint32 value, and examples are shown below.

> -
> postgres=# select * from pg_buffercache where bufferid = 1;
> -[ RECORD 1 ]----+-------------------------------
> bufferid         | 1
> relfilenode      | 1262
> reltablespace    | 1664
> reldatabase      | 0
> relforknumber    | 0
> relblocknumber   | 0
> isdirty          | f
> usagecount       | 5
> pinning_backends | 0
> buffer_state     | 2203320320 <- it's a new column
> -

I'm disinclined to expose state that way. It's an internal
representation that's not unlikely to change. Sure, pg_buffercache is
more of a debugging / investigatory tool, but I nevertheless see no
reason to expose it that way.

If we shared those flags more in a manner like you did below:
> 1 |1262 | {LOCKED,VALID,TAG_VALID,PERMANENT}

that'd be more acceptable.  However doing that by default would have
some performance downsides, because we'd need to create these arrays for
every row.

One way around that would be to create a buffer_state type that's
returned by pg_buffercache and then only decoded when outputting.  Doing
that + having a cast to an array seems like it'd provide most of the
needed functionality?
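
To make that concrete, usage might look something like this (purely a
hypothetical sketch; neither the buffer_state type nor the cast to
text[] exists yet):

SELECT bufferid, relfilenode, buffer_state::text[] AS flags
FROM pg_buffercache
WHERE isdirty;  -- buffer_state here is the proposed type, not today's column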

Greetings,

Andres Freund



[PATCH]pg_buffercache add a buffer state column, Add function to decode buffer state

2017-11-14 Thread Moon Insung
Dear Hackers.

I've been studying PostgreSQL buffers for the development of new patches.
In particular, using pg_buffercache, one can easily check the status of the
actual buffers.

But there was one inconvenience:
pg_buffercache can only show the dirty state of the buffer.

The buffer state currently comprises 10 flags, so it is impossible to check
the remaining 9 flags this way.

So I added a state column to the pg_buffercache view so that I could print
a value indicating the state of the buffer.
This is output as a uint32 value, and examples are shown below.

-
postgres=# select * from pg_buffercache where bufferid = 1;
-[ RECORD 1 ]----+-------------------------------
bufferid         | 1
relfilenode      | 1262
reltablespace    | 1664
reldatabase      | 0
relforknumber    | 0
relblocknumber   | 0
isdirty          | f
usagecount       | 5
pinning_backends | 0
buffer_state     | 2203320320 <- it's a new column
-

With the patch, users can check the raw state value of the buffers.

However, for anyone who does not know the source code or how to read the
flag bits, it's difficult or impossible to interpret the actual state even
with this patch.

Therefore, I added a new function to improve readability when checking the
state. Given a state value, this function prints out which state flags are
actually set.

Examples of actual use are as follows.

-
postgres=# SELECT bufferid, relfilenode, state_text FROM pg_buffercache,
LATERAL pg_buffercache_state_print(buffer_state) M(state_text)
WHERE bufferid < 10;
 bufferid | relfilenode |                      state_text
----------+-------------+-------------------------------------------------------
        1 |        1262 | {LOCKED,VALID,TAG_VALID,PERMANENT}
        2 |        1260 | {LOCKED,VALID,TAG_VALID,PERMANENT}
        3 |        1259 | {LOCKED,DIRTY,VALID,TAG_VALID,JUST_DIRTIED,PERMANENT}
        4 |        1259 | {LOCKED,VALID,TAG_VALID,PERMANENT}
        5 |        1259 | {LOCKED,VALID,TAG_VALID,PERMANENT}
        6 |        1249 | {LOCKED,VALID,TAG_VALID,PERMANENT}
        7 |        1249 | {LOCKED,VALID,TAG_VALID,PERMANENT}
        8 |        1249 | {LOCKED,VALID,TAG_VALID,PERMANENT}
        9 |        1249 | {LOCKED,VALID,TAG_VALID,PERMANENT}
(9 rows)
-
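
For reference, the decoding the function performs can be sketched at the
SQL level like this, assuming the flag bit positions from
src/include/storage/buf_internals.h at the time (bits 0-17 hold the
refcount and bits 18-21 the usage count, so only bits 22 and above are
flags; these internal positions may change):

-- decode the raw example value 2203320320 by testing each flag bit
SELECT f.flag
FROM (VALUES (22, 'LOCKED'),            (23, 'DIRTY'),
             (24, 'VALID'),             (25, 'TAG_VALID'),
             (26, 'IO_IN_PROGRESS'),    (27, 'IO_ERROR'),
             (28, 'JUST_DIRTIED'),      (29, 'PIN_COUNT_WAITER'),
             (30, 'CHECKPOINT_NEEDED'), (31, 'PERMANENT')) AS f(bit, flag)
WHERE (2203320320::bigint >> f.bit) & 1 = 1;
-- returns LOCKED, VALID, TAG_VALID and PERMANENT, matching the example above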

If you use this patch, I think you can easily check the state of the
buffer.

Regards.
Moon.


pg_buffercache add a buffer state column and Add function to decode buffer state_V1.diff
Description: Binary data


Re: Migration to PGLister - After

2017-11-14 Thread Pavel Golub
Hello, Tom.

You wrote:

TL> Geoff Winkless  writes:
>> The removal of the [HACKERS] (etc) header will be frustrating for me: I do
>> not sort mailing lists into folders (I simply scan the "Forums" autofilter
>> in gmail) and have no wish to do so, and there is no way to make such a
>> machine-readable header visible in gmail.

TL> I'm told it's actually very easy to handle this in gmail: set up a filter
TL> that recognizes list messages (make it key on the List-ID header) and then
TL> tags them with an appropriate label.

TL> We should put detailed instructions about how to do that on the wiki page
TL> about the migration, but not being a gmail user I can't provide exact
TL> instructions.

Here is my post with instructions and screenshots:
https://pgolub.wordpress.com/2017/11/14/hackers-and-bugs-lists-migration-and-gmails-filters/
  

>> Perhaps those users who wish to use DKIM on their email to the list could
>> be persuaded to remove the Subject line from list of optional fields to be
>> signed for list email?

TL> As Stephen mentioned, there are no good solutions that would allow us to
TL> keep mangling subject headers.  This has been debated extensively already.

TL> regards, tom lane




-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: Migration to PGLister - After

2017-11-14 Thread Pavel Golub
Hello, Geoff.

You wrote:

GW> The removal of the [HACKERS] (etc) header will be frustrating for
GW> me: I do not sort mailing lists into folders (I simply scan the
GW> "Forums" autofilter in gmail) and have no wish to do so, and there
GW> is no way to make such a machine-readable header visible in gmail.


Who had trouble with Gmail's filters and pgsql-hackers & pgsql-bugs yesterday?
https://pgolub.wordpress.com/2017/11/14/hackers-and-bugs-lists-migration-and-gmails-filters/

tl;dr: use the list: operator in searches and filters
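
For example, assuming the post-migration List-Id values, a Gmail search or
filter criterion such as

list:pgsql-hackers.lists.postgresql.org

will match all traffic from this list.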

GW> Perhaps those users who wish to use DKIM on their email to the
GW> list could be persuaded to remove the Subject line from list of
GW> optional fields to be signed for list email? 


GW> It seems to me unreasonable that those users who don't want to
GW> use a feature should be inconvenienced by those who do.


GW> Geoff

GW> On 13 November 2017 at 13:14, Stephen Frost  wrote:

GW> Greetings!

GW>  This list has now been migrated to new mailing list software known as
GW>  'PGLister'.  This migration will impact all users of this mailing list
GW>  in one way or another.

GW>  If you would like to unsubscribe from this mailing list, please click on
GW>  'Show Original' or 'Show Headers' in your client and find the
GW>  'List-Unsubscribe' header which will include a link that you can click
GW>  on (or copy/paste into your browser) and use to unsubscribe yourself
GW>  from this list.

GW>  The changes which we expect to be most significant to users can be found
GW>  on the wiki here:
GW> https://wiki.postgresql.org/wiki/PGLister_Announce the
GW>  current version of which is also included below.

GW>  Thank you!

GW>  Stephen (on behalf of the pginfra team)





-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: [HACKERS] Transform for pl/perl

2017-11-14 Thread Anthony Bykov
On Fri, 10 Nov 2017 14:40:21 +0100
Pavel Stehule  wrote:

> Hi
> 
> 2017-10-24 14:27 GMT+02:00 Anthony Bykov :
> 
> > There are some moments I should mention:
> > 1. {"1":1}::jsonb is transformed into HV {"1"=>"1"}, while
> > ["1","2"]::jsonb is transformed into AV ["1", "2"]
> >
> > 2. If a numeric value appears in jsonb, it will be transformed to
> > SVnv through a string (Numeric->String->SV->SVnv). Not the best
> > solution, but as far as I understand this is the usual practice in
> > PostgreSQL for serializing and de-serializing Numerics.
> >
> > 3. SVnv is transformed into jsonb through string
> > (SVnv->String->Numeric).
> >
> > An example may also be helpful for understanding the extension. So, as
> > an example, the function "test" transforms incoming jsonb into Perl,
> > transforms it back into jsonb, and returns it.
> >
> > create extension jsonb_plperl cascade;
> >
> > create or replace function test(val jsonb)
> > returns jsonb
> > transform for type jsonb
> > language plperl
> > as $$
> > return $_[0];
> > $$;
> >
> > select test('{"1":1,"example": null}'::jsonb);
> >
> >  
> I am looking at this patch:
> 
> 1. the patch contains some artefacts - look for the word "hstore"
> 
> 2. I got a lot of warnings
> 
> 
> make[1]: Entering directory
> '/home/pavel/src/postgresql/contrib/jsonb_plperl'
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -g -ggdb -Og -g3
> -fno-omit-frame-pointer -fPIC -I../../src/pl/plperl -I. -I.
> -I../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2
> -I/usr/lib64/perl5/CORE  -c -o jsonb_plperl.o jsonb_plperl.c
> jsonb_plperl.c: In function ‘SV_FromJsonbValue’: jsonb_plperl.c:83:9:
> warning: ‘result’ may be used uninitialized in this function
> [-Wmaybe-uninitialized] return (result);
>  ^
> jsonb_plperl.c: In function ‘SV_FromJsonb’:
> jsonb_plperl.c:95:10: warning: ‘object’ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
>   HV *object;
>   ^~
> In file included from /usr/lib64/perl5/CORE/perl.h:5644:0,
>  from ../../src/pl/plperl/plperl.h:52,
>  from jsonb_plperl.c:17:
> /usr/lib64/perl5/CORE/embed.h:404:19: warning: ‘value’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
>  #define newRV(a)  Perl_newRV(aTHX_ a)
>^~
> jsonb_plperl.c:101:10: note: ‘value’ was declared here
>   SV *value;
>   ^
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -g -ggdb -Og -g3
> -fno-omit-frame-pointer -fPIC -shared -o jsonb_plperl.so
> jsonb_plperl.o  -L../../src/port -L../../src/common -Wl,--as-needed
> -Wl,-rpath,'/usr/lib64/perl5/CORE',--enable-new-dtags   -Wl,-z,relro
> -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
> -fstack-protector-strong -L/usr/local/lib  -L/usr/lib64/perl5/CORE
> -lperl -lpthread -lresolv -lnsl -ldl -lm -lcrypt -lutil -lc
> make[1]: Leaving directory '/home/pavel/src/postgresql/contrib/jsonb_plperl'
> 
> [pavel@nemesis contrib]$ gcc --version
> gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There
> is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
> PARTICULAR PURPOSE.
> 
> 3. regress tests passed
> 
> 4. There isn't any documentation - probably it should be part of the
> PL/Perl docs
> 
> 5. The regress tests don't cover datatypes other than numbers. I
> miss boolean, binary, object, ... Maybe using Data::Dumper or
> something similar could be interesting
> 
> Note - it is a great extension; I am pleased that transformations are
> being used.
> 
> Regards
> 
> Pavel
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >  

Hello,
Thank you for your review. I have fixed most of your comments, except
for the fifth point, about Data::Dumper - I just couldn't understand
your point, but I've added more tests with more complex objects in case
that helps.
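
For illustration, one such test might look like this (a hedged sketch
assuming the jsonb_plperl transform from the patch; the function name is
made up):

create or replace function nkeys(val jsonb)
returns int
transform for type jsonb
language plperl
as $$
my $h = $_[0];           # a jsonb object arrives as a Perl hash ref (HV)
return scalar keys %$h;  # count its top-level keys
$$;

select nkeys('{"a":1,"b":2,"c":3}'::jsonb);  -- expected: 3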

Please take a look at the new patch. You can find it attached to this
message (it is called "0001-jsonb_plperl-extension-v2.patch").
--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/contrib/Makefile b/contrib/Makefile
index 8046ca4..53d44fe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -75,9 +75,9 @@ ALWAYS_SUBDIRS += sepgsql
 endif
 
 ifeq ($(with_perl),yes)
-SUBDIRS += hstore_plperl
+SUBDIRS += hstore_plperl jsonb_plperl
 else
-ALWAYS_SUBDIRS += hstore_plperl
+ALWAYS_SUBDIRS += hstore_plperl jsonb_plperl
 endif
 
 ifeq 

Re: [HACKERS] Pluggable storage

2017-11-14 Thread Michael Paquier
On Tue, Nov 7, 2017 at 6:34 PM, Haribabu Kommi  wrote:
> On Tue, Oct 31, 2017 at 8:59 PM, Haribabu Kommi 
> wrote:
>> Additional changes that are done in the patches compared to earlier
>> patches apart from rebase.
>
> Rebased patches are attached.

This set of patches needs a rebase... again.
-- 
Michael