Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Petr Jelinek

On 21/12/14 18:38, Tomas Vondra wrote:

Hi,

On 18.12.2014 13:14, Petr Jelinek wrote:

Hi,

v2 version of this patch is attached.


I did a review of this v2 patch today. I plan to do a bit more testing,
but these are my comments/questions so far:


Thanks for looking at it!



(0) There's a TABLESAMPLE page at the wiki, not updated since 2012:

 https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation

 We should either update it or mark it as obsolete I guess. Also,
 I'd like to know what's the status regarding the TODO items
 mentioned there. Are those still valid with this patch?


I will have to look in more detail over the holidays and update it, but 
the general info about table sampling there applies and will apply to 
any patch trying to implement it.


A quick look at the mail thread suggests that at least the concerns 
mentioned in the mailing list should not apply to this implementation. 
And looking at the patch, the problem with BERNOULLI sampling should not 
exist either, as I use a completely different implementation for that.


I also have some issues with joins which I plan to look at, but that's a 
different one (my optimizer code overestimates the number of rows).



(1) The patch adds a new catalog, but does not bump CATVERSION.



I thought this was always done by committer?


(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
 as it squishes everything into a single chunk. That's inconsistent
 with naming of the other catalogs. I think pg_table_sample_method
 would be better.


Fair point, but perhaps pg_tablesample_method then, as tablesample is 
used as a single word everywhere, including in the standard.




(3) There are a few more strange naming decisions, but that's mostly
 because the SQL standard requires that naming. I mean SYSTEM and
 BERNOULLI method names, and also the fact that the probability is
 specified as 0-100 value, which is inconsistent with other places
 (e.g. percentile_cont uses the usual 0-1 probability notion). But
 I don't think this can be fixed, that's what the standard says.


Yeah, I don't exactly love that either, but what the standard says, the 
standard says.




(4) I noticed there's an interesting extension in SQL Server, which
 allows specifying PERCENT or ROWS, so you can say

   SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);

 or

   SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);

 That seems handy, and it'd make migration from SQL Server easier.
 What do you think?


Well doing it exactly this way somewhat kills the extensibility which 
was one of the main goals for me - I allow any kind of parameters for 
sampling and the handling of those depends solely on the sampling 
method. This means that in my approach, if you'd want to change what you 
are limiting, you'd have to write a new sampling method, and the query would 
then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500); 
or some such (depending on how you name the sampling method). Or SELECT 
* FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend 
the SYSTEM sampling method, that would be also doable.


The reason for this is that I don't want to limit too much what 
parameters can be sent to a sampling method (I envision even a SYSTEM_TIMED 
sampling method that would take a time interval as its limit, for example).




(5) I envision a lot of confusion because of the REPEATABLE clause.
 With READ COMMITTED, it's not really repeatable because of changes
 done by the other users (and maybe things like autovacuum). Shall
 we mention this in the documentation?


Yes, the docs need improvement and this should be mentioned; the code 
comments too - I found a few places which I forgot to update when changing 
the code and which now have misleading comments.
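
For reference, REPEATABLE only fixes the sampling seed; under READ COMMITTED
the result can still change if the underlying data changes in between. A small
example, sketching the standard syntax:

  -- Same seed, so the same sample -- but only while the underlying data
  -- (and its physical layout) stays the same.
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10) REPEATABLE (200);
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10) REPEATABLE (200);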




(6) This seems slightly wrong, because of long/uint32 mismatch:

   long seed = PG_GETARG_UINT32(1);

 I think uint32 would be more appropriate, no?


Probably, although I need long later in the algorithm anyway.



(7) NITPICKING: I think a 'sample_rate' would be a better name here:

   double percent = sampler->percent;


Ok.



(8) NITPICKING: InitSamplingMethod contains a command with ';;'

   fcinfo.arg[i] = (Datum) 0;;


Yeah :)



(9) The current regression tests only use the REPEATABLE cases. I
 understand queries without this clause are RANDOM, but maybe we
 could do something like this:

SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
) foo;

  Granted, there's still a small probability of false positive, but
  maybe that's sufficiently small? Or is the amount of code this
  tests negligible?


Well, depending on the fillfactor and the limit it could be made quite reliable, 
I think. I also want to add a test with a VIEW (I think v2 has a bug there) 
and perhaps some JOINs.




(10) In the initial patch you mentioned it's possible to write custom

Re: [HACKERS] GiST kNN search queue (Re: KNN-GiST with recheck)

2014-12-22 Thread Heikki Linnakangas

On 12/20/2014 10:50 PM, Peter Geoghegan wrote:

On Wed, Dec 17, 2014 at 6:07 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

How about adding a src/backend/lib/README for that, per attached?


I took a quick look at this. Some observations:

* I like the idea of adding a .../lib README. However, the README
fails to note that ilist.c implements an *integrated* list, unlike the
much more prevalent cell-based List structure. It should note that,
since that's the whole point of ilist.c.


Added a sentence on that.


* pairingheap_remove() is technically dead code. It still makes sense
that you'd have it in this patch, but I think there's an argument for
not including it at all on the theory that if you need to use it you
should use a different data structure. After all, the actual
(non-amortized) complexity of that operation is O(n) [1], and if
remove operations are infrequent as we might expect, that might be the
more important consideration. As long as you are including
pairingheap_remove(), though, why is the local variable prev_ptr a
pointer to a pointer to a pairingheap_node, rather than just a pointer
to a pairingheap_node?

* Similarly, the function-like macro pairingheap_reset() doesn't seem
to pull its weight. Why does it exist alongside pairingheap_free()?
I'm not seeing a need to re-use a heap like that.


pairingheap_remove and pairingheap_reset are both unused in this patch, 
but they were needed for the other use case, tracking snapshots to 
advance xmin more aggressively, discussed here: 
http://www.postgresql.org/message-id/5488acf0.8050...@vmware.com. In 
fact, without the pairingheap_remove() operation, the prev_or_parent 
pointer wouldn't be necessary at all. We could've added them as a 
separate patch, but that seems like unnecessary code churn.


The prev_ptr variable is used to set the parent's first_child pointer, 
or the previous sibling's next_sibling pointer, depending on whether the 
removed node is the parent's first child or not. I'll add more comments 
in pairingheap_remove to explain that.
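
In isolation, the idiom looks roughly like this (an illustration only, not the
actual pairingheap.c code):

  /*
   * Unlink "node" from a singly-linked sibling list using a pointer-to-pointer,
   * so one assignment covers both the "first child" and "later sibling" cases.
   */
  static void
  unlink_child(pairingheap_node *parent, pairingheap_node *node)
  {
      pairingheap_node **prev_ptr = &parent->first_child;

      while (*prev_ptr != node)
          prev_ptr = &(*prev_ptr)->next_sibling;

      /* prev_ptr points either at parent->first_child or at a next_sibling */
      *prev_ptr = node->next_sibling;
  }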



*  Assert(!pairingheap_is_empty(heap)) appears in a few places.
You're basically asserting that a pointer isn't null, often
immediately before dereferencing the pointer. This seems to be of
questionable value.


I copied that from binaryheap.c. It has some documentation value; the 
assertions make it easy to see that the functions require the heap not to be empty. 
It's also explained in comments, but still.



* I think that the indentation of code could use some tweaking.

* More comments, please. In particular, comment the struct fields in
pairingheap_node. There are various blocks of code that could use at
least an additional terse comment, too.


Added some comments, hope it's better now.


* You talked about tuplesort.c integration. In order for that to
happen, I think the comparator logic should know less about min-heaps.
This should formally be a max-heap, with the ability to provide
customizations only encapsulated in the comparator (like inverting the
comparison logic to get a min-heap, or like custom NULLs first/last
behavior). So IMV this comment should be more generic/anticipatory:

+ /*
+  * For a max-heap, the comparator must return < 0 iff a < b, 0 iff a == b,
+  * and > 0 iff a > b.  For a min-heap, the conditions are reversed.
+  */
+ typedef int (*pairingheap_comparator) (const pairingheap_node *a,
const pairingheap_node *b, void *arg);

I think the functions should be called pairing_max_heap* for this
reason, too. Although that isn't consistent with binaryheap.c, so I
guess this whole argument is a non-starter.


I don't see what the problem is. The pairingheap.c (and binaryheap.c) 
code works the same for min and max-heaps. The comments assume a 
max-heap in a few places, but that seems OK to me in the context.
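
To illustrate the convention: a caller gets a min-heap simply by inverting the
comparison in its comparator. A sketch only, with a made-up element type, not
code from the patch:

  typedef struct my_item
  {
      int              key;
      pairingheap_node ph_node;    /* embedded heap link, ilist-style */
  } my_item;

  /* min-heap behaviour: report the smaller key as the "greater" node */
  static int
  my_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
  {
      const my_item *ia = (const my_item *) ((const char *) a - offsetof(my_item, ph_node));
      const my_item *ib = (const my_item *) ((const char *) b - offsetof(my_item, ph_node));

      if (ia->key < ib->key)
          return 1;
      if (ia->key > ib->key)
          return -1;
      return 0;
  }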



* We should just move rbtree.c to .../lib. We're not using CVS anymore
-- the history will be magically preserved.


Yeah, I tend to agree. Tom Lane has not liked moving things, because it 
breaks back-patching. That's true in general, even though git has some 
smarts to follow renames. I think it would work in this case, though. 
Anyway, let's discuss and do that as a separate patch, so that we don't 
get stuck on that.



Anyway, to get to the heart of the matter: in general, I think the
argument for the patch is sound. It's not a stellar improvement, but
it's worthwhile. That's all I have for now...


Ok, thanks for the review! I have committed this, with some cleanup and 
more comments added.


- Heikki





[HACKERS] Moving src/backend/utils/misc/rbtree.c to src/backend/lib

2014-12-22 Thread Heikki Linnakangas
Peter Geoghegan suggested [1] moving rbtree.c to src/backend/lib, which 
I think makes a lot of sense. Now that we have several other general 
purpose data structures in src/backend/lib (linked lists, a binary heap, 
and a pairing heap), rbtree.c would definitely be better placed in 
src/backend/lib, too.


The usual objection to moving things is that it makes back-patching 
harder. It also might break third-party code that uses it (since 
presumably we would also move the .h file). Nevertheless, I feel the 
advantages outweigh the disadvantages in this case.


Any objections?

- Heikki




Re: [HACKERS] advance local xmin more aggressively

2014-12-22 Thread Heikki Linnakangas

On 12/16/2014 10:41 PM, Jeff Janes wrote:

On Wed, Dec 10, 2014 at 3:46 PM, Robert Haas robertmh...@gmail.com wrote:


On Wed, Dec 10, 2014 at 3:28 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Care to code it up?


Here you are.


That was quick.

You need to add a semicolon to the end of line 20 in pairingheap.c.


In addition to the semicolon, it doesn't build under cassert.  There are
some pairingheap_empty calls that need to be pairingheap_is_empty, snapmgr.c
needs an address-of operator near line 355, and something is wrong
in snapmgr.c near line 811.


Here's an updated version, rebased over the pairing heap code that I 
just committed, and fixing those bugs.


- Heikki

commit 4f37313a5b173c2952aebc91c41c744dcc3cf2df
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Mon Dec 22 12:22:39 2014 +0200

Use pairing heap to keep registered snapshots in xmin-order.

This allows us to advance the xmin in PGPROC more aggressively.

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index d601efe..08d6d3d 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -46,6 +46,7 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "lib/pairingheap.h"
 #include "miscadmin.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
@@ -58,6 +59,12 @@
 #include "utils/syscache.h"
 #include "utils/tqual.h"
 
+/* Prototypes for local functions */
+static Snapshot CopySnapshot(Snapshot snapshot);
+static void FreeSnapshot(Snapshot snapshot);
+static void SnapshotResetXmin(void);
+static int xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg);
+
 
 /*
  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
@@ -122,14 +129,8 @@ typedef struct ActiveSnapshotElt
 /* Top of the stack of active snapshots */
 static ActiveSnapshotElt *ActiveSnapshot = NULL;
 
-/*
- * How many snapshots is resowner.c tracking for us?
- *
- * Note: for now, a simple counter is enough.  However, if we ever want to be
- * smarter about advancing our MyPgXact->xmin we will need to be more
- * sophisticated about this, perhaps keeping our own list of snapshots.
- */
-static int	RegisteredSnapshots = 0;
+/* Snapshots registered with resowners. Ordered in a heap by xmin. */
+static pairingheap RegisteredSnapshots = { xmin_cmp, NULL, NULL };
 
 /* first GetTransactionSnapshot call in a transaction? */
 bool		FirstSnapshotSet = false;
@@ -151,11 +152,6 @@ static Snapshot FirstXactSnapshot = NULL;
 static List *exportedSnapshots = NIL;
 
 
-static Snapshot CopySnapshot(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
-static void SnapshotResetXmin(void);
-
-
 /*
  * GetTransactionSnapshot
  *		Get the appropriate snapshot for a new query in a transaction.
@@ -183,7 +179,7 @@ GetTransactionSnapshot(void)
 	/* First call in transaction? */
 	if (!FirstSnapshotSet)
 	{
-		Assert(RegisteredSnapshots == 0);
+		Assert(pairingheap_is_empty(&RegisteredSnapshots));
 		Assert(FirstXactSnapshot == NULL);
 
 		/*
@@ -205,7 +201,7 @@ GetTransactionSnapshot(void)
 			FirstXactSnapshot = CurrentSnapshot;
 			/* Mark it as registered in FirstXactSnapshot */
 			FirstXactSnapshot->regd_count++;
-			RegisteredSnapshots++;
+			pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
 		}
 		else
 			CurrentSnapshot = GetSnapshotData(CurrentSnapshotData);
@@ -350,7 +346,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid)
 	/* Caller should have checked this already */
 	Assert(!FirstSnapshotSet);
 
-	Assert(RegisteredSnapshots == 0);
+	Assert(pairingheap_is_empty(&RegisteredSnapshots));
 	Assert(FirstXactSnapshot == NULL);
 	Assert(!HistoricSnapshotActive());
 
@@ -413,7 +409,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid)
 		FirstXactSnapshot = CurrentSnapshot;
 		/* Mark it as registered in FirstXactSnapshot */
 		FirstXactSnapshot->regd_count++;
-		RegisteredSnapshots++;
+		pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
 	}
 
 	FirstSnapshotSet = true;
@@ -639,7 +635,8 @@ RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
 	snap->regd_count++;
 	ResourceOwnerRememberSnapshot(owner, snap);
 
-	RegisteredSnapshots++;
+	if (snap->regd_count == 1)
+		pairingheap_add(&RegisteredSnapshots, &snap->ph_node);
 
 	return snap;
 }
@@ -671,11 +668,16 @@ UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
 		return;
 
 	Assert(snapshot->regd_count > 0);
-	Assert(RegisteredSnapshots > 0);
+	Assert(!pairingheap_is_empty(&RegisteredSnapshots));
 
 	ResourceOwnerForgetSnapshot(owner, snapshot);
-	RegisteredSnapshots--;
-	if (--snapshot->regd_count == 0 && snapshot->active_count == 0)
+
+	snapshot->regd_count--;
+
+	if (snapshot->regd_count == 0)
+		pairingheap_remove(&RegisteredSnapshots, &snapshot->ph_node);
+
+	if (snapshot->regd_count == 0 && snapshot->active_count == 0)
 	{
 		FreeSnapshot(snapshot);
 		SnapshotResetXmin();
@@ -683,17 +685,54 @@ 

Re: [HACKERS] inherit support for foreign tables

2014-12-22 Thread Etsuro Fujita
On 2014/12/18 7:04, Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Attached are updated patches.  Patch fdw-inh-5.patch has been created on
 top of patch fdw-chk-5.patch.

 I've committed fdw-chk-5.patch with some minor further adjustments;

 Have not looked at the other patch yet.

I updated the remaining patch to correspond to the fix [1].  Please
find attached a patch (created on top of the patch in [1]).

I haven't done anything about the issue that postgresGetForeignPlan() is
using get_parse_rowmark(), discussed in eg, [2].  Tom, will you work on
that?

Thanks,

[1] http://www.postgresql.org/message-id/5497bf4c.6080...@lab.ntt.co.jp
[2] http://www.postgresql.org/message-id/18256.1418401...@sss.pgh.pa.us

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***
*** 148,153  EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv WHERE a < 0;
--- 148,167 
  SELECT * FROM agg_csv WHERE a < 0;
  RESET constraint_exclusion;
  
+ -- table inheritance tests
+ CREATE TABLE agg (a int2, b float4);
+ ALTER FOREIGN TABLE agg_csv INHERIT agg;
+ SELECT tableoid::regclass, * FROM agg;
+ SELECT tableoid::regclass, * FROM agg_csv;
+ SELECT tableoid::regclass, * FROM ONLY agg;
+ -- updates aren't supported
+ UPDATE agg SET a = 1;
+ DELETE FROM agg WHERE a = 100;
+ -- but this should be ignored
+ SELECT tableoid::regclass, * FROM agg FOR UPDATE;
+ ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
+ DROP TABLE agg;
+ 
  -- privilege tests
  SET ROLE file_fdw_superuser;
  SELECT * FROM agg_text ORDER BY a;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***
*** 249,254  SELECT * FROM agg_csv WHERE a < 0;
--- 249,294 
  (0 rows)
  
  RESET constraint_exclusion;
+ -- table inheritance tests
+ CREATE TABLE agg (a int2, b float4);
+ ALTER FOREIGN TABLE agg_csv INHERIT agg;
+ SELECT tableoid::regclass, * FROM agg;
+  tableoid |  a  |b
+ --+-+-
+  agg_csv  | 100 |  99.097
+  agg_csv  |   0 | 0.09561
+  agg_csv  |  42 |  324.78
+ (3 rows)
+ 
+ SELECT tableoid::regclass, * FROM agg_csv;
+  tableoid |  a  |b
+ --+-+-
+  agg_csv  | 100 |  99.097
+  agg_csv  |   0 | 0.09561
+  agg_csv  |  42 |  324.78
+ (3 rows)
+ 
+ SELECT tableoid::regclass, * FROM ONLY agg;
+  tableoid | a | b 
+ --+---+---
+ (0 rows)
+ 
+ -- updates aren't supported
+ UPDATE agg SET a = 1;
+ ERROR:  cannot update foreign table agg_csv
+ DELETE FROM agg WHERE a = 100;
+ ERROR:  cannot delete from foreign table agg_csv
+ -- but this should be ignored
+ SELECT tableoid::regclass, * FROM agg FOR UPDATE;
+  tableoid |  a  |b
+ --+-+-
+  agg_csv  | 100 |  99.097
+  agg_csv  |   0 | 0.09561
+  agg_csv  |  42 |  324.78
+ (3 rows)
+ 
+ ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
+ DROP TABLE agg;
  -- privilege tests
  SET ROLE file_fdw_superuser;
  SELECT * FROM agg_text ORDER BY a;
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 3027,3032  NOTICE:  NEW: (13,test triggered !)
--- 3027,3544 
  (1 row)
  
  -- ===
+ -- test inheritance features
+ -- ===
+ CREATE TABLE a (aa TEXT);
+ CREATE TABLE loct (aa TEXT, bb TEXT);
+ CREATE FOREIGN TABLE b (bb TEXT) INHERITS (a)
+   SERVER loopback OPTIONS (table_name 'loct');
+ INSERT INTO a(aa) VALUES('aaa');
+ INSERT INTO a(aa) VALUES('');
+ INSERT INTO a(aa) VALUES('a');
+ INSERT INTO a(aa) VALUES('aa');
+ INSERT INTO a(aa) VALUES('aaa');
+ INSERT INTO a(aa) VALUES('');
+ INSERT INTO b(aa) VALUES('bbb');
+ INSERT INTO b(aa) VALUES('');
+ INSERT INTO b(aa) VALUES('b');
+ INSERT INTO b(aa) VALUES('bb');
+ INSERT INTO b(aa) VALUES('bbb');
+ INSERT INTO b(aa) VALUES('');
+ SELECT relname, a.* FROM a, pg_class where a.tableoid = pg_class.oid;
+  relname |aa
+ -+--
+  a   | aaa
+  a   | 
+  a   | a
+  a   | aa
+  a   | aaa
+  a   | 
+  b   | bbb
+  b   | 
+  b   | b
+  b   | bb
+  b   | bbb
+  b   | 
+ (12 rows)
+ 
+ SELECT relname, b.* FROM b, pg_class where b.tableoid = pg_class.oid;
+  relname |aa| bb 
+ -+--+
+  b   | bbb  | 
+  b   |  | 
+  b   | b| 
+  b   | bb   | 
+  b   | bbb  | 
+  b   |  | 
+ (6 rows)
+ 
+ SELECT relname, a.* FROM ONLY a, pg_class where a.tableoid = pg_class.oid;
+  relname |aa
+ -+--
+  a   | aaa
+  a   | 
+  a   | a
+  a   | aa
+  a   | aaa
+  a   | 
+ (6 rows)
+ 
+ UPDATE a SET aa='zz' WHERE aa LIKE 

Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Tomas Vondra
On 22.12.2014 10:07, Petr Jelinek wrote:
 On 21/12/14 18:38, Tomas Vondra wrote:

 (1) The patch adds a new catalog, but does not bump CATVERSION.

 
 I thought this was always done by committer?

Right. Sorry for the noise.

 
 (2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
  as it squishes everything into a single chunk. That's inconsistent
  with naming of the other catalogs. I think pg_table_sample_method
  would be better.
 
 Fair point, but perhaps pg_tablesample_method then, as tablesample is
 used as a single word everywhere, including in the standard.

Sounds good.

 (4) I noticed there's an interesting extension in SQL Server, which
  allows specifying PERCENT or ROWS, so you can say

SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);

  or

SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);

  That seems handy, and it'd make migration from SQL Server easier.
  What do you think?
 
 Well doing it exactly this way somewhat kills the extensibility which
 was one of the main goals for me - I allow any kind of parameters for
 sampling and the handling of those depends solely on the sampling
 method. This means that in my approach, if you'd want to change what you
 are limiting, you'd have to write a new sampling method, and the query would
 then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500);
 or some such (depending on how you name the sampling method). Or SELECT
 * FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend
 the SYSTEM sampling method, that would be also doable.
 
 The reason for this is that I don't want to limit too much what
 parameters can be sent to a sampling method (I envision even a SYSTEM_TIMED
 sampling method that would take a time interval as its limit, for example).

Good point.


 (6) This seems slightly wrong, because of long/uint32 mismatch:

long seed = PG_GETARG_UINT32(1);

  I think uint32 would be more appropriate, no?
 
 Probably, although I need long later in the algorithm anyway.

Really? ISTM the sampler_setseed() does not really require long, uint32
would work exactly the same.


 (9) The current regression tests only use the REPEATABLE cases. I
  understand queries without this clause are RANDOM, but maybe we
  could do something like this:

 SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
   SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
 ) foo;

   Granted, there's still a small probability of false positive, but
   maybe that's sufficiently small? Or is the amount of code this
   tests negligible?
 
 Well, depending on the fillfactor and the limit it could be made quite reliable,
 I think. I also want to add a test with a VIEW (I think v2 has a bug there)
 and perhaps some JOINs.

OK.


 (10) In the initial patch you mentioned it's possible to write custom
   sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
   allowing custom methods implemented as extensions would be useful?

 
 Yes, I think so, CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since
 that's just simple mechanical work with no hard problems to solve there
 I can add it later once we have agreement on the general approach of the
 patch (especially in terms of extensibility).

OK, good to know.
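
Just to illustrate the idea, such a command might look roughly like this --
hypothetical syntax only, nothing from the patch:

  -- register a sampling method provided by an extension (hypothetical)
  CREATE TABLESAMPLE METHOD system_rowlimit HANDLER system_rowlimit_handler;

  -- and then use it like the built-in ones
  SELECT * FROM test_tablesample TABLESAMPLE system_rowlimit (2500);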

Tomas




Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Tomas Vondra
On 22.12.2014 07:36, Tatsuo Ishii wrote:
 On 22.12.2014 00:28, Tomas Vondra wrote:

 (2) The 'executeStatement2' API is a bit awkward as the signarure

 executeStatement2(PGconn *con, const char *sql, const char *table);

 suggests that the 'sql' command is executed when 'table' exists.
 But that's not the case, because what actually gets executed is
 'sql table'.
 
 Any replacement idea for sql and table?

What about removing the concatenation logic, and just passing the whole
query to executeStatement2()? The queries are reasonably short, IMHO.

 (3) The 'is_table_exists' should probably be just 'table_exists'.


 (4) The SQL used in is_table_exists to check table existence seems
 slightly wrong, because 'to_regclass' works for other relation
 kinds, not just regular tables - it will match views for example.
 While a conflict like that (having an object with the right name
 but not a regular table) is rather unlikely I guess, a more correct
 query would be this:

   SELECT oid FROM pg_class WHERE relname = '%s' AND relkind = 'r';
 
 This doesn't always work if schema search path is set.

True. My point was that the to_regclass() is not exactly sufficient.
Maybe that's acceptable, maybe not.
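
Something like this would cover both points (it respects the search path and
still checks that the object is a regular table) -- just a sketch:

  SELECT 1 FROM pg_class
   WHERE oid = to_regclass('pgbench_branches') AND relkind = 'r';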

 (5) I'm not a libpq expert, but I don't see how the PQgetvalue() could
 return anything except true/false, so the

 if (result == NULL)
 {
 PQclear(res);
 return false;
 }

 seems a bit pointless to me. But maybe it's necessary?
 
 PQgetvalue could return NULL if the parameter is wrong. I don't want
 to raise a segfault in any case.

So, how could the parameter be wrong? Or what about using PQgetisnull()?


 (7) I also find the coding in main() around line 3250 a bit clumsy. The
 first reason is that it only checks existence of 'pgbench_branches'
 and then vacuums three pgbench_tellers and pgbench_history in the
 same block. If pgbench_branches does not exist, there will be no
 message printed (but the tables will be vacuumed).
 
 So we should check the existence of all pgbench_branches,
 pgbench_tellers, pgbench_history and pgbench_accounts? Not sure if
 it's worth the trouble.

I'm not sure. But if we use 'at least one of the tables exists' logic,
then I don't see a reason for msg1 and msg2. A single flag would be
sufficient.

 The second reason is that the msg1, msg2 variables seem unnecessary.
 IMHO this is a bit better:
 
 This will behave differently from how pgbench behaves now. If -f is not
 specified and pgbench_* does not exist, then pgbench silently skips the
 vacuum. Today pgbench raises an error in this case.

I don't understand. I believe the code I proposed does exactly the same
thing as the patch, no? I certainly don't see any error messages in the
patch ...

 (8) Also, I think it's not necessary to define function prototypes for
 executeStatement2 and is_table_exists. It certainly is not
 consistent with the other functions defined in pgbench.c (e.g.
 there's no prototype for executeStatement). Just delete the two
 prototypes and move is_table_exists before executeStatement2.
 
 I think not having static function prototypes is not good
 practice. See other source code in PostgreSQL.

Yes, but apparently pgbench.c does not do that. It's strange to have
prototypes for just two of many functions in the file.

Tomas




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-12-22 Thread Robert Haas
On Tue, Dec 16, 2014 at 1:28 PM, Stephen Frost sfr...@snowman.net wrote:
 The magic audit role has SELECT rights on a given table.  When any
 user does a SELECT against that table, ExecCheckRTPerms is called and
 there's a hook there which the module can use to say "ok, does the audit
 role have any permissions here?" and, if the result is yes, then the
 command is audited.  Note that this role, from core PG's perspective,
 wouldn't be special at all; it would just be that pgaudit would use the
 role's permissions as a way to figure out if a given command should be
 audited or not.

This is a little weird because you're effectively granting an
anti-permission.  I'm not sure whether that ought to be regarded as a
serious problem, but it's a little surprising.

Also, what makes the audit role magical?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Postgres TR for missing chunk

2014-12-22 Thread Robert Haas
On Fri, Dec 19, 2014 at 4:18 AM, M Tarkeshwar Rao
m.tarkeshwar@ericsson.com wrote:
 We are facing this issue and want to analyse this through logging.
 Can you please share a sample Postgres config file to enable max logging with 
 syslog support?

 What should be the debug level so that I can capture the failure information?

I don't think that cranking up the debug level is going to help you
here.  Instead, you should tell us what you did that ended up causing
this error, especially if it is a reproducible series of steps,
because then we might be able to identify the cause, and/or fix it.

It is worth noting that one possible explanation for this is that your
database is corrupted.  That can happen for a variety of reasons, as I
explained in one of my blog posts:

http://rhaas.blogspot.com/2012/03/why-is-my-database-corrupted.html

You should check and make sure that none of those causes sound like
things that could have happened in your case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical Replication Helpers WIP for discussion

2014-12-22 Thread Robert Haas
On Fri, Dec 19, 2014 at 8:40 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 What I hope to get from this is agreement on the general approach and
 protocol so that we can have common base which will both make it easier to
 create external logical replication solutions and also eventually lead to
 full logical replication inside core PostgreSQL.

The protocol is a really important topic which deserves its own
discussion.  Andres has mentioned some of the ideas he has in mind -
which I think are similar to what you did here - but there hasn't
really been a thread devoted to discussing that topic specifically.  I
think that would be a good idea: lay out what you have in mind, and
why, and solicit feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Table-level log_autovacuum_min_duration

2014-12-22 Thread Robert Haas
On Sat, Dec 20, 2014 at 9:17 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 But now, here are some things to consider if we use directly the
 reloptions available with RelationData:
 - If the parameters toast.autovacuum_* are not set, toast relations
 inherit values from their parent relation. Looking at autovacuum.c
 which is the only place where autovacuum options are located, we keep
 a hash table to save the mapping toast -> parent relation. Using that
 it is easy to fetch for a given toast relation the relopts of its
 parent. Note this is strictly located in the autovacuum code path, so
 to let vacuum and analyze know the custom value of
 log_autovacuum_min_duration, with the inheritance property kept, we
 would need to pass some extra values from autovacuum to the calls of
 vacuum().
 - It would not be possible to log autovacuum and analyze being skipped
 when lock is not available because in this case Relation cannot be
 safely fetched, so there are no reloptions directly available. This
 is for example this kind of message:
 skipping analyze of foo --- lock not available

 Both those things could be solved by passing a value through
 VacuumStmt, like what we do when passing a value for
 multixact_freeze_min_age, or with an extra argument in vacuum() for
 example. Now I am not sure if it is worth it, and we could live as
 well with a parameter that does not support the inheritance from parent
 relation to toast, so the log value set for a toast relation and its
 parent share no dependency. In short, that's a trade-off between code
 simplicity and usability.

I'm not sure I follow all of the particulars of what you are asking
here, but in general I would say that you shouldn't hesitate to pass
more information down the call stack when that helps to obtain correct
behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-22 Thread Robert Haas
On Sat, Dec 20, 2014 at 7:30 PM, Peter Geoghegan p...@heroku.com wrote:
 On Sat, Dec 20, 2014 at 3:17 PM, Peter Geoghegan p...@heroku.com wrote:
 Attached patch implements this scheme.

 I had another thought: NAMEDATALEN + 1 is a better representation of
 infinity for matching purposes than INT_MAX. I probably should have
 made that change, too. It would then not have been necessary to
 #include <limits.h>. I think that this is a useful
 belt-and-suspenders precaution against integer overflow. It almost
 certainly won't matter, since it's very unlikely that the best match
 within an RTE will end up being a dropped column, but we might as well
 do it that way (Levenshtein distance is costed in multiples of code
 point changes, but the maximum density is 1 byte per codepoint).

Good idea.

Looking over the latest patch, I think we could simplify the code so
that you don't need multiple FuzzyAttrMatchState objects.  Instead of
creating a separate one for each RTE and then merging them, just have
one.  When you find an inexact-RTE name match, set a field inside the
FuzzyAttrMatchState -- maybe with a name like rte_penalty -- to the
Levenshtein distance between the RTEs.  Then call scanRTEForColumn()
and pass down the same state object.  Now let
updateFuzzyAttrMatchState() work out what it needs to do based on the
observed inter-column distance and the currently-in-force RTE penalty.
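
Roughly, the shared state might carry something like this -- field names are
purely illustrative, not the patch's actual layout:

  typedef struct FuzzyAttrMatchState
  {
      int            distance;     /* Levenshtein distance of best column so far */
      int            rte_penalty;  /* distance of the current RTE's alias name */
      RangeTblEntry *rte;          /* RTE containing the current best match */
      char          *attname;      /* column name of the current best match */
  } FuzzyAttrMatchState;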

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] moving from contrib to bin

2014-12-22 Thread Michael Paquier
On Thu, Dec 18, 2014 at 10:37 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Dec 18, 2014 at 4:02 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

 I know this is how it currently works, but it looks way too messy to me:

 +   my $pgarchivecleanup = AddSimpleFrontend('pg_archivecleanup');
 +   my $pgstandby = AddSimpleFrontend('pg_standby');
 +   my $pgtestfsync = AddSimpleFrontend('pg_test_fsync');
 +   my $pgtesttiming = AddSimpleFrontend('pg_test_timing');
 +   my $pgbench = AddSimpleFrontend('pgbench', 1);

 ISTM we should do something like

 for each $elem in src/bin/Makefile:$(SUBDIRS)
 AddSimpleFrontend($elem)

 and avoid having to list the modules one by one.

 If we take this road, I'd like to avoid a huge if/elseif scanning the
 names of the submodules to do the necessary adjustments (Some need
 FRONTEND defined, others ws2_32, etc.). Also, there is the case of
 pg_basebackup where multiple binaries are included with pg_basebackup,
 pg_recvlogical and pg_receivexlog. So I think that we'd need something
 similar to what contrib does, aka:
 my @frontend_excludes = ('pg_basebackup', 'pg_dump', 'pg_dumpall',
 'pg_xlogdump', 'initdb' ...);
 my frontend_extralibs = ('pgbench' => 'ws2_32.lib');
 my @frontend_uselibpq = ('pgbench', 'pg_ctl', 'pg_upgrade');
 And for each excluded frontend name we have an individual project
 declaration with its own exceptions. This way, when a new
 frontend is added in src/bin it will be automatically
 compiled by default. How does that sound?
And here is an updated patch following those lines. Similarly to the
things in contrib/, a set of variables are used to define for each
module what are the extra libraries, include dirs, etc. This refactors
quite a bit of code, even if there are a couple of exceptions like
pg_xlogdump/ or pg_basebackup/.
-- 
Michael
From 894ed2d918b142727661055d1ef60dd3ba9f6022 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 15 Dec 2014 22:16:36 -0800
Subject: [PATCH] Add MSVC support for new modules in src/bin

---
 src/tools/msvc/Mkvcbuild.pm | 158 +---
 src/tools/msvc/clean.bat|   4 +-
 src/tools/msvc/vcregress.pl |   6 +-
 3 files changed, 96 insertions(+), 72 deletions(-)

diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 004942c..d4fd03f 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -29,29 +29,37 @@ my $libpgcommon;
 my $postgres;
 my $libpq;
 
+# Set of variables for contrib modules
 my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq =
-  ('dblink', 'oid2name', 'pgbench', 'pg_upgrade', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = (
-	'oid2name',  'pgbench',
-	'pg_standby','pg_archivecleanup',
-	'pg_test_fsync', 'pg_test_timing',
-	'pg_upgrade','pg_xlogdump',
-	'vacuumlo');
-my @contrib_uselibpgcommon = (
-	'oid2name',  'pgbench',
-	'pg_standby','pg_archivecleanup',
-	'pg_test_fsync', 'pg_test_timing',
-	'pg_upgrade','pg_xlogdump',
-	'vacuumlo');
-my $contrib_extralibs = { 'pgbench' => ['ws2_32.lib'] };
+my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'vacuumlo');
+my $contrib_extralibs = {};
 my $contrib_extraincludes =
-  { 'tsearch2' => ['contrib/tsearch2'], 'dblink' => ['src/backend'] };
-my $contrib_extrasource = {
-	'cube' => [ 'cubescan.l', 'cubeparse.y' ],
-	'seg'  => [ 'segscan.l',  'segparse.y' ], };
+  { 'tsearch2' => ['contrib\tsearch2'], 'dblink' => ['src\backend'] };
+my $contrib_extrafiles = {
+	'cube' => [ 'contrib\cube\cubescan.l', 'contrib\cube\cubeparse.y' ],
+	'seg'  => [ 'contrib\seg\segscan.l',  'contrib\seg\segparse.y' ], };
 my @contrib_excludes = ('pgcrypto', 'intagg', 'sepgsql');
 
+# Set of variables for frontend modules
+my $frontend_defines = { 'initdb' => 'FRONTEND' };
+my @frontend_uselibpq = ('pg_ctl', 'pg_upgrade', 'pgbench', 'psql');
+my @frontend_uselibpgport = ('pg_ctl', 'pg_upgrade', 'psql');
+my @frontend_uselibpgcommon = ('pg_ctl', 'psql');
+my $frontend_extralibs = {'initdb' => ['ws2_32.lib'],
+		  'pg_dump' => ['ws2_32.lib'],
+		  'pg_restore' => ['ws2_32.lib'],
+		  'pgbench' => ['ws2_32.lib'],
+		  'psql' => ['ws2_32.lib'] };
+my $frontend_extraincludes = {
+   'initdb' => ['src\timezone'],
+   'psql' => ['src\bin\pg_dump', 'src\backend'] };
+my $frontend_extrafiles = {
+	'psql' => [ 'src\bin\psql\psqlscan.l' ] };
+my @frontend_excludes = ('pgevent', 'pg_basebackup', 'pg_dump', 'pg_dumpall',
+		 'pg_restore', 'pg_xlogdump', 'pgevent', 'scripts');
+
 sub mkvcbuild
 {
 	our $config = shift;
@@ -376,11 +384,15 @@ sub mkvcbuild
 	$pgregress_isolation->AddReference($libpgcommon, $libpgport);
 
 	# src/bin
-	my $initdb = AddSimpleFrontend('initdb');
-	$initdb->AddIncludeDir('src\interfaces\libpq');
-	

Re: [HACKERS] moving from contrib to bin

2014-12-22 Thread Alvaro Herrera
Michael Paquier wrote:

 And here is an updated patch following those lines. Similarly to the
 things in contrib/, a set of variables are used to define for each
 module what are the extra libraries, include dirs, etc. This refactors
 quite a bit of code, even if there are a couple of exceptions like
 pg_xlogdump/ or pg_basebackup/.

In a broad look, this looks a lot better.  I think we should press
forward with this whole set of patches and see what the buildfarm
thinks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Moving src/backend/utils/misc/rbtree.c to src/backend/lib

2014-12-22 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Peter Geoghegan suggested [1] moving rbtree.c to src/backend/lib, which 
 I think makes a lot of sense. Now that we have several other general 
 purpose data structures in src/backend/lib (linked lists, a binary heap, 
 and a pairing heap), rbtree.c would definitely be better placed in 
 src/backend/lib, too.

 The usual objection to moving things is that it makes back-patching 
  harder. It also might break third-party code that uses it (since 
 presumably we would also move the .h file). Nevertheless, I feel the 
 advantages outweigh the disadvantages in this case.

 Any objections?

A look at the git history says that rbtree.h/.c have not been touched
(except by copyright/pgindent commits) since 9.0, so probably the
backpatch argument doesn't have much force.

However, wasn't there some speculation about removing rbtree entirely?
If we're going to end up doing that, moving the files first is just
unnecessary thrashing.

regards, tom lane




Re: [HACKERS] Moving src/backend/utils/misc/rbtree.c to src/backend/lib

2014-12-22 Thread Heikki Linnakangas

On 12/22/2014 05:19 PM, Tom Lane wrote:

However, wasn't there some speculation about removing rbtree entirely?


Not that I recall. It's still used for GIN bulk loading. There might be 
better ways to do that, but there hasn't been any serious discussion on 
that.


There was some discussion on replacing the existing binary heap usage 
with the pairing heap, in MergeAppend and in tuplesort.c, but that's a 
different story.


- Heikki





Re: [HACKERS] inherit support for foreign tables

2014-12-22 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 I haven't done anything about the issue that postgresGetForeignPlan() is
 using get_parse_rowmark(), discussed in eg, [2].  Tom, will you work on
 that?

Yeah, we need to do something about the PlanRowMark data structure.
Aside from the pre-existing issue in postgres_fdw, we need to fix it
to support inheritance trees in which more than one rowmark method
is being used.  That rte.hasForeignChildren thing is a crock, and
would still be a crock even if it were correctly documented as being
a planner temporary variable (rather than the current implication that
it's always valid).  RangeTblEntry is no place for planner temporaries.

The idea I'd had about that was to convert the markType field into a
bitmask, so that a parent node's markType could represent the logical
OR of the rowmark methods being used by all its children.  I've not
attempted to code this up though, and probably won't get to it until
after Christmas.  One thing that's not clear is what should happen
with ExecRowMark.

regards, tom lane




Re: [HACKERS] Moving src/backend/utils/misc/rbtree.c to src/backend/lib

2014-12-22 Thread Robert Haas
On Mon, Dec 22, 2014 at 5:16 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Peter Geoghegan suggested [1] moving rbtree.c to src/backend/lib, which I
 think makes a lot of sense. Now that we have several other general purpose
 data structures in src/backend/lib (linked lists, a binary heap, and a
 pairing heap), rbtree.c would definitely be better placed in
 src/backend/lib, too.

 The usual objection to moving things is that it makes back-patching harder.
  It also might break third-party code that uses it (since presumably we would
 also move the .h file). Nevertheless, I feel the advantages outweigh the
 disadvantages in this case.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Sat, Dec 13, 2014 at 04:37:48AM +, Andrew Gierth wrote:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
 Tom That seems pretty grotty from a performance+memory consumption
 Tom standpoint.  At peak memory usage, each one of the Sort nodes
 Tom will contain every input row,

 Has this objection ever been raised for WindowAgg, which has the same
 issue?

 I caution against using window function performance as the template for
 GROUPING SETS performance goals.  The benefit of GROUPING SETS compared to its
 UNION ALL functional equivalent is 15% syntactic pleasantness, 85% performance
 opportunities.  Contrast that having window functions is great even with naive
 performance, because they enable tasks that are otherwise too hard in SQL.

The other reason that's a bad comparison is that I've not seen many
queries that use more than a couple of window frames, whereas we have
to expect that the number of grouping sets in typical queries will be
significantly more than a couple.  So we do have to think about what
the performance will be like with a lot of sort steps.  I'm also worried
that this use-case may finally force us to do something about the one
work_mem per sort node behavior, unless we can hack things so that only
one or two sorts reach max memory consumption concurrently.

I still find the ChainAggregate approach too ugly at a system structural
level to accept, regardless of Noah's argument about number of I/O cycles
consumed.  We'll be paying for that in complexity and bugs into the
indefinite future, and I wonder if it isn't going to foreclose some other
performance opportunities as well.

regards, tom lane




Re: [HACKERS] btree_gin and ranges

2014-12-22 Thread Heikki Linnakangas

On 12/22/2014 03:15 AM, Michael Paquier wrote:

On Thu, Dec 18, 2014 at 4:13 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

On 10/22/2014 01:55 PM, Teodor Sigaev wrote:


The suggested patch adds GIN support for the contains operator for ranges over
a scalar column.

It allows a more effective GIN scan. Currently, queries like
SELECT * FROM test_int4 WHERE i <= 1 and i >= -1
will be executed by GIN with two scans: one from minus infinity to 1 and
another from -1 to plus infinity. That's because GIN is generalized and it
doesn't know the semantics of the operation.

With the patch it's possible to rewrite the query with ranges
SELECT * FROM test_int4 WHERE i <@ '[-1, 1]'::int4range
and the GIN index will support this query with a single scan from -1 to 1.

Patch provides index support only for existing range types: int4, int8,
numeric,
date and timestamp with and without time zone.



I started to look at this, but very quickly got carried away into
refactoring away the macros. Defining long functions as macros makes
debugging quite difficult, and it's not nice to read or edit the source code
either.

I propose the attached refactoring (it doesn't include your range-patch yet,
just refactoring the existing code). It turns the bulk of the macros into
static functions. GIN_SUPPORT macro still defines the datatype-specific
functions, but they are now very thin wrappers that just call the
corresponding generic static functions.

It's annoying that we need the concept of a leftmost value, for the < and <=
queries. Without that, we could have the support functions look up the
required datatype information from the type cache, based on the datatype of
the passed argument. Then you could easily use the btree_gin support
functions also for user-defined datatypes. But that needs bigger changes,
and this is a step in the right direction anyway.

I just had a look at the refactoring patch and ISTM that this is a
good step forward in terms of readability. Teodor, I am noticing that
your patch cannot apply once the refactoring is done. Could you rebase
your patch once the refactoring is pushed?


I've pushed the refactoring patch. Here's a version of Teodor's patch, 
rebased over the pushed patch.


Teodor's patch could use some more comments. The 
STOP_SCAN/MATCH_SCAN/CONT_SCAN macros are a good idea, but they probably 
should go into src/include/access/gin.h so that they can be used in all 
compare_partial implementations.
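
For context, GIN's comparePartial convention is: return 0 for a match, a value
less than zero for no match but continue the scan, and a value greater than
zero to stop the scan. So the macros presumably boil down to something like the
following sketch (not their actual definitions from the patch):

  #define MATCH_SCAN  PG_RETURN_INT32(0)   /* key matches the query */
  #define CONT_SCAN   PG_RETURN_INT32(-1)  /* no match, keep scanning */
  #define STOP_SCAN   PG_RETURN_INT32(1)   /* no further matches possible */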


- Heikki
commit 51217bac66009f876d44ee547c25523a1b0eaeb3
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Mon Dec 22 17:38:40 2014 +0200

Rebase Teodor's btree_gin_range-1.patch over my refactorings.

diff --git a/contrib/btree_gin/Makefile b/contrib/btree_gin/Makefile
index 0492091..b85c390 100644
--- a/contrib/btree_gin/Makefile
+++ b/contrib/btree_gin/Makefile
@@ -4,7 +4,7 @@ MODULE_big = btree_gin
 OBJS = btree_gin.o $(WIN32RES)
 
 EXTENSION = btree_gin
-DATA = btree_gin--1.0.sql btree_gin--unpackaged--1.0.sql
+DATA = btree_gin--1.1.sql btree_gin--1.0--1.1.sql btree_gin--unpackaged--1.0.sql
 PGFILEDESC = btree_gin - B-tree equivalent GIN operator classes
 
 REGRESS = install_btree_gin int2 int4 int8 float4 float8 money oid \
diff --git a/contrib/btree_gin/btree_gin--1.0--1.1.sql b/contrib/btree_gin/btree_gin--1.0--1.1.sql
new file mode 100644
index 000..a7f0e54
--- /dev/null
+++ b/contrib/btree_gin/btree_gin--1.0--1.1.sql
@@ -0,0 +1,7 @@
+ALTER OPERATOR FAMILY int4_ops USING gin ADD OPERATOR 8 <@(anyelement, anyrange);
+ALTER OPERATOR FAMILY int8_ops USING gin ADD OPERATOR 8 <@(anyelement, anyrange);
+ALTER OPERATOR FAMILY timestamp_ops USING gin ADD OPERATOR 8 <@(anyelement, anyrange);
+ALTER OPERATOR FAMILY timestamptz_ops USING gin ADD OPERATOR 8 <@(anyelement, anyrange);
+ALTER OPERATOR FAMILY date_ops USING gin ADD OPERATOR 8 <@(anyelement, anyrange);
+ALTER OPERATOR FAMILY numeric_ops USING gin ADD OPERATOR 8 <@(anyelement, anyrange);
+
diff --git a/contrib/btree_gin/btree_gin--1.0.sql b/contrib/btree_gin/btree_gin--1.0.sql
deleted file mode 100644
index cf867ef..000
--- a/contrib/btree_gin/btree_gin--1.0.sql
+++ /dev/null
@@ -1,689 +0,0 @@
-/* contrib/btree_gin/btree_gin--1.0.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use CREATE EXTENSION btree_gin to load this file. \quit
-
-CREATE FUNCTION gin_btree_consistent(internal, int2, anyelement, int4, internal, internal)
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION gin_extract_value_int2(int2, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION gin_compare_prefix_int2(int2, int2, int2, internal)
-RETURNS int4
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION gin_extract_query_int2(int2, internal, int2, internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR CLASS int2_ops
-DEFAULT FOR TYPE int2 USING gin
-AS
-OPERATOR1   ,
-

Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread José Luis Tallón

On 12/21/2014 10:30 PM, Fabrízio de Royes Mello wrote:

[snip]


I do agree that vacuum schema might very well be useful (I'll probably 
use it myself from time to time, too).
ANALYZE SCHEMA (especially coupled with some transaction-wide SET 
statistics_target) could be beneficial.




 And why that, but not
 say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...


+1. I can write patches for each of this maintenance statement too.


Hmm... I think Tom might have been a bit rhetorical (or even sarcastic 
with that), but I can definitely be wrong.


Do we really want to have some such operation potentially (and 
inadvertently) locking for *hours* at a time?


CLUSTER SCHEMA somename;

... where schema somename contains myHugeTable

Given that the cluster command exclusively locks and rewrites the 
table, it might lock queries and overwhelm the I/O subsystem for quite a 
long time.



TRUNCATE SCHEMA whatever sounds quite dangerous, too.



Just my .02€

/ J.L.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Andres Freund
On 2014-12-21 14:18:33 -0500, Tom Lane wrote:
 Fabrízio de Royes Mello fabriziome...@gmail.com writes:
  I work with some customers that have databases with a lot of schemas, and
  sometimes we need to run manual VACUUM in one schema; it would be nice to
  have a new option to run vacuum on relations from a specific schema.
 
 I'm pretty skeptical of this alleged use-case.  Manual vacuuming ought
 to be mostly a thing of the past, and even if it's not, hitting
 *everything* in a schema should seldom be an appropriate thing to do.

Based on my experience autovacuum isn't sufficient on bigger high
throughput databases. At the very least manual vacuuming with lower
freeze_table_age settings during low-load times is required lest
anti-wraparound vacuums increase load too much during prime business
hours.
That said, I don't see how this feature is actually helpful in those
cases. In pretty much all of what I've seen you'd want to have more
complex selection criteria than the schema.

 While the feature itself might be fairly innocuous, I'm just wondering
 why we need to encourage manual vacuuming.  And why that, but not
 say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...

There's one argument for supporting more for VACUUM than the rest - it
can't be executed directly as the result of a query as the others
can... I wonder if that'd not better be answered by adding a feature to
vacuumdb that allows selecting the to-be-vacuumed table by a user
defined query.
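
Something along those lines can already be scripted around vacuumdb today -- a
sketch only, with placeholder database and schema names:

  # vacuum every ordinary table in one schema, selected by an ad-hoc query
  psql -At -d mydb -c "SELECT format('%I.%I', n.nspname, c.relname)
                         FROM pg_class c
                         JOIN pg_namespace n ON n.oid = c.relnamespace
                        WHERE n.nspname = 'myschema' AND c.relkind = 'r'" |
  while read tbl; do
      vacuumdb -d mydb -t "$tbl"
  done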

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 [Noah]
  I caution against using window function performance as the
  template for GROUPING SETS performance goals.  The benefit of
  GROUPING SETS compared to its UNION ALL functional equivalent is
  15% syntactic pleasantness, 85% performance opportunities.
  Contrast that having window functions is great even with naive
  performance, because they enable tasks that are otherwise too hard
  in SQL.

Yes, this is a reasonable point.

 Tom The other reason that's a bad comparison is that I've not seen
 Tom many queries that use more than a couple of window frames,
 Tom whereas we have to expect that the number of grouping sets in
 Tom typical queries will be significantly more than a couple.

I would be interested in seeing more good examples of the size and
type of grouping sets used in typical queries.

 Tom So we do have to think about what the performance will be like
 Tom with a lot of sort steps.  I'm also worried that this use-case
 Tom may finally force us to do something about the one work_mem per
 Tom sort node behavior, unless we can hack things so that only one
 Tom or two sorts reach max memory consumption concurrently.

Modifying ChainAggregate so that only two sorts reach max memory
consumption concurrently seems to have been quite simple to implement,
though I'm still testing some aspects of it.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Alvaro Herrera
Tomas Vondra wrote:
 On 22.12.2014 07:36, Tatsuo Ishii wrote:
  On 22.12.2014 00:28, Tomas Vondra wrote:

  (8) Also, I think it's not necessary to define function prototypes for
  executeStatement2 and is_table_exists. It certainly is not
  consistent with the other functions defined in pgbench.c (e.g.
  there's no prototype for executeStatement). Just delete the two
  prototypes and move is_table_exists before executeStatement2.
  
   I think not having static function prototypes is not good
   practice. See other source code in PostgreSQL.
 
 Yes, but apparently pgbench.c does not do that. It's strange to have
 prototypes for just two of many functions in the file.

Whenever a function is defined before its first use, a prototype is not
mandatory, so we tend to omit them, but I'm pretty sure there are cases
where we add them anyway.  In my opinion, rearranging code so that called
functions appear first just to avoid the prototype is not a very good
way to organize things, though.  I haven't looked at this patch so I
don't know whether this is what's being done here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Alvaro Herrera
José Luis Tallón wrote:
 On 12/21/2014 10:30 PM, Fabrízio de Royes Mello wrote:
 [snip]
 
 I do agree that vacuum schema might very well be useful (I'll probably use
 it myself from time to time, too).
 ANALYZE SCHEMA (specially coupled with some transaction-wide SET
 statistics_target could be beneficial)

We already have transaction-wide SET -- it's spelled SET LOCAL.
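
For instance, a sketch of that spelling (table name and statistics
target are just placeholders):

BEGIN;
SET LOCAL default_statistics_target = 500;
ANALYZE some_table;
COMMIT;

The setting reverts at COMMIT, so only the ANALYZE inside the
transaction sees it.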

 
  And why that, but not say schema-wide ANALYZE, CLUSTER, TRUNCATE,
  ...
 
 +1. I can write patches for each of this maintenance statement too.
 
 Hmm... I think Tom might have been a bit rhetorical (or even sarcastic with
 that),

That was my impression too.

 Do we really want to have some such operation potentially (and
 inadvertently) locking for *hours* at a time?
 
 CLUSTER SCHEMA somename;
 
 ... where schema somename contains myHugeTable
 
 Given that the cluster command exclusively locks and rewrites the table,
 it might lock queries and overwhelm the I/O subsystem for quite a long time.

Multi-table CLUSTER uses multiple transactions, so this should not be an
issue.  That said, I don't think there's much point in CLUSTER SCHEMA,
much less TRUNCATE SCHEMA.  Do you normally organize your schemas so
that there are some that contain only tables that need to be truncated
together?  That would be a strange use case.

Overall, this whole line of development seems like bloating the parse
tables for little gain.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Christoph Berg
Re: Alvaro Herrera 2014-12-22 20141222165157.gd1...@alvh.no-ip.org
 Multi-table CLUSTER uses multiple transactions, so this should not be an
 issue.  That said, I don't think there's much point in CLUSTER SCHEMA,
 much less TRUNCATE SCHEMA.  Do you normally organize your schemas so
 that there are some that contain only tables that need to be truncated
 together?  That would be a strange use case.

Having a schema that's only used for importing data in batch jobs
doesn't sound too unreasonable. It could then be cleaned in a simple
TRUNCATE SCHEMA import_area command.

 Overall, this whole line of development seems like bloating the parse
 tables for little gain.

Reading the thread, my impression was that most people opposed the
idea because there are ways to script a schema-wide vacuum, or because
people shouldn't be invoking manual vacuums anyway. I think the
patch tries to solve a practical problem, and does have its merits.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Multi-table CLUSTER uses multiple transactions, so this should not be an
 issue.  That said, I don't think there's much point in CLUSTER SCHEMA,
 much less TRUNCATE SCHEMA.  Do you normally organize your schemas so
 that there are some that contain only tables that need to be truncated
 together?  That would be a strange use case.

I could see it happening in environments which use schemas when doing
partitioning.  eg: data_2014 contains all of the data_201401-201412
monthly (or perhaps weekly) tables.

 Overall, this whole line of development seems like bloating the parse
 tables for little gain.

Still, I see this point also.  I do think it'd be really great if we
could figure out a way to segregate these kinds of DDL / maintenance
commands from the normal select/insert/update/delete SQL parsing, such
that we could add more options, etc, to those longer running and less
frequent commands without impacting parse time for the high-volume
commands.

I'm less concerned about the memory impact, except to the extent that it
impacts throughput and performance.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Alvaro Herrera
Stephen Frost wrote:
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:

  Overall, this whole line of development seems like bloating the parse
  tables for little gain.
 
 Still, I see this point also.  I do think it'd be really great if we
 could figure out a way to segregate these kinds of DDL / maintenance
 commands from the normal select/insert/update/delete SQL parsing, such
 that we could add more options, etc, to those longer running and less
 frequent commands without impacting parse time for the high-volume
 commands.

We do have a parenthesized options clause in VACUUM.  I think adding
this as a clause there would be pretty much free.
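
Purely as a sketch of the idea -- no such option exists today -- it
might read something like:

VACUUM (ANALYZE, SCHEMA import_area);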

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-12-21 14:18:33 -0500, Tom Lane wrote:
  While the feature itself might be fairly innocuous, I'm just wondering
  why we need to encourage manual vacuuming.  And why that, but not
  say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...
 
 There's one argument for supporting more for VACUUM than the rest - it
 can't be executed directly as the result of a query as the others
 can... I wonder if that'd not better be answered by adding a feature to
 vacuumdb that allows selecting the to-be-vacuumed table by a user
 defined query.

Wow.  That's certainly an interesting idea.

We might end up turning the autovacuum process into a generalized
scheduler/cron-like entity that way though.  I'd rather we just build
that.  Users would then be able to run a script periodically which
would add VACUUM commands to be run on whichever tables they want to
the jobs queue, either for immediate execution or at whatever time they
want (or possibly chronically :).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Andres Freund
On 2014-12-22 12:12:12 -0500, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  On 2014-12-21 14:18:33 -0500, Tom Lane wrote:
   While the feature itself might be fairly innocuous, I'm just wondering
   why we need to encourage manual vacuuming.  And why that, but not
   say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...
  
  There's one argument for supporting more for VACUUM than the rest - it
  can't be executed directly as the result of a query as the others
  can... I wonder if that'd not better be answered by adding a feature to
  vacuumdb that allows selecting the to-be-vacuumed table by a user
  defined query.
 
 Wow.  That's certainly an interesting idea.
 
 We might end up turning the autovacuum process into a generalized
 scheduler/cron-like entity that way though.

I'm not talking about autovacuum, just plain vacuumdb.

 I'd rather we just build
 that.  Users would then be able to run a script periodically which
 would add VACUUM commands to be run on whichever tables they want to
 the jobs queue, either for immediate execution or at whatever time they
 want (or possibly chronically :).

And this discussion just feature creeped beyond anything realistic... :)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Alvaro Herrera
Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:

  There's one argument for supporting more for VACUUM than the rest - it
  can't be executed directly as the result of a query as the others
  can... I wonder if that'd not better be answered by adding a feature to
  vacuumdb that allows selecting the to-be-vacuumed table by a user
  defined query.
 
 Wow.  That's certainly an interesting idea.

+1.

 We might end up turning the autovacuum process into a generalized
 scheduler/cron-like entity that way though.  I'd rather we just build
 that.  Users would then be able to run a script periodically which
 would add VACUUM commands to be run on whichever tables they want to
 the jobs queue, either for immediate execution or at whatever time they
 want (or possibly chronically :).

This too.  I think there's one or two orders of magnitude of difference
in implementation effort of these two ideas, however.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-12-22 12:12:12 -0500, Stephen Frost wrote:
  We might end up turning the autovacuum process into a generalized
  scheduler/cron-like entity that way though.
 
 I'm not talking about autovacuum, just plain vacuumdb.

Oh, right, clearly I was thinking of autovacuum.  Adding an option like
that to vacuumdb would certainly be a lot more straight-forward.

  I'd rather we just build
  that.  Users would then be able to run a script periodically which
  would add VACUUM commands to be run on whichever tables they want to
  the jobs queue, either for immediate execution or at whatever time they
  want (or possibly chronically :).
 
 And this discussion just feature creeped beyond anything realistic... :)

Yeah, but I really *want* this... ;)

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Tomas Vondra
On 22.12.2014 17:47, Alvaro Herrera wrote:
 Tomas Vondra wrote:
 On 22.12.2014 07:36, Tatsuo Ishii wrote:
 On 22.12.2014 00:28, Tomas Vondra wrote:
 
 (8) Also, I think it's not necessary to define function prototypes for
 executeStatement2 and is_table_exists. It certainly is not
 consistent with the other functions defined in pgbench.c (e.g.
 there's no prototype for executeStatement). Just delete the two
 prototypes and move is_table_exists before executeStatement2.

 I think not having static function prototypes is not a good
 custom. See other source code in PostgreSQL.

 Yes, but apparently pgbench.c does not do that. It's strange to have
 prototypes for just two of many functions in the file.
 
 Whenever a function is defined before its first use, a prototype is
 not mandatory, so we tend to omit them, but I'm pretty sure there are
 cases where we add them anyway. In my opinion, rearranging code so
 that called functions appear first just to avoid the prototype is not
 a very good way to organize things, though. I haven't looked at this
 patch so I don't know whether this is what's being done here.

I'm not objecting to prototypes in general, but I believe the principle
is to respect how the existing code is written. There are almost no
other prototypes in pgbench.c - e.g. there are no prototypes for
executeStatement(), init() etc. so adding the prototypes in this patch
seems inconsistent. But maybe that's nitpicking.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Fabrízio de Royes Mello
On Mon, Dec 22, 2014 at 3:17 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Stephen Frost wrote:
  * Andres Freund (and...@2ndquadrant.com) wrote:

   There's one argument for supporting more for VACUUM than the rest - it
   can't be executed directly as the result of a query as the others
   can... I wonder if that'd not better be answered by adding a feature
to
   vacuumdb that allows selecting the to-be-vacuumed table by a user
   defined query.
 
  Wow.  That's certainly an interesting idea.

 +1.


Then, to simplify, can we make the --table option of vacuumdb act
similarly to the --table option of pg_dump?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Alvaro Herrera
Tomas Vondra wrote:

 I'm not objecting to prototypes in general, but I believe the principle
 is to respect how the existing code is written. There are almost no
 other prototypes in pgbench.c - e.g. there are no prototypes for
 executeStatement(), init() etc. so adding the prototypes in this patch
 seems inconsistent. But maybe that's nitpicking.

Well, given that Tatsuo-san invented pgbench in the first place, I think
he has enough authority to decide whether to add prototypes or not.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Andres Freund
On 2014-12-22 18:17:56 +0100, Tomas Vondra wrote:
 On 22.12.2014 17:47, Alvaro Herrera wrote:
  Tomas Vondra wrote:
  On 22.12.2014 07:36, Tatsuo Ishii wrote:
  On 22.12.2014 00:28, Tomas Vondra wrote:
  
  (8) Also, I think it's not necessary to define function prototypes for
  executeStatement2 and is_table_exists. It certainly is not
  consistent with the other functions defined in pgbench.c (e.g.
  there's no prototype for executeStatement). Just delete the two
  prototypes and move is_table_exists before executeStatement2.
 
  I think not having static function prototypes is not a good
  custom. See other source code in PostgreSQL.
 
  Yes, but apparently pgbench.c does not do that. It's strange to have
  prototypes for just two of many functions in the file.
  
  Whenever a function is defined before its first use, a prototype is
  not mandatory, so we tend to omit them, but I'm pretty sure there are
  cases where we add them anyway. In my opinion, rearranging code so
  that called functions appear first just to avoid the prototype is not
  a very good way to organize things, though. I haven't looked at this
  patch so I don't know whether this is what's being done here.
 
 I'm not objecting to prototypes in general, but I believe the principle
 is to respect how the existing code is written. There are almost no
 other prototypes in pgbench.c - e.g. there are no prototypes for
 executeStatement(), init() etc. so adding the prototypes in this patch
 seems inconsistent. But maybe that's nitpicking.

I don't see a point in avoiding prototypes if the author thinks it makes
his patch clearer. And it's not like pgbench is the pinnacle of beauty
that would be greatly disturbed by any changes to its form.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Tomas Vondra
On 22.12.2014 18:41, Andres Freund wrote:
 On 2014-12-22 18:17:56 +0100, Tomas Vondra wrote:
 On 22.12.2014 17:47, Alvaro Herrera wrote:
 Tomas Vondra wrote:
 On 22.12.2014 07:36, Tatsuo Ishii wrote:
 On 22.12.2014 00:28, Tomas Vondra wrote:

 (8) Also, I think it's not necessary to define function prototypes for
 executeStatement2 and is_table_exists. It certainly is not
 consistent with the other functions defined in pgbench.c (e.g.
 there's no prototype for executeStatement). Just delete the two
 prototypes and move is_table_exists before executeStatement2.

 I think not having static function prototypes is not a good
 custom. See other source code in PostgreSQL.

 Yes, but apparently pgbench.c does not do that. It's strange to have
 prototypes for just two of many functions in the file.

 Whenever a function is defined before its first use, a prototype is
 not mandatory, so we tend to omit them, but I'm pretty sure there are
  cases where we add them anyway. In my opinion, rearranging code so
 that called functions appear first just to avoid the prototype is not
 a very good way to organize things, though. I haven't looked at this
 patch so I don't know whether this is what's being done here.

 I'm not objecting to prototypes in general, but I believe the
 principle is to respect how the existing code is written. There are
 almost no other prototypes in pgbench.c - e.g. there are no
 prototypes for executeStatement(), init() etc. so adding the
 prototypes in this patch seems inconsistent. But maybe that's
 nitpicking.
 
 I don't see a point in avoiding prototypes if the author thinks it 
 makes his patch clearer. And it's not like pgbench is the pinnacle
 of beauty that would be greatly disturbed by any changes to its form.

I'm not recommending to avoid them (although I'm not sure they make the
patch/code cleaner in this case). It was just a minor observation that
the patch introduces prototypes into code where currently are none. So
executeStatement2() has a prototype while executeStatement() doesn't.

This certainly is not a problem that would make the patch uncommitable,
and yes, it's up to the author to either add prototypes or not. I'm not
going to fight for removing or keeping them.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Jaime Casanova
On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 Hi,

 v2 version of this patch is attached.


A few more tests revealed that passing NULL as the sample size
argument works, and it shouldn't.
For REPEATABLE, passing NULL as the argument does raise an error, but it
is a syntax error; according to the standard it should be a data
exception (data exception -- invalid repeat argument in a sample clause).

Also, you need to add CHECK_FOR_INTERRUPTS somewhere; I tried with a
big table and had to wait a long time for it to finish.


regression=# select count(1) from tenk1 tablesample system (null);
 count
---
28
(1 row)

regression=# select count(1) from tenk1 tablesample bernoulli (null);
 count
---
 0
(1 row)


-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Andrew Dunstan


On 12/21/2014 02:18 PM, Tom Lane wrote:

Fabrízio de Royes Mello fabriziome...@gmail.com writes:

I work with some customer that have databases with a lot of schemas and
sometimes we need to run manual VACUUM in one schema, and would be nice to
have a new option to run vacuum in relations from a specific schema.

I'm pretty skeptical of this alleged use-case.  Manual vacuuming ought
to be mostly a thing of the past, and even if it's not, hitting
*everything* in a schema should seldom be an appropriate thing to do.

While the feature itself might be fairly innocuous, I'm just wondering
why we need to encourage manual vacuuming.  And why that, but not
say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...





Sadly, manual vacuuming is very far from a thing of the past. Autovacuum 
simply doesn't give us enough control in many cases.


Maybe this gadget would be useful, but its application seems a bit 
limited. Someone mentioned allowing multiple --table options to 
vacuumdb. That would be more flexible.


But really I think we need to work on how we can make autovacuum more 
useful. For example, it would be nice not to have to do ALTER TABLE to 
change the autovac settings. It would be nice to be able to specify 
times of day and days of week when autovacuum should be turned on or off 
for a table. I'm sure there are plenty of other ideas.
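
The per-table knobs we have today do indeed require ALTER TABLE, e.g.
(table name and values are only illustrative):

ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01,
                           autovacuum_vacuum_cost_delay = 10);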


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Robert Haas
On Mon, Dec 22, 2014 at 11:19 AM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
  Tom The other reason that's a bad comparison is that I've not seen
  Tom many queries that use more than a couple of window frames,
  Tom whereas we have to expect that the number of grouping sets in
  Tom typical queries will be significantly more than a couple.

 I would be interested in seeing more good examples of the size and
 type of grouping sets used in typical queries.

From what I have seen, there is interest in being able to do things
like GROUP BY CUBE(a, b, c, d) and have that be efficient.  That will
require 16 different groupings, and we really want to minimize the
number of times we have to re-sort to get all of those done.  For
example, if we start by sorting on (a, b, c, d), we want to then make
a single pass over the data computing the aggregates with (a, b, c,
d), (a, b, c), (a, b), (a), and () as the grouping columns.
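
Expressed in SQL (table and column names are hypothetical), that single
pass corresponds to one rollup chain:

-- one sort on (a, b, c, d) can feed all five groupings in one pass
SELECT a, b, c, d, count(*)
  FROM t
 GROUP BY ROLLUP (a, b, c, d);

The full CUBE (a, b, c, d) then needs several such chains to cover all
16 grouping sets.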

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Atri Sharma
On Tuesday, December 23, 2014, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Dec 22, 2014 at 11:19 AM, Andrew Gierth
 and...@tao11.riddles.org.uk wrote:
   Tom The other reason that's a bad comparison is that I've not seen
   Tom many queries that use more than a couple of window frames,
   Tom whereas we have to expect that the number of grouping sets in
   Tom typical queries will be significantly more than a couple.
 
  I would be interested in seeing more good examples of the size and
  type of grouping sets used in typical queries.

 From what I have seen, there is interest in being able to do things
 like GROUP BY CUBE(a, b, c, d) and have that be efficient.  That will
 require 16 different groupings, and we really want to minimize the
 number of times we have to re-sort to get all of those done.  For
 example, if we start by sorting on (a, b, c, d), we want to then make
 a single pass over the data computing the aggregates with (a, b, c,
 d), (a, b, c), (a, b), (a), and () as the grouping columns.



That is exactly what the ChainAggregate node does. A set of orderings that
fits in a single ROLLUP list (like your example) is processed in a single
pass.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Petr Jelinek

On 22/12/14 20:14, Jaime Casanova wrote:

On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek p...@2ndquadrant.com wrote:

Hi,

v2 version of this patch is attached.



A few more tests revealed that passing NULL as the sample size
argument works, and it shouldn't.


Fixed.


For REPEATABLE, passing NULL as the argument does raise an error, but it
is a syntax error; according to the standard it should be a data exception
(data exception -- invalid repeat argument in a sample clause).


Also fixed.



Also, you need to add CHECK_FOR_INTERRUPTS somewhere; I tried with a
big table and had to wait a long time for it to finish.


Ah yeah, I can't rely on CHECK_FOR_INTERRUPTS in ExecScan because it 
might take a while to fetch a row if percentage is very small and table 
is big... Fixed.



Attached is v3 which besides the fixes mentioned above also includes 
changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE 
METHOD), fixes for crash with FETCH FIRST and is rebased against current 
master.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( replaceable class=parameterexpression/replac
 
 phrasewhere replaceable class=parameterfrom_item/replaceable can be one of:/phrase
 
-[ ONLY ] replaceable class=parametertable_name/replaceable [ * ] [ [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ] ]
+[ ONLY ] replaceable class=parametertable_name/replaceable [ * ] [ TABLESAMPLE replaceable class=parametersampling_method/replaceable ( replaceable class=parameterargument/replaceable [, ...] ) [ REPEATABLE ( replaceable class=parameterseed/replaceable ) ] ] [ [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ] ]
 [ LATERAL ] ( replaceable class=parameterselect/replaceable ) [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ]
 replaceable class=parameterwith_query_name/replaceable [ [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ] ]
 [ LATERAL ] replaceable class=parameterfunction_name/replaceable ( [ replaceable class=parameterargument/replaceable [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ]
  /varlistentry
 
  varlistentry
+  termTABLESAMPLE replaceable class=parametersampling_method/replaceable ( replaceable class=parameterargument/replaceable [, ...] ) [ REPEATABLE ( replaceable class=parameterseed/replaceable ) ]/term
+  listitem
+   para
+Table sample clause after
+replaceable class=parametertable_name/replaceable indicates that
+a replaceable class=parametersampling_method/replaceable should
+be used to retrieve subset of rows in the table.
+The replaceable class=parametersampling_method/replaceable can be
+one of:
+itemizedlist
+ listitem
+  paraliteralSYSTEM/literal/para
+ /listitem
+ listitem
+  paraliteralBERNOULLI/literal/para
+ /listitem
+/itemizedlist
+Both of those sampling methods currently accept only single argument
+which is the percent (floating point from 0 to 100) of the rows to
+be returned.
+The literalSYSTEM/literal sampling method does block level
+sampling with each block having same chance of being selected and
+returns all rows from each selected block.
+The literalBERNOULLI/literal scans whole table and returns
+individual rows with equal probability.
+The optional numeric parameter literalREPEATABLE/literal is used
+as random seed for sampling.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
   termreplaceable class=parameteralias/replaceable/term
   listitem
para
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..595737c 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist \
+			  transam tsm
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile
new file mode 100644
index 000..73bbbd7
--- /dev/null
+++ b/src/backend/access/tsm/Makefile
@@ -0,0 +1,17 @@

Re: [HACKERS] TAP test breakage on MacOS X

2014-12-22 Thread Christoph Berg
Re: Peter Eisentraut 2014-11-03 5457f54e.4020...@gmx.net
 On 11/2/14 2:00 PM, Noah Misch wrote:
  Ick; I concur with your judgment on those aspects of the IPC::Cmd design.
  Thanks for investigating.  So, surviving options include:
 
  1. Require IPC::Run.
  2. Write our own run() that reports the raw exit code.
  3. Distill the raw exit code from the IPC::Cmd::run() error string.
  4. Pass IPC::Run::run_forked() a subroutine that execs an argument list.
  
  FWIW, (3) looks most promising to me.  That is to say, implement a reverse 
  of
  IPC::Cmd::_pp_child_error().  Ugly to be sure, but the wart can be small and
  self-contained.
 
 I thank you for this research, but I suggest that we ship 9.4 as is,
 that is with requiring IPC::Run and --enable-* option.  All the possible
 alternatives will clearly need more rounds of portability testing.  We
 can then evaluate these changes for 9.5 in peace.

Hrm. I put some effort into getting the TAP tests to run on 9.4beta
for Debian, and I've only now learned that 9.4.0 doesn't run them
unless I say --enable-tap-tests. A short note to -packagers would have
been nice, for a change so late in the release cycle...

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-22 Thread Stephen Frost
Hey Alvaro,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
 I have attached an updated patch for review
 (role-attribute-bitmask-v7.patch).
 
 This patch incorporates the 'v5a' patch proposed by Alvaro, input
 validation (Assert) check in 'check_role_attribute' and the documentation
 updates requested by Stephen.

Not sure if you were looking for me to comment on this or not, but I did
look over the updated docs and the added Asserts and they look good to
me.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-22 Thread Heikki Linnakangas

On 12/20/2014 11:14 PM, Peter Geoghegan wrote:

On Sat, Dec 20, 2014 at 2:16 AM, Martijn van Oosterhout
klep...@svana.org wrote:

What I find curious about the opclass thing is: when do you ever have
an opclass that has a different idea of equality than the default
opclass for the type? In other words, when is B-Tree strategy number 3
not actually '=' in practice, for *any* B-Tree opclass? Certainly, it
doesn't appear to be the case that it isn't so with any shipped
opclasses - the shipped non-default B-Tree opclasses only serve to
provide alternative notions of sort order, and never equals.


Well, in theory you could build a case-insensitive index on a text
column. You could argue that the column should have been defined as
citext in the first place, but it might not for various reasons.


That generally works in other systems by having a case-insensitive
collation. I don't know if that implies that non bitwise identical
items can be equal according to the equals operator in those other
systems. There aren't too many examples of that happening in general
(I can only think of citext and numeric offhand), presumably because
it necessitates a normalization process (such as lower-casing in the
case of citext) within the hash opclass support function 1, a process
best avoided.

citext is an interesting precedent that supports my argument above,
because citext demonstrates that we preferred to create a new type
rather than a new non-default opclass (with a non-'=' equals
operator) when the time came to introduce a new concept of equals (and
not merely a new, alternative sort order). Again, this is surely due
to the system dependency on the default B-Tree opclass for the
purposes of GROUP BY and DISTINCT, whose behavior doesn't necessarily
involve sort ordering at all.


Yeah, I don't expect it to happen very often. It's confusing to have 
multiple definitions of equality.


There is one built-in example: the record *= record operator [1]. It's 
quite special purpose, the docs even say that they are not intended to 
be generally useful for writing queries. But there they are.
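
A quick illustration of the difference (values chosen arbitrarily):

SELECT ROW(1.0::numeric) =  ROW(1.00::numeric) AS equal,
       ROW(1.0::numeric) *= ROW(1.00::numeric) AS identical;

Here equal is true but identical is false, because *= compares the byte
images and the two numerics carry different display scales.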


I feel that it needs to be possible to specify the constraint 
unambiguously in all cases. These are very rare use cases, but we should 
have an escape hatch for the rare cases that need it.



What would it take to also support partial indexes?

[1] See 
http://www.postgresql.org/docs/devel/static/functions-comparisons.html#ROW-WISE-COMPARISON


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-22 Thread Alvaro Herrera
Stephen Frost wrote:
 Hey Alvaro,
 
 * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
  I have attached an updated patch for review
  (role-attribute-bitmask-v7.patch).
  
  This patch incorporates the 'v5a' patch proposed by Alvaro, input
  validation (Assert) check in 'check_role_attribute' and the documentation
  updates requested by Stephen.
 
 Not sure if you were looking for me to comment on this or not, but I did
 look over the updated docs and the added Asserts and they look good to
 me.

Okay, great.  I will be looking into committing this patch shortly --
unless you want to do it yourself?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Jim Nasby

On 12/22/14, 10:05 AM, Andres Freund wrote:

While the feature itself might be fairly innocuous, I'm just wondering
why we need to encourage manual vacuuming.  And why that, but not
say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...

There's one argument for supporting more for VACUUM than the rest - it
can't be executed directly as the result of a query as the others
can... I wonder if that'd not better be answered by adding a feature to
vacuumdb that allows selecting the to-be-vacuumed table by a user
defined query.


I would MUCH rather that we find a way to special-case executing 
non-transactional commands dynamically, because VACUUM isn't the only one that 
suffers from this problem.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread Jim Nasby

On 12/21/14, 8:55 PM, Fabrízio de Royes Mello wrote:

   And why that, but not
   say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...
  

+1. I can write patches for each of this maintenance statement too.


If we're going to go that route, then perhaps it would make more sense to 
create a command that allows you to apply a second command to every object in a 
schema. We would have to be careful about PreventTransactionChain commands.


  Sorry but I don't understand what you meant. Can you explain more about your 
idea?


There's a very large number of commands that could be useful to execute on 
every object in a schema. (RE)INDEX, CLUSTER, ALTER come to mind besides VACUUM.

Right now a lot of people just work around this with things like DO blocks, but 
as mentioned elsewhere in the thread that fails for commands that can't be in a 
transaction.
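
A minimal sketch of that workaround (the schema name is hypothetical);
the whole block runs in one transaction, which is exactly why it cannot
drive non-transactional commands such as VACUUM:

DO $$
DECLARE
    rel regclass;
BEGIN
    FOR rel IN SELECT c.oid::regclass
                 FROM pg_class c
                 JOIN pg_namespace n ON n.oid = c.relnamespace
                WHERE n.nspname = 'import_area' AND c.relkind = 'r'
    LOOP
        EXECUTE format('TRUNCATE TABLE %s', rel);
    END LOOP;
END $$;
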
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-22 Thread Peter Geoghegan
On Mon, Dec 22, 2014 at 1:24 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I feel that it needs to be possible to specify the constraint unambiguously
 in all cases. These are very rare use cases, but we should have an escape
 hatch for the rare cases that need it.


 What would it take to also support partial indexes?

Aside from considerations about how to pick them without using their
name, partial unique indexes aren't special at all. My earlier concern
was that we'd need to account for before row insert triggers that
change values out from under us. But maybe that concern was overblown,
come to think of it. I am already borrowing a little bit of the raw
parser's logic for CREATE INDEX statements for unique index inference
(during parse analysis) -- we're matching the cataloged index
definition attributes/expressions, so this makes a lot of sense. Maybe
I had the wrong idea about partial indexes earlier, which was that we
must use the values in the tuple proposed for insertion to check that
a partial index was a suitable arbiter of whether or not the UPDATE
path should be taken in respect of any given tuple. I should just go
further with borrowing things from CREATE INDEX, and give the user an
optional way of specifying a WHERE clause that is also matched in a
similar way to the expressions themselves. Did the partial unique
index your UPSERT implied not cover the ultimate tuple inserted after
before row insert triggers fired? That's on you as a user...you'll
always get an insert, since there won't be a would-be duplicate
violation to make there be an update.

I actually care about partial unique indexes a lot. They're a very
useful feature. Back when I was an application developer, I frequently
used is_active boolean columns to represent logical app-level
deletion, where actually deleting the tuple was not possible (e.g.
because it may still be referenced in historic records), while not
wanting to have it be subject to uniqueness checks as a logically
deleted/!is_active tuple.
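
That pattern looks roughly like this (names are only illustrative):

CREATE TABLE accounts (
    id        serial PRIMARY KEY,
    email     text NOT NULL,
    is_active boolean NOT NULL DEFAULT true
);

-- only live rows participate in the uniqueness check
CREATE UNIQUE INDEX accounts_email_active_key
    ON accounts (email) WHERE is_active;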

This measure to support partial indexes, plus the additional leeway
around non-default opclass unique indexes that I can add (that they
need only match the equals operator of the default opclass to be
accepted) brings us 99.9% of the way. That only leaves:

* An inability to specify some subset of unique indexes or
exclusion constraints for the IGNORE variant (the UPDATE variant is
irrelevant).

* An inability to specify an IGNORE-arbitrating *exclusion
constraint* as the sole arbiter of whether or not the IGNORE path
should be taken. (Exclusion constraints are not usable for the UPDATE
variant, so that's irrelevant again.)

Did I forget something?

The use cases around these limitations are very rare, and only apply
to the IGNORE variant which seems much less interesting. I'm quite
comfortable dealing with them in a later release of PostgreSQL, to cut
scope (or avoid adding scope) for 9.5. Do you think that's okay? How
often will the IGNORE variant be used when everything shouldn't be
IGNOREd anyway? Although, to be totally fair, I should probably also
include:

* non-default B-tree opclasses cannot be specified as arbiters of the
alternative path (for both IGNORE and UPDATE variants) iff their
equals operator happens to not be the equals operator of the
default opclass (which is theoretical, and likely non-existent as a
use case).

If you're dead set on having an escape hatch, maybe we should just get
over it and add a way of specifying a unique index by name. As I said,
these under-served use cases are either exceedingly rare or entirely
theoretical.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-22 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
   I have attached an updated patch for review
   (role-attribute-bitmask-v7.patch).
   
   This patch incorporates the 'v5a' patch proposed by Alvaro, input
   validation (Assert) check in 'check_role_attribute' and the documentation
   updates requested by Stephen.
  
  Not sure if you were looking for me to comment on this or not, but I did
  look over the updated docs and the added Asserts and they look good to
  me.
 
 Okay, great.  I will be looking into committing this patch shortly --
 unless you want to do it yourself?

Please feel free to go ahead.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-22 Thread Oskari Saarenmaa
08.11.2014, 04:03, Peter Eisentraut wrote:
 On 11/4/14 3:52 PM, Peter Eisentraut wrote:
  Here are patches to address that.  First, it reports errors when
  attempting to create a tar header that would truncate file or symlink
  names.  Second, it works around the problem in the tests by creating a
  symlink from the short-name tempdir that we had set up for the
  Unix-socket directory case.
 I ended up splitting this up differently.  I applied to part of the
 second patch that works around the length issue in tablespaces.  So the
 tests now pass in 9.4 and up even in working directories with long
 names.  This clears up the regression in 9.4.
 
 The remaining, not applied patch is attached.  It errors when the file
 name is too long and adds tests for that.  This could be applied to 9.5
 and backpatched, if we so choose.  It might become obsolete if
 https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted.
  If that patch doesn't get accepted, I might add my patch to a future
 commit fest.

I think we should just use the UStar tar format
(http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and
allow long file names; all actively used tar implementations should be
able to handle them.  I'll try to write a patch for that soonish.

Until UStar format is used we should raise an error if a filename is
being truncated by tar instead of creating invalid archives.  Also note
that Posix tar format allows 100 byte file names as the name doesn't
have to be zero terminated, but we may want to stick to 99 bytes in old
type tar anyway as using 100 byte filenames has shown bugs in other tar
implementations, for example
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=689582 - and
truncating at 100 bytes instead of 99 doesn't help us too much anyway.

/ Oskari



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-22 Thread Oskari Saarenmaa
On Fri, Nov 14, 2014 at 01:57:16AM +0100, Andreas Karlsson wrote:
 *** a/configure.in
 --- b/configure.in
 *** AC_DEFINE_UNQUOTED(MAXIMUM_ALIGNOF, $MAX
 *** 1751,1756 
 --- 1751,1759 
   AC_CHECK_TYPES([int8, uint8, int64, uint64], [], [],
  [#include <stdio.h>])
   
 + # Check if platform support gcc style 128-bit integers.
  + AC_CHECK_TYPES([__int128_t, __uint128_t], [], [], [#include <stdint.h>])
 + 
   # We also check for sig_atomic_t, which *should* be defined per ANSI
   # C, but is missing on some old platforms.
  AC_CHECK_TYPES(sig_atomic_t, [], [], [#include <signal.h>])

__int128_t and __uint128_t are GCC extensions and are not related to
stdint.h.

 *** a/src/include/pg_config.h.in
 --- b/src/include/pg_config.h.in
 ***
 *** 198,203 
 --- 198,209 
   /* Define to 1 if you have __sync_compare_and_swap(int64 *, int64, int64). 
 */
   #undef HAVE_GCC__SYNC_INT64_CAS
   
 + /* Define to 1 if you have __int128_t */
 + #undef HAVE___INT128_T
 + 
 + /* Define to 1 if you have __uint128_t */
 + #undef HAVE___UINT128_T
 + 
   /* Define to 1 if you have the `getaddrinfo' function. */
   #undef HAVE_GETADDRINFO

These changes don't match what my autoconf does.  Not a big deal I guess,
but if this is merged as-is the next time someone runs autoreconf it'll
write these lines differently to a different location of the file and the
change will end up as a part of an unrelated commit.

 *** a/src/backend/utils/adt/numeric.c
 --- b/src/backend/utils/adt/numeric.c
 *** static void apply_typmod(NumericVar *var
 *** 402,407 
 --- 402,410 
   static int32 numericvar_to_int4(NumericVar *var);
   static bool numericvar_to_int8(NumericVar *var, int64 *result);
   static void int8_to_numericvar(int64 val, NumericVar *var);
 + #ifdef HAVE_INT128
 + static void int16_to_numericvar(int128 val, NumericVar *var);
 + #endif

Existing code uses int4 and int8 to refer to 32 and 64 bit integers as
they're also PG datatypes, but int16 isn't one and it looks a lot like
int16_t.  I think it would make sense to just call it int128 internally
everywhere instead of int16 which isn't used anywhere else to refer to 128
bit integers.

 new file mode 100755

I guess src/tools/git-external-diff generated these bogus new file mode
lines?  I know the project policy says that context diffs should be used,
but it seems to me that most people just use unified diffs these days so I'd
just use git show or git format-patch -1 to generate the patches.  The
ones generated by git format-patch can be easily applied by reviewers
using git am.


/ Oskari


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade needs postmaster [sic]

2014-12-22 Thread Christoph Berg
Hi,

I've played with trying to find out which minimal set of files I need
from the old version to make pg_upgrade work. Interestingly, this
includes the good old postmaster binary:

$ sudo -u postgres pgsql/bin/pg_upgrade -b /var/tmp/pgsql/bin/ -B 
/usr/lib/postgresql/9.5/bin/ -d /etc/postgresql/9.5/main -D /tmp/9.5/data
Finding the real data directory for the old cluster sh: 1: 
/var/tmp/pgsql/bin/postmaster: not found

Could not get data directory using /var/tmp/pgsql/bin/postmaster -D 
/etc/postgresql/9.5/main -C data_directory: No such file or directory
Failure, exiting

I think it should just use postgres there, patch attached. (If we
need to be compatible with postmaster-only PG versions from the last
century, it should try both names.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index cfc88ec..bd810cd
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** adjust_data_dir(ClusterInfo *cluster)
*** 415,421 
  	 * so this might fail --- only works for PG 9.2+.   If this fails,
  	 * pg_upgrade will fail anyway because the data files will not be found.
  	 */
! 	snprintf(cmd, sizeof(cmd), "\"%s/postmaster\" -D \"%s\" -C data_directory",
  			 cluster->bindir, cluster->pgconfig);
  
  	if ((output = popen(cmd, "r")) == NULL ||
--- 415,421 
  	 * so this might fail --- only works for PG 9.2+.   If this fails,
  	 * pg_upgrade will fail anyway because the data files will not be found.
  	 */
! 	snprintf(cmd, sizeof(cmd), "\"%s/postgres\" -D \"%s\" -C data_directory",
  			 cluster->bindir, cluster->pgconfig);
  
  	if ((output = popen(cmd, "r")) == NULL ||

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade needs postmaster [sic]

2014-12-22 Thread Bruce Momjian
On Mon, Dec 22, 2014 at 11:48:52PM +0100, Christoph Berg wrote:
 Hi,
 
 I've played with trying to find out which minimal set of files I need
 from the old version to make pg_upgrade work. Interestingly, this
 includes the good old postmaster binary:
 
 $ sudo -u postgres pgsql/bin/pg_upgrade -b /var/tmp/pgsql/bin/ -B 
 /usr/lib/postgresql/9.5/bin/ -d /etc/postgresql/9.5/main -D /tmp/9.5/data
 Finding the real data directory for the old cluster sh: 1: 
 /var/tmp/pgsql/bin/postmaster: not found
 
 Could not get data directory using /var/tmp/pgsql/bin/postmaster -D 
 /etc/postgresql/9.5/main -C data_directory: No such file or directory
 Failure, exiting
 
 I think it should just use postgres there, patch attached. (If we
 need to be compatible with postmaster-only PG versions from the last
 century, it should try both names.)

Yes, I think you are right.  I see pg_ctl using the 'postgres' binary,
so pg_upgrade should do the same.  I will apply this patch soon, and see
if there are any other 'postmaster' mentions that need updating.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Tatsuo Ishii
 First of all - I'm not entirely convinced the IF EXISTS approach is
 somehow better than -f implies -n suggested before, but I don't have a
 strong preference either.

I revisited the -f implies -n approach again. The main reason why I
wanted to avoid that approach was that it breaks backward
compatibility. However, if we are not going to back-port the patch,
the approach is simpler and cleaner from the point of view of code
organization, I think (the patch I posted already makes it impossible
to back-port because to_regclass is used).

However there's another problem with the approach. If we want to use
-f *and* run vacuum before testing, currently there's no way to do
it. -v might help, but it runs vacuum against pgbench_accounts
(without -v, pgbench runs vacuum against pgbench_* except
pgbench_accounts). To solve the problem, we would need to add opposite
option to -n, run VACUUM before tests except pgbench_accounts
(suppose the option be -k). But maybe someone said why don't we
vacuum always pgbench_accounts? These days machines are pretty fast
and we don't need to care about it any more.

So my questions are:

1) Which approach is better IF EXISTS or -f implies -n?

2) If the latter is better, do we need to add a -k option? Or is it not
   worth the trouble?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Andrew Gierth
 Robert == Robert Haas robertmh...@gmail.com writes:

  I would be interested in seeing more good examples of the size and
  type of grouping sets used in typical queries.

 Robert From what I have seen, there is interest in being able to do
 Robert things like GROUP BY CUBE(a, b, c, d) and have that be
 Robert efficient.

Yes, but that's not telling me anything I didn't already know.

What I'm curious about is things like:

 - what's the largest cube(...) people actually make use of in practice

 - do people make much use of the ability to mix cube and rollup, or
   take the cross product of multiple grouping sets

 - what's the most complex GROUPING SETS clause anyone has seen in
   common use

 Robert That will require 16 different groupings, and we really want
 Robert to minimize the number of times we have to re-sort to get all
 Robert of those done.  For example, if we start by sorting on (a, b,
 Robert c, d), we want to then make a single pass over the data
 Robert computing the aggregates with (a, b, c, d), (a, b, c), (a,
 Robert b), (a), and () as the grouping columns.

In the case of cube(a,b,c,d), our code currently gives:

b,d,a,c:  (b,d,a,c),(b,d)
a,b,d:(a,b,d),(a,b)
d,a,c:(d,a,c),(d,a),(d)
c,d:  (c,d),(c)
b,c,d:(b,c,d),(b,c),(b)
a,c,b:(a,c,b),(a,c),(a),()

There is no solution in less than 6 sorts. (There are many possible
solutions in 6 sorts, but we don't attempt to prefer one over
another. The minimum number of sorts for a cube of N dimensions is
obviously N! / (r! * (N-r)!) where r = floor(N/2).)
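
(For the cube(a,b,c,d) case above, N = 4 and r = 2, so that gives
4! / (2! * 2!) = 24 / 4 = 6, matching the six sorts listed.)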

If you want the theory: the set of grouping sets is a poset ordered by
set inclusion; what we want is a minimal partition of this poset into
chains (since any chain can be processed in one pass), which happens
to be equivalent to the problem of maximum cardinality matching in a
bipartite graph, which we solve in polynomial time with the
Hopcroft-Karp algorithm.  This guarantees us a minimal solution for
any combination of grouping sets however specified, not just for
cubes.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-22 Thread Peter Geoghegan
On Mon, Dec 22, 2014 at 5:50 AM, Robert Haas robertmh...@gmail.com wrote:
 Looking over the latest patch, I think we could simplify the code so
 that you don't need multiple FuzzyAttrMatchState objects.  Instead of
 creating a separate one for each RTE and then merging them, just have
 one.  When you find an inexact-RTE name match, set a field inside the
 FuzzyAttrMatchState -- maybe with a name like rte_penalty -- to the
 Levenshtein distance between the RTEs.  Then call scanRTEForColumn()
 and pass down the same state object.  Now let
 updateFuzzyAttrMatchState() work out what it needs to do based on the
 observed inter-column distance and the currently-in-force RTE penalty.

I'm afraid I don't follow. I think doing things that way makes things
less clear. Merging is useful because it allows us to consider that an
exact match might exist, which this searchRangeTableForCol() is
already tasked with today. We now look for the best match
exhaustively, or magically return immediately in the event of an exact
match, without caring about the alias correctness or distance.

Having a separate object makes this pattern apparent from the top
level, within searchRangeTableForCol(). I feel that's better.
updateFuzzyAttrMatchState() is the wrong place to put that, because
that task rightfully belongs in searchRangeTableForCol(), where the
high level diagnostic-report-generating control flow lives.

To put it another way, creating a separate object obfuscates
scanRTEForColumn(), since it's the only client of
updateFuzzyAttrMatchState(). scanRTEForColumn() is a very important
function, and right now I am only making it slightly less clear by
tasking it with caring about distance of names on top of strict binary
equality of attribute names. I don't want to push it any further.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-22 Thread Peter Geoghegan
On Mon, Dec 22, 2014 at 4:34 PM, Peter Geoghegan p...@heroku.com wrote:
 To put it another way, creating a separate object obfuscates
 scanRTEForColumn(), since it's the only client of
 updateFuzzyAttrMatchState().


Excuse me. I mean *not* creating a separate object -- having a unified
state representation for the entire range-table, rather than having
one per RTE and merging them one by one into an overall/final range
table object.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bin checks taking too long.

2014-12-22 Thread Andrew Dunstan
Currently the buildfarm animal crake (my development instance) is 
running the bin check, but not any other animal. These tests still take 
far too long, not least because each set of tests requires a separate 
install.


I have complained about this before, but we don't seem to have made any 
progress. I am very reluctant to put this check into a buildfarm client 
release until that is addressed. Is there really nothing we can do about 
it? This is getting more important since we now have another feature 
that we want to get into a buildfarm client release.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench -f and vacuum

2014-12-22 Thread Alvaro Herrera
Here's a completely different idea.  How about we add an option meaning
"vacuum this table before running the test" (it can be given several
times); by default the set of vacuumed tables is the current pgbench_*
list, but if -f is specified then the default set is cleared.  So if you
have a -f script and want to vacuum the default tables, you're forced to
give a few --vacuum-table=foo options.  But this gives you the option to
vacuum some other table before the test, not just the pgbench default
ones.

This is a bit more complicated, and makes life more difficult for people
using -f and the default pgbench tables than your proposed new -k
switch would.  But it's more general and it might have more use cases; and
people using -f with the default pgbench tables are probably rare
anyway.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-12-22 Thread Alvaro Herrera
Here's a five-part split of the remaining pieces of this patch.

Patch 0001 is the one I posted in 
http://www.postgresql.org/message-id/20141220022308.gy1...@alvh.no-ip.org
which adds support for COMMENT ON CONSTRAINT .. ON DOMAIN.  This just
splits OBJECT_CONSTRAINT into OBJECT_TABCONSTRAINT and
OBJECT_DOMCONSTRAINT.  It includes \dd support and pg_dump support for
comments on domain constraints.

I intend to commit this one first thing tomorrow.

Patch 0002 adds OBJECT_DEFAULT support.  This is not needed currently,
so there's no bug being fixed; we just need it if we want to use
get_object_address in ways it is not currently used.

Patch 0003 adds an (as yet unused) table and routine to map the strings
returned by getObjectTypeDescription into enum ObjectType, for use by
0004.  It also splits part of parseTypeString into a new function,
typeStringToTypeName(), also for use by 0004.

Patch 0004 adds a SQL-callable interface to get_object_address,
imaginatively called pg_get_object_address; this uses the stuff in patch
0003.  It includes a simple regression test.  The code that converts the
text arrays into the appropriate List structures is messy because it
needs to mimic parser output.

I intend to push these three patches as a single commit tomorrow.

Patch 0005 adds getObjectIdentityParts(), which returns the object
identity in arrays that can be passed to pg_get_object_address.  This
part needs slight revisions so I'm not sure I will be able to push
tomorrow.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f5a324e864baf60df989e313740744732c04404d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Fri, 19 Dec 2014 17:03:29 -0300
Subject: [PATCH 1/5] Distinguish domain constraint from table constraints

---
 doc/src/sgml/ref/comment.sgml  | 14 ++
 src/backend/catalog/objectaddress.c| 26 ++
 src/backend/commands/alter.c   |  3 ++-
 src/backend/commands/event_trigger.c   |  3 ++-
 src/backend/commands/tablecmds.c   |  2 +-
 src/backend/parser/gram.y  | 18 +-
 src/backend/parser/parse_utilcmd.c |  2 +-
 src/backend/tcop/utility.c |  5 ++---
 src/bin/pg_dump/pg_dump.c  | 17 +
 src/bin/psql/describe.c| 27 +--
 src/include/nodes/parsenodes.h |  3 ++-
 src/test/regress/input/constraints.source  | 21 +
 src/test/regress/output/constraints.source | 19 +++
 13 files changed, 141 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ref/comment.sgml b/doc/src/sgml/ref/comment.sgml
index 36a7312..62e1968 100644
--- a/doc/src/sgml/ref/comment.sgml
+++ b/doc/src/sgml/ref/comment.sgml
@@ -28,6 +28,7 @@ COMMENT ON
   COLLATION replaceable class=PARAMETERobject_name/replaceable |
   COLUMN replaceable class=PARAMETERrelation_name/replaceable.replaceable class=PARAMETERcolumn_name/replaceable |
   CONSTRAINT replaceable class=PARAMETERconstraint_name/replaceable ON replaceable class=PARAMETERtable_name/replaceable |
+  CONSTRAINT replaceable class=PARAMETERconstraint_name/replaceable ON DOMAIN replaceable class=PARAMETERdomain_name/replaceable |
   CONVERSION replaceable class=PARAMETERobject_name/replaceable |
   DATABASE replaceable class=PARAMETERobject_name/replaceable |
   DOMAIN replaceable class=PARAMETERobject_name/replaceable |
@@ -127,6 +128,18 @@ COMMENT ON
/varlistentry
 
varlistentry
+termreplaceable class=parametertable_name/replaceable/term
+termreplaceable class=parameterdomain_name/replaceable/term
+listitem
+ para
+  When creating a comment on a constraint on a table or a domain, these
+  parameters specify the name of the table or domain on which the
+  constraint is defined.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
  termreplaceablesource_type/replaceable/term
  listitem
   para
@@ -266,6 +279,7 @@ COMMENT ON COLLATION fr_CA IS 'Canadian French';
 COMMENT ON COLUMN my_table.my_column IS 'Employee ID number';
 COMMENT ON CONVERSION my_conv IS 'Conversion to UTF8';
 COMMENT ON CONSTRAINT bar_col_cons ON bar IS 'Constrains column col';
+COMMENT ON CONSTRAINT dom_col_constr ON DOMAIN dom IS 'Constrains col of domain';
 COMMENT ON DATABASE my_database IS 'Development Database';
 COMMENT ON DOMAIN my_domain IS 'Email Address Domain';
 COMMENT ON EXTENSION hstore IS 'implements the hstore data type';
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index e261307..297deb5 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -530,11 +530,28 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 break;
 			case OBJECT_RULE:
 			case OBJECT_TRIGGER:
-			case 

Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-22 Thread Amit Kapila
On Tue, Dec 23, 2014 at 4:10 AM, Oskari Saarenmaa o...@ohmu.fi wrote:

 08.11.2014, 04:03, Peter Eisentraut wrote:
  On 11/4/14 3:52 PM, Peter Eisentraut wrote:
   Here are patches to address that.  First, it reports errors when
   attempting to create a tar header that would truncate file or symlink
   names.  Second, it works around the problem in the tests by creating a
   symlink from the short-name tempdir that we had set up for the
   Unix-socket directory case.
  I ended up splitting this up differently.  I applied the part of the
  second patch that works around the length issue in tablespaces.  So the
  tests now pass in 9.4 and up even in working directories with long
  names.  This clears up the regression in 9.4.
 
  The remaining, not applied patch is attached.  It errors when the file
  name is too long and adds tests for that.  This could be applied to 9.5
  and backpatched, if we so choose.  It might become obsolete if
  https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted.
   If that patch doesn't get accepted, I might add my patch to a future
  commit fest.

 I think we should just use the UStar tar format
 (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and
 allow long file names; all actively used tar implementations should be
 able to handle them.  I'll try to write a patch for that soonish.


I think even using the UStar format won't make it work on Windows, where
the standard utilities are not able to understand symlinks in tar.
There is already a patch [1] in this CF which will handle both cases, so I
am not sure it is a very good idea to go with a new tar format to handle
this issue.

[1] : https://commitfest.postgresql.org/action/patch_view?id=1512
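
(For context, the UStar extension Oskari refers to stores long member
names split across a 155-byte prefix field and a 100-byte name field;
a rough Python sketch of that split, purely illustrative and not the
proposed patch:)

def ustar_split(path):
    """Split a member path into the (prefix, name) fields of a UStar header.

    'name' holds up to 100 bytes and 'prefix' up to 155; readers rebuild the
    path as prefix + '/' + name.  Returns None when no split at a '/' fits,
    in which case plain ustar cannot store the path at all (this sketch
    counts characters where a real header counts bytes).
    """
    if len(path) <= 100:
        return "", path
    cut = path.rfind("/", 0, 156)            # rightmost '/' usable as the split
    if cut > 0 and len(path) - cut - 1 <= 100:
        return path[:cut], path[cut + 1:]
    return None

# A long tablespace path typically fits once split this way; a single path
# component longer than 100 bytes still cannot be represented.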

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] bin checks taking too long.

2014-12-22 Thread Peter Eisentraut
On 12/22/14 7:56 PM, Andrew Dunstan wrote:
 Currently the buildfarm animal crake (my development instance) is
 running the bin check, but not any other animal. These tests still take
 far too long, not least because each set of tests requires a separate
 install.

You can avoid that by using make installcheck.

 Is there really nothing we can do about it?

There is, but it's not straightforward, as it turns out.  Ideas are welcome.

Also, as you can imagine, any build system changes are bottlenecked on
Windows support.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] alter user set local_preload_libraries.

2014-12-22 Thread Peter Eisentraut
On 8/29/14 4:01 PM, Peter Eisentraut wrote:
 On 8/28/14 9:01 AM, Kyotaro HORIGUCHI wrote:
 I found this issue when trying per-pg_user (role) loading of
 auto_analyze and some tweaking tool. It is not necessarily something the
 user sets on their own, but the ability to decide whether to load a
 module based on the session user would be useful, at least for me :)
 
 I think we could just set local_preload_libraries to PGC_USERSET and
 document that subsequent changes won't take effect.  That's the same way
 session_preload_libraries works.

Committed this way.

This doesn't prevent future fine-tuning in this area, but in the
subsequent discussion, there was no clear enthusiasm for any particular
direction.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Missing updates at few places for row level security

2014-12-22 Thread Amit Kapila
There is a new column added in pg_authid (rolbypassrls),
and the corresponding updates are missing in the places below:

a. System catalog page for pg_authid
http://www.postgresql.org/docs/devel/static/catalog-pg-authid.html
b. Do we want to add this new column to the pg_shadow view?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Noah Misch
On Mon, Dec 22, 2014 at 10:46:16AM -0500, Tom Lane wrote:
 I still find the ChainAggregate approach too ugly at a system structural
 level to accept, regardless of Noah's argument about number of I/O cycles
 consumed.  We'll be paying for that in complexity and bugs into the
 indefinite future, and I wonder if it isn't going to foreclose some other
 performance opportunities as well.

Among GROUPING SETS GroupAggregate implementations, I bet there's a nonempty
intersection between those having maintainable design and those having optimal
I/O usage, optimal memory usage, and optimal number of sorts.  Let's put more
effort into finding it.  I'm hearing that the shared tuplestore is
ChainAggregate's principal threat to system structure; is that right?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers