Re: [HACKERS] Patch to support SEMI and ANTI join removal

2014-08-17 Thread David Rowley
On Sun, Aug 10, 2014 at 11:48 PM, David Rowley dgrowle...@gmail.com wrote:


  I've attached an updated version of the patch which fixes up some
 incorrect logic in the foreign key matching code, plus various comment
 improvements.


 I've made a few updates to the patch to simplify some logic in the code
which analyses the join condition. The result is slightly faster code for
detecting either successful or unsuccessful join removal.

I've also been doing a little benchmarking of the overhead that this adds
to planning time for a handful of different queries.
With the queries I tested, the overhead was between ~20 and ~423 nanoseconds
per SEMI or ANTI join: the ~20 was for the earliest fast-path exit on an
unsuccessful removal and the ~423 was for a successful removal (tests done
on a 4-year-old Intel i5 laptop). This accounted for between 0.01% and 0.2%
of planning time for the tested queries. I was quite happy with this, but I
did manage to knock it down a little more with the
bms_get_singleton_v1.patch, which I've attached. This reduces the range to
between ~15 and ~409 nanoseconds, but this is probably getting into micro
benchmark territory... so perhaps not worth the extra code...

For the benchmarks I just put semiorantijoin_is_removable() in a tight
1-million-iteration loop and grabbed the total planning time for that. I
then compared this to an unpatched master's planning time, after dividing
the time reported for the 1-million-removals version by 1 million.

I didn't really find a good way to measure the extra overhead of actually
loading the foreign key constraints in get_relation_info().

Regards

David Rowley
-- Benchmark with the following code:

else if (sjinfo->jointype == JOIN_SEMI)
{
	List	   *columnlist;
	RelOptInfo *rel;
	bool		result;
	int			x;

	for (x = 0; x < 1000000; x++)
		result = semiorantijoin_is_removable(root, sjinfo,
											 &columnlist, &rel);

	/* Skip if not removable */
	if (!result)
		continue;

	Assert(columnlist != NIL);
	convert_semijoin_to_isnotnull_quals(root, rel, columnlist);
}

create table t2 (id int primary key);
create table t1 (value1 int references t2, value2 int, value3 int, value4 int);

-- test 1. Successful removal with single fk
explain select * from t1 where value1 in(select id from t2);

--times in milliseconds for planning plus 1 million semiorantijoin_is_removable 
calls
425.019
413.903
407.200
411.014
411.370

-- as above, but with the bms_get_singleton patch applied
405.971
390.838
401.959
407.593
393.540

-- unpatched master planning time for above query
0.207
0.216
0.203
0.194
0.200

-- test 2. Non var in outer join condition.
explain select * from t1 where value1 in(select 0 from t2);

--times in milliseconds for planning plus 1 million semiorantijoin_is_removable 
calls
40.419
40.374
36.831
42.832
40.248

-- as above, but with the bms_get_singleton patch applied
31.064
32.176
30.028
31.654
34.571

-- unpatched master planning time for above query
0.169
0.171
0.156
0.155
0.157

-- test 3. Fail case. No foreign key defined.
explain select * from t1 where value2 in(select id from t2);

--times in milliseconds for planning plus 1 million semiorantijoin_is_removable 
calls
409.801
414.237
406.165
406.959
401.322

-- as above, but with the bms_get_singleton patch applied
367.536
347.254
349.383
348.291
348.063

-- unpatched master planning time for above query
0.214
0.200
0.199
0.195
0.248

-- test 4. 2 foreign keys defined. (with join removed)
ALTER TABLE t1 ADD CONSTRAINT t1_value2_fkey FOREIGN KEY (value2) REFERENCES t2;

explain select * from t1 where value2 in(select id from t2);

--times in milliseconds for planning plus 1 million semiorantijoin_is_removable 
calls
431.408
421.737
426.109
419.222
418.526

-- as above, but with the bms_get_singleton patch applied
392.525
392.785
393.102
388.653
393.168

-- unpatched master planning time for above query
0.211
0.201
0.235
0.236
0.230

-- test 5. 2 foreign keys defined (without join removed)
explain select * from t1 where value3 in(select id from t2);

--times in milliseconds for planning plus 1 million semiorantijoin_is_removable 
calls
426.327
423.019
431.331
423.405
413.779

-- as above, but with the bms_get_singleton patch applied
411.991
412.971
402.479
405.715
416.528

-- unpatched master planning time for above query
0.237
0.261
0.266
0.214
0.218

-- test 6. wrong operator.
explain select * from t1 where exists(select 1 from t2 where value1 > id);

--times in milliseconds for planning plus 1 million semiorantijoin_is_removable 
calls
122.176
117.536
119.057
125.618
118.249

-- as above, but with the bms_get_singleton patch applied
109.182
108.424
109.526
110.091
107.240

-- unpatched master 

Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Stephen Frost
Alvaro, all,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Can you be more specific on the exact grammar you're considering?  The
 proposal above,
 ALTER TABLE ON ALL TABLES IN TABLESPACE xyz
 doesn't seem very good to me.  I would think it'd be more like
 ALTER ALL TABLES IN TABLESPACE xyz
 but then if you return ALTER TABLE as a command tag that might be a bit
 strange.  Maybe
 ALTER TABLE ALL IN TABLESPACE xyz
 which AFAICS should work since ALL is already a reserved keyword.

I've implemented the 'ALTER TABLE ALL IN TABLESPACE ...' approach, as
discussed.  Patch attached.

 Also, how would we document this?  Would we have it in the same page as
 all the ALTER TABLE variants, or would we create a separate page for
 ALTER TABLE ALL?  Keeping in mind that in the future we might want to
 allow things such as ALTER TABLE ALL IN SCHEMA xyz it might be better to
 have the selection logic documented neatly in its own little page
 instead of together with the ALTER TABLE mess which is already rather
 large.

As mentioned, I'll add this to the ALTER TABLE documentation and remove
it from the TABLESPACE docs.  That's not done yet but I should have time
in the next few days to get that done also and will then commit it all
to master and back-patch to 9.4, barring objections.

Thanks,

Stephen
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c86b999..feceed7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -50,6 +50,7 @@
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/typecmds.h"
+#include "commands/user.h"
 #include "executor/executor.h"
 #include "foreign/foreign.h"
 #include "miscadmin.h"
@@ -9204,6 +9205,176 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 }
 
 /*
+ * Alter Table ALL ... SET TABLESPACE
+ *
+ * Allows a user to move all objects of some type in a given tablespace in the
+ * current database to another tablespace.  Objects can be chosen based on the
+ * owner of the object also, to allow users to move only their objects.
+ * The user must have CREATE rights on the new tablespace, as usual.   The main
+ * permissions handling is done by the lower-level table move function.
+ *
+ * All to-be-moved objects are locked first. If NOWAIT is specified and the
+ * lock can't be acquired then we ereport(ERROR).
+ */
+Oid
+AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
+{
+	List	   *relations = NIL;
+	ListCell   *l;
+	ScanKeyData key[1];
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tuple;
+	Oid			orig_tablespaceoid;
+	Oid			new_tablespaceoid;
+	List	   *role_oids = roleNamesToIds(stmt->roles);
+
+	/* Ensure we were not asked to move something we can't */
+	if (stmt->objtype != OBJECT_TABLE && stmt->objtype != OBJECT_INDEX &&
+		stmt->objtype != OBJECT_MATVIEW)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("only tables, indexes, and materialized views exist in tablespaces")));
+
+	/* Get the orig and new tablespace OIDs */
+	orig_tablespaceoid = get_tablespace_oid(stmt->orig_tablespacename, false);
+	new_tablespaceoid = get_tablespace_oid(stmt->new_tablespacename, false);
+
+	/* Can't move shared relations in to or out of pg_global */
+	/* This is also checked by ATExecSetTableSpace, but nice to stop earlier */
+	if (orig_tablespaceoid == GLOBALTABLESPACE_OID ||
+		new_tablespaceoid == GLOBALTABLESPACE_OID)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cannot move relations in to or out of pg_global tablespace")));
+
+	/*
+	 * Must have CREATE rights on the new tablespace, unless it is the
+	 * database default tablespace (which all users implicitly have CREATE
+	 * rights on).
+	 */
+	if (OidIsValid(new_tablespaceoid) && new_tablespaceoid != MyDatabaseTableSpace)
+	{
+		AclResult	aclresult;
+
+		aclresult = pg_tablespace_aclcheck(new_tablespaceoid, GetUserId(),
+		   ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
+		   get_tablespace_name(new_tablespaceoid));
+	}
+
+	/*
+	 * Now that the checks are done, check if we should set either to
+	 * InvalidOid because it is our database's default tablespace.
+	 */
+	if (orig_tablespaceoid == MyDatabaseTableSpace)
+		orig_tablespaceoid = InvalidOid;
+
+	if (new_tablespaceoid == MyDatabaseTableSpace)
+		new_tablespaceoid = InvalidOid;
+
+	/* no-op */
+	if (orig_tablespaceoid == new_tablespaceoid)
+		return new_tablespaceoid;
+
+	/*
+	 * Walk the list of objects in the tablespace and move them. This will
+	 * only find objects in our database, of course.
+	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_class_reltablespace,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(orig_tablespaceoid));
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 1, key);
+	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	{
+		Oid			relOid = 

Re: [HACKERS] Index-only scans for GIST

2014-08-17 Thread Anastasia Lubennikova
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com:

* I'm getting two regression failures with this (opr_sanity and join).


opr_sanity failure is corrected.
But a question remains with the join test.
I checked the latest version of my github repo and there's no failure in the
join regression test:
All 145 tests passed.
To tell the truth, I don't understand which changes could lead to this
failure.
Could you show me the regression diffs?
I want to understand what's wrong with the patch.

* The regression test queries that use LIMIT are not guaranteed to always
 return the same rows, hence they're not very good regression test cases.
 I'd suggest using more restricting WHERE clauses, so that each query only
 returns a handful of rows.


Thank you for the comment; I rewrote the wrong queries. But could you explain
why LIMIT queries may return different results? Does it happen because of
different query optimization?

* I think it's leaking memory, in GIST scan context. I tested this with a
 variant of the regression tests:

 insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
 0.05*i)),
  point(0.05*i, 0.05*i) FROM generate_series(0,
 1000) as i;
 CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

 set enable_seqscan=off;
 set enable_bitmapscan=off;

 explain analyze  select p from gist_tbl where p <@ box(point(0,0),
 point(999,999)) and length(p::text) < 10;

 while the final query runs, 'top' shows constantly increasing memory usage.


I don't think it's a memory leak. After some initial growth, memory usage
remains the same. It behaves similarly without using indexonlyscan.



-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 10.8.2014 23:26, Jeff Davis wrote:
 This patch is requires the Memory Accounting patch, or something similar
 to track memory usage.
 
 The attached patch enables hashagg to spill to disk, which means that
 hashagg will contain itself to work_mem even if the planner makes a
 bad misestimate of the cardinality.
 
 This is a well-known concept; there's even a Berkeley homework
 assignment floating around to implement it -- in postgres 7.2, no
 less. I didn't take the exact same approach as the homework assignment
 suggests, but it's not much different, either. My apologies if some
 classes are still using this as a homework assignment, but postgres
 needs to eventually have an answer to this problem.
 
 Included is a GUC, enable_hashagg_disk (default on), which allows
 the planner to choose hashagg even if it doesn't expect the hashtable
 to fit in memory. If it's off, and the planner misestimates the
 cardinality, hashagg will still use the disk to contain itself to
 work_mem.
 
 One situation that might surprise the user is if work_mem is set too
 low, and the user is *relying* on a misestimate to pick hashagg. With
 this patch, it would end up going to disk, which might be
 significantly slower. The solution for the user is to increase
 work_mem.
 
 Rough Design:
 
 Change the hash aggregate algorithm to accept a generic work item,
 which consists of an input file as well as some other bookkeeping
 information.
 
 Initially prime the algorithm by adding a single work item where the
 file is NULL, indicating that it should read from the outer plan.
 
 If the memory is exhausted during execution of a work item, then
 continue to allow existing groups to be aggregated, but do not allow new
 groups to be created in the hash table. Tuples representing new groups
 are saved in an output partition file referenced in the work item that
 is currently being executed.
 
 When the work item is done, emit any groups in the hash table, clear the
 hash table, and turn each output partition file into a new work item.
 
 Each time through at least some groups are able to stay in the hash
 table, so eventually none will need to be saved in output partitions, no
 new work items will be created, and the algorithm will terminate. This
 is true even if the number of output partitions is always one.
 
 Open items:
* costing
* EXPLAIN details for disk usage
* choose number of partitions intelligently
* performance testing
 
 Initial tests indicate that it can be competitive with sort+groupagg
 when the disk is involved, but more testing is required.
 
 Feedback welcome.
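
To make the control flow concrete, here is a compact sketch of the
work-item loop described above (the HashWork layout and the helper names
are illustrative assumptions, not the patch's actual code):

typedef struct HashWork
{
	BufFile    *input;			/* NULL means "read from the outer plan" */
	List	   *output_parts;	/* partitions filled once memory ran out */
} HashWork;

static void
agg_process_worklist(List *worklist)
{
	while (worklist != NIL)
	{
		HashWork   *work = (HashWork *) linitial(worklist);

		worklist = list_delete_first(worklist);

		/*
		 * Aggregate the item's tuples; once work_mem is exhausted, tuples
		 * that would create *new* groups are written to one of
		 * work->output_parts instead of entering the hash table.
		 */
		execute_work_item(work);

		/* emit and clear the finished groups */
		emit_hash_table_groups();
		reset_hash_table();

		/* every partition we wrote becomes a new work item */
		worklist = list_concat(worklist,
							   make_work_items(work->output_parts));
	}
}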

I've been working on this for a few hours - getting familiar with the
code, testing queries etc. Two comments.

1) Apparently there's something broken, because with this:

   create table table_b (fk_id int, val_a int, val_b int);
   insert into table_b
  select i, mod(i,1000), mod(i,1000)
from generate_series(1,1000) s(i);
   analyze table_b;

   I get this:

   set work_mem = '8MB';
   explain analyze select fk_id, count(*)
   from table_b where val_a < 50 and val_b < 50 group by 1;
The connection to the server was lost. Attempting reset: Failed.

   Stacktrace attached, but apparently there's a segfault in
   advance_transition_function when accessing pergroupstate.

   This happened for all queries that I tried, once they needed to do
   the batching.

2) Using the same hash value both for dynahash and batching seems
   really fishy to me. I'm not familiar with dynahash, but I'd bet
   the way it's done now will lead to bad distribution in the hash
   table (some buckets will be always empty in some batches, etc.).

   This is why hashjoin tries so hard to use non-overlapping parts
   of the hash for batchno/bucketno.

   Hashjoin implements its own hash table, which makes it clear
   how the bucket is derived from the hash value. I'm not sure how
   dynahash does that, but I'm pretty sure we can't just reuse the hash
   value like this.

   I see two options - compute our own hash value, or somehow derive
   a new one (e.g. by doing hashvalue XOR random_seed). I'm not sure
   the latter would work, though.
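
For reference, hashjoin derives the two numbers from disjoint bit ranges
of a single hash value, roughly like this (a simplified rendering of
ExecHashGetBucketAndBatch in nodeHash.c; nbuckets and nbatch are both
powers of two):

static void
get_bucket_and_batch(uint32 hashvalue, int log2_nbuckets,
					 int nbuckets, int nbatch,
					 int *bucketno, int *batchno)
{
	/* the low bits select the bucket ... */
	*bucketno = hashvalue & (nbuckets - 1);

	/*
	 * ... and the bits just above them select the batch, so bucket and
	 * batch never share bits even as nbatch grows.
	 */
	if (nbatch > 1)
		*batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1);
	else
		*batchno = 0;
}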

regards
Tomas
Program received signal SIGSEGV, Segmentation fault.
0x0064e6fc in advance_transition_function (aggstate=0x11405c0, peraggstate=0x1143540, pergroupstate=0x8) at nodeAgg.c:465
465 if (pergroupstate->noTransValue)
(gdb) bt
#0  0x0064e6fc in advance_transition_function (aggstate=0x11405c0, peraggstate=0x1143540, pergroupstate=0x8) at nodeAgg.c:465
#1  0x0064eb0e in advance_aggregates (aggstate=0x11405c0, pergroup=0x8) at nodeAgg.c:621
#2  0x006502a7 in agg_fill_hash_table (aggstate=0x11405c0) at nodeAgg.c:1584
#3  0x0064fc3f in ExecAgg (node=0x11405c0) at nodeAgg.c:1289
#4  0x0063c754 in ExecProcNode (node=0x11405c0) at execProcnode.c:476
#5  0x0063a483 in ExecutePlan (estate=0x11404a8, planstate=0x11405c0, 

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 15.8.2014 19:53, Robert Haas wrote:
 On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis pg...@j-davis.com wrote:
 On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
 Oh?  So if we have aggregates like array_agg whose memory footprint
 increases over time, the patch completely fails to avoid bloat?

 Yes, in its current form.

 I might think a patch with such a limitation was useful, if it weren't
 for the fact that aggregates of that nature are a large part of the
 cases where the planner misestimates the table size in the first place.
 Any complication that we add to nodeAgg should be directed towards
 dealing with cases that the planner is likely to get wrong.

 In my experience, the planner has a lot of difficulty estimating the
 cardinality unless it's coming from a base table without any operators
 above it (other than maybe a simple predicate). This is probably a lot
 more common than array_agg problems, simply because array_agg is
 relatively rare compared with GROUP BY in general.
 
 I think that's right, and I rather like your (Jeff's) approach.  It's
 definitely true that we could do better if we have a mechanism for
 serializing and deserializing group states, but (1) I think an awful
 lot of cases would get an awful lot better even just with the approach
 proposed here and (2) I doubt we would make the
 serialization/deserialization interfaces mandatory, so even if we had
 that we'd probably want a fallback strategy anyway.

I certainly agree that we need Jeff's approach even if we can do better
in some cases (when we are able to serialize/deserialize the states).

And yes, (mis)estimating the cardinalities is a big issue, and certainly
a source of many problems.


 Furthermore, I don't really see that we're backing ourselves into a
 corner here.  If prohibiting creation of additional groups isn't
 sufficient to control memory usage, but we have
 serialization/deserialization functions, we can just pick an arbitrary
 subset of the groups that we're storing in memory and spool their
 transition states off to disk, thus reducing memory even further.  I
 understand Tomas' point to be that this is quite different from what
 we do for hash joins, but I think it's a different problem.  In the
 case of a hash join, there are two streams of input tuples, and we've
 got to batch them in compatible ways.  If we were to, say, exclude an
 arbitrary subset of tuples from the hash table instead of basing it on
 the hash code, we'd have to test *every* outer tuple against the hash
 table for *every* batch.  That would incur a huge amount of additional
 cost vs. being able to discard outer tuples once the batch to which
 they pertain has been processed.

Being able to batch inner and outer relations in a matching way is
certainly one of the reasons why hashjoin uses that particular scheme.
There are other reasons, though - for example being able to answer 'Does
this group belong to this batch?' quickly, and automatically updating the
number of batches.

I'm not saying the lookup is extremely costly, but I'd be very surprised
if it was as cheap as modulo on a 32-bit integer. Not saying it's the
dominant cost here, but memory bandwidth is quickly becoming one of the
main bottlenecks.


 But the situation here isn't comparable, because there's only one
 input stream.  I'm pretty sure we'll want to keep track of which
 transition states we've spilled due to lack of memory as opposed to
 those which were never present in the table at all, so that we can
 segregate the unprocessed tuples that pertain to spilled transition
 states from the ones that pertain to a group we haven't begun yet.

Why would that be necessary or useful? I don't see a reason for tracking
that / segregating the tuples.

 And it might be that if we know (or learn as we go along) that we're
 going to vastly blow out work_mem it makes sense to use batching to
 segregate the tuples that we decide not to process onto N tapes binned
 by hash code, so that we have a better chance that future batches will
 be the right size to fit in memory.  But I'm not convinced that
 there's a compelling reason why the *first* batch has to be chosen by
 hash code; we're actually best off picking any arbitrary set of groups
 that does the best job reducing the amount of data remaining to be
 processed, at least if the transition states are fixed size and maybe
 even if they aren't.

If you don't choose the first batch by hash code, it's over, IMHO. You
can't easily redo that later, because the HashWork items are
already treated separately.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-08-17 Thread Fabrízio de Royes Mello
On Mon, Jul 28, 2014 at 2:24 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Mon, Jul 28, 2014 at 1:41 PM, Christoph Berg c...@df7cb.de wrote:
 
  Re: Fabrízio de Royes Mello 2014-07-28
CAFcNs+pctx4Q2UYsLOvVFWaznO3U0XhPpkMx5DRhR=jw8w3...@mail.gmail.com
   Is there something I should still do on this patch?
 
  I haven't got around to have a look at the newest incarnation yet, but
  I plan to do that soonish. (Of course that shouldn't stop others from
  doing that as well if they wish.)
 

 Thanks!


Updated version.

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..2d131df 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
 ENABLE ALWAYS RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
 CLUSTER ON <replaceable class="PARAMETER">index_name</replaceable>
 SET WITHOUT CLUSTER
+SET {LOGGED | UNLOGGED}
 SET WITH OIDS
 SET WITHOUT OIDS
 SET ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable>
 RESET ( <replaceable class="PARAMETER">storage_parameter</replaceable> [, ... ] )
 INHERIT <replaceable class="PARAMETER">parent_table</replaceable>
 NO INHERIT <replaceable class="PARAMETER">parent_table</replaceable>
 OF <replaceable class="PARAMETER">type_name</replaceable>
 NOT OF
 OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
-SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable>
 REPLICA IDENTITY {DEFAULT | USING INDEX <replaceable class="PARAMETER">index_name</replaceable> | FULL | NOTHING}
 
 <phrase>and <replaceable class="PARAMETER">table_constraint_using_index</replaceable> is:</phrase>
@@ -447,6 +448,20 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
    </varlistentry>
 
    <varlistentry>
+    <term><literal>SET {LOGGED | UNLOGGED}</literal></term>
+    <listitem>
+     <para>
+      This form changes the table persistence type from unlogged to permanent or
+      from permanent to unlogged (see <xref linkend="SQL-CREATETABLE-UNLOGGED">).
+     </para>
+     <para>
+      Changing the table persistence type acquires an <literal>ACCESS EXCLUSIVE</literal> lock
+      and rewrites the table contents and associated indexes into new disk files.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>SET WITH OIDS</literal></term>
     <listitem>
     <para>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b1c411a..7f497c7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -574,7 +574,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	heap_close(OldHeap, NoLock);
 
 	/* Create the transient table that will receive the re-ordered data */
-	OIDNewHeap = make_new_heap(tableOid, tableSpace, false,
+	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+							   OldHeap->rd_rel->relpersistence,
 			   AccessExclusiveLock);
 
 	/* Copy the heap data into the new table in the desired order */
@@ -601,7 +602,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 			  LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
@@ -613,7 +614,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	Datum		reloptions;
 	bool		isNull;
 	Oid			namespaceid;
-	char		relpersistence;
 
 	OldHeap = heap_open(OIDOldHeap, lockmode);
 	OldHeapDesc = RelationGetDescr(OldHeap);
@@ -636,16 +636,10 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	if (isNull)
 		reloptions = (Datum) 0;
 
-	if (forcetemp)
-	{
+	if (relpersistence == RELPERSISTENCE_TEMP)
 		namespaceid = LookupCreationNamespace("pg_temp");
-		relpersistence = RELPERSISTENCE_TEMP;
-	}
 	else
-	{
 		namespaceid = RelationGetNamespace(OldHeap);
-	relpersistence = OldHeap->rd_rel->relpersistence;
-	}
 
 	/*
 	 * Create the new heap, using a temporary name in the same namespace as
@@ -1146,6 +1140,7 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 	Oid			relfilenode1,
 relfilenode2;
 	Oid			swaptemp;
+	char		swaprelpersistence;
 	CatalogIndexState indstate;
 
 	/* We need writable copies of both pg_class tuples. */
@@ -1177,6 +1172,10 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		relform1->reltablespace = relform2->reltablespace;
 		relform2->reltablespace = swaptemp;
 
+		

Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Stephen Frost
Alvaro, all,

* Stephen Frost (sfr...@snowman.net) wrote:
 As mentioned, I'll add this to the ALTER TABLE documentation and remove
 it from the TABLESPACE docs.  That's not done yet but I should have time
 in the next few days to get that done also and will then commit it all
 to master and back-patch to 9.4, barring objections.

Here's a first pass with the documentation changes included.  Feedback
welcome, and I'll review it again and plan to commit it on Tuesday.

Thanks,

Stephen
diff --git a/doc/src/sgml/ref/alter_index.sgml b/doc/src/sgml/ref/alter_index.sgml
index 94a7af0..85159a0 100644
--- a/doc/src/sgml/ref/alter_index.sgml
+++ b/doc/src/sgml/ref/alter_index.sgml
@@ -25,6 +25,8 @@ ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> RENA
 ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> SET TABLESPACE <replaceable class="PARAMETER">tablespace_name</replaceable>
 ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> SET ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
 ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> RESET ( <replaceable class="PARAMETER">storage_parameter</replaceable> [, ... ] )
+ALTER INDEX ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable> [ OWNED BY <replaceable class="PARAMETER">role_name</replaceable> [, ... ] ]
+    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable> [ NOWAIT ]
 </synopsis>
  </refsynopsisdiv>
 
@@ -63,6 +65,17 @@ ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> RESE
  <para>
   This form changes the index's tablespace to the specified tablespace and
   moves the data file(s) associated with the index to the new tablespace.
+  To change the tablespace of an index, you must own the index and have
+  <literal>CREATE</literal> privilege on the new tablespace.
+  All indexes in a tablespace can be moved by using the
+  <literal>ALL IN TABLESPACE</literal> form, which will lock all indexes
+  to be moved and then move each one.  This form also supports
+  <literal>OWNED BY</literal>, which will only move indexes owned by the
+  roles specified.  If the <literal>NOWAIT</literal> option is specified
+  then the command will fail if it is unable to acquire all of the locks
+  required immediately.  Note that system catalogs will not be moved by
+  this command, use <command>ALTER DATABASE</command> or explicit
+  <command>ALTER INDEX</command> invocations instead if desired.
   See also
   <xref linkend="SQL-CREATETABLESPACE">.
  </para>
diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index 1932eeb..b0759fc 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -29,6 +29,8 @@ ALTER MATERIALIZED VIEW [ IF EXISTS ] <replaceable class="parameter">name</repla
     RENAME TO <replaceable class="parameter">new_name</replaceable>
 ALTER MATERIALIZED VIEW [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
     SET SCHEMA <replaceable class="parameter">new_schema</replaceable>
+ALTER MATERIALIZED VIEW ALL IN TABLESPACE <replaceable class="parameter">name</replaceable> [ OWNED BY <replaceable class="PARAMETER">role_name</replaceable> [, ... ] ]
+    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable> [ NOWAIT ]
 
 <phrase>where <replaceable class="PARAMETER">action</replaceable> is one of:</phrase>
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..83d230c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -31,6 +31,8 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
     RENAME TO <replaceable class="PARAMETER">new_name</replaceable>
 ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
     SET SCHEMA <replaceable class="PARAMETER">new_schema</replaceable>
+ALTER TABLE ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable> [ OWNED BY <replaceable class="PARAMETER">role_name</replaceable> [, ... ] ]
+    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable> [ NOWAIT ]
 
 <phrase>where <replaceable class="PARAMETER">action</replaceable> is one of:</phrase>
 
@@ -597,6 +599,17 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
   moves the data file(s) associated with the table to the new tablespace.
   Indexes on the table, if any, are not moved; but they can be moved
   separately with additional <literal>SET TABLESPACE</literal> commands.
+  All tables in a tablespace can be moved by using the
+  <literal>ALL IN TABLESPACE</literal> form, which will lock all tables
+  to be moved first and then move each one.  This form also supports
+  <literal>OWNED BY</literal>, which will only move tables owned by the
+  roles specified.  If the <literal>NOWAIT</literal> option is specified
+  

Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Greg Stark
On 15 Aug 2014 19:06, Robert Haas robertmh...@gmail.com wrote:

  As for the expanded-mode changes, I thought there was consensus to
  revert that from 9.4.

 Me too.  In fact, I think that's been the consensus for many months,
 but unless I'm mistaken it ain't done.

Yeah, this is entirely my fault. I was traveling, busy, not feeling well,
in various permutations but I should have gotten to it earlier.

I'll take care of it tomorrow morning.


[HACKERS] [PATCH] Incremental backup: add backup profile to base backup

2014-08-17 Thread Marco Nenciarini
Hi Hackers,

This is the first piece of file level incremental backup support, as
described on wiki page https://wiki.postgresql.org/wiki/Incremental_backup

It is not yet complete, but I wish to share it on the list to receive
comments and suggestions.

The point of the patch is adding an option to pg_basebackup and
replication protocol BASE_BACKUP command to generate a backup_profile file.

When taking a full backup with pg_basebackup, the user can request
Postgres to generate a backup_profile file through the --profile option
(-B short option, which I arbitrarily picked because both -P and
-p were already taken).

At the moment the backup profile consists of a file with one line per
file, detailing modification time, md5 checksum, size, tablespace and path
relative to the tablespace root (PGDATA or the tablespace).

To calculate the md5 checksum I've used the md5 code present in pgcrypto
contrib as the code in src/include/libpq/md5.h is not suitable for large
files. Since a core feature cannot depend on a piece of contrib, I've
moved the files

contrib/pgcrypto/md5.c
contrib/pgcrypto/md5.h

to

src/backend/utils/hash/md5.c
src/include/utils/md5.h

changing the pgcrypto extension to use them.
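
For illustration, with the KAME-style entry points that code exposes
(assuming the usual md5_init/md5_loop/md5_pad/md5_result interface), hashing
a large file in fixed-size chunks would look roughly like this sketch:

#include "postgres.h"

#include <stdio.h>

#include "utils/md5.h"			/* the relocated header proposed above */

/*
 * Sketch: compute the md5 of an already-opened file in 8kB chunks, so
 * arbitrarily large files never have to fit in memory.
 */
static void
md5_file(FILE *fp, uint8 digest[16])
{
	md5_ctxt	ctxt;
	uint8		buf[8192];
	size_t		nread;

	md5_init(&ctxt);
	while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
		md5_loop(&ctxt, buf, (unsigned int) nread);
	md5_pad(&ctxt);
	md5_result(digest, &ctxt);
}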

There are still some TODOs:

* User documentation

* Remove the pg_basebackup code duplication I've introduced with the
ReceiveAndUnpackTarFileToDir function, which is almost the same as
ReceiveAndUnpackTarFile but does not expect to handle a tablespace. It
instead simply extracts a tar stream into a destination directory. The
latter could probably be rewritten using the former as a component, but it
needs some adjustment to the progress-reporting part, which is not
present at the moment in ReceiveAndUnpackTarFileToDir.

* Add a header section to the backup_profile file, which at the moment
contains only the body part. I'm thinking of changing the original design
and putting the whole backup label in as the header, which is IMHO clearer
and well known. I would use something like:

START WAL LOCATION: 0/E28 (file 0001000E)
CHECKPOINT LOCATION: 0/E60
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2014-08-14 18:54:01 CEST
LABEL: pg_basebackup base backup
END LABEL

I've attached the current patch based on master.

Any comment will be appreciated.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/contrib/pgcrypto/Makefile b/contrib/pgcrypto/Makefile
index b6c9484..daf2ba3 100644
*** a/contrib/pgcrypto/Makefile
--- b/contrib/pgcrypto/Makefile
***************
*** 1,6 ****
  # contrib/pgcrypto/Makefile
  
! INT_SRCS = md5.c sha1.c sha2.c internal.c internal-sha2.c blf.c rijndael.c \
  	fortuna.c random.c pgp-mpi-internal.c imath.c
  INT_TESTS = sha2
  
--- 1,6 ----
  # contrib/pgcrypto/Makefile
  
! INT_SRCS = sha1.c sha2.c internal.c internal-sha2.c blf.c rijndael.c \
  	fortuna.c random.c pgp-mpi-internal.c imath.c
  INT_TESTS = sha2
  
diff --git a/contrib/pgcrypto/internal.c b/contrib/pgcrypto/internal.c
index cb8ba26..0d2db23 100644
*** a/contrib/pgcrypto/internal.c
--- b/contrib/pgcrypto/internal.c
***************
*** 30,40 ****
   */
  
  #include "postgres.h"
  
  #include <time.h>
  
  #include "px.h"
- #include "md5.h"
  #include "sha1.h"
  #include "blf.h"
  #include "rijndael.h"
--- 30,40 ----
   */
  
  #include "postgres.h"
+ #include "utils/md5.h"
  
  #include <time.h>
  
  #include "px.h"
  #include "sha1.h"
  #include "blf.h"
  #include "rijndael.h"
diff --git a/contrib/pgcrypto/md5.c b/contrib/pgcrypto/md5.c
index cac4e40..e69de29 .
*** a/contrib/pgcrypto/md5.c
--- b/contrib/pgcrypto/md5.c
***************
*** 1,397 ****
- /*   $KAME: md5.c,v 1.3 2000/02/22 14:01:17 itojun Exp $ */
- 
- /*
-  * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
-  * All rights reserved.
-  *
-  * Redistribution and use in source and binary forms, with or without
-  * modification, are permitted provided that the following conditions
-  * are met:
-  * 1. Redistributions of source code must retain the above copyright
-  *  notice, this list of conditions and the following disclaimer.
-  * 2. Redistributions in binary form must reproduce the above copyright
-  *  notice, this list of conditions and the following disclaimer in the
-  *  documentation and/or other materials provided with the distribution.
-  * 3. Neither the name of the project nor the names of its contributors
-  *  may be used to endorse or promote products derived from this software
-  *  without specific prior written permission.
-  *
-  * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
-  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-  * ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
-  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 

Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Peter Eisentraut
On 8/17/14 5:19 PM, Stephen Frost wrote:
 Alvaro, all,
 
 * Stephen Frost (sfr...@snowman.net) wrote:
 As mentioned, I'll add this to the ALTER TABLE documentation and remove
 it from the TABLESPACE docs.  That's not done yet but I should have time
 in the next few days to get that done also and will then commit it all
 to master and back-patch to 9.4, barring objections.
 
 Here's a first pass with the documentation changes included.  Feedback
 welcome, and I'll review it again and plan to commit it on Tuesday.

One thing that is not clear from this (before or after) is whether "all
{tables|indexes|etc} in the tablespace" actually means "in this
tablespace and in the current database".  Presumably it does.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Peter Eisentraut
On 8/16/14 8:46 AM, Amit Kapila wrote:
 On Fri, Aug 15, 2014 at 1:03 PM, MauMau maumau...@gmail.com wrote:

 Thank you.  The code looks correct.  I confirmed that the
 pg_basebackup could relocate the tablespace directory on Windows.

 I marked this patch as ready for committer.
 
 Thanks for the review.

It's not ready for committer if the current patch does not apply.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Tom Lane
I was just going over the release notes, and noticed the bit about
timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant
format rather than whatever random DateStyle is in use.  That's fine,
but I wonder why the same approach wasn't applied to type date?

regression=# set datestyle to postgres;
SET

regression=# select row_to_json(row(now()));
                row_to_json
--------------------------------------------
 {"f1":"2014-08-17T20:34:54.424237-04:00"}
(1 row)

regression=# select row_to_json(row(current_date));
     row_to_json
---------------------
 {"f1":"08-17-2014"}
(1 row)

Doesn't seem real consistent ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Michael Paquier
On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut pete...@gmx.net wrote:
 It's not ready for committer if the current patch does not apply.
FWIW, the latest version sent by Amit here applies correctly:
http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com
I haven't tested it myself though.

However, independently of this patch and as pointed out by MauMau, the
code that has been committed in fb05f3c is incorrect in the way it
defines the tablespace path; this:
psprintf("%s/pg_tblspc/%d", basedir, oid);
should be this:
psprintf("%s/pg_tblspc/%u", basedir, oid);
I am separating this fix (which should be backpatched to REL9_4_STABLE
as well) into the attached patch, if this helps.
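
The distinction matters because Oid is an unsigned 32-bit type: once a
value passes 2^31, %d renders it as a negative number and the computed
symlink path no longer matches the real directory. A minimal standalone
illustration:

#include <stdio.h>

typedef unsigned int Oid;		/* as in postgres_ext.h */

int
main(void)
{
	Oid			oid = 3000000000U;	/* a legal OID above 2^31 */

	printf("%d\n", (int) oid);	/* prints -1294967296, wrong */
	printf("%u\n", oid);		/* prints 3000000000, right */
	return 0;
}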
Regards,
-- 
Michael
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 3d26e22..ef7cd94 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1119,7 +1119,7 @@ update_tablespace_symlink(Oid oid, const char *old_dir)
 
 	if (strcmp(old_dir, new_dir) != 0)
 	{
-		char	   *linkloc = psprintf("%s/pg_tblspc/%d", basedir, oid);
+		char	   *linkloc = psprintf("%s/pg_tblspc/%u", basedir, oid);
 
 		if (unlink(linkloc) != 0  errno != ENOENT)
 		{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Andrew Dunstan


On 08/17/2014 08:42 PM, Tom Lane wrote:

I was just going over the release notes, and noticed the bit about
timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant
format rather than whatever random DateStyle is in use.  That's fine,
but I wonder why the same approach wasn't applied to type date?

regression=# set datestyle to postgres;
SET

regression=# select row_to_json(row(now()));
                 row_to_json
---------------------------------------------
  {"f1":"2014-08-17T20:34:54.424237-04:00"}
(1 row)

regression=# select row_to_json(row(current_date));
      row_to_json
----------------------
  {"f1":"08-17-2014"}
(1 row)

Doesn't seem real consistent ...





Good point. Probably because I didn't get a complaint about it, which in 
turn is probably because JavaScript's builtin Date class is in fact 
(from our POV) more or less a timestamp(tz) type.


See 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date


But yes, I agree it should be fixed. Whatever we output should be 
suitable as input for the string-argument constructor of class Date.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 08/17/2014 08:42 PM, Tom Lane wrote:
 I was just going over the release notes, and noticed the bit about
 timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant
 format rather than whatever random DateStyle is in use.  That's fine,
 but I wonder why the same approach wasn't applied to type date?

 But yes, I agree it should be fixed. Whatever we output should be 
 suitable as input for the string-argument constructor of class Date.

OK.  I think I can fix it, if you don't have time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication commands and index terms

2014-08-17 Thread Fujii Masao
On Fri, Aug 15, 2014 at 11:25 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Aug 15, 2014 at 4:59 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Aug 14, 2014 at 12:59 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Since I sometimes try to search the replication command in the index,
 I'd feel inclined to expose all those commands as index terms...

 +1.

 Attached patch exposes replication commands as index terms.
 Barring any objection, I will commit this.

Done.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-08-17 Thread Fujii Masao
On Sat, Aug 16, 2014 at 4:23 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote:
 Should we try to install some hack around fastupdate for 9.4?  I fear
 the divergence between reasonable values of work_mem and reasonable
 sizes for that list is only going to continue to get bigger.  I'm sure
 there's somebody out there who has work_mem = 16GB, and stuff like
 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
 appeal of large values.

 Controlling the threshold of the size of pending list only by GUC doesn't
 seem reasonable. Users may want to increase the threshold only for the
 GIN index which can be updated heavily, and decrease it otherwise. So
 I think that it's better to add new storage parameter for GIN index to 
 control
 the threshold, or both storage parameter and GUC.

 Yeah, -1 for a GUC.  A GIN-index-specific storage parameter seems more
 appropriate.

 The attached patch introduces the GIN index storage parameter
 PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of
 GIN pending list. If it's not set, work_mem is used as that maximum size,
 instead. So this patch doesn't break the existing application which
 currently uses work_mem as the threshold of cleanup operation of
 the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE.

 This is an index storage parameter, and allows us to specify each
 threshold per GIN index. So, for example, we can increase the threshold
 only for the GIN index which can be updated heavily, and decrease it 
 otherwise.

 This patch uses another patch that I proposed (*1) as an infrastructure.
 Please apply that infrastructure patch first if you apply this patch.

 (*1)
 http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com

 Regards,

 --
 Fujii Masao

 Sorry, I forgot to attach the patch. This time it's attached.


 I think that this patch should be rebased.
 It contains garbage code.

Thanks for reviewing the patch! ISTM that I failed to make the patch from
my git repository... Attached is the rebased version.
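
For readers following along: an index storage parameter like this is
registered in src/backend/access/common/reloptions.c. A sketch of what the
hunk for an integer GIN reloption typically looks like (the bounds and
description here are illustrative, not necessarily this patch's exact
values):

	/* new entry in reloptions.c's intRelOpts[] table */
	{
		{
			"pending_list_cleanup_size",
			"Maximum size of the GIN pending list, in kilobytes",
			RELOPT_KIND_GIN
		},
		-1,						/* default: fall back to work_mem */
		64,						/* minimum, in kB */
		MAX_KILOBYTES			/* maximum */
	},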

Regards,

-- 
Fujii Masao
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml
index 80a578d..24c8dc1 100644
--- a/doc/src/sgml/gin.sgml
+++ b/doc/src/sgml/gin.sgml
@@ -728,8 +728,9 @@
    from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
    <acronym>GIN</> is capable of postponing much of this work by inserting
    new tuples into a temporary, unsorted list of pending entries.
-   When the table is vacuumed, or if the pending list becomes too large
-   (larger than <xref linkend="guc-work-mem">), the entries are moved to the
+   When the table is vacuumed, or if the pending list becomes larger than
+   <literal>PENDING_LIST_CLEANUP_SIZE</literal> (or
+   <xref linkend="guc-work-mem"> if not set), the entries are moved to the
    main <acronym>GIN</acronym> data structure using the same bulk insert
    techniques used during initial index creation.  This greatly improves
    <acronym>GIN</acronym> index update speed, even counting the additional
@@ -812,18 +813,27 @@
   </varlistentry>
 
   <varlistentry>
-   <term><xref linkend="guc-work-mem"></term>
+   <term><literal>PENDING_LIST_CLEANUP_SIZE</> and
+   <xref linkend="guc-work-mem"></term>
    <listitem>
     <para>
      During a series of insertions into an existing <acronym>GIN</acronym>
      index that has <literal>FASTUPDATE</> enabled, the system will clean up
      the pending-entry list whenever the list grows larger than
-     <varname>work_mem</>.  To avoid fluctuations in observed response time,
-     it's desirable to have pending-list cleanup occur in the background
-     (i.e., via autovacuum).  Foreground cleanup operations can be avoided by
-     increasing <varname>work_mem</> or making autovacuum more aggressive.
-     However, enlarging <varname>work_mem</> means that if a foreground
-     cleanup does occur, it will take even longer.
+     <literal>PENDING_LIST_CLEANUP_SIZE</> (if not set, <varname>work_mem</>
+     is used as that threshold, instead). To avoid fluctuations in observed
+     response time, it's desirable to have pending-list cleanup occur in the
+     background (i.e., via autovacuum).  Foreground cleanup operations
+     can be avoided by increasing <literal>PENDING_LIST_CLEANUP_SIZE</>
+     (or <varname>work_mem</>) or making autovacuum more aggressive.
+     However, enlarging the threshold of the cleanup operation means that
+     if a foreground cleanup does occur, it will take even longer.
+    </para>
+    <para>
+     <literal>PENDING_LIST_CLEANUP_SIZE</> is an index storage parameter,
+     and allows each GIN index to have its own cleanup threshold.
+     For example, it's possible to increase the 

Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Andrew Dunstan


On 08/17/2014 09:53 PM, Tom Lane wrote:


OK.  I think I can fix it, if you don't have time.





[offlist]

Thanks. FYI I am still recovering from treatment for prostate cancer I 
had not long after pgcon ... it's taken more out of me than I expected, 
so time is limited.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Amit Kapila
On Mon, Aug 18, 2014 at 7:08 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut pete...@gmx.net wrote:
  It's not ready for committer if the current patch does not apply.
 FWIW, the latest version sent by Amit here applies correctly:

http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com
 I haven't tested it myself though.

 However, independently on this patch and as pointed by MauMau, the
 code that has been committed in fb05f3c is incorrect in the way it
 defines the tablespace path, this:
 psprintf("%s/pg_tblspc/%d", basedir, oid);
 should be this:
 psprintf("%s/pg_tblspc/%u", basedir, oid);
 I am separating this fix (that should be backpatched to REL9_4_STABLE
 as well), in the patch attached if this helps.

I think the patch provided by me needs to be back-patched to
9.4, as this is a new option introduced in 9.4 which does not work
on Windows.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Andrew Dunstan


On 08/17/2014 10:50 PM, Andrew Dunstan wrote:


On 08/17/2014 09:53 PM, Tom Lane wrote:


OK.  I think I can fix it, if you don't have time.





[offlist]

Thanks. FYI I am still recovering from treatment for prostate cancer I 
had not long after pgcon ... it's taken more out of me than I 
expected, so time is limited.






Darn it. Well, I guess everyone knows now.  *sigh*

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Michael Paquier
On Mon, Aug 18, 2014 at 11:51 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Aug 18, 2014 at 7:08 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut pete...@gmx.net wrote:
  It's not ready for committer if the current patch does not apply.
 FWIW, the latest version sent by Amit here applies correctly:

 http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com
 I haven't tested it myself though.

 However, independently on this patch and as pointed by MauMau, the
 code that has been committed in fb05f3c is incorrect in the way it
 defines the tablespace path, this:
 psprintf("%s/pg_tblspc/%d", basedir, oid);
 should be this:
 psprintf("%s/pg_tblspc/%u", basedir, oid);
 I am separating this fix (that should be backpatched to REL9_4_STABLE
 as well), in the patch attached if this helps.

 I think the patch provided by me needs to be back patched to
 9.4 as this is a new option introduced in 9.4 which is not working
 on windows.
Yep. Definitely. This will fix both issues at the same time.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-17 Thread Fujii Masao
On Sat, Aug 16, 2014 at 6:51 PM, Rahila Syed rahilasye...@gmail.com wrote:
So, you're compressing backup blocks one by one. I wonder if that's the
right idea and if we shouldn't instead compress all of them in one run to
increase the compression ratio

 Please find attached a patch for compression of all blocks of a record
 together.

 Following are the measurement results:


 Benchmark:

 Scale : 16
 Command  :java JR  /home/postgres/jdbcrunner-1.2/scripts/tpcc.js  -sleepTime
 550,250,250,200,200

 Warmup time  : 1 sec
 Measurement time : 900 sec
 Number of tx types   : 5
 Number of agents : 16
 Connection pool size : 16
 Statement cache size : 40
 Auto commit  : false


 Checkpoint segments:1024
 Checkpoint timeout:5 mins




 Compression | Metric        | Multiple blocks in one run | Single block in one run
 ------------+---------------+----------------------------+------------------------
 OFF         | Bytes saved   | 0                          | 0
 OFF         | WAL generated | 1265150984 (~1265MB)       | 1264771760 (~1265MB)
 OFF         | % Compression | NA                         | NA
 LZ4         | Bytes saved   | 215215079 (~215MB)         | 285675622 (~286MB)
 LZ4         | WAL generated | 125118783 (~1251MB)        | 1329031918 (~1329MB)
 LZ4         | % Compression | 17.2%                      | 21.49%
 Snappy      | Bytes saved   | 203705959 (~204MB)         | 271009408 (~271MB)
 Snappy      | WAL generated | 1254505415 (~1254MB)       | 1329628352 (~1330MB)
 Snappy      | % Compression | 16.23%                     | 20.38%
 pglz        | Bytes saved   | 155910177 (~156MB)         | 182804997 (~182MB)
 pglz        | WAL generated | 1259773129 (~1260MB)       | 1286670317 (~1287MB)
 pglz        | % Compression | 12.37%                     | 14.21%

 As per the measurement results of this benchmark, compression of multiple
 blocks didn't improve the compression ratio over compression of a single
 block.

According to the measurement results, the amount of WAL generated in
"Multiple blocks in one run" is smaller than that in "Single block in
one run". So ISTM that compression of multiple blocks at one run can
improve the compression ratio. Am I missing something?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup

2014-08-17 Thread Alvaro Herrera
Marco Nenciarini wrote:

 To calculate the md5 checksum I've used the md5 code present in pgcrypto
 contrib as the code in src/include/libpq/md5.h is not suitable for large
 files. Since a core feature cannot depend on a piece of contrib, I've
 moved the files
 
 contrib/pgcrypto/md5.c
 contrib/pgcrypto/md5.h
 
 to
 
 src/backend/utils/hash/md5.c
 src/include/utils/md5.h
 
 changing the pgcrypto extension to use them.

We already have the FNV checksum implementation in the backend -- can't
we use that one for this and avoid messing with MD5?

(I don't think we're looking for a cryptographic hash here.  Am I wrong?)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-08-17 Thread Michael Paquier
On Fri, Aug 15, 2014 at 5:17 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Thanks for your review.

 On Fri, Aug 15, 2014 at 12:56 AM,  furu...@pm.nttdata.co.jp wrote:
 For consistency with pg_recvlogical, what do you think about --start?
 I did not add that for the sake of backward-compatibility as in
 pg_recvlogical an action is mandatory. It is not the case now of
 pg_receivexlog.

 [postgres postgresql-6aa6158]$ make > /dev/null
 streamutil.c: In function 'CreateReplicationSlot':
 streamutil.c:244: warning: suggest parentheses around '&&' within '||'
 I see. Here is a rebased patch.

Looking more at the code, IDENTIFY_SYSTEM management can be unified
into a single function. So I have written a newer version of the patch,
grouping the IDENTIFY_SYSTEM call into a single function for both
utilities and fixing at the same time a couple of other issues:
- correct use of TimeLineID in the code
- the IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
check was done, but in 9.4 this command returns 4 fields:
(PQntuples(res) != 1 || PQnfields(res) < 3)
That's not directly related to this patch, but making some corrections
is not going to hurt..
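
Presumably the unified function's sanity check would then look something
like this (a sketch of the corrected test, not the patch's exact code):

	res = PQexec(conn, "IDENTIFY_SYSTEM");
	if (PQresultStatus(res) != PGRES_TUPLES_OK ||
		PQntuples(res) != 1 || PQnfields(res) < 4)
	{
		fprintf(stderr, "could not identify system: got %d rows and %d fields\n",
				PQntuples(res), PQnfields(res));
		PQclear(res);
		return false;
	}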

Regards,
-- 
Michael
From 160b00e595dab739aa4da7d18a81aff38d0a837f Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 18 Aug 2014 14:32:44 +0900
Subject: [PATCH] Add support for physical slot creation/deletion in
 pg_receivexlog

Physical slot creation can be done with --create and drop with --drop.
In both cases --slot is needed. Code for replication slot creation and
drop is refactored with what was available in pg_recvlogical, the same
applied with IDENTIFY_SYSTEM to simplify the whole set.
---
 doc/src/sgml/ref/pg_receivexlog.sgml   |  29 +++
 src/bin/pg_basebackup/pg_receivexlog.c | 108 -
 src/bin/pg_basebackup/pg_recvlogical.c | 119 +---
 src/bin/pg_basebackup/streamutil.c | 140 +
 src/bin/pg_basebackup/streamutil.h |   9 +++
 5 files changed, 265 insertions(+), 140 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 5916b8f..7e46005 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -72,6 +72,35 @@ PostgreSQL documentation
   <title>Options</title>
 
    <para>
+    <application>pg_receivexlog</application> can run in one of the two
+    following modes, which control the physical replication slot:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>--create</option></term>
+      <listitem>
+       <para>
+        Create a new physical replication slot with the name specified in
+        <option>--slot</option>, then exit.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--drop</option></term>
+      <listitem>
+       <para>
+        Drop the replication slot with the name specified
+        in <option>--slot</option>, then exit.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
+   </para>
+
+   <para>
     The following command-line options control the location and format of the
     output.
 
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index a8b9ad3..a56cd9e 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -38,6 +38,8 @@ static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 static int	fsync_interval = 0; /* 0 = default */
 static volatile bool time_to_abort = false;
+static bool do_create_slot = false;
+static bool do_drop_slot = false;
 
 
 static void usage(void);
@@ -78,6 +80,9 @@ usage(void)
 	printf(_("  -w, --no-password      never prompt for password\n"));
 	printf(_("  -W, --password         force password prompt (should happen automatically)\n"));
 	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
+	printf(_("\nOptional actions:\n"));
+	printf(_("  --create               create a new replication slot (for the slot's name see --slot)\n"));
+	printf(_("  --drop                 drop the replication slot (for the slot's name see --slot)\n"));
 	printf(_("\nReport bugs to <pgsql-b...@postgresql.org>.\n"));
 }
 
@@ -253,21 +258,10 @@ FindStreamingStart(uint32 *tli)
 static void
 StreamLog(void)
 {
-	PGresult   *res;
 	XLogRecPtr	startpos;
-	uint32		starttli;
+	TimeLineID	starttli;
 	XLogRecPtr	serverpos;
-	uint32		servertli;
-	uint32		hi,
-				lo;
-
-	/*
-	 * Connect in replication mode to the server
-	 */
-	conn = GetConnection();
-	if (!conn)
-		/* Error message already written in GetConnection() */
-		return;
+	TimeLineID	servertli;
 
 	if (!CheckServerVersionForStreaming(conn))
 	{
@@ -280,33 +274,11 @@ StreamLog(void)
 	}
 
 	/*
-	 * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
-	 * position.
+	 * Identify server, obtaining start LSN position and current timeline ID
+	 * at the same time.
 	 */
-	res = PQexec(conn, "IDENTIFY_SYSTEM");
-	if 

Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-17 Thread Alvaro Herrera
Pavel Stehule wrote:
 2014-08-13 15:22 GMT+02:00 MauMau maumau...@gmail.com:

  I didn't mean performance statistics data to be stored in database tables.
  I just meant:
 
  * pg_stat_system_events is a view to show data on memory, which returns
  one row for each event across the system.  This is similar to
  V$SYSTEM_EVENT in Oracle.
 
  * pg_stat_session_events is a view to show data on memory, which returns
  one row for each event on one session.  This is similar to V$SESSION_EVENT
  in Oracle.
 
  * The above views represent the current accumulated data like other
  pg_stat_xxx views.
 
  * EXPLAIN ANALYZE and auto_explain shows all events for one query.  The
  lock waits you are trying to record in the server log is one of the events.
 
 I am a little bit sceptical about an only-memory-based structure. Is this
 concept acceptable for committers?

Is this supposed to be session-local data, or is it visible from remote
sessions too?  How durable is it supposed to be?  Keep in mind that in
case of a crash, all pgstats data is erased.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers