Re: [HACKERS] Patch to support SEMI and ANTI join removal
On Sun, Aug 10, 2014 at 11:48 PM, David Rowley dgrowle...@gmail.com wrote:
> I've attached an updated version of the patch which fixes up some
> incorrect logic in the foreign key matching code, plus various comment
> improvements.

I've made a few updates to the patch to simplify some logic in the code which analyses the join condition. The result is slightly faster code for detecting either successful or unsuccessful join removal.

I've also been doing a little benchmarking of the overhead that this adds to planning time for a handful of different queries. With the queries I tested, the overhead was between ~20 and ~423 nanoseconds per SEMI or ANTI join: the 20 was for the earliest fast path out on an unsuccessful removal, and the 423 was for a successful removal (tests done on a 4-year-old Intel i5 laptop). This accounted for between 0.01% and 0.2% of planning time for the tested queries.

I was quite happy with this, but I did manage to knock it down a little more with the bms_get_singleton_v1.patch, which I've attached. This reduces the range to between ~15 and ~409 nanoseconds, but this is probably going into micro-benchmark territory... so perhaps not worth the extra code...

For the benchmarks I just put semiorantijoin_is_removable() in a tight 1 million iteration loop and grabbed the total planning time for that. I then compared this to an unpatched master's planning time, after dividing the time reported for the 1-million-removals version by 1 million.
I didn't really find a good way to measure the extra overhead of actually loading the foreign key constraints in get_relation_info().

Regards

David Rowley

-- Benchmark with the following code:

/*
else if (sjinfo->jointype == JOIN_SEMI)
{
    List       *columnlist;
    RelOptInfo *rel;
    bool        result;
    int         x;

    for (x = 0; x < 100; x++)
        result = semiorantijoin_is_removable(root, sjinfo, &columnlist, &rel);

    /* Skip if not removable */
    if (!result)
        continue;

    Assert(columnlist != NIL);
    convert_semijoin_to_isnotnull_quals(root, rel, columnlist);
}
*/

create table t2 (id int primary key);
create table t1 (value1 int references t2, value2 int, value3 int, value4 int);

-- test 1. Successful removal with single fk
explain select * from t1 where value1 in (select id from t2);
-- times in milliseconds for planning plus 1 million semiorantijoin_is_removable calls
425.019 413.903 407.200 411.014 411.370
-- as above, but with bms_get_singleton patch applied
405.971 390.838 401.959 407.593 393.540
-- unpatched master planning time for above query
0.207 0.216 0.203 0.194 0.200

-- test 2. Non-Var in outer join condition.
explain select * from t1 where value1 in (select 0 from t2);
-- times in milliseconds for planning plus 1 million semiorantijoin_is_removable calls
40.419 40.374 36.831 42.832 40.248
-- as above, but with bms_get_singleton patch applied
31.064 32.176 30.028 31.654 34.571
-- unpatched master planning time for above query
0.169 0.171 0.156 0.155 0.157

-- test 3. Fail case. No foreign key defined.
explain select * from t1 where value2 in (select id from t2);
-- times in milliseconds for planning plus 1 million semiorantijoin_is_removable calls
409.801 414.237 406.165 406.959 401.322
-- as above, but with bms_get_singleton patch applied
367.536 347.254 349.383 348.291 348.063
-- unpatched master planning time for above query
0.214 0.200 0.199 0.195 0.248

-- test 4. 2 foreign keys defined. (with join removed)
ALTER TABLE t1 ADD CONSTRAINT t1_value2_fkey FOREIGN KEY (value2) REFERENCES t2;
explain select * from t1 where value2 in (select id from t2);
-- times in milliseconds for planning plus 1 million semiorantijoin_is_removable calls
431.408 421.737 426.109 419.222 418.526
-- as above, but with bms_get_singleton patch applied
392.525 392.785 393.102 388.653 393.168
-- unpatched master planning time for above query
0.211 0.201 0.235 0.236 0.230

-- test 5. 2 foreign keys defined (without join removed)
explain select * from t1 where value3 in (select id from t2);
-- times in milliseconds for planning plus 1 million semiorantijoin_is_removable calls
426.327 423.019 431.331 423.405 413.779
-- as above, but with bms_get_singleton patch applied
411.991 412.971 402.479 405.715 416.528
-- unpatched master planning time for above query
0.237 0.261 0.266 0.214 0.218

-- test 6. wrong operator.
explain select * from t1 where exists (select 1 from t2 where value1 <> id);
-- times in milliseconds for planning plus 1 million semiorantijoin_is_removable calls
122.176 117.536 119.057 125.618 118.249
-- as above, but with bms_get_singleton patch applied
109.182 108.424 109.526 110.091 107.240
-- unpatched master
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
Alvaro, all,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Can you be more specific on the exact grammar you're considering? The
> proposal above, ALTER TABLE ON ALL TABLES IN TABLESPACE xyz, doesn't
> seem very good to me. I would think it'd be more like ALTER ALL TABLES
> IN TABLESPACE xyz, but then if you return ALTER TABLE as a command tag
> that might be a bit strange. Maybe ALTER TABLE ALL IN TABLESPACE xyz,
> which AFAICS should work since ALL is already a reserved keyword.

I've implemented the 'ALTER TABLE ALL IN TABLESPACE ...' approach, as discussed. Patch attached.

> Also, how would we document this? Would we have it in the same page as
> all the ALTER TABLE variants, or would we create a separate page for
> ALTER TABLE ALL? Keeping in mind that in the future we might want to
> allow things such as ALTER TABLE ALL IN SCHEMA xyz, it might be better
> to have the selection logic documented neatly in its own little page
> instead of together with the ALTER TABLE mess, which is already rather
> large.

As mentioned, I'll add this to the ALTER TABLE documentation and remove it from the TABLESPACE docs. That's not done yet, but I should have time in the next few days to get that done also, and will then commit it all to master and back-patch to 9.4, barring objections.

Thanks,

Stephen

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c86b999..feceed7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -50,6 +50,7 @@
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/typecmds.h"
+#include "commands/user.h"
 #include "executor/executor.h"
 #include "foreign/foreign.h"
 #include "miscadmin.h"
@@ -9204,6 +9205,176 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 }
 
 /*
+ * Alter Table ALL ... SET TABLESPACE
+ *
+ * Allows a user to move all objects of some type in a given tablespace in the
+ * current database to another tablespace.  Objects can be chosen based on the
+ * owner of the object also, to allow users to move only their objects.
+ * The user must have CREATE rights on the new tablespace, as usual.  The main
+ * permissions handling is done by the lower-level table move function.
+ *
+ * All to-be-moved objects are locked first.  If NOWAIT is specified and the
+ * lock can't be acquired then we ereport(ERROR).
+ */
+Oid
+AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
+{
+	List	   *relations = NIL;
+	ListCell   *l;
+	ScanKeyData key[1];
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tuple;
+	Oid			orig_tablespaceoid;
+	Oid			new_tablespaceoid;
+	List	   *role_oids = roleNamesToIds(stmt->roles);
+
+	/* Ensure we were not asked to move something we can't */
+	if (stmt->objtype != OBJECT_TABLE && stmt->objtype != OBJECT_INDEX &&
+		stmt->objtype != OBJECT_MATVIEW)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("only tables, indexes, and materialized views exist in tablespaces")));
+
+	/* Get the orig and new tablespace OIDs */
+	orig_tablespaceoid = get_tablespace_oid(stmt->orig_tablespacename, false);
+	new_tablespaceoid = get_tablespace_oid(stmt->new_tablespacename, false);
+
+	/* Can't move shared relations in to or out of pg_global */
+	/* This is also checked by ATExecSetTableSpace, but nice to stop earlier */
+	if (orig_tablespaceoid == GLOBALTABLESPACE_OID ||
+		new_tablespaceoid == GLOBALTABLESPACE_OID)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cannot move relations in to or out of pg_global tablespace")));
+
+	/*
+	 * Must have CREATE rights on the new tablespace, unless it is the
+	 * database default tablespace (which all users implicitly have CREATE
+	 * rights on).
+	 */
+	if (OidIsValid(new_tablespaceoid) && new_tablespaceoid != MyDatabaseTableSpace)
+	{
+		AclResult	aclresult;
+
+		aclresult = pg_tablespace_aclcheck(new_tablespaceoid, GetUserId(),
+										   ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
+						   get_tablespace_name(new_tablespaceoid));
+	}
+
+	/*
+	 * Now that the checks are done, check if we should set either to
+	 * InvalidOid because it is our database's default tablespace.
+	 */
+	if (orig_tablespaceoid == MyDatabaseTableSpace)
+		orig_tablespaceoid = InvalidOid;
+
+	if (new_tablespaceoid == MyDatabaseTableSpace)
+		new_tablespaceoid = InvalidOid;
+
+	/* no-op */
+	if (orig_tablespaceoid == new_tablespaceoid)
+		return new_tablespaceoid;
+
+	/*
+	 * Walk the list of objects in the tablespace and move them. This will
+	 * only find objects in our database, of course.
+	 */
+	ScanKeyInit(&key[0],
+				Anum_pg_class_reltablespace,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(orig_tablespaceoid));
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 1, key);
+	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	{
+		Oid			relOid =
Re: [HACKERS] Index-only scans for GIST
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com:
> * I'm getting two regression failures with this (opr_sanity and join).

The opr_sanity failure is corrected. But a question remains about join: I checked the latest version of my github repo and there's no failure in the join regression test ("All 145 tests passed"). To tell the truth, I don't understand which changes could lead to this failure. Could you show me the regression diffs? I want to understand what's wrong with the patch.

> * The regression test queries that use LIMIT are not guaranteed to
> always return the same rows, hence they're not very good regression test
> cases. I'd suggest using more restricting WHERE clauses, so that each
> query only returns a handful of rows.

Thank you for the comment; I rewrote the wrong queries. But could you explain why LIMIT queries may return different results? Does it happen because of different query optimization?

> * I think it's leaking memory, in GIST scan context. I tested this with
> a variant of the regression tests:
>
> insert into gist_tbl
>   select box(point(0.05*i, 0.05*i), point(0.05*i, 0.05*i)),
>          point(0.05*i, 0.05*i)
>   FROM generate_series(0, 1000) as i;
> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
> set enable_seqscan=off;
> set enable_bitmapscan=off;
> explain analyze
>   select p from gist_tbl
>   where p <@ box(point(0,0), point(999,999)) and length(p::text) > 10;
>
> while the final query runs, 'top' shows constantly increasing memory
> usage.

I don't think it's a memory leak. After some increase, memory usage remains the same. It works similarly without using an index-only scan.

-- 
Best regards,
Lubennikova Anastasia
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On 10.8.2014 23:26, Jeff Davis wrote:

This patch requires the Memory Accounting patch, or something similar to track memory usage. The attached patch enables hashagg to spill to disk, which means that hashagg will contain itself to work_mem even if the planner makes a bad misestimate of the cardinality.

This is a well-known concept; there's even a Berkeley homework assignment floating around to implement it -- in postgres 7.2, no less. I didn't take the exact same approach as the homework assignment suggests, but it's not much different, either. My apologies if some classes are still using this as a homework assignment, but postgres needs to eventually have an answer to this problem.

Included is a GUC, enable_hashagg_disk (default on), which allows the planner to choose hashagg even if it doesn't expect the hashtable to fit in memory. If it's off, and the planner misestimates the cardinality, hashagg will still use the disk to contain itself to work_mem.

One situation that might surprise the user is if work_mem is set too low, and the user is *relying* on a misestimate to pick hashagg. With this patch, it would end up going to disk, which might be significantly slower. The solution for the user is to increase work_mem.

Rough Design:

Change the hash aggregate algorithm to accept a generic work item, which consists of an input file as well as some other bookkeeping information.

Initially prime the algorithm by adding a single work item where the file is NULL, indicating that it should read from the outer plan.

If memory is exhausted during execution of a work item, then continue to allow existing groups to be aggregated, but do not allow new groups to be created in the hash table. Tuples representing new groups are saved in an output partition file referenced in the work item that is currently being executed.

When the work item is done, emit any groups in the hash table, clear the hash table, and turn each output partition file into a new work item.
Each time through, at least some groups are able to stay in the hash table, so eventually none will need to be saved in output partitions, no new work items will be created, and the algorithm will terminate. This is true even if the number of output partitions is always one.

Open items:
* costing
* EXPLAIN details for disk usage
* choose number of partitions intelligently
* performance testing

Initial tests indicate that it can be competitive with sort+groupagg when the disk is involved, but more testing is required. Feedback welcome.

I've been working on this for a few hours - getting familiar with the code, testing queries etc. Two comments.

1) Apparently there's something broken, because with this:

create table table_b (fk_id int, val_a int, val_b int);
insert into table_b select i, mod(i,1000), mod(i,1000)
  from generate_series(1,1000) s(i);
analyze table_b;

I get this:

set work_mem = '8MB';
explain analyze
  select fk_id, count(*) from table_b
  where val_a < 50 and val_b < 50 group by 1;

The connection to the server was lost. Attempting reset: Failed.

A stacktrace is attached, but apparently there's a segfault in advance_transition_function when accessing pergroupstate. This happened for all the queries that I tried, once they needed to do the batching.

2) Using the same hash value both for dynahash and for batching seems really fishy to me. I'm not familiar with dynahash, but I'd bet the way it's done now will lead to bad distribution in the hash table (some buckets will always be empty in some batches, etc.). This is why hashjoin tries so hard to use non-overlapping parts of the hash for batchno/bucketno.

Hashjoin implements its own hash table, which makes it clear how the bucket is derived from the hash value. I'm not sure how dynahash does that, but I'm pretty sure we can't just reuse the hash value like this. I see two options - compute our own hash value, or somehow derive a new one (e.g. by doing hashvalue XOR random_seed). I'm not sure the latter would work, though.
regards
Tomas

Program received signal SIGSEGV, Segmentation fault.
0x0064e6fc in advance_transition_function (aggstate=0x11405c0,
    peraggstate=0x1143540, pergroupstate=0x8) at nodeAgg.c:465
465             if (pergroupstate->noTransValue)
(gdb) bt
#0  0x0064e6fc in advance_transition_function (aggstate=0x11405c0,
    peraggstate=0x1143540, pergroupstate=0x8) at nodeAgg.c:465
#1  0x0064eb0e in advance_aggregates (aggstate=0x11405c0, pergroup=0x8)
    at nodeAgg.c:621
#2  0x006502a7 in agg_fill_hash_table (aggstate=0x11405c0)
    at nodeAgg.c:1584
#3  0x0064fc3f in ExecAgg (node=0x11405c0) at nodeAgg.c:1289
#4  0x0063c754 in ExecProcNode (node=0x11405c0) at execProcnode.c:476
#5  0x0063a483 in ExecutePlan (estate=0x11404a8, planstate=0x11405c0,
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On 15.8.2014 19:53, Robert Haas wrote:
> On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis pg...@j-davis.com wrote:
>> On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
>>> Oh? So if we have aggregates like array_agg whose memory footprint
>>> increases over time, the patch completely fails to avoid bloat?
>>
>> Yes, in its current form.
>>
>>> I might think a patch with such a limitation was useful, if it weren't
>>> for the fact that aggregates of that nature are a large part of the
>>> cases where the planner misestimates the table size in the first
>>> place. Any complication that we add to nodeAgg should be directed
>>> towards dealing with cases that the planner is likely to get wrong.
>>
>> In my experience, the planner has a lot of difficulty estimating the
>> cardinality unless it's coming from a base table without any operators
>> above it (other than maybe a simple predicate). This is probably a lot
>> more common than array_agg problems, simply because array_agg is
>> relatively rare compared with GROUP BY in general.
>
> I think that's right, and I rather like your (Jeff's) approach. It's
> definitely true that we could do better if we have a mechanism for
> serializing and deserializing group states, but (1) I think an awful
> lot of cases would get an awful lot better even just with the approach
> proposed here and (2) I doubt we would make the
> serialization/deserialization interfaces mandatory, so even if we had
> that we'd probably want a fallback strategy anyway.

I certainly agree that we need Jeff's approach even if we can do better in some cases (when we are able to serialize/deserialize the states). And yes, (mis)estimating the cardinalities is a big issue, and certainly a source of many problems.

> Furthermore, I don't really see that we're backing ourselves into a
> corner here. If prohibiting creation of additional groups isn't
> sufficient to control memory usage, but we have
> serialization/deserialization functions, we can just pick an arbitrary
> subset of the groups that we're storing in memory and spool their
> transition states off to disk, thus reducing memory even further. I
> understand Tomas' point to be that this is quite different from what we
> do for hash joins, but I think it's a different problem. In the case of
> a hash join, there are two streams of input tuples, and we've got to
> batch them in compatible ways. If we were to, say, exclude an arbitrary
> subset of tuples from the hash table instead of basing it on the hash
> code, we'd have to test *every* outer tuple against the hash table for
> *every* batch. That would incur a huge amount of additional cost vs.
> being able to discard outer tuples once the batch to which they pertain
> has been processed.

Being able to batch inner and outer relations in a matching way is certainly one of the reasons why hashjoin uses that particular scheme. There are other reasons, though - for example being able to answer 'Does this group belong to this batch?' quickly, and automatically updating the number of batches. I'm not saying the lookup is extremely costly, but I'd be very surprised if it was as cheap as a modulo on a 32-bit integer. Not saying it's the dominant cost here, but memory bandwidth is quickly becoming one of the main bottlenecks.

> But the situation here isn't comparable, because there's only one input
> stream. I'm pretty sure we'll want to keep track of which transition
> states we've spilled due to lack of memory as opposed to those which
> were never present in the table at all, so that we can segregate the
> unprocessed tuples that pertain to spilled transition states from the
> ones that pertain to a group we haven't begun yet.

Why would that be necessary or useful? I don't see a reason for tracking that / segregating the tuples.

> And it might be that if we know (or learn as we go along) that we're
> going to vastly blow out work_mem it makes sense to use batching to
> segregate the tuples that we decide not to process onto N tapes binned
> by hash code, so that we have a better chance that future batches will
> be the right size to fit in memory. But I'm not convinced that there's
> a compelling reason why the *first* batch has to be chosen by hash
> code; we're actually best off picking any arbitrary set of groups that
> does the best job reducing the amount of data remaining to be
> processed, at least if the transition states are fixed size and maybe
> even if they aren't.

If you don't choose the first batch by hash code, it's over, IMHO. You can't just redo that later easily, because the HashWork items are already treated separately.

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Mon, Jul 28, 2014 at 2:24 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
> On Mon, Jul 28, 2014 at 1:41 PM, Christoph Berg c...@df7cb.de wrote:
>> Re: Fabrízio de Royes Mello 2014-07-28 CAFcNs+pctx4Q2UYsLOvVFWaznO3U0XhPpkMx5DRhR=jw8w3...@mail.gmail.com
>>> Is there something I should still do on this patch?
>> I haven't got around to having a look at the newest incarnation yet,
>> but I plan to do that soonish. (Of course that shouldn't stop others
>> from doing that as well if they wish.)
> Thanks!

Updated version.

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..2d131df 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
     ENABLE ALWAYS RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
     CLUSTER ON <replaceable class="PARAMETER">index_name</replaceable>
     SET WITHOUT CLUSTER
+    SET {LOGGED | UNLOGGED}
     SET WITH OIDS
     SET WITHOUT OIDS
     SET ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable>
     RESET ( <replaceable class="PARAMETER">storage_parameter</replaceable> [, ... ] )
     INHERIT <replaceable class="PARAMETER">parent_table</replaceable>
     NO INHERIT <replaceable class="PARAMETER">parent_table</replaceable>
     OF <replaceable class="PARAMETER">type_name</replaceable>
     NOT OF
     OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
-    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable>
     REPLICA IDENTITY {DEFAULT | USING INDEX <replaceable class="PARAMETER">index_name</replaceable> | FULL | NOTHING}
 
 <phrase>and <replaceable class="PARAMETER">table_constraint_using_index</replaceable> is:</phrase>
@@ -447,6 +448,20 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
   </varlistentry>
 
   <varlistentry>
+   <term><literal>SET {LOGGED | UNLOGGED}</literal></term>
+   <listitem>
+    <para>
+     This form changes the table persistence type from unlogged to permanent or
+     from permanent to unlogged (see <xref linkend="SQL-CREATETABLE-UNLOGGED">).
+    </para>
+    <para>
+     Changing the table persistence type acquires an <literal>ACCESS EXCLUSIVE</literal> lock
+     and rewrites the table contents and associated indexes into new disk files.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
    <term><literal>SET WITH OIDS</literal></term>
    <listitem>
     <para>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b1c411a..7f497c7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -574,7 +574,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	heap_close(OldHeap, NoLock);
 
 	/* Create the transient table that will receive the re-ordered data */
-	OIDNewHeap = make_new_heap(tableOid, tableSpace, false,
+	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+							   OldHeap->rd_rel->relpersistence,
 							   AccessExclusiveLock);
 
 	/* Copy the heap data into the new table in the desired order */
@@ -601,7 +602,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 			  LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
@@ -613,7 +614,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	Datum		reloptions;
 	bool		isNull;
 	Oid			namespaceid;
-	char		relpersistence;
 
 	OldHeap = heap_open(OIDOldHeap, lockmode);
 	OldHeapDesc = RelationGetDescr(OldHeap);
@@ -636,16 +636,10 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	if (isNull)
 		reloptions = (Datum) 0;
 
-	if (forcetemp)
-	{
+	if (relpersistence == RELPERSISTENCE_TEMP)
 		namespaceid = LookupCreationNamespace("pg_temp");
-		relpersistence = RELPERSISTENCE_TEMP;
-	}
 	else
-	{
 		namespaceid = RelationGetNamespace(OldHeap);
-		relpersistence = OldHeap->rd_rel->relpersistence;
-	}
 
 	/*
 	 * Create the new heap, using a temporary name in the same namespace as
@@ -1146,6 +1140,7 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 	Oid			relfilenode1,
 				relfilenode2;
 	Oid			swaptemp;
+	char		swaprelpersistence;
 	CatalogIndexState indstate;
 
 	/* We need writable copies of both pg_class tuples. */
@@ -1177,6 +1172,10 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 	relform1->reltablespace = relform2->reltablespace;
 	relform2->reltablespace = swaptemp;
 
+
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
Alvaro, all,

* Stephen Frost (sfr...@snowman.net) wrote:
> As mentioned, I'll add this to the ALTER TABLE documentation and remove
> it from the TABLESPACE docs. That's not done yet, but I should have
> time in the next few days to get that done also, and will then commit
> it all to master and back-patch to 9.4, barring objections.

Here's a first pass with the documentation changes included. Feedback welcome; I'll review it again and plan to commit it on Tuesday.

Thanks,

Stephen

diff --git a/doc/src/sgml/ref/alter_index.sgml b/doc/src/sgml/ref/alter_index.sgml
index 94a7af0..85159a0 100644
--- a/doc/src/sgml/ref/alter_index.sgml
+++ b/doc/src/sgml/ref/alter_index.sgml
@@ -25,6 +25,8 @@ ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> RENA
 ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> SET TABLESPACE <replaceable class="PARAMETER">tablespace_name</replaceable>
 ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> SET ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
 ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> RESET ( <replaceable class="PARAMETER">storage_parameter</replaceable> [, ... ] )
+ALTER INDEX ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable> [ OWNED BY <replaceable class="PARAMETER">role_name</replaceable> [, ... ] ]
+    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable> [ NOWAIT ]
 </synopsis>
 </refsynopsisdiv>
@@ -63,6 +65,17 @@
    <para>
     This form changes the index's tablespace to the specified tablespace and
     moves the data file(s) associated with the index to the new tablespace.
+    To change the tablespace of an index, you must own the index and have
+    <literal>CREATE</literal> privilege on the new tablespace.
+    All indexes in a tablespace can be moved by using the
+    <literal>ALL IN TABLESPACE</literal> form, which will lock all indexes
+    to be moved and then move each one. This form also supports
+    <literal>OWNED BY</literal>, which will only move indexes owned by the
+    roles specified. If the <literal>NOWAIT</literal> option is specified
+    then the command will fail if it is unable to acquire all of the locks
+    required immediately. Note that system catalogs will not be moved by
+    this command, use <command>ALTER DATABASE</command> or explicit
+    <command>ALTER INDEX</command> invocations instead if desired.
     See also <xref linkend="SQL-CREATETABLESPACE">.
    </para>
diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index 1932eeb..b0759fc 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -29,6 +29,8 @@ ALTER MATERIALIZED VIEW [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
     RENAME TO <replaceable class="parameter">new_name</replaceable>
 ALTER MATERIALIZED VIEW [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
     SET SCHEMA <replaceable class="parameter">new_schema</replaceable>
+ALTER MATERIALIZED VIEW ALL IN TABLESPACE <replaceable class="parameter">name</replaceable> [ OWNED BY <replaceable class="PARAMETER">role_name</replaceable> [, ... ] ]
+    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable> [ NOWAIT ]
 
 <phrase>where <replaceable class="PARAMETER">action</replaceable> is one of:</phrase>
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..83d230c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -31,6 +31,8 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
     RENAME TO <replaceable class="PARAMETER">new_name</replaceable>
 ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
     SET SCHEMA <replaceable class="PARAMETER">new_schema</replaceable>
+ALTER TABLE ALL IN TABLESPACE <replaceable class="PARAMETER">name</replaceable> [ OWNED BY <replaceable class="PARAMETER">role_name</replaceable> [, ... ] ]
+    SET TABLESPACE <replaceable class="PARAMETER">new_tablespace</replaceable> [ NOWAIT ]
 
 <phrase>where <replaceable class="PARAMETER">action</replaceable> is one of:</phrase>
@@ -597,6 +599,17 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
     moves the data file(s) associated with the table to the new tablespace.
     Indexes on the table, if any, are not moved; but they can be moved
     separately with additional <literal>SET TABLESPACE</literal> commands.
+    All tables in a tablespace can be moved by using the
+    <literal>ALL IN TABLESPACE</literal> form, which will lock all tables
+    to be moved first and then move each one. This form also supports
+    <literal>OWNED BY</literal>, which will only move tables owned by the
+    roles specified. If the <literal>NOWAIT</literal> option is specified
+
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
On 15 Aug 2014 19:06, Robert Haas robertmh...@gmail.com wrote:
> > As for the expanded-mode changes, I thought there was consensus to
> > revert that from 9.4.
> Me too. In fact, I think that's been the consensus for many months, but
> unless I'm mistaken it ain't done.

Yeah, this is entirely my fault. I was traveling, busy, and not feeling well, in various permutations, but I should have gotten to it earlier. I'll take care of it tomorrow morning.
[HACKERS] [PATCH] Incremental backup: add backup profile to base backup
Hi Hackers,

This is the first piece of file-level incremental backup support, as described on the wiki page https://wiki.postgresql.org/wiki/Incremental_backup

It is not yet complete, but I wish to share it on the list to receive comments and suggestions.

The point of the patch is adding an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. When taking a full backup with pg_basebackup, the user can ask Postgres to generate a backup_profile file through the --profile option (-B short option, which I've arbitrarily picked because both -P and -p were already taken).

At the moment the backup profile consists of a file with one line per file, detailing modification time, md5, size, tablespace, and path relative to the tablespace root (PGDATA or the tablespace).

To calculate the md5 checksum I've used the md5 code present in the pgcrypto contrib, as the code in src/include/libpq/md5.h is not suitable for large files. Since a core feature cannot depend on a piece of contrib, I've moved the files

  contrib/pgcrypto/md5.c
  contrib/pgcrypto/md5.h

to

  src/backend/utils/hash/md5.c
  src/include/utils/md5.h

changing the pgcrypto extension to use them.

There are still some TODOs:

* User documentation

* Remove the pg_basebackup code duplication I've introduced with the ReceiveAndUnpackTarFileToDir function, which is almost the same as ReceiveAndUnpackTarFile but does not expect to handle a tablespace. It instead simply extracts a tar stream into a destination directory. The latter could probably be rewritten using the former as a component, but it needs some adjustment to the progress reporting part, which is not present at the moment in ReceiveAndUnpackTarFileToDir.

* Add a header section to the backup_profile file, which at the moment contains only the body part. I'm thinking of changing the original design and putting the whole backup label as the header, which is IMHO clearer and well known.
I would use something like: START WAL LOCATION: 0/E28 (file 0001000E) CHECKPOINT LOCATION: 0/E60 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2014-08-14 18:54:01 CEST LABEL: pg_basebackup base backup END LABEL I've attached the current patch based on master. Any comment will be appreciated. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it diff --git a/contrib/pgcrypto/Makefile b/contrib/pgcrypto/Makefile index b6c9484..daf2ba3 100644 *** a/contrib/pgcrypto/Makefile --- b/contrib/pgcrypto/Makefile *** *** 1,6 # contrib/pgcrypto/Makefile ! INT_SRCS = md5.c sha1.c sha2.c internal.c internal-sha2.c blf.c rijndael.c \ fortuna.c random.c pgp-mpi-internal.c imath.c INT_TESTS = sha2 --- 1,6 # contrib/pgcrypto/Makefile ! INT_SRCS = sha1.c sha2.c internal.c internal-sha2.c blf.c rijndael.c \ fortuna.c random.c pgp-mpi-internal.c imath.c INT_TESTS = sha2 diff --git a/contrib/pgcrypto/internal.c b/contrib/pgcrypto/internal.c index cb8ba26..0d2db23 100644 *** a/contrib/pgcrypto/internal.c --- b/contrib/pgcrypto/internal.c *** *** 30,40 */ #include "postgres.h" #include <time.h> #include "px.h" - #include "md5.h" #include "sha1.h" #include "blf.h" #include "rijndael.h" --- 30,40 */ #include "postgres.h" + #include "utils/md5.h" #include <time.h> #include "px.h" #include "sha1.h" #include "blf.h" #include "rijndael.h" diff --git a/contrib/pgcrypto/md5.c b/contrib/pgcrypto/md5.c index cac4e40..e69de29 . *** a/contrib/pgcrypto/md5.c --- b/contrib/pgcrypto/md5.c *** *** 1,397 - /* $KAME: md5.c,v 1.3 2000/02/22 14:01:17 itojun Exp $ */ - - /* - * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. - * All rights reserved. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1.
Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * 3. Neither the name of the project nor the names of its contributors - * may be used to endorse or promote products derived from this software - * without specific prior written permission. - * - * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
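The backup_profile body described above (one line per file: modification time, checksum, size, tablespace, and tablespace-relative path) can be sketched as a small formatter. This is only an illustration: the exact field order and separators are an assumption for this sketch, and the real format is defined by the patch, which is not shown in full here.

```c
#include <stdio.h>
#include <time.h>

/*
 * Illustrative sketch of emitting one backup_profile body line:
 * modification time, checksum, size, tablespace, tablespace-relative
 * path.  Field order and tab separators are assumptions, not the
 * patch's actual format.  A NULL tablespace stands for PGDATA.
 */
static int
format_profile_line(char *buf, size_t buflen, time_t mtime,
                    const char *md5hex, long long size,
                    const char *tablespace, const char *relpath)
{
    char        tbuf[32];
    struct tm  *tm = gmtime(&mtime);

    /* render the modification time in a fixed, sortable form */
    strftime(tbuf, sizeof(tbuf), "%Y-%m-%dT%H:%M:%S", tm);
    return snprintf(buf, buflen, "%s\t%s\t%lld\t%s\t%s",
                    tbuf, md5hex, size,
                    tablespace ? tablespace : "PGDATA", relpath);
}
```

A real implementation would of course compute the checksum incrementally over the file contents, which is exactly why the one-shot libpq md5.h interface is unsuitable for large files, as the message explains.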
Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak
On 8/17/14 5:19 PM, Stephen Frost wrote: Alvaro, all, * Stephen Frost (sfr...@snowman.net) wrote: As mentioned, I'll add this to the ALTER TABLE documentation and remove it from the TABLESPACE docs. That's not done yet but I should have time in the next few days to get that done also and will then commit it all to master and back-patch to 9.4, barring objections. Here's a first pass with the documentation changes included. Feedback welcome, and I'll review it again and plan to commit it on Tuesday. One thing that is not clear from this (before or after) is whether all {tables|indexes|etc} in the tablespace actually means in this tablespace and in the current database. Presumably it does. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] option -T in pg_basebackup doesn't work on windows
On 8/16/14 8:46 AM, Amit Kapila wrote: On Fri, Aug 15, 2014 at 1:03 PM, MauMau maumau...@gmail.com wrote: Thank you. The code looks correct. I confirmed that pg_basebackup could relocate the tablespace directory on Windows. I marked this patch as ready for committer. Thanks for the review. It's not ready for committer if the current patch does not apply.
[HACKERS] Why not ISO 8601 format for date values rendered into JSON?
I was just going over the release notes, and noticed the bit about timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant format rather than whatever random DateStyle is in use. That's fine, but I wonder why the same approach wasn't applied to type date? regression=# set datestyle to postgres; SET regression=# select row_to_json(row(now())); row_to_json --- {"f1":"2014-08-17T20:34:54.424237-04:00"} (1 row) regression=# select row_to_json(row(current_date)); row_to_json - {"f1":"08-17-2014"} (1 row) Doesn't seem real consistent ... regards, tom lane
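The fix being asked for amounts to rendering dates for JSON in one fixed form regardless of DateStyle. A minimal stand-alone sketch of the target format (this is not the backend's actual date-output code, just an illustration of the desired "YYYY-MM-DD" rendering, quoted as a JSON string literal):

```c
#include <stdio.h>

/*
 * Sketch only: render a date destined for JSON output in the fixed
 * ISO 8601 form "YYYY-MM-DD", ignoring the session DateStyle.  The
 * eventual backend fix goes through the date type's output machinery;
 * this helper merely demonstrates the target format.
 */
static void
date_to_json(char *buf, size_t buflen, int year, int month, int day)
{
    snprintf(buf, buflen, "\"%04d-%02d-%02d\"", year, month, day);
}
```

With this, `current_date` for 2014-08-17 would serialize as `"2014-08-17"`, matching the `T`-separated ISO 8601 form already used for timestamp and timestamptz.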
Re: [HACKERS] option -T in pg_basebackup doesn't work on windows
On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut pete...@gmx.net wrote: It's not ready for committer if the current patch does not apply. FWIW, the latest version sent by Amit here applies correctly: http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com I haven't tested it myself though. However, independently of this patch and as pointed out by MauMau, the code that has been committed in fb05f3c is incorrect in the way it defines the tablespace path, this: psprintf("%s/pg_tblspc/%d", basedir, oid); should be this: psprintf("%s/pg_tblspc/%u", basedir, oid); I am separating this fix (that should be backpatched to REL9_4_STABLE as well) in the patch attached, if this helps. Regards, -- Michael diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 3d26e22..ef7cd94 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -1119,7 +1119,7 @@ update_tablespace_symlink(Oid oid, const char *old_dir) if (strcmp(old_dir, new_dir) != 0) { - char *linkloc = psprintf("%s/pg_tblspc/%d", basedir, oid); + char *linkloc = psprintf("%s/pg_tblspc/%u", basedir, oid); if (unlink(linkloc) != 0 && errno != ENOENT) {
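The reason %u matters here: Oid is an unsigned 32-bit type, so any OID above INT_MAX misrenders under %d, and the computed symlink path can never match the on-disk directory name. A stand-alone illustration (the Oid typedef mirrors postgres_ext.h conceptually; the helper names are made up for this sketch):

```c
#include <stdio.h>

typedef unsigned int Oid;       /* conceptually as in postgres_ext.h */

/* Builds the tablespace symlink path with the buggy %d conversion. */
static void
path_with_d(char *buf, size_t n, const char *base, Oid oid)
{
    snprintf(buf, n, "%s/pg_tblspc/%d", base, (int) oid);
}

/* Builds it with the corrected %u conversion. */
static void
path_with_u(char *buf, size_t n, const char *base, Oid oid)
{
    snprintf(buf, n, "%s/pg_tblspc/%u", base, oid);
}
```

With an OID of 3000000000 (valid, but above INT_MAX), the %d variant typically produces a negative number in the path on two's-complement platforms, while the %u variant correctly yields /data/pg_tblspc/3000000000.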
Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?
On 08/17/2014 08:42 PM, Tom Lane wrote: I was just going over the release notes, and noticed the bit about timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant format rather than whatever random DateStyle is in use. That's fine, but I wonder why the same approach wasn't applied to type date? regression=# set datestyle to postgres; SET regression=# select row_to_json(row(now())); row_to_json --- {"f1":"2014-08-17T20:34:54.424237-04:00"} (1 row) regression=# select row_to_json(row(current_date)); row_to_json - {"f1":"08-17-2014"} (1 row) Doesn't seem real consistent ... Good point. Probably because I didn't get a complaint about it, which in turn is probably because JavaScript's builtin Date class is in fact (from our POV) more or less a timestamp(tz) type. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date But yes, I agree it should be fixed. Whatever we output should be suitable as input for the string-argument constructor of class Date. cheers andrew
Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?
Andrew Dunstan and...@dunslane.net writes: On 08/17/2014 08:42 PM, Tom Lane wrote: I was just going over the release notes, and noticed the bit about timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant format rather than whatever random DateStyle is in use. That's fine, but I wonder why the same approach wasn't applied to type date? But yes, I agree it should be fixed. Whatever we output should be suitable as input for the string-argument constructor of class Date. OK. I think I can fix it, if you don't have time. regards, tom lane
Re: [HACKERS] replication commands and index terms
On Fri, Aug 15, 2014 at 11:25 AM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Aug 15, 2014 at 4:59 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Aug 14, 2014 at 12:59 PM, Fujii Masao masao.fu...@gmail.com wrote: Since I sometimes try to search the replication command in the index, I'd feel inclined to expose all those commands as index terms... +1. Attached patch exposes replication commands as index terms. Barring any objection, I will commit this. Done. Regards, -- Fujii Masao
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sat, Aug 16, 2014 at 4:23 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote: Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. Controlling the threshold of the size of the pending list only by a GUC doesn't seem reasonable. Users may want to increase the threshold only for a GIN index which is updated heavily, and decrease it otherwise. So I think that it's better to add a new storage parameter for GIN indexes to control the threshold, or both a storage parameter and a GUC. Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more appropriate. The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE, which specifies the maximum size of the GIN pending list. If it's not set, work_mem is used as that maximum size instead. So this patch doesn't break existing applications which currently use work_mem as the threshold of the cleanup operation of the pending list; they simply need not set PENDING_LIST_CLEANUP_SIZE. This is an index storage parameter, and allows us to specify the threshold per GIN index. So, for example, we can increase the threshold only for a GIN index which is updated heavily, and decrease it otherwise. This patch uses another patch that I proposed (*1) as infrastructure. Please apply that infrastructure patch first if you apply this patch.
(*1) http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com Regards, -- Fujii Masao Sorry, I forgot to attach the patch. This time, attached. I think that this patch should be rebased. It contains garbage code. Thanks for reviewing the patch! ISTM that I failed to make the patch from my git repository... Attached is the rebased version. Regards, -- Fujii Masao diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml index 80a578d..24c8dc1 100644 --- a/doc/src/sgml/gin.sgml +++ b/doc/src/sgml/gin.sgml @@ -728,8 +728,9 @@ from the indexed item). As of <productname>PostgreSQL</productname> 8.4, <acronym>GIN</> is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. - When the table is vacuumed, or if the pending list becomes too large - (larger than <xref linkend="guc-work-mem">), the entries are moved to the + When the table is vacuumed, or if the pending list becomes larger than + <literal>PENDING_LIST_CLEANUP_SIZE</literal> (or + <xref linkend="guc-work-mem"> if not set), the entries are moved to the main <acronym>GIN</acronym> data structure using the same bulk insert techniques used during initial index creation. This greatly improves <acronym>GIN</acronym> index update speed, even counting the additional @@ -812,18 +813,27 @@ </varlistentry> <varlistentry> - <term><xref linkend="guc-work-mem"></term> + <term><literal>PENDING_LIST_CLEANUP_SIZE</> and + <xref linkend="guc-work-mem"></term> <listitem> <para> During a series of insertions into an existing <acronym>GIN</acronym> index that has <literal>FASTUPDATE</> enabled, the system will clean up the pending-entry list whenever the list grows larger than - <varname>work_mem</>. To avoid fluctuations in observed response time, - it's desirable to have pending-list cleanup occur in the background - (i.e., via autovacuum). Foreground cleanup operations can be avoided by - increasing <varname>work_mem</> or making autovacuum more aggressive.
- However, enlarging <varname>work_mem</> means that if a foreground - cleanup does occur, it will take even longer. + <literal>PENDING_LIST_CLEANUP_SIZE</> (if not set, <varname>work_mem</> + is used as that threshold, instead). To avoid fluctuations in observed + response time, it's desirable to have pending-list cleanup occur in the + background (i.e., via autovacuum). Foreground cleanup operations + can be avoided by increasing <literal>PENDING_LIST_CLEANUP_SIZE</> + (or <varname>work_mem</>) or making autovacuum more aggressive. + However, enlarging the threshold of the cleanup operation means that + if a foreground cleanup does occur, it will take even longer. +</para> +<para> + <literal>PENDING_LIST_CLEANUP_SIZE</> is an index storage parameter, + and allows each GIN index to have its own cleanup threshold. + For example, it's possible to increase the
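The threshold rule the patch documents — use the index's own PENDING_LIST_CLEANUP_SIZE reloption when set, otherwise fall back to work_mem — boils down to a small selection step. The sketch below uses illustrative names, not the patch's actual symbols; sizes are in kilobytes, as for PostgreSQL memory GUCs:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the threshold rule described above.  A reloption value of
 * 0 stands for "not set", in which case the server-wide work_mem GUC
 * is the fallback threshold.  All sizes in kilobytes.
 */
static int64_t
pending_list_cleanup_threshold(int64_t reloption_kb, int64_t work_mem_kb)
{
    return (reloption_kb > 0) ? reloption_kb : work_mem_kb;
}

/* Decide whether an insertion should trigger a pending-list flush. */
static bool
pending_list_needs_cleanup(int64_t pending_kb,
                           int64_t reloption_kb, int64_t work_mem_kb)
{
    return pending_kb >
        pending_list_cleanup_threshold(reloption_kb, work_mem_kb);
}
```

This is why existing applications keep working unchanged: with the reloption unset, the decision reduces to the old `pending list > work_mem` test.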
Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?
On 08/17/2014 09:53 PM, Tom Lane wrote: OK. I think I can fix it, if you don't have time. [offlist] Thanks. FYI I am still recovering from treatment for prostate cancer I had not long after pgcon ... it's taken more out of me than I expected, so time is limited. cheers andrew
Re: [HACKERS] option -T in pg_basebackup doesn't work on windows
On Mon, Aug 18, 2014 at 7:08 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut pete...@gmx.net wrote: It's not ready for committer if the current patch does not apply. FWIW, the latest version sent by Amit here applies correctly: http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com I haven't tested it myself though. However, independently of this patch and as pointed out by MauMau, the code that has been committed in fb05f3c is incorrect in the way it defines the tablespace path, this: psprintf("%s/pg_tblspc/%d", basedir, oid); should be this: psprintf("%s/pg_tblspc/%u", basedir, oid); I am separating this fix (that should be backpatched to REL9_4_STABLE as well) in the patch attached, if this helps. I think the patch provided by me needs to be back-patched to 9.4, as this is a new option introduced in 9.4 which does not work on Windows. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?
On 08/17/2014 10:50 PM, Andrew Dunstan wrote: On 08/17/2014 09:53 PM, Tom Lane wrote: OK. I think I can fix it, if you don't have time. [offlist] Thanks. FYI I am still recovering from treatment for prostate cancer I had not long after pgcon ... it's taken more out of me than I expected, so time is limited. Darn it. Well, I guess everyone knows now. *sigh* cheers andrew
Re: [HACKERS] option -T in pg_basebackup doesn't work on windows
On Mon, Aug 18, 2014 at 11:51 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Aug 18, 2014 at 7:08 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut pete...@gmx.net wrote: It's not ready for committer if the current patch does not apply. FWIW, the latest version sent by Amit here applies correctly: http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com I haven't tested it myself though. However, independently of this patch and as pointed out by MauMau, the code that has been committed in fb05f3c is incorrect in the way it defines the tablespace path, this: psprintf("%s/pg_tblspc/%d", basedir, oid); should be this: psprintf("%s/pg_tblspc/%u", basedir, oid); I am separating this fix (that should be backpatched to REL9_4_STABLE as well) in the patch attached, if this helps. I think the patch provided by me needs to be back-patched to 9.4, as this is a new option introduced in 9.4 which does not work on Windows. Yep. Definitely. This will fix both issues at the same time. -- Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Aug 16, 2014 at 6:51 PM, Rahila Syed rahilasye...@gmail.com wrote: So, you're compressing backup blocks one by one. I wonder if that's the right idea and if we shouldn't instead compress all of them in one run to increase the compression ratio Please find attached a patch for compression of all blocks of a record together. Following are the measurement results:

Benchmark: Scale: 16 Command: java JR /home/postgres/jdbcrunner-1.2/scripts/tpcc.js -sleepTime 550,250,250,200,200 Warmup time: 1 sec Measurement time: 900 sec Number of tx types: 5 Number of agents: 16 Connection pool size: 16 Statement cache size: 40 Auto commit: false Checkpoint segments: 1024 Checkpoint timeout: 5 mins

Compression          Multiple blocks in one run   Single block in one run
OFF     Bytes saved  0                            0
        WAL generated 1265150984 (~1265MB)        1264771760 (~1265MB)
        % Compression NA                          NA
LZ4     Bytes saved  215215079 (~215MB)           285675622 (~286MB)
        WAL generated 125118783 (~1251MB)         1329031918 (~1329MB)
        % Compression 17.2%                       21.49%
Snappy  Bytes saved  203705959 (~204MB)           271009408 (~271MB)
        WAL generated 1254505415 (~1254MB)        1329628352 (~1330MB)
        % Compression 16.23%                      20.38%
pglz    Bytes saved  155910177 (~156MB)           182804997 (~182MB)
        WAL generated 1259773129 (~1260MB)        1286670317 (~1287MB)
        % Compression 12.37%                      14.21%

As per the measurement results of this benchmark, compression of multiple blocks didn't improve the compression ratio over compression of a single block. According to the measurement results, the amount of WAL generated in "Multiple blocks in one run" is less than that in "Single block in one run". So ISTM that compression of multiple blocks at one run can improve the compression ratio. Am I missing something? Regards, -- Fujii Masao
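As a sanity check on the figures above, the "% Compression" rows appear to be computed as bytes_saved divided by WAL generated; this formula is inferred from the numbers, not stated explicitly in the thread:

```c
/*
 * Reproduces the "% Compression" column of the benchmark table:
 * percentage of WAL bytes saved relative to the WAL generated.
 * (Formula inferred from the posted numbers, not stated in the thread.)
 */
static double
compression_pct(double bytes_saved, double wal_generated)
{
    return 100.0 * bytes_saved / wal_generated;
}
```

For instance, the LZ4 single-block row gives compression_pct(285675622, 1329031918), about 21.49, matching the table.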
Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup
Marco Nenciarini wrote: To calculate the md5 checksum I've used the md5 code present in pgcrypto contrib as the code in src/include/libpq/md5.h is not suitable for large files. Since a core feature cannot depend on a piece of contrib, I've moved the files contrib/pgcrypto/md5.c contrib/pgcrypto/md5.h to src/backend/utils/hash/md5.c src/include/utils/md5.h changing the pgcrypto extension to use them. We already have the FNV checksum implementation in the backend -- can't we use that one for this and avoid messing with MD5? (I don't think we're looking for a cryptographic hash here. Am I wrong?) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
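For context on the suggestion: FNV is a tiny non-cryptographic hash. The sketch below is the textbook 64-bit FNV-1a, not the vectorizable modified-FNV that PostgreSQL's checksum_impl.h actually uses for data pages; it only illustrates that detecting file changes for a backup profile needs no MD5 dependency:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Textbook 64-bit FNV-1a (illustration only; PostgreSQL's page
 * checksum code uses a modified FNV variant).  Incremental by nature:
 * a file can be hashed chunk by chunk by carrying h between calls.
 */
static uint64_t
fnv1a_64(const unsigned char *data, size_t len)
{
    uint64_t    h = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= UINT64_C(0x100000001b3);               /* FNV prime */
    }
    return h;
}
```

Whether this is enough is exactly Alvaro's question: if the profile checksum must resist deliberate forgery rather than just detect modification, a cryptographic hash is still needed.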
Re: [HACKERS] pg_receivexlog and replication slots
On Fri, Aug 15, 2014 at 5:17 PM, Michael Paquier michael.paqu...@gmail.com wrote: Thanks for your review. On Fri, Aug 15, 2014 at 12:56 AM, furu...@pm.nttdata.co.jp wrote: For consistency with pg_recvlogical, what do you think about --start? I did not add that for the sake of backward-compatibility, as in pg_recvlogical an action is mandatory. It is not the case now of pg_receivexlog. [postgres postgresql-6aa6158]$ make > /dev/null streamutil.c: In function 'CreateReplicationSlot': streamutil.c:244: warning: suggest parentheses around '&&' within '||' I see. Here is a rebased patch. Looking more at the code, IDENTIFY_SYSTEM management can be unified into a single function. So I have written a newer version of the patch grouping the IDENTIFY_SYSTEM call into a single function for both utilities, fixing at the same time a couple of other issues: - correct use of TimeLineID in code - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following check was done, but in 9.4 this command returns 4 fields: (PQntuples(res) != 1 || PQnfields(res) < 3) That's not directly related to this patch, but making some corrections is not going to hurt.. Regards, -- Michael From 160b00e595dab739aa4da7d18a81aff38d0a837f Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 18 Aug 2014 14:32:44 +0900 Subject: [PATCH] Add support for physical slot creation/deletion in pg_receivexlog Physical slot creation can be done with --create and drop with --drop. In both cases --slot is needed. Code for replication slot creation and drop is refactored with what was available in pg_recvlogical, the same applied with IDENTIFY_SYSTEM to simplify the whole set.
--- doc/src/sgml/ref/pg_receivexlog.sgml | 29 +++ src/bin/pg_basebackup/pg_receivexlog.c | 108 - src/bin/pg_basebackup/pg_recvlogical.c | 119 +--- src/bin/pg_basebackup/streamutil.c | 140 + src/bin/pg_basebackup/streamutil.h | 9 +++ 5 files changed, 265 insertions(+), 140 deletions(-) diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml index 5916b8f..7e46005 100644 --- a/doc/src/sgml/ref/pg_receivexlog.sgml +++ b/doc/src/sgml/ref/pg_receivexlog.sgml @@ -72,6 +72,35 @@ PostgreSQL documentation <title>Options</title> <para> +<application>pg_receivexlog</application> can run in one of two following +modes, which control physical replication slot: + +<variablelist> + + <varlistentry> + <term><option>--create</option></term> + <listitem> + <para> +Create a new physical replication slot with the name specified in +<option>--slot</option>, then exit. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--drop</option></term> + <listitem> + <para> +Drop the replication slot with the name specified +in <option>--slot</option>, then exit. + </para> + </listitem> + </varlistentry> +</variablelist> + + </para> + + <para> The following command-line options control the location and format of the output.
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c index a8b9ad3..a56cd9e 100644 --- a/src/bin/pg_basebackup/pg_receivexlog.c +++ b/src/bin/pg_basebackup/pg_receivexlog.c @@ -38,6 +38,8 @@ static int noloop = 0; static int standby_message_timeout = 10 * 1000; /* 10 sec = default */ static int fsync_interval = 0; /* 0 = default */ static volatile bool time_to_abort = false; +static bool do_create_slot = false; +static bool do_drop_slot = false; static void usage(void); @@ -78,6 +80,9 @@ usage(void) printf(_(" -w, --no-password never prompt for password\n")); printf(_(" -W, --password force password prompt (should happen automatically)\n")); printf(_(" -S, --slot=SLOTNAME replication slot to use\n")); + printf(_("\nOptional actions:\n")); + printf(_(" --create create a new replication slot (for the slot's name see --slot)\n")); + printf(_(" --drop drop the replication slot (for the slot's name see --slot)\n")); printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n")); } @@ -253,21 +258,10 @@ FindStreamingStart(uint32 *tli) static void StreamLog(void) { - PGresult *res; XLogRecPtr startpos; - uint32 starttli; + TimeLineID starttli; XLogRecPtr serverpos; - uint32 servertli; - uint32 hi, -lo; - - /* - * Connect in replication mode to the server - */ - conn = GetConnection(); - if (!conn) - /* Error message already written in GetConnection() */ - return; + TimeLineID servertli; if (!CheckServerVersionForStreaming(conn)) { @@ -280,33 +274,11 @@ StreamLog(void) } /* - * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog - * position. + * Identify server, obtaining start LSN position and current timeline ID + * at the same time. */ - res = PQexec(conn, "IDENTIFY_SYSTEM"); - if
Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries
Pavel Stehule wrote: 2014-08-13 15:22 GMT+02:00 MauMau maumau...@gmail.com: I didn't mean performance statistics data to be stored in database tables. I just meant: * pg_stat_system_events is a view to show data on memory, which returns one row for each event across the system. This is similar to V$SYSTEM_EVENT in Oracle. * pg_stat_session_events is a view to show data on memory, which returns one row for each event on one session. This is similar to V$SESSION_EVENT in Oracle. * The above views represent the current accumulated data like other pg_stat_xxx views. * EXPLAIN ANALYZE and auto_explain show all events for one query. The lock waits you are trying to record in the server log are one of the events. I am a little bit sceptical about a memory-only structure. Is this concept acceptable to committers? Is this supposed to be session-local data, or is it visible from remote sessions too? How durable is it supposed to be? Keep in mind that in case of a crash, all pgstats data is erased. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services