Re: [HACKERS] PostgreSQL Columnar Store for Analytic Workloads
Hi Hadi Do you think that cstore_fdw is also well suited for storing and retrieving linked data (RDF)? -S.

2014-04-03 18:43 GMT+02:00 Hadi Moshayedi h...@citusdata.com:

Dear Hackers, We at Citus Data have been developing a columnar store extension for PostgreSQL. Today we are excited to open source it under the Apache v2.0 license. This columnar store extension uses the Optimized Row Columnar (ORC) format for its data layout, which improves upon the RCFile format developed at Facebook, and brings the following benefits:

* Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs. We used the functions in pg_lzcompress.h for compression and decompression.
* Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.
* Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.

We used the PostgreSQL FDW APIs to make this work. The extension doesn't implement the writable FDW API, but it uses the process utility hook to enable the COPY command for columnar tables. This extension uses PostgreSQL's internal data type representation to store data in the table, so this columnar store should support all data types that PostgreSQL supports.

We tried the extension on the TPC-H benchmark with a 4GB scale factor on an m1.xlarge Amazon EC2 instance, and query performance improved by 2x-3x compared to regular PostgreSQL tables. Note that we flushed the page cache before each test to see the impact on disk I/O. When data is cached in memory, the performance of cstore_fdw tables was close to that of regular PostgreSQL tables.

For more information, please visit:

* our blog post: http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics
* our github page: https://github.com/citusdata/cstore_fdw

Feedback from you is really appreciated. Thanks, -- Hadi
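To make the skip-index bullet above concrete, here is a hypothetical sketch (not cstore_fdw's actual code or file layout) of how per-row-group min/max statistics let a scan skip whole groups without reading them:

```python
# Illustrative sketch of min/max skip indexes over row groups
# (NOT cstore_fdw's actual implementation or on-disk format).

ROW_GROUP_SIZE = 4  # kept tiny for the example; real stripes are far larger

def build_skip_index(values):
    """For each row group, record (min, max) of the column values."""
    index = []
    for start in range(0, len(values), ROW_GROUP_SIZE):
        group = values[start:start + ROW_GROUP_SIZE]
        index.append((min(group), max(group)))
    return index

def scan_eq(values, index, key):
    """Return offsets of matching rows, skipping row groups whose
    min/max statistics prove they cannot contain the key."""
    matches = []
    for g, (lo, hi) in enumerate(index):
        if key < lo or key > hi:
            continue  # whole row group skipped without reading it
        start = g * ROW_GROUP_SIZE
        for off, v in enumerate(values[start:start + ROW_GROUP_SIZE]):
            if v == key:
                matches.append(start + off)
    return matches

data = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
idx = build_skip_index(data)
print(idx)                     # [(1, 4), (10, 13), (20, 23)]
print(scan_eq(data, idx, 11))  # [5]
```

For a query like `key = 11`, only the middle row group is read; the other two are eliminated by their statistics alone, which is where the I/O savings for selective analytic queries come from.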
[HACKERS] Doc typo in 9.28. Event Trigger Functions
Just a single missing 's'. Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index 6e2fbda..b5807f3 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** FOR EACH ROW EXECUTE PROCEDURE suppress_ *** 17446,17452 /para para ! functionpg_event_trigger_dropped_objects/ returns a list of all object dropped by the command in whose literalsql_drop/ event it is called. If called in any other context, functionpg_event_trigger_dropped_objects/ raises an error. --- 17446,17452 /para para ! functionpg_event_trigger_dropped_objects/ returns a list of all objects dropped by the command in whose literalsql_drop/ event it is called. If called in any other context, functionpg_event_trigger_dropped_objects/ raises an error. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Fwd: [HACKERS] Proposal: variant of regclass
On Mon, 7 Apr 2014 12:00:49 -0400 Robert Haas robertmh...@gmail.com wrote: In other words, let's revert the whole refactoring of this file to create reg*_guts functions, and instead just copy the relevant logic for the name lookups into the new functions. For to_regproc(), for example, it would look like this (untested): names = stringToQualifiedNameList(pro_name_or_oid); clist = FuncnameGetCandidates(names, -1, NIL, false, false, false); if (clist == NULL || clist->next != NULL) result = InvalidOid; else result = clist->oid; With that change, this patch will actually get a whole lot smaller, change less already-existing code, and deliver cleaner behavior.

Here is an updated patch. I rewrote regproc.c not to use the _guts functions, and fixed to_reg* to not accept a numeric OID. I also updated the documentation and some comments. Is this better than the previous one?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Yugo Nagata nag...@sraoss.co.jp to_regclass.patch.v7 Description: Binary data
Re: [HACKERS] Including replication slot data in base backups
On Tue, Apr 8, 2014 at 1:18 AM, Robert Haas robertmh...@gmail.com wrote: Not sure if this is exactly the right way to do it, but I agree that something along those lines is a good idea. I also think, maybe even importantly, that we should probably document that people using file-copy based hot backups should strongly consider removing the replication slots by hand before using the backup. Good point. Something here would be adapted in this case: http://www.postgresql.org/docs/devel/static/backup-file.html I am attaching an updated patch as well. -- Michael diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 854b5fd..d8286b0 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -448,6 +448,13 @@ tar -cf backup.tar /usr/local/pgsql/data the contents of indexes for example, just the commands to recreate them.) However, taking a file system backup might be faster. /para + + para + When doing a file system backup, it is recommended to drop replication + slots (see xref linkend=streaming-replication-slots) before using + it as it is not guaranteed that the WAL files needed by a slot will be + kept on the newly-created node. + /para /sect1 sect1 id=continuous-archiving diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 6ce0c8c..b81ad8d 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -590,6 +590,13 @@ PostgreSQL documentation or an older major version, down to 9.1. However, WAL streaming mode (-X stream) only works with server version 9.3. /para + + para + The backup will not include information about replication slots + (see xref linkend=streaming-replication-slots) as it is not + guaranteed that a node in recovery will have WAL files required for + a given slot. + /para /refsect1 refsect1
Re: [HACKERS] Why is it not sane to pass ExecStoreTuple(shouldFree=true) for tuples pointing into buffers
On 2014-04-07 21:47:38 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: So what I now do is essentially: while ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL) { ... ht = palloc(sizeof(HeapTupleData)); /* in the right context */ memcpy(ht, scantuple, sizeof(HeapTupleData)); ExecStoreTuple(ht, slot, scan->xs_cbuf, false); slot->tts_shouldFree = true; ... } Well, that is certainly messy. I think you could just use a local HeapTupleData variable instead of palloc'ing every time, where "local" means "has lifespan similar to the slot pointer". Doesn't work nicely in this specific situation, but it's obviously an alternative. There's some vaguely similar hacking near the end of ExecDelete. Yea, and some other places. I wonder if a ExecShallowMaterializeSlot() or something would be useful for me, that callsite and others? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Minor improvements in alter_table.sgml
Attached is a patch to improve the manual page for the ALTER TABLE command. Thanks, Best regards, Etsuro Fujita diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 0b08f83..ce67c71 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -625,8 +625,9 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable listitem para The literalRENAME/literal forms change the name of a table - (or an index, sequence, or view), the name of an individual column in - a table, or the name of a constraint of the table. There is no effect on the stored data. + (or an index, sequence, view, materialized view, or foreign table), the name + of an individual column in a table, or the name of a constraint of the table. + There is no effect on the stored data. /para /listitem /varlistentry @@ -717,7 +718,7 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable /varlistentry varlistentry - termreplaceable class=PARAMETERtype/replaceable/term + termreplaceable class=PARAMETERdata_type/replaceable/term listitem para Data type of the new column, or new data type for an existing @@ -739,7 +740,7 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable termreplaceable class=PARAMETERconstraint_name/replaceable/term listitem para -Name of an existing constraint to drop. +Name of an existing constraint to alter, validate, or drop. /para /listitem /varlistentry @@ -836,6 +837,15 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable /varlistentry varlistentry + termreplaceable class=PARAMETERtype_name/replaceable/term + listitem + para +Name of an existing composite type. + /para + /listitem + /varlistentry + + varlistentry termreplaceable class=PARAMETERnew_owner/replaceable/term listitem para
[HACKERS] Patch: add psql tab completion for event triggers
As it was kind of annoying not to have this when playing around with event triggers. This also tightens up the existing tab completion for ALTER TRIGGER, which contained redundant code for table name completion, and which was also causing a spurious RENAME TO to be inserted in this context: CREATE EVENT TRIGGER foo ON {event} ^I Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index 6e2fbda..b5807f3 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** FOR EACH ROW EXECUTE PROCEDURE suppress_ *** 17446,17452 /para para ! functionpg_event_trigger_dropped_objects/ returns a list of all object dropped by the command in whose literalsql_drop/ event it is called. If called in any other context, functionpg_event_trigger_dropped_objects/ raises an error. --- 17446,17452 /para para ! functionpg_event_trigger_dropped_objects/ returns a list of all objects dropped by the command in whose literalsql_drop/ event it is called. If called in any other context, functionpg_event_trigger_dropped_objects/ raises an error. diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c new file mode 100644 index 202ffce..7179642 *** a/src/bin/psql/tab-complete.c --- b/src/bin/psql/tab-complete.c *** static const SchemaQuery Query_for_list_ *** 714,721 FROM pg_catalog.pg_prepared_statements \ WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s' /* ! * This is a list of all things in Pgsql, which can show up after CREATE or * DROP; and there is also a query to get a list of them. */ --- 714,726 FROM pg_catalog.pg_prepared_statements \ WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s' + #define Query_for_list_of_event_triggers \ + SELECT pg_catalog.quote_ident(evtname) \ +FROM pg_catalog.pg_event_trigger \ + WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s' + /* ! 
* This is a list of all things in Postgres, which can show up after CREATE or * DROP; and there is also a query to get a list of them. */ *** static const pgsql_thing_t words_after_c *** 746,751 --- 751,757 {DATABASE, Query_for_list_of_databases}, {DICTIONARY, Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW}, {DOMAIN, NULL, Query_for_list_of_domains}, + {EVENT TRIGGER, NULL, NULL}, {EXTENSION, Query_for_list_of_extensions}, {FOREIGN DATA WRAPPER, NULL, NULL}, {FOREIGN TABLE, NULL, NULL}, *** psql_completion(const char *text, int st *** 934,942 { static const char *const list_ALTER[] = {AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN, ! EXTENSION, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION, GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR, ! ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE, TABLESPACE, TEXT SEARCH, TRIGGER, TYPE, USER, USER MAPPING FOR, VIEW, NULL}; --- 940,948 { static const char *const list_ALTER[] = {AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN, ! EXTENSION, EVENT TRIGGER, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION, GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR, ! 
ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE, TABLESPACE, TEXT SEARCH, TRIGGER, TYPE, USER, USER MAPPING FOR, VIEW, NULL}; *** psql_completion(const char *text, int st *** 1013,1018 --- 1019,1055 COMPLETE_WITH_LIST(list_ALTEREXTENSION); } + /* ALTER EVENT TRIGGER */ + else if (pg_strcasecmp(prev3_wd, ALTER) == 0 + pg_strcasecmp(prev2_wd, EVENT) == 0 + pg_strcasecmp(prev_wd, TRIGGER) == 0) + { + COMPLETE_WITH_QUERY(Query_for_list_of_event_triggers); + } + + /* ALTER EVENT TRIGGER name */ + else if (pg_strcasecmp(prev4_wd, ALTER) == 0 + pg_strcasecmp(prev3_wd, EVENT) == 0 + pg_strcasecmp(prev2_wd, TRIGGER) == 0) + { + static const char *const list_ALTER_EVENT_TRIGGER[] = + {DISABLE, ENABLE, OWNER TO, RENAME TO, NULL}; + + COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER); + } + + /* ALTER EVENT TRIGGER name ENABLE */ + else if (pg_strcasecmp(prev5_wd, ALTER) == 0 + pg_strcasecmp(prev4_wd, EVENT) == 0 + pg_strcasecmp(prev3_wd, TRIGGER) == 0 + pg_strcasecmp(prev_wd, ENABLE) == 0) + { + static const char *const list_ALTER_EVENT_TRIGGER_ENABLE[] = + {REPLICA, ALWAYS, NULL}; + + COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER_ENABLE); + } + /* ALTER FOREIGN */ else if (pg_strcasecmp(prev2_wd, ALTER) == 0 pg_strcasecmp(prev_wd, FOREIGN) == 0) *** psql_completion(const char *text, int st *** 1318,1340 pg_strcasecmp(prev2_wd, TRIGGER) == 0)
Re: [HACKERS] Patch: add psql tab completion for event triggers
On 08/04/14 18:22, Ian Barwick wrote: As it was kind of annoying not to have this when playing around with event triggers. This also tightens up the existing tab completion for ALTER TRIGGER, which contained redundant code for table name completion, and which was also causing a spurious RENAME TO to be inserted in this context: CREATE EVENT TRIGGER foo ON {event} ^I Apologies, previous patch had some unrelated changes in it. Correct patch attached. -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c new file mode 100644 index 202ffce..7179642 *** a/src/bin/psql/tab-complete.c --- b/src/bin/psql/tab-complete.c *** static const SchemaQuery Query_for_list_ *** 714,721 FROM pg_catalog.pg_prepared_statements \ WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s' /* ! * This is a list of all things in Pgsql, which can show up after CREATE or * DROP; and there is also a query to get a list of them. */ --- 714,726 FROM pg_catalog.pg_prepared_statements \ WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s' + #define Query_for_list_of_event_triggers \ + SELECT pg_catalog.quote_ident(evtname) \ +FROM pg_catalog.pg_event_trigger \ + WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s' + /* ! * This is a list of all things in Postgres, which can show up after CREATE or * DROP; and there is also a query to get a list of them. 
*/ *** static const pgsql_thing_t words_after_c *** 746,751 --- 751,757 {DATABASE, Query_for_list_of_databases}, {DICTIONARY, Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW}, {DOMAIN, NULL, Query_for_list_of_domains}, + {EVENT TRIGGER, NULL, NULL}, {EXTENSION, Query_for_list_of_extensions}, {FOREIGN DATA WRAPPER, NULL, NULL}, {FOREIGN TABLE, NULL, NULL}, *** psql_completion(const char *text, int st *** 934,942 { static const char *const list_ALTER[] = {AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN, ! EXTENSION, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION, GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR, ! ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE, TABLESPACE, TEXT SEARCH, TRIGGER, TYPE, USER, USER MAPPING FOR, VIEW, NULL}; --- 940,948 { static const char *const list_ALTER[] = {AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN, ! EXTENSION, EVENT TRIGGER, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION, GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR, ! 
ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE, TABLESPACE, TEXT SEARCH, TRIGGER, TYPE, USER, USER MAPPING FOR, VIEW, NULL}; *** psql_completion(const char *text, int st *** 1013,1018 --- 1019,1055 COMPLETE_WITH_LIST(list_ALTEREXTENSION); } + /* ALTER EVENT TRIGGER */ + else if (pg_strcasecmp(prev3_wd, ALTER) == 0 + pg_strcasecmp(prev2_wd, EVENT) == 0 + pg_strcasecmp(prev_wd, TRIGGER) == 0) + { + COMPLETE_WITH_QUERY(Query_for_list_of_event_triggers); + } + + /* ALTER EVENT TRIGGER name */ + else if (pg_strcasecmp(prev4_wd, ALTER) == 0 + pg_strcasecmp(prev3_wd, EVENT) == 0 + pg_strcasecmp(prev2_wd, TRIGGER) == 0) + { + static const char *const list_ALTER_EVENT_TRIGGER[] = + {DISABLE, ENABLE, OWNER TO, RENAME TO, NULL}; + + COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER); + } + + /* ALTER EVENT TRIGGER name ENABLE */ + else if (pg_strcasecmp(prev5_wd, ALTER) == 0 + pg_strcasecmp(prev4_wd, EVENT) == 0 + pg_strcasecmp(prev3_wd, TRIGGER) == 0 + pg_strcasecmp(prev_wd, ENABLE) == 0) + { + static const char *const list_ALTER_EVENT_TRIGGER_ENABLE[] = + {REPLICA, ALWAYS, NULL}; + + COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER_ENABLE); + } + /* ALTER FOREIGN */ else if (pg_strcasecmp(prev2_wd, ALTER) == 0 pg_strcasecmp(prev_wd, FOREIGN) == 0) *** psql_completion(const char *text, int st *** 1318,1340 pg_strcasecmp(prev2_wd, TRIGGER) == 0) COMPLETE_WITH_CONST(ON); - else if (pg_strcasecmp(prev4_wd, ALTER) == 0 - pg_strcasecmp(prev3_wd, TRIGGER) == 0) - { - completion_info_charp = prev2_wd; - COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger); - } - /* ! * If we have ALTER TRIGGER sth ON, then add the correct tablename */ else if (pg_strcasecmp(prev4_wd, ALTER) == 0 pg_strcasecmp(prev3_wd, TRIGGER) == 0 pg_strcasecmp(prev_wd, ON) == 0) ! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL); /* ALTER TRIGGER name ON name */ ! else if (pg_strcasecmp(prev4_wd, TRIGGER) == 0 pg_strcasecmp(prev2_wd, ON) == 0) COMPLETE_WITH_CONST(RENAME TO); --- 1355,1374
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On 04/07/2014 11:35 PM, Peter Geoghegan wrote: Okay. Here is a worst-case, with the pgbench script the same as my original test-case, but with almost maximally unsympathetic data to sort: [local]/postgres=# update customers set firstname = 'padding-padding-padding-padding' || firstname; Hmm. I would expect the worst case to be where the strxfrm is not helping because all the entries have the same prefix, but the actual key is as short and cheap-to-compare as possible. So the padding should be as short as possible. Also, we have a fast path for pre-sorted input, which reduces the number of comparisons performed; that will make the strxfrm overhead more significant. I'm getting about a 2x slowdown on this test case: create table sorttest (t text); insert into sorttest select 'foobarfo' || (g) || repeat('a', 75) from generate_series(1, 3) g; explain analyze select * from sorttest order by t; Now, you can argue that that's acceptable because it's such a special case, but if we're looking for the worst case.. (BTW, IMHO it's way too late to do this for 9.4) - Heikki
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On 04/07/2014 09:19 PM, Peter Geoghegan wrote: The only case that this patch could possibly regress is where there are strings that differ beyond about the first 8 bytes, but are not identical (we chance a memcmp() == 0 before doing a full strcoll() when tie-breaking on the semi-reliable initial comparison). We still always avoid fmgr-overhead (and shim overhead, which I've measured), as in your original patch - you put that at adding 7% at the time, which is likely to make up for otherwise-regressed cases. There is nothing at all contrived about my test-case. Did I understand correctly that this patch actually does two things: 1. Avoid fmgr and shim overhead 2. Use strxfrm to produce a pseudo-leading key that's cheaper to compare. In that case, these changes need to be analyzed separately. You don't get to make up for the losses from the second part with the gains from the first part. We could commit just the first part (for 9.5!), and that has to be the baseline for the second part. This is very promising stuff, but it's not a slam-dunk win. I'd suggest adding these to the next commitfest, as two separate patches. - Heikki
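The pseudo-leading-key idea under discussion (item 2) can be modeled outside the backend. The sketch below is NOT the patch's actual SortSupport code: a truncated bytewise prefix stands in for a strxfrm() blob (whose whole point is that bytewise order matches locale order), and ties among equal prefixes fall back to a full comparison:

```python
# Illustrative model of sorting with a cheap abbreviated leading key,
# falling back to the full key only on ties. The 8-byte prefix below is
# a stand-in for strxfrm() output; it is not locale-aware.

ABBREV_LEN = 8  # size of the abbreviated key, mirroring the ~8 bytes discussed

def abbrev(s):
    """Cheap fixed-size stand-in for the transformed leading key."""
    return s.encode("utf-8")[:ABBREV_LEN]

def sort_with_abbrev(strings):
    # The tuple key compares the abbreviated key first; only strings whose
    # abbreviated keys are equal fall through to the full comparison.
    return sorted(strings, key=lambda s: (abbrev(s), s))

words = ["foobarfo2aaaa", "foobarfo1aaaa", "zebra", "apple"]
print(sort_with_abbrev(words))
# ['apple', 'foobarfo1aaaa', 'foobarfo2aaaa', 'zebra']
```

Heikki's worst case is visible even in this toy: the two 'foobarfo...' strings share their entire 8-byte prefix, so the abbreviated comparison is pure overhead and every such tie still pays for a full comparison.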
Re: [HACKERS] Why is it not sane to pass ExecStoreTuple(shouldFree=true) for tuples pointing into buffers
Andres Freund and...@2ndquadrant.com writes: On 2014-04-07 21:47:38 -0400, Tom Lane wrote: Well, that is certainly messy. I think you could just use a local HeapTupleData variable instead of palloc'ing every time, where "local" means "has lifespan similar to the slot pointer". There's some vaguely similar hacking near the end of ExecDelete. Yea, and some other places. I wonder if a ExecShallowMaterializeSlot() or something would be useful for me, that callsite and others? Don't like that name much, but I agree there's some room for a function like this. I guess you're imagining that we'd add a HeapTupleData field to TupleTableSlots, and use that for the workspace when this situation arises? An alternative possibility would be to not invent a new function, but just make ExecStoreTuple do this unconditionally when shouldFree=false. Not sure if there'd be a noticeable runtime penalty --- but the existing approach seems rather fragile. I know I've always thought of slots as being fully independent storage, and in this case they are not. regards, tom lane
Re: [HACKERS] Why is it not sane to pass ExecStoreTuple(shouldFree=true) for tuples pointing into buffers
On 2014-04-08 09:37:33 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-07 21:47:38 -0400, Tom Lane wrote: Well, that is certainly messy. I think you could just use a local HeapTupleData variable instead of palloc'ing every time, where "local" means "has lifespan similar to the slot pointer". There's some vaguely similar hacking near the end of ExecDelete. Yea, and some other places. I wonder if a ExecShallowMaterializeSlot() or something would be useful for me, that callsite and others? Don't like that name much, but I agree there's some room for a function like this. I am not the biggest fan either; for one it's really rather long, for another it's not really descriptive. One might think it's about toast or something. Do you have a better name? I guess you're imagining that we'd add a HeapTupleData field to TupleTableSlots, and use that for the workspace when this situation arises? I wasn't really sure about the approach yet. Either do something like you describe (possibly reusing/recoining tts_minhdr?), or just allocate a HeapTupleData struct. An alternative possibility would be to not invent a new function, but just make ExecStoreTuple do this unconditionally when shouldFree=false. Not sure if there'd be a noticeable runtime penalty. I think that's a viable alternative. I know I've always thought of slots as being fully independent storage, and in this case they are not. Me too. I was initially rather confused by the memory corruption I was seeing. I really thought that storing a tuple pointing into a buffer in the slot should just work. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] PostgreSQL Columnar Store for Analytic Workloads
Hi Stefan, On Tue, Apr 8, 2014 at 9:28 AM, Stefan Keller sfkel...@gmail.com wrote: Hi Hadi Do you think that cstore_fdw is also well suited for storing and retrieving linked data (RDF)? I am not very familiar with RDF. Note that cstore_fdw doesn't change the query language of PostgreSQL, so if your queries are expressible in SQL, they can be answered using cstore_fdw too. If your data is huge and doesn't fit in memory, then using cstore_fdw can be beneficial for you. Can you give some more information about your use case? For example, what are some of your queries? Do you have sample data? How much memory do you have? How large is the data? -- Hadi
Re: Fwd: [HACKERS] Proposal: variant of regclass
On Tue, Apr 8, 2014 at 3:01 AM, Yugo Nagata nag...@sraoss.co.jp wrote: On Mon, 7 Apr 2014 12:00:49 -0400 Robert Haas robertmh...@gmail.com wrote: In other words, let's revert the whole refactoring of this file to create reg*_guts functions, and instead just copy the relevant logic for the name lookups into the new functions. For to_regproc(), for example, it would look like this (untested): names = stringToQualifiedNameList(pro_name_or_oid); clist = FuncnameGetCandidates(names, -1, NIL, false, false, false); if (clist == NULL || clist->next != NULL) result = InvalidOid; else result = clist->oid; With that change, this patch will actually get a whole lot smaller, change less already-existing code, and deliver cleaner behavior. Here is an updated patch. I rewrote regproc.c not to use the _guts functions, and fixed to_reg* to not accept a numeric OID. I also updated the documentation and some comments. Is this better than the previous one? Looks good, committed with a bit of further cleanup. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WAL format and API changes (9.5)
On Thu, Apr 3, 2014 at 7:44 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I'd like to do some changes to the WAL format in 9.5. I want to annotate each WAL record with the blocks that they modify. Every WAL record already includes that information, but it's done in an ad hoc way, differently in every rmgr. The RelFileNode and block number are currently part of the WAL payload, and it's the REDO routine's responsibility to extract it. I want to include that information in a common format for every WAL record type. That makes life a lot easier for tools that are interested in knowing which blocks a WAL record modifies. One such tool is pg_rewind; it currently has to understand every WAL record the backend writes. There's also a tool out there called pg_readahead, which does prefetching of blocks accessed by WAL records, to speed up PITR. I don't think that tool has been actively maintained, but at least part of the reason for that is probably that it's a pain to maintain when it has to understand the details of every WAL record type. It'd also be nice for contrib/pg_xlogdump and backend code itself. The boilerplate code in all WAL redo routines, and writing WAL records, could be simplified. I think it will also be useful, if we want to implement table/tablespace PITR. That's for the normal cases. We'll need a couple of variants for also registering buffers that don't need full-page images, and perhaps also a function for registering a page that *always* needs a full-page image, regardless of the LSN. A few existing WAL record types just WAL-log the whole page, so those ad-hoc full-page images could be replaced with this. 
With these changes, a typical WAL insertion would look like this:

/* register the buffer with the WAL record, with ID 0 */
XLogRegisterBuffer(0, buf, true);

rdata[0].data = (char *) xlrec;
rdata[0].len = sizeof(BlahRecord);
rdata[0].buffer_id = -1; /* -1 means the data is always included */
rdata[0].next = &(rdata[1]);

rdata[1].data = (char *) mydata;
rdata[1].len = mydatalen;
rdata[1].buffer_id = 0; /* 0 here refers to the buffer registered above */
rdata[1].next = NULL;

...
recptr = XLogInsert(RM_BLAH_ID, xlinfo, rdata);
PageSetLSN(buf, recptr);

If we do register buffers (whether or not they require a full-page image) before calling XLogInsert(), will there be any impact on handling the case where we find out that we need to back up the buffer only after taking WALInsertLock, or will that part of the code remain the same as it is today?

Redo

There are four different states a block referenced by a typical WAL record can be in:

1. The old page does not exist at all (because the relation was truncated later)
2. The old page exists, but has an LSN higher than the current WAL record, so it doesn't need replaying.
3. The old page has an LSN lower than the current WAL record, so it needs to be replayed.
4. The WAL record contains a full-page image, which needs to be restored.

I think there might be a need for separate handling of some special cases like Init Page, which is used in a few operations (Insert/Update/multi-insert). Are there any criteria for deciding whether this needs to be a separate state or special handling per operation? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
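The four block states listed in the quoted proposal can be written down as a small classifier; the function and state names below are invented for illustration only and are not backend symbols:

```python
# Illustrative classifier for the four redo states of a block referenced
# by a WAL record. Names are invented; this is not backend code.

def classify_block(page_exists, page_lsn, record_lsn, has_fpi):
    """Decide what redo should do with one block reference."""
    if has_fpi:
        return "RESTORE_FPI"           # 4: record carries a full-page image
    if not page_exists:
        return "SKIP_TRUNCATED"        # 1: relation was truncated later
    if page_lsn >= record_lsn:
        return "SKIP_ALREADY_APPLIED"  # 2: page is already newer than record
    return "REPLAY"                    # 3: page is older; apply the change

print(classify_block(page_exists=True, page_lsn=5, record_lsn=10, has_fpi=False))
# REPLAY
```

Amit's Init Page question amounts to asking whether page-initializing records deserve a fifth return value here (the old page contents are irrelevant) or per-rmgr handling inside the REPLAY case.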
Re: Fwd: [HACKERS] Proposal: variant of regclass
Robert Haas robertmh...@gmail.com writes: Looks good, committed with a bit of further cleanup. I had not actually paid attention to the non-regclass parts of this, and now that I look, I've got to say that it seems borderline insane to have chosen to implement regproc/regoper rather than regprocedure/regoperator. The types implemented here are incapable of dealing with overloaded names, which --- particularly in the operator case --- makes them close to useless. I don't think this code was ready to commit. regards, tom lane
Re: [HACKERS] Proposal: COUNT(*) (and related) speedup
On Fri, Apr 4, 2014 at 1:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Joshua Yanovski pythones...@gmail.com writes: But worse, what happens if a count(*) is in progress? It might or might not have scanned this page already, and there's no way to get the right answer in both cases. Counter updates done by VACUUM would have a similar race-condition problem. I don't think the first part really matters. If the page was visible when COUNT(*) started and then got dirtied by some other transaction, any changes by the second transaction shouldn't alter the actual count in our transaction anyway, so whether we scan the page needlessly or not seems beside the point. The question is not whether you scan a page needlessly or not, it's whether you double-count the tuples on it. I don't think it's possible to be sure that when you add the central counter value into your local sum, you are neither double-counting nor omitting pages whose all-visible state changed while you were scanning the table. I think it would work if you also had information about which pages you needed to scan. For example, suppose you had a data structure sorta like the visibility map, except that instead of one bit per page you have one 16-bit integer per page. The integer is -1 if the page isn't known to be all-visible, and is otherwise the count of tuples on the page. Now, we can do a count(*) as follows: 1. Lock the first as-yet-unexamined page of the data structure. 2. Add all the values that aren't -1 to your running count of tuples, and remember the page numbers corresponding to the remaining entries. 3. Unlock the page you locked in step 1. 4. Visit each heap page you remembered in step 2 and add the number of tuples visible to your scan to your running count of tuples. 5. Provided there's a page of the data structure you haven't visited yet, go to step 1. On the write side, any operation that clears the visibility map bit must also set the entry for that page to -1. 
When the page is all-visible, the value can be set to the total number of tuples on the page. This is safe because, even if the page ceases to be all-visible after we add its tuples to the count, the new tuples that were added aren't visible to our scan, and any tuples deleted are still visible to our scan - because the transaction making the changes, in either case, is obviously still running, and therefore didn't commit before we took our snapshot. Now, this wouldn't be O(1) but it would presumably be quite fast if your visibility map bits are mostly set. One 8kB page of 16-bit counters would cover 32MB of heap. The bad news is that I am pretty sure there's no easy way for an index AM to get callbacks at the right time, so it's really unclear how something like this would fit in with our existing abstractions. And even if we answered that question, it would be a lot of work for a pretty narrow use case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Fwd: [HACKERS] Proposal: variant of regclass
On Tue, Apr 8, 2014 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Looks good, committed with a bit of further cleanup. I had not actually paid attention to the non-regclass parts of this, and now that I look, I've got to say that it seems borderline insane to have chosen to implement regproc/regoper rather than regprocedure/regoperator. The types implemented here are incapable of dealing with overloaded names, which --- particularly in the operator case --- makes them close to useless. I don't think this code was ready to commit. Well, I noticed that, too, but I didn't think it was my job to tell the patch author what functions he should have wanted. A follow-on patch to add to_regprocedure and to_regoperator wouldn't be much work, if you want that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Fwd: [HACKERS] Proposal: variant of regclass
On Tue, Apr 8, 2014 at 11:01 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 8, 2014 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Looks good, committed with a bit of further cleanup. I had not actually paid attention to the non-regclass parts of this, and now that I look, I've got to say that it seems borderline insane to have chosen to implement regproc/regoper rather than regprocedure/regoperator. The types implemented here are incapable of dealing with overloaded names, which --- particularly in the operator case --- makes them close to useless. I don't think this code was ready to commit. Well, I noticed that, too, but I didn't think it was my job to tell the patch author what functions he should have wanted. A follow-on patch to add to_regprocedure and to_regoperator wouldn't be much work, if you want that. And here is a patch for that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 33e093e..e742181 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15288,10 +15288,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); /indexterm indexterm +primaryto_regprocedure/primary + /indexterm + + indexterm primaryto_regoper/primary /indexterm indexterm +primaryto_regoperator/primary + /indexterm + + indexterm primaryto_regtype/primary /indexterm @@ -15476,11 +15484,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); entryget the oid of the named function/entry /row row + entryliteralfunctionto_regprocedure(parameterfunc_name/parameter)/function/literal/entry + entrytyperegprocedure/type/entry + entryget the oid of the named function/entry + /row + row entryliteralfunctionto_regoper(parameteroperator_name/parameter)/function/literal/entry entrytyperegoper/type/entry entryget the oid of the named operator/entry /row row + 
entryliteralfunctionto_regoperator(parameteroperator_name/parameter)/function/literal/entry + entrytyperegoperator/type/entry + entryget the oid of the named operator/entry + /row + row entryliteralfunctionto_regtype(parametertype_name/parameter)/function/literal/entry entrytyperegtype/type/entry entryget the oid of the named type/entry @@ -15652,10 +15670,12 @@ SELECT collation for ('foo' COLLATE de_DE); para The functionto_regclass/function, functionto_regproc/function, - functionto_regoper/function and functionto_regtype/function - translate relation, function, operator, and type names to objects of - type typeregclass/, typeregproc/, typeregoper/ and - typeregtype/, respectively. These functions differ from a cast from + functionto_regprocedure/function, functionto_regoper/function, + functionto_regoperator/function, and functionto_regtype/function + functions translate relation, function, operator, and type names to objects + of type typeregclass/, typeregproc/, typeregprocedure/type, + typeregoper/, typeregoperator/type, and typeregtype/, + respectively. These functions differ from a cast from text in that they don't accept a numeric OID, and that they return null rather than throwing an error if the name is not found (or, for functionto_regproc/function and functionto_regoper/function, if diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c index ed2bdbf..6210f45 100644 --- a/src/backend/utils/adt/regproc.c +++ b/src/backend/utils/adt/regproc.c @@ -323,6 +323,38 @@ regprocedurein(PG_FUNCTION_ARGS) } /* + * to_regprocedure - converts proname(args) to proc OID + * + * If the name is not found, we return NULL. 
+ */ +Datum +to_regprocedure(PG_FUNCTION_ARGS) +{ + char *pro_name = PG_GETARG_CSTRING(0); + List *names; + int nargs; + Oid argtypes[FUNC_MAX_ARGS]; + FuncCandidateList clist; + + /* + * Parse the name and arguments, look up potential matches in the current + * namespace search list, and scan to see which one exactly matches the + * given argument types. (There will not be more than one match.) + */ + parseNameAndArgTypes(pro_name, false, &names, &nargs, argtypes); + + clist = FuncnameGetCandidates(names, nargs, NIL, false, false, true); + + for (; clist; clist = clist->next) + { + if (memcmp(clist->args, argtypes, nargs * sizeof(Oid)) == 0) + PG_RETURN_OID(clist->oid); + } + + PG_RETURN_NULL(); +} + +/* * format_procedure - converts proc OID to pro_name(args) * * This exports the useful functionality of regprocedureout for use @@ -722,6 +754,45 @@ regoperatorin(PG_FUNCTION_ARGS) } /* + * to_regoperator - converts oprname(args) to operator OID + * + * If the name is not found, we return NULL. + */ +Datum +to_regoperator(PG_FUNCTION_ARGS) +{ + char *opr_name_or_oid =
Re: [HACKERS] four minor proposals for 9.5
2014-04-08 6:27 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: On Mon, Apr 7, 2014 at 12:16 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-04-04 6:51 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: On Tue, Apr 1, 2014 at 11:42 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-03-27 17:56 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: So I'll prepare some prototypes next month for 1. logging execution time for cancelled queries, 2. tracking query lock time. Yes. Initially I thought only about cancelled queries, but now I am thinking a wider solution could be better. Sometimes long queries are stopped by Postgres, or by the system, and info about their duration would be just as useful. Right. One more thing: currently we can also find that in a crude way (pg_stat_activity has the query_start time and log_line_prefix has the end time), but having it in one place in the log will be more useful. ?? I just wanted to say that if someone wants to calculate the duration of a cancelled query (or another error'd query), you can do that by checking the start time from pg_stat_activity and the end time from the log (using log_line_prefix), but this is of course cumbersome. I would use the same technique for printing lock time - it can be printed via the symbol %L. Above you have mentioned that you are planning to have three different lock times (table, tuple, and others), so will this one symbol (%L) enable all three lock times? My idea is to start with %L as total lock time, which is useful for a wide group of users, and next (or at the same time) we can enhance it with two-character prefix symbols. So do you want to print lock time just for error'd statements? Won't it be better to do it for non-error'd statements as well - or rather, I feel it can be more useful for non-error statements. Do we already have some easy way to get wait time for non-error statements? There are two points: a) we have no infrastructure for attaching specific info to a query other than log_line_prefix.
And I have no idea how to implement this better, and I don't think any new infrastructure (mechanism) is necessary. log_line_prefix increases log size, but it is very parseable - Splunk and similar software have no problem with it. b) lock time can be interesting for error'd statements too - for example, you can cancel locked queries - so you would want to see the lock time and duration for cancelled queries. So this implementation can be sensible too. So: %L .. total lock time, %Lr .. relation lock time, %Lt .. tuple lock time, %Lo .. other lock time, where L = Lr + Lt + Lo. Are you planning to add new logs for logging this, or planning to use the existing infrastructure? I don't have a prototype yet, so I don't know what will be necessary. Okay, I think then it's better to discuss this after your initial analysis/prototype, but I think you might need to add some infrastructure code to make it possible: locking database objects (relations, tuples, ...) and locking others (buffers, ...) have different locking strategies, so to get the total wait time for a statement due to the different kinds of locks you need to accumulate the different wait times. Yes, we can wait until the prototype is ready. One general point: won't it be a bit difficult to parse a log line carrying so many times? Should we consider logging with a special marker for each time (for example: tup_lock_wait_time:1000ms)? We should optimize log_line_prefix processing instead. The proposed options are interesting for enterprise use, when you have smarter tools for log entry processing and need a complex view of the performance of billions of queries - where cancel time and lock time are important pieces in the mosaic of a server's fitness. Exactly - though this might not be directly related to this patch, having it would be useful. I am not afraid about the impact on performance (surely, it should be tested first). My previous implementation at GoodData was based on an actively used mechanism - it doesn't introduce any new overhead.
But it should be verified on a prototype. Regards, Pavel With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Dynamic Shared Memory stuff
On Fri, Apr 4, 2014 at 10:01 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 22, 2014 at 10:17 AM, Noah Misch n...@leadboat.com wrote: Yeah, abandoning the state file is looking attractive. Here's a draft patch getting rid of the state file. This should address concerns raised by Heikki and Fujii Masao and echoed by Tom that dynamic shared memory behaves differently than the main shared memory segment. The control segment ID is stored in the System V shared memory block, so we're guaranteed that when using either System V or POSIX shared memory we'll always latch onto the control segment that matches up with the main shared memory segment we latched on to. Cleanup for the file-mmap and Windows methods doesn't need to change, because the former always just means clearing out $PGDATA/pg_dynshmem, and the latter is done automatically by the operating system. Comments? Apparently not. However, I'm fairly sure this is a step toward addressing the complaints previously raised, even if there may be some details people still want changed, so I've gone ahead and committed it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doc typo in 9.28. Event Trigger Functions
On Tue, Apr 8, 2014 at 2:39 AM, Ian Barwick i...@2ndquadrant.com wrote: Just a single missing 's'. Thanks, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CREATE FOREIGN TABLE ( ... LIKE ... )
On Mon, Apr 7, 2014 at 4:24 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-05 11:46:16 -0400, Tom Lane wrote: ISTM this is because the proposed feature is wrongheaded. The basic concept of CREATE TABLE LIKE is that you're copying properties from another object of the same type. You might or might not want every property, but there's no question of whether you *could* copy every property. In contrast, what this is proposing to do is copy properties from (what might be) a plain table to a foreign table, and those things aren't even remotely the same kind of object. It would make sense to me to restrict LIKE to copy from another foreign table, and then there would be a different set of INCLUDING/EXCLUDING options that would be relevant (options yes, indexes no, for example). I actually think it's quite useful to create a foreign table that's the same shape as a local table. And the patches approach of refusing to copy thinks that aren't supported sounds sane to me. Consider e.g. moving off older partitioned data off to an archiving server. New local partitions are often created using CREATE TABLE LIKE, but that's not possible for the foreign ones. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pending 9.4 patches
On 4/7/14 2:59 AM, Craig Ringer wrote: On 04/05/2014 03:57 AM, Andres Freund wrote: c07) Updatable security barrier views. This needs a serious look by a committer. I've been exercising it via row security and it's been looking pretty solid. It isn't a huge or intrusive patch, and it's seen several rounds of discussion during its development and refinement. (Thanks Dean). Same here, nothing but good feedback from testing. The updatable security barrier views patch has been sitting in Ready For Committer since late January. Unfortunately, I just learned that some of the people who might commit in this area--Stephen Frost for example--thought there were still major outstanding issues with that part. I (and I think Craig too) have been waiting for that to be picked up by a committer, Stephen was waiting for me or Craig to fix unrelated bugs, and that's where the CF process has been deadlocked on this one. A core bugfix with locking in security barrier views is required before the regression tests can be fixed up properly, for one thing. Tom also expressed concerns about how plan invalidation works, though it's not yet clear whether that was just miscommunication about how it works on my part or whether there's a concrete problem there. This is similarly stuck. I'm not out of resources, there's just nothing I can do here myself. For this to move forward, a committer needs to pick up the security barrier views part. We also need a bug fix for the issue that's breaking the regression tests. Once all that's done, RLS on top of updatable security barrier views might be committable. But there's no way to tell for sure until those other two bits are sorted out. I'd really love to finish this off for 9.4, but other projects have to come first. I have no other projects ahead of this for the remainder of this month. I just can't figure out what to do next until there's a committer (or committers, if someone else is going to take on the locking bug) identified.
I looked at the locking problem enough to know that one is over my head. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Apr 7, 2014 at 7:32 PM, Peter Geoghegan p...@heroku.com wrote: I think that Greg's choice of words was a little imprudent, but must be viewed in the context of an offline discussion during the hall track of pgConf NYC. Clearly Greg wasn't about to go off and unilaterally commit this. FWIW, I think I put him off the idea a few hours after he made those remarks, without intending for what I'd said to have that specific effect (or the opposite effect). It was somewhere between your two interpretations. I intend to review everything I can from the commitfest and then this patch and if this patch is ready for commit before feature freeze I was saying I would go ahead and commit it. That would only happen if there was a pretty solid consensus that the my review was good and the patch was good of course. The point of the commit fest is to ensure that all patches get attention. That's why I would only look at this after I've reviewed anything else from the commitfest that I feel up to reviewing. But if I have indeed done so there's no point in not taking other patches as well up to feature freeze. I don't have any intention of lowering our review standards of course. So let's table this discussion until the hypothetical case of me doing lots of reviews *and* reviewing this patch *and* that review being positive and decisive enough to commit after one review cycle. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pending 9.4 patches
On Tue, Apr 8, 2014 at 11:59 AM, Gregory Smith gregsmithpg...@gmail.com wrote: I have no other projects ahead of this for the remainder of this month. I just can't figure out what to do next until there's a committer (or committers, if someone else is going to take on the locking bug) identified. I looked at the locking problem enough to know that one is over my head. What post or thread should I read for details about this locking bug? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Problem with displaying wide tables in psql
On Sat, Feb 15, 2014 at 11:08 AM, Emre Hasegeli e...@hasegeli.com wrote: This is my review of the third version of the patch. It is a useful improvement in my opinion. It worked well in my environment. I'm reviewing this patch. One thing to comment: With no doc changes and no regression tests I was halfway inclined to just reject it out of hand. To be fair there were no regression tests for wrapped output prior to the patch but still I would have wanted to see them added. We often pare down regression tests when committing patches but it's easier to pare them down than write new ones and it helps show the author's intention. In this case I'm inclined to expand the regression tests. We've had bugs in these formatting functions before and at least I find it hard to touch code like this with any confidence that I'm not breaking things. Formatting code ends up being pretty spaghetti-like easily and there are lots of cases so it's easy to unintentionally break cases you didn't realize you were touching. In addition there are several cases of logic that looks like this: if (x) .. else { if (y) ... else ... } I know there are other opinions on this but I find this logic very difficult to follow. It's almost always clearer to refactor the branches into a flat if / else if / else if /.../ else form. Even if it results in some duplication of code (typically there's some trivial bit of code outside the second if) it's easy to quickly see whether all the cases are handled and understand whether any of the cases have forgotten any steps. -- greg
Re: [HACKERS] Problem with displaying wide tables in psql
On 2014-04-08 12:15:47 -0400, Greg Stark wrote: With no doc changes and no regression tests I was halfway inclined to just reject it out of hand. To be fair there were no regression tests for wrapped output prior to the patch but still I would have wanted to see them added. We often pare down regression tests when committing patches but it's easier to pare them down than write new ones and it helps show the author's intention. I don't think this is easily testable that way - doesn't it rely on determining the width of the terminal? Which you won't have when started from pg_regress? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Minor improvements in alter_table.sgml
On Tue, Apr 8, 2014 at 5:05 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Attached is a patch to improve the manual page for the ALTER TABLE command. Do we really need to add a section for type_name when we already have a section for OF type_name? constraint_name is also used for adding a constraint using an index. So it could not only be a constraint to alter, validate, or drop, but also a new constraint name to be added. Honestly, how much value is there in even having a section for this? Do we really want to document constraint_name as name of an existing constraint, or the name of a new constraint to be added? It would be accurate, then, but it also doesn't really tell you anything you didn't know already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Problem with displaying wide tables in psql
On Tue, Apr 8, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think this is easily testable that way - doesn't it rely on determining the width of the terminal? Which you won't have when started from pg_regress? There's a pset variable to set the target width so at least the formatting code can be tested. It would be nice to have the ioctl at least get called on the regression farm so we're sure we aren't doing something unportable. -- greg
Re: [HACKERS] four minor proposals for 9.5
On 4/6/14 2:46 PM, Pavel Stehule wrote: Proposed options are interesting for enterprise using, when you have a some more smart tools for log entry processing, and when you need a complex view about performance of billions queries - when cancel time and lock time is important piece in mosaic of server' fitness. I once sent a design proposal over for something I called Performance Events that included this. It will be difficult to get everything people want to track into log_line_prefix macro form. You're right that this particular one can probably be pushed into there, but you're adding four macros just for this feature. And that's only a fraction of what people expect from database per-query performance metrics. The problem I got stuck on with the performance event project was where to store the data collected. If you want to keep up with read rates, you can't use the existing log infrastructure. It has to be something faster, lighter. I wanted to push the data into shared memory somewhere instead. Then some sort of logging consumer could drain that queue and persist it to disk. Since then, we've had a number of advances, particularly these two: -Dynamic shared memory allocation. -Query log data from pg_stat_statements can persist. With those important fundamentals available, I'm wandering around right now trying to get development resources to pick the whole event logging idea up again. The hardest parts of the infrastructure I was stuck on in the past are in the code today. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] four minor proposals for 9.5
2014-04-08 18:34 GMT+02:00 Gregory Smith gregsmithpg...@gmail.com: On 4/6/14 2:46 PM, Pavel Stehule wrote: The proposed options are interesting for enterprise use, when you have smarter tools for log entry processing and need a complex view of the performance of billions of queries - where cancel time and lock time are important pieces in the mosaic of a server's fitness. I once sent a design proposal over for something I called Performance Events that included this. It will be difficult to get everything people want to track into log_line_prefix macro form. You're right that this particular one can probably be pushed into there, but you're adding four macros just for this feature. And that's only a fraction of what people expect from database per-query performance metrics. The problem I got stuck on with the performance event project was where to store the data collected. If you want to keep up with read rates, you can't use the existing log infrastructure. It has to be something faster, lighter. I wanted to push the data into shared memory somewhere instead. Then some sort of logging consumer could drain that queue and persist it to disk. Since then, we've had a number of advances, particularly these two: -Dynamic shared memory allocation. -Query log data from pg_stat_statements can persist. I know nothing about your proposal, so I cannot talk about it. But I am sure that any memory-based solution is not practical for us. It can work well for cumulative values (per database), but we need two views - individual (per query) and cumulative (per database, per database server). We process billions of queries per day, and for us it is more practical to use external log-processing tools. But I understand that for a large group of users a memory-based solution can be perfect, and I think these designs should coexist - we log slow queries (with their plans) and we use pg_stat_statements. And users can choose the best method for their environment.
Probably some API (some data) can be shared by both designs. Regards Pavel With those important fundamentals available, I'm wandering around right now trying to get development resources to pick the whole event logging idea up again. The hardest parts of the infrastructure I was stuck on in the past are in the code today. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Apr 8, 2014 at 3:12 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: 1. Avoid fmgr and shim overhead 2. Use strxfrm to produce a pseudo-leading key that's cheaper to compare. In that case, these changes need to be analyzed separately. You don't get to make up for the losses by the second part by the gains from the first part. We could commit just the first part (for 9.5!), and that has to be the baseline for the second part. Yes, that's right. Robert already submitted a patch that only did 1) almost 2 years ago. That should have been committed at the time, but wasn't. At the time, the improvement was put at about 7% by Robert. It would be odd to submit the same patch that Robert withdrew already. Why shouldn't 2) be credited with the same benefits as 1)? It's not as if the fact that the strxfrm() trick uses SortSupport is a contrivance. I cannot reasonably submit the two separately, unless the second is a cumulative patch. By far the largest improvements come from 2), while 1) doesn't regress anything. -- Peter Geoghegan
[HACKERS] Buffer Allocation Concurrency Limits
In December, Metin (a coworker of mine) discussed an inability to scale a simple task (parallel scans of many independent tables) to many cores (it’s here). As a ramp-up task at Citus I was tasked to figure out what the heck was going on here. I have a pretty extensive writeup here (whose length is more a result of my inexperience with the workings of PostgreSQL than anything else) and was looking for some feedback. In short, my conclusion is that a working set larger than memory results in backends piling up on BufFreelistLock. As much as possible I removed anything that could be blamed for this:

- Hyper-Threading is disabled
- zone reclaim mode is disabled
- numactl was used to ensure interleaved allocation
- kernel.sched_migration_cost was set high to disable migration
- kernel.sched_autogroup_enabled was disabled
- transparent hugepage support was disabled

For a way forward, I was thinking the buffer allocation sections could use some of the atomics Andres added here. Rather than workers grabbing BufFreelistLock to iterate the clock hand until they find a victim, the algorithm could be rewritten in a lock-free style, allowing workers to move the clock hand in tandem. Alternatively, the clock iteration could be moved off to a background process, similar to what Amit Kapila proposed here. Is this assessment accurate? I know 9.4 changes a lot about lock organization, but last I looked I didn’t see anything that could alleviate this contention: are there any plans to address this? —Jason
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On 04/08/2014 08:02 PM, Peter Geoghegan wrote: On Tue, Apr 8, 2014 at 3:12 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: 1. Avoid fmgr and shim overhead 2. Use strxfrm to produce a pseudo-leading key that's cheaper to compare. In that case, these changes need to be analyzed separately. You don't get to make up for the losses by the second part by the gains from the first part. We could commit just the first part (for 9.5!), and that has to be the baseline for the second part. Yes, that's right. Robert already submitted a patch that only did 1) almost 2 years ago. That should have been committed at the time, but wasn't. At the time, the improvement was put at about 7% by Robert. It would be odd to submit the same patch that Robert withdrew already. Why shouldn't 2) be credited with the same benefits as 1) ? It's not as if the fact that the strxfrm() trick uses SortSupport is a contrivance. I cannot reasonably submit the two separately, unless the second in a cumulative patch. Sure, submit the second as a cumulative patch. By far the largest improvements come from 2), while 1) doesn't regress anything. Right. But 1) is the baseline we need to evaluate 2) against. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pending 9.4 patches
* Robert Haas (robertmh...@gmail.com) wrote: On Tue, Apr 8, 2014 at 11:59 AM, Gregory Smith gregsmithpg...@gmail.com wrote: I have no other projects ahead of this for the remainder of this month. I just can't figure out what to do next until there's a committer (or committers, if someone else is going to take on the locking bug) identified. I looked at the locking problem enough to know that one is over my head. What post or thread should I read for details about this locking bug? I think it was discussed a couple times, but it's here: http://www.postgresql.org/message-id/531d3355.6010...@2ndquadrant.com Thanks, Stephen
Re: [HACKERS] Pending 9.4 patches
* Tom Lane (t...@sss.pgh.pa.us) wrote: Craig Ringer cr...@2ndquadrant.com writes: On 04/05/2014 03:57 AM, Andres Freund wrote: r04) Row-security based on Updatable security barrier views This one's fate seems to be hard to judge without c07. Open issues remain with this patch, and resources for working on it in 9.4 have run out. It is not ready for commit. A core bugfix with locking in security barrier views is required before the regression tests can be fixed up properly, for one thing. Tom also expressed concerns about how plan invalidation works, though it's not yet clear whether that was just miscommunication about how it works on my part or whether there's a concrete problem there. I'd really love to finish this off for 9.4, but other projects have to come first. Given that, I think we should go ahead and mark this one Returned With Feedback. It's past time to be punting anything that doesn't have a serious chance of getting committed for 9.4. I'm a bit confused on this point- is the only issue the *preexisting* bug with security barrier views? I agree we need to fix that, but I'd really like to see that fixed and backpatched to address the risk in back-branches. I had understood there to be *other* issues with this, which is why I hadn't spent time on it. Craig, in general, I'd argue that a pre-existing bug isn't a reason that a patch isn't ready for commit. The bug may need to be fixed before the patch goes in, but saying a patch isn't ready implied, to me at least, issues with the *patch*, which it sounds like isn't the case here. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Autonomous Transaction (WIP)
On Mon, Apr 7, 2014 at 12:06 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote: Deadlock Detection: It is possible that the main or upper autonomous transaction has taken a lock on some resource which might be required by a lower autonomous transaction. If that happens, deadlock will occur. So in order to solve this issue, each main and autonomous transaction will hold a list of all locks acquired in PROCLOCK, based on which deadlock will be resolved. I'm not sure how this would work out internally -- it would depend on how you plan to allocate the new transaction in the internal data structures -- but the natural way to prevent/detect deadlocks would be to have the parent transaction immediately take a lock on the autonomous transaction as soon as it's started. That would cause any lock in the autonomous transaction which caused it to wait on the parent transaction to be detected as a deadlock. It would also cause any monitoring tool to correctly show the parent transaction as waiting on the autonomous transaction to finish. If the autonomous transaction is actually a separate procarray entry (which I suspect it would have to be, much like prepared transactions and the dblink connections which are commonly used to kludge autonomous transactions) then this should be fairly painless. If you implement some kind of saving and restoring of procarray data then it probably wouldn't work out. -- greg
Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0
On Sun, Apr 6, 2014 at 11:45:59AM +0530, Amit Kapila wrote: On Tue, Apr 1, 2014 at 6:31 AM, Bruce Momjian br...@momjian.us wrote: I reviewed this patch and you are correct that we are not handling socket() and accept() returns properly on Windows. We were doing it properly in most places in the backend, but your patch fixes the remaining places: http://msdn.microsoft.com/en-us/library/windows/desktop/ms740516%28v=vs.85%29.aspx However, libpq doesn't seem to be doing much to handle Windows properly in this area. I have adjusted libpq to map socket to -1, but the proper fix is to distribute pgsocket and PGINVALID_SOCKET checks throughout the libpq code. I am not sure how to handle PQsocket() --- should it still return -1? I think changing PQsocket() can impact all existing applications, as it is mentioned in the docs that a result of -1 indicates that no server connection is currently open. Do you see any compelling need to change the return value of PQsocket() after your patch? No, I do not. In fact, the SSL_get_fd() call in secure_read() returns a signed integer too, and that is passed around like a socket, so in fact the SSL API doesn't even allow us to get an unsigned value on Windows in all cases. Having the return value be conditional on the operating system is ugly. How much of this should be backpatched? I think it's okay to back patch all the changes. Is there any part of the patch which you feel is risky to back patch? Well, we would not backpatch this if it is just a stylistic fix, and I am starting to think it is just a style issue. 
This MSDN website says -1, SOCKET_ERROR, and INVALID_SOCKET are very similar: http://msdn.microsoft.com/en-us/library/windows/desktop/cc507522%28v=vs.85%29.aspx and this Stackoverflow thread says the same: http://stackoverflow.com/questions/10817252/why-is-invalid-socket-defined-as-0-in-winsock2-h-c In fact, this C program compiled by gcc on Debian issues no compiler warnings and prints 'hello', showing that -1 and ~0 compare as equal:

    #include <stdio.h>

    int
    main(int argc, char **argv)
    {
        int          i;
        unsigned int j;

        i = -1;
        j = ~0;
        if (i == j)
            printf("hello\n");
        return 0;
    }

meaning our incorrect syntax is computed correctly. Why aren't we getting warnings on Windows about assigning the socket() return value to an integer? I think by default Windows doesn't give a warning for such code even at warning level 4. I have found one related link: http://stackoverflow.com/questions/75385/make-vs-compiler-catch-signed-unsigned-assignments Updated patch attached. It seems you have missed changing the places below. 1. int pg_foreach_ifaddr(PgIfAddrCallback callback, void *cb_data) sock = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0); if (sock == SOCKET_ERROR) Well, the actual problem here is that WSASocket() returns INVALID_SOCKET per the documentation, not SOCKET_ERROR. I did not use PGINVALID_SOCKET here because this is Windows-specific code, defining 'sock' as SOCKET. We could have sock defined as pgsocket, but because this is Windows code already, it doesn't seem wise to mix portability code in there. 2. pgwin32_waitforsinglesocket(SOCKET s, int what, int timeout) { static HANDLE waitevent = INVALID_HANDLE_VALUE; static SOCKET current_socket = -1; Yes, that -1 is wrong and I have changed it to INVALID_SOCKET, again using the same rules that say PGINVALID_SOCKET doesn't make sense here as it is Windows-specific code. I am attaching an updated patch, which explains the PQsocket() return value issue, and fixes the items listed above. 
I am inclined to apply this just to head for correctness, and modify libpq to use pgsocket consistently in a follow-up patch. This is not like the readdir() fix we had to backpatch, because that was clearly not catching errors. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

    diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
    new file mode 100644
    index d062c1d..8fa9aa7
    *** a/src/backend/libpq/auth.c
    --- b/src/backend/libpq/auth.c
    *** ident_inet(hbaPort *port)
    *** 1463,1469 ****
          sock_fd = socket(ident_serv->ai_family, ident_serv->ai_socktype,
                           ident_serv->ai_protocol);
    !     if (sock_fd < 0)
          {
              ereport(LOG,
                      (errcode_for_socket_access(),
    --- 1463,1469 ----
          sock_fd = socket(ident_serv->ai_family, ident_serv->ai_socktype,
                           ident_serv->ai_protocol);
    !     if (sock_fd == PGINVALID_SOCKET)
          {
              ereport(LOG,
                      (errcode_for_socket_access(),
    *** ident_inet(hbaPort *port)
    *** 1543,1549 ****
                              ident_response)));
      ident_inet_done:
    !     if (sock_fd >= 0)
              closesocket(sock_fd);
          pg_freeaddrinfo_all(remote_addr.addr.ss_family, ident_serv);
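The convention the patch applies can be summarized in a small sketch (the Unix branch matches how I understand PostgreSQL's port header to define these names; treat the exact definitions, and especially the Windows branch, as assumptions rather than the authoritative source):

```c
/*
 * On Unix, sockets are small non-negative ints and -1 signals failure.
 * On Windows, SOCKET is an unsigned type and failure is INVALID_SOCKET
 * (~0), so a signed "s < 0" test can never be true.  A shared typedef
 * and sentinel keep the checks correct on both platforms.
 */
#ifdef WIN32
#include <winsock2.h>
typedef SOCKET pgsocket;
#define PGINVALID_SOCKET INVALID_SOCKET
#else
typedef int pgsocket;
#define PGINVALID_SOCKET (-1)
#endif

/*
 * The portable validity test: compare against the sentinel, never
 * against 0 and never with a signed "< 0" check.
 */
static int
pgsocket_is_valid(pgsocket s)
{
    return s != PGINVALID_SOCKET;
}
```

This is why `if (sock_fd < 0)` becomes `if (sock_fd == PGINVALID_SOCKET)` in the diff above on the backend side, while purely Windows-specific files keep using INVALID_SOCKET directly.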
Re: [HACKERS] ipc_test
On Mon, Apr 7, 2014 at 10:43 AM, Robert Haas robertmh...@gmail.com wrote: OK, done. One less thing to worry about when committing! Also one less thing to cause headaches with etags and similar tools. It always drove me nuts that I was constantly being sent to ipc_test files for various typedefs. Thanks! -- greg
Re: [HACKERS] Autonomous Transaction (WIP)
Greg Stark wrote: If the autonomous transaction is actually a separate procarray entry (which I suspect it would have to be, much like prepared transactions and the dblink connections which are commonly used to kludge autonomous transactions) then this should be fairly painless. If you implement some kind of saving and restoring procarray data then it probably wouldn't work out. I don't have time to digest this proposal ATM, but on previous occasions when we have discussed autonomous transactions (ATs), we have always considered it natural that they have their own procarray entries; there are too many strange issues otherwise. Since the number of procarray entries is fixed at startup time, one natural consequence of this is that the number of ATs in flight at any moment is also fixed. Normally we consider allocating a single AT per session to be sufficient. So you can't have one AT start another AT, for instance -- that seems a reasonable restriction. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] ipc_test
Greg Stark wrote: On Mon, Apr 7, 2014 at 10:43 AM, Robert Haas robertmh...@gmail.com wrote: OK, done. One less thing to worry about when committing! Also one less thing to cause headaches with etags and similar tools. It always drove me nuts that I was constantly being sent to ipc_test files for various typedefs. Thanks! Yeah, my thoughts exactly. Thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached
On Mon, Apr 7, 2014 at 8:29 PM, Michael Paquier michael.paqu...@gmail.com wrote: Documentation of jsonb tells that jsonb documents should be kept at a reasonable size to reduce lock contention, but there is no mention of size limitation for indexes: http://www.postgresql.org/docs/devel/static/datatype-json.html It seems like your complaint is that this warrants special consideration in the jsonb docs, due to this general limitation being particularly likely to affect jsonb users. Is that accurate? The structure of the JSON in your test case is quite atypical, since there is one huge string containing each of the translations, rather than a bunch of individual array elements (one per translation) or a bunch of object pairs. As it happens, just this morning I read that MongoDB's new version 2.6 now comes with stricter enforcement of key length: http://docs.mongodb.org/master/release-notes/2.6-compatibility/#index-key-length-incompatibility . While previous versions just silently failed to index values that were inserted, there is now a 1024-byte limit imposed on the total size of indexed values. -- Peter Geoghegan
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Apr 8, 2014 at 10:10 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Right. But 1) is the baseline we need to evaluate 2) against. I don't agree with that. Surely we're concerned with not regressing cases that people actually care about, which in practical terms means the changes of a single release. While I guess I'm fine with structuring the patch like that, I don't think it's fair that the strxfrm() stuff doesn't get credit for not regressing those cases so badly just because they're only ameliorated by the fmgr-eliding stuff. While I'm concerned about worst case performance myself, I don't want to worry about Machiavelli rather than Murphy. What collation did you use for your test-case? The fmgr-eliding stuff is only really valuable in that it ameliorates the not-so-bad regressions, and is integral to the strxfrm() stuff. Let's not lose sight of the fact that (if we take TPC style benchmarks as representative) the majority of text sorts can be made at least 3 times faster. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Apr 8, 2014 at 3:10 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Apr 8, 2014 at 10:10 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Right. But 1) is the baseline we need to evaluate 2) against. I don't agree with that. Surely we're concerned with not regressing cases that people actually care about, which in practical terms means the changes of a single release. No, we're concerned about ending up with the best possible performance. That could mean applying the fmgr-elision but not the other part. Whether the other part is beneficial is based on how it compares to the performance post-fmgr-elision. As an oversimplified example, suppose someone were to propose two patches, one that makes PostgreSQL ten times as fast and the other of which slows it down by a factor of five. If someone merges those two things into a single combined patch, we would surely be foolish to apply the whole thing. The right thing would be to separate them out and apply only the first one. Every change has to stand on its own two feet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous Transaction (WIP)
On Tue, Apr 8, 2014 at 2:43 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Greg Stark wrote: If the autonomous transaction is actually a separate procarray entry (which I suspect it would have to be, much like prepared transactions and the dblink connections which are commonly used to kludge autonomous transactions) then this should be fairly painless. If you implement some kind of saving and restoring procarray data then it probably wouldn't work out. I don't have time to digest this proposal ATM, but in previous occasion when we have discussed autonomous transactions (ATs), we have always considered natural that they have their own procarray entries; there are too many strange issues otherwise. Since the number of procarray entries is fixed at startup time, one natural consequence of this is that the number of ATs in flight at any moment is also fixed. Normally we consider allocating a single AT per session to be sufficient. So you can't have one AT start another AT, for instance -- that seems a reasonable restriction. It depends. A lot of Oracle users are used to having autonomous transactions be very cheap, so you can just mark random procedures as running in an autonomous transaction and forget about it. If the call stack is several levels deep, then you could easily have one such procedure call another such procedure. Of course, you may feel that's bad practice or that we shouldn't emulate what $COMPETITOR does, and I agree we don't have to necessarily do it that way just because they do it that way, but I'm not sure it's accurate to say that nobody will care. I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that. Second, why do other backends really need to know about our ATs? 
As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine. Maybe the backend needs to internally frob visibility rules, but that's not a matter for shared memory. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous Transaction (WIP)
On 2014-04-08 15:39:18 -0400, Robert Haas wrote: I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that. Second, why do other backends really need to know about our ATs? As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine. Maybe the backend needs to internally frob visibility rules, but that's not a matter for shared memory. Agreed. That's also how I imagined things to work. I think except the visibility semantics, there's really not that much to do if we were to reuse the subtransaction framework. There's some complications with Hot Standby, but I think those can be solved. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST support for inet datatypes
I wrote: [ inet-gist-v6.patch ] Committed with some additional documentation work. Thanks for submitting this! regards, tom lane
Re: [HACKERS] GiST support for inet datatypes
On Tue, Apr 8, 2014 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: [ inet-gist-v6.patch ] Committed with some additional documentation work. Thanks for submitting this! NICE. I'd like to tell you how excited I am about this part: # It also handles a new network # operator inet && inet (overlaps, a/k/a is supernet or subnet of), # which is expected to be useful in exclusion constraints. ...but I can't, because my mouth is too full of drool. Wouldn't you really want that more for cidr than for inet, though? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GiST support for inet datatypes
Robert Haas robertmh...@gmail.com writes: NICE. I'd like to tell you how excited I am about this part: # It also handles a new network # operator inet && inet (overlaps, a/k/a is supernet or subnet of), # which is expected to be useful in exclusion constraints. ...but I can't, because my mouth is too full of drool. Wouldn't you really want that more for cidr than for inet, though? Probably, but it works with either since they're binary-compatible. regards, tom lane
Re: [HACKERS] Autonomous Transaction (WIP)
Robert Haas robertmh...@gmail.com writes: I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that. Second, why do other backends really need to know about our ATs? As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine. If we can make it work like that, sure. I'm a bit worried about how you'd decouple a subtransaction and commit it atomically ... or if that's not atomic, will it create any problems? The point being that you need to change both pg_subtrans and pg_clog to make that state transition. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous Transaction (WIP)
On 2014-04-08 16:13:21 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that. Second, why do other backends really need to know about our ATs? As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine. If we can make it work like that, sure. I'm a bit worried about how you'd decouple a subtransaction and commit it atomically ... or if that's not atomic, will it create any problems? The point being that you need to change both pg_subtrans and pg_clog to make that state transition. I think it can be made work sensibly - while those states are changed it will still appear to be running via the procarray. There's some fun around suboverflowed entries, but I think that can be handled by reserving an entry for autonomous transactions. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Call for GIST/GIN/SP-GIST opclass documentation
I just created sections in the SGML manual chapters about GIST, GIN, and SP-GIST to hold documentation about the standard opclasses provided for them: http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html We never had any well-defined place to discuss such opclasses before, but I think it's past time we did. I envision these sections as places to document, for example, the difference between the two jsonb gin opclasses. I put this text in about that: Of the two operator classes for type jsonb, jsonb_ops is the default. jsonb_hash_ops supports fewer operators but will work with larger indexed values than jsonb_ops can support. Is that accurate? Do we need to say more? For the moment what's there is mostly just tables of the core opclasses and the operators they support. If anyone can think of specific additions that should be there, please send in patches. (BTW, I didn't worry about btree and hash because they don't have such a wide variety of opclass behaviors.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Apr 8, 2014 at 12:31 PM, Robert Haas robertmh...@gmail.com wrote: No, we're concerned about ending up with the best possible performance. That could mean applying the fmgr-elision but not the other part. Whether the other part is beneficial is based on how it compares to the performance post-fmgr-elision. I agree with everything you say here, but I'm puzzled only because it's overwhelmingly obvious that the strxfrm() stuff is where the value is. You can dispute whether or not I should have made various tweaks, and you probably should, but the basic value of that idea is very much in evidence already. You yourself put the improvements of fmgr-elision alone at ~7% back in 2012 [1]. At the time, Noah said that he didn't think it was worth bothering with that patch for what he considered to be a small gain, a view which I did not share at the time. What I have here looks like it speeds things up a little over 200% (so a little over 300% of the original throughput) with a single client for many representative cases. That's a massive difference, to the point that I don't see a lot of sense in considering fmgr-elision alone separately. [1] http://www.postgresql.org/message-id/CA+Tgmoa8by24gd+YbuPX=5gsgmn0w5sgipzwwq7_8is26vl...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST support for inet datatypes
* Robert Haas (robertmh...@gmail.com) wrote: On Tue, Apr 8, 2014 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: [ inet-gist-v6.patch ] Committed with some additional documentation work. Thanks for submitting this! NICE. I'd like to tell you how excited I am about this part: # It also handles a new network # operator inet && inet (overlaps, a/k/a is supernet or subnet of), # which is expected to be useful in exclusion constraints. ...but I can't, because my mouth is too full of drool. Wouldn't you really want that more for cidr than for inet, though? You realize ip4r has had all this and more for, like, forever, right? It's also more efficient wrt storage and generally more 'sane' regarding operators, etc. Just thought I'd share. If you have a use-case, ip4r is what everyone's been using for quite some time. Also, yes, it supports both ipv4 and ipv6, despite the name. Thanks, Stephen
Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation
On Tue, Apr 8, 2014 at 1:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: I just created sections in the SGML manual chapters about GIST, GIN, and SP-GIST to hold documentation about the standard opclasses provided for them: I think that that's a good idea. I too was bothered by this omission. Of the two operator classes for type jsonb, jsonb_ops is the default. jsonb_hash_ops supports fewer operators but will work with larger indexed values than jsonb_ops can support. Is that accurate? Do we need to say more? Well, I'm not sure that it's worth noting there, but as you probably already know jsonb_hash_ops will perform a lot better than the default GIN opclass, and will have much smaller indexes. FWIW I think that the size limitation is overblown, and performance is in fact the compelling reason to prefer jsonb_hash_ops, although it's probably incongruous to explain the issues that way in this section of the docs. It probably suffices that that is covered in the JSON Types section. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \d+ and oid display
On Tue, Apr 1, 2014 at 01:36:02PM -0400, Robert Haas wrote: Although I agree with the general principle, I'm skeptical in this case. There are a bunch of table-level options, and I don't think it's very reasonable to expect that users are going to remember which ones are going to be displayed under which conditions, especially if we change it from release to release. If somebody doesn't see the has OIDs line, are they going to conclude that the table doesn't have OIDs, or are they going to conclude that psql doesn't ever display that information and they need to query pg_class manually? I'm sure at least some people will guess wrong. Now, admittedly, this is not the hill I want to die on. The future of PostgreSQL doesn't rest on whatever ends up happening here. But I think what's going on on this thread is a lot of tinkering with stuff that's not really broken. I'm not saying don't ever change psql output. What I'm saying is that changing psql output that is absolutely fine the way it is does not represent meaningful progress. The replica identity and has OIDs lines are a negligible percentage of what \d+ spits out - in a test I just did, 2 out of 37 lines on \d+ pg_class. I can't accept that tinkering with that is reducing clutter in any meaningful way; it's just change for the sake of change. I looked over the psql code and Robert is right that Has OIDs and Identity Replica are the only two cases where we are printing status, rather than unique strings like an index name. The only example I could find was where we print Number of child tables in \d when there are child tables. Of course, we print the child table names when they exist in \d+, but we don't print No child tables with \d or \d+, which seems logical. If we ignore backward compatibility, then Has OIDs and Identity Replica are similar. 
One thing that strongly (for me) supports not always printing them is that I expect more people will be confused by the mention of OIDs or Identity Replica than will actually care about these features. For example, if we always printed Child tables: 0, more people would be confused than helped. I don't think there are enough other cases for \d++ to make sense. I am not sure where to go from here. One idea would be to tally the people who have commented in this thread, but a more thorough approach would be to have a vote. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] psql \d+ and oid display
Bruce Momjian br...@momjian.us writes: If we ignore backward compatibility, then Has OIDs and Identity Replica are similar. One thing that strongly (for me) supports not always printing them is that I expect more people will be confused by the mention of OIDs or Identity Replica than will actually care about these features. For example, if we always printed Child tables: 0, more people would be confused than helped. This is a good argument, actually: these fields are not only noise for most people, but confusing if you don't know the feature they are talking about. regards, tom lane
Re: [HACKERS] psql \d+ and oid display
On Tue, Apr 8, 2014 at 05:29:45PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: If we ignore backward compatibility, then Has OIDs and Identity Replica are similar. One thing that strongly (for me) supports not always printing them is that I expect more people will be confused by the mention of OIDs or Identity Replica than will actually care about these features. For example, if we always printed Child tables: 0, more people would be confused than helped. This is a good argument, actually: these fields are not only noise for most people, but confusing if you don't know the feature they are talking about. Let me put it this way: I didn't know what Identity Replica meant when I saw it in psql. Now, some might say that is expected, but still. ;-) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached
We are working to avoid this limitation. On Tue, Apr 8, 2014 at 10:54 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Apr 7, 2014 at 8:29 PM, Michael Paquier michael.paqu...@gmail.com wrote: Documentation of jsonb tells that jsonb documents should be kept at a reasonable size to reduce lock contention, but there is no mention of size limitation for indexes: http://www.postgresql.org/docs/devel/static/datatype-json.html It seems like your complaint is that this warrants special consideration from the jsonb docs, due to this general limitation being particularly likely to affect jsonb users. Is that accurate? The structure of the JSON in your test case is quite atypical, since there is one huge string containing each of the translations, rather than a bunch of individual array elements (one per translation) or a bunch of object pairs. As it happens, just this morning I read that MongoDB's new version 2.6 now comes with stricter enforcement of key length: http://docs.mongodb.org/master/release-notes/2.6-compatibility/#index-key-length-incompatibility While previous versions just silently failed to index values that were inserted, there is now a 1024-byte limit imposed on the total size of indexed values. -- Peter Geoghegan
Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation
Peter Geoghegan p...@heroku.com writes: On Tue, Apr 8, 2014 at 1:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Of the two operator classes for type jsonb, jsonb_ops is the default. jsonb_hash_ops supports fewer operators but will work with larger indexed values than jsonb_ops can support. Is that accurate? Do we need to say more? Well, I'm not sure that it's worth noting there, but as you probably already know jsonb_hash_ops will perform a lot better than the default GIN opclass, and will have much smaller indexes. FWIW I think that the size limitation is overblown, and performance is in fact the compelling reason to prefer jsonb_hash_ops, although it's probably incongruous to explain the issues that way in this section of the docs. It probably suffices that that is covered in the JSON Types section. Well, the subtext is whether we should move that discussion to this new section. I think there is some comparable discussion in the full-text-indexing chapter, too. (BTW, wasn't there some discussion of changing our minds about which one is the default? We already have one bug report complaining about jsonb_ops' size restriction, so that seems to be evidence in favor of changing ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation
On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: (BTW, wasn't there some discussion of changing our minds about which one is the default? We already have one bug report complaining about jsonb_ops' size restriction, so that seems to be evidence in favor of changing ...) Yes, there was. I very nearly came down on the side of making jsonb_hash_ops the default, but given that it doesn't make all operators indexable, I ultimately decided against supporting that course of action. I thought that that would be an odd limitation for the default GIN opclass to have. It was a very close call in my mind, and if you favor changing the default now, in light of the few complaints we've heard, I think that's a reasonable decision. That said, as I noted in the main -bugs thread, the case presented is fairly atypical. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Peter Geoghegan p...@heroku.com writes: On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: (BTW, wasn't there some discussion of changing our minds about which one is the default? We already have one bug report complaining about jsonb_ops' size restriction, so that seems to be evidence in favor of changing ...) Yes, there was. I very nearly came down on the side of making jsonb_hash_ops the default, but given that it doesn't make all operators indexable, I ultimately decided against supporting that course of action. I thought that that would be an odd limitation for the default GIN opclass to have. It was a very close call in my mind, and if you favor changing the default now, in light of the few complaints we've heard, I think that's a reasonable decision. That said, as I noted in the main -bugs thread, the case presented is fairly atypical. Well, let me see if I understand the situation correctly: * jsonb_ops supports more operators * jsonb_hash_ops produces smaller, better-performing indexes * jsonb_ops falls over on inputs with wide field values, but jsonb_hash_ops does not If that's an accurate summary then I would say that we've got the default backwards. I would much rather tell people you can have more operators supported, but here are the tradeoffs than have a default that fails under evidently-not-so-improbable cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
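The tradeoff Tom summarizes comes down to what each opclass extracts as GIN keys: jsonb_ops keeps whole keys and values (so one very long string value produces an index entry that can exceed the index row size maximum from the other thread), while jsonb_hash_ops reduces each path/value to a fixed-size hash, which bounds entry size but forces rechecks and drops some indexable operators. A toy Python sketch of that difference — the entry formats here are purely illustrative, not the actual GIN key layout:

```python
import hashlib

def full_key_entries(doc, prefix=()):
    """jsonb_ops-style entries (illustrative): whole keys and whole values,
    so a huge string value becomes a huge index entry."""
    entries = []
    for k, v in doc.items():
        if isinstance(v, dict):
            entries.extend(full_key_entries(v, prefix + (k,)))
        else:
            entries.append("/".join(prefix + (k, str(v))))
    return entries

def hashed_key_entries(doc):
    """jsonb_hash_ops-style entries (illustrative): hash each path+value to a
    fixed 4-byte key, so entry size is bounded regardless of value length."""
    return [hashlib.blake2b(e.encode(), digest_size=4).hexdigest()
            for e in full_key_entries(doc)]

doc = {"translations": {"en": "x" * 5000}}   # one huge string value, as in the bug report
print(max(len(e) for e in full_key_entries(doc)))    # exceeds 5000: can blow past an index row size cap
print(max(len(e) for e in hashed_key_entries(doc)))  # 8: bounded, but lossy (recheck required)
```

The lossiness is also why existence (`?`) can't be supported by the hashed form as precisely: distinct keys can collide, so the heap tuple must be rechecked.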
Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached
Oleg Bartunov obartu...@gmail.com writes: We are working to avoid this limitation. What do you mean by that ... do you see it as something that could be fixed quickly, or is this a long-term improvement project? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Peter Geoghegan wrote: What I have here looks like it speeds things up a little over 200% (so a little over 300% of the original throughput) with a single client for many representative cases. That's a massive difference, to the point that I don't see a lot of sense in considering fmgr-elision alone separately. I think the point here is that what matters is that the gain from the strxfrm part of the patch is large, regardless of what the baseline is (right?). If there's a small loss in an uncommon worst case, that's probably acceptable, as long as the worst case is uncommon and the loss is small. But if the loss is large, or the case is not uncommon, then a fix for the regression is going to be a necessity. You seem to be assuming that a fix for whatever regression is found is going to be impossible to find. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, let me see if I understand the situation correctly: * jsonb_ops supports more operators * jsonb_hash_ops produces smaller, better-performing indexes * jsonb_ops falls over on inputs with wide field values, but jsonb_hash_ops does not There might be some compelling cases for indexing existence rather than containment, since the recheck flag isn't set there, but in general this summary seems sound. I would say that broadly, existence is a less useful operator than containment, and so jsonb_hash_ops is broadly more compelling. I didn't propose changing the default due to concerns about the POLA, but I'm happy to be told that those concerns were out of proportion to the practical benefits of a different default. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached
On Wed, Apr 9, 2014 at 1:48 AM, Tom Lane t...@sss.pgh.pa.us wrote: Oleg Bartunov obartu...@gmail.com writes: We are working to avoid this limitation. What do you mean by that ... do you see it as something that could be fixed quickly, or is this a long-term improvement project? Unfortunately, it's the long-term project VODKA; we hope to use SP-GiST (which also needs some adjustment) for keys+values instead of btree. Hopefully, we'll show something at PGCon.
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/08/2014 05:57 PM, Peter Geoghegan wrote: On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, let me see if I understand the situation correctly: * jsonb_ops supports more operators * jsonb_hash_ops produces smaller, better-performing indexes * jsonb_ops falls over on inputs with wide field values, but jsonb_hash_ops does not There might be some compelling cases for indexing existence rather than containment, since the recheck flag isn't set there, but in general this summary seems sound. I would say that broadly, existence is a less useful operator than containment, and so jsonb_hash_ops is broadly more compelling. I didn't propose changing the default due to concerns about the POLA, but I'm happy to be told that those concerns were out of proportion to the practical benefits of a different default. I tend to agree with Tom that POLA will be more violated by the default ops class not being able to index some values. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Apr 8, 2014 at 2:48 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I think the point here is that what matters is that the gain from the strxfrm part of the patch is large, regardless of what the baseline is (right?). If there's a small loss in an uncommon worst case, that's probably acceptable, as long as the worst case is uncommon and the loss is small. But if the loss is large, or the case is not uncommon, then a fix for the regression is going to be a necessity. That all seems reasonable. I just don't understand why you'd want to break out the fmgr-elision part, given that that was already discussed at length two years ago. You seem to be assuming that a fix for whatever regression is found is going to be impossible to find. I think that a fix that is actually worth it on balance will be elusive. Heikki's worst case is extremely narrow. I think he'd acknowledge that himself. I've already fixed some plausible regressions. For example, the opportunistic early len1 == len2, memcmp() == 0? test covers the common case where two leading keys are equal. I think we're very much into chasing diminishing returns past this point. I think that my figure of a 5% regression is much more realistic, even though it is itself quite unlikely. I think that the greater point is that we don't want to take worrying about worst-case performance to extremes. Calling Heikki's 50% regression the worst case is almost unfair, since it involves very carefully crafted input. You could probably also carefully craft input that made our quicksort implementation itself go quadratic, a behavior not necessarily exhibited by an inferior implementation for the same input. Yes, let's consider a pathological worst case, but let's put it in the context of being one end of a spectrum of behaviors, on the extreme fringes. In reality, only a tiny number of individual sort operations will experience any kind of regression at all. 
In simple terms, I'd be very surprised if anyone complained about a regression at all. If anyone does, it's almost certainly not going to be a 50% regression. There is a reason why many other systems have representative workloads that they target (e.g. the various TPC benchmarks). I think that a tyranny of the majority is a bad thing myself, but I'm concerned that we sometimes take that too far. I wonder, why did Heikki not add more padding to the end of the strings in his example, in order to give strxfrm() more wasted work? Didn't he want to make his worst case even worse? Or was it to control for TOASTing noise? -- Peter Geoghegan
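For readers following along, the optimization under discussion boils down to deriving a small fixed-size "abbreviated key" from each string's strxfrm() transform, comparing those cheap prefixes first, and paying for a full comparison only on a tie. A rough Python sketch of the idea, using Python's locale.strxfrm as a stand-in for the C function (the prefix length and the tuple-key fallback are illustrative, not the patch's actual mechanics):

```python
import locale

def abbreviated(s, n=8):
    # strxfrm() maps a string to a form whose plain byte order matches
    # collation order; the abbreviated key is a fixed-size prefix of it.
    return locale.strxfrm(s)[:n]

def sort_with_abbrev(strings):
    # Sorting by (abbreviated key, full transform) yields the same order as
    # sorting by the full transform alone: whenever the prefixes differ they
    # differ the same way the full transforms do, and the expensive second
    # element is only consulted when the cheap prefixes tie.
    return sorted(strings, key=lambda s: (abbreviated(s), locale.strxfrm(s)))

data = ["banana", "Apple", "apple", "cherry"]
assert sort_with_abbrev(data) == sorted(data, key=locale.strxfrm)
```

Heikki's adversarial case corresponds to inputs whose transforms share a long common prefix: every abbreviated-key comparison ties, so the prefix work is pure overhead on top of the full comparisons that must happen anyway.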
Re: [HACKERS] psql \d+ and oid display
* Tom Lane (t...@sss.pgh.pa.us) wrote: Bruce Momjian br...@momjian.us writes: If we ignore backward compatibility, then Has OIDs and Identity Replica are similar. One thing that strongly (for me) supports not always printing them is that I expect more people will be confused by the mention of OIDs or Identity Replica than will actually care about these features. For example, if we always printed Child tables: 0, more people would be confused than helped. This is a good argument, actually: these fields are not only noise for most people, but confusing if you don't know the feature they are talking about. I concur with this and would rather they not be there. One of the things that annoys me about certain other RDBMSes is how darn verbose they are - it makes trying to read the definitions require much head-scratching. I'm on the fence about a \d++. In general, I get a bit annoyed when certain information isn't available through the backslash commands, but it's hard to justify another '+' level for just these. Thanks, Stephen
Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached
On Wed, Apr 9, 2014 at 6:48 AM, Tom Lane t...@sss.pgh.pa.us wrote: Oleg Bartunov obartu...@gmail.com writes: We are working to avoid this limitation. What do you mean by that ... do you see it as something that could be fixed quickly, or is this a long-term improvement project? If this is a known limitation and no fix is planned for 9.4, could it be possible to document it appropriately for this release? This would surprise users. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached
On Tue, Apr 8, 2014 at 4:07 PM, Michael Paquier michael.paqu...@gmail.com wrote: If this is a known limitation and no fix is planned for 9.4, could it be possible to document it appropriately for this release? This would surprise users. It looks like the default GIN opclass will be changed, so it becomes a matter of opting in to the particular set of trade-offs that the current default represents. Presumably that includes the size limitation, which isn't separately documented at present. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
Hi all, Following the discussion in message id cahgqgwffmor4ecugwhzpaapyqbsekdg66vmj1rvej6z-ega...@mail.gmail.com , I have developed a patch which gives the user the option to exclude pg_log directory contents in pg_basebackup. [Current situation] During pg_basebackup, all files in the pg_log directory are copied to the new backup directory. [Design] - Added a new, non-mandatory option -S/--skip-log-dir to pg_basebackup. - If skip-log-dir is specified in the pg_basebackup command, then basebackup excludes copying log files from the standard pg_log directory and from any other directory specified in the log_directory GUC variable. (An empty pg_log/$Log_directory folder is still created.) - In case pg_log/$Log_directory is a symbolic link, an empty folder will be created. [Advantage] It gives users an option to avoid copying large log files if they don't wish to, and hence saves disk space. Attached is the patch. Thanks Regards, Vaishnavi Fujitsu Australia pgbasebackup_excludes_pglog_v1.patch Description: pgbasebackup_excludes_pglog_v1.patch
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug f...@phlo.org wrote: As explained above, invtrans_bool is a bit problematic, since it carries a real risk of performance regressions. It's included for completeness' sake, and should probably not be committed at this time. Did you mean to write invtrans_minmax? Otherwise you didn't explain your concerns with bool. Regards David Rowley
Re: [HACKERS] Pending 9.4 patches
On 04/09/2014 02:00 AM, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Craig Ringer cr...@2ndquadrant.com writes: On 04/05/2014 03:57 AM, Andres Freund wrote: r04) Row-security based on Updatable security barrier views This one's fate seems to be hard to judge without c07. Open issues remain with this patch, and resources for working on it in 9.4 have run out. It is not ready for commit. A core bugfix with locking in security barrier views is required before the regression tests can be fixed up properly, for one thing. Tom also expressed concerns about how plan invalidation works, though it's not yet clear whether that was just miscommunication about how it works on my part or whether there's a concrete problem there. I'd really love to finish this off for 9.4, but other projects have to come first. Given that, I think we should go ahead and mark this one Returned With Feedback. It's past time to be punting anything that doesn't have a serious chance of getting committed for 9.4. I'm a bit confused on this point- is the only issue the *preexisting* bug with security barrier views? This thread discusses two patches. The above refers to row security (per quoted text at top), not updatable security barrier views. Updatable security barrier views are ready. There's a pre-existing bug with security barrier views, but updatable s.b. views don't make it any worse and it can be fixed separately. Row security is not. It could possibly be committed w/o a fix for the security barrier bug by deleting the relevant regression tests, but Tom had reservations about plan invalidation in it, the docs need updating, and it needs a bunch more testing. It's possible I could have it ready in a few days - or it might be a couple of weeks. I ran out of time to work on it for 9.4. Craig, in general, I'd argue that a pre-existing bug isn't a reason that a patch isn't ready for commit. 
The bug may need to be fixed before the patch goes in, but saying a patch isn't ready implied, to me at least, issues with the *patch*, which it sounds like isn't the case here. I tend to agree, and for that reason want updatable security barrier views to make it in for 9.4. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Pending 9.4 patches
On 04/09/2014 01:54 AM, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Tue, Apr 8, 2014 at 11:59 AM, Gregory Smith gregsmithpg...@gmail.com wrote: I have no other projects ahead of this for the remainder of this month. I just can't figure out what to do next until there's a committer (or committers, if someone else is going to take on the locking bug) identified. I looked at the locking problem enough to know that one is over my head. What post or thread should I read for details about this locking bug? I think it was discussed a couple times, but it's here: http://www.postgresql.org/message-id/531d3355.6010...@2ndquadrant.com Yep. The follow-ups are important though - showing that it's actually a pre-existing bug, and while it might also occur in updatable security barrier views, there's already a locking problem in the existing security barrier views that needs to be fixed. The short version is: When you have a RowMark (SELECT ... FOR UPDATE/SHARE, UPDATE, or DELETE) on the results of a query, security barrier views are incorrectly pushing this row mark down into the security_barrier subquery they generate. That means that if you: SELECT * FROM some_view WHERE id % 2 = 0 FOR UPDATE; you should be locking only *even numbered* rows that match the predicate of some_view, but in fact, what'll get run looks like the pseudo-SQL: SELECT * FROM ( SELECT * FROM underlying_table WHERE view_predicate FOR UPDATE ) some_view FOR UPDATE; i.e. we're locking *all rows that match the view predicate*, failing to apply the *user* predicate before locking. 
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Pending 9.4 patches
* Craig Ringer (cr...@2ndquadrant.com) wrote: On 04/09/2014 02:00 AM, Stephen Frost wrote: I'm a bit confused on this point- is the only issue the *preexisting* bug with security barrier views? This thread discusses two patches. The above refers to row security (per quoted text at top), not updatable security barrier views. Right, I understood that. Updatable security barrier views are ready. There's a pre-existing bug with security barrier views, but updatable s.b. views don't make it any worse and it can be fixed separately. Ok. Row security is not. It could possibly be committed w/o a fix for the security barrier bug by deleting the relevant regression tests, but Tom had reservations about plan invalidation in it, the docs need updating, and it needs a bunch more testing. It's possible I could have it ready in a few days - or it might be a couple of weeks. I ran out of time to work on it for 9.4. So- row security makes the *existing bug* worse; I get that. The question regarding plan invalidation may be something we can work out. As for docs and testing, those are things we would certainly be better off with and may mean that this isn't able to make it into 9.4, which is fair, but I wouldn't toss it out solely due to that. Craig, in general, I'd argue that a pre-existing bug isn't a reason that a patch isn't ready for commit. The bug may need to be fixed before the patch goes in, but saying a patch isn't ready implied, to me at least, issues with the *patch*, which it sounds like isn't the case here. I tend to agree, and for that reason want updatable security barrier views to make it in for 9.4. Ok. I'm going to make a serious effort to find time to work on this, at least. Right now I'm busy preparing to launch a new site (you'll see the announcement in a couple of days...), etc, etc, but I should have time this weekend... Thanks, Stephen
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr 9, 2014, at 02:55, David Rowley dgrowle...@gmail.com wrote: On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug f...@phlo.org wrote: As explained above, invtrans_bool is a bit problematic, since it carries a real risk of performance regressions. It's included for completeness' sake, and should probably not be committed at this time. Did you mean to write invtrans_minmax? Otherwise you didn't explain your concerns with bool. Grmpf. Should have re-read that once more before sending :-( Yes, I meant invtrans_minmax is problematic! invtrans_bool is fine; the inverse transition function never fails for BOOL_AND and BOOL_OR. This is why I factored it out into a separate patch, to make it easy to not apply the MIN/MAX stuff while still applying the BOOL stuff. Sorry for the confusion. best regards, Florian Pflug
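For context, the patch series under discussion adds inverse ("negative") transition functions so that a moving window aggregate can remove the oldest row from its transition state instead of recomputing the whole frame on every slide. A minimal Python sketch of the idea for a moving SUM (the function name and shape are illustrative, not the patch's API); MIN and MAX are the problematic cases precisely because removing the current extreme cannot be inverted without rescanning the frame:

```python
def moving_sums(values, width):
    """Moving sum over a sliding window of `width` rows: each step applies a
    forward transition (add the new row) and, once the window is full, an
    inverse transition (subtract the row that fell out of the frame)."""
    state, out = 0, []
    for i, v in enumerate(values):
        state += v                      # forward transition
        if i >= width:
            state -= values[i - width]  # inverse transition: drop oldest row
        out.append(state)
    return out

print(moving_sums([1, 2, 3, 4, 5], 3))  # [1, 3, 6, 9, 12]
```

Without the inverse step, each of the n window positions would re-aggregate up to `width` rows — O(n * width) instead of O(n).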
Re: [HACKERS] Pending 9.4 patches
On 04/09/2014 09:28 AM, Stephen Frost wrote: Ok. I'm going to make a serious effort to find time to work on this, at least. Right now I'm busy preparing to launch a new site (you'll see the announce in a couple days...), etc, etc, but I should have time this weekend... That'd be fantastic. I'm committed on other work, but I'll make the time to get back on this somehow. I've put a lot of time and work into it, and would love to see it make 9.4 against the odds. If just updatable security barrier views make it, that's a big plus for next time around. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Apr 3, 2014 at 12:19 PM, Thom Brown t...@linux.com wrote: Looking good: -T 100 -n -f sort.sql Master: 21.670467 / 21.718653 (avg: 21.69456) Patch: 66.888756 / 66.888756 (avg: 66.888756) These were almost exactly the same figures as I saw on my machine. However, when compiling with certain additional flags -- with CFLAGS=-O3 -march=native -- I was able to squeeze more out of this. My machine has a recent Intel CPU, Intel(R) Core(TM) i7-3520M. With these build settings the benchmark then averages about 75.5 tps across multiple runs, which I'd call a fair additional improvement. I tried this because I was specifically interested in the results of a memcmp implementation that uses SIMD. I believe that these flags make gcc/glibc use a memcmp implementation that takes advantage of SSE where supported (and various subsequent extensions). Although I didn't go to the trouble of verifying all this by going through the disassembly, or instrumenting the code in any way, that is my best guess as to what actually helped. I don't know how any of that might be applied to improve matters in the real world, which is why I haven't dived into this further, but it's worth being aware of. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Andrew Dunstan and...@dunslane.net writes: On 04/08/2014 05:57 PM, Peter Geoghegan wrote: ... I didn't propose changing the default due to concerns about the POLA, but I'm happy to be told that those concerns were out of proportion to the practical benefits of a different default. I tend to agree with Tom that POLA will be more violated by the default ops class not being able to index some values. We should wait a bit longer to see if anyone objects, but assuming that this represents the consensus opinion ... ISTM that the name jsonb_ops should have pride of place as the default jsonb opclass. Therefore, if we make this change, jsonb_hash_ops needs to be renamed to jsonb_ops, and we need a new name for what is now jsonb_ops. I haven't paid attention to the technical details of the differences so I have no idea what to suggest for the new name. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer Allocation Concurrency Limits
On Tue, Apr 8, 2014 at 10:38 PM, Jason Petersen ja...@citusdata.com wrote: In December, Metin (a coworker of mine) discussed an inability to scale a simple task (parallel scans of many independent tables) to many cores (it's here). As a ramp-up task at Citus I was tasked to figure out what the heck was going on here. I have a pretty extensive writeup here (whose length is more a result of my inexperience with the workings of PostgreSQL than anything else) and was looking for some feedback. At this moment I am not able to open the above link (here); maybe there is some problem (it's showing Service Unavailable). I will try it later. In short, my conclusion is that a working set larger than memory results in backends piling up on BufFreelistLock. Here when you say working set larger than memory, do you mean to refer to *memory* as shared_buffers? I think if the data is more than the total memory available, the effect of I/O can anyway overshadow the effect of BufFreelistLock contention. As much as possible I removed anything that could be blamed for this: Hyper-Threading is disabled zone reclaim mode is disabled numactl was used to ensure interleaved allocation kernel.sched_migration_cost was set to highly disable migration kernel.sched_autogroup_enabled was disabled transparent hugepage support was disabled For a way forward, I was thinking the buffer allocation sections could use some of the atomics Andres added here. Rather than workers grabbing BufFreelistLock to iterate the clock hand until they find a victim, the algorithm could be rewritten in a lock-free style, allowing workers to move the clock hand in tandem. Alternatively, the clock iteration could be moved off to a background process, similar to what Amit Kapila proposed here. I think both of the above ideas can be useful, but I am not sure if they are sufficient for scaling shared buffers. Is this assessment accurate? 
> Is this assessment accurate? I know 9.4 changes a lot about lock organization, but last I looked I didn't see anything that could alleviate this contention: are there any plans to address this?

I am planning to work on this for 9.5.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
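The lock-free idea above can be illustrated with a toy model (plain Python, not PostgreSQL source code; the `ClockSweep` class and its method names are invented for illustration): instead of holding a global lock for the whole victim search, each worker claims the next clock position with a fetch-and-add-style operation, so concurrent workers sweep the buffer ring in tandem.

```python
# Toy model of a lock-free clock sweep: workers claim clock positions
# with an atomic-style fetch-and-add instead of serializing the whole
# victim search on a single BufFreelistLock. Illustrative only.

class Buffer:
    def __init__(self):
        self.usage_count = 0   # bumped on access, decayed by the sweep
        self.pinned = False    # buffers in active use cannot be evicted

class ClockSweep:
    def __init__(self, nbuffers):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.hand = 0          # stand-in for an atomic counter

    def _fetch_add_hand(self):
        # Models an atomic fetch-and-add: each caller gets a distinct
        # clock position, so concurrent workers never contend on a lock,
        # only on this one counter.
        pos = self.hand
        self.hand += 1
        return pos % len(self.buffers)

    def find_victim(self):
        # Decrement usage counts until an unpinned, zero-usage buffer
        # is found; that buffer becomes the eviction victim.
        while True:
            idx = self._fetch_add_hand()
            buf = self.buffers[idx]
            if buf.pinned:
                continue
            if buf.usage_count == 0:
                return idx
            buf.usage_count -= 1

sweep = ClockSweep(4)
sweep.buffers[0].usage_count = 2
sweep.buffers[1].pinned = True
victim = sweep.find_victim()   # skips buffer 0 (decays it) and buffer 1 (pinned)
```

In real shared memory the counter increment would itself need to be an atomic instruction (e.g. `pg_atomic_fetch_add_u32`-style), and the usage-count decrement would race benignly; the sketch only shows the control flow.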
Re: [HACKERS] Minor improvements in alter_table.sgml
(2014/04/09 1:23), Robert Haas wrote:
>> On Tue, Apr 8, 2014 at 5:05 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>> Attached is a patch to improve the manual page for the ALTER TABLE command.
> Do we really need to add a section for type_name when we already have a section for OF type_name?

I think the section for type_name is necessary in the Parameters section, as distinct from the Description section, which already includes the section for OF type_name.

>> constraint_name is also used for adding a constraint using an index. So it could not only be a constraint to alter, validate, or drop, but also a new constraint name to be added.
> I overlooked that. Honestly, how much value is there in even having a section for this? Do we really want to document constraint_name as "name of an existing constraint, or the name of a new constraint to be added"? It would be accurate, then, but it also doesn't really tell you anything you didn't know already.

You have a point there, but I feel odd about the documentation as is, because some parameters are well documented (e.g., column_name) and some are not (e.g., constraint_name). So, if there are no objections, I'd like to update the patch.

Thanks,

Best regards,
Etsuro Fujita
Re: [HACKERS] psql \d+ and oid display
On Tue, Apr 8, 2014 at 5:37 PM, Bruce Momjian br...@momjian.us wrote:
> On Tue, Apr 8, 2014 at 05:29:45PM -0400, Tom Lane wrote:
>> Bruce Momjian br...@momjian.us writes:
>>> If we ignore backward compatibility, then Has OIDs and Identity Replica are similar. One thing that strongly (for me) supports not always printing them is that I expect more people will be confused by the mention of OIDs or Identity Replica than will actually care about these features. For example, if we always printed Child tables: 0, more people would be confused than helped.
>> This is a good argument, actually: these fields are not only noise for most people, but confusing if you don't know the feature they are talking about.
> Let me put it this way: I didn't know what Identity Replica meant when I saw it in psql. Now, some might say that is expected, but still. ;-)

Well, that's sorta my concern. I mean, right now we've got people saying "what the heck is a replica identity?". But, if the logical decoding stuff becomes popular, as I hope it will, that's going to be an important thing for people to adjust, and the information needs to be present in a clear and easily-understood way. I haven't studied the current code in detail so maybe it's fine. I just want to make sure we're not giving it second-class treatment solely on the basis that it's new and people aren't using it yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [DOCS] Call for GIST/GIN/SP-GIST opclass documentation
On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I just created sections in the SGML manual chapters about GIST, GIN, and SP-GIST to hold documentation about the standard opclasses provided for them:
> http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html
> http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html
> http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html
> We never had any well-defined place to discuss such opclasses before, but I think it's past time we did.

+1. Great idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] psql \d+ and oid display
Robert Haas robertmh...@gmail.com writes:
> Well, that's sorta my concern. I mean, right now we've got people saying "what the heck is a replica identity?". But, if the logical decoding stuff becomes popular, as I hope it will, that's going to be an important thing for people to adjust, and the information needs to be present in a clear and easily-understood way. I haven't studied the current code in detail so maybe it's fine. I just want to make sure we're not giving it second-class treatment solely on the basis that it's new and people aren't using it yet.

I think the proposal is "don't mention the property if it has the default value". That's not second-class status, as long as people who know what the property is understand that behavior. It's just conserving screen space.

regards, tom lane
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 01:09, Robert Haas wrote:
> I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that.

Yes, you are right. That is why I am not creating a separate procarray entry to maintain the autonomous transaction. Please find the details in my previous reply, sent earlier today.

> Second, why do other backends really need to know about our ATs? As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine. Maybe the backend needs to internally frob visibility rules, but that's not a matter for shared memory.

In order to build a snapshot, other sessions need to see the autonomous transactions and their sub-transactions. During snapshot creation, an autonomous transaction is treated as a main transaction, and the list of all running autonomous transactions and their sub-transactions is stored in the snapshot data.

E.g., suppose the following processes are running with the given transactions:

Proc-1: 100
Proc-2: 101, 102 (Auto Tx1), 103 (Auto Tx2), 104 (sub-tx of Auto Tx2)
Proc-3: 105, 106 (Auto Tx1), 107 (Auto Tx2)

Suppose the latest completed transaction is 108. Then the snapshot data for autonomous transaction 107 will be:

Xmin: 100
Xmax: 109
Snapshot->xip[]: 100, 101, 102, 103, 105, 106
Snapshot->subxip[]: 104

Thanks and Regards,
Kumar Rajeev Rastogi
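The snapshot layout above can be sketched as a toy model (plain Python, not PostgreSQL source code; the helper `xid_in_progress` is an invented name): a transaction id is still "in progress" for the snapshot if it is at or beyond xmax, or if it is listed in xip[] (running top-level and autonomous transactions) or subxip[] (their sub-transactions).

```python
# Toy model of PostgreSQL-style snapshot visibility checks,
# using the example snapshot taken by autonomous transaction 107.
# Illustrative only; real visibility also consults commit status.

def xid_in_progress(xid, xmin, xmax, xip, subxip):
    if xid < xmin:
        return False          # completed before the snapshot was taken
    if xid >= xmax:
        return True           # started after the snapshot was taken
    return xid in xip or xid in subxip

xmin, xmax = 100, 109
xip = [100, 101, 102, 103, 105, 106]   # running top-level + autonomous txs
subxip = [104]                          # 104 is a sub-tx of Auto Tx2 (103)

assert xid_in_progress(104, xmin, xmax, xip, subxip)      # still running
assert not xid_in_progress(108, xmin, xmax, xip, subxip)  # already completed
```

The point the model makes is the one in the text: because autonomous transactions and their sub-transactions appear in xip[]/subxip[] like ordinary running transactions, other sessions need no special handling for them.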
[HACKERS] Proposal for Merge Join for Non '=' Operators
I would like to propose a new merge join algorithm for optimizing non-'=' operators ('<', '<=', '>', '>=').

- Currently Merge Join is only supported for the '=' operator. For the '<' or '>' operators the planner chooses a Nested Loop Join, or a Nested Loop with materialization.
- I think when the tuples from the lower nodes are sorted, or the sorting cost is very low, we can use Merge Join for non-equality operators too, which will give better performance than NLJ (a new cost calculation for selecting this plan can be implemented in the planner).

Example of using Merge Join for the '<' operator:

T1    T2
3     1
4     2
5     4

The outer tuple (3) needs to be compared with the inner tuples one by one, and it will satisfy the condition at the third inner tuple (as 3 < 4). Here we can save this inner tuple's position so that the next outer tuple can start its comparison directly from this tuple.

1. In this algorithm we can add one more optimization: once an outer tuple satisfies the merge qual, it can skip the merge qual test for the remaining inner tuples and directly apply the other quals, as the merge qual will always be satisfied for the remaining tuples.

Implementation detail:
1. Need to add a new cost calculation mechanism for this. I still have to work on this part.
2. Implementation in the executor:
   a. This algorithm is almost the same as the normal Merge Join, with some changes.
   b. Both the inner and outer data sources should be sorted, same as in a normal Merge Join.

ALGORITHM: Merge Qual (R.A < Q.A)

r = first tuple from R (outer relation)
q = first tuple from Q (inner relation)
save_pos = q;  /* position to start scanning in relation Q */
while (fetch tuple r from R till relation end)
{
    for each tuple q in Q starting from save_pos
    {
        if merge qual satisfied
        {
            save_pos = q;
            consume all subsequent tuples and project
            (just need to match other quals, if any);
        }
        else
            fetch next tuple from Q;
    }
}

Performance comparison: suppose the tuples of the inner and outer relations are already sorted, or there is an index scan on both.
* The cost of NLJ is always O(r*q).
* The cost of this MJ will be between O(r + q) and O(r*q),

where r is the number of tuples in R (outer relation) and q is the number of tuples in Q (inner relation).

Please provide your feedback/suggestions.

Thanks & Regards,
Dilip Kumar
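The proposed algorithm can be sketched as follows (plain Python, not planner/executor code; `merge_join_lt` is an invented name, and both inputs are assumed sorted ascending with the join qual `outer < inner`):

```python
# Sketch of the proposed '<' merge join over two sorted inputs.
# save_pos remembers where the previous outer tuple first satisfied
# the qual, so later (larger) outer tuples never rescan earlier
# inner tuples; once the qual holds at some position, it holds for
# every remaining inner tuple, so no further merge-qual tests are run.

def merge_join_lt(outer, inner):
    results = []
    save_pos = 0
    for r in outer:
        # Advance from save_pos until r < inner[pos] holds.
        pos = save_pos
        while pos < len(inner) and not (r < inner[pos]):
            pos += 1
        save_pos = pos
        # Emit all remaining inner tuples without re-testing the qual
        # (only "other quals", if any, would still need checking).
        for q in inner[pos:]:
            results.append((r, q))
    return results

t1, t2 = [3, 4, 5], [1, 2, 4]   # the T1/T2 example from the proposal
joined = merge_join_lt(t1, t2)
# Produces the same rows as a nested loop over the same qual:
assert joined == [(r, q) for r in t1 for q in t2 if r < q]
```

Since save_pos only moves forward, the qual-testing work is bounded by one pass over each input; the output itself can of course still be up to r*q rows, which matches the O(r + q) to O(r*q) range claimed above.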
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 01:43, Tom Lane wrote:
>> I'm also pretty unconvinced that multiple PGPROCs is the right way to go. First, PGPROCs have a bunch of state in them that is assumed to exist once per backend. We might find pretty substantial code churn there if we try to go change that. Second, why do other backends really need to know about our ATs? As far as I can see, if other backends see the AT as a subtransaction of our top-level transaction up until it actually commits, that ought to be just fine.
> If we can make it work like that, sure. I'm a bit worried about how you'd decouple a subtransaction and commit it atomically ... or if that's not atomic, will it create any problems?

Though an autonomous transaction uses a mixed approach of sub-transaction as well as main transaction, the transaction state of an autonomous transaction is handled independently. So depending on the transaction state of the autonomous transaction (TBLOCK_AUTOCOMMIT for commit), the transaction will be committed. While committing:
1. The commit record and the corresponding WAL are written in the same way as for a main transaction (except for the way the autonomous transaction and its sub-transactions are accessed). This automatically takes care of updating pg_clog for the autonomous transaction as well.
2. It also marks the autonomous transaction finished by setting the appropriate fields of MyPgAutonomousXact, in a similar manner as done for a main transaction.
3. Freeing of all resources and popping back to the parent transaction happen in the same way as for a sub-transaction.

> The point being that you need to change both pg_subtrans and pg_clog to make that state transition.

Yes, I am changing both, but no specific changes were required: both are taken care of automatically during commit and assignment of the autonomous transaction.

Any comments/feedback/doubts are welcome.
Thanks and Regards,
Kumar Rajeev Rastogi