Re: [HACKERS] PostgreSQL Columnar Store for Analytic Workloads

2014-04-08 Thread Stefan Keller
Hi Hadi

Do you think that cstore_fdw is also well suited for storing and
retrieving linked data (RDF)?

-S.



2014-04-03 18:43 GMT+02:00 Hadi Moshayedi h...@citusdata.com:

 Dear Hackers,

 We at Citus Data have been developing a columnar store extension for
 PostgreSQL. Today we are excited to open source it under the Apache v2.0
 license.

 This columnar store extension uses the Optimized Row Columnar (ORC) format
 for its data layout, which improves upon the RCFile format developed at
 Facebook, and brings the following benefits:

 * Compression: Reduces in-memory and on-disk data size by 2-4x. Can be
 extended to support different codecs. We used the functions in
 pg_lzcompress.h for compression and decompression.
 * Column projections: Only reads column data relevant to the query.
 Improves performance for I/O bound queries.
 * Skip indexes: Stores min/max statistics for row groups, and uses them to
 skip over unrelated rows.

 We used the PostgreSQL FDW APIs to make this work. The extension doesn't
 implement the writable FDW API, but it uses the process utility hook to
 enable the COPY command for columnar tables.
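
 For readers who want to try it, basic usage looks roughly like the following.
 This is a sketch based on the cstore_fdw README of the time; option names such
 as compression 'pglz' should be treated as illustrative rather than
 authoritative.

 -- create the extension, a server, and a columnar foreign table
 CREATE EXTENSION cstore_fdw;
 CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

 CREATE FOREIGN TABLE customer_reviews (
     customer_id   text,
     review_date   date,
     review_rating int
 )
 SERVER cstore_server
 OPTIONS (compression 'pglz');

 -- data is loaded with COPY, which the extension supports through the
 -- process utility hook mentioned above
 COPY customer_reviews FROM '/tmp/customer_reviews.csv' WITH CSV;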

 This extension uses PostgreSQL's internal data type representation to
 store data in the table, so this columnar store should support all data
 types that PostgreSQL supports.

 We tried the extension on the TPC-H benchmark with a 4GB scale factor on an
 m1.xlarge Amazon EC2 instance, and query performance improved by 2x-3x
 compared to regular PostgreSQL tables. Note that we flushed the page cache
 before each test to see the impact on disk I/O.

 When data is cached in memory, the performance of cstore_fdw tables was
 close to the performance of regular PostgreSQL tables.

 For more information, please visit:
  * our blog post:
 http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics
  * our github page: https://github.com/citusdata/cstore_fdw

 Feedback from you is really appreciated.

 Thanks,
   -- Hadi




[HACKERS] Doc typo in 9.28. Event Trigger Functions

2014-04-08 Thread Ian Barwick

Just a single missing 's'.


Regards

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 6e2fbda..b5807f3
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** FOR EACH ROW EXECUTE PROCEDURE suppress_
*** 17446,17452 ****
     </para>
  
     <para>
!     <function>pg_event_trigger_dropped_objects</> returns a list of all object
      dropped by the command in whose <literal>sql_drop</> event it is called.
      If called in any other context,
      <function>pg_event_trigger_dropped_objects</> raises an error.
--- 17446,17452 ----
     </para>
  
     <para>
!     <function>pg_event_trigger_dropped_objects</> returns a list of all objects
      dropped by the command in whose <literal>sql_drop</> event it is called.
      If called in any other context,
      <function>pg_event_trigger_dropped_objects</> raises an error.



Re: Fwd: [HACKERS] Proposal: variant of regclass

2014-04-08 Thread Yugo Nagata
On Mon, 7 Apr 2014 12:00:49 -0400
Robert Haas robertmh...@gmail.com wrote:

 In other words, let's revert the whole refactoring of this file to
 create reg*_guts functions, and instead just copy the relevant logic
 for the name lookups into the new functions.  For to_regproc(), for
 example, it would look like this (untested):
 
   names = stringToQualifiedNameList(pro_name_or_oid);
   clist = FuncnameGetCandidates(names, -1, NIL, false, false, false);
   if (clist == NULL || clist->next != NULL)
       result = InvalidOid;
   else
       result = clist->oid;
 
 With that change, this patch will actually get a whole lot smaller,
 change less already-existing code, and deliver cleaner behavior.

Here is an updated patch. I rewrote regproc.c not to use the _guts functions,
and fixed to_reg* so that it does not accept a numeric OID. I also updated the
documentation and some comments. Is this better than the previous one?
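
For context, the behavioral difference being discussed looks like this in SQL.
(Outputs are what I would expect from the semantics described above, where
to_reg* returns NULL instead of raising an error and no longer accepts a
numeric OID.)

-- a cast raises an error when the object does not exist
SELECT 'no_such_function'::regproc;      -- ERROR
-- to_regproc() returns NULL instead
SELECT to_regproc('no_such_function');   -- NULL
-- a unique name resolves to its OID, displayed as regproc
SELECT to_regproc('now');                -- now
-- unlike the cast, a numeric OID is not accepted as input
SELECT to_regproc('1234');               -- NULL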

 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


-- 
Yugo Nagata nag...@sraoss.co.jp


to_regclass.patch.v7
Description: Binary data



Re: [HACKERS] Including replication slot data in base backups

2014-04-08 Thread Michael Paquier
On Tue, Apr 8, 2014 at 1:18 AM, Robert Haas robertmh...@gmail.com wrote:
 Not sure if this is exactly the right way to do it, but I agree that
 something along those lines is a good idea.  I also think, maybe even
 importantly, that we should probably document that people using
 file-copy based hot backups should strongly consider removing the
 replication slots by hand before using the backup.
Good point. Something here would need to be adapted for this case:
http://www.postgresql.org/docs/devel/static/backup-file.html
I am attaching an updated patch as well.
-- 
Michael
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 854b5fd..d8286b0 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -448,6 +448,13 @@ tar -cf backup.tar /usr/local/pgsql/data
the contents of indexes for example, just the commands to recreate
them.)  However, taking a file system backup might be faster.
   </para>
+
+  <para>
+   When doing a file system backup, it is recommended to drop replication
+   slots (see <xref linkend="streaming-replication-slots">) before using
+   it as it is not guaranteed that the WAL files needed by a slot will be
+   kept on the newly-created node.
+  </para>
  </sect1>
 
  <sect1 id="continuous-archiving">
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 6ce0c8c..b81ad8d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -590,6 +590,13 @@ PostgreSQL documentation
or an older major version, down to 9.1. However, WAL streaming mode (-X
stream) only works with server version 9.3.
   </para>
+
+  <para>
+   The backup will not include information about replication slots
+   (see <xref linkend="streaming-replication-slots">) as it is not
+   guaranteed that a node in recovery will have WAL files required for
+   a given slot.
+  </para>
  </refsect1>
 
  <refsect1>



Re: [HACKERS] Why is it not sane to pass ExecStoreTuple(shouldFree=true) for tuples point into buffers

2014-04-08 Thread Andres Freund
On 2014-04-07 21:47:38 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  So what I now do is essentially:
  while ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL)
  {
  ...
  ht = palloc(sizeof(HeapTupleData)); /* in the right context */
  memcpy(ht, scantuple, sizeof(HeapTupleData));
  ExecStoreTuple(ht, slot, scan->xs_cbuf, false);
  slot->tts_shouldFree = true;
  ...
  }
 
 Well, that is certainly messy.  I think you could just use a local
 HeapTupleData variable instead of palloc'ing every time, where local
 means has lifespan similar to the slot pointer.

Doesn't work nicely in this specific situation, but it's obviously an
alternative.
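
For reference, a rough sketch of the local-variable variant being discussed,
i.e. the same fragment as above with the palloc() removed; it assumes the
HeapTupleData can be given a lifespan similar to the slot pointer:

HeapTupleData htdata;   /* lives as long as the slot pointer */

while ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL)
{
    ...
    /* shallow copy, so the slot no longer points into scan-local memory */
    memcpy(&htdata, scantuple, sizeof(HeapTupleData));
    ExecStoreTuple(&htdata, slot, scan->xs_cbuf, false);
    ...
}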

 There's some vaguely similar hacking near the end of ExecDelete.

Yea, and some other places. I wonder if an ExecShallowMaterializeSlot()
or something would be useful for me, that callsite and others?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Minor improvements in alter_table.sgml

2014-04-08 Thread Etsuro Fujita
Attached is a patch to improve the manual page for the ALTER TABLE command.

Thanks,

Best regards,
Etsuro Fujita
diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 0b08f83..ce67c71 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -625,8 +625,9 @@ ALTER TABLE [ IF EXISTS ] replaceable 
class=PARAMETERname/replaceable
 listitem
  para
   The literalRENAME/literal forms change the name of a table
-  (or an index, sequence, or view), the name of an individual column in
-  a table, or the name of a constraint of the table. There is no effect on 
the stored data.
+  (or an index, sequence, view, materialized view, or foreign table), the 
name
+  of an individual column in a table, or the name of a constraint of the 
table.
+  There is no effect on the stored data.
  /para
 /listitem
/varlistentry
@@ -717,7 +718,7 @@ ALTER TABLE [ IF EXISTS ] replaceable 
class=PARAMETERname/replaceable
  /varlistentry
 
  varlistentry
-  termreplaceable class=PARAMETERtype/replaceable/term
+  termreplaceable class=PARAMETERdata_type/replaceable/term
   listitem
para
 Data type of the new column, or new data type for an existing
@@ -739,7 +740,7 @@ ALTER TABLE [ IF EXISTS ] replaceable 
class=PARAMETERname/replaceable
   termreplaceable class=PARAMETERconstraint_name/replaceable/term
   listitem
para
-Name of an existing constraint to drop.
+Name of an existing constraint to alter, validate, or drop.
/para
   /listitem
  /varlistentry
@@ -836,6 +837,15 @@ ALTER TABLE [ IF EXISTS ] replaceable 
class=PARAMETERname/replaceable
  /varlistentry
 
  varlistentry
+  termreplaceable class=PARAMETERtype_name/replaceable/term
+  listitem
+   para
+Name of an existing composite type.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
   termreplaceable class=PARAMETERnew_owner/replaceable/term
   listitem
para



[HACKERS] Patch: add psql tab completion for event triggers

2014-04-08 Thread Ian Barwick
As it was kind of annoying not to have this when playing around with 
event triggers.


This also tightens up the existing tab completion for ALTER TRIGGER, 
which contained redundant code for table name completion, and which was 
also causing a spurious RENAME TO to be inserted in this context:


CREATE EVENT TRIGGER foo ON {event} ^I


Regards

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 6e2fbda..b5807f3
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** FOR EACH ROW EXECUTE PROCEDURE suppress_
*** 17446,17452 ****
     </para>
  
     <para>
!     <function>pg_event_trigger_dropped_objects</> returns a list of all object
      dropped by the command in whose <literal>sql_drop</> event it is called.
      If called in any other context,
      <function>pg_event_trigger_dropped_objects</> raises an error.
--- 17446,17452 ----
     </para>
  
     <para>
!     <function>pg_event_trigger_dropped_objects</> returns a list of all objects
      dropped by the command in whose <literal>sql_drop</> event it is called.
      If called in any other context,
      <function>pg_event_trigger_dropped_objects</> raises an error.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 202ffce..7179642
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** static const SchemaQuery Query_for_list_
*** 714,721 
 FROM pg_catalog.pg_prepared_statements \
WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'
  
  /*
!  * This is a list of all things in Pgsql, which can show up after CREATE or
   * DROP; and there is also a query to get a list of them.
   */
  
--- 714,726 
 FROM pg_catalog.pg_prepared_statements \
WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'
  
+ #define Query_for_list_of_event_triggers \
+  SELECT pg_catalog.quote_ident(evtname) \
+FROM pg_catalog.pg_event_trigger \
+   WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s'
+ 
  /*
!  * This is a list of all things in Postgres, which can show up after CREATE or
   * DROP; and there is also a query to get a list of them.
   */
  
*** static const pgsql_thing_t words_after_c
*** 746,751 
--- 751,757 
  	{DATABASE, Query_for_list_of_databases},
  	{DICTIONARY, Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW},
  	{DOMAIN, NULL, Query_for_list_of_domains},
+ 	{EVENT TRIGGER, NULL, NULL},
  	{EXTENSION, Query_for_list_of_extensions},
  	{FOREIGN DATA WRAPPER, NULL, NULL},
  	{FOREIGN TABLE, NULL, NULL},
*** psql_completion(const char *text, int st
*** 934,942 
  	{
  		static const char *const list_ALTER[] =
  		{AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN,
! 			EXTENSION, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION,
  			GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR,
! 			 ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE,
  			TABLESPACE, TEXT SEARCH, TRIGGER, TYPE,
  		USER, USER MAPPING FOR, VIEW, NULL};
  
--- 940,948 
  	{
  		static const char *const list_ALTER[] =
  		{AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN,
! 			EXTENSION, EVENT TRIGGER, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION,
  			GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR,
! 			ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE,
  			TABLESPACE, TEXT SEARCH, TRIGGER, TYPE,
  		USER, USER MAPPING FOR, VIEW, NULL};
  
*** psql_completion(const char *text, int st
*** 1013,1018 
--- 1019,1055 
  		COMPLETE_WITH_LIST(list_ALTEREXTENSION);
  	}
  
+ 	/* ALTER EVENT TRIGGER */
+ 	else if (pg_strcasecmp(prev3_wd, ALTER) == 0 
+ 			 pg_strcasecmp(prev2_wd, EVENT) == 0 
+ 			 pg_strcasecmp(prev_wd, TRIGGER) == 0)
+ 	{
+ 		COMPLETE_WITH_QUERY(Query_for_list_of_event_triggers);
+ 	}
+ 
+ 	/* ALTER EVENT TRIGGER name */
+ 	else if (pg_strcasecmp(prev4_wd, ALTER) == 0 
+ 			 pg_strcasecmp(prev3_wd, EVENT) == 0 
+ 			 pg_strcasecmp(prev2_wd, TRIGGER) == 0)
+ 	{
+ 		static const char *const list_ALTER_EVENT_TRIGGER[] =
+ 			{DISABLE, ENABLE, OWNER TO, RENAME TO, NULL};
+ 
+ 		COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER);
+ 	}
+ 
+ 	/* ALTER EVENT TRIGGER name ENABLE */
+ 	else if (pg_strcasecmp(prev5_wd, ALTER) == 0 
+ 			 pg_strcasecmp(prev4_wd, EVENT) == 0 
+ 			 pg_strcasecmp(prev3_wd, TRIGGER) == 0 
+ 			 pg_strcasecmp(prev_wd, ENABLE) == 0)
+ 	{
+ 		static const char *const list_ALTER_EVENT_TRIGGER_ENABLE[] =
+ 			{REPLICA, ALWAYS, NULL};
+ 
+ 		COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER_ENABLE);
+ 	}
+ 
  	/* ALTER FOREIGN */
  	else if (pg_strcasecmp(prev2_wd, ALTER) == 0 
  			 pg_strcasecmp(prev_wd, FOREIGN) == 0)
*** psql_completion(const char *text, int st
*** 1318,1340 
  			 pg_strcasecmp(prev2_wd, TRIGGER) == 0)
  		

Re: [HACKERS] Patch: add psql tab completion for event triggers

2014-04-08 Thread Ian Barwick

On 08/04/14 18:22, Ian Barwick wrote:

As it was kind of annoying not to have this when playing around with
event triggers.

This also tightens up the existing tab completion for ALTER TRIGGER,
which contained redundant code for table name completion, and which was
also causing a spurious RENAME TO to be inserted in this context:

 CREATE EVENT TRIGGER foo ON {event} ^I


Apologies, previous patch had some unrelated changes in it.

Correct patch attached.


--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 202ffce..7179642
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** static const SchemaQuery Query_for_list_
*** 714,721 
 FROM pg_catalog.pg_prepared_statements \
WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'
  
  /*
!  * This is a list of all things in Pgsql, which can show up after CREATE or
   * DROP; and there is also a query to get a list of them.
   */
  
--- 714,726 
 FROM pg_catalog.pg_prepared_statements \
WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'
  
+ #define Query_for_list_of_event_triggers \
+  SELECT pg_catalog.quote_ident(evtname) \
+FROM pg_catalog.pg_event_trigger \
+   WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s'
+ 
  /*
!  * This is a list of all things in Postgres, which can show up after CREATE or
   * DROP; and there is also a query to get a list of them.
   */
  
*** static const pgsql_thing_t words_after_c
*** 746,751 
--- 751,757 
  	{DATABASE, Query_for_list_of_databases},
  	{DICTIONARY, Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW},
  	{DOMAIN, NULL, Query_for_list_of_domains},
+ 	{EVENT TRIGGER, NULL, NULL},
  	{EXTENSION, Query_for_list_of_extensions},
  	{FOREIGN DATA WRAPPER, NULL, NULL},
  	{FOREIGN TABLE, NULL, NULL},
*** psql_completion(const char *text, int st
*** 934,942 
  	{
  		static const char *const list_ALTER[] =
  		{AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN,
! 			EXTENSION, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION,
  			GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR,
! 			 ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE,
  			TABLESPACE, TEXT SEARCH, TRIGGER, TYPE,
  		USER, USER MAPPING FOR, VIEW, NULL};
  
--- 940,948 
  	{
  		static const char *const list_ALTER[] =
  		{AGGREGATE, COLLATION, CONVERSION, DATABASE, DEFAULT PRIVILEGES, DOMAIN,
! 			EXTENSION, EVENT TRIGGER, FOREIGN DATA WRAPPER, FOREIGN TABLE, FUNCTION,
  			GROUP, INDEX, LANGUAGE, LARGE OBJECT, MATERIALIZED VIEW, OPERATOR,
! 			ROLE, RULE, SCHEMA, SERVER, SEQUENCE, SYSTEM SET, TABLE,
  			TABLESPACE, TEXT SEARCH, TRIGGER, TYPE,
  		USER, USER MAPPING FOR, VIEW, NULL};
  
*** psql_completion(const char *text, int st
*** 1013,1018 
--- 1019,1055 
  		COMPLETE_WITH_LIST(list_ALTEREXTENSION);
  	}
  
+ 	/* ALTER EVENT TRIGGER */
+ 	else if (pg_strcasecmp(prev3_wd, ALTER) == 0 
+ 			 pg_strcasecmp(prev2_wd, EVENT) == 0 
+ 			 pg_strcasecmp(prev_wd, TRIGGER) == 0)
+ 	{
+ 		COMPLETE_WITH_QUERY(Query_for_list_of_event_triggers);
+ 	}
+ 
+ 	/* ALTER EVENT TRIGGER name */
+ 	else if (pg_strcasecmp(prev4_wd, ALTER) == 0 
+ 			 pg_strcasecmp(prev3_wd, EVENT) == 0 
+ 			 pg_strcasecmp(prev2_wd, TRIGGER) == 0)
+ 	{
+ 		static const char *const list_ALTER_EVENT_TRIGGER[] =
+ 			{DISABLE, ENABLE, OWNER TO, RENAME TO, NULL};
+ 
+ 		COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER);
+ 	}
+ 
+ 	/* ALTER EVENT TRIGGER name ENABLE */
+ 	else if (pg_strcasecmp(prev5_wd, ALTER) == 0 
+ 			 pg_strcasecmp(prev4_wd, EVENT) == 0 
+ 			 pg_strcasecmp(prev3_wd, TRIGGER) == 0 
+ 			 pg_strcasecmp(prev_wd, ENABLE) == 0)
+ 	{
+ 		static const char *const list_ALTER_EVENT_TRIGGER_ENABLE[] =
+ 			{REPLICA, ALWAYS, NULL};
+ 
+ 		COMPLETE_WITH_LIST(list_ALTER_EVENT_TRIGGER_ENABLE);
+ 	}
+ 
  	/* ALTER FOREIGN */
  	else if (pg_strcasecmp(prev2_wd, ALTER) == 0 
  			 pg_strcasecmp(prev_wd, FOREIGN) == 0)
*** psql_completion(const char *text, int st
*** 1318,1340 
  			 pg_strcasecmp(prev2_wd, TRIGGER) == 0)
  		COMPLETE_WITH_CONST(ON);
  
- 	else if (pg_strcasecmp(prev4_wd, ALTER) == 0 
- 			 pg_strcasecmp(prev3_wd, TRIGGER) == 0)
- 	{
- 		completion_info_charp = prev2_wd;
- 		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
- 	}
- 
  	/*
! 	 * If we have ALTER TRIGGER sth ON, then add the correct tablename
  	 */
  	else if (pg_strcasecmp(prev4_wd, ALTER) == 0 
  			 pg_strcasecmp(prev3_wd, TRIGGER) == 0 
  			 pg_strcasecmp(prev_wd, ON) == 0)
! 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
  
  	/* ALTER TRIGGER name ON name */
! 	else if (pg_strcasecmp(prev4_wd, TRIGGER) == 0 
  			 pg_strcasecmp(prev2_wd, ON) == 0)
  		COMPLETE_WITH_CONST(RENAME TO);
  
--- 1355,1374 
  			 

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Heikki Linnakangas

On 04/07/2014 11:35 PM, Peter Geoghegan wrote:

Okay. Here is a worst-case, with the pgbench script the same as my
original test-case, but with almost maximally unsympathetic data
to sort:

[local]/postgres=# update customers set firstname =
'padding-padding-padding-padding' || firstname;


Hmm. I would expect the worst case to be where the strxfrm is not 
helping because all the entries have the same prefix, but the actual key 
is as short and cheap-to-compare as possible. So the padding should be 
as short as possible. Also, we have a fast path for pre-sorted input, 
which reduces the number of comparisons performed; that will make the 
strxfrm overhead more significant.


I'm getting about 2x slowdown on this test case:

create table sorttest (t text);
insert into sorttest select 'foobarfo' || (g) || repeat('a', 75) from 
generate_series(1, 3) g;


explain analyze select * from sorttest order by t;

Now, you can argue that that's acceptable because it's such a special 
case, but if we're looking for the worst-case..


(BTW, IMHO it's way too late to do this for 9.4)

- Heikki




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Heikki Linnakangas

On 04/07/2014 09:19 PM, Peter Geoghegan wrote:

The only case that this patch could possibly regress is where there
are strings that differ beyond about the first 8 bytes, but are not
identical (we chance a memcmp() == 0 before doing a full strcoll()
when tie-breaking on the semi-reliable initial comparison). We still
always avoid fmgr-overhead (and shim overhead, which I've measured),
as in your original patch - you put that at adding 7% at the time,
which is likely to make up for otherwise-regressed cases. There is
nothing at all contrived about my test-case.


Did I understand correctly that this patch actually does two things:

1. Avoid fmgr and shim overhead
2. Use strxfrm to produce a pseudo-leading key that's cheaper to compare.

In that case, these changes need to be analyzed separately. You don't 
get to make up for the losses from the second part with the gains from the 
first part. We could commit just the first part (for 9.5!), and that has 
to be the baseline for the second part.
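
For readers unfamiliar with the pseudo-leading-key idea in point 2, here is a
stand-alone C sketch of the general technique (not the patch itself; the
8-byte prefix length and the buffer handling are assumptions for
illustration):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PREFIX_LEN 8

/* Build a fixed-size, zero-padded prefix of strxfrm(s). */
static void
make_prefix(const char *s, unsigned char *prefix)
{
    size_t  need = strxfrm(NULL, s, 0) + 1;
    char   *buf = malloc(need);
    size_t  len;

    strxfrm(buf, s, need);
    len = strlen(buf);
    memset(prefix, 0, PREFIX_LEN);
    memcpy(prefix, buf, len < PREFIX_LEN ? len : PREFIX_LEN);
    free(buf);
}

/* Compare cheaply on the prefixes; fall back to strcoll() only on a tie. */
static int
cmp_with_prefix(const char *a, const char *b)
{
    unsigned char pa[PREFIX_LEN], pb[PREFIX_LEN];
    int     cmp;

    make_prefix(a, pa);
    make_prefix(b, pb);
    cmp = memcmp(pa, pb, PREFIX_LEN);
    if (cmp != 0)
        return cmp;             /* prefixes already decide the order */
    return strcoll(a, b);       /* tie: do the full comparison */
}

int
main(void)
{
    setlocale(LC_ALL, "");
    printf("%d\n", cmp_with_prefix("foobarfoo1aaaa", "foobarfoo2aaaa"));
    return 0;
}

In a real sort the transformed prefix would presumably be computed once per
datum rather than on every comparison as in this toy; the worst cases Heikki
describes (identical prefixes, cheap keys, pre-sorted input) are exactly the
ones where that strxfrm() work becomes pure overhead.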


This is very promising stuff, but it's not a slam-dunk win. I'd suggest 
adding these to the next commitfest, as two separate patches.


- Heikki




Re: [HACKERS] Why is it not sane to pass ExecStoreTuple(shouldFree=true) for tuples point into buffers

2014-04-08 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-07 21:47:38 -0400, Tom Lane wrote:
 Well, that is certainly messy.  I think you could just use a local
 HeapTupleData variable instead of palloc'ing every time, where local
 means has lifespan similar to the slot pointer.

 There's some vaguely similar hacking near the end of ExecDelete.

 Yea, and some other places. I wonder if a ExecShallowMaterializeSlot()
 or something would be useful for me, that callsite and others?

Don't like that name much, but I agree there's some room for a function
like this.  I guess you're imagining that we'd add a HeapTupleData field
to TupleTableSlots, and use that for the workspace when this situation
arises?

An alternative possibility would be to not invent a new function, but
just make ExecStoreTuple do this unconditionally when shouldFree=false.
Not sure if there'd be a noticeable runtime penalty --- but the
existing approach seems rather fragile.  I know I've always thought
of slots as being fully independent storage, and in this case they
are not.

regards, tom lane




Re: [HACKERS] Why is it not sane to pass ExecStoreTuple(shouldFree=true) for tuples point into buffers

2014-04-08 Thread Andres Freund
On 2014-04-08 09:37:33 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-04-07 21:47:38 -0400, Tom Lane wrote:
  Well, that is certainly messy.  I think you could just use a local
  HeapTupleData variable instead of palloc'ing every time, where local
  means has lifespan similar to the slot pointer.
 
  There's some vaguely similar hacking near the end of ExecDelete.
 
  Yea, and some other places. I wonder if a ExecShallowMaterializeSlot()
  or something would be useful for me, that callsite and others?
 
 Don't like that name much, but I agree there's some room for a function
 like this.

I am not the biggest fan either: for one, it's really rather long; for
another, it's not really descriptive. One might think it's about toast or
something. Do you have a better name?

 I guess you're imagining that we'd add a HeapTupleData field
 to TupleTableSlots, and use that for the workspace when this situation
 arises?

I wasn't really sure about the approach yet. Either do something like
you describe (possibly reusing/recoining tts_minhdr?), or just allocate
a HeapTupleData struct.

 An alternative possibility would be to not invent a new function, but
 just make ExecStoreTuple do this unconditionally when shouldFree=false.
 Not sure if there'd be a noticeable runtime penalty.

I think that's a viable alternative.

 I know I've always thought
 of slots as being fully independent storage, and in this case they
 are not.

Me too. I was initially rather confused by the memory corruptions I was
seeing. I really thought that storing a tuple pointing to a buffer in
the slot should just work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PostgreSQL Columnar Store for Analytic Workloads

2014-04-08 Thread Hadi Moshayedi
Hi Stefan,

On Tue, Apr 8, 2014 at 9:28 AM, Stefan Keller sfkel...@gmail.com wrote:

 Hi Hadi

 Do you think that cstore_fdw is also well suited for storing and
 retrieving linked data (RDF)?




I am not very familiar with RDF. Note that cstore_fdw doesn't change the
query language of PostgreSQL, so if your queries are expressible in SQL,
they can be answered using cstore_fdw too. If your data is huge and doesn't
fit in memory, then using cstore_fdw can be beneficial for you.

Can you give some more information about your use case? For example, what
are some of your queries? Do you have sample data? How much memory do you
have? How large is the data?

-- Hadi


Re: Fwd: [HACKERS] Proposal: variant of regclass

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 3:01 AM, Yugo Nagata nag...@sraoss.co.jp wrote:
 On Mon, 7 Apr 2014 12:00:49 -0400
 Robert Haas robertmh...@gmail.com wrote:
 In other words, let's revert the whole refactoring of this file to
 create reg*_guts functions, and instead just copy the relevant logic
 for the name lookups into the new functions.  For to_regproc(), for
 example, it would look like this (untested):

    names = stringToQualifiedNameList(pro_name_or_oid);
    clist = FuncnameGetCandidates(names, -1, NIL, false, false, false);
    if (clist == NULL || clist->next != NULL)
        result = InvalidOid;
    else
        result = clist->oid;

 With that change, this patch will actually get a whole lot smaller,
 change less already-existing code, and deliver cleaner behavior.

 Here is an updated patch. I rewrote regproc.c not to use the _guts functions,
 and fixed to_reg* so that it does not accept a numeric OID. I also updated the
 documentation and some comments. Is this better than the previous one?

Looks good, committed with a bit of further cleanup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL format and API changes (9.5)

2014-04-08 Thread Amit Kapila
On Thu, Apr 3, 2014 at 7:44 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I'd like to do some changes to the WAL format in 9.5. I want to annotate
 each WAL record with the blocks that they modify. Every WAL record already
 includes that information, but it's done in an ad hoc way, differently in
 every rmgr. The RelFileNode and block number are currently part of the WAL
 payload, and it's the REDO routine's responsibility to extract it. I want to
 include that information in a common format for every WAL record type.

 That makes life a lot easier for tools that are interested in knowing which
 blocks a WAL record modifies. One such tool is pg_rewind; it currently has
 to understand every WAL record the backend writes. There's also a tool out
 there called pg_readahead, which does prefetching of blocks accessed by WAL
 records, to speed up PITR. I don't think that tool has been actively
 maintained, but at least part of the reason for that is probably that it's a
 pain to maintain when it has to understand the details of every WAL record
 type.

 It'd also be nice for contrib/pg_xlogdump and backend code itself. The
 boilerplate code in all WAL redo routines, and writing WAL records, could be
 simplified.

I think it will also be useful if we want to implement table/tablespace
PITR.


 That's for the normal cases. We'll need a couple of variants for also
 registering buffers that don't need full-page images, and perhaps also a
 function for registering a page that *always* needs a full-page image,
 regardless of the LSN. A few existing WAL record types just WAL-log the
 whole page, so those ad-hoc full-page images could be replaced with this.

 With these changes, a typical WAL insertion would look like this:

 /* register the buffer with the WAL record, with ID 0 */
 XLogRegisterBuffer(0, buf, true);

 rdata[0].data = (char *) xlrec;
 rdata[0].len = sizeof(BlahRecord);
 rdata[0].buffer_id = -1; /* -1 means the data is always included */
 rdata[0].next = &(rdata[1]);

 rdata[1].data = (char *) mydata;
 rdata[1].len = mydatalen;
 rdata[1].buffer_id = 0; /* 0 here refers to the buffer registered
 above */
 rdata[1].next = NULL;

 ...
 recptr = XLogInsert(RM_BLAH_ID, xlinfo, rdata);

 PageSetLSN(buf, recptr);

If we register buffers (whether or not they require FPIs) before calling
XLogInsert(), will there be any impact on handling the case where we find
out, after taking WALInsertLock, that we need to back up the buffer, or
will that part of the code remain the same as it is today?

 Redo
 

 There are four different states a block referenced by a typical WAL record
 can be in:

 1. The old page does not exist at all (because the relation was truncated
 later)
 2. The old page exists, but has an LSN higher than current WAL record, so it
 doesn't need replaying.
 3. The LSN is < current WAL record, so it needs to be replayed.
 4. The WAL record contains a full-page image, which needs to be restored.

I think there might be a need for separate handling of some special cases
like Init Page, which is used in a few operations (Insert/Update/multi-insert).
Are there any criteria for deciding whether something needs a separate state
or just special handling in the operation?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Fwd: [HACKERS] Proposal: variant of regclass

2014-04-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Looks good, committed with a bit of further cleanup.

I had not actually paid attention to the non-regclass parts of this, and
now that I look, I've got to say that it seems borderline insane to have
chosen to implement regproc/regoper rather than regprocedure/regoperator.
The types implemented here are incapable of dealing with overloaded names,
which --- particularly in the operator case --- makes them close to
useless.  I don't think this code was ready to commit.

regards, tom lane




Re: [HACKERS] Proposal: COUNT(*) (and related) speedup

2014-04-08 Thread Robert Haas
On Fri, Apr 4, 2014 at 1:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Joshua Yanovski pythones...@gmail.com writes:
 But worse, what happens if a count(*)
 is in progress?  It might or might not have scanned this page already,
 and there's no way to get the right answer in both cases.  Counter
 updates done by VACUUM would have a similar race-condition problem.

 I don't think the first part really matters.  If the page was visible
 when COUNT(*) started and then got dirtied by some other transaction,
 any changes by the second transaction shouldn't alter the actual count
 in our transaction anyway, so whether we scan the page needlessly or
 not seems beside the point.

 The question is not whether you scan a page needlessly or not, it's
 whether you double-count the tuples on it.  I don't think it's possible to
 be sure that when you add the central counter value into your local sum,
 you are neither double-counting nor omitting pages whose all-visible state
 changed while you were scanning the table.

I think it would work if you also had information about which pages
you needed to scan.  For example, suppose you had a data structure
sorta like the visibility map, except that instead of one bit per page
you have one 16-bit integer per page.  The integer is -1 if the page
isn't known to be all-visible, and is otherwise the count of tuples on
the page.  Now, we can do a count(*) as follows:

1. Lock the first as-yet-unexamined page of the data structure.
2. Add all the values that aren't -1 to your running count of tuples,
and remember the page numbers corresponding to the remaining entries.
3. Unlock the page you locked in step 1.
4. Visit each heap page you remembered in step 2 and add the number of
tuples visible to your scan to your running count of tuples.
5. Provided there's a page of the data structure you haven't visited
yet, go to step 1.

On the write side, any operation that clears the visibility map bit
must also set the entry for that page to -1.  When the page is
all-visible, the value can be set to the total number of tuples on the
page.  This is safe because, even if the page ceases to be all-visible
after we add its tuples to the count, the new tuples that were added
aren't visible to our scan, and any tuples deleted are still visible
to our scan - because the transaction making the changes, in either
case, is obviously still running, and therefore didn't commit before
we took our snapshot.

Now, this wouldn't be O(1) but it would presumably be quite fast if
your visibility map bits are mostly set.  One 8kB page of 16-bit
counters would cover 32MB of heap.
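
To make the bookkeeping concrete, here is a toy, self-contained simulation of
that scheme; all names and numbers are invented for illustration, and a real
implementation would live beside the visibility map and take the locking
steps described above:

#include <stdint.h>
#include <stdio.h>

#define NPAGES 8
#define NOT_ALL_VISIBLE (-1)

/* One int16 per heap page: -1 if the page isn't known all-visible,
 * otherwise the number of tuples on the page. */
static int16_t page_counts[NPAGES] = {
    100, 100, NOT_ALL_VISIBLE, 97, 100, NOT_ALL_VISIBLE, 100, 42
};

/* Stand-in for visiting a heap page and counting the tuples visible
 * to our snapshot. */
static long
scan_heap_page(int pageno)
{
    (void) pageno;
    return 95;                  /* pretend 95 tuples were visible */
}

static long
count_star(void)
{
    long    total = 0;

    for (int pageno = 0; pageno < NPAGES; pageno++)
    {
        if (page_counts[pageno] != NOT_ALL_VISIBLE)
            total += page_counts[pageno];       /* trust the cached count */
        else
            total += scan_heap_page(pageno);    /* fall back to a real scan */
    }
    return total;
}

int
main(void)
{
    printf("count(*) = %ld\n", count_star());
    return 0;
}

When most visibility bits are set, almost every page takes the cheap branch,
which is where the speedup would come from.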

The bad news is that I am pretty sure there's no easy way for an index
AM to get callbacks at the right time, so it's really unclear how
something like this would fit in with our existing abstractions.  And
even if we answered that question, it would be a lot of work for a
pretty narrow use case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fwd: [HACKERS] Proposal: variant of regclass

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Looks good, committed with a bit of further cleanup.

 I had not actually paid attention to the non-regclass parts of this, and
 now that I look, I've got to say that it seems borderline insane to have
 chosen to implement regproc/regoper rather than regprocedure/regoperator.
 The types implemented here are incapable of dealing with overloaded names,
 which --- particularly in the operator case --- makes them close to
 useless.  I don't think this code was ready to commit.

Well, I noticed that, too, but I didn't think it was my job to tell
the patch author what functions he should have wanted.  A follow-on
patch to add to_regprocedure and to_regoperator wouldn't be much work,
if you want that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fwd: [HACKERS] Proposal: variant of regclass

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 11:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 8, 2014 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Looks good, committed with a bit of further cleanup.

 I had not actually paid attention to the non-regclass parts of this, and
 now that I look, I've got to say that it seems borderline insane to have
 chosen to implement regproc/regoper rather than regprocedure/regoperator.
 The types implemented here are incapable of dealing with overloaded names,
 which --- particularly in the operator case --- makes them close to
 useless.  I don't think this code was ready to commit.

 Well, I noticed that, too, but I didn't think it was my job to tell
 the patch author what functions he should have wanted.  A follow-on
 patch to add to_regprocedure and to_regoperator wouldn't be much work,
 if you want that.

And here is a patch for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 33e093e..e742181 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15288,10 +15288,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
/indexterm
 
indexterm
+primaryto_regprocedure/primary
+   /indexterm
+
+   indexterm
 primaryto_regoper/primary
/indexterm
 
indexterm
+primaryto_regoperator/primary
+   /indexterm
+
+   indexterm
 primaryto_regtype/primary
/indexterm
 
@@ -15476,11 +15484,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
entryget the oid of the named function/entry
   /row
   row
+   entryliteralfunctionto_regprocedure(parameterfunc_name/parameter)/function/literal/entry
+   entrytyperegprocedure/type/entry
+   entryget the oid of the named function/entry
+  /row
+  row
entryliteralfunctionto_regoper(parameteroperator_name/parameter)/function/literal/entry
entrytyperegoper/type/entry
entryget the oid of the named operator/entry
   /row
   row
+   entryliteralfunctionto_regoperator(parameteroperator_name/parameter)/function/literal/entry
+   entrytyperegoperator/type/entry
+   entryget the oid of the named operator/entry
+  /row
+  row
entryliteralfunctionto_regtype(parametertype_name/parameter)/function/literal/entry
entrytyperegtype/type/entry
entryget the oid of the named type/entry
@@ -15652,10 +15670,12 @@ SELECT collation for ('foo' COLLATE de_DE);
 
   para
The functionto_regclass/function, functionto_regproc/function,
-   functionto_regoper/function and functionto_regtype/function
-   translate relation, function, operator, and type names to objects of
-   type typeregclass/, typeregproc/, typeregoper/ and
-   typeregtype/, respectively.  These functions differ from a cast from
+   functionto_regprocedure/function, functionto_regoper/function,
+   functionto_regoperator/function, and functionto_regtype/function
+   functions translate relation, function, operator, and type names to objects
+   of type typeregclass/, typeregproc/, typeregprocedure/type,
+   typeregoper/, typeregoperator/type, and typeregtype/,
+   respectively.  These functions differ from a cast from
text in that they don't accept a numeric OID, and that they return null
rather than throwing an error if the name is not found (or, for
functionto_regproc/function and functionto_regoper/function, if
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index ed2bdbf..6210f45 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -323,6 +323,38 @@ regprocedurein(PG_FUNCTION_ARGS)
 }
 
 /*
+ * to_regprocedure	- converts proname(args) to proc OID
+ *
+ * If the name is not found, we return NULL.
+ */
+Datum
+to_regprocedure(PG_FUNCTION_ARGS)
+{
+	char	   *pro_name = PG_GETARG_CSTRING(0);
+	List	   *names;
+	int			nargs;
+	Oid			argtypes[FUNC_MAX_ARGS];
+	FuncCandidateList clist;
+
+	/*
+	 * Parse the name and arguments, look up potential matches in the current
+	 * namespace search list, and scan to see which one exactly matches the
+	 * given argument types.	(There will not be more than one match.)
+	 */
+	parseNameAndArgTypes(pro_name, false, &names, &nargs, argtypes);
+
+	clist = FuncnameGetCandidates(names, nargs, NIL, false, false, true);
+
+	for (; clist; clist = clist->next)
+	{
+		if (memcmp(clist->args, argtypes, nargs * sizeof(Oid)) == 0)
+			PG_RETURN_OID(clist->oid);
+	}
+
+	PG_RETURN_NULL();
+}
+
+/*
  * format_procedure		- converts proc OID to pro_name(args)
  *
  * This exports the useful functionality of regprocedureout for use
@@ -722,6 +754,45 @@ regoperatorin(PG_FUNCTION_ARGS)
 }
 
 /*
+ * to_regoperator	- converts oprname(args) to operator OID
+ *
+ * If the name is not found, we return NULL.
+ */
+Datum
+to_regoperator(PG_FUNCTION_ARGS)
+{
+	char	   *opr_name_or_oid = 

Re: [HACKERS] four minor proposals for 9.5

2014-04-08 Thread Pavel Stehule
2014-04-08 6:27 GMT+02:00 Amit Kapila amit.kapil...@gmail.com:

 On Mon, Apr 7, 2014 at 12:16 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  2014-04-04 6:51 GMT+02:00 Amit Kapila amit.kapil...@gmail.com:
  On Tue, Apr 1, 2014 at 11:42 PM, Pavel Stehule pavel.steh...@gmail.com
 
  wrote:
   2014-03-27 17:56 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:
   So I'll prepare some prototypes next month for
  
   1. logging execution time for cancelled queries,
   2. tracking query lock time
  Yes. Initially I thought only about cancelled queries, but now I am
  thinking that a wider solution could be better. Sometimes long queries
  can be stopped by Postgres, or by the system - and info about their
  duration can be just as useful.

 Right.

 
  One more thing: I think currently we can also find that in a crude way
  (pg_stat_activity has the query_start time and log_line_prefix has the end
  time), but I think having it in one place, the log, will be more useful.
  ??

 I just wanted to say that if someone wants to calculate the duration
 of a cancelled query (or other errored query), they can do that by checking
 the start time from pg_stat_activity and the end time from the log (using
 log_line_prefix), but this is of course cumbersome.

   I would use the same technique for printing lock time - it can be
   printed via the symbol %L.
 
  Above you have mentioned that you are planning to have three different
  lock times (Table, Tuple and others), so will this one symbol (%L)
  enable all three lock times?
 
 
  My idea is to start with %L as total lock time, which is useful for a wide
  group of users, and then (or at the same time) we can enhance it with
  two-character prefixed symbols

 So do you want to print lock time only for errored statements? Wouldn't it
 be better to do it for non-errored statements as well - in fact, I feel it
 can be more useful for non-errored statements. Do we already have an easy
 way to get wait time for non-errored statements?


There are two points:

a) We have no infrastructure for attaching specific info to a query other
than log_line_prefix, and I have no better idea of what to implement. I
don't think any new infrastructure (mechanism) is necessary.
log_line_prefix increases log size, but it is very easily parseable -
Splunk and similar software have no problem with it.

b) Lock time can be interesting for errored statements too - for example,
you can cancel locked queries, so you would want to see the lock time and
duration for cancelled queries. So this implementation can be sensible too.



  so
 
  %L .. total lock time
  %Lr .. lock relation
  %Lt .. lock tuples
  %Lo .. lock others
 
  L = Lr + Lt + Lo
 
 
  Are you planning to add new logs for logging this, or planning to use
  the existing infrastructure?
 
 
  I don't have a prototype yet, so I don't know what will be necessary

 Okay, I think then it's better to discuss this after your initial analysis/
 prototype, but I think you might need to add some infrastructure code
 to make it possible, as locks on database objects (relation, tuple, ...) and
 other locks (buffers, ...) have different locking strategies, so to get the
 total wait time for a statement due to the different kinds of locks you need
 to accumulate the different wait times.


Yes, we can wait until the prototype is ready.



  One general point: won't it be a bit difficult to parse a log line having
  so many times? Should we consider logging with some special marker for
  each time (for example: tup_lock_wait_time:1000ms)?
 
 
  We should optimize log_line_prefix processing instead.
 
  The proposed options are interesting for enterprise use, when you have
  smarter tools for log entry processing and need a complex view of the
  performance of billions of queries - where cancel time and lock time are
  important pieces in the mosaic of the server's fitness.

 Exactly, though this might not be directly related to this patch, but
 having
 it would be useful.


I'm not afraid of the performance impact (surely, it should be tested
first). My previous implementation at GoodData was based on an actively
used mechanism - it didn't introduce any new overhead.

But it should be verified on a prototype.

regards

Pavel



 With Regards,
 Amit Kapila.
 EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-08 Thread Robert Haas
On Fri, Apr 4, 2014 at 10:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 22, 2014 at 10:17 AM, Noah Misch n...@leadboat.com wrote:
 Yeah, abandoning the state file is looking attractive.

 Here's a draft patch getting rid of the state file.  This should
 address concerns raised by Heikki and Fujii Masao and echoed by Tom
 that dynamic shared memory behaves differently than the main shared
 memory segment.  The control segment ID is stored in the System V
 shared memory block, so we're guaranteed that when using either System
 V or POSIX shared memory we'll always latch onto the control segment
 that matches up with the main shared memory segment we latched on to.
 Cleanup for the file-mmap and Windows methods doesn't need to change,
 because the former always just means clearing out $PGDATA/pg_dynshmem,
 and the latter is done automatically by the operating system.

 Comments?

Apparently not.  However, I'm fairly sure this is a step toward
addressing the complaints previously raised, even if there may be some
details people still want changed, so I've gone ahead and committed
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Doc typo in 9.28. Event Trigger Functions

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 2:39 AM, Ian Barwick i...@2ndquadrant.com wrote:
 Just a single missing 's'.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CREATE FOREIGN TABLE ( ... LIKE ... )

2014-04-08 Thread Robert Haas
On Mon, Apr 7, 2014 at 4:24 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-05 11:46:16 -0400, Tom Lane wrote:
 ISTM this is because the proposed feature is wrongheaded.  The basic
 concept of CREATE TABLE LIKE is that you're copying properties from
 another object of the same type.  You might or might not want every
 property, but there's no question of whether you *could* copy every
 property.  In contrast, what this is proposing to do is copy properties
 from (what might be) a plain table to a foreign table, and those things
 aren't even remotely the same kind of object.

 It would make sense to me to restrict LIKE to copy from another foreign
 table, and then there would be a different set of INCLUDING/EXCLUDING
 options that would be relevant (options yes, indexes no, for example).

 I actually think it's quite useful to create a foreign table that's the
 same shape as a local table. And the patch's approach of refusing to
 copy things that aren't supported sounds sane to me.
 Consider e.g. moving older partitioned data off to an archiving
 server. New local partitions are often created using CREATE TABLE LIKE,
 but that's not possible for the foreign ones.

+1.
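
For readers following along, the use case looks roughly like this. This is a
hypothetical sketch of the syntax under discussion; the feature is only a
proposal here, and the server and option names are made up:

-- local partition whose shape we want to reproduce remotely
CREATE TABLE events_2013 (event_id bigint, payload text, created_at timestamptz);

-- proposed: copy the column definitions into a foreign table on the
-- archiving server, skipping properties (indexes, etc.) that foreign
-- tables cannot have
CREATE FOREIGN TABLE archive_events_2013 (LIKE events_2013)
    SERVER archive_server
    OPTIONS (schema_name 'public', table_name 'events_2013');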

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Gregory Smith

On 4/7/14 2:59 AM, Craig Ringer wrote:

On 04/05/2014 03:57 AM, Andres Freund wrote:


c07) Updatable security barrier views.
  This needs a serious look by a committer.

I've been exercising it via row security and it's been looking pretty
solid. It isn't a huge or intrusive patch, and it's seen several rounds
of discussion during its development and refinement. (Thanks Dean).


Same here, nothing but good feedback from testing.  The updatable 
security barrier views patch has been sitting in Ready For Committer since 
late January.  Unfortunately, I just learned that some of the people who 
might commit in this area--Stephen Frost for example--thought there were 
still major outstanding issues with that part.


I (and I think Craig too) have been waiting for that to be picked up by a 
committer, Stephen was waiting for me or Craig to fix unrelated bugs, 
and that's where the CF process has been deadlocked on this one.


A core bugfix with locking in security barrier views is required 
before the regression tests can be fixed up properly, for one thing. 
Tom also expressed concerns about how plan invalidation works, though 
it's not yet clear whether that was just miscommunication about how it 
works on my part or whether there's a concrete problem there.


This is similarly stuck.  I'm not out of resources, there's just nothing 
I can do here myself.  For this to move forward, a committer needs to 
pick up the security barrier views part.  We also need a bug fix for the 
issue that's breaking the regression tests.  Once all that's done, RLS 
on top of updatable security barrier views might be committable.  But 
there's no way to tell for sure until those other two bits are sorted out.


I'd really love to finish this off for 9.4, but other projects have to 
come first. 


I have no other projects ahead of this for the remainder of this month.  
I just can't figure out what to do next until there's a committer (or 
committers, if someone else is going to take on the locking bug) 
identified.  I looked at the locking problem enough to know that one is 
over my head.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Greg Stark
On Mon, Apr 7, 2014 at 7:32 PM, Peter Geoghegan p...@heroku.com wrote:
 I think that Greg's choice of words was a little imprudent, but must
 be viewed in the context of an offline discussion during the hall
 track of pgConf NYC. Clearly Greg wasn't about to go off and
 unilaterally commit this. FWIW, I think I put him off the idea a few
 hours after he made those remarks, without intending for what I'd said
 to have that specific effect (or the opposite effect).

It was somewhere between your two interpretations. I intend to review
everything I can from the commitfest and then this patch, and if this
patch is ready for commit before feature freeze I was saying I would
go ahead and commit it. That would only happen if there was a pretty
solid consensus that my review was good and the patch was good, of
course.

The point of the commit fest is to ensure that all patches get
attention. That's why I would only look at this after I've reviewed
anything else from the commitfest that I feel up to reviewing. But if
I have indeed done so there's no point in not taking other patches as
well up to feature freeze. I don't have any intention of lowering our
review standards of course.

So let's table this discussion until the hypothetical case of me doing
lots of reviews *and* reviewing this patch *and* that review being
positive and decisive enough to commit after one review cycle.

-- 
greg




Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 11:59 AM, Gregory Smith gregsmithpg...@gmail.com wrote:
 I have no other projects ahead of this for the remainder of this month.  I
 just can't figure out what to do next until there's a committer (or
 committers, if someone else is going to take on the locking bug) identified.
 I looked at the locking problem enough to know that one is over my head.

What post or thread should I read for details about this locking bug?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Problem with displaying wide tables in psql

2014-04-08 Thread Greg Stark
On Sat, Feb 15, 2014 at 11:08 AM, Emre Hasegeli e...@hasegeli.com wrote:
 This is my review of the 3rd version of the patch. It is a useful
 improvement in my opinion. It worked well in my environment.


I'm reviewing this patch.

One thing to comment:

With no doc changes and no regression tests I was halfway inclined to
just reject it out of hand. To be fair there were no regression tests
for wrapped output prior to the patch but still I would have wanted to
see them added. We often pare down regression tests when committing
patches but it's easier to pare them down than write new ones and it
helps show the author's intention.

In this case I'm inclined to expand the regression tests. We've had
bugs in these formatting functions before and at least I find it hard
to touch code like this with any confidence that I'm not breaking
things. Formatting code ends up being pretty spaghetti-like easily and
there are lots of cases so it's easy to unintentionally break cases
you didn't realize you were touching.

In addition there are several cases of logic that looks like this:

if (x)
  ..
else
   {
 if (y)
   ...
 else
   ...
   }

I know there are other opinions on this but I find this logic very
difficult to follow. It's almost always clearer to refactor the
branches into a flat if / else if / else if / ... / else form. Even if
it results in some duplication of code (typically there's some trivial
bit of code outside the second if), it's easy to quickly see whether
all the cases are handled and understand whether any of the cases have
forgotten any steps.
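
For example, the snippet above would be flattened to something like this
(sketch only, mirroring the pseudocode style used earlier):

if (x)
  ...
else if (y)
  ...
else
  ...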

-- 
greg




Re: [HACKERS] Problem with displaying wide tables in psql

2014-04-08 Thread Andres Freund
On 2014-04-08 12:15:47 -0400, Greg Stark wrote:
 With no doc changes and no regression tests I was halfway inclined to
 just reject it out of hand. To be fair there were no regression tests
 for wrapped output prior to the patch but still I would have wanted to
 see them added. We often pare down regression tests when committing
 patches but it's easier to pare them down than write new ones and it
 helps show the author's intention.

I don't think this is easily testable that way - doesn't it rely on
determining the width of the terminal? Which you won't have when started
from pg_regress?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minor improvements in alter_table.sgml

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 5:05 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 Attached is a patch to improve the manual page for the ALTER TABLE command.

Do we really need to add a section for type_name when we already
have a section for OF type_name?

constraint_name is also used for adding a constraint using an index.
So it could not only be a constraint to alter, validate, or drop, but
also a new constraint name to be added.  Honestly, how much value is
there in even having a section for this?  Do we really want to
document constraint_name as name of an existing constraint, or the
name of a new constraint to be added?  It would be accurate, then,
but it also doesn't really tell you anything you didn't know already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Problem with displaying wide tables in psql

2014-04-08 Thread Greg Stark
On Tue, Apr 8, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com wrote:
 I don't think this is easily testable that way - doesn't it rely on
 determining the width of the terminal? Which you won't have when started
 from pg_regress?

There's a pset variable to set the target width so at least the
formatting code can be tested. It would be nice to have the ioctl at
least get called on the regression farm so we're sure we aren't doing
something unportable.

-- 
greg




Re: [HACKERS] four minor proposals for 9.5

2014-04-08 Thread Gregory Smith

On 4/6/14 2:46 PM, Pavel Stehule wrote:


The proposed options are interesting for enterprise use, when you have 
somewhat smarter tools for log entry processing and when you need a 
complex view of the performance of billions of queries - when cancel time 
and lock time are important pieces in the mosaic of a server's fitness.


I once sent a design proposal over for something I called Performance 
Events that included this.  It will be difficult to get everything 
people want to track into log_line_prefix macro form.  You're right that 
this particular one can probably be pushed into there, but you're adding 
four macros just for this feature. And that's only a fraction of what 
people expect from database per-query performance metrics.


The problem I got stuck on with the performance event project was where 
to store the data collected.  If you want to keep up with read rates, 
you can't use the existing log infrastructure.  It has to be something 
faster, lighter.  I wanted to push the data into shared memory somewhere 
instead.  Then some sort of logging consumer could drain that queue and 
persist it to disk.
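
To make the shape of that design concrete, here is a minimal
single-producer/single-consumer ring buffer in plain C11.  The names,
sizes, and event fields are all made up for illustration; it is not a
proposed PostgreSQL API, and a real implementation would need either a
multi-producer variant or one queue per backend:

#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_SLOTS 1024            /* hypothetical capacity (power of two) */

typedef struct PerfEvent            /* hypothetical per-query event record */
{
    double      duration_ms;
    long        rows;
} PerfEvent;

typedef struct EventQueue           /* would live in shared memory */
{
    _Atomic unsigned head;          /* next slot to write (producer) */
    _Atomic unsigned tail;          /* next slot to read (consumer) */
    PerfEvent   slots[QUEUE_SLOTS];
} EventQueue;

/* Backend side: enqueue without blocking; drop the event if the queue is full. */
static bool
queue_push(EventQueue *q, const PerfEvent *ev)
{
    unsigned    head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned    tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail >= QUEUE_SLOTS)
        return false;               /* full: losing an event beats stalling a query */
    q->slots[head % QUEUE_SLOTS] = *ev;
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: drain one event, e.g. before persisting it to disk. */
static bool
queue_pop(EventQueue *q, PerfEvent *out)
{
    unsigned    tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned    head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail == head)
        return false;               /* empty */
    *out = q->slots[tail % QUEUE_SLOTS];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

int
main(void)
{
    static EventQueue q;            /* stand-in for a shared memory segment */
    PerfEvent   ev = {12.5, 42};
    PerfEvent   got;

    queue_push(&q, &ev);
    return (queue_pop(&q, &got) && got.rows == 42) ? 0 : 1;
}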


Since then, we've had a number of advances, particularly these two:

-Dynamic shared memory allocation.
-Query log data from pg_stat_statements can persist.

With those important fundamentals available, I'm wandering around right 
now trying to get development resources to pick the whole event logging 
idea up again.  The hardest parts of the infrastructure I was stuck on 
in the past are in the code today.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] four minor proposals for 9.5

2014-04-08 Thread Pavel Stehule
2014-04-08 18:34 GMT+02:00 Gregory Smith gregsmithpg...@gmail.com:

 On 4/6/14 2:46 PM, Pavel Stehule wrote:


 The proposed options are interesting for enterprise use, when you have
 somewhat smarter tools for log entry processing and when you need a complex
 view of the performance of billions of queries - when cancel time and lock time
 are important pieces in the mosaic of a server's fitness.


 I once sent a design proposal over for something I called Performance
 Events that included this.  It will be difficult to get everything people
 want to track into log_line_prefix macro form.  You're right that this
 particular one can probably be pushed into there, but you're adding four
 macros just for this feature. And that's only a fraction of what people
 expect from database per-query performance metrics.

 The problem I got stuck on with the performance event project was where to
 store the data collected.  If you want to keep up with read rates, you
 can't use the existing log infrastructure.  It has to be something faster,
 lighter.  I wanted to push the data into shared memory somewhere instead.
  Then some sort of logging consumer could drain that queue and persist it
 to disk.

 Since then, we've had a number of advances, particularly these two:

 -Dynamic shared memory allocation.
 -Query log data from pg_stat_statements can persist.


I know nothing about your proposal, so I cannot talk about it. But I am
sure that any memory-based solution is not practical for us. It can work well
for cumulative values (per database), but we need two views - individual
(per query) and cumulative (per database, per database server). We
process billions of queries per day, and for us it is more practical to use
external log processing tools. But I understand that for a large group of
users a memory-based solution can be perfect, and I think these designs
should coexist - we log slow queries (and their plans) and we use
pg_stat_statements - and users can choose the best method for their
environment.

Probably some API (some data) can be shared by both designs.

Regards

Pavel



 With those important fundamentals available, I'm wandering around right
 now trying to get development resources to pick the whole event logging
 idea up again.  The hardest parts of the infrastructure I was stuck on in
 the past are in the code today.

 --
 Greg Smith greg.sm...@crunchydatasolutions.com
 Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/



Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 3:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 1. Avoid fmgr and shim overhead
 2. Use strxfrm to produce a pseudo-leading key that's cheaper to compare.

 In that case, these changes need to be analyzed separately. You don't get to
 make up for the losses by the second part by the gains from the first
 part. We could commit just the first part (for 9.5!), and that has to be the
 baseline for the second part.

Yes, that's right. Robert already submitted a patch that only did 1)
almost 2 years ago. That should have been committed at the time, but
wasn't. At the time, the improvement was put at about 7% by Robert. It
would be odd to submit the same patch that Robert withdrew already.

Why shouldn't 2) be credited with the same benefits as 1) ? It's not
as if the fact that the strxfrm() trick uses SortSupport is a
contrivance. I cannot reasonably submit the two separately, unless the
second is a cumulative patch. By far the largest improvements come
from 2), while 1) doesn't regress anything.
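
For anyone skimming the thread, the idea behind 2) can be sketched in a
few lines of standalone C.  This is only an illustration of the general
strxfrm() prefix technique (the abbreviated-key width and struct layout
are invented), not the patch itself:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>

#define ABBREV_LEN 8                    /* hypothetical abbreviated-key width */

typedef struct SortTuple
{
    const char *value;                  /* original string */
    unsigned char abbrev[ABBREV_LEN];   /* strxfrm() prefix, zero padded */
} SortTuple;

static void
make_abbrev(SortTuple *t)
{
    size_t      need = strxfrm(NULL, t->value, 0) + 1;
    char       *buf = malloc(need);

    /* One strxfrm() per value at setup time, instead of strcoll() per comparison. */
    strxfrm(buf, t->value, need);
    memset(t->abbrev, 0, ABBREV_LEN);
    memcpy(t->abbrev, buf, strnlen(buf, ABBREV_LEN));
    free(buf);
}

static int
compare_tuples(const void *a, const void *b)
{
    const SortTuple *ta = a;
    const SortTuple *tb = b;
    int         cmp = memcmp(ta->abbrev, tb->abbrev, ABBREV_LEN);

    if (cmp != 0)
        return cmp;                     /* the cheap comparison decided it */
    /* Abbreviated keys tied (or truncated): do the authoritative comparison. */
    return strcoll(ta->value, tb->value);
}

int
main(void)
{
    SortTuple   tuples[] = {{"pear"}, {"Peach"}, {"peach"}, {"pea"}};
    int         n = sizeof(tuples) / sizeof(tuples[0]);
    int         i;

    setlocale(LC_COLLATE, "");          /* sort under the environment's collation */
    for (i = 0; i < n; i++)
        make_abbrev(&tuples[i]);
    qsort(tuples, n, sizeof(SortTuple), compare_tuples);
    for (i = 0; i < n; i++)
        printf("%s\n", tuples[i].value);
    return 0;
}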


-- 
Peter Geoghegan




[HACKERS] Buffer Allocation Concurrency Limits

2014-04-08 Thread Jason Petersen
In December, Metin (a coworker of mine) discussed an inability to scale a 
simple task (parallel scans of many independent tables) to many cores (it’s 
here). As a ramp-up task at Citus I was tasked to figure out what the heck was 
going on here.

I have a pretty extensive writeup here (whose length is more a result of my 
inexperience with the workings of PostgreSQL than anything else) and was 
looking for some feedback.

In short, my conclusion is that a working set larger than memory results in 
backends piling up on BufFreelistLock. As much as possible I removed anything 
that could be blamed for this:

Hyper-Threading is disabled
zone reclaim mode is disabled
numactl was used to ensure interleaved allocation
kernel.sched_migration_cost was set very high to effectively disable migration
kernel.sched_autogroup_enabled was disabled
transparent hugepage support was disabled

For a way forward, I was thinking the buffer allocation sections could use some 
of the atomics Andres added here. Rather than workers grabbing BufFreelistLock 
to iterate the clock hand until they find a victim, the algorithm could be 
rewritten in a lock-free style, allowing workers to move the clock hand in 
tandem.
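
As a rough sketch of what moving the clock hand in tandem could look
like, using plain C11 atomics rather than PostgreSQL's own primitives
(the names and buffer-state layout are invented for illustration, and a
real patch would still have to handle pinning races, usage-count
clamping, and the free list):

#include <stdatomic.h>

#define NBUFFERS 16384                  /* hypothetical pool size */

typedef struct BufferState
{
    _Atomic int usage_count;            /* decremented by the sweep */
    _Atomic int pinned;                 /* nonzero while in use */
} BufferState;

static _Atomic unsigned long clock_hand;    /* shared sweep position */
static BufferState buffers[NBUFFERS];

/*
 * Claim the next clock position without a global lock: fetch_add hands each
 * backend its own slot to examine, so many backends can sweep concurrently.
 */
static int
clock_sweep_next(void)
{
    unsigned long pos = atomic_fetch_add_explicit(&clock_hand, 1,
                                                  memory_order_relaxed);

    return (int) (pos % NBUFFERS);
}

/* Returns a victim buffer id, or -1 if the caller should probe again. */
static int
try_get_victim(void)
{
    int         buf = clock_sweep_next();
    BufferState *b = &buffers[buf];

    if (atomic_load_explicit(&b->pinned, memory_order_acquire) != 0)
        return -1;                      /* in use: move on */

    /* Recently used buffers get a second chance, as in the clock algorithm. */
    if (atomic_fetch_sub_explicit(&b->usage_count, 1, memory_order_acq_rel) > 0)
        return -1;

    return buf;                         /* candidate; caller must still pin and recheck */
}

int
main(void)
{
    int         victim;

    do
        victim = try_get_victim();
    while (victim < 0);
    return victim >= 0 ? 0 : 1;
}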

Alternatively, the clock iteration could be moved off to a background process, 
similar to what Amit Kapila proposed here.

Is this assessment accurate? I know 9.4 changes a lot about lock organization, 
but last I looked I didn’t see anything that could alleviate this contention: 
are there any plans to address this?

—Jason



Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Heikki Linnakangas

On 04/08/2014 08:02 PM, Peter Geoghegan wrote:

On Tue, Apr 8, 2014 at 3:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

1. Avoid fmgr and shim overhead
2. Use strxfrm to produce a pseudo-leading key that's cheaper to compare.

In that case, these changes need to be analyzed separately. You don't get to
make up for the losses by the second part by the gains from the first
part. We could commit just the first part (for 9.5!), and that has to be the
baseline for the second part.


Yes, that's right. Robert already submitted a patch that only did 1)
almost 2 years ago. That should have been committed at the time, but
wasn't. At the time, the improvement was put at about 7% by Robert. It
would be odd to submit the same patch that Robert withdrew already.

Why shouldn't 2) be credited with the same benefits as 1) ? It's not
as if the fact that the strxfrm() trick uses SortSupport is a
contrivance. I cannot reasonably submit the two separately, unless the
second is a cumulative patch.


Sure, submit the second as a cumulative patch.


By far the largest improvements come
from 2), while 1) doesn't regress anything.


Right. But 1) is the baseline we need to evaluate 2) against.

- Heikki




Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Apr 8, 2014 at 11:59 AM, Gregory Smith gregsmithpg...@gmail.com 
 wrote:
  I have no other projects ahead of this for the remainder of this month.  I
  just can't figure out what to do next until there's a committer (or
  committers, if someone else is going to take on the locking bug) identified.
  I looked at the locking problem enough to know that one is over my head.
 
 What post or thread should I read for details about this locking bug?

I think it was discussed a couple times, but it's here:

http://www.postgresql.org/message-id/531d3355.6010...@2ndquadrant.com

Thanks,

Stephen




Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
  On 04/05/2014 03:57 AM, Andres Freund wrote:
  r04) Row-security based on Updatable security barrier views
  This one's fate seems to be hard to judge without c07.
 
  Open issues remain with this patch, and resources for working on it in
  9.4 have run out.
 
  It is not ready for commit. A core bugfix with locking in security
  barrier views is required before the regression tests can be fixed up
  properly, for one thing. Tom also expressed concerns about how plan
  invalidation works, though it's not yet clear whether that was just
  miscommunication about how it works on my part or whether there's a
  concrete problem there.
 
  I'd really love to finish this off for 9.4, but other projects have to
  come first.
 
 Given that, I think we should go ahead and mark this one Returned With
 Feedback.  It's past time to be punting anything that doesn't have a
 serious chance of getting committed for 9.4.

I'm a bit confused on this point- is the only issue the *preexisting*
bug with security barrier views?  I agree we need to fix that, but I'd
really like to see that fixed and backpatched to address the risk in
back-branches.  I had understood there to be *other* issues with this,
which is why I hadn't spent time on it.

Craig, in general, I'd argue that a pre-existing bug isn't a reason that
a patch isn't ready for commit.  The bug may need to be fixed before the
patch goes in, but saying a patch isn't ready implied, to me at least,
issues with the *patch*, which it sounds like isn't the case here.

Thanks,

Stephen




Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Greg Stark
On Mon, Apr 7, 2014 at 12:06 AM, Rajeev rastogi
rajeev.rast...@huawei.com wrote:

 Deadlock Detection:

 It is possible that the main or an upper autonomous transaction has taken a lock 
 on some resource which is required by a lower autonomous transaction. If that 
 happens, a deadlock will occur. To solve this, each main and autonomous 
 transaction will hold a list of all the locks it has acquired (in PROLOCK), 
 based on which the deadlock will be resolved.


I'm not sure how this would work out internally -- it would depend on
how you plan to allocate the new transaction in the internal data
structures -- but the natural way to prevent/detect deadlocks would be
to have the parent transaction immediately take a lock on the
autonomous transaction as soon as it's started. That would cause any
lock in the autonomous transaction which caused it to wait on the
parent transaction to be detected as a deadlock. It would also cause
any monitoring tool to correctly show the parent transaction as
waiting on the autonomous transaction to finish.
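
To put that in terms of the existing lock manager primitives, here is a
purely hypothetical sketch of how the parent's wait could be expressed,
assuming the autonomous transaction has its own XID and can keep running
while the parent waits (as in the dblink-style kludge); the helper name
is invented:

#include "postgres.h"

#include "storage/lock.h"

/*
 * Hypothetical helper called by the parent right after launching an
 * autonomous transaction.  Every transaction holds ExclusiveLock on its
 * own XID, so acquiring ShareLock here blocks until the autonomous
 * transaction commits or aborts, and the wait edge is visible to the
 * deadlock detector and to monitoring tools.  This mirrors what
 * XactLockTableWait() does for ordinary transactions.
 */
static void
WaitForAutonomousXact(TransactionId at_xid)
{
    LOCKTAG     tag;

    SET_LOCKTAG_TRANSACTION(tag, at_xid);
    (void) LockAcquire(&tag, ShareLock, false, false);
    LockRelease(&tag, ShareLock, false);
}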

If the autonomous transaction is actually a separate procarray entry
(which I suspect it would have to be, much like prepared transactions
and the dblink connections which are commonly used to kludge
autonomous transactions) then this should be fairly painless. If you
implement some kind of saving and restoring procarray data then it
probably wouldn't work out.


-- 
greg




Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0

2014-04-08 Thread Bruce Momjian
On Sun, Apr  6, 2014 at 11:45:59AM +0530, Amit Kapila wrote:
 On Tue, Apr 1, 2014 at 6:31 AM, Bruce Momjian br...@momjian.us wrote:
  I reviewed this patch and you are correct that we are not handling
  socket() and accept() returns properly on Windows.  We were doing it
  properly in most place in the backend, but your patch fixes the
  remaining places:
 
  
  http://msdn.microsoft.com/en-us/library/windows/desktop/ms740516%28v=vs.85%29.aspx
 
  However, libpq doesn't seem to be doing much to handle Windows properly
  in this area.  I have adjusted libpq to map socket to -1, but the proper
  fix is to distribute pgsocket and PGINVALID_SOCKET checks throughout the
  libpq code.  I am not sure how to handle PQsocket() --- should it still
  return -1?
 
 I think changing PQsocket() can impact all existing applications as
 it is mentioned in docs that result of -1 indicates that no server
 connection is currently open..  Do you see any compelling need to
 change return value of PQSocket() after your patch?

No, I do not.  In fact, the SSL_get_fd() call in secure_read() returns a
signed integer too, and that is passed around like a socket, so in fact
the SSL API doesn't even allow us to get an unsigned value on Windows in
all cases.

  Having the return value be conditional on the operating
  system is ugly.  How much of this should be backpatched?
 
 I think it's okay to back patch all the changes.
 Is there any part in patch which you feel is risky to back patch?

Well, we would not backpatch this if it is just a stylistic fix, and I
am starting to think it is just a style issue.  This MSDN website says -1,
SOCKET_ERROR, and INVALID_SOCKET are very similar:


http://msdn.microsoft.com/en-us/library/windows/desktop/cc507522%28v=vs.85%29.aspx

and this Stackoverflow thread says the same:


http://stackoverflow.com/questions/10817252/why-is-invalid-socket-defined-as-0-in-winsock2-h-c

In fact, this C program compiled by gcc on Debian issues no compiler
warnings and prints 'hello', showing that -1 and ~0 compare as equal:

#include <stdio.h>

int
main(int argc, char **argv)
{
    int             i;
    unsigned int    j;

    i = -1;
    j = ~0;

    if (i == j)
        printf("hello\n");

    return 0;
}

meaning our technically incorrect code still computes the right result.

   Why aren't we
  getting warnings on Windows about assigning the socket() return value to
  an integer?
 
 I think by default Windows doesn't give warning for such code even at Warning
 level 4.  I have found one related link:
 http://stackoverflow.com/questions/75385/make-vs-compiler-catch-signed-unsigned-assignments
 
  Updated patch attached.
 
 It seems you have missed to change at below places.
 
 1.
 int
 pg_foreach_ifaddr(PgIfAddrCallback callback, void *cb_data)
 sock = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0);
 if (sock == SOCKET_ERROR)

Well, the actual problem here is that WSASocket() returns INVALID_SOCKET
per the documentation, not SOCKET_ERROR.  I did not use PGINVALID_SOCKET
here because this is Windows-specific code, defining 'sock' as SOCKET. 
We could have sock defined as pgsocket, but because this is Windows code
already, it doesn't seem wise to mix portability code in there.

 2.
 pgwin32_waitforsinglesocket(SOCKET s, int what, int timeout)
 {
 static HANDLE waitevent = INVALID_HANDLE_VALUE;
 static SOCKET current_socket = -1;

Yes, that -1 is wrong and I have changed it to INVALID_SOCKET, again
using the same rules that say PGINVALID_SOCKET doesn't make sense here
as it is Windows-specific code.

I am attaching an updated patch, which explains the PQsocket() return
value issue, and fixes the items listed above.  I am inclined to apply
this just to head for correctness, and modify libpq to use pgsocket
consistently in a follow-up patch.

This is not like the readdir() fix we had to backpatch because that was
clearly not catching errors.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
new file mode 100644
index d062c1d..8fa9aa7
*** a/src/backend/libpq/auth.c
--- b/src/backend/libpq/auth.c
*** ident_inet(hbaPort *port)
*** 1463,1469 
  
	sock_fd = socket(ident_serv->ai_family, ident_serv->ai_socktype,
	 ident_serv->ai_protocol);
! 	if (sock_fd < 0)
  	{
  		ereport(LOG,
  (errcode_for_socket_access(),
--- 1463,1469 
  
	sock_fd = socket(ident_serv->ai_family, ident_serv->ai_socktype,
	 ident_serv->ai_protocol);
! 	if (sock_fd == PGINVALID_SOCKET)
  	{
  		ereport(LOG,
  (errcode_for_socket_access(),
*** ident_inet(hbaPort *port)
*** 1543,1549 
  	ident_response)));
  
  ident_inet_done:
! 	if (sock_fd >= 0)
  		closesocket(sock_fd);
  	pg_freeaddrinfo_all(remote_addr.addr.ss_family, ident_serv);
  	

Re: [HACKERS] ipc_test

2014-04-08 Thread Greg Stark
On Mon, Apr 7, 2014 at 10:43 AM, Robert Haas robertmh...@gmail.com wrote:
 OK, done.  One less thing to worry about when committing!

Also one less thing to cause headaches with etags and similar tools.
It always drove me nuts that I was constantly being sent to ipc_test
files for various typedefs. Thanks!

-- 
greg




Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Alvaro Herrera
Greg Stark wrote:

 If the autonomous transaction is actually a separate procarray entry
 (which I suspect it would have to be, much like prepared transactions
 and the dblink connections which are commonly used to kludge
 autonomous transactions) then this should be fairly painless. If you
 implement some kind of saving and restoring procarray data then it
 probably wouldn't work out.

I don't have time to digest this proposal ATM, but in previous occasion
when we have discussed autonomous transactions (ATs), we have always
considered natural that they have their own procarray entries; there are
too many strange issues otherwise.

Since the number of procarray entries is fixed at startup time, one
natural consequence of this is that the number of ATs in flight at any
moment is also fixed.  Normally we consider allocating a single AT per
session to be sufficient.  So you can't have one AT start another AT,
for instance -- that seems a reasonable restriction.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ipc_test

2014-04-08 Thread Alvaro Herrera
Greg Stark wrote:
 On Mon, Apr 7, 2014 at 10:43 AM, Robert Haas robertmh...@gmail.com wrote:
  OK, done.  One less thing to worry about when committing!
 
 Also one less thing to cause headaches with etags and similar tools.
 It always drove me nuts that I was constantly being sent to ipc_test
 files for various typedefs. Thanks!

Yeah, my thoughts exactly.  Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached

2014-04-08 Thread Peter Geoghegan
On Mon, Apr 7, 2014 at 8:29 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Documentation of jsonb tells that jsonb documents should be kept at a
 reasonable size to reduce lock contention, but there is no mention of
 size limitation for indexes:
 http://www.postgresql.org/docs/devel/static/datatype-json.html

It seems like your complaint is that this warrants special
consideration from the jsonb docs, due to this general limitation
being particularly likely to affect jsonb users. Is that accurate?

The structure of the JSON in your test case is quite atypical, since
there is one huge string containing each of the translations, rather
than a bunch of individual array elements (one per translation) or a
bunch of object pairs.

As it happens, just this morning I read that MongoDB's new version 2.6
now comes with stricter enforcement of key length:
http://docs.mongodb.org/master/release-notes/2.6-compatibility/#index-key-length-incompatibility
. While previous versions just silently failed to index values that
were inserted, there is now a 1024 limit imposed on the total size of
indexed values.

-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 10:10 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Right. But 1) is the baseline we need to evaluate 2) against.

I don't agree with that. Surely we're concerned with not regressing
cases that people actually care about, which in practical terms means
the changes of a single release. While I guess I'm fine with
structuring the patch like that, I don't think it's fair that the
strxfrm() stuff doesn't get credit for not regressing those cases so
badly just because they're only ameliorated by the fmgr-eliding stuff.
While I'm concerned about worst case performance myself, I don't want
to worry about Machiavelli rather than Murphy. What collation did you
use for your test-case?

The fmgr-eliding stuff is only really valuable in that it ameliorates
the not-so-bad regressions, and is integral to the strxfrm() stuff.
Let's not lose sight of the fact that (if we take TPC style benchmarks
as representative) the majority of text sorts can be made at least 3
times faster.

-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 3:10 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Apr 8, 2014 at 10:10 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Right. But 1) is the baseline we need to evaluate 2) against.

 I don't agree with that. Surely we're concerned with not regressing
 cases that people actually care about, which in practical terms means
 the changes of a single release.

No, we're concerned about ending up with the best possible
performance.  That could mean applying the fmgr-elision but not the
other part.  Whether the other part is beneficial is based on how it
compares to the performance post-fmgr-elision.

As an oversimplified example, suppose someone were to propose two
patches, one that makes PostgreSQL ten times as fast and the other of
which slows it down by a factor of five.  If someone merges those two
things into a single combined patch, we would surely be foolish to
apply the whole thing.  The right thing would be to separate them out
and apply only the first one.  Every change has to stand on its own
two feet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 2:43 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Greg Stark wrote:
 If the autonomous transaction is actually a separate procarray entry
 (which I suspect it would have to be, much like prepared transactions
 and the dblink connections which are commonly used to kludge
 autonomous transactions) then this should be fairly painless. If you
 implement some kind of saving and restoring procarray data then it
 probably wouldn't work out.

 I don't have time to digest this proposal ATM, but in previous occasion
 when we have discussed autonomous transactions (ATs), we have always
 considered natural that they have their own procarray entries; there are
 too many strange issues otherwise.

 Since the number of procarray entries is fixed at startup time, one
 natural consequence of this is that the number of ATs in flight at any
 moment is also fixed.  Normally we consider allocating a single AT per
 session to be sufficient.  So you can't have one AT start another AT,
 for instance -- that seems a reasonable restriction.

It depends.  A lot of Oracle users are used to having autonomous
transactions be very cheap, so you can just mark random procedures as
running in an autonomous transaction and forget about it.  If the call
stack is several levels deep, then you could easily have one such
procedure call another such procedure.  Of course, you may feel that's
bad practice or that we shouldn't emulate what $COMPETITOR does, and I
agree we don't have to necessarily do it that way just because they do
it that way, but I'm not sure it's accurate to say that nobody will
care.

I'm also pretty unconvinced that multiple PGPROCs is the right way to
go.  First, PGPROCs have a bunch of state in them that is assumed to
exist once per backend.  We might find pretty substantial code churn
there if we try to go change that.  Second, why do other backends
really need to know about our ATs?  As far as I can see, if other
backends see the AT as a subtransaction of our top-level transaction
up until it actually commits, that ought to be just fine.  Maybe the
backend needs to internally frob visibility rules, but that's not a
matter for shared memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Andres Freund
On 2014-04-08 15:39:18 -0400, Robert Haas wrote:
 I'm also pretty unconvinced that multiple PGPROCs is the right way to
 go.  First, PGPROCs have a bunch of state in them that is assumed to
 exist once per backend.  We might find pretty substantial code churn
 there if we try to go change that.  Second, why do other backends
 really need to know about our ATs?  As far as I can see, if other
 backends see the AT as a subtransaction of our top-level transaction
 up until it actually commits, that ought to be just fine.  Maybe the
 backend needs to internally frob visibility rules, but that's not a
 matter for shared memory.

Agreed. That's also how I imagined things to work.

I think except the visibility semantics, there's really not that much to
do if we were to reuse the subtransaction framework. There's some
complications with Hot Standby, but I think those can be solved.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GiST support for inet datatypes

2014-04-08 Thread Tom Lane
I wrote:
 [ inet-gist-v6.patch ]

Committed with some additional documentation work.  Thanks for
submitting this!

regards, tom lane




Re: [HACKERS] GiST support for inet datatypes

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 [ inet-gist-v6.patch ]

 Committed with some additional documentation work.  Thanks for
 submitting this!

NICE.  I'd like to tell you how excited I am about this part:

# It also handles a new network
# operator inet && inet (overlaps, a/k/a is supernet or subnet of),
# which is expected to be useful in exclusion constraints.

...but I can't, because my mouth is too full of drool.  Wouldn't you
really want that more for cidr than for inet, though?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GiST support for inet datatypes

2014-04-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 NICE.  I'd like to tell you how excited I am about this part:

 # It also handles a new network
 # operator inet && inet (overlaps, a/k/a is supernet or subnet of),
 # which is expected to be useful in exclusion constraints.

 ...but I can't, because my mouth is too full of drool.  Wouldn't you
 really want that more for cidr than for inet, though?

Probably, but it works with either since they're binary-compatible.

regards, tom lane




Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm also pretty unconvinced that multiple PGPROCs is the right way to
 go.  First, PGPROCs have a bunch of state in them that is assumed to
 exist once per backend.  We might find pretty substantial code churn
 there if we try to go change that.  Second, why do other backends
 really need to know about our ATs?  As far as I can see, if other
 backends see the AT as a subtransaction of our top-level transaction
 up until it actually commits, that ought to be just fine.

If we can make it work like that, sure.  I'm a bit worried about how you'd
decouple a subtransaction and commit it atomically ... or if that's not
atomic, will it create any problems?  The point being that you need to
change both pg_subtrans and pg_clog to make that state transition.

regards, tom lane




Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Andres Freund
On 2014-04-08 16:13:21 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I'm also pretty unconvinced that multiple PGPROCs is the right way to
  go.  First, PGPROCs have a bunch of state in them that is assumed to
  exist once per backend.  We might find pretty substantial code churn
  there if we try to go change that.  Second, why do other backends
  really need to know about our ATs?  As far as I can see, if other
  backends see the AT as a subtransaction of our top-level transaction
  up until it actually commits, that ought to be just fine.
 
 If we can make it work like that, sure.  I'm a bit worried about how you'd
 decouple a subtransaction and commit it atomically ... or if that's not
 atomic, will it create any problems?  The point being that you need to
 change both pg_subtrans and pg_clog to make that state transition.

I think it can be made to work sensibly - while those states are changed it
will still appear to be running via the procarray. There's some fun
around suboverflowed entries, but I think that can be handled by
reserving an entry for autonomous transactions.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Call for GIST/GIN/SP-GIST opclass documentation

2014-04-08 Thread Tom Lane
I just created sections in the SGML manual chapters about GIST, GIN, and
SP-GIST to hold documentation about the standard opclasses provided for
them:

http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html
http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html
http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html

We never had any well-defined place to discuss such opclasses before,
but I think it's past time we did.  I envision these sections as places to
document, for example, the difference between the two jsonb gin opclasses.
I put this text in about that:

Of the two operator classes for type jsonb, jsonb_ops is the
default. jsonb_hash_ops supports fewer operators but will work with
larger indexed values than jsonb_ops can support.

Is that accurate?  Do we need to say more?

For the moment what's there is mostly just tables of the core opclasses
and the operators they support.  If anyone can think of specific additions
that should be there, please send in patches.

(BTW, I didn't worry about btree and hash because they don't have such
a wide variety of opclass behaviors.)

regards, tom lane




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 12:31 PM, Robert Haas robertmh...@gmail.com wrote:
 No, we're concerned about ending up with the best possible
 performance.  That could mean applying the fmgr-elision but not the
 other part.  Whether the other part is beneficial is based on how it
 compares to the performance post-fmgr-elision.

I agree with everything you say here, but I'm puzzled only because
it's overwhelmingly obvious that the strxfrm() stuff is where the
value is. You can dispute whether or not I should have made various
tweaks, and you probably should, but the basic value of that idea is
very much in evidence already. You yourself put the improvements of
fmgr-elision alone at ~7% back in 2012 [1]. At the time, Noah said
that he didn't think it was worth bothering with that patch for what
he considered to be a small gain, a view which I did not share at the
time.

What I have here looks like it speeds things up a little over 200% (so
a little over 300% of the original throughput) with a single client
for many representative cases. That's a massive difference, to the
point that I don't see a lot of sense in considering fmgr-elision
alone separately.

[1] 
http://www.postgresql.org/message-id/CA+Tgmoa8by24gd+YbuPX=5gsgmn0w5sgipzwwq7_8is26vl...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] GiST support for inet datatypes

2014-04-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Apr 8, 2014 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I wrote:
  [ inet-gist-v6.patch ]
 
  Committed with some additional documentation work.  Thanks for
  submitting this!
 
 NICE.  I'd like to tell you how excited I am about this part:
 
 # It also handles a new network
 # operator inet && inet (overlaps, a/k/a is supernet or subnet of),
 # which is expected to be useful in exclusion constraints.
 
 ...but I can't, because my mouth is too full of drool.  Wouldn't you
 really want that more for cidr than for inet, though?

You realize ip4r has had all this and more for, like, forever, right?
It's also more efficient wrt storage and generally more 'sane' regarding
operators, etc..

Just thought I'd share..  If you have a use-case, ip4r is what
everyone's been using for quite some time.  Also, yes, it supports both
ipv4 and ipv6, despite the name.

Thanks,

Stephen




Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 1:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I just created sections in the SGML manual chapters about GIST, GIN, and
 SP-GIST to hold documentation about the standard opclasses provided for
 them:

I think that that's a good idea. I too was bothered by this omission.

 Of the two operator classes for type jsonb, jsonb_ops is the
 default. jsonb_hash_ops supports fewer operators but will work with
 larger indexed values than jsonb_ops can support.

 Is that accurate?  Do we need to say more?

Well, I'm not sure that it's worth noting there, but as you probably
already know jsonb_hash_ops will perform a lot better than the default
GIN opclass, and will have much smaller indexes. FWIW I think that the
size limitation is overblown, and performance is in fact the
compelling reason to prefer jsonb_hash_ops, although it's probably
incongruous to explain the issues that way in this section of the
docs. It probably suffices that that is covered in the JSON Types
section.

-- 
Peter Geoghegan




Re: [HACKERS] psql \d+ and oid display

2014-04-08 Thread Bruce Momjian
On Tue, Apr  1, 2014 at 01:36:02PM -0400, Robert Haas wrote:
 Although I agree with the general principle, I'm skeptical in this
 case.  There are a bunch of table-level options, and I don't think
 it's very reasonable to expect that users are going to remember which
 ones are going to be displayed under which conditions, especially if
 we change it from release to release.  If somebody doesn't see the
 has OIDs line, are they going to conclude that the table doesn't
 have OIDs, or are they going to conclude that psql doesn't ever
 display that information and they need to query pg_class manually?
 I'm sure at least some people will guess wrong.
 
 Now, admittedly, this is not the hill I want to die on.  The future of
 PostgreSQL doesn't rest on whatever ends up happening here.  But I
 think what's going on on this thread is a lot of tinkering with stuff
 that's not really broken.  I'm not saying don't ever change psql
 output.  What I'm saying is that changing psql output that is
 absolutely fine the way it is does not represent meaningful progress.
 The replica identity and has OIDs lines are a negligible
 percentage of what \d+ spits out - in a test I just did, 2 out of 37
 lines on \d+ pg_class.  I can't accept that tinkering with that is
 reducing clutter in any meaningful way; it's just change for the sake
 of change.

I looked over the psql code and Robert is right that Has OIDs and
Identity Replica are the only two cases where we are printing status,
rather than unique strings like an index name.  The only example I could
find was where we print Number of child tables in \d when there are
child tables.  Of course, we print the child table names when they exist
in \d+, but we don't print No child tables with \d or \d+, which seems
logical.

If we ignore backward compatibility, then Has OIDs and Identity
Replica are similar.  One thing that strongly (for me) supports not
always printing them is that I expect more people will be confused by
the mention of OIDs or Identity Replica than will actually care about
these features.  For example, if we always printed Child tables: 0,
more people would be confused than helped.

I don't think there are enough other cases for \d++ to make sense.

I am not sure where to go from here.  One idea would be to tally the people
who have commented in this thread, but a more thorough approach would
be to have a vote.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] psql \d+ and oid display

2014-04-08 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 If we ignore backward compatibility, then Has OIDs and Identity
 Replica are similar.  One thing that strongly (for me) supports not
 always printing them is that I expect more people will be confused by
 the mention of OIDs or Identity Replica than will actually care about
 these features.  For example, if we always printed Child tables: 0,
 more people would be confused than helped.

This is a good argument, actually: these fields are not only noise for
most people, but confusing if you don't know the feature they are
talking about.

regards, tom lane




Re: [HACKERS] psql \d+ and oid display

2014-04-08 Thread Bruce Momjian
On Tue, Apr  8, 2014 at 05:29:45PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  If we ignore backward compatibility, then Has OIDs and Identity
  Replica are similar.  One thing that strongly (for me) supports not
  always printing them is that I expect more people will be confused by
  the mention of OIDs or Identity Replica than will actually care about
  these features.  For example, if we always printed Child tables: 0,
  more people would be confused than helped.
 
 This is a good argument, actually: these fields are not only noise for
 most people, but confusing if you don't know the feature they are
 talking about.

Let me put it this way:  I didn't know what Identity Replica meant
when I saw it in psql.  Now, some might say that is expected, but still. ;-)

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached

2014-04-08 Thread Oleg Bartunov
We are working to avoid this limitation.

On Tue, Apr 8, 2014 at 10:54 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Apr 7, 2014 at 8:29 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Documentation of jsonb tells that jsonb documents should be kept at a
 reasonable size to reduce lock contention, but there is no mention of
 size limitation for indexes:
 http://www.postgresql.org/docs/devel/static/datatype-json.html

 It seems like your complaint is that this warrants special
 consideration from the jsonb docs, due to this general limitation
 being particularly likely to affect jsonb users. Is that accurate?

 The structure of the JSON in your test case is quite atypical, since
 there is one huge string containing each of the translations, rather
 than a bunch of individual array elements (one per translation) or a
 bunch of object pairs.

 As it happens, just this morning I read that MongoDB's new version 2.6
 now comes with stricter enforcement of key length:
 http://docs.mongodb.org/master/release-notes/2.6-compatibility/#index-key-length-incompatibility
 . While previous versions just silently failed to index values that
 were inserted, there is now a 1024 limit imposed on the total size of
 indexed values.

 --
 Peter Geoghegan




Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation

2014-04-08 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Tue, Apr 8, 2014 at 1:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Of the two operator classes for type jsonb, jsonb_ops is the
 default. jsonb_hash_ops supports fewer operators but will work with
 larger indexed values than jsonb_ops can support.
 
 Is that accurate?  Do we need to say more?

 Well, I'm not sure that it's worth noting there, but as you probably
 already know jsonb_hash_ops will perform a lot better than the default
 GIN opclass, and will have much smaller indexes. FWIW I think that the
 size limitation is overblown, and performance is in fact the
 compelling reason to prefer jsonb_hash_ops, although it's probably
 incongruous to explain the issues that way in this section of the
 docs. It probably suffices that that is covered in the JSON Types
 section.

Well, the subtext is whether we should move that discussion to this
new section.  I think there is some comparable discussion in the
full-text-indexing chapter, too.

(BTW, wasn't there some discussion of changing our minds about which
one is the default?  We already have one bug report complaining about
jsonb_ops' size restriction, so that seems to be evidence in favor
of changing ...)

regards, tom lane




Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 (BTW, wasn't there some discussion of changing our minds about which
 one is the default?  We already have one bug report complaining about
 jsonb_ops' size restriction, so that seems to be evidence in favor
 of changing ...)

Yes, there was. I very nearly came down on the side of making
jsonb_hash_ops the default, but given that it doesn't make all
operators indexable, I ultimately decided against supporting that
course of action. I thought that that would be an odd limitation for
the default GIN opclass to have. It was a very close call in my mind,
and if you favor changing the default now, in light of the few
complaints we've heard, I think that's a reasonable decision. That
said, as I noted in the main -bugs thread, the case presented is
fairly atypical.


-- 
Peter Geoghegan




default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 (BTW, wasn't there some discussion of changing our minds about which
 one is the default?  We already have one bug report complaining about
 jsonb_ops' size restriction, so that seems to be evidence in favor
 of changing ...)

 Yes, there was. I very nearly came down on the side of making
 jsonb_hash_ops the default, but given that it doesn't make all
 operators indexable, I ultimately decided against supporting that
 course of action. I thought that that would be an odd limitation for
 the default GIN opclass to have. It was a very close call in my mind,
 and if you favor changing the default now, in light of the few
 complaints we've heard, I think that's a reasonable decision. That
 said, as I noted in the main -bugs thread, the case presented is
 fairly atypical.

Well, let me see if I understand the situation correctly:

* jsonb_ops supports more operators

* jsonb_hash_ops produces smaller, better-performing indexes

* jsonb_ops falls over on inputs with wide field values, but
jsonb_hash_ops does not

If that's an accurate summary then I would say that we've got
the default backwards.  I would much rather tell people you
can have more operators supported, but here are the tradeoffs
than have a default that fails under evidently-not-so-improbable
cases.

regards, tom lane




Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached

2014-04-08 Thread Tom Lane
Oleg Bartunov obartu...@gmail.com writes:
 We are working to avoid this limitation.

What do you mean by that ... do you see it as something that could be
fixed quickly, or is this a long-term improvement project?

regards, tom lane




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Alvaro Herrera
Peter Geoghegan wrote:

 What I have here looks like it speeds things up a little over 200% (so
 a little over 300% of the original throughput) with a single client
 for many representative cases. That's a massive difference, to the
 point that I don't see a lot of sense in considering fmgr-elision
 alone separately.

I think the point here is that what matters is that the gain from the
strxfrm part of the patch is large, regardless of what the baseline is
(right?).  If there's a small loss in an uncommon worst case, that's
probably acceptable, as long as the worst case is uncommon and the loss
is small.  But if the loss is large, or the case is not uncommon, then a
fix for the regression is going to be a necessity.

You seem to be assuming that a fix for whatever regression is found is
going to be impossible to find.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, let me see if I understand the situation correctly:

 * jsonb_ops supports more operators

 * jsonb_hash_ops produces smaller, better-performing indexes

 * jsonb_ops falls over on inputs with wide field values, but
 jsonb_hash_ops does not

There might be some compelling cases for indexing existence rather
than containment, since the recheck flag isn't set there, but in
general this summary seems sound. I would say that broadly, existence
is a less useful operator than containment, and so jsonb_hash_ops is
broadly more compelling. I didn't propose changing the default due to
concerns about the POLA, but I'm happy to be told that those concerns
were out of proportion to the practical benefits of a different
default.

-- 
Peter Geoghegan




Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached

2014-04-08 Thread Oleg Bartunov
On Wed, Apr 9, 2014 at 1:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oleg Bartunov obartu...@gmail.com writes:
 We are working to avoid this limitation.

 What do you mean by that ... do you see it as something that could be
 fixed quickly, or is this a long-term improvement project?

Unfortunately, it's a long-term project (VODKA); we hope to use SP-GiST
(which also needs some adjustment) for keys+values instead of btree.
Hopefully, we'll show something at PGCon.


 regards, tom lane




Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Andrew Dunstan


On 04/08/2014 05:57 PM, Peter Geoghegan wrote:

On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Well, let me see if I understand the situation correctly:

* jsonb_ops supports more operators

* jsonb_hash_ops produces smaller, better-performing indexes

* jsonb_ops falls over on inputs with wide field values, but
jsonb_hash_ops does not

There might be some compelling cases for indexing existence rather
than containment, since the recheck flag isn't set there, but in
general this summary seems sound. I would say that broadly, existence
is a less useful operator than containment, and so jsonb_hash_ops is
broadly more compelling. I didn't propose changing the default due to
concerns about the POLA, but I'm happy to be told that those concerns
were out of proportion to the practical benefits of a different
default.




I tend to agree with Tom that POLA will be more violated by the default 
ops class not being able to index some values.


cheers

andrew




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 2:48 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 I think the point here is what matters is that that gain from the
 strxfrm part of the patch is large, regardless of what the baseline is
 (right?).  If there's a small loss in an uncommon worst case, that's
 probably acceptable, as long as the worst case is uncommon and the loss
 is small.  But if the loss is large, or the case is not uncommon, then a
 fix for the regression is going to be a necessity.

That all seems reasonable. I just don't understand why you'd want to
break out the fmgr-elision part, given that that was already discussed
at length two years ago.

 You seem to be assuming that a fix for whatever regression is found is
 going to be impossible to find.

I think that a fix that is actually worth it on balance will be
elusive. Heikki's worst case is extremely narrow. I think he'd
acknowledge that himself. I've already fixed some plausible
regressions. For example, the opportunistic early "len1 == len2 &&
memcmp() == 0" test covers the common case where two leading keys are
equal. I think we're very much into chasing diminishing returns past
this point. I think that my figure of a 5% regression is much more
realistic, even though it is itself quite unlikely.
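
For illustration, that opportunistic test amounts to something like the
sketch below. This is a self-contained toy, not the patch itself; the real
code works on non-NUL-terminated datums and layers the strxfrm()-based
abbreviated keys on top of it.

#include <stdlib.h>
#include <string.h>

/*
 * Toy comparator: if two keys have the same length and the same bytes,
 * they are equal under any collation, so we can skip the expensive
 * locale-aware path entirely.
 */
static int
varstr_cmp_sketch(const char *a, size_t len1, const char *b, size_t len2)
{
    char   *ta;
    char   *tb;
    int     result;

    /* Opportunistic early test: identical bytes mean identical keys. */
    if (len1 == len2 && memcmp(a, b, len1) == 0)
        return 0;

    /* Slow path: locale-aware comparison on NUL-terminated copies. */
    ta = malloc(len1 + 1);
    tb = malloc(len2 + 1);
    memcpy(ta, a, len1);
    ta[len1] = '\0';
    memcpy(tb, b, len2);
    tb[len2] = '\0';
    result = strcoll(ta, tb);
    free(ta);
    free(tb);
    return result;
}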

I think that the greater point is that we don't want to take worrying
about worst case performance to extremes. Calling Heikki's 50%
regression the worst case is almost unfair, since it involves very
carefully crafted input. You could probably also carefully craft input
that made our quicksort implementation itself go quadratic, a behavior
not necessarily exhibited by an inferior implementation for the same
input. Yes, let's consider a pathological worst case, but lets put it
in the context of being one end of a spectrum of behaviors, on the
extreme fringes. In reality, only a tiny number of individual sort
operations will experience any kind of regression at all. In simple
terms, I'd be very surprised if anyone complained about a regression
at all. If anyone does, it's almost certainly not going to be a 50%
regression. There is a reason why many other systems have
representative workloads that they target (i.e. a variety of tpc
benchmarks). I think that a tyranny of the majority is a bad thing
myself, but I'm concerned that we sometimes take that too far.

I wonder, why did Heikki not add more padding to the end of the
strings in his example, in order to give strxfrm() more wasted work?
Didn't he want to make his worst case even worse? Or was it to control
for TOASTing noise?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \d+ and oid display

2014-04-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Bruce Momjian br...@momjian.us writes:
  If we ignore backward compatibility, then Has OIDs and Identity
  Replica are similar.  One thing that strongly (for me) supports not
  always printing them is that I expect more people will be confused by
  the mention of OIDs or Identity Replica than will actually care about
  these features.  For example, if we always printed Child tables: 0,
  more people would be confused than helped.
 
 This is a good argument, actually: these fields are not only noise for
 most people, but confusing if you don't know the feature they are
 talking about.

I concur with this and would rather they not be there.  One of the
things that annoys me about certain other RDBMS's is how darn verbose
they are- it makes trying to read the definitions require much
head-scratching.

I'm on the fence about a \d++.  In general, I get a bit annoyed when
certain information isn't available through the backslash commands, but
it's hard to justify another '+' level for just these.

Thanks,

Stephen




Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached

2014-04-08 Thread Michael Paquier
On Wed, Apr 9, 2014 at 6:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oleg Bartunov obartu...@gmail.com writes:
 We are working to avoid this limitation.

 What do you mean by that ... do you see it as something that could be
 fixed quickly, or is this a long-term improvement project?
If this is a known limitation and no fix is planned for 9.4, could it
be possible to document it appropriately for this release? This would
surprise users.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Default gin operator class of jsonb failing with index row size maximum reached

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 4:07 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 If this is a known limitation and no fix is planned for 9.4, could it
 be possible to document it appropriately for this release? This would
 surprise users.

It looks like the default GIN opclass will be changed, so it becomes a
matter of opting in to the particular set of trade-offs that the
current default represents. Presumably that includes the size
limitation, which isn't separately documented at present.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] New option in pg_basebackup to exclude pg_log files during base backup

2014-04-08 Thread Prabakaran, Vaishnavi
Hi all,

Following the discussion in message id
cahgqgwffmor4ecugwhzpaapyqbsekdg66vmj1rvej6z-ega...@mail.gmail.com,
I have developed a patch which gives the user an option to exclude pg_log
directory contents in pg_basebackup.

[Current situation]
During pg_basebackup, all files in the pg_log directory are copied to the new
backup directory.

[Design]
- Added a new optional option -S/--skip-log-dir to pg_basebackup.
- If --skip-log-dir is specified, the base backup excludes log files from the
standard pg_log directory and from any other directory specified by the
log_directory GUC variable; the empty pg_log/$log_directory folder is still
created. (A rough sketch of such an exclusion check is shown below.)
- In case pg_log/$log_directory is a symbolic link, an empty folder will be
created in its place.
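
Purely as an illustration of the general shape of such an exclusion check
(this is an assumption on my part, not code taken from the attached patch):

#include <stdbool.h>
#include <string.h>

/*
 * Return true if this path should be skipped: we are excluding log files
 * and the path lies underneath the configured log directory.  The log
 * directory entry itself is kept, so an empty folder still appears in
 * the backup.
 */
static bool
skip_log_entry(const char *path, const char *log_directory, bool skip_log_dir)
{
    size_t  len = strlen(log_directory);

    if (!skip_log_dir)
        return false;

    return strncmp(path, log_directory, len) == 0 && path[len] == '/';
}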

[Advantage]
It gives the user an option to avoid copying large log files they don't need,
which reduces the size of the backup.

Attached the patch.


Thanks & Regards,
Vaishnavi
Fujitsu Australia



pgbasebackup_excludes_pglog_v1.patch
Description: pgbasebackup_excludes_pglog_v1.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-08 Thread David Rowley
On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug f...@phlo.org wrote:


 As explain above, invtrans_bool is a bit problematic, since it carries
 a real risk of performance regressions. It's included for completeness'
 sake, and should probably not be committed at this time.


Did you mean to write invtrans_minmax? Otherwise you didn't explain your
concerns with bool.

Regards

David Rowley


Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Craig Ringer
On 04/09/2014 02:00 AM, Stephen Frost wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 On 04/05/2014 03:57 AM, Andres Freund wrote:
 r04) Row-security based on Updatable security barrier
 views This one's fate seems to be hard to judge without
 c07.
 
 Open issues remain with this patch, and resources for
 working on it in 9.4 have run out.
 
 It is not ready for commit. A core bugfix with locking in
 security barrier views is required before the regression
 tests can be fixed up properly, for one thing. Tom also
 expressed concerns about how plan invalidation works,
 though it's not yet clear whether that was just 
 miscommunication about how it works on my part or whether
 there's a concrete problem there.
 
 I'd really love to finish this off for 9.4, but other
 projects have to come first.
 
 Given that, I think we should go ahead and mark this one
 Returned With Feedback.  It's past time to be punting anything
 that doesn't have a serious chance of getting committed for
 9.4.

 I'm a bit confused on this point- is the only issue the
 *preexisting* bug with security barrier views?

This thread discusses two patches. The above refers to row security
(per quoted text at top), not updatable security barrier views.

Updatable security barrier views are ready. There's a pre-existing bug
with security barrier views, but updatable s.b. views don't make it
any worse and it can be fixed separately.

Row security is not. It could possibly be committed w/o a fix for the
security barrier bug by deleting the relevant regression tests, but
Tom had reservations about plan invalidation in it, the docs need
updating, and it needs a bunch more testing. It's possible I could
have it ready in a few days - or it might be a couple of weeks. I ran
out of time to work on it for 9.4.

 Craig, in general, I'd argue that a pre-existing bug isn't a reason
 that a patch isn't ready for commit.  The bug may need to be fixed
 before the patch goes in, but saying a patch isn't ready implied,
 to me at least, issues with the *patch*, which it sounds like isn't
 the case here.

I tend to agree, and for that reason want updatable security barrier
views to make it in for 9.4.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Craig Ringer
On 04/09/2014 01:54 AM, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Apr 8, 2014 at 11:59 AM, Gregory Smith
 gregsmithpg...@gmail.com wrote:
 I have no other projects ahead of this for the remainder of
 this month.  I just can't figure out what to do next until
 there's a committer (or committers, if someone else is going to
 take on the locking bug) identified. I looked at the locking
 problem enough to know that one is over my head.
 
 What post or thread should I read for details about this locking
 bug?
 
 I think it was discussed a couple times, but it's here:
 
 http://www.postgresql.org/message-id/531d3355.6010...@2ndquadrant.com

Yep.
 
The follow-ups are important though - showing that it's actually
a pre-existing bug, and while it might also occur in updatable
security barrier views, there's already a locking problem in the
existing security barrier views that needs to be fixed.

The short version is:

When you have a RowMark (SELECT ... FOR UPDATE/SHARE, UPDATE, or
DELETE) on the results of a query, security barrier views are
incorrectly pushing this row mark down into the security_barrier
subquery they generate.

That means that if you:

SELECT * FROM some_view WHERE id % 2 = 0 FOR UPDATE;

you should be locking only *even numbered* rows that match the
predicate of some_view, but in fact, what'll get run looks like the
pseudo-SQL:

SELECT * FROM (
  SELECT * FROM underlying_table WHERE view_predicate FOR UPDATE
) some_view
FOR UPDATE;

i.e. we're locking *all rows that match the view predicate*, failing
to apply the *user* predicate before locking.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Stephen Frost
* Craig Ringer (cr...@2ndquadrant.com) wrote:
 On 04/09/2014 02:00 AM, Stephen Frost wrote:
  I'm a bit confused on this point- is the only issue the
  *preexisting* bug with security barrier views?
 
 This thread discusses two patches. The above refers to row security
 (per quoted text at top), not updatable security barrier views.

Right, I understood that.

 Updatable security barrier views are ready. There's a pre-existing bug
 with security barrier views, but updatable s.b. views don't make it
 any worse and it can be fixed separately.

Ok.

 Row security is not. It could possibly be committed w/o a fix for the
 security barrier bug by deleting the relevant regression tests, but
 Tom had reservations about plan invalidation in it, the docs need
 updating, and it needs a bunch more testing. It's possible I could
 have it ready in a few days - or it might be a couple of weeks. I ran
 out of time to work on it for 9.4.

So- row security makes the *existing bug* worse; I get that.  The
question regarding plan invalidation may be something we can work out.
As for docs and testing, those are things we would certainly be better
off with and may mean that this isn't able to make it into 9.4, which is
fair, but I wouldn't toss it out solely due to that.

  Craig, in general, I'd argue that a pre-existing bug isn't a reason
  that a patch isn't ready for commit.  The bug may need to be fixed
  before the patch goes in, but saying a patch isn't ready implied,
  to me at least, issues with the *patch*, which it sounds like isn't
  the case here.
 
 I tend to agree, and for that reason want updatable security barrier
 views to make it in for 9.4.

Ok.  I'm going to make a serious effort to find time to work on this, at
least.  Right now I'm busy preparing to launch a new site (you'll see
the announce in a couple days...), etc, etc, but I should have time this
weekend...

Thanks,

Stephen




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-08 Thread Florian Pflug
On Apr9, 2014, at 02:55 , David Rowley dgrowle...@gmail.com wrote:
 On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug f...@phlo.org wrote:
 
 As explain above, invtrans_bool is a bit problematic, since it carries
 a real risk of performance regressions. It's included for completeness'
 sake, and should probably not be committed at this time.
 
 Did you mean to write invtrans_minmax? Otherwise you didn't explain about
 you concerns with bool.

Grmpf. Should have re-read that once more before sending :-(

Yes, I meant invtrans_minmax is problematic! invtrans_bool is fine, the
inverse transition function never fails for BOOL_AND and BOOL_OR. This
is why I factored it out into a separate patch, to make it easy to not
apply the MIN/MAX stuff, while still applying the BOOL stuff. Sorry for
the confusion.
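
One way to see why the BOOL case is safe (a sketch of the general idea, under
the assumption that the state keeps simple counters, which is not necessarily
how the patch implements it): removing a value from a BOOL_AND window only
ever decrements counters, whereas removing the current minimum from a MIN
state can force a rescan of the window.

#include <stdbool.h>

typedef struct BoolAndState
{
    long    nrows;      /* values currently in the window */
    long    nfalse;     /* how many of them are false */
} BoolAndState;

/* forward transition: a value enters the window */
static void
booland_sfunc(BoolAndState *state, bool value)
{
    state->nrows++;
    if (!value)
        state->nfalse++;
}

/* inverse transition: a value leaves the window; this can never fail */
static void
booland_invfunc(BoolAndState *state, bool value)
{
    state->nrows--;
    if (!value)
        state->nfalse--;
}

/* final function: the window is all-true iff no false value remains */
static bool
booland_finalfunc(const BoolAndState *state)
{
    return state->nfalse == 0;
}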

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pending 9.4 patches

2014-04-08 Thread Craig Ringer
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/09/2014 09:28 AM, Stephen Frost wrote:

 Ok.  I'm going to make a serious effort to find time to work on
 this, at least.  Right now I'm busy preparing to launch a new site
 (you'll see the announce in a couple days...), etc, etc, but I
 should have time this weekend...

That'd be fantastic.

I'm committed on other work, but I'll make the time to get back on
this somehow. I've put a lot of time and work into it, and would love
to see it make 9.4 against the odds.

If just updatable security barrier views make it, that's a big plus
for next time around.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-08 Thread Peter Geoghegan
On Thu, Apr 3, 2014 at 12:19 PM, Thom Brown t...@linux.com wrote:
 Looking good:

 -T 100 -n -f sort.sql

 Master: 21.670467 / 21.718653 (avg: 21.69456)
 Patch: 66.888756 / 66.888756 (avg: 66.888756)

These were almost exactly the same figures as I saw on my machine.
However, when compiling with certain additional flags -- with
CFLAGS=-O3 -march=native -- I was able to squeeze more out of this.
My machine has a recent Intel CPU, Intel(R) Core(TM) i7-3520M. With
these build settings the benchmark then averages about 75.5 tps across
multiple runs, which I'd call a fair additional improvement. I tried
this because I was specifically interested in the results of a memcmp
implementation that uses SIMD. I believe that these flags make
gcc/glibc use a memcmp implementation that takes advantage of SSE
where supported (and various subsequent extensions). Although I didn't
go to the trouble of verifying all this by going through the
disassembly, or instrumenting the code in any way, that is my best
guess as to what actually helped. I don't know how any of that might
be applied to improve matters in the real world, which is why I
haven't dived into this further, but it's worth being aware of.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
 ... I didn't propose changing the default due to
 concerns about the POLA, but I'm happy to be told that those concerns
 were out of proportion to the practical benefits of a different
 default.

 I tend to agree with Tom that POLA will be more violated by the default 
 ops class not being able to index some values.

We should wait a bit longer to see if anyone objects, but assuming that
this represents the consensus opinion ...

ISTM that the name jsonb_ops should have pride of place as the default
jsonb opclass.  Therefore, if we make this change, jsonb_hash_ops needs to
be renamed to jsonb_ops, and we need a new name for what is now jsonb_ops.
I haven't paid attention to the technical details of the differences so
I have no idea what to suggest for the new name.  Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Allocation Concurrency Limits

2014-04-08 Thread Amit Kapila
On Tue, Apr 8, 2014 at 10:38 PM, Jason Petersen ja...@citusdata.com wrote:
 In December, Metin (a coworker of mine) discussed an inability to scale a
 simple task (parallel scans of many independent tables) to many cores (it's
 here). As a ramp-up task at Citus I was tasked to figure out what the heck
 was going on here.

 I have a pretty extensive writeup here (whose length is more a result of my
 inexperience with the workings of PostgreSQL than anything else) and was
 looking for some feedback.

At this moment, I am not able to open the above link (here); maybe some
problem (it's showing Service Unavailable). I will try it later.

 In short, my conclusion is that a working set larger than memory results in
 backends piling up on BufFreelistLock.

Here, when you say the working set is larger than memory, do you mean
*memory* as in shared_buffers?
I think if the data is larger than the total memory available, the effect
of I/O can anyway overshadow the effect of BufFreelistLock contention.

 As much as possible I removed
 anything that could be blamed for this:

 Hyper-Threading is disabled
 zone reclaim mode is disabled
 numactl was used to ensure interleaved allocation
 kernel.sched_migration_cost was set to highly disable migration
 kernel.sched_autogroup_enabled was disabled
 transparent hugepage support was disabled


 For a way forward, I was thinking the buffer allocation sections could use
 some of the atomics Andres added here. Rather than workers grabbing
 BufFreelistLock to iterate the clock hand until they find a victim, the
 algorithm could be rewritten in a lock-free style, allowing workers to move
 the clock hand in tandem.

 Alternatively, the clock iteration could be moved off to a background
 process, similar to what Amit Kapila proposed here.

I think both of the above ideas can be useful, but I am not sure whether
they are sufficient for scaling shared buffers.
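
For what it's worth, a minimal sketch of the "move the clock hand in tandem"
idea, assuming an atomic fetch-and-add on the hand; this is an illustration
only and ignores usage counts, buffer pins and the freelist:

#include <stdatomic.h>

#define NBUFFERS 16384

static _Atomic unsigned long clock_hand;

/*
 * Each backend claims the next buffer id to inspect without taking any
 * lock; contention is reduced to a single atomic increment instead of
 * serializing the whole sweep behind BufFreelistLock.
 */
static unsigned int
claim_next_victim_slot(void)
{
    unsigned long ticket = atomic_fetch_add(&clock_hand, 1);

    return (unsigned int) (ticket % NBUFFERS);
}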

 Is this assessment accurate? I know 9.4 changes a lot about lock
 organization, but last I looked I didn't see anything that could alleviate
 this contention: are there any plans to address this?

I am planning to work on this for 9.5.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor improvements in alter_table.sgml

2014-04-08 Thread Etsuro Fujita

(2014/04/09 1:23), Robert Haas wrote:

On Tue, Apr 8, 2014 at 5:05 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:

Attached is a patch to improve the manual page for the ALTER TABLE command.


Do we really need to add a section for type_name when we already
have a section for OF type_name?


I think that a section for type_name is needed in the Parameters chapter; 
the existing section for OF type_name is in the Description chapter.



constraint_name is also used for adding a constraint using an index.
So it could not only be a constraint to alter, validate, or drop, but
also a new constraint name to be added.


I overlooked that.

 Honestly, how much value is

there in even having a section for this?  Do we really want to
document constraint_name as name of an existing constraint, or the
name of a new constraint to be added?  It would be accurate, then,
but it also doesn't really tell you anything you didn't know already.


You have a point there, but I feel odd about the documentation as it is, 
because some parameter entries are well written (e.g. column_name) and some 
are not (e.g. constraint_name).  So, if there are no objections, I'd like 
to update the patch.


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \d+ and oid display

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 5:37 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Apr  8, 2014 at 05:29:45PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  If we ignore backward compatibility, then Has OIDs and Identity
  Replica are similar.  One thing that strongly (for me) supports not
  always printing them is that I expect more people will be confused by
  the mention of OIDs or Identity Replica than will actually care about
  these features.  For example, if we always printed Child tables: 0,
  more people would be confused than helped.

 This is a good argument, actually: these fields are not only noise for
 most people, but confusing if you don't know the feature they are
 talking about.

 Let me put it this way:  I didn't know what Identity Replica meant
 when I saw it in psql.  Now, some might say that is expected, but still. ;-)

Well, that's sorta my concern.  I mean, right now we've got people
saying what the heck is a replica identity?.  But, if the logical
decoding stuff becomes popular, as I hope it will, that's going to be
an important thing for people to adjust, and the information needs to
be present in a clear and easily-understood way.  I haven't studied
the current code in detail so maybe it's fine.  I just want to make
sure we're not giving it second-class treatment solely on the basis
that it's new and people aren't using it yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DOCS] Call for GIST/GIN/SP-GIST opclass documentation

2014-04-08 Thread Robert Haas
On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I just created sections in the SGML manual chapters about GIST, GIN, and
 SP-GIST to hold documentation about the standard opclasses provided for
 them:

 http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html
 http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html
 http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html

 We never had any well-defined place to discuss such opclasses before,
 but I think it's past time we did.

+1.  Great idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \d+ and oid display

2014-04-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, that's sorta my concern.  I mean, right now we've got people
 saying what the heck is a replica identity?.  But, if the logical
 decoding stuff becomes popular, as I hope it will, that's going to be
 an important thing for people to adjust, and the information needs to
 be present in a clear and easily-understood way.  I haven't studied
 the current code in detail so maybe it's fine.  I just want to make
 sure we're not giving it second-class treatment solely on the basis
 that it's new and people aren't using it yet.

I think the proposal is don't mention the property if it has the
default value.  That's not second-class status, as long as people
who know what the property is understand that behavior.  It's just
conserving screen space.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Rajeev rastogi
On 09 April 2014 01:09, Robert Haas wrote:
 
 I'm also pretty unconvinced that multiple PGPROCs is the right way to
 go.  First, PGPROCs have a bunch of state in them that is assumed to
 exist once per backend.  We might find pretty substantial code churn
 there if we try to go change that.  

Yes, you are right. That is why I am not creating a separate procarray entry 
to maintain the autonomous transaction. Please find details in my previous 
reply sent earlier today.

 Second, why do other backends 
 really need to know about our ATs?  As far as I can see, if other
 backends see the AT as a subtransaction of our top-level transaction up
 until it actually commits, that ought to be just fine.  Maybe the
 backend needs to internally frob visibility rules, but that's not a
 matter for shared memory.

In order to build a snapshot, other sessions need to be able to see our 
autonomous transactions and their sub-transactions.

During snapshot creation, an autonomous transaction is treated as a main
transaction, and the list of all running autonomous transactions and their
sub-transactions gets stored in the snapshot data.

e.g. Suppose below processes are running with given transactions:

Proc-1: 100
Proc-2: 101, 102 (Auto Tx1), 103 (Auto Tx2), 104 (Sub-tx of Auto Tx2)
Proc-3: 105, 106 (Auto Tx2), 107 (Auto Tx2)

Suppose latest completed transaction is 108.

Then the snapshot data for autonomous transaction 107 will be as below:
Xmin: 100
Xmax: 109
Snapshot->xip[]:    100, 101, 102, 103, 105, 106
Snapshot->subxip[]: 104
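
Just to illustrate how such a snapshot would be consulted (a sketch of the
general idea, not the actual PostgreSQL code): a transaction id found in
either array is treated as still in progress, so other sessions' autonomous
transactions stay invisible until they commit.

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct SnapshotSketch
{
    TransactionId   xmin;
    TransactionId   xmax;
    TransactionId  *xip;        /* running top-level and autonomous xids */
    int             xcnt;
    TransactionId  *subxip;     /* their sub-transaction xids */
    int             subxcnt;
} SnapshotSketch;

static bool
xid_in_progress(const SnapshotSketch *snap, TransactionId xid)
{
    int     i;

    if (xid < snap->xmin)
        return false;           /* completed before the snapshot */
    if (xid >= snap->xmax)
        return true;            /* started after the snapshot was taken */

    for (i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return true;

    for (i = 0; i < snap->subxcnt; i++)
        if (snap->subxip[i] == xid)
            return true;

    return false;
}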

Thanks and Regards,
Kumar Rajeev Rastogi




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal for Merge Join for Non '=' Operators

2014-04-08 Thread Dilip kumar
I would like to propose a new merge join algorithm for optimizing non-'=' 
operators ('<', '<=', '>', '>=').


-  Currently, merge join is only supported for the '=' operator. For a '<' or 
'>' operator the planner chooses a nested loop join, or a nested loop with 
materialization.

-  I think that when the tuples from the lower node are already sorted, or the 
sorting cost is very low, we can use merge join for non-equality operators as 
well, which will give better performance than a nested loop join (a new cost 
calculation can be implemented in the planner to decide when to pick it).



Example of using merge join for the '<' operator.

T1T2

3  1

4  2

5  4

The outer tuple (3) needs to be compared with the inner tuples one by one; it 
first satisfies the condition at the third inner tuple (as 3 < 4). So here we 
can save this position in the inner relation so that the next outer tuple can 
start its comparison directly from this tuple.

1.   In this algorithm we can add one more optimization: once an outer tuple 
satisfies the merge qual, it can skip the merge qual test for the remaining 
inner tuples and directly apply the other quals, as the merge qual will always 
be satisfied for the remaining tuples.



Implementation Detail:

1.   Need to add new cost calculation mechanism for this. I still have to 
work on this part.

2.   Implementing in Executor

a.   This algorithm is almost same as normal merge Join with some changes.

b.   Both Inner and Outer Data Sources should be sorted, same as normal 
merge Join.
ALGORITHM:
Merge Qual (R.A < Q.A)
r = first tuple from R (outer relation)
q = first tuple from Q (inner relation)
save_pos = q;   /* position to start scanning in relation Q */
While (fetch tuple r from R till end of relation)
{
    for each tuple q in Q starting from save_pos
    {
        if (merge qual is satisfied)
        {
            save_pos = q;
            consume all subsequent tuples and project
            (just need to match other quals, if any)
        }
        else
            fetch next tuple from Q;
    }
}
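
To make the loop structure concrete, here is a minimal standalone sketch of
the same idea in C, using plain sorted integer arrays instead of executor
tuples (illustration only, not the proposed executor code):

#include <stdio.h>

/*
 * Join two sorted arrays on the qual "outer < inner".  Because the inner
 * side is sorted, once outer[i] < inner[j] holds it also holds for every
 * later inner tuple, so we can emit the rest of the inner side without
 * re-testing the qual, and remember j as the restart point for the next
 * outer tuple.
 */
static void
merge_join_lt(const int *outer, int nouter, const int *inner, int ninner)
{
    int     save_pos = 0;

    for (int i = 0; i < nouter; i++)
    {
        int     j = save_pos;

        /* advance until the merge qual is first satisfied */
        while (j < ninner && !(outer[i] < inner[j]))
            j++;

        save_pos = j;           /* later outer tuples can start here */

        /* every remaining inner tuple satisfies the qual */
        for (int k = j; k < ninner; k++)
            printf("(%d, %d)\n", outer[i], inner[k]);
    }
}

int
main(void)
{
    int     t1[] = {3, 4, 5};   /* outer relation */
    int     t2[] = {1, 2, 4};   /* inner relation */

    merge_join_lt(t1, 3, t2, 3);
    return 0;
}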



-  Performance comparison:
Suppose the tuples of the inner and outer relations are already sorted, or an 
index scan is used on both sides.

* The cost of NLJ is always O(r*q).

* The cost of this MJ will be between O(n) and O(r*q),

where r is the number of tuples in R (outer relation) and q is the number of 
tuples in Q (inner relation).

Please provide your feedback/suggestions.

Thanks & Regards,
Dilip Kumar



Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-08 Thread Rajeev rastogi
On 09 April 2014 01:43, Tom Lane Wrote:

  I'm also pretty unconvinced that multiple PGPROCs is the right way to
  go.  First, PGPROCs have a bunch of state in them that is assumed to
  exist once per backend.  We might find pretty substantial code churn
  there if we try to go change that.  Second, why do other backends
  really need to know about our ATs?  As far as I can see, if other
  backends see the AT as a subtransaction of our top-level transaction
  up until it actually commits, that ought to be just fine.
 
 If we can make it work like that, sure.  I'm a bit worried about how
 you'd decouple a subtransaction and commit it atomically ... or if
 that's not atomic, will it create any problems?  

Though an autonomous transaction uses a mixed approach (partly sub-transaction, 
partly main transaction), its transaction state is handled independently.
So, depending on the transaction state of the autonomous transaction 
(TBLOCK_AUTOCOMMIT for commit), the transaction will be committed. While 
committing:
1.  Writing the commit record and logging the corresponding WAL happens in the 
same way as for a main transaction (except for the way the autonomous 
transaction and its sub-transactions are accessed).
This automatically takes care of updating pg_clog for the autonomous 
transaction as well.
2.  It also marks the autonomous transaction as finished by setting the 
appropriate fields of MyPgAutonomousXact, in a similar manner as for a main 
transaction.
3.  Freeing all resources and popping back out to the parent transaction 
happens in the same way as for a sub-transaction.

 The point being that
 you need to change both pg_subtrans and pg_clog to make that state
 transition.

Yes, I am changing both, but no specific changes were required; it is taken 
care of automatically during commit and assignment of the autonomous 
transaction.

Any comments/feedback/doubts are welcome.

Thanks and Regards,
Kumar Rajeev Rastogi








-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers