Re: [HACKERS] Minmax indexes

2014-07-11 Thread Fujii Masao
On Thu, Jul 10, 2014 at 6:16 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Claudio Freire wrote:

 An aggregate to generate a compressed set from several values
 A function which adds a new value to the compressed set and returns
 the new compressed set
 A function which tests if a value is in a compressed set
 A function which tests if a compressed set overlaps another
 compressed set of equal type

 If you can define different compressed sets, you can use this to
 generate both min/max indexes as well as bloom filter indexes. Whether
 we'd want to have both is perhaps questionable, but having the ability
 to is probably desirable.
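
(To make that abstraction concrete -- an illustration on my part, not
something from the patch: for a minmax opclass the "compressed set" is simply
the (min, max) pair; adding a value widens the bounds if needed, the
membership test is min <= v <= max, and the overlap test is ordinary range
intersection. A Bloom-filter opclass would implement the same four operations
over a bit array.)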

 Here's a new version of this patch, which is more generic than the original
 versions, and similar to what you describe.

I've not read the discussion so far at all, but I found the problem
when I played with this patch. Sorry if this has already been discussed.

=# create table test as select num from generate_series(1,10) num;
SELECT 10
=# create index testidx on test using minmax (num);
CREATE INDEX
=# alter table test alter column num type text;
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-07-11 Thread Andres Freund
On 2014-07-04 19:27:10 +0530, Rahila Syed wrote:
 + /*
 +  * Allocates memory for compressed backup blocks according to the
 +  * compression algorithm used.  Once per session at the time of insertion
 +  * of first XLOG record.  This memory stays till the end of session.  OOM
 +  * is handled by making the code proceed without FPW compression.
 +  */
 + static char *compressed_pages[XLR_MAX_BKP_BLOCKS];
 + static bool compressed_pages_allocated = false;
 + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF &&
 +     compressed_pages_allocated != true)
 + {
 +     size_t buffer_size = VARHDRSZ;
 +     int j;
 +     if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
 +         buffer_size += snappy_max_compressed_length(BLCKSZ);
 +     else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
 +         buffer_size += LZ4_compressBound(BLCKSZ);
 +     else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ)
 +         buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ);
 +     for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
 +     {
 +         compressed_pages[j] = (char *) malloc(buffer_size);
 +         if (compressed_pages[j] == NULL)
 +         {
 +             compress_backup_block = BACKUP_BLOCK_COMPRESSION_OFF;
 +             break;
 +         }
 +     }
 +     compressed_pages_allocated = true;
 + }

Why not do this in InitXLOGAccess() or similar?

   /*
* Make additional rdata chain entries for the backup blocks, so that we
* don't need to special-case them in the write loop.  This modifies the
 @@ -1015,11 +1048,32 @@ begin:;
   rdt->next = &(dtbuf_rdt2[i]);
   rdt = rdt->next;
  
 + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF)
 + {
 +     /* Compress the backup block before including it in rdata chain */
 +     rdt->data = CompressBackupBlock(page, BLCKSZ - bkpb->hole_length,
 +                                     compressed_pages[i], &(rdt->len));
 +     if (rdt->data != NULL)
 +     {
 +         /*
 +          * write_len is the length of compressed block and its varlena
 +          * header
 +          */
 +         write_len += rdt->len;
 +         bkpb->hole_length = BLCKSZ - rdt->len;
 +         /* Adding information about compression in the backup block header */
 +         bkpb->block_compression = compress_backup_block;
 +         rdt->next = NULL;
 +         continue;
 +     }
 + }
 +

So, you're compressing backup blocks one by one. I wonder if that's the
right idea and if we shouldn't instead compress all of them in one run to
increase the compression ratio.


 +/*
   * Get a pointer to the right location in the WAL buffer containing the
   * given XLogRecPtr.
   *
 @@ -4061,6 +4174,50 @@ RestoreBackupBlockContents(XLogRecPtr lsn, BkpBlock bkpb, char *blk,
   {
       memcpy((char *) page, blk, BLCKSZ);
   }
 + /* Decompress if backup block is compressed */
 + else if (VARATT_IS_COMPRESSED((struct varlena *) blk) &&
 +          bkpb.block_compression != BACKUP_BLOCK_COMPRESSION_OFF)
 + {
 +     if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_SNAPPY)
 +     {
 +         int ret;
 +         size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ;
 +         char *compressed_data = (char *) VARDATA((struct varlena *) blk);
 +         size_t s_uncompressed_length;
 +
 +         ret = snappy_uncompressed_length(compressed_data,
 +                                          compressed_length,
 +                                          &s_uncompressed_length);
 +         if (!ret)
 +             elog(ERROR, "snappy: failed to determine compression length");
 +         if (BLCKSZ != s_uncompressed_length)
 +             elog(ERROR, "snappy: compression size mismatch %d != %zu",
 +                  BLCKSZ, s_uncompressed_length);
 +
 +         ret = snappy_uncompress(compressed_data,
 +                                 compressed_length,
 +                                 page);
 +         if (ret != 0)
 +             elog(ERROR, "snappy: decompression failed: %d", ret);
 +     }
 +     else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_LZ4)
 +     {
 +         int ret;
 + 

Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2014-07-11 Thread Jeff Davis
On Mon, 2014-07-07 at 01:21 -0700, Jeff Davis wrote:
 On Sun, 2014-07-06 at 21:11 -0700, Jeff Davis wrote:
  On Wed, 2014-04-16 at 12:50 +0100, Nicholas White wrote:
   Thanks for the detailed feedback, I'm sorry it took so long to
   incorporate it. I've attached the latest version of the patch, fixing
   in particular:

As innocent as this patch seemed at first, it actually opens up a lot of
questions.

Attached is the (incomplete) edit of the patch so far.

Changes from your patch:

* changed test to be locale-insensitive
* lots of refactoring in the execution itself
* fix offset 0 case
* many test improvements
* remove bitmapset and just use an array bitmap
* fix error message typo

Open Issues:

I don't think exposing the frame options is a good idea. That's an
internal concept now, but putting it in windowapi.h will mean that it
needs to live forever.

The struct is private, so there's no easy hack to access the frame
options directly. That means that we need to work with the existing API
functions, which is OK because I think that everything we want to do can
go into WinGetFuncArgInPartition(). If we do the same thing for
WinGetFuncArgInFrame(), then first/last/nth also work.

That leaves the questions:
 * Do we want IGNORE NULLS to work for every window function, or only a
specified subset?
 * If it only works for some window functions, is that hard-coded or
driven by the catalog?
 * If it works for all window functions, could it cause some
pre-existing functions to behave strangely?

Also, I'm re-thinking Dean's comments here:

http://www.postgresql.org/message-id/CAEZATCWT3=P88nv2ThTjvRDLpOsVtAPxaVPe=MaWe-x=guh...@mail.gmail.com

He brings up a few good points. I will look into the frame vs. window
option, though it looks like you've already at least fixed the crash.
His other point about actually eliminating the NULLs from the window
itself is interesting, but I don't think it works. IGNORE NULLS ignores
*other* rows with NULL, but (per spec) does not ignore the current row.
That sounds awkward if you've already removed the NULL rows from the
window, but maybe there's something that could work.
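
As a concrete illustration of that (a hypothetical example using the syntax
this patch adds; the output is what the spec calls for, not something
produced by current code):

SELECT i, x,
       lag(x) OVER w              AS lag_respect,
       lag(x) IGNORE NULLS OVER w AS lag_ignore
FROM (VALUES (1, 1), (2, NULL), (3, 3)) AS t(i, x)
WINDOW w AS (ORDER BY i);

 i | x | lag_respect | lag_ignore
---+---+-------------+------------
 1 | 1 |             |
 2 |   |           1 |          1   <- the current row's NULL is not ignored
 3 | 3 |             |          1   <- the NULL from row 2 is skipped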

And there are a few other things I'm still looking into, but hopefully
they don't raise new issues.

Regards,
Jeff Davis

*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 13164,13169  SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
--- 13164,13170 ----
   lag(<replaceable class="parameter">value</replaceable> <type>any</>
   [, <replaceable class="parameter">offset</replaceable> <type>integer</>
   [, <replaceable class="parameter">default</replaceable> <type>any</> ]])
+  [ { RESPECT | IGNORE } NULLS ]
  </function>
   </entry>
   <entry>
***
*** 13178,13184  SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
  <replaceable class="parameter">default</replaceable> are evaluated
  with respect to the current row.  If omitted,
  <replaceable class="parameter">offset</replaceable> defaults to 1 and
! <replaceable class="parameter">default</replaceable> to null
  </entry>
   </row>
 
--- 13179,13187 ----
  <replaceable class="parameter">default</replaceable> are evaluated
  with respect to the current row.  If omitted,
  <replaceable class="parameter">offset</replaceable> defaults to 1 and
! <replaceable class="parameter">default</replaceable> to null. If
! <literal>IGNORE NULLS</> is specified then the function will be evaluated
! as if the rows containing nulls didn't exist.
  </entry>
   </row>
 
***
*** 13191,13196  SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
--- 13194,13200 ----
   lead(<replaceable class="parameter">value</replaceable> <type>any</>
    [, <replaceable class="parameter">offset</replaceable> <type>integer</>
    [, <replaceable class="parameter">default</replaceable> <type>any</> ]])
+  [ { RESPECT | IGNORE } NULLS ]
  </function>
   </entry>
   <entry>
***
*** 13205,13211  SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
  <replaceable class="parameter">default</replaceable> are evaluated
  with respect to the current row.  If omitted,
  <replaceable class="parameter">offset</replaceable> defaults to 1 and
! <replaceable class="parameter">default</replaceable> to null
  </entry>
   </row>
 
--- 13209,13217 ----
  <replaceable class="parameter">default</replaceable> are evaluated
  with respect to the current row.  If omitted,
  <replaceable class="parameter">offset</replaceable> defaults to 1 and
! <replaceable class="parameter">default</replaceable> to null. If
! <literal>IGNORE NULLS</> is specified then the function will be evaluated
! as if the rows containing nulls didn't exist.
  </entry>
   </row>
 
***
*** 13299,13309  SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y 

Re: [HACKERS] inherit support for foreign tables

2014-07-11 Thread Etsuro Fujita

(2014/07/10 18:12), Shigeru Hanada wrote:

2014-06-24 16:30 GMT+09:00 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:

(2014/06/23 18:35), Ashutosh Bapat wrote:



Selecting tableoid on parent causes an error, ERROR:  cannot extract
system attribute from virtual tuple. The foreign table has an OID which
can be reported as tableoid for the rows coming from that foreign table.
Do we want to do that?



No.  I think it's a bug.  I'll fix it.



IIUC, you mean that tableoid can't be retrieved when a foreign table
is accessed via parent table,


No.  What I want to say is that tableoid *can* be retrieved when a 
foreign table is accessed via the parent table.


Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] Minmax indexes

2014-07-11 Thread Simon Riggs
On 9 July 2014 23:54, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Jul 9, 2014 at 2:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
 All this being said, I'm sticking to the name Minmax indexes.  There
 was a poll in pgsql-advocacy
 http://www.postgresql.org/message-id/53a0b4f8.8080...@agliodbs.com
 about a new name, but there were no suggestions supported by more than
 one person.  If a brilliant new name comes up, I'm open to changing it.

 How about summarizing indexes? That seems reasonably descriptive.

-1 for another name change. That boat sailed some months back.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minmax indexes

2014-07-11 Thread Simon Riggs
On 10 July 2014 00:13, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Josh Berkus wrote:
 On 07/09/2014 02:16 PM, Alvaro Herrera wrote:
  The way it works now, each opclass needs to have three support
  procedures; I've called them getOpers, maybeUpdateValues, and compare.
  (I realize these names are pretty bad, and will be changing them.)

 I kind of like maybeUpdateValues.  Very ... NoSQL-ish.  Maybe update
 the values, maybe not.  ;-)

 :-)  Well, that's exactly what happens.  If we insert a new tuple into
 the table, and the existing summarizing tuple (to use Peter's term)
 already covers it, then we don't need to update the index tuple at all.
 What this name doesn't say is what values are to be maybe-updated.

There are lots of functions that maybe-do-things, that's just modular
programming. Not sure we need to prefix things with maybe to explain
that, otherwise we'd have maybeXXX everywhere.

More descriptive name would be MaintainIndexBounds() or similar.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] inherit support for foreign tables

2014-07-11 Thread Etsuro Fujita

(2014/07/11 15:50), Etsuro Fujita wrote:

(2014/07/10 18:12), Shigeru Hanada wrote:



IIUC, you mean that tableoid can't be retrieved when a foreign table
is accessed via parent table,


No.  What I want to say is that tableoid *can* be retrieved when a
foreign table is accessed via the parent table.


In fact, you can do that with v13 [1], but I plan to change the way of 
fixing (see [2]).


Thanks,

[1] http://www.postgresql.org/message-id/53b10914.2010...@lab.ntt.co.jp
[2] http://www.postgresql.org/message-id/53b2188b.4090...@lab.ntt.co.jp

Best regards,
Etsuro Fujita




Re: [HACKERS] add line number as prompt option to psql

2014-07-11 Thread Jeevan Chalke
Hi,

Found new issues with latest patch:


 Thank you for reviewing the patch with variable cases.
 I have revised the patch, and attached latest patch.

  A:
  Will you please explain the idea behind these changes ?
 I was wrong in my thinking about adding a newline to the tail of query_buf.
 The latest patch does not include changes related to that.

 Thanks.


  B:
 I added the condition cur_line < 0.


A.

However, this introduced a new bug. As I told you, when the number of lines
from the editor reaches INT_MAX it starts giving negative numbers. You tried
to overcome this issue by adding a < 0 check. But I guess you again fumbled
in setting that correctly. You are setting it to INT_MAX - 1. This forces
each new line to show the line number as INT_MAX - 1, which is incorrect.

postgres=# \set PROMPT1 '%/[%l]%R%# '
postgres[1]=# \set PROMPT2 '%/[%l]%R%# '
postgres[1]=# \e
postgres[2147483646]-# limit
postgres[2147483646]-# 1;
   relname
--
 pg_statistic
(1 row)

postgres[1]=# \e
postgres[2147483646]-# =
postgres[2147483646]-# '
postgres[2147483646]'# abc
postgres[2147483646]'# '
postgres[2147483646]-# ;
 relname
-
(0 rows)

postgres[1]=# select
relname
from
pg_class
where
relname
=
'
abc
'
;


Again, to reproduce that, I have simply initialized newline to INT_MAX - 2.
Please don't take me wrong, but it seems that your unit testing is not
enough. I discovered the above issue by doing exactly the same steps I did
while reviewing the previous patch. If you had tested your new patch with
those steps, I guess you would have caught it yourself.


B.

+ /* Calculate the line number */
+ if (scan_result != PSCAN_INCOMPLETE)
+ {
+ /* The one new line is always added to tail of query_buf */
+ newline = (newline != 0) ? (newline + 1) : 1;
+ cur_line += newline;
+ }

Here, in the above code changes, you are adding 1 to newline in any case,
i.e. when it is 0 you are setting it to 1 (+1) and when > 0 you are setting
it to newline + 1 (again +1).
So why can't you simply use
if (scan_result != PSCAN_INCOMPLETE)
cur_line += (newline + 1);

Or better, why can't you initialize newline to 1 and then directly
assign cur_line from newline? That would eliminate the above code block
entirely, wouldn't it?
Or, much better, simply get rid of newline and use cur_line itself; would
that work for you?


C. Typos:
1.
/* Count the number of new line for calculate ofline number */

Missing space between 'of' and 'line'.
However better improve that to something like (just suggestion):
Count the number of new lines to correctly adjust current line number

2.
/* Avoid cur_line and newline exceeds the INT_MAX */

You are saying avoid cur_line AND newline, but there is no adjustment for
newline in the code following the comment.

Thanks
-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Simon Riggs
On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote:

 (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number
 of batches this requires)

If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does
it save to recalculate the hash bucket at 10 instead?
Resizing sounds like it will only be useful if we only just overflow our limit.

If we release next version with this as a hardcoded change, my
understanding is that memory usage for hash joins will leap upwards,
even if the run time of queries reduces. It sounds like we need some
kind of parameter to control this. "We made it faster" might not be
true if we run this on servers that are already experiencing high
memory pressure.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Benedikt Grundmann
That is, is it possible to tell the planner that an index is off limits, i.e.
don't ever generate a plan using it?

Rationale:  Schema changes on big tables.  I might have convinced myself /
hold strong beliefs that, for all queries that I need to be fast, the planner
does not need to use a given index (e.g. other possible plans are fast
enough).  However, if I just drop the index and it turns out I'm wrong, I
might be in a world of pain because it might just take way too long to
recreate the index.
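
(Illustration of the sort of test I mean, with a hypothetical index name:
today the closest thing seems to be dropping the index inside a transaction
and rolling back, e.g.

    begin;
    drop index idx_under_test;
    explain analyze select ...;   -- the queries that must stay fast
    rollback;

but DROP INDEX takes an AccessExclusiveLock on the table for the duration, so
that is only workable on a test system, not in production.)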

I know that I can use pg_stat* to figure out if an index is used at all.
But in the presence of multiple indices and complex queries, the planner
might prefer the index-to-be-dropped even though the difference from the
available alternatives is immaterial.
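
(For reference, that usage information comes from the standard statistics
views, e.g.

    select idx_scan, idx_tup_read
    from pg_stat_user_indexes
    where indexrelname = 'idx_under_test';   -- hypothetical index name

but as said, a non-zero idx_scan alone doesn't tell you whether the plans
using that index were actually meaningfully better than the alternatives.)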

The current best alternative we have is to test such changes on a testing
database that gets regularly restored from production.  However at least in
our case we simply don't know all possible queries (and logging all of them
is not an option).

Cheers,

Bene


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-07-11 Thread Ali Akbar
Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).

From the libxml2 source, it turns out that the other cases in which
xmlCopyNode returns NULL cannot happen. So in this patch I assume that the
only such case is memory exhaustion.

But I found a bug in libxml2's code: because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList, and xmlStaticCopyNodeList doesn't
check whether the return value of xmlStaticCopyNode is NULL. So if someday
the recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.

Knowing about this bug in libxml2's code, is this patch still acceptable?

Thanks,
Ali Akbar
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***
*** 141,149  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! 	   PgXmlErrorContext *xmlerrcxt);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
***
*** 3595,3620  SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
  
  		buf = xmlBufferCreate();
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
  		xmlBufferFree(buf);
  	}
  	else
--- 3597,3636 
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
+ 		xmlNodePtr cur_copy;
  
  		buf = xmlBufferCreate();
+ 
+ 		/* the result of xmlNodeDump won't contain namespace definitions
+ 		 * from parent nodes, but xmlCopyNode duplicates a node along
+ 		 * with its required namespace definitions.
+ 		 */
+ 		cur_copy = xmlCopyNode(cur, 1);
+ 
+ 		if (cur_copy == NULL)
+ 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 						"could not serialize xml");
+ 
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
+ 			xmlFreeNode(cur_copy);
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
+ 		xmlFreeNode(cur_copy);
  		xmlBufferFree(buf);
  	}
  	else
***
*** 3656,3662  xml_xmlnodetoxmltype(xmlNodePtr cur)
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate)
  {
  	int			result = 0;
  	Datum		datum;
--- 3672,3679 
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt)
  {
  	int			result = 0;
  	Datum		datum;
***
*** 3678,3684  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
  
  	for (i = 0; i  result; i++)
  	{
! 		datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i]));
  		*astate = accumArrayResult(*astate, datum,
     false, XMLOID,
     CurrentMemoryContext);
--- 3695,3702 
  
  	for (i = 0; i  result; i++)
  	{
! 		datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
! 													 xmlerrcxt));
  		*astate = accumArrayResult(*astate, datum,
     false, XMLOID,
     CurrentMemoryContext);
***
*** 3882,3890  xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
  		 * Extract the results as requested.
  		 */
  		if (res_nitems != NULL)
! 			*res_nitems = xml_xpathobjtoxmlarray(xpathobj, &astate);
  		else
! 			(void) xml_xpathobjtoxmlarray(xpathobj, &astate);
  	}
  	PG_CATCH();
  	{
--- 3900,3908 
  		 * Extract the results as requested.
  		 */
  		if (res_nitems != NULL)
! 			*res_nitems = 

[HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan

2014-07-11 Thread Etsuro Fujita
I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy
blocks for a bitmap heap scan when both the numbers of exact/lossy pages
retrieved by the node are zero.  Such an example is shown below.  I
think it would be better to suppress the 'Heap Blocks' line in that
case, based on the same idea of the 'Buffers' line.  Patch attached.

postgres=# explain (analyze, verbose, buffers) select * from test where
id < 10;
QUERY PLAN
--
 Bitmap Heap Scan on public.test  (cost=4.29..8.31 rows=1 width=29)
(actual time=0.006..0.006 rows=0 loops=1)
   Output: id, inserted, data
   Recheck Cond: (test.id < 10)
   Heap Blocks:
   Buffers: shared hit=2
   -  Bitmap Index Scan on test_pkey  (cost=0.00..4.29 rows=1 width=0)
(actual time=0.003..0.003 rows=0 loops=1)
 Index Cond: (test.id < 10)
 Buffers: shared hit=2
 Planning time: 0.118 ms
 Execution time: 0.027 ms
(10 rows)

Thanks,

Best regards,
Etsuro Fujita
*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
***
*** 1937,1949  show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
  	}
  	else
  	{
! 		appendStringInfoSpaces(es->str, es->indent * 2);
! 		appendStringInfoString(es->str, "Heap Blocks:");
! 		if (planstate->exact_pages > 0)
! 			appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
! 		if (planstate->lossy_pages > 0)
! 			appendStringInfo(es->str, " lossy=%ld", planstate->lossy_pages);
! 		appendStringInfoChar(es->str, '\n');
  	}
  }
  
--- 1937,1952 
  	}
  	else
  	{
! 		if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
! 		{
! 			appendStringInfoSpaces(es->str, es->indent * 2);
! 			appendStringInfoString(es->str, "Heap Blocks:");
! 			if (planstate->exact_pages > 0)
! 				appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
! 			if (planstate->lossy_pages > 0)
! 				appendStringInfo(es->str, " lossy=%ld", planstate->lossy_pages);
! 			appendStringInfoChar(es->str, '\n');
! 		}
  	}
  }
  



Re: [HACKERS] RLS Design

2014-07-11 Thread Stephen Frost
On Thursday, July 10, 2014, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Jul 9, 2014 at 2:13 AM, Stephen Frost sfr...@snowman.net wrote:
  Yes, this would be possible (and is nearly identical to the original
  patch, except that this includes per-role considerations), however, my
  thinking is that it'd be simpler to work with policy names rather than
  sets of quals, to use when mapping to roles, and they would potentially
  be useful later for other things (eg: for setting up which policies
 should be applied when, or which should be OR'd or AND'd with other
  policies, or having groups of policies, etc).

 Hmm.  I guess that's reasonable.  Should the policy be a per-table
 object (like rules, constraints, etc.) instead of a global object?

 You could do:

 ALTER TABLE table_name ADD POLICY policy_name (quals);
 ALTER TABLE table_name POLICY FOR role_name IS policy_name;
 ALTER TABLE table_name DROP POLICY policy_name;


Right, I was thinking they would be per table as they would specifically
provide a name for a set of quals, and quals are naturally table-specific.
I don't see a need to have them be global- that had been brought up before
with the notion of applications picking their policy, but we could also add
that later through another term (eg: contexts) which would then map to
policies or similar. We could even extend policies to be global by mapping
existing per-table ones to be global if we really needed to...
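
(Concretely, using the syntax sketched above with hypothetical names, just to
illustrate the per-table shape:

ALTER TABLE accounts ADD POLICY manager_only (manager = current_user);
ALTER TABLE accounts POLICY FOR branch_role IS manager_only;

The policy name would only need to be unique per table, and the role mapping
simply picks one of that table's named qual sets.)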

My feeling at the moment is that having them be per-table makes sense and
we'd still have flexibility to change later if we had some compelling
reason to do so.

Thanks!

Stephen


[HACKERS] pg_receivexlog and replication slots

2014-07-11 Thread Magnus Hagander
Is there a particular reason why pg_receivexlog only supports using
manually created slots but pg_recvlogical supports creating and
dropping them? Wouldn't it be good for consistency there?

I'm guessing it's related to not being exposed over the replication
protocol? We had a discussion earlier that I remember about being able
to use an auto-drop slot in pg_basebackup, but this would be
different - this would be about creating and dropping a regular
physical replication slot...
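
(For context, the workflow that already works today is creating the slot by
hand on the server, e.g.

    SELECT pg_create_physical_replication_slot('xlog_archive_slot');  -- hypothetical slot name

and then pointing pg_receivexlog at it with its --slot option; it's only the
built-in create/drop convenience that is missing.)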

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] pg_receivexlog and replication slots

2014-07-11 Thread Andres Freund
On 2014-07-11 11:08:48 +0200, Magnus Hagander wrote:
 Is there a particular reason why pg_receivexlog only supports using
 manually created slots but pg_recvlogical supports creating and
 dropping them? Wouldn't it be good for consistency there?

I've added it to recvlogical because logical decoding isn't usable
without slots. I'm not sure what you'd want to create/drop a slot from
 receivexlog for, but I'm not averse to adding the capability.

 I'm guessing it's related to not being exposed over the replication
 protocol?

It's exposed:
create_replication_slot:
/* CREATE_REPLICATION_SLOT slot PHYSICAL */
K_CREATE_REPLICATION_SLOT IDENT K_PHYSICAL

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog and replication slots

2014-07-11 Thread Magnus Hagander
On Fri, Jul 11, 2014 at 11:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-07-11 11:08:48 +0200, Magnus Hagander wrote:
 Is there a particular reason why pg_receivexlog only supports using
 manually created slots but pg_recvlogical supports creating and
 dropping them? Wouldn't it be good for consistency there?

 I've added it to recvlogical because logical decoding isn't usable
 without slots. I'm not sure what you'd want to create/drop a slot from
 receivexlog for, but I'm not adverse to adding the capability.

I was mostly thinking for consistency, really. Using slots with
pg_receivexlog makes quite a bit of sense, even though it's useful
without it. But having the ability to create and drop (with compatible
commandline arguments of course) them directly when you set it up
would certainly be more convenient.


 I'm guessing it's related to not being exposed over the replication
 protocol?

 It's exposed:
 create_replication_slot:
 /* CREATE_REPLICATION_SLOT slot PHYSICAL */
 K_CREATE_REPLICATION_SLOT IDENT K_PHYSICAL

Hmm. You know, it would help if I actually looked at a 9.4 version of
the file when I check for functions of this kind :) Thanks!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




[HACKERS] Incorrect comment in postgres_fdw.c

2014-07-11 Thread Etsuro Fujita
I think the following comment for store_returning_result() in
postgres_fdw.c is not right.

/* PGresult must be released before leaving this function. */

I think the PGresult should not be released before leaving that function
*on success*.

(I guess the comment has been copied and pasted from that for
get_remote_estimate().)

Thanks,

Best regards,
Etsuro Fujita
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 7dd43a9..f328833 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2257,7 +2257,6 @@ static void
 store_returning_result(PgFdwModifyState *fmstate,
 	   TupleTableSlot *slot, PGresult *res)
 {
-	/* PGresult must be released before leaving this function. */
 	PG_TRY();
 	{
 		HeapTuple	newtup;



Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Tomas Vondra
On 11 Červenec 2014, 9:27, Simon Riggs wrote:
 On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote:

 (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number
 of batches this requires)

 If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does
 it save to recalculate the hash bucket at 10 instead?
 Resizing sounds like it will only be useful of we only just overflow our
 limit.

 If we release next version with this as a hardcoded change, my
 understanding is that memory usage for hash joins will leap upwards,
 even if the run time of queries reduces. It sounds like we need some
 kind of parameter to control this. We made it faster might not be
 true if we run this on servers that are already experiencing high
 memory pressure.

Sure. We certainly don't want to make things worse for environments with
memory pressure.

The current implementation has two issues regarding memory:

(1) It does not include buckets into used memory, i.e. it's not included
into work_mem (so we may overflow work_mem). I plan to fix this, to make
work_mem a bit more correct, as it's important for cases with
NTUP_PER_BUCKET=1.

(2) There's a significant palloc overhead, because of allocating each
tuple separately - see my message from yesterday, where I observed the
batch memory context to get 1.4GB memory for 700MB of tuple data. By
densely packing the tuples, I got down to ~700MB (i.e. pretty much no
overhead).

The palloc overhead seems to be 20B (on x86_64) per tuple, and eliminating
it more than compensates for the ~8B per tuple required for
NTUP_PER_BUCKET=1. And fixing (1) makes it more correct / predictable.
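
To put rough numbers on that (just an illustration using the ~20B and ~8B
figures above): for 10M tuples, dense packing saves about 10M x 20B = ~200MB
of palloc overhead, while NTUP_PER_BUCKET=1 adds about 10M x 8B = ~80MB of
bucket pointers, so the net effect is still roughly 120MB less memory.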

It also improves the issue that palloc overhead is not counted into
work_mem at all (that's why I got ~1.4GB batch context with work_mem=1GB).

So in the end this should give us much lower memory usage for hash joins,
even if we switch to NTUP_PER_BUCKET=1 (although that's pretty much
independent change). Does that seem reasonable?

Regarding the tunable to control this - I certainly don't want another GUC
no one really knows how to set right. And I think it's unnecessary thanks
to the palloc overhead / work_mem accounting fix, described above.

The one thing I'm not sure about is what to do in case of reaching the
work_mem limit (which should only happen with underestimated row count /
row width) - how to decide whether to shrink the hash table or increment
the number of batches. But this is not exclusive to NTUP_PER_BUCKET=1, it
may happen with whatever NTUP_PER_BUCKET value you choose.

The current code does not support resizing at all, so it always increments
the number of batches, but I feel an interleaved approach might be more
appropriate (nbuckets/2, nbatches*2, nbuckets/2, nbatches*2, ...). It'd be
nice to have some cost estimates ('how expensive is a rescan' vs. 'how
expensive is a resize'), but I'm not sure how to get that.

regards
Tomas





Re: [HACKERS] pg_receivexlog and replication slots

2014-07-11 Thread Andres Freund
On 2014-07-11 11:18:58 +0200, Magnus Hagander wrote:
 On Fri, Jul 11, 2014 at 11:14 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-07-11 11:08:48 +0200, Magnus Hagander wrote:
  Is there a particular reason why pg_receivexlog only supports using
  manually created slots but pg_recvlogical supports creating and
  dropping them? Wouldn't it be good for consistency there?
 
  I've added it to recvlogical because logical decoding isn't usable
  without slots. I'm not sure what you'd want to create/drop a slot from
  receivexlog for, but I'm not adverse to adding the capability.
 
 I was mostly thinking for consistency, really. Using slots with
 pg_receivexlog makes quite a bit of sense, even though it's useful
 without it. But having the ability to create and drop (with compatible
 commandline arguments of course) them directly when you set it up
 would certainly be more convenient.

Ok. Do you plan to take care of it? If so, I'd be fine with backpatching
it. I'm not likely to get to it right now :(

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-07-11 Thread Christoph Berg
Re: Bruce Momjian 2014-07-08 20140708202114.gd9...@momjian.us
I believe pg_upgrade itself still needs a fix. While it's not a
security problem to put the socket in $CWD while upgrading (it is
using -c unix_socket_permissions=0700), this behavior is pretty
unexpected, and does fail if your $CWD is > 107 bytes.

In f545d233ebce6971b6f9847680e48b679e707d22 Peter fixed the pg_ctl
perl tests to avoid that problem, so imho it would make even more
sense to fix pg_upgrade which could also fail in production.
   
   +1.  Does writing that patch interest you?
  
  I'll give it a try once I've finished this CF review.
 
 OK.  Let me know if you need help.

Here's the patch. Proposed commit message:

Create pg_upgrade sockets in temp directories

pg_upgrade used to use the current directory for UNIX sockets to
access the old/new cluster.  This fails when the current path is
> 107 bytes.  Fix by reusing the tempdir code from pg_regress
introduced in be76a6d39e2832d4b88c0e1cc381aa44a7f86881.  For cleanup,
we need to remember up to two directories.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-07-11 Thread Christoph Berg
Re: To Bruce Momjian 2014-07-11 20140711093923.ga3...@msg.df7cb.de
 Re: Bruce Momjian 2014-07-08 20140708202114.gd9...@momjian.us
 I believe pg_upgrade itself still needs a fix. While it's not a
 security problem to put the socket in $CWD while upgrading (it is
 using -c unix_socket_permissions=0700), this behavior is pretty
 unexpected, and does fail if your $CWD is > 107 bytes.
 
 In f545d233ebce6971b6f9847680e48b679e707d22 Peter fixed the pg_ctl
 perl tests to avoid that problem, so imho it would make even more
 sense to fix pg_upgrade which could also fail in production.

+1.  Does writing that patch interest you?
   
   I'll give it a try once I've finished this CF review.
  
  OK.  Let me know if you need help.
 
 Here's the patch. Proposed commit message:
 
 Create pg_upgrade sockets in temp directories
 
 pg_upgrade used to use the current directory for UNIX sockets to
 access the old/new cluster.  This fails when the current path is
  > 107 bytes.  Fix by reusing the tempdir code from pg_regress
 introduced in be76a6d39e2832d4b88c0e1cc381aa44a7f86881.  For cleanup,
 we need to remember up to two directories.

Uh... now really.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
index b81010a..121a3d4 100644
--- a/contrib/pg_upgrade/option.c
+++ b/contrib/pg_upgrade/option.c
@@ -14,6 +14,7 @@
 
 #include pg_upgrade.h
 
+#include signal.h
 #include time.h
 #include sys/types.h
 #ifdef WIN32
@@ -26,6 +27,13 @@ static void check_required_directory(char **dirpath, char **configpath,
    char *envVarName, char *cmdLineOption, char *description);
 #define FIX_DEFAULT_READ_ONLY -c default_transaction_read_only=false
 
+#ifdef HAVE_UNIX_SOCKETS
+/* make_temp_sockdir() is invoked at most twice from pg_upgrade.c via get_sock_dir() */
+#define MAX_TEMPDIRS 2
+static int n_tempdirs = 0;	/* actual number of directories created */
+static const char *temp_sockdir[MAX_TEMPDIRS];
+#endif
+
 
 UserOpts	user_opts;
 
@@ -396,6 +404,86 @@ adjust_data_dir(ClusterInfo *cluster)
 	check_ok();
 }
 
+#ifdef HAVE_UNIX_SOCKETS
+/*
+ * Remove the socket temporary directories.  pg_ctl waits for postmaster
+ * shutdown, so we expect the directory to be empty, unless we are interrupted
+ * by a signal, in which case the postmaster will clean up the sockets, but
+ * there's a race condition with us removing the directory.  Ignore errors;
+ * leaking a (usually empty) temporary directory is unimportant.  This can run
+ * from a signal handler.  The code is not acceptable in a Windows signal
+ * handler (see initdb.c:trapsig()), but Windows is not a HAVE_UNIX_SOCKETS
+ * platform.
+ */
+static void remove_temp(void)
+{
+	int			i;
+
+	for (i = 0; i < n_tempdirs; i++)
+	{
+		Assert(temp_sockdir[i]);
+		rmdir(temp_sockdir[i]);
+	}
+}
+
+/*
+ * Signal handler that calls remove_temp() and reraises the signal.
+ */
+static void
+signal_remove_temp(int signum)
+{
+	remove_temp();
+
+	pqsignal(signum, SIG_DFL);
+	raise(signum);
+}
+
+/*
+ * Create a temporary directory suitable for the server's Unix-domain socket.
+ * The directory will have mode 0700 or stricter, so no other OS user can open
+ * our socket to exploit it independently from the auth method used.  Most
+ * systems constrain the length of socket paths well below _POSIX_PATH_MAX, so
+ * we place the directory under /tmp rather than relative to the possibly-deep
+ * current working directory.
+ *
+ * Using a temporary directory so no connections arrive other than what
+ * pg_upgrade initiate itself.  Compared to using the compiled-in
+ * DEFAULT_PGSOCKET_DIR, this also permits pg_upgrade to work in builds that
+ * relocate it to a directory not writable to the cluster owner.
+ */
+static const char *
+make_temp_sockdir(void)
+{
+	char	   *template = strdup("/tmp/pg_upgrade-XXXXXX");
+
+	Assert(n_tempdirs < MAX_TEMPDIRS);
+	temp_sockdir[n_tempdirs] = mkdtemp(template);
+	if (temp_sockdir[n_tempdirs] == NULL)
+	{
+		fprintf(stderr, _("%s: could not create directory \"%s\": %s\n"),
+				os_info.progname, template, strerror(errno));
+		exit(2);
+	}
+
+	/*
+	 * Remove the directories during clean exit.  Register the handler only
+	 * once, though.
+	 */
+	if (n_tempdirs == 0)
+		atexit(remove_temp);
+
+	/*
+	 * Remove the directory before dying to the usual signals.  Omit SIGQUIT,
+	 * preserving it as a quick, untidy exit.
+	 */
+	pqsignal(SIGHUP, signal_remove_temp);
+	pqsignal(SIGINT, signal_remove_temp);
+	pqsignal(SIGPIPE, signal_remove_temp);
+	pqsignal(SIGTERM, signal_remove_temp);
+
+	return temp_sockdir[n_tempdirs++];
+}
+#endif   /* HAVE_UNIX_SOCKETS */
 
 /*
  * get_sock_dir
@@ -418,10 +506,8 @@ get_sock_dir(ClusterInfo *cluster, bool live_check)
 	{
 		if (!live_check)
 		{
-			/* Use the current directory for the socket */
-			cluster->sockdir = pg_malloc(MAXPGPATH);
-			if (!getcwd(cluster->sockdir, MAXPGPATH))
-				pg_fatal("cannot find current directory\n");
+			/* 

Re: [HACKERS] add line number as prompt option to psql

2014-07-11 Thread Sawada Masahiko
On Fri, Jul 11, 2014 at 4:23 PM, Jeevan Chalke
jeevan.cha...@enterprisedb.com wrote:
 Hi,

 A.

 However, this introduced new bug. As I told, when editor number of lines
 reaches INT_MAX it starts giving negative number. You tried overcoming this
  issue by adding a < 0 check. But I guess you again fumbled in setting that
 correctly. You are setting it to INT_MAX - 1. This enforces each new line to
 show line number as INT_MAX - 1 which is incorrect.

 postgres=# \set PROMPT1 '%/[%l]%R%# '
 postgres[1]=# \set PROMPT2 '%/[%l]%R%# '
 postgres[1]=# \e
 postgres[2147483646]-# limit
 postgres[2147483646]-# 1;

relname
 --
  pg_statistic
 (1 row)

 postgres[1]=# \e
 postgres[2147483646]-# =
 postgres[2147483646]-# '
 postgres[2147483646]'# abc
 postgres[2147483646]'# '
 postgres[2147483646]-# ;
  relname
 -
 (0 rows)


 postgres[1]=# select
 relname
 from
 pg_class
 where
 relname
 =
 '
 abc
 '
 ;


 Again to mimic that, I have simply intialized newline to INT_MAX - 2.
 Please don't take me wrong, but it seems that your unit testing is not
 enough. Above issue I discovered by doing exactly same steps I did in
 reviewing previous patch. If you had tested your new patch with those steps
 I guess you have caught that yourself.


To make sure I understand correctly: you mean that it is incorrect for the
line number to stop changing once it has reached INT_MAX?
And that the line number should wrap around to 1 when it has reached
INT_MAX?


 B.

 + /* Calculate the line number */
 + if (scan_result != PSCAN_INCOMPLETE)
 + {
 + /* The one new line is always added to tail of query_buf */
 + newline = (newline != 0) ? (newline + 1) : 1;
 + cur_line += newline;
 + }

 Here in above code changes, in any case you are adding 1 to newline. i.e.
 when it is 0 you are setting it 1 (+1) and when  0 you are setting nl + 1
 (again +1).
 So why can't you simply use
 if (scan_result != PSCAN_INCOMPLETE)
 cur_line += (newline + 1);

 Or better, why can't you initialize newline with 1 itself and then directly
 assign cur_line with newline. That will eliminate above entire code block,
 isn't it?
 Or much better, simply get rid of newline, and use cur_line itself, will
 this work well for you?

This is better.  I will change the code to do this.


 C. Typos:
 1.
 /* Count the number of new line for calculate ofline number */

 Missing space between 'of' and 'line'.
 However better improve that to something like (just suggestion):
 Count the number of new lines to correctly adjust current line number

 2.
 /* Avoid cur_line and newline exceeds the INT_MAX */

 You are saying avoid cur_line AND newline, but there is no adjustment for
 newline in the code following the comment.

 Thanks
 --
 Jeevan B Chalke
 Principal Software Engineer, Product Development
 EnterpriseDB Corporation
 The Enterprise PostgreSQL Company


Thanks.
I will fix it.

-- 
Regards,

---
Sawada Masahiko




Re: [HACKERS] Allowing join removals for more join types

2014-07-11 Thread David Rowley
On Wed, Jul 9, 2014 at 12:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David Rowley dgrow...@gmail.com writes:
  On 9 July 2014 09:27, Tom Lane t...@sss.pgh.pa.us wrote:
  On review it looks like analyzejoins.c would possibly benefit from an
  earlier fast-path check as well.

  Do you mean for non-subqueries? There already is a check to see if the
  relation has no indexes.

 Oh, sorry, that was a typo: I meant to write pathnode.c.  Specifically,
 we could skip the translate_sub_tlist step.  Admittedly that's not
 hugely expensive, but as long as we have the infrastructure for a quick
 check it might be worth doing.


  TBH I find the checks for FOR UPDATE and volatile functions to be
  questionable as well.

  Well, that's a real tough one for me as I only added that based on what
 you
  told me here:
  I doubt you should drop a subquery containing FOR UPDATE, either.
  That's a side effect, just as much as a volatile function would be.

 Hah ;-).  But the analogy to qual pushdown hadn't occurred to me at the
 time.


Ok, I've removed the check for volatile functions and FOR UPDATE.


  As far as I know the FOR UPDATE check is pretty much void as of now
 anyway,
  since the current state of query_is_distinct_for() demands that there's
  either a DISTINCT, GROUP BY or just a plain old aggregate without any
  grouping, which will just return a single row, neither of these will
 allow
  FOR UPDATE anyway.

 True.

  So the effort here should be probably be more focused on if we should
 allow
  the join removal when the subquery contains volatile functions. We should
  probably think fairly hard on this now as I'm still planning on working
 on
  INNER JOIN removals at some point and I'm thinking we should likely be
  consistent between the 2 types of join for when it comes to FOR UPDATE
 and
  volatile functions, and I'm thinking right now that for INNER JOINs that,
  since they're INNER that we could remove either side of the join. In that
  case maybe it would be harder for the user to understand why their
 volatile
  function didn't get executed.

 Color me dubious.  In exactly what circumstances would it be valid to
 suppress an inner join involving a sub-select?


hmm, probably I didn't think this through enough before commenting as I
don't actually have any plans for subselects with INNER JOINs. Though
saying that I guess there are cases that can be removed... Anything that
queries a single table that has a foreign key matching the join condition,
where the subquery does not filter or group the results. Obviously
something about the query would have to exist that caused it not to be
flattened, perhaps some windowing functions...
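
(A contrived sketch of the sort of case I mean, with hypothetical tables and
assuming t.r_id is a NOT NULL foreign key referencing r(id):

SELECT t.*
FROM t
INNER JOIN (SELECT id, row_number() OVER (ORDER BY id) AS rn
            FROM r) AS sub ON sub.id = t.r_id;

The window function keeps the subquery from being flattened, but since it
neither filters nor groups, every t row matches exactly one sub row, so if no
column of sub is referenced above the join it could in principle be removed.)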


I've attached an updated patch which puts in some fast path code for
subquery type joins. I'm really not too sure on a good name for this
function. I've ended up with query_supports_distinctness() which I'm not
that keen on, but I didn't manage to come up with anything better.

Regards

David Rowley


subquery_leftjoin_removal_v1.6.patch
Description: Binary data



[HACKERS] [PATCH] Fix search_path default value separator.

2014-07-11 Thread Christoph Martin
Hi

I noticed a minor inconsistency with the search_path separator used in the
default configuration.

The schemas of any search_path set using `SET search_path TO...` are
separated by ,  (comma, space), while the default value is only separated
by , (comma).

The attached patch against master changes the separator of the default
value to be consistent with the usual comma-space separators, and updates
the documentation of `SHOW search_path;` accordingly.

This massive three-byte change passes all 144 tests of make check.

Regards,

Christoph
From 0f52f107af59f560212bff2bda74e643d63687f0 Mon Sep 17 00:00:00 2001
From: Christoph Martin christoph.r.mar...@gmail.com
Date: Fri, 11 Jul 2014 08:52:39 +0200
Subject: [PATCH] Fix search_path default value separator.

When setting the search_path with `SET search_path TO...`, the schemas
of the resulting search_path as reported by `SHOW search_path` are
separated by a comma followed by a single space. This patch applies the
same format to the default search_path, and updates the documentation
accordingly.
---
 doc/src/sgml/ddl.sgml | 2 +-
 src/backend/utils/misc/guc.c  | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3b7fff4..2273616 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1746,7 +1746,7 @@ SHOW search_path;
 <screen>
  search_path
 --------------
- "$user",public
+ "$user", public
 </screen>
 The first element specifies that a schema with the same name as
 the current user is to be searched.  If no such schema exists,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3a31a75..a3f1051 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2915,7 +2915,7 @@ static struct config_string ConfigureNamesString[] =
 			GUC_LIST_INPUT | GUC_LIST_QUOTE
 		},
 		&namespace_search_path,
-		"\"$user\",public",
+		"\"$user\", public",
 		check_search_path, assign_search_path, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3f3e706..df98b02 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -499,7 +499,7 @@
 
 # - Statement Behavior -
 
-#search_path = '$user,public'		# schema names
+#search_path = '$user, public'	# schema names
 #default_tablespace = ''		# a tablespace name, '' uses the default
 #temp_tablespaces = ''			# a list of tablespace names, '' uses
 	# only default tablespace
-- 
2.0.1




Re: [HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan

2014-07-11 Thread Fujii Masao
On Fri, Jul 11, 2014 at 5:45 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy
 blocks for a bitmap heap scan when both the numbers of exact/lossy pages
 retrieved by the node are zero.  Such an example is shown below.  I
 think it would be better to suppress the 'Heap Blocks' line in that
 case, based on the same idea of the 'Buffers' line.  Patch attached.

The patch looks good to me. Barring any objection, I will commit this both
in HEAD and 9.4.

Regards,

-- 
Fujii Masao




Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Simon Riggs
On 11 July 2014 10:23, Tomas Vondra t...@fuzzy.cz wrote:
 On 11 Červenec 2014, 9:27, Simon Riggs wrote:
 On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote:

 (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number
 of batches this requires)

 If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does
 it save to recalculate the hash bucket at 10 instead?
 Resizing sounds like it will only be useful of we only just overflow our
 limit.

 If we release next version with this as a hardcoded change, my
 understanding is that memory usage for hash joins will leap upwards,
 even if the run time of queries reduces. It sounds like we need some
 kind of parameter to control this. We made it faster might not be
 true if we run this on servers that are already experiencing high
 memory pressure.

 Sure. We certainly don't want to make things worse for environments with
 memory pressure.

 The current implementation has two issues regarding memory:

 (1) It does not include buckets into used memory, i.e. it's not included
 into work_mem (so we may overflow work_mem). I plan to fix this, to make
 work_mem a bit more correct, as it's important for cases with
 NTUP_PER_BUCKET=1.

 (2) There's a significant palloc overhead, because of allocating each
 tuple separately - see my message from yesterday, where I observed the
 batch memory context to get 1.4GB memory for 700MB of tuple data. By
 densely packing the tuples, I got down to ~700MB (i.e. pretty much no
 overhead).

 The palloc overhead seems to be 20B (on x86_64) per tuple, and eliminating
 this it more than compensates for ~8B per tuple, required for
 NTUP_PER_BUCKET=1. And fixing (1) makes it more correct / predictable.

 It also improves the issue that palloc overhead is not counted into
 work_mem at all (that's why I got ~1.4GB batch context with work_mem=1GB).

 So in the end this should give us much lower memory usage for hash joins,
 even if we switch to NTUP_PER_BUCKET=1 (although that's pretty much
 independent change). Does that seem reasonable?

Yes, that alleviates my concern. Thanks.

 Regarding the tunable to control this - I certainly don't want another GUC
 no one really knows how to set right. And I think it's unnecessary thanks
 to the palloc overhead / work_mem accounting fix, described above.

Agreed, nor does anyone.

 The one thing I'm not sure about is what to do in case of reaching the
 work_mem limit (which should only happen with underestimated row count /
 row width) - how to decide whether to shrink the hash table or increment
 the number of batches. But this is not exclusive to NTUP_PER_BUCKET=1, it
 may happen with whatever NTUP_PER_BUCKET value you choose.

 The current code does not support resizing at all, so it always increments
 the number of batches, but I feel an interleaved approach might be more
 appropriate (nbuckets/2, nbatches*2, nbuckets/2, nbatches*2, ...). It'd be
 nice to have some cost estimates ('how expensive is a rescan' vs. 'how
 expensive is a resize'), but I'm not sure how to get that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Missing autocomplete for CREATE DATABASE

2014-07-11 Thread Magnus Hagander
On Fri, Jul 11, 2014 at 12:01 AM, Vik Fearing vik.fear...@dalibo.com wrote:
 On 07/10/2014 09:32 PM, Magnus Hagander wrote:
 It seems psql is missing autocomplete entries for LC_COLLATE and
 LC_CTYPE for the CREATE DATABASE command. Attached patch adds that.

 I doubt this is important enough to backpatch - thoughts?

 It won't apply to current head, but otherwise I see no problem with it.

Meh, thanks for pointing that out. I git-fetch:ed from the wrong repository :)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add line number as prompt option to psql

2014-07-11 Thread Jeevan Chalke
Hi,


On Fri, Jul 11, 2014 at 3:13 PM, Sawada Masahiko sawada.m...@gmail.com
wrote:

 
 

 To confirm my understanding: do you mean that leaving the line number
 unchanged once it has reached INT_MAX is incorrect?


As per my thinking yes.


 And the line number should be switched to 1 when line number has
 reached to INT_MAX?


Yes, when it goes beyond INT_MAX, wrap around to 1.

BTW, I wonder, can't we simply use unsigned int instead?

Also, what is the behaviour of the "LINE n" part of error messages in case of
such a wrap-around?



 
  Or much better, simply get rid of newline, and use cur_line itself, will
  this work well for you?

 this is better.  I will change code to this.
 Thanks.
 I will fix it.



Meanwhile, I have tried this. Attached is a patch; please share your
suggestions on it.

Thanks

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 255e8ca..030f4d0 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3298,6 +3298,11 @@ testdb=&gt; <userinput>INSERT INTO my_table VALUES (:'content');</userinput>
   </varlistentry>
 
   <varlistentry>
+<term><literal>%l</literal></term>
+<listitem><para>The current line number</para></listitem>
+  </varlistentry>
+
+  <varlistentry>
 <term><literal>%</literal><replaceable class="parameter">digits</replaceable></term>
 <listitem>
 <para>
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index c3aff20..675b550 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -8,6 +8,7 @@
 #include "postgres_fe.h"
 #include "mainloop.h"
 
+#include <limits.h>
 
 #include "command.h"
 #include "common.h"
@@ -58,6 +59,7 @@ MainLoop(FILE *source)
 	pset.cur_cmd_source = source;
 	pset.cur_cmd_interactive = ((source == stdin) && !pset.notty);
 	pset.lineno = 0;
+	cur_line = 1;
 
 	/* Create working state */
 	scan_state = psql_scan_create();
@@ -225,6 +227,7 @@ MainLoop(FILE *source)
 		{
 			PsqlScanResult scan_result;
 			promptStatus_t prompt_tmp = prompt_status;
+			char *tmp = line;
 
 			scan_result = psql_scan(scan_state, query_buf, prompt_tmp);
 			prompt_status = prompt_tmp;
@@ -235,6 +238,30 @@ MainLoop(FILE *source)
 exit(EXIT_FAILURE);
 			}
 
+			/* 
+			 * Increase current line number counter with the new lines present
+			 * in the line buffer
+			 */
+			while (*tmp != '\0' && scan_result != PSCAN_INCOMPLETE)
+			{
+if (*(tmp++) == '\n')
+	cur_line++;
+			}
+
+			/* The one new line is always added to tail of query_buf */
+			if (scan_result != PSCAN_INCOMPLETE)
+cur_line++;
+
+			/*
+			 * If we overflow, then we start at INT_MIN and move towards 0.  So
+			 * to get +ve wrap-around line number we have to add INT_MAX + 2 to
+			 * this number.  We add 2 due to the fact that we have difference
+			 * of 1 in absolute value of INT_MIN and INT_MAX and another 1 as
+			 * line number starts at one and not at zero.
+			 */
+			if (cur_line < 0)
+cur_line += INT_MAX + 2;
+
 			/*
 			 * Send command if semicolon found, or if end of line and we're in
 			 * single-line mode.
@@ -256,6 +283,7 @@ MainLoop(FILE *source)
 /* execute query */
 success = SendQuery(query_buf->data);
 slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
+cur_line = 1;
 
 /* transfer query to previous_buf by pointer-swapping */
 {
@@ -303,6 +331,7 @@ MainLoop(FILE *source)
  query_buf : previous_buf);
 
 success = slashCmdStatus != PSQL_CMD_ERROR;
+cur_line = 1;
 
 if ((slashCmdStatus == PSQL_CMD_SEND || slashCmdStatus == PSQL_CMD_NEWEDIT) &&
 	query_buf->len == 0)
diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c
index 26fca04..6a62e5f 100644
--- a/src/bin/psql/prompt.c
+++ b/src/bin/psql/prompt.c
@@ -44,6 +44,7 @@
  *		in prompt2 -, *, ', or ;
  *		in prompt3 nothing
  * %x - transaction status: empty, *, !, ? (unknown or no connection)
+ * %l - the line number
  * %? - the error code of the last query (not yet implemented)
  * %% - a percent sign
  *
@@ -229,6 +230,9 @@ get_prompt(promptStatus_t status)
 		}
 	break;
 
+case 'l':
+	sprintf(buf, "%d", cur_line);
+	break;
 case '?':
 	/* not here yet */
 	break;
diff --git a/src/bin/psql/prompt.h b/src/bin/psql/prompt.h
index 4d2f7e3..f1f80d2 100644
--- a/src/bin/psql/prompt.h
+++ b/src/bin/psql/prompt.h
@@ -22,4 +22,7 @@ typedef enum _promptStatus
 
 char	   *get_prompt(promptStatus_t status);
 
+/* Current line number */
+int			cur_line;
+
 #endif   /* PROMPT_H */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-07-11 Thread David Rowley
On Fri, Jul 11, 2014 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  We could no doubt fix this by also insisting that the left-side vars
  be provably not null, but that's going to make the patch even slower
  and even less often applicable.  I'm feeling discouraged about whether
  this is worth doing in this form.


:-( seems I didn't do my analysis very well on that one.


  Hm ... actually, there might be a better answer: what about transforming

WHERE (x,y) NOT IN (SELECT provably-not-null-values FROM ...)

 to

WHERE antijoin condition AND x IS NOT NULL AND y IS NOT NULL

 ?


I think this is the way to go.
It's basically what I had to do with the WIP patch I have here for SEMI
JOIN removal: where an IN() or EXISTS type join could be removed due to
the existence of a foreign key, the NULL values still need to be filtered
out.

Perhaps it would be possible for a future patch to check get_attnotnull and
remove these again in eval_const_expressions, if the column can't be null.

Thanks for taking the time to fix up the weirdness with the NATURAL joins
and also making use of the join condition to prove not null-ability.

I'll try and get some time soon to look into adding the IS NOT NULL quals,
unless you were thinking of looking again?

Regards

David Rowley


 Of course this would require x/y not being volatile, but if they are,
 we're not going to get far with optimizing the query anyhow.

 regards, tom lane



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-07-11 Thread Rahila Syed
Thank you for review.

So, you're compressing backup blocks one by one. I wonder if that's the
right idea and if we shouldn't instead compress all of them in one run to
increase the compression ratio.
The idea behind compressing blocks one by one was to keep the code as
similar to the original as possible.
For instance, the easiest change I could think of is: if we compress all
backup blocks of a WAL record together, the below format of the WAL record
might change

Fixed-size header (XLogRecord struct)
 rmgr-specific data
 BkpBlock
 backup block data
 BkpBlock
 backup block data


to

 Fixed-size header (XLogRecord struct)
 rmgr-specific data
 BkpBlock
 BkpBlock
 backup blocks data
...

But at the same time, it may be worth a try to see if there is a
significant improvement in compression.

So why aren't we compressing the hole here instead of compressing the
parts that the current logic deems to be filled with important information?
The entire full-page image in the WAL record is compressed. The unimportant
part of the full-page image, i.e. the hole, is not WAL-logged in the original
code. This patch compresses the entire full-page image, hole included. This
could be optimized by omitting the hole from the compressed FPW (in case the
hole is filled with non-zeros), like the original uncompressed FPW, but that
would lead to a change in the BkpBlock structure.
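
To illustrate (a sketch only, not the posted patch): skipping the hole before
compression would just mean copying the two used parts of the page into a
scratch buffer and compressing the packed copy, at the cost of carrying the
hole offset and length in the block header.

#include <string.h>

#define BLCKSZ 8192

/*
 * Sketch: pack the page into a scratch buffer with the hole (the unused gap
 * between pd_lower and pd_upper) removed, then compress the packed copy.
 * hole_offset/hole_length would come from the BkpBlock header; compress()
 * stands in for pglz/LZ4/snappy.
 */
static int
compress_block_without_hole(const char *page,
							int hole_offset, int hole_length,
							char *scratch,		/* at least BLCKSZ bytes */
							char *dest,
							int (*compress) (const char *src, int srclen, char *dst))
{
	int			srclen = BLCKSZ - hole_length;

	memcpy(scratch, page, hole_offset);
	memcpy(scratch + hole_offset,
		   page + hole_offset + hole_length,
		   BLCKSZ - (hole_offset + hole_length));
	return compress(scratch, srclen, dest);
}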

This should be named 'compress_full_page_writes' or so, even if a
temporary guc. There's the 'full_page_writes' guc and I see little
reason to deviate from its name.

Yes. This will be renamed to full_page_compression according to suggestions
earlier in the discussion.


Thank you,

Rahila Syed


On Fri, Jul 11, 2014 at 12:00 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-07-04 19:27:10 +0530, Rahila Syed wrote:
  + /* Allocates memory for compressed backup blocks according to the
 compression
  + * algorithm used.Once per session at the time of insertion of
 first XLOG
  + * record.
  + * This memory stays till the end of session. OOM is handled by
 making the
  + * code proceed without FPW compression*/
  + static char *compressed_pages[XLR_MAX_BKP_BLOCKS];
  + static bool compressed_pages_allocated = false;
  + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF 
  + compressed_pages_allocated!= true)
  + {
  + size_t buffer_size = VARHDRSZ;
  + int j;
  + if (compress_backup_block ==
 BACKUP_BLOCK_COMPRESSION_SNAPPY)
  + buffer_size +=
 snappy_max_compressed_length(BLCKSZ);
  + else if (compress_backup_block ==
 BACKUP_BLOCK_COMPRESSION_LZ4)
  + buffer_size += LZ4_compressBound(BLCKSZ);
  + else if (compress_backup_block ==
 BACKUP_BLOCK_COMPRESSION_PGLZ)
  + buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ);
  + for (j = 0; j  XLR_MAX_BKP_BLOCKS; j++)
  + {   compressed_pages[j] = (char *) malloc(buffer_size);
  + if(compressed_pages[j] == NULL)
  + {
  +
 compress_backup_block=BACKUP_BLOCK_COMPRESSION_OFF;
  + break;
  + }
  + }
  + compressed_pages_allocated = true;
  + }

 Why not do this in InitXLOGAccess() or similar?

/*
 * Make additional rdata chain entries for the backup blocks, so
 that we
 * don't need to special-case them in the write loop.  This
 modifies the
  @@ -1015,11 +1048,32 @@ begin:;
rdt-next = (dtbuf_rdt2[i]);
rdt = rdt-next;
 
  + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF)
  + {
  + /* Compress the backup block before including it in rdata
 chain */
  + rdt-data = CompressBackupBlock(page, BLCKSZ -
 bkpb-hole_length,
  +
   compressed_pages[i], (rdt-len));
  + if (rdt-data != NULL)
  + {
  + /*
  +  * write_len is the length of compressed
 block and its varlena
  +  * header
  +  */
  + write_len += rdt-len;
  + bkpb-hole_length = BLCKSZ - rdt-len;
  + /*Adding information about compression in
 the backup block header*/
  +
 bkpb-block_compression=compress_backup_block;
  + rdt-next = NULL;
  + continue;
  + }
  + }
  +

 So, you're compressing backup blocks one by one. I wonder if that's the
 right idea and if we shouldn't instead compress all of them in one run to
 increase the compression ratio.


  +/*
* Get a pointer to the right location in the WAL buffer containing the
* given XLogRecPtr.
*
  @@ -4061,6 +4174,50 @@ RestoreBackupBlockContents(XLogRecPtr 

Re: [HACKERS] Minmax indexes

2014-07-11 Thread Alvaro Herrera
Fujii Masao wrote:
 On Thu, Jul 10, 2014 at 6:16 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  Here's a new version of this patch, which is more generic the original
  versions, and similar to what you describe.
 
 I've not read the discussion so far at all, but I found the problem
 when I played with this patch. Sorry if this has already been discussed.
 
 =# create table test as select num from generate_series(1,10) num;
 SELECT 10
 =# create index testidx on test using minmax (num);
 CREATE INDEX
 =# alter table test alter column num type text;
 ERROR:  could not determine which collation to use for string comparison
 HINT:  Use the COLLATE clause to set the collation explicitly.

Ah, yes, I need to pass down collation OIDs to comparison functions.
That's marked as XXX in various places in the code.  Sorry I forgot to
mention that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-11 Thread Bruce Momjian
On Fri, Jul 11, 2014 at 12:18:40AM -0400, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Thu, Jul 10, 2014 at 06:38:26PM -0400, Bruce Momjian wrote:
 
   I have thought some more on this.  I thought I would need to open
   pg_class in C and do complex backend stuff, but I now realize I can do
   it from libpq, and just call ALTER TABLE and I think that always
   auto-checks if a TOAST table is needed.  All I have to do is query
   pg_class from libpq, then construct ALTER TABLE commands for each item,
   and it will optionally create the TOAST table if needed.  I just have to
   use a no-op ALTER TABLE command, like SET STATISTICS.
  
  Attached is the backend part of the patch.  I will work on the
  pg_upgrade/libpq/ALTER TABLE part later.
 
 Uh, why does this need to be in ALTER TABLE?  Can't this be part of
 table creation done by pg_dump?

Uh, I think you need to read the thread.  We have to delay the toast
creation part so we don't use an oid that will later be required by
another table from the old cluster.  This has to be done after all
tables have been created.

We could have pg_dump spit out those ALTER lines at the end of the dump,
but it seems simpler to do it in pg_upgrade.

Even if we have pg_dump create all the tables that require pre-assigned
TOAST oids first, then the other tables that _might_ need a TOAST table,
those later tables might create a toast oid that matches a later
non-TOAST-requiring table, so I don't think that fixes the problem.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-11 Thread Bruce Momjian
On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
  Uh, why does this need to be in ALTER TABLE?  Can't this be part of
  table creation done by pg_dump?
 
 Uh, I think you need to read the thread.  We have to delay the toast
 creation part so we don't use an oid that will later be required by
 another table from the old cluster.  This has to be done after all
 tables have been created.
 
 We could have pg_dump spit out those ALTER lines at the end of the dump,
 but it seems simpler to do it in pg_upgrade.
 
 Even if we have pg_dump create all the tables that require pre-assigned
 TOAST oids first, then the other tables that _might_ need a TOAST table,
 those later tables might create a toast oid that matches a later
 non-TOAST-requiring table, so I don't think that fixes the problem.

What would be nice is if I could mark just the tables that will need
toast tables created in that later phase (those tables that didn't have
a toast table in the old cluster, but need one in the new cluster). 
However, I can't see where to store that or how to pass that back into
pg_upgrade.  I don't see a logical place in pg_class to put it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add line number as prompt option to psql

2014-07-11 Thread Alvaro Herrera
Jeevan Chalke wrote:

 On Fri, Jul 11, 2014 at 3:13 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

  And the line number should be switched to 1 when line number has
  reached to INT_MAX?
 
 Yes, when it goes beyond INT_MAX, wrap around to 1.
 
 BTW, I wonder, can't we simply use unsigned int instead?

That was my thought also: let the variable be unsigned, and have it wrap
around normally.  So once you reach UINT_MAX, the next line number is
zero (instead of getting stuck at UINT_MAX, which would be rather
strange).  Anyway I don't think anyone is going to reach the UINT_MAX
limit ... I mean that would be one hell of a query, wouldn't it.  If
your query is upwards of a million lines, surely you are in deep trouble
already.

Does your text editor handle files longer than 4 billion lines?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread David G Johnston
Benedikt Grundmann wrote
 That is, is it possible to tell the planner that an index is off limits,
 i.e. don't ever generate a plan using it?
 
 Rationale: schema changes on big tables.  I might have convinced myself /
 hold strong beliefs that for all queries that I need to be fast, the planner
 does not need to use a given index (e.g. other possible plans are fast
 enough).  However, if I just drop the index and it turns out I'm wrong, I
 might be in a world of pain because it might just take way too long to
 recreate the index.
 
 I know that I can use pg_stat* to figure out if an index is used at all.
 But in the presence of multiple indices and complex queries, the planner
 might prefer the index-to-be-dropped even though the difference to the
 available alternatives is immaterial.
 
 The current best alternative we have is to test such changes on a testing
 database that gets regularly restored from production.  However, at least in
 our case, we simply don't know all possible queries (and logging all of them
 is not an option).
 
 Cheers,
 
 Bene

Worth double-checking in test but...

BEGIN;
DROP INDEX ...;
EXPLAIN ANALYZE SELECT ...
ROLLBACK;

Index dropping is transactional so your temporary action lasts until you
abort said transaction.

Though given your knowledge limitations this really isn't an improvement...

Catalog hacking could work but not recommended (nor do I know the proper
commands and limitations).  Do you need the database/table to accept writes
during the testing period?

You can avoid all indexes, but not a named subset, using a configuration
parameter.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Is-there-a-way-to-temporarily-disable-a-index-tp5811249p5811290.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-07-11 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 On Fri, Jul 11, 2014 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hm ... actually, there might be a better answer: what about transforming
 WHERE (x,y) NOT IN (SELECT provably-not-null-values FROM ...)
 to
 WHERE antijoin condition AND x IS NOT NULL AND y IS NOT NULL

 I think this is the way to go.

 I'll try and get some time soon to look into adding the IS NOT NULL quals,
 unless you were thinking of looking again?

Go for it, I've got a lot of other stuff on my plate.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
 Benedikt Grundmann wrote
 That is it possible to tell the planner that index is off limits i.e.
 don't ever generate a plan using it?

 Catalog hacking could work but not recommended (nor do I know the proper
 commands and limitations).  Do you need the database/table to accept writes
 during the testing period?

Hacking pg_index.indisvalid could work, given a reasonably recent PG.
I would not try it in production until I'd tested it ;-)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Andres Freund
On 2014-07-11 11:07:21 -0400, Tom Lane wrote:
 David G Johnston david.g.johns...@gmail.com writes:
  Benedikt Grundmann wrote
  That is it possible to tell the planner that index is off limits i.e.
  don't ever generate a plan using it?
 
  Catalog hacking could work but not recommended (nor do I know the proper
  commands and limitations).  Do you need the database/table to accept writes
  during the testing period?
 
 Hacking pg_index.indisvalid could work, given a reasonably recent PG.
 I would not try it in production until I'd tested it ;-)

 Works, but IIRC can cause problems at least < 9.4 because concurrent
cache builds might miss the pg_index row...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-07-11 11:07:21 -0400, Tom Lane wrote:
 Hacking pg_index.indisvalid could work, given a reasonably recent PG.
 I would not try it in production until I'd tested it ;-)

  Works, but IIRC can cause problems at least < 9.4 because concurrent
 cache builds might miss the pg_index row...

If you're talking about SnapshotNow hazards, I think the risk would be
minimal, and probably no worse than cases that the system will cause
by itself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Andres Freund
On 2014-07-11 11:20:08 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-07-11 11:07:21 -0400, Tom Lane wrote:
  Hacking pg_index.indisvalid could work, given a reasonably recent PG.
  I would not try it in production until I'd tested it ;-)
 
   Works, but IIRC can cause problems at least < 9.4 because concurrent
  cache builds might miss the pg_index row...
 
 If you're talking about SnapshotNow hazards, I think the risk would be
 minimal, and probably no worse than cases that the system will cause
 by itself.

Yes, SnapshotNow. I could reproduce it causing 'spurious' HOT updates
and missing index inserts a while back. And I don't think it's
comparable with normal modifications. Those either have a modification
blocking lock or use heap_inplace...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-07-11 11:20:08 -0400, Tom Lane wrote:
 If you're talking about SnapshotNow hazards, I think the risk would be
 minimal, and probably no worse than cases that the system will cause
 by itself.

 Yes, SnapshotNow. I could reproduce it causing 'spurious' HOT updates
 and missing index inserts a while back. And I don't think it's
 comparable with normal modifications. Those either have a modification
 blocking lock or use heap_inplace...

I still think the risk is minimal, but if the OP was worried about this
he could take out an AccessExclusive lock on the parent table for long
enough to commit the pg_index change.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-07-11 Thread Jeff Davis
On Fri, 2014-07-11 at 14:41 +0900, Fujii Masao wrote:
 Could you add the patch into next CF?

Sure. The patch is so small I was thinking about committing it in a few
days (assuming no complaints), but I'm in no hurry.

 The patch doesn't contain the change of the document. But I think that
 it's better to document what character is allowed as escape in LIKE,
 SIMILAR TO and SUBSTRING.

It should be assumed that multi-byte characters are allowed nearly
everywhere, and we should document the places where only single-byte
characters are allowed.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-07-11 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 Attached is a small patch to $SUBJECT.
 In master, only single-byte characters are allowed as an escape. Of
 course, with the patch it must still be a single character, but it may
 be multi-byte.

I'm concerned about the performance cost of this patch.  Have you done
any measurements about what kind of overhead you are putting on the
inner loop of similar_escape?

At the very least, please don't call GetDatabaseEncoding() over again
every single time through the inner loop.  More generally, why do we
need to use pg_encoding_verifymb?  The input data is presumably validly
encoded.  ISTM the significantly cheaper pg_mblen() would be more
appropriate.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread Michael Banck
On Fri, Jul 11, 2014 at 11:07:21AM -0400, Tom Lane wrote:
 David G Johnston david.g.johns...@gmail.com writes:
  Benedikt Grundmann wrote
  That is it possible to tell the planner that index is off limits
  i.e.
  don't ever generate a plan using it?
 
  Catalog hacking could work but not recommended (nor do I know the
  proper
  commands and limitations).  Do you need the database/table to accept
  writes
  during the testing period?
 
 Hacking pg_index.indisvalid could work, given a reasonably recent PG.
 I would not try it in production until I'd tested it ;-)

I wonder whether this should be exposed at the SQL level?  Hacking
pg_index is left to superusers, but the creator of an index (or the
owner of the schema) might want to experiment with disabling indices
while debugging query plans as well.

Turns out this is already on the TODO list; Steve Singer has requested this
(in particular, ALTER TABLE ...  ENABLE|DISABLE INDEX ...) in
http://www.postgresql.org/message-id/87hbegz5ir@cbbrowne.afilias-int.info
(as linked to from the TODO wiki page), but the neighboring discussion
was mostly about FK constraints.

Thoughts?


Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is there a way to temporarily disable a index

2014-07-11 Thread David Johnston
On Fri, Jul 11, 2014 at 12:12 PM, Michael Banck mba...@gmx.net wrote:

 On Fri, Jul 11, 2014 at 11:07:21AM -0400, Tom Lane wrote:
  David G Johnston david.g.johns...@gmail.com writes:
   Benedikt Grundmann wrote
   That is it possible to tell the planner that index is off limits
   i.e.
   don't ever generate a plan using it?
 
   Catalog hacking could work but not recommended (nor do I know the
   proper
   commands and limitations).  Do you need the database/table to accept
   writes
   during the testing period?
 
  Hacking pg_index.indisvalid could work, given a reasonably recent PG.
  I would not try it in production until I'd tested it ;-)

 I wonder whether this should be exposed at the SQL level?  Hacking
 pg_index is left to superusers, but the creator of an index (or the
 owner of the schema) might want to experiment with disabling indices
 while debugging query plans as well.

 Turns out this is already in the TODO, Steve Singer has requested this
 (in particular, ALTER TABLE ...  ENABLE|DISABLE INDEX ...) in

 http://www.postgresql.org/message-id/87hbegz5ir@cbbrowne.afilias-int.info
 (as linked to from the TODO wiki page), but the neighboring discussion
 was mostly about FK constraints.

 Thoughts?


 Michael


Apparently work is ongoing to allow EXPLAIN to calculate the impact a
particular index has on table writes.  What is needed is a mechanism to
temporarily remove the impact of specific indexes on reads without
having to disable the index for writing.  Ideally on a per-query
basis, so altering the catalog doesn't make sense.  I know we do not want
traditional planner hints, but in the spirit of the existing
enable_indexscan GUC there should be a
disable_readofindex='table1.index1,table1.index2,table2.index1' GUC
capability that would allow for session, user, or system-level control of
which indexes are to be used during table reads.

David J.


Re: [HACKERS] Minmax indexes

2014-07-11 Thread Claudio Freire
On Thu, Jul 10, 2014 at 4:20 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Claudio Freire wrote:
 On Wed, Jul 9, 2014 at 6:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  Another thing I noticed is that version 8 of the patch blindly believed
  the pages_per_range declared in catalogs.  This meant that if somebody
  did alter index foo set pages_per_range=123 the index would
  immediately break (i.e. return corrupted results when queried).  I have
  fixed this by storing the pages_per_range value used to construct the
  index in the metapage.  Now if you do the ALTER INDEX thing, the new
  value is only used when the index is recreated by REINDEX.

 This seems a lot like parameterizing.

 I don't understand what that means -- care to elaborate?

We've been talking about bloom filters, and how their shape differs
according to the parameters of the bloom filter (number of hashes,
hash type, etc).

But after seeing this case of pages_per_range, I noticed it's an
effective-enough mechanism. Like:

CREATE INDEX ix_blah ON some_table USING bloom (somecol)
  WITH (BLOOM_HASHES=15, BLOOM_BUCKETS=1024, PAGES_PER_RANGE=64);

Marking as read-only is ok, or emitting a NOTICE so that if anyone
changes those parameters that change the shape of the index, they know
it needs a rebuild would be OK too. Both mechanisms work for me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add line number as prompt option to psql

2014-07-11 Thread Sawada Masahiko
On Fri, Jul 11, 2014 at 11:01 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Jeevan Chalke wrote:

 On Fri, Jul 11, 2014 at 3:13 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

  And the line number should be switched to 1 when line number has
  reached to INT_MAX?

 Yes, when it goes beyond INT_MAX, wrap around to 1.

 BTW, I wonder, can't we simply use unsigned int instead?

 That was my thought also: let the variable be unsigned, and have it wrap
 around normally.  So once you reach UINT_MAX, the next line number is
 zero (instead of getting stuck at UINT_MAX, which would be rather
 strange).  Anyway I don't think anyone is going to reach the UINT_MAX
 limit ... I mean that would be one hell of a query, wouldn't it.  If
 your query is upwards of a million lines, surely you are in deep trouble
 already.

 Does your text editor handle files longer than 4 billion lines?


As you said, a line number actually reaching UINT_MAX would be a very
strange case.
I think INT_MAX is enough for the line number.

The v5 patch which Jeevan created seems good.
One point, though: I got a failed hunk when the patch was applied to HEAD
(in the doc file), so I have revised it and attached it.

Regards,

---
Sawada Masahiko


psql-line-number_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add line number as prompt option to psql

2014-07-11 Thread Alvaro Herrera
Sawada Masahiko wrote:

 As you said, if line number reached UINT_MAX then I think that this
 case is too strange.
 I think INT_MAX is enough for line number.

My point is not whether 2 billion is a better number than 4 billion as a
maximum value.  My point is that wraparound of signed int is, I think,
not even defined in C, whereas wraparound of unsigned int is well
defined.  cur_line should be declared as unsigned int.  I don't trust
that INT_MAX+2 arithmetic.
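
For illustration (standalone C, not the psql code): unsigned arithmetic is
defined to wrap modulo UINT_MAX+1, so an unsigned counter needs no special
handling at all.

#include <stdio.h>

int
main(void)
{
	unsigned int cur_line = 4294967295u;	/* UINT_MAX */

	cur_line++;								/* well defined: wraps to 0 */
	printf("%u\n", cur_line);				/* prints 0 */
	return 0;
}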

Please don't use cur_line as a name for a global variable.  Something
like PSQLLineNumber seems more appropriate if it's going to be exposed
through prompt.h.  However, note that MainLoop() keeps state in local
variables and notes that it is reentrant; what happens to your cur_line
when a file is read by \i and similar?  I wonder if it should be part of
PsqlScanStateData instead ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Tomas Vondra
On 10.7.2014 21:33, Tomas Vondra wrote:
 On 9.7.2014 16:07, Robert Haas wrote:
 On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Thinking about this a bit more, do we really need to build the hash
 table on the first pass? Why not to do this:

 (1) batching
 - read the tuples, stuff them into a simple list
 - don't build the hash table yet

 (2) building the hash table
 - we have all the tuples in a simple list, batching is done
 - we know exact row count, can size the table properly
 - build the table

 We could do this, and in fact we could save quite a bit of memory if
 we allocated say 1MB chunks and packed the tuples in tightly instead
 of palloc-ing each one separately.  But I worry that rescanning the
 data to build the hash table would slow things down too much.
 
 I did a quick test of how much memory we could save by this. The 
 attached patch densely packs the tuples into 32kB chunks (1MB seems
 way too much because of small work_mem values, but I guess this might
 be tuned based on number of tuples / work_mem size ...).

Turns out getting this working properly will be quite complicated. The
patch was only a quick attempt to see how much overhead there is, and
didn't solve one important detail - batching.

The problem is that when increasing the number of batches, we need to
get rid of the tuples from one of them. Which means calling pfree() on
the tuples written to a temporary file, and that's not possible with the
dense allocation.


1) copy into new chunks (dead end)
--

The first idea I had was to just copy everything into new chunks and
then throw away the old ones, but this way we might end up using 150% of
work_mem on average (when the two batches are about 1/2 the data each),
and in an extreme case up to 200% of work_mem (all tuples having the
same key and thus falling into the same batch). So I don't think that's
really a good approach.


2) walking through the tuples sequentially
--

The other option is not to walk the tuples through the buckets, but to walk
through the chunks - we know the tuples are stored as
HashJoinTuple/MinimalTuple, so it shouldn't be that difficult. And by
walking the tuples in the order they're stored, we may just rearrange
the tuples in already allocated memory. And maybe it could be even
faster than the current code, because of the sequential access to the
memory (as opposed to the random access through buckets) and CPU caches.
So I like this approach - it's simple, clean and shouldn't add any
overhead (memory or CPU).
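
A standalone sketch of what I mean (names and sizes are illustrative only,
this is not the actual patch):

#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (32 * 1024)		/* 32kB chunks, as in the experiment */

/* Densely packed chunk: tuples are stored back to back in data[]. */
typedef struct Chunk
{
	struct Chunk *next;
	size_t		used;
	char		data[CHUNK_SIZE];
} Chunk;

/* Copy a tuple into the current chunk, starting a new chunk when full. */
static void *
chunk_store(Chunk **head, const void *tuple, size_t len)
{
	Chunk	   *c = *head;

	if (c == NULL || c->used + len > CHUNK_SIZE)
	{
		c = malloc(sizeof(Chunk));	/* error handling / oversized tuples omitted */
		c->next = *head;
		c->used = 0;
		*head = c;
	}
	memcpy(c->data + c->used, tuple, len);
	c->used += len;
	return c->data + c->used - len;
}

/*
 * Walk the tuples in storage order, chunk by chunk, instead of following
 * bucket pointers.  tuple_len() is assumed to read the stored tuple's length
 * from its header (as MinimalTuple allows).
 */
static void
chunk_walk(Chunk *head,
		   size_t (*tuple_len) (const void *),
		   void (*visit) (void *))
{
	for (Chunk *c = head; c != NULL; c = c->next)
		for (size_t off = 0; off < c->used; off += tuple_len(c->data + off))
			visit(c->data + off);
}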


3) batch-aware chunks
-

I also think batch-aware chunks might work. Say we're starting with
nbatch=N. Instead of allocating everything in a single chunk, we'll
allocate the tuples into chunks according to a hypothetical batch
number - what batch the tuple would belong to if we had nbatch=N*2.
So we'd have two chunks (or sequences of chunks), and we'd allocate the
tuples into them.

Then if we actually need to increase the number of batches, we know we
can just free one of the chunk sequences, because all of its tuples belong to
the 'wrong' batch, and redistribute the remaining tuples into the new
chunks (according to the new hypothetical batch numbers).

I'm not sure how much overhead this would be. I think it can be
done quite efficiently, though, and it shouldn't have any impact at all
if we don't do any additional batching (i.e. if the initial estimates
are ok).
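
Continuing the sketch above (again purely illustrative, and assuming for
simplicity that batchno is just hashvalue modulo nbatch, with nbatch a power
of two): each batch would keep two chunk lists, split by the batch the tuple
would land in if nbatch doubled, so that one whole list can simply be freed
when that actually happens.

#include <stdbool.h>

typedef struct
{
	Chunk	   *keep;	/* tuples that stay in this batch after doubling */
	Chunk	   *move;	/* tuples that would move to the new batch */
} BatchChunks;

static void
batch_store(BatchChunks *bc, const void *tuple, size_t len,
			unsigned int hashvalue, int nbatch)
{
	/* the bit that decides the tuple's batch once nbatch doubles */
	bool		moves = (hashvalue & (unsigned int) nbatch) != 0;

	chunk_store(moves ? &bc->move : &bc->keep, tuple, len);
}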

Any other ideas how to tackle this?

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2014-07-11 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 I did again the refactoring you did back in 2006, patch attached.
 One thing I did differently: I moved the raw, non-encrypted,
 read/write functions to separate functions: pqsecure_raw_read and
 pqsecure_raw_write. Those functions encapsulate the SIGPIPE
 handling. The OpenSSL code implements a custom BIO, which calls to
 pqsecure_raw_read/write to do the low-level I/O.  Similarly in the
 server-side, there are be_tls_raw_read and pg_tls_raw_write
 functions, which do the
 prepare_for_client_read()/client_read_ended() dance, so that the SSL
 implementation doesn't need to know about that.

I'm skimming over this patch (0001).  There are some issues:

* You duplicated the long comment under the IDENTIFICATION tag that was
  in be-secure.c; it's now both in that file and also in
  be-secure-openssl.c.  I think it should be removed from be-secure.c.
  Also, the hardcoded DH params are duplicated in be-secure.c, but they
  belong in -openssl.c only now.

* There is some mixup regarding USE_SSL and USE_OPENSSL in be-secure.c.
  I think anything that's OpenSSL-specific should be in
  be-secure-openssl.c only; any new SSL implementation will need to
  implement all those functions.  For instance, be_tls_init().
  I imagine that if we select any SSL implementation, USE_SSL would get
  defined, and each SSL implementation would additionally define its own
  symbol.  Unless the idea is to get rid of USE_OPENSSL completely, and
  use only the Makefile bit to decide which implementation to use?  If
  so, then USE_OPENSSL as a preprocessor symbol is useless ...

* ssl_renegotiation_limit is also duplicated.  But removing this one is
  probably not going to be as easy as deleting a line from be-secure.c,
  because guc.c depends on that one.  I think that variable should be
  defined in be-secure.c (i.e. delete it from -openssl) and make sure
  that all SSL implementations enforce it on their own somehow.

The DISABLE_SIGPIPE thingy looks wrong in pqsecure_write.  I think it
should be like this:

ssize_t
pqsecure_write(PGconn *conn, const void *ptr, size_t len)
{
ssize_t n;

#ifdef USE_SSL
if (conn->ssl_in_use)
{
DECLARE_SIGPIPE_INFO(spinfo);

DISABLE_SIGPIPE(conn, spinfo, return -1);

n = pgtls_write(conn, ptr, len);

RESTORE_SIGPIPE(spinfo);
}
else 
#endif   /* USE_OPENSSL */
{
n = pqsecure_raw_write(conn, ptr, len);
}

return n;
}

You are missing the restore call, and I moved the declaration inside the
ssl_in_use block since otherwise it's not useful.  The other path is
fine since pqsecure_raw_write disables and restores the flag by itself.
Also, you're missing DECLARE/DISABLE/RESTORE in the ssl_in_use block in
pqsecure_read.  (The original code does not have that code in the
non-SSL path.  I assume, without checking, that that's okay.)  I also
assume without checking that all SSL implementations would be fine with
this SIGPIPE handling.

Another thing that seems wrong is the REMEMBER_EPIPE stuff.  The
fe-secure-openssl.c code should be setting the flag, but AFAICS only the
non-SSL code is doing it.


Thanks for working on this -- I'm sure many distributors will be happy
to be able to enable other, less license-broken TLS implementations.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS Design

2014-07-11 Thread Robert Haas
On Fri, Jul 11, 2014 at 4:55 AM, Stephen Frost sfr...@snowman.net wrote:
 On Thursday, July 10, 2014, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jul 9, 2014 at 2:13 AM, Stephen Frost sfr...@snowman.net wrote:
  Yes, this would be possible (and is nearly identical to the original
  patch, except that this includes per-role considerations), however, my
  thinking is that it'd be simpler to work with policy names rather than
  sets of quals, to use when mapping to roles, and they would potentially
  be useful later for other things (eg: for setting up which policies
 should be applied when, or which should be OR'd or AND'd with other
  policies, or having groups of policies, etc).

 Hmm.  I guess that's reasonable.  Should the policy be a per-table
 object (like rules, constraints, etc.) instead of a global object?

 You could do:

 ALTER TABLE table_name ADD POLICY policy_name (quals);
 ALTER TABLE table_name POLICY FOR role_name IS policy_name;
 ALTER TABLE table_name DROP POLICY policy_name;

 Right, I was thinking they would be per table as they would specifically
 provide a name for a set of quals, and quals are naturally table-specific. I
 don't see a need to have them be global- that had been brought up before
 with the notion of applications picking their policy, but we could also add
 that later through another term (eg: contexts) which would then map to
 policies or similar. We could even extend policies to be global by mapping
 existing per-table ones to be global if we really needed to...

 My feeling at the moment is that having them be per-table makes sense and
 we'd still have flexibility to change later if we had some compelling reason
 to do so.

I don't think you can really change it later.  If policies are
per-table, then you could have a policy p1 on table t1 and also on
table t2; if they become global objects, then you can't have p1 in two
places.  I hope I'm not beating a dead horse here, but changing syntax
after it's been released is very, very hard.

But that's not an argument against doing it this way; I think
per-table policies are probably simpler and better here.  It means,
for example, that policies need not have their own permissions and
ownership structure - they're part of the table, just like a
constraint, trigger, or rule, and the table owner's permissions
control.  I like that, and I think our users will, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] things I learned from working on memory allocation

2014-07-11 Thread Robert Haas
On Thu, Jul 10, 2014 at 1:05 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jul 8, 2014 at 1:27 AM, Robert Haas robertmh...@gmail.com wrote:
 6. In general, I'm worried that it's going to be hard to keep the
 overhead of parallel sort from leaking into the non-parallel case.
 With the no-allocator approach, every place that uses
 GetMemoryChunkSpace() or repalloc() or pfree() will have to handle the
 DSM and non-DSM cases differently, which isn't great for either
 performance or maintainability.  And even with an allocator, the
 SortTuple array will need to use relative pointers in a DSM; that
 might burden the non-DSM case.

 I think you are right in saying that there can be a performance impact
 for non-parallel case due to pfree() (or other similar calls) and/or due
 to usage of relative pointers, but can it have measurable impact during
 actual sort operation?

Benchmarking seems to indicate that it indeed can.

 If there is a noticeable impact, then do you think having a separate
 file/infrastructure for parallel sort can help, basically non-parallel
 and parallel sort will have some common and some specific parts.
 The above layer will call the parallel/non-parallel functions based on
 operation need.

tuplesort.c already handles so many cases that adding yet
another axis of variability is intimidating.  But it may work out that
there's no better option.
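
For what it's worth, the relative-pointer idea mentioned above just means
storing offsets from the segment base instead of absolute addresses, since a
DSM segment may be mapped at different addresses in different processes; a
minimal illustration (not the actual tuplesort code):

#include <stddef.h>

typedef size_t relptr;				/* offset within the DSM segment */

static inline relptr
relptr_from_abs(const char *base, const void *ptr)
{
	return (size_t) ((const char *) ptr - base);
}

static inline void *
abs_from_relptr(char *base, relptr off)
{
	return base + off;
}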

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS Design

2014-07-11 Thread Stephen Frost
Robert,

On Friday, July 11, 2014, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jul 11, 2014 at 4:55 AM, Stephen Frost sfr...@snowman.net wrote:
  My feeling at the moment is that having them be per-table makes sense and
  we'd still have flexibility to change later if we had some compelling
 reason
  to do so.

 I don't think you can really change it later.  If policies are
 per-table, then you could have a policy p1 on table t1 and also on
 table t2; if they become global objects, then you can't have p1 in two
 places.  I hope I'm not beating a dead horse here, but changing syntax
 after it's been released is very, very hard.


Fair enough. My thinking was we'd come up with a way to map them (eg:
table_policy), but I do agree that changing it later would really suck and
having them be per-table makes a lot of sense.


 But that's not an argument against doing it this way; I think
 per-table policies are probably simpler and better here.  It means,
 for example, that policies need not have their own permissions and
 ownership structure - they're part of the table, just like a
 constraint, trigger, or rule, and the table owner's permissions
 control.  I like that, and I think our users will, too.


Agreed, and I believe this is more-or-less what I had proposed up-thread
(not at a computer at the moment). I hope to have a chance to review and
update the design and flesh out the catalog definition this weekend.

Thanks!

Stephen


Re: [HACKERS] Minmax indexes

2014-07-11 Thread Greg Stark
On Fri, Jul 11, 2014 at 6:00 PM, Claudio Freire klaussfre...@gmail.com wrote:
 Marking as read-only is ok, or emitting a NOTICE so that if anyone
 changes those parameters that change the shape of the index, they know
 it needs a rebuild would be OK too. Both mechanisms work for me.

We don't actually have any of these mechanisms. They wouldn't be bad
things to have but I don't think we should gate adding new types of
indexes on adding them. In particular, the index could just hard code
a value for these parameters and having them be parameterized is
clearly better even if that doesn't produce all the warnings or
rebuild things automatically or whatever.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-07-11 Thread Claudio Freire
On Fri, Jul 11, 2014 at 3:47 PM, Greg Stark st...@mit.edu wrote:
 On Fri, Jul 11, 2014 at 6:00 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 Marking as read-only is ok, or emitting a NOTICE so that if anyone
 changes those parameters that change the shape of the index, they know
 it needs a rebuild would be OK too. Both mechanisms work for me.

 We don't actually have any of these mechanisms. They wouldn't be bad
 things to have but I don't think we should gate adding new types of
 indexes on adding them. In particular, the index could just hard code
 a value for these parameters and having them be parameterized is
 clearly better even if that doesn't produce all the warnings or
 rebuild things automatically or whatever.


No, I agree, it's just a nice to have.

But at least the docs should mention it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Over-optimization in ExecEvalWholeRowVar

2014-07-11 Thread Tom Lane
This example in the regression database is a simplified version of a bug
I was shown off-list:

regression=# select (
select q from
( select 1,2,3 where f10
  union all
  select 4,5,6.0 where f1=0
) q
)
from int4_tbl;
ERROR:  record type has not been registered

The reason for the problem is that ExecEvalWholeRowVar is designed
to bless the Var's record type (via assign_record_type_typmod) just
once per query.  That works fine, as long as the source tuple slot
is the exact same TupleTableSlot throughout the query.  But in the
above example, we have an Append plan node for the UNION ALL, which
will pass back its children's result slots directly --- so at the
level where q is being evaluated, we see first the first leaf SELECT's
output slot (which gets blessed) and then the second leaf SELECT's
output slot (which doesn't).  That leads to generating a composite
Datum with typmod -1 ... oops.

I can reproduce this bug in all active branches.  Probably the reason
it hasn't been identified long since is that in most scenarios the
planner won't use a whole-row Var at all in cases like this, but will
prefer to build RowExprs.  (In particular, the inconsistent data types
of the last sub-select output columns are necessary to trigger the bug
in this example.)

The easy fix is to move the blessing code from ExecEvalWholeRowVar
into ExecEvalWholeRowFast.  (ExecEvalWholeRowSlow doesn't need it
since it doesn't deal with RECORD cases.)  That'll add a usually-useless
comparison or two per row cycle, which is unlikely to be measurable.
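
In other words (a sketch of the idea, not the final patch), the per-row path
would check the descriptor it is actually handed and bless it on demand;
assign_record_type_typmod() is the existing typcache routine, the rest is
simplified:

	TupleDesc	slot_tupdesc = slot->tts_tupleDescriptor;

	/* The slot can change from row to row, e.g. under an Append node. */
	if (slot_tupdesc->tdtypeid == RECORDOID && slot_tupdesc->tdtypmod < 0)
		assign_record_type_typmod(slot_tupdesc);

	/* ... then form the composite datum from slot_tupdesc as before ... */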

A bigger-picture problem is that we have other once-per-query tests in
ExecEvalWholeRowVar, and ExecEvalScalarVar too, whose validity should be
questioned given this bug.  I think they are all right, though, because
those tests are concerned with validating the source rowtype, which should
not change across Append children.  The bug here is because we're assuming
not just that the input rowtype is abstractly the same across all rows,
but that the exact same TupleTableSlot is used to return every source row.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-07-11 Thread Martijn van Oosterhout
On Thu, Jul 10, 2014 at 08:46:55AM +0530, Amit Kapila wrote:
 As per my understanding of the general theory around barriers,
 read and write barriers are defined to avoid reordering by the compiler, and
 full memory barriers are defined to avoid reordering by processors.
 There are some processors that support instructions for memory
 barriers with acquire, release and fence semantics.

Not exactly: both compilers and processors can rearrange loads and
stores.  Because the architecture most developers work on (x86) has
such a strong memory model (it goes to a lot of effort to hide
reordering), people are under the impression that it's only the compiler
you need to worry about, but that's not true for other
architectures.

Memory barriers/fences are on the whole discouraged if you can use
acquire/release semantics because the latter are faster and much easier
to optimise for.
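
A minimal C11 illustration of that pairing (plain <stdatomic.h>, not
PostgreSQL's barrier.h): the writer publishes with a release store, the
reader synchronizes with an acquire load, and no full fence is needed.

#include <stdatomic.h>

static int			payload;
static atomic_int	ready;

void
writer(void)
{
	payload = 42;									/* ordinary store */
	atomic_store_explicit(&ready, 1,
						  memory_order_release);	/* publish */
}

int
reader(void)
{
	if (atomic_load_explicit(&ready, memory_order_acquire))
		return payload;								/* guaranteed to see 42 */
	return -1;										/* not yet published */
}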

I strongly recommend the "Atomic Weapons" talk on the C11 memory model
to help understand how they work. As a bonus it includes correct
implementations for the major architectures.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] proposal: rounding up time value less than its unit.

2014-07-11 Thread Robert Haas
On Thu, Jul 10, 2014 at 2:52 AM, Tomonari Katsumata
katsumata.tomon...@po.ntts.co.jp wrote:
 A couple of weeks ago, I posted a mail to pgsql-doc.
 http://www.postgresql.org/message-id/53992ff8.2060...@po.ntts.co.jp

 In that thread, I concluded that it's better to
 round up a value which is less than its unit.
 The current behavior (rounding down) carries an undesirable risk,
 because some GUCs give 0 a special meaning.
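 
 To illustrate the risk (a sketch only, not the attached patch; the function
 name here is hypothetical):
 
 /*
  * Converting a setting to a GUC's base unit, e.g. '500ms' for a GUC whose
  * unit is seconds (multiplier_to_base = 1000).  Truncation yields 0, which
  * for many GUCs means "disabled"; rounding up yields 1 instead.
  */
 static int
 to_base_unit_rounded_up(int value, int multiplier_to_base)
 {
 	/* current behavior: value / multiplier_to_base   (500ms -> 0s) */
 	/* proposed:         round up                     (500ms -> 1s) */
 	return (value + multiplier_to_base - 1) / multiplier_to_base;
 }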

 And then I made a patch for this.
 Please check the attached patch.

Thanks for the patch.  Please add it here:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-07-11 Thread Jeff Davis
On Fri, 2014-07-11 at 11:51 -0400, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  Attached is a small patch to $SUBJECT.
  In master, only single-byte characters are allowed as an escape. Of
  course, with the patch it must still be a single character, but it may
  be multi-byte.
 
 I'm concerned about the performance cost of this patch.  Have you done
 any measurements about what kind of overhead you are putting on the
 inner loop of similar_escape?

I didn't consider this very performance critical, because this is
looping through the pattern, which I wouldn't expect to be a long
string. On my machine using en_US.UTF-8, the difference is imperceptible
for a SIMILAR TO ... ESCAPE query.

I was able to see about a 2% increase in runtime when using the
similar_escape function directly. I made a 10M tuple table and did:

explain analyze
  select 
similar_escape('','#')
 from t;

which was the worst reasonable case I could think of. (It appears that
selecting from a table is faster than from generate_series. I'm curious
what you use when testing the performance of an individual function at
the SQL level.)

 At the very least, please don't call GetDatabaseEncoding() over again
 every single time through the inner loop.  More generally, why do we
 need to use pg_encoding_verifymb?  The input data is presumably validly
 encoded.  ISTM the significantly cheaper pg_mblen() would be more
 appropriate.

Thank you. Using the non-verifying variants reduces the penalty in the
above test to 1%.

If needed, we could optimize further through code specialization, as
like_escape() does. Though I think like_escape() is specialized just
because MatchText() is specialized; and MatchText is specialized because
it operates on the actual data, not just the pattern. So I don't see a
reason to specialize similar_escape().

Regards,
Jeff Davis

*** a/src/backend/utils/adt/regexp.c
--- b/src/backend/utils/adt/regexp.c
***
*** 688,698  similar_escape(PG_FUNCTION_ARGS)
  		elen = VARSIZE_ANY_EXHDR(esc_text);
  		if (elen == 0)
  			e = NULL;			/* no escape character */
! 		else if (elen != 1)
! 			ereport(ERROR,
! 	(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
! 	 errmsg("invalid escape string"),
!   errhint("Escape string must be empty or one character.")));
  	}
  
  	/*--
--- 688,703 
  		elen = VARSIZE_ANY_EXHDR(esc_text);
  		if (elen == 0)
  			e = NULL;			/* no escape character */
! 		else
! 		{
! 			int escape_mblen = pg_mbstrlen_with_len(e, elen);
! 
! 			if (escape_mblen > 1)
! ereport(ERROR,
! 		(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
! 		 errmsg("invalid escape string"),
! 		 errhint("Escape string must be empty or one character.")));
! 		}
  	}
  
  	/*--
***
*** 722,781  similar_escape(PG_FUNCTION_ARGS)
  
  	while (plen  0)
  	{
! 		char		pchar = *p;
  
! 		if (afterescape)
  		{
! 			if (pchar == '"' && !incharclass)	/* for SUBSTRING patterns */
! *r++ = ((nquotes++ % 2) == 0) ? '(' : ')';
! 			else
  			{
  *r++ = '\\';
  *r++ = pchar;
  			}
! 			afterescape = false;
! 		}
! 		else if (e && pchar == *e)
! 		{
! 			/* SQL99 escape character; do not send to output */
! 			afterescape = true;
  		}
! 		else if (incharclass)
  		{
! 			if (pchar == '\\')
  *r++ = '\\';
! 			*r++ = pchar;
! 			if (pchar == ']')
! incharclass = false;
! 		}
! 		else if (pchar == '[')
! 		{
! 			*r++ = pchar;
! 			incharclass = true;
! 		}
! 		else if (pchar == '%')
! 		{
! 			*r++ = '.';
! 			*r++ = '*';
! 		}
! 		else if (pchar == '_')
! 			*r++ = '.';
! 		else if (pchar == '(')
! 		{
! 			/* convert to non-capturing parenthesis */
! 			*r++ = '(';
! 			*r++ = '?';
! 			*r++ = ':';
! 		}
! 		else if (pchar == '\\' || pchar == '.' ||
!  pchar == '^' || pchar == '$')
! 		{
! 			*r++ = '\\';
! 			*r++ = pchar;
  		}
- 		else
- 			*r++ = pchar;
- 		p++, plen--;
  	}
  
  	*r++ = ')';
--- 727,813 
  
  	while (plen > 0)
  	{
! 		char	pchar = *p;
! 		int		mblen = pg_mblen(p);
  
! 		if (mblen == 1)
  		{
! 			if (afterescape)
! 			{
! if (pchar == '"' && !incharclass)	/* for SUBSTRING patterns */
! 	*r++ = ((nquotes++ % 2) == 0) ? '(' : ')';
! else
! {
! 	*r++ = '\\';
! 	*r++ = pchar;
! }
! afterescape = false;
! 			}
! 			else if (e && pchar == *e)
! 			{
! /* SQL99 escape character; do not send to output */
! afterescape = true;
! 			}
! 			else if (incharclass)
! 			{
! if (pchar == '\\')
! 	*r++ = '\\';
! *r++ = pchar;
! if (pchar == ']')
! 	incharclass = false;
! 			}
! 			else if (pchar == '[')
! 			{
! *r++ = pchar;
! incharclass = true;
! 			}
! 			else if (pchar == '%')
! 			{
! *r++ = '.';
! *r++ = '*';
! 			}
! 			else if (pchar == '_')
! *r++ = '.';
! 			else if (pchar == '(')
! 			{
! /* convert to non-capturing parenthesis */
! *r++ = '(';
! *r++ = '?';
! *r++ = 

Re: [HACKERS] Crash on backend exit w/ OpenLDAP [2.4.24, 2.4.31]

2014-07-11 Thread Noah Misch
On Thu, Jun 19, 2014 at 01:01:34PM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Thu, Jun 12, 2014 at 05:02:19PM -0400, Noah Misch wrote:
  You can cause the at-exit crash by building PostgreSQL against OpenLDAP
  2.4.31, connecting with LDAP authentication, and issuing LOAD 'dblink'.
 
  4. Detect older OpenLDAP versions at runtime, just before we would 
  otherwise
  initialize OpenLDAP, and raise an error.  Possibly make the same check at
  compile time, for packager convenience.
 
  Having pondered this some more, I lean toward the following conservative 
  fix.
  Add to all supported branches a test case that triggers the crash and a
  configure-time warning if the OpenLDAP version falls in the vulnerable 
  range.
  So long as those who build from source monitor either configure output or
  test suite failures, they'll have the opportunity to head off the problem.
 
 +1 for a configure warning, but I share your concern that it's likely to
 go unnoticed (sometimes I wish autoconf were not so chatty...).

 I'm not sure about the practicality of adding a test case --- how will we
 test that if no LDAP server is at hand?

Merely attempting an LDAP connection (to a closed port, for example)
initializes the library far enough to trigger the problem.  Here's a patch
implementing the warning and test case.  The test case is based on the one I
posted upthread, modified to work with installcheck, work with non-LDAP
builds, and close a race condition.
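
For illustration, the scenario boils down to something like this (this is
not the attached test case; it merely restates the recipe quoted above):

-- In a session authenticated via LDAP against a server built with
-- OpenLDAP 2.4.24 through 2.4.31:
LOAD 'dblink';
-- Nothing fails visibly here; the crash happens at backend exit.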

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
diff --git a/configure b/configure
index 481096c..e2850e4 100755
--- a/configure
+++ b/configure
@@ -9464,6 +9464,17 @@ fi
 
 fi
 
+# PGAC_LDAP_SAFE
+# --
+# PostgreSQL sometimes loads libldap_r and plain libldap into the same
+# process.  Check for OpenLDAP versions known not to tolerate doing so; assume
+# non-OpenLDAP implementations are safe.  The dblink test suite exercises the
+# hazardous interaction directly.
+
+
+
+
+
 if test "$with_ldap" = yes ; then
   if test "$PORTNAME" != win32; then
  for ac_header in ldap.h
@@ -9480,6 +9491,47 @@ fi
 
 done
 
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for compatible LDAP implementation" >&5
+$as_echo_n "checking for compatible LDAP implementation... " >&6; }
+if ${pgac_cv_ldap_safe+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <ldap.h>
+#if !defined(LDAP_VENDOR_VERSION) || \
+ (defined(LDAP_API_FEATURE_X_OPENLDAP) && \
+  LDAP_VENDOR_VERSION >= 20424 && LDAP_VENDOR_VERSION <= 20431)
+choke me
+#endif
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_ldap_safe=yes
+else
+  pgac_cv_ldap_safe=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_ldap_safe" >&5
+$as_echo "$pgac_cv_ldap_safe" >&6; }
+
+if test "$pgac_cv_ldap_safe" != yes; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING:
+*** With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, each backend
+*** process that loads libpq (via WAL receiver, dblink, or postgres_fdw) and
+*** also uses LDAP will crash on exit." >&5
+$as_echo "$as_me: WARNING:
+*** With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, each backend
+*** process that loads libpq (via WAL receiver, dblink, or postgres_fdw) and
+*** also uses LDAP will crash on exit." >&2;}
+fi
   else
  for ac_header in winldap.h
 do :
diff --git a/configure.in b/configure.in
index c938a5d..9f324f0 100644
--- a/configure.in
+++ b/configure.in
@@ -1096,10 +1096,39 @@ if test $with_libxslt = yes ; then
   AC_CHECK_HEADER(libxslt/xslt.h, [], [AC_MSG_ERROR([header file <libxslt/xslt.h> is required for XSLT support])])
 fi
 
+# PGAC_LDAP_SAFE
+# --
+# PostgreSQL sometimes loads libldap_r and plain libldap into the same
+# process.  Check for OpenLDAP versions known not to tolerate doing so; assume
+# non-OpenLDAP implementations are safe.  The dblink test suite exercises the
+# hazardous interaction directly.
+
+AC_DEFUN([PGAC_LDAP_SAFE],
+[AC_CACHE_CHECK([for compatible LDAP implementation], [pgac_cv_ldap_safe],
+[AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
+[#include <ldap.h>
+#if !defined(LDAP_VENDOR_VERSION) || \
+ (defined(LDAP_API_FEATURE_X_OPENLDAP) && \
+  LDAP_VENDOR_VERSION >= 20424 && LDAP_VENDOR_VERSION <= 20431)
+choke me
+#endif], [])],
+[pgac_cv_ldap_safe=yes],
+[pgac_cv_ldap_safe=no])])
+
+if test "$pgac_cv_ldap_safe" != yes; then
+  AC_MSG_WARN([
+*** With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, each backend
+*** process that loads libpq (via WAL receiver, dblink, or postgres_fdw) and
+*** also uses LDAP will crash on exit.])
+fi])
+
+
+
 if test "$with_ldap" = yes ; then
   if test "$PORTNAME" != win32; then
  AC_CHECK_HEADERS(ldap.h, [],
   [AC_MSG_ERROR([header file <ldap.h> is required for LDAP])])
+ PGAC_LDAP_SAFE
   else
  

Re: [HACKERS] things I learned from working on memory allocation

2014-07-11 Thread Amit Kapila
On Fri, Jul 11, 2014 at 11:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jul 10, 2014 at 1:05 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
  If there is a noticeable impact, then do you think having separate
  file/infrastructure for parallel sort can help, basically non-parallel
  and parallel sort will have some common and some specific parts.
  The above layer will call the parallel/non-parallel functions based on
  operation need.

 The tuplesort.c already handles so many cases already that adding yet
 another axis of variability is intimidating.  But it may work out that
 there's no better option.

For using the new allocator, one idea is that it works seamlessly with the
current memory allocation routines (palloc, pfree, repalloc, ...); another
is to have separate routines (shm_palloc, shm_pfree, ...) for allocation
from the new allocator.  I think having separate routines means all the call
sites for allocation/deallocation need to be aware of the operation type
(parallel or non-parallel); however, I think this can avoid the overhead
for non-parallel cases.
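
To make the call-site impact concrete, here is a purely hypothetical sketch
(none of these names or signatures come from the patch; a trivial bump
allocator over a fixed buffer stands in for a dsm segment):

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct shm_arena
{
	char	   *base;			/* start of the shared region */
	size_t		size;			/* total bytes available */
	size_t		used;			/* bytes handed out so far */
} shm_arena;

/* hypothetical shared-memory counterpart of palloc() */
static void *
shm_palloc(shm_arena *arena, size_t len)
{
	void	   *p;

	if (arena->used + len > arena->size)
		return NULL;			/* caller must cope, e.g. fall back or error */
	p = arena->base + arena->used;
	arena->used += len;
	return p;
}

int
main(void)
{
	static char buf[1024];
	shm_arena	arena = {buf, sizeof(buf), 0};
	int			parallel = 1;	/* each call site must know the operation type */
	void	   *tuple;

	tuple = parallel ? shm_palloc(&arena, 64) : malloc(64);
	printf("allocated at %p\n", tuple);
	if (!parallel)
		free(tuple);
	return 0;
}

The point of the sketch is the ternary at the call site: with separate
routines, that knowledge has to exist everywhere an allocation happens,
which is the intrusive half of the trade-off, even though it keeps the
non-parallel path free of overhead.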

I think it is a bit difficult to say which approach (an allocator, or using
dsm directly) will turn out to be better, as both have their pros and cons,
as you listed; however, I feel that if we use dsm directly, we might need to
change more core logic than with an allocator.

I believe we will face the same question again if somebody wants to
parallelize joins, so one deciding factor could be which approach leads to
less change in core logic/design.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] better atomics - v0.5

2014-07-11 Thread Amit Kapila
On Sat, Jul 12, 2014 at 1:26 AM, Martijn van Oosterhout klep...@svana.org
wrote:
 On Thu, Jul 10, 2014 at 08:46:55AM +0530, Amit Kapila wrote:
  As per my understanding of the general theory around barriers,
  read and write are defined to avoid reordering due to compiler and
  full memory barriers are defined to avoid reordering due to processors.
  There are some processors that support instructions for memory
  barriers with acquire, release and fence semantics.

 Not exactly, both compilers and processors can rearrange loads and
 stores.  Because the architecture most developers work on (x86) has
 such a strong memory model (it goes to a lot of effort to hide
 reordering) people are under the impression that it's only the compiler
 you need to worry about, but that's not true for any other
 architecture.

I am not saying that we need barriers only to avoid reordering by the
compiler; rather, my point is that we need to avoid reordering by both the
compiler and the processor, though the ways to achieve each require
different APIs.

 Memory barriers/fences are on the whole discouraged if you can use
 acquire/release semantics because the latter are faster and much easier
 to optimise for.

Do all processors support instructions for memory barriers with acquire,
release semantics?

 I strongly recommend the Atomic Weapons talk on the C11 memory model
 to help understand how they work. As a bonus it includes correct
 implementations for the major architectures.

Yes, it would be good if we can implement this as per the C11 memory model,
and that's what I am referring to while reviewing this patch; however, even
if that is not possible, or is difficult to achieve in all cases, it is
worth discussing the memory model used in the APIs currently implemented in
the patch.
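
For reference, a minimal C11 sketch of the acquire/release pairing under
discussion (this is not code from the patch; it assumes a toolchain with
<stdatomic.h> and C11 threads):

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

static int data;
static atomic_int ready;

static int
producer(void *arg)
{
	(void) arg;
	data = 42;					/* plain store */
	/* release: the store to data above cannot be reordered past this */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return 0;
}

static int
consumer(void *arg)
{
	(void) arg;
	/* acquire: once ready == 1 is seen, data == 42 is guaranteed visible */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	printf("%d\n", data);		/* always prints 42 */
	return 0;
}

int
main(void)
{
	thrd_t		t1,
				t2;

	thrd_create(&t1, producer, NULL);
	thrd_create(&t2, consumer, NULL);
	thrd_join(t1, NULL);
	thrd_join(t2, NULL);
	return 0;
}

On x86 the release store and acquire load compile to plain loads and stores,
whereas a full memory barrier needs an MFENCE or a LOCKed instruction, which
is one reason acquire/release is preferred where it is sufficient.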

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com