Re: GIN pending list pages not recycled promptly (was Re: [HACKERS] GIN improvements part 1: additional information)

2014-06-18 Thread Amit Langote
On Wed, Jan 22, 2014 at 9:12 PM, Heikki Linnakangas
 wrote:
> On 01/22/2014 03:39 AM, Tomas Vondra wrote:
>>
>> What annoys me a bit is the huge size difference between the index
>> updated incrementally (by a sequence of INSERT commands), and the index
>> rebuilt from scratch using VACUUM FULL. It's a bit better with the patch
>> (2288 vs. 2035 MB), but is there a chance to improve this?
>
>
> Hmm. What seems to be happening is that pending item list pages that the
> fast update mechanism uses are not getting recycled. When enough list pages
> are filled up, they are flushed into the main index and the list pages are
> marked as deleted. But they are not recorded in the FSM, so they won't be
> recycled until the index is vacuumed. Almost all of the difference can be
> attributed to deleted pages left behind like that.
>
> So this isn't actually related to the packed postinglists patch at all. It
> just makes the bloat more obvious, because it makes the actual size of the
> index size, excluding deleted pages, smaller. But it can be observed on git
> master as well:
>
> I created a simple test table and index like this:
>
> create table foo (intarr int[]);
> create index i_foo on foo using gin(intarr) with (fastupdate=on);
>
> I filled the table like this:
>
> insert into foo select array[-1] from generate_series(1, 1000) g;
>
> postgres=# \d+i
>List of relations
>  Schema | Name | Type  | Owner  |  Size  | Description
> +--+---+++-
>  public | foo  | table | heikki | 575 MB |
> (1 row)
>
> postgres=# \di+
>List of relations
>  Schema | Name  | Type  | Owner  | Table |  Size  | Description
> +---+---++---++-
>  public | i_foo | index | heikki | foo   | 251 MB |
> (1 row)
>
> I wrote a little utility that scans all pages in a gin index, and prints out
> the flags indicating what kind of a page it is. The distribution looks like
> this:
>
>  19 DATA
>7420 DATA LEAF
>   24701 DELETED
>   1 LEAF
>   1 META
>
> I think we need to add the deleted pages to the FSM more aggressively.
>
> I tried simply adding calls to RecordFreeIndexPage, after the list pages
> have been marked as deleted, but unfortunately that didn't help. The problem
> is that the FSM is organized into a three-level tree, and
> RecordFreeIndexPage only updates the bottom level. The upper levels are not
> updated until the FSM is vacuumed, so the pages are still not visible to
> GetFreeIndexPage calls until next vacuum. The simplest fix would be to add a
> call to IndexFreeSpaceMapVacuum after flushing the pending list, per
> attached patch. I'm slightly worried about the performance impact of the
> IndexFreeSpaceMapVacuum() call. It scans the whole FSM of the index, which
> isn't exactly free. So perhaps we should teach RecordFreeIndexPage to update
> the upper levels of the FSM in a retail-fashion instead.
>

I wonder if you pursued this further?

You recently added a number of TODO items related to GIN index; is it
worth adding this to the list?

--
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/18/2014 12:14 PM, Joe Conway wrote:
> On 06/18/2014 12:09 PM, Tom Lane wrote:
>> Joe Conway  writes:
>>> I think the context deletion was missed in the first place 
>>> because storeRow() is the wrong place to create the context. 
>>> Rather than creating the context in storeRow() and deleting it 
>>> two levels up in materializeQueryResult(), I think it should
>>> be created and deleted in the interim layer,
>>> storeQueryResult(). Patch along those lines attached.
> 
>> Since the storeInfo struct is longer-lived than 
>> storeQueryResult(), it'd probably be better if the cleanup
>> looked like
> 
>> +if (sinfo->tmpcontext != NULL) + 
>> MemoryContextDelete(sinfo->tmpcontext); +sinfo->tmpcontext = 
>> NULL;
> 
> 
> good point

If there is no further discussion, I will commit this version of the
patch back to 9.2 where this leak was introduced.

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTompDAAoJEDfy90M199hlZoMP/333eXbmFvh1WCowMehKPEy1
WV3VgyZx7hBvSr/cP/HUMjXTb34hRgi5uGa79ZMyAW+U+jDxrIN4ozxp54LlgU5x
pTzD8J8VviunvWzf6stgHTe5bfcBJ9kA9oZlgHApH+JRGeySsYALI4ZBluJa1tRa
gf6ePd8Kwl4AUucbM3v7kJGxtVS5XRGKcffJnATg+OLiBUReVv9ZCjxeYasyOaRt
K2Q8ijW878UgV9HViTCsl8THelnlh7q0BpbKaSZBJG847CgmrVxfRt/l8Od8a4G/
ODoQ/fV0ZOI3h5j9oirL4/RC9HWOqchJBvBd3CK7caWLwNKVwqqEWGV3uqEO2TnA
NH3QgHb4u8WGNhoWbwL6dGWa27s2EntlWLpJRuavKxCIN2owvVBKSZ6H8d3dSlI5
AqaGnxOPvxWEB5K2w0wBynkZ9nrk4PYuIVKADu8fqyYyDsG3wGsZmI1trLNXj5sm
uE/FTbdvUjcU2uIVMMJSbPxa5JvvfR/+9rM/AF8VP5PMnSs1CyeLQXkW7nazRZOP
7zHUY6N4vwem8tVQuuXPouLu2B/eTJoXaUTm7iQvXztJDmwKxKgYCzLW/LxfvI4Q
mDIY/Arh/4RdC7kVXu6m5BEisNIbBuKtcA6f0DM+4G9i0Wtft450fgajYV4xcic9
DMPyBMwD7csz3ssF04Ox
=PJyj
-END PGP SIGNATURE-
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a81853f..2e43c8f 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1076,1081 
--- 1076,1089 
  	if (!PQsetSingleRowMode(conn))		/* shouldn't fail */
  		elog(ERROR, "failed to set single-row mode for dblink query");
  
+ 	/* Create short-lived memory context for data conversions */
+ 	if (!sinfo->tmpcontext)
+ 		sinfo->tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+   "dblink temporary context",
+   ALLOCSET_DEFAULT_MINSIZE,
+   ALLOCSET_DEFAULT_INITSIZE,
+   ALLOCSET_DEFAULT_MAXSIZE);
+ 
  	for (;;)
  	{
  		CHECK_FOR_INTERRUPTS();
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1119,1124 
--- 1127,1137 
  	/* clean up GUC settings, if we changed any */
  	restoreLocalGucs(nestlevel);
  
+ 	/* clean up data conversion short-lived memory context */
+ 	if (sinfo->tmpcontext != NULL)
+ 		MemoryContextDelete(sinfo->tmpcontext);
+ 	sinfo->tmpcontext = NULL;
+ 
  	/* return last_res */
  	res = sinfo->last_res;
  	sinfo->last_res = NULL;
*** storeRow(storeInfo *sinfo, PGresult *res
*** 1204,1218 
  		if (sinfo->cstrs)
  			pfree(sinfo->cstrs);
  		sinfo->cstrs = (char **) palloc(nfields * sizeof(char *));
- 
- 		/* Create short-lived memory context for data conversions */
- 		if (!sinfo->tmpcontext)
- 			sinfo->tmpcontext =
- AllocSetContextCreate(CurrentMemoryContext,
- 	  "dblink temporary context",
- 	  ALLOCSET_DEFAULT_MINSIZE,
- 	  ALLOCSET_DEFAULT_INITSIZE,
- 	  ALLOCSET_DEFAULT_MAXSIZE);
  	}
  
  	/* Should have a single-row result if we get here */
--- 1217,1222 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
David Fetter  wrote:
> On Wed, Jun 18, 2014 at 03:30:34PM -0700, Kevin Grittner wrote:

>> the more I think about it (and discuss it) the more I'm
>> inclined to suffer the additional complexity of the standard
>> syntax for specifying transition relations in order to avoid
>> unnecessary overhead creating them when not needed.  I'm also
>> leaning toward just storing TIDs in the tuplestores, even though
>> it requires mixed snapshots in executing queries in the
>> triggers.
>
> So in this case one tuplestore with two TIDs, either of which
> might be NULL?

No, one or two tuplestores, depending on need, each with TIDs of
either the "before" set or the "after" set of all tuples affected
by the DML statement, however many that may be.  More or less like
this first draft patch, except with TIDs instead of copies of the
rows, and with better selectivity about when the tuplestores are
generated.

> Having the "before" tuplestore available to a BEFORE STATEMENT
> trigger would make it possible to do things with the before
> transition table that are fragile and hacky now.

How do you propose to have an accurate "before" tuplestore of
affected rows before the scan runs and before the BEFORE ... FOR
EACH ROW triggers fire?  That would be particularly interesting to
try to generate if the scan involves evaluating any VOLATILE
functions.

> Again, I see the tuplestores as infrastructure both deltas and
> many other things, so long as they're attached to the right
> objects.  In my opinion, the right objects would include
> materialized views, triggers, and certain very specific kinds of
> DML of which all the RETURNING ones are one example.  They would
> not include the underlying tables.

Right now I've presented something for capturing the information
and allowing it to be accessed from triggers.  I don't think the
means of capturing it precludes passing it along to other
consumers.  I would like to get this part working before trying to
wire it up to anything other than triggers.  The best way to kill
an effort like this is to allow scope creep.  Now, if you think
that something fundamentally belongs at another level, that's
something to address -- but the point where we capture data to pass
to triggers seems like a natural and sound place to capture it for
other purposes.  And since almost all the code for this patch is in
trigger.c, this seems like I'm in the right place for a trigger
feature.

>> standard syntax, the first thing would be for the statement to
>> somehow communicate to the trigger layer the need to capture a
>> tuplestore it might otherwise not generate, and there would need
>> to be a way for the statement to access the needed
>> tuplestore(s).
>
> Right.  Hence my proposal to make the existence of the
> tuplestores part of Query, writeable by the types of triggers
> which specify that they'll be needed.

> I just don't think that Rel is the place to connect them.

I don't know what you mean by that.  I've already said that I now
think we should use the standard CREATE TRIGGER syntax to specify
the transition tables, and that if we do that we don't need the
reloption that's in the patch.  If you ignore the 19 lines of new
code to add that reloption, absolutely 100% of the code changes in
the patch so far are in trigger.c and trigger.h.  That reloption
was never anything I would consider as *connecting* the tuplestores
to the Rel anyway -- it was simply an attempt to minimize
unnecessary work.  No matter how I try, I'm not seeing what you
mean by references to "connecting to Rel".

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick



On 19/06/14 12:35, Tom Lane wrote:

Peter Geoghegan  writes:

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick  wrote:

Interesting, I'll take a look later.



I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems.


Oooh, I'll bet that's exactly it.  Is the database using UTF8 encoding and
a non-C locale?


Yup, that is indeed the case.

> It's well known that OS X's UTF8 locales sort nothing at

all like the supposedly equivalent locales on other systems.


True, that. A different sort order wouldn't have surprised me,
but the failure to return an extant row had me thinking there
was something awry with the laptop causing file corruption (it's
getting on in years and has been bashed about a bit).


Regards

Ian Barwick


--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/18/2014 08:45 PM, Tom Lane wrote:
> Joe Conway  writes:
>> On 06/18/2014 07:29 PM, Tom Lane wrote:
>>> With the attached patch on top of yours, I see no leak
>>> anymore.
> 
>> I can confirm that -- rock solid with 1 million iterations. I
>> assume that should not be back-patched though?
> 
> Well, we usually think memory leaks are back-patchable bugs.  I'm a
> bit worried about the potential performance impact of an extra 
> memory context creation/deletion though.  It's probably not
> noticeable in this test case, but that's just because dblink() is
> such a spectacularly expensive function.

Probably so. I'll try to scrounge up some time to test the performance
impact of your patch.

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTol38AAoJEDfy90M199hleycP/2kOi2Pa6vcVXKxhNQo3mSdO
A84Ae/4LTfUbeVzUTf+uBRcz6LOtOlrHATZOcftJMlyTNmM++JJvF3YYMpGgmxO/
UfiykDs2bqDgPrIxbPxAEpgdeXWcsdJZzzOV1YWurU/qnTdoKD2ArPQhakWLGZH0
CRc46Cn2Qb3NCvnuO5R+ZhGPXS0t6EqTiGlmWtk9ZaI8MHmv1qVKMOKBor3v+2lk
/wdlc5lypPnZ07NKIjCVN0gzEJ+RV9nxQk1M3QkNYNsHOBiexEmaucXo6ab4derO
nXoOGo/0XwMhlhA6vrKlAKhxjkTNnJulVHQOWVLMCiNvcfX0KISJZVIoT/NraR94
Hc5ZZMjmhosbU8sgQiKjGoFSJq2Wld6SADuLt6xvsY9k5AiuEcPFbfVjAWlCIaEm
lOQ2cOrk+4nhEA1ygIsRw96GMT2WaEtOek4l3hJs6yd3zuzXoouO9i02QaXBqgR8
QmIJ+tOjwKnOPFThbJMjxlsrQMwJ6mPywhwt6E74YsKV6ndGFigBOfzjZlOn3OKX
DM60oWFhuCfHQdOlid1d6Uyuc4yeFb4g4XSS4sXW9wLPpvve63NxxBQ8ez0r3Up8
nLwcCqxFZRFwKX2Wp6fgtpmhgxolLF29XvpfTUMR6pPai+/Ei59vU4JXqqz0haqa
3UHpQ3AznN5vm+UvZJYe
=pvQS
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway  writes:
> On 06/18/2014 07:29 PM, Tom Lane wrote:
>> With the attached patch on top of yours, I see no leak anymore.

> I can confirm that -- rock solid with 1 million iterations. I assume
> that should not be back-patched though?

Well, we usually think memory leaks are back-patchable bugs.  I'm
a bit worried about the potential performance impact of an extra
memory context creation/deletion though.  It's probably not noticeable in
this test case, but that's just because dblink() is such a spectacularly
expensive function.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 8:35 PM, Tom Lane  wrote:
>> Still, it should be possible to
>> determine if that's the problem using btreecheck.
>
> Does btreecheck attempt to verify that the sort ordering of the index
> matches the comparison behavior of the datatype?  That would (in general)
> require calling user-defined code, which seems like probably a pretty
> bad idea for the purposes btreecheck is being advertised for.


Yes, it does, but I see no alternative for a general-purpose tool, and
the fact that it is general purpose is of considerable value. I have
more or less invented my own weird index scans.

I assume you're referring to the field-verification of indexes use
case, which is not an immediate goal of btreecheck, even though it's
an important goal that there has already been some discussion of.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway  writes:
> On 06/18/2014 08:19 PM, Tom Lane wrote:
>> Actually, I was wondering whether we couldn't remove that 
>> CreateTupleDescCopy call entirely.

> Apparently not, at least without some additional surgery.
> ExecMakeTableFunctionResult() tries to free the tupledesc and segfaults.

Hmm ... oh, I missed this bit:

/* We must get the tupledesc from call context */
if (rsinfo && IsA(rsinfo, ReturnSetInfo) &&
rsinfo->expectedDesc != NULL)
{
result = TYPEFUNC_COMPOSITE;
if (resultTupleDesc)
*resultTupleDesc = rsinfo->expectedDesc;
/* Assume no polymorphic columns here, either */
}

So it's passing back the same tupdesc passed in by the caller of
ExecMakeTableFunctionResult.  We can free that all right, but the caller
won't be happy :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick  wrote:
>> Interesting, I'll take a look later.

> I'm pretty suspicious of incompatibilities that may exist between the
> two sets of OS collations involved here. We aren't very clear on the
> extent to which what you're doing is supported, but it's certainly the
> case that bttextcmp()/varstr_cmp()/strcoll() return values must be
> immutable between the two systems.

Oooh, I'll bet that's exactly it.  Is the database using UTF8 encoding and
a non-C locale?  It's well known that OS X's UTF8 locales sort nothing at
all like the supposedly equivalent locales on other systems.

> Still, it should be possible to
> determine if that's the problem using btreecheck.

Does btreecheck attempt to verify that the sort ordering of the index
matches the comparison behavior of the datatype?  That would (in general)
require calling user-defined code, which seems like probably a pretty
bad idea for the purposes btreecheck is being advertised for.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick

On 19/06/14 12:30, Peter Geoghegan wrote:

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick  wrote:

Interesting, I'll take a look later.


I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems. Still, it should be possible to
determine if that's the problem using btreecheck.

Do you get perfectly consistent answers between the two when you ORDER BY login?


Hmm, nope, different sort order.


Regards

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/18/2014 08:19 PM, Tom Lane wrote:
> Actually, I was wondering whether we couldn't remove that 
> CreateTupleDescCopy call entirely.

Apparently not, at least without some additional surgery.
ExecMakeTableFunctionResult() tries to free the tupledesc and segfaults.

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTolmPAAoJEDfy90M199hlz7YQAInPRKkGLskcqx4ujmsNpbEG
RS2zSll4PqCELa/wMcWDJUsoVkzjybuw6A/1MSLGtERIamHDcmDIbTFwx9Z02n0u
nuFGgyds9auIqn+AB18rvwgas+gL6tRHOUe4bagiuqNzywnOceW98PT0NttWt86y
Fsyz/xfapIoV+kS1k9FAXC5B/PfayYoPq3cB3qWGNX/vor+xIgw3BGT9Bk3DbN2J
IqD75HqoFV5jEwiStwxLaLvgqiLPVMzI/HdiQhprbveTa1e2vFM7Tu6n02i8uFVt
fVu4UCtBtOWRIC9oOO0QhVtqnETMpwsxwWIxC5RhWScL7M8CT3Z9cw5zCIJWo2Q8
VaUDa+gpXukisg8OUfK2AhSduxcPYJ8I+b/VMS9p6j5P/87slnLuMaTnxU7afVCM
3F2UrDOgv53t43NMeD3xou8J4ntf/NJbaOsXCQx9yXjcq3UMzT0g3OSwxPbfViE+
YN6GCelnt6e0mT3Uk/pZDm0+QwYeMv6ZHjYD3y7vD4+Ipnt/HNAhO6r/HyRDk7/x
DvSeO0okJXwUqTq/xcJ6wBE7frBqTpfzLxEMLXEVMMRCZWpKOAwK05afIZsadKqP
wgeJETiSirKfWUWb0/bmMIVv2vA4fRZYpLW31pBr6OLS1GlwsrsfuYNxCm8ur1tG
gUe/trrEIMBo6iyE//N5
=i7jE
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick  wrote:
> Interesting, I'll take a look later.

I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems. Still, it should be possible to
determine if that's the problem using btreecheck.

Do you get perfectly consistent answers between the two when you ORDER BY login?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway  writes:
> On a side note, while perusing this section of code:

> 8<-- at dblink.c:1176 --
>  /* make sure we have a persistent copy of the tupdesc */
>  tupdesc = CreateTupleDescCopy(tupdesc);

> Shouldn't that CreateTupleDescCopy() happen in ecxt_per_query_memory?

Not necessary (we'd have seen crashes long since if it was).
ExecMakeTableFunctionResult doesn't need the tupdesc to persist past
return.

Actually, I was wondering whether we couldn't remove that
CreateTupleDescCopy call entirely.  The risk would be if
get_call_result_type returned a pointer into relcache or some other cached
tuple descriptor, which might be subject to a cache flush --- but AFAICS
it always returns a freshly created or copied tupdesc.  (This might not
have been true originally, which could explain why dblink thinks it needs
to copy.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick

On 19/06/14 11:58, Peter Geoghegan wrote:

On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick  wrote:

I've just run into an index issue on 9.5 HEAD on a slave (master and slave
both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
below (I have only found one index on the slave where the issue occurs so far).


Would you mind running my btreecheck tool on both systems? That might
shed some light on this. You can get it from:
http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com
.

I suggest running bt_parent_index_verify() and bt_leftright_verify()
on all indexes on both systems. It shouldn't take too long.


Interesting, I'll take a look later.


Thanks

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/18/2014 07:29 PM, Tom Lane wrote:
> I wrote:
>> I do see growth in the per-query context as well.  I'm not sure 
>> where that's coming from, but we probably should try to find
>> out. A couple hundred bytes per iteration is going to add up,
>> even if it's not as fast as 8K per iteration.  I'm not sure it's
>> dblink's fault, because I don't see anything in dblink.c that is
>> allocating anything in the per-query context, except for the
>> returned tuplestores, which somebody else should clean up.
> 
> I poked at this example some more, and found that the additional
> memory leak is occurring during evaluation of the arguments to be
> passed to dblink().  There's been a comment there for a very long
> time suggesting that we might need to do something about that ...
> 
> With the attached patch on top of yours, I see no leak anymore.

I can confirm that -- rock solid with 1 million iterations. I assume
that should not be back-patched though?

On a side note, while perusing this section of code:

8<-- at dblink.c:1176 --
 /* make sure we have a persistent copy of the tupdesc */
 tupdesc = CreateTupleDescCopy(tupdesc);

 /* check result and tuple descriptor have the same number of columns */
 if (nfields != tupdesc->natts)
 ereport(ERROR,
 (errcode(ERRCODE_DATATYPE_MISMATCH),
  errmsg("remote query result rowtype does not match "
  "the specified FROM clause rowtype")));

 /* Prepare attinmeta for later data conversions */
 sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);

 /* Create a new, empty tuplestore */
 oldcontext =
MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
 sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
 rsinfo->setResult = sinfo->tuplestore;
 rsinfo->setDesc = tupdesc;
 MemoryContextSwitchTo(oldcontext);
8<-- at dblink.c:1194 --

Shouldn't that CreateTupleDescCopy() happen in ecxt_per_query_memory?

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTolNHAAoJEDfy90M199hlEN8QAIi3eq5D93Y4CqWWS8Uulqx1
jOBQUoLTwF7/CkYM0F+tGvX1QO1BAIDAxS6jwWXK/c9NqesVuS5tNwQcdrLb69Pm
me4hQ2ZYtoCA4Y8SiFL0sOjUGuads2QEjhL3HUMQDBXTMjCpzamotIBGvYWu1OOe
bf1VaY83bwGa6XvXYcfFFyqyUz+kMHvcM/Rq9Mj7pLrGT9Lqvec9RL/1FhFYamrI
2ewaKYC4htWXwk1xIvpDZtWTjLyv3BUl3X41BzPOecChWqXmYCpaPo+dR6V9LSm1
OxGFQL4Z3pM7VZDWk5FOj3vOFz1AJhWXEh8fyzlLKeDm7FFaApyf3rPLULBRO4sj
4C88lUdSbhzgTlMreq/wqBh2bKFmLGUA8M9sSwd+2DMc6QQQN0DWCgDtSZq/3Fwr
Tc/0LWvHwh+/Tozx0kk0DxIQzS0BOsWHzjtOMwb1p3aLZXlG9FP5IrrPJlIyDOQK
feVB9JNH5kGUFEU0OkGaqPaQy1lTVTizIA/FmTqV9QeJo2+vKSNPt2Ys4dM5jevo
Tpp6ZAjvC6sAoIoWmvtazV5WOL7FwXwf8AQRJRgmh8JqHw4C2nZt9R+CNQqbGPa2
hxiTWi5EMufBmVOJeaEX867CUTLL6qzCtr/I2a2XZyMuPD5dQbS3l2MaYw1l2ucU
o9x6G78hBR32xagpqPCE
=LqzA
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick  wrote:
> I've just run into an index issue on 9.5 HEAD on a slave (master and slave
> both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
> below (I have only found one index on the slave where the issue occurs so 
> far).

Would you mind running my btreecheck tool on both systems? That might
shed some light on this. You can get it from:
http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com
.

I suggest running bt_parent_index_verify() and bt_leftright_verify()
on all indexes on both systems. It shouldn't take too long.

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread David Fetter
On Wed, Jun 18, 2014 at 03:30:34PM -0700, Kevin Grittner wrote:
> David Fetter  wrote:
> > Robert Haas wrote:
> >> Kevin Grittner  wrote:
> 
> > The good:
> > - Generating the tuplestores.  Yay!
> 
> Thanks for that.  ;-)

Sorry, I just can't resist references to Spaghetti Westerns.
https://en.wikipedia.org/wiki/The_Good,_the_Bad_and_the_Ugly

> > The bad:
> > - Generating them exactly and only for AFTER triggers
> 
> The standard only allows them for AFTER triggers, and I'm not sure
> what the semantics would be for any others.

As, so here's where we differ.  You're looking at deltas, a very nice
capability to have.  I'm looking at the before and after tuplestores
as components of which deltas, among many other things, could be
composed.

> > - Requiring that the tuplestores both be generated or not at
> >   all.  There are real use cases described below where only
> >   one would be relevant.
> 
> Yeah.
> 
> > - Generating the tuplestores unconditionally.
> 
> Well, there are conditions.  Only when the reloption allows and
> only if there is an AFTER trigger for the type of operation in
> progress.

For deltas, this is just the thing.

I'm vaguely picturing the following as infrastructure:

- Instead of modifying Rel, we modify Query to contain two more bools
  default false: hasBeforeTuplestore and hasAfterTuplestore
- Each use case we implement would set 0 or more of these to true.
  For the delta use case, appropriate trigger definitions would set
  both.

This is vague because I haven't really gotten hacking on it, just
exploring what I hope are the relevant parts of the code.

> > The ugly:
> > - Attaching tuplestore generation to tables rather than
>     callers (triggers, DML, etc.)
> 
> I'm not sure what you're getting at here.  This patch is
> specifically only concerned with generating delta relations for DML
> AFTER triggers, although my hope is that it will be a basis for
> delta relations used for other purposes.  This seems to me like the
> right place to initially capture the data for incremental
> maintenance of materialized views, and might be of value for other
> purposes, too.

Hrm.  I don't really see this stuff as table properties.  The
materialized view case is an obvious example where the matview, not
the relations underneath, wants this information.  The relations
underneath may have their own concerns, but it's the matview whose
existence should ensure that the tuplestores are being generated.

Once the last depending-on-one-of-the-tuplestores things is gone, and
this could simply be the end of a RETURNING query, the tuplestores go
away.

> > [formal definition of standard CREATE TRIGGER statement]
> 
> > Sorry that was a little verbose, but what it does do is give us
> > what we need at trigger definition time.  I'd say it's pilot
> > error if a trigger definition says "make these tuplestores" and
> > the trigger body then does nothing with them, which goes to
> > Robert's point below re: unconditional overhead.
> 
> Yeah, the more I think about it (and discuss it) the more I'm
> inclined to suffer the additional complexity of the standard syntax
> for specifying transition relations in order to avoid unnecessary
> overhead creating them when not needed.  I'm also leaning toward
> just storing TIDs in the tuplestores, even though it requires mixed
> snapshots in executing queries in the triggers.

So in this case one tuplestore with two TIDs, either of which might be
NULL?

> just seems like there will otherwise be to much overhead in copying
> around big, unreferenced columns for some situations.

Yeah, it'd be nice to have the minimal part be as slim as possible.

> > Along that same line, we don't always need to capture both the
> > before tuplestores and the after ones.  Two examples of this come
> > to mind:
> >
> > - BEFORE STATEMENT triggers accessing rows, where there is no
> > after part to use,
> 
> Are you talking about an UPDATE for which the AFTER trigger(s) only
> reference the before transition table, and don't look at AFTER?If
> so, using the standard syntax would cover that just fine.  If not,
> can you elaborate?

Sorry I was unclear.  I was looking at one of the many things having
these tuplestores around could enable.  As things stand now, there is
no access of any kind to rows with any per-statement trigger, modulo
user-space hacks like this one:

http://people.planetpostgresql.org/dfetter/index.php?/archives/71-Querying-Rows-in-Statement-Triggers.html

Having the "before" tuplestore available to a BEFORE STATEMENT trigger
would make it possible to do things with the before transition table
that are fragile and hacky now.

> > and
> > - DML (RETURNING BEFORE, e.g.) which only touches one of them. 
> > This applies both to extant use cases of RETURNING and to planned
> > ones.
> 
> I think that can be sorted out by a patch which implements that, if
> these deltas even turn out to be the appropriate way to get that
> data (whic

Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
I wrote:
> I do see growth in the per-query context as well.  I'm not sure
> where that's coming from, but we probably should try to find out.
> A couple hundred bytes per iteration is going to add up, even if it's
> not as fast as 8K per iteration.  I'm not sure it's dblink's fault,
> because I don't see anything in dblink.c that is allocating anything in
> the per-query context, except for the returned tuplestores, which
> somebody else should clean up.

I poked at this example some more, and found that the additional memory
leak is occurring during evaluation of the arguments to be passed to
dblink().  There's been a comment there for a very long time suggesting
that we might need to do something about that ...

With the attached patch on top of yours, I see no leak anymore.

regards, tom lane

diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index f162e92..cf8cbb1 100644
*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
*** ExecMakeTableFunctionResult(ExprState *f
*** 2014,2019 
--- 2014,2020 
  	PgStat_FunctionCallUsage fcusage;
  	ReturnSetInfo rsinfo;
  	HeapTupleData tmptup;
+ 	MemoryContext argContext = NULL;
  	MemoryContext callerContext;
  	MemoryContext oldcontext;
  	bool		direct_function_call;
*** ExecMakeTableFunctionResult(ExprState *f
*** 2083,2094 
  		/*
  		 * Evaluate the function's argument list.
  		 *
! 		 * Note: ideally, we'd do this in the per-tuple context, but then the
! 		 * argument values would disappear when we reset the context in the
! 		 * inner loop.  So do it in caller context.  Perhaps we should make a
! 		 * separate context just to hold the evaluated arguments?
  		 */
  		argDone = ExecEvalFuncArgs(&fcinfo, fcache->args, econtext);
  		/* We don't allow sets in the arguments of the table function */
  		if (argDone != ExprSingleResult)
  			ereport(ERROR,
--- 2084,2104 
  		/*
  		 * Evaluate the function's argument list.
  		 *
! 		 * We can't do this in the per-tuple context: the argument values
! 		 * would disappear when we reset that context in the inner loop.  And
! 		 * the caller's context is typically a query-lifespan context, so we
! 		 * don't want to leak memory there.  So make a separate context just
! 		 * to hold the evaluated arguments.
  		 */
+ 		argContext = AllocSetContextCreate(callerContext,
+ 		   "Table function arguments",
+ 		   ALLOCSET_DEFAULT_MINSIZE,
+ 		   ALLOCSET_DEFAULT_INITSIZE,
+ 		   ALLOCSET_DEFAULT_MAXSIZE);
+ 		oldcontext = MemoryContextSwitchTo(argContext);
  		argDone = ExecEvalFuncArgs(&fcinfo, fcache->args, econtext);
+ 		MemoryContextSwitchTo(oldcontext);
+ 
  		/* We don't allow sets in the arguments of the table function */
  		if (argDone != ExprSingleResult)
  			ereport(ERROR,
*** no_function_result:
*** 2321,2328 
--- 2331,2342 
  			FreeTupleDesc(rsinfo.setDesc);
  	}
  
+ 	/* Clean up */
  	MemoryContextSwitchTo(callerContext);
  
+ 	if (argContext)
+ 		MemoryContextDelete(argContext);
+ 
  	/* All done, pass back the tuplestore */
  	return rsinfo.setResult;
  }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick
Hi

I've just run into an index issue on 9.5 HEAD on a slave (master and slave
both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
below (I have only found one index on the slave where the issue occurs so far).
The setup is admittedly slightly unusual; master is OS X 10.7.5, slave is
CentOS on a Virtualbox guest VM on the same system. The issue only occurs
with this combination of master and slave; I haven't been able to reproduce
it with master and slave running natively on OS X, or with a Linux guest VM
on a Linux machine. I have reproduced it several times on the OS X/Linux guest 
VM
combination.

I can't dig any further into this at the moment but can happily provide further
details etc.

Master
==

$ uname -a
Darwin nara.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 
PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

tgg_current=> SELECT version();
 version

--
 PostgreSQL 9.5devel on x86_64-apple-darwin11.4.2, compiled by gcc 
(MacPorts gcc48 4.8.2_2) 4.8.2, 64-bit
(1 row)

tgg_current=> select user_id, login from tgg_user where login ='admin';
 user_id | login
-+---
   1 | admin
(1 row)


Slave
=

$ uname -a
Linux localhost.localdomain 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 
00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

tgg_current=> select version();
 version

-
 PostgreSQL 9.5devel on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 
4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

tgg_current=> select user_id,login from tgg_user where login ='admin';
 user_id | login
-+---
(0 rows)

tgg_current=> explain select user_id,login from tgg_user where login 
='admin';
 QUERY PLAN


 Index Scan using tgg_user_login_key on tgg_user  (cost=0.28..8.30 rows=1 
width=15)
   Index Cond: ((login)::text = 'admin'::text)
 Planning time: 0.105 ms
(3 rows)

tgg_current=> set enable_bitmapscan=off;
SET
tgg_current=> set enable_indexscan =off;
SET
tgg_current=> select user_id,login from tgg_user where login ='admin';
 user_id | login
-+---
   1 | admin
(1 row)


tgg_current=> \d tgg_user_login_key
   Index "epp.tgg_user_login_key"
 Column | Type  | Definition
+---+
 login  | character varying(32) | login
unique, btree, for table "epp.tgg_user"


Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Matheus de Oliveira
I was going to answer each message, but Fabrízio summarized everything so
far really nicely. Thanks Fabrízio.

On Wed, Jun 18, 2014 at 5:25 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

> Then, to summarize Matheus must do:
> * use an option instead of change the syntax and catalog to indicate that
> a tablespace is used to store temp objects
>

Yes. I myself wasn't sure TEMPORARY syntax would be good, but I would just
like to hear more about. Storing as an options seems nice to me.


> * throw an exception if we try ALTER the option "only_temp_relations"
>

I think we should postpone the ALTER option, perhaps as another patch.
Wasn't that what was decided?

I'll investigate the options better before going this route. Let me work on
CREATE first.



> * add regression tests
> * add documentation
>
> And, of course, register to the next open commitfest [1] to get detailed
> feedback and review.
>

Noted. It will be on the next commitfest. Just knowing some people do want
this make me more comfortable on working better.

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [HACKERS] How to change the pgsql source code and build it??

2014-06-18 Thread Craig Ringer
On 06/19/2014 01:50 AM, Shreesha wrote:

> However in my case, I don't know why there wasn't a default database
> with name 'root' got created or why the server rejects the client when
> it tries to connect to the default database. 

Take a look at the initdb manual, the tutorial, and the
installation/system administration documentation.

> Can anyone shed some light on 
> 1) when the default database gets created

By initdb - see the initdb manual.

> 2) how is this database 'name' is decided? Or what is the name of the
> default database name?

See the initdb manual.

> 3) Is there any other places in the database server code where this
> check is applied?

Wherever you connect, libpq picks the default database = the current
username.

> Upon looking at the error I got, I believe the code is searching for the
> database name == user name.  If anyone can give some input on the code,
> it would be helpful!

libpq


I think you probably need to move this to pgsql-general, Stack Overflow,
etc. It's not really on topic for pgsql-hackers.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread David G Johnston
On Wed, Jun 18, 2014 at 8:01 PM, Josh Berkus [via PostgreSQL] <
ml-node+s1045698n5807868...@n5.nabble.com> wrote:

> On 06/18/2014 04:54 PM, Marko Tiikkaja wrote:
> > On 2014-06-19 1:46 AM, Josh Berkus wrote:
> >> Robert's right, not killing the "BEGIN;" only transactions is liable to
> >> result in user confusion unless we label those sessions differently in
> >> pg_stat_activity.
> >
> > Wouldn't they be labeled differently already?  i.e. the last query would
> > be "BEGIN".  Unless your app tries to unsuccessfully use nested
> > transactions, you would know why it hasn't been killed.
>
> That's pretty darned obscure for a casual user.  *you* would know, and
> *I* would know, but 99.5% of our users would be very confused.
>
> Plus, if a session which has only issued a "BEGIN;" doesn't have a
> snapshot and isn't holding any locks, then I'd argue we shouldn't lable
> it IIT in the first place because it's not doing any of the bad stuff we
> want to resolve by killing IITs.  Effectively, it's just "idle".
>
>
​+1

Since the BEGIN is not meaningfully interpreted until the first subsequent
command (SET excepting apparently - are there others?) is issued no
transaction has begun at this point so "idle" and not "IIT" should be the
reported state on pg_stat_activity​.

Though I can understand some level of surprise if someone sees "idle" with
a "BEGIN" (or SET TIMEZONE) as the last executed command - so maybe "idle
before transaction" instead of "idle in transaction" - which hopefully will
not be assumed to be controlled by the "idle_in_transaction_timeout" GUC.
 I presume that we have some way to distinguish this particular case and
can report it accordingly.  Even if that state endures after a SET TIMEZONE
or similar session configuration command explaining that all those are
"pre-transaction" shouldn't be too hard - though as a third option "idle
initialized transaction" could be the state indicator.

Depending on how "idle in transaction (aborted)" gets resolved we could
also consider "idle in transaction (initializing)" - but since the former,
IMO, should be dropped (and thus matches the "in" specification of the GUC)
naming the later - which should not be dropped (I'll go with the stated
opinion here for now) - with "in" is not advisable.

The current behavior is transparent to the casual user so sticking with
"idle" does seem like the best choice to maintain the undocumented nature
of the behavior.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/idle-in-transaction-timeout-tp5805859p5807874.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Tom Lane
Josh Berkus  writes:
> Counter-argument: most app frameworks which do an automatic BEGIN; also
> do other stuff like SET TIMEZONE each time as well.  Is this really a
> case worth worrying about?

SET TIMEZONE doesn't result in taking a snapshot either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-18 Thread Tom Lane
Gurjeet Singh  writes:
> On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane  wrote:
>> If we're going to do this, I would say that we should also take the
>> opportunity to get out from under the question of which kernel API
>> we're dealing with.  So perhaps a design like this:
>> 
>> 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
>> the name of a file to write something into after forking.  The typical
>> value would be "/proc/self/oom_score_adj" or "/proc/self/oom_adj".
>> If not set, nothing happens.
>> 
>> 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
>> the string we write, otherwise we write "0".

> Please find attached the patch. It includes the doc changes as well.

Applied with some editorialization.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 04:54 PM, Marko Tiikkaja wrote:
> On 2014-06-19 1:46 AM, Josh Berkus wrote:
>> Robert's right, not killing the "BEGIN;" only transactions is liable to
>> result in user confusion unless we label those sessions differently in
>> pg_stat_activity.
> 
> Wouldn't they be labeled differently already?  i.e. the last query would
> be "BEGIN".  Unless your app tries to unsuccessfully use nested
> transactions, you would know why it hasn't been killed.

That's pretty darned obscure for a casual user.  *you* would know, and
*I* would know, but 99.5% of our users would be very confused.

Plus, if a session which has only issued a "BEGIN;" doesn't have a
snapshot and isn't holding any locks, then I'd argue we shouldn't lable
it IIT in the first place because it's not doing any of the bad stuff we
want to resolve by killing IITs.  Effectively, it's just "idle".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Marko Tiikkaja

On 2014-06-19 1:46 AM, Josh Berkus wrote:

Robert's right, not killing the "BEGIN;" only transactions is liable to
result in user confusion unless we label those sessions differently in
pg_stat_activity.


Wouldn't they be labeled differently already?  i.e. the last query would 
be "BEGIN".  Unless your app tries to unsuccessfully use nested 
transactions, you would know why it hasn't been killed.



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 02:52 PM, Bruce Momjian wrote:
> On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote:
>> The only problem I see is that it makes the semantics kind of weird
>> and confusing.  "Kill connections that are idle in transaction for too
>> long" is a pretty clear spec; "kill connections that are idle in
>> transaction except if they haven't executed any commands yet because
>> we think you don't care about that case" is not quite as clear, and
>> not really what the GUC name says, and maybe not what everybody wants,
>> and maybe masterminding.
> 
> "Kill connections that are idle in non-empty transaction block for too
> long"

Here's the POLS violation in this:

"I have idle_in_transaction_timeout set to 10min, but according to
pg_stat_activity I have six connections which are IIT for over an hour.
 What gives?"

Robert's right, not killing the "BEGIN;" only transactions is liable to
result in user confusion unless we label those sessions differently in
pg_stat_activity. Tom is right in that killing them will cause some
users to not use IIT_timeout when they should, or will set the timeout
too high to be useful.

So it seems like what we should do is NOT call sessions IIT if they've
merely executed a BEGIN; and not done anything else.  Then the timeout
and pg_stat_activity would be consistent.

Counter-argument: most app frameworks which do an automatic BEGIN; also
do other stuff like SET TIMEZONE each time as well.  Is this really a
case worth worrying about?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Jsonb: jbvBinary usage in the convertJsonbValue?

2014-06-18 Thread Andrew Dunstan


On 06/02/2014 11:38 AM, Andrew Dunstan wrote:


On 06/02/2014 10:22 AM, Andrew Dunstan wrote:


On 06/02/2014 10:02 AM, Robert Haas wrote:
On Thu, May 29, 2014 at 7:34 AM, Dmitry Dolgov 
<9erthali...@gmail.com> wrote:

I'm little confused by the convertJsonbValue functon at jsonb_utils.c
Maybe I misunderstood something, so I need help =)


if (IsAJsonbScalar(val) || val->type == jbvBinary)
 convertJsonbScalar(buffer, header, val);
As I can see, the convertJsonbScalar function is used for scalar 
and binary

jsonb values. But this function doesn't handle the jbvBinary type.

There definitely seems to be an oversight here of some kind. Either
convertJsonbValue() shouldn't be calling convertJsonbScalar() with an
object of type jbvBinary, or convertJsonbScalar() should know how to
handle that case.




Yes, I've just been looking at that. I think this is probably a 
hangover from when these routines were recast to some extent. Given 
that we're not seeing any errors from it, I'd be inclined to remove 
the the "|| val->type == jbvBinary" part. One of the three call sites 
to convertJsonbValue asserts that this can't be true, and doing so 
doesn't result in a regression failure.


Peter and Teodor, comments?




Thinking about it a bit more, ISTM this should be ok, since we convert 
a JsonbValue to Jsonb in a depth-first recursive pattern. We should 
certainly add some comments along these lines to explain why the 
jbvbinary case shouldn't arise.




Nobody has commented further that I have noticed, so I have committed this.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
David Fetter  wrote:
> Robert Haas wrote:
>> Kevin Grittner  wrote:

> The good:
> - Generating the tuplestores.  Yay!

Thanks for that.  ;-)

> The bad:
> - Generating them exactly and only for AFTER triggers

The standard only allows them for AFTER triggers, and I'm not sure
what the semantics would be for any others.

> - Requiring that the tuplestores both be generated or not at
>   all.  There are real use cases described below where only
>   one would be relevant.

Yeah.

> - Generating the tuplestores unconditionally.

Well, there are conditions.  Only when the reloption allows and
only if there is an AFTER trigger for the type of operation in
progress.

> The ugly:
> - Attaching tuplestore generation to tables rather than
    callers (triggers, DML, etc.)

I'm not sure what you're getting at here.  This patch is
specifically only concerned with generating delta relations for DML
AFTER triggers, although my hope is that it will be a basis for
delta relations used for other purposes.  This seems to me like the
right place to initially capture the data for incremental
maintenance of materialized views, and might be of value for other
purposes, too.

> [formal definition of standard CREATE TRIGGER statement]

> Sorry that was a little verbose, but what it does do is give us
> what we need at trigger definition time.  I'd say it's pilot
> error if a trigger definition says "make these tuplestores" and
> the trigger body then does nothing with them, which goes to
> Robert's point below re: unconditional overhead.

Yeah, the more I think about it (and discuss it) the more I'm
inclined to suffer the additional complexity of the standard syntax
for specifying transition relations in order to avoid unnecessary
overhead creating them when not needed.  I'm also leaning toward
just storing TIDs in the tuplestores, even though it requires mixed
snapshots in executing queries in the triggers.  It just seems like
there will otherwise be to much overhead in copying around big,
unreferenced columns for some situations.

> Along that same line, we don't always need to capture both the
> before tuplestores and the after ones.  Two examples of this come
> to mind:
>
> - BEFORE STATEMENT triggers accessing rows, where there is no
> after part to use,

Are you talking about an UPDATE for which the AFTER trigger(s) only
reference the before transition table, and don't look at AFTER?  If
so, using the standard syntax would cover that just fine.  If not,
can you elaborate?

> and
> - DML (RETURNING BEFORE, e.g.) which only touches one of them. 
> This applies both to extant use cases of RETURNING and to planned
> ones.

I think that can be sorted out by a patch which implements that, if
these deltas even turn out to be the appropriate way to get that
data (which is not clear to me at this time).  Assuming standard
syntax, the first thing would be for the statement to somehow
communicate to the trigger layer the need to capture a tuplestore
it might otherwise not generate, and there would need to be a way
for the statement to access the needed tuplestore(s).  The
statement would also need to project the right set of columns. 
None of that seems to me to be relevant to this patch.  If this
patch turns out to provide infrastructure that helps, great.  If
you have a specific suggestion about how to make the tuplestores
more accessible to other layers, I'm listening.

> In summary, I'd like to propose that the tuplestores be generated
> separately in general and attached to callers. We can optimize
> this by not generating redundant tuplestores.

Well, if we use the standard syntax for CREATE TRIGGER and store
the transition table names (if any) in pg_trigger, the code can
generate one relation if any AFTER triggers which are going to fire
need it.  I don't see any point in generating exactly the same
tuplestore contents for each trigger.  And suspect that we can wire
in any other uses later when we have something to connect them to.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btreecheck extension

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 4:48 AM, Stephen Frost  wrote:
> I'm fine with having these start out as external tools which are doing
> checks, but I've been specifically asked about (and have desired myself
> from time-to-time...) an in-core capability to check index/heap/etc
> validity.  Folks coming from certain other RDBMS's find it amazing that
> we don't have any support for that when what they really want is a
> background worker which is just automatically going around doing these
> checks.

Yes, I think that being able to verify the integrity of index and heap
relations is an important feature, and I think it's notable that we're
the only major RDBMS that lacks this support. I'm not quite sure what
that should entail just yet, but clearly it should be somewhat like
what I have here. I think the big open questions to make something
like this work are mostly around the cost/benefit ratio of each of the
checks I've outlined. Certainly, for that use case minimizing
disruption on a live system becomes even more important. I'll probably
look at a buffer access strategy, just to give an example of that off
the top of my head.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
Robert Haas  wrote:
> Kevin Grittner  wrote:

> Can you describe what the standard syntax for the row variables
> is, as opposed to our syntax?  Also, what's the standard syntax
> for the the transition relations?

If you want either (or both), the standard has you using a
REFERENCING clause right before the FOR EACH part of the statement.
You can specify one or more sections in the format:

{ OLD | NEW } [ ROW | TABLE ] [ AS ] name

If you have more than one, they are separated by whitespace.  (No
commas or other visible delimiters.)  If you omit ROW/TABLE it
defaults to ROW.  You are only allowed to specify TABLE on an AFTER
trigger.  You are only allowed to specify ROW on a FOR EACH ROW
trigger.  (There is no prohibition against specifying TABLE on a
FOR EACH ROW trigger.)  You are only allowed to specify OLD for a
DELETE or UPDATE trigger.  (The ability for one trigger definition
to specify multiple operations is a PostgreSQL extension.)  You are
only allowed to specify NEW for an UPDATE or INSERT trigger.  You
may not repeat an entry.

Essentially, we have an implied clause on every FOR EACH ROW
trigger like:

  REFERENCING OLD ROW AS OLD NEW ROW AS NEW

>> Some things which I *did* follow from the standard: these new
>> relations are only allowed within AFTER triggers, but are
>> available in both AFTER STATEMENT and AFTER ROW triggers.  That
>> is, an AFTER UPDATE ... FOR EACH ROW trigger could use both the
>> OLD and NEW row variables as well as the delta relations (under
>> whatever names we pick).  That probably won't be used very
>> often, but I can imagine some cases where it might be useful.  I
>> expect that these will normally be used in FOR EACH STATEMENT
>> triggers.
>
> I'm concerned about the performance implications of capturing the
> delta relations unconditionally.  If a particular trigger
> actually needs the delta relations, then the time spent capturing
> that information is well spent; but if it doesn't, then it's a
> waste.  There are lots of people using FOR EACH STATEMENT
> triggers right now who won't be happy if those start requiring
> O(n) additional temporary storage, where n is the number of rows
> modified by the statement.  This is likely an even bigger issue
> for per-row triggers, of course, where as you say, it probably
> won't be used often.

That is why I added a reloption which must be specifically enabled
for a table in order to generate these deltas.  That would be an
inconvenience for those wanting to use the new feature, but should
prevent a performance regression for any tables where it is not
specifically turned on.  That's not perfect, of course, because if
you turn it on for an UPDATE ... AFTER EACH STATEMENT trigger where
you want it, you do suffer the overhead on every AFTER trigger on
that table.  Perhaps this is sufficient reason to use the standard
syntax for the new delta tables -- the would then be generated only
in the specific cases where they were needed.  And I think I could
lose the reloption.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Bruce Momjian
On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote:
> On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus  wrote:
> > On 06/18/2014 12:32 PM, Tom Lane wrote:
> >> Josh Berkus  writes:
> >>>  There are plenty of badly-written applications which "auto-begin", that
> >>> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
> >>> not there's any additional work to do.  This is a major source of IIT
> >>> and the timeout should not ignore it.
> >>
> >> Nonsense.  We explicitly don't do anything useful until the first actual
> >> command arrives, precisely to avoid that problem.
> >
> > Oh, we don't allocate a snapshot?  If not, then no objection here.
> 
> The only problem I see is that it makes the semantics kind of weird
> and confusing.  "Kill connections that are idle in transaction for too
> long" is a pretty clear spec; "kill connections that are idle in
> transaction except if they haven't executed any commands yet because
> we think you don't care about that case" is not quite as clear, and
> not really what the GUC name says, and maybe not what everybody wants,
> and maybe masterminding.

"Kill connections that are idle in non-empty transaction block for too
long"

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
Alvaro Herrera  wrote:
> Kevin Grittner wrote:

> TRUNCATE triggers shouldn't have delta relations, that's for
> sure, or it'd completely negate the point of TRUNCATE triggers.
> I don't think we have any other event, do we?  People who get
> delta relations for deleting all rows should be using DELETE, I
> think.

That sounds reasonable.  The only other issue was INSTEAD OF
triggers, but I don't think we need them there, either.

> I am not sure about providing delta relations in the FOR EACH ROW
> case.  For what cases are them useful?

In an accounting application for courts I have seen a case where
each receivable (asset) account had a contra (liability) account of
"uncollected receivables", because until and unless the Clerk of
Courts collected the judgment there was no obligation to pay the
money back out.  Any financial transaction had to be in a database
transaction, and not only did all the transaction detail need to
balance to zero, but any receivable detail row needed to be
immediately followed by a balancing contra account row (and vice
versa).  A FOR EACH ROW trigger, on seeing one of these rows, could
check for the required balancing row.  Now this would only be
solved by the standard feature if both rows were always inserted by
a single statement, which might not always have been the case; so
even with this example I am stretching a bit.  But it's close
enough to suggest that there might be legitimate uses.

And there is the fact that the standard seems to want this to be
supported.

> In all cases I think NEW and OLD are sufficient.  I didn't read
> the standard, but if it's saying that in FOR EACH ROW the
> new/deleted/changed row should be accessible by way of a delta
> relation, [...]

No, it says that you can specify *both* the row variables for the
row under consideration and the tables for the full set of rows
affected by the statement *for the same FOR EACH ROW trigger*.

> Now, if you only have delta relations in FOR EACH STATEMENT, then
> it seems to me you can optimize obtaining them only when such
> triggers are defined; this lets you do away with the reloption
> entirely, doesn't it?

That was intended to minimize the situations under which there was
a performance hit where the new delta relations were not needed in
an AFTER trigger.  If we only generate these for FOR EACH STATEMENT
triggers, that certainly reduces the need for that, but I'm not
sure it eliminates it -- especially if we're generating tuplestores
for the full rows rather than their TIDs.

It is already generating the tuplestores only if there is an AFTER
trigger for the type of statement being executed, but I agree that
it would be a very rare FOR EACH ROW trigger that actually used it.
Of course, that is one argument for the standard syntax -- we
could test whether any of the trigger definitions for that
operation on that table specified each transition table, and only
generate them if needed based on that.

> I noticed that GetCurrentFDWTuplestore() changed its identity
> without getting its comment updated.  The new name seems awfully
> generic, and I don't think it really describes what the function
> does.  I think it needs more renaminguu

Good point.  Will fix that in the next version.

>> (1)  My first impulse was to capture this delta data in the form
>> of tuplestores of just TIDs, and fetching the tuples themselves
>> from the heap on reference.  In earlier discussions others have
>> argued for using tuplestores of the actual rows themselves.
>
> Can you please supply pointers to such discussion?

That was in a matview-related discussion, so perhaps we should
ignore that for now and discuss what's best for triggers here.  If
matviews need something different, the rows could always be
materialized from the TIDs at a later point.

> I don't see any point in not just storing TIDs, but perhaps I'm
> missing something.

I think there was some question about performance, but I really
have a hard time seeing the performance being significantly worse
with a TID tuplestore for most reasonable uses; and I think the TID
tuplestore could be a lot faster if (for example) you have a table
with a large, TOASTed text or bytea column which is not referenced
(in selection criteria or the SET clause) and is not used in the
FOR EACH STATEMENT trigger.

I think there might also have been some question about visibility.
A TID tuplestore would need to use a different snapshot (like maybe
SnapshotAny) in the same query where it joined to other tables with
a normal MVCC snapshot.

>> (2)  Do we want to just pick names for these in the PLs rather
>> than using the standard syntax?  Implementing the standard
>> syntax seemed to require three new (unreserved) keywords,
>> changes to the catalogs to store the chosen relations names, and
>> some way to tie the specified relation names in to the various
>> PLs.
>
> I think the only one for which we have a compulsion to follow
> someone in this area would be PL/pgSQL,

... which curr

Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-18 Thread Andres Freund
On 2014-06-18 17:04:49 -0400, Robert Haas wrote:
> On Wed, Jun 18, 2014 at 12:13 PM, Andres Freund  
> wrote:
> > There might be cases where that's true, but in the majority of cases
> > where the variable isn't highly contended it's just about the same. In
> > many cases the microcode will implement atomic ops using ll/sc
> > operations internally anyway.
> 
> But if the variable isn't highly contended, then why are we messing
> around with atomic ops in the first place?

a) Quite often the "strange" atomic op is only needed to allow a more
   performance critical codepart to use atomic ops (e.g. setting flags
   atomically is only required to make atomic pins work).
b) A spinlock protected region is often more likely to take longer than
   a single atomic op because it'll often touch more cachelines and
   such. For such cmpxchg loops there's pretty much no intermediary
   instructions inbetween which the cacheline could be stolen by another
   core.
c) The overall amount of bus traffic is often noticeably lower with a
   atomic op because the average total number of locked instructions is
   lower (unlocking often enough requires another atomic op or barrier)


> > Why don't you pass it to configure or add it in Makefile.custom? That's
> > what I do.
> 
> Yeah, I should probably do that instead.  But I still think the point
> that two different commiters have managed to flip that flag by
> accident is telling.

Not arguing against having the configure flag here (already decided),
just wondering whether your life could be made easier :)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 12:13 PM, Andres Freund  wrote:
> There might be cases where that's true, but in the majority of cases
> where the variable isn't highly contended it's just about the same. In
> many cases the microcode will implement atomic ops using ll/sc
> operations internally anyway.

But if the variable isn't highly contended, then why are we messing
around with atomic ops in the first place?

> I'm fine with always implementing everything but tas, cas, add ontop of
> cas for now. I want or/and/sub/... to be conveniently available to C
> code, but they don't need to be based on hardware ops. Having open-coded
> versions of the above in several places sounds like a horrible idea to
> me. Both, because we might want to optimize it at some point and because
> it's horrible readability wise.

OK, so if we're only going to have TAS, CAS, and fetch-and-add as
primitives (which sounds sensible to me) and implement the rest in
terms of those for now, then the table on the wiki only needs one more
column: information about support for fetch-and-add.

>> More than that, I actually really hate
>> things that don't have a configure option, like WAL_DEBUG, because you
>> have to change a checked-in file, which shows up as a diff, and if
>> you're not careful you check it in, and if you are careful it still
>> gets blown away every time you git reset --hard, which I do a lot.  I
>> think the fact that both Heikki and I on separate occasions have made
>> commits enabling WAL_DEBUG shows pretty clearly the weaknesses of that
>> method of doing business.
>
> Why don't you pass it to configure or add it in Makefile.custom? That's
> what I do.

Yeah, I should probably do that instead.  But I still think the point
that two different commiters have managed to flip that flag by
accident is telling.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus  wrote:
> On 06/18/2014 12:32 PM, Tom Lane wrote:
>> Josh Berkus  writes:
>>>  There are plenty of badly-written applications which "auto-begin", that
>>> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
>>> not there's any additional work to do.  This is a major source of IIT
>>> and the timeout should not ignore it.
>>
>> Nonsense.  We explicitly don't do anything useful until the first actual
>> command arrives, precisely to avoid that problem.
>
> Oh, we don't allocate a snapshot?  If not, then no objection here.

The only problem I see is that it makes the semantics kind of weird
and confusing.  "Kill connections that are idle in transaction for too
long" is a pretty clear spec; "kill connections that are idle in
transaction except if they haven't executed any commands yet because
we think you don't care about that case" is not quite as clear, and
not really what the GUC name says, and maybe not what everybody wants,
and maybe masterminding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:56 PM, Andres Freund  wrote:
>> > I'm looking at the way you did this in the context of the atomics
>> > patch. Won't:
>> > s_init_lock_sema(volatile slock_t *lock)
>> > {
>> > static int  counter = 0;
>> >
>> > *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES;
>> > }
>> >
>> > lead to bad results if spinlocks are intialized after startup?
>>
>> Why?
>
> Because every further process will start with a copy of the postmaster's
> counter or with 0 (EXEC_BACKEND)?

Oh, true.  Maybe we should randomize that.

>> > Essentially mapping new spinlocks to the same semaphore?
>>
>> Yeah, but so what?  If we're mapping a bajillion spinlocks to the same
>> semaphore already, what's a few more?
>
> Well, imagine something like parallel query creating new segments,
> including a spinlock (possibly via a lwlock) at runtime. If there were
> several backends processing such queries this they'd all map to the same
> semaphore.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Fabrízio de Royes Mello
On Wed, Jun 18, 2014 at 4:53 PM, Alvaro Herrera 
wrote:
>
> Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > Stephen Frost  writes:
> > > > Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
> > > > ERROR that lists out all of the non-temporary objects which are
found
> > > > (and lists any other databases which have objects in those
> > > > tablespaces..).  That would allow administrators who have existing
> > > > notionally temporary-only tablespaces to go clean things up to make
them
> > > > actually temporary-only.
>
> > > I would certainly suggest that the first version of the patch not
> > > undertake to allow this property to be ALTERed; the cost-benefit
> > > ratio isn't there IMO.
> >
> > I suppose scheduling downtime to do the check manually across all
> > databases, then drop and recreate the tablespace, would work.  As
> > someone who's working with a couple of these cases, it'd be awful nice
> > if there was a way PG would handle it for me.
>
> I wonder if some form of NOT VALID marking could be useful here.  Of
> course, this is not a constraint.  But a mechanism of a similar spirit
> seems apropos.  It seems reasonable to leave such a thing for future
> improvement.
>

+1

Then, to summarize Matheus must do:
* use an option instead of change the syntax and catalog to indicate that a
tablespace is used to store temp objects
* throw an exception if we try ALTER the option "only_temp_relations"
* add regression tests
* add documentation

And, of course, register to the next open commitfest [1] to get detailed
feedback and review.

Regards,

[1] https://commitfest.postgresql.org/

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway  writes:
> On 06/18/2014 12:09 PM, Tom Lane wrote:
>> I find myself a bit suspicious of this whole thing though.  If
>> it's necessary to explicitly clean up the tmpcontext, why not also
>> the sinfo->cstrs allocation?  And what about the tupdesc,
>> attinmeta, etc created further up in that "if (first)" block?  I'd
>> have expected that all this stuff gets allocated in a context
>> that's short-lived enough that we don't really need to clean it up
>> explicitly.

> Yeah, I thought about that too. My testing showed a small amount of
> remaining leakage -- like 20 MB on 100,000 iterations -- I wasn't sure
> that it was worthwhile to worry about. The memory context leak was
> much larger.

[ Thinks for awhile... ]  Ah.  The memory context is a child of
the econtext's ecxt_per_tuple_memory, which is supposed to be
short-lived, but we only do MemoryContextReset() on that after
each tuple, not MemoryContextResetAndDeleteChildren().  I recall
there's been debate about changing that, but it's not something
we can risk changing in back branches, for sure.  So I concur
we need an explicit context delete here.

I do see growth in the per-query context as well.  I'm not sure
where that's coming from, but we probably should try to find out.
A couple hundred bytes per iteration is going to add up, even if it's
not as fast as 8K per iteration.  I'm not sure it's dblink's fault,
because I don't see anything in dblink.c that is allocating anything in
the per-query context, except for the returned tuplestores, which
somebody else should clean up.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.

2014-06-18 Thread Andres Freund
On 2014-06-18 15:52:49 -0400, Robert Haas wrote:
> On Wed, Jun 18, 2014 at 3:32 PM, Andres Freund  wrote:
> > Hi,
> >
> > On 2014-01-08 23:58:16 +, Robert Haas wrote:
> >> Reduce the number of semaphores used under --disable-spinlocks.
> >>
> >> Instead of allocating a semaphore from the operating system for every
> >> spinlock, allocate a fixed number of semaphores (by default, 1024)
> >> from the operating system and multiplex all the spinlocks that get
> >> created onto them.  This could self-deadlock if a process attempted
> >> to acquire more than one spinlock at a time, but since processes
> >> aren't supposed to execute anything other than short stretches of
> >> straight-line code while holding a spinlock, that shouldn't happen.
> >>
> >> One motivation for this change is that, with the introduction of
> >> dynamic shared memory, it may be desirable to create spinlocks that
> >> last for less than the lifetime of the server.  Without this change,
> >> attempting to use such facilities under --disable-spinlocks would
> >> quickly exhaust any supply of available semaphores.  Quite apart
> >> from that, it's desirable to contain the quantity of semaphores
> >> needed to run the server simply on convenience grounds, since using
> >> too many may make it harder to get PostgreSQL running on a new
> >> platform, which is mostly the point of --disable-spinlocks in the
> >> first place.
> >
> > I'm looking at the way you did this in the context of the atomics
> > patch. Won't:
> > s_init_lock_sema(volatile slock_t *lock)
> > {
> > static int  counter = 0;
> >
> > *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES;
> > }
> >
> > lead to bad results if spinlocks are intialized after startup?
> 
> Why?

Because every further process will start with a copy of the postmaster's
counter or with 0 (EXEC_BACKEND)?

> > Essentially mapping new spinlocks to the same semaphore?
> 
> Yeah, but so what?  If we're mapping a bajillion spinlocks to the same
> semaphore already, what's a few more?

Well, imagine something like parallel query creating new segments,
including a spinlock (possibly via a lwlock) at runtime. If there were
several backends processing such queries this they'd all map to the same
semaphore.

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 12:32 PM, Tom Lane wrote:
> Josh Berkus  writes:
>>  There are plenty of badly-written applications which "auto-begin", that
>> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
>> not there's any additional work to do.  This is a major source of IIT
>> and the timeout should not ignore it.
> 
> Nonsense.  We explicitly don't do anything useful until the first actual
> command arrives, precisely to avoid that problem.

Oh, we don't allocate a snapshot?  If not, then no objection here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Alvaro Herrera
Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
> > > ERROR that lists out all of the non-temporary objects which are found
> > > (and lists any other databases which have objects in those
> > > tablespaces..).  That would allow administrators who have existing
> > > notionally temporary-only tablespaces to go clean things up to make them
> > > actually temporary-only.

> > I would certainly suggest that the first version of the patch not
> > undertake to allow this property to be ALTERed; the cost-benefit
> > ratio isn't there IMO.
> 
> I suppose scheduling downtime to do the check manually across all
> databases, then drop and recreate the tablespace, would work.  As
> someone who's working with a couple of these cases, it'd be awful nice
> if there was a way PG would handle it for me.

I wonder if some form of NOT VALID marking could be useful here.  Of
course, this is not a constraint.  But a mechanism of a similar spirit
seems apropos.  It seems reasonable to leave such a thing for future
improvement.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:32 PM, Andres Freund  wrote:
> Hi,
>
> On 2014-01-08 23:58:16 +, Robert Haas wrote:
>> Reduce the number of semaphores used under --disable-spinlocks.
>>
>> Instead of allocating a semaphore from the operating system for every
>> spinlock, allocate a fixed number of semaphores (by default, 1024)
>> from the operating system and multiplex all the spinlocks that get
>> created onto them.  This could self-deadlock if a process attempted
>> to acquire more than one spinlock at a time, but since processes
>> aren't supposed to execute anything other than short stretches of
>> straight-line code while holding a spinlock, that shouldn't happen.
>>
>> One motivation for this change is that, with the introduction of
>> dynamic shared memory, it may be desirable to create spinlocks that
>> last for less than the lifetime of the server.  Without this change,
>> attempting to use such facilities under --disable-spinlocks would
>> quickly exhaust any supply of available semaphores.  Quite apart
>> from that, it's desirable to contain the quantity of semaphores
>> needed to run the server simply on convenience grounds, since using
>> too many may make it harder to get PostgreSQL running on a new
>> platform, which is mostly the point of --disable-spinlocks in the
>> first place.
>
> I'm looking at the way you did this in the context of the atomics
> patch. Won't:
> s_init_lock_sema(volatile slock_t *lock)
> {
> static int  counter = 0;
>
> *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES;
> }
>
> lead to bad results if spinlocks are intialized after startup?

Why?

> Essentially mapping new spinlocks to the same semaphore?

Yeah, but so what?  If we're mapping a bajillion spinlocks to the same
semaphore already, what's a few more?

> That's a
> restriction I can live with, especially as this is only for super old
> platforms. But it might be worth mentioning somewhere?

Dunno.  What restriction?

> I've essentially reeimplemented that kind of logic in the atomics
> patch. Looking to get rid of the duplication... There I used something
> like
> slot = ((uintptr_t) addr_of_atomic >> (sizeof(void*) + 5)) % NUM_LOCKS
> but I think your method is actually better because it allows to place
> spinlocks/atomics to be placed in dsm segments placed at different
> location in individual processes.

Right.

> My current plan to get rid of the duplication is to simply embed the
> spinlock inside the atomic variable instead of having a separate array
> of spinlocks protecting atomic variables.

Doesn't sound crazy at first glance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Jeff Janes
On Wed, Jun 18, 2014 at 10:58 AM, Andres Freund  wrote:
> On 2014-06-18 13:54:16 -0400, Tom Lane wrote:

>> I think we're not on the same page.  My point is that someone might want
>> to automate the whole sequence: stop old postmaster, run pg_upgrade, start
>> the updated postmaster normally (hence it *is* open for business), kick
>> off the analyze runs.  If you're concerned about minimal downtime you
>> would not want to be waiting around for the admin to issue a perfectly
>> predictable series of commands.
>
> Oh, yea. Definitely. I think that's what I've seen happen in pretty much
> *all* usages of pg_upgrade.

I think it is a popular way to do it not because it is a particularly
good way, but because the better alternatives are not readily
available.

If your database needs statistics badly enough that you want to do a
coarse pre-pass with default_statistics_target=1, why would you want
that pass to be done on an open database?  Surely you don't want 100
open connections all doing giant seq scans (that should be single-row
look up, but without stats they are not) competing with the analyze.

Having a database which is "open" to queries but they have such
deranged execution plans that they never actually finish is not truly
open, and the attempts to service those futile queries just delays the
true opening even further.

If you really need a multi pass ANALYZE, you probably need the first
pass to be before the database opens because otherwise the open will
be a disaster, and the 2nd pass to be after the database opens but
before your bulk queries (mass deletes, EOM reports, etc.) kick in.
Having both passes be on the same "side" of the opening seems unlikely
to do much good for most use cases.  Fortunately it probably doesn't
do much harm to most people, either, simple because most databases are
not terribly sensitive to the issue.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 3:28 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane  wrote:
>>> The net behavior would be the same, but I thought it might be easier to
>>> code by thinking of it this way.  Or maybe it wouldn't --- it's just a
>>> suggestion.
>
>> Well, the difference is that if we just don't check it, there can
>> never be an error.  Basically, it's the user's job to DTRT.  If we
>> check it against some semi-arbitrary value, we'll catch the case where
>> the old cluster was modified with a custom setting and the new one was
>> not - but couldn't we also get false positives under obscure
>> circumstances?
>
> Huh?  What we'd be checking is the LOBLKSIZE compiled into pg_upgrade
> versus that stored into pg_control by the new postmaster.  If those
> are different, then pg_upgrade didn't come from the same build as the
> new postmaster, which is already a pretty hazardous situation (especially
> if the user is fooling with low-level stuff like this).

OK, I agree that checking that wouldn't hurt anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Tom Lane
Josh Berkus  writes:
>  There are plenty of badly-written applications which "auto-begin", that
> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
> not there's any additional work to do.  This is a major source of IIT
> and the timeout should not ignore it.

Nonsense.  We explicitly don't do anything useful until the first actual
command arrives, precisely to avoid that problem.

It might be that we should slap such apps' wrists anyway, but given
that we've gone to the trouble of working around the behavior at the
system structural level, I'd be inclined to say not.  What you'd be
doing is preventing people who have to deal with such apps from using
the timeout in any useful fashion.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane  wrote:
>> The net behavior would be the same, but I thought it might be easier to
>> code by thinking of it this way.  Or maybe it wouldn't --- it's just a
>> suggestion.

> Well, the difference is that if we just don't check it, there can
> never be an error.  Basically, it's the user's job to DTRT.  If we
> check it against some semi-arbitrary value, we'll catch the case where
> the old cluster was modified with a custom setting and the new one was
> not - but couldn't we also get false positives under obscure
> circumstances?

Huh?  What we'd be checking is the LOBLKSIZE compiled into pg_upgrade
versus that stored into pg_control by the new postmaster.  If those
are different, then pg_upgrade didn't come from the same build as the
new postmaster, which is already a pretty hazardous situation (especially
if the user is fooling with low-level stuff like this).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 11:50 AM, Kevin Grittner wrote:
> The first thing is that I don't think a delay between the BEGIN and
> the SELECT should cause a timeout to trigger, but more importantly
> there should not be two ERROR responses to one SELECT statement.

I do think a delay between BEGIN and SELECT should trigger the timeout.
 There are plenty of badly-written applications which "auto-begin", that
is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or
not there's any additional work to do.  This is a major source of IIT
and the timeout should not ignore it.

> I'm inclined to abandon the ERROR approach as not worth the effort
> and fragility, and focus on v1 of the patch.  If we can't get to
> consensus on that, I think that this patch should be flagged
> "Returned with Feedback", noting that any follow-up version
> requires some way to deal with the issues raised regarding multiple
> ERROR messages.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian  wrote:
>>> On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote:
 What about comparing to the symbolic value LOBLKSIZE?  This would make
 pg_upgrade assume that the old installation had been tweaked the same
 as in its own build.  This ends up being the same as what you said,
 ie, effectively no comparison ... but it might be less complicated to
 code/understand.
>
>>> OK, assume the compiled-in default is the value for an old cluster that
>>> has no value --- yeah, I could do that.
>
>> I'm not really sure why this is better than Bruce's original proposal, 
>> though.
>
> The net behavior would be the same, but I thought it might be easier to
> code by thinking of it this way.  Or maybe it wouldn't --- it's just a
> suggestion.

Well, the difference is that if we just don't check it, there can
never be an error.  Basically, it's the user's job to DTRT.  If we
check it against some semi-arbitrary value, we'll catch the case where
the old cluster was modified with a custom setting and the new one was
not - but couldn't we also get false positives under obscure
circumstances?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/18/2014 12:09 PM, Tom Lane wrote:
> Joe Conway  writes:
>> I think the context deletion was missed in the first place
>> because storeRow() is the wrong place to create the context.
>> Rather than creating the context in storeRow() and deleting it
>> two levels up in materializeQueryResult(), I think it should be
>> created and deleted in the interim layer, storeQueryResult().
>> Patch along those lines attached.
> 
> Since the storeInfo struct is longer-lived than
> storeQueryResult(), it'd probably be better if the cleanup looked
> like
> 
> + if (sinfo->tmpcontext != NULL) +
> MemoryContextDelete(sinfo->tmpcontext); + sinfo->tmpcontext =
> NULL;


good point

> I find myself a bit suspicious of this whole thing though.  If
> it's necessary to explicitly clean up the tmpcontext, why not also
> the sinfo->cstrs allocation?  And what about the tupdesc,
> attinmeta, etc created further up in that "if (first)" block?  I'd
> have expected that all this stuff gets allocated in a context
> that's short-lived enough that we don't really need to clean it up
> explicitly.

Yeah, I thought about that too. My testing showed a small amount of
remaining leakage -- like 20 MB on 100,000 iterations -- I wasn't sure
that it was worthwhile to worry about. The memory context leak was
much larger.

Joe


- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJToeUQAAoJEDfy90M199hls8kP/0hhsVdn8HKcY7OZ64cKju4e
dH0SwQNMZxklpQkjEqb6vpbTmqaQ/7At/i7b6RGTMWFbIBu7L5Vpz+NZnJ7SyfaC
UjSgKfSe6M2auDIXlLCiZ62d8KZJJmVQj6U2h8ljhqorJq22kvClgXAUlcxZMoVv
SEthE1y7Udxgf3IqBO0AN6SMPrxP8bZDgrPbtZqw9UEHkCGZK0MDdH8TSRip7p6V
heyg9GbWeTvwFWBUYomMnMEUB6UlgBRnmHYsZIAjmUgxZfGiKhlydGOrb7MDHopz
ejyb06fg2MsSQnrEnCElLbonUqb5S9bD4p9BZHF/bz67AhVieJvDvnsSIoDjzR1a
+JIYe36ZDqo2kIGx4PuESgGX4ZTxxsrw033AQhVBW+KGkIuxjnvhh7tvEj3ACSjd
0bC1ztrnzCJLyNFd6jKaq5KAcNCEb/zvfKQAdHygJ87JkwE8I6u+ms1hdDpc10uZ
w484lcv/qP/JO2SaqPerDMwRsDXaovT8kEcsbqr1D7XFXu4cufHVGwe64IJYMjt+
9Y09/+i+Xt9DKXfkxC68Q2QWxwst5NfShRWDF3NG02iWlMqU2OsoVTQB6137DCiB
7hBKoNBmOApLJIjwCzEjB5IJmeeAz9x7YxgsqicRPnXoCKO+aF3dTbS+1u04va84
5XQ0p0lwUZaNfYr/xbi6
=H5bR
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway  writes:
> I think the context deletion was missed in the first place because
> storeRow() is the wrong place to create the context. Rather than
> creating the context in storeRow() and deleting it two levels up in
> materializeQueryResult(), I think it should be created and deleted in
> the interim layer, storeQueryResult(). Patch along those lines attached.

Since the storeInfo struct is longer-lived than storeQueryResult(),
it'd probably be better if the cleanup looked like

+   if (sinfo->tmpcontext != NULL)
+   MemoryContextDelete(sinfo->tmpcontext);
+   sinfo->tmpcontext = NULL;

I find myself a bit suspicious of this whole thing though.  If it's
necessary to explicitly clean up the tmpcontext, why not also the
sinfo->cstrs allocation?  And what about the tupdesc, attinmeta,
etc created further up in that "if (first)" block?  I'd have expected
that all this stuff gets allocated in a context that's short-lived
enough that we don't really need to clean it up explicitly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian  wrote:
>> On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote:
>>> What about comparing to the symbolic value LOBLKSIZE?  This would make
>>> pg_upgrade assume that the old installation had been tweaked the same
>>> as in its own build.  This ends up being the same as what you said,
>>> ie, effectively no comparison ... but it might be less complicated to
>>> code/understand.

>> OK, assume the compiled-in default is the value for an old cluster that
>> has no value --- yeah, I could do that.

> I'm not really sure why this is better than Bruce's original proposal, though.

The net behavior would be the same, but I thought it might be easier to
code by thinking of it this way.  Or maybe it wouldn't --- it's just a
suggestion.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/10/2014 02:57 AM, MauMau wrote:
> From: "Amit Kapila" 
>> Is there a need to free memory context in PG_CATCH()? In error
>> path the memory due to this temporary context will get freed as
>> it will delete the transaction context and this temporary context
>> will definitely be in the hierarchy of it, so it should also get
>> deleted.  I think such handling can be useful incase we use
>> PG_CATCH() to suppress the error.
> 
> I thought the same, but I also felt that I should make an effort
> to release resources as soon as possible, considering the memory
> context auto deletion as a last resort.  However, looking at other
> places where PG_CATCH() is used, memory context is not deleted.
> So, I removed the modification from PG_CATCH() block.  Thanks.

I think the context deletion was missed in the first place because
storeRow() is the wrong place to create the context. Rather than
creating the context in storeRow() and deleting it two levels up in
materializeQueryResult(), I think it should be created and deleted in
the interim layer, storeQueryResult(). Patch along those lines attached.

Any objections to me back-patching it this way?

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTod+UAAoJEDfy90M199hlaWgP/iitSQm7m+Sf2i8hP73TAdTX
Naw+TKtag822xzVT6AoVJafeZvEv0PNRYurP69y4zE11cwiG9u+RfoO1X1hP7FYu
mdHo5++YG+znmqDVC57Qav0qAheYaCk2Y9RKEZk9gEJLUO+HJRxuzc+L277lsAni
Iyu3DyaiogzmGBtjrPPex4VMtpFAPHzHWfABaL11kvNBXXpUjO/Z02ex+1nuZDV7
+wNSV4IdTTsmpdbbi/NRhDeU7sZOerIAY29B6vI/GXfHdwhDhg+3tG4PuoDg6YG2
jX/H37UE0zq0+J3ySnM92HZtn9RxfrjdkHTz/X7DAjZjHNwBNVi18oNfrsXUyqHO
3j3VEaRelow2+tUDaTxziEkZRA0vCRoeG20B7dgQY3op60wm9lpJ+pCQH8UGwPYC
HDxaOYcVKmxbWq+j86l2ZyYlfP8sVbqTMEc00dnYoc2sdDzU36+KJadIqoMoUgds
G5uNYOZ5eTxumua9exz2cqGVOdgzcDD3r/ZAUUVM0jk3LwdA4CKLorsy26Ce59lF
mSIO58uAJe198tyOCmbpZjbypIRRO3Qm73SfUmioCbcUzSg2tDdPpf0LP62V/YEm
gFIsPHNyZnaVm0baLilbJFg+88sxFSo1hvpoaUfNPVww7woAfFxi6uBt5h5KthUO
JoDWxYjYUy9VFDdC4rh4
=gj6p
-END PGP SIGNATURE-
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a81853f..46c4a51 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1076,1081 
--- 1076,1089 
  	if (!PQsetSingleRowMode(conn))		/* shouldn't fail */
  		elog(ERROR, "failed to set single-row mode for dblink query");
  
+ 	/* Create short-lived memory context for data conversions */
+ 	if (!sinfo->tmpcontext)
+ 		sinfo->tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+   "dblink temporary context",
+   ALLOCSET_DEFAULT_MINSIZE,
+   ALLOCSET_DEFAULT_INITSIZE,
+   ALLOCSET_DEFAULT_MAXSIZE);
+ 
  	for (;;)
  	{
  		CHECK_FOR_INTERRUPTS();
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1119,1124 
--- 1127,1136 
  	/* clean up GUC settings, if we changed any */
  	restoreLocalGucs(nestlevel);
  
+ 	/* clean up data conversion short-lived memory context */
+ 	if (sinfo->tmpcontext != NULL)
+ 		MemoryContextDelete(sinfo->tmpcontext);
+ 
  	/* return last_res */
  	res = sinfo->last_res;
  	sinfo->last_res = NULL;
*** storeRow(storeInfo *sinfo, PGresult *res
*** 1204,1218 
  		if (sinfo->cstrs)
  			pfree(sinfo->cstrs);
  		sinfo->cstrs = (char **) palloc(nfields * sizeof(char *));
- 
- 		/* Create short-lived memory context for data conversions */
- 		if (!sinfo->tmpcontext)
- 			sinfo->tmpcontext =
- AllocSetContextCreate(CurrentMemoryContext,
- 	  "dblink temporary context",
- 	  ALLOCSET_DEFAULT_MINSIZE,
- 	  ALLOCSET_DEFAULT_INITSIZE,
- 	  ALLOCSET_DEFAULT_MAXSIZE);
  	}
  
  	/* Should have a single-row result if we get here */
--- 1216,1221 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Kevin Grittner
Robert Haas  wrote:
> Tom Lane  wrote:
>> Robert Haas  writes:

>>> I thought the reason why this hasn't been implemented before
>>> now is that sending an ErrorResponse to the client will result
>>> in a loss of protocol sync.
>>
>> Hmm ... you are right that this isn't as simple as an
>> ereport(ERROR), but I'm not sure it's impossible.  We could for
>> instance put the backend into skip-till-Sync state so that it
>> effectively ignored the next command message.  Causing that to
>> happen might be impracticably messy, though.
>
> Another thing we could maybe do is AbortCurrentTransaction() and
> send the client a NoticeResponse saying "hey, expect all of your
> future commands to fail with complaints about the transaction
> being aborted".

Wow, until I read through the thread on this patch and the old
thread that Robert linked to I had forgotten the attempt by Andres
to deal with this three and a half years ago.  That patch died
because of how complicated it was to do the right thing if the
connection was left open.

Personally, where I have seen a use for this, treating it as FATAL
would be better anyway; but was OK with treating it as an ERROR if
that didn't sink the patch.  I guess that puts me in the same camp
as Josh and Robert.

Vik has submitted v1 and v2 of this patch; the only difference
between them is that v1 makes a timeout FATAL and v2 makes it an
ERROR (with appropriate corresponding changes to code comments,
documentation, and message text).  It's not hard to show the
problems with an ERROR under v2, confirming that the simple
approach of just changing the ereport severity is not enough:

test=# set idle_in_transaction_timeout = '3s';
SET
test=# begin;
BEGIN
test=# select 1;
ERROR:  current transaction is aborted due to idle-in-transaction timeout
test=# commit;
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
test=# commit;
ROLLBACK

The first thing is that I don't think a delay between the BEGIN and
the SELECT should cause a timeout to trigger, but more importantly
there should not be two ERROR responses to one SELECT statement.

I'm inclined to abandon the ERROR approach as not worth the effort
and fragility, and focus on v1 of the patch.  If we can't get to
consensus on that, I think that this patch should be flagged
"Returned with Feedback", noting that any follow-up version
requires some way to deal with the issues raised regarding multiple
ERROR messages.

Abridged for space. hopefully without losing the main points of the
authors, so far:

Josh Berkus:
> My $0.018: terminating the connection is the
> preferred behavior.

Tom Lane:
> FWIW, I think aborting the transaction is probably better,
> especially if the patch is designed to do nothing to
> already-aborted transactions.  If the client is still there, it
> will see the abort as a failure in its next query, which is less
> likely to confuse it completely than a connection loss.  (I
> think, anyway.)

Álvaro Herrera:
> I think if we want this at all it should abort the xact, not drop
> the connection.

Andrew Dunstan:
> [quotes Tom's argument]
> Yes, I had the same thought.

David G Johnston:
> Default to dropping the connection but give the
> usersadministrators the capability to decide for themselves?

Robert Haas:
> Assuming that the state of play hasn't changed in some way I'm
> unaware of since 2010, I think the best argument for FATAL here
> is that it's what we can implement.  I happen to think it's
> better anyway, because the cases I've seen where this would
> actually be useful involve badly-written applications that are
> not under the same administrative control as the database and
> supposedly have built-in connection poolers, except sometimes
> they seem to forget about some of the connections they themselves
> opened.  The DBAs can't make the app developers fix the app; they
> just have to try to survive.  Aborting the transaction is a step
> in the right direction but since the guy at the other end of the
> connection is actually or in effect dead, that just leaves you
> with an eternally idle connection.

Tom Lane (reprise):
> I'm not sure whether cancel-transaction behavior is enough better
> than cancel-session to warrant extra effort here.

Andres Freund:
> [quotes Tom's latest (above)]
> I don't think idle_in_transaction_timeout is worth this, but if
> we could implement it we could finally have recovery conflicts
> against idle in txn sessions not use FATAL...

Kevin Grittner:
> Personally, where I have seen a use for this, treating it as
> FATAL would be better anyway

A few ideas were put forward on how a much more complicated patch
could allow transaction rollback with an ERROR work acceptably, but
I think that would probably need to be a new patch in a later CF,
so that is omitted here.

So, can we agree to use FATAL to terminate the connection?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian  wrote:
> On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote:
>> Bruce Momjian  writes:
>> > On Tue, Jun 17, 2014 at 03:55:02PM -0400, Alvaro Herrera wrote:
>> >> Can't you compare it to the historic default value?  I mean, add an
>> >> assumption that people thus far has never tweaked it.
>>
>> > Well, if they did tweak it, then they would be unable to use pg_upgrade
>> > because it would complain about a mismatch if they actually matched the
>> > old and new servers.
>>
>> What about comparing to the symbolic value LOBLKSIZE?  This would make
>> pg_upgrade assume that the old installation had been tweaked the same
>> as in its own build.  This ends up being the same as what you said,
>> ie, effectively no comparison ... but it might be less complicated to
>> code/understand.
>
> OK, assume the compiled-in default is the value for an old cluster that
> has no value --- yeah, I could do that.

I'm not really sure why this is better than Bruce's original proposal, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
> > ERROR that lists out all of the non-temporary objects which are found
> > (and lists any other databases which have objects in those
> > tablespaces..).  That would allow administrators who have existing
> > notionally temporary-only tablespaces to go clean things up to make them
> > actually temporary-only.
> 
> That seems just about impossible from a concurrency standpoint
> (ie, what if someone is creating a new table in the tablespace
> concurrently with your check?  Possibly in a different database?)

Yeah, that's definitely an annoying complexity.

> I would certainly suggest that the first version of the patch not
> undertake to allow this property to be ALTERed; the cost-benefit
> ratio isn't there IMO.

I suppose scheduling downtime to do the check manually across all
databases, then drop and recreate the tablespace, would work.  As
someone who's working with a couple of these cases, it'd be awful nice
if there was a way PG would handle it for me.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost  wrote:
> > I agree that we want to support that, if we can do so reasonably.  What
> > I was trying to get at is simply this- don't we provide that already
> > with the leakproof attribute and functions?  If we don't have enough
> > there to allow index scans then we should be looking to add more, I'm
> > thinking.
> 
> So the reason why we got onto this particular topic was because of the
> issue of multiple security policies for a single table.  Of course,
> multiple security policies can always be merged into a single
> more-complex policy, but the resulting policy may be so complex that
> the query-planner is no longer capable of doing a good job optimizing
> it.  

Yeah, I could see that happening with some use-cases.

> >> ALTER TABLE tab ADD POLICY polname WHERE quals;
> >> GRANT SELECT (polname) ON TABLE tab TO role;
> >
> > Right, if we were to support multiple policies on a given table then we
> > would have to support adding and removing them individually, as well as
> > specify when they are to be applied- and what if that "when" overlaps?
> > Do we apply both and only a row which passed them all gets sent to the
> > user?  Essentially we'd be defining the RLS policies to be AND'd
> > together, right?  Would we want to support both AND-based and OR-based,
> > and allow users to pick what set of conditionals they want applied to
> > their various overlapping RLS policies?
> 
> AND is not a sensible policy; it would need to be OR.  If you grant
> someone access to two different subsets of the rows in a table, it
> stands to reason that they will expect to have access to all of the
> rows that are in at least one of those subsets.  

I think I can buy off on this.  What that also means is that any
'short-circuiting' that we try to do here would be based on "stop once
we get back a 'true'".  This could seriously change how we actually
implement RLS though as doing it all through query rewrites and making
this work with multiple security policies which are OR'd together and
yet keeping the optimization and qual push-down and index-based plans is
looking pretty daunting.  

I'm also of the opinion that this isn't strictly necessary for the
initial RLS offering in PG- there's a clear way we could migrate
existing users to a multi-policy system from a single-policy system.
Sure, to get the performance and optimization benefits that we'd
presumably have in the multi-policy case they'd need to re-work their
RLS configuration, but for users who care, they'll likely be very happy
to do so to gain those benefits.

Perhaps the question here is- if we implement RLS one way for the single
case and then change the implementation all around for the multi case,
will we end up breaking the single case?  Or destroying the performance
for it?  I can't see either of those cases being allowed- if and when we
support multi, we must still support single and the whole point of multi
would be to allow more performant implementations and that solution will
require the single case to be at least as performant as what we're
proposing to do today, I believe.

Or are you thinking that we would never support calling user-defined
functions in any RLS scheme because we want to be able to do that
optimization?  I don't see that being acceptable from a feature
standpoint.

> Alternatively, we could:
> 
> - Require the user to specify in some way which of the available
> policies they want applied, and then apply only that one.

I'd want to at least see a way to apply an ordering to the policies
being applied, or have PG work out which one is "cheapest" and try that
one first.

> - Decide that such scenarios constitute misconfiguration. Throw an
> error and make the table owner or other relevant local authority fix
> it.

Having them all be OR'd together feels simpler and easier to work with
than trying to provide the user with all the knobs necessary to select
which subset of users they want the policy applied to when (user X from
IP range a.b.c.d/24 at time 1500).  We could probably make it work with
exclusion constraints, range types, etc, and perhaps it'd be a reason to
bring btree_gist into core (which I'm all for) and make it work with
catalog tables, but... just 'yuck' all around, for my part.

> > Sounds all rather painful and much better done programatically by the
> > user in a language which is suited to that task- eg: pl/pgsql, perl, C,
> > or something besides our ALTER syntax + catalog representation.
> 
> I think exactly the opposite, for the query planning reasons
> previously stated.  I think the policies will quickly get so
> complicated that they're no longer optimizable.  Here's a simple
> example:
> 
> - Policy 1 allows the user to access rows for which complexfunc() returns 
> true.
> - Policy 2 allows the user to access rows for which a = 1.
> 
> Most users have access only through policy 2, but some have access
> t

Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Josh Berkus
On 06/18/2014 11:03 AM, Andres Freund wrote:
> Well, all those actually do write to the xlog (to write a new
> checkpoint, containing the updated control file). Since pg_resetxlog has
> done all this pretty much since forever renaming it now seems to be a
> big hassle for users for pretty much no benefit? This isn't a tool the
> average user should ever touch.

If we're using it to create a unique system ID which can be used to
orchestrate replication and clustering systems, a lot more people are
going to be touching it than ever did before -- and not just for BDR.

Or are you saying that we have to destroy the data by resetting the xlog
before we can change the system identifier?  If so, this feature is less
than completely useful ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Fabrízio de Royes Mello
On Wed, Jun 18, 2014 at 1:59 PM, Tom Lane  wrote:
>
> Stephen Frost  writes:
> > Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
> > ERROR that lists out all of the non-temporary objects which are found
> > (and lists any other databases which have objects in those
> > tablespaces..).  That would allow administrators who have existing
> > notionally temporary-only tablespaces to go clean things up to make them
> > actually temporary-only.
>
> That seems just about impossible from a concurrency standpoint
> (ie, what if someone is creating a new table in the tablespace
> concurrently with your check?  Possibly in a different database?)
>

You are correct, I completely forgot that CREATE TABLESPACE is not
transaction safe.


> I would certainly suggest that the first version of the patch not
> undertake to allow this property to be ALTERed; the cost-benefit
> ratio isn't there IMO.
>

+1

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 10:59:59 -0700, Josh Berkus wrote:
> On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote:
> > At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote:
> >>
> >> I'm unclear on why we would overload pg_resetxlog for this. 
> > 
> > Because pg_resetxlog already does something very similar, so the patch
> > is small. If it were independent, it would have to copy quite some code
> > from pg_resetxlog.
> 
> Aha.  In that case, it seems like it's time to rename pg_resetxlog, if
> it does a bunch of things that aren't resetting the xlog.

Well, all those actually do write to the xlog (to write a new
checkpoint, containing the updated control file). Since pg_resetxlog has
done all this pretty much since forever renaming it now seems to be a
big hassle for users for pretty much no benefit? This isn't a tool the
average user should ever touch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Josh Berkus
On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote:
> At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote:
>>
>> I'm unclear on why we would overload pg_resetxlog for this. 
> 
> Because pg_resetxlog already does something very similar, so the patch
> is small. If it were independent, it would have to copy quite some code
> from pg_resetxlog.

Aha.  In that case, it seems like it's time to rename pg_resetxlog, if
it does a bunch of things that aren't resetting the xlog.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Andres Freund
On 2014-06-18 13:54:16 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
>  Another angle is that some folks might have tried to automate things
>  even more, with a wrapper script that starts up the new postmaster
>  and runs analyze_new_cluster.sh all by itself.  I guess they could
>  make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
>  though, so maybe that's not a fatal objection either.
> 
> >>> Wouldn't that be quite counterproductive? The reason we don't normally
> >>> do that and why --analyze-in-stages exists is that the cluster should be
> >>> started up as fast as possible. Restarting it after ANALYZE went through
> >>> would be defeating that purpose, no?
> 
> >> How so?  Once you've started the postmaster, you're open for business,
> >> no?
> 
> > Wasn't there lots of talk about making the server inaccessible while
> > pg_upgrade is doing its thing? Also, many people are desparately unhappy
> > if postgres has to be restarted (to return to be being OS managed) after
> > their application already has connected.
> 
> I think we're not on the same page.  My point is that someone might want
> to automate the whole sequence: stop old postmaster, run pg_upgrade, start
> the updated postmaster normally (hence it *is* open for business), kick
> off the analyze runs.  If you're concerned about minimal downtime you
> would not want to be waiting around for the admin to issue a perfectly
> predictable series of commands.

Oh, yea. Definitely. I think that's what I've seen happen in pretty much
*all* usages of pg_upgrade. I somehow misread that you wanted to add
that into pg_upgrade. Not really sure how, sorry.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Tom Lane
Andres Freund  writes:
> On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
 Another angle is that some folks might have tried to automate things
 even more, with a wrapper script that starts up the new postmaster
 and runs analyze_new_cluster.sh all by itself.  I guess they could
 make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
 though, so maybe that's not a fatal objection either.

>>> Wouldn't that be quite counterproductive? The reason we don't normally
>>> do that and why --analyze-in-stages exists is that the cluster should be
>>> started up as fast as possible. Restarting it after ANALYZE went through
>>> would be defeating that purpose, no?

>> How so?  Once you've started the postmaster, you're open for business,
>> no?

> Wasn't there lots of talk about making the server inaccessible while
> pg_upgrade is doing its thing? Also, many people are desparately unhappy
> if postgres has to be restarted (to return to be being OS managed) after
> their application already has connected.

I think we're not on the same page.  My point is that someone might want
to automate the whole sequence: stop old postmaster, run pg_upgrade, start
the updated postmaster normally (hence it *is* open for business), kick
off the analyze runs.  If you're concerned about minimal downtime you
would not want to be waiting around for the admin to issue a perfectly
predictable series of commands.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to change the pgsql source code and build it??

2014-06-18 Thread Shreesha
Well, the initdb issue looks to be resolved.  Actually, after making the
changes as suggested by Kyotaro Horiguchi, I copied only initdb binary to
my platform and didn't copy all of them. Hence, the dependencies were still
not resolved and was getting the error. However, now the database server is
started and is up and running.

But, When I try to connect the client to the server, I am getting the
following error:

/switch/pgsql/bin # ./psql
FATAL:  database "root" does not exist
psql: FATAL:  database "root" does not exist

Upon browsing couple of links, I learned that in order to get through with
this issue, we should login with the actual postgres user so that it will
let the client to get connected with the default database. However in my
case, I don't know why there wasn't a default database with name 'root' got
created or why the server rejects the client when it tries to connect to
the default database.

Can anyone shed some light on
1) when the default database gets created
2) how is this database 'name' is decided? Or what is the name of the
default database name?
3) Is there any other places in the database server code where this check
is applied?

Upon looking at the error I got, I believe the code is searching for the
database name == user name.  If anyone can give some input on the code, it
would be helpful!

Thanks,
Shreesha



On Mon, Jun 16, 2014 at 6:58 PM, Craig Ringer  wrote:

> On 06/17/2014 02:02 AM, Shreesha wrote:
> > But I believe, I am looking forward for the exact opposite of it. In
> > other words, a possible work around for a root user to execute certain
> > executable(s) as an unprivileged user.
> >
>
> So you want the su or sudo commands?
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
~Shreesha.


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Abhijit Menon-Sen
At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote:
>
> I'm unclear on why we would overload pg_resetxlog for this. 

Because pg_resetxlog already does something very similar, so the patch
is small. If it were independent, it would have to copy quite some code
from pg_resetxlog.

> Wouldn't it be better design to have an independant function,
> "pg_set_system_identifier"?

A *function*? Why?

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 10:44:56 -0700, Josh Berkus wrote:
> On 06/13/2014 05:31 PM, Petr Jelinek wrote:
> > Hello,
> > 
> > attached is a simple patch which makes it possible to change the system
> > identifier of the cluster in pg_control. This is useful for
> > individualization of the instance that is started on top of data
> > directory produced by pg_basebackup - something that's helpful for
> > logical replication setup where you need to easily identify each node
> > (it's used by Bidirectional Replication for example).
> 
> I'm unclear on why we would overload pg_resetxlog for this.  Wouldn't it
> be better design to have an independant function,
> "pg_set_system_identifier"?

You mean an independent binary? Because it's not possible to change this
at runtime.

If so, it's because pg_resetxlog already has the option to change many
related things (e.g. the timeline id). And it'd require another copy of
several hundred lines of code. It's all stored in the control file/checkpoints.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Alvaro Herrera
Josh Berkus wrote:
> On 06/13/2014 05:31 PM, Petr Jelinek wrote:
> > Hello,
> > 
> > attached is a simple patch which makes it possible to change the system
> > identifier of the cluster in pg_control. This is useful for
> > individualization of the instance that is started on top of data
> > directory produced by pg_basebackup - something that's helpful for
> > logical replication setup where you need to easily identify each node
> > (it's used by Bidirectional Replication for example).
> 
> I'm unclear on why we would overload pg_resetxlog for this.  Wouldn't it
> be better design to have an independant function,
> "pg_set_system_identifier"?

We have overloaded pg_resetxlog for all pg_control-tweaking needs.  I
don't see any reason to do differently here.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Josh Berkus
On 06/13/2014 05:31 PM, Petr Jelinek wrote:
> Hello,
> 
> attached is a simple patch which makes it possible to change the system
> identifier of the cluster in pg_control. This is useful for
> individualization of the instance that is started on top of data
> directory produced by pg_basebackup - something that's helpful for
> logical replication setup where you need to easily identify each node
> (it's used by Bidirectional Replication for example).

I'm unclear on why we would overload pg_resetxlog for this.  Wouldn't it
be better design to have an independant function,
"pg_set_system_identifier"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 13:26:37 -0400, Robert Haas wrote:
> My vote is to hold off on this until we've talked about replication
> identifiers and other related topics in more depth.

I really don't understand this. We're *NOT* proposing this patch as an
underhanded way of preempting the discussion of whether/how replication
identifiers are going to be used. We're proposing it because we
currently have a need for the facility and this will reduce the number
of patches we have to keep around after 9.5. And more importantly
because there's several other use cases than our internal one for it.

To settle one more point: I am not planning to propose BDR's generation
of replication identifier names for integration. It works well enough
for BDR but I think we can come up with something better for core. If I
had my current knowledge two years back I'd not have chosen the current
scheme.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Petr Jelinek

On 18/06/14 19:26, Robert Haas wrote:

On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund  wrote:

I don't see how the proposed ability makes it more dangerous. It
*already* has the ability to reset the timelineid. That's the case where
users are much more likely to think about resetting it because that's
much more plausible than taking a unrelated cluster and resetting its
sysid, timeline and LSN.


All right, well, I've said my piece.  I don't have anything to add to
that that isn't sheer repetition.  My vote is to hold off on this
until we've talked about replication identifiers and other related
topics in more depth.  But if that position doesn't garner majority
support ... so be it!



I am not sure I get what does this have to do with replication 
identifiers. The patch has several use-cases, one of them has to do that 
you can know the future system id before you set it, which is useful for 
automating some things...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Andres Freund
On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
> >> Another angle is that some folks might have tried to automate things
> >> even more, with a wrapper script that starts up the new postmaster
> >> and runs analyze_new_cluster.sh all by itself.  I guess they could
> >> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
> >> though, so maybe that's not a fatal objection either.
> 
> > Wouldn't that be quite counterproductive? The reason we don't normally
> > do that and why --analyze-in-stages exists is that the cluster should be
> > started up as fast as possible. Restarting it after ANALYZE went through
> > would be defeating that purpose, no?
> 
> How so?  Once you've started the postmaster, you're open for business,
> no?

Wasn't there lots of talk about making the server inaccessible while
pg_upgrade is doing its thing? Also, many people are desparately unhappy
if postgres has to be restarted (to return to be being OS managed) after
their application already has connected.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund  wrote:
>> Well, I think it *does* make pg_resetxlog more dangerous; see previous
>> discussion of pg_computemaxlsn.
>
> Wasn't the thing around pg_computemaxlsn that there's actually no case
> where it could be used safely? And that experienced people suggested to
> use it an unsafe fashion?

There is a use case - to determine whether WAL has been replayed "from
the future" relative to the WAL stream and control file you have on
disk.  But the rest is true enough.

> I don't see how the proposed ability makes it more dangerous. It
> *already* has the ability to reset the timelineid. That's the case where
> users are much more likely to think about resetting it because that's
> much more plausible than taking a unrelated cluster and resetting its
> sysid, timeline and LSN.

All right, well, I've said my piece.  I don't have anything to add to
that that isn't sheer repetition.  My vote is to hold off on this
until we've talked about replication identifiers and other related
topics in more depth.  But if that position doesn't garner majority
support ... so be it!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Tom Lane
Andres Freund  writes:
> On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
>> Another angle is that some folks might have tried to automate things
>> even more, with a wrapper script that starts up the new postmaster
>> and runs analyze_new_cluster.sh all by itself.  I guess they could
>> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
>> though, so maybe that's not a fatal objection either.

> Wouldn't that be quite counterproductive? The reason we don't normally
> do that and why --analyze-in-stages exists is that the cluster should be
> started up as fast as possible. Restarting it after ANALYZE went through
> would be defeating that purpose, no?

How so?  Once you've started the postmaster, you're open for business,
no?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 18:54:05 +0200, Andres Freund wrote:
> On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
> > Sure, but that only requires being able to reset the ID randomly, not
> > to a particular value.
> 
> I can definitely see a point in a version of the option that generates
> the id randomly.

That's actually included in the patch btw (thanks Abhijit)...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Tom Lane
Stephen Frost  writes:
> Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
> ERROR that lists out all of the non-temporary objects which are found
> (and lists any other databases which have objects in those
> tablespaces..).  That would allow administrators who have existing
> notionally temporary-only tablespaces to go clean things up to make them
> actually temporary-only.

That seems just about impossible from a concurrency standpoint
(ie, what if someone is creating a new table in the tablespace
concurrently with your check?  Possibly in a different database?)

I would certainly suggest that the first version of the patch not
undertake to allow this property to be ALTERed; the cost-benefit
ratio isn't there IMO.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Andres Freund
On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
> Another angle is that some folks might have tried to automate things
> even more, with a wrapper script that starts up the new postmaster
> and runs analyze_new_cluster.sh all by itself.  I guess they could
> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
> though, so maybe that's not a fatal objection either.

Wouldn't that be quite counterproductive? The reason we don't normally
do that and why --analyze-in-stages exists is that the cluster should be
started up as fast as possible. Restarting it after ANALYZE went through
would be defeating that purpose, no?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Andres Freund
On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund  
> wrote:
> >> >> I can clearly understand the utility of being able to reset the system
> >> >> ID to a new, randomly-generated system ID - but giving the user the
> >> >> ability to set a particular value of their own choosing seems like a
> >> >> pretty sharp tool.  What is the use case for that?
> >
> > I've previously hacked this up adhoc during data recovery when I needed
> > to make another cluster similar enough that I could replay WAL.
> >
> > Another usecase is to mark a database as independent from its
> > origin. Imagine a database that gets sharded across several
> > servers. It's not uncommon to do that by initially basebackup'ing the
> > database to several nodes and then use them separately from
> > thereon. It's quite useful to actually mark them as being
> > distinct. Especially as several of them right now would end up with the
> > same timeline id...
> 
> Sure, but that only requires being able to reset the ID randomly, not
> to a particular value.

I can definitely see a point in a version of the option that generates
the id randomly. But the use case one up actually does require setting
it to a s specific value...

> > Uh. Right now this patch has been written because it's needed for a out
> > of core replication solution. That's what BDR is at this point. The
> > patch is unobtrusive, has other usecases than just our internal one and
> > doesn't make pg_resetxlog even more dangerous than it already is. I
> > don't see much problem with considering it on it's own cost/benefit?
> 
> Well, I think it *does* make pg_resetxlog more dangerous; see previous
> discussion of pg_computemaxlsn.

Wasn't the thing around pg_computemaxlsn that there's actually no case
where it could be used safely? And that experienced people suggested to
use it an unsafe fashion?
I don't see how the proposed ability makes it more dangerous. It
*already* has the ability to reset the timelineid. That's the case where
users are much more likely to think about resetting it because that's
much more plausible than taking a unrelated cluster and resetting its
sysid, timeline and LSN.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Tom Lane
Christoph Berg  writes:
> now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
> it make sense to get rid of the analyze_new_cluster.sh file which
> pg_upgrade writes? The net content is a single line which could as
> well be printed by pg_upgrade itself. Instead of an lengthy
> explanation how to invoke that manually, there should be a short note
> and a pointer to some manual section. I think the chances of people
> reading that would even be increased.

> Similary, I don't really see the usefulness of delete_old_cluster.sh
> as a file, when "rm -rf" could just be presented on the console for
> the admin to execute by cut-and-paste.

There are contexts where pg_upgrade is executed by some wrapper script
and the user doesn't normally see its output directly.  This is the
case in the Red Hat packaging (unless Honza changed it since I left ;-))
and I think Debian might be similar.

I generally don't like the amount of cruft pg_upgrade leaves lying
around, so I'd be glad to see these script files go away if possible;
but we need to think about how this will play when there's a wrapper
script between pg_upgrade and the human user.

In the Red Hat wrapper script, the pg_upgrade output is dumped into a
log file, which the user can look at if he wants, but I'd bet the
average user doesn't read it --- that was certainly the expectation.
Of course, said user probably never notices the separate shell
scripts either, so maybe it's a wash.

Another angle is that some folks might have tried to automate things
even more, with a wrapper script that starts up the new postmaster
and runs analyze_new_cluster.sh all by itself.  I guess they could
make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
though, so maybe that's not a fatal objection either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication identifier format

2014-06-18 Thread Andres Freund
On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
> > I actually don't think any of the discussions I was involved in had the
> > externally visible version of replication identifiers limited to 16bits?
> > If you are referring to my patch, 16bits was just the width of the
> > *internal* name that should basically never be looked at. User visible
> > replication identifiers are always identified by an arbitrary string -
> > whose format is determined by the user of the replication identifier
> > facility. *BDR* currently stores the system identifer, the database id
> > and a name in there - but that's nothing core needs to concern itself
> > with.
> 
> I don't think you're going to be able to avoid users needing to know
> about those IDs.  The configuration table is going to have to be the
> same on all nodes, and how are you going to get that set up without
> those IDs being user-visible?

Why? Users and other systems only ever see the external ID. Everything
leaving the system is converted to the external form. The short id
basically is only used in shared memory and in wal records. For both
using longer strings would be problematic.

In the patch I have the user can actually see them as they're stored in
pg_replication_identifier, but there should never be a need for that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread David Fetter
On Tue, Jun 17, 2014 at 04:07:55PM -0400, Robert Haas wrote:
> On Sat, Jun 14, 2014 at 7:56 PM, Kevin Grittner  wrote:
> > I looked at the standard, and initially tried to implement the
> > standard syntax for this; however, it appeared that the reasons
> > given for not using standard syntax for the row variables also
> > apply to the transition relations (the term used by the standard).
> > There isn't an obvious way to tie that in to all the PLs we
> > support.  It could be done, but it seems like it would intolerably
> > ugly, and more fragile than what we have done so far.
> 
> I'm not too familiar with this area.  Can you describe what the
> standard syntax for the row variables is, as opposed to our syntax?
> Also, what's the standard syntax for the the transition relations?

The good:
- Generating the tuplestores.  Yay!

The bad:
- Generating them exactly and only for AFTER triggers
- Requiring that the tuplestores both be generated or not at all.
  There are real use cases described below where only one would be relevant.
- Generating the tuplestores unconditionally.

The ugly:
- Attaching tuplestore generation to tables rather than callers (triggers, 
DML, etc.)

The SQL standard says:

 ::=
CREATE TRIGGER   
ON  [ REFERENCING  ]


 ::=
BEFORE
| AFTER
| INSTEAD OF

 ::=
INSERT
| DELETE
| UPDATE [ OF  ]

 ::=


 ::=
[ FOR EACH { ROW | STATEMENT } ]
[  ]


 ::=
WHEN   

 ::=

| BEGIN ATOMIC {   }... END

 ::=
...

 ::=
  OLD [ ROW ] [ AS ] 
| NEW [ ROW ] [ AS ] 
| OLD TABLE [ AS ] 
| NEW TABLE [ AS ] 

 ::=

 ::=


 ::=


 ::=


 ::=


Sorry that was a little verbose, but what it does do is give us what we need at
trigger definition time.  I'd say it's pilot error if a trigger
definition says "make these tuplestores" and the trigger body then
does nothing with them, which goes to Robert's point below re:
unconditional overhead.

> > Some things which I *did* follow from the standard: these new
> > relations are only allowed within AFTER triggers, but are available
> > in both AFTER STATEMENT and AFTER ROW triggers.  That is, an AFTER
> > UPDATE ... FOR EACH ROW trigger could use both the OLD and NEW row
> > variables as well as the delta relations (under whatever names we
> > pick).  That probably won't be used very often, but I can imagine
> > some cases where it might be useful.  I expect that these will
> > normally be used in FOR EACH STATEMENT triggers.
> 
> I'm concerned about the performance implications of capturing the
> delta relations unconditionally.

Along that same line, we don't always need to capture both the before
tuplestores and the after ones.  Two examples of this come to mind:

- BEFORE STATEMENT triggers accessing rows, where there is no after part to 
use, and
- DML (RETURNING BEFORE, e.g.) which only touches one of them.  This
  applies both to extant use cases of RETURNING and to planned ones.

I'm sure if I can think of two, there are more.

In summary, I'd like to propose that the tuplestores be generated
separately in general and attached to callers. We can optimize this by
not generating redundant tuplestores.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund  wrote:
>> >> I can clearly understand the utility of being able to reset the system
>> >> ID to a new, randomly-generated system ID - but giving the user the
>> >> ability to set a particular value of their own choosing seems like a
>> >> pretty sharp tool.  What is the use case for that?
>
> I've previously hacked this up adhoc during data recovery when I needed
> to make another cluster similar enough that I could replay WAL.
>
> Another usecase is to mark a database as independent from its
> origin. Imagine a database that gets sharded across several
> servers. It's not uncommon to do that by initially basebackup'ing the
> database to several nodes and then use them separately from
> thereon. It's quite useful to actually mark them as being
> distinct. Especially as several of them right now would end up with the
> same timeline id...

Sure, but that only requires being able to reset the ID randomly, not
to a particular value.

>> But it seems to me that we might need to have a process discussion
>> here, because, while I'm all in favor of incremental feature proposals
>> that build towards a larger goal, it currently appears that the larger
>> goal toward which you are building is not something that's been
>> publicly discussed and debated on this list.  And I really think we
>> need to have that conversation.  Obviously, individual patches will
>> still need to be debated, but I feel like 2ndQuadrant is trying to
>> construct a castle without showing the community the floor plan.  I
>> believe that there is relatively broad agreement that we would all
>> like a castle, but different people may have legitimately different
>> ideas about how it should be constructed.  If the work arrives as a
>> series of disconnected pieces (user-specified system ID, event
>> triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
>> to take it on faith that those pieces are going to eventually fit
>> together in a way that we'll all be happy with.  In some cases, that's
>> fine, because the feature is useful on its own merits whether it ends
>> up being part of the castle or not.
>>
>
> Uh. Right now this patch has been written because it's needed for a out
> of core replication solution. That's what BDR is at this point. The
> patch is unobtrusive, has other usecases than just our internal one and
> doesn't make pg_resetxlog even more dangerous than it already is. I
> don't see much problem with considering it on it's own cost/benefit?

Well, I think it *does* make pg_resetxlog more dangerous; see previous
discussion of pg_computemaxlsn.

> So this seems to be a concern that's relatively independent of this
> patch. Am I seing that right?

Partially; not completely.

> I actually don't think any of the discussions I was involved in had the
> externally visible version of replication identifiers limited to 16bits?
> If you are referring to my patch, 16bits was just the width of the
> *internal* name that should basically never be looked at. User visible
> replication identifiers are always identified by an arbitrary string -
> whose format is determined by the user of the replication identifier
> facility. *BDR* currently stores the system identifer, the database id
> and a name in there - but that's nothing core needs to concern itself
> with.

I don't think you're going to be able to avoid users needing to know
about those IDs.  The configuration table is going to have to be the
same on all nodes, and how are you going to get that set up without
those IDs being user-visible?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Stephen Frost
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
> On Wed, Jun 18, 2014 at 9:00 AM, Stephen Frost  wrote:
> > Not sure about that specific syntax (don't we have SET options now?) but
> > I do like the general idea.
> 
> Maybe something like that:
> 
> CREATE TABLESPACE spcname LOCATION '/foo/bar' WITH (only_temp_relations =
> true);

Yeah, that's more along the lines of what I was thinking.

> Have in mind you must take care if you use ALTER TABLESPACE spcname SET
> (...) to guarantee that exists only temp objects stored in the target
> tablespace, and if exists a regular object you must throw an exception.
> 
> Makes sense?

Yes.  I'd definitely like to see an ALTER TABLESPACE option, with an
ERROR that lists out all of the non-temporary objects which are found
(and lists any other databases which have objects in those
tablespaces..).  That would allow administrators who have existing
notionally temporary-only tablespaces to go clean things up to make them
actually temporary-only.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-18 Thread Andres Freund
On 2014-06-18 11:15:15 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 1:55 PM, Andres Freund  wrote:
> > What happens is that gcc will do a syscall triggering the kernel to turn
> > of scheduling; perform the math and store the result; turn scheduling on
> > again. That way there cannot be a other operation influencing the
> > calculation/store. Imagine if you have hardware that, internally, only
> > does stores in 4 byte units. Even if it's a single CPU machine, which
> > most of those are, the kernel could schedule a separate process after
> > the first 4bytes have been written. Oops. The kernel has ways to prevent
> > that, userspace doesn't...
> 
> Interesting.  "Turn off scheduling" sounds like a pretty dangerous syscall.

The kernel does that internally, just during cmpxchg. It'll reenable
interrupts before returning. There's hundreds of places inside at least
linux doing that internally...

Should you be interested in the details:
https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt

> >> > Does somebody want other columns in there?
> >>
> >> I think the main question at the developer meeting was how far we want
> >> to go with supporting primitives like atomic add, atomic and, atomic
> >> or, etc.  So I think we should add columns for those.
> >
> > Well, once CAS is available, atomic add etc is all trivially
> > implementable - without further hardware support. It might be more
> > efficient to use the native instruction (e.g. xadd can be much better
> > than a cmpxchg loop because there's no retries), but that's just
> > optimization that won't matter unless you have a fair bit of
> > concurrency.
> >
> > There's currently fallbacks like:
> > #ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
> > #define PG_HAS_ATOMIC_FETCH_ADD_U32
> > STATIC_IF_INLINE uint32
> > pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
> > {
> > uint32 old;
> > while (true)
> > {
> > old = pg_atomic_read_u32_impl(ptr);
> > if (pg_atomic_compare_exchange_u32_impl(ptr, &old, old + 
> > add_))
> > break;
> > }
> > return old;
> > }
> 
> I understand, but the performance characteristics are quite different.

There might be cases where that's true, but in the majority of cases
where the variable isn't highly contended it's just about the same. In
many cases the microcode will implement atomic ops using ll/sc
operations internally anyway.

> My understanding from the developer meeting was that we'd be OK with
> having, say, three levels of support for atomic ops: all ops
> supported, only TAS, none.  Or maybe four: all ops, CAS + TAS, only
> TAS, none.  But I think there was resistance (in which I participate)
> to the idea of, say, having platform 1 with "add" but not "and" and
> "or", platform 2 with "and" and "or" but not "add", platform 3 with
> both, platform 4 with neither, etc.  Then it becomes too hard for
> developers to predict whether something that is a win on their
> platform will be a loss on some other platform.

I don't think developers really have to care. In nearly all the cases
the above cmpxchg loop will loop precisely once. Usually the cacheline
will stay in the exclusive state on that process, and that's it. In most
cases what it'll be replacing will be something protected by a spinlock
anyways - and the above isn't any worse than that.

Note that we're far from the only one falling back to cmpxchg for all
these ops - that's pretty much a standard fallback.

I'm fine with always implementing everything but tas, cas, add ontop of
cas for now. I want or/and/sub/... to be conveniently available to C
code, but they don't need to be based on hardware ops. Having open-coded
versions of the above in several places sounds like a horrible idea to
me. Both, because we might want to optimize it at some point and because
it's horrible readability wise.

> >> > 3) sparcv8: Last released model 1997.
> >>
> >> I seem to recall hearing about this in a customer situation relatively
> >> recently, so there may be a few of these still kicking around out
> >> there.
> >
> > Really? As I'd written in a reply solaris 10 (released 2005) dropped
> > support for it. Dropping support for a platform that's been desupported
> > 10 years ago by it's manufacturer doesn't sound bad imo...
> 
> We definitely have at least one customer using Solaris 9.  I don't
> know their architecture for certain, but they did recently install a
> new version of PostgreSQL.

But Solaris 9 doesn't imply sparcv8. Ultrasparc (first sparcv9) support
was added in solaris 2.5... That's a *long* time ago.

If my googling turned up the right information you needed to use special
-xarch flags for sun studio to even compile binaries that are pure v8
(v8plus is fine btw) since sun studio 9... Same for gcc apparently 
(-mno-v8plus),
although I don't know how far back.

The reason I'd like to kill it is that there's basically zero chance of

Re: [HACKERS] BUG #10680 - ldapbindpasswd leaks to postgresql log

2014-06-18 Thread Magnus Hagander
On Wed, Jun 18, 2014 at 4:50 AM, Tom Lane  wrote:

> Steven Siebert  writes:
> > Attached is a proposed patch for BUG #10680.
>
> > It's a simple fix to the problem of the ldapbindpasswd leaking in
> > clear text to the postgresql log.  The patch simply removes the raw
> > pg_hba.conf line from the log message, but retains the log line number
> > to assist admins in troubleshooting.
>
> You haven't exactly explained why this is a problem.  The proposed patch
> would impede diagnosing of many other problems, so it's not going to get
> committed without a thoroughly compelling rationale.
>

Yes, properly logging that was intentional, in commit
7f49a67f954db3e92fd96963169fb8302959576e.


Hint: "I don't store my postmaster log securely" is not compelling.
> We've been over that ground before; there are far too many reasons
> why access to the postmaster log is a potential security hazard
> to justify concluding that this particular one is worse.
>

Yeah, and the password is already in cleartext in a file next to it.

If we actually feel the need to get rid of it, we should do a better job.
Such as actively blanking it out with something else. Since we know the
password (we parsed it out), it shouldn't be impossible to actually blank
out *just the password*, without ruining all the other diagnostics usage of
it.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] Is analyze_new_cluster.sh still useful?

2014-06-18 Thread Christoph Berg
Hi,

now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
it make sense to get rid of the analyze_new_cluster.sh file which
pg_upgrade writes? The net content is a single line which could as
well be printed by pg_upgrade itself. Instead of an lengthy
explanation how to invoke that manually, there should be a short note
and a pointer to some manual section. I think the chances of people
reading that would even be increased.

Similary, I don't really see the usefulness of delete_old_cluster.sh
as a file, when "rm -rf" could just be presented on the console for
the admin to execute by cut-and-paste.

Christoph
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 CF1

2014-06-18 Thread Magnus Hagander
On Tue, Jun 17, 2014 at 6:54 PM, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > Robert Haas wrote:
> >> On Mon, Jun 16, 2014 at 2:36 AM, Abhijit Menon-Sen 
> wrote:
> >>> P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be
> >>> easier to keep track of them.
>
> >> I and, I believe, various other people hate that style, because at
> >> least in Gmail, it breaks the threading.  It is much easier to find
> >> things if they are all posted on one thread.
>
> > Yes, please don't do that.  A simple, normal reply to the message that
> > submits the patch is much better from my point of view as a subsequent
> > reviewer and committer.
>
> Worth noting also is that Magnus is working on a new version of the
> commitfest app that will be able to automatically keep track of threads
> about patches --- so long as they *are* threads according to our mailing
> list archives.  I'm not sure if the archives recognize replies with a
> changed Subject: as being the same thread or not.
>

The archives code does threading based on the headers (in-reply-to and
references, in priority order). It completely ignores the subject when it
comes to the threading.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] buildfarm client release 4.13

2014-06-18 Thread Andrew Dunstan

I have released version 4.13 of the PostgreSQL Buildfarm client.

This can be downloaded from 



Changes in this release (from the git log):

   fcc182b Don't run TestCollateLinuxUTF8 on unsupported branches.
   273af50 Don't run FileTextArrayFDW tests if not wanted.
   9944a4a Don't remove the ccache directory on failure, unless
   configured to.
   73e4187 Make ccache and vpath builds play nicely together.
   9ff8c22 Work around path idiocy in msysGit.
   0948ac7 revert naming change for log files
   ca68525 Exclude ecpg/tests from find_typedefs code.


If you are using ccache, please note that there are adjustments to the 
recommended use pattern. The sample config file no longer suggests that 
the ccache directory have the branch name at the end. It is now 
recommended that you use a single cache for all branches for a 
particular member. To do this remove "/$branch" from the relevant line 
in your config file, if you have it, and remove those directories in the 
cache root. Your first run on each branch will rebuild some or all of 
the cache. My unifoed cache on crake is now at 615 Mb, rather than the 
multiples of Gb it had been previously.


It is recommended that this release be deployed by all users fairly 
quickly because of the fix in log file names that was discarding some 
that were quite important.


cheers

andrew






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quantify small changes to predicate evaluation

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:23 AM, Dennis Butterstein  wrote:
> Hi Marti, thank you for your quick reply. I tried the proposed tweaks and
> see some differences regarding the measurements. It seems as if the overall
> query performance dropped a little what I think the disabled turbo boost
> mode is responsible for (all measurements are single query only). I think
> that kind of behaviour is somewhat expected. Run1: 26.559s Run2: 28.280s
> Unfortunately the variance between the runs seems to remain high. In fact I
> have exclusive access to the machine and I made sure not to run in any i/o
> related problems regarding buffer caches. Are there any other stumbling
> blocks I'm missing at the moment? Maybe I've to rethink my (end-to-end)
> measurement strategy. In your experience how small is it possible to get
> measurement variances on Postgres? Thank you very much. Kind regards, Dennis

I find that it's possible to get smaller variations than what you're
experiencing on read-only workloads without any special tuning, but
variation on workloads that write data is much higher, similar to what
you're seeing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 1:55 PM, Andres Freund  wrote:
> But the concern is more whether 1 byte can actually be written
> without also writing neighbouring values. I.e. there's hardware out
> there that'll implement a 1byte store as reading 4 bytes, changing one
> of the bytes in a register, and then write the 4 bytes out again. Which
> would mean that a neighbouring write will possibly cause a wrong value
> to be written...

Ah, OK.  I've heard about that before, but had forgotten.

> What happens is that gcc will do a syscall triggering the kernel to turn
> of scheduling; perform the math and store the result; turn scheduling on
> again. That way there cannot be a other operation influencing the
> calculation/store. Imagine if you have hardware that, internally, only
> does stores in 4 byte units. Even if it's a single CPU machine, which
> most of those are, the kernel could schedule a separate process after
> the first 4bytes have been written. Oops. The kernel has ways to prevent
> that, userspace doesn't...

Interesting.  "Turn off scheduling" sounds like a pretty dangerous syscall.

>> > Does somebody want other columns in there?
>>
>> I think the main question at the developer meeting was how far we want
>> to go with supporting primitives like atomic add, atomic and, atomic
>> or, etc.  So I think we should add columns for those.
>
> Well, once CAS is available, atomic add etc is all trivially
> implementable - without further hardware support. It might be more
> efficient to use the native instruction (e.g. xadd can be much better
> than a cmpxchg loop because there's no retries), but that's just
> optimization that won't matter unless you have a fair bit of
> concurrency.
>
> There's currently fallbacks like:
> #ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
> #define PG_HAS_ATOMIC_FETCH_ADD_U32
> STATIC_IF_INLINE uint32
> pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
> {
> uint32 old;
> while (true)
> {
> old = pg_atomic_read_u32_impl(ptr);
> if (pg_atomic_compare_exchange_u32_impl(ptr, &old, old + 
> add_))
> break;
> }
> return old;
> }

I understand, but the performance characteristics are quite different.
My understanding from the developer meeting was that we'd be OK with
having, say, three levels of support for atomic ops: all ops
supported, only TAS, none.  Or maybe four: all ops, CAS + TAS, only
TAS, none.  But I think there was resistance (in which I participate)
to the idea of, say, having platform 1 with "add" but not "and" and
"or", platform 2 with "and" and "or" but not "add", platform 3 with
both, platform 4 with neither, etc.  Then it becomes too hard for
developers to predict whether something that is a win on their
platform will be a loss on some other platform.

>> > 3) sparcv8: Last released model 1997.
>>
>> I seem to recall hearing about this in a customer situation relatively
>> recently, so there may be a few of these still kicking around out
>> there.
>
> Really? As I'd written in a reply solaris 10 (released 2005) dropped
> support for it. Dropping support for a platform that's been desupported
> 10 years ago by it's manufacturer doesn't sound bad imo...

We definitely have at least one customer using Solaris 9.  I don't
know their architecture for certain, but they did recently install a
new version of PostgreSQL.

>> > 4) i386: Support dropped from windows 98 (yes, really), linux, openbsd
>> >(yes, really), netbsd (yes, really). No code changes needed.
>>
>> Wow, OK.  In that case, yeah, let's dump it.  But let's make sure we
>> adequately document that someplace in the code comments, along with
>> the reasons, because not everyone may realize how dead it is.
>
> I'm generally wondering how to better document the supported os/platform
> combinations. E.g. it's not apparent that we only support certain
> platforms on a rather limited set of compilers...
>
> Maybe a table with columns like: platform, platform version,
> supported-OSs, supported-compilers?

Sounds worth at try.

>> > 6) armv-v5
>>
>> I think this is also a bit less dead than the other ones; Red Hat's
>> shows Bugzilla shows people filing bugs for platform-specific problems
>> as recently as January of 2013:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=892378
>
> Closed as WONTFIX :P.
>
> Joking aside, I think there are still usecases for arm-v5 - but it's
> embedded stuff without a real OS and such. Nothing you'd install PG
> on. There's distributions that are dropping ARMv6 support already... My
> biggest problem is that it's not even documented whether v5 has atomic
> 4byte stores - while it's documted for v6.

I think in doubtful cases we might as well keep the support in.  If
you've got the fallback to non-atomics, keeping the other code around
doesn't hurt much, and might make it easier for someone who is
interested in one of those platforms.  It's fine and good to kill
things that

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Wed, Jun 18, 2014 at 10:40 AM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost  wrote:
>> > I had taken it to be a single privilege, but you're right, it could be
>> > done for each of those..  I really don't think we have the bits for more
>> > than one case here though (if that) without a fair bit of additional
>> > rework.  I'm not against that rework (and called for it wayyy back when
>> > I proposed the TRUNCATE privilege, as I recall) but that's a whole
>> > different challenge and no small bit of work..
>>
>> Technically, there are 4 bits left, and that's what we need for
>> separate privileges.
>
> I'd really hate to chew them all up..

Usually it's the patch author who WANTS to chew up all the available
bit space and OTHER people who say no.  :-)

>> We last consumed bits in 2008 (for TRUNCATE) and
>> 2006 (for GRANT ON DATABASE), so even if we used all of the remaining
>> bits it might be another 5 years before anyone has to do that
>> refactoring.
>
> Perhaps, or we might come up with some new whiz-bang permission to add
> next year. :/

Well, people proposed separate permissions for things like VACUUM and
ANALYZE around the time TRUNCATE was added, and those were rejected on
the grounds that they didn't add enough value to justify wasting bits
on them.  I think we see whether there's a workable system that such
that marginal permissions (like TRUNCATE) that won't be checked often
don't have to consume bits.

>> But even if the refactoring needs to be done now for
>> some reason, it's only June, and the last CommitFest doesn't start
>> until February 15th.  I think we're being way too quick to jump to
>> talking about what can and can't be done in time for 9.5.  Let's start
>> by figuring out how we'd really like it to work and then, if it's too
>> ambitious, we can scale it back.
>
> Alright- perhaps we can discuss what kind of refactoring would be needed
> for such a change then, to get a better idea as to the scope of the
> change and the level of effort required.
>
> My thoughts on how to address this were to segregate the ACL bits by
> object type.  That is to say, the AclMode stored for databases might
> only use bits 0-2 (create/connect/temporary), while tables would use
> bits 0-7 (insert/select/update/delete/references/trigger).  This would
> allow us to more easily add more rights at the database and/or
> tablespace level too.

Yeah, that's another idea.  But it really deserves its own thread.
I'm still not convinced we have to do this at all to meet this need,
but that should be argued back and forth on that other thread.

>> My main concern about using only one bit is that someone might want to
>> allow a user to bypass RLS on SELECT while still enforcing it for
>> data-modifying operations.  That seems like a plausible use case to
>> me.
>
> I absolutely agree that it's a real use-case and one which we should
> support, just trying to avoid biting off more than can be done between
> now and February.  Still, if we get things hammered out and more-or-less
> agreement on the way forward, getting the code written may move quickly.

Nifty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict

2014-06-18 Thread Joshua D. Drake


On 06/18/2014 06:34 AM, Tom Lane wrote:

Craig Ringer  writes:

I posted about a possible packaging issue with RHEL 6 PGDG packages for
9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a
prior mail asking about what happened with moving to git hasn't had a
response).
http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com


Why in the world is PGDG installing something named libevent at all?



pg_bouncer.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans."


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost  wrote:
>> Yeah, if we have to ask an external security module a question for
>> each row, there's little hope of any real optimization.  However, I
>> think there will be a significant number of cases where people will
>> want filtering clauses that can be realized by doing an index scan
>> instead of a sequential scan, and if we end up forcing a sequential
>> scan anyway, the feature will be useless to those people.
>
> I agree that we want to support that, if we can do so reasonably.  What
> I was trying to get at is simply this- don't we provide that already
> with the leakproof attribute and functions?  If we don't have enough
> there to allow index scans then we should be looking to add more, I'm
> thinking.

So the reason why we got onto this particular topic was because of the
issue of multiple security policies for a single table.  Of course,
multiple security policies can always be merged into a single
more-complex policy, but the resulting policy may be so complex that
the query-planner is no longer capable of doing a good job optimizing
it.  I won't mention here exactly what a certain large commercial
database vendor has implemented here; suffice it to say, however, that
their design avoids this pitfall, and ours currently does not.

> I agree on this point, but I'm still hopeful that we'll be able to get a
> good feature into 9.5.  There are quite a few resources available for
> the 'just programming' part, so the long pole in the tent here is
> absolutely hashing out what we want and how it should function.

Agreed.

> I'd be happy to host or participate in a conference call or similar if
> that would be useful to move this along- or we can continue to
> communicate via email.  There's a bit of a lull in conferences to which
> I'm going to right now, so in person is unlikely, unless folks want to
> get together somewhere on the east coast (I'd be happy to travel to
> Philly, Pittsburgh, NYC, etc, if it'd help..).

For me, email is easiest; but there are other options, too.

>> > What solution did you come up with for this case, which performed well
>> > and was also secure..?
>>
>> I put the logic in the client.  :-(
>
> Well, that's not helpful here. ;)

Sure.  The reason I brought it up is to say - hey, look, I had this
come up in the real world.  What would it take to be able to do
actually do it in the database server?  And the answer is - something
that will handle multiple security policies cleanly.

>> But I'm not sure; that
>> feels like it's giving something up that might be important.  And I
>> think that the kinds of syntax we're discussing won't support leaving
>> that out of the initial version and adding it later, so if we commit
>> to this syntax, we're stuck with that behavior.  To avoid that, we'd
>> need something like this:
>>
>> ALTER TABLE tab ADD POLICY polname WHERE quals;
>> GRANT SELECT (polname) ON TABLE tab TO role;
>
> Right, if we were to support multiple policies on a given table then we
> would have to support adding and removing them individually, as well as
> specify when they are to be applied- and what if that "when" overlaps?
> Do we apply both and only a row which passed them all gets sent to the
> user?  Essentially we'd be defining the RLS policies to be AND'd
> together, right?  Would we want to support both AND-based and OR-based,
> and allow users to pick what set of conditionals they want applied to
> their various overlapping RLS policies?

AND is not a sensible policy; it would need to be OR.  If you grant
someone access to two different subsets of the rows in a table, it
stands to reason that they will expect to have access to all of the
rows that are in at least one of those subsets.  If you give someone
your car key and your house key, that means they can operate your car
or enter your house; it does not mean that they can operate your car
but only when it's inside your garage.

Alternatively, we could:

- Require the user to specify in some way which of the available
policies they want applied, and then apply only that one.
or
- Decide that such scenarios constitute misconfiguration. Throw an
error and make the table owner or other relevant local authority fix
it.

> Sounds all rather painful and much better done programatically by the
> user in a language which is suited to that task- eg: pl/pgsql, perl, C,
> or something besides our ALTER syntax + catalog representation.

I think exactly the opposite, for the query planning reasons
previously stated.  I think the policies will quickly get so
complicated that they're no longer optimizable.  Here's a simple
example:

- Policy 1 allows the user to access rows for which complexfunc() returns true.
- Policy 2 allows the user to access rows for which a = 1.

Most users have access only through policy 2, but some have access
through policy 1.  Users who have access through policy 1 will always
get a sequential scan, but users who have access through po

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost  wrote:
> > I had taken it to be a single privilege, but you're right, it could be
> > done for each of those..  I really don't think we have the bits for more
> > than one case here though (if that) without a fair bit of additional
> > rework.  I'm not against that rework (and called for it wayyy back when
> > I proposed the TRUNCATE privilege, as I recall) but that's a whole
> > different challenge and no small bit of work..
> 
> Technically, there are 4 bits left, and that's what we need for
> separate privileges.  

I'd really hate to chew them all up..

> We last consumed bits in 2008 (for TRUNCATE) and
> 2006 (for GRANT ON DATABASE), so even if we used all of the remaining
> bits it might be another 5 years before anyone has to do that
> refactoring.  

Perhaps, or we might come up with some new whiz-bang permission to add
next year. :/

> But even if the refactoring needs to be done now for
> some reason, it's only June, and the last CommitFest doesn't start
> until February 15th.  I think we're being way too quick to jump to
> talking about what can and can't be done in time for 9.5.  Let's start
> by figuring out how we'd really like it to work and then, if it's too
> ambitious, we can scale it back.

Alright- perhaps we can discuss what kind of refactoring would be needed
for such a change then, to get a better idea as to the scope of the
change and the level of effort required.

My thoughts on how to address this were to segregate the ACL bits by
object type.  That is to say, the AclMode stored for databases might
only use bits 0-2 (create/connect/temporary), while tables would use
bits 0-7 (insert/select/update/delete/references/trigger).  This would
allow us to more easily add more rights at the database and/or
tablespace level too.

> My main concern about using only one bit is that someone might want to
> allow a user to bypass RLS on SELECT while still enforcing it for
> data-modifying operations.  That seems like a plausible use case to
> me.

I absolutely agree that it's a real use-case and one which we should
support, just trying to avoid biting off more than can be done between
now and February.  Still, if we get things hammered out and more-or-less
agreement on the way forward, getting the code written may move quickly.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict

2014-06-18 Thread Devrim Gündüz

Hi,

On Wed, 2014-06-18 at 09:41 -0400, Andrew Dunstan wrote:
> The only thing I can think of offhand that would require it is
> pgbouncer.

Yeah. Recent pgbouncer versions require libevent 2.0+, and RHEL 6 has
1.4.

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] comparison operators

2014-06-18 Thread Tom Lane
David G Johnston  writes:
> Andrew Dunstan wrote
>> I think I'd rather just say "for many data types" or something along 
>> those lines, rather than imply that there is some obvious rule that 
>> users should be able to intuit.

> Ideal world for me: we'd list the data types that do not provide comparison
> operators (or not a full set) by default with links to the section in the
> documentation where the reasoning for said omission is explained and/or
> affirmed.

I was just wondering whether that wouldn't be a shorter list.
It's not hard to get the base types that don't have btree opclasses:

select typname from pg_type where not exists
(select 1 from pg_opclass where opcmethod = 403 and opcdefault and opcintype = 
pg_type.oid) and typtype = 'b' and not (typelem!=0 and typlen=-1) order by 1;
typname
---
 aclitem
 box
 cid
 cidr
 circle
 gtsvector
 json
 line
 lseg
 path
 pg_node_tree
 point
 polygon
 refcursor
 regclass
 regconfig
 regdictionary
 regoper
 regoperator
 regproc
 regprocedure
 regtype
 smgr
 txid_snapshot
 unknown
 varchar
 xid
 xml
(28 rows)

although this is misleading because some of these are binary-coercible to
indexable types, which means that the indexable type's opclass works for
them.

Eliminating those, we get

select typname from pg_type where not exists
(select 1 from pg_opclass where opcmethod = 403 and opcdefault and 
binary_coercible(pg_type.oid, opcintype)) and typtype = 'b' and not (typelem!=0 
and typlen=-1) order by 1;
typname
---
 aclitemhaven't bothered, no obvious sort order anyway
 boxno linear sort order
 cidhaven't bothered
 circle no linear sort order
 gtsvector  internal type, wouldn't be useful
 json
 line   no linear sort order
 lseg   no linear sort order
 path   no linear sort order
 point  no linear sort order
 polygonno linear sort order
 refcursor  haven't bothered
 smgr   useless legacy type
 txid_snapshot  no linear sort order
 unknownthere are no operations for 'unknown'
 xidno linear sort order (yes, really)
 xml
(17 rows)

So really we're pretty close to being able to say "there are comparison
operators for every built-in type for which it's sensible".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] comparison operators

2014-06-18 Thread David G Johnston
Andrew Dunstan wrote
> On 06/17/2014 07:25 PM, Andres Freund wrote:
>> On 2014-06-17 19:22:07 -0400, Tom Lane wrote:
>>> Andrew Dunstan <

> andrew@

> > writes:
 I went to have a look at documenting the jsonb comparison operators,
 and
 found that the docs on comparison operators contain this:
  Comparison operators are available for all relevant data types.
 They neglect to specify further, however. This doesn't seem very
 satisfactory. How is a user to know which are relevant? I know they are
 not available for xml and json, but are for jsonb. Just talking about
 "all relevant types" seems rather hand-wavy.
>>> Well, there are 38 default btree opclasses in the standard system ATM.
>>> Are we worried enough about this to list them all explicitly?  Given the
>>> lack of complaints to date, I'm not.
> 
> I think I'd rather just say "for many data types" or something along 
> those lines, rather than imply that there is some obvious rule that 
> users should be able to intuit.

Ideal world for me: we'd list the data types that do not provide comparison
operators (or not a full set) by default with links to the section in the
documentation where the reasoning for said omission is explained and/or
affirmed.

My other reaction is that referring to data types at all in this section is
unnecessary - other than maybe to state (which it does not currently) that
both sides of the comparison must be of the same (or binary equivalent, like
text/varchar) type or there must exist an implicit cast for one of the
operands.  Much of that knowledge is implied and well understood though, as
is the fact that operators are closely associated with data types.  IOW - I
would be fine with removing "Comparison operators are available for all
relevant data types" and not replacing it with anything.  Though "for many
data types" is my preferred equivalent phrase for the same reasons Andrew
noted.

David J.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/comparison-operators-tp5807654p5807757.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] comparison operators

2014-06-18 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Andrew Dunstan (and...@dunslane.net) wrote:
> >> I think I'd rather just say "for many data types" or something along
> >> those lines, rather than imply that there is some obvious rule that
> >> users should be able to intuit.
> 
> > Perhaps with a link to where the informaiton about which exist is
> > available..?  Or a query to get the list?
> 
> Queries for this sort of thing are covered in the chapter about index
> opclasses.  The basic query would be like

Right, a link to there from this would be useful, imv.

> select opcintype::regtype from pg_opclass where opcmethod = 403 and 
> opcdefault;
> 
> but I'm not sure if this is completely useful; it's not obvious for
> example that the "text" opclass is also used for varchar.  Another
> point is that some of the operators aren't named in the conventional
> way.

Good point.  Hopefully a link over to the index-opclasses.html would be
helpful to users exploring these questions.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Fabrízio de Royes Mello
On Wed, Jun 18, 2014 at 9:00 AM, Stephen Frost  wrote:
>
> > I have took some small time to make a PoC just to see if that is doable.
> > And so I did a new syntax like:
> >
> > CREATE TABLESPACE spcname [TEMP | TEMPORARY] LOCATION ...
> >
> > So, if TEMP or TEMPORARY is present, I mark a new column at
pg_tablespace
> > as true. On every table creation or moving to a new tablespace, I just
> > check this, and fails if the tablespace is "temporary" but the
> > "relpersistence" says the table is not.
>
> Not sure about that specific syntax (don't we have SET options now?) but
> I do like the general idea.
>

Maybe something like that:

CREATE TABLESPACE spcname LOCATION '/foo/bar' WITH (only_temp_relations =
true);

Have in mind you must take care if you use ALTER TABLESPACE spcname SET
(...) to guarantee that exists only temp objects stored in the target
tablespace, and if exists a regular object you must throw an exception.

Makes sense?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Clarification of FDW API Documentation

2014-06-18 Thread Bernd Helmle



--On 13. Juni 2014 13:46:38 -0400 Tom Lane  wrote:


Imagine if `BeginForeignScan` set up a remote cursor and
`IterateForeignScan` just fetched _one tuple at a time_ (unlike the
current behavior where they are fetched in batches). The tuple would be
passed to `ExecForeignDelete` (as is required), but the remote cursor
would remain pointing at that tuple. Couldn't `ExecForeignDelete` just
call `DELETE FROM table WHERE CURRENT OF cursor` to then delete that
tuple?


No.  This is not guaranteed (or even likely) to work in join cases: the
tuple to be updated/deleted might no longer be the current one of the
scan. You *must* arrange for the scan to return enough information to
uniquely identify the tuple later, and that generally means adding some
resjunk columns.


Yeah, this is exactly the trap i ran into while implementing the 
informix_fdw driver. It used an updatable cursor to implement the modify 
actions as you proposed first. Consider a query like


UPDATE remote SET f1 = t.id FROM local t WHERE t.id = f1

The planner might choose a hash join where the hash table is built by 
forwarding the cursor via the foreign scan. You'll end up with the cursor 
positioned at the end and you have no way to get it back "in sync" when the 
modify action is actually called.


--
Thanks

Bernd


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-18 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed  
>> wrote:
>> > Yeah, I was thinking something like this could work, but I would go
>> > further. Suppose you had separate GRANTable privileges for direct
>> > access to individual tables, bypassing RLS, e.g.
>> >
>> >   GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name
>>
>> So, is this one new privilege (DIRECT) or four separate new privileges
>> that are variants of the existing privileges (DIRECT SELECT, DIRECT
>> INSERT, DIRECT UPDATE, DIRECT DELETE)?
>
> I had taken it to be a single privilege, but you're right, it could be
> done for each of those..  I really don't think we have the bits for more
> than one case here though (if that) without a fair bit of additional
> rework.  I'm not against that rework (and called for it wayyy back when
> I proposed the TRUNCATE privilege, as I recall) but that's a whole
> different challenge and no small bit of work..

Technically, there are 4 bits left, and that's what we need for
separate privileges.  We last consumed bits in 2008 (for TRUNCATE) and
2006 (for GRANT ON DATABASE), so even if we used all of the remaining
bits it might be another 5 years before anyone has to do that
refactoring.  But even if the refactoring needs to be done now for
some reason, it's only June, and the last CommitFest doesn't start
until February 15th.  I think we're being way too quick to jump to
talking about what can and can't be done in time for 9.5.  Let's start
by figuring out how we'd really like it to work and then, if it's too
ambitious, we can scale it back.

My main concern about using only one bit is that someone might want to
allow a user to bypass RLS on SELECT while still enforcing it for
data-modifying operations.  That seems like a plausible use case to
me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >